BusinessObjects Board

Replacing forumtopics.com/busobj/ links to internal links

Hello

Another big task after the migration is to replace the links to forumtopics.com/busobj/… inside each post. For example here:

You can see two links to 2 differents threads. The good news is that during the migration process we have kept the same id for content. So the thread 70866 is the same is old BOB and current Discourse database. The bad news is that there is probably a lot of links like that.

I suggest to run this project like that:

  • identify the post where we need to update the link
  • define a query to update the links (it’s a postgresql database, we can use some regex function I think)
  • run the query on a small amount of posts
  • run on all the database (take care to make sure the content will be transformed from raw to cooked)

I will first check how many post are concerned. Feel free to help me on the SQL query definition !

Identify the posts

I am using this kind of query on the database:

select 
regexp_matches(raw, 'http://(?:www\.|)forumtopics.com/busobj/([\w\d\.?=]*?)(?:\d*?)', 'g'), 
count(1) 
from posts 
group by 1 
order by 2 desc;

Here is a table to list my findings

regexp_matches count
{viewtopic.php?t=} 16084
{viewtopic.php?p=} 4293
{faq.php?mode=rules} 2193
{search.php} 1123
{faq.php?mode=bbcode} 810
{viewforum.php?f=} 354
{images} 262
{faq.php?mode=ask} 192
{shop.php} 63
{profile.php?mode=viewprofile} 59
{files} 47
{printview.php?t=} 43
{templates} 42
{repository} 37
{search.php?mode=results} 28
{index.php} 12
{contact.php} 12
{download.php?id=} 10
{downloads} 10

(update: change to a regex function more powerful)

Some points:

  • If there is less than 10 occurences then I will do it manually probably. Let’s focus on top of the list.
  • I understand that viewtopic.php?t= is linking to a topic id I am fine with that. But viewtopic.php?p= is linking to post id ?? :sweat_smile:

first run on viewtopic.php?t

I have ran this query:

set raw = regexp_replace(raw, 'http:\/\/(?:www\.|)forumtopics\.com\/busobj\/viewtopic\.php\?t=(\d{4,6})(&[\w=\+;&#]+|)([\n\r\) ,\[\]])','https://bobj-board.org/t/\1\3','g'), 
baked_version = null 
where raw ~ 'http:\/\/(?:www\.|)forumtopics\.com\/busobj\/viewtopic\.php\?t=(\d{4,6})(&[\w=\+;&#]+|)[\n\r\) ,\[\]]' 
and topic_id < 255520;

To update a big part of the first row (14684 post updated).

Now my initial table look that:

regexp_matches count
{viewtopic.php?p=} 4440
{faq.php?mode=rules} 2174
{search.php} 1135
{faq.php?mode=bbcode} 829
{viewforum.php?f=} 356
{viewtopic.php?t=} 296
{images} 286
{faq.php?mode=ask} 194
{profile.php?mode=viewprofile} 91
{shop.php} 66
{files} 49
{printview.php?t=} 43
{templates} 42
{repository} 37
{search.php?mode=results} 26
{index.php} 12
{contact.php} 12
{downloads} 10
{download.php?id=} 10
{mybob.php} 9
{faq.php} 8
{temp} 7
{search} 7
{""} 7
{conf} 7
{profile.php?mode=editprofile} 6
{menu} 6
{faq.php?mode=privacy} 4
{language} 4
{groupcp.php?g=} 2
{viewtopic.php?} 2
{faq.php?sid=8d1e188e28308dae542e815802b} 1
{search.php?} 1
{search.php?search_id=} 1

It’s better !!

I will focus on following rows now,

Second run on viewtopic.php?t= and work on bottom of the list

I have run some more queries like the previous one to reduce the number of broken link to forumtopics for viewtopic.php?t=. This was the easiest part because the topic id value is consistent between phpbb and discourse.

I have also work on the bottom of the list to remove some more exotic links (download.php contact.php mybob.php profile.php).

Now my initial query show only 21 rows:

regexp_matches count
{viewtopic.php?p=} 4440
{faq.php?mode=rules} 2173
{search.php} 1134
{faq.php?mode=bbcode} 829
{viewforum.php?f=} 352
{images} 249
{faq.php?mode=ask} 194
{profile.php?mode=viewprofile} 91
{viewtopic.php?t=} 82
{shop.php} 66
{files} 43
{printview.php?t=} 43
{repository} 37
{search.php?mode=results} 26
{index.php} 12
{faq.php} 8
{search} 7
{conf} 7
{""} 7
{viewtopic.php?} 2
{search.php?} 1

But now I want to focus on viewtopic.php?p=. This is when there is a link to a particular post, not a topic. I think I will be able to link to same level, but my goal is to link at least on the right topic.

For that I need to convert the original post id to a topic id. It’s a bit more complex and the regexp_replace function provided by postgresql is not enough for this work.

work on faq.php

Running some queries to replace old url to new TOS / FAQ:

Query details
select                                                                                        
regexp_matches(raw, 'http:\/\/(?:www\.|)forumtopics\.com\/busobj\/faq\.php\?mode=(\w+)#(\d+)', 'g'), 
count(1) 
from posts 
group by 1 
order by 2 desc;


update posts
set raw = replace(raw, 'https://bobj-board.org/tos#12', 'https://bobj-board.org/tos#heading--cross-posting'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#12%';

update posts
set raw = replace(raw, 'https://bobj-board.org/faq#editing#5', 'https://bobj-board.org/faq#editing'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#editing#5%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#14', 'https://bobj-board.org/tos#heading--bumping'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#14%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#17', 'https://bobj-board.org/tos#heading--copyright'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#17%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#16', 'https://bobj-board.org/tos#heading--dont-spam'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#16%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#15', 'https://bobj-board.org/tos#heading--dont-spam'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#15%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#13', 'https://bobj-board.org/tos#heading--me-too'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#13%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#35', 'https://bobj-board.org/faq#attachments'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#35%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#1', 'https://bobj-board.org/tos#heading--be-polite'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#1%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#34', 'https://bobj-board.org/faq#attachments'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#34%';




update posts
set raw = replace(raw, 'https://bobj-board.org/tos#26', 'https://bobj-board.org/faq#attachments'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#26%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#28', 'https://bobj-board.org/faq#attachments'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#28%';


update posts
set raw = replace(raw, 'https://bobj-board.org/tos#25', 'https://bobj-board.org/faq#moderators'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#25%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#2', 'https://bobj-board.org/tos#heading--dont-advertise'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#2%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#4', 'https://bobj-board.org/tos#heading--dont-advertise'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#4%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#7', 'https://bobj-board.org/tos#heading--dont-advertise'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#7%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#6', 'https://bobj-board.org/tos#heading--dont-advertise'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#6%';

update posts
set raw = replace(raw, 'https://bobj-board.org/tos#3', 'https://bobj-board.org/tos#heading--dont-advertise'), 
baked_version = null 
where raw like '%https://bobj-board.org/tos#3%';



update posts
set raw = replace(raw, 'https://bobj-board.org/faq#ask-smart#7', 'https://bobj-board.org/faq#ask-smart'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#ask-smart#7%';

update posts
set raw = replace(raw, 'https://bobj-board.org/faq#ask-smart#2', 'https://bobj-board.org/faq#ask-smart'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#ask-smart#2%';

update posts
set raw = replace(raw, 'https://bobj-board.org/faq#ask-smart#10', 'https://bobj-board.org/faq#ask-smart'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#ask-smart#10%';

update posts
set raw = replace(raw, 'https://bobj-board.org/faq#ask-smart#6', 'https://bobj-board.org/faq#ask-smart'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#ask-smart#6%';

update posts
set raw = replace(raw, 'https://bobj-board.org/faq#ask-smart#15', 'https://bobj-board.org/faq#ask-smart'), 
baked_version = null 
where raw like '%https://bobj-board.org/faq#ask-smart#15%';

working on viewtopic.php?p=

This one was a pretty hard one ! I haven’t be able to treat it via SQL only, I have run a small ruby script via Discourse console.

More info
posts = Post.where("raw like '%busobj/viewtopic.php?p=%'")

REGEX = /http\:\/\/(?:www\.|)forumtopics\.com\/busobj\/viewtopic\.php\?p=(\d{3,7})(&amp;[\w=\+;&%#]+|&amp;|#\d{0,7}|)([\n\r\) ,\[\]])/

posts.find_each do |post|
  puts "post [https://bobj-board.org/t/#{post.topic.id}/#{post.post_number}]"
  matches = post.raw.to_enum(:scan, REGEX).map {Regexp.last_match}
  matches.each do |m|
    puts m
    ref_post = Post.find_by_id(m[1])
    if ref_post != nil
      new_url = "https://bobj-board.org/t/#{ref_post.topic.id}/#{ref_post.post_number}#{m[3]}"
      puts "  replace: #{m[0]} by #{new_url}"
      new_raw = post.raw.gsub(m[0], new_url)
      post.update_column(:raw, new_raw)
      post.update_column(:baked_version, nil)
    end
  end
end

My initial query is now like that:

regexp_matches count
{search.php} 1132
{viewforum.php?f=} 352
{images} 246
{viewtopic.php?p=} 87
{viewtopic.php?t=} 82
{printview.php?t=} 43
{files} 43
{repository} 37
{search.php?mode=results} 26
{index.php} 12
{search} 7
{conf} 7
{""} 7
{viewtopic.php?} 2
{search.php?} 1

I will stop this task for now !