BusinessObjects Board

Search box - why truncate search strings unnecessarily?

I wonder why the search field would truncate certain words such as “name” or “report”. I know these are very common but they may really help when you need to filter the results. I can’t imagine how truncating these words would serve anyone better. If possible at all, I would like to ask that the truncation be removed from the search process.
Thanks so much.


questionguy (BOB member since 2008-02-05)

You must be asking a question about the search on this forum rather than about some sort of searching within the BusinessObjects product, right?

This question has been asked and answered many times in our “About BOB” forum, which is where I’m moving this topic.

In particular, you may be asking about “Stop Words”, which is covered in this Search topic, as well as other questions in the About BOB forum, where people asked why words are excluded from searches.

If you want to see all the past discussions on stop words, do a Search using keywords stop words. :wink:


Anita Craig :us: (BOB member since 2002-06-17)

It keeps the server from being overloaded. When you search for a word with lots of hits the temporary workspace for the search grows too large. As a result, the server bogs down for everyone else. If you want to search for report-related items, try limiting the search to the report building forum. We recently added a feature where you can search a forum and all of the child forums with a single pass to make this easier.
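For anyone curious what stop-word removal looks like in practice, here is a minimal Python sketch. It is not the forum's actual code: the `STOP_WORDS` list and the `strip_stop_words` helper are hypothetical, and the only words known from this thread to be excluded are "name" and "report".

```python
# Hypothetical stop-word list. Only "name" and "report" come from this
# thread; the forum's real list is not shown here.
STOP_WORDS = {"name", "report", "the", "and"}


def strip_stop_words(query):
    """Return the search keywords that survive stop-word removal."""
    words = query.lower().split()
    # Very common words are dropped before the search ever runs.
    return [word for word in words if word not in STOP_WORDS]


kept = strip_stop_words("report name prompt order")
print(kept)
```

With this list, only the less common keywords ("prompt" and "order") would be passed on to the search itself.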

There are other search options being considered, but none that will be implemented this year.


Dave Rathbun :us: (BOB member since 2002-06-06)

Gotcha, thanks guys.

Sorry, I’m not familiar with search algorithms. The reason I thought truncating isn’t resource-conserving is that I know that for some query engines, having more keywords actually expedites the process. That is, if the keywords are ANDed and not ORed together.


questionguy (BOB member since 2008-02-05)

You are absolutely correct, except that in the case of phpBB, very common words are the problem. And the bigger we get, the more serious the problem becomes.

In a nutshell, the logic is to capture the post_id for any post that contains the word that you are searching for. Then the post_id values for the next word are obtained. And then the next. So more common words have a much larger set of post_ids. When the list gets overly large, the search process spikes CPU and RAM usage.
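The logic above can be sketched roughly as follows. This is an illustration in Python, not phpBB's actual PHP or SQL: the `build_index` and `search_all_words` names, the in-memory index, and the sample posts are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(posts):
    """Map each word to the set of post_ids whose text contains it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in text.lower().split():
            index[word].add(post_id)
    return index


def search_all_words(index, words):
    """Return (post_ids containing every word, total post_ids fetched)."""
    fetched = 0
    result = None
    for word in words:
        ids = index.get(word, set())
        # The whole post_id list for each word is pulled in, so a very
        # common word contributes a very large list.
        fetched += len(ids)
        result = set(ids) if result is None else result & ids
    return (result or set()), fetched


# Made-up sample posts, keyed by post_id.
posts = {
    1: "report name is missing",
    2: "report prompt order",
    3: "name of the report universe",
    4: "universe prompt order",
}
index = build_index(posts)
matches, fetched = search_all_words(index, ["report", "name", "universe"])
print(matches, fetched)
```

Even in this tiny example, seven post_id entries are fetched to arrive at a single matching post. On a large board, the lists for common words dominate the work in the same way.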

As an example, on our old server (a single CPU) I did some analysis on the word frequency. I took the four worst words (meaning most common) and did a search for a combination of those four words. When it was all done, there were fewer than 30 topics containing all four words. But in order to get to that list, the CPU usage spiked over 70% for that one process, and the RAM usage spiked for that PHP session.

The advantage to the search system provided by phpBB2 is that unique or infrequently used words are found in the fastest way possible. The disadvantage is that common word searches present a server load.

I have spent years analyzing and tweaking the search process as much as I can without changing the core algorithm. I have an entire series of blog posts on another site related to this. Because a change would require an alteration of the core algorithm, a change of this type is probably not going to happen anytime soon.


Dave Rathbun :us: (BOB member since 2002-06-06)