The search function no longer seems to find any of the older posts. Is this correctable?
Hi there. I've forwarded your question to our webmaster. Do you know how far back it IS searching? It may be a time limit hardcoded into the forum software. Also remember that we used to be on Delphiforums, and those posts weren't transferred over when we started hosting the forums on our own site.
I did a search for the word "spell" and the earliest date I got was March 2008.
On the other hand, searching by author seems to go all the way back to November '05, when these boards started.
OK, so I had to do some searching to figure this one out. Here's the deal.
Every time you enter a post it is broken down into words.
Any words that are not yet in the database (because they are new, or because you misspelled them) are added to the wordlist table.
Once the words are in the wordlist table (and have been assigned a WORD_ID value), every unique word in your post is entered in the search_wordmatch table. So if this is post 20 and it contains 50 unique words, there will be 50 rows for it in the search_wordmatch table.
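To make the indexing step concrete, here is a minimal Python sketch. It is not phpBB's actual code: the in-memory `wordlist` dict and `wordmatch` set are hypothetical stand-ins for the search_wordlist and search_wordmatch tables, and the tokenizer is deliberately simplified.

```python
import re

# Hypothetical in-memory stand-ins for the search tables (an assumption for illustration):
# wordlist maps word text -> WORD_ID; wordmatch holds (post_id, word_id) rows.
wordlist: dict[str, int] = {}
wordmatch: set[tuple[int, int]] = set()


def split_words(text: str) -> set[str]:
    """Break a post into its unique, lower-cased words (simplified tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def index_post(post_id: int, text: str) -> None:
    for word in split_words(text):
        # New (or misspelled) words are added to the wordlist and get the next WORD_ID.
        if word not in wordlist:
            wordlist[word] = len(wordlist) + 1
        # One row per unique word in this post.
        wordmatch.add((post_id, wordlist[word]))


index_post(20, "The spell check spell fails on this spell")
print(len(wordmatch))  # 6
```

"spell" appears three times in the sample post but gets only one row in wordmatch, because only unique words are recorded.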
The tradeoff is speed versus size. By indexing every word in every post as it is entered, phpBB knows where each unique word ever used in the forum appears.
Now, when someone searches for "spell", phpBB looks up that word in the search_wordlist table (where each word appears only once, so it is very fast to find) and gets the WORD_ID for that word.
Next, it scans the search_wordmatch table to get a list of all the posts that include that WORD_ID.
Finally, it builds the list of results by post or topic as requested in the search.php page.
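The three lookup steps can be sketched the same way. Again, this is a hypothetical illustration rather than what search.php actually does: the sample index, the `post_topic` mapping, and the `by` parameter are all made up for the example, and step 2 is a plain loop where the real forum would run a database query.

```python
# Hypothetical pre-built index (assumed sample data, for illustration only).
wordlist = {"spell": 1, "check": 2, "forum": 3}
wordmatch = {(20, 1), (20, 2), (31, 1), (45, 3)}
# Hypothetical post -> topic mapping, used to group results by topic.
post_topic = {20: 7, 31: 7, 45: 9}


def search(word: str, by: str = "post") -> list[int]:
    # Step 1: look up the WORD_ID in the wordlist (a single key lookup).
    word_id = wordlist.get(word.lower())
    if word_id is None:
        return []
    # Step 2: scan wordmatch for every post that contains that WORD_ID.
    post_ids = sorted(p for (p, w) in wordmatch if w == word_id)
    # Step 3: build the result list by post or by topic.
    if by == "topic":
        return sorted({post_topic[p] for p in post_ids})
    return post_ids


print(search("spell"))           # [20, 31]
print(search("spell", "topic"))  # [7]
```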
Again, the tradeoff is size versus speed. On most forums the search_wordmatch table will be one of the largest (if not the largest) tables in the database. But search results come back very quickly.
An alternative is to drop the search_wordlist and search_wordmatch tables and simply brute-force a text search of the post data, but that takes much longer.
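For comparison, a brute-force search might look like this minimal sketch (the `posts` dict is hypothetical sample data). It needs no extra tables, but every search has to read the text of every post, so the work grows with the total amount of post text on the board.

```python
import re

# Hypothetical post data (post_id -> post text); an assumption for illustration.
posts = {
    20: "The spell check spell fails on this spell",
    31: "Which spell did you cast?",
    45: "Welcome to the forum",
}


def brute_force_search(word: str) -> list[int]:
    """Scan the text of every post for the whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [post_id for post_id, text in sorted(posts.items()) if pattern.search(text)]


print(brute_force_search("spell"))  # [20, 31]
```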
The phpBB developers also added a feature called "stop words". This is a text file that you can edit, containing words that should be "stopped" from searches. For example, how many posts on this forum do you think include the words post, topic, or forum? Quite a few. It would be essentially pointless to return search results for the word "post", since you could get over half of the database back, and nobody is going to read half a million posts.
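Here is a sketch of how stop words could be applied, assuming they are simply removed from a post's set of words before indexing or searching. The `STOP_WORDS` set is a hypothetical example, not phpBB's actual list.

```python
import re

# Hypothetical stop-word list; the forum keeps its real list in an editable text file.
STOP_WORDS = {"post", "topic", "forum", "the", "a", "and"}


def searchable_words(text: str) -> set[str]:
    """Unique lower-cased words from text, with stop words removed."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return words - STOP_WORDS


print(sorted(searchable_words("Reply to the forum post about the spell topic")))
# ['about', 'reply', 'spell', 'to']
```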
Now, here is the problem. For some reason or other, the wordmatch table only has entries starting at post number 18521. So all the posts numbered 1 to 18520 are NOT being searched, because they are not indexed. To fix this I am going to have to reindex the ENTIRE board, which will A) take some time, B) probably not be something I want to do during peak hours, and C) require me to either write or find a script to actually do the reindex.
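A reindex script could look roughly like this minimal Python sketch. Everything in it is hypothetical: the tiny `posts` dict and the partial index stand in for the real tables (posts 1 and 2 play the role of posts 1 to 18520), and the batch size is arbitrary. It first lists posts with no wordmatch rows, then rebuilds the whole index in batches, which is one way a long job could be paused between chunks instead of running in one go during peak hours.

```python
import re
from typing import Iterator

# Hypothetical post data (post_id -> text); a real script would read the posts table.
posts = {1: "old spell post", 2: "another old one", 3: "new spell", 4: "newest"}

# Hypothetical partial index mimicking the problem: only posts 3 and up are indexed.
wordlist: dict[str, int] = {"new": 1, "spell": 2, "newest": 3}
wordmatch: set[tuple[int, int]] = {(3, 1), (3, 2), (4, 3)}


def unindexed_posts() -> list[int]:
    """Post IDs that have no rows at all in wordmatch."""
    indexed = {post_id for post_id, _ in wordmatch}
    return sorted(p for p in posts if p not in indexed)


def reindex(batch_size: int = 2) -> Iterator[int]:
    """Rebuild the whole index from scratch, yielding the last post ID of each batch.

    Yielding between batches lets the caller pause (or, in a real script,
    commit) before moving on to the next chunk of posts.
    """
    wordlist.clear()
    wordmatch.clear()
    ids = sorted(posts)
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for post_id in batch:
            for word in set(re.findall(r"[a-z0-9']+", posts[post_id].lower())):
                if word not in wordlist:
                    wordlist[word] = len(wordlist) + 1
                wordmatch.add((post_id, wordlist[word]))
        yield batch[-1]


print(unindexed_posts())  # [1, 2]
for last_id in reindex():
    print("indexed through post", last_id)
print(unindexed_posts())  # []
```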
So, I will put this on the list, but for now that explains what is going on. Sorry, it is not something that can be fixed quickly.
-Wendy the webmaster