When searching for text in a MySQL table, you have two choices: a LIKE query with wildcards, or a FULLTEXT index queried with MATCH ... AGAINST.
I always wondered how those two methods would scale as the number of records increases. So I made an experiment.
Here is the setting: I generated test tables of various sizes, filling a search_text column with random words drawn from vocabularies of different sizes.
I compared two approaches:
LIKE: WHERE search_text LIKE "%word1%" AND search_text LIKE "%word2%" AND search_text LIKE "%word3%"
FULLTEXT: WHERE MATCH(search_text) AGAINST ("+word1 +word2 +word3" IN BOOLEAN MODE)
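For illustration, here is a minimal sketch of what the two query styles look like against a hypothetical documents table (the table name, columns and engine are assumptions, not my exact test setup; FULLTEXT indexes require MyISAM on MySQL before 5.6, or InnoDB from 5.6 on):

    -- Hypothetical table; the FULLTEXT index is what MATCH ... AGAINST uses.
    -- (InnoDB supports FULLTEXT from MySQL 5.6; before that, use ENGINE=MyISAM.)
    CREATE TABLE documents (
      id INT AUTO_INCREMENT PRIMARY KEY,
      search_text TEXT,
      FULLTEXT INDEX idx_search_text (search_text)
    ) ENGINE=InnoDB;

    -- LIKE approach: the leading wildcard prevents index use, so every row is scanned.
    SELECT * FROM documents
    WHERE search_text LIKE '%word1%'
      AND search_text LIKE '%word2%'
      AND search_text LIKE '%word3%';

    -- FULLTEXT approach: each word is looked up in the inverted index.
    SELECT * FROM documents
    WHERE MATCH(search_text) AGAINST ('+word1 +word2 +word3' IN BOOLEAN MODE);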
The horizontal axis shows the number of records, the vertical axis the average time to process a query:
I took away several lessons from this:
For a medium-sized data set of up to 10,000 records (500,000 words) or so, LIKE only takes a fraction of a second. This means that when optimizing a typical Rails action, the database is probably not the first place to look: a view can easily take many times longer to render than the query. So measure before blaming the database.
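One quick way to check how much time the database itself actually takes is MySQL's session profiling (deprecated in recent versions, but fine for ad-hoc measurements; the documents table below is the hypothetical one from above):

    -- Ad-hoc timing inside a MySQL session.
    SET profiling = 1;

    SELECT COUNT(*) FROM documents
    WHERE search_text LIKE '%word1%' AND search_text LIKE '%word2%';

    SELECT COUNT(*) FROM documents
    WHERE MATCH(search_text) AGAINST ('+word1 +word2' IN BOOLEAN MODE);

    -- Lists the statements above together with their durations.
    SHOW PROFILES;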
FULLTEXT performance differs by a factor of 78 between a vocabulary of 1,000 words and a vocabulary of 100,000 words. I guess that larger vocabularies result in a very wide but shallow inverted index, where each word maps to only a few records, so MySQL can quickly determine whether a query has matches. An educated person has a passive vocabulary of 15,000 to 20,000 words, so FULLTEXT should work well for natural language texts.
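If you are on MySQL 5.6+ with an InnoDB FULLTEXT index, you can peek into the inverted index yourself and see how many documents each word maps to (again using the hypothetical documents table from above; setting the global variable requires SUPER privileges):

    -- Point MySQL's FULLTEXT inspection tables at the table in question ...
    SET GLOBAL innodb_ft_aux_table = 'my_database/documents';

    -- ... then look at the index: DOC_COUNT is the number of documents
    -- containing each word. With a large vocabulary these counts stay small,
    -- which is the "wide but shallow" shape guessed at above.
    SELECT DISTINCT word, doc_count
    FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
    ORDER BY doc_count DESC
    LIMIT 20;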
While the FULLTEXT approach was many times faster than the LIKE approach in my tests, both approaches seem to scale linearly with the number of records. For a typical web project where you need to index well under 5 million words, FULLTEXT will be fast enough to serve your searches until the project reaches end-of-life. But if you expect your data to grow indefinitely, FULLTEXT can only postpone the scaling pain, and you will eventually need to deal with it, probably with a non-MySQL solution.