Friday, June 25, 2010

MySql phrase search

I read at 
http://web.informbank.com/articles/technology/mysql-regexp-search-phrase.htm
that : 
---
Creating a SQL query for searching an exact phrase in a given field(s) in a table is not much harder than the simple query for searching a substring. However this query could be very slow if you are searching through huge amount of text in the database. So, a good solution to this problem is first to eliminate the rows which do not contain the phrase as a substring (which is fast) and then perform the regular expression (REGEXP) only to the others. Here is what I'm talking about:
SELECT * FROM abc_table WHERE t_text LIKE '%search phrase%' AND t_text REGEXP '[[:<:]]search phrase[[:>:]]'
--

What I don't understand is, suppose I remove the REGEXP part from the above query,
even then, shouldn't the query be equally fast?
In both the queries, each row is being searched anyway for the given text,
so what's the difference?

Ans : 
Sorry - I'm probably not being too clear

SELECT * FROM abc_table WHERE t_text REGEXP '[[:<:]]search phrase[[:>:]]'
gets the right answer, but is slow

SELECT * FROM abc_table WHERE t_text LIKE '%search phrase%' 
eliminates most but not all of the wrong answers quickly

Their suggestion of

SELECT * FROM abc_table WHERE t_text LIKE '%search phrase%' AND t_text REGEXP '[[:<:]]search phrase[[:>:]]'
does 
SELECT * FROM abc_table WHERE t_text LIKE '%search phrase%' 
to it gets rid of most of the wrong answers quickly, followed by 
 t_text REGEXP '[[:<:]]search phrase[[:>:]]'
to get rid of any remain wrong answers.

LIKE is faster than REGEXP, but doesn't get the whole result. Using LIKE first gets you close quickly, and means the slower REGEXP is use on less cases.

No comments:

Blog Archive