Full-text search lets you search through content, or part of it.
It is not built on top of SQL; it relies on an indexed database, which is much more efficient than running LIKE queries.
Search engines use indexed databases too. I’m not saying Google uses Sphinx, but it is based on the same concept.
To be searchable, your content must first be indexed. Each term of each document is stored in the index along with its position.
So the main problem with full-text search is keeping the index up to date. If your content changes very often, you should reindex it often too.
But indexing is time consuming and uses resources. In addition, searching should remain available while indexing runs.
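To make the idea of indexing each term with its position more concrete, here is a minimal sketch in plain Ruby. It is only an illustration of the concept, not how Sphinx stores its data; DOCUMENTS, build_index and lookup are hypothetical names.

```ruby
# Hypothetical sample documents, keyed by document id.
DOCUMENTS = {
  1 => "my great product",
  2 => "a great description of a product"
}.freeze

# Build an inverted index: term => { document id => [positions] }.
def build_index(documents)
  index = Hash.new { |terms, term| terms[term] = Hash.new { |docs, id| docs[id] = [] } }
  documents.each do |id, text|
    text.downcase.scan(/\w+/).each_with_index do |term, position|
      index[term][id] << position
    end
  end
  index
end

# Looking up a term is a hash access instead of a scan over every document.
def lookup(index, term)
  index.fetch(term.downcase, {})
end

INDEX = build_index(DOCUMENTS)

p lookup(INDEX, "great")   # documents containing "great", with positions
p lookup(INDEX, "missing") # unknown term, empty result
```

The cost is visible here too: every time a document changes, its entries in the index have to be rebuilt.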
Sphinx is an open source search server. Thinking Sphinx is a gem built for Rails that integrates nicely with it.
Sphinx must be installed before you can use Thinking Sphinx. It’s available for the three main operating systems.
Thinking Sphinx can be installed as a gem and configured in config/sphinx.yml.
This is an example:
development:
  port: 9312
  enable_star: 1
  min_infix_len: 3
test:
  port: 9313
  enable_star: 1
  min_infix_len: 3
You can set the port for each environment, or enable searching on parts of terms (enable_star and min_infix_len). There are many other options; take a look at the advanced configuration guide.
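To give an intuition of what min_infix_len: 3 means, the sketch below lists the parts of a word that become searchable once infixes of at least 3 characters are indexed. It is a conceptual illustration in plain Ruby, not Sphinx’s actual implementation, and infixes is a hypothetical helper.

```ruby
# Conceptual sketch only: lists the substrings of a word that an infix index
# with the given minimum length would make searchable.
# `infixes` is a hypothetical helper, not part of Sphinx or Thinking Sphinx.
def infixes(word, min_len)
  (min_len..word.length).flat_map do |len|
    (0..word.length - len).map { |start| word[start, len] }
  end.uniq
end

parts = infixes("great", 3)
p parts
```

With a minimum length of 3, a part such as "rea" is indexed while "gr" is not, which is why very short partial terms will not match. Indexing infixes also makes the index bigger, so the minimum length is a trade-off.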
The index has to be rebuilt whenever the configuration file changes:
rake ts:rebuild
Thinking Sphinx will generate its own config file, such as config/development.sphinx.conf,
and maintain its index in db/sphinx/development/.
As I said before, indexing is not done on the fly when you add new content. You have to schedule it explicitly. Trying to index from model callbacks is not a good idea.
So you have to choose the best interval between two indexing runs.
Whenever is a great gem to automate this job. For example:
every 10.minutes do
  rake "ts:index"
end
every :reboot do
  rake "ts:start"
end
Imagine an e-commerce website. You’ve got tons of products with descriptions. A product belongs to a category, has some pictures and a state (online or not).
product:
  title: "my great product"
  description: "my great description"
  category_id: 1
  state: "online"
If a user browses a category and searches for a product, we should restrict the search to that category. We also want to be able to restrict the search to products that have pictures and are online.
In other words, we want to combine full-text search with SQL-style restrictions.
Thinking Sphinx can work with a SQL database like MySQL, and it provides attributes for this. People often misunderstand the difference between indexed fields (declared with indexes) and attributes (declared with has).
Fields are searchable, while attributes match exact values. Here state and category_id are exact values. But be careful: you should treat each attribute as an integer, even state, which is a string.
So we’re going to use the CRC32 function to index the string as an integer and be able to filter on it.
define_index do
  # fields to use for term search
  indexes title, sortable: true
  indexes description
  # attributes usable as sort / filter criteria
  has category_id
  has "CRC32(state)", as: :state, type: :integer
  has "COUNT(DISTINCT pictures.id) > 0", as: :has_pictures, type: :boolean
  join pictures
end
sphinx_scope :ts_online do
  { with: { state: "online".to_crc32 } }
end
The Ruby method to_crc32 is not part of the core nor of ActiveSupport; it’s a Thinking Sphinx addition built on Zlib.
Zlib.crc32 "ruby"
# => 3428506893
Be aware that this is not perfectly reliable: CRC32 produces only a 32-bit value, so two different strings can end up with the same checksum.
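Since the set of states is usually small and known in advance, one way to stay safe is to check it for collisions once. Here is a minimal sketch using Zlib directly; the STATES list is hypothetical (only "online" comes from the example above).

```ruby
require "zlib"

# Hypothetical list of states; only "online" appears in the example above.
STATES = %w[online offline draft].freeze

# Map each string to the 32-bit integer that would be stored as an attribute.
crc_by_state = STATES.to_h { |state| [state, Zlib.crc32(state)] }

# Group states by checksum; any group with more than one member is a collision.
collisions = crc_by_state.group_by { |_state, crc| crc }
                         .select { |_crc, pairs| pairs.size > 1 }

p crc_by_state
if collisions.empty?
  puts "no collisions among known states"
else
  puts "collisions: #{collisions.inspect}"
end
```

This only protects values you know about. If the attribute can hold arbitrary strings, a collision remains possible.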
And here we go: we’re now able to search for products in a particular category, with or without pictures, and with specific terms in the title and/or description.
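To illustrate how term search on fields and exact matching on attributes combine, here is a small simulation in plain Ruby. It does not use Sphinx at all; Product, PRODUCTS and filter_search are hypothetical stand-ins, and the sample data is made up for the illustration.

```ruby
require "zlib"

# Hypothetical in-memory stand-in for indexed products: text fields are
# searched by term, attributes are compared as exact values.
Product = Struct.new(:title, :description, :category_id, :state, :has_pictures)

PRODUCTS = [
  Product.new("my great product", "my great description", 1, Zlib.crc32("online"), true),
  Product.new("another great product", "no pictures yet", 1, Zlib.crc32("online"), false),
  Product.new("great but hidden", "not published", 2, Zlib.crc32("offline"), true)
].freeze

# term: matched against the text fields; with: exact-match attribute filters.
def filter_search(products, term, with: {})
  products.select do |product|
    words = "#{product.title} #{product.description}".downcase.split
    words.include?(term.downcase) &&
      with.all? { |attribute, value| product[attribute] == value }
  end
end

results = filter_search(PRODUCTS, "great",
                        with: { category_id: 1,
                                state: Zlib.crc32("online"),
                                has_pictures: true })
puts results.map(&:title)
```

All three sample products contain the term "great", but only one of them also satisfies the three attribute filters. With the index defined above, the real query would presumably be a call to the model’s search method with a similar with: hash, chained on the ts_online scope.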
Sphinx is an easy-to-set-up and powerful solution for adding full-text search to your web application. I really encourage you to dig into the documentation to discover all the available options.
If you don’t want to use Sphinx, there are other full-text search engines. Solr is also very popular and has Ruby gems like Sunspot to interact with it.
The Synbioz Team. Free to be together.