Full-text search with Sphinx and Thinking Sphinx

Written by Martin Catty | May 18, 2012

What is full-text search?

Full-text search lets you search through the whole content of your documents, or through parts of it.

It is not built on top of SQL: it relies on a dedicated index, which is much more efficient than running LIKE queries.

Web search engines rely on indexes too. I’m not saying Google uses Sphinx, but it is based on the same concept.

Before you can search for terms, your content must be indexed. Each term of each document is stored in the index along with its position.
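
To make the idea concrete, here is a minimal sketch of a positional index in plain Ruby. It only illustrates the concept and says nothing about how Sphinx actually stores its data; build_index and search_term are hypothetical names chosen for this example.

```ruby
# Toy positional inverted index: term => { document id => [positions] }.
# Illustration only; a real search server uses a far more elaborate format.
def build_index(documents)
  index = Hash.new { |terms, term| terms[term] = Hash.new { |docs, id| docs[id] = [] } }
  documents.each do |id, text|
    # Split the text into lowercase words and record where each one occurs.
    text.downcase.scan(/\w+/).each_with_index do |term, position|
      index[term][id] << position
    end
  end
  index
end

# Look a term up directly in the index, without scanning any document text.
def search_term(index, term)
  index.fetch(term.downcase, {})
end

documents = {
  1 => "My great product",
  2 => "A great description of a product"
}

index = build_index(documents)
p search_term(index, "great")   # found in both documents, at word position 1
p search_term(index, "product") # position 2 in document 1, position 5 in document 2
```

A lookup is a single hash access, whereas a LIKE query has to scan the text of every row. The price is that the index must be rebuilt or updated whenever the documents change.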

So the main challenge of full-text search is keeping the index up to date. If your content changes very often, you should reindex it often too.

But indexing is time-consuming and uses resources. In addition, search should remain available while indexing is running.

Sphinx and Thinking Sphinx

Sphinx is an open source search server. Thinking Sphinx is a gem built for Rails that integrates nicely with it.

Sphinx must be installed before you can use Thinking Sphinx. It’s available for the three main operating systems.

Installing and configuring Thinking Sphinx

Thinking Sphinx can be installed as a gem and configured in config/sphinx.yml. Here is an example:

development:
  port: 9312
  enable_star: 1
  min_infix_len: 3
test:
  port: 9313
  enable_star: 1
  min_infix_len: 3

You can set the port for each environment, or enable searching on parts of terms (enable_star and min_infix_len). There are many other options; take a look at the advanced configuration guide.
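
To give an intuition of what min_infix_len: 3 implies, here is a rough sketch in plain Ruby. It assumes that infix indexing conceptually makes every substring of at least three characters searchable; infixes is a hypothetical helper written for this example, not something Sphinx or Thinking Sphinx provides.

```ruby
# List every substring (infix) of a word that is at least min_len
# characters long, the word itself included.
def infixes(word, min_len)
  (min_len..word.length).flat_map do |len|
    (0..word.length - len).map { |start| word[start, len] }
  end.uniq
end

p infixes("great", 3)
# lists "gre", "rea", "eat", "grea", "reat" and "great"

# With star syntax enabled, a query such as *rea* can then match "great".
puts infixes("great", 3).include?("rea")
```

The shorter the minimum length, the more entries end up in the index, so the value is a trade-off between flexibility and index size.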

The index has to be rebuilt whenever the configuration file changes:

rake ts:rebuild

Thinking Sphinx will generate its own config file, such as config/development.sphinx.conf, and maintain its index in db/sphinx/development/.

Indexing

As I said before, indexing is not done on the fly when you add new content. You have to schedule it explicitly. Trying to index from model callbacks is not a good idea.

So you have to choose the best interval between two indexing runs.

Whenever is a great gem to automate this job. For example:

every 10.minutes do
  rake "ts:index"
end

every :reboot do
  rake "ts:start"
end

What about using Sphinx with database filters?

Imagine an e-commerce website. You’ve got tons of products with descriptions. A product belongs to a category, has some pictures and has a state (online or not).

product:
  title: "my great product"
  description: "my great description"
  category_id: 1
  state: "online"

If a user browses a category and searches for a product, we should restrict the search to that category. We also want to be able to restrict the search to products that have pictures and are online.

In other words, we want to combine full-text search with SQL-style restrictions.

Thinking Sphinx can work with a SQL database like MySQL, and attributes exist for exactly this purpose. People often misunderstand the difference between fields (declared with indexes) and attributes (declared with has).

Fields are searchable, while attributes match exact values. Here, state and category_id are exact values. But be careful: for filtering purposes you should treat each attribute as an integer, even state, which is a string.

So we’re going to use the CRC32 function to index the string as an integer and still be able to filter on it.

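# Inside the Product model
define_index do
  # fields to use for term search
  indexes title, sortable: true
  indexes description

  # attributes usable as sort / filter criteria
  has category_id

  # state is a string, so it is stored as its CRC32 integer
  has "CRC32(state)", as: :state, type: :integer

  # true when the product has at least one picture
  has "COUNT(DISTINCT pictures.id) > 0", as: :has_pictures, type: :boolean

  # join the pictures association so the COUNT above can be computed
  join pictures
end

# reusable scope restricting a search to online products
sphinx_scope :ts_online do
  { with: { state: "online".to_crc32 } }
end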

The Ruby method to_crc32 is part of neither the core library nor ActiveSupport; it’s an addition from Thinking Sphinx that relies on Zlib:

Zlib.crc32 "ruby"
# => 3428506893

Be aware that this is not perfectly reliable: CRC32 can produce collisions, meaning two different strings may map to the same integer.
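
When the set of possible values is small and known in advance, as with a state column, one way to gain some confidence is to check those values for collisions up front. The sketch below is only an illustration: hash_collisions is a hypothetical helper, and every state name other than "online" is made up for the example.

```ruby
require "zlib"

# Group distinct values by their hash and keep only the groups where
# several values share the same hash. The hasher is injectable so the
# check can be exercised without needing a real CRC32 collision.
def hash_collisions(values, hasher = Zlib.method(:crc32))
  values.uniq
        .group_by { |value| hasher.call(value) }
        .select { |_hash, group| group.size > 1 }
end

# Hypothetical list of states; only "online" comes from the article.
states = %w[online offline draft archived]
collisions = hash_collisions(states)

if collisions.empty?
  puts "no collision among #{states.size} states"
else
  puts "colliding values: #{collisions.values.inspect}"
end
```

For free-form strings with many possible values, no such check is possible, and the collision risk simply has to be accepted or avoided by filtering on a real integer column instead.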

Conclusion

And there we go: we’re now able to search for products in a particular category, with or without pictures, and with specific terms in the title and/or description.

Sphinx is an easy-to-set-up and powerful solution for adding full-text search to your web application. I really encourage you to dig into the documentation to discover all the available options.

If you don’t want to use Sphinx, there are other full-text search engines. Solr is also very popular and has Ruby gems such as Sunspot to interact with it.

The Synbioz Team. Free to be together.