Saturday, June 20, 2009

Free-text search using acts_as_ferret plugin in ROR - Part I

Recently I had to implement a free-text search module in one of my Rails projects. There are several plugins available on the web for free-text searching in Rails, and among them acts_as_ferret is recommended by most developers, so I decided to give it a try. Once I had it running, it was a really pleasant experience, though you could end up scratching your head if you can't get it up and running on your production server. I will share my experiences of deploying it in a production environment using Apache and Mongrel in follow-up tutorials on ferret. In this tutorial I will give a short introduction to the acts_as_ferret plugin and show you the power of ferret. There is also a great acts_as_ferret tutorial on Rails Envy.

What is acts_as_ferret?
acts_as_ferret is a free-text search plugin for Rails applications. It is built on ferret, a high-performance Ruby text search engine library based on Apache Lucene. This gives you the power of file-based indexing and cuts out the overhead of database hits during searching. With it you can add model-based searching to your application very quickly. Enough introduction; let us start adding ferret-based search to our application.

Installing ferret and acts_as_ferret
Install the ferret gem using the command:
gem install ferret
Install the acts_as_ferret plugin using the command:
script/plugin install svn://
Update acts_as_ferret (when it is installed as a gem) using the command:
gem update acts_as_ferret

How to enable ferret for indexing your model fields?
The first thing you need to do is decide which models, and which of their fields, you want to index for searching. Say you have a model named Book and you want to search for books by the title, author_name or publisher_name attributes. You need to index those three fields, which you can do by adding the following call to your model's source code:
acts_as_ferret({:fields => {
    :title => {:store => :yes, :boost => 5},
    :author_name => {:store => :yes, :boost => 2},
    :publisher_name => {:store => :yes, :boost => 2}
  }, :store_class_name => true}, :remote => true)
Let me explain the attributes used in this acts_as_ferret definition. As you can see, :fields holds the names of all the fields that need to be indexed. Only these fields will be searched and returned during a search. For each field I have also set two attributes, :store and :boost.

The :store attribute is needed if you want to store the field value in the index files. Why would you want that? The beauty of storing a field value is that you don't need to hit the database for something like highlighting search keywords in the field value. No new database query is generated, because the action is performed on the stored value.
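
To make the idea concrete, here is a minimal sketch of keyword highlighting done purely on a string that is already in memory, which is the situation a stored field value puts you in. This is not ferret's own highlighter; highlight_term is a hypothetical helper name I made up for illustration, and the <em> tags are just one possible markup choice.

```ruby
# Toy illustration only: wraps every case-insensitive match of the search
# term in <em> tags. It works on a plain string (standing in for the value
# stored in the index), so no database access is involved.
def highlight_term(stored_value, term)
  pattern = Regexp.new(Regexp.escape(term), Regexp::IGNORECASE)
  stored_value.gsub(pattern) { |match| "<em>#{match}</em>" }
end

puts highlight_term('The Adventures of Tom Sawyer', 'adventure')
# prints: The <em>Adventure</em>s of Tom Sawyer
```

Regexp.escape is used so that a search term containing characters such as "." or "*" is matched literally rather than being interpreted as a pattern.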

The :boost attribute boosts the score of an indexed field, which gives matches in that field higher priority in the search results. For example, you might want to see the books whose title matches first, followed by those that match on author_name or publisher_name.
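
The effect of boosting on ordering can be sketched with a toy scoring function. Please note that this is only an illustration under my own simplifying assumption (a record's score is the sum of the boosts of the fields that contain the term); ferret's real scoring is considerably more involved. The toy_score helper and the sample records are hypothetical.

```ruby
# Boost values copied from the model definition in this tutorial.
boosts = { :title => 5, :author_name => 2, :publisher_name => 2 }

# Toy score (an assumption, not ferret's formula): add up the boost of
# every field whose text contains the search term, ignoring case.
def toy_score(record, term, boosts)
  boosts.inject(0) do |sum, (field, boost)|
    record[field].to_s.downcase.include?(term.downcase) ? sum + boost : sum
  end
end

# Hypothetical sample data: one author-name match and one title match.
books = [
  { :title => 'Cooking Basics',   :author_name => 'A. Adventure', :publisher_name => 'Acme' },
  { :title => 'Adventure Island', :author_name => 'B. Smith',     :publisher_name => 'Acme' }
]

# Highest score first, so the title match outranks the author-name match.
ranked = books.sort_by { |book| -toy_score(book, 'adventure', boosts) }
ranked.each { |book| puts "#{toy_score(book, 'adventure', boosts)}  #{book[:title]}" }
```

With these sample records the title match scores 5 and the author match scores 2, so 'Adventure Island' is listed first.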

The :store_class_name attribute is needed to enable multi-model searching, that is, searching in more than one model at once. To enable it, you have to define acts_as_ferret as above on each of the models you want to search.

The :remote attribute is needed if you want to use the DRb server. You can find information about the DRb server here.

How to do a basic search?
To search within a single model, call the find_with_ferret() method like this:
search_result = Book.find_with_ferret(query_string, ferret_options_hash, ar_options_hash)
Here, search_result holds the result returned by find_with_ferret, which is an array of objects; the result itself is of type ActsAsFerret::SearchResults. query_string is the string to search for, ferret_options_hash holds the ferret search options, and ar_options_hash holds the ActiveRecord options used to filter the results.

An example of single model searching could be like this:
result = Book.find_with_ferret('adventure', {:page => 1, :per_page => 25}, {:conditions => ['library_id = ?',]})
In this example the search keyword is 'adventure'. The ferret options are :page, which selects the first page, and :per_page, which sets the number of search results per page to 25. The ActiveRecord condition filters the results returned by ferret to narrow them down to the ones we need.
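
If the relationship between :page, :per_page and the slice of results you get back is not obvious, the arithmetic can be sketched in plain Ruby. This is only my illustration of the usual paging calculation, not code taken from the plugin; page_window is a hypothetical helper and the list of numbers stands in for ranked search hits.

```ruby
# Hypothetical helper: turns a page number and page size into an
# offset/limit pair, which is the usual way paging is expressed.
def page_window(page, per_page)
  page = [page.to_i, 1].max            # treat 0 or negative pages as page 1
  { :offset => (page - 1) * per_page, :limit => per_page }
end

hits = (1..60).to_a                    # stand-in for 60 ranked search hits

window = page_window(2, 25)
page_items = hits[window[:offset], window[:limit]]

puts window[:offset]                   # prints 25
puts page_items.first                  # prints 26
puts page_items.size                   # prints 25
```

The last page is simply shorter: with 60 hits and 25 per page, page 3 holds the remaining 10.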

An example of multi-model searching could be like this:
result = Book.find_with_ferret('adventure', {:page => 1, :per_page => 25, :multi => [Library]}, {:conditions => ['library_id = ?',]})
To enable multi-model searching we add the :multi attribute to ferret_options_hash; it contains an array of the additional model classes to search, separated by commas. The result holds an array of objects returned by the search method, and we can access its items by index just as we do with arrays.
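
Because a multi-model result can contain objects of different classes, the view code usually has to check what it is dealing with. Here is a small sketch of that, using Struct stand-ins instead of real ActiveRecord models; book_class, library_class and the sample data are my own hypothetical names, chosen so that the snippet runs without Rails.

```ruby
# Stand-ins for the Book and Library models (assumption: the real ones
# are ActiveRecord classes with at least these attributes).
book_class    = Struct.new(:title)
library_class = Struct.new(:name)

# Stand-in for a mixed search result.
results = [
  book_class.new('Adventure Island'),
  library_class.new('Adventure Public Library'),
  book_class.new('The Great Adventure')
]

# Index access works as on any array.
puts results[0].title

# Build a display label depending on the class of each hit.
labels = results.map do |item|
  case item
  when book_class    then "Book: #{item.title}"
  when library_class then "Library: #{item.name}"
  end
end
puts labels

# Or split the hits per class, e.g. to render separate sections.
grouped = results.group_by { |item| item.class }
puts grouped[book_class].size
```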

The default sort order is by ferret_score, which is returned as part of the result. In the real world, though, we often need to sort results by some other attribute, say the date of publication. To do that we need to convert publication_date to an integer. We can write a method in the model that does the conversion and add that method to the index: we are not limited to indexing fields, we can index methods as well. For example,
acts_as_ferret({:fields => {
    :title => {:store => :yes, :boost => 5},
    :author_name => {:store => :yes, :boost => 2},
    :publisher_name => {:store => :yes, :boost => 2},
    :publication_date_for_sort => {:index => :untokenized_omit_norms}
  }, :store_class_name => true}, :remote => true)

def publication_date_for_sort
  publication_date.to_i
end
Now we can write something like this which will return the results sorted by publication date:
@sort_field = Ferret::Search::SortField.new(:publication_date_for_sort, :reverse => true)
result = Book.find_with_ferret('adventure', {:page => 1, :per_page => 25, :sort => @sort_field}, {:conditions => ['library_id = ?',]})

The :reverse attribute controls the sort direction: with :reverse => true the results come back in descending (DESC) order, otherwise in ascending (ASC) order.
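
To see what an integer sort key and the reversed order amount to, here is a plain Ruby sketch that runs without ferret. It assumes the publication date is a Ruby Date and builds a key of the form YYYYMMDD; date_sort_key is a hypothetical helper and only one of several possible conversions (the model method above calls to_i instead, which suits a Time value).

```ruby
require 'date'

# Hypothetical helper: builds a sortable integer such as 20090620 from a
# Date. Larger numbers always mean later dates.
def date_sort_key(date)
  date.strftime('%Y%m%d').to_i
end

dates = [Date.new(2009, 6, 20), Date.new(2007, 1, 5), Date.new(2008, 12, 31)]

ascending  = dates.sort_by { |d| date_sort_key(d) }   # oldest first (ASC)
descending = ascending.reverse                        # newest first (DESC),
                                                      # the order :reverse => true asks for

puts date_sort_key(dates.first)                       # prints 20090620
puts descending.map { |d| d.to_s }.join(', ')
```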