
Here you will find ideas and code straight from the Software Development Team at Sport Ngin. Our focus is on building great software products for the world of youth and amateur sports. We are fortunate to be able to combine our love of sports with our passion for writing code.

The Sport Ngin application originated in 2006 as a single Ruby on Rails 1.2 application. Today the Sport Ngin Platform is composed of more than 20 applications built on Rails and Node.js, forming a service oriented architecture that is poised to scale for the future.


Getting Started with Elasticsearch on Rails

08/13/2014, 3:15pm CDT
By Ian Ehlert

How to get up and running with Elasticsearch in a Rails project.

This guide aims to reduce some of the confusion around implementing Elasticsearch indexing within a Rails project.

So you think you need Elasticsearch?

Stop! First you should make sure that you actually need it! Often, the problem you're trying to solve can be handled with some more advanced SQL queries and maybe a few new indexes. In general, there are two cases where using Elasticsearch makes sense.

  1. Full-text searching - This one is pretty obvious; it's what Elasticsearch was built for.
  2. Denormalizing complex data - Because we generally try to normalize our data to match up with our models, we can run into performance problems when querying across those normalized tables. A search technology like Elasticsearch makes denormalizing that data, and retrieving it, much faster.

Ok, so you need Elasticsearch... What now?

The rest of this document will assume that you're working on a Rails project that needs Elasticsearch. However, many of the concepts will still apply in other situations.

Running Elasticsearch

In development (on OS X) you can install the latest Elasticsearch using homebrew:

brew install elasticsearch

If you're on a different platform, follow the installation instructions provided by Elasticsearch.

If you need to get an Elasticsearch cluster up and running in production, you can follow a guide I previously wrote on how to launch it on AWS Opsworks: Elasticsearch on Opsworks

Add the gem

If this is the first Elasticsearch index you are adding to the project, you'll need to install the elasticsearch-rails and elasticsearch-model gems. The elasticsearch-rails gem provides some rake tasks and ActiveSupport instrumentation. The more important elasticsearch-model gem adds a lot of helper methods to your ActiveRecord::Base models.
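The Gemfile entries might look like this (no versions pinned here; pin as appropriate for your project):

```ruby
# Gemfile
gem 'elasticsearch-model'
gem 'elasticsearch-rails'
```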

Configure the gem

You'll need an initializer file to set up the Elasticsearch client that the models use. Here is a simple example that defaults the host to localhost:9200, but uses the config/elasticsearch.yml file, if present, to override the defaults.

config = {
  host: "http://localhost:9200/",
  transport_options: {
    request: { timeout: 5 }
  },
}

if File.exist?("config/elasticsearch.yml")
  config.merge!(YAML.load_file("config/elasticsearch.yml").symbolize_keys)
end

Elasticsearch::Model.client = Elasticsearch::Client.new(config)
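Since the YAML file is merged over the defaults with symbolized keys, a config/elasticsearch.yml pointing at a different host might look like this (the hostname is illustrative):

```yaml
# config/elasticsearch.yml -- top-level keys override the defaults above
host: "http://elasticsearch.internal:9200/"
```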

Index the Model

Now that the Rails project is all ready, you can set up the index on the model.

Before you start setting up your index, you'll need to include Elasticsearch::Model in your model. This provides a bunch of helper methods for working with Elasticsearch. They are all namespaced under __elasticsearch__, so you don't have to worry about them overwriting your own methods. Check out the documentation to see everything that including this module provides.

Set up the Index

The obvious first step is to configure the index on the model that you want to search. Encapsulating the indexing/searching logic into an ActiveSupport::Concern that can be included in your model is the recommended approach.
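A skeleton of the concern approach might look like the following. This is a plain-Ruby sketch standing in for ActiveSupport::Concern; in a real app you would `extend ActiveSupport::Concern` and `include Elasticsearch::Model` inside the included hook. The module and method names here are illustrative, not part of elasticsearch-model.

```ruby
# Sketch: encapsulate indexing logic in a module that the model includes.
module ArticleSearchable
  def self.included(base)
    # In a Rails app: base.include Elasticsearch::Model
    base.extend ClassMethods
  end

  module ClassMethods
    # Illustrative class-level helper; real apps often set index_name here.
    def search_index_name
      "articles"
    end
  end

  # Defines the document sent to Elasticsearch for each record.
  def as_indexed_json(options = {})
    { title: title }
  end
end

class Article
  attr_accessor :title
  include ArticleSearchable
end

article = Article.new
article.title = "Hello"
article.as_indexed_json   # => { title: "Hello" }
Article.search_index_name # => "articles"
```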

The key method here is the as_indexed_json method. You can define your entire index using this method. It defaults to just calling as_json on your object, so you'll probably want to override it to define the data that you actually want indexed. As a simple example, you may want to define an index on your Article model. Here's what it could look like:

def as_indexed_json(options = {})
  self.as_json({
    only: [:title, :description, :text, :type, :status],
    include: {
      author: { only: :name },
      tags: { only: :name },
    }
  })
end

The key point here is to index only the fields and relations that you actually need to search on. The more fields you index, the larger the index will be in Elasticsearch, meaning more powerful hardware will be needed. A larger index also makes search queries take longer.

Next you need to decide how many shards should hold the data. If it's not expected to be a huge amount of data, one primary shard is recommended. This StackOverflow Answer helps explain sharding and replicas in Elasticsearch.

This can be defined using settings index: { number_of_shards: 1 }. Check out the elasticsearch-model documentation for more information.

This is just a basic example of defining an index. You can get much more advanced using mappings, but in many cases just defining it using the as_indexed_json method will be enough.
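As a taste of what mappings look like, here's an illustrative mapping body written as a plain Ruby hash (Elasticsearch 1.x-style types; the field names and analyzer are assumptions, and in elasticsearch-model you would declare this with its `mappings do ... end` DSL rather than a raw hash):

```ruby
# Illustrative mapping body for an articles index.
mappings = {
  article: {
    properties: {
      title:  { type: "string", analyzer: "snowball" },
      status: { type: "string", index: "not_analyzed" }, # exact matches only
      tags:   { properties: { name: { type: "string" } } }
    }
  }
}

mappings[:article][:properties].keys # => [:title, :status, :tags]
```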

To index your data, you can use the built-in Model.__elasticsearch__.import to index all of your model's records. This is quite slow, so it's really only useful for small data sets or for testing things out in development.

Set up the Querying

Now that you have the index defined, it's finally time to start querying! Remember, you need to index some data before you can start querying, so use that import method from above.

The elasticsearch-model gem provides a powerful search interface to the RESTful Elasticsearch API. You can query/filter using any of the API endpoints defined in the Elasticsearch documentation.

The simplest way to query is just using Model.__elasticsearch__.search("my search terms"). This does normal text searching across all fields in your index. That may be all you need, but very often you will want to combine filtering with querying. Filters are like the SQL WHERE clause: they trim your index down to a subset of records before the search query is applied. This can be extremely powerful when you're trying to solve a complicated query problem against denormalized data.
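For instance, a combined filter-plus-query request body, written as a plain Ruby hash in the Elasticsearch 1.x `filtered` query style, might look like this (the field names are illustrative):

```ruby
# A filtered query: the term filter narrows the index to published articles
# before the match query scores the remaining documents.
body = {
  query: {
    filtered: {
      query:  { match: { title: "elasticsearch" } },
      filter: { term:  { status: "published" } }
    }
  }
}
# Article.__elasticsearch__.search(body) would run it against the index.
```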

I've found that using the query string query provides the most flexibility for any mix of querying/filtering that you need. Here's a simple example:

options[:page] ||= 1
options[:per_page] ||= 10
options[:from] = (options[:page] - 1) * options[:per_page]

Article.__elasticsearch__.search(
  query: { query_string: {
    query: "*text search terms* AND type:\"blog\" AND status:\"published\""
  }},
  size: options[:per_page],
  from: options[:from]
)

The most interesting part is the query in the above example. Surrounding your query search terms in * allows it to do the full text searching using partial word matches. The other parts (AND type:\"blog\") perform exact matching on those fields.

One thing to note is that __elasticsearch__.search returns an Elasticsearch::Model::Response object. To generate ActiveRecord objects out of the response, just call .records on the response object. It will then load the objects from your database using their ids so it's nice and fast.

Also, keep in mind some characters are considered "special" characters in Elasticsearch and need to be escaped. This regex should handle most (if not all) cases:

query.gsub!(/([#{Regexp.escape('\\+-&|!(){}[]^~*?:/')}])/, '\\\\\1')
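Wrapped in a helper, that escaping might be used like this (the method name is illustrative):

```ruby
# Escapes Elasticsearch query_string special characters by prefixing
# each one with a backslash.
def escape_query(term)
  term.gsub(/([#{Regexp.escape('\\+-&|!(){}[]^~*?:/')}])/) { |m| "\\#{m}" }
end

escape_query("what?")  # => "what\\?"
escape_query("(1+1)")  # => "\\(1\\+1\\)"
```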

Set up Callbacks

Automatic callbacks can be set up by simply including Elasticsearch::Model::Callbacks. This registers after_commit callbacks on your ActiveRecord models to update the indexes when records are created, updated, or destroyed. The default callbacks are pretty naive, so it's generally better to write your own custom callbacks.

Custom callbacks are easy to define, and they let you control exactly when the indexes are updated. It's also best practice to offload index updating into a background/asynchronous process. This helps prevent problems if the Elasticsearch server were to disappear or have issues.
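The shape of that idea can be sketched in plain Ruby. Here an in-memory array stands in for a real job backend like Sidekiq or Resque; the module and method names are illustrative, not part of elasticsearch-model:

```ruby
# Sketch: defer index updates to a queue drained by a background worker.
module IndexUpdateQueue
  QUEUE = []

  # Called from an after_commit callback in the model: record the id only.
  def self.enqueue(record_id)
    QUEUE << record_id
  end

  # A background worker would drain the queue and reindex, e.g.:
  #   Article.where(id: ids).find_each { |a| a.__elasticsearch__.index_document }
  def self.drain
    ids = QUEUE.uniq
    QUEUE.clear
    ids
  end
end

IndexUpdateQueue.enqueue(1)
IndexUpdateQueue.enqueue(2)
IndexUpdateQueue.enqueue(1)
IndexUpdateQueue.drain # => [1, 2]
```

Because only ids cross the process boundary, a flaky Elasticsearch server just leaves jobs in the queue to be retried rather than failing the original request.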

Write a Faster Import

As mentioned above, the built-in import is not very efficient: it grabs every record, builds the ActiveRecord object, and then calls as_indexed_json on it. If you're working with a large set of data that spans multiple relations, this can take many hours to complete. Fortunately, the Elasticsearch client exposed by elasticsearch-model provides a relatively simple interface to the Elasticsearch Bulk API, which can speed this process up greatly.

Here's a simple example using the Article model:

module ArticleImport
  def self.import
    # Eager-load associations so as_indexed_json doesn't trigger N+1 queries
    Article.includes(:author, :tags).find_in_batches do |articles|
      bulk_index(articles)
    end
  end

  def self.prepare_records(articles)
    # Transform each record into the Bulk API's index action format
    articles.map do |article|
      { index: { _id: article.id, data: article.as_indexed_json } }
    end
  end

  def self.bulk_index(articles)
    # One bulk request per batch instead of one request per record
    Article.__elasticsearch__.client.bulk({
      index: ::Article.__elasticsearch__.index_name,
      type: ::Article.__elasticsearch__.document_type,
      body: prepare_records(articles)
    })
  end
end

This could still be optimized by not building ActiveRecord objects at all, but it's a huge improvement over the default import.

That's it!

Piece of cake, right? There are a decent number of steps to getting Elasticsearch up and running with a Rails project, but it's nothing too terribly complicated. This definitely isn't the only way to use the elasticsearch-rails gem with your project, but it worked out pretty well for us.
