This guide aims to reduce some of the confusion around implementing Elasticsearch indexing within a Rails project.
Stop! First you should make sure that you actually need it! Many times, the problem you're trying to solve could be handled more easily with some advanced SQL queries and maybe a few new database indexes. In general there are two cases where using Elasticsearch makes sense: when you need real full-text search, and when you need complicated queries across denormalized data.
The rest of this document will assume that you're working on a Rails project that needs Elasticsearch. However, many of the concepts will still apply in other situations.
In development (on OS X) you can install the latest Elasticsearch using Homebrew:

```shell
brew install elasticsearch
```
If you're on a different platform, follow the installation instructions provided by Elasticsearch.
If you need to get an Elasticsearch cluster up and running in production, you can follow a guide I previously wrote on how to launch it on AWS OpsWorks: Elasticsearch on Opsworks
If this is the first Elasticsearch index you are adding to the project, you'll need to install the `elasticsearch-rails` and `elasticsearch-model` gems. The `elasticsearch-rails` gem provides some rake tasks and ActiveSupport instrumentation. The more important `elasticsearch-model` gem adds a lot of helper methods to your `ActiveRecord::Base` models.
You'll need an initializer file to set up the Elasticsearch client that the models use. Here is a simple example that sets the default host to `localhost:9200`, but uses the `config/elasticsearch.yml` file to override the defaults.
```ruby
# config/initializers/elasticsearch.rb
config = {
  host: "http://localhost:9200/",
  transport_options: {
    request: { timeout: 5 },
  },
}

# Allow per-environment overrides without touching the defaults above.
if File.exist?("config/elasticsearch.yml")
  config.merge!(YAML.load_file("config/elasticsearch.yml").symbolize_keys)
end

Elasticsearch::Model.client = Elasticsearch::Client.new(config)
```
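The overrides file itself is just flat YAML. A hypothetical example pointing at a dedicated host (the hostname here is made up for illustration):

```yaml
# config/elasticsearch.yml
host: "http://elasticsearch.internal.example.com:9200/"
```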
Now that the Rails project is all ready, you can set up the index on the model.
Before you start setting up your index, you'll need to include `Elasticsearch::Model` in your model. This provides a bunch of helper methods for working with Elasticsearch. They are all namespaced under `__elasticsearch__`, so you don't have to worry about them overwriting your own methods. Check out the documentation to see what including this provides for you.
The obvious first step is to configure the index on the model that you want to search. Encapsulating the indexing/searching logic into an `ActiveSupport::Concern` that can be included in your model is the recommended approach.
The key method here is `as_indexed_json`. You can define your entire index using this method. It defaults to just calling `as_json` on your object, so you'll probably want to override it to define the data that you actually want indexed. As a simple example, you may want to define an index on your `Article` model. Here's what it could look like:
```ruby
# Serialize only the fields and associations the search index needs.
def as_indexed_json(options = {})
  self.as_json(
    only: [:title, :description, :text, :type, :status],
    include: {
      author: { only: :name },
      tags: { only: :name },
    }
  )
end
```
The key point here is to index only the fields and relations that you actually need to search on. The more fields you index, the larger the index will be in Elasticsearch, meaning more powerful hardware will be needed. Also, the larger the index is, the longer search queries will take.
Next you need to decide how many shards should hold the data. If you don't expect a huge amount of data, one primary shard is recommended. This StackOverflow answer helps explain sharding and replicas in Elasticsearch. The shard count can be defined using `settings index: { number_of_shards: 1 }`. Check out the elasticsearch-model documentation for more information.
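Putting these pieces together, such a concern might look like this. This is a sketch: `Searchable` is a hypothetical name, it assumes the elasticsearch-model gem is loaded, and the fields are illustrative.

```ruby
# app/models/concerns/searchable.rb
module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    # A single primary shard keeps a small index simple; replicas add
    # redundancy on a multi-node cluster.
    settings index: { number_of_shards: 1, number_of_replicas: 1 }

    def as_indexed_json(options = {})
      as_json(only: [:title, :description, :text, :type, :status])
    end
  end
end

# class Article < ActiveRecord::Base
#   include Searchable
# end
```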
This is just a basic example of defining an index. You can get much more advanced using `mappings`, but in many cases just defining it using the `as_indexed_json` method will be enough.
To index your data, you can use the built-in `Model.__elasticsearch__.import` to index all of your model's records. This will be really slow, so it's really only useful for small data sets or testing things out in development.

Now that you have the index defined, it's finally time to start querying! Remember, you need to index some data before you can start querying, so use that `import` method from above.
The `elasticsearch-model` gem provides a powerful search interface to the RESTful Elasticsearch API. You can query/filter using any of the API endpoints defined in the Elasticsearch documentation.
The simplest way to query is just `Model.__elasticsearch__.search("my search terms")`. This will do normal text searching across all fields in your index. That may be all you need, but very often you will need to combine filtering with querying. Filters are like the SQL `WHERE` clause; they trim your index down to a subset of records before the search query is applied. This can be extremely powerful when you're trying to solve a complicated query problem using denormalized data.
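On recent Elasticsearch versions, one common way to express that split is a `bool` query whose `filter` clauses do the non-scoring exact matching while `must` carries the scored full-text search. Here's a sketch that just builds the request body as a plain hash (the field names are illustrative):

```ruby
# Build a search body combining a scored full-text match with
# exact-match filters (the SQL-WHERE-like part of the request).
def filtered_search_body(terms, type:, status:)
  {
    query: {
      bool: {
        must: { match: { text: terms } },
        filter: [
          { term: { type: type } },
          { term: { status: status } },
        ],
      },
    },
  }
end
```

The resulting hash can be passed straight to `__elasticsearch__.search`.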
I've found that using the query string query provides the most flexibility for any mix of querying/filtering that you need. Here's a simple example:
```ruby
options[:per_page] ||= 10
options[:page] ||= 1
# Elasticsearch offsets are zero-based, so page 1 starts at 0.
options[:from] = (options[:page] - 1) * options[:per_page]

Article.__elasticsearch__.search(
  query: {
    query_string: {
      query: "*text search terms* AND type:\"blog\" AND status:\"published\""
    }
  },
  size: options[:per_page],
  from: options[:from]
)
```
The most interesting part of the above example is the `query` value. Surrounding your search terms with `*` allows full-text searching using partial word matches. The other parts (`AND type:"blog"`) perform exact matching on those fields.
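The paging math generalizes to a small helper. This sketch assumes 1-based page numbers, which is a convention of the example rather than anything the gem requires:

```ruby
# Convert a 1-based page number into Elasticsearch from/size options.
def pagination_params(page, per_page: 10)
  page = [page.to_i, 1].max # guard against nil, zero, or negative pages
  { size: per_page, from: (page - 1) * per_page }
end
```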
One thing to note is that `__elasticsearch__.search` returns an `Elasticsearch::Model::Response` object. To generate ActiveRecord objects out of the response, just call `.records` on the response object. It will then load the objects from your database using their ids, so it's nice and fast.
Also, keep in mind some characters are considered "special" characters in Elasticsearch and need to be escaped. This regex should handle most (if not all) cases:

```ruby
query.gsub!(/([#{Regexp.escape('\\+-&|!(){}[]^~*?:/')}])/, '\\\\\1')
```
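If you escape in more than one place, the same character list can live in a small helper. This is a sketch; the block form of `gsub` sidesteps the replacement-string escaping that makes the one-liner hard to read:

```ruby
# Characters the Elasticsearch query_string syntax treats as special.
SPECIAL_CHARS = '\\+-&|!(){}[]^~*?:/'

# Prefix each special character with a backslash so it is treated
# literally instead of as query syntax.
def escape_query(query)
  query.gsub(/([#{Regexp.escape(SPECIAL_CHARS)}])/) { |match| "\\#{match}" }
end
```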
Automatic callbacks can be set up by just including `Elasticsearch::Model::Callbacks`. Including this sets up `after_commit` callbacks on ActiveRecord models to update the index when records are created, updated, or destroyed. The default callbacks are pretty naive, so it's generally better to write your own custom callbacks.
Custom callbacks are easily defined, and you can control when you actually want the index to be updated. It's also best practice to offload the index updating into a backgrounded/asynchronous process. This helps prevent problems if the Elasticsearch server were to disappear or have issues.
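A sketch of that pattern using ActiveJob (`IndexerJob` is a hypothetical name, and the wiring assumes the elasticsearch-model gem):

```ruby
# app/models/article.rb
class Article < ActiveRecord::Base
  include Elasticsearch::Model

  # Enqueue index updates after the transaction commits, so the job
  # never sees uncommitted data.
  after_commit(on: [:create, :update]) { IndexerJob.perform_later("index", self.class.name, id) }
  after_commit(on: [:destroy])         { IndexerJob.perform_later("delete", self.class.name, id) }
end

# app/jobs/indexer_job.rb
class IndexerJob < ActiveJob::Base
  queue_as :default

  def perform(operation, klass_name, record_id)
    klass = klass_name.constantize
    case operation
    when "index"
      record = klass.find_by(id: record_id)
      record.__elasticsearch__.index_document if record
    when "delete"
      klass.__elasticsearch__.client.delete(
        index: klass.__elasticsearch__.index_name, id: record_id
      )
    end
  end
end
```

If Elasticsearch is temporarily down, failed jobs can simply be retried by your queue backend instead of failing the web request.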
As mentioned above, the built-in `import` is not very efficient. It grabs every record, builds the ActiveRecord object, and then calls `as_indexed_json` on it. If you're working with a large set of data that spans multiple relations, this can take many hours to complete. The elasticsearch-rails client provides a relatively simple interface for the Elasticsearch Bulk API, which can speed up this process greatly.
Here's a simple example using the `Article` model:
```ruby
module ArticleImport
  # Walk the table in batches, eager-loading the indexed associations
  # to avoid N+1 queries during serialization.
  def self.import
    Article.includes(:author, :tags).find_in_batches do |articles|
      bulk_index(articles)
    end
  end

  # Build one bulk "index" action per record.
  def self.prepare_records(articles)
    articles.map do |article|
      { index: { _id: article.id, data: article.as_indexed_json } }
    end
  end

  # Send a whole batch to Elasticsearch in a single request.
  def self.bulk_index(articles)
    Article.__elasticsearch__.client.bulk(
      index: ::Article.__elasticsearch__.index_name,
      type: ::Article.__elasticsearch__.document_type,
      body: prepare_records(articles)
    )
  end
end
```
This can still be optimized by not building ActiveRecord objects at all, but it is already a huge improvement over the default import.
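One way to skip ActiveRecord instantiation is to `pluck` raw column values and build the bulk actions by hand. A sketch; the column list here is an assumption and would need to match your `as_indexed_json` fields:

```ruby
# Build bulk "index" actions from raw plucked rows instead of
# ActiveRecord objects. Each row is [id, title, status] in this sketch.
def bulk_actions_from_rows(rows)
  rows.map do |id, title, status|
    { index: { _id: id, data: { title: title, status: status } } }
  end
end

# Usage sketch inside the importer, batching ids to keep memory flat:
#   Article.in_batches do |relation|
#     actions = bulk_actions_from_rows(relation.pluck(:id, :title, :status))
#     Article.__elasticsearch__.client.bulk(
#       index: Article.__elasticsearch__.index_name, body: actions
#     )
#   end
```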
Piece of cake, right? There are a decent number of steps to getting Elasticsearch up and running with a Rails project, but it's nothing too terribly complicated. This definitely isn't the only way to use the `elasticsearch-rails` gem with your project, but it worked out pretty well for us.