Elasticsearch by Example: Part 2
Having convinced ourselves that Elasticsearch was the solution to our faceted search functionality problem, we learn the basics of using it.

This article is part of a series, starting with Elasticsearch by Example: Part 1, exploring the Elasticsearch database / search engine.
Amazon Elasticsearch Service
While Elasticsearch itself is open-source software (and can even be run on your development machine), I was happy to pay Amazon $0.036 per hour for a cloud-based instance suitable for learning, which avoids the installation hassle.
In this case, I set up a new Elasticsearch domain with the following choices:
- Elasticsearch version: 5.5
- Instance count: 1
- Instance Type: t2.small.elasticsearch
- Domain access policy: Allow open access to domain
Once active, you interact with your domain through its API endpoint. For example, issuing an HTTP GET on the endpoint results in:
GET: ENDPOINT
{
  "name": "QOV3LPp",
  "cluster_name": "979494976816:demo",
  "cluster_uuid": "wz7Q32YfTJOc0JZhMCcsUQ",
  "version": {
    "number": "5.5.2",
    "build_hash": "b2f0c09",
    "build_date": "2017-08-21T22:31:55.076Z",
    "build_snapshot": false,
    "lucene_version": "6.6.0"
  },
  "tagline": "You Know, for Search"
}
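If you would rather script these calls than use an API testing tool, the following is a minimal sketch using only Python's standard library. The ENDPOINT value is a made-up placeholder, and the helper names build_request and send are hypothetical, chosen for illustration; substitute the endpoint of your own domain.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the endpoint of your own domain.
ENDPOINT = "https://your-domain.example.com"


def build_request(path="", body=None, method="GET"):
    """Build (but do not send) an HTTP request against the domain endpoint."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are sent as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = ENDPOINT.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (not executed here because it needs a live domain):
#   info = send(build_request())
#   print(info["version"]["number"])
```

Separating request construction from sending keeps the first half easy to inspect without a running cluster.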
Basic Concepts
In learning Elasticsearch, I did not find a single tutorial that was sufficient on its own; rather, I ended up piecing my understanding together from a combination of them.
The official Elasticsearch documentation has a page explaining the basic concepts; in particular, the relationship between a cluster, index, type, and document.
Many tutorials draw analogies to SQL databases:
- cluster = server(s)
- index = database
- type = table
- document = tuple (or row)
This analogy, however, falls apart when you consider when to use indices vs. types, as discussed in the article Index vs. Type:
you may be surprised that there are not as many use cases for types as you expected. And this is right: there are actually few use cases for having several types in the same index for the reasons that we mentioned above.
—Adrien Grand, Elasticsearch
With this in mind, we are likely to often have a one-to-one relationship between indices and types.
In our simple example, we have an environment with:
- a cluster consisting of a single node
- a single index: shirts
- shirts has a single shard and a single replica
- shirts has a single type: shirt
The Document Structure
For our shirts example, we will start with a representative document (Elasticsearch itself maintains metadata, e.g., a unique id, separately) as follows:
{
  "name": "tshirt",
  "size": "S",
  "color": "black",
  "fabric": "cotton",
  "price": 1000
}
Some observations:
- In addition to the four facet dimensions, we include a descriptive name property
- We store the price as an integer (1000 = $10.00); Elasticsearch is powerful enough that we do not have to store price ranges separately
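To make the integer price convention concrete, here is a small sketch of converting between a dollar amount and the stored value. The helper names to_cents and format_price are hypothetical, and the sketch assumes non-negative prices with at most two decimal places.

```python
from decimal import Decimal


def to_cents(dollars):
    """Convert a dollar amount given as a string (e.g. "10.00") to integer cents."""
    # Decimal avoids the rounding surprises of binary floating point.
    return int((Decimal(dollars) * 100).to_integral_value())


def format_price(cents):
    """Render non-negative integer cents as a dollar string for display."""
    return "${}.{:02d}".format(cents // 100, cents % 100)
```

Storing whole cents keeps the field an exact integer, so range comparisons on price never depend on floating-point representation.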
Create
Elasticsearch provides a fairly familiar RESTful API for Create, Read, Update, and Delete (CRUD) operations.
We begin by creating the index itself, essentially providing the index’s settings, including the mapping (the schema for its types). We issue the following HTTP request using a favorite API testing tool, e.g., Postman.
PUT: ENDPOINT/shirts
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "shirt": {
      "properties": {
        "name": {
          "type": "text"
        },
        "size": {
          "type": "keyword"
        },
        "color": {
          "type": "keyword"
        },
        "fabric": {
          "type": "keyword"
        },
        "price": {
          "type": "long"
        }
      }
    }
  }
}
We use the text type for name to enable full-text searches on it. We use the keyword type for the facet dimensions so that they are matched as exact values rather than analyzed for full-text search. As indicated above, the price is stored as a long type.
With the index created, we can create a shirt document (with a server-generated id) with:
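Hand-written JSON of this depth is easy to get subtly wrong (a missing comma is enough for the request to be rejected), so one option is to generate the body from a flat field-to-type table. The following is a sketch only; build_index_body and FIELD_TYPES are hypothetical names, and the output simply mirrors the request body used for the shirts index.

```python
import json

# Field name -> Elasticsearch field type, as used for the shirt type.
FIELD_TYPES = {
    "name": "text",
    "size": "keyword",
    "color": "keyword",
    "fabric": "keyword",
    "price": "long",
}


def build_index_body(doc_type, field_types, shards=1, replicas=1):
    """Build the settings and mapping body for an index with a single type."""
    properties = {field: {"type": kind} for field, kind in field_types.items()}
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {doc_type: {"properties": properties}},
    }


# Serialize to the JSON text that would be sent as the body of the PUT request.
index_body_json = json.dumps(build_index_body("shirt", FIELD_TYPES), indent=2)
```

Because json.dumps produces the text, the result is always syntactically valid JSON.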
POST: ENDPOINT/shirts/shirt
{
  "name": "tshirt",
  "size": "S",
  "color": "black",
  "fabric": "cotton",
  "price": 1000
}
To create shirts in bulk, we can use the bulk API endpoint, following the pattern of a first row indicating the index and type followed by a row with the data.
POST: ENDPOINT/shirts/_bulk
{ "index": { "_index": "shirts", "_type": "shirt" } }
{ "name": "tshirt", "size": "M", "color": "black", "fabric": "cotton", "price": 1000 }
{ "index": { "_index": "shirts", "_type": "shirt" } }
{ "name": "tshirt", "size": "L", "color": "black", "fabric": "cotton", "price": 1000 }
...
Observations:
- Each row is a complete JSON entry; it cannot span multiple lines
- Each row (including the last one) needs to be terminated with a newline
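Both rules are easy to satisfy when the bulk body is generated rather than typed. The sketch below builds the newline-delimited body from a list of documents; build_bulk_body is a hypothetical helper name, and the sample shirts are the ones from the request above.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Build a newline-delimited bulk body: one action row, then one data row."""
    rows = []
    for document in documents:
        # Action row: tells Elasticsearch where to index the next row.
        rows.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Data row: json.dumps without indent always yields a single line.
        rows.append(json.dumps(document))
    # Terminate every row, including the last one, with a newline.
    return "".join(row + "\n" for row in rows)


shirts = [
    {"name": "tshirt", "size": "M", "color": "black", "fabric": "cotton", "price": 1000},
    {"name": "tshirt", "size": "L", "color": "black", "fabric": "cotton", "price": 1000},
]
bulk_body = build_bulk_body("shirts", "shirt", shirts)
```

The resulting string is what would be sent as the body of the bulk POST request.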
Sidebar: Should I Use Elasticsearch as My Primary Database?
You may be thinking, as I was, that Elasticsearch is really a NoSQL database under the hood with some powerful indexing / searching features. As such, why not use it as a primary database?
The folks at Elasticsearch offer up their thoughts in an article.
Elasticsearch is commonly used in addition to another database. A database system with stronger focus on constraints, correctness and robustness, and on being readily and transactionally updatable, has the master record — which is then asynchronously pushed to Elasticsearch. (Or pulled, if you use one of Elasticsearch’s “rivers”.) Keeping things in sync is something we’ll cover in depth in a future article. Here at Found, we typically use PostgreSQL and ZooKeeper as keeper of truths, which we feed into Elasticsearch for awesome searching.
— Alex Brasetvik, Elasticsearch
At this point, my thinking is gravitating towards their advice of using something else as my primary database and cloning data to Elasticsearch for advanced search functionality.
As such, I will focus the remainder of this series specifically on search; you can read up on the remaining CRUD operations on your own.
Search
The simplest search is to retrieve all the documents.
GET: ENDPOINT/_search: Retrieves all the documents in the cluster.
GET: ENDPOINT/shirts/_search: Retrieves all the documents in the shirts index.
GET: ENDPOINT/shirts/shirt/_search: Retrieves all the documents of type shirt in the shirts index.
Instead of GET, one can POST JSON-formatted queries using these same URLs; for example, the following also retrieves all the documents of type shirt in the shirts index.
POST: ENDPOINT/shirts/shirt/_search
{
  "query": {
    "match_all": {}
  }
}
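The three search scopes differ only in how much of the path is filled in, which the following sketch makes explicit. The helper names search_path and match_all_query are hypothetical; the paths and the query body are the ones shown above.

```python
def search_path(index=None, doc_type=None):
    """Return the _search path for the cluster, an index, or a type in an index."""
    if doc_type and not index:
        raise ValueError("a type can only be searched within an index")
    segments = [segment for segment in (index, doc_type) if segment]
    segments.append("_search")
    return "/" + "/".join(segments)


def match_all_query():
    """Return the request body that matches every document in scope."""
    return {"query": {"match_all": {}}}
```

For example, search_path("shirts", "shirt") gives the path used in the POST request above, and match_all_query() gives its body.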
Next Steps
In the next article, Elasticsearch by Example: Part 3, we will explore search queries that are relevant to our faceted search problem.