Elasticsearch by Example: Part 1
I started an e-commerce-like project and found myself falling (still falling) down the rabbit hole of Elasticsearch. I thought I would share my experience through an example.

The Problem
The crux of the problem is best described by Amazon's search filters, the kind you would use to narrow your choices when buying a shirt.

Apparently there is a more elegant term for this: faceted search.
Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters. A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order.
— Wikipedia
For the purposes of this article, the simplified problem is building a website for buying shirts that can be filtered by size, color, fabric, and price, while leaving room for many other dimensions. Also, assume there are many thousands of shirts, so brute-force solutions that check every shirt on each query are noticeably slow.
To simplify things a bit, I treated price filtering as a series of ranges, e.g., under $25, $25 to $50, etc.
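To make the simplified problem concrete, here is a minimal Python sketch of the brute-force approach: each shirt is a plain dict, exact prices are mapped to range labels, and every query scans the whole list. The bucket boundaries above $50 and the helper names (`price_range`, `brute_force_filter`) are hypothetical choices for illustration.

```python
# Hypothetical bucket boundaries: (exclusive upper bound in dollars, label).
PRICE_BUCKETS = [(25, "under-25"), (50, "25-50"), (75, "50-75")]


def price_range(price: float) -> str:
    """Map an exact price to the label of its price-range bucket."""
    for upper, label in PRICE_BUCKETS:
        if price < upper:
            return label
    return "75-plus"


def brute_force_filter(shirts: list[dict], **criteria: str) -> list[dict]:
    """Return shirts matching every criterion by checking each shirt in turn (O(n) per query)."""
    return [s for s in shirts if all(s.get(dim) == value for dim, value in criteria.items())]


shirts = [
    {"id": 1, "size": "S", "color": "black", "fabric": "cotton", "price_range": price_range(30.0)},
    {"id": 2, "size": "S", "color": "white", "fabric": "linen", "price_range": price_range(19.5)},
    {"id": 3, "size": "M", "color": "black", "fabric": "cotton", "price_range": price_range(60.0)},
]

matches = brute_force_filter(shirts, size="S", color="black")
print([s["id"] for s in matches])  # [1]
```

With many thousands of shirts, every filter combination pays for a full pass over the list, which is what the index-based approaches below try to avoid.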
Using a Hammer to Drive a Screw
Let me explain my initial thoughts on how I was going to store and query the data for this problem.
Knowing a bit about SQL databases, I started thinking about how I could use them to solve the problem.
I started with a single table of shirts whose columns (among others) hold the size, color, fabric, and price range. Because there is a lot of data, I knew we would need indices to speed up the queries.
The simple approach of creating four indices, one for each dimension, works if users can filter on only a single dimension per query. If they can filter on two or more dimensions, e.g., small black shirts, this approach breaks down. In that scenario, the best a typical query plan can do is use the size index to quickly find all the small shirts and then check them one by one to see whether they are black (or vice versa).
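The single-index limitation can be observed in SQLite, used here only because it ships with Python. This is a minimal sketch assuming a hypothetical `shirts` table with one index per dimension (named `idx_<column>`): `EXPLAIN QUERY PLAN` for a size-and-color query reports one search through a single index, with the other condition checked row by row. Other databases differ; PostgreSQL's bitmap scans and MySQL's index merge, for example, can combine several single-column indices in some cases.

```python
import sqlite3

DIMENSIONS = ("size", "color", "fabric", "price_range")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shirts (id INTEGER PRIMARY KEY, size TEXT, color TEXT, "
    "fabric TEXT, price_range TEXT)"
)
# One single-column index per dimension (hypothetical names idx_<column>).
for dim in DIMENSIONS:
    conn.execute(f"CREATE INDEX idx_{dim} ON shirts ({dim})")
conn.executemany(
    "INSERT INTO shirts (size, color, fabric, price_range) VALUES (?, ?, ?, ?)",
    [
        ("S", "black", "cotton", "25-50"),
        ("S", "white", "linen", "under-25"),
        ("M", "black", "cotton", "50-75"),
    ],
)

sql = "SELECT id FROM shirts WHERE size = ? AND color = ?"
params = ("S", "black")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
for step in plan:
    print(step)  # a single SEARCH step naming just one idx_* index

print(conn.execute(sql, params).fetchall())  # [(1,)]
```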
A better solution is to create combination indices (also called composite indices) that cover several dimensions at once. Because column order matters, the combination index:
(size, color, fabric, price-range)
is only fully usable for queries on the following combinations of dimensions:
- size
- size + color
- size + color + fabric
- size + color + fabric + price-range
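SQLite can illustrate this prefix rule too. A minimal sketch, assuming a hypothetical `shirts` table with a single combination index named `idx_combo`: a query on the leading columns produces a `SEARCH` step, while a query that skips `size` produces a `SCAN` step, meaning every entry is visited. SQLite's skip-scan optimization, which can sometimes skip a leading column, is only considered after `ANALYZE` has gathered statistics, so it does not apply here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shirts (id INTEGER PRIMARY KEY, size TEXT, color TEXT, "
    "fabric TEXT, price_range TEXT)"
)
# One combination index, in the order size, color, fabric, price_range.
conn.execute("CREATE INDEX idx_combo ON shirts (size, color, fabric, price_range)")


def plan_for(where: str) -> str:
    """Return SQLite's plan description for SELECT id FROM shirts WHERE <where>."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN SELECT id FROM shirts WHERE {where}")
    return " / ".join(row[-1] for row in rows)


# Uses the leading columns (size, then color): the index is searched directly.
print(plan_for("size = 'S' AND color = 'black'"))
# Skips the leading column: SQLite falls back to scanning every entry.
print(plan_for("color = 'black' AND fabric = 'cotton'"))
```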
Thinking about this a bit, I determined that I needed a lot of combination indices to allow queries across any number of dimensions: naively one per ordering (4! = 24), and no fewer than six even with carefully chosen orderings. This gets much worse as the number of dimensions grows.
Yuck!
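A short script can check this counting under a simplified model, in which an index fully serves an equality query when the queried dimensions are exactly its leading columns, in any order. It counts all orderings and verifies that six hand-picked orderings (a hypothetical choice) already cover all 15 combinations of the four dimensions. Six is also the minimum, because each index has only one two-column prefix and there are six pairs of dimensions.

```python
from itertools import combinations, permutations
from math import comb, factorial

DIMENSIONS = ("size", "color", "fabric", "price_range")


def all_queries(dims):
    """Every non-empty combination of dimensions a shopper might filter on."""
    return {frozenset(c) for k in range(1, len(dims) + 1) for c in combinations(dims, k)}


def covered_queries(index_orders):
    """Combinations fully served by the leading columns of some index."""
    return {frozenset(order[:k]) for order in index_orders for k in range(1, len(order) + 1)}


# Naive approach: one combination index per ordering of the dimensions.
naive = list(permutations(DIMENSIONS))
print(len(naive), covered_queries(naive) == all_queries(DIMENSIONS))  # 24 True

# Hypothetical hand-picked orderings: each pair of dimensions is a prefix exactly once.
chosen = [
    ("size", "color", "fabric", "price_range"),
    ("color", "fabric", "price_range", "size"),
    ("fabric", "size", "price_range", "color"),
    ("price_range", "size", "color", "fabric"),
    ("color", "price_range", "size", "fabric"),
    ("fabric", "price_range", "color", "size"),
]
print(len(chosen), covered_queries(chosen) == all_queries(DIMENSIONS))  # 6 True

# Growth: naive count n! versus the lower bound comb(n, n // 2)
# (each index has exactly one prefix of length n // 2).
for n in range(2, 9):
    print(n, factorial(n), comb(n, n // 2))
```

Even the lower bound grows roughly exponentially with the number of dimensions, so careful ordering only postpones the problem.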
Breaking out a Saw
Reading up on “typical” NoSQL databases (for example, Firebase), I concluded that I could do no better with them. Firebase only supports indices on single “columns” of data. The way to simulate combination indices is to create additional indexed “columns”, one per combination index, each holding the merged, delimited values of the underlying columns, e.g.,
S|black|cotton|25-50
The additional headache, as described in another article, is that you have to keep these merged columns in sync yourself every time a shirt changes.
Double-yuck!
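The bookkeeping behind merged columns can be sketched in Python. This is a minimal sketch with plain dicts, not the Firebase API: `merged_columns` and `matches` are hypothetical helpers, only two combination columns are listed, and the scheme assumes no value ever contains the `|` delimiter.

```python
# Hypothetical: one merged column per combination index (only two shown).
INDEX_ORDERS = [
    ("size", "color", "fabric", "price_range"),
    ("color", "fabric", "price_range", "size"),
]


def column_name(order):
    return "|".join(order)


def merged_columns(shirt):
    """Compute every merged, delimited column for one shirt.

    Must be re-run on every write, or the merged columns drift out of sync.
    Assumes no value contains the '|' delimiter.
    """
    return {column_name(o): "|".join(shirt[d] for d in o) for o in INDEX_ORDERS}


def matches(record, order, values):
    """Equality filter on the leading dimensions of `order`, via its merged column."""
    key = "|".join(values)
    merged = record[column_name(order)]
    return merged == key or merged.startswith(key + "|")


shirt = {"size": "S", "color": "black", "fabric": "cotton", "price_range": "25-50"}
record = {**shirt, **merged_columns(shirt)}
print(record["size|color|fabric|price_range"])  # S|black|cotton|25-50
print(matches(record, INDEX_ORDERS[0], ["S", "black"]))  # True

# Changing one attribute means recomputing all merged columns by hand.
shirt["color"] = "white"
record = {**shirt, **merged_columns(shirt)}
print(record["color|fabric|price_range|size"])  # white|cotton|25-50|S
```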
Googled the Answer
After a bit of Googling, I found an interesting article, On-Site Search Design Patterns for E-Commerce: Schema Structure, Data Driven Ranking & More, that specifically addresses faceted search, among other topics. The article presents a number of relevant design patterns using the Elasticsearch database / search engine.
Unfortunately, I was new to Elasticsearch and found its example overly complicated.
Next Steps
Much of the rest of this series will explain and apply the design patterns from that article to the shirt problem set up here.
We will start with the basics of Elasticsearch in the next article, Elasticsearch by Example: Part 2.