Elasticsearch by Example: Part 3

John Tucker
Published in codeburst
3 min read · Sep 13, 2017


Using Elasticsearch’s search feature, we solve the facet search problem.

This article is part of a series, starting with Elasticsearch by Example: Part 1, exploring the Elasticsearch database / search engine.

More Search

Because our dimensions, e.g., size, are stored as keywords (as opposed to text), we query them with the term (exact-match) query rather than the match (full-text search) query. For example, to retrieve all the small shirts, we would use:

POST: ENDPOINT/shirts/shirt/_search

{
  "query": {
    "term": {
      "size": "S"
    }
  }
}

Because price is a number (long), we can use a range query. The bounds below suggest that price is stored in cents, so for shirts between $0 and $10 inclusive (0 to 1000) we can use:

POST: ENDPOINT/shirts/shirt/_search

{
  "query": {
    "range": {
      "price": {
        "gte": 0,
        "lte": 1000
      }
    }
  }
}
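
As a rough illustration, the same request body could be built from dollar amounts in Python. This is only a sketch: the helper name price_range_query is hypothetical, and it assumes price is stored as integer cents, which the 0 to 1000 bounds above suggest but do not prove.

```python
import json


def price_range_query(min_dollars, max_dollars):
    """Build a range query on price, inclusive at both ends.

    Assumption: price is stored as integer cents, so $10 becomes 1000.
    """
    return {
        "query": {
            "range": {
                "price": {
                    # round() guards against float artifacts such as 9.99 * 100
                    "gte": round(min_dollars * 100),
                    "lte": round(max_dollars * 100),
                }
            }
        }
    }


# Body to POST to ENDPOINT/shirts/shirt/_search
body = price_range_query(0, 10)
print(json.dumps(body, indent=2))
```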

For small black shirts between $0 and $10.00 inclusive we would use a bool query:

POST: ENDPOINT/shirts/shirt/_search

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "size": "S" } },
        { "term": { "color": "black" } },
        { "range": {
          "price": {
            "gte": 0,
            "lte": 1000
          }
        }}
      ]
    }
  }
}
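
On a real site the filter list would come from whatever facets the visitor has ticked. The following Python sketch shows one way that might look; the function name facet_filter_query and its parameters are hypothetical, and the price bounds are passed in the same units as the stored price field.

```python
import json


def facet_filter_query(terms, price_range=None):
    """Combine exact-match facet choices and an optional price range
    into a single bool query that uses only filter clauses.

    terms: dict of keyword field -> selected value, e.g. {"size": "S"}
    price_range: optional (low, high) tuple, inclusive at both ends
    """
    # One term clause per selected facet value
    filters = [{"term": {field: value}} for field, value in terms.items()]
    if price_range is not None:
        low, high = price_range
        filters.append({"range": {"price": {"gte": low, "lte": high}}})
    return {"query": {"bool": {"filter": filters}}}


# Small black shirts priced between 0 and 1000, as in the request above
body = facet_filter_query({"size": "S", "color": "black"}, price_range=(0, 1000))
print(json.dumps(body, indent=2))
```

Leaving out price_range, or passing fewer entries in terms, simply produces a shorter filter list.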

Observation: We use the filter clause, rather than the more common must clause, because we do not need document scoring. Filter clauses skip the scoring step and can be cached, so they perform better.

Aggregation

One common feature of websites using faceted searches is a count of the documents matching the facet choice.

note: Interestingly enough, Amazon does not provide this feature.

We can use the Elasticsearch terms aggregation feature to obtain these counts from the query results. Say we wanted to obtain all of the sizes and colors (including counts) of cotton shirts.

POST: ENDPOINT/shirts/shirt/_search

{
  "query": {
    "term": {
      "fabric": "cotton"
    }
  },
  "aggs": {
    "size": {
      "terms": {
        "field": "size"
      }
    },
    "color": {
      "terms": {
        "field": "color"
      }
    }
  }
}

The response then includes an aggregations section below the query results, looking like:

...
"aggregations": {
  "color": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "black",
        "doc_count": 6
      },
      {
        "key": "red",
        "doc_count": 3
      }
    ]
  },
  "size": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "L",
        "doc_count": 3
      },
      {
        "key": "M",
        "doc_count": 3
      },
      {
        "key": "S",
        "doc_count": 3
      }
    ]
  }
}
...
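
To show how a front end might turn this section into facet labels with counts, here is a small Python sketch. The response dict is a hand-copied version of the aggregations section above (not a live call), and facet_counts is a hypothetical helper name.

```python
# Hand-copied from the aggregations section of the response shown above
response = {
    "aggregations": {
        "color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "black", "doc_count": 6},
                {"key": "red", "doc_count": 3},
            ],
        },
        "size": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "L", "doc_count": 3},
                {"key": "M", "doc_count": 3},
                {"key": "S", "doc_count": 3},
            ],
        },
    }
}


def facet_counts(response):
    """Flatten each terms aggregation into a {value: count} dict per facet."""
    return {
        facet: {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
        for facet, agg in response["aggregations"].items()
    }


counts = facet_counts(response)
print(counts)
# {'color': {'black': 6, 'red': 3}, 'size': {'L': 3, 'M': 3, 'S': 3}}
```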

Looks Like We Have Solved the Problem

Looking at these example queries, we have solved the data problem underlying our facet-search-enabled shirt website, with the added bonus of the facet choice counts.

With our over-simplified example, however, we are missing problems that will occur as we add additional complexity.

Large Number of Facets

In our example, we only had four facets: size, color, fabric, and price. What happens if we have many more (say 30)? First, we would have to add them to the index mapping. Then our aggregation clauses would need to include all of the facets.

Not too bad, but a bit unwieldy.
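
One way to keep this manageable is to generate both pieces from a single list of facet names. The Python sketch below assumes every listed facet is a keyword field (price, being a long, would be handled separately); FACETS, keyword_properties, and terms_aggs are hypothetical names, and the mapping fragment covers only the properties section.

```python
import json

# Keyword facets from the example; a larger catalog would extend this list
FACETS = ["size", "color", "fabric"]


def keyword_properties(facets):
    """Build the properties fragment of a mapping, one keyword field per facet."""
    return {"properties": {name: {"type": "keyword"} for name in facets}}


def terms_aggs(facets):
    """Build one terms aggregation per facet, each named after its field."""
    return {name: {"terms": {"field": name}} for name in facets}


print(json.dumps(keyword_properties(FACETS), indent=2))
print(json.dumps({"aggs": terms_aggs(FACETS)}, indent=2))
```

The list still has to be maintained by hand, so this only reduces the typing, not the underlying coupling between facets and mapping.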

Other Indices

Another complexity arises if we introduce additional things, other than shirts, to our problem. In particular, say we were adding televisions, which have their own facets: size, resolution, refresh rate, etc.

Because the index mapping is different (due to the different facets), we would use a separate index (and type) to store the television data.

Likewise, we would have to use aggregation queries that were specific to televisions.

Obviously, this will become tedious as we need to add more things (with their own facets).

Next Steps

In the next article, Elasticsearch by Example: Part 4, we explore an alternative way to store and query data that will avoid the problems we observed here.
