codeburst

Bursts of code to power through your day. Web Development articles, tutorials, and news.

Elasticsearch — Search in your local language

Roman Orac
Published in codeburst
5 min read · Apr 20, 2020

How to implement efficient keyword search in Python for languages like Russian, Polish, French, Hungarian, and more.

Photo by Gianfranco Lanzio on Unsplash

A while ago, we were developing an application for searching keywords in documents. One of the problems we faced was making search efficient for the Slovene language.

Searching Slovene documents is more demanding than searching English ones, because English is a morphologically relatively simple language. Slovene has some features that are uncommon in most other languages: grammatical cases and the dual grammatical number. For example, in English the word “dogs” is the plural of “dog”, whereas in Slovene we have pes (a dog), psa (two dogs), psi (dogs).
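
To make the problem concrete, here is a minimal sketch of what lemmatization buys a keyword search. The lemma table is hand-written for this illustration and covers only the three forms mentioned above; it is not how LemmaGen or Elasticsearch works internally, and the function names are hypothetical.

```python
# Toy lemma table, hand-written for illustration only (not produced by LemmaGen).
LEMMAS = {
    "pes": "pes",  # a dog (singular)
    "psa": "pes",  # two dogs (dual)
    "psi": "pes",  # dogs (plural)
}


def lemmatize(token: str) -> str:
    """Return the base form of a token, or the lowercased token if it is unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def matches(query: str, document: str) -> bool:
    """Return True if any whitespace-separated document token shares a lemma with the query."""
    query_lemma = lemmatize(query)
    return any(lemmatize(tok) == query_lemma for tok in document.split())
```

With an exact string comparison, a query for pes would miss a document that only contains psa or psi. Comparing lemmas instead lets all three forms match the same query.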

Solution

To make search efficient, we chose Elasticsearch, as we had positive experiences with it in the past. But Elasticsearch doesn’t offer a Slovene lemmatizer out of the box. Luckily for us, there is a great plugin, LemmaGen, that solves this shortcoming. The same procedure described below also works for:

  • Bulgarian,
  • Czech,
  • Estonian,
  • French,
  • Hungarian,
  • Macedonian,
  • Persian,
  • Polish,
  • Romanian,
  • Russian,
  • Slovak,
  • Slovene,
  • Serbian,
  • Ukrainian.
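
As a rough sketch of how such a plugin is typically wired into an index, the function below builds an index definition with a custom analyzer that runs a lemmatizing token filter. Several things here are assumptions rather than details from this article: the filter type name "lemmagen", its "lexicon" parameter, and the language code "sl" are what I recall from the plugin's README and should be checked against the installed version; the analyzer name and the "content" field are hypothetical.

```python
import json


def build_index_settings(lexicon: str = "sl") -> dict:
    """Build an index definition that lemmatizes text with a LemmaGen token filter.

    Assumption: the plugin registers a token filter of type "lemmagen" that takes
    a "lexicon" language code. Verify both against the plugin version you install.
    """
    filter_name = f"lemmagen_{lexicon}"          # hypothetical filter name
    analyzer_name = f"{filter_name}_analyzer"    # hypothetical analyzer name
    return {
        "settings": {
            "analysis": {
                "filter": {
                    filter_name: {"type": "lemmagen", "lexicon": lexicon},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so lookups do not depend on letter case.
                        "filter": ["lowercase", filter_name],
                    },
                },
            }
        },
        "mappings": {
            "properties": {
                # "content" is a placeholder field name for the document text.
                "content": {"type": "text", "analyzer": analyzer_name},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings("sl"), indent=2))
```

The resulting dictionary can be sent as the body of a create-index request. Supporting another language from the list would then only mean passing a different lexicon code.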
