legendkeeper

Technical Notes #1: Autocomplete

Added 2018-09-24 05:42:17 +0000 UTC

Hey y'all, this update is more technical in nature. I'm coding while simultaneously working on the intro video, so there's no visually striking update for you yet, but I figured I'd make a technical post in the meantime.

This post will dive deep into features of a search engine called Elasticsearch, so if you don't care about that, you might be bored out of your mind! My next public post will have some pretty striking content.

One of the core tenets of LegendKeeper is keeping (lel) your information easily accessible at all times. Functionally, that means that the app has a search bar that talks to an index of everything in your world encyclopedia. (I also do a search when auto-completing @-mentions within the wiki editor, but we'll leave that out for now.) It turns out that making autocomplete both accurate and usable is a tough problem, but ultimately solvable.

On one hand, when a user types a partial string into the search bar, you could query your database for partial name matches. This works, but is pretty inflexible. Depending on your database stack and your ORM, you may run into issues with case sensitivity and lack of fuzzy matches. LegendKeeper is powered by PostgreSQL, so I could always write a custom SQL query, but the ORM I use is pretty good about being database-agnostic, so I'd like to stay in that paradigm. I also don't want to query the database every time someone types.
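To make the database approach concrete, here's a rough sketch of a partial name match. It uses Python's built-in SQLite as a stand-in rather than PostgreSQL or my ORM, and the `entries` table and `name` column are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the real database; the table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT)")
conn.executemany(
    "INSERT INTO entries (name) VALUES (?)",
    [("Amulet of Fireballs",), ("Amulet of Frost",), ("Fire Giant",)],
)


def partial_name_matches(fragment):
    """Return entry names containing `fragment`, ignoring case."""
    # Lower-casing both sides avoids relying on how a given database
    # treats case inside LIKE.
    pattern = f"%{fragment.lower()}%"
    rows = conn.execute(
        "SELECT name FROM entries WHERE lower(name) LIKE ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]


print(partial_name_matches("amul"))    # finds both amulets
print(partial_name_matches("amulte"))  # a typo finds nothing
```

The second call is the "lack of fuzzy matches" problem in miniature: one transposed letter and the results are gone.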

I wanted a search solution that's extensible and will be able to support any kind of search-dreams I come up with in the future, so I ended up going with Elasticsearch to solve search for LK. I've used it professionally, which let me come up with a working solution fairly quickly.

Elasticsearch out of the box actually isn't that great at "search-as-you-type". It requires a lot of setup, and took some trial and error to get search results that made sense. For example, with simple prefix queries in Elasticsearch, if you're trying to find "Amulet of Fireballs", and search "Fireballs", Amulet of Fireballs wouldn't be in your results. Fireballs isn't a prefix of Amulet of Fireballs, so ES is technically correct here. That's a poor user experience though, and we can do better!
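Here's a small sketch of why that happens. It assumes the name is indexed as a single untokenized term (a keyword-style field), and `name.keyword` is a hypothetical field name. The first function just mimics the matching rule in plain Python; the second builds the request body for Elasticsearch's `prefix` query.

```python
def prefix_matches(stored_name, typed):
    """Mimic a prefix query against a name stored as one whole term."""
    return stored_name.lower().startswith(typed.lower())


def prefix_query(typed):
    """Request body for a prefix query; 'name.keyword' is a made-up field."""
    return {"query": {"prefix": {"name.keyword": {"value": typed}}}}


print(prefix_matches("Amulet of Fireballs", "Amul"))       # matches
print(prefix_matches("Amulet of Fireballs", "Fireballs"))  # does not match
print(prefix_query("Fireballs"))
```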

ES has a suggestions system engineered specifically for the purpose of autocomplete. You can mark a field in an Elasticsearch document as type "completion", which allows you to add autocomplete suggestions for entries when you index them. For example, if I were indexing "Amulet of Fireballs", I could simultaneously add variations of the phrase as suggestions, e.g. ["amulet", "fireballs amulet", "fire amulet"]. This worked pretty well, but didn't work great for partial strings, like "amul". Additionally, I don't want every possible suggestion to appear to the user: the search index is shared across users, and I wouldn't want you seeing results for someone else's entries! Elasticsearch suggestions don't work with other queries or filters. They do have a feature called contexts that allows you to scope suggestions by custom properties, but I found it cumbersome to use. Back to the drawing board.
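For the curious, this is roughly what the completion-plus-contexts setup looks like, written as plain Python dicts. It's a sketch, not my actual config: the field names (`suggest`, `world_id`) and the suggestion name `entry_suggest` are invented for the example, and the mapping shape follows recent Elasticsearch versions (older ones nest the properties under a type name).

```python
# Mapping: a completion field whose suggestions are scoped by a category context.
MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "suggest": {
                "type": "completion",
                "contexts": [{"name": "world_id", "type": "category"}],
            },
        }
    }
}


def suggestion_document(name, inputs, world_id):
    """Document body that carries hand-picked suggestions for one entry."""
    return {
        "name": name,
        "suggest": {"input": inputs, "contexts": {"world_id": [world_id]}},
    }


def suggest_request(typed, world_id):
    """Suggest request limited to a single world via the context."""
    return {
        "suggest": {
            "entry_suggest": {
                "prefix": typed,
                "completion": {
                    "field": "suggest",
                    "contexts": {"world_id": [world_id]},
                },
            }
        }
    }


doc = suggestion_document(
    "Amulet of Fireballs",
    ["amulet", "fireballs amulet", "fire amulet"],
    "world-123",
)
request = suggest_request("amu", "world-123")
```

Every document and every request has to carry the context, which is a good chunk of why it felt cumbersome.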

ES offers a number of text analyzers for the purposes of indexing documents, one of which is the Edge N-Gram Analyzer. This takes a word or phrase and chunks it into a bunch of pieces, called n-grams. An n-gram is a contiguous sequence of n items from a given sample of text or speech; an edge n-gram is one that starts at the beginning of a word.
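As a rough idea of the setup involved, here's an index definition with an edge n-gram tokenizer, again as a Python dict. The names `autocomplete` and `autocomplete_tokenizer` and the gram sizes are assumptions for illustration, not the values LegendKeeper uses.

```python
# Index settings and mapping for an edge n-gram analyzed "name" field.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    # Only letters and digits count as part of a word, so
                    # whitespace splits the text into separate words.
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "autocomplete"},
        }
    },
}
```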

If I used this analyzer when indexing "Amulet of Fireballs", for example, the item would be indexed as ["A", "Am", "Amu", ..., "F", "Fi", "Fir", ...]. That's a lot of tokens, but it does give some pretty great coverage on all the possible ways a user could type in the name. This was looking good, until I typed in the whole phrase, "Amulet of Fireballs", in the search bar: no matches. What! Well, depending on your n-gram size, it's entirely possible that "Amulet of Fireballs" isn't an n-gram, as it's bigger than the max n-gram size you set. I could just increase the n-gram size, but the n-gram analyzer also didn't treat whitespace the way I expected: even if I told the analyzer to consider white space as part of the word, search results would disappear when typing spaces after words.
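You can see the problem without Elasticsearch at all. This is a plain-Python imitation of edge n-gram tokenizing, not the real analyzer: it splits on whitespace, lowercases, and treats whatever the user typed as one exact term, which is a simplification. The max size of 5 is picked just to make the gap obvious.

```python
def edge_ngrams(text, min_gram=1, max_gram=5):
    """Emit the leading chunks of each whitespace-separated word, lowercased."""
    tokens = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size])
    return tokens


tokens = edge_ngrams("Amulet of Fireballs")
print(tokens)

# Partial words are covered...
print("amul" in tokens)
# ...but anything longer than max_gram is not, and neither is the full phrase.
print("fireballs" in tokens)
print("amulet of fireballs" in tokens)
```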

Luckily, search queries utilizing n-gram analysis are just normal queries, so I could bundle them with any number of other query types. By combining the n-gram search, a prefix search, a whole phrase search, and a keyword search into one query and boosting results that met more and more of the requirements, I ended up with search results that were much more intuitive.
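A combined query along those lines might look something like this. Treat it as a sketch under assumptions: the field names (`name`, `name.edge`, `name.keyword`, `world_id`) and the boost values are invented, I'm reading "keyword search" as an exact `term` match, and `name.keyword` is assumed to store lowercased names.

```python
def build_search_query(typed, world_id):
    """Combine four ways of matching a name so their scores add up."""
    should = [
        # Edge n-gram match: catches partial words like "amul".
        {"match": {"name.edge": {"query": typed, "boost": 1.0}}},
        # Prefix match against the whole name.
        {"prefix": {"name.keyword": {"value": typed.lower(), "boost": 2.0}}},
        # Whole-phrase match: the words in order.
        {"match_phrase": {"name": {"query": typed, "boost": 3.0}}},
        # Exact match on the full name.
        {"term": {"name.keyword": {"value": typed.lower(), "boost": 4.0}}},
    ]
    return {
        "query": {
            "bool": {
                # The filter keeps results inside one user's world and
                # does not affect scoring.
                "filter": [{"term": {"world_id": world_id}}],
                "should": should,
                "minimum_should_match": 1,
            }
        }
    }


query = build_search_query("Amulet of Fireballs", "world-123")
```

Because the scores of matching `should` clauses are summed, an entry that satisfies more of them floats higher, and the `filter` clause handles the "don't show me someone else's entries" problem that suggestions made awkward.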