In order to improve content discoverability, display contextual suggestions, recommend related articles and deflect support tickets, we rely on an algorithm that takes into account a number of factors. Below is a short summary of what's involved.

  • Different fields are weighted differently: If the search words match in an article's keywords, title and body, those matches are ranked in that order, with keywords weighted highest. This is particularly useful because you can influence the answers delivered by tagging articles with the right keywords, and by making sure the title summarizes the article well so the user is more likely to click it.
  • Number of matching words: Articles that contain the most matching contextual words will appear before articles that contain only one or two. For instance, if someone searches for the phrase heavy metal, articles that contain both words will rank higher than articles that only contain heavy or metal.
  • Word frequency: The more frequently a matching word appears in an article, the higher that article is ranked.
  • Inverse document frequency: The more common a word is across your knowledge base, the less weight it is given when ranking articles. This is because words that appear very frequently (e.g. the, is, of, a) usually carry very little information and are therefore less relevant. As an example, if most articles in your knowledge base contain the word heavy but very few articles have the word metal, heavy is deemed less important than metal.
  • Synonyms (English only): When you publish an article, we generate a synonym set based on relevant key phrases that appear in it to improve its discoverability. As an example, if someone searches for customer but most articles use the word user instead, search will still surface those articles. Because synonyms are context-specific, we give such matches less weight when ranking the articles.
  • Stemming and lemmatization: When you publish an article, we use stemming and lemmatization to reduce words to their root form. For example, the word metallic is reduced to metal so the article can be surfaced when someone searches for metal.
  • Fuzzy matches: To handle typos and variations in spelling (think British vs American English), we allow the algorithm to be a bit more flexible when surfacing relevant results, up to a maximum of two edits. If someone gets a bit excited and searches for metalll, we will still return articles containing the word metal, but not if they search for mettallllll.
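
To make a few of the factors above concrete, here is a minimal Python sketch of field weighting, word frequency, inverse document frequency and the two-edit fuzzy limit. It is an illustration only, not the actual ranking algorithm: the field weights, the smoothed IDF formula, the sample articles and every function name are assumptions made for this example, and synonyms and stemming are left out.

```python
import math
from collections import Counter

# Hypothetical weights: keywords > title > body, mirroring the order above.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "body": 1.0}
MAX_EDITS = 2  # fuzzy-match limit described above

# Made-up sample knowledge base: "heavy" is common, "metal" is rare.
ARTICLES = [
    {"keywords": "music", "title": "Heavy metal playlists",
     "body": "heavy metal heavy riffs"},
    {"keywords": "shipping", "title": "Heavy parcels",
     "body": "heavy boxes cost more"},
    {"keywords": "billing", "title": "Heavy usage fees",
     "body": "heavy usage is billed monthly"},
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_match(query_word, indexed_word):
    """True when the two words are within the allowed number of edits."""
    return edit_distance(query_word, indexed_word) <= MAX_EDITS


def inverse_document_frequency(term, articles):
    """Assumed smoothed IDF: rarer terms get a larger, always positive weight."""
    containing = sum(
        1 for article in articles
        if any(term in tokenize(article[field]) for field in FIELD_WEIGHTS)
    )
    return math.log((1 + len(articles)) / (1 + containing)) + 1.0


def score(query, article, articles):
    """Sum of field weight x term count x IDF over all query terms."""
    total = 0.0
    for term in tokenize(query):
        idf = inverse_document_frequency(term, articles)
        for field, weight in FIELD_WEIGHTS.items():
            counts = Counter(tokenize(article[field]))
            total += weight * counts[term] * idf
    return total


def rank(query, articles):
    """Return the articles sorted from most to least relevant."""
    return sorted(articles, key=lambda a: score(query, a, articles), reverse=True)


if __name__ == "__main__":
    for article in rank("heavy metal", ARTICLES):
        print(round(score("heavy metal", article, ARTICLES), 2), article["title"])
```

In this toy data set, the article that contains both heavy and metal ranks first, and metal contributes more per occurrence than heavy because it appears in fewer articles. A real implementation would also apply the fuzzy check while matching terms, rather than as a separate helper.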