by Artificial Intelligence
Searzh is a cutting-edge, extremely fast, fault-tolerant
search system that tolerates even severe spelling errors.
It implements a unique, user-friendly and very robust algorithm that ranks results by relevance.
Using machine learning, it can adapt to patterns such as user behaviour or common misspellings.
Searzh is an excellent tool for searching databases of personal names,
artist names, article titles and, in general, all proper nouns.
Give it a try with the MusicBrainz database
(which contains more than a million artists) and see how it works.
Enter your text in the text box to get a relevance-ranked search.
Relevance-ranking:
Misspellings are the obvious reason for fault tolerance, but a misspelling is just one type of word variation among several.
For example, with proper nouns, and in particular personal names, the order of the names may differ, a name may be missing, or it may be abbreviated or replaced by a nickname.
Names may be split differently, titles may be present, or names may be truncated because of a short input field.
The non-standardised use of delimiters such as ";", "," and "-" is also a problem.
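The ordering and delimiter problems can be illustrated with a minimal Python sketch. It assumes a simple normalisation (lower-casing, then splitting on the delimiters mentioned above and on whitespace); the function names are hypothetical and this is not how Searzh tokenises names. Comparing sorted token lists makes "Lennon, John" and "john lennon" equivalent, but it does nothing for missing names, nicknames or typos.

```python
import re

# Hypothetical normaliser: splits on ";", ",", "-" and whitespace.
# Splitting on "-" deliberately turns "Jean-Michel" into two tokens.
_DELIMITERS = re.compile(r"[;,\-\s]+")


def name_tokens(name: str) -> list[str]:
    """Lower-case a name and split it on common delimiters, dropping empty pieces."""
    return [token for token in _DELIMITERS.split(name.lower()) if token]


def same_tokens(a: str, b: str) -> bool:
    """True if both names consist of the same tokens, in any order."""
    return sorted(name_tokens(a)) == sorted(name_tokens(b))


print(name_tokens("Lennon, John"))                              # ['lennon', 'john']
print(same_tokens("Lennon, John", "john lennon"))               # True
print(same_tokens("Jean-Michel Jarre", "Jarre; Jean Michel"))   # True
```

Treating the hyphen as a delimiter is a design choice: it unifies "Jean-Michel" and "Jean Michel", at the cost of losing the information that the two parts belonged together.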
In addition, names may be transliterated, in other words
converted from a non-Latin script (such as Arabic) to the Latin
alphabet. Unfortunately, this process is poorly standardised.
For example, a single Arabic name may have 10 to 50 Latin variants.
Considering that a full Arabic name usually consists of four parts,
there may be up to millions of possible Latin spellings corresponding to
a single name in the original script.
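As a rough check of that arithmetic, assume each of the four parts independently has between 10 and 50 Latin spellings (the range quoted above); the number of full-name combinations is then the product of the per-part counts. The helper name is hypothetical, and real variants are not fully independent, so this is an upper-bound style estimate.

```python
from math import prod


def spelling_combinations(variants_per_part: list[int]) -> int:
    """Distinct Latin spellings of a full name if every part varies independently."""
    return prod(variants_per_part)


low = spelling_combinations([10, 10, 10, 10])   # 10 variants for each of 4 parts
high = spelling_combinations([50, 50, 50, 50])  # 50 variants for each of 4 parts
print(f"{low:,} to {high:,}")  # 10,000 to 6,250,000
```

So a four-part name gives 10^4 = 10,000 combinations at the low end and 50^4 = 6,250,000 at the high end; the "millions" figure applies toward the upper end of the range.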
Conventional fault tolerance with the Levenshtein metric (also known as "edit distance", and the basis of many
"fuzzy search" implementations) sorts the resulting hits by the number of spelling errors, i.e. the minimum number of single-character insertions, deletions and substitutions.
This may seem like a perfect metric. In general, however, it does not behave well for large variations.
For example, an artist's name given in the wrong order is likely to produce a mismatch.
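For reference, here is a minimal sketch of the classic dynamic-programming Levenshtein distance in Python. It is a textbook version, not Searzh's implementation. It also reproduces the reordering problem: a name with two missing letters stays close to the original, while the same name with its parts swapped is further away, even though it contains exactly the same words.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; rows get the length of the shorter string
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("john lennon", "jon lenon"))    # 2: two deleted letters
print(levenshtein("john lennon", "lennon john"))  # larger, although the words are identical
```

Each cell depends only on the previous row, so the sketch keeps two rows instead of the full table, using memory proportional to the shorter string. The running time is still proportional to the product of the two lengths for every comparison.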
Give it a try in the text box below. Make some challenging searches and compare the two algorithms.
Levenshtein:
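Levenshtein strongly favours artist names with the same text length as the query string,
because the distance can never be smaller than the difference in length. As a result, the
algorithm cannot provide a proper autocomplete function: a partially typed name is far from the longer full name it is the start of.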
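The length bias is easy to reproduce with the same textbook Levenshtein function. In the sketch below, the three-artist catalogue is hypothetical and the ranking simply sorts by edit distance. The partial input "beat" is 7 edits from "The Beatles" (the 7 missing characters) but only 1 or 2 edits from short, unrelated names, so the intended completion ends up last.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def rank_by_levenshtein(query: str, names: list[str]) -> list[str]:
    """Sort candidate names by edit distance to the query (ties keep input order)."""
    return sorted(names, key=lambda name: levenshtein(query.lower(), name.lower()))


artists = ["The Beatles", "Beck", "Bent"]  # hypothetical mini-catalogue
print(rank_by_levenshtein("beat", artists))  # ['Bent', 'Beck', 'The Beatles']
```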
Another problem with Levenshtein is the high CPU load and slow response time.
It can be made faster by limiting the number of spelling errors
to one or two. Unfortunately, this requires the user to
enter almost all of the characters correctly before a relevant
response is possible. Even then, the results can be a nuisance,
because the user gets many irrelevant hits. Such
products are widespread, and they tend to put
fault tolerance in a bad light.
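Limiting the number of errors is commonly implemented as a bounded check that gives up as soon as an error budget k is exceeded. The sketch below is one standard way to do this, assumed here for illustration: only a diagonal band of width 2k + 1 of the edit-distance table is computed, and the scan stops early once every cell in a row exceeds k. The function name is hypothetical. For readability it still allocates full-width rows, so only the comparisons, not the memory, are restricted to the band.

```python
def within_distance(a: str, b: str, k: int) -> bool:
    """Return True if the Levenshtein distance of a and b is at most k.

    Only cells with |i - j| <= k are computed; any cell outside that band
    needs at least |i - j| > k edits, so it is represented by the sentinel k + 1.
    """
    if abs(len(a) - len(b)) > k:
        return False  # the length difference alone exceeds the error budget
    too_far = k + 1  # sentinel meaning "more than k edits"
    # Row 0: turning "" into b[:j] costs j insertions.
    previous = [j if j <= k else too_far for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [too_far] * (len(b) + 1)
        if i <= k:
            current[0] = i  # delete the first i characters of a
        # Only columns inside the diagonal band can still be within k edits.
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
                too_far,                 # clamp so values never exceed k + 1
            )
        if min(current) > k:
            return False  # every alignment already needs more than k edits
        previous = current
    return previous[-1] <= k


print(within_distance("beatels", "beatles", 2))  # True: two substitutions
print(within_distance("beatl", "the beatles", 2))  # False: 6 characters missing
```

With k = 2 a swapped letter pair such as "beatels" still matches, but the partial input "beatl" is rejected for "the beatles", which is exactly why a small error limit forces the user to type almost the whole name.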
Compare this with a relevance-ranking algorithm. It allows different
weights to be placed on the artists, which changes the order
of the results, and it also allows the autocomplete behaviour
to be adjusted.
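The text does not describe how Searzh computes relevance, so the following is only a hypothetical illustration of the general idea of combining a similarity score with per-artist weights. It is not Searzh's algorithm. It matches query words to name words in any order, treats a prefix match as a perfect score so that partial input can autocomplete, falls back to Python's `difflib.SequenceMatcher.ratio()` as a stand-in fuzzy measure, and multiplies by made-up popularity weights.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens of a name or query."""
    return re.findall(r"\w+", text.lower())


def token_similarity(query_token: str, name_token: str) -> float:
    """1.0 for a prefix match (autocomplete), otherwise a fuzzy ratio in [0, 1]."""
    if name_token.startswith(query_token):
        return 1.0
    return SequenceMatcher(None, query_token, name_token).ratio()


def relevance(query: str, name: str) -> float:
    """Average over query tokens of the best-matching name token (order-independent)."""
    q_tokens, n_tokens = tokens(query), tokens(name)
    if not q_tokens or not n_tokens:
        return 0.0
    best = [max(token_similarity(q, t) for t in n_tokens) for q in q_tokens]
    return sum(best) / len(best)


def rank(query: str, weights: dict[str, float]) -> list[tuple[str, float]]:
    """Score = relevance x hypothetical per-artist weight, highest score first."""
    scored = [(name, relevance(query, name) * weight) for name, weight in weights.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


weights = {"The Beatles": 1.0, "Beat Happening": 0.6, "Beck": 0.9}  # made-up weights
print([name for name, _ in rank("beat", weights)])
# ['The Beatles', 'Beat Happening', 'Beck']
```

Here "beat" completes to both "The Beatles" and "Beat Happening", and the weights decide their order; lowering the weight of "Beat Happening" to 0.3 would move the fuzzy match "Beck" (score 0.5 × 0.9 = 0.45) above it. The reordered query "lennon john" scores 1.0 against "John Lennon".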
In some rare applications, however, unlimited Levenshtein can be a good choice.
This is why Searzh supplies this metric and several
others as well, namely Jaccard, Jaro and Jaro-Winkler (developed at the US Census Bureau).
Searzh's implementations of these metrics are, however,
very fast and have no error limits.
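For readers unfamiliar with the other metrics, here is a minimal Python sketch of two of them: the Jaccard index applied to word sets (one of several possible variants; character n-grams are also common) and the Jaro and Jaro-Winkler similarities. These are textbook formulations written for illustration, not Searzh's implementations. The Winkler prefix bonus is applied unconditionally here; some implementations only apply it when the Jaro score exceeds a threshold such as 0.7.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index of the two word sets: size of intersection / size of union."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a window, penalised for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order; half the out-of-order pairs are transpositions.
    chars1 = [c for c, m in zip(s1, matched1) if m]
    chars2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(chars1, chars2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a common prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(jaccard("john lennon", "lennon john"))        # 1.0: word order is ignored
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))   # 0.9611
print(round(jaro_winkler("DWAYNE", "DUANE"), 2))    # 0.84
```

The word-set Jaccard index is blind to ordering, while Jaro-Winkler rewards names that agree at the start, which suits short personal names.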
It can be difficult to know the exact spelling of proper nouns, e.g. the titles of articles. Furthermore, an estimated 5-10% of the population has
some degree of dyslexia. Customers may find the right website but then fail to find the desired article on the website or in an app because of a misspelling.
Such problems are common and arise even with a small number of articles or other searchable items.
Searzh can dramatically increase a user's hit rate.
Searzh will be pleased to analyse your data and use cases free of charge. We are also willing to discuss
a no-cure-no-pay solution for systems that properly log their hit-to-miss ratio.