by Artificial Intelligence

Searzh is a cutting-edge, extremely fast, fault-tolerant search system that copes with severe spelling errors. It implements a unique, user-friendly and very robust algorithm that ranks the results by relevance.

Through machine learning, it can adapt to patterns such as user behaviour or common misspellings.

Searzh is an excellent tool for searching databases of personal names, artist names, article titles and proper nouns in general. Give it a try with the MusicBrainz database (which contains more than a million artists) and see how it works. Enter your text in the text box to get a relevance-ranked search.

Relevance-ranking:

Misspellings are the obvious reason for fault tolerance. A misspelling, however, is just one type of word variation, and there are several others.

For example, for proper nouns and in particular personal names, the order of the names may differ, a name may be missing, or it may be abbreviated or replaced by a nickname. Names may be split differently, titles may be present, or names may be truncated because the input field was too short. The non-standardised use of delimiters such as ";", "," and "-" is also a problem.

In addition, names may be transliterated: in other words, converted from a non-Latin script into the Latin alphabet. Unfortunately, this process is poorly standardised. For example, a single Arabic name may have 10 to 50 Latin variants. Considering that a full Arabic name usually consists of four parts, there may be millions of possible Latin spellings corresponding to a single name in the original script.
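As a rough check of that figure, the arithmetic can be written out. This assumes, for simplicity, that the four parts of the name vary independently of each other; the helper name is purely illustrative.

```python
def latin_spellings(variants_per_part: int, parts: int = 4) -> int:
    """Number of combined spellings if every part varies independently."""
    return variants_per_part ** parts


low = latin_spellings(10)   # 10 variants for each of 4 parts
high = latin_spellings(50)  # 50 variants for each of 4 parts
print(low, high)            # 10000 6250000
```

Under this assumption the count ranges from ten thousand to more than six million, so the "millions" apply at the upper end of the range.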

Conventional fault tolerance based on the Levenshtein metric (also known as "edit distance" and often sold as "fuzzy search") sorts the resulting hits by the number of spelling errors, that is, by the number of single-character insertions, deletions and substitutions.
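For readers who want to see the metric itself, here is a minimal sketch of the textbook Levenshtein distance and of ranking by it. It is the standard dynamic-programming formulation, not Searzh's implementation, and the function names are illustrative.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn a into b (dynamic programming, two rows at a time)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1,     # delete char_a
                               current[j - 1] + 1,  # insert char_b
                               substitution))
        previous = current
    return previous[-1]


def rank_by_errors(query: str, names: List[str]) -> List[Tuple[int, str]]:
    """Sort candidates by edit distance to the query, fewest errors first."""
    scored = [(levenshtein(query.lower(), name.lower()), name) for name in names]
    return sorted(scored)


print(rank_by_errors("beatels", ["The Beatles", "Beetles", "Beatles"]))
# [(2, 'Beatles'), (3, 'Beetles'), (6, 'The Beatles')]
```

Note how the extra word in "The Beatles" costs four errors on its own, even though it is arguably the best match.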

This may seem a perfect metric. In general, however, it does not behave well for large variations. An artist name entered with its parts in the wrong order, for example, is likely to produce a mismatch. Give it a try in the text box below. Make some challenging searches and compare the two algorithms.

Levenshtein:

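Levenshtein will strongly favour artist names with the same text length as the query string, because every missing character counts as an error. For this reason, the algorithm cannot provide a proper autocomplete function: a partially typed query is much shorter than the name it should match.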
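The length bias is easy to reproduce. The sketch below computes the ordinary distance to the whole candidate and, for comparison, the distance to the candidate's best-matching prefix. The prefix variant is one well-known workaround for autocomplete; it is shown only as an illustration and is not claimed to be what Searzh does. The function name is hypothetical.

```python
from typing import Tuple


def full_and_prefix_distance(query: str, candidate: str) -> Tuple[int, int]:
    """Return (edit distance to the whole candidate,
               edit distance to the best-matching prefix of the candidate)."""
    previous = list(range(len(candidate) + 1))
    for i, q in enumerate(query, start=1):
        current = [i]
        for j, c in enumerate(candidate, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (q != c)))
        previous = current
    # previous[j] now holds the distance between the query and candidate[:j].
    return previous[-1], min(previous)


print(full_and_prefix_distance("metal", "metallica"))  # (4, 0)
print(full_and_prefix_distance("metal", "medal"))      # (1, 1)
```

With plain Levenshtein the half-typed query "metal" is closer to "medal" (1 error) than to "metallica" (4 errors). Only the prefix distance ranks "metallica" first.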

Another problem with Levenshtein is the high CPU load and slow response time. It can be made faster by limiting the number of spelling errors to one or two. Unfortunately, this requires the user to enter almost all of the characters correctly before a relevant response is possible. Even so, the result will be a nuisance, because the user is shown a lot of irrelevant results. Such products are widespread, and they tend to put fault tolerance in a bad light.
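A minimal sketch of such an error limit, assuming the usual early-exit trick: the smallest value in a row of the distance table can never decrease in later rows, so the comparison can stop as soon as that value exceeds the limit. The function name and the use of None for "no match" are illustrative choices.

```python
from typing import Optional


def bounded_levenshtein(a: str, b: str, max_errors: int) -> Optional[int]:
    """Edit distance between a and b, or None if it exceeds max_errors."""
    if abs(len(a) - len(b)) > max_errors:
        return None  # the length difference alone needs too many edits
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        if min(current) > max_errors:
            return None  # no later row can get back under the limit
        previous = current
    return previous[-1] if previous[-1] <= max_errors else None


print(bounded_levenshtein("beatles", "beatels", 2))  # 2
print(bounded_levenshtein("bea", "beatles", 2))      # None
```

The second call shows the drawback described above: a user who has typed only "bea" gets no match at all with a limit of two errors.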

Compare this to a relevance-ranking algorithm. It allows different weights to be placed on the artists, which affects the order of the results, and it also allows the autocomplete properties to be adjusted.
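The effect of such weights can be sketched as follows. The similarity function here is Python's standard difflib ratio, used only as a stand-in for a relevance score, and multiplying the score by a per-artist weight is an assumption made for the example, not a description of how Searzh applies its weights.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Stand-in relevance score between 0.0 and 1.0 (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_rank(query: str, artists: Dict[str, float]) -> List[str]:
    """Rank artist names by similarity multiplied by a per-artist weight."""
    scored = [(similarity(query, name) * weight, name)
              for name, weight in artists.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [name for _, name in scored]


equal = weighted_rank("queen", {"Queen": 1.0, "Queens": 1.0})
boosted = weighted_rank("queen", {"Queen": 0.5, "Queens": 1.0})
print(equal[0], boosted[0])  # Queen Queens
```

With equal weights the exact match wins. Halving the weight of "Queen" lets the slightly less similar "Queens" move to the top.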

In some rare applications, however, unlimited Levenshtein can be a good choice. This is why Searzh supplies this metric and several others as well, such as Jaccard, Jaro and Jaro-Winkler (originally developed at the US Census Bureau). Searzh's implementations are, however, very fast and have no error limits.
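For orientation, here are textbook versions of the other metrics named above. They follow the commonly published definitions (Jaccard on word sets, Jaro with a matching window of half the longer length minus one, Winkler's prefix bonus of 0.1 for up to four characters) and say nothing about how Searzh implements them.

```python
def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words; word order is ignored."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between 0.0 (nothing in common) and 1.0 (identical)."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, char in enumerate(s1):
        # Look for an unused, equal character within the window around i.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == char:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, used in zip(s1, used1) if used]
    seq2 = [c for c, used in zip(s2, used2) if used]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity with a bonus for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(jaccard("Bob Dylan", "Dylan Bob"))           # 1.0
print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The Jaccard line shows why a word-based metric is indifferent to the order of the names, which is exactly the case where Levenshtein struggles.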

It can be difficult to know the exact spelling of proper nouns, such as the titles of articles. Furthermore, an estimated 5-10% of the population has dyslexia. Customers may find the correct website but then fail to find the desired article on the website or in an app because of a misspelling.

Such problems are common and arise even for small numbers of articles or other types of search words.

Searzh can dramatically increase a user's hit rate.

Searzh will be pleased to analyse your data and use cases free of charge. We are also willing to discuss a no-cure-no-pay solution for systems with proper logging of the hit-to-miss ratio.

Advantages of Searzh

On premises or in the cloud

Use the REST API or the class library

Unicode

Supports any alphabet

Ultrafast

App responds in < 5 ms

Scalable

Time and space scale linearly

Technical Information

Searzh is made by programmers for programmers. It is a software module which is very easy to integrate with existing systems.

Searzh is implemented in C# and Java. As a result, it runs on Windows, Linux and macOS.

Searzh implements a REST API. Supported operations: Load, Search, Insert and Delete.

Furthermore, Searzh compiles for Microsoft UWP and may be ported to Android and iOS via Xamarin.

The results of a search request are arranged as a sorted list according to the selected metric/algorithm. Each list entry contains the score and the foreign key of the record and, optionally, the record text.

Searzh can load a client's data from any data source. Once loaded, the system can be kept synchronised with insert and delete operations.
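To make the four operations and the shape of the result list concrete, here is a toy in-memory stand-in. Every name in it (TinyIndex, Hit, the method signatures) is hypothetical and is not the Searzh REST API or class library. The scoring uses Python's difflib as a placeholder metric.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Hit:
    score: float                # higher means more relevant
    key: int                    # foreign key of the record in the source system
    text: Optional[str] = None  # record text, only filled in on request


class TinyIndex:
    """Toy index offering load, search, insert and delete."""

    def __init__(self) -> None:
        self._records: Dict[int, str] = {}

    def load(self, rows: Iterable[Tuple[int, str]]) -> None:
        """Replace the whole index with (key, text) rows from any source."""
        self._records = dict(rows)

    def insert(self, key: int, text: str) -> None:
        self._records[key] = text

    def delete(self, key: int) -> None:
        self._records.pop(key, None)

    def search(self, query: str, limit: int = 10,
               include_text: bool = False) -> List[Hit]:
        """Return the best hits, sorted by descending score."""
        hits = [
            Hit(SequenceMatcher(None, query.lower(), text.lower()).ratio(),
                key, text if include_text else None)
            for key, text in self._records.items()
        ]
        hits.sort(key=lambda hit: (-hit.score, hit.key))
        return hits[:limit]


index = TinyIndex()
index.load([(1, "Bob Dylan"), (2, "Bob Marley")])
index.insert(3, "Dylan Thomas")  # a row was added in the source database
index.delete(2)                  # a row was removed in the source database
best = index.search("bob dilan", include_text=True)[0]
print(best.key, best.text)       # 1 Bob Dylan
```

Mirroring every insert and delete from the source database in this way is what keeps the index in step with the data after the initial load.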

Searzh is also an excellent tool for cleaning duplicates out of a database system and for preventing duplicate inserts.
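A simplified sketch of the idea: normalise each name, compare the normalised forms, and flag pairs above a similarity threshold. The normalisation rule, the threshold of 0.9 and the difflib metric are all assumptions made for the example. The pairwise loop is quadratic and only suitable for small data sets.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def normalise(name: str) -> str:
    """Lower-case, treat commas as spaces and sort the words."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


def find_duplicates(names: List[str],
                    threshold: float = 0.9) -> List[Tuple[str, str]]:
    """All pairs of names that look like the same entity."""
    return [(a, b) for a, b in combinations(names, 2)
            if is_duplicate(a, b, threshold)]


existing = ["Dylan, Bob", "Bob Dylan", "Bob Marley"]
print(find_duplicates(existing))  # [('Dylan, Bob', 'Bob Dylan')]

# The same check can guard an insert:
candidate = "bob dylan"
blocked = any(is_duplicate(candidate, name) for name in existing)
print(blocked)                    # True
```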

Searzh supports a comprehensive set of metrics/algorithms. Metrics may also be combined, for example by merging their results.
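How the merging works is not spelled out here. One plausible reading, shown purely as an assumption, is that the ranked lists from two metrics are merged into one list that keeps the best score for each record. This only makes sense when the metrics produce scores on a comparable scale, such as 0 to 1.

```python
from typing import Dict, List, Tuple

ScoredKey = Tuple[float, int]  # (score, foreign key)


def merge_hits(*ranked_lists: List[ScoredKey]) -> List[ScoredKey]:
    """Merge several result lists, keeping the highest score per key."""
    best: Dict[int, float] = {}
    for hits in ranked_lists:
        for score, key in hits:
            if key not in best or score > best[key]:
                best[key] = score
    merged = [(score, key) for key, score in best.items()]
    merged.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return merged


by_metric_a = [(0.9, 1), (0.4, 2)]
by_metric_b = [(0.7, 2), (0.6, 3)]
print(merge_hits(by_metric_a, by_metric_b))
# [(0.9, 1), (0.7, 2), (0.6, 3)]
```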

Searzh will not cause the server to hang even if very common character combinations, such as the article "the", are entered.

Searzh provides proper timeout handling in case of CPU overload.
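How Searzh implements its timeouts is not described here. A common general pattern, sketched below with hypothetical names, is a cooperative deadline: the search loop checks a monotonic clock and returns whatever it has found so far, together with a flag, once the time budget is used up.

```python
import time
from typing import Callable, Iterable, List, Tuple


def search_with_deadline(
    query: str,
    records: Iterable[Tuple[int, str]],
    score: Callable[[str, str], float],
    timeout_s: float = 0.005,
) -> Tuple[List[Tuple[float, int]], bool]:
    """Score records until the time budget runs out.

    Returns (hits sorted by descending score, timed_out flag)."""
    deadline = time.monotonic() + timeout_s
    hits: List[Tuple[float, int]] = []
    timed_out = False
    for key, text in records:
        if time.monotonic() >= deadline:
            timed_out = True  # stop early and report partial results
            break
        hits.append((score(query, text), key))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return hits, timed_out


records = [(1, "a"), (2, "b")]
exact = lambda q, t: float(q == t)  # trivial scorer for the demonstration
print(search_with_deadline("b", records, exact, timeout_s=60.0))
# ([(1.0, 2), (0.0, 1)], False)
```

The caller can then decide whether partial results are acceptable or whether to report the timeout to the user.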

Searzh as, orgno: 979 925 166, post@searzh.as, phone:+47 932 24 535