How this works
This translator works by using the Levenshtein distance between a word in the source language and words in the target language.
The Levenshtein distance is defined as the minimum number of single-letter edits (insertions, deletions or substitutions) you would have to make to a word in order to turn it into another word. For example, to turn "hello" into "help", you remove the "o" and turn the second "l" into a "p", so the Levenshtein distance is 2.
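The definition above can be sketched in a few lines of plain Python. This is only an illustration of the standard dynamic-programming approach, not the code the translator runs (which uses rapidfuzz); the function name levenshtein is chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute ch_a with ch_b, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("hello", "help"))  # 2
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second word rather than with the product of both lengths.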
The input text is compared word by word to a corpus of the 15,000 most common words in the target language, and the best candidate is chosen as the "translated" word. Punctuation is carried over from the original.
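One way to process text word by word while leaving punctuation in place is a regular-expression substitution. The sketch below is an assumption about how that could look, not the translator's actual code: translate_text and the translate_word callback are hypothetical names, and lowercasing each word before lookup is also an assumption.

```python
import re
from typing import Callable

# A "word" here is a run of letters (Unicode-aware); digits, spaces and
# punctuation are not matched and therefore pass through unchanged.
WORD = re.compile(r"[^\W\d_]+")


def translate_text(text: str, translate_word: Callable[[str], str]) -> str:
    """Replace every word with translate_word(word); keep everything else."""
    return WORD.sub(lambda m: translate_word(m.group(0).lower()), text)


# Toy lookup table standing in for the real candidate search.
toy = {"hello": "hallo", "world": "welt"}
print(translate_text("Hello, world!", lambda w: toy.get(w, w)))  # hallo, welt!
```

A simplification to be aware of: apostrophes are not letters, so a word like "don't" is split into two words by this pattern.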
Because there are almost always multiple candidates, they are ranked by a score that combines Levenshtein distance (calculated with rapidfuzz) and word frequency (provided by wordfreq). The exact formula is log(freq)/(1+0.1*dist), where freq is the candidate's frequency and dist is its Levenshtein distance from the input word. This works with any real or imaginary word (e.g. "qwerty" becomes "party" in English).
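The formula can be tried out on a couple of candidates by hand. In this sketch the frequencies are made-up numbers, and they are assumed to be on a scale above 1 (for example occurrences per million words) so that log(freq) is positive; with frequencies below 1 the logarithm is negative and dividing by a larger number would favour more distant words. The helper names score and best_candidate are hypothetical.

```python
import math


def score(freq: float, dist: int) -> float:
    """log(freq)/(1+0.1*dist): higher frequency raises the score,
    higher distance lowers it (assuming freq > 1)."""
    return math.log(freq) / (1 + 0.1 * dist)


def best_candidate(candidates: dict[str, tuple[float, int]]) -> str:
    """Pick the word with the highest score from {word: (freq, dist)}."""
    return max(candidates, key=lambda word: score(*candidates[word]))


# Candidates for the input "qwerty". The distances are the real Levenshtein
# distances; the frequencies are invented for illustration.
candidates = {
    "party": (150.0, 3),
    "query": (20.0, 2),
}
print(best_candidate(candidates))  # party
```

Here "query" is closer to the input, but "party" is frequent enough to win: each extra edit only adds 0.1 to the denominator, so frequency carries a lot of weight.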
To keep this fast on a low-powered server, I use a pre-baked cross-language cache for common words as well as a dynamic cache for new, previously unseen words. Additionally, to reduce the search space, I pre-select candidate words from buckets of word length N±1 (where N is the length of the input word); the buckets expand dynamically when needed.
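A rough sketch of how the caches and length buckets could fit together follows. Everything in it is an assumption made for illustration: the toy CORPUS and PREBAKED tables and their values, the use of functools.lru_cache as the dynamic cache, and the rule that the length window widens only when no candidate is found. A plain-Python distance function replaces rapidfuzz so the example runs on its own.

```python
import math
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Plain-Python edit distance (stand-in for rapidfuzz)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


# Hypothetical target-language corpus: word -> frequency (scale above 1).
CORPUS = {"party": 150.0, "query": 20.0, "part": 90.0, "help": 200.0, "hello": 60.0}
# Hypothetical pre-baked cache: source word -> already chosen target word.
PREBAKED = {"hallo": "hello"}

# Group corpus words by length so only similar-length words are compared.
BUCKETS: dict[int, list[str]] = {}
for corpus_word in CORPUS:
    BUCKETS.setdefault(len(corpus_word), []).append(corpus_word)


def candidates(word: str, radius: int = 1) -> list[str]:
    """Words whose length is within N±radius; widen the window if it is empty."""
    n, longest = len(word), max(BUCKETS)
    while True:
        found = [w for length in range(max(1, n - radius), n + radius + 1)
                 for w in BUCKETS.get(length, [])]
        if found or radius > max(n, longest):  # second test: window covers every bucket
            return found
        radius += 1


@lru_cache(maxsize=10_000)  # dynamic cache for words not in PREBAKED
def translate_word(word: str) -> str:
    if word in PREBAKED:
        return PREBAKED[word]
    return max(candidates(word),
               key=lambda w: math.log(CORPUS[w]) / (1 + 0.1 * levenshtein(word, w)))


print(translate_word("qwerty"))  # party (with this toy corpus)
```

With this toy corpus, "qwerty" is only compared against words of length 5 to 7, while a two-letter input finds nothing at lengths 1 to 3 and the window widens to include the four-letter words.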