Which algorithm does Google Keyboard use for automatic suggestions (personal vocabulary included)?

I am confused, since Google cannot train its text generation models on each individual's personal vocabulary.

I was trying to develop something similar, but I got stuck when the number of classes became dynamic during neural network training.

If I don't know the number of classes, how can I specify the size of the layers and the dimension of the input?

Let's say Google knows the words of the English vocabulary, and I add some slang words to my personal dictionary. It is then able to suggest those words to me in the future.

Assume that Google's vocabulary consists of 10 words and I give it a sequence of 5 words. It one-hot encodes them into a 5x10 matrix of 1s and 0s.
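To make the encoding concrete, here is a minimal sketch in plain Python. The ten-word vocabulary and the example sentence are hypothetical placeholders; the point is only the shape of the result, one row per input word and one column per vocabulary word.

```python
# Hypothetical 10-word vocabulary; any 10 distinct words would do.
VOCAB = ["i", "you", "am", "are", "going", "home", "to", "the", "store", "now"]
WORD_TO_INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot_encode(sequence, word_to_index):
    """Return one row per word; each row has len(word_to_index) zeros except a single 1."""
    size = len(word_to_index)
    rows = []
    for word in sequence:
        row = [0] * size
        # Raises KeyError for a word that is not in the vocabulary.
        row[word_to_index[word]] = 1
        rows.append(row)
    return rows


matrix = one_hot_encode(["i", "am", "going", "home", "now"], WORD_TO_INDEX)
print(len(matrix), "x", len(matrix[0]))  # 5 x 10
```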

Then I add 4 new words, so the total vocabulary size is now 14.

But the RNN (if one is used) was trained with a vocabulary of just 10 words. It cannot encode the new words, since they were not in the vocabulary during training.
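The mismatch can be sketched as follows, assuming a hypothetical input layer whose weight matrix was learned for a 10-word vocabulary. The hidden size of 8 and the random weights are placeholders, not anything Google publishes. A 14-wide one-hot vector simply has no matching rows in that matrix.

```python
import random

OLD_VOCAB_SIZE = 10
NEW_VOCAB_SIZE = 14  # 10 original words + 4 personal words
HIDDEN_SIZE = 8      # hypothetical hidden-layer width

# Stand-in for the trained input-layer weights: one row per vocabulary index.
random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in range(OLD_VOCAB_SIZE)]


def project(one_hot, weights):
    """Multiply a one-hot row vector by the input weight matrix (selects one row)."""
    if len(one_hot) != len(weights):
        raise ValueError(f"input has {len(one_hot)} features, layer expects {len(weights)}")
    return [sum(x * w for x, w in zip(one_hot, column)) for column in zip(*weights)]


new_word = [0] * NEW_VOCAB_SIZE
new_word[12] = 1  # one of the 4 new words

try:
    project(new_word, weights)
except ValueError as err:
    print(err)  # input has 14 features, layer expects 10
```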

Does it retrain its text generation model on 5x14 one-hot matrices?



Solution 1:[1]

According to this Google AI article, Google uses a combination of two components:

  • Neural Spatial Model: addresses cognitive and motor errors (manifesting as misspellings, character insertions, deletions, swaps, etc.) at the character level. They used a character-level LSTM trained with the connectionist temporal classification (CTC) loss, an approach borrowed from speech recognition.
  • Finite-State Transducers (FSTs): provide lexical constraints (which words occur in a language) and grammatical constraints (which words are likely to follow other words). The grammatical constraints come from a probabilistic n-gram model that serves as the keyboard's language model.
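The n-gram idea can be illustrated with a toy bigram model. This is a sketch under simplifying assumptions, not Google's implementation: a Python set stands in for the lexical constraint and a table of bigram counts stands in for the grammatical constraint, with no FST machinery, smoothing, or spatial model. The class name `BigramSuggester` and the training sentences are hypothetical.

```python
from collections import Counter, defaultdict


class BigramSuggester:
    """Toy next-word suggester: a lexicon (lexical constraint) plus bigram counts (grammatical constraint)."""

    def __init__(self):
        self.lexicon = set()
        self.bigram_counts = defaultdict(Counter)  # previous word -> Counter of next words

    def train(self, sentences):
        """Add words to the lexicon and count adjacent word pairs."""
        for sentence in sentences:
            words = sentence.lower().split()
            self.lexicon.update(words)
            for prev, nxt in zip(words, words[1:]):
                self.bigram_counts[prev][nxt] += 1

    def probability(self, prev, nxt):
        """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev was never seen."""
        counts = self.bigram_counts.get(prev)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())

    def suggest(self, prev, prefix="", k=3):
        """Top-k lexicon words starting with `prefix`, ranked by P(word | prev), ties broken alphabetically."""
        candidates = [w for w in self.lexicon if w.startswith(prefix)]
        candidates.sort(key=lambda w: (-self.probability(prev, w), w))
        return candidates[:k]


model = BigramSuggester()
model.train(["see you soon", "see you soon", "see you later", "thank you so much"])
print(model.suggest("you"))  # ['soon', 'later', 'so']

model.train(["see you l8r", "see you l8r", "see you l8r"])  # personal slang
print(model.suggest("you"))  # ['l8r', 'soon', 'later']
```

Adding personal words here only adds entries to the lexicon and the count table; unlike a one-hot input layer, there is no fixed dimension that has to be resized.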

In your case, I suggest the following:

  • Train your RNN at the character level rather than the word level, so that it can handle misspellings.
  • Use an n-gram model to track which words tend to follow which other words, trained on your preferred vocabulary.
  • Lastly, it's not easy, but it's doable!
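Training at the character level also sidesteps the dynamic-class problem from the question: if the model reads and predicts characters, every new word, slang included, is spelled from the same fixed character set, so the input and output layer sizes never change. A minimal sketch, assuming a character set of our own choosing (lowercase letters, digits, an apostrophe, plus one catch-all slot for anything else):

```python
import string

# Hypothetical fixed character set; the model's layer sizes depend only on this.
ALPHABET = string.ascii_lowercase + string.digits + "'"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
UNKNOWN = len(ALPHABET)        # extra slot for any character outside ALPHABET
INPUT_SIZE = len(ALPHABET) + 1


def encode_word(word):
    """One-hot encode a word character by character; every row has INPUT_SIZE entries."""
    rows = []
    for ch in word.lower():
        row = [0] * INPUT_SIZE
        row[CHAR_TO_INDEX.get(ch, UNKNOWN)] = 1
        rows.append(row)
    return rows


# A dictionary word and two slang words all share the same row width.
for word in ["hello", "yeet", "l8r"]:
    encoded = encode_word(word)
    print(word, len(encoded), "x", len(encoded[0]))
```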

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Anwarvic