Phonetic Speech Recognition

I'm trying to get Latin speech recognition working, for which I need not word recognition but phonetic recognition of individual vowels and consonants (Latin has only about 40 sounds, but over 40,000 words x 60 endings on average = roughly 2.5 million word-forms). The problem is that both the Web Speech API and Google Cloud Speech only give you supposedly similar-sounding complete words (and from an English vocabulary at that, since there are no 2.5-million-word Latin grammars out there), so there's no way for me to get down to the actual phonetic sounds. In particular, I want just the word-stem (the first half of the word), which distinguishes each word, rather than the word-ending, which only tells how the word functions in the sentence and is useless to me. Ideally, I'd want to have a grammar of word-stems such as

  • "am-" (short for amo, amare, amavi, amatus, etc.),
  • "vid-" (short for video, videre, vidi, visus, etc.),
  • "laet-" (short for laetus, laeta, laetum, etc.),
  • etc.

But speech-recognition technology can't search for that.
So where can I get phonetic speech recognition?

I prefer JavaScript, PHP, or Node.js, and preferably something that runs client-side rather than streaming audio to a server.

Here's my code so far, for the Web Speech API. The key part is the console.log() calls, which show me trying to dig into the properties of each returned possible word:

speech.onresult = function(event) {
    var interim_transcript = '';
    var final_transcript = '';

    for (var i = event.resultIndex; i < event.results.length; ++i) {
        var result = event.results[i];
        if (!result.isFinal) {
            // Interim results are shown live but not kept.
            interim_transcript += result[0].transcript;
            continue;
        }
        final_transcript += result[0].transcript;

        // This console.log shows all 3 word-guess possibilities.
        console.log(result);
        // These console.logs show each individual possibility:
        //console.log('Poss-1:'); console.log(result[0]);
        //console.log('Poss-2:'); console.log(result[1]);
        //console.log('Poss-3:'); console.log(result[2]);

        // Loop by index: for...in would also visit non-index keys such as length.
        for (var a = 0; a < result.length; a++) {
            for (var b in result[a]) {
                /* This black-&-yellow console.log below shows me trying to dig into
                   each returned possibility's PROPERTIES, but alas, the only
                   returned properties are
                   (1) the transcript (i.e. the guessed word),
                   (2) the confidence (i.e. the 0-to-1 likelihood of it being that word)
                   (3) the prototype
                 */
                console.log("%c Poss-"+a+" %c "+b+": "+result[a][b], 'background-color: black; color: yellow; font-size: 14px;', 'background-color: black; color: red; font-size: 14px;');
            }
        }
    }
    if (action == "start") {
        transcription.value += final_transcript;
        interim_span.innerHTML = interim_transcript;
    }
};
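
For reference, here is a minimal sketch of reading the alternatives by numeric index and copying out the two documented properties, transcript and confidence. The helper name listAlternatives and the mock result object are hypothetical and only illustrate the shape of a result; in the browser the argument would be event.results[i].

```javascript
// Hypothetical helper (not part of the Web Speech API): copy every alternative
// of one SpeechRecognitionResult-like object into a plain array.
function listAlternatives(result) {
    var alternatives = [];
    // Index access stays within the numbered alternatives (0 .. length - 1).
    for (var a = 0; a < result.length; a++) {
        alternatives.push({
            transcript: result[a].transcript,
            confidence: result[a].confidence
        });
    }
    return alternatives;
}

// Mock object (an assumption for illustration) shaped like a final result
// that holds three guesses.
var mockResult = {
    0: { transcript: 'amo', confidence: 0.62 },
    1: { transcript: 'ammo', confidence: 0.21 },
    2: { transcript: 'a mow', confidence: 0.09 },
    length: 3,
    isFinal: true
};
console.log(listAlternatives(mockResult));
```

Nothing phonetic appears at this level: each alternative carries only a guessed transcript and a confidence value.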


Solution 1:[1]

You can create a SpeechGrammarList. See also JSpeech Grammar Format.

Example description and code at MDN

The SpeechGrammarList interface of the Web Speech API represents a list of SpeechGrammar objects containing words or patterns of words that we want the recognition service to recognize.

Grammar is defined using JSpeech Grammar Format (JSGF.) Other formats may also be supported in the future.

// Some browsers expose these constructors only under a webkit prefix.
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;

var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;';
var recognition = new SpeechRecognition();
var speechRecognitionList = new SpeechGrammarList();
// The second argument is the weight of this grammar relative to others in the list.
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;
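
Applied to the word-stems from the question, the same pattern might look like the sketch below. The helper buildStemGrammar and the names stems and stem are hypothetical, and it is an untested assumption that a recognition service will accept bare stems such as "am" as grammar tokens, or honor the grammar at all. The browser-only part is guarded so the string-building part can also run outside a browser.

```javascript
// Hypothetical helper: build a JSGF grammar string from a list of word-stems.
function buildStemGrammar(stems) {
    return '#JSGF V1.0; grammar stems; public <stem> = ' + stems.join(' | ') + ' ;';
}

var stemGrammar = buildStemGrammar(['am', 'vid', 'laet']);
console.log(stemGrammar);

// Browser-only part: attach the grammar if the (possibly prefixed) constructors exist.
var hasWindow = typeof window !== 'undefined';
var RecognitionCtor = hasWindow && (window.SpeechRecognition || window.webkitSpeechRecognition);
var GrammarListCtor = hasWindow && (window.SpeechGrammarList || window.webkitSpeechGrammarList);

if (RecognitionCtor && GrammarListCtor) {
    var stemRecognition = new RecognitionCtor();
    var stemGrammarList = new GrammarListCtor();
    stemGrammarList.addFromString(stemGrammar, 1);
    stemRecognition.grammars = stemGrammarList;
    // Ask for several guesses per utterance so near-misses can be inspected.
    stemRecognition.maxAlternatives = 3;
}
```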

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 guest271314