'How can I extract and store the text generated from an automatic speech recognition deep learning app
The app can be viewed in huggingface https://huggingface.co/spaces/rowel/asr
import gradio as gr
from transformers import pipeline
model = pipeline(task="automatic-speech-recognition",
model="facebook/s2t-medium-librispeech-asr")
gr.Interface.from_pipeline(model,
title="Automatic Speech Recognition (ASR)",
description="Using pipeline with Facebook S2T for ASR.",
examples=['data/ljspeech.wav',]
).launch()
I don't know where the text files are stored with that very few lines of code. I would like to store the sentence text in a string.
Honestly I only know basic python programming. I would just like to store them into string variables and do something with them.
Solution 1:[1]
You can open up the Interface.from_pipeline
abstraction, and define your own Gradio interface. You need to define your own inputs, outputs, and prediction function, thus accessing the text prediction from the model. Here is an example.
You can test is here https://huggingface.co/spaces/radames/Speech-Recognition-Example
import gradio as gr
from transformers import pipeline
model = pipeline(task="automatic-speech-recognition",
model="facebook/s2t-medium-librispeech-asr")
def predict_speech_to_text(audio):
prediction = model(audio)
# text variable contains your voice-to-text string
text = prediction['text']
return text
gr.Interface(fn=predict_speech_to_text,
title="Automatic Speech Recognition (ASR)",
inputs=gr.inputs.Audio(
source="microphone", type="filepath", label="Input"),
outputs=gr.outputs.Textbox(label="Output"),
description="Using pipeline with F acebook S2T for ASR.",
examples=['ljspeech.wav'],
allow_flagging='never'
).launch()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | calmar |