The default audio32KHz.flac test file works fine, but when I try my own FLAC file (32 kHz) it fails:
Fatal error: Uncaught Google\ApiCore\ApiException:
{
"message": "Invalid audio channel count",
"code": 3,
"status": "INVALID_ARGUMENT",
"details": []
}
thrown in C:\xampp\htdocs\speech\speech-19\vendor\google\gax\src\ApiException.php on line 139
How can I convert my FLAC file to mono FLAC? Thank you!
1. To convert your audio file to mono:
You can use sox (easy to install and use):
sudo apt-get install -y sox
Then convert your file to mono:
sox yourfile.flac output.flac channels 1
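For reference, what the channels 1 option does is average the left and right samples of each frame. A minimal pure-Python sketch of that downmix on interleaved PCM samples (sox additionally handles the FLAC decoding and encoding for you):

```python
def stereo_to_mono(samples):
    """Downmix interleaved stereo samples [L0, R0, L1, R1, ...] to mono
    by averaging each left/right pair."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples), 2)]
```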
2. To use the API with multi-channel audio files:
a) Add these two settings to your config. I don't know PHP, but I believe you would write it like this:
->setAudioChannelCount(2)
->setEnableSeparateRecognitionPerChannel(true)
Reference: Transcribing audio with multiple channels
b) Use the gcloud alpha command:
gcloud alpha ml speech recognize yourfile.flac --language-code='en-US' --audio-channel-count=2 --separate-channel-recognition
Reference: gcloud alpha ml speech recognize
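For comparison, in the Python client the same two fields go into the RecognitionConfig; a sketch (the Python client also accepts a plain dict with these field names):

```python
# Sketch of the corresponding config for the Python Speech client;
# audio_channel_count and enable_separate_recognition_per_channel are
# the two fields the PHP setters above map to.
config = {
    "encoding": "FLAC",
    "sample_rate_hertz": 32000,
    "language_code": "en-US",
    "audio_channel_count": 2,
    "enable_separate_recognition_per_channel": True,
}
```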
Related
I am working on a speech recognition project using the Google speech recognition API. I have deployed the Django project on the GCP flex environment using a Dockerfile.
Dockerfile:
FROM gcr.io/google-appengine/python
RUN apt-get update
RUN apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 -y
RUN apt-get install -y python3-pyaudio
RUN virtualenv -p python3.7 /env
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
ADD . /app
CMD gunicorn -b :$PORT main:app
app.yaml file:
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT main:app
runtime_config:
python_version: 3
Code for taking voice input:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone(device_index=0) as source:
    print("speak")
    audio = r.listen(source)
try:
    voice_data = " " + r.recognize_google(audio)
except sr.UnknownValueError:
    voice_data = ""
I am getting the error: AssertionError: Device index out of range (0 devices available; device index should be between 0 and -1 inclusive). This is raised here in the SpeechRecognition source:
# set up PyAudio
self.pyaudio_module = self.get_pyaudio()
audio = self.pyaudio_module.PyAudio()
try:
    count = audio.get_device_count()  # obtain device count
    if device_index is not None:  # ensure device index is in range
        assert 0 <= device_index < count, "Device index out of range ({} devices available; device index should be between 0 and {} inclusive)".format(count, count - 1) …
    if sample_rate is None:  # automatically set the sample rate to the hardware's default sample rate if not specified
        device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
        assert isinstance(device_info.get("defaultSampleRate"), (float, int)) and device_info["defaultSampleRate"] > 0, "Invalid device info returned from PyAudio: {}".format(device_info)
        sample_rate = int(device_info["defaultSampleRate"])
except Exception:
    audio.terminate()
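The odd-looking range "between 0 and -1 inclusive" follows directly from that assertion when the device count is 0. A stripped-down sketch of the same check (the function name is mine, not from the library):

```python
def pick_input_device(device_count, requested_index):
    # Mirrors the SpeechRecognition assertion: a valid index must satisfy
    # 0 <= index < count, so with count == 0 the allowed range
    # "between 0 and -1 inclusive" is empty and every index fails.
    if not (0 <= requested_index < device_count):
        raise ValueError(
            "Device index out of range ({} devices available; device index "
            "should be between 0 and {} inclusive)".format(
                device_count, device_count - 1))
    return requested_index
```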
It is unable to detect an audio device when I visit the URL. I need to capture voice from the hosted web app. What can I do to resolve this issue?
The error appears because there is no audio card in an App Engine VM instance. Even if a sound card and drivers were installed, I don't see how a microphone device could be connected to the instance.
This question was tagged google-speech-api, but the Speech API client libraries are not used in the code you shared; instead it uses the Python package SpeechRecognition. Supposing that you want to use the Speech API client libraries, you need streaming_recognize(), and I'm afraid you will need to change the code to take voice input from the web user's microphone, not the local device's microphone.
In this link we can find an example that streams from a file; note that streaming recognition converts speech data on the fly and won't wait for the operation to finish like the other methods do. I'm not a Python expert, but from this example you would need to change this line to read from another source (the web user's microphone):
with io.open('./hello.wav', 'rb') as stream:
In the web app, you would need to do something like the following (audio: true) to read from the user's microphone; see this link for more reference:
navigator.mediaDevices.getUserMedia({ audio: true, video: false })
.then(handleSuccess);
A complete example using this approach is the Google Cloud Speech Node with Socket Playground guide. You might want to reuse some NodeJS code to connect it to your current python application. By the way, NodeJS is also available in AppEngine Flex.
I have around 300 GB of audio data (mostly mp3/wav) stored on Amazon S3 and am trying to access it in a SageMaker notebook instance to do some data transformations. I'm trying to use either torchaudio or librosa to load a file as a waveform. torchaudio expects a file path as input; librosa can use either a file path or a file-like object. I tried using s3fs to get the URL to the file, but torchaudio doesn't recognize it as a file. And apparently SageMaker has problems installing librosa, so I can't use that. What should I do?
For anyone who has this issue and has to use SageMaker: installing librosa with conda rather than pip allowed me to use it:
!conda install -y -c conda-forge librosa
I ended up not using SageMaker for this, but for anybody else with similar problems: I solved it by opening the file with s3fs and writing it to a tempfile.NamedTemporaryFile. This gave me a file path I could pass to either torchaudio.load or librosa.core.load. This mattered because I wanted the extra resampling functionality of librosa.core.load, but it doesn't accept file-like objects for loading mp3s.
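A sketch of that workaround; here the s3fs read is stood in for by raw bytes, and the returned path is what you would hand to torchaudio.load or librosa.core.load:

```python
import tempfile

def bytes_to_temp_path(data, suffix=".mp3"):
    # Write the raw audio bytes to a named temporary file and return its
    # path; delete=False keeps the file (and the path) valid after the
    # handle is closed, so the loader can open it by name.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```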
I have downloaded some videos from YouTube using youtube-dl, from many different playlists. Now I want each video's title to include its uploader's (channel's) name, without downloading all the videos again. Which command do I need? I am using Windows 10.
You can extract the uploader name from the -j JSON metadata, e.g.:
youtube-dl.exe -j https://www.youtube.com/watch?v=YOUR-URL | python.exe -c "import sys, json; print(json.load(sys.stdin)['uploader'])"
The -j option doesn't download the video itself.
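The piped Python one-liner above just parses that JSON; as a small reusable function (so you can rename already-downloaded files, calling it once per video URL):

```python
import json

def uploader_from_metadata(json_line):
    """Extract the uploader field from one line of youtube-dl -j output."""
    return json.loads(json_line)["uploader"]
```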
I am using youtube-dl to download from a playlist for offline viewing. The operators of the playlist have started putting a scheduled video in the playlist, which causes the downloads to fail: when youtube-dl reaches a video that isn't available yet (the scheduled one), it errors out and the whole download aborts.
How can I have the playlist download continue when there is a missing video?
My command:
/share/Multimedia/temp/youtube-dl -f 'best[ext=mp4]' -o "/share/Multimedia/YouTube/TheNational/%(upload_date)s.%(title)s.%(ext)s" --restrict-filenames --dateafter today-3day --no-mtime --download-archive "/share/Multimedia/temp/dllist-thenational.txt" --playlist-end 10 https://www.youtube.com/playlist?list=PLvntPLkd9IMcbAHH-x19G85v_RE-ScYjk
The download results from today:
[youtube:playlist] PLvntPLkd9IMcbAHH-x19G85v_RE-ScYjk: Downloading webpage
[download] Downloading playlist: The National | Full Show | Live Streaming Nightly at 9PM ET
[youtube:playlist] playlist The National | Full Show | Live Streaming Nightly at 9PM ET: Downloading 10 videos
[download] Downloading video 1 of 10
[youtube] pZ2AG5roG-A: Downloading webpage
[youtube] pZ2AG5roG-A: Downloading video info webpage
ERROR: This video is unavailable.
I want the playlist download to ignore the missing video and continue to the next available one.
Thanks.
I would add these flags before -f:
-i, --ignore-errors
Continue on download errors, for example to skip unavailable videos in a playlist
-c, --continue
Force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.
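If you ever drive youtube-dl from Python rather than the shell, the same flags map onto these option keys (a sketch of the options dict you would pass to youtube_dl.YoutubeDL):

```python
# The CLI flags above, as youtube-dl embedding options:
ydl_opts = {
    "ignoreerrors": True,   # -i: skip unavailable videos and keep going
    "continuedl": True,     # -c: resume partially downloaded files
    "format": "best[ext=mp4]",
}
```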
I'm starting the Google codelab on transfer learning with GCP. After installing the Cloud SDK in Cloud Shell:
sudo pip install --upgrade pillow
curl https://storage.googleapis.com/cloud-ml/scripts/setup_cloud_shell.sh | bash
export PATH=${HOME}/.local/bin:${PATH}
I cannot go beyond the following command:
gcloud beta ml init-project
which returns the following:
ERROR: (gcloud.beta.ml) Invalid choice: 'init-project'.
Usage: gcloud beta ml [optional flags] <group>
group may be language | speech | video | vision
For detailed information on this command and its flags, run:
gcloud beta ml --help
Ref: https://codelabs.developers.google.com/codelabs/cpb102-txf-learning/index.html?index=..%2F..%2Findex#2
I guess the codelab is old and the SDK has changed. Can anybody point me to more up-to-date documentation for using the Cloud SDK to do transfer learning?
Thanks!
Here is a more up-to-date codelab that steps you through transfer learning:
https://codelabs.developers.google.com/codelabs/scd-coastline/index.html
That init-project call is no longer required; it happens automatically.