I'm trying to suppress the output of aplay, but without success.
I know how to suppress print statments with stdout, but I didn't figured out how to archive the same result with pydub module.
For example when I play a sound with this code
from pydub import AudioSegment
from pydub.playback import play
next_kot = AudioSegment.from_ogg('/home/effe/Voz/Hello.ogg')
play(next_kot)
The output generated (in red!) is
avplay version 9.18-6:9.18-0ubuntu0.14.04.1, Copyright (c) 2003-2014
the Libav developers built on Mar 16 2015 13:19:10 with gcc 4.8
(Ubuntu 4.8.2-19ubuntu1) Input #0, wav, from '/tmp/tmp5DUj0a.wav':
Duration: 00:00:01.32, bitrate: 1411 kb/s
Stream #0.0: Audio: pcm_s16le, 44100 Hz, 2 channels, s16, 1411 kb/s
When you concatenate more audio is easy to lose key information.
Is there a way to cut off this kind of output?
Thanks.
I ran into this same issue and here is what I did. You can just create a new function named _play_with_ffplay_suppress and have the following code in it. The difference between the answer above and mine is that Jiaaro used
stdout=open(os.devnull, 'w')
stderr=os.stdout
and I used "devnull" after creating a variable with the same name. Very tiny difference but I hope it solves the error that you mentioned in the comment.
stderr=devnull
stdout=devnull
Here is my code:
#rhp - additional import added
import os
#rhp-custom function to supress output while playing mp3 files
def _play_with_ffplay_suppress(seg):
with NamedTemporaryFile("w+b", suffix=".wav") as f:
seg.export(f.name, "wav")
devnull = open(os.devnull, 'w')
subprocess.call([PLAYER,"-nodisp", "-autoexit", "-hide_banner", f.name],stdout=devnull, stderr=devnull)
For more information you can read about the call function in the subprocess module in Python here https://docs.python.org/3/library/subprocess.html.
the playback functions are really simple (and mostly included for easy use in the interactive python shell) - Your best bet is probably to make a copy of the playback code that is better suited to your needs:
if you're using ffplay this should work:
import os
from pydub.utils import get_player_name
PLAYER = get_player_name()
def play_with_ffplay(seg):
with NamedTemporaryFile("w+b", suffix=".wav") as f:
seg.export(f.name, "wav")
subprocess.call(
[PLAYER, "-nodisp", "-autoexit", f.name],
stdout=open(os.devnull, 'w'),
stderr=os.stdout
)
note: ffmpeg is always going to open a new window for ffplay though - I'd recommend installing pyaudio and using that for playback instead
Related
Working hard for 4 days now to fix the google cloud speech to text api to work, but still see no light at the end of the tunnel. Searched on the net a lot, read the documentations a lot but see no result.
Our site is bbsradio.com, we are trying to auto extract transcript from our mp3 files using google speech-to-text api. Code is written on PHP and almost exact copy of this: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/speech/src/transcribe_async.php
I see process is completed and its reached out here "$operation->pollUntilComplete();" but its not showing it was successful at "if ($operation->operationSucceeded()) {" and its not returning any error either at $operation->getError().
I am converting the mp3 to raw file like this: ffmpeg -y -loglevel panic -i /public_html/sites/default/files/show-archives/audio-clips-9-23-2020/911freefall2020-05-24.mp3 -f s16le -acodec pcm_s16le -vn -ac 1 -ar 16000 -map_metadata -1 /home/mp3_to_raw/911freefall2020-05-24.raw
While tried with FLAC format as well, not worked. I tested converted FLAC file using windows media player, I can listen conversation clearly. I checked the files its Hz 16000, channel = 1 and its 16 bit. I see file is uploaded in cloud storage. Checked this:
https://cloud.google.com/speech-to-text/docs/troubleshooting and
https://cloud.google.com/speech-to-text/docs/best-practices
There are lot of discussion and documentation, seems nothing is helpful at this moment. If some one can really help me out to find out the issue, it will be really really really great!
TLDR; convert from MP3 to a 1-channel FLAC file with the same sample rate as your MP3 file.
Long explanation:
Since you're using MP3 files as your process input, probably you MP3 compression artifacts might be hurting you when you resample to to 16KHz (you cannot hear this, but the algoritm will).
To confirm this theory:
Execute ffprobe -hide_banner filename.mp3 it will output something like this:
Metadata:
...
Duration: 00:02:12.21, start: 0.025057, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Metadata:
encoder : LAME3.99r
In this case, the sample rate is OK for Google-Spech-Api. Just transcode the file without changing the sample rate (remove the -ar 16000 from your ffmpeg command)
You might get into trouble if the original MP3 bitrate is low. 320kb/s seems safe (unless the recording has a lot of noise).
Take into account that voice recoded under 64kb/s (ISDN line quality) can be understood only by humans if there is some noise.
At last I found the solution and reason of the issue. Actually getting empty results is a bug of the php api code. What you need to do:
Replace this:
$operation->pollUntilComplete();
by this:
while(!$operation->isDone()){
$operation->pollUntilComplete();
}
Read this: enter link description here
In my website i'm now converting my upload images into webp, because is smaller than the other formats, the users will load my pages more faster(Also mobile user). But it's take some time to converter a medium image.
import StringIO
import time
from PIL import Image as PilImage
img = PilImage.open('222.jpg')
originalThumbStr = StringIO.StringIO()
now = time.time()
img.convert('RGBA').save(originalThumbStr, 'webp', quality=75)
print(time.time() - now)
It's take 2,8 seconds to convert this follow image:
860kbytes, 1920 x 1080
My memory is 8gb ram, with a processor with 4 cores(Intel I5), without GPU.
I'm using Pillow==5.4.1.
Is there a faster way to convert a image into WEBB more faster. 2,8s it's seems to long to wait.
If you want them done fast, use vips. So, taking your 1920x1080 image and using vips in the Terminal:
vips webpsave autumn.jpg autumn.webp --Q 70
That takes 0.3s on my MacBook Pro, i.e. it is 10x faster than the 3s your PIL implementation achieves.
If you want lots done really fast, use GNU Parallel and vips. So, I made 100 copies of your image and converted the whole lot to WEBP in parallel like this:
parallel vips webpsave {} {#}.webp --Q 70 ::: *jpg
That took 4.9s for 100 copies of your image, i.e. it is 50x faster than the 3s your PIL implementation achieves.
You could also use the pyvips binding - I am no expert on this but this works and takes 0.3s too:
#!/usr/bin/env python3
import pyvips
# VIPS
img = pyvips.Image.new_from_file("autumn.jpg", access='sequential')
img.write_to_file("autumn.webp")
So, my best suggestion would be to take the 2 lines of code above and use a multiprocessing pool or multithreading approach to get a whole directory of images processed. That could look like this:
#!/usr/bin/env python3
import pyvips
from glob import glob
from pathlib import Path
from multiprocessing import Pool
def doOne(f):
img = pyvips.Image.new_from_file(f, access='sequential')
webpname = Path(f).stem + ".webp"
img.write_to_file(webpname)
if __name__ == '__main__':
files = glob("*.jpg")
with Pool(12) as pool:
pool.map(doOne, files)
That takes 3.3s to convert 100 copies of your image into WEBP equivalents on my 12-core MacBook Pro with NVME disk.
I am using Google Cloud Platform Speech-to-Text API trial account service. I am not able to get text from an audio file. I do not know what exact encoding and sample Rate Hertz I should use for MP3 file of bit rate 128kbps. I tried various options but I am not getting the transcription.
const speech = require('#google-cloud/speech');
const config = {
encoding: 'LINEAR16', //AMR, AMR_WB, LINEAR16(for wav)
sampleRateHertz: 16000, //16000 giving blank result.
languageCode: 'en-US'
};
MP3 is now supported in beta:
MP3 Only available as beta. See RecognitionConfig reference for details.
https://cloud.google.com/speech-to-text/docs/encoding
MP3 MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding
You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:
https://en.wikipedia.org/wiki/44,100_Hz
To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:
RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding),
Only the following formats are supported:
FLAC
LINEAR16
MULAW
AMR
AMR_WB
OGG_OPUS
SPEEX_WITH_HEADER_BYTE
Anything else will be rejected.
Your best bet is to convert the MP3 file to either:
FLAC. .NET: How can I convert an mp3 or a wav file to .flac
Wav and use LINEAR16 in that case. You can use NAudio. Converting mp3 data to wav data C#
Honestly it is annoying that Google does not support MP3 from the get-go compared to Amazon, IBM and Microsoft who do as it forces us to jump through hoops and also increase the bandwidth usage since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
I had the same issue and resolved it by converting it to FLAC.
Try converting your audio to FLAC and use
encoding: 'FLAC',
For conversion, you can use sox
ref: https://www.npmjs.com/package/sox
now, the mp3 type for spedch-to-text,only available in module speech_v1p1beta1 ,you must post your request for this module,and you will get what you want.
the encoding: 'MP3'
python example like this:
from google.cloud import speech_v1p1beta1 as speech
import io
import base64
client = speech.SpeechClient()
speech_file = "your mp3 file path"
with io.open(speech_file, "rb") as audio_file:
content = (audio_file.read())
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.MP3,
sample_rate_hertz=44100,
language_code="en-US",
)
response = client.recognize(config=config, audio=audio)
# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
# The first alternative is the most likely one for this portion.
print(u"Transcript: {}".format(result.alternatives[0].transcript))
result
The Problem:
I'm working on a reader for multiple ip camera streams. The application needs to run on a ubuntu AWS EC2 instance. I've been unsuccessful in trying to reliably fetch and decode the frames from RTSP h.264 streams.
What I've Tried:
I've used OpenCV's and SciKit-Video's VideoCapture classes, neither of which could both fetch and decode the frames from my test stream. I have verified that my test stream is readable using VLC and openRTSP, so I believe this is an encoding issue.
I've also attempted to build some solutions using python's subprocess module to run the aforementioned command-line applications. This allows me to read the stream reliably, but it raises the issue that the decoder fails due to it (apparently) not finding the keyframe data it needs to decode the stream.
Below is the code for this example. It tells openRTSP to periodically save some amount of video as a separate file and uses an openCV VideoCapture to get a single frame from each of those samples. Code:
def openrtsp_thread(queue, feed_name, source, sample_time, intruder, cleanup=True):
(major, minor, subminor) = (cv2.__version__).split('.')
cmd = 'openRTSP -V -4 -v -P ' + str(sample_time) + ' ' + source
out_dir = '/aws_odw/frame-store/'+feed_name.replace(' ','-')+'/'
try:
os.mkdir(out_dir)
except:
print '[?]['+feed_name+']: Not creating new frame directory'
pass
os.chdir(out_dir)
p = subprocess.Popen(cmd.split(' '))
while True:
try:
for dirs, files, filenames in os.walk(out_dir):
for f in filenames:
cap = cv2.VideoCapture(os.path.join(out_dir, f))
if int(major) < 3:
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
else:
fps = cap.get(cv2.CAP_PROP_FPS)
#for i in range(int(float(sample_time)*fps*0.5)):
ret, frame = cap.read()
cap.release()
print 'enqueueing...'
queue.put((feed_name, frame, intruder))
except (KeyboardInterrupt, SystemExit):
print '[x]['+feed_name+']: keyboard interrupt, cleaning up...'
break
p.send_signal(signal.SIGUSR1)
p.wait()
print '[*]['+feed_name+']: exiting gracefully.'
Can anyone offer any pointers? I don't know much about video encoding, so I'm feeling pretty lost. Any help would be very appreciated.
Edit: the end goal here is to queue the frames in python for real-time precessing in a computer vision application.
I'm using the Python Speech Recognition library to recognize speech input from the microphone.
This works fine with my default microphone.
This is the code I'm using. According to what I understood of the documentation
Creates a new Microphone instance, which represents a physical
microphone on the computer. Subclass of AudioSource.
If device_index is unspecified or None, the default microphone is used
as the audio source. Otherwise, device_index should be the index of
the device to use for audio input. https://pypi.python.org/pypi/SpeechRecognition/
The problem is that when I want to get the node with pyaudio.get_device_count() - 1. I'm getting this error.
AttributeError: 'module' object has no attribute 'get_device_count'
So I'm not sure how to configure the microphone to use a usb microphone
import pyaudio
import speech_recognition as sr
index = pyaudio.get_device_count() - 1
print index
r = sr.Recognizer()
with sr.Microphone(index) as source:
audio = r.listen(source)
try:
print("You said " + r.recognize(audio))
except LookupError:
print("Could not understand audio")
myPyAudio=pyaudio.PyAudio()
print "Seeing pyaudio devices:",myPyAudio.get_device_count()
That's a bug in the library. I just pushed out a fix in 1.3.1, so this should now be fixed!
Version 1.3.1 retains full backwards compatibility with previous versions.