I'm trying to implement inline mode in a Telegram bot that answers with a list of voice messages. I'm using nginx to serve .ogg files encoded with Opus and using them as the voice_url in InlineQueryResultVoice. After doing some research, I discovered that it works fine for voice files larger than roughly 8 KB. However, smaller files are sent as documents and aren't playable in the Telegram client (even though they are still valid .ogg audio files). The links are also playable in a browser.
I use ffmpeg -i <mp3_file> -acodec libopus -b:a 48k -vbr on -compression_level 10 <ogg_file> to convert from MP3. I also tried opus-tools.
Example of what I send in a result array:
[{
    "type": "voice",
    "id": "m183802670825221348",
    "title": "Something",
    "voice_url": "https://<domain>/voice/m183802670825221348.ogg",
    "duration": 1
},
...]
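For reference, this is roughly how the results are sent back (a simplified sketch, not my exact code; BOT_TOKEN and query_id are placeholders):

import requests

BOT_TOKEN = "<bot_token>"  # placeholder

def answer_inline_query(query_id, results):
    # answerInlineQuery accepts the results array as JSON in the request body
    resp = requests.post(
        "https://api.telegram.org/bot{}/answerInlineQuery".format(BOT_TOKEN),
        json={"inline_query_id": query_id, "results": results},
    )
    resp.raise_for_status()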
What am I doing wrong?
I am trying to send audio from a microphone input between a server and a client using PyAudio. I only need voice quality, sampled at a rate of 8000 Hz. It works fine without compression, and I am now trying to add zlib compression to reduce the bandwidth.
In my server the stream_callback function is
def callback(in_data, frame_count, time_info, status):
    for s in read_list[1:]:
        s.send(zlib.compress(in_data))
    return (None, pyaudio.paContinue)
In my client, I am trying to decompress like this:
try:
    while True:
        data = s.recv(CHUNK)
        stream.write(zlib.decompress(data, zlib.MAX_WBITS | 16))
except KeyboardInterrupt:
    pass
I have tried various zlib.MAX_WBITS parameters, but they all return this error:
zlib.error: Error -3 while decompressing data: incorrect header check
Edit: I have also tried zlib.decompress with no second parameter.
Can someone suggest what I am doing wrong, please? TIA.
You don't need the second parameter of zlib.decompress at all. What you have in the question would look for a gzip stream instead of a zlib stream.
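A minimal round trip with the defaults looks like this (a sketch; the byte string just stands in for a block of PCM samples):

import zlib

chunk = b"\x00\x01" * 512               # stand-in for one block of PCM audio
compressed = zlib.compress(chunk)       # produces a zlib stream (default wbits)
restored = zlib.decompress(compressed)  # defaults match; no second argument needed
assert restored == chunk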
For compressing audio, you should use an audio compressor. Take a look at this answer.
I have been working hard for four days now to get the Google Cloud Speech-to-Text API to work, but I still see no light at the end of the tunnel. I have searched the net a lot and read the documentation a lot, but see no result.
Our site is bbsradio.com; we are trying to auto-extract transcripts from our MP3 files using the Google Speech-to-Text API. The code is written in PHP and is almost an exact copy of this: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/speech/src/transcribe_async.php
I see the process completes and reaches "$operation->pollUntilComplete();", but it does not show success at "if ($operation->operationSucceeded()) {", and it is not returning any error at $operation->getError() either.
I am converting the MP3 to a raw file like this: ffmpeg -y -loglevel panic -i /public_html/sites/default/files/show-archives/audio-clips-9-23-2020/911freefall2020-05-24.mp3 -f s16le -acodec pcm_s16le -vn -ac 1 -ar 16000 -map_metadata -1 /home/mp3_to_raw/911freefall2020-05-24.raw
I tried the FLAC format as well; it did not work. I tested the converted FLAC file using Windows Media Player and can hear the conversation clearly. I checked the files: 16000 Hz, 1 channel, 16-bit. I can see the file is uploaded to Cloud Storage. I have checked these:
https://cloud.google.com/speech-to-text/docs/troubleshooting and
https://cloud.google.com/speech-to-text/docs/best-practices
There is a lot of discussion and documentation, but nothing seems to help at this moment. If someone can help me find the issue, it would be really, really great!
TL;DR: convert from MP3 to a 1-channel FLAC file with the same sample rate as your MP3 file.
Long explanation:
Since you're using MP3 files as your input, MP3 compression artifacts are probably hurting you when you resample to 16 kHz (you cannot hear this, but the algorithm will).
To confirm this theory:
Execute ffprobe -hide_banner filename.mp3; it will output something like this:
  Metadata:
    ...
  Duration: 00:02:12.21, start: 0.025057, bitrate: 320 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
    Metadata:
      encoder         : LAME3.99r
In this case, the sample rate is fine for the Google Speech API. Just transcode the file without changing the sample rate (remove the -ar 16000 from your ffmpeg command).
You might get into trouble if the original MP3 bitrate is low; 320 kb/s seems safe (unless the recording has a lot of noise).
Take into account that voice encoded under 64 kb/s (ISDN line quality) can be understood only by humans if there is some noise.
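As an illustration, a conversion along the lines of ffmpeg -i input.mp3 -ac 1 -c:a flac output.flac should produce a mono FLAC without touching the sample rate (file names are placeholders; adjust paths to your setup).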
At last I found the solution and the reason for the issue. Getting empty results is actually a bug in the PHP API code. What you need to do:
Replace this:
$operation->pollUntilComplete();
by this:
while (!$operation->isDone()) {
    $operation->pollUntilComplete();
}
I am using the Google Cloud Platform Speech-to-Text API with a trial account. I am not able to get text from an audio file. I do not know what exact encoding and sampleRateHertz I should use for an MP3 file with a bit rate of 128 kbps. I tried various options, but I am not getting the transcription.
const speech = require('@google-cloud/speech');
const config = {
  encoding: 'LINEAR16', // AMR, AMR_WB, LINEAR16 (for wav)
  sampleRateHertz: 16000, // 16000 giving blank result.
  languageCode: 'en-US'
};
MP3 is now supported in beta:
MP3 Only available as beta. See RecognitionConfig reference for details.
https://cloud.google.com/speech-to-text/docs/encoding
MP3: MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding
You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:
https://en.wikipedia.org/wiki/44,100_Hz
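If you'd rather check programmatically, here is a small sketch using the mutagen library (my own suggestion, not from the docs; any metadata reader works, and the file name is a placeholder):

from mutagen.mp3 import MP3

info = MP3("recording.mp3").info
print(info.sample_rate, "Hz,", info.bitrate, "bps")  # e.g. 44100 Hz, 128000 bps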
To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:
RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding),
only the following formats are supported:
FLAC
LINEAR16
MULAW
AMR
AMR_WB
OGG_OPUS
SPEEX_WITH_HEADER_BYTE
Anything else will be rejected.
Your best bet is to convert the MP3 file to either:
FLAC (see .NET: How can I convert an mp3 or a wav file to .flac), or
WAV, using LINEAR16 in that case. You can use NAudio (see Converting mp3 data to wav data C#).
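If you are not tied to .NET, the same conversion can be sketched in Python with pydub, which shells out to ffmpeg under the hood (file names are placeholders):

from pydub import AudioSegment

# Decode the MP3 and re-encode it losslessly as FLAC
AudioSegment.from_mp3("input.mp3").export("output.flac", format="flac")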
Honestly, it is annoying that Google does not support MP3 from the get-go, as Amazon, IBM, and Microsoft do. It forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
I had the same issue and resolved it by converting it to FLAC.
Try converting your audio to FLAC and use
encoding: 'FLAC',
For conversion, you can use sox
ref: https://www.npmjs.com/package/sox
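For example, on the command line something like sox input.mp3 output.flac should do the conversion (assuming your SoX build has MP3 support; the npm package above is a Node wrapper around the same tool).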
Currently, the MP3 type for Speech-to-Text is only available in the speech_v1p1beta1 module; you must send your request through this module, with encoding MP3, and you will get what you want.
A Python example looks like this:
from google.cloud import speech_v1p1beta1 as speech
import io

client = speech.SpeechClient()

speech_file = "your mp3 file path"
with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=44100,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u"Transcript: {}".format(result.alternatives[0].transcript))
Hi, I am using Wowza Streaming Engine 4 to stream a SMIL file.
I am able to trace events when a file plays on Flash and gather information (which file plays, time, etc.) in the onConnect() event.
Precisely, I want to know which file from my SMIL file is being played.
But in the case of Apple HLS streaming, when I try to get the file name in the onHTTPSessionDestroy() method, e.g.
public void onHTTPSessionDestroy(IHTTPStreamerSession httpSession) {
    String streamName = httpSession.getStreamName();
}
I only get the name of the SMIL file, not the actual file played.
Is it possible to get the played file info in Wowza HLS streaming?
A new API was introduced in Wowza 4.1.1:
public void onHTTPStreamerRequest(IHTTPStreamerSession httpSession, IHTTPStreamerRequestContext reqContext)
With it we can check which bitrate is being played from the chunk name, e.g. media_w1577403587_b4000000_1.ts, where _b is the bitrate.
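As a plain illustration of pulling that value out of a chunk name (a language-neutral sketch, not Wowza API code):

import re

def bitrate_from_chunk_name(name):
    # media_w1577403587_b4000000_1.ts -> 4000000
    match = re.search(r"_b(\d+)", name)
    return int(match.group(1)) if match else None

print(bitrate_from_chunk_name("media_w1577403587_b4000000_1.ts"))  # 4000000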
I want to receive images from an IP camera over HTTP via a GET request. I have written a program that creates a TCP socket connection with the camera and sends the following GET request to the camera:
GET /mjpeg?res=full HTTP/1.1\r\nHost: 143.205.116.14\r\n\r\n
After that, I receive images with the following function in a while loop:
while ((tmpres = recv(sock, (void *)buf, SIZE, 0)) > 0 && check < 10)
.....
where SIZE represents the size of the buffer. I, in fact, don't know what size to define here. I am receiving a color image of size 2940x1920, so I define SIZE = 2940x1920x3. After receiving the MJPEG image, I decode it with ffmpeg. But I observe that ffmpeg only partially/incorrectly decodes the image, and I see just half (or even less) of the image. I assume it could be a size problem.
Any help in this regard would be highly appreciated.
Regards,
Khan
Why reinvent the wheel (for the umpteenth time)? Use a ready-made HTTP client library, such as libcurl.
For that matter, perhaps you can even just write your entire solution as a shell script using the curl command line program:
#!/bin/sh
# Save the image under the name given by the URL:
curl -O "http://143.205.116.14/mjpeg?res=full" || echo "Error."
# Or save it under an explicit file name:
curl -o myfile.jpg "http://143.205.116.14/mjpeg?res=full" || echo "Error."
# ffmpeg ...
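In the same spirit, if C is not a hard requirement, here is a minimal sketch using Python's requests library (the URL is copied from the question and the output name is a placeholder; if the camera serves an endless multipart MJPEG stream rather than a single image, you would need to read it incrementally instead):

import requests

resp = requests.get("http://143.205.116.14/mjpeg?res=full", timeout=10)
resp.raise_for_status()  # fail loudly instead of decoding a partial body
with open("myfile.jpg", "wb") as f:
    f.write(resp.content)  # requests handles Content-Length/chunked framing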
Save the received bytes as a binary file and analyze them. It may be an incomplete image (the image can be encoded as progressive JPEG, which is interlaced in fact; that means if you truncate the file you'll see horizontal lines), or it can be an ffmpeg decoding issue. Or something different. What is the check < 10 condition?