I am trying to send audio from a microphone input between a server and a client using PyAudio. I only need voice quality, sampled at a rate of 8000 Hz. Without compression it works fine, and I am now trying to add zlib compression to reduce the bandwidth.
In my server, the stream_callback function is:
def callback(in_data, frame_count, time_info, status):
    for s in read_list[1:]:
        s.send(zlib.compress(in_data))
    return (None, pyaudio.paContinue)
In my client, I am trying to decompress it like this:
try:
    while True:
        data = s.recv(CHUNK)
        stream.write(zlib.decompress(data, zlib.MAX_WBITS | 16))
except KeyboardInterrupt:
    pass
I have tried various parameters with zlib.MAX_WBITS, but they all return this error:
zlib.error: Error -3 while decompressing data: incorrect header check
Edit: I have also tried zlib.decompress with no second parameter.
Can someone suggest what I am doing wrong, please? TIA.
You don't need the second parameter of zlib.decompress at all. What you have in the question would look for a gzip stream instead of a zlib stream.
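For instance, here is a minimal sketch of the client loop with the second argument dropped (an illustration, not your exact code; it also assumes each recv() returns exactly one complete compressed chunk, which TCP does not actually guarantee):
try:
    while True:
        data = s.recv(CHUNK)
        if not data:
            break
        # Default wbits: expects the standard zlib header that
        # zlib.compress() on the server produces.
        stream.write(zlib.decompress(data))
except KeyboardInterrupt:
    pass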
For compressing audio, you should use an audio codec rather than a general-purpose compressor like zlib. Take a look at this answer.
I am using the Google Cloud Platform Speech-to-Text API with a trial account. I am not able to get text from an audio file. I do not know what encoding and sampleRateHertz I should use for an MP3 file with a bit rate of 128 kbps. I tried various options, but I am not getting any transcription.
const speech = require('@google-cloud/speech');
const config = {
    encoding: 'LINEAR16', // AMR, AMR_WB, LINEAR16 (for wav)
    sampleRateHertz: 16000, // 16000 gives a blank result
    languageCode: 'en-US'
};
MP3 is now supported in beta:
MP3: Only available as beta. See RecognitionConfig reference for details.
https://cloud.google.com/speech-to-text/docs/encoding
MP3: MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding
You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:
https://en.wikipedia.org/wiki/44,100_Hz
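If you prefer to check it programmatically, here is a small sketch using the mutagen package (an assumption on my part; any metadata reader will do, and the filename is a placeholder):
from mutagen.mp3 import MP3

# Prints the sample rate of the MP3, e.g. 44100
print(MP3("input.mp3").info.sample_rate)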
To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:
RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding), only the following formats are supported:
FLAC
LINEAR16
MULAW
AMR
AMR_WB
OGG_OPUS
SPEEX_WITH_HEADER_BYTE
Anything else will be rejected.
Your best bet is to convert the MP3 file to either:
FLAC (.NET: How can I convert an mp3 or a wav file to .flac)
WAV, and use LINEAR16 in that case. You can use NAudio (Converting mp3 data to wav data C#)
Honestly, it is annoying that Google does not support MP3 from the get-go, as Amazon, IBM, and Microsoft do; it forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
I had the same issue and resolved it by converting it to FLAC.
Try converting your audio to FLAC and use
encoding: 'FLAC',
For the conversion, you can use sox.
Ref: https://www.npmjs.com/package/sox
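For instance, a minimal sketch that shells out to the sox command-line tool from Python (this assumes a sox build with MP3 support is installed and on the PATH; the filenames and target rate are placeholders):
import subprocess

# Convert an MP3 to mono 16 kHz FLAC before sending it to Speech-to-Text
subprocess.run(
    ["sox", "input.mp3", "-r", "16000", "-c", "1", "output.flac"],
    check=True,
)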
Now, the MP3 type for Speech-to-Text is only available in the speech_v1p1beta1 module, so you must send your request through that module, and you will get what you want.
Use encoding: 'MP3'.
A Python example looks like this:
from google.cloud import speech_v1p1beta1 as speech
import io

client = speech.SpeechClient()
speech_file = "your mp3 file path"

with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=44100,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u"Transcript: {}".format(result.alternatives[0].transcript))
I have an issue trying to decompress an IMAP message compressed with the deflate method. What I've tried so far is isolating one direction of an IMAP conversation (using Wireshark's "Follow TCP Stream" function) and saving the message data in a raw format that I hope contains only the deflated message part. I then found some programs like tinf (1st and 3rd example) and miniz (the tgunzip example) and tried to inflate that file back, but with no success.
Am I missing something? Thank you in advance.
tinf - http://www.ibsensoftware.com/download.html
Miniz - https://code.google.com/archive/p/miniz/source/default/source
Try piping that raw data to:
perl -MCompress::Zlib -pe 'BEGIN{$i = inflateInit(-WindowBits => -15)}
$_=$i->inflate($_)'
The important part is -WindowBits => -15, which changes the expected format into a raw one without the zlib header and Adler-32 checksum.
(That's derived from the Dovecot source; it works for me on a Thunderbird-to-Gmail network capture.)
From RFC4978 that specifies IMAP compression (emphasis mine):
When using the zlib library (see RFC1951), the functions
deflateInit2(), deflate(), inflateInit2(), and inflate() suffice to
implement this extension. The windowBits value must be in the range
-8 to -15, or else deflateInit2() uses the wrong format.
deflateParams() can be used to improve compression rate and resource
use. The Z_FULL_FLUSH argument to deflate() can be used to clear the
dictionary (the receiving peer does not need to do anything).
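The same raw inflate can be done with Python's zlib module; here is a minimal sketch (the capture filename is a placeholder):
import zlib

# wbits = -15 tells zlib to expect a raw deflate stream with no zlib/gzip
# header and no Adler-32 checksum, matching inflateInit2() with -15.
with open("imap_stream.raw", "rb") as f:  # one direction of the capture
    compressed = f.read()

d = zlib.decompressobj(-15)
data = d.decompress(compressed) + d.flush()
print(data.decode("utf-8", errors="replace"))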
The Problem:
I'm working on a reader for multiple IP camera streams. The application needs to run on an Ubuntu AWS EC2 instance. I've been unsuccessful in trying to reliably fetch and decode the frames from RTSP H.264 streams.
What I've Tried:
I've used OpenCV's and SciKit-Video's VideoCapture classes, neither of which could both fetch and decode the frames from my test stream. I have verified that my test stream is readable using VLC and openRTSP, so I believe this is an encoding issue.
I've also attempted to build some solutions using Python's subprocess module to run the aforementioned command-line applications. This allows me to read the stream reliably, but it raises the issue that the decoder fails because it (apparently) cannot find the keyframe data it needs to decode the stream.
Below is the code for this example. It tells openRTSP to periodically save some amount of video as a separate file and uses an OpenCV VideoCapture to get a single frame from each of those samples. Code:
# Imports used by this function (Python 2 code)
import cv2
import os
import signal
import subprocess

def openrtsp_thread(queue, feed_name, source, sample_time, intruder, cleanup=True):
    (major, minor, subminor) = (cv2.__version__).split('.')
    cmd = 'openRTSP -V -4 -v -P ' + str(sample_time) + ' ' + source
    out_dir = '/aws_odw/frame-store/' + feed_name.replace(' ', '-') + '/'
    try:
        os.mkdir(out_dir)
    except:
        print '[?][' + feed_name + ']: Not creating new frame directory'
        pass
    os.chdir(out_dir)
    p = subprocess.Popen(cmd.split(' '))
    while True:
        try:
            for dirs, files, filenames in os.walk(out_dir):
                for f in filenames:
                    cap = cv2.VideoCapture(os.path.join(out_dir, f))
                    if int(major) < 3:
                        fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
                    else:
                        fps = cap.get(cv2.CAP_PROP_FPS)
                    #for i in range(int(float(sample_time)*fps*0.5)):
                    ret, frame = cap.read()
                    cap.release()
                    print 'enqueueing...'
                    queue.put((feed_name, frame, intruder))
        except (KeyboardInterrupt, SystemExit):
            print '[x][' + feed_name + ']: keyboard interrupt, cleaning up...'
            break
    p.send_signal(signal.SIGUSR1)
    p.wait()
    print '[*][' + feed_name + ']: exiting gracefully.'
Can anyone offer any pointers? I don't know much about video encoding, so I'm feeling pretty lost. Any help would be much appreciated.
Edit: the end goal here is to queue the frames in Python for real-time processing in a computer vision application.
Using Wireshark, I could see the HTML page I was requesting (via segment reconstruction). I was not able to use pyshark to do this task, so I turned to Scapy. Using Scapy and sniffing wlan0, I am able to print request headers with this code:
from scapy.all import *

def http_header(packet):
    http_packet = str(packet)
    if http_packet.find('GET'):
        return GET_print(packet)

def GET_print(packet1):
    ret = packet1.sprintf("{Raw:%Raw.load%}\n")
    return ret

sniff(iface='wlan0', prn=http_header, filter="tcp port 80")
Now, I wish to be able to reconstruct the full request in order to find images and print the HTML page requested.
What you are searching for is:
IP packet defragmentation
TCP stream reassembly
see here
scapy
Scapy provides best-effort IP defragmentation via defragment([list_of_packets,]), but it does not provide generic TCP stream reassembly. Anyway, here's a very basic TCPStreamReassembler that may work for your use case, but it operates on the invalid assumption that a consecutive stream will be split into segments of the maximum segment size (MSS). It will concatenate segments == MSS until a segment < MSS is found; it will then spit out a reassembled TCP packet with the full payload (see the sketch below).
Note: TCP stream reassembly is not trivial, as you have to take care of retransmissions, ordering, ACKs, ...
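A rough sketch of that MSS-based heuristic in Scapy (this is not the original helper; the MSS value and pcap filename are assumptions):
from scapy.all import IP, TCP, Raw, rdpcap

def reassemble(packets, mss=1460):  # 1460 is an assumed MSS; adjust to your capture
    streams = {}      # (src, sport, dst, dport) -> accumulated payload
    reassembled = []
    for pkt in packets:
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)):
            continue
        key = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        payload = bytes(pkt[Raw].load)
        streams[key] = streams.get(key, b"") + payload
        if len(payload) < mss:  # a segment shorter than the MSS ends the run
            reassembled.append((key, streams.pop(key)))
    return reassembled

# usage: reassemble(rdpcap("capture.pcap"))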
tshark
According to this answer, tshark has a command-line option equivalent to Wireshark's "Follow TCP Stream" that takes a pcap and creates multiple output files for all the TCP sessions/"conversations".
Since it looks like pyshark is only an interface to the tshark binary, it should be pretty straightforward to implement that functionality if it is not already implemented.
With Scapy 2.4.3+, you can use
sniff([...], session=TCPSession)
to reconstruct the HTTP packets
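A minimal sketch of that approach, building on the sniff code above (it assumes Scapy 2.4.3+ with its built-in HTTP layer; the interface name is the same placeholder as before):
from scapy.all import sniff, TCPSession
from scapy.layers.http import HTTPRequest  # available in Scapy 2.4.3+

def show_request(packet):
    # With session=TCPSession, segments are reassembled before dissection,
    # so the full request (method, host, path) is available here.
    if packet.haslayer(HTTPRequest):
        req = packet[HTTPRequest]
        print(req.Method.decode(), req.Host.decode() + req.Path.decode())

sniff(iface="wlan0", filter="tcp port 80", prn=show_request, session=TCPSession)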
I want to receive images from an IP camera over HTTP via a GET request. I have written a program that creates a TCP socket connection with the camera and sends the following GET request to the camera:
GET /mjpeg?res=full HTTP/1.1\r\nHost: 143.205.116.14\r\n\r\n
After that, I receive images with the following function in a while loop:
while((tmpres = recv(sock,(void *) buf, SIZE, 0)) > 0 && check<10)
.....
where SIZE represents the size of the buffer. I, in fact, don't know what size to define here. I am receiving a color image of size 2940x1920, so I define SIZE = 2940x1920x3. After receiving the MJPEG image, I decode it with ffmpeg. But I observe that ffmpeg only partially/incorrectly decodes the image, and I see just half (or even less) of the image. I assume it could be a size problem.
Any help in this regard would be highly appreciated.
Regards,
Khan
Why reinvent the wheel (for the umpteenth time)? Use a ready-made HTTP client library, such as libcurl.
For that matter, perhaps you can even just write your entire solution as a shell script using the curl command line program:
#!/bin/sh
# save using the remote name:
curl -O "http://143.205.116.14/mjpeg?res=full" || echo "Error."
# or save to a chosen filename:
curl -o myfile.jpg "http://143.205.116.14/mjpeg?res=full" || echo "Error."
# ffmpeg ...
Save the bytes received as a binary file and analyze it. It may be an incomplete image (the image can be encoded as progressive JPEG, which is in fact interlaced; that means if you truncate the file you'll see horizontal lines), or it can be an ffmpeg decoding issue. Or something different. What is the check < 10 condition?