I created my own MP3 frame parser. I can read each frame individually if I do it sequentially.
My problem is that I have yet to figure out how to find the byte offset of the nth frame (without having to read all the prior frames).
CBR makes this process easier, but I still don't know how the padding bit factors in.
For example, take a file that has the following info:
Total file length: 4916595
===== ID3 METADATA =====
ID3v2 header length: 143
===== MP3 FRAMES =====
Version: MPEG Version 1
Layer: Layer III
Error protection: No
Bitrate: 192 kbps
Frequency: 44.1 kHz
Some frames have a byte length of 626 and other frames have a length of 627.
Say I want to find the 100th frame: I can't simply do 100 * 626, nor 100 * 627.
How should I factor the padding bit into my formula to find the nth frame's byte offset?
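For context, the padding bit's effect follows from the standard MPEG-1 Layer III frame-size formula, frame_bytes = floor(144 * bitrate / sample_rate) + padding, which produces exactly the 626/627 split above at 192 kbps / 44.1 kHz. A minimal sketch (my own illustration, not code from the question; the nth-frame offset it returns is only an estimate, because which frames carry the padding byte is encoder-dependent, so you still need to resync on the frame sync word near that position):

#include <cstdio>
#include <cmath>

// Per-frame size for MPEG-1 Layer III:
//   frame_bytes = floor(144 * bitrate / sample_rate) + padding
// where padding is the padding bit (0 or 1) read from that frame's header.
int frameSize(int bitrate, int sampleRate, int paddingBit) {
    return 144 * bitrate / sampleRate + paddingBit;   // integer division acts as floor
}

// Rough byte offset of frame n in a CBR file. Not exact: padding placement is
// up to the encoder, so confirm against the 0xFFE sync word near this estimate.
long estimatedOffset(int n, int bitrate, int sampleRate, long firstFrameOffset) {
    return firstFrameOffset + (long)std::floor(n * 144.0 * bitrate / sampleRate);
}

int main() {
    // Values from the file above: 192 kbps, 44.1 kHz, ID3v2 header of 143 bytes.
    printf("unpadded frame: %d bytes\n", frameSize(192000, 44100, 0));  // 626
    printf("padded frame:   %d bytes\n", frameSize(192000, 44100, 1));  // 627
    printf("~offset of frame 100: %ld\n", estimatedOffset(100, 192000, 44100, 143));
}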
MP3val will probably solve your problem. Once it is installed, run:
mp3val -f file.mp3
Related
I am trying to make a .mp4 file from the audio and video RTP packets of an IP camera.
When the audio format in the camera is configured as MPEG-2 ADTS, I receive RTP packets with a payload size of 144 bytes, but these are made up of two audio frames of 72 bytes each.
However, when I configure the format as MPEG-1, the payload contains only one audio frame.
What is the reason for this difference? Can I get this information from some bits of the payload, as I do for the bitrate, sample rate, etc.? I have read that the theoretical packet size is 144 bytes, so how can I retrieve the frame size and the number of frames in the packet?
Also, to calculate the theoretical frame duration I am using the following formula:
time = framesize (in bytes) * 8 / bitrate
This works well for MPEG-2 with different combinations of bitrate and sample rate. However, it does not seem to work for MPEG-1. Am I doing something wrong here?
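For what it's worth, frame duration is normally derived from the format's fixed samples-per-frame count rather than from the byte size: 1152 samples for MPEG-1 Layer III, 576 for MPEG-2 Layer III, and 1024 for AAC carried in ADTS. A hedged sketch (which of these constants your camera's stream actually uses is an assumption on my part):

#include <cstdio>

// Frame duration in seconds = samples_per_frame / sample_rate.
// samples_per_frame is fixed by the format, not by the frame's byte size.
double frameDuration(int samplesPerFrame, int sampleRate) {
    return (double)samplesPerFrame / sampleRate;
}

int main() {
    // Common constants (assumed, see the note above):
    //   MPEG-1 Layer III: 1152   MPEG-2 Layer III: 576   AAC in ADTS: 1024
    printf("MPEG-1 Layer III @ 44.1 kHz:  %.4f s\n", frameDuration(1152, 44100)); // ~0.0261
    printf("MPEG-2 Layer III @ 22.05 kHz: %.4f s\n", frameDuration(576, 22050));  // ~0.0261
    printf("AAC/ADTS @ 48 kHz:            %.4f s\n", frameDuration(1024, 48000)); // ~0.0213
}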
I'm making a small real-time audio-video application using DirectShow. I use SampleGrabber to grab samples from the Audio Capture filter. The SampleGrabber's callback is called every second and each sample's size is 88200 bytes. I printed the WAVEFORMATEX:
WAVE_FORMAT_PCM: true
nChannels: 2
nSamplesPerSec: 44100
nAvgBytesPerSec: 176400
nBlockAlign: 4
wBitsPerSample: 16
cbSize: 0
So I have two questions:
Is a 'sample' in DirectShow terms different from a 'sample' in audio recording? As far as I know there are 44100 samples per second (16 bits each), while DirectShow's SampleGrabber grabs only one sample per second (88200 bytes each). It looks like many samples are aggregated and put into a 'buffer'?
If many audio samples are put into one buffer, the buffer's size should be 176400 bytes per second. Why is it only 88200 bytes per buffer? Is only one channel used?
Directshow "sample" is a term for buffer with data:
When a pin delivers media data to another pin, it does not pass a direct pointer to the memory buffer. Instead, it delivers a pointer to a COM object that manages the memory. This object, called a media sample, exposes the IMediaSample interface.
Then
... size should be 176400 bytes per sec. Why it is only 88200 bytes per buffer?
No, it should not. You're seeing the default behavior of the capture filter, which is to produce 500 ms buffers. You can use the IAMBufferNegotiation interface (see the related questions, and search for others) to override this behavior.
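To make the arithmetic explicit (my own sketch of the numbers in the question, not DirectShow code): an 88200-byte media sample is exactly 500 ms of interleaved 16-bit stereo PCM, so both channels are present.

#include <cstdio>

int main() {
    // Values from the WAVEFORMATEX in the question.
    int nSamplesPerSec = 44100;
    int nChannels      = 2;
    int wBitsPerSample = 16;

    int nBlockAlign     = nChannels * wBitsPerSample / 8;   // 4 bytes per PCM frame
    int nAvgBytesPerSec = nSamplesPerSec * nBlockAlign;     // 176400 bytes/s
    int bufferBytes     = nAvgBytesPerSec / 2;               // 500 ms default buffer

    printf("bytes per second: %d\n", nAvgBytesPerSec);              // 176400
    printf("bytes per 500 ms buffer: %d\n", bufferBytes);           // 88200
    printf("PCM frames per buffer: %d\n", bufferBytes / nBlockAlign); // 22050
}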
I have an FPGA board and I wrote VHDL code that can receive images (in binary) from a serial port and save them in the SDRAM on my board. The FPGA then displays the images on a monitor via a VGA cable. My problem is that filling the SDRAM takes too long (about 10 minutes at a 115200 baud rate).
On my computer I wrote a Python script to send the image (in binary) to the FPGA via the serial port. The script reads a binary file saved on my hard disk and sends it to the FPGA.
My question is: if I use a buffer to hold my images instead of a binary file, will I get a better result? If so, can you help me with how to do that, please? If not, can you suggest a solution, please?
Thanks in advance.
Unless you are significantly compressing before download, and decompressing the image after download, the problem is your 115,200 baud transfer rate, not the speed of reading from a file.
At the standard N/8/1 line encoding, each byte requires 10 bits to transfer, so you will be transferring 11,520 bytes per second.
In 10 minutes, you will transfer 11,520 * 60 * 10 = 6,912,000 bytes. At 3 bytes per pixel (for R, G, and B), this is 2,304,000 pixels, which happens to be the number of pixels in a 1920 by 1200 image.
The answer is to (a) increase the baud rate; and/or (b) compress your image (using something simple to decompress on the FPGA like RLE, if it is amenable to that sort of compression).
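As a rough sanity check on option (a), here is a sketch of the transfer-time math; the 640 x 480 RGB image size is an assumption, since the question doesn't give the resolution:

#include <cstdio>

int main() {
    // Assumed image size (the question doesn't give one): 640 x 480, 3 bytes/pixel.
    long imageBytes  = 640L * 480 * 3;     // 921,600 bytes
    long bytesPerSec = 115200 / 10;        // N/8/1 framing: 10 bits per byte = 11,520 B/s

    printf("115200 baud: %.1f s per image\n", (double)imageBytes / bytesPerSec);    // ~80 s
    printf("921600 baud: %.1f s per image\n", (double)imageBytes / (921600 / 10));  // ~10 s
}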
I'm currently writing a small application that's making use of the FFmpeg library in order to decode audio files (especially avformat and swresample) in C++.
Now I need the total number of samples in an audio stream. I know that the exact number can only be found by actually decoding all the frames; I just need an estimate. What is the preferred method here? How can I find out the duration of a file?
There's some good info in this question about how to get info out of ffmpeg: FFMPEG Can't Display The Duration Of a Video.
To work out the number of samples in an audio stream, you need three basic bits of info:
The duration (in seconds)
The sample rate (in samples per second)
The number of channels in the stream (e.g. 2 for stereo)
Once you have that info, the total number of samples in your stream is simply [duration] * [rate] * [channels].
Note that this is not equivalent to bytes, as the samples are likely to be at least 16 bit, and possibly 24.
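A minimal sketch of that estimate using libavformat (assuming a recent FFmpeg; on older versions the channel count is par->channels rather than par->ch_layout.nb_channels):

// Build with: g++ estimate.cpp -lavformat -lavutil
extern "C" {
#include <libavformat/avformat.h>
}
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) return 1;

    AVFormatContext *fmt = nullptr;
    if (avformat_open_input(&fmt, argv[1], nullptr, nullptr) < 0) return 1;
    if (avformat_find_stream_info(fmt, nullptr) < 0) return 1;

    int idx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    if (idx < 0) return 1;
    AVCodecParameters *par = fmt->streams[idx]->codecpar;

    double seconds  = fmt->duration / (double)AV_TIME_BASE;  // container duration, an estimate
    int    rate     = par->sample_rate;
    int    channels = par->ch_layout.nb_channels;            // par->channels on older FFmpeg

    printf("~%.0f samples (%.2f s * %d Hz * %d ch)\n",
           seconds * rate * channels, seconds, rate, channels);

    avformat_close_input(&fmt);
    return 0;
}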
I believe what you need is the formula AUDIORATE / FRAMERATE. For instance, if ar=48000 and the video frame rate is, say, 50 fps, then 48000/50 = 960 audio samples per video frame.
The buffer calculation comes later, as samples_per_frame * nChannels * (audiobit/8).
AudioBit is usually 16 bits (24 or 32 bits are also possible). So for 8-channel audio at 16-bit 48 kHz, you'll need 960 * 8 * 2 = 15360 bytes per audio frame.
The official way to do this last calculation is to use the function:
av_samples_get_buffer_size(NULL, nChannels, SamplesPerFrame, audio_st->codec->sample_fmt, 0)
av_samples_get_buffer_size(NULL, 8, 960, audio_st->codec->sample_fmt, 0)
will also return 15360 (for experts: yes, I'm assuming the format is pcm_s16le).
So this answers the first part of your question. Hope that helps.
I'm working on a custom Windows DirectShow source filter based on CSource and CSourceStream for each pin. There are two pins - video output and audio output. Both pins work fine when individually rendered in graphedit and similar tools such as Graph Studio with correct time stamps, frame rates and sound. I'm rendering the video to the Video Mixing Renderer (VMR7 or VMR9).
However when I render both pins the video plays back too fast while the audio still sounds correct. The video plays back approximately 50% too fast but I think this is limited by the speed of decoding.
The timestamps on the samples are the same in both cases. If I render the audio stream to a null renderer (the one in qedit.dll) then the video stream plays back at the correct frame rate. The filter is a 32 bit filter running on a Win7 x64 system.
When I added support for IMediaSeeking seeking I found that the seeking bar for the audio stream behaved quite bizarrely. However the problem happens without IMediaSeeking support.
Any suggestions for what could be causing this or suggestions for further investigation?
The output types from the audio and video pin are pasted below:
Major type: Video
Subtype: RGB24
Format: Type VideoInfo
Video Size: 1024 x 576 pixels, 24 bit
Image Size: 1769472 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec.
Error rate: 0 bits/sec.
Avg. display time: 41708 µsec.
Major type: Video
Subtype: RGB32
Format: Type VideoInfo
Video Size: 1024 x 576 pixels, 32 bit
Image Size: 2359296 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec.
Error rate: 0 bits/sec.
Avg. display time: 41708 µsec.
Major type: Audio
Subtype: PCM audio
Sample Size: 3
Type WaveFormatEx
Wave Format: Unknown
Channels: 1
Samples/sec.: 48000
Avg. bytes/sec.:144000
Block align: 3
Bits/sample: 24
I realised the problem straight after posting the question. A case of debugging by framing the question correctly.
The audio stream had completely bogus time stamps. The audio and video streams played back fine individually but did not synch at all with each other when played together.
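For anyone hitting the same symptom, here is a hedged sketch of the usual fix; CAudioPin, m_samplesDelivered, m_sampleRate and friends are illustrative names, not the poster's code. The idea is to derive each buffer's start and stop times from a running sample count so the audio timestamps advance at exactly the rate the media type promises (48000 Hz, 1 channel, 3 bytes per sample in the dump above):

// Inside the audio pin's FillBuffer override. m_samplesDelivered is a LONGLONG
// member reset to 0 in OnThreadCreate.
HRESULT CAudioPin::FillBuffer(IMediaSample *pSample)
{
    BYTE *pData = nullptr;
    pSample->GetPointer(&pData);
    long bytes = pSample->GetSize();
    // ... generate 'bytes' bytes of audio into pData here ...
    pSample->SetActualDataLength(bytes);

    LONGLONG samplesInBuffer = bytes / (m_channels * m_bytesPerSample);

    // Timestamps are in 100 ns units, computed from the running sample count.
    REFERENCE_TIME tStart = m_samplesDelivered * 10000000LL / m_sampleRate;
    REFERENCE_TIME tStop  = (m_samplesDelivered + samplesInBuffer) * 10000000LL / m_sampleRate;
    m_samplesDelivered += samplesInBuffer;

    pSample->SetTime(&tStart, &tStop);
    pSample->SetSyncPoint(TRUE);   // every uncompressed PCM buffer is a sync point
    return S_OK;
}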