How to calculate MPEG-1/2 frame duration - RTP

I am trying to make a .mp4 file from the audio and video rtp packets of an IP camera.
When the audio format in the camera is configured as MPEG-2 ADTS, I receive RTP packets with a payload size of 144 bytes, but each payload is made of two audio frames of 72 bytes each.
However, when I configure the format as MPEG-1, the payload consists of only one audio frame.
What is the reason for this difference? Can I get this information from some bits of the payload, as I do for the bitrate, sample rate, etc.? I have read that the theoretical packet size is 144 bytes, so how can I retrieve the frame size and the number of frames in the packet?
Besides, in order to calculate the theoretical frame duration I am using the following formula:
time = frame size (in bytes) * 8 / bitrate
This works well for MPEG-2 with different combinations of bitrate and sample rate. However, it does not seem to work for MPEG-1. Am I doing something wrong here?
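For reference, here is a minimal sketch of both ways of computing the duration: the bitrate-based formula above, and the samples-per-frame approach. The samples-per-frame values in the comments (1024 for AAC/ADTS, 1152 for MPEG-1 Layer II/III) are the usual ones and are assumptions here; check them against your actual stream.

// Two ways to estimate audio frame duration (sketch only).
#include <cstdio>

// Formula from the question: duration from frame size and a constant bitrate.
double durationFromBitrate(int frameSizeBytes, int bitrateBitsPerSec)
{
    return (frameSizeBytes * 8.0) / bitrateBitsPerSec;   // seconds
}

// Alternative: duration from samples per frame and sample rate.
// Typical values: 1024 samples for AAC/ADTS, 1152 for MPEG-1 Layer II/III.
double durationFromSamples(int samplesPerFrame, int sampleRateHz)
{
    return static_cast<double>(samplesPerFrame) / sampleRateHz;  // seconds
}

int main()
{
    std::printf("%f s\n", durationFromBitrate(72, 64000));    // a 72-byte frame at 64 kbit/s
    std::printf("%f s\n", durationFromSamples(1024, 16000));  // AAC-sized frame at 16 kHz
    return 0;
}

The second form is independent of the bitrate, so it also works when the stream is not strictly constant-bitrate.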

Related

How to determine the raw AAC frame size without an ADTS header in .mp4?

I am trying to restore corrupted video with sound. The video restores well because each video frame begins with a header that contains its size. But in .mp4, AAC is saved as a raw byte stream without ADTS headers.
At first I wanted to take all the bytes between two video frames and save them as one audio frame, but there may be more than one audio frame in that range. Each AAC frame starts with 0x21 ..., so I tried to split the received data on the 0x21 separator, but that does not always work because the value 0x21 can also occur inside the AAC sample data itself.
I found out that when ffmpeg reads the .mp4 format it parses the moov atom -> ... -> stsz (Sample Size) box. But if the moov atom is lost, how can I restore the correct AAC sizes for the audio frames?
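For reference, the stsz box that ffmpeg parses is just a flat table of per-sample byte sizes. A minimal reading sketch (assuming a standard ISO BMFF layout with big-endian fields) also shows why the frame boundaries cannot be reconstructed once moov is gone:

// Sketch of parsing the stsz ("Sample Size") box from the moov atom.
#include <cstdint>
#include <cstdio>
#include <vector>

static uint32_t readU32BE(const uint8_t *p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// 'box' points at the first byte after the 4-byte size and 4-byte 'stsz' type.
std::vector<uint32_t> parseStsz(const uint8_t *box)
{
    // box[0..3]: 1 byte version + 3 bytes flags
    uint32_t sampleSize  = readU32BE(box + 4);   // 0 means "sizes vary per sample"
    uint32_t sampleCount = readU32BE(box + 8);

    std::vector<uint32_t> sizes;
    if (sampleSize != 0) {
        sizes.assign(sampleCount, sampleSize);   // constant-size samples
    } else {
        for (uint32_t i = 0; i < sampleCount; ++i)
            sizes.push_back(readU32BE(box + 12 + 4 * i));
    }
    return sizes;
}

The sizes live only in this table, not in the mdat payload, which is why losing moov loses the AAC frame sizes with it.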

FFmpeg resample audio while decoding

I have the task of building a decoder that generates exactly one raw audio frame for each raw video frame from an encoded MPEG-TS network stream, so that users of the API can call getFrames() and receive exactly these two frames.
Currently I am reading with av_read_frame in a thread and decoding packets as they arrive, audio or video, collecting them until a video packet is hit. The problem is that, in general, multiple audio packets are received before a video packet is seen.
av_read_frame is blocking and returns when a certain amount of audio data has been collected (1152 samples for MP2); decoding that packet gives a raw AVFrame with a duration of T (depending on the sample rate), whereas the video frame generally has a duration larger than T (depending on the fps), so multiple audio frames arrive before it.
I am guessing I have to find a way to merge the collected audio frames into one single frame just when the video frame arrives. Resampling and setting timestamps to align with the video is probably also needed. I don't know if this is even a valid approach, though.
What is the smoothest way to sync video and audio in this manner?
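One possible approach (a sketch only, with the sample format, channel count and samples-per-video-frame as placeholders) is to accumulate the decoded audio in libavutil's audio FIFO and pull out exactly one merged frame whenever a video frame arrives:

// Sketch: merge several decoded audio AVFrames into one frame per video frame.
extern "C" {
#include <libavutil/audio_fifo.h>
#include <libavutil/channel_layout.h>
#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
}

struct AudioAccumulator {
    AVAudioFifo   *fifo;
    int            channels;
    AVSampleFormat fmt;
    int            sampleRate;
};

AudioAccumulator makeAccumulator(AVSampleFormat fmt, int channels, int sampleRate)
{
    return AudioAccumulator{av_audio_fifo_alloc(fmt, channels, 1), channels, fmt, sampleRate};
}

// Call for every decoded audio AVFrame.
// If the sample rate has to change as well, an swr_convert step would go
// before this write; it is omitted here.
void pushAudio(AudioAccumulator &acc, AVFrame *decoded)
{
    av_audio_fifo_write(acc.fifo, reinterpret_cast<void **>(decoded->data),
                        decoded->nb_samples);
}

// Call when a video frame arrives; returns one merged audio frame covering
// samplesPerVideoFrame samples, or nullptr if not enough audio has been collected.
AVFrame *popMergedAudio(AudioAccumulator &acc, int samplesPerVideoFrame)
{
    if (av_audio_fifo_size(acc.fifo) < samplesPerVideoFrame)
        return nullptr;

    AVFrame *out     = av_frame_alloc();
    out->format      = acc.fmt;
    out->nb_samples  = samplesPerVideoFrame;
    out->sample_rate = acc.sampleRate;
    av_channel_layout_default(&out->ch_layout, acc.channels);  // FFmpeg 5.1+ API
    av_frame_get_buffer(out, 0);
    av_audio_fifo_read(acc.fifo, reinterpret_cast<void **>(out->data),
                       samplesPerVideoFrame);
    return out;
}

The merged frame's pts would then be set from the corresponding video frame's timestamp (rescaled into the audio time base), so both frames returned by getFrames() cover the same interval.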

Is reading from a buffer quicker than reading from a file in Python?

I have an FPGA board and I wrote VHDL code that receives images (in binary) from the serial port and saves them in the SDRAM on my board. The FPGA then displays the images on a monitor via a VGA cable. My problem is that filling the SDRAM takes too long (about 10 minutes at a 115200 baud rate).
On my computer I wrote Python code to send an image (in binary) to the FPGA via the serial port. My code reads a binary file saved on my hard disk and sends it to the FPGA.
My question is: if I use a buffer to hold my images instead of a binary file, will I get a better result? If so, can you help me with how to do that, please? If not, can you suggest a solution, please?
Thanks in advance.
Unless you are significantly compressing before download and decompressing the image after download, the problem is your 115,200 baud transfer rate, not the speed of reading from a file.
At the standard N/8/1 line encoding, each byte requires 10 bits to transfer, so you will be transferring 11,520 bytes per second.
In 10 minutes, you will transfer 11,520 * 60 * 10 = 6,912,000 bytes. At 3 bytes per pixel (for R, G, and B), this is 2,304,000 pixels, which happens to be the number of pixels in a 1920 by 1200 image.
The answer is to (a) increase the baud rate and/or (b) compress your image (using something simple to decompress on the FPGA, like RLE, if the image is amenable to that sort of compression).
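If the images do have long runs of identical bytes, a byte-oriented RLE encoder is only a few lines on the sending side (a sketch only, not tuned for any particular pixel format; the matching decoder on the FPGA is correspondingly simple):

// Minimal RLE sketch: each output pair is (run length, value), runs capped at 255.
#include <cstdint>
#include <vector>

std::vector<uint8_t> rleEncode(const std::vector<uint8_t> &in)
{
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size(); ) {
        uint8_t value = in[i];
        size_t  run   = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255)
            ++run;
        out.push_back(static_cast<uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

Note that RLE only pays off when runs are long; on noisy photographic data it can even grow the file, so measure before committing to it.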

FFmpeg: How to estimate number of samples in audio stream?

I'm currently writing a small application that's making use of the FFmpeg library in order to decode audio files (especially avformat and swresample) in C++.
Now I need the total number of samples in an audio stream. I know that the exact number can only be found out by actually decoding all the frames, I just need an estimation. What is the preferred method here? How can I find out the duration of a file?
There's some good info in this question about how to get info out of ffmpeg: FFMPEG Can't Display The Duration Of a Video.
To work out the number of samples in an audio stream, you need three basic bits of info:
The duration (in seconds)
The sample rate (in samples per second)
The number of channels in the stream (e.g. 2 for stereo)
Once you have that info, the total number of samples in your stream is simply [duration] * [rate] * [channels].
Note that this is not equivalent to bytes, as the samples are likely to be at least 16 bit, and possibly 24.
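A rough sketch of that estimate with the avformat API (the container-level duration is in AV_TIME_BASE units, i.e. microseconds, and ch_layout.nb_channels assumes FFmpeg 5.1 or newer):

// Sketch: estimate the total sample count of the first audio stream.
extern "C" {
#include <libavformat/avformat.h>
}
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;

    AVFormatContext *fmt = nullptr;
    if (avformat_open_input(&fmt, argv[1], nullptr, nullptr) < 0) return 1;
    avformat_find_stream_info(fmt, nullptr);

    for (unsigned i = 0; i < fmt->nb_streams; ++i) {
        AVCodecParameters *par = fmt->streams[i]->codecpar;
        if (par->codec_type != AVMEDIA_TYPE_AUDIO) continue;

        double  seconds  = fmt->duration / (double)AV_TIME_BASE;
        int     channels = par->ch_layout.nb_channels;   // FFmpeg >= 5.1
        int64_t estimate = (int64_t)(seconds * par->sample_rate) * channels;

        std::printf("~%lld samples (%f s, %d Hz, %d ch)\n",
                    (long long)estimate, seconds, par->sample_rate, channels);
        break;
    }
    avformat_close_input(&fmt);
    return 0;
}

The result is only an estimate, since the container duration itself may be approximate; an exact count still requires decoding every frame.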
I believe what you need is the formula AUDIO_RATE / FRAME_RATE. For instance, if ar=48000 and the frame rate of the video is, say, 50 fps, then 48000 / 50 = 960 samples per video frame.
The buffer calculation comes later as samples_per_frame * nChannels * (audio_bit_depth / 8).
The audio bit depth is usually 16 bits (24 or 32 bits are also possible). So for 8-channel audio at 16 bits and 48 kHz, you'll need 960 * 8 * 2 = 15360 bytes per audio frame.
The official way to do this last calculation is to use the
av_samples_get_buffer_size(NULL, nChannels, SamplesPerFrame, audio_st->codec->sample_fmt, 0)
function.
av_samples_get_buffer_size(NULL, 8, 960, audio_st->codec->sample_fmt, 0)
will also return 15360 (for experts: yes, I'm assuming the format is pcm_s16le).
So this answers the first part of your question. Hope that helps.

Writing 4:2:0 YUV raw data into an AVI file via DirectShow in C++

I'm trying to write some 4:2:0 raw data received from a capture card into an AVI file. For every pixel the char buffer contains 2 bytes (16 bits). The order of the data is the same as FOURCC UYVY: YUV 4:2:2 (a Y sample at every pixel, with U and V sampled at every second pixel horizontally on each line). A macropixel contains 2 pixels in 1 u_int32.
First I tried the OpenCV VideoWriter. But this is simply too slow for this huge amount of video data (I'm capturing 2 video streams, each in 1080p25 format), so I switched to the "Video for Windows" library. But even that one doesn't manage to write the file in real time. My last chance is DirectShow. I want to use the AVI Mux and File Writer filters to store my raw data as an AVI file, but I'm not sure how to "give" the AVI Mux my raw data (char array), which contains just video data in UYVY order and no audio. Maybe you can give me some advice. This is what I've got so far:
CoInitialize(NULL);
// Build the filter graph and get its media control interface.
IGraphBuilder *pGraph = NULL;
CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void **)&pGraph);
IMediaControl *pMediaControl = NULL;
pGraph->QueryInterface(IID_IMediaControl, (void **)&pMediaControl);
// Capture graph builder; it has to be attached to the graph before use.
ICaptureGraphBuilder2 *pCapture = NULL;
CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC, IID_ICaptureGraphBuilder2, (void **)&pCapture);
pCapture->SetFiltergraph(pGraph);
// AVI Mux + File Writer, writing to Test.avi.
IBaseFilter *pMux = NULL;
pCapture->SetOutputFileName(&MEDIASUBTYPE_Avi, L"Test.avi", &pMux, NULL);
// pCap should be the source filter that delivers the UYVY buffers;
// creating and adding it to the graph is exactly the missing piece.
IBaseFilter *pCap = NULL;
pCapture->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, pCap, NULL, pMux);
Thx a lot and Regards,
Valentin
(Since you mentioned 10 fps in a previous question, which I assume is the effective frame rate) Are you writing dual 1920x1080 streams at 12 bits per pixel and 10 fps into a file? That is about 60 megabytes per second; you might simply be hitting your HDD's write throughput limit.
Choosing a different API is not going to help if your HDD is not fast enough. You need to either compress the data, or lower the resolution or FPS. Or use faster drives.