RTP timestamp not linear?

I was trying to reconstruct an audio conversation (an A-B call using G.711 audio) using the RTP timestamps. I filled silence gaps using the difference between two RTP timestamps and the sampling rate. The conversation went out of sync, and then I saw that the RTP timestamps are not linear. I was not able to derive exact clock time from the RTP timestamps, which resulted in sync issues. How do I calculate the exact time?

I have the same problem with a stream provided by GStreamer, which doesn't provide monotonic timestamps.
For example: the difference between the stamps should be exactly 1920, but it is between ~120 and ~3500, although on average 1920.
The problem here is that there is no way to find missing samples, because you never know whether a large difference comes from encoder delay or from a missing sample.
If you have only audio to decode, I would try to assign "valid" PTS values to each sample (in my case basetime+1920, basetime+3840, and so on).
The big problem comes when video AND audio are combined. Then this trick doesn't work well when samples are missing, and there is no way to find out when that is the case :(
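To illustrate the audio-only re-stamping trick, a minimal sketch (the function name is illustrative, and the 1920-sample buffer size is just the value from this answer):

#include <cstdint>

// Re-stamp audio-only buffers with synthetic, strictly monotonic PTS values,
// ignoring the jittery stamps coming out of the encoder.
uint64_t synthetic_pts(uint64_t basetime, uint64_t buffer_index,
                       uint64_t samples_per_buffer = 1920) {
    return basetime + buffer_index * samples_per_buffer;
}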

When you want to send RTP you should pay attention to two things: how much the timestamp is incremented and how the packets are paced.
The timestamp is incremented according to the amount of data sent.
E.g. for PT=10 you may have this pattern:
1160 bytes, timestamp increment 1154, then wait 26 ms.
Let's see how this calculation happens:
Number of packets to be sent in one second: 1/(26 ms) ≈ 38
Timestamp increment: clock rate / packets per second ≈ 1154
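A rough illustration of that arithmetic (PT=10 is L16 stereo with a 44100 Hz RTP clock, per RFC 3551; the exact increment depends on the sender's real pacing):

#include <cstdio>

int main() {
    // PT=10 is L16 stereo at a 44100 Hz RTP clock (RFC 3551).
    const double clock_rate = 44100.0;
    const double packet_interval_ms = 26.0;

    // Packets sent per second at that pacing.
    double packets_per_second = 1000.0 / packet_interval_ms;  // ~38.46

    // Timestamp units the clock advances per packet. This comes out
    // near 1147; the answer's figure of 1154 implies the sender's
    // actual interval differs slightly from an exact 26 ms.
    double ts_increment = clock_rate / packets_per_second;

    std::printf("packets/s = %.2f, timestamp increment = %.0f\n",
                packets_per_second, ts_increment);
    return 0;
}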

According to RFC 3550 (https://www.ietf.org/rfc/rfc3550.txt):
The sampling instant MUST be derived from a clock that increments
monotonically
It's not a choice, nor an option. By the way, please read the full description of the timestamp field of the RTP packet; I found this there as well:
As an example, for fixed-rate audio
the timestamp clock would likely increment by one for each
sampling period. If an audio application reads blocks covering
160 sampling periods from the input device, the timestamp would be
increased by 160 for each such block, regardless of whether the
block is transmitted in a packet or dropped as silent.
If you want to check linearity, use the RTP and NTP timestamp fields of the RTCP SR. In an SR report the RTP timestamp corresponds to the NTP timestamp.
So take the differences of consecutive RTP timestamps (let's call them dRTP_1, dRTP_2, ...) and the differences of consecutive NTP timestamps (let's call them dNTP_1, dNTP_2, ...), then divide each dRTP_i by the clock rate and check whether you get dNTP_i. The sketch below shows the idea.
But first, please read the RFC.
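A minimal sketch of that linearity check, assuming the (NTP, RTP) timestamp pairs have already been parsed out of consecutive SR reports (the parsing itself is not shown, and the names are illustrative):

#include <cstdint>
#include <cstdio>
#include <cmath>
#include <vector>

struct SrSample {
    double ntp_seconds;  // NTP timestamp from the SR, converted to seconds
    uint32_t rtp_ts;     // RTP timestamp from the same SR
};

// Returns true if RTP timestamps advance consistently with NTP wall time.
bool check_linearity(const std::vector<SrSample>& srs, double clock_rate,
                     double tolerance_s = 0.001) {
    for (size_t i = 1; i < srs.size(); ++i) {
        double dNTP = srs[i].ntp_seconds - srs[i - 1].ntp_seconds;
        // Unsigned subtraction handles 32-bit RTP timestamp wraparound.
        double dRTP = static_cast<uint32_t>(srs[i].rtp_ts - srs[i - 1].rtp_ts);
        double dRTP_seconds = dRTP / clock_rate;
        if (std::fabs(dRTP_seconds - dNTP) > tolerance_s) {
            std::printf("non-linear at SR %zu: dRTP=%.4fs dNTP=%.4fs\n",
                        i, dRTP_seconds, dNTP);
            return false;
        }
    }
    return true;
}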

Related

How to properly validate ffmpeg pts/dts after demuxing/decoding?

How should I validate pts/dts after demuxing and then after decoding?
For me it is significant to have valid pts all the time, for days and
possibly weeks of continuous streaming.
After demuxing I check:
dts <= pts
prev_packet_dts < next_packet_pts
I also discard packets with AV_NOPTS_VALUE and wait for packets with a
proper pts, because I don't know the video duration in this case.
The pts of packets may be non-increasing because of I-P-B frames.
Is this all right?
What about decoded AVFrames?
Should pts be increasing all the time?
Why could pts lag behind dts at some point?
Why is pict_type a field of AVFrame? Shouldn't it be on AVPacket, since
AVPacket is a compressed frame, not the other way around?
Ideally, yes. Unless your format allows discontinuities, or wraps timestamps around due to overflow, like MPEG-TS.
That would be a writing error on the muxer's side.
It is an informational field indicating the provenance of the frame. It can be used by filters or encoders, e.g. for keyframe alignment during a re-encode.
At libav support I was advised not to rely on decoder output. It is more robust to produce pts/dts for encoding/muxing manually, and I should look at the sources of the ffmpeg tools for a proper implementation. I will look into that approach.
For now I discard only AVFrames with AV_NOPTS_VALUE, and the rest of the encoding/muxing works fine.
Validation of AVPackets after demuxing remains the same as described above.
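For reference, a minimal sketch of the packet-level checks described in the question, using the FFmpeg C API (the function name and the per-stream last_dts bookkeeping are assumptions of this sketch):

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <cstdint>
#include <map>

// Last seen dts per stream index, so consecutive packets can be compared.
static std::map<int, int64_t> last_dts;

// Returns true if a demuxed AVPacket passes the checks described above.
bool packet_timestamps_ok(const AVPacket* pkt) {
    // Discard packets without proper timestamps.
    if (pkt->pts == AV_NOPTS_VALUE || pkt->dts == AV_NOPTS_VALUE)
        return false;

    // dts <= pts: a frame cannot be presented before it is decoded.
    if (pkt->dts > pkt->pts)
        return false;

    // prev_packet_dts < next_packet_pts, per stream.
    auto it = last_dts.find(pkt->stream_index);
    if (it != last_dts.end() && pkt->pts <= it->second)
        return false;

    last_dts[pkt->stream_index] = pkt->dts;
    return true;
}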

MIDI file events to "real-time" without PPQ or SMPTE in header

I am writing a simple C++ synthesizer with MIDI playback. I've already implemented playback, but some MIDI files are missing the PPQ or SMPTE information (or the data is invalid, e.g. all data bytes are 0), and if I use "default" values for PPQ (e.g. 24) and the tempo from the event (in these files there is only one tempo event), playback is too slow or too fast. In those cases I correct the value by hand. But if I import such a MIDI file into any DAW, it reads the file correctly and plays the melody at the target BPM.
How do I correctly convert event ticks to real time in this case? What am I missing, and what do DAWs do here?
The ticks-per-quarter-note value is part of the header chunk, so it is present in every file.
If this value is zero, then the file is invalid and cannot be played at all.
For tempo and time signature, the default values are defined in the SMF specification:
All MIDI Files should specify tempo and time signature. If they don't, the time signature is assumed to be 4/4, and the tempo 120 beats per minute.
(120 BPM is the same as a tempo value of 500,000 microseconds per quarter note.)
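A minimal sketch of the standard conversion, falling back to the spec default quoted above when the file has no tempo event (the function name is illustrative):

#include <cstdint>

// Convert a delta time in MIDI ticks to seconds.
// ppq: ticks per quarter note, from the division field of the header chunk.
// tempo_us_per_qn: microseconds per quarter note, from the last Set Tempo
// meta event seen; defaults to 500000 (120 BPM) if the file has none.
double ticks_to_seconds(uint32_t ticks, uint16_t ppq,
                        uint32_t tempo_us_per_qn = 500000) {
    double seconds_per_tick = (tempo_us_per_qn / 1e6) / ppq;
    return ticks * seconds_per_tick;
}

For example, at PPQ 480 and the default tempo, 480 ticks come out as 0.5 seconds, i.e. one quarter note.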

Decoding an unknown CRC or checksum?

I've been trying to decode the CRC or checksum algorithm used for the serial communication between a drone and its camera for about a week, without much luck, and I was wondering if anybody here sees something I am missing or has any suggestions.
A typical packet looks like this:
FE1A390100020001AE0BE0FF090046250B00040000004E0D32080008540D8808F4016B54
They always start with 0xFE. The 2nd byte is the total size of the packet minus 10 bytes. The packet sizes vary, but I think I am specifically interested in the 0x1A size. Byte 3 seems to be a packet counter, because it usually increases by 1, but sometimes I have seen it jump to a completely different number for a few packets (usually when changing to a 0x22-size packet) before resuming the increment-by-1 sequence. The last 2 bytes always change, and I believe they are the checksum or CRC. All the rest of the bytes seem to stay the same from one 0x1A packet to the next unless I manipulate the drone's radio controls.
Right after powering up there is a series of packets that I assume is for initializing the communication. They are the shortest packets and have the least amount of change between them, so they seem like they might be the easiest to look at. Here are the first 7 packets sent after powering it on.
From Drone to camera
Time:
8.3982205 FE030001000000010200018F68
8.39934725 FE03010100000001020001A844
8.400473958 FE03020100000001020001C130
8.401600708 FE050301000000000000000001AAE8
8.402900792 FE1A040100020001000000000000000000000C000300000853060008AB028808F4014629
8.406020958 FE22050100030002000000000000000000000000000000000000B3FFFFFFDE22006300FF615110050000C956
8.4098345 FE1A060100020001000000000000000000000C000300000853060008AB028808F40180A9
If I put the first 3 packets into reveng with -w 16 -s then it comes back with:
reveng: warning: you have only given 3 samples
reveng: warning: to reduce false positives, give 4 or more samples
width=16 poly=0x1487 init=0x0334 refin=false refout=false xorout=0x0000 check=0xa5b9 residue=0x0000 name=(none)
If I add the 4th packet it finds the same poly, but the rest of it looks different:
width=16 poly=0x1487 init=0x417d refin=false refout=false xorout=0x5582 check=0xbfa2 residue=0xb059 name=(none)
If I add the 5th packet, reveng comes back with no model found.
However, if I remove packet 4 and run it with packets 1, 2, 3, and 5, it finds the same poly again, but different values for the rest:
width=16 poly=0x1487 init=0x804b refin=false refout=false xorout=0x0138 check=0x7dcc residue=0xc8ca name=(none)
Most combinations of packets containing a 0x1A-size packet and the first 3 initialization packets come back from reveng with 'no model found'. So far, every reveng run using only 0x1A-sized packets has failed to find a model.
I think it is possible that after the initialization packets it somehow incorporates information it receives from the camera into the CRC calculation for the data going from the drone to the camera, but there isn't a lot of data in those packets. Here are the first 9 packets sent from the camera to the drone. Prior to the first 0x1A packet being sent from the drone, the only data sent from the camera seems to be 0x7D0001.
From camera to drone:
Time
3.474456792 FE0500020000000000007D00013D40
4.475220208 FE0501020000000000007D000168C5
5.476483875 FE0502020000000000007D00018642
6.477295958 FE0503020000000000007D0001D3C7
7.4783405 FE0504020000000000007D00014B45
8.479420458 FE06050200010003FA078538B838B3
8.480811667 FE0506020000000000007D0001F047
9.48057875 FE0507020000000000007D0001A5C2
9.481883 FE06080200010003F9078638B8386037
I have tried incorporating 0x7D0001 into the packets and running them through reveng, but that didn't seem to help.
I have also tried reveng -w 8 -s on various combinations of packets without finding a model, and I have tried various checksum algorithms manually (possibly incorrectly) without success.
I have a bunch more data that I have captured here:
https://drive.google.com/open?id=1v8MCaXOvP_2Wv_hcaqhUZnXvqNI1_2Ur
Any ideas? Suggestions? This has been driving me nuts for a week.
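For reference, a straightforward MSB-first CRC-16 with the parameters reveng reported above (width=16, poly=0x1487, refin=false, refout=false) can be implemented as below, so candidate init/xorout values can be tested programmatically instead of by hand; the defaults here are just the candidates from the first reveng model, not a confirmed fit:

#include <cstddef>
#include <cstdint>

// Bitwise MSB-first CRC-16 in reveng's parameter style
// (width=16, refin=false, refout=false). poly/init/xorout
// default to the candidates reported above.
uint16_t crc16(const uint8_t* data, size_t len,
               uint16_t poly = 0x1487, uint16_t init = 0x0334,
               uint16_t xorout = 0x0000) {
    uint16_t crc = init;
    for (size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint16_t>(data[i]) << 8;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ poly)
                                 : static_cast<uint16_t>(crc << 1);
    }
    return crc ^ xorout;
}

Running crc16(packet, length - 2) over each captured packet and comparing against its trailing two bytes makes it possible to sweep candidate (init, xorout) pairs over the whole capture.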

IMediaSample Time and MediaTime

What is the primary difference between SetTime and SetMediaTime?
Right now, in my DirectShow live source, I calculate the time like this:
REFERENCE_TIME rtStart = m_rtLastSampleTime;
m_rtLastSampleTime += pVih->AvgTimePerFrame;
pms->SetTime(&rtStart, &m_rtLastSampleTime);
pms->SetSyncPoint(TRUE);
pms->SetDiscontinuity(rtStart <= 1);
This doesn't work with some encoders.
I've noticed that sources that do work with these encoders set the media time, and those values seem to jump up.
Media Times:
Optionally, the filter can also specify a media time for the sample. In a video stream, media time represents the frame number. In an audio stream, media time represents the sample number in the packet. For example, if each packet contains one second of 44.1 kilohertz (kHz) audio, the first packet has a media start time of zero and a media stop time of 44100. In a seekable stream, the media time is always relative to the start time of the stream. For example, suppose you seek to 2 seconds from the start of a 15-fps video stream. The first media sample after the seek has a time stamp of zero but a media time of 30.
Renderer and mux filters can use the media time to determine whether frames or samples have been dropped, by checking for gaps. However, filters are not required to set the media time. To set the media time on a sample, call the IMediaSample::SetMediaTime method.
I don't think it is actually used anywhere. SetTime, on the contrary, is important.
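Following the documentation quoted above, a sketch of setting the media time alongside the stream time in the live source (m_llFrameNumber is an assumed per-filter frame counter, not something from the original code):

// Sketch: inside the live source's FillBuffer(IMediaSample *pms) override.
// m_llFrameNumber is an added LONGLONG member, initialized to 0.
REFERENCE_TIME rtStart = m_rtLastSampleTime;
m_rtLastSampleTime += pVih->AvgTimePerFrame;
pms->SetTime(&rtStart, &m_rtLastSampleTime);

// Media time for a video stream is the frame number; downstream filters
// can detect dropped frames by looking for gaps in these values.
LONGLONG llMediaStart = m_llFrameNumber;
LONGLONG llMediaStop  = m_llFrameNumber + 1;
pms->SetMediaTime(&llMediaStart, &llMediaStop);
m_llFrameNumber++;

pms->SetSyncPoint(TRUE);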

Calculate TS File Duration

I am working on a media player application which plays ISDB-T audio and video.
I am using GStreamer for decoding and rendering.
For AV sync to work perfectly, I should regulate the file reads so that data is pushed to GStreamer neither too fast nor too slow.
If I know the duration of the TS file beforehand, then I can regulate my reads. But how do I calculate the TS file duration?
Because I need to verify the application with multiple TS files, I cannot calculate the duration with some external utility and keep changing the file reads. How can this be achieved in a program?
Thanks,
Kranti
If you have sufficient knowledge of the encoding and the PES layer inside the transport stream, you can read the timestamps within the TS and calculate the duration yourself.
It requires seeking to the end of the file, searching for the last timestamp, and subtracting the first timestamp of the same program at the beginning of the file.
EDIT: In addition to the above method, you need to include the duration of the last frame:
((last_pts - first_pts) + frame_duration) / pts_resolution
Let's say you have a 30 fps segment with a duration of 6.006 s:
((1081080 - 543543) + 3003) / 90000 = 6.006
In most cases each PES header contains a PTS and/or DTS, which is measured against a 90 kHz clock. So the steps may include:
Find the program you need to demux from the MPEG TS.
Find the PID of the stream.
Find the first TS packet with that PID and payload_unit_start_indicator set to 1; that is the start of a PES frame, which contains a PES header.
Parse the PES header to find the starting PTS of the stream.
Parse the file backwards from the end to find a packet with the same PID and payload_unit_start_indicator set, which will contain the last PTS.
Take their difference and divide it by 90000 to get the duration in seconds. A sketch of the PTS extraction is below.
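A minimal sketch of the PTS extraction those steps describe, assuming you already have a 188-byte TS packet for the right PID with payload_unit_start_indicator set (adaptation fields and other edge cases are deliberately omitted):

#include <cstdint>

// Extract the 33-bit PTS from a TS packet whose payload starts a PES header.
// Returns -1 if no PTS is found. Assumes no adaptation field, for brevity.
int64_t pts_from_ts_packet(const uint8_t* pkt /* 188 bytes */) {
    if (pkt[0] != 0x47) return -1;              // TS sync byte
    if (!(pkt[1] & 0x40)) return -1;            // payload_unit_start_indicator

    const uint8_t* pes = pkt + 4;               // payload right after TS header
    if (pes[0] != 0x00 || pes[1] != 0x00 || pes[2] != 0x01)
        return -1;                              // PES start code prefix

    uint8_t pts_dts_flags = pes[7] >> 6;
    if ((pts_dts_flags & 0x2) == 0) return -1;  // no PTS present

    const uint8_t* p = pes + 9;                 // 5-byte PTS field
    int64_t pts = (int64_t)(p[0] & 0x0E) << 29 |
                  (int64_t)(p[1])        << 22 |
                  (int64_t)(p[2] & 0xFE) << 14 |
                  (int64_t)(p[3])        << 7  |
                  (int64_t)(p[4] >> 1);
    return pts;
}

// duration in seconds = (last_pts - first_pts) / 90000.0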