I have received video packets (H.264) and audio packets (AAC) separately from the network (somewhere), and I can get the NTP time of every packet (audio or video).
Now I want to write them to an MP4 file at 25 fps. How can I do that?
I have tried to do it using FFmpeg, but failed. Please help me, thanks.
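One way to try this with the ffmpeg command line, assuming the received packets are first dumped to plain elementary-stream files (Annex-B H.264 and ADTS AAC; the file names below are placeholders), is a straight remux that forces the 25 fps rate. Note that this ignores the per-packet NTP times, so A/V sync depends entirely on the constant-frame-rate assumption:
ffmpeg -framerate 25 -i video.h264 -i audio.aac -c copy -bsf:a aac_adtstoasc output.mp4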
I have a GStreamer pipeline created with the Python GStreamer bindings, set up to play a headset's microphone back through the headset's speaker. This works fine, playing through a pipeline like this:
JackAudioSrc -> GstAudioMixer -> Queue -> GstJackAudioSink
Then, many seconds later, I want to play a short 10-second .wav file into the pipeline so that it is mixed with the microphone and heard on the headset. To do this, a GstFileSrc is dynamically added to the GstAudioMixer to mix the wav file into the headset's speaker, which gives a pipeline like this:
GstJackAudioSrc -----------------> GstAudioMixer -> Queue -> GstJackAudioSink
GstFileSrc -> GstWavParse ------->/
When the GstFileSrc and GstWavParse are dynamically added to a sink pad of the mixer at, say, 6 seconds after the pipeline started, only the last 4 seconds of the wav file are heard.
The problem seems to be that the wav file seeks to a position relative to when the pipeline started PLAYING.
I have tried changing "do-timestamp" on a multifilesrc, setting "sync"=True on a GstIdentity, looking for a way to make a filesrc behave as live, and many other things, but to no avail.
However, the whole 10-second wav file plays nicely if the pipeline is set to Gst.State.NULL and then back to Gst.State.PLAYING when the filesrc is added at 6 seconds. This works because the pipeline time gets reset to zero, but it produces a click on the headset, which is unacceptable.
How can I ensure that the wav file plays from its beginning, so that the whole 10 seconds is heard on the headset, no matter when it is added to the pipeline?
An Update:
I can now get the timing of the wav file correct by adding a clocksync before the wavparse and setting its timestamp offset:
# current pipeline position in nanoseconds
nanosecs = pipeline.query_position(Gst.Format.TIME)[1]
clocksync.set_property("ts-offset", nanosecs)
The start/stop times are now correct, but the wav audio is corrupted and heard as nothing but clicks and blips; at least it starts and finishes at the right times. Note that without the clocksync the wav audio is perfectly clear, it just starts and stops at the wrong time. So the ts-offset is somehow corrupting the audio.
Why is the audio being corrupted?
So I got this working, and the answer is not to use the clocksync, but instead to request a mixer sink pad and call set_offset(nanosecs) on that pad before linking the wavparse to the mixer:
def wav_callback(pad, pad_probe_info, userdata=None):
    # link the wav branch and start it once the mixer pad is idle
    wavparse.link(audio_mixer)
    wav_bin.set_state(Gst.State.PLAYING)
    return Gst.PadProbeReturn.REMOVE

sink_pad = audio_mixer.get_request_pad("sink_%u")
nanosecs = pipeline.query_position(Gst.Format.TIME)[1]
sink_pad.set_offset(nanosecs)
sink_pad.add_probe(Gst.PadProbeType.IDLE, wav_callback)
Then if the wav file needs to be rewound/replayed:
def replay_wav():
    global wav_bin
    global sink_pad
    # rewind the wav bin to the start, then move the mixer pad offset
    # forward to the pipeline's current position
    wav_bin.seek_simple(Gst.Format.TIME, Gst.SeekFlags.FLUSH, 0)
    nanosecs = pipeline.query_position(Gst.Format.TIME)[1]
    sink_pad.set_offset(nanosecs)
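For context, the snippets above assume that pipeline, audio_mixer, wavparse and wav_bin already exist; the post does not show how they were built. Below is a minimal sketch of one plausible construction (the element names, file path and ghost-pad linking are assumptions, not code from the post; with this layout you would link the bin's ghost src pad to the requested mixer pad inside the probe callback rather than calling wavparse.link(audio_mixer)):
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Bin wrapping filesrc ! wavparse so the wav branch can be handled as one unit.
wav_bin = Gst.Bin.new("wav_bin")
filesrc = Gst.ElementFactory.make("filesrc", "wav_src")
filesrc.set_property("location", "/path/to/clip.wav")
wavparse = Gst.ElementFactory.make("wavparse", "wav_parse")
wav_bin.add(filesrc)
wav_bin.add(wavparse)
filesrc.link(wavparse)

# Expose wavparse's src pad on the bin so it can be linked to the mixer pad.
wav_bin.add_pad(Gst.GhostPad.new("src", wavparse.get_static_pad("src")))

pipeline.add(wav_bin)  # the already-PLAYING pipeline from the question
# inside the probe callback, with this layout:
#     wav_bin.get_static_pad("src").link(sink_pad)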
I have been working hard for 4 days now to get the Google Cloud Speech-to-Text API to work, but I still see no light at the end of the tunnel. I have searched the net a lot and read the documentation a lot, but with no result.
Our site is bbsradio.com; we are trying to automatically extract transcripts from our MP3 files using the Google Speech-to-Text API. The code is written in PHP and is an almost exact copy of this: https://github.com/GoogleCloudPlatform/php-docs-samples/blob/master/speech/src/transcribe_async.php
I see the process completes and it reaches "$operation->pollUntilComplete();", but it does not report success at "if ($operation->operationSucceeded()) {" and it does not return any error at $operation->getError() either.
I am converting the MP3 to a raw file like this: ffmpeg -y -loglevel panic -i /public_html/sites/default/files/show-archives/audio-clips-9-23-2020/911freefall2020-05-24.mp3 -f s16le -acodec pcm_s16le -vn -ac 1 -ar 16000 -map_metadata -1 /home/mp3_to_raw/911freefall2020-05-24.raw
I tried the FLAC format as well, but it did not work. I tested the converted FLAC file with Windows Media Player and can hear the conversation clearly. I checked the files: 16000 Hz, 1 channel, 16-bit. I can see the file is uploaded to Cloud Storage. I have checked these:
https://cloud.google.com/speech-to-text/docs/troubleshooting and
https://cloud.google.com/speech-to-text/docs/best-practices
There is a lot of discussion and documentation, but nothing seems helpful at this moment. If someone can really help me find the issue, it would be really, really great!
TL;DR: convert from MP3 to a 1-channel FLAC file with the same sample rate as your MP3 file.
Long explanation:
Since you're using MP3 files as your process input, MP3 compression artifacts are probably hurting you when you resample to 16 kHz (you cannot hear this, but the algorithm will).
To confirm this theory:
Execute ffprobe -hide_banner filename.mp3; it will output something like this:
Metadata:
...
Duration: 00:02:12.21, start: 0.025057, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Metadata:
encoder : LAME3.99r
In this case, the sample rate is OK for the Google Speech API. Just transcode the file without changing the sample rate (remove -ar 16000 from your ffmpeg command).
You might get into trouble if the original MP3 bitrate is low. 320 kb/s seems safe (unless the recording has a lot of noise).
Take into account that voice encoded at under 64 kb/s (ISDN line quality) can be understood only by humans if there is some noise.
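For example, adapting the command from the question (the output file name is arbitrary), a mono FLAC at the MP3's own sample rate could be produced like this:
ffmpeg -y -i 911freefall2020-05-24.mp3 -vn -ac 1 -map_metadata -1 911freefall2020-05-24.flac
Since no -ar is given, ffmpeg keeps the source's 44100 Hz sample rate.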
At last I found the solution and the reason for the issue. Getting empty results is actually due to a bug in the PHP API code. What you need to do:
Replace this:
$operation->pollUntilComplete();
by this:
while (!$operation->isDone()) {
    $operation->pollUntilComplete();
}
I have to get a processed video from a GStreamer pipe, compress it with the H.264 or H.265 algorithm, and then write it to storage. There are some constraints in this project that must be handled:
The saved video must be playable by any standard video player, such as VLC, Windows Media Player, KMPlayer, and so on.
If for any reason the destination file is not closed properly (such as after a power outage), the entire file should not be lost, and the saved video should be playable up to the point where the problem occurred.
My solution to this project, given these constraints, is an OpenCV writer with a GStreamer pipe, as follows:
...
// splitmuxsink starts a new .mkv file every 50 s (max-size-time is in nanoseconds)
std::string gstPipe("appsrc ! videoconvert ! omxh264enc ! "
                    "splitmuxsink muxer=matroskamux "
                    "max-size-time=50000000000 location="
                    "/file/path/save%d.mkv");
cv::Size frameSize(frameWidth, frameHeight);
bool result = videoWriter.open(gstPipe, cv::CAP_GSTREAMER, 0,
                               fps, frameSize);
This solution splits the video stream into multiple files, but I need to save the whole video in one file.
Does anyone have a better solution to offer?
Thank you very much in advance for your help.
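One direction that might satisfy both constraints, offered only as a sketch and not a verified fix: drop splitmuxsink and mux into a single Matroska file with matroskamux ! filesink. A Matroska file that is never finalised is normally still playable up to the point where writing stopped (it just lacks a seek index). The example below is in Python for brevity; the element names and sizes mirror the question, and h264parse is added so the muxer receives properly framed H.264:
import cv2

# Hypothetical single-file variant of the pipeline from the question.
gst_pipe = ("appsrc ! videoconvert ! omxh264enc ! h264parse ! "
            "matroskamux ! filesink location=/file/path/save.mkv")

frame_width, frame_height, fps = 1280, 720, 25  # example values
writer = cv2.VideoWriter(gst_pipe, cv2.CAP_GSTREAMER, 0, fps,
                         (frame_width, frame_height))
Whether a file truncated by a real power loss plays back on every required player is something that would still have to be tested.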
I'm trying to record lossless video from a webcam using OpenCV.
I would like to use the FFV1 codec for this.
I open my video writer like this:
theVideoWriter.open(filename, CV_FOURCC('F','F','V','1'), 30, cv::Size(1280,720), true);
I can successfully open the video writer, but while recording I get the following error message:
[ffv1 # 26d78020] Provided packet is too small, needs to be 8310784
The resulting video is not playable. Other FFmpeg codecs like FMP4 work fine.
What does that error mean and how can I fix it?
I am writing because I could not find the answer in previous topics. I am using live555 to stream live video (H.264) and audio (G.723), which are being recorded by a web camera. The video part is already done and works perfectly, but I have no clue about the audio task.
From what I have read, I have to create a ServerMediaSession to which I should add two subsessions: one for the video and one for the audio. For the video part I created a subclass of OnDemandServerMediaSubsession, a subclass of FramedSource, and the encoder class, but for the audio part I do not know which classes I should base the implementation on.
The web camera records and delivers audio frames in G.723 format separately from the video. I would say the audio is raw, since when I try to play it in VLC it says it could not find any start code; so I suppose what is recorded by the web cam is the raw audio stream.
I was wondering if someone could give me a hint.
For an audio stream, your override of OnDemandServerMediaSubsession::createNewRTPSink should create a SimpleRTPSink.
Something like this:
RTPSink* YourAudioMediaSubsession::createNewRTPSink(Groupsock* rtpGroupsock,
                                                    unsigned char rtpPayloadTypeIfDynamic,
                                                    FramedSource* inputSource)
{
    return SimpleRTPSink::createNew(envir(), rtpGroupsock,
                                    4,          // static RTP payload type for G.723
                                    frequency,  // 8000 for G.723.1
                                    "audio",
                                    "G723",
                                    channels);  // 1 for G.723.1
}
The frequency and the number of channels should come from the inputSource (for G.723.1 they are 8000 Hz and a single channel).