Analyzing audio in C++ / Unreal Engine 5 without plugins

This project is done in Unreal Engine 5. I would prefer doing this in C++ instead of Blueprints, but I'm open to any ideas.
I am trying to take an audio file and create the data needed to draw an audio waveform.
I would go "frame by frame" (the audio equivalent of a video frame is a sample) and try to find the information needed for this, such as the peaks and troughs of the audio file. For example:
frame 0: audio level 0
frame 1: audio level 1
frame 2: audio level 0
frame 3: audio level 2
frame 4: audio level 0
and using this data, a visual waveform can be created.
I have been looking at resources for this, but they all involve plugins that already do it; I need to create this feature myself, and I don't know where to start. If anyone has any sources or tips to get started, that would be much appreciated. Thank you.
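Since the goal is to build the feature rather than use a plugin, here is a minimal, engine-agnostic sketch of the core step: reducing raw PCM samples to one peak amplitude per "bucket", which is exactly the data a waveform drawing needs. It assumes a canonical 44-byte-header, 16-bit mono little-endian WAV file (a robust loader should walk the RIFF chunks instead), and the file path and bucket count are placeholders. The same bucketing works on PCM obtained any other way, including from Unreal's own sound assets.

```cpp
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <vector>

// Reduce a WAV file's samples to NumBuckets peak amplitudes in [0, 1].
// Assumption: canonical 44-byte header, 16-bit mono little-endian PCM.
std::vector<float> BuildWaveformPeaks(const char* Path, size_t NumBuckets)
{
    std::ifstream File(Path, std::ios::binary);
    File.seekg(44); // skip the canonical WAV header (assumes no extra chunks)

    std::vector<int16_t> Samples;
    int16_t Sample = 0;
    while (File.read(reinterpret_cast<char*>(&Sample), sizeof(Sample)))
    {
        Samples.push_back(Sample);
    }

    std::vector<float> Peaks(NumBuckets, 0.0f);
    if (Samples.empty() || NumBuckets == 0)
    {
        return Peaks;
    }

    const size_t SamplesPerBucket = Samples.size() / NumBuckets + 1;
    for (size_t i = 0; i < Samples.size(); ++i)
    {
        const size_t Bucket = i / SamplesPerBucket;
        const float Amplitude = std::abs(Samples[i]) / 32768.0f; // normalise
        if (Amplitude > Peaks[Bucket])
        {
            Peaks[Bucket] = Amplitude;
        }
    }
    return Peaks;
}
```

Each entry of the returned array is the "audio level" from the example above, one per horizontal pixel (or bar) of the waveform; in Unreal you could hand the array to a UMG widget or a procedural mesh to draw it.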

Related

Programmatically capturing video with the FFmpeg libraries (not the libav fork) with variable frame rate in C++

I am working on a simulator in C++ and OpenGL and I wanted to add some video capture capability (cross-platform would be a requirement here). I decided to work with FFmpeg since I can feed my rendered frames directly into a video. So far so good, but a 3D rendering engine rarely runs at a constant frame rate, and I think it is not a good idea to force a constant rate there. Therefore I am trying to figure out how to capture a variable frame rate video with FFmpeg, or how to get from the variable frame rate of my simulator to a constant frame rate for the video in FFmpeg. Can anybody help me out here? How are videos usually captured in variable frame rate environments?
Variable frame rate is mostly an issue in the muxing stage, since your container (e.g. good ol' AVI) might not support VFR. As long as you're muxing into a format that supports per-frame timestamps, you should be OK; good examples are MKV (Matroska) and MP4. Then, as long as each AVPacket's pts/dts is set correctly during encoding/muxing, your video will be VFR.
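To make that concrete, here is a minimal sketch of the timestamping step against the FFmpeg C API. The setup is assumed, not shown: the encoder context is created elsewhere with a fine-grained time_base of {1, 90000}, and StampFrame is a hypothetical helper called once per rendered frame.

```cpp
extern "C" {
#include <libavcodec/avcodec.h> // AVFrame
}
#include <chrono>
#include <cstdint>

// Stamp each frame with the real elapsed render time rather than a fixed
// step. Assumption: the encoder's time_base was set to {1, 90000} before
// avcodec_open2(), so pts is expressed in 1/90000-second ticks.
void StampFrame(AVFrame* Frame,
                std::chrono::steady_clock::time_point CaptureStart)
{
    using namespace std::chrono;
    const double Seconds =
        duration<double>(steady_clock::now() - CaptureStart).count();
    Frame->pts = static_cast<int64_t>(Seconds * 90000.0);
}
```

After encoding, rescale the packet timestamps from the encoder's time base into the stream's time base (av_packet_rescale_ts) before av_interleaved_write_frame; the muxer then carries the irregular frame spacing through into the MKV/MP4.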

C++ Video processing frame by frame

I am stuck with a project in which I am required to write a C++ program that gets every frame of a raw .yuv video file and calculates the signal-to-noise ratio.
I can't find where to start. Any guide to a tutorial, or anything written on how to do this? How do I read a video and get its frames in C++?
Check out the FFmpeg libraries (https://www.ffmpeg.org/about.html) for extracting frames from a video stream.
There are other libraries, such as OpenCV, which may also help with the image analysis part, as well as Windows-specific APIs.
For measuring signal-to-noise, you'll need a mathematical model for noise detection, such as autocorrelation.
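As a sketch of the frame-by-frame loop (using PSNR, the usual log-scaled signal-to-noise measure for images, rather than autocorrelation): a raw .yuv file has no header, so with a known resolution and pixel format you can read fixed-size frames directly. Assumptions for illustration: planar YUV 4:2:0 (frame size = width * height * 3/2), a reference file to compare against, and hypothetical file names and resolution.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

int main()
{
    const size_t Width = 1920, Height = 1080;        // must be known a priori
    const size_t FrameSize = Width * Height * 3 / 2; // Y plane + U/4 + V/4

    std::ifstream Degraded("degraded.yuv", std::ios::binary);  // hypothetical
    std::ifstream Reference("reference.yuv", std::ios::binary);

    std::vector<uint8_t> A(FrameSize), B(FrameSize);
    int Frame = 0;
    while (Degraded.read(reinterpret_cast<char*>(A.data()), FrameSize) &&
           Reference.read(reinterpret_cast<char*>(B.data()), FrameSize))
    {
        double Mse = 0.0;
        for (size_t i = 0; i < Width * Height; ++i) // luma plane only
        {
            const double Diff = double(A[i]) - double(B[i]);
            Mse += Diff * Diff;
        }
        Mse /= double(Width * Height);
        const double Psnr =
            (Mse > 0.0) ? 10.0 * std::log10(255.0 * 255.0 / Mse) : INFINITY;
        std::printf("frame %d: PSNR %.2f dB\n", Frame++, Psnr);
    }
}
```

If you only have one file (no clean reference), replace the inner loop with whatever noise model you pick, e.g. the autocorrelation approach mentioned above.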

Using Async_reader and Wave Parser in DirectShow filter graph results in video seeking issues

Some background:
I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps, each of which can last for a long time (for example 30 seconds), to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
The problem:
When I attempt to seek through the resulting video, I have to wait until a bitmap's start time occurs before it is displayed. For example, if I'm currently seeing bitmap 4 and skip back to the time at which bitmap 2 is displayed, the video output will not change until the third bitmap starts. Initially I wondered if I wasn't allowing FillBuffer to be called enough (at the moment it's only called once per bitmap); however, I have since noted that when the audio track is very short (just a second long, perhaps), I can seek through the video as expected. Is there another way I should be introducing audio into the filter graph? Do I need to perform some kind of indexing once the WMV has been rendered? I'm at a bit of a loss...
You may need to do indexing as a post-processing step. Try indexing it with Windows Media File Editor from Windows Media Encoder SDK and see if this improves seeking.
Reducing key frame interval in the encoder profile may improve seeking. This can be done in Windows Media Profile Editor from the SDK. Note that this will cause file size increase.

encoding camera with audio source in realtime with WMAsfWriter - jitter problem

I built a DirectShow graph consisting of my video capture filter (grabbing the screen) and the default audio input filter, both connected through a splitter to the WM ASF Writer output filter and to a VMR9 renderer. In other words, I want real-time audio/video encoding to disk together with a preview. The problem is that no matter which WM profile I choose (even a very low resolution one), the output video file always has jitter: every few frames there is a delay. The audio is fine; there is no jitter in the audio. CPU usage is low (< 10%), so I don't believe this is a lack of CPU resources. I think I'm timestamping my frames correctly.
What could be the reason?
Below is a link to a recorded video demonstrating the problem:
http://www.youtube.com/watch?v=b71iK-wG0zU
Thanks
Dominik Tomczak
I have had this problem in the past. Your problem is the volume of data being written to disk; writing to a faster drive is a simple and effective solution. The other thing I've done is place a video compressor into the graph. You need to make sure both input streams are using the same reference clock. I have had a lot of trouble keeping a good preview with this compressor scheme: my preview's frame rate dies even if I use an infinite tee rather than a Smart Tee, though the result written to disk is fine. It's also worth noting that the beefier the machine, the less of an issue this was, so if you need both encoding and preview it may not actually provide much of a win over simply putting a faster hard disk in the machine.
I don't think that is the issue; the volume of data written is less than 1 MB/s (the average rate during encoding). I found the cause: when I build the graph without audio input (the WM ASF Writer has only a video input pin) and my video capture pin is connected through a Smart Tee to the preview pin and to the WM ASF Writer video input pin, there is no glitch in the output movie. I reckon the problem is audio-to-video synchronization in my graph. The same happens when I build the graph in GraphEdit: without audio, no glitch; with audio, a constant glitch every second. I wonder whether I'm timestamping my frames wrongly, but I think I'm doing it correctly. What is the general solution for audio-to-video synchronization in DirectShow graphs?
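On the timestamping question: in a pushsource-style filter the convention is to stamp each media sample in FillBuffer with start/stop times in 100-nanosecond REFERENCE_TIME units, on the same timeline the audio pin uses. A minimal sketch, assuming a constant nominal frame rate; CMyPushPin and m_iFrameNumber are placeholder names, not from the question:

```cpp
#include <streams.h> // DirectShow base classes (CSourceStream, IMediaSample)

// Stamp one video frame. Assumption: fixed 25 fps; UNITS (10,000,000) is
// the number of 100 ns REFERENCE_TIME ticks per second. Audio and video
// pins must stamp against the same timeline, or the muxed output stutters.
HRESULT CMyPushPin::FillBuffer(IMediaSample* pSample)
{
    const REFERENCE_TIME FrameLength = UNITS / 25;

    REFERENCE_TIME rtStart = m_iFrameNumber * FrameLength;
    REFERENCE_TIME rtStop  = rtStart + FrameLength;
    pSample->SetTime(&rtStart, &rtStop);
    pSample->SetSyncPoint(TRUE); // raw frames are all key frames

    // ... copy the captured screen pixels into pSample's buffer here ...

    ++m_iFrameNumber;
    return S_OK;
}
```

A glitch that appears only when audio is present often means the two streams are stamped from different clocks and drift apart; selecting one reference clock for the whole graph (IMediaFilter::SetSyncSource) is the usual first thing to check.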

YUV/PCM Visualizer to measure Lip Sync

I have two dump files of raw video and raw audio from an encoder, and I want to be able to measure the lip sync. Imagine a video of a hammer striking an anvil. I want to go frame by frame and see that when the hammer finally hits the anvil, there is a spike in amplitude on the audio track.
Because of the speed at which everything happens, I cannot merely listen to the audio; I need to see the waveform in the time domain.
Are there any tools out there that will let me see both the video and audio?
If you are concerned with validating a decoder, then generally from a validation perspective the goal is to check the audio and video PTS values against a common real-time clock.
Raw YUV and PCM files do not include timestamps. If you know the frame rate and sample rate, you can use a raw YUV file viewer (I wrote my own) to figure out the time (from the start of the file) of a given frame in the video, and a tool like Audacity to figure out the time from the start of the file to the start of a tone in the audio file. This still may not tell you the whole story, since tools usually embed a delay between the audio and video in the TS/PS file. Or you can hook up an oscilloscope and go old school.
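The frame-rate/sample-rate arithmetic from that answer is simple enough to sketch. Everything below is a hypothetical example: the frame index of the hammer strike and the sample index of the audio spike are the two numbers you would read off the YUV viewer and Audacity, respectively.

```cpp
#include <cstdio>

int main()
{
    const double Fps        = 29.97;    // video frame rate (known a priori)
    const double SampleRate = 48000.0;  // audio sample rate (known a priori)

    const long HitFrame    = 1234;      // hypothetical: frame of the strike
    const long SpikeSample = 1977600;   // hypothetical: first loud sample

    const double VideoTime = HitFrame / Fps;           // seconds from start
    const double AudioTime = SpikeSample / SampleRate; // seconds from start

    std::printf("lip-sync offset: %+.1f ms (positive = audio late)\n",
                (AudioTime - VideoTime) * 1000.0);
}
```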