Convert decoded frames to WAV format using ffmpeg libraries - c++

I have an STL queue of pointers to decoded frames on the heap
(e.g. decode_queue<AVFrame *>)
and I want to take those frames from the queue and encode them in WAV format. I tried using the examples that come with ffmpeg, but they break when I change the encoder to pcm_s32le/pcm_s16le. For example, in the decoding_encoding.c example I changed only a few parameters and suddenly I am getting a floating point exception.
Is there something I can do? I am really lost.
UPDATE
In decoding_encoding.c I changed the line:
codec = avcodec_find_encoder(AV_CODEC_ID_MP2);
to
codec = avcodec_find_encoder(AV_CODEC_ID_PCM_S16LE);
This change made c->frame_size == 0, so the buffer is never created:
buffer_size = av_samples_get_buffer_size(NULL, c->channels, c->frame_size,
c->sample_fmt, 0);
In that case instead of the line:
frame->nb_samples = c->frame_size;
I changed it to
frame->nb_samples = 10000;
and
buffer_size = av_samples_get_buffer_size(NULL, c->channels, 10000,
c->sample_fmt, 0);
just to see what happens (basically substituting 10000 for c->frame_size). It compiles and creates an output file, but the player either cannot find a stream in it or it is filled with garbage. At this point I am still stuck and not sure what to do to get the output .WAV file.
Also, is a RIFF header automatically added or do I have to manually add it in as well?
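On the RIFF question: as far as I know, the pcm_s16le/pcm_s32le encoders only produce raw samples, and the header is written by a muxer (for example libavformat's wav muxer when avformat_write_header is called) or by hand. PCM encoders also accept any number of samples per frame, which is why c->frame_size is 0 and the example code that divides by it is a likely source of the floating point exception. A minimal sketch of writing the 44-byte header by hand follows, assuming the frames already hold interleaved integer PCM; makeWavHeader and its helpers are hypothetical names, not part of FFmpeg.

```cpp
#include <cstdint>
#include <vector>

// Appends the low `bytes` bytes of `value` in little-endian order.
static void putLE(std::vector<uint8_t> &out, uint32_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
}

// Appends a four-character chunk tag such as "RIFF".
static void putTag(std::vector<uint8_t> &out, const char *tag)
{
    out.insert(out.end(), tag, tag + 4);
}

// Hypothetical helper: builds the canonical 44-byte header for integer PCM.
// dataBytes is the total size of the sample data that will follow.
std::vector<uint8_t> makeWavHeader(uint32_t sampleRate, uint16_t channels,
                                   uint16_t bitsPerSample, uint32_t dataBytes)
{
    const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    const uint32_t byteRate = sampleRate * blockAlign;

    std::vector<uint8_t> h;
    putTag(h, "RIFF");
    putLE(h, 36 + dataBytes, 4);   // size of everything after this field
    putTag(h, "WAVE");
    putTag(h, "fmt ");
    putLE(h, 16, 4);               // size of the fmt chunk body
    putLE(h, 1, 2);                // format tag 1 = integer PCM
    putLE(h, channels, 2);
    putLE(h, sampleRate, 4);
    putLE(h, byteRate, 4);
    putLE(h, blockAlign, 2);
    putLE(h, bitsPerSample, 2);
    putTag(h, "data");
    putLE(h, dataBytes, 4);
    return h;
}
```

The header would go at the start of the file, followed by the samples of each queued frame. Since the total size is usually unknown until the queue is drained, one option is to write a placeholder header first and rewrite it once dataBytes is known.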

Related

C++ FFmpeg pick codec from system

I'm currently setting up my output context for creating an .avi file like this:
avformat_alloc_output_context2(&outContext, NULL, NULL, "out.avi");
if (!outContext)
die("Could not allocate output context");
However, the resulting video quality is very unpleasant. As such, I'd like to be able to fetch the installed codecs on the system and use one of them in avformat_alloc_output_context2. Similar to below:
So I guess my two questions are:
How do I create a list (array) containing the installed codecs (as above)?
How do I use one of them in the output container?
If possible, I'd also like to be able to modify output quality (0%-100%) and open the codec configuration window.
First, build a map from a string (or whatever key you prefer) to AVCodecID, like this:
std::map<std::string, AVCodecID> _codecList;
_codecList["h264"] = AV_CODEC_ID_H264;
_codecList["mpeg4"] = AV_CODEC_ID_MPEG4;
....
Note that FFmpeg does not tell you here which codec is available in which container, so you should validate that yourself. You can use the following link as a reference (at least it is official): https://en.wikipedia.org/wiki/Comparison_of_video_container_formats
The next step is to find the encoder by name or by AVCodecID, as in the following code:
avcodec_find_encoder_by_name("libx264");
avcodec_find_encoder(AV_CODEC_ID_H264);
Both return an AVCodec*, so you can use the result when calling avformat_new_stream(), like this:
auto it = _codecList.find("h264"); // find() returns an iterator, not the AVCodecID itself
AVCodecID codec_id = (it != _codecList.end()) ? it->second : AV_CODEC_ID_NONE;
if(codec_id == AV_CODEC_ID_NONE)
{
return -1;
}
AVCodec* encoder = avcodec_find_encoder(codec_id);
// or you can just get it from avcodec_find_encoder_by_name("libx264");
AVStream* newStream = avformat_new_stream(avFormatContext, encoder);
There are many factors that determine video quality, and x264 in particular has a lot of options. You can control quality either through a crf value or through the bitrate (you can't use both options at once). You set these on the AVCodecContext:
AVCodecContext* codec_ctx = newStream->codec;
codec_ctx->bit_rate = 1000000; // 1 Mbit/s
// codec_ctx->qmin = 18;
// codec_ctx->qmax = 31;
Once you are done, open it with avcodec_open2:
avcodec_open2(codec_ctx, encoder, NULL);
And don't forget to close it when you release it:
avcodec_close(codec_ctx);
There is much more to do when you create your own output stream. If you already have some experience with it, I think this answer will be enough.
But if you don't have much experience with FFmpeg, you can find my full example here (https://github.com/sorrowhill/FFmpegTutorial)

iOS waveform generator connected via AUGraph

I have created a simple waveform generator which is connected to an AUGraph. I have reused some sample code from Apple to set AudioStreamBasicDescription like this
void SetCanonical(UInt32 nChannels, bool interleaved)
// note: leaves sample rate untouched
{
mFormatID = kAudioFormatLinearPCM;
int sampleSize = SizeOf32(AudioSampleType);
mFormatFlags = kAudioFormatFlagsCanonical;
mBitsPerChannel = 8 * sampleSize;
mChannelsPerFrame = nChannels;
mFramesPerPacket = 1;
if (interleaved)
mBytesPerPacket = mBytesPerFrame = nChannels * sampleSize;
else {
mBytesPerPacket = mBytesPerFrame = sampleSize;
mFormatFlags |= kAudioFormatFlagIsNonInterleaved;
}
}
In my class I call this function
mClientFormat.SetCanonical(2, true);
mClientFormat.mSampleRate = kSampleRate;
while sample rate is
#define kSampleRate 44100.0f
The other settings are taken from sample code as well
// output unit
CAComponentDescription output_desc(kAudioUnitType_Output, kAudioUnitSubType_RemoteIO, kAudioUnitManufacturer_Apple);
// iPodEQ unit
CAComponentDescription eq_desc(kAudioUnitType_Effect, kAudioUnitSubType_AUiPodEQ, kAudioUnitManufacturer_Apple);
// multichannel mixer unit
CAComponentDescription mixer_desc(kAudioUnitType_Mixer, kAudioUnitSubType_MultiChannelMixer, kAudioUnitManufacturer_Apple);
Everything works fine, but the problem is that I am not getting stereo sound, and my callback function fails (bad access) when I try to reach the second buffer:
Float32 *bufferLeft = (Float32 *)ioData->mBuffers[0].mData;
Float32 *bufferRight = (Float32 *)ioData->mBuffers[1].mData;
// Generate the samples
for (UInt32 frame = 0; frame < inNumberFrames; frame++)
{
switch (generator.soundType) {
case 0: //Sine
bufferLeft[frame] = sinf(thetaLeft) * amplitude;
bufferRight[frame] = sinf(thetaRight) * amplitude;
break;
So it seems I am getting mono instead of stereo. The pointer bufferRight is empty, but I don't know why.
Any help will be appreciated.
I can see two possible errors. First, as @invalidname pointed out, recording in stereo probably isn't going to work on a mono device such as the iPhone. Well, it might work, but if it does, you're just going to get back dual-mono stereo streams anyways, so why bother? You might as well configure your stream to work in mono and spare yourself the CPU overhead.
The second problem is probably the source of your sound distortion. Your stream description format flags should be:
kAudioFormatFlagIsSignedInteger |
kAudioFormatFlagsNativeEndian |
kAudioFormatFlagIsPacked
Also, don't forget to set the mReserved field to 0. While the value of this field is probably being ignored, it doesn't hurt to explicitly set it to 0 just to make sure.
Edit: Another more general tip for debugging audio on the iPhone -- if you are getting distortion, clipping, or other weird effects, grab the data payload from your phone and look at the recording in a wave editor. Being able to zoom down and look at the individual samples will give you a lot of clues about what's going wrong.
To do this, you need to open up the "Organizer" window, click on your phone, and then expand the little arrow next to your application (in the same place where you would normally uninstall it). Now you will see a little downward pointing arrow, and if you click it, Xcode will copy the data payload from your app to somewhere on your hard drive. If you are dumping your recordings to disk, you'll find the files extracted here.
reference from link
I'm guessing the problem is that you're specifying an interleaved format, but then accessing the buffers as if they were non-interleaved in your IO callback. ioData->mBuffers[1] is invalid because all the data, both left and right channels, is interleaved in ioData->mBuffers[0].mData. Check ioData->mNumberBuffers. My guess is it is set to 1. Also, verify that ioData->mBuffers[0].mNumberChannels is set to 2, which would indicate interleaved data.
Also check out the Core Audio Public Utility classes to help with things like setting up formats. Makes it so much easier. Your code for setting up format could be reduced to one line, and you'd be more confident it is right (though to me your format looks set up correctly - if what you want is interleaved 16-bit int):
CAStreamBasicDescription myFormat(44100.0, 2, CAStreamBasicDescription::kPCMFormatInt16, true);
Apple used to package these classes up in the SDK that was installed with Xcode, but now you need to download them here: https://developer.apple.com/library/mac/samplecode/CoreAudioUtilityClasses/Introduction/Intro.html
Anyway, it looks like the easiest fix for you is to just change the format to non-interleaved. So in your code: mClientFormat.SetCanonical(2, false);
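If you would rather keep the interleaved format, the callback has to write both channels into the single buffer instead. Here is a rough sketch of that layout, assuming 16-bit signed samples (as the answer above reads the format) and that the pointer passed in would be ioData->mBuffers[0].mData; fillInterleavedSine and its phase-step parameters are hypothetical names for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: writes `frames` stereo frames into ONE interleaved
// buffer laid out as L0 R0 L1 R1 ... (two samples per frame).
// amplitude is expected in the range [0.0, 1.0].
void fillInterleavedSine(int16_t *out, std::size_t frames,
                         double &thetaLeft, double &thetaRight,
                         double stepLeft, double stepRight,
                         double amplitude)
{
    const double kTwoPi = 6.283185307179586;
    for (std::size_t frame = 0; frame < frames; ++frame) {
        // Even index: left channel, odd index: right channel.
        out[2 * frame]     = static_cast<int16_t>(std::sin(thetaLeft) * amplitude * 32767.0);
        out[2 * frame + 1] = static_cast<int16_t>(std::sin(thetaRight) * amplitude * 32767.0);

        // Advance and wrap the phases so they stay small.
        thetaLeft += stepLeft;
        if (thetaLeft >= kTwoPi) thetaLeft -= kTwoPi;
        thetaRight += stepRight;
        if (thetaRight >= kTwoPi) thetaRight -= kTwoPi;
    }
}
```

With a non-interleaved format the same loop would instead write out-of-phase channels to mBuffers[0] and mBuffers[1] separately, which is what the original callback assumes.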

Saving output frame as an image file CUDA decoder

I am trying to save the decoded image file back as a BMP image using the code in CUDA Decoder project.
if (g_bReadback && g_ReadbackSID)
{
CUresult result = cuMemcpyDtoHAsync(g_bFrameData[active_field], pDecodedFrame[active_field], (nDecodedPitch * nHeight * 3 / 2), g_ReadbackSID);
long padded_size = (nWidth * nHeight * 3 );
CString output_file;
output_file.Format(_T("image/sample_45.BMP"));
SaveBMP(g_bFrameData[active_field],nWidth,nHeight,padded_size,output_file );
if (result != CUDA_SUCCESS)
{
printf("cuMemcpyDtoHAsync returned %d\n", (int)result);
}
}
But the saved image looks like this
Can anybody help me figure out what I am doing wrong? Thank you.
After investigating further, I made several modifications to your approach, based on the following:
pDecodedFrame is actually in a non-RGB format. I think it is NV12, which I believe is a particular YUV variant.
pDecodedFrame gets converted to an RGB format on the GPU using a particular CUDA kernel.
The target buffer for this conversion will either be a surface provided by OpenGL if g_bUseInterop is specified, or else an ordinary region allocated by the driver API version of cudaMalloc if interop is not specified.
The target buffer mentioned above is pInteropFrame (even in the non-interop case). So to make an example for you, for simplicity I chose to only use the non-interop case, because it's much easier to grab the RGB buffer (pInteropFrame) in that case.
The method here copies pInteropFrame back to the host, after it has been populated with the appropriate RGB image by cudaPostProcessFrame. There is also a routine to save the image as a bitmap file. All of my modifications are delineated with comments that include RMC so search for that if you want to find all the changes/additions I made.
To use it, drop this file into the cudaDecodeGL project as a replacement for the videoDecodeGL.cpp source file, then rebuild the project. Run the executable normally to display the video. To capture a specific frame, run the executable with the nointerop command-line switch, e.g. cudaDecodeGL nointerop. The video will not display, but the decode operation and frame capture will take place, and the frame will be saved in a framecap.bmp file. If you want to change the frame number that is captured, change the 37 in g_FrameCapSelect = 37; to some other number and recompile.
Here is the replacement for videoDecodeGL.cpp. I used pastebin because SO has a limit on the number of characters that can be entered in a question body.
Note that my approach is independent of whether readback is specified. I would recommend not using readback for this sequence.
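To illustrate why the raw readback cannot be saved directly: an NV12 frame holds 1.5 bytes per pixel (a full-resolution luma plane followed by half-height interleaved U/V rows), while a 24-bit bitmap needs 3 bytes per pixel. The sample does the conversion on the GPU; the sketch below is a plain CPU stand-in, not the sample's kernel. It assumes the chroma plane starts at pitch * height and uses a common BT.601 limited-range integer approximation; nv12ToRgb24 is a hypothetical name. Note also that cuMemcpyDtoHAsync is asynchronous, so the host buffer should only be read after the stream has been synchronized.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint8_t clampToByte(int v)
{
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical helper: converts an NV12 frame to tightly packed RGB24.
// Assumed layout: `height` rows of luma, then height/2 rows of interleaved
// U,V pairs, every row `pitch` bytes apart (pitch >= width).
std::vector<uint8_t> nv12ToRgb24(const uint8_t *nv12, int width, int height, int pitch)
{
    const uint8_t *yPlane = nv12;
    const uint8_t *uvPlane = nv12 + static_cast<std::size_t>(pitch) * height;
    std::vector<uint8_t> rgb(static_cast<std::size_t>(width) * height * 3);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each U,V pair is shared by a 2x2 block of luma samples.
            const uint8_t *uv = uvPlane + static_cast<std::size_t>(y / 2) * pitch + (x / 2) * 2;
            const int c = yPlane[static_cast<std::size_t>(y) * pitch + x] - 16;
            const int u = uv[0] - 128;
            const int v = uv[1] - 128;

            uint8_t *px = &rgb[(static_cast<std::size_t>(y) * width + x) * 3];
            px[0] = clampToByte((298 * c + 409 * v + 128) / 256);           // R
            px[1] = clampToByte((298 * c - 100 * u - 208 * v + 128) / 256); // G
            px[2] = clampToByte((298 * c + 516 * u + 128) / 256);           // B
        }
    }
    return rgb;
}
```

A BMP writer typically expects BGR order with rows stored bottom-up, so the output would still need reordering before being handed to a routine like SaveBMP.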

Using libtiff's TIFFReadRawTile to get a jpeg tile without decompression/compression

I have a pyramidal tiled TIFF file and I want to extract the tiles without decoding and re-encoding the JPEG. I've seen that the TIFFReadRawTile() function can extract the raw tile without decoding. How can I write the extracted buffer to a readable JPEG file?
The task you are up to is not a trivial one. You might want to take a closer look at tiff2pdf utility's source code. The utility does what you need and you might extract relevant parts from it.
The problem is that the utility does many other things you will have to discard. Also, not every JPEG-in-TIFF can be processed successfully by the utility, basically because there are plenty of semi-broken TIFFs out there.
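As I understand the relevant part of tiff2pdf, for new-style JPEG-in-TIFF (Compression = 7) it glues the shared tables from the JPEGTables tag onto each raw tile: the tables stream minus its trailing EOI marker, followed by the tile data minus its leading SOI marker. A sketch of just that byte splice is below. It assumes you have already fetched the tables with TIFFGetField(tif, TIFFTAG_JPEGTABLES, ...) and the tile with TIFFReadRawTile(); spliceJpegTile is a hypothetical name, and this does nothing about color space, so files storing RGB data inside the JPEG may still display with wrong colors.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: combines the JPEGTables stream with one raw tile
// into a standalone JPEG stream. Pass an empty `tables` vector when the
// file has no JPEGTables tag (the tile is then returned unchanged).
std::vector<uint8_t> spliceJpegTile(const std::vector<uint8_t> &tables,
                                    const std::vector<uint8_t> &rawTile)
{
    auto startsWithSOI = [](const std::vector<uint8_t> &v) {
        return v.size() >= 4 && v[0] == 0xFF && v[1] == 0xD8;
    };

    if (!startsWithSOI(rawTile))
        throw std::invalid_argument("tile does not start with a JPEG SOI marker");
    if (tables.empty())
        return rawTile;
    if (!startsWithSOI(tables) ||
        tables[tables.size() - 2] != 0xFF || tables[tables.size() - 1] != 0xD9)
        throw std::invalid_argument("tables are not a SOI ... EOI stream");

    // Keep SOI + table segments, drop the tables' EOI (last two bytes).
    std::vector<uint8_t> out(tables.begin(), tables.end() - 2);
    // Append the tile without its own SOI (first two bytes).
    out.insert(out.end(), rawTile.begin() + 2, rawTile.end());
    return out;
}
```

The result could then be written to a .jpg file as-is. Old-style JPEG (Compression = 6) and the semi-broken files mentioned above are outside what this sketch handles.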
I've found that there is actually no way to get the encoded tile without directly messing with the Huffman tables of the TIFF, which is pretty tricky.
The only way I've found is to read the decoded tile and then do some magic with vips to output JPEG directly.
tdata_t buf;
tsize_t len;
buf = _TIFFmalloc( TIFFTileSize( tif ) );
len = TIFFReadEncodedTile(tif, tile, buf, (tsize_t) -1);
VImage result ((void *) buf, 256, 256, 3, VImage::FMTUCHAR);
void *outBuffer;
size_t outLen; // renamed: len is already declared above
vips_jpegsave_buffer(result, &outBuffer, &outLen, "Q", 90, NULL);
and then use cout to output the image after some headers.

Problem with ffmpeg function avformat_seek_file()

I am trying to seek to a given frame in a video using the ffmpeg library. I know there is an av_seek_frame() function, but it was recommended to use avformat_seek_file() instead. Something similar is mentioned here.
I know that avformat_seek_file() can't always take you to the exact frame you want, but that is OK for me. I just want to jump to the nearest keyframe. So I open the video, find the video stream and call it like this:
avformat_seek_file( formatContext, streamId, 0, frameNumber, frameNumber, AVSEEK_FLAG_FRAME )
It always returns 0, which I take to mean success. However, it doesn't work as it should. I check the byte position, like here, before and after calling avformat_seek_file(). It does change, but it always changes in the same way no matter which target frame number I pass! I mean that the byte position after this call is always the same, even with different frameNumber values. Obviously I'm doing something wrong, but I don't know what exactly. I don't know if it matters, but I'm using .h264 files. I tried different flags, different files, using timestamps instead of frames, flushing buffers before and after, and so on, but it doesn't work for me. I would be very grateful if someone could show me what is wrong.
I had the same issue, see the code below (it works for me):
...
checkPosition(input_files[file_index].ctx);
...
void checkPosition(AVFormatContext *is) {
int stream_index = av_find_default_stream_index(is);
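//Convert tm (declared elsewhere, presumably a position in milliseconds) to stream time_base units
tm = av_rescale(tm, is->streams[stream_index]->time_base.den, is->streams[stream_index]->time_base.num);
tm /= 1000;
//SEEK
if (avformat_seek_file(is, stream_index, INT64_MIN, tm, INT64_MAX, 0) < 0) {
// PRId64 comes from <inttypes.h>; the values are 64-bit, so %u/%d would be wrong
av_log(NULL, AV_LOG_ERROR, "ERROR av_seek_frame: %" PRId64 "\n", (int64_t)tm);
} else {
av_log(NULL, AV_LOG_ERROR, "SUCCEEDED av_seek_frame: %" PRId64 " newPos:%" PRId64 "\n", (int64_t)tm, (int64_t)is->pb->pos);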
avcodec_flush_buffers(is->streams[stream_index]->codec);
}
}
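The conversion step in the code above turns a position in milliseconds into ticks of the stream's time_base, which is the unit avformat_seek_file() expects when a stream index is given. A small standalone sketch of the same arithmetic follows; Rational is a stand-in for AVRational and msToStreamTicks is a hypothetical name. In real code, av_rescale_q() with a {1, 1000} source time base does this conversion with rounding and overflow protection.

```cpp
#include <cstdint>

// Stand-in for AVRational: one tick lasts num/den seconds.
struct Rational {
    int num;
    int den;
};

// Hypothetical helper: converts milliseconds to stream ticks.
// ticks = (ms / 1000) / (num / den) = ms * den / (num * 1000)
// The result is truncated toward zero; no overflow checks are done.
int64_t msToStreamTicks(int64_t ms, Rational timeBase)
{
    return ms * timeBase.den / (static_cast<int64_t>(timeBase.num) * 1000);
}
```

For a typical 1/90000 time base, 2000 ms becomes 180000 ticks. Passing a frame number where a timestamp in these units is expected would be one way to end up at the same position every time.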
Your problem may be related to the fact that your input is raw .h264. Try using e.g. mp4box to mux it into a .mp4 file, then load the .mp4 file with ffmpeg and try to seek to a keyframe again, e.g.:
mp4box -new -add my_file.h264 my_file.mp4