Core Audio: specify which audio track to decode - C++

I am able to successfully get the decoded PCM data of an audio file using the Core Audio API. Below is the reduced code that shows how I do it:
CFStringRef urlStr = CFStringCreateWithCString(kCFAllocatorDefault, "file.m4a", kCFStringEncodingUTF8);
CFURLRef urlRef = CFURLCreateWithFileSystemPath(NULL, urlStr, kCFURLPOSIXPathStyle, false);
ExtAudioFileOpenURL(urlRef, &m_audioFile);
// Describe the client (output) format: interleaved 16-bit signed integer PCM.
bzero(&m_outputFormat, sizeof(AudioStreamBasicDescription));
m_outputFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
m_outputFormat.mSampleRate = m_inputFormat.mSampleRate;
m_outputFormat.mFormatID = kAudioFormatLinearPCM;
m_outputFormat.mChannelsPerFrame = m_inputFormat.mChannelsPerFrame;
m_outputFormat.mBytesPerFrame = sizeof(short) * m_outputFormat.mChannelsPerFrame;
m_outputFormat.mBitsPerChannel = sizeof(short) * 8;
m_outputFormat.mFramesPerPacket = 1;
m_outputFormat.mBytesPerPacket = m_outputFormat.mBytesPerFrame * m_outputFormat.mFramesPerPacket;
ExtAudioFileSetProperty(m_audioFile, kExtAudioFileProperty_ClientDataFormat, sizeof(m_outputFormat), &m_outputFormat);
// Read the decoded samples into an interleaved buffer.
short* transformData = new short[m_sampleCount];
AudioBufferList fillBufList;
fillBufList.mNumberBuffers = 1;
fillBufList.mBuffers[0].mNumberChannels = m_outputFormat.mChannelsPerFrame;
fillBufList.mBuffers[0].mDataByteSize = m_sampleCount * sizeof(short);
fillBufList.mBuffers[0].mData = (void*)transformData;
// m_frameCount is in/out: pass the frames requested, get back the frames actually read.
ExtAudioFileRead(m_audioFile, &m_frameCount, &fillBufList);
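For completeness, m_inputFormat is assumed to have been queried from the file beforehand, roughly like this:
UInt32 propSize = sizeof(m_inputFormat);
ExtAudioFileGetProperty(m_audioFile, kExtAudioFileProperty_FileDataFormat, &propSize, &m_inputFormat);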
I am interested in how I can specify which audio track to decode (supposing the media file contains more than one).

One method is to decode all tracks and then extract (copy) the desired track data (every other sample for interleaved stereo, etc.) into another buffer, array, or file. Compared to the decode time, the extra copy time is insignificant.
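For example, with the interleaved 16-bit client format configured in the question, pulling out one channel is just a strided copy. A minimal sketch, reusing transformData and m_frameCount from the question (channelIndex is a placeholder):
UInt32 channelCount = m_outputFormat.mChannelsPerFrame;
UInt32 channelIndex = 0; // e.g. the left channel of a stereo pair
short* channelData = new short[m_frameCount];
for (UInt32 i = 0; i < m_frameCount; ++i)
    channelData[i] = transformData[i * channelCount + channelIndex];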


How to encapsulate the H.264 bitstream of a video file in C++

I'm trying to convert a video file (.mp4) to a DICOM file.
I have succeeded in doing it by storing single images (one per frame of the video) in the DICOM, but the resulting file is too large; that's not good for me.
Instead, I want to encapsulate the H.264 bitstream, as it is stored in the video file, into the DICOM file.
I've tried to get the bytes of the file as follows:
std::ifstream inFile(file_name, std::ifstream::binary);
inFile.seekg(0, inFile.end);
std::streampos length = inFile.tellg();
inFile.seekg(0, inFile.beg);
std::vector<unsigned char> bytes(length);
inFile.read((char*)&bytes[0], length);
but I think I have missed something like encapsulating the read bytes, because the resulting DICOM file was a black image.
In Python I would use the pydicom.encaps.encapsulate function for this purpose:
https://pydicom.github.io/pydicom/dev/reference/generated/pydicom.encaps.encapsulate.html
with open(videofile, 'rb') as f:
    dataset.PixelData = encapsulate([f.read()])
Is there anything in C++ that is equivalent to the encapsulate function?
Or any different way to get the encapsulated pixel data of the video as one stream and not frame by frame?
This is the code that initializes the DcmDataset using the extracted bytes:
VideoFileStream* vfs = new VideoFileStream();
vfs->setFilename(file_name);
if (!vfs->open())
    return false;
DcmDataset* dataset = new DcmDataset();
dataset->putAndInsertOFStringArray(DCM_SeriesInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_SERIES_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_SOPInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_INSTANCE_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_StudyInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_STUDY_UID_ROOT));
dataset->putAndInsertOFStringArray(DCM_MediaStorageSOPInstanceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_UID_ROOT));
dataset->putAndInsertString(DCM_MediaStorageSOPClassUID, UID_VideoPhotographicImageStorage);
dataset->putAndInsertString(DCM_SOPClassUID, UID_VideoPhotographicImageStorage);
dataset->putAndInsertOFStringArray(DCM_TransferSyntaxUID, UID_MPEG4HighProfileLevel4_1TransferSyntax);
dataset->putAndInsertOFStringArray(DCM_PatientID, "987655");
dataset->putAndInsertOFStringArray(DCM_StudyDate, "20050509");
dataset->putAndInsertOFStringArray(DCM_Modality, "ES");
dataset->putAndInsertOFStringArray(DCM_PhotometricInterpretation, "YBR_PARTIAL_420");
dataset->putAndInsertUint16(DCM_SamplesPerPixel, 3);
dataset->putAndInsertUint16(DCM_BitsAllocated, 8);
dataset->putAndInsertUint16(DCM_BitsStored, 8);
dataset->putAndInsertUint16(DCM_HighBit, 7);
dataset->putAndInsertUint16(DCM_Rows, vfs->height());
dataset->putAndInsertUint16(DCM_Columns, vfs->width());
dataset->putAndInsertUint16(DCM_CineRate, vfs->framerate());
dataset->putAndInsertUint16(DCM_FrameTime, 1000.0 * 1 / vfs->framerate());
const Uint16* arr = new Uint16[4]{ 0x18, 0x00, 0x63, 0x10 };
dataset->putAndInsertUint16Array(DCM_FrameIncrementPointer, arr, 4);
dataset->putAndInsertString(DCM_NumberOfFrames, std::to_string(vfs->numFrames()).c_str());
dataset->putAndInsertOFStringArray(DCM_FrameOfReferenceUID, dcmGenerateUniqueIdentifier(new char[100], SITE_UID_ROOT));
dataset->putAndInsertUint16(DCM_PixelRepresentation, 0);
dataset->putAndInsertUint16(DCM_PlanarConfiguration, 0);
dataset->putAndInsertOFStringArray(DCM_ImageType, "ORIGINAL");
dataset->putAndInsertOFStringArray(DCM_LossyImageCompression, "01");
dataset->putAndInsertOFStringArray(DCM_LossyImageCompressionMethod, "ISO_14496_10");
dataset->putAndInsertUint16(DCM_LossyImageCompressionRatio, 30);
dataset->putAndInsertUint8Array(DCM_PixelData, (const Uint8 *)bytes.data(), length);
DJ_RPLossy repParam;
dataset->chooseRepresentation(EXS_MPEG4HighProfileLevel4_1, &repParam);
dataset->updateOriginalXfer();
DcmFileFormat fileformat(dataset);
OFCondition status = fileformat.saveFile("C://temp//videoTest", EXS_LittleEndianExplicit);
The trick is to redirect the value of the attribute PixelData to a file stream. With this, the video is loaded in chunks and on demand (i.e. when the attribute is accessed).
But you have to create the whole structure explicitly, that is:
The Pixel Data element
The Pixel Sequence with...
...the offset table
...a single item containing the contents of the MPEG file
Code
// set length to the size of the video file
DcmInputFileStream dcmFileStream(videofile.c_str(), 0);
DcmPixelSequence* pixelSequence = new DcmPixelSequence(DCM_PixelSequenceTag);
// Empty basic offset table as the first item of the pixel sequence.
DcmPixelItem* offsetTable = new DcmPixelItem(DCM_PixelItemTag);
pixelSequence->insert(offsetTable);
// A single item holding the entire MPEG stream, fed from the file stream on demand.
DcmPixelItem* frame = new DcmPixelItem(DCM_PixelItemTag);
frame->createValueFromTempFile(dcmFileStream.newFactory(), OFstatic_cast(Uint32, length), EBO_LittleEndian);
pixelSequence->insert(frame);
DcmPixelData* pixelData = new DcmPixelData(DCM_PixelData);
pixelData->putOriginalRepresentation(EXS_MPEG4HighProfileLevel4_1, nullptr, pixelSequence);
dataset->insert(pixelData, true);
DcmFileFormat fileformat(dataset);
OFCondition status = fileformat.saveFile("C://temp//videoTest");
Note that you "destroy" the compression if you save the file in Implicit VR Little Endian.
As mentioned above, and as is obvious in the code, the whole MPEG file is wrapped into a single item in the PixelData. This is DICOM-conformant, but you may want to encapsulate single frames, each in its own item.
Note: no error handling is shown here.
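Accordingly, a minimal sketch of a save call that keeps the compressed representation (assuming the dataset and pixel sequence built above; the output path is arbitrary):
DcmFileFormat fileformat(dataset);
// Write with the matching MPEG-4 transfer syntax so the encapsulated stream is preserved.
OFCondition status = fileformat.saveFile("C://temp//videoTest.dcm", EXS_MPEG4HighProfileLevel4_1);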

openh264 - bEnableFrameSkip=0, bitrate can't be controlled

There are a lot of questions asked regarding OpenCV + H.264, but none of them give a detailed explanation.
I am using openh264 (openh264-1.4.0-win32msvc.dll) along with OpenCV 3.1 (a custom CMake build with FFmpeg enabled) in Visual Studio. I wanted to save video coming from a webcam in MP4 format with H.264 compression:
VideoWriter write = VideoWriter("D:/movie.mp4", CV_FOURCC('H', '2', '6', '4'), 10.0, cv::Size(192, 144), true);
Before using openh264, I was seeing a warning message in the console window:
"Failed to load openh264 library: openh264-1.4.0-win32msvc.dll
please check your environment and/or download from here:
https://github.com/cisco/openh264/releases"
(also, the video was not being saved)
So I downloaded the DLL and kept it in a folder with my program file (exe).
Now when I run the program, I'm seeing a different message:
"[OpenH264] this = 0x0DE312C0, warning: bEnabledFrameSkip=0, bitrate can't be controlled for RC_QUALITY_MODE and RC_TIMESTAMP_MODE without enabling skip frame"
(now the video is saved, but its size is very high! The bit rate is around 1200 Kbps.)
For me, the sole purpose of using H.264 is to reduce the file size. I think I may have to build openh264 myself with some changes to remove this warning; can anyone guide me on how? Or tell me if there is a way to reduce the bit rate somehow through code?
P.S.: I tried giving -1 instead of CV_FOURCC(); the 'installed codecs' window on my system showed up, but I couldn't find h264, x264, or h264vfw, even though I have installed a variety of codec packs and H.264 from here.
Thanks & regards
If you want to control the bitrate, you have to set both:
encoderParameters.iRCMode = RC_BITRATE_MODE;
encoderParameters.bEnableFrameSkip = true;
Here I am showing all the openh264 encoding parameters as an example:
long nReturnedValueFromEncoder = WelsCreateSVCEncoder(&m_pSVCVideoEncoder);
m_nVideoWidth = 352;
m_nVideoHeight = 288;
SEncParamExt encoderParameters;
memset(&encoderParameters, 0, sizeof(SEncParamExt));
m_pSVCVideoEncoder->GetDefaultParams(&encoderParameters);
encoderParameters.iUsageType = CAMERA_VIDEO_REAL_TIME;
encoderParameters.iTemporalLayerNum = 0;
encoderParameters.uiIntraPeriod = 15;
encoderParameters.eSpsPpsIdStrategy = INCREASING_ID;
encoderParameters.bEnableSSEI = false;
encoderParameters.bEnableFrameCroppingFlag = true;
encoderParameters.iLoopFilterDisableIdc = 0;
encoderParameters.iLoopFilterAlphaC0Offset = 0;
encoderParameters.iLoopFilterBetaOffset = 0;
encoderParameters.iMultipleThreadIdc = 0;
encoderParameters.iRCMode = RC_BITRATE_MODE;
encoderParameters.iMinQp = 0;
encoderParameters.iMaxQp = 52;
encoderParameters.bEnableDenoise = false;
encoderParameters.bEnableSceneChangeDetect = false;
encoderParameters.bEnableBackgroundDetection = true;
encoderParameters.bEnableAdaptiveQuant = false;
encoderParameters.bEnableFrameSkip = true;
encoderParameters.bEnableLongTermReference = true;
encoderParameters.iLtrMarkPeriod = 20;
encoderParameters.bPrefixNalAddingCtrl = false;
encoderParameters.iSpatialLayerNum = 1;
SSpatialLayerConfig *spatialLayerConfiguration = &encoderParameters.sSpatialLayers[0];
spatialLayerConfiguration->uiProfileIdc = PRO_BASELINE;
encoderParameters.iPicWidth = spatialLayerConfiguration->iVideoWidth = m_nVideoWidth;
encoderParameters.iPicHeight = spatialLayerConfiguration->iVideoHeight = m_nVideoHeight;
encoderParameters.fMaxFrameRate = spatialLayerConfiguration->fFrameRate = (float)30;
encoderParameters.iTargetBitrate = spatialLayerConfiguration->iSpatialBitrate = 500000; // target: 500 kbps
encoderParameters.iMaxBitrate = spatialLayerConfiguration->iMaxSpatialBitrate = 500000; // cap at the same rate
spatialLayerConfiguration->iDLayerQp = 24;
//spatialLayerConfiguration->sSliceCfg.uiSliceMode = SM_SINGLE_SLICE; // older API
spatialLayerConfiguration->sSliceArgument.uiSliceMode = SM_SINGLE_SLICE;
nReturnedValueFromEncoder = m_pSVCVideoEncoder->InitializeExt(&encoderParameters);
Hope it helps.
You can simply control the bitrate using RC_BITRATE_MODE and enabling bEnableFrameSkip in openh264.
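To round this out, a rough sketch of feeding one I420 frame to the encoder initialized above; pY, pU and pV are placeholders for your capture planes:
SSourcePicture sourcePicture;
memset(&sourcePicture, 0, sizeof(SSourcePicture));
sourcePicture.iColorFormat = videoFormatI420;
sourcePicture.iPicWidth = m_nVideoWidth;
sourcePicture.iPicHeight = m_nVideoHeight;
sourcePicture.iStride[0] = m_nVideoWidth;
sourcePicture.iStride[1] = sourcePicture.iStride[2] = m_nVideoWidth / 2;
sourcePicture.pData[0] = pY; // luma plane
sourcePicture.pData[1] = pU; // chroma planes
sourcePicture.pData[2] = pV;
SFrameBSInfo frameInfo;
memset(&frameInfo, 0, sizeof(SFrameBSInfo));
int rv = m_pSVCVideoEncoder->EncodeFrame(&sourcePicture, &frameInfo);
if (rv == cmResultSuccess && frameInfo.eFrameType != videoFrameTypeSkip) {
    // frameInfo.sLayerInfo[0 .. iLayerNum-1] now holds the encoded NAL units.
}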

What is the difference in content between an image object and image raw data?

I want to send an image over a UDP network in small packets of 1024 bytes.
I have two options:
imgBinaryFormatter->Serialize(memStream, objNewImage); // Sending the image object
OR
imgBinaryFormatter->Serialize(memStream, objNewImage->RawData); // Sending the raw data of the image
What is the difference in their content, and when should each be used?
For reference, the full function is given below:
Image^ objNewImage = Image::FromFile(fullPath); // fullPath is the full path of an image
MemoryStream^ memStream = gcnew MemoryStream();
Formatters::Binary::BinaryFormatter^ imgBinaryFormatter = gcnew Formatters::Binary::BinaryFormatter(); // binary formatter
imgBinaryFormatter->Serialize(memStream, objNewImage); // or objNewImage->RawData ??
arrImgArray = memStream->ToArray(); // convert the stream to a byte array
int iNoOfPackets = arrImgArray->Length / 1024;
// Send each full 1024-byte chunk (the original loop skipped the last full chunk).
for (int i = 0; i < iNoOfPackets; i++) {
    socket->SendTo(arrImgArray, 1024 * i, 1024, SocketFlags::None, receiversAdd);
}
int remainedBytes = arrImgArray->Length - 1024 * iNoOfPackets;
if (remainedBytes > 0)
    socket->SendTo(arrImgArray, 1024 * iNoOfPackets, remainedBytes, SocketFlags::None, receiversAdd);
If you find improvements in the code, feel free to edit it with a solution suitable for a memory-constrained application.
It's better to use the Image.Save(Stream, ImageFormat) method for serialization into a stream, and the Image.FromStream(Stream) method for deserialization from a stream, or one of their overloads.
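A minimal sketch of that approach in C++/CLI, with PNG chosen arbitrarily as the wire format and receivedBytes standing in for the reassembled buffer on the receiving side:
Image^ objNewImage = Image::FromFile(fullPath);
MemoryStream^ memStream = gcnew MemoryStream();
objNewImage->Save(memStream, System::Drawing::Imaging::ImageFormat::Png); // encodes just the image, no serialization metadata
array<Byte>^ arrImgArray = memStream->ToArray();
// ...send arrImgArray in 1024-byte chunks as in the question; then, on the receiver:
MemoryStream^ inStream = gcnew MemoryStream(receivedBytes);
Image^ received = Image::FromStream(inStream);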

Initialising NSInputStream from a particular portion of a file (copied into the Documents directory)?

I am uploading a large file from my iOS app and the file is transferred in chunks. I am using the code below to initialise an NSInputStream for each chunk.
// for example
NSInteger chunkCount = 20;
for (int i = 0; i < chunkCount; i++) {
    NSFileHandle *handle = [NSFileHandle fileHandleForReadingAtPath:filePath];
    [handle seekToFileOffset:(unsigned long long)i * (chunkCount == 1 ? fileSize : chunkSize)];
    NSData *fileData = [handle readDataOfLength:chunkSize];
    NSInputStream *iStream = [[NSInputStream alloc] initWithData:fileData];
}
But I'd like to know whether there is an NSInputStream method that would let me initialise iStream from a range of the file stream rather than from NSData.
Thanks
There is the NSStreamFileCurrentOffsetKey property for file streams, which specifies the read offset:
NSInputStream *s = [NSInputStream inputStreamWithFileAtPath:path];
[s setProperty:@(offset) forKey:NSStreamFileCurrentOffsetKey]; // the value must be an NSNumber

Reading and processing WAV file data in C/C++

I'm currently doing a very, very important school project. I need to extract the information from a WAVE file in C/C++ and use it to obtain the LPC of a voice signal. But in order to do that, I need to do some pre-processing of the signal, like zero-crossing and energy analysis, among other things, which means that I need the sign and a real value for each sample. The problem is that I don't know how to obtain useful information in the correct format for that. I have already read every single field in the file, but I'm not sure I am doing it right. Suggestions, please?
This is the way I read the file at the moment:
readI = fread(&bps, 1, 2, audio);
printf("bits per sample = %d \n", bps);
Thanks in advance.
My first recommendation would be to use some kind of library to help you out. Most sound solutions seem like overkill, so a simple library (like the one recommended in the comments on your question, libsndfile) should do the trick.
If you just want to know how to read WAV files so you can write your own reader (since your school might turn its nose up at having you use a library like any regular person would), a quick Google search will give you all the info you need, and plenty of people have already written tutorials on reading the .wav format.
If you still don't get it, here's some of my own code where I read the header and all other chunks of the WAV/RIFF data until I get to the data chunk. It's based solely on the WAV format specification. Extracting the actual sound data is not very hard: you can either use it raw or convert it to a format you're more comfortable with internally (32-bit uncompressed PCM or something similar).
When looking at the code below, replace reader.Read...( ... ) with equivalent fread calls for integer values of the indicated byte sizes. WavChunks is an enum whose values are the little-endian integer encodings of the chunk IDs that appear in a WAV file, and the format variable is one of the format types that can be stored in a WAV file's fmt chunk:
enum class WavChunks {
    RiffHeader = 0x46464952,      // "RIFF"
    WavRiff = 0x45564157,         // "WAVE"
    Format = 0x20746D66,          // "fmt "
    LabeledText = 0x7478746C,     // "ltxt"
    Instrumentation = 0x74736E69, // "inst"
    Sample = 0x6C706D73,          // "smpl"
    Fact = 0x74636166,            // "fact"
    Data = 0x61746164,            // "data"
    Junk = 0x4B4E554A,            // "JUNK"
};
enum class WavFormat {
    PulseCodeModulation = 0x01,
    IEEEFloatingPoint = 0x03,
    ALaw = 0x06,
    MuLaw = 0x07,
    IMAADPCM = 0x11,
    YamahaITUG723ADPCM = 0x16,
    GSM610 = 0x31,
    ITUG721ADPCM = 0x40,
    MPEG = 0x50,
    Extensible = 0xFFFE
};
int32 chunkid = 0;
bool datachunk = false;
while ( !datachunk ) {
    chunkid = reader.ReadInt32( );
    switch ( (WavChunks)chunkid ) {
    case WavChunks::Format:
        formatsize = reader.ReadInt32( );
        format = (WavFormat)reader.ReadInt16( );
        channels = (Channels)reader.ReadInt16( );
        channelcount = (int)channels;
        samplerate = reader.ReadInt32( );
        bitspersecond = reader.ReadInt32( ); // NB: this fmt field is the average byte rate (bytes/sec)
        formatblockalign = reader.ReadInt16( );
        bitdepth = reader.ReadInt16( );
        if ( formatsize == 18 ) {
            // Non-PCM formats carry an extension field; skip over it.
            int32 extradata = reader.ReadInt16( );
            reader.Seek( extradata, SeekOrigin::Current );
        }
        break;
    case WavChunks::RiffHeader:
        headerid = chunkid;
        memsize = reader.ReadInt32( );   // file size minus 8 bytes
        riffstyle = reader.ReadInt32( ); // should be "WAVE"
        break;
    case WavChunks::Data:
        datachunk = true;
        datasize = reader.ReadInt32( );
        break;
    default: {
        // Unknown chunk: read its size and skip it.
        int32 skipsize = reader.ReadInt32( );
        reader.Seek( skipsize, SeekOrigin::Current );
        break;
    }
    }
}
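Since the question uses plain stdio, a rough fread-based equivalent of the reader calls above might look like this, assuming a little-endian host (WAV data itself is little-endian); read_u16/read_u32 are hypothetical helpers:
#include <cstdio>
#include <cstdint>
// Stand-ins for reader.ReadInt16() / reader.ReadInt32().
static uint16_t read_u16(FILE* f) {
    uint16_t v = 0;
    fread(&v, sizeof v, 1, f);
    return v;
}
static uint32_t read_u32(FILE* f) {
    uint32_t v = 0;
    fread(&v, sizeof v, 1, f);
    return v;
}
// reader.Seek(n, SeekOrigin::Current) becomes: fseek(f, n, SEEK_CUR);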