Convert video into different qualities with AWS MediaConvert

I have a test.mp4 file (for example). I need to convert it so that the client can select the quality in the player.
For example, if the video is in 4K resolution, the client should be able to choose between auto, 4K, 1080p, 720p, and 480p.
If the video is 1080p, the choices should be auto, 1080p, 720p, and 480p.
And so on.
I know I should convert to Apple HLS and get an .m3u8 file as the output.
I tried using automated ABR, but that's not what I need.
I use AWS MediaConvert for the conversion.

What you are describing sounds like an HLS bitrate stack. I'll answer based on that assumption.
It will be the responsibility of the playback software to present a menu of the available resolutions. If you want the player to disable its adaptive rendition selection logic and permit the viewer to stay on a specified rendition regardless of segment download times, that workflow needs to be configured within the video player object. In either case you will need an asset file group consisting of manifests and segments.
FYI, MediaConvert has both an automatic ABR mode (which determines the number of renditions & bitrate settings automatically) and a 'manual mode' where you provide the parameters of each child rendition. In this mode, each child rendition is added as a separate Output under the main Apple HLS Output Group. More information can be found here: https://docs.aws.amazon.com/mediaconvert/latest/ug/outputs-file-ABR.html.
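For illustration, here is a rough, untested sketch of that "manual mode" using the AWS SDK for C++ (the same structure applies in the console, CLI, or any other SDK). The bucket paths, role ARN, bitrates, and the MakeRendition helper are made up, and the model class and setter names follow the SDK's generated naming, so verify them against your SDK version:

#include <aws/core/Aws.h>
#include <aws/mediaconvert/MediaConvertClient.h>
#include <aws/mediaconvert/model/CreateJobRequest.h>

using namespace Aws::MediaConvert;

// Build one child rendition. Every rendition is its own Output inside the
// same Apple HLS OutputGroup; NameModifier distinguishes the child playlists.
static Model::Output MakeRendition(int width, int height, int bitrate,
                                   const Aws::String& nameModifier)
{
    Model::H264Settings h264;
    h264.SetRateControlMode(Model::H264RateControlMode::CBR);
    h264.SetBitrate(bitrate);

    Model::VideoCodecSettings codec;
    codec.SetCodec(Model::VideoCodec::H_264);
    codec.SetH264Settings(h264);

    Model::VideoDescription video;
    video.SetWidth(width);
    video.SetHeight(height);
    video.SetCodecSettings(codec);

    Model::ContainerSettings container;
    container.SetContainer(Model::ContainerType::M3U8);

    Model::Output out;
    out.SetNameModifier(nameModifier);     // e.g. "_1080p" in the child playlist name
    out.SetContainerSettings(container);
    out.SetVideoDescription(video);        // add AudioDescriptions as needed
    return out;
}

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // Older accounts/SDKs may need the account-specific endpoint via
        // ClientConfiguration::endpointOverride (see DescribeEndpoints).
        MediaConvertClient client;

        Model::HlsGroupSettings hls;
        hls.SetDestination("s3://my-bucket/hls/test");     // hypothetical bucket
        hls.SetSegmentLength(6);
        hls.SetMinSegmentLength(0);

        Model::OutputGroupSettings groupSettings;
        groupSettings.SetType(Model::OutputGroupType::HLS_GROUP_SETTINGS);
        groupSettings.SetHlsGroupSettings(hls);

        Model::OutputGroup group;
        group.SetName("Apple HLS");
        group.SetOutputGroupSettings(groupSettings);
        group.AddOutputs(MakeRendition(1920, 1080, 5000000, "_1080p"));
        group.AddOutputs(MakeRendition(1280,  720, 3000000, "_720p"));
        group.AddOutputs(MakeRendition( 854,  480, 1200000, "_480p"));

        Model::Input input;
        input.SetFileInput("s3://my-bucket/source/test.mp4");

        Model::JobSettings settings;
        settings.AddInputs(input);
        settings.AddOutputGroups(group);

        Model::CreateJobRequest request;
        request.SetRole("arn:aws:iam::123456789012:role/MediaConvertRole"); // hypothetical role
        request.SetSettings(settings);

        auto outcome = client.CreateJob(request);
        // check outcome.IsSuccess() / outcome.GetError() in real code
    }
    Aws::ShutdownAPI(options);
    return 0;
}

All the Outputs share the single HLS group destination, so MediaConvert writes one master .m3u8 plus one child playlist per rendition; the player's quality menu is then built from the master playlist.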

Related

How to improve the transcription quality in AWS Transcribe

I have a few audio files, which are conversations between a customer and an agent, stored in S3.
I am converting the audio files to text using AWS Transcribe, and the jobs complete successfully.
But the weird part is that the output is not even 60% accurate. This is my configuration for AWS Transcribe:
1) Language code: English (Indian)
2) Audio frequency: 8000 Hz
3) Format: WAV
As per these guidelines (https://docs.aws.amazon.com/transcribe/latest/dg/limits-guidelines.html),
I set the audio frequency to 8 kHz and the format to WAV.
Do I need to change any other parameters to improve the transcription accuracy?
Any help is appreciated.
Thanks,
Harry
Many things can affect transcript quality, such as background noise in the audio, speaker overlap, and the speakers' accents. Higher-quality audio usually gives better results.
You can try using custom vocabularies. You can create them as described here: https://docs.aws.amazon.com/transcribe/latest/dg/how-vocabulary.html
The custom vocabulary should contain keywords that are likely to be spoken and are specific to your domain. However, in my experience these custom vocabularies sometimes overfit (the transcript incorrectly contains words taken from the custom vocabulary).
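As a rough, untested sketch of that workflow with the AWS SDK for C++ (the header paths, enum spellings, vocabulary name, and example phrases are assumptions to check against your SDK version):

#include <aws/core/Aws.h>
#include <aws/transcribe/TranscribeServiceClient.h>
#include <aws/transcribe/model/CreateVocabularyRequest.h>
#include <aws/transcribe/model/StartTranscriptionJobRequest.h>

using namespace Aws::TranscribeService;

int main()
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        TranscribeServiceClient client;

        // 1) Create a custom vocabulary with domain-specific terms.
        Model::CreateVocabularyRequest vocab;
        vocab.SetVocabularyName("call-centre-terms");      // hypothetical name
        vocab.SetLanguageCode(Model::LanguageCode::en_IN);
        vocab.AddPhrases("upsell");                        // example phrases only
        vocab.AddPhrases("chargeback");
        client.CreateVocabulary(vocab);
        // wait until the vocabulary state is READY before starting jobs

        // 2) Reference the vocabulary when starting the transcription job.
        Model::Settings settings;
        settings.SetVocabularyName("call-centre-terms");

        Model::Media media;
        media.SetMediaFileUri("s3://my-bucket/calls/agent-call-01.wav");  // hypothetical

        Model::StartTranscriptionJobRequest job;
        job.SetTranscriptionJobName("agent-call-01");
        job.SetLanguageCode(Model::LanguageCode::en_IN);
        job.SetMediaFormat(Model::MediaFormat::wav);
        job.SetMediaSampleRateHertz(8000);
        job.SetMedia(media);
        job.SetSettings(settings);
        client.StartTranscriptionJob(job);
        // check the returned outcomes in real code
    }
    Aws::ShutdownAPI(options);
    return 0;
}

The same two calls exist in every SDK and in the console; the key point is that the vocabulary must be in the READY state and then referenced in the job's Settings.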

HLS, AWS Elastic Encoder, and adaptive streaming

I'm currently working on a simple browser-based VOD service, using mostly AWS technologies. HLS will be used as the streaming protocol, which is supported by Elastic Transcoder.
Currently, the source material is 720p (1280x720), and this is also the resolution I want to show to all devices that can handle it. I would like the videos to work on desktops, iPads, and most smartphones. I'm using ViBlast with videojs as the player.
I have the following questions:
The m3u8 playlist allows specifying multiple streams. Should each resolution get its own playlist (with different source streams at different bitrates), or can I put everything in one playlist (so one playlist can serve different resolutions and bitrates)?
It seems desktops and most recent tablets can display 1280x720, so I assume the same playlist can be used; I just need to specify bitrates. However, what is the best resolution for mobile phones? It seems every device has different dimensions (looking at Android here).
Which bitrate should I use for each device? I'm doing some research, but it seems every article has a different recommendation for the "best" setting and never explains how they got those numbers.
If I use a playlist which contains different sources with different resolutions, does the order in the playlist matter? I've read somewhere that the lowest bitrate should be listed first, but does this also apply to resolutions? Or does the player automatically pick the stream which best matches the screen?
I'm looking for a "good enough" solution that will fit most devices.
Hope someone can help.
The m3u8 playlist allows specifying multiple streams. Should each resolution get its own playlist (with different source streams at different bitrates), or can I put everything in one playlist (so one playlist can serve different resolutions and bitrates)?
For reference, here is Apple's 'Technical Note TN2224' on the subject which is a good guideline for the info below.
https://developer.apple.com/library/content/technotes/tn2224/_index.html
Short answer: Each resolution should have its own variant playlist.
Typically there is one master playlist with references to the variant playlists (aka renditions). The variant playlists are different quality streams of the same video, varying in bitrate and resolution. But each variant only contains one bitrate level. Sample master playlist:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=4648000,RESOLUTION=3840x2160
4648k/stream.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2670000,RESOLUTION=1920x1080
2670k/stream.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1823000,RESOLUTION=1280x720
1823k/stream.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=975000,RESOLUTION=854x480
975k/stream.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=491000,RESOLUTION=640x360
491k/stream.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=186000,RESOLUTION=256x144
186k/stream.m3u8
"The bitrates are specified in the EXT-X-STREAM-INF tag using the BANDWIDTH attribute" (TN2224). And each descending bandwidth (bitrate) level has a corresponding lower resolution because there is less data available and usually expected to be viewed on smaller, mobile screens.
It seems desktops and most recent tablets can display 1280x720, so I assume the same playlist can be used; I just need to specify bitrates. However, what is the best resolution for mobile phones? It seems every device has different dimensions (looking at Android here).
Resolution and bitrate go together. A stream encoded at a 186K bitrate (very low) does not have enough data to fill a 1280x720 screen. But a mobile device on a cell network might not be able to download a high bitrate. So you need several variant options available, each with an appropriate resolution and bitrate.
Don't focus on a specific device or you'll never finish. Build a ladder of bitrate/resolution variants using common 16:9 aspect ratios, e.g. 1280x720, 1024x576, 640x360, ...
There are several other things to consider, though. Bitrate and resolution you are already considering, but are these videos encoded with H.264? If so, you should also consider the profile and level. Here is a good article on the topic: http://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=94216&PageNum=1.
Which bitrate should I use for each device? I'm doing some research, but it seems every article has a different recommendation for the "best" setting and never explains how they got those numbers.
Same answer as resolution. Don't focus on the actual device. Build a ladder of bitrate/resolution variants that allows the device to select the most appropriate based on available bandwidth, battery life, processing power, etc.
If I use a playlist which contains different sources with different resolutions, does the order in the playlist matter? I've read somewhere that the lowest bitrate should be listed first, but does this also apply to resolutions? Or does the player automatically pick the stream which best matches the screen?
Each publisher or manufacturer might build their player differently. But this is what Apple recommends in TN2224.
"First bit rate should be one that most clients can sustain
The first entry in the master playlist will be played at the initiation of a stream and is used as part of a test to determine which stream is most appropriate. The order of the other streams is irrelevant. Therefore, the first bit rate in the playlist should be the one that most clients can sustain."
Hope that helps.
Ian

How to use an MFT in a Windows application without using the Media Foundation pipeline

I am a newbie in Media Foundation programming, and in Windows programming as well.
It might look like a very silly question, but I didn't get a clear answer anywhere.
My application captures the screen, scales, encodes, and sends the data over the network. I am looking to improve the performance of my pipeline, so I want to swap out some intermediate libraries, such as the scaling or encoding libraries.
After a lot of searching for better scaling and encoding options, I ended up with some MFTs (Media Foundation Transforms), e.g. the Video Processor MFT and the H.264 Video Encoder MFT.
My application already has its pipeline implemented, and I don't want to change the complete architecture.
Can I directly use an MFT as a library and add it to my project, or do I have to build a complete pipeline with source and sink?
As per the Media Foundation architecture, an MFT is an intermediate block. It exposes methods such as IMFTransform::GetInputStreamInfo and IMFTransform::GetOutputStreamInfo.
Is there any way to call the MFT APIs directly to perform scaling and encoding without creating a complete pipeline?
Please provide a link if a similar question has already been asked.
Yes, you can create the IMFTransform directly and use it in isolation from the pipeline. It is a very typical usage model for an encoder MFT.
You will need to configure the input/output media types, start streaming, feed input frames, and grab output frames.
Depending on whether your transform is synchronous or asynchronous (which may differ between hardware and software implementations of your MFT), you may need to use the basic (https://msdn.microsoft.com/en-us/library/windows/desktop/aa965264(v=vs.85).aspx) or the asynchronous (https://msdn.microsoft.com/en-us/library/windows/desktop/dd317909(v=vs.85).aspx) processing model.
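A minimal, untested sketch of the synchronous processing model for a standalone H.264 encoder MFT is below. It assumes you have already created the encoder (for example via MFTEnumEx with MFT_CATEGORY_VIDEO_ENCODER), called MFStartup, and allocated the input/output samples yourself; the 1280x720/NV12 numbers and the EncodeOneFrame helper are just for illustration:

// Link against mfplat.lib and mfuuid.lib.
#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>

HRESULT EncodeOneFrame(IMFTransform* pEncoder, IMFSample* pInputSample, IMFSample* pOutputSample)
{
    // For encoder MFTs, set the output (compressed) type first, then the input type.
    IMFMediaType* pOut = nullptr;
    HRESULT hr = MFCreateMediaType(&pOut);
    if (FAILED(hr)) return hr;
    pOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pOut->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    pOut->SetUINT32(MF_MT_AVG_BITRATE, 4000000);
    pOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pOut, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(pOut, MF_MT_FRAME_RATE, 30, 1);
    hr = pEncoder->SetOutputType(0, pOut, 0);
    pOut->Release();
    if (FAILED(hr)) return hr;

    IMFMediaType* pIn = nullptr;
    hr = MFCreateMediaType(&pIn);
    if (FAILED(hr)) return hr;
    pIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pIn->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);   // uncompressed input format
    pIn->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pIn, MF_MT_FRAME_SIZE, 1280, 720);
    MFSetAttributeRatio(pIn, MF_MT_FRAME_RATE, 30, 1);
    hr = pEncoder->SetInputType(0, pIn, 0);
    pIn->Release();
    if (FAILED(hr)) return hr;

    // Tell the MFT that streaming starts, then push a frame and pull output.
    pEncoder->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0);
    pEncoder->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0);

    hr = pEncoder->ProcessInput(0, pInputSample, 0);
    if (FAILED(hr)) return hr;

    // This assumes the MFT does NOT allocate its own output samples; check
    // IMFTransform::GetOutputStreamInfo for MFT_OUTPUT_STREAM_PROVIDES_SAMPLES.
    MFT_OUTPUT_DATA_BUFFER outBuf = {};
    outBuf.dwStreamID = 0;
    outBuf.pSample = pOutputSample;       // caller-allocated sample with a media buffer
    DWORD status = 0;
    hr = pEncoder->ProcessOutput(0, 1, &outBuf, &status);
    // MF_E_TRANSFORM_NEED_MORE_INPUT means the encoder wants more frames before
    // it can emit a compressed sample; keep calling ProcessInput in that case.
    return hr;
}

In a real loop you would call ProcessInput/ProcessOutput repeatedly and drain the MFT with MFT_MESSAGE_COMMAND_DRAIN at the end; an asynchronous (typically hardware) MFT instead signals readiness through IMFMediaEventGenerator as described in the second link above.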

Selectively 'turn off' more than one output pin stream in a DirectShow filter

I'm sure this question has been asked before, but I've searched and can't find anything specific that helps towards a solution.
I'll start by outlining the initial concerns, and if more in-depth technical information is needed I can give it. Hopefully there is enough information for the initial question(s).
I'm writing an app using C++ and DirectShow in Visual Studio 2010. The main project specification is a live preview and, at any time of choosing, recording the video as MPEG-2 to the hard drive and then to DVD, to be played in a standard DVD player; the live preview must never be interrupted.
The capturing seems a pretty trivial, straightforward thing to do with DirectShow.
There are a couple of custom filters that I wrote. Nothing amazing, but we wanted our own custom screen overlay information (time and date, etc.), and this must be in both the preview and the recorded file. I connect the AVI Decompressor to the capture card's video out pin, then connect the AVI Decompressor to my filter, which gives me an RGB image that I can manipulate. The output from this filter is then split via an Infinite Pin Tee filter: one branch goes to the screen, the other goes into the MS MPEG-2 encoder. The audio goes from the capture card's audio out into the same MPEG-2 encoder. Output from the MPEG-2 encoder then goes to a file. That file then gets authored for DVD and burnt to DVD.
So my questions are...
Where and how would be the best place to allow starting and stopping of only the MPEG-2 file output, via user action?
I have tried using Smart Tee filters (one for video and one for audio) as the last filters before the MPEG-2 encoder, then using the IAMStreamControl interface to turn off the pins at the appropriate time. Should this cause any timing issues with the final MPEG-2? The output file plays in MPlayer, VLC, etc., but doesn't convert to a DVD-compliant MPEG-2 (testing via various DVD authoring tools, they complain of a broken file and sometimes give time references). Is it possible that the timestamps in the file are the problem and are causing an error? If the file is captured from the first moment capture commences (as opposed to, say, after 5 minutes of streaming), then everything is OK.
I did think of using the Stream Buffer route (http://msdn.microsoft.com/en-gb/library/windows/desktop/dd693041(v=vs.85).aspx), but I'm not sure of the best direction to take things. It seems there are possibly a few choices.
Any help and tips would be greatly appreciated, especially websites/books/information on DirectShow filters, pins, graphs, and how they all flow together.
EDIT: I was thinking of making my own version of the Smart Tee filter, in which I would have 2 input pins (audio and video) and 4 output pins: 2 for video (1 preview and 1 capture) and 2 of the same for audio. But would I end up with the same issue? And what is the correct way to handle 'switching off' the capture pins of such a custom filter? Would I be wasting my time working on something like this? Is it a simple case of overriding the Active/Inactive methods of the output pin(s) and either sending or not sending the sample? I feel it's not that easy.
Many thanks!
Where and how would be the best place to allow starting and stopping of only the MPEG-2 file output, via user action?
For this kind of action I would recommend GMFBridge. Creating your own filter is not easy. GMFBridge allows you to use two separate graphs with a dynamic connection. Use the first graph for the preview and the second graph for the file output. And only connect the graphs after a user action.
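As a rough illustration of that pattern, here is a sketch using GMFBridge. The interface and method names are recalled from the GMFBridge sample code, and pPreviewGraph / pFileGraph are hypothetical graph pointers, so verify everything against the GDCL sources before relying on it:

#include <atlbase.h>
#include <dshow.h>
#include "GMFBridge.h"   // interfaces from the GMFBridge project (GDCL)

// One bridge controller; a sink filter lives in the always-running preview
// graph, a source filter lives in the MPEG-2/file graph. The two graphs are
// connected only while recording, so the preview is never interrupted.
HRESULT SetupBridge(IGraphBuilder* pPreviewGraph, IGraphBuilder* pFileGraph,
                    CComPtr<IGMFBridgeController>& pBridge,
                    CComPtr<IUnknown>& pSink, CComPtr<IUnknown>& pSource)
{
    HRESULT hr = pBridge.CoCreateInstance(__uuidof(GMFBridgeController));
    if (FAILED(hr)) return hr;

    pBridge->AddStream(TRUE,  eMuxInputs, TRUE);   // video stream
    pBridge->AddStream(FALSE, eMuxInputs, TRUE);   // audio stream

    // The sink goes where the Smart Tee capture pins used to feed the encoder.
    hr = pBridge->InsertSinkFilter(pPreviewGraph, &pSink);
    if (FAILED(hr)) return hr;

    // The source feeds the MS MPEG-2 encoder -> file writer in the second graph.
    return pBridge->InsertSourceFilter(pSink, pFileGraph, &pSource);
}

// "Start recording": run the file graph, then bridge the graphs:
//     pBridge->BridgeGraphs(pSink, pSource);
// "Stop recording": un-bridge (see the GMFBridge sample app for the exact call),
// then stop the file graph; the preview graph keeps running untouched, so the
// recorded file starts with clean timestamps each time.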

Convert Movie to OpenNI *.oni video

The Kinect OpenNI library uses a custom video file format to store videos that contain RGB+D information. These videos have the extension *.oni. I am unable to find any information or documentation whatsoever on the ONI video format.
I'm looking for a way to convert a conventional RGB video to a *.oni video. The depth channel can be left blank (i.e. zeroed out). For example purposes, I have an MPEG-4 encoded .mov file with audio and video channels.
There are no restrictions on how this conversion must be made; I just need to convert it somehow! I.e., ImageMagick, ffmpeg, and mencoder are all OK, as is custom conversion code in C/C++, etc.
So far, all I can find is one C++ conversion utility in the OpenNI sources. From the looks of it, though, this converts from one *.oni file to another. I've also managed to find C++ code by a PhD student that converts images from an academic database into a *.oni file. Unfortunately, the code is in Spanish, which is not one of my native languages.
Any help or pointers much appreciated!
EDIT: As my use case is a little odd, some explanation may be in order. The OpenNI drivers (in my case I'm using the excellent Kinect for Matlab library) allow you to specify a *.oni file when creating the Kinect context. This lets you emulate having a real Kinect attached that is receiving video data, which is useful when you're testing or developing code (you don't need to have the Kinect attached to do this). In my particular case, we will be using a Kinect in the production environment (process control in a factory), but during development all I have is a video file :) Hence the desire to convert to a *.oni file. We aren't using the depth channel at the moment, hence not caring about it.
I don't have a complete answer for you, but take a look at the NiRecordRaw and NiRecordSynthetic examples in OpenNI/Samples. They demonstrate how to create an ONI with arbitrary or modified data. See how MockDepthGenerator is used in NiRecordSynthetic -- in your case you will need MockImageGenerator.
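For what it's worth, here is a very rough, untested outline of that approach with the OpenNI 1.x C++ wrapper. The class names (xn::MockImageGenerator, xn::Recorder) are the ones used by the samples, but the exact SetData overload, the codec constant, and the 640x480@30 mode are assumptions you should check against XnCppWrapper.h and the NiRecordSynthetic sample:

#include <XnCppWrapper.h>

// Outline: create a mock image node, attach a recorder writing a .oni file,
// then push one RGB24 frame per video frame you decode from the .mov
// (e.g. with ffmpeg/libav). The depth node can simply be omitted or zeroed.
int main()
{
    xn::Context context;
    context.Init();

    xn::MockImageGenerator mockImage;
    mockImage.Create(context);

    XnMapOutputMode mode;
    mode.nXRes = 640;   // match the resolution of your decoded frames
    mode.nYRes = 480;
    mode.nFPS  = 30;
    mockImage.SetMapOutputMode(mode);

    xn::Recorder recorder;
    recorder.Create(context);
    recorder.SetDestination(XN_RECORD_MEDIUM_FILE, "converted.oni");
    recorder.AddNodeToRecording(mockImage, XN_CODEC_JPEG);

    // For each decoded frame: hand the raw RGB24 bytes to the mock node and
    // record it. Frame IDs count up from 1, timestamps are in microseconds.
    // (The raw SetData overload used here is an assumption -- confirm its
    //  parameter order in XnCppWrapper.h before use.)
    // mockImage.SetData(frameId, timestampUs, dataSizeBytes, pRgb24Pixels);
    // recorder.Record();

    context.Release();
    return 0;
}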
For more details you may want to ask in the openni-dev google group.
Did you look into the NiConvertXToONI command and its associated documentation?
NiConvertXToONI opens any recording, takes every node within it, and records it to a new ONI recording. It receives both the input file and the output file from the command line.