Video encoding services - web-services

I need to allow users to upload video files in a wide variety of formats, then clean them up and make them kosher for delivery to a dedicated content handler.
I've tried ffmpeg on-site, but it has some serious flaws in regard to H.264.
Then I tried flixcloud.com, which has a very good interface and API and was looking like the perfect solution, except it doesn't report the video frame rate correctly.
Moving on, I tried Ankoder.com, and it does work, but unfortunately its API is somewhat of a mess and has some quirks that are proving difficult to code around.
What other services are out there? I will only accept an answer from someone who has used a video transcoding service.
Update:
Just started looking at http://www.encoding.com/ - seems interesting.

I have worked with a number of transcoding services in the past and have found flaws with just about all of them. For the last 3 projects I have been involved with that required media encoding, I have used Expression Encoder, with great results. The application itself is a pleasure to use, it is simple to achieve the results needed, and the SDK is one of the best out there. Microsoft has definitely done a great job.

Related

Could I forward a stream from my website to youtube through their API?

I'm making a website for my church so we don't have to miss services and the kids don't have to miss their classes during this pandemic. I made it so we can stream directly to the website, but we can't stream to both the website and YouTube at the same time, since both require either an encoder or a webcam. Do you think I'd be able to somehow emulate an encoder to stream to YouTube as well? I don't need help coding it; I just need to know if you think it's possible, and if it is, whether I should code a package so others can do the same. If I can help give other developers a bigger tool belt, I will. Through some modification of django-encode it may work. I don't think it's against their Terms of Service, but I'll do more reading into that.

Ember Way to Add Rss Feed without third party widget, Front-end only

I am using Ember 3.0 at the moment. Wrote my first lines of code in ANY language about 1 year ago (I switched careers from something totally unrelated to development), but I quickly took to ember. So, not a ton of experience, but not none. I am writing a multi-tenant site which will include about 20 different sites, all with one Ember frontend and a RubyOnRails backend. I am about 75% done with the front end, now just loading content into it. I haven’t started on the backend yet, one, because I don’t have MUCH experience with backend stuff, and two, because I haven’t needed it yet. My sites will be informational to begin with and I’ll build it up from there.
So. I am trying to implement a news feed on my site. I need it to pull in multiple RSS feeds, perhaps dozens, filtered by keyword, and display them on my site. I’ve been scouring the web for days just trying to figure out where to get started. I was thinking of writing a service that parses the incoming XML. I tried using a third party widget (which I DON’T really want to do. Everything on my site so far has been built from scratch and I’d like to keep it that way), but in using these third party systems I get some random cross domain errors and node-child errors which only SOMETIMES pop up. Anyway, I’d like to write this myself, if possible, since I’m trying to learn (and my brain is wired to do the code myself - the only way it sticks with me).
Ultimately, every google result I read says RSS feeds are easy to implement. I don’t know where I’m going wrong, but I’m simply looking for:
1: An “Ember-way” starting point. 2: Is this possible without a backend? 3: Do I have to use a third party widget/aggregator? 4: Whatever else you think might help on the subject.
Any help would be appreciated. Here in New Hampshire, there are basically no resources, no meetings, nothing. Thanks for any help.
Based on the results I get back when searching on this topic, it looks like you’ll hit a few snags if you try to do this in the browser:
CORS header issues (sounds like you’ve already hit this)
The joy of working with XML in JavaScript (that just might be sarcasm 😉, it’s actually unlikely to be fun)
If your goal is to do this as a learning exercise, then doing it in JavaScript/Ember will definitely help you learn lots of new things. You might start with this article as a jumping-off point: https://www.raymondcamden.com/2015/12/08/parsing-rss-feeds-in-javascript-options/
However, if you want to have this be maintainable for the long run and want things to go quickly and smoothly, I would highly recommend moving the RSS parsing system into your backend and feeding simple data out to Ember. There are enough gotchas and complexities to RSS feeds over time that using a battle-tested library is going to be your best way to stay sane. And loading that type of library up in Ember (while quite doable) will end up increasing your application size. You will avoid all those snags (and more I’m probably not thinking of) if you move your parsing back to the server ...
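If you do move the parsing server-side, the core of it is small. Here's a minimal sketch, shown in Python for brevity (the same shape applies in a Rails endpoint); the feed content and function name are made up for illustration. It parses an RSS 2.0 document with the standard library and keeps only items matching your keywords, ready to be serialized as JSON for Ember:

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text, keywords):
    """Parse an RSS 2.0 document and keep items whose title or
    description mentions any of the given keywords (case-insensitive)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        desc = item.findtext("description", default="")
        link = item.findtext("link", default="")
        haystack = (title + " " + desc).lower()
        if any(kw.lower() in haystack for kw in keywords):
            items.append({"title": title, "link": link, "description": desc})
    return items

# Tiny inline feed for demonstration.
SAMPLE = """<rss version="2.0"><channel>
  <item><title>Ember 3.0 released</title><link>http://example.com/a</link>
        <description>Framework news</description></item>
  <item><title>Gardening tips</title><link>http://example.com/b</link>
        <description>Nothing about frameworks</description></item>
</channel></rss>"""

print(parse_rss(SAMPLE, ["ember"]))
```

A real endpoint would fetch each feed URL on a schedule, merge the results, and serve them as JSON; the browser side then just renders plain objects and never touches XML or CORS.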

Grabbing frames with the VideoInfoHeader2 structure

I'm working on an application that makes analysis on video files.
Being no expert on DirectShow I used simple code for the analysis
of all frames (SampleGrabber, Callback etc.).
This works fine for all media files, even when decoded with
VideoInfoHeader2 structure (although it shouldn't, as stated everywhere).
The problem is with grabbing a single frame.
For this I used IMediaDet. And this doesn't work if there's only a VideoInfoHeader2 and no VideoInfoHeader.
I tried modifications of my analysis code (OneShot, Seek), but they don't work either.
All the sources on the internet concerning this are not very helpful, as they point to SDK/DX examples that aren't accessible anymore, or they just say that the modification would be "easy".
Well, maybe for a DX expert ...
(But I need to use the car, not build it first ... ;-)
As the matter became more and more important to me, my "workaround" is to re-encode all videos that use VideoInfoHeader2, save them with VideoInfoHeader, and do the analysis/grabbing on that.
Very resource consuming, and the opposite of smart ...
Any help appreciated.
You outlined the necessary steps, which are still the easiest solution (provided that you don't give up on the Windows API; using a third party library might be easier in comparison, but that is beyond the scope of this question).
Sample Grabber and IMediaDet are parts of the deprecated DirectShow Editing Services, development of which stopped long ago. If you are not satisfied with the stock API, you have to use a more flexible replacement. For example, you can grab the source of the similar Sample Grabber sample from an older DirectX or Platform SDK and extend it to support VIDEOINFOHEADER2.
IMediaDet is nothing but a COM class that builds its own graph internally to try to decode the video. It is inflexible, and almost every time building your own graph is the more reliable solution.
Microsoft's answer to this problem - since they abandoned DirectShow development - is the newer Media Foundation API. However, there are reasons why this "answer" is not so good: limited OS compatibility, limited support for codecs and formats, and a completely new API which has little in common with DirectShow, so you need to redesign your application.
All together, you either have to find a Sample Grabber replacement using one of the popular and well-explained methods (even if they do not look very helpful), or switch to another API or a third party library. Or, another possible solution is to use a different filter/codec which is capable of decoding into VIDEOINFOHEADER-formatted media types.
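To see why code written for VIDEOINFOHEADER silently breaks on VIDEOINFOHEADER2, it helps to compare the two layouts. This is an illustrative sketch in Python ctypes (not code you'd ship; the real structs live in the DirectShow headers): both start with the same fields, but VIDEOINFOHEADER2 inserts the interlacing/aspect-ratio block before bmiHeader, so a blind cast reads the bitmap header from the wrong offset.

```python
import ctypes

DWORD = ctypes.c_uint32
LONG = ctypes.c_int32
WORD = ctypes.c_uint16
REFERENCE_TIME = ctypes.c_int64

class RECT(ctypes.Structure):
    _fields_ = [("left", LONG), ("top", LONG), ("right", LONG), ("bottom", LONG)]

class BITMAPINFOHEADER(ctypes.Structure):
    _fields_ = [("biSize", DWORD), ("biWidth", LONG), ("biHeight", LONG),
                ("biPlanes", WORD), ("biBitCount", WORD), ("biCompression", DWORD),
                ("biSizeImage", DWORD), ("biXPelsPerMeter", LONG),
                ("biYPelsPerMeter", LONG), ("biClrUsed", DWORD),
                ("biClrImportant", DWORD)]

class VIDEOINFOHEADER(ctypes.Structure):
    _fields_ = [("rcSource", RECT), ("rcTarget", RECT),
                ("dwBitRate", DWORD), ("dwBitErrorRate", DWORD),
                ("AvgTimePerFrame", REFERENCE_TIME),
                ("bmiHeader", BITMAPINFOHEADER)]

class VIDEOINFOHEADER2(ctypes.Structure):
    # Same leading fields, then the interlacing/aspect-ratio block,
    # and only THEN bmiHeader -- so a blind cast to VIDEOINFOHEADER
    # reads garbage where it expects the bitmap header.
    _fields_ = [("rcSource", RECT), ("rcTarget", RECT),
                ("dwBitRate", DWORD), ("dwBitErrorRate", DWORD),
                ("AvgTimePerFrame", REFERENCE_TIME),
                ("dwInterlaceFlags", DWORD), ("dwCopyProtectFlags", DWORD),
                ("dwPictAspectRatioX", DWORD), ("dwPictAspectRatioY", DWORD),
                ("dwControlFlags", DWORD), ("dwReserved2", DWORD),
                ("bmiHeader", BITMAPINFOHEADER)]

print(VIDEOINFOHEADER.bmiHeader.offset,   # 48
      VIDEOINFOHEADER2.bmiHeader.offset)  # 72
```

In a real grabber callback you would compare AM_MEDIA_TYPE.formattype against FORMAT_VideoInfo / FORMAT_VideoInfo2 before casting pbFormat to one struct or the other; handling both GUIDs is essentially all the "extend it to support VIDEOINFOHEADER2" step amounts to.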

Audio Subtitle Transcription - C++

I'm on a project that among other video related tasks should eventually be capable of extracting the audio of a video and apply some kind of speech recognition to it and get a transcribed text of what's said on the video. Ideally it should output some kind of subtitle format so that the text is linked to a certain point on the video.
I was thinking of using the Microsoft Speech API (aka SAPI). But from what I could see it is rather difficult to use. The very few examples that I found for speech recognition (most are for text-to-speech, which is much easier) didn't perform very well (they don't recognize a thing). For example this one: http://msdn.microsoft.com/en-us/library/ms717071%28v=vs.85%29.aspx
Some examples use something called grammar files that are supposed to define the words the recognizer is waiting for, but since I haven't trained the Windows Speech Recognition thoroughly I think that might be skewing the results.
So my question is... what's the best tool for something like this? Could you provide both paid and free options? I believe the best "free" option (as it comes with Windows) is SAPI; all the rest will be paid, but if they are really good it might be worth it. Also, if you have any good tutorials for using SAPI (or another API) in a context similar to this, that would be great.
On the whole this is a big ask!
The issue with any speech recognition system is that it functions best after training. It needs context (what words to expect) and some kind of audio benchmark (what does each voice sound like). This might be possible in some cases, such as a TV series, if you wanted to churn through hours of speech (separated for each character) to train it. There's a lot of work there, though. For something like a film there's probably no hope of training a recogniser unless you can get hold of the actors.
Most film and TV production companies just hire media companies to transcribe the subtitles based on either direct transcription using a human operator, or converting the script. The fact that they still need humans in the loop for these huge operations suggests that automated systems just aren't up to it yet.
In video you have a plethora of things that make your life difficult, pretty much spanning huge swathes of current speech technology research:
-> Multiple speakers -> "Speaker Identification" (can you tell characters apart? Also, subtitles normally have different coloured text for different speakers)
-> Multiple simultaneous speakers -> The "cocktail party problem" - can you separate the two voice components and transcribe both?
-> Background noise -> Can you pick the speech out from any soundtrack/foley/exploding helicopters.
The speech algorithm will need to be extremely robust as different characters can have different gender/accents/emotion. From what I understand of the current state of recognition you might be able to get a single speaker after some training, but asking a single program to nail all of them might be tough!
--
There is no "subtitle" format that I'm aware of. I would suggest saving an image of the text using a font like Tiresias Screenfont that's specifically designed for legibility in these circumstances, and use a lookup table to cross-reference images against video timecode (remembering NTSC/PAL/Cinema use different timing formats).
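On the point about NTSC/PAL/Cinema using different timing formats: whatever lookup table you build, you'll need to convert frame counts to timecodes per standard. A minimal non-drop-frame sketch (an assumption for simplicity: real NTSC broadcast usually runs at 29.97 fps with drop-frame timecode, which this deliberately ignores):

```python
def frames_to_timecode(frame, fps):
    """Convert an absolute frame count to HH:MM:SS:FF.
    Non-drop-frame only: fps is an integer rate such as
    24 (cinema), 25 (PAL) or 30 (NTSC, ignoring drop-frame)."""
    ff = frame % fps
    total_seconds = frame // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return "%02d:%02d:%02d:%02d" % (hh, mm, ss, ff)

# The same frame count lands on different timecodes per standard:
print(frames_to_timecode(90000, 25))  # PAL: 01:00:00:00
print(frames_to_timecode(90000, 30))  # NTSC (non-drop): 00:50:00:00
print(frames_to_timecode(90000, 24))  # Cinema: 01:02:30:00
```

Keying your image lookup table on such timecode strings (one table per target standard) keeps the cross-reference unambiguous.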
--
There's a bunch of proprietary speech recognition systems out there. If you want the best you'll probably want to license a solution off one of the big boys like Nuance. If you want to keep things free the universities of RWTH and CMU have put some solutions together. I have no idea how good they are or how well they might be suited to the problem.
--
The only solution I can think of similar to what you're aiming at is the subtitling you can get on news channels here in the UK "Live Closed Captioning". Since it's live, I assume they use some kind of speech recognition system trained to the reader (although it might not be trained, I'm not sure). It's got better over the past few years, but on the whole it's still pretty poor. The biggest thing it seems to struggle with is speed. Dialogue is normally really fast, so live subtitling has the extra issue of getting everything done in time. Live closed captions quite frequently get left behind and have to miss a lot of content out to catch up.
Whether you have to deal with this depends on whether you'll be subtitling "live" video or if you can pre-process it. To deal with all the additional complications above I assume you'll need to pre-process it.
--
As much as I hate citing the big W there's a goldmine of useful links here!
Good luck :)
This falls into the category of dictation, which is a very large vocabulary task. Products like Dragon NaturallySpeaking are amazingly good, and it has a SAPI interface for developers. But it's not such a simple problem.
Normally a dictation product is meant to be single speaker and the best products adapt automatically to that speaker, thereby improving the underlying acoustic model. They also have sophisticated language modeling which serves to constrain the problem at any given moment by limiting what is known as the perplexity of the vocabulary. That's a fancy way of saying the system is figuring out what you're talking about and therefore what types of words and phrases are likely or not likely to come next.
It would be interesting though to apply a really good dictation system to your recordings and see how well it does. My suggestion for a paid system would be to get Dragon Naturally Speaking from Nuance and get the developer API. I believe that provides a SAPI interface, which has the benefit of allowing you to swap in the Microsoft speech or any other ASR engine that supports SAPI. IBM would be another vendor to look at but I don't think you will do much better than Dragon.
But it won't work well! After all the work of integrating the ASR engine, what you will probably find is that you get a pretty high error rate (maybe half). That would be due to a few major challenges in this task:
1) multiple speakers, which will degrade the acoustic model and adaptation.
2) background music and sound effects.
3) mixed speech - people talking over each other.
4) lack of a good language model for the task.
For 1), if you had a way of separating each actor onto a separate track, that would be ideal. But there's no reliable way of separating speakers automatically that would be good enough for a speech recognizer. If each speaker were at a distinctly different pitch, you could try pitch detection (there is some free software out there for that) and separate based on that, but this is a sophisticated and error-prone task. The best thing would be to hand-edit the speakers apart, but you might as well just manually transcribe the speech at that point! If you could get the actors on separate tracks, you would need to run the ASR using different user profiles.
For music (2), you'd either have to hope for the best or try to filter it out. Speech is more band-limited than music, so you could try a bandpass filter that attenuates everything except the voice band. You would want to experiment with the cutoffs, but I would guess 100 Hz to 2-3 kHz would keep the speech intelligible.
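That bandpass suggestion can be prototyped without any DSP library. The sketch below (pure Python, brute-force DFT, demo-sized signals only, with made-up function names) mixes a 60 Hz hum with a 1 kHz tone in the voice band, zeroes every frequency bin outside 100 Hz-3 kHz, and recovers the tone:

```python
import cmath, math

def dft(x):
    """Brute-force discrete Fourier transform, O(N^2) -- fine for a
    demo; a real implementation would use an FFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def bandpass(x, fs, lo=100.0, hi=3000.0):
    """Zero every DFT bin whose frequency falls outside [lo, hi],
    keeping conjugate-symmetric bins so the output stays real."""
    X = dft(x)
    N = len(x)
    for k in range(N):
        f = k * fs / N if k <= N // 2 else (N - k) * fs / N
        if not (lo <= f <= hi):
            X[k] = 0
    return idft(X)

fs, N = 8000, 400                                        # 20 Hz bin spacing
t = [n / fs for n in range(N)]
hum = [math.sin(2 * math.pi * 60 * tt) for tt in t]      # below the band
voice = [math.sin(2 * math.pi * 1000 * tt) for tt in t]  # inside the band
mixed = [h + v for h, v in zip(hum, voice)]
y = bandpass(mixed, fs)                                  # hum removed
```

With exact-bin test tones the separation is perfect; on real audio a hard brick-wall mask like this rings, so in practice you'd use a proper filter design, but the principle of attenuating everything outside the voice band is the same.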
For (3), there's no solution. The ASR engine should return confidence scores so at best I would say if you can tag low scores, you could then go back and manually transcribe those bits of speech.
(4) is a sophisticated task for a speech scientist. Your best bet would be to search for an existing language model made for the topic of the movie. Talk to Nuance or IBM, actually. Maybe they could point you in the right direction.
Hope this helps.

Sound output through M-Audio ProFire 610

I got an assignment at work to create a system which will be able to direct sound to different output channels of our sound card. We are using an M-Audio ProFire 610, which has 8-channel output and connects via FireWire. We are also using a Mac Mini as our host server, and I'll be working in Xcode.
This is the diagram of what I am building:
diagram http://img121.imageshack.us/img121/7865/diagramy.png
At first I thought that Java would be enough for this project; however, I later discovered that Java is not able to push sound to anything other than the default output channels of the sound card, so I decided to switch to C++. The problem is that I am a web developer and don't have any experience in this language whatsoever - that is why I am looking for help from more experienced developers.
I found a Core Audio primer for iOS 4, but I'm not sure how much of it I can use for my project. I find it a bit confusing, too.
What steps should I take to complete this assignment? What frameworks should I use? Any code examples? I am looking for any help, hints, tips - well anything that will help me complete this project.
If you're just looking for audio pass-through, you might want to look at something that's already been built, like JACK, which creates a software audio device that looks and works just like a real one (you can set it as the default output for your app) and then allows you to route each channel anywhere you want (including to other applications).
If you want/need to make your own, definitely go with C++, for which there are many many tutorials (I learned from cplusplus.com). CoreAudio is the low-level C/C++ interface as Justin mentioned, but it's really hard to learn and use. A much simpler API is provided by PortAudio, for which I've worked a bit on the Mac implementation. Look at the tutorials there, make something similar for default input and output, and then to do the channel mapping use PaMacCore_SetupChannelMap, which is described here. You'll need to call it twice, once for the input stream and once for the output stream. Join the mailing list for PortAudio if you need more advice! Good luck!
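Whichever API you end up on, the part of the job that's API-independent is building interleaved multichannel buffers in which the signal occupies only the channel you want. A minimal sketch (plain Python lists standing in for a real sample buffer; the channel count and function name are made up for illustration):

```python
def route_to_channel(mono, channel, num_channels=8):
    """Expand a mono signal into interleaved multichannel frames,
    placing the signal on `channel` (0-based) and silence elsewhere.
    Interleaved layout: frame 0 ch0..ch7, frame 1 ch0..ch7, ..."""
    if not 0 <= channel < num_channels:
        raise ValueError("channel out of range")
    out = [0.0] * (len(mono) * num_channels)
    out[channel::num_channels] = mono   # strided write hits one channel
    return out

# Route a short ramp to channel 5 of an 8-channel device buffer.
buf = route_to_channel([0.1, 0.2, 0.3], channel=5)
```

With PortAudio you'd hand a buffer shaped like this to a stream opened with 8 output channels, and the PaMacCore channel map then decides which hardware outputs those channels reach.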
The primary APIs are in CoreAudio/AudioHardware.h.
Most of the samples/supporting code provided by Apple is in C++; however, the APIs themselves are pure C (don't know if that helps you or not).
You'll want to access the Hardware Abstraction Layer (aka HAL); more details in this doc:
http://developer.apple.com/documentation/MusicAudio/Conceptual/CoreAudioOverview/CoreAudioOverview.pdf
for (a rather significant amount of) additional samples/usage, see $DEVELOPER_DIR/Extras/CoreAudio/