Holiday project goals for undergrads with an FPGA? - computer-vision

It's an undergraduate student project for vacation research. We're not sure how many of us there will be (probably 4-6), but we're motivated.
My original proposal was to get an FPGA (on an Artix-7 or Z-board) to run a CID camera sensor as a dumb peripheral, do some basic image processing stuff on it (perhaps edge detection and dynamic windowing), and output a bitmap to a PC.
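For reference, here is a minimal software sketch of Sobel edge detection, the kind of model a team might write first and later check an HDL implementation against. It assumes 8-bit greyscale pixels stored as a list of rows and uses the |Gx| + |Gy| approximation of gradient magnitude (a common hardware-friendly choice because it avoids a square root). The function names are illustrative, not from any particular library.

```python
def sobel_magnitude(image):
    """Approximate Sobel gradient magnitude |Gx| + |Gy| of a greyscale image.

    `image` is a list of equal-length rows of ints. Border pixels are set to 0
    because the 3x3 window does not fit there.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        r0, r1, r2 = image[y - 1], image[y], image[y + 1]
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, centre row weighted 2.
            gx = (r0[x + 1] + 2 * r1[x + 1] + r2[x + 1]) - (r0[x - 1] + 2 * r1[x - 1] + r2[x - 1])
            # Vertical gradient: bottom row minus top row, centre column weighted 2.
            gy = (r2[x - 1] + 2 * r2[x] + r2[x + 1]) - (r0[x - 1] + 2 * r0[x] + r0[x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def threshold(image, level):
    """Binary edge map: 1 where the magnitude is at least `level`, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in image]


# Usage: a vertical step edge between columns 2 and 3.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = threshold(sobel_magnitude(step), 500)
```

In hardware the same 3x3 window is usually fed from line buffers so that each pixel is read once as it streams in from the sensor; treat that as a design direction to research rather than a spec.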
One of the faculty, who has about 16 years of experience with FPGAs, has suggested that at my (and my colleagues') level this might not be achievable in a 6-10 week time frame (we're all pretty much beginners).
We'd like to keep the original goal as a longer-term aim, but for now we want some other project goals that would move us towards having the skills and experience (and perhaps some of the IP) for that ultimate goal.
What would some good intermediate project goals be for a group of undergrad beginners with a summer holiday to spend if we ultimately would like to get an FPGA doing cool stuff with a CID camera sensor?

What about distance estimation using two or more cameras in stereo?
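As a rough sketch of what that involves: with two rectified cameras (image rows aligned), depth follows from disparity as Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras and d is the horizontal pixel shift of a feature between the two images. The code below assumes already-rectified greyscale rows and uses simple sum-of-absolute-differences (SAD) block matching along one scanline; SAD is just one swapped-in matching choice, and the function names and example numbers are hypothetical.

```python
def disparity_sad(left_row, right_row, x, half_window, max_disparity):
    """Disparity (pixels) at column x of the left row, by SAD block matching.

    Compares the window around left_row[x] with windows shifted left by d in
    right_row, for d in 0..max_disparity, and returns the d with lowest cost.
    Returns None if no shift fits inside the rows.
    """
    best_d, best_cost = None, None
    for d in range(max_disparity + 1):
        if x - d - half_window < 0 or x + half_window >= len(left_row):
            continue  # window would fall off the edge of the row
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d, in the same units as the baseline."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means the point is at infinity)")
    return focal_px * baseline_m / disparity_px


left = [0, 0, 0, 0, 0, 0, 10, 50, 90, 50, 10, 0, 0, 0, 0, 0]
right = left[3:] + [0, 0, 0]  # same scene, shifted 3 pixels
d = disparity_sad(left, right, x=8, half_window=2, max_disparity=5)
z = depth_from_disparity(focal_px=700, baseline_m=0.12, disparity_px=d)
```

Depth resolution gets worse with distance, since a one-pixel error in d matters more when d is small.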

Specific topics on Tensorflow for CNN

For my new TensorFlow course this semester I have a mini project where the topic is up to me. Since I have some background in convolutional neural networks, I intend to use one for my project. My computer can only run the CPU version of TensorFlow.
However, as a newbie, I realize that there are a lot of topics, such as MNIST, CIFAR-10, etc., so I don't know which one I should pick. I only have two weeks left. It would be great if the topic were neither too complicated nor too easy, matching my intermediate level.
In your experience, could you give me some advice about the specific topic I should do for my project?
Moreover, it would be better if I could use my own data to test my trained model, because my professor said that this counts as a plus point towards an A grade.
Thanks in advance,
I think that to answer this question you need to properly evaluate the marking criteria for your project. However, I can give you a brief overview of what you've just mentioned.
MNIST: MNIST is an optical character recognition task for individual handwritten digits 0-9 in 28px-square images. This is considered the "Hello World" of CNNs. It's pretty basic and might be too simplistic for your requirements; that's hard to gauge without more information. Nonetheless, it will run pretty quickly with CPU TensorFlow, and the online tutorial is pretty good.
CIFAR-10: CIFAR-10 is a bigger and harder dataset of 60,000 small colour images of animals and vehicles in 10 classes. The images are 32px square, so processing an individual image isn't too bad, but training a decent model over the whole dataset takes a long time and your CPU might struggle with it. You could try training on a reduced dataset, but I don't know how that would go. Again, it depends on your course requirements.
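If you do try a reduced dataset, one simple way to shrink it without skewing the classes is to sample the same number of examples per label. A minimal sketch, assuming you already have the labels as a Python list (loading CIFAR-10 itself is not shown, and the function name is hypothetical):

```python
import random
from collections import defaultdict


def stratified_subset(labels, per_class, seed=0):
    """Indices of a class-balanced subset: up to per_class examples per label.

    labels[i] is the class of example i. A fixed seed keeps the subset
    reproducible between runs, so results can be compared fairly.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, min(per_class, len(indices))))
    return sorted(chosen)
```

You would then train on only the images at those indices, for example 500 per class instead of the full training set.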
Flowers-Poets: There is the TensorFlow for Poets retraining example. It might not be suitable for your course as-is, but you could use its flowers dataset to build your own model.
Build-your-own-model: You could use tf.layers to build your own network and experiment with it; tf.layers is pretty easy to use. Alternatively, you could look at the new Estimators API, which automates a lot of the training process for you. There are a number of tutorials (of varying quality) on the TensorFlow website.
I hope that gives you a run-down of what's out there. Other datasets to look at are PASCAL VOC and ImageNet (however, they are huge!). Models worth experimenting with include VGG-16 and AlexNet.

How to process a very heavy downloadable multiplayer game

I am a programmer and I'd like to know how to process a very heavy downloadable multiplayer game. The game will have robots like those in Armored Core, with changeable arms and legs, and it will have around 100 players in one area fighting aliens, each firing around 10 bullets per second, plus the enemy attacks.
That is around 2000 bullets per second flying in all directions, plus explosions, missiles, lasers, the environment, and the AI of the aliens.
Is that very hard to process on today's computers? If it is heavy, how would I process that in a multiplayer scenario? Does every computer split up the job and do their own part?
Is that very hard to process on today's computers?
For a programmer who has to ask the question: yes. For a programmer capable of writing efficient code and using a modern high-end computer (which right now would have 120 physical threads): not really. The AI may be a problem, but that can run on a cluster of machines.
Does every computer split up the job and do their own part?
Do you TRUST your players not to cheat? If you do not - can you answer that question yourself?
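If you don't trust the clients, a common pattern is for the server to be the authority and to validate what clients claim rather than letting each machine simulate its own share unchecked. As a hypothetical illustration, a server could reject fire events beyond the 10 shots per second mentioned in the question with a sliding-window check like this (the class and method names are made up for the sketch; a real game would also validate positions, line of sight, and so on):

```python
from collections import deque


class FireRateValidator:
    """Hypothetical server-side check that rejects shots beyond a rate limit."""

    def __init__(self, max_shots=10, window_s=1.0):
        self.max_shots = max_shots
        self.window_s = window_s
        self._accepted = {}  # player id -> deque of accepted shot times (seconds)

    def accept_shot(self, player_id, t):
        """Return True if the shot at server time t is allowed, else False."""
        times = self._accepted.setdefault(player_id, deque())
        # Forget shots that are older than the sliding window.
        while times and t - times[0] >= self.window_s:
            times.popleft()
        if len(times) >= self.max_shots:
            return False  # too many shots: drop it (and perhaps flag the client)
        times.append(t)
        return True
```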
Generally you hire someone who has experience writing distributed systems. Generally this is way over the head of "a programmer asking a question"; this is the level where architects with real-time experience come into the game. All those bullets may sound heavy, but I process a data stream here that sometimes does in excess of 100,000 updates per second, so it is doable.
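To give a feel for why thousands of bullets need not be a problem with efficient code: the naive approach tests every bullet against every target, which grows with the square of the count, while a uniform grid (a spatial hash, a standard technique chosen here for illustration, not something the answer names) only compares objects in neighbouring cells. A 2D sketch, assuming point-like objects and a single hit radius; a real game would work in 3D and against richer shapes:

```python
from collections import defaultdict


def build_grid(positions, cell_size):
    """Bucket point indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for index, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(index)
    return grid


def nearby_pairs(positions, radius):
    """All index pairs (i, j), i < j, whose points are closer than radius.

    With cells as wide as the radius, any close pair lies in the same or an
    adjacent cell, so only the 3x3 block around each cell is searched.
    """
    grid = build_grid(positions, radius)
    r2 = radius * radius
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            (ix, iy), (jx, jy) = positions[i], positions[j]
                            if (ix - jx) ** 2 + (iy - jy) ** 2 < r2:
                                pairs.add((i, j))
    return sorted(pairs)
```

The cost now depends on how crowded each cell is rather than on the total number of objects, which is what makes large bullet counts tractable.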

Issue regarding practical approach on machine learning/computer vision fields

I am really passionate about the machine learning, data mining and computer vision fields, and I was thinking of taking things a little bit further.
I was thinking of buying a LEGO Mindstorms NXT 2.0 robot to experiment with machine learning/computer vision and robotics algorithms, in order to better understand several existing concepts.
Would you encourage me to do so? Do you recommend any other reasonably priced alternative (around 200-250 pounds) for a practical approach to understanding these fields? Are there any mini robots which I can buy and experiment with?
If your interests are machine learning, data mining and computer vision, then I'd say Lego Mindstorms is not the best option for you, not unless you are also interested in robotics/electronics.
To do interesting machine learning you only need a computer and a problem to solve. Think ai-contest or mlcomp or similar.
To do interesting data mining you need a computer, a lot of data and a question to answer. If you have an internet connection, the amount of data you can get at is only limited by your bandwidth. Think of the Netflix Prize, or try your hand at collecting and interpreting data from wherever. If you are learning, this is a nice place to start.
As for computer vision: all you need is a computer and images. Depending on the type of problem you find interesting, you could do some processing of random webcam images, or take all your holiday photos and try to detect where your travel companions are in them. If you have a webcam, your options are endless.
Lego Mindstorms allows you to combine machine learning and computer vision. I'm not sure where the data mining would come in, and you will spend (waste?) time on the robotics/electronics side of things, which you don't list as one of your passions.
Well, I would take a look at the iRobot Create: well within your budget, and very robust.
Depending on your age, you may not want to be seen with a "lego robot" if you are out of college :-)
Anyway, I buy the Creates in batches for my lab. You can link to them with a hard cable (cheap) or put a Bluetooth interface on them.
Put a webcam on that puppy, hook it up to a multicore machine, and you have an awesome working robot for the things you want to explore.
Also, the old Roombas had a TTL-level serial port (if that doesn't make sense to you, skip it). I don't know about the new ones. So it was possible to control any Roomba vacuum from a laptop.
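For a sense of how simple that serial control is, the sketch below builds command bytes following iRobot's published Roomba SCI / Open Interface documentation as I understand it: Start = 128, Safe = 131, and Drive = 137 followed by a signed 16-bit velocity in mm/s and a 16-bit turn radius in mm, both big-endian. Treat the opcodes, value ranges and baud rate as assumptions to check against the manual for your model; actually writing the bytes to the serial port (for example with a serial library) is not shown.

```python
import struct

# Opcodes as given in iRobot's Roomba SCI / Open Interface documentation
# (assumed here; verify against the manual for your model).
START = 128
SAFE = 131
DRIVE = 137
STRAIGHT = 0x8000  # special radius value: drive straight


def startup_bytes():
    """Start the interface, then switch to Safe mode."""
    return bytes([START, SAFE])


def drive_packet(velocity_mm_s, radius_mm):
    """Drive command: opcode, signed 16-bit velocity, 16-bit radius, big-endian."""
    if not -500 <= velocity_mm_s <= 500:
        raise ValueError("velocity must be within -500..500 mm/s")
    # The radius is sent as a raw 16-bit pattern so special values such as
    # 0x8000 (straight) or -1 (spin in place) are encoded correctly.
    return struct.pack(">BhH", DRIVE, velocity_mm_s, radius_mm & 0xFFFF)
```

Usage would be to send startup_bytes() once, then drive_packet(200, STRAIGHT) to move forward at 200 mm/s.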
The Number One rule, and I cannot emphasize this enough: Have a reliable platform for experimentation. If you hand build something, just for basic functionality, you will spend all your time on minor issues and not get to the fun stuff.
Anyway, best of luck.

Audio Subtitle Transcription - C++

I'm on a project that, among other video-related tasks, should eventually be capable of extracting the audio of a video, applying some kind of speech recognition to it and getting a transcribed text of what's said in the video. Ideally it should output some kind of subtitle format so that the text is linked to a certain point in the video.
I was thinking of using the Microsoft Speech API (aka SAPI), but from what I could see it is rather difficult to use. The very few examples I found for speech recognition (most are for text-to-speech, which is much easier) didn't perform very well (they don't recognize a thing). For example this one: http://msdn.microsoft.com/en-us/library/ms717071%28v=vs.85%29.aspx
Some examples use something called grammar files, which are supposed to define the words the recognizer is waiting for, but since I haven't trained Windows Speech Recognition thoroughly, I think that might be skewing the results.
So my question is: what's the best tool for something like this? Could you provide both paid and free options? I believe the best "free" option (as it comes with Windows) is SAPI; all the rest would be paid, but if they are really good it might be worth it. Also, if you have any good tutorials for using SAPI (or another API) in a context similar to this, that would be great.
On the whole this is a big ask!
The issue with any speech recognition system is that it functions best after training. It needs context (what words to expect) and some kind of audio benchmark (what each voice sounds like). This might be possible in some cases, such as a TV series, if you wanted to churn through hours of speech (separated for each character) to train it. There's a lot of work there, though. For something like a film there's probably no hope of training a recogniser unless you can get hold of the actors.
Most film and TV production companies just hire media companies to transcribe the subtitles based on either direct transcription using a human operator, or converting the script. The fact that they still need humans in the loop for these huge operations suggests that automated systems just aren't up to it yet.
In video you have a plethora of things that make your life difficult, pretty much spanning huge swathes of current speech technology research:
-> Multiple speakers -> "Speaker Identification" (can you tell characters apart? Also, subtitles normally have different coloured text for different speakers)
-> Multiple simultaneous speakers -> The "cocktail party problem" - can you separate the two voice components and transcribe both?
-> Background noise -> Can you pick the speech out from any soundtrack/foley/exploding helicopters.
The speech algorithm will need to be extremely robust as different characters can have different gender/accents/emotion. From what I understand of the current state of recognition you might be able to get a single speaker after some training, but asking a single program to nail all of them might be tough!
--
There is no single standard "subtitle" format that I'm aware of. I would suggest saving an image of the text using a font like Tiresias Screenfont, which is specifically designed for legibility in these circumstances, and using a lookup table to cross-reference the images against video timecode (remembering that NTSC/PAL/Cinema use different timing formats).
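A widely used text-based alternative is SubRip (.srt): numbered cues, each with a start and end time written as HH:MM:SS,mmm followed by the text. Below is a minimal sketch of writing such cues, including a frame-count-to-time conversion for the different frame rates mentioned (24 fps film, 25 fps PAL, 30000/1001 fps NTSC). The function names and the cue tuple layout are assumptions for illustration, and NTSC drop-frame timecode labels are not handled, only elapsed time.

```python
from fractions import Fraction

# Nominal frame rates; NTSC video runs at 30000/1001 (about 29.97) fps.
FRAME_RATES = {"film": Fraction(24), "pal": Fraction(25), "ntsc": Fraction(30000, 1001)}


def frame_to_seconds(frame, fps):
    """Elapsed time of a frame index at the given frame rate (exact Fraction)."""
    return Fraction(frame) / fps


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (rounded to the millisecond)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as an SRT document."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

Using exact fractions for the NTSC rate avoids slowly accumulating rounding drift over a long video.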
--
There are a bunch of proprietary speech recognition systems out there. If you want the best, you'll probably want to license a solution from one of the big boys like Nuance. If you want to keep things free, the universities RWTH Aachen and CMU have put some solutions together. I have no idea how good they are or how well they might suit the problem.
--
The only solution I can think of similar to what you're aiming at is the subtitling you can get on news channels here in the UK "Live Closed Captioning". Since it's live, I assume they use some kind of speech recognition system trained to the reader (although it might not be trained, I'm not sure). It's got better over the past few years, but on the whole it's still pretty poor. The biggest thing it seems to struggle with is speed. Dialogue is normally really fast, so live subtitling has the extra issue of getting everything done in time. Live closed captions quite frequently get left behind and have to miss a lot of content out to catch up.
Whether you have to deal with this depends on whether you'll be subtitling "live" video or if you can pre-process it. To deal with all the additional complications above I assume you'll need to pre-process it.
--
As much as I hate citing the big W there's a goldmine of useful links here!
Good luck :)
This falls into the category of dictation, which is a very large vocabulary task. Products like Dragon NaturallySpeaking are amazingly good, and Dragon has a SAPI interface for developers. But it's not a simple problem.
Normally a dictation product is meant to be single speaker and the best products adapt automatically to that speaker, thereby improving the underlying acoustic model. They also have sophisticated language modeling which serves to constrain the problem at any given moment by limiting what is known as the perplexity of the vocabulary. That's a fancy way of saying the system is figuring out what you're talking about and therefore what types of words and phrases are likely or not likely to come next.
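To make "perplexity" a little more concrete: for a sequence of N words, perplexity is exp(-(1/N) * sum of log p(word_i | context)), roughly the effective number of equally likely choices the recognizer faces at each word. A minimal sketch that takes the per-word probabilities a language model assigned (how you obtain them depends on the model and is not shown; the function name is hypothetical):

```python
import math


def perplexity(word_probabilities):
    """Perplexity exp(-(1/N) * sum(log p_i)) of a sequence of N words.

    word_probabilities[i] is the probability the language model gave word i
    in its context. Lower is better: the model is less "surprised".
    """
    if not word_probabilities:
        raise ValueError("need at least one word probability")
    n = len(word_probabilities)
    return math.exp(-sum(math.log(p) for p in word_probabilities) / n)
```

A model that always spreads its guess evenly over 10 words has perplexity 10; a model that knows what you're talking about assigns higher probabilities and gets a lower number.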
It would be interesting, though, to apply a really good dictation system to your recordings and see how well it does. My suggestion for a paid system would be to get Dragon NaturallySpeaking from Nuance and get the developer API. I believe that provides a SAPI interface, which has the benefit of allowing you to swap in Microsoft's speech engine or any other ASR engine that supports SAPI. IBM would be another vendor to look at, but I don't think you will do much better than Dragon.
But it won't work well! After all the work of integrating the ASR engine, what you will probably find is that you get a pretty high error rate (maybe half). That would be due to a few major challenges in this task:
1) multiple speakers, which will degrade the acoustic model and adaptation.
2) background music and sound effects.
3) mixed speech - people talking over each other.
4) lack of a good language model for the task.
For (1), if you had a way of separating each actor onto a separate track, that would be ideal. But there's no reliable way of separating speakers automatically that would be good enough for a speech recognizer. If each speaker were at a distinctly different pitch, you could try pitch detection (there's some free software out there for that) and separate based on that, but this is a sophisticated and error-prone task. The best thing would be hand-editing the speakers apart, but you might as well just manually transcribe the speech at that point! If you could get the actors on separate tracks, you would need to run the ASR using different user profiles.
For music (2), you'd either have to hope for the best or try to filter it out. Speech is more band-limited than music, so you could try a bandpass filter that attenuates everything except the voice band. You would want to experiment with the cutoffs, but I would guess 100 Hz to 2-3 kHz would keep the speech intelligible.
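As an illustration of the pitch idea, here is a crude autocorrelation pitch estimator: it finds the lag at which the signal best matches a shifted copy of itself and converts that lag to a frequency. Autocorrelation is one simple swapped-in method, not necessarily what existing free tools use, and the 75-400 Hz search range is an assumption about typical speaking voices. It is easily fooled by overlapping voices or music, which is exactly why the answer calls this error-prone.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Crude fundamental-frequency estimate (Hz) from the autocorrelation peak.

    Searches lags between sample_rate/fmax and sample_rate/fmin and returns
    sample_rate / best_lag. Assumes a single, steady, voiced speaker.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        raise ValueError("signal too short for the requested pitch range")
    return sample_rate / best_lag
```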
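A minimal sketch of that idea, assuming mono samples as a list of floats: a first-order high-pass at 100 Hz cascaded with a first-order low-pass at 3 kHz. First-order filters only roll off gently (about 6 dB per octave), so treat this as an illustration of band-limiting; real audio tools would typically use steeper designs. The function names are hypothetical.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, y = [], 0.0
    prev_x = samples[0] if samples else 0.0  # start "settled" to avoid an initial click
    for x in samples:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC-style) low-pass filter."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def voice_band(samples, sample_rate, low_hz=100.0, high_hz=3000.0):
    """Keep roughly the low_hz..high_hz band: high-pass, then low-pass."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def rms(samples):
    """Root-mean-square level, handy for comparing before and after."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Comparing rms() of a 1 kHz test tone against a 20 Hz rumble or a 7 kHz tone after voice_band() shows the mid-band tone coming through much louder.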
For (3), there's no solution. The ASR engine should return confidence scores so at best I would say if you can tag low scores, you could then go back and manually transcribe those bits of speech.
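The tagging step could be as simple as the sketch below. The Segment structure and the 0-1 confidence scale are assumptions; each ASR engine reports confidence in its own way, so check what yours returns before choosing a threshold.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float
    end_s: float
    text: str
    confidence: float  # assumed 0.0-1.0; engines differ in how they report this


def needs_review(segments, threshold=0.6):
    """Segments whose confidence is below threshold, for manual transcription."""
    return [s for s in segments if s.confidence < threshold]


def review_time(segments, threshold=0.6):
    """Total seconds of audio a human would need to check."""
    return sum(s.end_s - s.start_s for s in needs_review(segments, threshold))
```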
(4) is a sophisticated task for a speech scientist. Your best bet would be to search for an existing language model made for the topic of the movie. Talk to Nuance or IBM, actually. Maybe they could point you in the right direction.
Hope this helps.

I'm an experienced C++ developer - how can I enter the gaming industry?

I've been working in C++ in embedded environments for a number of years, developing navigation applications. There is a gaming company in my hometown that I like the look of, but I don't have game development experience. You could consider a navigation app as a type of game, depending on who you are running from.
My question is, what steps should I take to enter the industry? Is it a bad idea to enter the industry at this stage (I'm 30)?
Being 30 doesn't really matter, you can enter the games industry at any age assuming you have the drive and ability.
Start reading about gaming topics and game development websites (GameDev, Gamasutra, etc.).
Start writing games. Clones of games you like, your own original ideas, tech demos, anything that you can point to and say "I wrote that, and along the way I learned these things, and solved these problems."
If there is a specific area of interest to you (AI, Rendering, Frontend, Tools & Pipelines, Audio), focus on building game/demo/sample projects that challenge you in that area. "Yeah, I've done that" sounds a lot better in an interview than "yeah, I've heard of that".
Get to know people in the industry if you can, through online forums, friends of friends, etc... One good contact can do more for your chances than weeks of demo coding or months of sending resumes out. Game companies may have open houses or job fairs.
The "entry level" jobs in game development are likely to be Frontend or Tools. If you've done navigation apps, sounds like that might be a decent fit for you. If that has included more low level work and optimization on embedded platforms, you might also look at Systems roles.
I'd suggest you start trying to write some games in your spare time. Having some demos is always a good start when you go to an interview and it'll give you some insights into what your job is going to be.
Gamedev.net has an excellent set of tutorials to work through to get a grip of a lot of game-coding concepts.
Do they have any job offerings? If so, look at what they ask for in the CV and start educating yourself in those concepts / technologies.
Contacting them and asking if they have any jobs for an excellent software engineer can't hurt either :)
I see you already accepted an answer, but I'll throw in my two cents:
If the company does console (e.g. Xbox 360, PS3) or handheld (e.g. DS, iPhone) games, you should definitely emphasize the embedded aspect of your resume. A few anecdotes about how you optimized the memory layout of a class, or sped up some code by taking advantage of an obscure feature of the chipset, will show that you can think like a console programmer. Also, if you did any sort of AI for the navigation apps (e.g. A*, Dijkstra), it's good to mention that.
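Since A* and Dijkstra came up: Dijkstra's algorithm is the shortest-path search commonly used in both navigation and game pathfinding, and A* is essentially the same loop with a heuristic estimate added to each priority. A minimal sketch over an adjacency dictionary with non-negative edge costs (the graph layout is an assumption for illustration):

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph maps each node to a list of (neighbour, cost) pairs; costs must be
    non-negative. Unreachable nodes are simply absent from the result.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path was already found
        for neighbour, cost in graph.get(node, ()):
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist
```

For a game map, the nodes might be grid cells or navmesh polygons and the costs distances or terrain penalties.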
A few people recommended writing games - that's not a bad long term plan if you know you want to get into the industry, but I don't think you should let that stop you from applying to this particular company in the meantime. However you should definitely pick up a copy of one of their recent games, play it for a few hours over the weekend, and be able to say what you liked about it.
As for websites, I second the Gamasutra recommendation, along with Kotaku.
Good luck!
"Game industry" is a broad term. There are:
1. AI programming.
2. Graphics programming.
3. Sound programming.
4. Tool programming.
5. Scripting.
6. Physics programming.
7. Network programming.
You can probably already deal with network programming (#7), scripting (#5) and tool programming (#4).
As for the rest, it is mostly a matter of dealing with some kind of API, plus you need a very good understanding of 3D math (unless you are making a 2D game, that is).
For 3D math I can't help you much: I picked up what I know from various non-English sources, and most of them aren't available anymore. However, I think this resource might contain info of interest.
For general 3D graphics info you need to study the DirectX SDK and the NVIDIA SDKs (both DirectX and OpenGL), plus there are OpenGL books you HAVE to read:
1. Francis S. Hill, "Computer Graphics Using OpenGL".
2. "OpenGL Programming Guide" (aka the "Red Book").
3. "OpenGL Shading Language" (aka the "Orange Book").
4. And you might want to take a look at the "OpenGL Reference Manual" (the "Blue Book").
I'm recommending OpenGL because, while it doesn't offer the same level of control over hardware resources, it is easier to get started with than DirectX, is available on a larger selection of platforms, and has the same power as DirectX. Plus GLSL isn't that different from HLSL (except that GLSL doesn't have the remnants of assembly shader programming that HLSL has), and it is close enough to C++ that it is relatively easy to get started.
One important thing: if you seriously want to deal with 3D, you have to be able to easily imagine 3D operations in your mind, i.e. how to rotate, scale and move an object, what a matrix means, what reflection vectors are, how to cut a polygon with planes, how to find the intersection of two meshes, etc. You should also have at least a basic understanding of more complex things like boolean operations on polygonal meshes. I have no idea how to develop this skill (it is very close to "mechanical drawing"), but you'll run into a lot of difficulties without it.
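A few of those operations written out, as a minimal sketch with vectors as plain 3-tuples: reflecting a direction about a unit normal (r = d - 2(d·n)n), rotating about the z axis with the standard rotation matrix, and classifying a point against a plane, which is the first step of cutting a polygon with a plane. The function names are illustrative; a real engine would use a math library and 4x4 matrices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect(d, n):
    """Reflect direction d about the unit normal n: r = d - 2 (d . n) n."""
    k = 2 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def rotate_z(v, angle_rad):
    """Rotate v about the z axis (counter-clockwise when viewed from +z)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def plane_side(p, n, d):
    """Evaluate n . p + d for the plane n . x + d = 0.

    Positive: p is in front; negative: behind; zero: on the plane. Clipping a
    polygon starts by classifying every vertex this way and splitting the
    edges whose endpoints fall on opposite sides.
    """
    return dot(p, n) + d
```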
Just putting "experienced C++ dev" on your CV will probably get you in the door. The (UK at least) games industry is dominated by graduates and inexperienced programmers - the older ones either burn out or get promoted into management.
A lot of games programming is just programming - the skills are entirely transferable. And your navigation software experience probably puts you in for an AI-related role.
If someone with your background applied to me, I'd certainly give them an interview.
Well, I started at 16 with (paid) game development. Search for jobs on websites. Make your own low-budget games and then publish them one way or another.
If you are good, people will search for you; otherwise you have to struggle a bit.