How to run Flow training on 2 GPUs? - ray

Repost from Wouter J.
How can I run the Flow training on GPUs?
I'm currently trying to get the ppo_runner benchmark to work with my 2 GPUs. However, I can't find any way to make use of these resources.
What would I need to change in the code to make use of the GPUs?

Repost from the Flow team
Perhaps a good place to start is to learn how to use multiple GPUs through Ray:
https://ray.readthedocs.io/en/latest/using-ray-with-gpus.html
Learning how to use multiple GPUs with Ray will enable you to train with multiple GPUs too, because we use Ray RLlib for training.
A fair warning, though: unless your problem is well suited to GPUs (large image inputs, large neural nets), more often than not training on a many-core CPU (e.g. a 64-thread Xeon) is much faster than training on multiple GPUs.
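For concreteness, here is a minimal sketch (not Flow's actual ppo_runner code) of how GPUs are requested through Ray and RLlib. The config keys follow the Ray docs linked above; the environment name is a placeholder, and the exact API may differ across Ray versions:

```python
import ray
from ray import tune

ray.init(num_gpus=2)  # make both GPUs visible to Ray

tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",      # placeholder; register your Flow env here
        "num_gpus": 1,             # GPUs reserved for the learner/trainer
        "num_workers": 4,          # CPU rollout workers
        "num_gpus_per_worker": 0,  # rollouts usually stay on the CPU
    },
)
```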

Related

Dense optical flow in real time on hardware at 4K resolution (30 FPS)

I am working on a hardware-based solution (without a GPU) for dense optical flow to get real-time performance (30 FPS) with decent accuracy, something comparable to or better than NVIDIA's Optical Flow SDK. Can someone please suggest good algorithms other than pyramidal Lucas-Kanade and Horn-Schunck? I found SGM to be a good starting point, but it's difficult to implement on an FPGA or DSP core. The target is to measure large displacements, and to handle occlusion, in videos similar to real-world footage.
It would be great if someone could tell me exactly which algorithm NVIDIA has used.
For dense optical flow estimation in a real-time setup, FlowNet is a good option; it can estimate optical flow at high FPS. You can take their trained model and perform inference with it. Since you want to run the estimation in a non-GPU environment, you can try converting the model to ONNX format. A good implementation of FlowNet is available in NVIDIA's GitHub repo. I am not sure exactly which algorithm NVIDIA is using in its SDK for optical flow.
FlowNet2 builds on the original FlowNet to compute large displacements. However, if you are concerned about occlusion, you may want to check out the follow-up work, FlowNet3. Another alternative to FlowNet is PWC-Net.
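As a hypothetical illustration of the ONNX route suggested above, here is a sketch of exporting a PyTorch flow model and running it CPU-only with onnxruntime. The tiny module stands in for a real trained network such as FlowNetS; shapes and names are made up:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

class TinyFlowNet(nn.Module):
    """Stand-in model: maps two stacked RGB frames (6 channels) to 2-channel flow."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 2, kernel_size=3, padding=1)
    def forward(self, x):
        return self.conv(x)

model = TinyFlowNet().eval()
frames = torch.randn(1, 6, 384, 512)  # a frame pair stacked along channels
torch.onnx.export(model, frames, "flow.onnx",
                  input_names=["frames"], output_names=["flow"])

sess = ort.InferenceSession("flow.onnx", providers=["CPUExecutionProvider"])
flow = sess.run(None, {"frames": frames.numpy()})[0]
print(flow.shape)  # (1, 2, 384, 512)
```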

Time between training set images for individual facial recognition

Edit: I didn't make this clear; this is for the possible future development of an application.
I am looking into individual facial recognition for an application, but an essential part of this seems to be a fairly large training set of images for each individual to be recognized.
Is it important for the images to be taken at different times in different environments, or could several images captured over a few seconds with a handheld camera possibly provide the necessary variations for a good training set?
(This isn't for human facial recognition, by the way, so existing tools and databases won't really help much. I'm aware that 2D image recognition cannot necessarily be applied to all species; let's just assume that it does work in my use case.)
This paper may answer some of your questions:
http://uran.donetsk.ua/~masters/2011/frt/dyrul/library/article8.pdf
From the pattern classification point of view, a common problem in face recognition is having a large number of classes but only a few training samples per class, possibly only one. For this reason, rather than more sophisticated classifiers, a nearest-neighbour classifier is often used.
While I'm not an expert on the subject, it appears to be a common problem to have only one image per person as a training sample and one that has been solved with at least some level of accuracy in controlled lighting/positional situations.
To specifically answer your question: a training set with multiple images of each individual but little or no variation ("several images captured over a few seconds with a handheld camera") would not be as valuable as one with more variation (e.g. different facial expressions, lighting, and backgrounds).
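As a toy illustration of the one-sample-per-class setup described above: PCA "eigenface"-style features plus a 1-nearest-neighbour classifier, with random data standing in for one gallery image per individual:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_individuals = 40
gallery = rng.random((n_individuals, 64 * 64))  # one flattened image each
labels = np.arange(n_individuals)
probe = rng.random((1, 64 * 64))                # new image to identify

pca = PCA(n_components=20).fit(gallery)
knn = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(gallery), labels)
print(knn.predict(pca.transform(probe)))        # predicted identity
```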

Vehicle type identification with a neural network

I was given a project on vehicle type identification with a neural network, and that is how I came to know the awesomeness of neural technology.
I am a beginner in this field, but I have sufficient materials to learn it. I just want to know some good places to start for this project specifically, as my biggest problem is that I don't have very much time. I would really appreciate any help. Most importantly, I want to learn how to match patterns with images (in my case, vehicles).
I'd also like to know if Python is a good language to start this in, as I'm most comfortable with it.
I have some images of cars as input and need to classify those cars by their model number.
E.g. Audi A4, Audi A6, Audi A8, etc.
You didn't say whether you can use an existing framework or need to implement the solution from scratch, but either way Python is an excellent language for coding neural networks.
If you can use a framework, check out Theano, which is written in Python and is the most complete neural network framework available in any language:
http://www.deeplearning.net/software/theano/
If you need to write your implementation from scratch, look at the book 'Machine Learning: An Algorithmic Perspective' by Stephen Marsland. It contains example Python code for implementing a basic multilayer neural network.
As for how to proceed, you'll want to convert your images into 1-D input vectors. Don't worry about losing the 2-D information; the network will learn 'receptive fields' on its own that extract 2-D features. Normalize the pixel intensities to a -1 to 1 range (or better yet, 0 mean with a standard deviation of 1). If the images are already centered and normalized to roughly the same size, then a simple feed-forward network should be sufficient. If the cars vary wildly in angle or distance from the camera, you may need a convolutional neural network, but that's much more complex to implement (there are examples in the Theano documentation). For a basic feed-forward network, try using two hidden layers with anywhere from 0.5 to 1.5 times the number of input pixels as units in each layer.
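A minimal sketch of that preprocessing, with placeholder shapes and random data standing in for the car images:

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32), dtype=np.float32)  # placeholder car images
X = images.reshape(len(images), -1)                    # 1-D input vectors
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)      # mean 0, std 1 per pixel
print(X.shape)  # (1000, 1024)
```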
Break your dataset into separate training, validation, and testing sets (perhaps with a 0.6/0.2/0.2 ratio) and make sure each image appears in only one set. Train ONLY on the training set, and don't use any regularization until you're getting close to 100% of the training instances correct. You can use the validation set to monitor progress on instances that you're not training on; performance should be worse on the validation set than on the training set. Stop training when performance on the validation set stops improving. Once you've accomplished this, you can try different regularization constants and choose the one that gives the best validation-set performance. The test set will tell you how well your final result performs (but don't change anything based on test-set results, or you risk overfitting to that too!).
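A sketch of that 0.6/0.2/0.2 split, with placeholder features and labels, keeping every image in exactly one set:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 1024), dtype=np.float32)  # placeholder feature vectors
y = rng.integers(0, 5, size=len(X))             # placeholder model labels

idx = rng.permutation(len(X))
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
X_train, y_train = X[train_idx], y[train_idx]   # train ONLY on this
X_val, y_val = X[val_idx], y[val_idx]           # monitor / early stopping
X_test, y_test = X[test_idx], y[test_idx]       # final evaluation only
```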
If your car images are very complex and varied and you cannot get a basic feed-forward net to perform well, you might consider 'deep learning': add more layers and pre-train them using unsupervised training. There's a detailed tutorial on how to do this here (though all the code examples are in MATLAB/Octave):
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Again, that adds a lot of complexity. Try it with a basic feed-forward NN first.

Improve performance of dense optical flow analysis (easily)?

I wrote a program that uses OpenCV's cvCalcOpticalFlowLK. It performs fine on low-resolution webcam input, but I need to run it on a full-HD stream with significant additional computation after the optical flow analysis for each frame. Processing a 5-minute video scaled down to 1440x810 took 4 hours :( Most of the time is spent in cvCalcOpticalFlowLK.
I've looked into improving the speed by adding more raw CPU, but even if I get an 8-core beast and the speedup is the theoretical ideal (say 8x, since I'm basically only using one of my 2.9 GHz cores), I'd only be getting about 4 FPS. I'd like to reach 30 FPS.
More research seems to point to implementing it on the GPU with CUDA, OpenCL, or GLSL(?). I've found some proof-of-concept implementations (e.g. http://nghiaho.com/?page_id=189), and many papers that say, basically, "it's a great application for the GPU, we did it, it was awesome, and no, we won't share our code". Needless to say, I haven't gotten any of them to run.
Does anyone know of a GPU-based implementation that would run on Mac with an NVIDIA card? Are there resources that might help me approach writing my own? Are there other dense OF algorithms that might perform better?
Thanks!
What about OpenVidia Bayesian Optical Flow? Also, the paper 'Real-Time Dense and Accurate Parallel Optical Flow using CUDA' says that the work is freely available in the CUDA Zone. I couldn't find it there immediately, but maybe you will, or you could write to the authors.
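If a ready-made CUDA implementation is acceptable, note that OpenCV builds compiled with CUDA support expose dense Farneback optical flow on the GPU through the cv2.cuda bindings. The sketch below follows the OpenCV 4 Python API and assumes a CUDA-enabled build (a stock pip install of opencv-python will not have these classes); the frame paths are placeholders:

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Upload both frames to GPU memory.
g_prev, g_curr = cv2.cuda_GpuMat(), cv2.cuda_GpuMat()
g_prev.upload(prev)
g_curr.upload(curr)

# Dense Farneback flow computed on the GPU; parameters per the OpenCV docs.
engine = cv2.cuda_FarnebackOpticalFlow.create(
    numLevels=3, pyrScale=0.5, fastPyramids=False, winSize=15,
    numIters=3, polyN=5, polySigma=1.2, flags=0)
flow = engine.calc(g_prev, g_curr, None).download()  # HxWx2 float32 flow field
```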

Fastest deskew algorithm?

I am a little overwhelmed by the task at hand. We have a toolkit which we use for TWAIN scanning. Some of our customers are complaining about slower scan speeds when the deskew option is set. This is because, if their scanner does not support hardware deskew, it is done in post-processing on the CPU. I was wondering if anyone knows of a good (i.e. fast) algorithm to achieve this. It is hard for me to say what algorithm we are using now. What algorithms are out there for this, and how do they rank in terms of speed and accuracy? If I knew the names of the algorithms, it would be easier for me to do a Google search on them.
Thank You.
-Tom
Are you scanning in color or B/W?
Deskew is processor-intensive. A Group 4 TIFF or JPEG must be decompressed, the skew angle determined, the image deskewed, and then recompressed.
There are many image processing libraries out there that offer deskew, and I have evaluated many over the years. There are some huge differences in processing speed between the different libraries, and a lot of it comes down to how well the code is written rather than which algorithm is used. There is a huge difference between commercial libraries in just reading and writing images.
The fastest commercial deskew I have used by far comes from Unisoft Imaging (www.unisoftimaging.com). I assume much of it is written in assembler. Unisoft has been around for many years and is very fast and efficient. It supports many different deskew options, including black border removal and both color and B/W deskew. The Group 4 routines are very solid and very fast. The library comes with many other image processing options, as well as TWAIN and native SCSI scanner support. It also supports Unix.
If you want a free deskew then you might want to have a look at Leptonica. It does not come with too much documentation but is very stable and well written. http://www.leptonica.com/
Developing code from scratch could be quite time-consuming, and the result may be buggy and error-prone.
The other option is to deskew the document in a separate process so that scanning can run at the speed of the scanner. At the moment you are probably processing everything serially, one task after another, hence the slowdown.
Consider doing it as post-processing, because deskew cannot be done in real time (unless it's hardware-accelerated).
Deskew consists of two steps: skew detection and rotation. Detecting the skew angle can usually be done faster on a B&W (1-bit) image. Rotation speed depends on the quality of the interpolation. A good-quality deskew takes a lot of time to run, much more than scanning the pages takes.
A good high-speed scanner can do 120 double-sided pages per minute if it has hardware JPEG or TIFF Group 4 compression and your TWAIN library takes advantage of it (hint: do not use native mode). At that speed you barely have enough time to save the file to the hard drive, let alone decompress, detect skew, rotate, and recompress. A quality deskew takes several seconds per page, unless you can use the video card's hardware to rotate and compress.
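To make those two steps concrete, here is an illustrative sketch using OpenCV's probabilistic Hough transform for detection and an affine warp for rotation. It is not any particular toolkit's algorithm; the file name is a placeholder, and it assumes the page contains detectable text lines:

```python
import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)  # placeholder scan
binary = cv2.threshold(img, 0, 255,
                       cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Step 1, skew detection: median angle of long, near-horizontal segments.
lines = cv2.HoughLinesP(binary, 1, np.pi / 180, 200,
                        minLineLength=img.shape[1] // 2, maxLineGap=20)
angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
          for x1, y1, x2, y2 in lines[:, 0]]
skew = float(np.median([a for a in angles if abs(a) < 45]))

# Step 2, rotation: interpolate the page back by the detected angle.
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), skew, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h),
                          flags=cv2.INTER_LINEAR, borderValue=255)
cv2.imwrite("page_deskewed.png", deskewed)
```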
Do I understand correctly that you already have such an algorithm implemented? If so, are you sure there is no room for optimization? I'd start by profiling the existing solution.
In any case, I'd guess you should look for a fast digital Radon transform algorithm.
Take a look at http://pagetools.sourceforge.net. They have a deskew algorithm implementation.