Shouldn't there be some adjustments for Google Cardboard? With all the different phone sizes, and with everyone's eyes being a slightly different distance apart, I was looking for a way to reposition the two images closer together so that the result looks better. I don't need to use all the pixels, and I think that allowing adjustments to the center placement of each view would make this more usable. As it is, I have to hold the phone a bit further from me to see a good image.
The Cardboard is open "technology" and you are free to adjust it to your own personal needs - no one is going to do that for you. If you are on a bigger budget, there are cheap plastic headsets available from various manufacturers. I got my headset for around $35 with shipping.
I personally use a Color Cross but there are many others. Just make sure to look for one with an open back, so you can plug in headphones, for example, or use the camera once that becomes a thing. An adjustable phone holder is a big plus, so be on the lookout for that too. Another important thing is adjustable IPD (interpupillary distance) for the lenses in the headset - some headsets with a fixed lens distance gave me the cross-eyed effect. Also, many headsets have an adjustable lens-to-phone distance, which can also be important.
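For software that does let you move the two views, the geometry behind the adjustment can be sketched in a few lines. This is only an illustration, not Cardboard's actual API: the function name and parameters are hypothetical, and it assumes the center of each eye's view should sit directly in front of that eye, so the two centers end up one IPD apart and symmetric about the middle of the screen.

```python
MM_PER_INCH = 25.4


def view_centers_px(screen_width_px, pixels_per_inch, ipd_mm):
    """Return the horizontal pixel positions of the left and right view centers.

    Assumption (not from any real SDK): the phone is centered in the headset
    and each view center should lie one half IPD from the screen's midline.
    """
    if pixels_per_inch <= 0 or ipd_mm <= 0:
        raise ValueError("pixels_per_inch and ipd_mm must be positive")
    separation_px = ipd_mm / MM_PER_INCH * pixels_per_inch
    middle = screen_width_px / 2.0
    return (middle - separation_px / 2.0, middle + separation_px / 2.0)


if __name__ == "__main__":
    # Example: a 1920-pixel-wide screen at 400 ppi and an IPD of 63.5 mm.
    print(view_centers_px(1920, 400, 63.5))
```

On a large phone the default split (each view centered in its half of the screen) can place the centers further apart than the viewer's eyes, which is one plausible reason the image only looks right when the phone is held further away.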
Please note that all this is necessary for an okay-ish experience, and for the very best one available, you should get a whole integrated headset, like the Sony Morpheus, Oculus Rift or SteamVR. Also bear in mind that this technology is still in the R&D phase and there are many problems to be solved.
For an interesting read on some of these problems, check this out:
http://media.steampowered.com/apps/valve/2013/MAbrashGDC2013.pdf
I raised this question out of curiosity while using Google Goggles and Google's "Search by Image".
If you give Google an image to search for, it can show you some results. Identical images work best (of course), but photos taken of various objects can be difficult.
I guess Google Goggles works around this a bit by using text recognition and image matching. If text recognition finds text, for instance "SONY", then things might get simpler. If a brand's logo is detected, then things should be simpler as well. The same goes for other famous brands and famous landmarks, such as the Eiffel Tower. Having text and a brand's logo could help recognize things easily.
But if we are to search for something more obscure (need a better wording here), for instance, take this ramen image.
If you put this image into Google, you will get various other images that have similar colors and sometimes a similar shape. Heck, there are other ramen images in the results, but I think it would be better if these ramen images were at the top, since we input a ramen image and our context here is ramen.
So here is my question: would it be possible to create software that can understand the context of an image? How can we express that context in the software?
Man, you just pointed out the very reason why so many people work on computer vision.
It is quite easy to describe objects mathematically. Color, shape, density, ...
All those can be calculated easily.
But computer vision becomes very complex when talking about "real life objects".
Angle, luminosity, and simple inconsistency make it almost impossible to detect an object accurately.
When working on computer vision, you should always ask yourself: what makes the object I want to recognize unique?
What descriptor can I use that no other object possesses?
Ask yourself that question for this ramen. Let's say I simply want to detect ramen.
What if the color of the soup changes? What if the meat is bigger?
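To make the "what if the color of the soup changes?" point concrete, here is a minimal sketch of one very simple descriptor, a coarse color histogram compared by histogram intersection. It is a toy illustration chosen for this answer, not what Google or any particular system uses; the function names and the pixel values are made up.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of a list of (r, g, b) pixels (0-255)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 means identical histograms, 0 means disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Toy "images": a brownish broth versus a red, spicy broth.
    brown_soup = [(200, 150, 80)] * 10
    red_soup = [(200, 40, 30)] * 10
    same = histogram_intersection(color_histogram(brown_soup),
                                  color_histogram(brown_soup))
    different = histogram_intersection(color_histogram(brown_soup),
                                       color_histogram(red_soup))
    print(same, different)
```

Both toy images are "ramen", yet the descriptor says they have nothing in common, while a photo of brown carpet would score well against the first one. That is the kind of failure the questions above are meant to expose.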
If you want to know more, you should read about pattern recognition and pattern matching.
And if you can find a solution to this kind of problem in a generic way, you can apply for the Nobel Prize, I think :)
Some things are quite well known nowadays, like face recognition or OCR; but they are often quite specialized and apply to only one domain.
Think about it, even Google's image search algorithm sucks when you feed it with ramen.
It is pretty efficient with sudoku though, as it knows exactly what it is searching for.
All the difference is made in training, where you give a list of assumptions to help the algorithm.
So basically you got it. Either you create a really nice computer vision system that is good at detecting one thing based on a lot of assumptions, or an "ok" but quite generic one :).
The choice mostly depends on your application.
Is there any open source code that will take a video taken indoors (from a smartphone, for example, of a home or office building, hallways) and superimpose the path traveled on a 2D picture? This can be a hand-drawn picture or a photo of a floor layout.
First I thought of doing this using the accelerometer and compass sensors, but then thought that perhaps one can get better accuracy with a visual odometry approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) for superimposing that data on the path traveled (this is the real application of this project and we know how to do this part). The post-processing of the video can be done later on a standalone computer, so speed and CPU power are not an issue.
Challenges -
The user will simply hand carry the smart phone so the video taker is moving (walking) and not fixed
Limit the video rate to keep the file size small (5 frames/sec? is that ok?). Typically I need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision-based navigation using just a cellphone camera is very difficult. Most of the literature with great results shows an error of ~1% of distance traveled as state of the art, but usually with stereo cameras. Stereo helps a great deal, particularly in indoor environments, for coping with scale drift. I've worked on a system which achieves an error of 0.5% of distance traveled for stereo but only roughly 5% for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
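The 100 m figure presumably comes from dividing the allowed error by the drift rate. A minimal sketch of that arithmetic, under the simplifying assumption that position error grows linearly with distance traveled (real drift is less regular), with a hypothetical helper name:

```python
def max_distance_m(error_budget_m, drift_fraction):
    """Distance that can be traveled before accumulated drift exceeds the budget.

    Assumes error = drift_fraction * distance, which is only a rough model.
    """
    if drift_fraction <= 0:
        raise ValueError("drift_fraction must be positive")
    return error_budget_m / drift_fraction


if __name__ == "__main__":
    budget = 1.0  # metres, the upper end of the 0.5 to 1 m requirement
    for label, drift in [("literature, ~1%", 0.01),
                         ("stereo, 0.5%", 0.005),
                         ("monocular, ~5%", 0.05)]:
        print(label, max_distance_m(budget, drift), "m")
```

With a 1 m budget this gives about 100 m at 1% drift, about 200 m at 0.5%, and only about 20 m at the 5% monocular figure, which is why an hour of hand-held walking is a demanding target for vision alone.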
Multi-sensor is the way to go, though other sensors are worse than vision by themselves.
I've heard some good work with accelerometers mounted on the foot to do ZUPT (zero velocity updates) when the foot is briefly motionless on the ground while taking a step in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
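To show why ZUPT helps, here is a deliberately simplified one-dimensional sketch. It assumes you already have a per-sample "foot is still" flag, and it resets the velocity to zero outright; real systems usually detect the stance phase from the sensor data and apply the update inside a filter. The function name and the numbers are illustrative only.

```python
def integrate_position(accels, stationary, dt, use_zupt):
    """Double-integrate 1D acceleration, optionally zeroing velocity at rest.

    accels     -- acceleration samples in m/s^2 (may include sensor bias)
    stationary -- booleans, True when the foot is known to be motionless
    dt         -- sample period in seconds
    """
    velocity = 0.0
    position = 0.0
    for accel, still in zip(accels, stationary):
        velocity += accel * dt
        if use_zupt and still:
            velocity = 0.0  # zero-velocity update: discard accumulated drift
        position += velocity * dt
    return position


if __name__ == "__main__":
    # A foot that never moves, read through a sensor with a 0.1 m/s^2 bias.
    samples = [0.1] * 100
    flags = [True] * 100
    print(integrate_position(samples, flags, 0.01, use_zupt=False))
    print(integrate_position(samples, flags, 0.01, use_zupt=True))
```

Without the update, one second of a small bias already produces about 5 cm of phantom movement, and the error keeps growing quadratically with time; with it, the drift is cut off at every step.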
A compass is interesting but will be thrown off by the ton of metal within an office building. Translating a few feet around a large metal cabinet might cause a directional jump of 50+ degrees.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do Wi-Fi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization". The problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images, you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the Google results); the camera matrix will let you add real-world object sizes (and hence distances) to your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
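The scale ambiguity mentioned in the optical flow pointer can be checked numerically. The sketch below uses an idealized pinhole camera (no lens distortion, focal length 1), which is an assumption made for the illustration; the function names are hypothetical.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z), z > 0, to image coordinates."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def image_shift(point, camera_translation, focal_length=1.0):
    """How far a point moves in the image when the camera translates."""
    before = project(point, focal_length)
    relative = tuple(p - t for p, t in zip(point, camera_translation))
    after = project(relative, focal_length)
    return (after[0] - before[0], after[1] - before[1])


if __name__ == "__main__":
    # A small object 5 cm away, camera moves 1 mm sideways ...
    near = image_shift((0.01, 0.0, 0.05), (0.001, 0.0, 0.0))
    # ... versus the same scene 1000 times larger, camera moves 1 m sideways.
    far = image_shift((10.0, 0.0, 50.0), (1.0, 0.0, 0.0))
    print(near, far)
```

Both cases produce the same motion in the image, so the video alone cannot distinguish a 1 mm step from a 1 m step. Some external reference (a known object size, a calibrated stereo baseline, or another sensor) is needed to fix the scale.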
Are there any methods in the computer vision literature that allow for detecting transparent glass in images? For example, if I have an image of a car, can I detect the windows?
All methods I've found so far are active methods (i.e. they require calibration, control over the environment, or lasers). I need a passive method (i.e. all you have is an image, or multi-view images of the object, and that's it).
Here is some very recent work aimed at detecting transparent objects in a general setting.
http://books.nips.cc/papers/files/nips22/NIPS2009_0397.pdf
http://videolectures.net/nips09_fritz_alfm/
I think what you are looking for is detection of translucent regions. There is very limited work here since it is a very hard problem. Basically, it is a major chicken-and-egg problem. Translucent regions cause almost all fundamental image processing tools to fail (e.g. motion estimation, feature matching, tracking, etc.), yet you must use such tools to detect translucent regions. Anyway, to my knowledge this is the most recent piece of work in this area, and I doubt there is any other.
http://www.mee.tcd.ie/~sigmedia/pmwiki/uploads/Misc.Icip2011/CVPR_new.pdf
It is published in CVPR which is a top conference in Computer Vision.
Just a wild guess: if the camera is moving and you perform a 3D reconstruction of the scene, you could detect large discontinuities of the reconstructions at the reflected regions.
I think you should provide a clearer description of what you are trying to achieve.
The paper "Deriving intrinsic images from image sequences" shows some results with transparencies.
If you are close enough, you may be able to use the glass refraction (a la Snell's law) to detect the glass from multiple views.
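To get a feel for how large the refraction effect is, here is a small sketch of Snell's law for a flat pane. The refractive index of 1.5 is a typical textbook value for glass and is an assumption, as are the function names; the lateral-shift formula is the standard one for a plane-parallel plate.

```python
import math


def refraction_angle_deg(incidence_deg, n1=1.0, n2=1.5):
    """Angle of the refracted ray from Snell's law, n1*sin(a1) = n2*sin(a2).

    Returns None when there is total internal reflection.
    """
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def lateral_shift_mm(thickness_mm, incidence_deg, n_glass=1.5):
    """Sideways displacement of a ray after passing through a flat glass pane."""
    a1 = math.radians(incidence_deg)
    a2 = math.radians(refraction_angle_deg(incidence_deg, 1.0, n_glass))
    return thickness_mm * math.sin(a1 - a2) / math.cos(a2)


if __name__ == "__main__":
    print(refraction_angle_deg(45.0))   # angle inside the glass
    print(lateral_shift_mm(5.0, 45.0))  # shift through a 5 mm pane
```

For a 5 mm pane viewed at 45 degrees the background shifts by under 2 mm, which explains the "if you are close enough" caveat: from far away this displacement is well below a pixel.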
I also think that reflections (specular regions) are a good indication for curved glasses.
Detecting it is one thing, but separating is another. You can do separation because it's like putting 2 sounds together with 1 of the sounds 180 degrees out of phase. If you manage to learn the phasing sound by itself, you get the other sound automatically, so you could then learn that one too. I'm stuck at the point where I can only superimpose/subtract them if I learned them by themselves. So the real gain here is somehow learning this sum as 2 separate things, even though you never saw them apart.
How can I make a screensaver in C++ that fades an image in and out at random places on the screen with a specified time delay on the fade out?
Multimonitor support would be awesome.
If you have working code or know where I can get it, that would be great. Otherwise, point me in the right direction. I'm looking for a method that gives a smooth fade that is not laggy or flickery. The screensaver is for Windows XP.
I don't know C++, but I do know AS3, JavaScript, and PHP. So I was hoping to relate some of that knowledge to C++.
What should I use to compile it?
First off, if you're starting out in C++, don't start with a Windows-specific compiler like Visual C++. Grab a nice cross-platform IDE that comes with a compiler, like Eclipse or Code::Blocks. Like any project, you are going to want to split it up into smaller tasks you can complete in stages. Sounds like you have a couple of hurdles.
Lack of C++ Knowledge (we were all here once)
Lack of knowledge about images (very common affliction)
Lack of experience (we'll work on this)
DO NOT let others discourage you. You CAN do this, and probably faster than you think possible. DO read a book or two about C++, you can get by for one project without knowing much but you WILL get frustrated often. Let's break up your project into a set of small goals. As you complete each goal your confidence in C++ will rise.
Image blending
Windows Screen Saver w/ multi-monitor support
Screen Canvas (directx, opengl, bitmaps?)
Timers
First, let's look at the problem of image blending. I assume that you'll want to "fade" the image in question into whatever Windows background you have active. If you're going to have a solid-color background, you can just do it by changing the alpha transparency of the image in question between canvas refreshes. Essentially you'll want to take a weighted average of the color values of each pixel in the two images based on your refresh timer. In more direct terms, to find the red, green, and blue pixel elements for any resultant pixel (P3):
N = timer ticks per interval (seconds/milliseconds/etc)
T = ticks that have occurred this interval
P1r = red pixel element from image 1
P2r = red pixel element from image 2
P3r = resultant red pixel element for blended image
P1g = green pixel element from image 1
P2g = green pixel element from image 2
P3g = resultant green pixel element for blended image
P1b = blue pixel element from image 1
P2b = blue pixel element from image 2
P3b = resultant blue pixel element for blended image
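P3r = (T/N * P1r) + ((N-T)/N * P2r)
P3g = (T/N * P1g) + ((N-T)/N * P2g)
P3b = (T/N * P1b) + ((N-T)/N * P2b)
(The two weights T/N and (N-T)/N already sum to 1, so no further division is needed; dividing the result by 2 would halve the brightness.)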
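As a quick way to check the arithmetic before writing the C++ version, here is a minimal sketch of the per-channel blend. It uses the weights T/N and (N-T)/N directly, with no further division, since they already sum to 1. The function names are hypothetical and pixels are assumed to be (r, g, b) tuples with values from 0 to 255.

```python
def blend_channel(p1, p2, t, n):
    """Blend one color channel; t = ticks elapsed, n = ticks per interval.

    At t == 0 the result equals p2, at t == n it equals p1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    t = max(0, min(t, n))  # clamp so the weights stay within [0, 1]
    weight = t / n
    return round(weight * p1 + (1.0 - weight) * p2)


def blend_pixel(pixel1, pixel2, t, n):
    """Blend two (r, g, b) pixels channel by channel."""
    return tuple(blend_channel(a, b, t, n) for a, b in zip(pixel1, pixel2))


if __name__ == "__main__":
    image_pixel = (200, 100, 0)
    background_pixel = (100, 200, 50)
    for tick in (0, 5, 10):
        print(tick, blend_pixel(image_pixel, background_pixel, tick, 10))
```

Stepping the tick from 0 to N fades image 1 in over image 2; running it from N back down to 0 gives the fade out.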
Now let's look at the problem of Windows screen savers and multi-monitor support. These are really two separate problems. Windows screensavers are really just regular compiled .exe files (renamed to .scr) with a certain callback function. You can find many tutorials on setting up your screensaver functions on the net. Multi-monitor support will actually be a concern when you set up your Screen Canvas, which we'll discuss next.
When I refer to the Screen Canvas, I am referring to the area upon which your program will output visual data. All image rendering apps or tutorials will basically refer to this same concept. If you find this particularly interesting, please consider a graduate program or learning course in Computer Vision. Trust me, you will not regret it. Anyway, considering the basic nature of your app, I would recommend against using OpenGL or DirectX, just because each has its own layer of app-specific knowledge you'll need to acquire before it is useful. On the other hand, if you want built-in 3D niceties like double buffering and GPU offloading, I'd go with OpenGL (more platform agnostic). Lots of fun image libraries come as gimmes as well.
As for multi-monitor support, this is a tricky but not complicated issue. You're basically just going to set your canvas bounds (screen boundary) to match the geometry of your multiple monitors. This shouldn't be an issue with multiple monitors of the same resolution, but may get tricky with mismatched monitor resolutions (you might have canvas areas not displayed on any screen, etc.). There are well-known algorithms and workarounds for these issues, but perhaps this is beyond the scope of this question.
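The canvas-bounds idea, and the "areas not displayed" problem with mismatched monitors, can be sketched as follows. The monitor rectangles here are made-up (left, top, width, height) tuples; in a real Windows program they would come from the operating system (for example via EnumDisplayMonitors). One simple workaround, shown in the last function, is to pick a monitor first and place the image entirely inside it.

```python
import random


def virtual_canvas(monitors):
    """Bounding rectangle (left, top, width, height) covering all monitors."""
    left = min(m[0] for m in monitors)
    top = min(m[1] for m in monitors)
    right = max(m[0] + m[2] for m in monitors)
    bottom = max(m[1] + m[3] for m in monitors)
    return (left, top, right - left, bottom - top)


def is_visible(point, monitors):
    """True if the point lies on at least one monitor."""
    x, y = point
    return any(l <= x < l + w and t <= y < t + h for l, t, w, h in monitors)


def random_position(monitor, image_size, rng=random):
    """Random top-left corner that keeps the image fully on one monitor."""
    left, top, width, height = monitor
    image_w, image_h = image_size
    x = left + rng.randint(0, max(0, width - image_w))
    y = top + rng.randint(0, max(0, height - image_h))
    return (x, y)


if __name__ == "__main__":
    screens = [(0, 0, 1920, 1080), (1920, 0, 1280, 1024)]
    print(virtual_canvas(screens))            # the whole canvas
    print(is_visible((2000, 1050), screens))  # inside the canvas, but off-screen
    print(random_position(random.choice(screens), (320, 240)))
```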
Finally, as to the timers required, this should be the easiest part. Windows and Linux handle time in their own strange ways (WHY doesn't MS implement STRPTIME), so if you are interested in portability I'd use a third-party library. Otherwise just use the basic Windows SetTimer() functionality and have your image rendered in the callback function.
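Whatever timer you use, the callback only needs to turn "time since this image appeared" into an opacity. A minimal sketch of such a schedule, with a hold period standing in for the delay before the fade out; the function name, the three phases, and the durations are assumptions for illustration.

```python
def fade_alpha(elapsed_ms, fade_in_ms, hold_ms, fade_out_ms):
    """Opacity in [0, 1] for an image that fades in, holds, then fades out."""
    if elapsed_ms < 0:
        return 0.0
    if elapsed_ms < fade_in_ms:
        return elapsed_ms / fade_in_ms
    elapsed_ms -= fade_in_ms
    if elapsed_ms < hold_ms:
        return 1.0
    elapsed_ms -= hold_ms
    if elapsed_ms < fade_out_ms:
        return 1.0 - elapsed_ms / fade_out_ms
    return 0.0  # cycle finished: time to pick a new random position


if __name__ == "__main__":
    # 1 s fade in, 2 s fully visible, 0.5 s fade out, sampled every 250 ms.
    for ms in range(0, 3751, 250):
        print(ms, fade_alpha(ms, 1000, 2000, 500))
```

Computing the opacity from elapsed time, rather than adding a fixed step per callback, keeps the fade at the same speed even if some timer ticks arrive late.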
Advanced Topic: Hey, if you're still reading, there are a number of interesting algorithmic improvements you can make to this program. For instance, with your timer going off a set number of times each second, you could cache the first few blended images and save some processing time (our eyes are not terribly good at differentiating between changing color gradients). If you are using OpenGL, you could cache sets of blended images as display lists (gotta use all that graphics card memory for something, right?). Fun stuff, good luck!
"I dont know C++, but I do know AS3, Javascript, and PHP. So I was hoping to relate some of that knowledge to C++."
Oh boy, I suppose you're going to be surprised. To learn C++:
Buy at least one or two very good books (see here). Do not buy books that aren't very good. They'll teach you habits that you would have to unlearn later in order to make further progress.
Expect having to do plenty of hands-on code writing with a lot of reading in between. In the first few weeks, the compiler will spit legions of incomprehensible error messages at you, your code will keep crashing, and seasoned C++ programmers looking at your code will throw up their hands in disgust. And it's always your fault.
Plan a lot of time to learn. And I mean a real lot. No matter how much time you devote, it will take at least a couple of months until you upgrade from "dummy" to "novice".
Imagine having to hammer at it for a couple of years in order to become a real professional, constantly reading books and articles, devoting plenty of time to newsgroups and discussion forums, learning from others, banging your head against every wall surrounding your desk.
Be prepared to learn something you haven't known before at least once every week even after a decade of programming in C++ for a living.
Just for a moment, imagine I might not overstate this.
Edit for clarification: I have up-voted both Hunter Davis' and Elemental's answers, because they're very good, pretty much to the point, and in general the encouragement is supported by me despite my rant up there. In fact, I do not doubt that many are able to hack something together in C/C++ when given a skeleton example even if actually they don't know much of C++. Go ahead and try, there's a good chance you'll come up with a screen saver within reasonable time. But. Being able to "hack something together in C/C++" is far from "having learned C++". (And I mean very far.)
C++ is an incredibly complex, mean, and stubborn beast. I also consider it almost breathtakingly beautiful, but in that it probably mirrors the view from a very high mountain: It's pretty hard to get to the point where you're able to appreciate the beauty.
C++ is a complex language and all that sbi indicates is probably true (certainly after 10 years of commercial C++ programming there is still some to learn) BUT I think that if you are confident in another language it should only take a couple of days to:
a) Install one of the compilers mentioned here (I would suggest VC as a quick way into Windows programming)
b) Find some sample code of a screen saver that does something simple using the windows GDI (there is some code within the MS documentation on screen savers)
c) Modify this code to do what you want.
In terms of your limited requirement, I think you will find that the Windows GDI+ libraries have sufficient power to do the alpha fades you require smoothly.
I'd look at Microsoft Visual C++ Express. It's free and pretty capable. You may need to hack it, or get an older version, to produce unmanaged executables.
There are a lot of tutorials available for the rest.
Is it possible to simulate custom forces (in my case, electromagnetic) using the SolidWorks API for Animator/Motion Study/COSMOS/EMS?
I'm looking for any combination of APIs that would expose the required data to be able to simulate the dynamics of either electrical positive/negative or magnetic north/south forces.
The very basics of what I need to be able to do is:
Model two cubes
Mark a point on one as having positive charge and the point on the other as negative charge (or north/south magnetism)
Press "Go"
Watch them come together and stick
Once I can figure out how to do this, I can go through with the more complicated code that I'm trying to write (that's not the problem). I'm simply stuck on where to begin. I have searched and searched but cannot find a definitive answer, the documentation is sparse and hard to grasp.
If this is definitely not possible or not worth it to attempt in SolidWorks, then that's an acceptable answer. I never would have chosen SolidWorks if I was left free to pick the platform, but it was chosen for me.
EDIT
It seems COSMOSMotion API's IDDMActionReactionForce class is what I was looking for. Can anyone point me to an example of using it to define a custom force between two objects?
I can't speak about SolidWorks, so my answer may be irrelevant — BUT I have used ray-tracing software to model dynamic systems.
In my case, I was simulating the circumstances of lunar and solar eclipses. The ray-tracing software (POV-Ray) took care of generating an image of the scene including the Sun, Earth and Moon, but I had to calculate the positions of the various bodies for each frame of the animation.
I suspect this may be the case with modelling electromagnetic dynamics, and you will have to calculate the positions of the bodies involved at intervals, so that SolidWorks will render the scenes of an animation.
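As a rough idea of what "calculating the positions at intervals" could look like for the two-cube case, here is a one-dimensional sketch with two point charges that attract and then stick on contact. It is independent of SolidWorks and does not use its API; the function names, the simple time-stepping scheme, and all the numbers are assumptions for illustration.

```python
COULOMB_K = 8.9875517923e9  # Coulomb constant in N*m^2/C^2


def coulomb_force(q1, q2, x1, x2):
    """Force on body 1 along the x axis; positive means toward +x."""
    separation = x2 - x1
    magnitude = COULOMB_K * q1 * q2 / separation ** 2  # > 0 means repulsion
    away_from_body2 = -1.0 if separation > 0 else 1.0
    return magnitude * away_from_body2


def simulate(q1, q2, m1, m2, x1, x2, contact=0.01, dt=1e-3, max_steps=100000):
    """Return a list of (x1, x2) positions, one entry per animation frame."""
    if not x1 < x2:
        raise ValueError("this sketch expects x1 < x2")
    frames = [(x1, x2)]
    if x2 - x1 <= contact:
        return frames
    v1 = v2 = 0.0
    for _ in range(max_steps):
        force = coulomb_force(q1, q2, x1, x2)
        v1 += force / m1 * dt   # equal and opposite forces
        v2 -= force / m2 * dt
        x1 += v1 * dt
        x2 += v2 * dt
        if x2 - x1 <= contact:  # the bodies touch: snap together and stop
            middle = (x1 + x2) / 2.0
            frames.append((middle - contact / 2.0, middle + contact / 2.0))
            break
        frames.append((x1, x2))
    return frames


if __name__ == "__main__":
    path = simulate(1e-6, -1e-6, 0.1, 0.1, 0.0, 0.1)
    print(len(path), path[-1])
```

Each entry in the returned list would then be handed to the rendering tool as the body positions for one frame.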
I may be all wrong about the capabilities of SolidWorks, so I wish you luck.
I was tempted to say that "it's impossible" because you said it would be "an acceptable answer", but that would be too easy.
After much trying, my conclusion is that SolidWorks is not the appropriate platform for this. It doesn't let you hook into its internal physics calculations, and the Force object I spoke of is way too inefficient for the problem I needed to model. Theoretically, it will work to bring two cubes together alongside SolidWorks' built-in gravity/collision detection simulation elements, but when confronted with an n-body problem, it was apparent that it wasn't made for that.