Cubemap from panoramic horizontally wrappable image - opengl

I'm trying to write an algorithm to generate the "ceiling panel" from a horizontally wrappable panoramic image like the one above. Images 1 to 4 are a straight cut-out for the walls of the cube, but the ceiling will be more complicated, as I assume it needs to be composited from parts 5a to 5d. Does anyone know the solution in pseudocode?
My guess is that we need to iterate over the coordinates of the ceiling tile, i.e.:
for y=0 to height
    for x=0 to width
        colorofsomecoordinateonoriginalimage = some function (polar coords?)
        set pixel(x,y) = colorofsomecoordinateonoriginalimage
    next
next
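The nested loop above can be made concrete once you commit to a projection for the source image. Below is a minimal Python sketch assuming, hypothetically, that the panorama is a full equirectangular image (360° of longitude across, 180° of latitude down, zenith in the top row); the answers below suggest the actual image may be cylindrical, in which case only the lookup function changes. The names `ceiling_lookup` and `make_ceiling`, the list-of-rows image format and the axis convention (up is +y, longitude measured from +z) are illustrative choices, and it uses nearest-neighbour sampling.

```python
import math

def ceiling_lookup(x, y, size, pano_w, pano_h):
    """Map pixel (x, y) on a size x size ceiling face to a panorama pixel (u, v).

    Assumes an equirectangular panorama: u spans 360 degrees of longitude,
    v spans 180 degrees of latitude with the zenith at v = 0.
    """
    # Face coordinates in [-1, 1], sampled at pixel centres.
    a = 2.0 * (x + 0.5) / size - 1.0
    b = 2.0 * (y + 0.5) / size - 1.0
    # The ceiling is the cube face at y = +1, so the view direction is (a, 1, b).
    dx, dy, dz = a, 1.0, b
    lon = math.atan2(dx, dz)                   # -pi..pi around the vertical axis
    lat = math.atan2(dy, math.hypot(dx, dz))   # pi/2 straight up
    u = (lon / (2.0 * math.pi) + 0.5) * pano_w
    v = (0.5 - lat / math.pi) * pano_h
    # Wrap horizontally (the panorama is wrappable), clamp vertically.
    return int(u) % pano_w, min(int(v), pano_h - 1)

def make_ceiling(pano, size):
    """Build the ceiling face with the nested loop from the question."""
    pano_h, pano_w = len(pano), len(pano[0])
    face = [[None] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            u, v = ceiling_lookup(x, y, size, pano_w, pano_h)
            face[y][x] = pano[v][u]
    return face
```

The centre of the ceiling always samples the top row of the panorama, and the middle of each ceiling edge samples roughly a quarter of the way down, which is where the tops of the wall tiles meet the ceiling.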

Hmm... I remember doing something like that for a computer vision class back in grad school. It's not impossible, but a LOT of work needs to be done. One way would be to degrade the quality of the entire image first; that's the easiest starting point. Once you've degraded it enough (depending on how much you need to stretch the edges), you can start applying nonlinear transformations to the image. This is probably best approximated by cutting the cylinder into sections by degrees and then applying one of the age-old projections used to make flat maps (like Mercator or CADRG or something)... but you have to remember to interpolate the pixels; at the very least, average neighbouring pixels to approximate. That's the best I can think of.
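A common way to "interpolate the pixels" is bilinear sampling: rather than rounding a fractional source coordinate to a single pixel, blend the four surrounding pixels weighted by distance. A minimal sketch, assuming a greyscale image stored as a list of rows, integer coordinates landing exactly on pixel centres, and horizontal wrap-around for a wrappable panorama; `sample_bilinear` is a hypothetical helper name. For colour images, apply it per channel.

```python
import math

def sample_bilinear(img, u, v):
    """Sample a greyscale image at fractional position (u, v).

    Integer (u, v) lands exactly on pixel (u, v). u wraps horizontally
    (wrappable panorama); v is clamped to the top and bottom rows.
    """
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(u), math.floor(v)
    tx, ty = u - x0, v - y0            # fractional weights in [0, 1)
    x1 = (x0 + 1) % w
    x0 = x0 % w
    y1 = min(max(y0 + 1, 0), h - 1)
    y0 = min(max(y0, 0), h - 1)
    # Blend horizontally on the two rows, then vertically between them.
    top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
    bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
    return top * (1 - ty) + bottom * ty
```

For example, sampling `[[0, 10], [20, 30]]` at (0.5, 0.5) averages all four pixels and returns 15.0.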

You can't generate a panorama just by taking photos from a single location and stitching them. Well, you can for a single horizontal set, but it would look ugly (usually, you stitch many more than 4 photos to avoid distortions at the edges).
Here, you have even more data in the y-direction, which means even more pictures, and some sort of fancy projection to generate the final image.
If you look at the panorama you have closely, you'll notice that the boundary of the region in sunlight is not straight. That is because your panorama was projected on a cylinder, not a cube. So I don't think 1/2/3/4 would look right directly mapped to a cube.
Bottom line, you really can't treat those 8 chunks as 8 pictures taken from a fixed point. (If you need convincing, try taking 8 pictures like that yourself and stitching them together. You'll see how much "fun" the upper row is, and that even though the bottom row is easy, the stitched regions look ugly.)
Now, your options depend heavily on why you need cube maps. If you're only looking for a cube map to do cheap environment mapping effects, then the simplest approach is to find an arbitrary function that maps the edges where you want them to be, and simply linearly interpolate in between. It's completely the wrong projection, but it ought to give a picture that looks good enough for the intended goal.
If you're looking for something more accurate, then you need to know how the projection was generated, so that you can unproject it before re-projecting it on the cube.
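If the panorama really is cylindrical, as suggested above, unprojecting a side wall is straightforward once you know its vertical coverage. A minimal sketch, assuming a cylindrical panorama covering 360° horizontally whose rows map linearly to height on a unit-radius cylinder (v = 0 at the top), with the vertical extent given by a hypothetical `half_height` parameter equal to tan(vertical field of view / 2); `side_face_lookup` is an illustrative name. Directions outside the cylinder's vertical range, which includes most of the ceiling, simply have no source pixel.

```python
import math

def side_face_lookup(face, x, y, size, pano_w, pano_h, half_height):
    """Map pixel (x, y) on cube side face `face` (0..3) to a cylindrical panorama.

    Returns (u, v), or None when the direction falls outside the panorama's
    vertical coverage. half_height = tan(vertical_fov / 2) (an assumption).
    """
    a = 2.0 * (x + 0.5) / size - 1.0      # horizontal face coordinate, -1..1
    b = 1.0 - 2.0 * (y + 0.5) / size      # vertical face coordinate, +1 at top
    # Face 0 looks along +z; each next face is rotated 90 degrees about the vertical.
    yaw = face * math.pi / 2 + math.atan2(a, 1.0)
    # Height at which the ray hits the unit cylinder: rise over horizontal distance.
    h = b / math.hypot(a, 1.0)
    if abs(h) > half_height:
        return None
    u = (yaw / (2.0 * math.pi)) % 1.0 * pano_w
    v = (half_height - h) / (2.0 * half_height) * pano_h
    return int(u) % pano_w, min(int(v), pano_h - 1)
```

Because the row index is linear in height rather than in angle, straight horizontal lines on the wall come out curved in the panorama, which matches the curved sunlight boundary described above.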
All that said, it's also a lot easier to just photograph cube maps rather than process a panorama to generate them, but that might not be possible for you.

Related

Estimating a cuboid's rotation from its parallel projection

I paint in my spare time, and that means I have a truly massive collection of reference images. Folders full of buildings, people, animals, cars, etc. It's gotten to the point where it'd be great to tag the objects by their pose, so I can find the right object at the right angle. CVAT, an image annotating tool for machine learning, allows you to mark images with cuboids, as you can see in this picture.
But suddenly I'm wondering... is it even possible for a computer to estimate the rotation of a cuboid based on a single image, when all I can feed it are the eight (x,y) pairs that define the image of said cuboid?
My thinking is that I need to somehow invert the transformation matrix so that this cuboid looks like a rectangle. That would mean that we're looking at it "on-axis", and I'm imagining that this inversion could furnish me with those XYZ rotations I'm looking for.
My best lead right now is OpenCV's getPerspectiveTransform function, which can create a matrix that will warp an image, but that transformation seems to be purely two-dimensional.
Wikipedia does mention the idea of using an "augmented matrix" to perform transformations in an extra dimension, which seems apropos here, since I want to go from a 2D representation to a 3D one.
A couple of constraints and advantages that might clarify the feasibility:
The cuboids are rendered in a parallel projection. They don't match the perspective of the image, and that's okay! Just need a rough sense of their pose -- a margin of error of 10 degrees on any given axis of rotation is fine by me, in case there are some inexact solutions that could work.
In the case of multiple cuboids in the scene, I don't care at all about their interrelations -- each case can be treated separately.
I always have a sense of the "rear wall" of the cuboid, because I'm careful in how I make these annotations, in case that symmetry-breaking helps.
The lengths of edges are irrelevant, I'm not trying to measure the "aspect ratio" of these bounding cuboids.
Thank you for any advice or hints!

Detect a 2 x 3 Matrix of white dots in an image

I want to locate a service robot via infrared landmarks. The idea is to detect two landmarks, get the distance to each of them, and calculate the robot's position from this information (the positions of the landmarks are known).
For this I have built an artificial 2x3 matrix of IR LEDs, which is visible in the robot's infrared camera image (shown in the image below).
As the first step, I want to detect a single landmark in a picture and get its x-y coordinates. In the future I can use these coordinates to get the distance from the provided depth image.
My first approach was to convert the image to black and white. Then I tried to filter out the different clusters of points (which I had dilated and found the contours of first). I couldn't succeed with this method.
Now I wonder whether there are any pattern recognition/computer vision methods that can help me detect the pattern fairly "easily".
I've added a picture of the infrared image with the landmark in it and a converted black/white image.
a) Which method can help me to solve this problem?
b) Should I use a 3x3 matrix or some other geometric form instead of the 2x3 matrix?
[Images: IR image; black-and-white image]
A direct answer:
1) find all small circles in the image; 2) look among these small circles for ones that are the same size and close together, and, say, form parallel lines.
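Step 1 and a simplified step 2 can be sketched without any library. This assumes a binary image (a list of rows of 0/1) after thresholding, labels 4-connected bright blobs with a flood fill, then uses union-find to join blobs of similar area that lie close together, keeping groups of exactly six. The tolerances `area_tol` and `max_dist` are hypothetical tuning parameters, and the parallel-lines check from step 2 is left out for brevity.

```python
from collections import deque
import math

def find_blobs(binary):
    """Label 4-connected foreground blobs; return (area, cx, cy) per blob."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                queue, pts = deque([(x, y)]), []
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one blob
                    px, py = queue.popleft()
                    pts.append((px, py))
                    for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                area = len(pts)
                blobs.append((area, sum(p[0] for p in pts) / area,
                              sum(p[1] for p in pts) / area))
    return blobs

def find_led_groups(blobs, area_tol=0.5, max_dist=10.0, group_size=6):
    """Join blobs of similar area within max_dist of each other; keep groups of group_size."""
    parent = list(range(len(blobs)))

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (ai, xi, yi) in enumerate(blobs):
        for j in range(i + 1, len(blobs)):
            aj, xj, yj = blobs[j]
            similar = abs(ai - aj) <= area_tol * max(ai, aj)
            close = math.hypot(xi - xj, yi - yj) <= max_dist
            if similar and close:
                parent[root(i)] = root(j)
    groups = {}
    for i, blob in enumerate(blobs):
        groups.setdefault(root(i), []).append(blob)
    return [g for g in groups.values() if len(g) == group_size]
```

A large bright region (a window, a lamp) ends up alone in its own group and is discarded, while the six LEDs, being similar in size and evenly spaced, form one group whose centroids give the landmark position.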
The reason for this approach is that you have coded the robot with a specific pattern of small objects. Therefore, look for the objects and then look for the pattern. (If the orientation and size didn't change, you could just look for a sub-image within the larger image, but because they can, you need to look for elements of the pattern that remain consistent with motion in 3D space, that is, the parallel lines.)
This will work in the example images, but to know whether it will work more generally, we need to know more than you've told us: it depends on whether the variation in the images of the matrix and the variation in the background are such that this is enough to distinguish between them. If not, maybe you need a more clever algorithm or a different pattern of lights. In the extreme case, it's obvious that if there were another 2x3 matrix around, this wouldn't be enough. It all depends on the variation of the object to be identified and the variation within the background scene, and because you don't tell us either of these things, it's hard to say what the best way is, what's good enough, what's a better way, etc.
If you have the choice, and here it sounds like you do, good data is better than clever analysis. For this problem, I'd call good data anything that clearly distinguishes the object from the background. You need to think of it this way: look at what the background is and at all the possible perspectives on the lights, and make sure these can never be confused.
For example, if you have a lot of control over this, and enough time, temporal variations are often the easiest. Turning the lights (or a subset of the lights) on and off, etc., and then looking for the expected temporal variation is often the surest way to distinguish signal from noise. But really, this again is just making an assumption about the background and foreground (i.e., that the background won't vary with some particular time pattern).
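The temporal idea can be checked with a simple correlation. A minimal sketch, assuming you already track a candidate blob's brightness over several frames and that the LEDs are switched with a known on/off pattern synchronised to the camera frames; `blink_score`, `is_landmark` and the 0.8 threshold are hypothetical. A steady background light scores 0 and random flicker scores low, while the blinking LED cluster scores close to 1.

```python
import math

def blink_score(brightness, pattern):
    """Pearson correlation between observed per-frame brightness and the expected
    on/off pattern (1 = LEDs on, 0 = off). Both sequences must have equal length.
    Returns 0.0 when either sequence is constant."""
    n = len(pattern)
    mb = sum(brightness) / n
    mp = sum(pattern) / n
    cov = sum((b - mb) * (p - mp) for b, p in zip(brightness, pattern))
    var_b = sum((b - mb) ** 2 for b in brightness)
    var_p = sum((p - mp) ** 2 for p in pattern)
    if var_b == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_b * var_p)

def is_landmark(brightness, pattern, threshold=0.8):
    """Accept a candidate only if it blinks in step with the expected pattern."""
    return blink_score(brightness, pattern) >= threshold
```

Using a correlation rather than a fixed brightness threshold makes the test insensitive to how bright the LEDs appear at a given distance.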

How to Detect Objects in the Image without using any library in C++?

I am writing an application in C++ that requires a little bit of image processing. Since I am completely new to this field I don't quite know where to begin.
Basically I have an image that contains a rectangle with several boxes. What I want is to be able to isolate that rectangle (x, y, width, height) as well as get the center coordinates of each of the boxes inside (18 total).
I was thinking of using a simple for-loop to loop through the pixels in the image until I find a pattern but I was wondering if there is a more efficient approach. I also want to see if I can do it efficiently without using big libraries like OpenCV.
Here are a couple of example images; any help would be appreciated:
Also, what are some good resources where I could learn more about image processing like this?
The detection algorithm here can be fairly simple. Your box-of-squares (BOS) is always aligned with the edge of the image, and has a simple structure. Here's how I'd approach it.
1. Choose a colorspace. Assume RGB is OK for now, but it may work better in something else.
2. For each line:
2.1. For each pixel, calculate the magnitude of the difference between the pixel and the pixel immediately below it. The difference magnitude is simply sqrt((X-x)^2 + (Y-y)^2 + (Z-z)^2), where X, Y, Z are the color coordinates of the first pixel and x, y, z are the color coordinates of the pixel below it. For RGB, XYZ = RGB, of course.
2.2. Calculate the maximum run length of consecutive difference magnitudes that are below a certain threshold magThresh. You may also choose a forgiving version of this: the maximum run length, but allowing intrusions up to intrLen pixels long that must be followed by runs up to contLen pixels long. This takes care of possible line-to-line differences at the edges of the squares.
3. Find the largest set of consecutive lines whose maximum run lengths are above minWidth and below maxWidth.
4. You've now found the lines which contain the box, and by recalculating the data in 2.1 above, you'll know where the boxes are in horizontal coordinates.
5. Detecting box edges is done by repeating the same thing but scanning left-to-right within the box. At that point you'll have approximate box centroids that take no notice of bleeding between pixels.
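The per-line scan (the difference with the pixel below, the maximum run length, and the search for the largest band of consecutive lines) can be sketched in a few lines. This assumes the image is a list of rows of (R, G, B) tuples; magThresh, minWidth and maxWidth become the parameters `mag_thresh`, `min_width` and `max_width`, and the forgiving intrusion handling (intrLen, contLen) is omitted to keep it short.

```python
import math

def diff_magnitude(p, q):
    """Euclidean colour distance: sqrt((X-x)^2 + (Y-y)^2 + (Z-z)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def max_run_below(img, row, mag_thresh):
    """Longest run of pixels in `row` that differ little from the pixel below."""
    best = run = 0
    for x in range(len(img[row])):
        if diff_magnitude(img[row][x], img[row + 1][x]) < mag_thresh:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

def find_box_rows(img, mag_thresh, min_width, max_width):
    """Largest set of consecutive rows whose maximum run lies strictly between
    min_width and max_width. Returns (first_row, last_row) or None."""
    best, start = None, None
    last = len(img) - 2                  # the bottom row has no pixel below it
    for y in range(last + 1):
        ok = min_width < max_run_below(img, y, mag_thresh) < max_width
        if ok and start is None:
            start = y
        if start is not None and (not ok or y == last):
            end = y if ok else y - 1
            if best is None or end - start > best[1] - best[0]:
                best = (start, end)
            start = None
    return best
```

Re-running `max_run_below` on the rows it returns tells you where the long run starts and ends, which is the horizontal extent mentioned in step 4.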
I'd think this can all be accomplished by repeatedly running the image through various convolution kernels followed by thresholding. The good thing is that both of those operations have very fast library implementations. You don't want to reimplement them by hand; that will likely be significantly slower.
If you insist on doing it yourself (personally I'd use OpenCV, it's industrial-strength and free), you're going to need an edge detection algorithm first. There are a good few out there on the internet, but be prepared for some frightening mathematics...
Many involve iterating over each pixel, lifting it and its neighbours' values into a matrix, and then convolving with a kernel matrix. Be aware that this has to be done for every pixel (in principle, though in your case you can stop at the first discovered rectangle) and for each colour channel, so it would be highly advisable to push it onto the GPU.
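For a single channel, the "lift each pixel and its neighbours into a matrix and apply a kernel" step looks like this. A minimal pure-Python sketch that uses the 3x3 Sobel kernels as the example edge detector (a choice made here; the answer doesn't name one) and skips the one-pixel border for simplicity. This is exactly the slow, CPU-bound version the answers warn about.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_kernel(img, kernel, x, y):
    """Weighted sum of the 3x3 neighbourhood of (x, y). Strictly this is a
    correlation; flipping a Sobel kernel only changes the sign of the result."""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel_magnitude(img):
    """Gradient magnitude of a greyscale image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, x, y)
            gy = apply_kernel(img, SOBEL_Y, x, y)
            out[y][x] = math.hypot(gx, gy)
    return out

def threshold(mag, t):
    """Binary edge map: 1 where the gradient magnitude exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in mag]
```

On an image whose left half is 0 and right half is 100, the two columns either side of the boundary get a magnitude of 400 and everything else stays at 0, so thresholding leaves a clean vertical edge.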

How to create large terrain/landscape

I was wondering how it's possible to create a large terrain in OpenGL. My first idea was to use Blender: create a plane, subdivide it, create the terrain and export it as .obj. After taking a look at Blender I thought this should be possible, but I soon realized that my hexacore + 8GB RAM isn't able to keep up with the subdividing needed to support the required precision for a very large terrain.
So my question is, what is the best way to do this?
Maybe try another 3D package like Cinema 4D?
Create the terrain step-by-step in Blender and put it together later? (It might be problematic to maintain the ratio between the segments.)
Some methods I don't know about?
I could create a large landscape with a random generation algorithm, but I don't want a random landscape; I need a customized landscape with many details (heights, depths, paths).
Edit
What I'll do is:
Create 3 different heightmaps (1. cave ground (+maybe half of the wall height), 2. inverted heightmap for cave ceiling, 3. standard surface heightmap)
Combine all three heightmaps
Save them in an .obj file or whatever format is required
do some fine tuning in a 3D editing tool (if it's too large to handle, I'll create an app with an LOD algorithm where I can edit some minor stuff)
save it again in whatever format is required (maybe do some optimization)
be happy
Edit2
The map I'm creating is so big that Photoshop is using all of my 8GB of RAM, so I have to split all 3 heightmaps into smaller parts and assemble them on the fly when moving over the map.
I believe you would just want to make a height map. The larger you make the image, the further it can stretch. If you made the seams match up, you could perhaps tile it, but if you want an endless terrain, it's probably worth the effort to generate the terrain procedurally.
To make a height map, you'll make an image where each pixel represents a set height (you don't really have to represent it as an image, but it makes it very easy to visualize) which becomes a grey-scaled color. You can then scale this value to the desired maximum height (precision is decided by the bit-depth of the image).
If you wanted to do this with OpenGL, you could make an interface where you click at points to raise the height of particular points or areas.
Once you have this image, rendering it isn't too hard, because the X and Y coordinates are set for your space and the image will give you the Z coordinate.
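Turning a height map into renderable geometry is mostly index bookkeeping. A minimal sketch, assuming a greyscale height map stored as a list of rows of integers, a hypothetical `cell_size` spacing between samples and a `max_height` for the brightest value; it emits one vertex per pixel (X and Y from the grid position, Z from the pixel, as described above) plus two triangles per grid cell, ready for a vertex buffer and an index buffer. The triangle winding order is an arbitrary choice; flip it if back-face culling hides the terrain.

```python
def heightmap_to_mesh(heights, cell_size=1.0, max_height=100.0, bit_depth=8):
    """Convert a grid of integer height samples into vertices and triangle indices.

    heights[row][col] is in 0 .. 2**bit_depth - 1; the bit depth limits precision.
    """
    rows, cols = len(heights), len(heights[0])
    scale = max_height / (2 ** bit_depth - 1)
    vertices = [(c * cell_size, r * cell_size, heights[r][c] * scale)
                for r in range(rows) for c in range(cols)]
    indices = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c                              # top-left corner of the cell
            indices += [i, i + cols, i + 1,               # first triangle
                        i + 1, i + cols, i + cols + 1]    # second triangle
    return vertices, indices
```

A 16-bit height map only needs `bit_depth=16`; the mesh layout stays the same.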
This would have the downside of not allowing for caves and similar features (because there is only one height given for a point). If you needed these features, they might be added with meshes or a 2nd
If you're trying to store more data than fits in memory, you need to keep most of it on disk. The technique is to divide the map into segments and load the nearer segments as necessary. A lot of groups access the map segments via quadtrees, which usually don't need much traversal to get to the "nearby" parts.
Variations include creating lower-resolution versions of larger chunks of map for use in rendering long views, so you're keeping a really low-res version of the Whole Map, a medium-res version of This Valley Here, and a high-res copy of This Grove Of Trees I'm Looking At.
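Segment streaming with distance-based resolution can be sketched as a function that decides which chunks should be resident and at what level of detail. This is a minimal sketch assuming a flat grid of square chunks (rather than the quadtree mentioned above), with hypothetical names and distance bands; the actual loading from disk is left to the caller.

```python
import math

def chunks_to_load(cam_x, cam_y, chunk_size, lod_ranges):
    """Return {(cx, cy): lod} for every chunk within the furthest LOD range.

    lod_ranges lists distances nearest first: chunks whose centre is within
    lod_ranges[0] get LOD 0 (full resolution), within lod_ranges[1] LOD 1, etc.
    """
    reach = int(math.ceil(lod_ranges[-1] / chunk_size))
    home_x, home_y = int(cam_x // chunk_size), int(cam_y // chunk_size)
    wanted = {}
    for cy in range(home_y - reach, home_y + reach + 1):
        for cx in range(home_x - reach, home_x + reach + 1):
            # Distance from the camera to the chunk's centre.
            d = math.hypot((cx + 0.5) * chunk_size - cam_x,
                           (cy + 0.5) * chunk_size - cam_y)
            for lod, limit in enumerate(lod_ranges):
                if d <= limit:
                    wanted[(cx, cy)] = lod
                    break
    return wanted

def update_resident(resident, wanted):
    """Diff loaded chunks ({key: lod}) against the wanted set.

    A chunk whose LOD changed appears in both lists: unload the old copy,
    load the new one."""
    to_unload = [k for k in resident if wanted.get(k) != resident[k]]
    to_load = [k for k in wanted if resident.get(k) != wanted[k]]
    return to_load, to_unload
```

Calling `chunks_to_load` each time the camera crosses a chunk boundary, and only touching the disk for the keys `update_resident` reports, keeps memory bounded to the chunks around the viewer.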
It's complicated stuff, which is why nobody really put the whole thing together until about GTA: San Andreas or Oblivion.

OpenGL GL_SELECT or manual collision detection?

As seen in the image, I draw a set of contours (polygons) as GL_LINE_STRIP.
Now I want to select the curve (polygon) under the mouse to delete it, move it, etc. in 3D.
I am wondering which method to use:
1. Use OpenGL picking and selection (glRenderMode(GL_SELECT)).
2. Use manual collision detection: cast a pick ray and check whether it passes through each polygon.
I strongly recommend against GL_SELECT. This method is very old and absent in new GL versions, and you're likely to get problems with modern graphics cards. Don't expect it to be supported by hardware - probably you'd encounter a software (driver) fallback for this mode on many GPUs, provided it would work at all. Use at your own risk :)
Let me provide you with an alternative.
For solid, big objects, there's an old, good approach of selection by:
enabling and setting the scissor test to a 1x1 window at the cursor position
drawing the scene with no lighting, texturing or multisampling, assigning a unique solid colour to every "important" entity; this colour becomes the object ID for picking
calling glReadPixels and retrieving the colour, which would then serve to identify the picked object
clearing the buffers, resetting the scissor to the normal size and drawing the scene normally.
This gives you a very reliable "per-object" picking method. Also, drawing and clearing only 1 pixel with minimal per-pixel operations won't really hurt your performance, unless you are short on vertex processing power (unlikely, I think) or have a really large number of objects and are likely to get CPU-bound on the number of draw calls (but then again, I believe it's possible to optimize this away to a single draw call if you could pass the colour as per-pixel data).
The colour in RGB is 3 unsigned bytes, but it should be possible to additionally use the alpha channel of the framebuffer for the last byte, so you'd get 4 bytes in total - enough to store any 32-bit pointer to the object as the colour.
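The ID-to-colour mapping is the only non-GL part of this scheme. A minimal sketch of packing an integer object ID into the R, G, B, A bytes you draw with, and unpacking the four bytes that glReadPixels returns for GL_RGBA / GL_UNSIGNED_BYTE. It assumes an 8-bit-per-channel framebuffer that actually has alpha bits, with blending, dithering and multisampling disabled, since any of those can alter the stored colour. It stores an index into an object table rather than a raw pointer, and reserving 0 for the cleared background is a choice made here; the function names are illustrative.

```python
def id_to_rgba(obj_id):
    """Pack a 32-bit object ID into four bytes (R, G, B, A), most significant first."""
    if not 0 < obj_id < 2 ** 32:
        raise ValueError("ID must fit in 32 bits; 0 is reserved for background")
    return ((obj_id >> 24) & 0xFF, (obj_id >> 16) & 0xFF,
            (obj_id >> 8) & 0xFF, obj_id & 0xFF)

def rgba_to_id(pixel):
    """Unpack the four bytes read back with glReadPixels into the object ID."""
    r, g, b, a = pixel
    return (r << 24) | (g << 16) | (b << 8) | a

def rgba_to_float(rgba):
    """Normalised colour for glColor4f-style calls (byte / 255)."""
    return tuple(c / 255.0 for c in rgba)
```

Clearing to (0, 0, 0, 0) before the picking pass means a read-back ID of 0 simply means "nothing under the cursor".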
Alternatively, you can create a dedicated framebuffer object with a specific pixel format (like GL_R32UI, or even GL_RG32UI if you need 64 bits) for that.
The above is a nice and quick alternative (both in terms of reliability and implementation time) to the strict geometric approach.
I found that on new GPUs, the GL_SELECT mode is extremely slow. I played with a few different ways of fixing the problem.
The first was to do a CPU collision test, which worked, but wasn't as fast as I would have liked. It definitely slows down when you are casting rays into the screen (using gluUnProject) and then trying to find which object the mouse is colliding with. The only way I got satisfactory speeds was to use an octree to reduce the number of collision tests and then do a bounding box collision test; however, this resulted in a method that was not pixel-perfect.
The method I settled on was to first find all the objects under the mouse (using gluUnProject and bounding box collision tests), which is usually very fast. I then rendered each of the objects that had potentially collided with the mouse into the back buffer in a different color. I then used glReadPixels to get the color under the mouse and mapped that back to the object. glReadPixels is a slow call, since it has to read from the framebuffer. However, it is done only once per frame, so it ends up taking a negligible amount of time. You can speed it up by reading into a PBO (pixel buffer object) if you'd like.
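The bounding-box collision test mentioned here is commonly done with the ray versus axis-aligned-box "slab" test. A minimal sketch, assuming you have already built the pick ray in world space (for example from two gluUnProject calls at window depths 0 and 1, giving an origin and a direction) and that each object's box is stored as a (min corner, max corner) pair; the function names are illustrative. Sorting hits by entry distance gives the candidate list you would then render in ID colours.

```python
def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: return the entry distance t >= 0 along the ray, or None on a miss."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        if t1 > t2:
            t1, t2 = t2, t1
        t_near, t_far = max(t_near, t1), min(t_far, t2)
        if t_near > t_far:
            return None  # the slab intervals don't overlap, or the box is behind us
    return t_near

def pick_candidates(origin, direction, boxes):
    """Indices of boxes hit by the ray, nearest first."""
    hits = []
    for i, (lo, hi) in enumerate(boxes):
        t = ray_hits_aabb(origin, direction, lo, hi)
        if t is not None:
            hits.append((t, i))
    return [i for _, i in sorted(hits)]
```

Because the test only needs a few divisions and comparisons per box, it is cheap enough to run over every object, or over the objects in the octree cells the ray passes through.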
Giawa
umanga, can't see how to reply inline... maybe I should sign up :)
First of all, I must apologize for giving you the wrong algorithm; I gave you the back-face culling one. But the one you need is very similar, which is why I got confused... d'oh.
Get the camera position to mouse vector as said before.
For each contour, loop through all the coords in pairs (0-1, 1-2, 2-3, ... n-0) and make a vector out of each pair as before, i.e. walk the contour.
Now take the cross product of each contour edge with the mouse vector (instead of between pairs of edges, as I said before), do that for all the pairs, and vector-add them all up.
At the end, find the magnitude of the resulting vector. If the result is zero (taking rounding errors into account), then you're outside the shape, regardless of facing. If you're interested in facing, then instead of the magnitude you can take the dot product with the mouse vector to find the facing and test the sign (+/-).
It works because the algorithm finds the distance from the vector line to each point in turn. As you sum them up, if you're outside, they all cancel out because the contour is closed; if you're inside, they all add up. It's actually Gauss's law of electromagnetic fields in physics...
See: http://en.wikipedia.org/wiki/Gauss%27s_law and note "the right-hand side of the equation is the total charge enclosed by S divided by the electric constant", noting the word "enclosed", i.e. zero means not enclosed.
You can still do that optimization with the bounding boxes for speed.
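One caution: if the cross product is taken between each edge and the same fixed mouse direction, the sum is always zero, because the edge vectors of a closed contour themselves sum to zero. The "contributions cancel outside, add up inside" intuition is made precise by the standard winding-number test, which sums the signed angle each edge subtends at the mouse point. This is a swapped-in, well-known technique rather than the exact computation described above. A minimal sketch, assuming the contour vertices and the mouse have already been projected to 2D screen coordinates (for example with gluProject), so the pick ray reduces to a point; the function names are illustrative.

```python
import math

def winding_number(point, contour):
    """Winding number of a closed 2D contour around a point (0 means outside)."""
    px, py = point
    total = 0.0
    n = len(contour)
    for i in range(n):
        # Walk the edges 0-1, 1-2, ..., (n-1)-0, relative to the point.
        ax, ay = contour[i][0] - px, contour[i][1] - py
        bx, by = contour[(i + 1) % n][0] - px, contour[(i + 1) % n][1] - py
        # Signed angle from a to b: atan2(cross, dot).
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return round(total / (2.0 * math.pi))

def contours_under_mouse(mouse, contours):
    """Indices of all contours that enclose the mouse position."""
    return [i for i, c in enumerate(contours) if winding_number(mouse, c) != 0]
```

The result is about +1 or -1 inside (depending on the contour's orientation, so facing doesn't matter) and 0 outside, and it also handles non-convex contours without any fill or triangulation.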
In the past I've used GL_SELECT to determine which object(s) contributed the pixel(s) of interest and then used computational geometry to get an accurate intersection with the object(s) if required.
Do you expect to select by clicking the contour (on the edge) or the interior of the polygon? Your second approach sounds like you want clicks in the interior to select the tightest containing polygon. I don't think that GL_SELECT after rendering GL_LINE_STRIP is going to make the interior responsive to clicks.
If this was a true contour plot (from the image I don't think it is, edges appear to intersect) then a much simpler algorithm would be available.
You can't use select if you stay with lines, because you would have to click on the rendered line pixels, not on the space inside the lines bounding them, which I read as what you want to do.
You can use Kos's answer, but in order to render that space you need to solid-fill it, which would involve converting all of your contours to convex shapes, which is painful. So unless you did that, I think it would work sometimes and give the wrong answer in other cases.
What you need to do is use the CPU. You have the view extents from the viewport and the perspective matrix. With the mouse coord, generate the view to mouse pointer vector. You also have all the coords of the contours.
Take the first coord of the first contour and make a vector to the second coord. Take the 3rd coord and make a vector from 2 to 3, repeat all the way around your contour, and finally make the last one from coord n back to 0 again. For each pair of vectors in sequence, find the cross product and sum up all the results. When you have that final summation vector, keep hold of it and take its dot product with the mouse pointer direction vector. If it's +ve then the mouse is inside the contour, if it's -ve then it's not, and if it's 0 then I guess the plane of the contour and the mouse direction are parallel.
Do that for each contour and you will know which of them are spiked by your mouse. It's up to you which one you want to pick from that set. Highest Z?
It sounds like a lot of work, but it's not too bad and will give the right answer. You might also like to keep bounding boxes of all your contours; then you can early-out the ones off the mouse vector by doing the same math as for the full vector but only on the 4 sides, and if the mouse isn't inside the box, the contour can't be either.
The first is easy to implement and widely used.