As a part of my masters project I proposed to build a virtual trial room application intended for retail clothing stores. Currently its meant to be used directly in store though it may be extended for online stores as well.
This application will show customers how a selected apparel would look on them by showing it on their 3D replica on screen.
It involves 3 steps
Sizing up the customer
Building customer replica 3D humanoid model
Apply simulated cloth on the model
My question is about the feasibility of the project and choice of framework.
Can this be achieved in real time using a normal Desktop computer? If yes what would be appropriate framework ( hardware, software, programming language etc ) for this purpose?
On the work I have done till now, I was planning to achieve above steps in following ways
for step 1 : option a) Two cameras for front and side views or
option b) 1 Kinect or 2 Kinect for complete 3D data
for step 2: either use makehuman (http://www.makehuman.org/) code to build a customised 3D model using above data or build everything from scratch, unsure about the framework.
for step 3: Just need few cloth samples, so thought of building simulated clothes in blender.
Currently I have just the vague idea about different pieces but I am not sure of how to develop complete application.
Theoretically this can be achieved in real time. Many usefull algorithms for video tracking, stereo vision and 3d recostruction are available in OpenCV library. But it's very difficult to build robust solution. For example, you'll probably need to track human body which moves frame to frame and perform pose estimation (OpenCV contains POSIT algorithm), however it's not trivial to eliminate noise in resulting objects coordinates. For inspiration see a nice work on video tracking.
You might want to choose another way, simplify some things, avoid complicated stuff do things less dynamicaly and estimate only clothes size and approximate human location. I this case most likely you will create something usefull and interesting.
I've lost link to one online fiting room where hands and body detection implemented. Using Kinnect solves many problems. But If for some reason you won't use it then AR(augmented reality) helps you (yet another fitting room)
Related
So a while back about a year ago I was interested in building my own barebones augmented reality (AR) library. My goal was to be able to take a video of something (anything really) and then be able to place augmentations (3D objects that weren't really there) in the video. So for example I might take a video of my living room and then, through this AR library/tool, I'd be able to add in maybe a 3D avatar of a monster sitting on top of my coffee table. So, knowing absolutely nothing about the subject or computer vision in general, I had settled for the following strategy:
Use 3D reconstruction tools/techniques (Structure from Motion, or SfM) to build up a 3D model of everything in the video (e.g. a 3D model of my living room)
Analyze that 3D model (really a 3D pointcloud to be exact) for flat surfaces
Add my own logic to determine what objects (3D models such as Blender files, etc.) to place in what area of the video's 3D model (e.g. monster standing on top of the coffee table)
The hardest part: inferring the camera orientation in each frame of the video, and then figuring out how to orient the augmentation (e.g. monster) correctly based on what the camera is pointed at, and then "merging" the augmentation's 3D model into the main video 3D model. This means that as the camera moves around my living room, the monster appears to remain standing in the same place on my coffee table. I never figured out a good solution for this but figured if I could get to this 4th step that I'd find some solution.
After several difficult weeks (computer vision is hard!) I got the following pipeline of tools to work with mixed success:
I was able to feed sample frames of a video (e.g. a video taken while walking around my living room) into OpenMVG and produce a sparse pointcloud PLY file/model of it
Then I was able to feed that PLY file into MVE and produce a dense pointcloud (again PLY file) of it
Then I fed the dense pointcloud and the original frames into mvs-texturing to produce a textured 3D model of my video
About 30% of the time, this pipeline worked amazing! Here's the model of the front of my house. You can see my 3D front yard, my son's 3D playhouse and even kinda/sorta make out windows and doors!
About 70% of the time the pipelined failed with indecipherable errors, or produced something that looked like an abstract painting. Additionally, even with automated scripting involved, it took the tooling about 30 mins to produce the final 3D textured model...so pretty slow.
Well, looks like Google ARCode and Apple ARKit beat me to the punch! These frameworks can take live video feeds from your smartphone and accomplish exactly what I had been trying to accomplish about a year ago: real-time 3D AR. Very, very similar (but much more advanced & interactive) as Pokemon Go. Take a video of your living room, and voila, an animated monster is sitting on your coffee table, and you can interact with it. Very, very, very cool stuff.
My question
I'm jealous! Of course, Google and Apple can hire some best-in-show CV/3D recon folks, but I'm still jealous!!! I'm curious if there are any hardcore AR/CV/3D recon gurus out there that either have insider knowledge or just know the AR landscape so well that they can speak to what kind of tooling/pipeline/technology is going on behind the scenes here with ARCode or ARKit. Because I practically broke my brain trying to figure this out on my own, and I failed spectacularly.
Was my strategy (explained above) ballpark-accurate, or way off base? (Again: 3D recon of video -> surface analysis -> frame-by-frame camera analysis, model merge)?
What kind of tooling/libraries/techniques are at play here?
How do they accomplish this in real-time whereas, if my 3D recon even worked, it took 30+ mins to be processed & generated?
Thanks in advance!
I understand your jealousy and as a Computer Vision engineer I have it experienced many times before :-).
The key for AR on mobile devices is the fusion of computer vision and inertial tracking (the phone's gyroscope).
Quote from Apple's ARKit docu:
ARKit uses a technique called visual-inertial odometry. This process
combines information from the iOS device’s motion sensing hardware
with computer vision analysis of the scene visible to the device’s
camera.
Quote from Google's ARCore docu:
The visual information is combined with inertial measurements from the
device's IMU to estimate the pose (position and orientation) of the
camera relative to the world over time.
The problem with this approach is that you have to know every single detail about your camera and IMU sensor. They have to be calibrated and synced together. No wonder it is easier for Apple than for the common developer. And this is also the reason why Google only supports a couple of phones for the ARCore preview.
I'm really interested in the Google Street View mobile application, which integrates a method to create a fully functional spherical panorama using only your smartphone camera. (Here's the procedure for anyone interested: https://www.youtube.com/watch?v=NPs3eIiWRaw)
What strikes me the most is that it always manages to create the full sphere, even when stitching a feature-less near monochrome blue sky or ceiling ; which gets me to thinking that they're not using feature based matching.
Is it possible to get a decent quality full spherical mosaic without using feature based matching and only using sensor data? Are smartphone sensors precise enough? What library would be usable to do this? OpenCV? Something else?
Thanks!
The features are needed for registration. In the app the clever UI makes sure they already know where each photo is relative to the sphere so in the extreme case all the have to do is reproject/warp and blend. No additional geometry processing needed.
I would assume that they do try to do some small corrections to improve the registration, but even if these fail, you can fallback onto the sensor based ones acquired at capture time.
This is a case where a clever UI makes the vision problem significantly easier.
Are there any open source code which will take a video taken indoors (from a smart phone for example of a home or office buildings, hallways) and superimpose that on a 2D picture showing the path traveled? This can be a handr drawn picture or a photo of a floor layout.
First I thought of doing this using the accelerometer and compass sensors but thought that perhaps one can get better accuracy with the visual odometer approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no gps) for superimposing that data on the path traveled (this is the real application of this project and we know how to do this part). The post processing of the video can be done later on a stand alone computer so speed and cpu power is not a issue.
Challenges -
The user will simply hand carry the smart phone so the video taker is moving (walking) and not fixed
limit the video rate to keep the file size small (5 frames/sec? is that ok?). Typically need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
any help or guidance is appreciated Thanks
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision based navigation using just a cellphone camera is very difficult. Most of the literature with great results show ~1% distance traveled as state-of-the-art but is usually using stereo cameras. Stereo helps a great deal, particularly in indoor environments for coping with scale drift. I've worked on a system which achieves 0.5% distance traveled for stereo but only roughly 5% distance traveled for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
Multi-sensor is way to go. Though other sensors are worse than vision by themselves.
I've heard some good work with accelerometers mounted on the foot to do ZUPT (zero velocity updates) when the foot is briefly motionless on the ground while taking a step in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
Compass is interesting but will be distracted by the ton of metal within an office building. Translating few feet around a large metal cabinet might cause 50+ degrees of directional jump.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do wifi triangulation? Does it need to be an initial exploration? If you can go through the environment before hand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization" the problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The amount of images/second you need depends on the speed of the camera. More is better, but my guess is that 5/second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
I'm doing some work on pathfinding.
So far I have tested my code on scenes composed of 2D cells.
I've also created a simple 3d scene to test my work on as well.
I'd like to test my work on some 3d scenes .. but it is time consuming to create them.
Does anyone know of any scene datasets that I could use to test my pathfinding algorithms on?
To get a better answer, you really need to specify the dimensionality of the configuration spaces that you want to consider. You aren't going to be tackling protein folding and docking problems (200+ degrees of freedom) with discrete graph searches. Even a relatively small planning problems (in terms of academic problems), of about 6 degrees of freedom can quickly become intractable.
Most of the best examples for planning tend to be published in research papers first, and then make their way into more general use. Some of the best work tends to be published in IEEE journals, or at the Intelligent Robots and Systems (IROS) and International Conference on Robotics and Automation (ICRA) conferences. It may also be worth using the bibliography of a well known reference in the field, such as "Motion Planning" by LaValle as a starting point for further research (available in bibtex here)
Mark Overmars work in the computational geometry and planning communities have made some of the problems considered in his publications very recognizable. It is worth checking if any his current grad students and collaborators have any data sets available at the moment.
If you're happy to still be doing some work in 2d, and to manually convert an image to geometric data, Kris Beevers website has a number of worked examples for a range of planners in 2d work spaces.
The Motion Strategy Library contains a number of classical motion planning problems for use in 2d and 3d workspaces, with varying dimensionality of configuration space depending on the problem. It includes:
L sections into a birdcage
trailers
multiple trailers
mazes
kinematic chains
non-holonomic cars
A more recent implementation of an academic motion planning library is The Open Motion Planning Library developed by the Kavraki lab. Because of licensing, I haven't checked personally, but I assume that they ship some examples and tests with their project.
A number of significantly more complex kinodynamic motion planning examples are now publicly available as part of the OpenRAVE project. Their gallery is eye opening.
when I need big 3D datasets, I usually use attractors or other dynamical series. You simply have to iterate as many time as you want and it will generate a nice set of 3D data.
Try this 'Peter de Jong Attractor':
Xn+1 = sin(a Yn) - cos(b Xn)
Yn+1 = sin(c Xn) - cos(d Yn)
Where (for example): a = 1.4, b = -2.3, c = 2.4, d = -2.1
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I'm looking to write a bit of software that will end up drawing a human frame (which can be configured with various parameters), and the plan is to have some sort of clothing placed on the dummy.
I've looked at Blender, and OpenGL libraries as well as other rendering and physics engines, I'm not looking for you to tell me how to do this, but mainly I'm wondering what libraries are out there to do this sort of thing?
So there'll be a pattern for the clothing in 2d, then the system, (at least in theory) will be able to translate that in to a 3d representation of a shirt for example? And then place that on the human frame. I know there's a lot of work I need to do for this, however in terms of rendering the clothing on to the frame, and accounting for collisions and how it drops around the frame etc, I've been googling, and have found a few bits, but was wondering if there were C++ libraries out there that would do that.
I'm developing using Visual C++ 2010, and the target environment is a Windows box.
Either that, or i'm going to need to take some physics lessons.
Unfortunately, developing a system like the one your talking about would be insanely difficult. On the plus side, there are alot of easy to use technologies that will help you attain your goal hopefully.
Generally, the way that this type of thing works is as follows: You make some 3d asset in a modeling program such as Blender, 3ds max, Maya, Softimage, etc, and then use this in your program/game. You can think of these programs as just spitting out a bunch of 3d coordinates, which your program, with the help of OpenGL or DirectX can load into memory and render.
Modeling and loading assets is of course the alternative to developing algorithms to generate points. This is what it seems like your trying to accomplish.
The bad news is that clothing is really really complicated. A big part of this is due to the fact that most of it requires simulating cloth dynamics. Another part of the problem is that even if you had a 2d pattern, how would you the manner in which the clothing would adhere to your human model? Is it skin tight? Loose? How will you parameterize that? The placement of the actual clothing on the body is a chore in and of itself as anyone with experience in 3d modeling might tell you.
Nevertheless, some of the industry's brightest professionals are looking for both better ways to simulate cloth, and better ways to automate asset creation.
In summation, the easy answer is that what your trying to do, as interesting and noble as it may be, is going to be extremely difficult and may not have the result your looking for.
As for where you can go for more answers:
If your still intersted in finding a way to automate clothing attatchment to models, I would start by looking around academic websites. Look for any computer science departments which have computer graphics research programs. You will find alot of interesting things there.
For more academic type resources look at Game Programming Gems, GPU Programming Gems, and Graphics Programming Gems book series. They feature many good articles that tackle difficult graphics problems such as these.
Another thing you might do is check out blender a little more. There is an interesting project called MakeHuman
http://makehuman.blogspot.com/
That automates the process of developing human models in blender.
There are a couple of tutorials for putting clothing on the models, take a look at this one:
http://www.davidjarvis.ca/blender/tutorial-05.shtml
For more tutorials on clothing and cloth simulation in blender, you can always check out
www.blendercookie.com
cg.tutsplus.com
I hope some of this has been useful.
From what I remember, cloth is simulated as a mesh of springs which suggests physics libraries for the simulation along with an understanding of the physics of springs/cloth. I've not heard of a physics library tailored to cloth simulation though, but no doubt someone on this site will know of one.
It's answer about cloth simulation itself. (maybe it is not you're intersing in)
If you want to model cloth simulation by some vendors middleware - you can try to use
Havok(it's commercial). It seems to me, that is supports any collision objects, represented by a triangle mesh.
PhysX (it's free), but when you will try to use it there is a lot of constraints on it).
If you want to model cloth physics by you hands I can advise to you this steps:
Refresh base knowledge about physics (Interia, Energy, Newton's law.)
Good start point fo cloth simulation and also physics simulation is that book
http://www.amazon.com/Game-Physics-Pearls-Gino-Bergen/dp/1568814747
Read articles from Siggraph about clothes.
Think about which collision objects do you need
Think about what forces do you need.
Split this challenge to
Broad Phase / Integration / Collision Detection / Collision Response / Constrain Solver
I have developed cloth physics simulation in C++, OpenCL.
It takes me about 4 months to develop, and about 2 months to Debug stage5.
But it was very hot-time in my life, the job has consumed huge amount of time.
except the part that you want to change the dummy while application is running what you want is more or less the example of game engines like Esenthel Engine. the whole idea is to load a mesh for the body and then put a "cloth" (cloth is already defined in most game engines as physical type) on it. but when it come to runtime changes in human frame it becomes a little more tricky since you have to know how you are going to affect the parrameters which is not easy of organic shapes.
Free Game engine to use these days is Unity 3d ... as well it all depends in the detail and as well Maya and 3ds Max are the best of the modeling programs.