Benchmarking on the 2006 Middlebury Stereo Dataset - c++

My problem is the following:
I have developed a superpixel segmentation algorithm and I want to test how the superpixels behave in stereo imagery. For this I use the Middlebury Stereo Dataset 2006 (http://vision.middlebury.edu/stereo/data/scenes2006/). Right now I load one pair of images, segment them, and then compute my metrics (basically a fancy IoU) on them. This works properly, and now I want to extend it so that it uses not just one pair of stereo images but the whole dataset.
Programming language is C++.
Here lies the problem:
How would I efficiently load all the images, given that each pair lives in its own folder (see the folder structure below)?
My idea would be to have a list of paths to the folders, import all images from one folder, compute everything, and then load the next folder.
How would I do that?
Each stereo pair is structured like this:
Folder named after the item (e.g. cat, wood, baby, ...), containing:
  disp1.png
  disp5.png
  view1.png
  view5.png
Right now, at the start of my program, I load the images like this:
String pathImageLeft = "/Users/Stereo/Left/view1.png";
String pathImageRight = "/Users/Stereo/Right/view5.png";
String pathDisparityLeft = "/Users/Stereo/DisparityMap/disp1.png";
String pathDisparityRight = "/Users/Stereo/DisparityMap/disp5.png";
Thanks for your ideas.

If I understood the OP's question correctly, it can be reduced to:
How can I access directories?
Since C++17, there is a Filesystem library that provides access to directories in a portable way.
Namely, it provides a std::filesystem::directory_entry which
Represents a directory entry. The object stores a path as a member and may also store additional file attributes (hard link count, status, symlink status, file size, and last write time) during directory iteration.
and a std::filesystem::directory_iterator
that iterates over the directory_entry elements of a directory (but does not visit the subdirectories). The iteration order is unspecified, except that each directory entry is visited only once. The special pathnames dot and dot-dot are skipped.
The linked reference pages provide sample code.
Before C++17, you either have to use boost::filesystem (which is actually an ancestor of std::filesystem) or the OS-specific functions, which are usually of limited portability.
Concerning the latter, there are already existing questions on SO:
How to list files in a directory using the Windows API?
How can I get the list of files in a directory using C or C++?
How do I get a list of files in a directory in C++?
to list only a few.

Related

In a text file with linked structures, how do I quickly follow those links in C++, without running through the file multiple times?

I'm about to start a project that requires me to load specific information from an IFC file into classes or structs. I'm using C++, but it's been some years since I last used it so I'm a bit rusty.
The IFC file has a linked structure, where an element in a line might refer to a different line, which in turn links to another. I've included a short example where the initial "#xxx" is the line index and any other "#xxx" in the line is a link to a different line.
#170=IFCAXIS2PLACEMENT3D(#168,$,$);
#171=IFCLOCALPLACEMENT(#32,#170);
#172=IFCBUILDINGSTOREY("GlobalId", #41, "Name", "Description", "ObjectType", #171"...);
In this example I would need to search for "IFCBUILDINGSTOREY", and then follow the links backwards through the file, jumping around and storing the important bits of information I need.
The main problem is that my test file has 273480 lines (18MB), and links can jump from one end of the file to the other - and I'll likely have to handle larger files than this.
In this file I need to populate about 500 objects, so that's a lot of jumping around the file to grab the relevant information.
What's a performance-friendly method of jumping around a file like that?
(Disclosure - I help out with a .NET IFC implementation)
I'd question what it is you're doing that means you can't use one of the many existing implementations of the IFC schema. Parsing the IFC models is generally the simple part of the problem. If you want to visualise the geometry or take measurements from the geometry primitives, there's a whole other level of complexity... E.g. just one particular geometry type out of dozens: https://standards.buildingsmart.org/IFC/DEV/IFC4_3/RC2/HTML/link/ifcadvancedbrep.htm
If you go to BuildingSmart's software implementations list and search for 'development' you'll get a good list of them for various technologies/languages.
If you're sure you want to implement it yourself, the typical approach is to build some kind of dictionary/map holding the entities by their key. Naively, you can run an initial pass with a lexer and build the map in memory. But as IFC models can be over a GB, you may need a more sophisticated approach where you build some kind of persisted index - and maybe even put it into a database with indexes (perhaps some flavour of document database). This becomes more important if you want to support 'random access' to the data over multiple sessions.

Not sure what I'm looking for (matrix, db-like structure) to organize files by tags

I was organizing my music files when I asked myself why neither Windows nor Linux offers a way to organize a folder by custom tags in a database-like manner rather than hierarchically.
The problem I wanted to solve is the following:
I have these music files:
A titled "tempest" from Beethoven, classical music in a piano only version.
B titled "whatever" from Mozart, classical music orchestral
D titled "one winged angel" from Uematsu, classical style, game ost, orchestral
C titled "one winged angel" same as before, violin only, cover from Taylor Davis.
And whatever "main" information I use for grouping makes listing files by any other category impossible.
Hence I wished to store the files in a hidden folder under simple increasing numbers (number.format), and to have a program in which I can add files, add categories, search by tags, and end up with a list of the files I want. E.g. today I want to listen to all piano-only pieces regardless of composer or period.
I started building a structure of vectors containing vectors (i.e. a matrix), but indexing rows and columns by string got complicated when I wanted to remove a column.
And searching files by tag would require each tag to be an object that knows all the files using it, which starts to look more like a 3D matrix.
I thought it would be better to treat this as a database, so I started with SQLite, but ran into the problem of being unable to remove columns (I know I can create a copy, etc., but I wanted to avoid messy workarounds).
Also, an SQL-like database wouldn't let me have an area dedicated to a list of arbitrary tags for each file without a definite category.
Is there any existing library that, rather than working as an SQL database, offers something like a search/insert-optimized matrix of strings? I don't think I'm the first one to think about this; someone must have done something similar.
This is very similar to what I want to achieve (strictly speaking about functionality), but rather than having only a bunch of random tags, I'd like to have some categories AND a set of random tags.
The problem with random tags only is you can't use the same word when it refers to different things. For example if the title of a piece is A and there's a film named A with a piece titled B, filtering A in the mess of tags would give both, while with categories i could filter pieces titled A. But the random mess of additional tags without category is useful too, for information you don't want to fill in most of the files and that would take pointless space in a standard database.

Linked directory not found

I have following scenario:
The main software I wrote uses a database created by a simulator. This database is about 10 GB in size at the moment, so I want to keep only one copy of that data per system.
Assuming I have following projects:
Main Software using the data, located at /SimData
DLL using the data for debugging, searching for data at /SimData
Debugging tool to parse the image database, searching for the data at /SimData
I do not want each of these programs to have its own copy of SimData (not only to save space, but also to ensure that all programs always use up-to-date simulation data).
For the DLL and the debugging utility, I created a link named SimData pointing to MainSoftware/SimData, but when opening a file with "SimData\MyFile.data" they cannot find it; only MainSoftware, which has the ACTUAL SimData folder, can find it.
How can I use the MainSoftware/SimData folder without setting absolute paths?
This is on Windows 7 x64
I agree with Peter about adding the DB location as a configurable parameter. A common place to store that is in the registry.
However, if you want to create links that will be recognized by your software, try hard links; fsutil should do the trick, as described here. (Note that NTFS hard links work only for files; for a directory such as SimData, a directory junction created with mklink /J is the closest equivalent.)
You need a way to configure the database location. You could use an INI or other configuration file, a registry setting, a command-line argument, or an environment variable. Or you could write your program to search a directory hierarchy... for example, if the various modules are usually siblings of each other in your directory tree, you could search for SimData/MyFile.data, ../SimData/MyFile.data, ../../MainSoftware/SimData/MyFile.data, and use the first one found.
Which answer is the "right one" depends on your situation.

Include static data/text file

I have a text file (>50k lines) of ascii numbers, with string identifiers, that can be thought of as a collection of data vectors. Based on user input, the application only needs one of these data vectors at runtime.
As far as I can see, I have 3 options for getting the information from this text file:
Keep it as a text file, extract the required vector at run-time. I believe the downside is that you can't have a relative path in the code, so the user would have to point to the file's correct location (?). Or alternatively, get the configure script to inject the absolute path as a macro.
Convert it to a static unsigned char using xxd (as explained here) and then include the resulting file. Downside is that a 5MB file turns into a 25MB include file. Am I correct in thinking that this 25MB is loaded into memory for the duration of the runtime?
Convert it to an object and link using objcopy as explained here. This seems to keep the file size about the same -- are there other trade-offs?
Is there a standard/recommended method for doing this? I can use C or C++ if that makes a difference.
Thanks.
(Running on linux with gcc)
I would go with number 1 and pass the file path into the program as an argument. There's nothing wrong with doing that, and it is simple and straightforward.
You should have a look at the answers here:
Directory of running program
The top-voted answer gives you a clue on how to handle your data file. But instead of the home folder, I would suggest saving it under /usr/share, as explained in the link.
I'd prefer to use zlib (both ways are possible: a side file, or an include with compressed data).

Is there any method to know whether a directory contain a sub directory?

I am working in C++.
Is there any method to know whether a directory contain a sub directory?
CFileFind seems to have to search through all the files.
That is time-consuming if the only subdirectory is at the end of the list and there are lots of files.
For example: directory A contains 99995 files and one subdirectory at the end of the FindNextFile list. Do I have to try 99995 times before I can say: yes, it contains a subdirectory?
Raymond Chen from Microsoft has written a post that probably applies here: Computing the size of a directory is more than just adding file sizes. In essence, he explains that information like the size of a dir cannot be stored in the dir's entry, because different users might have different permissions, possibly making some of the files invisible to them. Therefore, the only way to get the size the user should see is to calculate it upon request from the user.
In your case, the answer probably stems from the same reasoning. The list of directories available to your app can only be determined when your app asks for it, as its view of the root directory might be different from that of another app running with different credentials. Why Windows stores directories along with files I don't know, but that's a given.
Since Win32 is as close as you'll get to the file system in user mode, I'd avoid any higher-level solutions such as .NET, as they might only simplify the interface. A driver might work faster, but that is out of the scope of my knowledge.
If you are using the .NET framework, you could use Directory.GetDirectories and check whether the length of the returned array is 0. I do not know whether this will be any faster.
If you have control over the directories, you could apply a naming convention so that directories that have subdirectories are named one way and directories without subdirectories are named another.
You can try using the boost filesystem library.
A class named directory_iterator (declared in boost/filesystem/operations.hpp), together with the library's free functions, can be used for listing files, checking whether an entry is a subdirectory (is_directory, which I guess is what you are looking for), etc.
Refer to the Boost.Filesystem documentation for more information.
It seems you are using MFC (I just noticed the CFileFind); I didn't see that earlier.
Sorry, I don't have much more information. You may have to use FindFirstFile/FindNextFile.
Whether this can be done very fast is entirely platform-dependent.
On Win32 you use FindFirstFile/FindNextFile or wrappers on top of those like MFC CFileFind and they list items in some order that can't be forced to list directories first.