Moving an enum out to a configuration file - c++

To enforce correct drawing order I use a simple enumeration:
enum class DisplayLayer
{
    background,
    // below, a lot of different layers dedicated to particular item types
    itemTypeA,
    itemTypeB,
    // ...
    itemTypeZ,
    effects
};
Each drawing object knows its own DisplayLayer (sometimes more than one), and before drawing, all objects are collected into a std::vector and sorted by DisplayLayer and additional attributes.
This simple mechanism has worked properly, but occasionally it requires updates: adding new layers, removing old ones, and reordering them. When the layer count was moderate it was easy to check all the relative drawing, but now there are 50+ layers, and after each change the drawing must be checked carefully. If QA misses some cases, it leads to drawing issues, and there is no way to fix them with patches, only by rebuilding the executable.
Now I am thinking of moving the DisplayLayer enum out to a config file, such as .xml or .json. That would allow making patches without rebuilding the binary, plus additional flexibility at debug time.
The config could simply contain key-value pairs.
The question is which data structures I should use for the best solution and performance.
For example, is it a good idea to keep the layer IDs as a set of string constants, like this:
const std::string DISP_LAYER_BACKGROUND = "background";
const std::string DISP_LAYER_ITEM_TYPE_A = "itemTypeA";
const std::string DISP_LAYER_ITEM_TYPE_B = "itemTypeB";
...
And in the config parser consistently use them to extract sort values from the configuration file and fill a std::map<std::string, int>, where the key is the layer ID and the value is the corresponding layer's sort value.
In this case each drawing object would hold a std::string data field instead of a DisplayLayer, but before drawing this would require a lot of lookups into the map to convert a layerName into its layer sort order.
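Something like the following minimal sketch is what I have in mind, assuming a plain key=value text config; loadLayerTable and resolveLayer are just placeholder names. The point is to resolve each layerName to its int sort value once, at load/reload time, and cache it in the drawing object, so the per-frame sort never touches the map:

#include <fstream>
#include <map>
#include <string>

// Hypothetical loader: reads lines like "background=0" into a
// layerName -> sortOrder map.
std::map<std::string, int> loadLayerTable(const std::string& path)
{
    std::map<std::string, int> table;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue; // skip malformed lines
        table[line.substr(0, eq)] = std::stoi(line.substr(eq + 1));
    }
    return table;
}

// Resolve a layer name once and cache the returned int in the
// drawing object; sorting before drawing then stays as cheap as
// it was with the enum.
int resolveLayer(const std::map<std::string, int>& table, const std::string& name)
{
    const auto it = table.find(name);
    return it != table.end() ? it->second : 0; // assumed fallback: background
}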
Thanks in advance for any ideas on how to move the enumeration from code to a config file and solve my particular problem.

Related

Dealing with large data binary files

I am working with large binary files (approx. 2 GB each) that contain raw data. These files have a well-defined structure, where each file is an array of events, and each event is an array of data banks. Each event and data bank has a structure (header, data type, etc.).
From these files, all I have to do is extract whatever data I might need, and then I just analyze and play with the data. I might not need all of the data; sometimes I just extract XType data, other times just YType, etc.
I don't want to shoot myself in the foot, so I am asking for guidance/best practice on how to deal with this. I can think of 2 possibilities:
Option 1
Define a DataBank class; this will contain the actual data (std::vector<T>) and whatever structure it has.
Define an Event class; this has a std::vector<DataBank> plus whatever structure.
Define a MyFile class; this is a std::vector<Event> plus whatever structure.
The constructor of MyFile will take a std::string (name of the file), and will do all the heavy lifting of reading the binary file into the classes above.
Then, whatever I need from the binary file will just be a method of the MyFile class; I can loop through Events, I can loop through DataBanks, everything I could need is already in this "unpacked" object.
The workflow here would be like:
int main() {
    MyFile data_file("data.bin");
    std::vector<XData> my_data = data_file.getXData();
    // Play with my_data, and never again use the data_file object
    // ...
    return 0;
}
Option 2
Write functions that take a std::string as an argument and extract whatever I need from the file, e.g. std::vector<XData> getXData(std::string), int getNumEvents(std::string), etc.
The workflow here would be like:
int main() {
    std::vector<XData> my_data = getXData("data.bin");
    // Play with my_data, and I didn't create a massive object
    // ...
    return 0;
}
Pros and Cons that I see
Option 1 seems like the cleaner option; I would only "unpack" the binary file once, in the MyFile constructor. But I will have created a huge object that contains all the data from a 2 GB file, much of which I will never use. If I need to analyze 20 files (each 2 GB), will I need 40 GB of RAM? I don't understand how these are handled; will this affect performance?
Option 2 seems faster; I will just extract whatever data I need, and that's it. I won't "unpack" the entire binary file just to later extract the data I care about. The problem is that I will have to deal with the binary file structure in every function; if this ever changes, that will be a pain. On the other hand, I will only create objects for the data I will actually play with.
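To make Option 2 concrete, here is a minimal sketch of what I imagine getXData would look like; BankHeader, XData and decodeXData are placeholders for whatever the real file spec defines. The idea is to stream through the file, decode only the banks of the wanted type, and seek past the rest, so memory use is proportional to the extracted data rather than to the 2 GB file:

#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical bank header; the real layout comes from the file spec.
#pragma pack(push, 1)
struct BankHeader {
    std::uint32_t dataType;    // e.g. X_TYPE, Y_TYPE, ...
    std::uint32_t payloadSize; // bytes of payload following the header
};
#pragma pack(pop)

constexpr std::uint32_t X_TYPE = 1; // assumed type tag

struct XData { double value; };     // hypothetical payload

// Hypothetical decoder from raw bank bytes to XData.
XData decodeXData(const std::vector<char>& raw)
{
    XData x{};
    if (raw.size() >= sizeof x.value)
        std::memcpy(&x.value, raw.data(), sizeof x.value);
    return x;
}

std::vector<XData> getXData(const std::string& path)
{
    std::vector<XData> result;
    std::ifstream file(path, std::ios::binary);
    BankHeader header;
    while (file.read(reinterpret_cast<char*>(&header), sizeof header)) {
        if (header.dataType == X_TYPE) {
            std::vector<char> payload(header.payloadSize);
            file.read(payload.data(), payload.size());
            result.push_back(decodeXData(payload));
        } else {
            file.seekg(header.payloadSize, std::ios::cur); // skip this bank
        }
    }
    return result;
}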
As you can see from my question, I don't have much experience with dealing with large structures and files. I appreciate any advice.
I do not know whether the following scenario matches yours.
I had a case of processing huge log files of hardware signal logging in the automotive area. Signals like door locked, radio on, temperature, and thousands more, appearing sometimes periodically. The operator selects some signal types and then analyzes diagrams of signal values.
This scenario is based on a huge log file growing on passing time.
What I did was to create, for every signal type, its own log-file extract in an optimized binary format (one that loads into a fixed-size byte[] array).
This meant that displaying the diagram for just 10 selected types was feasible and fast, in real time: zooming in on a time interval, dynamically selecting signal types, and so on.
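A minimal sketch of that extraction idea, with a hypothetical fixed-size record type (the real layout would come from your logging format): one pass over the big log appends each record to a per-type extract file, so the diagram code later only loads the small files for the selected types.

#include <cstdint>
#include <fstream>
#include <map>
#include <string>

#pragma pack(push, 1)
struct SignalRecord {              // hypothetical fixed-size record
    std::uint64_t timestamp;
    std::uint32_t signalType;      // door locked, radio on, temperature, ...
    double        value;
};
#pragma pack(pop)

// One pass over the big log: append every record to "extract_<type>.bin".
void buildExtracts(const std::string& logPath)
{
    std::ifstream log(logPath, std::ios::binary);
    std::map<std::uint32_t, std::ofstream> extracts; // one open file per type
    SignalRecord rec;
    while (log.read(reinterpret_cast<char*>(&rec), sizeof rec)) {
        std::ofstream& out = extracts[rec.signalType];
        if (!out.is_open())
            out.open("extract_" + std::to_string(rec.signalType) + ".bin",
                     std::ios::binary);
        out.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    }
}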
I hope you got some ideas.

Face Recognition Using Backpropagation Neural Network?

I'm very new to image processing, and my first assignment is to make a working program that can recognize faces and their names.
Until now, I have successfully made a project that detects a face, crops the detected image, applies a Sobel filter, and translates it into an array of floats.
But I'm very confused about how to implement a backpropagation MLP to learn the images so it can recognize the correct name for a detected face.
It would be a great honor if the experts on Stack Overflow could give me some examples of how to train on the image array with backpropagation.
This is a standard machine learning setup. You have a number of arrays of floats (instances in ML terms, or observations in statistics terms) and corresponding names (labels, class tags), one per array. This is enough for use in most ML algorithms. Specifically in ANNs, the elements of your array (i.e. features) are the inputs of the network, and the labels (names) are its outputs.
If you are looking for theoretical description of backpropagation, take a look at Stanford's ml-class lectures (ANN section). If you need ready implementation, read this question.
You haven't specified what the elements of your arrays are. If you use just the pixels of the original image, this should work, but not very well. If you need a production-level system (though still with the use of an ANN), try to extract more high-level features (e.g. Haar-like features, which OpenCV itself uses).
Have you tried writing your feature vectors to an ARFF file and feeding them to Weka, just to see if your approach might work at all?
Weka has a lot of classifiers integrated, including MLPs.
From what I understand so far, I suspect the features and the classifier you have chosen won't work.
To your original question: Have you made any attempts to implement a neural network on your own? If so, where did you get stuck? Note that this is not the place to request a complete working implementation from the audience.
To provide a general answer on a general question:
Usually you have nodes in an MLP. Specifically input nodes, output nodes, and hidden nodes. These nodes are strictly organized in layers. The input layer at the bottom, the output layer on the top, hidden layers in between. The nodes are connected in a simple feed-forward fashion (output connections are allowed to the next higher layer only).
Then you connect each of your floats to a single input node and feed the feature vectors to your network. For backpropagation you need to supply an error signal that you specify for the output nodes. So if you have n names to distinguish, you may use n output nodes (i.e. one for each name). Make them, for example, return 1 in case of a match and 0 otherwise. You could also use one output node and let it return n different values for the names. Probably it would even be best to use n completely separate perceptrons, i.e. one for each name, to avoid some side-effects (catastrophic interference).
Note that the output of each node is a number, not a name. Therefore you need some sort of threshold to get a number-to-name relation.
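For illustration, a minimal sketch of such a threshold readout; the function name and the 0.5 value are just assumptions:

#include <algorithm>
#include <string>
#include <vector>

// One output node per name: pick the most active node, but only
// accept the match if it clears a confidence threshold.
// Assumes activations.size() == names.size().
std::string outputToName(const std::vector<double>& activations,
                         const std::vector<std::string>& names,
                         double threshold = 0.5)
{
    const auto best = std::max_element(activations.begin(), activations.end());
    if (best == activations.end() || *best < threshold)
        return "unknown"; // no node was confident enough
    return names[best - activations.begin()];
}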
Also note that you need a lot of training data to train a large network (to cope with the curse of dimensionality). It would be interesting to know the size of your float array.
Indeed, for a complex decision you may need a larger number of hidden nodes or even hidden layers.
Further note that you may need to do a lot of evaluation (i.e. cross-validation) to find the optimal configuration (number of layers, number of nodes per layer), or even to find any working configuration at all.
Good luck, anyway!

How to create an index for a collection of vectors/histograms for content based image retrieval

I'm currently writing a Bag of visual words-based image retrieval system which is similar to the Vector Space Model in text retrieval. Under this framework, each image is represented by a vector (or sometimes also called histogram in the literature). Basically each number in the vector counts the number of times each "visual word" occur in that image. If 2 images have vectors which are "close" together, this means they have many image features in common and are hence similar.
I'm basically trying to create an inverted file index for a set of such vectors. I want something that can scale from thousands (during the trial stage) to hundreds of thousands or millions of images, so a home-made data-structure hack will not work.
I've looked at Lucene but apparently it only indexes text (correct me if I'm wrong) whereas in my case I want it to index numbers (i.e. the vectors themselves). I've seen cases where people convert the vector to a text document in the following way:
<3, 6, ..., 5> --> "w1 w2... wn". Basically any component that is non-zero is replaced by a textual word "w[n]" where n is the index of that number. This "document" is then passed to Lucene to index.
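In code, that conversion would be something like this minimal sketch (vectorToDocument is a hypothetical name):

#include <sstream>
#include <string>
#include <vector>

// Convert a visual-word histogram into a pseudo-text document:
// every non-zero component i becomes the token "w<i>".
// Note how the counts are dropped, which is exactly the ranking
// problem described below.
std::string vectorToDocument(const std::vector<int>& histogram)
{
    std::ostringstream doc;
    for (std::size_t i = 0; i < histogram.size(); ++i)
        if (histogram[i] > 0)
            doc << 'w' << i << ' ';
    return doc.str();
}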
The problem with this method is that the text representation of the vector does not encode how frequently each particular "word" occurs, so the ranking of the retrieved images would not be good.
Does anyone know of a mature indexing API that can handle vectors, or can perhaps suggest a different encoding scheme for my vectors so that I can continue to use Lucene? I've also looked at the Lucene for Image Retrieval (LIRE) project and have tried the demo that came with it, but the number of exceptions generated when I ran that demo makes me unsure about using it.
As for language of API, I'm open to C++ or Java.
Thanks in advance for any replies.
You can try GRire, which is a Java library that implements the Bag of Visual Words model. It is my project, and I am currently working on implementing an inverted index.

How to create a game save file format in c++ using STL

I just learned about the I/O part of the STL, more specifically fstream. Although I can now save binary info and classes I've made to the hard drive, I am not sure how to define how the info should be read.
I saw the answer for making a file format from this post:
Typically you would define a lot of records/structures, such as BITMAPINFOHEADER, and specify in what order they should come, how they should be nested, and you might need to write a lot of indices and look-up tables. Such files consist of a number of records (maybe nested), look-up tables, magic words (indicating structure begin, structure end, etc.) and strings in a custom-defined format.
What I want to know specifically is how to do this with the STL and C++...
Since the format is meant simply for use by a game, I would think it could be much simpler, though. The format should:
Be traversable (I can look through it and find the start of a structure and maybe check its name)
Be able to hold multiple classes and data in a single file
Have identifiable starts and ends to sections: like whitespace in text files
Maybe have its own icon to represent it??
How do I do this in C++?
The first stage in designing your save-file structure is determining what information needs to be stored. Presumably, you have a list of entities, each of which with generic information (probably derived from one of a few base classes), such as position, velocity etc.
One of the best things you can do to implement a save format is to have a save-parser for each class (some can just derive from the base class's save-parser). So for instance, if you have a player class which derives from CBaseNPC, you could most likely simply override the parser, call the base-class function, and add any other necessary fields. For example, if we had (pseudocode):
void CBaseNPC::Save() {
    SaveToFile( health );
    SaveToFile( armor );
    SaveToFile( weapons );
    SaveToFile( position );
    SaveToFile( angles );
}
Then for your player class:
void CPlayer::Save() {
    CBaseNPC::Save();
    SaveToFile( achievement_progress );
}
Obviously, this is just a simple example, and no doubt your saving parsers will have more fields etc. to deal with.
Dealing with the structure of the file itself, the main thing you need to worry about is delimiters: how will your main load-parser recognise what each field corresponds to?
I suppose the best way to explain this would be with an example, the following could be a simple start to a save-file:
Map: {mapname}
Gametime: {gametime}
===Player===
Health: {health}
Armor: {armor}
Weapons: {wep1 (wep1Ammo), wep2 (wep2Ammo), wep3 (wep3Ammo)}
Position: {x, y, z}
Angles: {yaw, pitch, roll} // Could be quaternion instead.
AchievementProgress: {arbritraryData}
===Player===
===NPC-npc_name===
Health: {health}
Armor: {armor}
Weapons: {wep1 (wep1Ammo), wep2 (wep2Ammo), wep3 (wep3Ammo)}
Position: {x, y, z}
Angles: {yaw, pitch, roll} // Could be quaternion instead.
===NPC-npc_name===
===Entity-item_name===
Position: {x, y, z}
Angles: {yaw, pitch, roll}
Model: {modelname}
===Entity-item_name===
Here we have used the "===" string as a delimiter for the start of a class's parameters, and a new line as the delimiter for the parameters within each class.
It is then a relatively simple matter of structuring your parser so it reads in the map name and loads the map, then sets the game-time to the value specified in the save-file.
It then looks through the file until it finds a "===" reads the string it encounters, and looks it up from a dictionary (possibly an std::map or std::unordered_map) to determine the class to create (or edit) with the information in the file. Once it has determined the class type, it can then proceed to call the Load() function from that class, which will retrieve all the information contained. The parser then looks for the next instance of the "==={string encountered}===" and closes that class. It then proceeds following the same procedure with the next class encountered.
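As a rough illustration of that loop, here is a minimal sketch; the SectionLoader type and loadSaveFile name are placeholders, and real code would need error handling:

#include <fstream>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Dispatch table: section name -> loader that consumes the lines
// between the opening and closing "===Name===" delimiters.
using SectionLoader = std::function<void(std::istream&)>;

void loadSaveFile(const std::string& path,
                  const std::map<std::string, SectionLoader>& loaders)
{
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        if (line.rfind("===", 0) != 0)
            continue; // not a section delimiter
        const std::string name = line.substr(3, line.size() - 6); // strip ===
        std::ostringstream body;
        while (std::getline(file, line) && line != "===" + name + "===")
            body << line << '\n'; // collect until the closing delimiter
        const auto it = loaders.find(name);
        if (it != loaders.end()) {
            std::istringstream section(body.str());
            it->second(section); // e.g. ends up calling CPlayer::Load()
        }
    }
}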
Sorry for the length of this post; I'm sure it could be briefer, and since I wrote it off the top of my head there may be errors, but I hope it puts you on the right path to a workable save-file format. :)
If you still have any problems or questions regarding my post, please comment, I'll do my best to answer promptly.

How to compare 2 volumes and list modified files?

I have 2 hard-disk volumes (one is a backup image of the other). I want to compare the volumes and list all the modified files, so that the user can select the ones he/she wants to roll back.
Currently I'm recursing through the new volume and comparing each file's time-stamps to the old volume's files (if they are in the old volume). Obviously this is a brute-force approach. It's time-consuming and wrong!
Is there an efficient way to do it?
EDIT:
- I'm using FindFirstFile and the like to recurse the volume and gather info about each file (not very slow, just a few minutes).
- I'm using Volume Shadow Copy to backup.
- The backup-volume is remote so I cannot continuously monitor the actual volume.
Part of this depends upon how the two volumes are duplicated; if they are 'true' copies from the file system's point of view (e.g. shadow copies or other block-level copies), you can do a few tricky little things with respect to USN, which is the general technology others are suggesting you look into. You might want to look at an API like FSCTL_READ_FILE_USN_DATA, for example. That API will let you compare two different copies of a file (again, assuming they are the same file with the same file reference number from block-level backups). If you wanted to be largely stateless, this and similar APIs would help you a lot here. My algorithm would look something like this:
foreach( file in backup_volume ) {
    file_still_exists = try_open_by_id( modified_volume )
    if (file_still_exists) {
        usn_result = compare_usn_values_of_files( file, file_in_modified_volume )
        if (usn_result == equal_to) {
            // file hasn't changed at all
        } else {
            // file has changed (somehow)
        }
    } else {
        // file was deleted (possibly deleted and recreated)
    }
}
// we still don't know about files new in modified_volume
// we still don't know about files new in modified_volume
All of that said, my experience leads me to believe that this will be more complicated than my off-the-cuff explanation hints at. This might be a good starting place, though.
If the volumes are not block-level copies of one another, then it will be very difficult to compare USN numbers and file IDs, if not impossible. Instead, you may very well be going by file name, which will be difficult if not impossible to do without opening every file (times can be modified by apps, sizes and times can be out of date in the findfirst/next queries, and you have to handle deleted-then-recreated cases, rename cases, etc.).
So knowing how much control you have over the environment is pretty important.
Instead of waiting until after changes have happened, and then scanning the whole disk to find the (usually few) files that have changed, I'd set up a program to use ReadDirectoryChangesW to monitor changes as they happen. This will let you build a list of files with a minimum of fuss and bother.
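A minimal synchronous sketch of that approach (the watched path is a placeholder, and error handling is omitted): it prints the name of every file that changes under the directory tree.

#include <windows.h>
#include <iostream>
#include <string>

int main()
{
    // FILE_FLAG_BACKUP_SEMANTICS is required to open a directory handle.
    HANDLE dir = CreateFileW(L"C:\\watched", FILE_LIST_DIRECTORY,
                             FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                             nullptr, OPEN_EXISTING,
                             FILE_FLAG_BACKUP_SEMANTICS, nullptr);

    alignas(DWORD) char buffer[64 * 1024];
    DWORD bytes = 0;
    // Synchronous call: blocks until something changes in the subtree.
    while (ReadDirectoryChangesW(dir, buffer, sizeof buffer, TRUE,
                                 FILE_NOTIFY_CHANGE_FILE_NAME |
                                 FILE_NOTIFY_CHANGE_LAST_WRITE,
                                 &bytes, nullptr, nullptr)) {
        auto* info = reinterpret_cast<FILE_NOTIFY_INFORMATION*>(buffer);
        for (;;) {
            std::wcout << std::wstring(info->FileName,
                                       info->FileNameLength / sizeof(WCHAR))
                       << L'\n';
            if (info->NextEntryOffset == 0)
                break;
            info = reinterpret_cast<FILE_NOTIFY_INFORMATION*>(
                reinterpret_cast<char*>(info) + info->NextEntryOffset);
        }
    }
    CloseHandle(dir);
    return 0;
}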
Assuming you're not comparing each file on the new volume to every file in the snapshot, that's the only way you can do it. How are you going to find which files aren't modified without looking at all of them?
I am not a Windows programmer.
However, shouldn't you have a stat-like function to retrieve the modified time of a file?
Sort the files based on modification time.
The files having a modification time greater than your last backup time are the ones of interest.
The first time, you can iterate over the backup volume to figure out the maximum modification and creation times of your set of interest.
I am assuming the directories of interest don't get modified in the backup volume.
Without knowing more details about what you're trying to do here, it's hard to say. However, some tips about what I think you're trying to achieve:
If you're only concerned about NTFS volumes, I suggest looking into the USN / change journal APIs. They have been around since 2000. This way, after the initial inventory you only have to look at changes from that point on. A good starting point, though a very old article, is here: http://www.microsoft.com/msj/0999/journal/journal.aspx
The first time through comparing a drive's contents, use a hash such as SHA-1 or MD5.
Store the hashes and other such information in a database of some sort, for example SQLite3. Note that this can itself take up a huge amount of space; a quick look at my audio folder with 40k+ files would result in ~750 MB of MD5 information.
Also, utilizing the USN APIs, you could omit the hashing step entirely and just record information from the journal yourself (this will become clearer when/if you look into said APIs).