Maintaining state of a program - C++

This might be a simple question for most people out there, but I'm stuck on it.
I was wondering: most bank software, or let's say any commercial software, gets closed at the end of the day and re-opened the next. How do those programs remember everything from the previous day? I hope I've made myself clear. Thanks in advance for your guidance.
Best.

This is not black magic.
The answer is: by saving its data. You do this by putting it in a database or by writing data files.
The trick is to write your programs in a way that makes it easy to guarantee that you've restored the state you thought you saved.
A common approach is to use serialization. This means that you are able to take your giant data structure and recursively call a 'Save' function on it and its contained objects. This is very intuitive if you are taking advantage of object inheritance and polymorphism. Of course, you also write a 'Load' function to do the reverse.
You write your data in such a way that it can be read back in. For example, if you wanted to write a string, you might first write its length and then its characters. That way, when you read it you know how many bytes to allocate.
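A minimal sketch of that length-prefix idea (the function names are illustrative, not from any particular library):

    #include <cstdint>
    #include <istream>
    #include <ostream>
    #include <string>

    // Write the length first, then the characters, so the loader knows
    // exactly how many bytes to read back.
    void saveString(std::ostream& out, const std::string& s) {
        std::uint32_t len = static_cast<std::uint32_t>(s.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof len);
        out.write(s.data(), len);
    }

    std::string loadString(std::istream& in) {
        std::uint32_t len = 0;
        in.read(reinterpret_cast<char*>(&len), sizeof len);
        std::string s(len, '\0');
        in.read(&s[0], len);
        return s;
    }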
The above approach is pretty standard if you are writing binary file formats. In fact, it's the philosophy behind chunk-based formats such as AVI.
For text-based formats, you might choose to serialize your data in popular formats like XML or JSON. Beyond that, you are restricted only by your imagination.

Related

Sharing data structures between Perl and C++

I have a Perl script which generates a very large data structure (which starts life as an array of array references). This is then written to a text file using a weird home-brew serialisation scheme.
The data from the text file is stored as the value in a key-value store DB.
A C++ program then retrieves the data and deserialises it (into a hashmap, although I can potentially be flexible about how this data is structured).
What I'm interested in is finding out whether there are any good ways of sharing a data structure between Perl and C++ (something like Storable, but meant for Perl->C++ rather than Perl->Perl). The current method is a headache to maintain and may not have the best performance.
The most important factors are speed of deserialisation and the size of the serialised structure, in that order. Does anyone know of something that might do the trick?
Storable is one way to dump and load Perl data structures, and it's handy in that it's part of core and easy to use. I wouldn't actually recommend it for general usage, though.
But for multi-platform (and multi-language) portability, it's far better to use a standard data representation. Which you choose is probably a matter of what sort of data you're holding in your structure, but the core contenders are:
JSON - good for arrays and hashes (key-value).
YAML - Excellent for 'config file' style data (but extends in ways similar to JSON)
And if you must, XML - but bear in mind that XML is designed for documents-with-metadata, and so IMO isn't suitable for most of the applications it's used for.
As standards, they've got documented formatting and parsers are widely available. And implementing your own isn't too hard, if that's the route you want to go. Just make sure you follow the spec and you're good.
Note that because XML and JSON (and, I think, YAML?) are recursive, you can parse them as a stream rather than as a standalone object. (Trap, process, and discard as you hit 'close brackets' in JSON or 'close tags' in XML.)
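On the C++ side, a sketch of what consuming such JSON might look like, assuming the nlohmann/json library (just one possible choice; the file name here is illustrative):

    #include <fstream>
    #include <iostream>
    #include <nlohmann/json.hpp>  // https://github.com/nlohmann/json

    int main() {
        // Read the JSON the Perl side wrote out (e.g. via encode_json).
        std::ifstream in("data.json");
        nlohmann::json j = nlohmann::json::parse(in);

        // A JSON object maps naturally onto a hashmap-style structure.
        for (auto& [key, value] : j.items()) {
            std::cout << key << " => " << value << '\n';
        }
    }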
An easy job. I like Perl, and I also like C/C++. To make the best of both, I wrote a GitHub project to solve this issue.
Please see:
https://github.com/tlqtangok/perlcpp
A short example is here:

    P_eval("$a=2;$a=$a**10;");
    Int("a");  // a = 1024

Efficient ways to save and load data from C++ simulation

I would like to know the best ways to save and load C++ data. I am mostly interested in saving the classes and matrices (not sparse) that I use in my simulations.
Right now I just save them as txt files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the txt file), which I think is not ideal.
What would you recommend in general? (P.S. As I'd like to release my code, I'd really like to use only standard C++ or libraries that can be redistributed.)
In this case, there is no "best." What is best for you is highly dependent upon your situation. But let's have an example to get you thinking about your details and how deep this rabbit hole can go.
If you absolutely, positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor and run in real time.
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks for problems, all in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()), which may need to be thread-safe.
If you try to load from this array, you will have to placement-new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly. Oh, and you have to track the type of each address to know how to do this.
For portability with other systems and between versions of the binary, you will need utilities to convert from the binary format to something generic enough to be cross-platform (including repopulating pointers to other objects).
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
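For flavour, here is a stripped-down sketch of just the POD-only core of that idea (the types and names are made up); note it deliberately sidesteps the virtual-pointer and placement-new machinery described above, which is where the real pain lives:

    #include <cstdint>
    #include <fstream>
    #include <vector>

    // One contiguous pool of a single trivially-copyable type: the whole
    // state goes to disk in two writes. Not portable across platforms
    // (endianness, padding), which is exactly the caveat above.
    struct Particle { double x, y, z, vx, vy, vz; };

    void savePool(const std::vector<Particle>& pool, const char* path) {
        std::ofstream out(path, std::ios::binary);
        std::uint64_t n = pool.size();
        out.write(reinterpret_cast<const char*>(&n), sizeof n);
        out.write(reinterpret_cast<const char*>(pool.data()),
                  n * sizeof(Particle));
    }

    std::vector<Particle> loadPool(const char* path) {
        std::ifstream in(path, std::ios::binary);
        std::uint64_t n = 0;
        in.read(reinterpret_cast<char*>(&n), sizeof n);
        std::vector<Particle> pool(n);
        in.read(reinterpret_cast<char*>(pool.data()), n * sizeof(Particle));
        return pool;
    }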
You must design to your needs. Generally, the first need is "make it work." Don't worry about efficiency; just aim for something that accurately persists the data and for which you have the necessary information known and accessible at some point. Also, you should encapsulate the process of saving and loading (a sketch follows below). Then, if the need to "make it better" steps in, you should be able to change that one bit of code and the rest should keep working. You might even make the saving format selectable based on user needs instead of your own needs, which you would otherwise have to assume for all users.
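A sketch of what that encapsulation might look like (the interface and names are hypothetical):

    #include <iosfwd>

    class Simulation;  // your program's state

    // The rest of the program talks only to this interface, so swapping
    // text files for binary or a serialization library later means writing
    // one new implementation, not touching every call site.
    class StateStore {
    public:
        virtual ~StateStore() = default;
        virtual void save(const Simulation& sim, std::ostream& out) const = 0;
        virtual void load(Simulation& sim, std::istream& in) const = 0;
    };

    class TextStateStore : public StateStore { /* ... */ };
    class BinaryStateStore : public StateStore { /* ... */ };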
Given all the assumptions, pros, and cons listed, you should be able to work out your particular needs for this question.
Given that performance is not your main concern -- which is a critical part of the question -- the Boost.Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you initially want, but well worth it).
Finally, since you mostly have array matrices, try to encapsulate the entire process of save and load so that, should you need to change it later, you are writing one new implementation and choosing between it and the existing ones. I expect the added time for the smarts of Boost.Serialization would not be great; however, you might find a future requirement moves you to something else, or to multiple something elses.
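For reference, a minimal sketch of what Boost.Serialization usage looks like (the Matrix class here is illustrative; see the tutorial for the real details):

    #include <cstddef>
    #include <fstream>
    #include <vector>
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/access.hpp>
    #include <boost/serialization/vector.hpp>

    class Matrix {
        friend class boost::serialization::access;
        std::size_t rows_ = 0, cols_ = 0;
        std::vector<double> data_;

        // One function handles both saving and loading.
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & rows_ & cols_ & data_;
        }
    public:
        // ...
    };

    void saveMatrix(const Matrix& m, const char* path) {
        std::ofstream out(path);
        boost::archive::text_oarchive oa(out);
        oa << m;
    }

    Matrix loadMatrix(const char* path) {
        std::ifstream in(path);
        boost::archive::text_iarchive ia(in);
        Matrix m;
        ia >> m;
        return m;
    }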
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.

C++ Boost.serialization vs simple load/save

I am a computational scientist who works with large amounts of simulation data, and often I find myself saving/loading data to/from disk. For simple tasks, like a vector, this is usually as simple as dumping a bunch of numbers into a file and that's it.
For more complex stuff, like objects and such, I have save/load member functions. Now, I'm not a computer scientist, and thus I often see terminologies here on SO that I just do not understand (but would love to). One of those that I've come across recently is the subject of serialization and the Boost.Serialization library.
From what I understand, serialization is simply the process of converting your objects into something that can be saved to/loaded from disk or transmitted over a network and such. Considering that at most I need to save/load my objects to/from disk, is there any reason I should switch from simple load/save functions to Boost.Serialization? What would Boost.Serialization give me other than what I'm already doing?
That library takes into account many details that might not be very apparent from a purely 'applicative' point of view.
For instance: data portability with respect to big/little numeric endianness, pointed-to data lifetime, structured containers, versioning, non-intrusive extensions, and more. Moreover, it handles the interaction with other std or Boost infrastructure the right way, and it dictates a way of structuring your code that will reward you with easier maintenance. You will find ready-to-use serializers for many (all std & Boost?) containers.
And consider that if you need to share your data with someone else, chances are that referring to a published, maintained, and debugged schema will make things much easier.
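Two of those features, non-intrusive extension and versioning, look roughly like this (Point is a made-up type standing in for one you cannot modify):

    #include <boost/serialization/serialization.hpp>
    #include <boost/serialization/version.hpp>

    struct Point { double x = 0, y = 0, z = 0; };  // third-party type

    namespace boost { namespace serialization {

    // Non-intrusive: a free function, no changes to Point itself.
    template <class Archive>
    void serialize(Archive& ar, Point& p, const unsigned int version) {
        ar & p.x & p.y;
        if (version > 0)   // z was added later; old archives still load
            ar & p.z;
    }

    }}  // namespace boost::serialization

    BOOST_CLASS_VERSION(Point, 1)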

Planning for a file indexing program

I'm somewhat new to C++, but not to programming in general. I want my first practice program in C++ to be a file indexing program.
It seems easy enough: scanning directories for names, storing that information, and filtering it depending on what I want to view.
What I'm concerned about is that at some point I want to index a whole drive (I have an extra 1 TB drive, apart from my OS drive, to store files on). I have about 400,000-500,000 files on there, and I was wondering what would be the best way to store this information. I highly doubt keeping all those records in a text file is optimal, and I'd like to think that approach is naive.
Is there anything else I should be concerned about?
Thanks.
Isn't some kind of database the obvious answer?
If you don't want to hook up to a server, you can try something like SQLite. Alternatively, if you only need to do basic lookups, you could also create your own proprietary file format. You can utilize any combination of binary and textual data in your file. It's hard to suggest possible layouts without knowing what data you need to store and how you'll be accessing it.
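A minimal sketch of the SQLite route using its C API (the schema is illustrative):

    #include <cstdio>
    #include <sqlite3.h>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("index.db", &db) != SQLITE_OK) {
            std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }
        // One table is enough to support lookups by name or directory.
        const char* ddl =
            "CREATE TABLE IF NOT EXISTS files("
            "  path TEXT NOT NULL,"
            "  name TEXT NOT NULL,"
            "  size INTEGER);"
            "CREATE INDEX IF NOT EXISTS idx_name ON files(name);";
        char* err = nullptr;
        if (sqlite3_exec(db, ddl, nullptr, nullptr, &err) != SQLITE_OK) {
            std::fprintf(stderr, "ddl failed: %s\n", err);
            sqlite3_free(err);
        }
        sqlite3_close(db);
    }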
You can safely persist your data to a text file. However, you'd need to read the file into memory at startup and do all the complex operations in memory. Even if we assume a naive approach, where you store the full path with every file, you'd still be looking at ~100 bytes/file, or ~50 megabytes. A smarter approach stores just the filename and a pointer to the directory name.
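That smarter layout might be sketched like this (names are illustrative):

    #include <cstddef>
    #include <string>
    #include <vector>

    // Each directory path is stored once; files refer to it by index, so
    // a long path is not repeated for every file it contains.
    struct FileEntry {
        std::string name;      // just the file name
        std::size_t dirIndex;  // index into Index::directories
    };

    struct Index {
        std::vector<std::string> directories;
        std::vector<FileEntry> files;
    };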

What is the best way I should go about creating a program to store information into a file, edit the information in that file, and add new information

I'm about to start on a little project where I create a C++ program to store inventory data in a file (I guess a .txt will do):
• Item Description
• Quantity on Hand
• Wholesale Cost
• Retail Cost
• Date Added to Inventory
I need to be able to:
• Add new records to the file
• Display any record in the file
• Change any record in the file
Is there anything I should know before I start that could make this much easier and more efficient?
For example, should I try to use XML, or would that be too hard to work with via C++?
I've never really understood the most efficient way of doing this.
Like, would I search through the file and look for things in brackets or something?
EDIT
The data size shouldn't be too large. It is for homework, I guess you could say. I want to write a struct's contents out to a file; how would I go about doing that?
There are many approaches. Is this for homework or for real use? If it's for homework, there are probably some restrictions on what you may use.
Otherwise I suggest some embedded DBMS like SQLite. There are others too, but this will be the most powerful solution, and will also have the easiest implementation.
XML is also acceptable and has many reusable implementations available, but it will start losing performance once you go into thousands of records. The same goes for JSON. And one might still debate which one is simpler - JSON or XML.
Another possibility is to create a struct and write its contents directly to the file. It will get tricky, though, if the record size is not constant. And if the record format changes, the file will need to be rebuilt. Otherwise, this solution could be one of the best performance-wise - if implemented carefully.
Could you please enlighten us as to why you don't want to use a database engine for this?
If it is just for learning, then... could you please give us an estimated size of the data stored in that file, and the access pattern (how many users, how often they access it, etc.)?
The challenge will be to create an efficient search and modification code.
For the search, it's about data structures and organization.
For the modification, it's about how you would write updates to the file without reading it completely into memory, updating it there, and then writing it back out to the file in its entirety.
If this is a project that will actually be used, with the potential to have features added over time, go for a database solution from the start, even if it seems overkill. I've been down this road before, small features get added over time, and before you realize it you have implemented a database. Poorly. Bite the bullet and use a database.
If this is a learning exercise, it depends on the amount of data you want to store. If it is small, the easiest thing to do is read the entire file into memory and operate on it there. When changes are made, write the entire file back out to disk. If the data is too large for that, the next best thing is to have fixed-size records. Create a POD struct that contains all of the data (i.e., no pointers, STL containers, etc.). Then you can rewrite individual records without needing to rewrite the entire file (see the sketch below). If neither of these will work, your best bet is a database solution.
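A sketch of that fixed-size-record approach, using the inventory fields from the question (field sizes are arbitrary):

    #include <cstddef>
    #include <fstream>

    // Fixed-size, POD-only record: every entry occupies exactly
    // sizeof(Record) bytes, so record i lives at offset i * sizeof(Record)
    // and can be rewritten in place.
    struct Record {
        char   description[64];
        int    quantityOnHand;
        double wholesaleCost;
        double retailCost;
        char   dateAdded[11];  // "YYYY-MM-DD"
    };

    // Overwrite record number i without touching the rest of the file.
    void updateRecord(std::fstream& file, std::size_t i, const Record& r) {
        file.seekp(static_cast<std::streamoff>(i * sizeof(Record)));
        file.write(reinterpret_cast<const char*>(&r), sizeof r);
    }

    Record readRecord(std::fstream& file, std::size_t i) {
        Record r{};
        file.seekg(static_cast<std::streamoff>(i * sizeof(Record)));
        file.read(reinterpret_cast<char*>(&r), sizeof r);
        return r;
    }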
If you insist on doing it manually, I suggest JSON instead of XML.
Also, consider sqlite.
This sounds like a perfect job for SQLite. Small, fast, flexible and easy to use.