hibernate-like saving state of a program - c++

Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educational, but I feel I should just be able to do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.

What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right library (probably very native) you could do a true snapshot of your running program, similar to what "OS specific hibernate" does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need a process external to the one you want to save. I highly recommend you don't do that. Instead, look up how to do serialization or object marshalling in your language of choice (btw welcome to SO, don't tag every language... that pisses people off). Hint: most people these days pick JSON.

I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error-prone, however, so you may find this SO question of interest: Persisting graph data (Java)

There are a couple of frameworks around this. Check out Google Protocol Buffers if you need support for Java, Python, and C++ https://developers.google.com/protocol-buffers/ I've used it in some projects and it works well.
There's also Thrift (originally from Facebook) http://thrift.apache.org/ I don't have any experience with it though.
Another option is what @QuentinUK suggests. Use a class that inherits from something streamable and/or make streamable operators/functions.
I'd use a framework.

Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers which stored relative addresses. These were safe to serialize en masse. And applications did so, saving both code and data; the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
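The manual walk described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not any particular library's method: `Node`, `saveTree`, `loadTree` and `saveToString` are hypothetical names, and the text format (a node count, then one "value leftIndex rightIndex" line per node, with -1 meaning "no child") is made up for the example. The point is that indices replace addresses, so the restored nodes can live anywhere in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tree node; in memory the links are ordinary pointers.
struct Node {
    int value = 0;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Collect nodes in pre-order, so every child gets a larger index than its parent.
void collect(const Node* n, std::vector<const Node*>& out) {
    if (!n) return;
    out.push_back(n);
    collect(n->left.get(), out);
    collect(n->right.get(), out);
}

// Write the node count, then "value leftIndex rightIndex" per node (-1 = no child).
void saveTree(const Node* root, std::ostream& os) {
    std::vector<const Node*> order;
    collect(root, order);
    std::unordered_map<const Node*, long> indexOf;
    for (std::size_t i = 0; i < order.size(); ++i) indexOf[order[i]] = static_cast<long>(i);
    auto link = [&](const Node* p) -> long { return p ? indexOf.at(p) : -1L; };
    os << order.size() << '\n';
    for (const Node* n : order)
        os << n->value << ' ' << link(n->left.get()) << ' ' << link(n->right.get()) << '\n';
}

// Rebuild the tree; returns nullptr for an empty or malformed stream.
std::unique_ptr<Node> loadTree(std::istream& is) {
    std::size_t count = 0;
    if (!(is >> count) || count == 0) return nullptr;
    struct Record { int value; long left; long right; };
    std::vector<Record> records(count);
    for (Record& r : records)
        if (!(is >> r.value >> r.left >> r.right)) return nullptr;

    std::vector<std::unique_ptr<Node>> nodes(count);
    for (std::size_t i = 0; i < count; ++i) {
        nodes[i] = std::make_unique<Node>();
        nodes[i]->value = records[i].value;
    }
    bool ok = true;
    // Turn an index back into an owning pointer; a child must come after its parent.
    auto take = [&](long idx, std::size_t parent) -> std::unique_ptr<Node> {
        if (idx == -1) return nullptr;
        if (idx <= static_cast<long>(parent) || idx >= static_cast<long>(count) ||
            !nodes[static_cast<std::size_t>(idx)]) {
            ok = false;  // link does not point at an unclaimed later node
            return nullptr;
        }
        return std::move(nodes[static_cast<std::size_t>(idx)]);
    };
    // Walk backwards: subtrees are complete before they are attached to a parent.
    for (std::size_t i = count; i-- > 0;) {
        assert(nodes[i]);  // only a parent with a smaller index could have claimed it
        nodes[i]->left = take(records[i].left, i);
        nodes[i]->right = take(records[i].right, i);
    }
    if (!ok) return nullptr;
    return std::move(nodes[0]);
}

// Convenience wrapper for trying the format out in memory.
std::string saveToString(const Node* root) {
    std::ostringstream os;
    saveTree(root, os);
    return os.str();
}
```

Saving a tree with `saveToString` and feeding the text back through `loadTree` via a `std::istringstream` should give a structurally identical tree whose nodes sit at different addresses.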
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.

Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
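For comparison, here is a minimal sketch of the same idea using plain standard-library streams instead of MFC. It uses free `operator<<`/`operator>>` functions rather than a streamable base class, and the `Settings` type, its fields, and the `toText` helper are hypothetical. It assumes the title contains no newline characters.

```cpp
#include <cassert>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical object we want to stream to disk and back.
struct Settings {
    int width = 0;
    int height = 0;
    std::string title;
};

// The title goes on its own line so that it may contain spaces.
std::ostream& operator<<(std::ostream& os, const Settings& s) {
    return os << s.width << ' ' << s.height << '\n' << s.title << '\n';
}

// Reads into a temporary first, so `s` is only changed if the whole read succeeds.
std::istream& operator>>(std::istream& is, Settings& s) {
    Settings tmp;
    if (is >> tmp.width >> tmp.height) {
        // Skip the rest of the numbers line before reading the title line.
        is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::getline(is, tmp.title);
    }
    if (is) s = tmp;
    return is;
}

// Convenience wrapper: the same operator works on any std::ostream, including std::ofstream.
std::string toText(const Settings& s) {
    std::ostringstream os;
    os << s;
    return os.str();
}
```

Because the operators take `std::ostream` and `std::istream`, the same code writes to a `std::ofstream` or reads from a `std::ifstream` without changes.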

Related

Are the C++ Standard Library File stream operations crippled in Microsoft?

I'm asking this question because I have been working on a project that requires collecting a lot of data REALLY fast, depending on the scenario. 5.7GBytes with a capital BYTE per second or 11.4GBytes per second.
We are working with a small striped raid array using 3 Samsung Pro NVME (for 11.4GB/s we have a larger array).
Currently, the project has been developed on Windows. I wanted to make things as portable as possible, so I focused on using the C++ Standard Library; however, no matter what I did, I could not crack transferring files faster than 1.5GB/s.
The strategy was simple: create a couple of huge swap buffers and write them directly to disk as a huge unformatted binary file.
Using std::ofstream
and benchmarking manually setting varied buffer sizes through:
rdbuf()->pubsetbuf(buffer, BUFFER_SIZE);
open(Filename, std::ios::binary|std::ios::trunc);
followed by my managed write loop, I was able to find a sweet spot, but was never able to crack 1.5GB/s.
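For readers who want to reproduce the std::ofstream setup described above, a minimal sketch might look like the following. The function name `writeBinary` and its parameters are hypothetical, and nothing here is claimed to reach any particular throughput. Note that `pubsetbuf` is called before `open`; what it does on an already open file is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Writes `data` to `path` as raw bytes through a stream buffer of `bufferSize` bytes.
// Returns true if the file could be opened and every byte was accepted by the stream.
bool writeBinary(const std::string& path, const std::vector<char>& data, std::size_t bufferSize) {
    // Declared before the stream so the buffer outlives it.
    std::vector<char> buffer(bufferSize);
    std::ofstream out;
    // Install the custom buffer before opening the file.
    out.rdbuf()->pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    out.open(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    // write() is unformatted output: no text conversion happens in binary mode.
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return out.good();
}
```

Timing calls to this function with different `bufferSize` values is one way to look for the "sweet spot" mentioned above.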
I then found the Windows SDK and its CreateFile function,
in particular with the FILE_FLAG_NO_BUFFERING flag.
This was a game-changer, as long as I made sure I fed it sector-aligned data (in my case everything needed to be some multiple of 512Bytes) I was suddenly able to take full advantage of the raid array throughput.
I revisited the std::ofstream function in an attempt to work with more OS-agnostic functions; however, even though one can specify zero buffer for std::ofstream, there doesn't appear to be any documentation with regards to any caveats to using that function with no buffer.
std::ofstream allows 64-bit values for its write size, unlike the Windows SDK's WriteFile, which only accepts a DWORD. That sets the maximum write size to the largest multiple of 512 one can squeeze into a uint32_t, and you must manage your writes in a loop if your file exceeds 4GB (mine do).
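The chunking arithmetic can be sketched in portable C++ as below. This only plans the lengths of the individual writes and does not call WriteFile itself; `maxAlignedChunk` and `planWrites` are hypothetical names. It assumes the total size is already a multiple of the sector size, as unbuffered I/O requires.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Largest multiple of `sectorSize` that still fits in a 32-bit unsigned length.
std::uint32_t maxAlignedChunk(std::uint32_t sectorSize) {
    assert(sectorSize != 0);
    return std::numeric_limits<std::uint32_t>::max() / sectorSize * sectorSize;
}

// Splits a 64-bit byte count into lengths that each fit a 32-bit write call.
// Every chunk except possibly the last is the maximum aligned size; the last one
// is sector-aligned only if `totalBytes` itself is.
std::vector<std::uint32_t> planWrites(std::uint64_t totalBytes, std::uint32_t sectorSize) {
    std::vector<std::uint32_t> lengths;
    const std::uint64_t chunk = maxAlignedChunk(sectorSize);
    while (totalBytes > 0) {
        const std::uint64_t n = totalBytes < chunk ? totalBytes : chunk;
        lengths.push_back(static_cast<std::uint32_t>(n));
        totalBytes -= n;
    }
    return lengths;
}
```

For 512-byte sectors the maximum chunk works out to 4294966784 bytes, so a 5 GiB buffer needs two writes.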
This just raises the question, is Microsoft simply not giving the C++ Standard Library Devs access to the necessary OS-level system calls to take advantage of Ultra-high-speed drive arrays? Or am I missing something in how to use the C++ Standard Library to its full potential?
"is Microsoft simply not giving the C++ Standard Library Devs..."
You might notice that the product you're using is called Microsoft Visual Studio. The Standard Library developers for Visual Studio work at Microsoft, although on a different team than the Windows developers.
The reason is a bit simpler: the Visual C++ devs can't possibly know and optimize for all possible usage scenarios. It's a bit unusual to do text formatting at such high speeds. Remember, the point of ostream is to provide operator<<. ofstream is for formatted output to files. But for high-speed I/O you want binary output anyway.
To put it bluntly, the bandwidths you're aiming for are within the ballpark of the physical limits of current commodity hardware (~24GByte/s for a 16-lane PCIe 4.0 link), and in my own work I found it very challenging to reach single-core memory transfer rates above 8GByte/s without the use of "dark magic" (aka hand-crafted assembly and optimized system call code), and it involved carefully aligning the memory accesses and making use of vector extensions. But most importantly, reaching these levels of optimization requires being aware of the kind of data that is being processed and what kind of access patterns to expect, and/or building caching intermediaries to accommodate the underlying hardware.
Such optimizations are, plain and simple, outside the scope of general-purpose standard libraries. Standard libraries in their implementation must adhere to the behaviours written down in the specification, and some of these requirements tend to collide with what has to be done to make the most of the underlying hardware.
So I'm sorry to tell you, but you'll probably have to bite the bullet and use the low level system APIs directly, bypassing the standard library.

Creating a limited use version of a program in VC++

Our company helps migrate client software from other languages to C++. We provide them C++ source code for their application along with header files and compiled libraries for runtime support functions. We charge for both the migration and the runtime. Recently a potential client asked us to migrate one of a number of systems they have. This system contains 7 programs and we would like to limit the runtime so only these 7 programs can access it. We can time-limit the runtime by putting an encrypted expiration date in the object library but, since we have to provide the source code for the converted programs, we are having difficulty coming up with a way to limit the access to a specific set of programs. Obviously, anything we put into the source code to identify the program could be copied to any other program, so the only hope seems to be having the runtime library discover some set of characteristics about the programs and then validating them against a set of characteristics embedded in the runtime library. As I understand it, C++ has very little reflection capability (RTTI is all I could find), so I wanted to ask if anyone has faced a similar problem and found a way to solve it. Thanks in advance for any suggestions.
Based on the two answers a little clarification seems in order. We fully expect the client to modify the source code and normally we provide them an unrestricted version of the runtime libraries. This particular client requested a version that was limited to a single system and is happy to enter into a license that restricts the use of the runtime library to that system. Therefore a discussion of the legal issues isn't relevant. The issue is a technical one -- given a license that is limited to a single system and given that the client has the source to the calling programs but not the runtime, is there a way to limit access to the runtime to the set of programs comprising that system thus enforcing the terms of the license.
If they're not supposed to make further changes to the programs, why did you give them the source code? And if they are expected to continue changing the programs (i.e. maintenance), who decides whether a change constitutes a new program that's not allowed to use the library?
There's no technical way to enforce that licensing model.
There's possibly a legal way -- in the code that loads/enables the library, write a comment "This is a copy protection measure". Then DMCA forbids them from including that code into other programs (in the USA). But IANAL, and I don't think DMCA is valid anyway.
Consult a lawyer to find out what rights you have under the contract/bill of sale to restrict their use.
The most obvious answer I could think of is to get the name and/or path of the calling process-- simply compare this name to the 7 "allowed" programs in your support library. Certainly, they could create a new process with the same name, but they might not know to do so.
Another level could be to further compare the executable size against the known size for that application. (You'll likely want to allow a reasonably wide range around the expected size, in case they make changes to the source code, and/or compile with different options.)
As another thought, you might try adding some seemingly benign strings into the app's resources. ("Copyright 2011 ~Your Corporation Name~")-- You can then scan the parent executable for the magic strings. If they create a new product, they might not think to create this resource.
Finally, as already noted by Ben, if you are giving them the source code, there are likely no foolproof solutions to this problem. (As he said, at what point does "modified" code become a new application?) The best you will likely be able to do is to add enough small roadblocks that they won't bother trying to use that lib for another product. It likely depends on how determined and/or lucky they are.
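The name and size checks suggested above could look roughly like the sketch below. It is a roadblock, not real protection, as already said. How the runtime obtains the caller's executable name and file size is platform-specific and left out here; `AllowedProgram`, `isAllowedCaller`, and the tolerance parameter are hypothetical. The name comparison is case-sensitive, which is a simplification on Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of the list that would be embedded in the runtime library.
struct AllowedProgram {
    std::string name;             // expected executable file name
    std::uintmax_t expectedSize;  // size in bytes of the build that was shipped
};

// True if `name` is on the list and `size` lies within `tolerance` of the expected
// size, where tolerance is a fraction (0.25 means plus or minus 25 percent).
bool isAllowedCaller(const std::vector<AllowedProgram>& allowed,
                     const std::string& name,
                     std::uintmax_t size,
                     double tolerance) {
    assert(tolerance >= 0.0 && tolerance < 1.0);
    for (const AllowedProgram& p : allowed) {
        if (p.name != name) continue;
        const double expected = static_cast<double>(p.expectedSize);
        const double actual = static_cast<double>(size);
        // Accept a range around the expected size to allow for source changes
        // and different compiler options.
        if (actual >= expected * (1.0 - tolerance) && actual <= expected * (1.0 + tolerance))
            return true;
    }
    return false;
}
```

A wide tolerance makes the check easier to pass by accident, and a narrow one risks rejecting the client's own legitimately modified builds, so the value is a judgment call.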
Why not just technically limit the use of the runtime to one system? There are many software protection solutions out there, one that comes to my mind is SmartDongle.
Now the runtime could still be used by any other program on that machine, but I think this should be a minor concern, no?

Open-source C++ scanning library

Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function in such a way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C or C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then it wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: Sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that Clang Index is a close fit. However, I don't think that it stores data in a database.
Anyway, the Clang framework offers what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing / indexing capabilities. And since it's provided as a set of reusable libraries... it was crafted to be built upon!
I have to admit that I haven't used either, because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that I don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of API) to do that? If you can, I suggest you to take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could satisfy some of your constraints, but not all of them; for example, Doxygen's analysis is not incremental.

Experience with IBPP interface for Firebird database

I'd like to ask the guys with experience in Firebird and IBPP (especially the latter). I found a lot of positive posts about Firebird, but I'm having trouble deciding about IBPP. The interface itself is clean and simple, but it seems that the project does not have much activity going on (maybe because it's very stable).
Would you recommend IBPP for production environment?
Is it thread-safe?
Any known bugs?
Thanks.
In addition to the points Milan mentioned:
There is currently no way to use more than one client library when connecting to different databases, or even to specify which client library will be used. There is a certain hard-coded sequence of client library locations that are probed, and the first one that is found will be used for all connections. An IBPP version changing this has been hinted at for a very long time, but hasn't arrived yet. SVN trunk contains some code to deal with this, but I'd say that's alpha quality at most.
And all of this holds true for Windows only, as on all other platforms the Firebird client library isn't loaded at runtime anyway.
The library isn't thread-safe. That doesn't matter for the most part, as you should let each thread have its own connection, transaction and other assorted objects anyway. But IBPP uses its own smart pointer implementation, which is neither completely exception-safe nor thread-safe. Still, as long as you initialize the library from the main thread (before any other thread is created) and create and destroy IBPP objects in the same thread (so absolutely no sharing of objects with other threads!) using IBPP in multiple threads should work fine.
If you can live with the points above (they may not matter to you, at all) it is certainly ready for production use. You can always change things you run into, as we did for FlameRobin too.
IBPP is very stable and I would recommend it for production. That is, if you're going to use it for regular applications.
If you want to build an admin tool or something similar, then be prepared to go inside and get your hands dirty as some of the newer features (i.e. Firebird 2.5 stuff) that are not SQL but API improvements are not supported. For example, it is missing a layer that would expose the new trace API.
Anyway, go ahead and use it. I have had a bunch of IBPP applications in production for years, and, as Douglas wrote, FlameRobin is using IBPP and it works flawlessly (at least as far as the DB layer is concerned).
The only thing to be careful about is NUMERIC fields, which are internally stored as integer+scale in Firebird. IBPP exposes those via C/C++ "double", but also via 16/32/64-bit integers. So be very careful when retrieving such values, as you will get no warning. For example, if you have a DECIMAL(18,2) field with the value 254.00 in it, and you accidentally read that into an integer, you will get 25400, not 254. Make sure you either read those in as double or scale them yourself later. Reading the integer can be useful because you can safely convert 25400 to a string and then add a decimal point, so you don't lose precision with double (it all depends on the kind of your application and which digits count, of course).
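The "convert to string and add a decimal point" step could be sketched like this. It does not call IBPP at all: it assumes you have already read the raw scaled integer and know the column's scale, and `scaledToString` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats an integer that stores a decimal value scaled by 10^scale,
// for example raw = 25400 with scale = 2 represents 254.00.
std::string scaledToString(std::int64_t raw, unsigned scale) {
    const bool negative = raw < 0;
    // Work on the magnitude as unsigned so the most negative value does not overflow.
    const std::uint64_t magnitude =
        negative ? 0 - static_cast<std::uint64_t>(raw) : static_cast<std::uint64_t>(raw);
    std::string digits = std::to_string(magnitude);
    // Pad with leading zeros so at least one digit ends up before the decimal point.
    if (digits.size() <= scale)
        digits.insert(0, static_cast<std::string::size_type>(scale) + 1 - digits.size(), '0');
    if (scale > 0) digits.insert(digits.size() - scale, 1, '.');
    return negative ? "-" + digits : digits;
}
```

Since no floating-point type is involved, every digit of the stored value survives the conversion.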
I can't really tell from experience because I've never used IBPP.
But apparently it's used by the FlameRobin project, so I'd trust it to be 'stable enough'.

Serialisation and Databasetechnics to locate objects

I was wondering if there is a (C++ or D) serialisation library that also provides techniques to locate a certain object (which was saved to disk) based on certain criteria, like a certain attribute combination.
I know about SQLite and MySQL and so on, but I'm looking for an alternative.
Because those databases are not bound to a specific schema known at compile time (of the library), they can only be so good.
A library that knows the structures at compile time can be optimized to a very big degree for that structure.
Maybe there is even a library creator, as input you give your c++ classes and desired location/identity attributes and as output you get a serialization/database-library that is heavily optimized for locating the objects based on your needs.
Additionally, I think SQLite and similar are overpowered for my use, as I don’t need all the SQL features, only the locating of an object based on its attributes.
Greetings,
--Marenz
I have a work-in-progress Serialization lib that I've been working on. (Some commentary here.) It's not done but it works, it just lacks a lot of polish and "convenience" features. If you are interested in using it, I'd be interested in feedback and feature requests.
As for DB storage, I'd just go with SQLite or MySQL. From what I've read, the query optimizers for DBs use more information than you can have at compile time (they look at data distributions and such).
OTOH I have been thinking of making a compile time SQL engine that uses meta-programming to build query plans at compile time. I've got a few other projects that I'll need to get done first though (like a file_malloc to allocate space in a file.)