Serialisation and database techniques to locate objects - C++

I was wondering if there is a (C++ or D) serialisation library that also provides techniques to locate a certain object (which was saved to disk) based on certain criteria, such as a certain attribute combination.
I know about SQLite, MySQL and so on, but I am searching for an alternative.
Because those databases are not bound to a specific database schema known at compile time (of the library), they can only be optimized so far.
A library that knows the structures at compile time can be optimized to a very large degree for those structures.
Maybe there is even a library generator: as input you give your C++ classes and the desired location/identity attributes, and as output you get a serialization/database library that is heavily optimized for locating the objects based on your needs.
Additionally, I think SQLite and similar are overpowered for my use, as I don't need all the SQL features, only the locating of an object based on its attributes.
Greetings,
--Marenz

I have a work-in-progress Serialization lib that I've been working on. (Some commentary here.) It's not done but it works, it just lacks a lot of polish and "convenience" features. If you are interested in using it, I'd be interested in feedback and feature requests.
As for DB storage, I'd just go with SQLite or MySQL. From what I've read, the query optimizers for DBs use more information than you can have at compile time (they look at data distributions and such).
OTOH I have been thinking of making a compile time SQL engine that uses meta-programming to build query plans at compile time. I've got a few other projects that I'll need to get done first though (like a file_malloc to allocate space in a file.)
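To make the idea of "locating an object by an attribute combination known at compile time" concrete, here is a minimal in-memory sketch. It is not taken from any library mentioned in this thread: `Person`, `PersonStore` and the choice of (city, age) as the lookup key are all hypothetical, and persisting the records to disk is left out entirely.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical record type; the fields are illustrative only.
struct Person {
    std::string name;
    int age;
    std::string city;
};

// Stores records and keeps one index over the (city, age) combination.
// The indexed attributes are fixed in the source code, i.e. at compile time.
class PersonStore {
public:
    void add(Person p) {
        const std::size_t pos = records_.size();
        index_.emplace(std::make_tuple(p.city, p.age), pos);
        records_.push_back(std::move(p));
    }

    // Returns pointers to every record matching both attributes.
    // The pointers stay valid only until the next call to add().
    std::vector<const Person*> find(const std::string& city, int age) const {
        std::vector<const Person*> out;
        const auto range = index_.equal_range(std::make_tuple(city, age));
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(&records_[it->second]);
        return out;
    }

private:
    std::vector<Person> records_;
    std::multimap<std::tuple<std::string, int>, std::size_t> index_;
};
```

Calling `store.find("Oslo", 30)` walks only the matching index entries instead of scanning all records. A generated library could in principle emit one such index per requested attribute combination.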

Related

Local/dedicated sql database for a program

I have a program in C++ under development, which handles a lot of different types of objects which need to be retrieved from files on start-up. Total file size is a little less than 1GB. The program must run at least on Windows 7 and Linux.
My goal is to provide the functionality for reading from files, and storing them into objects (including data validation), so the solutions I can think of are:
Simply provide at least one function per object type to read from files, and copy the data to newly created objects. This is what I've been doing so far.
Use a database to store data and retrieve it using SQL.
I would like to switch to the second option, however I have a few limitations:
I am not allowed to connect to the internet with my program, so a remote server database is out of the question.
I am not allowed to make a program that forces the user to install additional software, in this case, a local database.
Access to the local data can be done only by running the program: no additional executables that the user can run to modify or read the files.
I have searched the internet for a C++ library that allows a program to store/retrieve specific data to local files using SQL syntax, but so far I've got nothing. Database software I know, like MySQL, PostgreSQL etc., won't do because of the restrictions I mentioned.
My main question is: is there such a library for C++, or do I have to implement my own?
I would prefer it were free, but I can do with anything I can find.
If there are no libraries like this, does anyone know any other alternatives for storing data locally for a c++ program? And if I were to implement my own, what way (as in file data structures) is there to store my objects (without it being extremely slow for less than 1GB of data)?
SQLite fits all your requirements except the last one, which prohibits users from modifying the data outside your program. Unfortunately, this one would be tough to overcome: a sufficiently proficient and determined user will always be able to break this restriction, even if you store your data in plain files of a custom format.
Everything else fits perfectly: the database lives in a file, does not need a network connection, and can be installed as part of your program (it's a library that you link to your code).

hibernate-like saving state of a program

Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educative but I feel I could just do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.
What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right library (probably very native) you could do a true snapshot of your running program, similar to what "OS specific hibernate" does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need a process external to the one you want to save. I highly recommend you don't do that. Instead read/lookup in your language of choice (btw welcome to SO, don't tag every language... that pisses people off) how to do serialization or object marshalling... hint... most people these days pick JSON.
I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error-prone, however, so you may find this SO question of interest: Persisting graph data (Java)
There are a couple of frameworks around this. Check out Google Protocol Buffers if you need support for Java, Python, and C++ https://developers.google.com/protocol-buffers/ I've used it in some projects and it works well.
There's also Thrift (originally from Facebook) http://thrift.apache.org/ I don't have any experience with it though.
Another option is what @QuentinUK suggests. Use a class that inherits from something streamable and/or make streamable operators/functions.
I'd use a framework.
Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers which stored relative addresses. These were safe to serialize en masse. And applications did so, saving both code and data, the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
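As a rough illustration of that manual walk, the sketch below replaces each pointer with an index into a node array before saving, and rebuilds real pointers after loading. `Node`, `FlatNode`, `flatten` and `restore` are hypothetical names made up for this example, and it assumes every node that is linked to is also present in the vector being flattened.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory node; `next` is a raw, non-owning link.
struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Storable form: the pointer is replaced by an index into the node array.
struct FlatNode {
    int value;
    std::int32_t next;  // -1 means "no link"
};

// Convert addresses to indices so the data no longer depends on where
// the objects happened to live in memory.
std::vector<FlatNode> flatten(const std::vector<std::unique_ptr<Node>>& nodes) {
    std::unordered_map<const Node*, std::int32_t> ids;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        ids[nodes[i].get()] = static_cast<std::int32_t>(i);

    std::vector<FlatNode> out;
    out.reserve(nodes.size());
    for (const auto& n : nodes) {
        const std::int32_t next = n->next ? ids.at(n->next) : -1;
        out.push_back(FlatNode{n->value, next});
    }
    return out;
}

// Rebuild the objects at whatever addresses the allocator hands out,
// then turn the indices back into pointers.
std::vector<std::unique_ptr<Node>> restore(const std::vector<FlatNode>& flat) {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.reserve(flat.size());
    for (const auto& f : flat) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->value = f.value;
    }
    for (std::size_t i = 0; i < flat.size(); ++i)
        if (flat[i].next >= 0)
            nodes[i]->next = nodes[static_cast<std::size_t>(flat[i].next)].get();
    return nodes;
}
```

The `FlatNode` array contains no addresses, so it can be written to a file as-is and read back in a later run. Cycles are handled naturally because links are resolved in a second pass.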
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.
Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
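For comparison, here is a small sketch of the "streaming functions" idea using only standard iostreams rather than MFC. The types `Point` and `Segment` and the helpers `toText` and `fromText` are hypothetical, and a whitespace-separated text format is assumed purely to keep the example short.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical value types used only for illustration.
struct Point {
    int x = 0;
    int y = 0;
};

struct Segment {
    Point from;
    Point to;
};

// Each type knows how to write and read itself; composite types reuse
// the operators of their members.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << p.x << ' ' << p.y;
}

std::istream& operator>>(std::istream& is, Point& p) {
    return is >> p.x >> p.y;
}

std::ostream& operator<<(std::ostream& os, const Segment& s) {
    return os << s.from << ' ' << s.to;
}

std::istream& operator>>(std::istream& is, Segment& s) {
    return is >> s.from >> s.to;
}

// Convenience helpers; the same operators work on std::ofstream/std::ifstream.
std::string toText(const Segment& s) {
    std::ostringstream os;
    os << s;
    return os.str();
}

bool fromText(const std::string& text, Segment& out) {
    std::istringstream is(text);
    return static_cast<bool>(is >> out);
}
```

This only covers plain values. Objects that point at each other still need their links converted to something address-independent before they are streamed.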

Creating a limited use version of a program in VC++

Our company helps migrate client software from other languages to C++. We provide them C++ source code for their application along with header files and compiled libraries for runtime support functions. We charge for both the migration and the runtime. Recently a potential client asked to migrate one of a number of systems they have. This system contains 7 programs and we would like to limit the runtime so only these 7 programs can access it. We can time-limit the runtime by putting an encrypted expiration date in the object library but, since we have to provide the source code for the converted programs, we are having difficulty coming up with a way to limit the access to a specific set of programs. Obviously, anything we put into the source code to identify the program could be copied to any other program, so the only hope seems to be having the runtime library discover some set of characteristics about the programs and then validating them against a set of characteristics embedded in the runtime library. As I understand it, C++ has very little reflection capability (RTTI is all I could find) so I wanted to ask if anyone has faced a similar problem and found a way to solve it. Thanks in advance for any suggestions.
Based on the two answers a little clarification seems in order. We fully expect the client to modify the source code and normally we provide them an unrestricted version of the runtime libraries. This particular client requested a version that was limited to a single system and is happy to enter into a license that restricts the use of the runtime library to that system. Therefore a discussion of the legal issues isn't relevant. The issue is a technical one -- given a license that is limited to a single system and given that the client has the source to the calling programs but not the runtime, is there a way to limit access to the runtime to the set of programs comprising that system thus enforcing the terms of the license.
If they're not supposed to make further changes to the programs, why did you give them the source code? And if they are expected to continue changing the programs (i.e. maintenance), who decides whether a change constitutes a new program that's not allowed to use the library?
There's no technical way to enforce that licensing model.
There's possibly a legal way -- in the code that loads/enables the library, write a comment "This is a copy protection measure". Then DMCA forbids them from including that code into other programs (in the USA). But IANAL, and I don't think DMCA is valid anyway.
Consult a lawyer to find out what rights you have under the contract/bill of sale to restrict their use.
The most obvious answer I could think of is to get the name and/or path of the calling process-- simply compare this name to the 7 "allowed" programs in your support library. Certainly, they could create a new process with the same name, but they might not know to do so.
Another level could be to further compare the executable size against the known size for that application. (You'll likely want to allow a reasonably wide range around the expected size, in case they make changes to the source code, and/or compile with different options.)
As another thought, you might try adding some seemingly benign strings into the app's resources. ("Copyright 2011 ~Your Corporation Name~")-- You can then scan the parent executable for the magic strings. If they create a new product, they might not think to create this resource.
Finally, as already noted by Ben, if you are giving them the source code, there are likely no foolproof solutions to this problem. (As he said, at what point does "modified" code become a new application?) The best you will likely be able to do is to add enough small roadblocks that they won't bother trying to use that lib for another product. It likely depends on how determined and/or lucky they are.
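The "magic string" roadblock could be sketched roughly as follows. `fileContainsMarker` is a hypothetical helper, and obtaining the path of the calling executable is platform-specific and not shown here. Note also that string resources in Windows executables are typically stored as UTF-16, so a plain byte search like this one would only match a marker that is embedded as narrow characters.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true if the file at `path` contains the byte sequence `marker`.
// The whole file is read into memory, which is fine for a sketch but may
// be wasteful for very large executables.
bool fileContainsMarker(const std::string& path, const std::string& marker) {
    if (marker.empty())
        return false;

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;  // a missing or unreadable file counts as "not found"

    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return bytes.find(marker) != std::string::npos;
}
```

As the answer says, this is only a small hurdle: anyone who notices the check can copy the marker into a new program.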
Why not just technically limit the use of the runtime to one system? There are many software protection solutions out there, one that comes to my mind is SmartDongle.
Now the runtime could still be used by any other program on that machine, but I think this should be a minor concern, no?

Writing dynamically loadable components in c++

I'm currently working on a program which should perform calculations on a home brewed data structure.
I want to build it in a way that it would be easy to add supported calculations (say, as source files which conform to a predetermined structure).
The problem is that I don't want to load all calculations in advance, because there might be a lot of them.
The only mechanism I found which supports dynamic loading of functionality is dlopen, which expects .so files, so in this context, using dlopen means compiling a separate so file for every group of computations.
While I don't see any inherent problem with this design, my spider senses tell me I should verify with the all-knowing-web that it's not utterly stupid. If there are any other suggested ways to do so I'd be glad to hear.
Using dlopen() is the most widely used way to load executable code dynamically in an application on POSIX-compatible operating systems. It allows using a modular architecture where optional or rarely used code is only loaded on-demand, which sounds pretty much like what you need.
I would certainly use this method - if after some time you find that the shared object compilation step is becoming a hurdle, you can build additional dynamically loaded modules to support e.g. an interpreted language such as Lua or Python. This would allow you to keep your existing codebase without losing extensibility.
Seems like a good approach.
A good way to do this is to declare an abstract (pure) class in C++, say Calculator, with all the methods and accessors you need to perform a calculation. Then, have your separate dynamic libraries or .so files implement a global function Calculator * create_calculator() that creates an instance of a class that derives from Calculator. Finally, you'll have to devise a registration mechanism so that your main program can determine the name of the dynamic library to load, based on some kind of identifier like a string, enum, or uuid. This would typically be available as an easily editable configuration file.
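#include <dlfcn.h>

void *handle;
Calculator *(*create_calculator)();
/* open the needed object file */
const char *libName = get_lib_name_from_config(identifier);
handle = dlopen(libName, RTLD_LOCAL | RTLD_LAZY);
if (handle == NULL) { /* handle the error; dlerror() returns a description */ }
/* find the address of the create_calculator function; the library must
   declare it extern "C" so that its name is not mangled */
create_calculator = reinterpret_cast<Calculator *(*)()>(dlsym(handle, "create_calculator"));
Calculator *calc = create_calculator();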
This scheme can be made more flexible (and complex) by allowing the create_calculator method name to vary, at the cost of having to obtain that from the config file as well.
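The library side of this scheme might look roughly like the sketch below. `SquareCalculator`, the `calculate` method and `destroy_calculator` are hypothetical names, since the answer does not specify the interface. The important detail is `extern "C"`, which turns off C++ name mangling so that dlsym can find the factory under the plain name "create_calculator".

```cpp
// Shared interface; normally this lives in a header that both the main
// program and every plugin include.
class Calculator {
public:
    virtual ~Calculator() = default;
    virtual double calculate(double input) const = 0;
};

// One concrete calculation, compiled into its own shared object.
class SquareCalculator : public Calculator {
public:
    double calculate(double input) const override { return input * input; }
};

// Factory looked up by the host via dlsym(handle, "create_calculator").
extern "C" Calculator* create_calculator() {
    return new SquareCalculator();
}

// Matching deleter, so the object is freed by the same module that
// allocated it.
extern "C" void destroy_calculator(Calculator* calculator) {
    delete calculator;
}
```

With GCC or Clang, such a file is typically built with something like `g++ -shared -fPIC square.cpp -o libsquare.so`. The file and library names here are placeholders.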
Opening shared libraries using dlopen() is certainly the first thing that comes to my mind; it's a fine plan.

Open-source C++ scanning library

Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function in such a way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C or C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then it wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: Sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).
I would say that Clang Index is a close fit. However I don't think that it stores data in a database.
Anyway, the Clang framework offers what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing/indexing capabilities. And since it's provided as a set of reusable libraries... it was crafted for being built upon!
I have to admit that I haven't used either because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that I don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.
If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of an API) to do that? If you can, I suggest you take a look at Understand.
It's not free (there's a trial version) but I found it very useful.
Maybe Doxygen with GraphViz could satisfy some of your constraints, but not all of them; for example, Doxygen's analysis is not incremental.