I have a program in C++ under development, which handles a lot of different types of objects which need to be retrieved from files on start-up. Total file size is a little less than 1GB. The program must run at least on Windows 7 and Linux.
My goal is to provide functionality for reading data from files and storing it in objects (including data validation). The solutions I can think of are:
1. Simply provide at least one function per object type to read from files and copy the data into newly created objects. This is what I've been doing so far.
2. Use a database to store the data and retrieve it using SQL.
I would like to switch to the second option, however I have a few limitations:
I am not allowed to connect to the internet with my program, so a remote server database is out of the question.
I am not allowed to make a program that forces the user to install additional software, in this case, a local database.
Access to the local data can be done only by running the program: no additional executables that the user can run to modify or read the files.
I have searched the internet for a C++ library that allows a program to store and retrieve specific data in local files using SQL syntax, but so far I've found nothing. The database software I know, like MySQL, PostgreSQL etc., won't do because of the restrictions I mentioned.
My main question is: is there such a library for C++, or do I have to implement my own?
I would prefer it were free, but I can do with anything I can find.
If there are no libraries like this, does anyone know of other alternatives for storing data locally for a C++ program? And if I were to implement my own, what file data structures could I use to store my objects without it being extremely slow for a little under 1GB of data?
SQLite fits all your requirements except the last one, which prohibits users from modifying the data outside your program. Unfortunately, this one would be tough to overcome: a sufficiently proficient and determined user will always be able to break this restriction, even if you store your data in plain files of a custom format.
Everything else fits perfectly: the database lives in a file, does not need a network connection, and can be installed as part of your program (it's a library that you link to your code).
OK, maybe I'm doing this wrong, but I want to teach PostgreSQL to serve files through extension functions written in C++. The project runs on a local network with maybe about ten connections. I do not want to use FTP or other external solutions for storing images, and I do not want to store the images in the database. I would like something like this: fs_select_file(id), fs_insert_file(id, escaped_bytea), fs_delete_file(id), e.g. SELECT id, name, fs_select_file(id) as escaped_bytea FROM .... But I can't find how to determine the data directory of the current database so that I can use [PGDATA]/files.
Don't mess with the datadir.
Even if you think it's safe to create your own contents within it, it probably isn't. Tools like pg_basebackup won't expect to see it.
Store your own content outside the PostgreSQL data directory. If you're working with a C extension that's easy enough, just let the user configure a storage location with a custom string GUC (see DefineCustomStringVariable).
Anyway, you can find the definition of the data_directory GUC in src/backend/utils/misc/guc.c. There you'll see that it's mapped to a global char * data_directory. A quick:
$ git grep data_directory src/include
src/include/utils/guc.h:extern char *data_directory;
shows that the extern for it is in guc.h. So you can access it with:
#include "utils/guc.h"
then using the data_directory global. But you shouldn't. Don't do this. Define your own custom GUC specifying a separate image store location.
While we're at it. C++.
PostgreSQL is a C application that uses longjmp based error handling. This goes with C++ exception handling about as well as a cigarette in an oxygen tent. C++ exceptions that cross PG_TRY boundaries will completely mangle the error stack. PostgreSQL error traps that cross a C++ stack subject to unwinding (any stack objects with destructors, try/catch blocks, etc) will completely mangle the C++ state.
It's possible, but hard, to combine PostgreSQL and C++. You have to isolate each piece via an interface of pure C code, and be paranoid about catching and translating error conditions at all boundaries.
If at all possible, you should instead just use C, or at least encapsulate your C++ code into a separate library that never includes PostgreSQL headers and never calls back into PostgreSQL code. The documentation discusses this in a little more detail.
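To make the "interface of pure C code" idea concrete, here is a minimal sketch of such a boundary. It is not PostgreSQL API: `fs_store_file_c`, `FsStatus` and `store_file_impl` are hypothetical names invented for illustration, and the C++ side is assumed never to include PostgreSQL headers. The only point it shows is that every C++ exception is caught and translated into a plain status code and message before control returns to the C caller, which can then raise a PostgreSQL error on its own side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical status codes shared by the C and C++ sides.
extern "C" {
enum FsStatus { FS_OK = 0, FS_BAD_ARGUMENT = 1, FS_OUT_OF_MEMORY = 2, FS_FAILED = 3 };
}

namespace {
// Pure C++ implementation, free to throw. The body is placeholder logic.
std::string store_file_impl(long id, const std::string& bytes) {
    if (id < 0) throw std::invalid_argument("negative id");
    if (bytes.empty()) throw std::runtime_error("nothing to store");
    return "stored file " + std::to_string(id);
}
}  // namespace

// C-callable entry point: no exception may escape from here.
// A short message is copied into the caller-provided buffer.
extern "C" int fs_store_file_c(long id, const char* data, std::size_t len,
                               char* msg, std::size_t msg_size) noexcept {
    auto report = [&](const char* text) {
        if (msg != nullptr && msg_size > 0) {
            std::strncpy(msg, text, msg_size - 1);
            msg[msg_size - 1] = '\0';
        }
    };
    try {
        if (data == nullptr && len > 0) {
            report("null data");
            return FS_BAD_ARGUMENT;
        }
        std::string result = store_file_impl(id, std::string(data ? data : "", len));
        report(result.c_str());
        return FS_OK;
    } catch (const std::invalid_argument& e) {
        report(e.what());
        return FS_BAD_ARGUMENT;
    } catch (const std::bad_alloc&) {
        report("out of memory");
        return FS_OUT_OF_MEMORY;
    } catch (const std::exception& e) {
        report(e.what());
        return FS_FAILED;
    } catch (...) {
        report("unknown failure");
        return FS_FAILED;
    }
}
```

The C side of the extension would call `fs_store_file_c`, inspect the return value, and only then report an error through PostgreSQL's own mechanism, so that no longjmp ever passes through a C++ frame.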
SELECT current_setting('data_directory');
This will show you the fully qualified path to the data directory. I'm not clear why you need this, because mucking around in the data files is usually a good way to corrupt your database. Anyway, I think this is the answer you're looking for.
I am working on a game, and one of the requirements of the licence agreement for the sound assets I am using is that they be distributed in a way that makes them inaccessible to the end user. So I am thinking about aggregating them into a flat file, encrypting them, or some such. The problem is that the sound library I am using (Hekkus Sound System) only accepts a 'char*' file path and handles file reading internally. So, if I am to continue to use it, I will have to override the C stdio file functions to handle encryption or whatever I decide to do. This seems doable, but it worries me. Looking on the web, I see people running into strange, frustrating problems doing this on the platforms I am concerned with (Win32, Android and iOS).
Does there happen to be a cross-platform library out there that takes care of this? Is there a better approach entirely you would recommend?
Do you have the option of using a named pipe instead of an ordinary file? If so, you can present the pipe to the sound library as the file to read from, and you can decrypt your data and write it to the pipe, no problem. (See Beej's Guide for an explanation of named pipes.)
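As a rough POSIX-only sketch of the named-pipe idea (so it does not cover Win32, and it only works if the library reads the file sequentially, since a pipe cannot be seeked): `read_like_library` is a stand-in I made up for a library that only takes a path, and `play_through_fifo` is a hypothetical helper. The decryption step itself is left out; the string passed in is assumed to be already decrypted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stand-in for a library that only accepts a path and reads it with stdio.
std::string read_like_library(const char* path) {
    std::string out;
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return out;
    char buf[4096];
    std::size_t n = 0;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) out.append(buf, n);
    std::fclose(f);
    return out;
}

// Creates a FIFO, writes the decrypted bytes into it from a child process,
// and lets the "library" read them through the path. Returns true on success.
bool play_through_fifo(const std::string& fifo_path, const std::string& decrypted,
                       std::string& received) {
    if (mkfifo(fifo_path.c_str(), 0600) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) {
        unlink(fifo_path.c_str());
        return false;
    }
    if (pid == 0) {
        // Child: open() blocks until the reader has opened the other end.
        int fd = open(fifo_path.c_str(), O_WRONLY);
        if (fd < 0) _exit(1);
        const char* p = decrypted.data();
        std::size_t left = decrypted.size();
        while (left > 0) {
            ssize_t w = write(fd, p, left);
            if (w <= 0) _exit(1);
            p += w;
            left -= static_cast<std::size_t>(w);
        }
        close(fd);
        _exit(0);
    }
    // Parent: the library sees an ordinary path.
    received = read_like_library(fifo_path.c_str());
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(fifo_path.c_str());
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In a real program the writer would more likely be a thread than a forked child, and the plain data never touches the disk; it only passes through the kernel's pipe buffer.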
Overriding stdio so that a library whose internals you don't know behaves in a way its developer never had in mind does not look like the right approach to me, and it isn't easy either. Implementing a RAM drive takes so much effort that I would recommend looking for another audio library instead.
The Hekkus Sound System I found was built by a single person and last updated in 2012. I wouldn't rely on a library with only one person working on it and no access to the sources.
My advice: invest your time in finding a proper sound library instead of a fishy workaround for this one.
One possibility is to use an encrypted loopback filesystem (google for additional resources).
The way this works is that you put your assets on an encrypted filesystem, which actually lives in a single ordinary file. This filesystem gets mounted somewhere as a loopback device. A password needs to be supplied at attach/mount time. Once mounted, all files are available to your software as regular files. Otherwise, the files are encrypted and inaccessible.
It's compiler-dependent and not a guaranteed feature, but many allow you to embed files/resources directly into the exe and read them in your code as if from disk. You could embed your sound files that way. It will significantly increase the size of your exe however.
Another UNIX-based approach:
The environment variable LD_PRELOAD can be used to override any shared library an executable has been linked against. All symbols exported by a library mentioned in LD_PRELOAD are resolved to that library, including calls to libc functions like open, read, and close. Using libdl (dlsym with RTLD_NEXT), it is also possible for the wrapping library to call through to the original implementation.
So, all you need to do is to start the process which uses the Hekkus Sound System in an environment that has LD_PRELOAD set appropriately, and you can do anything you like to the file that it reads.
Note, however, that there is absolutely no way to keep the data inaccessible to the user: the very fact that he has to be able to hear it means he has to have access. Even if all software in the chain used encryption, and your user were not willing to hack hardware, it would not be exactly difficult to connect the audio output jack to an audio input jack, would it? And you can't forbid your user from using earphones, can you? And, of course, the kernel can see all audio output unencrypted and can send a copy somewhere else...
The solution to your problem would be a ramdisk.
http://en.wikipedia.org/wiki/RAM_drive
That is, using a piece of RAM as if it were a disk.
There is software available for this too. Caching databases in RAM is becoming popular.
It also keeps the file off the disk, where it would be easily accessible to the user.
Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educative but I feel I could just do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.
What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right library (probably a very native one) you could take a true snapshot of your running program, similar to what an OS-specific "hibernate" does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need a process external to the one you want to save. I highly recommend you don't do that. Instead, look up how to do serialization or object marshalling in your language of choice (btw welcome to SO, don't tag every language... that pisses people off). Hint: most people these days pick JSON.
I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error-prone, however, so you may find this SO question of interest: Persisting graph data (Java)
There are a couple of frameworks around this. Check out Google Protocol Buffers if you need support for Java, Python, and C++ https://developers.google.com/protocol-buffers/ I've used it in some projects and it works well.
There's also Thrift (originally from Facebook) http://thrift.apache.org/ I don't have any experience with it though.
Another option is what @QuentinUK suggests. Use a class that inherits from something streamable and/or make streamable operators/functions.
I'd use a framework.
Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers, which stored relative addresses. These were safe to serialize en masse. And applications did so, saving both code and data; the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
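A minimal sketch of that manual walk, under my own assumptions: a hypothetical `Node` type with two raw-pointer links, all nodes owned by one pool, and a simple text format. On save each pointer is replaced by the index of the node it points to; on load the nodes are created first and the indices are turned back into pointers, whatever addresses the new nodes happen to get.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical node type: links are raw pointers into the same pool.
struct Node {
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
};

using Pool = std::vector<std::unique_ptr<Node>>;  // owns every node

constexpr std::int64_t kNull = -1;  // on-disk marker for a null link

// Writes each node as "value left_index right_index".
void save(const Pool& pool, std::ostream& out) {
    std::unordered_map<const Node*, std::int64_t> index_of;
    for (std::size_t i = 0; i < pool.size(); ++i)
        index_of[pool[i].get()] = static_cast<std::int64_t>(i);
    auto link = [&](const Node* p) -> std::int64_t {
        return p ? index_of.at(p) : kNull;  // at() throws for pointers outside the pool
    };
    out << pool.size() << '\n';
    for (const auto& n : pool)
        out << n->value << ' ' << link(n->left) << ' ' << link(n->right) << '\n';
}

// Recreates the nodes, then converts indices back into pointers.
Pool load(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    Pool pool;
    for (std::size_t i = 0; in && i < count; ++i) pool.push_back(std::make_unique<Node>());
    auto resolve = [&](std::int64_t idx) -> Node* {
        return idx == kNull ? nullptr : pool.at(static_cast<std::size_t>(idx)).get();
    };
    for (auto& n : pool) {
        std::int64_t l = kNull, r = kNull;
        in >> n->value >> l >> r;
        if (!in) throw std::runtime_error("truncated or malformed input");
        n->left = resolve(l);
        n->right = resolve(r);
    }
    if (!in) throw std::runtime_error("truncated or malformed input");
    return pool;
}
```

Because links are stored as indices, shared nodes and even cycles survive the round trip, which a naive recursive dump of a tree would not handle.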
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.
Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
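MFC is one way to get this; as a sketch of the same idea using only the standard library (my assumption, not what the answer above prescribes), you can give a type its own streaming operators. `Player` is a hypothetical example type, and `std::quoted` needs C++14.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical type used only for illustration.
struct Player {
    std::string name;
    int level = 0;
    double health = 0.0;
};

// Writes the fields in a fixed order; quoted() keeps names with spaces intact.
std::ostream& operator<<(std::ostream& out, const Player& p) {
    return out << std::quoted(p.name) << ' ' << p.level << ' ' << p.health;
}

// Reads the fields back in the same order; p is only changed on success.
std::istream& operator>>(std::istream& in, Player& p) {
    Player tmp;
    if (in >> std::quoted(tmp.name) >> tmp.level >> tmp.health) p = tmp;
    return in;
}
```

With these in place, `file << player;` and `file >> player;` work on any `std::ofstream` or `std::ifstream`. Note that pointers between objects still need the kind of translation discussed in the other answers.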
I am writing an app in C++, and currently it reads models and parameters from several data files. Those files, i.e. self-defined dictionaries, are currently stored in plain text and loaded dynamically by the C++ code at runtime.
However, I don't want those files to be easily readable by my client when they get the released application, so I need to encrypt the files first. What's the general practice for this situation?
Those files are also huge, so compiling them into a resource file is not a good option.
Actually I just need a simple 'encryption', so that at least nothing is stored as plain text in the released version. And I don't want an encryption library that loads the whole file into memory before decrypting it, since the files are huge and there is no need to hold an entire file in memory at one time.
Thanks!
Usually, when people want to deal with encryption in C++, they tend to go for the OpenSSL libraries, which cover all of this functionality in a pretty standard way.
You'd have to get yourself a copy of the library and some code samples, but it's a pretty common thing and there's lots of documentation around.
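Whatever cipher you end up with, the "don't load the whole file" part comes down to processing the data in fixed-size chunks; OpenSSL's EVP cipher functions are also fed piece by piece. The sketch below shows only that chunking pattern, with a trivial XOR as a placeholder. XOR with a repeating key is obfuscation, not real encryption, and `xor_stream` is a hypothetical helper, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Transforms `in` to `out` in 64 KiB chunks, so memory use stays constant
// regardless of file size. Applying it twice with the same key restores the
// input. Placeholder transform only: XOR is NOT secure encryption.
std::size_t xor_stream(std::istream& in, std::ostream& out, const std::string& key) {
    if (key.empty()) throw std::invalid_argument("key must not be empty");
    std::array<char, 64 * 1024> chunk;
    std::size_t total = 0;  // bytes processed so far, keeps the key position across chunks
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < got; ++i)
            chunk[i] ^= key[(total + i) % key.size()];
        out.write(chunk.data(), static_cast<std::streamsize>(got));
        total += got;
    }
    return total;
}
```

For real files, open both streams with `std::ios::binary`. To read a dictionary piece by piece at runtime, the same loop can hand each decoded chunk to the parser instead of writing it out.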
I was wondering if there is a (C++ or D) serialisation library that also provides techniques to locate a certain object (which was saved to disk) based on certain criteria, such as a particular combination of attributes.
I know about SQLite, MySQL and so on, but I am searching for an alternative.
Because those databases are not bound to a specific schema, one known at compile time (of the library), they can only be so good.
A library that knows the structures at compile time can be optimized to a very big degree for that structure.
Maybe there is even a library creator, as input you give your c++ classes and desired location/identity attributes and as output you get a serialization/database-library that is heavily optimized for locating the objects based on your needs.
Additionally, I think SQLite and similar are overkill for my use, as I don’t need all the SQL features, only the locating of an object based on its attributes.
Greetings,
--Marenz
I have a work-in-progress Serialization lib that I've been working on. (Some commentary here.) It's not done but it works, it just lacks a lot of polish and "convenience" features. If you are interested in using it, I'd be interested in feedback and feature requests.
As for DB storage, I'd just go with SQLite or MySQL. From what I've read, the query optimizers for DBs use more information than you can have at compile time (they look at data distributions and such).
OTOH I have been thinking of making a compile time SQL engine that uses meta-programming to build query plans at compile time. I've got a few other projects that I'll need to get done first though (like a file_malloc to allocate space in a file.)