Server API PostgreSQL: how to find the path to the database? - c++

OK, maybe I'm doing this wrong, but I want to teach PostgreSQL to serve files through extension functions written in C++. The project runs on a local network with maybe about ten connections. I do not want to use FTP or other external solutions for storing images, and I do not want to store the images in the database. I would like something like this: fs_select_file(id), fs_insert_file(id, escaped_bytea), fs_delete_file(id), e.g. SELECT id, name, fs_select_file(id) as escaped_bytea FROM .... But I can't find how to determine the path of the current database so that I can use [PGDATA]/files.

Don't mess with the datadir.
Even if you think it's safe to create your own contents within it, it probably isn't. Tools like pg_basebackup won't expect to see it.
Store your own content outside the PostgreSQL data directory. If you're working with a C extension that's easy enough, just let the user configure a storage location with a custom string GUC (see DefineCustomStringVariable).
Anyway, you can find the definition of the data_directory GUC in src/backend/utils/misc/guc.c. There you'll see that it's mapped to a global char * data_directory. A quick:
$ git grep data_directory src/include
src/include/utils/guc.h:extern char *data_directory;
shows that the extern for it is in guc.h. So you can access it with:
#include "utils/guc.h"
then using the data_directory global. But you shouldn't. Don't do this. Define your own custom GUC specifying a separate image store location.
While we're at it. C++.
PostgreSQL is a C application that uses longjmp based error handling. This goes with C++ exception handling about as well as a cigarette in an oxygen tent. C++ exceptions that cross PG_TRY boundaries will completely mangle the error stack. PostgreSQL error traps that cross a C++ stack subject to unwinding (any stack objects with destructors, try/catch blocks, etc) will completely mangle the C++ state.
It's possible, but hard, to combine PostgreSQL and C++. You have to isolate each piece via an interface of pure C code, and be paranoid about catching and translating error conditions at all boundaries.
If at all possible, you should instead just use C, or at least encapsulate your C++ code into a separate library that never includes PostgreSQL headers and never calls back into PostgreSQL code. The documentation discusses this in a little more detail.
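A minimal sketch of such a boundary, assuming a hypothetical function imgstore_checksum and hypothetical status codes (none of this is PostgreSQL API). The C++ implementation is free to throw, but the extern "C" wrapper catches everything and hands back a plain status code plus a message, so no exception ever reaches the C caller:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical status codes for the C-facing interface.
enum imgstore_status { IMGSTORE_OK = 0, IMGSTORE_BAD_ARG = 1, IMGSTORE_FAILED = 2 };

namespace {

// Ordinary C++ code: free to throw and to use objects with destructors.
unsigned long checksum_impl(const unsigned char* data, std::size_t len) {
    if (data == nullptr) throw std::invalid_argument("null buffer");
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; ++i) sum = sum * 31 + data[i];
    return sum;
}

// Copy a message into a caller-supplied buffer, always NUL-terminated.
void copy_message(char* buf, std::size_t len, const char* msg) {
    if (buf == nullptr || len == 0) return;
    std::strncpy(buf, msg, len - 1);
    buf[len - 1] = '\0';
}

}  // namespace

// Pure C interface: no exception escapes; errors become status codes.
extern "C" int imgstore_checksum(const unsigned char* data, std::size_t len,
                                 unsigned long* out,
                                 char* errbuf, std::size_t errlen) {
    try {
        if (out == nullptr) return IMGSTORE_BAD_ARG;
        *out = checksum_impl(data, len);
        return IMGSTORE_OK;
    } catch (const std::invalid_argument& e) {
        copy_message(errbuf, errlen, e.what());
        return IMGSTORE_BAD_ARG;
    } catch (...) {
        copy_message(errbuf, errlen, "unknown error");
        return IMGSTORE_FAILED;
    }
}
```

The C file that includes the PostgreSQL headers would call imgstore_checksum and, on a nonzero status, raise the error with ereport only after the call has returned, so the longjmp never unwinds through C++ frames.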

SELECT current_setting('data_directory');
This will show you the fully qualified path to the data directory. I'm not clear why you need this, because mucking around in the data files is usually a good way to corrupt your database. Anyway, I think this is the answer you're looking for.

Related

Precompile script into objects inside C++ application

I need to provide my users the ability to write mathematical computations into the program. I plan to have a simple text interface with a few buttons including those to validate the script grammar, save etc.
Here's where it gets interesting. These functions the user is writing need to execute at multi-megabyte line speeds in a communications application. So I need the speed of a compiled language, but the usage of a script. A fully interpreted language just won't cut it.
My idea is to precompile the saved user modules into objects at initialization of the C++ application. I could then use these objects to execute the code when called upon. Here are the workflows I have in mind:
1) Testing(initial writing) of script: Write code in editor, save, compile into object (testing grammar), run with test I/O, Edit Code
2) Use of Code (Normal operation of application): Load script from file, compile script into object, Run object code, Run object code, Run object code, etc.
I've looked into several off the shelf interpreters, but can't find what I'm looking for. I considered JAVA, as it is pretty fast, but I would need to load the JAVA virtual machine, which means passing objects between C and the virtual machine... The interface is the bottleneck here. I really need to create a native C++ object running C++ code if possible. I also need to be able to run the code on multiple processors effectively in a controlled manner.
I'm not looking for the whole explanation on how to pull this off, as I can do my own research. I've been stalled for a couple days here now, however, and I really need a place to start looking.
As a last resort, I will create my own scripting language to fulfill the need, but that seems a waste with all the great interpreters out there. I've also considered taking an existing open source compiler and slicing it up for the functionality I need... just not saving the compiled results to disk... I don't know. I would prefer to use a mainline language if possible... but that's not required.
Any help would be appreciated. I know this is not your run of the mill idea I have here, but someone has to have done it before.
Thanks!
P.S.
One thought that just occurred to me while writing this was this: what about using a true C compiler to create object code, save it to disk as a dll library, then reload and run it inside "my" code? Can you do that with MS Visual Studio? I need to look at the licensing of the compiler... how to reload the library dynamically while the main application continues to run... hmmmmm I could then just group the "functions" created by the user into library groups. Ok that's enough of this particular brain dump...
A possible solution could be to use gcc (MinGW, since you are on Windows) and build a DLL out of your user-defined code. The DLL should export just one function. You can use the Win32 API to handle the DLL (LoadLibrary/GetProcAddress etc.). At the end of this job you have a C-style function pointer. The problem now is the arguments. If your computation has just one parameter you can do a cast to double (*funct)(double), but if you have many parameters you need to match them.
I think I've found a way to do this using standard C.
1) Standard C needs to be used because when it is compiled into a dll, the resulting interface is cross compatible with multiple compilers. I plan to do my primary development with MS Visual Studio and compile objects in my application using gcc (windows version)
2) I will expose certain variables to the user (inputs and outputs) and standardize them across units. This allows multiple units to be developed with the same interface.
3) The user will only create the inside of the function using standard C syntax and grammar. I will then wrap that function with text to fully define the function and its environment (remember those variables I intend to expose?) I can also group multiple functions under a single executable unit (dll) using name parameters.
4) When the user wishes to test their function, I dump the dll from memory, compile their code with my wrappers in gcc, and then reload the dll into memory and run it. I would let them define inputs and outputs for testing.
5) Once the test/create step was complete, I have a compiled library created which can be loaded at run time and handled via pointers. The inputs and outputs would be standardized, so I would always know what my I/O was.
6) The only problem with standardized I/O is that some of the inputs and outputs are likely to not be used. I need to see if I can put default values in or something.
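On point 6, one possible approach is to give every field of the standardized I/O blocks a default and have the host reset the outputs before each call. The sketch below assumes hypothetical names (UnitInputs, UnitOutputs, unit_fn, run_unit, sample_unit) and three doubles in each direction; in the real design the function pointer would come from GetProcAddress on the compiled DLL, not from a function in the same program, and the C wrapper text would declare the same two structs without the initializers:

```cpp
#include <cassert>

// Hypothetical standardized I/O blocks shared by every user-written unit.
// Fields a unit never touches keep these default values.
struct UnitInputs  { double A = 0.0; double B = 0.0; double C = 0.0; };
struct UnitOutputs { double X = 0.0; double Y = 0.0; double Z = 0.0; };

// Every unit exposes the same C-style signature and returns 0 on success.
using unit_fn = int (*)(const UnitInputs* in, UnitOutputs* out);

// Stand-in for a user-written unit. It reads only A and B and writes only X.
extern "C" int sample_unit(const UnitInputs* in, UnitOutputs* out) {
    out->X = in->A * in->B;
    return 0;
}

// Host-side call: outputs are reset to their defaults before every run,
// so stale values from a previous call can never leak through.
inline bool run_unit(unit_fn fn, const UnitInputs& in, UnitOutputs& out) {
    assert(fn != nullptr);
    out = UnitOutputs{};
    return fn(&in, &out) == 0;
}
```

With this layout the host always knows what an untouched output looks like (0.0 here), which is one way to tell "not used" apart from a computed result if 0.0 is not a meaningful answer for the unit.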
So, to sum up:
Think of an app with a text box and a few buttons. You are told that your inputs are named A, B, and C and that your outputs are X, Y, and Z of specified types. You then write a function using standard C code, and with functions from the specified libraries (I'm thinking math etc.)
So now you're done... you see a few boxes below to define your input. You fill them in and hit the TEST button. This would wrap your code in a function context, dump the existing dll from memory (if it exists) and compile your code along with any other functions in the same group (another parameter you could define, basically just a name to the user.) It then runs the function using a function pointer, using the inputs defined in the UI. The outputs are sent to the user so they can determine if their function works. If there are any compilation errors, those would also be shown to the user.
Now it's time to run for real. Of course I kept track of what functions are where, so I dynamically open the dll, and load all the functions into memory with functional pointers. I start shoving data into one side and the functions give me the answers I need. There would be some overhead to track I/O and to make sure the functions are called in the right order, but the execution would be at compiled machine code speeds... which is my primary requirement.
Now... I have explained what I think will work in two different ways. Can you think of anything that would keep this from working, or perhaps any advice/gotchas/lessons learned that would help me out? Anything from the type of interface to tips on dynamically loading dll's in this manner to using the gcc compiler this way... etc would be most helpful.
Thanks!

Local/dedicated sql database for a program

I have a program in C++ under development, which handles a lot of different types of objects which need to be retrieved from files on start-up. Total file size is a little less than 1GB. The program must run at least on Windows 7 and Linux.
My goal is to provide the functionality for reading from files, and storing them into objects (including data validation), so the solutions I can think of are:
simply provide at least one function per each object type to read from files, and copy the data to newly created objects. This is what I've been doing so far.
Use a database to store data and retrieve it using SQL.
I would like to switch to the second option, however I have a few limitations:
I am not allowed to connect to the internet with my program, so a remote server database is out of the question.
I am not allowed to make a program that forces the user to install additional software, in this case, a local database.
Access to the local data can be done only by running the program: no additional executables that the user can run to modify or read the files.
I have searched the internet for a c++ library that allows a program to store/retrieve specific data to local files, using SQL syntax, but so far I've got nothing. Database software I know, like MySQL, Postgresql etc. won't do because of the restrictions I mentioned.
My main question is: is there such library for c++, or do I have to implement my own?
I would prefer it were free, but I can do with anything I can find.
If there are no libraries like this, does anyone know any other alternatives for storing data locally for a c++ program? And if I were to implement my own, what way (as in file data structures) is there to store my objects (without it being extremely slow for less than 1GB of data)?
SQLite fits all your requirements except the last one, which prohibits users from modifying the data outside your program. Unfortunately, this one would be tough to overcome: a sufficiently proficient and determined user will always be able to break this restriction, even if you store your data in plain files of a custom format.
Everything else fits perfectly: the database lives in a file, does not need a network connection, and can be installed as part of your program (it's a library that you link to your code).

The best way to handle config in large C++ project

In order to start my C++ program, I need to read some configs, e.g. ip address, port number, file paths... These settings may change quite frequently (every week or everyday!), so hardcoding them into source files is not a good idea.
After some research, I'm confused about whether there is a best practice for loading config settings from a file and making those configs available to other classes/modules/*.cpp files in the same project.
static is bad; singleton is bad (an anti-pattern?) So, what other options do we have? Or, maybe the idea of "config file" is wrong?
EDIT: I have no problem of loading the config file. I'm worried about, after loading all those settings into a std::map< string, string > in memory, how to let other classes, functions access those settings.
EDIT 2: Thanks for everybody's input. I know these patterns that I listed here are FINE, and they are used by lots of programs. I'm curious about whether there is a (sort of) BEST pattern to handle configurations of a program.
Arguably, a configuration file is a legitimate use for a Singleton. The Singleton pattern is usually frowned upon because Singletons cause problems with race conditions in a multi-threaded environment, and since they're globally accessible, you run into the same problems you have with globals. But if your Singleton object is initialized once when you read in the config file, and never altered after that, I can't think of a legitimate reason to call it an "anti-pattern" other than some sort of cargo-cult mentality.
That being said, when I need to make a configuration file available as an object to my application, I don't use a Singleton. Usually I pass the configuration object around to those objects/functions which need it.
The best pattern I know of for solving this is an options class that gets injected into your code on creation/configuration.
Steps:
create an options parser class
configure the parser on what parameters and options it should accept, and their default values (default values can be your "most probable" defaults)
write client code to accept options as parameters (instead of singleton and/or static stuff).
inject options when creating objects.
Have a look at boost.program_options for an already mature module for program options.
If you're familiar with Python, have a look at the examples in the argparse documentation (same concept, implemented in the Python standard library). They make the concept and the interactions easy to grasp.
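The steps above can be sketched roughly as follows. This assumes the settings have already been loaded into a std::map<string, string> as described in the question; ServerOptions, make_options, Listener and the key names are hypothetical, and a real project might let boost.program_options do the parsing instead:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical typed options with "most probable" defaults.
struct ServerOptions {
    std::string ip = "127.0.0.1";
    int port = 8080;
    std::string data_dir = "./data";
};

// Build the options once from the already-loaded key/value map.
// Unknown keys are ignored; missing keys keep their defaults.
inline ServerOptions make_options(const std::map<std::string, std::string>& kv) {
    ServerOptions opts;
    auto it = kv.find("ip");
    if (it != kv.end()) opts.ip = it->second;
    it = kv.find("port");
    if (it != kv.end()) opts.port = std::stoi(it->second);  // throws if not numeric
    it = kv.find("data_dir");
    if (it != kv.end()) opts.data_dir = it->second;
    return opts;
}

// Client code receives its options through the constructor
// instead of reaching for a singleton or a static.
class Listener {
public:
    explicit Listener(const ServerOptions& opts) : ip_(opts.ip), port_(opts.port) {
        assert(port_ > 0 && port_ < 65536);
    }
    std::string endpoint() const { return ip_ + ":" + std::to_string(port_); }

private:
    std::string ip_;
    int port_;
};
```

Because Listener only sees a ServerOptions value, a unit test can construct one by hand with whatever settings it needs, with no config file and no global state involved.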

hibernate-like saving state of a program

Is there any way in C++ or Java or Python that would allow me to save the state of my program, no questions asked? For example, I've spent an hour learning how to save a tree-like structure into a file. Very educative but I feel I could just do:
saveState(file);
And the "file" would contain whole memory my program uses. Just like operating system's "hibernate" or "suspend-to-disk" feature. I know about boost serialization, this is probably not what I'm looking for.
What you most likely want is what we call serialization or object marshalling. There are a whole butt load of academic problems with data/object serialization that you can easily google.
That being said, given the right library (probably very native) you could do a true snapshot of your running program, similar to what an OS-specific hibernate does. Here is an SO answer for doing that on Linux: https://stackoverflow.com/a/12190830/318174
To do the above snapshotting, though, you will most likely need a process external to the one you want to save. I highly recommend you don't do that. Instead, look up how to do serialization or object marshalling in your language of choice (btw welcome to SO, don't tag every language... that pisses people off)... hint... most people these days pick JSON.
I think that what you describe would be a feature that few people would actually want to use for a real system. Usually you want to save something so it can be transmitted, or so you can stop running the program, or guard against the possibility that the program quits (or power fails).
In most production systems one wants to make the writes to disk small and incremental so that the system can remain responsive, and writing inconsistent data can be avoided. Writing ALL memory to disk on a regular basis would probably result in lots of non-responsive time. You would need to lock the entire system to avoid inconsistent state.
Writing your own persistence is tedious and error prone however so you may find this SO question of interest: Persisting graph data (Java)
There are a couple of frameworks around this. Check out Google Protocol Buffers if you need support for Java, Python, and C++ https://developers.google.com/protocol-buffers/ I've used it in some projects and it works well.
There's also Thrift (originally from Facebook) http://thrift.apache.org/ I don't have any experience with it though.
Another option is what @QuentinUK suggests. Use a class that inherits from something streamable and/or make streamable operators/functions.
I'd use a framework.
Here's your problem:
http://en.wikipedia.org/wiki/Address_space_layout_randomization
Back in ancient history (16-bit DOS programs with extenders), compilers used to support "based" pointers which stored relative addresses. These were safe to serialize en masse. And applications did so, saving both code and data, the serialized modules were called "overlays".
Today, you'd need based pointer support in your toolchain (resulting in every pointer access requiring an extra adjustment), or else to go through all the data, distinguishing the pointers from the other data (how?) and adjusting them to their new storage location, in case the OS already loaded some library at the same address your old program had used for its heap. In modern "managed" environments, where pointers already have to be identified for the garbage collector, this is feasible even if not commonly done. In native code, it's very difficult, although that metadata is created to enable relocation of shared libraries.
So instead people end up walking their entire data structures manually, and converting object links (pointers) into something that can be restored on the other end, even though the object has a new address (again, because the old address may have been used for a shared library).
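A minimal sketch of that manual walk, assuming a hypothetical tree type (Node) and a hypothetical flat record (FlatNode). Pointers are replaced by indices into a vector when saving, and fresh objects are allocated and re-linked when loading, so the result does not depend on any address:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// In-memory form: nodes linked by pointers.
struct Node {
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;
};

// Flat form: each record names its parent by index (-1 for the root).
struct FlatNode {
    int value;
    int parent;
};

// Pre-order walk, so a parent is always written before its children.
inline void flatten(const Node& n, int parent, std::vector<FlatNode>& out) {
    const int self = static_cast<int>(out.size());
    out.push_back(FlatNode{n.value, parent});
    for (const auto& child : n.children) flatten(*child, self, out);
}

// Rebuild: allocate new nodes and re-create the links from the indices.
// Assumes record 0 is the only root and parents precede their children.
inline std::unique_ptr<Node> rebuild(const std::vector<FlatNode>& flat) {
    std::vector<Node*> by_index(flat.size(), nullptr);
    std::unique_ptr<Node> root;
    for (std::size_t i = 0; i < flat.size(); ++i) {
        std::unique_ptr<Node> n(new Node);
        n->value = flat[i].value;
        by_index[i] = n.get();
        if (flat[i].parent < 0) {
            assert(i == 0);
            root = std::move(n);
        } else {
            assert(static_cast<std::size_t>(flat[i].parent) < i);
            by_index[flat[i].parent]->children.push_back(std::move(n));
        }
    }
    return root;
}
```

The vector of FlatNode records contains only plain integers, so it can be written to disk in whatever format is convenient and read back in a process where every object lands at a different address.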
Note that many processors have features to support based addressing... and that since based addressing is no longer common, compilers went ahead and used those pointer arithmetic features to speed up user code.
Yes, derive objects from a streamable class and add the streaming functions. Then you can stream everything to disk. You will need a library for this such as MFC.
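As a rough illustration of the idea, here is a sketch that uses standard iostreams in place of MFC. The Streamable base class and the Point type are hypothetical, and the text format (two integers separated by a space) is just an assumption for the example:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical base class: anything streamable can write and read itself.
struct Streamable {
    virtual ~Streamable() {}
    virtual void save(std::ostream& os) const = 0;
    virtual void load(std::istream& is) = 0;
};

// Example object: two integers written as "x y" on one line.
struct Point : Streamable {
    int x = 0;
    int y = 0;
    void save(std::ostream& os) const override { os << x << ' ' << y << '\n'; }
    void load(std::istream& is) override { is >> x >> y; }
};

// Stream operators that dispatch to the virtual functions.
inline std::ostream& operator<<(std::ostream& os, const Streamable& s) {
    s.save(os);
    return os;
}
inline std::istream& operator>>(std::istream& is, Streamable& s) {
    s.load(is);
    return is;
}
```

Swapping the std::stringstream used for testing with a std::ofstream or std::ifstream gives the same round trip against a file on disk.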

C++: Any way to 'jail function'?

Well, it's a kind of a web server.
I load .dll(.a) files and use them as program modules.
I recursively go through the directories and put the '_main' functors from these libraries into a std::map under the name given in special '.m' files.
The main directory has few directories for each host.
The problem is that I need to prevent the use of 'fopen' or any other filesystem functions on directories outside of this host directory.
The only way I can see to do that is to write a wrapper for stdio.h (I mean, write an s_stdio.h that has a filename check).
Maybe it could be a daemon, catching system calls and identifying something?
add
Well, and what about this kind of situation: I upload only the sources and then compile them directly on my server after checking them? Well, that's the only way I found (still having everything inside one address space).
As C++ is a low-level language and the DLLs are compiled to machine code, they can do anything. Even if you wrap the standard library functions, the code can make the system calls directly, reimplementing the functionality you have wrapped.
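For reference, the filename check itself might look like the sketch below. It assumes '/'-separated paths, treats the requested name as relative to the host directory, and works purely on the text of the path (symlinks are not resolved); normalize and is_inside are hypothetical names. It only guards modules that cooperate by going through the wrapper, for exactly the reason given above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lexically normalize a '/'-separated path: drop "." and empty components
// and resolve "..". Returns false if ".." would climb above the top level.
inline bool normalize(const std::string& path, std::vector<std::string>& parts) {
    parts.clear();
    std::string cur;
    for (std::size_t i = 0; i <= path.size(); ++i) {
        if (i == path.size() || path[i] == '/') {
            if (cur == "..") {
                if (parts.empty()) return false;
                parts.pop_back();
            } else if (!cur.empty() && cur != ".") {
                parts.push_back(cur);
            }
            cur.clear();
        } else {
            cur += path[i];
        }
    }
    return true;
}

// True if `requested`, joined onto host_dir, still lies inside host_dir.
inline bool is_inside(const std::string& host_dir, const std::string& requested) {
    std::vector<std::string> root, full;
    if (!normalize(host_dir, root)) return false;
    if (!normalize(host_dir + "/" + requested, full)) return false;
    if (full.size() < root.size()) return false;
    for (std::size_t i = 0; i < root.size(); ++i) {
        if (full[i] != root[i]) return false;
    }
    return true;
}
```

A wrapper around fopen would call is_inside first and then open the joined path (host directory plus requested name), never the raw string supplied by the module.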
Probably the only way to effectively sandbox such a DLL is some kind of virtualisation, so the code is not run directly but in a virtual machine.
The simpler solution is to use some higher-level language for the loadable modules that should be sandboxed. Some high-level languages are better at sandboxing (Lua, Java), others are not so good (e.g. AFAIK there is currently no official restricted environment implemented for Python).
If you are the one loading the module, you can perform a static analysis on the code to verify what APIs it calls, and refuse to link it if it doesn't check out (i.e. if it makes any kind of suspicious call at all).
Having said that, it's a lot of work to do this, and not very portable.