Accessing Data From Another Application - C++

I currently have a server (called worldserver) that, when it starts, pulls data from multiple sources (mainly the database), and every 5 minutes saves the modified data back into the database.
Since it is still in development, I am wondering: can I load all the data the worldserver needs at startup into a separate background application, and have the main server load the data from there instead?
Here's my logic:
Have two different applications: one that only stores data (the background application), and another that pulls data from it and updates the database every 5 minutes (the worldserver).
I want to eliminate the data reads between the worldserver and the database, and have it read the data from the background application instead (which realistically should never crash, since all it does is store data and sit in the machine's RAM).
The reason I am considering this is that RAM is the fastest storage on a PC, so it could cut the server's loading time significantly.
Is such a thing possible? And if it is, is it recommended? I want maximum performance and as much protection against data loss as possible.
I searched online for how to do this, but found a Stack Overflow thread saying that one process cannot read another process's data because the OS restricts it.
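For what it's worth, operating systems do isolate each process's private memory, but they also provide explicit sharing mechanisms such as shared memory and memory-mapped files. Below is a minimal sketch of what the background-application side could look like using POSIX shared memory; the segment name, size, and payload are placeholder values, and a real cache would need a defined data layout plus cross-process synchronization on top of this.

// Sketch of the "background application" holding data in a POSIX shared
// memory segment. Names and sizes are hypothetical placeholders.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const char* name = "/world_cache";          // hypothetical segment name
    const size_t size = 64 * 1024 * 1024;       // 64 MiB, sized for the example

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) != 0) { perror("ftruncate"); return 1; }

    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    // Load data from the database once and serialize it into `mem` here.
    std::memcpy(mem, "example payload", 16);

    // Keep running so the segment stays populated for the worldserver.
    pause();
}

The worldserver would open the same segment with shm_open(name, O_RDWR, 0600) and mmap() it; Boost.Interprocess or Windows file mappings give the same idea portably. Note this only speeds up loading; the periodic saves to the database are still what protects against data loss.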

Related

Flink RocksDB Performance issues

I have a Flink job (Scala) that basically reads from a Kafka topic (1.0), aggregates data (1-minute event-time tumbling window, using a fold function, which I know is deprecated but is easier to implement than an aggregate function), and writes the result to 2 different Kafka topics.
The question is: when I'm using the FS state backend, everything runs smoothly and checkpoints take 1-2 seconds, with an average state size of 200 MB - that is, until the state size increases (while closing a gap, for example).
I figured I would try RocksDB (over HDFS) for checkpoints - but the throughput is SIGNIFICANTLY lower than with the FS state backend. As I understand it, with the FS state backend Flink does not need to serialize/deserialize on every state access because the state is kept in memory (heap), whereas RocksDB DOES, and I guess that is what accounts for the slowdown (and the backpressure, and checkpoints taking MUCH longer, sometimes timing out after 10 minutes).
Still, there are times the state cannot fit in memory, and I am trying to figure out how to make the RocksDB state backend perform "better".
Is it because of the deprecated fold function? Do I need to fine-tune some parameters that are not easily found in the documentation? Any tips?
Each state backend holds the working state somewhere, and then durably persists its checkpoints in a distributed filesystem. The RocksDB state backend holds its working state on disk, and this can be a local disk, hopefully faster than HDFS.
Try setting state.backend.rocksdb.localdir (see https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/state/state_backends.html#rocksdb-state-backend-config-options) to somewhere on the fastest local filesystem on each taskmanager.
Turning on incremental checkpointing could also make a large difference.
Also see Tuning RocksDB.
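For reference, those two suggestions would look roughly like this in flink-conf.yaml (the local path is a placeholder for whatever fast disk each taskmanager actually has):

state.backend: rocksdb
state.backend.incremental: true
state.backend.rocksdb.localdir: /mnt/fast-local-disk/flink/rocksdb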

Writing to a SQLite database from multiple threads [duplicate]

I am writing a Qt application which monitors some statistics. In the main window you choose one or more items and open a graph window for them. Each item is polled from a different thread.
Every time I get data, it is written to an SQLite database, but I have encountered a problem:
I am developing this application on a computer with an SSD and it runs OK, but when I run it on a computer with an HDD the application crashes (the crash happens in the Qt SQLite DLL - qsqlite.dll). I googled the problem and found many people saying that SQLite is not recommended for such usage. I have also used QMutex to lock/unlock the sections where I write to the DB, but unfortunately it does not fix the problem.
Is there a way I can use SQLite for this, or must I look for a different database such as Postgres?
Thank you for your time!
I've never been a fan of multiple database connections within one application.
In your situation, I would look to implement a queue (FIFO) for all the write-SQL statements and send them to the database through only one connection. As you will be writing to the queue from several threads you will have to use mutexes etc to protect the write to the queue.
This way, the only thread contention is in your code and the SQLite drivers don't have to work too hard.
I would also consider caching the data from the read queries so you are only requesting data that you don't already have. That is, request, say, the most recent 100 values and keep them in an array or list; then, the next time you request values, only ask for those newer than the last one you already have in the array.
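A minimal sketch of that single-writer queue, using standard C++ threading and the SQLite C API (the class name and the idea of passing raw SQL strings are just for illustration; in a Qt application the worker would run QSqlQuery instead):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <sqlite3.h>

// All write statements are funneled through one worker thread, so the
// SQLite connection is never touched from more than one thread.
class SqlWriteQueue {
public:
    explicit SqlWriteQueue(sqlite3* db) : db_(db), worker_([this] { run(); }) {}
    ~SqlWriteQueue() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    // Any polling thread may call this; only the queue is touched under the mutex.
    void push(std::string sql) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(sql)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return done_ || !q_.empty(); });
            if (q_.empty()) return;                 // done_ set and queue drained
            std::string sql = std::move(q_.front());
            q_.pop();
            lk.unlock();
            // Error handling omitted for brevity.
            sqlite3_exec(db_, sql.c_str(), nullptr, nullptr, nullptr);
        }
    }
    sqlite3* db_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool done_ = false;
    std::thread worker_;   // declared last so it starts after the other members
};

Polling threads then just call push() with their INSERT statements, and the database connection only ever sees a single writer.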

Buffering to the hard disk

I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but that may run slower than the data is coming in, so I need to buffer the data. Due to the quantity of data coming in, the available RAM would be quickly exhausted, so it needs to overflow onto the hard disk. What I need is something like a filesystem-backed pipe, so the writer can be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slowly enough that the available RAM is exhausted, it should overflow to the filesystem. It's OK for writes to the disk to block.
Reading should block if no data is available, unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk, as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
This looks like a classic message queue. Did you consider MSMQ or something similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory (http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx) and use a local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of avoiding multithreading entirely.
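A rough sketch of the file-as-buffer idea, here with two threads in one process for brevity (the same scheme works across processes if the reader tracks its own offset). The file name and record size are placeholders, and this simple version always goes through the disk; adding a RAM fast path and rotating spool files to reclaim space is where the remaining work would be:

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <vector>

// One writer thread appends fixed-size records to a spool file; one reader
// thread consumes them at its own pace. Error handling is omitted.
class FileBackedQueue {
public:
    FileBackedQueue(const char* path, size_t recordSize) : recordSize_(recordSize) {
        out_ = std::fopen(path, "wb");       // create/truncate the spool file
        in_  = std::fopen(path, "rb");       // independent read position
    }
    ~FileBackedQueue() { std::fclose(out_); std::fclose(in_); }

    // Producer side: never blocks on the reader, only on the disk.
    void write(const char* record) {
        std::fwrite(record, 1, recordSize_, out_);
        std::fflush(out_);
        { std::lock_guard<std::mutex> lk(m_); ++written_; }
        cv_.notify_one();
    }

    // Consumer side: blocks until a record is available or close() was called.
    bool read(std::vector<char>& record) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return read_ < written_ || closed_; });
        if (read_ == written_) return false;     // closed and fully drained
        ++read_;
        lk.unlock();
        record.resize(recordSize_);
        std::fread(record.data(), 1, recordSize_, in_);
        return true;
    }

    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }

private:
    size_t recordSize_;
    std::FILE* out_ = nullptr;
    std::FILE* in_  = nullptr;
    std::mutex m_;
    std::condition_variable cv_;
    long long written_ = 0, read_ = 0;
    bool closed_ = false;
};

The writer only ever blocks on the disk, and the reader blocks until data is available or close() is called, which roughly matches the first three requirements above.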

Berkeley DB: DbEnv::lsn_reset takes a very long time

I'm using Berkeley DB with a relatively large database file (2.1 GiB, using the btree format, in case it matters). During application shutdown, DbEnv::lsn_reset is called in order to "flush" everything before exiting the application. For the large database, this routine takes a very long time for me, at least 10 minutes or so, during which there is heavy disk access.
Is this normal, or the result of using Berkeley DB in some wrong way? Is there anything that can be done to make this faster? In particular, which parameters could be tweaked to improve performance here?
DbEnv::lsn_reset() is probably not what you want. That function rewrites every single page in the database, so that you can close the databases out and open them in a different environment. It's going to write out at least 2.1 GiB, and pretty slowly.
If you're just shutting the application down to be started back up sometime later, you may just want to do a DbEnv::txn_checkpoint() to flush the database log and insert a checkpoint record. Even that isn't strictly required: as long as you have the logs committed to stable storage, you can simply exit your application.
http://docs.oracle.com/cd/E17276_01/html/api_reference/CXX/txncheckpoint.html
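For illustration, a shutdown path along those lines might look like this (assuming a transactional environment; handle setup is omitted, and the zero arguments mean "checkpoint unconditionally"):

#include <db_cxx.h>

// Shutdown sketch: flush the log and write a checkpoint record instead of
// calling DbEnv::lsn_reset(), which rewrites every page of the database.
void shutdownEnv(DbEnv& env, Db& db) {
    db.close(0);                 // close the database handle(s) first
    env.txn_checkpoint(0, 0, 0); // kbyte=0, min=0, flags=0 => checkpoint now
    env.close(0);
}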

MySQL++, storing realtime data

Firstly I'm an engineer, not a computer scientist, so please be gentle.
I currently have a C++ program which uses MySQL++. The program also incorporates the NI VISA runtime. One of the interrupt handlers receives data (1 byte) from a USB device about 200 times a second. I would like to store this data, with a time stamp on each sample, on a remote server. Is this feasible? Can anyone recommend a good approach?
Regards,
Michael
I think that performing 200 transactions/second against a remote server is asking a lot, especially when you consider that these transactions would be occurring in the context of an interrupt handler which has to do its job and get done quickly. I think it would be better to decouple your interrupt handler from your database access - perhaps have the interrupt handler store the incoming data and timestamp into some sort of in-memory data structure (array, circular linked list, or whatever, with appropriate synchronization) and have a separate thread that waits until data is available in the data structure and then pumps it to the database. I'd want to keep that interrupt handler as lean and deterministic as possible, and I'm concerned that database access across the network to a remote server would be too slow - or worse, would be OK most of the time, but sometimes would go to h*ll for no obvious reason.
This, of course, raises the question/problem of data overrun, where data comes in faster than it can be pumped to the database and the in-memory storage structure fills up. This could cause data loss. How bad a thing is it if you drop some samples?
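A sketch of that decoupling, assuming a single producer (the interrupt handler) and a single consumer (the database thread); the capacity and field layout are arbitrary, and on overrun the newest sample is simply dropped, which is exactly the trade-off raised above:

#include <atomic>
#include <cstddef>
#include <cstdint>

struct Sample {
    uint64_t timestampUs;   // timestamp taken in the interrupt handler
    uint8_t  value;         // the single byte from the USB device
};

// Single-producer / single-consumer ring buffer: the interrupt handler only
// calls push(), the database thread only calls pop(). No locks are taken in
// the handler, so it stays lean and deterministic.
class SampleRing {
public:
    bool push(const Sample& s) {                    // called from the handler
        const size_t head = head_.load(std::memory_order_relaxed);
        const size_t next = (head + 1) & (kCapacity - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                           // full: drop the sample
        buf_[head] = s;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(Sample& out) {                         // called from the DB thread
        const size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                           // empty
        out = buf_[tail];
        tail_.store((tail + 1) & (kCapacity - 1), std::memory_order_release);
        return true;
    }

private:
    static const size_t kCapacity = 4096;           // must be a power of two
    Sample buf_[kCapacity];
    std::atomic<size_t> head_{0}, tail_{0};
};

The database thread loops on pop(), batching samples into inserts, so a slow or flaky network connection costs at most some dropped samples rather than a stalled interrupt handler.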
I don't think you'll be able to maintain that speed with a separate insert per value, but if you batch the values up into large enough groups you can send them all as one query, and it should be fine.
INSERT INTO records(timestamp, value)
VALUES(1, 2), (3, 4), (5, 6), [...], (399, 400);
Just push the timestamp and value onto a buffer, and when the buffer hits 200 in size (or some other arbitrary figure), generate the SQL and send the whole lot off. Building this string up with sprintf shouldn't be too slow. Just beware of reading from a data structure that your interrupt routine might be writing to at the same time.
If you find that this SQL generation is too slow for some reason, and there's no quicker method using the API (eg. stored procedures), then you might want to run this concurrently with the data collection. Simplest is probably to stream the data across a socket or pipe to another process that performs the SQL generation. There are also multithreading approaches but they are more complex and error-prone.
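A sketch of building that batched statement from a buffer of samples; the struct, table, and column names just follow the example query above, and since both fields are numeric no escaping is needed:

#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Sample { uint64_t timestamp; unsigned value; };

// Turn a buffer of samples into one multi-row INSERT like the example above.
std::string buildBatchInsert(const std::vector<Sample>& batch) {
    std::ostringstream sql;
    sql << "INSERT INTO records(timestamp, value) VALUES ";
    for (size_t i = 0; i < batch.size(); ++i) {
        if (i) sql << ", ";
        sql << "(" << batch[i].timestamp << ", " << batch[i].value << ")";
    }
    return sql.str();
}

The resulting string can then be run through whatever the program already uses, e.g. conn.query(sql).execute() with MySQL++.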
In my opinion, you should do two things: 1. buffer the data, and 2. use one time stamp per buffer. The USB protocol is not byte-based but message-based. If you are tracking messages, then time stamp the messages.
Also, databases would rather receive blocks or chunks of data than one byte at a time. There is overhead in the database with each transaction. To measure the efficiency, divide the overhead by the number of bytes in the transaction. You'll see that large blocks are more efficient than lots of little transactions.
Another option is to store the data in a file, then use MySQL's LOAD DATA INFILE statement to load the data into the database. Alternatively, store the data in a buffer and use the MySQL C++ connector's stream interface to load it into the database.
Multi-threading doesn't guarantee being any faster than an apartment (single-threaded) model, even if you cached it correctly on the server side, unless there is some strange CPU priority preference at play. What about using shaders and letting the pass-by-reference value in windows.h be the time stamp?