How to limit database flushing to disk? - c++

I am using an SQLite database on my ARM9 embedded Linux platform. I want to reduce writes to the on-disk database because my disk is flash memory and its write cycles should be kept to a minimum. So I tried increasing SQLITE_DEFAULT_CACHE_SIZE to 5000. My intention was to have data written to the cache and flushed to disk automatically only when the cache fills up. But after increasing SQLITE_DEFAULT_CACHE_SIZE I cannot confirm whether this is working or not; I am not seeing any change in behaviour. Is my approach correct? Can anybody give me some suggestions?
Thanks
Aneesh

To stay ACID, SQLite flushes to disk on every commit, and on every insert/delete/update that is not wrapped in a transaction. Use transactions to group operations, or give up ACIDity and set PRAGMA synchronous=OFF.
With "PRAGMA synchronous = OFF", SQLite won't flush data at all (effectively leaving that to the OS cache).
SQLITE_DEFAULT_CACHE_SIZE only sets the size of the page cache, and that cache is used only for reading data.
There is another option - you can implement your own VFS layer and hold pages back until your own buffer is full. http://www.sqlite.org/c3ref/vfs.html
But I'm sure that synchronous=OFF (or, much better, transactions) will do the job well enough, although with synchronous=OFF you have a good chance of corrupting your database on a power failure or hard reset.
Another hint is to put the journal in memory or turn it off completely. Again, this gives up ACIDity, but it also removes some disk touches.
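A minimal sketch of the transaction/PRAGMA approach using the plain sqlite3 C API; the database path and the "samples" table are placeholders for illustration:

    #include <sqlite3.h>
    #include <cstdio>

    int main() {
        sqlite3 *db = nullptr;
        if (sqlite3_open("/data/app.db", &db) != SQLITE_OK) return 1;
        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS samples(val INTEGER);",
                     nullptr, nullptr, nullptr);

        // Trade durability for fewer writes - risks corruption on power loss.
        sqlite3_exec(db, "PRAGMA synchronous=OFF;",     nullptr, nullptr, nullptr);
        sqlite3_exec(db, "PRAGMA journal_mode=MEMORY;", nullptr, nullptr, nullptr);

        // One transaction = one flush for the whole batch instead of one per row.
        sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);
        for (int i = 0; i < 1000; ++i) {
            char sql[64];
            std::snprintf(sql, sizeof sql, "INSERT INTO samples(val) VALUES(%d);", i);
            sqlite3_exec(db, sql, nullptr, nullptr, nullptr);
        }
        sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);

        sqlite3_close(db);
        return 0;
    }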

The latest SQLite has a feature for backing up hot databases. It's still experimental, but my recommendation would be to use an in-memory database and merge it into the on-disk database when you think it appropriate.
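If you go the in-memory route, the online backup API (sqlite3_backup_init / sqlite3_backup_step / sqlite3_backup_finish) can copy the in-memory database over the on-disk file. A hedged sketch - the on-disk path and how often you call it are up to you:

    #include <sqlite3.h>

    // Copies the whole "main" database from src (e.g. a ":memory:" handle)
    // into the on-disk file at 'path', replacing its contents.
    bool flush_to_disk(sqlite3 *src, const char *path) {
        sqlite3 *dst = nullptr;
        if (sqlite3_open(path, &dst) != SQLITE_OK) return false;

        sqlite3_backup *bk = sqlite3_backup_init(dst, "main", src, "main");
        bool ok = false;
        if (bk) {
            sqlite3_backup_step(bk, -1);   // -1 = copy all pages in one pass
            ok = (sqlite3_backup_finish(bk) == SQLITE_OK);
        }
        sqlite3_close(dst);
        return ok;
    }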

OK Neil. If SQLite is using a form of write-through caching, then on cache overflow it will try to flush the data to some temporary file or to the disk file. That is exactly what I am trying to experiment with by enlarging the cache size, hoping to gain control over the flush rate, but it is not happening. Please reply.

You have the source code to SQLite - why not simply instrument it to record the information you are interested in?

Related

Buffering to the hard disk

I am receiving a large quantity of data at a fixed rate. I need to do some processing on this data on a different thread, but this may run slower than the data is coming in, so I need to buffer the data. Due to the quantity of data coming in the available RAM would be quickly exhausted, so it needs to overflow onto the hard disk. What I could do with is something like a filesystem-backed pipe, so the writer could be blocked by the filesystem, but not by the reader running too slowly.
Here's a rough set of requirements:
Writing should not be blocked by the reader running too slowly.
If data is read slow enough that the available RAM is exhausted it should overflow to the filesystem. It's ok for writes to the disk to block.
Reading should block if no data is available unless the stream has been closed by the writer.
If the reader is able to keep up with the data then it should never hit the hard disk as the RAM buffer would be sufficient (nice but not essential).
Disk space should be recovered as the data is consumed (or soon after).
Does such a mechanism exist in Windows?
This looks like a classic message queue. Did you consider MSMQ or something similar? MSMQ has all the properties you are asking for. You may want to use direct addressing to avoid Active Directory (http://msdn.microsoft.com/en-us/library/ms700996(v=vs.85).aspx) and use a local or TCP/IP queue address.
Use an actual file. Write to the file as the data is received, and in another process read the data from the file and process it.
You even get the added benefit of avoiding multithreading.
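A rough sketch of that idea, assuming a line-per-record spool file shared between the two processes; the file name, record format and poll interval are made up, the writer is assumed to append whole lines, and a real version would also reclaim consumed space:

    #include <fstream>
    #include <string>
    #include <thread>
    #include <chrono>

    // Producer side: append one record per line as data arrives.
    void write_record(const std::string &record) {
        std::ofstream out("spool.dat", std::ios::app);
        out << record << '\n';           // the OS write cache absorbs bursts
    }

    // Consumer side: read from the last known offset, poll when caught up.
    void consume_loop() {
        std::streampos offset = 0;
        for (;;) {
            std::ifstream in("spool.dat");
            in.seekg(offset);
            std::string line;
            while (std::getline(in, line)) {
                offset = in.tellg();     // remember how far we have consumed
                // process(line);
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    }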

Berkeley DB: DbEnv::lsn_reset takes a very long time

I'm using Berkeley DB with a probably relatively large database file (2.1 GiB, using btree format in case it matters). During application shutdown, DbEnv::lsn_reset is called in order to "flush" everything before exiting the application. For the large database, this routine takes a very long time for me -- 10 minutes or so at least, during which heavy disk access happens.
Is this normal or the result of using Berkeley DB in some wrong way? Is there anything that can be done to make things process faster? In particular, which parameters could be tweaked to improve performance here?
DbEnv::lsn_reset() is probably not what you want. That function rewrites every single page in the database, so that you can close the databases out and open them in a different environment. It's going to write out at least 2.1 GiB, and pretty slowly.
If you're just shutting the application down to be started back up sometime later, you may simply want to do a DbEnv::txn_checkpoint() to flush the database log and insert a checkpoint record. Though even this isn't required: as long as you have the logs committed to stable storage, you can simply exit your application.
http://docs.oracle.com/cd/E17276_01/html/api_reference/CXX/txncheckpoint.html
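A small sketch of that cheaper shutdown path, assuming your DbEnv is already opened elsewhere in the application:

    #include <db_cxx.h>

    void shutdown_env(DbEnv &env) {
        // Flush the log and write a checkpoint record, regardless of how much
        // log has accumulated, instead of rewriting every page via lsn_reset().
        env.txn_checkpoint(0 /* kbytes */, 0 /* minutes */, DB_FORCE);
        env.close(0);
    }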

How to determine whether data has been retrieved from disk or from caches?

I have written a program in C/C++ which needs to fetch data from the disk. After some time, the operating system ends up keeping some of that data in its caches. Is there some way to figure out, from within a C/C++ program, whether the data was retrieved from the caches or from the disk?
A simple solution would be to time the read operation; disk reads are significantly slower. You can read a group of file blocks (4 KiB) twice to get an estimate.
The problem is that if you run the program again or copy the file in a shell, the OS will cache it.
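A minimal sketch of the timing heuristic - read the same 4 KiB block twice and compare durations; the file name and what counts as "much larger" are left to you:

    #include <chrono>
    #include <cstdio>

    // Returns the time in microseconds taken to read 'len' bytes from 'path'.
    double timed_read(const char *path, char *buf, size_t len) {
        FILE *f = std::fopen(path, "rb");
        if (!f) return -1.0;
        auto t0 = std::chrono::steady_clock::now();
        (void)std::fread(buf, 1, len, f);
        auto t1 = std::chrono::steady_clock::now();
        std::fclose(f);
        return std::chrono::duration<double, std::micro>(t1 - t0).count();
    }

    int main() {
        char buf[4096];
        double first  = timed_read("data.bin", buf, sizeof buf); // possibly from disk
        double second = timed_read("data.bin", buf, sizeof buf); // almost surely cached
        std::printf("first: %.1f us, second: %.1f us\n", first, second);
        // If 'first' is much larger than 'second', the first read likely hit the disk.
        return 0;
    }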

File durability settings

I'm working on a ACID database software product and I have some questions about file durability on WinOS.
CreateFile has two flags, FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING - do I need both of these to achieve file durability (i.e. override all kinds of disk or OS file caching)? I'm asking since they seem to do the same thing, and setting FILE_FLAG_NO_BUFFERING causes WriteFile to fail with ERROR_INVALID_PARAMETER.
FILE_FLAG_NO_BUFFERING specifies no caching at all: no read cache and no write cache; all data goes directly between your application and the disk. This is mostly useful if you read chunks so large that caching is useless, or if you do your own caching. Note WhozCraig's comment on properly aligning your data when using this flag.
FILE_FLAG_WRITE_THROUGH only means that writes are written directly to disk before the function returns. This is enough to achieve ACID behaviour while still giving the OS the option to cache data from the file.
Using FlushFileBuffers() can provide a more efficient approach to achieving ACID behaviour, as you can do several writes to a file and then flush them in one go. Combining writes into one flush is very important, because non-cached writes limit you to the spindle speed of your hard drive: at most about 120 non-cached writes or flushes per second on a 7200 rpm disk.
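A sketch contrasting the two approaches, using only documented Win32 calls; the file name and payload are placeholders:

    #include <windows.h>

    int main() {
        // Option 1: FILE_FLAG_WRITE_THROUGH - each WriteFile reaches the disk
        // before returning, but the OS may still keep a read cache for the file.
        HANDLE h = CreateFileA("journal.dat", GENERIC_WRITE, 0, nullptr,
                               OPEN_ALWAYS, FILE_FLAG_WRITE_THROUGH, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;

        const char record[] = "record 1";
        DWORD written = 0;
        WriteFile(h, record, sizeof record, &written, nullptr);
        CloseHandle(h);

        // Option 2: buffered writes, then one explicit flush for the whole batch.
        h = CreateFileA("journal.dat", GENERIC_WRITE, 0, nullptr,
                        OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return 1;
        for (int i = 0; i < 100; ++i)
            WriteFile(h, record, sizeof record, &written, nullptr);
        FlushFileBuffers(h);   // one durable point covering all 100 writes
        CloseHandle(h);
        return 0;
    }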

MySQL++, storing realtime data

Firstly I'm an engineer, not a computer scientist, so please be gentle.
I currently have a C++ program which uses MySQL++. The program also incorporates the NI Visa runtime. One of the interrupt handlers receives data (1 byte) from a USB device about 200 times a second. I would like to store this data with a time stamp on each sample on a remote server. Is this feasible? Can anyone recommend a good approach?
Regards,
Michael
I think that performing 200 transactions/second against a remote server is asking a lot, especially when you consider that these transactions would be occurring in the context of an interrupt handler which has to do its job and get done quickly. I think it would be better to decouple your interrupt handler from your database access - perhaps have the interrupt handler store the incoming data and timestamp into some sort of in-memory data structure (array, circular linked list, or whatever, with appropriate synchronization) and have a separate thread that waits until data is available in the data structure and then pumps it to the database. I'd want to keep that interrupt handler as lean and deterministic as possible, and I'm concerned that database access across the network to a remote server would be too slow - or worse, would be OK most of the time, but sometimes would go to h*ll for no obvious reason.
This, of course, raises the question/problem of data overrun, where data comes in faster than it can be pumped to the database and the in-memory storage structure fills up. This could cause data loss. How bad a thing is it if you drop some samples?
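A hedged sketch of that decoupling: the callback only appends to a guarded in-memory queue, and a worker thread drains it toward MySQL, so slow network I/O never runs in the handler. The names here (on_sample, db_worker, send_batch_to_mysql) are invented for illustration:

    #include <condition_variable>
    #include <cstdint>
    #include <deque>
    #include <mutex>

    struct Sample { std::int64_t timestamp_us; std::uint8_t value; };

    std::deque<Sample>      pending;
    std::mutex              mtx;
    std::condition_variable cv;

    // Called from the interrupt/callback context: cheap, bounded work only.
    void on_sample(std::int64_t ts, std::uint8_t v) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            pending.push_back({ts, v});   // consider a size cap to handle overrun
        }
        cv.notify_one();
    }

    // Database thread: waits for data, then batches it off to the server.
    void db_worker() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mtx);
            cv.wait(lock, [] { return !pending.empty(); });
            std::deque<Sample> batch;
            batch.swap(pending);          // take everything, release the lock fast
            lock.unlock();
            // send_batch_to_mysql(batch);  // e.g. one multi-row INSERT per batch
        }
    }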
I don't think you'll be able to maintain that speed with one separate insert per value, but if you batch them up into large enough groups you can send them all as one query, and it should be fine.
INSERT INTO records(timestamp, value)
VALUES(1, 2), (3, 4), (5, 6), [...], (399, 400);
Just push the timestamp and value onto a buffer, and when the buffer hits 200 in size (or some other arbitrary figure), generate the SQL and send the whole lot off. Building this string up with sprintf shouldn't be too slow. Just beware of reading from a data structure that your interrupt routine might be writing to at the same time.
If you find that this SQL generation is too slow for some reason, and there's no quicker method using the API (eg. stored procedures), then you might want to run this concurrently with the data collection. Simplest is probably to stream the data across a socket or pipe to another process that performs the SQL generation. There are also multithreading approaches but they are more complex and error-prone.
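For illustration, a small sketch that builds such a multi-row INSERT as a string; the table and column names are assumptions, and the result would be handed to your MySQL++ query object:

    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    // Builds "INSERT INTO records(timestamp, value) VALUES (1, 2), (3, 4), ...".
    std::string build_batch_insert(const std::vector<std::pair<long long, int>> &rows) {
        std::string sql = "INSERT INTO records(timestamp, value) VALUES";
        char tuple[64];
        for (size_t i = 0; i < rows.size(); ++i) {
            std::snprintf(tuple, sizeof tuple, "%s(%lld, %d)",
                          i ? ", " : " ", rows[i].first, rows[i].second);
            sql += tuple;
        }
        return sql;  // e.g. pass this to a mysqlpp::Query for execution
    }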
In my opinion, you should do two things: 1. buffer the data, and 2. use one timestamp per buffer. The USB protocol is not byte based but message based. If you are tracking messages, then timestamp the messages.
Also, databases would rather receive blocks or chunks of data than one byte at a time. There is overhead in the database with each transaction. To measure the efficiency, divide the overhead by the number of bytes in the transaction. You'll see that large blocks are more efficient than lots of little transactions.
Another option is to store the data in a file, then use MySQL's LOAD DATA INFILE to load it into the database. Alternatively, store the data in a buffer and use the MySQL C++ connector stream to load it into the database.
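A rough sketch of the file-then-load variant; the CSV path, table and column names are placeholders, and LOAD DATA LOCAL INFILE must be enabled on the connection:

    #include <cstdio>

    // Append one timestamped sample per line to a spool file.
    void append_sample(long long timestamp_us, int value) {
        if (FILE *f = std::fopen("/tmp/samples.csv", "a")) {
            std::fprintf(f, "%lld,%d\n", timestamp_us, value);
            std::fclose(f);
        }
    }

    // SQL issued from the application (e.g. through a MySQL++ query) once the
    // file has grown large enough:
    //   LOAD DATA LOCAL INFILE '/tmp/samples.csv'
    //   INTO TABLE records
    //   FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    //   (timestamp, value);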
Multi-threading doesn't guarantee being any faster than apartment threading, even if you cache correctly on the server side, unless there is some unusual CPU priority preference. What about using shaders and letting the pass-by-reference value in windows.h be the timestamp?