How to keep the SQLite database file in RAM? - c++

Okay I know this question feels weird, but let me present my problem. I am building Qt based application with SQLite as my database. I noticed a few things. Whenever you are performing operations like manipulating row one by one directly on the sqlite file, it seems slow. That is because it is doing I/O operations on a file which is stored in hard drive. But when I use SSD instead of HDD, the speed is considerably improved, that is because SSD has high IO speed. But if I load a table into QSqlTableModel, we can make all the changes and save it the speed is good. That is because in one query the data is fetched from sqlite file and stored in RAM memory. And So the IO operations are less. So it got me thinking is it possible to save the sqlite file in RAM when my application launches, perform all my opeartions and then when user chooses to close, at that instant i can save the file to HDD? One might think why don't I just use qsqltablemodel itself, but there are some cases for me which involves creating tables and deleting tables, which qt doesnt support out of box, we need to execute query for that. So if anyone can point me if there's a way to achieve this in Qt, that would be great!

Related

Low level page manager in C/C++

I need a library (C/C++) that handles the basic operations of a low level pager layer on disk. The library must include methods to get a disk page, add data, write the page to disk etc.
To give a bit of background: Assume there's a particular dataset which runs query q. I have a particular grouping of the data on disk so that q can be processed faster. This library will help me write the data in pages, according to the grouping.
Appreciate any suggestions.
You don't specify the OS, but that doesn't really matter: all common OS'es bar this outright. Instead, use a memory-mapped file. To read a page from disk, just read from memory and the OS will (if needed) perform a disk read. Writing to disk will be mostly on-demand but usually can be done explicitly on demand.
As for your "dataset running query q", that just doesn't make sense. Queries are code not data. Do you perhaps have a query Q run over a dataset D?

what's the efficiency of qsqlite data base in Qt?

I am very new to database and I am trying to implement a offline map viewer. What would be the efficiency of the qsqldatabase?
To make it extreme, for example, is it possible to download all satellite image of all the detail levels of US from the google's map server and store it in a local sqlite database and still perform real time query based on my current gps location?
The Qt Database driver for SQLite uses SQLite internally (surprise!). So the question is more like: Is SQLite the right database to use? My answer: I would not use it to store geographical data, consider to look for a database which is optimized for this task.
If this is not an option; SQLite is really efficient. First check if your data is within the limits. Do not forget to create indexes and analyze the database. Then it should be able to handle your task. Here I assume you just want to get an image by its geographical position (but other solutions can be a lot faster because your data is sortable — if I remember correctly SQLite is not optimized for that).
As you will store large blobs, you may want to have a look at the Internal Versus External BLOBs in SQLite document. Maybe this gives you the answer already.

Django fixtures, loading in large amounts of data

So I have two 200mb JSON files. The first one takes 1.5 hours to load, and the second which (which makes a bunch of many-to-many relationships models with the first), takes 24+ hours (since there's no updates via the console, I have no clue if it was still going or if it froze, so I stopped it).
Since the loaddata wasn't working that well, I wrote my own script that loaded the data while also outputting what's been recently saved into the db, but I noticed the speed of the script (along with my computer) decayed the longer it went. So I had to stop the script -> restart my computer -> resume at the section of data where I left off, and that would be faster than running the script throughout. This was a tedious process since it took roughly 18 hrs with me restarting the computer every 4 hours to get all the data fully loaded.
I'm wondering if there is a better solution for loading in large amounts of data?
EDIT: I realized there's an option to load in raw SQL, so I may try that, although I need to brush up on my SQL.
When you're loading large amounts of data, writing your own custom script is generally the fastest. Once you've got it loaded in once, you can use your databases import/export options, which will generally be very fast (ex, pgdump).
When you are writing your own script, though, two things which will drastically speed things up:
Loading data inside a transaction. By default the database is likely in autocommit mode, which causes an expensive commit after each insert. Instead, make sure that you begin a transaction before you insert anything, then commit it afterwards (importantly, though, don't forget to commit; nothing sucks like spending three hours importing data, only to realize you forgot to commit it).
Bypassing the Django ORM and use raw INSERT statements. There is some computational overhead to the ORM, and bypassing it will make things faster.

c++ Reading big text file to string (bigger than string::max_size)

I have a huge text file (~5GB) which is the database for my program. During run this database is read completely many times with string functions like string::find(), string::at(), string::substr()...
The problem is that this text file cannot be loaded in one string, because string::max_size is definitely too small.
How would you implement this? I had the idea of loading a part to string->reading->closing->loading another part to same string->reading->closing->...
Is there a better/more efficient way?
How would you implement this?
With a real database, for instance SQLite. The performance improvement from having indexes is more than going to make up for your time learning another API.
Since this is a database, I'm assuming it'd have many records. That to me implies best idea would be to implement a data class for each records and populate a list/vector/etc depending upon how you plan to use it. I'd also look into persistent cache as the file is big.
And within in your container class of all records, you could implement search etc functions as you see fit. But as suggested for a db of this size, you're probably best of using a database.

How to create a program which is working similar like RAID1 (mirroring)?

I want to create a simple program which is working very similar to RAID1. It should work like this:
First i want to give the primary HDD-s drive letter and than the secondary one. I will only write to the primary HDD! If any new data is copied to the primary HDD it should automatically copy it to the secondary one.
I need some help where should i start all this? How to monitor the written data in the primary HDD? Obviously there are many ways to do what i want (i think), but i need the simpliest way.
If this isn't so complicated, than how can i handle that case if the primary HDD has two or more partition, because then i should check the secondary HDD's partition too, and then create/resize them if necessary?
Thanks in advance!
kampi
The concept of mirroring disk writes to another disk in real time is the basis for high availability, and implementing these schemes are not trivial.
The company I work for makes DoubleTake, which does real time mirroring & replication of file based IO to local or remote volumes. This is a little different than what you are describing, which appears to be block based disk/volume replication, but many of the concepts are similar.
For file based replication, there are a quite a few nasty scenarios, i'll describe a few:
Synchronizing the contents of one volume to another volume, keeping in mind that changes can occur while you are doing this. I suppose you could simply this by requiring that volumes start out totally formatted. But for people that have data that will not be a good solution!
keeping up with disk changes: What if the volume you are mirroring to is slower than the source volume? Where do you buffer? To Disk? Memory?
Anyways we use a kernel mode file system filter driver to capture the disk IO, and then our user mode service grabs this IO and forwards it to a local or remote disk.
If you want to learn about file system filtering, one of the best books (its old but good) is File System Internals, by Rajeev Nagar. Its a must read for doing any serious work with file system filters.
Also take a look at the file system filter samples on the Windows 7 WDK, its free, and they have good file mon examples that will get you seeing disk changes pretty quickly.
Good Luck!