Raw Binary Tree Database or MongoDB/MySQL/etc.? - C++

I will be storing terabytes of information, before indexes and after compression.
Should I code up a binary tree database by hand, using sort files etc., or use something like MongoDB or even something like MySQL?
I am worried about the space cost per record with MySQL and the other databases that are around. I also know that some databases allow compression, but they convert the compressed tables to read-only, and these tables/records need to be accessed and overwritten with new data fairly often. I think that if I coded something in C++ I would be able to keep the space cost per record to a minimum.
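For illustration, the kind of hand-rolled minimal-overhead layout I have in mind might look something like this (the field names and sizes here are invented for the example):

#include <cstdint>
#include <cstdio>

// Hypothetical fixed-width record: 14 bytes each, with none of the
// per-row overhead a general-purpose engine adds.
#pragma pack(push, 1)
struct Record {
    uint64_t key;            // 8-byte search key
    uint32_t payloadOffset;  // where the compressed payload lives
    uint16_t payloadLength;  // payload size in bytes
};
#pragma pack(pop)

// Overwrite record number `index` in place -- no table rebuild needed.
bool overwriteRecord(std::FILE* f, long index, const Record& r) {
    long offset = index * static_cast<long>(sizeof(Record));
    if (std::fseek(f, offset, SEEK_SET) != 0)
        return false;
    return std::fwrite(&r, sizeof(Record), 1, f) == 1;
}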
What should I do?

There are new non-relational databases becoming popular these days that specialize in managing large-scale data.
Check out Hadoop or Cassandra; both of them are Apache projects.

Related

How to keep the SQLite database file in RAM?

Okay, I know this question sounds weird, but let me present my problem. I am building a Qt-based application with SQLite as my database, and I have noticed a few things. Operations that manipulate rows one by one directly on the SQLite file seem slow, because they are doing I/O on a file stored on the hard drive. When I use an SSD instead of an HDD the speed improves considerably, because the SSD has much higher I/O throughput. But if I load a table into a QSqlTableModel, make all the changes there and then save, the speed is good: the data is fetched from the SQLite file in one query and held in RAM, so there are far fewer I/O operations.

So it got me thinking: is it possible to load the SQLite file into RAM when my application launches, perform all my operations there, and then, when the user chooses to close, save the file back to the HDD at that instant? One might ask why I don't just use QSqlTableModel itself, but some of my cases involve creating and deleting tables, which Qt doesn't support out of the box; you need to execute queries for that. Roughly, what I am imagining is sketched below. If anyone can point me to a way to achieve this in Qt, that would be great!
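A minimal sketch of the idea, assuming the Qt SQLite plugin ships SQLite 3.27 or newer (needed for VACUUM INTO); the table name and file path are placeholders, and note that CREATE TABLE ... AS SELECT copies the data but not indexes or constraints:

#include <QFile>
#include <QSqlDatabase>
#include <QSqlQuery>

// On launch: open an in-memory database and pull the on-disk copy into it.
void loadIntoRam() {
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName(":memory:");   // lives in RAM, not on disk
    db.open();

    QSqlQuery q(db);
    q.exec("ATTACH DATABASE '/path/to/app.db' AS disk");
    q.exec("CREATE TABLE main.items AS SELECT * FROM disk.items"); // per table
    q.exec("DETACH DATABASE disk");
}

// On close: write the whole in-memory database back out in one pass.
void saveToDisk() {
    QFile::remove("/path/to/app.db");  // VACUUM INTO refuses to overwrite
    QSqlQuery q(QSqlDatabase::database());
    q.exec("VACUUM INTO '/path/to/app.db'");
}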

Should I create a model or just use a string?

I'm writing a database application (using Django), and in a few places I can either use a string (max 4 chars) or I can create a model and reference it in those places. I've looked around and I can't find a good discussion of the merits of either solution. Because this database is going to be quite large, which solution scales better, both in performance and in size?
Thanks!
If you just have one string or a few strings which will stay constant, just define them in settings.py and use them like:
from django.conf import settings
print(settings.my_string_1)
In that case you save time by avoiding database access.
If you are going to use many strings which may vary over time, or which need frequent insert, update, or delete operations, you have to store them in the database. If you already have a database like MySQL or Postgres set up in the project, use it; if not, an SQLite database is enough as long as the data is not going to be very large.

What's the efficiency of the QSQLITE database in Qt?

I am very new to databases and I am trying to implement an offline map viewer. What would be the efficiency of QSqlDatabase?
To make it extreme, for example: is it possible to download all the satellite imagery at every detail level for the US from Google's map server, store it in a local SQLite database, and still perform real-time queries based on my current GPS location?
The Qt database driver for SQLite uses SQLite internally (surprise!). So the question is really: is SQLite the right database to use? My answer: I would not use it to store geographical data; consider looking for a database which is optimized for this task.
If that is not an option, SQLite is really efficient. First check that your data is within its limits. Do not forget to create indexes and to analyze the database; then it should be able to handle your task. Here I assume you just want to fetch an image by its geographical position (other solutions can be a lot faster because your data is sortable, and if I remember correctly SQLite is not optimized for that).
As you will store large blobs, you may also want to have a look at the Internal Versus External BLOBs in SQLite document. Maybe that already gives you your answer.
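As a rough sketch of the create-index-and-query route (the tile schema and names here are invented for the example; real tile storage would more likely key tiles by zoom/x/y):

#include <QByteArray>
#include <QSqlQuery>
#include <QVariant>

// Sketch only: assumes a `tiles` table with bounding-box columns
// (lat_min, lat_max, lon_min, lon_max), a zoom level, and an image blob.
QByteArray lookupTile(double lat, double lon) {
    QSqlQuery q;
    q.exec("CREATE INDEX IF NOT EXISTS idx_tiles"
           " ON tiles (zoom, lat_min, lon_min)");  // without it: full scans
    q.exec("ANALYZE");  // let the query planner gather statistics

    q.prepare("SELECT image FROM tiles WHERE zoom = 15"
              " AND :lat BETWEEN lat_min AND lat_max"
              " AND :lon BETWEEN lon_min AND lon_max");
    q.bindValue(":lat", lat);
    q.bindValue(":lon", lon);
    q.exec();
    return q.next() ? q.value(0).toByteArray() : QByteArray();
}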

Are small changes in large documents something document databases are good for?

Sometimes documents, with their free-form structure, are attractive for storing data (in contrast to a relational database). But one problem is persistence in combination with making small changes to the data, since the entire document has to be rewritten to disk.
So my question is, are "document databases" especially made to solve this?
UPDATE
I think I understand the concept of "document-oriented databases" better now. They obviously don't store documents of just any kind; each implementation uses its own format, for instance JSON. And then the answer to my question also becomes obvious: if the entire JSON structure had to be rewritten to disk after each change to keep it persisted, it wouldn't be a very good database.
"If the entire JSON structure had to be rewritten to disk after each change to keep it persisted, it wouldn't be a very good database."
I would say this is not true of any document database I know of. For example, Mongo doesn't store documents as JSON; it stores them as BSON (http://en.wikipedia.org/wiki/BSON).
Also, databases like Mongo keep documents in RAM and persist them to disk later.
In fact, many document databases follow that pattern: store documents in main memory first, then write them to disk.
But the fact that a given document database writes data to disk, and the fact that some documents might get changed a lot, does not mean the database is non-performant. I wouldn't disregard document databases based on speculation.
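To make that pattern concrete, here is a toy write-behind store in C++; this illustrates only the general idea, not how Mongo actually persists documents:

#include <string>
#include <unordered_map>
#include <unordered_set>

// Toy write-behind store: updates land in RAM immediately; dirty
// documents are flushed to disk later in one batch.
class DocumentStore {
public:
    void put(const std::string& id, const std::string& doc) {
        docs_[id] = doc;    // cheap in-memory update
        dirty_.insert(id);  // remember what still needs writing
    }

    void flush() {  // called periodically or at shutdown
        for (const std::string& id : dirty_)
            writeToDisk(id, docs_[id]);  // amortized, batched I/O
        dirty_.clear();
    }

private:
    void writeToDisk(const std::string& /*id*/, const std::string& /*doc*/) {
        // ... actual file or journal write would go here ...
    }

    std::unordered_map<std::string, std::string> docs_;
    std::unordered_set<std::string> dirty_;
};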

C++: Reading a big text file into a string (bigger than string::max_size)

I have a huge text file (~5 GB) which is the database for my program. During a run this database is read through completely many times with string functions like string::find(), string::at(), string::substr()...
The problem is that this text file cannot be loaded into one string, because string::max_size() is definitely too small.
How would you implement this? I had the idea of loading a part into a string -> reading -> closing -> loading the next part into the same string -> reading -> closing -> ..., roughly what the sketch below shows.
Is there a better/more efficient way?
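A minimal sketch of that part-by-part approach (chunk size and overlap are arbitrary tuning knobs; the overlap guards against matches that straddle a chunk boundary):

#include <cstddef>
#include <fstream>
#include <string>

void scanInChunks(const char* path, std::size_t chunkSize, std::size_t overlap) {
    std::ifstream in(path, std::ios::binary);
    std::string chunk(chunkSize, '\0');
    std::string carry;  // tail of the previous chunk

    while (in) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunkSize));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;
        std::string window = carry + chunk.substr(0, got);

        // ... run string::find(), string::substr(), ... on `window` here;
        // hits inside the first carry.size() bytes were already seen.

        carry = window.size() > overlap
                    ? window.substr(window.size() - overlap)
                    : window;
    }
}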
"How would you implement this?"
With a real database, for instance SQLite. The performance improvement from having indexes will more than make up for the time you spend learning another API.
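A minimal sketch of that route using the SQLite C API (the table and column names are invented for the example):

#include <cstdio>
#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("records.db", &db) != SQLITE_OK)
        return 1;

    // An index turns repeated full scans into O(log n) lookups.
    sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS idx_name ON records(name)",
                 nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "SELECT body FROM records WHERE name = ?", -1,
                       &stmt, nullptr);
    sqlite3_bind_text(stmt, 1, "some key", -1, SQLITE_STATIC);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        std::printf("%s\n", reinterpret_cast<const char*>(
                                sqlite3_column_text(stmt, 0)));

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}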
Since this is a database, I'm assuming it has many records. To me that implies the best idea would be to implement a data class for each record and populate a list/vector/etc., depending on how you plan to use it. I'd also look into a persistent cache, since the file is big.
Within your container class of all records you could then implement search and other functions as you see fit. But as already suggested, for a DB of this size you're probably best off using a real database.