C++ Insert into MySQL database - c++

I have an application in C++ and I'm using the MySQL Connector/C++ (https://dev.mysql.com/downloads/connector/cpp/).
I need to save some logs in one table. Depending on the situation I may have large amounts of data, on the order of tens of thousands of rows (for example, 80,000).
I have already implemented a function that iterates over my std::vector<std::string> and saves each std::string to my database.
For example:
std::vector<std::string> lines = explode(filedata, '\n');
for (int i = 0; i < lines.size(); i++)
{
    std::vector<std::string> elements = explode(lines[i], ';');
    ui64 timestamp = strtol(elements.at(0).c_str(), 0, 10);
    std::string pointId = elements.at(6);
    std::string pointName = elements.at(5);
    std::string data = elements.at(3);
    database->SetLogs(timestamp, pointId, pointName, data);
}
The logs come from a CSV file; I save all fields to my vector. After this I parse the vector (with explode) and keep only the fields that I need to save.
But I have a problem: if I have e.g. 80,000 rows, I call my save function 80,000 times. It works and saves all the data correctly, but it takes a lot of time.
Is there some way to save all the data with a single call, instead of calling the save function e.g. 80,000 times, and thus reduce the time?
EDIT 1
I changed the insert code to this:
std::string insertLog = "INSERT INTO Logs (timestamp, pointId, pointName, data) VALUES (?, ?, ?, ?)";
// pstmt is a sql::PreparedStatement created from insertLog, e.g.
// pstmt = con->prepareStatement(insertLog); (con being the sql::Connection)
pstmt->setString(1, timestampString);
pstmt->setString(2, pointId);
pstmt->setString(3, pointName);
pstmt->setString(4, data);
pstmt->executeUpdate();

You could change the code to do bulk inserts, rather than inserting one row at a time.
I would recommend doing it by generating an insert statement as a string and passing the string to mysqlpp::Query. I expect you can use a prepared statement to do bulk inserts in a similar way.
If you do an insert statement for each row (which I assume is the case here, given the use of explode()), there is a lot more traffic between the client and the server, which must slow things down.
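As a rough illustration of the bulk-insert idea, here is a minimal sketch using MySQL Connector/C++ (the library the question uses) rather than mysqlpp. It is only a sketch: it assumes a sql::Connection named con, the explode() helper from the question, and that the field values are escaped or otherwise safe to embed in the statement.
// Build one multi-row INSERT instead of executing 80,000 single-row statements.
std::string sql = "INSERT INTO Logs (timestamp, pointId, pointName, data) VALUES ";
for (std::size_t i = 0; i < lines.size(); ++i)
{
    std::vector<std::string> elements = explode(lines[i], ';');
    if (i > 0)
        sql += ",";
    // NOTE: real code must quote/escape these values properly.
    sql += "(" + elements.at(0) + ",'" + elements.at(6) + "','"
         + elements.at(5) + "','" + elements.at(3) + "')";
}
sql::Statement *stmt = con->createStatement();
stmt->execute(sql); // one round trip to the server for the whole batch
delete stmt;
Alternatively, keeping the per-row prepared statement but wrapping all the executions in one transaction (con->setAutoCommit(false) before the loop, con->commit() after) also avoids a commit per row and usually helps a lot. For very large batches the generated statement may need to be split into chunks to stay under MySQL's max_allowed_packet.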
EDIT
I also suspect the explode() function of increasing the execution time, since the data is traversed twice when you call explode() twice. It would help if you could post the code for explode().

Related

RocksDB - Double db size after 2 Put operations of same KEY-VALUEs

I have a program that uses RocksDB and writes a huge number of KEY-VALUE pairs to the database:
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <cassert>
#include <string>

using namespace rocksdb;

// kDBPath (the database directory) is defined elsewhere in the original program.

int main() {
    DB* db;
    Options options;
    // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
    options.IncreaseParallelism(12);
    options.OptimizeLevelStyleCompaction();
    // create the DB if it's not already present
    options.create_if_missing = true;
    // open DB
    Status s = DB::Open(options, kDBPath, &db);
    assert(s.ok());
    for (int i = 0; i < 1000000; i++)
    {
        // Put key-value
        s = db->Put(WriteOptions(), "key" + std::to_string(i), "a hard-coded string here");
        assert(s.ok());
    }
    delete db;
    return 0;
}
When I ran the program for the first time, it generated about 2 GB of database. I then ran the program several more times without any changes and got N*2 GB of database, with N = number of runs, until at a certain N the database size started to shrink.
What I expected is that the batch of data written to the database would simply overwrite the previous one on each run, since the batch doesn't change, so the database size should stay at ~2 GB after each run.
QUESTION: is this an issue with RocksDB, and if not, what are the proper settings to keep the database size stable when the same KEY-VALUE pairs are written repeatedly?
A full compaction can reduce the space usage; just add this line before delete db;:
db->CompactRange(CompactRangeOptions(), nullptr, nullptr);
Note: a full compaction does take some time, depending on the data size.
Space amplification is expected; all LSM-tree-based databases have this issue: https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#amplification-factors
Here is a great paper on space amplification research for RocksDB: http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf
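For context, a minimal sketch of how the end of the program above would look with the compaction added (same assumptions as the question's snippet):
// ... the Put loop from the question ...

// Compact the whole key range (nullptr..nullptr) so obsolete versions of the
// overwritten keys are dropped and the space is reclaimed.
Status cs = db->CompactRange(CompactRangeOptions(), nullptr, nullptr);
assert(cs.ok());

delete db;
return 0;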

How to buffer efficiently when writing to 1000s of files in C++

I am quite inexperienced when it comes to C++ I/O operations, especially when dealing with buffers etc., so please bear with me.
I have a programme that has a vector of objects (1000s - 10,000s). At each time-step the state of the objects is updated. I want to have the functionality to log a complete state time history for each of these objects.
Currently I have a function that loops through my vector of objects, updates the state, and then calls a logging function which opens the file (ASCII) for that object, writes the state to the file, and closes the file (using std::ofstream). The problem is this significantly slows down my run time.
I've been recommended a couple things to do to help speed this up:
Buffer my output to prevent extensive I/O calls to the disk
Write to binary not ascii files
My question mainly concerns 1. Specifically, how would I actually implement this? Would each object effectively require its own buffer? Or would this be a single buffer that somehow knows which file to send each bit of data? If the latter, what is the best way to achieve this?
Thanks!
Maybe the simplest idea first: instead of logging to separate files, why not log everything to an SQLite database?
Given the following table structure:
create table iterations (
    id integer not null,
    iteration integer not null,
    value text not null
);
At the start of the program, prepare a statement once:
sqlite3_stmt *stmt;
sqlite3_prepare_v3(db, "insert into iterations values(?,?,?)", -1, SQLITE_PREPARE_PERSISTENT, &stmt, NULL);
The question marks here are placeholders for future values.
After every iteration of your simulation, you could walk your state vector and execute the stmt a number of times to actually insert rows into the database, like so:
for (int i = 0; i < objects.size(); i++) {
    sqlite3_reset(stmt);
    // Fill in the three placeholders and execute the query.
    sqlite3_bind_int(stmt, 1, i);
    sqlite3_bind_int(stmt, 2, current_iteration); // Could be done once, but here for illustration.
    std::string state = objects[i].get_state();
    sqlite3_bind_text(stmt, 3, state.c_str(), state.size(), SQLITE_STATIC); // SQLITE_STATIC: the buffer stays valid until sqlite3_step runs, so SQLite does not need to copy it
    sqlite3_step(stmt); // Execute the query.
}
You can then easily query the history of each individual object using the SQLite command-line tool or any database manager that understands SQLite.
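For example, select value from iterations where id = 42 order by iteration; returns the full history of object 42.
One further detail that usually matters for write speed (my addition, not something the answer above spells out): grouping each time-step's inserts into a single transaction, so SQLite commits once per step rather than once per row. A minimal sketch around the loop above:
sqlite3_exec(db, "BEGIN TRANSACTION;", NULL, NULL, NULL);
// ... the reset/bind/step loop shown above ...
sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);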

How to use text file as local database

I have a data_base.txt file with the following structure:
1|client_name|id_client|account_client|balance|status
2|client_name|id_client|account_client|balance|status
Example:
1|John Doe|08913835P|053-323-192|120.00|active
Now I want to have the following four functions working with this file:
This function will add a new client to the .txt file
int newClient(string client_name, string id_client)
{
.....
}
This function will check if a client exists by checking id_client
int checkClient(string id_client)
{
// return true; (if client with that ID exists)
// return false; (if client not exists)
}
This function will get a specific value:
int getData(string field, string id_client)
{
// Example: if string field == 'balance' and id_client == '08913835P' then return '120.00'
// This example is done using above example data structure.
}
This function will modify data
int modifyData(string field, string data)
{
// This should work like the previous function but this function will edit specific values.
}
That's all. I have been Googling for hours and I can't figure out how to do this.
This is problematic and horribly inefficient if not done right.
The short answer is that, if the length of each line can change, then you need to completely rewrite the entire file to update the data. This would work, for example, if your program loaded the entire file into memory and then saved it after it was modified.
To update the file on disk, you have to impose some type of rule, such as every line must be the same length. This would mean setting a maximum length for all your fields, and padding those fields with spaces or some other character.
Using the latter technique, it should be possible to construct a new line of data (with padding), know the location of that line in the file ((line number - 1) times the length of each line), jump to that location and then write the line.
Getting the data would be simpler but similar. Just determine the offset of the line and read it (you'll know how long each line is). Strip any padding before presenting it to the user.
Modifying a line would be similar to a combination of getting the data and writing the data. You'd just update the data between the two.
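A minimal sketch of the fixed-length-record idea (the record size, the helper names and the use of a single std::fstream are assumptions for illustration, not part of the question):
#include <fstream>
#include <string>

const std::size_t kRecordSize = 128; // assumed fixed record length, including the trailing '\n'

// Pad (or truncate) a line so every record occupies exactly kRecordSize bytes.
std::string makeRecord(const std::string& line)
{
    std::string rec = line.substr(0, kRecordSize - 1);
    rec.resize(kRecordSize - 1, ' ');   // pad with spaces
    return rec + "\n";
}

// Overwrite record number `index` (0-based) in place.
void writeRecord(std::fstream& file, std::size_t index, const std::string& line)
{
    file.seekp(index * kRecordSize);
    file << makeRecord(line);
}

// Read record number `index`; strip the padding before presenting it.
std::string readRecord(std::fstream& file, std::size_t index)
{
    std::string rec(kRecordSize, '\0');
    file.seekg(index * kRecordSize);
    file.read(&rec[0], kRecordSize);
    return rec;
}
The file would be opened once, e.g. std::fstream file("data_base.txt", std::ios::in | std::ios::out | std::ios::binary);, and each of the four functions would then work through these two helpers plus a search by id_client.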

Is loading a big file into memory and keeping it for all the running time of the program wrong?

1. Load file
I have a file of size 330 MB which I am loading into a multimap as follows:
// String = first column and vector<string> rest of the columns
typedef std::multimap<string, vector<string>> termF;

ifstream file("file.txt");
string line = "";
termF tfidf;
if (file.is_open())
{
    while (getline(file, line))   // read line by line; stops cleanly at EOF
    {
        vector<string> values;
        boost::split(values, line, boost::is_any_of(" "));
        string id = values[0];
        vector<string> vals;
        for (int i = 1; i < values.size(); i++)
        {
            vals.push_back(values[i]);
        }
        tfidf.insert(pair<string, vector<string>>(id, vals));
    }
    file.close();
}
return tfidf;
2. Search
I have a list of ids stored in a vector<string> ids. I want to check if these ids are in the multimap by using the following code:
for (auto &id : ids)
{
    auto it = tfidf.find(id);
    if (it != tfidf.end())          // skip ids that are not in the map
    {
        vector<string> values = it->second;
    }
}
3. Question
Instead of loading the file into memory, is it better to search for the ids directly in the file? That would mean going back and forth between the program and the text file.
The file will be kept in memory for the entire run of the program.
It is a very subjective question. If you absolutely require maximum (e.g. in-memory database) performance, you have no memory concerns and you can't change your on-disk data representation, then your options are limited to what you already have.
If your code is supposed to run under limited-memory conditions, such as on mobile devices, then you should look the data up dynamically, which leads to the next option.
Use a database solution and query the DB to find the data you need. You may implement a caching layer on top of the DB, or use a DB that does some caching for you. LevelDB is a good, simple key-value database library. SQLite is an option too, especially when you need the features of a relational DB (that depends on the structure of the data you store). This option will definitely beat option #2.
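As a rough illustration of the LevelDB route (a minimal sketch under assumed names; the key would be the first column of each line and the value the rest of the line, and "tfidf_db" is a placeholder path):
#include <leveldb/db.h>
#include <string>
#include <iostream>

int main()
{
    leveldb::DB* db;
    leveldb::Options options;
    options.create_if_missing = true;

    // Open (or create) the on-disk database.
    leveldb::Status s = leveldb::DB::Open(options, "tfidf_db", &db);
    if (!s.ok()) { std::cerr << s.ToString() << "\n"; return 1; }

    // One-time load: store each id with the rest of its line as the value.
    // db->Put(leveldb::WriteOptions(), id, rest_of_line);

    // Lookup: fetch a single id without keeping the 330 MB file in memory.
    std::string value;
    s = db->Get(leveldb::ReadOptions(), "some_id", &value);
    if (s.ok())
        std::cout << value << "\n";

    delete db;
    return 0;
}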

Delete content in a text file between two specific characters

I'm making a simple bug tracker and am using a text file as the database. Right now I'm reading in all the information through keys and importing them into specific arrays.
for (int i = 0; i < 5; i++)
{
    getline(bugDB, title[i], '#');
    getline(bugDB, importance[i], '!');
    getline(bugDB, type[i], '$');
    getline(bugDB, description[i], '*');
}
Here is what's in my (terribly unreadable) file
Cant jump#Moderate!Bug$Every time I enter the cave of doom, I'm unable
to jump.*Horse too expensive#Moderate!Improvement$The horses cost way
too much gold, please lower the costs.*Crash on startup#Severe!Bug$I'm
crashing on startup on my Win8.1 machine, seems to be a 8.1
bug.*Floating tree at Imperial March#Minimal!Bug$There is a tree
floating about half a foot over the ground near the crafting
area.*Allow us to instance our group#Moderate!Improvement$We would
like a feature that gives us the ability to play with our groups alone
inside dungeons.*
This works great for me, but I'd like to be able to delete specific bugs. I'd be able to do this by letting the user choose a bug by number, find the corresponding * key, and delete all information until the program reaches the next * key.
I'd appreciate any suggestions, I don't know where to start here.
There is no direct mechanism for deleting a chunk of data from the middle of a file, no delete(file, start, end) function. To perform such a deletion you have to move the data which appears after the region; to delete ten bytes from the middle of a file you'd have to move all of the subsequent bytes back ten, looping over the data, then truncate to make the file ten bytes smaller.
In your case, however, you've already written code to parse the file into memory, populating your arrays. Why not just implement a function to write the contents of the arrays back to a file? Truncate the file (open it for writing so the old contents are discarded), loop over the arrays writing their contents back to the file in your preferred format, but skip the entry that you want to delete.
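A minimal sketch of that rewrite-and-skip approach, assuming the parallel arrays and delimiters from the question (the function name, the path and deleteIndex are placeholders):
#include <fstream>
#include <string>

// Rewrite the database file, skipping the bug the user chose to delete.
void deleteBug(const std::string& path, int deleteIndex,
               const std::string title[], const std::string importance[],
               const std::string type[], const std::string description[],
               int count)
{
    std::ofstream out(path, std::ios::trunc);   // truncate: old contents are discarded
    for (int i = 0; i < count; i++)
    {
        if (i == deleteIndex)
            continue;                           // skip the deleted entry
        out << title[i] << '#' << importance[i] << '!'
            << type[i] << '$' << description[i] << '*';
    }
}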
It's only possible by manually copying the data from the input file to an output file and leaving out the entry you want to delete.
But I strongly encourage using some small database for keeping the information (look at SQLite).
Also, it's a bad bug tracker if solving a bug means "delete it from the database" (then it isn't even a tracker). Give each bug a status field (open, refused, duplicate, fixed, working, ...).
Additional remarks:
1. Use one array that keeps a structure with n pieces of information, not n separate arrays.
2. Remember that someone may use your delimiter characters in the descriptions (use an uncommon character and escape or replace occurrences of it in the saved text).
Explanation for 1.:
Instead of using
std::vector<std::string> title;
std::vector<int> importance;
std::vector<std::string> description;
define a structure or class and create a vector of that structure:
struct Bug {
    std::string title;
    int importance; // better define an enum for importance
    std::string description;
};

std::vector<Bug> bugs;
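For illustration, the reading loop from the question could then fill that single vector instead of four parallel arrays (a sketch assuming the same bugDB stream and delimiters; the type field and the text-to-number mapping for importance are left as placeholders):
Bug b;
std::string importanceText, typeText;
while (getline(bugDB, b.title, '#') &&
       getline(bugDB, importanceText, '!') &&
       getline(bugDB, typeText, '$') &&       // type could become another member of Bug
       getline(bugDB, b.description, '*'))
{
    b.importance = 0;                         // map importanceText ("Moderate", "Severe", ...) to an int or enum here
    bugs.push_back(b);
}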