I'm learning libpqxx, the C++ API to PostgreSQL. I'd like to use the pqxx::stateless_cursor class, but 1) I find the Doxygen output unhelpful in this case, and 2) the pqxx.org website has been down for some time now.
Anyone know how to use it?
I believe this is how I construct one:
pqxx::stateless_cursor <pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor( work, "SELECT * FROM mytable", ?, ? );
The last two parameters are called cname and hold, but are not documented.
And once the cursor is created, how would I go about using it in a for() loop to get each row, one at a time?
Thanks @Eelke for the comments on cname and hold.
I figured out how to make pqxx::stateless_cursor work. I have no idea if there is a cleaner or more obvious way but here is an example:
pqxx::work work( conn );
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor( work, "SELECT * FROM mytable", "mycursor", false );
for ( size_t idx = 0; true; idx++ )
{
    pqxx::result result = cursor.retrieve( idx, idx + 1 );
    if ( result.empty() )
    {
        // nothing left to read
        break;
    }

    // Do something with "result", which contains a single row in this example,
    // since we told the cursor to retrieve row #idx (inclusive) to idx+1 (exclusive).
    std::cout << result[ 0 ][ "name" ].as<std::string>() << std::endl;
}
I do not know the pqxx library, but based on the underlying DECLARE command of PostgreSQL, I would guess:
That cname is the name of the cursor, so it can be anything postgresql normally accepts as a cursor name.
That hold refers to the WITH HOLD option of a cursor, from the docs:
WITH HOLD specifies that the cursor can continue to be used after the
transaction that created it successfully commits. WITHOUT HOLD
specifies that the cursor cannot be used outside of the transaction
that created it. If neither WITHOUT HOLD nor WITH HOLD is specified,
WITHOUT HOLD is the default.
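For illustration only (untested sketch): the constructor call from the question with the hold argument set to true, which should correspond to declaring the underlying cursor WITH HOLD as quoted above:

// Sketch: same as the earlier example except hold = true, so the server-side
// cursor is declared WITH HOLD and may remain usable after the creating
// transaction commits.
pqxx::work work( conn );
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
        cursor( work, "SELECT * FROM mytable", "mycursor", true );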
Here's another cursor example, using a do-while() loop:
const std::string connStr("user=" + opt::dbUser + " password=" + opt::dbPasswd +
                          " host=" + opt::dbHost + " dbname=" + opt::dbName);
pqxx::connection conn(connStr);
pqxx::work txn(conn);

std::string selectString = "SELECT id, name FROM table_name WHERE condition";

pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
        cursor(txn, selectString, "myCursor", false);

// cursor variables
size_t idx = 0;       // starting location
size_t step = 10000;  // number of rows for each chunk
pqxx::result result;

do {
    // get the next cursor chunk and update the index
    result = cursor.retrieve( idx, idx + step );
    idx += step;

    size_t records = result.size();
    std::cout << idx << ": records pulled = " << records << std::endl;

    for ( const auto &row : result ) {
        // iterate over cursor rows
    }
}
while ( result.size() == step ); // if result.size() != step, we're on our last loop

std::cout << "Done!" << std::endl;
I have a C++ code that parses files and updates the MySQL Database according to the content of these files.
I run my code on Windows 10, with MySQL 5.7, and my database uses the InnoDB engine. The MySQL calls are performed through a wrapper of my own around libmysql.
In an attempt to optimize this code, I append the update requests in a buffer until reaching the maximum size of the buffer, then send the whole buffer (containing N updates) at the same time. This whole process is done inside a single transaction.
Here is how my code looks:
MySQLWrapper.StartTransaction();

string QueryBuffer = "";

// ElementCount is the number of elements parsed from the files
for( int i = 0 ; i < ElementCount ; ++i )
{
    // determines if we have reached the buffer's max number of requests
    bool FlushBuffer =
        ( i > 0 && !(( i + 1 ) % N) ) ||
        ( i == ElementCount - 1 );

    QueryBuffer += "INSERT INTO mytable (myfield) VALUES (" + Element[ i ] + ");";

    if( FlushBuffer )
    {
        MySQLWrapper.SendRequest( QueryBuffer );
        QueryBuffer.assign("");
    }
}

MySQLWrapper.Commit();
The implementation of SendRequest(string Request) is basically:

void SendRequest(string Request)
{
    mysql_query( SQLSocket, Request.c_str() );
}
However, when committing the transaction, the transaction turns out to have been broken: MySQL indicates that the state is incorrect for committing. I have tried doing the same thing but sending the requests one by one, and this error does not happen at the moment of the commit.
So, my 2 questions are:
Do you know why sending multiple requests at a time breaks my transaction?
Do you think that the use of a buffered list of requests can really optimize my code?
Instead of multiple INSERTs, create one INSERT with multiple values. IOW, before the loop, start the buffer with the INSERT INTO mytable (columns) VALUES part, then inside the loop, append a (values) group for each value set.
MySQLWrapper.StartTransaction();

string QueryBuffer = "INSERT INTO mytable (myfield) VALUES ";

// ElementCount is the number of elements parsed from the files
for( int i = 0 ; i < ElementCount ; ++i )
{
    // determines if we have reached the buffer's max number of requests
    bool FlushBuffer =
        ( i > 0 && !(( i + 1 ) % N) ) ||
        ( i == ElementCount - 1 );

    QueryBuffer += "(" + Element[ i ] + ")";

    if( FlushBuffer ) {
        QueryBuffer += ";";
    } else {
        QueryBuffer += ",";
    }

    if( FlushBuffer )
    {
        MySQLWrapper.SendRequest( QueryBuffer );
        // start a fresh INSERT for the next batch
        QueryBuffer.assign("INSERT INTO mytable (myfield) VALUES ");
    }
}

MySQLWrapper.Commit();
The resulting SQL statement will be something like:
INSERT INTO mytable
(myfield)
VALUES
(1),
(2),
(3),
(4);
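As for your first question, why the multi-statement buffer broke the transaction: with the plain MySQL C API, putting several statements into one mysql_query() call only works if the connection was opened with the CLIENT_MULTI_STATEMENTS flag, and every result must then be drained with mysql_next_result() before the next command (including the COMMIT) is sent. A rough sketch, using direct libmysql calls rather than your wrapper:

#include <mysql.h>

MYSQL *conn = mysql_init(nullptr);
mysql_real_connect(conn, "localhost", "user", "password", "db", 0, nullptr,
                   CLIENT_MULTI_STATEMENTS);

if (mysql_query(conn, QueryBuffer.c_str()) == 0) {
    // consume every result set produced by the batch, otherwise the connection
    // is left in a "commands out of sync" state and the COMMIT fails
    do {
        MYSQL_RES *res = mysql_store_result(conn);
        if (res)
            mysql_free_result(res);
    } while (mysql_next_result(conn) == 0);
}

That said, a single multi-row INSERT as shown above avoids the issue entirely and is usually faster than many separate INSERT statements anyway.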
I've downloaded the Google diff library for C++/Qt.
https://code.google.com/archive/p/google-diff-match-patch/
But I don't really understand how to use it for a simple comparing of two strings.
Let's assume I have two QStrings:
QString str1 = "Stackoverflow";
QString str2 = "Stackrflow";
As I understood it, I need to create a dmp object of the diff_match_patch class and then call the method for comparing.
So what do I do to get, for example, "ove" has been deleted from position 5?
Usage is explained in the API wiki and diff_match_patch.h.
The position isn’t contained in the Diff object. To obtain it, you could iterate over the list and calculate the change position:
Unchanged substrings and deletes increment the position by the length of the unchanged/deleted substring.
Insertions do not alter positions in the original string.
Deletes followed by inserts are actually replacements. In that case the insert operation happens at the same position where the delete occurred, so that last delete should not increment the position.
i.e. something like this (untested):
diff_match_patch dmp;
auto diffResult = dmp.diff_main(str1, str2);

int equalLength = 0;
int deleteLength = 0;
int lastDeleteLength = 0; // for undoing the position offset of replacements

for (const auto &diff : diffResult) {
    if (diff.operation == EQUAL) {
        equalLength += diff.text.length();
        lastDeleteLength = 0;
    }
    else if (diff.operation == INSERT) {
        int pos = equalLength + deleteLength - lastDeleteLength;
        qDebug() << diff.toString() << "at position" << pos;
        lastDeleteLength = 0;
    }
    else if (diff.operation == DELETE) {
        qDebug() << diff.toString() << "at position" << equalLength + deleteLength;
        deleteLength += diff.text.length();
        lastDeleteLength = diff.text.length();
    }
}
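With the strings from the question ("Stackoverflow" vs. "Stackrflow"), the diff list should come out as EQUAL "Stack", DELETE "ove", EQUAL "rflow", so the loop above would report the deleted "ove" at position 5, which is exactly the output the question asks for.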
Any idea how to get common keys from a large unordered_multimap? I use file_name (string) as the key and its size (int) as the value. Basically I am scanning a directory for duplicate files using Boost and holding an entry for each file in an unordered_multimap. Once this map is ready, I need to output the common keys (file names) and their sizes as a list of duplicate files.
How to find common keys of an unordered_multimap?
The following code searches for a specific filename, and iterates through all elements with the same key:
std::unordered_multimap<std::string, int> mymulti; // key: filename, value: size
//... fill the multimap
for (auto x = mymulti.find("fileb"); x != mymulti.end() && x->first == "fileb"; x++) {
    std::cout << x->second << " "; // do something!
}
std::cout << "\n";
How to iterate through an unordered_multimap, grouping processing by common keys?
The following code iterates through the whole map and, for each key, processes the related values in a subloop:
for (auto i = mymulti.begin(); i != mymulti.end(); ) { // iterate through the multimap
    auto group = i->first;       // start a new group
    std::cout << group << "={";  // start doing something for the group
    do {
        std::cout << i->second << " "; // do something for every value of the group
    } while (++i != mymulti.end() && i->first == group); // until the key changes
    std::cout << "}\n";          // end something for the group
}
// end overall processing of the map
How to find duplicate files (same key and same value)?
Using the building blocks above, for every filename you could create a temporary unordered_map keyed by size, checking whether the element is already in the temporary map (a duplicate) or adding it (not a duplicate).
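A sketch of that idea (untested), reusing the grouping loop from above with a temporary map of sizes per filename group:

// For each filename group, count how often each size occurs; a size seen more
// than once within the same group means duplicate files.
for (auto i = mymulti.begin(); i != mymulti.end(); ) {
    const std::string group = i->first;
    std::unordered_map<int, int> sizesSeen; // size -> occurrences in this group
    do {
        if (++sizesSeen[i->second] == 2)
            std::cout << "duplicate: " << group << " (" << i->second << " bytes)\n";
    } while (++i != mymulti.end() && i->first == group);
}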
If the whole purpose of your unordered_multimap is to process these duplicates, then it would be better, from the start, to build an unordered_map with filenames as keys, and as value a multimap with the size as sorted key and, as mapped values, the other elements you collect on the file (full path? inode? whatever):

std::unordered_map<std::string, std::multimap<long, filedata>> myspecialmap;
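For example, here is a self-contained sketch of how such a map could be filled and then scanned for duplicates; the filedata struct, the file names, and the sizes are just placeholders:

#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <unordered_map>

struct filedata { std::string fullpath; }; // whatever else you collect per file

int main() {
    // filename -> (size -> file details)
    std::unordered_map<std::string, std::multimap<long, filedata>> myspecialmap;

    // filled while scanning the directory, e.g.:
    myspecialmap["a.txt"].insert({42, {"/tmp/a.txt"}});
    myspecialmap["a.txt"].insert({42, {"/home/user/a.txt"}});
    myspecialmap["b.txt"].insert({10, {"/tmp/b.txt"}});

    // same filename and same size => duplicate files
    for (const auto &entry : myspecialmap) {
        for (auto it = entry.second.begin(); it != entry.second.end(); ) {
            auto range = entry.second.equal_range(it->first);
            auto copies = std::distance(range.first, range.second);
            if (copies > 1)
                std::cout << entry.first << " (" << it->first << " bytes): "
                          << copies << " copies\n";
            it = range.second;
        }
    }
}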
I have to execute an SQL query against Postgres with the following code. The query returns a huge number of rows (40M or more) and has 4 integer fields. When I use a workstation with 32 GB everything works, but on a 16 GB workstation the query is very slow (due to swapping, I guess). Is there any way to tell the C++ code to load the rows in batches, without waiting for the entire dataset? With Java I never had these issues, probably thanks to a better JDBC driver.
try {
    work W(*Conn);
    result r = W.exec(sql[sqlLoad]);
    W.commit();

    for (int rownum = 0; rownum < r.size(); ++rownum) {
        const result::tuple row = r[rownum];
        vid1 = row[0].as<int>();
        vid2 = row[1].as<int>();
        vid3 = row[2].as<int>();
        .....
    }
} catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
}
I am using PostgreSQL 9.3 and there I see this http://www.postgresql.org/docs/9.3/static/libpq-single-row-mode.html, but I do not know how to use it in my C++ code. Your help will be appreciated.
EDIT: This query runs only once, to create the necessary main-memory data structures. As such, it cannot be optimized. Also, pgAdminIII could easily fetch those rows, in under one minute, on the same (or smaller-RAM) PCs. Also, Java could easily handle twice the number of rows (with Statement.setFetchSize(), http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize%28int%29), so it is really an issue with the libpqxx library and not an application issue. Is there a way to enforce this functionality in C++, without explicitly setting limits / offsets manually?
Use a cursor?
See also FETCH. The cursor will use it for you behind the scenes, I gather, but just in case, you can always code the streaming retrieval manually with the FETCH.
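If you want to skip the stateless_cursor wrapper entirely, here is a rough sketch of coding it by hand with DECLARE/FETCH through plain exec() calls; the cursor name, query, and batch size are made up for illustration:

pqxx::work txn(conn);
txn.exec("DECLARE mycur CURSOR FOR SELECT id, name FROM mytable");

while (true) {
    pqxx::result chunk = txn.exec("FETCH 10000 FROM mycur");
    if (chunk.empty())
        break; // no rows left
    for (const auto &row : chunk) {
        // process row[0], row[1], ...
    }
}

txn.exec("CLOSE mycur");
txn.commit();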
To answer my own question, I adapted How to use pqxx::stateless_cursor class from libpqxx?
try {
    work W(*Conn);
    pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
            cursor(W, sql[sqlLoad], "mycursor", false);

    /* Assume you know total number of records returned */
    for (size_t idx = 0; idx < countRecords; idx += 100000) {
        /* Fetch 100,000 records at a time */
        result r = cursor.retrieve(idx, idx + 100000);
        for (int rownum = 0; rownum < r.size(); ++rownum) {
            const result::tuple row = r[rownum];
            vid1 = row[0].as<int>();
            vid2 = row[1].as<int>();
            vid3 = row[2].as<int>();
            .............
        }
    }
} catch (const std::exception &e) {
    std::cerr << e.what() << std::endl;
}
Cursors are a good place to start. Here's another cursor example, using a do-while() loop:
const std::string connStr("user=" + opt::dbUser + " password=" + opt::dbPasswd +
                          " host=" + opt::dbHost + " dbname=" + opt::dbName);
pqxx::connection conn(connStr);
pqxx::work txn(conn);

std::string selectString = "SELECT id, name FROM table_name WHERE condition";

pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
        cursor(txn, selectString, "myCursor", false);

// cursor variables
size_t idx = 0;       // starting location
size_t step = 10000;  // number of rows for each chunk
pqxx::result result;

do {
    // get the next cursor chunk and update the index
    result = cursor.retrieve( idx, idx + step );
    idx += step;

    size_t records = result.size();
    std::cout << idx << ": records pulled = " << records << std::endl;

    for ( const auto &row : result ) {
        // iterate over cursor rows
    }
}
while ( result.size() == step ); // if result.size() != step, we're on our last loop

std::cout << "Done!" << std::endl;
I'm iterating over approximately 33 million rows in my application. In addition to using a cursor, I used the following approach:

1. Split the data into smaller chunks. For me, that was using bounding boxes to grab data in a given area.
2. Construct a query to grab that chunk, and use a cursor to iterate over it.
3. Store the chunks on the heap and free them once you're done processing the data from a given chunk (see the sketch below).
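A rough sketch of what that looks like; the table, columns, and boundingBoxes container are hypothetical:

// One query + cursor per bounding box; each chunk is processed and released
// before the next one is fetched.
for (const auto &box : boundingBoxes) {
    std::ostringstream q;
    q << "SELECT id, x, y FROM points"
      << " WHERE x BETWEEN " << box.xmin << " AND " << box.xmax
      << " AND y BETWEEN " << box.ymin << " AND " << box.ymax;

    pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
            cursor(txn, q.str(), "chunk_cursor", false);

    // retrieve and process this chunk as in the examples above, then let the
    // chunk's data go out of scope (or free it) before moving on.
}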
I know this is a very late answer to your question, but I hope this might help someone!
I am using C++ with ADO to connect to a MySQL database, using the standard ADO/C++ method to create the connection; recordset is the pointer to the retrieved records:

_RecordsetPtr recordset;
recordset->Open("Select * from table", p_connection_.GetInterfacePtr(), adOpenForwardOnly, adLockReadOnly, adCmdText);

My concern is that if the table contains too many records and I query all of them, it will consume a lot of memory.
I want to retrieve only, say, 100 records at a time and process them. Is that possible? The table does not contain an id or index as an attribute, so "Select * from table where id >= 1 and id <= 100" does not work.
You will want to use limits on the query and cycle through them.
// SELECT * FROM table LIMIT 100 OFFSET 0, then OFFSET 100, 200, ...
int limit = 100, offset;
std::string query;
_RecordsetPtr recordset, countRecordset;

countRecordset->Open("SELECT COUNT(*) FROM table", p_connection_.GetInterfacePtr(),
                     adOpenForwardOnly, adLockReadOnly, adCmdText);
// read the single COUNT(*) value (adjust to however your ADO wrappers expose field values)
long total = (long)countRecordset->Fields->GetItem(0L)->GetValue();

for(int i = 0; i < total / limit + 1; i++)
{
    offset = limit * i;

    std::stringstream sstm;
    sstm << "SELECT * FROM table LIMIT " << limit << " OFFSET " << offset;
    query = sstm.str();

    recordset->Open(query.c_str(), p_connection_.GetInterfacePtr(),
                    adOpenForwardOnly, adLockReadOnly, adCmdText);
    // suggest passing the recordset to a function to do whatever you want with it here
}
Note that if you are not using a database that starts its records off at 1 you will have to modify that algorithm a bit.