MySQL transaction and buffered list of requests - c++

I have C++ code that parses files and updates a MySQL database according to the content of these files.
I run my code on Windows 10, with MySQL 5.7, and my database uses the InnoDB engine. The MySQL calls are performed through a wrapper of my own around libmysql.
In an attempt to optimize this code, I append the update requests in a buffer until reaching the maximum size of the buffer, then send the whole buffer (containing N updates) at the same time. This whole process is done inside a single transaction.
Here is how my code looks:
MySQLWrapper.StartTransaction();
string QueryBuffer = "";
// Element count is the number of elements parsed from the files
for( int i = 0 ; i < ElementCount ; ++i )
{
bool FlushBuffer =
( i> 0 && ! (( i + 1 ) % N) ) ||
( i == ElementCount - 1 ); // determines if we have reached the buffer max number of requests
QueryBuffer += "INSERT INTO mytable (myfield) VALUES (" + Element[ i ] + ");";
if( FlushBuffer )
{
MySQLWrapper.SendRequest( QueryBuffer );
QueryBuffer.assign("");
}
}
MySQLWrapper.Commit();
The implementation of SendRequest(string Request ) would basically be:
void SendRequest(string Request)
{
mysql_query( SQLSocket, Request.c_str() );
}
However, by the time I commit, the transaction has been broken: MySQL reports that the state is incorrect for committing. When I do the same thing but send the requests one by one, the error does not occur at commit time.
So, my 2 questions are:
Do you know why sending multiple requests at a time breaks my transaction?
Do you think that the use of a buffered list of requests can really optimize my code?

Instead of multiple INSERTs, create one INSERT with multiple values. In other words, before the loop, have the INSERT INTO TABLE (columns), then inside the loop, append (values) for each value set.
MySQLWrapper.StartTransaction();
const string InsertPrefix = "INSERT INTO mytable (myfield) VALUES ";
string QueryBuffer = InsertPrefix;
// ElementCount is the number of elements parsed from the files
for( int i = 0 ; i < ElementCount ; ++i )
{
    bool FlushBuffer =
        ( i > 0 && ! (( i + 1 ) % N) ) ||
        ( i == ElementCount - 1 ); // true when the buffer holds N value sets or this is the last element
    QueryBuffer += "(" + Element[ i ] + ")";
    // End the statement when flushing; otherwise separate value sets with a comma
    QueryBuffer += FlushBuffer ? ";" : ",";
    if( FlushBuffer )
    {
        MySQLWrapper.SendRequest( QueryBuffer );
        QueryBuffer = InsertPrefix; // the next batch needs its own INSERT prefix
    }
}
MySQLWrapper.Commit();
The resulting SQL statement will be something like:
INSERT INTO mytable
(myfield)
VALUES
(1),
(2),
(3),
(4);
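The batching idea above can also be sketched as a small standalone function that only builds the SQL strings, which makes it easy to check the output before sending anything. This is a minimal sketch under assumptions: BuildBatchedInserts is a hypothetical helper name, the table and column names are taken from the question, and the values are assumed to be numeric literals or already escaped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns one multi-row INSERT statement per batch of
// at most batchSize values. Values are assumed to be numeric literals or
// already escaped; no escaping is done here.
std::vector<std::string> BuildBatchedInserts(const std::vector<std::string>& values,
                                             std::size_t batchSize)
{
    assert(batchSize > 0);
    const std::string prefix = "INSERT INTO mytable (myfield) VALUES ";
    std::vector<std::string> statements;

    for (std::size_t start = 0; start < values.size(); start += batchSize)
    {
        const std::size_t end = std::min(start + batchSize, values.size());
        std::string sql = prefix;              // every batch is a complete statement
        for (std::size_t i = start; i < end; ++i)
        {
            if (i != start)
                sql += ",";                    // separator between value sets
            sql += "(" + values[i] + ")";
        }
        statements.push_back(sql);
    }
    return statements;
}
```

Each returned string is a single statement, so it could be passed to something like the question's SendRequest one at a time inside the transaction.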

What would be the best way to insert and remove elements of a vector by iterating from the end

I'm currently refactoring some really old code and I would like to make it as C++11 as possible (we don't use C++14 or 17 yet, but it's coming), mostly as an academic example.
The code is looping from the end of a vector and displaces some of the elements after the current element of the vector.
Here is the code I came up with ( removed a lot of irrelevant parts ):
const size_t N = m_ObjVec.size();
size_t idx = N-1;
for ( ; ; )
{
CObj& curObj = *m_ObjVec[ idx ];
size_t otherIdx = FindOther( idx, N );
if ( c_NPOS != otherIdx )
{
CObj* pOtherObj = m_ObjVec[ otherIdx ];
// insert pOther after idx
m_ObjVec.insert( m_ObjVec.begin()+idx+1, pOtherObj );
// erase pOther from vector
m_ObjVec.erase( m_ObjVec.begin()+otherIdx );
}
else
{
if ( 0 == idx-- ) break;
}
}
I tried using reverse iterators with little success. I cannot use them with the erase / remove_if idiom nor can I use them in a loop where I insert.
What I'd like to write is basically "reverse_move_after_if".
NOTE
It has to loop from the end, only once. It looks a lot like a sort, but it's not if you take those constraints into account.
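One possible way to express the insert-then-erase pair as a single step is std::rotate, which shifts a sub-range by one position without changing the vector's size. The following is only a sketch, not the original code's method: move_after is a hypothetical helper name, and it assumes both indices are valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: moves the element at index `from` so that it ends up
// immediately after the element currently at index `pos`.
template <class T>
void move_after(std::vector<T>& v, std::size_t from, std::size_t pos)
{
    assert(from < v.size() && pos < v.size());
    auto first = v.begin();
    if (from < pos)
    {
        // Rotate [from, pos] left by one: the moved element lands at index pos,
        // and the element that was at pos shifts down to pos - 1.
        std::rotate(first + from, first + from + 1, first + pos + 1);
    }
    else if (from > pos + 1)
    {
        // Rotate [pos + 1, from] right by one: the moved element lands at pos + 1.
        std::rotate(first + pos + 1, first + from, first + from + 1);
    }
    // from == pos or from == pos + 1: nothing to do.
}
```

Because the size never changes, no reallocation happens during the move, and the loop index can be adjusted by at most one after each call.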

C++, Postgres , libpqxx huge query

I have to execute an SQL query against Postgres with the following code. The query returns a huge number of rows (40M or more), each with 4 integer fields. On a workstation with 32 GB of RAM everything works, but on a 16 GB workstation the query is very slow (due to swapping, I guess). Is there any way to tell the C++ code to load rows in batches, without waiting for the entire dataset? With Java I never had these issues, probably thanks to the better JDBC driver.
try {
work W(*Conn);
result r = W.exec(sql[sqlLoad]);
W.commit();
for (int rownum = 0; rownum < r.size(); ++rownum) {
const result::tuple row = r[rownum];
vid1 = row[0].as<int>();
vid2 = row[1].as<int>();
vid3 = row[2].as<int>();
.....
}
} catch (const std::exception &e) {
std::cerr << e.what() << std::endl;
}
I am using PostgreSQL 9.3 and I see this http://www.postgresql.org/docs/9.3/static/libpq-single-row-mode.html, but I do not know how to use it in my C++ code. Your help will be appreciated.
EDIT: This query runs only once, for creating the necessary main memory data structures. As such, it cannot be optimized. Also, pgAdminIII could easily fetch those rows in under one minute on the same PCs (or on PCs with less RAM). Also, Java could easily handle twice the number of rows (with Statement.setFetchSize() http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize%28int%29). So, it is really an issue with the libpqxx library and not an application issue. Is there a way to enforce this functionality in C++, without explicitly setting limits / offsets manually?
Use a cursor?
See also FETCH. The cursor will use it for you behind the scenes, I gather, but just in case, you can always code the streaming retrieval manually with FETCH.
To answer my own question, I adapted How to use pqxx::stateless_cursor class from libpqxx?
try {
work W(*Conn);
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor(W, sql[sqlLoad], "mycursor", false);
/* Assume you know total number of records returned */
for (size_t idx = 0; idx < countRecords; idx += 100000) {
/* Fetch 100,000 records at a time */
result r = cursor.retrieve(idx, idx + 100000);
for (int rownum = 0; rownum < r.size(); ++rownum) {
const result::tuple row = r[rownum];
vid1 = row[0].as<int>();
vid2 = row[1].as<int>();
vid3 = row[2].as<int>();
.............
}
}
} catch (const std::exception &e) {
std::cerr << e.what() << std::endl;
}
Cursors are a good place to start. Here's another cursor example, using a do-while() loop:
const std::string connStr("user=" + opt::dbUser + " password=" + opt::dbPasswd + " host=" + opt::dbHost + " dbname=" + opt::dbName);
pqxx::connection conn(connStr);
pqxx::work txn(conn);
std::string selectString = "SELECT id, name FROM table_name WHERE condition";
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor(txn, selectString, "myCursor", false);
//cursor variables
size_t idx = 0; //starting location
size_t step = 10000; //number of rows for each chunk
pqxx::result result;
do{
//get next cursor chunk and update the index
result = cursor.retrieve( idx, idx + step );
idx += step;
size_t records = result.size();
cout << idx << ": records pulled = " << records << endl;
for( const auto& row : result ){
//iterate over cursor rows
}
}
while( result.size() == step ); //if the result.size() != step, we're on our last loop
cout << "Done!" << endl;
I'm iterating over approximately 33 million rows in my application. In addition to using a cursor, I used the following approach:
Split the data into smaller chunks. For me, that was using bounding boxes to grab data in a given area.
Construct a query to grab that chunk, and use a cursor to iterate over it.
Store the chunks on the heap and free them once you're done processing the data from a given chunk.
I know this is a very late answer to your question, but I hope this might help someone!

How to use pqxx::stateless_cursor class from libpqxx?

I'm learning libpqxx, the C++ API to PostgreSQL. I'd like to use the pqxx::stateless_cursor class, but 1) I find the Doxygen output unhelpful in this case, and 2) the pqxx.org website has been down for some time now.
Anyone know how to use it?
I believe this is how I construct one:
pqxx::stateless_cursor <pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor( work, "SELECT * FROM mytable", ?, ? );
The last two parameters are called cname and hold, but are not documented.
And once the cursor is created, how would I go about using it in a for() loop to get each row, one at a time?
Thanks @Eelke for the comments on cname and hold.
I figured out how to make pqxx::stateless_cursor work. I have no idea if there is a cleaner or more obvious way but here is an example:
pqxx::work work( conn );
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor( work, "SELECT * FROM mytable", "mycursor", false );
for ( size_t idx = 0; true; idx ++ )
{
pqxx::result result = cursor.retrieve( idx, idx + 1 );
if ( result.empty() )
{
// nothing left to read
break;
}
// Do something with "result" which contains a single
// row in this example since we told the cursor to
// retrieve row #idx (inclusive) to idx+1 (exclusive).
std::cout << result[ 0 ][ "name" ].as<std::string>() << std::endl;
}
I do not know the pqxx library but based on the underlying DECLARE command of postgresql I would guess
That cname is the name of the cursor, so it can be anything postgresql normally accepts as a cursor name.
That hold refers to the WITH HOLD option of a cursor, from the docs:
WITH HOLD specifies that the cursor can continue to be used after the
transaction that created it successfully commits. WITHOUT HOLD
specifies that the cursor cannot be used outside of the transaction
that created it. If neither WITHOUT HOLD nor WITH HOLD is specified,
WITHOUT HOLD is the default.

Functional Code Breaks When Used Twice

I'm still working on my Field class, and tried to improve my piss-poor insertion/erase performance.
However, the new function works once, then breaks catastrophically when I use it a second time.
This is the code:
template <class T>
T *Field<T>::insert(const T *pPos, const T& data)
{
// Special case: field is empty. insert should still succeed.
// Special case: Pointing to one past the end. insert should still succeed
if( empty() || pPos == last() )
{
this->push_back(data);
return (this->last() - 1);
}
/* Explanation: Find the cell before which to insert the new value. Push_back the
new value, then keep swapping cells until reaching *pPos and swapping it
with data. The while condition then fails, we exit, insert successful. */
T *p = ( std::find( this->first(), this->last(), *pPos ));
if( p != last() )
{
this->push_back(data);
T *right = (this->last() - 1);
T *left = (this->last() - 2);
while( *pPos != data )
std::iter_swap( left--, right-- );
// pPos *has* to be the destination of new value, so we simply return param.
return const_cast<T*>(pPos);
}
else
throw std::range_error("Not found");
}
Calling code from main
// Field already has push_back()ed values 10, 20, 30.
field->insert( &(*field)[2], 25 ); // field is a ptr (no reason, just bad choice)
Produces this output when printed on the console.
Field: 10 20 30 // Original Field
25 // Function return value
Field: 10 20 25 30 // Correct insertion.
New calling code from main
// Field already has push_back()ed values 10, 20, 30
field->insert( &(*field)[2], 25 );
field->insert( &(*field)[3], 35 );
Produces this output when printed on the console.
Field: 10 20 30
25
35
-4.2201...e+37, 10, 15, 20, 30
Windows has triggered a breakpoint in Pg_1.exe.
This may be due to a corruption in the heap (oh shit).
No symbols are loaded for any call stack frame.
The source code cannot be displayed.
The console then proceeds to never shutdown again until I close VSC++08 itself.
What? Why? How? What is my code doing!?
Additional Info
The Field has a size of three before the push, and a capacity of four. After two insertions, the Field is correctly increased to have a capacity of 8 (doubled), and stores five elements.
It doesn't matter where I insert my second element with insert(), it will fail the exact same way. Same output, even same number (I think) at the first cell.
Additional Code
Push_Back()
Note: This code was not changed during my refactoring. This function has always worked, so I highly doubt that this will be the problem-cause.
/* FieldImpl takes care of memory management. it stores the values v_, vused_,
and vsize_. Cells in the Field are only constructed when needed through a
placement new that is done through a helper function. */
template <class T>
void Field<T>::push_back(const T& data)
{
if( impl_.vsize_ == impl_.vused_ )
{
Field temp( (impl_.vsize_ == 0) ? 1
: (impl_.vsize_ * 2) );
while( temp.impl_.vused_ != this->impl_.vused_ )
temp.push_back( this->impl_.v_[temp.size()] );
temp.push_back(data);
impl_.Swap(temp.impl_);
}
else
{
// T *last() const { return &impl_.v_[impl_.vused_]; }
// Returns pointer to one past the last constructed block.
// variant: T *last() const { return impl_.v_; }
Helpers::construct( last(), data );
++impl_.vused_;
}
}
// ...
if( p != last() )
{
this->push_back(data);
After this line pPos may not be a valid pointer anymore.
The console then proceeds to never shutdown again until I close VSC++08 itself.
Tried clicking the Stop button in the debugger?
From the debugger, and from ybungalobill's answer, it is possible to see that pPos is invalidated after the
if( p != last() )
{
this->push_back(data);
part of the code. If push_back resizes the array, the pointer is invalidated. To work around this, I stored const T pos = *pPos before the push and removed every use of pPos after the push.
Updated code:
const T pos = *pPos;
T *p = ( std::find( this->first(), this->last(), pos ) );
if( p != last() )
{
this->push_back(data);
p = ( std::find( this->first(), this->last(), pos ) );
T *right = (this->last() - 1);
T *left = (this->last() - 2);
while( *p != data )
std::iter_swap( left--, right-- );
return const_cast<T*>(p);
}
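The updated code looks the position up again by value, which relies on the values being unique. An alternative, sketched here under assumptions, is to remember the position as an index before the push, since an index stays meaningful after a reallocation while a pointer does not. std::vector stands in for Field, and insert_before is a hypothetical name; for Field the index would be something like pPos - first().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch with std::vector standing in for Field<T> (an assumption).
// Inserts data before index pos and returns the index of the new element.
template <class T>
std::size_t insert_before(std::vector<T>& v, std::size_t pos, const T& data)
{
    assert(pos <= v.size());
    // push_back may reallocate; pos is an index, so it is unaffected.
    v.push_back(data);
    // Rotate the new last element down to index pos; everything that was
    // in [pos, old end) shifts up by one.
    std::rotate(v.begin() + pos, v.end() - 1, v.end());
    return pos;
}
```

With the values from the question, inserting 25 before index 2 and then 35 before index 3 into 10 20 30 would give 10 20 25 35 30.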

hashkey collision when removing C++

To find each "symbol" I want to remove from my hashTable, I regenerate the hash key it was inserted at. However, the problem I am seeing in my remove function is this: when I need to remove a symbol that was placed after a collision, my while loop condition tests false when I do not want it to.
bool hashmap::get(char const * const symbol, stock& s) const
{
int hash = this->hashStr( symbol );
while ( hashTable[hash].m_symbol != NULL )
{ // try to find a match for the stock associated with the symbol.
if ( strcmp( hashTable[hash].m_symbol , symbol ) == 0 )
{
s = hashTable[hash]; // copy the matching stock into the output parameter
return true;
}
++hash %= maxSize;
}
return false;
}
bool hashmap::put(const stock& s, int& usedIndex, int& hashIndex, int& symbolHash)
{
hashIndex = this->hashStr( s.m_symbol ); // Get remainder, Insert at that index.
symbolHash = (int&)s.m_symbol;
usedIndex = hashIndex;
while ( hashTable[hashIndex].m_symbol != NULL ) // collision found
{
++usedIndex %= maxSize; // if necessary wrap index around
if ( hashTable[usedIndex].m_symbol == NULL )
{
hashTable[usedIndex] = s;
return true;
}
else if ( strcmp( hashTable[usedIndex].m_symbol , s.m_symbol ) == 0 )
{
return false; // prevent duplicate entry
}
}
hashTable[hashIndex] = s; // insert if no collision
return true;
}
// What if I need to remove an index i generate?
bool hashmap::remove(char const * const symbol)
{
int hashVal = this->hashStr( symbol );
while ( hashTable[hashVal].m_symbol != NULL )
{
if ( strcmp( hashTable[hashVal].m_symbol, symbol ) == 0 )
{
stock temp = hashTable[hashVal]; // we can save it
hashTable[hashVal].m_symbol = NULL;
return true;
}
++hashVal %= maxSize; // wrap around if needed
} // go to the next cell, meaning there was a previous collision
return false;
}
int hashmap::hashStr(char const * const str)
{
size_t length = strlen( str );
int hash = 0;
for ( unsigned i = 0; i < length; i++ )
{
hash = 31 * hash + str[i];
}
return hash % maxSize;
}
What would I need to do to remove a "symbol" from my hashTable from a previous collision?
I am hoping the cause is not the Java-style hash formula directly above.
It looks like you are implementing a hash table with open addressing, is that right? Deleting is a little tricky in that scheme. See http://www.maths.lse.ac.uk/Courses/MA407/del-hash.pdf:
"Deletion of keys is problematic with open addressing: If there are two colliding keys x and y with h(x) = h(y), and key x is inserted before key y, and one wants to delete key x, this cannot simply be done by marking T[h(x)] as FREE, since then y would no longer be found. One possibility would be to mark T[h(x)] as DELETED (another special entry), which is skipped when searching for a key. A table place marked as DELETED may also be re-used for storing another key z that one wants to insert if one is sure that this key z is not already in the table (i.e., by reaching the end of the probe sequence for key z and not finding it). Such re-use complicates the insertion method. Moreover, places with DELETED keys fill the table."
What you need to do is create a dummy sentinel value that represents a "deleted" item. When you insert a new value into the table, you need to check to see if an element is NULL or "deleted". If a slot contains this sentinel "deleted" value or the slot is NULL, then the slot is a valid slot for insertion.
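The sentinel idea can be sketched as a small linear-probing table with three slot states. This is an illustration under assumptions rather than a rewrite of the question's hashmap: SymbolTable, its members and the State enum are hypothetical names, it stores only the symbol strings, and the capacity is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal open-addressing table with "deleted" markers (hypothetical names).
class SymbolTable {
public:
    explicit SymbolTable(std::size_t capacity) : slots_(capacity) { assert(capacity > 0); }

    bool put(const std::string& symbol) {
        const std::size_t none = slots_.size();
        std::size_t reusable = none;          // first Empty or Deleted slot on the probe path
        std::size_t i = hash(symbol);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (slots_[i].state == State::Used) {
                if (slots_[i].symbol == symbol) return false;   // duplicate entry
            } else {
                if (reusable == none) reusable = i;
                if (slots_[i].state == State::Empty) break;     // end of the probe sequence
            }
        }
        if (reusable == none) return false;   // table is full
        slots_[reusable].symbol = symbol;
        slots_[reusable].state = State::Used;
        return true;
    }

    bool contains(const std::string& symbol) const { return find(symbol) != slots_.size(); }

    bool remove(const std::string& symbol) {
        const std::size_t i = find(symbol);
        if (i == slots_.size()) return false;
        slots_[i].symbol.clear();
        slots_[i].state = State::Deleted;     // marker, not Empty: later keys stay reachable
        return true;
    }

private:
    enum class State { Empty, Used, Deleted };
    struct Slot { std::string symbol; State state = State::Empty; };

    std::size_t hash(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = 31 * h + c;   // same 31-multiplier scheme as hashStr
        return h % slots_.size();
    }

    // Returns the slot index of symbol, or slots_.size() if it is absent.
    std::size_t find(const std::string& symbol) const {
        std::size_t i = hash(symbol);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (slots_[i].state == State::Empty) break;         // only Empty stops a search
            if (slots_[i].state == State::Used && slots_[i].symbol == symbol) return i;
        }
        return slots_.size();
    }

    std::vector<Slot> slots_;
};
```

The key design choice is that a search skips over Deleted slots and stops only at Empty ones, while an insert may reuse the first Deleted slot once it has confirmed the key is not further along the probe sequence.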
That said, if you are writing this code for production, you should consider using boost::unordered_map (or std::unordered_map if C++11 is available) instead of rolling your own hash map implementation. If this is for schoolwork... well, good luck.