Insert Data into a Single MySQL Table Concurrently from Multiple Threads in C++

I am doing an experiment that concurrently inserts data into a MySQL table from multiple threads.
Here is the relevant part of the C++ code.
bool query_thread(const char* cmd, MYSQL* con) {
    if (!query(cmd, con)) {
        return 0;
    }
    return 1;
}

int main() {
    ........
    if (mysql_query(m_con, "CREATE TABLE tb1 (model INT(32), handle INT(32))") != 0) {
        return 0;
    }
    thread thread1(query_thread, "INSERT INTO tb1 VALUES (1,1)", m_con);
    thread thread2(query_thread, "INSERT INTO tb1 VALUES (2,2)", m_con);
    thread thread3(query_thread, "INSERT INTO tb1 VALUES (3,3)", m_con);
    thread1.join();
    thread2.join();
    thread3.join();
}
But the following MySQL error message is issued:
error cmd: INSERT INTO tb1 VALUES (1,1)
Lost connection to MySQL server during query
Segmentation fault
My questions are as follows.
Is it because MySQL cannot accept concurrent insertions, or is this a bad use of multi-threading?
Does multi-threaded insertion as above help speed up the program? I understand the best approaches are multiple rows per INSERT and LOAD DATA INFILE, but I just want to know whether this approach can help.

Each thread must have:
its own database connection
its own transaction
its own cursor
This, however, will not make your inserts much faster. In short, the InnoDB log (journal) is essentially serial, which limits the server's total insert rate. Read the MySQL Performance Blog (Percona / MariaDB) for details. Certainly there are parameters to tweak, and there seem to have been recent advances.
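For what it's worth, a minimal sketch of the one-connection-per-thread approach with the MySQL C API might look like the following (host, user, password, and database names are placeholders, not taken from the question):

#include <mysql.h>
#include <thread>
#include <cstdio>

// Placeholder connection parameters.
static const char* HOST = "localhost";
static const char* USER = "user";
static const char* PASS = "password";
static const char* DB   = "test";

// Each thread opens, uses, and closes its own MYSQL handle.
void insert_worker(const char* cmd) {
    MYSQL* con = mysql_init(nullptr);
    if (!mysql_real_connect(con, HOST, USER, PASS, DB, 0, nullptr, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(con));
        mysql_close(con);
        return;
    }
    if (mysql_query(con, cmd) != 0) {
        fprintf(stderr, "query failed: %s\n", mysql_error(con));
    }
    mysql_close(con);
}

int main() {
    // Initialize the client library once, before any threads are spawned.
    mysql_library_init(0, nullptr, nullptr);

    std::thread t1(insert_worker, "INSERT INTO tb1 VALUES (1,1)");
    std::thread t2(insert_worker, "INSERT INTO tb1 VALUES (2,2)");
    std::thread t3(insert_worker, "INSERT INTO tb1 VALUES (3,3)");
    t1.join();
    t2.join();
    t3.join();

    mysql_library_end();
    return 0;
}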

Related

MySQL select for update, many threads time out

I have a table in MySQL database that holds some "prepared" jobs.
CREATE TABLE `ww_jobs_for_update` (
    `id` bigint(11) NOT NULL AUTO_INCREMENT,
    `status` int(11) NOT NULL,
    `inc` int(11) NOT NULL,
    PRIMARY KEY (`id`)
)
Now I have a C++1y multithreaded application, where each thread goes to the table, selects a job where status=0 (not completed), does some computation, and sets status=1 upon completion.
The problem is that many threads will acquire a "job row" concurrently so some locking in the database has to take place.
The C++ method that locks/updates/commits is the following:
connection = ...;  // borrow a connection from the pool
std::unique_ptr<sql::Statement> statement(connection->sql_connection_->createStatement());
connection->sql_connection_->setAutoCommit(false);
//statement->execute("START TRANSACTION");
std::unique_ptr<sql::ResultSet> rs(statement->executeQuery(
    "select id from ww_jobs_for_update where status=0 ORDER BY id LIMIT 1 FOR UPDATE"));
if (rs->next()) {
    db_id = rs->getInt64(1);
    DEBUG << "Unlock Fetched: " << db_id;
}
rs->close();
std::stringstream ss;
ss << "update ww_jobs_for_update set status=1 where id=" << db_id;
statement->execute(ss.str());
//statement->execute("COMMIT;");
connection->sql_connection_->commit();
// release the connection to the pool
But this approach does not seem to be efficient. I always get back
ErrorCode: 1205,SQLState: HY000. Details:
(a lock wait timeout) from a lot of threads, especially when the load increases.
Why am I getting this back? What is the most efficient way to do this? Hard consistency is a requirement.
In my experience, the best way to handle this task is to use Redis queues.
Locking rows with SELECT ... FOR UPDATE tends to hang the database when a multi-threaded application is running against it.
I would advise you to install Redis, write some scripts to create queues from the data in the tables, and rewrite your program to consume those Redis queues for the task.
A Redis queue will not hand the same value to different threads, so you get uniqueness without any locks in the database, and your scripts will run fast.
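As a rough illustration of the queue idea (the answer does not name a client library; hiredis and the queue name "jobs" are assumptions here), popping from a Redis list is atomic, so no two workers receive the same job id:

#include <hiredis/hiredis.h>
#include <cstdio>

int main() {
    redisContext* c = redisConnect("127.0.0.1", 6379);
    if (c == nullptr || c->err) return 1;

    // Producer side: enqueue the prepared job ids from ww_jobs_for_update.
    freeReplyObject(redisCommand(c, "LPUSH jobs 42"));

    // Worker side: each thread/process pops its own id; RPOP never hands
    // the same element to two callers.
    redisReply* reply = (redisReply*)redisCommand(c, "RPOP jobs");
    if (reply && reply->type == REDIS_REPLY_STRING) {
        printf("got job %s\n", reply->str);
    }
    freeReplyObject(reply);
    redisFree(c);
    return 0;
}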
Can you make your transaction durations shorter? Here's what I mean.
You have status values of 0 for "waiting" and 1 for "complete". Use the status value 2 (or -1, or whatever you choose) to mean "working".
Then when a worker thread grabs a job to do from the table, it will do this (pseudo-SQL).
BEGIN TRANSACTION
SELECT id FROM ww_jobs_for_update WHERE status=0 ORDER BY id LIMIT 1 FOR UPDATE
UPDATE ww_jobs_for_update SET status=2 WHERE id = <db_id>
COMMIT
Now, your thread has taken a job and released the transaction lock. When the job is done you simply do this to mark it done, with no transaction needed.
UPDATE ww_jobs_for_update SET status=1 WHERE id = <db_id>
There's an even simpler way to do this if you can guarantee each worker thread has a unique identifier threadId. Put a thread column in the table with a default NULL value. Then, to start processing a job:
UPDATE ww_jobs_for_update
SET thread = threadId, status = 2
WHERE status = 0 AND thread IS NULL
ORDER BY id LIMIT 1;
SELECT id FROM ww_jobs_for_update WHERE thread = threadId AND status=2
When done
UPDATE ww_jobs_for_update
SET thread = NULL, status = 1
WHERE thread = threadId;
Because each thread has a unique threadId, and because individual SQL UPDATE statements are themselves little transactions, you can do this without using any transactions or commits at all.
Both these approaches have the added benefit that you can use a SELECT query to find out which jobs are active. This may allow you to deal with jobs that never completed for whatever reason.
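A sketch of the claim-then-finish pattern with MySQL Connector/C++, mirroring the statement and connection objects from the question (the function names claim_job and finish_job are illustrative):

#include <cppconn/connection.h>
#include <cppconn/statement.h>
#include <cppconn/resultset.h>
#include <memory>
#include <sstream>

// Claim the oldest waiting job inside a short transaction and return its id,
// or -1 if no job is waiting.
long long claim_job(sql::Connection* con) {
    con->setAutoCommit(false);
    std::unique_ptr<sql::Statement> stmt(con->createStatement());
    std::unique_ptr<sql::ResultSet> rs(stmt->executeQuery(
        "SELECT id FROM ww_jobs_for_update WHERE status = 0 "
        "ORDER BY id LIMIT 1 FOR UPDATE"));
    long long db_id = -1;
    if (rs->next()) {
        db_id = rs->getInt64(1);
        std::stringstream ss;
        ss << "UPDATE ww_jobs_for_update SET status = 2 WHERE id = " << db_id;
        stmt->execute(ss.str());
    }
    con->commit();              // the row lock is released here, before the real work
    con->setAutoCommit(true);
    return db_id;
}

// Mark the job done; a single-statement UPDATE needs no explicit transaction.
void finish_job(sql::Connection* con, long long db_id) {
    std::unique_ptr<sql::Statement> stmt(con->createStatement());
    std::stringstream ss;
    ss << "UPDATE ww_jobs_for_update SET status = 1 WHERE id = " << db_id;
    stmt->execute(ss.str());
}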

Explicitly lock and unlock a table using ODBC

I have to perform some calculations with data stored in an MSSQL Server database and then save the results in the same database.
I need to load (part of) a table into C++ data structures, perform a calculation (that can take substantial time), and finally add some rows to the same table.
The problem is that several users can access the database concurrently, and I want the table to be locked from the moment the data is loaded into memory until the results of the calculation are written back to the table.
Using the ODBC SDK, is it possible to explicitly lock and unlock part of a table?
I have tried the following test program, but unfortunately the INSERT statement succeeds before StmtHandle1 is freed:
SQLDriverConnect(ConHandle1, NULL, (SQLCHAR *)"DRIVER={ODBC Driver 13 for SQL Server};"
                                              "SERVER=MyServer;"
                                              "DATABASE=MyDatabase;" /*, ... */);
SQLSetStmtAttr(StmtHandle1, SQL_ATTR_CONCURRENCY, (SQLPOINTER)SQL_CONCUR_LOCK, SQL_IS_INTEGER);
SQLExecDirect(StmtHandle1, (SQLCHAR *)"SELECT * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK)", SQL_NTS);

SQLDriverConnect(ConHandle2, NULL, (SQLCHAR *)"DRIVER={ODBC Driver 13 for SQL Server};"
                                              "SERVER=MyServer;"
                                              "DATABASE=MyDatabase;" /*, ... */);
SQLSetStmtAttr(StmtHandle2, SQL_ATTR_CONCURRENCY, (SQLPOINTER)SQL_CONCUR_LOCK, SQL_IS_INTEGER);
SQLExecDirect(StmtHandle2, (SQLCHAR *)"INSERT INTO [MyTable] VALUES (...)", SQL_NTS);
"unfortunately the INSERT statement succeeds before StmtHandle1 is freed"
By default, SQL Server operates in autocommit mode, i.e. it opens a transaction and commits it for you.
You requested TABLOCKX and the table was locked for the duration of your transaction, but what you want instead is to explicitly open a transaction and not commit or roll it back until you are done with your calculations, i.e. you should use
begin tran; SELECT TOP 1 * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK);
And you don't need to read the whole table; TOP 1 * is sufficient.
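In ODBC terms, one way to get that behaviour is to turn off autocommit on the first connection and commit only after the results are written back. A sketch (a fragment reusing the question's handle names, error checks omitted):

// Open an explicit transaction by disabling autocommit on ConHandle1.
SQLSetConnectAttr(ConHandle1, SQL_ATTR_AUTOCOMMIT,
                  (SQLPOINTER)SQL_AUTOCOMMIT_OFF, 0);

// Acquire and hold the exclusive table lock inside the open transaction.
SQLExecDirect(StmtHandle1,
              (SQLCHAR *)"SELECT TOP 1 * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK)",
              SQL_NTS);

// ... load the data, run the long computation, and INSERT the results
// using statements on the same connection (ConHandle1) ...

// Committing ends the transaction and releases the table lock.
SQLEndTran(SQL_HANDLE_DBC, ConHandle1, SQL_COMMIT);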

MySQL Api: Lost connection to MySQL server during query

There are zillions of these questions I know, but none of them were similar to my issue, so I figured I'd ask. I have a server set up that validates clients that connect to it by receiving username/password combinations and checking them using a SQL query. I wrote the system, and it worked perfectly fine during the first couple of requests.
http://puu.sh/d9mss/384b4df9f0.png
However, I found that if I wait about 5 minutes and then try to connect, this happens.
http://puu.sh/d9mx7/192cbb2cfc.png
This is the code that I am running to perform that task.
bool CNetDatabase::AuthUser(std::string username, const unsigned char* passwordhash)
{
    RoughSanitizeString(username); //this doesn't do anything

    /* Turn password hash into string */
    std::ostringstream password;
    password.fill('0');
    password << std::hex;
    for (int i = 0; i < 20; i++)
    {
        password << std::setw(2) << (unsigned int)passwordhash[i];
    }

    /* Make request */
    MYSQL_RES* result = nullptr;
    if (mysql_query(sql_con, tools::string::format(
        "SELECT COUNT(`index`) FROM `Users` WHERE `username` = '%s' AND `password` = '%s'",
        username.c_str(), password.str().c_str()).c_str()))
    {
        fprintf(
            stderr,
            "ERROR: mysql_query failed: %s [%d]\n",
            mysql_error(sql_con), mysql_errno(sql_con));
        return false;
    }

    /* Get and return result */
    result = mysql_store_result(sql_con);
    MYSQL_ROW row = mysql_fetch_row(result);
    bool authed = (row[0][0] == '1');   // read the row before freeing the result set
    mysql_free_result(result);
    return authed;
}
Any ideas on what could be going wrong?
Check the MySQL manual; there are session timeouts on both the client and server side (the server-side wait_timeout, for example).
In any case, it is good practice to expect that a connection to an external resource may become unavailable and to try to reconnect (for instance, the database server could be physically rebooted). You can try to set the reconnect flag when creating the connection to enable automatic reconnects, but that might not always work depending on your environment.
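The reconnect flag mentioned above is MYSQL_OPT_RECONNECT in the C API. A small sketch (the function name and parameters are illustrative, and note that an automatic reconnect resets session state):

#include <mysql.h>
#include <cstdio>

MYSQL* connect_with_reconnect(const char* host, const char* user,
                              const char* pass, const char* db) {
    MYSQL* con = mysql_init(nullptr);
    // my_bool was removed from the MySQL 8.0 headers; use plain bool there.
    my_bool reconnect = 1;
    mysql_options(con, MYSQL_OPT_RECONNECT, &reconnect);
    if (mysql_real_connect(con, host, user, pass, db, 0, nullptr, 0) == nullptr) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(con));
        mysql_close(con);
        return nullptr;
    }
    return con;
}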
Upon further investigation, I think it's safe to conclude that it is my host's problem, not my code. I registered an account with a site that offered the ability to make free MySQL databases, and my server has been throwing SQL queries at it without issue all night. For posterity, if you are using your website's SQL server and are hosting with Hostgator, they might be causing the issue.

ADO's Command object Error when adAsyncExecute

I'm using ADO's Command object to execute simple commands.
For example,
_CommandPtr CommPtr;
CommPtr.CreateInstance(__uuidof(Command));
CommPtr->ActiveConnection = MY_CONNECTION;
CommPtr->CommandType = adCmdText;
CommPtr->CommandText = L"insert into MY_TABLE values MY_VALUE";
for (int i = 0; i < 10000; i++) {
    CommPtr->Execute(NULL, NULL, adExecuteNoRecords);
}
This works fine, but I wanted to make the execution asynchronous to improve performance when dealing with large amounts of data... so I simply changed the execute option to adAsyncExecute.
(Documentation Link)
_CommandPtr CommPtr;
CommPtr.CreateInstance(__uuidof(Command));
CommPtr->ActiveConnection = MY_CONNECTION;
CommPtr->CommandType = adCmdText;
CommPtr->CommandText = L"insert into MY_TABLE values MY_VALUE";
for (int i = 0; i < 10000; i++) {
    CommPtr->Execute(NULL, NULL, adAsyncExecute);
}
This gives me a memory error for some reason:
First-chance exception
Microsoft C++ exception:
_com_error at memory location 0x0028FA24
Do any ADO experts know why this is happening?
Thanks
First, I will not ask why you need to loop 10,000 times just to execute the query, even though it consumes tremendous network, client, and server resources.
I will answer how to execute queries asynchronously.
You use this style of query execution to prevent your client from presenting a frozen GUI.
Otherwise, to the user your app appears to hang while it waits for the reply to your query: it cannot do anything and looks frozen until the database server answers.
To keep the GUI responsive (for example, to show an animated hourglass), you need to execute the query in asynchronous mode.
Example below written in Visual Basic:
dbCon.Execute "Insert Into ...Values....",,adAsyncExecute
Do While dbCon.State = (ADODB.ObjectStateEnum.adStateExecuting + ADODB.ObjectStateEnum.adStateOpen)
    Application.DoEvents
Loop
This way, the client keeps waiting for the server's reply but still lets the GUI process events, making your app more responsive.
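A rough C++ equivalent of that polling loop, assuming the #import-generated smart pointers used in the question (CommPtr, and MY_CONNECTION as a _ConnectionPtr), with Sleep() from the Windows API:

CommPtr->Execute(NULL, NULL, adExecuteNoRecords | adAsyncExecute);

// Poll the connection state until the asynchronous execution finishes,
// pumping GUI messages (or doing other useful work) in the meantime.
while (MY_CONNECTION->State & adStateExecuting) {
    // e.g. run a PeekMessage/DispatchMessage loop here to keep the UI alive
    Sleep(10);
}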

mysql reconnect c++

Right now I have a C++ client application that uses mysql.h to connect to a MySQL database and has to perform some logic in case there is a disconnect. I'm wondering if this is the best way to reconnect to a MySQL database in a situation where my client gets disconnected.
bool MYSQL::Reconnect(const char *host, const char *user, const char *passwd, const char *db)
{
    bool out = false;
    pid_t command_pid = fork();
    if (command_pid == 0)
    {
        while (1)
        {
            sleep(1);
            if (mysql_real_connect(&m_mysql, host, user, passwd, db, 0, NULL, 0) == NULL)
            {
                fprintf(stderr, "Failed to connect to database: Error: %s\n",
                        mysql_error(&m_mysql));
            }
            else
            {
                m_connected = true;
                out = true;
                break;
            }
        }
        exit(0);
    }
    if (command_pid < 0)
        fprintf(stderr, "Could not fork process[reconnect]: %s\n", mysql_error(&m_mysql));
    return out;
}
Right now I take in all my parameters and perform a fork. The child process attempts to reconnect every second with a sleep() statement. Is this a good way to do this? Thanks.
Sorry, but your code doesn't do what you think it does, Kaiser Wilhelm.
In essence, you're trying to treat a fork like a thread, which it is not.
When you fork a child, the parent process is completely cloned, including file and socket descriptors, which is how your program is connected to the MySQL database server. That is, both the parent and the child end up with their own copy of the same connection to the database server when you fork. I assume the parent only calls this Reconnect() method when it sees the connection drop, and stops using its copy of the now-defunct MySQL connection object, m_mysql. If so, the parent's copy of the connection is just as useless as the client's when you start the reconnect operation.
The thing is, the reverse is not also true: once the child manages to reconnect to the database server, the parent's connection object remains defunct. Nothing the child does propagates back up to the parent. After the fork, the two processes are completely independent, except insofar as they might try to access some I/O resource they initially shared. For example, if you called this Reconnect() while the connection was up and continued using the connection in the parent, the child's attempts to talk to the DB server on the same connection would confuse either mysqld or libmysqlclient, likely causing data corruption or a crash.
As hinted above, one solution to this is to use threads instead of forking. Beware, however, of the many problems with using threads with the MySQL C API.
Given a choice, I'd rather use asynchronous I/O to do the background connection attempt within the application's main thread, but the MySQL C API doesn't allow that.
It seems you're trying to avoid blocking your main application thread while attempting the DB server reconnection. It may be that you can get away with doing it synchronously anyway by setting the connect timeout to 1 second, which is fine when the MySQL server is on the same machine or same LAN as the client. If you could tolerate your main thread blocking for up to a second for connection attempts to fail — worst case happening when the server is on a separate machine and it's physically disconnected or firewalled — this would probably be a cleaner solution than threads. The connection attempt can fail much quicker if the server machine is still running and the port isn't firewalled, such as when it is rebooting and the TCP/IP stack is [still] up.
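A sketch of that synchronous alternative, assuming a wrapper class like the question's (here called DbClient to avoid clashing with the library's MYSQL struct) with the same m_mysql and m_connected members; the one-second cap comes from MYSQL_OPT_CONNECT_TIMEOUT:

bool DbClient::ReconnectOnce(const char *host, const char *user,
                             const char *passwd, const char *db)
{
    mysql_close(&m_mysql);      // discard the defunct connection
    mysql_init(&m_mysql);

    // Cap the connect attempt at one second so a failed reconnect only
    // blocks the main thread briefly.
    unsigned int timeout_sec = 1;
    mysql_options(&m_mysql, MYSQL_OPT_CONNECT_TIMEOUT, &timeout_sec);

    if (mysql_real_connect(&m_mysql, host, user, passwd, db, 0, NULL, 0) == NULL) {
        fprintf(stderr, "Reconnect failed: %s\n", mysql_error(&m_mysql));
        m_connected = false;
        return false;
    }
    m_connected = true;
    return true;
}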
As far as I can tell, this doesn't do what you intended.
Logical issues
Reconnect doesn't "perform some logic in case there is a disconnect" at all.
It attempts to connect over and over again until it succeeds, then stops. That's it. The state of the connection is never checked again. If the connection drops, this code knows nothing about it.
Technical issues
Also pay close attention to the technical issues that Warren raises.
Sure, it's perfectly OK. You might want to think about replacing the while ( 1 ) loop with something like
while (NULL == mysql_real_connect( ... )) {
    sleep(1);
    ...
}
which is the kind of idiom that one learns by practice, but your code works just fine as far as I can see. Don't forget to put a counter inside the while loop.
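For example, a counter could bound the retries (kMaxAttempts is an arbitrary illustrative limit; m_mysql and m_connected are the members from the question's code):

int attempts = 0;
const int kMaxAttempts = 30;    // illustrative limit
while (NULL == mysql_real_connect(&m_mysql, host, user, passwd, db, 0, NULL, 0)) {
    if (++attempts >= kMaxAttempts) {
        fprintf(stderr, "Giving up after %d attempts: %s\n",
                attempts, mysql_error(&m_mysql));
        return false;
    }
    sleep(1);
}
m_connected = true;
return true;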