I have to perform some calculations with data stored in an MSSQL Server database and then save the results in the same database.
I need to load (part of) a table into C++ data structures, perform a calculation (that can take substantial time), and finally add some rows to the same table.
The problem is that several users can access the database concurrently, and I want the table to be locked from the moment the data is loaded into memory until the results of the calculation are written back to the table.
Using the ODBC SDK, is it possible to explicitly lock and unlock part of a table?
I have tried the following test program, but unfortunately the INSERT statement succeeds before StmtHandle1 is freed:
// First connection: take an exclusive table lock.
SQLDriverConnect(ConHandle1, NULL, (SQLCHAR *)"DRIVER={ODBC Driver 13 for SQL Server};"
                                              "SERVER=MyServer;"
                                              "DATABASE=MyDatabase;"/*, ... */);
SQLSetStmtAttr(StmtHandle1, SQL_ATTR_CONCURRENCY, (SQLPOINTER)SQL_CONCUR_LOCK, SQL_IS_INTEGER);
SQLExecDirect(StmtHandle1, (SQLCHAR *)"SELECT * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK)", SQL_NTS);

// Second connection: I expect this INSERT to block until the lock above is released.
SQLDriverConnect(ConHandle2, NULL, (SQLCHAR *)"DRIVER={ODBC Driver 13 for SQL Server};"
                                              "SERVER=MyServer;"
                                              "DATABASE=MyDatabase;"/*, ... */);
SQLSetStmtAttr(StmtHandle2, SQL_ATTR_CONCURRENCY, (SQLPOINTER)SQL_CONCUR_LOCK, SQL_IS_INTEGER);
SQLExecDirect(StmtHandle2, (SQLCHAR *)"INSERT INTO [MyTable] VALUES (...)", SQL_NTS);
unfortunately the INSERT statement succeeds before StmtHandle1 is freed
By default SQL Server operates in autocommit mode, i.e. it opens a transaction and commits it for you.
You requested TABLOCKX and the table was locked for the duration of your transaction, but what you want instead is to explicitly open a transaction and not commit/roll it back until you are done with your calculations, i.e. you should use
begin tran; SELECT top 1 * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK);
And you don't need to read the whole table; TOP 1 * is sufficient.
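If you would rather stay within the ODBC API than send BEGIN TRAN yourself, the equivalent is to switch the first connection to manual-commit mode and end the transaction explicitly. A rough sketch, with handle allocation and error checking omitted as in the question's snippet:

// Hold the lock by disabling autocommit on the first connection.
SQLSetConnectAttr(ConHandle1, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER)SQL_AUTOCOMMIT_OFF, SQL_IS_UINTEGER);
SQLExecDirect(StmtHandle1, (SQLCHAR *)"SELECT TOP 1 * FROM [MyTable] WITH (TABLOCKX, HOLDLOCK)", SQL_NTS);

// ... load the data, run the long calculation, INSERT the results on ConHandle1 ...

// Committing (or rolling back) is what finally releases the TABLOCKX lock,
// so the INSERT on ConHandle2 blocks until this point.
SQLEndTran(SQL_HANDLE_DBC, ConHandle1, SQL_COMMIT);
SQLSetConnectAttr(ConHandle1, SQL_ATTR_AUTOCOMMIT, (SQLPOINTER)SQL_AUTOCOMMIT_ON, SQL_IS_UINTEGER);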
I am trying to read Snowflake stream data using an AWS Lambda (snowflake connector library) and write the data into an RDS SQL Server. After the Lambda runs, my stream data is not getting deleted.
I don't want to read the data from the stream, insert it into a temporary Snowflake table, and then read it again to insert the data into SQL Server. Is there any better way to do this?
Lambda code:
for table in table_list:
    sql5 = f"""SELECT "header__stream_position","header__timestamp" FROM STREAM_{table} where "header__operation" in ('UPDATE' ,'INSERT' ,'DELETE') ;"""
    result = cs.execute(sql5).fetchall()
    rds_columns = [(c[0], c[1], table[:-4]) for c in result]
    if rds_columns:
        cursor.fast_executemany = True
        sql6 = f"INSERT INTO {RDS_TABLE}(LSNNUMBER,TRANSACTIONTIME,TABLENAME) VALUES (?, ?, ?);"
        data = rds_columns
        cursor.executemany(sql6, data)
        table_write.append(table)
conn.commit()
ctx.commit()
Snowflake Streams require a successful committed DML operation to advance the stream, so you can't avoid an intermediate Snowflake table (transient or otherwise) with Streams.
You could use the CHANGES clause to get the same change information if you can manage the time/query offset within your application code.
The offset on a Stream will only advance if it is consumed by a DML statement (INSERT, UPDATE, MERGE). There is a read-only version of streams called CHANGES; however, you must keep track of the offsets yourself.
https://docs.snowflake.com/en/sql-reference/constructs/changes.html
I am doing an experiment that concurrently inserts data into a MySQL table from multiple threads.
Here is the partial code in C++:
bool query_thread(const char* cmd, MYSQL* con) {
    if (!query(cmd, con)) {
        return 0;
    }
    return 1;
}

int main() {
    ........
    if (mysql_query(m_con, "CREATE TABLE tb1 (model INT(32), handle INT(32))") != 0) {
        return 0;
    }
    thread thread1(query_thread, "INSERT INTO tb1 VALUES (1,1)", m_con);
    thread thread2(query_thread, "INSERT INTO tb1 VALUES (2,2)", m_con);
    thread thread3(query_thread, "INSERT INTO tb1 VALUES (3,3)", m_con);
    thread1.join();
    thread2.join();
    thread3.join();
}
But MySQL issues the following error message:
error cmd: INSERT INTO tb1 VALUES (1,1)
Lost connection to MySQL server during query
Segmentation fault
My questions are as follows.
Is it because MySQL cannot accept concurrent insertion, or is it bad use of multithreading?
Does multi-threaded insertion as above help to speed up the program? I understand the best ways are multiple inserts per query and LOAD DATA INFILE, but I just want to know if this approach can help.
Each thread must have:
own database connection
own transaction
own cursor
This, however, will not make your inserts much faster. In short, the InnoDB log (journal) is essentially serial, which limits the server's total insert rate. Read the MySQL performance blogs (Percona / MariaDB) for details. Certainly there are parameters to tweak, and there seem to have been advances recently.
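To make the per-thread-connection point concrete, here is a minimal sketch against the MySQL C API; the host, credentials, and database name are placeholders, and error handling is reduced to printing the message:

#include <mysql/mysql.h>
#include <cstdio>
#include <thread>

// Each worker opens and closes its own MYSQL* handle; a handle must never be
// shared between threads without external synchronization.
void insert_worker(const char* sql) {
    MYSQL* con = mysql_init(nullptr);
    if (!mysql_real_connect(con, "localhost", "user", "pwd", "testdb", 0, nullptr, 0)) {
        std::fprintf(stderr, "connect failed: %s\n", mysql_error(con));
        mysql_close(con);
        return;
    }
    if (mysql_query(con, sql) != 0)
        std::fprintf(stderr, "query failed: %s\n", mysql_error(con));
    mysql_close(con);
}

int main() {
    mysql_library_init(0, nullptr, nullptr);   // once, before any threads start
    std::thread t1(insert_worker, "INSERT INTO tb1 VALUES (1,1)");
    std::thread t2(insert_worker, "INSERT INTO tb1 VALUES (2,2)");
    std::thread t3(insert_worker, "INSERT INTO tb1 VALUES (3,3)");
    t1.join(); t2.join(); t3.join();
    mysql_library_end();
}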
I have a table in MySQL database that holds some "prepared" jobs.
CREATE TABLE `ww_jobs_for_update` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `status` int(11) NOT NULL,
  `inc` int(11) NOT NULL,
  PRIMARY KEY (`id`)
)
Now I have a C++1y multithreaded application, where each thread goes to the table, selects a job with status=0 (not completed), does some computation, and sets status=1 upon completion.
The problem is that many threads will acquire a "job row" concurrently, so some locking in the database has to take place.
The C++ method that locks/updates/commits is the following:
connection = "borrow a connection from the pool"
std::unique_ptr<sql::Statement> statement(connection->sql_connection_->createStatement());
connection->sql_connection_->setAutoCommit(false);
//statement->execute("START TRANSACTION");
std::unique_ptr<sql::ResultSet> rs(statement->executeQuery("select id from ww_jobs_for_update where status=0 ORDER BY id LIMIT 1 FOR UPDATE"));
if (rs->next()) {
    db_id = rs->getInt64(1);
    DEBUG << "Unlock Fetched: " << db_id;
}
rs->close();
std::stringstream ss;
ss << "update ww_jobs_for_update set status=1 where id=" << db_id;
statement->execute(ss.str());
//statement->execute("COMMIT;");
connection->sql_connection_->commit();
"release the connection to the pool();"
But this approach does not seem to be efficient. I always get back
ErrorCode: 1205,SQLState: HY000. Details:
from a lot of threads, especially when the load is increasing.
Why am I getting this back? What is the most efficient way to do this? Hard consistency is a requirement.
The best way for this task, in my experience, is using Redis queues.
Locking tables with SELECT ... FOR UPDATE hangs the database when you have a multi-threaded application running.
I would advise you to install Redis, write some scripts to create queues according to the data in the tables, and rewrite your program to use Redis queues for the task.
Redis queues will not give the same value to different threads, so you get uniqueness, and there are no locks in the database, so your scripts will work fast.
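To make this concrete, here is a rough sketch of the idea using the hiredis C client (not the poster's code; the list name jobs and the connection details are assumptions):

#include <hiredis/hiredis.h>
#include <cstdio>

int main() {
    redisContext* c = redisConnect("127.0.0.1", 6379);
    if (c == nullptr || c->err) { std::fprintf(stderr, "redis connect failed\n"); return 1; }

    // Producer: push the ids of the prepared jobs (status=0 rows) onto a list up front.
    redisReply* r = (redisReply*)redisCommand(c, "LPUSH jobs %lld", (long long)42);
    freeReplyObject(r);

    // Worker: each thread pops its own id. RPOP never hands the same element to two
    // clients, so no row locks are needed in MySQL while claiming a job.
    r = (redisReply*)redisCommand(c, "RPOP jobs");
    if (r && r->type == REDIS_REPLY_STRING)
        std::printf("got job id %s\n", r->str);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}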
Can you make your transaction durations shorter? Here's what I mean.
You have status values of 0 for "waiting" and 1 for "complete". Use the status value 2 (or -1, or whatever you choose) to mean "working".
Then when a worker thread grabs a job to do from the table, it will do this (pseudo-SQL).
BEGIN TRANSACTION
SELECT id FROM ww_jobs_for_update WHERE status=0 ORDER BY id LIMIT 1 FOR UPDATE
UPDATE ww_jobs_for_update SET status=2 WHERE id = << db_id
COMMIT
Now, your thread has taken a job and released the transaction lock. When the job is done you simply do this to mark it done, with no transaction needed.
UPDATE ww_jobs_for_update SET status=1 WHERE id = << db_id
There's an even simpler way to do this if you can guarantee each worker thread has a unique identifier threadId. Put a thread column in the table with a default NULL value. Then, to start processing a job:
UPDATE ww_jobs_for_update
SET thread = threadId, status = 2
WHERE status = 0 AND thread IS NULL
ORDER BY id LIMIT 1;
SELECT id FROM ww_jobs_for_update WHERE thread = threadId AND status=2
When done
UPDATE ww_jobs_for_update
SET thread = NULL, status = 1
WHERE thread = threadId;
Because each thread has a unique threadId, and because individual SQL UPDATE statements are themselves little transactions, you can do this without using any transactions or commits at all.
Both these approaches have the added benefit that you can use a SELECT query to find out which jobs are active. This may allow you to deal with jobs that never completed for whatever reason.
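A sketch of the first variant using the same Connector/C++ objects as the question's snippet (the pool-handling lines and includes are omitted exactly as there): claim the job in a short transaction, then complete it outside any transaction.

// Claim phase: lock held only for the SELECT ... FOR UPDATE plus one UPDATE.
connection->sql_connection_->setAutoCommit(false);
std::unique_ptr<sql::Statement> st(connection->sql_connection_->createStatement());

long long db_id = -1;
std::unique_ptr<sql::ResultSet> rs(st->executeQuery(
    "SELECT id FROM ww_jobs_for_update WHERE status=0 ORDER BY id LIMIT 1 FOR UPDATE"));
if (rs->next()) db_id = rs->getInt64(1);
rs->close();

if (db_id != -1) {
    std::stringstream ss;
    ss << "UPDATE ww_jobs_for_update SET status=2 WHERE id=" << db_id;
    st->execute(ss.str());
}
connection->sql_connection_->commit();            // row lock released here
connection->sql_connection_->setAutoCommit(true); // back to one-statement transactions

// ... the long computation runs here, with no database locks held ...

// Completion phase: a single-statement UPDATE, no explicit transaction needed.
if (db_id != -1) {
    std::stringstream ss2;
    ss2 << "UPDATE ww_jobs_for_update SET status=1 WHERE id=" << db_id;
    st->execute(ss2.str());
}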
I have a C/C++ DLL that connects to SQL Server and issues a large number of ODBC queries rapidly in a loop. The only thing is that it turns out to be much slower through the ODBC DLL than running the query as T-SQL in Management Studio. Many orders of magnitude slower.
At first I thought it might be the query itself, but then I stripped it down to a simple "select NULL" and still got the same results.
I was wondering if this is expected or whether there is some ODBC setting that I am missing or getting wrong?
First I connect like this (for brevity I have omitted all error checking; however, retcode returns SQL_SUCCESS in all cases):
char *connString = "Driver={SQL Server};Server=.\\ENT2012;uid=myuser;pwd=mypwd";
...
retcode = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &henv);
retcode = SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
retcode = SQLSetEnvAttr(henv, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, SQL_IS_UINTEGER);
retcode = SQLAllocHandle(SQL_HANDLE_DBC, henv, &hdbc);
SQLSetConnectAttr(hdbc, SQL_LOGIN_TIMEOUT, (void*)5, 0);
retcode = SQLDriverConnect(
hdbc,
0,
(SQLTCHAR*) connString,
SQL_NTS,
connStringOut,
MAX_PATH,
(SQLSMALLINT*)&connLen,
SQL_DRIVER_COMPLETE);
Then I prepare the statement, bind a parameter (unused in this example), and bind a column like this:
char *queryString = "select NULL;";
SQLLEN g_int32 = 4;
SQLLEN bytesRead = 0;
...
retcode = SQLAllocHandle(SQL_HANDLE_STMT, hdbc, &hstmt)
retcode = SQLPrepare(hstmt, queryString, SQL_NTS);
retcode = SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
    SQL_C_LONG, SQL_INTEGER, sizeof(int), 0, spid, 0, (SQLLEN*)&g_int32);
retcode = SQLBindCol(hstmt, 1, SQL_C_CHAR, col_1, 32, &bytesRead);
Finally, I repeatedly call the query (e.g., 10000 times) in a loop like this:
retcode = SQLExecute(hstmt);
retcode = SQLFetch(hstmt);
SQLCloseCursor(hstmt);
This takes about 90 seconds to run 10000 times in the ODBC DLL. Testing on a 4 core Windows 2008 R2 Server running SQL 2012 x64.
On the other hand, if I run what looks to me to be an equivalent test in Management Studio, it takes less than a second:
declare @sql varchar(128), @repeat int;
set @repeat = 10000;
set @sql = 'select NULL;';
while @repeat > 0 begin
    exec(@sql);
    set @repeat = @repeat - 1;
end;
Can someone point out something that I am overlooking? Some flaw in my logic?
Thanks.
Neil Weicher
www.netlib.com
This is too long for a comment.
declare @sql varchar(128), @repeat int;
set @repeat = 10000;
set @sql = 'select NULL;';
while @repeat > 0 begin
    exec(@sql);
    set @repeat = @repeat - 1;
end;
does not realistically simulate 10000 remote calls. exec bypasses a lot of the internals of setting up a request. To simulate 10000 calls, do this in SSMS:
select NULL;
go 10000
and measure. "Output as text" should probably be used to avoid timing the SSMS grid display.
I'm not all that familiar with the T-SQL stuff but here are a couple things to consider.
Your ODBC driver has to transfer data over your network, and I suspect the T-SQL execution does not. Next to disk I/O, transferring data over the network is one of the slowest things your ODBC driver has to do. You may find that the driver is spending considerable time either waiting for the data to travel or clearing data off the wire.
Also, it's not clear to me that your T-SQL example actually moves any data, but your ODBC example does when SQLFetch is called. The T-SQL may just be executing the query and never fetching any data. So, removing SQLFetch from the loop might make a more equal comparison.
To see if data transfer is your limiting factor, estimate how much data will be included in all the records you fetch with ODBC and try to move that much data between the two machines with something like FTP. An ODBC driver will never be able to fetch data faster than a simple raw transfer of data. I see you are just fetching NULL, so there is not much in your result set, but the driver and database still transfer data between them to service the request. It could be several hundred bytes per execution/fetch.
I faced the same issue. I solved it by changing the "Cursor Default Mode" setting of the ODBC driver's DSN (through the ODBC Administrator tool) from "READ_ONLY" to "READ_ONLY_STREAMING". This alone reduced the runtime of my application (query data and write it to a file) from 260 seconds to 51 seconds using 32-bit Java, and from 1234 seconds to 11 seconds using C++.
See this post: http://www.etl-tools.com/forum/visual-importer/1587-question-about-data-transformation-memory-usage?start=6
Where exactly will an sqlite3 database be in the SQLITE_BUSY state for other threads and processes? (The db is in the default serialized mode, not WAL.)
A simple example to illustrate the question:
char buffer[] = "SELECT sessionid FROM sessions WHERE something < 1000";
sqlite3_prepare_v2(db, buffer, strlen(buffer), &stmt, 0);
// IS DB SQLITE_BUSY HERE ? PLACE 1
while( sqlite3_step(stmt) == SQLITE_ROW )
{
// IS DB SQLITE_BUSY HERE ? PLACE 2
}
// IS DB STILL SQLITE_BUSY HERE? PLACE 3
sqlite3_finalize(stmt);
I know for a fact that both sqlite3_prepare_v2 and sqlite3_step can error with SQLITE_BUSY (the docs say so and I've encountered it many times). The docs are less clear about sqlite3_finalize, but my impression is that it is merely for memory management, so it should not do any database access.
sqlite3_step is the most likely place for this to happen, since it is what actually performs things like "INSERT INTO..." and "COMMIT" which tend to be heavy on the database.
SQLite is not terribly helpful when it comes to concurrency. By default, it does not even provide any fairness guarantees (although you can write them yourself as long as your concurrency is happening within the same process).
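If the goal is simply to wait out SQLITE_BUSY rather than handle it at every call site, SQLite's built-in busy handler does most of the work. A minimal sketch, reusing the query from the question (the 5-second timeout is an arbitrary choice):

#include <sqlite3.h>
#include <cstdio>

// sqlite3_busy_timeout() installs a handler that sleeps and retries internally,
// so SQLITE_BUSY is only returned after the timeout expires.
int run_query(sqlite3* db) {
    sqlite3_busy_timeout(db, 5000);   // wait up to 5 s for a locked database

    const char* sql = "SELECT sessionid FROM sessions WHERE something < 1000";
    sqlite3_stmt* stmt = nullptr;
    int rc = sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr);   // can still report SQLITE_BUSY
    if (rc != SQLITE_OK) {
        std::fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
        return rc;
    }
    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        // read columns here, e.g. sqlite3_column_int64(stmt, 0)
    }
    if (rc != SQLITE_DONE)             // SQLITE_BUSY can still surface if the timeout runs out
        std::fprintf(stderr, "step failed: %s\n", sqlite3_errmsg(db));
    sqlite3_finalize(stmt);            // releases the statement; no further database access
    return rc;
}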