ADO Connection to OLE DB is way too slow - c++

I'm using ADO Connection and Recordset objects to access a Sybase ASE database (OLE DB provider).
For example, simply executing a SQL statement looks something like this:
(inserting 10000 rows of data)
_ConnectionPtr ConnPtr;
ConnPtr.CreateInstance("ADODB.Connection");
ConnPtr->Open(....my Connection String, UserID, and Password....);

for (int i = 0; i < 10000; i++)
    ConnPtr->Execute("INSERT INTO my_table VALUES (1, 2, 3)");
OR (alternative option):
_RecordsetPtr RecPtr;
RecPtr.CreateInstance("ADODB.Recordset");
MyObject obj;
// Construct & bind obj...
...
for (int i = 0; i < 10000; i++)
    RecPtr->AddNew(&obj);
Both approaches work fine and produce the expected result. The only problem is that they are both extremely slow. Inserting 10000 rows of data using raw SQL statements takes only about 3-5 seconds. On the other hand, accomplishing the same task using ADO objects takes 40-50 seconds!
So here are some of my questions:
Is this a normal result? It's obvious that direct SQL execution will be faster than going through something like ADO, but is the performance difference usually this large?
Can the speed bottleneck be attributed mostly to ADO? Or does the problem have more to do with the database (Sybase)?
Is there any other way to access OLE DB in C++ instead of using ADO (a faster alternative)?
Any insights from people who have a lot of experience with databases would be appreciated.

You should consider using the Prepared property so that the SQL query only gets compiled once. This makes a command's first execution slow, but subsequent executions are faster:
_ConnectionPtr ConnPtr;
ConnPtr.CreateInstance("ADODB.Connection");
ConnPtr->Open(....my Connection String, UserID, and Password....);

_CommandPtr CmdPtr;
CmdPtr.CreateInstance("ADODB.Command");
CmdPtr->ActiveConnection = ConnPtr;
CmdPtr->CommandText = "INSERT INTO my_table VALUES (1, 2, 3)";
CmdPtr->PutPrepared(VARIANT_TRUE);

for (int i = 0; i < 10000; i++)
    CmdPtr->Execute(NULL, NULL, adCmdText);
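If the Sybase provider accepts parameter markers, you can take this a step further: bind the parameters once and wrap the loop in a single transaction, so the server does not commit 10000 times. A rough sketch of that combination, assuming the same #import-generated wrappers as above (the parameter setup is illustrative, not tested against ASE):

_ConnectionPtr ConnPtr;
ConnPtr.CreateInstance("ADODB.Connection");
ConnPtr->Open(....my Connection String, UserID, and Password....);

_CommandPtr CmdPtr;
CmdPtr.CreateInstance("ADODB.Command");
CmdPtr->ActiveConnection = ConnPtr;
CmdPtr->CommandText = "INSERT INTO my_table VALUES (?, ?, ?)";
CmdPtr->PutPrepared(VARIANT_TRUE);

// Create and append one parameter per '?' placeholder; bound once, reused.
_ParameterPtr Params[3];
for (int p = 0; p < 3; p++)
{
    Params[p] = CmdPtr->CreateParameter("", adInteger, adParamInput, 0, _variant_t(0L));
    CmdPtr->Parameters->Append(Params[p]);
}

ConnPtr->BeginTrans();                    // one transaction for all rows
for (int i = 0; i < 10000; i++)
{
    Params[0]->Value = _variant_t(1L);
    Params[1]->Value = _variant_t(2L);
    Params[2]->Value = _variant_t(3L);
    CmdPtr->Execute(NULL, NULL, adCmdText);
}
ConnPtr->CommitTrans();                   // one commit instead of 10000

In my experience the per-row commit is where most of the 40-50 seconds usually goes, so the transaction alone may account for the bulk of the improvement.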
References:
MSDN - Prepared Property (ADO): http://msdn.microsoft.com/en-us/library/windows/desktop/ms675106(v=vs.85).aspx
MSDN - Prepared Property Example (VC++): http://msdn.microsoft.com/en-us/library/windows/desktop/ms681552(v=vs.85).aspx

Related

ORACLE bulk insert based on SQL

I have to shuffle a lot of data from an application into an ORACLE 11g database. To execute SQL from within C++ I usually employ the following scheme, using bare SQL together with the Poco::Data framework and an ODBC connection to my database:
// get session from ODBC session pool
Session session = moc::SessionPoolMOC::get();

// prepare a statement as QString
QString q = ("INSERT INTO MOC_DATA_CONCENTRATOR"
             "("
             "EQUIPMENT_INSTANCE_ID, "
             "STATION_NAME, "
             "DATA_SRC, "
             "ORG_ID)"
             "values (%1, '%2', '%3', %4);"
            );
/*
   there would also be prepared statements, but I think that is not the
   topic now...
*/

// set parameters within string from passed object 'info'
q = q.arg(info.equipment_instance_id);              /* 1 */
q = q.arg(info.station_name.toUtf8().constData());  /* 2 */
q = q.arg(info.data_src.toUtf8().constData());      /* 3 */
q = q.arg(info.org_id);                             /* 4 */

// prepare statement
Statement query(session);
query << q.toUtf8().constData();

try
{
    // execute query
    query.execute();
}
catch (...)
{
    ...
}
I'm aware that this is a very low-level approach, but it has worked really well in all situations I've encountered so far...
The problem is that I now have a lot of data (about 400,000 records) to fill into one table. The data is available as different C++ objects stored in a Qt list. My naive approach is to call this code sequence for each object to insert. This works fine but turns out to be quite slow (taking about 15 minutes or so). I asked around and was told to use a so-called "bulk insert" instead.
It seems, however, that this is more related to PL/SQL - at least I have no idea how to use it from the same context or in a similar way as shown in my example. I'm neither an ORACLE expert nor an administrator, and usage of the database is only a side issue within my project. What would you recommend?
Many thanks,
Michael
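One thing worth trying before reaching for PL/SQL: Poco::Data can bind whole STL containers to a single prepared statement, so the INSERT is parsed once and executed for every element (and the ODBC backend also has a bulk keyword for true array binding, if your driver supports it). A minimal sketch, assuming you first copy the Qt list into column-wise std::vectors (the function and vector names are placeholders):

#include <string>
#include <vector>
#include "Poco/Data/Session.h"
#include "Poco/Data/Statement.h"

using namespace Poco::Data::Keywords;
using Poco::Data::Session;
using Poco::Data::Statement;

// Inserts all records with one prepared statement; each vector holds one
// column's values, filled beforehand from the Qt list of 'info' objects.
void bulkInsert(std::vector<int>& ids,
                std::vector<std::string>& stations,
                std::vector<std::string>& sources,
                std::vector<int>& orgIds)
{
    Session session = moc::SessionPoolMOC::get();
    session.begin();   // one transaction instead of one commit per row
    Statement insert(session);
    insert << "INSERT INTO MOC_DATA_CONCENTRATOR "
              "(EQUIPMENT_INSTANCE_ID, STATION_NAME, DATA_SRC, ORG_ID) "
              "VALUES (?, ?, ?, ?)",
        use(ids), use(stations), use(sources), use(orgIds);
    insert.execute();  // parsed once, executed for every element
    session.commit();
}

Even without the container binding, wrapping your existing per-object loop in session.begin() / session.commit() should cut a large share of the 15 minutes, since autocommitting 400,000 single-row inserts is expensive.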

SQLite C++ 'database is locked' when multiple processes access db in readonly mode

I have an SQLite database that doesn't change.
Multiple processes each open a database connection in SQLITE_OPEN_READONLY mode using sqlite3_open_v2. Each process is single-threaded.
The connections are made from an MSVC project using the official C/C++ Interface's single amalgamated C source file.
According to the SQLite FAQ, multiple processes running SELECTs is fine.
Each process after opening the database creates 4 prepared SELECT statements each with 2 bindable values.
Over the course of execution, the statements (one at a time) have the following called on them repeatedly as required:
sqlite3_bind_int
sqlite3_bind_int
sqlite3_step (while SQLITE_ROW is returned)
sqlite3_column_int (while there was a row)
sqlite3_reset
The prepared statements are reused, so finalize isn't called on any of them until near the end of the program. Finally, the database is closed at the very end of execution.
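In code, one cycle of that pattern looks roughly like this (a sketch; the statement is one of the four prepared SELECTs and the value names are placeholders):

#include <sqlite3.h>

static void runQuery(sqlite3_stmt *stmt, int firstValue, int secondValue)
{
    sqlite3_bind_int(stmt, 1, firstValue);
    sqlite3_bind_int(stmt, 2, secondValue);
    while (sqlite3_step(stmt) == SQLITE_ROW)
    {
        int col = sqlite3_column_int(stmt, 0);  /* read the current row */
        (void)col;                              /* ... use the value ... */
    }
    sqlite3_reset(stmt);  /* ready the statement for its next execution */
}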
The problem is that any of these operations can fail with error code 5: 'database is locked'.
Error code 5 is SQLITE_BUSY, and the website states that it
"indicates a conflict with a separate database connection, probably in a separate process"
The rest of the internet seems to agree that multiple read-only connections are fine. I've gone over and over the source and can't see anything wrong (sadly I can't post it here - I know, not helpful).
So I'm turning to you guys: what could I possibly be missing?
EDIT 1:
The database is on a local drive, the file system is NTFS, and the OS is Windows 7.
EDIT 2:
Wrapping all sqlite3 calls in loops that retry the call while SQLITE_BUSY is returned alleviates the problem. I don't consider this a fix, but if that truly is the right thing to do then I'll do it.
So the working answer I have used is to wrap all the calls to sqlite in functions that retry while SQLITE_BUSY is returned. There doesn't seem to be a simple alternative.
int bindInt(sqlite3_stmt* stmt, int parameterIndex, int value)
{
    int ret;
    do
        ret = sqlite3_bind_int(stmt, parameterIndex, value);
    while (ret == SQLITE_BUSY);  /* retry until the lock clears */
    return ret;
}
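Two lighter-weight alternatives to hand-rolled retry loops, assuming the file truly is never modified while any process has it open (the file name below is a placeholder): let SQLite retry internally with sqlite3_busy_timeout, or open the database as immutable through a URI filename so that no locking is attempted at all (the immutable parameter needs a reasonably recent SQLite):

#include <sqlite3.h>

static sqlite3 *openReadOnly(void)
{
    sqlite3 *db = NULL;

    /* Option 1: have SQLite retry internally for up to 2000 ms instead
       of handing SQLITE_BUSY straight back to the caller. */
    if (sqlite3_open_v2("mydb.sqlite", &db, SQLITE_OPEN_READONLY, NULL) == SQLITE_OK)
        sqlite3_busy_timeout(db, 2000);

    /* Option 2 (instead of the above): declare the file immutable via a
       URI filename, which disables locking entirely. Only safe if the
       database is never written to while in use.

       sqlite3_open_v2("file:mydb.sqlite?immutable=1", &db,
                       SQLITE_OPEN_READONLY | SQLITE_OPEN_URI, NULL);
    */

    return db;
}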

Speed up ADO's recordset BOF call

I have some C++ code on Windows that uses plain old ADO (not ADO.NET) to retrieve data from a bunch of SQL Server databases. The code uses forward-only cursors to allow for fire hose cursors for maximum data throughput on queries that produce large Recordsets.
The code that processes the results looks like this, using the #import-generated wrapper for ADO 2.7:
ADODB::_RecordsetPtr records("ADODB.Recordset");
records->Open(cmd, _variant_t(static_cast<IDispatch *>(m_DBConnection)),
              ADODB::adOpenForwardOnly, ADODB::adLockReadOnly, ADODB::adCmdText);
if (!(records->BOF && records->EOF))
{
    ... Loop over the recordset and extract data from each record ...
}
Profiling shows that close to 40% of the time in the above loop is spent in the call to BOF, and this is having a massive impact on the database code's overall read performance. Because the code uses forward-only cursors, it's not possible to check the RecordCount property; it is always -1 for a forward-only cursor.
Is there another way to check for an empty Recordset that doesn't use the BOF/EOF check, or a way to speed this check up?
The only other alternative I can think of at the moment is to use one of the other cursor types and check how that affects data throughput.
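One option worth measuring: right after Open, a non-empty forward-only recordset is positioned on the first record, so BOF adds no information and testing EOF alone should be equivalent, halving the COM property traffic. And if per-record access is the real cost, GetRows can pull every remaining row across COM in one call. A rough sketch, assuming the same ADO 2.7 #import wrapper as above:

// Test EOF only; for a freshly opened cursor BOF is redundant.
if (!records->EOF)
{
    // Fetch all remaining rows in a single COM round trip.
    _variant_t rows = records->GetRows(ADODB::adGetRowsRest);
    // 'rows' holds a two-dimensional SAFEARRAY of VARIANTs indexed
    // [field][record]; process it instead of walking the recordset.
}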

Callgrind: Profile a specific part of my code

I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about.
Here is an example of what I want to do:
for (int i = 0; i < maxSample; ++i) {
    // Prepare data to be processed...
    // Method to be profiled with these data
    // Post operation on the data
}
My use case is a regression test: I want to make sure that the method in question is still fast enough (something like less than 10% extra instructions compared to the previous implementation).
This is why I'd like to have clean output from Callgrind.
(I need a for loop in order to process a significant amount of data, so I can get a good estimate of the behavior of the method I want to profile.)
My first try was to change the code to:
#include <valgrind/callgrind.h>  // provides the CALLGRIND_* client-request macros

for (int i = 0; i < maxSample; ++i) {
    // Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    // Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    // Post operation on the data
}
CALLGRIND_DUMP_STATS;
I added the Callgrind macros to control the instrumentation, and also passed the --instr-atstart=no option to be sure that I profile only the part of the code I want...
Unfortunately, with this configuration, when I launch my executable under callgrind it never ends... It is not a question of slowness, because a fully instrumented run lasts less than one minute.
I also tried
for (int i = 0; i < maxSample; ++i) {
    // Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    // Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    // Post operation on the data
}
CALLGRIND_DUMP_STATS;
(or the --toggle-collect="myMethod" option)
But Callgrind gave me a log without any calls (KCachegrind is white as snow :( and reports zero instructions...).
Did I use the macros/options correctly? Any idea what I need to change in order to get the expected result?
I finally managed to solve this issue... It was a configuration problem:
I kept the code
for (int i = 0; i < maxSample; ++i) {
    // Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    // Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    // Post operation on the data
}
CALLGRIND_DUMP_STATS;
But I ran callgrind with --collect-atstart=no (and without --instr-atstart=no!!!), and it worked perfectly, in a reasonable time (~1 min).
The issue with START/STOP instrumentation was that callgrind dumps a file (callgrind.out.#number) at each iteration (each STOP), so it was really, really slow... (after 5 minutes I had only 5,000 runs of a 300,000-iteration benchmark: unsuitable for a regression test).
The toggle-collect option is very picky about how you specify the method to use as the trigger. You actually need to specify its argument list as well, and even the whitespace needs to match! Use the method name exactly as it appears in the callgrind output. For instance, I am using this invocation:
$ valgrind \
    --tool=callgrind \
    --collect-atstart=no \
    "--toggle-collect=ctrl_simulate(float, int)" \
    ./swaag
Please observe:
The double quotes around the option.
The argument list including parentheses.
The whitespace after the comma character.

How to improve MS Access INSERT performance

I have a C++ program that inserts about a million records into an MS Access DB using OleDbConnection. To do that, I run the INSERT INTO query a million times, which takes quite a long time.
The data is generated in the program in the form of an array. Is there any other way I can load the data into the database in one single step to improve performance?
Thanks!
The loop I currently use to insert the records:
for (int i = 0; i < populationSize; i++){
    insertSQL = "INSERT INTO [" + pTableName + "] (" + columnsName + ") VALUES (" + columnsValue[i] + ");";
    outputDBConn->runSQLEdit(insertSQL);
}
The method that runs the SQL query:
void DBConnector::runSQLEdit(String^ query){
    SQLCMD = gcnew OleDbCommand( query, dbConnection );
    SQLCMD->CommandTimeout = 30;
    dbConnection->Open();
    SQLCMD->ExecuteNonQuery();
    dbConnection->Close();
}
It seems very inefficient to open and close the connection for each insert statement.
The standard approach goes something like this (a C++/CLI sketch follows the list):
Open connection.
Start transaction, if supported. (This is often very important for databases with transactions.)
Insert. Repeat this step as needed.
Commit transaction, if supported.
Close connection.
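A minimal C++/CLI sketch of that sequence, reusing the names from the question (runBulkInsert and the queries list are hypothetical; OleDbTransaction is the standard ADO.NET mechanism):

using namespace System;
using namespace System::Collections::Generic;
using namespace System::Data::OleDb;

void DBConnector::runBulkInsert(List<String^>^ queries)
{
    dbConnection->Open();                            // open once
    OleDbTransaction^ trans = dbConnection->BeginTransaction();
    try
    {
        for each (String^ query in queries)
        {
            OleDbCommand^ cmd = gcnew OleDbCommand(query, dbConnection, trans);
            cmd->ExecuteNonQuery();                  // repeat as needed
        }
        trans->Commit();                             // commit once
    }
    catch (Exception^)
    {
        trans->Rollback();                           // undo on failure
        throw;
    }
    finally
    {
        dbConnection->Close();                       // close once
    }
}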
Update: The following does not apply to MS Access. Access does not support inserting multiple rows from a literal; it only supports inserting multiple rows from an existing data source. (Although here is a "workaround" that might work. In any case, the most important thing is likely limiting the number of transactions.)
One more thing that can be done is to build a single insert command that adds multiple records at once. This can be done with either multiple statements or a multi-record insert (if supported). It may or may not be significantly faster than the above (that depends on other factors like network latency and the database engine), and it will likely need to be adapted to fit the restrictions of the database (e.g. it might only be feasible for a few hundred records at a time). This should only be considered after proper connection/transaction usage as described above.
I wouldn't be surprised if there were ready-made "bulk insert" libraries/modules floating about... and I don't use MS Access, so I can only hope that the above suggestions were helpful :-)
Happy coding.
Don't do ONE insertion per command.
Change your code to something like this:
string strSQLCommand;
for (int i = 0; i < populationSize; i++){
    strSQLCommand += "INSERT INTO [" + pTableName + "] (" + columnsName + ") VALUES (" + columnsValue[i] + ");";
}
outputDBConn->runSQLEdit(strSQLCommand);
I'm not sure what the maximum buffer size of the command is, so do some checks and then pick the best value, sending the batch every X inserts.