QSqlDatabase ODBC query speed - C++

I use QSqlDatabase to connect to a remote MSSQL Server, but the speed seems very slow. For the same query (the result is about 20 rows), SQL Server Management Studio 2008 takes about 1 second, but my application (using QSqlDatabase) takes nearly 8 seconds. Can anybody explain why this happens?

I found that calling setForwardOnly(true) and preparing your SQL statements drastically improves the performance of SELECT queries. If setForwardOnly(true) is not called, Qt will attempt to traverse the entire result set while querying the database, which causes slowdowns.
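For illustration, a minimal sketch combining both techniques (the connection string, table, and column names here are placeholders, not the asker's schema):

#include <QSqlDatabase>
#include <QSqlQuery>
#include <QVariant>
#include <QDebug>

void runForwardOnlyQuery()
{
    // Illustrative DSN-less ODBC connection; substitute your own.
    QSqlDatabase db = QSqlDatabase::addDatabase("QODBC");
    db.setDatabaseName("Driver={SQL Server};Server=myserver;Database=mydb;");
    if (!db.open())
        return;

    QSqlQuery query(db);
    query.setForwardOnly(true);  // must be called before exec()
    query.prepare("SELECT id, name FROM customers WHERE region = :region");
    query.bindValue(":region", QString("EU"));
    if (query.exec()) {
        while (query.next())
            qDebug() << query.value(0).toInt() << query.value(1).toString();
    }
}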

Related

How to set or disable the query timeout with a C++ CDatabase ODBC connection to a SQL Server database

I need to run a query from code. Running the query in SSMS takes a minute and a half (not too shabby for over 4M rows).
In code I open the connection with:
CDatabase *base = new CDatabase();
base->OpenEx("Driver={SQL Server};Server=Computer\\User;Database=base;");
I can then create CRecordset objects and run queries. The SELECT COUNT query works properly (gives ~4M). The first SELECT cols query (fetching some attributes) works properly. Their respective CRecordset objects are properly closed and cleaned up. The second SELECT cols query (a big join that returns the 4M rows) times out on every try.
I do not know how to set the query timeout value, what that parameter is called, or where to set it in the first place. I tried many combinations of parameters in the connection string, and I tried editing the ODBC driver pooling options. I am not interested in using another ODBC connection object, but I can set up a DSN and connect through it instead of using the direct connection string.
If worst comes to worst I'll just paginate it all, but right now that's a hassle, and since the query can time out, logically there should be a way to set that timeout; I'd like to know what it is.
Query timeout is not a connection string parameter, at least not for SQL Server. You're probably looking for the CDatabase::SetQueryTimeout member function.
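A sketch of how that might look with the asker's connection string (the join is a placeholder and the 600-second value is arbitrary; per the MFC docs, the timeout must be set before the recordset is opened):

#include <afxdb.h>  // MFC ODBC classes: CDatabase, CRecordset

void RunLongQuery()
{
    CDatabase db;
    db.OpenEx(_T("Driver={SQL Server};Server=Computer\\User;Database=base;"));

    // Overrides the default query timeout (15 seconds) for recordsets
    // opened on this connection from here on.
    db.SetQueryTimeout(600);  // seconds

    CRecordset rs(&db);
    rs.Open(CRecordset::forwardOnly,
            _T("SELECT a.id, b.attr FROM a JOIN b ON a.id = b.id"));  // placeholder for the big join
    while (!rs.IsEOF())
        rs.MoveNext();
    rs.Close();
    db.Close();
}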

Is it faster to cleanse data with regex in SSIS rather than in SQL queries?

I am trying to cleanse a field in SQL Server using regex-replace CLR functions. However, the query is taking forever to run. I am wondering if implementing the same logic in a script component in SSIS would be any faster.
I've never met a DBA who let me load Assemblies onto a Prod Server!
But I've certainly found that string manipulation via an SSIS Script Component is much faster than using CHARINDEX & SUBSTRING etc. via SQL.
SSIS executes compiled .NET code, so it can run all the records in a buffer through the script component in parallel, provided the code is non-blocking. Try to fit as many rows as possible into your buffer by minimizing the "width" (total column size) of your data flow.
This might mean it's faster to take just the fields to be cleansed plus a primary key and bulk load those into a new, empty table. You can always inner join this back to your original table afterwards...
So I'd be optimistic that SSIS could perform adequately.
I would certainly run a quick test, as the C# regex code for your script component can pretty much be lifted directly from your existing CLR function.

Django model count() with caching

I have a Django application with Apache Prometheus monitoring and a model called Sample.
I want to monitor the Sample.objects.count() metric and cache this value for a set time interval to avoid costly COUNT(*) queries against the database.
From this tutorial
https://github.com/prometheus/client_python#custom-collectors
I read that I need to write a custom collector.
What is the best approach to achieve this?
Is there any way in Django to get a cached Sample.objects.count() value and update it every K seconds?
I also use Redis in my application. Should I store this value there?
Should I use a separate thread to update the cached Sample.objects.count() value?
The first thing to note is that you don't really need to cache the result of a COUNT(*) query.
Though different RDBMSs handle count operations differently, they are slow across the board for large tables. But one thing they have in common is that each provides an alternative to SELECT COUNT(*) which is, in effect, a cached result. Well, sort of.
You haven't mentioned what your RDBMS is, so let's see how it works in the popular ones used with Django.
MySQL
Provided you have a primary key on your table and you are using MyISAM, SELECT COUNT(*) is really fast on MySQL and scales well. But chances are that you are using InnoDB, and that's the right storage engine for various reasons. InnoDB is transaction aware and can't handle COUNT(*) as well as MyISAM, so the query slows down as the table grows.
A count query on a table with 2M records took 0.2317 seconds, while the following query took 0.0015 seconds:
SELECT table_rows FROM information_schema.tables
WHERE table_name='for_count';
It reported a value of 1997289 instead of 2 million, but that's close enough!
So you don't need your own caching system.
SQLite
SQLite COUNT(*) queries aren't really slow, but they don't scale either: as the table grows, the count query slows down. Using a table similar to the one used for MySQL, SELECT COUNT(*) FROM for_count took 0.042 seconds to complete.
There isn't a shortcut here. The sqlite_master table does not provide row counts, and neither does PRAGMA table_info.
You need your own system to cache the result of SELECT COUNT(*).
PostgreSQL
Despite being the most feature-rich open source RDBMS, PostgreSQL isn't good at handling COUNT(*); it's slow and doesn't scale very well. In other words, no different from the poor relations!
The count query took 0.194 seconds on PostgreSQL. On the other hand, the following query took 0.003 seconds:
SELECT reltuples FROM pg_class WHERE relname = 'for_count';
You don't need your own caching system.
SQL Server
The COUNT query on SQL Server took 0.160 seconds on average, but it fluctuated rather wildly. For all the databases discussed here, the first COUNT(*) query was rather slow, but subsequent queries were faster because the file was cached by the operating system.
I am not an expert on SQL Server, so before answering this question I didn't know how to look up the row count from the schema info. I found this Q&A helpful. One of the queries I tried produced the result in 0.004 seconds:
SELECT t.name, s.row_count FROM sys.tables t
JOIN sys.dm_db_partition_stats s
  ON t.object_id = s.object_id
 AND t.type_desc = 'USER_TABLE'
 AND t.name = 'for_count'
 AND s.index_id = 1
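(Note that s.index_id = 1 assumes the table has a clustered index; for a heap, the row count is stored under index_id = 0.)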
You don't need your own caching system.
Integrate into Django
As can be seen, all the databases considered except SQLite provide a built-in "cached query count", so there is no need to create one of our own. It's a simple matter of creating a custom manager to make use of this functionality.
class CustomManager(models.Manager):
    def quick_count(self):
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute("""SELECT table_rows FROM information_schema.tables
                              WHERE table_name='for_count'""")
            row = cursor.fetchone()
            return row[0]

class Sample(models.Model):
    ....
    objects = CustomManager()
The above example uses the MySQL query, but the same thing works for PostgreSQL or SQL Server by simply swapping in the corresponding query from the list above.
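With that in place, Sample.objects.quick_count() can be used wherever the approximate cached count is acceptable, while Sample.objects.count() remains available when an exact number is required.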
Prometheus
How do you plug this into Django and Prometheus? I leave that as an exercise.
A custom collector that returns the previous value if it's not too old, and fetches a fresh count otherwise, would be the way to go. I'd keep it all in-process.
If you're using MySQL, you might want to look at the collectors that mysqld_exporter offers, as there are some for table size that should be cheaper.

Can Prefetch Row Count for Oracle OCCI be set externally (e.g. configuration, connection string)?

We have a simple OCCI (C++) Oracle client program that works too slowly against a remote DB. We discovered that increasing the prefetch row count parameter has a dramatic impact on this. Preferably, we would like to avoid shipping a patch to the customer at this point and instead increase this value externally. Is this possible?
(The connection string is taken from our proprietary config so it can be modified if needed, as long as we don't provide a binary patch)
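For reference, the in-code knob being discussed looks roughly like this (a sketch only; connection setup is omitted, and the query and prefetch value are placeholders):

#include <occi.h>

using namespace oracle::occi;

void fetchIds(Connection *conn)
{
    Statement *stmt = conn->createStatement("SELECT id FROM objects");
    stmt->setPrefetchRowCount(1000);  // rows fetched per network round trip
    ResultSet *rs = stmt->executeQuery();
    while (rs->next())
        rs->getInt(1);  // consume the single column
    stmt->closeResultSet(rs);
    conn->terminateStatement(stmt);
}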

Fastest OLEDB read from ORACLE

What would be the fastest way of retrieving data from an Oracle DB via OLEDB?
It should be portable (it has to work on Postgres and MS SQL), and only one column is transferred (the ID from some large table).
Current performance is 100k rows/sec. Am I expecting too much if I want it to go faster?
Clarification:
The table has 23M records.
The query is: SELECT ID FROM OBJECTS
The bottleneck is the transfer from Oracle to the client software, which is C++/OLEDB.
What the heck, I'll take a chance.
Edit: As far as connectivity goes, I HEARTILY recommend:
Oracle Objects for OLE, OO4O for short.
It's made by Oracle for Oracle, not by MS. It uses high-performance native drivers, NOT ODBC, for a performance boost. I've personally used it on several occasions, and it is fast. I was connecting to extremely large DBs and data warehouses where no table held fewer than 2 million records, and most were far larger.
Note that you do not need to know OLE to use this. It wraps OLE, hence the name. Conceptually and syntactically, it wraps the "result set" into a dynaset fed by SQL commands. If you've ever used DAO or ADO, you will be productive in 5 minutes.
Here's a more in-depth article.
If you can't use OO4O, then the specialized .NET data provider made by Oracle is very good. NOT the one made by MS.
HTH
Use a "WHERE" clause? Example: "select id from objects where id = criteria"
WHERE
This sends only the record of interest across the network. Otherwise all 23 million records are sent across the wire.
OR, look into "between."
"select id from objects where id between thisone and thatone"
BETWEEN
That sends a reduced set of records in the range you specify.
HTH