My question follows directly from this one, although I'm only interested in UPDATE, and only that.
I have an application written in C/C++ which makes heavy use of SQLite, mostly SELECT/UPDATE, at a very frequent interval (about 20 queries every 0.5 to 1 second).
My database is not big, about 2500 records at the moment; here is the table structure:
CREATE TABLE player (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(64) UNIQUE,
    stats VARBINARY,
    rules VARBINARY
);
Up to this point I did not use transactions, because I was improving the code and wanted stability rather than performance.
Then I measured my database performance by simply executing 10 update queries like the following (in a loop with different values):
// 10 times execution of this
UPDATE player SET stats = ? WHERE (name = ?)
where stats is a JSON string of exactly 150 characters and name is 5-10 characters long.
Without transactions, the result is unacceptable: about 1 full second in total (0.096 s each).
With transactions, the time drops by about 7.5x: about 0.11-0.16 seconds in total (0.013 s each).
I tried deleting a large part of the database and/or re-ordering/deleting columns to see if that changes anything, but it did not. I get the above numbers even if the database contains just 100 records (tested).
I then tried playing with PRAGMA options:
PRAGMA synchronous = NORMAL
PRAGMA journal_mode = MEMORY
These gave me smaller times, but not consistently; more like 0.08-0.14 seconds.
PRAGMA synchronous = OFF
PRAGMA journal_mode = MEMORY
This finally gave me extremely small times, about 0.002-0.003 seconds, but I don't want to use it, since my application saves to the database every second and there's a high chance of a corrupted database on an OS crash or power failure.
My C SQLite code for queries is: (comments/error handling/unrelated parts omitted)
// start transaction
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

// prepare the query (sqlite3_prepare_v2 takes the db handle as its first argument)
sqlite3_stmt *statement = NULL;
int out = sqlite3_prepare_v2(db, query.c_str(), -1, &statement, NULL);

// bindings
for (size_t x = 0, sz = bindings.size(); x < sz; x++) {
    out = sqlite3_bind_text(statement, x + 1, bindings[x].text_value.c_str(),
                            bindings[x].text_value.size(), SQLITE_TRANSIENT);
    ...
}

// execute (a successful UPDATE step returns SQLITE_DONE, never SQLITE_OK)
out = sqlite3_step(statement);
if (out != SQLITE_DONE) {
    // error handling omitted
}

// finalize the statement whether the step succeeded or not
if (statement != NULL) {
    sqlite3_finalize(statement);
}

// end the transaction
sqlite3_exec(db, "END TRANSACTION", NULL, NULL, NULL);
As you can see, it's a pretty typical table, the number of records is small, and I'm doing a plain simple UPDATE exactly 10 times. Is there anything else I could do to decrease my UPDATE times? I'm using the latest SQLite, 3.16.2.
NOTE: The timings above come directly from a single END TRANSACTION query. Queries are done inside a single transaction and I'm using a prepared statement.
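(For concreteness, a minimal sketch of that pattern: all 10 updates inside one transaction, reusing a single prepared statement. The updates container and its fields are illustrative, and error handling is omitted.)

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);

sqlite3_stmt *stmt = NULL;
sqlite3_prepare_v2(db, "UPDATE player SET stats = ? WHERE (name = ?)", -1, &stmt, NULL);

for (const auto &u : updates) {      // e.g. 10 (stats, name) pairs
    sqlite3_bind_text(stmt, 1, u.stats.c_str(), u.stats.size(), SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, u.name.c_str(), u.name.size(), SQLITE_TRANSIENT);
    sqlite3_step(stmt);              // SQLITE_DONE on success
    sqlite3_reset(stmt);             // reuse the same statement for the next row
    sqlite3_clear_bindings(stmt);
}

sqlite3_finalize(stmt);
sqlite3_exec(db, "END TRANSACTION", NULL, NULL, NULL);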
UPDATE:
I performed some tests with transactions enabled and disabled and with various update counts, using the following settings:
VACUUM;
PRAGMA synchronous = NORMAL; -- def: FULL
PRAGMA journal_mode = WAL; -- def: DELETE
PRAGMA page_size = 4096; -- def: 1024
The results follow:
no transactions (10 updates)
0.30800 secs (0.0308 per update)
0.30200 secs
0.36200 secs
0.28600 secs
no transactions (100 updates)
2.64400 secs (0.02644 each update)
2.61200 secs
2.76400 secs
2.68700 secs
no transactions (1000 updates)
28.02800 secs (0.028 each update)
27.73700 secs
..
with transactions (10 updates)
0.12800 secs (0.0128 each update)
0.08100 secs
0.16400 secs
0.10400 secs
with transactions (100 updates)
0.088 secs (0.00088 each update)
0.091 secs
0.052 secs
0.101 secs
with transactions (1000 updates)
0.08900 secs (0.000089 each update)
0.15000 secs
0.11000 secs
0.09100 secs
My conclusion is that with transactions the per-query time cost is not meaningful. Perhaps the times get bigger with a colossal number of updates, but I'm not interested in those numbers. There's literally no time cost difference between 10 and 1000 updates in a single transaction. However, I'm wondering if this is a hardware limit on my machine and there's not much I can do. It seems I cannot go below ~100 milliseconds for a single transaction of anywhere from 10 to 1000 updates, even using WAL.
Without transactions there's a fixed time cost of around 0.025 seconds per update.
With such small amounts of data, the time for the database operation itself is insignificant; what you're measuring is the transaction overhead (the time needed to force the write to the disk), which depends on the OS, the file system, and the hardware.
If you can live with its restrictions (mostly, no network), you can use asynchronous writes by enabling WAL mode.
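For example, switching an existing connection could look like this (a sketch; with WAL plus synchronous=NORMAL the database stays consistent after a crash or power loss, although the most recent commits may be lost):

// enable write-ahead logging with relaxed but still corruption-safe syncing
sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", NULL, NULL, NULL);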
You may still be limited by the time it takes to commit a transaction. In your first example each transaction took about 0.10 s to complete, which is pretty close to the transaction time for updating 10 records. What kind of results do you get if you batch 100 or 1000 updates in a single transaction?
Also, SQLite expects around 60 transactions per second on an average hard drive, while you're only getting about 10. Could your disk performance be the issue here?
https://sqlite.org/faq.html#q19
Try adding indexes to your database:
CREATE INDEX IDXname ON player (name)
Related
The app captures sound from a microphone using WASAPI.
This code initializes m_AudioClient that is of type IAudioClient*.
const LONG CAPTURE_CLIENT_LATENCY = 50 * 10000;
DWORD loopFlag = m_IsLoopback ? AUDCLNT_STREAMFLAGS_LOOPBACK : 0;
hr = m_AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED
, AUDCLNT_STREAMFLAGS_EVENTCALLBACK | AUDCLNT_STREAMFLAGS_NOPERSIST | loopFlag
, CAPTURE_CLIENT_LATENCY, 0, m_WaveFormat->GetRawFormat(), NULL);
Then I use m_AudioClient->Start() and m_AudioClient->Stop() to pause or resume capturing.
Usually m_AudioClient->Start() takes 5-6 ms, but sometimes it takes about 150 ms, which is too much for the application.
If I call m_AudioClient->Start() once, then subsequent calls of m_AudioClient->Start() during the next 5 seconds will be fast, but after about 10-15 seconds the next call of m_AudioClient->Start() will take longer (150 ms). So it looks like it keeps some state for several seconds, and after that it needs to get back to that state, which takes some time.
On another machine these delays never happen; every call of m_AudioClient->Start() takes about 30 ms.
On a third machine the average duration of m_AudioClient->Start() is 140 ms, but peak values are about 1 s.
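(A minimal sketch of how a single call can be timed, just to make the numbers above concrete; the helper name is illustrative:)

#include <audioclient.h>
#include <chrono>

// Times one Start() call on an already-initialized IAudioClient.
double TimeStartCallMs(IAudioClient *client)
{
    auto t0 = std::chrono::steady_clock::now();
    HRESULT hr = client->Start();
    auto t1 = std::chrono::steady_clock::now();
    if (FAILED(hr))
        return -1.0; // e.g. AUDCLNT_E_NOT_STOPPED; the timing is meaningless on failure
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}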
I run the same code on all 3 machines. The microphone is not exactly the same; in most cases it is "Microphone Array (Realtek High Definition Audio)".
Can somebody explain why these peaks in the duration of m_AudioClient->Start() happen and how I can fix them?
I'm running a system with lots of AWS Lambda functions. Our load is not huge, let's say a function gets 100k invocations per month.
For quite a few of the Lambda functions, we're using warmup plugins to reduce cold start times. This is effectively a CloudWatch event triggered every 5 minutes to invoke the function with a dummy event which is ignored, but keeps that Lambda VM running. In most cases, this means one instance will be "warm".
I'm now looking at the native solution to the cold start problem: AWS Lambda Concurrent Provisioning, which at first glance looks awesome, but when I start calculating, either I'm missing something, or this will simply be a large cost increase for a system with only medium load.
Example, with prices from the eu-west-1 region as of 2020-09-16:
Consider function RAM M (GB), average execution time t (s), millions of requests per month N, and provisioned concurrency C ("number of instances"):
Without provisioned concurrency
Cost per month = N⋅(16.6667⋅M⋅t + 0.20)
= $16.87 per million requests # M = 1 GB, t = 1 s
= $1.87 per million requests # M = 1 GB, t = 100 ms
= $1.69 per 100,000 requests # M = 1 GB, t = 1 s
= $1686.67 per 100M requests # M = 1GB, t = 1 s
With provisioned concurrency
Cost per month = C⋅0.000004646⋅M⋅60⋅60⋅24⋅30 + N⋅(10.8407⋅M⋅t + 0.20) = 12.04⋅C⋅M + N(10.84⋅M⋅t + 0.20)
= $12.04 + $11.04 = $23.08 per million requests # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1.28 = $13.32 per million requests # M = 1 GB, t = 100 ms, C = 1
= $12.04 + $1.10 = $13.14 per 100,000 requests # M = 1 GB, t = 1 s, C = 1
= $12.04 + $1104.07 = $1116.11 per 100M requests # M = 1 GB, t = 1 s, C = 1
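A small sketch to make the arithmetic above reproducible (C++ just for illustration; the rates are the ones quoted above for eu-west-1):

#include <cstdio>

// N = millions of requests per month, M = GB, t = seconds, C = provisioned concurrency
double cost_on_demand(double N, double M, double t) {
    return N * (16.6667 * M * t + 0.20);
}

double cost_provisioned(double N, double M, double t, double C) {
    double fixed = C * 0.000004646 * M * 60 * 60 * 24 * 30; // roughly 12.04 * C * M
    return fixed + N * (10.8407 * M * t + 0.20);
}

int main() {
    // Examples from the post: M = 1 GB, t = 1 s, C = 1
    const double Ns[] = {0.1, 1.0, 100.0};
    for (double N : Ns) {
        std::printf("N = %.1fM req/month: on-demand $%.2f, provisioned $%.2f\n",
                    N, cost_on_demand(N, 1.0, 1.0), cost_provisioned(N, 1.0, 1.0, 1.0));
    }
    return 0;
}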
There are obviously several factors to take into account here:
How many requests per month is expected? (N)
How much RAM does the function need? (M)
What is the average execution time? (t)
What is the traffic pattern: a few small bursts or even traffic? (This might mean C is low, high, or must be changed dynamically to follow peak hours, etc.)
In the end though, my initial conclusion is that Provisioned Concurrency will only be a good deal if you have a lot of traffic? In my example, at 100M requests per month there's a substantial saving (however, at that traffic it's perhaps likely that you would need a higher value of C as well; break-even at about C = 30). Even with C = 1, you need almost a million requests per month to cover the static costs.
Now, there are obviously other benefits of using the native solution (no ugly dummy events, no log pollution, a flexible number of warm instances, ...), and there are also probably other hidden costs of custom solutions (CloudWatch events, additional CloudWatch logging for dummy invocations, etc.), but I think they are pretty much negligible.
Is my analysis fairly correct or am I missing something?
I think of provisioned concurrency as something that eliminates cold starts, not something that saves money. There is a bit of saving if you can keep the Lambda function busy all the time (100% utilization), but as you've calculated, it becomes quite expensive when the provisioned capacity sits idle.
I am inserting a large number of rows into a PostgreSQL db using Django ORM's bulk_create(). I have roughly the following code:
entries = []
start = time()
for i, row in enumerate(initial_data_list):
    serializer = serializer_class(data=row, **kwargs)
    serializer.is_valid(raise_exception=True)
    entries.append(MyModel(**serializer.initial_data))
    if i % 1000 == 0:
        MyModel.objects.bulk_create(entries)
        end = time()
        _logger.info('processed %d inserts... (%d seconds per batch)' % (i, end - start))
        start = time()
        entries = []
Measured execution times:
processed 1000 inserts... (16 seconds per batch)
processed 2000 inserts... (16 seconds per batch)
processed 3000 inserts... (17 seconds per batch)
processed 4000 inserts... (18 seconds per batch)
processed 5000 inserts... (18 seconds per batch)
processed 6000 inserts... (18 seconds per batch)
processed 7000 inserts... (19 seconds per batch)
processed 8000 inserts... (19 seconds per batch)
processed 9000 inserts... (20 seconds per batch)
etc.; the time keeps growing as more insertions are made. The request runs inside a transaction via the Django setting DATABASES['default']['ATOMIC_REQUESTS'] = True; however, turning it off does not seem to have any effect. DEBUG mode is turned off.
Some observations:
When the request is finished and I execute an identical request, the measured times look pretty much identical: they start decently low and grow from there. There is no process restart between the requests.
The effect is the same whether I'm doing individual insertions using serializer.save() or using bulk_create().
The execution time of the bulk_create() line itself stays about the same, a bit more than half a second.
If I remove the insertion entirely, the loop executes in constant time, which points to something going on in the db-connection layer that slows down the entire process as the number of insertions grows...
During execution, the Python process's memory consumption stays pretty much constant after reaching the first bulk_create(), as does the Postgres process's, so that does not look to be the problem.
What is happening that makes the inserts grow slower? Since new requests start fast again (without process-restart needed), could I perform some sort of clean-ups during the request to restore speed?
The only working solution I found was to disable ATOMIC_REQUESTS in settings.py, which allows me to call db.connection.close() and db.connection.connect() every now and then during the request to keep the execution time from growing.
Does MyModel have any indexes at the database level? Could it be that those are affecting the processing time by being updated after each insertion?
You could try to remove the indexes (if there are any) and see if that changes anything.
Big disclaimer: I'm still a novice, so...
From what I can see in your code:
for 1000 entries, 16 seconds
for 9000 entries, 20 seconds
To me it looks like a normal thing that the time increases; inserting more entries takes more time, no?
Sorry if I'm misunderstanding something.
I am implementing a basic (just for kiddies) anti-cheat for my game. I've added a timestamp to each of my movement packets and do sanity checks on the server side for the time difference between those packets.
I've also added a packet that sends a timestamp every 5 seconds, based on process speed. But it seems this becomes a problem when the PC lags.
So what should I use to check whether the process is running faster due to a "speed hack"?
My current loop speed check on the client:
this_time = clock();
time_counter += (double)(this_time - last_time);
last_time = this_time;

if (time_counter > (double)(5 * CLOCKS_PER_SEC))
{
    time_counter -= (double)(5 * CLOCKS_PER_SEC);

    milliseconds ms = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
    uint64_t curtime = ms.count();
    if (state == WALK) {
        // send the CURTIME to server
    }
}
// other game loop function
The code above works fine if the client's PC doesn't lag, maybe because of RAM or CPU issues; they might be running too many applications.
Server side code for reference: (GoLang)
// pktData[3:] packet containing the CURTIME from client
var speed = pickUint64(pktData, 3)
var speedDiff = speed - lastSpeed
if lastSpeed == 0 {
    speedDiff = 5000
}
lastSpeed = speed
if speedDiff < 5000 /* 5000 millisec or 5 sec */ {
    c.hackDetect("speed hack") // hack detect when speed is faster than the 5 second send loop in client
}
Your system has a critical flaw which makes it easy for cheaters to circumvent: it relies on the timestamp provided by the client. Any data you receive from the client can be manipulated by a cheater, so it must not be trusted.
If you want to check for speed hacking on the server:
Log the current position of the player's avatar at irregular intervals. Store the timestamp of each log entry according to the server time.
Measure the speed between two such log entries by calculating the distance and dividing it by the timestamp difference.
When the speed is larger than the player's speed limit, you might have a cheater. But keep in mind that lag can lead to sudden spikes, so it might be better to take the average speed over multiple samples to detect whether the player is speed-hacking. This might make the speed-hack detection less clear-cut, but that might actually be a good thing, because it makes it harder for hackers to know how well any evasion methods they use are working.
To avoid false positives, remember to keep track of any artificial ways of moving players around which do not obey the speed limit (like teleporting to spawn after being killed). When such an event occurs, the current speed measurement is meaningless and should be discarded.
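A rough sketch of that kind of check (C++ here just for illustration; the question's server is Go, and the class name, 2-D positions, and window size are all made up for the example):

#include <chrono>
#include <cmath>
#include <deque>

// Logs positions with server-side timestamps and flags the player when the
// average speed over the last few samples exceeds the game's speed limit.
class SpeedCheck {
public:
    explicit SpeedCheck(double maxSpeed) : maxSpeed_(maxSpeed) {}

    // Call at irregular intervals with the avatar's current position.
    // Returns true if the averaged speed exceeds the limit.
    bool Log(double x, double y) {
        samples_.push_back({x, y, std::chrono::steady_clock::now()});
        if (samples_.size() > kWindow)
            samples_.pop_front();
        if (samples_.size() < 2)
            return false;

        double distance = 0.0, seconds = 0.0;
        for (size_t i = 1; i < samples_.size(); ++i) {
            double dx = samples_[i].x - samples_[i - 1].x;
            double dy = samples_[i].y - samples_[i - 1].y;
            distance += std::sqrt(dx * dx + dy * dy);
            seconds += std::chrono::duration<double>(
                samples_[i].when - samples_[i - 1].when).count();
        }
        return seconds > 0.0 && distance / seconds > maxSpeed_;
    }

    // Call on teleports/respawns so the jump is not counted as movement.
    void Reset() { samples_.clear(); }

private:
    struct Sample {
        double x, y;
        std::chrono::steady_clock::time_point when;
    };
    static const size_t kWindow = 8; // average over a few samples
    double maxSpeed_;
    std::deque<Sample> samples_;
};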
I'm writing a stats server to count visit data for each day, so I have to clear the data in the db (memcached) every day.
Currently, I call gettimeofday to get the date and compare it with the cached date to check whether they are the same day, and this check happens very frequently.
Sample code as below:
void report_visits(...) {
    std::string date = CommonUtil::GetStringDate(); // through gettimeofday
    if (date != static_cached_date_) {
        flush_db_date();
        static_cached_date_ = date;
    }
}
The problem is that I have to call gettimeofday every time a client reports visit information, and gettimeofday is time-consuming.
Any solution for this problem?
The gettimeofday system call (now obsolete in favor of clock_gettime) is among the shortest system calls to execute. The last time I measured it was on an Intel i486, and it took around 2 µs. The kernel-internal version is used to timestamp network packets, and read, write, and chmod system calls use it to update the timestamps in the filesystem inodes, and the like. If you want to measure how much time you spend in the gettimeofday system call, you just have to do several (the more, the better) pairs of calls, one immediately after the other, noting the timestamp differences between them, and finally take the minimum of the samples as the proper value. That will be a good approximation of the ideal value.
Consider that if the kernel uses it to timestamp each read you do on a file, you can freely use it to timestamp each service request without a serious penalty.
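A minimal sketch of that measurement (back-to-back calls, keeping the smallest difference as the estimate of the per-call cost):

#include <cstdio>
#include <sys/time.h>

int main()
{
    struct timeval a, b;
    long best = -1;
    for (int i = 0; i < 100000; i++) {
        gettimeofday(&a, NULL);
        gettimeofday(&b, NULL);
        long diff = (b.tv_sec - a.tv_sec) * 1000000L + (b.tv_usec - a.tv_usec);
        if (best < 0 || diff < best)
            best = diff; // the minimum approximates the raw cost of one call
    }
    std::printf("estimated gettimeofday() cost: ~%ld us\n", best);
    return 0;
}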
Another thing: don't use (as suggested by other responses) a routine to convert the gettimeofday result into a string, as this indeed consumes a lot more resources. You can compare the timestamps directly (call them t1 and t2):
gettimeofday(&t2, NULL);
if (t2.tv_sec - t1.tv_sec > 86400) { /* 86400 is one day in seconds */
    erase_cache();
    t1 = t2;
}
or, if you want it to occur every day at the same time:
gettimeofday(&t2, NULL);
if (t2.tv_sec / 86400 > t1.tv_sec / 86400) {
    /* tv_sec / 86400 is the number of whole days since 1/1/1970, so
     * if it varies, a change of date has occurred */
    erase_cache();
}
t1 = t2; /* this time t1 = t2 is done outside the if, so the flush is tied to the calendar date change */
You can even use the time() system call for this, as it has one-second resolution (and you don't need to cope with the microseconds or with the overhead of the struct timeval structure).
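For instance, along the lines of the snippets above (a sketch; t1 and erase_cache() are the same hypothetical pieces used there):

#include <ctime>

void erase_cache();  // as in the snippets above
static time_t t1;    // previous timestamp

void check_date_change()
{
    time_t t2 = std::time(NULL);
    if (t2 / 86400 > t1 / 86400) { // number of whole days since the epoch changed
        erase_cache();
    }
    t1 = t2;
}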
(This is an old question, but there is an important answer missing:)
You need to define the TZ environment variable and export it to your program. If it is not set, you will incur a stat(2) call on /etc/localtime... for every single call to gettimeofday(2), localtime(3), etc.
Of course these will get answered without going to disk, but the frequency of the calls and the overhead of the syscall is enough to make an appreciable difference in some situations.
Supporting documentation:
How to avoid excessive stat(/etc/localtime) calls in strftime() on linux?
https://blog.packagecloud.io/eng/2017/02/21/set-environment-variable-save-thousands-of-system-calls/
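For example (a sketch; glibc also accepts a plain zone name such as "UTC", so use whatever matches your setup):

#include <stdlib.h>
#include <time.h>

int main()
{
    // Setting TZ once at startup stops the C library from re-examining
    // /etc/localtime on every time-related call.
    setenv("TZ", ":/etc/localtime", 1);
    tzset();
    // ... rest of the server ...
    return 0;
}

The same effect can be had by exporting TZ in the environment that launches the process (e.g. TZ=:/etc/localtime).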
To summarise:
The check, as you say, is done up to a few thousand times per second.
You're flushing a cache once every day.
Assuming that the exact time at which you flush is not critical and can be seconds (or even minutes perhaps) late, there is a very simple/practical solution:
void report_visits(...)
{
    static unsigned int counter;

    if ((counter++ % 1000) == 0)
    {
        std::string date = CommonUtil::GetStringDate();
        if (date != static_cached_date_)
        {
            flush_db_date();
            static_cached_date_ = date;
        }
    }
}
Just do the check once every N times report_visits() is called. In the above example N is 1000. With up to a few thousand checks per second, you'll be less than a second (or about 0.001% of a day) late.
Don't worry about counter wrap-around; it only happens once every 20+ days or so (assuming a maximum of a few thousand checks/s with a 32-bit int), and it does not hurt.