In short, this post would like to answer the following question: how (if at all) can we configure a SQLite database to be absolutely sure that any INSERT command returns in less than 8 milliseconds?
By "configure" I mean: compile-time options, database PRAGMA options, and run-time options.
For background, we would like to execute the same INSERT statement at 120 fps (1000 ms / 120 fps ≈ 8 ms).
The Database is created with the following strings:
"CREATE TABLE IF NOT EXISTS MYTABLE ("
"int1 INTEGER PRIMARY KEY AUTOINCREMENT, "
"int2 INTEGER, "
"int3 INTEGER, "
"int4 INTEGER, "
"fileName TEXT);
and the options:
"PRAGMA SYNCHRONOUS=NORMAL;"
"PRAGMA JOURNAL_MODE=WAL;"
The INSERT statement is the following one:
INSERT INTO MYTABLE VALUES (NULL, ?, ?, ?, ?)
The last ? (for fileName) is the name of a file, so it's a small string. Each INSERT is thus small.
Of course, I use precompiled statements to accelerate the process.
I have a little program that makes one insert every 8 ms and measures the time taken by each insert. To be precise, the program makes one insert, THEN waits 8 ms, THEN makes the next insert, and so on. In total, 7200 inserts are pushed, so the program runs for about 1 minute.
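For reference, here is a minimal sketch of that measurement loop in Python (the real program is written against the SQLite C API with precompiled statements; the database file name and the test data below are made up, and the Python sqlite3 module only approximates statement reuse):

import sqlite3, time

# Hypothetical database file; the real program prepares the statement once
# with sqlite3_prepare_v2() and reuses it for every insert.
conn = sqlite3.connect("test.db")
conn.execute("PRAGMA synchronous=NORMAL;")
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute(
    "CREATE TABLE IF NOT EXISTS MYTABLE ("
    "int1 INTEGER PRIMARY KEY AUTOINCREMENT, "
    "int2 INTEGER, int3 INTEGER, int4 INTEGER, fileName TEXT);"
)

insert_sql = "INSERT INTO MYTABLE VALUES (NULL, ?, ?, ?, ?)"
timings_ms = []

for i in range(7200):                          # 7200 inserts, ~1 minute at 120 fps
    start = time.perf_counter()
    conn.execute(insert_sql, (i, i, i, "frame_%06d.png" % i))
    conn.commit()                              # one transaction per insert
    timings_ms.append((time.perf_counter() - start) * 1000.0)
    time.sleep(0.008)                          # wait 8 ms before the next insert

print("max insert time: %.2f ms" % max(timings_ms))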
Here are two links that show two charts:
The first image shows how many milliseconds were spent on each insert as a function of time, expressed in minutes. As you can see, most of the time the insert time is 0, but there are spikes that can go higher than 100 ms.
The second image is a histogram of the same data. Values below 5 ms are not shown, but I can tell you that of the 7200 inserts, 7161 are below 5 milliseconds (they would produce a huge peak at 0 that would make the chart less readable).
The total program time is
real 1m2.286s
user 0m1.228s
sys 0m0.320s.
Call it roughly 1 minute plus 4 seconds. Don't forget that we spend 7200 × 8 ms ≈ 58 seconds just waiting, so the 7200 inserts themselves take about 4 seconds. That gives a rate of about 1800 inserts per second, i.e., an average of roughly 0.55 milliseconds per insert.
This is really great, except that in my case I want ALL the inserts to be below 8 milliseconds, and the chart shows that this is clearly not the case.
So where do these peaks come from?
When the WAL file reaches a given size (1 MB in our case), SQLite makes a checkpoint (the WAL file is applied to the real database file), and because we passed PRAGMA SYNCHRONOUS=NORMAL, at that moment SQLite performs an fsync on the hard drive.
We suspect it is this fsync that makes the corresponding insert so slow.
The long insert time does not seem to depend on the WAL file size: we played with the WAL_AUTOCHECKPOINT pragma (1000 by default) that governs the WAL file, and we could not reduce the height of the peaks.
We also tried PRAGMA SYNCHRONOUS=OFF. Performance is better, but still not good enough.
For information, dirty_background_ratio (/proc/sys/vm/dirty_background_ratio) on my machine is set to 0, meaning that dirty pages in the page cache must be flushed to the hard drive immediately.
Does anyone have an idea how to "smooth" the chart, i.e., make sure that no insert takes more than 8 ms?
By default, pretty much everything in SQLite is optimized for throughput, not latency.
WAL mode moves most delays into the checkpoint, but if you don't want those big delays, you have to use more frequent checkpoints, i.e., do a checkpoint after each transaction.
In that case, WAL mode does not make sense; better try journal_mode=persist.
(This will not help much because the delay comes mostly from the synchronization, not from the amount of data.)
If the WAL/journal operations are too slow, and if even synchronous=off is not fast enough, then your only choice is to disable transaction safety and try journal_mode=memory or even =off.
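To make those knobs concrete, here is a rough sketch in Python (the database file name is made up, and the variants are alternatives to benchmark one at a time, not settings to combine):

import sqlite3

conn = sqlite3.connect("test.db")

# Variant 1: stay in WAL mode but checkpoint far more often, or manually
# after each commit (trades throughput for smaller individual stalls).
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("PRAGMA wal_autocheckpoint=100;")       # default is 1000 pages
conn.execute("PRAGMA wal_checkpoint(PASSIVE);")      # e.g. after each commit

# Variant 2: rollback journal kept between transactions instead of WAL.
conn.execute("PRAGMA journal_mode=PERSIST;")

# Variant 3: give up transaction safety in exchange for latency.
conn.execute("PRAGMA synchronous=OFF;")
conn.execute("PRAGMA journal_mode=MEMORY;")          # or OFF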
Let's say I have:
A table with 100 RCUs
This table has 200 items
Each item is 4 KB
As far as I understand, RCUs are calculated per second, and you spend 1 full RCU per 4 KB read (with a strongly consistent read).
1) Because of this, if I consume more than 100 RCUs in one second I should get a throttling error, right?
2) How can I predict that a certain request will require more than my provisioned throughput? It feels scary that at any time I can compromise the whole database by making an expensive request.
3) Let's say I want to do a scan on the whole table (get all items), so that should require 200 RCUs (see the back-of-the-envelope sketch after this list). But that will depend on how fast DynamoDB does it, right? If it's too fast it will give me an error, but if it takes 2 seconds or more it should be fine. How do I account for this? How do I take DynamoDB's speed into account to know how many RCUs I will need? What is DynamoDB's "speed"?
4) What's the difference between throttling and throughput limit exceeded?
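As a back-of-the-envelope illustration of the arithmetic in question 3 (plain Python, using the numbers from the question; the 1 RCU per 4 KB per second rule for strongly consistent reads is the standard published one):

import math

item_size_kb = 4
item_count   = 200
provisioned  = 100    # RCUs

# Strongly consistent read: 1 RCU per 4 KB of item data read, per second.
rcu_per_item = math.ceil(item_size_kb / 4)       # = 1
total_rcus   = item_count * rcu_per_item         # = 200 for a full scan

# Whether you get throttled depends on how fast that capacity is consumed:
for seconds in (1, 2, 4):
    rate = total_rcus / seconds
    print(f"scan spread over {seconds}s -> {rate:.0f} RCU/s "
          f"({'throttled' if rate > provisioned else 'ok'})")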
Most of your questions are theoretical at this point, because you now (as of Nov 2018) have the option of simply telling DynamoDB to use 'on demand' mode, where you no longer need to calculate or worry about RCUs. Simply enable this option and forget about it. I had similar problems in the past because of very uneven workloads - periods of no activity and then periods where I needed to do full table scans to generate a report - and struggled to get it all working seamlessly.
I turned on 'on demand' mode, cost went down by about 70% in my case, and no more throttling errors. Your cost profile may be different, but I would definitely check out this new option.
https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/
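If you prefer to flip an existing table over from code rather than the console, a sketch along these lines (using boto3; the table name is hypothetical) should do it:

import boto3

dynamodb = boto3.client("dynamodb")

# Switch an existing provisioned table to on-demand (pay-per-request) billing.
dynamodb.update_table(
    TableName="my-table",               # hypothetical table name
    BillingMode="PAY_PER_REQUEST",
)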
Our content management server hosts the Lucene sitecore_analytics_index.
By default, the sitecore_analytics_index uses a TimedIndexRefreshStrategy with an interval of 1 minute. This means that every minute, Sitecore adds new analytics data to the index, and then optimizes the index.
We've found that the optimization part takes ~20 minutes for our index. In practice, this means that the index is constantly being optimized, resulting in non-stop high disk I/O.
I see two possible ways to improve the situation:
Don't run the optimize step after index updates, and implement an agent to optimize the index just once per day (as per this post). Is there a big downside to only optimizing the index, say, once per day? AFAIK it's not necessary to optimize the index after every update.
Keep running the optimize step after every index update, but increase the interval from 1 minute to something much higher. What ill-effects might we see from this?
Option 2 is easier as it is just a config change, but I suspect that updating the index less frequently might be bad (hopefully I'm wrong?). Looking in the Sitecore search log, I see that the analytics index is almost constantly being searched, but I don't know what by, so I'm not sure what might happen if I reduce the index update frequency.
Does anyone have any suggestions? Thanks.
EDIT: alternatively, what would be the impact of disabling the Sitecore analytics index entirely (and how could I do that)?
I am working on a map/reduce view, and I always get a reduce_overflow_error each time I run the view. If I set reduce_limit = false in the CouchDB configuration, it works. I want to know: is there any negative effect to changing this config setting? Thank you.
The setting reduce_limit=true makes CouchDB control the size of the reduced output at each step of the reduction. If the stringified JSON output of a reduction step is more than 200 characters long and at least twice as long as its input, CouchDB's query server throws an error. Both numbers, 2x and 200 chars, are hard-coded.
Since a reduce function runs inside SpiderMonkey instance(s) with only 64 MB of RAM available, the default limitation looks somewhat reasonable. In theory, a reduce should fold the data it is given, not blow it up.
However, in real life it's quite hard to stay under the limit in all cases. You cannot control the number of rows in the chunk passed to a (re)reduce step, which means you can run into a situation where the output for one particular chunk is more than twice as long (in characters) as its input, even though the other chunks reduce to something much shorter. In that case, a single awkward chunk breaks the entire reduction if reduce_limit is set.
So unsetting reduce_limit might be helpful if your reducer can sometimes output more data than it received.
A common case is unrolling arrays into objects. Imagine you receive a list of arrays like [[1,2,3...70], [5,6,7...], ...] as input rows, and you want to aggregate the list into something like {key0: (sum of 0th elements), key1: (sum of 1st elements), ...}.
If CouchDB decides to send you a chunk with only 1 or 2 rows, you get an error. The reason is simple: the object keys are also counted when calculating the result length.
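To see why a tiny chunk trips the check, here is an illustrative sketch of that "arrays into objects" reducer and the length arithmetic (written in Python purely for illustration; real CouchDB reduce functions are JavaScript):

import json

def reduce_to_object(rows):
    # Sum the i-th elements of every row into {"key0": ..., "key1": ..., ...}
    out = {}
    for row in rows:
        for i, v in enumerate(row):
            out["key%d" % i] = out.get("key%d" % i, 0) + v
    return out

chunk      = [list(range(1, 71))]               # a chunk with a single row
input_len  = len(json.dumps(chunk))
output_len = len(json.dumps(reduce_to_object(chunk)))

# With only one row, the output gains a "keyN" label per element, so it ends
# up longer than 200 chars and more than twice as long as its input.
print(input_len, output_len, output_len > 200 and output_len > 2 * input_len)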
A possible (but very hard to hit) negative effect is the SpiderMonkey instance constantly restarting or dying from exceeding its RAM quota while trying to process a reduction step or the entire reduction. Restarting SpiderMonkey is CPU- and RAM-intensive and generally costs hundreds of milliseconds.
I found a very old thread from 2004 reporting that the execution times listed in ColdFusion debugging output are only accurate to about 16 ms. That is, when you turn debugging output on and look at execution times, you are seeing an estimate rounded to the nearest multiple of 16 ms. I can still see this today with ACF10: when refreshing a page, most times bounce between multiples of 15-16 ms.
Here are the questions:
Starting at the bottom: when ColdFusion reports 0 ms or 16 ms, does that always mean somewhere between 0 and 16 ms, but not over 16 ms?
When ColdFusion reports 32 ms, does that mean somewhere between 17 and 32 ms?
ColdFusion lists everything separately by default, rather than as an execution tree where callers include many functions. When determining the execution cost higher up the tree, is it summing the "inaccurate" times of the children, or is it a realistic measurement of the actual time all the child processes took to execute?
Can we use cftimers or getTickCount() to get genuinely accurate times, or are these also estimates?
Sometimes you'll see that 3 functions took 4 ms each for a total of 12 ms, or even a single call taking 7 ms. Why does it sometimes seem "accurate"?
I will now provide some guesses, but I'd like some community support!
Yes
Yes
ColdFusion reports, accurate to 16 ms, the total time that the process took, rather than summing the child processes' times.
cftimers and getTickCount() are more accurate.
I have no idea?
In Java, you either have System.currentTimeMillis() or System.nanoTime().
I assume getTickCount() merely returns System.currentTimeMillis(); it is also what ColdFusion uses to report debugging output execution times. You can find numerous StackOverflow questions complaining about the inaccuracy of System.currentTimeMillis(), because it reports from the operating system. On Windows, the accuracy can vary quite a bit - up to 50 ms, some say. It doesn't take leap ticks into account or anything. However, it is fast. Queries seem to report either something from the JDBC driver, the SQL engine, or another method, as they are usually accurate.
As an alternative, if you really want increased accuracy, you can use this:
currentTime = CreateObject("java", "java.lang.System").nanoTime()
That is less performant than currentTimeMillis(), but it is precise down to nanoseconds. You can divide by 1000 to get to microseconds. You'll want to wrap in precisionEvaluate() if you are trying to convert to milliseconds by dividing by 1000000.
Please note that nanoTime() is not accurate to the nanosecond; it is only precise to the nanosecond. Its accuracy is just an improvement over currentTimeMillis().
This is more a comment than an answer, but I can't comment yet.
In my experience the minimum execution time for a query is 0 ms or 16 ms; it is never 8 ms or 9 ms. For fun, you can try this:
<cfset s = getTickCount()>
<cfset sleep(5)>
<cfset e = getTickCount() - s>
<cfoutput>#e#</cfoutput>
I tried it with different values, and the expected output and the actual output always differ by somewhere between 0 ms and 16 ms, no matter what value is used. It seems that ColdFusion (Java) is accurate only within a margin of about 16 ms.
I'm pretty sure this question has been asked several times, but either I did not find the correct answer or I didn't understand the solution.
To my current problem:
I have a sensor which measures the time a motor is running.
The sensor is reset after reading.
I'm not interested in the time the motor was running the last five minutes.
I'm more interested in how long the motor was running from the very beginning (or from the last reset).
When storing the values in an RRD, different values end up being recorded depending on the data source type.
With GAUGE, the value read is 3000 (tenths of a second) every five minutes.
With ABSOLUTE, the value is 10 every five minutes.
But what I would like to get is something like:
3000 after the first 5 minutes
6000 after the next 5 minutes (last value + 3000)
9000 after another 5 minutes (last value + 3000)
The accuracy of the older values (and slopes) is not so important, but the last value should reflect the time in seconds since the beginning as accurately as possible.
Is there a way to accomplish this?
I don't know whether it is useful for your need or not, but maybe the TREND/TRENDNAN CDEF function is what you want; have a look here:
TREND CDEF function
I have now created a small SQLite database with one table and one column in that table.
The table has one row. Every time my cron job runs, I update that row by adding the latest sensor reading to the stored value, so this single row and column holds the accumulated value of my sensor. This is then fed into the RRD, roughly as sketched below.
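Roughly, the cron job does something like this (a sketch only; the file, table, and column names are made up):

import sqlite3, subprocess

def read_sensor_tenths():
    # Placeholder for however the sensor is actually read (and reset).
    return 3000

conn = sqlite3.connect("motor.db")                       # hypothetical file name
conn.execute("CREATE TABLE IF NOT EXISTS runtime (total INTEGER)")
if conn.execute("SELECT COUNT(*) FROM runtime").fetchone()[0] == 0:
    conn.execute("INSERT INTO runtime VALUES (0)")

# Add the latest reading to the single accumulator row.
conn.execute("UPDATE runtime SET total = total + ?", (read_sensor_tenths(),))
conn.commit()

# Feed the cumulative value into the RRD.
total = conn.execute("SELECT total FROM runtime").fetchone()[0]
subprocess.run(["rrdtool", "update", "motor.rrd", "N:%d" % total], check=True)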
Any other (better) ideas?
The way that I'd tackle this (on Linux) is to write the value to a plain-text file and then use the value from that file for the RRDtool graph. Using SQLite (or any other SQL server) would be unnecessarily heavy just to keep track of something like this.
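A minimal sketch of that idea, assuming a hypothetical file layout (only the storage differs from the SQLite variant above):

import os, subprocess

TOTAL_FILE = "/var/lib/motor/total.txt"      # hypothetical path

def read_sensor_tenths():
    # Placeholder for the actual sensor read-and-reset.
    return 3000

# Read the previous total (0 if the file does not exist yet),
# add the new reading, and write the sum back.
total = int(open(TOTAL_FILE).read()) if os.path.exists(TOTAL_FILE) else 0
total += read_sensor_tenths()
with open(TOTAL_FILE, "w") as f:
    f.write(str(total))

# Push the cumulative value into the RRD as a GAUGE data source.
subprocess.run(["rrdtool", "update", "motor.rrd", "N:%d" % total], check=True)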