I found a very old thread from 2004 reporting that the execution times listed in ColdFusion debugging output are only accurate to 16 ms. In other words, when you turn debugging output on and look at execution times, you're seeing an estimate rounded to the nearest multiple of 16 ms. I can still see this today with ACF10: when refreshing a page, most times bounce between multiples of roughly 15-16 ms.
Here are the questions:
1. Starting at the bottom: when ColdFusion reports 0 ms or 16 ms, does that always mean somewhere between 0 and 16 ms, but not over 16 ms?
2. When ColdFusion reports 32 ms, does this mean somewhere between 17 and 32 ms?
3. ColdFusion lists everything separately by default rather than as an execution tree where callers include many functions. When determining the execution cost higher up the tree, is it summing the "inaccurate" times of the children, or is this a realistic measure of the actual time all the child processes took to execute?
4. Can we use cftimers or getTickCount() to actually get accurate times, or are these also estimates?
5. Sometimes you'll see that 3 functions took 4 ms each for a total of 12 ms, or even a single call taking 7 ms. Why does it sometimes seem "accurate"?
I will now provide some guesses, but I'd like some community support!
1. Yes.
2. Yes.
3. ColdFusion reports the total time the process took, accurate to 16 ms, rather than summing the times of the child processes.
4. cftimers and getTickCount() are more accurate.
5. I have no idea?
In Java, you either have System.currentTimeMillis() or System.nanoTime().
I assume getTickCount() merely returns System.currentTimeMillis(). It's also what ColdFusion uses to report debugging-output execution times. You can find numerous StackOverflow questions complaining about the inaccuracy of System.currentTimeMillis(), because it reports from the operating system. On Windows, the accuracy can vary quite a bit, up to 50 ms some say. It doesn't take leap ticks into account or anything. However, it is fast. Query times seem to come from the JDBC driver, the SQL engine, or some other method, as they are usually accurate.
As an alternative, if you really want increased accuracy, you can use this:
currentTime = CreateObject("java", "java.lang.System").nanoTime()
That is less performant than currentTimeMillis(), but it is precise down to nanoseconds. You can divide by 1000 to get to microseconds. You'll want to wrap in precisionEvaluate() if you are trying to convert to milliseconds by dividing by 1000000.
Please note that nanoTime() is not accurate to the nanosecond; it is just precise to the nanosecond. Its accuracy is simply an improvement over currentTimeMillis().
This is more a comment than an answer, but I can't comment yet.
In my experience the minimum execution time for a query is 0 ms or 16 ms; it is never 8 ms or 9 ms. For fun, you can try this:
<!--- Sleep for 5 ms and see what getTickCount() reports as the elapsed time --->
<cfset s = getTickCount()>
<cfset sleep(5)>
<cfset e = getTickCount() - s>
<cfoutput>#e#</cfoutput>
I tried it with different values, and the expected output and the actual output always differ by somewhere between 0 ms and 16 ms, no matter what value is used. It seems that ColdFusion (Java) is accurate only to within a margin of about 16 ms.
When I execute a Cloud Function on GCP, the console shows graphs for 4 metrics. Invocations and Active Instances are easy to understand, but I am unable to make sense of the other graphs, i.e. execution time and memory usage. This is a screenshot of one of our HTTP-triggered Cloud Functions. Can someone explain how exactly to make sense of this data? What does it mean when it says "99th percentile: 882.85"?
Is 99th percentile good or bad?
It is neither good nor bad; these are statistics describing the execution times. See what a percentile actually means in order to understand the chart: 99% of the observed execution durations fall at or below 882.85 ms, and the remaining 1% are extreme values above it. Those 882.85 ms might merely be suboptimal, if the function could possibly run quicker. The data is presented this way so that a few extreme values don't distort the whole statistic.
So I've been using libcurl to have a little go with HTTP requests like GET, and I have managed to set up the progress callback to see how much has been downloaded. However, what I don't know is the formula to calculate the download speed as you go (similar to how browsers such as Chrome show the download speed).
I originally thought of using this:
downloadSpeed = amountCurrentlyDownloaded / secondsSinceDownloadStarted
Similar to the
speed = distance / time
formula. However, this isn't accurate: for example, if the download stalls completely, downloadSpeed only goes down slightly instead of dropping to zero.
So what is the correct formula to calculate download speed?
Think of a car. Do you want to know the average speed for the trip, or do you want to know your current speed? Your formula gives average speed.
Since you are receiving data in increments, you can't just read off a current speed like a speedometer. Instead, you could update every few seconds, and when you do, divide the number of bytes received since the last update by the time since the last update (you'll need a higher-precision timer than whole seconds).
Perhaps you want to display both the current and average speeds. That's just a question of what will "feel" best to your users.
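Here is a rough C++ sketch of that idea inside a libcurl progress callback. The window length, the SpeedState struct, and the variable names are my own choices for illustration, not anything libcurl prescribes:

#include <curl/curl.h>
#include <chrono>
#include <cstdio>

// Hypothetical state carried between callback invocations.
struct SpeedState {
    std::chrono::steady_clock::time_point lastUpdate;
    curl_off_t lastBytes = 0;
    double currentSpeed = 0.0;   // bytes per second over the last window
};

// Progress callback: recompute the speed roughly once per second,
// using only the bytes received since the previous update.
static int progressCallback(void *clientp, curl_off_t /*dltotal*/, curl_off_t dlnow,
                            curl_off_t /*ultotal*/, curl_off_t /*ulnow*/)
{
    auto *state = static_cast<SpeedState *>(clientp);
    auto now = std::chrono::steady_clock::now();
    double elapsed = std::chrono::duration<double>(now - state->lastUpdate).count();

    if (elapsed >= 1.0) {                      // update window: 1 second
        state->currentSpeed = static_cast<double>(dlnow - state->lastBytes) / elapsed;
        state->lastBytes = dlnow;
        state->lastUpdate = now;
        std::printf("current speed: %.1f KB/s\n", state->currentSpeed / 1024.0);
    }
    return 0;                                  // non-zero aborts the transfer
}

// Wiring it up (error handling omitted):
//   SpeedState state{std::chrono::steady_clock::now()};
//   curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, progressCallback);
//   curl_easy_setopt(curl, CURLOPT_XFERINFODATA, &state);
//   curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);

You can still show the average (the original formula) alongside the windowed value; libcurl also keeps its own running average, which you can query with curl_easy_getinfo and CURLINFO_SPEED_DOWNLOAD_T.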
I need to measure the message decoding latency (3 to 5 µs) of a low-latency application.
I used the following method:
1. Get time T1
2. Decode Data
3. Get time T2
4. L1 = T2 -T1
5. Store L1 in a array (size = 100000)
6. Repeat same steps for 100000 times.
7. Print array.
8. Get the 99th and 95th percentiles for the data set.
But I got fluctuations between tests. Can someone explain the reason for this?
Could you suggest an alternative method for this?
Note: the application runs in a tight loop (occupying 100% of a CPU) and is bound to a CPU via the taskset command.
There are a number of different ways that performance metrics can be gathered, either using code profilers or existing system calls.
NC State University has a good resource on the different types of timers and profilers that are available, the appropriate cases for using each, and some examples, on their HPC website.
Fluctuations will inevitably occur on most modern systems; certain BIOS settings related to hyper-threading and frequency scaling can have a significant impact on the performance of certain applications, as can power-consumption and cooling/environmental settings.
Looking at the distribution of results as a histogram and/or fitting them to a Gaussian will also help determine how normal the distribution is and if the fluctuations are normal statistical noise or serious outliers. Running additional tests would also be beneficial.
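As a rough illustration of the procedure from the question, here is a C++ sketch that measures each iteration with std::chrono::steady_clock and reads the percentiles off the sorted samples. The decodeMessage() stand-in and the sample count are placeholders; substitute the real decode call:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the real decode step being measured.
static void decodeMessage()
{
    volatile int sink = 0;
    for (int i = 0; i < 100; ++i) sink += i;
}

int main()
{
    constexpr std::size_t kSamples = 100000;
    std::vector<long long> latenciesNs;
    latenciesNs.reserve(kSamples);

    for (std::size_t i = 0; i < kSamples; ++i) {
        auto t1 = std::chrono::steady_clock::now();
        decodeMessage();
        auto t2 = std::chrono::steady_clock::now();
        latenciesNs.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count());
    }

    // Sort once, then read the percentiles off by index.
    std::sort(latenciesNs.begin(), latenciesNs.end());
    auto percentile = [&](double p) {
        return latenciesNs[static_cast<std::size_t>(p * (latenciesNs.size() - 1))];
    };
    std::printf("p95 = %lld ns, p99 = %lld ns, max = %lld ns\n",
                percentile(0.95), percentile(0.99), latenciesNs.back());
}

Note that the two clock reads themselves add some tens of nanoseconds per sample, which matters when the operation being measured is only 3 to 5 µs; comparing the percentiles and the maximum across runs, rather than individual samples, gives a steadier picture.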
In short, this post would like to answer the following question : how (if possible) can we configure a SQLite database to be absolutely sure that any INSERT command will return in less than 8 milliseconds?
By configure, I mean: compiling options, database pragma options, and run-time options.
To give some background, we would like to apply the same INSERT statement at 120 fps. (1000 ms / 120 fps ≃ 8 ms)
The database is created with the following strings:
"CREATE TABLE IF NOT EXISTS MYTABLE ("
"int1 INTEGER PRIMARY KEY AUTOINCREMENT, "
"int2 INTEGER, "
"int3 INTEGER, "
"int4 INTEGER, "
"fileName TEXT);
and the options:
"PRAGMA SYNCHRONOUS=NORMAL;"
"PRAGMA JOURNAL_MODE=WAL;"
The INSERT statement is the following one:
INSERT INTO MYTABLE VALUES (NULL, ?, ?, ?, ?)
The last ? (for fileName) is the name of a file, so it's a small string. Each INSERT is thus small.
Of course, I use precompiled statements to accelerate the process.
I have a little program that makes one insert every 8 ms and measures the time taken to perform each insert. To be more precise, the program makes one insert, THEN waits for 8 ms, THEN makes another insert, and so on. At the end, 7200 inserts have been pushed, so the program runs for about 1 minute.
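To make this concrete, here is a rough sketch of such a timed-insert loop with the sqlite3 C API. The variable names, the placeholder file name, and the bound values are mine, and error handling is trimmed:

#include <sqlite3.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    sqlite3 *db = nullptr;
    sqlite3_open("test.db", &db);
    sqlite3_exec(db,
                 "PRAGMA synchronous=NORMAL; PRAGMA journal_mode=WAL;"
                 "CREATE TABLE IF NOT EXISTS MYTABLE ("
                 "int1 INTEGER PRIMARY KEY AUTOINCREMENT, "
                 "int2 INTEGER, int3 INTEGER, int4 INTEGER, fileName TEXT);",
                 nullptr, nullptr, nullptr);

    // Precompiled statement, reused for every insert.
    sqlite3_stmt *stmt = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO MYTABLE VALUES (NULL, ?, ?, ?, ?)",
                       -1, &stmt, nullptr);

    for (int i = 0; i < 7200; ++i) {
        auto t0 = std::chrono::steady_clock::now();

        sqlite3_bind_int(stmt, 1, i);
        sqlite3_bind_int(stmt, 2, i * 2);
        sqlite3_bind_int(stmt, 3, i * 3);
        sqlite3_bind_text(stmt, 4, "frame_0001.png", -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        sqlite3_clear_bindings(stmt);

        auto ms = std::chrono::duration<double, std::milli>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%d,%f\n", i, ms);   // insert index, insert time in ms

        std::this_thread::sleep_for(std::chrono::milliseconds(8));
    }

    sqlite3_finalize(stmt);
    sqlite3_close(db);
}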
Here are two links that show two charts:
This image shows how many milliseconds each insert took, as a function of the time expressed in minutes. As you can see, most of the time the insert time is 0, but there are spikes that can go higher than 100 ms.
This image is the histogram representation of the same data. All the values below 5 ms are not represented, but I can tell you that from the 7200 inserts, 7161 are below 5 milliseconds (and would give a huge peak at 0 that would make the chart less readable).
The total program time is
real 1m2.286s
user 0m1.228s
sys 0m0.320s.
Let's call it roughly 1 minute of waiting plus 4 seconds of work: don't forget that we spend 7200 × 8 ms ≈ 57.6 s just waiting. So the 7200 inserts themselves take about 4 seconds, which gives a rate of about 1800 inserts per second and thus an average of about 0.55 ms per insert.
This is really great, except that in my case I want ALL THE INSERTS to be below 8 milliseconds, and the chart shows that this is clearly not the case.
So where do these peaks come from?
When the WAL file reaches a given size (1 MB in our case), SQLite makes a checkpoint (the WAL file is applied to the real database file). And because we passed PRAGMA SYNCHRONOUS=NORMAL, at this moment SQLite performs an fsync on the hard drive.
We suppose it is this fsync that makes the corresponding insert really slow.
This long insert time does not depend on the WAL file size. We played with the pragma WAL_AUTOCHECKPOINT (1000 by default) linked to the WAL file, and we could not reduce the height of the peaks.
We also tried with PRAGMA SYNCHRONOUS=OFF. The performances are better but still not enough.
For information, the dirty_background_ratio (/proc/sys/vm/dirty_background_ratio) on my computer was set to 0, meaning that all dirty pages in the cache must be flushed immediately to the hard drive.
Does anyone have an idea how to "smooth" the chart, so that no insert takes longer than 8 ms?
By default, pretty much everything in SQLite is optimized for throughput, not latency.
WAL mode moves most delays into the checkpoint, but if you don't want those big delays, you have to use more frequent checkpoints, i.e., do a checkpoint after each transaction.
In that case, WAL mode does not make sense; better try journal_mode=persist.
(This will not help much because the delay comes mostly from the synchronization, not from the amount of data.)
If the WAL/journal operations are too slow, and if even synchronous=off is not fast enough, then your only choice is to disable transaction safety and try journal_mode=memory or even =off.
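If you do want to experiment with the checkpoint-after-each-transaction idea mentioned above, a minimal sketch with the sqlite3 C API might look like this (the helper name is mine; whether SQLITE_CHECKPOINT_PASSIVE or a stricter mode suits you depends on how much of the cost you want each insert to pay):

#include <sqlite3.h>

// Hypothetical helper: run one prepared insert, then checkpoint the WAL so the
// cost is paid in small slices rather than in one large spike later.
void insertAndCheckpoint(sqlite3 *db, sqlite3_stmt *insertStmt)
{
    sqlite3_step(insertStmt);      // parameters are assumed to be bound already
    sqlite3_reset(insertStmt);

    int framesInWal = 0, framesCheckpointed = 0;
    sqlite3_wal_checkpoint_v2(db, nullptr, SQLITE_CHECKPOINT_PASSIVE,
                              &framesInWal, &framesCheckpointed);
}

Combined with PRAGMA wal_autocheckpoint=0 to turn off the automatic checkpoints, this at least lets you choose when the checkpoint cost is paid, even if, as noted above, the fsync itself is what dominates.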
I need to measure an elapsed time (or some sort of timestamp - doesn't matter if it's the system time or something that started from 0) in milliseconds, and was interested in using the boost::cpu_timer class to do this.
Is it unwise to use this class for an extended period of time (i.e. a week straight of non-stop measuring)? Is there an alternative solution?
From my experience with getting the system timestamp, I've gradually come to the impression that obtaining the timestamp in milliseconds (which is what I need) every couple of milliseconds is incredibly slow and strenuous.
I think boost::chrono or std::chrono solves this problem better.
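For example, a minimal std::chrono sketch (steady_clock is the usual choice for elapsed time because it never jumps backwards; the names here are just for illustration):

#include <chrono>
#include <cstdint>
#include <cstdio>

int main()
{
    // Take the start point once; it stays valid for as long as the process runs,
    // so measuring for a week straight is not a problem.
    const auto start = std::chrono::steady_clock::now();

    // ... do work ...

    const auto now = std::chrono::steady_clock::now();
    const std::int64_t elapsedMs =
        std::chrono::duration_cast<std::chrono::milliseconds>(now - start).count();
    std::printf("elapsed: %lld ms\n", static_cast<long long>(elapsedMs));
}

Unlike boost::timer::cpu_timer, which also tracks user and system CPU time, steady_clock only gives wall-clock elapsed time, which is all this use case needs, and reading it every couple of milliseconds is typically cheap.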