DolphinDB - Can't use Delta of Delta to compress POINT data

Script:
pt = point(1 2 3 4, 1 2 3 4)
compress(pt,"delta")
Error Message:
compress(pt, "delta") => Delta compression only supports integral and temporal data

Although integers are passed in when constructing the POINT value, the underlying storage of a POINT is floating point, and delta-of-delta compression does not support compressing floating-point numbers.


Cyclic Redundancy Check: single and double bit errors

I found this in the book by Forouzan (Data Communications and Networking, 5E), but I am not able to understand the logic behind it.
This is in the context of the topic of two isolated single-bit errors:
In other words, g(x) must not divide x^t + 1, where t is between 0 and n − 1. However, t = 0 is meaningless and t = 1 is needed as we will see later. This means t should be between 2 and n – 1
Why is t = 1 excluded here? (x^1 + 1) represents two adjacent errors; shouldn't that also be detected by our g(x)?
The third image states that (x+1) should be a factor of g(x). This reduces the maximum length at which the CRC is guaranteed to detect 2-bit errors from n-1 to (n/2)-1, but it provides the advantage of being able to detect any odd number of bit errors, such as (x^k + x^j + x^i) where k+j+i <= (n/2)-1.
Not mentioned in the book, is that some generators can detect more than 3 errors, but sacrifice the maximum length of a message in order to do this.
If a CRC can detect e errors, then it can also correct floor(e/2) errors, but I'm not aware of an efficient algorithm to do this, other than a huge table lookup (if there is enough space). For example there is a 32 bit CRC (in hex: 1f1922815 = 787·557·465·3·3) that can detect 7 bit errors or correct 3 bit errors for a message size up to 1024 bits, but fast correction requires a 1.4 giga-byte lookup table.
As for "t = 1 is needed": the book later clarifies this by noting that g(x) = (x+1) cannot detect adjacent bit errors. In the other statement, the book does not special-case t = 0 or t = 1; it states, "If a generator cannot divide (x^t + 1), t between 0 and n-1, then all isolated double bit errors can be detected", except that if t = 0, (x^0 + 1) = (1 + 1) = 0, which would be a zero-bit-error case.

Constructing chrono::time_point

Given two split values, seconds since epoch and µs, is one of the following preferable?
auto timestamp = system_clock::time_point(seconds(time_seconds) + microseconds(time_us));
or
auto timestamp = system_clock::time_point(seconds(time_seconds)) + microseconds(time_us);
It does not matter at all which of the two you choose. It does, however, pay to have time_seconds and time_us as 64-bit integers: this cuts the whole operation from 5 instructions to 3 on x86_64. See: https://godbolt.org/g/8u1pYn

How to get rid of different results of logarithm between CUDA and CPU?

I want to implement an algorithm on the GPU using CUDA. At the same time, I wrote a CPU version in C++ to verify the results of the GPU version. However, I ran into trouble when using log() on the CPU and GPU. A very simple piece of the algorithm (used on both CPU and GPU) is shown below:
float U;
float R = U * log(U);
However, when I compare the results from the CPU side, I find that many of them (459883 out of 1843161) have small differences (the max difference is 0.5). Some results are shown below:
U -- R (CPU side) -- R (GPU side) -- R using Python (U * math.log(U))
86312.0 -- 980998.375000 -- 980998.3125 -- 980998.3627440572
67405.0 -- 749440.750000 -- 749440.812500 -- 749440.7721980268
49652.0 -- 536876.875000 -- 536876.812500 -- 536876.8452369706
32261.0 -- 334921.250000 -- 334921.281250 -- 334921.2605240216
24232.0 -- 244632.437500 -- 244632.453125 -- 244632.4440747978
Can anybody give me some suggestions? Which one should I trust?
Which one should I trust?
You should trust the double-precision result computed by Python, which you could also have computed with CUDA or C++ in double precision to obtain very similar (although likely still not identical) values.
To rephrase the first comment made by aland, if you care about an error of 0.0625 in 980998, you shouldn't be using single-precision in the first place. Both the CPU and the GPU result are “wrong” for that level of accuracy. On your examples, the CPU result happens to be more accurate, but you can see that both single-precision results are quite distant from the more accurate double-precision Python result. This is simply a consequence of using a format that allows 24 significant binary digits (about 7 decimal digits), not just for the input and the end result, but also for intermediate computations.
If the input is provided as float and you want the most accurate float result for R, compute U * log(U) using double and round to float only in the end. Then the results will almost always be identical between CPU and GPU.
Out of curiosity, I compared the position of the last bit set in the significand (in other words, the number of trailing zeros in the significand).
I did it with Squeak Smalltalk because I'm more comfortable with it, but I'm pretty sure you can find equivalent libraries in Python:
CPU:
#(980998.375000 749440.750000 536876.875000 334921.250000 244632.437500)
collect: [:e | e asTrueFraction numerator highBit].
-> #(23 22 23 21 22)
GPU:
#(980998.3125 749440.812500 536876.812500 334921.281250 244632.453125)
collect: [:e | e asTrueFraction numerator highBit].
-> #(24 24 24 24 24)
Interestingly, that's not as random as we might expect, especially on the GPU, but there's not enough of a clue at this stage...
Then I used the ArbitraryPrecisionFloat package to perform (emulate) the operations in extended precision and round to the nearest single-precision float; the correct answer matches the CPU's almost exactly:
#( 86312 67405 49652 32261 24232 ) collect: [:e |
| u r |
u := e asArbitraryPrecisionFloatNumBits: 80.
r := u * u ln.
(r asArbitraryPrecisionFloatNumBits: 24) asTrueFraction printShowingMaxDecimalPlaces: 100]
-> #('980998.375' '749440.75' '536876.875' '334921.25' '244632.4375')
It works as well with 64 bits.
But if I emulate the operations in single precision, then I can say the GPU matches the emulated results quite well too (except the second item):
#( 86312 67405 49652 32261 24232 ) collect: [:e |
| u r |
u := e asArbitraryPrecisionFloatNumBits: 24.
r := u * u ln.
r asTrueFraction printShowingMaxDecimalPlaces: 100]
-> #('980998.3125' '749440.75' '536876.8125' '334921.28125' '244632.453125')
So I'd say the CPU probably used double (or extended) precision to evaluate the log and perform the multiplication.
On the other side, the GPU performed all the operations in single precision. The log function of the ArbitraryPrecisionFloat package is correct to a half ulp, but that's not a requirement of IEEE 754, which can explain the observed mismatch on the second item.
You may try to write the code so as to force float (for example using logf instead of log if it's C99, or using float intermediates: float ln = logf(u); float r = u * ln;) and possibly use appropriate compilation flags to forbid extended precision (I can't remember which; I don't use C every day). But even then you have very little guarantee of a 100% match on the log function; the standards are too lax.

Fast percentile in C++ - speed more important than precision

This is a follow-up to Fast percentile in C++
I have a sorted array of 365 daily cashflows (xDailyCashflowsDistro) which I randomly sample 365 times to get a generated yearly cashflow. Generating is carried out by
1/ picking a random probability in the [0,1] interval
2/ converting this probability to an index in the [0,364] interval
3/ determining which daily cashflow corresponds to this probability, using the index and some linear approximation,
and summing 365 generated daily cashflows. Following the previously mentioned thread, my code precalculates the differences of sorted daily cashflows (xDailyCashflowDiffs) where
xDailyCashflowDiffs[i] = xDailyCashflowsDistro[i+1] - xDailyCashflowsDistro[i]
and thus the whole code looks like
double _dIdxConverter = ((double)(365 - 1)) / (double)(RAND_MAX - 1);
for ( unsigned int xIdx = 0; xIdx < _xCount; xIdx++ )
{
    double generatedVal = 0.0;
    for ( unsigned int xDayIdx = 0; xDayIdx < 365; xDayIdx++ )
    {
        double dIdx = (double)fastRand() * _dIdxConverter;
        long iIdx1 = (unsigned long)dIdx;
        double dFloor = (double)iIdx1;
        generatedVal += xDailyCashflowsDistro[iIdx1] + xDailyCashflowDiffs[iIdx1] * (dIdx - dFloor);
    }
    results.push_back(generatedVal);
}
_xCount (the number of simulations) is 1K+, usually 10K.
The problem:
This simulation is being carried out 15M times (compared to 100K when the first thread was written) at the moment, and it takes ~10 minutes on a 3.4GHz machine. Using VTune Analyzer, I am told that the next-to-last line (generatedVal += ...) accounts for 80% of the runtime. My question is why, and how I can work with that.
Things I have tried:
1/ getting rid of the (dIdx - dFloor) part to see whether double difference and multiplication is the main culprit - runtime dropped by a couple of percent
2/ declaring xDailyCashflowsDistro and xDailyCashflowDiffs as __restrict so as to prevent the compiler from thinking they are dependent on each other - no change
3/ tried using 16 days (as opposed to 365) to see whether it is cache misses that drag my performance down - not the slightest change
4/ tried using floats as opposed to doubles - no change
5/ compiling with different /fp: - no change
6/ compiling as x64 - has effect on the double <-> ulong conversions, but the line in question is unaffected
What I am willing to sacrifice is resolution - I do not care whether the generatedVal is 100010.1 or 100020.0 at the end if the speed gain is substantial.
EDIT:
The daily/yearly cashflows are related to the whole portfolio. I could divide all daily cashflows by the portfolio size and would thus (at a 99.99% confidence level) ensure that daily cashflow / portfolio size does not fall outside the [-1000, +1000] interval. In this case, though, I would need precision to the hundredths.
Perhaps you could turn your piecewise linear function into a piecewise-linear "histogram" of its values. The number you're sampling appears to be the sum of 365 samples from that histogram. What you're doing is a not-particularly-fast way to sample from the sum of 365 samples from that histogram.
You might try computing a Fourier (or wavelet or similar) transform, keeping only the first few terms, raising it to the 365th power, and computing the inverse transform. You won't get a probability distribution in the end, but there shouldn't be "too much" mass below 0 or above 1 and the total mass shouldn't be "too different" from 1 with this technique. (I don't know what your data looks like; this technique may well be unworkable for good mathematical reasons.)

How can I normalize a double to an int exactly?

I have a client program that sends data to a server over a TCP connection.
In the client I need to send a normalized decimal number to the server. For normalization I multiply the decimal number by 100,000 and then send it, but the server receives the wrong number.
For example:
double price;
I set it from the GUI to 74.40
cout<<price; ---> 74.40
and when I serialize my object I send
#define Normal 100000
int tmp = price*Normal;
oDest<<tmp;
In Wireshark I see that the client sent 7439999.
Why did this happen? How can I prevent this problem?
Don't store anything as a floating point value. Use a rational number instead, or use a fixed point value. Floating point values (like double) basically "cheat" in order to fit a large range of possible values into a reasonable chunk of memory, and they have to make compromises to do so.
If you are storing a financial value, consider storing pennies or cents or whatever is the smallest denomination.
This is due to floating point precision errors. You can add some rounding:
int tmp = (price + 0.5/Normal)*Normal;
You need to round the number as you convert it to integer, due to the inability of floating point to represent a decimal number exactly.
int tmp = price*Normal + 0.5;