How to Determine Which CRC to Use?

If I have a certain number of bytes to transfer serially, how do I determine which CRC (CRC8, CRC16, etc., basically a how many bit CRC?) to use and still have error detection percentage be high? Is there a formula for this?

From the standpoint of CRC length alone, normal statistics apply. For an n-bit CRC, a corrupted message has a 1/(2^n) chance of slipping through as a false positive. So for an 8-bit CRC you have a 1/256 chance, for a 16-bit CRC a 1/65536 chance, and so on.
However, the polynomial chosen also makes a big impact. The math is highly dependent on the data being transferred and isn't an easy answer.
Depending on your communication mechanism, you should evaluate more than just a CRC (forward error correction with systems such as turbo codes is very useful and common).
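For a rough feel for what those widths buy you, here is a tiny C++ sketch (the printout format is mine; the only fact it uses is the 1/2^n figure above):

#include <cstdio>
#include <cmath>

// Rough illustration: for corruption the CRC cannot otherwise guarantee to
// catch, an n-bit check value matches by pure chance about 1 time in 2^n.
int main() {
    const int widths[] = {8, 16, 32};
    for (int n : widths) {
        double combos = std::pow(2.0, n);
        std::printf("CRC-%d: ~1 in %.0f (%.2e) chance of a false pass\n",
                    n, combos, 1.0 / combos);
    }
}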

To answer this question, you need to know the bit error rate of your channel which can only be determined empirically. And then once you have the measured BER, you have to decide what detection rate is "high" enough for your purposes.
Sending each message, for example, 5 times will give you pretty good detection even on a very noisy channel, but it does crimp your throughput a bit. However, if you are sending commands to a deep-space probe, you may need that redundancy.

Related

Can I use software to measure the frequency (in hertz) at which information is being sent?

I have two computers which send and receive data from each other via reflective memory.
Using time stamps, I have been able to determine the time to transfer a 32-bit int (in this case 0.2313 ms). I need to know how much bandwidth is being used in hertz (Hz) to make sure I meet the project requirements. Is there a software-based way (preferably in C++) to determine this? From what I can tell, I can't simply convert between bps and hertz.
No. That's not a C++ problem, though: you simply don't have nearly enough information to even guess at the bandwidth. You have one latency measurement, but no knowledge of the encoding, no signal-to-noise ratios, no sustained throughput measurements, ...

Recommended samples for performance benchmarks?

I'm writing performance benchmarks for some of my code. This is both to compare my own implementations as I develop/experiment, and to compare against "competing" implementations. I have no problem writing these, and getting usable results.
It's very well established that more samples are a good thing, as they reduce the impact of erroneous data and give a truer result.
So, if I'm profiling a given function/procedure/whatever, how many samples does it seem reasonable to get?
I'm currently doing about 1 million samples for each test. These are individual operations, the results rarely take longer than 10s per item, even on an old laptop. Most are under a hundredth of a second.
Actually, it is not well established that more samples are a good thing.
It is nothing more than common wisdom.
I think you are sharing in a general confusion about the reason for profiling, whether the purpose is to measure performance or to find speedups.
For measuring performance, you don't need samples at all.
What you need is a stopwatch, whether in software or not.
If your process runs too quickly for the resolution of the stopwatch, just run your process 10^3 or 10^6 times, measure it, and divide by that number.
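For example, here is a minimal stopwatch sketch of that idea (C++; do_work() is just a placeholder for whatever you are measuring):

#include <chrono>
#include <cstdio>

volatile int sink = 0;                 // keeps the optimizer from deleting the loop

void do_work() { sink += 1; }          // placeholder for the code being measured

int main() {
    const long reps = 1000000;         // run it 10^6 times, as suggested above
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < reps; ++i)
        do_work();
    auto stop = std::chrono::steady_clock::now();
    double total = std::chrono::duration<double>(stop - start).count();
    std::printf("total %.6f s, %.9f s per call\n", total, total / reps);
}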
For finding speedups, sampling the call stack is very effective, provided the samples contain line-level or instruction-level call site information.
How many samples do you need?
Well, if you see it doing something that could be removed on one sample, that probably doesn't mean much.
But if you see it on two samples, that estimates it is costing a time fraction F of about 2/N, where N is the number of samples.
Example: if you see it twice in 10 samples, that means it costs roughly 20% of the time.
In general, if the speedup is going to save you fraction F of time, it takes on average 2/F samples to see it twice.
Example: if it is going to save 30% of time (F = 0.3) you need on average 2/0.3 = 6.67 samples to see it twice.
Of course, if you see it more than twice, all the better.
Bottom line, for finding speedups, you don't need a lot of samples.
What you do need is to examine each one for activity that could be removed.
What you don't need is to mush them together into "statistics" (like most profilers do).
Many people understand this.
If you want a bit more rigorous explanation, look here.

Recommended CRC16 polynomials for data logging application

I'm writing a data logging application (running on a microcontroller) which will write data to ordinary, embedded NOR-type serial flash memory (in this case - an AT25DF161.)
Each packet of data (240 or 496 bytes) will be logged individually to the flash, one after another. I figure the most common failure in the flash memory would be a stuck bit, typically "0" (the non-erased state). I need to be able to detect single-bit events, at most two per record (I assume this as a worst case after 100,000 write cycles).
I'm using a processor which has a built-in 16-bit CRC calculation module, so there's no performance implication for using fewer or more terms - so what decisions would I need to make to decide on an optimum polynomial?
Use a standard polynomial. You can find a list to choose from here.
Look at the paper by Philip Koopman, "Cyclic Redundancy Code (CRC) Polynomial Selection for Embedded Networks". He analyzes a number of 16-bit polynomials, measuring their error detection capability for different message lengths. As you'll see from his paper, they are not all created equally. For a small number of bit errors (you need to detect up to two per record, which calls for a Hamming distance of at least 3) and a fairly large block, 0xBAAD might be a good choice.
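If you ever need the same check in software (say, to verify the logged records on a PC), a bit-at-a-time CRC-16 sketch looks like the following. The 0x1021 (CCITT) constant is only a placeholder: Koopman lists polynomials in his own notation, so whatever you pick from his tables has to be converted to the conventional MSB-first form, and the initial value has to match what your hardware CRC module uses.

#include <cstdint>
#include <cstddef>

// Bit-at-a-time CRC-16, MSB first. poly and the initial value are placeholders;
// substitute the polynomial you settle on (converted to this conventional
// notation) and whatever seed your hardware CRC module uses.
uint16_t crc16(const uint8_t* data, std::size_t len,
               uint16_t poly = 0x1021, uint16_t crc = 0xFFFF) {
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<uint16_t>(data[i]) << 8;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ poly)
                                 : static_cast<uint16_t>(crc << 1);
    }
    return crc;
}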

Matrix compression methods

In an application I've been working on, I have to send a 256 x 256 matrix over a socket. I'm developing a visualization client for an offshore system simulator that runs on a cluster, and this matrix is a heightmap representing the current state of the ocean surface.
This is a real-time application, so speed is a must. Using a 256 x 256 matrix of floats, I have to send 256 kbytes of data every second, for a bandwidth requirement of 256 kbytes/second.
That's a lot, at least for my application.
So, my question is, is there some good method for compressing this matrix before sending it via socket? And, if there is such a method, how much of a reduction can I expect?
As my matrix represents a continuous surface, lossy compression methods are not a problem for me. I'm mostly concerned with the compression ratio, the time it takes for the compression to take place and, finally, whether there is already an implementation of the method for C++.
If you are far enough offshore and/or in calm sea states, breaking waves are not likely to be a big problem. If this is the case, then the surface will be nicely continuous, and will likely look a lot like the superposition of multiple sine/cosine waves in X and Y.
2-D FFTs of the surface might give you some insight. You might be able to represent the surface as a bandwidth-limited 2-D FFT, and discard data for higher spatial frequencies.
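As a sketch of that approach with FFTW (assuming the FFTW3 library is available; the cutoff of 32 frequencies per axis is purely a guess to be tuned against the sea states you actually see):

#include <fftw3.h>

// Forward real-to-complex 2-D FFT of the 256x256 heightmap, then zero out
// everything above a low-frequency cutoff. Only the surviving (cutoff x cutoff)
// block of coefficients would be transmitted; the receiver runs the inverse
// transform to rebuild an approximation of the surface. Build with -lfftw3f.
int main() {
    const int N = 256, cutoff = 32;                  // cutoff is an assumption to tune
    float* height = fftwf_alloc_real(N * N);         // fill with the row-major surface
    fftwf_complex* spec = fftwf_alloc_complex(N * (N / 2 + 1));

    fftwf_plan fwd = fftwf_plan_dft_r2c_2d(N, N, height, spec, FFTW_ESTIMATE);
    fftwf_execute(fwd);

    for (int ky = 0; ky < N; ++ky) {
        int fy = (ky <= N / 2) ? ky : N - ky;        // fold negative frequencies
        for (int kx = 0; kx < N / 2 + 1; ++kx) {
            if (fy >= cutoff || kx >= cutoff) {
                spec[ky * (N / 2 + 1) + kx][0] = 0.0f;
                spec[ky * (N / 2 + 1) + kx][1] = 0.0f;
            }
        }
    }

    fftwf_destroy_plan(fwd);
    fftwf_free(spec);
    fftwf_free(height);
}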
First off: Numeric representation
I assume the physical range of the ocean height is limited (say, -50.0 to 50.0 metre waves), if I understand your description correctly. The typical IEEE 754-2008 32-bit floating point (i.e. float in C/C++) uses 8 bits for its exponent (a range of -126 to 127), 23 bits for the fraction, and one bit for the sign. Note that's base 2.
If your minimum measured (or computed) variation is 1 mm (0.001 metres), then you can reduce the floating-point size needed to 16 bits. IEEE 754 does define a 16-bit floating-point value for use as an interchange format, with 5 bits for the exponent, 10 bits for the fraction, and 1 bit for the sign. I believe that would be suitable, and it would immediately reduce your requirement to 128 KB/s (1024 Kbps).
After I originally wrote this, I realized that if you want a uniform representation with a very small amount of error (<= 2 mm), you can convert to a 16-bit signed integer in which one unit represents 2 mm of physical height. You then have a uniform representation with a resolution of 2 mm, with values ranging from -32768 (== -65536 mm, or approximately -65.5 metres, about -200 ft) to 32767 (== 65534 mm, or approximately 65.5 metres).
That's a very simple alternative representation based on the simple assumptions that a) the valid range of values is within +/- 65.5 metres, and b) a 2 mm resolution is acceptable for transmission.
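A minimal sketch of that packing in C++ (the +/-65.5 m range and the 2 mm step are exactly the assumptions above):

#include <cstdint>
#include <cmath>
#include <algorithm>

// Pack a height in metres into a 16-bit signed integer, one unit == 2 mm,
// clamped to the assumed +/-65.5 m range.
int16_t pack_height(float metres) {
    float units = metres / 0.002f;                   // 2 mm per unit
    units = std::min(std::max(units, -32768.0f), 32767.0f);
    return static_cast<int16_t>(std::lround(units));
}

float unpack_height(int16_t units) {
    return units * 0.002f;                           // back to metres
}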
Second: Modifying (filtering) the data
I don't know if a Discrete Cosine Transform (DCT), similar to what is used in JPEG compression, might be useful as a lossy compression technique. Basically this quantizes the data so that nearly equal neighbouring values are smoothed and can then be better compressed by lossless methods.
Third: Traditional Lossless Compression
Otherwise, consider reasonably fast lossless compression techniques such as the Lempel-Ziv based methods (LZ, LZH, LZW, etc.), and perhaps the fast LZO method.
Well, a matrix is just a 2D signal, so there are a lot of different compression methods available.
I would first try the easy solution: go for inflate/deflate without a container (basically a Zip, without a Zip). http://en.wikipedia.org/wiki/DEFLATE
The compression ratio will depend on the data, so I can't say what you'll get; you must try it yourself.
Otherwise, the smarter way to do it would be to send only the changes. If you have access to the server-side code, you can just send the few bytes of the heightmap that actually change every second. That would be the ideal solution, and if you wish, you can even compress the changed bytes with a deflater.
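As a sketch of how those two suggestions combine, using zlib's one-shot compress() on a frame-to-frame difference (the function name and buffer handling here are mine, not from any existing code):

#include <zlib.h>
#include <vector>
#include <cstddef>

// Difference the new frame against the previous one, then deflate the result.
// Cells that did not change become exact zeros, which deflate compresses very
// well; the receiver inflates and adds the deltas back onto its last frame.
std::vector<unsigned char> delta_and_deflate(const float* prev, const float* curr,
                                             std::size_t count) {
    std::vector<float> delta(count);
    for (std::size_t i = 0; i < count; ++i)
        delta[i] = curr[i] - prev[i];

    uLongf outLen = compressBound(delta.size() * sizeof(float));
    std::vector<unsigned char> out(outLen);
    compress(out.data(), &outLen,
             reinterpret_cast<const Bytef*>(delta.data()),
             delta.size() * sizeof(float));
    out.resize(outLen);   // compress() shrinks outLen to the actual size
    return out;
}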
First, I'd figure out whether you can change the basic encoding from 32-bit floating point to some sort of fixed point. Assuming your values all fall within a fairly specific range (which seems likely) this may well be enough to cut your bandwidth in half. Depending on the range (and precision) needed, you might well want to represent an exponential value, so you're capturing a fairly decent idea of a wide range of magnitudes, but small differences are mostly ignored.
I'd guess you don't (probably) expect huge height changes from one sample to the next, and (at a guess) you probably fairly frequently see slopes that continue across a number of samples.
If that's the case, a predictive delta compression will probably work well. The basic idea is that for every (non-edge) point, you predict the value of that point based on surrounding points, and then encode only the difference between that prediction and the actual value at the point. Depending on how much precision you can lose, you might well be able to encode each delta into a single byte (or maybe even two deltas per byte).
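For instance, here is a minimal sketch of such a predictor, using the average of the left and upper neighbours and clamping the residual into one signed byte (the quantisation step is an assumption you would tune):

#include <cstdint>
#include <vector>
#include <algorithm>

// Predict each non-edge cell as the average of its left and upper neighbours,
// then store only the quantised residual. step is how many metres one code
// value represents (e.g. 0.002f for 2 mm); residuals that do not fit in a byte
// are simply clamped here. A real codec would predict from the reconstructed
// values so the encoder and decoder stay in sync.
std::vector<int8_t> predictive_encode(const float* h, int w, int rows, float step) {
    std::vector<int8_t> out(static_cast<size_t>(w) * rows, 0);
    for (int y = 1; y < rows; ++y) {
        for (int x = 1; x < w; ++x) {
            float pred  = 0.5f * (h[y * w + (x - 1)] + h[(y - 1) * w + x]);
            float resid = (h[y * w + x] - pred) / step;
            resid = std::min(std::max(resid, -128.0f), 127.0f);
            out[y * w + x] = static_cast<int8_t>(resid);
        }
    }
    return out;  // the first row and column would be sent separately
}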
Once you've done that, you can consider using Huffman compression or even arithmetic compression, but either one will slow your compression a fair amount.
First, look at your data. How many bits of information in those floats do you really need to send? Play with chopping off the least significant bits and see if the result is still accurate enough. Next, start with the basic lossless algorithms: compress it with the LZ-family lossless methods (LZ78, LZW, ...) to get a baseline lossless ratio with fast decompression. Then try BZip and the like for possibly better compression with slower decompression. You now have your lossless limit. Now try some lossy algorithms: JPEG and the like have tunable lossy ratios and still decompress really fast. Finally, add some filters. Your data would probably compress very well after a simple differential pass along the X or Y axis (or try both and save the result as 1 bit). This should make your data even more compressible.
All told, I'd guess you could get at least x3 your current bandwidth lossless and x10 with a little loss.
If I understand you correctly, the surface is sampled every second. So if the changes within one second are not too large, why not treat the data as video and try a video compression algorithm? Video compression also takes motion compensation into account, which, among the other parts of the algorithm, is important for the high compression rates video achieves.
I would try the following:
Because the height change is expected to be quite small from one second to the next, try sending the differences in height between two consecutive transfers. Multiply these numbers by 10^n so you can send them as integers instead of floats. Next, use zero-compressed encoding (look it up), which can significantly reduce the number of bytes to be sent. After that, use some compression algorithm to pack these bytes.
I would think it can be reduced by about 50% (unless the differences are big enough to need 3-4 bytes each).
Well, first off, how many levels of height do you need? What's the maximum difference in height from wave peak to trough? I bet you could represent it with only 256 or 65536 possible height values, which immediately cuts your data to 1/4 or 1/2 respectively, without you having to modify your data structure.
You can also send the min/max values as floats with each update, so the 256 levels are always fully used to get the most accuracy possible... as the sea gets rougher, you lose accuracy.
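A sketch of that per-frame quantisation (the byte values plus the two floats are all that would be sent):

#include <cstdint>
#include <vector>
#include <algorithm>

// Scale each height into 0..255 using this frame's min and max. The two floats
// are transmitted alongside the bytes so the receiver can rescale.
std::vector<uint8_t> quantise_frame(const float* h, std::size_t n,
                                    float& mn, float& mx) {
    mn = *std::min_element(h, h + n);
    mx = *std::max_element(h, h + n);
    float range = (mx > mn) ? (mx - mn) : 1.0f;      // avoid divide-by-zero
    std::vector<uint8_t> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<uint8_t>(255.0f * (h[i] - mn) / range + 0.5f);
    return out;
}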
You can also save a 256x256 image using standard image algorithms. You've not quite got a standard-format bitmap, but you could treat it as grayscale: if each vertex V is scaled to a value 0-255, you can build a colour (V,V,V) and use an existing JPG library for free. Or you can probably find a DDS file format with a single channel of 8/16/32-bit data too.
I did the first part of this in the past, successfully. For the second part, I'd be keen to avoid writing your own algorithm and instead get your data into a form that existing libraries can use, like D3DX for example.

Overhead of casting double to float?

So I have megabytes of data stored as doubles that need to be sent over a network... now I don't need the precision that a double offers, so I want to convert them to floats before sending them over the network. What is the overhead of simply doing:
float myFloat = (float)myDouble;
I'll be doing this operation several million times every few seconds and don't want to slow anything down. Thanks
EDIT: My platform is x64 with Visual Studio 2008.
EDIT 2: I have no control over how they are stored.
As Michael Burr said, while the overhead strongly depends on your platform, the overhead is definitely less than the time needed to send them over the wire.
A rough estimate:
800 Mbit/s of payload on an excellent gigabit wire means 25 M floats/second.
On a 2 GHz single core, that gives you a whopping 80 clock cycles per converted value to break even - anything less and you will save time. That should be more than enough on all architectures :)
A simple load-store cycle (barring all caching delays) should be below 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelizing on multiple cores, you are likely to do multiple conversions in a single cycle.
Also, the receiver will be happy having to handle only half the data. Remember that memory access time is nonlinear.
The only thing arguing against the conversion would be if the transfer needs minimal CPU load: a modern architecture can transfer the data from disk/memory to the bus without CPU intervention. However, with the above numbers, I'd say that doesn't matter in practice.
[edit]
I checked some numbers: the 387 coprocessor would indeed have taken around 70 cycles for a load-store cycle. On the original Pentium, you are down to 3 cycles without any parallelization.
So, unless you run a gigabit network on a 386...
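To put those cycle counts in context, the conversion is just a tight loop that any recent compiler will turn into packed conversion instructions (cvtpd2ps on x64); a minimal sketch:

#include <cstddef>

// Narrow a buffer of doubles to floats before handing it to the socket code.
// Modern compilers vectorise this loop into packed SSE2/AVX conversions.
void narrow(const double* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<float>(src[i]);
}

// usage: std::vector<float> out(n); narrow(doubles, out.data(), n);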
It's going to depend on your implementation of C++ libraries. Test it and see.
Even if it does take time, this will not be the slow point in your application.
Your FPU can do the conversion a lot quicker than the network can send the data (so the bottleneck here will more than likely be the write to the socket).
But as with all things like this, measure it and see.
Personally I don't think any time spent here will affect the real time spent sending the data.
Assuming that you're talking about a significant number of packets to ship the data (a reasonable assumption if you're sending millions of values) casting the doubles to float will likely reduce the number of network packets by about half (assuming sizeof(double)==8 and sizeof(float)==4).
Almost certainly the savings in network traffic will dominate whatever time is spent performing the conversion. But as everyone says, measuring some tests will be the proof of the pudding.
Bearing in mind that most compilers deal with doubles a lot more efficiently than floats -- many promote float to double before performing operations on them -- I'd consider taking the block of data, ZIPping/compressing it, then sending the compressed block across. Depending on what your data looks like, you could get 60-90% compression, vs. the 50% you'd get converting 8-byte values to four bytes.
You don't have any choice but to measure it yourself and see. You could use timers for the measurement. It looks like someone has already implemented a neat C++ timer class.
I think this cast is a lot cheaper than you might expect, since it doesn't involve much calculation. In essence, it just re-biases the exponent and rounds away the low bits of the mantissa.
It will also depend on the CPU and what floating-point support it has. In the bad old days (the 1980s), processors supported integer operations only. Floating-point math had to be emulated in software, or you could buy a separate floating-point chip (a coprocessor).
Modern CPUs now have SIMD instructions, so large amounts of floating point data can be processed at once. These instructions include MMX, SSE, 3DNow! and the like. Your compiler may know how to make use of these instructions, but you may need to write your code in a particular way, and turn on the right options.
Finally, the fastest way to process floating point data is in a video card. A fairly new language called OpenCL lets you send tasks to the video card to be processed there.
It all depends on how much performance you need.