Reed-Solomon error correction and false positives - error-correction

I have a Reed-Solomon encoder/decoder. After manipulating data and evaluating the results, I have experienced the following 3 cases:
The decoder decodes the message correctly and does not throw an error
The decoder decodes the message to a wrong result without complaining - effectively producing a false positive. The chance of this should be very low, but it can happen even when the amount of manipulated data is far below the error correction ability (even after changing a single bit...)
The decoder fails (throws an error) if more data is manipulated than its error correction ability allows.
Are all 3 cases valid for a proper Reed-Solomon decoder? I am especially unsure about case 2, where the decoder would produce a wrong result (without throwing an error), even if there are much fewer errors than what is allowed by its correction abilities...?

mis-correction below error correction ability
This would indicate a bug in the code. An RS decoder should never fail or mis-correct when there are ⌊(n-k)/2⌋ or fewer errors.
detecting when there are more errors than the error correction ability
Even if there are more than ⌊(n-k)/2⌋ errors, there is a good chance that an RS decoder will still detect an uncorrectable error, because most such error patterns do not leave the received word within ⌊(n-k)/2⌋ symbols of some valid codeword. A working RS decoder only ever produces a valid codeword or signals an uncorrectable error. Mis-correction of more than ⌊(n-k)/2⌋ errors therefore means the decoder adds ⌊(n-k)/2⌋ or fewer further symbol changes and lands on a valid codeword, but one that differs from the original by n-k+1 or more symbols.
Detecting an uncorrectable error can be done by regenerating the syndromes for the corrected codeword, but the failure is usually caught sooner, when solving the error locator polynomial (normally done by looping through all possible locator values): it produces fewer locators than it should, due to duplicate or missing roots.
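To make the syndrome idea concrete, here is a minimal Python sketch (not taken from the demo programs below): a toy systematic RS(15,11) encoder over GF(16) with primitive polynomial x^4 + x + 1. A valid codeword evaluates to zero at α^1..α^4 (the roots of the generator polynomial), so recomputing syndromes is exactly the "is this a valid codeword?" check described above.

```python
# Toy RS(15,11) over GF(16), primitive polynomial x^4 + x + 1, t = 2.
PRIM = 0b10011
exp = [0] * 30
log = [0] * 16
x = 1
for i in range(15):          # build antilog/log tables for GF(16)
    exp[i] = x
    log[x] = i
    x <<= 1
    if x & 0x10:
        x ^= PRIM
for i in range(15, 30):      # extend so gf_mul needs no modulo
    exp[i] = exp[i - 15]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else exp[log[a] + log[b]]

def poly_mul(p, q):          # coefficients highest-degree first
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= gf_mul(pi, qj)
    return r

# generator polynomial g(x) = (x + a)(x + a^2)(x + a^3)(x + a^4)
g = [1]
for j in range(1, 5):
    g = poly_mul(g, [1, exp[j]])

def encode(msg):             # 11 data symbols -> 15-symbol systematic codeword
    rem = list(msg) + [0] * 4
    for i in range(len(msg)):           # divide msg(x) * x^4 by g(x)
        coef = rem[i]
        if coef:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], coef)
    return list(msg) + rem[-4:]

def syndromes(r):            # evaluate r(a^j), j = 1..4; all zero iff valid
    out = []
    for j in range(1, 5):
        s = 0
        for c in r:                     # Horner evaluation
            s = gf_mul(s, exp[j]) ^ c
        out.append(s)
    return out

cw = encode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
assert syndromes(cw) == [0, 0, 0, 0]    # valid codeword
cw[3] ^= 5                              # corrupt a single symbol
assert any(syndromes(cw))               # error shows up in the syndromes
```

Running the same syndrome check on the decoder's *output* is what catches a mis-correction: if the output is not a valid codeword, the decoder must flag an uncorrectable error.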
I wrote some interactive RS demo programs in C, for both 4-bit and 8-bit fields, that include the three most common decoders: PGZ (matrix), BM (discrepancy), and SY (extended Euclid). Note that the SY (extended Euclid) decoders in my examples emulate a hardware register oriented solution: two registers that always shift left, where each register holds two polynomials and the split point shifts left along with the register; the right half of each register is reversed (least significant coefficient first). The wiki article's example may be easier to follow.
http://rcgldr.net/misc/eccdemo4.zip
http://rcgldr.net/misc/eccdemo8.zip

Related

FORTRAN 77 Divide By Zero Behavior

I have been working on re-engineering an old FORTRAN 77 program into Python for a while now. I'm running into an issue, though: when dividing by zero, the FORTRAN program appears to just continue processing the data without complaint, whereas, predictably, it causes an error in Python. I'm not able to find a discussion about this on any official channel for F77, and I only have an old version of the source code for the program I am translating, which I can't get to compile.
TL;DR: How does F77 handle division by zero for the following cases?:
REAL division
INT division
The numerator is also zero (e.g. 0./0.)
Yes, I have also seen code that relies on nothing happening when a divide by zero occurs. Usually it is the programmer's responsibility to ensure that the result is either expected (e.g. the target variable's value is unchanged) or that an error is raised; in other words, you need to inspect any division operation for a possible zero divisor. On modern systems with IEEE 754 arithmetic, REAL division by zero quietly produces a signed Infinity (or NaN for 0./0.) and execution continues, while INTEGER division by zero typically raises a hardware exception that aborts the program unless it is trapped; much older Fortran code was simply written so that divide by zero "doesn't matter".
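When porting such code to Python, one option is to reproduce the IEEE-754 "non-stop" REAL behavior explicitly instead of letting Python raise `ZeroDivisionError`. A hypothetical helper (`f77_div` is my name, not anything standard) might look like this:

```python
import math

def f77_div(a, b):
    # Emulate IEEE-754 non-stop REAL division, which is effectively what
    # many F77 runtimes did: execution continues and the result is an
    # infinity or a NaN instead of an exception being raised.
    if b != 0.0:
        return a / b
    if a == 0.0:
        return float("nan")                    # 0./0. -> NaN
    return math.copysign(float("inf"), a)      # x/0. -> signed infinity

print(f77_div(1.0, 0.0))    # inf
print(f77_div(-1.0, 0.0))   # -inf
print(f77_div(0.0, 0.0))    # nan
```

INTEGER division has no such quiet fallback in either language, so those sites still need an explicit zero check during the port.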

Using Reed Solomon decoding, do we need to know which shards are correct?

I am using Reed-Solomon error correction in a Java project. The library I use is JavaReedSolomon (https://github.com/Backblaze/JavaReedSolomon). There is an example of decoding using JavaReedSolomon:
byte[][] shards = new byte[NUM_SHARDS][SHARD_SIZE];
//shards is the array containing all the shards
ReedSolomon reedSolomon = ReedSolomon.create(NUM_DATA_SHARDS, NUM_PARITY_SHARDS);
reedSolomon.decodeMissing(shards, shardPresent, 0, shardSize);
The array shardPresent represents which shards are sure to be correct, for example, if you are sure that the 4th shard is correct, then shardPresent[3] equals true.
My question is: does Reed-Solomon decoding necessarily need to know which shards are correct, or is that just how this library implements it?
The answer is no: the decoding procedure can recover from both unknown errors and known ones (erasures). A Reed-Solomon code (in fact, any MDS code) can correct twice as many erasures as errors: with n-k check symbols it can handle e errors and r erasures as long as 2e + r ≤ n-k. There are multiple ways to determine the error locations.
It is likely that the library's API reflects its use case, i.e. there is probably some side-channel information (such as a missing file or a failed disk) about which parts of the data are correct.
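Erasure-only decoding is much simpler than error decoding, which is why storage-oriented libraries expose it directly. A toy sketch (my own illustration, not the JavaReedSolomon algorithm) in Python over the prime field GF(257): data symbols are polynomial coefficients, shard i is the polynomial's value at x = i, and any k surviving shards recover the data by Lagrange interpolation, with no error-location step at all.

```python
# Toy polynomial-evaluation Reed-Solomon over GF(257).
P = 257

def make_shards(data, n):
    # Shard for x = 1..n is the value of the data polynomial at x.
    return [sum(c * pow(x, i, P) for i, c in enumerate(data)) % P
            for x in range(1, n + 1)]

def recover(points):        # points: k (x, y) pairs from surviving shards
    k = len(points)
    data = [0] * k
    for i, (xi, yi) in enumerate(points):
        num = [1]           # expand the Lagrange basis polynomial for xi
        den = 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [0] * (len(num) + 1)
            for m, c in enumerate(num):      # multiply num by (x - xj)
                new[m] = (new[m] - xj * c) % P
                new[m + 1] = (new[m + 1] + c) % P
            num = new
            den = den * (xi - xj) % P
        scale = yi * pow(den, P - 2, P) % P  # divide by den (Fermat inverse)
        for m, c in enumerate(num):
            data[m] = (data[m] + scale * c) % P
    return data

data = [10, 20, 30]                          # k = 3 data symbols
shards = make_shards(data, 6)                # n = 6 shards; any 3 suffice
survivors = [(4, shards[3]), (5, shards[4]), (6, shards[5])]
assert recover(survivors) == data            # first three shards erased
```

Handling shards that are present but silently wrong requires the full error-locating machinery (Berlekamp-Massey, Euclid, etc.), and halves the budget: each unknown error costs two check symbols instead of one.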

How to detect errors in CRC-protected data?

As far as I understand, to check whether data with its CRC appended to the end has errors, one runs the whole thing through the same CRC algorithm and sees whether the newly calculated CRC is zero.
I've tried going through this using an online CRC calculator in the following way:
Calculate CRC for 0xAABBDD (without the 0x part) - CRC16 outputs 0x8992
Calculate CRC for 0xAABBDD8992 - CRC16 outputs 0xFB4A, not 0x0000
What am I doing wrong?
Appending the CRC only works that way for "pure" CRCs without pre- and post-conditioning. Most real-world CRCs, however, have pre- and post-conditioning, mainly so that the CRC of a string of zeros is not zero.
The way to check a CRC is the same as for any other check value. You get a message m c where m are the message bytes and c is the check value. You are told through some other channel (most likely a standards document) that c=f(m), with some description of the function f. To check the integrity of m c, you compute f(m) and see if that is equal to c. If not, then the message and/or check value were corrupted in transit. If they are equal, then you have some degree of assurance that the message has arrived unscathed. The assurance depends on the nature of f, the number of bits in c, and the characteristics of the possible errors on the transmission channel.
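The "pure CRC" residue property is easy to demonstrate with CRC-16/XMODEM (polynomial 0x1021, zero initial value, no reflection, no final XOR), which has no conditioning. Note the online calculator used in the question evidently applied a different, conditioned variant, so these numbers will not match its output.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM: poly 0x1021, init 0x0000, no reflection, no final XOR.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):       # process one bit, MSB first
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

msg = bytes.fromhex("AABBDD")
c = crc16_xmodem(msg)
# With no pre/post-conditioning, appending the CRC zeroes the total CRC:
assert crc16_xmodem(msg + c.to_bytes(2, "big")) == 0
```

With a conditioned CRC (nonzero init and/or final XOR), checking message-plus-CRC yields a fixed nonzero constant (the "residue") rather than zero, which is why the calculator experiment above did not produce 0x0000.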

error correcting code over a 4 element alphabet

I need to develop an error correcting code.
My alphabet is {0,1,2,3} (4 elements)
Codeword size n will be 8 or 12
expected error correction capability = 1 digit
expected error detection capability = 2 digit
I reviewed many ECC techniques (RS, LDPC, etc.), yet I still don't know where to start or how to do it.
Can anybody please help me construct it?
Thanks
Have you considered a checksum?
There are tons of ways to implement this, but a common approach would be to use a Reed-Solomon code.
Since you need to detect all two-symbol errors and correct all one-symbol errors, that means you will need two check symbols.
You say you have 2-bit (4-element) symbols, which limits your Reed-Solomon code length to 3 symbols (n = q - 1 = 3 over GF(4)).
Add that up and you have 1 data symbol and 2 check symbols in each 6-bit codeword.
Not very efficient, eh? At that rate, you might as well just repeat each symbol three times, which gives the same codeword size and the same detection and correction power.
To use Reed-Solomon more effectively, you'll need to use large symbols. This is true for most other types of codes as well.
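For comparison, here is a quick sketch (hypothetical helper names, Python chosen for brevity) of the "repeat each symbol three times" alternative over the alphabet {0,1,2,3}. It is a distance-3 code, so majority vote corrects any single-symbol error per block, and a block with three distinct symbols is detectably uncorrectable:

```python
from collections import Counter

def encode(symbols):
    # Repeat each symbol three times: 4 data symbols -> 12 coded symbols.
    return [s for s in symbols for _ in range(3)]

def decode(code):
    out = []
    for i in range(0, len(code), 3):
        block = code[i:i + 3]
        sym, votes = Counter(block).most_common(1)[0]
        if votes == 1:                      # all three differ: detect only
            raise ValueError("uncorrectable block: %r" % (block,))
        out.append(sym)                     # majority vote corrects 1 error
    return out

cw = encode([2, 0, 3, 1])       # 12 coded symbols, matching n = 12
cw[4] = (cw[4] + 1) % 4         # single-symbol error in one block
assert decode(cw) == [2, 0, 3, 1]
```

The rate is the same 1/3 as RS(3,1) over GF(4); the gain from Reed-Solomon only appears once the symbols (and hence the maximum code length) are larger.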
EDIT:
You may want to consider generalized BCH codes which don't have quite as many limitations as Reed-Solomon codes (which are a subset of BCH codes), at the expense of more complex decoding:
http://en.wikipedia.org/wiki/BCH_code

Why do we follow opposite conventions while returning from main()?

I have gone through this and this,
but the question I am asking here is: why is 0 considered a success?
We always associate 0 with false, don't we?
Because there are more fail cases than success cases.
Usually, there is only one reason we succeed (because we're successful :)), but there are a whole lot of reasons why we could fail. So 0 means success, and everything else means failure, and the value could be used to report the reason.
For functions in your code, this is different, because you are the one specifying the interface, and thus can just use a bool if it suffices. For main, there is one fixed interface for returns, and there may be programs that just report succeed/fail, but others that need more fine error reporting. To satisfy them all, we will have multiple error cases.
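To see the convention from the calling side, here is a small sketch (in Python purely for brevity): the parent process only treats 0 as "it worked", and any nonzero status is a failure whose value can carry the reason.

```python
import subprocess
import sys

# Spawn two child processes: one exits 0 (success), one exits 3 (failure).
ok = subprocess.run([sys.executable, "-c", "raise SystemExit(0)"])
fail = subprocess.run([sys.executable, "-c", "raise SystemExit(3)"])

assert ok.returncode == 0     # the single "everything worked" value
assert fail.returncode == 3   # nonzero: failed, and the value says how
```

This is exactly why shells treat exit status 0 as true in `if`/`&&` chains, despite 0 meaning false inside the language itself.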
I have to quibble with Johannes' answer a bit. True, 0 is used for success because there is only one successful outcome while there can be many unsuccessful outcomes. But in my experience, return codes have less to do with reasons for failure than with levels of failure.
Back in the days of batch programming there were usually conventions for return codes that allowed for some automation of the overall stream of execution. So a return code of 4 might be a warning but the next job could continue; an 8 might mean the job stream should stop; a 12 might mean something catastrophic happened and the fire department should be notified.
Similarly, batches would set aside some range of return codes so that the overall batch stream could branch. If an update program returned XX, for instance, the batch might skip a backup step because nothing changed.
Return codes as reasons for failure aren't all that helpful, certainly not nearly as much as log files, core dumps, console alerts, and whatnot. I have never seen a system that returns XX because "such and such file was not found", for instance.
Generally the return values for any given program tend to be a list (enum) of possible values, such as Success or specific errors. As a "list", this list generally tends to start at 0 and count upwards. (As an aside, this is partly why the Microsoft Error Code 0 is ERROR_SUCCESS).
Globally speaking, Success tends to be one of the only return values that almost any program should be capable of returning. Even if a program has several different error values, Success tends to be a shared necessity, and as such is given the most common position in a list of return values.
It's just the simplest way to allow a most common return value by default. It's completely separate from the idea of a boolean.
Here's the convention that I'm used to from various companies (although this obviously varies from place to place):
A negative number indicates that an error occurred; the value of the negative number indicates (hopefully) the type of error.
Zero indicates success (generic success).
A positive number indicates a type of success (in some business cases there are various things that can trigger a successful outcome... this indicates which successful case happened).
I, too, found this confusing when I first started programming. I resolved it in my mind by saying, 0 means no problems.