Error correcting code over a 4-element alphabet

I need to develop an error correcting code.
My alphabet is {0,1,2,3} (4 elements).
The codeword size n will be 8 or 12.
Expected error correction capability: 1 digit.
Expected error detection capability: 2 digits.
I have reviewed many ECC techniques (RS, LDPC, etc.), yet I still don't know where to start or how to do it.
Can anybody please help me construct it?
Thanks

Have you considered a checksum?

There are tons of ways to implement this, but a common approach would be to use a Reed-Solomon code.
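Since you need to correct all single-symbol errors, you will need at least two check symbols (minimum distance 3), which lets you either correct one error or detect two. Correcting one error while also guaranteeing detection of every two-symbol error requires minimum distance 4, i.e. a third check symbol.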
You say you have 2-bit (4-element) symbols, which limits your code length to 3 symbols.
Add that up and you have 1 data symbol and 2 check symbols for each 6-bit (3-symbol) codeword.
Not very efficient, eh? At that efficiency, you might as well just repeat your symbol three times, with the same codeword size and the same detection and correction power.
To use Reed-Solomon more effectively, you'll need larger symbols. This is true for most other types of codes as well.
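To make the comparison concrete, here is a minimal C++17 sketch of that repetition code, assuming symbols are stored as values 0 to 3 in a std::uint8_t; the names encode_rep3 and decode_rep3 are hypothetical. Over GF(4) this is in fact the same code as RS(3,1): the generator (x - α)(x - α²) reduces to x² + x + 1, so every codeword is (m, m, m).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper names; symbols are the values 0..3.
using Codeword = std::array<std::uint8_t, 3>;

// Repeat the 2-bit symbol three times (identical to RS(3,1) over GF(4)).
Codeword encode_rep3(std::uint8_t symbol) {
    assert(symbol < 4);  // only 2-bit symbols are valid
    return {symbol, symbol, symbol};
}

// Majority-vote decoding: corrects any single symbol error.
// If all three symbols differ, at least two errors occurred and the block is
// reported as uncorrectable (std::nullopt). With minimum distance 3, two
// errors that happen to agree with each other are silently miscorrected.
std::optional<std::uint8_t> decode_rep3(const Codeword& r) {
    if (r[0] == r[1] || r[0] == r[2]) return r[0];
    if (r[1] == r[2]) return r[1];
    return std::nullopt;
}
```

The last case shows why minimum distance 3 cannot correct one error and reliably detect two at the same time.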
EDIT:
You may want to consider generalized BCH codes which don't have quite as many limitations as Reed-Solomon codes (which are a subset of BCH codes), at the expense of more complex decoding:
http://en.wikipedia.org/wiki/BCH_code

Related

Why does RS(255,233) have 32 redundant symbols?

How is the Reed-Solomon code RS(255,233) formed?
I understand how RS(255,223) is formed because
n=2^8-1=255
r=32, k=n-r=223
but how about RS(255,233)?
I read somewhere on the internet that RS(255,233) has 32 redundant symbols, but why? Isn't it supposed to have 22 redundant symbols?
Any link that I can refer to would be appreciated. Thank you.
It was a mistake. RS(255,233) has 22 parity symbols; RS(255,223) has 32 parity symbols.
https://www.cs.cmu.edu/~guyb/realworld/reedsolomon/reed_solomon_codes.html
Note that for some RS(n,k) codes n-k is odd, so there are 2t+1 parity symbols; such a code still corrects only t = ⌊(n-k)/2⌋ errors.
Also note that in the Wikipedia article, t denotes the number of parity symbols rather than the number of errors that can be corrected:
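The parameter arithmetic can be sketched in a few lines of C++; rs_params is a hypothetical helper that simply applies parity = n - k and t = ⌊(n-k)/2⌋.

```cpp
#include <cassert>

// Hypothetical helper: derive parameters from the RS(n,k) notation.
struct RsParams {
    int parity;       // n - k redundant (parity) symbols
    int correctable;  // t = floor((n - k) / 2) correctable symbol errors
};

constexpr RsParams rs_params(int n, int k) {
    return RsParams{n - k, (n - k) / 2};
}

// With 8-bit symbols, n = 2^8 - 1 = 255.
static_assert(rs_params(255, 223).parity == 32, "RS(255,223): 32 parity symbols");
static_assert(rs_params(255, 223).correctable == 16, "RS(255,223): t = 16");
static_assert(rs_params(255, 233).parity == 22, "RS(255,233): 22 parity symbols");
static_assert(rs_params(255, 233).correctable == 11, "RS(255,233): t = 11");
// Odd n - k: RS(15,10) has 5 = 2t + 1 parity symbols but still corrects t = 2.
static_assert(rs_params(15, 10).correctable == 2, "RS(15,10): t = 2");
```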
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
The Wikipedia article also covers the original view of RS codes, which is actually a different code with the same name. In that view, for GF(2^8), the maximum value of n is 256 instead of 255. Some erasure-only codes use original-view encoding, called "Vandermonde" encoding; another encoding method is "Cauchy".

Reed solomon error correction and false positives

I have a Reed-Solomon encoder/decoder. After manipulating data and evaluating the results, I have observed the following 3 cases:
1. The decoder decodes the message correctly and does not throw an error.
2. The decoder decodes the message to a wrong result without complaining, effectively producing a false positive. The chance should be very low, but it can happen even if the amount of manipulated data is far below the error correction ability (even after changing a single bit...).
3. The decoder fails (throws an error) if more data is manipulated than its error correction ability allows.
Are all 3 cases valid for a proper Reed-Solomon decoder? I am especially unsure about case 2, where the decoder produces a wrong result (without throwing an error) even though there are far fewer errors than its correction ability allows...
Miscorrection below the error correction ability
This would indicate a bug in the code. An RS decoder should never fail or miscorrect if there are ⌊(n-k)/2⌋ or fewer errors.
Detecting more errors than the error correction ability
Even if there are more than ⌊(n-k)/2⌋ errors, there is a good chance that an RS decoder will still detect an uncorrectable error. A working RS decoder should only ever produce a valid codeword or report an uncorrectable error, and most error patterns do not leave the received word within ⌊(n-k)/2⌋ symbols of any valid codeword. Miscorrection of more than ⌊(n-k)/2⌋ errors means the decoder creates an additional ⌊(n-k)/2⌋ or fewer error symbols, resulting in a valid codeword, but one that differs from the original in n-k+1 or more symbols.
Detecting an uncorrectable error can be done by regenerating syndromes for the corrected codeword, but it's usually caught sooner when solving the error locator polynomial (normally done by looping through all possible locator values), when it produces fewer locators than it should due to duplicate or missing roots.
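As a rough illustration of the syndrome check, here is a C++ sketch of a systematic RS encoder plus syndrome computation (not a full decoder). It assumes GF(2^8) built from the primitive polynomial 0x11D, α = 2, and generator roots α^0 through α^(nsym-1); real codecs differ in these conventions, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed field: GF(2^8) with x^8 + x^4 + x^3 + x^2 + 1 (0x11D), alpha = 2.
struct GF256 {
    std::array<std::uint8_t, 512> exp_table{};  // alpha^i, doubled to skip a modulo
    std::array<int, 256> log_table{};           // log_alpha(x) for x != 0

    GF256() {
        int x = 1;
        for (int i = 0; i < 255; ++i) {
            exp_table[i] = static_cast<std::uint8_t>(x);
            log_table[x] = i;
            x <<= 1;
            if (x & 0x100) x ^= 0x11D;  // reduce modulo the field polynomial
        }
        for (int i = 255; i < 512; ++i) exp_table[i] = exp_table[i - 255];
    }

    std::uint8_t mul(std::uint8_t a, std::uint8_t b) const {
        if (a == 0 || b == 0) return 0;
        return exp_table[log_table[a] + log_table[b]];
    }
};

// Polynomials are stored highest-degree coefficient first.
// g(x) = (x - alpha^0)(x - alpha^1)...(x - alpha^(nsym-1)); minus is XOR here.
std::vector<std::uint8_t> rs_generator(const GF256& gf, int nsym) {
    std::vector<std::uint8_t> g{1};
    for (int i = 0; i < nsym; ++i) {
        std::vector<std::uint8_t> next(g.size() + 1, 0);
        for (std::size_t j = 0; j < g.size(); ++j) {
            next[j] ^= g[j];                               // g(x) * x
            next[j + 1] ^= gf.mul(g[j], gf.exp_table[i]);  // g(x) * alpha^i
        }
        g = next;
    }
    return g;
}

// Systematic encoding: message followed by (message * x^nsym) mod g(x).
std::vector<std::uint8_t> rs_encode(const GF256& gf, const std::vector<std::uint8_t>& msg, int nsym) {
    const std::vector<std::uint8_t> g = rs_generator(gf, nsym);
    std::vector<std::uint8_t> buf(msg);
    buf.resize(msg.size() + static_cast<std::size_t>(nsym), 0);
    for (std::size_t i = 0; i < msg.size(); ++i) {  // synthetic division by monic g
        const std::uint8_t coef = buf[i];
        if (coef == 0) continue;
        for (std::size_t j = 1; j < g.size(); ++j) buf[i + j] ^= gf.mul(g[j], coef);
    }
    std::copy(msg.begin(), msg.end(), buf.begin());  // restore message; tail holds remainder
    return buf;
}

// Syndromes S_i = r(alpha^i), evaluated with Horner's rule.
std::vector<std::uint8_t> rs_syndromes(const GF256& gf, const std::vector<std::uint8_t>& r, int nsym) {
    std::vector<std::uint8_t> s(static_cast<std::size_t>(nsym), 0);
    for (int i = 0; i < nsym; ++i) {
        std::uint8_t acc = 0;
        for (std::uint8_t c : r) acc = static_cast<std::uint8_t>(gf.mul(acc, gf.exp_table[i]) ^ c);
        s[static_cast<std::size_t>(i)] = acc;
    }
    return s;
}

// A decoder could run this on its output: any non-zero syndrome means the
// "corrected" word is not a codeword, so the error must be reported as uncorrectable.
bool rs_is_codeword(const GF256& gf, const std::vector<std::uint8_t>& r, int nsym) {
    const std::vector<std::uint8_t> s = rs_syndromes(gf, r, nsym);
    return std::all_of(s.begin(), s.end(), [](std::uint8_t v) { return v == 0; });
}
```

A freshly encoded word has all-zero syndromes; flipping any single symbol makes at least S_0 non-zero.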
I wrote some interactive RS demo programs in C, for both 4-bit and 8-bit fields, that include the 3 most common decoders: PGZ (matrix), BM (discrepancy) and SY (extended Euclid). Note that the SY (extended Euclid) decoders in my examples emulate a hardware register-oriented solution: there are two registers that always shift left, and each register holds two polynomials, with the split point shifting left along with the register. The right half of each register is reversed (least significant coefficient first). The Wikipedia article's example may be easier to follow.
http://rcgldr.net/misc/eccdemo4.zip
http://rcgldr.net/misc/eccdemo8.zip

Using Reed Solomon decoding, do we need to know which shards are correct?

I am using Reed-Solomon error correction in a Java project. The library I use is JavaReedSolomon (https://github.com/Backblaze/JavaReedSolomon). There is an example of decoding using JavaReedSolomon:
byte[][] shards = new byte[NUM_SHARDS][SHARD_SIZE];
// shards is the array containing all the shards
boolean[] shardPresent = new boolean[NUM_SHARDS];
// shardPresent[i] is true when shard i is known to be intact
ReedSolomon reedSolomon = ReedSolomon.create(NUM_DATA_SHARDS, NUM_PARITY_SHARDS);
reedSolomon.decodeMissing(shards, shardPresent, 0, SHARD_SIZE);
The array shardPresent indicates which shards are known to be correct. For example, if you are sure that the 4th shard is correct, then shardPresent[3] is true.
My question is: does Reed-Solomon decoding necessarily need to know which shards are correct, or is that just how this library implements it?
The answer is no: the decoding procedure can recover from both errors at unknown locations and errors at known locations (erasures). A Reed-Solomon code (in fact, any MDS code) can correct twice as many erasures as errors. There are multiple ways to determine the error locator.
It is likely the API in the library corresponds to its use case, i.e. there is probably some side-channel information about which parts of the data are correct.
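The "twice as many erasures as errors" rule is the standard condition 2e + f ≤ n - k, where e is the number of errors at unknown positions and f the number of erasures. A minimal C++ sketch, with rs_can_decode as a hypothetical name:

```cpp
#include <cassert>

// Each unknown error costs two parity symbols (its location and its value
// must both be found); each erasure, whose location is already known
// (e.g. shardPresent[i] == false), costs only one.
constexpr bool rs_can_decode(int n, int k, int errors, int erasures) {
    return 2 * errors + erasures <= n - k;
}
```

For example, with 4 parity symbols the decoder can handle 4 erasures, or 2 errors, or 1 error plus 2 erasures.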

Error correcting codes

I need to use an error correcting technique on short messages (between 100 and 200 bits). Space available to add the redundant bits is constrained to 20-50%.
I will have to implement the coding and decoding in C/C++, so it needs to be either open source or sufficiently easy to program. (I have had some experience in the past with decoding algorithms; they are dreadful!)
Can anyone advise a suitable error correcting code to use (with relevant parameters)?
Take a look at Reed Solomon error correction.
Sample implementation in C++ is available here.
For a different option look here - see item #11
EDIT: If you want a commercial library - http://www.schifra.com/faq.html
Reed-Solomon encoders are described in the form RS(CAPACITY,PAYLOAD). The capacity is always 2^SYMBOL-1, where SYMBOL is the number of bits in each Reed-Solomon symbol. Quite often, this SYMBOL size is 8 bits (a normal byte). It can typically be anything from 3 to 16 bits. For an 8-bit symbol, the Reed-Solomon encoder will be named RS(255,PAYLOAD).
The PAYLOAD is the number of non-parity symbols. If you want 4 parity symbols, you would specify RS(255,251).
To effectively correct errors in your data block, you must first package the data as symbols (groups of bits, quite often just 8-bit bytes). Your goal is to try to arrange (if possible) for any errors to be clustered into the smallest number of symbols possible.
For example, if an error occurs on average every 8 bits, then an 8-bit symbol will not be appropriate; pretty much every symbol will have an error! You might go for 4-bit symbols and use an RS(15,11) codec, for up to 11 4-bit symbols at a time, producing 4 parity symbols per block. The smaller the symbol size, the lower the CAPACITY (e.g. for a SYMBOL size of 4 bits, 2^4-1 == 15 symbol CAPACITY).
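Packaging bytes as 4-bit symbols is plain bit manipulation. A C++ sketch, assuming high-nibble-first order; the helper names are hypothetical and the RS(15,11) codec itself is not shown:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split each byte into two 4-bit symbols, high nibble first.
std::vector<std::uint8_t> bytes_to_nibbles(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint8_t> symbols;
    symbols.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        symbols.push_back(static_cast<std::uint8_t>(b >> 4));    // high 4 bits
        symbols.push_back(static_cast<std::uint8_t>(b & 0x0F));  // low 4 bits
    }
    return symbols;
}

// Join pairs of 4-bit symbols back into bytes after decoding.
std::vector<std::uint8_t> nibbles_to_bytes(const std::vector<std::uint8_t>& symbols) {
    assert(symbols.size() % 2 == 0);
    std::vector<std::uint8_t> bytes;
    bytes.reserve(symbols.size() / 2);
    for (std::size_t i = 0; i + 1 < symbols.size(); i += 2)
        bytes.push_back(static_cast<std::uint8_t>((symbols[i] << 4) | (symbols[i + 1] & 0x0F)));
    return bytes;
}
```

The resulting symbol stream would then be cut into blocks of at most 11 symbols for an RS(15,11) codec.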
But typically, you would use 8-bit symbols. If you have a more realistic error rate of, say, 10% of your 8-bit symbols being erroneous, then you might use an RS(255,205): 50 parity symbols per 255-symbol Reed-Solomon "codeword", with a maximum PAYLOAD of 205 bytes. That is about 24% parity relative to the payload, and it can correct up to 25 erroneous symbols per codeword, i.e. about 12% of the payload or about 10% of the whole 255-symbol codeword.
Using https://github.com/pjkundert/ezpwd-reed-solomon's c++/ezpwd/rs Reed-Solomon API, you would specify this as:
#include <ezpwd/rs>
...
ezpwd::RS<255,205> rscodec;
Put your data in a std::string (it can handle raw 8-bit binary data just fine) or a std::vector and call the API, adding the 50 symbols of parity:
std::string data;
// ... fill data with a fixed size block, up to 205 bytes
rscodec.encode( data );
Send your data, and later on, after you receive the data+parity, recover the original data (and discard the 50 parity symbols):
int corrected = rscodec.decode( data );
If the data could be recovered, the number of symbols corrected will be returned, or -1 if the Reed-Solomon codeword contained too many errors.
Enjoy!

Where is this encryption/decryption algorithm going wrong?

I've been working on a basic string encryption/decryption algorithm in C++ (the source is here: http://pastebin.com/MLnn8D82)
The problem I'm having is that it doesn't decrypt properly. The encryption equation is:
strInput[nPos]=(((strInput[nPos])+(nPos+1))*2);
And the decryption equation is:
strPassword[nPos]=(((strPassword[nPos])-(nPos+1))/2);
When I try it with just addition/subtraction operators, it works perfectly. But when I multiply in encryption and divide in decryption, I get a seemingly random string outputted.
At first I thought it may be because the password is written to and retrieved from a file before being decrypted, but I tried outputting it directly from the main function and I ended up with the same results.
Is there a problem with dividing/multiplying strings? It worked before with C-style (char array) strings, but I guess this could be different.
Any help is appreciated!
Edit: Thanks for the answers so far. I know that this isn't secure and that I shouldn't use it; I'm only doing it for practice.
Also, it's not a memory problem. I've tried dividing in the encryption stage rather than multiplying, but I still get a random string rather than the original string.
It's quite likely your multiplication is overflowing for some characters, meaning your division will never be able to recover the original.
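The overflow is easy to check by doing the question's arithmetic in int. A C++ sketch; encrypt_value and first_overflow are hypothetical helpers, and plain char is signed (range -128..127) on many platforms, which makes things worse still:

```cpp
#include <cassert>
#include <string>

// The question's encryption step, computed in int so nothing wraps.
int encrypt_value(unsigned char c, int nPos) {
    return (c + (nPos + 1)) * 2;
}

// Returns the first position whose encrypted value no longer fits in a
// byte (0..255), or -1 if every value fits.
int first_overflow(const std::string& s) {
    for (int nPos = 0; nPos < static_cast<int>(s.size()); ++nPos) {
        if (encrypt_value(static_cast<unsigned char>(s[nPos]), nPos) > 255) return nPos;
    }
    return -1;
}
```

Even 'A' at position 0 encrypts to 132, which already exceeds a signed char, and a 'z' overflows a full byte from position 5 onward.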
On a side note, why are you writing the encryption algorithm yourself? If you're going to be using it for anything real, rather than just learning, you would be much better off using a library written by cryptography experts that is known to be secure. Something like Keyczar would be a good idea because it's designed to be difficult to get wrong (which is very easy to do, in very subtle ways, when it comes to cryptography). Note that Keyczar has since been deprecated; its developers recommend Google's Tink instead.
There are multiple things wrong with this algorithm:
This is just a basic change to a standard Vigenère Cipher, which is well known to be very insecure. Do not use it for anything more than writing letters to a girlfriend, which other students should not be able to read. Even a somewhat decent math teacher will be able to decipher it easily.
Do not ever try to invent a cryptographic algorithm unless you have a doctorate in number theory or cryptography. Even with a degree in one of these fields, writing a reasonably secure cryptographic algorithm is a very hard task. And even if you find an algorithm, do not try to implement it yourself; rather, try to find an implementation that is already available. There is a lot you can get wrong, as can be seen from the various security flaws that were caused by badly implemented cryptographic algorithms.
You do not have any support for a passphrase in your algorithm. This means anybody who knows the algorithm can easily decipher your encrypted data. Usually a cryptographic algorithm takes a secret key or passphrase as input, which is then needed to decipher the data. This way the algorithm can be made public and only the passphrase must be kept secret. If security depends on keeping the algorithm secret, this is considered a fatal flaw by the cryptographic community.
Your multiplication might overflow if it yields a result bigger than what can be stored in a char. In that case, division will not be able to retrieve the original data. Others have pointed this out as well.
The order of operations is wrong. In your encryption step you first add and then multiply. Look at the resulting equation: solving it for the input means you also have to reverse the order, i.e. first divide and then subtract. However, your code first subtracts and then divides.
These are all the things I can tell you for now. This is not meant to discourage you from trying this kind of thing; I wrote a fair number of similar algorithms when I was much younger. You just need to be aware that they will not be very secure.
There are two issues here.
One appears to stem from the use of strings and the input/output streams. If you set a breakpoint and step through this you'll realize that in the fRetrieve function the values of strPassword[nPos] can be negative. You are essentially reading in binary data into a string and trying to act on it.
What you should be doing is processing your strings into a binary data buffer, such as a char array, that solely stores bytes. Then your decryption gets purely binary data back, which you can convert into a string. This will ensure the integrity of your data when writing to and reading from the file. Playing with strings and high ASCII values is asking for the data to be interpreted incorrectly.
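One way to keep the data as raw bytes on disk is to write and read it in binary mode. A C++ sketch, assuming the ciphertext is held in a std::vector<unsigned char>; write_bytes and read_bytes are hypothetical names, not part of the original program:

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write raw bytes with no text-mode translation.
bool write_bytes(const std::string& path, const std::vector<unsigned char>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Read the whole file back as raw bytes (empty if the file cannot be read).
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Byte values such as 132, 255, or a carriage return then survive the round trip unchanged, regardless of whether plain char is signed.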
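Second, your decryption algorithm is not properly reversed, so even if the data survived the file round trip, the result would be wrong. This is an order-of-operations issue: subtracting first gives (2(x + nPos + 1) - (nPos + 1)) / 2, which with integer division is x + (nPos + 1) / 2, so the error grows with the position.
Example: assume an 'A' (65) and nPos = 0. Encrypt:
(65 + (0+1)) * 2 = 132
Then reverse it in the same order:
(132 - (0+1)) / 2 = 131 / 2 = 65.5
Integer division truncates this to 65, which happens to be correct at nPos = 0, but for a 'B' (66) at nPos = 1 the same steps give (136 - 2) / 2 = 67. The proper reverse is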
(strPassword[nPos] / 2) - (nPos+1)
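A C++ sketch of the corrected round trip, assuming the encrypted values are kept as int so the doubling cannot overflow; encrypt, decrypt and decrypt_wrong_order are hypothetical names, and this is practice code, not encryption in any security sense:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The question's formula, stored in int to avoid char overflow.
std::vector<int> encrypt(const std::string& plain) {
    std::vector<int> out;
    for (int nPos = 0; nPos < static_cast<int>(plain.size()); ++nPos)
        out.push_back((static_cast<unsigned char>(plain[nPos]) + (nPos + 1)) * 2);
    return out;
}

// Correct inverse: undo the steps in reverse order (divide, then subtract).
std::string decrypt(const std::vector<int>& cipher) {
    std::string out;
    for (int nPos = 0; nPos < static_cast<int>(cipher.size()); ++nPos)
        out.push_back(static_cast<char>(cipher[nPos] / 2 - (nPos + 1)));
    return out;
}

// The question's order (subtract, then divide), kept for comparison.
std::string decrypt_wrong_order(const std::vector<int>& cipher) {
    std::string out;
    for (int nPos = 0; nPos < static_cast<int>(cipher.size()); ++nPos)
        out.push_back(static_cast<char>((cipher[nPos] - (nPos + 1)) / 2));
    return out;
}
```

With these definitions "AB" decrypts back to "AB", while the wrong order turns it into "AC".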