I wonder about CRC error probability. How can I get 2^(-n)?

I wonder about CRC error probability.
In most papers, the CRC undetected-error rate is described as 1 - 2^(-n).
For example, the probability for CRC-16 is 1 - 2^(-16),
so 2^(-16) = 1/65536 ≈ 0.0015%, prob ≈ 99.9985%.
I want to know how I can get this formula: 2^(-n).
If 2^(-n) is the correct rate, is the rate of CRC-16 and CRC-CCITT the same?
And if the message is longer than before, is the rate still the same?

For an n-bit CRC, there are 2^n possible values of that CRC. Therefore the probability that a message with random errors applied, regardless of the length of the message (so long as it's four bytes or more), has the same CRC as the original message is 2^(-n). This is true for any hash function, including any variant of a CRC, that mixes the input bits well into the output.
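As a rough illustration of that 2^(-n) figure (my own Monte Carlo sketch, not from the answer, using the standard library's CRC-16/CCITT): apply heavy random corruption to a message, so the corrupted message is effectively random, and count how often its 16-bit CRC matches the original's.

```python
import binascii
import random

# Heavily corrupt a fixed message by XORing a random mask over the whole
# thing, then count how often the 16-bit CRC of the corrupted message
# matches the CRC of the original. The match fraction should be on the
# order of 2**-16 ~= 0.0000153.
random.seed(1)
message = random.randbytes(64)
original_crc = binascii.crc_hqx(message, 0)  # stdlib CRC-16/CCITT

trials = 200_000
collisions = 0
for _ in range(trials):
    mask = random.randbytes(len(message))
    corrupted = bytes(a ^ b for a, b in zip(message, mask))
    if binascii.crc_hqx(corrupted, 0) == original_crc:
        collisions += 1

print(f"match fraction: {collisions / trials}")
```

With 200,000 trials you should see only a handful of matches, consistent with a rate near 1/65536.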


In which order should I put LEN/CRC/DATA in a message? Should CRC protect the LEN field?

There's a section (2.5) in "Xz format inadequate for long-term archiving":
According to Koopman (p. 50), one of the "Seven Deadly Sins" (i.e.,
bad ideas) of CRC and checksum use is failing to protect a message
length field. This causes vulnerabilities due to framing errors. Note
that the effects of a framing error in a data stream are more serious
than what Figure 1 suggests. Not only data at a random position are
interpreted as the CRC. Whatever data that follow the bogus CRC will
be interpreted as the beginning of the following field, preventing the
successful decoding of any remaining data in the stream.
He talks about this case, when a message is like this:
ID LEN DATA CRC
If LEN is damaged, then a CRC at a random position will be used. But I fail to see why this is a problem: at that random position there will almost surely be an invalid CRC value, so the error is detected.
He also talks about decoding the following data. I fail to see how, even if LEN is protected, one would be able to decode the following data. If LEN is damaged, we cannot find the next message in either case.
For example, PNG doesn't protect the length field.
So, why is it obviously better, when a LEN field is protected by CRC?
If I were to design a message structure, which is the best way to do that? What order should I use, and what should I protect with CRC? Suppose that the message has the following parts:
message type ID (variable length integer)
message length (variable length integer)
CRC
the message data itself
My current design is this:
CRC, protects the whole message
message type ID (variable length integer)
message length (variable length integer)
the message data itself
Is there any drawback of this method?
What Koopman actually says (here) is:
Failing to protect message length field
Results in pointing to data as FCS, giving HD=1
HD is the Hamming distance. It means that the probability of a false positive can go up significantly on a low bit-error-rate stream if part of the data is taken as the (faux) check value instead of the actual check value. To really do it right, you should protect the length field and other header values with their own check value placed before the data.
As for your design, putting the CRC first has the disadvantage of having to buffer the whole message to compute the CRC before you can write the message to a stream. You could do: type ID, length, header CRC, message, message CRC.
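A minimal sketch of that layout (my own illustration: field widths and function names are hypothetical, and I use fixed-width fields instead of variable-length integers for simplicity):

```python
import struct
import binascii

def encode_frame(msg_type: int, payload: bytes) -> bytes:
    # [type:1][length:2][header CRC:2][payload][payload CRC:2]
    header = struct.pack(">BH", msg_type, len(payload))
    header_crc = struct.pack(">H", binascii.crc_hqx(header, 0))
    payload_crc = struct.pack(">H", binascii.crc_hqx(payload, 0))
    return header + header_crc + payload + payload_crc

def decode_frame(frame: bytes):
    header, header_crc = frame[:3], frame[3:5]
    # Verify the header CRC before trusting the length field: a damaged
    # length can no longer point us at random data as the check value.
    if binascii.crc_hqx(header, 0) != struct.unpack(">H", header_crc)[0]:
        raise ValueError("header corrupted: cannot trust the length field")
    msg_type, length = struct.unpack(">BH", header)
    payload = frame[5:5 + length]
    payload_crc = struct.unpack(">H", frame[5 + length:7 + length])[0]
    if binascii.crc_hqx(payload, 0) != payload_crc:
        raise ValueError("payload corrupted")
    return msg_type, payload
```

For example, `decode_frame(encode_frame(7, b"hello"))` returns the type and payload back, while a bit flip in the length byte fails the header CRC check instead of misframing the rest of the stream.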

High error rate in multi-layer feedforward neurolab network

Why is my network showing me a high error?
I need to follow these rules:
multi-layer feedforward (2 inputs, 1 output);
the first input has 262144 values (from 0 to 256) and
the second 262144 (from 0 to 1024).
I'm using only one hidden layer. My error is something like this:
Epoch: 1; Error: 2816810148.1;
Epoch: 2; Error: 2814260288.59;
Epoch: 3; Error: 2813602739.7;
Epoch: 4; Error: 2813385229.99;
Epoch: 5; Error: 2813308095.39;
You should normalize your input for the first and second columns, then de-normalize your output. Also consider scaling your second column to be closer in value to the first column. This will give you a better error surface.
It would appear that your network is working exactly as intended. On every epoch, the error has come down. The network initialises with a random "guess" and then moves from there - it has no foresight into the answer it must generate. With 250K+ inputs of values that go into the 1000s, it's not difficult to imagine that your error would be so high (you haven't stated what your error metric is). It's a lot of data, it's going to take a lot of epochs (1000s minimum) to get something useful. Between epoch 1 and 2 your error, whatever unit that may be, has come down by 2.5 million points. I'd say that was some improvement. Have you left it running for a few hours to see whether it might be able to solve this problem within a reasonable amount of time?
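A pure-Python sketch of the normalization advice above (function names and sample columns are my own, not from the question):

```python
def normalize(column):
    """Min-max scale a column of numbers into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def denormalize(scaled, lo, hi):
    """Map values in [0, 1] back to the original range [lo, hi]."""
    return [v * (hi - lo) + lo for v in scaled]

col1 = [0, 64, 128, 256]     # first input, range 0..256
col2 = [0, 256, 512, 1024]   # second input, range 0..1024
x1, x2 = normalize(col1), normalize(col2)
# Both columns now lie in [0, 1], so neither dominates the error surface,
# and the network's output can be mapped back with denormalize().
```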

rrdtool info ds unknown_sec meaning

The answer to this question is probably obvious, but I couldn't figure out what it means nor find documentation for it. When I run rrdtool info, I get ds[source].unknown_sec = 0, and I am not sure what exactly it means. Any help or pointers would be appreciated!
The unknown_sec is the number of seconds for which the value of the DS is Unknown. This could be because the supplied value was Unknown, or was outside the specified range, or because the time since the last sample exceeds the heartbeat time (which marks everything since then as Unknown).
The amount of Unknown time in an interval is then used in combination with the xff fraction in the RRA definitions to determine if a consolidated data point stored in the RRA is Unknown.
Actually, I think I figured out what it means. If the gap between two data samples exceeds the heartbeat value, then the entire interval between the two data samples is marked as unknown. So unknown_sec = 0 means no two consecutive data samples have been farther apart than the heartbeat.
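A sketch of that accumulation rule (my own illustration of the behaviour described above, not RRDTool's actual code):

```python
def unknown_seconds(timestamps, heartbeat):
    """Count seconds marked Unknown: any gap between consecutive updates
    that exceeds the heartbeat marks the whole gap as Unknown."""
    unknown = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > heartbeat:
            unknown += gap  # entire interval since the last sample
    return unknown

# Updates every 300 s with a 600 s heartbeat: no Unknown time.
# One 900 s gap with a 600 s heartbeat: 900 Unknown seconds.
```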

RRDTool Counter increment lower than time

I created a standard RRDTool database with a default step of 5 min (300 s).
I have different types of values in it. Some are gauges, which are easily processed, but there are other values I would store as COUNTER, and here is my problem:
I read the data in a program, and taking the difference between values over two steps works, but the counter increments by less than the elapsed time (it can increment by less than 300 during a step), so my output value is wrong.
Is it possible to change the COUNTER so that it is not per second but per step, or something like that? If not, I suppose I have to calculate the difference in my program?
Thank you for helping.
RRDTool is capable of handling fractional values, so there is no problem if the counter increments by less than the seconds interval since the last update.
RRDTool stores everything as a rate. If your DS is of type GAUGE, then RRDTool assumes that the incoming value is already a rate, and only applies Data Normalisation (more on this later). If the type is COUNTER or DERIVE, then the value/timepoint you are updating with is compared to the previous value/timepoint to obtain a rate thus: r = (x2 - x1)/(t2 - t1). The rate obtained is then normalised. The other DS type is ABSOLUTE, which assumes the counter was reset on the last read, giving r = x2/(t2 - t1).
The Normalisation step adjusts the data point based on assuming a linear progression from the last data point so that it lies exactly on an interval boundary. For example, if your step is 5min, and you update at 12:06, the data point is adjusted back to what it would have been at 12:05, and stored against 12:05. However the last unadjusted DP is still preserved for use at the next update, so that overall rates are correct.
So, if you have a 300s (5min) interval, and the value increased by 150, the rate stored will be 0.5.
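The arithmetic above can be sketched as follows (my own illustration, not RRDTool's source code):

```python
def counter_rate(x1, t1, x2, t2):
    """Rate RRDTool derives from a COUNTER/DERIVE update:
    r = (x2 - x1) / (t2 - t1)."""
    return (x2 - x1) / (t2 - t1)

def value_at_boundary(x1, t1, x2, t2, boundary):
    """Normalisation sketch: linearly interpolate the value back to the
    step boundary (e.g. an update at 12:06 adjusted to 12:05)."""
    return x1 + (x2 - x1) * (boundary - t1) / (t2 - t1)

# A counter that increased by 150 over a 300 s step stores a rate of 0.5.
```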
If the value you are graphing is something small, e.g. 'number of pages printed', this might seem counterintuitive, but it works well for large rates such as network traffic counters (which is what RRDTool was designed for).
If you really do not want to display fractional values in the generated graphs or output, then you can use a format string such as %.0f to enforce no decimal places and the displayed number will be rounded to the nearest integer.

Hamming codes formulas

This is the question:
Determine if the Hamming codes (15,10), (14,10) and (13,10) can correct a single error (SEC), detect a single error (SED) or detect double bit errors (DED).
I do know how Hamming distance work and how you can detect an error if you have the data-word that you want to transmit. But I don't know how to do it without the data-word.
Only for SEC, which has the formula:
2^m ≥ m + k + 1
where
m = check bits
k = data bits
But are there any formulas for SED and DED? I have searched Google all day long without any success.
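The SEC bound above can be checked mechanically for the three codes in the question (a sketch, using (n, k) with m = n - k check bits):

```python
def sec_possible(n, k):
    """Check the single-error-correction bound 2**m >= m + k + 1
    for an (n, k) code with m = n - k check bits."""
    m = n - k
    return 2 ** m >= m + k + 1

for n, k in [(15, 10), (14, 10), (13, 10)]:
    print((n, k), sec_possible(n, k))
# (15,10): m=5, 32 >= 16 -> True
# (14,10): m=4, 16 >= 15 -> True
# (13,10): m=3,  8 >= 14 -> False
```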
I learned how to check the Hamming codes through this YouTube clip; there is no shortcut, you just work through the method. I wish there were a faster way to solve the problem.
https://www.youtube.com/watch?v=JAMLuxdHH8o
This is not a proper answer to the question, but I know what the user "user3314356" needs, and I could not comment because I did not have 50 reputation points.