Checksum calculation - two’s complement sum of all bytes - c++

I have instructions on creating a checksum of a message described like this:
The checksum consists of a single byte equal to the two’s complement sum of all bytes starting from the “message type” word up to the end of the message block (excluding the transmitted checksum). Carry from the most significant bit is ignored.
Another description I found was:
The checksum value contains the twos complement of the modulo 256 sum of the other words in the data message (i.e., message type, message length, and data words). The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word. A result of zero generally indicates that the message was correctly received.
I understand this to mean that I sum the values of all bytes in the message (excluding the checksum), take that sum modulo 256, take the two's complement of the result, and that is my checksum.
But I am having trouble with an example message (from the design doc, so I must assume it has been encoded correctly).
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
So the last byte, 0xE, is the checksum. My code to calculate the checksum is as follows:
bool isMsgValid(unsigned char arr[], int len) {
    int sum = 0;
    for(int i = 0; i < (len-1); ++i) {
        sum += arr[i];
    }
    //modulo 256 sum
    sum %= 256;
    char ch = sum;
    //twos complement
    unsigned char twoscompl = ~ch + 1;
    return arr[len-1] == twoscompl;
}
int main(int argc, char* argv[])
{
    unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
    int arrsize = sizeof(arr) / sizeof(arr[0]);
    bool ret = isMsgValid(arr, arrsize);
    return 0;
}
The spec is here: http://www.sinet.bt.com/227v3p5.pdf
I assume I have misunderstood the algorithm required. Any idea how to create this checksum?
Flippin' spec writer made a mistake in their data example. I just spotted this, then came back on here and found others had spotted it too. Sorry if I wasted your time. I will study the responses, because it looks like there are some useful comments for improving my code.

You miscopied the example message from the pdf you linked. The second parameter length is 9 bytes, but you used 0x08 in your code.
The document incorrectly states "8 bytes" in the third column when there are really 9 bytes in the parameter. The second column correctly states "00001001".
In other words, your test message should be:
{0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30, // param1
 0x2,0x9,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe}  // param2
     ^^^
With the correct message array, ret == true when I try your program.

Agree with the comment: looks like the checksum is wrong. Where in the .PDF is this data?
Some general tips:
Use an unsigned type as the accumulator; that gives you well-defined behavior on overflow, and you'll need that for longer messages. Similarly, if you store the result in a char variable, make it unsigned char.
But you don't need to store it; just do the math with an unsigned type, complement the result, add 1, and mask off the high bits so that you get an 8-bit result.
Also, there's a trick here, if you're on hardware that uses twos-complement arithmetic: just add all of the values, including the checksum, then mask off the high bits; the result will be 0 if the input was correct.
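A minimal sketch of that trick, reworking the question's isMsgValid (assuming, as in the example, that the checksum byte is the last element):

bool isMsgValid(const unsigned char arr[], int len) {
    unsigned int sum = 0;            // unsigned accumulator: overflow is well-defined
    for (int i = 0; i < len; ++i)    // note: the checksum byte is included this time
        sum += arr[i];
    return (sum & 0xFF) == 0;        // mask off the high bits; 0 means the message checks out
}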

The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word.
It's far easier to use this condition to understand the checksum:
{byte 0} + {byte 1} + ... + {last byte} + {checksum} = 0 mod 256
{checksum} = -( {byte 0} + {byte 1} + ... + {last byte} ) mod 256
As the others have said, you really should use unsigned types when working with individual bits. This is also true when doing modular arithmetic. If you use signed types, you leave yourself open to a rather large number of sign-related mistakes. OTOH, pretty much the only mistake you open yourself up to with unsigned numbers is forgetting that something like 2u - 3u is a (large) positive number.
(Do be careful about mixing signed and unsigned numbers: there are a lot of subtleties involved in that too.)
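Putting that together, a short sketch with unsigned types throughout (computeChecksum is a hypothetical helper, not code from the question):

unsigned char computeChecksum(const unsigned char arr[], int len) {
    unsigned int sum = 0;
    for (int i = 0; i < len; ++i)    // every byte except the checksum itself
        sum += arr[i];
    return (0u - sum) & 0xFFu;       // -(sum) mod 256, with no sign-related pitfalls
}

Validation is then just computeChecksum(arr, len - 1) == arr[len - 1].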

Related

How to convert an arbitrary length unsigned int array to a base 10 string representation?

I am currently working on an arbitrary size integer library for learning purposes.
Each number is represented as uint32_t *number_segments.
I have functional arithmetic operations, and the ability to print the raw bits of my number.
However, I have struggled to find any information on how I could convert my arbitrarily long array of uint32 into the correct, and also arbitrarily long, base-10 representation as a string.
Essentially I need a function along the lines of:
std::string uint32_array_to_string(uint32_t *n, size_t n_length);
Any pointers in the right direction would be greatly appreciated, thank you.
You do it the same way as you do with a single uint64_t, except on a larger scale (bringing this into modern C++ is left for the reader):
char * to_str(uint64_t x) {
    static char buf[23] = {0}; // leave space for a minus sign added by the caller
    char *p = &buf[22];
    do {
        *--p = '0' + (x % 10);
        x /= 10;
    } while(x > 0);
    return p;
}
The function fills a buffer from the end with the lowest digits, dividing the number by 10 in each step, and then returns a pointer to the first digit.
Now with bignums you can't use a static buffer but have to adjust the buffer size to the size of your number. You probably want to return a std::string; creating the number in reverse and then copying it into a result string is the way to go. You also have to deal with negative numbers.
Since long division of a big number is expensive, you probably don't want to divide by 10 in the loop. Rather, divide by 1'000'000'000 and convert the remainder into 9 digits. That is the largest power of 10 you can use for long division by a single integer (bignum / integer, not bignum / bignum). It might be that you can only use 10'000 if you can't use uint64_t in the division.
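A minimal sketch of that scheme, assuming the segments are stored least significant first (the divmod_small and to_decimal names are invented for this example, not part of the asker's library):

#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Divide the bignum in place by a small divisor; return the remainder.
uint32_t divmod_small(std::vector<uint32_t>& n, uint32_t divisor) {
    uint64_t rem = 0;
    for (size_t i = n.size(); i-- > 0;) {                 // most significant segment first
        uint64_t cur = (rem << 32) | n[i];
        n[i] = static_cast<uint32_t>(cur / divisor);
        rem = cur % divisor;
    }
    while (!n.empty() && n.back() == 0) n.pop_back();     // trim leading zero segments
    return static_cast<uint32_t>(rem);
}

std::string to_decimal(std::vector<uint32_t> n) {         // takes a copy: we destroy it
    if (n.empty()) return "0";
    std::string out;                                      // digits, least significant first
    while (!n.empty()) {
        uint32_t chunk = divmod_small(n, 1'000'000'000u); // peel off 9 decimal digits
        int mindigits = n.empty() ? 1 : 9;                // inner chunks keep leading zeros
        for (int d = 0; d < 9 && (chunk || d < mindigits); ++d) {
            out.push_back('0' + chunk % 10);
            chunk /= 10;
        }
    }
    std::reverse(out.begin(), out.end());                 // we built the string in reverse
    return out;
}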

Arduino code: shifting bits seems to change data type from int to long

On my Arduino, the following code produces output I don't understand:
void setup() {
    Serial.begin(9600);
    int a = 250;
    Serial.println(a, BIN);
    a = a << 8;
    Serial.println(a, BIN);
    a = a >> 8;
    Serial.println(a, BIN);
}
void loop(){}
The output is:
11111010
11111111111111111111101000000000
11111111111111111111111111111010
I do understand the first line: leading zeros are not printed to the serial terminal. However, after shifting the bits, the data type of a seems to have changed from int to long (32 bits are printed). The expected behaviour is that bits are shifted to the left, and that bits shifted "out" of the 16 bits an int has are simply dropped. Shifting the bits back does not turn the "32-bit" variable into a "16-bit" one again.
Shifting by 7 or fewer positions does not show this effect.
I probably should say that I am not using the Arduino IDE, but the Makefile from https://github.com/sudar/Arduino-Makefile.
What is going on? I almost expect this to be "normal", but I don't get it. Or is it something in the printing routine which simply adds sixteen "1"s to the output?
Enno
In addition to the other answers: integers might be stored in 16 bits or 32 bits depending on which Arduino you have.
The function printing numbers in Arduino is defined in /arduino-1.0.5/hardware/arduino/cores/arduino/Print.cpp
size_t Print::printNumber(unsigned long n, uint8_t base) {
    char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars plus zero byte.
    char *str = &buf[sizeof(buf) - 1];
    *str = '\0';
    // prevent crash if called with base == 1
    if (base < 2) base = 10;
    do {
        unsigned long m = n;
        n /= base;
        char c = m - base * n;
        *--str = c < 10 ? c + '0' : c + 'A' - 10;
    } while(n);
    return write(str);
}
All other functions rely on this one, so yes your int gets promoted to an unsigned long when you print it, not when you shift it.
However, the library is correct. By shifting left 8 positions, the sign bit of the integer becomes '1', so when the value is promoted to unsigned long the runtime correctly pads it with 16 extra '1's instead of '0's.
If you are using such a value not as a number but to contain some flags, use unsigned int instead of int.
ETA: for completeness, I'll add further explanation for the second shifting operation.
Once you touch the sign bit inside the int, shifting towards the right makes the runtime pad the number with '1's in order to preserve its negative value (an arithmetic shift). Shifting to the right k positions corresponds to dividing the number by 2^k, and since the number is negative to start with, the result must remain negative.
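A small host-side sketch of the same effect, using int16_t to mimic the Arduino's 16-bit int (a reader's illustration, not Arduino code; note that before C++20 the cast back to int16_t and the right shift of a negative value are implementation-defined, though they behave as shown on common two's-complement compilers, including avr-gcc):

#include <cstdint>
#include <cstdio>

int main() {
    std::int16_t a = 250;                    // 00000000 11111010
    a = static_cast<std::int16_t>(a << 8);   // 11111010 00000000: the sign bit is now 1
    std::printf("%d\n", a);                  // -1536
    a = a >> 8;                              // arithmetic shift: pads with 1s on the left
    std::printf("%d\n", a);                  // -6, i.e. 11111111 11111010, not 250
    long wide = a;                           // promotion sign-extends, producing the
    std::printf("%ld\n", wide);              // run of leading 1s seen in Serial.println
    return 0;
}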

Convert a 64bit integer to an array of 7bit-characters

Say I have a function vector<unsigned char> byteVector(long long UID), returning a byte representation of the UID, a 64-bit integer, as a vector. This vector is later used to write this data to a file.
Now, because I decided I want to read that file with Python, I have to comply with the UTF-8 standard, which means I can only use the first 7 bits of each char. If the most significant bit is 1, I can't decode it to a string anymore, since only ASCII characters are supported. Also, I'll have to pass those strings to other processes via a command line interface, which also only supports the ASCII character set.
Before that problem arose, my approach to splitting the 64-bit integer into 8 separate bytes was the following, which worked really well:
vector<unsigned char> outputVector = vector<unsigned char>();
unsigned char * uidBytes = (unsigned char*) &UID_;
for (int i = 0; i < 8; i++){
    outputVector.push_back(uidBytes[i]);
}
Of course that doesn't work anymore, as the constraint "the high bit may not be 1" limits the maximum value of each unsigned char to 127.
My easiest option now would of course be to replace the one push_back call with this:
outputVector.push_back(uidBytes[i] / 128);
outputVector.push_back(uidBytes[i] % 128);
But this seems kind of wasteful, as the first of each unsigned char pair can only be 0 or 1 and I would be wasting some space (6 bytes) I could otherwise use.
As I need to save 64 bits and can use 7 bits per byte, I'll need ceil(64 / 7) = 10 bytes.
It isn't really much (none of the files I write ever even reached the 1kB mark), but I was using 8 bytes before and it seems a bit wasteful to use 16 now when ten (not 9, I'm sorry) would suffice. So:
How do I convert a 64bit integer to a vector of ten 7-bit integers?
This is probably too much optimization, but there could be some very cool solution for this problem (probably using shift operators) and I would be really interested in seeing it.
You can use bit shifts to take 7-bit pieces of the 64-bit integer. However, you need ten 7-bit integers, nine is not enough: 9 * 7 = 63, one bit short.
std::uint64_t uid = 42; // Your 64-bit input here.
std::vector<std::uint8_t> outputVector;

for (int i = 0; i < 10; i++)
{
    outputVector.push_back(uid >> (i * 7) & 0x7f);
}
In every iteration, we shift the input bits by a multiple of 7, and mask out a 7-bit part. The most significant bit of the 8-bit numbers will be zero. Note that the numbers in the vector are “reversed”: the least significant bits have the lowest index. This is irrelevant though, if you decode the parts in the correct way. Decoding can be done as follows:
std::uint64_t decoded = 0;

for (int i = 0; i < 10; i++)
{
    decoded |= static_cast<std::uint64_t>(outputVector[i]) << (i * 7);
}
Please note that it seems like a bad idea to interpret the resulting vector as UTF-8 encoded text: the sequence can still contain control characters and \0. If you want to encode your 64-bit integer in printable characters, take a look at base64. In that case, you will need one more character (eleven in total) to encode 64 bits.
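For illustration, here is a round trip of those two loops wrapped in functions (the encode7/decode7 names are invented for this demo):

#include <cassert>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> encode7(std::uint64_t uid) {
    std::vector<std::uint8_t> out;
    for (int i = 0; i < 10; i++)
        out.push_back(uid >> (i * 7) & 0x7f);  // 7 bits per byte, least significant first
    return out;
}

std::uint64_t decode7(const std::vector<std::uint8_t>& v) {
    std::uint64_t decoded = 0;
    for (int i = 0; i < 10; i++)
        decoded |= static_cast<std::uint64_t>(v[i]) << (i * 7);
    return decoded;
}

int main() {
    std::uint64_t uid = 0x123456789abcdef0ULL;
    assert(decode7(encode7(uid)) == uid);      // round-trips exactly
    return 0;
}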
I suggest using assembly language.
Many assembly languages have instructions for shifting a bit into a "spare" carry bit and shifting the carry bit into a register. The C language has no convenient or efficient method to do this.
The algorithm:
for i = 0; i < 7; ++i
{
    right shift 64-bit word into carry.
    right shift carry into character.
}
You should also look into using std::bitset.

Checksum code in C++

Can someone please explain what this code is doing? I have to interpret this code and use it as a checksum code, but I am not sure it is absolutely correct. In particular: how are the overflows handled, and what do *cp, const char* cp, and sum & 0xFFFF mean? The basic idea was to take an input string from the user and convert it to binary form, 16 bits at a time. Then sum all the 16-bit words together (in binary) to get a 16-bit sum. If there is any overflow bit in the addition, add that to the LSB of the final sum. Then take the one's complement of the result.
How close is this code to doing the above?
unsigned int packet::calculateChecksum()
{
    unsigned int c = 0;
    int i;
    string j;
    int k;

    cout << "enter a message" << message;
    getline(cin, message); // Some string.
    //std::string message =

    std::vector<uint16_t> bitvec;
    const char* cp = message.c_str()+1;
    while (*cp) {
        uint16_t bits = *(cp-1)>>8 + *(cp);
        bitvec.push_back(bits);
        cp += 2;
    }

    uint32_t sum = 0;
    uint16_t overflow = 0;
    uint32_t finalsum = 0;

    // Compute the sum. Let overflows accumulate in upper 16 bits.
    for (auto j = bitvec.begin(); j != bitvec.end(); ++j)
        sum += *j;

    // Now fold the overflows into the lower 16 bits. Loop until no overflows.
    do {
        sum = (sum & 0xFFFF) + (sum >> 16);
    } while (sum > 0xFFFF);

    // Return the 1s complement sum in finalsum
    finalsum = 0xFFFF & sum;
    //cout << "the finalsum is" << c;
    c = finalsum;
    return c;
}
I see several issues in the code:
cp is a pointer into the zero-terminated char array holding the input message. The while (*cp) test has a problem, because inside the loop body cp is incremented by 2! It is therefore fairly easy to step over the terminating \0 of the char array (e.g. when the input message has 2 characters), resulting in a segmentation fault.
*(cp) and *(cp-1) fetch two neighbouring characters (bytes) of the input message. But why is the two-byte word formed by *(cp-1)>>8 + *(cp)? It would make sense to form the 16-bit word as (*(cp-1)<<8) + *(cp), i.e. the preceding character in the high byte and the following character in the low byte. Note the parentheses: + binds tighter than << and >>, so the original expression actually computes *(cp-1) >> (8 + *(cp)), which shifts by a garbage amount instead of building a word.
To answer your question: sum & 0xFFFF just means calculating a number whose upper 16 bits are zero and whose lower 16 bits are the same as those of sum; 0xFFFF is a bit mask.
The funny thing is that even though the above code might not do exactly what you stated as the requirement, as long as the sending and the receiving party use the same piece of incorrect code, your checksum creation and verification will pass, as both ends are consistent with each other :)
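For reference, a minimal sketch of the computation the question actually describes, with the word-forming and buffer-walking bugs fixed (a standalone helper over a std::string, not the original member function):

#include <cstdint>
#include <string>

uint16_t calculateChecksum(const std::string& message) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i < message.size(); i += 2) {
        // Preceding byte in the high half, following byte in the low half.
        uint16_t word = static_cast<unsigned char>(message[i]) << 8;
        if (i + 1 < message.size())              // odd length: low byte stays 0
            word |= static_cast<unsigned char>(message[i + 1]);
        sum += word;                             // carries pile up above bit 15
    }
    while (sum > 0xFFFF)                         // fold carries back into the low 16 bits
        sum = (sum & 0xFFFF) + (sum >> 16);
    return static_cast<uint16_t>(~sum & 0xFFFF); // one's complement of the folded sum
}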

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, the zeroes are pushed in from behind but the leading bits aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010, which is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; your number is most likely 16 bits, meaning 00000000 00010001. Maybe try AND-ing it with a number that has as many 1s as the number of bits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
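A one-line sketch of that suggestion, assuming a 5-bit input width as in the question:

unsigned result = (0b10001u << 1) & ((1u << 5) - 1);  // mask (1u << 5) - 1 == 0b11111, so result == 0b00010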
What you want is to bit-shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bit-shifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits = some_default) {
    int_type mask = 0;
    for (unsigned int bit = 0; bit < numbits; bit++) {
        mask += 1 << bit;
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.