Checksum for binary PLC communication - c++

I've been scratching my head around calculating a checksum to communicate with Unitronics PLCs using binary commands. They offer the source code but it's in a Windows-only C# implementation, which is of little help to me other than basic syntax.
Specification PDF (the checksum calculation is near the end)
C# driver source (checksum calculation in Utils.cs)
Intended Result
Below is the byte index, message description and the sample which does work.
# 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | 24 25 26 27 28 29 | 30 31 32
# sx--------------- id FE 01 00 00 00 cn 00 specific--------- lengt CHKSM | numbr ot FF addr- | CHKSM ex
# 2F 5F 4F 50 4C 43 00 FE 01 00 00 00 4D 00 00 00 00 00 01 00 06 00 F1 FC | 01 00 01 FF 01 00 | FE FE 5C
The specification calls for calculating the accumulated value of the 22-byte message header and, separately, the 6+ byte detail, taking that sum modulo 65536, and then returning the two's complement of that value.
Attempt #1
My understanding is the tilde (~) operator in Python is directly derived from C/C++. After a day of writing the Python that creates the message I came up with this (stripped down version):
#!/usr/bin/env python
def Checksum( s ):
    x = ( int( s, 16 ) ) % 0x10000
    x = ( ~x ) + 1
    return hex( x ).split( 'x' )[1].zfill( 4 )
Details = ''
Footer = ''
Header = ''
Message = ''
Details += '0x010001FF0100'
Header += '0x2F5F4F504C4300FE010000004D000000000001000600'
Header += Checksum( Header )
Footer += Checksum( Details )
Footer += '5C'
Message += Header.split( 'x' )[1].zfill( 4 )
Message += Details.split( 'x' )[1].zfill( 4 )
Message += Footer
print Message
Message: 2F5F4F504C4300FE010000004D000000000001000600600L010001FF010001005C
I see an L in there, which is a different result to yesterday, which wasn't any closer. If you want a quick formula result based on the rest of the message: Checksum(Header) should return F1FC and Checksum(Details) should return FEFE.
The value it returns is nowhere near the same as the specification's example. I believe the issue may be one or two things: the Checksum method isn't calculating the sum of the hex string correctly or the Python ~ operator is not equivalent to the C++ ~ operator.
Attempt #2
A friend has given me his C++ interpretation of what the calculation SHOULD be. I just can't get my head around this code; my C++ knowledge is minimal.
short PlcBinarySpec::CalcHeaderChecksum( std::vector<byte> _header ) {
    short bytesum = 0;
    for ( std::vector<byte>::iterator it = _header.begin(); it != _header.end(); ++it ) {
        bytesum = bytesum + ( *it );
    }
    return ( ~( bytesum % 0x10000 ) ) + 1;
}

I'm not entirely sure what the correct code should be… but if the intention is for Checksum(Header) to return f705, and it's returning 08fb, here's the problem:
x = ( ~( x % 0x10000 ) ) + 1
The short version is that you want this:
x = (( ~( x % 0x10000 ) ) + 1) % 0x10000
The problem here isn't that ~ means something different. As the documentation says, ~x returns "the bits of x inverted", which is effectively the same thing it means in C (at least on 2s-complement platforms, which includes all Windows platforms).
You can run into a problem with the difference between C and Python types here (C integral types are fixed-size, and overflow; Python integral types are effectively infinite-sized, and grow as needed). But I don't think that's your problem here.
The problem is just a matter of how you convert the result to a string.
The result of calling Checksum(Header), up to the formatting, is -2299, or -0x08fb, in both versions.
In C, you can pretty much treat a signed integer as an unsigned integer of the same size (although you may have to ignore warnings to do so, in some cases). What exactly that does depends on your platform, but on a 2s-complement platform, signed short -0x08fb is the bit-for-bit equivalent of unsigned 0xf705. So, for example, if you do sprintf(buf, "%04hx", -0x08fb), it works just fine—and it gives you (on most platforms, including everything Windows) the unsigned equivalent, f705.
But in Python, there are no unsigned integers. The int -0x08fb has nothing to do with 0xf705. If you do "%04hx" % -0x08fb, you'll get -8fb, and there's no way to forcibly "cast it to unsigned" or anything like that.
Your code actually does hex(-0x08fb), which gives you -0x8fb, which you then split on the x, giving you 8fb, which you zfill to 08fb, which makes the problem a bit harder to notice (because that looks like a perfectly valid pair of hex bytes, instead of a minus sign and three hex digits), but it's the same problem.
Anyway, you have to explicitly decide what you mean by "unsigned equivalent", and write the code to do that. Since you're trying to match what C does on a 2s-complement platform, you can write that explicit conversion as % 0x10000. If you do "%04hx" % (-0x08fb % 0x10000), you'll get f705, just as you did in C. And likewise for your existing code.

It's quite simple. I checked your friend's algorithm by adding all the header bytes manually on a calculator, and it yields the correct result (0xfcf1).
Now, I don't actually know python, but it looks to me like you are adding up half-byte values. You have made your header string like this:
Header = '2F5F4F504C4300FE010000004D000000000001000600'
And then you go through converting each character in that string from hex and adding it, which means you are dealing with values from 0 to 15. You need to consider every two characters as one byte and convert that pair (values from 0 to 255). Or you need to use actual binary data instead of a text representation of the binary data.
At the end of the algorithm, you don't really need to use the ~ operator if you don't trust it. Instead you can do (0xffff - (x % 0x10000)) + 1. Bear in mind that prior to adding 1, the value might actually be 0xffff, so you need to modulo the entire result by 0x10000 afterwards. Your friend's C++ version uses the short datatype, so no modulo is necessary at all because the short will naturally overflow.

Related

6-bit CRC datasheet confusion STMicroelectronics L9963E

I’m working on the SPI communication between a microcontroller and the L9963E. The datasheet of the L9963E shows little information about the CRC calculation, but mentions:
a CRC of 6 bits,
a polynomial of x^6 + x^4 + x^3 + 1 = 0b1011001
a seed of 0b111000
The documentation also mentions in the SPI protocol details that the CRC is a value between 0x00 and 0x3F, and is "calculated on the [39-7] field of the frame", see Table 22.
I'm wondering: What is meant by "field [39-7]"? The total frame length is 40bits, out of this the CRC makes up 6 bits. I would expect a CRC calculation over the remaining 34 bits, but field 39-7 would mean either 33 bits (field 7 inclusive) or 32 bits (excluding field 7).
Since I have access to a L9963E evaluation board, which includes pre-loaded firmware, I have hooked up a logic analyser. I have found the following example frames to be sent to the L9963E from the eval-board, I am assuming that these are valid, error-free frames.
0xC21C3ECEFD
0xC270080001
0xE330081064
0xC0F08C1047
0x827880800E
0xC270BFFFF9
0xC2641954BE
Could someone clear up the datasheet for me, and maybe advise me on how to implement this CRC calculation?
All of the special frames have CRC == 0 (all values are hexadecimal; "mod 59" here is carry-less polynomial division by the CRC polynomial 0x59 = 0b1011001):
(0000000016 xor E000000000) mod 59 = 00
(C1FCFFFC6C xor E000000000) mod 59 = 00
(C1FCFFFC87 xor E000000000) mod 59 = 00
(C1FCFFFCDE xor E000000000) mod 59 = 00
(C1FCFFFD08 xor E000000000) mod 59 = 00
and as noted by Mark Adler only one of the messages in the question works:
(C270BFFFF9 xor E000000000) mod 59 = 00
Bit ranges in datasheets like this are always inclusive.
I suspect that this is just a typo, or the person who wrote it temporarily forgot that the bits are numbered from zero.
Looking at the other bit-field boundaries in Table 19 of the document you linked it wouldn't make sense to exclude the bottom bit of the data field from the CRC, so I suspect the datasheet should say bits 39 to 6 inclusive.
There is a tool called pycrc that can generate C code to calculate a CRC with an arbitrary polynomial.
If a 40-bit message is in the low bits of uint64_t x;, then this:
x ^= (uint64_t)0x38 << 34;
for (int j = 0; j < 40; j++)
    x = x & 0x8000000000 ? (x << 1) ^ ((uint64_t)0x19 << 34) : x << 1;
x &= 0xffffffffff;
gives x == 0 for all of the example messages in the document. But only for one of your examples in the question. You may not be extracting the data correctly.

Is there any difference between writing each and every member of a struct to a file and writing the structure object directly to a file in C++?

#include <iostream>
#include <fstream>
using namespace std;

struct example
{
    int num1;
    char abc[10];
} obj;

int main ()
{
    ofstream myfile1, myfile2;
    myfile1.open("example1.txt");
    myfile2.open("example2.txt");
    myfile1 << obj.num1 << obj.abc;            // instruction 1
    myfile2.write((char*)&obj, sizeof(obj));   // instruction 2
    myfile1.close();
    myfile2.close();
    return 0;
}
In this example, will both files end up with identical data or different data? Are instruction 1 and instruction 2 the same?
There's a massive difference.
Approach 1) writes the number using ASCII encoding, so there's an ASCII-encoded byte for each digit in the number. For example, the number 28 is encoded as one byte containing ASCII '2' (value 50 decimal, 32 hex) and another for '8' (56 / 0x38). If you look at the file in a program like less you'll be able to see the 2 and the 8 in there as human-readable text. Then << obj.abc writes the characters in abc up until (but excluding) the first NUL (0-value byte); if there's no NUL you run off the end of the buffer and have undefined behaviour: your program may or may not crash, it may print nothing or garbage, all bets are off. If your file is in text mode, it might translate any newline and/or carriage return characters in abc to some other standard representation of line breaks your operating system uses (e.g. it might automatically place a carriage return after every newline you write, or remove carriage returns that were in abc).
Approach 2) writes the sizeof(obj) bytes in memory: that's a constant number of bytes regardless of their content. The number will be stored in binary, so a program like less won't show you the human-readable number from num1.
Depending on the way your CPU stores numbers in memory, you might have the bytes in the number stored in different orders in the file (something called endianness). There'll then always be 10 characters from abc even if there's a NUL in there somewhere. Writing out binary blocks like this is normally substantially faster than converting numbers to ASCII text and the computer having to worry about if/where there are NULs. Not that you normally have to care, but not all the bytes written necessarily contribute to the logical value of obj: some may be padding.
A more subtle difference is that for approach 1) there are ostensibly multiple object states that could produce the same output. Consider {123, "45"} and {12345, ""} -> either way you'd print "12345". So, you couldn't later open and read from the file and be sure to set num1 and abc to what they used to be. I say "ostensibly" above because you might happen to have some knowledge we don't, such as that the abc field will always start with a letter. Another problem is knowing where abc finishes, as its length can vary. If these issues are relevant to your actual use (e.g. abc could start with a digit), you could for example write << obj.num1 << ' ' << obj.abc << '\n' so the space and newline would tell you where the fields end (assuming abc won't contain newlines: if it could, consider another delimiter character or an escaping/quoting convention). With the space/newline delimiters, you can read the file back by changing the type of abc to std::string to protect against overruns by corrupt or tampered-with files, then using if (inputStream >> obj.num1 && getline(inputStream, obj.abc)) ...process obj.... getline can cope with embedded spaces and will read until a newline.
Example: {258, "hello\0\0\0\0\0"} on a little-endian system where sizeof(int) is 4 and the structure's padded out to 16 bytes would print (offsets and byte values shown in hexadecimal):
bytes in file at offset...
            00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
approach 1) 32 35 38 68 65 6c 6c 6f
            '2' '5' '8' 'h' 'e' 'l' 'l' 'o'
approach 2) 02 01 00 00 68 65 6c 6c 6f 00 00 00 00 00 00 00
            [-32-bit 258-] 'h' 'e' 'l' 'l' 'o''\0''\0''\0''\0''\0' pad pad
Notes: for approach 2, 02 01 00 00 is the value 258 (0x00000102) stored least-significant byte first, which is how a little-endian system lays it out in memory. (Search for "endianness" to learn more about this.)

How to read a binary file of complex numbers in C++

I have a 320 MB binary file (data.dat), containing 32e7 values; a hex dump of it looks like this:
1312cf60 d9 ff e0 ff 05 00 f0 ff 22 00 2f 00 fe ff 33 00 |........"./...3.|
1312cf70 00 00 00 00 f4 ff 1d 00 3d 00 6d 00 53 00 db ff |........=.m.S...|
1312cf80 b7 ff b0 ff 1e 00 0c 00 67 00 d1 ff be ff f8 ff |........g.......|
1312cf90 0b 00 6b 00 38 00 f3 ff cf ff cb ff e4 ff 4b 00 |..k.8.........K.|
....
Original numbers were:
(16,-144)
(-80,-64)
(-80,16)
(16,48)
(96,95)
(111,-32)
(64,-96)
(64,-16)
(31,-48)
(-96,-48)
(-32,79)
(16,48)
(-80,80)
(-48,128)
...
I have a matlab code which can read them as real numbers and convert them to complex numbers:
nsamps = (256*1024);
for i = 1:305
    nstart = 1 + (i - 1) * nsamps ;
    fid = fopen('data.dat');
    fseek(fid,4 * nstart ,'bof');
    y = fread(fid,[2,nsamps],'short');
    fclose(fid);
    x = complex(y(1,:),y(2,:));
end
I am using C++ and trying to get data as a vector<complex<float>>:
std::ifstream in('data.dat', std::ios_base::in | std::ios_base::binary);
fseek(infile1, 4*nstart, SEEK_SET);
vector<complex<float> > sx;
in.read(reinterpret_cast<char*>(&sx), sizeof(int));
and I am very confused about how to get the complex data using C++. Can anyone help me?
Theory
I'll try to explain some points using the issues in your code as examples.
Let's start from the end of the code. You try to read a number, which is stored as a four-byte single-precision floating point number, but you use sizeof(int) as a size argument. While on modern x86 platforms with modern compilers sizeof(int) tends to be equal to sizeof(float), it's not guaranteed. sizeof(int) is compiler dependent, so please use sizeof(float) instead.
In the matlab code you read 2*nsamps numbers, while in C++ code only four bytes (one number) is being read. Something like sizeof(float) * 2 * nsamps would be closer to matlab code.
Next, std::complex is a complicated class, which (in general) may have any implementation-defined internal representation. But luckily, here we read that
For any object z of type complex<T>, reinterpret_cast<T(&)[2]>(z)[0]
is the real part of z and reinterpret_cast<T(&)[2]>(z)[1] is the
imaginary part of z.
For any pointer to an element of an array of complex<T> named p and
any valid array index i, reinterpret_cast<T*>(p)[2*i] is the real part
of the complex number p[i], and reinterpret_cast<T*>(p)[2*i + 1] is
the imaginary part of the complex number p[i].
so we can just cast an std::complex to char type and read binary data there. But std::vector is a class template with its implementation-defined internal representation as well! It means that we can't just reinterpret_cast<char*>(&sx) and write binary data to the pointer, as it points to the beginning of the vector object, which is unlikely to be the beginning of the vector data. Modern C++ way to get the beginning of the data is to call sx.data(). Pre-C++11 way is to take an address of the first element: &sx[0]. Overwriting the object from the beginning will result in segfault almost always.
OK, now we have the beginning of the data buffer which is able to receive binary representation of complex numbers. But when you declared vector<complex<float> > sx;, it got zero size, and as you are not pushing or emplacing its elements, the vector will not "know" that it should resize. Segfault again. So just call resize:
sx.resize(number_of_complex_numbers_to_store);
or use an appropriate constructor:
vector<complex<float> > sx(number_of_complex_numbers_to_store);
before writing data to the vector. Note that these methods operate on the "high-level" concept of the number of stored elements, not the number of bytes to store.
Putting it all together, the last two lines of your code should look like:
vector<complex<float> > sx(nsamps);
in.read(reinterpret_cast<char*>(sx.data()), 2 * nsamps * sizeof(float));
Minimal example
If you continue having troubles, try a simpler sandbox code first.
For example, let's write six floats to a binary file:
std::ofstream ofs("file.dat", std::ios::binary | std::ios::out);
float foo[] = {1,2,3,4,5,6};
ofs.write(reinterpret_cast<char*>(foo), 6*sizeof(float));
ofs.close();
then read them to a vector of complex:
std::ifstream ifs("file.dat", std::ios::binary | std::ios::in);
std::vector<std::complex<float>> v(3);
ifs.read(reinterpret_cast<char*>(v.data()), 6*sizeof(float));
ifs.close();
and, finally, print them:
std::cout << v[0] << " " << v[1] << " " << v[2] << std::endl;
The program prints:
(1,2) (3,4) (5,6)
so this approach works fine.
Binary files
Here is the remark about binary files which I initially posted as a comment.
Binary files haven't got the concept of "lines". The number of "lines" in a binary file completely depends on the size of the window you are viewing it in. Think of binary files as a magnetic tape, where each discrete position of the head is able to read only one byte. Interpretation of those bytes is up to you.
If everything should work fine, but you get weird numbers, check the displacement in fseek call. A mistake by a number of bytes yields random-looking values instead of the floats you wish to get.
Surely, you might just read a vector (or an array) of floats, observing the above considerations, and then convert them to complex numbers in a loop. Also, it's a good way to debug your fseek call to make sure that you start reading from the right place.

Escape sequence in char confusion

I came across this example here
.
#include <iostream>
int main()
{
    int i = 7;
    char* p = reinterpret_cast<char*>(&i);
    if(p[0] == '\x7') //POINT OF INTEREST
        std::cout << "This system is little-endian\n";
    else
        std::cout << "This system is big-endian\n";
}
What I'm confused about is the if statement. How do the escape sequences behave here? I get the same result with p[0] == '\07' (\x being hexadecimal escape sequence). How would checking if p[0] == '\x7' tell me if the system is little or big endian?
The layout of a (32-bit) integer in memory is:
Big endian:
+-----+-----+-----+-----+
| 0 | 0 | 0 | 7 |
+-----+-----+-----+-----+
^ pointer to int points here
Little endian:
+-----+-----+-----+-----+
| 7 | 0 | 0 | 0 |
+-----+-----+-----+-----+
^ pointer to int points here
What the code basically does is read the first char that the integer pointer points to, which in the case of little endian is \x7, and in the case of big endian is \x0.
Hex 7 and octal 7 happen to be the same value, as is decimal 7.
The check is intended to try to determine if the value ends up in the first or last byte of the int.
A little endian system will store the bytes of the value in "reverse" order, with the lower part first
07 00 00 00
A big endian system would store the "big" end first
00 00 00 07
By reading the first byte, the code will see if the 7 ends up there, or not.
7 in decimal is the same as 7 in hexadecimal and 7 in octal, so it doesn't matter if you use '\x7', '\07', or even just 7 (numeric literal, not a character one).
As for the endianness test: the value of i is 7, meaning it will have the number 7 in its least significant byte, and 0 in all other bytes. The cast char* p = reinterpret_cast<char*>(&i); makes p point to the first byte in the representation of i. The test then checks whether that byte's value is 7. If so, it's the least significant byte, implying a little-endian system. If the value is not 7, it's not a little-endian system. The code assumes that it's a big-endian, although that's not strictly established (I believe there were exotic systems with some sort of mixed endianness as well, although the code will probably not run on such in practice).

Using bitwise operators in C++ to change 4 chars to int

What I must do is open a file in binary mode that contains stored data that is intended to be interpreted as integers. I have seen other examples such as Stackoverflow-Reading “integer” size bytes from a char* array. but I want to try taking a different approach (I may just be stubborn, or stupid :/). I first created a simple binary file in a hex editor that reads as follows.
00 00 00 47 00 00 00 17 00 00 00 41
This (should) equal 71, 23, and 65 if the 12 bytes were divided into 3 integers.
After opening this file in binary mode and reading 4 bytes into an array of chars, how can I use bitwise operations to make char[0] bits be the first 8 bits of an int and so on until the bits of each char are part of the int.
My integer = 00 00 00 00
+ ^ ^ ^ ^
Chars Char[0] Char[1] Char[2] Char[3]
00 00 00 47
So my integer(hex) = 00 00 00 47 = numerical value of 71
Also, I don't know how the endianness of my system comes into play here, so is there anything that I need to keep in mind?
Here is a code snippet of what I have so far, I just don't know the next steps to take.
std::fstream myfile;
myfile.open("C:\\Users\\Jacob\\Desktop\\hextest.txt", std::ios::in | std::ios::out | std::ios::binary);
if(myfile.is_open() == false)
{
    std::cout << "Error" << std::endl;
}
char* mychar;
std::cout << myfile.is_open() << std::endl;
mychar = new char[4];
myfile.read(mychar, 4);
I eventually plan on dealing with reading floats from a file and maybe a custom data type eventually, but first I just need to get more familiar with using bitwise operations.
Thanks.
You want the bitwise left shift operator:
typedef unsigned char u8; // in case char is signed by default on your platform
unsigned num = ((u8)chars[0] << 24) | ((u8)chars[1] << 16) | ((u8)chars[2] << 8) | (u8)chars[3];
What it does is shift the left argument a specified number of bits to the left, adding zeros from the right as stuffing. For example, 2 << 1 is 4, since 2 is 10 in binary and shifting one to the left gives 100, which is 4.
This can be written in a more general loop form:
unsigned num = 0;
for (int i = 0; i != 4; ++i) {
    num |= (u8)chars[i] << (24 - i * 8); // += could have also been used
}
The endianness of your system doesn't matter here; you know the endianness of the representation in the file, which is constant (and therefore portable), so when you read in the bytes you know what to do with them. The internal representation of the integer in your CPU/memory may be different from that of the file, but the logical bitwise manipulation of it in code is independent of your system's endianness; the least significant bits are always at the right, and the most at the left (in code). That's why shifting is cross-platform -- it operates at the logical bit level :-)
Have you thought of using Boost.Spirit to make a binary parser? You might hit a bit of a learning curve when you start, but if you want to expand your program later to read floats and structured types, you'll have an excellent base to start from.
Spirit is very well-documented and is part of Boost. Once you get around to understanding its ins and outs, it's really mind-boggling what you can do with it, so if you have a bit of time to play around with it, I'd really recommend taking a look.
Otherwise, if you want your binary to be "portable" - i.e. you want to be able to read it on a big-endian and a little-endian machine, you'll need some sort of byte-order mark (BOM). That would be the first thing you'd read, after which you can simply read your integers byte by byte. Simplest thing would probably be to read them into a union (if you know the size of the integer you're going to read), like this:
union U
{
    unsigned char uc_[4];
    uint32_t ui_;  // a fixed-width 4-byte type; unsigned long is 8 bytes on some platforms
};
read the data into the uc_ member, swap the bytes around if you need to change endianness, and read the value from the ui_ member. There's no shifting etc. to be done - except for the swapping if you want to change endianness.