I am required to generate a 4-byte checksum, defined as "a 32-bit bitwise exclusive-OR negation value", of some piece of binary data. I am rewriting the encode/decode sections of a certain MML interface for a billing system in Erlang.
The C/C++ version of such a function is below:
/*
 * Function: GetChkSum
 * Description:
 *   A 32-bit bitwise Exclusive-OR negation value of "message header
 *   + session header + transaction header + operation information".
 * Input:
 *   len indicates the total length of "message header + session header
 *   + transaction header + operation information".
 *   buf indicates the string consisting of message header, session header,
 *   transaction header, and operation information.
 * Output:
 *   res indicates the result of the 32-bit bitwise Exclusive-OR negation value.
 */
/* Int and PSTR are project typedefs (PSTR is char * on Windows) */
void GetChkSum(Int len, PSTR buf, PSTR res)
{
    memset(res, 0, MSG_CHKSUM_LEN);      /* MSG_CHKSUM_LEN: checksum size, presumably 4 */
    for (int i = 0; i < len; i += 4)     /* 4 bytes per step: len must be a multiple of 4 */
    {
        res[0] ^= (buf + i)[0];
        res[1] ^= (buf + i)[1];
        res[2] ^= (buf + i)[2];
        res[3] ^= (buf + i)[3];
    }
    res[0] = ~res[0];                    /* negate each byte of the XOR result */
    res[1] = ~res[1];
    res[2] = ~res[2];
    res[3] = ~res[3];
}
I need to rewrite this in Erlang. How can I do this?
Doing an XOR in Erlang is not difficult (the operator is bxor, and it works on integers). But to write any code you first need to define the "format" of the input and output. From your example I guess it may be ASCII codes stored in a binary, or a string?
Once you have defined the input type, the result can be evaluated with a function like this:
negxor(<<>>,R) -> int_to_your_result_type(bnot(R) band 16#FFFFFFFF);
negxor(<<H:32,Q:binary>>,R) -> negxor(Q,R bxor H).
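and you can call it with negxor(your_input_to_binary(Input), 0), where your_input_to_binary and int_to_your_result_type stand for whatever conversions your chosen input and output formats need.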
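Note that negxor consumes the binary 32 bits at a time instead of byte by byte. As long as the words are read big-endian (the default for an <<H:32>> segment in Erlang) and the length is a multiple of 4, this yields the same four bytes as GetChkSum, because XOR acts independently on each bit position. A minimal C++ sketch comparing the two approaches; chkSumBytes and chkSumWords are hypothetical names, not part of either original:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte-wise version, mirroring GetChkSum: XOR byte i into res[i % 4], then negate.
void chkSumBytes(const std::uint8_t* buf, std::size_t len, std::uint8_t res[4]) {
    res[0] = res[1] = res[2] = res[3] = 0;
    for (std::size_t i = 0; i < len; ++i)
        res[i % 4] ^= buf[i];
    for (int k = 0; k < 4; ++k)
        res[k] = static_cast<std::uint8_t>(~res[k]);
}

// Word-wise version, mirroring negxor: XOR big-endian 32-bit words, then negate.
std::uint32_t chkSumWords(const std::uint8_t* buf, std::size_t len) {
    std::uint32_t r = 0;
    for (std::size_t i = 0; i + 4 <= len; i += 4)   // assumes len % 4 == 0, like negxor
        r ^= (std::uint32_t{buf[i]} << 24) | (std::uint32_t{buf[i + 1]} << 16)
           | (std::uint32_t{buf[i + 2]} << 8) | buf[i + 3];
    return ~r;   // r is already 32 bits wide, so no extra masking is needed
}
```

In Erlang the result of bnot is an arbitrary-precision (negative) integer, which is why negxor needs the band 16#FFFFFFFF; a fixed-width unsigned type makes that step implicit.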
I have to write a function that receives a binary number like 10001 and a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ << operator, zeroes are pushed in from the right but the leading digits aren't dropped. For example,
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010, which is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Keep in mind that this number actually has more leading zeroes than you see: an int is at least 16 bits and usually 32, so even in the 16-bit case 0b10001 is really 00000000 00010001. Maybe try AND-ing it with a number that has as many 1 bits as you want to keep after shifting (assuming you want to stick to bitwise operations).
What you want is to bit-shift and then limit the number of output bits that can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the shifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int numbits) {
    int_type mask = 0;
    for (int bit = 0; bit < numbits; bit++) {
        mask |= int_type(1) << bit;   // set the low numbits bits of the mask
    }
    return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
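Instead of building the mask bit by bit, it can be computed directly as (1 << numbits) - 1, as long as numbits is smaller than the width of the type (shifting by the full width is undefined behavior in C++). A minimal sketch assuming a 32-bit unsigned value; shiftLeftKeepBits is a hypothetical name, not from the answer above:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: shift left, then keep only the low `numbits` bits.
// An unsigned type keeps shifting into the top bit well defined.
std::uint32_t shiftLeftKeepBits(std::uint32_t value, unsigned numshift, unsigned numbits) {
    const unsigned width = sizeof(value) * CHAR_BIT;   // always 32 for uint32_t
    // (1u << 32) is undefined, so a full-width mask is special-cased.
    const std::uint32_t mask = (numbits >= width) ? ~std::uint32_t{0}
                                                  : (std::uint32_t{1} << numbits) - 1;
    // Shifting by the full width or more is undefined too; all bits shift out.
    const std::uint32_t shifted = (numshift >= width) ? 0 : (value << numshift);
    return shifted & mask;
}
```

With this, shiftLeftKeepBits(0b10001, 1, 5) gives 0b00010, matching the example above.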
I have instructions on creating a checksum of a message described like this:
The checksum consists of a single byte equal to the two’s complement sum of all bytes starting from the “message type” word up to the end of the message block (excluding the transmitted checksum). Carry from the most significant bit is ignored.
Another description I found was:
The checksum value contains the twos complement of the modulo 256 sum of the other words in the data message (i.e., message type, message length, and data words). The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word. A result of zero generally indicates that the message was correctly received.
I understand this to mean: sum the values of all bytes in the message (excluding the checksum), take that sum modulo 256, then take the two's complement of the result, and that is my checksum.
But I am having trouble with an example message (from the design doc, so I must assume it has been encoded correctly):
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
So the last byte, 0xE, is the checksum. My code to calculate the checksum is as follows:
bool isMsgValid(unsigned char arr[], int len) {
int sum = 0;
for(int i = 0; i < (len-1); ++i) {
sum += arr[i];
}
//modulo 256 sum
sum %= 256;
char ch = sum;
//twos complement
unsigned char twoscompl = ~ch + 1;
return arr[len-1] == twoscompl;
}
int main(int argc, char* argv[])
{
unsigned char arr[] = {0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30,0x2,0x8,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe};
int arrsize = sizeof(arr) / sizeof(arr[0]);
bool ret = isMsgValid(arr, arrsize);
return 0;
}
The spec is here: http://www.sinet.bt.com/227v3p5.pdf
I assume I have misunderstood the algorithm required. Any idea how to create this checksum?
Flippin' spec writer made a mistake in their data example. I just spotted this, then came back here and found that others had spotted it too. Sorry if I wasted your time. I will study the responses, because they contain some useful comments for improving my code.
You miscopied the example message from the pdf you linked. The second parameter length is 9 bytes, but you used 0x08 in your code.
The document incorrectly states "8 bytes" in the third column when there are really 9 bytes in the parameter. The second column correctly states "00001001".
In other words, your test message should be:
{0x80,0x15,0x1,0x8,0x30,0x33,0x31,0x35,0x31,0x30,0x33,0x30, // param1
 0x2,0x9,0x30,0x33,0x35,0x31,0x2d,0x33,0x32,0x31,0x30,0xe}  // param2: length byte is 0x9, not 0x8
With the correct message array, ret == true when I try your program.
Agree with the comment: looks like the checksum is wrong. Where in the .PDF is this data?
Some general tips:
Use an unsigned type as the accumulator; that gives you well-defined behavior on overflow, and you'll need that for longer messages. Similarly, if you store the result in a char variable, make it unsigned char.
But you don't need to store it; just do the math with an unsigned type, complement the result, add 1, and mask off the high bits so that you get an 8-bit result.
Also, there's a trick here, if you're on hardware that uses two's-complement arithmetic: just add all of the values, including the checksum, then mask off the high bits; the result will be 0 if the input was correct.
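A minimal C++ sketch of these tips, assuming the message is a plain array of bytes: an unsigned accumulator, the checksum computed as the complemented sum plus 1 masked to 8 bits, and the receiver-side check that the sum of all bytes including the checksum is 0 mod 256. The names computeChecksum and checksumOk are illustrative, not from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Checksum over `len` bytes: two's complement of the modulo-256 sum.
std::uint8_t computeChecksum(const std::uint8_t* data, std::size_t len) {
    unsigned sum = 0;                          // unsigned: wraparound is well defined
    for (std::size_t i = 0; i < len; ++i)
        sum += data[i];
    return static_cast<std::uint8_t>((~sum + 1u) & 0xFFu);   // -(sum) mod 256
}

// Receiver-side check: every byte, checksum included, must sum to 0 mod 256.
bool checksumOk(const std::uint8_t* msg, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum += msg[i];
    return (sum & 0xFFu) == 0;
}
```

With the corrected message (length byte 0x9), the bytes before the checksum sum to 1010, which is 242 mod 256, and 256 - 242 = 14 = 0xE.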
The receiving equipment may calculate the modulo 256 sum of the received words and add this sum to the received checksum word.
It's far easier to use this condition to understand the checksum:
{byte 0} + {byte 1} + ... + {last byte} + {checksum} = 0 mod 256
{checksum} = -( {byte 0} + {byte 1} + ... + {last byte} ) mod 256
As the others have said, you really should use unsigned types when working with individual bits. This is also true when doing modular arithmetic. If you use signed types, you leave yourself open to a rather large number of sign-related mistakes. OTOH, pretty much the only mistake you open yourself up to using unsigned numbers is things like forgetting 2u-3u is a positive number.
(do be careful about mixing signed and unsigned numbers together: there are a lot of subtleties involved in that too)
I have been following the MSDN example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash (shown below) in hexadecimal works; more specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)){
printf("MD5 hash of file %s is: ", filename);
for (DWORD i = 0; i < cbHash; i++)
{
printf("%c%c", rgbDigits[rgbHash[i] >> 4],
rgbDigits[rgbHash[i] & 0xf]);
}
printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)
x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
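As a self-contained illustration, here is a minimal sketch that hex-encodes any byte buffer the same way; toHex is a hypothetical name, and the lower-case digit table is an assumption, as in the snippet above:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: two hex digits per byte, high nibble first.
std::string toHex(const unsigned char* bytes, std::size_t len) {
    static const char rgbDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(2 * len);
    for (std::size_t i = 0; i < len; ++i) {
        out += rgbDigits[bytes[i] >> 4];    // high nibble, e.g. 0xA7 >> 4  == 0xA
        out += rgbDigits[bytes[i] & 0xf];   // low nibble,  e.g. 0xA7 & 0xf == 0x7
    }
    return out;
}
```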
This code uses simple bit-"filtering" techniques:
">> 4" means shift right by 4 places, which for an unsigned value means "divide by 16"; for a byte, it keeps the high 4 bits.
"& 0xf" is a bitwise AND, which means "keep only the lowest 4 bits".
Both values are then used as indices into rgbDigits, which probably maps them to human-readable hex characters.
I've read about [ostream] << hex << 0x[hex value], but I have some questions about it
(1) I defined my file stream, output, to be a hex output file stream using output.open("BWhite.bmp",ios::binary);. Since I did that, does that make the hex parameter in the output << operation redundant?
(2)
If I have an integer value I wanted to store in the file, and I used this:
int i = 0;
output << i;
Would i be stored in little endian or big endian? Will the endianness change based on which computer the program is executed or compiled on?
Does the size of this value depend on the computer it's run on? Would I need to use the hex parameter?
(3) Is there a way to output raw hex digits to a file? If I want the file to have the hex digit 43, what should I use?
output << 0x43 and output << hex << 0x43 both output ASCII 4, then ASCII 3.
The purpose of outputting these hex digits is to make the header for a .bmp file.
The formatted output operator << is for just that: formatted output. It's for strings.
As such, the std::hex stream manipulator tells streams to output numbers as strings formatted as hex.
If you want to output raw binary data, use the unformatted output functions only, e.g. basic_ostream::put and basic_ostream::write.
You could output an int like this:
int n = 42;
output.write(reinterpret_cast<const char*>(&n), sizeof(int)); // write() takes a const char*
The endianness of this output will depend on the architecture. If you wish to have more control, I suggest the following:
int32_t n = 42;
char data[4];
data[0] = static_cast<char>(n & 0xFF);
data[1] = static_cast<char>((n >> 8) & 0xFF);
data[2] = static_cast<char>((n >> 16) & 0xFF);
data[3] = static_cast<char>((n >> 24) & 0xFF);
output.write(data, 4);
This sample will output a 32 bit integer as little-endian regardless of the endianness of the platform. Be careful converting that back if char is signed, though.
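Regarding reading the value back when char is signed: a minimal sketch (readLE32 and writeLE32 are hypothetical helpers) that goes through unsigned char, so that bytes of 0x80 and above are not sign-extended when they are shifted back into place:

```cpp
#include <cassert>
#include <cstdint>

// Rebuild a 32-bit value from 4 little-endian bytes.
// Reading through unsigned char avoids sign extension when char is signed.
std::uint32_t readLE32(const char data[4]) {
    const unsigned char* b = reinterpret_cast<const unsigned char*>(data);
    return static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Companion writer, same byte order as the sample above (lowest byte first).
void writeLE32(std::uint32_t n, char data[4]) {
    for (int i = 0; i < 4; ++i)
        data[i] = static_cast<char>((n >> (8 * i)) & 0xFFu);
}
```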
You say
"Is there a way to output raw hex digits to a file? If I want the file to have the hex digit 43, what should I use? "
What counts as "raw hex digits" depends on how you interpret a collection of bits. Consider the following:
Binary : 0 1 0 0 1 0 1 0
Hex : 4 A
Octal : 1 1 2
Decimal : 7 4
ASCII : J
All of the above represent the same numeric quantity; we just interpret it differently.
So you simply need to store the data in binary format, that is, the exact bit pattern that represents the number.
EDIT1
When you write a number with <<, say 74 (as in the example above), it is stored as the two ASCII characters '7' and '4', whichever mode the file was opened in. To avoid this, open the file in binary mode (ios::binary) and write the bytes with write(). Check http://courses.cs.vt.edu/~cs2604/fall00/binio.html#write
The purpose of outputting these hex digits is to make the header for a .bmp file.
You seem to have a large misconception of how files work.
The stream operators << generate text (human-readable output). The .bmp file format is a binary format that is not human readable (well, it is, but it's not nice and I would not read it without tools).
What you really want to do is generate binary output and place it the file:
char x = 0x43;
output.write(&x, sizeof(x));
This will write one byte of data with the hex value 0x43 to the output stream. This is the binary representation you want.
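To see the difference between formatted and unformatted output without touching a file, here is a small sketch using std::ostringstream (an ofstream behaves the same way); formatted and raw are hypothetical helper names:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formatted output: << turns the number into characters.
std::string formatted(int n) {
    std::ostringstream out;
    out << std::hex << n;          // 0x43 becomes the two characters '4' and '3'
    return out.str();
}

// Unformatted output: write() copies the byte itself.
std::string raw(char byte) {
    std::ostringstream out;
    out.write(&byte, 1);           // one byte with value 0x43 ('C' in ASCII)
    return out.str();
}
```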
would i be stored in little endian or big endian? Will the endianness change based on which computer the program is executed or compiled on?
Neither; you are again outputting text (not binary data).
int i = 0;
output.write(reinterpret_cast<char*>(&i), sizeof(i)); // Writes the binary representation of i
Here you do need to worry about the endianness (and size) of the integer value, and these will vary depending on the hardware you run your application on. For the value 0 there is not much to worry about regarding endianness, but you should worry about the size of the integer.
I would put some asserts into my code to validate that the architecture is OK for the code. Then let people worry about it if their architecture does not match the requirements:
#include <cassert>
#include <climits>   // CHAR_BIT
int test = 0x12345678;
assert((sizeof(test) * CHAR_BIT == 32) && "BMP uses 32-bit ints");
assert((((char*)&test)[0] == 0x78) && "BMP uses little endian");
There is a family of functions that will help you with endianness and size.
http://www.gnu.org/s/hello/manual/libc/Byte-Order.html
Function: uint32_t htonl (uint32_t hostlong)
This function converts the uint32_t integer hostlong from host byte order to network byte order.
#include <arpa/inet.h>  // htonl(), ntohl() on POSIX systems
#include <cstdint>
// Writing to a file
{
    uint32_t hostValue = 0x12345678;
    uint32_t network = htonl(hostValue);
    output.write(reinterpret_cast<const char*>(&network), sizeof(network));
}
// Reading from a file (input: an ifstream opened with ios::binary)
{
    uint32_t network;
    input.read(reinterpret_cast<char*>(&network), sizeof(network));
    uint32_t hostValue = ntohl(network); // convert back to the platform-specific value
}
// Unfortunately the BMP format was written with Intel in mind,
// and thus all integers are little-endian.
// Network byte order (as used by htonl() and family) is big-endian,
// so this may not be much use to you.
Last thing. When you open a file in binary mode with output.open("BWhite.bmp",ios::binary), it does nothing to the stream apart from changing how it treats the end-of-line sequence. In binary mode the output is not modified (what you put in the stream is what is written to the file). If you leave the stream in text mode, '\n' characters are converted to the end-of-line sequence (the OS-specific set of characters that marks the end of a line). Since you are writing a binary file, you definitely do not want any interference with the characters you write, so binary is the correct mode. But it does not affect any other operation that you perform on the stream.
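Putting this together for the original goal, here is a minimal sketch that writes the 14-byte BMP file header as raw little-endian bytes using only put() and write(). putLE and writeBmpFileHeader are hypothetical names; the field order follows the standard BITMAPFILEHEADER layout, and the stream would normally be an ofstream opened with ios::binary:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Write the low `numBytes` bytes of value, least significant byte first.
void putLE(std::ostream& out, std::uint32_t value, int numBytes) {
    for (int i = 0; i < numBytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFFu));
}

// "BM" signature, file size (4 bytes), two reserved 16-bit fields,
// then the offset of the pixel data (4 bytes): 14 bytes in total.
void writeBmpFileHeader(std::ostream& out, std::uint32_t fileSize, std::uint32_t pixelOffset) {
    out.write("BM", 2);         // bytes 0x42 0x4D, written unformatted
    putLE(out, fileSize, 4);
    putLE(out, 0, 2);           // bfReserved1
    putLE(out, 0, 2);           // bfReserved2
    putLE(out, pixelOffset, 4);
}
```

Because every multi-byte field goes through putLE, the byte order is the same on any platform, which sidesteps the endianness asserts entirely.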
I have two hex strings, accompanied by masks, that I would like to merge into a single string value/mask pair. The strings may have bytes that overlap, but after applying the masks no overlapping bits should contradict what the value of that bit must be. For example, value1 = 0x0A, mask1 = 0xFE and value2 = 0x0B, mask2 = 0x0F says that the merged result must have an upper nibble of all 0s and a lower nibble of 1011.
I've done this already using straight c, converting strings to byte arrays and memcpy'ing into buffers as a prototype. It's tested and seems to work. However, it's ugly and hard to read and doesn't throw exceptions for specific bit requirements that contradict. I've considered using bitsets, but is there another way that might not demand the conversion overhead? Performance would be nice, but not crucial.
EDIT: More detail, although writing this makes me realize I've made a simple problem too difficult. But, here it is, anyway.
I am given a large number of inputs that are binary searches of a mixed-content document. The document is broken into pages, and pages are provided by an API that delivers a single page at a time. Each page needs to be searched with the provided search terms.
I have all the search terms before requesting pages. The inputs are strings representing hex digits (this is what I mean by hex strings) as well as a mask indicating which bits are significant in the input hex string. Since I'm given all input up front, I wanted to improve the search of each returned page by pre-processing and merging these hex strings. To make the problem more interesting, every string has an optional offset into the page where it must appear; a missing offset indicates that the string can appear anywhere in a requested page. So, something like this:
class Input {
public:
int input_id;
std::string value;
std::string mask;
bool offset_present;
unsigned int offset;
};
If a given Input object has offset_present = false, then any value assigned to offset is ignored. If offset_present is false, then it clearly can't be merged with other inputs.
To make the problem more interesting, I want to report an output that provides information about what was found (input_id that was found, where the offset was, etc). Merging some input (but not others) makes this a bit more difficult.
I had considered defining a CompositeInput class with a bitset as the underlying merged representation, but further reading about bitsets made me realize they weren't what I thought. My inexperience made me give up on the composite idea and go brute force. I have necessarily skipped some details about other input types and additional information to be collected for the output (say, page number, paragraph number) when an input is found. Here's an example output class:
class Output {
public:
Output();
int id_result;
unsigned int offset_result;
};
I would want to produce N of these if I merge N hex strings, keeping any merger details hidden from the user.
I don't know what a hexstring is... but other than that it should be like this:
outcome = (value1 & mask1) | (value2 & mask2);
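That expression gives the merged value but does not detect contradictions. A minimal sketch, assuming each hex string fits in 64 bits, that also merges the masks and rejects any bit both masks constrain but the values disagree on; Masked, merge, and fromHex are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// A value plus a mask saying which of its bits are significant (1 = significant).
struct Masked {
    std::uint64_t value;
    std::uint64_t mask;
};

// Merge two value/mask pairs, rejecting contradictions.
Masked merge(const Masked& a, const Masked& b) {
    // A conflict is a bit that both masks constrain but whose values differ.
    if ((a.value ^ b.value) & a.mask & b.mask)
        throw std::invalid_argument("conflicting bits");
    return { (a.value & a.mask) | (b.value & b.mask), a.mask | b.mask };
}

// Parse strings such as "0x0A"; base 16 accepts the optional "0x" prefix.
// Only valid while the strings fit in 64 bits.
Masked fromHex(const std::string& value, const std::string& mask) {
    return { static_cast<std::uint64_t>(std::stoull(value, nullptr, 16)),
             static_cast<std::uint64_t>(std::stoull(mask, nullptr, 16)) };
}
```

For the question's example, merging 0x0A/0xFE with 0x0B/0x0F yields value 0x0B with mask 0xFF. Longer strings would need the same test applied digit by digit, as in the next answer.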
it sounds like |, & and ~ would work?
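#include <cstddef>
#include <cstdint>
#include <stdexcept>

void mergeExample()
{
    const size_t prefix = 2; // skip the "0x"
    const size_t digits = 2; // hex digits after the prefix
    const char* value1 = "0x0A";
    const char* mask1 = "0xFE";
    const char* value2 = "0x0B";
    const char* mask2 = "0x0F";
    char output[prefix + digits + 1] = "0x";
    char outmask[prefix + digits + 1] = "0x";
    // Hex digit -> 4-bit value (upper- or lower-case)
    auto char2int = [](char c) -> uint8_t {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };
    const char int2char[] = "0123456789ABCDEF";
    for (size_t ii = prefix; ii != prefix + digits; ++ii)
    {
        uint8_t m1 = char2int(mask1[ii]), m2 = char2int(mask2[ii]);
        uint8_t v1 = char2int(value1[ii]) & m1;
        uint8_t v2 = char2int(value2[ii]) & m2;
        // Conflict: a bit both masks care about, but the values disagree on.
        if ((v1 ^ v2) & m1 & m2)
            throw std::invalid_argument("conflicting bits");
        output[ii] = int2char[v1 | v2];   // merged value digit
        outmask[ii] = int2char[m1 | m2];  // merged mask digit
    }
}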