What does this piece of Tiled C++ code do?

I'm trying to figure out the purpose of this piece of code, from the Tiled utility's map format documentation.
const int gid = data[i] |
data[i + 1] << 8 |
data[i + 2] << 16 |
data[i + 3] << 24;
It looks like there is some "or-ing" and shifting of bits, but I have no clue what the aim of this is, in the context of using data from the tiled program.

Tiled stores its layer "Global Tile ID" (GID) data in an array of 32-bit integers, base64-encoded and (optionally) compressed in the XML file.
According to the documentation, these 32-bit integers are stored in little-endian format -- that is, the first byte of the integer contains the least significant byte of the number. As an analogy, in decimal, writing the number "1234" in little-endian would look like 4321 -- the 4 is the least significant digit in the number (representing a value of just 4), the 3 is the next-least-significant (representing 30), and so on. The only difference between this example and what Tiled is doing is that we're using decimal digits, while Tiled is using bytes, which are effectively digits that can each hold 256 different values instead of just 10.
If we think about the code in terms of decimal numbers, though, it's actually pretty easy to understand what it's doing. It's basically reconstructing the integer value from the digits by doing just this:
int digit[4] = { 4, 3, 2, 1 }; // our decimal digits in little-endian order
int gid = digit[0] +
digit[1] * 10 +
digit[2] * 100 +
digit[3] * 1000;
It's just moving each digit into position to create the full integer value. (In binary, shifting left by 8 bits multiplies by 256, one byte place, which plays the same role as multiplying by 10 does in decimal: it moves a value into the next more-significant 'digit' slot.)
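Put concretely (a tiny illustrative snippet, not taken from the Tiled docs):

#include <cassert>

int main()
{
    // Shifting left by 8 bits multiplies by 256 (one byte place),
    // just as multiplying by 10 moves a decimal digit one place left.
    assert((0x48 << 8)  == 0x48 * 256);
    assert((0x48 << 16) == 0x48 * 256 * 256);
    return 0;
}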
More information on big-endian and little-endian byte order, and why the difference matters, can be found in On Holy Wars And A Plea For Peace, an important (and entertainingly written) document from 1980 in which Danny Cohen argued for the need to standardise on a single byte ordering for network protocols. (Spoiler: big-endian eventually won that fight, and it has been the standard representation for integers in files and network transmissions for decades. Tiled's use of little-endian integers in its file format is somewhat unusual, and it is the reason you need code like the snippet you quoted to reliably convert the integers in the data file into the computer's native format. Had the data been stored in the standard big-endian format, you could simply have called ntohl() -- every OS provides standard functions for converting between big-endian and native order -- instead of writing and deciphering this sort of byte-manipulation code by hand.)
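For comparison, here is a minimal sketch of what the conversion could look like if the bytes had been stored big-endian. ntohl() is the standard sockets-API helper; the function name and the memcpy-based read are just illustrative assumptions, not anything from Tiled:

#include <arpa/inet.h>  // ntohl(); on Windows it lives in <winsock2.h>
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble a big-endian 32-bit GID starting at data[i].
uint32_t read_big_endian_gid(const unsigned char *data, std::size_t i)
{
    uint32_t be_value;
    std::memcpy(&be_value, data + i, sizeof be_value); // copy the raw 4 bytes
    return ntohl(be_value);                            // network (big-endian) to host order
}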

As you noted, the << operator shifts bits to the left by the given number.
This block takes the data[] array, which has four (presumably one byte) elements, and "encodes" those four values into one integer.
Example Time!
data[0] = 0x3A; // 0x3A = 58 = 0011 1010 in binary
data[1] = 0x48; // 0x48 = 72 = 0100 1000 in binary
data[2] = 0xD2; // 0xD2 = 210 = 1101 0010 in binary
data[3] = 0x08; // 0x08 = 8 = 0000 1000 in binary
int tmp0 = data[0]; // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
int tmp1 = data[1] << 8; // 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
int tmp2 = data[2] << 16; // 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
int tmp3 = data[3] << 24; // 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
// "or-ing" these together will set each bit to 1 if any of the bits are 1
int gid = tmp0 | // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
tmp1 | // 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
tmp2 | // 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
tmp3; // 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
gid == 147998778;// 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
Now, you've just encoded four one-byte values into a single four-byte integer.
If you're (rightfully) wondering why anyone would go through all that effort when you could just store the four single-byte pieces of data directly into four bytes, then you should check out this question:
int, short, byte performance in back-to-back for-loops
Bonus Example!
To get your encoded values back, we use the "and" operator along with the right-shift >>:
int gid = 147998778; // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
// "and-ing" will set each bit to 1 if BOTH bits are 1
int tmp0 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x000000FF; // 00 00 00 FF = 0000 0000 0000 0000 0000 0000 1111 1111
int data0 = tmp0; // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
int tmp1 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x0000FF00; // 00 00 FF 00 = 0000 0000 0000 0000 1111 1111 0000 0000
// tmp1 == 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
int data1 = tmp1 >> 8; // 00 00 00 48 = 0000 0000 0000 0000 0000 0000 0100 1000
int tmp2 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x00FF0000; // 00 FF 00 00 = 0000 0000 1111 1111 0000 0000 0000 0000
// tmp2 == 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
int data2 = tmp2 >> 16; // 00 00 00 D2 = 0000 0000 0000 0000 0000 0000 1101 0010
int tmp3 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0xFF000000; // FF 00 00 00 = 1111 1111 0000 0000 0000 0000 0000 0000
// tmp3 == 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
int data3 = tmp3 >> 24; // 00 00 00 08 = 0000 0000 0000 0000 0000 0000 0000 1000
The last "and-ing" for tmp3 isn't needed, since the bits that "fall off" when shifting are just lost and the bits coming in are zero. So:
// gid == 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
int data3 = gid >> 24; // 00 00 00 08 = 0000 0000 0000 0000 0000 0000 0000 1000
but I wanted to provide a complete example.
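For completeness, here is the whole round trip as a small self-contained program; the byte values and the gid are the ones from the example above, everything else (names, asserts) is just scaffolding:

#include <cassert>
#include <cstdint>

int main()
{
    const uint8_t data[4] = { 0x3A, 0x48, 0xD2, 0x08 }; // little-endian byte order

    // Encode: shift each byte into position and OR them together.
    const uint32_t gid = data[0]
                       | data[1] << 8
                       | data[2] << 16
                       | data[3] << 24;
    assert(gid == 147998778);               // 0x08D2483A

    // Decode: mask off and shift each byte back out again.
    assert(( gid        & 0xFF) == 0x3A);
    assert(((gid >> 8)  & 0xFF) == 0x48);
    assert(((gid >> 16) & 0xFF) == 0xD2);
    assert( (gid >> 24)         == 0x08);
    return 0;
}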

Related

What is the output of this C++ Program? chars are stored as ASCII values?

char char_ = '3';
unsigned int * custom_mem_address = (unsigned int *) &char_;
cout<<char_<<endl;
cout << *custom_mem_address<<endl;
Since custom_mem_address contains the one-byte value of the char '3', I expect it to contain the ASCII value of '3', which is 51.
But the output is the following.
3
1644042035
Depending on the byte alignment, at least one byte of 1644042035 should be 51, right? But it's not. Can you please explain where I am wrong?
1644042035 in binary is 0110 0001 1111 1110 0001 0111 0011 0011 and 51 is 0011 0011.
0110 0001 1111 1110 0001 0111 0011 0011
0000 0000 0000 0000 0000 0000 0011 0011
Isn't that what you are looking for? The lowest byte of 1644042035 is 0011 0011, i.e. 51, your '3'; the other three bytes are just whatever happens to sit next to char_ in memory.
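If you want to see the 51 explicitly, a well-defined way is to copy the integer's bytes out and look at the first one. A small sketch; the value 1644042035 is taken from your output, and the result assumes a little-endian machine:

#include <cstdio>
#include <cstring>

int main()
{
    unsigned int value = 1644042035;            // 0x61FE1733, the number you printed
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // inspect the bytes without aliasing tricks

    // On a little-endian machine the first byte is the least significant one: 0x33 == 51.
    std::printf("%d\n", bytes[0]);
    return 0;
}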

How to invert shifting and addition

1E 1B 01 13 6 [ 0001 1110 0001 1011 0000 0001 0001 0011 0110 ] is converted to F6C336 by doing shifting and addition.
(0x1E<<19)+(0x1B<<14)+(0x01<<9)+(0x13<<4)+6 = F6C336[1111 0110 1100 0011 0011 0110]
Now I am stuck trying to reverse this calculation, i.e. from F6C336 I want to get back 1E 1B 01 13 6.
Sorry for my poor knowledge in bit operations.
If these are four blocks of 5 bits each and one block of 4 bits each, then the "conversion" is their concatenation, and the inverse of that is splitting it back up into those pieces. For example:
piece0 = x >> 19;
piece1 = (x >> 14) & 31;
piece2 = (x >> 9) & 31;
piece3 = (x >> 4) & 31;
piece4 = x & 15;
Shown in Java here but the logic would be similar in most languages.
If the input was not of that form, for example if it was FF FF FF FF F then the inverse is ambiguous.
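The pieces above are shown in Java; here is the same round trip as a minimal C++ sketch, using the numbers from the question (the variable names are just illustrative):

#include <cassert>
#include <cstdint>

int main()
{
    // Pack four 5-bit fields and one 4-bit field, as in the question.
    const uint32_t packed = (0x1E << 19) + (0x1B << 14) + (0x01 << 9) + (0x13 << 4) + 6;
    assert(packed == 0xF6C336);

    // Unpack: shift each field back down and mask off the neighbours.
    assert( (packed >> 19)       == 0x1E);
    assert(((packed >> 14) & 31) == 0x1B);
    assert(((packed >> 9)  & 31) == 0x01);
    assert(((packed >> 4)  & 31) == 0x13);
    assert( (packed        & 15) == 6);
    return 0;
}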

How to read FORTRAN binary file with C or C++?

I have a FORTRAN 77 binary file (created on a Sun SPARC machine, big-endian). I want to read it on my little-endian machine. I have come across this:
http://paulbourke.net/dataformats/reading/
Paul has written these macros for C or C++, but I do not understand what they really do.
#define SWAP_2(x) ( (((x) & 0xff) << 8) | ((unsigned short)(x) >> 8) )
#define SWAP_4(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | \
(((x) >> 8) & 0x0000ff00) | ((x) >> 24) )
#define FIX_SHORT(x) (*(unsigned short *)&(x) = SWAP_2(*(unsigned short *)&(x)))
#define FIX_LONG(x) (*(unsigned *)&(x) = SWAP_4(*(unsigned *)&(x)))
#define FIX_FLOAT(x) FIX_LONG(x)
I know that every record of the file contains
x,y,z,t,d,i
where i is integer*2 and all the other variables are real*4.
First 512 bytes hexdump
0000000 0000 1800 0000 0000 0000 0000 0000 0000
0000010 0000 0000 0000 0000 ffff ffff 0000 1800
0000020 0000 1800 003f 0000 0000 0000 233c 0ad7
0000030 0000 0000 233c 0ad7 0000 0100 0000 1800
0000040 0000 1800 803f 0000 0000 0000 233c 0ad7
0000050 0000 0000 233c 0ad7 0000 0100 0000 1800
0000060 0000 1800 c03f 0000 0000 0000 233c 0ad7
0000070 0000 0000 233c 0ad7 0000 0100 0000 1800
0000080 0000 1800 0040 0000 0000 0000 233c 0ad7
0000090 0000 0000 233c 0ad7 0000 0100 0000 1800
00000a0 0000 1800 2040 0000 0000 0000 233c 0ad7
00000b0 0000 0000 233c 0ad7 0000 0100 0000 1800
00000c0 0000 1800 4040 0000 0000 0000 233c 0ad7
00000d0 0000 0000 233c 0ad7 0000 0100 0000 1800
00000e0 0000 1800 6040 0000 0000 0000 233c 0ad7
00000f0 0000 0000 233c 0ad7 0000 0100 0000 1800
0000100 0000 1800 8040 0000 0000 0000 233c 0ad7
0000110 0000 0000 233c 0ad7 0000 0100 0000 1800
0000120 0000 1800 9040 0000 0000 0000 233c 0ad7
0000130 0000 0000 233c 0ad7 0000 0100 0000 1800
0000140 0000 1800 a040 0000 0000 0000 233c 0ad7
0000150 0000 0000 233c 0ad7 0000 0100 0000 1800
0000160 0000 1800 b040 0000 0000 0000 233c 0ad7
0000170 0000 0000 233c 0ad7 0000 0100 0000 1800
0000180 0000 1800 c040 0000 0000 0000 233c 0ad7
0000190 0000 0000 233c 0ad7 0000 0100 0000 1800
00001a0 0000 1800 d040 0000 0000 0000 233c 0ad7
00001b0 0000 0000 233c 0ad7 0000 0100 0000 1800
00001c0 0000 1800 e040 0000 0000 0000 233c 0ad7
00001d0 0000 0000 233c 0ad7 0000 0100 0000 1800
00001e0 0000 1800 f040 0000 0000 0000 233c 0ad7
00001f0 0000 0000 233c 0ad7 0000 0100 0000 1800
0000200
My code to read the file:
#include <endian.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *file;
char *buffer;
char *rec;
long fileLen;
file = fopen("rec.in", "rb");
fseek(file, 0, SEEK_END);
fileLen=ftell(file);
fseek(file, 0, SEEK_SET);
buffer=(char *)malloc(fileLen+1);
fread(buffer, fileLen, 1, file);
fclose(file);
free(buffer);
char *curr = buffer;
char *end = buffer + fileLen;
constexpr int LINE_SIZE = sizeof(float)*5 + sizeof(uint16_t); //based upon your "x,y,z,t,d,i" description
while(curr < end) {
uint32_t temp = be32toh(*reinterpret_cast<uint32_t*>(*curr));
float x = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(*(curr+sizeof(float))));
float y = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(*(curr+2*sizeof(float))));
float z = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(*(curr+3*sizeof(float))));
float t = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(*(curr+4*sizeof(float))));
float d = *reinterpret_cast<float*>(&temp);
uint16_t i = be16toh(*reinterpret_cast<uint16_t*>(*(curr+5*sizeof(float))));
curr += LINE_SIZE;
}
}
I got two errors
r.cc: In function ‘int main()’:
r.cc:29:1: error: ‘constexpr’ was not declared in this scope
constexpr int LINE_SIZE = sizeof(float)*5 + sizeof(uint16_t); //based upon your "x,y,z,t,d,i" description
^
r.cc:49:13: error: ‘LINE_SIZE’ was not declared in this scope
curr += LINE_SIZE;
If you're reading the file on a Linux machine, there are some library functions provided for this purpose in the endian.h header (documentation here). To convert a 16-bit integer to host order (little-endian in your case):
uint16_t hostInteger = be16toh(bigEndianIntegerFromFile);
For floats, the byte swap has to be done on the 32-bit integer representation, and the result is then reinterpreted as a float. Note that reinterpret_cast cannot convert directly between float and uint32_t values, so copy the bytes instead with memcpy (from <cstring>); here bigEndianBitsFromFile is the raw 4 bytes of the float, read from the file as a uint32_t:
uint32_t hostBits = be32toh(bigEndianBitsFromFile);
float hostFloat;
memcpy(&hostFloat, &hostBits, sizeof hostFloat);
UPDATE: Given your code, you could read the file by inserting this between your fclose and free calls:
char *curr = buffer;
char *end = buffer + fileLen;
constexpr int LINE_SIZE = sizeof(float)*5 + sizeof(uint16_t); //based upon your "x,y,z,t,d,i" description
while(curr < end) {
uint32_t temp = be32toh(*reinterpret_cast<uint32_t*>(curr)); // swap the 4 raw bytes to host order
float x = *reinterpret_cast<float*>(&temp);                  // then reinterpret them as a float
temp = be32toh(*reinterpret_cast<uint32_t*>(curr + sizeof(float)));
float y = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(curr + 2*sizeof(float)));
float z = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(curr + 3*sizeof(float)));
float t = *reinterpret_cast<float*>(&temp);
temp = be32toh(*reinterpret_cast<uint32_t*>(curr + 4*sizeof(float)));
float d = *reinterpret_cast<float*>(&temp);
uint16_t i = be16toh(*reinterpret_cast<uint16_t*>(curr + 5*sizeof(float)));
// ... do something with these values ...
curr += LINE_SIZE;
}

Reconstructing integers using bit mask

I am quite new to bit masking and bit operations. Could you please help me understand this? I have three integers a, b, and c, and I have created a new number d with the operations below:
int a = 1;
int b = 2;
int c = 92;
int d = (a << 14) + (b << 11) + c;
How do we reconstruct a, b and c using d?
I have no idea of the range of your a, b and c. However, assuming 3 bits for a and b, and 11 bits for c, we can do:
a = ( d >> 14 ) & 7;
b = ( d >> 11 ) & 7;
c = ( d >> 0 ) & 2047;
Update:
The value of the AND mask is computed as (2^NumberOfBits) - 1.
a is 0000 0000 0000 0000 0000 0000 0000 0001
b is 0000 0000 0000 0000 0000 0000 0000 0010
c is 0000 0000 0000 0000 0000 0000 0101 1100
a<<14 is 0000 0000 0000 0000 0100 0000 0000 0000
b<<11 is 0000 0000 0000 0000 0001 0000 0000 0000
c is 0000 0000 0000 0000 0000 0000 0101 1100
d is 0000 0000 0000 0000 0101 0000 0101 1100
(bit 14 and above hold a, bits 11-13 hold b, bits 0-10 hold c)
So a = d>>14
b = d>>11 & 7
c = d>>0 & 2047
By the way, you should make sure that b <= 7 and c <= 2047 before packing, otherwise the fields will overlap.
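A quick way to check both the extraction and the (2^NumberOfBits) - 1 mask formula is to round-trip the example values; a minimal sketch, with the field widths assumed as in the answer (3 bits for b, 11 bits for c):

#include <cassert>

int main()
{
    const int a = 1, b = 2, c = 92;
    const int d = (a << 14) + (b << 11) + c;

    // Masks built as (1 << numberOfBits) - 1, i.e. (2^NumberOfBits) - 1.
    const int mask3  = (1 << 3)  - 1;   // 7
    const int mask11 = (1 << 11) - 1;   // 2047

    assert(( d >> 14)           == a);  // a occupies the top bits
    assert(((d >> 11) & mask3)  == b);  // 3-bit field
    assert(( d        & mask11) == c);  // 11-bit field
    return 0;
}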

Question about bitwise And and Shift operations

How exactly do the following lines work if pData = "abc"?
pDes[1] = ( pData[0] & 0x1c ) >> 2;
pDes[0] = ( pData[0] << 6 ) | ( pData[1] & 0x3f );
Okay, assuming ASCII which is by no means guaranteed, pData[0] is 'a' (0x61) and pData[1] is 'b' (0x62):
pDes[1]:
pData[0] 0110 0001
&0x1c 0001 1100
---- ----
0000 0000
>>2 0000 0000 0x00
pDes[0]:
pData[0] 0110 0001
<< 6 01 1000 0100 0000 (interim value *a)
pData[1] 0110 0010
&0x3f 0011 1111
-- ---- ---- ----
0010 0010
|(*a) 01 1000 0100 0000
-- ---- ---- ----
01 1000 0110 0010 0x1862
How it works:
<< N simply means shift the bits N spaces to the left, >> N is the same but shifting to the right.
The & (and) operation will set each bit of the result to 1 if and only if the corresponding bit in both inputs is 1.
The | (or) operations sets each bit of the result to 1 if one or more of the corresponding bit in both inputs is 1.
Note that the 0x1862 will be truncated to fit into pDes[0] if its type is not wide enough.
The following C program shows this in action:
#include <stdio.h>
int main(void) {
char *pData = "abc";
int pDes[2];
pDes[1] = ( pData[0] & 0x1c ) >> 2;
pDes[0] = ( pData[0] << 6 ) | ( pData[1] & 0x3f );
printf ("%08x %08x\n", pDes[0], pDes[1]);
return 0;
}
It outputs:
00001862 00000000
and, when you change pDes to a char array, you get:
00000062 00000000
& is not logical AND - it is bit-wise AND.
a is 0x61, thus pData[0] & 0x1c gives
0x61 0110 0001
0x1c 0001 1100
--------------
0000 0000
>> 2 shifts this to right by two positions - value doesn't change as all bits are zero.
pData[0] << 6 left shifts 0x61 by 6 bits to give 01000000 or 0x40
pData[1] & 0x3f
0x62 0110 0010
0x3f 0011 1111
--------------
0x22 0010 0010
Thus it comes down to 0x40 | 0x22 - again | is not logical OR, it is bit-wise.
0x40 0100 0000
0x22 0010 0010
--------------
0x62 0110 0010
The results will be different if pDes is not a char array. Left shifting 0x61 would give you 0001 1000 0100 0000 or 0x1840 - (in case pDes is a char array, the left parts are not in the picture).
0x1840 0001 1000 0100 0000
0x0022 0000 0000 0010 0010
--------------------------
0x1862 0001 1000 0110 0010
pDes[0] would end up as 0x1862 or decimal 6242.
C++ will treat a character as a number according to its encoding. So, assuming ASCII, 'a' is 97 (which has the bit pattern 0110_0001) and 'b' is 98 (bit pattern 0110_0010).
Once you think of them as numbers, bit operations on characters should be a bit clearer.
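A trivial check makes the point (assuming ASCII):

#include <cstdio>

int main()
{
    std::printf("%d %d\n", 'a', 'b');   // prints "97 98" on an ASCII system
    return 0;
}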
In C, all characters are also integers. That means that, in ASCII, "abc" is equivalent to (char[]){0x61, 0x62, 0x63, 0}.
The & is not the logical AND operator (&&). It is the bitwise AND, which computes the AND at bit-level, e.g.
'k' = 0x6b -> 0 1 1 0 1 0 1 1
0x1c -> 0 0 0 1 1 1 0 0 (&
———————————————————
8 <- 0 0 0 0 1 0 0 0
The main purpose of & 0x1c here is to extract bits #2 ~ #4 from pData[0]. The >> 2 afterwards removes the extra zeros at the end.
Similarly, the & 0x3f is to extract bits #0 ~ #5 from pData[1].
The << 6 pushes 6 zeros at the least significant end of the bits. Assuming pDes[0] is also a char, the most significant 6 bits will be discarded:
'k' = 0x6b -> 0 1 1 0 1 0 1 1
<< 6 = 0 1 1 0 1 0 1 1 0 0 0 0 0 0
xxxxxxxxxxx—————————————————
0xc0 <- 1 1 0 0 0 0 0 0
In terms of bits, if
pData -> [ pData[1]: b7 b6 b5 b4 b3 b2 b1 b0 ] [ pData[0]: a7 a6 a5 a4 a3 a2 a1 a0 ]
then
pDes -> [ pDes[1]: 0 0 0 0 0 a4 a3 a2 ] [ pDes[0]: a1 a0 b5 b4 b3 b2 b1 b0 ]
This looks like an operation to pack three values into a 6-5-5 bit structure.
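If you want to confirm that layout, running the two statements on known input and printing the destination bytes shows it directly. A small sketch assuming ASCII and an unsigned char destination; the input "zb" is my own choice here, picked so that the upper field comes out non-zero:

#include <cstdio>

int main()
{
    const char pData[] = "zb";          // 'z' = 0x7A = 0111 1010, 'b' = 0x62 = 0110 0010
    unsigned char pDes[2];

    pDes[1] = (pData[0] & 0x1c) >> 2;              // bits 2-4 of 'z'            -> 0000 0110
    pDes[0] = (pData[0] << 6) | (pData[1] & 0x3f); // a1 a0 of 'z', b5-b0 of 'b' -> 1010 0010

    std::printf("%02x %02x\n", pDes[1], pDes[0]);  // prints "06 a2"
    return 0;
}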