The code I am trying to understand overwrites a section of a game process's memory (windows.h, WriteProcessMemory) in order to modify a parameter in the game (for example, strength). The values would most likely be integers.
The code attempts the replacement with this call:
WriteProcessMemory( GameHandle, (BYTE*)StrengthMemoryAddress, &StrengthValue, sizeof(StrengthValue), NULL);
where StrengthMemoryAddress is a pre-calculated dynamic address and StrengthValue is the following:
byte StrengthValue[] = { 0x39, 0x5, 0x0, 0x0 };
It replaces strength with 1337.
My question is basically how the byte array works in this function. From Google I know that the hex value of 1337 is 0x539.
How come you have to reverse it in the byte array? I see that he first puts 0x39 then 0x5, which I concluded probably combines to 0x539 in some reverse order. Also, why do you need the extra 0x0 at the end - can't you just leave it out?
Thanks
From Google I know that the hex value of 1337 is 0x539.
Or it is 0x00000539, which is the same value written as a 4-byte integer. Now, if you write this integer into memory in little-endian order, you have to store the bytes as follows (the least significant byte, 0x39, goes first):
Memory Address   Value
1000             0x39
1001             0x05
1002             0x00
1003             0x00
So that has to do with endianness. You may want to read more on that topic.
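If you want to see this on your own machine, here is a small sketch of my own (not from the answer above) that prints the bytes of an int in the order they sit in memory; on an x86/x64 machine it prints 39 05 00 00.

#include <cstdio>

int main()
{
    int value = 1337;  // 0x00000539
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);

    for (unsigned n = 0; n < sizeof value; ++n)
        std::printf("%02x ", bytes[n]);   // 39 05 00 00 on a little-endian CPU
    std::printf("\n");
}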
You were expecting the 0x39 to be the highest byte (Big Endian), but you ended up on an architecture where it is the lowest byte (Little Endian).
Looking at an int logically as:
[ BYTE 0 ][ BYTE 1 ][ BYTE 2 ][ BYTE 3 ]
 * 256^3    * 256^2    * 256      * 1
   MSB                             LSB
But that does not mean the architecture you are on maps a char array onto it that way. In fact, it did the opposite.
Value   [what you expected]   [what you got]
        BIG ENDIAN            LITTLE ENDIAN
0x39    BYTE 0                BYTE 3
0x05    BYTE 1                BYTE 2
0x00    BYTE 2                BYTE 1
0x00    BYTE 3                BYTE 0
If you do not set all 4 bytes, then the missing bytes are uninitialized memory, and reading them through the int you create is undefined behavior. In practice this will likely just leave an unexpected value in the missing byte (whatever happened to be there before), but the compiler is free to do whatever it wants, like removing code you thought would do something, leading to very unexpected behavior for you.
The numbers you're writing have to be in Little Endian format. I recommend you read up on Endianness.
As for the extra 0 at the end: You have to overwrite the entirety of the byte-length of the int, or you'll risk leaving behind old values which would corrupt the value of the int you're writing.
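For what it's worth, a variant that sidesteps the hand-written byte array (my own sketch, not the original author's code; it assumes the same GameHandle and StrengthMemoryAddress from the question and a little-endian target such as x86) is to write the value as an int and let the compiler lay out the bytes:

#include <windows.h>

// Hypothetical wrapper around the call from the question; on a little-endian
// target the int 1337 is already stored in memory as 39 05 00 00, so no
// manual byte reversal is needed.
void WriteStrength(HANDLE GameHandle, LPVOID StrengthMemoryAddress)
{
    int StrengthValue = 1337;                 // 0x00000539
    WriteProcessMemory(GameHandle,
                       StrengthMemoryAddress,
                       &StrengthValue,
                       sizeof(StrengthValue), // writes all 4 bytes
                       NULL);
}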
Related
To start with, I have a char array that stores data:
unsigned short no = 509;      // type assumed here; the question only says no holds 509
unsigned char dat[3];
memset(dat, 0, sizeof(dat));  // memset/memcpy come from <cstring>
memcpy(dat, &no, 2);
When I inspect dat, it contains the hex bytes 0xfd 0x01.
As the value of no is 509, I expected the hex to be 0x01 0xfd.
I'm wondering whether I should be concerned about the order of the bytes,
and whether I should change the order. Many thanks
Your system is little endian. Byte order is hardware dependent, and on a little-endian platform the first byte is the least significant one when treated as part of a multi-byte value. Look up: https://en.wikipedia.org/wiki/Endianness
Essentially, if the CPU is little endian, then the value 0x12345689 would be represented as a set of bytes starting with 0x89. On big endian it's the opposite order, and on a bi-endian CPU the order may even be switched at run-time.
The question really is: what do you want to do next? On your current hardware (little endian) this is simply how the system orders the bytes of a numeric value. The least significant byte comes first: 0xfd 0x01.
In case you really want to swap this byte order, for whatever reason, check out: How do I convert between big-endian and little-endian values in C++?
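For the 16-bit case in this question, a swap is just a one-liner; here is a minimal sketch (the helper name swap16 is mine, not from the linked answer):

#include <cstdint>

// Reverses the two bytes of a 16-bit value: 0x01FD becomes 0xFD01 and back.
std::uint16_t swap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

Copying swap16(no) instead of no into dat would leave it holding 0x01 0xfd on a little-endian machine.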
I'm trying to read some bytes from a file.
This is what I've done:
struct HeaderData {
char format[2];
char n_trks[2];
char division[2];
};
HeaderData* header = new HeaderData;
Then, to get the data directly from the file to header I do
file.read(reinterpret_cast<char*>(header), sizeof(HeaderData));
If the first two bytes are 00 06, header->format[0] will be 00 and header->format[1] will be 06. These two bytes combined represent the number 0x0006, which is 6 in decimal, the desired value.
When I do something like
*reinterpret_cast<unsigned*>(header->format) // In this case, the result is 0x0600
it returns the number 0x0600, which I did not expect, so it seems that the bytes are read in reverse order.
My question is what is some workaround to correctly read the numbers as unsigned.
This is going to be an endianness mismatch.
When you read in from the file in that fashion, the bytes will be placed into your structure in the exact order they were in the file.
When you read from the structure with an unsigned, the processor will interpret those bytes in whatever order the architecture requires it to do (most are hardcoded but some can be set to either order).
Or to put it another way
These two bytes combined represent the number 0x0006, which is 6 in decimal.
That's not necessarily remotely true. It's perfectly permissible for the processor of your choice to represent 6 in decimal as 0x06 0x00, this would be the little-endian scheme which is used on very common processors like x86. Representing it as 0x00 0x06 would be big-endian.
As M.M has stated in his comment, if your format explicitly defines the integer to be little-endian, you should explicitly read it as little-endian, e.g. format[0] + format[1] * 256, or if it is defined to be big-endian, you should read it as format[0] * 256 + format[1]. Don't rely on the processor's endianness happening to match the endianness of the data.
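A minimal sketch of that advice (the helper names are mine, and I'm assuming 2-byte fields as in the struct above): reconstruct the value arithmetically from the raw bytes, so the host's endianness never enters into it.

#include <cstdint>

// Build a 16-bit value from two raw bytes, independent of host byte order.
std::uint16_t readBigEndian16(const char* p)     // for big-endian file formats
{
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(p[0]) * 256 + static_cast<unsigned char>(p[1]));
}

std::uint16_t readLittleEndian16(const char* p)  // for little-endian file formats
{
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(p[0]) + static_cast<unsigned char>(p[1]) * 256);
}

With the bytes 00 06 from the question, readBigEndian16(header->format) yields 6.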
I am trying to understand the difference caused by the byte flip functionality I see in the Calculator on Mac in Programmer's view.
So I wrote a program to byte-swap a value, which is what we do to go from little to big endian or the other way round, and I call it a byte swap. But when I see byte flip, I do not understand what exactly it is and how it is different from a byte swap. I did confirm that the results are different.
For example, for an int with value 12976128
Byte Flip gives me 198;
Byte swap gives me 50688.
I want to implement an algorithm for byte flip, since 198 is the value I want to get while reading something. Everything on Google says byte flip is the same as byte swap, which isn't the case for me.
Byte flip and byte swap are synonyms.
The results you see are just two different ways of swapping the bytes, depending on whether you look at the number as a 32-bit number (consisting of 4 bytes), or as the smallest size of number that can hold 12976128, which is 24 bits or 3 bytes.
The 4-byte swap is more usual in computing, because 32-bit processors are currently predominant (even 64-bit architectures still do much of their arithmetic on 32-bit numbers, partly because of backward-compatible software infrastructure, partly because it is enough for many practical purposes). But the Mac Calculator seems to use the minimum-width swap, in this case a 3-byte swap.
12976128, when converted to hexadecimal, gives you 0xC60000. That's 3 bytes total; each hexadecimal digit is 4 bits, or half a byte, wide. The bytes to be swapped are 0xC6, zero, and another zero.
After a 3-byte swap: 0x0000C6 = 198
After a 4-byte swap: 0x0000C600 = 50688
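Here is a rough sketch of the two behaviours (my own code, not Apple's; the function names are made up): swap32 always reverses all four bytes, while flipMinimumWidth reverses only as many bytes as the value needs, which reproduces the Calculator's result for this input.

#include <cstdint>
#include <iostream>

// Reverse all four bytes of a 32-bit value.
std::uint32_t swap32(std::uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Reverse only the bytes the value actually occupies (at least one).
std::uint32_t flipMinimumWidth(std::uint32_t v)
{
    int bytes = 1;
    for (std::uint32_t t = v >> 8; t != 0; t >>= 8)
        ++bytes;

    std::uint32_t result = 0;
    for (int i = 0; i < bytes; ++i)
        result = (result << 8) | ((v >> (8 * i)) & 0xFFu);
    return result;
}

int main()
{
    std::cout << flipMinimumWidth(12976128) << '\n';  // 198   (3-byte flip)
    std::cout << swap32(12976128) << '\n';            // 50688 (4-byte swap)
}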
I can't say that my C/C++ is bad, but I've run into some interesting syntax.
I have this code:
int i=7;
char* m=(char*)&i;
m[2]=9;
cout<<i;
Its output is 589831. Can someone explain to me in detail what is going on here?
The integer i very likely takes 4 bytes, arranged with the least significant byte first (little endian). In memory the values look like this:
0x07 0x00 0x00 0x00
You changed the value at index 2 so now it looks like:
0x07 0x00 0x09 0x00
If you reverse the bytes and put them back together, they make the hex value 0x00090007 which is the same as 589831 in decimal.
a 4-byte integer is filled with the number 7.
the 4-byte integer is mapped to an array of four single bytes (chars). On a little-endian architecture like x86 the least significant bytes come first in a number, so the array looks like this in memory: { 07, 00, 00, 00 }
the 3rd byte of the integer slash byte array is changed to 9. It now looks like this: { 07, 00, 09, 00 }
the resulting integer (hexadecimal 90007) is written to stdout (in decimal format: 589831).
Long story short, it's an example how you can manipulate individual bytes in a multi-byte integer.
You are casting the integer address to a char* then modifying it using array notation. This step
m[2] = 9;
is the same as the pointer arithmetic
*(m+2) = 9;
that is to say, it is modifying the byte at the address of m + 2 bytes. Thus you have changed one of the bytes (the 3rd) in your initial integer value.
Here is my breakdown of what is going on, then an explanation.
// An integer on the stack, probably 4 bytes big, but we can't say that for sure.
int i=7; // Looks like 0x00000007 in memory. Endianness needs to be considered.
// Treat that integer as a \0 terminated string.
char* m=(char*)&i; // Acts as an empty string since the first byte is a 0, but we can't count on that.
// Set the byte at index 2 (the third byte) to 9.
m[2]=9; // Results in i being 0x00090007 (589831 decimal) on a little-endian architecture. Once again, can't count on it.
// Print the modified integer.
cout<<i;
This is an incredibly dangerous and stupid thing to do for three reasons...
You should not count on the endianness of your architecture. Your code may end up running on a CPU that has a different underlying representation of what an int is.
You cannot count on int to always be 4 bytes.
You now have a char* that could cause a crash if you ever perform a string operation on it. In your specific case, it will print an empty string, but it would not take much for that integer to lack a 0 byte, and the string operation would then go on reading other parts of your stack.
If you really, really, really need to do this, the preferred method is to use unions but this kind of bit twiddling is very error prone and unions do very little to help.
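If you do need to poke at individual bytes, a sketch of a tamer approach (mine, not the answerer's union suggestion) is to copy the object representation out with memcpy, edit it, and copy it back; the endianness and size caveats above still apply.

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    std::int32_t i = 7;

    unsigned char bytes[sizeof i];
    std::memcpy(bytes, &i, sizeof i);  // copy out the int's object representation
    bytes[2] = 9;                      // modify the byte at index 2
    std::memcpy(&i, bytes, sizeof i);  // copy it back

    std::cout << i << '\n';            // 589831 on a little-endian machine
}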
int i=7 reserves 4 bytes of memory for the integer and, depending on CPU architecture (let's say yours is x86), would produce something like this in memory:
7 0 0 0
Then a pointer m is created to point at the beginning of 7 0 0 0.
After m[2] = 9 the memory looks like
7 0 9 0 (arrays are zero based).
Then you print out i, which is now 0x00090007 = 589831.
What's the concept behind zip compression? I can understand the concept of removing empty space etc, but presumably something has to be added to say how much/where that free space needs to be added back in during decompression?
What's the basic process for compressing a stream of bytes?
A good place to start would be to look up the Huffman compression scheme. The basic idea behind Huffman is that in a given file some bytes appear more frequently than others (in a plain-text file many bytes won't appear at all). Rather than spending 8 bits to encode every byte, why not use a shorter bit sequence to encode the most common characters, and longer sequences to encode the less common characters (these sequences are determined by creating a Huffman tree).
Once you get a handle on using these trees to encode/decode files based on character frequency, imagine that you then start working on word frequency: instead of encoding "they" as a sequence of 4 characters, why not consider it a single symbol due to its frequency, allowing it to be assigned its own leaf in the Huffman tree. This is more or less the basis of ZIP and other lossless compression schemes: they look for common "words" (sequences of bytes) in a file (including sequences of just 1 byte if common enough) and use a tree to encode them. The zip file then only needs to include the tree info (a copy of each sequence and the number of times it appears) to allow the tree to be reconstructed and the rest of the file to be decoded.
Follow up:
To better answer the original question, the idea behind lossless compression is not so much to remove empty space, but to remove redundant information.
If you created a database to store music lyrics, you'd find a lot of space was being used to store the chorus, which repeats several times. Instead of using all that space, you could simply place the word CHORUS before the first instance of the chorus lines, and then every time the chorus is to be repeated, just use CHORUS as a placeholder. (In fact, this is pretty much the idea behind LZW compression: in LZW each line of the song would have a number shown before it. If a line repeats later in the song, rather than writing out the whole line, only the number is shown.)
The basic concept is that instead of using eight bits to represent each byte, you use shorter representations for more frequently occurring bytes or sequences of bytes.
For example, if your file consists solely of the byte 0x41 (A) repeated sixteen times, then instead of representing it as the 8-bit sequence 01000001 shorten it to the 1-bit sequence 0. Then the file can be represented by 0000000000000000 (sixteen 0s). So then the file of the byte 0x41 repeated sixteen times can be represented by the file consisting of the byte 0x00 repeated twice.
So what we have here is that for this file (0x41 repeated sixteen times) the bits 01000001 don't convey any additional information over the bit 0. So, in this case, we throw away the extraneous bits to obtain a shorter representation.
That is the core idea behind compression.
As another example, consider the eight byte pattern
0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48
and now repeat it 2048 times. One way to follow the approach above is to represent bytes using three bits.
000 0x41
001 0x42
010 0x43
011 0x44
100 0x45
101 0x46
110 0x47
111 0x48
Now we can represent the above byte pattern by 00000101 00111001 01110111 (this is the three-byte pattern 0x05 0x39 0x77) repeated 2048 times.
But an even better approach is to represent the byte pattern
0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48
by the single bit 0. Then we can represent the above byte pattern by 0 repeated 2048 times which becomes the byte 0x00 repeated 256 times. Now we only need to store the dictionary
0 -> 0x41 0x42 0x43 0x44 0x45 0x46 0x47 0x48
and the byte pattern 0x00 repeated 256 times and we compressed the file from 16,384 bytes to (modulo the dictionary) 256 bytes.
That, in a nutshell, is how compression works. The whole business comes down to finding short, efficient representations of the bytes and byte sequences in a given file. That's the simple idea, but the details (finding the representation) can be quite challenging.
See for example:
Data compression
Run length encoding
Huffman compression
Shannon-Fano coding
LZW
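Run length encoding, mentioned in the list above, is the easiest of these to sketch. Here is a minimal, hypothetical encoder (the function name is mine, and this is nothing like ZIP's actual format) that turns each run of identical bytes into a (count, byte) pair, so sixteen 0x41 bytes become just two bytes:

#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> rleEncode(const std::vector<std::uint8_t>& input)
{
    std::vector<std::uint8_t> output;
    std::size_t i = 0;
    while (i < input.size())
    {
        std::uint8_t value = input[i];
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == value && run < 255)
            ++run;
        output.push_back(static_cast<std::uint8_t>(run)); // run length (max 255)
        output.push_back(value);                          // the repeated byte
        i += run;
    }
    return output;
}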
The concept behind compression is basically statistical. If you've got a series of bytes, the chance of byte N being X in practice depends on the value distribution of the previous bytes 0..N-1. Without compression, you allocate 8 bits for each possible value X. With compression, the number of bits allocated for each value X depends on the estimated probability p(N,X).
For instance, given a sequence "aaaa", a compression algorithm can assign a high value to p(5,a) and lower values to p(5,b). When p(X) is high, the bitstring assigned to X will be short; when p(X) is low, a long bitstring is used. In this way, if p(N,X) is a good estimate, then the average bitstring will be shorter than 8 bits.
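To put a number on "shorter than 8 bits", here is a small sketch of my own (it uses a simple order-0 model rather than the context-dependent p(N,X) described above) that computes the Shannon entropy of the byte histogram, i.e. the best average code length such a model could achieve:

#include <cmath>
#include <cstddef>
#include <iostream>
#include <string>

// Average bits per byte under an order-0 model: -sum over X of p(X) * log2 p(X).
double estimatedBitsPerByte(const std::string& data)
{
    std::size_t counts[256] = {};
    for (unsigned char c : data)
        ++counts[c];

    double bits = 0.0;
    for (std::size_t n : counts)
    {
        if (n == 0)
            continue;
        double p = static_cast<double>(n) / data.size();
        bits -= p * std::log2(p);
    }
    return bits;
}

int main()
{
    std::cout << estimatedBitsPerByte("aaaa") << '\n';     // 0: only one symbol appears
    std::cout << estimatedBitsPerByte("abababab") << '\n'; // 1 bit per byte
}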