(Bitwise Logic) What does AND'ing something with 0x7F accomplish?

I'm trying to understand a program that I have disassembled. I'm understanding it so far.
However, I do not understand why the program is AND'ing an integer with 0x7F.
It also likes to AND an integer with 0xFF. The program is somewhat of a random number generator.
What does this accomplish?
I think AND'ing with 0xFF takes the lower byte (of a register) and discards the rest?
Specifically in MIPS ASM:
## r2 = 0xfd r3 = 0x10 ##
andi r2,r2,0x00ff # What? Why?
andi r3,r3,0x007f # What? Why?

"AND"ing with 0x7f is used to remove the most significant bit  as 0x7f is 0111 1111 and ensures that number is positive. "AND"ing with 0xff does the opposite.

Related

Using unnecessary hexadecimal notation?

Whilst reading Apple's dlfcn.h header I came across these macros:
#define RTLD_LAZY 0x1
#define RTLD_NOW 0x2
#define RTLD_LOCAL 0x4
#define RTLD_GLOBAL 0x8
Is there any reason why the author of this header wrote these as hexadecimal numbers when the prefixing 0x could be removed safely? 0x1, 0x2 etc are the same numbers as without 0x.
Is this just personal coding style?
It's conventional to use hexadecimal rather than decimal for powers-of-2 sequences, because it scales legibly.
Sure, there are only four options here, but consider a possible extension:
0x01
0x02
0x04
0x08
0x10
0x20
0x40
0x80
// ...
While the equivalent sequence rendered in decimal notation will be instantly familiar to any programmer, it's not as legible/symmetrical:
1
2
4
8
16
32
64
128
// ...
So, using hexadecimal here becomes a convention.
Besides that, sure, style. Some people like to use hex for "numbers used by the computer" because it looks kind of robotic, as opposed to decimal for "numbers used by humans".
Consider also that values built from these constants are likely to be manipulated using bitwise operators (which are similarly convenient to reason about in hex), and viewed in debuggers that give byte values in hexadecimal. It's easier to cross-reference the values in source code if they use the same base as the program's operations and your tools. 0x22 is easier to understand in this context than 34 is.
Ultimately, you may as well ask why we ever use hexadecimal instead of decimal, since there is always a valid conversion between the two. The truth is that some bases are just more convenient than others in certain scenarios. You wouldn't count on your fingers in binary, because you have ten of them.

Checking for a specific value sequence within data during a CRC

I'd like to preface this by saying that my knowledge of CRC techniques is very limited, I spent most of the day googlin' and reading things, but I can't quite find what I'm looking for. It may very well not be possible, if so just let me know!
What I have is a sequence of seemingly random data:
0xAF 0xBC 0x1F 0x5C... etc
Within this data, there is a field that is not random (that I put there), and I want to use a CRC check of the entire data set to see if this field is set to the correct value (let's say 0x12 0x34 0x56 0x78). I am trying to do this sneakily, and this is key, because I don't want a casual observer to know that I am looking for that field. This is why I don't just read out the location I want and compare against the expected value.
The field's value is constant, the rest is pretty much random. There are some fields here and there that will also be constant if that helps.
Is this possible to do? I am not limited in the number of times I do the CRC check, or which direction I go through the data, or whether I change the polynomial, or really anything. I can also start from the middle of the array, or the third, or whatever, but I would prefer not to start near my field of interest.
The only function that comes to mind that will do what you want is a discrete wavelet transform. (A CRC will always depend on all of the bits that you are computing it over — that's kind of the point.)
You can find the coefficients to apply to the set of discrete wavelet basis functions that will give you a function with a finite basis that covers only the region of interest, using the orthogonality of the basis functions. It will appear that the wavelet functions are over the entire message, but the coefficients are rigged so that the values outside the region of interest cancel in the sum.
While this all may not be obvious to a casual reader of the code, it would be straightforward to write down the functions and coefficients, and multiply it out to see what bytes in the message are selected by the coefficients.
OK, so, to confirm, you have something like this as your data:
0xAF 0xBC 0x1F 0x5C 0x11 0x1F 0x5C 0x11
0x2D 0xAB 0xBB 0xCC 0x00 0xBB 0xCC 0x00
0x12 0x34 0x56 0x78 0xFF 0x56 0x78 0xFF
and you're trying to isolate something in a particular location of that data, e.g., to find the 0x12 0x34 0x56 0x78 value there.
To clarify, you're wanting to 1) check that value (that particular address range's value), and 2) then do a crc on the whole? Or are you wanting to integrate the hunt for the value into the crc algorithm?
Honestly trying to understand where you're going. I realize this isn't really an answer, but it's a better place for this than in a comment.

What are benefits of using Hexadecimal notation? [duplicate]

Hey! I was looking at this code at http://www.gnu.org/software/m68hc11/examples/primes_8c-source.html
I noticed that in some situations they used hex numbers, like in line 134:
for (j = 1; val && j <= 0x80; j <<= 1, q++)
Now why would they use the 0x80? I am not that good with hex but I found an online hex-to-decimal converter and it gave me 128 for 0x80.
Also before line 134, on line 114 they have this:
small_n = (n & 0xffff0000) == 0;
The hex to decimal gave me 4294901760 for that hex number.
So here in this line they are making a bit AND and comparing the result to 0??
Why not just use the number?
Can anyone please explain and please do give examples of other situations.
Also I have seen large lines of code where it's just hex numbers and never really understood why :(
In both cases you cite, the bit pattern of the number is important, not the actual number.
For example,
In the first case,
j is going to be 1, then 2, 4, 8, 16, 32, 64 and finally 128 as the loop progresses.
In binary, that is,
0000:0001, 0000:0010, 0000:0100, 0000:1000, 0001:0000, 0010:0000, 0100:0000 and 1000:0000.
There's no option for binary constants in C (until C23) or C++ (until C++14), but it's a bit clearer in Hex:
0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, and 0x80.
In the second example,
the mask clears the lower two bytes of the value, so the comparison with 0 tests whether any of the upper 16 bits are set.
So given a value of 1,234,567,890 we want to end up with 1,234,567,168.
In hex, it's clearer: start with 0x4996:02d2, end with 0x4996:0000.
There's a direct mapping between hex (or octal for that matter) digits and the underlying bit patterns, which is not the case with decimal. A decimal '9' represents something different with respect to bit patterns depending on what column it is in and what numbers surround it, so it doesn't have a direct relationship to a bit pattern. In hex, a '9' always means '1001', no matter which column: 0x9 = '1001', 0x95 = '1001 0101', and so forth.
As a vestige of my 8-bit days I find hex a convenient shorthand for anything binary. Bit twiddling is a dying skill. Once (about 10 years ago) I saw a third year networking paper at university where only 10% (5 out of 50 or so) of the people in the class could calculate a bit-mask.
It's a bit mask. Hex values make it easy to see the underlying binary representation. n & 0xffff0000 keeps the top 16 bits of n and clears the rest. 0xffff0000 means "16 1s followed by 16 0s in binary".
0x80 means "10000000", so you start with "00000001" and keep shifting that bit to the left ("00000010", "00000100", etc.) until you reach "10000000".
0xffff0000 is easy to understand that it's 16 times "1" and 16 times "0" in a 32 bit value, while 4294901760 is magic.
I find it maddening that the C family of languages have always supported octal and hex but not binary. I have long wished that they would add direct support for binary:
int mask = 0b00001111;
Many years/jobs ago, while working on a project that involved an enormous amount of bit-level math, I got fed up and generated a header file that contained defined constants for all possible binary values up to 8 bits:
#define b0 (0x00)
#define b1 (0x01)
#define b00 (0x00)
#define b01 (0x01)
#define b10 (0x02)
#define b11 (0x03)
#define b000 (0x00)
#define b001 (0x01)
...
#define b11111110 (0xFE)
#define b11111111 (0xFF)
It has occasionally made certain bit-level code more readable.
The single biggest use of hex is probably in embedded programming. Hex numbers are used to mask off individual bits in hardware registers, or split multiple numeric values packed into a single 8, 16, or 32-bit register.
When specifying individual bit masks, a lot of people start out by:
#define bit_0 1
#define bit_1 2
#define bit_2 4
#define bit_3 8
#define bit_4 16
etc...
After a while, they advance to:
#define bit_0 0x01
#define bit_1 0x02
#define bit_2 0x04
#define bit_3 0x08
#define bit_4 0x10
etc...
Then they learn to cheat, and let the compiler generate the values as part of compile time optimization:
#define bit_0 (1<<0)
#define bit_1 (1<<1)
#define bit_2 (1<<2)
#define bit_3 (1<<3)
#define bit_4 (1<<4)
etc...
Sometimes the visual representation of values in hex makes code more readable or understandable. For instance, bit masking or other use of individual bits becomes non-obvious when looking at decimal representations of numbers.
This can sometimes have to do with the amount of space a particular value type has to offer, so that may also play a role.
A typical example might be in a binary setting, where instead of using decimal to show some values, we use binary.
Let's say an object had a non-exclusive set of three properties, each either on or off. One way to represent the state of those properties is with 3 bits.
Valid representations are 0 through 7 in decimal, but that is not so obvious. More obvious is the binary representation:
000, 001, 010, 011, 100, 101, 110, 111
Also, some people are just very comfortable with hex. Note also that hard-coded magic numbers are just that, whichever numbering system you write them in.
I hope that helps.
Hex is generally used instead of decimal because the computer works with bits (binary numbers), and when you're working with bits hexadecimal is easier to follow: converting from hex to binary is easier than converting from decimal to binary.
0xFF = 1111 1111 ( F = 1111 )
but
255 = 1111 1111
because
255 / 2 = 127 (remainder 1)
127 / 2 = 63 (remainder 1)
63 / 2 = 31 (remainder 1)
... etc
Can you see that? It's much simpler to go from hex to binary.
There are 8 bits in a byte. Hex, base 16, is terse. Any possible byte value is expressed using two characters from the collection 0..9, plus a,b,c,d,e,f.
Base 256 would be more terse. Every possible byte could have its own single character, but most human languages don't use 256 characters, so Hex is the winner.
To understand the importance of being terse, consider that back in the 1970s, when you wanted to examine your megabyte of memory, it was printed out in hex. The printout would use several thousand pages of big paper. Octal would have wasted even more trees.
Each hex (hexadecimal) digit represents 4 bits of data, 0 to 15 in decimal or 0 to F in hex. Two hex digits represent a byte.
To be more precise, hex and decimal values are all NUMBERS. The radix (base 10, 16, etc.) is a way to present those numbers in a manner that is either clearer or more convenient.
When discussing "how many of something there are" we normally use decimal. When we are looking at addresses or bit patterns on computers, hex is usually preferred, because often the meaning of individual bytes might be important.
Hex and octal have the property that their bases are powers of two, so they map onto groupings of bits nicely. Hex maps 4 bits (a nibble) to one hex digit (0-F), so a byte is written as two hex digits (00-FF). Octal was popular on Digital Equipment (DEC) and other older machines, but one octal digit maps to three bits, so it doesn't line up with byte boundaries as nicely.
Overall, the choice of radix is a way to make your programming easier - use the one that matches the domain best.
Looking at the file, that's some pretty groady code. Hope you are good at C and not using it as a tutorial...
Hex is useful when you're working directly at the bit level or just above it, e.g. working on a driver where you're looking directly at the bits coming in from a device and twiddling the results so that someone else can read a coherent result. It's a compact, fairly easy-to-read representation of binary.

why are the bytes in byte array reversed in C++

the code i am trying to understand overwrites a section of a game process memory (windows.h, WriteProcessMemory) in order to modify a parameter in the game (for example, strength). the values would most likely be integers
the code attempts replacement with this function
WriteProcessMemory( GameHandle, (BYTE*)StrengthMemoryAddress, &StrengthValue, sizeof(StrengthValue), NULL);
where StrengthMemoryAddress is a pre-calculated dynamic address and StrengthValue is the following:
byte StrengthValue[] = { 0x39, 0x5, 0x0, 0x0 };
it replaces strength with 1337
my question is basically how the byte array works in this function. from google i know that the hex value of 1337 is 0x539.
how come you have to reverse it in the byte array? i see that he first puts 0x39 then 0x5, which i concluded probably combines to 0x539 in some reverse order. also, why do you need the extra 0x0 at the end - can't you just leave it out?
thanks
from google i know that the hex value of 1337 is 0x539.
Or it is 0x00000539, which is the same value written as a 4-byte integer. Now if you write this integer to memory in little-endian order, the bytes are stored as follows (the least significant byte, 0x39, goes first):
Memory Address   Value
1000             0x39
1001             0x05
1002             0x00
1003             0x00
So that has to do with endianness. You may want to read more on that topic.
You were expecting the 0x39 to be the highest byte (Big Endian), but you ended up on an architecture where it is the lowest byte (Little Endian).
Looking at an int logically as:
[ BYTE 0 ][ BYTE 1 ][ BYTE 2 ][ BYTE 3 ]
  *256^3    *256^2    *256      *1
  MSB                           LSB
But that does not mean the architecture you are on maps a char array in that way. In fact, it did the opposite.
value   [what you expected]   [what you got]
        BIG ENDIAN            LITTLE ENDIAN
0x39    BYTE 0                BYTE 3
0x05    BYTE 1                BYTE 2
0x00    BYTE 2                BYTE 1
0x00    BYTE 3                BYTE 0
If you do not set all 4 bytes, then the missing bytes are uninitialized memory, and using it through the int you create is considered undefined behavior. This will likely just leave an unexpected value in the missing bytes (whatever happened to be there before), but the compiler is free to do whatever it wants, like removing code you thought would do something, leading to very unexpected behavior for you.
The numbers you're writing have to be in Little Endian format. I recommend you read up on Endianness.
As for the extra 0 at the end: You have to overwrite the entirety of the byte-length of the int, or you'll risk leaving behind old values which would corrupt the value of the int you're writing.
