Hey! I was looking at this code at http://www.gnu.org/software/m68hc11/examples/primes_8c-source.html
I noticed that in some situations they used hex numbers, like in line 134:
for (j = 1; val && j <= 0x80; j <<= 1, q++)
Now why would they use the 0x80? I am not that good with hex but I found an online hex-to-decimal converter and it gave me 128 for 0x80.
Also before line 134, on line 114 they have this:
small_n = (n & 0xffff0000) == 0;
The hex-to-decimal converter gave me 4294901760 for that hex number.
So here in this line they are doing a bitwise AND and comparing the result to 0??
Why not just use the number?
Can anyone please explain and please do give examples of other situations.
Also I have seen large lines of code where it's just hex numbers and never really understood why :(
In both cases you cite, the bit pattern of the number is important, not the actual number.
For example,
In the first case,
j is going to be 1, then 2, 4, 8, 16, 32, 64 and finally 128 as the loop progresses.
In binary, that is,
0000:0001, 0000:0010, 0000:0100, 0000:1000, 0001:0000, 0010:0000, 0100:0000 and 1000:0000.
There's no option for binary constants in C (until C23) or C++ (until C++14), but it's a bit clearer in Hex:
0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, and 0x80.
In the second example,
the goal was to remove the lower two bytes of the value.
So given a value of 1,234,567,890 we want to end up with 1,234,567,168.
In hex, it's clearer: start with 0x4996:02d2, end with 0x4996:0000.
There's a direct mapping between hex (or octal for that matter) digits and the underlying bit patterns, which is not the case with decimal. A decimal '9' represents something different with respect to bit patterns depending on what column it is in and what numbers surround it - it doesn't have a direct relationship to a bit pattern. In hex, a '9' always means '1001', no matter which column. 9 = '1001', 95 = '*1001*0101' and so forth.
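To make both cases concrete, here is a minimal sketch in C. The helper names (count_set_low_bits, high_half_is_zero, clear_low_half) are made up for illustration and are not taken from primes.c; the sketch only assumes the value is a 32-bit unsigned integer.

```c
#include <stdint.h>

/* Walk a single set bit from 0x01 up to 0x80, the same way the loop in
   the question does, and count how many of the 8 bits of val are set.
   (Hypothetical helper, not from primes.c.) */
int count_set_low_bits(uint8_t val)
{
    int count = 0;
    unsigned j;

    for (j = 0x01; j <= 0x80; j <<= 1) {
        if (val & j)
            count++;
    }
    return count;
}

/* Nonzero when the upper 16 bits of n are all zero, i.e. n fits in 16 bits. */
int high_half_is_zero(uint32_t n)
{
    return (n & 0xffff0000u) == 0;
}

/* Keep the upper two bytes of n and clear the lower two. */
uint32_t clear_low_half(uint32_t n)
{
    return n & 0xffff0000u;
}
```

With these, clear_low_half(0x499602d2) gives 0x49960000, which is the 1,234,567,890 to 1,234,567,168 step from above written in hex.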
As a vestige of my 8-bit days I find hex a convenient shorthand for anything binary. Bit twiddling is a dying skill. Once (about 10 years ago) I saw a third year networking paper at university where only 10% (5 out of 50 or so) of the people in the class could calculate a bit-mask.
It's a bit mask. Hex values make it easy to see the underlying binary representation. n & 0xffff0000 keeps the top 16 bits of n and clears the rest. 0xffff0000 means "16 1s and 16 0s in binary".
0x80 means "10000000", so you start with "00000001" and continue shifting that bit over to the left, "00000010", "00000100", etc., until "10000000".
0xffff0000 is easy to understand that it's 16 times "1" and 16 times "0" in a 32 bit value, while 4294901760 is magic.
I find it maddening that the C family of languages have always supported octal and hex but not binary. I have long wished that they would add direct support for binary:
int mask = 0b00001111;
Many years/jobs ago, while working on a project that involved an enormous amount of bit-level math, I got fed up and generated a header file that contained defined constants for all possible binary values up to 8 bits:
#define b0 (0x00)
#define b1 (0x01)
#define b00 (0x00)
#define b01 (0x01)
#define b10 (0x02)
#define b11 (0x03)
#define b000 (0x00)
#define b001 (0x01)
...
#define b11111110 (0xFE)
#define b11111111 (0xFF)
It has occasionally made certain bit-level code more readable.
The single biggest use of hex is probably in embedded programming. Hex numbers are used to mask off individual bits in hardware registers, or split multiple numeric values packed into a single 8, 16, or 32-bit register.
When specifying individual bit masks, a lot of people start out by:
#define bit_0 1
#define bit_1 2
#define bit_2 4
#define bit_3 8
#define bit_4 16
etc...
After a while, they advance to:
#define bit_0 0x01
#define bit_1 0x02
#define bit_2 0x04
#define bit_3 0x08
#define bit_4 0x10
etc...
Then they learn to cheat, and let the compiler generate the values as part of compile time optimization:
#define bit_0 (1<<0)
#define bit_1 (1<<1)
#define bit_2 (1<<2)
#define bit_3 (1<<3)
#define bit_4 (1<<4)
etc...
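As a rough sketch of how such bit definitions are typically used, here are set, clear and test helpers for an 8-bit register value. The BIT macro and the CTRL_* names are hypothetical, not taken from any real device header.

```c
#include <stdint.h>

/* Hypothetical bit positions in an 8-bit control register. */
#define BIT(n)       (1u << (n))
#define CTRL_ENABLE  BIT(0)   /* 0x01 */
#define CTRL_IRQ     BIT(3)   /* 0x08 */
#define CTRL_RESET   BIT(7)   /* 0x80 */

/* Return reg with every bit in mask switched on. */
uint8_t set_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg | mask);
}

/* Return reg with every bit in mask switched off. */
uint8_t clear_bits(uint8_t reg, uint8_t mask)
{
    return (uint8_t)(reg & ~mask);
}

/* Nonzero if every bit in mask is set in reg. */
int test_bits(uint8_t reg, uint8_t mask)
{
    return (reg & mask) == mask;
}
```

For example, set_bits(0x00, CTRL_ENABLE | CTRL_RESET) yields 0x81, which is easy to read as "top bit and bottom bit" in hex and much less so as 129.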
Sometimes the visual representation of values in HEX makes code more readable or understandable. For instance bitmasking or use of bits becomes non-obvious when looking at decimal representations of numbers.
This can sometimes have to do with the amount of space a particular value type has to offer, so that may also play a role.
A typical example might be in a binary setting, so instead of using decimal to show some values, we use binary.
Let's say an object has a non-exclusive set of three properties, each of which is either on or off; one way to represent the state of those properties is with 3 bits.
Valid representations are 0 through 7 in decimal, but that is not so obvious. More obvious is the binary representation:
000, 001, 010, 011, 100, 101, 110, 111
Also, some people are just very comfortable with hex. Note also that hard-coded magic numbers are just that, and it does not matter much which numbering system you write them in.
I hope that helps.
Generally, hex is used instead of decimal because the computer works with bits (binary numbers), and when you're working with bits, hexadecimal is easier to follow: going from hex to binary is easier than going from decimal to binary.
0xFF = 1111 1111 ( F = 1111 )
but
255 = 1111 1111
because
255 / 2 = 127 (remainder 1)
127 / 2 = 63 (remainder 1)
63 / 2 = 31 (remainder 1)
... etc
Can you see that? It's much simpler to go from hex to binary.
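The digit-by-digit conversion can be sketched as a simple table lookup, with no division at all. The function name hex_to_binary is made up for this example, and the caller is assumed to provide a large enough output buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Expand a string of hex digits into a string of bits, one hex digit
   (4 bits) at a time. out must have room for 4 * strlen(hex) + 1 chars.
   Returns 0 on success, -1 if a non-hex character is found. */
int hex_to_binary(const char *hex, char *out)
{
    static const char *const nibble[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
    };
    static const char digits[] = "0123456789abcdef";

    for (; *hex != '\0'; hex++) {
        const char *p = strchr(digits, tolower((unsigned char)*hex));
        if (p == NULL)
            return -1;
        memcpy(out, nibble[p - digits], 4);  /* each digit maps to exactly 4 bits */
        out += 4;
    }
    *out = '\0';
    return 0;
}
```

Each hex digit is translated independently of its neighbours, which is exactly what decimal digits do not allow.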
There are 8 bits in a byte. Hex, base 16, is terse. Any possible byte value is expressed using two characters from the collection 0..9, plus a,b,c,d,e,f.
Base 256 would be more terse. Every possible byte could have its own single character, but most human languages don't use 256 characters, so Hex is the winner.
To understand the importance of being terse, consider that back in the 1970's, when you wanted to examine your megabyte of memory, it was printed out in hex. The printout would use several thousand pages of big paper. Octal would have wasted even more trees.
Each hex (hexadecimal) digit represents 4 bits of data, 0 to 15, or in hex 0 to F. Two hex digits represent a byte.
To be more precise, hex and decimal values are all just NUMBERS. The radix (base 10, 16, etc.) is a way to present those numbers in a manner that is either clearer or more convenient.
When discussing "how many of something there are" we normally use decimal. When we are looking at addresses or bit patterns on computers, hex is usually preferred, because often the meaning of individual bytes might be important.
Hex (and octal) have the property that their bases are powers of two, so they map groupings of bits nicely. Hex maps 4 bits (a nibble) to one hex digit (0-F), so a byte is written as two hex digits (00-FF). Octal was popular on Digital Equipment (DEC) and other older machines, but one octal digit maps to three bits, so it doesn't line up with byte boundaries as nicely.
Overall, the choice of radix is a way to make your programming easier - use the one that matches the domain best.
Looking at the file, that's some pretty groady code. Hope you are good at C and not using it as a tutorial...
Hex is useful when you're directly working at the bit level or just above it. E.g, working on a driver where you're looking directly at the bits coming in from a device and twiddling the results so that someone else can read a coherent result. It's a compact fairly easy-to-read representation of binary.
My textbook says
"The bitwise AND operator & is often used to mask off some set of bits, for example
n = n & 0177;
sets to zero all but the low-order 7 bits of n."
But, as per my understanding, binary form of 0177 is 101010001, so the operation n =n & 0177 should retain 1st, 5th , 7th and 9th bit of n from right, and set all other bits to zero.
Can anyone point out where am I wrong in understanding this?
As mentioned in the comments, it would work when 0177 is an octal (base 8, 3 bits per digit) number.
In several languages (C, and JavaScript outside strict mode, for instance) a leading 0 signals an octal number:
var n = 0177; // n now contains the decimal value 127
so octal 0177 == binary 01 111 111 == decimal 127
And this (0-prefix means octal) is also why, in older JavaScript engines, parseInt fails on a month input of 08 or 09 unless you explicitly specify a radix of 10.
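A small C sketch of the textbook statement, assuming nothing beyond the standard rule that a leading 0 makes an integer literal octal. The function name keep_low_7_bits is made up for illustration.

```c
/* 0177 is an octal literal in C: 1*64 + 7*8 + 7 = 127 = 0x7F,
   which is binary 0111 1111 (seven 1 bits). */
unsigned keep_low_7_bits(unsigned n)
{
    /* Every bit above the low-order 7 is cleared by the mask. */
    return n & 0177;
}
```

So keep_low_7_bits(0x1A5) gives 0x25: the bits 1 1010 0101 are cut down to 010 0101.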
I believe your understanding is correct aside from the binary representation of 0177. In C a leading 0 makes the literal octal, so 0177 is 01111111 (decimal 127). If it were decimal 177 it would be 10110001, and if it were hex (don't forget the 0x prefix!) 0x177 would be 101110111, so it would retain different bits. Not sure where you got 101010001. Let me know if this doesn't make sense.
Whilst reading Apple's dlfcn.h header I came across these macros:
#define RTLD_LAZY 0x1
#define RTLD_NOW 0x2
#define RTLD_LOCAL 0x4
#define RTLD_GLOBAL 0x8
Is there any reason why the author of this header wrote these as hexadecimal numbers when the prefixing 0x could be removed safely? 0x1, 0x2 etc are the same numbers as without 0x.
Is this just personal coding style?
It's conventional to use hexadecimal rather than decimal for powers-of-2 sequences, because it scales legibly.
Sure, there are only four options here, but consider a possible extension:
0x01
0x02
0x04
0x08
0x10
0x20
0x40
0x80
// ...
While the equivalent sequence rendered in decimal notation will be instantly familiar to any programmer, it's not as legible/symmetrical:
1
2
4
8
16
32
64
128
// ...
So, using hexadecimal here becomes a convention.
Besides that, sure, style. Some people like to use hex for "numbers used by the computer" because it looks kind of robotic; c.f. decimal for "numbers used by humans".
Consider also that values making use of these constants are likely to be manipulated using bitwise operators (which are similarly convenient to do in hex), and viewed in debuggers that give byte values in hexadecimal. It's easier to cross-reference the values in source code if they use the same base as the program's operations and your tools. 0x22 is easier to understand in this context than 34 is.
Ultimately, you may as well ask why we ever use hexadecimal instead of decimal, since there is always a valid conversion between the two. The truth is that some bases are just more convenient than others in certain scenarios. You wouldn't count on your fingers in binary, because you have ten of them.
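To illustrate the 0x22 versus 34 point, here is a minimal sketch with hypothetical option flags (the OPT_* names are invented for this example and are not the dlfcn.h macros).

```c
/* Hypothetical option flags, one bit each. */
enum {
    OPT_A = 0x01,
    OPT_B = 0x02,
    OPT_C = 0x04,
    OPT_D = 0x08,
    OPT_E = 0x10,
    OPT_F = 0x20
};

/* Nonzero if every bit of flag is set in value. */
int has_flag(unsigned value, unsigned flag)
{
    return (value & flag) == flag;
}
```

A debugger showing 0x22 can be read straight off the table as OPT_F | OPT_B (0x20 plus 0x02); the same value shown as 34 needs to be converted first.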
With unsigned char you can store a number from 0 to 255
255(b10) = 11111111(b2) <= that's 1 byte
This will make it easy to perform operations like +,-,*...
Now how about:
255(b10) = 10101101(b2)
Following this method will make it possible to represent up to 399 using unsigned char?
399(b10) = 11111111(b2)
Can someone propose an algorithm to perform addition using the last method?
With eight bits there are only 256 possible values (2^8), no matter how you slice and dice it.
Your scheme to encode digits in a 2-3-3 form like:
255 = 10 101 101
399 = 11 111 111
ignores the fact that those three-bit sequences in there can only represent eight values (0-7), not ten (ie, that second one would be 377, not 399).
The trade-off is that this means you gain the numbers '25[6-7]' (2 values) '2[6-7][0-7]' (16 values) and '3[0-7][0-7]' (64 values) for a total of 82 values.
Your sacrifice for that gain is that you can no longer represent any numbers containing 8 or 9: '[8-9]' (2 values), '[1-7][8-9]' (14 values), '[8-9][0-9]' (20 values), '1[0-7][8-9]' (16 values), '1[8-9][0-9]' (20 values) or '2[0-4][8-9]' (10 values), for a total of 82 values.
The balance there (82 vs. 82) shows that there are still only 256 possible values for an eight-bit data type.
So your encoding scheme is based on a flawed premise, which makes the second part of your question (how to add them) irrelevant, I'm afraid.
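For anyone who wants to check this, here is a small sketch that decodes a byte under the 2-3-3 scheme as I understand it from the question (top 2 bits for hundreds, then 3 bits each for tens and units). The function name decode_233 is made up.

```c
/* Decode a byte under the proposed 2-3-3 scheme: the top 2 bits are the
   hundreds digit, the next 3 bits the tens digit, and the low 3 bits the
   units digit. Each 3-bit field can only be 0..7. */
int decode_233(unsigned char b)
{
    int hundreds = (b >> 6) & 0x3;
    int tens     = (b >> 3) & 0x7;
    int units    = b & 0x7;

    return hundreds * 100 + tens * 10 + units;
}
```

The all-ones byte 11 111 111 decodes to 377, not 399, and since only 256 distinct bytes can be fed in, only 256 distinct numbers can ever come out.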
An unsigned char can only hold values between 0 and 255 (assuming 8-bit chars), as determined by the rule 2^n - 1 for the maximum unsigned value that n bits can represent. There is no way to "improve" the range of a char; you probably want to use an unsigned short instead, which typically holds two bytes.
You're mistaken.
In your scheme, 255 would be 010101101, which is 9 bits. The leading zero is important. I'm assuming here you're using something that looks like the octal representation. 3 bits/digit. Any other alternative means you cannot represent all the other digits.
|0|000|
|1|001|
|2|010|
|3|011|
|4|100|
|5|101|
|6|110|
|7|111|
|8|???|
|9|???|
9 in binary is 1001.
So you can't use 3 bits per digit. You need to use 4 bits if you want to represent 8 and 9. Again, I'm trying to assume here that you're encoding each digit separately.
So, 399 according to you would be: 001110011001 - 12 bits.
By comparison, binary does 399 in 110001111 - 9 bits.
So binary is the most efficient, because encoding digits from 0 to 9 in your system means that the maximum number you can store without any information loss in 8 bits is 99 - 10011001 :)
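Encoding each decimal digit in its own 4 bits is usually called packed BCD. A minimal sketch, with made-up function names, showing why 99 is the ceiling for one byte:

```c
/* Pack a value 0..99 into one byte as two 4-bit decimal digits
   (packed BCD). Returns -1 if the value does not fit in two digits. */
int to_bcd(int value)
{
    if (value < 0 || value > 99)
        return -1;
    return ((value / 10) << 4) | (value % 10);
}

/* Reverse of to_bcd: high nibble is the tens digit, low nibble the units. */
int from_bcd(unsigned char b)
{
    return (b >> 4) * 10 + (b & 0x0F);
}
```

Here to_bcd(99) is 0x99, i.e. 1001 1001, while plain binary gets all the way to 255 with the same 8 bits.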
One way to think of binary, is a path that is the result of a log search to find the number.
If you really want to condense the number of bits needed to represent a number, what you're really after is some sort of compression and not the way binary is done.
What you want to do is mathematically impossible. You can only represent 256 discrete values with 8 boolean values.
To test this, make a chart of all possible values, in decimal and binary. I.e.
000 = 00000000
001 = 00000001
002 = 00000010
003 = 00000011
004 = 00000100
...
254 = 11111110
255 = 11111111
You will see that after 255, you need a ninth bit.
You can let 255 = 10101101, but if you work backwards from that, you will run out before you reach 0.
You seem to hope you can somehow use a different counting mechanism to store more values. This is not mathematically possible. See the Pigeonhole Principle.
Why is it that if you open up an EXE in a hex editor, you will see all sorts of things. If computers only understand binary then shouldn't there only be 2 possible symbols seen in the file? Thanks
You're confusing content with representation. Every single file on your computer can be represented with binary (1s and 0s), and indeed that's how it's generally stored on disk (alignment of magnetic particles) or RAM (charge).
You're viewing your exe with a "hex editor", which represents the content using hexadecimal numbers. It does this because it's easier to understand and navigate hex than binary (compare "FA" to "11111010").
So the hexadecimal symbol "C0" represents the same value as the binary "11000000", "C1" == "11000001", "C2" == "11000010", and so on.
The hexadecimal values are just the binary values in memory, displayed differently. The software only makes them a bit more readable to human beings.
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10 A
1011 = 11 B
1100 = 12 C
1101 = 13 D
1110 = 14 E
1111 = 15 F
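What a hex editor does for each byte can be sketched in a few lines: split the byte into two 4-bit halves and look each one up in the table above. The function name byte_to_hex is invented for this example.

```c
/* Write the two hex digits for byte b into out[0..1] and terminate
   the string, the way a hex editor displays one byte. */
void byte_to_hex(unsigned char b, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = digits[b >> 4];    /* high 4 bits */
    out[1] = digits[b & 0x0F];  /* low 4 bits */
    out[2] = '\0';
}
```

The byte 11111010 comes out as "FA"; nothing about the stored data changes, only how it is shown.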
Computers don't only understand binary, that's a misconception. At the very lowest, lowest, lowest level, yes, data in digital computers is a series of 1s and 0s. But computer CPUs group those bits together into bytes, words, dwords, qwords, etc. The basic unit dealt with by a modern CPU is a dword or a qword, not a bit. That's why they're called 32-bit or 64-bit processors. If you want to get them to work with a single bit, you pretty much end up including 31 or 63 extraneous bits with it. (It gets a bit blurry when you start dealing with flag registers.)
Digital computers really came into their own as of 8-bit processors, so hexadecimal became a very useful display format as it succinctly represents a byte (8 bits) in two characters. You're using a hex editor, so it's showing you hex, and because of this early byte-orientation, it's showing you two characters for every 8 bits. It's mostly a display thing, though; there's little reason it couldn't show you one character for every 4 bits or four characters for every 16 bits, although file systems generally work on byte granularity for actual data (and much, much larger chunks for storage allocation granularity -- almost always 4k or more).
The character A you see here on the screen is just a pattern made of ones and zeros. It is only because we all agree on the same standards that those patterns of ones and zeros end up as something understandable on the screen.
The character A can have the value 65. In binary this is 0100 0001 but on the screen it might be the pattern
##
# #
####
# #
# #
In an exe file a lot of stuff is stored in various formats: floats, integers and strings. These formats are often used because they can be read directly by the computer without further conversion. In a hex editor you will often be able to read strings that happen to be stored in the exe file.
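As a rough illustration of why readable text shows up, here is a sketch that measures how many printable characters sit at the start of a buffer of raw bytes. The function name printable_run is made up, and ASCII (where 'A' is 65) is assumed.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the run of printable characters starting at data[0].
   Bytes such as 0x00 or 0x90 end the run. */
size_t printable_run(const unsigned char *data, size_t len)
{
    size_t i = 0;

    while (i < len && isprint(data[i]))
        i++;
    return i;
}
```

Given the bytes 48 69 21 00 90 41, the run at the start has length 3 ("Hi!"); the same bytes are numbers to the CPU and text to a viewer that chooses to interpret them as characters.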
In a computer everything's binary
There are only two possible states. What you're seeing is larger patterns of combinations of them, much in the same way that the only things sentences are made of are letters and punctuation.
Each character (byte) in the file represents 8 bits (8 ones or zeroes). You don't see bits, you see bytes (and larger types).
So I am going to give a layman answer here. What others suggested above is correct, you can read binary through Hex representation. Most data is saved in round number of bytes anyway. It is possible that e.g. compression algorithm computes a compressed representation in some odd number of bits, but it would still pad it to a full byte to save it. And each byte can be represented as 8 bits or 2 hex digits.
But, this may not be what you have asked. Quite likely you found some ascii data inside
the supposedly binary data. Why? Well, sometimes code is not just for running. Sometimes
compilers include some bits of human readable data that can help debugging if the code were
to crash and you needed to access the stack trace. Things like variable names, line numbers etc.
Not that I ever had to do that. I don't have bugs in my code. That's right.
Don't forget about the operating system and the disk file system. They can only use files in their own formats. For example, a Win32 executable must carry a PE header (which follows a short MZ stub at the very start of the file). The operating system loads the executable into memory, resolves the API references in it, transfers control to it, and so on. The low-level instructions are executed by the CPU, and at that level an instruction is already just a set of bytes.