I'm using Java for this.
I have the code 97, which represents the 'a' character in ASCII. Converting 97 to binary gives me 1100001 (7 bits). I want to convert this to 12 bits. I could add leading 0's to the existing 7 bits until it reaches 12 bits, but that seems inefficient. I've been thinking of using the bitwise & operator to zero out all but the lowest bits of 97 so that it reaches 12 bits. Is this possible, and how can I do it?
byte buffer = (byte) (code & 0xff);
The above line of code will give me 01100001, no?
which gives me 1100001 (7 bits)
Your value buffer is 8 bits. Because that's what a byte is: 8 bits.
If code has type int (a detail added in a comment below), it is already a 32-bit number with, in this case, 25 leading zero bits. You need do nothing with it: it already has all the bits you're asking for.
There is no Java integral type with 12 bits, nor is one directly achievable, since 12 is not a multiple of the byte size. It's unclear why you want exactly 12 bits. What harm do you think an extra 20 zero bits will do?
The important fact is that in Java, integral types (char, byte, int, etc.) have a fixed number of bits, defined by the language specification.
With reference to your original code & 0xff - code has 32 bits. In general these bits could have any value.
In your particular case, you told us that code was 97, and therefore we know the top 25 bits of code were zero; this follows from the binary representation of 97.
Again in general, & 0xff would set all but the low 8 bits to zero. In your case, that had no actual effect because they were already zero. No bits are "added" - they are always there.
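For illustration, here is a small, self-contained Java sketch (assuming code is an int holding 97) showing that the mask changes nothing and that the "extra" zero bits only become visible when you choose to print them:

public class LeadingZerosDemo {
    public static void main(String[] args) {
        int code = 97;  // 'a' in ASCII; as an int it already has 25 leading zero bits

        System.out.println(Integer.toBinaryString(code));         // 1100001 (leading zeros are not printed)
        System.out.println(Integer.toBinaryString(code & 0xff));  // 1100001 (the mask changed nothing)

        // If you really want a 12-character view, pad when formatting the string:
        String twelveBits = String.format("%12s", Integer.toBinaryString(code)).replace(' ', '0');
        System.out.println(twelveBits);                           // 000001100001
    }
}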
Related
Generally, CRC-32 is calculated over 32 bits or multiples of 32 bits. I want to calculate a CRC-32 for a 24-bit number. How can I do that? I'm not from a computer science background, so I don't have a thorough understanding of CRC-32; kindly help.
The actual math, in effect, appends 32 zero bits to the 24-bit number when calculating the CRC. A software implementation emulates this by cycling the CRC as needed.
To simplify things, assume the number is stored in big-endian format. Then the 24-bit value can be placed into a 32-bit register and the register cycled 32 times (emulating appending 32 zero bits) to produce a CRC. Since putting a 24-bit number into a 32-bit register leaves 8 leading zero bits, the first step can simply shift the 24-bit number left by 8 bits and then cycle the CRC 24 times.
If processing a byte at a time using a table lookup, with the 24-bit number held in 3 bytes of data, the process XORs the next byte into the upper 8 bits of the 32-bit CRC register and then uses the table to emulate cycling the 32-bit CRC 8 times.
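A minimal Java sketch of the bit-at-a-time cycling described above, assuming the standard CRC-32 polynomial 0x04C11DB7, a zero initial register, and none of the reflection or final-XOR steps that particular CRC-32 variants add; check your protocol's specification for those details:

public class Crc32Of24Bits {
    static final int POLY = 0x04C11DB7;  // assumed polynomial

    static int crc32Of24Bits(int value24) {
        int crc = value24 << 8;          // put the 24-bit number at the top of a 32-bit register
        for (int i = 0; i < 24; i++) {   // cycle once per remaining message bit
            boolean topBitSet = (crc & 0x80000000) != 0;
            crc <<= 1;                   // shift in an appended zero bit
            if (topBitSet) {
                crc ^= POLY;             // reduce modulo the polynomial
            }
        }
        return crc;
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", crc32Of24Bits(0x123456));
    }
}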
My textbook says
"The bitwise AND operator & is often used to mask off some set of bits, for example
n = n & 0177;
sets to zero all but the low-order 7 bits of n."
But, as per my understanding, the binary form of 0177 is 101010001, so the operation n = n & 0177 should retain the 1st, 5th, 7th and 9th bits of n from the right, and set all other bits to zero.
Can anyone point out where I am wrong in understanding this?
As mentioned in the comments, it works because the leading 0 makes 0177 an octal (base 8, 3 bits per digit) number.
In several languages (for instance JavaScript) a leading 0 signals an octal number:
var n = 0177; // n now contains the decimal value 127
so octal 0177 == binary 01 111 111 == decimal 127
And this (a 0 prefix means octal) is also why in JavaScript parseInt fails on a month input of 08 or 09, unless you explicitly specify a radix of 10.
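A quick Java sketch of the masking itself (Java uses the same leading-0 octal convention), with n set to an arbitrary test value:

public class OctalMaskDemo {
    public static void main(String[] args) {
        int n = 0b1010_1101_0110;          // arbitrary 12-bit test value
        int masked = n & 0177;             // 0177 is octal = decimal 127 = binary 1111111

        System.out.println(Integer.toBinaryString(n));       // 101011010110
        System.out.println(Integer.toBinaryString(masked));  // 1010110 -> only the low 7 bits survive
    }
}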
I believe your understanding of the & operation is correct; the problem is the binary representation of 0177. Because of the leading 0 it is an octal constant, equal to decimal 127, which is 1111111 in binary. (If it were decimal 177 it would be 10110001, and hex 0x177 would be 101110111, so it would retain different bits in those cases too.) Not sure where you got 101010001. Let me know if this doesn't make sense.
I'd like to know what the difference is between the values 0x7FFF and 32767. As far as I know, they should both be integers, and the only benefit is the notational convenience. They will take up the same amount of memory, and be represented the same way, or is there another reason for choosing to write a number as 0x vs base 10?
The only advantage is that some programmers find it easier to convert between base 16 and binary in their heads. Since each base 16 digit occupies exactly 4 bits, it's a lot easier to visualize the alignment of bits. And writing in base 2 is quite cumbersome.
The type of an undecorated decimal integral constant is always signed. The type of an undecorated hexadecimal or octal constant alternates between signed and unsigned as you hit the various boundary values determined by the widths of the integral types.
For constants decorated as unsigned (e.g. 0xFU), there is no difference.
Also, strictly speaking, it's not possible to express 0 as a decimal literal: the grammar treats a plain 0 as an octal literal.
See Table 6 in C++11 and 6.4.4.1/5 in C11.
Both are integer literals, and just provide a different means of expressing the same number. There is no technical advantage to using one form over the other.
Note that you can also use octal notation as well (by prepending the value with 0).
The 0x7FFF notation is much more clear about potential over/underflow than the decimal notation.
If you're using something that is 16 bits wide, 0x7FFF alerts you to the fact that if you use those bits in a signed way, you are at the very maximum of what those 16 bits can hold for a positive, signed value. Add 1 to it, and you'll overflow.
Same goes for 32 bits wide. The maximum that it can hold (signed, positive) is 0x7FFFFFFF.
You can see these maximums straight off from the hex notation, whereas you can't from the decimal notation (unless you happen to have memorized that 32767 is the maximum positive signed value for 16 bits).
(Note: the above assumes two's complement is used to represent signed values in those 16 bits.)
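For example, in Java (whose short is a 16-bit two's-complement type) the wrap-around is easy to see; this is just an illustrative sketch:

public class OverflowDemo {
    public static void main(String[] args) {
        short max16 = 0x7FFF;                                 // same literal value as 32767
        System.out.println(max16);                            // 32767
        System.out.println((short) (max16 + 1));              // -32768: adding 1 overflowed the 16 bits
        System.out.println(0x7FFFFFFF == Integer.MAX_VALUE);  // true for the 32-bit case
    }
}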
One is hex -- base 16 -- and the other is decimal?
That is true, there is no difference. Any differences would be in the variable the value is stored in. The literals 0x7FFF and 32767 are identical to the compiler in every way.
See http://www.cplusplus.com/doc/tutorial/constants/.
Choosing to write 0x7fff or 32767 in source code is purely a programmer's choice, because both values are stored in exactly the same way in computer memory.
For example, I'd feel more comfortable using the 0x notation when I need to do operations on 4-bit nibbles instead of whole bytes.
If I need to extract the lower 4 bits of a char variable, I'd do
res = charvar & 0x0f;
That's the same of:
res = charvar & 15;
The latter is just less intuitive and readable, but the operation is identical.
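The same comparison written out as a small Java sketch, just to show that the two masks produce the identical result (the value 'a' is an arbitrary example):

public class NibbleDemo {
    public static void main(String[] args) {
        char charvar = 'a';                 // 0x61
        int lowNibbleHex = charvar & 0x0f;  // keeps the low 4 bits -> 0x01
        int lowNibbleDec = charvar & 15;    // identical: 0x0f and 15 are the same constant

        System.out.println(lowNibbleHex + " " + lowNibbleDec);  // 1 1
    }
}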
I've recently been assigned to a C++ project involving information being sent between computers via UDP. When a packet arrives, I have a program which accepts the data and can display it as a raw hexadecimal string. However, I'm struggling to grasp exactly how this whole process is supposed to work. The hex string supposedly contains several fields (e.g. a 4-char array, some float_32s, and some uint_32s).
How do I translate the sections of this string into the correct variable types? The first value, an ASCII title, was simple enough; the first eight chars in the hex string are a hexadecimal representation of an ASCII word (0x45 hex can be translated directly to the capital letter E). But the next value, a 32-bit float, doesn't really make sense to me. What is the relation between the hex value "42 01 33 33" and the float value "32.3" (a given example)?
I'm a bit in over my head here, I feel I'm missing some essential information regarding the way number systems work.
All types in C have a representation (which for most types is defined by a particular implementation). Most C implementations use IEEE 754 for representing the floating types (this may actually be a requirement for C and C++, but from memory it is not). The Wikipedia article explains how the floating types are represented in memory. In most C and C++ implementations, float is a 32-bit type and double is a 64-bit type. Therefore, in these implementations float is 4 bytes wide and double is 8 bytes wide.
Be careful, because the byte order can be different. Some architectures store the floating type in little endian, some in big endian. There is also a Wikipedia article on endianness too.
To copy the bytes to the floating type, you have to make sure that the floating type is the same size as the number of bytes you have, and then you can copy the bytes one-by-one ‘into’ the floating type. Something like this will give you the gist of it:
#include <cstring>  // for std::memcpy

unsigned char rep[] = { 0x42, 0x01, 0x33, 0x33 };  // the four bytes as received (big-endian here)
float someFloat;

if (sizeof(someFloat) == 4)
{
    // Copy the raw bytes into the float's storage; memcpy avoids type-punning problems.
    std::memcpy(&someFloat, rep, sizeof(someFloat));
}
else
{
    // throw an exception or something
}
There are other ways of copying the bytes to the floating type, but be careful about ‘breaking the rules’ (type-punning etc.). Also, if the resulting value is incorrect, it may be because the byte order is wrong, and therefore you need to copy the bytes in reverse, so that the 4th byte in the representation is the 1st byte of the float.
If you have a hex value:
42 01 33 33
It is the equivalent of
0100 0010 0000 0001 0011 0011 0011 0011
in binary code.
Now, there is a floating point standard called IEEE 754 which tells you how to format a floating point number into binary or back.
The gist of it is that the first bit is the sign (positive/negative number), the next 8 bits are the exponent and the last 23 are the mantissa. This is how the computer internally stores floating point numbers, since it's only able to store 1's and 0's.
If you put it all together in the way IEEE 754 specifies, you get 32.3.
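Java's float uses the same IEEE 754 single-precision layout, so a short sketch can show the reassembly of 42 01 33 33 and the three fields (this assumes the bytes arrive in big-endian order):

public class FloatBitsDemo {
    public static void main(String[] args) {
        int bits = 0x42013333;  // the four received bytes as one big-endian 32-bit pattern

        // Reinterpret that bit pattern as an IEEE 754 single-precision float.
        float value = Float.intBitsToFloat(bits);
        System.out.println(value);                   // 32.3

        System.out.println(bits >>> 31);             // 0     -> sign bit: positive
        System.out.println((bits >>> 23) & 0xFF);    // 132   -> exponent 132 - 127 = 5
        System.out.println(bits & 0x7FFFFF);         // 78643 -> fraction 78643 / 2^23
        // value = +1.0093749... * 2^5, which is approximately 32.3
    }
}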
The exact data format is specified by the protocol used, but the common ways to represent numeric data are:
Unsigned integer: This is actually the simplest. Its typical representation works in principle like our normal decimal system, except that the "digits" are bytes, and can have 256 different values.
If you look at a decimal number like 3127, you see four digits. The least significant digit is the last one (the 7 in this case). Least significant means that if you change it by 1, you get the minimal change of the value (namely 1). The most significant digit in the example is the 3 at the very left: if you change that one by 1, you make the maximal change of the value, namely a change of 1000. Since there are 10 different digits (0 to 9), the number represented by "3127" is 3*10*10*10 + 1*10*10 + 2*10 + 7. Note that it is just a convention that the most significant digit comes first; you could also define that the least significant digit comes first, and then this number would be written as "7213".
Now in most encodings, unsigned numbers work exactly the same, except that the "digits" are bytes, and therefore instead of base 10 we have base 256. Also, unlike in decimal numbers, there's no universal convention whether the most significant byte (MSB) or the least significant byte (LSB) comes first; both conventions are used in different protocols or file formats.
For example, in 4-byte (i.e. 32 bit) unsigned int with MSB first (also called big-endian encoding), the value 1000 = 0*256^3 + 0*256^2 + 3*256 + 232 would be represented by the four byte values 0, 0, 3, 232, or hex 00 00 03 E8. For little-endian encoding (LSB first), it would be E8 03 00 00 instead. And as 16 bit integer, it would be just 03 E8 (big endian) or E8 03 (little endian).
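A small Java sketch of the same 1000 example, using java.nio.ByteBuffer to produce both byte orders (the & 0xFF mask just prints each byte as an unsigned hex value):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(1000).array();
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(1000).array();

        for (byte b : big) {
            System.out.printf("%02X ", b & 0xFF);   // 00 00 03 E8
        }
        System.out.println();

        for (byte b : little) {
            System.out.printf("%02X ", b & 0xFF);   // E8 03 00 00
        }
        System.out.println();
    }
}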
For signed integers, the most often used representation is two's complement. Basically it means that if the most significant bit is 1 (i.e. the most significant byte is 128 or larger), the byte sequence doesn't encode the number as written above, but instead the negative number you get by subtracting 2^(bits) from it, where (bits) is the number of bits in the number. For example, in a signed 16-bit int, the sequence FF FF is not 65535 as it would be in a 16-bit unsigned int, but rather 65535 - 2^16 = -1. As with unsigned ints, you have to distinguish between big-endian and little-endian. For example, -3 would be FF FD in 16-bit big endian, but FD FF in 16-bit little endian.
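A quick Java check of the FF FF and FF FD examples (Java's short is a 16-bit two's-complement type):

public class TwosComplementDemo {
    public static void main(String[] args) {
        short asSigned = (short) 0xFFFF;                   // the bit pattern FF FF
        System.out.println(asSigned);                      // -1
        System.out.println(Short.toUnsignedInt(asSigned)); // 65535 when read as unsigned
        System.out.println((short) 0xFFFD);                // -3, i.e. bytes FF FD in big-endian order
    }
}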
Floating point is quite a bit more complicated; today usually the format specified by IEEE/IEC is used. Basically, floating point numbers are of the form sign*(1.mantissa)*2^exponent, and sign, mantissa and exponent are stored in different subfields. Again, there are little-endian and big-endian forms.
I have an access control solution where the 27-bit format is 13 bits for the facility code and 14 bits for the badge ID. However, I need to convert it into 8 bits for the facility code and 16 bits for the badge ID.
What is the largest facility code on the 27-bit side that still converts correctly to the 8-bit facility code size? In other words, if I have 13 bits for the facility code, how many bits can I chop off and still get the same result in 8 bits?
If the facility code is never greater than 255, you can chop off the 5 most significant bits (i.e. keep the 8 least significant ones), without losing information.
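A hypothetical Java sketch of that conversion. It assumes the 27-bit credential packs the facility code in the top 13 bits and the badge ID in the low 14 bits; the actual card format may order the fields differently:

public class BadgeFormatDemo {
    public static void main(String[] args) {
        int raw27 = (200 << 14) | 12345;          // example credential: facility 200, badge 12345

        int facility13 = (raw27 >>> 14) & 0x1FFF; // 13-bit facility code
        int badge14    = raw27 & 0x3FFF;          // 14-bit badge ID

        // Repack as 8-bit facility + 16-bit badge; lossless only if facility13 <= 255.
        int facility8 = facility13 & 0xFF;        // keep the 8 least significant bits
        int badge16   = badge14;                  // 14 bits already fit in 16

        System.out.println(facility8 + " / " + badge16);  // 200 / 12345
    }
}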