I am trying to count the length of an int that may or may not have leading 0s, for instance 0100. I tried using the to_string() method, and it turned 0100 into the string "64" for some reason, and I don't understand why. I am relatively new to C++ and I think I may have fundamentally misunderstood how to_string() works.
I am using C++11 for my compiler.
By starting your integer literal with 0, you actually declared the number in octal base. (100 in octal = 64 in decimal)
See cppreference's page on integer literals for more details.
I have tried searching around but have not been able to find much about binary literals and endianness. Are binary literals little-endian, big-endian or something else (such as matching the target platform)?
As an example, what is the decimal value of 0b0111? Is it 7? Platform specific? Something else? Edit: I picked a bad value of 7 since it is represented within one byte. The question has been sufficiently answered despite this fact.
Some background: Basically I'm trying to figure out what the value of the least significant bits are, and masking it with binary literals seemed like a good way to go... but only if there is some guarantee about endianness.
Short answer: there isn't one. Write the number the way you would write it on paper.
Long answer:
Endianness is never exposed directly in the code unless you really try to get it out (such as using pointer tricks). 0b0111 is 7, it's the same rules as hex, writing
int i = 0xAA77;
doesn't mean 0x77AA on some platforms, because that would be absurd. With 32-bit ints, where would the missing leading zeros go? Would they be padded on the front and then the whole thing flipped to 0x77AA0000, or added after? I have no idea what anyone would expect if that were the case.
The point is that C++ doesn't make any assumptions about the endianness of the machine*; if you write code using the primitives and literals it provides, the behavior will be the same from machine to machine (unless you start circumventing the type system, which you may need to do).
To address your update: the number will be the way you write it out. The bits will not be reordered or any such thing, the most significant bit is on the left and the least significant bit is on the right.
There seems to be a misunderstanding here about what endianness is. Endianness refers to how bytes are ordered in memory and how they must be interpreted. If I gave you the number "4172" and said "if this is four-thousand one-hundred seventy-two, what is the endianness?" you can't really give an answer, because the question doesn't make sense. (Some argue that the largest digit on the left means big-endian, but without memory addresses the question of endianness is not answerable or relevant.) This is just a number; there are no bytes to interpret and no memory addresses. Assuming a 4-byte integer representation, the bytes that correspond to it are:
low address ----> high address
Big endian: 00 00 10 4c
Little endian: 4c 10 00 00
so, given either of those and told "this is the computer's internal representation of 4172", you could determine whether it's little- or big-endian.
So now consider your binary literal 0b0111. These 4 bits represent one nybble and can be stored as either
low ---> high
Big endian: 00 00 00 07
Little endian: 07 00 00 00
But you don't have to care, because this is also handled by the hardware; the language dictates that the compiler reads from left to right, most significant bit to least significant bit.
Endianness is not about individual bits. Given that a byte is 8 bits, if I hand you 0b00000111 and say "is this little- or big-endian?" again you can't say, because you only have one byte (and no addresses). Endianness doesn't pertain to the order of bits in a byte; it refers to the ordering of entire bytes with respect to address (unless, of course, you have one-bit bytes).
You don't have to care about what your computer is using internally. 0b0111 just saves you from having to write stuff like
unsigned int mask = 7; // only keep the lowest 3 bits
by writing
unsigned int mask = 0b0111;
without needing a comment explaining the significance of the number.
* In C++20 you can check the endianness using std::endian.
All integer literals, including binary ones, are interpreted in the same way we normally read numbers (the leftmost digit being most significant).
The C++ standard guarantees the same interpretation of literals without having to be concerned with the specific environment you're on. Thus, you don't have to concern yourself with endianness in this context.
Your example of 0b0111 is always equal to seven.
The C++ standard doesn't use terms of endianness in regards to number literals. Rather, it simply describes that literals have a consistent interpretation, and that the interpretation is the one you would expect.
C++ Standard - Integer Literals - 2.14.2 - paragraph 1
An integer literal is a sequence of digits that has no period or
exponent part, with optional separating single quotes that are ignored
when determining its value. An integer literal may have a prefix that
specifies its base and a suffix that specifies its type. The lexically
first digit of the sequence of digits is the most significant. A
binary integer literal (base two) begins with 0b or 0B and consists of
a sequence of binary digits. An octal integer literal (base eight)
begins with the digit 0 and consists of a sequence of octal digits.
A decimal integer literal (base ten) begins with a digit other than 0
and consists of a sequence of decimal digits. A hexadecimal integer
literal (base sixteen) begins with 0x or 0X and consists of a sequence
of hexadecimal digits, which include the decimal digits and the
letters a through f and A through F with decimal values ten through
fifteen. [Example: The number twelve can be written 12, 014, 0XC, or
0b1100. The literals 1048576, 1’048’576, 0X100000, 0x10’0000, and
0’004’000’000 all have the same value. — end example ]
Wikipedia describes what endianness is, and uses our number system as an example to understand big-endian.
The terms endian and endianness refer to the convention used to
interpret the bytes making up a data word when those bytes are stored
in computer memory.
Big-endian systems store the most significant byte of a word in the
smallest address and the least significant byte is stored in the
largest address (also see Most significant bit). Little-endian
systems, in contrast, store the least significant byte in the smallest
address.
An example on endianness is to think of how a decimal number is
written and read in place-value notation. Assuming a writing system
where numbers are written left to right, the leftmost position is
analogous to the smallest address of memory used, and rightmost
position the largest. For example, the number one hundred twenty three
is written 1 2 3, with the hundreds place left-most. Anyone who reads
this number also knows that the leftmost digit has the biggest place
value. This is an example of a big-endian convention followed in daily
life.
In this context, we are considering a digit of an integer literal to be a "byte of a word", and the word to be the literal itself. Also, the left-most character in a literal is considered to have the smallest address.
With the literal 1234, the digits one, two, three and four are the "bytes of a word", and 1234 is the "word". With the binary literal 0b0111, the digits zero, one, one and one are the "bytes of a word", and the word is 0111.
This consideration allows us to understand endianness in the context of the C++ language, and shows that integer literals are similar to "big-endian".
You're missing the distinction between endianness as written in the source code and endianness as represented in the object code. The answer for each is unsurprising: source-code literals are big-endian because that's how humans read them; in object code, they're written however the target reads them.
Since a byte is by definition the smallest unit of memory access, I don't believe it is even possible to ascribe an endianness to the internal representation of bits in a byte. The only way to discover endianness for larger numbers (whether intentionally or by surprise) is by accessing them from storage piecewise, and the byte is by definition the smallest accessible storage unit.
The C/C++ languages don't care about endianness of multi-byte integers. C/C++ compilers do. Compilers parse your source code and generate machine code for the specific target platform. The compiler, in general, stores integer literals the same way it stores an integer; such that the target CPU's instructions will directly support reading and writing them in memory.
The compiler takes care of the differences between target platforms so you don't have to.
The only time you need to worry about endianness is when you are sharing binary values with other systems that have a different byte ordering. Then you would read the binary data in, byte by byte, and arrange the bytes in memory in the correct order for the system your code is running on.
Endianness is implementation-defined. The standard guarantees that every object has an object representation as an array of char and unsigned char, which you can work with by calling memcpy() or memcmp(). In C++17, it is legal to reinterpret_cast a pointer or reference to any object type (not a pointer to void, pointer to a function, or nullptr) to a pointer to char, unsigned char, or std::byte, which are valid aliases for any object type.
What people mean when they talk about “endianness” is the order of bytes in that object representation. For example, if you declare unsigned char int_bytes[sizeof(int)] = {1}; and int i; then memcpy( &i, int_bytes, sizeof(i)); do you get 0x01, 0x01000000, 0x0100, 0x0100000000000000, or something else? The answer is: yes. There are real-world implementations that produce each of these results, and they all conform to the standard. The reason for this is so the compiler can use the native format of the CPU.
This comes up most often when a program needs to send or receive data over the Internet, where all the standards define that data should be transmitted in big-endian order, on a little-endian CPU like the x86. Some network libraries therefore specify whether particular arguments and fields of structures should be stored in host or network byte order.
The language lets you shoot yourself in the foot by twiddling the bits of an object representation arbitrarily, but it might get you a trap representation, which could cause undefined behavior if you try to use it later. (This could mean, for example, rewriting a virtual function table to inject arbitrary code.)
The <type_traits> header has several templates to test whether it is safe to do things with an object representation. You can copy one object over another of the same type with memcpy( &dest, &src, sizeof(dest) ) if that type is_trivially_copyable. You can make a copy to correctly-aligned uninitialized memory if it is_trivially_move_constructible. You can test whether two objects of the same type are identical with memcmp( &a, &b, sizeof(a) ) and correctly hash an object by applying a hash function to the bytes in its object representation if the type has_unique_object_representations. An integral type has no trap representations, and so on.
For the most part, though, if you're doing operations on object representations where endianness matters, you're telling the compiler to assume you know what you're doing, and your code will not be portable.
As others have mentioned, binary literals are written with the most significant digit first, like decimal, octal, or hexadecimal literals. This is different from endianness and will not affect whether you need to call ntohs() on the port number from a TCP header read in from the Internet.
You might want to think of C or C++ or any other language as being intrinsically little-endian (think about how the bitwise operators work). If the underlying HW is big-endian, the compiler ensures that the data is stored big-endian (and likewise for other endiannesses), but your bitwise operations work as if the data were little-endian. The thing to remember is that, as far as the language is concerned, the data is little-endian. Endianness-related problems arise when you cast the data from one type to another. As long as you don't do that, you are good.
I was questioned about the statement that "the C/C++ language is intrinsically little-endian", so I am providing an example. Many already know how this works, but here I go.
typedef union
{
    struct {
        int a        : 1;
        int reserved : 31;
    } bits;
    unsigned int value;
} u;

u test;
test.bits.a = 1;
test.bits.reserved = 0;
printf("After bits assignment, test.value = 0x%08X\n", test.value);

test.value = 0x00000001;
printf("After value assignment, test.value = 0x%08X\n", test.value);
Output on a little endian system:
After bits assignment, test.value = 0x00000001
After value assignment, test.value = 0x00000001
Output on a big endian system:
After bits assignment, test.value = 0x80000000
After value assignment, test.value = 0x00000001
So, if you do not know the processor's endianness, where does everything come out right? In the little-endian system! Thus, I say that the C/C++ language is intrinsically little-endian.
I'd like to know what the difference is between the values 0x7FFF and 32767. As far as I know, they should both be integers, and the only benefit is the notational convenience. They will take up the same amount of memory, and be represented the same way, or is there another reason for choosing to write a number as 0x vs base 10?
The only advantage is that some programmers find it easier to convert between base 16 and binary in their heads. Since each base 16 digit occupies exactly 4 bits, it's a lot easier to visualize the alignment of bits. And writing in base 2 is quite cumbersome.
The type of an undecorated decimal integral constant is always signed. The type of an undecorated hexadecimal or octal constant alternates between signed and unsigned as you cross the boundary values determined by the widths of the integral types.
For constants decorated as unsigned (e.g. 0xFU), there is no difference.
Also, it's not possible to express 0 as a decimal literal: a literal beginning with 0 is octal.
See Table 6 in C++11 and 6.4.4.1/5 in C11.
Both are integer literals, and just provide a different means of expressing the same number. There is no technical advantage to using one form over the other.
Note that you can also use octal notation as well (by prepending the value with 0).
The 0x7FFF notation is much more clear about potential over/underflow than the decimal notation.
If you using something that is 16 bits wide, 0x7FFF alerts you to the fact that if you use those bits in a signed way, you are at the very maximum of what those 16 bits can hold for a positive, signed value. Add 1 to it, and you'll overflow.
Same goes for 32 bits wide. The maximum that it can hold (signed, positive) is 0x7FFFFFFF.
You can see these maximums straight off of the hex notation, whereas you can't off of the decimal notation. (Unless you happen to have memorized that 32767 is the positive signed max for 16 bits).
(Note: the above is true when two's complement is used to distinguish between positive and negative values and the 16 bits are holding a signed value.)
One is hex -- base 16 -- and the other is decimal?
That is true, there is no difference. Any differences would be in the variable the value is stored in. The literals 0x7FFF and 32767 are identical to the compiler in every way.
See http://www.cplusplus.com/doc/tutorial/constants/.
Choosing to write 0x7fff or 32767 in source code is purely a programmer's choice, because both values are stored in exactly the same way in computer memory.
For example, I feel more comfortable using the 0x notation when I need to operate on 4 bits instead of the classical byte.
If I need to extract the lower 4 bit of a char variable I'd do
res = charvar & 0x0f;
That's the same of:
res = charvar & 15;
The latter is just less intuitive and readable, but the operation is identical.
I am currently trying to learn some more in depth stuff of file formats.
I have a spec for a 3D file format (U3D in this case) and I want to try to implement that. Nothing serious, just for the learning effect.
My problem starts very early with the types that need to be defined. I have to define various integers (8-bit, 16-bit, 32-bit, unsigned and signed), and these then need to be converted to hex before being written to a file.
How do I define these types, since I cannot just create an I16, for instance?
Another problem for me is how to convert that I16 to a hex number with 8 digits
(i.e. 0001 0001).
Hex is just a representation of a number. Whether you interpret the number as binary, decimal, hex, octal etc is up to you. In C++ you have support for decimal, hex, and octal representations, but they are all stored in the same way.
Example:
int x = 0x1;
int y = 1;
assert(x == y);
Likely the file format wants you to store the files in normal binary format. I don't think the file format wants the hex numbers as a readable text string. If it does though then you could use std::hex to do the conversion for you. (Example: file << hex << number;)
If the file format talks about writing more than a 1-byte type to the file, then be careful of the endianness of your architecture; that is, do you store the most significant byte of the multi-byte type first or last?
It is very common in file format specifications to show you how the binary should look for a given part of the file. Don't confuse this though with actually storing binary digits as strings. Likewise they will sometimes give a shortcut for this by specifying in hex how it should look. Again most of the time they don't actually mean text strings.
The smallest addressable unit in C++ is a char which is 1 byte. If you want to set bits within that byte you need to use bitwise operators like & and |. There are many tutorials on bitwise operators so I won't go into detail here.
If you include <stdint.h> you will get types such as:
uint8_t
int16_t
uint32_t
First, let me understand.
The integers are stored AS TEXT in a file, in hexadecimal format, without the 0x prefix?
Then, use this syntax:
fprintf(fp, "%08x", number);
This will write something like 0abc1234 into the file.
As for "define different integers (8Bit, 16bit, 32bit unsigned and signed)", unless you roll your own, and concomitant math operations for them, you should stick to the types supplied by your system. See stdint.h for the typedefs available, such as int32_t.
This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
I am facing a problem and am unable to resolve it. I need help from gurus. Here is sample code:
float f=0.01f;
printf("%f",f);
If we check the value of the variable during debugging, f contains 0.0099999998, while the output of printf is 0.010000.
a. Is there any way to force the compiler to assign exactly the same value to a variable of float type?
b. I want to convert a float to a string/character array. How can I ensure that only the exactly same value is converted to a string/character array? I want to make sure that no zeros are padded, no unwanted values are added, and no digits change as in the above example.
It is impossible to represent most base-10 decimal numbers exactly using base-2 values; only a small set of values whose fractional parts are sums of negative powers of two (such as 0.25) can be represented exactly. To get what you need, you have to switch from the built-in float/double types to some kind of decimal-number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978, though I can't see when you would really need that.
It is preferable to use boost::format to format a float as a string. The following code shows how to do it:
float blah = 0.01f;
string w = str( boost::format("%g") % blah ); // w contains "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without round-up or truncation and with fuller precision: you are missing the precision specifier; the default precision for printf is typically 6 fractional digits.
Try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
In truth, using the floating-point processor or co-processor (most are now integrated into the CPU) will never give exact decimal results, only a fairly rough accuracy. For more accurate results, you could consider defining a class "DecimalString", which uses nybbles as decimal characters and symbols and attempts to mimic base-10 mathematics using strings. Depending on how long you make the strings, you could even do away with the exponent part altogether: a string of 256 digits can represent anything from 1x10^-254 up to 1x10^+255 in straight decimal using actual ASCII (slightly less if you want a sign), though this may prove significantly slower. You can simplify the arithmetic by reversing the digit order, so that from left to right the digits read
units, tens, hundreds, thousands...
Simple example:
e.g. "0021" becomes 1200
This would need "shifting" left and right to make the decimal points line up before each operation. The best bet is to start with the ADD and SUB functions, as you will then build on them in the MUL and DIV functions. If you are on a large machine, you could make the strings as long as your heart desires!
Equally, you could use the standard library: sprintf from stdio.h, and the ecvt and fcvt functions from stdlib.h (or at least, that's where they should be!).
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string. For example:
float f = 12.00f;
char buffer[32];
sprintf(buffer, "%4.2f", f); // returns 5; on error it returns -1
ecvt and fcvt return pointers to static char locations containing the null-terminated decimal representations of the numbers, with no decimal point and the most significant digit first. The offset of the decimal point is stored in dec and the sign in "sign" (1 = negative, 0 = positive); ndig is the number of significant digits to store. If dec < 0, you have to pad with -dec zeros before the decimal point. If you are unsure, and you are not working on a Windows 7 system (which sometimes will not run old DOS 3 programs), look for Turbo C version 2 for DOS 3; there are still one or two downloads available. It is a relatively small editor/compiler from Borland and even comes with TASM, the 16-bit 386/486 assembler. All of this is covered in its help files, as are many other useful nuggets of information.
All three routines are in the standard library, or should be, though I have found that on Visual Studio 2010 they are anything but standard, often overloaded with functions dealing with WORD-sized characters and asking you to use its own specific functions instead. "So much for the standard library," I mutter to myself almost every time. "Maybe they ought to get a better dictionary!"
You would need to consult your platform's standards to determine the correct format: you would display the value as a*b^c, where 'a' is the component that holds the sign, 'b' is the implementation-defined base (likely fixed by a standard), and 'c' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
it IS possible to represent floats exactly and unambiguously as strings. However, this requires a hexadecimal representation. For instance, 1/16 is 0.1 in hex and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))