I want to convert a double to a char array of length 8 in C++. The problem is that I want to cover all the bytes of the double type (double is not always 8 bytes long in C++).
The char array is just used to store the bytes of the double, as if char were a byte type.
Any ideas?
Yes, you can always treat any object as an array of bytes. To access the bytes, use a reinterpret_cast:
T x; // any object
unsigned char const * bytes = reinterpret_cast<unsigned char const *>(&x);
for (std::size_t i = 0; i != sizeof(T); ++i)
{
    std::printf("Byte %zu is %02X\n", i, bytes[i]); // assuming CHAR_BIT == 8
}
Note that there isn't generally a way to know which of the bytes are part of the object representation and what their actual meaning is. For example, a long double may on certain platforms have size 12 or 16 but only have 10 relevant bytes, and you don't know which one is which. Though for a double with size 8 it's reasonable to assume that there's no padding and that the bytes make up an IEEE-754 representation in linear order. Your platform manual might tell you.
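Alternatively, if the goal is just to copy the bytes out, std::memcpy into a buffer sized with sizeof(double) avoids hard-coding the length 8. A minimal self-contained sketch:
#include <cstdio>
#include <cstring>
int main()
{
    double d = 3.14159;
    unsigned char bytes[sizeof(double)]; // sized for the platform's actual double width
    std::memcpy(bytes, &d, sizeof(double)); // copy the object representation
    for (std::size_t i = 0; i != sizeof(double); ++i)
        std::printf("Byte %zu is %02X\n", i, bytes[i]);
}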
Related
I have an unsigned const char* buffer in memory (comes from the network) that I need to do some stuff with. What stumps me right now is that I need to interpret the first two bytes as binary data, while the rest is ASCII. I have no problem reading the ASCII (I think), but I can't figure out how to read just the first two bytes of the unsigned array, and turn them into (say) an int. I was going to use reinterpret_cast, but the first two bytes are not null-terminated, and the only other help I could find was all about file IO.
In short, I have something like {0000000000001011}ABC Z123 XY0 5, where the characters outside the curly braces are read as ASCII, while the ones inside are supposed to be a single binary number (i.e. decimal 11).
int c1 = buffer[0];
int c2 = buffer[1];
int number = (c1 << 8) + c2; // parenthesize: + binds tighter than <<
const unsigned char* asciiData = buffer + 2;
I really don't get why the bytes have to be "null-terminated" for you to use reinterpret_cast. What I would do (and works so far in my projects) is:
uint16_t first_bytes = *(reinterpret_cast<const uint16_t*>(buffer));
That would get you the first two bytes in the buffer and assign the value to the first_bytes variable.
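Beware, though, that this dereference carries two portability caveats: the buffer may not be suitably aligned for uint16_t, and network data is usually big-endian, which may not match the host's byte order. A sketch that avoids both problems by assembling the value byte by byte (assuming the two bytes arrive in network order):
#include <cstdint>
// Read the first two bytes as a big-endian 16-bit value.
uint16_t read_u16_be(const unsigned char* p)
{
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}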
I need to compute the Hamming distance between bitsets that are represented as char arrays. This is a core operation, so it must be as fast as possible. I have something like this:
const int N = 32; // 32 always
// returns the number of bits that are ones in a char
int countOnes_uchar8(unsigned char v);
// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    for(int i = 0; i < N; ++i, ++pa, ++pb)
    {
        ret += countOnes_uchar8(*pa ^ *pb);
    }
    return ret;
}
After profiling, I noticed that operating on ints is faster, so I wrote:
const int N = 32; // 32 always
// returns the number of bits that are ones in a 32-bit unsigned int
int countOnes_int32(unsigned int v);
// pa and pb point to arrays of N items
int hamming(const unsigned char *pa, const unsigned char *pb)
{
    const unsigned int *qa = reinterpret_cast<const unsigned int*>(pa);
    const unsigned int *qb = reinterpret_cast<const unsigned int*>(pb);
    int ret = 0;
    for(size_t i = 0; i < N / sizeof(unsigned int); ++i, ++qa, ++qb)
    {
        ret += countOnes_int32(*qa ^ *qb);
    }
    return ret;
}
Questions
1) Is that cast from unsigned char * to unsigned int * safe?
2) I work on a 32-bit machine, but I would like the code to work on a 64-bit machine. Does sizeof(unsigned int) return 4 on both machines, or is it 8 on a 64-bit one?
3) If sizeof(unsigned int) returned 4 on a 64-bit machine, how would I be able to operate on a 64-bit type, with long long?
Is that cast from unsigned char * to unsigned int * safe?
Formally, it gives undefined behaviour. Practically, it will work on just about any platform if the pointer is suitably aligned for unsigned int. On some platforms, it may fail, or perform poorly, if the alignment is wrong.
Does sizeof(unsigned int) return 4 on both machines, or is it 8 on a 64-bit one?
It depends. Some platforms have 64-bit int, and some have 32-bit. It would probably make sense to use uint64_t regardless of platform; on a 32-bit platform, you'd effectively be unrolling the loop (processing two 32-bit values per iteration), which might give a modest improvement.
how would I be able to operate on a 64-bit type, with long long?
uint64_t, if you have a C++11 or C99 library. long long is at least 64 bits, but might not exist on a pre-2011 implementation.
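A way to get the wider chunks without the formal undefined behaviour is to memcpy each chunk into a local integer; compilers typically optimize the copy away entirely. A sketch using fixed-width types (countOnes_uint64 is a stand-in for whatever popcount routine you use):
#include <cstdint>
#include <cstring>
const int N = 32; // 32 always
int countOnes_uint64(uint64_t v); // stand-in popcount, as above
int hamming(const unsigned char *pa, const unsigned char *pb)
{
    int ret = 0;
    for (int i = 0; i < N; i += 8) // 8 == sizeof(uint64_t)
    {
        uint64_t a, b;
        std::memcpy(&a, pa + i, sizeof a); // no aliasing or alignment issues
        std::memcpy(&b, pb + i, sizeof b);
        ret += countOnes_uint64(a ^ b);
    }
    return ret;
}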
1) No, it is not safe/portable; it is undefined behavior. There are systems where a byte is wider than 8 bits, and there is no guarantee that the char pointer is properly aligned for unsigned int.
2) sizeof(int) might in theory be anything on a 64 bit machine. In practice, it will be either 4 or 8.
3) long long is guaranteed to be at least 64 bits, but not necessarily exactly 64. If you want exact-width guarantees, use uint64_t. However, for your specific algorithm I don't see why the sizeof() of the data chunk would matter.
Consider using the types in stdint.h instead; they are far more suitable for portable code. Instead of char, int or long long, use uint_fast8_t. This will let the compiler pick the fastest integer for you, in a portable manner.
As a side note, you should consider implementing countOnes as a lookup table, working at the 4-, 8- or 32-bit level, depending on what is most optimal for your system. This will increase program size but reduce execution time. Maybe try to implement some form of adaptive lookup table which depends on sizeof(uint_fast8_t).
What is the use of unsigned char pointers? I have seen in many places that a pointer is type cast to a pointer to unsigned char. Why do we do so?
We receive a pointer to int and then type cast it to unsigned char*. But if we try to print the elements in that array using cout, it does not print anything. Why? I do not understand. I am new to C++.
EDIT Sample Code Below
int Stash::add(void* element)
{
    if(next >= quantity) // Enough space left?
        inflate(increment);
    // Copy element into storage, starting at next empty space:
    int startBytes = next * size;
    unsigned char* e = (unsigned char*)element;
    for(int i = 0; i < size; i++)
        storage[startBytes + i] = e[i];
    next++;
    return(next - 1); // Index number
}
You are actually looking for pointer arithmetic:
unsigned char* bytes = (unsigned char*)ptr;
for(int i = 0; i < size; i++)
    // work with bytes[i]
In this example, bytes[i] is equal to *(bytes + i), and it accesses the memory at address bytes + i * sizeof(*bytes). In other words: if you have int* intPtr and you access intPtr[1], you are actually accessing the integer stored at bytes 4 to 7 (assuming 4-byte int):
0 1 2 3
4 5 6 7 <--
The size of the type your pointer points to affects where it points after it is incremented or decremented. So if you want to iterate over your data byte by byte, you need a pointer to a type of size 1 byte (that's why unsigned char*).
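A small self-contained illustration of that stride difference (%td prints a pointer difference):
#include <cstdio>
int main()
{
    int data[2] = {0, 0};
    int* ip = data;
    unsigned char* bp = reinterpret_cast<unsigned char*>(data);
    std::printf("int* step:  %td bytes\n",
                reinterpret_cast<unsigned char*>(ip + 1) - bp); // sizeof(int), e.g. 4
    std::printf("char* step: %td bytes\n", (bp + 1) - bp);      // always 1
}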
unsigned char is usually used for holding binary data, where 0 is a valid value and still part of your data. While working with a "naked" unsigned char*, you'll probably have to keep track of your buffer's length.
char is usually used for holding characters representing a string, where 0 is equal to '\0' (the terminating character). If your buffer of characters is always terminated with '\0', you don't need to know its length, because the terminating character exactly specifies the end of your data.
Note that in both of these cases it's better to use some object that hides the internal representation of your data and will take care of memory management for you (see RAII idiom). So it's much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for string).
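A brief sketch contrasting the two (illustrative only):
#include <iostream>
#include <string>
#include <vector>
int main()
{
    // Binary buffer: the embedded 0x00 is ordinary data; size() tracks the length.
    std::vector<unsigned char> binary{0x11, 0x00, 0xAB};
    std::cout << binary.size() << '\n'; // 3 -- the zero byte counts

    // Text buffer: c_str() supplies the '\0' terminator for C APIs.
    std::string text = "ABC";
    std::cout << text.size() << '\n';   // 3 -- terminator not included
}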
In C, unsigned char is the only type guaranteed to have no trapping values, and which guarantees copying will result in an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.
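As a small illustration of that modulo behavior, here is a toy djb2-style hash that deliberately relies on unsigned wraparound (illustrative, not a recommendation of this particular hash):
#include <cstdint>
#include <cstddef>
uint32_t hash_bytes(const unsigned char* p, std::size_t n)
{
    uint32_t h = 5381;
    for (std::size_t i = 0; i < n; ++i)
        h = h * 33 + p[i]; // overflow wraps modulo 2^32, guaranteed for unsigned types
    return h;
}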
The unsigned char type is usually used as a representation of a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte.
The unsigned char* construct will be a pointer to the binary data buffer (or its 1st element).
For the record, the C++ standard fixes sizeof(unsigned char) at exactly 1, and requires a byte to be at least 8 bits wide (CHAR_BIT >= 8); on virtually all modern platforms it is exactly 8.
After seeing your code
When you use something like void* input as a parameter of a function, you deliberately strip away information about the input's original type. This is a very strong suggestion that the input will be treated in a very general manner, i.e. as an arbitrary string of bytes. int* input, on the other hand, would suggest it will be treated as a "string" of signed integers.
void* is mostly used in cases where the input gets encoded, or treated bit- or byte-wise for whatever reason, since you cannot draw conclusions about its contents.
Then, in your function, you want to treat the input as a string of bytes. But to operate on objects, e.g. performing operator= (assignment), the compiler needs to know what to do. Since you declare input as void*, an assignment such as *input = something would make no sense, because *input is of type void. To make the compiler treat the input elements as the "smallest raw memory pieces", you cast it to the appropriate type, which is unsigned char.
The cout probably did not work because of a wrong or unintended type conversion. char* is treated as a null-terminated string, and it is easy to confuse the signed and unsigned versions in code. If you pass an unsigned char* to ostream::operator<< as a char*, it will treat the byte input as ordinary ASCII characters, where 0 means end of string, not the integer value 0. When you want to print the contents of memory, it is best to cast the pointers explicitly.
Also note that to print the memory contents of a buffer you need to use a loop, since otherwise the printing function would not know when to stop.
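For example, a minimal loop that prints each byte's numeric value (casting to int so operator<< doesn't interpret the data as characters; names are illustrative):
#include <iostream>
void dumpBytes(const unsigned char* buf, int size)
{
    for (int i = 0; i < size; i++)
        std::cout << static_cast<int>(buf[i]) << ' '; // values, not characters
    std::cout << '\n';
}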
Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:
void memcpy (unsigned char* dest, unsigned char* source, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        dest[i] = source[i];
}
It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
You could very well copy the data in the above function using an int pointer, but that would copy chunks of 4 bytes (on typical platforms), which may not be the correct behavior in some situations.
As for why nothing appears on the screen when you try to use cout: the most likely explanation is that the data starts with a zero character, which in C++ marks the end of a string of characters.
I was wondering why size_t is used where I could just use, say, int. It's said that size_t is the return type of the sizeof operator. What does that mean? If I use sizeof(int) and store the result in an int variable, that works too, so it's not strictly necessary to store it in a size_t variable. I just want to clearly understand the basic concept of using size_t, with an understandable example. Thanks.
size_t is guaranteed to be able to represent the largest size possible, int is not. This means size_t is more portable.
For instance, what if int could only store up to 32767 (its minimum guaranteed range) but you could allocate arrays of 100000 bytes? Clearly this wouldn't work; with size_t, however, it will.
The simplest example is pretty dated: on an old 16-bit-int system with 64 k of RAM, the value of an int can be anywhere from -32768 to +32767, but after:
char buf[40960];
the buffer buf occupies 40 kbytes, so sizeof buf is too big to fit in an int, and it needs an unsigned int.
The same thing can happen today if you use 32-bit int but allow programs to access more than 4 GB of RAM at a time, as is the case on what are called "I32LP64" models (32 bit int, 64-bit long and pointer). Here the type size_t will have the same range as unsigned long.
size_t is also used for casting pointers into unsigned integers of the same size, to perform calculations on pointers as if they were integers, which would otherwise be prevented at compile time (strictly speaking, uintptr_t is the type guaranteed to be wide enough for this). Such code is intended to compile and build correctly across different pointer sizes, e.g. a 32-bit model versus a 64-bit one.
It is implementation defined, but on 64-bit systems you will find that size_t is often 64-bit while int is still 32-bit (unless it's an ILP64 or SILP64 model).
Depending on what architecture you are on (16-bit, 32-bit or 64-bit), an int could be a different size.
If you want a specific size, use uint16_t or uint32_t. You can check out this thread for more information:
What does the C++ standard state the size of int, long type to be?
size_t is a typedef defined to store object size. It can store the maximum object size that is supported by a target platform. This makes it portable.
For example:
void * memcpy(void * destination, const void * source, size_t num);
memcpy() copies num bytes from source into destination. The maximum number of bytes that can be copied depends on the platform. So, making num of type size_t makes memcpy portable.
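For instance, a typical call passes a size_t computed by sizeof, which is guaranteed to fit any object's size (a minimal sketch):
#include <cstring>
int main()
{
    double src[100] = {};
    double dst[100];
    size_t num = sizeof src; // sizeof yields a size_t
    std::memcpy(dst, src, num);
}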
Refer https://stackoverflow.com/a/7706240/2820412 for further details.
size_t is a typedef for one of the fundamental unsigned integer types. It could be unsigned int, unsigned long, or unsigned long long depending on the implementation.
Its special property is that it can represent the size (in bytes) of any object, including the largest object possible. That is one of the reasons it is widely used in the standard library for array indexing and loop counting (it also solves the portability issue). Let me illustrate this with a simple example.
Consider a vector of length 2*UINT_MAX, where UINT_MAX denotes the maximum value of unsigned int (which is 4294967295 for my implementation considering 4 bytes for unsigned int).
std::vector<size_t> vec(2ULL*UINT_MAX, 0); // 2ULL avoids overflow in unsigned int arithmetic
If you would want to fill the vector using a for-loop such as this, it would not work, because unsigned int can only count up to UINT_MAX (beyond which it wraps around to 0).
for(unsigned int i = 0; i<2ULL*UINT_MAX; ++i) vec[i] = i; // i wraps back to 0 after UINT_MAX
The solution here is to use size_t since it is guaranteed to represent the size of any object (and therefore our vector vec too!) in bytes. Note that for my implementation size_t is a typedef for unsigned long and therefore its max value = ULONG_MAX = 18446744073709551615 considering 8 bytes.
for(size_t i = 0; i<2ULL*UINT_MAX; ++i) vec[i] = i;
References: https://en.cppreference.com/w/cpp/types/size_t
Say I have these two variables:
size_t value;
size_t size;
And I want to "cast" value to the size of size. So if size is 4, value is casted to be 4 bytes long. If size is 3, value is presumably truncated to 3 bytes long, preserving sign (assume a signed int may be loaded into value then taken out later to be cast back to signed) and stored in an int/uint depending on sign choice. Preferably with a method that would work to turn, for example, an unsigned long, or whatever other integral type, to any arbitrary size in bytes along with being signed/unsigned.
The cast to long is to preserve the sign, and long is supposed to be at least as big as size_t (though I think that's not actually true in MS compilers). If it's not true, pick a different signed type as big as size_t and replace the three references to long.
size_t casted = size_t((long(value) << (8 * (sizeof(long) - size))) >> (8 * (sizeof(long) - size)));
For an unsigned version use size_t instead of long.
This is untested.
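Here is a sketch of the same shift trick using a fixed-width type, which sidesteps the question of whether long is wide enough. Note that left-shifting a negative value is undefined before C++20, so the left shift is done on the unsigned representation; the arithmetic right shift that re-extends the sign is implementation-defined but behaves as expected on mainstream compilers. Names are illustrative:
#include <cstdint>
#include <cstddef>
// Truncate value to `size` bytes, then sign-extend back to 64 bits.
int64_t truncate_signed(int64_t value, size_t size)
{
    if (size == 0) return 0;
    if (size >= sizeof(int64_t)) return value; // nothing to truncate
    unsigned shift = 8 * unsigned(sizeof(int64_t) - size);
    uint64_t shifted = static_cast<uint64_t>(value) << shift; // defined for unsigned
    return static_cast<int64_t>(shifted) >> shift;            // sign-extends
}
For example, truncate_signed(-2, 1) yields -2, and truncate_signed(300, 1) yields 44 (the low byte 0x2C).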
It depends on what you mean by truncate. If your intent is just to clear the bytes beyond the truncation point to zero, you could probably get away with something like:
size_t mask[] = {0x00000000, 0xff000000, 0xffff0000, 0xffffff00, 0xffffffff};
value &= mask[size];
So, where size is zero, nothing is preserved. Where size is two, only the upper two bytes are preserved.
Obviously, this will depend on the actual widths of your data types so is implementation specific. But that's the case anyway since you're casting between size_t and other data types - those types are not necessarily compatible.
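If you'd rather not hard-code the table for one particular width, the mask can be computed from the type's actual size. This sketch keeps the answer's upper-bytes convention and explicitly avoids shifting by the type's full width, which would be undefined:
#include <cstddef>
// Mask preserving the upper `size` bytes of a size_t, as in the table above.
size_t upper_bytes_mask(size_t size)
{
    if (size == 0) return 0;
    if (size >= sizeof(size_t)) return ~static_cast<size_t>(0);
    return ~static_cast<size_t>(0) << (8 * (sizeof(size_t) - size));
}
Usage: value &= upper_bytes_mask(size); with a 32-bit size_t and size == 2, this reproduces 0xffff0000.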