Can I use an array of uchars as a single uchar? - C++

I'm making an LZW compressor that records its output in hexadecimal. It currently uses a uchar (OpenCV) for storing values, and outputs each uchar in hexadecimal.
However, I have been asked to let the user choose how many bytes are used to store each value, so they could have, for example, 2 bytes per value (or 32 bytes; it's up to them).
So, to handle the output, I was thinking of using an array of uchars (if the user asks for 32 bytes, I use an array of 32 uchars). The question is: is there an easy way to write a big value into this array and output it later without having to worry about which byte lands in which index? That is, can I treat the array as just an x-byte uchar? Should I use a vector?
Any help is appreciated.

You could use the following union
union pun_unsigned {
    unsigned char c[sizeof(uint64_t)];
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};
Note that only conversions from or to (signed or unsigned) char are defined behaviour: after writing one of the integer members, reading the bytes back through c is fine, but reading a different integer member than the one last written is not.
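For example, a minimal sketch of using it that way (the bytes come out in host order; strictly, C++ only blesses reading back through the char member after writing an integer member):

#include <cstdint>
#include <cstdio>

union pun_unsigned {
    unsigned char c[sizeof(uint64_t)];
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};

int main() {
    pun_unsigned p;
    p.u32 = 0xDEADBEEFu;                  // write the value once
    for (unsigned i = 0; i < sizeof(uint32_t); ++i)
        std::printf("%02x ", p.c[i]);     // emit it byte by byte (host order)
    return 0;
}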

Related

Reading a char array as a float array in C++, without having to copy it using memcpy

I have a question really similar to this:
Building a 32-bit float out of its 4 composite bytes.
Specifically I have an array of unsigned char composed by 8 elements:
unsigned char c[8] = {0b01001000, 0b11100001, 0b00100110, 0b01000001,
                      0b01111011, 0b00010100, 0b10000110, 0b01000000};
This, under the little-endian convention, corresponds to two floats, namely { 10.4300f, 4.19000f }.
I know that I could obtain the latter with:
float f[2];
memcpy(&f, &c, sizeof(f));
// f = { 10.4300f, 4.19000f }
But this involves a copy.
Is there a way to cast the c array inplace, changing its type so that I can avoid copying?
Is there a way to cast the c array inplace
No. However, if the array is sufficiently aligned to hold a float, what you can do after memcpy is to placement-new a copy of that float onto the array.
Optimisers are smart, and typically know that you copied the same value back. Sometimes two copies in the abstract machine result in zero copies on the CPU.
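A minimal sketch of that memcpy-plus-placement-new idea (as_float is a hypothetical helper; it assumes the buffer really is aligned for float, e.g. declared with alignas(float)):

#include <cstring>
#include <new>

// Legally reuse suitably aligned char storage as a float.
float *as_float(unsigned char *buf) {
    float tmp;
    std::memcpy(&tmp, buf, sizeof tmp); // defined way to read the bytes
    return new (buf) float(tmp);        // recreate the same value in place
}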
This, with a little endianness convention corresponds
I know that I could obtain the latter with
Note that memcpy will always produce native byte order, so you only get a little-endian result on little-endian systems. The assumption that the data can be interpreted as little-endian is therefore not portable.
If you want to avoid assuming native endianness, you'll need to read the bytes in the correct order, shift/mask them into an unsigned integer, and then memcpy (or bit_cast) that integer into a float.
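A sketch of that portable approach, assuming the platform's float is 32-bit IEEE-754 (read_float_le is a hypothetical helper):

#include <cstdint>
#include <cstring>

// Decode a float stored as 4 little-endian bytes, independent of host order.
float read_float_le(const unsigned char *p) {
    std::uint32_t u = std::uint32_t(p[0])
                    | (std::uint32_t(p[1]) << 8)
                    | (std::uint32_t(p[2]) << 16)
                    | (std::uint32_t(p[3]) << 24);
    float f;
    std::memcpy(&f, &u, sizeof f); // or std::bit_cast<float>(u) in C++20
    return f;
}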

Is there an actual 8-bit integer data type in C++

In C++, specifically in the cstdint header, there are types for 8-bit integers which turn out to be typedefs for the char types. Could anyone suggest an actual 8-bit integer type?
Yes, you are right: int8_t and uint8_t are typedefs for the char types on platforms where 1 byte is 8 bits. On platforms where it is not, the exact-width types are simply not defined; they are optional in the standard.
The following answer is based on the assumption that char is 8 bits.
char holds 1 byte, which may be signed or unsigned depending on the implementation.
So int8_t is signed char and uint8_t is unsigned char, and it is safe to use int8_t/uint8_t as actual 8-bit integers without relying too much on the implementation.
From an implementer's point of view, typedefing them to the char types where char is 8 bits makes sense.
Having seen all this, it is safe to use int8_t or uint8_t as a real 8-bit integer.
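If you want to make that assumption explicit at compile time, a minimal sketch:

#include <climits>
#include <cstdint>

// Make the 8-bit-byte assumption explicit; compilation fails otherwise.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(std::int8_t) == 1, "int8_t must be exactly one byte");

std::uint8_t counter = 200; // behaves as a genuine unsigned 8-bit integer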

dealing with endianness in c++

I am working on translating a system from python to c++. I need to be able to perform actions in c++ that are generally performed by using Python's struct.unpack (interpreting binary strings as numerical values). For integer values, I am able to get this to (sort of) work, using the data types in stdint.h:
struct.unpack("i", str) ==> *(int32_t*) str; //str is a char* containing the data
This works properly for little-endian binary strings, but fails on big-endian binary strings. Basically, I need an equivalent to using the > tag in struct.unpack:
struct.unpack(">i", str) ==> ???
Please note, if there is a better way to do this, I am all ears. However, I cannot use c++11, nor any 3rd party libraries other than Boost. I will also need to be able to interpret floats and doubles, as in struct.unpack(">f", str) and struct.unpack(">d", str), but I'll get to that when I solve this.
NOTE I should point out that the endianness of my machine is irrelevant in this case. I know that the bitstream I receive in my code will ALWAYS be big-endian, and that's why I need a solution that will always cover the big-endian case. The article pointed out by BoBTFish in the comments seems to offer a solution.
For 32 and 16-bit values:
This is exactly the problem you have with network data, which is big-endian. You can use ntohl to turn a 32-bit value into host order, little-endian in your case.
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
int res = ntohl(*((int32_t *) str));
This also takes care of the case where your host is big-endian, in which case ntohl simply does nothing.
For 64-bit values
Non-standardly, on Linux/BSD you can take a look at 64 bit ntohl() in C++?, which points to htobe64.
These functions convert the byte encoding of integer values from the byte order that the current CPU (the "host") uses, to and from little-endian and big-endian byte order.
For windows try: How do I convert between big-endian and little-endian values in C++?
which points to _byteswap_uint64, as well as 16- and 32-bit solutions and the GCC-specific __builtin_bswap32/__builtin_bswap64 calls.
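If you'd rather not depend on those platform-specific calls at all, a hedged sketch that assembles the bytes yourself (read_be64 is a hypothetical helper; it works regardless of host byte order):

#include <cstdint>

// Decode 8 big-endian bytes into a 64-bit integer, most significant byte first.
std::uint64_t read_be64(const unsigned char *p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}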
Other Sizes
Most systems don't have values that aren't 16/32/64 bits long. At that point I might try to store it in a 64-bit value, shift it, and then translate. I'd write some good tests; I suspect it is an uncommon situation, and more details would help.
Unpack the string one byte at a time.
unsigned char *str;   // points at the four big-endian bytes
unsigned int result;
result  = (unsigned int)*str++ << 24;  // cast so the byte isn't shifted into the sign bit of int
result |= (unsigned int)*str++ << 16;
result |= (unsigned int)*str++ << 8;
result |= (unsigned int)*str++;
First, the cast you're doing:
char *str = ...;
int32_t i = *(int32_t*)str;
results in undefined behavior due to the strict aliasing rule (unless str is initialized with something like int32_t x; char *str = (char*)&x;). In practical terms that cast can result in an unaligned read which causes a bus error (a crash) on some platforms and slow performance on others.
Instead you should be doing something like:
int32_t i;
std::memcpy(&i, str, sizeof(i));
There are a number of functions for swapping bytes between the host's native byte ordering and a host independent ordering: ntoh*(), hton*(), where * is nothing, l, or s for the different types supported. Since different hosts may have different byte orderings then this may be what you want to use if the data you're reading uses a consistent serialized form on all platforms.
i = ntohl(i);
You can also manually move bytes around in str before copying it into the integer.
std::swap(str[0],str[3]);
std::swap(str[1],str[2]);
std::memcpy(&i,str,sizeof(i));
Or you can manually manipulate the integer's value using shifts and bitwise operators.
std::memcpy(&i,str,sizeof(i));
i = (i&0xFFFF0000)>>16 | (i&0x0000FFFF)<<16;
i = (i&0xFF00FF00)>>8 | (i&0x00FF00FF)<<8;
This falls in the realm of bit twiddling.
for (i=0;i<sizeof(struct foo);i++) dst[i] = src[i ^ mask];
where mask == (sizeof type -1) if the stored and native endianness differ.
With this technique one can convert a struct to bit masks:
struct foo {
    byte a, b;  // mask = 0,0
    short e;    // mask = 1,1
    int g;      // mask = 3,3,3,3
    double i;   // mask = 7,7,7,7,7,7,7,7
} s; // notice that all units must be aligned according to their native size
Again these masks can be encoded with two bits per symbol: (1<<n)-1, meaning that on 64-bit machines one can encode the necessary masks of a 32-byte struct in a single constant (with 1-, 2-, 4- and 8-byte alignments).
unsigned int mask = 0xffffaa50; // or zero if the endianness matches
for (i = 0; i < 16; i++) {
    dst[i] = src[i ^ ((1 << (mask & 3)) - 1)];
    mask >>= 2;
}
If the values you receive are truly strings (char* or std::string) and you know their format, sscanf() and atoi(), really the whole ato*() family, will be your friends. They take well-formatted strings and convert them per passed-in formats (a kind of reverse printf).
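A minimal sketch of that textual case (the input string and format are purely illustrative):

#include <cstdio>

void parse_example() {
    const char *text = "42 3.14"; // hypothetical well-formatted input
    int n;
    double d;
    if (std::sscanf(text, "%d %lf", &n, &d) == 2) {
        // n == 42, d == 3.14
    }
}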

Bit setting question

In C or C++, it's apparently possible to restrict the number of bits a variable has, so for example:
unsigned char A:1;
unsigned char B:3;
I am unfamiliar however with how it works specifically, so a number of questions:
If I have a class with the following variables:
unsigned char A:1;
unsigned char B:3;
unsigned char C:1;
unsigned char D:3;
What is the above technique actually called?
Is the above class four bytes in size, or one byte in size?
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
Is there someway of combining the bits to a centralised byte? So for example:
unsigned char MainByte;
unsigned char A:1; //Can this be made to point at the first bit in MainByte?
unsigned char B:3; //Etc etc
unsigned char C:1;
unsigned char D:3;
Is there an article that covers this topic in more depth?
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
Feel free to mention any other considerations (like compiler restrictions or other limitations).
Thank you.
What is the above technique actually called?
Bitfields. And you're only supposed to use int (signed, unsigned or otherwise) as the "type", not char.
Is the above class four bytes in size, or one byte in size?
Neither. It is probably sizeof(int) because the compiler generates a word-sized object. The actual bitfields will be stored within a byte, however. It'll just waste some space.
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
They represent only the bits specified, and will be packed as tightly as possible.
Is there someway of combining the bits to a centralised byte? So for example:
Use a union:
struct bits {
    unsigned A:1;
    unsigned B:3;
    unsigned C:1;
    unsigned D:3;
};
union SplitByte {
    struct bits Bits;
    unsigned char Byte[sizeof(struct bits)];
    /* the array is a trick so the two fields
       are guaranteed to be the same size and
       thus be aligned on the same boundary */
} SplitByteObj;
// access the byte
SplitByteObj.Byte[0]
// access a bitfield
SplitByteObj.Bits.B
Note that there are problems with bitfields. For example, adjacent bitfields share a memory location and cannot be accessed independently, so you may get errors if you try to use a mutex to guard each of them. Also, the order in which the fields are laid out is not clearly specified by the standard. Many people prefer to use bitwise operators to implement bitfields manually for that reason, as sketched below.
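For example, a minimal sketch of the manual approach for the B:3 field above (the bit positions are our own choice, which is exactly the point):

#include <cstdint>

// Manual bitfield: B occupies bits 1-3 of the byte.
inline std::uint8_t get_B(std::uint8_t b) {
    return (b >> 1) & 0x7;
}
inline std::uint8_t set_B(std::uint8_t b, std::uint8_t value) {
    return std::uint8_t((b & ~(0x7u << 1)) | ((value & 0x7u) << 1));
}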
Is there an article that covers this topic in more depth?
Not many. The first few results you'll get when you Google it are about all you'll find. They're not a widely used construct. You're best off nitpicking the standard to figure out exactly how they work so that you don't get bitten by a weird edge case. I couldn't tell you exactly where in the standard they're specified.
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
It's not, but I've addressed this already.
These are bit-fields.
The details of how these fields are arranged in memory are largely implementation-defined. Typically, you will find that the compiler packs them in some way. But it may take various alignment issues into account.

How do I define a 24-bit array in C++?

How do I define a 24-bit array in C++? (variable declaration)
There is no 24-bit variable type in C++.
You can use a bitpacked struct:
struct ThreeBytes {
    uint32_t value:24;
};
But it is not guaranteed that sizeof ThreeBytes == 3.
You can also just use uint32_t or int32_t, depending on what you need.
Another choice is to use std::bitset:
typedef std::bitset<24> ThreeBytes;
Then make an array out of that:
ThreeBytes *myArray = new ThreeBytes[10];
Of course, if you really just need "three bytes", you can make an array of arrays:
typedef uint8_t ThreeBytes[3];
Note that uint8_t and friends are optional exact-width types, and are used here simply for clarification.
An unsigned byte array of 3 bytes is 24 bits. Depending on how you are planning to use it, it could do.
unsigned char arrayname[3];
As @GMan points out, you should be aware that not all systems have 8-bit chars.
If you intend to perform bitwise operations on them, then simply use an integral type that has at least 24 bits. An int is 32 bits on most platforms, so an int may be suitable for this purpose.
EDIT: Since you actually wanted an array of 24-bit variables, the most straightforward way to do this is to create an array of ints or longs (as long as it's an integral data type that contains at least 24 bits) and treat each element as though it were 24 bits.
Depending on your purpose (for example, if you are concerned that a 32-bit type might waste too much memory), you might also consider creating an array of bytes with three times the length.
I used to do that a lot for storing RGB images. To access the n'th element, you multiply by three and then add zero, one or two depending on which "channel" of the element you want. Of course, if you want to access all 24 bits as one integer, this approach requires some additional arithmetic.
So simply unsigned char myArray[ELEMENTS * 3]; where ELEMENTS is the number of 24-bit elements that you want.
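A sketch of that indexing (get24/set24 are hypothetical helpers; the byte order within each element is an arbitrary choice here):

#include <cstddef>
#include <cstdint>

// Pack/unpack the n'th 24-bit element of a byte array, little-endian per element.
std::uint32_t get24(const unsigned char *a, std::size_t n) {
    const unsigned char *p = a + n * 3;
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8)
                               | (std::uint32_t(p[2]) << 16);
}

void set24(unsigned char *a, std::size_t n, std::uint32_t v) {
    unsigned char *p = a + n * 3;
    p[0] = v & 0xFF;
    p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF;
}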
Use bitset or bitvector if they are supported on your platform. (They are :) )
std::vector<std::bitset<24> > myArray;