Store an int in a char array? - c++

I want to store a 4-byte int in a char array... such that the first 4 locations of the char array are the 4 bytes of the int.
Then, I want to pull the int back out of the array...
Also, bonus points if someone can give me code for doing this in a loop... IE writing like 8 ints into a 32 byte array.
int har = 0x01010101;
char a[4];
int har2;
// write har into char such that:
// a[0] == 0x01, a[1] == 0x01, a[2] == 0x01, a[3] == 0x01 etc.....
// then, pull the bytes out of the array such that:
// har2 == har
Thanks guys!
EDIT: Assume int are 4 bytes...
EDIT2: Please don't worry about endianness... I will take care of that myself. I just want different ways to achieve the above in C/C++. Thanks
EDIT3: If you can't tell, I'm trying to write a serialization class on the low level... so I'm looking for different strategies to serialize some common data types.

Unless you care about byte order and such, memcpy will do the trick:
memcpy(a, &har, sizeof(har));
...
memcpy(&har2, a, sizeof(har2));
Of course, there's no guarantee that sizeof(int)==4 on any particular implementation (and there are real-world implementations for which this is in fact false).
Writing a loop should be trivial from here.
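For the "bonus points" loop, here is a minimal sketch of the same memcpy idea applied to several ints. The helper names write_ints and read_int are made up for illustration, and the bytes end up in host byte order, so this does nothing about endianness:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `count` ints into `out`, one after another,
// in whatever byte order the host uses.
void write_ints(char* out, std::size_t out_size,
                const int* values, std::size_t count) {
    assert(out_size >= count * sizeof(int));  // buffer must be big enough
    for (std::size_t i = 0; i < count; ++i)
        std::memcpy(out + i * sizeof(int), &values[i], sizeof(int));
}

// Hypothetical helper: read the idx-th int back out of the buffer.
int read_int(const char* in, std::size_t idx) {
    int v;
    std::memcpy(&v, in + idx * sizeof(int), sizeof v);
    return v;
}
```

With 4-byte ints, calling write_ints on a char buf[32] with 8 values fills the whole array, and read_int(buf, i) gives back the i-th value.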

Not the most optimal way, but is endian safe.
int har = 0x01010101;
char a[4];
a[0] = har & 0xff;
a[1] = (har>>8) & 0xff;
a[2] = (har>>16) & 0xff;
a[3] = (har>>24) & 0xff;
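To pull the value back out, the shifts can be reversed. This is only a sketch: load_le32 is a made-up name, and it assumes the bytes were stored least-significant byte first, exactly as in the snippet above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebuild a 32-bit value from 4 bytes that were
// stored least-significant byte first.
std::uint32_t load_le32(const char* a) {
    assert(a != nullptr);
    // Go through unsigned char first so bytes >= 0x80 are not sign-extended
    // on platforms where plain char is signed.
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a[0]))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(a[1])) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(a[2])) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(a[3])) << 24);
}
```

The result is the same on little-endian and big-endian hosts, because only values are shifted and memory layout never enters into it.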

#include <stdio.h>
int main(void) {
char a[sizeof(int)];
*((int *) a) = 0x01010101;
printf("%d\n", *((int *) a));
return 0;
}
Keep in mind:
A pointer to an object or incomplete type may be converted to a pointer to a different
object or incomplete type. If the resulting pointer is not correctly aligned for the
pointed-to type, the behavior is undefined.

Note: Accessing a union through an element that wasn't the last one assigned to is undefined behavior.
(assuming a platform where characters are 8bits and ints are 4 bytes)
A bit mask of 0xFF will mask off one character so
char arr[4];
int a = 5;
arr[3] = a & 0xff;
arr[2] = (a & 0xff00) >>8;
arr[1] = (a & 0xff0000) >>16;
arr[0] = (a & 0xff000000)>>24;
would make arr[0] hold the most significant byte and arr[3] hold the least.
Edit: just so you understand the trick, & is bitwise 'and' whereas && is logical 'and'.
Thanks to the comments about the forgotten shift.

int main() {
typedef union foo {
int x;
char a[4];
} foo;
foo p;
p.x = 0x01010101;
printf("%x ", p.a[0]);
printf("%x ", p.a[1]);
printf("%x ", p.a[2]);
printf("%x ", p.a[3]);
return 0;
}
Bear in mind that the a[0] holds the LSB and a[3] holds the MSB, on a little endian machine.

Don't use unions, Pavel clarifies:
It's U.B., because C++ prohibits
accessing any union member other than
the last one that was written to. In
particular, the compiler is free to
optimize away the assignment to int
member out completely with the code
above, since its value is not
subsequently used (it only sees the
subsequent read for the char[4]
member, and has no obligation to
provide any meaningful value there).
In practice, g++ in particular is
known for pulling such tricks, so this
isn't just theory. On the other hand,
using static_cast<void*> followed by
static_cast<char*> is guaranteed to
work.
– Pavel Minaev

You can also use placement new for this:
void foo (int i) {
char * c = new (&i) char[sizeof(i)];
}

#include <stdint.h>
#include <stdlib.h> /* malloc */
int main(int argc, char* argv[]) {
/* 8 ints in a loop */
int i;
int* intPtr;
int intArr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
char* charArr = malloc(32);
for (i = 0; i < 8; i++) {
intPtr = (int*) &(charArr[i * 4]);
/* ^ ^ ^ ^ */
/* point at | | | */
/* cast as int* | | */
/* Address of | */
/* Location in char array */
*intPtr = intArr[i]; /* write int at location pointed to */
}
/* Read ints out */
for (i = 0; i < 8; i++) {
intPtr = (int*) &(charArr[i * 4]);
intArr[i] = *intPtr;
}
char* myArr = malloc(13);
int myInt;
uint8_t* p8; /* unsigned 8-bit integer */
uint16_t* p16; /* unsigned 16-bit integer */
uint32_t* p32; /* unsigned 32-bit integer */
/* Using sizes other than 4-byte ints, */
/* set all bits in myArr to 1 */
p8 = (uint8_t*) &(myArr[0]);
p16 = (uint16_t*) &(myArr[1]);
p32 = (uint32_t*) &(myArr[5]);
*p8 = 255;
*p16 = 65535;
*p32 = 4294967295;
/* Get the values back out */
p16 = (uint16_t*) &(myArr[1]);
uint16_t my16 = *p16;
/* Put the 16 bit int into a regular int */
myInt = (int) my16;
}

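char a[10];
int i = 9;
std::string s = boost::lexical_cast<std::string>(i); // "9"
strcpy(a, s.c_str());
I found this is the best way to convert between an int and its character form.
An alternative to boost::lexical_cast is sprintf.
char temp[6]; // room for "hell9" plus the terminating '\0'
temp[0] = 'h';
temp[1] = 'e';
temp[2] = 'l';
temp[3] = 'l';
temp[4] = '\0';
sprintf(temp + 4, "%d", 9); // overwrites the '\0' with "9" and re-terminates
cout << temp;
The output would be: hell9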

union value {
int i;
char bytes[sizeof(int)];
};
value v;
v.i = 2;
char* bytes = v.bytes;

Related

Divide char* into few variables

I have a char array, char array[8], containing for example two ints: the first int is at indexes 0-3 and the second int is at indexes 4-7.
char array[8] = {0,0,0,1,0,0,0,1};
int a = array[0-3]; // =1;
int b = array[4-8]; // =1;
How to cast this array to two int's?
It could be any other type, not necessarily int; this is only an example.
I know I can copy this array into two char arrays of size 4 and then cast each of them to int. But I think this isn't nice, and it breaks the principle of clean code.
If your data has the correct endianness, you can extract blittable types from a byte buffer with memcpy:
int8_t array[8] = {0,0,0,1,0,0,0,1};
int32_t a, b;
memcpy(&a, array + 0, sizeof a);
memcpy(&b, array + 4, sizeof b);
While @Vivek is correct that ntohl can be used to normalize endianness, you have to do that as a second step. Do not play games with pointers, as that violates strict aliasing and leads to undefined behavior (in practice, either alignment exceptions or the optimizer discarding large portions of your code as unreachable).
int8_t array[8] = {0,0,0,1,0,0,0,1};
int32_t tmp;
memcpy(&tmp, array + 0, sizeof tmp);
int a = ntohl(tmp);
memcpy(&tmp, array + 4, sizeof tmp);
int b = ntohl(tmp);
Please note that almost all optimizing compilers are smart enough to not call a function when they see memcpy with a small constant count argument.
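Since the question mentions that the type need not be int, the memcpy approach can be wrapped in a small template. This is just a sketch; load_as is a name I made up, and the value comes out in host byte order, so any ntohl-style fix-up is still a separate step:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy sizeof(T) bytes starting at `offset` in `buf`
// into a fresh T. No pointer casts to T*, so no aliasing or alignment issues.
template <typename T>
T load_as(const void* buf, std::size_t offset = 0) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    assert(buf != nullptr);
    T value;
    std::memcpy(&value, static_cast<const unsigned char*>(buf) + offset,
                sizeof value);
    return value;
}
```

For the array in the question this would be used as load_as<std::int32_t>(array, 0) and load_as<std::int32_t>(array, 4).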
Let's use a little bit of the C++ algorithms, such as std::accumulate:
#include <numeric>
#include <iostream>
int getTotal(const char* value, int start, int end)
{
return std::accumulate(value + start, value + end, 0,
[](int n, char ch){ return n * 10 + (ch-'0');});
}
int main()
{
char value[8] = {'1','2','3','4','0','0','1','4'};
int total1 = getTotal(value, 0, 4);
int total2 = getTotal(value, 4, 8);
std::cout << total1 << " " << total2;
}
Note the usage of std::accumulate and the lambda function. All we did was have a running total, multiplying each subtotal by 10. The character is translated to a number by simply subtracting '0'.
You can cast the address of the bytes in the array to an int *. Dereferencing it then reads 4 bytes as an int, and calling ntohl ensures that the bytes in the int are arranged in host order.
char array[8] = {0,0,0,1,0,0,0,1};
int a = *((int *)array);
int b = *((int *)&array[4]);
a = ntohl(a);
b = ntohl(b);
This will set a and b to 1 on both little and big endian systems.
If the compiler is set for strict aliasing, memcpy could be used to achieve the same, as follows:
char array[8] = {0,0,0,1,0,0,0,1};
int a, b;
memcpy(&a, array, sizeof(int));
memcpy(&b, array+4, sizeof(int));
a = ntohl(a);
b = ntohl(b);

c++ - store byte[4] in an int

I want to take a byte array with 4 bytes in it, and store it in an int.
For example (non-working code):
unsigned char _bytes[4];
int * combine;
_bytes[0] = 1;
_bytes[1] = 1;
_bytes[2] = 1;
_bytes[3] = 1;
combine = &_bytes[0];
I do not want to use bit shifting to put the bytes in the int, I would like to point at the bytes memory and use them as an int if possible.
In Standard C++ it's not possible to do this reliably. The strict aliasing rule says that when you read through an expression of type int, it must actually designate an int object (or a const int etc.) otherwise it causes undefined behaviour.
However you can do the opposite: declare an int and then fill in the bytes:
int combine;
unsigned char *bytes = reinterpret_cast<unsigned char *>(&combine);
bytes[0] = 1;
bytes[1] = 1;
bytes[2] = 1;
bytes[3] = 1;
std::cout << combine << std::endl;
Of course, which value you get out of this depends on how your system represents integers. If you want your code to use the same mapping on different systems then you can't use memory aliasing; you'd have to use an equation instead.
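As an illustration of the "equation" route, here is a sketch that fixes the mapping by hand. The name combine_be and the choice that bytes[0] is the most significant byte are my assumptions; pick whichever order your format needs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combine 4 bytes into one value, treating bytes[0]
// as the most significant byte. The result does not depend on how the
// host lays out integers in memory.
std::uint32_t combine_be(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

This does use shifts, which the question wanted to avoid, but it is the portable way to get the same number on every system.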

Converting 4 bytes in little endian order into an unsigned integer

I have a string of 256*4 bytes of data. These 256* 4 bytes need to be converted into 256 unsigned integers. The order in which they come is little endian, i.e. the first four bytes in the string are the little endian representation of the first integer, the next 4 bytes are the little endian representation of the next integer, and so on.
What is the best way to parse through this data and merge these bytes into unsigned integers? I know I have to use bitshift operators but I don't know in what way.
Hope this helps you
unsigned int arr[256];
char ch[256*4] = "your string";
for(int i = 0,k=0;i<256*4;i+=4,k++)
{
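// go through unsigned char so bytes >= 0x80 are not sign-extended
arr[k] = (unsigned int)(unsigned char)ch[i]
| (unsigned int)(unsigned char)ch[i+1]<<8
| (unsigned int)(unsigned char)ch[i+2]<<16
| (unsigned int)(unsigned char)ch[i+3]<<24;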
}
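Alternatively, we can use C/C++ casting to interpret a char buffer as an array of unsigned int. This avoids the shifting, but the result then depends on the host's endianness, and the cast runs into the alignment and strict-aliasing issues mentioned elsewhere.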
#include <stdio.h>
int main()
{
char buf[256*4] = "abcd";
unsigned int *p_int = ( unsigned int * )buf;
unsigned short idx = 0;
unsigned int val = 0;
for( idx = 0; idx < 256; idx++ )
{
val = *p_int++;
printf( "idx = %d, val = %u \n", idx, val );
}
}
This would print out 256 values, the first one is
idx = 0, val = 1684234849
(and all remaining numbers = 0).
As a side note, "abcd" converts to 1684234849 because this was run on x86 (little endian), where "abcd" reads as 0x64636261 ('a' is 0x61 and 'd' is 0x64; in little endian, the least significant byte is at the lowest address). So 0x64636261 = 1684234849.
Note also, if using C++, reinterpret_cast should be used in this case:
const char *p_buf = "abcd";
const unsigned int *p_int = reinterpret_cast< const unsigned int * >( p_buf );
If your host system is little-endian, just read along 4 bytes, shift properly and copy them to int
unsigned char bytes[4] = { /* ... */ };
unsigned int i = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | ((unsigned int)bytes[3] << 24);
If your host is big-endian, do the same and reverse the bytes in the int, or reverse it on-the-fly while copying with bit-shifting, i.e. just change the indexes of bytes[] from 0-3 to 3-0
But you shouldn't even do that; if your PC is little-endian, just copy the whole char array to the int array:
#define LEN 256
char bytes[LEN*4] = "blahblahblah";
unsigned int uint[LEN];
memcpy(uint, bytes, sizeof bytes);
That said, the best way is to avoid copying at all and use the same array for both types
union
{
char bytes[LEN*4];
unsigned int uint[LEN];
} myArrays;
// copy data to myArrays.bytes[], do something with those bytes if necessary
// after populating myArrays.bytes[], get the ints by myArrays.uint[i]

C++ Bits in 64 bit integer

Hello I have a struct here that is 7 bytes and I'd like to write it to a 64 bit integer. Next, I'd like to extract out this struct later from the 64 bit integer.
Any ideas on this?
#include "stdafx.h"
struct myStruct
{
unsigned char a;
unsigned char b;
unsigned char c;
unsigned int someNumber;
};
int _tmain(int argc, _TCHAR* argv[])
{
myStruct * m = new myStruct();
m->a = 11;
m->b = 8;
m->c = 12;
m->someNumber = 30;
printf("\n%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
unsigned long num = 0;
// todo: use bitwise operations from m into num (total of 7 bytes)
printf("%s\t\t%i\n\n", "ulong", num);
m = new myStruct();
// todo: use bitwise operations from num into m;
printf("%s\t\t%i\t%i\t%i\t%i\n\n", "struct", m->a, m->b, m->c, m->someNumber);
return 0;
}
You should do something like this:
class structured_uint64
{
uint64_t data;
public:
structured_uint64(uint64_t x = 0):data(x) {}
operator uint64_t&() { return data; }
uint8_t low_byte(size_t n) const { return static_cast<uint8_t>(data >> (n * 8)); }
void low_byte(size_t n, uint8_t val) {
uint64_t mask = static_cast<uint64_t>(0xff) << (8 * n);
data = (data & ~mask) | (static_cast<uint64_t>(val) << (8 * n));
}
uint32_t hi_word() const { return static_cast<uint32_t>(data >> 24); }
// et cetera
};
(there is, of course, lots of room for variation on the details of the interface and where among the 64 bits the constituents are placed)
Using different types to alias the same portion of memory is generally a bad idea. The thing is, it's very valuable for the optimizer to be able to use reasoning like:
"Okay, I've read a uint64_t at the start of this block, and nowhere in the middle does the program write to any uint64_ts, therefore the value must be unchanged!"
which means it will get the wrong answer if you try to change the value of the uint64_t object through a uint32_t reference. And as this depends heavily on which optimizations are possible and actually performed, it is pretty easy to never run across the problem in test cases, but see it in the real program you're trying to write -- and you'll spend forever trying to find the bug because you convinced yourself it's not this problem.
So, you really should do the insertion/extraction of the fields with bit twiddling (or intrinsics, if profiling shows that this is a performance issue and there are useful ones available) rather than trying to set up a clever struct.
If you really know what you're doing, you can make the aliasing work, I believe. But it should only be done if you really know what you're doing, and that includes knowing relevant rules from the standard inside and out (which I don't, and so I can't advise you on how to make it work). And even then you probably shouldn't do it.
Also, if you intend your integral types to be a specific size, you should really use the correct types. For example, never use unsigned int for an integer that is supposed to be exactly 32 bits. Instead use uint32_t. Not only is it self-documenting, but you won't run into a nasty surprise when you try to build your program in an environment where unsigned int is not 32 bits.
Use a union. Each element of a union occupies the same address space. The struct is one element, the unsigned long long is another.
#include <stdio.h>
union data
{
struct
{
unsigned char a;
unsigned char b;
unsigned char c;
unsigned int d;
} e;
unsigned long long f;
};
int main()
{
data dat;
dat.f = 0xFFFFFFFFFFFFFFFF;
dat.e.a = 1;
dat.e.b = 2;
dat.e.c = 3;
dat.e.d = 4;
printf("f=%016llX\n",dat.f);
printf("%02X %02X %02X %08X\n",dat.e.a,dat.e.b,dat.e.c,dat.e.d);
return 0;
}
Output, but note one byte of the original unsigned long long remains. Compilers like to align data such as 4-byte integers on addresses divisible by 4, so three bytes, then a pad byte so the integer is at offset 4 and the struct has a total size of 8.
f=00000004FF030201
01 02 03 00000004
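If you want to see the padding on your own compiler rather than take the output above on faith, sizeof and offsetof will tell you. A small sketch (Fields and print_layout are names I picked; the numbers printed are implementation-defined):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Same members as the struct inside the union above.
struct Fields {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned int d;
};

// Prints where `d` lands and how big the whole struct is.
// A typical compiler with 4-byte unsigned int reports offset 4 and size 8,
// but that is not guaranteed.
void print_layout() {
    std::printf("offsetof(d)=%zu sizeof(Fields)=%zu alignof(unsigned int)=%zu\n",
                offsetof(Fields, d), sizeof(Fields), alignof(unsigned int));
}
```

Any gap between offset 3 and offsetof(Fields, d) is the padding, which is why one byte of the original value survives in the output above.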
This can be controlled in compiler-dependent fashion. Below is for Microsoft C++:
#include <stdio.h>
#pragma pack(push,1)
union data
{
struct
{
unsigned char a;
unsigned char b;
unsigned char c;
unsigned int d;
} e;
unsigned long long f;
};
#pragma pack(pop)
int main()
{
data dat;
dat.f = 0xFFFFFFFFFFFFFFFF;
dat.e.a = 1;
dat.e.b = 2;
dat.e.c = 3;
dat.e.d = 4;
printf("f=%016llX\n",dat.f);
printf("%02X %02X %02X %08X\n",dat.e.a,dat.e.b,dat.e.c,dat.e.d);
return 0;
}
Note the struct occupies seven bytes now and the highest byte of the unsigned long long is now unchanged:
f=FF00000004030201
01 02 03 00000004
Got it.
static unsigned long long compress(unsigned char a, unsigned char b, unsigned char c, unsigned int someNumber)
{
unsigned long long x = 0;
x = x | a;
x = x << 8;
x = x | b;
x = x << 8;
x = x | c;
x = x << 32;
x = x | someNumber;
return x;
}
myStruct * decompress(unsigned long long x)
{
// printBinary(x); // debug helper, not shown here
myStruct * m = new myStruct();
m->someNumber = x & 0xFFFFFFFF; // low 32 bits
x = x >> 32;
m->c = x & 0xFF; // mask with &, not |
x = x >> 8;
m->b = x & 0xFF;
x = x >> 8;
m->a = x & 0xFF;
return m;
}

Integer into char array

I need to convert integer value into char array on bit layer. Let's say int has 4 bytes and I need to split it into 4 chunks of length 1 byte as char array.
Example:
int a = 22445;
// this is in binary 00000000 00000000 01010111 10101101
...
//and the result I expect
char b[4];
b[0] = 0; //first chunk
b[1] = 0; //second chunk
b[2] = 87; //third chunk - in binary 1010111
b[3] = 173; //fourth chunk - 10101101
I need this conversion to be really fast, if possible without any loops (some tricks with bit operations perhaps). The goal is thousands of such conversions per second.
I'm not sure if I recommend this, but you can #include <stdint.h> and <arpa/inet.h> and write:
*(uint32_t *)b = htonl((uint32_t)a);
(The htonl is to ensure that the integer is in big-endian order before you store it.)
int a = 22445;
char *b = (char *)&a;
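char b2 = *(b+2); // = 87 on a big-endian machine (on little-endian, 87 is at b+1)
char b3 = *(b+3); // = 173 on a big-endian machine (on little-endian it is at b+0); reads as -83 if char is signed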
Depending on how you want negative numbers represented, you can simply convert to unsigned and then use masks and shifts:
unsigned char b[4];
unsigned ua = a;
b[0] = (ua >> 24) & 0xff;
b[1] = (ua >> 16) & 0xff;
b[2] = (ua >> 8) & 0xff;
b[3] = ua & 0xff;
(Due to the C rules for converting negative numbers to unsigned, this will produce the twos complement representation for negative numbers, which is almost certainly what you want).
To access the binary representation of any type, you can cast a pointer to a char-pointer:
T x; // anything at all!
// In C++
unsigned char const * const p = reinterpret_cast<unsigned char const *>(&x);
/* In C */
unsigned char const * const p = (unsigned char const *)(&x);
// Example usage:
for (std::size_t i = 0; i != sizeof(T); ++i)
std::printf("Byte %zu is 0x%02X.\n", i, p[i]);
That is, you can treat p as the pointer to the first element of an array unsigned char[sizeof(T)]. (In your case, T = int.)
I used unsigned char here so that you don't get any sign extension problems when printing the binary value (e.g. through printf in my example). If you want to write the data to a file, you'd use char instead.
You have already accepted an answer, but I will still give mine, which might suit you better (or the same...). This is what I tested with:
int a[3] = {22445, 13, 1208132};
for (int i = 0; i < 3; i++)
{
unsigned char * c = (unsigned char *)&a[i];
cout << (unsigned int)c[0] << endl;
cout << (unsigned int)c[1] << endl;
cout << (unsigned int)c[2] << endl;
cout << (unsigned int)c[3] << endl;
cout << "---" << endl;
}
...and it works for me. Now I know you requested a char array, but this is equivalent. You also requested that c[0] == 0, c[1] == 0, c[2] == 87, c[3] == 173 for the first case, here the order is reversed.
Basically, you use the SAME value, you only access it differently.
Why haven't I used htonl(), you might ask?
Well since performance is an issue, I think you're better off not using it because it seems like a waste of (precious?) cycles to call a function which ensures that bytes will be in some order, when they could have been in that order already on some systems, and when you could have modified your code to use a different order if that was not the case.
So instead, you could have checked the order before, and then used different loops (more code, but improved performance) based on what the result of the test was.
Also, if you don't know if your system uses a 2 or 4 byte int, you could check that before, and again use different loops based on the result.
Point is: you will have more code, but you will not waste cycles in a critical area, which is inside the loop.
If you still have performance issues, you could unroll the loop (duplicate code inside the loop, and reduce loop counts) as this will also save you a couple of cycles.
Note that using c[0], c[1] etc.. is equivalent to *(c), *(c+1) as far as C++ is concerned.
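To make the "check the order before" idea concrete, here is a rough sketch. The function names are mine, and it assumes the host is either plain little-endian or plain big-endian with a 4-byte unsigned type (std::uint32_t):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte is stored first.
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);  // look at the byte at the lowest address
    return first == 1;
}

// Hypothetical helper: write `v` to out[0..3], most significant byte first,
// picking the copy strategy from the result of the check.
void to_big_endian_bytes(std::uint32_t v, unsigned char out[4]) {
    assert(out != nullptr);
    unsigned char raw[4];
    std::memcpy(raw, &v, sizeof raw);
    if (host_is_little_endian()) {
        for (int i = 0; i < 4; ++i) out[i] = raw[3 - i];  // reverse the bytes
    } else {
        std::memcpy(out, raw, sizeof raw);                // already in order
    }
}
```

In a hot loop you would call host_is_little_endian() once up front and branch to the matching loop, rather than testing on every conversion as this sketch does.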
typedef union{
byte intAsBytes[4];
int int32;
}U_INTtoBYTE;