I have a buffer unsigned char table[512] that I want to convert, as fast as possible, into a table of unsigned short int table[256], where each entry is composed of two bytes of the original table.
I have a camera that gives me this buffer; it is the table used to convert disparity values into real depth.
unsigned char zDtable[512] = {0};
unsigned short int zDTableHexa[256]={0};
.. get the buffer data.....
for (int i = 0; i < 256; ++i) {
    zDTableHexa[i] = (zDtable[i*2] << 8) + zDtable[i*2+1];
}
These two alternatives convert the values incorrectly; the bytes come out reversed:
memcpy(zDTableHexa_ptr,zDtable,256*sizeof( unsigned short int));
unsigned short* zDTableHexa = (unsigned short*)zDtable;
Try something like this
short* zDTableHexa = (short*)zDtable;
It simply maps the memory of the char array onto an array of shorts. So if the memory looks like this:
(char0),(char1),(char2),(char3)
then it will be reinterpreted to be
(short0 = char0,char1),(short1 = char2,char3)
Beware that such direct reinterpretation depends on endianness and formally allows a sufficiently pedantic compiler to do ungood things, i.e., it's system- and compiler-specific.
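If you need the bytes combined in a defined order no matter what the host's endianness is, a portable alternative is to assemble each value arithmetically instead of reinterpreting memory. This is only a sketch, and it assumes the camera delivers the high byte first, as the original loop does:

#include <cstdio>

int main() {
    unsigned char zDtable[512] = {0};        // filled by the camera in reality
    unsigned short zDTableHexa[256] = {0};

    // Assemble each 16-bit value from two bytes; pure arithmetic, so the
    // result is the same on little- and big-endian hosts.
    for (int i = 0; i < 256; ++i) {
        zDTableHexa[i] = static_cast<unsigned short>(
            (static_cast<unsigned>(zDtable[2 * i]) << 8) | zDtable[2 * i + 1]);
    }

    std::printf("%u\n", zDTableHexa[0]);     // 0 for the all-zero dummy table
}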
Related
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
so if the memory looks like this:
0000 0000 0000 0000 0000 0000 0000 0001
The program outputs 0.
How do I make it output a uint value of the entire 32 bits?
Because you are relying on integer promotion: a char promotes to int when it is used in an expression like this, and you'll get no diagnostic for it. So what you are doing is dereferencing the first element of your char array, which is 0, and assigning it to an int, which likewise ends up being 0.
What you want to do is technically undefined behavior but generally works. You want to do this:
unsigned int j = *reinterpret_cast<unsigned int*>(f);
At this point you'll be dealing with undefined behavior and with the endianness of the platform. You probably do not have the value you want recorded in your byte stream. You're treading in territory that requires intimate knowledge of your compiler and your target architecture.
Assuming your platform supports 32-bit integers, you can do the following to achieve the kind of cast you want:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
uint32_t j;
memcpy(&j,f,sizeof(j));
printf("%u\n", j);
Be aware of endianness in the integer representation.
In order to ensure that your code works on both little endian and big endian systems, you could do the following:
char f[4] = {0,0,0,1};
int32_t j = *((int32_t *)f);
j=ntohl(j);
printf("%d", j);
This will print 1 on both little endian and big endian systems. Without using ntohl, 1 will only be printed on Big Endian systems.
The code works because f is being assigned values in the same way as in a Big Endian System. Since network order is also Big Endian, ntohl will correctly convert j. If the host is Big Endian, j will remain unchanged. If the host is Little Endian, the bytes in j will be reversed.
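For completeness, a minimal compilable version of the ntohl approach might look like this. It is a sketch that assumes a POSIX system, where ntohl is declared in <arpa/inet.h>, and it uses memcpy rather than the pointer cast to sidestep the aliasing issue discussed below:

#include <arpa/inet.h>  // ntohl() on POSIX; <winsock2.h> provides it on Windows
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    unsigned char f[4] = {0, 0, 0, 1};   // big-endian (network order) encoding of 1
    std::uint32_t j;
    std::memcpy(&j, f, sizeof j);        // copy the raw bytes into an integer
    j = ntohl(j);                        // network (big-endian) order -> host order
    std::printf("%u\n", static_cast<unsigned>(j));  // prints 1 on either endianness
}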
What happens in the line:
unsigned int j = *f;
is simply assigning the first element of f to the integer j. It is equivalent to:
unsigned int j = f[0];
and since f[0] is 0 it is really just assigning a 0 to the integer:
unsigned int j = 0;
You will have to convert the elements of f.
Reinterpretation will always cause undefined behavior. The following example shows such usage and it is always incorrect:
unsigned int j = *( unsigned int* )f;
Undefined behavior may produce any result, even apparently correct ones. Even if such code appears to produce correct results when you run it for the first time, this isn't proof that the program is defined. The program is still undefined, and may produce incorrect results at any time.
There is no such thing as technically undefined behavior or generally works, the program is either undefined or not. Relying on such statements is dangerous and irresponsible.
Luckily we don't have to rely on such bad code.
All you need to do is choose the representation of the integer that will be stored in f, and then convert it. It appears you want to store in big-endian, with at most 8 bits per element. This doesn't mean that the machine must be big-endian, only the representation of the integer you're encoding in f. Representation of integers on the machine is not important, as this method is completely portable.
This means the most significant byte will appear first. The most significant byte is f[0], and the least significant byte is f[3].
We will need an integer type capable of storing at least 32 bits, and unsigned long does this.
Type char is for storing characters, not integers. An unsigned integer type like unsigned char should be used.
Then only the conversion from big-endian encoded in f must be done:
unsigned char encoded[4] = { 0 , 0 , 0 , 1 };
unsigned long value = 0;
value = value | ( ( ( unsigned long )encoded[0] & 0xFF ) << 24 );
value = value | ( ( ( unsigned long )encoded[1] & 0xFF ) << 16 );
value = value | ( ( ( unsigned long )encoded[2] & 0xFF ) << 8 );
value = value | ( ( ( unsigned long )encoded[3] & 0xFF ) << 0 );
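Wrapped into a function with a small test, the same technique might look like this (the function name decode_be32 is invented here for illustration):

#include <cstdio>

// Decode 4 bytes stored most-significant-first into an unsigned long.
unsigned long decode_be32(const unsigned char encoded[4]) {
    unsigned long value = 0;
    value |= ((unsigned long)encoded[0] & 0xFF) << 24;
    value |= ((unsigned long)encoded[1] & 0xFF) << 16;
    value |= ((unsigned long)encoded[2] & 0xFF) << 8;
    value |= ((unsigned long)encoded[3] & 0xFF) << 0;
    return value;
}

int main() {
    unsigned char encoded[4] = {0, 0, 0, 1};
    std::printf("%lu\n", decode_be32(encoded));  // prints 1 on any machine
}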
Regarding the posted code:
char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
In C, the return type of malloc() is void*, which can be assigned to any other pointer type, so the cast just clutters the code and can cause problems during later maintenance.
The C standard defines sizeof(char) as 1, so that expression has absolutely no effect as part of the expression passed to malloc().
The size of an int is not necessarily 4 (think of 16-bit microprocessors or 64-bit architectures).
The function calloc() will preset all the bytes to 0x00.
Which byte should be set to 0x01 depends on the endianness of the underlying architecture.
Let's assume, for now, that your computer is a little-endian architecture (i.e., Intel or similar).
Then the code should look similar to the following:
#include <stdio.h>  // printf(), perror()
#include <stdlib.h> // calloc(), exit(), EXIT_FAILURE

int main( void )
{
    char *f = calloc( 1, sizeof(unsigned int) );
    if( !f )
    {
        perror( "calloc failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, calloc successful

    // f[sizeof(unsigned int)-1] = 0x01; // if big Endian
    f[0] = 0x01; // assume little Endian/Intel x86 or similar

    unsigned int j = *(unsigned int*)f;
    printf("%u\n", j);
}
Which when compiled/linked, outputs the following:
1
I could not fully understand the consequences of what I read here: Casting an int pointer to a char ptr and vice versa
In short, would this work?
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;
    if ((uintmax_t)buffer % 4) { // misaligned
        for (int i = 0; i < 4; i++) {
            buffer[i] = 0xff;
        }
    } else { // 4-byte alignment
        *((uint32_t*) buffer) = MASK;
    }
}
Edit
There was a long discussion (it was in the comments, which mysteriously got deleted) about what type the pointer should be cast to in order to check the alignment. The subject is now addressed here.
This conversion is safe if you are filling the same value into all 4 bytes. If byte order matters, then this conversion is not safe,
because when you use an integer to fill 4 bytes at a time, it fills all 4 bytes, but their order depends on the endianness.
No, it won't work in every case. Aside from endianness, which may or may not be an issue, you assume that the alignment of uint32_t is 4. But this quantity is implementation-defined (C11 Draft N1570 Section 6.2.8). You can use the _Alignof operator to get the alignment in a portable way.
Second, the effective type (ibid. Sec. 6.5) of the location pointed to by buffer may not be compatible to uint32_t (e.g. if buffer points to an unsigned char array). In that case you break strict aliasing rules once you try reading through the array itself or through a pointer of different type.
Assuming that the pointer actually points to an array of unsigned char, the following code will work
typedef union { unsigned char chr[sizeof(uint32_t)]; uint32_t u32; } conv_t;

void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffffU;
    if ((uintptr_t)buffer % _Alignof(uint32_t)) { // misaligned
        for (size_t i = 0; i < sizeof(uint32_t); i++) {
            buffer[i] = 0xffU;
        }
    } else { // correct alignment
        conv_t *cnv = (conv_t *) buffer;
        cnv->u32 = MASK;
    }
}
This code might be of help to you. It shows a 32-bit number being built by assigning its contents a byte at a time, forcing misalignment. It compiles and works on my machine.
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

int main () {
    uint32_t *data = (uint32_t*)malloc(sizeof(uint32_t)*2);
    char *buf = (char*)data;
    uintptr_t addr = (uintptr_t)buf;
    int i,j;
    i = !(addr%4) ? 1 : 0;
    uint32_t x = (1<<6)-1;
    for( j=0;j<4;j++ ) buf[i+j] = ((char*)&x)[j];
    printf("%" PRIu32 "\n",*((uint32_t*) (addr+i)) );
}
As mentioned by @Learner, endianness must be obeyed. The code above is not portable and would break on a big endian machine.
Note that my compiler throws the error "cast from ‘char*’ to ‘unsigned int’ loses precision [-fpermissive]" when trying to cast a char* to an unsigned int, as done in the original post. This post explains that uintptr_t should be used instead.
In addition to the endian issue, which has already been mentioned here:
CHAR_BIT - the number of bits per char - should also be considered.
It is 8 on most platforms, so for (int i=0; i<4; i++) should work fine there.
A safer way of doing it would be for (int i=0; i<sizeof(uint32_t); i++).
Alternatively, you can include <limits.h> and use for (int i=0; i<32/CHAR_BIT; i++).
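For instance, a fill loop written against sizeof rather than a literal 4 might look like this minimal sketch (the helper name is made up here):

#include <cstdint>
#include <cstddef>

// Set one uint32_t's worth of bytes to 0xFF without hard-coding "4".
void fillMaskBytes(unsigned char* buffer) {
    for (std::size_t i = 0; i < sizeof(std::uint32_t); ++i)
        buffer[i] = 0xFF;
}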
Use reinterpret_cast<>() if you want to ensure the underlying data does not "change shape".
As Learner has mentioned, when you store data in machine memory, endianness becomes a factor. If you know how the data is stored correctly in memory (correct endianness) and you are specifically testing its layout as an alternate representation, then you would want to use reinterpret_cast<>() to test that memory, as a specific type, without modifying the original storage.
Below, I've modified your example to use reinterpret_cast<>():
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;
    if (reinterpret_cast<uintptr_t>(buffer) % 4) { // misaligned
        for (int i = 0; i < 4; i++) {
            buffer[i] = 0xff;
        }
    } else { // 4-byte alignment
        *reinterpret_cast<unsigned int *>(buffer) = MASK;
    }
}
It should also be noted that your function appears to set the buffer (4 bytes, i.e., 32 bits of contiguous memory) to 0xFFFFFFFF, regardless of which branch it takes.
Your code is fine for any architecture with 32-bit integers and up. There is no issue with byte ordering, since all your source bytes are 0xFF.
On x86 or x64 machines, the extra work needed to handle occasional unaligned accesses to RAM is managed by the CPU and is transparent to the programmer (since the Pentium II), at some performance cost per access. So, if you are just setting the first four bytes of a buffer a few times, you can simplify your function:
void set4Bytes(unsigned char* buffer) {
    const uint32_t MASK = 0xffffffff;
    *((uint32_t *)buffer) = MASK;
}
Some readings:
A Linux kernel doc about UNALIGNED MEMORY ACCESSES
Intel Architecture Optimization Manual, section 3.4
Windows Data Alignment on IPF, x86, and x64
A Practical 'Aligned vs. unaligned memory access', by Alexander Sandler
I have to read 10 bytes from a file, and the last 4 bytes are an unsigned integer. But what I have is an 11-byte-long char array / pointer. How do I convert the last 4 bytes (not counting the zero terminator at the end) to an unsigned integer?
// pseudo code
char *p = readBytesFromFile();
unsigned int myInt = 0;
for( int i = 6; i < 10; i++ )
    myInt += (int)p[i];
Is that correct? Doesn't seem correct to me.
The following code might work:
myInt = *(reinterpret_cast<unsigned int*>(p + 6));
iff:
There are no alignment problems (e.g. in GPU memory space this is very likely to blow up if certain guarantees aren't provided).
You can guarantee that the system endianness is the same used to store the data.
You can be sure that sizeof(int) == 4; this is not guaranteed everywhere.
If not, as Dietmar suggested, you should loop over your data (forward or reverse according to the endianness) and do something like
myInt = myInt << 8 | static_cast<unsigned char>(p[i])
this is alignment-safe (it should be on every system). Still pay attention to points 1 and 3.
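A minimal sketch of that loop for the case at hand, assuming the four bytes at p[6]..p[9] were written most-significant-byte first (reverse the index order if the file is little-endian; the function name is just for illustration):

// Assemble p[6]..p[9] (assumed big-endian on disk) into an unsigned int.
unsigned int lastFourBytesToUint(const char* p) {
    unsigned int myInt = 0;
    for (int i = 6; i < 10; ++i)
        myInt = (myInt << 8) | static_cast<unsigned char>(p[i]);
    return myInt;
}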
I agree with the previous answer, but I just want to add that this solution may not work 100% if the file was created with a different endianness.
I don't want to confuse you with extra information, but keep in mind that endianness may cause you problems when you cast data read directly from a file.
Here's a tutorial on endianness : http://www.codeproject.com/Articles/4804/Basic-concepts-on-Endianness
Try myInt = *(reinterpret_cast<unsigned int*>(p + 6));.
This takes the address of the 6th character, reinterprets as a pointer to an unsigned int, and then returns the (unsigned int) value it points to.
Maybe using a union is an option? I think this might work:
UPDATE: Yes, it works.
union intc32 {
    char c[4];
    int  v;
};

int charsToInt(char a, char b, char c, char d) {
    intc32 r = { { a, b, c, d } };
    return r.v;
}
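A quick self-contained usage sketch (the union and function are repeated so the snippet compiles on its own); note that the printed value still depends on the host's byte order:

#include <cstdio>

union intc32 { char c[4]; int v; };

int charsToInt(char a, char b, char c, char d) {
    intc32 r = { { a, b, c, d } };
    return r.v;
}

int main() {
    // Prints 16777216 (0x01000000) on a little-endian host, 1 on a big-endian one.
    std::printf("%d\n", charsToInt(0, 0, 0, 1));
}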
Despite the fact that big-endian computers are not very widely used, I want to store the double datatype in an endianness-independent format.
For int, this is really simple, since bit shifts make that very convenient.
int number;
int size=sizeof(number);
char bytes[size];
for (int i=0; i<size; ++i)
    bytes[size-1-i] = (number >> 8*i) & 0xFF;
This code snippet stores the number in big-endian format, regardless of the machine it runs on. What is the most elegant way to do this for double?
The best way, both for portability and to take the floating-point format into account, is to serialize/deserialize the mantissa and the exponent separately. For that you can use the frexp()/ldexp() functions.
For example, to serialize:
int exp;
unsigned long long mant;
mant = (unsigned long long)(ULLONG_MAX * frexp(number, &exp));
// then serialize exp and mant.
And then to deserialize:
// deserialize to exp and mant.
double result = ldexp ((double)mant / ULLONG_MAX, exp);
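A self-contained round-trip sketch of this approach (illustrative only; it skips the handling of sign, zero, infinities and NaN that a real serializer would need):

#include <climits>
#include <cmath>
#include <cstdio>

int main() {
    double number = 3.14159265358979;

    // Serialize: split into a binary exponent and a mantissa scaled to an integer.
    int exp;
    unsigned long long mant =
        (unsigned long long)(ULLONG_MAX * std::frexp(number, &exp));
    // ...exp and mant would now be written out in some fixed byte order...

    // Deserialize: rebuild the double from exp and mant.
    double result = std::ldexp((double)mant / ULLONG_MAX, exp);

    std::printf("%.17g\n%.17g\n", number, result);  // the two lines should match
}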
The elegant thing to do is to limit the endianness problem to as small a scope as possible. That narrow scope is the I/O boundary between your program and the outside world. For example, the functions that send binary data to / receive binary data from some other application need to be aware of the endian problem, as do the functions that write binary data to / read binary data from some data file. Make those interfaces cognizant of the representation problem.
Make everything else blissfully ignorant of the problem. Use the local representation everywhere else. Represent a double precision floating point number as a double rather than an array of 8 bytes, represent a 32 bit integer as an int or int32_t rather than an array of 4 bytes, et cetera. Dealing with the endianness problem throughout your code is going to make your code bloated, error prone, and ugly.
The same way. Any numeric object, including a double, is ultimately just a sequence of bytes that are interpreted in a specific order according to endianness. So if you reverse the order of the bytes, you get exactly the same value in the reversed endianness.
char *src_data;
char *dst_data;
for (int i = 0; i < N*sizeof(double); i++)
    *dst_data++ = src_data[i ^ mask];
// where mask = 7, if native == little endian
//       mask = 0, if native == big endian
The elegance lies in mask, which also handles short and integer types: it is sizeof(elem)-1 if the target and source endianness differ, and 0 otherwise.
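A complete sketch of that idea, with mask chosen at run time by probing the host's byte order (variable names are invented here); after the loop, dst holds the big-endian byte image of src on either kind of host:

#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t N = 2;
    double src[N] = {1.5, -2.25};
    double dst[N];

    // mask = 7 on a little-endian host, 0 on a big-endian one.
    const unsigned probe = 1;
    const std::size_t mask =
        (*reinterpret_cast<const unsigned char*>(&probe) == 1) ? sizeof(double) - 1 : 0;

    const char* src_data = reinterpret_cast<const char*>(src);
    char* dst_data = reinterpret_cast<char*>(dst);
    for (std::size_t i = 0; i < N * sizeof(double); ++i)
        *dst_data++ = src_data[i ^ mask];

    // First byte of the image: 0x3f for 1.5 on IEEE-754 machines, whatever the host order.
    std::printf("0x%02x\n", static_cast<unsigned char>(reinterpret_cast<char*>(dst)[0]));
}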
Not very portable and standards violating, but something like this:
#include <array>
#include <cstdint>
#include <cstddef>

std::array<unsigned char, 8> serialize_double( double const* d )
{
    std::array<unsigned char, 8> retval;
    char const* begin = reinterpret_cast<char const*>(d);
    union
    {
        uint8_t  i8s[8];
        uint16_t i16s[4];
        uint32_t i32s[2];
        uint64_t i64s;
    } u;
    u.i64s = 0x0001020304050607ull; // one byte order
    // u.i64s = 0x0706050403020100ull; // the other byte order
    for (std::size_t index = 0; index < 8; ++index)
    {
        retval[ u.i8s[index] ] = begin[index];
    }
    return retval;
}
might handle a platform with 8-bit chars, 8-byte doubles, and any crazy-ass byte ordering (i.e., big-endian in words but little-endian between words for 64-bit values, for example).
Now, this doesn't cover the endianness of doubles being different than that of 64 bit ints.
An easier approach might be to cast your double into a 64 bit unsigned value, then output that as you would any other int.
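A sketch of that easier approach: copy the double's bit pattern into a uint64_t with memcpy (sidestepping the aliasing cast), then shift the bytes out most-significant first, exactly as for an integer. It assumes 8-byte doubles and, per the caveat above, that doubles and 64-bit integers share the same byte order on the host; the function name is invented here:

#include <cstdint>
#include <cstring>

// Serialize a double as 8 bytes, most significant byte of its bit pattern first.
void doubleToBigEndianBytes(double number, unsigned char bytes[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &number, sizeof bits);   // reuse the double's bit pattern
    for (int i = 0; i < 8; ++i)
        bytes[7 - i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
}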
#include <cstring>  // memcpy()
#include <utility>  // std::swap()

void reverse_endian(double number, char (&bytes)[sizeof(double)])
{
    const int size = sizeof(number);
    memcpy(bytes, &number, size);
    for (int i = 0; i < size/2; ++i)
        std::swap(bytes[i], bytes[size-i-1]);
}
OK, I'm using a raw SHA1 hash to seed a Mersenne Twister pseudo-random number generator.
The generator gives me the option to seed either with an unsigned long or with an array of unsigned longs.
The SHA1 class I'm using gives me the hash as a 20-byte array of unsigned chars.
I figured I could recast this array of chars to an array of longs to get a working seed, but how can I know how long the resulting array of longs is?
example code:
CSHA1 sha1;
sha1.Update((unsigned char*)key, size_key);
sha1.Final();
unsigned char hash[20]; // the SHA1 digest is 20 bytes
sha1.GetHash(hash);
// Seed the random with the key
MTRand mt((unsigned long*)hash, <size of array of longs>);
I'm hoping that there is no data loss (as in, no bytes being dropped), since I need this to remain cryptographically secure.
You can say
sizeof(unsigned long) / sizeof(unsigned char)
to get the number of octets in a long.
However there are two potential problems with simply casting.
First, the array of chars might not be properly aligned. On some processors this can cause a trap. On others it just slows execution.
Second, you're asking for byte order problems if the program must work the same way on different architectures.
You can solve both problems by copying the bytes into an array of longs explicitly. Untested code:
const int bytes_per_long = sizeof(unsigned long) / sizeof(unsigned char);
unsigned long hash_copy[key_length_in_bytes / bytes_per_long];
int i_hash = 0;
for (int i_copy = 0; i_copy < sizeof hash_copy / sizeof hash_copy[0]; i_copy++) {
    unsigned long b = 0;
    for (int i_byte = 0; i_byte < bytes_per_long; i_byte++)
        b = (b << 8) | hash[i_hash++];
    hash_copy[i_copy] = b;
}
// Now use hash_copy.
You can use len_of_chars * sizeof(char) / sizeof(long), where len_of_chars is presumably 20.
Your library seems to assume 32-bit unsigned longs, so there's no [more] harm in you doing the same. In fact, I'd go as far as to assume 8-bit unsigned chars and perhaps even unpadded, little-endian representations for both. So you could use a simple cast (though I'd use a reinterpret_cast), or maybe @Gene's memcpy sample for alignment.
Portable code*, however, should use <cstdint>, the uint#_t types therein and piecewise, by-value copying for conversion:
uint32_t littleEndianInt8sToInt32(uint8_t bytes[4]) {
return bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
}
...and better names. Sorry, it's getting late here :)
*: Though, of course, stdint itself isn't very portable (>= C++11) and the exact-width types aren't guaranteed to be in it. Ironic.
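Applied to the 20-byte digest from the question, a conversion sketch might look like this (the helper name is made up; the resulting five words can then be copied into the unsigned long array passed to MTRand, as in the question's call). Every byte of the hash lands in exactly one word, so no entropy is dropped:

#include <cstdint>
#include <cstddef>

// Pack a 20-byte SHA1 digest into five 32-bit words (little-endian packing,
// chosen arbitrarily but fixed, so every host produces the same seed words).
void hashToSeedWords(const unsigned char hash[20], std::uint32_t seed[5]) {
    for (std::size_t w = 0; w < 5; ++w) {
        seed[w] =  static_cast<std::uint32_t>(hash[4 * w])
                | (static_cast<std::uint32_t>(hash[4 * w + 1]) << 8)
                | (static_cast<std::uint32_t>(hash[4 * w + 2]) << 16)
                | (static_cast<std::uint32_t>(hash[4 * w + 3]) << 24);
    }
}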