Is there a correct or conventional way to guarantee a data type will be sizeof 4?
Previously I just wrote some typedef statements based on the sizeof of different data types, and I was wondering whether there is a better or more conventional way to get data types of a predetermined size on different architectures.
For the sake of this question, let's say I have a large array of chars that I have read from a file. I want to read a series of 24-bit integers from this char array. In the past I have cast the array to a data type whose sizeof matches my desired data type; this method fails, however, if there is no primitive data type with my desired size.
What is the best way to handle this situation?
char x[10] = {1,1,1,1,1,1,1,1,1,1};
uint32_t* y = (uint32_t*)x;
for(int i=0; i < 10; i++)std::cout << "y: " << y[i] << "\n";
output:
y: 16843009
y: 16843009
y: 257
desired output:
y: 65793
y: 65793
y: 65793
....
Go old school and read-a-byte, read-a-byte, read-a-byte-byte-byte!
uint32_t read24(unsigned char *& bufp)
{
uint32_t val;
val = *bufp++;
val |= *bufp++ << 8;
val |= *bufp++ << 16;
return val;
}
Usage:
unsigned char buffer[] =
{ 0x2A, 0x00, 0x00, 0x9A, 0x02, 0x00, 0x4E, 0x61, 0xBC };
unsigned char * bufp = buffer;
uint32_t A = read24(bufp);
uint32_t B = read24(bufp);
uint32_t C = read24(bufp);
Why unsigned chars? Easiest way to deal with sign extension. If you use signed chars, you have to do masking like this:
val = *bufp++ & 0xFF;
to strip off the extra sign bits.
And watch out for endian. Depending on where your data is coming from you might have to read everything in the other direction.
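To make the byte-order point concrete, here is a minimal sketch of both reading directions, plus sign extension for the case where the 24-bit values are signed. The names read24_le, read24_be and sign_extend24 are made up for this example, and it assumes the signed values are stored as two's complement.

```cpp
#include <cassert>
#include <cstdint>

// Read an unsigned 24-bit value stored least-significant byte first.
inline std::uint32_t read24_le(const unsigned char* p)
{
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16);
}

// Read the same three bytes most-significant byte first.
inline std::uint32_t read24_be(const unsigned char* p)
{
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 16)
         | (static_cast<std::uint32_t>(p[1]) << 8)
         |  static_cast<std::uint32_t>(p[2]);
}

// Treat the low 24 bits as two's complement (an assumption about the
// file format) and widen the result to a signed 32-bit integer.
inline std::int32_t sign_extend24(std::uint32_t v)
{
    v &= 0xFFFFFFu;
    return (v & 0x800000u) ? static_cast<std::int32_t>(v) - 0x1000000
                           : static_cast<std::int32_t>(v);
}
```

With the question's array of ones, either reader returns 65793 for each group of three bytes, since 0x010101 reads the same in both directions.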
The fixed-width type uint32_t is designed for this. It will be constructed from whatever primitive is appropriate on your system.
You can find it in <cstdint> or, pre-C++11, in Boost as <boost/cstdint.hpp>. C also provides <stdint.h>, which may be of use.
Failing that, char[4]?
Related
I need to add a 64 bit floating point number into an unsigned char array at specific indexes (ex. index 1 through 8).
Example unsigned char array:
unsigned char msg[10] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
I want to add a floating point number like 0.084, for example, which is represented as 1B2FDD240681B53F in hex (little endian) to the unsigned char array at indexes 1,2,3,4,5,6,7,8 and leave indexes 0 and 9 unchanged.
So, I would like the unsigned char array, msg, to contain the following:
msg = {0x00, 0x1B, 0x2F, 0xDD, 0x24, 0x06, 0x81, 0xB5, 0x3F, 0x00}
So far I can get a std::string with the hexadecimal representation of the example floating point value 0.084 using the following code but I'm not sure how to add the string values back into the unsigned char array:
#include <iostream>
#include <sstream>
#include <iomanip>
using namespace std;
int main()
{
union udoub
{
double d;
unsigned long long u;
};
double dVal = 0.084;
udoub val;
val.d = dVal;
std::stringstream ss;
ss << std::setw(16) << std::setfill('0') << std::hex << val.u << std::endl;
std::string strValHexString = ss.str();
cout<< strValHexString << std::endl;
return 0;
}
Output:
3fb5810624dd2f1b
I tried using std::copy like in the example below to copy the values from the std::string to an unsigned char but it doesn't seem to do what I want:
unsigned char ucTmp[2];
std::copy(strValHexString.substr(0,2).begin(), strValHexString.substr(0,2).end(), ucTmp);
Looking for a C or C++ solution.
Formatting the component bytes into a hex string and then reading those back in again is a terrible waste of time and effort. Just use std::memcpy() (in C++) or memcpy (in C):
std::memcpy(&msg[1], &dVal, sizeof(dVal));
This will take care of any required pointer alignment issues. However, it will not do any 'interpretation' in terms of your endianness - but this shouldn't be a problem unless you're then transferring that byte array between different platforms.
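As a rough sketch of the memcpy approach in both directions, the helpers below copy a double into a byte buffer at a given offset and read it back. The names put_double and get_double are invented for this example, and the caller is assumed to guarantee that the buffer has room for sizeof(double) bytes at that offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy the object representation of `value` into the sizeof(double)
// bytes starting at buf[offset]. The bytes land in native byte order.
inline void put_double(unsigned char* buf, std::size_t offset, double value)
{
    assert(buf != nullptr);
    std::memcpy(buf + offset, &value, sizeof value);
}

// Rebuild a double from the bytes starting at buf[offset].
inline double get_double(const unsigned char* buf, std::size_t offset)
{
    assert(buf != nullptr);
    double value;
    std::memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

Calling put_double(msg, 1, dVal) fills indexes 1 through 8 on a platform with 8-byte doubles and leaves indexes 0 and 9 alone.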
Your example has undefined behaviour in C++ due to reading from an inactive member of a union. A well-defined way to do the conversion to an integer (std::bit_cast requires C++20 and the <bit> header):
auto uVal = std::bit_cast<std::uint64_t>(dVal);
Now that you have the data in an integer, you can use bitwise operations to extract individual octets in specific positions:
msg[1] = (uVal >> 0x0 ) & 0xff;
msg[2] = (uVal >> 0x8 ) & 0xff;
msg[3] = (uVal >> 0x10) & 0xff;
msg[4] = (uVal >> 0x18) & 0xff;
msg[5] = (uVal >> 0x20) & 0xff;
...
This can be condensed into a loop.
Note that this works the same way regardless of endianness of the CPU. The resulting order in the array will always be little endian unlike in the direct std::memcpy approach which results in native endianness which is not necessarily little endian on all systems. However, if floating point and integers use different endianness, then the order won't be the same even with this approach.
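The shift-and-mask sequence condensed into a loop could look roughly like this. To keep the sketch compilable before C++20 it uses std::memcpy in place of std::bit_cast to obtain the integer, which is a substitution on my part; the function name store_double_le is also made up. It assumes a 64-bit IEEE-754 double and the same endianness for floating point and integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Write the 8 bytes of `value` to out[0..7], least-significant byte first,
// regardless of the byte order of the machine running the code.
inline void store_double_le(unsigned char* out, double value)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    assert(out != nullptr);

    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // stand-in for std::bit_cast

    for (std::size_t i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
}
```

For the question's message, store_double_le(&msg[1], 0.084) writes into indexes 1 through 8 only.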
Right now, in my current project, I have a string like this:
std::string ordre="0x010x030x000x320x000x01";
And I would like to create a char array that looks like this from it (and, if possible, do the reverse too):
unsigned char Com[]= {0x01, 0x03, 0x00, 0x32, 0x00, 0x01};
I have no problem working with the string, creating another std::string and extracting the 0x01 part at the beginning using ordre.at() for the characters I want. But I can't find a way to put this new string 0x01 into Com[1].
Writing it directly:
Com[1]=0x01;
works, but I would like to make something where Com[1] can change.
First, the string "0x01" is not the same thing as the integer 0x01. To extract values from the string, you will need to read it in a loop, four characters at a time:
if(ordre.size() % 4)
throw std::runtime_error{ "invalid string length; format is different" };
std::vector<int> values;
auto b = std::begin(ordre);
const auto e = std::end(ordre);
while(b != e)
{
std::string s{ b, b+4 };
values.push_back(std::stoi(s, 0, 16));
b += 4;
}
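The question also asks for the reverse direction. A possible sketch, assuming each byte should come out as "0x" followed by exactly two hex digits (lowercase here, which is my choice and not something the question specifies); the function name to_hex_string is hypothetical:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format each byte as "0x" plus two hex digits and concatenate the pieces.
inline std::string to_hex_string(const std::vector<unsigned char>& bytes)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (unsigned char b : bytes)
        out << "0x" << std::setw(2) << static_cast<int>(b);  // cast so it prints as a number

    const std::string result = out.str();
    assert(result.size() == bytes.size() * 4);  // four characters per byte
    return result;
}
```

The cast to int matters: streaming an unsigned char directly would print it as a character instead of a number.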
If you use C++, use standard library containers instead of struggling with raw arrays.
If I were you, I would build a std::vector<unsigned char> and fill it dynamically by iterating over the "0x" markers in a loop.
I would fill the vector like this:
Note: This code works with input tokens of any size and is not limited to 4-character substrings. It is therefore more general but less efficient than the code in the other answer. Choose according to your needs.
std::string order = "0x010x020x030x2360x10240x9001";
std::vector<int> coms;
size_t pos = 0, it;
while ((it = order.find("0x", pos + 1)) != std::string::npos)
{
coms.push_back(std::stoi(order.substr(pos, it-pos), 0, 16));
pos = it;
}
coms.push_back(std::stoi(order.substr(pos), 0, 16));
gives:
0x01 = 1
0x02 = 2
0x03 = 3
0x236 = 566
0x1024 = 4132
0x9001 = 36865
I have a binary file from which I load whole text in unsigned char[] and a variable const uint32_t LITTLE_ENDIAN_ID = 0x49696949;
I need to compare first four characters from loaded char[] with given uint32_t.
Is that possible somehow?
If buff is your unsigned char[] buffer, you can do:
memcmp((unsigned char*)&LITTLE_ENDIAN_ID, buff, 4) == 0
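memcmp is declared in string.h (cstring in C++). Note that this compares against the bytes of LITTLE_ENDIAN_ID as they are laid out in memory on the current machine.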
Yes, it's absolutely possible, but your question is underspecified. What you want to do is take the first 4 characters of your character array and convert them into a uint32_t; the obvious question is which character corresponds to which byte of the 32-bit int. This is equivalent to asking whether the bytes are stored in little-endian or big-endian order. Though now that I see your LITTLE_ENDIAN_ID, I realize that it doesn't matter: its bytes (0x49, 0x69, 0x69, 0x49) are, oddly, the same forwards and backwards.
Anyhow, what you want is either:
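const unsigned char* text = ...
// Parentheses are required: + binds tighter than <<, so combine with | instead.
uint32_t x = (uint32_t(text[0]) << 24) | (uint32_t(text[1]) << 16) | (uint32_t(text[2]) << 8) | uint32_t(text[3]);
if (x == LITTLE_ENDIAN_ID)
    // do something
Or the same thing, but with
uint32_t x = (uint32_t(text[3]) << 24) | (uint32_t(text[2]) << 16) | (uint32_t(text[1]) << 8) | uint32_t(text[0]);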
Alternatively we could do something a little more unusual like
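union Converter {
    uint32_t int_value;
    unsigned char characters[4];
};
const unsigned char* text = ...
Converter x;
for (int i=0; i < 4; i++)
    x.characters[i] = text[i];
// Reading int_value after writing characters is fine in C,
// but formally undefined behaviour in C++.
if (x.int_value == LITTLE_ENDIAN_ID)
    // do something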
This is probably closer to what you want if you are actually looking to test the endianness of the current system.
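If the goal is the native-byte-order comparison that the union performs, a memcpy-based version avoids the union altogether. This is only a sketch; the function name starts_with_id and its length parameter are my own additions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when the first four bytes of `buf` have the same in-memory
// representation as `id` on the machine running the code.
inline bool starts_with_id(const unsigned char* buf, std::size_t len, std::uint32_t id)
{
    assert(buf != nullptr);
    if (len < sizeof id)
        return false;                       // not enough data to compare

    std::uint32_t head;
    std::memcpy(&head, buf, sizeof head);   // no alignment or aliasing concerns
    return head == id;
}
```

Because 0x49696949 is a byte-wise palindrome, this gives the same answer on little-endian and big-endian hosts for that particular ID.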
I have a process that listens to an UDP multi-cast broadcast and reads in the data as a unsigned char*.
I have a specification that indicates fields within this unsigned char*.
Fields are defined in the specification with a type and size.
Types are: uInt32, uInt64, unsigned int, and single byte string.
For the single byte string I can merely access the offset of the field in the unsigned char* and cast to a char, such as:
char character = (char)(data[1]);
For a single-byte uint32 I've been doing the following, which also seems to work:
uint32_t integer = (uint32_t)(data[20]);
However, for multiple byte conversions I seem to be stuck.
How would I convert several bytes in a row (substring of data) to its corresponding datatype?
Also, is it safe to wrap data in a string (for use of substring functionality)? I am worried about losing information, since I'd have to cast unsigned char* to char*, like:
std::string wrapper((char*)(data),length); //Is this safe?
I tried something like this:
std::string wrapper((char*)(data),length); //Is this safe?
uint32_t integer = (uint32_t)(wrapper.substr(20,4).c_str()); //4 byte int
But it doesn't work.
Thoughts?
Update
I've tried the suggested bit shift:
void function(const unsigned char* data, size_t data_len)
{
//From specification: Field type: uInt32 Byte Length: 4
//All integer fields are big endian.
uint32_t integer = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]);
}
This sadly gives me garbage (same number for every call --from a callback).
I think you should be very explicit, and not just do "clever" tricks with casts and pointers. Instead, write a function like this:
uint32_t read_uint32_t(unsigned char **data)
{
const unsigned char *get = *data;
*data += 4;
return ((uint32_t)get[0] << 24) | ((uint32_t)get[1] << 16) | ((uint32_t)get[2] << 8) | get[3]; // cast before shifting so the top byte cannot overflow a signed int
}
This extracts a single uint32_t value from a buffer of unsigned char, and increases the buffer pointer to point at the next byte of data in the buffer.
This assumes big-endian data, you need to have a well-defined idea of the buffer's endian-mode in order to interpret it.
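Since the specification in the question also lists a uInt64 field, the same idea can be written as a loop for both widths. This is a sketch under the question's stated assumption that all integer fields are big endian; read_be32 and read_be64 are hypothetical names, and the caller is assumed to have checked that enough bytes remain.

```cpp
#include <cassert>
#include <cstdint>

// Read a big-endian 32-bit value and advance the cursor past it.
inline std::uint32_t read_be32(const unsigned char*& p)
{
    assert(p != nullptr);
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | *p++;   // shift in one byte at a time, high byte first
    return v;
}

// Same for a big-endian 64-bit value.
inline std::uint64_t read_be64(const unsigned char*& p)
{
    assert(p != nullptr);
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | *p++;
    return v;
}
```

Accumulating into an unsigned variable of the target width sidesteps the signed-int promotion that happens when an unsigned char is shifted directly.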
Depends on the byte ordering of the protocol, for big-endian or so called network byte order do:
uint32_t i = (uint32_t)data[0] << 24 | (uint32_t)data[1] << 16 | (uint32_t)data[2] << 8 | data[3];
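Without commenting on whether it's a good idea or not, the reason it doesn't work for you is that wrapper.substr(20,4).c_str() returns a pointer (const char*), not an integer, so casting it to uint32_t converts the address rather than the bytes it points at. To read the bytes you have to cast to a pointer type and dereference it in the same expression, because the temporary string returned by substr is destroyed at the end of the statement:
uint32_t integer = *(const uint32_t *)(wrapper.substr(20,4).c_str());
This reads four bytes in native byte order, but it relies on pointer type punning, which the memcpy and bit-shift answers avoid.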
uint32_t integer = ntohl(*reinterpret_cast<const uint32_t*>(data + 20));
or (handles alignment issues):
uint32_t integer;
memcpy(&integer, data+20, sizeof integer);
integer = ntohl(integer);
The pointer way:
uint32_t n = *(uint32_t*)&data[20];
You will run into problems on different endian architectures though. The solution with bit shifts is better and consistent.
std::string wrapper((char*)(data),length); //Is this safe?
This should be safe since you specified the length of the data.
On the other hand if you did this:
std::string wrapper((char*)data);
The string length would be determined wherever the first 0 byte occurs, and you will more than likely chop off some data.
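A small sketch of the difference between the two constructors; wrap_bytes is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Wrap a raw byte buffer in a std::string, keeping embedded zero bytes
// because the length is passed explicitly.
inline std::string wrap_bytes(const unsigned char* data, std::size_t length)
{
    assert(data != nullptr);
    return std::string(reinterpret_cast<const char*>(data), length);
}
```

For a buffer such as {0x41, 0x00, 0x42, 0x43}, wrap_bytes(data, 4) yields a string of size 4, while std::string((char*)data) stops at the zero byte and has size 1.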
I have defined the following struct to represent an IPv4 header (up until the options field):
struct IPv4Header
{
// First row in diagram
u_int32 Version:4;
u_int32 InternetHeaderLength:4; // Header length is expressed in units of 32 bits.
u_int32 TypeOfService:8;
u_int32 TotalLength:16;
// Second row in diagram
u_int32 Identification:16;
u_int32 Flags:3;
u_int32 FragmentOffset:13;
// Third row in diagram
u_int32 TTL:8;
u_int32 Protocol:8;
u_int32 HeaderChecksum:16;
// Fourth row in diagram
u_int32 SourceAddress:32;
// Fifth row in diagram
u_int32 DestinationAddress:32;
};
I now also captured an IP frame with Wireshark. As an array literal it looks like this:
// Captured with Wireshark
const u_int8 cIPHeaderSample[] = {
0x45, 0x00, 0x05, 0x17,
0xA7, 0xE0, 0x40, 0x00,
0x2E, 0x06, 0x1B, 0xEA,
0x51, 0x58, 0x25, 0x02,
0x0A, 0x04, 0x03, 0xB9
};
My question is: How can I create a IPv4Header object using the array data?
This doesn't work because of incompatible endianness:
IPv4Header header = *((IPv4Header*)cIPHeaderSample);
I'm aware of functions like ntohs and ntohl, but I can't figure out how to use them correctly:
u_int8 version = ntohs(cIPHeaderSample[0]);
printf("version: %x \n", version);
// Output is:
// version: 0
Can anyone help?
The most portable way to do it is one field at a time, using memcpy() for types longer than a byte. You don't need to worry about endianness for byte-length fields:
uint16_t temp_u16;
uint32_t temp_u32;
struct IPv4Header header;
header.Version = cIPHeaderSample[0] >> 4;
header.InternetHeaderLength = cIPHeaderSample[0] & 0x0f;
header.TypeOfService = cIPHeaderSample[1];
memcpy(&temp_u16, &cIPHeaderSample[2], 2);
header.TotalLength = ntohs(temp_u16);
memcpy(&temp_u16, &cIPHeaderSample[4], 2);
header.Identification = ntohs(temp_u16);
header.Flags = cIPHeaderSample[6] >> 5;
memcpy(&temp_u16, &cIPHeaderSample[6], 2);
header.FragmentOffset = ntohs(temp_u16) & 0x1fff;
header.TTL = cIPHeaderSample[8];
header.Protocol = cIPHeaderSample[9];
memcpy(&temp_u16, &cIPHeaderSample[10], 2);
header.HeaderChecksum = ntohs(temp_u16);
memcpy(&temp_u32, &cIPHeaderSample[12], 4);
header.SourceAddress = ntohl(temp_u32);
memcpy(&temp_u32, &cIPHeaderSample[16], 4);
header.DestinationAddress = ntohl(temp_u32);
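The same field-by-field idea can also be written with shifts only, which avoids both ntohs/ntohl and bit fields. The sketch below is one way to do it; ParsedIPv4Header, be16, be32 and parse_ipv4 are all names made up for this example, and it covers only the fixed 20-byte part of the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plain-field struct: no bit fields, so no layout surprises.
struct ParsedIPv4Header
{
    std::uint8_t  version, ihl, type_of_service;
    std::uint16_t total_length, identification;
    std::uint8_t  flags;
    std::uint16_t fragment_offset;
    std::uint8_t  ttl, protocol;
    std::uint16_t header_checksum;
    std::uint32_t source_address, destination_address;
};

// Big-endian (network order) readers built from shifts.
inline std::uint16_t be16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint32_t be32(const std::uint8_t* p)
{
    return (static_cast<std::uint32_t>(be16(p)) << 16) | be16(p + 2);
}

inline ParsedIPv4Header parse_ipv4(const std::uint8_t* b, std::size_t len)
{
    assert(b != nullptr && len >= 20);   // fixed header is 20 bytes
    ParsedIPv4Header h{};
    h.version             = static_cast<std::uint8_t>(b[0] >> 4);
    h.ihl                 = static_cast<std::uint8_t>(b[0] & 0x0F);
    h.type_of_service     = b[1];
    h.total_length        = be16(b + 2);
    h.identification      = be16(b + 4);
    h.flags               = static_cast<std::uint8_t>(b[6] >> 5);
    h.fragment_offset     = static_cast<std::uint16_t>(be16(b + 6) & 0x1FFF);
    h.ttl                 = b[8];
    h.protocol            = b[9];
    h.header_checksum     = be16(b + 10);
    h.source_address      = be32(b + 12);
    h.destination_address = be32(b + 16);
    return h;
}
```

For the captured sample this gives version 4, a header length of 5 words, TTL 0x2E and protocol 6 (TCP).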
ntohl and ntohs don't operate on 1-byte fields. They are for 32 and 16 bit fields, respectively. You probably want to start with a cast or memcpy then byte swap the 16 and 32-bit fields if you need to. If you find that version isn't coming through with that approach without any byte swapping, then you have bit field troubles.
Bit fields are a big mess in C. Most people (including me) will advise you to avoid them.
You want to take a look at the source for ip.h; that one is from FreeBSD. There should be a predefined iphdr struct on your system; use that. Don't reinvent the wheel if you don't have to.
The easiest way to make this work is to take a pointer to the byte array from wireshark and cast it into a pointer to an iphdr. That'll let you use the correct header struct.
const struct iphdr* hdr;
hdr = (const struct iphdr*) cIPHeaderSample;
unsigned int version = hdr->version;
Also, htons takes a 16-bit value and changes its byte order; calling it on a 32-bit variable is just going to make a mess of things. You want htonl for 32-bit variables. Also note that a single byte has no endianness; it takes multiple bytes to have a byte order.
Updated:
I suggest you use memcpy to avoid the issues of bitfields and struct alignment, as this can get messy. The solution below works on a simple example, and can be easily extended:
#include <arpa/inet.h> // ntohl (POSIX)
#include <cstdint>
#include <cstring>
#include <iostream>
using namespace std;
struct IPv4Header
{
uint32_t Source;
};
int main(int argc, char **argv) {
const uint8_t cIPHeaderSample[] = {
0x45, 0x00, 0x05, 0x17
};
IPv4Header header;
memcpy(&header.Source, cIPHeaderSample, sizeof(uint8_t) * 4);
header.Source= ntohl(header.Source);
cout << hex << header.Source<< endl;
}
Output:
45000517