I am trying to copy data from an array of characters into a member of my class using memcpy. I set a breakpoint in the debugger right before the memcpy. I checked all of the variables I will be using, and I calculated how much space is left in the destination, and it looks like it should work.
#include <iostream>
#include <cstdlib>
#include <cstring>

class bigNum
{
    unsigned int dataLength;
    unsigned long long int *data;
public:
    bigNum(){
        //long long int is 8 bytes so (2 * 1024) long long int = 16 KiB
        // if that's not enough, we can always add more later
        dataLength = 2048;
        data = new unsigned long long [dataLength];
    };
    virtual ~bigNum(){delete[] data;};
    //bigNum& operator=(const bigNum& other);
    bigNum& set(char chars[], unsigned int charsLength) {
        //calculate where we will start writing the data
        void *writeStart = (void*)(
            (unsigned long long)data + dataLength*64 - charsLength*8
        );
        //DEBUG -- set a couple of the array elements to watch in debugger
        data[2047] = data[2046] = ~((unsigned long long)0);
        //zero out the space before writeStart
        std::memset(data, 0, dataLength*8 - charsLength);
        //write the data starting at writeStart
        std::memcpy(writeStart, chars, charsLength);
        return *this;
    }
};

using namespace std;

int main()
{
    bigNum myNum;
    char chars[9] = {'a'};
    myNum.set(chars, 9);
    system("PAUSE");
    return 0;
}
The bug is in this statement here:
void *writeStart = (void*)(
(unsigned long long)data + dataLength*64 - charsLength*8
);
That is, it appears you're trying to get the right write location in bits, when you should be doing this in bytes.
When you have statements like dataLength*64 and charsLength*8, you're multiplying by their sizes in bits, when what you're dealing with - pointers - refers to bytes.
But still, it seems you're playing fast and loose with integer sizes. Don't do that! It means your code will break on other machine architectures. Instead of assuming that unsigned long long is 64 bits, find out exactly how many bytes it occupies with sizeof(unsigned long long), or, if you want a fixed-width integer, use the corresponding type such as uint64_t.
Then again, with C++, you should stick to standard containers like std::vector.
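For what it's worth, a rough byte-based rewrite of the set member above could look like this (my sketch, not the original poster's code; it keeps the same right-aligned layout and assumes charsLength never exceeds the buffer size):
bigNum& set(char chars[], unsigned int charsLength) {
    // total buffer size and the unused prefix, both in bytes
    const size_t totalBytes  = dataLength * sizeof(unsigned long long);
    const size_t prefixBytes = totalBytes - charsLength;  // assumes charsLength <= totalBytes

    // treat the buffer as raw bytes: zero the prefix, copy the data at the end
    unsigned char *bytes = reinterpret_cast<unsigned char *>(data);
    std::memset(bytes, 0, prefixBytes);
    std::memcpy(bytes + prefixBytes, chars, charsLength);
    return *this;
}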
Related
typedef unsigned char Byte;
...
void ReverseBytes( void *start, int size )
{
Byte *buffer = (Byte *)(start);
for( int i = 0; i < size / 2; i++ ) {
std::swap( buffer[i], buffer[size - i - 1] );
}
}
What this method does right now is it reverses bytes in memory. What I would like to know is, is there a better way to get the same effect? The whole "size / 2" part seems like a bad thing, but I'm not sure.
EDIT: I just realized how bad the title I put for this question was, so I [hopefully] fixed it.
The standard library has a std::reverse function:
#include <algorithm>
void ReverseBytes( void *start, int size )
{
    char *istart = static_cast<char*>(start), *iend = istart + size;
    std::reverse(istart, iend);
}
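A quick usage sketch (my addition, not part of the answer above) showing the effect on a 32-bit value:
#include <algorithm>
#include <cstdio>

int main()
{
    unsigned value = 0x11223344;   // assuming a 32-bit unsigned int here
    // equivalent to calling ReverseBytes(&value, sizeof value)
    char *first = reinterpret_cast<char *>(&value);
    std::reverse(first, first + sizeof value);
    std::printf("%08x\n", value);  // prints 44332211 regardless of host byte order
}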
A performant solution without using the STL:
void reverseBytes(void *start, int size) {
    unsigned char *lo = static_cast<unsigned char*>(start);
    unsigned char *hi = lo + size - 1;
    unsigned char swap;
    while (lo < hi) {
        swap = *lo;
        *lo++ = *hi;
        *hi-- = swap;
    }
}
Though the question is 3 ½ years old, chances are that someone else will be searching for the same thing. That's why I still post this.
If you need to reverse there is a chance that you can improve your algorithms and just use reverse iterators.
If you're reversing binary data from a file with different endianness you should probably use the ntoh* and hton* functions, which convert specified data sizes from network to host order and vice versa. ntohl for instance converts a 32 bit unsigned long from big endian (network order) to host order (little endian on x86 machines).
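For example, a minimal round-trip sketch (mine, assuming a POSIX-style <arpa/inet.h>; on Windows the same functions come from winsock2.h):
#include <arpa/inet.h>  // htonl / ntohl
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t host = 0x01020304;
    std::uint32_t wire = htonl(host);  // host order -> big-endian network order
    std::uint32_t back = ntohl(wire);  // network order -> back to host order
    std::printf("%08x %08x\n", (unsigned)wire, (unsigned)back);  // back always equals host
    return 0;
}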
I would review std::swap and make sure it's optimized; after that I'd say you're pretty optimal for space. I'm reasonably sure that's time-optimal as well.
Consider the following C++ code:
unsigned char* data = readData(..); //let's say data consists of 12 characters
unsigned int dataSize = getDataSize(...); //the size in bytes of the data is also known (let's say 12 bytes)
struct Position
{
    float pos_x;  //remember that float is 4 bytes
    double pos_y; //remember that double is 8 bytes
};
Now I want to fill a Position variable/instance with data.
Position pos;
pos.pos_x = ? //data[0:4[ The first 4 bytes of data should be set to pos_x, since pos_x is of type float which is 4 bytes
pos.pos_y = ? //data[4:12[ The remaining 8 bytes of data should be set to pos_y, which is of type double (8 bytes)
I know that in data, the first bytes correspond to pos_x and the rest to pos_y. That means the first 4 bytes/characters of data should be used to fill pos_x and the remaining 8 bytes fill pos_y, but I don't know how to do that.
Any idea? Thanks. PS: I'm limited to C++11.
You can use plain memcpy as another answer advises. I suggest packing memcpy into a function that also does error checking for you for most convenient and type-safe usage.
Example:
#include <cstring>
#include <stdexcept>
#include <type_traits>
struct ByteStreamReader {
    unsigned char const* begin;
    unsigned char const* const end;

    template<class T>
    operator T() {
        static_assert(std::is_trivially_copyable<T>::value,
            "The type you are using cannot be safely copied from bytes.");
        if(end - begin < static_cast<decltype(end - begin)>(sizeof(T)))
            throw std::runtime_error("ByteStreamReader");
        T t;
        std::memcpy(&t, begin, sizeof t);
        begin += sizeof t;
        return t;
    }
};

struct Position {
    float pos_x;
    double pos_y;
};

int main() {
    unsigned char data[12] = {};
    unsigned dataSize = sizeof data;

    ByteStreamReader reader{data, data + dataSize};

    Position p;
    p.pos_x = reader;
    p.pos_y = reader;
}
One thing that you can do is to copy the data byte by byte. There is a standard function to do that: std::memcpy. Example usage:
assert(sizeof pos.pos_x == 4);
std::memcpy(&pos.pos_x, data, 4);
assert(sizeof pos.pos_y == 8);
std::memcpy(&pos.pos_y, data + 4, 8);
Note that simply copying the data only works if the data is in the same representation as the CPU uses. Understand that different processors use different representations. Therefore, if your readData receives the data over the network, for example, a simple copy is not a good idea. The least that you would have to do in such a case is to convert the endianness of the data to the native endianness (probably from big endian, which is conventionally used as the network byte order). Converting from one floating-point representation to another is much trickier, but luckily IEEE 754 is fairly ubiquitous.
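As a rough illustration of that endianness step (my sketch, assuming the sender wrote pos_x as a 4-byte IEEE 754 float in big-endian/network byte order):
#include <arpa/inet.h>  // ntohl; on Windows use winsock2.h
#include <cstdint>
#include <cstring>

// Read a big-endian IEEE 754 float from a raw byte buffer.
float read_be_float(const unsigned char* bytes)
{
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);      // copy the 4 bytes as-is
    raw = ntohl(raw);                          // big-endian -> host byte order
    float value;
    std::memcpy(&value, &raw, sizeof value);   // reinterpret the bits as a float
    return value;
}
Then pos.pos_x = read_be_float(data); the same idea, with an 8-byte swap, applies to pos_y.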
What's the best method for returning an unsigned long from a vector of ints? I'm working on a BigInt class in C++, and I'm storing the large numbers in a vector. I want to write a method that will return this vector as a standard unsigned long, provided it isn't larger than unsigned long can hold. Thanks
Something along these lines, assuming the ints are stored in the vector with the least significant first:
size_t bits_in_int = std::numeric_limits<int>::digits;
size_t bits_in_ulong = std::numeric_limits<unsigned long>::digits;
unsigned long accumulator = 0;
size_t bits_so_far = 0;
for (unsigned long i : the_ints) {
    size_t next_bits = bits_so_far + bits_in_int;
    if (next_bits > bits_in_ulong) { /* failed, do something about it */ }
    accumulator += (i << bits_so_far);
    bits_so_far = next_bits;
}
return accumulator;
Notes:
1) In practice you could save some bother because the number of loops is going to be either 1 or 2 on any vaguely normal-looking C++ implementation. So you could just write a case where you return the_ints[0] and a case where you return the_ints[0] + (the_ints[1] << bits_in_int) (see the sketch after these notes).
2) I've been lazy. Because int is signed and unsigned long is unsigned, you can actually fit at least one int plus the least significant bit of another int into an unsigned long. For example you might find bits_in_int is 31 but bits_in_ulong is 32.
So actually in the "failed" case there is one last hope for peace, which is that (a) there is only one int left to process, and (b) its value fits in the remaining bits of the result. But like I say, I'm lazy, and I think I've shown the components you need to put together.
For this reason if no other, you should probably use a vector of unsigned int for your BigInt. It's not required that the width of unsigned long is a multiple of the number of bits in unsigned int, but it might be strange enough that you can ignore it.
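Here is a concrete version of the shortcut from note 1 (my sketch, not part of the original answer; it keeps the convention above that each stored int carries bits_in_int significant bits, least significant element first):
#include <cstddef>
#include <limits>
#include <vector>

unsigned long to_ulong(const std::vector<int>& the_ints)
{
    const std::size_t bits_in_int   = std::numeric_limits<int>::digits;
    const std::size_t bits_in_ulong = std::numeric_limits<unsigned long>::digits;

    if (the_ints.size() == 1)
        return static_cast<unsigned long>(the_ints[0]);
    if (the_ints.size() == 2 && 2 * bits_in_int <= bits_in_ulong)
        return static_cast<unsigned long>(the_ints[0])
             + (static_cast<unsigned long>(the_ints[1]) << bits_in_int);
    /* failed, do something about it */
    return 0;
}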
Update for base 10 digits, stored most significant first:
if (the_ints.size() <= std::numeric_limits<unsigned long>::digits10 + 1) {
    std::stringstream ss;
    for (int i : the_ints) ss << char(i + '0');
    unsigned long result;
    if (ss >> result) return result;
}
/* failed, do something about it */
I am looking for any library or example for parsing a binary msg in C++. Most people ask about reading a binary file, or data received in a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.
As an example:
9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440
where the first 8 bits are a preamble, the next 6 bits the msg ID (an integer from 0 to 63), the next 212 bits are data, and the final 24 bits are a CRC24.
So in this case, msg 26, I have to get this data from the 212 data bits:
4 bits integer value
4 bits integer value
A 9 bit float value from 0 to 63.875, where LSB is 0.125
4 bits integer value
And so on.
EDIT: I need to operate at bit level, so a memcpy is not a good solution, since it copies a whole number of bytes. To get the first 4-bit integer value I would have to take 2 bits from one byte and another 2 bits from the next byte, then shift each pair and compose them. What I am asking for is a more elegant way of extracting the values, because I have about 20 different messages and want to reach a common solution to parse them at bit level.
Do you know of any library which can easily achieve this?
I also found other Q&As where static_cast is being used. I googled it, and for each person recommending this approach, there is another one warning about endianness. Since I already have my message, I don't know if such a warning applies to me, or if it is just for socket communications.
EDIT: boost:dynamic_bitset looks promising. Any help using it?
If you can't find a generic library to parse your data, use bitfields to get the data and memcpy() it into a variable of the struct. See the link Bitfields. This will be more streamlined towards your application.
Don't forget to pack the structure.
Example:
#pragma pack(1)
#include "order32.h"

struct yourfields{
#if O32_HOST_ORDER == O32_BIG_ENDIAN
    unsigned int preamble:8;
    unsigned int msgid:6;
    unsigned data:212;
    unsigned crc:24;
#else
    unsigned crc:24;
    unsigned data:212;
    unsigned int msgid:6;
    unsigned int preamble:8;
#endif
}/*__attribute__((packed)) for gcc*/;
You can do a little compile-time check to assert whether your machine uses LITTLE ENDIAN or BIG ENDIAN format. After that, define it as a PREPROCESSOR SYMBOL:
//order32.h
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
O32_LITTLE_ENDIAN = 0x03020100ul,
O32_BIG_ENDIAN = 0x00010203ul,
O32_PDP_ENDIAN = 0x01000302ul
};
static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
{ { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)
#endif
Thanks to the code by Christoph here.
Example program for using bitfields and their outputs:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstring>

using namespace std;

struct bitfields{
    unsigned opcode:5;
    unsigned info:3;
}__attribute__((packed));

struct bitfields opcodes;

/* info: 3 bits; opcode: 5 bits; */
/* 001 10001 => 0x31 */
/* 010 10010 => 0x52 */

void set_data(unsigned char data)
{
    memcpy(&opcodes, &data, sizeof(data));
}

void print_data()
{
    cout << opcodes.opcode << ' ' << opcodes.info << endl;
}

int main(int argc, char *argv[])
{
    set_data(0x31);
    print_data(); //must print 17 1 on my little-endian machine
    set_data(0x52);
    print_data(); //must print 18 2
    cout << sizeof(opcodes); //must print 1
    return 0;
}
You can manipulate the bits on your own; for example, to parse a 4-bit integer value do:
char byte_data[64];
size_t readPos = 3; //any byte
int value = 0;
int bits_to_read = 4;
for (int i = 0; i < bits_to_read; ++i) {
    //take bit i of the byte (counting from the least significant bit)
    value |= static_cast<unsigned char>(byte_data[readPos]) & (1u << i);
}
Floats are usually sent as string data:
std::string temp;
temp.assign(byte_data + readPos, 9);
float value = std::stof(temp);
If your data contains a custom float format, then just extract the bits and do your math:
char byte_data[64];
size_t readPos = 3; //any byte
float value = 0;
unsigned int raw = 0;
int i = 0;           //bit index within the current byte, 0 = most significant
int bits_to_read = 9;
while (bits_to_read) {
    if (i > 7) {     //move to the next byte once its 8 bits are consumed
        ++readPos;
        i = 0;
    }
    const int bit = (static_cast<unsigned char>(byte_data[readPos]) >> (7 - i)) & 1;
    raw = (raw << 1) | bit;   //accumulate the raw 9-bit value, MSB first
    ++i;
    --bits_to_read;
}
value = raw * 0.125f;         //apply your scaling, e.g. LSB = 0.125 as in the question
Here is a good article that describes several solutions to the problem.
It even contains the reference to the ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here - it might be what you need, though it's not popular and it's under GPL.
Anyway, the boost::dynamic_bitset might be the best choice as it's time-tested and community-proven. But I have no personal experience with it.
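To give a flavour of what that could look like, here is a sketch of mine (untested against the real message spec; the field offsets simply follow the order listed in the question, and the bit numbering treats the first bit of the hex dump as bit 0):
#include <boost/dynamic_bitset.hpp>
#include <cstdint>
#include <iostream>

// Extract `width` bits starting at `bitOffset`, where bit 0 is the MSB of byte 0.
std::uint32_t extractField(const boost::dynamic_bitset<unsigned char>& bits,
                           std::size_t bitOffset, std::size_t width)
{
    std::uint32_t value = 0;
    for (std::size_t j = 0; j < width; ++j) {
        std::size_t k = bitOffset + j;                 // message bit, MSB-first
        std::size_t pos = (k / 8) * 8 + (7 - k % 8);   // corresponding dynamic_bitset position
        value = (value << 1) | bits.test(pos);
    }
    return value;
}

int main()
{
    unsigned char msg[32] = { 0x9A, 0x69, 0x0C, 0x12 /* ...the remaining 28 bytes... */ };

    boost::dynamic_bitset<unsigned char> bits(8 * sizeof msg);
    boost::from_block_range(msg, msg + sizeof msg, bits);

    std::uint32_t preamble = extractField(bits,  0, 8);  // first 8 bits
    std::uint32_t msgId    = extractField(bits,  8, 6);  // next 6 bits (26 for this message)
    std::uint32_t field1   = extractField(bits, 14, 4);  // first 4-bit value of the payload
    std::uint32_t field2   = extractField(bits, 18, 4);  // second 4-bit value
    double scaled = extractField(bits, 22, 9) * 0.125;   // 9-bit value with LSB = 0.125
    std::cout << preamble << ' ' << msgId << ' ' << field1 << ' ' << field2 << ' ' << scaled << '\n';
}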
I want to read sizeof(int) bytes from a char* array.
a) In what scenarios do we need to worry about whether endianness needs to be checked?
b) How would you read the first 4 bytes, either taking endianness into consideration or not?
EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.
What is the best approach to go about this problem?
Do you mean something like this?:
char* a;
int i;
memcpy(&i, a, sizeof(i));
You only have to worry about endianess if the source of the data is from a different platform, like a device.
a) You only need to worry about "endianness" (i.e., byte-swapping) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. There are many ways this can occur, but here are a couple of examples.
You receive data on a Windows machine via a socket. Windows employs a little-endian architecture while network data is "supposed" to be in big-endian format.
You process a data file that was created on a system with a different "endianness."
In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:
int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data
Obviously, this assumes myArray is not a null pointer; otherwise, this will crash since it dereferences the pointer, so employ a good defensive programming scheme.
To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:
inline unsigned short swap_16bit(unsigned short us)
{
return (unsigned short)(((us & 0xFF00) >> 8) |
((us & 0x00FF) << 8));
}
inline unsigned long swap_32bit(unsigned long ul)
{
return (unsigned long)(((ul & 0xFF000000) >> 24) |
((ul & 0x00FF0000) >> 8) |
((ul & 0x0000FF00) << 8) |
((ul & 0x000000FF) << 24));
}
It depends on how you want to read them; I get the feeling you want to cast 4 bytes into an integer. Doing so with network-streamed data will usually end up in something like this:
int foo = *(int*)(stream+offset_in_stream);
The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" used by various TCP/IP stuff is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.
However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.
Some example code that assumes sizeof(int) is the right number of bytes:
#include <limits.h>
int bytes_to_int_big_endian(const char *bytes)
{
    int i;
    int result;
    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result = (result << CHAR_BIT) + (unsigned char)bytes[i];
    return result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    int i;
    int result;
    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result += (unsigned char)bytes[i] << (i * CHAR_BIT);
    return result;
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
const int correct = 0x01020304;
const char little[] = "\x04\x03\x02\x01";
const char big[] = "\x01\x02\x03\x04";
printf("correct: %0x\n", correct);
printf("from big-endian: %0x\n", bytes_to_int_big_endian(big));
printf("from-little-endian: %0x\n", bytes_to_int_little_endian(little));
return 0;
}
#endif
How about
int int_from_bytes(const char * bytes, bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];
    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;
    return *(int *)(void *)tmp;
}
You'd use it like this:
int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
If you're on a system where casting void * to int * may result in alignment conflicts, you can use
int int_from_bytes(const char * bytes, bool reverse)
{
    int tmp;
    if(reverse)
    {
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));
    return tmp;
}
You shouldn't need to worry about endianess unless you are reading the bytes from a source created on a different machine, e.g. a network stream.
Given that, can't you just use a for loop?
void ReadBytes(char * stream) {
    for (int i = 0; i < sizeof(int); i++) {
        char foo = stream[i];
    }
}
Are you asking for something more complicated than that?
You need to worry about endianess only if the data you're reading is composed of numbers which are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the way in which a machine interprets a series of more than one byte as a numerical value.
Just use a for loop that moves over the array in sizeof(int) chunks.
Use the function ntohl (found in the header <arpa/inet.h>, at least on Linux) to convert from bytes in the network order (network order is defined as big-endian) to local byte-order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
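For example (my sketch, assuming the four bytes in the buffer really are in network/big-endian order):
#include <arpa/inet.h>  // ntohl
#include <cstdint>
#include <cstring>

int read_int_from_network_bytes(const char* bytes)
{
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // copy out of the char buffer first
    return static_cast<int>(ntohl(raw));   // big-endian -> host byte order
}
The memcpy also sidesteps the alignment questions raised by the cast-based versions above.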
Why read when you can just compare?
bool AreEqual(int i, char *data)
{
return memcmp(&i, data, sizeof(int)) == 0;
}
If you are worried about endianness, you need to convert all of the integers to some invariant form. htonl and ntohl are good examples.