I have a file laid out as follows:
Offset  Type            Value   Meaning
0000    32-bit integer  60000   count of data
0004    32-bit integer  32      width
0008    32-bit integer  32      height
0012    byte            1       data; ext.
I read the data in by quickly dumping the file contents into a string.
Because of the size of the file, and the nature of the data and how it's used, I'd like to avoid copying it all over the place.
So I want to use pointers into the data to save time, but I can't get them to work quite right.
I'd like to do something like this:
std::string mydata;
dumpdata("myfile.bin",mydata); //dumps data to reference string.
uint32_t* count = (uint32_t*)&mydata[4]; //32 bit integer.
I know I'm forgetting a cast or something, but I can't figure out what it is; what I tried didn't work.
To make this clear: I do not want to copy. I just want count to point into that area and treat it like a uint32_t, even though it's an array of bytes.
This is probably a super easy question, but it's one of those "Google thinks I'm trying to convert ASCII" things, and I'm not.
I expect that if the data is 00002710H I will get the uint 10000. Instead I get 270,991,360.
I just want count to point into that area and treat it like a uint32_t, even though it's an array of bytes.
This is not possible in Standard C++ (nor in C, for that matter): casting the pointer and dereferencing it runs afoul of the alignment and strict-aliasing rules.
If your intent is to read the data, one solution would be to write:
#include <cstring> // std::memcpy

uint32_t get_count(std::string const& mydata)
{
    uint32_t x;
    std::memcpy(&x, &mydata[4], sizeof x); // copy the 4 bytes into a real uint32_t
    return x;
}
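Since 0x00002710 came back as 270,991,360 (0x10270000), the file most likely stores the value big-endian while your machine reads little-endian. A shift-based read sidesteps that entirely; here is a minimal sketch under that assumption, with get_count_be as an illustrative name and the offset 4 taken from the snippet above:
#include <cstddef>
#include <cstdint>
#include <string>

uint32_t get_count_be(std::string const& mydata)
{
    // assemble the value byte by byte, most significant first,
    // so the result is the same on any host
    auto b = [&](std::size_t i) {
        return static_cast<uint32_t>(static_cast<unsigned char>(mydata[4 + i]));
    };
    return (b(0) << 24) | (b(1) << 16) | (b(2) << 8) | b(3);
}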
Related
I'm having trouble reading a binary file into a bitset and processing it.
std::ifstream is("data.txt", std::ifstream::binary);
if (is) {
    // get length of file:
    is.seekg(0, is.end);
    int length = is.tellg();
    is.seekg(0, is.beg);

    char *buffer = new char[length];
    is.read(buffer, length);
    is.close();

    const int k = sizeof(buffer) * 8;
    std::bitset<k> tmp;
    memcpy(&tmp, buffer, sizeof(buffer));
    std::cout << tmp;

    delete[] buffer;
}
int a = 5;
std::bitset<32> bit;
memcpy(&bit, &a, sizeof(a));
std::cout << bit;
I want to get {05 00 00 00} (hex memory view), i.e. bitset[0..31] = {00000101 00000000 00000000 00000000}, but instead I get bitset[0..31] = {10100000 00000000 00000000 00000000}.
You need to learn how to crawl before you can crawl on broken glass.
In short, computer memory is an opaque box, and you should stop making assumptions about it.
Hyrum's law is the stupidest thing that has ever existed and if you stopped proliferating this cancer, that would be great.
What I'm about to write is common sense to every single competent C++ programmer out there: as trivial as breathing, and as important as breathing. It should be included in every single C++ book ever printed and hammered into the heads of new programmers as soon as possible, but for some undefined reason, it isn't.
The only thing you can rely on, when it comes to what I'm going to loosely define as "memory", is that the bits of a byte are never out of order. std::byte is such a type; before it was added to the standard we used unsigned char. The two are more or less interchangeable, but you should prefer std::byte whenever you can.
So, what do I mean by this?
std::byte a{0b10101000};
assert(std::to_integer<int>((a >> 3) & std::byte{1}) == 1); // always true
That's it; everything else is up to the compiler, your machine architecture, and the stars in the sky.
Oh, what, you thought you could just write int a = 0b1010100000000010; and expect something good? I'm sorry, but that's just not how things work in these savage lands. If you expect any particular order here, you will have to split the value into bytes yourself; you cannot just cast it to std::byte bytes[2] and expect bytes[0] == 0b10101000. It is NEVER correct to assume anything here. If you do, one day your code will break, and by the time you realize it's broken it will be too late, because it will be yet another undebuggable 30-million-line legacy codebase, half of which is only available as proprietary shared objects whose source we haven't had since 1997. Good luck.
So, what's the correct way? Luckily for us, binary shifts are architecture independent. int is guaranteed to be at least 16 bits, and that's the only thing this example relies on, though most machines have sizeof (int) == 4. If you need more bytes, or an exact number of bytes, use an appropriate type from the fixed-width integer types in <cstdint>.
#include <climits> // CHAR_BIT
#include <cstddef> // std::byte

int a = 0b1010100000000010;
std::byte bytes[2]; // always correct
// std::byte bytes[4];        // stupid assumption by inexperienced programmers
// std::byte bytes[sizeof a]; // flexible solution that needs more work

// we think in terms of 8 bits, don't care about the rest
bytes[0] = static_cast<std::byte>(a & 0xFF);
// we need to skip possibly more than 8 bits to access the next 8 bits, however
bytes[1] = static_cast<std::byte>((a >> CHAR_BIT) & 0xFF);
This is the only absolutely correct way to convert a type with sizeof (T) > 1 into an array of bytes, and if you see anything else, it is without a doubt a subpar implementation that will stop working the moment you change compiler and/or machine architecture.
The reverse is true too: you need binary shifts to convert a byte array back into a type larger than 1 byte.
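A minimal sketch of that reverse direction, reusing a and bytes from the example above:
// rebuild the value from the two bytes; the exact inverse of the split above
int b = std::to_integer<int>(bytes[0])
      | (std::to_integer<int>(bytes[1]) << CHAR_BIT);
assert(b == a); // holds regardless of the machine's endianness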
On top of that, all of this only applies to primitive types: int, long, short, and so on. Sometimes you can rely on it working with float or double, as long as you only ever need IEEE 754 and will never target a machine so old or bizarre that it doesn't support IEEE 754. That's it.
If you think really long and hard, you may realize that this is no different from structs.
struct x {
    int a;
    int b;
};
What can we rely on? Well, we know that an x object has the same address as its member a. That's it. If we want to set b, we have to access it as x.b; every other assumption is ALWAYS wrong, with no ifs or buts. The only exception is if you wrote your own compiler and you are using your own compiler while ignoring the standard, at which point anything is possible; that's fine, but it's not C++ anymore.
So, what can we infer from what we know now? An array of bytes cannot just be memcpy'd into a std::bitset. You don't know its implementation and you cannot know its implementation; it may change tomorrow, and if your code breaks because of that, then your code is wrong and you have failed as a programmer.
Want to convert an array of bytes to a bitset? Then go ahead and iterate over every single bit in the byte array and set each bit of the bitset however you need it to be. That's the only correct and sane solution; every other solution is objectively wrong, now and forever, until someone decides to say otherwise in the C++ standard. Let's just hope that never happens.
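A minimal sketch of that loop, assuming bit i of the bitset should come from bit i % CHAR_BIT of byte i / CHAR_BIT; adjust the indexing if your format numbers bits differently:
#include <bitset>
#include <climits>
#include <cstddef>

template <std::size_t N>
std::bitset<N> to_bitset(const unsigned char* buffer) // buffer must hold at least N bits
{
    std::bitset<N> result;
    for (std::size_t i = 0; i < N; ++i)
        result[i] = (buffer[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
    return result;
}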
I'd like to process data provided by an external library.
The lib holds the data and provides access to it like this:
const uint8_t* data;
std::pair<const uint8_t*, const uint8_t*> getvalue() const {
    return std::make_pair(data + offset, data + length);
}
I know that the current data contains two uint16_t numbers, but I need to change their endianness.
So altogether the data is 4 bytes long and contains these numbers:
66 4 0 0
So I'd like to get two uint16_t numbers with 1090 and 0 value respectively.
I can do basic arithmetic and change the endianness in one place:
pair<const uint8_t*, const uint8_t*> dataPtrs = library.value();
vector<uint8_t> data(dataPtrs.first, dataPtrs.second);
uint16_t first = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];
However, I'd like to do something more elegant (the vector is replaceable if there is a better way of getting the uint16_ts).
How can I better create a uint16_t from a uint8_t*? I'd avoid memcpy if possible and use something more modern/safe.
Boost has a nice header-only endian library which could work, but it needs a uint16_t input.
For going further, Boost also provides data types for changing endianness, so I could create a struct:
struct datatype {
    big_int16_buf_t data1;
    big_int16_buf_t data2;
};
Is it possible to safely (padding, platform dependency, etc.) cast a valid, 4-byte-long uint8_t* to datatype? Maybe with something like this union?
typedef union {
    uint8_t u8[4];
    datatype correct_data;
} mydata;
Maybe with something like this union?
No. Type punning with unions is not well defined in C++.
This would work, assuming big_int16_buf_t (and therefore datatype) is trivially copyable:
datatype d{};
std::memcpy(&d, data, sizeof d);
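If those really are Boost.Endian buffer types, the native-order values can then be read back through their value() members; a sketch, assuming the wire format is big-endian as the type names imply (otherwise little_int16_buf_t is the one to use):
std::uint16_t first  = d.data1.value(); // converts from big-endian storage
std::uint16_t second = d.data2.value();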
uint16_t first = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];
However I'd like to do something more elegant
This is actually, in my opinion, quite an elegant way, because it works the same on all systems: it reads the data as little-endian whether the CPU is little-endian, big-endian, or something else. It is very portable.
However I'd like to do something more elegant (the vector is replaceable if there is a better way of getting the uint16_ts).
The vector seems entirely pointless. You could just as well use:
const std::uint8_t* data = dataPtrs.first;
How can I better create a uint16_t from a uint8_t*?
If you are certain that the data sitting behind the uint8_t pointer really is a uint16_t object, C++ allows: auto u16 = *reinterpret_cast<uint16_t const*>(data); (note that static_cast does not compile between unrelated pointer types). Otherwise, this is UB.
Given a big-endian value, transforming it into little-endian can be done with the ntohs function (on Linux; other OSes have similar functions).
But beware: if the pointer you hold points to two individual uint8_t values, you must not convert them by pointer cast. In that case, you have to specify manually which value goes where (conceivably with a function template, as sketched below). This will be the most portable solution, and in all likelihood the compiler will turn the shifts and ors into efficient code.
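Such a function might look like the following sketch; read_le16 is an illustrative name, not something from Boost or the standard library:
#include <cstdint>

// assemble a uint16_t from two bytes stored little-endian,
// independent of the host's own byte order
inline std::uint16_t read_le16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// usage with the pair returned by the library:
//   std::uint16_t first  = read_le16(dataPtrs.first);
//   std::uint16_t second = read_le16(dataPtrs.first + 2);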
I'm making an LZW compressor that records its output in hexadecimal. It currently uses a uchar (OpenCV) for storing values, and outputs the uchar in hexadecimal.
However, I have been asked to let the user choose how many bytes are used to store each value, so he could have, for example, 2 bytes per value (or 32 bytes; it's up to him).
So, to manipulate the output, I was thinking of using an array of uchars (if the user asks for 32 bytes, I use an array of 32 uchars). The question is: is there an easy way to write a big value into this array and output it later, without having to worry about which byte lands at which index? That is, can I treat the array as just one x-byte uchar? Should I use a vector?
Any help is appreciated.
You could use the following union:
union pun_unsigned {
    unsigned char c[sizeof(uint64_t)]; // the raw bytes
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};
Note that, strictly speaking, only conversions from or to the (signed or unsigned) char member are defined behaviour, and that guarantee comes from C; as noted above, C++ does not bless union type punning.
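For illustration, writing through one member and reading the bytes back through c would look like this; per the caveat above, the round trip is defined in C, while in C++ std::memcpy into a separate buffer is the safe equivalent:
#include <cstddef>
#include <cstdio>

int main()
{
    pun_unsigned p;
    p.u32 = 0x11223344u;
    for (std::size_t i = 0; i < sizeof p.u32; ++i)
        std::printf("%02x ", p.c[i]); // prints "44 33 22 11" on a little-endian machine
}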
I want a data variable which will be an integer with a range of 0 to 1,000,000.
For example, normal int variables can store numbers from -2,147,483,648 to 2,147,483,647.
I want the new data type to have a smaller range so it can have a SMALLER SIZE.
If there is a way to do that, please let me know.
There isn't; you can't specify arbitrary ranges for variables like this in C++.
You need 20 bits to store 1,000,000 different values (2^20 = 1,048,576), so a 32-bit integer is the best you can do without creating a custom data type; even then you'd only save 1 byte by going to 24 bits, since you can't allocate less than 8 bits.
As for enforcing the range of values, you could do that with a custom class, but I assume your goal isn't the validation but the size reduction.
So, there's no truly good answer to this problem. Here are a few thoughts, though:
If you're talking about an array of these 20 bit values, then perhaps the answers at this question will be helpful: Bit packing of array of integers
On the other hand, perhaps we are talking about an object that has three int20_ts in it, and you'd like it to take up less space than it normally would. In that case, we could use a bitfield.
struct object {
    int a : 20;
    int b : 20;
    int c : 20;
} __attribute__((__packed__));

printf("sizeof object: %zu\n", sizeof(struct object));
This code will probably print 8, signifying that the struct uses 8 bytes of space, not the 12 you would normally expect.
Data types can only be a multiple of 8 bits in size, because otherwise the data type wouldn't be addressable. Imagine a pointer to 5 bits of data; such a thing can't exist.
How do I assign a value to uint32_t key[4]?
I initially have this: uint32_t iv[2] = {0xFFFFFFDD};
Then on my second run I need to assign a new value. Let's assume the new value is 10273348653513887325 (decimal), but recorded as a string for now:
string value = "10273348653513887325";
I want to change the value of iv from 0xFFFFFFDD (hexadecimal) to 10273348653513887325 (decimal).
How do I do it?
You don't. 10273348653513887325 will not fit into a uint32_t. You can pretend that iv is a 64-bit value, thus:
uint64_t* piv = (uint64_t*)&iv[0]; // undefined behaviour: aliasing and alignment
*piv = decValue; // where decValue is the numeric conversion of the string
but after you've done that, what you'll really have is two 32-bit values that each hold a portion of the value you're trying to store. As if that weren't awkward enough, the endianness of the architecture you're using also comes into play. So don't do that.
Another alternative is to use an anonymous union of two 32-bit values and one 64-bit value.
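A safer route is to parse the string with std::stoull and split the result into two halves with shifts, which avoids both the pointer cast and the endianness trap. A minimal sketch; set_iv is an illustrative name, and putting the low half in iv[0] is an assumption that depends on what the rest of your code expects:
#include <cstdint>
#include <string>

void set_iv(uint32_t iv[2], const std::string& value)
{
    uint64_t v = std::stoull(value); // parses the full 64-bit decimal string
    iv[0] = static_cast<uint32_t>(v & 0xFFFFFFFFu); // low 32 bits
    iv[1] = static_cast<uint32_t>(v >> 32);         // high 32 bits
}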
It might merit pointing out that the assignment
uint32_t iv[2] = {0xFFFFFFDD };
results in the first element in iv having that value, and the second containing 0. Is that what you meant?
As pointed out by @hvd, there's some lack of clarity about what you're trying to achieve.