How can I convert short int[] to char*?
short int test[4000];
char* test2;
I tried this:
test2 = (char*)test[4000]
Error--> PTR is not valid
Like this:
test2 = (char*)test;
test[4000] means the 4001st element of array test (which doesn't even exist, since valid indices run from 0 to 3999), not the array itself, so the cast turns a garbage value into an invalid pointer.
In general, though, this is not a good idea. At the very least, your program won't be portable between big-endian and little-endian systems. Nevertheless, if you are coding for a specific microcontroller, for example, it's OK.
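A minimal sketch of what that means in practice (the output in the comment assumes a 16-bit short):
#include <cstdio>
int main() {
    short int test[4000] = {0x1234};
    char* test2 = (char*)test;
    // a little-endian machine prints "34 12", a big-endian machine "12 34"
    std::printf("%02x %02x\n", (unsigned char)test2[0], (unsigned char)test2[1]);
}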
What you are doing is likely a bad idea, but...
test2 = (char*) test;
So you have a buffer in form of an array, and you want to write the binary contents of it into a file. You do it like this:
if (fwrite(test, sizeof(test), 1, f) < 1)
{
// handle error here (write failed)
}
The fwrite() function is used to write binary data to files (and fread() to read). It takes a void* pointer, so it can work with any type (C++ implicitly converts any other object pointer/array to it).
sizeof(test) gives the exact size of the array in bytes. If you don't want to write all of it (i.e., just the filled part), use sizeof(short) * N, where N is the number of filled elements.
The 1 here means that there is one block of data to write, so fwrite() will write all the data at once. f is the file you're writing to. The function returns the number of blocks written (so 1 on success and 0 on failure).
For completeness, I should note that this is only one way to use fwrite(). It may be a bit more idiomatic to write something like:
fwrite(test, sizeof(short), N, f)
but then fwrite() may actually write only part of the data, and you will need to handle that. In other words, if it returns less than N, you have to retry writing the remaining part, as in the sketch below.
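A minimal sketch of that retry loop, reusing test, N and f from above:
size_t written = 0;
while (written < N)
{
    // advance past the elements already written and try again
    size_t n = fwrite(test + written, sizeof(short), N - written, f);
    if (n == 0)
    {
        // handle error here (nothing more could be written)
        break;
    }
    written += n;
}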
I'm having trouble when I want to read binary file into bitset and process it.
std::ifstream is("data.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg(0, is.end);
int length = is.tellg();
is.seekg(0, is.beg);
char *buffer = new char[length];
is.read(buffer, length);
is.close();
const int k = sizeof(buffer) * 8;
std::bitset<k> tmp;
memcpy(&tmp, buffer, sizeof(buffer));
std::cout << tmp;
delete[] buffer;
}
int a = 5;
std::bitset<32> bit;
memcpy(&bit, &a, sizeof(a));
std::cout << bit;
I want to get {05 00 00 00} (hex memory view), i.e. bitset[0~31] = {00000101 00000000 00000000 00000000}, but I get bitset[0~31] = {10100000 00000000 00000000 00000000}.
You need to learn how to crawl before you can crawl on broken glass.
In short, computer memory is an opaque box, and you should stop making assumptions about it.
Hyrum's law is the stupidest thing that has ever existed and if you stopped proliferating this cancer, that would be great.
What I'm about to write is common sense to every single competent C++ programmer out there. As trivial as breathing, and as important as breathing. It should be included in every single copy of C++ book ever, hammered into the heads of new programmers as soon as possible, but for some undefined reason, isn't.
The only thing you can rely on, when it comes to what I'm going to loosely define as "memory", is that the bits of a byte are never out of order. std::byte is such a byte type; before it was added to the standard, we used unsigned char. They are more or less interchangeable, but you should prefer std::byte whenever you can.
So, what do I mean by this?
std::byte a{0b10101000}; // std::byte and std::to_integer live in <cstddef>
assert(std::to_integer<int>((a >> 3) & std::byte{1}) == 1); // always true
That's it, everything else is up to the compiler, your machine architecture and stars in the sky.
Oh, what, you thought you could just write int a = 0b1010100000000010; and expect something good? I'm sorry, but that's just not how things work in these savage lands. If you expect any order here, you will have to split it into bytes yourself; you cannot just cast this into std::byte bytes[2] and expect bytes[0] == 0b10101000. It is NEVER correct to assume anything here. If you do, one day your code will break, and by the time you realize it's broken it will be too late, because it will be yet another undebuggable 30-million-line legacy codebase, half of which is only available as proprietary shared objects that we haven't had the source code for since 1997. Good luck.
So, what's the correct way? Luckily for us, binary shifts are architecture independent. int is guaranteed to be at least 16 bits wide, so that's the only thing this example relies on, but most machines have sizeof (int) == 4. If you need more bytes, or an exact number of bytes, you should be using an appropriate fixed-width integer type from <cstdint>.
int a = 0b1010100000000010;
std::byte bytes[2]; // always correct
// std::byte bytes[4];          // stupid assumption by inexperienced programmers
// std::byte bytes[sizeof (a)]; // flexible solution that needs more work
// we think in terms of 8 bits, don't care about the rest
bytes[0] = static_cast<std::byte>(a & 0xFF);
// we need to skip possibly more than 8 bits to access the next 8 bits, however
bytes[1] = static_cast<std::byte>((a >> CHAR_BIT) & 0xFF); // CHAR_BIT is in <climits>
This is the only absolutely correct way to convert a type with sizeof (T) > 1 into an array of bytes, and if you see anything else, it's without a doubt a subpar implementation that will stop working the moment you change compilers and/or machine architectures.
The reverse is true too: you need to use binary shifts to convert a byte array back into a type bigger than 1 byte, as sketched below.
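For example, reassembling the two bytes from the sketch above (a minimal continuation; std::to_integer is in <cstddef>, CHAR_BIT in <climits>):
int a = 0;
a |= std::to_integer<int>(bytes[0]);             // low 8 bits
a |= std::to_integer<int>(bytes[1]) << CHAR_BIT; // next 8 bits
// a == 0b1010100000000010 again, regardless of architecture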
On top of that, this only applies to primitive types: int, long, short... Sometimes you can get away with relying on it for float or double, as long as you can assume IEEE 754 and will never need a machine so old or bizarre that it doesn't support IEEE 754. That's it.
If you think really long and hard, you may realize that this is no different from structs.
struct x {
int a;
int b;
};
What can we rely on? Well, we know that x has the same address as its first member a. That's it. If we want to set b, we need to access it as x.b; every other assumption is ALWAYS wrong, with no ifs or buts. The only exception is if you wrote your own compiler and you are using your own compiler, but then you're ignoring the standard, and at that point anything is possible; that's fine, but it's not C++ anymore.
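A quick way to see what the compiler actually did (a sketch; the printed offset is whatever your implementation chose, padding included):
#include <cstddef> // offsetof
#include <cstdio>
struct x {
    int a;
    int b;
};
int main() {
    // the address of x equals the address of x.a; the offset of b is
    // implementation-defined, so ask instead of assuming
    std::printf("offset of b: %zu\n", offsetof(x, b));
}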
So, what can we infer from what we know now? An array of bytes cannot just be memcpy'd into a std::bitset. You don't know its implementation and you cannot know its implementation; it may change tomorrow, and if your code breaks because of that, then it's wrong and you're a failure of a programmer.
Want to convert an array of bytes to a bitset? Then go ahead and iterate over every single bit in the byte array and set each bit of the bitset however you need it to be (see the sketch below); that's the only correct and sane solution. Every other solution is objectively wrong, now, and forever. Until someone decides to say otherwise in the C++ standard. Let's just hope that never happens.
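For example, a minimal sketch of that loop for the code in the question (assumes length >= 4 and picks least-significant-bit-first within each byte; that ordering is a decision you have to make explicitly):
std::bitset<32> tmp;
for (std::size_t i = 0; i < tmp.size(); ++i)
{
    // take byte i/8 of the buffer, then bit i%8 within it, LSB first
    unsigned byte = static_cast<unsigned char>(buffer[i / 8]);
    tmp[i] = ((byte >> (i % 8)) & 1) != 0;
}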
I need to write a boolean vector to a binary file. I searched stackoverflow for similar questions and didn't find anything that worked. I tried to write it myself, here's what I came up with:
vector<bool> bits = { 1, 0, 1, 1 };
ofstream file("file.bin", ios::out | ios::binary);
uint32_t size = (bits.size() / 8) + ((bits.size() % 8 == 0) ? 0 : 1);
file.write(reinterpret_cast<const char*>(&bits[0]), size);
I was hoping it would write 1011**** (where * is a random 1/0). But I got an error:
error C2102: '&' requires l-value
Theoretically, I could do some kind of loop and add 8 bools to a char one by one, then write the char to a file, and repeat many times. But I have a rather large vector; it would be very slow and inefficient. Is there any other way to write the entire vector at once? Bitset is not suitable since I need to add bits to the vector.
vector<bool> may or may not be packed, and you can't access its internal data directly, at least not portably.
So you have to iterate over the bits one by one and combine them into bytes (yes, bytes; C++ has std::byte now. Don't use char; use uint8_t on older C++).
As you say, writing out each byte separately is slow. But why would you write out each byte? You know how big the vector is, so create a suitable buffer, fill it, and write it out in one go, as in the sketch below. At a minimum, write out chunks of bytes at once.
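A minimal sketch of that, reusing bits, size and file from the question (needs <vector> and <cstdint>; it packs least-significant-bit first, so use 7 - i % 8 instead if you want the first bit in the most significant position):
std::vector<std::uint8_t> buffer(size, 0);
for (std::size_t i = 0; i < bits.size(); ++i)
{
    if (bits[i])
        buffer[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
}
file.write(reinterpret_cast<const char*>(buffer.data()), buffer.size());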
Since vector<bool> doesn't have a data() function, getting the address of its internal storage requires some ugly hacks (the following works for libstdc++, but I strongly discourage it):
file.write(
reinterpret_cast<const char*>(
*reinterpret_cast<std::uint64_t* const*>(&bits)),
size);
It's easy to find popular conventions for C-style I/O. What's more difficult is finding explanations of why they are the way they are. It's common to see a read with statements like:
fread(buffer, sizeof(buffer), 1, ptr);
How should a programmer think about using the parameters size and n of fread()?
For example, if my input file is 100 bytes, should I opt for a larger size with fewer n or read more objects of a smaller size?
If the size-to-be-read and n exceed the byte-size of an input file, what happens? Are the excess bytes that were read composed, colloquially speaking, of "junk values"?
size_t fread(void * restrict ptr, size_t size, size_t n, FILE * restrict stream);
How should a programmer think about using the parameters size and n of fread()?
When reading into an array:
size is the size of the pointed-to type (what the passed pointer dereferences to).
n is the number of elements.
some_type destination[max_number_of_elements];
size_t num_read = fread(destination, sizeof *destination, max_number_of_elements, inf);
printf("Number of elements read %zu\n", num_read);
if (num_read == 0) {
  if (feof(inf)) puts("Nothing read as at end-of-file");
  else if (ferror(inf)) puts("Input error occurred");
  else {
    // since sizeof *destination and max_number_of_elements cannot be 0 here,
    // something strange has occurred (UB somewhere prior?)
  }
}
For example, if my input file is 100 bytes, should I opt for a larger size with fewer n or read more objects of a smaller size?
In that case, use a size of 1 and a maximum count of 100.
#define MAX_FILE_SIZE 100
uint8_t destination[MAX_FILE_SIZE];
size_t num_read = fread(destination, sizeof *destination, MAX_FILE_SIZE, inf);
If the size-to-be-read and n exceed the byte-size of an input file, what happens?
The destination is not completely filled. Use the return value to determine how much was read.
Are the excess bytes that were read composed, colloquially speaking, of "junk values"?
No. Their values from before the fread() remain the same (as long as the return value was not 0 and ferror() is not set). If the destination was never initialized/assigned, then yes, it may be thought of as junk.
Having separate size and n parameters allows fread() to function as desired even when size * n would overflow size_t math. With current flat memory models, this is rarely needed.
First, the while (!feof(ptr)) idiom is wrong and a really bad anti-pattern. There are situations where it can work, but it's almost always gratuitously more complicated than correct idiomatic usage. The return value of fread and the other stdio read functions already tells you whether the call succeeded, and you usually need to handle that immediately rather than waiting for the next loop iteration to start. If whatever resource you're learning from is teaching this while (!feof(ptr)) thing, you should probably stop trusting it as a source for learning C.
Now, on to your specific question about the size and n arguments: having them separate is completely gratuitous and not useful. Just pass the desired length to read for one of them, and 1 for the other. If you want to be able to determine how many bytes were read when you hit end-of-file or an error, pass 1 for size and the requested number of bytes as n. Otherwise, if any read shorter than expected is an error, it sometimes makes sense to swap them; then the only possible return values are 1 and 0 (success and error, respectively). Both variants are sketched below.
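To make the two conventions concrete (a sketch; buf, len and f are hypothetical names):
// size = 1, n = len: short reads are measurable
size_t got = fread(buf, 1, len, f);
// got is the exact number of bytes read, even on EOF or error

// size = len, n = 1: all-or-nothing
size_t all = fread(buf, len, 1, f);
// all is 1 only if every one of the len bytes arrived; on a short
// read it is 0 and you cannot tell how many bytes were stored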
As for why it doesn't matter how you split the length between these two arguments: all the stdio read functions, including fread, are specified as if they happened via repeated calls to fgetc. It does not matter whether you have size*n such calls or n*size such calls, because multiplication of numbers commutes.
Is it possible to store data in integer form from 0 to 255 rather than as 8-bit characters? Although both are the same thing, how can we do it, for example, with the write() function?
Is it ok to directly cast any integer to char and vice versa? Does something like
{
int a[1] = {213};
write((char*)a,1);
}
and
{
int a[1];
read((char*)a,1);
cout << a[0];
}
work to get 213 from the same location in the file? It may work on this computer, but is it portable; in other words, is it suitable for cross-platform projects? If I create a file format for each game level (which will store objects' coordinates in the current level's file) using this principle, will it work on other computers/systems/platforms and load the same level?
The code you show would write the first (lowest-address) byte of a[0]'s object representation, which may or may not be the byte with the value 213. The particular object representation of an int is implementation-defined.
The portable way of writing one byte with the value of 213 would be
unsigned char c = a[0];
write(&c, 1);
You have the right idea, but it could use a bit of refinement.
{
int intToWrite = 213;
unsigned char byteToWrite = 0;
if ( intToWrite > 255 || intToWrite < 0 )
{
doError();
return;
}
// since your range is 0-255, you really want the low order byte of the int.
// Just reading the 1st byte may or may not work for your architecture. I
// prefer to let the compiler handle the conversion via casting.
byteToWrite = (unsigned char) intToWrite;
write( &byteToWrite, sizeof(byteToWrite) );
// you can hard code the size, but I try to be in the habit of using sizeof
// since it is better when dealing with multibyte types
}
{
int a = 0;
unsigned char toRead = 0;
// just like the write, the byte ordering of the int will depend on your
// architecture. You could write code to explicitly handle this, but it's
// easier to let the compiler figure it out via implicit conversions
read( &toRead, sizeof(toRead) );
a = toRead;
cout<<a;
}
If you need to minimize space or otherwise can't afford the extra char sitting around, then it's definitely possible to read/write a particular byte of your integer. However, it can mean pulling in extra headers (e.g. for htons/ntohs) or dealing with annoying platform #defines.
It will work, with some caveats:
Use reinterpret_cast<char*>(x) instead of (char*)x to be explicit that you’re performing a cast that’s ordinarily unsafe.
sizeof(int) varies between platforms, so you may wish to use a fixed-size integer type from <cstdint> such as int32_t.
Endianness can also differ between platforms, so you should be aware of the platform byte order and swap byte orders to a consistent format when writing the file. You can detect endianness at runtime and swap bytes manually, or use htonl and ntohl to convert between host and network (big-endian) byte order, as sketched below.
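For example, a sketch of writing one value in a fixed big-endian on-disk format (assumes a POSIX-style <arpa/inet.h> for htonl; on Windows the declaration lives in <winsock2.h>):
#include <arpa/inet.h> // htonl (assumes a POSIX system)
#include <cstdint>
#include <fstream>
void write_i32(std::ofstream& file, std::int32_t value)
{
    // convert host byte order to big-endian so every platform writes the same bytes
    std::uint32_t be = htonl(static_cast<std::uint32_t>(value));
    file.write(reinterpret_cast<const char*>(&be), sizeof be);
}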
Also, as a practical matter, I recommend you prefer text-based formats: they're less compact, but far easier to debug when things go wrong, since you can examine them in any text editor. If you determine that loading and parsing these files is too slow, then consider moving to a binary format.
Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, not char, for the variable s. Now, to use s safely as a character, I have to make it char again, because when scanf reads a character it only overwrites one byte of the variable it is assigning to, not all four bytes that an int has.
For the conversion I could use s = (char)s; on the next line, but is it possible to achieve the same by subtracting something from s?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written, this won't work reliably. The argument &s to scanf is a pointer to int, while scanf expects a pointer to char. The two data types (int and char) have different sizes (at least on most architectures), so the data may get put in the wrong spot in memory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
No, but you could use the following:
s = s & 0xFF;
That will blank out all of the data except the lowest byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer first in memory. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness