Read a dynamically allocated array in a struct from a binary file - C++

I have a struct. I would like to have an array in this struct, write the struct to a binary file, and then read it back. However, this array should be dynamically allocated. I'm not sure how I should approach this. My current guess is this:
I define and then write the struct to the file like this:
#include <fstream>
using namespace std;

struct map {
    int *tiles;
};

int main() {
    map sample;
    sample.tiles = new int[2];
    sample.tiles[0] = 1;
    sample.tiles[1] = 2;
    ofstream file("sample.data", ios::binary);
    file.write((char *)&sample, sizeof(sample));
    file.close();
    delete[] sample.tiles;
    return 0;
}
Then read it like this in another program:
map test;
ifstream file("sample.data", ios::binary);
file.read((char *)&test, sizeof(test));
When I want to check the results with
cout << test.tiles[0];
I get a weirdly huge number, clearly not the number I originally wrote to the file.
What is the right way to do this? How can I read an array without knowing its size?

file.write((char *)&sample, sizeof(sample));
This writes to the file the following structure:
struct map {
    int *tiles;
};
That is, a structure that contains a single pointer. A pointer is an address in memory, so what you end up writing to the file is one meaningless, raw memory address.
That is, obviously, of no use whatsoever when it gets read back later. You did not write any of your integers to the file, just their address in memory in a program that has long since terminated.
In order to do this correctly, you will need to:
Record how many integers the tiles pointer is pointing to.
write() not the structure itself, but the integers that the tiles pointer is pointing to.
You will also need to write to the file how many integers there are in this array. If the file contains only these integers, this is not strictly needed, since you can simply read the entire contents of the file. But if you expect to write additional data to the file, it becomes necessary to record the size of the written array in the file itself, so that it's possible to figure out how many integers there are when reading them back. The simplest way is to write a single int, the size of the array, first, followed by the contents of the array itself, and to do the reverse when reading everything back, as sketched below.
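A minimal sketch of that size-prefixed scheme, reusing the map struct and the sample object from the question (error checking omitted):

int count = 2;                      // number of ints behind sample.tiles
std::ofstream out("sample.data", std::ios::binary);
out.write(reinterpret_cast<const char *>(&count), sizeof(count));
out.write(reinterpret_cast<const char *>(sample.tiles), count * sizeof(int));
out.close();

// Reading it back: recover the count first, then allocate and read.
std::ifstream in("sample.data", std::ios::binary);
in.read(reinterpret_cast<char *>(&count), sizeof(count));
map test;
test.tiles = new int[count];
in.read(reinterpret_cast<char *>(test.tiles), count * sizeof(int));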

Your sample object contains a pointer tiles and it's the pointer value that you're writing to the file. What you want to write is what the pointer points at.
In your example, you'd want to do file.write(reinterpret_cast<const char *>(sample.tiles), 2 * sizeof(*sample.tiles)); (the cast is needed because write() takes a const char *). It's 2* because you did new int[2]. In general, you'd also want to save the size (2 in this case) in the file so you know how many ints to read back in. In this simple case you could infer the 2 from the size of the file.

Related

Dynamic allocation of file data in C++

To be frank, I have an assignment that says, quite vaguely,
"If the file exists, the one-argument constructor allocates memory for the number of records contained in the file and copies them into memory."
Now, considering this instruction, it would seem I am to allocate the dynamic memory /before/ copying the data over, and this seems, in principle, impossible.
To dynamically allocate memory, to my knowledge, you must know at runtime the size of the block to be reserved.
Given that the file size, or number of 'entries' is unknown, how can one possibly allocate that much memory? Does not the notion defeat the very purpose of dynamic allocation?
Solution-wise, it would seem the only option is to parse the entire file to determine its size, allocate the proper amount of memory afterward, and then read through the file again, copying the data into the allocated memory.
Given that this must be a common operation in any program that reads file data, I wonder: what is the proper, or most efficient, way of loading a file into RAM?
The notion of reading once to determine the size, and then again to copy, seems very inefficient. I assume there is a way to jump to the end of the file to determine its length, which would make the process faster. Or perhaps one could use a static buffer, loading it into RAM in blocks?
Is it possible to read all of the data and then move it into dynamic memory using the move operator? Or would it be more efficient to use a linked list of some kind?
The most efficient method is to have the operating system map the file to memory. Search your OS API for "mmap" or "memory mapping".
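For example, on a POSIX system (this uses the POSIX API, not standard C++; error handling omitted):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int fd = open("sample.data", O_RDONLY);
struct stat st;
fstat(fd, &st);                                  // st.st_size is the file size
void *p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
const char *bytes = static_cast<const char *>(p);
// ... read bytes[0] .. bytes[st.st_size - 1] directly; no explicit read() ...
munmap(p, st.st_size);
close(fd);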
Another approach is to seek to the end of the file and get the position (tellg()); this is the size of the file. Allocate an array in dynamic memory, or create a std::vector, reserving at least this amount of space.
Some operating systems have an API you can call to get the size of a file without having to seek to the end. You could use this method, then dynamically allocate the memory or use a std::vector<char>.
You will need to come up with a plan if the file doesn't fit into memory.
If you need to read the entire file into memory, you could use istream::read using the file length.
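A minimal sketch of the seek-then-read approach (this assumes the whole file fits in memory):

#include <fstream>
#include <vector>

std::ifstream in("sample.data", std::ios::binary);
in.seekg(0, std::ios::end);              // jump to the end...
std::streamsize size = in.tellg();       // ...the position there is the file size
in.seekg(0, std::ios::beg);              // back to the beginning
std::vector<char> buffer(size);          // allocate exactly that much
in.read(buffer.data(), size);            // one read for the whole file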
It all depends on the file format. One way to store records is to first write how many records are stored in the file. If you have two phone numbers, your file might look like this:
2
Jon
555-123
Mary
555-456
In this case the solution is straightforward:
// ...
is >> count;
record_type *record = new record_type[count];
for ( int i = 0; i < count; ++i )
    is >> record[i].name >> record[i].number; // stream checks omitted
// ...
If the file does not store the number of records (I wouldn't do this), you will have to count them first, and then use the above solution:
// ...
int count = 0;
std::string dummy;
while ( is >> dummy >> dummy )
    ++count;
is.clear();
is.seekg( 0 );
// ...
A second solution for the second case would be to write a dynamic container (I assume you are not allowed to use standard containers) and push the records as you read them:
// ...
list_type list;
record_type r;
while ( is >> r.name >> r.number )
    list.push_back( r );
// ...
The solutions are ordered by complexity. I did not compile the examples above.

Efficient approaches for parsing objects from consecutive fixed-size buffers that don't align with object size

I am trying to achieve something in C++ where I have an API that reads objects out of a byte array, while the array I pass in is constrained to a fixed size. After it parses out a complete object, the API knows the pointer location where it finished reading (the beginning of the next object, which is not complete in the current byte array).
Then I simply need to attach the remaining bytes to the next same fixed-size array, and start reading a new object at the pointer location as if it were the beginning of the new array.
I am new to C++ and I have the following approach working, but it looks rather cumbersome and inefficient. It requires three vectors and a lot of clearing, reserving and inserting. I wonder if there is an alternative that is more efficient, or at least as efficient but much more concise? I've been reading about stringstream and the like, but they don't seem to require less memory copying (probably more, as my API requires a byte array to be passed in). Thanks!
std::vector<char> checkBuffer;
std::vector<char> remainingBuffer;
std::vector<char> readBuffer(READ_BUFFER_SIZE);
// loop while I still have stuff to read from the input stream
while (in.good()) {
    in.read(readBuffer.data(), READ_BUFFER_SIZE);
    // This is the holding buffer for the API to parse objects from
    checkBuffer.clear();
    // concatenate what's remaining in remainingBuffer (initially empty)
    // with what's newly read from input inside readBuffer
    checkBuffer.reserve(remainingBuffer.size() + readBuffer.size());
    checkBuffer.insert(checkBuffer.end(), remainingBuffer.begin(),
                       remainingBuffer.end());
    checkBuffer.insert(checkBuffer.end(), readBuffer.begin(),
                       readBuffer.end());
    // Call API here; I also get back pointerPosition, which says where
    // I am inside the buffer after finishing reading the object
    Object parsedObject = parse(checkBuffer, &pointerPosition);
    // Then calculate the number of bytes not yet read in checkBuffer
    int remainingBufSize = checkBuffer.size() - pointerPosition;
    remainingBuffer.clear();
    remainingBuffer.reserve(remainingBufSize);
    // Then just copy whatever remains in checkBuffer into
    // remainingBuffer so it gets used in the next iteration
    remainingBuffer.insert(remainingBuffer.end(),
                           checkBuffer.data() + pointerPosition,
                           checkBuffer.data() + checkBuffer.size());
}
Write an append_chunk_into(in, vect) helper. It appends one chunk of data at the end of vect, resizing as needed. As an aside, a char-sized, standard-layout struct that does not zero its memory might be a better choice than char.
To append to the end:
size_t old_size = vect.size();
vect.resize(vect.size() + new_bytes);
in.read(vect.data() + old_size, new_bytes);
or whatever the read API is.
To parse, feed it vect.data(), and get back the pointer ptr to where parsing stopped.
Then vect.erase(vect.begin(), vect.begin() + (ptr - vect.data())) to remove the parsed bytes. (Only do this after you have parsed everything you can from the buffer, to avoid wasted memory moves.)
One vector. It will reuse its memory, and never grow larger than the read size plus the size of the largest object minus one. So you can pre-reserve it.
But really, usually most of the time spent will be I/O, so focus optimization on keeping the data flowing smoothly.
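A sketch of that single-vector loop; parse_one here is a hypothetical stand-in for the question's API, returning false when no complete object is left in the buffer and otherwise advancing ptr past the object it consumed:

std::vector<char> buf;
buf.reserve(READ_BUFFER_SIZE * 2);       // read size plus headroom for a partial object
while (in.good()) {
    // append one chunk at the end of buf, reading directly into the new space
    size_t old_size = buf.size();
    buf.resize(old_size + READ_BUFFER_SIZE);
    in.read(buf.data() + old_size, READ_BUFFER_SIZE);
    buf.resize(old_size + in.gcount());  // keep only the bytes actually read

    // consume as many complete objects as possible
    const char *ptr = buf.data();
    const char *end = buf.data() + buf.size();
    while (parse_one(ptr, end, &ptr)) {
        // ... handle the parsed object ...
    }

    // drop the parsed prefix; the unread tail moves to the front
    buf.erase(buf.begin(), buf.begin() + (ptr - buf.data()));
}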
If I were in your position, I would keep only the readBuffer, reserving READ_BUFFER_SIZE + sizeof(LargestMessage).
After parsing, you would be given back a pointer to the last thing the API was able to read in the vector. Using the end of the vector's data (readBuffer.data() + readBuffer.size()) as a bound, you can copy the remaining, unparsed data to the head of the vector. Once that data is at the head of the vector, you can read the rest in using the same read call, except you add in the number of bytes remaining. There does need to be some way of determining how many characters were in the remaining array, but that shouldn't be insurmountable.

How to insert a char array into a float vector

Hmm, hello! I am trying to read a binary file that contains a number of float values at a specific position. As seemingly must be done with binary files, they were saved as arrays of bytes, and I have been searching for a way to convert them back to floats with no success. Basically I have a char* memory block, and am attempting to extract the floats stored at a particular location and seamlessly insert them into a vector. I wonder, would that be possible, or would I be forced to rely on arrays instead if I wished to avoid copying the data? And how could it possibly be done? Thank you ^_^
If you know where the floats are you can read them back:
float a = *(float*)(buffer + position);
Note that it's the address buffer + position that must be cast; buffer[position] alone would give a single char. Then you can do whatever you need with a, including push_back'ing it into a vector. (memcpy into the float is the safer route if alignment or aliasing is a concern, as sketched below.)
Make sure you read the file in binary mode, and if you know the positions of the floats in the file, it should work.
I'd need to see the code that generated the file to suggest anything more efficient.
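A small sketch of that memcpy route, assuming count floats are stored contiguously at byte offset position in the block (names are illustrative):

#include <cstring>
#include <vector>

std::vector<float> values(count);
// memcpy sidesteps the alignment and strict-aliasing problems
// that dereferencing a casted pointer can run into.
std::memcpy(values.data(), buffer + position, count * sizeof(float));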

Reading data from the hard drive and putting it in a C++ container

I wrote a simple thread whose only task is to read data from the hard drive, put it in a container, and tag it with a timestamp and a unique ID. After that I will write the newly structured data into a memory-mapped file.
The thing is, I don't care about the internal structure of the data. I mean, it could be in WAV format (because in the real situation I will be dealing with audio data, on average 3 MB each), but I won't do any operations on that data. After inserting it into my struct I will just be dealing with the UniqueID and the data tag. A sample structure would be something like:
struct SampleData
{
    long UniqueID;
    ... MyData; // the data which I am trying to read from the hard drive
    Time insertionTime;
};
So the question is, how am I going to read WAV data into this struct without knowing (because I don't need to know) its internal structure? What would the ... part be, for instance? Is there any container type for a large data chunk?
For reading the data, should I use ifstream or some other method?
Try to keep it in a TLV format:
http://en.wikipedia.org/wiki/Type-length-value
EDIT:
A very simple container for TLV.
You'll be able to store the raw data as it is, and you'll know which field you're reading and what its size is.
class TlvContainer
{
public:
    unsigned long Type;   // Maybe we have billions of types of objects?
    unsigned long Size;   // The size of the object.
    unsigned char* Bytes; // This will hold the raw data.
};
When you write your data to the file, you'll have to know how many bytes it is, allocate the "Bytes" array and update the "Size" field.
When you read it back from the file, you'll know how you wrote it. (You'll have to read the fields in the same order you wrote them.)
For example, if you wrote it as: Type, Size, Bytes:
You'll first read sizeof(unsigned long) bytes from the file to learn the type of the element.
Then you'll read another sizeof(unsigned long) bytes to learn how big the real data is.
Then you'll be able to read "Size" bytes from the file, knowing that after them there is a new element built the same way.
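A minimal sketch of writing and reading one element in that Type, Size, Bytes order (tlv is a TlvContainer; out and in are binary file streams; error checking and fixed-width concerns omitted):

// Writing: type, then size, then the raw bytes.
out.write(reinterpret_cast<const char *>(&tlv.Type), sizeof(tlv.Type));
out.write(reinterpret_cast<const char *>(&tlv.Size), sizeof(tlv.Size));
out.write(reinterpret_cast<const char *>(tlv.Bytes), tlv.Size);

// Reading: same order; allocate Bytes once Size is known.
in.read(reinterpret_cast<char *>(&tlv.Type), sizeof(tlv.Type));
in.read(reinterpret_cast<char *>(&tlv.Size), sizeof(tlv.Size));
tlv.Bytes = new unsigned char[tlv.Size];
in.read(reinterpret_cast<char *>(tlv.Bytes), tlv.Size);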
How about storing MyData as a vector<unsigned char>?
You can use file streams to read but remember to use ios::binary mode. See http://www.cplusplus.com/doc/tutorial/files/
Here is sample code. You may want to add error checking. Also, I didn't try to compile this so there may be mistakes.
std::vector<unsigned char> data;
std::ifstream file("sample.wav", std::ios::binary);
char byte;
// get() is used rather than >>, which would skip whitespace bytes even
// in binary mode; the read itself is the loop condition, avoiding the
// classic while(!eof()) pitfall.
while (file.get(byte)) {
    data.push_back(static_cast<unsigned char>(byte));
}

How to read an array in a binary file in C++

Currently I read arrays in C++ with ifstream, read and reinterpret_cast, looping over the values. Is it possible to load, for example, an unsigned int array from a binary file in one go, without a loop?
Thank you very much
Yes, simply pass the address of the first element of the array, and the size of the array in bytes:
// Allocate, for example, 47 ints
std::vector<int> numbers(47);
// Read in as many ints as 'numbers' has room for; the cast is needed
// because read() takes a char*.
inFile.read(reinterpret_cast<char *>(&numbers[0]), numbers.size() * sizeof(numbers[0]));
Note: I almost never use raw arrays. If I need a sequence that looks like an array, I use std::vector. If you must use an array, the syntax is very similar.
The ability to read and write binary images is non-portable. You may not be able to re-read the data on another machine, or even on the same machine with a different compiler. But you have that problem already with the solution you are using now.
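If portability matters, a common mitigation is to fix both the integer width and the byte order in the file format, and assemble each value from individual bytes. A sketch, assuming the file stores 32-bit little-endian unsigned integers:

#include <cstdint>
#include <fstream>
#include <vector>

std::vector<uint32_t> values(47);
for (uint32_t &v : values) {
    unsigned char b[4];
    inFile.read(reinterpret_cast<char *>(b), 4);
    // Assemble little-endian bytes regardless of the host's byte order.
    v = uint32_t(b[0]) | (uint32_t(b[1]) << 8) | (uint32_t(b[2]) << 16) | (uint32_t(b[3]) << 24);
}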