In my MFC application, I am using the CFile class to write data to a file. I store a sequence of objects of class CParagraph by calling Write() for each data member in order, and then use Read() to load the file back into memory. One of CParagraph's members used to be of type int, but now I have to change it to size_t, as int cannot hold large enough values. If my application reads a file created before this change and then saves a CParagraph object back into the file, the size of size_t will be passed to Write() instead of the size of int, so the record will grow. My question is this: can the data written after the object being modified and saved be overwritten, and thus corrupted, because the object became larger?
Thanks.
Yes. If anything in the file changes size, everything after it must be re-saved.
It's common to save a "version" char as the first part of a file. Then, when you need to resize a variable (or change many things at once), you can change the version when saving. While loading, you check the version and use the corresponding code to load it, so you can still open files from older versions. Note that this version should only change when the file format changes, not every time you rebuild/release.
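As a minimal sketch of that versioning scheme using CFile: the member name m_count, the version constant, and the function names below are assumptions for illustration, not taken from the question.

const char kFileVersion = 2; // bump only when the on-disk format changes

void SaveParagraph(CFile& file, const CParagraph& p)
{
    file.Write(&kFileVersion, sizeof(kFileVersion)); // version goes first
    UINT64 count = p.m_count;  // fixed-width type: same layout on every platform
    file.Write(&count, sizeof(count));
    // ... remaining members ...
}

void LoadParagraph(CFile& file, CParagraph& p)
{
    char version = 0;
    file.Read(&version, sizeof(version));
    if (version >= 2) {                  // new format stores a 64-bit count
        UINT64 count = 0;
        file.Read(&count, sizeof(count));
        p.m_count = static_cast<size_t>(count);
    } else {                             // version 1 files stored an int
        int count = 0;
        file.Read(&count, sizeof(count));
        p.m_count = static_cast<size_t>(count);
    }
    // ... remaining members ...
}

Writing a fixed-width type instead of size_t itself also sidesteps the original problem: size_t is 4 bytes on some platforms and 8 on others, so it should never be written to disk directly.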
I am calling the xlCreateBook() function. Before that, my program is already holding a lot of memory because I am reading a huge file. After calling xlCreateBook(), the Sheet variable ends up being a null pointer.
But once I load a smaller file, xlCreateBook() works correctly. Help me get out of this.
Which method returns a null pointer? AddSheet()? It's strange that it would be related to the size of the data to write, since at sheet creation you have not added any data yet. What does Book::errorMessage() say when you get a null sheet pointer?
Maybe what you are trying to save is too big for the old xls format, and you should create an xlsx file, by using xlCreateXMLBook() instead?
If you really are lacking memory because of something else, there is not much that software can do when you run out of physical memory, except detecting the failure and returning a 'clean' error.
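A minimal sketch of that kind of 'clean' error handling, assuming the LibXL C++ API (xlCreateBook(), Book::addSheet(), Book::errorMessage()) behaves as documented:

#include "libxl.h"
#include <cstdio>

int main()
{
    libxl::Book* book = xlCreateBook();   // or xlCreateXMLBook() for .xlsx
    if (!book) {
        std::fprintf(stderr, "xlCreateBook() failed, likely out of memory\n");
        return 1;
    }
    libxl::Sheet* sheet = book->addSheet("Sheet1");
    if (!sheet) {
        // errorMessage() reports why the last operation failed
        std::fprintf(stderr, "addSheet() failed: %s\n", book->errorMessage());
        book->release();
        return 1;
    }
    // ... write data, then book->save("out.xls") ...
    book->release();
    return 0;
}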
I have a vector of pointers to a base class so I can manage objects derived from that class.
vector<Product*> products;
I am trying to write these objects to a file while iterating through the vector,
but I am not sure if this works correctly.
void Inventory::saveProductsToFile()
{
    ofstream outfile;
    outfile.open("inventory.dat", ios::binary);
    vector<Product*>::iterator it; // products is a vector, not a list
    for (it = products.begin(); it != products.end(); ++it)
        outfile.write((char*)*it, sizeof(Product));
}
The file is created, but I have no idea if I'm saving the actual objects themselves or their addresses. Is this correct, or is there another way?
This is what the file looks like:
ˆFG " H*c \Âõ(œ##pFG h*c b'v b#
Your code cannot work. You cannot serialize polymorphic objects in
that way. For starters, you're writing the hidden vptr out
to disk; when you reread the data, it will not be valid. And
you're only writing out the data in the base class (Product),
because that's what sizeof(Product) evaluates to. And
finally, just writing a byte image of anything but a char[]
will probably mean that you won't be able to reread the data
some time in the future (after a compiler upgrade, or a machine
upgrade, or whatever).
What you have to do is to define a format (binary or text) for
the file, and write that. For the basic types, you can start
with something existing, like XDR or Protocol buffers, but
neither of these work that well with polymorphic types. For
polymorphic types, you have to start by defining how you
identify the type in question when rereading. This can be
tricky: there's nothing in std::type_info which helps, so you
need some means of establishing a relationship between your
(derived) types and the identifier. Then every derived class
must implement a write function, which first writes its type,
then writes its data out, one element by one. When reading, you
read the type, look up the appropriate read function for that
type in a map, and call that function, which then reads the data
one by one.
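To make that concrete, here is a minimal sketch of such a scheme. The Product hierarchy, the tag values, and the member names are all assumptions for illustration:

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct Product {
    std::string name;
    virtual ~Product() {}
    virtual std::uint8_t typeTag() const = 0;   // identifies the derived type
    virtual void writeData(std::ostream& out) const {
        std::uint32_t len = name.size();        // write length, then bytes
        out.write(reinterpret_cast<const char*>(&len), sizeof(len));
        out.write(name.data(), len);
    }
    virtual void readData(std::istream& in) {
        std::uint32_t len = 0;
        in.read(reinterpret_cast<char*>(&len), sizeof(len));
        name.resize(len);
        if (len) in.read(&name[0], len);
    }
};

struct Widget : Product {                       // one example derived type
    std::uint32_t price;
    Widget() : price(0) {}
    std::uint8_t typeTag() const { return 1; }
    void writeData(std::ostream& out) const {
        Product::writeData(out);
        out.write(reinterpret_cast<const char*>(&price), sizeof(price));
    }
    void readData(std::istream& in) {
        Product::readData(in);
        in.read(reinterpret_cast<char*>(&price), sizeof(price));
    }
};

void writeProduct(std::ostream& out, const Product& p) {
    std::uint8_t tag = p.typeTag();             // tag first, then the data
    out.write(reinterpret_cast<const char*>(&tag), sizeof(tag));
    p.writeData(out);
}

Product* makeWidget() { return new Widget; }

Product* readProduct(std::istream& in) {
    typedef Product* (*Factory)();
    static std::map<std::uint8_t, Factory> factories;
    if (factories.empty())
        factories[1] = makeWidget;              // register each derived type
    std::uint8_t tag = 0;
    if (!in.read(reinterpret_cast<char*>(&tag), sizeof(tag))) return 0;
    std::map<std::uint8_t, Factory>::iterator it = factories.find(tag);
    if (it == factories.end()) return 0;        // unknown type tag
    Product* p = it->second();                  // create, then let it read
    p->readData(in);
    return p;
}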
Finally, I might point out that all successful serialization
schemes I've seen depend on generated code. You describe your
types in a separate file, or in special markup (in a specially
marked comment in the C++), and have a program which reads that,
and generates the necessary code (and often the actual classes
you use).
That's not how you "serialize" data. Written like that, the pointers are only valid during runtime, or until you delete the objects (whichever happens first), so you wouldn't be able to restore your data: after the program has stopped, everything from its former memory becomes invalid. You have to store the actual values from your class.
I am working on a C++ application that uses a library written in C (StormLib). The library has a function that reads a file into a void* buffer (I am guessing a char[]), which I would like to send to a different library for processing. Hopefully this can be done with something like boost::iostreams::stream_buffer or boost::asio::streambuf, storing the file so it can be read by whatever method needs it.
I have tried simply passing an istream (backed by an open boost::asio::streambuf) to the function, and it gives me a BAD ACCESS as it tries to execute
memcpy((theFile), (myiStream), (full size of the file))
I would basically like a sort of "bag of bits" object that can easily be passed to different methods for structured conversion of the data, but I do not know how I should implement it.
What do you want to do with the data once you have it back in C++? If you just want the raw data, then you could (for example) just create a std::vector<char>, resize so it's big enough to hold all the data*, and then pass a pointer to its first element.
* How you determine "big enough" is a different question...
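A minimal sketch of that approach, assuming the StormLib calls SFileGetFileSize() and SFileReadFile() work the way I understand them to (verify against your headers):

#include <vector>
#include <StormLib.h>

std::vector<char> readWholeFile(HANDLE hFile)
{
    DWORD size = SFileGetFileSize(hFile, NULL); // "big enough" for this file
    std::vector<char> buffer(size);
    DWORD bytesRead = 0;
    if (size > 0)
        SFileReadFile(hFile, &buffer[0], size, &bytesRead, NULL);
    buffer.resize(bytesRead);                   // trim if the read came up short
    return buffer;
}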
Consider a class Book with an STL container of class Page. Each Page holds a screenshot, like page10.jpg, in raw vector<char> form.
A Book is opened with a path to a zip, rar, or directory containing these screenshots, and uses the respective method of extracting the raw data, like ifstream inFile.read(buffer, size); or unzReadCurrentFile(zipFile, buffer, size). It then calls the Page(const char* stream, int filesize) constructor.
Right now, it's clear that the raw data is being copied twice: once when extracting into Book's local buffer, and a second time in the Page ctor into the Page::vector<char>. Is there a way to maintain encapsulation while getting rid of the middleman buffer?
In terms of code changes based on what you have already, the simplest is probably to give Page a setter taking a non-const vector reference or pointer, and swap it with the vector contained in the Page. The caller will be left holding an empty vector, but since the problem is excessive copying, presumably the caller doesn't want to keep the data:
void Book::addPage(ifstream &file, streampos size) { // streams must be passed by reference
    std::vector<char> vec(size);
    file.read(&vec[0], size);
    pages.push_back(Page()); // pages is a data member
    pages.back().setContent(vec);
}

class Page {
    std::vector<char> content;
public:
    Page() : content(0) {} // create an empty page
    void setContent(std::vector<char> &newcontent) {
        content.swap(newcontent);
    }
};
Some people (for example the Google C++ style guide) want reference parameters to be const, and would want you to pass the newcontent parameter as a pointer, to emphasise that it is non-const:
void setContent(std::vector<char> *newcontent) {
    content.swap(*newcontent);
}
swap is fast - you'd expect it just to exchange the buffer pointers and sizes of the two vector objects.
Alternatively, give Page two different constructors: one for the zip file and one for the regular file, and have it be responsible for reading its own data. This is probably the cleanest, and it allows Page to be immutable, rather than being modified after construction. But actually you might not want that, since as you've noticed in a comment, adding the Page to a container copies the Page. So there's some benefit to being able to modify the Page to add the data after it has been cheaply constructed in the container: it avoids that extra copy without you needing to mess with containers of pointers. Still, the setContent function could just as easily take the file stream / zip file info as take a vector.
You could find or write a stream class which reads from a zipfile, so that Page can be responsible for reading data with just one constructor taking a stream. Or perhaps not a whole stream class, perhaps just an interface you design which reads data from a stream/zip/rar into a specified buffer, and Page can specify its internal vector as the buffer.
Finally, you could "mess with containers of pointers". Make pages a std::vector<boost::shared_ptr<Page> >, then do:
void Book::addPage(ifstream &file, streampos size) {
    boost::shared_ptr<Page> page(new Page(file, size));
    pages.push_back(page); // pages is a data member
}
A shared_ptr has a modest overhead relative to just a Page (it makes an extra memory allocation for a small node containing a pointer and a refcount), but is much cheaper to copy. It's in TR1 too, if you have some implementation of that other than Boost.
Use std::vector's resize member to set the buffer size initially, and then use its buffer directly via front()'s address.
std::vector<char> v;
v.resize(size);
strcpy(&v.front(), "testing");
Direct buffer access of std::vector is given by: &v.front()
Using std::vector to hold image data is a bad idea. I would use a raw pointer or a shared_ptr for this purpose, which prevents the buffer from being copied twice.
Since you care about memory, holding all the image data in memory also seems like a bad idea to me. It is better to encapsulate it in a separate class, for example ImageData, which holds the raw pointer to the image data. The class can be initialized with just a file path at first, and the image data loaded from disk only when it is required.
I would have the Page class read its own data directly from the source, and the Book would only read as much of the source that it needed in order to locate each individual page (and to read any data belonging to the Book in general, such as a title).
For example, in the case of the data being stored in a directory, the Book would retrieve the list of files in the directory. For each file, it would pass the filename to a Page constructor which would open the file and load its contents.
As for the case where the book was stored in a zip file, I'm making some guesses as to how the library you're using works. I think you're using Minizip, which I'm not familiar with, but at a glance it looks like opening a file via Minizip gives you a handle. You pass that handle to unzGoToFirstFile() and unzGoToNextFile() to set the active subfile within the zip file (in your case, the active page), and use unzReadCurrentFile() to load the active subfile into a buffer.
If that's the case, then your Book class would open the file using Minizip and set it to the first subfile. It would then pass the zip file handle to a constructor in Page, which would do the work of reading the subfile from the zip file. The Book would then call unzGoToNextFile() to move to the next subfile and create another page by again passing the handle to Page, continuing until no subfiles remain. It would look something like:
Page::Page(zipFile file)
{
    // TODO: Determine the required size of the buffer that will store the data
    unsigned buffer_size;
    data_.resize(buffer_size);
    unzReadCurrentFile(file, &data_[0], buffer_size);
}
void Book::open(const std::string &filename)
{
    zipFile file = unzOpen(filename.c_str());
    int result = unzGoToFirstFile(file);
    while (result == UNZ_OK)
    {
        pages_.push_back(Page(file));
        result = unzGoToNextFile(file); // update result, or the loop never ends
    }
}
This is very simplified (and I might be using Minizip totally wrong, so beware), and it also assumes that Book stores a vector of Page objects named pages_, and that Page names its buffer data_.
You can introduce a third component that holds all the images. The Book would fill it in; the Pages would read from it. If you want to restrict access, you can close it down and make Book and Page its friends. If you have repeated images (say, each page has a footer and a header, or some pages have a logo), you can make that third component a flyweight, making it even more efficient than what you were striving for.
Make sure that you don't open all the pages when you open the book. That can be expensive. Have each page hold an identifier for its images (perhaps file-paths), and load the images only when you really want to view the page.
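A minimal sketch of that lazy-loading idea, assuming the pages live as individual image files on disk; the names (path, content()) are illustrative:

#include <fstream>
#include <string>
#include <vector>

class Page {
    std::string path;       // identifier for the image; data loaded on demand
    std::vector<char> data; // stays empty until the page is actually viewed
public:
    explicit Page(const std::string &p) : path(p) {}

    const std::vector<char> &content() {
        if (data.empty()) { // first access: load the image from disk
            std::ifstream in(path.c_str(), std::ios::binary);
            if (!in) return data;               // leave empty on failure
            in.seekg(0, std::ios::end);
            data.resize(static_cast<std::size_t>(in.tellg()));
            in.seekg(0, std::ios::beg);
            if (!data.empty())
                in.read(&data[0], data.size());
        }
        return data;
    }
};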
I have a list of objects that I would like to store in as small a file as possible for later retrieval. I have been carefully reading this tutorial and am beginning (I think) to understand, but I have several questions. Here is the snippet I am working with:
static bool writeHistory(string fileName)
{
    fstream historyFile;
    // ios::binary alone is not a valid open mode for an fstream;
    // it must be combined with an access mode such as ios::out
    historyFile.open(fileName.c_str(), ios::out | ios::binary);
    if (historyFile.good())
    {
        list<Referral>::iterator i;
        for (i = AllReferrals.begin(); i != AllReferrals.end(); ++i)
        {
            historyFile.write((char*)&(*i), sizeof(Referral));
        }
        return true;
    }
    else return false;
}
Now, this is adapted from the snippet
file.write((char*)&object,sizeof(className));
taken from the tutorial. Now, what I believe it is doing is converting the object to a pointer, taking the value and size, and writing those to the file. But if it is doing this, why bother doing the conversions at all? Why not take the value from the beginning? And why does it need the size? Furthermore, from my understanding, why does
historyFile.write((char*)i,sizeof(Referral));
not compile? i is an iterator (and isn't an iterator a pointer?). Or simply
historyFile.write(i,sizeof(Referral));
Why do I need to be messing around with addresses anyway? Aren't I storing the data in the file? If the addresses/values persist on their own, why can't I just store the addresses, delimited, in plain text and then take their values later?
And should I still be using the .txt extension? Edit: what should I use instead, then? I tried .dtb and was not able to create the file. I actually can't even seem to get the file to open without errors with the ios::binary flag. I'm also having trouble passing the filename (as a string class string, converted back by c_str(); it compiles but gives an error).
Sorry for so many little questions, but it all basically sums up to: how do I efficiently store objects in a file?
What you are trying to do is called serialization. Boost has a very good library for doing this.
What you are trying to do can work in some cases, with some very important conditions: it will only work for POD types, and it is only guaranteed to work for code compiled with the same version of the compiler and with the same compiler options.
(char*)&(*i)
says to take the iterator i, dereference it to get your object, take the address of it and treat it as an array of characters. This is the start of what is being written to the file. sizeof(Referral) is the number of bytes that will be written out.
And no, an iterator is not necessarily a pointer, although pointers meet all the requirements for an iterator.
Question #1: why does ... not compile?
Answer: Because i is not a Referral* -- it's a list<Referral>::iterator; an iterator is an abstraction over a pointer, but it's not a pointer.
Question #2: should I still be using the .txt extension?
Answer: Probably not. .txt is associated by many systems with the MIME type text/plain.
Unasked Question: does this work?
Answer: if a Referral has any pointers in it, NO. When you try to read the Referrals from the file, the pointers will point to the location in memory where something used to live, but there is no guarantee that anything valid is there anymore, least of all the thing the pointers were pointing to originally. Be careful.
isn't an iterator a pointer?
An iterator is something that acts like a pointer from the outside. In most (perhaps all) cases, it is actually some form of object instead of a bare pointer. An iterator might contain a pointer as an internal member variable that it uses to perform its job, but it just as well might contain something else or additional variables if necessary.
Furthermore, even if an iterator has a simple pointer inside of it, it might not point directly at the object you're interested in. It might point to some kind of bookkeeping component used by the container class which it can then use to get the actual object of interest. Fortunately, we don't need to care what those internal details actually are.
So with that in mind, here's what's going on in (char*)&(*i).
*i returns a reference to the object stored in the list.
& takes the address of that object, thus yielding a pointer to the object.
(char*) casts that object pointer into a char pointer.
That snippet of code would be the short form of doing something like this:
Referral& r = *i;
Referral* pr = &r;
char* pc = (char*)pr;
Why do I need to be messing around with addresses anyway?
And why does it need the size?
fstream::write is designed to write a series of bytes to a file. It doesn't know anything about what those bytes mean. You give it an address so that it can write the bytes that exist starting wherever that address points to. You give it a size so that it knows how many bytes to write.
So if I do:
MyClass ExampleObject;
file.write((char*)&ExampleObject, sizeof(ExampleObject));
Then it writes all the bytes that exist directly within ExampleObject to the file.
Note: As others have mentioned, if the object you want to write has members that dynamically allocate memory or otherwise make use of pointers, then the pointed to memory will not be written by a single simple fstream::write call.
will serialization give a significant boost in storage efficiency?
In theory, binary data can often be both smaller than plain-text and faster to read and write. In practice, unless you're dealing with very large amounts of data, you'll probably never notice the difference. Hard drives are large and processors are fast these days.
And efficiency isn't the only thing to consider:
Binary data is harder to examine, debug, and modify if necessary. At least without additional tools, but even then plain-text is still usually easier.
If your data files are going to persist between different versions of your program, then what happens if you need to change the layout of your objects? It can be irritating to write code so that a version 2 program can read objects in a version 1 file. Furthermore, unless you take action ahead of time (like by writing a version number in to the file) then a version 1 program reading a version 2 file is likely to have serious problems.
Will you ever need to validate the data, for instance against corruption or against malicious changes? In a binary scheme like this, you'd need to write extra code, whereas with plain-text the conversion routines can often help fill the role of validation.
Of course, a good serialization library can help out with some of these issues. And so could a good plain-text format library (for instance, a library for XML). If you're still learning, then I'd suggest trying out both ways to get a feel for how they work and what might do best for your purposes.
What you are trying to do (reading and writing raw memory to/from file) will invoke undefined behaviour, will break for anything that isn't a plain-old-data type, and the files that are generated will be platform dependent, compiler dependent and probably even dependent on compiler settings.
C++ doesn't have any built-in way of serializing complex data. However, there are libraries that you might find useful. For example:
http://www.boost.org/doc/libs/1_40_0/libs/serialization/doc/index.html
Have you already had a look at boost::serialization? It is robust, has good documentation, supports versioning, and if you want to switch to an XML format instead of a binary one, it'll be easier.
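A minimal boost::serialization sketch, assuming a simple Referral class; the members shown (url, count) are made up for illustration:

#include <fstream>
#include <string>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/string.hpp>

class Referral {
    friend class boost::serialization::access;
    std::string url;
    int count;
    template <class Archive>
    void serialize(Archive &ar, const unsigned int version) {
        ar & url;   // members are serialized element by element
        ar & count;
    }
public:
    Referral() : count(0) {}
    Referral(const std::string &u, int c) : url(u), count(c) {}
};

int main() {
    {
        std::ofstream out("history.dat", std::ios::binary);
        boost::archive::binary_oarchive ar(out);
        Referral r("example.com", 42);
        ar << r;    // write
    }
    {
        std::ifstream in("history.dat", std::ios::binary);
        boost::archive::binary_iarchive ar(in);
        Referral r;
        ar >> r;    // read back
    }
    return 0;
}

The same serialize() member handles both reading and writing, and the version parameter gives you the format-versioning discussed above for free.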
Fstream.write simply writes raw data to a file. The first parameter is a pointer to the starting address of the data. The second parameter is the length (in bytes) of the object, so write knows how many bytes to write.
file.write((char*)&object, sizeof(className));
This line converts the address of object to a char pointer.
historyFile.write((char*)i, sizeof(Referral));
This line tries to convert an object (i) into a char pointer, which is not a valid conversion.
historyFile.write(i, sizeof(Referral));
This line passes write an object, when it expects a char pointer.