Optimal datafile format loading on a game console

Optimal datafile format loading on a game console - c++

I need to load large models and other structured binary data on an older CD-based game console as efficiently as possible. What's the best way to do it? The data will be exported from a Python application. This is a pretty elaborate hobby project.
Requierements:
no reliance on fully standard compliant STL - i might use uSTL though.
as little overhead as possible. Aim for a solution so good. that it could be used on the original Playstation, and yet as modern and elegant as possible.
no backward/forward compatibility necessary.
no copying of large chunks around - preferably files get loaded into RAM in background, and all large chunks accessed directly from there later.
should not rely on the target having the same endianness and alignment, i.e. a C plugin in Python which dumps its structs to disc would not be a very good idea.
should allow to move the loaded data around, as with individual files 1/3 the RAM size, fragmentation might be an issue. No MMU to abuse.
robustness is a great bonus, as my attention span is very short, i.e. i'd change saving part of the code and forget the loading one or vice versa, so at least a dumb safeguard would be nice.
exchangeability between loaded data and runtime-generated data without runtime overhead and without severe memory management issues would be a nice bonus.
I kind of have a semi-plan of parsing in Python trivial, limited-syntax C headers which would use structs with offsets instead of pointers, and convenience wrapper structs/classes in the main app with getters which would convert offsets to properly typed pointers/references, but i'd like to hear your suggestions.
Clarification: the request is primarily about data loading framework and memory management issues.

On platforms like the Nintendo GameCube and DS, 3D models are usually stored in a very simple custom format:
A brief header, containing a magic number identifying the file, the number of vertices, normals, etc., and optionally a checksum of the data following the header (Adler-32, CRC-16, etc).
A possibly compressed list of 32-bit floating-point 3-tuples for each vector and normal.
A possibly compressed list of edges or faces.
All of the data is in the native endian format of the target platform.
The compression format is often trivial (Huffman), simple (Arithmetic), or standard (gzip). All of these require very little memory or computational power.
You could take formats like that as a cue: it's quite a compact representation.
My suggestion is to use a format most similar to your in-memory data structures, to minimize post-processing and copying. If that means you create the format yourself, so be it. You have extreme needs, so extreme measures are needed.

This is a common game development pattern.
The usual approach is to cook the data in an offline pre-process step. The resulting blobs can be streamed in with minimal overhead. The blobs are platform dependent and should contain the proper alignment & endian-ness of the target platform.
At runtime, you can simply cast a pointer to the in-memory blob file. You can deal with nested structures as well. If you keep a table of contents with offsets to all the pointer values within the blob, you can then fix-up the pointers to point to the proper address. This is similar to how dll loading works.
I've been working on a ruby library, bbq, that I use to cook data for my iphone game.
Here's the memory layout I use for the blob header:
// Memory layout
//
// p begining of file in memory.
// p + 0 : num_pointers
// p + 4 : offset 0
// p + 8 : offset 1
// ...
// p + ((num_pointers - 1) * 4) : offset n-1
// p + (num_pointers * 4) : num_pointers // again so we can figure out
// what memory to free.
// p + ((num_pointers + 1) * 4) : start of cooked data
//
Here's how I load binary blob file and fix up pointers:
void* bbq_load(const char* filename)
{
unsigned char* p;
int size = LoadFileToMemory(filename, &p);
if(size <= 0)
return 0;
// get the start of the pointer table
unsigned int* ptr_table = (unsigned int*)p;
unsigned int num_ptrs = *ptr_table;
ptr_table++;
// get the start of the actual data
// the 2 is to skip past both num_pointer values
unsigned char* base = p + ((num_ptrs + 2) * sizeof(unsigned int));
// fix up the pointers
while ((ptr_table + 1) < (unsigned int*)base)
{
unsigned int* ptr = (unsigned int*)(base + *ptr_table);
*ptr = (unsigned int)((unsigned char*)ptr + *ptr);
ptr_table++;
}
return base;
}
My bbq library isn't quite ready for prime time, but it could give you some ideas on how to write one yourself in python.
Good Luck!

I note that nowhere in your description do you ask for "ease of programming". :-)
Thus, here's what comes to mind for me as a way of creating this:
The data should be in the same on-disk format as it would be in the target's memory, such that it can simply pull blobs from disk into memory with no reformatting it. Depending on how much freedom you want in putting things into memory, the "blobs" could be the whole file, or could be smaller bits within it; I don't understand your data well enough to suggest how to subdivide it but presumably you can. Because we can't rely on the same endianness and alignment on the host, you'll need to be somewhat clever about translating things when writing the files on the host-side, but at least this way you only need the cleverness on one side of the transfer rather than on both.
In order to provide a bit of assurance that the target-side and host-side code matches, you should write this in a form where you provide a single data description and have some generation code that will generate both the target-side C code and the host-side Python code from it. You could even have your generator generate a small random "version" number in the process, and have the host-side code write this into the file header and the target-side check it, and give you an error if they don't match. (The point of using a random value is that the only information bit you care about is whether they match, and you don't want to have to increment it manually.)

Consider storing your data as BLOBs in a SQLite DB. SQLite is extremely portable and lighweight, ANSI C, has both C++ and Python interfaces. This will take care of large files, no fragmentation, variable-length records with fast access, and so on. The rest is just serialization of structs to these BLOBs.

Related

c++ Alternative implementation to avoid shifting between RAM and SWAP memory

I have a program, that uses dynamic programming to calculate some information. The problem is, that theoretically the used memory grows exponentially. Some filters that I use limit this space, but for a big input they also can't avoid that my program runs out of RAM - Memory.
The program is running on 4 threads. When I run it with a really big input I noticed, that at some point the program starts to use the swap memory, because my RAM is not big enough. The consequence of this is, that my CPU-usage decreases from about 380% to 15% or lower.
There is only one variable that uses the memory which is the following datastructure:
Edit (added type) with CLN library:
class My_Map {
typedef std::pair<double,short> key;
typedef cln::cl_I value;
public:
tbb::concurrent_hash_map<key,value>* map;
My_Map() { map = new tbb::concurrent_hash_map<myType>(); }
~My_Map() { delete map; }
//some functions for operations on the map
};
In my main program I am using this datastructure as globale variable:
My_Map* container = new My_Map();
Question:
Is there a way to avoid the shifting of memory between SWAP and RAM? I thought pushing all the memory to the Heap would help, but it seems not to. So I don't know if it is possible to maybe fully use the swap memory or something else. Just this shifting of memory cost much time. The CPU usage decreases dramatically.

If you have 1 Gig of RAM and you have a program that uses up 2 Gb RAM, then you're going to have to find somewhere else to store the excess data.. obviously. The default OS way is to swap but the alternative is to manage your own 'swapping' by using a memory-mapped file.
You open a file and allocate a virtual memory block in it, then you bring pages of the file into RAM to work on. The OS manages this for you for the most part, but you should think about your memory usage so not to try to keep access to the same blocks while they're in memory if you can.
On Windows you use CreateFileMapping(), on Linux you use mmap(), on Mac you use mmap().

The OS is working properly - it doesn't distinguish between stack and heap when swapping - it pages you whatever you seem not to be using and loads whatever you ask for.
There are a few things you could try:
consider whether myType can be made smaller - e.g. using int8_t or even width-appropriate bitfields instead of int, using pointers to pooled strings instead of worst-case-length character arrays, use offsets into arrays where they're smaller than pointers etc.. If you show us the type maybe we can suggest things.
think about your paging - if you have many objects on one memory page (likely 4k) they will need to stay in memory if any one of them is being used, so try to get objects that will be used around the same time onto the same memory page - this may involve hashing to small arrays of related myType objects, or even moving all your data into a packed array if possible (binary searching can be pretty quick anyway). Naively used hash tables tend to flay memory because similar objects are put in completely unrelated buckets.
serialisation/deserialisation with compression is a possibility: instead of letting the OS swap out full myType memory, you may be able to proactively serialise them into a more compact form then deserialise them only when needed
consider whether you need to process all the data simultaneously... if you can batch up the work in such a way that you get all "group A" out of the way using less memory then you can move on to "group B"
UPDATE now you've posted your actual data types...
Sadly, using short might not help much because sizeof key needs to be 16 anyway for alignment of the double; if you don't need the precision, you could consider float? Another option would be to create an array of separate maps...
tbb::concurrent_hash_map<double,value> map[65536];
You can then index to map[my_short][my_double]. It could be better or worse, but is easy to try so you might as well benchmark....
For cl_I a 2-minute dig suggests the data's stored in a union - presumably word is used for small values and one of the pointers when necessary... that looks like a pretty good design - hard to improve on.
If numbers tend to repeat a lot (a big if) you could experiment with e.g. keeping a registry of big cl_Is with a bi-directional mapping to packed integer ids which you'd store in My_Map::map - fussy though. To explain, say you get 987123498723489 - you push_back it on a vector<cl_I>, then in a hash_map<cl_I, int> set [987123498723489 to that index (i.e. vector.size() - 1). Keep going as new numbers are encountered. You can always map from an int id back to a cl_I using direct indexing in the vector, and the other way is an O(1) amortised hash table lookup.

Optimizing for 3D imaging processes in C++

I am working with 3D volumetric images, possibly large (256x256x256). I have 3 such volumes that I want to read in and operate on. Presently, each volume is stored as a text file of numbers which I read in using ifstream. I save it as a matrix (This is a class I have written by dynamic allocation of a 3D array). Then I perform operations on these 3 matrices, addition, multiplication and even Fourier transform. So far, everything works well, but, it takes a hell lot of time, especially the Fourier transform since it has 6 nested loops.
I want to know how I can speed this up. Also, whether the fact that I have stored the images in text files makes a difference. Should I save them as binary or in some other easier/faster to read in format? Is fstream the fastest way I can read in? I use the same 3 matrices each time without changing them. Does that make a difference? Also, is pointer to pointer to pointer the best way to store a 3D volume? If not what else can I do?

Also, is pointer to pointer to pointer best way to store a 3d volume?
Nope thats usually very ineficient.
If not what else can I do?
Its likely that you will get better performance if you store it in a contiguous block, and use computed offsets into the block.
I'd usually use a structure like this:
class DataBlock {
unsigned int nx;
unsigned int ny;
unsigned int nz;
std::vector<double> data;
DataBlock(in_nx,in_ny,in_nz) :
nx(in_nx), ny(in_ny), nz(in_nz) , data(in_nx*in_ny*in_nz, 0)
{}
//You may want to make this check bounds in debug builds
double& at(unsigned int x, unsigned int y, unsigned int z) {
return data[ x + y*nx + z*nx*ny ];
};
const double& at(unsigned int x, unsigned int y, unsigned int z) const {
return data[ x + y*nx + z*nx*ny ];
};
private:
//Dont want this class copied, so remove the copy constructor and assignment.
DataBlock(const DataBlock&);
DataBlock&operator=(const DataBlock&);
};

Storing a large (2563 elements) 3D image file as plain text is a waste of resources.
Without loss of generality, if you have a plain text file for your image and each line of your file consists of one value, you will have to read several characters until you find the end of the line (for a 3-digit number, these will be 4 bytes; 3 bytes for the digits, 1 byte for newline). Afterwards you will have to convert these single digits to a number. When using binary, you directly read a fixed amount of bytes and you will have your number. You could and should write and read it as a binary image.
There are several formats for doing so, the one I would recommend is the meta image file format of VTK. In this format, you have a plain text header file and a binary file with the actual image data. With the information from the header file you will know how large your image is and what datatype you will be using. In your program, you then directly read the binary data and save it to a 3D array.
If you really want to speed things up, use CUDA or OpenCL which will be pretty fast for your applications.
There are several C++ libraries that can help you with writing, saving and manipulating image data, including the before-mentioned VTK and ITK.

2563 is a rather large number. Parsing 2563 text strings will take a considerable amount of time. Using binary will make the reading/writing process much faster because it doesn't require converting a number to/from string, and using much less space. For example to read the number 123 as char from a text file the program will need to read it as a string and convert from decimal to binary using lots of multiplies by 10. Whereas if you had written it directly as the binary value 0b01111011 you only need to read that byte back again into memory, no conversion at all.
Using hexadecimal number may also increase reading speed since each hex digit can map directly to binary value but if you need more speed, binary file is the way to go. Just a fread command is enough to load the whole 2563 bytes = 16MB file into memory in less than 1 sec. And when you're done, just fwrite it back to file. To speedup you can use SIMD (SSE/AVX), CUDA or another parallel processing technique. You can improve the speed even further by multithreading or by only saving the non zero values because in many cases, most values will often be 0's.
Another reason maybe because your array is large and each dimension is a power of 2. This has been discussed in many questions on SO:
Why is there huge performance hit in 2048x2048 versus 2047x2047 array multiplication?
Why is my program slow when looping over exactly 8192 elements?
Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?
You may consider changing the last dimension to 257 and try again. Or better use another algorithm like divide and conquer that's more cache friendly

You should add timers around the load and the process so you know which is taking the most time, and focus your optimization efforts on it. If you control the file format, make one that is more efficient to read. If it is the processing, I'll echo what previous folks have said, investigate efficient memory layout as well as GPGPU computing. Good luck.

Compressing strings with common parts

I have an application that manages a large number of strings. Strings are in a path-like format and have many common parts, but without a clear rule. They are not paths on the file-system but can be considered like so.
I clearly need to optimize memory consumption but without a big performance sacrifice.
I am considering 2 options:
- implement a compressed_string class that stores data zipped, but i need a fixed dictionary and i cant find a library for this right now. I don't want a Huffman on bytes, I want it on words.
- implement some kind of flyweight pattern on string parts.
The problem looks like a common one and I'm wonder what is the best solution to it or if someone knows a library that targets this issue.
thanks

Although it might be tempting to tune a specific algorithm for your problem, it is likely to require an unreasonable amount of time and effort, while standard compression techniques will immediately provide you a great boost to solve your memory consumption problem.
The "standard" way to handle this issue is to chunk source data into small blocks (such as 256KB), and compress them individually. When accessing data into the block, you need to decode it first. Therefore, the optimal block size really depends on your application, i.e. the more the application streams, the larger the blocks; on the other hand, the more random access pattern, the smaller the block size.
If you are worried by the compression/decompression speed, use a high-speed algorithm. If decompression speed is the most important metric (for access time), something like LZ4 will provide you about 1GB/s decoding performance per core, so this gives you an idea of how many blocks per second you can decode.
If only decompression speed matters, you may use the high-compression variant LZ4-HC, which will boost compression ratio even more by about 30%, while also improving decompression speed.

Strings are in a path-like format and have many common parts, but without a clear rule.
In the sense that they are locators in a hierarchy of the form name, (separator, name)*? If so, you can use interning: store the name parts as char const * elements that point into a pool of strings. That way, you effectively compress a name that is used n times to just over n * sizeof(char const *) + strlen(name) bytes. The full path would become a sequence of interned names, e.g. an std::vector.
It might seem that sizeof(char const *) is big on 64-bit hardware, but you also save some of the allocation overhead. Or, if you know for some reason that you'll never need more than, say, 65536 strings, you might store them as
class interned_name
{
uint16_t tab_idx;
public:
char const *c_str() const
{
return NAME_TABLE[tab_idx];
}
};
where NAME_TABLE is an static std::unordered_map<uint16_t, char const *>.

Extremely big integer multiplication and addition

Greetings,
I need to multiply two extremely long integer values stored in a text file (exported via GMP (MPIR, to be exact), so they can be any in any base). Now, I would usually just import these integers via the mpz_inp_str() function and perform the multiplication in RAM, however, these values are so long that I can't really load them (about 1 GB of data each). What would be the fastest way to do this? Perhaps there are some external libraries that do this sort of thing already? Are there any easily implementable methods for this (performance is not incredibly important as this operation would only be performed once or twice)?
tl;dr: I need to multiply values so large they don't fit into process memory limits (Windows).
Thank you for your time.

I don't know if there is a library that supports this, but you could use GMP/MPIR on parts of each really big number (RBN). That is, start by breaking each RBN into manageable, uniformly sized chunks (e.g. 10M digit chunks, expect an undersized chunk for most significant digits, also see below).
RBN1 --> A B C D E
RBN2 --> F G H I J
The chunking can be done in base 10, so just read <chuck_size> characters from the file for each piece. Then multiply chunks from each number one at a time.
AxF BxF CxF DxF ExF
+ AxG BxG CxG DxG ExG
+ AxH BxH CxH DxH ExH
+ AxI BxI CxI DxI ExI
+ AxJ BxJ CxJ DxJ ExJ
Perform each column of the final sum in memory. Then, keeping the carry in memory, write the column out to disk, repeat for next column... For carries, convert each column sum result to a string with GMP, write out the bottom <chunk size> portion of the result and read the top portion back in as a GMP int for the carry.
I'd suggest selecting a chunk size dynamically for each multiplication in order to keep each column addition in memory; the larger the numbers, the more column additions will need to be done, the smaller the chunk size will need to be.
For both reading and writing, I'd suggest using memory mapped files, boost has a nice interface for this (note that this does not load the entire file, it just basically buffers the IO on virtual memory). Open one mapped file for each input RBN numbers, and one output with size = size(RBN1) + size(RBN2) + 1; With memory mapped files, file access is treated as a raw char*, so you can read/write chunks directly using gmp c-string io methods. You will probably need to read into an intermediate buffer in order to NULL terminated strings for GMP (unless you want to temporarily alter the memory mapped file).
This isn't very easy to implement correctly, but then again this isn't a particularly easy problem (maybe just tedious). This approach has the advantage that it exactly mirrors what GMP is doing in memory, so the algorithms are well known.

How to handle changing data structures on program version update?

I do embedded software, but this isn't really an embedded question, I guess. I don't (can't for technical reasons) use a database like MySQL, just C or C++ structs.
Is there a generic philosophy of how to handle changes in the layout of these structs from version to version of the program?
Let's take an address book. From program version x to x+1, what if:
a field is deleted (seems simple enough) or added (ok if all can use some new default)?
a string gets longer or shorter? An int goes from 8 to 16 bits of signed / unsigned?
maybe I combine surname/forename, or split name into two fields?
These are just some simple examples; I am not looking for answers to those, but rather for a generic solution.
Obviously I need some hard coded logic to take care of each change.
What if someone doesn't upgrade from version x to x+1, but waits for x+2? Should I try to combine the changes, or just apply x -> x+ 1 followed by x+1 -> x+2?
What if version x+1 is buggy and we need to roll-back to a previous version of the s/w, but have already "upgraded" the data structures?
I am leaning towards TLV (http://en.wikipedia.org/wiki/Type-length-value) but can see a lot of potential headaches.
This is nothing new, so I just wondered how others do it....

I do have some code where a longer string is puzzled together from two shorter segments if necessary. Yuck. Here's my experience after 12 years of keeping some data compatible:
Define your goals - there are two:
new versions should be able to read what old versions write
old versions should be able to read what new versions write (harder)
Add version support to release 0 - At least write a version header. Together with keeping (potentially a lot of) old reader code around that can solve the first case primitively. If you don't want to implement case 2, start rejecting new data right now!
If you need only case 1, and and the expected changes over time are rather minor, you are set. Anyway, these two things done before the first release can save you many headaches later.
Convert during serialization - at run time, only keep the data in the "new format" in memory. Do necessary conversions and tests at persistence limits (convert to newest when reading, implement backward compatibility when writing). This isolates version problems in one place, helping to avoid hard-to-track-down bugs.
Keep a set of test data from all versions around.
Store a subset of available types - limit the actually serialized data to a few data types, such as int, string, double. In most cases, the extra storage size is made up by reduced code size supporting changes in these types. (That's not always a tradeoff you can make on an embedded system, though).
e.g. don't store integers shorter than the native width. (you might need to do that when you need to store long integer arrays).
add a breaker - store some key that allows you to intentionally make old code display an error message that this new data is incompatible. You can use a string that is part of the error message - then your old version could display an error message it doesn't know about - "you can import this data using the ConvertX tool from our web site" is not great in a localized application but still better than "Ungültiges Format".
Don't serialize structs directly - that's the logical / physical separation. We work with a mix of two, both having their pros and cons. None of these can be implemented without some runtime overhead, which can pretty much limit your choices in an embedded environment. At any rate, don't use fixed array/string lengths during persistence, that should already solve half of your troubles.
(A) a proper serialization mechanism - we use a bianry serializer that allows to start a "chunk" when storing, which has its own length header. When reading, extra data is skipped and missing data is default-initialized (which simplifies implementing "read old data" a lot in the serializationj code.) Chunks can be nested. That's all you need on the physical side, but needs some sugar-coating for common tasks.
(B) use a different in-memory representation - the in-memory reprentation could basically be a map<id, record> where id woukld likely be an integer, and record could be
empty (not stored)
a primitive type (string, integer, double - the less you use the easier it gets)
an array of primitive types
and array of records
I initially wrote that so the guys don't ask me for every format compatibility question, and while the implementation has many shortcomings (I wish I'd recognize the problem with the clarity of today...) it could solve
Querying a non existing value will by default return a default/zero initialized value. when you keep that in mind when accessing the data and when adding new data this helps a lot: Imagine version 1 would calculate "foo length" automatically, whereas in version 2 the user can overrride that setting. A value of zero - in the "calculation type" or "length" should mean "calculate automatically", and you are set.
The following are "change" scenarios you can expect:
a flag (yes/no) is extended to an enum ("yes/no/auto")
a setting splits up into two settings (e.g. "add border" could be split into "add border on even days" / "add border on odd days".)
a setting is added, overriding (or worse, extending) an existing setting.
For implementing case 2, you also need to consider:
no value may ever be remvoed or replaced by another one. (But in the new format, it could say "not supported", and a new item is added)
an enum may contain unknown values, other changes of valid range
phew. that was a lot. But it's not as complicated as it seems.

There's a huge concept that the relational database people use.
It's called breaking the architecture into "Logical" and "Physical" layers.
Your structs are both a logical and a physical layer mashed together into a hard-to-change thing.
You want your program to depend on a logical layer. You want your logical layer to -- in turn -- map to physical storage. That allows you to make changes without breaking things.
You don't need to reinvent SQL to accomplish this.
If your data lives entirely in memory, then think about this. Divorce the physical file representation from the in-memory representation. Write the data in some "generic", flexible, easy-to-parse format (like JSON or YAML). This allows you to read in a generic format and build your highly version-specific in-memory structures.
If your data is synchronized onto a filesystem, you have more work to do. Again, look at the RDBMS design idea.
Don't code a simple brainless struct. Create a "record" which maps field names to field values. It's a linked list of name-value pairs. This is easily extensible to add new fields or change the data type of the value.

Some simple guidelines if you're talking about a structure use as in a C API:
have a structure size field at the start of the struct - this way code using the struct can always ensure they're dealing only with valid data (for example, many of the structures the Windows API uses start with a cbCount field so these APIs can handle calls made by code compiled against old SDKs or even newer SDKs that had added fields
Never remove a field. If you don't need to use it anymore, that's one thing, but to keep things sane for dealing with code that uses an older version of the structure, don't remove the field.
it may be wise to include a version number field, but often the count field can be used for that purpose.
Here's an example - I have a bootloader that looks for a structure at a fixed offset in a program image for information about that image that may have been flashed into the device.
The loader has been revised, and it supports additional items in the struct for some enhancements. However, an older program image might be flashed, and that older image uses the old struct format. Since the rules above were followed from the start, the newer loader is fully able to deal with that. That's the easy part.
And if the struct is revised further and a new image uses the new struct format on a device with an older loader, that loader will be able to deal with it, too - it just won't do anything with the enhancements. But since no fields have been (or will be) removed, the older loader will be able to do whatever it was designed to do and do it with the newer image that has a configuration structure with newer information.
If you're talking about an actual database that has metadata about the fields, etc., then these guidelines don't really apply.

What you're looking for is forward-compatible data structures. There are several ways to do this. Here is the low-level approach.
struct address_book
{
unsigned int length; // total length of this struct in bytes
char items[0];
}
where 'items' is a variable length array of a structure that describes its own size and type
struct item
{
unsigned int size; // how long data[] is
unsigned int id; // first name, phone number, picture, ...
unsigned int type; // string, integer, jpeg, ...
char data[0];
}
In your code, you iterate through these items (address_book->length will tell you when you've hit the end) with some intelligent casting. If you hit an item whose ID you don't know or whose type you don't know how to handle, you just skip it by jumping over that data (from item->size) and continue on to the next one. That way, if someone invents a new data field in the next version or deletes one, your code is able to handle it. Your code should be able to handle conversions that make sense (if employee ID went from integer to string, it should probably handle it as a string), but you'll find that those cases are pretty rare and can often be handled with common code.

I have handled this in the past, in systems with very limited resources, by doing the translation on the PC as a part of the s/w upgrade process. Can you extract the old values, translate to the new values and then update the in-place db?
For a simplified embedded db I usually don't reference any structs directly, but do put a very light weight API around any parameters. This does allow for you to change the physical structure below the API without impacting the higher level application.

Lately I'm using bencoded data. It's the format that bittorrent uses. Simple, you can easily inspect it visually, so it's easier to debug than binary data and is tightly packed. I borrowed some code from the high quality C++ libtorrent. For your problem it's so simple as checking that the field exist when you read them back. And, for a gzip compressed file it's so simple as doing:
ogzstream os(meta_path_new.c_str(), ios_base::out | ios_base::trunc);
Bencode map(Bencode::TYPE_MAP);
map.insert_key("url", url.get());
map.insert_key("http", http_code);
os << map;
os.close();
To read it back:
igzstream is(metaf, ios_base::in | ios_base::binary);
is.exceptions(ios::eofbit | ios::failbit | ios::badbit);
try {
torrent::Bencode b;
is >> b;
if( b.has_key("url") )
d->url = b["url"].as_string();
} catch(...) {
}
I have used Sun's XDR format in the past, but I prefer this now. Also it's much easier to read with other languages such as perl, python, etc.

Embed a version number in the struct or, do as Win32 does and use a size parameter.
if the passed struct is not the latest version then fix up the struct.
About 10 years ago I wrote a similar system to the above for a computer game save game system. I actually stored the class data in a seperate class description file and if i spotted a version number mismatch then I coul run through the class description file, locate the class and then upgrade the binary class based on the description. This, obviously required default values to be filled in on new class member entries. It worked really well and it could be used to auto generate .h and .cpp files as well.

I agree with S.Lott in that the best solution is to separate the physical and logical layers of what you are trying to do. You are essentially combining your interface and your implementation into one object/struct, and in doing so you are missing out on some of the power of abstraction.
However if you must use a single struct for this, there are a few things you can do to help make things easier.
1) Some sort of version number field is practically required. If your structure is changing, you will need an easy way to look at it and know how to interpret it. Along these same lines, it is sometimes useful to have the total length of the struct stored in a structure field somewhere.
2) If you want to retain backwards compatibility, you will want to remember that code will internally reference structure fields as offsets from the structure's base address (from the "front" of the structure). If you want to avoid breaking old code, make sure to add all new fields to the back of the structure and leave all existing fields intact (even if you don't use them). That way, old code will be able to access the structure (but will be oblivious to the extra data at the end) and new code will have access to all of the data.
3) Since your structure may be changing sizes, don't rely on sizeof(struct myStruct) to always return accurate results. If you follow #2 above, then you can see that you must assume that a structure may grow larger in the future. Calls to sizeof() are calculated once (at compile time). Using a "structure length" field allows you to make sure that when you (for example) memcpy the struct you are copying the entire structure, including any extra fields at the end that you aren't aware of.
4) Never delete or shrink fields; if you don't need them, leave them blank. Don't change the size of an existing field; if you need more space, create a new field as a "long version" of the old field. This can lead to data duplication problems, so make sure to give your structure a lot of thought and try to plan fields so that they will be large enough to accommodate growth.
5) Don't store strings in the struct unless you know that it is safe to limit them to some fixed length. Instead, store only a pointer or array index and create a string storage object to hold the variable-length string data. This also helps protect against a string buffer overflow overwriting the rest of your structure's data.
Several embedded projects I have worked on have used this method to modify structures without breaking backwards/forwards compatibility. It works, but it is far from the most efficient method. Before long, you end up wasting space with obsolete/abandoned structure fields, duplicate data, data that is stored piecemeal (first word here, second word over there), etc etc. If you are forced to work within an existing framework then this might work for you. However, abstracting away your physical data representation using an interface will be much more powerful/flexible and less frustrating (if you have the design freedom to use such a technique).

You may want to take a look at how Boost Serialization library deals with that issue.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js