I'm writing a data compression library, and I need to write a sequence of integer encodings (produced by a variety of integer encoders) to memory, store it in a file, and then read them all back later.
Integer encodings have to be stored consecutively.
Since generally their size in bits isn't a multiple of 8, I don't have them aligned in memory.
In short, what I need is something which exposes functions like these:
unsigned int BitReader::read_bits(size_t bits);
unsigned int BitWriter::write_bits(unsigned int num, size_t bits);
void BitWriter::get_array(char** array);
BitReader::BitReader(char *array);
Since I need to invoke those functions in a (very) tight loop, efficiency is of paramount interest (especially in reading).
Do you know of some C++ libraries which do what I want? Thanks.
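For concreteness, here is a rough, unoptimised sketch of the kind of interface I have in mind (member names, signatures and the LSB-first bit order are only illustrative; a real library would read whole words at a time rather than looping bit by bit):

#include <cstddef>
#include <vector>

class BitWriter
{
    std::vector<unsigned char> buf_;
    std::size_t bitpos_ = 0;
public:
    void write_bits(unsigned int num, std::size_t bits)
    {
        for (std::size_t i = 0; i < bits; ++i, ++bitpos_)
        {
            if (bitpos_ % 8 == 0)
                buf_.push_back(0);
            buf_.back() |= static_cast<unsigned char>(((num >> i) & 1u) << (bitpos_ % 8));
        }
    }
    const unsigned char* data() const { return buf_.data(); }
    std::size_t size_bytes() const { return buf_.size(); }
};

class BitReader
{
    const unsigned char* p_;
    std::size_t bitpos_ = 0;
public:
    explicit BitReader(const unsigned char* p) : p_(p) {}
    unsigned int read_bits(std::size_t bits)
    {
        unsigned int value = 0;
        for (std::size_t i = 0; i < bits; ++i, ++bitpos_)
            value |= static_cast<unsigned int>((p_[bitpos_ / 8] >> (bitpos_ % 8)) & 1u) << i;
        return value;
    }
};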
If efficiency is your only requirement, then take the address of the buffer holding the data and write it directly to the file. On restore, allocate a buffer of the same size and perform the reverse operation. It's simple, fast, and has no learning curve.
Opening a stream to whatever input/output you need is the most effective approach, although not the most efficient for huge amounts of data. Streams provide a portable way of carrying out read/write operations, which is why you can open a stream on memory or on disk. If you want to take control of the stream to the disk itself, I would recommend the _bios_disk function; google "_bios_disk" for more info.
Related
I tried to create an array with up to 10^12 elements in C++, but I can only make an array of up to 1000001 elements, i.e.
long long int dp[1000001]
But I want to store up to 10^12 values in the array. Any idea how I can implement this in C++?
First, you must realize that the size of that array is nearly 8 TB. Does your computer have that much memory? Probably not. In that case, you cannot store that much data in memory and practically cannot have such a large array.
Any Idea how can I implement this
Instead of an array in memory, you could store the data in the file system... assuming you have 8 TB of free storage. You can use a paging mechanism to read and write small pieces of the file at a time.
The simplest way to implement that in C++ is to use operating system functionality to map the file into memory. That way the operating system takes care of the paging. There is no standard way to map files into memory in C++, so the first step is to figure out which operating system you're using. The POSIX standard specifies the mmap function for this purpose.
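A minimal POSIX sketch of that approach, assuming a 64-bit system and a filesystem that supports sparse files (the file name and index are illustrative; error handling is reduced to the essentials):

#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // ftruncate, close

int main()
{
    const std::size_t count = 1000000000000ULL;           // 10^12 elements
    const std::size_t bytes = count * sizeof(long long);  // ~8 TB

    int fd = open("dp.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0) return 1;  // sparse file; nothing is written yet

    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;
    long long* dp = static_cast<long long*>(mem);

    dp[123456789012LL] = 42;   // the OS pages the touched parts of the file in and out

    munmap(mem, bytes);
    close(fd);
    return 0;
}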
Before doing that however, I recommend considering whether you actually need to store that much data. Perhaps you need a smarter algorithm instead.
I have a big file (let's assume I can make it binary) that cannot fit in RAM, and I want to sort the numbers in it. In the process I need to read/write a large amount of numbers from/to the file (from/to a vector<int> or int[]) quickly, so I'd rather not read/write them one by one, but in blocks of a fixed size. How can I do it?
I have a big file (let's assume I can make it binary) that cannot fit in RAM, and I want to sort the numbers in it.
Given that the file is binary, perhaps the simplest and presumably efficient solution is to memory map the file. Unfortunately there is no standard interface to perform memory mapping. On POSIX systems, there is the mmap function.
Now, the memory-mapped file is simply an array of raw bytes. Treating it as an array of integers is technically not allowed until C++20, where C-style "implicit creation of low-level objects" is introduced. In practice, it already works on most current language implementations (see note 1).
For this reinterpretation to work, the representation of the integers in the file must match the representation of integers used by the CPU. The file will not be portable to the same program running on other, incompatible systems.
We can simply use std::sort on this array. The operating system should take care of paging the file in and out of memory. The algorithm used by std::sort isn't necessarily optimised for this use case however. To find the optimal algorithm, you may need to do some research.
Note 1: In case pre-C++20 standard conformance is a concern, it is possible to iterate over the array, copy the underlying bytes into an integer, and placement-new an integer object into that memory using the copied value. A compiler can optimise these operations into zero instructions, and this makes the program's behaviour well defined.
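Putting the above together, a POSIX sketch might look like this (the file is assumed to hold raw ints in the native representation; error handling is minimal):

#include <algorithm>    // std::sort
#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

int main()
{
    int fd = open("numbers.bin", O_RDWR);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) != 0) return 1;
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);

    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;

    int* first = static_cast<int*>(mem);             // see note 1 about pre-C++20 conformance
    std::sort(first, first + bytes / sizeof(int));   // MAP_SHARED writes the sorted data back to the file

    munmap(mem, bytes);
    close(fd);
    return 0;
}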
You can use ostream::write to write into a file and istream::read to read from a file.
To make the process clean, it is good to store the number of items in the file as well.
Let's say you have a vector<int>.
You can use the following code to write its contents to a file.
std::vector<int> myData;
// .. Fill up myData;
// Open a file to write to, in binary mode.
std::ofstream out("myData.bin", std::ofstream::binary);
// Write the size first.
auto size = myData.size();
out.write(reinterpret_cast<char const*>(&size), sizeof(size));
// Write the data.
out.write(reinterpret_cast<char const*>(myData.data()), sizeof(int)*size);
You can read the contents of such a file using the following code.
std::vector<int> myData;
// Open the file to read from, in binary mode.
std::ifstream in("myData.bin", std::ifstream::binary);
// Declare 'size' with the same type used when writing; its value is overwritten by the read below.
auto size = myData.size();
// Read the size first.
in.read(reinterpret_cast<char*>(&size), sizeof(size));
// Resize myData so it has enough space to read into.
myData.resize(size);
// Read the data.
in.read(reinterpret_cast<char*>(myData.data()), sizeof(int)*size);
If not all of the data can fit into RAM, you can read and write it in smaller chunks, as sketched below. Sorting then needs an external scheme, e.g. sorting each chunk separately, writing the sorted runs to temporary files, and merging them afterwards.
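A sketch of reading in fixed-size chunks (the chunk size is arbitrary, and the size header from the example above is ignored for brevity):

std::ifstream in("myData.bin", std::ifstream::binary);
const std::size_t chunkElems = 1 << 20;          // 1M ints per chunk
std::vector<int> chunk(chunkElems);
while (in.read(reinterpret_cast<char*>(chunk.data()), sizeof(int) * chunk.size()) || in.gcount() > 0)
{
    const std::size_t got = static_cast<std::size_t>(in.gcount()) / sizeof(int);
    // process chunk[0..got) here, e.g. sort it and write it out as a temporary sorted run
}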
I have an application that manages a large number of strings. Strings are in a path-like format and have many common parts, but without a clear rule. They are not paths on the file-system but can be considered like so.
I clearly need to optimize memory consumption but without a big performance sacrifice.
I am considering 2 options:
- implement a compressed_string class that stores data zipped, but I need a fixed dictionary and I can't find a library for this right now. I don't want Huffman coding on bytes, I want it on words.
- implement some kind of flyweight pattern on string parts.
The problem looks like a common one, and I wonder what the best solution to it is, or whether someone knows a library that targets this issue.
Thanks.
Although it might be tempting to tune a specific algorithm for your problem, it is likely to require an unreasonable amount of time and effort, while standard compression techniques will immediately provide you a great boost to solve your memory consumption problem.
The "standard" way to handle this issue is to chunk the source data into small blocks (such as 256KB) and compress them individually. When accessing data inside a block, you need to decode the whole block first. Therefore, the optimal block size really depends on your application: the more the application streams, the larger the blocks; the more random the access pattern, the smaller the blocks.
If you are worried about compression/decompression speed, use a high-speed algorithm. If decompression speed is the most important metric (for access time), something like LZ4 will give you about 1 GB/s of decoding performance per core, which gives you an idea of how many blocks per second you can decode.
If compression speed matters less than decompression speed, you may use the high-compression variant LZ4-HC, which improves the compression ratio by about 30% while also improving decompression speed.
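A sketch of the per-block approach using LZ4's C API (LZ4_compress_default / LZ4_decompress_safe from lz4.h; the HC variant swaps in LZ4_compress_HC from lz4hc.h on the compression side). The uncompressed size of each block has to be stored alongside it:

#include <cstddef>
#include <lz4.h>
#include <string>
#include <vector>

// Compress one block (error handling reduced to the essentials).
std::vector<char> compress_block(const std::string& block)
{
    const int maxDst = LZ4_compressBound(static_cast<int>(block.size()));
    std::vector<char> dst(static_cast<std::size_t>(maxDst));
    const int written = LZ4_compress_default(block.data(), dst.data(),
                                             static_cast<int>(block.size()), maxDst);
    dst.resize(written > 0 ? static_cast<std::size_t>(written) : 0);
    return dst;
}

// Decompress it again; the original (uncompressed) size must be known.
std::string decompress_block(const std::vector<char>& compressed, int originalSize)
{
    std::string out(static_cast<std::size_t>(originalSize), '\0');
    LZ4_decompress_safe(compressed.data(), &out[0],
                        static_cast<int>(compressed.size()), originalSize);
    return out;
}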
Strings are in a path-like format and have many common parts, but without a clear rule.
In the sense that they are locators in a hierarchy of the form name, (separator, name)*? If so, you can use interning: store the name parts as char const * elements that point into a pool of strings. That way, you effectively compress a name that is used n times to just over n * sizeof(char const *) + strlen(name) bytes. The full path would become a sequence of interned names, e.g. a std::vector of such pointers.
It might seem that sizeof(char const *) is big on 64-bit hardware, but you also save some of the allocation overhead. Or, if you know for some reason that you'll never need more than, say, 65536 strings, you might store them as
class interned_name
{
    uint16_t tab_idx;
public:
    char const *c_str() const
    {
        return NAME_TABLE[tab_idx];
    }
};
where NAME_TABLE is a static std::unordered_map<uint16_t, char const *>.
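A minimal sketch of the pool itself (names are illustrative): each distinct name part is stored once, and a path becomes a vector of pointers into the pool.

#include <string>
#include <unordered_set>
#include <vector>

class name_pool
{
    std::unordered_set<std::string> pool_;   // owns one copy of each distinct name part
public:
    char const* intern(std::string const& name)
    {
        return pool_.insert(name).first->c_str();   // element pointers stay valid across rehashing
    }
};

using interned_path = std::vector<char const*>;
// e.g. "usr/local/lib" -> { pool.intern("usr"), pool.intern("local"), pool.intern("lib") }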
Okay, so I've written a (rather unoptimized) program before to encode images to JPEGs; however, now I am working with MPEG-2 transport streams and the H.264-encoded video within them. Before I dive into programming all of this, I am curious what the fastest way to deal with the actual file is.
Currently I am file-mapping the .mts file into memory to work on it, although I am not sure if it would be faster to (for example) read 100 MB of the file into memory in chunks and deal with it that way.
These files require a lot of bit-shifting and such to read flags, so I am wondering whether, when I reference some of the memory, it is faster to read 4 bytes at once as an integer or 1 byte at a time as a character. I thought I read somewhere that x86 processors are optimized for 4-byte granularity, but I'm not sure if this is true...
Thanks!
Memory mapped files are usually the fastest operations available if you require your file to be available synchronously. (There are some asynchronous APIs that allow the O/S to reorder things for a slight speed increase sometimes, but that sounds like it's not helpful in your application)
The main advantage you're getting with the mapped files is that you can work in memory on the file while it is still being read from disk by the O/S, and you don't have to manage your own locking/threaded file reading code.
Memory-reference-wise, on x86 memory is going to be read an entire cache line at a time no matter what you're actually working with. The extra time associated with non-byte-granular operations refers to the fact that integers need not be byte aligned. For example, performing an ADD will take more time if things aren't aligned on a 4-byte boundary, but for something like a memory copy there will be little difference. If you are working with inherently character data then it's going to be faster to keep it that way than to read everything as integers and bit-shift things around.
If you're doing h.264 or MPEG2 encoding the bottleneck is probably going to be CPU time rather than disk i/o in any case.
If you have to access the whole file, it is always faster to read it to memory and do the processing there. Of course, it's also wasting memory, and you have to lock the file somehow so you won't get concurrent access by some other application, but optimization is about compromises anyway. Memory mapping is faster if you're skipping (large) parts of the file, because you don't have to read them at all then.
Yes, accessing memory at 4-byte (or even 8-byte) granularity is faster than accessing it byte-wise. Again it's a compromise - depending on what you have to do with the data afterwards, and how skilled you are at fiddling with the bits in an int, it might not be faster overall.
As for everything regarding optimization:
measure
optimize
measure
These are sequential bit-streams - you basically consume them one bit at a time without random-access.
You don't need to put a lot of effort into explicitly buffering reads and such in this scenario: the operating system will be buffering them for you anyway. I've written H.264 parsers before, and the time is completely dominated by the decoding and manipulation, not the IO.
My recommendation is to use a standard library for parsing these bit-streams.
Flavor is such a parser, and the website even includes examples of MPEG-2 (PS) and various H.264 parts like M-Coder. Flavor builds native parsing code from a C++-like language; here's a quote from the MPEG-2 PS spec:
class TargetBackgroundGridDescriptor extends BaseProgramDescriptor : unsigned int(8) tag = 7
{
    unsigned int(14) horizontal_size;
    unsigned int(14) vertical_size;
    unsigned int(4) aspect_ratio_information;
}

class VideoWindowDescriptor extends BaseProgramDescriptor : unsigned int(8) tag = 8
{
    unsigned int(14) horizontal_offset;
    unsigned int(14) vertical_offset;
    unsigned int(4) window_priority;
}
Regarding the best size to read from memory, I'm sure you will enjoy reading this post about memory access performance and cache effects.
One thing to consider about memory-mapping files is that a file larger than the available address space can only have a portion mapped at a time. To access the remainder of the file, the first part has to be unmapped and the next part mapped in its place.
Since you're decoding MPEG streams you may want to use a double-buffered approach with asynchronous file reading. It works like this:
blocksize = 65536 bytes (or whatever)
currentblock = new byte [blocksize]
nextblock = new byte [blocksize]
read currentblock
while processing
    asynchronously read nextblock
    parse currentblock
    wait for asynchronous read to complete
    swap nextblock and currentblock
endwhile
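A rough C++ translation of that pattern, using std::async for the background read (parse_block is a placeholder for whatever parsing you actually do):

#include <cstddef>
#include <fstream>
#include <future>
#include <vector>

// placeholder for the real parser
void parse_block(const std::vector<char>& block, std::size_t bytes)
{
    (void)block; (void)bytes;   // decode MPEG-2 TS / H.264 here
}

void double_buffered_parse(const char* filename)
{
    constexpr std::size_t blocksize = 65536;
    std::ifstream in(filename, std::ifstream::binary);
    std::vector<char> currentblock(blocksize), nextblock(blocksize);

    auto read_block = [&in](std::vector<char>& buf) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        return static_cast<std::size_t>(in.gcount());
    };

    std::size_t bytes = read_block(currentblock);
    while (bytes > 0)
    {
        // start reading the next block while the current one is being parsed
        auto pending = std::async(std::launch::async, read_block, std::ref(nextblock));
        parse_block(currentblock, bytes);
        bytes = pending.get();              // wait for the background read to complete
        std::swap(currentblock, nextblock);
    }
}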
I need to load large models and other structured binary data on an older CD-based game console as efficiently as possible. What's the best way to do it? The data will be exported from a Python application. This is a pretty elaborate hobby project.
Requirements:
- no reliance on a fully standard-compliant STL - I might use uSTL though.
- as little overhead as possible. Aim for a solution so good that it could be used on the original PlayStation, yet as modern and elegant as possible.
- no backward/forward compatibility necessary.
- no copying of large chunks around - preferably files get loaded into RAM in the background, and all large chunks are accessed directly from there later.
- should not rely on the target having the same endianness and alignment, i.e. a C plugin in Python which dumps its structs to disc would not be a very good idea.
- should allow moving the loaded data around; with individual files at 1/3 of the RAM size, fragmentation might be an issue. No MMU to abuse.
- robustness is a great bonus, as my attention span is very short, i.e. I'd change the saving part of the code and forget the loading one or vice versa, so at least a dumb safeguard would be nice.
- exchangeability between loaded data and runtime-generated data without runtime overhead and without severe memory management issues would be a nice bonus.
I kind of have a semi-plan of parsing trivial, limited-syntax C headers in Python, which would use structs with offsets instead of pointers, plus convenience wrapper structs/classes in the main app with getters that convert offsets to properly typed pointers/references, but I'd like to hear your suggestions.
Clarification: the request is primarily about data loading framework and memory management issues.
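For illustration, the kind of offset-based record I have in mind would look roughly like this (names are hypothetical):

#include <cstdint>

// On-disc record: offsets relative to the start of the loaded blob,
// plus a thin wrapper that turns them into typed pointers at runtime.
struct MeshRecord
{
    std::uint32_t vertex_count;
    std::uint32_t vertices_offset;   // byte offset from the blob base to a float[3] array

    const float* vertices(const unsigned char* blob_base) const
    {
        return reinterpret_cast<const float*>(blob_base + vertices_offset);
    }
};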
On platforms like the Nintendo GameCube and DS, 3D models are usually stored in a very simple custom format:
A brief header, containing a magic number identifying the file, the number of vertices, normals, etc., and optionally a checksum of the data following the header (Adler-32, CRC-16, etc).
A possibly compressed list of 32-bit floating-point 3-tuples for each vertex and normal.
A possibly compressed list of edges or faces.
All of the data is in the native endian format of the target platform.
The compression format is often trivial (Huffman), simple (Arithmetic), or standard (gzip). All of these require very little memory or computational power.
You could take formats like that as a cue: it's quite a compact representation.
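A hypothetical header along those lines (field names are only illustrative):

#include <cstdint>

struct ModelHeader
{
    std::uint32_t magic;          // file identifier
    std::uint32_t vertex_count;
    std::uint32_t normal_count;
    std::uint32_t face_count;
    std::uint32_t checksum;       // optional Adler-32/CRC of the payload
};
// Followed by vertex_count float[3] positions, normal_count float[3] normals,
// and the (possibly compressed) face list, all in the target's native endianness.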
My suggestion is to use a format most similar to your in-memory data structures, to minimize post-processing and copying. If that means you create the format yourself, so be it. You have extreme needs, so extreme measures are needed.
This is a common game development pattern.
The usual approach is to cook the data in an offline pre-processing step. The resulting blobs can be streamed in with minimal overhead. The blobs are platform-dependent and should contain the proper alignment and endianness of the target platform.
At runtime, you can simply cast a pointer to the in-memory blob file. You can deal with nested structures as well. If you keep a table of contents with offsets to all the pointer values within the blob, you can then fix up the pointers to point to the proper addresses. This is similar to how DLL loading works.
I've been working on a Ruby library, bbq, that I use to cook data for my iPhone game.
Here's the memory layout I use for the blob header:
// Memory layout
//
// p = beginning of file in memory.
// p + 0 : num_pointers
// p + 4 : offset 0
// p + 8 : offset 1
// ...
// p + ((num_pointers - 1) * 4) : offset n-1
// p + (num_pointers * 4) : num_pointers // again so we can figure out
// what memory to free.
// p + ((num_pointers + 1) * 4) : start of cooked data
//
Here's how I load binary blob file and fix up pointers:
void* bbq_load(const char* filename)
{
    unsigned char* p;
    int size = LoadFileToMemory(filename, &p);
    if(size <= 0)
        return 0;

    // get the start of the pointer table
    unsigned int* ptr_table = (unsigned int*)p;
    unsigned int num_ptrs = *ptr_table;
    ptr_table++;

    // get the start of the actual data
    // the 2 is to skip past both num_pointer values
    unsigned char* base = p + ((num_ptrs + 2) * sizeof(unsigned int));

    // fix up the pointers
    while ((ptr_table + 1) < (unsigned int*)base)
    {
        unsigned int* ptr = (unsigned int*)(base + *ptr_table);
        *ptr = (unsigned int)((unsigned char*)ptr + *ptr);
        ptr_table++;
    }

    return base;
}
My bbq library isn't quite ready for prime time, but it could give you some ideas on how to write one yourself in Python.
Good Luck!
I note that nowhere in your description do you ask for "ease of programming". :-)
Thus, here's what comes to mind for me as a way of creating this:
The data should be in the same on-disk format as it would be in the target's memory, such that it can simply be pulled from disk into memory as blobs with no reformatting. Depending on how much freedom you want in placing things in memory, the "blobs" could be the whole file or smaller pieces within it; I don't understand your data well enough to suggest how to subdivide it, but presumably you can. Because we can't rely on the same endianness and alignment on the host, you'll need to be somewhat clever about translating things when writing the files on the host side, but at least this way you only need the cleverness on one side of the transfer rather than on both.
In order to provide a bit of assurance that the target-side and host-side code match, you should write this in a form where you provide a single data description and have some generation code that generates both the target-side C code and the host-side Python code from it. You could even have your generator produce a small random "version" number in the process, have the host-side code write it into the file header, and have the target side check it and give you an error if they don't match. (The point of using a random value is that the only information bit you care about is whether they match, and you don't want to have to increment it manually.)
Consider storing your data as BLOBs in a SQLite DB. SQLite is extremely portable and lightweight, written in ANSI C, and has C/C++ and Python interfaces. This will take care of large files, no fragmentation, variable-length records with fast access, and so on. The rest is just serialization of structs to those BLOBs.
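A minimal sketch using the SQLite C API, assuming a table like assets(name TEXT PRIMARY KEY, data BLOB) (table and function names are illustrative):

#include <sqlite3.h>
#include <string>
#include <vector>

std::vector<unsigned char> load_blob(sqlite3* db, const std::string& name)
{
    std::vector<unsigned char> result;
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, "SELECT data FROM assets WHERE name = ?", -1, &stmt, nullptr) != SQLITE_OK)
        return result;
    sqlite3_bind_text(stmt, 1, name.c_str(), -1, SQLITE_TRANSIENT);
    if (sqlite3_step(stmt) == SQLITE_ROW)
    {
        const unsigned char* data = static_cast<const unsigned char*>(sqlite3_column_blob(stmt, 0));
        const int bytes = sqlite3_column_bytes(stmt, 0);
        result.assign(data, data + bytes);
    }
    sqlite3_finalize(stmt);
    return result;
}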