How to avoid reloading big data on program start - c++

My question is all about tips and tricks. I'm currently working on a project where I have one very big (~1 GB) file with data. First, I need to extract the data. This extraction takes 10 minutes. Then I do calculations, and each calculation depends on the previous one. Let's call them calculation1, calculation2 and so on. Assuming that I've done the extraction part right, I currently face two problems:
Every time I launch the program it takes at least 10 minutes before it does anything useful. I cannot avoid this, so I have to plan my debugging around it.
Every subsequent calculation takes more time.
Thinking about the first problem, I assumed that some sort of database might help, if a database is faster than reading the file, which I doubt.
The second problem might be overcome if I split my big program into smaller programs, each of which does: read file, do stuff, write file. That way each stage can always read the file written by the previous one, which helps debugging. But it introduces a lot of wasted code for file I/O.
I think both problems could be solved by a strategy like this: write and test the extract module, then launch it and let it extract all the data into RAM. Then write calculation1 and launch it to somehow grab the data directly from the extract module's RAM, and so on with every subsequent calculation. So my questions are:
Are there tips and tricks to minimize loads from files?
Are there ways to share RAM and objects between programs?
By the way, I'm writing this in Perl because I need it quickly, but I'll rewrite it in C++ or C# later, so any language-specific or language-agnostic answers are welcome.
Thank you!
[EDIT]
The data file does not change; it is like a big immutable source of knowledge. And it is not exactly 1 GB, and it does not take 10 minutes to read. I just wanted to say that the file is big and the time to read it is considerable. On my machine, reading and parsing the 1 GB file into the right objects takes about a minute, which is still pretty bad.
[/EDIT]

On my current system Perl copies the whole 1 GB file into memory in 2 seconds, so I believe your problem is not reading the file but parsing it.
The straightforward solution I can think of is to pre-parse the data, for instance by converting it into actual source code. That is, you can prepare your data and hardcode it in your script directly (using another file, of course).
However, if reading really is the problem (which I doubt), you can use a database that stores the data in memory (example). It will be faster anyway, simply because the database reads the data once at startup, and you don't restart your database as often as your program.

The idea for solving this type of problem is as follows:
Go for 3 programs:
Reader
Analyzer
Writer
and exchange data between them using shared memory.
For a file that big, I guess you have a considerable amount of data of one object type, which you can store in a circular buffer in shared memory (I recommend using boost::interprocess).
The Reader will continuously read data from the input file and store it in shared memory.
In the meantime, once enough data has been read for doing calculations, the Analyzer will start processing it and store the results in another circular buffer in shared memory.
Once there are some results in the second shared-memory buffer, the Writer will read them and store them in the final output file.
You need to make sure all the processes are synchronized properly, so that they do their jobs simultaneously and you don't lose data (data must not be overwritten before it is processed or saved to the final file).

I like the answer doqtor gives, but to prevent data from being overwritten, a nice helper class that enters and leaves a critical section will do the trick.
// CRITICAL_SECTION is defined in Windows.h - if on another OS,
// look for a similar structure (e.g. pthread_mutex_t).
class BlockThread final {
private:
    CRITICAL_SECTION* m_pCriticalSection;
public:
    explicit BlockThread( CRITICAL_SECTION& criticalSection );
    ~BlockThread();
private:
    BlockThread( const BlockThread& c );            // Not implemented
    BlockThread& operator=( const BlockThread& c ); // Not implemented
};
BlockThread::BlockThread( CRITICAL_SECTION& criticalSection ) {
    m_pCriticalSection = &criticalSection;
    EnterCriticalSection( m_pCriticalSection ); // acquire on construction
}
BlockThread::~BlockThread() {
    LeaveCriticalSection( m_pCriticalSection ); // release on destruction
}
A class such as this allows you to protect specific sections of code where shared memory is being used. If another thread currently holds the critical section, a thread reaching this code will be blocked until the current holder is done its work and its BlockThread goes out of scope.
Using this class within another class is fairly simple: in the .cpp file of the class in which you want to block a thread, create a static variable of this type and call the API function to initialize it. Then you can use the BlockThread class to lock that thread.
SomeClass.cpp
#include "SomeClass.h"
#include "BlockThread.h"
static CRITICAL_SECTION s_criticalSection;
SomeClass::SomeClass() {
    // Do This First
    InitializeCriticalSection( &s_criticalSection );
    // Class Stuff Here
}
SomeClass::~SomeClass() {
    // Class Stuff Here
    // Do This Last
    DeleteCriticalSection( &s_criticalSection );
}
// To Use The BlockThread
void SomeClass::anyFunction() {
    // When Your Condition Is Met & You Know This Is Critical
    // Call This Before The Critical Computation Code.
    BlockThread blockThread( s_criticalSection );
    // ... critical computation on the shared data ...
}
And that is about it: the static CRITICAL_SECTION is initialized in the class's constructor and deleted in its destructor, and when the BlockThread object goes out of scope its destructor leaves the critical section.
Now this shared memory can be used safely. You would usually want to use this class when traversing containers to add, insert, or find and access elements while the data is shared between threads.
As for the 3 different threads working on the same data set, a good approach is to have 3 or 4 buffers, each about 4 MB in size, and have them work in rotating order. Buff1 gets data, then Buff2 gets data; while Buff2 is filling, Buff1 is either parsing its data or passing it off to be stored for computation; then Buff1 waits until Buff3 or Buff4 is done, depending on how many buffers you have. This is the same principle used with sound buffers when streaming audio files, or when sending batches of triangles to a graphics card. In other words, it is a batch-type process.

Related

What is the best way to synchronize data when using QQuickFramebufferObject?

So I am using a QQuickFramebufferObject and QQuickFramebufferObject::Renderer in my Qt application. As mentioned here:
To avoid race conditions and read/write issues from two threads it is important that the renderer and the item never read or write shared variables. Communication between the item and the renderer should primarily happen via the QQuickFramebufferObject::Renderer::synchronize() function.
So I have to synchronize whatever data I render when QQuickFramebufferObject::Renderer::synchronize() is called. However, because the data that is sent to the render thread can often be quite large, I would like to avoid copying it (it is stored in a DataObject), so for now I am passing a std::shared_ptr<DataObject> in the function and assigning it to a private member of my QQuickFramebufferObject::Renderer class. This approach works fine, but I am not sure it is the "correct" way of doing things. What approach can I take in order to share or transfer the data between the GUI thread and the rendering thread?
For data that is too big to copy in the synchronize() method, use a synchronization object to manage access to the data: lock it when writing, release it when finished, and lock it again when rendering while accessing the data directly. You are safe as long as only one thread is accessing the data at a time.
The risk of skipped frames increases the longer the synchronization object is locked. Locking for writing longer than half the optimal render quantum (~8.3 ms = ~16.7 ms / 2) will incur dropped frames, but you probably have much more happening in your app, so the real budget is lower.
Alternatively, you could use a circular buffer for your large data structures with a protected index variable so you can simultaneously write to one structure while reading from another. Increment the index variable when all data is ready to display and call QQuickItem::update().

How to templatize code for a class member function based on constructor parameter

I have a highly performance-sensitive (read low latency requirement) C++ 17 class for logging that has member functions that can either log locally or can log remotely depending upon the flags with which the class is implemented. "Remote Logging" or "Local Logging" functionality is fully defined at the time when the object is constructed.
The code looks something like this
class Logger {
public:
    Logger(bool aIsTx) : isTx_(aIsTx) { init(); }
    ~Logger() {}
    uint16_t fbLog(const fileId_t aId, const void *aData, const uint16_t aSz) {
        if (isTx_)
            // do remote logging
            return remoteLog(aId, aData, aSz);
        else
            // do local logging
            return fwrite(aData, aSz, 1, fd_[aId]);
    }
protected:
    bool isTx_;
};
What I would like to do is
Some way of removing the if (isTx_) so that the code path to use is fixed at the time of instantiation.
Since the class objects are used by multiple other modules, I would not like to templatize the class, because that would require wrapping two templatized implementations of the class in an interface, which would result in a v-table call every time a member function is called.
You cannot "templatize" the behaviour, since you want the choice to be made at runtime.
In case you want to get rid of the if for performance reasons, rest assured that it will have negligible impact compared to disk access or network communication. The same goes for the virtual function call.
If you need low latency, I recommend considering asynchronous logging: The main thread would simply copy the message into an internal buffer. Memory is way faster than disk or network, so there will be much less latency. You can then have a separate service thread that waits for the buffer to receive messages, and handles the slow communication.
As a bonus, you don't need branches or virtual functions in the main thread since it is the service thread that decides what to do with the messages.
Asynchronicity is not an easy approach, however. There are many cases that must be taken into consideration:
How to synchronise access to the buffer (I suggest trying out a lock-free queue).
How much memory should the buffer be allowed to occupy? Without a limit, it can consume too much if the program logs faster than the messages can be written out.
If the buffer limit is reached, what should the main thread do? It either needs to fall back to waiting synchronously while the buffer is being processed, or messages need to be discarded.
How to flush the buffer when the program crashes? If that is not possible, the last messages may be lost - and those are probably exactly the ones you need to figure out why the program crashed in the first place.
Regardless of choice: If performance is critical, then try out multiple approaches and measure.

"live C++ objects that live in memory mapped files"?

So I read this interview with John Carmack in Gamasutra, in which he talks about what he calls "live C++ objects that live in memory mapped files". Here are some quotes:
JC: Yeah. And I actually get multiple benefits out of it in that... The last iOS Rage project, we shipped with some new technology that's using some clever stuff to make live C++ objects that live in memory mapped files, backed by the flash file system on here, which is how I want to structure all our future work on PCs.
...
My marching orders to myself here are, I want game loads of two seconds on our PC platform, so we can iterate that much faster. And right now, even with solid state drives, you're dominated by all the things that you do at loading times, so it takes this different discipline to be able to say "Everything is going to be decimated and used in relative addresses," so you just say, "Map the file, all my resources are right there, and it's done in 15 milliseconds."
(Full interview can be found here)
Does anybody have any idea what Carmack is talking about and how you would set up something like this? I've searched the web for a bit but I can't seem to find anything on this.
The idea is that you have all or part of your program state serialized into a file at all times by accessing that file via memory mapping. This requires that you not use ordinary pointers, because pointers are only valid while your process lasts. Instead you have to store offsets from the start of the mapping, so that when you restart the program and remap the file you can continue working with it. The advantage of this scheme is that you don't have a separate serialization step, which means you don't have extra code for it, and you don't need to save all the state at once - instead all (or most) of your program state is backed by the file at all times.
You'd use placement new, either directly or via custom allocators.
Look at EASTL for an implementation of (subset) STL that is specifically geared to working well with custom allocation schemes (such as required for games running on embedded systems or game consoles).
A free subset of EASTL is here:
http://gpl.ea.com/
a clone at https://github.com/paulhodge/EASTL
We have used for years something we call "relative pointers", which is a kind of smart pointer. It is inherently nonstandard, but works nicely on most platforms. It is structured like this:
template<class T>
class rptr
{
    size_t offset;
public:
    T* operator->() {
        return reinterpret_cast<T*>(reinterpret_cast<char*>(this) + offset);
    }
};
This requires that all objects are stored in the same shared memory (which can be a file mapping too). It also usually requires us to store only our own compatible types in there, as well as having to write our own allocators to manage that memory.
To always have consistent data, we use snapshots via copy-on-write mmap tricks (which work in userspace on Linux; no idea about other OSes).
With the big move to 64-bit we also sometimes just use fixed mappings, as the relative pointers incur some runtime overhead. With the usual 48 bits of address space, we choose a reserved memory area for our applications that we always map such a file to.
This reminds me of a file system I came up with that loaded level files off CD in an amazingly short time (it improved the load time from tens of seconds to near-instantaneous), and it works on non-CD media as well. It consisted of three versions of a class wrapping the file I/O functions, all with the same interface:
class IFile
{
public:
    IFile (class FileSystem &owner);
    virtual void Seek (...);
    virtual size_t Read (...);
    virtual size_t GetFilePosition ();
};
and an additional class:
class FileSystem
{
public:
    void BeginStreaming (filename);
    void EndStreaming ();
    IFile *CreateFile ();
};
and you'd write the loading code like:
void LoadLevel (levelname)
{
    FileSystem fs;
    fs.BeginStreaming (levelname);
    IFile *file = fs.CreateFile (level_map_name);
    ReadLevelMap (fs, file);
    delete file;
    fs.EndStreaming ();
}
void ReadLevelMap (FileSystem &fs, IFile *file)
{
    // read some data from fs
    // get names of other files to load (like textures, object definitions, etc...)
    for each texture file
    {
        IFile *texture_file = fs.CreateFile (some other file name);
        CreateTexture (texture_file);
        delete texture_file;
    }
}
Then, you'd have three modes of operation: debug mode, stream file build mode and release mode.
In each mode, the FileSystem object would create different IFile objects.
In debug mode, the IFile object just wrapped the standard IO functions.
In stream-file-build mode, the IFile object also wrapped the standard I/O, but had the additional job of writing every byte that was read to the stream file (the owner FileSystem opened the stream file), and of writing the return value of any file-pointer position queries (so if anything needed to know a file size, that information was written to the stream file). This would effectively concatenate the various files into one big file, containing only the data that was actually read.
The release mode would create an IFile that did not open files or seek within files; it just read from the streaming file (as opened by the owner FileSystem object).
This means that in release mode, all data is read in one sequential series of reads (which the OS buffers nicely) rather than lots of seeks and reads. This is ideal for CDs, where seek times are really slow. Needless to say, this was developed for a CD-based console system.
A side effect is that the data is stripped of unnecessary metadata that would normally be skipped.
It does have drawbacks: all the data for a level is in one file. These files can get quite large, and the data can't be shared between them - if you had a set of textures, say, that were common across two or more levels, the data would be duplicated in each stream file. Also, the load process must be the same every time the data is loaded; you can't conditionally skip or add elements to a level.
As Carmack indicates, the loading code of many games (and other applications) is structured as a lot of small reads and allocations.
Instead of doing this, you do a single fread (or equivalent) of, say, a level file into memory and just fix up the pointers afterwards.

proper way to use lock file(s) as locks between multiple processes

I have a situation where 2 different processes (mine in C++, the other done by other people in Java) are a writer and a reader of some shared data file. So I was trying to avoid race conditions by writing a class like this (EDIT: this code is broken, it was just an example):
class ReadStatus
{
    bool canRead;
public:
    ReadStatus()
    {
        if (filesystem::exists(noReadFileName))
        {
            canRead = false;
            return;
        }
        ofstream noWriteFile;
        noWriteFile.open (noWriteFileName.c_str());
        if ( ! noWriteFile.is_open())
        {
            canRead = false;
            return;
        }
        boost::this_thread::sleep(boost::posix_time::seconds(1));
        if (filesystem::exists(noReadFileName))
        {
            filesystem::remove(noWriteFileName);
            canRead = false;
            return;
        }
        canRead = true;
    }
    ~ReadStatus()
    {
        if (filesystem::exists(noWriteFileName))
            filesystem::remove(noWriteFileName);
    }
    inline bool OKToRead()
    {
        return canRead;
    }
};
usage:
ReadStatus readStatus; //RAII FTW
if ( ! readStatus.OKToRead())
return;
This is for one program, of course; the other will have an analogous class.
Idea is:
1. Check whether the other program has created its "I'm the owner" file; if it has, break, else go to 2.
2. Create my "I'm the owner" file, and check again whether the other program has created its own; if it has, delete my file and break, else go to 3.
3. Do my reading, then delete my "I'm the owner" file.
Please note that the rare occasions when neither reads nor writes are OK, but the problem is that I still see a small chance of a race condition: theoretically the other program can check for the existence of my lock file and see that there isn't one; then I create mine; the other program creates its own, but before the FS creates its file I check again, and it isn't there - then disaster occurs. This is why I added the one-second delay, but as a CS nerd I find it unnerving to have code like that running.
Of course I don't expect anybody here to write me a solution, but I would be happy if someone knows a link to reliable code that I can use.
P.S. It has to be files, because I'm not writing the entire project and that is how it was arranged to be done.
P.P.S.: access to the data file isn't reader, writer, reader, writer...; it can be reader, reader, writer, writer, writer, reader, writer...
P.P.P.S.: the other process is not written in C++ :(, so Boost is out of the question.
On Unices the traditional way of doing pure filesystem based locking is to use dedicated lockfiles with mkdir() and rmdir(), which can be created and removed atomically via single system calls. You avoid races by never explicitly testing for the existence of the lock --- instead you always try to take the lock. So:
lock:
while mkdir(lockfile) fails
sleep
unlock:
rmdir(lockfile)
I believe this even works over NFS (which usually sucks for this sort of thing).
However, you probably also want to look into proper file locking, which is loads better; I use F_SETLK/F_UNLCK fcntl locks for this on Linux (note that these are different from flock locks, despite the name of the structure). This allows you to properly block until the lock is released. These locks also get automatically released if the app dies, which is usually a good thing. Plus, these will let you lock your shared file directly without having to have a separate lockfile. This, too, works on NFS.
Windows has very similar file locking functions, and it also has easy to use global named semaphores that are very convenient for synchronisation between processes.
As far as I've seen, you can't reliably use file existence checks as locks between multiple processes. The problem is that while you are creating the file in one process, you might get interrupted and the OS switches to the other process because the I/O is taking so long. The same holds true for deletion of the lock file.
If you can, take a look at Boost.Interprocess, under the synchronization mechanisms part.
While I'm generally against making API calls which can throw from a constructor/destructor (see the docs on boost::filesystem::remove), or making throwing calls without a catch block in general, that's not really what you were asking about.
You could check out the Overlapped IO library if this is for windows. Otherwise have you considered using shared memory between the processes instead?
Edit: just saw the other process is Java. You may still be able to create a named mutex that can be shared between processes and use it to create locks around the file I/O bits so they have to take turns writing. Sorry, I don't know Java, so I have no idea whether that's more feasible than shared memory.

Lockless reader/writer

I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};
class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }
    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer
        while (true)
        {
            SValues* pValues = (SValues*)::InterlockedExchange((long*)&m_pValues, 0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }
    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it
        ::InterlockedExchange((long*)&m_pValues, (long)pValues);
    }
private:
    SValues* m_pValues;
    SValues m_values;
};
void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it thread-safe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes, and I cannot tell in advance whether there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless it does not inflict a penalty on writing.
I was thinking of having reads acquire the pointer without an interlocked instruction (along with a sequence number), copy the data, and then have a way of telling whether the sequence number has changed, in which case the read should retry. This would require some memory barriers, though, and I don't know whether it would improve the speed.
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that atomic operations are something that C++ compilers will often reorder.
Generally I would solve the issue like this:
Multiple-producer-single-consumer: have one single-producer-single-consumer queue per writing thread, so each thread writes into its own queue. A single consumer thread then gathers the produced data and stores it in single-consumer-multiple-reader data storage. Implementing this is a lot of work, and it is only recommended if you are building a time-critical application and have the time to put into this solution.
There is more to read up on about this, since the implementation is platform specific:
Atomic etc operations on windows/xbox360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2