Methodology for debugging a serial poll - C++

I'm reading values from a sensor via serial (accelerometer values), in a loop similar to this:
while (1) {
    vector values = getAccelerometerValues();
    // Calculate velocity
    // Calculate total displacement
    if (displacement == 0)
        print("Back at origin");
}
I know the time that it takes per sample, which is taken care of in the getAccelerometerValues(), so I have a time-period to calculate velocity, displacement etc. I sample at approximately 120 samples / second.
This works, but there are bugs (non-precise accelerometer values, floating-point errors, etc), and calibrating and compensating to get reasonably accurate displacement values is proving difficult.
I'm having great amounts of trouble finding a process to debug the loop. If I use a debugger (my code happens to be written in C++, and I'm slowly learning to use gdb rather than print statements), I have issues stepping through and pushing my sensor around to get an accelerometer reading at the point in time that the debugger executes the line. That is, it's very difficult to get the timing of "continue to next line" and "pushing sensor so it's accelerating" right.
I can use lots of print statements, which tend to fly past on-screen, but with this many samples that gets tedious, and it's difficult to work out where the problems are, particularly if there's more than one print statement per loop tick.
I can decrease the number of samples, which improves the readability of the program's output, but drastically decreases the reliability of the acceleration values I poll from the sensor; if I poll at 1 Hz, the chances of polling the accelerometer value while it's undergoing acceleration drop considerably.
Overall, I'm having issues stepping through the code with realistic data; I can step through it with non-useful data, or I can't really step through it with better data.
I'm assuming print statements aren't the best debugging method for this scenario. What's a better option? Are there any kinds of resources that I would find useful (am I missing something with gdb, or are there other tools that I could use)? I'm struggling to develop a methodology to debug this.

A sensible approach would be to use some interface for getAccelerometerValues() that you can substitute at either runtime or build-time, such as passing in a base-class pointer with a virtual method to override in the concrete derived class.
I can describe the mechanism in more detail if you need, but the ideal is to be able to run the same loop against:
real live data
real live data (and save it to file)
canned real data saved from a previous run
fake data you cooked up as a test case
Note in particular that the 'replay' version(s) should be easy to debug, if each call just returns the next data from the file.
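For illustration, here is a minimal sketch of that interface. The names (Sample, AccelerometerSource, the file format) are all hypothetical placeholders, and it assumes a plain text file is good enough for the recordings:

#include <fstream>

struct Sample { double x, y, z; };

class AccelerometerSource {
public:
    virtual ~AccelerometerSource() {}
    virtual Sample next() = 0;              // one call per loop tick
};

class LiveSource : public AccelerometerSource {
public:
    Sample next() override {
        // call your existing getAccelerometerValues() / serial code here
        return Sample{};
    }
};

class RecordingSource : public AccelerometerSource {
public:
    RecordingSource(AccelerometerSource& inner, const char* path)
        : inner_(inner), out_(path) {}
    Sample next() override {
        Sample s = inner_.next();
        out_ << s.x << ' ' << s.y << ' ' << s.z << '\n';   // save for later replay
        return s;
    }
private:
    AccelerometerSource& inner_;
    std::ofstream out_;
};

class ReplaySource : public AccelerometerSource {
public:
    explicit ReplaySource(const char* path) : in_(path) {}
    Sample next() override {
        Sample s{};
        in_ >> s.x >> s.y >> s.z;           // deterministic, easy to step through
        return s;
    }
private:
    std::ifstream in_;
};

The main loop then takes an AccelerometerSource& instead of calling getAccelerometerValues() directly, so you can single-step a ReplaySource run in gdb with exactly the data that caused the problem, as many times as you like.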

Create if blocks for the exact conditions you want to debug. For example, if you only care about when the accelerometer reads that you are moving left:
if (movingLeft(values)) {
    print("left");
}

The usual solution to this problem is recording. You capture sample sequences from your sensor in real time and store them to files. Then you train your system, debug your code, etc. using the recorded data. Finally, you connect the working code to the real data stream flowing live from the sensor.

I would debug your code with fake (i.e. random) values before everything else.
If the computations work as expected, then I would use the values read from the port.
Also, isn't there a way to read those values in a callback/push fashion, that is, to get your function called only when there's new, reliable data?
Edit: I don't know what libraries you are using, but in the .NET framework you can use the SerialPort class with its DataReceived event. That way you are sure to be using the most recent, reliable data.

Related

DirectShow IReferenceClock implementation

How exactly are you meant to implement an IReferenceClock that can be set via IMediaFilter::SetSyncSource?
I have a system that implements GetTime and AdviseTime, UnadviseTime. When a stream starts playing it sets a base time via AdviseTime and then increases Stream Time for each subsequent advise.
However, how am I supposed to know when a new graph has been run? I need to set a zero point for a given reference clock. Otherwise, if I create a reference clock and then, 10 seconds later, start the graph, I am in the position of not knowing whether I should be 10 seconds into playback or starting from 0. Obviously the base time will say that I am starting from 0, but have I just stalled for 10 seconds and do I need to drop a bunch of frames?
I really can't seem to figure out how to write a proper IReferenceClock so any hints or ideas would be hugely appreciated.
Edit: One example of a problem I am having is that I have 2 graphs and 2 videos. The audio from both videos is going to a null renderer, the video to a standard CLSID_VideoRenderer. Now, if I set the same reference clock on both and then run graph 1, all seems to be fine. However, if 10 seconds down the line I run graph 2, it runs as though its SetSyncSource were NULL for the first 10 seconds or so, until it has caught up with the other video.
Obviously if the graphs called GetTime to get their "base time" this would solve the problem, but this is not what I'm seeing happen. Both videos end up with a base time of 0 because that's the point I run them from.
It's worth noting that if I set no clock at all (or call SetDefaultSyncSource), then both graphs run as fast as they can. I assume this is due to the lack of an audio renderer...
However how am I supposed to know when a new graph has run?
The clock runs on its own; it is the graph that aligns its operation against the clock, not the other way around. The graph receives the outer Run call, then it checks the current clock time and assigns a base time, which is distributed among the filters, as "current clock time + some time for things to take off". The clock itself doesn't need the faintest idea about any of this; its task is to keep running and keep incrementing time.
In particular, clock time does not have to reset to zero at any time.
From documentation:
The clock's baseline—the time from which it starts counting—depends on the implementation, so the value returned by GetTime is not inherently meaningful. What matters is the delta from when the graph started running.
When an application calls IMediaControl::Run to run the filter graph, the Filter Graph Manager calls IMediaFilter::Run on each filter. To compensate for the slight amount of time it takes for the filters to start running, the Filter Graph Manager specifies a start time slightly in the future.
BaseClasses offer CBaseReferenceClock class, which you can use as reference implementation (in refclock.*).
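As a rough illustration of that contract only (this is not a full COM/IReferenceClock implementation, and not what CBaseReferenceClock does internally), the essential part of GetTime is simply monotonically increasing time in 100 ns units that never resets. MyClock and m_lastTime are hypothetical names:

#include <windows.h>
#include <dshow.h>

class MyClock
{
public:
    MyClock() : m_lastTime(0)
    {
        QueryPerformanceFrequency(&m_freq);
    }

    HRESULT GetTime(REFERENCE_TIME *pTime)
    {
        if (!pTime)
            return E_POINTER;

        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);

        // Convert the performance counter to 100 ns REFERENCE_TIME units.
        REFERENCE_TIME t =
            (REFERENCE_TIME)(now.QuadPart * 10000000 / m_freq.QuadPart);

        // Time must be monotonic: never report a smaller value than before.
        if (t < m_lastTime)
            t = m_lastTime;

        m_lastTime = *pTime = t;
        return S_OK;
    }

private:
    LARGE_INTEGER  m_freq;
    REFERENCE_TIME m_lastTime;
};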
Comment to your edit:
You are obviously not describing the case in full and are omitting important details. There is a simple test: you can instantiate the standard clock (CLSID_SystemClock) and use it on two regular graphs - they WILL run fine, even with time-separated Run calls.
I suspect that you are doing some syncing or matching between the graphs and are time-stamping the samples, also using the clock. Presumably you are doing something wrong at that point and then having a hard time fixing it through the clock.

How to diagnose / profile momentary performance hit, C++

Solved: For when simple profiling isn't effective enough, I have written a tool to show me where performance hits occur. Basic information about how the tool works is in the accepted answer below. The source can be found here: http://pastebin.com/ETiW8hE8 (be sure to turn debugging symbols on in the program you're testing)
I've built a game engine in C++ and I have noticed in one particular area of a level that there is a brief performance hit. The game will stop completely for about half a second, and then continue on merrily. I've tried to profile this, but it's difficult to isolate the condition since I also have to load the map and perform the in-game task that causes the performance hit. I can make a map load automatically and skip showing menus, etc., and compare those profile results against a set of similar control data (all the same steps but without actually triggering the performance hit), but it doesn't show anything obvious.
I'm using gmon to profile.
This is a large application with many, many classes and functions. The performance hit only happens once, so there's no way to just trigger the problem many times during one execution to saturate my profiling results in order to make the offending functions more obvious in the profiling results.
What else can I do?
What I would do is try to grab a stack sample in that half second when it's frozen.
This would require an alarm clock timer set to go off some small time in the future, like 100ms.
Then in some loop, like the frame display loop, that normally takes less than 100ms to repeat, keep resetting the timer.
That way, it will act as a watchdog that barks if you don't keep petting it.
Then, stick a breakpoint in the timer interrupt handler.
When it gets there, you know you're in the bad slice of time.
Then just display the call stack, and it should show you what the problem is.
You might have to repeat the process a few times.
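As a sketch of that watchdog idea on POSIX (on Windows you could do the same with a timer thread), assuming you have a frame loop you can edit and roughly 100 ms per frame; the names here are illustrative:

#include <csignal>
#include <sys/time.h>

static void watchdog_handler(int) {
    // Set a breakpoint here (`break watchdog_handler` in gdb); if it fires,
    // the current frame has already taken longer than the watchdog period,
    // so the call stack shows where the time is going.
}

static void arm_watchdog_ms(long ms) {
    struct itimerval tv = {};                 // it_interval stays zero: one-shot
    tv.it_value.tv_sec  = ms / 1000;
    tv.it_value.tv_usec = (ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &tv, nullptr);
}

void frame_loop() {
    std::signal(SIGALRM, watchdog_handler);
    for (;;) {                                // stand-in for your real game loop
        arm_watchdog_ms(100);                 // re-arm every frame ("pet the dog")
        // ... update and render one frame; under 100 ms the alarm never fires ...
    }
}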
You are not saying anything about whether your application is threaded, but I will assume that it is not.
As per Mike's suggestion, get insight by grabbing a stack trace at the moment it is frozen and seeing where it is stuck. With a bit of luck you can do that using pstack, like so:
while usleep 100000; do
    pstack processid
done > /tmp/stack.log
Should give you some output to go on -- my guess is that you are calling a blocking IO operation, like reading some assets from disk.

Multithreading a File Map into an Array of Buffers

I'm trying to work with nasty large xml and text documents: ~40GBs.
I'm using Visual Studio 2012 on Windows 7.
I'm going to use 'Xerces' to snag the header/'footer tag' from the xmls.
I want to map an area of the file, say.. 60-120MBs.
Split the map into (3 * processors/cores) equal parts, set each part up as a buffer, and load the buffers into an array.
Then, using (#processors/cores) while statements in new threads, I will synchronously count characters/lines/xml cycles while chewing through the buffer array. When one buffer is completed, the process will jump to the next 'available' buffer and the completed buffer will be dropped from memory. At the end I will add the total results into a project log.
Afterwards, I will reference the log, split the files by character count/size (or some other criterion) to the nearest line or cycle, and drop in the header and 'footer tag' for all the splits.
I'm doing this so I can import massive data to a MySQL server over a network with multiple computers.
My Question is, how do I create the buffer array and the file map with new threads?
Can I use :
win CreateFile
win CreateFileMapping
win MapViewOfFile
with standard ifstream operations and char buffers or should I opt something else?
Further clarification:
My thinking is that if I can have the hard drive streaming the file into memory from one place and in one direction, I can use the full processing power of the machine to chew through separate but equal buffers.
~Flavor: It's kind of like being a shepherd trying to scoop food out of one huge bin with 3-6 large buckets, with only two arms, for X sheep that need to stay inside the fenced area. But they all move at the speed of light.
A few ideas or pointers might help me along here.
Any thoughts are Most Welcome. Thanks.
while (getline(my_file, myStr))
{
    characterCount += myStr.length();
    lineCount++;
    if (my_file.eof()) {
        break;
    }
}
This was the only code at run time for the test: 2 hours, 30+ min. 45-50% total processor for the program running it, on a dual-core 1.6 GHz laptop with 2 GB RAM. Most of the RAM loaded right now is 600+ MB from ~50 tabs open in Firefox, Visual Studio at 60 MB, then etc.
IMPORTANT: During the test, the program running the code, which is only a window and a dialog box, seemed to dump its own working and private set of RAM, down to about 300 K, and didn't respond for the length of the test. I need to make another thread for the while statement, I'm sure. But this also means that NONE of the file was read into a buffer. The CPU was struggling for the entire run to keep up with the tiniest effort from the hard drive.
P.S. Further proof of CPU bottlenecking: it might take me 20 min to transfer that entire file to another computer over my wireless network, which includes the read process and a socket catch to the write process on the other computer.
UPDATE
I used this adorable little thing to go from the previous test time to about 15-20min which is in line with what Mats Petersson was saying.
// Note: the final partial read (fewer than bufferOne.size() bytes) makes the
// stream fail, so that last chunk is skipped by this loop and never counted.
while (my_file.read(&bufferOne[0], bufferOne.size()))
{
    int cc = my_file.gcount();
    for (int i = 0; i < cc; i++)
    {
        if (bufferOne[i] == '\n')
            lineCount++;
        characterCount++;
    }
    currentPercent = characterCount / onePercent;
    SendMessage(GetDlgItem(hDlg, IDC_GENPROGRESS), PBM_SETPOS, currentPercent, 0);
}
Granted this is a single loop and it actually behaved much more appropriately than the previous test. This test was ~800% faster than the tight loop shown above this one with Getline. I set the buffer for this loop at 20MB. I jacked this code from: SOF - Fastest Example
BUT...
I would like to point out that while polling the process in Resource Monitor and Task Manager, it clearly showed the first core at 75-90% usage, the second fluctuating between 25-50% (pretty standard for some minor background stuff that I have open), and the hard drive at... wait for it... 50%. Some 100% disk-time spikes, but also some lows at 25%. All of which basically means that splitting the buffer processing between two different threads could very well be a benefit. It will use all the system resources but... that's what I want. I'll update later today when I have the working prototype.
MAJOR UPDATE:
Finally finished my project after a bunch of learning. No file map needed, only a bunch of char vectors. I have successfully built a dynamically executing file-stream line and character counter.
The good news, went from the previous 10-15min marker to ~3-4min on a 5.8GB file, BOOYA!~
Very short answer: Yes, you can use those functions.
For reading data, it's likely the most efficient method to map the file content into memory, since it saves having to copy the memory into a buffer in the application, just read it straight into the place it's supposed to go. So, no problem as long as you have enough address space available - 64-bit machines should certainly have plenty, in a 32-bit system it may be more of a scarce resource - but for sections of a few hundred MB, it shouldn't be a huge issue.
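For illustration, here is a minimal sketch of the calls named in the question. Error handling is trimmed, the file name and the 64 MB window size are just placeholders, and the offset must be a multiple of the system allocation granularity (64 KB on typical systems):

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main()
{
    // Open the big file for sequential, read-only access.
    HANDLE file = CreateFileA("huge.xml", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING,
                              FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

    // Map one 64 MB window starting at 'offset' (pass 0 as the size to map to
    // the end of the file instead). This assumes the file is at least that big.
    const uint64_t offset = 0;
    const size_t   window = 64u * 1024 * 1024;
    const char* view = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ,
                      static_cast<DWORD>(offset >> 32),
                      static_cast<DWORD>(offset & 0xFFFFFFFFu),
                      window));

    // Scan the mapped view directly: no read() into an intermediate buffer.
    size_t lines = 0;
    for (size_t i = 0; i < window; ++i)
        if (view[i] == '\n')
            ++lines;
    std::printf("lines in window: %lu\n", static_cast<unsigned long>(lines));

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}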
However, I'm not at all convinced about using multiple threads for the reading. I have a fair idea that reading more than one part of a very large file at the same time will be counterproductive: it will increase the amount of head movement on the disk, and head movement is a large part of what limits transfer rate. You can count on some 50-100 MB/s transfer rates for "ordinary" systems. If the system has some sort of RAID controller, maybe around double that - very exotic RAID controllers may achieve three times.
So reading 40GB will take somewhere in the order of 3-15 minutes.
The CPU is probably not going to be very busy, and running multiple threads is quite likely to worsen the overall performance of the system.
You may want to keep a thread for reading and one for writing, and only actually write out the data once you have a sufficient amount of it, again, to avoid unnecessary moves of the read/write head on the disk(s).

Achieving game engine determinism with threading

I would like to achieve determinism in my game engine, in order to be able to save and replay input sequences and to make networking easier.
My engine currently uses a variable timestep: every frame I calculate the time it took to update/draw the last one and pass it to my entities' update method. This makes 1000 FPS games seem as fast as 30 FPS games, but introduces non-deterministic behavior.
A solution could be fixing the game to 60FPS, but it would make input more delayed and wouldn't get the benefits of higher framerates.
So I've tried using a thread (which constantly calls update(1) then sleeps for 16ms) and draw as fast as possible in the game loop. It kind of works, but it crashes often and my games become unplayable.
Is there a way to implement threading in my game loop to achieve determinism without having to rewrite all games that depend on the engine?
You should separate game frames from graphical frames. The graphical frames should only display the graphics, nothing else. For the replay it won't matter how many graphical frames your computer was able to execute, be it 30 per second or 1000 per second, the replaying computer will likely replay it with a different graphical frame rate.
But you should indeed fix the gameframes. E.g. to 100 gameframes per second. In the gameframe the game logic is executed: stuff that is relevant for your game (and the replay).
Your gameloop should execute graphical frames whenever there is no game frame necessary, so if you fix your game to 100 gameframes per second that's 0.01 seconds per gameframe. If your computer only needed 0.001 to execute that logic in the gameframe, the other 0.009 seconds are left for repeating graphical frames.
This is a small but incomplete and not 100% accurate example:
uint16_t const GAME_FRAMERATE = 100;
uint16_t const SKIP_TICKS = 1000 / GAME_FRAMERATE;

Timer sinceLoopStarted = Timer(); // Millisecond timer starting at 0
unsigned long next_game_tick = sinceLoopStarted.getMilliseconds();

while (gameIsRunning)
{
    //! Game Frames
    while (sinceLoopStarted.getMilliseconds() > next_game_tick)
    {
        executeGamelogic();
        next_game_tick += SKIP_TICKS;
    }

    //! Graphical Frames
    render();
}
The following link contains very good and complete information about creating an accurate gameloop:
http://www.koonsolo.com/news/dewitters-gameloop/
To be deterministic across a network, you need a single point of truth, commonly called "the server". There is a saying in the game community that goes "the client is in the hands of the enemy". That's true. You cannot trust anything that is calculated on the client for a fair game.
If, for example, your game gets easier when for some reason your thread only updates 59 times a second instead of 60, people will find out. Maybe at the start they won't even be malicious; they just had their machines under full load at the time and your process didn't get its 60 updates a second.
Once you have a server (maybe even in-process as a thread in single player) that does not care about graphics or update cycles and runs at its own speed, it's deterministic enough to at least get the same results for all players. It might still not be 100% deterministic, because the computer is not real-time: even if you tell it to update at a given frequency, it might not, due to other processes on the computer taking too much load.
The server and clients need to communicate, so the server needs to send a copy of its state (for performance, maybe a delta from the last copy) to each client. The client can draw this copy at the best speed available.
If your game is crashing with the thread, maybe it's an option to actually put "the server" out of process and communicate via the network; that way you will find out pretty fast which variables would have needed locks, because once you move them to another project your client will no longer compile.
Separate game logic and graphics into different threads. The game logic thread should run at a constant speed (say, updating 60 times per second, or even higher if your logic isn't too complicated, to achieve smoother gameplay). Then your graphics thread should always draw the latest info provided by the logic thread, as fast as possible, to achieve high framerates.
In order to prevent partial data from being drawn, you should probably use some sort of double buffering, where the logic thread writes to one buffer, and the graphics thread reads from the other. Then switch the buffers every time the logic thread has done one update.
This should make sure you're always using the computer's graphics hardware to its fullest. Of course, this does mean you're putting constraints on the minimum cpu speed.
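As a hedged sketch of that double-buffering handoff (GameState, the commented-out update/draw calls, and the 16 ms step are all placeholders; a real engine would also interpolate between states for rendering):

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

struct GameState { /* positions, animations, ... */ };

GameState buffers[2];
int writeIndex = 0;                        // the logic thread writes buffers[writeIndex]
std::mutex swapMutex;
std::atomic<bool> running{true};

void logicThread() {
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    const auto step = std::chrono::milliseconds(16);      // ~60 fixed updates per second
    while (running) {
        // updateGame(buffers[writeIndex], step);          // fixed-timestep game logic
        {
            std::lock_guard<std::mutex> lock(swapMutex);
            writeIndex = 1 - writeIndex;                   // publish the finished state
            buffers[writeIndex] = buffers[1 - writeIndex]; // next update starts from it
        }
        next += step;
        std::this_thread::sleep_until(next);
    }
}

void renderThread() {
    while (running) {
        GameState snapshot;
        {
            std::lock_guard<std::mutex> lock(swapMutex);
            snapshot = buffers[1 - writeIndex];            // latest completed state
        }
        // drawGame(snapshot);                              // draw as fast as possible
    }
}

int main() {
    std::thread logic(logicThread);
    std::thread render(renderThread);
    std::this_thread::sleep_for(std::chrono::seconds(5));  // stand-in for a real session
    running = false;
    logic.join();
    render.join();
}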
I don't know if this will help but, if I remember correctly, Doom stored your input sequences and used them to generate the AI behaviour and some other things. A demo lump in Doom would be a series of numbers representing not the state of the game, but your input. From that input the game would be able to reconstruct what happened and, thus, achieve some kind of determinism ... Though I remember it going out of sync sometimes.

Platform independent parallelization without changing the framework?

I hope the title did not mislead you.
My problem is the following: currently I am trying to speed up a raytracer with the help of the graphics card. It works fine, apart from the fact that it actually got slower this way. :)
This is caused by the fact that I trace one ray against the whole geometry at a time on the graphics card (my "tracing server") and then fetch the result, which is awfully slow. So I have to gather a batch of rays, calculate them together and fetch the results in one go to speed this up.
The next problem is that I am not allowed to rewrite the surrounding framework, which should know nothing, or as little as possible, about this parallelization.
So here is my approach:
I thought about using several threads, where each one gets a ray and requests my "tracing server" to calculate the intersections. Each thread then blocks until enough rays have been gathered to calculate the intersections on the graphics card and fetch the results back efficiently. This means that each thread will wait until the results have been fetched.
You see I already have some plan but following I do not know:
Which threading framework should I use to be platform-independent?
Should I use a thread pool of fixed size, or create threads as needed?
Can any given thread library handle at least 1000 waiting threads (because that would be the number I need to gather for my fetch to be efficient)?
But I also could imagine doing this with one thread that
dumps its load (a new ray) to the "tracing server" and fetches the next load until
there is enough to fetch the results.
Then the thread would take the results one by one, do the further calculations until all results are processed and then goes back to step one until all rays are done.
Also if you have some better idea how to parallelize this, tell me about it.
Regards,
Nobody
PS
If you need this information: The two platforms I want to use are Linux and Windows.
Use either Threading Building Blocks or boost::thread.
http://www.boost.org/doc/libs/1_46_0/doc/html/thread.html
http://threadingbuildingblocks.org/
As far as thread pool vs. on-demand threads goes, a thread pool is generally the better idea, as it avoids creation overhead.
The number of waiting threads you can have will depend on the underlying system more than anything else:
Maximum number of threads per process in Linux?
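As a sketch of the batching idea from the question, assuming C++11's std::thread primitives are acceptable (boost has near-identical equivalents); Ray, HitResult and traceBatchOnGpu() are placeholders for your own types and your "tracing server" call:

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

struct Ray {};
struct HitResult {};

// Stub standing in for the GPU "tracing server": traces a whole batch at once.
std::vector<HitResult> traceBatchOnGpu(const std::vector<Ray>& rays) {
    return std::vector<HitResult>(rays.size());
}

class BatchTracer {
public:
    explicit BatchTracer(std::size_t batchSize) : batchSize_(batchSize) {}

    // Called from any worker thread: queue one ray and block until its batch
    // has been traced, then return this ray's result.
    HitResult trace(const Ray& ray) {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t myGeneration = generation_;
        const std::size_t index = pending_.size();
        pending_.push_back(ray);

        if (pending_.size() == batchSize_) {
            // This thread completes the batch, so it dispatches it. Holding the
            // lock during the GPU call keeps the sketch simple; new rays simply
            // queue up on the mutex in the meantime.
            results_ = traceBatchOnGpu(pending_);
            pending_.clear();
            ++generation_;
            done_.notify_all();
        } else {
            done_.wait(lock, [&] { return generation_ != myGeneration; });
        }
        // Caveat: a production version should store results per batch, so a
        // quickly completed *next* batch cannot overwrite results_ before all
        // waiters of the previous batch have read their entry.
        return results_[index];
    }

private:
    const std::size_t batchSize_;
    std::size_t generation_ = 0;
    std::vector<Ray> pending_;
    std::vector<HitResult> results_;
    std::mutex mutex_;
    std::condition_variable done_;
};

With something like this, a pool of worker threads (or the single-thread variant you describe) can keep the GPU busy in batches of ~1000 rays while the surrounding framework keeps calling what looks like an ordinary per-ray trace function.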