What is the proper way to calculate latency in omnet++? - c++

I have written a simulation module. For measuring latency, I am using this:
simTime().dbl() - tempLinkLayerFrame->getCreationTime().dbl();
Is this the proper way? If not, please suggest a better one; sample code would be very helpful.
Also, is the latency computed from simTime() the actual latency in microseconds, which I can report in my research paper, or do I need to scale it?
Also, I found that the channel data rate and channel delay have no impact on the link latency; instead, the latency varies when I vary the trigger duration. For example:
timer = new cMessage("SelfTimer");
scheduleAt(simTime() + 0.000000000249, timer);
If this is not the proper way to trigger a simple module repeatedly, please suggest one.

Assuming both simTime and getCreationTime use the OMNeT++ class for representing time (simtime_t), you can operate on them directly, without converting to double first, because that class overloads the relevant operators. Going with what the manual says, I'd recommend using a signal for the measurements (e.g., emit(latencySignal, simTime() - tempLinkLayerFrame->getCreationTime());).
simTime() is in seconds, not microseconds, so multiply the difference by 1e6 if you want to report microseconds.
Regarding your last question, this code will have problems if you use it for all nodes, and you start all those nodes at the same time in the simulation. In that case you'll have perfect synchronization of all nodes, meaning you'll only see collisions in the first transmission. Therefore, it's probably a good idea to add a random jitter to every newly scheduled message at the start of your simulation.
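A minimal sketch of the jitter idea, written in plain C++ rather than as an OMNeT++ module so that it is self-contained. The function name, the seed and the jitter bound are assumptions for illustration; inside a real module you would more likely draw the offset from the simulator's own random number streams so that runs stay reproducible.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: first trigger times (in seconds) for nodeCount nodes.
// Each node fires after `period` plus a random offset in [0, maxJitter),
// so the nodes are not perfectly synchronized.
std::vector<double> jitteredStartTimes(int nodeCount, double period,
                                       double maxJitter, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> jitter(0.0, maxJitter);
    std::vector<double> starts;
    starts.reserve(nodeCount);
    for (int i = 0; i < nodeCount; ++i) {
        starts.push_back(period + jitter(rng));
    }
    return starts;
}
```

For example, jitteredStartTimes(4, 0.000000000249, 0.0000000000249, 42) gives four start times spread over a window of roughly 10% of the trigger period used in the question; the 10% figure is only a placeholder.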

Related

STM32 (using Mbed online) showing delay at higher analog input frequency

I am new to the use of controllers.
I am setting up an STM32F769 controller (using the Mbed online compiler). My goal is to get a PWM output whose frequency changes according to an analog input. I did some basic coding, but there is a problem. When I check the output on an oscilloscope with a 1 Hz analog input, it works perfectly, but with a 100 Hz analog input there is a delay in the output and I get wrong values. I do not understand why, because this board is fast (216 MHz) and I should not face such an issue. (Could someone also explain whether it is possible to run the board at 216 MHz or another maximum frequency, and how?)
1st time user
{
meas_r=0;
for(int i=1;i<=1024;i++)
{
meas_r = meas_r+analog_value.read();
}
meas_r=meas_r/1024;
meas_v = meas_r * 3300;
out_freq=50000+(meas_v*50);
pulse.period( 1.0 / out_freq);
}
}
It should be working on 100Hz analog input as it works on 1 Hz.
216 MHz may be the maximum clock frequency at which your processor can operate, but that does not mean it can sample or drive signals on its ports at anything like that rate.
The delays are caused by the time it takes to read the analog values and perform the required math. You are using several multiplications and divisions, which are more expensive than additions and subtractions on almost any hardware. You are also using library calls (pulse.period(), analog_value.read()), which hide further computation on top of those multiplications and divisions. Finally, your device may be doing other work as well (only you know about this). All of these computations take time. At lower input frequencies you may not notice the delay, but once the frequency is high enough it becomes visible. Also consider the time required to read the analog value 1024 times for every update.
The wrong signal and period are due to these delays and some other uncertainties. If the processor is handling other tasks as well, it is hard to predict how long it takes to finish them all. Because the processor executes instructions in sequence and waits for each computation to finish before starting the next, timing becomes uncertain. The data path and the frequency of peripheral devices (getting input from peripherals) play a crucial role in timing uncertainty and delays.
If timing and accuracy are really important for your problem, and you can't solve it with a DSP, MPU, MCU, CPU, GPU, etc., I would suggest using an FPGA.

How do I measure GPU time on Metal?

I want to see programmatically how much GPU time a part of my application consumes on macOS and iOS. On OpenGL and D3D I can use GPU timer query objects. I searched and couldn't find anything similar for Metal. How do I measure GPU time on Metal without using Instruments etc. I'm using Objective-C.
There are a couple of problems with this method:
1) You really want to know what is the GPU side latency within a command buffer most of the time, not round trip to CPU. This is better measured as the time difference between running 20 instances of the shader and 10 instances of the shader. However, that approach can add noise since the error is the sum of the errors associated with the two measurements.
2) Waiting for completion causes the GPU to clock down when it stops executing. When it starts back up again, the clock is in a low power state and may take quite a while to come up again, skewing your results. This can be a serious problem and may understate your performance in benchmark vs. actual by a factor of two or more.
3) If you start the clock on scheduled and stop on completed, but the GPU is busy running other work, then your elapsed time includes time spent on the other workload. If the GPU is not busy, then you get the clock down problems described in (2).
This problem is considerably harder to do right than most benchmarking cases I've worked with, and I have done a lot of performance measurement.
The best way to measure these things is to use on device performance monitor counters, as it is a direct measure of what is going on, using the machine's own notion of time. I favor ones that report cycles over wall clock time because that tends to weed out clock slewing, but there is not universal agreement about that. (Not all parts of the hardware run at the same frequency, etc.) I would look to the developer tools for methods to measure based on PMCs and if you don't find them, ask for them.
You can add scheduled and completed handler blocks to a command buffer. You can take timestamps in each and compare. There's some latency, since the blocks are executed on the CPU, but it should get you close.
With Metal 2.1, Metal now provides "events", which are more like fences in other APIs. (The name MTLFence was already used for synchronizing shared heap stuff.) In particular, with MTLSharedEvent, you can encode commands to modify the event's value at particular points in the command buffer(s). Then, you can either wait for the event to have that value or ask for a block to be executed asynchronously when the event reaches a target value.
That still has problems with latency, etc. (as Ian Ollmann described), but is more fine grained than command buffer scheduling and completion. In particular, as Klaas mentions in a comment, a command buffer being scheduled does not indicate that it has started executing. You could put commands to set an event's value at the beginning and (with a different value) at the end of a sequence of commands, and those would only notify at actual execution time.
Finally, on iOS 10.3+ but not macOS, MTLCommandBuffer has two properties, GPUStartTime and GPUEndTime, with which you can determine how much time a command buffer took to execute on the GPU. This should not be subject to latency in the same way as the other techniques.
As an addition to Ken's comment above, GPUStartTime and GPUEndTime are now available on macOS too (10.15+):
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1639926-gpuendtime?language=objc

freeRTOS scheduling configurations for tasks

I have my freeRTOS currently working on my Microzed board. I am using the Xilinx SDK as the software platform and until now I have been able to create tasks and assign priority.
I was just curious to know whether it is possible to assign a fixed time to each of my tasks, so that, for example, after 100 milliseconds the scheduler switches to the next task. So is it possible to set a fixed execution time for each of my tasks? As far as I checked, I could not find a way to do this. If there is any means of implementing it using the utilities of FreeRTOS, kindly let me know.
By default FreeRTOS will time slice tasks of equal priority, see http://www.freertos.org/a00110.html#configUSE_TIME_SLICING, but there is nothing to guarantee that each task gets an equal share of the CPU. For example, interrupts use an unknown amount of processing time during each time slice, and higher priority tasks can use part or all of a time slice.
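For reference, a sketch of the FreeRTOSConfig.h settings involved. The three macro names are standard FreeRTOS configuration options, but the values shown are assumptions for illustration, and real projects often wrap the tick rate in a cast to the port's tick type.

```cpp
// FreeRTOSConfig.h fragment (illustrative values, not taken from the question)
#define configUSE_PREEMPTION    1     // preemptive scheduler
#define configUSE_TIME_SLICING  1     // share the CPU between ready tasks of equal priority
#define configTICK_RATE_HZ      1000  // tick interrupt frequency in Hz
```

With time slicing enabled, the scheduler switches between ready tasks of equal priority on each tick interrupt, so the slice length is one tick period (1 ms with the value above). A 100 ms slice would therefore need a 10 Hz tick, which would also make every delay and timeout that coarse.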
Question for you though: why would you want the behaviour you requested? Maybe if you said what you were trying to achieve, rather than asking whether a feature exists, people would be able to make helpful suggestions.

DirectShow IReferenceClock implementation

How exactly are you meant to implement an IReferenceClock that can be set via IMediaFilter::SetSyncSource?
I have a system that implements GetTime and AdviseTime, UnadviseTime. When a stream starts playing it sets a base time via AdviseTime and then increases Stream Time for each subsequent advise.
However how am I supposed to know when a new graph has run? I need to set a zero point for a given reference clock. Otherwise if I create a reference clock and then, 10 seconds later, I start the graph I am now in the position that I don't know whether I should be 10 seconds down the playback or whether I should be starting from 0. Obviously the base time will say that I am starting from 0 but have I just stalled for 10 seconds and do I need to drop a bunch of frames?
I really can't seem to figure out how to write a proper IReferenceClock so any hints or ideas would be hugely appreciated.
Edit: One example of a problem I am having is that I have 2 graphs and 2 videos. The audio from both videos is going to a null renderer, the video to a standard CLSID_VideoRenderer. Now if I set the same reference clock on both and then run graph 1, all seems to be fine. However, if 10 seconds down the line I run graph 2, then it will run as though the SetSyncSource is NULL for the first 10 seconds or so, until it has caught up with the other video.
Obviously, if the graphs called GetTime to get their "base time" this would solve the problem, but this is not what I'm seeing happen. Both videos end up with a base time of 0 because that's the point I run them from.
It's worth noting that if I set no clock at all (or call SetDefaultSyncSource) then both graphs run as fast as they can. I assume this is due to the lack of an audio renderer ...
However how am I supposed to know when a new graph has run?
The clock runs on its own; it is the graph that aligns its operation against the clock, not the other way around. The graph receives the outer Run call, then it checks the current clock time and assigns the base time, which is distributed among the filters, as "current clock time + some time for the things to take off". The clock itself doesn't need to have the faintest idea about any of this; its task is to keep running and keep incrementing time.
In particular, clock time does not have to reset to zero at any time.
From documentation:
The clock's baseline—the time from which it starts counting—depends on the implementation, so the value returned by GetTime is not inherently meaningful. What matters is the delta from when the graph started running.
When an application calls IMediaControl::Run to run the filter graph, the Filter Graph Manager calls IMediaFilter::Run on each filter. To compensate for the slight amount of time it takes for the filters to start running, the Filter Graph Manager specifies a start time slightly in the future.
BaseClasses offer CBaseReferenceClock class, which you can use as reference implementation (in refclock.*).
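The relationship between clock time, base time and stream time described above can be illustrated with a small standalone C++ sketch. It does not use the DirectShow headers; GraphTiming and its members are hypothetical stand-ins, and only the 100-nanosecond unit mirrors DirectShow's REFERENCE_TIME.

```cpp
#include <cstdint>

using RefTime = std::int64_t;                    // 100-nanosecond units
constexpr RefTime kUnitsPerSecond = 10000000;

// Hypothetical stand-in for what a graph remembers about its start.
struct GraphTiming {
    RefTime baseTime = 0;

    // On Run: base time = current clock reading + a small startup lead.
    void run(RefTime clockNow, RefTime startupLead) {
        baseTime = clockNow + startupLead;
    }

    // Stream time = clock time relative to this graph's own base time.
    RefTime streamTime(RefTime clockNow) const {
        return clockNow - baseTime;
    }
};
```

If two graphs share one clock and the second is run 10 seconds after the first, then 15 seconds after the first start the first graph is at stream time 15 s and the second at 5 s, whatever absolute value the clock happens to report. The clock never needs to know that either graph started.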
Comment to your edit:
You are obviously not describing the case in full and are omitting important details. There is a simple test: you can instantiate the standard clock (CLSID_SystemClock) and use it on two regular graphs - they WILL run fine, even with time-separated Run times.
I suspect that you are doing some sync'ing or matching between the graphs and you are time stamping the samples, also using the clock. Presumably you are doing something wrong at that point and then you have a hard time fixing it through the clock.

Achieving game engine determinism with threading

I would like to achieve determinism in my game engine, in order to be able to save and replay input sequences and to make networking easier.
My engine currently uses a variable timestep: every frame I calculate the time it took to update/draw the last one and pass it to my entities' update method. This makes 1000FPS games seem as fast as 30FPS games, but introduces non-deterministic behavior.
A solution could be fixing the game to 60FPS, but it would make input more delayed and wouldn't get the benefits of higher framerates.
So I've tried using a thread (which constantly calls update(1) then sleeps for 16ms) and draw as fast as possible in the game loop. It kind of works, but it crashes often and my games become unplayable.
Is there a way to implement threading in my game loop to achieve determinism without having to rewrite all games that depend on the engine?
You should separate game frames from graphical frames. The graphical frames should only display the graphics, nothing else. For the replay it won't matter how many graphical frames your computer was able to execute, be it 30 per second or 1000 per second, the replaying computer will likely replay it with a different graphical frame rate.
But you should indeed fix the gameframes. E.g. to 100 gameframes per second. In the gameframe the game logic is executed: stuff that is relevant for your game (and the replay).
Your gameloop should execute graphical frames whenever there is no game frame necessary, so if you fix your game to 100 gameframes per second that's 0.01 seconds per gameframe. If your computer only needed 0.001 to execute that logic in the gameframe, the other 0.009 seconds are left for repeating graphical frames.
This is a small but incomplete and not 100% accurate example:
uint16_t const GAME_FRAMERATE = 100;                // game frames per second
uint16_t const SKIP_TICKS = 1000 / GAME_FRAMERATE;  // milliseconds per game frame
Timer sinceLoopStarted = Timer(); // Millisecond timer starting at 0
unsigned long next_game_tick = sinceLoopStarted.getMilliseconds();
while (gameIsRunning)
{
    //! Game frames: catch up on every game frame that is due
    while (sinceLoopStarted.getMilliseconds() > next_game_tick)
    {
        executeGamelogic();
        next_game_tick += SKIP_TICKS;
    }
    //! Graphical frames: render whenever no game frame is due
    render();
}
The following link contains very good and complete information about creating an accurate gameloop:
http://www.koonsolo.com/news/dewitters-gameloop/
To be deterministic across a network, you need a single point of truth, commonly called "the server". There is a saying in the game community that goes "the client is in the hands of the enemy". That's true. You cannot trust anything that is calculated on the client for a fair game.
If, for example, your game gets easier when for some reason your thread only updates 59 times a second instead of 60, people will find out. Maybe at the start they won't even be malicious. They just had their machines under full load at the time and your process didn't get to 60 times a second.
Once you have a server (maybe even in-process as a thread in single player) that does not care about graphics or update cycles and runs at its own speed, it's deterministic enough to at least get the same results for all players. It might still not be 100% deterministic, because the computer is not a real-time system. Even if you tell it to update every $frequency, it might not, due to other processes on the computer taking too much load.
The server and clients need to communicate, so the server needs to send a copy of its state (for performance, maybe a delta from the last copy) to each client. The client can draw this copy at the best speed available.
If your game is crashing with the thread, one option is to actually move "the server" out of process and communicate via the network. That way you will find out pretty fast which variables would have needed locks: once you move them to another project, your client will no longer compile.
Separate game logic and graphics into different threads. The game logic thread should run at a constant speed (say, it updates 60 times per second, or even higher if your logic isn't too complicated, to achieve smoother gameplay). Then, your graphics thread should always draw the latest info provided by the logic thread as fast as possible to achieve high framerates.
In order to prevent partial data from being drawn, you should probably use some sort of double buffering, where the logic thread writes to one buffer, and the graphics thread reads from the other. Then switch the buffers every time the logic thread has done one update.
This should make sure you're always using the computer's graphics hardware to its fullest. Of course, this does mean you're putting constraints on the minimum cpu speed.
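A minimal sketch of the double-buffering idea in C++. GameState and DoubleBuffer are hypothetical names, and this version publishes a copy of the back buffer under a mutex rather than swapping pointers; that is a simplification which is only reasonable while the state is cheap to copy.

```cpp
#include <mutex>

// Hypothetical game state; a real engine would hold entity data here.
struct GameState {
    long tick = 0;
    double playerX = 0.0;
};

class DoubleBuffer {
public:
    // Logic thread: apply one update to the back buffer, then publish it.
    template <typename Fn>
    void update(Fn&& step) {
        step(back_);                              // only the logic thread touches back_
        std::lock_guard<std::mutex> lock(mutex_);
        front_ = back_;                           // publish a complete state
    }

    // Render thread: copy out the latest published state.
    GameState snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return front_;
    }

private:
    mutable std::mutex mutex_;
    GameState front_;  // read by the render thread
    GameState back_;   // written by the logic thread
};
```

The render thread calls snapshot() as often as it likes and always sees a state from a finished update, never one that is half written.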
I don't know if this will help but, if I remember correctly, Doom stored your input sequences and used them to generate the AI behaviour and some other things. A demo lump in Doom would be a series of numbers representing not the state of the game, but your input. From that input the game would be able to reconstruct what happened and, thus, achieve some kind of determinism ... Though I remember it going out of sync sometimes.
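The input-recording approach can be sketched as follows, assuming the game logic advances in fixed steps and depends only on the previous state and the input for that step. World, Input, step and replay are hypothetical names; this illustrates the principle and is not how Doom itself is implemented.

```cpp
#include <vector>

// Hypothetical per-step input: -1, 0 or +1 for left / none / right.
using Input = int;

struct World {
    int x = 0;   // position
    int vx = 0;  // velocity
};

// One fixed game step. It uses only integer state and the input,
// never wall-clock time, so it gives the same result on every run.
void step(World& w, Input in) {
    w.vx += in;
    w.x += w.vx;
}

// Replaying the recorded inputs from the same initial state
// reconstructs the same final world.
World replay(const std::vector<Input>& inputs) {
    World w;
    for (Input in : inputs) {
        step(w, in);
    }
    return w;
}
```

Anything that feeds into step() from outside the recorded data (frame time, unseeded random numbers, floating-point results that differ between machines) is a likely way for a replay to drift out of sync.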