dtrace function time tracing without called functions that are also traced - dtrace

I would like to record how much CPU time each function uses. The caveat is that I only want the time used by the function itself, excluding the CPU time of any functions it calls that are also traced by my script. For example, if both foo and bar are traced by my script, the total CPU time for foo is 2000, and foo calls bar three times at a cost of 500 each, then I would like to see the following result:
function   cputime   call count
foo        500       1
bar        1500      3
Right now I have the following dtrace script to get the total cpu time per function, but I don't have any leads yet on how to change it such that I get the cpu time results as described above. (Note call count and output formatting is not yet in the script, but those are easy to add once I have the cpu time info that I am after.)
#!/usr/sbin/dtrace -s
pid$1:$2::entry
{
++self->call_depth[probefunc];
self->start[probefunc, self->call_depth[probefunc]] = timestamp;
self->vstart[probefunc, self->call_depth[probefunc]] = vtimestamp;
}
pid$1:$2::return
/self->start[probefunc, self->call_depth[probefunc]]/
{
@function_walltime[probefunc] = sum(timestamp - self->start[probefunc, self->call_depth[probefunc]]);
self->start[probefunc, self->call_depth[probefunc]] = 0;
@function_cputime[probefunc] = sum(vtimestamp - self->vstart[probefunc, self->call_depth[probefunc]]);
self->vstart[probefunc, self->call_depth[probefunc]] = 0;
--self->call_depth[probefunc];
}

Hope the following script can help:
#!/usr/sbin/dtrace -qs
pid$1:$2::entry
{
self->vstart[probefunc] = vtimestamp;
}
pid$1:$2::return
{
this->cputime = vtimestamp - self->vstart[probefunc];
/* Sub the caller function CPU time */
@function_cputime[ufunc(ucaller)] = sum(-(this->cputime));
/* Add the callee function (current function) CPU time */
@function_cputime[ufunc(uregs[R_PC])] = sum(this->cputime);
/* Add the callee function (current function) count */
@function_count[ufunc(uregs[R_PC])] = sum(1);
}

This is the program I ended up using:
#!/usr/sbin/dtrace -s
#pragma option quiet
pid$1:$2::entry
/self->start[probefunc] == 0/
{
this->call_depth = self->call_depth++;
self->func_pcs[this->call_depth] = uregs[R_PC];
self->start[probefunc] = timestamp;
self->vstart[probefunc] = vtimestamp;
@function_entry_count[ufunc(uregs[R_PC])] = count();
}
pid$1:$2::return
/self->start[probefunc]/
{
this->call_depth = --self->call_depth;
this->wall_elapsed = timestamp - self->start[probefunc];
self->start[probefunc] = 0;
this->cpu_elapsed = vtimestamp - self->vstart[probefunc];
self->vstart[probefunc] = 0;
@function_walltime_inc[ufunc(uregs[R_PC])] = sum(this->wall_elapsed);
@function_walltime_exc[ufunc(uregs[R_PC])] = sum(this->wall_elapsed);
@function_cputime_inc[ufunc(uregs[R_PC])] = sum(this->cpu_elapsed);
@function_cputime_exc[ufunc(uregs[R_PC])] = sum(this->cpu_elapsed);
@function_return_count[ufunc(uregs[R_PC])] = count();
}
pid$1:$2::return
/this->call_depth > 0/
{
this->caller_pc = self->func_pcs[this->call_depth - 1];
@function_walltime_exc[ufunc(this->caller_pc)] = sum(-(this->wall_elapsed));
@function_cputime_exc[ufunc(this->caller_pc)] = sum(-(this->cpu_elapsed));
}
dtrace:::END
{
/* normalize nanoseconds to milliseconds */
normalize(@function_walltime_inc, 1000000);
normalize(@function_walltime_exc, 1000000);
normalize(@function_cputime_inc, 1000000);
normalize(@function_cputime_exc, 1000000);
printf("\n");
printf("%-60s %21s %21s %25s\n", "", "INCLUSIVE", "EXCLUSIVE", "CALL COUNT");
printf("%-60s %10s %10s %10s %10s %12s %12s\n",
"MODULE`FUNCTION", "WALL [ms]", "CPU [ms]", "WALL [ms]", "CPU [ms]", "ENTRY", "RETURN");
printa("%-60A %@10d %@10d %@10d %@10d %@12d %@12d\n",
@function_walltime_inc, @function_cputime_inc,
@function_walltime_exc, @function_cputime_exc,
@function_entry_count, @function_return_count);
}
Note: I'm tracing both function entry count and return count as for some functions dtrace cannot instrument function returns properly, which completely messes up the call stack and hence the exclusive times. With both counts printed the problematic functions can be identified and if necessary removed from the tracing.
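The bookkeeping behind the exclusive times can be sketched outside of DTrace. The following is a minimal C++ model of the same idea, not a translation of the script: ExclusiveTimer and its members are hypothetical names, timestamps are passed in by hand, and unlike the script it does not skip recursive entries (so inclusive time is counted once per active frame).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the per-thread state the D script keeps.
struct ExclusiveTimer {
    struct Frame { std::string func; std::int64_t start; };

    std::vector<Frame> stack;                       // currently active traced calls
    std::map<std::string, std::int64_t> inclusive;  // time including traced callees
    std::map<std::string, std::int64_t> exclusive;  // time excluding traced callees
    std::map<std::string, std::int64_t> calls;      // entry count per function

    // Corresponds to the entry probe: remember who is running and when it started.
    void enter(const std::string& func, std::int64_t now) {
        stack.push_back({func, now});
        ++calls[func];
    }

    // Corresponds to the return probes: credit the callee, debit the caller.
    void leave(std::int64_t now) {
        if (stack.empty()) return;  // unmatched return, nothing to account for
        Frame f = stack.back();
        stack.pop_back();
        const std::int64_t elapsed = now - f.start;
        inclusive[f.func] += elapsed;
        exclusive[f.func] += elapsed;
        if (!stack.empty()) {
            // The caller's own time must not include the traced callee's time.
            exclusive[stack.back().func] -= elapsed;
        }
    }
};
```

Feeding it the example from the question (foo runs for 2000 and calls bar three times for 500 each) yields an exclusive time of 500 for foo and 1500 for bar. It also shows why a missed return probe is so damaging: the stack never pops, and every later debit lands on the wrong caller.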


C++ call functions internally

I'm working with following code which gives access to low level monitor configuration using Windows APIs
https://github.com/scottaxcell/winddcutil/blob/main/winddcutil/winddcutil.cpp
I would like to create a new function that increases or decreases the brightness. I was able to do this using PowerShell, but since the C++ code looks fairly easy to understand, I want to have a crack at it and hopefully integrate it with an ambient light sensor later.
The PowerShell code I have, which works with the above executable, is as follows (it's very crude at this stage):
$cb = [int]([uint32]("0x" + ((C:\Users\Nick\WindowsScripts\winddcutil-main\x64\Release\winddcutil.exe getvcp 0 10) -join "`n").split(" ")[2]))
if ($args[0] -eq "increase") {
if ( $cb -ne 100) {
$nb = "{0:x}" -f ($cb + 10)
C:\Users\Nick\WindowsScripts\winddcutil-main\x64\Release\winddcutil.exe setvcp 0 10 $nb
}
} elseif ($args[0] -eq "decrease") {
if ( $cb -ne 10) {
$nb = "{0:x}" -f ($cb - 10)
C:\Users\Nick\WindowsScripts\winddcutil-main\x64\Release\winddcutil.exe setvcp 0 10 $nb
}
}
It gets the current brightness. If the argument is "increase" and the brightness is not already 100, it adds 10; for "decrease" it subtracts 10 unless the brightness is already 10. Values are converted between hex and decimal.
I understand if I want to integrate this inside the C++ code directly I would have something like following:
int increaseBrightness(std::vector<std::string> args) {
size_t did = INT_MAX;
did = std::stoi(args[0]);
//0 is monitor ID and 10 is the feature code for brightness
//currentBrightness = getVcp("0 10")
//calculate new value
//setVcp("0 10 NewValue")
}
Ultimately I would like to call the executable like "winddcutil.exe increasebrightness 0" (0 being the display ID).
I can keep digging around on how to do the calculation in C++, but calling the existing functions internally and passing the arguments has turned out to be very challenging for me, and I would appreciate some help there.
You need to add the new command to the command map (around line 164 of winddcutil.cpp):
std::unordered_map<std::string,std::function<int(std::vector<std::string>)>> commands
{
{ "help", printUsage },
{ "detect", detect},
{ "capabilities", capabilities },
{ "getvcp", getVcp },
{ "setvcp", setVcp},
{ "increasebrightness", increaseBrightness } // update here (no trailing space in the key)
};
To get the current brightness you can't call getVcp, because it prints its result to stdout instead of returning it. Follow what getVcp does internally to read the brightness value:
DWORD currentValue;
bool success = GetVCPFeatureAndVCPFeatureReply(physicalMonitorHandle, vcpCode, NULL, &currentValue, NULL);
if (!success) {
std::cerr << "Failed to get the vcp code value" << std::endl;
return success;
}
Then define increaseBrightness along these lines:
int increaseBrightness(std::vector<std::string> args) {
// args[0] is the display ID; 10 is the feature code for brightness.
size_t did = std::stoi(args[0]);
// physicalMonitorHandle and vcpCode are not defined here: look them up
// for display `did` the same way getVcp does.
DWORD currentBrightness;
bool success = GetVCPFeatureAndVCPFeatureReply(
physicalMonitorHandle, vcpCode, NULL, &currentBrightness, NULL);
if (!success) {
std::cerr << "Failed to get the vcp code value" << std::endl;
return success;
}
// Example: add 10 to the current brightness (not to the display ID).
auto newValue = currentBrightness + 10;
// The PowerShell script above passes the value in hex; format newValue
// to match whatever setVcp parses.
int result = setVcp({args[0], "10", std::to_string(newValue)});
// your handler for result goes here
return result;
}
test for passing argument:
https://godbolt.org/z/5n5Gq3d7e
Note: make sure increaseBrightness is declared before the std::unordered_map<std::string,std::function<int(std::vector<std::string>)>> commands definition, otherwise the compiler will complain.
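The pieces discussed above (forward declaration, map lookup, and the brightness arithmetic from the PowerShell script) can be tried in isolation. This is a minimal sketch with no monitor access: steppedBrightness, toHex and dispatch are hypothetical helpers, and increaseBrightness is only a stub that validates its argument.

```cpp
#include <algorithm>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Args = std::vector<std::string>;

// Declared before the map so the map initializer can refer to it.
int increaseBrightness(Args args);

// Hypothetical helper: apply a step and keep the result within 0..100.
int steppedBrightness(int current, int delta) {
    return std::max(0, std::min(100, current + delta));
}

// Hypothetical helper: lowercase hex, like PowerShell's "{0:x}" format.
std::string toHex(int value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

std::unordered_map<std::string, std::function<int(Args)>> commands{
    {"increasebrightness", increaseBrightness},
};

// Look up a command by its exact name and run it; returns 1 if unknown.
int dispatch(const std::string& name, Args args) {
    auto it = commands.find(name);
    if (it == commands.end()) return 1;
    return it->second(std::move(args));
}

// Stub: a real version would read and write the VCP value here.
int increaseBrightness(Args args) {
    if (args.empty()) return 1;          // display ID is required
    int displayId = std::stoi(args[0]);  // throws on non-numeric input
    (void)displayId;
    return 0;
}
```

Because the lookup is an exact string match, a key with a stray trailing space would never be found for the command line argument "increasebrightness".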

Is it normal that CPU % is about "what can be done" instead of "what is happening"?

I have this function, which is called heavily within my application:
void Envelope::Process(Voice &voice) {
VoiceParameters &voiceParameters = mVoiceParameters[voice.mIndex];
// control rate
if (voiceParameters.mControlRateIndex-- == 0) {
voiceParameters.mControlRateIndex = PLUG_CONTROL_RATE - 1;
DBGMSG("I'm entered");
}
// next phase
voiceParameters.mBlockStep += mRate;
voiceParameters.mStep += mRate;
}
This function never enters the if statement (i.e. I never see the "I'm entered" message), and it takes 3% of CPU.
Now, if I write this function:
void Envelope::Process(Voice &voice) {
VoiceParameters &voiceParameters = mVoiceParameters[voice.mIndex];
// control rate
if (voiceParameters.mControlRateIndex-- == 0) {
voiceParameters.mControlRateIndex = PLUG_CONTROL_RATE - 1;
DBGMSG("I'm entered");
// samples (as rest) between two "quantized by block" sections occurs in the prev or next section, by round (interpolation). latest samples will be ignored (>) or added (<)
if (mIsEnabled) {
// update value
voiceParameters.mValue = (voiceParameters.mBlockStartAmp + (voiceParameters.mBlockStep * voiceParameters.mBlockFraction));
// scale value
if (!mIsBipolar) {
voiceParameters.mValue = (voiceParameters.mValue + 1.0) / 2.0;
}
voiceParameters.mValue *= mAmount;
}
else {
voiceParameters.mValue = 0.0;
}
// connectors
mOutputConnector_CV.mPolyValue[voice.mIndex] = voiceParameters.mValue;
}
// next phase
voiceParameters.mBlockStep += mRate;
voiceParameters.mStep += mRate;
}
(which does the same, since the inserted code is never executed), CPU usage rises to 7%.
What's happening? How can that be?
I'm building in Release mode (Tracer mode shows no difference).
Since the "unused" extra code contains a read of voiceParameters.mBlockStep, the writes to it become relevant. Compilers can take more liberty with write-only variables, potentially even eliminating them. But the least they can do is reorder such writes.

Memory usage of C++ program grows, (shown in Debian's "top"), until it crashes

I'm working on a C++ program that should be able to run for several days, so it is a bit of a hassle that its memory consumption seems to grow really fast.
The full code of the program is a little long, so I'll post just the related things. The structure is the following:
int main (void){
//initialization of the global variables
error = 0;
state = 0;
cycle = 0;
exportcycle = 0;
status = 0;
counter_temp_ctrl = 0;
start = 0;
stop = 0;
inittimer();
mysql_del ("TempMeas");
mysql_del ("TempMeasHist");
mysql_del ("MyControl");
mysql_del ("MyStatus");
initmysql();
while(1){
statemachine();
pause();
}
}
The timer function that is initialized above is the following:
void catch_alarm (int sig)
{
//Set the statemachine to state 1 (reading in new values)
start = readmysql("MyStatus", "Start", 0);
stop = readmysql("MyStatus", "Stop", 0);
if (start == 1){
state = 1;
}
if (stop == 1){
state = 5;
}
//printf("Alarm event\n");
signal (sig, catch_alarm);
return void();
}
So basically, since I'm not setting the start bit in the web interface that modifies the MyStatus table, the program just calls the readmysql function twice every second (the timer's interval). The readmysql function is given below:
float readmysql(string table, string row, int lastvalue){
float readdata = 0;
// Initialize a connection to MySQL
MYSQL_RES *mysql_res;
MYSQL_ROW mysqlrow;
MYSQL *con = mysql_init(NULL);
if(con == NULL)
{
error_exit(con);
}
if (mysql_real_connect(con, "localhost", "user1", "user1", "TempDB", 0, NULL, 0) == NULL)
{
error_exit(con);
}
if (lastvalue == 1){
string qu = "Select "+ row +" from "+ table +" AS a where MeasTime=(select MAX(MeasTime) from "+ table;
error = mysql_query(con, qu.c_str());
}
else{
string qu = "Select "+ row +" from "+ table;
error = mysql_query(con, qu.c_str());
}
mysql_res = mysql_store_result(con);
while((mysqlrow = mysql_fetch_row(mysql_res)) != NULL)
{
readdata = atoi(mysqlrow[0]);
}
//cout << "readdata "+table+ " "+row+" = " << readdata << endl;
// Close the MySQL connection
mysql_close(con);
//delete mysql_res;
//delete mysqlrow;
return readdata;
}
I thought that the variables in this function are stored on the stack and are freed automatically when leaving the function. However, it seems that some part of the memory is not freed, because usage just keeps growing. As you can see, I have tried to use delete on two of the variables, but that seems to have no effect. What am I doing wrong in terms of memory management?
Thanks for your help!
Greetings Oliver.
At least mysql_store_result is leaking. From documentation:
After invoking mysql_query() or mysql_real_query(), you must call mysql_store_result() or mysql_use_result() for every statement that successfully produces a result set (SELECT, SHOW, DESCRIBE, EXPLAIN, CHECK TABLE, and so forth). You must also call mysql_free_result() after you are done with the result set.
If your program continuously consumes memory (without ever releasing it), then you have a memory leak.
A good way to detect memory leaks is to run the program through a memory debugger, e.g. valgrind:
$ valgrind /path/to/my/program
Once your program has started eating memory, stop it, and valgrind will give you a summary of where your program allocated memory that was never freed.
There is no need to let the system run out of memory and crash; just wait until the program has accumulated some memory that has not been freed. Then fix your code, and repeat until no more memory errors are detected.
Also note that valgrind intercepts your system's memory management. This usually results in a (severe) performance penalty.
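One way to make sure the result set is released on every path out of readmysql is to tie it to a smart pointer with a custom deleter. The sketch below uses stand-in types (FakeResult, fakeStoreResult, fakeFreeResult are hypothetical) so that it compiles without the MySQL client library; in the real function the same pattern would wrap the pointer returned by mysql_store_result with mysql_free_result as the deleter.

```cpp
#include <memory>

// Stand-ins for MYSQL_RES, mysql_store_result and mysql_free_result.
struct FakeResult { int rows; };
int g_liveResults = 0;  // number of result sets allocated but not yet freed

FakeResult* fakeStoreResult() {
    ++g_liveResults;
    return new FakeResult{1};
}

void fakeFreeResult(FakeResult* r) {
    --g_liveResults;
    delete r;
}

// Deleter that releases the result through the library's free function.
struct ResultDeleter {
    void operator()(FakeResult* r) const { fakeFreeResult(r); }
};
using ResultPtr = std::unique_ptr<FakeResult, ResultDeleter>;

// Mirrors the shape of readmysql: fetch a result, read from it, return.
int readOnce() {
    ResultPtr res(fakeStoreResult());
    if (!res) return 0;  // nothing to free when the call returned null
    return res->rows;    // the deleter runs automatically on return
}
```

Calling readOnce in a loop leaves g_liveResults at zero, whereas the original pattern (store the result, never free it) would grow by one allocation per call, which at two calls per second adds up quickly over several days.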

Steptimer.getTotalSeconds within steptimer.h returning 0, c++ visual studio 2013, directx app

I'm trying to use the given code within steptimer.h to set up code that will run every two seconds. However with the code below, timer.GetTotalSeconds() always returns 0.
Unfortunately there isn't much information readily available on StepTimer.h (at least I believe due to a lack of useful search results), so I was hoping someone might be able to shed some light as to why the timer isn't recording the elapsed seconds. Am I using it incorrectly?
Code from Game.h, Game.cpp and StepTimer.h are included below. Any help is greatly appreciated.
From Game.cpp:
double time = timer.GetTotalSeconds();
if (time >= 2) {
laser_power++;
timer.ResetElapsedTime();
}
Initialised in Game.h:
DX::StepTimer timer;
From Common/StepTimer.h:
#pragma once
#include <wrl.h>
namespace DX
{
// Helper class for animation and simulation timing.
class StepTimer
{
public:
StepTimer() :
m_elapsedTicks(0),
m_totalTicks(0),
m_leftOverTicks(0),
m_frameCount(0),
m_framesPerSecond(0),
m_framesThisSecond(0),
m_qpcSecondCounter(0),
m_isFixedTimeStep(false),
m_targetElapsedTicks(TicksPerSecond / 60)
{
if (!QueryPerformanceFrequency(&m_qpcFrequency))
{
throw ref new Platform::FailureException();
}
if (!QueryPerformanceCounter(&m_qpcLastTime))
{
throw ref new Platform::FailureException();
}
// Initialize max delta to 1/10 of a second.
m_qpcMaxDelta = m_qpcFrequency.QuadPart / 10;
}
// Get elapsed time since the previous Update call.
uint64 GetElapsedTicks() const { return m_elapsedTicks; }
double GetElapsedSeconds() const { return TicksToSeconds(m_elapsedTicks); }
// Get total time since the start of the program.
uint64 GetTotalTicks() const { return m_totalTicks; }
double GetTotalSeconds() const { return TicksToSeconds(m_totalTicks); }
// Get total number of updates since start of the program.
uint32 GetFrameCount() const { return m_frameCount; }
// Get the current framerate.
uint32 GetFramesPerSecond() const { return m_framesPerSecond; }
// Set whether to use fixed or variable timestep mode.
void SetFixedTimeStep(bool isFixedTimestep) { m_isFixedTimeStep = isFixedTimestep; }
// Set how often to call Update when in fixed timestep mode.
void SetTargetElapsedTicks(uint64 targetElapsed) { m_targetElapsedTicks = targetElapsed; }
void SetTargetElapsedSeconds(double targetElapsed) { m_targetElapsedTicks = SecondsToTicks(targetElapsed); }
// Integer format represents time using 10,000,000 ticks per second.
static const uint64 TicksPerSecond = 10000000;
static double TicksToSeconds(uint64 ticks) { return static_cast<double>(ticks) / TicksPerSecond; }
static uint64 SecondsToTicks(double seconds) { return static_cast<uint64>(seconds * TicksPerSecond); }
// After an intentional timing discontinuity (for instance a blocking IO operation)
// call this to avoid having the fixed timestep logic attempt a set of catch-up
// Update calls.
void ResetElapsedTime()
{
if (!QueryPerformanceCounter(&m_qpcLastTime))
{
throw ref new Platform::FailureException();
}
m_leftOverTicks = 0;
m_framesPerSecond = 0;
m_framesThisSecond = 0;
m_qpcSecondCounter = 0;
}
// Update timer state, calling the specified Update function the appropriate number of times.
template<typename TUpdate>
void Tick(const TUpdate& update)
{
// Query the current time.
LARGE_INTEGER currentTime;
if (!QueryPerformanceCounter(&currentTime))
{
throw ref new Platform::FailureException();
}
uint64 timeDelta = currentTime.QuadPart - m_qpcLastTime.QuadPart;
m_qpcLastTime = currentTime;
m_qpcSecondCounter += timeDelta;
// Clamp excessively large time deltas (e.g. after paused in the debugger).
if (timeDelta > m_qpcMaxDelta)
{
timeDelta = m_qpcMaxDelta;
}
// Convert QPC units into a canonical tick format. This cannot overflow due to the previous clamp.
timeDelta *= TicksPerSecond;
timeDelta /= m_qpcFrequency.QuadPart;
uint32 lastFrameCount = m_frameCount;
if (m_isFixedTimeStep)
{
// Fixed timestep update logic
// If the app is running very close to the target elapsed time (within 1/4 of a millisecond) just clamp
// the clock to exactly match the target value. This prevents tiny and irrelevant errors
// from accumulating over time. Without this clamping, a game that requested a 60 fps
// fixed update, running with vsync enabled on a 59.94 NTSC display, would eventually
// accumulate enough tiny errors that it would drop a frame. It is better to just round
// small deviations down to zero to leave things running smoothly.
if (abs(static_cast<int64>(timeDelta - m_targetElapsedTicks)) < TicksPerSecond / 4000)
{
timeDelta = m_targetElapsedTicks;
}
m_leftOverTicks += timeDelta;
while (m_leftOverTicks >= m_targetElapsedTicks)
{
m_elapsedTicks = m_targetElapsedTicks;
m_totalTicks += m_targetElapsedTicks;
m_leftOverTicks -= m_targetElapsedTicks;
m_frameCount++;
update();
}
}
else
{
// Variable timestep update logic.
m_elapsedTicks = timeDelta;
m_totalTicks += timeDelta;
m_leftOverTicks = 0;
m_frameCount++;
update();
}
// Track the current framerate.
if (m_frameCount != lastFrameCount)
{
m_framesThisSecond++;
}
if (m_qpcSecondCounter >= static_cast<uint64>(m_qpcFrequency.QuadPart))
{
m_framesPerSecond = m_framesThisSecond;
m_framesThisSecond = 0;
m_qpcSecondCounter %= m_qpcFrequency.QuadPart;
}
}
private:
// Source timing data uses QPC units.
LARGE_INTEGER m_qpcFrequency;
LARGE_INTEGER m_qpcLastTime;
uint64 m_qpcMaxDelta;
// Derived timing data uses a canonical tick format.
uint64 m_elapsedTicks;
uint64 m_totalTicks;
uint64 m_leftOverTicks;
// Members for tracking the framerate.
uint32 m_frameCount;
uint32 m_framesPerSecond;
uint32 m_framesThisSecond;
uint64 m_qpcSecondCounter;
// Members for configuring fixed timestep mode.
bool m_isFixedTimeStep;
uint64 m_targetElapsedTicks;
};
}
I got what I wanted with the code below. I was missing the Tick() call, which is what advances m_totalTicks; without it GetTotalSeconds() stays at 0.
timer.Tick([&]() {
double time = timer.GetTotalSeconds();
if (time >= checkpt) {
laser_power++;
checkpt += 2;
}
});
I just added an integer checkpt that is incremented by 2 each time the condition fires, so that the code runs every 2 seconds. There's probably a better way to do it, but it's 3:30 am, so I'm being lazy for the sake of putting my mind at ease.
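The checkpoint idea can be wrapped in a small helper so the period is stated once. This is a sketch only: PeriodicTrigger is a hypothetical class, not part of StepTimer.h, and it assumes the caller passes in the value of timer.GetTotalSeconds() from inside the Tick callback.

```cpp
// Hypothetical helper that fires once per elapsed period of total time.
class PeriodicTrigger {
public:
    explicit PeriodicTrigger(double periodSeconds)
        : m_period(periodSeconds), m_next(periodSeconds) {}

    // Returns how many whole periods have been crossed since the last call.
    int Poll(double totalSeconds) {
        if (m_period <= 0.0) return 0;  // guard against an endless loop
        int fired = 0;
        while (totalSeconds >= m_next) {
            m_next += m_period;
            ++fired;
        }
        return fired;
    }

private:
    double m_period;  // interval between firings, in seconds
    double m_next;    // total-time value at which the next firing is due
};
```

Inside the Tick callback this would be used as laser_power += trigger.Poll(timer.GetTotalSeconds()); with a PeriodicTrigger trigger(2.0) member. Returning a count rather than a bool means a long frame that spans several periods is not silently dropped.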

How to terminate a recursive function after a specific time?

I have a recursive function in C++ and I need to terminate it immediately, including all the calls it has made recursively, after a specific time, say 60 seconds.
I have tried the following, but it doesn't work. takesTooLong is a global variable, but if its value changes to 1 in one call, other calls keep seeing it as 0.
The OS is Ubuntu 12.10.
main() is something like this:
int main()
{
takesTooLong = 0;
startTime = clock();
RecursiveFunction();
endTime = clock();
printf("Elapsed time: %f", CalculateElapsedTime(startTime, endTime));
return 0;
}
My recursive function:
void RecursiveFunction(Some Parameters)
{
if (takesTooLong == 1)
return;
endTime = clock();
differenceTime = CalculateElapsedTime(startTime, endTime);
if (differenceTime > MAX_SECONDS)
{
takesTooLong = 1;
}
if (takesTooLong == 0)
{
for (i = 0; i < n && takesTooLong == 0; i++)
{
RecursiveFunction(Some Updated Parameters);
}
}
}
A sketch (std::chrono::steady_clock is just one choice; anything you can use to measure time works):
#include <chrono>
using Clock = std::chrono::steady_clock;
const std::chrono::seconds limit(60);
void recursiveFunction(Clock::time_point startTime){
// ... do something here
if ((Clock::now() - startTime) > limit)
return;
recursiveFunction(startTime);
}
// Entry point: records the start time once and passes it down.
void recursiveFunction(){
recursiveFunction(Clock::now());
}
Yes, you still have to unroll the stack but the alternatives are all ugly, the best probably being longjmp. If you need greater resolution than seconds, or you want to measure process/kernel time rather than wall clock time, use alternatives like timer_create or setitimer.
// Written by the signal handler, polled by the recursion.
volatile sig_atomic_t timedOut = 0;
void alarmHandler (int sig)
{
timedOut = 1;
}
signal(SIGALRM, alarmHandler);
alarm(60);
RecursiveFunction();
alarm(0);
if (timedOut)
{
//report something
timedOut = 0;
}
void RecursiveFunction(Some Parameters)
{
if (timedOut)
return;
//........
}
The easiest way to exit an unknown number of recursions is by throwing an exception. Since you're doing it at most once every 60 seconds, this isn't too bad. Just make it a class TimeLimitExceeded : public std::runtime_error and catch that in the top level wrapper†.
† Most recursive functions should not be called directly, but via a non-recursive wrapper. The signature of the recursive function usually is inconvenient for callers, as it contains implementation details that are irrelevant to the caller.
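The exception approach with a non-recursive wrapper might look like the following sketch. The names search and searchWithLimit and the toy workload (a ternary tree walk) are hypothetical; only TimeLimitExceeded comes from the answer above.

```cpp
#include <chrono>
#include <stdexcept>

class TimeLimitExceeded : public std::runtime_error {
public:
    TimeLimitExceeded() : std::runtime_error("time limit exceeded") {}
};

using Clock = std::chrono::steady_clock;

// Recursive worker: visits a ternary tree of the given depth and counts nodes.
// The deadline is an implementation detail that callers never see.
long search(int depth, Clock::time_point deadline) {
    if (Clock::now() > deadline) throw TimeLimitExceeded();
    if (depth == 0) return 1;
    long visited = 1;
    for (int i = 0; i < 3; ++i) {
        visited += search(depth - 1, deadline);
    }
    return visited;
}

// Non-recursive wrapper: computes the deadline once and catches the timeout.
// Returns true and sets visited on success, false if the limit was hit.
bool searchWithLimit(int depth, std::chrono::milliseconds limit, long& visited) {
    try {
        visited = search(depth, Clock::now() + limit);
        return true;
    } catch (const TimeLimitExceeded&) {
        return false;  // the whole call stack has already been unwound
    }
}
```

The throw unwinds every pending call in one step, so no level needs to check a flag after each recursive call. Calling Clock::now() on every invocation has a cost; for very cheap recursion steps it may be worth checking only every few thousand calls.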