How to track a recursive function's call stack usage - c++

I am teaching a class to intro C++ students and I want to design a lab that shows how recursive functions differ from iteration. My idea was to track the memory/call stack usage for both and display the difference. I was almost positive that I had done something similar when I did my degree, but can't remember. My experience doesn't lie in C/C++ so any guidance would be appreciated.
Update 1:
I believe I may have misrepresented my task. I had hoped to find a way to show how recursion increases the overhead/stack usage compared to iteration. I followed some suggested links and came up with the following script:
loops=100
counter=0
total1=0
echo "Iteration"
while [ $counter -lt $loops ]; do
    "$1" &  # Run the given command line in the background.
    pid=$!
    peak1=0
    echo -e "$counter.\c"
    while true; do
        #sleep 0.1
        # The last line of pmap is the total mapped size; redirect pmap's
        # errors so the loop ends quietly once the process has exited.
        sample="$(pmap $pid 2> /dev/null | tail -n 1 | sed 's/[^0-9]*//g')" || break
        if [ -z "$sample" ]; then
            break
        fi
        let peak1='sample > peak1 ? sample : peak1'
    done
    # echo "Peak: $peak1" 1>&2
    total1=$((total1 + peak1))
    counter=$((counter + 1))
done
The program implements a binary search with either iteration or recursion. The idea is to get the average memory use of the iterative version and compare it to the recursive version of the same program. This does not work, as the iterative version often has a larger memory average than the recursive one, which doesn't show my students that recursion has drawbacks. Therefore I am pretty sure I am doing something incorrect.
Is pmap not going to provide me with what I want?
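One likely culprit: the last line of pmap is the process's total mapped size (heap, shared libraries, code and all), not the stack alone, so allocator noise can easily drown out the per-call stack cost. A narrower probe on Linux is the kernel's per-process stack accounting:
# VmStk in /proc/<pid>/status reports just the main thread's stack size.
grep VmStk /proc/$pid/status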

Something like this, I think:
#include <cstdio>

// Build without optimisation, or the compiler may turn the tail call into
// a loop and the frames will never grow.
void recursive(int* ptop) {
    int dummy = 0;
    // Cast to char* so the difference is in bytes; on most platforms the
    // stack grows downward, so ptop - &dummy is positive and grows per call.
    std::printf("stack size %td\n", (char*)ptop - (char*)&dummy);
    recursive(ptop);
}

void start() {
    int dummy = 0;
    recursive(&dummy);
}

until it crashes.

On any platform that provides them (e.g. Linux), backtrace(3), or even better backtrace_symbols(3), and their companions should be of great help.
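A minimal sketch of how those calls fit together (glibc/Linux; compile with -rdynamic so the symbol names survive linking):
#include <execinfo.h>
#include <cstdio>
#include <cstdlib>

// Print the current call chain; the number of frames reported also serves
// as a rough recursion-depth counter.
void show_stack() {
    void* frames[64];
    int n = backtrace(frames, 64);                // capture return addresses
    char** names = backtrace_symbols(frames, n);  // resolve them to strings
    for (int i = 0; i < n; i++)
        std::printf("%s\n", names[i]);
    std::free(names);
}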

Control Windows 10's "Power Mode" programmatically

Background
Hi. I have an SB2 (Surface Book 2), and I'm one of the unlucky users facing the infamous 0.4GHz throttling problem that plagues many SB2 machines. The problem is that the SB2 suddenly, and very frequently depending on the ambient temperature, throttles hard from a boost of 4GHz down to 0.4GHz and hangs there for a minute or two (this causes a severe slowdown of the whole laptop). This is extremely frustrating and almost makes the machine unusable for even the simplest of workloads.
Microsoft apparently stated that it fixed the problem in the October 2019 update, but I and several other users are still facing it. I'm very positive my machine is up to date, and I even manually installed all the latest Surface Book 2 firmware updates.
Here's a capture of the CPU state when the problem is happening (screenshot not reproduced here): the temperature of the unit itself isn't high at all, but the CPU is throttling at exactly 0.4GHz.
Workarounds
I tried pretty much EVERYTHING: undervolting until the screen froze, disabling BD PROCHOT, disabling power throttling in GPE, messing with the registry, tuning several CPU/GPU settings. Nothing worked.
You can do only 2 things when the throttling starts:
Wait for it to finish (usually takes a minute or two).
Change the Power Mode in Windows 10. It doesn't even matter whether you change it from "Best performance" to "Best battery life"; what matters is that you change it. As soon as you do, throttling completely stops within a couple of seconds. This is the only manual solution that worked.
Question
In practice, changing this slider every 10 seconds, no matter how heavy the workload, leads indefinitely to a smooth experience without throttling. Of course, this isn't a feasible workaround by hand.
In theory, I thought that if I could find a way to control this mode programmatically, I might be able to wish this problem goodbye by switching power modes every 10 seconds or so.
I don't mind whether it's Win32 (WinAPI) or a .NET thing. I looked around a lot and found this about power management, but it seems there's no interface for setting it in Win32. I could have overlooked it, so here's my question:
Is there any way at all to control the Power Mode in Windows 10 programmatically?
OK... I've been wanting command-line or programmatic access to adjust the power slider for a while, and I've run across this post multiple times when looking into it. I'm surprised no one else has bothered to figure it out. I worked it out myself today, motivated by the fact that Windows 11 appears to have removed the power slider from the taskbar, so you have to go digging into the Settings app to adjust it.
As previously discussed, in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\User\PowerSchemes you can find values "ActiveOverlayAcPowerScheme" and "ActiveOverlayDcPowerScheme" which record the current values of the slider for AC power and battery power, respectively. However, changing these values is not sufficient to adjust the power slider or the system's mode of operation.
Turns out there is an undocumented method in C:\Windows\System32\powrprof.dll called PowerSetActiveOverlayScheme. It takes a single parameter. I "guessed" that it would take a GUID in the same manner that PowerSetActiveScheme does, and it seems to work.
Note — Using an undocumented API is unsupported by Microsoft. This method may break in future Windows releases. It can be used for personal tinkering but I would not suggest using it in any actual production projects.
Here is the C# PInvoke signature:
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerSetActiveOverlayScheme")]
public static extern uint PowerSetActiveOverlayScheme(Guid OverlaySchemeGuid);
It returns zero on success and non-zero on failure.
Calling it is as simple as:
PowerSetActiveOverlayScheme(new Guid("ded574b5-45a0-4f42-8737-46345c09c238"));
It has immediate effect. This particular GUID moved the slider all the way to the right for me and also updated the "ActiveOverlayAcPowerScheme" value in the registry. Using a GUID of all zeros reset the slider to the middle value. You can see what GUID options are available by just observing the values that show up in the registry when you set the power slider to different positions.
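For instance, a quick way to inspect those values from a command prompt (standard reg syntax; the value names are the ones mentioned above):
reg query HKLM\SYSTEM\CurrentControlSet\Control\Power\User\PowerSchemes /v ActiveOverlayAcPowerScheme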
There are two methods that can be used to read the current position of the slider. I'm not sure what the difference between them is; they returned the same value each time in my testing.
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerGetActualOverlayScheme")]
public static extern uint PowerGetActualOverlayScheme(out Guid ActualOverlayGuid);
[DllImportAttribute("powrprof.dll", EntryPoint = "PowerGetEffectiveOverlayScheme")]
public static extern uint PowerGetEffectiveOverlayScheme(out Guid EffectiveOverlayGuid);
They also return zero on success and non-zero on failure. They can be called like...
if (PowerGetEffectiveOverlayScheme(out Guid activeScheme) == 0)
{
    Console.WriteLine(activeScheme);
}
There is one more method called "PowerGetOverlaySchemes", which I presume can be used to fetch a list of available GUIDs. It appears to take three parameters, and I haven't bothered to figure it out.
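Putting the pieces above together into a minimal console sketch (an illustrative assembly of the snippets in this answer, not the published tool; the GUID is the one observed above and may differ per machine, and the undocumented-API caveat still applies):
using System;
using System.Runtime.InteropServices;

class PowerOverlayDemo
{
    [DllImport("powrprof.dll", EntryPoint = "PowerSetActiveOverlayScheme")]
    static extern uint PowerSetActiveOverlayScheme(Guid OverlaySchemeGuid);

    [DllImport("powrprof.dll", EntryPoint = "PowerGetEffectiveOverlayScheme")]
    static extern uint PowerGetEffectiveOverlayScheme(out Guid EffectiveOverlayGuid);

    static void Main()
    {
        if (PowerGetEffectiveOverlayScheme(out Guid current) == 0)
            Console.WriteLine($"Current overlay: {current}");

        // "Best performance" GUID as observed in the registry; may vary per machine.
        var best = new Guid("ded574b5-45a0-4f42-8737-46345c09c238");
        uint result = PowerSetActiveOverlayScheme(best);
        Console.WriteLine(result == 0 ? "Overlay set." : $"Failed with code {result}.");
    }
}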
I created a command-line program which can be used to set the power mode, and it can be found at https://github.com/AaronKelley/PowerMode.
Aaron's answer is awesome work, helped me massively, thank you.
If you're anything like me and don't have Visual Studio at the ready to compile his tool yourself, and/or don't necessarily want to run an arbitrary executable file off of GitHub (no offence), you can use Python (3, in this case) to accomplish the same thing.
For completeness' sake, I'll copy over the disclaimer:
Note — Using an undocumented API is unsupported by Microsoft. This method may break in future Windows releases. It can be used for personal tinkering but I would not suggest using it in any actual production projects.
Please also note that the following is just basic proof-of-concept code!
Getting the currently active Byte Sequence:
import ctypes

output_buffer = ctypes.create_string_buffer(b"", 16)

# Use .raw rather than .value: .value stops at the first NUL byte, which
# would truncate byte sequences that contain zeros (e.g. the all-zero GUID).
ctypes.windll.powrprof.PowerGetEffectiveOverlayScheme(output_buffer)
print("Current Effective Byte Sequence: " + output_buffer.raw.hex())

ctypes.windll.powrprof.PowerGetActualOverlayScheme(output_buffer)
print("Current Actual Byte Sequence: " + output_buffer.raw.hex())
On my system, this results in the following values:
Mode                 Byte Sequence
Better Battery       77c71c9647259d4f81747d86181b8a7a
Better Performance   00000000000000000000000000000000
Best Performance     b574d5dea045424f873746345c09c238
Apparently Aaron's system and mine share the same peculiarity, where the "Better Performance" Byte Sequence is just all zeros (as opposed to the "expected" value of 3af9b8d9-7c97-431d-ad78-34a8bfea439f).
Please note that the Byte Sequence 77c71c9647259d4f81747d86181b8a7a is equivalent to the GUID 961cc777-2547-4f9d-8174-7d86181b8a7a, and b574d5dea045424f873746345c09c238 represents ded574b5-45a0-4f42-8737-46345c09c238.
This stems from the fact that GUIDs are written down differently from how they're actually represented in memory. (If we write a GUID's bytes as ABCD-EF-GH-IJ-KLMN, its Byte Sequence representation ends up being DCBAFEHGIJKLMN.) See https://stackoverflow.com/a/6953207 (particularly the paragraph and table under "Binary encodings could differ") and/or https://uuid.ramsey.dev/en/latest/nonstandard/guid.html if you want to know more.
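You can verify this byte ordering with nothing but the standard library: uuid.UUID exposes exactly this mixed-endian layout as bytes_le.
import uuid

guid = uuid.UUID("ded574b5-45a0-4f42-8737-46345c09c238")
print(guid.bytes_le.hex())  # prints b574d5dea045424f873746345c09c238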
Setting a value (for "Better Battery" in this example) works as follows:
import ctypes

modes = {
    "better_battery": "77c71c9647259d4f81747d86181b8a7a",
    "better_performance": "00000000000000000000000000000000",
    "best_performance": "b574d5dea045424f873746345c09c238"
}

ctypes.windll.powrprof.PowerSetActiveOverlayScheme(bytes.fromhex(modes["better_battery"]))
For me, this was a nice opportunity to experiment with Python's ctypes :).
Here is a PowerShell version that sets up a scheduled task to toggle the power overlay every minute. It is based on the godsend answers of Michael and Aaron.
The CPU throttling issue has plagued me on multiple Lenovo X1 Yoga laptops (Gen2 and Gen4 models).
# Toggle power mode away from and then back to the effective overlay
$togglePowerOverlay = {
    $function = @'
[DllImport("powrprof.dll", EntryPoint="PowerSetActiveOverlayScheme")]
public static extern int PowerSetActiveOverlayScheme(Guid OverlaySchemeGuid);

[DllImport("powrprof.dll", EntryPoint="PowerGetActualOverlayScheme")]
public static extern int PowerGetActualOverlayScheme(out Guid ActualOverlayGuid);

[DllImport("powrprof.dll", EntryPoint="PowerGetEffectiveOverlayScheme")]
public static extern int PowerGetEffectiveOverlayScheme(out Guid EffectiveOverlayGuid);
'@
    $power = Add-Type -MemberDefinition $function -Name "Power" -PassThru -Namespace System.Runtime.InteropServices

    $modes = @{
        "better_battery"     = [guid] "961cc777-2547-4f9d-8174-7d86181b8a7a";
        "better_performance" = [guid] "00000000000000000000000000000000";
        "best_performance"   = [guid] "ded574b5-45a0-4f42-8737-46345c09c238"
    }

    $actualOverlayGuid = [Guid]::NewGuid()
    $ret = $power::PowerGetActualOverlayScheme([ref]$actualOverlayGuid)
    if ($ret -eq 0) {
        "Actual power overlay scheme: $($($modes.GetEnumerator() | where { $_.value -eq $actualOverlayGuid }).Key)." | Write-Host
    }

    $effectiveOverlayGuid = [Guid]::NewGuid()
    $ret = $power::PowerGetEffectiveOverlayScheme([ref]$effectiveOverlayGuid)
    if ($ret -eq 0) {
        "Effective power overlay scheme: $($($modes.GetEnumerator() | where { $_.value -eq $effectiveOverlayGuid }).Key)." | Write-Host

        $toggleOverlayGuid = if ($effectiveOverlayGuid -ne $modes["best_performance"]) { $modes["best_performance"] } else { $modes["better_performance"] }

        # Toggle Power Mode
        $ret = $power::PowerSetActiveOverlayScheme($toggleOverlayGuid)
        if ($ret -eq 0) {
            "Toggled power overlay scheme to: $($($modes.GetEnumerator() | where { $_.value -eq $toggleOverlayGuid }).Key)." | Write-Host
        }

        $ret = $power::PowerSetActiveOverlayScheme($effectiveOverlayGuid)
        if ($ret -eq 0) {
            "Toggled power overlay scheme back to: $($($modes.GetEnumerator() | where { $_.value -eq $effectiveOverlayGuid }).Key)." | Write-Host
        }
    }
    else {
        "Failed to toggle active power overlay scheme." | Write-Host
    }
}

# Execute the above
& $togglePowerOverlay
Create a scheduled job that runs the above script every minute:
Note that Register-ScheduledJob only works with Windows PowerShell, not PowerShell Core
I couldn't get the job to start without using the System principal; otherwise it gets stuck indefinitely in Task Scheduler with "The task has not run yet. (0x41303)".
Get-Job will show the job in Windows PowerShell, but Receive-Job doesn't return anything, even though there is job output in dir $env:UserProfile\AppData\Local\Microsoft\Windows\PowerShell\ScheduledJobs\$taskName\Output. This might be due to running as System while trying to Receive-Job as another user?
I wish -MaxResultCount 0 was supported to hide the job in Get-Job, but alas it is not.
You can see the task in Windows Task Scheduler under Task Scheduler Library path \Microsoft\Windows\PowerShell\ScheduledJobs
It was necessary to have two script blocks, one as command and one as arguments (that gets serialized/deserialized as a string) because PowerShell script blocks use dynamic closures instead of lexical closures and thus referencing one script block from another when creating a new runspace is not readily possible.
The minimum interval for scheduled tasks is 1 minute. If it turns out that more frequent toggling is needed, one might just add a loop in the toggling code and schedule the task only for startup or login.
$registerJob = {
    param($script)
    $taskName = "FixCpuThrottling"
    Unregister-ScheduledJob -Name $taskName -ErrorAction Ignore
    $job = Register-ScheduledJob -Name $taskName -ScriptBlock $([scriptblock]::create($script)) -RunEvery $([TimeSpan]::FromMinutes(1)) -MaxResultCount 1
    $psJobsSchedulerPath = "\Microsoft\Windows\PowerShell\ScheduledJobs"
    $principal = New-ScheduledTaskPrincipal -UserId SYSTEM -LogonType ServiceAccount
    $someResult = Set-ScheduledTask -TaskPath $psJobsSchedulerPath -TaskName $taskName -Principal $principal
}

# Run as Administrator is needed in order to call Register-ScheduledJob
powershell.exe -command $registerJob -args $togglePowerOverlay
To stop and remove the scheduled job (must use Windows PowerShell):
$taskName = "FixCpuThrottling"
Unregister-ScheduledJob -Name $taskName -ErrorAction Ignore

So, what exactly is the deal with QSharedMemory on application crash?

When a Qt application that uses QSharedMemory crashes, some memory handles are left stuck in the system.
The "recommended" way to get rid of them is:
if (memory.attach(QSharedMemory::ReadWrite))
    memory.detach();
bool created = memory.create(dataSize, QSharedMemory::ReadWrite);
In theory the above code should work like this:
We attach to a left-over piece of sh...ared memory and detach from it; it detects that we are the last living user and gracefully goes down.
Except... that is not what happens in a lot of cases. What I actually see happening, a lot, is this:
// fails with memory.error() = SharedMemoryError::NotFound
memory.attach(QSharedMemory::ReadWrite);
// fails with "segment already exists" .. wait, what?! (see above)
bool created = memory.create(dataSize, QSharedMemory::ReadWrite);
The only somewhat-working way I've found to work around this is to write a pid file on application startup containing the pid of the currently running app.
The next time the same app is run, it picks up this file and does the following:
# QProcess: make sure that the PID is not reused by another app at the moment;
# the output of the command below should be empty
ps -p $previouspid -o comm=

# QProcess: (runs this script, reads output)
ipcs -m -p | grep $user | grep $previouspid | sed "s/  */ /g" | cut -f1 -d " "

# QProcess: (passes the result of the previous script to clean up stuff)
ipcrm -m $1
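For what it's worth, a rough C++/Qt sketch of that cleanup (names illustrative, Linux-only, error handling omitted; it drives the same ps/ipcs/ipcrm tools through QProcess):
#include <QProcess>
#include <QString>

// Returns true if the pid recorded in the pid file still belongs to a live process.
static bool pidStillRunning(const QString &pid) {
    QProcess ps;
    ps.start("ps", {"-p", pid, "-o", "comm="});
    ps.waitForFinished();
    return !ps.readAllStandardOutput().trimmed().isEmpty();
}

// Removes shared-memory segments that the (now dead) pid left behind.
static void removeLeftoverSegments(const QString &pid) {
    QProcess sh;
    sh.start("bash", {"-c",
        QString("ipcs -m -p | grep $USER | grep %1 | awk '{print $1}' "
                "| xargs -r -n1 ipcrm -m").arg(pid)});
    sh.waitForFinished();
}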
Now, I can see the problems with such an approach myself, but it is the only thing that works.
The question is: can someone explain to me what exactly is the deal with the not-so-nonexistent memory in the first piece of code above, and how to deal with it properly?

OCAMLRUNPARAM does not affect stack size

I would like to change my stack size to allow a project with many non-tail-recursive functions to run on larger data. To do so, I tried to set OCAMLRUNPARAM="l=xxx" for varying values of xxx (in the range 0 through 10G), but it did not have any effect. Is setting OCAMLRUNPARAM even the right approach?
In case it is relevant: The project I am interested in is built using OCamlMakefile, target native-code.
Here is a minimal example where simply a large list is created without tail recursion. To quickly check whether the setting of OCAMLRUNPARAM has an effect, I compiled the program stacktest.ml:
let rec create l =
  match l with
  | 0 -> []
  | _ -> "00" :: (create (l - 1))

let l = create (int_of_string (Sys.argv.(1)))

let _ = print_endline ("List of size " ^ string_of_int (List.length l) ^ " created.")
using the command
ocamlbuild stacktest.native
and found out roughly at which length of the list a stack overflow occurs by (more or less) binary search with the following bash script foo.sh:
#!/bin/bash
export OCAMLRUNPARAM="l=$1"

increment=1000000
length=1
while [[ $increment -gt 0 ]]; do
    # Grow until the program overflows (prints nothing), then back off
    # and retry with half the increment.
    while [[ $(./stacktest.native $length) ]]; do
        length=$(($length + $increment))
    done
    length=$(($length - $increment))
    increment=$(($increment / 2))
    length=$(($length + $increment))
done
length=$(($length - $increment))

echo "Largest list without overflow: $length"
echo $OCAMLRUNPARAM
The results vary between runs of this script (and the intermediate results are not even consistent within one run, but let's ignore that for now), but they are similar no matter whether I call
bash foo.sh 1
or
bash foo.sh 1G
i.e. whether the stack size is set to 1 or 2^30 words.
Changing the stack limit via OCAMLRUNPARAM works only for bytecode executables, which are run by the OCaml interpreter. A native program is managed by the operating system and executed directly on the CPU. Thus, in order to change its stack limit, you need to use the facilities provided by your operating system.
For example, on Linux there is the ulimit command, which controls many process parameters, including the stack limit. Add the following to your script:
ulimit -s $1
And you will see that the result changes.
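One unit caveat to keep in mind (my addition): ulimit -s takes kilobytes (or the word unlimited), whereas OCAMLRUNPARAM's l= counts words, so an argument like 1G cannot be passed through verbatim:
# ulimit -s expects kilobytes; 8192 KB is the usual Linux default.
ulimit -s 16384        # double the stack to 16 MB for this shell
./stacktest.native 10000000
For contrast, an accumulator-based variant of create (a sketch; it builds the same list of "00" strings) makes the recursive call a tail call and runs in constant stack space regardless of the limit:
let create l =
  let rec aux acc n =
    match n with
    | 0 -> acc
    | _ -> aux ("00" :: acc) (n - 1)
  in
  aux [] l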

How to find unreferenced classes in a codebase

We're in a period of development where a lot of the code being created may be short-lived: it's effectively scaffolding that at some point gets replaced with something else, but will often continue to exist and be forgotten about.
Are there any good techniques for finding the classes in a codebase that aren't used? Obviously there will be many false positives (e.g. library classes: you might not be using all the standard containers, but you want to know they're there), but if they were listed by directory then it may make it easier to see at a glance.
I could write a script that greps for each class XXX and then searches again for all instances, omitting results from the cpp file that the class's methods are defined in. This would also be incredibly slow: O(N^2) for the number of classes in the codebase.
Code coverage tools aren't really an option here, as this app has a GUI whose functions can't all be easily invoked programmatically.
Platforms are Visual Studio 2013 or Xcode/clang
EDIT: I don't believe this to be a duplicate of the dead code question. Although there is an overlap, identifying dead or unreachable code isn't quite the same as finding unreferenced classes.
If you're on linux, then you can use g++ to help you with this.
I'm going to assume that only when an instance of the class is created will we consider it as being used. Therefore, rather than looking just for the name of the class you could look for calls to the constructors.
struct A
{
    A () { }
};

struct B
{
    B () { }
};

struct C
{
    C () { }
};

void bar ()
{
    C c;
}

int main ()
{
    B b;
}
On linux at least, running nm on the binary shows the following mangled names:
00000000004005bc T _Z3barv
00000000004005ee W _ZN1BC1Ev
00000000004005ee W _ZN1BC2Ev
00000000004005f8 W _ZN1CC1Ev
00000000004005f8 W _ZN1CC2Ev
Immediately we can tell that none of the constructors for 'A' are called.
Using slightly modified information from this SO answer we can also get g++ to remove function call graphs that are not used:
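Presumably the command was along these lines (an assumption on my part, reconstructing the standard technique): compile each function into its own section, let the linker garbage-collect the unreferenced ones, and inspect the result with nm again:
g++ -Os -ffunction-sections -fdata-sections main.cpp -o main -Wl,--gc-sections
nm main | grep _ZN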
Which results in:
00000000004005ba W _ZN1BC1Ev
00000000004005ba W _ZN1BC2Ev
So, on linux at least, you can tell that neither A nor C is required in the final executable.
I've come up with a simple shell script that will at least help focus attention on the classes that are referenced the least. I've made the assumption that if a class isn't used, then its name will still appear in one or two files (declaration in the header and definition in the cpp file). So the script uses ctags to search for class declarations in a source directory. Then for each class it does a recursive grep to find all the files that mention the class (note: you can specify different class and usage directories), and finally it writes the file counts and class names to a file and displays them in numerical order. You can then review all the entries that have only 1 or 2 mentions.
#!/bin/bash

CLASSDIR=${1:-}
USAGEDIR=${2:-}

if [ "${CLASSDIR}" = "" -o "${USAGEDIR}" = "" ]; then
    echo "Usage: find_unreferenced_classes.sh <classdir> <usagedir>"
    exit 1
fi

# List the names of all classes declared under CLASSDIR.
ctags --recurse=yes --languages=c++ --c++-kinds=c -x $CLASSDIR | awk '{print $1}' | uniq > tags

[ -f "counts" ] && rm counts

# For each class, count how many files mention it.
for class in `cat tags`; do
    count=`grep -l -r $class $USAGEDIR --include=*.h --include=*.cpp | wc -l`
    echo "$count $class" >> counts
done

sort -n counts
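Invoked with the source tree as both the class and usage directory (paths illustrative):
./find_unreferenced_classes.sh src/ src/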
Sample output:
1 SomeUnusedClassDefinedInHeader
2 SomeUnusedClassDefinedAndDeclaredInHAndCppFile
10 SomeClassUsedLots

BASH scripts for generating inputs to parallel C++ jobs

I'm an amateur C++ programmer trying to learn about basic shell scripting. I have a complex C++ program that currently reads in different parameter values from Parameters.h and then executes one or more simulations with each parameter value sequentially. These simulations take a long time to run. Since I have a cluster available, I'd like to effectively parallelize this job, running the simulations for each parameter value on a separate processor. I'm assuming it's easier to learn shell scripting techniques for this purpose than OpenMPI. My cluster runs on the LSF platform.
How can I write my input parameters in Bash so that they are distributed among multiple processors, each executing the program with that value? I'd like to avoid interactive submission. Ideally, I'd have the inputs in a text file that Bash reads, and I'd be passing two parameters to each job: an actual parameter value and a parameter ID.
Thanks in advance for any leads and suggestions.
my solution
GNU Parallel does look slick, but I ended up (with the help of an IT admin) writing a simple bash script that echoes three inputs (a treatment identifier, a treatment/parameter value, and a simulation identifier) to each job:
#!/bin/bash
j=1
for treatment in $(cat treatments.txt); do
    for experiment in $(cat simulations.txt); do
        bsub -oo tr_${j}_sim_${experiment}_screen -eo tr_${j}_sim_${experiment}_err -q short_serial "echo \"$j $treatment $experiment\" | ./a.out"
    done
    let j=$j+1
done
The file treatments.txt contains a list of the values I'd like to vary, simulations.txt contains a list of all the simulation identifiers I'd like to run (currently just 1,...,s, where s is the total number of simulations I want for each treatment), and the treatments are indexed 1...j.
Maybe check out: http://www.gnu.org/software/parallel/
edit:
Or, check out the -P argument to xargs, example:
time echo {1..5} | xargs -n 1 -P 5 sleep
Say you want to run the program simulate with inputs foo, bar, baz and quux in parallel, then the simplest way is:
inputs="foo bar baz quux"

# Launch processes in the background with &
children=""
for x in $inputs; do
    simulate "$x" > "$x.output" &
    children="$children $!"
done

# Wait for each to finish
for pid in $children; do
    wait $pid
done

for x in $inputs; do
    echo "simulate '$x' gave:"
    cat "$x.output"
    rm -f "$x.output"
done
The problem is that all simulations are launched at the same time, so if your number of inputs is much larger than your number of CPUs/cores, they may swamp the system.
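If that's a concern, one hedged refinement (bash 4.3+ for wait -n) is to cap the number of concurrent children at the core count:
max_jobs=$(nproc)    # one child per core
for x in $inputs; do
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n      # block until any one child exits
    done
    simulate "$x" > "$x.output" &
done
wait                 # collect the stragglers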
My best stab at this is to background multiple instances of your program and let the OS's scheduler take over to put them on different processors. AFAIK there is no way in any shell to specify which processor a given process should run on.
Something to the effect of:
#!/bin/sh
for arg in foo bar baz; do
    ./your_program "$arg" &
done