llvm: ScalarEvolution: Difference between trip count and trip multiple

I'm currently working with LLVM, concentrating on loops: since I wanted to unroll loops, I found out that LLVM already provides a function that unrolls loops. That function expects the following arguments:
a count that determines the unroll count (how many copies of the loop body will exist after unrolling)
a trip count that determines how many times the loop will execute (before being unrolled), or, to be exact (taken from the documentation of the getSmallConstantTripCount function in the ScalarEvolution class):
[...] it is the number of times that control may reach ExitingBlock before taking the branch. For loops with multiple exits, it may not be the number of times that the loop header executes if the loop exits prematurely via another branch
a trip multiple which, according to the documentation of the getSmallConstantTripMultiple function in the ScalarEvolution class, is
[...] the largest constant divisor of the trip count of this loop [...]
some other arguments that do not matter for this question.
The trip count and trip multiple values can be obtained from the ScalarEvolution class using the functions already mentioned. My pass currently uses those values, but when I started testing the pass on several loops, the two values always seemed to be equal (and when I used breaks to create early exits in the CFG, LLVM could not determine either value and always returned "default" values).
My questions are now: what exactly is the difference between those two values? Under which conditions are these values different (some code example would be very useful)? Could it happen that LLVM's ScalarEvolution pass cannot compute the trip count but can determine the trip multiple? If so, some code would be very helpful, as I currently cannot imagine such a situation.
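A sketch of a loop where, if the documented semantics are taken at face value, the two values should differ: the trip count is 4 * n for an unknown n, so getSmallConstantTripCount cannot return a constant (it returns 0 in that case), but every possible trip count is divisible by 4, so getSmallConstantTripMultiple can return 4. This is an illustration based on the documentation, not verified compiler output:

// The loop runs 4 * n times for a runtime value n: no constant trip
// count exists, but the trip count is provably a multiple of 4.
void fill(int *a, unsigned n) {
    for (unsigned i = 0; i < 4 * n; ++i)
        a[i] = i;
}

(One plausible use of the multiple is unrolling itself: if the unroll factor divides the trip multiple, no remainder/epilogue loop is needed.)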

Related

Can I make OpenMP revert to the ideal # of threads after using omp_set_num_threads?

Is there a way to make OpenMP revert the number of threads (for the next time it's used) back to the default after the application has already called omp_set_num_threads() with a specific number?
For example, is there a special code (e.g. 0 or -1) I supply to omp_set_num_threads?
Or should I just try doing something like omp_set_num_threads(omp_get_max_threads())?
I am making the assumption that the default number is whatever the implementation of OpenMP deems as "optimal". But I don't know what, if anything, the default is guaranteed to be or even what it should be. All I know is that I have an application that calls omp_set_num_threads(4) for one specific OpenMP block which I must not edit (for now). But I'd like to prevent that one setting from affecting other OpenMP blocks in my code.
I've had this problem before. (Disclaimer: I work with MSVC, which currently only implements the OpenMP 2.0 standard.) To the best of my knowledge, there is nothing in the OpenMP 2.0 standard that allows you to find out this default value. omp_get_max_threads() is not required to return it:
The omp_get_max_threads function returns an integer that is guaranteed to be at least as large as the number of threads that would be used to form a team if a parallel region without a num_threads clause were to be encountered at that point in the code.
In other words, it might return a number that is larger than the currently set (or default) value.
There is no special value for omp_set_num_threads either:
The omp_set_num_threads function sets the default number of threads to use for subsequent parallel regions that do not specify a num_threads clause. [...] The value of the parameter num_threads must be a positive integer.
And if you get it wrong, it's up to the implementation what will happen:
If a parallel region is encountered while dynamic adjustment of the number of threads is disabled, and the number of threads requested for the parallel region exceeds the number that the run-time system can supply, the behavior of the program is implementation-defined. An implementation may, for example, interrupt the execution of the program, or it may serialize the parallel region.
You might find more precise (and less unsettling) information in the documentation of your OpenMP implementation. However, in the case of MSVC, that documentation is just a verbatim copy of the OpenMP 2.0 standard...
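A minimal sketch of one workaround: save the currently observable value and restore it afterwards. As quoted above, omp_get_max_threads() is not guaranteed to equal the default, so this only isolates the change rather than recovering the true default:

#include <omp.h>
#include <cstdio>

int main() {
    // Record the currently observable setting before the block we must
    // not edit runs. Per the 2.0 wording quoted above, this is only an
    // upper bound on the team size, not necessarily the default.
    int saved = omp_get_max_threads();

    omp_set_num_threads(4);            // stand-in for the untouchable block
    #pragma omp parallel
    #pragma omp single
    std::printf("forced block: %d threads\n", omp_get_num_threads());

    omp_set_num_threads(saved);        // restore the previously observed value
    #pragma omp parallel
    #pragma omp single
    std::printf("after restore: %d threads\n", omp_get_num_threads());
    return 0;
}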
Since you are in the business of modifying the number of threads this way, I would like to preemptively caution about the interaction of omp_set_dynamic with omp_get_num_threads within MSVC:
Why does omp_set_dynamic(1) never adjust the number of threads (in Visual C++)?

Count number of instructions of various types using LLVM

I'm a new user to the LLVM Compiler Infrastructure. I've gone through the LLVM Programmer's Manual documentation and understood how to iterate over basic blocks. I wanted to know whether there are any predefined passes for counting instructions. I understand there is instcount, but that returns the total number of instructions. I'm targeting primarily integer and floating-point operations. Also, what should I do in cases where there are operands of different types in an expression?
The InstCount pass already has a separate counter for each instruction type, in addition to a total instruction count. For example, the number of add instructions will be stored in the NumAddInst statistic variable. You can use that pass or reuse some of its code.
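If the InstCount statistics are not grouped the way you need (for example, integer vs. floating-point arithmetic), iterating yourself is straightforward. A minimal sketch, assuming a reasonably recent LLVM API; the grouping by opcode is illustrative, not exhaustive:

#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Count integer and floating-point arithmetic separately by opcode.
static void countArithmetic(Function &F, unsigned &IntOps, unsigned &FPOps) {
  for (Instruction &I : instructions(F)) {
    switch (I.getOpcode()) {
    case Instruction::Add:
    case Instruction::Sub:
    case Instruction::Mul:
    case Instruction::UDiv:
    case Instruction::SDiv:
      ++IntOps;
      break;
    case Instruction::FAdd:
    case Instruction::FSub:
    case Instruction::FMul:
    case Instruction::FDiv:
      ++FPOps;
      break;
    default:
      break;
    }
  }
}

Regarding operands of different types in one source expression: LLVM IR has no mixed-type arithmetic instruction. Each instruction is integer or floating-point by opcode (add vs. fadd), and mixed expressions are lowered to explicit conversion instructions such as sitofp or fptosi, which you can count separately if they matter to you.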

How to write unit test for VIs that contain "Tick Count (ms)" LabVIEW function?

There is a VI whose outputs (indicators) depend not only on the inputs but also on the values of "Tick Count" functions. The problem is that it does not produce the same output for the same inputs: each time I run it, it gives different outputs, so a unit test that only captures inputs and outputs would fail. The question is how to write a unit test for this situation.
I cannot include the VI in the question, as it contains several subVIs and the "tick count" functions are spread through all levels of its subVIs.
EDIT1: I wrote a wrapper that subtracts the output values of two consecutive runs in order to eliminate the base reference time (which is undefined in this function), but it spoils the outputs.
I think you have been given a very difficult task, as the function you've been asked to test is non-deterministic and it is challenging to write unit tests against non-deterministic code.
There are some ways to test non-deterministic functions: for example, one could test that a random number generator produces values uniformly distributed to within some tolerance, or that a clock-setting function matches an NTP server to within some tolerance. But I think your team will be happier if you can make the underlying code deterministic.
Your idea to use conditional disable is good, but I would add the additional step of creating a wrapper VI and then searching for and replacing every native Tick Count with it. This way you can make any modifications to Tick Count in one place. If for some reason the code actually uses the tick count for something other than profiling (for example, to seed a pseudorandom number generator), you can have your "test/debug" case read from a Notifier into which your testing code injects a set of fake tick counts. Notifiers work great for something like this.
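LabVIEW is graphical, so there is no literal snippet to show, but the pattern described above is ordinary clock injection. For comparison, a minimal C++ sketch of the same idea (all names here are hypothetical):

#include <cstdint>
#include <functional>

// The production code asks a "tick source" for the time instead of
// calling the OS tick counter directly (all names are hypothetical).
using TickSource = std::function<uint32_t()>;

// Code under test: elapsed time depends only on the injected source.
uint32_t elapsedMs(const TickSource &ticks, uint32_t startTick) {
    return ticks() - startTick;
}

int main() {
    // In a unit test, inject a deterministic sequence of fake ticks,
    // mirroring the Notifier-of-fake-tick-counts idea above.
    uint32_t fake[] = {100, 250};
    int i = 0;
    TickSource fakeTicks = [&] { return fake[i++]; };

    uint32_t start = fakeTicks();          // reads 100
    return elapsedMs(fakeTicks, start) == 150 ? 0 : 1;
}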
You could add an (optional) input that allows you to override the tick count value. Give it a default value of -1, and inside the VI use the real tick count when that input is -1, and the supplied value otherwise.
However, I have never seen code relying on the tick count.

Interpreting GPerfTools sample count

I'm struggling a little with reading the textual output GPerfTools generates. I think part of the problem is that I don't fully understand how the sampling method operates.
From Wikipedia I gather that sampling profilers usually work by sending an interrupt to the OS and querying the program's current instruction pointer. Now my knowledge of assembly is a little rusty, so I'm wondering what it means if the instruction pointer points into method m at any given time. Does it mean that the function is about to be called, or that it is currently executing, or both?
There's a difference, if I'm not mistaken: in the first case the sample count (i.e. the number of times m is seen while taking a sample) would translate to the absolute call count of m, while in the latter case it merely indicates the relative time spent in this method.
Can someone clarify?
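For what it's worth, the second interpretation is the usual one: the sample records where the program was executing when the timer fired, so the counts are proportional to CPU time spent in m, not to its call count. A heavily simplified sketch of the mechanism (Linux/x86-64 specific; real profilers such as gperftools are far more careful about signal safety and symbolization):

#include <signal.h>
#include <sys/time.h>
#include <ucontext.h>
#include <cstdio>

static volatile sig_atomic_t samples_total = 0;

// SIGPROF fires periodically while the process consumes CPU time.
// The handler reads the interrupted instruction pointer: whatever
// function that address falls in was currently executing.
static void on_prof(int, siginfo_t *, void *uc_raw) {
    ucontext_t *uc = static_cast<ucontext_t *>(uc_raw);
    void *ip = reinterpret_cast<void *>(uc->uc_mcontext.gregs[REG_RIP]);
    samples_total = samples_total + 1;
    (void)ip;  // a real profiler would bucket `ip` by symbol here
}

int main() {
    struct sigaction sa = {};
    sa.sa_sigaction = on_prof;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigaction(SIGPROF, &sa, nullptr);

    itimerval timer = {};
    timer.it_interval.tv_usec = 10000;  // sample every 10 ms of CPU time
    timer.it_value.tv_usec = 10000;
    setitimer(ITIMER_PROF, &timer, nullptr);

    volatile double x = 0;
    for (long i = 0; i < 200000000; ++i) x = x + i;  // busy work to sample
    std::printf("samples: %d\n", (int)samples_total);
}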

Performance question on nested if's

Is there any performance effect on "Lines of code - (C)" running inside nested ifs?
if (condition_1)
{
    /* Lines of code - (A) */
    if (condition_2)
    {
        /* Lines of code - (B) */
        if (condition_n)
        {
            /* Lines of code - (C) */
        }
    }
}
Does that mean you can nest any number of if statements without affecting the execution time of the code enclosed at the end of the last if statement?
Remember C and C++ are translated to their assembly equivalents. In most cases, this is likely to be via some form of compare (e.g. cmp) and some form of jmp instruction.
As such, whatever code is generated for (C) will still be the same; the if nesting has no bearing on it. If the resultant code for (C) is add eax, 1, it remains add eax, 1 no matter how many ifs precede it.
The only performance cost is in the number of if statements you use and whether or not the resultant assembly (jxx) is expensive on your system. However, I doubt that repeated nested use of if is likely to be a performance bottleneck in your application. Usually, the bottleneck is the time required to process data and/or the time required to get data.
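To make that concrete, the two functions below should compile to essentially the same chain of compares and conditional jumps (a sketch; exact output depends on the compiler and optimization level), and the body at (C) is identical in both:

// Nested form: each false condition jumps past everything inside it.
int nested(bool c1, bool c2, bool cn, int x) {
    if (c1) {
        if (c2) {
            if (cn) {
                x += 1;   // (C)
            }
        }
    }
    return x;
}

// Flat form: && short-circuits, producing the same branch chain.
int flat(bool c1, bool c2, bool cn, int x) {
    if (c1 && c2 && cn)
        x += 1;           // (C) -- unaffected by how the tests are written
    return x;
}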
You won't affect the execution time of the indicated code itself, but if evaluating your conditions is complex, or affected by other factors, then it could potentially lengthen the total time of execution.
The code will run as fast as if it were outside the ifs.
Just remember that evaluating an expression (in an if statement) is not "free" and takes a little time (more if the condition is more complex), so if your code is deeply nested it will take more time to reach it.