Why do local arrays in functions seem to prevent TCO?

Having a local array in a function appears to prevent tail-call optimization (TCO) of that function on every compiler I have checked:
int foo(int*);
int tco_test() {
// int arr[5]={1, 2, 3, 4, 5}; // <-- variant 1
// int* arr = new int[5]; // <-- variant 2
int x = foo(arr);
return x > 0 ? tco_test() : x;
}
When variant 1 is active, there is a true call to tco_test() at the end (gcc tries to do some unrolling first, but it still ends with a call to the function). With variant 2, TCO happens as expected.
Is there something about local arrays that makes it impossible to optimize tail calls?

If the compiler still performed TCO, then all of the external foo(arr) calls would receive the same pointer. That's a visible change in semantics, and thus no longer a pure optimization.
The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.
Consider this program:
#include <stdio.h>
int *valptr[7], **curptr = valptr, **endptr = valptr + 7;
void reset(void)
{
curptr = valptr;
}
int record(int *ptr)
{
if (curptr >= endptr)
return 1;
*curptr++ = ptr;
return 0;
}
int tally(void)
{
int **pp;
int count = 0;
for (pp = valptr; pp < curptr; pp++)
count += **pp;
return count;
}
int tail_function(int x)
{
return record(&x) ? tally() : tail_function(x + 1);
}
int main(void)
{
printf("tail_function(0) = %d\n", tail_function(0));
return 0;
}
As the tail_function recurses, which it does via a tail call, the record function records the addresses of different instances of the local variable x. When it runs out of room, it returns 1, and that triggers tail_function to call tally and return. tally sweeps through the recorded memory locations and adds their values.
If tail_function were subject to TCO, then there would be just one instance of x. Effectively, it would be this:
int tail_function(int x)
{
tail:
if (record(&x))
return tally();
x = x + 1;
goto tail;
}
And so now, record is recording the same location over and over again, causing tally to calculate an incorrect value instead of the expected 21.
The logic of record and tally depends on x actually being instantiated on each activation of the scope, and on outer activations of the scope having a lifetime that endures until the inner ones terminate. That requirement precludes tail_function from recursing in constant space; it must allocate separate x instances.
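To make the difference concrete, here is a minimal sketch that puts the two versions side by side. The names recursive_version and loop_version are introduced here for illustration only; record, tally and reset mirror the program above.

```cpp
#include <cassert>

static int *valptr[7];
static int **curptr = valptr;
static int **const endptr = valptr + 7;

// Forget all recorded addresses.
static void reset() { curptr = valptr; }

// Store one address; report 1 once the table is full.
static int record(int *ptr)
{
    assert(ptr != nullptr);
    if (curptr >= endptr)
        return 1;
    *curptr++ = ptr;
    return 0;
}

// Sum the values currently found at the recorded addresses.
static int tally()
{
    int count = 0;
    for (int **pp = valptr; pp < curptr; ++pp)
        count += **pp;
    return count;
}

// Real recursion: every activation has its own x, so seven distinct
// addresses are recorded, all still alive when tally runs.
int recursive_version(int x)
{
    return record(&x) ? tally() : recursive_version(x + 1);
}

// What a tail-call-eliminated version would amount to: one x, reused.
int loop_version(int x)
{
    for (;;) {
        if (record(&x))
            return tally();
        x = x + 1;
    }
}
```

Calling recursive_version(0) sums seven distinct variables holding 0 through 6, which gives 21. Calling reset() and then loop_version(0) records the same address seven times, and that single x holds 7 by the time tally runs, which gives 49.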

Related

Problems implementing recursive best-first search in C++ based on Korf 1992

I am having two main issues implementing the algorithm described in this article in C++: properly terminating the algorithm and freeing up dynamically allocated memory without running into a seg fault.
Here is the pseudocode provided in the article:
RBFS (node: N, value: V, bound: B)
IF f(N)>B, return f(N)
IF N is a goal, EXIT algorithm
IF N has no children, RETURN infinity
FOR each child Ni of N,
IF f(N) < V AND f(Ni) < V THEN F[i] := V
ELSE F[i] := f(Ni)
sort Ni and F[i] in increasing order of F[i]
IF only one child, F[2] := infinity
WHILE (F[1] <= B)
F[1] := RBFS(N1, F[1], MIN(B, F[2]))
insert N1 and F[1] in sorted order
return F[1]
Here, f(Ni) refers to the "computed" function value, whereas F[i] refers to the currently stored value of f(Ni).
Here is my C++ implementation, in which I had to use a global variable to keep track of whether the goal had been reached or not (note, I am trying to maximize my f(n) value as opposed to minimizing, so I reversed inequalities, orders, min/max values, etc.):
bool goal_found = false;
bool state_cmp(FlowState *lhs, FlowState *rhs)
{
return (lhs->value > rhs->value);
}
int _rbfs(FlowState *state, int value, int bound)
{
if (state->value < bound) // Returning if the state value is less than bound
{
int value = state->value;
delete state;
return value;
}
if (state->is_goal()) // Check if the goal has been reached
{
cout << "Solved the puzzle!" << endl;
goal_found = true; // Modify the global variable to exit the recursion
return state->value;
}
vector<FlowState*> children = state->children();
if (children.empty())
{
//delete state; // Deleting this state seems to result in a corrupted state elsewhere
return INT_MIN;
}
int n = 0; // Count the number of children
for (const auto& child: children)
{
if (state->value < value && child->value < value)
child->value = value;
else
child->update_value(); // Equivalent of setting stored value to static value (F[i] := f(Ni))
++n;
}
sort(children.begin(), children.end(), state_cmp);
while (children.front()->value >= bound && !goal_found)
{// Loop depends on the global goal_found variable since this is where the recursive calls happen
if (children.size() < 2)
children.front()->set_value(_rbfs(children.front(), children.front()->value, bound));
else
children.front()->set_value(_rbfs(children.front(), children.front()->value, max(children[1]->value, bound)));
}
// Free children except the front
int i;
for (i = 1; i < n; ++i)
delete children[i];
state->child = children.front(); // Records the path
return state->child->value;
}
void rbfs(FlowState* initial_state)
{
// This is the actual function I invoke to call the algorithm
_rbfs(initial_state, initial_state->get_value(), INT_MIN);
print_path(initial_state);
}
My main questions are:
Is there a way to terminate this function other than using a global variable (bool goal_found), without a complete re-implementation? Recursive algorithms usually have some kind of base case to terminate the function, but I am not seeing an obvious way of doing that.
I can't seem to delete the dead-end state (when the state has no children) without running into a segmentation fault, but not deleting it results in unfreed memory (each state object was dynamically allocated). How can I modify this code to ensure that I've freed all of the states that pass through it?
I ran the program with gdb to see what was going on, and it appears that after deleting the dead-end state, the next state that is recursively called is not actually NULL, but appears to be corrupted. It has an address, but the data it contains is all junk. Not deleting that node lets the program terminate just fine, but then many states aren't getting freed. In addition, I had originally used the classical, iterative best-first search (but it takes up far too much memory for my case, and is much slower), and in that case, all dynamically allocated states were properly freed so the issue is in this code somewhere (and yes, I am freeing each of the states on the path in main() after calling rbfs).

In your code, you have
children.front()->set_value(_rbfs(children.front(), ...
where state inside of _rbfs is thus children.front().
And in _rbfs, you sometimes delete state. So children.front() can be deleted and then have ->set_value called on it. There's your problem.
Is there any reason why you are calling delete at all?
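One way to avoid that use-after-delete is to let the caller own every node, so the callee never frees what it was handed. The sketch below is not the question's FlowState code: Node, Result, evaluate and expand_front are hypothetical names, and the goal test is reduced to a simple value comparison. It also shows the goal flag travelling back in the return value rather than in a global.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for FlowState; only the stored value is modelled.
struct Node {
    explicit Node(int v) : value(v) {}
    int value;
};

// What a search step reports back: the backed-up value and whether the
// goal was reached (this replaces the global flag).
struct Result {
    int value;
    bool goal_found;
};

// The callee only inspects the node. It never deletes it.
Result evaluate(const Node &node, int bound, int goal_value)
{
    if (node.value < bound)
        return {node.value, false};                // pruned: report value only
    return {node.value, node.value == goal_value}; // simplified goal test
}

// The caller owns the children through unique_ptr, so writing the result
// back into the front child is always safe.
bool expand_front(std::vector<std::unique_ptr<Node>> &children,
                  int bound, int goal_value)
{
    assert(!children.empty());
    Node &front = *children.front();
    const Result r = evaluate(front, bound, goal_value);
    front.value = r.value; // front is still alive here
    return r.goal_found;
}
```

With this shape, the nodes are released when the owning vector goes out of scope, and the loop condition can test the returned flag instead of a global.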

How to speed up a function that returns a pointer to object in c++?

I am a mechanical engineer so please understand I am not trained in proper coding. I have a finite element code that uses grids to make elements which make a model. The element is not important to this question so I have left it out. The elements and grids are read in from a file and that part works.
class Grid
{
private:
int id;
double x;
double y;
double z;
public:
Grid();
Grid(int, double, double, double);
int get_id() { return id;};
};
Grid::Grid() {};
Grid::Grid(int t_id, double t_x, double t_y, double t_z)
{
id = t_id; x = t_x; y = t_y; z = t_z;
}
class SurfaceModel
{
private:
Grid** grids;
Element** elements;
int grid_count;
int elem_count;
public:
SurfaceModel();
SurfaceModel(int, int);
~SurfaceModel();
void read_grid(std::string);
int get_grid_count() { return grid_count; };
Grid* get_grid(int);
};
SurfaceModel::SurfaceModel()
{
grids = NULL;
elements = NULL;
}
SurfaceModel::SurfaceModel(int g, int e)
{
grids = new Grid*[g];
for (int i = 0; i < g; i++)
grids[i] = NULL;
elements = new Element*[e];
for (int i = 0; i < e; i++)
elements[i] = NULL;
}
void SurfaceModel::read_grid(std::string line)
{
... blah blah ...
grids[index] = new Grid(n_id, n_x, n_y, n_z);
... blah blah ....
}
Grid* SurfaceModel::get_grid(int i)
{
if (i < grid_count)
return grids[i];
else
return NULL;
}
When I need to actually use the grid I use the get_grid maybe something like this:
SurfaceModel model(...);
.... blah blah .....
for (int i = 0; i < model.get_grid_count(); i++)
{
Grid *cur_grid = model.get_grid(i);
int cur_id = cur_grid->get_id();
}
My problem is that the call to get_grid seems to be taking more time than I think it should to simply return my object. I have run the gprof on the code and found that get_grid gets called about 4 billion times when going through a very large simulation and another operation using the x, y, z occurs about the same. The operation does some multiplication. What I found is that the get_grid and math take about the same amount of time (~40 seconds). This seems like I have done something wrong. Is there a faster way to get that object out of there?

I think you're forgetting to set grid_count and elem_count.
This means, they will have uninitialized (indeterminate) values. If you loop for those values, you can easily end up looping a lot of iterations.
SurfaceModel::SurfaceModel()
: grids(NULL),
elements(NULL),
grid_count(0),
elem_count(0)
{
}
SurfaceModel::SurfaceModel(int g, int e)
: grid_count(g),
elem_count(e)
{
grids = new Grid*[g];
for (int i = 0; i < g; i++)
grids[i] = NULL;
elements = new Element*[e];
for (int i = 0; i < e; i++)
elements[i] = NULL;
}
However, I suggest you get rid of every instance of new in this program (and use a vector for the grids).

On a modern CPU accessing memory often takes longer than doing multiplication. Getting good performance on modern systems can often mean focusing more on optimizing memory accesses than optimizing computation. Because you are storing your grid objects as an array of dynamically allocated pointers the grid objects themselves will be stored non-contiguously in memory and you will likely get many cache misses when trying to access them. In this example you would probably see a significant speedup by storing your grid objects directly in an array or vector since you will be accessing contiguous memory in your loop and so get good cache utilization and effective hardware prefetching.
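A minimal sketch of that layout, assuming the grids can be stored by value: the SurfaceModel below is a simplified, hypothetical version of the class in the question, and add_grid, grid and sum_ids are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Grid {
    int id;
    double x, y, z;
};

// Simplified model: the grids live side by side in one vector instead of
// behind individually allocated pointers.
class SurfaceModel {
public:
    void add_grid(int id, double x, double y, double z)
    {
        grids_.push_back(Grid{id, x, y, z});
    }

    std::size_t grid_count() const { return grids_.size(); }

    // Returns a reference, so nothing is copied and no null check is needed.
    const Grid &grid(std::size_t i) const
    {
        assert(i < grids_.size());
        return grids_[i];
    }

private:
    std::vector<Grid> grids_;
};

// Example traversal: consecutive iterations touch adjacent memory.
long long sum_ids(const SurfaceModel &model)
{
    long long total = 0;
    for (std::size_t i = 0; i < model.grid_count(); ++i)
        total += model.grid(i).id;
    return total;
}
```

Because the accessor is defined inside the class, the compiler is free to inline it, which may also remove the per-call overhead that gprof attributed to get_grid. Whether that helps in the real simulation would need to be measured.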

4 billion times a microsecond (which is a pretty acceptable time in many cases) gives 4 000 seconds. And since you only get about 40 s (if I get it right), I doubt there's something seriously wrong here. If it's still slow for the task, I'd consider the use of parallel computing.

Merge sort algorithm advice

I have made a program that sorts a list using a merge sort algorithm.
The problem is that I think it should work but it does not: the merge function leaves me with the same array that was sent as a parameter. Can you please look at the code I wrote and tell me what is wrong, and how it can be improved?
Thanks
void merge_sort(int *niz, int low, int medium, int high) {
int *niz2 = new int[high-low];
int bottom = low;
int top = medium + 1;
for (int f1=low; f1<high-low; f1++) {
if (low > medium) {
niz2[f1] = niz[top++];
}
else if (top > high) {
niz2[f1] = niz[bottom++];
}
else if (niz[bottom] < niz[top]) {
niz2[f1] = niz[bottom++];
}
else {
niz2[f1] = niz[top++];
}
}
niz = niz2;
}
void merge(int *niz, int low, int high) {
if (low < high) {
int medium = (high+low)/2;
merge(niz, low, medium);
merge(niz, medium+1, high);
merge_sort(niz, low, medium, high);
}
}
The output of program:
3 5 2 3 4 9 5 2 7 10
3 5 2 3 4 9 5 2 7 10

You are passing the pointer by value, so the value you assign to niz inside the function is not visible in the calling function.
Your signatures should be
void merge(int niz[], int low, int medium, int high) and
void merge_sort(int niz[], int low, int high).
At the bottom of the merge step (which you have named merge_sort), you should copy the contents of niz2 back into niz instead of writing niz = niz2.
EDIT:
Also, you have got the merge function wrong (the one you have named merge_sort). Say you call the function with low = 100, medium = 120, high = 140.
Then for (int f1=low; f1<high-low; f1++) would never loop.
It should be for (int f1=0; f1<high-low; f1++). Another consequence of the above mistake is a SIGSEGV, because you would be accessing niz2 out of bounds (for the given example).
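Putting these points together, a corrected version might look like the sketch below. It uses the conventional names (merge_runs for the merge step is a name chosen here), a std::vector as the temporary buffer instead of new, and it assumes high is an inclusive index, in which case the range holds high - low + 1 elements.

```cpp
#include <cassert>
#include <vector>

// Merge the sorted runs niz[low..medium] and niz[medium+1..high]
// (all bounds inclusive) and write the result back into niz.
void merge_runs(int *niz, int low, int medium, int high)
{
    assert(low <= medium && medium <= high);
    std::vector<int> tmp;
    tmp.reserve(high - low + 1);

    int bottom = low;
    int top = medium + 1;
    while (bottom <= medium && top <= high)
        tmp.push_back(niz[top] < niz[bottom] ? niz[top++] : niz[bottom++]);
    while (bottom <= medium) // leftovers from the lower run
        tmp.push_back(niz[bottom++]);
    while (top <= high)      // leftovers from the upper run
        tmp.push_back(niz[top++]);

    // Copy back element by element; assigning the pointer would only
    // change the local copy.
    for (int i = 0; i < static_cast<int>(tmp.size()); ++i)
        niz[low + i] = tmp[i];
}

void merge_sort(int *niz, int low, int high)
{
    if (low < high) {
        const int medium = low + (high - low) / 2;
        merge_sort(niz, low, medium);
        merge_sort(niz, medium + 1, high);
        merge_runs(niz, low, medium, high);
    }
}
```

For a ten-element array, the call would be merge_sort(niz, 0, 9).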

I think there are lots of errors in this code but the biggest one is
niz = niz2;
Here you are trying to copy the niz2 array back to niz, but it doesn't do that, it just copies the pointer.

I see that you are trying to assign the contents of niz2 to niz via niz = niz2;. This is incorrect, since niz is a pointer passed by value and niz2 is a local pointer that points to an array.
If you want to copy niz2 to niz, you need either a loop like for (int i = low; i < high; i++) niz[i] = niz2[i], or an API function like memcpy to overwrite the input array. If instead you're trying to redirect the int* niz pointer to the newly created niz2 array, then you need to pass the input as a pointer-to-pointer and modify it directly, e.g. merge_sort(int** niz, int low...) called via merge_sort(&niz, 0, 20);. Note that if you do modify the input pointer to point to a new array, you should delete the old one first, e.g. delete [] *niz; *niz = niz2;
The statement niz = niz2; copies the address held in niz2 over the temporary niz pointer in the parameter list. When you pass a pointer to a function that receives an int* (such as foo(int* nPtr);), the pointer you send is copied into the temporary variable/pointer nPtr. Using foo(int** nPtr); tells it that it's working with the 'address of a pointer to an int', not just the 'address of an int'. In this case you can redirect nPtr via a statement like *nPtr = &tmpInt or *nPtr = tmpPtr. To get the actual int or data at the source, you'd use tmpInt = **nPtr.
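The difference between the two parameter types can be seen in a few lines. The function names reassign_copy and reassign_through are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Receives a copy of the caller's pointer: the assignment changes only
// the local copy, and the caller never sees it.
void reassign_copy(int *p, int *target)
{
    p = target;
    (void)p; // silence "set but unused" warnings
}

// Receives the address of the caller's pointer: writing through it
// redirects the caller's pointer itself.
void reassign_through(int **p, int *target)
{
    assert(p != nullptr);
    *p = target;
}
```

Given int a = 1, b = 2; int *ptr = &a;, calling reassign_copy(ptr, &b) leaves ptr pointing at a, while reassign_through(&ptr, &b) makes it point at b.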

C++ int array pointers recursively to find prime factors

I am trying to make a function that can return the prime factors of a given number in an array (or multi-set, but I'm trying to use an array).
For example, if I put in 12, I want to get 2, 2, and 3, not just 2 and 3 as with a set. This is so that I can use them to check whether it is a Smith number or not, so I need the numbers separately.
Also, I am taking a recursive approach.
I have tried (to no avail) to return the array many ways, including passing an initial pointer into the code which points to a space to store the array.
I've tried just initializing the array in the function and then returning it.
From what I can tell, I can get the array back from the base case iteration and then when trying to construct a new array with size oldArray+1 to copy values to, things get messy. This is where I get lost.
From what I've read, although this isn't the most efficient implementation, I should be able to make it work.
I have a function, nextPrime(int n), which given n will give back the next prime up from that number.
See source below:
int* find(int n, int p) {
int root = (int) floor(sqrt(n));
if (p > root) {
// Base case, array gets initialized and returned
// depending on value of n and p.
if (n > 1) {
factors = new int[1];
factors[0] = n;
return factors;
}
else {
factors = new int[0];
return factors;
}
}
else
if (n%p == 0){
// Inductive step if p is a factor
int newFloor = (int) floor(n/p);
factors = find(newFloor, p);
// Initialize new array.
int* newFactors;
newFactors = new int[(sizeof(factors) / sizeof(int)) + 1];
// Add p to first slot, fill rest with contents of factors.
factors[0] = p;
for (int i = 0; i < (sizeof(factors) / sizeof(int)); i++) {
newFactors[i+1] = factors[i];
}
return newFactors;
}
else {
// Inductive step p isn't a factor of n
factors = find(n, factors, nextPrime(p));
return factors;
}
}
As I say, the error is with returning the array and using its value, but why does it seem to return OK from the first iteration?

Something like this could work. Not terribly efficient !!
#include <vector>
void FindFactors( int number , std::vector<int>& factors )
{
for ( int i = 2; i <= number; ++i )
{
if ( number % i == 0 )
{
factors.push_back( i );
FindFactors( number / i , factors);
break;
}
}
}
int main()
{
std::vector<int> factors;
FindFactors( 121 , factors );
return 0;
}
After you call the function, factors will contain only the prime factors, with repeats (for 121, that is 11 and 11).

You should be using std::vector for this. The main problem you have is that a pointer to an array has no way of knowing the number of items the array contains. Concretely, the part where you say sizeof(factors) is wrong. As I understand, you're expecting that to give you the number of items in the array pointed to by factors, but it really gives you the number of bytes needed to store a pointer to int.
You should be either returning a vector<int> or passing it in as a reference and updating it each time you find a factor.
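As a rough sketch of the second option, the recursive structure from the question can be kept while the vector keeps track of its own size. The names collect_factors and prime_factors are made up here, and for simplicity the sketch tries every integer p + 1 rather than calling the question's nextPrime, which is not shown.

```cpp
#include <cassert>
#include <vector>

// Append the prime factors of n (with multiplicity) to out, trying
// candidate divisors from p upward.
void collect_factors(int n, int p, std::vector<int> &out)
{
    assert(p >= 2);
    if (n < 2)
        return;                     // nothing left to factor
    if (p > n / p) {                // p * p > n, so n itself is prime
        out.push_back(n);
        return;
    }
    if (n % p == 0) {
        out.push_back(p);           // p divides n: keep p and try it again
        collect_factors(n / p, p, out);
    } else {
        collect_factors(n, p + 1, out);
    }
}

std::vector<int> prime_factors(int n)
{
    std::vector<int> out;
    collect_factors(n, 2, out);
    return out;
}
```

Here prime_factors(12) yields 2, 2, 3, and the number of factors is simply the vector's size(). The recursion depth grows with the square root of n, so an iterative loop may be preferable for large inputs.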

c++ type error message from compiler, what does it mean?

I'm using g++ on fedora linux 13.
I'm just practicing some exercises from my c++ textbook
and can't get this one program to compile. Here is the code:
double *MovieData::calcMed() {
double medianValue;
double *medValPtr = &medianValue;
*medValPtr = (sortArray[numStudents-1] / 2);
return medValPtr;
}
Here is the class declaration:
class MovieData
{
private:
int *students; // students points to int, will be dynamically allocated an array of integers.
int **sortArray; // A pointer that is pointing to an array of pointers.
double average; // Average movies seen by students.
double *median; // Median value of movies seen by students.
int *mode; // Mode value, or most frequent number of movies seen by students.
int numStudents; // Number of students in sample.
int totalMovies; // Total number of movies seen by all students in the sample.
double calcAvg(); // Method which calculates the average number of movies seen.
double *calcMed(); // Method that calculates the mean value of data.
int *calcMode(); // Method that calculates the mode of the data.
int calcTotalMovies(); // Method that calculates the total amount of movies seen.
void selectSort(); // Sort the Data using selection sort algorithm.
public:
MovieData(int num, int movies[]); // constructor
~MovieData(); // destructor
double getAvg() { return average; } // returns the average
double *getMed() { return median; } // returns the mean
int *getMode() { return mode; } // returns the mode
int getNumStudents() { return numStudents; } // returns the number of students in sample
};
Here is my constructor and destructor and selectSort():
MovieData::MovieData(int num, int movies[]) {
numStudents = num;
// Now I will allocate memory for student and sortArray:
if(num > 0) {
students = new int[num];
sortArray = new int*[num];
// The arrays will now be initialized:
for(int index = 0;index < numStudents;index++) {
students[index] = movies[index];
sortArray[index] = &students[index];
}
selectSort(); // sort the elements of sortArray[] that point to the elements of students.
totalMovies = calcTotalMovies();
average = calcAvg();
median = calcMed();
mode = calcMode();
}
}
// Destructor:
// Delete the memory allocated in the constructor.
MovieData::~MovieData() {
if(numStudents > 0) {
delete [] students;
students = 0;
delete [] sortArray;
sortArray = 0;
}
}
// selectSort()
// performs selection sort algorithm on sortArray[],
// an array of pointers. Sorted on the values its
// elements point to.
void MovieData::selectSort() {
int scan, minIndex;
int *minElement;
for(scan = 0;scan < (numStudents - 1);scan++) {
minIndex = scan;
minElement = sortArray[scan];
for(int index = 0;index < numStudents;index++) {
if(*(sortArray[index]) < *minElement) {
minElement = sortArray[index];
minIndex = index;
}
}
sortArray[minIndex] = sortArray[scan];
sortArray[scan] = minElement;
}
}
The compiler is giving this error:
moviedata.cpp: In member function 'double * MovieData::calcMed()':
moviedata.cpp:82: error: invalid operands of types 'int*' and 'double' to binary 'operator/'
I'm not sure what to make of this error. I've tried static casting the types with no luck. What does this error message mean?

You are trying to divide a pointer by a double, which the compiler is saying it does not know how to do.
sortArray is probably defined by
int ** sortArray;
It's also worth noting that you are returning a pointer to a stack variable, whose value will be undefined as soon as you return from the function.

sortArray[numStudents - 1] is a pointer to int, which can't be on the left side of a division (when you remember pointers are addresses, this makes sense). If you post more of your code, we can help you correct it.
Perhaps you want something like:
int *MovieData::calcMed() {
return sortArray[(numStudents - 1) / 2];
}
This returns the middle element in your array, which should be a pointer to the middle student. I'm not clear why you're sorting lists of pointers (not the actual values), or why you're returning a pointer here. The return value + 1 will be a pointer to the next value in students, which is not the next greater value numerically. So you might as well return the actual student (int from students). If you do this, you can also average the two middle elements when the count is even (this rule is part of the typical median algorithm).
Note that I changed the return type to int *, the type of sortArray's elements. Also, your comment is incorrect. This is the median, not the mean.
Also, your selection sort is wrong. The inner loop should start at scan + 1.
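The median rule mentioned above (the middle element, or the average of the two middle elements when the count is even) can be sketched as a free function that returns a double by value, which also sidesteps returning a pointer to a local. The name median_of and the choice to take a std::vector<int> by value are assumptions made for this sketch; they are not part of the MovieData class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty list of values. The vector is taken by value so
// that sorting does not disturb the caller's data.
double median_of(std::vector<int> values)
{
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    if (values.size() % 2 == 1)
        return values[mid];                        // odd count: middle element
    return (values[mid - 1] + values[mid]) / 2.0;  // even count: average of the middle two
}
```

For example, median_of({3, 1, 2}) is 2 and median_of({4, 1, 3, 2}) is 2.5.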

Your code shows a lack of understanding of pointers. You need to do more reading and practice on simpler examples.
More specifically:
double medianValue; creates a double variable. What for? You're apparently going to return a double * and returning a pointer to a local variable is always wrong, because local variables are "recycled" when their function ends.
double *medValPtr = &medianValue; creates a pointer called medValPtr and sets it to the location of medianValue. Well.
Due to the current contents of medValPtr, *medValPtr = (sortArray[numStudents-1] / 2); has the same effect as typing medianValue = (sortArray[numStudents-1] / 2); (supposing it were to compile at all).
Which it doesn't, because sortArray[numStudents-1] is, at a guess, the last item in the array sortArray, and that item happens to be a pointer to something else. You can't divide a pointer (numerically you could, but C++ disallows it because it's always wrong).
Finally you return medValPtr; which is wrong because medValPtr is pointing to a local variable.

You probably want something like:
int *MovieData::calcMed() {
return sortArray[numStudents/2];
}
