PyTorch second derivative stuck between two errors: Buffers have been freed and variable is volatile - gradient

I have a loss function and a list of weight matrices, and I'm trying to compute second derivatives. Here's a code snippet:
loss.backward(retain_graph=True)
grad_params_w=torch.autograd.grad(loss, weight_list,create_graph=True)
for i in range(layers[a]):
    for j in range(layers[a+1]):
        second_der=torch.autograd.grad(grad_params_w[a][i,j], my_weight_list[b], create_graph=True)
The above code works (the second derivative is actually computed in a separate function, but I put it inline here for the sake of brevity). But I am completely confused as to when to use create_graph and retain_graph.
First: if I don't call loss.backward(retain_graph=True), I get error A:
RuntimeError: element 0 of variables tuple is volatile
If I use it, but don't pass any graph argument (create_graph or retain_graph) to the first derivative, I get error B:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
If I specify retain_graph=True, I get error A for the second derivative (i.e. in the for loops), no matter whether I pass create_graph there or not.
Hence, only the above snippet works, but it feels weird that I need loss.backward and all the create_graph arguments. Could somebody clarify this for me? Thanks a lot in advance!

Temporary array creation and routine GEMM

When I run a Fortran code, I get this warning:
Fortran runtime warning: An array temporary was created for argument '_formal_11' of procedure 'zgemm'
related to this part of the code
do iw=w0,w1
   !
   do a=1,nmodes
      Vw(a,:)=V(a,:)*w(iw,:)
   end do
   call zgemm('N', 'C',&
        nmodes, nmodes, nbnd*nbnd, &
        (1.d0,0.0d0),&
        Vw, nmodes, &
        V, nmodes, &
        (0.d0,0.0d0), VwV(iw,:,:), nmodes)
end do
!
If I have understood correctly, the warning is related to passing non-contiguous arrays, which could affect performance. I would like to take care of this. However, it is not clear to me what exactly the problem is here, or what I could do to solve it.
What is going on is that you enabled compiler flags that warn you about temporary array creation at runtime.
Before getting to more explanation, we have to take a closer look at what an array is. An array is an area in memory, together with the information needed to interpret it correctly. That information includes, but is not limited to, the data type of the elements, the number of dimensions, the start index and end index of each dimension, and, most importantly, the gap between two successive elements.
In very simplistic terms, Fortran 77 and earlier do not have a built-in mechanism to pass in the gap between successive elements. So when there is no explicit interface for the called subroutine, the compiler ensures that there is no gap between successive elements by copying the data to a temporary contiguous array. This is a safe mechanism to ensure that the subroutine behaves predictably.
When using modules, Fortran 90 and later use a descriptor to pass that information to the called subroutine; this works hand in hand with assumed-shape array declarations. This is also a simplistic description.
In summary, that is a warning that will be of importance only if the performance is affected as Vladimir said.

What actually happens when you call a function [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm trying to understand a little bit more about inline functions. The problem is that I don't think I understand how 'normal' functions work. Could someone explain:
What happens when you call a function?
The picture above illustrates how I understand a function call. When you call it, the 'thread will go' to the function (going to), will execute (process), and will return (returning). But I'm not sure this is correct, and I would like to know what happens between main passing the thread to the function and the function doing its stuff. Likewise for the other way (returning).
I found something related here, but I did not quite understand it. Thanks!
P.S. A drawing would help!
My guess is that you're looking for a simpler explanation than the one you just read on Quora.
Here it is for you.
In mathematics, a function f(X) is a rule that takes an input X and transforms it into an output (usually called Y).
A function is often mathematically described.
For example, let there be a function f(X) = (8*X + 5)
So for every input, there's gonna be an output.
Like, f(5) will be 45 on computation.
In terms of programming,
This 45 is the returned value (Y)
Whenever you are trying to use a function, you're calling that function for a specific input.
I could get into the details of why a program slows down when you call multiple functions but that was explained in the link you provided. So to keep things simple, imagine if I ask you to calculate something simple once. You'll do it right away. But if I ask you to do a thousand simple calculations, though you'll do it, it'll still take you quite a bit of time. It's not exactly the same with computers, but it's somewhere along the lines.
There are some functions which don't actually return anything at all. In Java they're referred to as void functions.
To understand this, let's take another example.
Let f() be your function, and let it print out 5 lines to your console.
Now whenever you try to call a void function, you can think of it like you've replaced those 5 print statements by one function call.
Say it takes an input now f(X) and it prints out X+18
Since it doesn't return anything, you can't assign f(X) to a variable. But it does print things, so whenever you call it in your code, it basically just executes the statement that prints out (X+18) to your console.
In case this isn't clear enough, say you define a void function g(X) to a block of code.
Now whenever you call it, you can think about replacing g(X) with that block of code.
Hopefully this helps
EDIT:
It seems like you've edited your question, so here's an edited answer for you.
Now, a fair warning - the explanation I'm about to give you is not a technical one. My teacher and one of my colleagues used this explanation for me when I was questioning them about the stack and functions for the first time so it's gonna be in layman terms.
Imagine a huge container (or a box), and let's call it A
The container/box is open from the top.
Any item you put inside the box, goes down to the bottom due to gravity.
Suppose you put an item B inside of the container A. Then you add C and then D.
Since the box is only open from the top, the last item that you put in is the only one you can access first. And only when you take out this item, can you go on to access the item below it. So you'll first have access to D, when you take out D, you'll have access to C, and then finally when you take out C you'll have access to the lowest item B
The container A is basically a stack.
You can think of the container as a box of memory (not physically but figuratively); it's basically just free memory space.
Whenever you invoke the main function, it gets pushed in the box or as we call it now, stack.
The main( ) function is at the bottom of the stack now. You can relate this to the item B from our above example.
Now suppose you have a function f(X)
And within your main( ) function you're calling f(5)
Calling f(5) creates a new scope for the function where the value of the passed in variable (X) is now set to 5
This entire thing - the function definition, any other variables you've created within the function, and the value of the passed in variable - is called a "stack frame". The stack frame will take up some space in our stack/container. This stack frame also corresponds to the next item in the container, and gets "stacked" over the main method. You can think of it as C from the previous example.
Now comes in the part I said about removing items from the container/stack. Removing corresponds to computing and returning the value. Considering the presence of void functions, a better term would be that removing refers to execution of a function. The first item that needs to be removed is C or f(5) and then B or main( )
Now coming to a simple answer to why things slow down when you're calling a lot of functions. There's one simple answer. Your container doesn't have an infinite capacity.
If you keep filling up container A with items B, C, the items will obviously occupy space. Eventually if you go on adding items D, E, F, G, H... there'll be a point when your container is full, after that if you try to put in more items, your container will "overflow"
Basically when you call a lot of functions, your container/stack keeps filling up, eventually leading to a stack overflow.
So calling a function creates a scope of its own, takes all the information related to the function (definition, passed variables, other variables etc), puts it in a stack frame and then stacks the stack frame.
Hopefully it's all clear now... And since this question has been put on hold, if you still have doubts feel free to leave another comment and tag me in it or contact me somehow :)
PS: I've used a lot of layman language here to explain this, so use this as a general way to visualise what's going on, and refer to technical definitions if you're studying this for an exam
EDIT I've edited it to be an example with just one single function f(5) so that it's easier to understand what I'm saying. The earlier one was slightly incorrect as well

Problems with repeatedly calling a member function in which vector summation is implemented

We have a large loop that needs to call this member function repeatedly. We expect the calculation time to be similar in each iteration of the loop. But in our calculation, it gradually becomes much slower as the number of iterations increases. We have found that this is caused by this member function. I don't quite understand why it happens. Can anyone explain it to me, or offer suggestions for this kind of summation calculation for vectors?
Basic implementation of this part of the code:
Dataunit is a class that defines the data structure; several vectors are declared in it and set up in its member functions.
predict is an object pointer defined in the class Residue, which is assigned a value before being used here. Basically, before the respdef() function is called, the data that Y points to are calculated, and a pointer to them can be obtained by calling dataOutput(). Also, numExp and numResp are member variables of Residue that are set in the constructor.
In each step, this function is called to form Yt from the values of Y. For example, if the loop runs for 10000 steps, this function is called 10000 times. The size of Yt is expected to change slightly, depending on the calculation of the data set that Y points to, but it should not change much.
int Residue::respdef()
{
    int m,i;
    Dataunit tem; //defining an object of class Dataunit
    const Dataunit* Y=predict->dataOutput(); //dataOutput() returns a pointer
    //Size of Yt is set to zero and redefined using push_back and initialization
    Yt.clear();
    for(m=0;m!=numExp;m++)
    {
        Yt.push_back(tem);
        //initialization is a function to define and initialize Yt.
        Yt[m].initialization(Y[m].tvector.size(),numResp);
        for (i=0;i!=Y[m].tvector.size();i++)
        {
            Yt[m].tvector[i]=Y[m].tvector[i]; //Copying Y[m].tvector to Yt[m].tvector
            Yt[m].Tvector[i]=Y[m].Tvector[i]; //Copying Y[m].Tvector to Yt[m].Tvector
            Yt[m].resp[0][i]=Y[m].resp[0][i];
            Yt[m].resp[1][i]=Y[m].resp[0][i]+Y[m].resp[1][i];
            Yt[m].resp[2][i]=Y[m].resp[0][i]+Y[m].resp[1][i]+Y[m].resp[2][i];
            Yt[m].resp[3][i]=2*Y[m].resp[4][i];
        }
    }
    return 1;
}
Thanks, guys, for helping me here. The problem is not in this part of the code. I just realized that I had changed the function that calculates the data Y points to by adding a push_back in order to avoid incomplete data. So the size of Y increases in every loop iteration, which makes Yt grow here as well. I need to resize Y in every step. Thank you.
This is likely caused by calling push_back on an unreserved vector. std::vector stores all its objects in contiguous memory on the heap. It starts off by allocating memory based on the initial need (the capacity typically grows geometrically, for example by doubling) and keeps reallocating more when needed. However, there is no way of asking the heap to move the squatters sitting right where your vector's current memory allocation ends and give that space to you instead. Instead, the vector asks the heap for a block of memory large enough for the new capacity it wants and moves (if possible, copies otherwise) all its underlying objects from the old location to the new one. Thus, starting from an empty vector, populating it with n objects involves repeated reallocations and copies on top of the n insertions themselves. If you have an idea of how big your vector is going to be, you can help it out by calling std::vector::reserve. This will help minimize the number of reallocations and reduce the associated overhead.
Well, it could be a lot of things. This is really lousy code that needs a lot of clean up. Where did "predict" and Yt come from? Are they member variables, or globals? What about numExp and numResp? Are they constants, member variables or globals? Since the code is pretty poorly written and not very descriptive, it is hard to make much sense of it from a small snippet. There could be all kinds of crazy operator overloading going on that causes the vectors to grow, etc. It is also not clear what other outside factors may be involved.
The best option would be to use a profiler to start with. If you don't have one, you might set some breakpoints with hit counts and look at the vectors to see if they are growing over subsequent calls. Finally, you could count loop iterations and log the totals at the end to see if the number of iterations is increasing dramatically over time. If both numExp and Y[m].tvector.size() continue to increase, then the time spent in this method will quickly worsen.

Modify dependent variable in stiff solver (vode)

I am using the dvode ODE solver from netlib to solve a stiff sparse system (the application is atmospheric chemistry). On the first call, the subroutine dvode completes a set of initialisation tasks and takes the array of initial values of the dependent variable y as input. In subsequent calls, the routine performs the actual integration and the array y is used as output only.
For various reasons, I need to modify one element of the dependent array y during the integration. As y is used as output for all but the first call to dvode, modifications to the input values of y are ignored. It appears the relevant data are stored in a workspace array.
Is there any way to coerce dvode to let me change the value of the dependent array during the integration? I don't want to mess with the internals of the solver, and if possible I want to avoid altering the workspace arrays, since there may be all kinds of dependencies that will be difficult to foresee. I have tried alternating between initialisation and integration calls, but this makes things much slower.
If there is no clear solution, I would also consider trying another (Fortran-compatible) solver for stiff, highly non-linear ODEs.

c++ different error depends on whether or not I'm printing a variable

Preface
This is probably a bad question, but I'm truly hopeless here. I know that my question is a bad one since it is highly specific to my problem, but I'll try to describe the problem in the best and most general way I can...
Some background
In my code, I have a data structure named bead which holds, among others, an integer variable named LID. I also have a data structure named lipid which holds, among others, three pointers to (three different) beads. The lipids are held in a vector<lipid*> variable named lipids, and a bead's LID variable is equal to the position of the corresponding lipid in lipids, plus 1.
During my (Monte Carlo, MC) simulation, I choose a random number between 0 and 999 and change the corresponding lipid. I then test the change using a function named calcEnergy, which accepts the lipid's bead pointers one at a time as its input (named mb).
The problem
I got a bad allocation error after picking lipids position number 261. I tried to find out why that happens, so I typed:
if(mb->LID ==261){
    printf("something");
}
When I tried this, the simulation stopped (segmentation fault) when the program picked lipids position number 132. So I tried to see why that is and printed the LID of each bead I sent to the screen using printf(). Now the program survives lipid 132 and crashes on the next lipid.
I am absolutely clueless as to why the program crashes when I try to read a variable which I know is valid.
Once again, I know this is a long shot, but I have no other idea...
Thanks.
I would guess that a function is taking in a lipid * and overwriting one of its bead pointers, so that later, it reads an invalid pointer which contains invalid bead data.
Try editing your calcEnergy function to not do anything, just take in the pointer and return. See if that stops the bug. If it does, add one line of code at a time until you find the line that causes the error.