Getting the error "missing 1 required positional argument: 'row'" when using DataFrame.apply()

I am trying to improve the performance of my stock order placer algorithm (thousands of lines) by switching from iterrows() to apply(), but I am getting an error:
TypeError: ("place_orders() missing 1 required positional argument: 'row'", 'occurred at index 2008-01-14 00:00:00')
Below is an example of the orders file I am reading in (short list for simplicity):
Next, below is my code: both my attempt at implementing apply() and the slower iterrows() version.
I apologize if this is a newbie question, but I need to use the index and the rows inside the function, as the index is a bunch of dates.
Update: Below is an example of my prices_table.

When switching from iterrows to apply you need to change your mindset a little bit. Instead of looping over the dataframe and taking every row from top to bottom, you just specify what you want to happen in every row.
So when using apply it's usually a good idea to let go of row numbers (in your case i). With axis=1, apply passes only the row to your function, which is why a function that also expects i fails with the missing-argument error; the index label of the row is still available as row.name. Try using a function like this in your apply:
orders_df.apply(lambda row: place_orders(row), axis=1)
I realize that inside your place_orders function you are using specific (sets of) rows of the prices_table. To overcome this part you might want to merge the dataframes before calling apply, since apply is not really intended to work on multiple dataframes at once.
This forces you to rewrite some of your code, but in my experience the performance increase you gain from not using iterrows is always worth it.
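As a rough sketch of the merge-then-apply idea: the frames below are made-up stand-ins, since the real orders file and prices_table are not shown here. The column names (Symbol, Order, Shares, Price) and the body of place_orders are assumptions for illustration only, not the original code.

```python
import pandas as pd

# Hypothetical stand-in for the orders file (dates as the index).
orders_df = pd.DataFrame(
    {"Symbol": ["AAPL", "IBM", "AAPL"],
     "Order": ["BUY", "BUY", "SELL"],
     "Shares": [100, 50, 40]},
    index=pd.to_datetime(["2008-01-14", "2008-01-14", "2008-01-15"]),
)
orders_df.index.name = "Date"

# Hypothetical stand-in for prices_table: one column per symbol.
prices_table = pd.DataFrame(
    {"AAPL": [10.0, 11.0], "IBM": [20.0, 21.0]},
    index=pd.to_datetime(["2008-01-14", "2008-01-15"]),
)
prices_table.index.name = "Date"

# Reshape prices to one row per (Date, Symbol) so they can be merged onto orders.
prices_long = prices_table.reset_index().melt(
    id_vars="Date", var_name="Symbol", value_name="Price"
)
merged = (
    orders_df.reset_index()
    .merge(prices_long, on=["Date", "Symbol"], how="left")
    .set_index("Date")
)


def place_orders(row):
    """Assumed logic: signed cash value of one order; takes only the row."""
    sign = 1 if row["Order"] == "BUY" else -1
    return sign * row["Shares"] * row["Price"]


# axis=1 hands each row to the function as a Series.
cash_flow = merged.apply(place_orders, axis=1)

# The index label (here the date) is available inside the function as row.name.
order_dates = merged.apply(lambda row: row.name, axis=1)
```

Because every row now carries its own price, the function no longer needs to look anything up in a second dataframe.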

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
As soon as I enter a type of work in cell 1, I want the adjacent cell 2 to be populated automatically with a significance value (from a range of 8 values) that I have predefined.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically get matched with its relevant significance value & automatically get populated in real-time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or use the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet, stating the types of work and their significance. From that sheet, I can use either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

How to correctly use the Table.Repeat function?

I'm having some trouble getting the Table.Repeat function to work properly... I'm very new to PowerQuery/BI, so am just about getting my head wrapped around all the coding.
Following the syntax, it would appear everything is correct, given that the addition of columns is optional.
What I am aiming to achieve is to have the entire table repeated a specific number of times, and the repeat function described here sounds like it fits the bill. But when I have attempted to implement it, it results in an error.
I was previously using the Append function; however, as I'm trying to append the query several thousand times, this results in the query crashing Excel, and it has become uneditable after the initial setup.
I've tried implementing the Repeat code halfway through the query, where it's needed; and on a new sheet. Halfway through gave me an error stating: that it could not find a Value and that a table.
When I tried it on a new sheet, though I did not get an error, the applied steps disappeared and the data wasn't repeated. I also tried repeating the table far fewer times than I needed, just to test it out, but this still resulted in an error.
#"Repeat" = Table.Repeat(#"8-1", 2)
Essentially the entire table repeated X number of times.

PyMC3 UserWarning: what does reparameterize mean?

I built a pymc3 model using the DensityDist distribution. I have four parameters, of which three use Metropolis and one uses NUTS (this is chosen automatically by pymc3). However, I get two different UserWarnings:
1. Chain 0 contains number of diverging samples after tuning. If increasing target_accept does not help try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples I used 'random_seed', 'discard_tuned_samples', 'step = pm.NUTS(target_accept=0.95)' and so on and got rid of these user warnings. But I couldn't find details of how these parameter values are being decided. I am sure this might have been discussed in various context but I am unable to find solid documentation for this. I was doing a trial and error method as below.
with patten_study:
    #SEED = 61290425 #51290425
    step = pm.NUTS(target_accept=0.95)
    trace = sample(step = step)#4000,tune = 10000,step =step,discard_tuned_samples=False)#,random_seed=SEED)
I need to run these on different datasets. Hence I am struggling to fix these parameter values for each dataset I am using. Is there any way where I give these values or find the outcome (if there are any user warnings and then try other values) and run it in a loop?
Pardon me if I am asking something stupid!
In this context, reparameterization basically means finding a different but equivalent model that is easier to compute. There are many things you can do depending on the details of your model:
Instead of using a Uniform distribution you can use a Normal distribution with a large variance.
Changing from a centered hierarchical model to a non-centered one.
Replacing a Gaussian with a Student-T.
Modeling a discrete variable as a continuous one.
Marginalizing variables, like in this example.
Whether these changes make sense or not is something that you should decide, based on your knowledge of the model and problem.
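To make the centered versus non-centered item a bit more concrete, here is a minimal sketch using only the Python standard library rather than pymc3. The function names and the values of mu and sigma are made up for illustration. It only shows that both forms describe the same distribution; in a real sampler the non-centered form tends to help because the sampled variable z no longer depends on the scale sigma.

```python
import random
import statistics


def centered_draw(rng, mu, sigma):
    """Centered form: draw x directly from Normal(mu, sigma)."""
    return rng.gauss(mu, sigma)


def non_centered_draw(rng, mu, sigma):
    """Non-centered form: draw a standard normal z, then shift and scale it."""
    z = rng.gauss(0.0, 1.0)
    return mu + sigma * z


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu, sigma = 2.0, 3.0    # illustrative values only

centered = [centered_draw(rng, mu, sigma) for _ in range(20000)]
non_centered = [non_centered_draw(rng, mu, sigma) for _ in range(20000)]

# Both samples should have roughly the same mean and standard deviation.
print(statistics.mean(centered), statistics.stdev(centered))
print(statistics.mean(non_centered), statistics.stdev(non_centered))
```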

Use the function "mod" in the instructions "if" and "select case"

I wrote a little code in Fortran. But the code doesn't behave as I thought, and I can't figure out where the problem is.
I will not put the code here because it has 1200 lines, but here is its philosophy:
I create a 3D grid represented by a four-dimensional array (I store a vector of 2 elements at each point of the grid, corresponding to the nature of the site and to who is occupying the site). This grid represents what we call a crystal (where atoms can be found periodically).
When this grid is constructed, the code scans each point of the grid and looks at the neighboring sites to count the different types of atoms or the vacancies.
For this last point, I use a triple nested loop which lets me explore the different sites, and I check the different neighboring sites using either the if or the select case instructions. As I want my grid to be periodic, I use the function mod in the argument of the if or the select case.
The problem is that sometimes it finds a different element in a neighboring site than the actual element in this specific neighboring site. As an example:
In the two output files where all the coordinates are written with the element type, I have grid(0,0,1)=-1 (which corresponds to an empty site).
But while the code is looking at the neighboring sites of grid(0,0,1), it tells me that there is actually an element indexed 2 in grid(0,0,1).
I looked carefully at the block in the triple nested loop, but it seems fine.
I would like to know if anyone has already met this kind of problem, or knows whether there are problems with using mod in an if or select case argument?
If some of you want to look closer, I can send you the code, with some explanations.
Arrays are usually dimensioned as:
REAL(KIND=8),DIMENSION(0:N) :: A
or
REAL(KIND=8),DIMENSION(N) :: A
In the latter example, they are assumed to start at 1.
You could also go (-N:N) or (10:191)
If you use the compiler switch '-check bounds' or '-check all' you will see if you are going outside the array/etc. This is not an uncommon thing to get hosed up, but the compiler will abort quickly when the index is outside the dimension.
Once it works, remove the -check bounds and/or -check all.
Thanks for your consideration francescalus and haraldkl.
It was not related to the dimension of arrays, Holmz, but thank you for trying to help.
It seems I finally succeeded in fixing it. I will post another answer if I fully understand why it was not working properly.
Apparently, it was related to the combination of a different argument order in a procedure call and the subroutine header, plus a declaration in the subroutine with intent(inout).
It was as if the intent(inout) was masking the problem. But it is a bit strange to me.
Some explanations about the code :
As I said, the code creates a 3D grid where each intersection of the grid corresponds to a crystallographic site. I assign a value to each site: -1 for an empty site, 1 for a crystal atom (0 if there is a vacancy instead of a crystal atom), and 2,3,4,5 for different impurities. Actually, the empty sites and the sites which receive crystal atoms are not of the same type; that's why an empty site and a vacancy are distinguished. The impurities can only occupy the empty sites and are forbidden from occupying a crystal site.
The aim of the code is to explore the configurational space of the system, in other words all the possible distributions we can obtain with the different elements. To do so I start from an initial configuration, choose two sites at random (respecting the rules of occupation), and virtually switch them. I calculate the energy of the old and new configurations; if the new one has a lower energy I keep it, if not, I keep the old one. The calculation of the energy is based on knowing the environment of each vacancy and impurity, so we need to know their neighbors. And I repeat the whole procedure again and again to converge to the most stable (so the most probable) configuration.
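For readers who want to see the shape of that procedure, here is a small Python sketch of the swap-and-compare loop on a periodic cubic grid. It is a toy under loud assumptions: only two species, no occupation rules, and a made-up energy that simply counts unlike nearest-neighbor pairs; none of it is taken from the 1200-line Fortran code. The periodic wrap is done with Python's %, which always returns a non-negative result for a positive grid size, like Fortran's modulo; Fortran's mod(-1, n) returns -1 instead, which is worth keeping in mind when wrapping indices below zero.

```python
import random


def neighbors(site, n):
    """Yield the six nearest neighbours of a site on an n*n*n periodic grid."""
    x, y, z = site
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        # % wraps indices that fall off either edge back into 0..n-1.
        yield ((x + dx) % n, (y + dy) % n, (z + dz) % n)


def total_energy(grid, n):
    """Toy energy (assumption): number of unlike nearest-neighbour pairs."""
    unlike = sum(grid[nb] != species
                 for site, species in grid.items()
                 for nb in neighbors(site, n))
    return unlike // 2  # each pair was counted from both ends (needs n >= 3)


def relax(grid, n, steps, rng):
    """Swap two random sites; keep the swap only if the energy goes down."""
    sites = list(grid)
    energy = total_energy(grid, n)
    for _ in range(steps):
        a, b = rng.sample(sites, 2)
        grid[a], grid[b] = grid[b], grid[a]
        new_energy = total_energy(grid, n)
        if new_energy < energy:
            energy = new_energy                  # keep the new configuration
        else:
            grid[a], grid[b] = grid[b], grid[a]  # revert to the old one
    return energy


n = 4
rng = random.Random(1)
species = [1] * 32 + [2] * 32  # two made-up species, half the sites each
rng.shuffle(species)
grid = {(x, y, z): species[(x * n + y) * n + z]
        for x in range(n) for y in range(n) for z in range(n)}

e_start = total_energy(grid, n)
e_end = relax(grid, n, steps=500, rng=rng)
print(e_start, e_end)
```

Recomputing the full energy after every swap keeps the sketch short; a real code would only re-evaluate the neighborhoods of the two swapped sites.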
The next step is to include the temperature effect, and to add the second type of empty sites.
Have a nice day,
M.

Apply a function to a range of cells in a spreadsheet

The answers in topics with similar titles haven't given me much of a resolution to my particular problem, but possibly I am not asking the right question. It might help to know that I'm an absolute noob when it comes to spreadsheets, so my ability to find my way around is next to nil.
Currently I can set a basic function in the first cell A1 =ROW()
Simple right? Well now here comes the complication. If I click on the bottom right of the cell and start dragging I can then apply that very same function to a whole range of cells. Let's say I apply it from A1:A10. Every cell within this group now has the same function.
Hooray! We did it, right? I applied a function to a range of cells each with their own output. But wait, if I then go back to the original cell and change its formula none of the other cells change with it. GRRRRR!!!!
There are a couple of fixes I've come up with but don't necessarily know how to implement. The first is to have every cell link back to the original cell and reference its function. This would be useful if I wanted to randomly scatter dependent cells about the document. The other would be much more useful in an orderly group where you know the exact dimensions by specifying in the original cell the size of the array you want to apply the function to.
With that said, let me hear your thoughts.
The closest I've come to an answer is to use FORMULA() which returns the formula used by a cell as text. Unfortunately all answers on evaluating the text resort to scripting. How strange! I thought something like this would be common. Might as well get to scripting.
Hold on, I may have spoken too soon. An array can be made with =MUNIT(), but it's only square. Drats!
Ok... I'm hoping the zebra stripes will eventually become its own answer unless someone else beats me to it. So a simple array can be made with ={1,2;3,4} where commas separate values by column and semicolons for values by row except to generate it you have to press Control+Shift+Enter (because reasons?). I'm thinking now that I'll need to have functions that can generate lists of values based on a single function for each row, and pray that it'll work. So, back to looking. (Wow this is taking forever)
The way I was hypothesizing can't even generate a 1x1, e.g., ={ROW()} returns Err:512 which is a formula overflow.
Alright, in summary so far I've narrowed down the two options,
1) link every cell to the original formula
2) populate an array with a single formula
each with their own incomplete answer,
a) use FORMULA() to return the formula of a cell as text
b) create a hypothetical array like so ={LIST_OF_VALUES()}
These both require a strange form of the nonexistent EVALUATE() function to 'function' correctly. Isn't that fun?
Google Sheets handles case b by allowing ={ROW()} followed by Control+Shift+Enter to generate =ArrayFormula({ROW()}). The general case of an array of any size being filled by a single function doesn't seem to exist in the world of spreadsheets. That's very saddening because I can't think of a much better tool for what I want to do. Copy paste it is until I need to use macros.
Depending on your specific use case, creating a user-defined function may help:
use the Basic IDE to create your function;
apply it to any cells on any sheet;
modifying the Basic code will affect all cells where the function is used.
I've elaborated the steps in an answer on superuser.
Sure, you could write some complex code to update functions, but wouldn't the easy way be just to drag it to the same range of cells the same way you did before? It should properly overwrite the existing code in there, and if it doesn't, you can just as easily delete the outdated code and drag the new code in.
Probably the best approach is to simply drag the amended formula over the range of cells (as advised by OldBunny2800). This is less error prone and easier to maintain than a custom macro.
Another option would be to use an array function. Then you only have to edit the function once, and the same edit will be automatically applied to the whole range of cells in that array function.