checking a sequence of rows for a condition - row

I have a dataframe, which looks kinda like this:
Day of Year  Standard Deviation
7.7.2022     1.78
8.7.2022     2.97
9.7.2022     1.74
10.7.2022    1.89
11.7.2022    2.22
12.7.2022    0.78
13.7.2022    1.43
14.7.2022    0.98
15.7.2022    1.32
I calculated the standard deviation for different days. Now I want to check whether the calculated standard deviation is below a specific value (let's say 1.56) for three days in a row. If so, I'd like to create a new column that contains True where the condition is fulfilled and False where it is not.
I want something like this:
Day of Year  Standard Deviation  Snow
7.7.2022     1.78                False
8.7.2022     2.97                False
9.7.2022     1.74                False
10.7.2022    1.89                False
11.7.2022    2.22                False
12.7.2022    0.78                True
13.7.2022    1.43                True
14.7.2022    0.98                True
15.7.2022    1.32                True
I am quite new to R and, as you can see, I still struggle a bit! I tried different approaches (e.g. for loops and if statements), but nothing has worked so far.
It would be a huge help to know at least how to check three rows for a condition.
Thank you for your support!
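The question is about R, but the underlying logic is language-independent, so here is a minimal sketch in Python of one way to flag runs. It groups consecutive values by whether they are below the threshold and marks a whole group True only when it is long enough. The function name `mark_runs` and its parameters are made up for this illustration; in R, base `rle()` gives the same kind of run-length grouping.

```python
from itertools import groupby

def mark_runs(values, threshold=1.56, min_run=3):
    """Return a list of booleans, True where a value belongs to a run of
    at least `min_run` consecutive values that are all below `threshold`."""
    flags = []
    for below, run in groupby(values, key=lambda v: v < threshold):
        run = list(run)
        # A run counts only if it is below the threshold and long enough.
        flags.extend([below and len(run) >= min_run] * len(run))
    return flags

# Standard deviations from the table in the question, in date order.
std_dev = [1.78, 2.97, 1.74, 1.89, 2.22, 0.78, 1.43, 0.98, 1.32]
snow = mark_runs(std_dev)
print(snow)
```

With the question's data this marks the last four days (12.7. to 15.7.) as True and everything before as False, which matches the desired Snow column.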

Related

SAS: Putting observations in bin and keep the ones closest to it

I have a list of observations with a few variables. I need to put them in a bin (below) and only keep one observation in each bin which is closest to the bin's number:
Bins
0.94
0.96
0.98
1.00
1.02
1.04
1.06
Data
Variable Price Value_to_bin Closest bin
a 0.630527682 0.935 0.94
b 0.441296291 0.979 0.98
c 0.350173415 0.969
d 0.920932417 0.993
e 0.361863025 0.959 0.96
f 0.027205755 1.003 1
g 0.878286791 1.045
h 0.206434946 0.971
i 0.259272294 1.021 1.02
j 0.081774863 0.982
k 0.01146324 0.992
l 0.283027273 1.037 1.04
m 0.188747537 0.993
n 0.554786 1.064 1.06
o 0.784774 1.065
And then just keep the ones that are closest to the bin value (i.e. delete the ones that have blanks in the 'closest_bin' variable).
I tried to use proc rank but I can't get rid of the rest or match with the bin (something like 'closest' doesn't exist as far as I know).
SAS SQL with automatic remerging can perform the query quite succinctly. The consistent binning to a 0.02 level allows the ROUND function to compute the bin values to the nearest 0.02 unit.
proc sql;
  create table want as
  select
    var,
    price,
    value,
    round(value,0.02) as valbin_02
  from have
  group by valbin_02
  having abs(valbin_02-value) = min(abs(valbin_02-value))
  ;
quit;
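To make the logic of the query easier to follow, here is a rough Python sketch of the same idea: round each value to the nearest 0.02 to get its bin, then keep only the row with the smallest distance to that bin. The function name `keep_closest_per_bin` is invented for this illustration, and the data is the Variable and Value_to_bin columns from the question.

```python
def keep_closest_per_bin(rows, step=0.02):
    """rows: iterable of (name, value) pairs.
    Returns {bin_value: (name, value)}, keeping for each bin the row
    whose value lies closest to the bin value."""
    best = {}
    for name, value in rows:
        idx = round(value / step)        # integer bin index avoids float keys
        dist = abs(value - idx * step)   # distance to the bin value
        if idx not in best or dist < best[idx][0]:
            best[idx] = (dist, name, value)
    # Rounding to 2 decimals assumes a two-decimal step such as 0.02.
    return {round(idx * step, 2): (name, value)
            for idx, (dist, name, value) in sorted(best.items())}

data = [("a", 0.935), ("b", 0.979), ("c", 0.969), ("d", 0.993), ("e", 0.959),
        ("f", 1.003), ("g", 1.045), ("h", 0.971), ("i", 1.021), ("j", 0.982),
        ("k", 0.992), ("l", 1.037), ("m", 0.993), ("n", 1.064), ("o", 1.065)]

kept = keep_closest_per_bin(data)
for bin_value, (name, value) in kept.items():
    print(bin_value, name, value)
```

The rows that survive are a, e, b, f, i, l and n, the same rows that have a "Closest bin" entry in the question's table.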

Google Sheet Function with IF statement to add 1/0 column

I want to query a number of rows from one sheet into another sheet, and to the right of these rows add a column based on one of the queried columns. Meaning that if column C is "Il", I want to add a column to show 0, otherwise 1 (the samples below will make it clearer).
I have tried doing this with Query and Arrayformula, without query, with Filter and importrange. An example of what I tried:
=query(Data!A1:AG,"Select D, E, J, E-J, Q, AG " & IF(AG="Il",0, 1),1)
Raw data sample:
Captured Amount Fee Country
TRUE 336 10.04 NZ
TRUE 37 1.37 GB
TRUE 150 4.65 US
TRUE 45 1.61 US
TRUE 20 0.88 IL
What I would want as a result:
Amount Fee Country Sort
336 10.04 NZ 1
37 1.37 GB 1
150 4.65 US 1
45 1.61 US 1
20 0.88 IL 0
Try it like this:
=ARRAYFORMULA(QUERY({Data!A1:Q, {"Sort"; IF(Data!AG2:AG="IL", 0, 1)}},
"select Col4,Col5,Col9,Col5-Col9,Col17,Col18 label Col5-Col9''", 1))

Netlogo: runtime-error with list, item, and -1

I have a rather specific error in netlogo that I've been staring at for a while now. Hope you guys have some insight.
The error is in code that looks back in a list called 'strategy'. If the list is longer than investment-time, the variables 'REfocus' and 'PRICE' are set to a certain value. If the list is not longer than investment-time, the variables are not set (and thus remain 0).
The code consists of a procedure strategy_actions and a reporter investment_time. Investment-time is approximately 3 years, but as ticks are in months, investment-time is rescaled to months. In strategy_actions, investment-time is scaled back to years, as each entry in the strategy list is also annual. (The scaling and rescaling seems arbitrary, but as investment-time is used a lot by other parts of the code, it made more sense to do it like this.) The goal is to take the strategy from x time back (equal to investment-time).
The code (error follows underneath):
to strategy_actions
  set_ROI
  start_supply?
  if current_strategy != 0
  [
    let it (investment_time / 12)
    ifelse it >= length strategy
    [
      set REfocus 0
    ]
    [
      if item (it - 1) strategy = 1
      [
        if supply? = true [set_PRICE (set_discrete_distribution 0.29 0.19 0.29 0.15 0.07 0 0) (set_discrete_distribution 0.14 0.12 0.25 0.25 0.25 0 0)]
        ifelse any? ids [set REfocus mean [mot_RE] of ids][set REfocus set_discrete_distribution 0.07 0.03 0.07 0.17 0.66 0 0]
      ]
      if item (it - 1) strategy = 2
      [
        if supply? = true [set_PRICE (set_discrete_distribution 0.27 0.21 0.32 0.11 0.09 0 0) (set_discrete_distribution 0.15 0.11 0.22 0.30 0.23 0 0)]
        ifelse any? prods [set REfocus mean [mot_RE] of prods][set REfocus set_discrete_distribution 0.12 0.03 0.10 0.18 0.57 0 0]
      ]
      if item (it - 1) strategy = 3
      [
        if supply? = true [set_PRICE (set_discrete_distribution 0.26 0.22 0.26 0.18 0.09 0 0) (set_discrete_distribution 0.07 0.08 0.19 0.30 0.35 0 0)]
        ifelse any? cons[set REfocus mean [mot_RE] of cons][set REfocus set_discrete_distribution 0.08 0.06 0.15 0.27 0.45 0 0]
      ]
    ]
    set RE_history fput REfocus RE_history
  ]
end

to-report investment_time
  report ((random-normal 3 1) * 12) ;approximately 3 years investment time
end
Somehow, I sometimes get this runtime error during my BehaviorSpace experiment:
-1 isn't greater than or equal to zero.
error while observer running ITEM
called by procedure STRATEGY_ACTIONS
called by procedure SET_MEETING_ACTIONS
called by procedure GO
Does anyone know what causes this error?
You would help me out a lot!
Cheers,
Maria
It appears that investment_time is occasionally coming in as zero or negative (random-normal 3 1 will produce such a draw now and then), so you are asking for item (0 - 1) of the strategy list. I did a bit of playing around with item and learned (to my surprise) that item (0.0001 - 1) strategy works just fine, yielding the 0th item in the list in spite of the argument being negative. But item (0 - 1) strategy does give the error you cite. Apparently an item number greater than -1 is interpreted as zero. Indeed, item seems to truncate any fractional argument rather than rounding it. E.g., item 0.9 is interpreted as item 0, as is item -0.9.
That might be worth putting in the documentation.
HTH,
Charles
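To illustrate the explanation above, here is a small Python sketch (not NetLogo) that mimics the truncating behaviour described in the answer and estimates how often a normal draw with mean 3 and standard deviation 1 lands at or below zero. The function names are invented for this illustration, and `safe_index` is only one possible guard, assuming that falling back to item 0 is acceptable in the model.

```python
import random

def truncated_index(it):
    """Index that `item (it - 1)` would use if the argument is truncated
    toward zero, as described in the answer. Raises on a negative index."""
    idx = int(it - 1)  # int() truncates toward zero, e.g. int(-0.9) == 0
    if idx < 0:
        raise IndexError("%d isn't greater than or equal to zero." % idx)
    return idx

def safe_index(it):
    """Hypothetical guard: clamp the look-back index at 0."""
    return max(0, int(it - 1))

def fraction_failing(draws=100000, seed=1):
    """Share of normal(3, 1) draws that are <= 0 and would trigger the error."""
    rng = random.Random(seed)
    bad = sum(1 for _ in range(draws) if rng.gauss(3, 1) <= 0)
    return bad / draws

print(truncated_index(0.0001))  # fractional argument just above -1 maps to 0
print(safe_index(-0.5))         # clamped instead of raising
print(fraction_failing())       # small but non-zero share of failing draws
```

The failing share is small, which fits the observation that the error only shows up sometimes during long BehaviorSpace runs.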

Half-Life Determination

Here is the problem I am working on:
You are to develop a menu-driven program that will allow the analyses of data from the file Patient_Data.txt using the following equations:
Half-Life Equations
Ct = C0 * e^(-k*t)
t½ = ln(2) / k
where:
Ct is the concentration in ug/L at time t
C0 is the initial concentration in ug/L
t is the time in hrs
k is the time constant (1/hrs)
t½ is the half-life in hrs
The user of the program must be able to obtain the average half-life (to 2 decimal places) along with the number of measurements used to calculate the average for any of the 5 patients for which data has been collected.
The program must also be able to display the 2 patient numbers and averages of the patients that have the highest half-life average values.
A menu must be used to select the different options with an additional option for Exit. The program must run until exit is selected by the user.
The program must be designed using functions.
A function called analyzeData must take as input the patient number and must return both the average half-life and the number of measurements in the average for the input patient number.
A separate function called halfLife is to be used for calculating the t½ (half-life) based on C0 (initial concentration), Ct (concentration at time t) and t (time) that are in the data file.
A third function called highest2halfLifes must also be used to determine the two patients with the longest average half-life from the five different patients. All four values (patient1, halfLife1, patient2, halfLife2) must be returned to the main function.
The following data file Patient_Data.txt lists the patient number followed by values for C0, Ct, and t, respectively (Patient Data)
1 325 160 2.0
1 600 100 6.2
2 325 220 1.0
3 600 200 4.4
4 325 100 3.0
4 325 88 3.2
2 600 200 3.3
2 325 100 3.3
4 600 210 3.4
5 325 105 3.5
1 600 110 6.0
3 325 100 3.1
2 600 120 5.5
2 600 125 5.5
5 120 60 2.2
2 325 100 3.4
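As a starting point, here is a minimal sketch in Python of the three required functions. It is only one possible reading of the assignment: the data is embedded as a string instead of being read from Patient_Data.txt, the helper `load_records` and the optional `records` parameter are additions for this illustration, and the menu loop is left out. Solving Ct = C0 * e^(-k*t) for k gives k = ln(C0/Ct) / t, which is then used in t½ = ln(2) / k.

```python
import math
from collections import defaultdict

# Rows copied from the listing above: patient number, C0, Ct, t.
PATIENT_DATA = """\
1 325 160 2.0
1 600 100 6.2
2 325 220 1.0
3 600 200 4.4
4 325 100 3.0
4 325 88 3.2
2 600 200 3.3
2 325 100 3.3
4 600 210 3.4
5 325 105 3.5
1 600 110 6.0
3 325 100 3.1
2 600 120 5.5
2 600 125 5.5
5 120 60 2.2
2 325 100 3.4
"""

def halfLife(c0, ct, t):
    """Half-life in hours from one measurement."""
    k = math.log(c0 / ct) / t  # time constant in 1/hrs
    return math.log(2) / k

def load_records(text=PATIENT_DATA):
    """Group the measurements by patient number (hypothetical helper)."""
    records = defaultdict(list)
    for line in text.splitlines():
        patient, c0, ct, t = line.split()
        records[int(patient)].append((float(c0), float(ct), float(t)))
    return records

def analyzeData(patient, records=None):
    """Return (average half-life, number of measurements) for one patient."""
    records = load_records() if records is None else records
    values = [halfLife(*m) for m in records.get(patient, [])]
    if not values:
        return 0.0, 0
    return sum(values) / len(values), len(values)

def highest2halfLifes(records=None):
    """Return (patient1, halfLife1, patient2, halfLife2) for the two
    patients with the longest average half-life."""
    records = load_records() if records is None else records
    ranked = sorted(((analyzeData(p, records)[0], p) for p in records),
                    reverse=True)
    (h1, p1), (h2, p2) = ranked[:2]
    return p1, h1, p2, h2

avg, n = analyzeData(2)
print("Patient 2: %.2f hrs over %d measurements" % (avg, n))
print(highest2halfLifes())
```

A menu would then just call `analyzeData` or `highest2halfLifes` in a loop until the user picks Exit.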

Python 2.7 Pandas: How to replace a for-loop?

I have a large pandas dataframe with 2000 rows (one date per row) and 2000 columns (1 second intervals). Each cell represents a temperature reading.
Starting with the 5th row, I need to go back 5 rows and find all the observations where the 1st column in the row is higher than the 2nd column in the row.
For the 5th row I may find 2 such observations. I then want to do summary stats on the observations and append those summary stats to a list.
Then I go to the 6th row, go back 5 rows, and find all the obs where the 1st column is higher than the 2nd column. I get all obs, do summary stats on them, and append the results to the new dataframe.
So for each row in the dataframe, I want to go back 5 days, get the obs, get the stats, and append the stats to a dataframe.
The problem is that if I perform this operation on rows 5-2000, then I will have a for-loop that is 1995 cycles long, and this takes a while.
What is the better or best way to do this?
Here is the code:
print huge_dataframe
sec_1 sec_2 sec_3 sec_4 sec_5
2013_12_27 0.05 0.12 0.06 0.15 0.14
2013_12_28 0.06 0.32 0.56 0.14 0.17
2013_12_29 0.07 0.52 0.36 0.13 0.13
2013_12_30 0.02 0.12 0.16 0.55 0.12
2013_12_31 0.06 0.30 0.06 0.14 0.01
2014_01_01 0.05 0.12 0.06 0.15 0.14
2014_01_02 0.06 0.32 0.56 0.14 0.17
2014_01_03 0.07 0.52 0.36 0.13 0.13
2014_01_04 0.02 0.12 0.16 0.55 0.12
2014_01_05 0.06 0.30 0.06 0.14 0.01
for each row in huge_dataframe.ix[5:]:
    move = row[sec_1] - row[sec_2]
    if move < 0: move = 'DOWN'
    elif move > 0: move = 'UP'
    relevant_dataframe = huge_dataframe.ix[only the 5 rows preceding the current row]
    if move == 'UP':
        mask = relevant_dataframe[sec_1 < sec_2] # creates a boolean dataframe
        observations_df = relevant_dataframe[mask]
    elif move == 'DOWN':
        mask = relevant_dataframe[sec_1 > sec_2] # creates a boolean dataframe
        observations_df = relevant_dataframe[mask]
    # At this point I have observations_df which is only filled
    # with rows where sec_1 < sec_2 or the opposite, depending on which
    # row I am in.
    summary_stats = str(observations_df.describe())
    summary_list.append(summary_stats) # This is the goal: I want to ultimately
                                       # turn the list into a dataframe
Since there is no code to create the data, I will just sketch the code that I would try to make work. Generally, try to avoid row-wise operations whenever you can. I first had no clue either, but then I got interested and some research yielded TimeGrouper:
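import pandas as pd

df = huge_dataframe  # the frame from the question; time grouping needs a DatetimeIndex
df['move'] = df['sec_1'] > df['sec_2']  # True where sec_1 exceeds sec_2

def foobarRules(group):
    # keep in mind that in here, you refer not to "relevant_dataframe", but to "group"
    if group['move'].iloc[-1]:  # direction of the last row in the group
        pass  # some logic
    else:
        pass  # some other logic
    return str(group.describe())

# TimeGrouper was removed in later pandas releases; pd.Grouper(freq='5D') replaces it there
grouper = pd.TimeGrouper('5D')
allMyStatistics = df.groupby(grouper).apply(foobarRules)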
I have honestly no clue how the return works if you return a multi-dimensional dataframe. I know it works well if you return either a row or a column, but if you return a dataframe that contains both rows and columns for every group - I guess pandas is smart enough to compute a panel of all these. Well, you will find out.
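One thing to keep in mind: grouping by '5D' cuts the dates into fixed, non-overlapping 5-day blocks, while the question describes a trailing window that moves forward one row at a time. The plain-Python sketch below (no pandas) does not remove the loop; it only pins down that trailing-window logic so the two can be compared. The function name `trailing_window_stats`, the `keep` parameter, and the reduced summary (count and mean difference instead of `describe()`) are assumptions made for this illustration. The rows are the first two columns of the sample frame in the question.

```python
from statistics import mean

# (date, sec_1, sec_2) taken from the sample frame in the question.
ROWS = [
    ("2013_12_27", 0.05, 0.12),
    ("2013_12_28", 0.06, 0.32),
    ("2013_12_29", 0.07, 0.52),
    ("2013_12_30", 0.02, 0.12),
    ("2013_12_31", 0.06, 0.30),
    ("2014_01_01", 0.05, 0.12),
    ("2014_01_02", 0.06, 0.32),
    ("2014_01_03", 0.07, 0.52),
    ("2014_01_04", 0.02, 0.12),
    ("2014_01_05", 0.06, 0.30),
]

def trailing_window_stats(rows, window=5, keep=lambda s1, s2: s1 < s2):
    """For each row from position `window` on, summarise those of the
    `window` preceding rows that satisfy `keep`."""
    results = []
    for i in range(window, len(rows)):
        selected = [(s1, s2) for _, s1, s2 in rows[i - window:i] if keep(s1, s2)]
        diffs = [s2 - s1 for s1, s2 in selected]
        results.append({
            "date": rows[i][0],
            "count": len(selected),
            "mean_diff": mean(diffs) if diffs else None,
        })
    return results

stats = trailing_window_stats(ROWS)
for entry in stats:
    print(entry)
```

In the sample data sec_1 is below sec_2 in every row, so each window keeps all five preceding rows under the default `keep`; passing `keep=lambda s1, s2: s1 > s2` selects none.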