I have a list of observations with a few variables. I need to assign each observation to one of the bins below, and then keep only the one observation per bin whose value is closest to the bin's number:
Bins
0.94
0.96
0.98
1.00
1.02
1.04
1.06
Data
Variable Price Value_to_bin Closest bin
a 0.630527682 0.935 0.94
b 0.441296291 0.979 0.98
c 0.350173415 0.969
d 0.920932417 0.993
e 0.361863025 0.959 0.96
f 0.027205755 1.003 1
g 0.878286791 1.045
h 0.206434946 0.971
i 0.259272294 1.021 1.02
j 0.081774863 0.982
k 0.01146324 0.992
l 0.283027273 1.037 1.04
m 0.188747537 0.993
n 0.554786 1.064 1.06
o 0.784774 1.065
And then just keep the ones that are closest to the bin value (i.e., delete the ones that have blanks in the 'Closest bin' column).
I tried to use proc rank, but I can't get rid of the rest or match with the bin (something like a 'closest' option doesn't exist as far as I know).
SAS SQL with automatic remerging can perform the query quite succinctly. Because the bins are evenly spaced at 0.02, the ROUND function can compute each value's nearest bin directly.
proc sql;
  create table want as
  select
    var,
    price,
    value,
    round(value, 0.02) as valbin_02  /* nearest 0.02 bin */
  from have
  group by valbin_02
  having abs(valbin_02 - value) = min(abs(valbin_02 - value))
  ;
quit;
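For example, round(0.969, 0.02) returns 0.96 and round(0.993, 0.02) returns 1.00, so every value maps to its nearest 0.02 bin; the GROUP BY/HAVING remerge then keeps, per bin, only the row(s) whose distance to the bin value equals the group minimum.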
I have a star schema data model: DimDate, DimBranchName, BranchActual, BranchBudget.
I have a measure called QVar that calculates the YTD variance to budget by branch. QVar takes the counts from BranchActual and compares them to BranchBudget between two dates. The visual is controlled by DimBranchName and DimDate.
Current Result:
BranchName YTDActual YTDBudget QVar
A 100 150 (33%)
B 200 200 0.0%
C 25 15 66%
I want a measure that ranks DimBranchName[BranchName] by QVar and that interacts with the filters I have in place.
Desired result:
BranchName YTDActual YTDBudget QVar Rank
A 100 150 (33%) 3
B 200 200 0.0% 2
C 25 15 66% 1
What I've tried so far is
R Rank of Actual v Goal =
var V = [QVar]
RETURN
RANKX(ALLSELECTED('BranchActual'),CALCULATE(V),,ASC,Dense)
What I get is all 1's
BranchName YTDActual YTDBudget QVar Rank
A 100 150 (33%) 1
B 200 200 0.0% 1
C 25 15 66% 1
Thanks!
When you declare a variable, it is computed once and treated as a constant through the rest of your DAX code, so CALCULATE(V) is simply whatever V was when you declared it.
This might be closer to what you want:
R Rank of Actual v Goal =
RANKX ( ALLSELECTED ( DimBranchName[BranchName] ), [QVar],, ASC, DENSE )
This way [QVar] is called within the filter context of the BranchName instead of being a constant. (Note that referencing a measure within another measure implicitly wraps it in CALCULATE so you don't need that again.)
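To see the difference, a contrived measure like this (the name is purely illustrative) reproduces the all-1 result, because V is already a fixed scalar by the time RANKX iterates:
Constant Rank =
VAR V = [QVar]
RETURN
    // every branch is ranked on the same constant value, so every rank is 1
    RANKX ( ALLSELECTED ( DimBranchName[BranchName] ), V,, ASC, DENSE )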
I want to query a number of rows from one sheet into another sheet, and to the right of each row add a column based on one of the queried columns. Meaning that if the country column (AG in my formula) is "IL", I want the added column to show 0, otherwise 1 (the samples below will make it clearer).
I have tried doing this with Query and Arrayformula, without query, with Filter and importrange. An example of what I tried:
=query(Data!A1:AG,"Select D, E, J, E-J, Q, AG " & IF(AG="Il",0, 1),1)
Raw data sample:
Captured Amount Fee Country
TRUE 336 10.04 NZ
TRUE 37 1.37 GB
TRUE 150 4.65 US
TRUE 45 1.61 US
TRUE 20 0.88 IL
What I would want as a result:
Amount Fee Country Sort
336 10.04 NZ 1
37 1.37 GB 1
150 4.65 US 1
45 1.61 US 1
20 0.88 IL 0
try it like this:
=ARRAYFORMULA(QUERY({Data!A1:Q, {"Sort"; IF(Data!AG2:AG="IL", 0, 1)}},
 "select Col4,Col5,Col10,Col5-Col10,Col17,Col18 label Col5-Col10''", 1))
I have a rather specific error in netlogo that I've been staring at for a while now. Hope you guys have some insight.
The error is in code that looks back in a list called 'strategy'. If the list is longer than investment-time, the variables 'REfocus' and 'PRICE' are set to certain values. If the list is not longer than investment-time, the variables are not set (and thus remain 0).
The code consists of a procedure strategy_actions and a reporter investment_time. Investment-time is approximately 3 years, but as ticks are in months, investment-time is rescaled to months. In strategy_actions, investment-time is scaled back to years, as each entry in the strategy list is also annual. (The scaling and rescaling seems arbitrary, but as investment-time is used a lot by other parts of the code, it made more sense to do it this way.) The goal is to take the strategy from x time back (equal to investment-time).
The code (error follows underneath):
to strategy_actions
set_ROI
start_supply?
if current_strategy != 0
[
let it (investment_time / 12)
ifelse it >= length strategy
[
set REfocus 0
]
[
if item (it - 1) strategy = 1
[
if supply? = true [set_PRICE (set_discrete_distribution 0.29 0.19 0.29 0.15 0.07 0 0) (set_discrete_distribution 0.14 0.12 0.25 0.25 0.25 0 0)]
ifelse any? ids [set REfocus mean [mot_RE] of ids][set REfocus set_discrete_distribution 0.07 0.03 0.07 0.17 0.66 0 0]
]
if item (it - 1) strategy = 2
[
if supply? = true [set_PRICE (set_discrete_distribution 0.27 0.21 0.32 0.11 0.09 0 0) (set_discrete_distribution 0.15 0.11 0.22 0.30 0.23 0 0)]
ifelse any? prods [set REfocus mean [mot_RE] of prods][set REfocus set_discrete_distribution 0.12 0.03 0.10 0.18 0.57 0 0]
]
if item (it - 1) strategy = 3
[
if supply? = true [set_PRICE (set_discrete_distribution 0.26 0.22 0.26 0.18 0.09 0 0) (set_discrete_distribution 0.07 0.08 0.19 0.30 0.35 0 0)]
ifelse any? cons [set REfocus mean [mot_RE] of cons] [set REfocus set_discrete_distribution 0.08 0.06 0.15 0.27 0.45 0 0]
]
]
set RE_history fput REfocus RE_history
]
end
to-report investment_time
report ((random-normal 3 1) * 12) ;approximately 3 years investment time
end
Somehow, I sometimes get this runtime error during my BehaviorSpace experiment:
-1 isn't greater than or equal to zero.
error while observer running ITEM
called by procedure STRATEGY_ACTIONS
called by procedure SET_MEETING_ACTIONS
called by procedure GO
Does anyone know what causes this error?
You would help me out a lot!
Cheers,
Maria
It appears that investment_time is occasionally coming in as zero or negative (random-normal 3 1 draws a value at or below zero roughly 0.13% of the time, which is why the error only surfaces occasionally in a long BehaviorSpace run), so you are asking for item (0 - 1) of the strategy list, or worse. I did a bit of playing around with item and learned (to my surprise) that item (0.0001 - 1) strategy works just fine, yielding the 0th item in the list in spite of the argument being negative. But item (0 - 1) strategy does give the error you cite. Apparently any item number greater than -1 is interpreted as zero. Indeed, item seems to truncate any fractional argument rather than rounding it; e.g., item 0.9 is interpreted as item 0, as is item -0.9.
That might be worth putting in the documentation.
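If non-positive draws aren't meaningful in your model, one possible guard is to clamp the reporter (a sketch only, not tested against the rest of your code):
to-report investment_time
  ; clamp the draw to at least 12 months (one year), so that
  ; item ((investment_time / 12) - 1) can never see an index of -1 or below
  report max (list 12 ((random-normal 3 1) * 12))
end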
HTH,
Charles
I'm looking to do a lookup that is sort of a hybrid between a join and a union. I have a large number of records in my main dataset, so I'm looking for something that isn't a "brute force" many-to-many matrix.
Here is my main dataset, called 'All', which already contains a price for each of the products listed.
product date price
apple 1/1/2011 1.05
apple 1/3/2011 1.02
apple 1/4/2011 1.07
pepper 1/2/2011 0.73
pepper 1/3/2011 0.75
pepper 1/6/2011 0.79
My other dataset ('Prices' - not shown here, but it contains the same two keys, product and date) contains prices for all products on each possible date. The hash table lookup I would like to create would essentially look up every date in the 'All' table and output prices for ALL products on that date, resulting in a table such as this:
product date price
apple 1/1/2011 1.05
pepper 1/1/2011 0.71 *
apple 1/2/2011 1.04 *
pepper 1/2/2011 0.73
apple 1/3/2011 1.02
pepper 1/3/2011 0.75
apple 1/4/2011 1.07
pepper 1/4/2011 0.76 *
apple 1/6/2011 1.10 *
pepper 1/6/2011 0.79
That is, as long as one product has a date and price specified in the 'All' table, all other products should pull in their price for that date from the lookup table as well.
The asterisks indicate rows where the price was looked up from the Prices table, i.e., new rows with prices for the other products that were essentially inserted into the new table.
If hash tables are not a great way to go about this, please let me know about alternative methods.
Well, this is far from elegant, but I'm curious whether the below gives you the desired result. Since you have multiple records per key in ALL (which I assume you want to maintain), I basically unioned ALL with the records in PRICES that have a date in ALL, but added an EXCEPT so as to exclude records that were already in ALL. No idea if this makes sense, or is doing what you want. It certainly doesn't qualify as 'elegant'.
data all;
  /* colon modifiers read space-delimited tokens with the informats,
     rather than fixed columns (which would misread 'apple') */
  input product :$7. date :mmddyy10. price;
  Y=1;
  format date mmddyy10.;
cards;
apple 01/01/2011 1.05
apple 01/01/2011 1.05
apple 01/03/2011 1.02
pepper 01/02/2011 0.73
pepper 01/03/2011 0.75
pepper 01/06/2011 0.79
;
run;
data prices;
  input product :$7. date :mmddyy10. price;
  format date mmddyy10.;
cards;
apple 01/01/2011 1.05
apple 01/02/2011 1.04
apple 01/03/2011 1.02
apple 01/04/2011 1.07
apple 01/05/2011 1.01
pepper 01/01/2011 0.70
pepper 01/02/2011 0.73
pepper 01/03/2011 0.75
pepper 01/04/2011 0.76
pepper 01/05/2011 0.77
pepper 01/06/2011 0.79
;
run;
proc sql;
  create table want as
  select * from all
  union corr all
  ( ( select product, date, price
      from prices
      where date in (select distinct date from all)
    )
    except corr
    select product, date, price from all
  )
  ;
quit;
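An alternative without the union is to build the distinct product x date skeleton from ALL and left-join it to PRICES (same tables as above; note that, unlike the union approach, this collapses duplicate records per key):
proc sql;
  create table want2 as
  select k.product, k.date, p.price
  from (select distinct a.product, b.date
        from all as a, all as b) as k
  left join prices as p
    on  k.product = p.product
    and k.date    = p.date
  order by k.date, k.product;
quit;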
I have a large pandas dataframe with 2000 rows (one date per row) and 2000 columns (1-second intervals). Each cell represents a temperature reading.
Starting with the 5th row, I need to go back 5 rows and find all the observations where the 1st column in the row is higher than the 2nd column in the row.
For the 5th row I may find 2 such observations. I then want to do summary stats on those observations and append the summary stats to a list.
Then I go to the 6th row, go back 5 rows, and find all the obs where the 1st column is higher than the 2nd column. I get all the obs, do summary stats on them, and append the results to the new dataframe.
So for each row in the dataframe, I want to go back 5 rows, get the obs, get the stats, and append the stats to a dataframe.
The problem is that if I perform this operation on rows 5-2000, I will have a for-loop that is 1995 cycles long, and this takes a while.
What is the better or best way to do this?
Here is the code:
print huge_dataframe
sec_1 sec_2 sec_3 sec_4 sec_5
2013_12_27 0.05 0.12 0.06 0.15 0.14
2013_12_28 0.06 0.32 0.56 0.14 0.17
2013_12_29 0.07 0.52 0.36 0.13 0.13
2013_12_30 0.02 0.12 0.16 0.55 0.12
2013_12_31 0.06 0.30 0.06 0.14 0.01
2014_01_01 0.05 0.12 0.06 0.15 0.14
2014_01_02 0.06 0.32 0.56 0.14 0.17
2014_01_03 0.07 0.52 0.36 0.13 0.13
2014_01_04 0.02 0.12 0.16 0.55 0.12
2014_01_05 0.06 0.30 0.06 0.14 0.01
summary_list = []
for i in range(5, len(huge_dataframe)):
    row = huge_dataframe.iloc[i]
    move = row['sec_1'] - row['sec_2']
    # the 5 rows preceding the current row
    relevant_dataframe = huge_dataframe.iloc[i - 5:i]
    if move > 0:  # 'UP'
        mask = relevant_dataframe['sec_1'] < relevant_dataframe['sec_2']  # boolean mask
    else:         # 'DOWN'
        mask = relevant_dataframe['sec_1'] > relevant_dataframe['sec_2']  # boolean mask
    observations_df = relevant_dataframe[mask]
    # At this point observations_df is only filled with rows where
    # sec_1 < sec_2 or the opposite, depending on which row I am in.
    summary_stats = str(observations_df.describe())
    summary_list.append(summary_stats)  # the goal: ultimately turn
                                        # this list into a dataframe
Since there is no code to create the data, I will just sketch the code that I would try to make work. Generally, try to avoid row-wise operations whenever you can. I had no clue at first either, but then I got interested and some research yielded TimeGrouper:
import pandas as pd

df = big_dataframe
df['move'] = df['sec_1'] > df['sec_2']  # note: 'sec_2', not 'sec2'

def foobarRules(group):
    # keep in mind that in here, you refer not to "relevant_dataframe", but to "group"
    if group['move'].iloc[-1]:  # direction of the last row in the window
        # some logic
        pass
    else:
        # some other logic
        pass
    return str(group.describe())

# TimeGrouper is deprecated in recent pandas; pd.Grouper(freq='5D') is the
# equivalent (the index must be a DatetimeIndex)
grouper = pd.Grouper(freq='5D')
allMyStatistics = df.groupby(grouper).apply(foobarRules)
I honestly have no clue how the return works if you return a multi-dimensional dataframe. I know it works well if you return either a row or a column, but if you return a dataframe that contains both rows and columns for every group, I guess pandas is smart enough to combine all of these. Well, you will find out.
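For what it's worth, a minimal self-contained sketch (made-up data, shaped like the question's) suggests pandas simply concatenates the per-group frames under a MultiIndex when the function returns a full dataframe:
import numpy as np
import pandas as pd

# hypothetical data: a DatetimeIndex and the two columns the question uses
idx = pd.date_range('2013-12-27', periods=10, freq='D')
df = pd.DataFrame(np.random.rand(10, 2), index=idx, columns=['sec_1', 'sec_2'])

def rules(group):
    # returning a DataFrame (rows and columns) per group...
    return group.describe()

out = df.groupby(pd.Grouper(freq='5D')).apply(rules)
# ...yields one concatenated frame indexed by (bin start, statistic name)
print(out.index.nlevels)  # 2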