I have a dataset full of transactions and each observation has account number, date, and transaction amount variables. Obviously multiple transactions will have the same account number.
I want to calculate the total transaction amount for each account number over the last 15 days for each transaction.
So my final dataset will be a set of transactions with the following variables: account number, date, transaction amount, and total transaction amount over the past 15 days.
Any ideas?
Thanks!
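You can do it in PROC SQL with a self-join. The example below uses sashelp.stocks, where stock stands in for the account number and open for the transaction amount. Remove the WHERE clause; it is only there to keep the example small.
This does make two passes over the data, but it all happens in one PROC.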
proc sql;
    create table want as
    select a.stock, a.date, a.open, sum(b.open) as total_open
    from sashelp.stocks as a
    left join sashelp.stocks as b
        on a.date - b.date between 0 and 15
        and a.stock = b.stock
    where a.stock = 'IBM'
    group by a.stock, a.date, a.open
    order by a.stock, a.date;
quit;
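The same self-join idea can be sketched in Python with the standard-library sqlite3 module, which makes it easy to check the logic on a few rows without SAS. The table name txn, its columns, and the sample rows are assumptions made for illustration; julianday is SQLite's way of turning an ISO date string into a day number.

```python
import sqlite3

# Hypothetical sample data: (account, ISO date, amount).
rows = [
    ("A1", "2024-01-01", 100.0),
    ("A1", "2024-01-10", 50.0),
    ("A1", "2024-01-20", 25.0),
    ("A1", "2024-02-15", 10.0),
    ("B2", "2024-01-05", 70.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txn (account TEXT, date TEXT, amount REAL)")
con.executemany("INSERT INTO txn VALUES (?, ?, ?)", rows)

# Self-join: pair each transaction (a) with every transaction (b) on the
# same account dated 0 to 15 days earlier, then sum b's amounts.
query = """
    SELECT a.account, a.date, a.amount, SUM(b.amount) AS total_15d
    FROM txn AS a
    LEFT JOIN txn AS b
      ON a.account = b.account
     AND julianday(a.date) - julianday(b.date) BETWEEN 0 AND 15
    GROUP BY a.rowid
    ORDER BY a.account, a.date
"""
rolling = con.execute(query).fetchall()

for row in rolling:
    print(row)
```

Grouping by a.rowid keeps one output row per input transaction, even when two transactions share the same account, date, and amount. Note that BETWEEN 0 AND 15 includes the transaction's own day, so the window covers 16 calendar days; use 0 AND 14 if you want exactly 15.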
The following is an example of the data I have
startdate  enddate   amount
1/1/2010   2/2/2020  10
1/5/2011   2/3/2015  10
1/3/2012   2/2/2023  10
1/4/2013   2/2/2014  10
5/5/2015   2/2/2028  10
1/6/2016   2/2/2032  10
I want to calculate the sum of all existing amounts as of each start date so it should look like this:
startdate  amount
1/1/2010   10
1/5/2011   20
1/3/2012   30
1/4/2013   40
5/5/2015   30
1/6/2016   40
How do I do this in SAS?
Essentially what I want to do is for each of the start dates, calculate the cumulative sum of any amounts that haven't expired. So for the first four dates, it is just a running cumulative sum because none of the amounts have expired. But at 5/5/2015, two of the previous amounts have expired hence a cumulative sum of 30. Same for the last date, where the same two have previously expired and you have the additional amount as of 1/6/2016 therefore 40.
One way to accomplish this is with a self-join via Proc SQL:
proc sql;
    create table out_dset as
    /* sum the matched rows (b), not the repeated anchor row (a) */
    select a.startdate, sum(b.amount) as amount
    from in_dset as a left join in_dset as b
        on a.startdate >= b.startdate and a.startdate < b.enddate
    group by a.startdate
    order by a.startdate;
quit;
For each observation in the original dataset, this code will find observations in the same dataset that meet the date range criteria and will sum up the amount column.
You can change the second comparison operator from < to <= if an amount whose enddate falls on a given startdate should still count as active on that date.
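To check the join condition against the sample data, here is a small sketch in Python using the standard-library sqlite3 module rather than SAS. It assumes the dates in the question are month/day/year and rewrites them as ISO strings so that plain text comparison follows date order; the sum is taken over the matched rows (alias b).

```python
import sqlite3

# Data from the question, dates assumed to be month/day/year and stored
# as ISO strings so that string comparison matches chronological order.
rows = [
    ("2010-01-01", "2020-02-02", 10),
    ("2011-01-05", "2015-02-03", 10),
    ("2012-01-03", "2023-02-02", 10),
    ("2013-01-04", "2014-02-02", 10),
    ("2015-05-05", "2028-02-02", 10),
    ("2016-01-06", "2032-02-02", 10),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE in_dset (startdate TEXT, enddate TEXT, amount INTEGER)")
con.executemany("INSERT INTO in_dset VALUES (?, ?, ?)", rows)

# For each startdate in a, match every row b that has already started
# and has not yet expired, then add up b's amounts.
query = """
    SELECT a.startdate, SUM(b.amount) AS amount
    FROM in_dset AS a
    LEFT JOIN in_dset AS b
      ON a.startdate >= b.startdate AND a.startdate < b.enddate
    GROUP BY a.startdate
    ORDER BY a.startdate
"""
active = con.execute(query).fetchall()

for startdate, amount in active:
    print(startdate, amount)
```

On this data the totals come out as 10, 20, 30, 40, 30, 40, matching the desired output in the question.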
I have a Power BI/DAX question. I'm looking to summarize my data by getting monthly transaction sums (including the year as well, i.e. MM/YY) and filtering them by individual account numbers. Here is an example:
I want to take that and make it into this:
I converted the dates to the format I want with this code:
Transaction Month = MONTH(Table[Date]) & "/" & YEAR(Table[Date])
Then got the total monthly sum:
Total Monthly Sum = CALCULATE(sum(Table[Transaction Amount]),ALLEXCEPT(Table, Table[Transaction Month]))
Now I'm trying to figure out how to filter the total monthly sum by individual account numbers. Just as a note - I need this to be a calculated column as well because I'll want to identify accounts that surpass individual account monthly spending limits. Can anyone help me with this?
Thanks so much!
When working with calendar dates, it pays to have a calendar table linked to the transaction table. In the calendar table you will have each date, from the start date of your relevant time period to the end of the time period relevant to your data. The columns of the calendar table can then contain calculations on that date like month number, month name, year, year-month key, transaction month (as the first day of the month for the date in that row), etc.
Next, connect the two tables in the data model by dragging the transaction date to the calendar date column.
Now you can build charts and report tables that group data by month without writing any complicated DAX. Just pull the field "transaction month" from the calendar table and the Total Sum measure from the transaction table into the field well of the visual.
That's what Power BI is all about.
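The grouping that the visual performs, one total per account per month, can be sketched outside Power BI as well. The following is a plain Python illustration, not DAX; the sample transactions, the account numbers, and the 200.0 limit are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (account, date, amount).
transactions = [
    ("1001", date(2019, 1, 3), 40.0),
    ("1001", date(2019, 1, 20), 60.0),
    ("1001", date(2019, 2, 1), 15.0),
    ("2002", date(2019, 1, 9), 500.0),
]

# One total per (account, year, month). The key plays the role of grouping
# by account number together with the calendar table's month column.
monthly_total = defaultdict(float)
for account, day, amount in transactions:
    monthly_total[(account, day.year, day.month)] += amount

# Rough equivalent of a calculated column: attach the group total to each row.
with_totals = [
    (account, day, amount, monthly_total[(account, day.year, day.month)])
    for account, day, amount in transactions
]

limit = 200.0  # hypothetical monthly spending limit per account
over_limit = sorted(
    {(account, day.year, day.month)
     for account, day, _, total in with_totals
     if total > limit}
)
print(over_limit)
```

Keeping the year in the key matters: grouping on the month number alone would merge January 2019 with January 2020.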
I have a table with customers' SSN, account number, purchase date, and max purchase date (the most recent purchase date for the SSN, across all of its accounts). Customers can have multiple accounts. I know how to create a measure to calculate the distinct count of all the accounts that haven't been active since a certain date (6 months, 18 months, 24 months).
I would like to create a measure or a calculated column to give me the following information.
When users select a date from the slicer (say 6 months), the chart shows the count of the accounts that have not made a purchase in 6 months. The users also want a drop-down slicer ("Yes", "No") to indicate whether the SSN had activity under other accounts, i.e. whether the max purchase date is greater than the value from the date slicer.
The table structure looks like this:
SSN          AccountNumber  LastPurchaseDate  MaxPurchaseDate
123-45-5678  9876           8/2/2018          9/4/2019
123-45-5678  6398           9/4/2019          9/4/2019
135-65-4321  2233           6/6/2019          6/6/2019
The best way here would be to add a custom column with the time difference (in the query designer):
= [MaxPurchaseDate] - [LastPurchaseDate]
Now you have something like this (dates read as month/day/year):
SSN          AccountNumber  LastPurchaseDate  MaxPurchaseDate  DateDiffDays
123-45-5678  9876           8/2/2018          9/4/2019         398
123-45-5678  6398           9/4/2019          9/4/2019         0
135-65-4321  2233           6/6/2019          6/6/2019         0
You can add another column that acts as a filter for your 6-, 18-, and 24-month thresholds (convert DateDiffDays into months).
The following measure counts the accounts:
=Distinctcount('YourTable'[AccountNumber])
If you now filter by that 6/18/24-month column, the measure is recalculated after each filter change and you get your result.
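The logic the question describes, counting accounts with no purchase since a cutoff and flagging whether the same SSN was active elsewhere, can be sketched in plain Python. This is an illustration of the comparison only, not DAX or Power Query; the function name inactive_accounts and the cutoff value are hypothetical, and the dates from the question's table are read as month/day/year.

```python
from datetime import date

# Rows from the question's table: (SSN, account, last purchase, max purchase).
accounts = [
    ("123-45-5678", "9876", date(2018, 8, 2), date(2019, 9, 4)),
    ("123-45-5678", "6398", date(2019, 9, 4), date(2019, 9, 4)),
    ("135-65-4321", "2233", date(2019, 6, 6), date(2019, 6, 6)),
]

def inactive_accounts(rows, cutoff):
    """Map each account with no purchase since cutoff to a flag that is
    True when the SSN's most recent purchase (on any account) is after it."""
    result = {}
    for _ssn, account, last_purchase, max_purchase in rows:
        if last_purchase < cutoff:
            result[account] = max_purchase > cutoff
    return result

cutoff = date(2019, 7, 1)  # hypothetical value picked in the date slicer
flags = inactive_accounts(accounts, cutoff)

inactive_count = len(flags)
active_elsewhere = sum(flags.values())
print(flags, inactive_count, active_elsewhere)
```

With this cutoff, accounts 9876 and 2233 are inactive, and only 9876 belongs to an SSN that purchased on another account afterwards.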
Suppose I have the following database:
DATA have;
INPUT id date gain;
CARDS;
1 201405 100
2 201504 20
2 201504 30
2 201505 30
2 201505 50
3 201508 200
3 201509 200
3 201509 300
;
RUN;
I want to create a new table, want, that holds the average of the variable gain for each combination of id and date. The final table should look like this:
DATA want;
INPUT id date average_gain;
CARDS;
1 201405 100
2 201504 25
2 201505 40
3 201508 200
3 201509 250
;
RUN;
I tried to obtain the desired result using the code below but it didn't work:
PROC sql;
CREATE TABLE want as
SELECT *,
mean(gain) as average_gain
FROM have
GROUP BY id, date
ORDER BY id, date
;
QUIT;
It's the asterisk that's causing the issue. It resolves to id, date, gain, and gain is neither a GROUP BY column nor an aggregate, which is not what you want. ANSI SQL would not allow this kind of query, so it's one way in which SAS differs from other SQL implementations.
There should be a note in the log about remerging with the original data, which is essentially what's happening: the summary values are remerged onto every original row.
To avoid this, list only your GROUP BY fields (plus the aggregate) in the SELECT clause and it will work as expected.
PROC sql;
CREATE TABLE want as
SELECT id, date,
mean(gain) as average_gain
FROM have
GROUP BY id, date
ORDER BY id, date
;
QUIT;
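As a quick cross-check of the corrected query, the same GROUP BY can be run on the sample data with Python's standard-library sqlite3 module. This is only a sketch: SQLite spells the aggregate AVG where SAS uses MEAN, and it does not perform SAS's remerging.

```python
import sqlite3

# The rows of the "have" dataset from the question.
rows = [
    (1, 201405, 100), (2, 201504, 20), (2, 201504, 30), (2, 201505, 30),
    (2, 201505, 50), (3, 201508, 200), (3, 201509, 200), (3, 201509, 300),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE have (id INTEGER, date INTEGER, gain REAL)")
con.executemany("INSERT INTO have VALUES (?, ?, ?)", rows)

# Select only the grouping columns and the aggregate: one row per (id, date).
query = """
    SELECT id, date, AVG(gain) AS average_gain
    FROM have
    GROUP BY id, date
    ORDER BY id, date
"""
want = con.execute(query).fetchall()

for row in want:
    print(row)
```

The five result rows carry the averages 100, 25, 40, 200, and 250, as in the desired table.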
I will say that, in general, PROC MEANS is usually a better option because:
- it calculates multiple statistics for multiple variables without your having to list each one repeatedly
- it can produce results at multiple levels, for example at the grand total, id, and group level
- not all statistics can be calculated within PROC SQL
- it supports variable lists, so you can refer to long lists of variables in shorthand without any issues
I'm working in SAS but using some SQL code to count the number of unique patients as well as the total number of observations for a set of indicators. Each record has a patient identifier, the facility where the patient is, and a group of binary indicators (0, 1), one for each bed section (the particular place in the hospital where the patient is). For each patient record, only one bed section can have a value of 1. Patients can have multiple observations in the same bed section or in other bed sections, i.e. patients can be hospitalized more than once. The idea is to roll this dataset up by facility and count the total number of admissions for each bed section as well as the total number of people for each bed section. The people count will always be less than or equal to the observation count. Counting people was just added to my to-do list; until now I was only summing up observations for each bed section using the code below:
proc sql;
create table fac_bedsect as
select facility,
sum(bedsect_alc) as bedsect_alc,
sum(bedsect_blind) as bedsect_blind,
sum(bedsect_gen) as bedsect_gen
from bedsect_type
group by facility;
quit;
Is there a way I can incorporate into this code the # of unique people for each bed section? Thanks.
Without knowing the source table(s) it is impossible to answer precisely, but the syntax for counting distinct values is shown below. Use the correct column name where I have used "patient_id". As written, this counts distinct patients per facility, across all bed sections:
SELECT
facility
, COUNT(DISTINCT patient_id) AS patient_count
, SUM(bedsect_alc) AS bedsect_alc
, SUM(bedsect_blind) AS bedsect_blind
, SUM(bedsect_gen) AS bedsect_gen
FROM bedsect_type
GROUP BY
facility
;
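Since the question asks for unique people per bed section rather than per facility, one possible variation is a conditional distinct count: count patient_id only on rows where that section's indicator is 1. The sketch below runs the idea in Python with the standard-library sqlite3 module; the column name patient_id, the output column names, and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical records:
# (patient_id, facility, bedsect_alc, bedsect_blind, bedsect_gen).
rows = [
    ("p1", "F1", 1, 0, 0),
    ("p1", "F1", 1, 0, 0),  # same patient admitted twice to the same section
    ("p2", "F1", 1, 0, 0),
    ("p2", "F1", 0, 0, 1),
    ("p3", "F2", 0, 1, 0),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE bedsect_type (patient_id TEXT, facility TEXT, "
    "bedsect_alc INTEGER, bedsect_blind INTEGER, bedsect_gen INTEGER)"
)
con.executemany("INSERT INTO bedsect_type VALUES (?, ?, ?, ?, ?)", rows)

# SUM counts admissions. The CASE yields NULL for rows outside the section,
# and COUNT(DISTINCT ...) ignores NULLs, so it counts people in the section.
query = """
    SELECT facility,
           SUM(bedsect_alc) AS alc_admissions,
           COUNT(DISTINCT CASE WHEN bedsect_alc = 1 THEN patient_id END) AS alc_patients,
           SUM(bedsect_blind) AS blind_admissions,
           COUNT(DISTINCT CASE WHEN bedsect_blind = 1 THEN patient_id END) AS blind_patients,
           SUM(bedsect_gen) AS gen_admissions,
           COUNT(DISTINCT CASE WHEN bedsect_gen = 1 THEN patient_id END) AS gen_patients
    FROM bedsect_type
    GROUP BY facility
    ORDER BY facility
"""
counts = con.execute(query).fetchall()

for row in counts:
    print(row)
```

For facility F1 this gives 3 admissions but 2 distinct patients in the alc section, which is the "people count is at most the observation count" relationship described in the question.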