How to sum a variable based on other variables in a table? - sas

I want to sum the VOLUME variable for each stock code (TRD_STCK_CD) and date (TRD_EVENT_DT).
Here is a sample of my data:
+--------------+--------------+-------------+------------------+----------+
| TRD_EVENT_DT | TRD_EVENT_TM | TRD_STCK_CD | TRD_EVENT_ROUFOR | VOLUME   |
+--------------+--------------+-------------+------------------+----------+
| 3/24/2008    | 12:28:01     | ALBZ1       | 12:30            | 15370000 |
| 3/24/2008    | 13:13:44     | ALBZ1       | 13:00            | 15670    |
| 3/24/2008    | 12:20:38     | AZAB1       | 12:30            | 6830000  |
| 3/24/2008    | 13:13:44     | AZAB1       | 13:00            | 6950     |
| 3/24/2008    | 9:14:57      | BALI1       | 9:00             | 7871000  |
| 3/24/2008    | 9:15:06      | BALI1       | 9:30             | 1700000  |
| 3/24/2008    | 9:15:14      | BALI1       | 9:30             | 8500000  |
| 3/24/2008    | 9:15:24      | BALI1       | 9:30             | 5100000  |
| 3/24/2008    | 9:29:27      | BALI1       | 9:30             | 8500000  |
| 3/24/2008    | 12:28:00     | BALI1       | 12:30            | 8500000  |
| 3/24/2008    | 12:28:07     | BALI1       | 12:30            | 8500000  |
| 3/24/2008    | 13:13:44     | BALI1       | 13:00            | 8650     |
+--------------+--------------+-------------+------------------+----------+
I have removed some columns for simplicity. As the next step, I want a table like the one below:
+--------------+--------------+-------------+------------------+----------+------------+
| TRD_EVENT_DT | TRD_EVENT_TM | TRD_STCK_CD | TRD_EVENT_ROUFOR | VOLUME   | volume_Sum |
+--------------+--------------+-------------+------------------+----------+------------+
| 3/24/2008    | 12:28:01     | ALBZ1       | 12:30            | 15370000 |            |
| 3/24/2008    | 13:13:44     | ALBZ1       | 13:00            | 15670    | 15385670   |
| 3/24/2008    | 12:20:38     | AZAB1       | 12:30            | 6830000  |            |
| 3/24/2008    | 13:13:44     | AZAB1       | 13:00            | 6950     | 6836950    |
| 3/24/2008    | 9:14:57      | BALI1       | 9:00             | 7871000  |            |
| 3/24/2008    | 9:15:06      | BALI1       | 9:30             | 1700000  |            |
| 3/24/2008    | 9:15:14      | BALI1       | 9:30             | 8500000  |            |
| 3/24/2008    | 9:15:24      | BALI1       | 9:30             | 5100000  |            |
| 3/24/2008    | 9:29:27      | BALI1       | 9:30             | 8500000  |            |
| 3/24/2008    | 12:28:00     | BALI1       | 12:30            | 8500000  |            |
| 3/24/2008    | 12:28:07     | BALI1       | 12:30            | 8500000  |            |
| 3/24/2008    | 13:13:44     | BALI1       | 13:00            | 8650     | 48679650   |
+--------------+--------------+-------------+------------------+----------+------------+
Please note the last column. It is generated by summing VOLUME over all rows that share the same TRD_STCK_CD, so each TRD_STCK_CD has exactly one Volume_Sum value, placed on its last observation.

Slightly different implementation of the same idea:
/* Sort by TRD_STCK_CD and the temporal variables. */
proc sort data=have out=have_sorted;
    by TRD_STCK_CD TRD_EVENT_DT TRD_EVENT_TM;
run;
/* Sum VOLUME until the last row of each TRD_STCK_CD is reached. */
data want;
    set have_sorted;
    by TRD_STCK_CD TRD_EVENT_DT TRD_EVENT_TM;
    retain tmp_volume_sum;
    tmp_volume_sum + VOLUME;
    if last.TRD_STCK_CD then do;
        Volume_Sum = tmp_volume_sum;
        call missing(tmp_volume_sum);
    end;
    drop tmp_:;
run;
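For anyone who wants to check the running-total logic outside SAS, here is a minimal Python sketch of the same idea. It is only an illustration, not the SAS step itself: the function name `sum_on_last_row` and the tuple layout are my own assumptions, and the rows are a subset of the sample data, already sorted by code.

```python
# Hypothetical (code, volume) pairs copied from the sample table, already sorted by code.
rows = [
    ("ALBZ1", 15370000), ("ALBZ1", 15670),
    ("AZAB1", 6830000), ("AZAB1", 6950),
]

def sum_on_last_row(sorted_rows):
    """Keep a running total per code and emit it only on the last row of each code."""
    out = []
    running = 0
    for i, (code, volume) in enumerate(sorted_rows):
        running += volume
        # Same role as last.TRD_STCK_CD: the next row has a different code, or there is none.
        is_last = i + 1 == len(sorted_rows) or sorted_rows[i + 1][0] != code
        out.append((code, volume, running if is_last else None))
        if is_last:
            running = 0  # reset the accumulator for the next code
    return out

result = sum_on_last_row(rows)
```

The `None` entries play the role of the blank Volume_Sum cells in the desired table.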

I simplified this further to a table with just two columns: the code and the volume.
Here is the sample table creation:
data have;
    do code = 'a','b','c';
        do i=1 to floor(5*ranuni(1))+1;
            volume = floor(500*ranuni(1));
            output;
        end;
    end;
    drop i;
run;
First, use PROC SQL to sum the volume grouped by code. Save the result in a table and put an index on code.
proc sql noprint;
    create table sums as
    select code, sum(volume) as volume_sum
    from have
    group by code;
    /* A simple index must be named after the column it indexes. */
    create index code on sums(code);
quit;
I assume your table is sorted by code. If not, sort it first.
Now run through the data in HAVE. Set volume_sum to missing on every row. On the last record for each code, look up the value from the SUMS table.
data want;
    set have;
    by code;
    volume_sum = .;
    if last.code then
        set sums key=code;
run;
Printed, I get:
code  volume  volume_sum
a        485         485
b        129           .
b        460         589
c        271           .
c        265           .
c         24           .
c         33           .
c        409        1002
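The two-pass idea (build a table of sums first, then look the sum up only on the last row of each code) can be sketched in Python as well. This is an assumption-laden illustration rather than a translation of the SAS steps: the dictionary `sums` stands in for the indexed SUMS table, and the rows are the values from the printout above.

```python
# (code, volume) pairs copied from the printed output, sorted by code.
rows = [
    ("a", 485),
    ("b", 129), ("b", 460),
    ("c", 271), ("c", 265), ("c", 24), ("c", 33), ("c", 409),
]

# Pass 1: total volume per code (the role of the SUMS table).
sums = {}
for code, volume in rows:
    sums[code] = sums.get(code, 0) + volume

# Pass 2: attach the total only to the last row of each code.
want = []
for i, (code, volume) in enumerate(rows):
    is_last = i + 1 == len(rows) or rows[i + 1][0] != code
    want.append((code, volume, sums[code] if is_last else None))
```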

Related

Opening Stock Value In Powerbi

I am working on cost of goods sold in Power BI.
https://www.mediafire.com/file/pmb7u1thsag1kq1/Cost+of+goods+sold.pbix/file
The link above is my file, which I uploaded to MediaFire.
I calculate the average price per year using the average function.
This is what the file shows:
| GName | Year | Opening Stock | InQty | OutQty | InItemValue | Average Value | Closing Stock | Closing Stock Value | Opening Stock Value | Cost of Goods Sold |
|-------------|------|---------------|-------|--------|-------------|---------------|---------------|---------------------|---------------------|--------------------|
| Bahria Town | 2016 | | 4454 | 3586 | 126610299.8 | 28426.20113 | 868 | 24673942.58 | 0 | 101936357.2 |
| Bahria Town | 2017 | 868 | 6379 | 6547 | 166903971.5 | 23030.76743 | 700 | 16121537.2 | 0 | 150782434.3 |
| Bahria Town | 2018 | 700 | 9129 | 8709 | 271932546.3 | 27666.3492 | 1120 | 30986311.11 | 0 | 240946235.2 |
| Bahria Town | 2019 | 1120 | 9333 | 9393 | 313226466.8 | 29965.22212 | 1060 | 31763135.45 | 0 | 281463331.4 |
| Bahria Town | 2020 | 1060 | 10192 | 10136 | 362950101.2 | 32256.49673 | 1116 | 35998250.35 | 0 | 326951850.8 |
| Bahria Town | 2021 | 987 | 8882 | 8468 | 404199067.4 | 40956.43605 | 1530 | 62663347.16 | 0 | 346819100.5 |
As you can see above, I calculate the average value as
Average Value = ([Opening Stock Value]+[initemvaluee])/([inqtyy]+[Opening Stock])
and the closing stock value as
Closing Stock Value = [Average Value] * [Closing Stock]
When I try to use the previous year's closing stock value as the opening stock value, I get this error:
circular dependency was detected: Measure: 'mak_stockInHandValue'[Average Value], Measure: 'mak_stockInHandValue'[Opening Stock Value], Measure: 'mak_stockInHandValue'[Average Value].
Is there any way to show the previous year's closing stock value in the Opening Stock Value field?
I have been working on this for more than two weeks.
Please help me out.
Thanks in advance.
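One way to read the error: Average Value needs Opening Stock Value, which needs the previous year's Closing Stock Value, which in turn needs Average Value, so the measures refer back to themselves. The arithmetic itself is sequential, year by year. The Python sketch below only illustrates that roll-forward and is not a DAX solution; the function name `roll_forward` is hypothetical, the inputs are the 2016 and 2017 rows of the table, and the rule closing stock = opening stock + InQty - OutQty is inferred from the table rather than stated in the question.

```python
# Yearly inputs copied from the first two table rows: (year, in_qty, out_qty, in_item_value).
yearly_inputs = [
    (2016, 4454, 3586, 126610299.8),
    (2017, 6379, 6547, 166903971.5),
]

def roll_forward(rows, opening_qty=0, opening_value=0.0):
    """Carry each year's closing stock and closing value into the next year's opening figures."""
    results = []
    for year, in_qty, out_qty, in_item_value in rows:
        # Average Value = (Opening Stock Value + InItemValue) / (InQty + Opening Stock)
        average_value = (opening_value + in_item_value) / (in_qty + opening_qty)
        # Assumption inferred from the table: closing = opening + in - out.
        closing_qty = opening_qty + in_qty - out_qty
        # Closing Stock Value = Average Value * Closing Stock
        closing_value = average_value * closing_qty
        results.append({
            "year": year,
            "opening_qty": opening_qty,
            "opening_value": opening_value,
            "average_value": average_value,
            "closing_qty": closing_qty,
            "closing_value": closing_value,
        })
        # Next year's opening figures are this year's closing figures.
        opening_qty, opening_value = closing_qty, closing_value
    return results

stock = roll_forward(yearly_inputs)
```

Because each year only reads values that were finished in the previous iteration, nothing depends on itself. Any Power BI solution would need a similar ordering, for example by computing the running values before they reach the measures.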

How do i add additional rows in M QUERY

I want to add more rows using the Query Editor (Power Query / M), filling only the Start Date and End Date columns:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
+----------+------------------+--------------+-----------+-------------+------------+
I want to find MIN(Start Date) and MAX(End Date), then append one row per date in that range to this table, filling only the Start Date and End Date columns, in the Query Editor (Power Query / M). Preferably, I would create a second table (table2) that duplicates the original table and append the rows there.
For example:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
| | | | | 23/01/2020 | 23/01/2020 |
| | | | | 24/01/2020 | 24/01/2020 |
| | | | | 25/01/2020 | 25/01/2020 |
| | | | | 26/01/2020 | 26/01/2020 |
| | | | | 27/01/2020 | 27/01/2020 |
| | | | | 28/01/2020 | 28/01/2020 |
| | | | | 29/01/2020 | 29/01/2020 |
| | | | | 30/01/2020 | 30/01/2020 |
| | | | | 31/01/2020 | 31/01/2020 |
| | | | | ... | ... |
| | | | | 11/05/2020 | 11/05/2020 |
+----------+------------------+--------------+-----------+-------------+------------+
The List.Dates function is pretty useful here.
Generate the dates in your range, duplicate that to two columns and then append.
let
    StartDate = List.Min(StartTable[Start Date]),
    EndDate = List.Max(StartTable[End Date]),
    // List.Dates takes a count of dates, so add 1 to include EndDate itself.
    DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1,0,0,0)),
    DateCols = Table.FromColumns({DateList, DateList}, {"Start Date", "End Date"}),
    AppendDates = Table.Combine({StartTable, DateCols})
in
    AppendDates
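List.Dates takes a start, a count, and a step, so a range that includes both endpoints needs a count of (difference in days) + 1. A small Python sketch of that counting, using the minimum start date and maximum end date from the example table (the helper name `inclusive_dates` is made up for illustration):

```python
from datetime import date, timedelta

def inclusive_dates(start, end):
    """Return every date from start to end, both endpoints included."""
    count = (end - start).days + 1  # +1 so the end date itself is generated
    return [start + timedelta(days=i) for i in range(count)]

dates = inclusive_dates(date(2020, 1, 23), date(2020, 5, 11))
```

Without the +1 the list would stop at 10/05/2020, one day short of the last row in the desired output.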

Power Bi, compare a text column by month

I need a little help
+---------------------------------+-----------------+---------------+------------+
| Name | Opening Balance | Close Balance | Date |
+---------------------------------+-----------------+---------------+------------+
| LEAL MANZANO ABUNDIO | 394,732.87 | 406,866.31 | 31/08/2018 |
| LOPEZ GRANADOS CLAUDIA CAT | 382,567.83 | 382,567.83 | 31/08/2018 |
| ABARCA RODRIGUEZ ERNESTO | 394,142.32 | 394,142.32 | 31/08/2018 |
| OSOLLO JUAREZ PALOMA | 396,030.58 | 396,030.58 | 31/08/2018 |
| MACHUCA HERNANDEZ GUILLERM | 410,809.87 | 422,943.31 | 31/08/2018 |
| LEAL MANZANO ABUNDIO | 406,866.31 | 409,466.22 | 30/09/2018 |
| LOPEZ GRANADOS CLAUDIA CATALINA | 382,567.83 | 382,567.83 | 30/09/2018 |
| ABARCA RODRIGUEZ ERNESTO | 394,142.32 | 394,142.32 | 30/09/2018 |
| OSOLLO JUAREZ PALOMA | 396,030.58 | 396,030.58 | 30/09/2018 |
| MACHUCA HERNANDEZ GUILLERMO | 422,943.31 | 0 | 30/09/2018 |
| MACIAS SANCHEZ JOSE | 425,457.57 | 425,457.57 | 30/09/2018 |
| PARDINEZ BUCIO EDUARDO | 434,591.25 | 434,591.25 | 30/09/2018 |
| LEAL MANZANO ABUNDIO | 409,466.22 | 0 | 31/10/2018 |
| LOPEZ GRANADOS CLAUDIA CATALINA | 382,567.83 | 382,567.83 | 31/10/2018 |
| ABARCA RODRIGUEZ ERNESTO | 394,142.32 | 394,142.32 | 31/10/2018 |
| OSOLLO JUAREZ PALOMA | 396,030.58 | 396,030.58 | 31/10/2018 |
| MACHUCA HERNANDEZ GUILLERMO | 0 | 0 | 31/10/2018 |
+---------------------------------+-----------------+---------------+------------+
I have this table of client names and dates, and I need to compare how it changes month by month, to know how many clients came in and how many left.
Thank you.
Samuel, I loaded your table into PowerBI and created the following visuals
This is accomplished by adding a new calculated column, "ClientChange". It labels a row "NewBalance" when the opening balance is 0 and the closing balance is > 0. Conversely, it labels the row "Balance Closed" when the opening balance is > 0 and the closing balance is 0. I put it into a matrix and use the month grain from the native date hierarchy against the names. The truncated name strings in your data set need some fixing first.
ClientChange =
if(AND([ Opening Balance ] = 0, [ Close Balance ] > 0 )
, "NewBalance"
, if(AND ([ Opening Balance ] > 0, [ Close Balance ] = 0)
, "Balance Closed"
, " -- "
)
)
I also added a measure 'client count' that counts all the rows where closing balance isn't 0.
ClientCount = COUNTX(FILTER(testData,[ Close Balance ] <> 0), testData[Name])
Hope it helps. Please note that there is an oddity with 'MACIAS SANCHEZ JOSE': he has a record in September but not in August or October, and neither his opening nor his closing balance is 0. It doesn't quite make sense.
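To make the two rules concrete, here is a rough Python sketch of the same classification and count, applied to four rows taken from the question's table. It is an illustration only: `client_change` is a hypothetical name, and the count here is per row with a nonzero closing balance, which is how I read the ClientCount measure.

```python
def client_change(opening, closing):
    """Mirror the ClientChange column: label new and closed balances."""
    if opening == 0 and closing > 0:
        return "NewBalance"
    if opening > 0 and closing == 0:
        return "Balance Closed"
    return " -- "

# (name, opening balance, closing balance, date) copied from the question's table.
rows = [
    ("LEAL MANZANO ABUNDIO", 406866.31, 409466.22, "30/09/2018"),
    ("MACHUCA HERNANDEZ GUILLERMO", 422943.31, 0.0, "30/09/2018"),
    ("LEAL MANZANO ABUNDIO", 409466.22, 0.0, "31/10/2018"),
    ("MACHUCA HERNANDEZ GUILLERMO", 0.0, 0.0, "31/10/2018"),
]

labels = [client_change(opening, closing) for _, opening, closing, _ in rows]

# Count rows whose closing balance is not 0.
client_count = sum(1 for _, _, closing, _ in rows if closing != 0)
```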

SAS - how to 'sum up' based on consecutive occurrences

First-time poster, so hopefully someone can kindly assist with a problem I'm facing in SAS EG (I'm still learning SAS coding, so please be kind!).
In the snippet of the dataset below, I'm trying to tally up the scores (pts) by Ref, based on the consecutive occurrences of each flag for that Ref.
For example:
Take Ref 505 and A_Flag: there are 2 separate runs of consecutive occurrences of that flag, so the scoring is as follows:
1st ID > 1st instance = 25 points
2nd ID > 2nd instance but 1st consecutive instance = double to 50 points
3rd ID > 0 instance = 0 points
4th ID > 1st instance = 25 points
5th ID > 2nd instance but 1st consecutive instance = double to 50 points
6th ID > 0 instance = 0 points
Therefore for this Ref A_Pts will be 150 points.
Another example:
Take Ref 527 and B_Flag: there are 4 consecutive occurrences of that flag, so the scoring per ID is:
1st ID > 0 instance = 0 points
2nd ID > 1st instance = 10 points
3rd ID > 2nd instance but 1st consecutive instance = double to 20 points
4th ID > 3rd instance but 2nd consecutive instance = double to 40 points
5th ID > 4th instance but 3rd consecutive instance = double to 80 points
Therefore for this Ref B_Pts will be 150 points
The data is already in the order needed for what I'm trying to achieve.
I tried using the LAG function, but that only works for the 1st consecutive instance.
I also tried calculating a count (an enumeration variable based on cats(Ref,A_Flag)), but that orders the data incorrectly and doesn't count up accordingly.
Hopefully this makes sense to someone out there!
The dataset in question:
+-----------+-----+--------+--------+--------+-------+-------+
| date | Ref | FormID | A_Flag | B_Flag | A_Pts | B_Pts |
+-----------+-----+--------+--------+--------+-------+-------+
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 25 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 25 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 10 |
| 04-Jan-17 | 527 | 76153 | A | B | 25 | 10 |
+-----------+-----+--------+--------+--------+-------+-------+
Desired output (unless there is a better suggestion):
+-----------+-----+--------+--------+--------+-----------+-----------+
| date | Ref | FormID | A_Flag | B_Flag | A_Pts_Agg | B_Pts_Agg |
+-----------+-----+--------+--------+--------+-----------+-----------+
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 50 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 50 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 20 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 40 |
| 04-Jan-17 | 527 | 76153 | A | B | 50 | 80 |
+-----------+-----+--------+--------+--------+-----------+-----------+
So when totalled up it'll be
+-----+-----------+-----------+
| Ref | A_Pts_Agg | B_Pts_Agg |
+-----+-----------+-----------+
| 505 | 150 | 20 |
| 527 | 100 | 150 |
+-----+-----------+-----------+
Try this:
data have;
infile cards dlm='|';
input date :date7. Ref :8. FormID :8. A_Flag :$1. B_Flag :$1. A_Pts :8. B_Pts :8.;
format date date7.;
cards;
| 01-Feb-17 | 505 | 74549 | A | | 25 | 0 |
| 01-Feb-17 | 505 | 74550 | A | | 25 | 0 |
| 10-Jan-17 | 505 | 82900 | | B | 0 | 10 |
| 13-Jan-17 | 505 | 82906 | A | | 25 | 0 |
| 09-Jan-17 | 505 | 82907 | A | | 25 | 0 |
| 11-Jan-17 | 505 | 82909 | | B | 0 | 10 |
| 03-Jan-17 | 527 | 62549 | A | | 25 | 0 |
| 04-Jan-17 | 527 | 62550 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76151 | | B | 0 | 10 |
| 04-Jan-17 | 527 | 76152 | A | B | 25 | 10 |
| 04-Jan-17 | 527 | 76153 | A | B | 25 | 10 |
;
run;
data want;
    set have;
    by Ref;
    retain A_pts_agg B_pts_agg;
    /* Call LAG on every row so its queue stays in step with the data. */
    prev_A = lag(A_flag);
    prev_B = lag(B_flag);
    /* Restart at a new Ref or when the flag changes; otherwise double the running score. */
    if first.Ref or prev_A ne A_flag then A_pts_agg = A_pts;
    else if A_flag = 'A' then A_pts_agg = A_pts_agg * 2;
    if first.Ref or prev_B ne B_flag then B_pts_agg = B_pts;
    else if B_flag = 'B' then B_pts_agg = B_pts_agg * 2;
    drop prev_:;
run;
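To check the doubling rule against the totals in the question, here is a minimal Python sketch of the same scoring. It is a sketch under my own assumptions, not the SAS step: the names `next_score` and `score_runs` are made up, the base points (25 for A, 10 for B) come from the A_Pts and B_Pts columns, and the rows are (Ref, A_Flag, B_Flag) in the order given.

```python
A_BASE, B_BASE = 25, 10  # base points taken from the A_Pts and B_Pts columns

# (Ref, A_Flag, B_Flag) copied from the dataset, in its original order.
rows = [
    (505, "A", ""), (505, "A", ""), (505, "", "B"),
    (505, "A", ""), (505, "A", ""), (505, "", "B"),
    (527, "A", ""), (527, "", "B"), (527, "", "B"),
    (527, "A", "B"), (527, "A", "B"),
]

def next_score(previous, flagged, base):
    """0 when the flag is absent, base at the start of a run, double while the run continues."""
    if not flagged:
        return 0
    return previous * 2 if previous else base

def score_runs(data):
    scored = []
    prev_ref = None
    a_score = b_score = 0
    for ref, a_flag, b_flag in data:
        if ref != prev_ref:
            a_score = b_score = 0  # a new Ref never continues the previous Ref's run
        a_score = next_score(a_score, bool(a_flag), A_BASE)
        b_score = next_score(b_score, bool(b_flag), B_BASE)
        scored.append((ref, a_score, b_score))
        prev_ref = ref
    return scored

# Total the per-row scores by Ref.
totals = {}
for ref, a, b in score_runs(rows):
    total_a, total_b = totals.get(ref, (0, 0))
    totals[ref] = (total_a + a, total_b + b)
```

The resulting totals match the summary table in the question: 150 and 20 for Ref 505, 100 and 150 for Ref 527.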

SAS delete keyword deleted everything

I am using the following code to delete some rows from a dataset on a certain condition:
data MK_RETURN;
/*delete some data to solve the beta zero problem*/
if CUM_RETURN<RMIN then delete;
run;
However, I found that the dataset MK_RETURN not only became empty but also lost all of its variables except CUM_RETURN and RMIN.
Before the delete operation, the dataset contained six or seven variables. Afterwards, it contains only two (empty) variables: CUM_RETURN and RMIN.
What is wrong here?
The input data is something like
+--------+----------+------+--------------+--------------+-------------+----------+----------------+
| SYMBOL | DATE | time | CUM_RETURN | return_sec | RMIN | one_M | MK_RETURN_RATE |
+--------+----------+------+--------------+--------------+-------------+----------+----------------+
| A | 20130108 | 1 | 0 | | 0.00023571 | 1.90E-11 | 3.130243764 |
| A | 20130108 | 2 | | -0.00117855 | 0.000235988 | 1.90E-11 | 0.000274509 |
| A | 20130108 | 3 | 0.000471976 | 0.000471976 | 0.000235877 | 1.90E-11 | 6.86083E-05 |
| A | 20130108 | 4 | | -0.000471754 | 0.000235988 | 1.90E-11 | 6.86036E-05 |
| A | 20130108 | 5 | -0.000471976 | -0.000943953 | 0.000236211 | 1.90E-11 | 6.85989E-05 |
| A | 20130108 | 6 | | -0.002362112 | 0.000236771 | 1.90E-11 | 0 |
| A | 20130108 | 7 | 0.000711876 | 0.001183852 | 0.000236491 | 1.90E-11 | -0.000137188 |
| A | 20130108 | 8 | | 0.001300698 | 0.000236183 | 1.90E-11 | 0 |
| A | 20130108 | 9 | 0.000711876 | 0 | 0.000236183 | 1.90E-11 | 0 |
| A | 20130108 | 10 | | 0 | 0.000236183 | 1.90E-11 | 0.000137207 |
| A | 20130108 | 11 | 0.000711876 | 0 | 0.000236183 | 1.90E-11 | 0.000137188 |
| A | 20130108 | 12 | | 0.000590458 | 0.000236044 | 1.90E-11 | 6.85848E-05 |
| A | 20130108 | 13 | 0.000711876 | 0 | 0.000236044 | 1.90E-11 | 0 |
| A | 20130108 | 14 | | -0.000118022 | 0.000236072 | 1.90E-11 | -0.0003429 |
| A | 20130108 | 15 | 0.000711876 | 0 | 0.000236072 | 1.90E-11 | -0.000068604 |
+--------+----------+------+--------------+--------------+-------------+----------+----------------+
You didn't declare an input dataset (there is no SET statement), so the step replaced MK_RETURN with a brand-new dataset. Its only variables are the two the step mentions, CUM_RETURN and RMIN, and SAS creates both as numerics with missing values because nothing else defines them.
Try the following (if it is not too late to recover the original data):
data MK_RETURN;
    set INPUTDATASET; /* THIS is the line you need */
    /*delete some data to solve the beta zero problem*/
    if CUM_RETURN<RMIN then delete;
run;