SAS: Understanding the lag function to retain dates based on work progress

I need to build a work progress sheet.
I have a table in which work_progress takes the values NEW, PROGRESS, END and RESTART, and the rules are:
First, when work is NEW, the start date is set to '01/01/2013' and the following work_progress rows get the same value.
Second, if work has ENDed and is then added again as NEW, the start date is set to '01/01/2016' (below: work_id=3). The following work_progress rows must have the same value.
Last, when work RESTARTs (work_id 1 and 2), the start date is set to the beginning of the work received, '01/05/2017', and the later rows must carry the same date.
Below is the dataset my current logic outputs.
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2013 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2013 | 03/03/2017 |
| 3 | END | April 21, 2017 | 01/01/2013 | 04/04/2017 |
+---------+---------------+-------------------+------------+------------+
What I actually want my output to be:
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | April 21, 2017 | 01/01/2016 | 02/02/2017 |
+---------+---------------+-------------------+------------+------------+
Requirements:
The start date set on a NEW or RESTART row should be carried forward to the later work_progress rows.
For work_id=3 with work_progress=END, the March and April rows should both have the February end date.
I need to use lag here to retain the start and end dates. I have already implemented the rest of my logic; only this lag part is missing.
Part of sas code:
data m_out_ds;
set m_in_ds;
by work_id work_received_date;
/*--------
Some logic to derive my rules, that gave output, first table above.
----------*/
prevstart = lag(start);
prevend = lag(end);
prev_work_progress = lag(work_progress);
if work_progress = 'END' and prev_work_progress = 'END' then end = prevend;
/*---This gave 02/02/2017 for march received date only,
but we require for april too, obvious the work has ended.----*/
if work_progress = 'PROGRESS' and prev_work_progress ='RESTART'
then start = prevstart;
/*---This however worked---*/
run;
Let me know if you've trouble understanding this.
Thanks.

This seems to match your data, but I am still not sure I understand the rules. First, let's turn your text into data.
data have ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW | November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
1|END | January 1, 2017 |01/01/2013|02/02/2017
1|RESTART | February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
2|NEW | November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
2|END | January 1, 2017 |01/01/2013|31/12/2020
2|RESTART | February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
3|NEW | November 19, 2016|01/01/2013|31/12/2020
3|END | December 25, 2016|01/01/2013|02/02/2017
3|NEW | January 1, 2017 |01/01/2016|31/12/2020
3|END | February 5, 2017 |01/01/2013|02/02/2017
3|END | March 20, 2017 |01/01/2013|03/03/2017
3|END | April 21, 2017 |01/01/2013|04/04/2017
;
data want ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW |November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
1|END |January 1, 2017 |01/01/2013|02/02/2017
1|RESTART |February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
2|NEW |November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
2|END |January 1, 2017 |01/01/2013|31/12/2020
2|RESTART |February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
3|NEW |November 19, 2016|01/01/2013|31/12/2020
3|END |December 25, 2016|01/01/2013|02/02/2017
3|NEW |January 1, 2017 |01/01/2016|31/12/2020
3|END |February 5, 2017 |01/01/2016|02/02/2017
3|END |March 20, 2017 |01/01/2016|02/02/2017
3|END |April 21, 2017 |01/01/2016|02/02/2017
;
Now let's try to convert it.
data try ;
set have ;
by work_id;
retain new_start new_end ;
format new_start new_end yymmdd10.;
if first.work_id then call missing(of new_start new_end);
if work_progress in ('NEW','RESTART') then new_start=start ;
start=coalesce(new_start,start);
if work_progress in ('END') then do;
if missing(new_end) then new_end=end ;
end=coalesce(new_end,end);
end;
run;
proc compare data=want compare=try;
id row;
run;
proc print data=try; run;
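As far as I can tell, the lag attempt in the question stops after one row because lag(end) hands back the previous row's value as it was read from the input, not the value that was overwritten afterwards, so April sees March's original 03/03/2017. The retain approach keeps its own running value instead. A minimal sketch of that carry-forward idea in Python, outside SAS (the function name carry_forward and the dict layout are illustrative assumptions, not part of the original code):

```python
def carry_forward(rows):
    """Carry start/end dates forward within each work_id.

    `rows` must already be sorted by work_id and received date.
    """
    result = []
    current_id = None
    kept_start = kept_end = None
    for row in rows:
        # Reset the retained values at the first row of each work_id.
        if row["work_id"] != current_id:
            current_id = row["work_id"]
            kept_start = kept_end = None
        # NEW and RESTART rows define the start date for the rows after them.
        if row["work_progress"] in ("NEW", "RESTART"):
            kept_start = row["start"]
        start = kept_start if kept_start is not None else row["start"]
        end = row["end"]
        # The first END row fixes the end date for every later END row.
        if row["work_progress"] == "END":
            if kept_end is None:
                kept_end = row["end"]
            end = kept_end
        result.append({**row, "start": start, "end": end})
    return result


# work_id=3 from the first table in the question (dates kept as text).
work_3 = [
    {"work_id": 3, "work_progress": "NEW", "start": "01/01/2013", "end": "31/12/2020"},
    {"work_id": 3, "work_progress": "END", "start": "01/01/2013", "end": "02/02/2017"},
    {"work_id": 3, "work_progress": "NEW", "start": "01/01/2016", "end": "31/12/2020"},
    {"work_id": 3, "work_progress": "END", "start": "01/01/2013", "end": "02/02/2017"},
    {"work_id": 3, "work_progress": "END", "start": "01/01/2013", "end": "03/03/2017"},
    {"work_id": 3, "work_progress": "END", "start": "01/01/2013", "end": "04/04/2017"},
]
fixed = carry_forward(work_3)
```

The retained variables play the role of new_start and new_end in the data step above.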

Related

How to show percentage measure in a card visual?

Sample data:
+----------+------------+----------+-----------+
| PersonID | Date | Booked | Picked |
+----------+------------+----------+-----------+
| 1 | 1 Jan 2023 | 100 | 100 |
| 2 | 1 Jan 2023 | 40 | 30 |
| 3 | 1 Jan 2023 | 20 | 40 |
| 1 | 2 Jan 2023 | 50 | 80 |
| 2 | 2 Jan 2023 | 70 | 70 |
| 3 | 2 Jan 2023 | 60 | 40 |
+----------+------------+----------+-----------+
I have a measure as follows:
Performance % = DIVIDE(IF(Calls[Picked]>Calls[Booked],Calls[Booked],Calls[Picked]),Calls[Booked])
I have formatted this as %
When I place this in a table visual then I get a % value.
But when I place it into a card visual then it forces me to choose sum/min/max/...
What is the way to display the value of a measure in a card visual?
How can I iterate over each row to calculate the percentage value? For example, there is no DIVIDEX in DAX.
Try a measure like the one below:
Performance % =
DIVIDE(
IF(
SUM(Calls[Picked])>SUM(Calls[Booked]),
SUM(Calls[Booked]),
SUM(Calls[Picked])
),
SUM(Calls[Booked])
)
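Note that this measure compares the totals, which is not the same as capping Picked at Booked on every row first. A small Python sketch with the sample data shows the difference (the function names are made up for illustration; in DAX an iterator such as SUMX would be the natural way to get the row-by-row version):

```python
# (booked, picked) pairs from the sample data in the question.
calls = [(100, 100), (40, 30), (20, 40), (50, 80), (70, 70), (60, 40)]


def performance_from_totals(rows):
    """Cap total picked at total booked, then divide (what the measure above does)."""
    booked = sum(b for b, _ in rows)
    picked = sum(p for _, p in rows)
    return min(picked, booked) / booked


def performance_row_by_row(rows):
    """Cap picked at booked on each row, sum the capped values, then divide."""
    booked = sum(b for b, _ in rows)
    capped = sum(min(p, b) for b, p in rows)
    return capped / booked


from_totals = performance_from_totals(calls)   # 340 / 340
row_by_row = performance_row_by_row(calls)     # 310 / 340
```

Which of the two is right depends on whether over-picking on one row is allowed to offset under-picking on another.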

Rank of day of a month without weekend days in Power BI

I'm looking for a DAX way to have a column in my table which corresponds to the DAY() function without weekend days. In France weekend days are Saturday and Sunday.
Like this:
Date|Month|Year|Rank of the day with weekend days|Rank of the day without weekend days
01/10/2021|10|2021|1|1
02/10/2021|10|2021|2|1
03/10/2021|10|2021|3|1
04/10/2021|10|2021|4|2
05/10/2021|10|2021|5|3
With a DAX calculated column
You need the following two helper columns (Day# and Week#) in the dataset itself; Ranking is then computed from them:
| Date | Day# | Week# | Ranking |
|---------------------------- |------ |------- |--------- |
| Friday, October 1, 2021 | 6 | 40 | 1 |
| Saturday, October 2, 2021 | 7 | 40 | 1 |
| Sunday, October 3, 2021 | 1 | 41 | 1 |
| Monday, October 4, 2021 | 2 | 41 | 2 |
| Tuesday, October 5, 2021 | 3 | 41 | 3 |
| Wednesday, October 6, 2021 | 4 | 41 | 4 |
| Thursday, October 7, 2021 | 5 | 41 | 5 |
| Friday, October 8, 2021 | 6 | 41 | 1 |
| Saturday, October 9, 2021 | 7 | 41 | 1 |
| Sunday, October 10, 2021 | 1 | 42 | 1 |
| Monday, October 11, 2021 | 2 | 42 | 2 |
| Tuesday, October 12, 2021 | 3 | 42 | 3 |
The columns are defined as:
Day# =
WEEKDAY ( 'Table'[Date], 1 )
Week# =
WEEKNUM ( 'Table'[Date] )
Ranking =
VAR _week =
CALCULATE ( MAX ( 'Table'[Week#] ) )
VAR _rank =
RANKX (
FILTER (
ALL ( 'Table' ),
'Table'[Week#] = _week
&& EARLIER ( 'Table'[Day#] ) <> 6
&& EARLIER ( 'Table'[Day#] ) <> 7
),
'Table'[Date],
,
ASC
)
RETURN
_rank
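To make the three columns easier to check by hand, here is a rough Python sketch that reproduces the Day#, Week# and Ranking values in the table above. It mirrors the table, not the DAX engine itself, and it assumes Sunday is day 1 and that week 1 is the week containing 1 January; the helper names are invented for the example.

```python
from datetime import date, timedelta


def day_number(d):
    """Day of week with Sunday = 1 ... Saturday = 7."""
    return (d.weekday() + 1) % 7 + 1


def week_number(d):
    """Week of year with weeks starting on Sunday; week 1 contains 1 January."""
    jan1 = date(d.year, 1, 1)
    offset = (jan1.weekday() + 1) % 7  # 0 when 1 January is a Sunday
    return ((d - jan1).days + offset) // 7 + 1


def ranking(d, all_dates):
    """Rank of d inside its week; rows with Day# 6 or 7 are pinned to 1, as in the table."""
    if day_number(d) in (6, 7):
        return 1
    same_week = [x for x in all_dates if week_number(x) == week_number(d)]
    return 1 + sum(1 for x in same_week if x < d)


dates = [date(2021, 10, 1) + timedelta(days=i) for i in range(12)]
table = [(d, day_number(d), week_number(d), ranking(d, dates)) for d in dates]
```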

avgOver in Quicksight

I have data from every month in 2019 but only through September in 2020. Each row contains a MonthNo., corresponding to the calendar month, and a user ID entry. It looks like this
| Month | Year | ID | MonthNo. |
|-----------|------|--------|----------|
| January | 2019 | 611330 | 01 |
| January | 2019 | 174519 | 01 |
| January | 2019 | 380747 | 01 |
| February | 2019 | 882347 | 02 |
| February | 2019 | 633797 | 02 |
| February | 2019 | 863219 | 02 |
| March | 2019 | 189924 | 03 |
| March | 2019 | 241922 | 03 |
| March | 2019 | 563335 | 03 |
| April | 2019 | 648660 | 04 |
| April | 2019 | 363710 | 04 |
| April | 2019 | 606284 | 04 |
| May | 2019 | 296508 | 05 |
| May | 2019 | 287650 | 05 |
| May | 2019 | 599909 | 05 |
| June | 2019 | 513844 | 06 |
| June | 2019 | 891633 | 06 |
| June | 2019 | 138250 | 06 |
| July | 2019 | 126235 | 07 |
| July | 2019 | 853840 | 07 |
| July | 2019 | 713104 | 07 |
| August | 2019 | 180511 | 08 |
| August | 2019 | 451735 | 08 |
| August | 2019 | 818095 | 08 |
| September | 2019 | 512621 | 09 |
| September | 2019 | 674079 | 09 |
| September | 2019 | 914015 | 09 |
| October | 2019 | 132859 | 10 |
| October | 2019 | 560572 | 10 |
| October | 2019 | 272557 | 10 |
| November | 2019 | 984001 | 11 |
| November | 2019 | 815688 | 11 |
| November | 2019 | 902748 | 11 |
| December | 2019 | 880285 | 12 |
| December | 2019 | 167629 | 12 |
| December | 2019 | 772039 | 12 |
| January | 2020 | 116886 | 01 |
| January | 2020 | 386078 | 01 |
| February | 2020 | 291060 | 02 |
| February | 2020 | 970032 | 02 |
| March | 2020 | 907555 | 03 |
| March | 2020 | 560827 | 03 |
| April | 2020 | 938039 | 04 |
| April | 2020 | 721640 | 04 |
| May | 2020 | 131719 | 05 |
| May | 2020 | 415596 | 05 |
| June | 2020 | 589375 | 06 |
| June | 2020 | 623663 | 06 |
| July | 2020 | 577748 | 07 |
| July | 2020 | 999572 | 07 |
| August | 2020 | 630975 | 08 |
| August | 2020 | 442278 | 08 |
| September | 2020 | 993318 | 09 |
| September | 2020 | 413214 | 09 |
This example table has exactly 3 records for every month in 2019, and exactly 2 records for every month in 2020. So when I add a calculated field called MonthNotYearTraffic, defined by
// Averages ID count by month number only, intentionally ignoring year.
avgOver(count(ID), [{MonthNo.}])
I expect the following results
| MonthNo. | MonthNotYearTraffic |
|----------|---------------------|
| 01 | 2.5 |
| 02 | 2.5 |
| 03 | 2.5 |
| 04 | 2.5 |
| 05 | 2.5 |
| 06 | 2.5 |
| 07 | 2.5 |
| 08 | 2.5 |
| 09 | 2.5 |
| 10 | 3 |
| 11 | 3 |
| 12 | 3 |
since months 10-12 only have the three abovementioned 2019 entries. But instead, the results are:
I've tried this several different ways and in combinations of the following (several of which I know to be insane, others I'm unsure about):
at first not relying on custom, calculated fields
by partitioning on both month and year in the calculated field definition
by messing with level-aware aggregations
by ensuring the fields being aggregated by are strings/dimensions
No dice.
This seems like it should be a straightforward technique, so any pointers would be nice. Thank you.
It looks as if you need to partition the count of your IDs by month and then divide that count by the count of years in which you have user IDs in that month.
Using your sample data I was able to get your desired output.
MonthNotYearTraffic = countover(ID,[Month],PRE_FILTER)/distinctCountOver(Year,[Month],PRE_FILTER)
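The arithmetic behind that expression can be checked with a short Python sketch. It rebuilds only the shape of the sample data (3 rows per month in 2019, 2 rows per month through September 2020) and is not QuickSight code; the function name is an assumption for illustration.

```python
from collections import defaultdict

# (year, month number) for each row: the IDs themselves do not matter for the count.
records = [(2019, m) for m in range(1, 13) for _ in range(3)]
records += [(2020, m) for m in range(1, 10) for _ in range(2)]


def month_not_year_traffic(rows):
    """Rows per month divided by the number of distinct years seen in that month."""
    counts = defaultdict(int)
    years = defaultdict(set)
    for year, month in rows:
        counts[month] += 1
        years[month].add(year)
    return {month: counts[month] / len(years[month]) for month in sorted(counts)}


traffic = month_not_year_traffic(records)
```

Months 1 to 9 have 5 rows over 2 years and months 10 to 12 have 3 rows over 1 year, which gives the expected table.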
I think the problem is that avgOver only works when you have the data displayed like you do in your first table where you are defining the values in the question. Since you are only showing the MonthNo. field and there are not many rows with that same MonthNo. value, there is only one row for each month in that partition so it's simply dividing the count by 1.
Maybe try something like count(ID) / count("MonthNo.")

How not to display subtotals in matrix with blank values

I am using a measure to avoid displaying column subtotals. It works fine if I don't have any blanks.
Bond Count w/o =
VAR Bonds = CALCULATE(DISTINCTCOUNT(fact_Premium[PolicyNumber]))
RETURN
IF(
NOT(HASONEVALUE(dim_Date[Year])) && HASONEVALUE(dim_Date[Month]),
BLANK(),
Bonds
)
//HASONEVALUE returns TRUE when there is only one value in specified column
But in this case my matrix has blanks for particular months, and because of that the measure doesn't work, so it brings back subtotals for each column.
Is there any way to modify it so the logic works for cases like that?
Thank you.
.pbix can be found here: https://www.dropbox.com/s/h9xmpx6t997aqg9/TestBI.pbix?dl=0
Edit:
I have created a table with random data posted at the end of the post, where premium is the column with numbers.
The calculation below uses two nested IFs. The first forces a column subtotal using SELECTEDVALUE('Table'[Year]). The second nested IF only returns the sum of premium when it is equal to the grand total.
Sum_Premium =
IF (
    SELECTEDVALUE ( 'Table'[Year] ),
    SUM ( 'Table'[Premium] ),
    IF (
        SUM ( 'Table'[Premium] ) = CALCULATE ( SUM ( 'Table'[Premium] ), ALL ( 'Table' ) ),
        SUM ( 'Table'[Premium] ),
        BLANK ()
    )
)
Table
+------+-------+---------+
| Year | Month | Premium |
+------+-------+---------+
| 2017 | Jan | 10 |
| 2017 | Feb | 349 |
| 2017 | Mar | 406 |
| 2017 | Apr | 350 |
| 2017 | May | 31 |
| 2017 | Jun | 151 |
| 2017 | Jul | 266 |
| 2017 | Aug | 393 |
| 2017 | Sep | 278 |
| 2017 | Oct | 395 |
| 2017 | Nov | 119 |
| 2017 | Dec | 130 |
| 2018 | Jan | 190 |
| 2018 | Feb | 107 |
| 2018 | Mar | 248 |
| 2018 | Apr | 60 |
| 2018 | May | 302 |
| 2018 | Jun | 23 |
| 2018 | Jul | 248 |
| 2018 | Aug | 347 |
| 2018 | Sep | 31 |
| 2018 | Oct | 218 |
| 2018 | Nov | 326 |
| 2018 | Dec | 251 |
| 2019 | Jan | 173 |
| 2019 | Feb | 86 |
| 2019 | Mar | 29 |
| 2019 | Apr | 68 |
| 2019 | May | 19 |
| 2019 | Jun | 189 |
| 2019 | Jul | 261 |
| 2019 | Aug | 229 |
| 2019 | Sep | 338 |
| 2019 | Oct | 407 |
| 2019 | Nov | 409 |
| 2019 | Dec | 296 |
+------+-------+---------+

Reshape long panel data to wide where data is not unique within ID

I have a dataset that looks like this:
| State | Year | Industry | Employment |
|-------|------|----------|------------|
| AL | 2014 | 1 | 123345 |
| AL | 2015 | 1 | 145411 |
| AL | 2016 | 1 | 149402 |
| AL | 2014 | 2 | 153518 |
| AL | 2015 | 2 | 157773 |
| AL | 2016 | 2 | 163156 |
| AK | 2014 | 1 | 167187 |
| AK | 2015 | 1 | 167863 |
| AK | 2016 | 1 | 163320 |
| AK | 2014 | 2 | 162419 |
| AK | 2015 | 2 | 166116 |
| AK | 2016 | 2 | 170136 |
I would like to end up with a dataset that looks as follows:
| State | Year | Employment_Industry1 | Employment_Industry2 |
|-------|------|----------------------|----------------------|
| AL | 2014 | 123345 | 153518 |
| AL | 2015 | 145411 | 157773 |
| AL | 2016 | 149402 | 163156 |
| AK | 2014 | 167187 | 162419 |
| AK | 2015 | 167863 | 166116 |
| AK | 2016 | 163320 | 170136 |
As you can see, the data I have is in long format but the years are repeated within a State by Industry. This is causing an issue when I reshape wide.
I generated IDs for a couple of different variable groupings, but I end up with an error to the effect of:
values of variable Industry not unique within ID
What kind of an ID do I need to create, or is there something I can do to create the desired dataset?
The following works for me:
clear
input str2 State Year Industry Employment
AL 2014 1 123345
AL 2015 1 145411
AL 2016 1 149402
AL 2014 2 153518
AL 2015 2 157773
AL 2016 2 163156
AK 2014 1 167187
AK 2015 1 167863
AK 2016 1 163320
AK 2014 2 162419
AK 2015 2 166116
AK 2016 2 170136
end
egen id = group(State)
reshape wide Employment, i(id Year) j(Industry)
drop id
order State Year Employment*
list, abbreviate(15) sepby(State)
+------------------------------------------+
| State Year Employment1 Employment2 |
|------------------------------------------|
1. | AK 2014 167187 162419 |
2. | AK 2015 167863 166116 |
3. | AK 2016 163320 170136 |
|------------------------------------------|
4. | AL 2014 123345 153518 |
5. | AL 2015 145411 157773 |
6. | AL 2016 149402 163156 |
+------------------------------------------+
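The key point is that the i() variables together with j() must identify each observation uniquely, which here means the pair (State, Year) rather than State alone. A minimal Python sketch of the same long-to-wide step, using a few rows of the sample data (reshape_wide is a hypothetical helper, not a Stata or library function):

```python
def reshape_wide(rows):
    """Turn (state, year, industry, employment) rows into one record per (state, year)."""
    wide = {}
    for state, year, industry, employment in rows:
        record = wide.setdefault((state, year), {})
        column = f"Employment{industry}"
        # Same situation as Stata's "not unique within ID" error.
        if column in record:
            raise ValueError(f"{column} not unique within {(state, year)}")
        record[column] = employment
    return wide


long_rows = [
    ("AL", 2014, 1, 123345), ("AL", 2015, 1, 145411), ("AL", 2016, 1, 149402),
    ("AL", 2014, 2, 153518), ("AL", 2015, 2, 157773), ("AL", 2016, 2, 163156),
    ("AK", 2014, 1, 167187), ("AK", 2014, 2, 162419),
]
wide = reshape_wide(long_rows)
```

Keying on State alone would put three years of the same industry into one record, which is what triggers the uniqueness error.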