DAX to show rows with null date values in selected date slicer range - powerbi

I have a typical scenario as below.
I have a student table and it contains four columns as below :-
1.StudentID
2.StudentName
3.LastAttendanceDate
4.StudentType
Now there are some null values in the date column LastAttendanceDate. Is it possible to use a date slicer to show the students whose LastAttendanceDate is null? In simple words: say you are a student who went to school on Monday, Tuesday and Friday and were absent on Wednesday and Thursday. Wednesday and Thursday are then the days you were absent that week, and we need to display these records in the table visualization.
My Excel input data:
StudentID  StudentName  LastAttendanceDate  StudentType
100        Mary         02-05-2011 10:45    Fulltime
100        Mary                             Fulltime
100        Mary         04-05-2011 12:45    Fulltime
100        Mary         06-05-2011 15:45    Fulltime
100        Mary                             Fulltime
100        Mary         08-05-2011 19:45    Fulltime
100        Mary         09-05-2011 12:45    Fulltime
101        John         02-05-2011 10:45    Part Time
101        John         03-05-2011 11:23    Part Time
101        John         04-05-2011 10:45    Part Time
101        John         06-05-2011 15:49    Part Time
101        John                             Part Time
101        John         08-05-2011 19:45    Part Time
101        John         09-05-2011 12:45    Part Time
So I need to find, for any week, month or other dynamic date range (for example 02-05-2011 to 08-05-2011, 02-05-2011 to 09-05-2011, or even 06-05-2011 to 09-05-2011), the students who were absent and show them in my table visualization.
Can anyone provide an approach or any helpful DAX? Appreciate all the help
My present visualization looks like this :
I want to show the students who were absent in the given time range as selected in the date slicer.
So if I move the date slicer between its minimum and maximum, it should show all the rows of students who were absent, i.e. rows with a null Last Attendance Date, in that time range.
Kind regards
Sameer

Related

Calculate the total of one measure (with a upper limit) in another measure

I need to calculate the total value of a column per employee per month. Then I need to impose a limit of 177 per employee per month. This will go into a matrix with employee as rows and months as columns. Lastly, I want to add up all the amounts per month to show the total in a line chart.
I made a measure to calculate the 1% with a maximum amount of 177: if(0.01 * sum([amount]) > 177, 177, 0.01 * sum([amount])). Then I used this measure in my matrix as explained above. This worked fine, but when I want to make the line chart the limit of 177 is still imposed because I use the same measure.
I tested it with some dummy data! Please do it like this:
Employee Month Amount
Jack January 1500
Joe February 20000
Joe March 1600
Jack April 1800
Brad June 10000
Jack July 9500
Joe February 9500
Brad April 6500
Jack December 12000
Joe June 8000
Brad April 9500
Jack January 1000
Jack April 1100
Jack April 8000
Joe February 12000
Joe February 12500
Joe February 13000
Brad June 15000
Brad June 16000
Here is the measure (DAX code) you need to use:
your_measure =
IF(0.01 * SUM(your_table[Amount]) > 177, 177, 0.01 * SUM(your_table[Amount]))
Then let's put it on a matrix and a line chart:
If you don't want the 177 restriction applied in the line chart, why not create another simple total measure:
= 0.01 * SUM(your_table[Amount])
Update requested from Peter
Now you need to look at the whole picture! In the line chart, Employee is not part of the filter context; the model is filtered only by month. I added both measures as legends to the line chart.
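The filter-context point above can be illustrated outside DAX. The following is a minimal Python sketch, not DAX, that uses the June rows of the dummy data; the function and variable names are made up for the illustration. It compares three numbers for one month: the capped measure evaluated with only the month in context, the uncapped 1% total, and the sum of the per-employee capped values.

```python
from collections import defaultdict

CAP = 177  # monthly limit per employee, from the question

# June rows of the dummy data above: (employee, month, amount)
rows = [
    ("Brad", "June", 10000),
    ("Joe", "June", 8000),
    ("Brad", "June", 15000),
    ("Brad", "June", 16000),
]

def capped(total):
    """1% of the total, limited to CAP (what the IF measure computes)."""
    return min(total / 100, CAP)

# Matrix cell: both employee and month are in the filter context.
per_employee = defaultdict(int)
for employee, month, amount in rows:
    per_employee[(employee, month)] += amount
matrix_cells = {key: capped(total) for key, total in per_employee.items()}

# Line chart point: only the month is in the filter context, so the same
# measure sees the amounts of all employees at once.
june_total = sum(amount for _, month, amount in rows if month == "June")
chart_same_measure = capped(june_total)          # cap applied once per month
chart_uncapped = june_total / 100                # the second, simpler measure
chart_sum_of_caps = sum(matrix_cells.values())   # cap per employee, then add

print(matrix_cells)
print(chart_same_measure, chart_uncapped, chart_sum_of_caps)
```

Which of the three numbers the chart should show depends on what "total" means in the question. If it is the sum of the capped per-employee values, a DAX iterator over the employees (for example SUMX over VALUES of the employee column) is the usual pattern, though that is an assumption about the intent rather than something stated in the thread.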

Power Pivot - calculating distinctcount per week (rather than per day)

I am having problems with a distinctcount calculated by week. I have the pivot table below. I want to calculate the distinct number of vendors that have sold more than $2400 per week.
I have the following data table "sales" (only the first rows, but it has several vendors and other weeks as well):
sales day   sales week  vendor ID  Total Sales
02.11.2020  45          vendor 1   405
03.11.2020  45          vendor 1   464
04.11.2020  45          vendor 1   466
05.11.2020  45          vendor 1   358
06.11.2020  45          vendor 1   420
07.11.2020  45          vendor 1   343
I have tried to calculate it as such:
= [vendor] =distinctcount('Sales'[vendor ID])
= [Total_sales] = sum('Sales'[Total Sales])
= [# vendors - 2400] =calculate([vendor],filter('Sales',[Total_sales]>2400))
I know that this calculation considers the sales per day, not per week. So if I used $300 instead of $2400, for instance, both vendors would be marked, since on at least one day the sales of both are higher than $300. But I only want to consider the sales on a weekly basis.
What I expect (check pivot table below): Vendor 2 would be marked (sales = 2456), but not vendor 1 (sales = 1341), i.e., total number of vendors = 1. However, none of the vendors are being counted, since no daily sales are higher than $2400.
Row Labels        # Vendors (distinct)  total sales
Store A                                 3797
  week 45                               3797
    Vendor 1                            1341
      02.11.2020                        348
      04.11.2020                        202
      05.11.2020                        335
      06.11.2020                        308
      07.11.2020                        148
    Vendor 2                            2456
      02.11.2020                        405
      03.11.2020                        464
      04.11.2020                        466
      05.11.2020                        358
      06.11.2020                        420
      07.11.2020                        343
I also tried to create a column of sales in which I removed the day filter, like this:
=calculate([total_sales],ALL('sales'[sales day]))
and then recalculated the [# vendors - 2400], but it still gets me the same result as above.
The question is: how do I make the distinct count consider the total sales value per week (and not per day)? Thank you for the help!
Do you have a date calendar table in your file? If not, try to make one, then create a relationship from its date column to sales day (assuming that column holds your dates). That way you should be able to summarize by any date grouping, e.g. month, day, week, quarter. Or you can derive the week from the date field by adding a new column to your table: = weeknum(Tablename[sales day])
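Whichever way the week column is obtained, the count needs the sales summed per week and vendor before the 2400 test is applied. Here is a minimal Python sketch of that order of operations, using the figures from the pivot table in the question. It is an illustration of the logic only, not Power Pivot code, and the function names are hypothetical.

```python
from collections import defaultdict

# (week, vendor, daily sales) taken from the pivot table in the question
sales = (
    [(45, "Vendor 1", v) for v in (348, 202, 335, 308, 148)]
    + [(45, "Vendor 2", v) for v in (405, 464, 466, 358, 420, 343)]
)

def vendors_over_weekly_threshold(rows, threshold):
    """Sum sales per (week, vendor) first, then count vendors above the threshold."""
    weekly = defaultdict(int)
    for week, vendor, amount in rows:
        weekly[(week, vendor)] += amount
    counts = defaultdict(int)
    for (week, _vendor), total in weekly.items():
        if total > threshold:
            counts[week] += 1
    return dict(counts)

def vendors_with_a_big_day(rows, threshold):
    """What a row-by-row filter does: a vendor counts if any single day passes."""
    return {vendor for _week, vendor, amount in rows if amount > threshold}

print(vendors_over_weekly_threshold(sales, 2400))
print(len(vendors_with_a_big_day(sales, 2400)), len(vendors_with_a_big_day(sales, 300)))
```

The second function reproduces the behaviour described in the question: no vendor has a single day above 2400, while both have a day above 300.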

Combine Toad sql queries with decreasing output results into one list

I've been trying to combine multiple queries that return progressively more restrictive results. How can I see the full list as well as those records that meet the more restrictive conditions? Query 1 returns 538 records of sites in the given counties.
SELECT E_SITES.ID "SITE ID",
E_SITES.NAME "SITE NAME",
E_SITES.ADDR_1 "SITE ADDRESS",
E_SITES.CITY_NAME || ', ' || E_SITES.STATE_CODE || ' ' || E_SITES.POSTAL_CODE,
E_SITES.COUNTY_NAME
FROM E_SITES
WHERE E_SITES.COUNTY_NAME IN ('ALLAMAKEE', 'BENTON', 'BLACK HAWK', 'BREMER', 'BUCHANAN', 'CHICKASAW', 'CLAYTON', 'DELAWARE', 'DUBUQUE')
ORDER BY E_SITES.ID
Query 2 returns the number of sites that have a contact person identified. This is 503 records.
SELECT E_SITES.ID "SITE ID",
E_SITES.NAME "SITE NAME",
E_SITES.ADDR_1 "SITE ADDRESS",
E_SITES.CITY_NAME || ', ' || E_SITES.STATE_CODE || ' ' || E_SITES.POSTAL_CODE,
E_SITES.COUNTY_NAME,
E_INDIVIDUALS.FIRST_NAME || ' ' || E_INDIVIDUALS.LAST_NAME
FROM E_SITES, E_AFFILIATIONS, E_INDIVIDUALS
WHERE E_SITES.SITE_ID = E_AFFILIATIONS.SITE_ID
AND E_AFFILIATIONS.INDIVIDUAL_RID = E_INDIVIDUALS.RID
AND E_AFFILIATIONS.AFFILIATION_TYPE = ('SITE_CONTACT')
AND E_SITES.COUNTY_NAME IN ('ALLAMAKEE', 'BENTON', 'BLACK HAWK', 'BREMER', 'BUCHANAN', 'CHICKASAW', 'CLAYTON', 'DELAWARE', 'DUBUQUE')
ORDER BY E_SITES.ID
A further query would return those sites with a mailing address, which reduces the results down to 486 records. I need to get all 538 records, whether or not they have a contact or mailing address, and for those that do, have one row for each site.
Additional Information
My current results can look like this for Query 1 (including column headers for clarity, quotes to distinguish data elements):
"SITE ID" "SITE NAME" "SITE ADDRESS" "CITY, STATE ZIP" "COUNTY_NAME"
"09698" "BODINE ELECTRIC" "18114 KAPP DR" "PEOSTA, IA 52067" "BREMER"
"16895" "BRUGGEMAN LUMBER" "3003 WILLOW RD" "HOPKINTON, IA 52237" "DELAWARE"
"40047" "GENEVIEVE, LLC" "707 LINCOLN ST" "GARNAVILLOR, IA 52052" "CLAYTON"
Query 2 which requires a contact person currently only returns records that meet the requirement, even though I use the (+) operator.
"SITE ID" "SITE NAME" "SITE ADDRESS" "CITY, STATE ZIP" "COUNTY_NAME" "FIRST NAME LAST NAME"
"40047" "GENEVIEVE, LLC" "707 LINCOLN ST" "GARNAVILLOR, IA 52052" "CLAYTON" "DALE KARTMAN"
I get 1 record rather than 3 records (2 with no contact person and 1 with a contact person). This is my dilemma. I have to run each of these queries separately, get the results and copy them to a spreadsheet. Then I have to align the records with contact names to the first query of all facilities. Very labor intensive. Hope this helps clarify my needs.
If I understood you correctly, it is the OUTER JOIN you're looking for.
Here's a simple example (based on Scott's EMP and DEPT tables) which shows what it is.
There are 4 departments in the DEPT table:
SQL> select deptno from dept order by deptno;
DEPTNO
----------
10
20
30
40
However, no employee works in department 40:
SQL> select deptno, ename from emp order by deptno;
DEPTNO ENAME
---------- ----------
10 KING
10 CLARK
10 MILLER
20 FORD
20 SMITH
20 JONES
30 JAMES
30 TURNER
30 MARTIN
30 WARD
30 ALLEN
30 BLAKE
12 rows selected.
SQL>
If you want to display information collected from both of those tables (department name from the DEPT table and employee name from the EMP table), you'd join those tables - just like you did (I'll use ANSI syntax which actually JOINS tables, instead of enumerating them and putting join conditions into the WHERE clause):
SQL> select d.deptno, d.dname, e.ename
2 from dept d join emp e on e.deptno = d.deptno
3 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING KING
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
20 RESEARCH FORD
20 RESEARCH SMITH
20 RESEARCH JONES
30 SALES JAMES
30 SALES TURNER
30 SALES MARTIN
30 SALES WARD
30 SALES ALLEN
30 SALES BLAKE
12 rows selected.
SQL>
Looks OK, but - I'd like to get information about DEPTNO = 40, although nobody works in it. So, use outer join:
SQL> select d.deptno, d.dname, e.ename
2 from dept d left join emp e on e.deptno = d.deptno
3 order by d.deptno;
DEPTNO DNAME ENAME
---------- -------------- ----------
10 ACCOUNTING KING
10 ACCOUNTING CLARK
10 ACCOUNTING MILLER
20 RESEARCH FORD
20 RESEARCH SMITH
20 RESEARCH JONES
30 SALES JAMES
30 SALES TURNER
30 SALES MARTIN
30 SALES WARD
30 SALES ALLEN
30 SALES BLAKE
40 OPERATIONS
13 rows selected.
SQL>
Right! Here it is! (Note that LEFT JOIN produces the same result as LEFT OUTER JOIN; there is no need to specify "outer", although it makes things somewhat more obvious.)
Also, there's the "old" Oracle outer join operator, (+) (literally, a + sign enclosed in round brackets). The above query would work as well if we put it like this:
select d.deptno, d.dname, e.ename
from dept d, emp e
where d.deptno = e.deptno (+);
I'd suggest you do the same (an outer join) with your query. Once again:
join tables in the JOIN clause
put filters into the WHERE clause
The query will be easier to read and maintain, and you'll know what is what. Also, if you use the "old" (+) operator, you won't be able to outer join one table to more than one other table. As you go deeper, you might need to outer join some table to several others (it might even be the case for you), and that's where the ANSI join comes in.
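To make the advice concrete for the sites query, here is a small runnable sketch. It uses Python's sqlite3 module as a stand-in for Oracle, with a cut-down, hypothetical version of the three tables (only a few columns, SITE_ID as the site key, made-up RID values) and the three sample sites from the question. One common reason an outer join still drops rows is a filter on the outer-joined table placed in the WHERE clause; the sketch keeps the AFFILIATION_TYPE condition in the ON clause instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE e_sites (site_id TEXT, name TEXT, county_name TEXT);
CREATE TABLE e_individuals (rid INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE e_affiliations (site_id TEXT, individual_rid INTEGER, affiliation_type TEXT);
INSERT INTO e_sites VALUES
  ('09698', 'BODINE ELECTRIC', 'BREMER'),
  ('16895', 'BRUGGEMAN LUMBER', 'DELAWARE'),
  ('40047', 'GENEVIEVE, LLC', 'CLAYTON');
INSERT INTO e_individuals VALUES (1, 'DALE', 'KARTMAN');
INSERT INTO e_affiliations VALUES ('40047', 1, 'SITE_CONTACT');
""")

# Join conditions (including the affiliation type) go in ON;
# only filters on the driving table stay in WHERE.
all_sites_sql = """
SELECT s.site_id, s.name, i.first_name || ' ' || i.last_name AS contact
FROM e_sites s
LEFT JOIN e_affiliations a
       ON a.site_id = s.site_id
      AND a.affiliation_type = 'SITE_CONTACT'
LEFT JOIN e_individuals i
       ON i.rid = a.individual_rid
WHERE s.county_name IN ('BREMER', 'CLAYTON', 'DELAWARE')
ORDER BY s.site_id
"""
all_sites = conn.execute(all_sites_sql).fetchall()

# Same outer join, but the filter on the joined table sits in WHERE:
# rows without an affiliation are removed again.
filtered_in_where_sql = """
SELECT s.site_id
FROM e_sites s
LEFT JOIN e_affiliations a ON a.site_id = s.site_id
WHERE a.affiliation_type = 'SITE_CONTACT'
"""
only_with_contact = conn.execute(filtered_in_where_sql).fetchall()

for row in all_sites:
    print(row)
print(only_with_contact)
```

A site with several contacts would still produce several rows with this query, so getting exactly one row per site would need an extra aggregation or selection step that is not shown here.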
Good luck!

How to collapse data by week correctly in Stata?

I have a transaction level dataset and I want to collapse and calculate weekly average price. The dataset can be simplified as follows,
clear
input str9 date quantity price id
"01jan2010" 50 70 1
"02jan2010" 60 80 2
"02jan2010" 70 90 3
"04jan2010" 70 95 4
"08jan2010" 60 81 5
"09jan2010" 70 88 6
"12jan2010" 55 87 7
"13jan2010" 52 88 8
end
gen date2=date(date,"DMY")
format date2 %td
drop date
I want to create a variable date3. For every transaction that happened in a given week, date3 is the Monday of that week.
Here's the code I have:
sort date2
gen date3=date2 if dow(date2)==1
replace date3=date3[_n-1] if missing(date3)
format date3 %td
However, there are Mondays with no transactions, but the rest of the week has transactions. In those cases, date3 is not the Monday date of that week, but Monday date in the weeks before.
My data becomes the following using the above code:
quantity  price  id  date2      date3
50        70     1   01jan2010
60        80     2   02jan2010
70        90     3   02jan2010
70        95     4   04jan2010  04jan2010
60        81     5   08jan2010  04jan2010
70        88     6   09jan2010  04jan2010
55        87     7   12jan2010  04jan2010
52        88     8   13jan2010  04jan2010
To me, it does not matter if id = 1, 2, 3 have no date3. What concerns me is that id = 7 and id = 8 should have a date3 of 11jan2010. But because there is no transaction on that day, the date becomes 04jan2010. Is there a way to fix this?
(I was thinking of constructing a new dataset with consecutive dates since 01jan2010, merging it with the one above, and then dropping observations with missing quantity or price. But I was wondering if there's a more efficient way.)
In addition, I have a weekly index data that reports on every Friday since 01jan2010. If I use wofd command, Stata will generate 53 weeks in 2010. (Or more precisely, two 2010w52.) How can I get just 52 weeks in Stata?
(I found this http://www.stata.com/statalist/archive/2012-02/msg01030.html but I still cannot figure out how this can help solve my problem. )
Your weeks start on Mondays. Everything you need follows from using dow() to exploit the fact that in every one of your weeks, the day of week function dow() yields 1, 2, 3, 4, 5, 6, 0 for the days from Monday to Sunday.
The present or previous Monday for a daily date variable daily is just
gen Monday = cond(dow(daily) == 0, daily - 6, daily - dow(daily) + 1)
The branch is like this. If it's a Sunday, the previous Monday was 6 days ago. Otherwise, the Monday that starts the week was today if it's Monday and dow() yields 1, yesterday if it's Tuesday and 2, and so forth. Here the variable Monday is just the dates of Mondays that define the weeks.
Important detail: There are no assumptions here about dates being complete in the data or even in order.
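For readers who want to check the arithmetic outside Stata, here is a minimal Python sketch of the same rule applied to the example data from the question, followed by the weekly mean price. The helper names are hypothetical, and stata_dow() only mimics Stata's dow() convention (0 = Sunday) on top of Python's date.weekday() (0 = Monday).

```python
from collections import defaultdict
from datetime import date, timedelta

def stata_dow(d):
    """Day of week in Stata's convention: 0 = Sunday, 1 = Monday, ..., 6 = Saturday."""
    return (d.weekday() + 1) % 7  # Python's weekday(): 0 = Monday, ..., 6 = Sunday

def week_monday(d):
    """Same branch as cond(dow(daily) == 0, daily - 6, daily - dow(daily) + 1)."""
    dow = stata_dow(d)
    if dow == 0:
        return d - timedelta(days=6)
    return d - timedelta(days=dow - 1)

# (date, price) pairs from the example dataset in the question
transactions = [
    (date(2010, 1, 1), 70), (date(2010, 1, 2), 80), (date(2010, 1, 2), 90),
    (date(2010, 1, 4), 95), (date(2010, 1, 8), 81), (date(2010, 1, 9), 88),
    (date(2010, 1, 12), 87), (date(2010, 1, 13), 88),
]

# Group prices by the Monday that starts each week, then average.
prices_by_week = defaultdict(list)
for day, price in transactions:
    prices_by_week[week_monday(day)].append(price)
weekly_mean = {monday: sum(p) / len(p) for monday, p in prices_by_week.items()}

for monday in sorted(weekly_mean):
    print(monday.isoformat(), weekly_mean[monday])
```

The transactions of 12jan2010 and 13jan2010 land in the week of 11jan2010 even though nothing was sold on that Monday, which is the case the original code got wrong.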
Small note: Arbitrary names like date2 and date3 mean nothing much. Use evocative names in your questions (and your practice).
There was a sequel to the article mentioned by Robert Ferrer. Type search week, sj in Stata to get the references.
Do not use Stata's weeks and in particular do not use the wofd() function (not a command), as they can't help you. Stata's weeks will not map on to your weeks. The article mentioned by Robert Ferrer really is worthwhile reading to understand this (even though I wrote it).
(This is all explained in the Statalist threads you link to.)

Self Join in Pandas: Merge all rows with the equivalent multi-index

I have one dataframe in the following form:
df = pd.read_csv('data/original.csv', sep = ',', names=["Date", "Gran", "Country", "Region", "Commodity", "Type", "Price"], header=0)
I'm trying to do a self join on the index Date, Gran, Country, Region producing rows in the form of
Date, Gran, Country, Region, Commodity X, Type X, Price X, Commodity Y, Type Y, Price Y, Commodity Z, Type Z, Price Z
Every row should have all the different commodities and prices of a specific region.
Is there a simple way of doing this?
Any help is much appreciated!
Note: I simplified the example by ignoring a few attributes
Input Example:
   Date        Country  Region          Commodity  Price
1  03/01/2014  India    Vishakhapatnam  Rice       25
2  03/01/2014  India    Vishakhapatnam  Tomato     30
3  03/01/2014  India    Vishakhapatnam  Oil        50
4  03/01/2014  India    Delhi           Wheat      10
5  03/01/2014  India    Delhi           Jowar      60
6  03/01/2014  India    Delhi           Bajra      10
Output Example:
   Date        Country  Region          Commodity1  Price1  Commodity2  Price2  Commodity3  Price3
1  03/01/2014  India    Vishakhapatnam  Rice        25      Tomato      30      Oil         50
2  03/01/2014  India    Delhi           Wheat       10      Jowar       60      Bajra       10
What you want to do is called a reshape (specifically, from long to wide). See this answer for more information.
Unfortunately as far as I can tell pandas doesn't have a simple way to do that. I adapted the answer in the other thread to your problem:
df['idx'] = df.groupby(['Date','Country','Region']).cumcount()
df.pivot(index= ['Date','Country','Region'], columns='idx')[['Commodity','Price']]
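As a self-contained check, the sketch below rebuilds the simplified input example as a DataFrame, applies the two lines above, and then flattens the resulting two-level columns into Commodity1, Price1, and so on. The flattening step and the column-naming scheme are my own additions to match the desired output, and passing a list to index in pivot assumes pandas 1.1 or later.

```python
import pandas as pd

# The simplified input example from the question
df = pd.DataFrame({
    "Date": ["03/01/2014"] * 6,
    "Country": ["India"] * 6,
    "Region": ["Vishakhapatnam"] * 3 + ["Delhi"] * 3,
    "Commodity": ["Rice", "Tomato", "Oil", "Wheat", "Jowar", "Bajra"],
    "Price": [25, 30, 50, 10, 60, 10],
})

keys = ["Date", "Country", "Region"]

# Number the rows within each group: 0, 1, 2, ...
df["idx"] = df.groupby(keys).cumcount()

# Long to wide: one row per group, one (column, idx) pair per value
wide = df.pivot(index=keys, columns="idx")[["Commodity", "Price"]]

# Flatten ("Commodity", 0) -> "Commodity1", ("Price", 0) -> "Price1", ...
wide.columns = [f"{name}{i + 1}" for name, i in wide.columns]

# Interleave the columns as Commodity1, Price1, Commodity2, Price2, ...
ordered = [
    f"{name}{i + 1}"
    for i in range(df["idx"].max() + 1)
    for name in ("Commodity", "Price")
]
wide = wide[ordered].reset_index()
print(wide)
```

Groups with fewer rows than the largest group would get missing values in the trailing columns.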
Does that solve your problem?