I have the dataset below.
Math Literature Biology date student
4 2 5 2019-08-25 A
4 5 4 2019-08-08 A
5 4 5 2019-08-23 A
5 5 5 2019-08-15 A
5 5 5 2019-07-19 A
5 5 5 2019-07-15 A
5 5 5 2019-07-03 A
5 5 5 2019-06-26 A
1 1 2 2019-06-18 A
2 3 3 2019-06-14 A
5 5 5 2019-05-01 A
2 1 3 2019-04-26 A
I need to build a solution in Power BI whose output is the cumulative average per subject per month.
For example:
April May June July August
Math | 2 3.5 3 3.75 4
Literature | 1 3 3 3.75 3.83
Biology | 3 4 3.6 4.125 4.33
Can you help?
You can use a matrix visualization for this.
Create a month-year column and use it in Columns.
Put the Average of Math, Literature and Biology in Values.
In the Format pane, go to Values and turn on "Show on rows".
This should give the view you are looking for. You can edit the value headers to your requirement.
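Note that a plain Average in a matrix is evaluated within each month, whereas the table in the question is a running average over all grades up to and including that month. As a cross-check outside Power BI, here is a minimal pandas sketch that reproduces the requested numbers. The names df, monthly and cumulative_avg are illustrative only and are not part of any Power BI model.

```python
import pandas as pd

# Rows copied from the question: Math, Literature, Biology, date (student is always "A").
rows = [
    (4, 2, 5, "2019-08-25"), (4, 5, 4, "2019-08-08"), (5, 4, 5, "2019-08-23"),
    (5, 5, 5, "2019-08-15"), (5, 5, 5, "2019-07-19"), (5, 5, 5, "2019-07-15"),
    (5, 5, 5, "2019-07-03"), (5, 5, 5, "2019-06-26"), (1, 1, 2, "2019-06-18"),
    (2, 3, 3, "2019-06-14"), (5, 5, 5, "2019-05-01"), (2, 1, 3, "2019-04-26"),
]
df = pd.DataFrame(rows, columns=["Math", "Literature", "Biology", "date"])
df["month"] = pd.to_datetime(df["date"]).dt.to_period("M")

subjects = ["Math", "Literature", "Biology"]
# Per-month sum and count of grades; groupby sorts the months chronologically.
monthly = df.groupby("month")[subjects].agg(["sum", "count"])

# Cumulative average = running total of grades / running number of grades.
cumulative_avg = pd.DataFrame({
    s: monthly[(s, "sum")].cumsum() / monthly[(s, "count")].cumsum()
    for s in subjects
})

# Subjects as rows and months as columns, like the requested layout.
result = cumulative_avg.T.round(2)
print(result)
```

In Power BI the same idea would need a measure that divides a running sum by a running count rather than the built-in Average aggregation.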
I'm working with a SAS table where I have ordered data that I need to sum in intervals of 5. I don't have a unique ID I can use for the group by statement and I'm struggling to find a solution.
Say I have this table
Number Name X Y
1 Susan 2 1
2 Susan 3 3
3 Susan 3 3
4 Susan 4 1
5 Susan 1 2
6 Susan 1 1
7 Susan 1 1
8 Susan 2 4
9 Susan 1 5
10 Susan 4 2
1 Steve 2 4
2 Steve 2 3
3 Steve 1 2
4 Steve 3 5
5 Steve 1 1
6 Steve 1 3
7 Steve 2 3
8 Steve 2 4
9 Steve 1 1
10 Steve 1 1
I'd want the output to look like
Number Name X Y
1-5 Susan 13 10
6-10 Susan 9 13
1-5 Steve 9 15
6-10 Steve 7 12
Is there an easy way to get output like this using proc sql? Thanks!
Try this:
proc sql;
  /* ceil(Number/5) maps Number 1-5 to group 1 and 6-10 to group 2 */
  select ceil(Number/5) as Grouping, Name, sum(X) as sum_X, sum(Y) as sum_Y
  from have
  group by Name, Grouping;
quit;
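For readers who want to check the bucketing logic outside SAS, the same ceil(Number/5) key can be sketched in pandas. This is only an illustration under the assumption that Number restarts at 1 for each Name, as in the sample data. The names have and want mirror SAS convention and the "1-5" label is built by hand.

```python
import math
import pandas as pd

# Data from the question: Number runs 1..10 within each Name.
x = {"Susan": [2, 3, 3, 4, 1, 1, 1, 2, 1, 4], "Steve": [2, 2, 1, 3, 1, 1, 2, 2, 1, 1]}
y = {"Susan": [1, 3, 3, 1, 2, 1, 1, 4, 5, 2], "Steve": [4, 3, 2, 5, 1, 3, 3, 4, 1, 1]}
have = pd.DataFrame(
    [(n, name, x[name][n - 1], y[name][n - 1]) for name in x for n in range(1, 11)],
    columns=["Number", "Name", "X", "Y"],
)

# ceil(Number / 5) maps 1..5 -> 1 and 6..10 -> 2, the same bucket key as the SQL.
have["Grouping"] = have["Number"].apply(lambda n: math.ceil(n / 5))

# sort=False keeps the groups in order of first appearance (Susan before Steve).
want = have.groupby(["Name", "Grouping"], sort=False)[["X", "Y"]].sum().reset_index()

# Build the "1-5" / "6-10" label shown in the desired output.
want["Number"] = want["Grouping"].map(lambda g: f"{5 * g - 4}-{5 * g}")
print(want[["Number", "Name", "X", "Y"]])
```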
I am trying to reshape a variable to wide format but cannot find the proper way to do so.
I have a day-wise count dataset for each SSUID, and I would like to reshape day to wide so that each SSUID has one row showing its count for every day.
Dataset:
ssuid day count
1226 1 3
1226 2 7
1226 3 5
1226 4 7
1226 5 7
1226 6 6
1227 1 3
1227 2 6
1227 3 7
1227 4 4
1228 1 4
1228 2 4
1228 3 6
1228 4 7
1228 5 5
1229 1 3
1229 2 6
1229 3 6
1229 4 6
1229 5 5
I tried some code but got the error:
count variable not constant within SSUID variable
My code:
reshape wide day, i(ssuid) j(count)
I would like to get the following result:
ssuid day1 day2 day3 day4 day5 day6
1226 3 7 5 7 7 6
1227 3 6 7 4 . .
1228 4 4 6 7 5 .
1229 3 6 6 6 5 .
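The following works for me. The variable to spread out is count, so it goes after reshape wide; j() names the variable whose values become the suffixes of the new columns, which here is day: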
clear
input ssuid day count
1226 1 3
1226 2 7
1226 3 5
1226 4 7
1226 5 7
1226 6 6
1227 1 3
1227 2 6
1227 3 7
1227 4 4
1228 1 4
1228 2 4
1228 3 6
1228 4 7
1228 5 5
1229 1 3
1229 2 6
1229 3 6
1229 4 6
1229 5 5
end
reshape wide count, i(ssuid) j(day)
rename count# day#
list
+-------------------------------------------------+
| ssuid day1 day2 day3 day4 day5 day6 |
|-------------------------------------------------|
1. | 1226 3 7 5 7 7 6 |
2. | 1227 3 6 7 4 . . |
3. | 1228 4 4 6 7 5 . |
4. | 1229 3 6 6 6 5 . |
+-------------------------------------------------+
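The roles of i() and j() may be easier to see next to an equivalent long-to-wide pivot in another tool. Below is a minimal pandas sketch of the same reshape. It is an analogy rather than Stata code, and the names long and wide are illustrative.

```python
import pandas as pd

# Counts from the question, listed in day order for each ssuid.
counts = {
    1226: [3, 7, 5, 7, 7, 6],
    1227: [3, 6, 7, 4],
    1228: [4, 4, 6, 7, 5],
    1229: [3, 6, 6, 6, 5],
}
long = pd.DataFrame(
    [(ssuid, day, c) for ssuid, cs in counts.items() for day, c in enumerate(cs, start=1)],
    columns=["ssuid", "day", "count"],
)

# index plays the role of i(), columns the role of j(), values is the reshaped variable.
wide = long.pivot(index="ssuid", columns="day", values="count")

# Rename 1..6 to day1..day6, like `rename count# day#`.
wide.columns = [f"day{d}" for d in wide.columns]
print(wide)
```

Days with no record come out as NaN, which corresponds to the "." missing values in the Stata listing.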
Importing the data frame
df = pd.read_csv("C:\\Users")
Printing the list of employees' usernames
print(df['AssignedTo'])
Returns:
Out[4]:
0 vaughad
1 channln
2 stalasi
3 mitras
4 martil
5 erict
6 erict
7 channln
8 saia
9 channln
10 roedema
11 vaughad
Printing the dates
print(df['FilledEnd'])
Returns:
Out[6]:
0 2015-11-05
1 2016-05-27
2 2016-04-26
3 2016-02-18
4 2016-02-18
5 2015-11-02
6 2016-01-14
7 2015-12-15
8 2015-12-31
9 2015-10-16
10 2016-01-07
11 2015-11-20
Now I need to collect the latest date per employee.
I have tried:
MaxDate = max(df.FilledEnd)
But this just returns one date for all employees.
The data set contains multiple employees, each with several different dates. In a new column named "LatestDate" I need the latest date for the employee on that row. For example, every "vaughad" record should show "2015-11-20" in the new column, and every "channln" record should show "2016-05-27".
You need to group your data first, using DataFrame.groupby(), after which you can produce aggregate values, like the maximum date in the FilledEnd series:
df.groupby('AssignedTo')['FilledEnd'].max()
This produces a series, with AssignedTo as the index, and the latest date for each of those employees as the values:
>>> df.groupby('AssignedTo')['FilledEnd'].max()
AssignedTo
channln 2016-05-27
erict 2016-01-14
martil 2016-02-18
mitras 2016-02-18
roedema 2016-01-07
saia 2015-12-31
stalasi 2016-04-26
vaughad 2015-11-20
Name: FilledEnd, dtype: object
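If you want to add those maximum dates back to the dataframe, use groupby(...).transform() instead, so you get a series with the same index as the original rows. Passing the string 'max' avoids the NumPy import and the deprecation warning that newer pandas releases may emit for np.max:
df['MaxDate'] = df.groupby('AssignedTo')['FilledEnd'].transform('max')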
This adds in a MaxDate column:
AssignedTo FilledEnd MaxDate
0 vaughad 2015-11-05 2015-11-20
1 channln 2016-05-27 2016-05-27
2 stalasi 2016-04-26 2016-04-26
3 mitras 2016-02-18 2016-02-18
4 martil 2016-02-18 2016-02-18
5 erict 2015-11-02 2016-01-14
6 erict 2016-01-14 2016-01-14
7 channln 2015-12-15 2016-05-27
8 saia 2015-12-31 2015-12-31
9 channln 2015-10-16 2016-05-27
10 roedema 2016-01-07 2016-01-07
11 vaughad 2015-11-20 2015-11-20
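The dtype: object in the output above suggests the dates are stored as strings. Taking the maximum of ISO-formatted strings happens to give the right answer, but parsing them first is safer. Here is a self-contained sketch using the sample rows from the question, with the result written to the "LatestDate" column the question asked for.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "AssignedTo": ["vaughad", "channln", "stalasi", "mitras", "martil", "erict",
                   "erict", "channln", "saia", "channln", "roedema", "vaughad"],
    "FilledEnd": ["2015-11-05", "2016-05-27", "2016-04-26", "2016-02-18", "2016-02-18",
                  "2015-11-02", "2016-01-14", "2015-12-15", "2015-12-31", "2015-10-16",
                  "2016-01-07", "2015-11-20"],
})

# Parse to real timestamps so max() compares dates rather than strings.
df["FilledEnd"] = pd.to_datetime(df["FilledEnd"])

# transform("max") broadcasts each employee's latest date back onto every one of their rows.
df["LatestDate"] = df.groupby("AssignedTo")["FilledEnd"].transform("max")
print(df)
```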
I have following data frame:
dataFrame <- data.frame(sent = c(1,1,2,2,3,3,3,4,5), word = c("good printer", "wireless easy", "just right size",
"size perfect weight", "worth price", "website great tablet",
"pan nice tablet", "great price", "product easy install"), val = c(1,2,3,4,5,6,7,8,9))
Data frame "dataFrame" looks like below:
sent word val
1 good printer 1
1 wireless easy 2
2 just right size 3
2 size perfect weight 4
3 worth price 5
3 website great tablet 6
3 pan nice tablet 7
4 great price 8
5 product easy install 9
And then I have words:
nouns <- c("printer", "wireless", "weight", "price", "tablet")
I need to extract only these words (nouns) from dataFrame and add the extracted words to a new column (e.g. extract) in dataFrame.
I would really appreciate any help or advice. Thanks a lot in advance.
Desired output:
sent word val extract
1 good printer 1 printer
1 wireless easy 2 wireless
2 just right size 3 size
2 size perfect weight 4 weight
3 worth price 5 price
3 website great tablet 6 tablet
3 pan nice tablet 7 tablet
4 great price 8 price
5 product easy install 9 remove this row (no match)
Here's a simple solution using the stringi package (size isn't in your nouns list btw).
library(stringi)
transform(dataFrame,
          extract = stri_extract_all(word,
                                     regex = paste(nouns, collapse = "|"),
                                     simplify = TRUE))
# sent word val extract
# 1 1 good printer 1 printer
# 2 1 wireless easy 2 wireless
# 3 2 just right size 3 <NA>
# 4 2 size perfect weight 4 weight
# 5 3 worth price 5 price
# 6 3 website great tablet 6 tablet
# 7 3 pan nice tablet 7 tablet
# 8 4 great price 8 price
# 9 5 product easy install 9 <NA>
This is another solution. It is a bit more complicated, but it also deletes the rows where none of the nouns match dataFrame$word.
# Base R only: strsplit() and intersect() need no extra package
dataFrame <- data.frame(sent = c(1,1,2,2,3,3,3,4,5),
                        word = c("good printer", "wireless easy", "just right size",
                                 "size perfect weight", "worth price", "website great tablet",
                                 "pan nice tablet", "great price", "product easy install"),
                        val = c(1,2,3,4,5,6,7,8,9))
nouns <- c("printer", "wireless", "weight", "price", "tablet")
test <- character()   # matched noun for each row that is kept
df.del <- integer()   # indices of rows without any match
for (i in seq_len(nrow(dataFrame))) {
  # nouns that occur as whole words in this row
  hits <- intersect(nouns, unlist(strsplit(as.character(dataFrame$word[i]), " ")))
  if (length(hits) == 0) {
    df.del <- c(df.del, i)
  } else {
    test <- c(test, hits[1])   # assumption: keep only the first noun if several match
  }
}
# Guard: negative indexing with an empty vector would drop every row
if (length(df.del) > 0) dataFrame <- dataFrame[-df.del, ]
dataFrame$extract <- test
output:
sent word val extract
1 1 good printer 1 printer
2 1 wireless easy 2 wireless
4 2 size perfect weight 4 weight
5 3 worth price 5 price
6 3 website great tablet 6 tablet
7 3 pan nice tablet 7 tablet
8 4 great price 8 price
Here is another solution using a for loop and an if statement.
word <- dataFrame$word
# Default label for rows where no noun is found
extract <- rep("remove", length(word))
for (i in seq_along(word)) {
  g <- as.character(word[i])
  for (j in seq_along(nouns)) {
    # If several nouns match, the last one in nouns wins
    if (grepl(nouns[j], g)) extract[i] <- nouns[j]
  }
}
dataFrame$extract <- extract
# sent word val extract
#1 1 good printer 1 printer
#2 1 wireless easy 2 wireless
#3 2 just right size 3 remove
#4 2 size perfect weight 4 weight
#5 3 worth price 5 price
#6 3 website great tablet 6 tablet
#7 3 pan nice tablet 7 tablet
#8 4 great price 8 price
#9 5 product easy install 9 remove
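All three answers reduce to the same idea: build one alternation pattern from nouns, take the first match per row, and optionally drop rows without a match. For comparison, here is a minimal pandas sketch of that idea. It is not a translation of any one answer. The word-boundary anchors are an added assumption so that a noun only matches as a whole word, and the names data and matched are illustrative.

```python
import re
import pandas as pd

data = pd.DataFrame({
    "sent": [1, 1, 2, 2, 3, 3, 3, 4, 5],
    "word": ["good printer", "wireless easy", "just right size", "size perfect weight",
             "worth price", "website great tablet", "pan nice tablet", "great price",
             "product easy install"],
    "val": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
nouns = ["printer", "wireless", "weight", "price", "tablet"]

# One alternation with word boundaries, so "tablet" would not match inside "tablets".
pattern = r"\b(" + "|".join(map(re.escape, nouns)) + r")\b"

# str.extract returns the first match per row, or NaN when nothing matches.
data["extract"] = data["word"].str.extract(pattern, expand=False)

# Drop the rows with no matching noun, as the question requested.
matched = data.dropna(subset=["extract"])
print(matched)
```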