I would like to create a large demographic table by each country.
Wondering if there is a way to create a table using 'by statement' and changing the first column name as per the country?
As there is an if condition in countries, so I was wondering to create the list of macro variables for all the country names but they can be between 15 to 25, would you let me know how this can be achieved?
Related
I would like to know whether my data modeling is working for Power BI.
The dataset I am using is training course for students and corporate. The original data has 3 tables separated by individual program. The purpose of my visualization is to analyze the 3 programs from all students in a single dashboard.
Here is the original data after being imported to Power BI:
Here is the data pre-processing:
Remove unneeded column
DPT table
Remove column – No, Date, Quarter
DTP table
Remove column – Count, Email, Date
LLD table
Remove column – Email, To calculate, Learning Hours
Rename column name & impute missing value with “Not given”
DPT table
Trainee = Name, Training Provider = Provider, Course name = Course, Focus area = Domain
DTP table
Participant Name = Name, Event/Training Name = Course, Training providers = Provider
Create new column and impute them with “No given” and put in same
position (for append tables later)
DTP table
Level
LLD table
Company, Provider, Level
Create a new column called Program and impute value as it program
name for each row.
Post cleaned table:
After appending the 3 tables and calling it Master:
. Duplicate the Master table to create Student, Provider and Program table. In each table remove irrelevant columns, remove duplicates and create unique ID.
Final data model:
The focus is the Program, Provider and Student tables. The rest of the tables will be deactivated the relationship when creating calculated columns and measures before I make any correction to the data model.
Is there any proper approach to build the data model?
From my data model in the last picture, does it mean that the Provider table is a fact table while the Student and Program tables are dimensions?
I agree with removing unneeded columns, renaming columns for a better look, and substituting 'Not Given' in place of NULLS (Caution here: Measures and Dimensions handle nulls differently, for dimensional values substituting is okay)
If modeling using PowerBI is a must, then the following strategy should happen:
The dimensions can be Students, Programs, Providers
A Factless Fact table (FactProgram or something like that). It will have dimensional keys to Students, Programs, Providers ( and additional measures that you can create or can be taken from Master )
Remove unnecessary columns from the dimensions, so that the Remove Duplicates will give you what you want. For instance, Student and Program currently have same columns coming from master (Company, Course, Domain, Level, Program, Provider). Make it clear which columns belong to which dimensions and optionally create new dimensions (maybe, DimCompany)
I have two reports: first report has a column with sponsor names (name: Sponsor) and the other report also has sponsor names but written differently (name: Companies). Example:
**Sponsor**
Apple
Target
Amazon
IBM
Samsung
**Sponsor (Other)**
Apple Inc
Target LLC
Amazon Marketplace
IBM Computers
Samsung Company
I have appended these two columns so that they are in the same report called Sponsor_All, column names the same as above. What I would like is to create a new column where it would pull in the names from the sponsor column and change the name of the Sponsor (Other) column based on a lookup table so that all of the names are labeled like the sponsor column. Hope thats makes sense.
It looks like what you need is conditional column of some sort.
First thing I would recommend, when you append the two tables, if column names are the same it will append values from both tables into one column, instead of having two columns with different name and a bunch of null values. After that you can either add Conditional Column and hard code the values (so it will be something like: if Sponsor starts with 'Apple' then 'Apple' etc.).
This is not the best approach though, cause you will have to maintain these conditions manually. Better if there's some sort of pattern that you can notice, e.g. in your example I see that to get from Sponsor (other) to Sponsor you just need to extract the first word. If this is the pattern you always have you can use Custom Column and use formula to extract only the first word.
Lastly, if you already have some sort of lookup table you can merge it (after appending together Sponsors and Sponsors(Other) into one column) and use fuzzy lookup option. In all honesty though I never used it and not sure how good it is. If it gives you good enough result you can "clean it up" in next step with custom or conditional column.
I am new to SAS and data analytics in general. So sorry if my question sound too dumb.
I have a dataset of brand medicine with three variables. Variable 1 contains the drug name, variable two contains whether that drug is BRANDED, Generic or Brand-Generic and variable 3 contains the total sale of that drug.
What I want is percent split the BRANDED, GENERIC AND BRANDED GENERIC drugs among total drug sale. The final output should look like
Branded : 35%
Generic : 25%
Branded-Generic : 40%
Any help with a sas code which would do that is greatly appreciated thank you.
So you want a % sale split! You can try using SQL (proc sql) to get your desired answer.
proc sql;
create table want as
select drug_type, sum(total_sale) as tot_sale
from have
group by drug_type;
create table want as
select *, tot_sale/sum(tot_sale) as percent_sale format=percent10.2
from want;
quit;
I created a table 'want' that will have total sale for each drug type. Using that table, I created a column that has the calculated sale percentage and formatted it to a percent (for easy view).
Of course, there are other ways of doing it, like using proc summary or proc freq or even a data step. But as a beginner, I guess starting out with SQL would be a good decision.
I am essentially trying to do that this guy is trying to do:
Excel drop-down list using vLookup
I've gone through the steps and since my data set has about 400 different drop down options I am hoping there is an easier way than naming ranges. I have a list of about 400 different account names. Each of these account names is tied to a household and identified by the household ID number. A household can have anywhere from 1-5 account names. I would like for my drop down menu to be able to identify a household ID number and then provide the drop down with the account names associated with it.
Example:
Where household Id number is the identifier.
I've gone here as well http://sites.madrocketscientist.com/jerrybeaucaires-excelassistant/data-validation/dynamic-indirect and am trying to find a way to save on some of the manual work.I am a total newbie so thank you in advance for any help you can provide.
Let's pretend you have three worksheets. The first worksheet, Sheet1, is the sheet that your users will be looking at and it has the drop down lists. We'll say that the cell containing the first drop down option is in B1 (the one with about 400 different options) and the cell containing the dependent drop down is in B2:
On your second sheet, which we'll call MasterList, is all of the data options and the corresponding data. Row 1 is a header row; Row 2 and down is the actual data:
On your third sheet, which we'll call DropDownLists, is where the magic happens. It first needs to have a list of all the unique data options. I've put that in column A. You can get the unique data options from your master list through whatever means you prefer (Advanced Filter, Pivot Table, formula, VBA, etc). That list of unique data options is the basis of your drop down list for Sheet1 cell B1. Then in cell B2 of DropDownLists and copied down as far as needed to guarantee it will grab all data associated with the selected Data Option, use this formula (adjust sheet names and ranges to suit your actual data):
=IF(OR(Sheet1!$B$1="",ROW(B1)>COUNTIF(MasterList!$A$2:$A$10000,Sheet1!$B$1)),"",INDEX(MasterList!$B$2:$B$10000,MATCH(1,INDEX((MasterList!$A$2:$A$10000=Sheet1!$B$1)*(COUNTIF(B$1:B1,MasterList!$B$2:$B$10000)=0),),0)))
This makes your DropDownLists sheet look like this:
Lastly, we need to make that list (data based on the selected Data Option) a dynamic named range. I named it listFilteredData and defined it with this formula:
=DropDownLists!$B$2:INDEX(DropDownLists!$B:$B,MAX(2,ROWS(DropDownLists!$B:$B)-COUNTBLANK(DropDownLists!$B:$B)))
Then for Sheet1 cell B2, use Data Validation and define the list with =listFilteredData and you will get results as shown in my example.
Practice with this.....it will get what you need.
Your data will have to sorted for this to work.
Sorted Data
Data Validation for Shows
Data Validation for Episodes
This formula in the data validation box
=OFFSET($A$1,MATCH($E$4,$A:$A,0)-1,1,COUNTIF($A:$A,$E$4),1)
Enter the array formula to find the Duration
Must be confirmed with Ctrl & Shift & Enter
=INDEX($C$4:$C$18,MATCH(1,(E4=$A$4:$A$18)*(F4=$B$4:$B$18),0))
I am trying to create a table where it only counts the attendees one one type of training (rows) if they attended another particular training (column) AFTER the first one. I think I need to recreate a countif function that compares the dates of the trainings, but not sure how to set this up so that it compares the dates of the row trainings and column trainings. Any ideas?
Edit 3/23
Alex, your solution would work if I had different variables for the dates of each type of training. Is there a way to construct this without having to create new variables for each type of training that I want to compare? Put another way, is there a way to refer to the rows and columns of the table in the formula that would compare the dates? So, something like "count if the start date of this column exceeds the start date of this row." (basically, is there something like the Excel index function in Tableau?)
It may help to see how my data is structured -- here is a scrubbed version: https://docs.google.com/spreadsheets/d/1YR1Wz-pfGHhBxDQDGYgmemLGoCK0cSvKOeE8w33ZI3s/edit?usp=sharing
The "table" tab shows the table that I'm trying to create in Tableau.
Define a calculated field for your condition, called say, trained_after, as:
training_b_date > training_a_date
trained_after will be true or false for each data row depending on whether the B training was dated later than the A training
If you want more precise control over the difference between the dates, use the date_diff function. Say date_diff("hour", training_a_date, training_b_date) > 24 to insist upon a short waiting period.
That field may be all you need. You can put trained_after on the filter shelf to filter only to see data rows meeting the condition. Or put it on another shelf to partition the data according to that condition. Or use your field to create other calculated fields.
Realize that if either of your date fields is null, then your calculated field will evaluate to null in that case. Aggregate functions like Sum(), Count() etc ignore null values.