Issue with left join in SAS EG7.1

I am new to SAS and I am facing the following issue. I have two data sets:
one set with retailer info, product, date and actual sales
the second one has the same retailer info, product, date and some causal variables that may impact the sales
When I try to merge the two using a left join (on retailer, date and product), I get all the info I need, but the actuals column comes back with empty cells. I don't get any errors when it runs.
Does anyone have any idea why this is happening?

After doing some more digging, I found out that the issue is not with the join but one step before it, where I am doing a delta load for future updates. The final step of the delta load is:
proc timeseries data=WORK.prep1 (where=(Actuals>0)) out=prep3;
by customer segment;
var Actuals / setmissing=0 accumulate=total;
id Date interval=week format=date9.;
run;
It gives me back the date shifted one day earlier, so the week of 12jul2021 comes back as 11jul2021. As a result, the two tables have different dates and the join finds no matching actuals. For the moment I have set this step aside and the join works, but I need a solution so that it works properly with the delta load included.
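A possible explanation, offered as a sketch rather than a confirmed diagnosis: the SAS WEEK interval starts weeks on Sunday by default, and 12jul2021 is a Monday, so aligning it to the start of its week gives Sunday 11jul2021. A shifted interval such as interval=week.2 starts weeks on Monday instead. The Python snippet below only illustrates the date arithmetic; the helper name week_start is made up for this example.

```python
from datetime import date, timedelta

def week_start(d: date, first_weekday: int) -> date:
    """Return the first day of the week that contains d.

    first_weekday follows Python's convention: 0 = Monday ... 6 = Sunday.
    """
    offset = (d.weekday() - first_weekday) % 7
    return d - timedelta(days=offset)

d = date(2021, 7, 12)                            # the date from the question, a Monday
sunday_aligned = week_start(d, first_weekday=6)  # weeks that start on Sunday
monday_aligned = week_start(d, first_weekday=0)  # weeks that start on Monday

print(sunday_aligned, monday_aligned)
```

If the other table carries Monday-based week dates, the Sunday-aligned output will never match it on date, which would be consistent with the empty actuals column after the left join.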

Related

Trying to make a date slicer which includes both date and time

I have a table of many (datetime, datapoint name, value) triplets from process probe sensors.
I want a slicer that lets me pick a range from time 1 on day 1 to time 2 on day 2.
So my approach has been to have three slicers: [Time Start], [Date Slicer] and [Time End]. I have the data table and two time tables in 30-minute intervals, so I have been able to create measures and display cards that show the start and stop datetime combination on the report.
I tried to combine all this into a new table using DAX that filters the data table with those measures, but have not had any success.
Does anyone know how to set up a slicer or set of slicers so I can filter my report from, for example, 2022-Jun-15 at 4:30 PM to 2022-Jun-18 at 10:00 AM?
I am just learning DAX and have spent a few hours on this with no success.
Thanks
That's probably not a good idea from a modelling perspective. Data modelling best practice suggests that your Date (where 1 row = 1 day) and Time (where 1 row = 1 second or 1 minute, as necessary) dimension tables should be separate.
I would therefore recommend that you set up your model so that these two dimensions are two separate tables. You should then be able to use two slicers, one from each table, to do what you need.
Note: this of course means that your fact tables will need to have separate columns (one for the date and one for the time) to join to your Date and Time dimension tables.
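To make the range from the question concrete, here is a minimal sketch in Python (not DAX) of filtering rows that keep the date and the time in separate columns. The sample rows and variable names are invented for illustration. The date and time are combined before comparing, because testing them independently would drop a reading such as 03:00 on an intermediate day.

```python
from datetime import date, time, datetime

# Hypothetical fact rows: (date, time, datapoint name, value)
rows = [
    (date(2022, 6, 15), time(16, 0), "probe_a", 1.0),
    (date(2022, 6, 15), time(16, 30), "probe_a", 2.0),
    (date(2022, 6, 17), time(3, 0), "probe_a", 3.0),
    (date(2022, 6, 18), time(10, 0), "probe_a", 4.0),
    (date(2022, 6, 18), time(10, 30), "probe_a", 5.0),
]

# Range from the question: 2022-Jun-15 4:30 PM to 2022-Jun-18 10:00 AM
start = datetime.combine(date(2022, 6, 15), time(16, 30))
end = datetime.combine(date(2022, 6, 18), time(10, 0))

# Keep rows whose combined date and time fall inside the range (inclusive)
selected = [r for r in rows if start <= datetime.combine(r[0], r[1]) <= end]
values = [r[3] for r in selected]
print(values)
```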

Create a pivot table in Power BI with dates instead of aggregate values?

I have a table of companies with descriptive data about where we are in the sales process with each company, and the date we entered that specific stage. As can be seen below, the stages are rows in a Process Step column.
My objective is to pivot this column so that each Process Step becomes a column with a date below it, as shown in Excel:
I tried to edit the query and pivot the column upon loading it, but no matter which column I use as the values column for the "aggregate value" option, it results in some form of error.
My advice would be not to pivot the table in the query, and instead to use measures to get the dates you want. The benefit of leaving the table unpivoted is that you can still perform all sorts of other analytics. For instance, a Sankey chart would be hard to do properly with a pivoted table.
To get the pivot table you are showing from Excel, use the matrix visual in Power BI: put Client code in Rows, Process Step in Columns, and Effective Date in Values.
If you need to perform calculations between stages, that is not too difficult either. You can create a measure that returns the date for one stage, another measure for another stage, and so on. For example:
Date uploaded = CALCULATE(MAX(Table[Effective Date]), FILTER(Table, Table[Process Step] = "Upload"))
Date exported = CALCULATE(MAX(Table[Effective Date]), FILTER(Table, Table[Process Step] = "Export"))
Time upload to export = DATEDIFF([Date uploaded], [Date exported], DAY)
These measures work in the context of a single client, assuming there is only one date for the upload step and that Process Step is not placed on rows or columns. If you need another scenario, a different approach may be required.
Please let me know if that solves your problem.
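For readers who want to check the logic outside Power BI, the following Python sketch mirrors what the measures compute: the latest effective date per client and step, and the day difference between two steps. The sample rows and the function name days_upload_to_export are assumptions made for this example.

```python
from datetime import date

# Hypothetical rows: (client code, process step, effective date)
rows = [
    ("C1", "Upload", date(2020, 1, 6)),
    ("C1", "Export", date(2020, 1, 20)),
    ("C2", "Upload", date(2020, 2, 3)),
]

# Latest effective date per (client, step), similar to MAX(...) in a matrix cell
latest = {}
for client, step, effective in rows:
    key = (client, step)
    if key not in latest or effective > latest[key]:
        latest[key] = effective

def days_upload_to_export(client):
    """Days between the Upload and Export dates, or None if either is missing."""
    uploaded = latest.get((client, "Upload"))
    exported = latest.get((client, "Export"))
    if uploaded is None or exported is None:
        return None
    return (exported - uploaded).days

print(days_upload_to_export("C1"), days_upload_to_export("C2"))
```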

Summation of rows based on Dropdown selection in Power BI

I have a dropdown in Power BI that contains different project names, such as Project One, Two and Three. I use the following formula to produce the forecast value:
Forecast = Chase * Target%
I have created one measure that calculates the forecast. The dataset contains weekly data for Chase and Target %. For example, for week 1 (Jan 01-Jan 08) Chase is 30 and Target % is 10, so the forecast for week 1 is 3 (30 * 10%).
When I select one project from the dropdown list, e.g. "Project One", the forecast value populates correctly. The same holds whenever only one project is selected.
The issue arises when I select multiple projects: the forecast then shows the maximum value instead of the sum of the values across all weeks of all projects.
Question: What exactly is causing the issue?
Now I understand your requirement from your comments. You can achieve this in two steps, as explained below.
Step-1: Create a custom column in your data source as below:
row_level_forecast = finetarget[chase]/100.00 * finetarget[target]
Step-2: Create the final measure as below:
forecast = sum(finetarget[row_level_forecast])
Now use the measure "forecast" in the report. This should give you the desired output.
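The arithmetic behind the two steps can be checked with a small Python sketch. The rows are invented sample data. It computes the forecast per row and then sums, and for comparison also aggregates first and multiplies afterwards, which in general gives a different number.

```python
# Hypothetical weekly rows: (project, week, chase, target percent)
rows = [
    ("Project One", 1, 30, 10),
    ("Project One", 2, 50, 20),
    ("Project Two", 1, 40, 50),
]

# Step 1: forecast per row; Step 2: sum over the selected rows
row_level = [chase * target / 100.0 for _, _, chase, target in rows]
forecast_sum = sum(row_level)

# For comparison: aggregate first, then multiply
total_chase = sum(chase for _, _, chase, _ in rows)
avg_target = sum(target for _, _, _, target in rows) / len(rows)
forecast_agg = total_chase * avg_target / 100.0

print(row_level, forecast_sum, forecast_agg)
```

With these sample rows the row-level sum is 33 while the aggregate-then-multiply figure is about 32, so the order of operations matters once more than one row is selected.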
ISSUE-2: From your comments
If I understand correctly, you are concerned about the values in the columns I marked red in the picture below.
If my understanding is correct, you want to fill the week-3 values with 80/70 for Project-1 and 100/90 for Project-2. If so, follow these steps.
Step-1: Go to edit mode by clicking the "Transform Data" option and select the table whose data you want to adjust.
Step-2: Sort your data first by project_name (ascending), then by week (ascending). The output will look like the image above.
Step-3: Select the column "chase" in the table and click the Fill >> Down option.
Step-4: Repeat step 3 for the column "target" as well.
The final output should be as below. Move back to the main report by clicking "Close and Apply". The data should now be as expected in your report.
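The sort-then-fill-down steps can be sketched in Python as follows. The week-3 targets (80/70 and 100/90) come from the answer above; the remaining sample values and the helper name fill_down are assumptions. Note that a plain fill down, as in this sketch, does not stop at project boundaries, so a blank in the first row of a project would inherit the previous project's value.

```python
# Hypothetical rows, already sorted by project then week;
# None marks the empty cells that need filling
rows = [
    {"project": "Project-1", "week": 1, "chase": 60, "target": 50},
    {"project": "Project-1", "week": 2, "chase": 80, "target": 70},
    {"project": "Project-1", "week": 3, "chase": None, "target": None},
    {"project": "Project-2", "week": 1, "chase": 100, "target": 90},
    {"project": "Project-2", "week": 3, "chase": None, "target": None},
]

def fill_down(rows, column):
    """Replace each empty cell with the closest non-empty value above it."""
    last = None
    for row in rows:
        if row[column] is None:
            row[column] = last
        else:
            last = row[column]

for col in ("chase", "target"):
    fill_down(rows, col)

print([(r["project"], r["week"], r["chase"], r["target"]) for r in rows])
```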
When you display the forecast, put it in a grid and add the project column, the week column (e.g. Week 1) and the forecast measure. When you select multiple projects, the grid will show each of them along with the calculated measure. If this does not work, there is something wrong with your measure, and you should add the measure's calculation script to your question.
The measure should be simple, something like:
Forecast = SUM(YourTable[Chase]) * AVERAGE(YourTable[Target%])

Power BI visualization of data with a Start and End date

This is an example of what I think I need to do.
I would like to ask for some modeling advice on a problem I cannot solve myself:
I am using Power BI to visualize the time that machinery is out of order.
The source is a register of equipment that is not functioning, with a start date and an end date (note that there is no end date if the machine has not been fixed yet).
I would like to show the time (hours, percentage, etc.) that the machinery is out of order, filtered for a specific period or date (e.g. a month).
So I have two date columns: "Start out of order" and "Back in order".
I do have a date table, which I would usually connect to all the date variables. However, since I am working with a start and an end date, this does not give the result I am looking for.
Any help is very much appreciated!
Link to my Power BI file:
https://wetransfer.com/downloads/83ca3850392967d0d42a5cc71f4352c420200213160932/eb7353
Kind regards,
Stijn
I am not sure how you would like to visualise your data, but this is what I managed to do:
Create a Daysbetween column with
Daysbetween = IF(ISBLANK(TF_Eventos[End out of order]); DATEDIFF(TF_Eventos[Start out of order]; TF_Eventos[TODAY]; HOUR); DATEDIFF(TF_Eventos[Start out of order]; TF_Eventos[End out of order]; HOUR))
This column holds the difference between the two dates in hours. When there is no end date yet, it measures up to today instead.
Then create a separate column with your date. In this case I copied the Start out of order date, since I thought you might want to be able to filter on the start dates. Then create a relationship between the newly created date column and your Start out of order date.
Doing so lets you create a visual with Daysbetween (expressed in hours here) and your start dates. Now add a slicer and you can filter on date.
Hope this helps
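The calculation in that column can be sketched in Python as below, using fractional hours and a fixed "now" so the example is repeatable. The second function goes one step beyond the answer and is only an assumption about what the question is after: it clips an outage to a reporting period such as a month. All names and sample dates are invented.

```python
from datetime import datetime

def outage_hours(start, end, now):
    """Hours out of order; an open outage (end is None) runs until now."""
    effective_end = end if end is not None else now
    return (effective_end - start).total_seconds() / 3600

def outage_hours_in_period(start, end, period_start, period_end, now):
    """Hours of the outage that fall between period_start and period_end."""
    effective_end = end if end is not None else now
    overlap_start = max(start, period_start)
    overlap_end = min(effective_end, period_end)
    return max((overlap_end - overlap_start).total_seconds() / 3600, 0.0)

now = datetime(2020, 2, 13, 12, 0)  # stand-in for today

closed = outage_hours(datetime(2020, 1, 30, 8, 0), datetime(2020, 2, 2, 8, 0), now)
still_open = outage_hours(datetime(2020, 2, 12, 12, 0), None, now)

# Part of the first outage that falls in February 2020
feb_part = outage_hours_in_period(
    datetime(2020, 1, 30, 8, 0), datetime(2020, 2, 2, 8, 0),
    datetime(2020, 2, 1), datetime(2020, 3, 1), now,
)
print(closed, still_open, feb_part)
```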

New SAS variable conditional on observations

(first time posting)
I have a data set where I need to create a new variable (in SAS), based on meeting a condition related to another variable. So, the data contains three variables from a survey: Site, IDnumb (person), and Date. There can be multiple responses from different people but at the same site (see person 1 and 3 from site A).
Site  IDnumb  Date
a     1       6/12
b     2       3/4
c     4       5/1
a     3       .
d     5       .
I want to create a new variable called Complete, but it can't contain duplicates. So, when I go to proc freq, I want site A to be counted once, using the 6/12 Date of the Completed Survey. So basically, if a site is represented twice and contains a Date in one, I want to only count that one and ignore the duplicate site without a date.
            N   %
Complete    3   75%
Last Month  1   25%
My question may be around the NODUP and NODUPKEY possibilities. If I do a Proc Sort (nodupkey) by Site and Date, would that eliminate obs "a 3 ."?
Any help would be greatly appreciated. Sorry for the jumbled "table", as this is my first post (hints on making that better are also welcomed).
You can do this a number of ways.
First off, you need a complete/not complete binary variable. If you're in the datastep anyway, might as well just do it all there.
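/* Within each site, put rows with a date ahead of rows with a missing date */
proc sort data=yourdata;
by site descending date;
run;
/* Keep the first row per site and flag whether it has a date */
data yourdata_want;
set yourdata;
by site descending date;
if first.site then do;
comp = ifn(date>0,1,0);
output;
end;
run;
/* Count sites by the complete flag */
proc freq data=yourdata_want;
tables comp;
run;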
If you used NODUPKEY, you'd first sort by SITE DESCENDING DATE, then sort again by SITE with NODUPKEY. That way the latest date is on top within each site and is the row that is kept. You could also format COMP to show the text labels you list rather than just 1/0.
You can also do it with a format on DATE, so you can skip the data step (still need the sort/sort nodupkey). Format all nonmissing values of DATE to "Complete" and missing value of date to "Last Month", then include the missing option in your proc freq.
Finally, you could do the table in SQL (though getting two rows like that is a bit harder, you have to UNION two queries together).
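As a cross-check of the expected counts, here is a small Python sketch using the rows from the question. It takes a slightly different route from the sort and first.site approach: it marks a site as complete if any of its rows has a date, then counts sites. The variable names are invented for this example.

```python
from collections import Counter

# Rows from the question: (site, idnumb, date); None stands for a missing date
rows = [
    ("a", 1, "6/12"),
    ("b", 2, "3/4"),
    ("c", 4, "5/1"),
    ("a", 3, None),
    ("d", 5, None),
]

# One entry per site: complete if any of its rows has a date
complete_by_site = {}
for site, _idnumb, survey_date in rows:
    has_date = survey_date is not None
    complete_by_site[site] = complete_by_site.get(site, False) or has_date

# Frequency table over sites, not over raw rows
counts = Counter(
    "Complete" if done else "Last Month" for done in complete_by_site.values()
)
print(counts)
```

Site a is counted once as complete even though one of its two rows has no date, which matches the 3 and 1 split in the table the question asks for.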