Background:
Have a monitoring script that runs 3 times a day and outputs a .csv file to a SharePoint folder. Each time the script runs, the new csv contains an update on the various processes run. I'm currently able to get all of the csv files back as a series of rows in the transformation.
Question:
Is there a way to limit the number of rows for each day to just the top 1 row, so that the dashboard being created shows the most up-to-date information for each particular day? I would like to do this at the Transform stage so I don't have to load any unnecessary data.
Example data in the transformation:
Filename   Extension   Date created          Keep in Transformation?
file9      .csv        29/04/2021 07:52:41   KEEP
file8      .csv        28/04/2021 16:52:14   KEEP
file7      .csv        28/04/2021 11:52:20   [redundant]
file6      .csv        28/04/2021 07:52:49   [redundant]
file5      .csv        27/04/2021 16:51:41   KEEP
file4      .csv        27/04/2021 11:52:21   [redundant]
file3      .csv        27/04/2021 07:52:03   [redundant]
file2      .csv        26/04/2021 16:52:43   KEEP
file1      .csv        26/04/2021 11:52:20   [redundant]
Feels weird to answer my own question, but thought I would post, just in case someone has the same question...
The steps to get the latest row for each day are:
1. Ensure that the dataset is ordered by the Date created column in descending order.
2. Duplicate the Date created column to perform transformations on. This will create a new column, typically called Date created - Copy.
3. Highlight the Date created - Copy column, then select Split Column by Delimiter. As it's a Date/Time column, split it by the Space delimiter. This will create 2 new columns, Date created - Copy.1 and Date created - Copy.2.
4. Highlight the new date column, Date created - Copy.1, then select Remove Rows - Remove Duplicates. At this point you should only see the latest row of data for each day.
5. Remove the 2 split columns to tidy up the dataset.
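If you want to prototype the same idea outside the Power Query UI, here is a minimal pandas sketch (not the Power Query method itself) that sorts newest-first, deduplicates on the date part, and drops the helper column; the sample rows mirror the table above:

import pandas as pd

# Sample rows mirroring the table above (day-first timestamps).
df = pd.DataFrame({
    "Filename": ["file9", "file8", "file7", "file6"],
    "Date created": ["29/04/2021 07:52:41", "28/04/2021 16:52:14",
                     "28/04/2021 11:52:20", "28/04/2021 07:52:49"],
})
df["Date created"] = pd.to_datetime(df["Date created"], dayfirst=True)

# Sort newest-first, keep the first row per calendar day, drop the helper.
df["day"] = df["Date created"].dt.date
latest_per_day = (df.sort_values("Date created", ascending=False)
                    .drop_duplicates(subset="day")
                    .drop(columns="day"))
print(latest_per_day)  # file9 and file8 remain; file7 and file6 are redundant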
I have the three macro calls below, which create three Excel reports with multiple sheets inside.
12 months:
%metrics(score=y, vintage=y, excel=excel1);
06 months:
%metrics(score=y, vintage=y, excel=excel2);
18 months:
%metrics(score=y, vintage=y, excel=excel3);
When I run this SAS code, it creates three Excel workbooks, each with multiple sheets, in the WinSCP location sas/output.
I want to know how to create a single Excel workbook that picks up particular sheets from each of the three workbooks, using new code placed below these three macro calls without disturbing them, and output it to the same WinSCP location.
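Not a SAS-native answer, but one way to prototype the stitching step is with pandas; a minimal sketch, where the file names, sheet names, and output path are all hypothetical stand-ins for your actual %metrics output:

import pandas as pd

# Hypothetical file and sheet names -- substitute the actual %metrics output.
picks = {
    "excel1.xlsx": ["summary"],
    "excel2.xlsx": ["vintage"],
    "excel3.xlsx": ["summary", "vintage"],
}

with pd.ExcelWriter("combined.xlsx") as writer:
    for path, sheet_names in picks.items():
        for name in sheet_names:
            df = pd.read_excel(path, sheet_name=name)
            # Prefix each sheet name with its source file so names stay
            # unique (Excel caps sheet names at 31 characters).
            new_name = f"{path.removesuffix('.xlsx')}_{name}"[:31]
            df.to_excel(writer, sheet_name=new_name, index=False)

Note this copies cell values only; any formatting produced by %metrics is lost.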
I have created the Hive table and started loading the data using the statement LOAD DATA INPATH '<hdfs path>' INTO TABLE <hive_table_name>.
When I tried to open the data, there were two problems:
1) At the end of the last column there are continuous NULLs appended that are not present in the file.
2) When I try to run count(*) from Hive, the map phase reaches 100% but reduce stays at 0% and it keeps executing; I never get a result.
An example of the CSV data is given below:
xxxxx,2xxxx 08:15:00.0,19 ,Wxxxxx 2 IST 2015,0,2015- 100.0,1A,gggg,null,null,null,null,null,null,null,null,null,RP,AAGhghjgS,DELVS3885,1ghhh63,Djhkj85,null,AGY,jkjk85,1122JK,55666,null,1,BjhkhkjDC,null,006hhgjgAGS,null,null,null,/DCS-SYNCUS,null,null,kljlkl,null,null,null,null,null,null,null,null,null,null,null,null,14jkjhj63,DELVS3885,T,null,1A,hgfd,IN,null,null,null,null,null,null,14300963,DELbhjhhjkhk,T,null,1A,DEL,IN,null,null,null,null,null,null,null,hgjhhjj,A,null,UK,ghj,IN,null,null,null,null,null,null,Wed Jan 20 13:36:28 IST 2016
Please help me on this.
I have a csv file that I read via pandas.DataFrame.from_csv.
I want to filter the rows that contain the 3rd Friday of a certain month (or of any month) in the date index or in one of the columns.
I played around with dateutil and pandas.datetools.WeekOfMonth(week=2,weekday=4)
but still couldn't figure it out.
An example of your csv file would help us answer your question. I'm assuming you are using timestamps, is that true?
You could try filtering by making a list of the timestamps (the start and end dates here are arbitrary, and will likely come from the first and last rows of your csv file):
import pandas as pd

start_date = '1/1/2010'
finish_date = '10/1/2010'
third_friday_timestamps = pd.date_range(start_date, finish_date, freq='WOM-3FRI')
This will give a DatetimeIndex of timestamps. Using this index, you can filter the dataframe that came from the csv file using strategies explained here. I hope this helps.
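For instance, one such filter, continuing from the snippet above and assuming the dataframe df (a hypothetical name for your data) has a DatetimeIndex:

# Keep only the rows whose (midnight-normalized) index date is a third Friday.
filtered = df[df.index.normalize().isin(third_friday_timestamps)]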
I am working on a WO Aging report and I need to figure out the following information:
Need to group by Trade
Need to group by Status
Need to know how many work orders are in each status that are:
open for 1 week
open for 2 weeks
open for 3 weeks
open for 4 weeks
open for 5 weeks or more
Is there a way to write an if...then...else statement that will allow me to count the number of records within each week?
Sure. If you want the weeks as columns, it's just something like
(case when <your date range comparison> then work_order else null end)
You'd add a new calculation like that for each bucket, and set the aggregation to be count. The trickiest part is probably going to be deriving your week buckets.
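If you want to prototype the bucketing outside the reporting tool, here is the same idea as a pandas sketch; every name here (columns, dates) is a hypothetical stand-in:

import pandas as pd

# Hypothetical work orders: trade, status, and the date each one was opened.
wo = pd.DataFrame({
    "Trade": ["HVAC", "HVAC", "Plumbing", "Plumbing"],
    "Status": ["Open", "Open", "Pending", "Pending"],
    "Opened": pd.to_datetime(["2021-04-26", "2021-04-12",
                              "2021-04-05", "2021-03-01"]),
})

today = pd.Timestamp("2021-04-29")

# Age in whole weeks (week 1 = days 0-6), capped so weeks 5+ share one bucket.
weeks = ((today - wo["Opened"]).dt.days // 7 + 1).clip(upper=5)
wo["Bucket"] = weeks.map(lambda w: f"{w} week(s)" if w < 5 else "5+ weeks")

# Count work orders per Trade/Status with the week buckets as columns --
# the pandas analogue of one count(case when ...) column per bucket.
report = wo.pivot_table(index=["Trade", "Status"], columns="Bucket",
                        aggfunc="size", fill_value=0)
print(report)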