I have data consisting of customers' route details along with their store scores.
Raw data with the overall ranking for all customers:
Dist_Code | Dist_Name | State      | Store_name | Route_code | Store_score | Rank
5371      | ABC       | Chicago    | CG         | 1200       | 5           | 1
2098      | HGT       | Kansas     | KK         | 6500       | 4.8         | 2
7680      | POE       | Arizona    | QW         | 3300       | 4.2         | 3
3476      | POE       | Arizona    | CV         | 3300       | 4           | 4
6272      | KUN       | Florida    | ANF        | 7800       | 3.9         | 5
3220      | ABC       | Chicago    | AF         | 1200       | 3.6         | 6
7266      | IOR       | California | LU         | 4500       | 3.2         | 7
3789      | POE       | Arizona    | TR         | 3300       | 3           | 8
9383      | KAR       | New York   | IO         | 5600       | 3           | 9
1583      | KUN       | Florida    | BOT        | 7800       | 2.8         | 10
8219      | ABC       | Chicago    | Bb         | 1200       | 2.5         | 11
3734      | ABC       | Chicago    | AA         | 1200       | 2           | 12
6900      | POE       | Arizona    | HAL        | 3300       | 1.8         | 13
8454      | KUN       | Florida    | UYO        | 7800       | 1.5         | 14
Filters:
Dist_Name  = ALL
State      = ALL
Route_code = ALL
This is the overall ranking for all customers with no filters selected. When I select a filter (Dist_Name, Route_code, Store_score), I want the rank recomputed according to that selection. For example:
Dist_Code | Dist_Name | State   | Store_name | Route_code | Store_score | Rank
7680      | POE       | Arizona | QW         | 3300       | 4.2         | 1
3476      | POE       | Arizona | CV         | 3300       | 4           | 2
3789      | POE       | Arizona | TR         | 3300       | 3           | 3
6900      | POE       | Arizona | HAL        | 3300       | 1.8         | 4
Filter:
Dist_Name  = POE
State      = Arizona
Route_code = 3300
The store score is based on a parameter I calculated in a model using Python.
Currently it is a string column in Power BI. I tried some DAX, but it was not successful.
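For what it's worth, a pattern that often handles this kind of filter-aware ranking is RANKX over ALLSELECTED, after converting the text score to a number. A minimal sketch only, assuming the table is named Stores and adding a hypothetical Score_Num column (neither name is from the original model):

-- Calculated column: convert the text score to a number (assumes "." as decimal separator)
Score_Num = VALUE ( Stores[Store_score] )

-- Measure: rank stores by score within whatever slicer/filter selection is active
Dynamic Rank =
RANKX (
    ALLSELECTED ( Stores ),
    CALCULATE ( MAX ( Stores[Score_Num] ) ),
    ,
    DESC,
    DENSE
)

ALLSELECTED keeps the outer slicer selection (e.g. Dist_Name = POE) while removing the visual's row context, so the rank restarts at 1 within the filtered set.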
Imagine a table with 3 columns:

Date       | AssetType | Value
2022-01-01 | A         | 1
2022-01-02 | A         | 1.02
2022-01-03 | A         | 1.05
2022-01-04 | A         | 1.09
2022-01-05 | A         | 1.06
2022-01-03 | B         | 1
2022-01-04 | B         | 1.05
2022-01-05 | B         | 1.07
2022-01-06 | B         | 1.09
2022-01-07 | B         | 1.08
The first date of 2022 for each asset is different:
Asset A: 2022-01-01
Asset B: 2022-01-03
I want to create a new column or measure that returns the first date of 2022 for both assets.
So far I've tried:
= CALCULATE ( STARTOFYEAR ( Table[Date] ), FILTER ( Table, Table[AssetType] = [Asset Type] ) )
Note: [Asset Type] is a measure that gives me the type of asset.
But it returns the same date for both assets (2022-01-01).
Does anyone know how to get this done?
Expected result:

Date       | AssetType | Value | FirstDate
2022-01-01 | A         | 1     | 2022-01-01
2022-01-02 | A         | 1.02  | 2022-01-01
2022-01-03 | A         | 1.05  | 2022-01-01
2022-01-04 | A         | 1.09  | 2022-01-01
2022-01-05 | A         | 1.06  | 2022-01-01
2022-01-03 | B         | 1     | 2022-01-03
2022-01-04 | B         | 1.05  | 2022-01-03
2022-01-05 | B         | 1.07  | 2022-01-03
2022-01-06 | B         | 1.09  | 2022-01-03
2022-01-07 | B         | 1.08  | 2022-01-03
Thx
OK. This time, create a calculated column and paste this code:
FirstDate =
CALCULATE (
MIN ( YourTable[Date] ),
ALLEXCEPT ( YourTable, YourTable[AssetType] )
)
The result is the FirstDate column shown in the expected output above.
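If a measure is preferred over a calculated column, the same ALLEXCEPT pattern should carry over (a sketch only, keeping the assumed YourTable name):

FirstDate Measure =
CALCULATE (
    MIN ( YourTable[Date] ),
    ALLEXCEPT ( YourTable, YourTable[AssetType] )
)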
I'm new to SAS Base and need help.
I have 2 tables with different data that I need to merge.
But in one step I need data from the next row.
Example of what I need:
ID Fdate Tdate NFdate NTdate
id1 date1 date1 date2 date2
id2 date2 date2 date3 date3
....
I did it with 2 merges:

data result;
    merge table1 table2; by ...;
    merge table1(firstobs=2) table2(firstobs=2); by ...;
run;
I expected 10 rows but got 9, because one-to-one reading stopped at the last row of the smallest table in the MERGE. How can I get the last row (i.e., drive the one-to-one reading by the biggest table)?
Most simple data steps stop not at the bottom of the step but in the middle, when they read past the end of the input. The reason you are getting N-1 observations is that the second input has one fewer record, so you need to do something to stop that.
One simple way is not to execute the second read when you are processing the last observation from the first one. You can use the END= option to create a boolean variable that lets you know when that happens.
Here is a simple example using SASHELP.CLASS.

data test;
    set sashelp.class end=eof;   /* EOF is 1 when reading the last observation */
    if not eof then
        set sashelp.class(firstobs=2 keep=name rename=(name=next_name));
    else call missing(next_name);
run;
Results:
next_
Obs Name Sex Age Height Weight name
1 Alfred M 14 69.0 112.5 Alice
2 Alice F 13 56.5 84.0 Barbara
3 Barbara F 13 65.3 98.0 Carol
4 Carol F 14 62.8 102.5 Henry
5 Henry M 14 63.5 102.5 James
6 James M 12 57.3 83.0 Jane
7 Jane F 12 59.8 84.5 Janet
8 Janet F 15 62.5 112.5 Jeffrey
9 Jeffrey M 13 62.5 84.0 John
10 John M 12 59.0 99.5 Joyce
11 Joyce F 11 51.3 50.5 Judy
12 Judy F 14 64.3 90.0 Louise
13 Louise F 12 56.3 77.0 Mary
14 Mary F 15 66.5 112.0 Philip
15 Philip M 16 72.0 150.0 Robert
16 Robert M 12 64.8 128.0 Ronald
17 Ronald M 15 67.0 133.0 Thomas
18 Thomas M 11 57.5 85.0 William
19 William M 15 66.5 112.0
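Applied to the original two tables, one way is to do the merge first and then the guarded look-ahead read as a second step. This is a sketch only: I'm assuming the key is ID and the date columns are Fdate/Tdate, taken from the expected output above.

/* step 1: the ordinary BY-group merge */
data merged;
    merge table1 table2;
    by id;
run;

/* step 2: read one row ahead for the "next" dates, skipping the read on the last row */
data result;
    set merged end=eof;
    if not eof then
        set merged(firstobs=2 keep=fdate tdate rename=(fdate=nfdate tdate=ntdate));
    else call missing(nfdate, ntdate);
run;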
I have a pandas timeseries with minute tick data:
2011-01-01 09:30:00 -0.358525
2011-01-01 09:31:00 -0.185970
2011-01-01 09:32:00 -0.357479
2011-01-01 09:33:00 -1.486157
2011-01-01 09:34:00 -1.101909
2011-01-01 09:35:00 -1.957380
2011-01-02 09:30:00 -0.489747
2011-01-02 09:31:00 -0.341163
2011-01-02 09:32:00 1.588071
2011-01-02 09:33:00 -0.146610
2011-01-02 09:34:00 -0.185834
2011-01-02 09:35:00 -0.872918
2011-01-03 09:30:00 0.682824
2011-01-03 09:31:00 -0.344875
2011-01-03 09:32:00 -0.641186
2011-01-03 09:33:00 -0.501414
2011-01-03 09:34:00 0.877347
2011-01-03 09:35:00 2.183530
What is the best way to stack it into a dataframe such as:
09:30:00 09:31:00 09:32:00 09:33:00 09:34:00 09:35:00
2011-01-01 -0.358525 -0.185970 -0.357479 -1.486157 -1.101909 -1.957380
2011-01-02 -0.489747 -0.341163 1.588071 -0.146610 -0.185834 -0.872918
2011-01-03 0.682824 -0.344875 -0.641186 -0.501414 0.877347 2.183530
I'd make sure that this is actually what you want to do, as the resulting DataFrame loses a lot of the nice time-series functionality that pandas has.
But here is some code that accomplishes it. First, a time column is added, and the index is set to just the date part of the DatetimeIndex. The pivot command then reshapes the data, setting the times as columns.
In [74]: df.head()
Out[74]:
value
date
2011-01-01 09:30:00 -0.358525
2011-01-01 09:31:00 -0.185970
2011-01-01 09:32:00 -0.357479
2011-01-01 09:33:00 -1.486157
2011-01-01 09:34:00 -1.101909
In [75]: df['time'] = df.index.time
In [76]: df.index = df.index.date
In [77]: df2 = df.pivot(index=df.index, columns='time')
The resulting DataFrame will have a MultiIndex for the columns (the top level being just the name of your values column). If you want to get back to a flat list of columns, the code below will flatten the column list.
In [78]: df2.columns = [c for (_, c) in df2.columns]
In [79]: df2
Out[79]:
09:30:00 09:31:00 09:32:00 09:33:00 09:34:00 09:35:00
2011-01-01 -0.358525 -0.185970 -0.357479 -1.486157 -1.101909 -1.957380
2011-01-02 -0.489747 -0.341163 1.588071 -0.146610 -0.185834 -0.872918
2011-01-03 0.682824 -0.344875 -0.641186 -0.501414 0.877347 2.183530
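As an alternative, on recent pandas versions the same reshape can be done without pivot by building a (date, time) MultiIndex and unstacking the time level. A self-contained sketch (the random values are only placeholders for the data above):

import numpy as np
import pandas as pd

# Minute ticks for three days, mirroring the shape of the question's series
idx = pd.DatetimeIndex(
    [f"2011-01-0{d} 09:3{m}:00" for d in (1, 2, 3) for m in range(6)]
)
ts = pd.Series(np.random.randn(len(idx)), index=idx)

# Re-index by (date, time), then move the time level into the columns
wide = pd.Series(ts.values, index=[idx.date, idx.time]).unstack()
print(wide)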
I'm trying to convert a time series index to seconds of the day, i.e. so that the seconds increase from 0 to 86399 as the day progresses. I can currently recover the time of day, but I'm having trouble converting it to seconds in a vectorized way:
df['timeofday'] = df.index.time
Any ideas? Thanks.
As @Jeff points out, my original answer misunderstood what you were doing. But the following should work, and it is vectorized. My answer relies on numpy datetime64 operations (subtract the beginning of the day from the current datetime64, then divide by a timedelta64 to get seconds):
>>> df
A
2011-01-01 00:00:00 -0.112448
2011-01-01 01:00:00 1.006958
2011-01-01 02:00:00 -0.056194
2011-01-01 03:00:00 0.777821
2011-01-01 04:00:00 -0.552584
2011-01-01 05:00:00 0.156198
2011-01-01 06:00:00 0.848857
2011-01-01 07:00:00 0.248990
2011-01-01 08:00:00 0.524785
2011-01-01 09:00:00 1.510011
2011-01-01 10:00:00 -0.332266
2011-01-01 11:00:00 -0.909849
2011-01-01 12:00:00 -1.275335
2011-01-01 13:00:00 1.361837
2011-01-01 14:00:00 1.924534
2011-01-01 15:00:00 0.618478
df['sec'] = (df.index.values
             - df.index.values.astype('datetime64[D]')) / np.timedelta64(1, 's')
A sec
2011-01-01 00:00:00 -0.112448 0
2011-01-01 01:00:00 1.006958 3600
2011-01-01 02:00:00 -0.056194 7200
2011-01-01 03:00:00 0.777821 10800
2011-01-01 04:00:00 -0.552584 14400
2011-01-01 05:00:00 0.156198 18000
2011-01-01 06:00:00 0.848857 21600
2011-01-01 07:00:00 0.248990 25200
2011-01-01 08:00:00 0.524785 28800
2011-01-01 09:00:00 1.510011 32400
2011-01-01 10:00:00 -0.332266 36000
2011-01-01 11:00:00 -0.909849 39600
2011-01-01 12:00:00 -1.275335 43200
2011-01-01 13:00:00 1.361837 46800
2011-01-01 14:00:00 1.924534 50400
2011-01-01 15:00:00 0.618478 54000
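A pure-pandas variant of the same subtraction, for comparison: DatetimeIndex.normalize() gives each timestamp's own midnight, and subtracting it yields a TimedeltaIndex whose total_seconds() is the vectorized answer. A sketch with hypothetical hourly data:

import numpy as np
import pandas as pd

idx = pd.date_range("2011-01-01", periods=16, freq="h")
df = pd.DataFrame({"A": np.random.randn(len(idx))}, index=idx)

# Subtract each timestamp's midnight, then convert the timedeltas to seconds
df["sec"] = (df.index - df.index.normalize()).total_seconds()
print(df.head())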
Maybe a bit overdone, but this would be my answer:

from numpy.random import randn
from pandas import date_range, Series, to_datetime

# Some test data
rng = date_range('1/1/2011 01:01:01', periods=3, freq='s')
df = Series(randn(len(rng)), index=rng).to_frame()

def sec_in_day(timestamp):
    date = timestamp.date()                        # the date, without the time
    elapsed_time = timestamp - to_datetime(date)   # time elapsed since midnight
    return elapsed_time.total_seconds()

Series(df.index).apply(sec_in_day)
I modified KarlD's answer for datetime with time zone:
d = pd.DataFrame({"t_naive":pd.date_range("20160101","20160102", freq = "2H")})
d['t_utc'] = d['t_naive'].dt.tz_localize("UTC")
d['t_ct'] = d['t_utc'].dt.tz_convert("America/Chicago")
print(d.head())
# t_naive t_utc t_ct
# 0 2016-01-01 00:00:00 2016-01-01 00:00:00+00:00 2015-12-31 18:00:00-06:00
# 1 2016-01-01 02:00:00 2016-01-01 02:00:00+00:00 2015-12-31 20:00:00-06:00
# 2 2016-01-01 04:00:00 2016-01-01 04:00:00+00:00 2015-12-31 22:00:00-06:00
# 3 2016-01-01 06:00:00 2016-01-01 06:00:00+00:00 2016-01-01 00:00:00-06:00
# 4 2016-01-01 08:00:00 2016-01-01 08:00:00+00:00 2016-01-01 02:00:00-06:00
The answer by KarlD gives seconds of day in UTC:
s0 = (d["t_naive"].values - d["t_naive"].values.astype('datetime64[D]'))/np.timedelta64(1,'s')
s0
# array([ 0., 7200., 14400., 21600., 28800., 36000., 43200.,
# 50400., 57600., 64800., 72000., 79200., 0.])
s1 = (d["t_ct"].values - d["t_ct"].values.astype('datetime64[D]'))/np.timedelta64(1,'s')
s1
# array([ 0., 7200., 14400., 21600., 28800., 36000., 43200.,
# 50400., 57600., 64800., 72000., 79200., 0.])
For seconds of day in local time, use:
s2 = (d["t_ct"].view("int64") - d["t_ct"].dt.normalize().view("int64"))//pd.Timedelta(1, unit='s')
#use d.index.normalize() for index
s2.values
# array([64800, 72000, 79200, 0, 7200, 14400, 21600, 28800, 36000,
# 43200, 50400, 57600, 64800], dtype=int64)
or,
s3 = d["t_ct"].dt.hour*60*60 + d["t_ct"].dt.minute*60+ d["t_ct"].dt.second
s3.values
# array([64800, 72000, 79200, 0, 7200, 14400, 21600, 28800, 36000,
# 43200, 50400, 57600, 64800], dtype=int64)
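On recent pandas versions, the same local-time result as s2/s3 can likely be obtained more directly by subtracting the normalized (local-midnight) timestamps, reusing the d frame from above:

s4 = (d["t_ct"] - d["t_ct"].dt.normalize()).dt.total_seconds()
# should match s2/s3 above, returned as floats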