I have a dataframe:
time year month
0 12/28/2013 0:17 2013 12
1 12/28/2013 0:20 2013 12
2 12/28/2013 0:26 2013 12
3 12/29/2013 0:20 2013 12
4 12/29/2013 0:26 2013 12
5 12/30/2013 0:31 2013 12
6 12/30/2013 0:31 2013 12
7 12/31/2013 0:17 2013 12
8 12/31/2013 0:20 2013 12
9 12/31/2013 0:26 2013 12
10 1/1/2014 4:30 2014 1
11 1/1/2014 4:34 2014 1
12 1/1/2014 4:37 2014 1
13 1/2/2014 4:30 2014 1
14 1/2/2014 5:30 2014 1
15 1/3/2014 4:30 2014 1
16 1/3/2014 4:34 2014 1
17 1/3/2014 4:37 2014 1
18 1/4/2014 4:30 2014 1
19 1/4/2014 4:34 2014 1
20 1/4/2014 4:37 2014 1
I use the following code to extract the week information:
df['week'] = df['time'].dt.week
This produces the following dataframe:
time year month week
0 2013-12-28 00:17:00 2013 12 52
1 2013-12-28 00:20:00 2013 12 52
2 2013-12-28 00:26:00 2013 12 52
3 2013-12-29 00:20:00 2013 12 52
4 2013-12-29 00:26:00 2013 12 52
5 2013-12-30 00:31:00 2013 12 1
6 2013-12-30 00:31:00 2013 12 1
7 2013-12-31 00:17:00 2013 12 1
8 2013-12-31 00:20:00 2013 12 1
9 2013-12-31 00:26:00 2013 12 1
10 2014-01-01 04:30:00 2014 1 1
11 2014-01-01 04:34:00 2014 1 1
12 2014-01-01 04:37:00 2014 1 1
13 2014-01-02 04:30:00 2014 1 1
14 2014-01-02 05:30:00 2014 1 1
15 2014-01-03 04:30:00 2014 1 1
16 2014-01-03 04:34:00 2014 1 1
17 2014-01-03 04:37:00 2014 1 1
18 2014-01-04 04:30:00 2014 1 1
19 2014-01-04 04:34:00 2014 1 1
20 2014-01-04 04:37:00 2014 1 1
I would like to create another column showing the year-week (e.g., 2013-52, 2014-1). The problem is that when I combine the two columns (year and week) for rows 5 through 9, the result is 2013-1, i.e. the first week of 2013, which is not correct. Is there any solution for this issue?
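Use dt.strftime (format code reference: http://strftime.org/). Note that with %W weeks start on Monday, and all days of a new year before its first Monday fall in week 00: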
df.time.dt.strftime('%Y-%W')
0 2013-51
1 2013-51
2 2013-51
3 2013-51
4 2013-51
5 2013-52
6 2013-52
7 2013-52
8 2013-52
9 2013-52
10 2014-00
11 2014-00
12 2014-00
13 2014-00
14 2014-00
15 2014-00
16 2014-00
17 2014-00
18 2014-00
19 2014-00
20 2014-00
Name: time, dtype: object
As @TrigonaMinima pointed out, dt.week follows ISO 8601, which defines the first week of the year as follows:
It is the first week with a majority (4 or more) of its days in January.
In your case, week 1 (Monday 2013-12-30 to Sunday 2014-01-05) has 2 days in December and 5 in January, so it fits the definition of the first week of 2014.
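If the goal is an ISO year-week label (which gives 2014-1 for rows 5 through 9), one option is to pair the ISO week with the ISO year rather than the calendar year. The sketch below assumes pandas 1.1 or later, where Series.dt.isocalendar() is available (dt.week was deprecated in 1.1 and removed in 2.0); the sample rows are a subset of the question's data.

```python
import pandas as pd

# A few timestamps from the question, spanning the 2013/2014 boundary.
df = pd.DataFrame({'time': ['12/28/2013 0:17', '12/29/2013 0:20',
                            '12/30/2013 0:31', '12/31/2013 0:26',
                            '1/1/2014 4:30', '1/4/2014 4:37']})
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y %H:%M')

# isocalendar() returns the ISO year, week and weekday as columns.
# Using the ISO *year* (not the calendar year) keeps each week in one year.
iso = df['time'].dt.isocalendar()
df['year_week'] = iso['year'].astype(str) + '-' + iso['week'].astype(str)
print(df)
```

If zero-padded weeks are preferred (2014-01), iso['week'].astype(str).str.zfill(2) can be used instead.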
I have a calendar table that I created with M (Power Query).
I'm relating it to a table of activities, which I grouped by week.
I've calculated the total value of the activities in that time period with a DAX measure (let's say 5000), and I need to plot a line that descends linearly from that value (5000) to 0 over the period.
I've managed to get close, but the line does not end at 0: it either overshoots or falls one period short.
Here is the current table and the expected table:
CURRENT TABLE
Year  Month  End of Week  Expected
2021  6      05/06/2021
2021  6      12/06/2021
2021  6      19/06/2021
2021  6      26/06/2021
2021  7      03/07/2021
2021  7      10/07/2021
2021  7      17/07/2021
2021  7      24/07/2021
2021  7      31/07/2021
2021  8      07/08/2021
2021  8      14/08/2021
2021  8      21/08/2021
2021  8      28/08/2021
2021  9      04/09/2021
2021  9      11/09/2021
2021  9      18/09/2021
2021  9      25/09/2021
2021  10     02/10/2021
2021  10     09/10/2021
2021  10     16/10/2021
2021  10     23/10/2021
2021  10     30/10/2021
2021  11     06/11/2021
2021  11     13/11/2021
2021  11     20/11/2021
2021  11     27/11/2021
2021  12     04/12/2021
2021  12     11/12/2021
2021  12     18/12/2021
2021  12     25/12/2021
2022  1      01/01/2022
2022  1      08/01/2022
2022  1      15/01/2022
2022  1      22/01/2022
2022  1      29/01/2022
2022  2      05/02/2022
EXPECTED TABLE
Year  Month  End of Week  Expected
2021  6      05/06/2021   5000
2021  6      12/06/2021   4857,143
2021  6      19/06/2021   4714,286
2021  6      26/06/2021   4571,429
2021  7      03/07/2021   4428,571
2021  7      10/07/2021   4285,714
2021  7      17/07/2021   4142,857
2021  7      24/07/2021   4000
2021  7      31/07/2021   3857,143
2021  8      07/08/2021   3714,286
2021  8      14/08/2021   3571,429
2021  8      21/08/2021   3428,571
2021  8      28/08/2021   3285,714
2021  9      04/09/2021   3142,857
2021  9      11/09/2021   3000
2021  9      18/09/2021   2857,143
2021  9      25/09/2021   2714,286
2021  10     02/10/2021   2571,429
2021  10     09/10/2021   2428,571
2021  10     16/10/2021   2285,714
2021  10     23/10/2021   2142,857
2021  10     30/10/2021   2000
2021  11     06/11/2021   1857,143
2021  11     13/11/2021   1714,286
2021  11     20/11/2021   1571,429
2021  11     27/11/2021   1428,571
2021  12     04/12/2021   1285,714
2021  12     11/12/2021   1142,857
2021  12     18/12/2021   1000
2021  12     25/12/2021   857,1429
2022  1      01/01/2022   714,2857
2022  1      08/01/2022   571,4286
2022  1      15/01/2022   428,5714
2022  1      22/01/2022   285,7143
2022  1      29/01/2022   142,8571
2022  2      05/02/2022   0
I can hide the decimal places in the chart, but I don't want to round the underlying values, so that the line descends in a straight line.
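One common cause of this kind of off-by-one is dividing the total by the number of weeks instead of by the number of intervals between them. The expected table has 36 week-ends, so the line needs 35 equal steps of 5000/35 (about 142.857). The Python sketch below only illustrates that arithmetic; it is not DAX, linear_burndown is a hypothetical helper name, and it assumes a 0-based week index i over n weeks, giving value = total * (n - 1 - i) / (n - 1). Expressing the same formula in the DAX measure is not shown here.

```python
def linear_burndown(total, n_periods):
    """Values falling linearly from `total` (first period) to 0 (last period)."""
    # n_periods points span n_periods - 1 intervals. Dividing by n_periods
    # instead would leave the last point one step above 0.
    last = n_periods - 1
    return [total * (last - i) / last for i in range(n_periods)]


# 36 week-ends from 05/06/2021 to 05/02/2022, as in the expected table.
values = linear_burndown(5000, 36)
print(values[0], round(values[1], 3), values[-1])
```

Multiplying before dividing keeps the first and last points exact (5000.0 and 0.0), so no rounding is needed for the line to land on 0.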
I'm working with panel data in Stata, and I have a setup like the following:
ID  year  value
1   2010
1   2011  20
1   2012  20
1   2013
1   2014
2   2010
2   2011  14
2   2012  14
2   2013  14
2   2014  14
and I want to fill the blank entries with the value that the other entries have within the same ID, for every year. That is, I want something like the following:
ID  year  value
1   2010  20
1   2011  20
1   2012  20
1   2013  20
1   2014  20
2   2010  14
2   2011  14
2   2012  14
2   2013  14
2   2014  14
What do you recommend?
If the non-missing values of the variable value are always the same within id, you can use this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id int year byte value
1 2010 .
1 2011 20
1 2012 20
1 2013 .
1 2014 .
2 2010 .
2 2011 14
2 2012 14
2 2013 14
2 2014 14
end
*Get mean of values within id
bysort id : egen value2 = mean(value)
*Transfer values back to original var to maintain var labels etc. then drop value2
replace value = value2
drop value2
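For comparison, a rough pandas analogue of the same idea (not part of the Stata answer) is sketched below. Like the egen approach, it assumes that the non-missing values are constant within each id, and it checks that assumption before filling.

```python
import pandas as pd

# Example data from the dataex listing; None is stored as NaN (missing).
df = pd.DataFrame({'id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'year':  [2010, 2011, 2012, 2013, 2014] * 2,
                   'value': [None, 20, 20, None, None, None, 14, 14, 14, 14]})

# Check the assumption: at most one distinct non-missing value per id
# (nunique() ignores NaN by default).
assert (df.groupby('id')['value'].nunique() <= 1).all()

# Like egen mean(), the group mean skips missing values, so every row of an
# id receives that id's constant value.
df['value'] = df.groupby('id')['value'].transform('mean')
print(df)
```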
I have an employer-employee database and need to keep only the individuals who have at least one colleague, i.e. another Id with the same Firm_id in the same Year, but I don't know how to do this in Stata. My dataset looks like:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
In the case above, I would keep Ids 1 and 2 in both years, because they are in the same firm in both years of the sample, and Ids 3 and 4 in 2010, when both were in firm 22. Individual 3 in 2011 and individual 4 in 2011 would be dropped.
The output I'm looking for is like:
Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
4 22 2010
This works for your data example:
clear
input Id Firm_id Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
end
* sort by Id within each firm-year, so Id[1] != Id[_N] means at least two distinct Ids
bysort Year Firm_id (Id) : keep if Id[1] != Id[_N]
sort Id Year
list
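The Stata line keeps a firm-year group only when it contains at least two different Ids. As a sketch of the same rule in pandas (an analogue, not part of the original answer), one can count distinct Ids per Year and Firm_id and keep the rows where that count exceeds 1:

```python
import pandas as pd

df = pd.DataFrame({'Id':      [1, 1, 2, 2, 3, 3, 4, 4],
                   'Firm_id': [50, 50, 50, 50, 22, 22, 22, 20],
                   'Year':    [2010, 2011] * 4})

# Number of distinct individuals in each firm-year; a count above 1 means
# the individual has at least one colleague in that year.
n_ids = df.groupby(['Year', 'Firm_id'])['Id'].transform('nunique')
kept = df[n_ids > 1].sort_values(['Id', 'Year'])
print(kept)
```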
I am trying to replicate the data shown in the "When your fans are online" section of a business page's Insights dashboard. I am using the following parameters in the /insights/page_fans_online API call, which returns the data I am after:
parameters={'period':'day','since':'2018-10-20T07:00:00','until':'2018-10-21T07:00:00','access_token':page_token['access_token'][0]}
The data returned can be seen below, where:
end_time = end_time (based on the since & until dates in the parameters)
name = metric
apiHours = hour of day returned
localDate = localized date (applied manually)
localHours = apiHours with a -6 hour offset applied to localize to Auckland/New Zealand (applied manually to replicate what is seen on the Insights dashboard)
fansOnline = number of unique page fans online during that hour
Data:
end_time name apiHours localDate localHours fansOnline
2018-10-21T07:00:00+0000 page_fans_online 0 2018-10-19 18 21
1 2018-10-19 19 29
2 2018-10-19 20 20
3 2018-10-19 21 18
4 2018-10-19 22 20
5 2018-10-19 23 15
6 2018-10-19 0 4
7 2018-10-19 1 6
8 2018-10-19 2 5
9 2018-10-19 3 8
10 2018-10-19 4 17
11 2018-10-19 5 19
12 2018-10-19 6 26
13 2018-10-19 7 24
14 2018-10-19 8 20
15 2018-10-19 9 22
16 2018-10-19 10 19
17 2018-10-19 11 22
18 2018-10-19 12 18
19 2018-10-19 13 18
20 2018-10-19 14 18
21 2018-10-19 15 18
22 2018-10-19 16 21
23 2018-10-19 17 28
It took a while to work out that the data returned when pulling page_fans_online with the parameters above is for Friday, October 19th, for a New Zealand business page.
If we look at the last row in the data above:
end_time = 2018-10-21
apiHours = 23
localDate = 2018-10-19
localHours = 17
fansOnline = 28
It is saying that on 2018-10-21 at 11 pm there were 28 unique fans online. When the dates and times are manually localized, this translates to 28 unique fans online on 2018-10-19 at 5 pm (I worked the offset out by checking the "When your fans are online" graphs on the page insights).
There is a -54 hour offset between 2018-10-21 11:00 pm and 2018-10-19 5:00 pm. My question is: what is the logic behind the end_time and hour of day returned by the page_fans_online insights metric, and is there any information on how this should be localized depending on the country the business is located in?
There is only a brief description of page_fans_online in the page/insights docs; it says the hours are in PST/PDT, but that does not help with localizing the date and hour of day:
https://developers.facebook.com/docs/graph-api/reference/v3.1/insights
Suppose I have a very simple dataframe:
>>> a
Out[158]:
monthE yearE dayE
0 10 2014 15
1 2 2012 15
2 2 2014 15
3 12 2015 15
4 2 2012 15
Suppose that I want to create a column with the date for every row, built from the three integer columns.
With plain numbers it is enough to do:
>>> datetime.date(1983,11,8)
Out[159]: datetime.date(1983, 11, 8)
However, when I try to create a column of dates (in theory a very basic request):
a.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']))
I obtain the following error:
KeyError: ('yearE', u'occurred at index monthE')
I think you can first remove the trailing E from the column names and then use to_datetime, but then you get pandas Timestamps, not Python dates:
df.columns = df.columns.str[:-1]
df['date'] = pd.to_datetime(df)
#if multiple columns filter by subset
#df['date'] = pd.to_datetime(df[['year','month','day']])
print (df)
month year day date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
datetime64[ns]
print (df.date.iloc[0])
2014-10-15 00:00:00
print (type(df.date.iloc[0]))
<class 'pandas.tslib.Timestamp'>
Thanks to MaxU for this solution, which keeps the original column names:
df['date'] = pd.to_datetime(df.rename(columns = lambda x: x[:-1]))
#if another columns in df
#df['date'] = pd.to_datetime(df[['yearE','monthE','dayE']].rename(columns=lambda x: x[:-1]))
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
But if you really need Python dates, add axis=1 to apply; the resulting column has object dtype, so pandas datetime functionality such as the .dt accessor cannot be used on it:
df['date'] =df.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']), axis=1)
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
object
print (df.date.iloc[0])
2014-10-15
print (type(df.date.iloc[0]))
<class 'datetime.date'>
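A middle ground, sketched below under the assumption of a reasonably recent pandas, is to build the dates vectorized with to_datetime and convert them with .dt.date only if plain datetime.date objects are really required. The result is the same object column as the apply version, without calling datetime.date row by row through apply.

```python
import pandas as pd

a = pd.DataFrame({'monthE': [10, 2, 2, 12, 2],
                  'yearE':  [2014, 2012, 2014, 2015, 2012],
                  'dayE':   [15, 15, 15, 15, 15]})

# to_datetime needs columns named year/month/day, so strip the trailing E
# on a renamed copy; the original column names in `a` stay unchanged.
ts = pd.to_datetime(a[['yearE', 'monthE', 'dayE']]
                    .rename(columns=lambda c: c[:-1]))

# Convert the Timestamps to datetime.date objects (object dtype column).
a['date'] = ts.dt.date
print(a)
```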