I fetch my documents from MongoDB as below:
{
    "amount": 1200,
    "date_closed": "2012-07-02 17:00:00"
},
{
    "amount": 0,
    "date_closed": "2012-08-03 16:00:00"
},
{
    "amount": 0,
    "date_closed": "2012-08-04 20:00:00"
},
{
    "amount": 0,
    "date_closed": "2012-08-04 22:00:00"
}
I get a timestamp such as 1343287040 from the user (a parameter called user_time), which corresponds to datetime.datetime(2012, 7, 26, 11, 47, 20).
This is my solution to fill the gaps. First I build a date string in the format YYYY-mm-dd 00:00:00 with the code below:
hourly_date = str(datetime.datetime.fromtimestamp(user_time).year) + '-' + str(datetime.datetime.fromtimestamp(user_time).month) + '-' + str(datetime.datetime.fromtimestamp(user_time).day) + ' 00:00:00'
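The chained fromtimestamp calls above can be collapsed into a single strftime call (a sketch using the same user_time value; note that fromtimestamp uses the local timezone, so the date may shift depending on where this runs, and strftime zero-pads the month and day, which pandas.date_range accepts either way):

```python
import datetime

user_time = 1343287040  # the user-supplied timestamp from above
# One fromtimestamp call plus strftime replaces the string concatenation
hourly_date = datetime.datetime.fromtimestamp(user_time).strftime('%Y-%m-%d 00:00:00')
```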
user_time is the start date. Now I generate hourly records from user_time until today. The code below generates a date range in hours with the format I want:
date_range = pandas.date_range(start=hourly_date, end=datetime.datetime.today(), freq='H')
date_range = date_range.values.astype('<M8[h]').astype(str)
hourly_date = []
for i_hourly in date_range:
    tmp_date = pandas.to_datetime(str(i_hourly)).strftime('%Y-%m-%d %H:00:00')
    hourly_date.append(tmp_date)
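The per-element loop can also be replaced by DatetimeIndex.strftime, which formats the whole range in one vectorised call (a sketch with a short synthetic range):

```python
import pandas as pd

# Format an hourly range in one call instead of a Python loop
rng = pd.date_range(start='2012-07-26 00:00:00', periods=3, freq='H')
hourly_date = rng.strftime('%Y-%m-%d %H:00:00').tolist()
```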
After creating the hourly template range from user_time until today, I compare it with the date_closed field returned from MongoDB:
records_len = len(records)
for i_hourly in hourly_date:
    i = 0
    for record in records:
        i += 1
        if i_hourly in record['date_closed']:
            break  # break from innermost loop
        elif records_len == i and i_hourly not in record['date_closed']:
            records.append({"amount": 0, "date_closed": i_hourly})
records contains many documents, let's say from 2012 until today. The problem I want to solve is to check whether a given date and hour is missing from the returned documents. If it is missing, it should be added to records to fill the gap; otherwise I break from the innermost loop.
This code takes about 57 seconds, which is a huge amount of time. Is there a better, more efficient way to fill date gaps by hour?
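Before reaching for pandas, the quadratic scan itself can be avoided: putting the existing date_closed values into a set makes each hourly lookup O(1) instead of a pass over all records. A sketch with toy data:

```python
records = [
    {"amount": 1200, "date_closed": "2012-07-02 17:00:00"},
    {"amount": 0, "date_closed": "2012-07-02 19:00:00"},
]
hourly_date = ["2012-07-02 17:00:00", "2012-07-02 18:00:00", "2012-07-02 19:00:00"]

existing = {r["date_closed"] for r in records}  # O(1) membership tests
for i_hourly in hourly_date:
    if i_hourly not in existing:
        records.append({"amount": 0, "date_closed": i_hourly})
```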
EDIT:
amount date_closed
0 21800 2015-07-21 10:00:00
1 5450 2015-07-05 04:00:00
2 571160 2015-06-22 12:00:00
3 65400 2015-06-15 12:00:00
4 10900 2015-06-15 09:00:00
5 109000 2015-06-14 07:00:00
6 109000 2015-06-14 04:00:00
7 1193550 2015-06-11 06:00:00
8 10900 2015-06-11 05:00:00
9 21800 2015-06-09 10:00:00
10 10900 2015-05-31 05:00:00
11 0 2015-05-30 09:00:00
12 114450 2015-05-19 13:00:00
13 261600 2015-05-19 08:00:00
14 108000 2015-05-11 08:00:00
15 2180 2015-05-11 07:00:00
16 344870 2015-05-05 13:00:00
17 70850 2015-05-05 12:00:00
18 5450 2015-05-05 05:00:00
19 109000 2015-05-03 12:00:00
20 327000 2015-05-03 11:00:00
21 310650 2015-04-30 05:00:00
22 38150 2015-04-28 13:00:00
23 26160 2015-04-27 07:00:00
24 109000 2015-04-22 12:00:00
25 97200 2015-03-09 08:00:00
26 21800 2015-07-11 05:00:00
27 26160 2015-05-20 05:00:00
28 37800 2015-03-03 07:00:00
29 130800 2015-06-29 06:00:00
.. ... ...
161 2180 2015-05-25 09:00:00
162 26160 2015-05-09 11:00:00
163 108000 2015-03-03 11:00:00
164 3337200 2014-09-13 05:00:00
165 5249880 2014-09-10 05:00:00
166 712800 2014-08-10 09:00:00
167 151200 2015-02-23 06:00:00
168 48600 2014-08-10 11:00:00
169 6540 2015-04-19 10:00:00
170 172800 2014-09-01 09:00:00
171 1370520 2014-10-15 09:00:00
172 421200 2014-07-26 09:00:00
173 86400 2015-03-01 12:00:00
174 118800 2015-02-21 12:00:00
175 97200 2014-09-17 07:00:00
176 54500 2015-04-23 07:00:00
177 1185840 2014-09-09 06:00:00
178 119016 2015-02-18 09:00:00
179 32400 2014-11-05 08:00:00
180 345600 2014-08-09 10:00:00
181 151200 2015-02-18 12:00:00
182 168480 2014-10-09 06:00:00
183 5668920 2014-10-04 21:00:00
184 669600 2014-08-06 12:00:00
185 194400 2014-08-02 07:00:00
186 313920 2015-06-23 08:00:00
187 6540 2015-05-04 09:00:00
188 669600 2014-07-23 10:00:00
189 64800 2015-01-22 06:00:00
190 669600 2014-08-25 04:00:00
[191 rows x 2 columns]
It shows that I have only 191 records returned from Mongo! What I wanted is the hourly generated list, which is around 121,000 records, of which 191 would be filled by the code above. The problem, I suppose, is that these two lists are not merged together.
You can first set the date_closed column as the index and then .reindex according to hourly_date_rng to populate the missing records.
Here is an example.
import json
import pandas as pd
json_data = [
    {
        "amount": 0,
        "date_closed": "2012-08-04 16:00:00"
    },
    {
        "amount": 0,
        "date_closed": "2012-08-04 20:00:00"
    },
    {
        "amount": 0,
        "date_closed": "2012-08-04 22:00:00"
    }
]
df = pd.read_json(json.dumps(json_data), orient='records')
df
amount date_closed
0 0 2012-08-04 16:00:00
1 0 2012-08-04 20:00:00
2 0 2012-08-04 22:00:00
The hourly_date_rng looks like this
hourly_date_rng = pd.date_range(start='2012-08-04 12:00:00', end='2012-08-04 23:00:00', freq='H')
hourly_date_rng.name = 'date_closed'
hourly_date_rng
DatetimeIndex(['2012-08-04 12:00:00', '2012-08-04 13:00:00',
'2012-08-04 14:00:00', '2012-08-04 15:00:00',
'2012-08-04 16:00:00', '2012-08-04 17:00:00',
'2012-08-04 18:00:00', '2012-08-04 19:00:00',
'2012-08-04 20:00:00', '2012-08-04 21:00:00',
'2012-08-04 22:00:00', '2012-08-04 23:00:00'],
dtype='datetime64[ns]', name='date_closed', freq='H', tz=None)
To align the index and fill the gaps
# make the column datetime object instead of string
df['date_closed'] = pd.to_datetime(df['date_closed'])
# align the index using .reindex
df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
date_closed amount
0 2012-08-04 12:00:00 0
1 2012-08-04 13:00:00 0
2 2012-08-04 14:00:00 0
3 2012-08-04 15:00:00 0
4 2012-08-04 16:00:00 0
5 2012-08-04 17:00:00 0
6 2012-08-04 18:00:00 0
7 2012-08-04 19:00:00 0
8 2012-08-04 20:00:00 0
9 2012-08-04 21:00:00 0
10 2012-08-04 22:00:00 0
11 2012-08-04 23:00:00 0
Edit:
To convert the result back to JSON:
result = df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
# maybe convert date_closed column to string first
result['date_closed'] = result['date_closed'].dt.strftime('%Y-%m-%d %H:%M:%S')
# to json function
json_result = result.to_json(orient='records')
# print out the data with pretty print
from pprint import pprint
pprint(json.loads(json_result))
[{'amount': 0.0, 'date_closed': '2012-08-04 12:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 13:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 14:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 15:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 16:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 17:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 18:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 19:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 20:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 21:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 22:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 23:00:00'}]
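If the frame is already indexed by date_closed, .asfreq('H') is a related shortcut: it inserts every missing hour between the first and last existing timestamps (unlike reindex, it cannot extend the range beyond them). A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'amount': [5, 7]},
                  index=pd.to_datetime(['2012-08-04 12:00:00',
                                        '2012-08-04 15:00:00']))
# Insert the missing 13:00 and 14:00 rows with amount 0
filled = df.asfreq('H', fill_value=0)
```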
I have the following table in Power BI:
FullDate Visits Orders ConversionRate
01/02/2020 00:00:00 100 20 0,20
01/02/2020 01:00:00 550 78 0,14
01/02/2020 02:00:00 652 60 0,09
01/02/2020 03:00:00 0 0 0,00
01/02/2020 04:00:00 0 0 0,00
01/02/2020 05:00:00 0 0 0,00
ConversionRate is a measure:
ConversionRate = DIVIDE(SUM(Table[Orders]),SUM(Table[Visits]))
I am trying to display, in a card, the latest non-blank value of ConversionRate based on the date.
I tried this, but it returned blank in all the fields:
LastValue =
CALCULATE(
DIVIDE(SUM([Orders]),SUM([Visits])),
FILTER(Table, Table[FullDate] = MAX( Table[FullDate] ) && Table[Visits] <> 0 )
)
This returned the same value as ConversionRate for each row.
FullDate Visits Orders ConversionRate LastValue
01/02/2020 00:00:00 100 20 0,20 0,20
01/02/2020 01:00:00 550 78 0,14 0,14
01/02/2020 02:00:00 652 60 0,09 0,09
01/02/2020 03:00:00 0 0 0,00 0,00
01/02/2020 04:00:00 0 0 0,00 0,00
01/02/2020 05:00:00 0 0 0,00 0,00
What I want is :
FullDate Visits Orders ConversionRate LastValue
01/02/2020 00:00:00 100 20 0,20 0,09
01/02/2020 01:00:00 550 78 0,14 0,09
01/02/2020 02:00:00 652 60 0,09 0,09
01/02/2020 03:00:00 0 0 0,00 0,00
01/02/2020 04:00:00 0 0 0,00 0,00
01/02/2020 05:00:00 0 0 0,00 0,00
I am sure I am missing something, but I am new to DAX. Any help would be appreciated.
You want to capture your max date before you begin the iteration over 'Table' in the FILTER function. Calculating the max date and assigning it to a variable gets this done.
Then you use the variable in the FILTER function and get the proper result. An unwanted side effect is that the 0.09 is calculated in every row, so you have to check the Visits column a second time and return a blank if its value is zero.
LastConversion =
VAR maxDate =
CALCULATE (
MAX ( 'Table'[FullDate] ),
FILTER ( ALL ( 'Table' ), 'Table'[Visits] <> 0 )
)
RETURN
IF (
SUM ( 'Table'[Visits] ) = 0,
BLANK (),
CALCULATE (
DIVIDE ( SUM ( [Orders] ), SUM ( [Visits] ) ),
FILTER ( ALL('Table'), 'Table'[FullDate] = maxDate )
)
)
I have been trying to extract the atom numbers corresponding to the atom name "OW" from a file, using an if condition in Fortran. But when I use the if condition, no values are written to the output file. Could anybody help me find where I am going wrong?
implicit none
character(len=100)::head,grofile
character(len=5):: res_nm,at_name
integer :: n,i,ierror,at_num
write(*,*) 'enter the name of gro file'
read(*,*) grofile
open(unit=10,file=grofile,status='old',action='read')
openif : if (ierror == 0) then
   !open was ok. Read values.
   read(10,*)head
   read(10,*)n
   do i=1,n
      read(10,200) at_name,at_num
      if (at_name == 'OW') then
         write(44,*)at_num
200      format (10x,a5,i5)
      endif
   enddo
endif openif
end program name
and the input file that I am using is
CNT in water
44316
1LIG C 1 2.814 2.448 2.231 -0.2002 0.0645 -0.2005
1LIG C 2 2.783 2.584 2.233 0.4146 0.2083 -0.1403
1LIG C 3 2.769 2.658 2.350 -0.4678 -0.0886 -0.0500
1LIG C 4 2.687 2.772 2.348 -0.7671 -0.3032 -0.0624
1LIG C 5 2.619 2.795 2.228 -0.2327 -0.2483 -0.3593
1LIG C 6 2.486 2.837 2.238 -0.0621 0.2349 -0.0781
................
1LIG H 1006 2.613 1.972 12.082 -1.2767 0.0570 0.2045
1LIG H 1007 2.804 2.173 12.099 -0.4228 1.8734 1.9762
1LIG H 1008 2.862 2.377 12.097 -0.7176 -2.2587 1.0804
2water OW 1009 2.221 1.281 6.853 -0.6831 -0.3395 0.1402
2water HW1 1010 2.191 1.215 6.789 -1.2195 0.6304 -0.6225
2water HW2 1011 2.143 1.333 6.871 -0.5687 -0.7024 1.7263
2water MW 1012 2.206 1.279 6.847 -0.7389 -0.2594 0.2489
3water OW 1013 2.826 4.482 12.736 -0.2852 0.1750 0.1277
3water HW1 1014 2.735 4.490 12.707 -0.3265 -0.3844 0.1046
3water HW2 1015 2.860 4.406 12.689 0.4937 0.9762 -0.6120
3water MW 1016 2.818 4.473 12.726 -0.1879 0.2065 0.0267
4water OW 1017 3.510 2.042 10.165 0.1154 -0.0258 -0.0813
4water HW1 1018 3.530 2.105 10.095 3.0124 -0.2562 0.4945
4water HW2 1019 3.434 1.993 10.132 -0.4188 1.8748 -1.7521
...............
4.90369 4.90369 14.25892
Also, I am not getting any error from the code, and there is no output.
The command that I am using
gfortran br_br_gofr_smooth_dlp4.f90 -o read -I /usr/local/include/ -lgmxfort -g -fcheck=all -fbounds-check
./read
You have two errors:
You are using ierror uninitialised, as noted in the comments.
at_name is a length-5 character variable, and you are comparing it with a two-character literal. For the comparison to be true, the leftmost two characters of at_name would have to match (with the rest blank). Unfortunately, as your code is written, the format reads the atom name into the rightmost characters of at_name, so the test fails.
The code below shows one way of fixing the above and does what I think you want; especially for point 2 there are other ways.
ijb#ijb-Latitude-5410:~/work/stack$ cat pdb.f90
implicit none
character(len=100)::head,grofile
character(len=5):: res_nm,at_name
integer :: n,i,ierror,at_num
write(*,*) 'enter the name of gro file'
read(*,*) grofile
open(unit=10,file=grofile,status='old',action='read',iostat=ierror)
openif : if (ierror == 0) then
   !open was ok. Read values.
   read(10,*)head
   read(10,*)n
   do i=1,n
      read(10,200) at_name,at_num
      if (Adjustl(at_name) == 'OW') then
         write(44,*)at_num
200      format (10x,a5,i5)
      endif
   enddo
endif openif
end program
ijb#ijb-Latitude-5410:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -fcheck=all -g -O pdb.f90
pdb.f90:3:27:
3 | character(len=5):: res_nm,at_name
| 1
Warning: Unused variable ‘res_nm’ declared at (1) [-Wunused-variable]
ijb#ijb-Latitude-5410:~/work/stack$ cat stuff
CNT in water
20
1LIG C 1 2.814 2.448 2.231 -0.2002 0.0645 -0.2005
1LIG C 2 2.783 2.584 2.233 0.4146 0.2083 -0.1403
1LIG C 3 2.769 2.658 2.350 -0.4678 -0.0886 -0.0500
1LIG C 4 2.687 2.772 2.348 -0.7671 -0.3032 -0.0624
1LIG C 5 2.619 2.795 2.228 -0.2327 -0.2483 -0.3593
1LIG C 6 2.486 2.837 2.238 -0.0621 0.2349 -0.0781
1LIG H 1006 2.613 1.972 12.082 -1.2767 0.0570 0.2045
1LIG H 1007 2.804 2.173 12.099 -0.4228 1.8734 1.9762
1LIG H 1008 2.862 2.377 12.097 -0.7176 -2.2587 1.0804
2water OW 1009 2.221 1.281 6.853 -0.6831 -0.3395 0.1402
2water HW1 1010 2.191 1.215 6.789 -1.2195 0.6304 -0.6225
2water HW2 1011 2.143 1.333 6.871 -0.5687 -0.7024 1.7263
2water MW 1012 2.206 1.279 6.847 -0.7389 -0.2594 0.2489
3water OW 1013 2.826 4.482 12.736 -0.2852 0.1750 0.1277
3water HW1 1014 2.735 4.490 12.707 -0.3265 -0.3844 0.1046
3water HW2 1015 2.860 4.406 12.689 0.4937 0.9762 -0.6120
3water MW 1016 2.818 4.473 12.726 -0.1879 0.2065 0.0267
4water OW 1017 3.510 2.042 10.165 0.1154 -0.0258 -0.0813
4water HW1 1018 3.530 2.105 10.095 3.0124 -0.2562 0.4945
4water HW2 1019 3.434 1.993 10.132 -0.4188 1.8748 -1.7521
4.90369 4.90369 14.25892
ijb#ijb-Latitude-5410:~/work/stack$ ls -lrt | tail
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 05:43 pi_orig.f90
-rw-rw-r-- 1 ijb ijb 958 Jun 11 05:56 pi_ijb.f90~
-rw-rw-r-- 1 ijb ijb 1805 Jun 11 06:07 pi_ijb.f90
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 06:29 pi2.f90~
-rw-rw-r-- 1 ijb ijb 1305 Jun 11 06:34 pi2.f90
-rw-rw-r-- 1 ijb ijb 537 Jun 14 08:39 pdb.f90~
-rw-rw-r-- 1 ijb ijb 1462 Jun 14 08:40 stuff~
-rw-rw-r-- 1 ijb ijb 1425 Jun 14 08:41 stuff
-rw-rw-r-- 1 ijb ijb 560 Jun 14 08:43 pdb.f90
-rwxrwxr-x 1 ijb ijb 20520 Jun 14 08:49 a.out
ijb#ijb-Latitude-5410:~/work/stack$ ./a.out
enter the name of gro file
stuff
ijb#ijb-Latitude-5410:~/work/stack$ ls -lrt | tail
-rw-rw-r-- 1 ijb ijb 958 Jun 11 05:56 pi_ijb.f90~
-rw-rw-r-- 1 ijb ijb 1805 Jun 11 06:07 pi_ijb.f90
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 06:29 pi2.f90~
-rw-rw-r-- 1 ijb ijb 1305 Jun 11 06:34 pi2.f90
-rw-rw-r-- 1 ijb ijb 537 Jun 14 08:39 pdb.f90~
-rw-rw-r-- 1 ijb ijb 1462 Jun 14 08:40 stuff~
-rw-rw-r-- 1 ijb ijb 1425 Jun 14 08:41 stuff
-rw-rw-r-- 1 ijb ijb 560 Jun 14 08:43 pdb.f90
-rwxrwxr-x 1 ijb ijb 20520 Jun 14 08:49 a.out
-rw-rw-r-- 1 ijb ijb 39 Jun 14 08:49 fort.44
ijb#ijb-Latitude-5410:~/work/stack$ cat fort.44
1009
1013
1017
ijb#ijb-Latitude-5410:~/work/stack$
I am using SAS Enterprise Guide (EG) 7.1 and have the following code:
data time_dim_monthly;
    do i = 0 to 200;
        index_no = i;
        year_date = year(intnx('month','01JAN2008'd,i));
        month_date = month(intnx('month','01JAN2008'd,i));
        SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.);
        EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.);
        days_in_month = INTCK('day',intnx('month', '01JAN2008'd, i, 'b'),
                              intnx('month', '01JAN2008'd, i, 'e'));
        output;
    end;
run;
followed by
proc sql;
create table calendar as
select year_date, month_date, index_no, put(today(),date11.) as todays_dt, som, eom
from time_dim_monthly
where put(today(),date11.) between som and eom
/*or datepart((INTNX('month',today(),-1)) between som and eom)*/
order by index_no
;
quit;
The output looks like this:
year_date month_date index_no todays_dt SOM EOM
2008 10 9 31-MAY-2017 01-OCT-2008 31-OCT-2008
2009 10 21 31-MAY-2017 01-OCT-2009 31-OCT-2009
2010 10 33 31-MAY-2017 01-OCT-2010 31-OCT-2010
2011 10 45 31-MAY-2017 01-OCT-2011 31-OCT-2011
2012 10 57 31-MAY-2017 01-OCT-2012 31-OCT-2012
2013 10 69 31-MAY-2017 01-OCT-2013 31-OCT-2013
2014 10 81 31-MAY-2017 01-OCT-2014 31-OCT-2014
2015 10 93 31-MAY-2017 01-OCT-2015 31-OCT-2015
2016 10 105 31-MAY-2017 01-OCT-2016 31-OCT-2016
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
2017 10 117 31-MAY-2017 01-OCT-2017 31-OCT-2017
2018 5 124 31-MAY-2017 01-MAY-2018 31-MAY-2018
2018 10 129 31-MAY-2017 01-OCT-2018 31-OCT-2018
2019 5 136 31-MAY-2017 01-MAY-2019 31-MAY-2019
2019 10 141 31-MAY-2017 01-OCT-2019 31-OCT-2019
2020 5 148 31-MAY-2017 01-MAY-2020 31-MAY-2020
2020 10 153 31-MAY-2017 01-OCT-2020 31-OCT-2020
2021 5 160 31-MAY-2017 01-MAY-2021 31-MAY-2021
2021 10 165 31-MAY-2017 01-OCT-2021 31-OCT-2021
2022 5 172 31-MAY-2017 01-MAY-2022 31-MAY-2022
2022 10 177 31-MAY-2017 01-OCT-2022 31-OCT-2022
2023 5 184 31-MAY-2017 01-MAY-2023 31-MAY-2023
2023 10 189 31-MAY-2017 01-OCT-2023 31-OCT-2023
2024 5 196 31-MAY-2017 01-MAY-2024 31-MAY-2024
while I expected it to return only one line:
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
Would appreciate help in understanding why this is happening.
Thank you
This is your mistake:
SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.) ;
EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.) ;
where put(today(),date11.) between som and eom
put creates a character variable. You shouldn't really use between with character variables unless you really know what you're doing (it will compare in alphabetical order).
Use numeric variables. Get rid of the put. Instead use a format statement to make the variables look nice, but still be numeric.
SOM = intnx('month', '01JAN2008'd, i, 'b') ;
EOM = intnx('month', '01JAN2008'd, i, 'e') ;
format som eom date11.;
later
where today() between som and eom
I want to create a new column of datetime by setting a constant for the year, month, day, minutes, and seconds, retaining only the hour.
import pandas as pd
td = pd.DataFrame(['2015-01-01 09:03:00', '2015-01-11 15:47:00',
'2015-01-11 16:47:00', '2015-01-11 01:47:00', '2016-01-11 01:47:00'], columns=['datetime'])
td['datetime'] = pd.to_datetime(td['datetime'])
datetime
0 2015-01-01 09:03:00
1 2015-01-11 15:47:00
2 2015-01-11 16:47:00
3 2015-01-11 01:47:00
4 2016-01-11 01:47:00
The result should look like this.
datetime
0 1900-01-01 09:00:00
1 1900-01-01 15:00:00
2 1900-01-01 16:00:00
3 1900-01-01 01:00:00
4 1900-01-01 01:00:00
How can I code this out? Thanks!
Use pd.to_timedelta:
pd.to_datetime('1900-01-01') + pd.to_timedelta(td.datetime.dt.hour, unit='H')
0 1900-01-01 09:00:00
1 1900-01-01 15:00:00
2 1900-01-01 16:00:00
3 1900-01-01 01:00:00
4 1900-01-01 01:00:00
Name: datetime, dtype: datetime64[ns]
Alternatively, with the standard library datetime:
from datetime import datetime, timedelta

td['datetime'].apply(lambda x: datetime(1900, 1, 1) + timedelta(hours=x.hour))
0 1900-01-01 09:00:00
1 1900-01-01 15:00:00
2 1900-01-01 16:00:00
3 1900-01-01 01:00:00
4 1900-01-01 01:00:00
Name: datetime, dtype: datetime64[ns]
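A related trick, since strptime's default date is 1900-01-01: parsing just the hour with format='%H' lands on exactly the target timestamps, with the unspecified fields filled by the defaults. A sketch with two of the rows from above:

```python
import pandas as pd

td = pd.DataFrame({'datetime': pd.to_datetime(['2015-01-01 09:03:00',
                                               '2015-01-11 15:47:00'])})
# %H parsing fills the unspecified fields with strptime defaults (1900-01-01, :00:00)
out = pd.to_datetime(td['datetime'].dt.strftime('%H'), format='%H')
```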
I want to see the frequency of the data for each year.
My array looks like this : List[Data,Year]
List[[259,1910],[259,1910],[259,1910],[192,1910].....
Data Year
259 1910
259 1910
259 1910
192 1910
313 1910
259 1911
259 1911
192 1912
313 1912
I want to get the result like
Data Year Frequency
259 1910 3
259 1911 2
259 1912 0
192 1910 1
192 1911 0
192 1912 1
...
..
.
You can use a dictionary to count frequencies. Python allows using a tuple as a dictionary key.
data = [259, 259, 192, 313, 259, 259, 192, 313]
yrs = [1910, 1910, 1910, 1910, 1911, 1911, 1912, 1912]

frequencies = {}
for idx in range(len(data)):
    key = (data[idx], yrs[idx])
    if key in frequencies:
        frequencies[key] += 1
    else:
        frequencies[key] = 1

data_with_freq = []
for key, freq in frequencies.items():
    print(key[0], key[1], freq)
    data_with_freq.append((key[0], key[1], freq))
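With pandas, the same count, including the zero-frequency combinations shown in the desired output, is a groupby plus a reindex over the full Data × Year product. A sketch using the data from the question:

```python
import pandas as pd

df = pd.DataFrame({'Data': [259, 259, 259, 192, 313, 259, 259, 192, 313],
                   'Year': [1910, 1910, 1910, 1910, 1910, 1911, 1911, 1912, 1912]})

# Every (Data, Year) combination, so missing pairs can appear with frequency 0
full_index = pd.MultiIndex.from_product(
    [sorted(df['Data'].unique()), sorted(df['Year'].unique())],
    names=['Data', 'Year'])

# size() counts each observed pair; reindex adds the missing pairs as 0
freq = (df.groupby(['Data', 'Year']).size()
          .reindex(full_index, fill_value=0)
          .reset_index(name='Frequency'))
```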