I am trying to find the month that appears twice (October). My idea was to use a dictionary, but I struggled to figure out how to find and separate the months, and my solution fails on the first (commented-out) string, where there are stray spaces around the commas. Can someone suggest how to modify that split section so it copes with the hyphens, the commas and the whitespace?
import re

#str="May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990"
str="May-29-1990,Oct-18-1980,Sept-1-1980,Oct-2-1990"
val=re.split(',',str)
monthList=[]
myDictionary={}

#put the months in a list
def sep_month():
    for item in val:
        if not item.isdigit():
            month,day,year=item.split("-")
            monthList.append(month)

#process the month list from above
def count_month():
    for item in monthList:
        if item not in myDictionary.keys():
            myDictionary[item]=1
        else:
            myDictionary[item]=myDictionary.get(item)+1
    for k,v in myDictionary.items():
        if v==2:
            print(k)

sep_month()
count_month()
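One way to make that split tolerant of the stray spaces is to split on a comma together with any surrounding whitespace (a small sketch of just that change; the answers below take different routes):

import re
s = "May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990"
# split on a comma plus any whitespace around it
val = re.split(r'\s*,\s*', s.strip())
print(val)  # ['May-29-1990', 'Oct-18-1980', 'Sept-1-1980', 'Oct-2-1990']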
from datetime import datetime
import calendar
from collections import Counter
datesString = "May-29-1990,Oct-18-1980,Sep-1-1980,Oct-2-1990"
datesListString = datesString.split(",")
datesList = []
for dateStr in datesListString:
    datesList.append(datetime.strptime(dateStr, '%b-%d-%Y'))
monthsOccurrencies = Counter((calendar.month_name[date.month] for date in datesList))
print(monthsOccurrencies)
# Counter({'October': 2, 'May': 1, 'September': 1})
Something to be aware of in my solution is that, with %b for the month, Sept had to be changed to Sep to work (%b is the month as the locale's abbreviated name). You can either use full month names (%B) or abbreviated names (%b). If you cannot get the big string with correctly formatted month names, just replace the wrong ones (for example "Sept" with "Sep") and always work with date objects.
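For example, a minimal sketch of that replacement (assuming "Sept" is the only non-standard abbreviation in the input):

datesString = "May-29-1990,Oct-18-1980,Sept-1-1980,Oct-2-1990"
# normalize the non-standard abbreviation so '%b' can parse it
datesString = datesString.replace("Sept", "Sep")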
Not sure that regex is the best tool for this job; I would just use strip() along with split() to handle your whitespace issues and get a list of just the month abbreviations. Then you could create a dict of counts by month using the list method count(). For example:
dates = 'May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990'
months = [d.split('-')[0].strip() for d in dates.split(',')]
month_counts = {m: months.count(m) for m in set(months)}
print(month_counts)
# {'May': 1, 'Oct': 2, 'Sept': 1}
Or even better with collections.Counter:
from collections import Counter
dates = 'May-29-1990, Oct-18-1980 ,Sept-1-1980, Oct-2-1990'
months = [d.split('-')[0].strip() for d in dates.split(',')]
month_counts = Counter(months)
print(month_counts)
# Counter({'Oct': 2, 'May': 1, 'Sept': 1})
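Either way, if the goal is specifically the month that occurs twice, the counts can be filtered afterwards (a small sketch using the month_counts from above):

twice = [m for m, c in month_counts.items() if c == 2]
print(twice)  # ['Oct']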
Related
I'm trying to find a regular expression to extract the month, day and year from a datetime stamp in this format:
01/20/2019 12:34:54
It should return a list:
['01', '20', '2019']
I know this can be solved using:
dt.split(' ')[0].split('/')
But, I'm trying to find a regex to do it:
[^\/\s]+
But, I need it to exclude everything after the space.
As you are expecting the date month and year to be returned as a list, you can use this Python code,
import re
s = '01/20/2019 12:34:54'
print(re.findall(r'\d+(?=[ /])', s))
Prints,
['01', '20', '2019']
Otherwise, you can better write your regex as,
(\d{2})/(\d{2})/(\d{4})
And get date, month and year from group1, group2 and group3
Python code in this way should be,
import re
s = '01/20/2019 12:34:54'
m = re.search(r'(\d{2})/(\d{2})/(\d{4})', s)
if m:
    print([m.group(1), m.group(2), m.group(3)])
Prints,
['01', '20', '2019']
You should absolutely be taking advantage of Python's date/time API here. Use strptime to parse your input datetime string to a bona fide Python datetime. Then, just build a list, accessing the various components you need.
dt = "01/20/2019 12:34:54"
dto = datetime.strptime(dt, '%m/%d/%Y %H:%M:%S')
list = [dto.month, dto.day, dto.year]
print(list)
[1, 20, 2019]
If you really want/need to work with the original datetime string, then split provides an option, without even formally using a regex:
dt = "01/20/2019 12:34:54"
dt = dt.split()[0].split('/')
print(dt)
['01', '20', '2019']
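One footnote on the strptime route above: it returns integers rather than the zero-padded strings in the question. If the string form matters, the components can be formatted back out (a small sketch using the dto object from above):

print([dto.strftime('%m'), dto.strftime('%d'), dto.strftime('%Y')])
# ['01', '20', '2019']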
This RegEx might help you to do so.
([0-9]+)\/([0-9]+)\/([0-9]+)\s[0-9]+:[0-9]+:[0-9]+
Code:
import re
string = '01/20/2019 12:34:54'
matches = re.search(r'([0-9]+)/([0-9]+)/([0-9]+)', string)
if matches:
    print([matches.group(1), matches.group(2), matches.group(3)])
else:
    print('Sorry! No matches! Something is not right! Call 911')
Output
['01', '20', '2019']
I am trying to identify dates from a column containing text entries and output the dates to a text file. However, my code is not returning any output. I can't seem to figure out what I did wrong in my code. I'd appreciate some help on this.
My Code:
import csv
from dateutil.parser import parse

with open('file1.txt', 'r') as f_input, open('file2.txt', 'w') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        x = str(row[3])
        def is_date(x):
            try:
                parse(x)
                csv_output.writerow([row[0], row[1], row[2], row[3], row[4]])
                # no return value in case of success
            except ValueError:
                return False
        is_date(x)
Guessing somewhat at your input, e.g.:
1,2,3, This is me on march first of 2018 at 2:15 PM, 2015
3,4,5, She was born at 12pm on 9/11/1980, 2015
a version of what you want could be
from dateutil.parser import parse

with open("input.txt", 'r') as inFilePntr, open("output.txt", 'w') as outFilePntr:
    for line in inFilePntr:
        clmns = line.split(',')
        clmns[3] = parse(clmns[3], fuzzy_with_tokens=True)[0].strftime("%Y-%m-%d %H:%M:%S")
        outFilePntr.write(', '.join(clmns))
Note that, as you do not touch the other columns, I just leave them as text; hence, no need for csv. Also, you never did anything with the return value of parse. I use the fuzzy token because my column three has the date somewhat hidden in other text. The returned datetime object is formatted into a string of my liking and inserted into column three, replacing the old value.
I recombine the strings with comma separation again and write them to output.txt, which looks like:
1, 2, 3, 2018-03-01 14:15:00, 2015
3, 4, 5, 1980-09-11 12:00:00, 2015
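As a side note on the [0] index above: with fuzzy_with_tokens=True, parse returns a tuple of the datetime plus the tokens it skipped, which is why only the first element is kept (a small sketch):

from dateutil.parser import parse
result = parse("She was born at 12pm on 9/11/1980", fuzzy_with_tokens=True)
print(result[0])  # 1980-09-11 12:00:00
print(result[1])  # the skipped, non-date words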
I am trying to change all date values in a spreadsheet's Date column where the year is earlier than 1900 to today's date, so I am working with a slice.
EDIT: previous lines of code:
df=pd.read_excel(filename)#,usecols=['NAME','DATE','EMAIL']
#regex to remove weird characters
df['DATE'] = df['DATE'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
df['DATE'] = pd.to_datetime(df['DATE'])
sample row in dataframe: name, date, email
[u'Public, Jane Q.\xa0' u'01/01/2016\xa0' u'jqpublic#email.com\xa0']
This line of code works.
df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()
Then, all date values are formatted:
df["DATE"] = df["DATE"].map(lambda x: x.strftime("%m/%d/%y"))
But I get an error:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have read the documentation and other posts, where using .loc is suggested
The following is the recommended solution:
df.loc[row_indexer,col_indexer] = value
but df["DATE"].loc[df["DATE"].dt.year < 1900] = dt.datetime.today() gives me the same error, except that the line number is actually the line number after the last line in the script.
I just don't understand what the documentation is trying to tell me as it relates to my example.
I started messing around with pulling out the slice and assigning to a separate dataframe, but then I'm going to have to bring them together again.
You are producing a view when you take df["DATE"] and then apply the selector [df["DATE"].dt.year < 1900] and try to assign to it.
df["DATE"][df["DATE"].dt.year < 1900] is the chained indexing that pandas is complaining about.
Fix it with loc like this:
df.loc[df.DATE.dt.year < 1900, "DATE"] = pd.datetime.today()
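A minimal sketch of the difference on a throwaway frame (the column name is taken from the question, the values are invented):

import datetime as dt
import pandas as pd

df = pd.DataFrame({"DATE": pd.to_datetime(["01/01/1899", "01/01/2016"])})
# chained indexing: select the column, then filter it -> pandas may assign to a copy
# df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()   # SettingWithCopyWarning
# single .loc call: row condition and column label in one indexer
df.loc[df["DATE"].dt.year < 1900, "DATE"] = dt.datetime.today()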
My thought would be that you could do
df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
df.loc[:, "DATE"] = df.DATE.map(lambda x: x.strftime("%m/%d/%y")
Not at a computer so I can't test but I think that should do it.
I am importing data from a JSON file and it has the date in the following format 1/7/11 9:15
What would be the best variable type/format to define in order to accept this date as it is? If not what would be the most efficient way to accomplish this task?
Thanks.
"What would be the best variable type/format to define in order to accept this date as it is?"
The DateTimeField.
"If not what would be the most efficient way to accomplish this task?"
You should use the datetime.strptime method from Python's builtin datetime library:
>>> from datetime import datetime
>>> import json
>>> json_datetime = '"1/7/11 9:15"' # still encoded as JSON (note the embedded quotes)
>>> py_datetime = json.loads(json_datetime) # now decoded to a Python string
>>> datetime.strptime(py_datetime, "%m/%d/%y %I:%M") # coerced into a datetime object
datetime.datetime(2011, 1, 7, 9, 15)
# Now you can save this object to a DateTimeField in a Django model.
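A minimal sketch of that last step (the model and field names here are invented for illustration):

from django.db import models

class Event(models.Model):
    happened_at = models.DateTimeField()

# with the datetime parsed above:
# Event.objects.create(happened_at=datetime.strptime(py_datetime, "%m/%d/%y %I:%M"))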
If you take a look at https://docs.djangoproject.com/en/dev/ref/models/fields/#datetimefield, it says that Django uses the Python datetime library, which is documented at http://docs.python.org/2/library/datetime.html.
Here is a working example (with many debug prints and step-by-step instructions):
from datetime import datetime

json_datetime = "1/7/11 9:15"
json_date, json_time = json_datetime.split(" ")
print(json_date)
print(json_time)

day, month, year = map(int, json_date.split("/"))  # maps each string from the split to an int
year = 2000 + year  # be careful here! 2 digits for a year may cause trouble (it could be 1911 as well)
hours, minutes = map(int, json_time.split(":"))
print(day)
print(month)
print(year)

my_datetime = datetime(year, month, day, hours, minutes)
print(my_datetime)

# Generate a json date:
new_json_style = "{0}/{1}/{2} {3}:{4}".format(my_datetime.day, my_datetime.month, my_datetime.year, my_datetime.hour, my_datetime.minute)
print(new_json_style)
I am looking for fast method to count model's objects created within past 30 days, for each day separately. For example:
27.07.2013 (today) - 3 objects created
26.07.2013 - 0 objects created
25.07.2013 - 2 objects created
...
27.06.2013 - 1 objects created
I am going to use this data in google charts API. Have you any idea how to get this data efficiently?
items = Foo.objects.filter(
    createdate__lte=datetime.datetime.today(),
    createdate__gt=datetime.datetime.today() - datetime.timedelta(days=30),
).values('createdate').annotate(count=Count('id'))
This will (1) filter results to contain the last 30 days, (2) select just the createdate field and (3) count the id's, grouping by all selected fields (i.e. createdate). This will return a list of dictionaries of the format:
[
    {'createdate': <datetime.date object>, 'count': <int>},
    {'createdate': <datetime.date object>, 'count': <int>},
    ...
]
EDIT:
I don't believe there's a way to get all dates, even those with count == 0, with just SQL. You'll have to insert each missing date through python code, e.g.:
import datetime

# needed to use .append() later on
items = list(items)

dates = [x.get('createdate') for x in items]
for d in (datetime.datetime.today() - datetime.timedelta(days=x) for x in range(0, 30)):
    if d not in dates:
        items.append({'createdate': d, 'count': 0})
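Since the counts are destined for the Google Charts API, the filled-in list can then be sorted and flattened into simple (date, count) rows (a sketch; the exact row format depends on the chart you build):

items.sort(key=lambda x: x['createdate'])
rows = [[item['createdate'].strftime('%d.%m.%Y'), item['count']] for item in items]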
I think this is a somewhat more optimized version of @knbk's solution in Python. It has fewer iterations, and set operations are highly optimized in Python (both in processing and in CPU cycles).
from_date = datetime.date.today() - datetime.timedelta(days=7)
orders = Order.objects.filter(created_at__gte=from_date, dealer__executive__branch__user=user)
# group by created_at and count the ids in each group
orders = orders.values('created_at').annotate(count=Count('id')).order_by('created_at')

if len(orders) < 7:
    orders_list = list(orders)
    # the last 7 days, today included
    dates = set(datetime.date.today() - datetime.timedelta(days=i) for i in range(7))
    order_dates = set(o['created_at'] for o in orders)
    # add a zero count for every day that has no orders
    for dt in (dates - order_dates):
        orders_list.append({'created_at': dt, 'count': 0})
    orders_list = sorted(orders_list, key=lambda item: item['created_at'])
else:
    orders_list = orders