Converting to date format in pandas

Converting to date format in pandas - python-2.7

I have a dataframe that contains a column which holds:
Date:
31062005
072005
12005
2012
I would like to convert these dates to the format:
Date:
31/06/2005
07/2005
01/2005
2012
What is the simplest way to do this? The fields are not in a date format yet, only strings.

Here:
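import pandas as pd
df = pd.DataFrame(['30/06/2005', '07/2005', '1/2005', '2012'], columns=['Date'])
# Split on '/', reverse each list to year-first order, and pad missing parts with '01'
temp = pd.DataFrame(df['Date'].str.split('/').apply(lambda parts: parts[::-1]).tolist())\
    .fillna('01')
# Concatenate year + month + day (zero-padded) and parse as YYYYMMDD
df['Date'] = pd.to_datetime(temp[0].str.cat(temp[1].str.zfill(2))\
    .str.cat(temp[2].str.zfill(2)), format='%Y%m%d')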

Suppose you write a function:
def convert_date(s):
    if len(s) == 4:
        return s
    elif len(s) < 7:
        return s[:-4].zfill(2) + '/' + s[-4:]
    else:
        return s[:-6].zfill(2) + '/' + s[-6:-4].zfill(2) + '/' + s[-4:]
Then if your dates are in df.dates, you can use
>>> df.dates.apply(convert_date)
0    31/06/2005
1       07/2005
2       01/2005
3          2012
Name: dates, dtype: object
Note that this converts a string in one form to a string in another form, so you can't manipulate the results as dates. If you need that, amend the function to call datetime.datetime.strptime with the format matching the length of the string. It could look something like this:
import datetime
def convert_date(s):
    if len(s) == 4:
        return datetime.datetime.strptime(s, '%Y')
    elif len(s) < 8:
        return datetime.datetime.strptime(s, '%m%Y')
    else:
        return datetime.datetime.strptime(s, '%d%m%Y')
Note that your first date (31 June) is not a valid calendar date, though, so strptime raises a ValueError for it.
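To see how an invalid date such as 31 June behaves, here is a minimal standalone sketch. The name parse_compact_date and the choice to return None for unparseable values are assumptions made for illustration, not part of the answer above. A 7-character value such as '1062005' is ambiguous and is not handled specially here.

```python
import datetime

def parse_compact_date(s):
    """Return a datetime for 'YYYY', 'mYYYY'/'mmYYYY' or 'ddmmYYYY', else None."""
    if len(s) == 4:
        fmt = '%Y'
    elif len(s) < 8:
        fmt = '%m%Y'
    else:
        fmt = '%d%m%Y'
    try:
        return datetime.datetime.strptime(s, fmt)
    except ValueError:
        # Raised for impossible dates (e.g. 31 June) and for malformed strings
        return None

samples = ['31062005', '072005', '12005', '2012']
parsed = [parse_compact_date(s) for s in samples]
```

Missing day or month components default to 1, so '072005' becomes 1 July 2005 and '2012' becomes 1 January 2012.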

Related

Replace a string containing parentheses with a float in pandas

I have a dataset with a column of strings which I want to convert to floats. However, the column has a single entry containing a number within parentheses (which is meant to denote a negative number). I tried different ways, both indirect and direct, to replace the value with a representation that would let me convert it to float, but I keep failing and I don't understand why:
Here is the row with the digit under parentheses as a string:
My code:
mask1 = purchases.Amount.str.contains('\(').fillna(False)
purchases.loc[mask1, :]['Amount'] = purchases.loc[mask1, :]['Amount'].str.replace('\(', '-').str.replace('\)', '')
purchases.loc[mask2, :]['Amount'] = purchases.loc[mask2, :]['Amount'].str.replace('\s+', '').str.replace('[a-z]+', '')
# Both fail to replace
purchases.loc[mask1, :]['Amount'] = '-29.99' # direct assignment also fails
The result:
What am I doing wrong? How can I correct it?

Use rstrip to remove the trailing ), then replace ( with - and finally convert to floats:
df = pd.DataFrame({'Amount': ['(29.29)', '(39.39)', '12.5', '340']})
df['Amount'] = df['Amount'].str.rstrip(')').str.replace('(', '-', regex=False).astype(float)
print (df)
   Amount
0  -29.29
1  -39.39
2   12.50
3  340.00
Your solution is very close to what you need. Just use loc with the column name to avoid chained indexing, which assigns to a temporary copy instead of the original DataFrame:
mask1 = purchases.Amount.str.contains(r'\(').fillna(False)
purchases.loc[mask1, 'Amount'] = purchases.loc[mask1, 'Amount'].str.replace(r'\(', '-', regex=True).str.replace(r'\)', '', regex=True)
purchases.loc[mask2, 'Amount'] = purchases.loc[mask2, 'Amount'].str.replace(r'\s+', '', regex=True).str.replace(r'[a-z]+', '', regex=True)
purchases.loc[mask1, 'Amount'] = '-29.99'

Why not just check whether the string is surrounded by parentheses, and if it is, strip them and negate the value?
from decimal import Decimal
def get_amount(s):
    if s[0] == '(' and s[-1] == ')':
        # Parentheses denote a negative amount
        return -Decimal(s[1:-1])
    else:
        return Decimal(s)

You can try:
df = pd.DataFrame({'Amount': ['(29.29)', '29.29']})
print(df)
df['Amount']=df.Amount.apply(lambda x: -float(x[1:-1]) if x[0] == '(' else float(x))
print(df)
print(df.dtypes)
Result:
    Amount
0  (29.29)
1    29.29
   Amount
0  -29.29
1   29.29
Amount    float64
dtype: object
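The same idea can be written as a small named helper that also tolerates surrounding whitespace. This is only a sketch: the name parse_amount is hypothetical, and it assumes every value is a string (missing values would need separate handling).

```python
def parse_amount(text):
    """Convert '(29.29)' to -29.29 and '12.5' to 12.5 (accounting-style negatives)."""
    cleaned = text.strip()
    # Only treat the value as negative when both parentheses are present
    if cleaned.startswith('(') and cleaned.endswith(')'):
        return -float(cleaned[1:-1])
    return float(cleaned)

amounts = [parse_amount(s) for s in ['(29.29)', ' 12.5 ', '340']]
```

With pandas, a function like this could be passed to df['Amount'].apply(...) in place of the lambda.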

Django: DurationField with resolution in microseconds

The django DurationField displays only HH:MM:SS in the django admin interface.
Unfortunately this is not enough in my current context.
I need to be able to show/edit microseconds in the admin interface.
How could this be done?
Update
This was a mistake. My data in the database was wrong: the microseconds were removed by a process before the data reached the database.
Django displays the microseconds if there are any. You don't need to do anything to show them.

Have a look at the source:
https://docs.djangoproject.com/en/2.0/_modules/django/db/models/fields/#DurationField
I think the way to do it is to override forms.DurationField (https://docs.djangoproject.com/en/2.0/_modules/django/forms/fields/#DurationField), and more precisely this function from django.utils.duration:
from django.utils.duration import _get_duration_components
def duration_string(duration):
    """Version of str(timedelta) which is not English specific."""
    days, hours, minutes, seconds, microseconds = _get_duration_components(duration)
    string = '{:02d}:{:02d}:{:02d}'.format(hours, minutes, seconds)
    if days:
        string = '{} '.format(days) + string
    if microseconds:
        string += '.{:06d}'.format(microseconds)
    return string
Be aware that you may need to override django.utils.dateparse.parse_duration too:
def parse_duration(value):
    """Parse a duration string and return a datetime.timedelta.
    The preferred format for durations in Django is '%d %H:%M:%S.%f'.
    Also supports ISO 8601 representation and PostgreSQL's day-time interval
    format.
    """
    match = standard_duration_re.match(value)
    if not match:
        match = iso8601_duration_re.match(value) or postgres_interval_re.match(value)
    if match:
        kw = match.groupdict()
        days = datetime.timedelta(float(kw.pop('days', 0) or 0))
        sign = -1 if kw.pop('sign', '+') == '-' else 1
        if kw.get('microseconds'):
            kw['microseconds'] = kw['microseconds'].ljust(6, '0')
        if kw.get('seconds') and kw.get('microseconds') and kw['seconds'].startswith('-'):
            kw['microseconds'] = '-' + kw['microseconds']
        kw = {k: float(v) for k, v in kw.items() if v is not None}
        return days + sign * datetime.timedelta(**kw)
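If the goal is to always show the microsecond field, even when it is zero, an overriding formatter might look like the sketch below. It uses only the standard library; the name duration_string_with_micros is hypothetical and this is not Django's own code, just an illustration of the formatting an override could use.

```python
import datetime

def duration_string_with_micros(duration):
    """Format a timedelta as '[D ]HH:MM:SS.ffffff', always including microseconds."""
    # timedelta stores days, seconds (0..86399) and microseconds separately
    minutes, seconds = divmod(duration.seconds, 60)
    hours, minutes = divmod(minutes, 60)
    string = '{:02d}:{:02d}:{:02d}.{:06d}'.format(
        hours, minutes, seconds, duration.microseconds)
    if duration.days:
        string = '{} '.format(duration.days) + string
    return string

short = duration_string_with_micros(datetime.timedelta(hours=1, minutes=2, seconds=3))
long_ = duration_string_with_micros(
    datetime.timedelta(days=2, seconds=5, microseconds=42))
```

How such a function would be wired into a custom form field or admin widget depends on the project and is not shown here.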

Convert from decimal to binary - python

I'm having an issue with this piece of code I wrote. I'm trying to read an integer and print its equivalent in binary. For example, for 5 it should output '101', but it prints '10', as if it doesn't take the last digit into account. Any comments would be greatly appreciated.
T = raw_input()
for i in range(0, int(T)):
    n = raw_input()
    dec_num = int(n)
    cnv_bin = ''
    while dec_num//2 > 0:
        if dec_num%2 == 0:
            cnv_bin += '0'
        else:
            cnv_bin += '1'
        dec_num = dec_num//2
    print cnv_bin[::-1]

while dec_num//2 > 0:
should be:
while dec_num > 0:
The first time through the loop, 5//2==2, so it continues.
The second time through the loop, 2//2==1, so it continues.
The third time, 1//2==0 and the loop quits without handling the last bit.
Also, you can just do the following to display a number in binary:
print format(dec_num,'b')
Format string version:
print '{0} decimal is {0:b} binary.'.format(5)
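Putting the corrected loop condition into a function gives something like the following sketch. The name to_binary and the special case for 0 are additions for illustration; the loop itself is the asker's algorithm with the fixed condition.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return '0'  # the loop below would otherwise produce an empty string
    digits = ''
    while n > 0:  # keep going until every bit, including the last, is consumed
        digits += '1' if n % 2 else '0'
        n //= 2
    return digits[::-1]  # bits were collected least-significant first

five = to_binary(5)
```

The result can be checked against the built-in format(n, 'b') for any non-negative integer.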

Why not use the built-in function bin()?
e.g.:
bin(5)
Output:
0b101
If you don't want the 0b prefix, you can slice it off:
bin(5)[2:]
Hope this helps!

import math
def roundup(n):
    return math.ceil(n)
# int() instead of eval(): eval would execute arbitrary input
D = int(input("Enter The Decimal Value: "))
n = roundup(math.log2(D+1))-1  # index of the highest set bit
bi = 0
di = D
qi = 0
i = n
print("Binary Value:",end = " ")
while(i>=0):
    qi = math.trunc(di/2**i)
    bi = qi
    print(bi,end = "")
    di = di - bi*(2**i)
    i = i-1

Assigning variable from csv file based on header using python

I have a csv file that has a single record in it that I need to assign variables to and run through a python script. My data looks like the lines below. It has a header.
"CompanyName","Contact","Street","CityZip","Store","DateRec","apples","appQuan","oranges","orgQuan","peaches","peaQuan","pumpkins","pumQuan","Receive",0
"American Grocers","Allison Smith","456 1st. Street","Podunk, California 00990","Store 135 Order","05/14/2015",1,10,0,4,1,4,2,0
Each value needs to be assigned a variable
1st position, "American Grocers" = CompanyName
2nd position, "Allison Smith" = Contact
3rd position = Street, etc.
After the date it gets tricky. The last 11 values are related to each other and get saved to a key.
If value 7 = 1, then variable 7 = "apples" and variable 8 = 10, else skip values 7 and 8 and go to 9
If value 9 = 1, then variable 9 = "oranges" and value 10 = the variable in position 10 (4), else skip values 9 and 10 and go to 11
If value 11 = 1, then variable 11 = "peaches" and value 12 = the variable in position 10 (4), else skip values 11 and 12 and go to 13
If value 13 = 1, then variable 13 = "pumpkins" and value 13 = the variable in position 13 (2), else skip values 13 and 14
If value 15 = 1 then variable 15 = "Delivery", else variable = "Pick up"
thus python would assign the following:
CompanyName = "American Grocers"
Contact = "Allison Smith"
Street = "456 1st. Street"
CityZip = "Podunk, California 00990"
Store = "Store 135 Order"
OrderDate (does not need to be date type) = "05/14/2015"
orderList = {"apples" : 10, "peaches" : 4, "pumpkins" : 2}
Receive = "Pick up"
I need to manipulate these variables further along in the script.
I have the following code, which outputs the data keyed by its corresponding header field.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    opened_file = open(raw_file)
    csv_data = csv.reader(opened_file, delimiter=delimiter)
    parsed_data = []
    fields = csv_data.next()
    for row in csv_data:
        parsed_data.append(dict(zip(fields, row)))
    opened_file.close()
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
Output looks like this. (I am not sure why the output is not in the same order as the file ...)
[{'DateRec': '05/14/2015', 'orgQuan': '4', 'CompanyName': 'American Grocers', 'appQuan': '10', 'peaQuan': '4', 'oranges': '0', 'peaches': '1', 'Contact': 'Allison Smith', 'CityZip': 'Podunk, California 00990', 'pumpkins': '2', 'apples': '1', 'pumQuan': '0', 'Store': 'Store 135 Order', 'Street': '456 1st. Street'}]
I do not know how to take this and get the variables assigned as listed above. Suggestions? Using python 2.7

"I am not sure why the output is not in the same order as the file ..."
In a Python 2.7 dictionary, entries are displayed in an arbitrary order.
Below is the general outline of how to parse the file. The detailed logic ("if this field is this, do this, otherwise do that") is something that I hope you can do on your own. The specifics are too long and too particular to this data to be of interest or value to anyone else, and I'm guessing that is why there hasn't been much interest in this question.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    parsed_data = []
    with open(raw_file) as opened_file:
        csv_data = csv.reader(opened_file, delimiter=delimiter)
        fields = csv_data.next()
        for row in csv_data:
            rec = {}  # start a fresh record for every row
            for i, val in enumerate(row[0:6]):
                rec[fields[i]] = val
            # This part below is too specific, long, and complicated
            # that it is doubtful filling this out in detail will be use
            # or interest to anyone else on stackoverflow. But to give
            # you an idea of how to proceed...
            if row[6] == '1':
                rec[fields[6]] = 'apples'
                rec[fields[7]] = int(row[7])
            else:
                # continue
                pass
            # ...
            parsed_data.append(rec)
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
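The flag/quantity logic that is left open above could be sketched as a loop over column pairs, producing the orderList dictionary the question asks for. The function name build_order and the sample record are hypothetical; the column names come from the header in the question, and the sample values are chosen so that the result matches the question's expected output.

```python
def build_order(record):
    """Collect quantities for every fruit whose flag column equals '1'."""
    # (flag column, quantity column) pairs, named after the CSV header
    fruit_columns = [
        ('apples', 'appQuan'),
        ('oranges', 'orgQuan'),
        ('peaches', 'peaQuan'),
        ('pumpkins', 'pumQuan'),
    ]
    order = {}
    for flag, quantity in fruit_columns:
        if record.get(flag) == '1':
            order[flag] = int(record[quantity])
    return order

# Hypothetical record, shaped like one dict built with dict(zip(fields, row))
record = {'apples': '1', 'appQuan': '10', 'oranges': '0', 'orgQuan': '4',
          'peaches': '1', 'peaQuan': '4', 'pumpkins': '1', 'pumQuan': '2',
          'Receive': '0'}
order_list = build_order(record)
receive = 'Delivery' if record.get('Receive') == '1' else 'Pick up'
```

Driving the checks from a list of pairs keeps the four near-identical if blocks in one place, so adding another fruit only means adding one tuple.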

I finally accomplished what I was looking for with the following code. Thank you to all who offered help; your ideas spurred me onto the tangents I needed.
import csv
MY_FILE = csv.reader(open("C:\\tests\\DataRequestsData\\qryFruit.csv", "rb"))
for row in MY_FILE:
    CompanyName, Contact, Street, CityZip, Store, DateRec, apples, appQuan, oranges, orgQuan, peaches, peaQuan, pumpkins, pumQuan, Receive = row
    # Collect one '"name":quantity' entry per ordered fruit, then join them
    parts = []
    if apples == "1":
        parts.append('"apples":' + appQuan)
    if oranges == "1":
        parts.append('"oranges":' + orgQuan)
    if peaches == "1":
        parts.append('"peaches":' + peaQuan)
    if pumpkins == "1":
        parts.append('"pumpkins":' + pumQuan)
    s = '{' + ', '.join(parts) + '}'
    if Receive == "0":
        Receive = "Pick up"
    else:
        Receive = "Deliver"

date using either raw_input() or input() coming up as integer

New to Python, and I have read so many other SO questions that I feel like I am missing something about how to massage user input into string format. I have this simple code and I get AttributeError: 'int' object has no attribute 'split', so I added exception handling and now get the error every time. I have tried almost everything with str(), datetime() and std.readline(), and nothing works.
def dateConverter(userDate):
    try:
        #split the substrings for month day year
        date = userDate.split("/")
        #day
        day = date[:2]
        #month
        month = date[3:5]#[ beginning : beginning + LENGTH]
        months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June', 7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}
        for key,value in months:
            month=value
        #year
        year = date[4:]
        print(str(month + ' ' + day + ',' + year))
        return True
    except:
        print('Error')
        return False
print('Enter a date in the format: mm/dd/yyyy \n')
userInput = raw_input()
dateConverter(userInput)
main()
Note: I have both Python27 and Python34 installed on Win7
Edit
vaibhav-sagar was correct: I wasn't slicing the string the right way, and the problem had nothing to do with the input. However, I have Python27 and Python34 installed, and even though I set my path variable to Python34, I have to use raw_input(), which no longer exists in Python 3 (it was renamed to input()), so look out for that too. That is what was stumping me! Sorry, this was my second look at Python, so it was really new territory. I actually got the slicing examples from another SO answer, so that is what I get for assuming. Here is the solution:
#custom date converter func
def dateConverter(userDate):
    try:
        #split the substrings for month day year
        date = userDate.split("/")
        #day
        day = date[1]#[ beginning : beginning + LENGTH]
        #month
        month = date[0]
        months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June', 7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}
        month=months[int(month)]
        #year
        year = date[2]
        print(month + ' ' + day + ',' + year)
        return True
    except:
        print('Error')
        return False
The next step is to use re to check that the date is valid.

I am using Python 3.3.5 and getting a different error. An exception is raised at
for key, value in months:
because iterating over a dictionary yields only its keys, not key/value pairs. What you want can be accomplished by:
for key, value in months.items():
More generally, your issues seem unrelated to your massaging of user input. This can be verified by using IDLE or another REPL. For example:
>>> someDate = '12/10/2014'
>>> date = someDate.split('/')
>>> date
['12', '10', '2014']
>>> day = date[:2]
>>> day
['12', '10']
>>> month = date[3:5]
>>> month
[]
>>> year = date[4:]
>>> year
[]
Python's slice syntax is doing something different to what I think you want. I also think you don't need a for loop, instead you can do:
month = months[int(month)]
This will assign the month name to month, like you expect. A function that does what I think you want would look something like this:
def dateConverter(userDate):
    #split the substrings for month day year
    date = userDate.split("/")
    #day
    day = date[1]
    #month
    month = date[0]
    months = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June', 7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}
    month = months[int(month)]
    #year
    year = date[2]
    print(str(month + ' ' + day + ',' + year))
    return True
I hope that helps.
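For the validation step mentioned in the question's edit, one alternative to a regular expression is to let datetime.datetime.strptime parse the string, since it rejects impossible dates such as 02/30/2014. The sketch below swaps in that technique; the name format_us_date and the choice to return None on invalid input are assumptions for illustration.

```python
import datetime

MONTH_NAMES = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']

def format_us_date(user_date):
    """Turn 'mm/dd/yyyy' into 'Month d, yyyy'; return None if the date is invalid."""
    try:
        parsed = datetime.datetime.strptime(user_date, '%m/%d/%Y')
    except ValueError:
        # Wrong layout, or a date that does not exist on the calendar
        return None
    return '{} {}, {}'.format(MONTH_NAMES[parsed.month - 1], parsed.day, parsed.year)

result = format_us_date('12/10/2014')
```

The explicit month list avoids depending on the system locale, which strftime('%B') would use.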
