How can I parse multiple date columns in Pandas? - python-2.7

I have a field/column in a .csv file that I am loading into Pandas that will not parse as a datetime data type in Pandas. I don't really understand why. I want both FirstTime and SecondTime to parse as datetime64 in Pandas DataFrame.
# Assigning a header for our data
header = ['FirstTime', 'Col1', 'Col2', 'Col3', 'SecondTime', 'Col4',
'Col5', 'Col6', 'Col7', 'Col8']
# Loading our data into a dataframe
df = pd.read_csv('MyData.csv', names=header, parse_dates=['FirstTime', 'SecondTime'])
The code above only parses SecondTime as datetime64[ns]. FirstTime is left as an object data type. If I run the following code instead:
# Assigning a header for our data
header = ['FirstTime', 'Col1', 'Col2', 'Col3', 'SecondTime', 'Col4',
'Col5', 'Col6', 'Col7', 'Col8']
# Loading our data into a dataframe
df = pd.read_csv('MyData.csv', names=header, parse_dates=['FirstTime'])
It still will not parse FirstTime as a datetime64[ns].
The format for both columns is the same:
# Example FirstTime
# (%f is always .000)
2015-11-05 16:52:37.000
# Example SecondTime
# (%f is always .000)
2015-11-04 15:33:15.000
What am I missing here? Is the first column not able to be datetime by default or something in Pandas?

Did you try:
df = pd.read_csv('MyData.csv', names=header, parse_dates=True)

I had a similar problem, and it turned out that one of my date variables contained an integer cell. So pandas recognized that column as "object" while the other one was recognized as "int64". You need to make sure every cell in both variables has the same type (integer, in my case).
You can use df.dtypes to see the dtypes of your variables.
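One way to track down the offending cells is to parse the column yourself and look at what fails. This is only a sketch: the in-memory CSV and the stray 0 in FirstTime are made-up stand-ins for MyData.csv, and the format string is taken from the examples in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for MyData.csv: the second row has a stray 0 in FirstTime
raw = io.StringIO(
    "2015-11-05 16:52:37.000,a,2015-11-04 15:33:15.000\n"
    "0,b,2015-11-04 15:40:00.000\n"
)
df = pd.read_csv(raw, names=['FirstTime', 'Col1', 'SecondTime'], dtype=str)

# errors='coerce' turns anything that does not match the format into NaT
parsed = pd.to_datetime(df['FirstTime'], format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')

# The original values that could not be parsed
bad_rows = df.loc[parsed.isna(), 'FirstTime']
print(bad_rows)

# Keep the parsed column once the bad cells are understood
df['FirstTime'] = parsed
```

Once you know which rows fail, you can decide whether to fix them in the source file or leave them as NaT.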

Related

Django ValueError Can only compare identically-labeled Series objects

I am getting this error. One dataframe, df, is read from a JSON API and a second one, df2, is read from a CSV file. I want to compare one column of the CSV against the API data and then save the matched values into a new CSV. Can anyone help me?
df2=pd.read_csv(file_path)
r = requests.get('https://data.ct.gov/resource/6tja-6vdt.json')
df = pd.DataFrame(r.json())
df['verified'] = np.where(df['salespersoncredential'] == df2['salespersoncredential'],'True', 'False')
print(df)
Probably just change
df['verified'] = np.where(df['salespersoncredential'] == df2['salespersoncredential'],'True', 'False')
to this:
df['verified'] = df['salespersoncredential'] == df2['salespersoncredential']
assuming the dtypes are correct.
If the indexes are different on the two dataframes, you might need to .reset_index().
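If the two dataframes have different lengths, an element-wise == will still raise that error. A possible alternative, assuming the goal is "does this credential appear anywhere in the CSV", is Series.isin, which does not need matching indexes. The sample values below are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the API data (df) and the CSV data (df2)
df = pd.DataFrame({'salespersoncredential': ['A1', 'B2', 'C3']})
df2 = pd.DataFrame({'salespersoncredential': ['C3', 'A1']})

# True where the credential from the API also occurs in the CSV column
df['verified'] = df['salespersoncredential'].isin(df2['salespersoncredential'])

# Only the matched rows, ready to be written out with matched.to_csv(...)
matched = df[df['verified']]
print(matched)
```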

How to change the format of the date in a .csv file using Python 2.7?

I have seen this question asked on Stack Overflow multiple times before, however, what I have not seen is anyone either ask nor answer the question properly here. So, here is my question: I have a .csv file named selected.csv, with three columns - Date (LT), AQI and Raw Conc. The date is in the format dd-mm-yyyy hh:mm. I want to convert the format of the date to yyyy-mm-dd, thereafter, saving the corrected data with three columns - Date, AQI and Raw Conc. as corrected.csv. I have tried the code typed below, but to no avail.
import csv
from datetime import datetime
output_file = open(r"C:\Users\Win-8.1\Desktop\delhi\corrected.csv", "wb")
fieldnames = ['Date', 'AQI' , 'Raw Conc.']
writer = csv.DictWriter(output_file, fieldnames = fieldnames)
writer.writeheader()
with open(r"C:\Users\Win-8.1\Desktop\delhi\selected.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        output_row = {}
        output_row['Date'] = datetime.datetime.strptime(row['Date (LT)'], '%d-%m-%Y %H:%M').strftime('%Y-%m-%d')
        output_row['AQI'] = row['AQI']
        output_row['Raw Conc.'] = row['Raw Conc.']
        writer.writerow(output_row)
output_file.close()
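Since you wrote from datetime import datetime, call datetime.strptime directly instead of datetime.datetime.strptime:
output_row['Date'] = datetime.strptime(row['Date (LT)'], '%d-%m-%Y %H:%M').strftime('%Y-%m-%d')
You are also using %H:%M, although you describe the format as DD-MM-YYYY. Maybe you didn't intend this?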
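For reference, here is a minimal sketch of the whole conversion. It is written for Python 3 and uses in-memory files so it runs anywhere; the sample row and the convert_dates helper are made up for illustration, and it assumes the input dates really are dd-mm-yyyy hh:mm as stated in the question.

```python
import csv
import io
from datetime import datetime

def convert_dates(src, dst):
    """Copy rows from src to dst, rewriting 'Date (LT)' as a yyyy-mm-dd 'Date' column."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=['Date', 'AQI', 'Raw Conc.'])
    writer.writeheader()
    for row in reader:
        # Parse the original timestamp, then keep only the date part
        parsed = datetime.strptime(row['Date (LT)'], '%d-%m-%Y %H:%M')
        writer.writerow({
            'Date': parsed.strftime('%Y-%m-%d'),
            'AQI': row['AQI'],
            'Raw Conc.': row['Raw Conc.'],
        })

# Hypothetical sample standing in for selected.csv
sample = io.StringIO("Date (LT),AQI,Raw Conc.\n05-11-2015 16:52,153,60.1\n")
result = io.StringIO()
convert_dates(sample, result)
converted = result.getvalue()
print(converted)
```

With real files you would pass open file objects instead of the StringIO buffers (on Python 3, open them with newline='').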

Pandas dataframe left align data

How do I set the data within a dataframe to be left-aligned?
I'm using python 2.7.13.
This question has been asked before but the accepted answer didn't even work.
The answer given was:
df.style.set_properties(**{'text-align': 'left'})
It doesn't work, my data is still right aligned.
Does anyone know how? Do I have to import any modules other than pandas?
Case 1: Styling to print as html
df.style.set_properties returns an object of type pandas.io.formats.style.Styler
type(df.style.set_properties(**{'text-align': 'left'}))
Out[37]: pandas.io.formats.style.Styler
Which is meant to be rendered as an html string as follows:
s = df.style.set_properties(**{'text-align': 'left'})
s.render()
Then you can use the result of s.render() in your HTML file. In pandas 1.3 and later, s.to_html() is the replacement, and render() was removed in pandas 2.0.
Case 2: Align data left as a dataframe
If you are looking for a way to strip leading whitespace from the values in your DataFrame while keeping the data in a DataFrame, here's an example of how to do that:
df = pd.DataFrame([[' a',' b'],[' c', ' d']], columns=list('AB'))
df = df.stack().str.lstrip().unstack()
output:
A B
0 a b
1 c d
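If the frame also has numeric columns, applying .str to the stacked frame will fail on the non-string values. A possible variation, sketched here with a made-up frame, is to strip only the string columns:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical frame with two padded string columns and one numeric column
df = pd.DataFrame({'A': ['  a', ' c'], 'B': [' b', '   d'], 'N': [1, 2]})

for name in df.columns:
    # Only string columns support the .str accessor
    if is_string_dtype(df[name]):
        df[name] = df[name].str.lstrip()

print(df)
```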

Renaming columns using column data from another dataframe

I have two dataframes - an original file that looks like this:
Gene Symbol, 10555, 10529, 10519
Map7, .184, .026, .207
nan, .348, .041, .187
Cpm, .45, .278, .453
and a reference file that looks like this:
Experiment_Num, Microarray, Experiment_Name, Chip_Name
10555, Genechip, Famotidine-5d, RG230-2
10529, Genechip, Famotidine-3d, RG230-2
10519, MMchip, Dicyclomine-3d, R01
I am trying to merge them in a way that the header of the original file display the Experiment_Name rather than just the Experiment_Num as follows:
Gene symbol, Famotidine-5d, Famotidine-3d, Dicyclomine-3d
Map7, .184, .026, .207
nan, .348, .041, .187
Cpm, .45, .278, .453
My code is completely written using pandas and looks as follows:
import pandas as pd
df = pd.read_csv('ftp://anonftp.niehs.nih.gov/drugmatrix/Differentially_expressed_gene_lists_directly_from_DrugMatrix/Affymetrix/Affymetrix_annotation.txt', sep='\t', dtype=str)
# Reference file
df2.columns = df2.columns.to_series().replace(df.set_index('Experiment').Compound_Name)
#Original File
df2
I tried to convert the columns of the original DF to its Series representation and then replace the old values, which were the Experiment_Num, with the new Experiment_Name retrieved from the reference DF, but I keep getting
KeyError: 'Experiment'
I tried figuring out what could be causing a KeyError, but found that there are so many possibilities, none of which seems to fix my particular issue.
Thanks for the help if possible!
Troy
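A KeyError from set_index('Experiment') usually means the loaded frame has no column with exactly that name, so printing df.columns is a good first check. Below is a minimal sketch of the renaming itself, assuming the reference columns are named as in the sample above (Experiment_Num and Experiment_Name) and that the original headers are strings; the two small frames are rebuilt by hand from the sample rows.

```python
import pandas as pd

# Frames rebuilt by hand from the sample rows in the question
original = pd.DataFrame({
    'Gene Symbol': ['Map7', None, 'Cpm'],
    '10555': [0.184, 0.348, 0.45],
    '10529': [0.026, 0.041, 0.278],
    '10519': [0.207, 0.187, 0.453],
})
reference = pd.DataFrame({
    'Experiment_Num': ['10555', '10529', '10519'],
    'Microarray': ['Genechip', 'Genechip', 'MMchip'],
    'Experiment_Name': ['Famotidine-5d', 'Famotidine-3d', 'Dicyclomine-3d'],
    'Chip_Name': ['RG230-2', 'RG230-2', 'R01'],
})

# Map each experiment number to its name, then rename the matching headers;
# headers without an entry in the mapping (such as 'Gene Symbol') are left alone
mapping = reference.set_index('Experiment_Num')['Experiment_Name'].to_dict()
renamed = original.rename(columns=mapping)
print(renamed.columns.tolist())
```

If the numbers are read as integers in one file and as strings in the other, the keys will not match, so reading both with dtype=str keeps them comparable.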

Python/Pandas: How do I convert from datetime64[ns] to datetime

I have a script that processes an Excel file. The department that sends it has a new system that generates it, and my script stopped working.
I suddenly got the error Can only use .str accessor with string values, which use np.object_ dtype in pandas for the following line of code:
df['DATE'] = df['Date'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
I checked the type of the date columns in the file from the old system (dtype: object) vs the file from the new system (dtype: datetime64[ns]).
How do I change the date format to something my script will understand?
I saw this answer but my knowledge about date formats isn't this granular.
You can use apply function on the dataframe column to convert the necessary column to String. For example:
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
Make sure to import datetime module.
apply() will take each cell at a time for evaluation and apply the formatting as specified in the lambda function.
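A vectorized variant of the same idea is the .dt.strftime accessor. This is just a sketch with made-up dates; wrapping the column in pd.to_datetime first is an assumption on my part, meant to let the same line work whether the file arrives with object or datetime64[ns] dates.

```python
import pandas as pd

# Hypothetical column as delivered by the new system (already datetime64[ns])
df = pd.DataFrame({'Date': pd.to_datetime(['2015-11-05 16:52:37', '2015-11-04 15:33:15'])})

# Normalize to datetime, then format every value as a yyyy-mm-dd string
df['DATE'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')
print(df['DATE'])
```

After this, df['DATE'] holds strings, so the .str accessor works on it again.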
pd.to_datetime returns a Series of datetime64 dtype, as described here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
To get plain Python date objects from such a column, use:
df['DATE'] = df['Date'].dt.date
or this (after import datetime):
df['Date'].map(datetime.datetime.date)
You can use pd.to_datetime
df['DATE'] = pd.to_datetime(df['DATE'])