deleting semicolons in a column of csv in python - python-2.7

I have a column of times and I want to select the rows whose time falls between two given times, for example 09:04:00 through 09:25:00, and use only the values in that range, but I can't figure out how.
My plan was to delete the colons separating hours:minutes:seconds and compare the resulting numbers, but I don't know how to do that. I do know how to find a value in a column, so I figured that approach would be easier.
Here is the csv I'm working with.
DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162
02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98
02/03/1997,09:07:00,3047.50,3047.50,3047.00,3047.50,228
02/03/1997,09:08:00,3048.00,3048.00,3047.50,3048.00,136
02/03/1997,09:09:00,3048.00,3048.00,3046.50,3046.50,174
02/03/1997,09:10:00,3046.50,3046.50,3045.00,3045.00,134
02/03/1997,09:11:00,3045.50,3046.00,3044.00,3045.00,43
02/03/1997,09:12:00,3045.00,3045.50,3045.00,3045.00,214
02/03/1997,09:13:00,3045.50,3045.50,3045.50,3045.50,8
02/03/1997,09:14:00,3045.50,3046.00,3044.50,3044.50,152
02/03/1997,09:15:00,3044.00,3044.00,3042.50,3042.50,126
02/03/1997,09:16:00,3043.50,3043.50,3043.00,3043.00,128
02/03/1997,09:17:00,3042.50,3043.50,3042.50,3043.50,23
02/03/1997,09:18:00,3043.50,3044.50,3043.00,3044.00,51
02/03/1997,09:19:00,3044.50,3044.50,3043.00,3043.00,18
02/03/1997,09:20:00,3043.00,3045.00,3043.00,3045.00,23
02/03/1997,09:21:00,3045.00,3045.00,3044.50,3045.00,51
02/03/1997,09:22:00,3045.00,3045.00,3045.00,3045.00,47
02/03/1997,09:23:00,3045.50,3046.00,3045.00,3045.00,77
02/03/1997,09:24:00,3045.00,3045.00,3045.00,3045.00,131
02/03/1997,09:25:00,3044.50,3044.50,3043.50,3043.50,138
02/03/1997,09:26:00,3043.50,3043.50,3043.50,3043.50,6
02/03/1997,09:27:00,3043.50,3043.50,3043.00,3043.00,56
02/03/1997,09:28:00,3043.00,3044.00,3043.00,3044.00,32
02/03/1997,09:29:00,3044.50,3044.50,3044.50,3044.50,63
02/03/1997,09:30:00,3045.00,3045.00,3045.00,3045.00,28
02/03/1997,09:31:00,3045.00,3045.50,3045.00,3045.50,75
02/03/1997,09:32:00,3045.50,3045.50,3044.00,3044.00,54
02/03/1997,09:33:00,3043.50,3044.50,3043.50,3044.00,96
02/03/1997,09:34:00,3044.00,3044.50,3044.00,3044.50,27
02/03/1997,09:35:00,3044.50,3044.50,3043.50,3044.50,44
02/03/1997,09:36:00,3044.00,3044.00,3043.00,3043.00,61
02/03/1997,09:37:00,3043.50,3043.50,3043.50,3043.50,18
Thanks for the time

If you just want to replace the colons with commas, you can use the built-in string replace method.
line = '02/03/1997,09:24:00,3045.00,3045.00,3045.00,3045.00,131'
line = line.replace(':',',')
print(line)
Output:
02/03/1997,09,24,00,3045.00,3045.00,3045.00,3045.00,131
Then split on commas to separate the data.
line.split(',')
If you only want the numerical values you could also do the following (using a regular expression):
import re
line = '02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505'
values = [float(x) for x in re.sub(r'[^\w.]+', ',', line).split(',')]
print values
Which gives you a list of numerical values that you can process.
[2.0, 3.0, 1997.0, 9.0, 4.0, 0.0, 3046.0, 3048.5, 3046.0, 3047.5, 505.0]

Use the csv module! :)
>>> import csv
>>> with open('myFile.csv', newline='') as csvfile:  # Python 3; on Python 2 use open('myFile.csv', 'rb')
...     myCsvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
...     for row in myCsvreader:
...         for item in row:
...             item.split(':')  # e.g. '09:04:00' becomes ['09', '04', '00']
Once you have extracted the timestamps, you can use the datetime module to compare them or compute differences, for example:
from datetime import datetime, date, time
x = time(hour=9, minute=30, second=30)
y = time(hour=9, minute=30, second=42)
diff = datetime.combine(date.today(), y) - datetime.combine(date.today(), x)
print diff.total_seconds()
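Putting the two ideas together, the original goal (keeping only the rows between two times) might be sketched as follows. This is a minimal sketch assuming Python 3; the helper name rows_between and the shortened in-memory sample are made up for illustration, and the bounds are treated as inclusive.

```python
import csv
import io
from datetime import datetime

# Shortened copy of the question's data, held in memory for illustration.
SAMPLE = """DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME
02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505
02/03/1997,09:24:00,3045.00,3045.00,3045.00,3045.00,131
02/03/1997,09:25:00,3044.50,3044.50,3043.50,3043.50,138
02/03/1997,09:26:00,3043.50,3043.50,3043.50,3043.50,6
"""

def rows_between(lines, start, end, fmt="%H:%M:%S"):
    """Return the rows whose TIME column lies in [start, end] (hypothetical helper)."""
    start_t = datetime.strptime(start, fmt).time()
    end_t = datetime.strptime(end, fmt).time()
    selected = []
    for row in csv.DictReader(lines):
        # Parse the text into a time object so the comparison is chronological.
        t = datetime.strptime(row["TIME"], fmt).time()
        if start_t <= t <= end_t:
            selected.append(row)
    return selected

rows = rows_between(io.StringIO(SAMPLE), "09:04:00", "09:25:00")
times = [row["TIME"] for row in rows]
print(times)
```

With a real file, an open file object can be passed in place of the io.StringIO buffer. Parsing into time objects avoids having to strip the colons at all.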

Related

Identifying dates in a string in python using dateutil returning no output

I am trying to identify dates from a column containing text entries and output the dates to a text file. However, my code is not returning any output. I can't seem to figure out what I did wrong in my code. I'd appreciate some help on this.
My Code:
import csv
from dateutil.parser import parse
with open('file1.txt', 'r') as f_input, open('file2.txt', 'w') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for row in csv_input:
        x = str(row[3])
        def is_date(x):
            try:
                parse(x)
                csv_output.writerow([row[0], row[1], row[2], row[3], row[4]])
                # no return value in case of success
            except ValueError:
                return False
        is_date(x)
Guessing that your input looks something like:
1,2,3, This is me on march first of 2018 at 2:15 PM, 2015
3,4,5, She was born at 12pm on 9/11/1980, 2015
A version of what you want could be:
from dateutil.parser import parse
with open("input.txt", 'r') as inFilePntr, open("output.txt", 'w') as outFilePntr:
    for line in inFilePntr:
        clmns = line.split(',')
        clmns[3] = parse( clmns[3], fuzzy_with_tokens=True )[0].strftime("%Y-%m-%d %H:%M:%S")
        outFilePntr.write( ', '.join(clmns) )
Note that since you do not touch the other columns, I just leave them as text, so there is no need for csv. You never did anything with the return value of parse. I use fuzzy_with_tokens because my column three has the date somewhat hidden in other text. The returned datetime object is converted into a string in a format of my liking (see the strftime documentation) and inserted in column three, replacing the old value.
I recombine the strings with comma separation again and write the result into output.txt, which looks like:
1, 2, 3, 2018-03-01 14:15:00, 2015
3, 4, 5, 1980-09-11 12:00:00, 2015

Python: Write two columns in csv for many lines

I have two parameters like filename and time and I want to write them in a column in a csv file. These two parameters are in a for-loop so their value is changed in each iteration.
My current python code is the one below but the resulting csv is not what I want:
import csv
import os
with open("txt/scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    filename = ["one","two", "three"]
    time = ["1","2", "3"]
    zipped_lists = zip(filename,time)
    for row in zipped_lists:
        print row
        writer.writerow(row)
My csv file must look like the one below. The comma must be the delimiter, so I should get two columns.
one, 1
two, 2
three, 3
Instead, my csv file currently stores all the data in one column.
Do you know how to fix this?
Well, the issue here is that you are using writerows instead of writerow:
import csv
import os
with open("scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    level_counter = 0
    max_levels = 3
    filename = ["one","two", "three"]
    time = ["1","2", "3"]
    while level_counter < max_levels:
        writer.writerow((filename[level_counter], time[level_counter]))
        level_counter = level_counter +1
This gave me the result:
one,1
two,2
three,3
This is another solution
Put the following code into a python script that we will call sc-123.py
filename = ["one","two", "three"]
time = ["1","2", "3"]
for a,b in zip(filename,time):
    print('{}{}{}'.format(a,',',b))
Once the script is ready, run it like this:
python2 sc-123.py > scalable_decoding_time.csv
You will have the results formatted the way you want
one,1
two,2
three,3
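For comparison, here is a minimal sketch of the same two-column output on Python 3, where the csv file would be opened in text mode with newline='' rather than "wb". It writes to an in-memory buffer purely for illustration; the names filenames, times and csv_text are made up.

```python
import csv
import io

filenames = ["one", "two", "three"]
times = ["1", "2", "3"]

# An in-memory buffer stands in for open("scalable_decoding_time.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")

# zip pairs the two lists, so each tuple becomes one two-column row.
writer.writerows(zip(filenames, times))

csv_text = buffer.getvalue()
print(csv_text)
```

Here writerows takes the whole sequence of rows at once, while writerow takes a single row; passing one row to writerows would treat each of its fields as a separate row.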

parsing records with key value pairs in python

I have a file with millions of records like this
2017-07-24 18:34:23|CN:SSL|RESPONSETIME:23|BYTESIZE:1456|CLIENTIP:127.0.0.9|PROTOCOL:SSL-V1.2
Each record contains around 30 key-value pairs with "|" as the delimiter. The position of each key-value pair is not constant.
I am trying to parse these records using Python dictionaries or lists.
Note: the first column is not in key-value format.
Your file is basically a |-separated csv file holding first the timestamp, then key:value fields separated by ":".
So you could use the csv module to read the cells, then pass the result of str.split to dict in a generator expression to build the dictionary from all elements but the first one.
Then update the dict with the timestamp:
import csv
list_of_dicts = []
with open("input.txt") as f:
    cr = csv.reader(f,delimiter="|")
    for row in cr:
        # split on the first ":" only, so values that contain ":" stay intact
        d = dict(v.split(":", 1) for v in row[1:])
        d["date"] = row[0]
        list_of_dicts.append(d)
list_of_dicts contains dictionaries like
{'date': '2017-07-24 18:34:23', 'PROTOCOL': 'SSL-V1.2', 'RESPONSETIME': '23', 'CN': 'SSL', 'CLIENTIP': '127.0.0.9', 'BYTESIZE': '1456'}
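If a value could itself contain a colon, a plain split(":") would yield more than two pieces and dict would raise an error. A minimal sketch, assuming Python 3 and a hypothetical parse_record helper, that splits each pair on its first colon only:

```python
def parse_record(line):
    """Turn one '|'-separated record into a dict (hypothetical helper)."""
    timestamp, *pairs = line.rstrip("\n").split("|")
    record = {"date": timestamp}  # the first field is not a key:value pair
    for pair in pairs:
        # maxsplit=1 keeps any later colons inside the value
        key, value = pair.split(":", 1)
        record[key] = value
    return record

sample = ("2017-07-24 18:34:23|CN:SSL|RESPONSETIME:23|BYTESIZE:1456"
          "|CLIENTIP:127.0.0.9|PROTOCOL:SSL-V1.2")
record = parse_record(sample)
print(record["RESPONSETIME"])
```

For a file with millions of records, calling such a function line by line inside a loop over the open file avoids loading everything into memory at once.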
You repeat the process below for every line in your file. I am not clear about the date-time value, so I haven't included it in the input. You can include it based on your understanding.
import re
given = "CN:SSL|RESPONSETIME:23|BYTESIZE:1456|CLIENTIP:127.0.0.9|PROTOCOL:SSL-V1.2"
results = dict()
list_for_this_line = re.split('\|',given)
for i in range(len(list_for_this_line)):
    separated_k_v = re.split(':',list_for_this_line[i])
    results[separated_k_v[0]] = separated_k_v[1]
print results
Hope this helps!

Displaying timestamp in textarea using matplotlib

I have a map that I plotted using matplotlib from a csv file that I read using pandas. I need to display the dates of my data in a text area, so I am doing this:
Start =data.index.max()
End = data.index.min()
txt = 'Date debut:',End,'Date fin:',Start
props1 = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
ax.text(0.17, 0.17, txt, transform=ax.transAxes, fontsize=8, bbox=props1, family = 'monospace')
plt.show()
And I got this result:
As you can see, the result is not really satisfying. I need to move the text to the bottom right, outside the map, insert a space between "Date debut" and "Date fin", and hide the 'Timestamp' wrapper from the text area so that only the dates remain. How can I proceed?
The text can be positioned using the first two arguments; just replace the numbers 0.17 with something else. Here it may help to use ha and va (horizontal and vertical alignment) and set them so that the coordinates are easy to choose (e.g. ha="right" makes sense when specifying coordinates at the right side of the plot). Note that you may well choose negative values if that makes sense to you.
To format the string nicely, you first want to convert the Timestamp to a string. This is done using the strftime method. As its argument you specify a format string, e.g. "%d %b %Y" for day month year format. The complete set of format codes can be found in the Python documentation.
A complete example may be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = pd.date_range("2017-01-01","2017-06-30",freq="D" )
x = np.random.rand(len(d))
data = pd.DataFrame(x, index=d)
fig, ax = plt.subplots()
start = data.index.min().strftime("%d %b %Y")
end = data.index.max().strftime("%d %b %Y")
txt = "Date debut: {}, date fin: {}".format(start, end)
props1 = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
ax.text(0.98, 0.03, txt, transform=ax.transAxes, fontsize=8, bbox=props1,
        family = 'monospace', ha="right", va="bottom")
plt.show()
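The strftime step does not depend on pandas: a pandas Timestamp is a datetime subclass and accepts the same format codes. A minimal sketch using only the standard library, with a hypothetical format_range helper and made-up dates:

```python
from datetime import datetime

def format_range(start, end, fmt="%d/%m/%Y"):
    """Build the label text from two datetime-like objects (hypothetical helper)."""
    return "Date debut: {}, date fin: {}".format(
        start.strftime(fmt), end.strftime(fmt)
    )

start = datetime(2017, 1, 1)
end = datetime(2017, 6, 30)

label = format_range(start, end)
print(label)
```

Passing fmt="%d %b %Y" instead gives abbreviated month names, which depend on the current locale.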

python/pandas:need help adding double quotes to columns

I need to add double quotes to specific columns in a csv file that my script generates.
Below is the goofy way I thought of doing this. For these two fixed-width fields, it works:
df['DATE'] = df['DATE'].str.ljust(9,'"')
df['DATE'] = df['DATE'].str.rjust(10,'"')
df['DEPT CODE'] = df['DEPT CODE'].str.ljust(15,'"')
df['DEPT CODE'] = df['DEPT CODE'].str.rjust(16,'"')
For the following field, it doesn't. It has a variable length. So, if the value is shorter than the standard 6-digits, I get extra double-quotes: "5673"""
df['ID'] = df['ID'].str.ljust(7,'"')
df['ID'] = df['ID'].str.rjust(8,'"')
I have tried zfill, but the data in the column is a Series: I get "pandas.core.series.Series" when I run
print type(df['ID'])
and I have not been able to convert it to string using astype. I'm not sure why. I have not imported numpy.
I tried using len() to get the length of the ID number and pass it to str.ljust and str.rjust as its first argument, but I think it got hung up on the data not being a string.
Is there a simpler way to apply double-quotes as I need, or is the zfill going to be the way to go?
You can add a speech mark before / after:
In [11]: df = pd.DataFrame([["a"]], columns=["A"])
In [12]: df
Out[12]:
   A
0  a
In [13]: '"' + df['A'] + '"'
Out[13]:
0    "a"
Name: A, dtype: object
Assigning this back:
In [14]: df['A'] = '"' + df.A + '"'
In [15]: df
Out[15]:
     A
0  "a"
If it's for exporting to csv you can use the quoting kwarg:
In [21]: df = pd.DataFrame([["a"]], columns=["A"])
In [22]: df.to_csv()
Out[22]: ',A\n0,a\n'
In [23]: df.to_csv(quoting=1)
Out[23]: '"","A"\n"0","a"\n'
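The value 1 passed as quoting is the constant csv.QUOTE_ALL from the standard csv module. A minimal sketch with the plain csv writer, applying the same quoting to the same tiny table (the in-memory buffer and the name quoted are only for illustration):

```python
import csv
import io

buffer = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, including numbers and empty strings.
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["", "A"])   # header row: empty index label, column name
writer.writerow([0, "a"])    # data row: index value, cell value

quoted = buffer.getvalue()
print(repr(quoted))
```

Writing quoting=csv.QUOTE_ALL instead of the bare number 1 makes the intent easier to read.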
With numpy, not pandas, you can specify the formatting method when saving to a csv file. As a very simple example:
In [209]: np.savetxt('test.txt',['string'],fmt='%r')
In [210]: cat test.txt
'string'
In [211]: np.savetxt('test.txt',['string'],fmt='"%s"')
In [212]: cat test.txt
"string"
I would expect the pandas csv writer to have a similar degree of control, if not more.