I have a txt file oldDates.txt. I want to loop through it, modify the date formats and write the newly formatted dates to a new txt file. My code so far:
from datetime import datetime
f = open('oldDates.txt', r)
oldDates = []
newDates = []
for line in f.readlines():
oldDates.append(line)
print(line) # for testing
for oldDate in oldDates:
dt = datetime.strptime(oldDate, '%d/%m/%Y').strftime('%d/%m/%Y')
newDates.append(dt)
with open('newDates.txt', 'w') as w:
for newDate in newDates:
w.write(newDate+"\n")
f.close()
w.close()
However, this give an error:
ValueError: unconverted data remains
I'm not sure where I'm going wrong here, and if there's a more efficient way of doing this then I'd be glad to hear about it. The date conversion seems to work fine from the test print.
There are blank lines in the file and I'm wondering if I need to handle these (I'm not sure how).
Any help much appreciated!
Now, I am no expert in Python, but do you verify your input to be the correct format ? If you apply a regexp on the input line you can easily catch blank lines and any incorrect values.
Related
I have created a script which will give you the match rows between the two files. Post that, I am returning the output file to a function, which will be used the file as input to create pivot using pandas.
But somehow, something seems to be wrong, below is the code snippet
def CreateSummary(file):
out_file = file
file_df = pd.read_csv(out_file) ## This function is appending NULL Bytes at
the end of the file
#print file_df.head(2)
The above code is giving me the error as
ValueError: No columns to parse from file
Tried another approach:
file_df = pd.read_csv(out_file,delim_whitespace=True,engine='python')
##This gives me error as
_csv.Error: line contains NULL byte
Any suggestions and criticism is highly appreciated.
I am trying to build a tool that can convert .csv files into .yaml files for further use. I found a handy bit of code that does the job nicely from the link below:
Convert CSV to YAML, with Unicode?
which states that the line will take the dict created by opening a .csv file and dump it to a .yaml file:
out_file.write(ry.safe_dump(dict_example,allow_unicode=True))
However, one small kink I have noticed is that when it is run once, the generated .yaml file is typically incomplete by a line or two. In order to have the .csv file exhaustively read through to create a complete .yaml file, the code must be run two or even three times. Does anybody know why this could be?
UPDATE
Per request, here is the code I use to parse my .csv file, which is two columns long (with a string in the first column and a list of two strings in the second column), and will typically be 50 rows long (or maybe more). Also note that it designed to remove any '\n' or spaces that could potentially cause problems later on in the code.
csv_contents={}
with open("example1.csv", "rU") as csvfile:
green= csv.reader(csvfile, dialect= 'excel')
for line in green:
candidate_number= line[0]
first_sequence= line[1].replace(' ','').replace('\r','').replace('\n','')
second_sequence= line[2].replace(' ','').replace('\r','').replace('\n','')
csv_contents[candidate_number]= [first_sequence, second_sequence]
csv_contents.pop('Header name', None)
Ultimately, it is not that important that I maintain the order of the rows from the original dict, just that all the information within the rows is properly structured.
I am not sure what would cause could be but you might be running out of memory as you create the YAML document in memory first and then write it out. It is much better to directly stream it out.
You should also note that the code in the question you link to, doesn't preserve the order of the original columns, something easily circumvented by using round_trip_dump instead of safe_dump.
You probably want to make a top-level sequence (list) as in the desired output of the linked question, with each element being a mapping (dict).
The following parses the CSV, taking the first line as keys for mappings created for each following line:
import sys
import csv
import ruamel.yaml as ry
import dateutil.parser # pip install python-dateutil
def process_line(line):
"""convert lines, trying, int, float, date"""
ret_val = []
for elem in line:
try:
res = int(elem)
ret_val.append(res)
continue
except ValueError:
pass
try:
res = float(elem)
ret_val.append(res)
continue
except ValueError:
pass
try:
res = dateutil.parser.parse(elem)
ret_val.append(res)
continue
except ValueError:
pass
ret_val.append(elem.strip())
return ret_val
csv_file_name = 'xyz.csv'
data = []
header = None
with open(csv_file_name) as inf:
for line in csv.reader(inf):
d = process_line(line)
if header is None:
header = d
continue
data.append(ry.comments.CommentedMap(zip(header, d)))
ry.round_trip_dump(data, sys.stdout, allow_unicode=True)
with input xyz.csv:
id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название
this generates:
- id: 1
title_english: A Title in English
title_russian: Название на русском
- id: 2
title_english: Another Title
title_russian: Другой Название
The process_line is just some sugar that tries to convert strings in the CSV file to more useful types and strings without leading spaces (resulting in far less quotes in your output YAML file).
I have tested the above on files with 1000 rows, without any problems (I won't post the output though).
The above was done using Python 3 as well as Python 2.7, starting with a UTF-8 encoded file xyz.csv. If you are using Python 2, you can try unicodecsv if you need to handle Unicode input and things don't work out as well as they did for me.
I am trying to extract a dynamic value (static characters) from a csv file in a specific column and output the value to another csv.
The data element I am trying to extract is '12385730561818101591' from the value 'callback=B~12385730561818101591' located in a specific column.
I have written the below python script, but the output results are always blank. The regex '=(~[0-9]+)' was validated to successfully pull out the '12385730561818101591' value. This was tested on www.regex101.com.
When I use this in Python, no results are displayed in the output file. I have a feeling the '~' is causing the error. When I tried searching for '~' in the original CSV file, no results were found, but it is there!
Can the community help me with the following:
(1) Determine root cause of no output and validate if '~' is the problem. Could the problem also be the way I'm splitting the rows? I'm not sure if the rows should be split by ';' instead of ','.
import csv
import sys
import ast
import re
filename1 = open("example.csv", "w")
with open('example1.csv') as csvfile:
data = None
patterns = '=(~[0-9]+)'
data1= csv.reader(csvfile)
for row in data1:
var1 = row[57]
for item in var1.split(','):
if re.search(patterns, item):
for data in item:
if 'common' in data:
filename1.write(data + '\n')
filename1.close()
Here I have tried to write sample code. Hope this will help you in solving the problem:
import re
str="callback=B~12385730561818101591"
rc=re.match(r'.*=B\~([0-9A-Ba-b]+)', str)
print rc.group(1)
You regex is wrong for your example :
=(~[0-9]+) will never match callback=B~12385730561818101591 because of the B after the = and before the ~.
Also you include the ~ in the capturing group.
Not exatly sure what's your goal but this could work. Give more details if you have more restrictions.
=.+~([0-9]+)
EDIT
Following the new provided information :
patterns = '=.+~([0-9]+)'
...
result = re.search(patterns, item):
number = result.group(0)
filename1.write(number + '\n')
...
Concerning your line split on the \t (tabulation) you should show an example of the full line
I've got a csv with:
T,8,101
T,10,102
T,5,103
and need to search the csv file, in the 3rd column for my input, and if found, return the 2nd column value in that same row (searching "102" would return "10"). I then need to save the result to use in another calculation. (I am just trying to print the result for now..) I am new to python (2 weeks) and wanted to get a grasp on reading/writing in csv files. All the searchable results, didn't give me the answer I needed. Thanks
Here is my code:
name = input("waiting")
import csv
with open('cards.csv', 'rt') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for row in reader:
if row[2] == name:
print(row[1])
As stated in my comment above I would implement a general approach without using the csv-module like this:
import io
s = """T,8,101
T,10,102
T,5,103"""
# use io.StringIO to get a file-like object
f = io.StringIO(s)
lines = [tuple(line.split(',')) for line in f.read().splitlines()]
example_look_up = '101'
def find_element(look_up):
for t in lines:
if t[2] == look_up:
return t[1]
result = find_element(example_look_up)
print(result)
Please keep in mind, that this is Python3-code. You need to replace print() with print if using with Python2 and maybe change something related to the StringIO which I am using for demonstration purposes here in order to get a file-like object. However, this snippet should give you a basic idea about a possible solution.
When I run the below code it prints out a calendar for an entire year (which I don't want). I want it to write to file but it won't. It also returns the error message TypeError: expected a character buffer object. Also, casting it into a string doesn't work.
import calendar
cal = calendar.prcal(2015)
with open('yr2015.txt', 'w') as wf:
wf.write(cal)
As an example, the below code prints one month of a year and returns a string, but this isn't what I want
print calendar.month(2015, 4)
print type(calendar.month(2015, 4))
So when I run the below code I get the error message <type 'NoneType'>. It seems to me that it should be a string but obviously isn't. Any suggestions on how I can get a 12-month calendar into a text file?
print type(calendar.prcal(2015))
prcal doesn't return anything. Use cal = calendar.TextCalendar().formatyear(2015) Instead.
Your question is a bit confusing. However, you don't have to make python write to the file.
Let python write to stdout and redirect stdout to a file
python myCal.py > yr2015.txt
This should do the trick
That is because calendar.prcal() will only print the calendar of an year. And it wont return you any values. So this line in your code print type(calendar.prcal(2015)) will return none type error.