Python csv file determine data type - python-2.7

New student here, so please excuse my ignorance; I have searched a lot and not found a solution to my problem. I need to import a CSV file with mixed data types (int, float, and string), determine the data type of each value, then do maths on the ints and floats.
The problem is that the csv reader converts everything to strings (or are they already strings?). I can try to convert to float, and if it throws an error I know it is a string, but how would I tell whether it is an int or a float, as my program needs to distinguish between the two?
I am only allowed to import csv and no other modules. This is my second first-year Python subject, and I am really not sure how to do this.
Edit: I found one answer that seems similar to my problem, but it still returns the wrong answers; ints are usually, but not always, still returned as string type:
import csv

tests = [
    # (Type, Test)
    (int, int),
    (float, float),
]

def getType(value):
    for typ, test in tests:
        try:
            test(value)
            return typ
        except ValueError:
            print 'value error'
            continue
    # No match
    return str

file = open('adult.csv')
reader = csv.reader(file)

filename = 'output.xml'
text = open(filename, 'w')
text.write('<?xml version="1.0"?>')
text.write('<!DOCTYPE summary [')

headers = reader.next()
for i in headers:
    print '<name>'
    print i
    print '</name>'
    print '<dataType>'
    for a in i[1]:
        print getType(a)
    #for row in fields:
    #    text = row[2]
    #    print type(text)
    print '</dataType>'
#for value in i:
#    print type(value)
print '<!ELEMENT summary\n\n>'
#text.write('<element>')

OK, sorry everybody, I think I found some code that might work:
determine the type of a value which is represented as string in python
I SHOULD have searched harder, although it is still not reliably giving the correct type.
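For reference, a minimal sketch of the usual try-int-then-float approach, using only the csv module on Python 2.7 (adult.csv is taken from the code above; the function name and the rest are illustrative):

import csv

def detect_type(value):
    """Return int, float, or str for a single value read from a CSV file."""
    # int() rejects '3.5', so try it before float();
    # float() also accepts '3', so the order matters.
    for typ in (int, float):
        try:
            typ(value)
            return typ
        except ValueError:
            pass
    return str

with open('adult.csv', 'rb') as f:      # 'rb' is the Python 2 csv convention
    reader = csv.reader(f)
    headers = next(reader)              # assumes the file has a header row
    for row in reader:
        types = [detect_type(v) for v in row]
        print zip(headers, types)       # e.g. [('age', <type 'int'>), ...]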

Related

Python iterate through txt file of dates and change date format

I have a txt file oldDates.txt. I want to loop through it, modify the date formats and write the newly formatted dates to a new txt file. My code so far:
from datetime import datetime

f = open('oldDates.txt', 'r')
oldDates = []
newDates = []

for line in f.readlines():
    oldDates.append(line)
    print(line)  # for testing

for oldDate in oldDates:
    dt = datetime.strptime(oldDate, '%d/%m/%Y').strftime('%d/%m/%Y')
    newDates.append(dt)

with open('newDates.txt', 'w') as w:
    for newDate in newDates:
        w.write(newDate + "\n")

f.close()
w.close()
However, this gives an error:
ValueError: unconverted data remains
I'm not sure where I'm going wrong here, and if there's a more efficient way of doing this then I'd be glad to hear about it. The date conversion seems to work fine from the test print.
There are blank lines in the file and I'm wondering if I need to handle these (I'm not sure how).
Any help much appreciated!
Now, I am no expert in Python, but do you verify that your input is in the correct format? If you apply a regexp to each input line you can easily catch blank lines and any incorrect values.
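A minimal sketch of that idea (the output format '%Y-%m-%d' is just an assumption for illustration, since the question uses the same format on both sides): strip each line and skip blanks before calling strptime; the strip also removes the trailing newline that causes the "unconverted data remains" error.

from datetime import datetime
import re

date_pattern = re.compile(r'^\d{2}/\d{2}/\d{4}$')  # matches e.g. 25/12/2016

with open('oldDates.txt', 'r') as f, open('newDates.txt', 'w') as w:
    for line in f:
        line = line.strip()                # drop '\n', the cause of "unconverted data remains"
        if not date_pattern.match(line):   # skip blank or malformed lines
            continue
        newDate = datetime.strptime(line, '%d/%m/%Y').strftime('%Y-%m-%d')
        w.write(newDate + '\n')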

Why must I run this code a few times before my entire .csv file is converted into a .yaml file?

I am trying to build a tool that can convert .csv files into .yaml files for further use. I found a handy bit of code that does the job nicely from the link below:
Convert CSV to YAML, with Unicode?
which states that the line will take the dict created by opening a .csv file and dump it to a .yaml file:
out_file.write(ry.safe_dump(dict_example,allow_unicode=True))
However, one small kink I have noticed is that when it is run once, the generated .yaml file is typically incomplete by a line or two. In order to have the .csv file exhaustively read through to create a complete .yaml file, the code must be run two or even three times. Does anybody know why this could be?
UPDATE
Per request, here is the code I use to parse my .csv file, which is two columns long (with a string in the first column and a list of two strings in the second column), and will typically be 50 rows long (or maybe more). Also note that it is designed to remove any '\n' or spaces that could potentially cause problems later on in the code.
import csv

csv_contents = {}
with open("example1.csv", "rU") as csvfile:
    green = csv.reader(csvfile, dialect='excel')
    for line in green:
        candidate_number = line[0]
        first_sequence = line[1].replace(' ', '').replace('\r', '').replace('\n', '')
        second_sequence = line[2].replace(' ', '').replace('\r', '').replace('\n', '')
        csv_contents[candidate_number] = [first_sequence, second_sequence]
csv_contents.pop('Header name', None)
Ultimately, it is not that important that I maintain the order of the rows from the original dict, just that all the information within the rows is properly structured.
I am not sure what the cause could be, but you might be running out of memory, as you create the YAML document in memory first and then write it out. It is much better to stream it out directly.
You should also note that the code in the question you link to doesn't preserve the order of the original columns, something easily circumvented by using round_trip_dump instead of safe_dump.
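As a quick illustration of that difference (assuming the same older ruamel.yaml API used in the code below; the keys here are made up):

import sys
import ruamel.yaml as ry

# safe_dump sorts mapping keys alphabetically; round_trip_dump keeps insertion order.
row = ry.comments.CommentedMap([('id', 1), ('zebra', 'z'), ('apple', 'a')])
print(ry.safe_dump([dict(row)]))       # keys come out as apple, id, zebra
ry.round_trip_dump([row], sys.stdout)  # keys stay as id, zebra, apple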
You probably want to make a top-level sequence (list) as in the desired output of the linked question, with each element being a mapping (dict).
The following parses the CSV, taking the first line as keys for mappings created for each following line:
import sys
import csv
import ruamel.yaml as ry
import dateutil.parser  # pip install python-dateutil

def process_line(line):
    """convert lines, trying, int, float, date"""
    ret_val = []
    for elem in line:
        try:
            res = int(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = float(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = dateutil.parser.parse(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        ret_val.append(elem.strip())
    return ret_val

csv_file_name = 'xyz.csv'
data = []
header = None
with open(csv_file_name) as inf:
    for line in csv.reader(inf):
        d = process_line(line)
        if header is None:
            header = d
            continue
        data.append(ry.comments.CommentedMap(zip(header, d)))

ry.round_trip_dump(data, sys.stdout, allow_unicode=True)
with input xyz.csv:
id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название
this generates:
- id: 1
  title_english: A Title in English
  title_russian: Название на русском
- id: 2
  title_english: Another Title
  title_russian: Другой Название
The process_line function is just some sugar that tries to convert strings in the CSV file to more useful types and strings without leading spaces (resulting in far fewer quotes in your output YAML file).
I have tested the above on files with 1000 rows, without any problems (I won't post the output though).
The above was done using Python 3 as well as Python 2.7, starting with a UTF-8 encoded file xyz.csv. If you are using Python 2, you can try unicodecsv if you need to handle Unicode input and things don't work out as well as they did for me.

Python 2.7: Returning a value in a csv file from input

I've got a csv with:
T,8,101
T,10,102
T,5,103
and need to search the csv file in the 3rd column for my input, and if found, return the 2nd column value in that same row (searching "102" would return "10"). I then need to save the result to use in another calculation. (I am just trying to print the result for now.) I am new to Python (2 weeks) and wanted to get a grasp on reading/writing csv files. All the search results I found didn't give me the answer I needed. Thanks
Here is my code:
name = input("waiting")
import csv
with open('cards.csv', 'rt') as csvfile:
reader = csv.reader(csvfile, delimiter=',')
for row in reader:
if row[2] == name:
print(row[1])
As stated in my comment above, I would implement a general approach without using the csv module, like this:
import io

s = """T,8,101
T,10,102
T,5,103"""

# use io.StringIO to get a file-like object
f = io.StringIO(s)

lines = [tuple(line.split(',')) for line in f.read().splitlines()]

example_look_up = '101'

def find_element(look_up):
    for t in lines:
        if t[2] == look_up:
            return t[1]

result = find_element(example_look_up)
print(result)
Please keep in mind that this is Python 3 code. You would need to replace print() with print if using Python 2, and maybe change something related to StringIO, which I am using here for demonstration purposes in order to get a file-like object. However, this snippet should give you a basic idea of a possible solution.
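For completeness, a minimal sketch that stays closer to the original csv-based approach on Python 2.7 and returns the match so it can be reused in a later calculation (the helper name lookup_second_column is made up; cards.csv comes from the question):

import csv

def lookup_second_column(filename, key):
    """Return the 2nd-column value of the first row whose 3rd column equals key, or None."""
    with open(filename, 'rb') as csvfile:          # 'rb' for the csv module on Python 2
        for row in csv.reader(csvfile, delimiter=','):
            if len(row) >= 3 and row[2] == key:
                return row[1]
    return None

value = lookup_second_column('cards.csv', '102')   # '10' for the sample data
if value is not None:
    print int(value) * 2                           # e.g. reuse it in a calculation

Note also that on Python 2.7 you would want raw_input() rather than input() to read the search key as a string.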

python insert to postgres over psycopg2 unicode characters

Hi guys, I am having a problem with inserting UTF-8 unicode characters into my database.
The unicode that I get from my form is u'AJDUK MARKO\u010d'. The next step is to encode it to UTF-8 with value.encode('utf-8'), which gives me the string 'AJDUK MARKO\xc4\x8d'.
When I try to update the database (it works the same for insert, btw):
cur.execute( "UPDATE res_partner set %s = '%s' where id = %s;"%(columns, value, remote_partner_id))
The value gets inserted or updated in the database, but the problem is that it stays in exactly the same format, AJDUK MARKO\xc4\x8d, and of course I want AJDUK MARKOČ. The database has UTF-8 encoding, so it is not that.
What am I doing wrong? Surprisingly, I couldn't really find anything useful on the forums.
\xc4\x8d is the UTF-8 encoded representation of č (and \xc4\x8c of Č). It looks like the insert has worked but you're not printing the result correctly, probably by printing the whole row as a list. I.e.
>>> print "Č"
"Č"
>>> print ["Č"] # a list with one string
['\xc4\x8c']
We need to see more code to validate (It's always a good idea to give as much reproducible code as possible).
You could decode the result (result.decode("utf-8")), but you should avoid manually encoding or decoding. Psycopg2 already allows you to send Unicode strings, so you can do the following without encoding first:
cur.execute( u"UPDATE res_partner set %s = '%s' where id = %s;" % (columns, value, remote_partner_id))
- note the leading u
Psycopg2 can return Unicode strings too, by having them automatically decoded:
import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
Edit:
SQL values should be passed as an argument to .execute(). See the big red box at: http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters
Instead, e.g.:
# Replace the columns field first.
# Strictly we should use http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql
sql = u"UPDATE res_partner set {} = %s where id = %s;".format(columns)
cur.execute(sql, (value, remote_partner_id))
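Putting those pieces together, here is a hedged sketch of how the update could look end to end (the connection string, column name, and id are placeholders; psycopg2.sql needs psycopg2 2.7 or later):

import psycopg2
import psycopg2.extensions
from psycopg2 import sql

# Have psycopg2 hand back unicode objects instead of UTF-8 byte strings.
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

conn = psycopg2.connect("dbname=mydb user=me")    # placeholder DSN
cur = conn.cursor()

columns = 'name'                                  # placeholder column name
value = u'AJDUK MARKO\u010d'                      # send the unicode object as-is, no encode()
remote_partner_id = 42                            # placeholder id

# Identifier() safely quotes the column name; the values go in as query parameters.
query = sql.SQL("UPDATE res_partner SET {} = %s WHERE id = %s").format(sql.Identifier(columns))
cur.execute(query, (value, remote_partner_id))
conn.commit()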

Index Out of Range on Tuple Integer Converted To String

I'm trying to print the last digit of an integer that's been pulled from a database and converted to a string. Normally, this code works fine to print the last character of a string, even if it's a number that's been converted to a string:
x = str(225)
print x[-1]
5
However, when I try to do the same thing on a value that's been pulled from a database, the Python interpreter gives me a string index out of range error.
Here is my code:
import MySQLdb

#mysql card_numbercheck
class NumberCheck(object):

    def __init__(self):
        self.conn = MySQLdb.connect(host='localhost', user='root', passwd='', db='mscan')
        self.c = self.conn.cursor()

    def query(self, arg, cardname):
        self.c.execute(arg, cardname)
        return self.c

    def __del__(self):
        self.conn.close()

# Define SQL statement to select all Data from Column Name
sql = "SELECT card_number FROM inventory_2 WHERE card_name = %s"

#Connect To DB and Get Number of Card.
def Get_MTG_Number():
    global card_number
    MtgNumber = NumberCheck()
    for number in MtgNumber.query(sql, 'Anathemancer'):
        card_number = str(number[0])
        print card_number[-1]

Get_MTG_Number()
Logically, I don't really understand why this code wouldn't work. Any help would be highly appreciated.
Kind Regards
Jack
Maybe some of the data fields are blank, and by converting to a string you are just getting an empty string? Trying to index such a string will give you an 'index out of range' error.
To see what is going on, you could simply print card_number upon each iteration of the loop. You could also do with a bit more error handling to sanitize the data you are getting from a database.
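A minimal sketch of that idea, reusing the NumberCheck instance and sql statement from the question (how you handle the empty values is up to you):

for number in MtgNumber.query(sql, 'Anathemancer'):
    card_number = str(number[0]) if number[0] is not None else ''
    print repr(card_number)            # shows '' for blank or NULL fields
    if card_number:                    # only index non-empty strings
        print card_number[-1]
    else:
        print 'skipping empty card_number'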