Write tuple to csv by skipping missing columns - python-2.7

I have a list of ordered tuples, where each tuple contains a column name and value pair to be written to a CSV, for example
lst = [('name','bob'),('age',19),('loc','LA')]
which holds, for bob, the age 19 and the location (loc) LA. I want to be able to write this to a CSV file based on the column names, but sometimes some of these columns are missing, for example in another row:
lst2 = [('name','bob'),('loc','LA')]
Here age is missing. How can I write these rows properly to a CSV in Python?

Those tuples can be used to initialize a dict so csv.DictWriter seems the best choice. In this example I create a dict filled with default values. For each list of tuples, I copy the dict, update with the known values and write it out.
import csv

# sample data
lst = [('name','bob'),('age',19),('loc','LA')]
lst2 = [('name','jane'),('loc','LA')]
lists = [lst, lst2]

# columns need some sort of default... I just guessed
defaults = {'name':'', 'age':-1, 'loc':'N/A'}

with open('output.csv', 'wb') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=sorted(defaults.keys()))
    writer.writeheader()
    for row_tuples in lists:
        # copy defaults then update with known values
        kv = defaults.copy()
        kv.update(row_tuples)
        writer.writerow(kv)

# debug...
print open('output.csv').read()
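For reference, since the fieldnames are sorted alphabetically, the resulting output.csv looks like this (age falls back to the -1 default for jane):
age,loc,name
19,LA,bob
-1,LA,jane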

You should give more examples as to what exactly is required: for instance, if the location is not given in lst2, what do you want written to your CSV? From what I understand, you can make a function with default arguments:
import csv

def write_tuples_to_csv(name="DefaultName", age="DefaultAge", loc="Default location"):
    writer = csv.writer(open("/path/to/csv/file", 'a'))  # appending to a file
    row = (name, age, loc)
    writer.writerow(row)

# write the header row once, before appending the data rows
csv.writer(open("/path/to/csv/file", 'w')).writerow(['name', 'age', 'location'])
Now you can call this function for every item in the list. This should help get you started.
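For instance, one way to drive it from the tuple lists in the question (a sketch: dict(row_tuples) maps the column names to values, and ** unpacks them as keyword arguments so missing columns fall back to the defaults):
# hypothetical usage with lst and lst2 from the question
for row_tuples in (lst, lst2):
    write_tuples_to_csv(**dict(row_tuples))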

Related

Looping through a list (with sublists) and assign the matching IDs to the same key and all of the corresponding values from that sublist?

I'm quite new to Python coding and I can't solve the following problem:
I have a list with tracking points for different animals (ID, date, time, lat, lon), given as strings:
aList = [[id, date, time, lat, lon],
         [id2, date, time, lat, lon],
         [...]]
The txt file is very big and the IDs (one per unique animal) occur multiple times, i.e.:
aList = [['25','20-05-13','15:16:17','34.89932','24.09421'],
         ['24','20-05-13','15:16:18','35.89932','23.09421'],
         ['25','20-05-13','15:18:15','34.89932','24.13421'],
         [...]]
What I'm trying to do is organize the IDs in a dictionary so that each unique ID is a key and all the dates, times, latitudes and longitudes are the values. Then I would like to write each individual ID to a new txt file, so that all the values for a specific ID are in one txt file. The output should look like this:
{'25': [['20-05-13','15:16:17','34.89932','24.09421'],
        ['20-05-13','15:18:15','34.89932','24.13421'],
        [...]],
 '24': [['20-05-13','15:16:18','35.89932','23.09421'],
        [...]]
}
I have tried the following (and a lot of other solutions which didn't work):
items = {}
for line in aList:
    key, value = line[0], line[1:]
    items[key] = value
which results in each key holding only the last value from the list for that particular ID:
{'25': ['20-05-13','15:18:15','34.89932','24.13421'],
 '24': ['20-05-13','15:16:18','35.89932','23.09421']}
How can I loop through my list and assign the same IDs to the same key and all the corresponding values?
Is there any simple solution to this? Other "easier to implement" solutions are welcome!
I hope it makes sense :)
Try appending all the lists that share the same ID to one list of lists:
aList = [['25','20-05-13','15:16:17','34.89932','24.09421'],
         ['24','20-05-13','15:16:18','35.89932','23.09421'],
         ['25','20-05-13','15:18:15','34.89932','24.13421'],
        ]

items = {}
for line in aList:
    key, value = line[0], line[1:]
    if key in items:
        items[key].append(value)
    else:
        items[key] = [value]

print items
OUTPUT:
{'24': [['20-05-13', '15:16:18', '35.89932', '23.09421']], '25': [['20-05-13', '15:16:17', '34.89932', '24.09421'], ['20-05-13', '15:18:15', '34.89932', '24.13421']]}
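The question also asks to write each ID to its own txt file. Here is a minimal sketch building on the items dict above (the per-ID file naming is an assumption, with one comma-joined row per line):
for animal_id, rows in items.items():
    # e.g. '25.txt' for animal 25; the naming scheme is assumed
    with open(animal_id + '.txt', 'w') as out:
        for row in rows:
            out.write(','.join(row) + '\n')
As a side note, a collections.defaultdict(list) would let you drop the if key in items check entirely.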

How to get 3 unique values using random.randint() in python?

I am trying to populate a list in Python 3 with 3 random items read from a file using a regex; however, I keep getting duplicate items in the list.
Here is an example.
import re
import random as rn

data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
    index = inFile.read()  # the with block closes the file automatically

URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)

list_0 = []
for i in range(3):
    list_0.append(URLS[rn.randint(1, 30)])

for i in range(len(list_0)):
    print(list_0[i])
What would be the cleanest way to prevent duplicate items being appended to the list?
(EDIT)
This is the code that I think does the job quite well.
import re

def random_sample(data):
    r_e = ['https://www\.\w{1,10}\.com/view\?i=\w{1,20}', '..']
    with open(data, 'r') as inFile:
        urls = re.findall(r'%s' % r_e[0], inFile.read())
    x = list(set(urls))
    return x

data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
    print(sample[i])
Use a set: an unordered collection with no duplicate entries. Better still, use the builtin random.sample:
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence or set. Used for random sampling without replacement.
Addendum
After seeing your edit, it looks like you've made things much harder than they have to be. I've hard-wired a list of URLS in the following, but the source doesn't matter. Selecting the (guaranteed unique) subset is essentially a one-liner with random.sample:
import random

# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3

# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList)  # produces, e.g., ['url7', 'url3', 'url4']
Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work is hardwired neither to the size of the set nor to the desired subset size.
Addendum 2
If the original list of inputs contains duplicate values, the following slight modification will fix things for you:
URLS = list(set(URLS)) # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
Or even better, because it doesn't need two conversions:
URLS = set(URLS)
urlList = [u for u in random.sample(URLS, SUBSET_SIZE)]
(Note: newer Python versions require random.sample's population to be a sequence, so the list(set(URLS)) form above is the more portable of the two.)
Another approach is to keep a set of the values you have already drawn:
seen = set(list_0)
randValue = URLS[rn.randint(1, 30)]
# [...]
if randValue not in seen:
    seen.add(randValue)
    list_0.append(randValue)
Now you just need to check that list_0's size is equal to 3 to stop the loop.
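Put together, a minimal sketch of that loop (reusing URLS and rn from the question, keeping the original index range, and assuming the file yields at least three distinct URLs):
list_0 = []
seen = set()
while len(list_0) < 3:
    randValue = URLS[rn.randint(1, 30)]
    if randValue not in seen:  # only keep values we haven't drawn yet
        seen.add(randValue)
        list_0.append(randValue)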

Python 2.7 - How to call individual columns from transposed csv file

I understand that the csv module exists; however, for my current project we are not allowed to use the module to read CSV files.
My code is as follows:
table = []
for line in open("data.csv"):
    data = line.split(",")
    table.append(data)

transposed = [[table[j][i] for j in range(len(table))] for i in range(len(table[0]))]
rows = transposed[1][1:]
rows = [float(i) for i in rows]
I'm really new to Python, so this is probably a massively basic question; I've been scouring the internet all day and have struggled to find a solution. All I need is to be able to call the data from any individual column so I can analyse it. Thanks
Your data is organized as a list of lists, where each sublist represents a row. To better illustrate this I would avoid list comprehensions, because they are more difficult to read. I would also avoid variable names like i and j and instead use more descriptive names like row or column. Here is a simple example of how I would accomplish this:
def read_csv():
    table = []
    with open("data.csv") as fileobj:
        for line in fileobj.readlines():
            data = line.strip().split(',')
            table.append(data)
    return table

def get_column_data(data, column_index):
    column_data = []
    for row in data:
        cell_data = row[column_index]
        column_data.append(cell_data)
    return column_data

data = read_csv()
get_column_data(data, column_index=2)  # example usage
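If you do want the full transpose as in your original code, zip is the usual idiom; a short sketch (the column index and float conversion mirror the question's rows = transposed[1][1:]):
table = read_csv()
columns = [list(col) for col in zip(*table)]  # rows become columns
values = [float(v) for v in columns[1][1:]]   # second column, skipping the header cell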

csv file to list of tuples - excluding certain columns

I need to create a list of tuples from a .csv file. On another post a member suggested using this code:
import csv

with open('movieCatalogue.csv') as f:
    data = [tuple(line) for line in csv.reader(f)]
data.pop(0)
print(data)
This is almost perfect, except that the first column in the .csv file contains the product id, which I do not want in the tuples. Is there a way to prevent certain columns in each line from being copied?
First, I suppose you're dropping the title line with data.pop(0). You can save a list dealloc/move by skipping it while reading instead.
Then, when you compose your tuple, just drop the first element using slice syntax (line[start:stop:step]): line[1:] keeps everything from index 1 onward.
import csv

with open('movieCatalogue.csv') as f:
    cr = csv.reader(f)
    # drop the title line here; going through the reader (rather than next(f))
    # works even if the title line is multi-line!
    next(cr)
    data = [tuple(line[1:]) for line in cr]  # drop the first column of each line
print(data)
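And if you later need to exclude an arbitrary set of columns rather than just the first, a small generalization (the SKIP set and its contents are hypothetical):
import csv

SKIP = {0}  # hypothetical set of column positions to exclude
with open('movieCatalogue.csv') as f:
    cr = csv.reader(f)
    next(cr)  # skip the title line
    data = [tuple(v for i, v in enumerate(line) if i not in SKIP) for line in cr]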

Does csv.DictReader store file in memory?

I have to read a large CSV file of almost 100K rows, and it would be much easier to process that file if I could read each row in dictionary format.
After a little research I found Python's built-in csv.DictReader from the csv module.
But the documentation does not clearly mention whether it stores the whole file in memory.
But it has mentioned that:
The fieldnames parameter is a sequence whose elements are associated with the fields of the input data in order.
But I'm not sure whether that sequence is stored in memory or not.
So the question is: does it store the whole file in memory?
If so, is there any other option to read a single row at a time from the file, generator-style, and get each row as a dict?
Here is my code:
def file_to_dictionary(self, file_path):
    """Read CSV rows as a dictionary."""
    file_data_obj = {}
    try:
        self.log("Reading file: [{}]".format(file_path))
        if os.path.exists(file_path):
            file_data_obj = csv.DictReader(open(file_path, 'rU'))
        else:
            self.log("File does not exist: {}".format(file_path))
    except Exception as e:
        self.log("Failed to read file.", e, True)
    return file_data_obj
As far as I'm aware, the DictReader object you create (in your case file_data_obj) is an iterator, much like a generator: it does not hold the whole file in memory, it reads rows lazily, but it can only be iterated over once!
To print the fieldnames of your data as a list you can simply use: print file_data_obj.fieldnames
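If you want to make the row-at-a-time behaviour explicit and keep the file handle managed, a minimal streaming sketch could look like this (iter_csv_rows, process and the file name are hypothetical):
import csv

def iter_csv_rows(file_path):
    with open(file_path, 'rU') as f:
        for row in csv.DictReader(f):
            yield row  # one dict per row; only one row in memory at a time

for row in iter_csv_rows('data.csv'):  # hypothetical file name
    process(row)  # placeholder for your per-row logic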
Secondly, in my experience I find it much easier to use a list of dictionaries when reading data from csv files, where each dictionary represents a row in your file. Consider the following:
import csv

def csv_to_dict_list(path):
    csv_in = open(path, 'rb')
    reader = csv.DictReader(csv_in, restkey=None, restval=None, dialect='excel')
    fields = reader.fieldnames
    list_out = [row for row in reader]
    return list_out, fields
Using the function above (or something similar), you can achieve your goal in a couple of lines. E.g.:
data, data_fields = csv_to_dict_list(path)
print data_fields  # prints fieldnames
print data[0]      # prints first row of data from file
Hope this helps!
Luke