Save multiple lists into one CSV by column?

How can I save multiple lists into a .csv file with numpy, where each list represents a column?
For example:
A=['A','B','C']
B=['1','2','3']
The csv output file should look like this
A,1
B,2
C,3
I tried this, but savetxt only accepts a single array argument. How do I do it?
numpy.savetxt('output.csv', A, B, delimiter=',', fmt='%s')

You don't really need numpy for this. It is easy enough to do with the csv Python module.
For example,
In [13]: A
Out[13]: ['A', 'B', 'C']
In [14]: B
Out[14]: ['1', '2', '3']
In [15]: import csv
In [16]: with open('AB.csv', 'w') as f:
...: writer = csv.writer(f)
...: writer.writerows(zip(A, B))
...:
In [17]: !cat AB.csv
A,1
B,2
C,3
Or just plain Python, without the csv module:
In [26]: with open('AB.csv', 'w') as f:
...: f.write(''.join('{},{}\n'.format(a, b) for a, b in zip(A, B)))
...:
In [27]: !cat AB.csv
A,1
B,2
C,3
But if you really want to use numpy.savetxt:
In [28]: import numpy as np
In [29]: np.savetxt('AB.csv', list(zip(A, B)), fmt='%s', delimiter=',')
In [30]: !cat AB.csv
A,1
B,2
C,3
All of those suggestions use zip(A, B) to create a sequence of paired tuples from A and B:
In [34]: list(zip(A, B))
Out[34]: [('A', '1'), ('B', '2'), ('C', '3')]
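One caveat: zip truncates at the shortest input, so uneven columns lose rows silently. If your columns can have different lengths, itertools.zip_longest pads the shorter ones instead (a small sketch with made-up uneven columns):

```python
import csv
from itertools import zip_longest

# Hypothetical example: columns of different lengths.
A = ['A', 'B', 'C']
B = ['1', '2']

# zip() would silently drop the trailing 'C'; zip_longest pads instead.
with open('AB.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(zip_longest(A, B, fillvalue=''))

with open('AB.csv') as f:
    print(f.read())
```

The fillvalue argument controls what goes in the empty cells; an empty string keeps the CSV shape intact.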


Replacing strings using regex in Pandas

In Pandas, why does the following not replace any strings containing an exclamation mark with whatever follows it?
In [1]: import pandas as pd
In [2]: ser = pd.Series(['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe'])
In [3]: ser
Out[3]:
0 Aland Islands !Åland Islands
1 Reunion !Réunion
2 Zimbabwe
dtype: object
In [4]: patt = r'.*!(.*)'
In [5]: repl = lambda m: m.group(1)
In [6]: ser.replace(patt, repl)
Out[6]:
0 Aland Islands !Åland Islands
1 Reunion !Réunion
2 Zimbabwe
dtype: object
Whereas the direct reference to the matched substring does work:
In [7]: ser.replace({patt: r'\1'}, regex=True)
Out[7]:
0 Åland Islands
1 Réunion
2 Zimbabwe
dtype: object
What am I doing wrong in the first case?
It appears that replace does not support a callable as the replacement argument. Thus, all you can do is import the re library explicitly and use apply:
>>> import re
>>> #... your code ...
>>> ser.apply(lambda row: re.sub(patt, repl, row))
0 Åland Islands
1 Réunion
2 Zimbabwe
dtype: object
There are two replace methods in Pandas.
The one that acts directly on a Series can take a regex pattern string or a compiled regex and can act in-place, but doesn't allow the replacement argument to be a callable. You must set regex=True and use raw strings.
With:
import re
import pandas as pd
ser = pd.Series(['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe'])
Yes:
ser.replace(r'.*!(.*)', r'\1', regex=True, inplace=True)
ser.replace(r'.*!', '', regex=True, inplace=True)
regex = re.compile(r'.*!(.*)')
ser.replace(regex, r'\1', regex=True, inplace=True)
No:
repl = lambda m: m.group(1)
ser.replace(regex, repl, regex=True, inplace=True)
There's another, used as Series.str.replace. This one accepts a callable replacement but won't substitute in-place and doesn't take a regex argument (though regular expression pattern strings can be used):
Yes:
ser.str.replace(r'.*!', '')
ser.str.replace(r'.*!(.*)', r'\1')
ser.str.replace(regex, repl)
No:
ser.str.replace(regex, r'\1')
ser.str.replace(r'.*!', '', inplace=True)
I hope this is helpful to someone out there.
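To make the callable route concrete, here is a runnable sketch via Series.str.replace. Note that newer pandas versions have since added a regex flag to str.replace, and a callable replacement there requires regex=True:

```python
import pandas as pd

ser = pd.Series(['Aland Islands !Åland Islands', 'Reunion !Réunion', 'Zimbabwe'])

# str.replace passes each regex match object to the callable,
# whose return value becomes the replacement text.
result = ser.str.replace(r'.*!(.*)', lambda m: m.group(1), regex=True)
print(result.tolist())
```

Entries without a '!' don't match the pattern and pass through unchanged.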
Try this snippet:
pattern = r'(.*)!'
ser.replace(pattern, '', regex=True)
In your case, the problem is that you didn't set regex=True; it is False by default.

How to append a row from a CSV file to a list?

I have a CSV file that contains review data, and I want to append rows of it to a list.
Here is a sample in my file.csv:
I love eating them and they are good for watching TV and looking at movies
This taffy is so good. It is very soft and chewy
I want to save all the words of the second line in a list and print them:
['This', 'taffy', 'is', 'so', 'good.', 'It', 'is', 'very', 'soft', 'and', 'chewy']
I tried this:
import csv
with open('file.csv', 'r') as csvfile:
    data = csv.reader(csvfile, delimiter=',')
    texts = []
    next(data)
    for row in data:
        texts.append(row[2])
print(texts)
My problem is that it doesn't print anything. Can anyone help here? Thanks in advance.
If you want to save all the words of the second line, enumerate the lines and take the one you want; then split it and save the words in a list, like this:
texts = []
with open('csvfile.csv', 'r') as csvfile:
    for i, line in enumerate(csvfile):
        if i == 1:
            for word in line.split():
                texts.append(word)
print(texts)
['This', 'taffy', 'is', 'so', 'good.', 'It', 'is', 'very', 'soft', 'and', 'chewy']
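For completeness, the csv module also works here if you remember that a line with no commas comes back as a one-element list, so the field you want is row[0], not row[2]. A sketch that first recreates the two-line sample file from the question:

```python
import csv

# Recreate the two-line sample file from the question.
with open('file.csv', 'w') as f:
    f.write('I love eating them and they are good for watching TV and looking at movies\n')
    f.write('This taffy is so good. It is very soft and chewy\n')

with open('file.csv', 'r') as csvfile:
    data = csv.reader(csvfile)
    next(data)              # skip the first review
    row = next(data)        # the second line: a one-element list (no commas)
    texts = row[0].split()  # split the single field into words

print(texts)
```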

Python: How to print specific columns, trimming some strings in one of the columns, when reading a CSV

I'm new to Python, so apologies for the basic question.
I've a csv file in the below mentioned format.
##cat temp.csv
Id,Info,TimeStamp,Version,Numitems,speed,Path
18699504331,NA/NA/NA,2017:01:01:13:40:31,3.16,6,781.2kHz,/home/user1
31287345804,NA/NA/NA,2017:01:03:14:35:04,3.16,2,111.5MHz,/home/user2
16360534162,NA/NA/NA,2017:01:02:21:39:51,3.16,3,230MHz,/home/user3
I want to read the CSV and print only specific columns of interest, trimming some strings in one of the columns into a readable form so I can use it.
Here is the python code:
cat temp.py
import csv
with open('temp.csv') as csvfile:
    readcsv = csv.reader(csvfile, delimiter=',')
    Id = []
    Info = []
    Timestamp = []
    Version = []
    Numitems = []
    Speed = []
    Path = []
    for row in readcsv:
        lsfid = row[0]
        modelinfo = row[1]
        timestamp = row[2]
        compilever = row[3]
        numofavb = row[4]
        frequency = row[5]
        designpath = row[6]
        Id.append(lsfid)
        Info.append(modelinfo)
        Timestamp.append(timestamp)
        Version.append(compilever)
        Numitems.append(numofavb)
        Speed.append(frequency)
        Path.append(designpath)
print(Id)
print(Info)
print(Timestamp)
print(Version)
print(Numitems)
print(Speed)
print(Path)
Output:
python temp.py
['Id', '18699504331', '31287345804', '16360534162', '18772620814', '18699504331', '31287345804', '16360534162']
['Info', 'NA/NA/NA', 'NA/NA/NA', 'NA/NA/NA', 'NA/NA/NA', 'NA/NA/NA', 'NA/NA/NA', 'NA/NA/NA']
['TimeStamp', '2017:01:01:13:40:31', '2017:01:03:14:35:04', '2017:01:02:21:39:51', '2017:01:03:14:40:47', '2017:01:01:13:40:31', '2017:01:03:14:35:04', '2017:01:02:21:39:51']
['Version', '3.16', '3.16', '3.16', '3.16', '3.16', '3.16', '3.16']
['Numitems', '6', '2', '3', '2', '6', '2', '3']
['speed', '781.2kHz', '111.5MHz', '230MHz', '100MHz', '781.2kHz', '111.5MHz', '230MHz']
['Path', '/home/user1', '/home/user2', '/home/user3', '/home/user4', '/home/user5', '/home/user6', '/home/user7']
But what I want is a well-organized layout with only my chosen columns printed, something like below:
Id Info TimeStamp Version Numitems speed Path
18699504331 NA/NA/NA 2017:01:01:13:40:31 3.16 6 781.2kHz user1
31287345804 NA/NA/NA 2017:01:02:21:39:51 3.16 2 111.5MHz user2
31287345804 NA/NA/NA 2017:01:02:21:39:51 3.16 2 111.5MHz user3
Any help would be greatly appreciated!
Thanks in advance,
Velu.V
Check out numpy's genfromtxt function. You can use the usecols keyword argument to specify that you only want to read certain columns; see also https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html. For example, let's say we have the following csv sheet:
col1 , col2 , col3
0.5, test, 0.3
0.7, test2, 0.1
Then,
import numpy as np
table = np.genfromtxt('sheet.csv', delimiter=',', skip_header=0, dtype=str, usecols=[0, 1])
will load the first two columns. You can then use the tabulate package ( https://pypi.python.org/pypi/tabulate) to nicely print out your table.
from tabulate import tabulate
print(tabulate(table, headers='firstrow'))
Will look like:
col1 col2
------- --------
0.5 test
0.7 test2
Hope that answers your question
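If pulling in numpy and tabulate feels unnecessary, a plain-stdlib sketch with csv.DictReader and format strings gets a similar aligned, column-selected printout (the column names and field widths below are assumptions based on the sample file in the question):

```python
import csv

# Recreate a small version of the sample file from the question.
with open('temp.csv', 'w') as f:
    f.write('Id,Info,TimeStamp,Version,Numitems,speed,Path\n')
    f.write('18699504331,NA/NA/NA,2017:01:01:13:40:31,3.16,6,781.2kHz,/home/user1\n')
    f.write('31287345804,NA/NA/NA,2017:01:03:14:35:04,3.16,2,111.5MHz,/home/user2\n')

wanted = ['Id', 'TimeStamp', 'speed', 'Path']  # columns of interest
lines = []
with open('temp.csv') as f:
    for row in csv.DictReader(f):
        # Trim '/home/' from the path, as in the desired output.
        row['Path'] = row['Path'].rsplit('/', 1)[-1]
        lines.append('  '.join('{:<20}'.format(row[col]) for col in wanted))
print('\n'.join(lines))
```

DictReader uses the header row for keys, so selecting columns is just a matter of listing the names you want.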

Using python groupby or defaultdict effectively?

I have a csv with name, role, and years of experience. I want to create a list of tuples that aggregates (name, role1, total_exp_inthisRole) for all the employees.
So far I am able to use defaultdict to do the below:
import csv, urllib2
from collections import defaultdict

response = urllib2.urlopen(url)
cr = csv.reader(response)
parsed = ((row[0], row[1], int(row[2])) for row in cr)
employees = []
for item in parsed:
    employees.append(tuple(item))

employeeExp = defaultdict(int)
for x, y, z in employees:  # tuple unpacking
    employeeExp[x] += z

employeeExp.items()
output: [('Ken', 15), ('Buckky', 5), ('Tina', 10)]
But how do I also use the second column to achieve the result I want? Should I try to solve this with groupby on multiple keys, or is a simpler way possible? Thanks all in advance.
You can simply pass a tuple of name and role to your defaultdict, instead of only one item:
for x, y, z in employees:
    employeeExp[(x, y)] += z
For your second expected output ([('Ken', ('engineer', 5), ('sr. engineer', 6)), ...]) you need to aggregate the result of the snippet above one more time, but this time with a defaultdict of lists:
d = defaultdict(list)
for (name, rol), total_exp_inthisRole in employeeExp.items():
    d[name].append((rol, total_exp_inthisRole))
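Putting both steps together on some made-up rows (the urllib2/CSV fetch is replaced by an inline list for illustration):

```python
from collections import defaultdict

# Hypothetical rows standing in for the parsed CSV.
employees = [('Ken', 'engineer', 5), ('Ken', 'sr. engineer', 6),
             ('Buckky', 'engineer', 5), ('Tina', 'manager', 10)]

# First pass: total experience per (name, role) pair.
employeeExp = defaultdict(int)
for name, role, exp in employees:
    employeeExp[(name, role)] += exp

# Second pass: group the (role, total) pairs under each name.
d = defaultdict(list)
for (name, role), total in employeeExp.items():
    d[name].append((role, total))

print(dict(d))
```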

Sort a list of strings with a name and number in them

I am trying to sort a list of 100 filenames so they will be used in the right order in later calculations. All the filenames have 'name_1' at the beginning and '_out.txt' at the end. The difference is a number in between, ranging from 1 to 100.
The list looks a bit like this:
['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
For this actual example I want:
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
Now I have tried both list.sort and sorted(list), but with no luck. I have also tried key=int and key=str, but neither helps, since it seems they cannot convert only part of the string to an int.
Can anyone help me with advice?
You need leading zeros to sort the way you want.
#!/usr/bin/python
# -*- coding: utf-8 -*-
L = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
OUT = []
n = '100'  # max number, decides the padding width
for item in L:
    old = item[7:-8]  # the number, sliced out by position
    if len(old) < len(n):
        new = '0' * (len(n) - len(old)) + old  # zero-padded number
        item = item.replace(old, new)
    OUT.append(item)
OUT.sort()
print(OUT)
Result
['name_1_002_out.txt', 'name_1_005_out.txt', 'name_1_006_out.txt', 'name_1_010_out.txt', 'name_1_100_out.txt']
I would suggest renaming the files to make life easier later on, since not all file managers display such filenames in numeric order.
You can use the key function for this task:
>>> l = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt', 'name_1_5_out.txt', 'name_1_2_out.txt']
>>> sorted(l,key=lambda s: int(s.split('_')[2]))
['name_1_2_out.txt', 'name_1_5_out.txt', 'name_1_6_out.txt', 'name_1_10_out.txt', 'name_1_100_out.txt']
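If the filenames are less regular (different prefixes, or several numbers per name), a generic "natural sort" key built with re avoids hard-coding the field position; a sketch:

```python
import re

def natural_key(s):
    # Split into digit and non-digit runs, converting digit runs to int
    # so that '10' sorts after '2' instead of before it.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', s)]

files = ['name_1_100_out.txt', 'name_1_10_out.txt', 'name_1_6_out.txt',
         'name_1_5_out.txt', 'name_1_2_out.txt']
print(sorted(files, key=natural_key))
```

The keys compare element by element, so this works as long as the names share the same digit/text structure.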
lista = ['2','3','5','8','4','6','1']
listb = [('2','3'),('5','8'),('4','6'),('1','9')]
listc = {'a':'3','b':'5','c':'9','d':'4','e':'2','f':'0'}
d = sorted(lista, key=lambda item:int(item), reverse=True)
e = sorted(listb, key=lambda item:int(item[0]) + int(item[1]), reverse=True)
f = sorted(listc.items(), key=lambda item:int(item[1]), reverse=True)
print(d)
print(e)
print(f)
output:
['8', '6', '5', '4', '3', '2', '1']
[('5', '8'), ('4', '6'), ('1', '9'), ('2', '3')]
[('c', '9'), ('b', '5'), ('d', '4'), ('a', '3'), ('e', '2'), ('f', '0')]