Convert a text file to a NumPy array - python-2.7

I am new to Python. I have a .txt file like this:
SET = 20,21,23,21,23
45,46,42,23,55
with many rows. How would I convert this .txt file into an array, ignoring spaces and commas? Any help would be really appreciated.

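l1 = []
with open('list-num') as f:
    for line in f:
        if '=' in line:                  # drop a label such as "SET ="
            line = line.split('=', 1)[1]
        l2 = [int(v) for v in line.split(',') if v.strip()]
        l1 = l1 + l2
print l1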
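A minimal sketch that goes one step further and returns a NumPy array instead of a list. It assumes that a label such as `SET =`, if present, ends at the first `=` on a line; the function name `read_numbers` is illustrative, not from the original post.

```python
import numpy as np


def read_numbers(lines):
    """Collect every comma-separated integer from an iterable of text lines.

    Assumes an optional label such as "SET =" ends at the first '=' on a line.
    Empty fields (e.g. from a trailing comma) and surrounding spaces are ignored.
    """
    values = []
    for line in lines:
        if '=' in line:
            line = line.split('=', 1)[1]   # keep only the part after the label
        for field in line.split(','):
            field = field.strip()
            if field:                      # skip empty fields
                values.append(int(field))
    return np.array(values)


sample = ["SET = 20,21,23,21,23\n", "45,46,42,23,55\n"]
arr = read_numbers(sample)
print(arr)
```

With a real file, pass the open file object instead of the sample list, e.g. `with open('list-num') as f: arr = read_numbers(f)`.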

Your data looks like this:
SET 1 = 13900100,13900141,13900306,13900442,13900453,13900461, 13900524,13900537,13900619,13900632,13900638,13900661, 13900665,13900758,13900766,13900825,13900964,13901123, 13901131,13901136,13901141,13901143,13901195,13901218,
You can use the NumPy function np.genfromtxt():
import numpy as np
data = np.genfromtxt("text.txt", delimiter=",")
data = data[~np.isnan(data)]  # Remove nan values (fields that could not be parsed)
print data
I get:
[ 13900141. 13900306. 13900442. 13900453. 13900461. 13900524.
13900537. 13900619. 13900632. 13900638. 13900661. 13900665.
13900758. 13900766. 13900825. 13900964. 13901123. 13901131.
13901136. 13901141. 13901143. 13901195. 13901218.]
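This works, except that the first value (13900100) is missing from the output: the field "SET 1 = 13900100" is not a number, so genfromtxt turns it into nan and the filter removes it.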
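To keep the value that shares a field with the `SET 1 =` label, one option is to strip everything up to the first `=` before handing the lines to `np.genfromtxt`, which also accepts a list of strings. A sketch, assuming every line has the same number of fields (genfromtxt rejects ragged rows); `load_without_label` is a hypothetical helper name.

```python
import numpy as np


def load_without_label(lines):
    """Parse comma-separated numbers with np.genfromtxt after removing any 'LABEL =' prefix."""
    cleaned = [line.split('=', 1)[-1] for line in lines]   # keep only the text after '='
    data = np.genfromtxt(cleaned, delimiter=',')
    return data[~np.isnan(data)]                            # drop nan from empty fields; result is 1-D


sample = ["SET 1 = 13900100,13900141,13900306,\n"]
values = load_without_label(sample)
print(values)
```

For a file on disk, read its lines first, e.g. `with open("text.txt") as f: values = load_without_label(f.readlines())`.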
------------------------------------
Another way:
import numpy as np
f = open("text.txt", "r")  # Open the data file
data = f.read()            # Read the whole file into one string
f.close()
cut = data.split()         # Split on whitespace
value = cut[2]             # Pick the value part (assumes it is the third whitespace-separated token)
array = np.array(value)    # Wrap the value in an array
print array
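I get:
13900100,13900141,13900306,13900442,13900453,13900461,13900524,13900537,13900619,13900632,13900638,13900661,13900665,13900758,13900766,13900825,13900964,13901123,13901131,13901136,13901141,13901143,13901195,13901218
Note that this is a single string, not a list of numbers: np.array(value) wraps the whole comma-separated string in a 0-d array. To get numbers, split on the commas first, e.g. np.array([int(v) for v in value.split(',') if v]).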

Related

Use Python to calculate data in a CSV file

As in the sample, I want to first read the CSV file, sum each row, and store the result in a new column (which needs to be created).
sample:
import csv
new_rows = []
with open('file.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        row = [int(val) for val in row]
        row.append(sum(row))
        new_rows.append(row)
with open('file.csv', 'w') as csvfile:
    csv.writer(csvfile).writerows(new_rows)
This turns file.csv from
1,2
3,4
into
1,2,3
3,4,7
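The same result can be sketched with NumPy, assuming the CSV holds only integers and every row has the same number of columns. `append_row_sums` is an illustrative name; it accepts a file name or any file-like object for both input and output.

```python
import io

import numpy as np


def append_row_sums(src, dst):
    """Read an integer CSV, append each row's sum as a new last column, and write it to dst."""
    data = np.loadtxt(src, delimiter=',', dtype=int, ndmin=2)  # ndmin=2 keeps a one-row file 2-D
    result = np.column_stack((data, data.sum(axis=1)))
    np.savetxt(dst, result, delimiter=',', fmt='%d')           # '%d' writes plain integers
    return result


# In-memory demo; with real files, e.g. append_row_sums('file.csv', 'file_with_sums.csv')
out = io.StringIO()
result = append_row_sums(io.StringIO(u"1,2\n3,4\n"), out)
print(out.getvalue())
```

Writing to a separate output file (the file names above are placeholders) avoids overwriting the input if something goes wrong halfway.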

sys.argv: index out of range

I am trying to run this code:
import sys
import numpy as np
filename = sys.argv[1]
X = []
y = []
with open(filename, 'r') as f:
    for line in f.readlines():
        xt, yt = [float(i) for i in line.split(',')]
        X.append(xt)
        y.append(yt)
and I get this error:
4 filename = sys.argv[1]
5 X = []
6 y = []
IndexError: list index out of range
How can I fix it?
I have a .txt file that I want to read my data from:
4.94,4.37
-1.58,1.7
-4.45,1.88
-6.06,0.56
-1.22,2.23
-3.55,1.53
0.36,2.99
-3.24,0.48
1.31,2.76
2.17,3.99
2.94,3.25
-0.92,2.27
-0.91,2.0
1.24,4.75
1.56,3.52
-4.14,1.39
3.75,4.9
4.15,4.44
0.33,2.72
3.41,4.59
2.27,5.3
2.6,3.43
1.06,2.53
1.04,3.69
2.74,3.1
-0.71,2.72
-2.75,2.82
0.55,3.53
-3.45,1.77
1.09,4.61
2.47,4.24
-6.35,1.0
1.83,3.84
-0.68,2.42
-3.83,0.67
-2.03,1.07
3.13,3.19
0.92,4.21
4.02,5.24
3.89,3.94
-1.81,2.85
3.94,4.86
-2.0,1.31
0.54,3.99
0.78,2.92
2.15,4.72
2.55,3.83
-0.63,2.58
1.06,2.89
-0.36,1.99
Make sure you pass the file name on the command line, as @hpaulj suggested (sys.argv[1] is the first argument after the script name). You could also check the length of sys.argv:
import sys
print(sys.argv, len(sys.argv))
if len(sys.argv) < 2:
    sys.exit('Usage: %s input_file' % sys.argv[0])
You may also want to look at the fileinput module:
https://docs.python.org/2/library/fileinput.html#module-fileinput
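A sketch that combines the argument check with a shorter way to read the two columns, using `np.loadtxt(..., unpack=True)` in place of the manual loop. The names `load_xy` and `main` are illustrative; the script is expected to be run as `python script.py data.txt` (both names are placeholders).

```python
import sys

import numpy as np


def load_xy(path):
    """Load a two-column, comma-separated file into separate x and y arrays."""
    # unpack=True returns the columns separately; ndmin=2 keeps this working for a one-line file
    x, y = np.loadtxt(path, delimiter=',', ndmin=2, unpack=True)
    return x, y


def main(argv):
    """Expect the data file name as the first command-line argument."""
    if len(argv) < 2:
        sys.exit('Usage: %s input_file' % argv[0])
    x, y = load_xy(argv[1])
    print('read %d points' % len(x))
    return x, y
```

Call `main(sys.argv)` at the bottom of the script. When running inside an IDE or notebook, where no command-line argument is passed, calling `load_xy('data.txt')` directly sidesteps `sys.argv` altogether.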

Make a comma-separated list of coordinates from a CSV file

I have x and y values in a CSV file, and I am reading those values and converting them into a NumPy array using the code below:
import numpy as np
import csv
data = np.loadtxt('datapoints.csv', delimiter=',')
# Putting data from csv file to variable
x = data[:, 0]
y = data[:, 1]
# Trying to convert the arrays to plain lists (np.asarray actually returns ndarrays, and these results are discarded)
np.asarray(x)
np.asarray(y)
So now I have the values of x and y.
But I want them to be in this format:
[[x1,y1],[x2,y2], [x3,y3], ...... [xn,yn]]
How do I do that?
Use zip:
result = [list(a) for a in zip(x, y)]
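Because `data` is already a 2-D array with x in column 0 and y in column 1, a shorter route is to turn the columns into nested Python lists with `tolist()`, which also converts NumPy floats to plain Python floats. The sample values below are made up for illustration and stand in for the loaded CSV.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt('datapoints.csv', delimiter=',')
data = np.array([[1.0, 2.0], [3.5, 4.5], [5.0, 6.25]])

x = data[:, 0]
y = data[:, 1]

# Either slice the first two columns directly...
pairs = data[:, :2].tolist()
# ...or, starting from separate x and y arrays, stack them side by side first
pairs_from_xy = np.column_stack((x, y)).tolist()

print(pairs)
```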

How to convert recurrent vertical columns into rows, then stack them together in Python/Pandas?

I am generating some data vertically at first, but would like to transpose them into row data, then stack them into an array like a Pandas data frame. How do I get a final product of a pandas data frame with 4 columns ('fr', 'en', 'ir', 'ab') and three rows?
# coding=utf-8
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import nltk
import re
import random
from random import randint
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
# Get csv file into data frame
data = pd.read_csv("FamilySearchData_All_OCT2015_newEthnicity_filledEthnicity_processedName_trimmedCol.csv", header=0, encoding="utf-8")
df = DataFrame(data)
columns = ['fr', 'en', 'ir', 'ab']
classes = ['ethnicity2', 'Ab_group', 'Ab_tribe']
df_count = DataFrame(columns=columns)
for j in classes:
    for i in columns:
        ethnicity_tar = str(i)
        count = 0
        try:
            count = df[str(j)].value_counts()[ethnicity_tar]
        except Exception as e:
            count = ''
        print ethnicity_tar, count
Output:
fr 1554455
en 1196932
ir 941852
ab 95131
fr 1554444
en 16000
ir 940850
ab 9371
fr 1554600
en 2196931
ir 940957
ab 9399
What I would like at the end:
fr en ir ab
1554455 1196932 941852 95131
1554444 16000 940850 9371
1554600 2196931 940957 9399
To implement this, I would create a dictionary (hash) keyed by column name, with each value holding a list. Then, as you loop through the rows of your output, use the first value (the column name) as the key to get the list, and append the count to it.
Once this interim data structure is built, you can loop over the lists, pulling the same index from each one to print a row:
# counts: the dictionary of lists described above
n = len(counts['fr'])  # number of rows collected for each column
for i in range(n):
    print " ".join(str(counts[col][i]) for col in ['fr', 'en', 'ir', 'ab'])
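The dictionary-of-lists idea maps directly onto a pandas DataFrame, since `pd.DataFrame` builds one column per dictionary key. A sketch, assuming the loop in the question is changed to collect `(ethnicity_tar, count)` pairs in a list instead of printing them; the pairs below are the values from the question's output, and `pairs_to_frame` is a hypothetical helper.

```python
import pandas as pd


def pairs_to_frame(pairs, columns):
    """Group (column_name, value) pairs into one list per column, then build a DataFrame."""
    table = {col: [] for col in columns}   # the dictionary ("hash") of lists
    for name, value in pairs:
        table[name].append(value)
    # Every list must end up the same length, otherwise pandas raises ValueError
    return pd.DataFrame(table, columns=columns)


columns = ['fr', 'en', 'ir', 'ab']
pairs = [('fr', 1554455), ('en', 1196932), ('ir', 941852), ('ab', 95131),
         ('fr', 1554444), ('en', 16000), ('ir', 940850), ('ab', 9371),
         ('fr', 1554600), ('en', 2196931), ('ir', 940957), ('ab', 9399)]
df_count = pairs_to_frame(pairs, columns)
print(df_count)
```

If a count is missing, the original loop stores `''`, which would turn that column into strings; storing `0` instead keeps the columns numeric.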

How do I separate out unique rows in a list that has both a datetime and float column?

I'm relatively new to Python, and I am having trouble separating out unique rows from a data set that I recently converted into lists. I separated out the data's Unix-time recordings and converted them into datetime objects. Then, when I recombined the data into a list, I tried to separate out the unique rows of data, but instead I get this error:
[[[datetime.datetime(2014, 6, 20, 0, 0) -16.0]
[datetime.datetime(2014, 6, 20, 0, 0) -16.0]........
Traceback (most recent call last):
File "C:\Users\lenovo\Favorites\Microsoft 网站\Downloads\OTdataparser.py", line 33, in <module>
indicies = np.unique(okdat, return_index = True) #<-- NOT WORKING
File "C:\Python27\lib\site-packages\numpy\lib\arraysetops.py", line 180, in unique
perm = ar.argsort(kind='mergesort')
TypeError: can't compare datetime.datetime to float
My script is below.
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import math
ds5 = np.genfromtxt("gpsdata.dat.140620", delimiter = '',
usecols = (2,4,5), dtype = object)
print ds5
ds = np.array([x for x in ds5 if x[0] == "06/20/2014"])
dot = ds[:,2].astype(float)
print ds
rndsht = np.genfromtxt(ds[:,1], delimiter = ".", dtype = float) #Rm decimal
print rndsht
dutc = np.array([datetime.utcfromtimestamp(x) for x in rndsht[:,0]])
print dutc
#dutc = np.array([datetime.utcfromtimestamp(x) for x in ds[:,1].astype(float)])
okdat = np.dstack((dutc,dot))
#okdat.astype(object)
print okdat
#indicies = np.unique(dutc, return_index=True) #<-- WORKS! BUT okdat??
#print indicies
indicies = np.unique(okdat, return_index = True) #<-- NOT WORKING
print indicies
#Can't figure out how to use indicies to limit dot
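You could write your own unique function.
Here is a quick example (you can probably do better). Note that it doesn't preserve order, though you could use insert to do that. It works on a Python list, not a NumPy array (arrays have no remove method), so convert first, e.g. with okdat[0].tolist(), since np.dstack returns a 3-D array.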
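def unique(data):
    x = 0
    while x < len(data):
        i = data[x]
        c = 0
        # Remove every copy of i, then put a single copy back at the end
        while i in data:
            c += 1
            data.remove(i)
        data.append(i)
        # Only advance when i occurred once; otherwise re-check position x
        if c <= 1:
            x += 1
    return data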
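A sketch of an order-preserving alternative that also returns the index of each row's first occurrence, similar in spirit to `return_index=True`. It assumes each row can be turned into a hashable tuple (true for `datetime` and `float` values) and uses a set for membership tests instead of repeated `list.remove` calls; `unique_rows` is a hypothetical name.

```python
from datetime import datetime


def unique_rows(rows):
    """Return (unique_rows, first_indices), keeping rows in their original order."""
    seen = set()
    uniq = []
    first = []
    for i, row in enumerate(rows):
        key = tuple(row)              # lists and array rows are not hashable; tuples are
        if key not in seen:
            seen.add(key)
            uniq.append(list(row))
            first.append(i)
    return uniq, first


rows = [[datetime(2014, 6, 20, 0, 0), -16.0],
        [datetime(2014, 6, 20, 0, 0), -16.0],
        [datetime(2014, 6, 20, 0, 1), -15.5]]
uniq, first = unique_rows(rows)
print(first)
```

With the question's data, the rows live in `okdat[0]`, so `uniq, first = unique_rows(okdat[0])` followed by `dot[first]` and `dutc[first]` would keep only the matching entries of each array.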