writing to columns in same row in csv file (python) - python-2.7

I'm trying to write values to a CSV file so that the results of every two iterations land in the same row, and the next two go to a new row. Any help would be greatly appreciated. Thank you!
This is what I have so far:
import csv
import math
savePath = '/home/dehaoliu/opencv_test/Engineering_drawings_outputs/'
with open(str(savePath) +'outputsTest.csv','w') as f1:
    writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
    temp = []
    for k in range(0,2):
        temp = []
        for i in range(0,4):
            a = 2 +i
            b = 3+ i
            list = [a,b]
            temp.append(list)
        writer.writerow(temp)
The result I am getting now is
[2 3][3 4][4 5][5 6]
[2 3][3 4][4 5][5 6]
But I would like to get this (without the brackets) where each number in a row is in a separate column:
2 3 3 4
4 5 5 6

Try the following:
import csv
import math
savePath = '/home/dehaoliu/opencv_test/Engineering_drawings_outputs/'
with open(str(savePath) +'outputsTest.csv','w') as f1:
    writer=csv.writer(f1, delimiter='\t',lineterminator='\n',)
    temp = [2, 3]
    for i in range(2):
        temp = [x + i for x in temp]
        additional = [y+1 for y in temp]
        writer.writerow(temp + additional)
        temp = additional[:]
This should return:
# 2 3 3 4
# 4 5 5 6
You start with a temporary list containing the numbers 2 and 3, then loop over i = 0 and i = 1. At every iteration, you add the current index to each value in the temporary list, then build an additional list whose values are each one greater. You join the two lists together and write the result to your file as a single row. Finally, you set the temporary list to a copy of the additional list before moving on to the next iteration.
I hope this helps.

The way you present it you can do it with a simple seed and increment:
import csv
import os
save_path = "/home/dehaoliu/opencv_test/Engineering_drawings_outputs/"
with open(os.path.join(save_path, "outputsTest.csv"), "w") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    temp = [2, 3, 3, 4]  # init seed
    increment = len(temp) // 2  # how many pairs we have, used to increase our seed each row
    for _ in range(2):  # how many rows do you need, any positive integer will do
        writer.writerow(temp)  # write the current value
        temp = [x + increment for x in temp]  # add 'increment' to the elements
Resulting in:
2 3 3 4
4 5 5 6
But if your seed is: temp = [2, 3, 3, 4, 4, 5] and you decide to generate 4 rows, it will still adapt:
2 3 3 4 4 5
5 6 6 7 7 8
8 9 9 10 10 11
11 12 12 13 13 14
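If you would rather keep the original loop that computes a and b on each iteration, another option is to collect the values in a row buffer and flush it every two iterations. This is only a minimal sketch, written for Python 3; the function name write_pairs and its parameters n_pairs and pairs_per_row are made up for illustration, and it writes to an in-memory buffer instead of the file path from the question so it can be run anywhere.

```python
import csv
import io


def write_pairs(out, n_pairs=4, pairs_per_row=2):
    """Write (a, b) pairs to `out`, putting `pairs_per_row` pairs on each row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    row = []
    for i in range(n_pairs):
        # Same a and b as in the question: a = 2 + i, b = 3 + i.
        row.extend([2 + i, 3 + i])
        # Once the row holds enough pairs, write it and start a new one.
        if (i + 1) % pairs_per_row == 0:
            writer.writerow(row)
            row = []
    if row:
        # Write a trailing partial row if n_pairs is not a multiple of pairs_per_row.
        writer.writerow(row)


buffer = io.StringIO()  # stand-in for the real file object
write_pairs(buffer)
output = buffer.getvalue()
print(output)
```

Using extend rather than append is what removes the brackets: each number becomes its own element of the row, so the writer puts it in its own column.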

Related

numpy array to pandas pivot table

I'm new to pandas and am trying to create a pivot table from a numpy array.
variable npArray is just that, a numpy array:
>>> npArray
array([(1, 3), (4, 3), (1, 3), ..., (1, 4), (1, 12), (1, 12)],
dtype=[('MATERIAL', '<i4'), ('DIVISION', '<i4')])
I'd like to count occurrences of each material by division, with division being rows and material being columns. Example:
What I have:
#numpy array to pandas data frame
pandaDf = pandas.DataFrame (npArray)
#pivot table - guessing here
pandas.pivot_table (pandaDf, index = "DIVISION",
                    columns = "MATERIAL",
                    aggfunc = numpy.sum) #<--- want count, not sum
Results:
Empty DataFrame
Columns: []
Index: []
Sample of pandaDf:
>>> print pandaDf
MATERIAL DIVISION
0 1 3
1 4 3
2 1 3
3 1 3
4 1 3
5 1 3
6 1 3
7 1 3
8 1 3
9 1 3
10 1 3
11 1 3
12 4 3
... ... ...
3845291 1 4
3845292 1 4
3845293 1 4
3845294 1 12
3845295 1 12
[3845296 rows x 2 columns]
Any help would be appreciated.
Something similar has already been asked: https://stackoverflow.com/a/12862196/9754169
Bottom line, just do aggfunc=lambda x: len(x)
@GerardoFlores is correct. Another solution I found was adding a column for frequency.
#numpy array to pandas data frame
pandaDf = pandas.DataFrame (npArray)
print "adding frequency column"
pandaDf ["FREQ"] = 1
#pivot table
pivot = pandas.pivot_table (pandaDf, values = "FREQ",
                            index = "DIVISION", columns = "MATERIAL",
                            aggfunc = "count")
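For a plain count of occurrences, pandas.crosstab avoids the helper column altogether. The sketch below assumes Python 3 and uses a six-row stand-in for npArray with the same structured dtype as in the question; the values are taken from the rows visible in the question, not from the full 3.8-million-row array. The empty result in the question is probably because both columns are used up as index and columns, which leaves nothing for pivot_table to aggregate.

```python
import numpy as np
import pandas as pd

# Small stand-in for npArray, with the same structured dtype as in the question.
np_array = np.array(
    [(1, 3), (4, 3), (1, 3), (1, 4), (1, 12), (1, 12)],
    dtype=[("MATERIAL", "<i4"), ("DIVISION", "<i4")],
)
df = pd.DataFrame(np_array)

# Rows are divisions, columns are materials, cells are occurrence counts.
# Combinations that never occur are filled with 0.
counts = pd.crosstab(df["DIVISION"], df["MATERIAL"])
print(counts)
```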

for loop in pandas to search dataframe and update list stuck

I want to count areas of interest in my dataframe column 'which_AOI' (ranging from 0 to 9). I would like to add the results as a new column to a dataframe, depending on a variable 'marker' (ranging from 0 to x) that tells me when one 'picture' is done and the next begins (one marker can span a variable number of rows). This is my code so far, but it seems to get stuck and runs on without giving any output. I tried reconstructing it from the beginning once, but as soon as I get to 'if df.marker == num' it doesn't stop. What am I missing?
(example dataframe below)
## AOI count of spec. type function (in progress):
import numpy as np
import pandas as pd
path_i = "/Users/Desktop/Pilot/results/gazedata_filename.csv"
df = pd.read_csv(path_i, sep =",")
#create a new dataframe for AOIs:
d = {'marker': []}
df_aoi = pd.DataFrame(data=d)
### Creating an Aoi list
item = df.which_AOI
aoi = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #list for search
aoi_array = [0, 0 , 0, 0, 0, 0, 0, 0, 0, 0] #list for filling
num = 0
for i in range (0, len (df.marker)): #loop through the dataframe
    if df.marker == num: ## if marker = num its one picture
        for index, item in enumerate(aoi): #look for item (being a number in which_AOI) in aoi list
            if (item == aoi[index]):
                aoi_array[index] += 1
        print (aoi)
        print (aoi_array)
        se = pd.Series(aoi_array) # make list into a series to attach to dataframe
        df_aoi['new_col'] = se.values #add list to dataframe
        aoi_array.clear() #clears list before next picture
    else:
        num +=1
index pos_time pos_x pos_y pup_time pup_diameter marker which_AOI fixation Picname shock
1 16300 168.608779907227 -136.360855102539 16300 2.935715675354 0 7 18 5 save
2 16318 144.97673034668 -157.495513916016 16318 3.08838820457459 0 8 33 5 save
3 16351 152.92560577392598 -156.64172363281298 16351 3.0895299911499 0 7 17 5 save
4 16368 152.132453918457 -157.989685058594 16368 3.111008644104 0 7 18 5 save
5 16386 151.59835815429702 -157.55587768554702 16386 3.09514689445496 0 7 18 5 save
6 16404 150.88092803955098 -152.69479370117202 16404 3.10009074211121 1 7 37 5 save
7 16441 152.76554107666 -142.06188964843798 16441 3.0821495056152304 1 7 33 5 save
It's not 100% clear from your question, but it sounds like you want to count the number of rows for each which_AOI value within each marker.
You can accomplish this using groupby:
df_aoi = df.groupby(['marker','which_AOI']).size().unstack('which_AOI',fill_value=0)
In:
pos_time pos_x pos_y pup_time pup_diameter marker \
0 16300 168.608780 -136.360855 16300 2.935716 0
1 16318 144.976730 -157.495514 16318 3.088388 0
2 16351 152.925606 -156.641724 16351 3.089530 0
3 16368 152.132454 -157.989685 16368 3.111009 0
4 16386 151.598358 -157.555878 16386 3.095147 0
5 16404 150.880928 -152.694794 16404 3.100091 1
6 16441 152.765541 -142.061890 16441 3.082150 1
which_AOI fixation Picname shock
0 7 18 5 save
1 8 33 5 save
2 7 17 5 save
3 7 18 5 save
4 7 18 5 save
5 7 37 5 save
6 7 33 5 save
Out:
which_AOI 7 8
marker
0 4 1
1 2 0
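To try the groupby approach without the CSV file, here is a small self-contained sketch for Python 3. It rebuilds only the marker and which_AOI columns from the sample rows above; the final reindex step is an optional addition, not part of the answer, that forces one column for every AOI from 0 to 9 even when an AOI never occurs.

```python
import pandas as pd

# Only the two relevant columns from the sample data in the question.
df = pd.DataFrame(
    {
        "marker": [0, 0, 0, 0, 0, 1, 1],
        "which_AOI": [7, 8, 7, 7, 7, 7, 7],
    }
)

# Count rows per (marker, which_AOI) pair, then pivot the AOI level into columns.
df_aoi = (
    df.groupby(["marker", "which_AOI"])
    .size()
    .unstack("which_AOI", fill_value=0)
)

# Optional: make sure every AOI 0..9 has a column, filling missing ones with 0.
df_aoi_full = df_aoi.reindex(columns=range(10), fill_value=0)
print(df_aoi_full)
```

This replaces the explicit loop entirely. Note that `df.marker == num` compares the whole column at once and yields a Series of booleans rather than a single True or False, which is why it does not behave as a per-row test inside an if statement.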

Concatenate pandas dataframe with result of apply(lambda) where lambda returns another dataframe

A dataframe stores some values in columns. Passing those values to a function returns another dataframe, and I'd like to concatenate the returned dataframe's columns to the original dataframe.
I tried to do something like
i = pd.concat([i, i[['cid', 'id']].apply(lambda x: xy(*x), axis=1)], axis=1)
but it failed with the error:
ValueError: cannot copy sequence with size 2 to array axis with dimension 1
So I did it like this:
def xy(x, y):
    return pd.DataFrame({'x': [x*2], 'y': [y*2]})
df1 = pd.DataFrame({'cid': [4, 4], 'id': [6, 10]})
print('df1:\n{}'.format(df1))
df2 = pd.DataFrame()
for _, row in df1.iterrows():
    nr = xy(row['cid'], row['id'])
    nr['cid'] = row['cid']
    nr['id'] = row['id']
    df2 = df2.append(nr, ignore_index=True)
print('df2:\n{}'.format(df2))
Output:
df1:
cid id
0 4 6
1 4 10
df2:
x y cid id
0 8 12 4 6
1 8 20 4 10
The code does not look nice and is probably slow.
Is there a pandas/pythonic way to do this properly and quickly?
python 2.7
Option 0
Most directly with pd.DataFrame.assign. Not very generalizable.
df1.assign(x=df1.cid * 2, y=df1.id * 2)
cid id x y
0 4 6 8 12
1 4 10 8 20
Option 1
Use pd.DataFrame.join to add new columns
This shows how to adjoin new columns after having used apply with a lambda
df1.join(df1.apply(lambda x: pd.Series(x.values * 2, ['x', 'y']), 1))
cid id x y
0 4 6 8 12
1 4 10 8 20
Option 2
Use pd.DataFrame.assign to add new columns
This shows how to adjoin new columns after having used apply with a lambda
df1.assign(**df1.apply(lambda x: pd.Series(x.values * 2, ['x', 'y']), 1))
cid id x y
0 4 6 8 12
1 4 10 8 20
Option 3
However, if your function really is just multiplying by 2
df1.join(df1.mul(2).rename(columns=dict(cid='x', id='y')))
Or
df1.assign(**df1.mul(2).rename(columns=dict(cid='x', id='y')))
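If the function has to stay as it is and keep returning a one-row DataFrame, one more possibility is to collect all the returned frames in a list and concatenate them once, instead of appending inside the loop. This is a rough sketch for Python 3 and a recent pandas (where DataFrame.append is no longer available); the names pieces, new_cols and result are only illustrative.

```python
import pandas as pd


def xy(x, y):
    # Same toy function as in the question: returns a one-row DataFrame.
    return pd.DataFrame({"x": [x * 2], "y": [y * 2]})


df1 = pd.DataFrame({"cid": [4, 4], "id": [6, 10]})

# Call the function once per row and keep the one-row frames in a list.
pieces = [xy(cid, id_) for cid, id_ in zip(df1["cid"], df1["id"])]

# Concatenate all pieces in a single call, then align the index with df1
# so that join matches the rows one to one.
new_cols = pd.concat(pieces, ignore_index=True)
new_cols.index = df1.index

result = df1.join(new_cols)
print(result)
```

A single concat avoids copying the growing frame on every iteration, which is where the loop in the question spends most of its time.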

Subtract value in one data frame from the next value in a second data frame

I have a data frame that is composed of several datasets (about 146 and counting). Two of my columns are labeled "start_time" and "stop_time," which represent the start and stop of a response (i.e., the total duration of the response).
I need to get the "inter-response time": each stop_time subtracted from the next value in start_time. Basically, if:
start_time = [1,4,7]
stop_time = [2,5,8]
I need:
start_time[1] - stop_time[0]
start_time[2] - stop_time[1]
in order to get:
iri = [2,2]
My code looks like this:
iri_t = []
def grps():
    for grp in lset2_name_grps.groups:
        beg_eng_t = pd.DataFrame([lset2_name_grps.stop_time, lset2_name_grps.start_time], columns=['end_t','beg_t'])
        end_t = [i for i in lset2_name_grps.stop_time]
        beg_t = [i for i in lset2_name_grps.start_time]
        beg_t = np.insert(beg_t, len(beg_t),0)
        end_t = np.insert(end_t, 0,0)
        iri_t.append(np.subtract(end_t, beg_t))
        # for i,j in zip(end_t, beg_t):
        #     iri_t.append(np.subtract(i,j))
        # lset2_name_grps['iri'] = iri_t
grps()
Essentially, it doesn't do anything close to what I'm trying to accomplish, and the only output I get is either "Not Implemented" or an error.
How about something like this:
import pandas as pd
starts = pd.Series([1, 4, 7])
stops = pd.Series([2, 5, 8])
iri_t = [0]  # the first response has no previous stop, so use 0 as a placeholder
for i in range(1, len(starts)):
    # current start minus the previous stop
    iri_t.append(starts[i] - stops[i-1])
times_df = pd.concat([starts, stops, pd.Series(iri_t)], axis=1)
This creates the following data_frame:
0 1 2
0 1 2 0
1 4 5 2
2 7 8 2
I think what you're asking (correct me if I'm wrong) is best accomplished by putting the two columns in a single dataframe, using shift to offset one of your columns, then doing an ordinary subtraction.
df = pd.DataFrame({'start_time':[1,4,7], 'stop_time':[2,5,8]})
df.stop_time - df.start_time.shift()
Out[5]:
0 NaN
1 4
2 4
dtype: float64
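The same shift idea can be pointed at the quantity in the question (iri = [2, 2]) by shifting stop_time instead, so that each start_time is lined up with the previous row's stop_time. The following is a sketch for Python 3; the second half assumes a hypothetical "dataset" column that labels which dataset each row belongs to, since the question mentions groups but does not show the grouping column.

```python
import pandas as pd

df = pd.DataFrame({"start_time": [1, 4, 7], "stop_time": [2, 5, 8]})

# shift() moves stop_time down by one row, so row i sees the stop of row i-1.
# The first row has no previous response and therefore becomes NaN.
df["iri"] = df["start_time"] - df["stop_time"].shift()
iri = df["iri"].dropna().astype(int).tolist()
print(iri)

# With several datasets in one frame, shift within each group so the first
# response of one dataset is not compared with the last one of another.
# "dataset" is a made-up column name used only for this illustration.
grouped = pd.DataFrame(
    {
        "dataset": ["a", "a", "b", "b"],
        "start_time": [1, 4, 10, 15],
        "stop_time": [2, 5, 12, 16],
    }
)
grouped["iri"] = grouped["start_time"] - grouped.groupby("dataset")["stop_time"].shift()
print(grouped)
```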

How to call lists in a for loop?

I am stuck on a fairly simple exercise looping through lists, and I am getting the error "TypeError: 'list' object is not callable".
I have three lists with n records each. I want to write the first record from every list on the same line and repeat this for all n records, which will result in n lines. These are the lists that I want to use:
lst1 = ['1','2','4','5','3']
lst2 = ['3','4','3','4','3']
lst3 = ['0.52','0.91','0.18','0.42','0.21']
istring=""
lst=0
for i in range(0,10): # range is simply upper limit of number of records in lists
    entry = lst1(lst)
    istring = istring + entry.rjust(11) # first entry from each list will be cat here
    lst=lst+1
Any startup would be really helpful.
This works for any size of lists:
for i in zip(lst1, lst2, lst3):
    for j in i:
        print j.rjust(11),
    print
1 3 0.52
2 4 0.91
4 3 0.18
5 4 0.42
3 3 0.21
>>> lst1 = ['1','2','4','5','3']
>>> lst2 = ['3','4','3','4','3']
>>> lst3 = ['0.52','0.91','0.18','0.42','0.21']
>>> a = zip(lst1, lst2, lst3)
>>> istring = ""
>>> for entry in a:
...     istring += entry[0].rjust(11)
...     istring += entry[1].rjust(11)
...     istring += entry[2].rjust(11) + "\n"
...
>>> print istring
1 3 0.52
2 4 0.91
4 3 0.18
5 4 0.42
3 3 0.21
Try entry = lst1[lst] instead of entry = lst1(lst)
() usually denotes calling a function, whereas
[] usually denotes accessing an element of something.
A list is not a function.
Also, while you can keep your own index, a for loop makes this unnecessary
x = [1,2,3,4,5,7,9,11,13,15]
y = [2,4,6,8,10,12,14,16,18,20]
z = [3,4,5,6,7,8,9,10,11,12]
for i in range(0,10):
    print x[i], y[i], z[i]
1 2 3
2 4 4
3 6 5
4 8 6
5 10 7
7 12 8
9 14 9
11 16 10
13 18 11
15 20 12
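The answers above use the Python 2 print statement. For reference, here is roughly how the same zip approach might look in Python 3, building each line with str.join so there is no manual index at all. The variable name lines is just an illustrative choice.

```python
lst1 = ['1', '2', '4', '5', '3']
lst2 = ['3', '4', '3', '4', '3']
lst3 = ['0.52', '0.91', '0.18', '0.42', '0.21']

# zip yields one tuple per record: (lst1[i], lst2[i], lst3[i]).
# Each field is right-justified to width 11 and the fields are concatenated.
lines = ["".join(field.rjust(11) for field in record)
         for record in zip(lst1, lst2, lst3)]

for line in lines:
    print(line)
```

zip stops at the shortest list, so a hard-coded upper limit such as range(0,10) is not needed and cannot run past the end of the lists.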