Storing numpy arrays in a list

I'd like to automate loading some ASCII data files into numpy in order to plot them. The filenames are given to the program via the terminal, and the contents are loaded and saved in a list. So basically the idea is to have a list that contains numpy arrays that I can access later via indexing, to plot each dataset individually.
The problem I have is that indexing is not working with these lists I make:
subplots_array = [[0,0],[0,0],[0,0],[0,0]]
subplots_axes = [0,0,0,0]  # this list will hold the subplot axes,
                           # one for each of the above datasets
fig = plt.figure()
counter = 0
for x in arguments_list:
    for filename in glob.glob(x):
        mydata = np.loadtxt(filename)
        subplots_array[counter] = mydata  # load the data from the files
                                          # specified in argv into subplots_array as numpy arrays
        counter += 1
counter = 0
for x in subplots_array:
    subplots_axes[counter] = fig.add_subplot(counter+1, 1, 1)
    subplots_axes[counter].scatter(subplots_array[counter][:, 0], subplots_array[counter][:, 1], s=12, marker="x")
    counter = counter + 1
This is the error I get. The funny thing is that if I substitute counter with a numeric index like 0 or 1 or 2 etc., the data is plotted correctly, despite counter being an integer index as well. So I am out of ideas.
Traceback (most recent call last):
File "FirstTrial.py", line 89, in <module>
subplots_axes[counter].scatter(subplots_array[counter][:,0], subplots_array[counter][:, 1], s = 12, marker = "x")
TypeError: list indices must be integers or slices, not tuple
I hope this is enough description to help me solve the issue.

Your indentation is off, and without your files I can't reproduce your code. But here's what I think is happening:
In [1]: subplots_array = [[0,0],[0,0],[0,0],[0,0]]
In [2]: subplots_array[0]=np.ones((2,3)) # one or more counter loops
indexing with counter==0 works:
In [4]: subplots_array[0][:,0]
Out[4]: array([1., 1.])
but you get your error with the next counter value:
In [5]: subplots_array[1][:,0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-0b654866bdee> in <module>
----> 1 subplots_array[1][:,0]
TypeError: list indices must be integers or slices, not tuple
Here I replaced one element of subplots_array with a 2d array, but left the others alone. They were initialized as lists:
In [6]: subplots_array
Out[6]:
[array([[1., 1., 1.],
[1., 1., 1.]]), [0, 0], [0, 0], [0, 0]]
So the problem isn't with the counter type itself, but with the next level of indexing.
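One way to sidestep the problem entirely is to drop the pre-filled placeholder list and append each loaded array instead, so every element is a 2-D numpy array before the plotting loop runs. A minimal sketch, with random data standing in for np.loadtxt since the original files aren't available:

```python
import numpy as np

# Hypothetical stand-ins for the arrays np.loadtxt would return,
# since the original data files aren't available.
loaded = [np.random.rand(5, 2) for _ in range(4)]

subplots_array = []            # start empty instead of pre-filling with [0, 0]
for mydata in loaded:
    subplots_array.append(mydata)

# Every element is now a 2-D array, so 2-D indexing works for all of them:
xy = [(data[:, 0], data[:, 1]) for data in subplots_array]
```

This way there is no leftover `[0, 0]` placeholder to trip over, and `len(subplots_array)` always matches the number of files actually loaded.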


Calling as a list by filtering values in a Data Frame

I'm trying to write code that, using a for-loop, returns only the numeric values greater than zero in a DataFrame whose column names are in a list, but I couldn't get it to work. How do you think I can fix this error and solve the problem?
The code I wrote is below:
col = df.columns
for i in col:
    Liste = []
    index = []
    for j in range(40):
        if i[j] != 0:
            Liste = Liste + i[j]
            index = index + j
    i = zip(index, Liste)
I get this error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_98056/4001655036.py in <module>
5 for j in range(40):
6 if i[j] != 0:
----> 7 Liste=Liste + i[j]
8 index=index + j
9 i=zip(index,Liste)
TypeError: can only concatenate list (not "str") to list
My Data Frame is like this:
The behavior of the + operator with lists is to concatenate lists:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = list1 + list2  # returns [1, 2, 3, 4, 5, 6]
In your code the problem is at the line where you are trying to add an item to the list:
Liste = Liste + i[j]
I am fairly sure you intend to add a single item to the list, so replace that line with:
Liste.append(i[j])  # no need for Liste on the left-hand side, since append modifies the list in place
Hope that helps.
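A quick demonstration of the difference (note the same fix applies to the `index = index + j` line, since j is an int, not a list):

```python
Liste = []
item = 'x'          # a single item, e.g. one character of a string

# + needs a list on both sides, so this raises the TypeError from the question:
try:
    Liste = Liste + item
except TypeError as e:
    print(e)        # can only concatenate list (not "str") to list

# append takes any single item and modifies the list in place:
Liste.append(item)

# equivalent with +, if you prefer it: wrap the item in a one-element list
Liste = Liste + ['y']
print(Liste)        # ['x', 'y']
```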

python3 delete more than one elements in a list using another list as index

import numpy as np
m = np.arange(10).tolist()
n = [2, 6, 4]
I want to delete the elements at indices 2, 6 and 4 of list m.
del m[n]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not list
I tried this:
ns = np.sort(n)
for i in np.arange(len(ns)):
    m.pop(ns[i] - i)
but it pops out the deleted elements one at a time.
Is there any elegant method to do this job?
For this simple case, you can use m = np.delete(m, n).
Here is a link to the doc : https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
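For reference, here is np.delete on the question's data, together with a plain-list alternative if you'd rather avoid converting between lists and arrays:

```python
import numpy as np

m = np.arange(10).tolist()
n = [2, 6, 4]

# numpy: np.delete returns a new array with those positions removed
result = np.delete(m, n)
print(result)   # [0 1 3 5 7 8 9]

# pure python: keep every element whose index is not in n
keep = [v for i, v in enumerate(m) if i not in set(n)]
print(keep)     # [0, 1, 3, 5, 7, 8, 9]
```

Note that np.delete does not modify m; it returns a new array, which is why the answer reassigns with `m = np.delete(m, n)`.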

numpy recarray append_fields: can't append numpy array of datetimes

I have a recarray containing various fields and I want to append an array of datetime objects on to it.
However, it seems like the append_fields function in numpy.lib.recfunctions won't let me add an array of objects.
Here's some example code:
import numpy as np
import datetime
import numpy.lib.recfunctions as recfun
dtype= np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1,10.0),(0.2,11.0),(0.3,12.0)], dtype=dtype)
dates = np.array([datetime.datetime(2001,1,1,0),
                  datetime.datetime(2001,1,1,0),
                  datetime.datetime(2001,1,1,0)])
# This doesn't work:
recfun.append_fields(obs,'obdate',dates,dtypes=np.object)
I keep getting the error TypeError: Cannot change data-type for object array.
It seems to only be an issue with np.object arrays as I can append other fields ok. Am I missing something?
The problem
In [143]: recfun.append_fields(obs,'test',np.array([None,[],1]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-143-5c3de23b09f7> in <module>()
----> 1 recfun.append_fields(obs,'test',np.array([None,[],1]))
/usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in append_fields(base, names, data, dtypes, fill_value, usemask, asrecarray)
615 if dtypes is None:
616 data = [np.array(a, copy=False, subok=True) for a in data]
--> 617 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
618 else:
619 if not isinstance(dtypes, (tuple, list)):
/usr/local/lib/python3.5/dist-packages/numpy/lib/recfunctions.py in <listcomp>(.0)
615 if dtypes is None:
616 data = [np.array(a, copy=False, subok=True) for a in data]
--> 617 data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
618 else:
619 if not isinstance(dtypes, (tuple, list)):
/usr/local/lib/python3.5/dist-packages/numpy/core/_internal.py in _view_is_safe(oldtype, newtype)
363
364 if newtype.hasobject or oldtype.hasobject:
--> 365 raise TypeError("Cannot change data-type for object array.")
366 return
367
TypeError: Cannot change data-type for object array.
So the problem is in this a.view([(name, a.dtype)]) expression. It tries to make a single field structured array from a. That works with dtypes like int and str, but fails with object. That failure is in the core view handling, so isn't likely to change.
In [148]: x=np.arange(3)
In [149]: x.view([('test', x.dtype)])
Out[149]:
array([(0,), (1,), (2,)],
dtype=[('test', '<i4')])
In [150]: x=np.array(['one','two'])
In [151]: x.view([('test', x.dtype)])
Out[151]:
array([('one',), ('two',)],
dtype=[('test', '<U3')])
In [152]: x=np.array([[1],[1,2]])
In [153]: x
Out[153]: array([[1], [1, 2]], dtype=object)
In [154]: x.view([('test', x.dtype)])
...
TypeError: Cannot change data-type for object array.
The fact that recfunctions requires a separate import indicates that it is somewhat of a backwater that isn't used a lot and isn't under active development. I haven't examined the code in detail, but I suspect a fix would be a kludge.
A fix
Here's a way of adding a new field from scratch. It performs the same basic actions as append_fields:
Define a new dtype, using the obs and the new field name and dtype:
In [158]: obs.dtype.descr
Out[158]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')]
In [159]: obs.dtype.descr+[('TEST',object)]
Out[159]: [('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', object)]
In [160]: dt1 =np.dtype(obs.dtype.descr+[('TEST',object)])
Make an empty target array, and fill it by copying data by field name:
In [161]: newobs = np.empty(obs.shape, dtype=dt1)
In [162]: for n in obs.dtype.names:
     ...:     newobs[n] = obs[n]
In [167]: dates
Out[167]:
array([datetime.datetime(2001, 1, 1, 0, 0),
datetime.datetime(2001, 1, 1, 0, 0),
datetime.datetime(2001, 1, 1, 0, 0)], dtype=object)
In [168]: newobs['TEST']=dates
In [169]: newobs
Out[169]:
array([( 0.1 , 10., datetime.datetime(2001, 1, 1, 0, 0)),
( 0.2 , 11., datetime.datetime(2001, 1, 1, 0, 0)),
( 0.30000001, 12., datetime.datetime(2001, 1, 1, 0, 0))],
dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('TEST', 'O')])
datetime64 alternative
With native numpy datetimes, append_fields works:
In [179]: dates64 = dates.astype('datetime64[D]')
In [180]: recfun.append_fields(obs,'test',dates64,usemask=False)
Out[180]:
array([( 0.1 , 10., '2001-01-01'),
( 0.2 , 11., '2001-01-01'), ( 0.30000001, 12., '2001-01-01')],
dtype=[('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4'), ('test', '<M8[D]')])
append_fields has some bells-n-whistles that my version doesn't - fill values, masked arrays, recarray, etc.
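For convenience, the manual steps above can be collected into a small helper. This is just a sketch of the same copy-by-field approach, without append_fields' fill-value and mask handling:

```python
import datetime
import numpy as np

def append_object_field(base, name, data, dtype=object):
    """Return a copy of structured array `base` with `data` added as field `name`."""
    # extend the dtype with the new field
    new_dt = np.dtype(base.dtype.descr + [(name, dtype)])
    # make an empty target and copy the existing data field by field
    out = np.empty(base.shape, dtype=new_dt)
    for n in base.dtype.names:
        out[n] = base[n]
    out[name] = data
    return out

dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)
dates = np.array([datetime.datetime(2001, 1, 1, 0)] * 3)

newobs = append_object_field(obs, 'obdate', dates)
print(newobs.dtype.names)  # ('WIND_WAVE_HGHT', 'WIND_WAVE_PERD', 'obdate')
```

The helper name and signature are made up for this sketch; it avoids the `a.view(...)` step inside append_fields that object dtypes choke on.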
structured dates array
I could create a structured array containing the dates:
In [197]: sdates = np.array([(i,) for i in dates],dtype=[('test',object)])
In [198]: sdates
Out[198]:
array([(datetime.datetime(2001, 1, 1, 0, 0),),
(datetime.datetime(2001, 1, 1, 0, 0),),
(datetime.datetime(2001, 1, 1, 0, 0),)],
dtype=[('test', 'O')])
There must be a function that merges fields of existing arrays, but I'm not finding it.
previous work
This felt familiar:
https://github.com/numpy/numpy/issues/2346
TypeError when appending fields to a structured array of size ONE
Adding datetime field to recarray

Assigning the kth element of a list to 'obj_k'

I am working with Python 2.7. Given a list of n objects, I want to assign the variable 'obj_k' to the kth element of the list.
For example given the list
mylist = [1,'car', 10]
I am looking for a way to do the following for me:
obj_0 = 1
obj_1 = 'car'
obj_2 = 10
This seems pretty basic, but I don't see how to do it. Morally I am thinking about something along the lines of
for i in range(len(mylist)): obj_i = mylist[i]
which obviously doesn't do what I want (it just assigns to the literal name obj_i).
I am not sure dynamically creating variables is a good idea, but here is what you asked for. If I were you, I would probably use a dictionary to store the values, just to avoid polluting the namespace.
In [1]: mylist = [1,'car', 10]
In [2]: print obj_1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-2f857dd36a40> in <module>()
----> 1 print obj_1
NameError: name 'obj_1' is not defined
In [3]: for i,c in enumerate(mylist):
   ...:     globals()['obj_'+str(i)] = c
   ...:
In [4]: obj_1
Out[4]: 'car'
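The dictionary approach mentioned above would look like this, and keeps the generated names out of the global namespace:

```python
mylist = [1, 'car', 10]

# Build a dict mapping 'obj_<k>' to the k-th element of the list
objs = {'obj_%d' % i: v for i, v in enumerate(mylist)}

print(objs['obj_1'])  # car
```

Lookup is then `objs['obj_1']` instead of a bare `obj_1`, and you can iterate over all of them without touching globals().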

out of bounds error when using a list as an index

I have two files: one is a single column (call it pred) with no headers; the other has two columns, ID and IsClick (it has headers). My goal is to use the ID column as an index into pred.
import pandas as pd
import numpy as np
def LinesInFile(path):
    with open(path) as f:
        for linecount, line in enumerate(f):
            pass
        f.close()
    print 'Found ' + str(linecount) + ' lines'
    return linecount
path ='/Users/mas/Documents/workspace/Avito/input/' # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
Something seems wrong: my file has 7816361 lines counting the header, while my list has an extra element (len of list 7816361).
I don't have your csv files to recreate the problem, but the problem looks like it is being caused by your use of index.
index = sample.ID.values - 1 takes each of your sample IDs and subtracts 1. These are not valid index values into pred, as it is only 7816360 long. Each of the last 3 items in your index array (based on your print output) would go out of bounds, as they are greater than 7816360. I suspect the error is showing you the first of your ID-1 values that goes out of bounds.
Assuming you just want to join the files based on their line number you could do the following:
sample = pd.concat((pd.read_csv(path + 'sampleSubmission.csv'),
                    pd.read_csv(submission, header=None).rename(columns={0: 'IsClick'})),
                   axis=1)
Otherwise you'll need to perform a join or merge on your two dataframes.
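If the IDs really are row numbers (1-based) into pred, a label-based lookup avoids the IndexError entirely: reindex fills NaN for out-of-range positions instead of raising. A sketch with toy data, since the original csv files aren't available:

```python
import pandas as pd

# Toy stand-ins for the two files (hypothetical values)
sample = pd.DataFrame({'ID': [1, 5, 6, 9]})
preds = pd.DataFrame({'IsClick': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

# Look up row ID-1 in preds; IDs past the end become NaN instead of raising
sample['IsClick'] = preds['IsClick'].reindex(sample['ID'] - 1).values
print(sample)
```

Here ID 9 points past the end of preds, so its IsClick comes back as NaN, which makes the bad IDs easy to spot afterwards with `sample['IsClick'].isna()`.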