Select a batch from a matrix in a loop - python-2.7

Assume I have the following matrix:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
I want to draw a batch of 2 random rows at each time loop, and send it to a function. For instance, a batch in iteration i can be batch = [[4,5,6], [70,80,90]]
I do the following:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
def caclulate_batch(batch):
pass
for i in range(X.shape[0]/2):
batch = np.array([])
for _ in range(2):
r = random.randint(0, 5)
batch = np.append(batch, X[r])
caclulate_batch(batch)
There are two problems here: (1) It returns appended array (2) The random number can be repeated which can choose the same row many times. How can modify the code to fit my requirement.

r = np.random.randint(0, len(x), 2) should get you the indices. That lets you use fancy indexing to get the subset: batch = x[r, :].
If you want to accumulate arrays along a new dimension, as your loop does, use np.stack or np.block instead of np.append.

(1) You can use numpy.stack instead of append. EDIT: But this function would be called when you have all your batch in a list like:
list = ([1,2], [3,4])
numpy.stack(list)
# gives [[1,2],
# [3,4]]
(2) You can shuffle X array, loop through the results and extract two by two. Look at numpy.random.shuffle
It would look like that:
S = np.random.shuffle(X)
for i in range(S.shape[0]/2):
batch = S[i*2:i*2+1]
caclulate_batch(batch)

Related

nested list of lists of inegers - doing arithmetic operation

I have a list like below and need to firs add items in each list and then multiply all results 2+4 = 6 , 3+ (-2)=1, 2+3+2=7, -7+1=-6 then 6*1*7*(-6) = -252 I know how to do it by accessing indexes and it works (as below) but I also need to do it in a way that it will work no matter how many sublist there is
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]]
a= nested_lst[0][0] + nested_lst[0][1]
b= nested_lst[1][0] + nested_lst[1][1]
c= nested_lst[2][0] + nested_lst[2][1] + nested_lst[2][2]
d= nested_lst[3][0] + nested_lst[3][1]
def sum_then_product(list):
multip= a*b*c*d
return multip
print sum_then_product(nested_lst)
I have tried with for loop which gives me addition but I don't know how to perform here multiplication. I am new to it. Please, help
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]]
for i in nested_lst:
print sum(i)
Is this what you are looking for?
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]] # your list
output = 1 # this will generate your eventual output
for sublist in nested_lst:
sublst_out = 0
for x in sublist:
sublst_out += x # your addition of the sublist elements
output *= sublst_out # multiply the sublist-addition with the other sublists
print(output)

Python keeps printing first index of for loop

I have data in two directories and i'm using for loop to read the files from both the folders.
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
for filen1 in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
print(filen,filen1)
Now what's happening is like when tried to print the filenames then it kept printing the names forever. So, its basically taking the first iteration from first loop then print it with all the iteration of the second loop.I don't understand why is it happening.
But what i want to do is, i want to print the first iteration of first loop with the first iteration of second for loop
As the file names are same in both the folders.So when i do the print, then desired result should look like something like this:
(txt_1.txt,txt_1.txt)
(txt_2.txt,txt_2.txt)
(txt_3.txt,txt_3.txt)
(txt_4.txt,txt_4.txt)
Where i'm making the mistake??
If I understand your question correctly, you seem to want to print pairs of files from path_to_files and path_to_files1. Since you are nesting a for loop, for every iteration of the nested for loop, filen is not going to change.
I think you might want something more like this:
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
filelistn = [x for x in os.listdir(path_to_files) if '.ares' in x]
filelist1 = [x for x in os.listdir(path_to_files1) if '.ares' in x]
for filen, filen1 in zip(filelistn, filelist1):
df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
print(filen,filen1)
For a sample input of:
filelistn = ['a.ar', 'b.ar']
filelist1 = ['c.ar', 'd.ar']
I get the following output:
('a.ar', 'c.ar')
('b.ar', 'd.ar')

Using For loop on nested list

I'm using a nested list to hold data in a Cartesian coordinate type system.
The data is a list of categories which could be 0,1,2,3,4,5,255 (just 7 categories).
The data is held in a list formatted thus:
stack = [[0,1,0,0],
[2,1,0,0],
[1,1,1,3]]
Each list represents a row and each element of a row represents a data point.
I'm keen to hang on to this format because I am using it to generate images and thus far it has been extremely easy to use.
However, I have run into problems running the following code:
for j in range(len(stack)):
stack[j].append(255)
stack[j].insert(0, 255)
This is intended to iterate through each row adding a single element 255 to the start and end of each row. Unfortunately it adds 12 instances of 255 to both the start and end!
This makes no sense to me. Presumably I am missing something very trivial but I can't see what it might be. As far as I can tell it is related to the loop: if I write stack[0].append(255) outside of the loop it behaves normally.
The code is obviously part of a much larger script. The script runs multiple For loops, a couple of which are range(12) but which should have closed by the time this loop is called.
So - am I missing something trivial or is it more nefarious than that?
Edit: full code
step_size = 12, the code above is the part that inserts "right and left borders"
def classify(target_file, output_file):
import numpy
import cifar10_eval # want to hijack functions from the evaluation script
target_folder = "Binaries/" # finds target file in "Binaries"
destination_folder = "Binaries/Maps/" # destination for output file
# open the meta file to retrieve x,y dimensions
file = open(target_folder + target_file + "_meta" + ".txt", "r")
new_x = int(file.readline())
new_y = int(file.readline())
orig_x = int(file.readline())
orig_y = int(file.readline())
segment_dimension = int(file.readline())
step_size = int(file.readline())
file.close()
# run cifar10_eval and create predictions vector (formatted as a list)
predictions = cifar10_eval.map_interface(new_x * new_y)
del predictions[(new_x * new_y):] # get rid of excess predictions (that are an artefact of the fixed batch size)
print("# of predictions: " + str(len(predictions)))
# check that we are mapping the whole picture! (evaluation functions don't necessarily use the full data set)
if len(predictions) != new_x * new_y:
print("Error: number of predictions from cifar10_eval does not match metadata for this file")
return
# copy predictions to a nested list to make extraction of x/y data easy
# also eliminates need to keep metadata - x/y dimensions are stored via the shape of the output vector
stack = []
for j in range(new_y):
stack.append([])
for i in range(new_x):
stack[j].append(predictions[j*new_x + i])
predictions = None # clear the variable to free up memory
# iterate through map list and explode each category to cover more pixels
# assigns a step_size x step_size area to each classification input to achieve correspondance with original image
new_stack = []
for j in range(len(stack)):
row = stack[j]
new_row = []
for i in range(len(row)):
for a in range(step_size):
new_row.append(row[i])
for b in range(step_size):
new_stack.append(new_row)
stack = new_stack
new_stack = None
new_row = None # clear the variables to free up memory
# add a border to the image to indicate that some information has been lost
# border also ensures that map has 1-1 correspondance with original image which makes processing easier
# calculate border dimensions
top_and_left_thickness = int((segment_dimension - step_size) / 2)
right_thickness = int(top_and_left_thickness + (orig_x - (top_and_left_thickness * 2 + step_size * new_x)))
bottom_thickness = int(top_and_left_thickness + (orig_y - (top_and_left_thickness * 2 + step_size * new_y)))
print(top_and_left_thickness)
print(right_thickness)
print(bottom_thickness)
print(len(stack[0]))
# add the right then left borders
for j in range(len(stack)):
for b in range(right_thickness):
stack[j].append(255)
for b in range(top_and_left_thickness):
stack[j].insert(0, 255)
print(stack[0])
print(len(stack[0]))
# add the top and bottom borders
row = []
for i in range(len(stack[0])):
row.append(255) # create a blank row
for b in range(top_and_left_thickness):
stack.insert(0, row) # append the blank row to the top x many times
for b in range(bottom_thickness):
stack.append(row) # append the blank row to the bottom of the map
# we have our final output
# repackage this as a numpy array and save for later use
output = numpy.asarray(stack,numpy.uint8)
numpy.save(destination_folder + output_file + ".npy", output)
print("Category mapping complete, map saved as numpy pickle: " + output_file + ".npy")

Python: referring to each duplicate item in a list by unique index

I am trying to extract particular lines from txt output file. The lines I am interested in are few lines above and few below the key_string that I am using to search through the results. The key string is the same for each results.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')
lines = fi.readlines()
filtered_list=[]
for item in lines:
if item.startswith("key string"):
filtered_list.append(lines[lines.index(item)-2])
filtered_list.append(lines[lines.index(item)+6])
filtered_list.append(lines[lines.index(item)+10])
filtered_list.append(lines[lines.index(item)+11])
fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but multiplied for every record available. How can I update the indexing so it can read every individual record? I've tried to find the solution but as a novice programmer I was struggling to use enumerate() function or collections package.
First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, it doesn't work at all, etc). Anyway, here's some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
if item.startswith(key):
relevant.append(item)
enumerate is actually quite simple. It takes a list, and returns a sequence of (index, item) pairs. So, enumerate(['a', 'b', 'c']) returns [(0, 'a'), (1, 'b'), (2, 'c')].
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collected them. You are interested in the line 2 lines before it and 6 and 10 and 11 lines after it. If your first lines contains the key, then you have a problem – you don't really want lines[-1] – that's the last item! Also, you need to handle the situation in which your offset would take you past the end of the list: otherwise Python will raise an IndexError.
out = []
for r in relevant:
for offset in -2, 6, 10, 11:
index = r + offset
if 0 < index < len(lines):
out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
for offset in -2, 6, 10, 11:
index = r + offset
if 0 < index < len(lines):
out.append(lines[index])
with open('Outputfile.txt', 'a') as fi:
fi.writelines(out)
To get rid of duplicates you can cast list to set; example:
x=['a','b','a']
y=set(x)
print(y)
will result in:
['a','b']

Deleting duplicate x values and their corresponding y values

I am working with a list of points in python 2.7 and running some interpolations on the data. My list has over 5000 points and I have some repeating "x" values within my list. These repeating "x" values have different corresponding "y" values. I want to get rid of these repeating points so that my interpolation function will work, because if there are repeating "x" values with different "y" values it runs an error because it does not satisfy the criteria of a function. Here is a simple example of what I am trying to do:
Input:
x = [1,1,3,4,5]
y = [10,20,30,40,50]
Output:
xy = [(1,10),(3,30),(4,40),(5,50)]
The interpolation function I am using is InterpolatedUnivariateSpline(x, y)
have a variable where you store the previous X value, if it is the same as the current value then skip the current value.
For example (pseudo code, you do the python),
int previousX = -1
foreach X
{
if(x == previousX)
{/*skip*/}
else
{
InterpolatedUnivariateSpline(x, y)
previousX = x /*store the x value that will be "previous" in next iteration
}
}
i am assuming you are already iterating so you dont need the actualy python code.
A bit late but if anyone is interested, here's a solution with numpy and pandas:
import pandas as pd
import numpy as np
x = [1,1,3,4,5]
y = [10,20,30,40,50]
#convert list into numpy arrays:
array_x, array_y = np.array(x), np.array(y)
# sort x and y by x value
order = np.argsort(array_x)
xsort, ysort = array_x[order], array_y[order]
#create a dataframe and add 2 columns for your x and y data:
df = pd.DataFrame()
df['xsort'] = xsort
df['ysort'] = ysort
#create new dataframe (mean) with no duplicate x values and corresponding mean values in all other cols:
mean = df.groupby('xsort').mean()
df_x = mean.index
df_y = mean['ysort']
# poly1d to create a polynomial line from coefficient inputs:
trend = np.polyfit(df_x, df_y, 14)
trendpoly = np.poly1d(trend)
# plot polyfit line:
plt.plot(df_x, trendpoly(df_x), linestyle=':', dashes=(6, 5), linewidth='0.8',
color=colour, zorder=9, figure=[name of figure])
Also, if you just use argsort() on the values in order of x, the interpolation should work even without the having to delete the duplicate x values. Trying on my own dataset:
polyfit on its own
sorting data in order of x first, then polyfit
sorting data, delete duplicates, then polyfit
... I get the same result twice