I'm using a nested list to hold data in a Cartesian coordinate type system.
The data is a list of categories which could be 0,1,2,3,4,5,255 (just 7 categories).
The data is held in a list formatted thus:
stack = [[0,1,0,0],
[2,1,0,0],
[1,1,1,3]]
Each list represents a row and each element of a row represents a data point.
I'm keen to hang on to this format because I am using it to generate images and thus far it has been extremely easy to use.
However, I have run into problems running the following code:
for j in range(len(stack)):
    stack[j].append(255)
    stack[j].insert(0, 255)
This is intended to iterate through each row adding a single element 255 to the start and end of each row. Unfortunately it adds 12 instances of 255 to both the start and end!
This makes no sense to me. Presumably I am missing something very trivial but I can't see what it might be. As far as I can tell it is related to the loop: if I write stack[0].append(255) outside of the loop it behaves normally.
The code is obviously part of a much larger script. The script runs multiple For loops, a couple of which are range(12) but which should have closed by the time this loop is called.
So - am I missing something trivial or is it more nefarious than that?
Edit: full code
step_size = 12, the code above is the part that inserts "right and left borders"
def classify(target_file, output_file):
    import numpy
    import cifar10_eval # want to hijack functions from the evaluation script
    target_folder = "Binaries/" # finds target file in "Binaries"
    destination_folder = "Binaries/Maps/" # destination for output file
    # open the meta file to retrieve x,y dimensions
    file = open(target_folder + target_file + "_meta" + ".txt", "r")
    new_x = int(file.readline())
    new_y = int(file.readline())
    orig_x = int(file.readline())
    orig_y = int(file.readline())
    segment_dimension = int(file.readline())
    step_size = int(file.readline())
    file.close()
    # run cifar10_eval and create predictions vector (formatted as a list)
    predictions = cifar10_eval.map_interface(new_x * new_y)
    del predictions[(new_x * new_y):] # get rid of excess predictions (that are an artefact of the fixed batch size)
    print("# of predictions: " + str(len(predictions)))
    # check that we are mapping the whole picture! (evaluation functions don't necessarily use the full data set)
    if len(predictions) != new_x * new_y:
        print("Error: number of predictions from cifar10_eval does not match metadata for this file")
        return
    # copy predictions to a nested list to make extraction of x/y data easy
    # also eliminates need to keep metadata - x/y dimensions are stored via the shape of the output vector
    stack = []
    for j in range(new_y):
        stack.append([])
        for i in range(new_x):
            stack[j].append(predictions[j*new_x + i])
    predictions = None # clear the variable to free up memory
    # iterate through map list and explode each category to cover more pixels
    # assigns a step_size x step_size area to each classification input to achieve correspondence with original image
    new_stack = []
    for j in range(len(stack)):
        row = stack[j]
        new_row = []
        for i in range(len(row)):
            for a in range(step_size):
                new_row.append(row[i])
        for b in range(step_size):
            new_stack.append(new_row)
    stack = new_stack
    new_stack = None
    new_row = None # clear the variables to free up memory
    # add a border to the image to indicate that some information has been lost
    # border also ensures that map has 1-1 correspondence with original image which makes processing easier
    # calculate border dimensions
    top_and_left_thickness = int((segment_dimension - step_size) / 2)
    right_thickness = int(top_and_left_thickness + (orig_x - (top_and_left_thickness * 2 + step_size * new_x)))
    bottom_thickness = int(top_and_left_thickness + (orig_y - (top_and_left_thickness * 2 + step_size * new_y)))
    print(top_and_left_thickness)
    print(right_thickness)
    print(bottom_thickness)
    print(len(stack[0]))
    # add the right then left borders
    for j in range(len(stack)):
        for b in range(right_thickness):
            stack[j].append(255)
        for b in range(top_and_left_thickness):
            stack[j].insert(0, 255)
    print(stack[0])
    print(len(stack[0]))
    # add the top and bottom borders
    row = []
    for i in range(len(stack[0])):
        row.append(255) # create a blank row
    for b in range(top_and_left_thickness):
        stack.insert(0, row) # append the blank row to the top x many times
    for b in range(bottom_thickness):
        stack.append(row) # append the blank row to the bottom of the map
    # we have our final output
    # repackage this as a numpy array and save for later use
    output = numpy.asarray(stack,numpy.uint8)
    numpy.save(destination_folder + output_file + ".npy", output)
    print("Category mapping complete, map saved as numpy pickle: " + output_file + ".npy")
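Judging from the full code, a likely cause is the "explode" step: new_stack.append(new_row) puts the same list object into new_stack step_size times, so every group of 12 "rows" is really one list. Appending a border value to each of those rows then changes that one list 12 times. A minimal sketch of the effect and of one possible fix, using a made-up two-element row and step_size = 3 to keep things short (the names shared_row, aliased and independent are for illustration only):

```python
step_size = 3  # the question uses 12; 3 keeps the example short

# Same pattern as the "explode" loop: one list object appended repeatedly.
shared_row = [0, 1]
aliased = []
for b in range(step_size):
    aliased.append(shared_row)      # every entry refers to the same list

for j in range(len(aliased)):
    aliased[j].append(255)          # mutates the single shared list each time
    aliased[j].insert(0, 255)

# One possible fix: append a copy so that each row is an independent list.
source_row = [0, 1]
independent = []
for b in range(step_size):
    independent.append(list(source_row))

for j in range(len(independent)):
    independent[j].append(255)
    independent[j].insert(0, 255)

print(aliased[0])       # the border value appears step_size times on each side
print(independent[0])   # the border value appears once on each side
```

In the original function the corresponding change would be new_stack.append(list(new_row)), or new_row[:], inside the for b loop.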
I am trying to find an integer programming formulation for the following problem:
Sort a list of items according to 2 criteria, represented by the following cost:
'Positional cost': dependent on the position of the item in the list
'Neighbour cost': dependent on two items being positioned next to each other in the list (i.e. the items are immediate neighbours)
The following apply:
Every item can only be placed once
The list has a beginning position P_1 (-> the item in that position does not have a preceding list item)
The list has an end position P_X (-> the item in that position does not have a subsequent item)
I used PuLP to formulate as described below and tried solving with GLPK_CMD and COIN_CMD. It works for 3 items, but returns 'Undefined' for 4 or more items.
As far as I can tell, an order such as the one described in the DataFrame 'feasible_solution_example' will not violate the constraints. Can anyone venture a guess why the solver does not find a solution?
Any input and comments are highly appreciated.
import pulp
import numpy as np
import pandas as pd
############# Size of problem: number of items in the list to be sorted
length_of_list=12
list_of_items=['I'+str(x).zfill(3) for x in np.arange(1, length_of_list+1)]
list_of_neighbours=['N_I'+str(x).zfill(3) for x in np.arange(1,length_of_list+1)]
list_of_positions=['P'+str(x).zfill(3) for x in np.arange(1,length_of_list+1)]
############# Cost matrices
random_position_cost=np.random.randint(-length_of_list, length_of_list, [length_of_list,length_of_list])
positional_cost=pd.DataFrame(random_position_cost, index=list_of_items, columns=list_of_positions).astype(int)
random_neighbour_cost=np.random.randint(-length_of_list, length_of_list, [length_of_list,length_of_list])
neigbour_cost=pd.DataFrame(data=random_neighbour_cost, index=list_of_items, columns=list_of_neighbours)
############# This is/should be an example of a feasible solution
feasible_solution_example=pd.DataFrame(index=list_of_items, columns=(list_of_items + list_of_neighbours))
feasible_solution_example.loc[:, list_of_items]=np.identity(length_of_list)
feasible_solution_example.loc[:, list_of_neighbours[0]]=0
feasible_solution_example.loc[:, list_of_neighbours[1:]]=np.identity(length_of_list)[:,:-1]
feasible_solution_example=feasible_solution_example.astype(int)
############# Model definition
model = pulp.LpProblem("Optimal order problem ", pulp.LpMinimize)
# Decision variables:
item_in_position = pulp.LpVariable.dicts("item_in_position ",
((a, b) for a in feasible_solution_example.index for b in list_of_positions),
cat='Binary')
item_neighbours = pulp.LpVariable.dicts("item_neighbours",
((a, b) for a in feasible_solution_example.index for b in list_of_neighbours if str('N_' + a)!=b),
cat='Binary')
# Objective Function
model += (
pulp.lpSum([
(positional_cost.loc[i, j] * item_in_position[(i, j)])
for i in feasible_solution_example.index for j in list_of_positions] +
[(neigbour_cost.loc[i, j] * item_neighbours[(i, j)])
for i in feasible_solution_example.index for j in list_of_neighbours if str('N_' + i)!=j]
)
)
## Constraints:
# 1- Every item can take only one position...
for cur_item in feasible_solution_example.index:
    model += pulp.lpSum([item_in_position[cur_item, j] for j in list_of_positions]) == 1
# 2-...and every position can only be taken once
for cur_pos in list_of_positions:
    model += pulp.lpSum([item_in_position[i, cur_pos] for i in feasible_solution_example.index]) == 1
# 3-...item in position 1 can not be the neighbour (ie the preceding item) to any other item :
# -> all neighbour values for any item x + the value for position 1 for this item must add up to 1
for cur_neighbour in list_of_neighbours:
    model += pulp.lpSum([item_neighbours[cur_item, cur_neighbour] for cur_item in list_of_items if cur_neighbour!=str('N_' + cur_item)] +
                        item_in_position[cur_neighbour.split('_')[1], list_of_positions[0]] ) == 1
# 4-...item in the last position can not have any neighbours (ie any preceding items):
# -> all neighbour values of all items + value for the last position must sum up to 1:
for cur_item in feasible_solution_example.index:
    model += pulp.lpSum(([item_neighbours[cur_item, cur_neighbour] for cur_neighbour in list_of_neighbours if cur_neighbour!=str('N_' + cur_item)] +
                         [item_in_position[cur_item,list_of_positions[-1]]])) ==1
# 5-... When two items are neighbours the cost for that combination needs to be "switched on"
# -> (e.g. I001 in position P001 and I002 in P002 then I001 and N_I002 must be equal to 1)
for item_x in np.arange(0,length_of_list-1):
    for neighbour_y in np.arange(0,length_of_list-1):
        if item_x!=neighbour_y:
            # Item x would be neighbour with lot y IF:
            for cur_position in np.arange(0,length_of_list-1):
                #...they would have subsequent positions...
                model += item_neighbours[list_of_items[item_x], list_of_neighbours[neighbour_y]] +1 >= item_in_position[list_of_items[item_x],list_of_positions[cur_position]] + item_in_position[list_of_items[neighbour_y],list_of_positions[cur_position+1]]
############# Solve
model.solve()
model.solve(pulp.GLPK_CMD())
model.solve(pulp.COIN_CMD())
print(pulp.LpStatus[model.status])
Update:
Thanks for the comments. After removing constraints this works now.
Assume I have the following matrix:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
I want to draw a batch of 2 random rows at each time loop, and send it to a function. For instance, a batch in iteration i can be batch = [[4,5,6], [70,80,90]]
I do the following:
import random
import numpy as np
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
def caclulate_batch(batch):
    pass
for i in range(X.shape[0]//2):
    batch = np.array([])
    for _ in range(2):
        r = random.randint(0, 5)
        batch = np.append(batch, X[r])
    caclulate_batch(batch)
There are two problems here: (1) it returns one flat appended array rather than a batch of two rows; (2) the random number can repeat, so the same row can be chosen more than once. How can I modify the code to fit my requirement?
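r = np.random.randint(0, len(X), 2) should get you the indices (note that it can return the same index twice; np.random.choice(len(X), 2, replace=False) avoids repeats within a batch). That lets you use fancy indexing to get the subset: batch = X[r, :].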
If you want to accumulate arrays along a new dimension, as your loop does, use np.stack or np.block instead of np.append.
(1) You can use numpy.stack instead of append. EDIT: But this function would be called when you have all your batch in a list like:
list = ([1,2], [3,4])
numpy.stack(list)
# gives [[1,2],
# [3,4]]
(2) You can shuffle X array, loop through the results and extract two by two. Look at numpy.random.shuffle
It would look like that:
np.random.shuffle(X)  # shuffles the rows of X in place and returns None
for i in range(X.shape[0]//2):
    batch = X[i*2:i*2+2]  # two consecutive rows of the shuffled array
    caclulate_batch(batch)
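Putting the two suggestions together, here is a minimal sketch that draws batches of two distinct rows without modifying X. It assumes a NumPy version that provides np.random.default_rng; draw_batches is a hypothetical helper name, not something from the question.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
              [70, 80, 90], [45, 43, 68], [112, 87, 245]])

def draw_batches(data, batch_size=2, rng=None):
    """Yield batches of rows so that no row is used twice (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(data))      # each row index appears exactly once
    for start in range(0, len(order) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield data[idx]                     # fancy indexing keeps the 2-D shape

batches = list(draw_batches(X))
for batch in batches:
    print(batch)                            # each batch has shape (2, 3)
```

Because the indices come from a permutation, a row cannot repeat within a pass, and each batch stays a 2-D array instead of a flattened one.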
So presently my code is as follows:
table = []
with open("harrytest.csv") as f:
    for line in f:
        data = line.strip().split(",")
        table.append(data)
transposed = [[table[j][i] for j in range(len(table))] for i in range(len(table[0]))]
openings = transposed[1][1: - 1]
openings = [float(i) for i in openings]
mean = sum(openings)/len(openings)
print(mean)
minimum = min(openings)
print(minimum)
maximum = max(openings)
print(maximum)
range1 = maximum - minimum
print(range1)
This only prints one column of 7 for me, and it also leaves out the bottom line. We are not allowed to use the csv module, numpy or pandas; the only modules allowed are os, sys, math and datetime.
How do I write the code so as to get the median, first and last values for any column?
Change this line:
openings = transposed[1][1: - 1]
to this
openings = transposed[1][1:]
and the last row should appear. Your calculations for mean, min, max and range seem correct.
For the median you have to sort the values and select the one middle element, or the average of the two middle elements if the count is even. The first and last elements are just row[0] and row[-1], where row is the transposed column (openings in your code).
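As a rough sketch of that description, using only built-ins (the median function and the sample values are made up for illustration; in the question the list would be openings):

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0

openings = [3.0, 1.0, 4.0, 1.5]   # made-up stand-in for one column
first = openings[0]               # first value in file order
last = openings[-1]               # last value in file order
print(median(openings))
print(first)
print(last)
```

Sorting a copy with sorted() leaves the original list in file order, so the first and last values are still the first and last rows of the column.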
I am working with a list of points in Python 2.7 and running some interpolations on the data. My list has over 5000 points, and some "x" values are repeated with different corresponding "y" values. I want to get rid of these repeated points so that my interpolation function will work: repeated "x" values with different "y" values raise an error, because the data no longer satisfies the definition of a function. Here is a simple example of what I am trying to do:
Input:
x = [1,1,3,4,5]
y = [10,20,30,40,50]
Output:
xy = [(1,10),(3,30),(4,40),(5,50)]
The interpolation function I am using is InterpolatedUnivariateSpline(x, y)
Have a variable where you store the previous X value; if it is the same as the current value, skip the current value. (This assumes the points are sorted by X, so that repeated values are adjacent.)
For example (pseudo code, you do the python),
int previousX = -1
foreach X
{
    if(x == previousX)
    {/*skip*/}
    else
    {
        InterpolatedUnivariateSpline(x, y)
        previousX = x /*store the x value that will be "previous" in next iteration*/
    }
}
I am assuming you are already iterating, so you don't need the actual Python code.
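For reference, a minimal Python sketch of the same idea. It keeps the first "y" seen for each "x" and tracks seen values in a set, so the repeats do not have to be adjacent; drop_repeated_x is a hypothetical helper name, and the spline call itself is left out.

```python
def drop_repeated_x(x, y):
    """Return (x, y) pairs, keeping only the first pair for each x value."""
    seen = set()
    pairs = []
    for xi, yi in zip(x, y):
        if xi in seen:
            continue            # this x value was already used: skip the point
        seen.add(xi)
        pairs.append((xi, yi))
    return pairs

x = [1, 1, 3, 4, 5]
y = [10, 20, 30, 40, 50]
xy = drop_repeated_x(x, y)
print(xy)  # [(1, 10), (3, 30), (4, 40), (5, 50)]
```

The cleaned pairs can be split back into two sequences with zip(*xy) before they are passed to the interpolation function; if the data is not already ordered by "x", it would need sorting first.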
A bit late but if anyone is interested, here's a solution with numpy and pandas:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x = [1,1,3,4,5]
y = [10,20,30,40,50]
#convert list into numpy arrays:
array_x, array_y = np.array(x), np.array(y)
# sort x and y by x value
order = np.argsort(array_x)
xsort, ysort = array_x[order], array_y[order]
#create a dataframe and add 2 columns for your x and y data:
df = pd.DataFrame()
df['xsort'] = xsort
df['ysort'] = ysort
#create new dataframe (mean) with no duplicate x values and corresponding mean values in all other cols:
mean = df.groupby('xsort').mean()
df_x = mean.index
df_y = mean['ysort']
# poly1d to create a polynomial line from coefficient inputs:
trend = np.polyfit(df_x, df_y, 14)
trendpoly = np.poly1d(trend)
# plot polyfit line (colour and [name of figure] are placeholders for your own values):
plt.plot(df_x, trendpoly(df_x), linestyle=':', dashes=(6, 5), linewidth='0.8',
color=colour, zorder=9, figure=[name of figure])
Also, if you just use argsort() to put the values in order of x, the interpolation should work even without having to delete the duplicate x values. Trying on my own dataset:
polyfit on its own
sorting data in order of x first, then polyfit
sorting data, delete duplicates, then polyfit
... I get the same result twice
I am trying to convert the below Matlab code into C++ using codegen. However it fails at build and I get the error:
"??? Unless 'rows' is specified, the first input must be a vector. If the vector is variable-size, the either the first dimension or the second must have a fixed length of 1. The input [] is not supported. Use a 1-by-0 or 0-by-1 input (e.g., zeros(1,0) or zeros(0,1)) to represent the empty set."
It then points to [id,m,n] = unique(id); being the culprit. Why doesn't it build and what's the best way to fix it?
function [L,num,sz] = label(I,n) %#codegen
% Check input arguments
error(nargchk(1,2,nargin));
if nargin==1, n=8; end
assert(ndims(I)==2,'The input I must be a 2-D array')
sizI = size(I);
id = reshape(1:prod(sizI),sizI);
sz = ones(sizI);
% Indexes of the adjacent pixels
vec = @(x) x(:);
if n==4 % 4-connected neighborhood
idx1 = [vec(id(:,1:end-1)); vec(id(1:end-1,:))];
idx2 = [vec(id(:,2:end)); vec(id(2:end,:))];
elseif n==8 % 8-connected neighborhood
idx1 = [vec(id(:,1:end-1)); vec(id(1:end-1,:))];
idx2 = [vec(id(:,2:end)); vec(id(2:end,:))];
idx1 = [idx1; vec(id(1:end-1,1:end-1)); vec(id(2:end,1:end-1))];
idx2 = [idx2; vec(id(2:end,2:end)); vec(id(1:end-1,2:end))];
else
error('The second input argument must be either 4 or 8.')
end
% Create the groups and merge them (Union/Find Algorithm)
for k = 1:length(idx1)
root1 = idx1(k);
root2 = idx2(k);
while root1~=id(root1)
id(root1) = id(id(root1));
root1 = id(root1);
end
while root2~=id(root2)
id(root2) = id(id(root2));
root2 = id(root2);
end
if root1==root2, continue, end
% (The two pixels belong to the same group)
N1 = sz(root1); % size of the group belonging to root1
N2 = sz(root2); % size of the group belonging to root2
if I(root1)==I(root2) % then merge the two groups
if N1 < N2
id(root1) = root2;
sz(root2) = N1+N2;
else
id(root2) = root1;
sz(root1) = N1+N2;
end
end
end
while 1
id0 = id;
id = id(id);
if isequal(id0,id), break, end
end
sz = sz(id);
% Label matrix
isNaNI = isnan(I);
id(isNaNI) = NaN;
[id,m,n] = unique(id);
I = 1:length(id);
L = reshape(I(n),sizI);
L(isNaNI) = 0;
if nargout>1, num = nnz(~isnan(id)); end
Just an FYI, if you are using MATLAB R2013b or newer, you can replace error(nargchk(1,2,nargin)) with narginchk(1,2).
As the error message says, for codegen unique requires that the input be a vector unless 'rows' is passed.
If you look at the report (click the "Open report" link that is shown) and hover over id you will likely see that its size is neither 1-by-N nor N-by-1. The requirement for unique can be seen if you search for unique here:
http://www.mathworks.com/help/coder/ug/functions-supported-for-code-generation--alphabetical-list.html
You could do one of a few things:
Make id a vector and treat it as a vector for the computation. Instead of the declaration:
id = reshape(1:prod(sizI),sizI);
you could use:
id = 1:numel(I);
Then id would be a row vector.
You could also keep the code as is and do something like:
[idtemp,m,n] = unique(id(:));
id = reshape(idtemp,size(id));
Obviously, this will cause a copy, idtemp, to be made but it may involve fewer changes to your code.
Remove the anonymous function stored in the variable vec and make vec a subfunction:
function y = vec(x)
coder.inline('always');
y = x(:);
Without the 'rows' option, the input to the unique function is always interpreted as a vector, and the output is always a vector, anyway. So, for example, something like id = unique(id) would have the effect of id = id(:) if all the elements of the matrix id were unique. There is no harm in making the input a vector going in. So change the line
[id,m,n] = unique(id);
to
[id,m,n] = unique(id(:));