In MATLAB, I have the following data:
mass = [ 23 45 44]
velocity = [34 53 32]
time = [1 2 3]
acceleration = [32 22 12]
speed = [12 33 44]
What I'm trying to achieve is to use uicontrol to create two list boxes containing these variable names (mass, velocity, time, acceleration, speed), so that clicking one of the names (e.g. mass) in either list outputs its numerical data, like mass = 23 45 44.
Output: numerical data stored in these variables
Here is the code:
function learnlists()
figure;
yourcell={'mass','velocity','time','acceleration','speed'}
hb = uicontrol('Style', 'listbox','Position',[100 100 200 200],...
'string',yourcell,'Callback',@measurements)
yourcell={'mass','velocity','time','acceleration','speed'}
hc = uicontrol('Style', 'listbox','Position',[300 100 200 200],...
'string',yourcell,'Callback',@measurements)
function [out] = measurements(hb,evnt)
outvalue = get(hb,'value');
v = get(hb,'value')
if v == 1
mass = [1 2 3 4 5]
elseif v == 2
velocity = [ 1 2 3 4 5]
end
end
end
Thanks,
Amanda
I suggest not using a function, to keep things simpler, and keeping all the variables in your base workspace.
Here is an example for one list box:
mass = [ 23 45 44];
velocity = [34 53 32];
time = [1 2 3];
acceleration = [32 22 12];
speed = [12 33 44];
figure;
yourcell = {'mass','velocity','time','acceleration','speed'};
hb = uicontrol('Style', 'listbox','Position',[100 100 200 200],...
'string',yourcell,'Callback',...
['switch get(hb, ''Value''), ',...
'case 1, mass, ',...
'case 2, velocity, ',...
'case 3, time, ',...
'case 4, acceleration, ',...
'case 5, speed, ',...
'end']);
However, this displays the values in the Command Window; you could change the code to show them in a text box in your GUI.
You can also execute a script as the Callback function.
hb = uicontrol('Style', 'listbox','Position',[100 100 200 200],...
'string',yourcell,'Callback', 'myScript');
and then create a script file named myScript.m in your current folder:
switch get(hb, 'Value')
case 1
mass
case 2
velocity
case 3
time
case 4
acceleration
case 5
speed
end
Note that everything is still in your base workspace.
Hope it helps.
I have a big (600,600,600) numpy array filled with my data. Now I would like to extract regions from this with a given width around an arbitrary line through the box.
For the line I have the x, y and z coordinates of every point in separate numpy arrays. So let's say the line has 35 points in the data box, then the x, y and z arrays each have lengths of 35 as well. I can extract the points along the line itself by using indexing like this
extraction = data[z,y,x]
Now ideally I'd like to extract a box around it by doing something like the following
extraction = data[z-3:z+3,y-3:y+3,x-3:x+3]
but because x, y and z are arrays, this is not possible. The only way I could think of doing this is a for-loop over the points:
extraction = np.array([])
for i in range(len(x)):
    extraction = np.append(extraction,data[z[i]-3:z[i]+3,y[i]-3:y[i]+3,x[i]-3:x[i]+3])
and then reshape the extraction array afterwards. However, this is very slow, and neighboring slices overlap, which I'd like to avoid.
Is there a simple way to do this directly without a for-loop?
EDIT:
Let me rephrase the question through another idea I came up with, which is also slow. I have a line running through the datacube, and I have lists of x, y and z coordinates (the coordinates being the indices in the datacube array) of all the points that define the line.
As an example these lists look like this:
x_line: [345 345 345 345 342 342 342 342 342 342 342 342 342 342 342 342]
y_line: [540 540 540 543 543 543 543 546 546 546 549 549 549 549 552 552]
z_line: [84 84 84 87 87 87 87 87 90 90 90 90 90 93 93 93]
As you can see, some of these coordinates are identical, due to the lines being defined in different coordinates and then binned to the resolution of the data box.
Now I want to mask out all cells in the datacube whose distance from the line is larger than 3 cells.
For a single point along the line (x_line[i], y_line[i], z_line[i]) this is relatively easy. I created a meshgrid of the coordinates in the datacube, then created a mask array of zeros and set everything satisfying the condition to 1:
data = np.random.rand(600,600,600)
# indexing='ij' keeps the axis order (z, y, x), matching data[z, y, x]
z_box,y_box,x_box = np.meshgrid(np.arange(600),np.arange(600),np.arange(600),indexing='ij')
mask = np.zeros(np.shape(data))
for i in range(len(x_line)):
    distance = np.sqrt((x_box-x_line[i])**2 + (y_box-y_line[i])**2 + (z_box-z_line[i])**2)
    mask[distance <= 3] = 1.
extraction = data[mask == 1.]
The advantage of this is that the mask array removes the problem of having duplicate extractions. However, both the meshgrid and distance calculations are very slow. So is it possible to do the calculation of the distance directly on the entire line without having to do a for-loop over each line point, so that it directly masks all cells that are within a distance of 3 cells from ANY of the line points?
How about this?
# .shape = (N,)
x, y, z = ...
# offsets in [-3, 3), .shape = (6, 6, 6)
xo, yo, zo = np.indices((6, 6, 6)) - 3
# box indices, .shape = (6, 6, 6, N)
xb, yb, zb = x + xo[...,np.newaxis], y + yo[...,np.newaxis], z + zo[...,np.newaxis]
# .shape = (6, 6, 6, N)
extractions = data[xb, yb, zb]
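This extracts a series of 6x6x6 cubes, each "centered" on the coordinates in x, y, and z.
This will produce duplicate coordinates, and it fails near the borders (negative indices wrap around to the other side of the array, and indices past the end raise an IndexError).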
If you keep your xyz in one array, this gets a little less verbose, and you can remove the duplicates:
# .shape = (N,3)
xyz = ...
# offsets in [-3, 3), .shape = (6, 6, 6, 3)
xyz_offset = np.moveaxis(np.indices((6, 6, 6)) - 3, 0, -1)
# box indices, .shape = (6, 6, 6, N, 3)
xyz_box = xyz + xyz_offset[...,np.newaxis,:]
remove_duplicates = True
if remove_duplicates:
    # shape (M, 3)
    xyz_box = xyz_box.reshape(-1, 3)
    xyz_box = np.unique(xyz_box, axis=0)
# move the coordinate axis to the front so it unpacks into three index arrays
xb, yb, zb = np.moveaxis(xyz_box, -1, 0)
extractions = data[xb, yb, zb]
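As a sketch of the distance mask from the EDIT without looping over the whole grid, the offset idea above can be restricted to a sphere: precompute every integer offset within radius 3, add them to all line points at once, discard cells outside the box, and set those cells in a boolean mask. This is a minimal sketch, not a tested solution for the 600x600x600 case; the function name `line_neighborhood_mask` is hypothetical, and it assumes the line coordinates are integer indices in (z, y, x) order, as in data[z, y, x].

```python
import numpy as np


def line_neighborhood_mask(shape, z_line, y_line, x_line, radius=3):
    """Boolean mask of all cells within `radius` cells (Euclidean distance)
    of ANY point on the line. Coordinates are in (z, y, x) order, matching
    data[z, y, x]. Hypothetical helper, for illustration only."""
    r = int(np.ceil(radius))
    # All integer offsets in the cube [-r, r]^3, as an array of shape (K, 3)
    offsets = np.indices((2 * r + 1,) * 3).reshape(3, -1).T - r
    # Keep only the offsets that lie inside the sphere of the given radius
    offsets = offsets[(offsets ** 2).sum(axis=1) <= radius ** 2]
    # Line points as an (N, 3) array
    points = np.stack([z_line, y_line, x_line], axis=1)
    # Every line point plus every offset: (N, K, 3) -> (N*K, 3)
    cells = (points[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    # Drop cells outside the box instead of letting them wrap around or raise
    inside = np.all((cells >= 0) & (cells < np.array(shape)), axis=1)
    cells = cells[inside]
    mask = np.zeros(shape, dtype=bool)
    # Duplicate cells (overlapping spheres, repeated line points) are harmless
    mask[cells[:, 0], cells[:, 1], cells[:, 2]] = True
    return mask
```

For the question's data this would be `mask = line_neighborhood_mask(data.shape, z_line, y_line, x_line)` followed by `extraction = data[mask]`. Because the mask is boolean, overlapping neighborhoods are counted only once, and only N times (number of offsets) cells are touched instead of the full grid per line point.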
I have the following data frame:
df1 <- data.frame(x1=c(1,2,3,4), x2=c(10,20,30,40), x3=c(100,200,300,400))
And I want to generate all the possible data frames that can be created by combining df1$x1, df1$x2 and df1$x3 in different orders, so 4^3 different data frames, e.g.:
x1 x2 x3
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
x1 x2 x3
1 1 40 400
2 2 30 300
3 3 20 200
4 4 10 100
and so on. For each of them I want to compute the following function:
my.function <- function(x1, x2, x3) {
  sum(0.3*x1^2+0.3*x2^2+0.4*x3)/length(x1)
}
I did this, but it's clearly wrong:
res1 <- rep(NA, nrow(df1)^3)
for(i in 1:nrow(df1)){
  for(j in 1:nrow(df1)){
    for(k in 1:nrow(df1)){
      x1.1 <- as.vector(c(df1[-i, 1], df1[i, 1]))
      x2.1 <- as.vector(c(df1[-k, 2], df1[k, 2]))
      x3.1 <- as.vector(c(df1[-j, 3], df1[j, 3]))
      res1[nrow(df1)^2*(i-1) + nrow(df1)*(j-1)+k] <- my.function(x1.1, x2.1, x3.1)
    }
  }
}
I tried to find a problem similar to mine without much luck. Could you please help me?
Thank you so much!!!
What's a simple and efficient way to shuffle a dataframe in pandas, by rows or by columns? I.e. how to write a function shuffle(df, n, axis=0) that takes a dataframe, a number of shuffles n, and an axis (axis=0 is rows, axis=1 is columns) and returns a copy of the dataframe that has been shuffled n times.
Edit: key is to do this without destroying the row/column labels of the dataframe. If you just shuffle df.index that loses all that information. I want the resulting df to be the same as the original except with the order of rows or order of columns different.
Edit2: My question was unclear. When I say shuffle the rows, I mean shuffle each row independently. So if you have two columns a and b, I want each row shuffled on its own, so that you don't have the same associations between a and b as you do if you just re-order each row as a whole. Something like:
for 1...n:
    for each col in df: shuffle column
return new_df
But hopefully more efficient than naive looping. This does not work for me:
def shuffle(df, n, axis=0):
    shuffled_df = df.copy()
    for k in range(n):
        shuffled_df.apply(np.random.shuffle(shuffled_df.values),axis=axis)
    return shuffled_df
df = pandas.DataFrame({'A':range(10), 'B':range(10)})
shuffle(df, 5)
Use numpy's random.permutation function:
In [1]: df = pd.DataFrame({'A':range(10), 'B':range(10)})
In [2]: df
Out[2]:
A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
In [3]: df.reindex(np.random.permutation(df.index))
Out[3]:
A B
0 0 0
5 5 5
6 6 6
3 3 3
8 8 8
7 7 7
9 9 9
1 1 1
2 2 2
4 4 4
Sampling randomizes, so just sample the entire data frame.
df.sample(frac=1)
As @Corey Levinson notes, you have to be careful when you reassign a single column: assignment aligns on the index, so drop the sampled Series' index first:
df['column'] = df['column'].sample(frac=1).reset_index(drop=True)
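A small sketch of why the reset matters, using a made-up one-column frame: assignment to a column aligns on the index, so a sampled Series that keeps its labels lands back in the original order. `random_state` is only there to make the run repeatable, and the fix assumes the frame has the default 0..n-1 index, which is what reset_index(drop=True) produces.

```python
import pandas as pd

df = pd.DataFrame({'column': [10, 20, 30, 40, 50]})

# Naive reassignment: the sampled Series keeps its original index labels,
# so pandas aligns it back and the column ends up unchanged.
naive = df.copy()
naive['column'] = naive['column'].sample(frac=1, random_state=0)

# Dropping the index first makes the assignment positional, so the
# shuffled order is kept (assumes df has the default RangeIndex).
fixed = df.copy()
fixed['column'] = fixed['column'].sample(frac=1, random_state=0).reset_index(drop=True)

print(naive['column'].tolist())  # [10, 20, 30, 40, 50]
print(fixed['column'].tolist())  # shuffled order
```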
In [16]: def shuffle(df, n=1, axis=0):
...:     df = df.copy()
...:     for _ in range(n):
...:         df.apply(np.random.shuffle, axis=axis)
...:     return df
...:
In [17]: df = pd.DataFrame({'A':range(10), 'B':range(10)})
In [18]: shuffle(df)
In [19]: df
Out[19]:
A B
0 8 5
1 1 7
2 7 3
3 6 2
4 3 4
5 0 1
6 9 0
7 4 6
8 2 8
9 5 9
You can use sklearn.utils.shuffle() (requires sklearn 0.16.1 or higher to support Pandas data frames):
# Generate data
import pandas as pd
df = pd.DataFrame({'A':range(5), 'B':range(5)})
print('df: {0}'.format(df))
# Shuffle Pandas data frame
import sklearn.utils
df = sklearn.utils.shuffle(df)
print('\n\ndf: {0}'.format(df))
outputs:
df: A B
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
df: A B
1 1 1
0 0 0
3 3 3
4 4 4
2 2 2
Then you can use df.reset_index() to reset the index column, if needed:
df = df.reset_index(drop=True)
print('\n\ndf: {0}'.format(df))
outputs:
df: A B
0 1 1
1 0 0
2 4 4
3 2 2
4 3 3
A simple solution in pandas is to use the sample method independently on each column. Use apply to iterate over each column:
df = pd.DataFrame({'a':[1,2,3,4,5,6], 'b':[1,2,3,4,5,6]})
df
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
df.apply(lambda x: x.sample(frac=1).values)
a b
0 4 2
1 1 6
2 6 5
3 5 3
4 2 4
5 3 1
You must use .values so that you return a NumPy array and not a Series; otherwise the returned Series is aligned to the original DataFrame by index and nothing changes:
df.apply(lambda x: x.sample(frac=1))
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
From the docs, use sample():
In [79]: s = pd.Series([0,1,2,3,4,5])
# When no arguments are passed, returns 1 row.
In [80]: s.sample()
Out[80]:
0 0
dtype: int64
# One may specify either a number of rows:
In [81]: s.sample(n=3)
Out[81]:
5 5
2 2
4 4
dtype: int64
# Or a fraction of the rows:
In [82]: s.sample(frac=0.5)
Out[82]:
5 5
4 4
1 1
dtype: int64
I resorted to adapting @root's answer slightly and using the raw values directly. Of course, this means you lose the ability to do fancy indexing, but it works perfectly for just shuffling the data.
In [1]: import numpy
In [2]: import pandas
In [3]: df = pandas.DataFrame({"A": range(10), "B": range(10)})
In [4]: %timeit df.apply(numpy.random.shuffle, axis=0)
1000 loops, best of 3: 406 µs per loop
In [5]: %%timeit
...: for view in numpy.rollaxis(df.values, 1):
...:     numpy.random.shuffle(view)
...:
10000 loops, best of 3: 22.8 µs per loop
In [6]: %timeit df.apply(numpy.random.shuffle, axis=1)
1000 loops, best of 3: 746 µs per loop
In [7]: %%timeit
...: for view in numpy.rollaxis(df.values, 0):
...:     numpy.random.shuffle(view)
...:
10000 loops, best of 3: 23.4 µs per loop
Note that numpy.rollaxis brings the specified axis to the first dimension and then lets us iterate over arrays with the remaining dimensions. That is, if we want to shuffle along the first dimension (within each column), we need to roll the second dimension to the front, so that the shuffling is applied to views along the first dimension.
In [8]: numpy.rollaxis(df, 0).shape
Out[8]: (10, 2) # we can iterate over 10 arrays with shape (2,) (rows)
In [9]: numpy.rollaxis(df, 1).shape
Out[9]: (2, 10) # we can iterate over 2 arrays with shape (10,) (columns)
Your final function then uses a trick to bring the result in line with the expectation for applying a function to an axis:
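import numpy

def shuffle(df, n=1, axis=0):
    df = df.copy()
    axis = int(not axis)  # pandas.DataFrame is always 2D
    for _ in range(n):
        # shuffles df's data in place; assumes df.values is a writable view
        # of the frame's data (true for a single-dtype frame in older pandas)
        for view in numpy.rollaxis(df.values, axis):
            numpy.random.shuffle(view)
    return df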
This might be more useful when you want your index shuffled.
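import random

def shuffle(df):
    index = list(df.index)
    random.shuffle(index)
    df = df.loc[index]  # .ix is deprecated; .loc selects rows by label
    # drop=True discards the old labels; omit it to keep them as a column
    df = df.reset_index(drop=True)
    return df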
It selects a new DataFrame using the shuffled index, then resets the index.
I know the question is about a pandas df, but if the shuffle occurs by row (column order changed, row order unchanged), the column names no longer matter, and it could be interesting to use an np.array instead; then np.apply_along_axis() is what you are looking for.
If that is acceptable, this should be helpful; note that it is easy to switch the axis along which the data is shuffled.
If your pandas DataFrame is named df, you can:
get the values of the dataframe with values = df.values,
create an np.array from values
apply the method shown below to shuffle the np.array by row or column
recreate a new (shuffled) pandas df from the shuffled np.array
Original array
a = np.array([[10, 11, 12], [20, 21, 22], [30, 31, 32],[40, 41, 42]])
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
Keep row order, shuffle columns within each row
print(np.apply_along_axis(np.random.permutation, 1, a))
[[11 12 10]
[22 21 20]
[31 30 32]
[40 41 42]]
Keep column order, shuffle rows within each column
print(np.apply_along_axis(np.random.permutation, 0, a))
[[40 41 32]
[20 31 42]
[10 11 12]
[30 21 22]]
Original array is unchanged
print(a)
[[10 11 12]
[20 21 22]
[30 31 32]
[40 41 42]]
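If you are on NumPy 1.20 or newer, `Generator.permuted` shuffles every slice along an axis independently in one call, which gives the per-column (or per-row) shuffle from the question's Edit2 without looping over columns in Python. A minimal sketch that rebuilds the DataFrame so the index and column labels survive; the `seed` parameter is an addition for reproducibility, not part of the question's signature.

```python
import numpy as np
import pandas as pd


def shuffle(df, n=1, axis=0, seed=None):
    """Return a copy of df with values shuffled independently within each
    column (axis=0) or within each row (axis=1); labels are kept."""
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    for _ in range(n):
        # permuted() shuffles each slice along `axis` independently and
        # returns a new array, so the original frame is never modified
        values = rng.permuted(values, axis=axis)
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({'A': range(10), 'B': range(10)})
print(shuffle(df, 5, seed=0))
```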
Here is a workaround I found if you want to shuffle only a subset of the DataFrame (here, the first 20 rows):
shuffle_to_index = 20
df = pd.concat([df.iloc[np.random.permutation(range(shuffle_to_index))], df.iloc[shuffle_to_index:]])
Suppose the address of A[10][10] is 40000, a double takes 16 bytes, and byte addressing is used. What is the address of A[40][50]?
I am just trying to calculate the address of a single element of a 2D array and wanted to double-check that I plugged the right values into the equation:
BA + [n * (i - LBR) + (j - LBC)] * w
40000 +[10*(40-0)+(50-0)]*16
40000+[10*(40)+(50)]*16
40000+[900]*16 = 54400
Did I apply the formula correctly here? I wasn't sure whether I plugged in the right values.
In C++ a 2D array is just an array of arrays, so (taking A to have 100 rows and 100 columns, as the numbers below assume) the memory of A is laid out as
A[ 0][ 0] A[ 0][ 1] A[ 0][ 2] ... A[ 0][99]
A[ 1][ 0] A[ 1][ 1] A[ 1][ 2] ... A[ 1][99]
...
A[99][ 0] A[99][ 1] A[99][ 2] ... A[99][99]
where each row just follows the previous one in memory.
The address of an element at (row, col) is
(unsigned char *)(&A[0][0]) + (row*row_size + col) * element_size
In your case you know that the element you are searching is 30 rows lower and 40 elements to the right of given element, therefore the address will be
40000 + ((40 - 10)*100 + (50 - 10)) * 16
which totals 88640.
You can get to the same result by subtracting the relative address of element (10, 10) from the given address (to find the start of the array) and then by adding the relative address of (40, 50).
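The two-step calculation just described can be sketched in Python. It assumes, as the answer does, row-major storage with 100 columns and the question's 16-byte elements; `address_from_known` is a hypothetical helper name.

```python
def address_from_known(known_addr, known_rc, target_rc, ncols, elem_size):
    """Row-major byte address of target_rc, given the address of known_rc."""
    r0, c0 = known_rc
    r1, c1 = target_rc
    # Step back from the known element to the start of the array, &A[0][0] ...
    start = known_addr - (r0 * ncols + c0) * elem_size
    # ... then step forward to the target element
    return start + (r1 * ncols + c1) * elem_size


print(address_from_known(40000, (10, 10), (40, 50), ncols=100, elem_size=16))  # 88640
```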
The answer depends on whether you are using row-major or column-major ordering. In row-major ordering the data is stored row by row; in column-major ordering it is stored column by column. Consider the following 2D array stored in memory:
11 22 33
44 55 66
77 88 99
In row major ordering the elements are stored contiguously as 11,22,33,44,55,66,77,88,99.
In column-major ordering the elements are stored contiguously as 11,44,77,22,55,88,33,66,99.
The meaning of the following equation:
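To make the two orderings concrete, this sketch lays out the 3x3 example both ways and gives the matching zero-based offset formulas (row-major: r * ncols + c; column-major: c * nrows + r).

```python
matrix = [[11, 22, 33],
          [44, 55, 66],
          [77, 88, 99]]
nrows, ncols = len(matrix), len(matrix[0])

# Row-major: walk each row left to right
row_major = [matrix[r][c] for r in range(nrows) for c in range(ncols)]
# Column-major: walk each column top to bottom
col_major = [matrix[r][c] for c in range(ncols) for r in range(nrows)]


def row_major_offset(r, c):
    return r * ncols + c


def col_major_offset(r, c):
    return c * nrows + r


print(row_major)  # [11, 22, 33, 44, 55, 66, 77, 88, 99]
print(col_major)  # [11, 44, 77, 22, 55, 88, 33, 66, 99]
```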
BA + [n * (i - LBR) + (j - LBC)] * w
If you have an array A[n][n] and you know that the address of entry A[LBR][LBC] is BA, then the address of A[i][j] can be calculated as follows. Assuming n = 6:
00 01 02 03 04 05
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 33 34 35
40 41 42 43 44 45
50 51 52 53 54 55
Here, suppose we know that the address of A[2,1] is 1000, and we need to calculate the address of A[4,2]. To reach [4,2] from [2,1], how many entries do we have to travel? Of course, as @Deepu specifies, we can do it in two ways: travel row-wise or column-wise. From the equation, it appears that row-wise travel has been chosen.
22 to 25 (4)
30 to 35 (6)
40 to 42 (3)
= 13 entries.
Hence address of A[4,2] = 1000 + 13*(numOfbytes per entry)
To verify with the equation,
i - LBR = 4 - 2 = 2.
j - LBC = 2 - 1 = 1.
Hence, n*( i - LBR ) + (j - LBC) = 6*2 + 1 = 13.
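The equation can be checked directly against the 13-entry count. In this sketch `element_address` is a hypothetical name, the parameters follow the equation's symbols, and the element sizes (w = 1 and w = 8) are purely illustrative since the example does not fix one.

```python
def element_address(BA, n, i, j, LBR, LBC, w):
    """Row-major address of A[i][j] when A[LBR][LBC] is at address BA,
    each row holds n elements and each element takes w bytes."""
    entries = n * (i - LBR) + (j - LBC)  # elements travelled row-wise
    return BA + entries * w


# The n = 6 example: from A[2][1] at address 1000 to A[4][2] is 13 entries
print(element_address(1000, 6, 4, 2, 2, 1, w=1))  # 1013
print(element_address(1000, 6, 4, 2, 2, 1, w=8))  # 1104
```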
I'd like to be able to calculate the 'mean brightest point' in a line of pixels. It's for a primitive 3D scanner.
For testing, I simply stepped through the pixels, and if the current pixel was brighter than the previous one, I set the brightest point of that line to the current pixel. This of course gives very jittery results throughout the image(s).
I'd like to get the 'average center of the brightness' instead, if that makes sense.
This has to be a common thing; I'm simply lacking the right words for a Google search.
Calculate the intensity-weighted average of the offset.
Given your example's intensities (guessed) and offsets:
intensity:  0  0  0  0  1  3  2  3  1  0  0  0  0  0
offset:     1  2  3  4  5  6  7  8  9 10 11 12 13 14
this would give you (5+3*6+2*7+3*8+9)/(1+3+2+3+1) = 7
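For a whole camera frame you can compute this centroid for every line at once with NumPy. This sketch assumes the image is a 2-D array with one scan line per row and uses 0-based column offsets, so the example line above gives 6 (add 1 to match the 1-based offsets used there); rows with no light at all come back as NaN. `row_centroids` is a hypothetical name.

```python
import numpy as np


def row_centroids(image):
    """Intensity-weighted mean column index for each row of a 2-D image."""
    img = np.asarray(image, dtype=float)
    cols = np.arange(img.shape[1])  # 0-based column offsets
    totals = img.sum(axis=1)
    # All-dark rows give 0/0; suppress the warning and return NaN for them
    with np.errstate(invalid="ignore", divide="ignore"):
        return (img * cols).sum(axis=1) / totals


print(row_centroids([[0, 0, 0, 0, 1, 3, 2, 3, 1, 0, 0, 0, 0, 0]]))  # [6.]
```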
You're looking for 1D filtering, in which you slide a window (a filter) along the line of pixels; for linear filters such as the mean this is a 1D convolution. For example, you can use a median filter (example borrowed from Wikipedia):
x = [2 80 6 3]
y[1] = Median[2 2 80] = 2
y[2] = Median[2 80 6] = Median[2 6 80] = 6
y[3] = Median[80 6 3] = Median[3 6 80] = 6
y[4] = Median[6 3 3] = Median[3 3 6] = 3
so
y = [2 6 6 3]
So here, the window size is 3, since you're looking at 3 pixels at a time and replacing the pixel at the center of this window with the median. A window of 3 means we look at the pixel immediately before and the pixel immediately after the pixel we're currently evaluating; a window of 5 means 2 pixels before and after, and so on.
For a mean filter, you do the same thing, except you replace the center pixel with the average of all the values in the window, i.e.
x = [2 80 6 3]
y[1] = Mean[2 2 80] = 28
y[2] = Mean[2 80 6] = 29.333
y[3] = Mean[80 6 3] = 29.667
y[4] = Mean[6 3 3] = 4
so
y = [28 29.333 29.667 4]
So for your problem, the maximum of the filtered signal, y[3], is the "mean brightest point".
Note how the borders are handled for y[1] (no pixels before it) and y[4] (no pixels after it): this example replicates the pixel at the border. In general, we "pad" an image with replicated or constant borders, filter the image, and then remove those borders.
This is a standard operation which you'll find in many computational packages.
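Both filters from the example can be reproduced with NumPy (1.20 or newer for `sliding_window_view`). This sketch pads with replicated edge values, as the example does, and uses argmax on the mean-filtered signal to pick the brightest position; indices here are 0-based.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([2, 80, 6, 3], dtype=float)
window = 3
# Replicate the border pixels, as in the example above
padded = np.pad(x, window // 2, mode="edge")

# Mean filter: convolution with a uniform kernel
mean_filtered = np.convolve(padded, np.ones(window) / window, mode="valid")
# Median filter: median over each sliding window (not a convolution)
median_filtered = np.median(sliding_window_view(padded, window), axis=1)

brightest = int(np.argmax(mean_filtered))  # 0-based index
print(mean_filtered)    # approximately 28, 29.333, 29.667, 4
print(median_filtered)  # [2. 6. 6. 3.]
print(brightest)        # 2, i.e. y[3] in the 1-based notation above
```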
Your problem is like the longest-sequence problem: once you have determined the sequence (its starting point and length), all that remains is finding its median position, which is the central element.
To find the sequence, you need a definition of bright and dark, either relative (compared with the previous value or a couple of previous values) or absolute (a fixed threshold).
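A sketch of the absolute-threshold variant: scan the line once, track the longest run of pixels brighter than the threshold, and return the middle of that run. The function name and the choice of a strict `>` comparison are assumptions; ties go to the first run found.

```python
def brightest_run_center(row, threshold):
    """Center index (0-based) of the longest run of values above
    `threshold`, or None if no value exceeds it."""
    best_start, best_len = None, 0
    run_start = None
    for i, value in enumerate(row):
        if value > threshold:
            if run_start is None:
                run_start = i  # a bright run begins here
            length = i - run_start + 1
            if length > best_len:
                best_start, best_len = run_start, length
        else:
            run_start = None  # a dark pixel ends the current run
    if best_start is None:
        return None
    return best_start + (best_len - 1) / 2  # middle of the longest run


row = [0, 0, 0, 0, 1, 3, 2, 3, 1, 0, 0, 0, 0, 0]
print(brightest_run_center(row, 0))  # 6.0
```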