To whom it may concern,
I am pretty new to TensorFlow. I am trying to solve the famous MNIST problem with a CNN, but I have run into difficulty when I have to reshuffle the x_training data (a [40000, 28, 28, 1]-shaped array).
My code is below:
x_train_final = tf.reshape(x_train_final, [-1, image_width, image_width, 1])
x_train_final = tf.cast(x_train_final, dtype=tf.float32)
perm = np.arange(num_training_example).astype(np.int32)
np.random.shuffle(perm)
x_train_final = x_train_final[perm]
The following error is raised:
ValueError: Shape must be rank 1 but is rank 2 for 'strided_slice_1371' (op: 'StridedSlice') with input shapes: [40000,28,28,1], [1,40000], [1,40000], [1].
Can anyone advise how I can work around this? Thanks.
I would suggest using scikit-learn's shuffle function, applied to the NumPy arrays before they are converted to tensors:
from sklearn.utils import shuffle
x_train_final = shuffle(x_train_final)
You can also pass in multiple arrays, and shuffle will reorder all of them with the same permutation, so corresponding rows stay aligned. This means you can pass in your label array as well.
For example:
X_train, y_train = shuffle(X_train, y_train)
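If you'd rather not depend on scikit-learn, the same idea can be sketched with NumPy alone: draw one permutation and index every array with it, before the data is turned into a tensor. A TensorFlow tensor does not accept a NumPy index array the way a NumPy array does, which is what the StridedSlice error above is complaining about. The helper name shuffle_together is hypothetical. If the data is already a tensor, tf.gather(x_train_final, perm) gathers rows by index instead.

```python
import numpy as np


def shuffle_together(*arrays, seed=None):
    """Shuffle several arrays along axis 0 with one shared permutation.

    Hypothetical helper: rows that belong together (an image and its label)
    stay aligned after shuffling.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length along axis 0")
    perm = np.random.default_rng(seed).permutation(n)
    return tuple(np.asarray(a)[perm] for a in arrays)


# Small stand-in for MNIST-shaped data: 8 images of 28x28x1, image i filled with i.
x = np.stack([np.full((28, 28, 1), i, dtype=np.float32) for i in range(8)])
y = np.arange(8)
x_shuffled, y_shuffled = shuffle_together(x, y, seed=0)
print(x_shuffled.shape)  # (8, 28, 28, 1)
```

Because every array is indexed with the same perm, image k in x_shuffled still carries label k in y_shuffled.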
Related
I'm having difficulty setting up a neural network in Keras. Please help me!
This is my code, and I get random values every time I predict.
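from keras.models import Sequential
from keras.layers import Dense, Activation

# xtrain: shape (n_samples, 5); ytrain: shape (n_samples,)
model = Sequential()
layer1 = Dense(5, input_shape = (5,))
model.add(layer1)
model.add(Activation('relu'))
layer2 = Dense(1)
model.add(layer2)
model.add(Activation('relu'))  # note: ReLU on the output clips predictions at 0
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(xtrain, ytrain, verbose=1)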
I have 5 input features and want to predict a single continuous value as the output.
The problem was that I was getting random predictions for the same input. I have now found the solution: it was happening because I was not normalising the features.
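The post doesn't say which normalisation fixed it; a common choice is per-feature standardisation (z-scores). Below is a minimal NumPy sketch, assuming xtrain is an array of shape (n_samples, 5): the mean and standard deviation come from the training data only, and the same transform is applied to any input later passed to predict. The function names are hypothetical.

```python
import numpy as np


def fit_standardiser(x_train, eps=1e-8):
    """Per-feature mean and std from the training data (std floored at eps)."""
    x_train = np.asarray(x_train, dtype=np.float64)
    mean = x_train.mean(axis=0)
    std = np.maximum(x_train.std(axis=0), eps)  # avoid division by zero
    return mean, std


def standardise(x, mean, std):
    """Apply the training-set statistics to any batch of inputs."""
    return (np.asarray(x, dtype=np.float64) - mean) / std


# Made-up data: 5 features on very different scales.
rng = np.random.default_rng(0)
xtrain = rng.normal(loc=[100, 0.5, 3, 2000, -7], scale=[20, 0.1, 1, 500, 2], size=(200, 5))
mean, std = fit_standardiser(xtrain)
xtrain_scaled = standardise(xtrain, mean, std)
# model.fit(xtrain_scaled, ytrain); later: model.predict(standardise(xnew, mean, std))
```

Keeping mean and std from the training set, rather than recomputing them on new inputs, means the same raw input is always mapped to the same scaled input.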
Thanks
In my view, you are not specifying your input shape correctly:
layer1 = Dense(5, input_shape = (5,))
What is your actual input shape?
I am using the OpenCV library with Python 2.7 to calculate histograms (cv2.calcHist) of some images, all of exactly the same size.
I need to do two things:
1. Calculate the average histogram of multiple images that represent a similar object, so that I have a "representative" histogram of that object for future comparisons (if you have a better idea, I am open to suggestions).
2. Store the histogram data in my MongoDB database for future comparisons (cv2 correlation).
The only code I see as relevant to the question is my histogram_comparison code:
def histogram_comparison(real, fake):
    images = [real, fake]
    index = []
    for image in images:
        image = image.decode('base64')
        image = np.fromstring(image, dtype=np.uint8)
        image = cv2.imdecode(image, 1)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        hist = cv2.calcHist([image], [0, 1, 2], None, [32, 32, 32],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist).flatten()
        index.append(hist)
    result_dist = cv2.compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)
    return round(result_dist, 5)
taken from: http://www.pyimagesearch.com/2014/07/14/3-ways-compare-histograms-using-opencv-python/
I do realize that with NumPy's (or was it SciPy's?) histograms there is an easy way to get the bins and average them, but then I'm not sure how to compare the histograms, so I would rather stay with OpenCV.
Thanks in advance.
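Since OpenCV (since 2.2) natively uses NumPy arrays, and len(images) is constant, you can get the average of all your histograms and store it in MongoDB simply. The histogram of all pixels pooled together is the sum of the per-image histograms (same bins), so dividing it by len(images) gives the average:
# images: list of same-size images as NumPy arrays; db: a pymongo Database
# pools all channels into one 1-D histogram of pixel values, 32 bins over [0, 256)
h, b = np.histogram(images, bins=32, range=(0, 256))
# NumPy arrays must be converted to lists before pymongo can store them
db.histograms.insert_one({'hist': (h / float(len(images))).tolist(), 'bins': b.tolist()})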
I do not know if it's exactly what you want, but I hope it helps! See ya!
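On the question's other concern, comparing histograms outside OpenCV: the correlation method of cv2.compareHist is a Pearson-style correlation between the two histograms, which is easy to reproduce with NumPy. The sketch below builds a flattened 32x32x32 RGB histogram per image with np.histogramdd (in the spirit of the cv2.calcHist call in the question, not guaranteed bin-for-bin identical), averages the per-image histograms element-wise to get a "representative" histogram, and compares with the correlation formula. Normalising each histogram to sum to 1 is an assumption; correlation does not change when either histogram is rescaled. The function names are hypothetical. To store the result in MongoDB, convert it with .tolist() first, since pymongo cannot encode NumPy arrays directly.

```python
import numpy as np


def rgb_histogram(image, bins=32):
    """Flattened 3-D colour histogram of an (H, W, 3) uint8 image, summing to 1."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / hist.sum()


def average_histogram(images, bins=32):
    """Element-wise mean of the per-image histograms."""
    return np.mean([rgb_histogram(img, bins) for img in images], axis=0)


def correlation(h1, h2):
    """Pearson correlation between two histograms (1.0 = identical shape)."""
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


# Made-up images standing in for photos of the same object.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8) for _ in range(3)]
representative = average_histogram(images)
print(round(correlation(representative, rgb_histogram(images[0])), 3))
```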
I want to use a decision tree classifier in order to predict something.
As you can see here:
from sklearn import tree
sample1 = [120,1]
sample2 = [123,3]
features = [sample1,sample2]
labels = [0,1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
I have two samples:
Sample one: [120,1] which I labelled as 0
Sample two: [123,3] which I labelled as 1
So far so good.
But now, instead of these samples, I want to train using arrays, something like:
features = [[120,120.2][1, 1.2]]
and the corresponding label for this sample is:
label = [1]
So my code should be:
from sklearn import tree
features = [[120,120.2][1, 1.2]]
labels = [1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
I'm getting the following error:
TypeError: list indices must be integers, not tuple
I understand that the classifier wants a list of integers, and not tuples.
And a solution may be:
features = [[120, 120.2, 1, 1.2]]
labels = [1]
But I don't want to mix up the data, since I have it in separate arrays.
Is there any way I can train my classifier with arrays of arrays of data?
Thanks
No, you can't use this format with your data; you need to combine the data for each sample into one array.
The expected shape is (n_samples, n_features).
This is also more logical: each example is described by a set of features, and the expected format describes your data better.
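Note that the TypeError itself comes from a missing comma: Python reads [120,120.2][1, 1.2] as indexing the list [120,120.2] with the tuple (1, 1.2). Even with the comma added, scikit-learn expects a 2-D (n_samples, n_features) input, so each sample's sub-arrays have to be laid side by side. A minimal NumPy sketch (to_feature_matrix is a hypothetical helper) that keeps the separate arrays in a fixed column order, so they can be sliced apart again and nothing gets mixed up:

```python
import numpy as np


def to_feature_matrix(*blocks):
    """Join per-sample arrays column-wise into shape (n_samples, n_features).

    Hypothetical helper: each block has one row (or sub-array) per sample;
    anything beyond the first axis is flattened, and the blocks' columns are
    placed side by side in the order given.
    """
    arrays = [np.asarray(b, dtype=float) for b in blocks]
    return np.hstack([a.reshape(len(a), -1) for a in arrays])


# Values from the question, kept in their two separate arrays.
first = [[120, 120.2]]
second = [[1, 1.2]]
features = to_feature_matrix(first, second)
print(features.shape)  # (1, 4)
labels = [1]
# clf.fit(features, labels) now receives the expected 2-D input;
# features[:, :2] and features[:, 2:] recover the original two arrays.
```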
To split my data into separate training and test sets, I'm using the sklearn.cross_validation.train_test_split function.
When I supply my data and labels as lists of lists to this function, it returns the training and test data as two separate lists.
I want to get the indices of the training and test elements in the original data list.
Can anyone help me out with this?
Thanks in advance
You can supply the index vector as an additional argument; train_test_split splits every array you pass with the same shuffle. Using the example from sklearn:
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation before 0.18
X, y, indices = (0.1*np.arange(10)).reshape((5, 2)), range(10, 15), range(5)
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(X, y, indices, test_size=0.33, random_state=42)
indices_train, indices_test
#([2, 0, 3], [1, 4])
Try the solutions below (depending on whether your classes are imbalanced):
import numpy as np
from sklearn.model_selection import train_test_split

# train: your data (e.g. a DataFrame); y: its target labels
NUM_ROWS = train.shape[0]
TEST_SIZE = 0.3
indices = np.arange(NUM_ROWS)
# usual train-val split
train_idx, val_idx = train_test_split(indices, test_size=TEST_SIZE, train_size=None)
# stratified train-val split, preserving the class proportions of y (if imbalanced)
strat_train_idx, strat_val_idx = train_test_split(indices, test_size=TEST_SIZE, stratify=y)
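For reference, the index-based approach can be sketched with NumPy alone: shuffle the row indices, cut them into two groups, and use the resulting index arrays to pull rows out of every array, so the position of each training and test element in the original data is known by construction. This illustrates the idea rather than scikit-learn's implementation; split_indices is a hypothetical name, and rounding the test count up follows how train_test_split treats a float test_size.

```python
import math

import numpy as np


def split_indices(n_rows, test_size=0.3, seed=None):
    """Return (train_idx, test_idx): a random split of range(n_rows).

    The test count is rounded up; round() first strips float noise such as
    0.3 * 10 == 3.0000000000000004.
    """
    n_test = math.ceil(round(test_size * n_rows, 9))
    perm = np.random.default_rng(seed).permutation(n_rows)
    return perm[n_test:], perm[:n_test]


X = 0.1 * np.arange(10).reshape(5, 2)
y = np.arange(10, 15)
train_idx, test_idx = split_indices(len(X), test_size=0.33, seed=42)
X_train, X_test = X[train_idx], X[test_idx]  # rows selected by original index
y_train, y_test = y[train_idx], y[test_idx]
```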
I'm trying to copy values from a NumPy array into a NetCDF file that I am creating. I'm looking for the best way to emulate NumPy's 'fancy indexing' when writing to a netCDF variable, but the two indexing systems don't behave the same when only two points are indexed.
import netCDF4
import numpy as np
rootgrp = netCDF4.Dataset('Test.nc','w',format='NETCDF4')
time = rootgrp.createDimension('time',None)
dim1 = rootgrp.createDimension('dim1',100)
dim2 = rootgrp.createDimension('dim2',100)
dim3 = rootgrp.createDimension('dim3',100)
ncVar = rootgrp.createVariable('ncVar','f4',('time','dim1','dim2','dim3'))
npArr = np.arange(0,10000)
npArr = np.reshape(npArr,(100,100))
So this works just fine:
x,y=np.array(([1,75,10,99],[40,88,19,2]))
ncVar[0,x,y,0] = npArr[x,y]
While this does not:
x,y=np.array(([1,75],[40,88]))
ncVar[0,x,y,0] = npArr[x,y]
These assignments are part of a dynamic loop that determines x and y to create values for ncVar at ~1000 time steps.
EDIT: The issue seems to be that the first case treats x, y as defining a series of points and so returns an array of shape [4,] (despite the netCDF4 documentation on 'fancy indexing'), while the second interprets them combinatorially and so returns a [2, 2] array (as stated in the documentation). Has anyone run into this or found a workaround?
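One possible workaround, sketched under assumptions: do the point-wise indexing in NumPy, where x and y always select (x[k], y[k]) pairs, and hand netCDF4 only plain integer and slice indices. fill_points builds the whole 2-D slice for one time step and writes it in a single assignment; it assumes each time step is written once, because points not listed are set to a fill value (NaN here). write_points_one_by_one is slower but leaves other points untouched. Both names are hypothetical. They rely only on target.shape and basic slicing, which a netCDF4 Variable provides, so a NumPy array stands in for ncVar below; the np.ix_ line reproduces the combinatorial [2, 2] result described in the EDIT.

```python
import numpy as np


def fill_points(target, t, x, y, values, fill=np.nan):
    """Write values at the (x[k], y[k]) points of target[t, :, :, 0].

    The slice is built in NumPy (pointwise indexing), then written with
    basic slicing only. Points not listed are set to `fill`.
    """
    nx, ny = target.shape[1], target.shape[2]
    slab = np.full((nx, ny), fill, dtype=np.float32)
    slab[x, y] = values
    target[t, :, :, 0] = slab


def write_points_one_by_one(target, t, x, y, values):
    """Slower, but other points keep their values: one scalar write per point."""
    for i, j, v in zip(x, y, values):
        target[t, i, j, 0] = v


npArr = np.arange(0, 10000).reshape(100, 100)
x, y = np.array(([1, 75], [40, 88]))
print(npArr[x, y].shape)          # (2,)  pointwise
print(npArr[np.ix_(x, y)].shape)  # (2, 2) combinatorial (orthogonal)

stand_in = np.zeros((3, 100, 100, 1), dtype=np.float32)  # stands in for ncVar
fill_points(stand_in, 0, x, y, npArr[x, y])
```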