Can we give the test data without labelling them? - python-2.7

I came across this snippet in the TensorFlow documentation, MNIST For ML Beginners:
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
Now I want to feed my own test images without labelling them, and I would like the model to predict the labels. How do I achieve this?

Yes you can, but it would not be deep learning; instead it would be clustering (e.g. k-means clustering).
The basic idea is as follows:
Create two placeholders for input and centroids
Decide a distance metric
Create graph
Feed only the dataset (no labels) to run the graph, as sketched below
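Here is a minimal sketch of that idea in TensorFlow 1.x, assuming mnist is loaded as in the question's tutorial; n_clusters and the iteration count are illustrative choices:
import numpy as np
import tensorflow as tf

n_clusters = 10
# Two placeholders: one for the input points, one for the current centroids
points = tf.placeholder(tf.float32, shape=[None, 784])
centroids = tf.placeholder(tf.float32, shape=[n_clusters, 784])

# Distance metric: squared Euclidean distance from every point to every centroid
expanded_points = tf.expand_dims(points, 1)        # (N, 1, D)
expanded_centroids = tf.expand_dims(centroids, 0)  # (1, K, D)
distances = tf.reduce_sum(tf.square(expanded_points - expanded_centroids), axis=2)
assignments = tf.argmin(distances, axis=1)         # nearest centroid per point

data = mnist.test.images                           # unlabeled images only
cents = data[np.random.choice(len(data), n_clusters, replace=False)]
with tf.Session() as sess:
    for _ in range(10):                            # a few Lloyd iterations
        assign = sess.run(assignments,
                          feed_dict={points: data, centroids: cents})
        # recompute each centroid as the mean of its assigned points
        # (empty clusters keep their old centroid, for brevity)
        cents = np.array([data[assign == k].mean(axis=0) if np.any(assign == k)
                          else cents[k] for k in range(n_clusters)])
Note that the resulting cluster ids are arbitrary; they only become digit labels if you map clusters to digits afterwards.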

Related

Can I make Amazon SageMaker deliver a recommendation based on historic data instead of a probability score?

We have a huge set of data in CSV format, containing a few numeric elements, like this:
Year,BinaryDigit,NumberToPredict,JustANumber, ...other stuff
1954,1,762,16, ...other stuff
1965,0,142,16, ...other stuff
1977,1,172,16, ...other stuff
The thing here is that there is a strong correlation between the third column and the columns before it. So I have pre-processed the data, and it's now available in a format I think is perfect:
1954,1,762
1965,0,142
1977,1,172
What I want is a prediction of the value in the third column, using the first two as input. So in the case above, I want the input 1965,0 to return 142. In real life this file is thousands of rows long, but since there's a pattern, I'd like to retrieve the most likely value.
So far I've set up a training job on the CSV file using the Linear Learner algorithm, with the following settings:
label_size = 1
feature_dim = 2
predictor_type = regression
I've also created a model from it and set up an endpoint. When I invoke it, I get a score in return.
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)
My goal here is to get the third-column prediction instead. How can I achieve that? I have read a lot of the documentation on this, but since I'm not very familiar with AWS, I may well have used the wrong algorithm for what I am trying to do.
(Please feel free to edit this question to better suit AWS terminology)
For CSV input, the label should be in the first column, as mentioned here, so you should preprocess your data to put the label (the column you want to predict) on the left.
Next, you need to decide whether this is a regression problem or a classification problem.
If you want to predict a number that's as close as possible to the true number, that's regression. For example, the truth might be 4, and the model might predict 4.15. If you need an integer prediction, you could round the model's output.
If you want the prediction to be one of a few categories, then you have a classification problem. For example, we might encode 'North America' = 0, 'Europe' = 1, 'Africa' = 2, and so on. In this case, a fractional prediction wouldn't make sense.
For regression, use 'predictor_type' = 'regressor' and for classification with more than 2 classes, use 'predictor_type' = 'multiclass_classifier' as documented here.
The output of regression will contain only a 'score' field, which is the model's prediction. The output of multiclass classification will contain a 'predicted_label' field, which is the model's prediction, as well as a 'score' field, which is a vector of probabilities representing the model's confidence. The index with the highest probability will be the one that's predicted as the 'predicted_label'. The output formats are documented here.
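As a concrete illustration, reusing runtime and ENDPOINT_NAME from the question (the JSON shape follows the Linear Learner response format linked above), the regression prediction for the input 1965,0 could be read out like this:
import json

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body='1965,0')
result = json.loads(response['Body'].read())
# regression responses look like {"predictions": [{"score": 142.3}]};
# multiclass responses additionally carry a "predicted_label" field
prediction = result['predictions'][0]['score']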
predictor_type = regression is not able to return the predicted label, according to the linear-learner documentation:
For inference, the linear learner algorithm supports the application/json, application/x-recordio-protobuf, and text/csv formats. For binary classification models, it returns both the score and the predicted label. For regression, it returns only the score.
For more information on input and output file formats, see Linear Learner Response Formats for inference, and the Linear Learner Sample Notebooks.

Python Tensorflow Summarize Two Histograms Together

I was wondering if there is a way to summarize two histograms together in TensorFlow and get something resembling the behavior of tf.summary.histogram. The reason is that I need to summarize batch logits for two different operations, and I need the histograms to be "superimposed" so that I can compare their dynamics relative to one another during training by looking only at the log file in TensorBoard.

Classify data using ten-fold cross validation in Weka

I am trying to learn Weka. I am using a data set which has three classes of activity. I am trying to build a classifier, use ten-fold cross validation, and tabulate the accuracy. However, I can't tell which data belongs to which class. How do I proceed? I am not sure how to upload the data set here. Any help would be appreciated.
In order to get results using k-fold cross validation, your data points must have class labels. For instance, if I gave you a set of data and asked you to classify it into three classes, but did not know the classes of the data points myself, then when you classified the data and returned it to me, how would I calculate your classification accuracy?
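Weka itself is Java, but the same point is easy to see in a quick Python sketch (scikit-learn here is purely an illustration, not part of the question): the cross-validation score can only be computed because the true labels y are supplied alongside the data.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)           # y holds the known class label of each row
clf = DecisionTreeClassifier()
scores = cross_val_score(clf, X, y, cv=10)  # each fold's predictions are scored against y
print(scores.mean())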

SGDClassifier with HashingVectorizer and TfidfTransformer

I would like to understand if it is possible to train an online SGDClassifier (with partial_fit) using HashingVectorizer and TfidfTransformer. Simply joining them in a Pipeline will not work as TfidfTransformer is stateful so that would break the online learning process. This post says it's not possible to use tf-idf in an online fashion but a comment on this post suggests that it may somehow be possible: "In particular if you use stateful transformers as TfidfTransformer you will need to do several passes on your data". Is that possible without loading the whole training set into memory? If so, how? If not, is there an alternative solution to combine HashingVectorizer with tf-idf on large datasets?
Is that possible without loading the whole training set into memory?
No. TfidfTransformer needs to have the entire X matrix in memory. You'll need to roll your own tf-idf estimator, use that to compute per-term document frequencies in one pass over the data, then do another pass to produce tf-idf features and fit a classifier to them.
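A rough two-pass sketch of that idea follows; iter_batches() is a hypothetical generator yielding (documents, labels) batches, the idf smoothing mirrors TfidfTransformer's default formula, and L2 normalization is omitted for brevity.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

n_features = 2 ** 20
vec = HashingVectorizer(n_features=n_features, alternate_sign=False)  # non-negative counts

# Pass 1: count, per hashed term, how many documents contain it
n_docs, df = 0, np.zeros(n_features)
for docs, _ in iter_batches():
    X = vec.transform(docs)
    df += (X != 0).sum(axis=0).A1
    n_docs += X.shape[0]
idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0  # smooth idf, as in TfidfTransformer

# Pass 2: scale the hashed term counts by idf and train incrementally
clf = SGDClassifier()
for docs, labels in iter_batches():
    X = vec.transform(docs).multiply(idf).tocsr()  # tf * idf
    clf.partial_fit(X, labels, classes=[0, 1])     # assumes the class ids are known up front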

CV2: preprocessing images for machine learning

I am planning to create an SVM with the OpenCV 2 machine learning libraries to process some images. I have done some digging on this site and I have found that I need to convert the images into vectors and create a matrix out of these vectors. However, I have found no information on how to do that. Please help. Please also note that I am using Python.
Probably all OpenCV ML algorithms want the following inputs:
a NxM trainData (float) Mat, composed like this:
one row per sample (N of them), where the feature size is M
a Nx1 array of labels (class ids), where item i is the label for the feature row at index i
So, if you have a lot of 1d, flattened features and corresponding labels, you would:
import cv2
import numpy as np

# OpenCV 2.4-style API; my_training_data is a stand-in for however you
# iterate over your (feature, label) pairs
svm = cv2.SVM()  # getting the params right is a science of its own..
traindata, trainlabels = [], []
for feature, label in my_training_data:
    traindata.append(feature)    # again, 1 flattened array of numbers per sample
    trainlabels.append(label)    # 1 class id for the feature above
# now train it (OpenCV wants float32 data):
svm.train(np.array(traindata, dtype=np.float32),
          np.array(trainlabels, dtype=np.float32))
# after that, we can go and predict a label from new test input;
# it will return one of the labels you fed to the training before...
p = svm.predict(np.array([test_feature], dtype=np.float32))
Also look here, please!
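Since the question was specifically about turning images into vectors, here is a small example (the file name and size are made up) of building one such flattened feature with cv2:
img = cv2.imread('digit_0.png', cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (32, 32))             # every sample must have the same length M
feature = img.flatten().astype(np.float32)  # 1d vector of length 32*32 = 1024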