How to use weighted attributes when predicting with IBk? (Weka)

I am working on predicting a numeric-valued column using IBk. I have two data sets: a training set and a test set. Since the values predicted by the following code don't satisfy me, I want to weight each column to make the predictions more acceptable.
After searching, I found out that the only scheme in Weka that takes attribute weights into account is naive Bayes. Is there any way to weight attributes using naive Bayes and then use the output of naive Bayes (the weighted attributes) in IBk?
try {
    IBk knn = new IBk();
    // -F: weight neighbours by 1 - distance; -E: minimise mean squared error
    String[] options = new String[2];
    options[0] = "-F";
    options[1] = "-E";
    knn.setOptions(options);
    knn.setKNN(100);
    knn.setCrossValidate(false);
    // The class attribute is the second-to-last column in both sets
    trainData.setClassIndex(trainData.numAttributes() - 2);
    testData.setClassIndex(testData.numAttributes() - 2);
    knn.buildClassifier(trainData);
    Evaluation eval = new Evaluation(trainData);
    eval.evaluateModel(knn, testData);
    System.out.println(eval.toSummaryString("\nResults\n\n", false));
    // Print each individual prediction
    for (int i = 0; i < testData.numInstances(); i++) {
        double c = knn.classifyInstance(testData.instance(i));
        System.out.println(c);
    }
} catch (Exception ex) {
    ex.printStackTrace();
}

Related

TFLite C++: determine the classification in the output

I'm trying to get an output from a trained model that has a classification head; the input node count is 1 and the output node count is 2. However, I'm not quite sure which output the classification lands in and how exactly I should handle it.
for (size_t idx = 0; idx < input_node_count; idx++)
{
    // Copy this input into the interpreter's input tensor
    float* data_ptr = interpreter->typed_input_tensor<float>(idx);
    memcpy(data_ptr, my_input.data(), input_elem_size[idx]);
}
if (kTfLiteOk != interpreter->Invoke())
{
    return false;
}
for (size_t idx = 0; idx < output_node_count; idx++)
{
    // Copy each output tensor into its own buffer
    float* output = interpreter->typed_output_tensor<float>(idx);
    output_buffer[idx] = std::vector<float>(output, output + output_elem_size[idx]);
}
result = output_buffer[1];
classification_result = output_buffer[0]; // Is hard-coding these indices the best approach?
As of now, I can just print out the sizes and see that result has 196,608 elements and classification_result has 2, as it should. My problem is that I hard-coded these to be indices 1 and 0, but this might not always be the case in my program, which runs all sorts of models. Sometimes the classification might be index 1, which makes the above code fall apart.
I've tried checking the sizes of the buffers, but that is also not guaranteed, since the classification size and the result size are different for each input. Is there a way for me to know for certain which index is which? Am I approaching this the right way?
Use TensorFlow Lite signatures for this. Signature defs let you access inputs and outputs by the names defined in the original model.
See conversion and inference example here
Python example
# Load the TFLite model in TFLite Interpreter
interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
# There is only 1 signature defined in the model,
# so it will return it by default.
# If there are multiple signatures then we can pass the name.
my_signature = interpreter.get_signature_runner()
# my_signature is callable with input as arguments.
output = my_signature(x=tf.constant([1.0], shape=(1,10), dtype=tf.float32))
# 'output' is dictionary with all outputs from the inference.
# In this case we have single output 'result'.
print(output['result'])
For C++
// To run
auto my_signature = interpreter_->GetSignatureRunner("my_signature");
// Set your inputs and allocate tensors
auto* input_tensor_a = my_signature->input_tensor("input_a");
...
// Execute
my_signature->Invoke();
// Output
auto* output_tensor_x = my_signature->output_tensor("output_x");

How can I get my price vector to hold the values inside my JSON array?

Hello, I am pretty new to JSON parsing and parsing in general, so I am wondering about the best way to assign the correct values for the price of the underlying stock I am looking at. Below is an example of the code I am working with, with comments showing roughly what I'm confused about.
Json::Value chartData = IEX::stocks::chart(symbolSearched);
int n = 390;
QVector<double> time(n), price(n);

// Time and date setup
QDateTime start = QDateTime(QDate::currentDate());
QDateTime local = QDateTime::currentDateTime();
QDateTime UTC(local);
start.setTimeSpec(Qt::UTC);
double startTime = start.toTime_t();
double binSize = 3600 * 24;

time[0] = startTime;
price[0] = // First market price of the stock at market open (9:30 AM)
for (int i = 0; i < n; i++)
{
    time[i] = startTime + 3600 * i;
    price[i] = // Stores prices of this company's stock all the way until 4:30 PM (market close)
}
The chartData variable is the JSON output with all the data.
I am wondering how I can get the various values inside the JSON and store them. Also, since it's intraday data, how can I avoid storing price[i] when there is no data yet because it's early in the day? And what is the best way to update this every minute so it continuously reads real-time data?
Hope I understood correctly (correct me if not) that you just want to save some subset of the JSON data to your QVector. Just iterate through all the JSON elements:
for (int idx = 0; idx < chartData.size(); ++idx) {
    time[idx] = convert2Timestamp(chartData[idx]["minute"]);
    price[idx] = convert2Price(chartData[idx]["high"], chartData[idx]["low"],
                               chartData[idx]["open"], chartData[idx]["close"],
                               chartData[idx]["average"]);
}
Then you should define the logic of convert2Timestamp (how you would like to store the time information) and the logic of convert2Price: how you would like to store the price info, e.g. only the highest/lowest, only the closing value, or maybe all of these numbers grouped together in a structure/class.
Then, if you want to execute similar logic every minute to update your locally recorded data, instead of price[idx] = /* something */ you should push the items that are new onto your vector.
If there is a possibility that some of the JSON keys don't exist, JsonCpp lets you provide a default value, e.g. elem.get(KEY, DEFAULT_VAL).
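A minimal sketch of what those helpers could look like, assuming the "minute" field is an "HH:mm" string and the price fields are numeric. The helper names come from the answer above; the extra startTime parameter and the close-then-average preference are my own assumptions, to be adapted to the real IEX payload:

#include <json/json.h>
#include <QString>
#include <QTime>

// Turn an "HH:mm" minute label into an absolute timestamp by adding
// the seconds since midnight to the day's starting timestamp.
double convert2Timestamp(const Json::Value& minute, double startTime)
{
    const QTime t = QTime::fromString(
        QString::fromStdString(minute.asString()), "HH:mm");
    return startTime + t.msecsSinceStartOfDay() / 1000.0;
}

// Reduce one bar to a single price; here the closing value is preferred,
// falling back to the average. Returning a struct holding all five
// numbers is just as valid if you need them later.
double convert2Price(const Json::Value& high, const Json::Value& low,
                     const Json::Value& open, const Json::Value& close,
                     const Json::Value& average)
{
    (void)high; (void)low; (void)open; // unused in this variant
    if (!close.isNull())
        return close.asDouble();
    return average.asDouble();
}

Skipping bars that have no data yet because it is early in the day then becomes a null check before storing, e.g. if (chartData[idx]["close"].isNull()) continue;. For the minute-by-minute refresh, a QTimer that re-fetches the chart and appends only bars newer than the last stored timestamp fits the push-new-items approach above.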

How to write a unit test

Hello everyone. I'm new to unit testing and can't quite get the idea of it.
I have a module that has a process() function. Inside the process() function the module does a lot of non-trivial work. The task is to check the module's output.
For example, there is a method calculateDistance() inside the process() method that calculates a distance value. The test requirement sounds like "check that the module calculates the distance...".
As I understand it, I have to:
1. Prepare input data
2. Call the module's process() method
3. Calculate the distance by hand
4. Get the module's output value for the distance
5. Compare this value to the value calculated by hand
That is easy for trivial cases, for example simple arithmetic operations. But what if there are lots of formulas inside calculateDistance()? How should I calculate the distance value? Should I just copy all the source code from the calculateDistance() function into the test's code to compute the value for step 3?
Here is the source code example:
void process(PropertyList& propList)
{
    m_properties = &propList;
    ...
    for (int i = 0; i < m_properties->propertiesCount; i++)
    {
        calculateDistance(i);
    }
}

void calculateDistance(int indx)
{
    Property& property = m_properties->properties[indx];
    property.distance = 0;
    for (int i = 0; i < property.objectsCount - 1; i++)
    {
        property.distance += getDistanceBetweenObjects(i, i + 1);
    }
}

int getDistanceBetweenObjects(int indx1, int indx2)
{
    // here is some complex algorithm for calculating the distance
}
So, my question is: should I prepare input data for which I already know the resulting distance value and just compare that value to the output? Or should I take the input data structures, calculate the distance value the same way the calculateDistance() method does (which duplicates the code), and compare that value to the output distance?
But what if inside calculateDistance() there are lots of formulas?

Refactor the code: break those formulas out into separate functions and test them individually.
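To make the idea concrete, here is a minimal, framework-free sketch. Point and distanceBetween() are hypothetical stand-ins for one of the formulas hidden inside getDistanceBetweenObjects(); the point is that a pure function can be tested against hand-computed cases instead of re-implementing the formula in the test:

#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Extracted pure function: no module state, no process() call needed.
double distanceBetween(const Point& a, const Point& b)
{
    return std::hypot(a.x - b.x, a.y - b.y);
}

int main()
{
    // Cases small enough to verify by hand (3-4-5 triangle, identical points).
    assert(std::fabs(distanceBetween({0, 0}, {3, 4}) - 5.0) < 1e-9);
    assert(std::fabs(distanceBetween({1, 1}, {1, 1}) - 0.0) < 1e-9);
    return 0;
}

With the formulas covered this way, the test for process() itself can shrink to a small input whose total distance you know in advance, which answers the original question: prepare input data with a known expected value rather than duplicating the calculation inside the test.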

Iteratively prediction using caffe in a C++ project

I am a beginner with Caffe. Recently I have been learning how to use a pretrained Caffe model for prediction in my own project, and now I am trying to predict iteratively: in each loop there is new data (input) that is used to predict something. I use a memory data layer as my input layer.
Before entering the loop, I make some declarations:
caffe::Datum datum;
datum.set_channels(1);
datum.set_height(1);
datum.set_width(30);
vector<float> mydata;
vector<caffe::Datum> dvector;
boost::shared_ptr<MemoryDataLayer<float> > memory_data_layer;
memory_data_layer = boost::static_pointer_cast<MemoryDataLayer<float>>(net.layer_by_name("datas"));
const boost::shared_ptr<Blob<float>> & blobs = net.blob_by_name("result");
const float* output = blobs->cpu_data();
In each loop, mydata gets some new data and is used for a new prediction.
Here is what I do in each loop (after mydata has been updated):
datum.clear_data();
for (int i = 0; i < 30; i++)
    datum.add_float_data(mydata[i]);
dvector.clear();
dvector.push_back(datum);
memory_data_layer->AddDatumVector(dvector);
float loss = 0.0;
net.Forward(&loss);
for (int i = 0; i < 10; i++)
{
    cout << output[i] << endl;
}
For the first loop the result is correct. But in the following loops, although mydata gets the new data, the output remains unchanged; it still shows the same result as the first loop.
Did I skip a necessary step? How can I fix it?
Thanks.
Solved by replacing
datum.clear_data();
with
datum.clear_float_data();
I think it is because float data occupy a separate storage in the Datum, so to clear out the old float data I need to clear that float-data field specifically.
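One more thing worth checking in this kind of loop (my own observation, not part of the fix above): cpu_data() returns a raw pointer into the blob's current buffer, and Caffe may remap or reallocate that buffer between forward passes, so it is safer to re-fetch the pointer after every Forward() instead of caching it once before the loop:

// Inside each iteration, after net.Forward(&loss):
const float* output = net.blob_by_name("result")->cpu_data();
for (int i = 0; i < 10; i++)
    std::cout << output[i] << std::endl;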

Select a record where value is max in CouchDB

I have a CouchDB database with 156 JSON documents. I want to find the max of a certain value in the documents and then emit the whole document that contains that maximum value. I used this code, but it doesn't seem to work. This may help you understand what I mean.
function(doc) {
    var i, m = -Infinity;
    for (i = 0; i < doc.watchers.length; i++)
        m = Math.max(m, doc.watchers[i]);
    for (i = 0; i < doc.watchers.length; i++)
        if (m === doc.watchers[i])
            emit(doc.watchers[i], doc.watchers);
}
I would also like to select the top 2 documents that have the max value.
Just using a map function won't work, because of the way map/reduce works: the complete set of keys and values is never available to the map function at the same time.
I would use a reduce function for that. Something like this:
Map function
function(doc) {
    var i;
    for (i = 0; i < doc.watchers.length; i++) {
        emit([doc._id, doc.watchers[i]], doc.watchers);
    }
}
Reduce function
function(keys, values) {
    var i, top = 0, index = 0;
    for (i = 0; i < keys.length; i++) {
        if (keys[i][1] > top) {
            top = keys[i][1];
            index = i;
        }
    }
    return [keys[index], values[index]];
}
For this strategy to work you need to query using group_level=1, so that Couch passes the results grouped by document id to the reduce function.
I haven't tested this solution and it doesn't solve your second question yet. I can refine it if you think it goes the right way.