Interpretation of classification in Weka

I would like to use Weka to solve my classification problem.
I have a set of training instances. Let's say the data looks like this:
@relation Relation1
@attribute att1 {val11, val12}
@attribute att2 {val21, val22}
@attribute class {class1, class2, class3}
@data
val11, val21, class1
val11, val22, class2
val12, val21, class3
In my code I read the training set from the file. I train the J48 tree and try to classify an instance. However, I have no idea how to interpret the results of the classification.
My code is as follows:
try {
DataSource source = new DataSource("trainingset.arff");
Instances data = source.getDataSet();
if (data.classIndex() == -1) {
data.setClassIndex(data.numAttributes() - 1);
}
Instance xyz = new Instance(data.numAttributes());
xyz.setDataset(data);
xyz.setValue(data.attribute(0), "val11");
xyz.setValue(data.attribute(1), "val21");
String[] options = new String[1];
options[0] = "-U"; // unpruned tree
J48 tree = new J48(); // new instance of tree
tree.setOptions(options); // set the options
tree.buildClassifier(data); // build classifier
double[] distributionForInstance = tree.distributionForInstance(xyz);
System.out.println(distributionForInstance[0]);
System.out.println(distributionForInstance[1]);
System.out.println(distributionForInstance[2]);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
The output is:
0.3333333333333333
0.3333333333333333
0.3333333333333333
I also tried another way of classifying the instance:
double classifyInstance = tree.classifyInstance(xyz);
System.out.println(classifyInstance);
In this case the output is:
0.0
Could you explain how I should interpret the outputs of the distributionForInstance and classifyInstance methods?
My aim is to create a classifier that tells me which class a given instance belongs to.

Have a look at the Javadoc. The distributionForInstance method returns an array of class-membership probabilities (the first element is the probability that the instance belongs to the first class, and so on), and classifyInstance returns the predicted class as an index (think of it as an index into the array of class labels).

Use the value() method of Attribute to get the class label:
double classifyInstance = tree.classifyInstance(xyz);
// value() takes an int index, so cast the double returned by classifyInstance
String classStr = data.classAttribute().value((int) classifyInstance);
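For a nominal class, classifyInstance in effect returns the index of the largest entry in the distributionForInstance array. The plain-Java sketch below models that relationship without any Weka dependency. The class and method names are hypothetical, and the leaf counts are an assumption. The sketch probably also explains the output in the question. There are only three training rows, and J48 by default requires at least two instances per branch (-M 2), so the unpruned tree most likely cannot split at all. It is then a single leaf holding one instance of each class. Counts of 1, 1 and 1 give 1/3 per class. Because the tie goes to the lowest index, the prediction is index 0, i.e. class1.

```java
import java.util.Arrays;

public class ClassifyFromDistribution {

    // Simplified model of a leaf: class counts divided by their total.
    // Assumes at least one non-zero count (otherwise the result is NaN).
    static double[] normalize(double[] counts) {
        double total = Arrays.stream(counts).sum();
        double[] dist = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            dist[i] = counts[i] / total;
        }
        return dist;
    }

    // Index of the largest probability; ties go to the lowest index,
    // because only a strictly larger value replaces the current best.
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Labels in the order declared by "@attribute class {class1, class2, class3}"
        String[] labels = {"class1", "class2", "class3"};
        // Assumed leaf: one training instance of each class
        double[] dist = normalize(new double[] {1, 1, 1});
        System.out.println(Arrays.toString(dist));
        int predicted = argMax(dist);
        System.out.println(predicted + " -> " + labels[predicted]);
    }
}
```

Running main prints the uniform distribution followed by `0 -> class1`. In Weka itself, the label for an index comes from data.classAttribute().value(index).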

Related

Create XML dataset with the same table name as initial data set in DBUnit?

I'm trying to create an initial DB state in DbUnit like this:
public function getDataSet() {
$primary = new \PHPUnit\DbUnit\DataSet\CompositeDataSet();
$fixturePaths = [
"test/Seeds/Upc/DB/UpcSelect.xml",
"test/Seeds/Generic/DB/ProductUpcSelect.xml"
];
foreach($fixturePaths as $fixturePath) {
$dataSet = $this->createXmlDataSet($fixturePath);
$primary->addDataSet($dataSet);
}
return $primary;
}
Then, after running my query, I call this user-defined function:
protected function compareDatabase(String $seedPath, String $table) {
$expected = $this->createFlatXmlDataSet($seedPath)->getTable($table);
$result = $this->getConnection()->createQueryTable($table, "SELECT * FROM $table");
$this->assertTablesEqual($expected, $result);
}
The idea is that I have an initial DB state, run my query, and then compare the actual table state with an XML data set representing what I expect the table to look like. This process is described in PHPUnit's documentation for DbUnit, but an exception keeps being thrown:
PHPUnit\DbUnit\InvalidArgumentException: There is already a table named upc with different table definition
Test example:
public function testDeleteByUpc() {
$mapper = new UpcMapper($this->getPdo());
$mapper->deleteByUpc("someUpcCode1");
$this->compareDatabase("test/Seeds/Upc/DB/UpcAfterDelete.xml", 'upc');
}
I seem to be following the docs, so how is this supposed to be done?
This was actually unrelated to creating a second XML data set. The exception was thrown because the two fixtures loaded in my getDataSet() method both contained table definitions for upc, and those definitions differed.
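As we understand PHPUnit DbUnit's CompositeDataSet, a table name that appears in a later fixture is handled in one of two ways. If its definition matches the earlier one, the rows are merged. If it does not match, the table is rejected with the exception quoted above. The Java model below is a simplified sketch of that rule, not DbUnit's code. The class, the column names, and the matching criterion used here (identical column lists) are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompositeFixtureModel {

    // A fixture table: its column definition plus its rows.
    static final class Table {
        final List<String> columns;
        final List<List<String>> rows;

        Table(List<String> columns, List<List<String>> rows) {
            this.columns = new ArrayList<>(columns);
            this.rows = new ArrayList<>(rows);
        }
    }

    private final Map<String, Table> tables = new LinkedHashMap<>();

    // Merges rows when the table name is already known with the same columns;
    // throws when the name is known with a different definition.
    void addTable(String name, Table table) {
        Table existing = tables.get(name);
        if (existing == null) {
            tables.put(name, new Table(table.columns, table.rows));
        } else if (!existing.columns.equals(table.columns)) {
            throw new IllegalArgumentException(
                    "There is already a table named " + name + " with different table definition");
        } else {
            existing.rows.addAll(table.rows);
        }
    }

    int rowCount(String name) {
        Table table = tables.get(name);
        return table == null ? 0 : table.rows.size();
    }

    public static void main(String[] args) {
        CompositeFixtureModel composite = new CompositeFixtureModel();
        // Hypothetical upc definitions coming from two fixture files
        composite.addTable("upc", new Table(List.of("id", "code"),
                List.of(List.of("1", "someUpcCode1"))));
        try {
            composite.addTable("upc", new Table(List.of("id", "code", "product_id"), List.of()));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model there are two ways to avoid the error. You can either keep each table's definition in a single fixture, or make every fixture that mentions upc declare the same columns.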

Neo4j Spring Data Query Builder

Is there a way of dynamically building a cypher query using spring data neo4j?
I have a cypher query that filters my entities similar to this one:
@Query("MATCH (n:Product) WHERE n.name IN {0} return n")
List<Product> findProductsWithNames(List<String> names);
@Query("MATCH (n:Product) return n")
List<Product> findProductsWithNames();
When the names list is empty or null, I just want to return all products. Therefore my service implementation checks the names list and calls the correct repository method. The given example looks clean, but it gets really ugly once the Cypher statements are more complex and the code starts to repeat itself.
You can build your own dynamic Cypher queries and execute them with Neo4jOperations. Here is an example (with a query different from the one in your question) that I think illustrates how to do that:
@Autowired
Neo4jOperations template;
public User findBySocialUser(String providerId, String providerUserId) {
String query = "MATCH (n:SocialUser{providerId:{providerId}, providerUserId:{providerUserId}})<-[:HAS]-(user) RETURN user";
final Map<String, Object> paramsMap = ImmutableMap.<String, Object>builder().
put("providerId", providerId).
put("providerUserId", providerUserId).
build();
Map<String, Object> result = template.query(query, paramsMap).singleOrNull();
return (result == null) ? null : (User) template.getDefaultConverter().convert(result.get("user"), User.class);
}
Hope it helps
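The same idea fits the names filter from the question: append the WHERE clause only when it is needed, so that one method replaces the two repository methods. Below is a minimal sketch in plain Java. ProductQueryBuilder and CypherQuery are hypothetical names. The {names} placeholder follows the older parameter syntax used in this thread; newer Neo4j versions write $names instead. The resulting text and parameter map would then be passed to template.query(...).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProductQueryBuilder {

    // Holds the Cypher text together with the parameters to send with it.
    static final class CypherQuery {
        final String text;
        final Map<String, Object> params;

        CypherQuery(String text, Map<String, Object> params) {
            this.text = text;
            this.params = params;
        }
    }

    // Adds the name filter only for a non-empty list, so a null or empty
    // list falls through to "return all products".
    static CypherQuery findProducts(List<String> names) {
        StringBuilder query = new StringBuilder("MATCH (n:Product)");
        Map<String, Object> params = new HashMap<>();
        if (names != null && !names.isEmpty()) {
            query.append(" WHERE n.name IN {names}");
            params.put("names", names);
        }
        query.append(" RETURN n");
        return new CypherQuery(query.toString(), params);
    }

    public static void main(String[] args) {
        CypherQuery all = findProducts(Collections.emptyList());
        // Hypothetical product names
        CypherQuery some = findProducts(List.of("apple", "pear"));
        System.out.println(all.text + "  " + all.params);
        System.out.println(some.text + "  " + some.params);
    }
}
```

Further optional clauses can be appended in the same way, which keeps the repeated parts of the query in one place.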
Handling paging is also possible this way:
@Test
@SuppressWarnings("unchecked")
public void testQueryBuilding() {
String query = "MATCH (n:Product) return n";
Result<Map<String, Object>> result = neo4jTemplate.query(query, Collections.emptyMap());
for (Map<String, Object> r : result.slice(1, 3)) {
Product product = (Product) neo4jTemplate.getDefaultConverter().convert(r.get("n"), Product.class);
System.out.println(product.getUuid());
}
}
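As an alternative to slicing the Result, paging can also be pushed into the query itself with Cypher's SKIP and LIMIT clauses. The sketch below only builds the query string. CypherPaging and page() are hypothetical names, and the page numbers are assumed to be zero-based. A stable ORDER BY is needed so that consecutive pages neither overlap nor skip rows.

```java
public class CypherPaging {

    // Appends SKIP/LIMIT so that the database returns only one page.
    // pageNumber is zero-based in this sketch.
    static String page(String baseQuery, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        long skip = (long) pageNumber * pageSize;
        return baseQuery + " SKIP " + skip + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(page("MATCH (n:Product) RETURN n ORDER BY n.name", 1, 3));
    }
}
```

With page number 1 and page size 3, the query ends in `SKIP 3 LIMIT 3`, i.e. the second page of three products.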

How to declare a "class hierarchical" attribute in Weka

I am trying to use Weka to create an .arff file and run it in CLUS.
But I have a problem with the hierarchical attribute:
@attribute 'class hierarchical' {Dummy,Top/Arts/Animation,Top/Arts}
I create the .arff file with this code:
// 1. set up attributes
attributes = new FastVector();
// - numeric
int NumericAttSize=0;
for(String word : ListOfWord)
{
if(word.length()>1)
{
attributes.addElement(new Attribute(word));
NumericAttSize++;
}
}
// - nominal
attVals = new FastVector();
attVals.addElement("Dummy");
for (String branch : ListOfBranch)
{
attVals.addElement(branch);
}
attributes.addElement(new Attribute("class hierarchical", attVals));
// 2. create Instances object
dataSet = new Instances("training", attributes, 0);
// 3. fill with data
for(String DocID : indexTFIDF.keySet())
{
values = new double[dataSet.numAttributes()];
for(String word : ListOfWord)
{
int index = ListOfWord.indexOf(word);
if(indexTFIDF.get(DocID).containsKey(word))
values[index]=indexTFIDF.get(DocID).get(word);
}
String Branch = DocDetail.get(DocID).get("1");
values[NumericAttSize]= ListOfBranch.indexOf(Branch)+1;
dataSet.add(new Instance(1.0,values));
}
ArffSaver arffSaverInstance = new ArffSaver();
arffSaverInstance.setInstances(dataSet);
arffSaverInstance.setFile(new File("training.arff"));
arffSaverInstance.writeBatch();
Then, when I run training.arff in CLUS, I get this error message:
Error: Classes value not in tree hierarchy: Top/Arts/Animation (lookup: Animation, term: Top/Arts, subterms: Animation})
I think the problem is that I declare the hierarchical attribute as a nominal attribute, but I have no other idea how to declare it.
Any suggestion would be helpful. Thanks in advance.
According to an example in the Clus manual (in the Clus distribution zip, at /Clus/docs/clus-manual.pdf), a hierarchical attribute should be formatted as follows:
@ATTRIBUTE class hierarchical rec/sport/swim,rec/sport/run,rec/auto,alt/atheism
So in your case, remove the quotes around 'class hierarchical' and the curly braces {} around your values, resulting in:
@ATTRIBUTE class hierarchical Dummy,Top/Arts/Animation,Top/Arts
Also, if you have multi-label data (i.e., multiple labels per data sample), you can separate multiple hierarchical values using @, as follows:
@DATA
1,...,1,rec/sport/run@rec/sport/swim
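Weka's ArffSaver writes every nominal attribute as a {...} list and quotes attribute names that contain spaces. It therefore cannot produce the Clus hierarchical type directly. One option is to write the header line and the class values yourself. Below is a minimal sketch. The class and method names, the example paths, and the data line are hypothetical.

```java
import java.util.List;

public class ClusArffWriter {

    // Clus-style declaration: unquoted name "class", the keyword "hierarchical",
    // then the comma-separated paths without curly braces.
    static String hierarchicalAttribute(List<String> paths) {
        return "@ATTRIBUTE class hierarchical " + String.join(",", paths);
    }

    // Joins the labels of one multi-label example with '@'.
    static String labelValue(List<String> labels) {
        return String.join("@", labels);
    }

    public static void main(String[] args) {
        // Hypothetical contents of ListOfBranch
        List<String> branches = List.of("Top/Arts", "Top/Arts/Animation");
        System.out.println(hierarchicalAttribute(branches));
        // Hypothetical data line: two numeric features followed by the class value
        System.out.println("0.5,1.2," + labelValue(List.of("Top/Arts/Animation")));
    }
}
```

The data rows could still come from Weka (for example via Instance.toString()), as long as the header printed above replaces the one ArffSaver would generate.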

Same Instances header (ARFF) for all my database queries

I am using InstanceQuery with SQL queries to construct my Instances. But, as is normal in SQL, my query results do not always come back in the same order.
Because of this, Instances constructed from different SQL queries have different headers. A simple example can be seen below. I suspect my results change because of this behavior.
Header 1
@attribute duration numeric
@attribute protocol_type {tcp,udp}
@attribute service {http,domain_u}
@attribute flag {SF}
Header 2
@attribute duration numeric
@attribute protocol_type {tcp}
@attribute service {pm_dump,pop_2,pop_3}
@attribute flag {SF,S0,SH}
My question is: how can I give the correct header information to the Instances construction?
Is a workflow like the one below possible?
Get pre-prepared header information from an ARFF file or another place.
Give this header information to the Instances construction.
Call the SQL function and get Instances (header + data).
I am using the following SQL function to get instances from the database.
public static Instances getInstanceDataFromDatabase(String pSql
,String pInstanceRelationName){
try {
DatabaseUtils utils = new DatabaseUtils();
InstanceQuery query = new InstanceQuery();
query.setUsername(username);
query.setPassword(password);
query.setQuery(pSql);
Instances data = query.retrieveInstances();
data.setRelationName(pInstanceRelationName);
if (data.classIndex() == -1)
{
data.setClassIndex(data.numAttributes() - 1);
}
return data;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
I tried various approaches to my problem, but it seems that Weka's internal API does not currently allow a solution to it. I modified the command-line append code of weka.core.Instances for my purposes; this code is also given in another answer.
Based on this, here is my solution. I created a SampleWithKnownHeader.arff file, which contains the correct header values, and I read it with the following code.
public static Instances getSampleInstances() {
Instances data = null;
try {
BufferedReader reader = new BufferedReader(new FileReader(
"datas\\SampleWithKnownHeader.arff"));
data = new Instances(reader);
reader.close();
// setting class attribute
data.setClassIndex(data.numAttributes() - 1);
}
catch (Exception e) {
throw new RuntimeException(e);
}
return data;
}
After that, I use the following code to create the instances. I had to use a StringBuilder and the string values of the instances, and then I save the resulting string to a file.
public static void main(String[] args) {
Instances SampleInstance = MyUtilsForWeka.getSampleInstances();
DataSource source1 = new DataSource(SampleInstance);
Instances data2 = InstancesFromDatabase
.getInstanceDataFromDatabase(DatabaseQueries.WEKALIST_QUESTION1);
MyUtilsForWeka.saveInstancesToFile(data2, "fromDatabase.arff");
DataSource source2 = new DataSource(data2);
Instances structure1;
Instances structure2;
StringBuilder sb = new StringBuilder();
try {
structure1 = source1.getStructure();
sb.append(structure1);
structure2 = source2.getStructure();
while (source2.hasMoreElements(structure2)) {
String elementAsString = source2.nextElement(structure2)
.toString();
sb.append(elementAsString);
sb.append("\n");
}
} catch (Exception ex) {
throw new RuntimeException(ex);
}
MyUtilsForWeka.saveInstancesToFile(sb.toString(), "combined.arff");
}
My code for saving instances to a file is as follows:
public static void saveInstancesToFile(String contents, String filename) {
// try-with-resources closes the writer even if write() throws
try (BufferedWriter out = new BufferedWriter(new FileWriter(filename))) {
out.write(contents);
} catch (IOException ex) {
throw new RuntimeException(ex);
}
}
This solves my problem, but I wonder whether a more elegant solution exists.
I solved a similar problem with the Add filter, which adds attributes to Instances. You need to add an Attribute with the correct list of values to both datasets (in my case, to the test dataset only):
Load train and test data:
/* "train" contains labels and data */
/* "test" contains data only */
CSVLoader csvLoader = new CSVLoader();
csvLoader.setFile(new File(trainFile));
Instances training = csvLoader.getDataSet();
csvLoader.reset();
csvLoader.setFile(new File(predictFile));
Instances test = csvLoader.getDataSet();
Add the new attribute with the Add filter:
Add add = new Add();
/* the name of the attribute must be the same as in "train"*/
add.setAttributeName(training.attribute(0).name());
/* getValues returns a String with comma-separated values of the attribute */
add.setNominalLabels(getValues(training.attribute(0)));
/* put the new attribute to the 1st position, the same as in "train"*/
add.setAttributeIndex("1");
add.setInputFormat(test);
/* result - a compatible with "train" dataset */
test = Filter.useFilter(test, add);
As a result, the headers of both "train" and "test" are the same (compatible for Weka machine learning)
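Differing headers change the results because Weka stores a nominal value as an index into its attribute's list of labels. The same index therefore means different labels under Header 1 and Header 2. The sketch below uses plain Java with hypothetical names. Its reference header is also hypothetical, built as the union of both service lists. It shows the mismatch, and how re-encoding every value against one shared header (the effect both answers achieve) removes it.

```java
import java.util.List;

public class NominalHeaderAlignment {

    // Re-encodes a nominal index from a source header into a reference header.
    // Returns -1 if the label does not exist in the reference header.
    static int remap(int sourceIndex, List<String> sourceValues, List<String> referenceValues) {
        return referenceValues.indexOf(sourceValues.get(sourceIndex));
    }

    public static void main(String[] args) {
        List<String> serviceHeader1 = List.of("http", "domain_u");
        List<String> serviceHeader2 = List.of("pm_dump", "pop_2", "pop_3");
        // Hypothetical shared header: union of the values seen in both queries
        List<String> reference = List.of("http", "domain_u", "pm_dump", "pop_2", "pop_3");

        // Index 0 means "http" under header 1 but "pm_dump" under header 2
        System.out.println(serviceHeader1.get(0) + " vs " + serviceHeader2.get(0));
        System.out.println(remap(0, serviceHeader1, reference)); // 0
        System.out.println(remap(0, serviceHeader2, reference)); // 2
    }
}
```

A -1 result flags a value that the shared header does not know about. That is worth catching before training rather than letting it become a missing or wrong value.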

How to get the first member of the related collection in JPQL

I have a Product table with a related Images table in a one-to-many (1:M) relationship.
class Product {
private Integer productId;
private String productName;
....
....
....
private List<Image> productImageList;
....
....
....
}
class Image {
private Integer imageId;
private String imageName;
}
class ProductLite {
private Integer productId;
private String productName;
private String imageName;
}
I am trying to write a JPQL query that fetches products together with the first image from productImageList and returns a ProductLite object using a NEW constructor expression.
@TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
public List<ProductLite> getAllProductLite() {
Query q = em.createQuery("SELECT NEW com.mycomp.application.entity.ProductLite(p.productId, p.productName, p.productImageList.get(0).getImageName())"
+ " from Product p"
+ " ORDER by p.productName");
List<ProductLite> prods = q.getResultList();
return prods;
}
But for some reason I cannot get it to work; I get a NoViableAltException. So I tried moving the logic for getting the first image (a getImage() method) into the Product entity, so that the query could just call getImage(). Even that does not seem to work:
java.lang.IllegalArgumentException: An exception occurred while creating a query in EntityManager:
Exception Description: Syntax error parsing the query [SELECT NEW com.meera.application.entity.ProductLite(distinct p.productId, p.productName, p.getImage()) from Product p, IN(p.productImageList) pil where p.category.categoryCode = :categoryCode ORDER by p.productName ], line 1, column 52: unexpected token [distinct].
Internal Exception: NoViableAltException(23#[452:1: constructorItem returns [Object node] : (n= scalarExpression | n= aggregateExpression );])
Any help is appreciated.
First, you cannot call methods of the entity class from a JPQL query. Second, to rely on the order of entities in the list, the order must be persisted.
To create an order column in the join table between Image and Product, add the @OrderColumn annotation to productImageList. For example:
@OrderColumn(name = "myimage_order")
// or omit name and let it default to productImageList_ORDER
@OneToMany
private List<Image> productImageList;
Then modify the query to use that order to select only the first image:
SELECT NEW com.mycomp.application.entity.ProductLite(
p.productId, p.productName, pil.imageName)
FROM Product p JOIN p.productImageList pil
WHERE INDEX(pil) = 0
ORDER by p.productName
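For the NEW expression to work, ProductLite needs a constructor whose parameters match the selected items in number, order and type (here Integer, String, String). Below is a minimal sketch of such a class. The package is omitted so that the example is self-contained; the query refers to the class as com.mycomp.application.entity.ProductLite. The sample values in main are hypothetical. Note also that the inner JOIN with WHERE INDEX(pil) = 0 returns only products that have at least one image.

```java
public class ProductLite {

    private final Integer productId;
    private final String productName;
    private final String imageName;

    // Matches "SELECT NEW ...ProductLite(p.productId, p.productName, pil.imageName)":
    // same number, order and types of arguments as the selected items.
    public ProductLite(Integer productId, String productName, String imageName) {
        this.productId = productId;
        this.productName = productName;
        this.imageName = imageName;
    }

    public Integer getProductId() { return productId; }

    public String getProductName() { return productName; }

    public String getImageName() { return imageName; }

    @Override
    public String toString() {
        return productName + " (" + productId + "): " + imageName;
    }

    public static void main(String[] args) {
        System.out.println(new ProductLite(1, "Chair", "chair-front.jpg"));
    }
}
```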