Finding out wrongly classified instances when using WEKA

Finding out wrongly classified instances when using WEKA - weka

I am using GUI version of WEKA and I am classifying using the Naive Bayes. Can anyone please let me know how to find out which instances are misclassified.

Go to classify tab in Weka explorer
Click more options...
Check output predictions
Click OK
Hope that helps.

I faced this very same problem earlier and I tackle it just fine now.
What I do, is the following:
Make one String attribute that assigns each instance a unique ID. I
have assigned the names of the documents to each of my instances.
Generate the WEKA supported .arff file.
Whenever you have to run a classifier on this .arff data, you will notice that you have to exclude the Instance ID attribute. If you don't do this, Weka will pop-up an error saying that the classifier cannot process String attributes. Instead of excluding, run filter StringToNominal on the InstanceID.
Now, as said by #Rushdi, click "More Options" on the classify tab.
Check Output predictions on the "Classifier Evaluation Options" pop-up.
Enter the Attribute number of the Instance ID in the "Output additional attributes" box.
Run the classifier on the whole data, excluding the Instance ID attribute. (Most classifiers have this as an option called "StartSet" in "Ranker" for example which I use along with SMO classifier.)
If you've done everything properly so far, you will see all the instances listed along with their real and predicted output values and also the Instance ID which can tell you exactly which documents were incorrectly classified.
Hope this helps someone.
Good Luck!

In your output there should be a incorrectly classified section with a number and percentage, that should be it. The red box in this image is what you're looking for. Edit: Original image source here

This works for me:
Decompile the official weka.jar
Search into the library the classification that you want to test for know how it works and determinate what instances are misclassified.

Related

AWS console search in array

I am looking for a string that I can paste into AWS search(searching volumes for example) in browser so I can look for volume names/id's either like "foo" or like "bar", so in essence "if in array" type of search.
I tried multiple approaches and fail on every one. Only thing that I can do is paste my string, hit enter, paste another one, hit enter, its feasible for 2 strings but not for 10.

TL;DR: \bFOO|\bBAR
If you click the "?" button in the search field (at least in EC2) you get a "search help" view describing advanced usages of the search bar:
This help page shows that you can use regular expressions. This one should work for your use case: \bFOO|\bBAR
Help window screenshot for reference:

Script to generate html Beyond Compare folder differences

I've found several ways to automate folder comparison using scripts in Beyond Compare, but none that produce the pretty html report created from Session>Folder Compare Report>View in browser.
Here is an example of what that looks like.
I would love to be able to find the script that gives me that html difference report.
Thanks!
This is what I am currently getting

load "C:\Users\UIDQ5763\Desktop\Enviornment.cpp" &
"C:\Users\UIDQ5763\Desktop\GreetingsConsoleApp"
folder-report layout:side-by-side options:display-all &
output-to:C:\Users\UIDQ5763\Report.html output-options:html-color
The documentation for Beyond Compare's scripting language is here. You were probably missing either layout:side-by-side, which gives the general display, or output-options:html-color which is required to get the correct HTML stylized output. You may want to change options:display-all to options:display-mismatches if you only want to see the differences, and you might want to add an expand all command immediately before the folder-report line if you want to see the subfolders recursively.'
The & characters shown in the sample are line continuation characters. Remove them if you don't need to wrap your lines.

WSDL Document generator wsdl-viewer

I am looking for a tool to generate documentation from a WSDL file. I have found wsdl-viewer.xsl which seems promissing. See https://code.google.com/p/wsdl-viewer/
However, I am seeing a limitation where references to complex data types are not mapped out (or explained). For example, say we have a createSnapshot() operation that creates a snapshot and returns an object representing an snapshot. Runing xsltproc(1) and using wsdl-viewer.xsl, the rendered documentation has an output section that describes the output as
Output: createSnapshotOut
parameter type ceateSnapshotResponse
snapshots type Snapshot
I'd like to be able to click on "Snapshot" and see the schema definition of Snapshot.
Is this possible? Perhaps I am not using xsltproc(1) correctly. Perhaps it is not able to find the xsd files. Here are some of the relevant files I have:
SnapshotMgmntProvider.wsdl
SnapshotMgmntProviderDefinitions.xsd
SnapshotMgmntProviderTypes.xsd
Thanks
Medi

How do I display a field name containing the substring OMIT in ApEx?

One of the fields in my database table is named DATEOFDISCHARGEFROMITU. In any report output, this displays as DATEOFDISCHARGEFRU. I've figured out that the missing characters form the word 'OMIT', which makes me think it's related to this old problem in a previous version of ApEx (I'm using version 4.1.)
Is there a way to display the whole field name in the report header when the field name contains the string 'OMIT'?
Note: Using html character codes will allow the field name to display properly, but then when the report is exported to CSV the character codes are of course shown instead of the full field name. I need a solution that works for exports as well as displaying onscreen.

Platforms (tested): Oracle Application Express (APEX), Version 4.0.2
Note: I am not sure how the linked OTN post is relevant to your problem aside from the coincidence that their file export contains the word "OMIT" and your column title contains the word "OMIT".
It's safe to say that "OMIT" isn't an APEX or ORACLE reserved word that is sabotaging your output. However, if you were talking about a scrap of SQL that attempted to create a table named "SELECT" or "WHERE"
i.e., SELECT * FROM "SELECT" WHERE...
you'll be blocked by the RDBMS from proceeding. :)
I tried an export with a query that contained a column header labeled "OMIT" (see the far right in the example.) The .csv file interpreted by Microsoft Excel looked like this:
I wrote up a separate Q&A post about creating dynamic APEX report headers to answer your follow-on question about a suitable solution for providing a clean, htmlcode-free output when a report is eventually exported to a text, comma separated (or other delimited) output.
In summary, the linked post suggests to set up a dynamic PL/SQL Function within a page item. The page item can be referenced directly in the report column header definition. This is a screenshot demonstrating a possible solution:
The link to the general explanation has more details on the APEX design tasks that gets to this final product.
Onward.

I solved this by using this solution for exporting to csv without an enclosing quote character - as that was another challenge I was faced with for the particular application I was developing. By manually creating the export file I was also able to define the column headings exactly, and the "OMIT" issue did not occur.
Technically that's not a solution for displaying a report with the required headings that can also be exported (Richard's response does that) but it does what I need it to and solves the immediate problem of the DATEOFDISCHARGEFROMITU column heading.

What would be the SQL script to generate Sitecore Broken Link Report

I'm helping our Editors to clean up broken links and have been looking for answers to the following:
The Broken Link Report cannot be exported or sorted so it's not very useful (we have many broken links ~2000). Is there a SQL script that I can run to create the same report?
If an Editor fix a link, Rerun the report doesn't seem to take the item off the report. Does she have to Rebuild Link Database every time?
The Links button in the menu is helpful, but it is listing All Versions of referrers. Is there a SQL script to find only the lastest version?
When delete or archive an item and let Sitecore remove broke links, will all the affected items be published?
We are dealing with a large report (~2000 items) due to not maintaining it diligently. The goal is to reduce the number to 100~200 and keep it under control from now on. Any general advice on how to clean up broken links report is appreciated.

For your first (and partly third) questions:
In the Core database you can check what gets executed on the click of the Broken Link Report (the item that defines it is located in : /sitecore/content/Documents and settings/All users/Start menu/Right/Reporting Tools/Scan for Broken Links.
The application that gets started is /Applications/Tools/Broken Links.aspx, so if we look at *webroot*/sitecore/shell/Applications/Tools/Broken Links/Broken links.xml, we can see that the code used for it is Sitecore.Shell.Applications.Tools.BrokenLinks.BrokenLinksForm in the Sitecore.Client assembly.
Using Reflector you can see what it's executing. For your requirements, what I would say would be the easiest is to create your own version of the BrokenLinksForm, possibly simply adding an export functionality on it, or modify the code so it only takes the latest version. From looking at it very quickly I think the code to change (which is actually in the nested Scanner class) is:
...
foreach (ItemLink link in Globals.LinkDatabase.GetBrokenLinks(database))
{
list.Add(link);
}
...
You could possibly check whether the link item is the latest version, possibly by using something like
...
var version = link.GetSourceItem();
if (version.Versions.GetLatestVersion().Version == link.SourceItemVersion)
{
list.Add(link);
}
...
While you're at it you could of course also put in some sorting functionality :-)
It doesn't translate 1-on-1 with the Links button in the menu, but it should give you some pointers in the right direction.
As to your 2nd question: I believe that yes, the Link database does need to be rebuilt. I don't know if Sitecore has a schedule set up by default, but you could create your own agent in the <scheduling> node in the web.config to do this after X time.
Your last question: If you delete or archive an item and have Sitecore remove the broken links the affected items will, by default, not be published. If you have an auto-publish set up it'll show up of course.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Finding out wrongly classified instances when using WEKA - weka

I am using GUI version of WEKA and I am classifying using the Naive Bayes. Can anyone please let me know how to find out which instances are misclassified.

Go to classify tab in Weka explorer Click more options... Check output predictions Click OK Hope that helps.

In your output there should be a incorrectly classified section with a number and percentage, that should be it. The red box in this image is what you're looking for. Edit: Original image source here

This works for me: Decompile the official weka.jar Search into the library the classification that you want to test for know how it works and determinate what instances are misclassified.

Related

AWS console search in array

Script to generate html Beyond Compare folder differences

WSDL Document generator wsdl-viewer

How do I display a field name containing the substring OMIT in ApEx?

What would be the SQL script to generate Sitecore Broken Link Report

Categories

Resources