Create ARFF training and test files for Weka

Good day.
My apologies for my English; it is not my native language, so please excuse any errors.
I have a text file with processed data, from which I want to generate .arff files, the file format that Weka uses.
I do not want to generate a single file; I need two files, one for training the model (training) and another for testing it (test).
This can be done directly in Weka by applying the StringToWordVector filter, but the problem is that when I use the second file to test the model I get an error, because the test file ends up with words (attributes) that the training file does not have, which makes it unfair to test the model with them.
I would appreciate any help.
Thank you and best regards.
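One common fix inside Weka itself is to run StringToWordVector in batch mode, so the word dictionary is built from the training file and then reused for the test file, and both outputs end up with the same attributes (per Weka's batch-filtering documentation, something like `java weka.filters.unsupervised.attribute.StringToWordVector -b -i train.arff -o train_vec.arff -r test.arff -s test_vec.arff`). Independently of the filter, the two ARFF files must declare identical headers. A minimal dependency-free sketch of that shared layout, with hypothetical attribute and class names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ArffSplitWriter {
    // Shared header: both files MUST declare the same attributes in the same
    // order, or Weka will refuse to evaluate a model trained on one file
    // against instances from the other.
    static String header(String relation) {
        return "@relation " + relation + "\n\n"
             + "@attribute text string\n"
             + "@attribute class {spam,ham}\n\n"
             + "@data\n";
    }

    static void writeArff(Path file, String relation, List<String> rows) throws IOException {
        StringBuilder sb = new StringBuilder(header(relation));
        for (String row : rows) sb.append(row).append('\n');
        Files.writeString(file, sb.toString());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("arff-demo");
        writeArff(dir.resolve("train.arff"), "messages-train",
                  List.of("'cheap pills now',spam", "'meeting at noon',ham"));
        writeArff(dir.resolve("test.arff"), "messages-test",
                  List.of("'free pills offer',?"));   // unknown class marked with ?
        System.out.println(Files.readString(dir.resolve("test.arff")));
    }
}
```

The point of the sketch is only the identical @attribute section; the actual word-vector attributes should come from the batch-mode filter run, not from hand-written headers.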

Related

Converting RTF tables into xml

I was given the task of converting a large number of RTF tables into XML ones (around 100,000, or well over), but I have no idea how to even start, and I cannot get help from the lead developer because, ironically, he has never written a line of code.
I was thinking about C++, as I need it to be fast, but I'm open to any ideas.
What I need is some information I can start the project with, or any library/program that could help. Thank you.
EDIT: I have XSD schemas to work with.
Found the solution after looking for a while: I can use LibreOffice to save the files as HTML (or various other formats) that keep the tables as they are, and it also produces clean markup that I can validate against an XSD.
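For a batch of 100,000 files, the LibreOffice route can be scripted headlessly; `soffice --headless --convert-to` is LibreOffice's documented command-line conversion mode. A small sketch that only builds the command (it assumes `soffice` is on the PATH; actually running it is left as a comment so the sketch stands alone):

```java
import java.util.ArrayList;
import java.util.List;

public class RtfBatchConvert {
    // Builds the LibreOffice headless conversion command for one RTF file.
    static List<String> convertCommand(String rtfPath, String outDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("soffice");
        cmd.add("--headless");     // no GUI, suitable for batch runs
        cmd.add("--convert-to");
        cmd.add("html");           // any output filter LibreOffice supports works here
        cmd.add("--outdir");
        cmd.add(outDir);
        cmd.add(rtfPath);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", convertCommand("tables/0001.rtf", "out")));
        // To actually run it:
        // new ProcessBuilder(convertCommand("tables/0001.rtf", "out")).inheritIO().start().waitFor();
    }
}
```

Looping this over a directory listing turns the one-off LibreOffice trick into the bulk conversion the task needs; the HTML output can then be validated or transformed against the XSD schemas.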

Tensorflow graph use questions

I've trained Inception on a small dataset of cars.
Now I have a .meta file, a .ckpt file, and a .pbtxt file.
I want to know how to make predictions with them.
I tried to use freeze_graph.py, but it asks for an output_node_names parameter and I have absolutely no idea what that could be.
If you know how I could use my ckpt/meta/pbtxt files to make predictions, or how to freeze my graph with freeze_graph.py so I can use classify.py, I would be very thankful!
Thanks in advance!
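On the output_node_names question: it is the name of the graph node whose value the frozen graph should produce; for an Inception-style classifier this is typically the final softmax node, but the exact name depends on how the graph was exported. Since the .pbtxt is plain text, one quick way to list candidate names is to scan it for the `name:` fields of its node entries. A rough stdlib sketch (the sample graph text below is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PbtxtNodeNames {
    // A GraphDef .pbtxt lists each node roughly as:
    //   node { name: "softmax" op: "Softmax" input: "logits" ... }
    // This pulls out every quoted name so the likely output node can be spotted.
    static List<String> nodeNames(String pbtxt) {
        List<String> names = new ArrayList<>();
        Matcher m = Pattern.compile("name:\\s*\"([^\"]+)\"").matcher(pbtxt);
        while (m.find()) names.add(m.group(1));
        return names;
    }

    public static void main(String[] args) {
        String sample = "node { name: \"input\" op: \"Placeholder\" }\n"
                      + "node { name: \"softmax\" op: \"Softmax\" input: \"logits\" }\n";
        System.out.println(nodeNames(sample));  // → [input, softmax]
    }
}
```

The last names in the file (nodes that feed nothing else) are the usual candidates to pass as output_node_names to freeze_graph.py.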

Converting source code directory into ARFF (WEKA)

Currently I am working on a project using WEKA. Being naive and a newbie with it, there are many things I am not familiar with. In my last project I classified text files with WEKA: I applied the TextDirectoryLoader converter to a directory containing text files, as described at Text categorization with WEKA. Now I want to use the same strategy for a directory containing source code instead of plain text. For example, I have a Jedit source file containing Java source code, and I am trying to convert it to an ARFF file so that I can apply classifiers or other WEKA functions to that ARFF file for data mining purposes. I have also tried a test file given at ARFF files from Text Collections, and I believe I can use that file as an example for converting source code files. However, I do not know what attributes I should define in a FastVector, what format the data should be in (string or numeric), and what other sections an ARFF file should have.
As in the example the authors have defined following attributes
FastVector atts = new FastVector(2);
atts.addElement(new Attribute("filename", (FastVector) null));
atts.addElement(new Attribute("contents", (FastVector) null));
I have tried to find some examples on Google, but with no success.
Could anyone here suggest a solution or alternative for the above problem? (Example code would be highly appreciated.)
Or at least give me a short example that converts a source code directory into an ARFF file, if that is possible.
If it is not possible, what could be the reason?
Any alternative solution (other than WEKA) where I can use the same set of functions on source code would also help.
It is not clear what your goal is. Do you want to classify the source code files, find the files which contain bugs, or something else?
As I imagine it, you want to extract features from each source file and represent each file as an instance. Then you can apply any machine-learning-based algorithm.
Here you can find a Java example of how to construct an ARFF file from Java:
https://weka.wikispaces.com/Creating+an+ARFF+file
But you have to define your task-specific features and extract them from each source code file.
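To make the linked example concrete: an ARFF file is just text, so the same two string attributes the wiki code declares (filename and contents) can be produced without any Weka dependency. A dependency-free sketch that walks a source directory and emits that layout (directory and file names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class SourceDirToArff {
    // Escapes backslashes, quotes and newlines so a whole source file fits
    // into a single quoted ARFF string value.
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n") + "'";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : "src");
        if (!Files.isDirectory(dir)) {
            System.out.println("directory not found: " + dir);
            return;
        }
        StringBuilder arff = new StringBuilder();
        arff.append("@relation source-files\n\n")
            .append("@attribute filename string\n")
            .append("@attribute contents string\n\n")
            .append("@data\n");
        try (Stream<Path> files = Files.walk(dir)) {
            files.filter(p -> p.toString().endsWith(".java"))
                 .forEach(p -> {
                     try {
                         arff.append(quote(p.toString())).append(',')
                             .append(quote(Files.readString(p))).append('\n');
                     } catch (IOException e) { throw new RuntimeException(e); }
                 });
        }
        Files.writeString(Path.of("sources.arff"), arff.toString());
    }
}
```

With the raw contents in a string attribute, StringToWordVector can then turn each file into a bag-of-tokens instance; the real modelling work is choosing better, task-specific features than raw text, as the answer says.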

Weka: Text Classification on an ARFF file

This is a basic question. I am trying to classify text files into 20 different classes.
I therefore have a project structure with two folders, train and test.
In the train folder I have 20 different folders; each folder in turn has many files related to that particular class, e.g. weather, atheism, etc.
I have now created a train.arff file for the entire train folder. When the data is visualized, I can see only two attributes.
I have provided a link below:
Screen in weka
My doubt is: how can I view the various files under these folders and remove the stopwords and punctuation, and apply stemming? How do I go about preprocessing? If links to good resources are available, please suggest and provide them.
I found the videos below quite helpful when I first got my hands on text classification using Weka. You might want to take a look.
Weka Tutorial 31: Document Classification 1 (Application)
Weka Tutorial 32: Document classification 2 (Application)
WEKA Text Classification for First Time & Beginner Users
You might want to use the StringToWordVector filter to see the effect of each word as an attribute, which is indeed described in detail in the first and last videos. Within the filter settings you can supply a stopwords list and choose on each run whether to use it. The same goes for stemming; you can change that as well. The documentation and videos will help you understand it easily.
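As a rough picture of what the filter's lowercasing, tokenizing, and stopword options do internally, here is a dependency-free sketch (the stopword list is a tiny made-up sample; a real run would use Weka's full list or one you supply):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class Preprocess {
    // Tiny sample list; a real stopword file is much longer.
    static final Set<String> STOPWORDS = Set.of("the", "a", "an", "is", "of", "and");

    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+")) // strip punctuation
                     .filter(t -> !t.isEmpty())
                     .filter(t -> !STOPWORDS.contains(t))            // drop stopwords
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokens("The weather, today, is sunny and warm."));
        // → [weather, today, sunny, warm]
    }
}
```

Stemming would be one further mapping on each surviving token; in Weka you simply pick a stemmer in the StringToWordVector settings rather than writing this yourself.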

WEKA: Classifying an ARFF data with a given SMO model

I'm new to Weka and this is my problem:
I have an unlabeled ARFF data file and a given SMO model; I need to classify this data with that model.
I searched for examples, but all of them use a test set to build the classifier, and I have no test sets.
I need to get the classification via Java or the Weka command line.
I tried (under Linux) a command like:
java weka.classifiers.functions.SMO -l /path/of/mymodel/SMOModel.model -T /path/pf/myunlabeledarff/unlabeled.arff
but I get several errors.
Can someone help me?
Thanks a lot.
Documentation showing that the -l flag works is here: http://weka.wikispaces.com/Primer. That documentation also indicates that your syntax is correct and that what you are trying to do is possible.
You say that the data is unlabeled: this can cause errors if the ARFF file you are using for prediction does not match the format of the ARFF file which was used to create the model. Make sure that the ARFF header declares the class attribute, and that every instance (row) in the file has a class value (even if the value is a ? to indicate unknown). Otherwise the formats won't match and the classifier won't work.
Please post your error messages if this does not solve the problem.
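As a concrete picture of the format described above: an unlabeled file can be generated with the class column set to ? on every row. A stdlib sketch, with hypothetical attribute names and class values, which must match the training header exactly:

```java
import java.util.List;

public class UnlabeledArff {
    // Builds an ARFF whose class column is '?' on every row, so the file's
    // format matches the training data the SMO model was built from.
    static String arff(List<String> featureRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation unlabeled\n\n");
        sb.append("@attribute f1 numeric\n");
        sb.append("@attribute f2 numeric\n");
        sb.append("@attribute class {yes,no}\n\n");  // same values as the training header
        sb.append("@data\n");
        for (String row : featureRows) sb.append(row).append(",?\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(arff(List.of("1.0,2.0", "3.5,0.4")));
    }
}
```

Once the formats match, the Primer also describes the -p option for printing the predictions themselves, e.g. `java weka.classifiers.functions.SMO -l SMOModel.model -T unlabeled.arff -p 0`.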