WSO2 ML Cross Validation and Grid Search - wso2

I would like to know whether WSO2 ML implements Cross-Validation and Grid Search for best model selection.

Presently (as of version 1.1.0), WSO2 Machine Learner does not have a direct method for hyperparameter optimization. As mentioned in your question, we are planning to include Random Search and Grid Search in one of the upcoming releases. To track the progress of this work, I have created a public JIRA [1]. When the new feature is ready, I will notify you via this SO question.
Next, let me briefly describe the cross-validation process we use in WSO2 Machine Learner. In the third step of the ML Wizard of the ML Server, you can set the training data fraction (please see the attached screenshot).
So let's say you pick 0.7 of your data for training. The model building process will then use 70% of your data for training, and the rest of the dataset (i.e. 30%) will be used for cross-validation. As you might recognize, this is the most basic approach to cross-validation (a single holdout split), and it is not particularly suitable for small datasets. So in upcoming releases, we are planning to include k-fold cross-validation [2] in addition to the currently available cross-validation method.
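To make the difference concrete, here is a minimal sketch in plain Python of a single holdout split versus k-fold splitting. It is only an illustration of the two ideas, not the code WSO2 Machine Learner runs; the function names `holdout_split` and `k_fold_splits` are made up for this example.

```python
import random


def holdout_split(n_rows, train_fraction, seed=0):
    """Split row indices once into a training part and a validation part."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_rows, k, seed=0):
    """Yield k (train, validation) pairs; every row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Holdout: one 70/30 split, so 30% of the rows are never used for training.
train, validation = holdout_split(n_rows=10, train_fraction=0.7)
print(len(train), len(validation))

# k-fold: five splits, each row is used for validation in exactly one of them.
for train, validation in k_fold_splits(n_rows=10, k=5):
    print(len(train), len(validation))
```

With k-fold, the validation score is averaged over all k splits, which is why it tends to be more reliable than a single holdout split on small datasets.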
Yandi, if you need further help regarding this question or anything related to our product please let me know.
Thanks,
Upul
[1] https://wso2.org/jira/browse/ML-313
[2] https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation

Related

Example of running a ray.rllib model in a JAX environment?

I am trying to train a DQN agent in an environment coded in JAX, but the initialization of the trainer fails when it first tries to reset the environment (with a not-valid JAX type error). Before getting into the debugging process, I thought of looking for example projects, but I cannot find anything, so I am wondering whether interfacing them is simply not possible.
Currently RLlib has no out-of-the-box example that supports JAX environments. I think the main functionality should be there; it was just never tested with JAX environments to work out the feature requirements and the missing parts. I encourage you to give it a try, open a feature request on Ray's issue tracker on GitHub and, if you have the bandwidth, make a contribution. You can also let the RL team know which features are missing for RLlib to work with JAX environments.

Getting low accuracy on two fields after labelling using the tool, Form Recognizer, Custom Label

I need help with the recognition of two particular fields: credit date and credit type. I am getting low accuracy (training ~30%) after labelling, and even lower accuracy on the test set (~10%).
I am using the Custom Label API after labelling, tagging and training.
I think this is because these two fields appear at different places relative to other fields, owing to the different number of entries in different receipts.
Is there anything I can do to improve the accuracy of these fields?
Cognitive Services Form Recognizer service has added support for new and exciting features - multiple forms models (model compose), language expansion, pre-built business cards model, selection marks and lots more are now available in the Form Recognizer v2.1 release.
Form Recognizer sample labeling tool has been updated to support the new release functionality, see this quick start for getting started with custom train with labels.
Please find the snapshot for the JSON for the image that you are trying.

Amazon Machine Learning models rebuilding possibilities

There are only two kinds of built-in prediction/classification models in AWS Machine Learning: logistic regression and linear regression. Is it somehow possible in the current version of AWS ML to:
1) Rebuild what is under the hood of the logistic and linear regression models
2) Build your own models written in Python/R, deploy them on AWS ML and run things such as neural nets, random forests, clustering algorithms?
In the latest version of the AWS ML Developer Guide I could not find an explicit answer to these questions, i.e. a statement that it is impossible to do so. Any tips?
A bit of background first...
Amazon Machine Learning can build models for three kinds of machine learning problems (binary/multiclass classification & regression). As you previously mentioned, the model selected and trained by the platform is abstracted from the user.
This "black box" implementation is perhaps the largest deficiency of Amazon's machine learning platform. You have no information on which model is used or how it is trained (beyond labels such as linear regression or stochastic gradient descent). Amazon is quite clear that this is intentional, as they want the platform to be built into an application, and not just used to train models for one. See the 47:25 and 53:30 marks of this Q&A.
So, to answer your questions:
1) You cannot see exactly how the models have been trained, for example what the constants in a linear regression are (although you may be able to deduce them by testing the model). When you query the model, the response includes a field that indicates the algorithm used for that particular model (e.g. SGD). A full list of learning algorithms can be found here.
2) Unfortunately not. You cannot create your own models and import them into AWS Machine Learning, meaning that no decision trees or neural network models can run on the platform.
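As a rough illustration of reading that algorithm field, the sketch below pulls it out of a prediction response that has already been parsed into a Python dict. The sample dict is hand-written for this example; its key names (`Prediction`, `details`, `Algorithm`, `PredictiveModelType`) follow my recollection of the Amazon ML Predict response and should be checked against the API reference. The helper name `algorithm_of` is made up.

```python
def algorithm_of(response):
    """Return the algorithm name reported in a prediction response, or None."""
    details = response.get("Prediction", {}).get("details", {})
    return details.get("Algorithm")


# Hand-written sample response (illustrative values, assumed key names).
sample_response = {
    "Prediction": {
        "details": {"Algorithm": "SGD", "PredictiveModelType": "BINARY"},
        "predictedLabel": "1",
        "predictedScores": {"1": 0.47},
    }
}

print(algorithm_of(sample_response))
```

This only tells you which learning algorithm was used; it does not expose the trained weights or any other internals of the model.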

Amazon Machine Learning for sentiment analysis

How flexible or supportive is the Amazon Machine Learning platform for sentiment analysis and text analytics?
You can build a good machine learning model for sentiment analysis using Amazon ML.
Here is a link to a github project that is doing just that: https://github.com/awslabs/machine-learning-samples/tree/master/social-media
Since Amazon ML supports supervised learning and accepts text as an input attribute, you need to get a sample of data that has been tagged and build the model with it.
The tagging can be based on Mechanical Turk, like in the example above, or you can use interns ("the summer is coming") to do that tagging for you. The benefit of having your own specific tagging is that you can put your logic into the model. For example, the difference between "The beer was cold" and "The steak was cold", where one is positive and the other is negative, is something that a generic system will find hard to learn.
You can also try to play with some sample data, from the project above or from this Kaggle competition for sentiment analysis on movie reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews. I used Amazon ML on that data set and got fairly good results rather easily and quickly.
Note that you can also use Amazon ML to run real-time predictions based on the model that you are building, and you can use them to respond immediately to negative (or positive) input. See more here: http://docs.aws.amazon.com/machine-learning/latest/dg/interpreting_predictions.html
It is great for starting out. Highly recommend you explore this as an option. However, realize the limitations:
you'll want to build a pipeline because models are immutable--you have to build a new model to incorporate new training data (or new hyperparameters, for that matter)
you are drastically limited in the tweakability of your system
it only does supervised learning
the target variable can't be other text, only a number, boolean or categorical value
you can't export the model and import it into another system if you want--the model is a black box
Benefits:
you don't have to run any infrastructure
it integrates with AWS data sources well
the UX is nice
the algorithms are chosen for you, so you can quickly test and see if it is a fit for your problem space.

gvNIX and spatial searches

I have installed this great development tool and I'm testing how I can use spatial queries or customize other functions.
For example, in the petclinic-geo project, I would like to create a new map that shows only the owners inside Valencia (area). I think there are no Roo commands that can create spatial queries.
In this case, how can I create new custom functions? Do I need to remove Roo to do that, or can both kinds of code coexist?
Thanks
Sorry Allen, this feature is currently not implemented in the framework. However, the gvNIX 2.0 roadmap (currently being defined) will probably include advanced search for geometrical entities.
Anyway, you could push in the generated .aj files to customize the data request used by the map component so that the returned data fits your requirements.
Good luck!