Making predictions in AutoML Natural Language UI - google-cloud-platform

I've been testing the AutoML Natural Language UI and created a model for single-label classification. To train the model I used a CSV with two columns: the first column has the text and the second has the label.
Then I get to the "Test & Use" tab to perform predictions. I upload a CSV file to Cloud Storage, and when I try to select it I get the message "Invalid file type, only following file types allowed: pdf, tif, tiff".
I was wondering whether I can use a CSV file similar to the one I used to train the model.

As per the documentation, you can't do batch prediction from the UI.
You need to use the API's batchPredict method to do so.
If you would like to use your model to do high-throughput asynchronous prediction on a corpus of documents you can use the batchPredict method. The batch prediction methods require you to specify input and output URIs that point to locations in Cloud Storage buckets.
The input URI points to a CSV or JSONL file, which specifies the content to analyze
Example:
gs://folder/text1.txt
gs://folder/text2.pdf
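
For reference, a minimal sketch of kicking off a batch prediction with the google-cloud-automl Python client; the project ID, model ID and bucket paths below are placeholders you would replace with your own:

from google.cloud import automl

# Placeholders: substitute your own project, model ID and bucket paths.
project_id = "my-project"
model_id = "TCN1234567890"

prediction_client = automl.PredictionServiceClient()
model_full_id = automl.AutoMlClient.model_path(project_id, "us-central1", model_id)

# Input CSV in Cloud Storage listing the documents to classify (as above),
# and an output prefix where the prediction results will be written.
input_config = automl.BatchPredictInputConfig(
    gcs_source=automl.GcsSource(input_uris=["gs://folder/input.csv"])
)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=automl.GcsDestination(output_uri_prefix="gs://folder/output/")
)

# batch_predict returns a long-running operation; result() waits for it to finish.
response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config,
)
print(response.result())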

Related

Vertex AI batch predictions - getting feature names for DataFrame

I'm using batch predictions with custom trained models. Generally, one would want to write a line like,
df = pd.DataFrame(instances)
...as one of the first steps prior to doing any custom preprocessing of features. However, this doesn't work with batch predictions: the resulting DataFrame does not have the expected column names, because the instances appear to arrive as a plain NumPy array.
Is there a decent or canonical approach to retrieving the feature (column) names, in case the table changes? (It's better not to assume that the table's columns and their positions all stay the same.)
I'm initiating the batch prediction job with the Python client. I based my model on this example.
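
One possible workaround, as a sketch only: persist the training-time column order alongside the model artifact and reattach it before any pandas-based preprocessing. The file name and helper below are hypothetical, not part of the Vertex AI SDK:

import json
import pandas as pd

# Hypothetical file written at training/export time containing the
# column order, e.g. ["age", "income", "tenure"].
with open("feature_names.json") as f:
    feature_names = json.load(f)

def to_dataframe(instances):
    # Batch prediction hands the instances over without column names,
    # so rebuild the DataFrame using the saved training-time order.
    return pd.DataFrame(instances, columns=feature_names)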

How to Generalize an Informatica Mapping & Workflow?

I need to read a text file (JSON format) and load it into a database table in Informatica Developer. For only one text file and one database table, that is easy.
But now I have N different text files, and hence N different database tables with their corresponding Data Processor transformations. The transformation logic inside the mappings is the same. Instead of creating N sets of mappings and workflows, one per text file, is it possible to create just one generalized mapping and workflow that handles all the text files? I would appreciate it if anyone could point me in a general direction to explore further.

How do I apply my model to a new dataset in WEKA?

I have created a new prediction model based on a dataset that was given to me. It predicts a nominal (binary) class attribute (positive/negative) based on a number of numerical attributes.
Now I have been asked to use this prediction model to predict classes for a new dataset. This dataset has all the same attributes except for the class column, which does not exist yet. How do I apply my model to this new data? I have tried adding an empty class column to my new dataset and then doing the following:
- Simply loading the new dataset in WEKA's explorer and loading the model. It tells me there is no training data.
- Opening my training set in WEKA's explorer, then opening my training model, then choosing my new data as a 'supplied test set'. It runs but does not output any predictions.
I should note that the model works fine when testing on the training data with cross-validation. It also works fine with a subset of the training data I separated ages ago for test/eval use. I suspect the problem is with how I am adding the new class column.
For making predictions, Weka requires the two datasets, the training set and the one you want predictions for, to have exactly the same structure, down to the order of the labels. That also means that you need a class attribute with the correct labels present. For the values of that class attribute, simply use the missing value (denoted by a question mark).
See the FAQ "How do I make predictions with a trained model?" on the Weka wiki for more information on how to make predictions.
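
For example, the new dataset's ARFF file would declare the class attribute with the same labels as in training and use '?' as the class value in every data row (the attribute names here are illustrative):

@relation new_data
@attribute attr1 numeric
@attribute attr2 numeric
@attribute class {positive,negative}
@data
5.1,2.3,?
4.9,3.1,?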

Can I use pre-labeled data in AWS SageMaker Ground Truth NER?

Let's say I have some text data that has already been labeled in SageMaker, either by humans or by an NER model. Now I want a human to go back over the dataset, either to label a new entity class or to correct existing labels. How would I set up a labeling job to allow this? I tried using an output manifest from another labeling job, but none of the documents that were already labeled can be accessed by workers for re-labeling.
Yes, this is possible: you are looking for custom labeling workflows. You can also apply either Majority Voting (MV) or MDS to evaluate the accuracy of the job.

Does Google BigQuery ML automatically make the time series data stationary?

So I am a newbie to Google BigQuery ML and was wondering whether auto.ARIMA automatically makes my time series data stationary.
Suppose I have data that is not stationary. If I give the data as-is to the auto.ARIMA model using Google BigQuery ML, will it first make my data stationary before taking it as input?
Yes, but that's part of the modeling procedure. From the documentation that explains what's inside a BigQuery ML time series model, it does appear that auto.ARIMA will make the data stationary.
However, I would not expect it to alter the source data table; the table itself won't be made stationary. In the course of building candidate models, auto.ARIMA may transform the input data prior to the actual model fitting (e.g. a Box-Cox transform or differencing to make the series stationary).
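
As a sketch, this is what creating such a model looks like through the BigQuery Python client; the project, dataset, table and column names are placeholders, and the raw (possibly non-stationary) series is passed in as-is, since auto.ARIMA chooses the differencing internally:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholders: replace the project/dataset/table and column names.
sql = """
CREATE OR REPLACE MODEL `my_project.my_dataset.sales_arima`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'sales'
) AS
SELECT date, sales
FROM `my_project.my_dataset.daily_sales`
"""

# CREATE MODEL runs as a long-running job; result() waits for training to finish.
client.query(sql).result()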