Stepwise regression in Google BigQuery - google-cloud-platform

How do I perform stepwise regression in BigQuery ML on GCP? The goal is to identify which variables are significant and should be taken into consideration when building statistical models.
Could not find any documentation on GCP.

You can inspect a BQML model's feature attributions with the ML.GLOBAL_EXPLAIN function, which is documented here.
For each feature, you get an attribution value that explains the influence of that feature on the model's predictions.
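BigQuery ML has no built-in stepwise selection, but ML.GLOBAL_EXPLAIN can serve a similar variable-selection purpose. A minimal sketch (the dataset, table, and column names are hypothetical); note that the model must be trained with `enable_global_explain` set to `TRUE`:

```sql
-- Train a model with global explanations enabled:
CREATE OR REPLACE MODEL `my_dataset.my_model`
OPTIONS (
  model_type = 'linear_reg',
  input_label_cols = ['label'],
  enable_global_explain = TRUE
) AS
SELECT * FROM `my_dataset.training_data`;

-- One attribution per feature, largest influence first:
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL `my_dataset.my_model`)
ORDER BY attribution DESC;
```

Features with attributions near zero are candidates to drop, which approximates what stepwise regression's variable elimination is after.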

Related

How can I resolve imbalanced datasets for AutoML classification on GCP?

I am planning to use AutoML for classification of my tabular data, but there is a moderate imbalance in the target variable.
When training my own model, I would upsample, downsample, or generate synthetic samples to resolve the imbalance.
Is there such an option in AutoML on GCP? If not, how can one handle such cases?
Auto ML Tabular Data Classification
AutoML Tables is a supervised learning service. This means that you train a machine learning model with example data. In general, the more training examples you have, the better your outcome. The amount of example data required also scales with the complexity of the problem you're trying to solve. See the guide on how much data to use.
So with regards to the imbalance in your dataset, the only way to resolve it is to adjust the data (add or remove samples) before training to achieve optimal results.
For more information you can refer to AutoML Tables guide.
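Since AutoML Tables won't rebalance the classes for you, resampling has to happen before the data is uploaded. A minimal stdlib sketch of upsampling minority classes by duplication (the column name `label` is an assumption about your schema):

```python
import random

def upsample(rows, label_key, seed=0):
    """Duplicate minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # sample with replacement to fill the gap up to the majority count
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

data = [{"label": "a"}] * 8 + [{"label": "b"}] * 2
balanced = upsample(data, "label")  # both classes now have 8 rows
```

Downsampling is the mirror image (randomly drop majority rows); duplication is the simplest option but synthetic approaches such as SMOTE generally generalize better.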

MLOps monitoring with quicksight

We currently have 3 machine learning models in production in our team (two classifiers and one time-series model). SageMaker Studio with SageMaker Model Monitor wasn't the right option for us because of our CI/CD architecture, so we now serve predictions from an ECS container.
We now want to apply proper model monitoring. My idea is to store ground-truth and prediction data in S3 and monitor it in QuickSight via Athena.
My question is:
Is this a good way of doing this? Can we apply the right metrics this way?
The long and the short of it is that no one can give you a complete answer, because this is a vast, industry-wide problem. You need to learn how model monitoring works in general to figure out what to implement for a given use case, the desired performance metrics, the distance metrics (drift), and your tech stack.
You will have to work through the code examples and articles below, then reimplement and refactor them for your use case.
1. Code:
https://github.com/graviraja/MLOps-Basics/tree/main/week_9_monitoring
2. Article:
https://www.ravirajag.dev/blog/mlops-serverless
3. GitHub/Sagemaker: Model monitoring with your own container:
https://github.com/aws-samples/sagemaker-model-monitor-bring-your-own-container
4. GitHub/Sagemaker: Visualize model monitoring data:
https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_model_monitor/visualization/SageMaker-Model-Monitor-Visualize.ipynb
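To make "distance metrics (drift)" concrete: one common metric you could compute over the prediction data stored in S3 is the Population Stability Index (PSI). A minimal pure-Python sketch (the bin count and thresholds are conventional choices, not prescriptive):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # small floor avoids log(0) for empty bins
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A scheduled job could compute this between the training distribution and each day's predictions, write the result back to S3, and let QuickSight chart it via Athena, which answers the "can we apply the right metrics this way" question with a yes, provided you compute the metrics upstream of the dashboard.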

Google Professional Cloud Architect Exam - BigQuery vs Cloud Spanner

Recently I cleared the Google Cloud PCA exam, but I want to clarify one question I have doubts about.
" You are tasked with building online analytical processing (OLAP) marketing analytics and reporting tools. This requires a relational database that can operate on hundreds of terabytes of data. What is the Google-recommended tool for such applications?"
What is the answer: BigQuery or Cloud Spanner? There are two parts to the question: for OLAP the answer would be BigQuery, but for the relational-database requirement it should be Cloud Spanner.
I'd appreciate some clarification.
Thanks
For Online Analytical Processing (OLAP) databases, consider using BigQuery.
When performing OLAP operations on normalized tables, multiple tables have to be JOINed to perform the required aggregations. JOINs are possible in BigQuery and are sometimes even recommended for small tables.
You can check this documentation for further information.
BigQuery for OLAP and Google Cloud Spanner for OLTP.
Please check this other page for more information about it.
I agree that the question is confusing.
But according to the official documentation:
Other storage and database options
If you need interactive querying in an online analytical processing
(OLAP) system, consider BigQuery.
However, BigQuery is not considered a relational database.
BigQuery does not enforce relationships between tables, but you can join them freely.
If join performance suffers, cluster or partition the tables on the joining fields.
See also: Is it possible to create relationships between tables?
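A sketch of the clustering/partitioning suggestion (table and column names are hypothetical):

```sql
-- Partition by date and cluster on the join key, so joins and
-- filters on customer_id scan and shuffle less data:
CREATE TABLE `my_dataset.orders_clustered`
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id AS
SELECT * FROM `my_dataset.orders`;
```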
Some more literature, if someone wants to go into the details.
By using MapReduce, enterprises can cost-effectively apply parallel
data processing on their Big Data in a highly scalable manner, without
bearing the burden of designing a large distributed computing cluster
from scratch or purchasing expensive high-end relational database
solutions or appliances.
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
Hence, BigQuery.

GCP AutoML Vision - How to count the number of annotations each of my team members makes in GCP AutoML Vision Annotation Tool using the Web UI?

We are automating the process of our deep learning project. Images are automatically uploaded to a dataset in AutoML Vision (Object detection) in the Google Cloud Platform. We have a couple of team members who regularly annotate the uploaded images by using the provided Annotation Tool in Web UI. We need to measure the productivity of our team members by counting the annotations they make for each of them. I haven't found an efficient solution yet. I would appreciate it if you could share your ideas.
There is no feature to identify who annotated which images. However, one approach is to split the work between your team members and distribute the labels that each one should annotate; then you can simply count the number of annotations for each label. For instance, following this guide, you could give Baked Goods and Cheese to one collaborator and Salad and Seafood to another, and so on, so that you can check the totals in the UI. Moreover, the label statistics can give you more details about the annotations for each label (and hence for each team member); note that statistics are only available in the AutoML Vision Object Detection UI.
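If you export the dataset, the per-label totals can also be computed outside the UI. A sketch assuming the object-detection export CSV layout (set, image path, label, then bounding-box coordinates); the idea of summing a collaborator's assigned labels is the same as above:

```python
import csv
from collections import Counter
from io import StringIO

def annotations_per_label(csv_text):
    """Count one annotation per CSV row, keyed by its label (third column)."""
    counts = Counter()
    for row in csv.reader(StringIO(csv_text)):
        if len(row) >= 3 and row[2]:
            counts[row[2]] += 1
    return counts

# Toy export: set, image path, label, box coordinates
export = """TRAIN,gs://bucket/img1.jpg,Cheese,0.1,0.1,,,0.3,0.3,,
TRAIN,gs://bucket/img1.jpg,Salad,0.5,0.5,,,0.9,0.9,,
TEST,gs://bucket/img2.jpg,Cheese,0.2,0.2,,,0.4,0.4,,"""

counts = annotations_per_label(export)
# With labels distributed per collaborator, summing each person's
# assigned labels gives their annotation total.
```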
An automated alternative, in case you are interested, is the Human Labeling Service; according to the documentation, it is currently only available by email because of Coronavirus (COVID-19) measures.
If recommendations above don't fit your needs, you could always file a Feature Request for asking the desired functionality and add the required details.

Google Cloud AutoML Natural Language for Chatbot like application

I want to develop a chatbot like application which gives response to input questions using Google Cloud Platform.
Naturally, Dialogflow is suited for such applications, but due to business conditions I cannot use Dialogflow.
An alternative could be AutoML Natural Language, where I do not need much machine learning expertise.
AutoML Natural Language requires documents which are labelled. These documents can be used for training a model.
My example document:
What is cost of Swiss tour?
Estimate of Switzerland tour?
I would use a label such as Switzerland_Cost for this document.
Now, in my application I would have a mapping between Labels and Responses.
During Prediction, when I give an input question to the trained model, I would get a predicted label. I can then use this label to return the mapped response.
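The label-to-response mapping can be a plain dictionary lookup with a fallback for low-confidence predictions. A minimal sketch (the label names, threshold value, and response texts are assumptions for illustration):

```python
RESPONSES = {
    "Switzerland_Cost": "A typical Switzerland tour costs ...",      # placeholder text
    "Switzerland_Duration": "A typical Switzerland tour lasts ...",  # placeholder text
}
FALLBACK = "Sorry, I didn't understand. Could you rephrase?"

def respond(predicted_label, confidence, threshold=0.6):
    """Map the model's predicted label to a canned response,
    falling back when confidence is low or the label is unknown."""
    if confidence < threshold:
        return FALLBACK
    return RESPONSES.get(predicted_label, FALLBACK)

print(respond("Switzerland_Cost", 0.92))  # mapped response
print(respond("Switzerland_Cost", 0.30))  # fallback: confidence too low
```

Thresholding on the prediction confidence matters in practice: without it, every off-topic question would be forced into the nearest intent.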
Is there a better approach to my scenario?
I'm from the AutoML team. This seems like a good approach to me. People use AutoML NL for intent detection, which is well aligned with what you're trying to do here.