Vertex AI Endpoint Does Not Scale Up - google-cloud-platform

Any idea why Vertex AI is not scaling up?
To handle online predictions, I’ve deployed a custom trained model to an endpoint with the following configurations:
{
  "dedicated_resources": {
    "machine_spec": {
      "machine_type": "n1-standard-8",
      "accelerator_type": "NVIDIA_TESLA_P100",
      "accelerator_count": 1
    },
    "min_replica_count": 3,
    "max_replica_count": 6,
    "autoscaling_metric_specs": [
      {
        "metric_name": "aiplatform.googleapis.com/prediction/online/cpu/utilization",
        "target": 30
      }
    ]
  }
}
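For reference, deploying with these same settings through the google-cloud-aiplatform Python SDK would look roughly like the sketch below (the project, region, and model ID are placeholders, not from the original question):

from google.cloud import aiplatform

# Placeholder project/region/model values; substitute your own.
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/MODEL_ID")

# Deploy with the same dedicated resources and a 30% CPU autoscaling target.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_P100",
    accelerator_count=1,
    min_replica_count=3,
    max_replica_count=6,
    autoscaling_target_cpu_utilization=30,
)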
I ran traffic for over 20 minutes and the replica count / target remains at 3. I can see in the Google Cloud console that CPU usage on all nodes is roughly 60% (above the CPU target of 30%). In the model settings view, I can also confirm that all of these settings registered correctly.
Any thoughts here?

Related

AWS Glue Job parallel running got error "Rate exceeded" ThrottlingException Status Code: 400

I have a simple (just print hello) Glue 2.0 job that runs in parallel, triggered from a Step Functions Map. The Glue job's Maximum concurrency is set to 40, and so is the Step Functions Map's MaxConcurrency.
It runs fine if I kick off fewer than 20 parallel Glue jobs, but beyond that (I tried a maximum of 35 in parallel) I get intermittent errors like this:
Rate exceeded (Service: AWSGlue; Status Code: 400; Error Code:
ThrottlingException; Request ID: 0a350b23-2f75-4951-a643-20429799e8b5;
Proxy: null)
I've checked the service quotas documentation
https://docs.aws.amazon.com/general/latest/gr/glue.html and my account settings. The maximum of 200 concurrent job runs should have handled my 35 parallel jobs happily.
There are no other Glue jobs scheduled to run at the same time in my AWS account.
Should I just request a quota increase and hope that fixes it, or is there anything else I can do to get around this?
Thanks to luk2302 and Robert for the suggestions.
Based on their advice, I reached a solution.
Add a Retry to the Glue task in the Step Functions state machine (I tried IntervalSeconds 1 and BackoffRate 1, but that was too low and didn't work):
"Resource": "arn:aws:states:::glue:startJobRun",
"Type": "Task",
"Retry": [
{
"ErrorEquals": [
"Glue.AWSGlueException"
],
"BackoffRate": 2,
"IntervalSeconds": 2,
"MaxAttempts": 3
}
]
Hope this helps someone.
The quota that you are hitting is not the concurrent job quota of Glue, but the Start Job Run API quota. You are basically requesting too many job runs per second. If possible, just wait between every Start Job Run call.
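If you are calling Glue directly from your own code rather than from Step Functions, a minimal Python/boto3 sketch of that spacing-out-plus-retry idea might look like this (the job name and delay values are placeholders, not from the original posts):

import time
import boto3
from botocore.exceptions import ClientError

glue = boto3.client("glue")

def start_job_with_backoff(job_name, max_attempts=3):
    # Retry StartJobRun with exponential backoff when the API throttles us.
    for attempt in range(max_attempts):
        try:
            return glue.start_job_run(JobName=job_name)
        except ClientError as e:
            if e.response["Error"]["Code"] != "ThrottlingException":
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s ...
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")

for i in range(35):
    start_job_with_backoff("my-hello-job")
    time.sleep(0.5)  # space out the StartJobRun calls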

How to improve Airflow task concurrency

I have a question about DAG and task concurrency.
Scenario:
I have two DAG files.
DAG 1 has only one task.
DAG 2 has three tasks. Of the three, one task calls a third-party API (the API response time is 900 milliseconds; it is a simple weather API for showing the current weather of a provided city, e.g. https://api.weatherapi.com/v1/current.json?key=api_key&q=Londodn) and the other 2 tasks are just for logs (print statements).
I trigger DAG 1 with a custom payload containing 1000 records, for example:
conf: {
  [
    {
      "city": "London",
      ...
    },
    {
      ...
    }
  ]
}
The DAG 1 task just loops through the records and triggers DAG 2 1000 times, once per record (a rough sketch of that loop is shown below).
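A minimal sketch of what such a fan-out task might look like, assuming Airflow 2.2+ and that the records arrive under a top-level "records" key in dag_run.conf (the key name and DAG/task IDs are hypothetical, since the actual conf structure in the question is elided):

from datetime import datetime
from airflow import DAG
from airflow.api.common.trigger_dag import trigger_dag
from airflow.operators.python import PythonOperator

def fan_out(**context):
    # Loop over the records passed in dag_run.conf and trigger DAG 2 once per record.
    records = context["dag_run"].conf.get("records", [])
    for i, record in enumerate(records):
        trigger_dag(
            dag_id="dag_2",
            run_id=f"dag_2_{context['ds_nodash']}_{i}",
            conf=record,
        )

with DAG("dag_1", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    PythonOperator(task_id="fan_out", python_callable=fan_out)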
So first, I want to ask about this approach. Is it a good approach to process the list of data with 2 DAGs, or is there a better way to do this?
My concern is that it takes 17 minutes for DAG 2 to complete all 1000 executions.
I am using Amazon Managed Workflows for Apache Airflow (MWAA); the configuration is as below:
Environment class: mw1.large
Scheduler count: 4
Maximum worker count: 25
Minimum worker count: 20
Region: us-west-2
core.max_active_runs_per_dag: 1000
core.max_active_tasks_per_dag: 5000
Default MWAA config for tasks as per the AWS documentation
(https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html)
core.parallelism: 10000
core.dag_concurrency: 10000
Can anyone guide me on how I can improve my AWS Managed Airflow performance to improve the parallelism of DAG runs?
I want to understand the parallelism and concurrency settings: if they are set as high as in the configs above, why does it take 17 minutes for Airflow to complete the tasks?
Thanks!

Difference between AWS IoT shadow file desired and reported fields

I'm looking at the AWS IoT documentation for shadow states and trying to better understand the use of desired and reported in the shadow file.
The documentation states:
When the shadow's state changes, AWS IoT sends /delta messages to all MQTT subscribers with the difference between the desired and the reported states.
After looking through the rest of the documentation I don't feel like I have a clear grasp of the use case for desired vs reported. Can someone explain the use case? When do we use one vs. the other?
Let's start from the beginning: a device shadow is a persistent virtual shadow of a Thing defined in the AWS IoT Registry. Basically, it's a JSON State Document that is used to store and retrieve current state information for a Thing. You can interact with a Device Shadow using MQTT Topics or REST API calls. The main advantage of Shadows is that you can interact with them regardless of whether the thing is connected to the Internet or not.
A shadow’s document contains a state property that describes aspects of the device’s state:
{
  "state": {
    "desired": {
      "color": "RED"
    },
    "reported": {
      "color": "GREEN"
    },
    "delta": {
      "color": "RED"
    }
  }
}
Here's a description of each state:
Apps specify the desired states of device properties by updating the desired object.
Devices report their current state in the reported object.
AWS IoT reports differences between the desired and the reported state in the delta object.
Every shadow has a reserved MQTT Topic and HTTP URL that support the get, update, and delete actions on the shadow. Let's take a look:
$aws/things/THING_NAME/shadow/update: publish to this Topic to update/create the Thing Shadow;
$aws/things/THING_NAME/shadow/update/accepted: AWS IoT publishes the reported or desired portion of the State Document to this Topic when it accepts the update request;
$aws/things/THING_NAME/shadow/update/rejected: AWS IoT publishes an Error Message to this Topic when it rejects an update request;
$aws/things/THING_NAME/shadow/update/documents: AWS IoT publishes a State Document with Previous and Current State information to this Topic whenever an update to the shadow is successfully performed;
$aws/things/THING_NAME/shadow/update/delta: AWS IoT publishes a Delta State Document to this Topic when it accepts a change for the Thing Shadow and the requested state document contains different values for the desired and reported states.
Here's an example. Let's say that we have an air purifier and we want to change the fan speed. The flow will be the following:
User changes the fan speed from the air purifier mobile application
The mobile application publishes a JSON message to the MQTT topic $aws/things/THING_NAME/shadow/update to update the device shadow with a new desired state, "fanSpeed": 50. It will look like this:
{
  "state": {
    "desired": {
      "fanSpeed": 50
    }
  }
}
On a successful shadow update, if the previously reported state is different from "fanSpeed": 50, AWS IoT will publish the desired state to the delta topic $aws/things/THING_NAME/shadow/update/delta.
The shadow state document may look like this:
{
  "state": {
    "desired": {
      "fanSpeed": 50
    },
    "reported": {
      "fanSpeed": 100
    },
    "delta": {
      "fanSpeed": 50
    }
  }
}
The device (our air purifier), which is subscribed to the delta topic, will perform the requested operation (set the fan speed to 50 in this case) and report the new state back to the AWS IoT Device Shadow, using the update topic $aws/things/THING_NAME/shadow/update with the following JSON message:
{
  "state": {
    "reported": {
      "fanSpeed": 50
    }
  }
}
Now our air purifier has a fan speed of 50... and that's how it works ;)
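For illustration, here is a minimal device-side sketch of that loop using plain MQTT with the paho-mqtt library (assuming paho-mqtt 1.x; the endpoint, certificate paths, thing name, and set_fan_speed function are placeholders, not part of the AWS example above):

import json
import paho.mqtt.client as mqtt

THING_NAME = "air_purifier"  # hypothetical thing name
UPDATE_TOPIC = f"$aws/things/{THING_NAME}/shadow/update"
DELTA_TOPIC = f"$aws/things/{THING_NAME}/shadow/update/delta"

def set_fan_speed(speed):
    print(f"setting fan speed to {speed}")  # placeholder for real device logic

def on_message(client, userdata, msg):
    # The delta document only contains properties where desired != reported.
    delta = json.loads(msg.payload)["state"]
    if "fanSpeed" in delta:
        set_fan_speed(delta["fanSpeed"])
        # Report the new state back so the delta is cleared.
        client.publish(UPDATE_TOPIC,
                       json.dumps({"state": {"reported": {"fanSpeed": delta["fanSpeed"]}}}))

client = mqtt.Client()
client.tls_set(ca_certs="AmazonRootCA1.pem",
               certfile="device.pem.crt", keyfile="private.pem.key")
client.on_message = on_message
client.connect("YOUR_IOT_ENDPOINT-ats.iot.us-east-1.amazonaws.com", 8883)
client.subscribe(DELTA_TOPIC)
client.loop_forever()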

AWS Elasticache backed by memcached CPU usage flat line at 1%

I've created an ElastiCache cluster in AWS, with node type t3.micro (500 MB, 2 vCPUs and network up to 5 gigabit). My current setup has 3 nodes for High Availability, each node in a different AZ.
I'm using the AWS Labs memcached client for Java (https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-java), which allows auto discovery of nodes, i.e. I only need to provide the cluster DNS record and the client will automatically discover all nodes within that cluster.
I intermittently get some timeout errors:
1) Error in custom provider, net.spy.memcached.OperationTimeoutException: Timeout waiting for value: waited 2,500 ms. Node status: Connection Status { /XXX.XX.XX.XXX:11211 active: false, authed: true, last read: 44,772 ms ago /XXX.XX.XX.XXX:11211 active: true, authed: true, last read: 4 ms ago /XXX.XX.XX.XXX:11211 active: true, authed: true, last read: 6 ms ago
I'm trying to understand what the problem is, but nothing really stands out when looking at the CloudWatch metrics.
The only thing that looks a bit weird is the CPU utilization graph:
The CPU always tops out at 1% during peak hours, so I'm trying to understand how to read this value and whether this is not really a 1% but more of a 100%, indicating that there's a bottleneck on the CPU.
Any help on this?
Just one question: why are you using such small instances? How is the memory usage? My guess is the same as yours: the CPU is causing the trouble. 3 micro instances are not much.
I would try larger instances, but it is just a guess.
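If it helps to see the raw numbers behind that graph, a small boto3 sketch can pull the CloudWatch datapoints directly (the region, cluster ID, and node ID below are placeholders):

import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch", region_name="us-east-1")
resp = cw.get_metric_statistics(
    Namespace="AWS/ElastiCache",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "CacheClusterId", "Value": "my-memcached-cluster"},
        {"Name": "CacheNodeId", "Value": "0001"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Average", "Maximum"],
)
# Print the per-minute average and maximum CPU utilization for the last hour.
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])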

How can I avoid "IN_USED_ADDRESSES" error when starting multiple Dataflow jobs from the same template?

I have created a Dataflow template which allows me to import data from a CSV file in Cloud Storage into BigQuery. I use a Cloud Function for Firebase to create jobs from this template at a certain time every day. This is the code in the function (with some irrelevant parts removed):
const filePath = object.name?.replace(".csv", "");
// Exit function if file changes are in temporary or staging folder
if (
  filePath?.includes("staging") ||
  filePath?.includes("temp") ||
  filePath?.includes("templates")
)
  return;

const dataflow = google.dataflow("v1b3");
const auth = await google.auth.getClient({
  scopes: ["https://www.googleapis.com/auth/cloud-platform"],
});

let request = {
  auth,
  projectId: process.env.GCLOUD_PROJECT,
  location: "asia-east1",
  gcsPath: "gs://my_project_bucket/templates/csv_to_bq",
  requestBody: {
    jobName: `csv-to-bq-${filePath?.replace(/\//g, "-")}`,
    environment: {
      tempLocation: "gs://my_project_bucket/temp",
    },
    parameters: {
      input: `gs://my_project_bucket/${object.name}`,
      output: biqQueryOutput,
    },
  },
};

return dataflow.projects.locations.templates.launch(request);
This function is triggered every time any file is written to Cloud Storage. I am working with sensors, so I have to import at least 89 different CSV files within 15 minutes.
The whole process works fine if there are only 4 jobs running at the same time. However, when the function tried to create the fifth job, the API returned many different types of errors.
Error 1 (not exact since somehow I cannot find the error anymore):
Error Response: [400] The following quotas were exceeded: IN_USE_ADDRESSES
Error 2:
Dataflow quota error for jobs-per-project quota. Project *** is running 25 jobs.
Please check the quota usage via GCP Console.
If it exceeds the limit, please wait for a workflow to finish or contact Google Cloud Support to request an increase in quota.
If it does not, contact Google Cloud Support.
Error 3:
Quota exceeded for quota metric 'Job template requests' and limit 'Job template requests per minute per user' of service 'dataflow.googleapis.com' for consumer 'project_number:****'.
I know I can space out starting the jobs to avoid errors 2 and 3. However, I don't know how to start jobs in a way that won't fill up the addresses. So, how do I avoid that? And if I can't, what approach should I use?
I had answered this in another post here - Which Compute Engine quotas need to be updated to run Dataflow with 50 workers (IN_USE_ADDRESSES, CPUS, CPUS_ALL_REGIONS ..)?.
Let me know if that helps.
This is a GCP external IP quota issue and the best solution is not to use any public IPs for dataflow jobs as long as your pipeline resources stay within GCP networks.
To disable public IPs for Dataflow jobs:
Create or update your subnetwork to allow Private Google Access. This is fairly simple to do using the console: VPC network > subnetworks > tick "Enable Private Google Access".
In the parameters of your Cloud Dataflow job, specify --usePublicIps=false and --network=[NETWORK] or --subnetwork=[SUBNETWORK] (a template-launch equivalent is sketched below).
Note: for internal IP IN_USE errors, just change your subnet CIDR range to accommodate more addresses; for example, 20.0.0.0/16 will give you close to 65k internal IP addresses.
This way you will never exceed your internal IP range.
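For reference, a rough Python equivalent of the question's template launch with public IPs turned off might look like this (the project, subnetwork, and parameter values are placeholders; the ipConfiguration and subnetwork fields in the launch environment correspond to the --usePublicIps and --subnetwork options above):

from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
request = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="asia-east1",
    gcsPath="gs://my_project_bucket/templates/csv_to_bq",
    body={
        "jobName": "csv-to-bq-example",
        "environment": {
            "tempLocation": "gs://my_project_bucket/temp",
            # Keep workers on internal IPs only, inside the given subnetwork.
            "subnetwork": "regions/asia-east1/subnetworks/my-subnet",
            "ipConfiguration": "WORKER_IP_PRIVATE",
        },
        "parameters": {
            "input": "gs://my_project_bucket/data.csv",
            "output": "my_dataset.my_table",
        },
    },
)
response = request.execute()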