How can we see complete logs for GPU/CPU scaling on Vertex AI? I want to see all the details of how Vertex AI performs autoscaling.
I doubt that you will find complete logs for GPU/CPU scaling on Vertex AI. Currently it supports two kinds of audit logs:
Admin Activity audit logs
Google Cloud services write audit logs to help you answer the questions, "Who did what, where, and when?" within your Google Cloud resources.
These are about who made changes.
Data Access audit logs
Data Access audit logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read user-provided resource data.
These are more about data than about resources.
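If you want to check what the audit logs do record for Vertex AI, a Logging query can restrict entries to the aiplatform.googleapis.com service. The sketch below only builds the query string; the function name is hypothetical, and you should expect to see API calls such as deploying a model or updating an endpoint rather than the autoscaler's internal decisions.

```python
AUDIT_LOG_KINDS = {"activity", "data_access"}


def vertex_audit_log_filter(project_id: str, log_kind: str = "activity") -> str:
    """Build a Logging query for Vertex AI audit log entries.

    log_kind: "activity" for Admin Activity, "data_access" for Data Access.
    """
    if log_kind not in AUDIT_LOG_KINDS:
        raise ValueError(f"log_kind must be one of {sorted(AUDIT_LOG_KINDS)}")
    # Audit log IDs are URL-encoded inside logName: "/" is written as "%2F".
    log_name = f"projects/{project_id}/logs/cloudaudit.googleapis.com%2F{log_kind}"
    return (
        f'logName="{log_name}" '
        'AND protoPayload.serviceName="aiplatform.googleapis.com"'
    )
```

The resulting string can be pasted into the Logs Explorer or passed as the filter argument of gcloud logging read.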
The closest logs you can get in Vertex AI are described in:
Online prediction logging, which provides:
Container logging, which logs the stdout and stderr streams from your prediction nodes to Cloud Logging. These logs are essential and required for debugging.
Access logging, which logs information like timestamp and latency for each request to Cloud Logging.
or
Cloud Monitoring metrics, which provide CPU/GPU/memory usage and network usage.
Currently those are the available options for Vertex AI. I don't think you will be able to see GPU/CPU scaling the way you can in Compute Engine logs.
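As a rough way to watch scaling from the Cloud Monitoring side, you can query endpoint metrics over time. The sketch below builds a request for MetricServiceClient.list_time_series from the google-cloud-monitoring library. The metric type names are assumptions based on the Vertex AI online prediction metrics and should be checked against the Google Cloud metrics list; a replica-count metric, if available for your endpoint, is the closest thing to a view of autoscaling over time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed metric types for Vertex AI online prediction; verify the exact
# names in the Google Cloud metrics list before relying on them.
VERTEX_METRICS = {
    "cpu": "aiplatform.googleapis.com/prediction/online/cpu/utilization",
    "replicas": "aiplatform.googleapis.com/prediction/online/replicas",
}


def build_list_request(project_id: str, metric_key: str, endpoint_id: str,
                       hours: int = 1, now: Optional[datetime] = None) -> dict:
    """Build a request dict for MetricServiceClient.list_time_series."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    metric_filter = (
        f'metric.type = "{VERTEX_METRICS[metric_key]}" '
        f'AND resource.labels.endpoint_id = "{endpoint_id}"'
    )
    return {
        "name": f"projects/{project_id}",
        "filter": metric_filter,
        # TimeInterval fields accept seconds since the Unix epoch.
        "interval": {
            "start_time": {"seconds": int(start.timestamp())},
            "end_time": {"seconds": int(now.timestamp())},
        },
    }
```

When calling the API, add a "view" entry set to monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL and pass the dict as request= to list_time_series.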
I keep reading that I can write a log query to sample a percentage of logs but I have found zero examples.
https://cloud.google.com/blog/products/gcp/preventing-log-waste-with-stackdriver-logging
You can also choose to sample certain messages so that only a percentage of the messages appear in Stackdriver Logs Viewer
How do I get 10% of all GCE load balancer logs with a log query? I know I can configure this on the backend, but I don't want that. I want to get 100% of the logs in Stackdriver and create a Pub/Sub log sink with a log query that captures only 10% of them and sends those sampled logs somewhere else.
I suspect you'll want to create a Pub/Sub sink in the Log Router. See Configure and manage sinks.
In Google's Logging query language, you can use the sample function in the sink's inclusion filter to keep only a fraction of the matching log entries.
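A minimal sketch of the sink's inclusion filter and destination, assuming the load balancer entries use the http_load_balancer resource type (check the resource type your entries actually carry in the Logs Explorer). The function names are illustrative only.

```python
def sampled_lb_filter(fraction: float) -> str:
    """Logging query that keeps roughly `fraction` of load balancer entries."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    # sample(insertId, f) selects about f of the entries matching the rest of the query.
    return f'resource.type="http_load_balancer" AND sample(insertId, {fraction})'


def pubsub_sink_destination(project_id: str, topic_id: str) -> str:
    """Destination string for a Log Router sink that writes to Pub/Sub."""
    return f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"
```

These two strings map onto gcloud logging sinks create SINK_NAME DESTINATION --log-filter=FILTER. After creating the sink, the sink's writer identity needs the Pub/Sub Publisher role on the topic. The full 100% of entries still lands in the default bucket, since the sink filter only affects what is routed to Pub/Sub.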
I have an on-premises app deployed on an application server (e.g. Tomcat) that generates its own log file. If I migrate it, including the application server, to AWS EC2, is it possible to send my application logs to CloudWatch instead? Or is CloudWatch only capable of logging the runtime logs of my application server? Is it a lot of work, or is it even possible?
I'm a bit confused about CloudWatch. It seems it can do multiple things, but is it really right to use it that way? Isn't it only supposed to log metrics, so it can alert whatever or whoever needs to be alerted?
If you have already developed an application that produces its own log files, you can use the CloudWatch Logs agent to ingest the logs into CloudWatch Logs:
After installation is complete, logs automatically flow from the instance to the log stream you create while installing the agent. The agent confirms that it has started and it stays running until you disable it.
Metrics such as RAM usage and disk space can also be collected and pushed to CloudWatch through the agent.
In both cases (logs and metrics), you can set up CloudWatch alarms to automatically detect anomalies and notify you, or perform other actions, when they are detected. For logs, this is done through metric filters:
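With the newer unified CloudWatch agent, both the log file and the memory/disk metrics are declared in one JSON config (the older CloudWatch Logs agent used a different, INI-style format). The sketch below generates such a config; the Tomcat log path and log group name are hypothetical placeholders.

```python
import json


def cloudwatch_agent_config(log_file: str, log_group: str) -> dict:
    """Minimal unified CloudWatch agent config: one log file plus memory/disk metrics."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # The agent replaces {instance_id} with the EC2 instance ID.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
        "metrics": {
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical Tomcat log path and log group name.
    cfg = cloudwatch_agent_config("/opt/tomcat/logs/catalina.out", "tomcat-app")
    print(json.dumps(cfg, indent=2))
```

The printed JSON would typically be saved on the instance and loaded with amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:PATH; the instance role also needs permission to write to CloudWatch.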
You can search and filter the log data coming into CloudWatch Logs by creating one or more metric filters. Metric filters define the terms and patterns to look for in log data as it is sent to CloudWatch Logs. CloudWatch Logs uses these metric filters to turn log data into numerical CloudWatch metrics that you can graph or set an alarm on.
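As a sketch of the metric-filter-plus-alarm pattern, the functions below build keyword arguments for boto3's logs.put_metric_filter and cloudwatch.put_metric_alarm. The filter pattern, metric name, namespace and SNS topic ARN are hypothetical examples; adjust them to whatever your log lines actually contain.

```python
# Hypothetical metric name and namespace shared by the filter and the alarm.
METRIC_NAME = "AppErrorCount"
NAMESPACE = "MyApp"


def error_metric_filter_kwargs(log_group: str) -> dict:
    """Kwargs for logs.put_metric_filter: count log lines containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "app-errors",
        "filterPattern": "ERROR",
        "metricTransformations": [{
            "metricName": METRIC_NAME,
            "metricNamespace": NAMESPACE,
            "metricValue": "1",  # each matching line adds 1 to the metric
        }],
    }


def error_alarm_kwargs(sns_topic_arn: str, threshold: int = 1) -> dict:
    """Kwargs for cloudwatch.put_metric_alarm on the metric above."""
    return {
        "AlarmName": "app-errors-alarm",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no errors logged means no alarm
        "AlarmActions": [sns_topic_arn],
    }
```

Usage would look like boto3.client("logs").put_metric_filter(**error_metric_filter_kwargs("tomcat-app")) followed by boto3.client("cloudwatch").put_metric_alarm(**error_alarm_kwargs(topic_arn)).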
Update:
You can also have your application send logs directly to CloudWatch Logs using the AWS SDK. For example, in Python (boto3), you can use put_log_events.
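A minimal sketch of sending application lines with put_log_events, assuming client is boto3.client("logs") and that the log group and stream were already created (create_log_group / create_log_stream). It batches by the 10,000-events-per-call limit only; very large messages would also need splitting by the per-batch size limit.

```python
import time
from typing import Iterable, List, Optional

MAX_EVENTS_PER_CALL = 10_000  # PutLogEvents limit on events per batch


def to_log_events(lines: Iterable[str], now_ms: Optional[int] = None) -> List[dict]:
    """Convert non-empty lines into PutLogEvents entries (timestamps in ms)."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return [{"timestamp": ts, "message": line} for line in lines if line]


def send_lines(client, log_group: str, log_stream: str,
               lines: Iterable[str], now_ms: Optional[int] = None) -> int:
    """Send lines with client.put_log_events in batches; return events sent."""
    events = to_log_events(lines, now_ms)
    for i in range(0, len(events), MAX_EVENTS_PER_CALL):
        client.put_log_events(
            logGroupName=log_group,
            logStreamName=log_stream,
            logEvents=events[i:i + MAX_EVENTS_PER_CALL],
        )
    return len(events)
```

Passing the client in, rather than creating it inside, keeps the function easy to test with a stand-in object.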
I can find a "Group size" graph on the instance group's page.
However, when I try to find this metric in Stackdriver, it doesn't exist.
I tried looking in the metricDescriptors API, but it doesn't seem to be there either.
Where can I find this metric?
I'm particularly interested in sending alerts when this metric drops to 0.
There is not a Stackdriver Monitoring metric for this data yet. You can fetch the size using the instanceGroups.get API call. You could create a system that polls this data and posts it back to Stackdriver Monitoring as a custom metric and then you will be able to access it from Stackdriver.
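The polling approach can be sketched as one function run on a schedule. To keep it self-contained, fetch_group and write_series are injected callables: fetch_group is assumed to return the instanceGroups.get response as a dict, and write_series would wrap MetricServiceClient.create_time_series. The custom metric name is hypothetical, and treating a missing size field as 0 is an assumption.

```python
import time
from typing import Callable, Optional

# Hypothetical custom metric name; any name under custom.googleapis.com/ works.
METRIC_TYPE = "custom.googleapis.com/instance_group/size"


def build_size_series(project_id: str, zone: str, group: str,
                      size: int, end_seconds: int) -> dict:
    """TimeSeries-shaped dict holding one int64 data point."""
    return {
        "metric": {"type": METRIC_TYPE,
                   "labels": {"zone": zone, "instance_group": group}},
        "resource": {"type": "global", "labels": {"project_id": project_id}},
        "points": [{
            "interval": {"end_time": {"seconds": end_seconds}},
            "value": {"int64_value": size},
        }],
    }


def poll_once(fetch_group: Callable[[str, str, str], dict],
              write_series: Callable[[dict], None],
              project_id: str, zone: str, group: str,
              now_seconds: Optional[int] = None) -> int:
    """Read the group's size and write it as a custom metric point."""
    # Assumption: a response without a "size" field means an empty group.
    size = int(fetch_group(project_id, zone, group).get("size", 0))
    ts = int(time.time()) if now_seconds is None else now_seconds
    write_series(build_size_series(project_id, zone, group, size, ts))
    return size
```

With google-cloud-monitoring, write_series could call create_time_series(name=f"projects/{project_id}", time_series=[monitoring_v3.TimeSeries(series)]). Running poll_once every minute and adding an alerting policy whose condition fires when the custom metric is below 1 would cover the "goes to 0" case.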
I've been storing analytics in a BigQuery dataset for over 1.5 years and have hooked up Data Studio and other tools to analyse the data. However, I very rarely look at this data. Now I logged in to check it, and it's just completely gone. No trace of the dataset, and no audit log anywhere showing what happened. I've tracked down when it disappeared via the billing history, and it seems that it was mysteriously deleted in November last year.
My question to the community is: Is there any hope that I can find out what happened? I'm thinking audit logs etc. Does BigQuery have any table-level logging? For how long does GCP store these things? I understand the data is probably deleted since it was last seen so long ago, I'm just trying to understand if we were hacked in some way.
I mean, ~1 TB of data can't just disappear without leaving any traces?
Usually, Cloud Audit Logging is used for this:
Cloud Audit Logging maintains two audit logs for each project and organization: Admin Activity and Data Access. Google Cloud Platform services write audit log entries to these logs to help you answer the questions of "who did what, where, and when?" within your Google Cloud Platform projects.
Admin Activity logs contain log entries for API calls or other administrative actions that modify the configuration or metadata of resources. They are always enabled. There is no charge for your Admin Activity audit logs.
Data Access audit logs record API calls that create, modify, or read user-provided data. To view the logs, you must have the IAM roles Logging/Private Logs Viewer or Project/Owner. ... BigQuery Data Access logs are enabled by default and cannot be disabled. They do not count against your logs allotment and cannot result in extra logs charges.
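For entries that are still within the retention window, a Logging query scoped to the dataset can show who touched it. This sketch assumes the audit entries' resourceName contains projects/PROJECT/datasets/DATASET; the function name is illustrative. In the matching entries, protoPayload.methodName and protoPayload.authenticationInfo.principalEmail indicate what was called and by whom.

```python
from typing import Optional


def dataset_audit_filter(project_id: str, dataset_id: str,
                         since: Optional[str] = None) -> str:
    """Logging query for BigQuery audit entries that mention one dataset.

    since: optional RFC 3339 timestamp, e.g. "2023-11-01T00:00:00Z".
    """
    parts = [
        # Has-match on the log name covers both Admin Activity and Data Access logs.
        'logName:"cloudaudit.googleapis.com"',
        'protoPayload.serviceName="bigquery.googleapis.com"',
        # ":" is the "has" operator, so table-level resource names also match.
        f'protoPayload.resourceName:"projects/{project_id}/datasets/{dataset_id}"',
    ]
    if since:
        parts.append(f'timestamp>="{since}"')
    return " AND ".join(parts)
```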
The problem for you is the retention period for Data Access logs: 30 days (Premium Tier) or 7 days (Basic Tier). Of course, for longer retention, you can export audit log entries and keep them for as long as you wish. If you did not do this, you have lost these entries, and I think your only option is to contact Support.
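To avoid this situation in the future, audit entries can be exported with a Log Router sink. A sketch that builds the gcloud command for a Cloud Storage sink; the sink and bucket names are hypothetical. A sink only exports entries written after it is created, so it cannot recover entries that have already expired.

```python
import shlex
from typing import List


def audit_export_sink_command(sink_name: str, bucket: str) -> List[str]:
    """gcloud command that routes all audit log entries to a Cloud Storage bucket."""
    destination = f"storage.googleapis.com/{bucket}"
    log_filter = 'logName:"cloudaudit.googleapis.com"'
    return ["gcloud", "logging", "sinks", "create", sink_name, destination,
            f"--log-filter={log_filter}"]


if __name__ == "__main__":
    # Hypothetical sink and bucket names.
    print(shlex.join(audit_export_sink_command("audit-archive", "my-audit-log-archive")))
```

The list can be run with subprocess.run(cmd, check=True). Afterwards, the sink's writer identity needs the Storage Object Creator role on the bucket; a BigQuery dataset (bigquery.googleapis.com/projects/PROJECT/datasets/DATASET) is an alternative destination.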
Does Amazon EC2 generate emails and PDFs of its monitoring information on a regular basis? It provides graphs for CPU utilization, disk reads, disk read operations, disk writes, disk write operations, network in, etc. I need to get all these graphs and data from the AWS console to my email address as a PDF. Can I get this directly, or is there another way to get a backup on a regular basis?
All the metrics from EC2 are stored in CloudWatch (along with most other AWS service metrics). Unfortunately, there is no built-in export feature, so you either need to build one using the CloudWatch API/CLI or use someone else's, for example:
https://github.com/petezybrick/awscwxls
If you want to do it on your own, the following is a good starting point for collecting the EC2 metrics before creating the graphs:
http://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-statistics.html
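The same call is available in boto3 as cloudwatch.get_metric_statistics. The sketch below builds the request for one EC2 metric and turns the returned datapoints into CSV text, which could then be attached to an email or fed to a graphing step; the instance ID is a placeholder, and the 5-minute period and chosen statistics are assumptions.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def metric_request(instance_id: str, metric_name: str, hours: int = 24,
                   now: Optional[datetime] = None) -> dict:
    """Kwargs for cloudwatch.get_metric_statistics on one EC2 instance metric."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,  # e.g. CPUUtilization, DiskReadOps, NetworkIn
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,
        "Statistics": ["Average", "Maximum"],
    }


def datapoints_to_csv(datapoints: List[dict]) -> str:
    """CSV text from a response's Datapoints list (returned in no fixed order)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Timestamp", "Average", "Maximum", "Unit"])
    for dp in sorted(datapoints, key=lambda d: d["Timestamp"]):
        writer.writerow([dp["Timestamp"].isoformat(), dp.get("Average"),
                         dp.get("Maximum"), dp.get("Unit")])
    return buf.getvalue()
```

Usage would be datapoints_to_csv(boto3.client("cloudwatch").get_metric_statistics(**metric_request("i-0123456789abcdef0", "CPUUtilization"))["Datapoints"]), repeated for each metric you want in the report.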
A third option is to script a login to the AWS Console and take a screenshot of the metrics page in a browser.