"Android Device Verification" Service quota usage - google-cloud-platform

I'm using the Android Device Verification service (SafetyNet's attestation API) to verify that a request was sent from the same app that I built.
We have a quota limit of 10,000 (which can be increased) on the number of requests we can make using SafetyNet's attestation API.
Now I want to know when that limit is breached so that I can stop using the API.
For that I was looking into Stackdriver alerting, but I couldn't find the Android Device Verification service in it (even though I was able to find it in Quotas).

You can monitor Safetynet Attestations in Stackdriver by specifying these filters and settings:
Resource Type: Consumed API
Metric: Request count (search for "serviceruntime.googleapis.com/api/request_count" to find the correct metric quickly)
Add Filter service = androidcheck.googleapis.com
Use aggregator "sum" to get the total count of requests.
You can set advanced aggregation options to aggregate at the daily level for comparison against your quota: set Aligner: "sum" and Alignment period: "1440m". This gives daily sums of requests for the chart (1440m = 24h * 60m, the number of minutes in a day).
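If you prefer a query to the point-and-click setup, the same chart can be written in MQL. Below is an untested sketch modeled on the settings above; only the metric and the service filter come from this answer, the rest is an assumption to adapt:
fetch api
| metric 'serviceruntime.googleapis.com/api/request_count'
| filter (resource.service == 'androidcheck.googleapis.com')
| group_by 1440m, [value_request_count_aggregate: aggregate(value.request_count)]
| every 1440m
| group_by [], [daily_request_count: aggregate(value_request_count_aggregate)]
Adding a final stage such as | condition gt(val(), 10000) would turn this into an alerting condition that fires when the daily sum exceeds the 10,000-request quota.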

Related

Google Cloud managed service for Prometheus consistently ingests wrong values for certain metrics

We set up Google Cloud Managed Service for Prometheus this week. While creating Grafana dashboards I noticed that most metrics were ingested correctly, but some values were consistently incorrect.
The output of the metrics endpoint looks like this (truncated):
# HELP channel_socket_bytes_received_total Number of bytes received from clients
# TYPE channel_socket_bytes_received_total counter
# HELP event_collection_size Number of elements
# TYPE event_collection_size gauge
event_collection_size{name="interest"} 18
event_collection_size{name="rfq"} 362
event_collection_size{name="negotiation"} 12
# TYPE sq_firestore_read_total counter
sq_firestore_read_total{collection="negotiation"} 12
sq_firestore_read_total{collection="rfq_interest"} 18
sq_firestore_read_total{collection="rfq"} 362
The output on this endpoint is generated by "prom-client": "14.1.0".
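For context, this is roughly how such a counter is registered with prom-client; a hypothetical reconstruction in TypeScript (only the metric name and label come from the output above, everything else is assumed):
import express from "express";
import { Counter, register } from "prom-client";

// Hypothetical counter definition matching the output above. Note that
// prom-client requires a help string, which produces the "# HELP" line.
const firestoreReads = new Counter({
  name: "sq_firestore_read_total",
  help: "Number of Firestore documents read",
  labelNames: ["collection"],
});

// The application would increment the counter as it reads documents.
firestoreReads.inc({ collection: "rfq" }, 362);

// Standard scrape endpoint; register.metrics() is async in prom-client 14.x.
const app = express();
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});
app.listen(9090);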
Google Cloud Managed Service for Prometheus ingests these metrics. Almost all of them work as expected, but the sq_firestore_read_total metric is consistently wrong.
The Google Cloud Metrics Explorer shows different values.
Services were restarted a number of times. Once the value for one label reached 3, but more commonly the values for all three labels of the metric stay stuck at 0.
It seems to me that something goes wrong during the ingestion stage. Is this a bug in Google Cloud Managed Service for Prometheus?
Important to reiterate: the values I expect are 12, 18, and 362. The values that are ingested are either 0 or, occasionally, 3.

GCP alerting policies based on percentage

I am trying to create some alerting policies in GCP for my application hosted in a Kubernetes cluster.
We have a Cloud Load Balancer serving the traffic, and I can see the HTTP status codes like 2XX, 5XX, etc.
I need to create alerting policies based on the error percentage rather than an absolute value, i.e. ((NumberOfFailures / Total) * 100), so that an alert triggers if my error percentage goes above, say, 50%.
I couldn't find anything in the Google documentation; it just tells you to use a counter, which is an absolute value. I am looking for something like: if the failure rate goes beyond 50% in a rolling window of 15 minutes, then trigger the alert.
Is it even possible to do that natively in GCP?
Yes, I think this is possible with MQL. I have recently created something similar to your use case.
fetch api
| metric 'serviceruntime.googleapis.com/api/request_count'
| filter (resource.service == 'my-service.com')
| group_by 10m, [value_request_count_aggregate: aggregate(value.request_count)]
| every 10m
| {
    group_by [metric.response_code_class],
      [response_code_count_aggregate: aggregate(value_request_count_aggregate)]
    | filter (metric.response_code_class == '5xx')
  ;
    group_by [],
      [value_request_count_aggregate_aggregate:
        aggregate(value_request_count_aggregate)]
  }
| join
| value [response_code_ratio: val(0) / val(1)]
| condition gt(val(), 0.1)
In this example, I am using the request count for a service my-service.com. I aggregate the request count over 10-minute windows, once restricted to responses with response code 5xx and once across all response codes. Then, in the last two lines, I compute the ratio of the number of 5xx responses to the number of all responses and create a boolean value that is true when the ratio is above 0.1, which can be used to trigger an alert.
I hope this gives you a rough idea of how you can create your own alerting policy based on percentages.

Not able to submit request for quota increase in GCP for any resource available

I have been trying to increase the quota limit for multiple GCP resources, including Compute Engine and IP addresses, but I always get a popup saying "not eligible for quota increase". I found this issue happening to other users as well, but it was still unsolved for all of them. Just to clarify, the account I am running is part of the "GCP for Startup" program with billing enabled globally. I have added relevant screen snips here and here.
I have researched and replicated this on my side. Basically, this is modifiable in the console by following these steps:
Go to Cloud Console > IAM & admin > Quotas page
Search the quota limit for your appropriate region
Submit the request with the new limit and save the Case IDs shared with you. You should also receive an email confirmation.
On my side, I could tick the checkbox and edit, and after some minutes I received an email with the confirmation. As per your images, I see that the boxes are grayed out and you are unable to edit the quotas, so you would need to contact the GCP sales team to investigate further.
You can reach them at 1 800-654-2533 from Monday to Friday, 6AM-6PM CST, or make use of the chat, or request a callback via the contact link provided.
Cheers,

Hasura on Google Cloud Run - Monitoring

I would like to have monitoring on my Hasura API on Google Cloud Run. Currently I'm using Google Cloud's monitoring, but it is not really sufficient: I have the count of requests with a 200 status code, but I want, for example, the number of requests per query / mutation endpoint.
I want:
count 123: /graphql/user
count 234: /graphql/profil
I have:
count 357: /graphql
If you have an idea, thanks!
You can't do this with GraphQL unfortunately. All queries are sent to the /v1/graphql endpoint on Hasura, and the only way to distinguish the operations is by parsing the query parameter of the HTTP request and grabbing the operation name.
If Google Cloud allows you to query properties in logs of HTTP requests, you can set up filters on the body, something like:
"Where [request params].query includes 'MyQueryName'"
Otherwise your two options are:
Use Hasura Cloud (https://hasura.io/cloud), which gives you a count of all operations and detailed metrics (response time, variables, etc) on your console dashboard
Write and deploy a custom middleware server or a script for a reverse proxy that handles this
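For the second option, here is a rough TypeScript sketch of such a middleware (Express plus http-proxy-middleware in front of Hasura; the upstream address, ports, and log format are assumptions, not Hasura specifics):
import express from "express";
import { createProxyMiddleware, fixRequestBody } from "http-proxy-middleware";

const app = express();
app.use(express.json());

// Log the GraphQL operation name before forwarding the request to Hasura.
app.post("/v1/graphql", (req, _res, next) => {
  const { operationName, query } = req.body ?? {};
  // Prefer the explicit operationName; otherwise parse it out of the query text.
  const name =
    operationName ??
    /(?:query|mutation)\s+(\w+)/.exec(query ?? "")?.[1] ??
    "anonymous";
  // Cloud Run forwards stdout to Cloud Logging, where a log-based metric
  // can count entries grouped by this field.
  console.log(JSON.stringify({ message: "graphql_operation", name }));
  next();
});

app.use(
  "/v1/graphql",
  createProxyMiddleware({
    target: "http://localhost:8080", // assumed Hasura upstream
    changeOrigin: true,
    // Re-serialize the body parsed by express.json() so the proxied
    // request still carries it.
    onProxyReq: fixRequestBody,
  })
);

app.listen(3000);
From those log lines you can create a log-based metric in Cloud Logging and chart per-operation counts in Metrics Explorer.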

Monitor AWS Service Status using Splunk

Problem
Dependency on AWS Services status
If you depend on Amazon AWS services to operate, you need to keep a close eye on their status. Amazon provides the website http://status.aws.amazon.com/, which links to RSS feeds for specific services in specific regions.
Potential Errors
Our service uses S3, CloudFront, and other services to operate. We'd like to be informed of any service that might go down during hours of operation, and automate what we should do in case something goes wrong.
Splunk Logging
We use Splunk for logging across all of our services.
Requirement
For instance, if errors occur in the application while writing to S3, we'd like to know whether they were caused by a potential outage in AWS.
How to monitor the Status RSS feed in Splunk?
Is there an HTTP client for that? A background service?
Solution
You can use the Syndication Input app to collect the RSS feed data from the AWS Status page.
Create a query that fetches the RSS items that report errors, which are stored in Splunk indexes under the syndication sourcetype.
Create an alert based on the query, with a since field so that we can adjust the alerts over time.
How
Ask your Splunk team to install the app "Syndication Input" on the environments you need.
After that, just collect each of the RSS feeds needed and add them under Settings -> Data Inputs -> Syndication Feed. Take all the URLs of the Amazon Status RSS feeds and use them as Splunk data inputs, filling out the form with a suitable polling interval:
http://status.aws.amazon.com/rss/cloudfront.rss
http://status.aws.amazon.com/rss/s3-us-standard.rss
http://status.aws.amazon.com/rss/s3-us-west-1.rss
http://status.aws.amazon.com/rss/s3-us-west-2.rss
When you are finished, the Syndication app lists each configured feed.
Use the search below to find the errors when they occur, adjusting the since date so that you can create an alert from the results. I set a date in the past just for display purposes.
since should be the day you start monitoring AWS. It lets the query return any new event when Amazon publishes new errors, which are captured from the text "Informational message:".
Normally the query returns nothing new; results only appear once Amazon publishes an item dated after since.
Because the token RESOLVED is appended to an RSS item once an issue is resolved, we exclude those items from the alerts.
sourcetype=syndication "Informational message:" NOT "RESOLVED"
| eval since=strptime("2010-08-01", "%Y-%m-%d")
| eval date=strptime(published_parsed, "%Y-%m-%dT%H:%M:%SZ")
| rex field=summary_detail_base "rss\/(?<aws_object>.*).rss$"
| where date > since
| table aws_object, published_parsed, id, title, summary
| sort -published_parsed
Create an Alert with the Query. For instance, to send an email:
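The same alert can also be expressed in savedsearches.conf instead of the UI. A sketch, with the schedule, threshold, and recipient as placeholders to adapt:
[AWS Status Errors]
# Condensed version of the search above; keep the full query in practice.
search = sourcetype=syndication "Informational message:" NOT "RESOLVED" \
| eval since=strptime("2010-08-01", "%Y-%m-%d") \
| eval date=strptime(published_parsed, "%Y-%m-%dT%H:%M:%SZ") \
| where date > since
# Run every 15 minutes; search over all time, since the "since" filter
# in the query does the windowing.
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = 0
# Fire when the search returns at least one event.
alert_type = number of events
alert_comparator = greater than
alert_threshold = 0
# Send the results by email (placeholder recipient).
action.email = 1
action.email.to = oncall@example.com
action.email.subject = AWS status feed reported an error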