Extract gcloud VM Instance Monitoring Data to BigQuery - google-cloud-platform

Outline
We are running an ecommerce platform on Google Cloud on a dedicated VM instance. Most of our traffic happens on Mondays, when we send our newsletters to our customer base, so we see a huge traffic peak every Monday.
Goal
Because of this peak traffic we need to understand how much server load a single user generates on average. To achieve this, we want to correlate our VM instance monitoring data with our Google Analytics data in Google Data Studio, so we get a better understanding of these dynamics.
Problem
As far as we are aware (based on the docs), Google Data Studio cannot consume data directly from the gcloud SDK. Given that, we tried to extract the data via BigQuery, but there we also could not find a way to access the monitoring data of our VM instance.
Therefore we are looking for a way to get the monitoring data of our VM instances into Google Data Studio (preferably via BigQuery). Thank you for your help.

Here is Google's official solution for monitoring metric export.
That page describes how to export monitoring metrics to a BigQuery dataset.
The solution deployment uses Pub/Sub, App Engine, Cloud Scheduler and some Python code.
I think you only need to export the metrics listed here.
Once the export process is in place, you can use Google Data Studio to visualize your metric data.
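If the full pipeline in that guide is more than you need, a lighter-weight alternative is to pull just the time series you care about with the Monitoring client library and stream the rows into BigQuery yourself. The following is only a minimal sketch under assumed names (my-project, and a vm_metrics.cpu_utilization table with columns timestamp, instance_id, value); the official export solution above is the more robust path:

# Minimal sketch: copy the last hour of VM CPU utilization into a BigQuery table.
# Assumes google-cloud-monitoring and google-cloud-bigquery are installed and that
# the destination table (timestamp TIMESTAMP, instance_id STRING, value FLOAT64) exists.
import time

from google.cloud import bigquery, monitoring_v3

PROJECT_ID = "my-project"                            # assumption: your project ID
TABLE_ID = "my-project.vm_metrics.cpu_utilization"   # assumption: your table

monitoring = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

# One time series per VM instance, each with one point per sampling period.
series = monitoring.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": 'metric.type="compute.googleapis.com/instance/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

rows = [
    {
        "timestamp": point.interval.end_time.isoformat(),
        "instance_id": ts.resource.labels["instance_id"],
        "value": point.value.double_value,
    }
    for ts in series
    for point in ts.points
]

# Streaming insert; any row-level problems come back as a list of errors.
errors = bigquery.Client(project=PROJECT_ID).insert_rows_json(TABLE_ID, rows)
if errors:
    raise RuntimeError(f"BigQuery insert failed: {errors}")

Run hourly (Cloud Scheduler triggering a Cloud Function, or plain cron), the table grows into exactly the kind of source Data Studio can join against your Google Analytics data via the standard BigQuery connector.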

Related

Extracting metrics such as CPU utilization into reports via command line/bash scripts?

In Azure, for example, I created a few bash scripts that give me things like average daily CPU utilization over whatever time period I want for any/all VMs using their command-line tool.
I can't seem to figure out how to do this in Google Cloud except by using the console manually (the automatically generated daily usage reports don't seem to give me any CPU info either). So far, numerous searches have told me that the monitoring function in the Google Cloud console is basically the only way to do this, as the CLI "gcloud" will only report quotas back, which isn't really what I'm after here. I haven't bothered with the Ops Agent install yet, as my understanding is that it just adds additional metrics (to the console) and not functionality to the Google Cloud CLI. Up to this point I've only ever managed Azure and some AWS, so maybe what I'm trying to do isn't even possible in Google Cloud?
Monitoring (formerly Stackdriver) does seem to be neglected by the CLI (gcloud).
There is a gcloud monitoring "group" but even the gcloud alpha monitoring and gcloud beta monitoring commands are limited (and don't include e.g. metrics).
That said, gcloud implements Google's underlying (service) APIs, and for the (increasingly few) cases where the CLI does not yet implement an API and its methods, you can use APIs Explorer to call the underlying service (e.g. Monitoring) directly.
Metrics can be accessed through a query over the underlying time-series data, e.g. projects.timeSeries.query. APIs Explorer also provides a form that lets you invoke service methods from the browser.
You could then use e.g. curl to issue the queries you need from your bash scripts and other tools (e.g. jq) to post-process the data.
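For example, here is a hedged sketch of what such a query could look like. It assumes gcloud is authenticated, GNU date is available, and PROJECT_ID is yours to fill in; the metric filter and time window are only illustrations:

# Pull one hour of per-VM CPU utilization from the Monitoring API and summarize with jq.
PROJECT_ID="my-project"                                   # assumption: replace with yours
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)      # GNU date syntax
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT_ID}/timeSeries" \
  --data-urlencode 'filter=metric.type="compute.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${END}" \
| jq '.timeSeries[] | {instance: .resource.labels.instance_id,
                       avg_cpu: ([.points[].value.doubleValue] | add / length)}'

This hits the raw timeSeries.list endpoint; the aggregation parameters (aggregation.alignmentPeriod, aggregation.perSeriesAligner) can push the averaging server-side if you prefer.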
Alternatively, and if you want a more programmatic experience with good error-handling and control over the output formatting, you can use any of the language-specific SDKs (client libraries).
I'd be surprised if someone hasn't written a CLI tool to complement gcloud for monitoring as it's a reasonable need.
It may be worth submitting a feature request on Google's Issue Tracker. I'm unsure whether it would best be placed under Cloud CLI or Monitoring. Perhaps try Monitoring.

Adding Google Analytics metrics in GCP Monitoring

I would like to be able to add some of the metrics that are captured in Google Analytics into Google Cloud Platform Monitoring, specifically the number of active users through time. Is this kind of metric available in GCP Monitoring?
Best wishes,
Andrew
Interesting question.
There should be a way to do this but I think Analytics metrics aren't available (directly) in Cloud Monitoring.
Here's the list of systems that can provide metrics to Cloud Monitoring out-of-the-box.
While Cloud Monitoring supports more than just Google Cloud Platform, it does not support Google's non-GCP services (directly).
I write 'directly' because you could write a metrics exporter for Analytics to do this for yourself (using custom metrics) and it's very likely that someone has already written one.
Getting metrics using Realtime Reporting API
Writing Custom Metrics to Cloud Monitoring
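To make the exporter idea concrete, here is a minimal sketch of the Cloud Monitoring half, i.e. writing a custom metric. The metric name and the get_active_users() helper (which would call the Analytics Realtime Reporting API) are hypothetical placeholders, not an existing integration:

# Minimal sketch: publish an "active users" gauge as a Cloud Monitoring custom metric.
# Assumes google-cloud-monitoring is installed and the caller can write time series.
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-project"              # assumption: your project ID

def get_active_users() -> int:
    # Hypothetical placeholder: in a real exporter this would call the
    # Google Analytics (Realtime) Reporting API and return the live count.
    return 0

client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/analytics/active_users"  # assumed metric name
series.resource.type = "global"

now = int(time.time())
point = monitoring_v3.Point(
    {
        "interval": {"end_time": {"seconds": now}},
        "value": {"int64_value": get_active_users()},
    }
)
series.points = [point]

client.create_time_series(name=f"projects/{PROJECT_ID}", time_series=[series])

Run on a short schedule (Cloud Scheduler, cron), and the metric then shows up in Metrics Explorer under custom.googleapis.com like any built-in metric.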

GCP | How can I see all the running virtual machines in a project?

I wrote some code to automate the training procedure on our company's VM instances.
You probably know that sometimes GCP can't provide a machine at the moment you ask for it ('out of resources' exception).
So I'd like to monitor which of my machines turned on successfully and which did not.
If there is some way to show this in BigQuery, that would be great.
Thanks.
Using the Cloud Monitoring (Stackdriver) functionality is a good way to monitor all your VMs.
Here is a detailed guide on implementing Monitoring on a Compute Engine instance.
Hope you find it useful.
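As a small aside, if the immediate question is just "which of my machines turned on?", the gcloud CLI can answer that directly without the Monitoring agent. A quick sketch, assuming gcloud is configured for your project:

# List every instance with its current status (RUNNING, TERMINATED, ...)
gcloud compute instances list --format="table(name, zone, status)"

# Or only the ones that are currently running
gcloud compute instances list --filter="status=RUNNING"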
You can use Google Cloud's activity logs too:
Activity logging is enabled by default for all Compute Engine projects.
You can see your project's activity logs through the Logs Viewer in the Google Cloud Console:
In the Cloud Console, go to the Logging page.
In the Logs Viewer, select and filter your resource type from the first drop-down list. From the All logs drop-down list, select compute.googleapis.com/activity_log to see Compute Engine activity logs.
Here is the official documentation.
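If you prefer the command line over the Logs Viewer, the same entries can be read with gcloud logging read. A rough sketch (the "activity" substring matches both the legacy activity_log above and the newer Admin Activity audit logs):

# Read recent Compute Engine activity log entries for the current project
gcloud logging read \
  'resource.type="gce_instance" AND logName:"activity"' \
  --limit=20 --format=json

And since the question asks about BigQuery: a Logging sink with a BigQuery dataset as its destination will export matching entries there continuously, which you can then query or chart.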

How to schedule importing data files from SFTP server located on compute engine instance into BigQuery?

What I want to achieve:
Transfer the hourly data files that arrive on an SFTP server (located on a Compute Engine VM) from several different feeds into BigQuery, with real-time updates, effectively and cost-efficiently.
Context:
The software I am trying to import data from is old legacy software and does not support direct exports to the cloud, so a direct connection from the software to the cloud isn't an option.
It does, however, support exporting data to an SFTP server, which is not something any GCP tool offers directly.
So I have set up an SFTP server myself using vsftpd on a Compute Engine VM instance with expandable storage, given it a static IP, and hardwired that IP into my software. Data now arrives on the Compute Engine instance at hourly intervals seamlessly.
Files are generated on an hourly basis, so there is a different file for each hour. However, they might contain some duplication, i.e. some of the last records of the previous hour's file may overlap with the beginning of the current hour's file.
Files come from different source feeds and I have the feed names in the filenames, so the ever-growing data on my Compute Engine VM instance looks like:
feed1_210301_0500.csv
feed2_210301_0500.csv
feed3_210301_0500.csv
feed1_210301_0600.csv
feed2_210301_0600.csv
feed3_210301_0600.csv
feed1_210301_0700.csv
feed2_210301_0700.csv
feed3_210301_0700.csv
...
What I have tried:
I have set BigQuery access and Cloud Storage permissions on the VM instance so data can be pushed from the VM into BigQuery.
I have tried importing data into BigQuery directly, as well as into Google Cloud Storage to load it from there, yet I could not find an option to import data straight from the VM instance into BigQuery, nor a way to move data from the VM to GCS and then load it into BigQuery, and the documentation is silent on scheduled transfers of this kind as well.
There are external data transfer services like Fivetran and Hevo Data, but they are relatively expensive and seem like overkill, as both my source and destination are on GCP, and it wouldn't be much different from having a third VM and scheduling some scripts for imports. (Which, by the way, is my current workaround, i.e. using Python scripts to stream data into BigQuery as explained here.)
Currently I am exploring Data Fusion, which is only free for 120 hours each month and has extra costs for the underlying Dataproc pipelines, and I'm not sure it's the right way to go. I am also exploring tools like Cloud Scheduler and Cloud Composer to see whether either fits my data needs, but so far I have not found a viable solution.
I am happy to learn new tools and technologies, and any advice that improves the situation in any way is appreciated.
I just tried uploading directly from the GCE VM and it worked flawlessly. I enabled BigQuery in the VM's Cloud API access scopes, created a file (test_data.csv) with some random data that satisfies the schema of the table (test_table) in my BigQuery dataset (test_dataset), and ran:
bq load test_dataset.test_table test_data.csv
You could use the GCS on-premises transfer service (which can be scheduled), then schedule a GCS transfer into BigQuery.
If neither this nor the external data transfer services work for you, then I believe your best bet is to create a script that runs a scheduled batch load of the data from your VM into BigQuery.
Maybe this other answer might help you as well.
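To make the scripted batch-load route concrete, here is a rough sketch of an hourly job on the VM. The file paths and dataset name are assumptions based on the question's naming scheme, and the overlapping records mentioned above would still need a dedup step (e.g. a MERGE or a deduplicating view) in BigQuery:

#!/usr/bin/env bash
# load_feeds.sh - load the previous hour's feed files into BigQuery.
# Assumes the VM's service account already has BigQuery access (as set up above)
# and that tables feed1, feed2, ... exist in the dataset with matching schemas.
set -euo pipefail

DATASET="feeds"                                  # assumption: your dataset name
HOUR=$(date -u -d '1 hour ago' +%y%m%d_%H00)     # matches the feedX_210301_0500.csv naming

for f in /home/sftp/feed*_"${HOUR}".csv; do      # assumption: where vsftpd drops the files
  table=$(basename "$f" | cut -d_ -f1)           # feed1, feed2, feed3, ...
  bq load --source_format=CSV --skip_leading_rows=1 "${DATASET}.${table}" "$f"   # skip header row
done

A crontab entry such as 5 * * * * /home/sftp/load_feeds.sh runs it a few minutes after each hourly drop; Cloud Scheduler plus a small Cloud Function reading from GCS would be the managed equivalent if you later move the files off the VM.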

Find the Project, Bucket, Compute Instance Details for GCP Platform

How can we programmatically find details about GCP infrastructure, such as the various folders, projects, Compute Engine instances, datasets, etc., to get a better understanding of the GCP platform?
Regards,
Neeraj
There is a service in GCP called Cloud Asset Inventory. Cloud Asset Inventory is a storage service that keeps a five-week history of Google Cloud Platform (GCP) asset metadata.
It allows you to export all asset metadata at a certain timestamp to Google Cloud Storage or BigQuery.
It also allows you to search resources and IAM policies.
It supports a wide range of resource types, including:
Resource Manager
google.cloud.resourcemanager.Organization
google.cloud.resourcemanager.Folder
google.cloud.resourcemanager.Project
Compute Engine
google.compute.Autoscaler
google.compute.BackendBucket
google.compute.BackendService
google.compute.Disk
google.compute.Firewall
google.compute.HealthCheck
google.compute.Image
google.compute.Instance
google.compute.InstanceGroup
...
Cloud Storage
google.cloud.storage.Bucket
BigQuery
google.cloud.bigquery.Dataset
google.cloud.bigquery.Table
Find the full list here.
The equivalent service in AWS is called AWS Config.
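The BigQuery export mentioned above can be triggered straight from the CLI. A hedged sketch, assuming the Cloud Asset API is enabled and my-project / asset_inventory are placeholder names (flags can vary slightly between gcloud versions):

# Snapshot all resource metadata for a project into a BigQuery table
gcloud asset export \
  --project=my-project \
  --content-type=resource \
  --bigquery-table=projects/my-project/datasets/asset_inventory/tables/resource_snapshot

# Or search resources interactively without an export
gcloud asset search-all-resources \
  --scope=projects/my-project \
  --asset-types="compute.googleapis.com/Instance"

Once the snapshot table exists, it can be queried in BigQuery or connected to Data Studio like any other dataset.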
I found an open-source tool named "Forseti Security", which is easy to install and use. It has five major components:
Inventory: regularly collects data from GCP and stores the results in Cloud SQL in the table "gcp_inventory". To get the latest inventory information, refer to the max value of the column inventory_index_id.
Scanner: periodically compares the policies applied to GCP resources with the data collected by Inventory. It stores the scanner results in the table "scanner_index".
Explain: helps you manage Cloud IAM policies.
Enforcer: uses the Google Cloud APIs to enforce the policies you have set on the GCP platform.
Notifier: sends notifications to Slack, Cloud Storage or SendGrid, as shown in the architecture diagram.
You can find the official documentation here.
I tried using this tool and found it really useful.