Can Google access data on a Compute Engine virtual machine? - google-cloud-platform

I'm using an always-free VM on Google Cloud (e2-micro). When creating the instance, there's an option to enable the Confidential Computing service, but that requires an N2D machine type, which is not part of the always-free resources.
Does that mean Google can read my VM's data?
In other words, without that option enabled, what can Google read on my VM?
I'm not worried about system health monitoring data. I'm only concerned with files and folders that I put there.

Google has written policies that describe what they can access and when. Google also provides the ability to log their access.
Confidential Computing is a different type of technology that is not related to Google accessing your data.
Start with this page which provides additional links:
Creating trust through transparency
This Whitepaper is a good read. Page 9 answers your question:
Trusting your data with Google Cloud Platform

You may have heard of Encryption in Transit, or Encryption at Rest. Confidential Computing just encrypts data while it's being processed within the VM as well (Encryption during Processing?).
You need to use N2D machine types because Confidential Computing relies on tech/features available on the AMD EPYC processors.

A Confidential Virtual Machine (Confidential VM) is a type of N2D Compute Engine VM running on hosts based on the second generation of AMD Epyc processors, code-named "Rome." Using AMD Secure Encrypted Virtualization (SEV), Confidential VM features built-in optimization of both performance and security for enterprise-class high memory workloads, as well as inline memory encryption that doesn't introduce significant performance penalty to those workloads.
You can select the Confidential VM service when creating a new VM using the Google Cloud Console, the Compute Engine API, or the gcloud command-line tool.
You can find more details here.
You can check their privacy document here.
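As a sketch of the gcloud route mentioned above (the instance name, zone, and image are placeholder choices, not values from the question):

```shell
# Create a Confidential VM. It requires an N2D machine type, and host
# maintenance must be set to TERMINATE (live migration is not supported).
gcloud compute instances create my-confidential-vm \
  --zone=us-central1-a \
  --machine-type=n2d-standard-2 \
  --confidential-compute \
  --maintenance-policy=TERMINATE \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud
```

The same `--confidential-compute` option appears as a checkbox in the Cloud Console's instance-creation page.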

Related

How to get hardware information from VMs?

We are migrating our production environment from DigitalOcean to GCP.
However, because it is different, we don't know where to get some information about our VMs.
Is it possible to get a report that tells me the number of CPUs, the machine type, the amount of RAM, and the SSD capacity and usage per VM?
Compute Engine lets you export detailed reports of your Compute Engine usage (daily & monthly) to a Cloud Storage bucket using the usage export feature. Usage reports provide information about the lifetime of your resources.
VM instance insights help you understand the CPU, memory, and network usage of your Compute Engine VMs.
As @Dharmaraj mentioned in the comments, GCP introduced a new Observability tab designed to give insights into common scenarios and issues associated with CPU, disk, memory, networking, and live processes. With all of this data in one location, you can easily correlate signals over a given time frame.
Finally, the Stackdriver agent (now the Cloud Monitoring agent) can be installed on GCE VMs to collect additional metrics such as memory usage, and you can use the notification and alerting features as well. However, agent metrics are only available to premium-tier accounts.
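For a quick one-off inventory (as opposed to the exported usage reports), a hedged sketch using the gcloud CLI; the zone and machine type below are examples, not values from your project:

```shell
# List each VM with its zone, machine type, and status.
gcloud compute instances list \
  --format="table(name,zone.basename(),machineType.basename(),status)"

# Machine type determines vCPU count and RAM; look them up per type, e.g.:
gcloud compute machine-types describe e2-micro \
  --zone=us-central1-a \
  --format="value(guestCpus,memoryMb)"
```

Disk *usage* (as opposed to provisioned size) is only visible from inside the guest (e.g. `df -h`) or via the monitoring agent's disk metrics.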

GKE vs Cloud run

I have Python Flask APIs deployed on Cloud Run. Autoscaling, CPU, concurrency: everything is configurable in Cloud Run.
Now the problem is with actual load testing, with around 40k concurrent users hitting the APIs continuously.
Can Cloud Run handle these huge volumes, or should we port our app to GKE?
What factors decide between Cloud Run and GKE?
Cloud Run is designed to handle exactly what you're talking about. It's very performant and scalable. You can set things like concurrency per container/service as well which can be handy where payloads might be larger.
Where you would use GKE is when you need to customise your platform, intercept traffic (man-in-the-middle style), deal with environment complexity, or handle potentially long-running compute, etc. You might find this in large enterprises or highly regulated environments. Kubernetes is almost like a private cloud, though: it's very complex, has its own way of working, and requires ongoing maintenance.
This is obviously opinionated but if you can't think of a reason why you need Kubernetes/GKE specifically, Cloud Run wins for an API.
To provide more detail, though, see Cloud Run Limits and Quotas.
The particularly interesting limit is the 1000 container instances, but note that it can be increased on request.
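As a rough back-of-the-envelope check (the concurrency of 80 below is Cloud Run's default per-instance request concurrency, not a measured value for your app):

```shell
# Estimate how many container instances 40k concurrent users need
# at Cloud Run's default concurrency of 80 requests per instance.
users=40000
concurrency=80
instances=$(( (users + concurrency - 1) / concurrency ))  # ceiling division
echo "$instances"  # prints 500
```

500 instances is within reach of the default 1000-instance limit, but your real concurrency depends on how long each request holds a connection, so load-test before trusting the arithmetic.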

How is the Disk Encryption Key generated by the guest owner in Google confidential VMs?

According to the AMD SEV API specification [1], the guest owner authenticates the AMD platform and verifies the integrity measurement of the launched VM guest, and later encrypts the disk encryption key and sends it to the guest (this flow is shown in Appendix A). However, when searching through the docs of Google Confidential VM [2] I could not find any information about either authenticating the platform or sending the wrapped disk encryption keys to the guest.
My specific question is: in the Google Confidential VM implementation, which party generates the disk encryption key? How can the guest owner verify the launch and generate the disk encryption key? If the key is generated by the firmware under the platform provider's control, Google Cloud Platform (GCP) in this case, then the user does not gain any additional security/privacy protection from GCP insiders (as claimed in the docs [2]).
P.S. A bug in the docs: to get support one is advised to post on Stack Overflow with the "confidential-vm-tag" [3], however, no such tag exists as of 2020-07-29.
[1] AMD Secure Encrypted Virtualisation API v0.24 https://www.amd.com/system/files/TechDocs/55766_SEV-KM_API_Specification.pdf
[2] https://cloud.google.com/compute/confidential-vm/docs/about-cvm
[3] https://cloud.google.com/compute/confidential-vm/docs/getting-support
I totally agree with Nico. One way would be to extract the PDH and PEK keys from a Confidential VM, but so far I have not found any way to do this.
mebius99's answer is correct. I think, based on your reply, that you expect something unique for Confidential VMs. This is not the case, and that is ultimately why Confidential VMs are so powerful: you don't need to drastically change your existing tooling/orchestration. Google's implementation is flexible, so you can use disk encryption in a variety of ways. But Google does not allow users to supply the LAUNCH_SECRET per the AMD doc.
I don't think this is "going against the spec", Appendix A of the AMD spec says
The following flow charts are provided to illustrate how the usage
of the SEV API might be implemented. Note that these are only examples
and there may be other implementation strategies.
Unless I am completely missing the boat on what you are asking...
Technically, the answer to the questions depends on which of three available approaches is chosen:
Google-managed keys;
Customer-managed keys;
Customer-supplied keys.
Practically, data protection issues are rarely purely technical. Crucial points are the data sensitivity level and Data Protection and Privacy agreement between the customer and the cloud provider. In other words, whether the customer can trust Google in terms of data protection, or due to the existing compliance policies the Cloud is considered as a hostile environment.
Google-managed keys (default). Google uses its infrastructure to generate and manage keys for the customer automatically. The customer has no control over the encryption keys.
Customer-managed keys (CMEK). Google uses its infrastructure to create, maintain and rotate keys for the customer. But CMEK gives the customer control over the keys via Cloud KMS. KMS used for CMEK is a cloud-hosted service that helps customers to ensure the lifecycle of encryption keys: generate, rotate, disable, revoke. Thus the customer gets more control over protected data, because the customer, for instance, can quickly terminate access to data by disabling or destroying the CMEK key.
Customer-supplied keys (CSEK). Data is encrypted using the keys owned by the customer. These keys are not sent to Google, but are stored and managed outside of the Google Cloud Platform. Key maintenance, rotation and deprecation is the responsibility of the customer.
The downside of the customer-managed or customer-supplied keys is that access to the encrypted data can be lost due to an unintentional deletion or loss of the key.
Cloud HSM. Encryption keys can be securely stored on a fully managed Hardware Security Modules in Google datacenters. Customers are provided with strong guarantees that their keys cannot leave the boundary of certified HSMs, and that their keys cannot be accessed by malicious persons or insiders. Permissions on key resources are managed with IAM.
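For the customer-supplied (CSEK) approach above, a minimal sketch of what key generation looks like; the disk name in the commented command is a placeholder, and the key-file layout is only indicated, not spelled out:

```shell
# A CSEK is a raw 256-bit (32-byte) key, base64-encoded to 44 characters.
# Generate one locally; Google never receives or stores it in raw form.
key=$(head -c 32 /dev/urandom | base64)
echo "$key"

# The key is then referenced from a JSON key file when creating the disk:
# gcloud compute disks create my-csek-disk \
#   --zone=us-central1-a \
#   --csek-key-file=key-file.json
```

Losing this key means losing access to the disk, which is exactly the downside described above.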
Update
The document [1] provided by AMD is a technical preview and shouldn't be considered an established standard. That is why cloud providers are not required to strictly follow this specification, and the "Confidential VM" offering based on a technical preview is reasonably in Beta.
For those who are concerned with verifying the platform by following the example implementation from AMD's technical preview, Google provides a validation mechanism.
Google Cloud > Confidential VM > Doc > Validating Confidential VMs using Cloud Monitoring:
Cloud Monitoring and Cloud Logging let you monitor and validate your Confidential VM instances.
Integrity monitoring is a feature of both Shielded VM and Confidential VM that helps you understand and make decisions about the state of your VM instances.
You can view integrity reports in Cloud Monitoring and set alerts on integrity failures. You can review the details of integrity monitoring results in Cloud Logging.
Confidential VM generates a unique type of integrity validation event, called a launch attestation report event. Every time an AMD Secure Encrypted Virtualization (SEV)-based Confidential VM boots, a launch attestation report event is generated as part of the integrity validation events for the VM.
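As a sketch of pulling those events yourself (the project ID is a placeholder; the log name is the one the Shielded VM integrity-monitoring docs use, which Confidential VM shares):

```shell
# Read integrity validation events from Cloud Logging; for a Confidential VM
# these include the SEV launch attestation report event emitted at boot.
gcloud logging read \
  'logName="projects/my-project/logs/compute.googleapis.com%2Fshielded_vm_integrity"' \
  --limit=10 --format=json
```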
AMD SEV does not deal with the generation of disk encryption keys. Within the CSEK approach the keys are generated by the customer. The AMD SEV role is to provide a safe environment with the encrypted memory in order to protect the customer supplied key while in use. To safely encrypt memory, AMD SEV relies on the 2nd Gen AMD EPYC™ processors. "These keys are generated by the AMD Secure Processor during VM creation and reside solely within it, making them unavailable to Google or any VMs running on the host."
Please see links below for more details:
Google Cloud Blog > Introducing Google Cloud Confidential Computing with Confidential VMs
Google Cloud > Confidential VM > Doc > Confidential VMs and Compute Engine
Google Cloud > Compute Engine > Doc > Encrypt disks with customer-supplied encryption keys
SUSE > AMD Secure Encrypted Virtualization (AMD-SEV) Guide

How to monitor Google Cloud Platform (GCP) costs on an hourly basis?

I am running a VM instance on GCP (actually a ready Deep Learning Package: 8 CPUs, 1 Tesla V100 GPU, ..., access via a Jupyter Notebook).
Is there a way to monitor the overall usage and costs in real-time?
I am thinking about a "Live usage" link inside https://console.cloud.google.com/, which shows which products are currently used, and their price per second/hour.
I think it is not possible to monitor service usage per second/hour. If you want to analyze your project's bills, GCP offers several options for this, such as billing cycles, billing reports, exporting billing data to a file or to BigQuery, and visualizing your spend with Data Studio; however, keep in mind that these alternatives may require a certain amount of time to reflect each service's usage.
Additionally, you can use the Cloud Billing Catalog API to get the list of all the public services and SKUs metadata in a programmatic, real-time way that can be used as a complement of the cost management tools mentioned above to reconcile list pricing rates.
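A minimal sketch of querying that catalog over REST (the API key is a placeholder, and the service ID shown is assumed to be Compute Engine's; list the services first to confirm it):

```shell
# List all public Google Cloud services with their service IDs.
curl -s "https://cloudbilling.googleapis.com/v1/services?key=${API_KEY}"

# List the SKUs (with list pricing) for one service by its ID.
curl -s "https://cloudbilling.googleapis.com/v1/services/6F81-5844-456A/skus?key=${API_KEY}"
```

This gives you list prices in real time, but it does not tell you your own consumption; for that you still need the billing export.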

Architecture Questions to Autoscale Moodle on Google Cloud Platform

We're setting up a Moodle for our LMS and we're designing it to autoscale.
Here are the current stack specifications:
-Moodle Application (App + Data) baked into an image and launched into a Managed Instance Group
-Cloud SQL for database (MySQL 5.7 connected through Cloud SQL Proxy)
-Cloud Load Balancer - HTTPS load balancing with the managed instance group as backend + session affinity turned on
Questions:
Do I still need Redis/Memcached for my session? Or is the load balancer session affinity enough?
I'm thinking of using Cloud Filestore for the Data folder. Is this recommendable vs another Compute Engine?
I'm more concerned of the session cache and content cache for future user increase. What would you recommend adding into the mix? Any advise on the CI/CD would also be helpful.
So, I can't properly answer these questions without more information about your use case. Anyway, here's my best :)
How bad do you consider forcing some users to re-login when a machine is taken down from the managed instance group? Related to this, how spiky do you foresee your traffic will be? How many users can a machine serve before the autoscaler kicks in and machines are added to or removed from the pool (i.e., how dynamic do you think your app will need to be)? By answering these questions you should get an idea. Also, why not use Datastore/Firestore for user sessions? The few tens of milliseconds of latency shouldn't compromise the snappy feeling of your app.
Cloud Filestore uses NFS and you might hit some of the NFS idiosyncrasies. Are you OK with dealing with that? Also, what is an acceptable latency? How big are the blobs of data you will be saving? If they are small enough, you are very latency-sensitive, and you want atomicity in read/write operations, you can go for Cloud Bigtable. If latency is not that critical, Google Cloud Storage can do it for you, but you also lose atomicity.
Google Cloud CDN seems what you want, granted that you can set up headers correctly. It is a managed service so it has all the goodies without you lifting a finger and it's cheap compared to serving stuff from your application/Google Cloud Storage/...
Cloud Build seems the easy option, unless you want to support more advanced features that are not yet supported.
Please provide more details so I can edit and focus my answer.
There is a study on autoscaling: using Memorystore for Redis showed large network bandwidth from the cache server compared to a Compute Engine instance with Redis installed.
moodle autoscaling on google cloud platform
Regarding moodledata, it shows that a Compute Engine instance serving NFS should have enough performance compared to Filestore, which is much more expensive, as the speed also depends on the disk size.
I used this topology for the implementation:
Autoscale Topology Moodle on GCP