Serverless architecture on GCP and data residency concern - google-cloud-platform

In general, for any cloud service provider, GCP in this context, is it not relevant and mandatory for Google to specifically allow consumers to choose a data residency and data processing region option for all services? Otherwise the serverless option will have serious adoption issues. Please clarify.

Google Cloud has two types of products available: those that have a specified location and those that are available globally.
You can deploy resources in a specific location or multi-region for:
Compute: Compute Engine, App Engine, Google Kubernetes Engine, Cloud Functions
Storage & Databases: Cloud Storage, Bigtable, Spanner, Cloud SQL, Firestore, Memorystore, Persistent Disk...
Big Data & Machine Learning: BigQuery, Composer, Dataflow, Dataproc, AI training
Networking: VPC, Cloud Load Balancing
Developer Tools...
The following products are available only globally: Networking, Big Data (Pub/Sub), Machine Learning (like the Vision API), Management Tools, Developer Tools, IAM.
For a detailed list, please check the Google Cloud Locations documentation.
Even if a product is available globally, for example Pub/Sub, it is possible to specify where messages are stored.
If data in transit is the concern, you have to be aware that Google Cloud Platform uses data encryption at rest. It consists of several layers of encryption to protect customer data.

Does GCP's Cloud DLP protect data in GCP Cloud Source Repositories also?

GCP's DLP API Page has a short description as "Provides methods for detection, risk analysis, and de-identification of privacy-sensitive fragments in text, images, and Google Cloud Platform storage repositories."
I would like to know if GCP Cloud Source Repositories is counted as a Google Cloud Platform storage repository.
Let me know.
Thanks!
Regards,
Yuva
GCP Cloud Source Repositories are not included in DLP processing. You can schedule DLP to scan a variety of data resources that are described at Inspecting storage and databases for sensitive data. These are:
Google Cloud Storage (GCS) - Blob storage.
BigQuery - Petabyte scale data warehouse.
DataStore - NoSQL database.
The phrase "Google Cloud Platform storage repositories" is a little confusing but what I believe is meant there is "Google Cloud Storage". Don't let the word "repository" confuse you. There is no story relating to Cloud Source Repositories.
Kolban is correct, however you could scan Cloud Source Repositories using dataflow and inspectContent. Combining dataflow to scan arbitrary data sources using the *content methods is a common solution.

Is Google Cloud Platform 'Cloud SQL' service IAAS?

With Google Cloud Platform's Cloud SQL you get a lot of options to set up the infrastructure. Does this mean Cloud SQL is infrastructure as a service?
No, the infrastructure of Cloud SQL is managed by Google and its engineers, so Cloud SQL is PaaS (Platform as a Service).
Cloud SQL is a Docker container built on top of a GCE instance, and Google monitors everything for you and fixes the Cloud SQL instance automatically if something goes wrong (sometimes Google software engineers have to perform some actions to fix issues if the instance is stuck). So, the only thing you have to take care of is storing and querying your data.
Also, Cloud SQL offers a lot of interesting features, such as failover replicas, read replicas, user and database administration, etc.
So, with Cloud SQL, Google doesn't just sell the infrastructure to create databases, but also the application itself and the monitoring tools too.

Find the Project, Bucket, Compute Instance Details for GCP Platform

How can we find details programmatically about GCP infrastructure, such as the various folders, projects, Compute instances, datasets, etc., which can help to get a better understanding of the GCP platform?
Regards,
Neeraj
There is a service in GCP called Cloud Asset Inventory. Cloud Asset Inventory is a storage service that keeps a five-week history of Google Cloud Platform (GCP) asset metadata.
It allows you to export all asset metadata at a certain timestamp to Google Cloud Storage or BigQuery.
It also allows you to search resources and IAM policies.
It supports a wide range of resource types, including:
Resource Manager
  google.cloud.resourcemanager.Organization
  google.cloud.resourcemanager.Folder
  google.cloud.resourcemanager.Project
Compute Engine
  google.compute.Autoscaler
  google.compute.BackendBucket
  google.compute.BackendService
  google.compute.Disk
  google.compute.Firewall
  google.compute.HealthCheck
  google.compute.Image
  google.compute.Instance
  google.compute.InstanceGroup
  ...
Cloud Storage
  google.cloud.storage.Bucket
BigQuery
  google.cloud.bigquery.Dataset
  google.cloud.bigquery.Table
Find the full list here.
The equivalent service in AWS is called AWS Config.
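The two threads above meet in practice: the data-residency question at the top of the page asks where resources live, and Cloud Asset Inventory is the service that reports each resource's location. The following is an added example (not code from the page or from Google) that checks records of that kind against an approved region set; it assumes the record shape of the newline-delimited JSON that `gcloud asset export --content-type=resource` writes (`name`, `asset_type`, `resource.location`), and the sample records are constructed.

```python
"""Residency check over exported Cloud Asset Inventory records (added example)."""
import json

# Constructed sample records. Assumption: the shape mirrors the newline-delimited
# JSON that `gcloud asset export --content-type=resource` writes (name, asset_type,
# resource.location), using the service-style asset_type names.
SAMPLE_EXPORT = """\
{"name": "//compute.googleapis.com/projects/demo/zones/europe-west1-b/instances/web-1", "asset_type": "compute.googleapis.com/Instance", "resource": {"location": "europe-west1-b"}}
{"name": "//compute.googleapis.com/projects/demo/zones/us-central1-a/instances/batch-7", "asset_type": "compute.googleapis.com/Instance", "resource": {"location": "us-central1-a"}}
{"name": "//storage.googleapis.com/demo-eu-bucket", "asset_type": "storage.googleapis.com/Bucket", "resource": {"location": "EU"}}
{"name": "//storage.googleapis.com/demo-us-bucket", "asset_type": "storage.googleapis.com/Bucket", "resource": {"location": "US"}}
{"name": "//bigquery.googleapis.com/projects/demo/datasets/analytics", "asset_type": "bigquery.googleapis.com/Dataset", "resource": {"location": "europe-west1"}}
{"name": "//iam.googleapis.com/projects/demo/serviceAccounts/sa@demo.iam.gserviceaccount.com", "asset_type": "iam.googleapis.com/ServiceAccount", "resource": {"location": "global"}}
"""

# Policy for this example: EU regions plus the EU multi-region; "global" resources hold no data location.
ALLOWED_REGION_PREFIXES = ("europe-",)
ALLOWED_MULTI_REGIONS = {"EU"}


def region_of(location):
    """Reduce a zone to its region (europe-west1-b -> europe-west1); others pass through."""
    parts = location.split("-")
    if len(parts) == 3 and len(parts[2]) == 1:
        return "-".join(parts[:2])
    return location


def classify(location):
    """Return 'allowed', 'violation' or 'global' for one resource location."""
    if not location or location.lower() == "global":
        return "global"
    if location in ALLOWED_MULTI_REGIONS:
        return "allowed"
    if region_of(location).startswith(ALLOWED_REGION_PREFIXES):
        return "allowed"
    return "violation"


def check_residency(export_text):
    """Group (asset_type, name) pairs from an export by residency verdict."""
    report = {"allowed": [], "violation": [], "global": []}
    for line in export_text.splitlines():
        if not line.strip():
            continue
        asset = json.loads(line)
        location = asset.get("resource", {}).get("location", "")
        report[classify(location)].append((asset["asset_type"], asset["name"]))
    return report


if __name__ == "__main__":
    for verdict, items in check_residency(SAMPLE_EXPORT).items():
        for asset_type, name in items:
            print(f"{verdict:9} {asset_type:36} {name}")
```

Pointed at a real export, the "violation" list is the set of resources to migrate or justify, and the "global" list is a reminder that location-less services (IAM here) need a separate answer to the residency question.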
I have found an open source tool named "Forseti Security", which is easy to install and use. It has 5 major components:
Inventory: Regularly collects data from GCP and stores the results in Cloud SQL under the table "gcp_inventory". To refer to the latest inventory information, use the max value of the column inventory_index_id.
Scanner: Periodically compares the policies applied to GCP resources with the data collected by Inventory. It stores the scanner information in the table "scanner_index".
Explain: Helps to manage Cloud IAM policies.
Enforcer: Uses the Google Cloud API to enforce the policies you have set on the GCP platform.
Notifier: Sends notifications to Slack, Cloud Storage or SendGrid, as shown in the architecture diagram above.
You can find the official documentation here.
I tried using this tool and found it really useful.

What are strategies for bridging Google Cloud with AWS?

Let's say a company has an application with a database hosted on AWS and also has a read replica on AWS. Then that same company wants to build out a data analytics infrastructure in Google Cloud -- to take advantage of data analysis and ML services in Google Cloud.
Is it necessary to create an additional read replica within the Google Cloud context? If not, is there an alternative strategy that is frequently used in this context to bridge the two cloud services?
While services like Amazon Relational Database Service (RDS) provide read-replica capabilities, this is only between managed database instances on AWS.
If you are replicating a database between providers, then you are probably running the database yourself on virtual machines rather than using a managed service. This means the databases appear just like any resource on the Internet, so you can connect them exactly the way you would connect two resources across the internet. However, you would be responsible for managing, monitoring, deploying, etc. This takes away from much of the benefit of using cloud services.
Replicating between storage services like Amazon S3 would be easier since it is just raw data rather than a running database. Also, Big Data is normally stored in raw format rather than being loaded into a database.
If the existing infrastructure is on a cloud provider, then try to perform the remaining activities on the same cloud provider.

IaaS vs PaaS in the context of AWS

This is not a duplicate question. I am just confused about IaaS and SaaS with respect to AWS services like Dynamo, RDS, RedShift, Kinesis, etc. They help users to create databases, so should we categorize them as IaaS or SaaS?
Thanks
To help you understand, SaaS is Software as a Service. It's more like an on-demand application where you don't have to worry about configuration, access, whitelisting, etc. For instance, Google Maps (or Google Apps).
IaaS, or Infrastructure as a Service, gives you more flexibility in terms of spawning nodes and clusters, dealing with security services at the IP and port level, managing access control and authentication, etc. On AWS, you may specify which private or public IPs will have access to your system, whether you prefer to go with dense storage or dense compute nodes for your warehouse, rotate your log files, etc.
A page on Amazon RDS reads:
"When you buy a server, you get CPU, memory, storage, and IOPS, all bundled together. With Amazon RDS, these are split apart so that you can scale them independently."
So, in short, services like AWS and Azure are mostly now either IaaS or PaaS.