How to share data between different GCP accounts - google-cloud-platform

Our company is creating multi-tenant products and services under our own Google Cloud Platform Account/Organization. Close to 90% of data will be managed and stored within this one account. But each of our customers has their own GCP Account/Organization, and roughly 10% of the total data will come from their end of things (via GCP Storage, databases, etc). Customers will also have their own unrelated data and services, hence the need to use separate accounts.
The data quantity could be as low as 1GB per day or high as 100GB per day, depending on the size of the customer. The data will generally be numerous large files between 100 and 500MB (CSV/row-based data).
What are strategies to safely and efficiently share data between two or more GCP Accounts? Is there something native within GCP that allows for this and helps manage users/permissions, or do we need to build our own APIs/services as if we were communicating with someone else external to GCP?

GCP has a Shared VPC concept (https://cloud.google.com/vpc/docs/shared-vpc) that lets you create a shared network between projects, so you can reach resources over internal IPs across projects. It isn't useful for sharing data between accounts, though; it's for sharing inside one organization that has multiple projects for different departments.
AFAIK, for sharing data between accounts you have to use VPC Peering (https://cloud.google.com/vpc/docs/vpc-peering) or go over the internet. With peering, your data doesn't leave Google's network; it's the approach used by third parties like MongoDB that sell their own cloud platform which actually runs on GCP (and other cloud vendors).
If your actual data is just files, though, I don't think there is much risk in going over the internet and using Cloud Storage. There are many strategies for securing this type of data transfer.
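If you go the Cloud Storage route, cross-account sharing is just an IAM grant on a bucket: identities from the customer's own organization can be added as members on a bucket in yours. A minimal sketch with the google-cloud-storage Python client, where the bucket name and the customer's service account email are placeholders:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("customer-acme-drop")  # placeholder bucket name

    # Add the customer's service account (from their own GCP organization)
    # as a reader on this bucket only.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(
        {
            "role": "roles/storage.objectViewer",
            "members": {"serviceAccount:ingest@acme-prod.iam.gserviceaccount.com"},
        }
    )
    bucket.set_iam_policy(policy)

If the customer pushes files to you rather than pulling them, grant roles/storage.objectCreator instead.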

Google Cloud resources and IAM roles are enough to segregate the data.
For Cloud Storage, create a bucket per customer. Grant the right accounts (user or service account) on each bucket so that a customer can see one bucket, or several (after a merger, for example).
For BigQuery, create a dataset per customer and apply the same kind of IAM policy (see the sketch after these notes).
Cloud SQL is trickier because it isn't bound to IAM roles. Create a database per customer and manage access through database user privileges.
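For example, the per-customer BigQuery grant described above can be scripted with the google-cloud-bigquery client; the dataset ID and customer email below are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()
    dataset = client.get_dataset("my-project.customer_acme")  # placeholder dataset

    # Give the customer's user (or service account) read access to their dataset only.
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(
            role="READER",
            entity_type="userByEmail",
            entity_id="analyst@acme.example.com",
        )
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])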
Remember that IAM handles authentication and authorization only for GCP resources. You can't express custom, application-level authorization with IAM; if that's a requirement, you have to implement those checks yourself.
In my company, we use Firestore to store the authorizations and user profiles. Authentication is handled by GCP (IAP, for example), and we use the user's email as the key for the authorizations.
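As an illustration of that pattern (not the exact schema we use), a request handler can read the IAP-verified email header and look the user up in a Firestore document; the collection and field names are made up for the example, and a Flask-style request object is assumed:

    from google.cloud import firestore

    db = firestore.Client()

    def is_authorized(request, resource: str, action: str) -> bool:
        # IAP passes the verified identity as "accounts.google.com:user@example.com".
        header = request.headers.get("X-Goog-Authenticated-User-Email", "")
        email = header.split(":", 1)[-1]

        # Hypothetical layout: one document per user, keyed by email,
        # holding a map of resource -> allowed actions.
        doc = db.collection("authorizations").document(email).get()
        if not doc.exists:
            return False
        grants = doc.to_dict().get("grants", {})
        return action in grants.get(resource, [])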

Related

Can I use temporary AWS IAM credentials with the BigQuery Data Transfer Service?

Currently, we use AWS IAM User permanent credentials to transfer customers' data from our company's internal AWS S3 buckets to customers' Google BigQuery tables following BigQuery Data Transfer Service documentation.
Using permanent credentials poses security risks for the data stored in AWS S3.
We would like to use temporary AWS IAM role credentials instead, which requires support for a session token on the BigQuery side in order to be authorized on the AWS side.
Is there a way for the BigQuery Data Transfer Service to use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
We considered the BigQuery Omni framework (https://cloud.google.com/bigquery/docs/omni-aws-cross-cloud-transfer) to transfer data from S3 to BQ, but we ran into several concerns/limitations:
The Omni framework targets data-analysis use cases rather than data transfer from external services, which makes us worry that its design may have drawbacks for data transfer at high scale.
The Omni framework currently supports only the AWS us-east-1 region (we need at least us-west-2 and eu-central-1, plus the corresponding Google regions), so it isn't backward compatible with our current customers' setup for transferring data from our internal S3 to their BQ.
Our current customers would need to sign up for the Omni service to migrate properly from the transfer solution we use today.
We considered a workaround of exporting data from S3 via staging in GCS (i.e. S3 -> GCS -> BQ), but this would also require a lot of effort from both customers and our company to migrate to the new solution.
Is there a way for the BigQuery Data Transfer Service to use AWS IAM roles or temporary credentials to authorise against AWS and transfer data?
No, unfortunately.
The official documentation for the BigQuery Data Transfer Service mentions only AWS access keys throughout:
The access key ID and secret access key are used to access the Amazon S3 data on your behalf. As a best practice, create a unique access key ID and secret access key specifically for Amazon S3 transfers to give minimal access to the BigQuery Data Transfer Service. For information on managing your access keys, see the AWS general reference documentation.
The irony of the Google documentation is that while it refers to best practices and links to the official AWS docs, it doesn't actually follow those best practices and ignores what AWS itself says:
We recommend that you use temporary access keys over long term access keys, as mentioned in the previous section.
Important
Unless there is no other option, we strongly recommend that you don't create long-term access keys for your (root) user. If a malicious user gains access to your (root) user access keys, they can completely take over your account.
You have a few options:
hook into both sides manually, i.e. link up the AWS and GCP SDKs and/or APIs yourself (see the sketch after this list)
find an alternative BigQuery-compatible transfer service that does support temporary credentials
accept the risk of long-term access keys.
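The first option needs only a modest amount of glue code: assume the AWS role yourself with STS, read the object from S3 using the temporary credentials, and load it into BigQuery with a load job. A rough sketch, in which the role ARN, bucket, object key and table ID are all placeholders:

    import io

    import boto3
    from google.cloud import bigquery

    # Short-lived AWS credentials via STS instead of a long-term access key.
    creds = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::123456789012:role/bq-transfer-reader",
        RoleSessionName="s3-to-bq",
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # Pull the export out of S3 and load it straight into BigQuery.
    buf = io.BytesIO()
    s3.download_fileobj("customer-exports", "daily/data.csv", buf)
    buf.seek(0)

    bq = bigquery.Client()
    job = bq.load_table_from_file(
        buf,
        "my-project.customer_dataset.events",
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
        ),
    )
    job.result()  # wait for the load job to finish

This buffers the file in memory; for the 100-500MB exports described above, staging through a temporary file or GCS is the obvious variation.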
In conclusion, Google is at fault here for not following security best practices, and you, as a consumer, have to bear the risk.

Monitoring GCP spend without access to Billing API

I run a small research group at a large university that manages hundreds of GCP accounts. The university acts as the Billing Administrator, and my research group was assigned a GCP "project" for all of our work. However, for privacy reasons, they cannot give me access to the Billing API because this would allow me to see the billing details for other labs.
Because we have trainees in our lab who WILL make mistakes, I would like to set up an automated system that monitors our current GCP bill and (1) sends notifications or (2) terminates all VMs when that bill reaches certain predefined limits. For example, if our monthly budget is $10k, then I would like to receive a notification at $5k, another notification at $10k, and I would like to terminate all VMs at $15k.
My problem is that in order to implement a system like this, I need access to the Billing API. I have already contacted my system administrator and they have said that this is impossible. Instead, they proposed that I write a script that lists all VMs and uses the Cost Calculator to estimate my monthly GCP bill.
However, this seems a little circuitous. When I am using the Google Cloud Console, I can see the total and forecasted costs for my project, so it seems that I should be able to access this information programmatically. However, I cannot find any information on how to do this, since all solutions require me to activate the Billing API. Any ideas?
There is no API to fetch the data you see in the Google Cloud Console. You will need to export the billing data and then process each row of data to generate reports.
There are two options that I can think of:
Option 1) Ask the admin to set up billing data export to BigQuery and grant you permission to query the billing tables. You can then query BigQuery to generate your own cost reports (see the sketch after these options).
Set up Cloud Billing data export to BigQuery
Option 2) Have the admin create a separate billing account for your project and grant you permission on it. A GCP organization can have multiple billing accounts tied to the same payments account. This option supports creating budget alerts.
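For option 1, once the export table exists and you can query it, the month-to-date figure you see in the console can be approximated with a small query against the export table. In the sketch below the table name follows the billing export naming scheme but the billing-account suffix is a placeholder, and credits/discounts are ignored to keep it short:

    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT SUM(cost) AS total
        FROM `my-project.billing.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
        WHERE project.id = @project_id
          AND invoice.month = FORMAT_DATE('%Y%m', CURRENT_DATE())
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("project_id", "STRING", "my-project")
            ]
        ),
    )
    month_to_date = list(job.result())[0].total or 0.0

    # Notify (or stop VMs) at the thresholds from the question.
    for threshold in (5_000, 10_000, 15_000):
        if month_to_date >= threshold:
            print(f"Spend ${month_to_date:,.2f} crossed the ${threshold:,} threshold")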

AWS cloud cost per user

I have an AWS account with multiple users managed through the IAM service.
Each user has an access key and is at liberty to perform various actions such as uploading files.
Is there any way to monitor cloud costs and usage per user?
I use the AWS Cost Explorer service and would like to filter and group costs/usage by user, but I haven't found a graceful way to do it.
One common way to do this is to use Cost Allocation Tags.
You can define these tags and enforce them, e.g. using AWS Config and/or tag policies.
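Once a tag is applied and activated as a cost allocation tag, Cost Explorer can group by it programmatically as well as in the console. A minimal sketch with boto3, where the "Owner" tag key and the dates are placeholders:

    import boto3

    ce = boto3.client("ce")  # Cost Explorer

    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "Owner"}],  # the cost allocation tag key
    )

    for result in response["ResultsByTime"]:
        for group in result["Groups"]:
            tag_value = group["Keys"][0]  # e.g. "Owner$alice"
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            print(tag_value, amount)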
AWS does not track costs by user.
When an IAM User makes a request to AWS to create resources (eg an EC2 instance or an RDS database), the user's permissions are checked to confirm that they are permitted to make that API call. If they are permitted, then the API call is allowed and the resources are created.
Resources created in an AWS Account are owned by the AWS Account, not an individual user. Thus, there is no relationship between resources and the credentials used to create the resource.
The closest link between users and resources would be the audit trail of API calls kept by AWS CloudTrail. CloudTrail stores information about the API call and the user that made the call, but it does not directly link to the resources that were created. This would take some effort to back-trace resources to users.
Typically, cost management is done by tagging resources. Such tags would identify cost centers or project codes that can be used to charge-back the cost of systems. Enforcing tagging is difficult. Only some services allow tagging to be enforced when services are launched. For others, it would be a matter of identifying resources that do not meet tagging requirements. See: Using AWS Config Rules to Manage Resource Tag Compliance | Sumo Logic
You can monitor every IAM user action through CloudTrail logs, so you could imagine a solution built on those logs that attributes the cost of all actions to each IAM user.
I always recommend having an account per user type or subscription type in your system (free or premium, for example). Depending on which user calls your services, you sign that user in with the corresponding account. Then, using AWS Cost Categories, you can see the cost per user type, and knowing the number of users of each type or subscription, you can work out your price per user.

What are the pros/cons of "horizontal" vs "vertical" account structure for managing the AWS cloud service for 1000s of clients

We are developing a custom application, API architecture, related services and processes, based on a LAMP stack and all relevant AWS services: Elastic Beanstalk, EC2, S3, ELB, RDS, API Gateway, Lambda, SNS etc. We would propose to manage the app and all related infrastructure for a flat monthly rate to our client base. We would handle all payment details with Amazon directly for all clients. We are essentially building out a multi-tenant application on AWS. We want to be able to service the AWS infrastructure for potentially 1000s of accounts/clients.
Here is the question: What are the pros/cons of:
Option A) hosting all services in a single AWS account using carefully structured IAM roles, users, and permissions, and co-mingling customer data while ensuring logical and secure separation of customer data within the account?
- VS -
Option B) creating a unique AWS account for each client and managing each account via a local profile. In this approach, all data is fully segregated, but we have to manage common activities (user management, code deployment, operations) across hundreds of discrete accounts. There is a data-security advantage, but is it feasible to manage that many accounts? Are there tools or processes for doing it this way? Each company technician would need a login for every account.
The isolation of option B improves security for each client, as any potential security breach would be limited to a single account. But would code deployments be a nightmare? And what about configuration management?
Is there an account federation service that would help manage option B? Or am I nuts for even considering option B?
Lots to think about, but IMO, in this instance, security trumps all other concerns and that would make me choose option B with the little I know about your setup.
Just think what would happen to your business if the 'master' account were compromised by a hacker (internal or external): your clients would be running for the door.
Having lots of accounts to manage is an obstacle, but if you treat your infrastructure as code, your code deployments and everything else can be automated; with 1000s of accounts you will have no choice but to put those systems in place.
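As a sketch of what that automation can look like: with AWS Organizations, every member account gets (or can be given) a cross-account role that the management account assumes to operate on it. The role name and the example operation below are assumptions about your setup:

    import boto3

    org = boto3.client("organizations")
    sts = boto3.client("sts")

    for page in org.get_paginator("list_accounts").paginate():
        for account in page["Accounts"]:
            if account["Status"] != "ACTIVE":
                continue
            # Assume the cross-account role in each client account.
            creds = sts.assume_role(
                RoleArn=f"arn:aws:iam::{account['Id']}:role/OrganizationAccountAccessRole",
                RoleSessionName="fleet-maintenance",
            )["Credentials"]
            ec2 = boto3.client(
                "ec2",
                region_name="us-east-1",
                aws_access_key_id=creds["AccessKeyId"],
                aws_secret_access_key=creds["SecretAccessKey"],
                aws_session_token=creds["SessionToken"],
            )
            # Example per-account operation: count instances in this client account.
            reservations = ec2.describe_instances()["Reservations"]
            print(account["Id"], sum(len(r["Instances"]) for r in reservations))

The same loop is where deployments and configuration checks would hang, and AWS IAM Identity Center (formerly AWS SSO) addresses the "one login across every account" concern for your technicians.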

AWS storage service for multi-tenant web app

Which services are handy for allocating a specific amount of storage to each tenant, increasing/decreasing capacity, and monitoring free and used capacity?
The most flexible storage option on Amazon Web Services is S3 - Simple Storage Service.
S3 is an object store to which you can upload objects of any type. S3 also supports multipart uploads for big files.
To separate your tenants' data, you could use key prefixes ("folders") in a bucket and add application logic to stop tenants from accessing each other's files.
You can use bucket policies to give different IAM users access to different prefixes; however, it usually wouldn't make sense to create an IAM user for each of your tenants.
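Monitoring used capacity per tenant can then be done at the application level by summing object sizes under the tenant's prefix. A rough sketch with boto3, where the bucket name, prefix layout and quota are assumptions:

    import boto3

    s3 = boto3.client("s3")

    def tenant_usage_bytes(bucket: str, tenant_id: str) -> int:
        """Sum object sizes under one tenant's prefix."""
        total = 0
        for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket=bucket, Prefix=f"tenants/{tenant_id}/"
        ):
            for obj in page.get("Contents", []):
                total += obj["Size"]
        return total

    used = tenant_usage_bytes("my-app-storage", "tenant-42")
    quota = 5 * 1024 ** 3  # e.g. a 5 GiB allocation tracked by the application
    print(f"tenant-42 is using {used} of {quota} bytes")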
I encourage you to read the docs:
http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html