I am interested in setting up a site hosting a cloud-hosted, Jupyter-style notebook to which users can provide their credentials. The credentials are used for authenticated access to a REST API, which can then be called from the notebook to retrieve data that users can perform data-science-type investigations on (e.g. similar to a Kaggle notebook, but for a specific REST API).
Ideally the kernel would be hosted on the client's machine or elsewhere, so as to avoid having to provide server-side resources (storage and compute power) to run users' code.
I've been looking at JupyterLite and Google Colab as possible solutions.
Any suggestions for how to approach this would be much appreciated.
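To make the idea concrete, the kind of notebook cell I have in mind is sketched below. The API base URL, the endpoint, and the token prompt are all placeholders, and in a JupyterLite (Pyodide) kernel the requests call would need to be swapped for the browser-backed HTTP support that environment provides.

    # Hypothetical notebook cell: prompt for a credential and query a REST API.
    # The endpoint and response shape are placeholders, not a real API.
    import getpass
    import requests
    import pandas as pd

    api_base = "https://api.example.com"               # placeholder base URL
    token = getpass.getpass("Paste your API token: ")  # keep the secret out of the saved notebook

    resp = requests.get(
        f"{api_base}/datasets/example",                # placeholder endpoint
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()

    # Assume the API returns a JSON list of records; load it for analysis.
    df = pd.DataFrame(resp.json())
    df.describe()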
We're building machines that include a component which uploads the images captured by the camera to Google Cloud Storage. For this purpose, what I've done is:
Create a service account for each machine.
Create a custom role with the following permissions:
storage.objects.create
storage.buckets.get
storage.objects.get
Apply this role to that service account.
Download the JSON credentials key file and use it with a Python script (in which I specify the bucket name) to upload images to Cloud Storage (see the sketch below).
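For context, the upload step (4) is essentially the following sketch; the key file path, bucket name, and image paths are placeholders.

    # Minimal sketch of the upload step described above; the key file path,
    # bucket name, and image path are placeholders.
    from google.cloud import storage

    client = storage.Client.from_service_account_json("machine-sa-key.json")
    bucket = client.bucket("my-machine-uploads")           # placeholder bucket name

    blob = bucket.blob("captures/image-0001.jpg")          # destination object name
    blob.upload_from_filename("/var/captures/image-0001.jpg")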
Is this way of doing things efficient and secure given that we only ship 2-3 machines each month?
Also, I will have to ship a JSON key file with each machine. If the above method is valid, is this fine, or is there any method to hide this key file?
Your case isn't so simple!
Firstly, if you want to put a distinct service account on each machine, you will eventually hit a limit (you are limited to 100 service accounts per project). And using the same service account, or the same key, on every machine is too dangerous.
Secondly, your use case sounds like an IoT use case, where you have a lot of edge devices communicating with the cloud. But Pub/Sub messages are limited to 10 MB max, and the IoT Core solution doesn't fit your case.
The two remaining solutions are based on the same principle:
Make an endpoint public (Cloud Run, Cloud Functions, App Engine or whatever you want)
Call this endpoint from your machine, with its own token (i.e. a string, encrypted or not)
Check the token; if it is valid, do one of the following (these are the two alternatives):
Create an access token (a short-lived token) on a service account with the minimal permissions for the machine's usage, and send it back to the machine. The machine uses it to call the Google Cloud APIs, such as the Cloud Storage API. The advantage of this solution is that you will be able to use the access token to reach other GCP APIs in the future, if your use case and machine updates require them.
Create a signed URL and send it back to the machine (sketched below). The machine then uploads the file to this URL. The advantage is the strict limitation to Cloud Storage, with no access to any other GCP service.
The main issue with these two solutions is that they require a public endpoint, and you are exposed to attacks on it. You can protect it behind a load balancer and mitigate attacks with Cloud Armor. Also consider limiting the scalability of your public endpoint, to prevent useless expenses in case of an attack.
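To make the second alternative concrete, the public endpoint could look roughly like the sketch below. This is only an illustration: the token store, bucket name, signing key file, and handler shape (a Cloud Functions style HTTP handler) are assumptions, not a prescribed implementation.

    # Sketch of a token-checking endpoint that returns a V4 signed upload URL.
    # MACHINE_TOKENS, the bucket name, and the signing credentials are placeholders.
    from datetime import timedelta
    from google.cloud import storage

    MACHINE_TOKENS = {"machine-42": "some-long-random-string"}   # placeholder token store

    # Signing a URL requires credentials that can sign (e.g. a service account
    # key, or the IAM signBlob API); plain metadata-server credentials may not suffice.
    client = storage.Client.from_service_account_json("signer-sa-key.json")
    bucket = client.bucket("my-machine-uploads")

    def get_upload_url(request):
        """HTTP handler: check the machine token, return a short-lived signed URL."""
        machine_id = request.headers.get("X-Machine-Id")
        token = request.headers.get("X-Machine-Token")
        if MACHINE_TOKENS.get(machine_id) != token:
            return ("Forbidden", 403)

        blob = bucket.blob(f"{machine_id}/{request.args.get('filename', 'image.jpg')}")
        url = blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=15),
            method="PUT",
            content_type="application/octet-stream",
        )
        return {"upload_url": url}

The machine then simply performs an HTTP PUT of the file to the returned URL before it expires; no Google credentials ever leave the cloud.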
Let's say you have a web application that accesses (via API) a user's Google Drive files at a set time each week and performs some kind of task on them.
The user will grant authorization to the web application via its website (which is hosted on App Engine). However, the weekly, scheduled queries to Google Drive will be carried out by Cloud Scheduler.
Is it possible, then, for Cloud Scheduler to use the same credentials (access and refresh tokens) obtained by the web application in the first instance?
Can the credentials, for example, be stored in a Cloud Storage bucket that is accessible to both the application and Cloud Scheduler?
Or is there another means of accomplishing this?
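To illustrate what I mean by reusing the credentials, here is a rough sketch of what the scheduled job would do with a stored refresh token. The client ID/secret, the stored token, and the Drive call are all placeholders; where the token should live is exactly what I am unsure about.

    # Sketch: rebuild user credentials from a stored refresh token and call Drive.
    # The client ID/secret, refresh token source, and file query are placeholders.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build

    creds = Credentials(
        token=None,                                   # will be refreshed below
        refresh_token="<stored-refresh-token>",       # loaded from wherever it was persisted
        token_uri="https://oauth2.googleapis.com/token",
        client_id="<web-app-oauth-client-id>",
        client_secret="<web-app-oauth-client-secret>",
        scopes=["https://www.googleapis.com/auth/drive.readonly"],
    )
    creds.refresh(Request())                          # exchange the refresh token for an access token

    drive = build("drive", "v3", credentials=creds)
    files = drive.files().list(pageSize=10, fields="files(id, name)").execute()
    print(files.get("files", []))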
I am new to Google Cloud. I am trying to access Google Cloud Storage buckets to upload files. I use a Storage client object to access the bucket programmatically in Python. I am able to authenticate the storage object with 'key.json', but I am unsure how the application will access the 'key.json' file securely when it runs in the cloud. Also, is there a way to authenticate the storage object using an access token in Python?
Thanks in advance!
But I am unsure how the application will access the 'key.json' file securely when it runs in the cloud.
Review the details that I wrote below. Once you have selected your environment, you might not need to use a service account JSON file at all, because the metadata server is available to provide your code with credentials. This is the best case and is secure. On my personal website, I have written many articles that show how to create, manage, and store Google credentials and secrets.
Also, is there a way to authenticate the storage object using an access token in Python?
All access is via an OAuth Access Token. The following link shows details of using the metadata server, which I cover in more detail below.
Authenticating applications directly with access tokens
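For example, if you already hold an access token from somewhere, the Python client library will accept it directly. A minimal sketch, with placeholder values for the token, project, and bucket:

    # Sketch: build a Cloud Storage client from a bare OAuth access token.
    # The token value, project ID, and bucket name are placeholders.
    from google.cloud import storage
    from google.oauth2.credentials import Credentials

    access_token = "<oauth-access-token>"          # obtained elsewhere (metadata server, token exchange, ...)
    creds = Credentials(token=access_token)

    client = storage.Client(project="my-project", credentials=creds)
    for blob in client.list_blobs("my-bucket"):    # placeholder bucket name
        print(blob.name)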
There are three items to consider:
My code is not running in Google Cloud
My code is running in Google Cloud on a "compute" type of service with access to the metadata server
My code is running in Google Cloud without access to the metadata server.
1) My code is not running in Google Cloud
This means your code is running on your desktop or even in another cloud such as AWS. You are responsible for providing the method of authorization. There are two primary methods: 1) Service Account JSON key file; 2) Google OAuth User Authorization.
Service Account JSON key file
This is what you are using now with key.json. The credentials are stored in the file and are used to generate an OAuth Access Token. You must protect that file, as it contains your Google Cloud secrets. You can specify key.json directly in your code or via the environment variable GOOGLE_APPLICATION_CREDENTIALS.
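A minimal sketch of both options, with a placeholder key path and bucket listing:

    # Sketch: authenticate Cloud Storage with a service account JSON key file.
    # The key path is a placeholder.
    from google.cloud import storage

    # Option A: reference the key file explicitly in code.
    client = storage.Client.from_service_account_json("key.json")

    # Option B: rely on the environment variable instead:
    #   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
    # client = storage.Client()

    buckets = list(client.list_buckets())
    print([b.name for b in buckets])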
Google OAuth User Authorization
This method requires the user to log in to Google Accounts requesting an OAuth scope for Cloud Storage. The end result is an OAuth Access Token (just like a Service Account) that authorizes access to Cloud Storage.
Getting Started with Authentication
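A rough sketch of that flow with the Python client libraries (google-auth-oauthlib installed; the client secrets file, scope, and project ID are placeholders):

    # Sketch: interactive user login that yields an OAuth access token with a
    # Cloud Storage scope. client_secrets.json and the project ID are placeholders.
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.cloud import storage

    SCOPES = ["https://www.googleapis.com/auth/devstorage.read_write"]

    flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", SCOPES)
    creds = flow.run_local_server(port=0)          # opens a browser for Google sign-in

    client = storage.Client(project="my-project", credentials=creds)
    print([b.name for b in client.list_buckets()])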
2) My code is running in Google Cloud on a "compute" type of service with access to the metadata server
Notice the words "metadata server". For Google Cloud compute services, Google provides a metadata server that supplies credentials to applications running on that compute service (Compute Engine, Cloud Functions, Cloud Run, etc.). If you use the Google SDK client libraries in your code, the libraries will automatically select the credentials for you. The metadata server can be effectively disabled (access denied through role/scope removal), so you need to evaluate what you are running on.
Storing and retrieving instance metadata
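On such a service the code reduces to the sketch below; no key file is involved, and the bucket and object names are placeholders.

    # Sketch: on Compute Engine / Cloud Run / Cloud Functions, no key file is
    # needed; the client library obtains credentials from the metadata server.
    from google.cloud import storage

    client = storage.Client()                            # picks up the attached service account
    blob = client.bucket("my-bucket").blob("hello.txt")  # placeholder bucket/object
    blob.upload_from_string("written with metadata-server credentials")

You can also query the metadata server directly (http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token with the Metadata-Flavor: Google header) to see the raw access token the library is using.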
3) My code is running in Google Cloud without access to the metadata server.
This is a similar scenario to #1. However, now you are limited to using only a service account, unless this is a web-server type of service that can present the Google Accounts authorization flow to the user.
One thing I dislike about Google Cloud Platform (GCP) is that its security model around roles/service accounts feels less baked in.
Running locally on my laptop, I need to use the service account's key specified in a JSON file. In AWS, I can just assume a role I have been granted access to assume (without needing to carry around a private key). Is there an analogue to this with GCP?
I am going to try and answer this. I have the AWS Security Specialty (8 AWS certifications) and I know AWS very well. I have been investing a lot of time this year mastering Google Cloud with a focus on authorization and security. I am also an MVP Security for Alibaba Cloud.
AWS has a focus on security and security features that I both admire and appreciate. However, unless you really spend the time to understand all the little details, it is easy to implement poor/broken security in AWS. I can also say the same about Google security. Google has excellent security built into Google Cloud Platform. Google just does it differently and also requires a lot of time to understand all the little features / details.
In AWS, you cannot just assume a role. You need an AWS Access Key first or be authenticated via a service role. Then you can call STS to assume a role. Both AWS and Google make this easy with AWS Access Keys / Google Service Accounts. Whereas AWS uses roles, Google uses roles/scopes. The end result is good in either platform.
Google authentication is based upon OAuth 2.0. AWS authentication is based upon Access Key / Secret Key. Both have their strengths and weaknesses. Both can be either easy to implement (if you understand them well) or a pain to get correct.
The major cloud providers (AWS, Azure, Alibaba, Google, IBM) are moving very fast with a constant stream of new features and services. Each one has strengths and weaknesses. Today, there is no platform that offers all the features of the others. AWS today is ahead both in features and market share. Google has a vast number of services that outnumber those of AWS, and I don't know why this is overlooked. The other platforms are catching up quickly, and today you can implement enterprise-class solutions and security with any of the cloud platforms.
Today, we would not choose only Microsoft or only open source for our application and server infrastructure. In 2019, we will not be choosing only AWS or only Google, etc. for our cloud infrastructure. We will mix and match the best services from each platform for our needs.
As described in the Getting Started with Authentication [1] page, for service accounts a key file is needed in order to authenticate.
From [2]: You can authenticate to a Google Cloud Platform (GCP) API using service accounts or user accounts, and for APIs that don't require authentication, you can use API keys.
Service and user accounts need the key file to authenticate. Taking this information into account, there is no way to authenticate locally without using a key file.
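In practice, "using the key file locally" usually means letting Application Default Credentials find it, roughly as in this sketch (the key path and bucket listing are placeholders):

    # Sketch: local authentication via Application Default Credentials.
    # Assumes:  export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
    import google.auth
    from google.cloud import storage

    creds, project = google.auth.default()           # resolves to the key file locally
    print("Authenticated against project:", project)

    client = storage.Client(credentials=creds, project=project)
    print([b.name for b in client.list_buckets()])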
Links:
[1] https://cloud.google.com/docs/authentication/getting-started
[2] https://cloud.google.com/docs/authentication/
I wish to use the Google Cloud IAM (Identity and Access Management) system for a new Google App Engine project. (Although it's not necessary to know, the front end will be AngularJS and the backend Java.) However, once the user logs into my app using his or her browser and is then authenticated via Google Cloud IAM, I need to know whether it's possible to pass this "authenticated credential" to a Google Compute Engine VM. If so, how? The reason I need to pass this "authenticated credential" is that I wish to use gsutil (or similar) functionality on a Compute Engine VM, and I want to use the same username to ensure that the security profile carries through properly. (Specifically, I intend to use gsutil to communicate with Cloud Storage, but from a Windows Server Compute Engine VM.)
I've been reading about Compute Engine VMs and Google Cloud IAM, and they all talk about being able to pass a "service account" token, but there is no reference to how to pass an "authenticated user" credential so that the gsutil command accessing Cloud Storage on the Windows VM could use this authenticated user. (I want to avoid making the user authenticate both for my application and for the gsutil program running within the Compute Engine Windows VM.)
Is this possible? If not, any suggestions/workarounds?
One idea I had, though ugly, is as follows: every time a Windows Compute Engine VM is requested, we would dynamically create a new Google service account with the same permissions as the logged-in, IAM-authenticated user. Then we would use this service account within the Windows Compute VM to contact Cloud Storage. This solves the problem of ensuring that the same privileges are carried through, though it creates a slightly different problem in that all the access logs generated for the file will show this dummy service account instead of the real user's name.