Cut a Cloud Run service from running for safety reasons

Let's assume I run a Google Cloud Run service.
Let's also assume someone really wants to harm me and finds out all the API routes, or is able to send a lot of POST requests by spamming the site.
There is an email notification that pops up at certain limits you set up beforehand.
Is there also a way to automatically cut off the Cloud Run service, or take it temporarily offline? I couldn't find any good resource or solution for this.

There are several solutions to remove a Cloud Run service from traffic, in addition to the authentication solution proposed by Dondi:
Delete the Cloud Run service. It might seem overkill, but because the service is stateless, you will lose nothing (except the revision history).
If you have your Cloud Run service behind a Load Balancer:
You can remove the serverless NEG that routes the traffic to it
You can add a Cloud Armor policy that filters the originating IP to exclude it from the traffic
You can set the ingress to internal, or internal and Cloud Load Balancing.
You can deploy a dummy revision (a hello world container, for example) and route 100% of the traffic to it (traffic splitting feature); see the sketch below for these last two options.
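As a rough illustration of the last two options, here is a minimal Python sketch that shells out to the gcloud CLI; the service name, region and revision name are placeholders for your own values.

```python
# Minimal sketch: take a Cloud Run service out of public traffic with gcloud.
# "my-service", "europe-west1" and the revision name are placeholders.
import subprocess

SERVICE = "my-service"
REGION = "europe-west1"

def gcloud(*args: str) -> None:
    subprocess.run(["gcloud", *args], check=True)

# Option: stop accepting requests from the public internet
gcloud("run", "services", "update", SERVICE,
       "--region", REGION, "--ingress", "internal")

# Option: route 100% of the traffic to a previously deployed dummy revision
gcloud("run", "services", "update-traffic", SERVICE,
       "--region", REGION, "--to-revisions", "my-service-dummy-00001=100")
```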

You can't really "turn off" a Cloud Run service, as it's fully managed by Google. A Cloud Run service automatically scales down to zero instances if there are no requests, but it will continue to serve traffic.
To emulate what you want to do, make sure that your service requires authentication, then revoke access for the offending user (or all users). As mentioned in the docs:
Cloud Run (fully managed) does not offer a direct way to make a service stop serving traffic, but you can achieve a similar result by revoking the permission to invoke the service to identities that are invoking the service. Notably, if your service is "public", remove allUsers from the Cloud Run Invoker role (roles/run.invoker).
Update: Access to a resource is managed through an IAM policy. In order to control access programmatically, you have to get the IAM policy first, then revoke the role from a user or a service account. Here's the documentation that gives an overview.
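For example, here is a hedged sketch that shells out to gcloud to remove public access; the service name and region are placeholders, and the command performs the get-modify-set of the IAM policy for you.

```python
# Remove allUsers from the Cloud Run Invoker role so the service stops serving
# unauthenticated traffic. Service name and region are placeholders.
import subprocess

subprocess.run([
    "gcloud", "run", "services", "remove-iam-policy-binding", "my-service",
    "--region", "europe-west1",
    "--member", "allUsers",
    "--role", "roles/run.invoker",
], check=True)
```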

Related

Require authorization to access ec2 port

Not sure what the right terms were to start this question, but basically I have a downloaded UI tool that runs on 0.0.0.0:5000 on my AWS EC2 instance, and my EC2 instance has a public IP address associated with it. So right now everyone in the world can access this tool by going to {ec2_public_ip}:5000.
I want to run some kind of script or add security group inbound rules that will require authorization before letting someone view the page. The application running on port 5000 is a downloaded tool, not my own code, so it wouldn't be possible to add authentication to the tool itself (it's KafkaMagic, FYI).
The one security measure I was able to apply so far was to only allow specific IPs to make TCP connections to port 5000, which is a good start but not enough, as there is no guarantee someone on that IP is authorized to view the tool. Is it possible to require an IAM role to access the IP? I do have a separate API with a login endpoint that could be useful if it were possible to run a script before forwarding the request; is that a possible/viable solution? Not sure what best practice is in this case; there might be a third option I have not considered.
ADD-ON EDIT
Additionally, I am using EC2 Instance Connect, and if it is possible to require an active SSH connection before accessing the EC2 instance's IP, that would be a good solution as well.
EDIT FOLLOWING INITIAL DISCUSSION
Another approach that would work for me is if I had a small app running on a different port that could leverage our existing UI to log a user in. If a user authenticated through this app, would it be possible to then display the UI from port 5000 to them? In this case KafkaMagic would be on a private IP and there would be a different IP that the user would go through before seeing the tool.
In short, the answer is no. If you want authorization (I think you mean authentication) to access an application running on the server, you need tools that run on the server. If your tool offers such a capability, use it. It looks like Kafka Magic has such a capability: https://www.kafkamagic.com/faq/#how-to-authenticate-kafka-client-by-consumer-group-id
But you can't use external tools, like AWS, to perform such authentication. A security group is like a firewall: it either allows or blocks access to the port.
You can easily create a script that uses the AWS SDK, or even just executes the AWS CLI, to view/add/remove an IP address in a security group. How you execute that script depends on your audience and what language you use.
For a small number of trusted users, you could issue them an IAM user and API key with a policy that allows them to manage a single dynamic security group. Then provide a script they can run (or a shortcut to click) that gets their current gateway IP and adds/removes it from the security group.
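Here is a minimal sketch of such a script using boto3; the security group ID is a placeholder, and it assumes checkip.amazonaws.com is reachable to discover the caller's public IP.

```python
# Self-service access script: discover the caller's public IP and allow it on
# port 5000 of the dedicated, dynamically managed security group.
import urllib.request
import boto3

SECURITY_GROUP_ID = "sg-0123456789abcdef0"  # placeholder: the dynamic SG
PORT = 5000

my_ip = urllib.request.urlopen("https://checkip.amazonaws.com").read().decode().strip()

ec2 = boto3.client("ec2")
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": PORT,
        "ToPort": PORT,
        "IpRanges": [{"CidrIp": f"{my_ip}/32", "Description": "temporary user access"}],
    }],
)
```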
If you want to allow users via a website, a simple script behind some existing authentication is also possible with the SDK/CLI approach (depending on the server-side scripting available).
If users have SSH access, you could authorise the IP by calling the script/CLI from .bashrc or some other startup script.
In any case, the IAM policy that grants permissions to modify the SG should be as restrictive as possible (basically, don't use any *'s in the policy). You can add additional conditions like the source IP/range (i.e. in your VPC) or that MFA must be active for the user, etc., to make this more secure (this can be handled in either case via the script). If you're running on EC2, I'd suggest looking at IAM instance roles as an easy way to give your server access to credentials for your script (but you can create a user and deploy the key/secret to the server and manage it manually if you wanted).
I would also suggest creating a dedicated security group for dynamically managed access, alongside the existing SGs required for internal operation, for safety. It would be a good idea to implement a Lambda function on a schedule to flush the dynamic SG (even if you script de-authorising an IP, it might not happen, so it's good to clean up safely/automatically); see the sketch below.
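A possible shape for that scheduled clean-up, as a Lambda handler written with boto3; the group ID is a placeholder and it simply revokes whatever ingress rules are currently on the dynamic security group.

```python
# Scheduled Lambda: flush all ingress rules from the dynamically managed SG.
import boto3

SECURITY_GROUP_ID = "sg-0123456789abcdef0"  # placeholder

def handler(event, context):
    ec2 = boto3.client("ec2")
    group = ec2.describe_security_groups(GroupIds=[SECURITY_GROUP_ID])["SecurityGroups"][0]
    if group["IpPermissions"]:
        ec2.revoke_security_group_ingress(
            GroupId=SECURITY_GROUP_ID,
            IpPermissions=group["IpPermissions"],
        )
    return {"revoked_rule_sets": len(group["IpPermissions"])}
```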

GCP default service accounts best security practices

So, we have a "Compute Engine default service account", and everything is clear with it:
it's a legacy account with excessive permissions
it used to be limited by the "scopes" assigned to each GCE instance or instance group
it's recommended to delete this account and use a custom service account for each service, following the least-privilege principle.
The second "default service account" mentioned in the docs is the "App Engine default service account". Presumably it's assigned to the App Engine instances and it's also a legacy thing that needs to be treated similarly to the Compute Engine default service account. Right?
And what about "Google APIs Service Agent"? It has the "Editor" role. As far as I understand, this account is used internally by GCP and is not accessed by any custom resources I create as a user. Does it mean that there is no reason to reduce its permissions for the sake of complying with the best security practices?
You don't have to delete your default service account; however, at some point it's best to create accounts that have the minimum permissions required for the job and refine the permissions to suit your needs instead of using the default ones.
You have full control over this account, so you can change its permissions at any moment or even delete it:
Google creates the Compute Engine default service account and adds it to your project automatically but you have full control over the account.
The Compute Engine default service account is created with the IAM basic Editor role, but you can modify your service account's roles to control the service account's access to Google APIs.
You can disable or delete this service account from your project, but doing so might cause any applications that depend on the service account's credentials to fail
If something stops working, you can recover the account for up to 90 days.
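If you want to measure the impact before deleting anything, here is a hedged sketch of disabling (rather than deleting) the account via gcloud from Python; the project number is a placeholder.

```python
# Disable the Compute Engine default service account (reversible with "enable").
import subprocess

SA = "PROJECT_NUMBER-compute@developer.gserviceaccount.com"  # placeholder

subprocess.run(["gcloud", "iam", "service-accounts", "disable", SA], check=True)

# If you deleted it instead, undelete takes the account's numeric unique ID:
# subprocess.run(["gcloud", "iam", "service-accounts", "undelete", "ACCOUNT_UNIQUE_ID"], check=True)
```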
It's also advisable not to use service accounts during development at all, since this may pose a security risk in the future.
As for the Google APIs Service Agent, the documentation says:
This service account is designed specifically to run internal Google processes on your behalf. The account is owned by Google and is not listed in the Service Accounts section of Cloud Console
Additionally:
Certain resources rely on this service account and the default editor permissions granted to the service account. For example, managed instance groups and autoscaling uses the credentials of this account to create, delete, and manage instances. If you revoke permissions to the service account, or modify the permissions in such a way that it does not grant permissions to create instances, this will cause managed instance groups and autoscaling to stop working.
For these reasons, you should not modify this service account's roles unless a role recommendation explicitly suggests that you modify them.
Having said that, we can conclude that removing either the default service account or the Google APIs Service Agent is risky and requires a lot of preparation (especially the latter).
Have a look at the best practices documentation describing what is and isn't recommended when managing service accounts.
You can also have a look at securing them against any exploitation, and at changing the service account and access scopes for an instance.
When you talk about security, you especially talk about risk. So, what are the risks with the default service accounts?
If you use them on GCE or Cloud Run (the Compute Engine default service account), you have excessive permissions. If your environment is secured, the risk is low (especially on Cloud Run). On GCE the risk is higher, because you have to keep the VM up to date and control the firewall rules that allow access to your VM.
Note: by default, Google Cloud creates a VPC with firewall rules open to 0.0.0.0/0 on port 22 (SSH), RDP and ICMP. That's also a security issue to fix by default.
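If you keep the default VPC, here is a small sketch of removing those open rules via gcloud; the rule names below are the ones Google creates in the default network, so verify them in your project first.

```python
# Delete the wide-open default firewall rules from the "default" VPC network.
import subprocess

for rule in ("default-allow-ssh", "default-allow-rdp", "default-allow-icmp"):
    subprocess.run(
        ["gcloud", "compute", "firewall-rules", "delete", rule, "--quiet"],
        check=True,
    )
```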
The App Engine default service account is used by App Engine and by Cloud Functions by default. As with Cloud Run, the risk can be considered low.
Another important aspect is the ability to generate service account key files for those default service accounts. A service account key file is a simple JSON file with a private key in it. This time the risk is very high, because few developers take real care of the security of that file.
Note: in a previous company, the only security issues that we had came from those files, especially with service accounts that had the Editor role.
Most of the time, the user doesn't need a service account key file to develop (I wrote a bunch of articles on that on Medium).
There are two ways to mitigate those risks:
Perform IaC (infrastructure as code, with a product like Terraform) to create and deploy your projects and to enforce all the security best practices that you have defined in your company (VPC without default firewall rules, no Editor role on service accounts, ...).
Use organisation policies, especially "Disable service account key creation" to prevent service account key creation, and "Disable Automatic IAM Grants for Default Service Accounts" to prevent the Editor role from being granted to the default service accounts (see the sketch below).
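As a hedged example, those two constraints can be enforced at the organisation level with gcloud; the organisation ID below is a placeholder, and the constraint IDs correspond to the two policies named above.

```python
# Enforce the two organisation policy constraints mentioned above.
import subprocess

ORG_ID = "123456789012"  # placeholder: your organisation ID

for constraint in (
    "iam.disableServiceAccountKeyCreation",
    "iam.automaticIamGrantsForDefaultServiceAccounts",
):
    subprocess.run(
        ["gcloud", "resource-manager", "org-policies", "enable-enforce",
         constraint, "--organization", ORG_ID],
        check=True,
    )
```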
Deletion isn't the solution; a good knowledge of the risks, a good security culture in the team, and some organisation policies are the key.

How can I give my GKE deployed application access to Google Pub/Sub?

I deployed a Kotlin backend application that uses Google Cloud Pub/Sub. I recently deployed that application with Cloud Run and it ran fine, having full access to Pub/Sub.
Now, for various reasons, I have to deploy the application with GKE. However, the access to Pub/Sub no longer seems to work.
I checked which service account my GKE cluster is using and figured out it was the default one. Therefore I granted the Pub/Sub Editor role to that service account.
I thought that with this, everything should work.
But still I see this error message in my logs:
com.google.api.gax.rpc.PermissionDeniedException: io.grpc.StatusRuntimeException: PERMISSION_DENIED: User not authorized to perform this action.
Any ideas what I have missed?
That could be one of two things:
Either your pod uses Workload Identity and doesn't use the default service account (which has the Editor role, something to avoid by the way...), and so the service account that you use doesn't have the Pub/Sub permissions.
Or, because you use the default Compute Engine service account (with the Editor role, something to avoid, I repeat myself, but it's really bad!), the node pool scopes are set to their defaults (if you haven't overridden that parameter) and you can't access the Pub/Sub API because of the credential scopes.
The best solution is to recreate your node pool with a custom service account. That way you can enforce least privilege at the node pool level, and you avoid the legacy Compute Engine scope definitions and limitations. If you use Workload Identity, you can go a level further in terms of security and enforce least privilege at the pod level.
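A hedged sketch of both options using gcloud from Python; the project, service account, cluster, region, namespace and Kubernetes service account names are all placeholders.

```python
# Option 1: node pool running as a custom, least-privilege service account.
# Option 2: Workload Identity binding so a Kubernetes SA can impersonate a Google SA.
import subprocess

PROJECT = "my-project"                                 # placeholder
GSA = f"pubsub-app@{PROJECT}.iam.gserviceaccount.com"  # placeholder

def gcloud(*args: str) -> None:
    subprocess.run(["gcloud", *args], check=True)

# Grant the Google service account the Pub/Sub permissions it needs
gcloud("projects", "add-iam-policy-binding", PROJECT,
       "--member", f"serviceAccount:{GSA}", "--role", "roles/pubsub.editor")

# Option 1: create a node pool that runs as that service account
gcloud("container", "node-pools", "create", "app-pool",
       "--cluster", "my-cluster", "--region", "europe-west1",
       "--service-account", GSA)

# Option 2 (Workload Identity): let the Kubernetes SA act as the Google SA
gcloud("iam", "service-accounts", "add-iam-policy-binding", GSA,
       "--role", "roles/iam.workloadIdentityUser",
       "--member", f"serviceAccount:{PROJECT}.svc.id.goog[default/my-ksa]")
```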

Why does enabling the Cloud Run API create so many service accounts? Why do they have so many privileges?

Enabling the Cloud Run API (dev console→Cloud Run→Enable) creates five service accounts. I want to understand their purpose. I need to know if it's my responsibility to configure them for least privileged access.
The Default compute service account has the Editor role. This is the Cloud Run runtime service account. Its purpose is clear, and I know it's my responsibility to configure it for least privileged access.
The App Engine default service account has the Editor role. This matches the description of the Cloud Functions runtime service account. Its purpose is unclear, given the existence of the Cloud Run runtime service account. I don't know if it's my responsibility to configure it for least privileged access.
The Google Container Registry Service Agent (Editor role) and Google Cloud Run Service Agent (Cloud Run Service Agent role) are both Google-managed service accounts "used to access the APIs of Google Cloud Platform services":
I'd like to see Google-managed service accounts configured for least privileged access. I'd also like to be able to filter the Google-managed service accounts in the IAM section of the GCP console. That said, I know I should ignore them.
The unnamed {project-number}@cloudbuild.gserviceaccount.com service account has the Cloud Build Service Account role. This service account "can perform builds" but does not appear in the Cloud Run Building Containers docs. It's used for Continuous Deployment, but can't do that without additional user configuration. It's not a Google-managed service account, but it does not appear in the Service Accounts section of the GCP console like the runtime service accounts do. Its purpose is unclear. I don't know if it's my responsibility to configure it for least privileged access.
A Cloud Run PM responded, addressing each account in turn:
On the Compute Engine default service account: yep, exactly right.
On the App Engine default service account: we should probably not create this if you're only using Run (and likely not enable the App Engine APIs, which is what created it). During Alpha, this was the runtime service account, and it's likely that it wasn't cleaned up. I have a feeling it's stuck as Editor because it accesses Cloud Storage, which is oddly broken for "non-Editor access" (I'm still trying to track down the exact issue, but it looks like there's a connection to the legacy Editor role that requires it).
On the Google Cloud Run Service Agent: it is already "least privileged" from its perspective, as it only has the permissions to do the things that Run needs to do in order to set up resources on your behalf.
On the Cloud Build service account: this is the runtime service account equivalent for Cloud Build, and it falls into the same category as the first two. If you need a build to deploy to Cloud Run, you have to grant this account something like Cloud Run Deployer (plus the additional step of allowing the build service account to act as your runtime service account, to prevent [or at least acknowledge] privilege escalation).
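As a rough illustration of those two grants: the project ID, project number and runtime service account below are placeholders, and roles/run.admin plus roles/iam.serviceAccountUser are the roles commonly used for this setup (adjust to your own needs).

```python
# Grants commonly used for Cloud Build -> Cloud Run continuous deployment.
import subprocess

PROJECT_ID = "my-project"
CLOUD_BUILD_SA = "serviceAccount:123456789012@cloudbuild.gserviceaccount.com"
RUNTIME_SA = "my-runtime-sa@my-project.iam.gserviceaccount.com"

# Let Cloud Build deploy Cloud Run services
subprocess.run([
    "gcloud", "projects", "add-iam-policy-binding", PROJECT_ID,
    "--member", CLOUD_BUILD_SA, "--role", "roles/run.admin",
], check=True)

# Let the build service account act as the runtime service account
subprocess.run([
    "gcloud", "iam", "service-accounts", "add-iam-policy-binding", RUNTIME_SA,
    "--member", CLOUD_BUILD_SA, "--role", "roles/iam.serviceAccountUser",
], check=True)
```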
I too want better filtering of "Google created" and "Google managed" and have been talking with the Cloud IAM team about this.

Site to Site connection between SonicWall and AWS - IAM Policy

I'm trying to set up a site-to-site connection between our on-premise server and our cloud infrastructure. On our premises we have a SonicWall firewall installed, and since SonicOS 6.5.1.0 it's easy to enter an AWS access key and AWS secret key and let the software configure everything via the SDK.
The problem is that the tutorial on how to configure the firewall (p. 8) says:
The security policy used, either for a group to which the user belongs or attached to the user directly, must include the following permissions:
• AmazonEC2FullAccess – For AWS Objects and AWS VPN
• CloudWatchLogsFullAccess – For AWS Logs
Since it's not ideal to give anything full access to Amazon EC2, do you know which permissions SonicWall actually needs, so I can disable everything else and follow the principle of least privilege?
Without looking into the code for SonicWall itself, it is not going to be easy to know exactly which API calls it's going to make to EC2. If you are prepared to at least temporarily grant full EC2 access, you could use AWS CloudTrail to monitor exactly which API calls are being made by the IAM user associated with your on-premises server, and then update your specific policy to match those calls.
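A hedged sketch of that CloudTrail query with boto3; the IAM user name is a placeholder for whatever user the firewall authenticates as.

```python
# List the distinct API calls recorded by CloudTrail for the IAM user the
# firewall authenticates as ("sonicwall-user" is a placeholder).
import boto3

cloudtrail = boto3.client("cloudtrail")
calls = set()

for page in cloudtrail.get_paginator("lookup_events").paginate(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "sonicwall-user"}]
):
    for event in page["Events"]:
        calls.add((event["EventSource"], event["EventName"]))

for source, name in sorted(calls):
    print(source, name)
```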
Alternatively, start with the full access IAM policy template and go through and deny any calls you think are completely unrelated to SonicWall's functionality.
If you trust SonicWall, then probably the easiest thing to do is just allow the full EC2 access it claims is required (or start there and gradually remove permissions until something breaks!).