I have a project on GCP which contains a compute node, DNS, router, load balancer, and the Dialogflow API. The connection from the DF fulfillment (webhook) to the compute node goes through the DNS and the load balancer, and it works.
I detected some random and infrequent latency problems between the DF fulfillment (webhook) and the node, and I suppose that if I could connect the webhook directly I would reduce the timings.
I want to connect the DF fulfillment (webhook) directly to the internal IP of the node, but it does not seem possible. The DF API and the compute node are in the same GCP project, so why can't I connect the fulfillment to the local IP of the node?
So, the Dialogflow webhook service has some requirements as below:
It must handle HTTPS requests (I think with Compute Engine you can implement this using Ngrok)
The URL for requests must be publicly accessible
...
and a few more.
Although your logic that an internal IP could reduce time is correct, the problem is that it is not publicly accessible; I guess that is why it is not working. Additionally, DF's waiting time is 5 seconds, and that should be enough unless you are doing some complicated DB queries. Even in that case, I have seen people discussing workarounds to extend the waiting time.
Here's the link for more details
I need to setup a shared processing service that uses a load balancer and several EC2 instances to process incoming requests using a custom .NET application. My issue is that I need to be able to bill based on usage. Only white-listed IPs will be able to call the application, but each IP only gets a set number of calls before each call is a billable event.
Since the AWS documentation for the ELB states "We recommend that you use access logs to understand the nature of the requests, not as a complete accounting of all requests", I do not feel the Access Logs on the ELB is what I'm looking for.
The question I have is how to best manage this so that the accounting team has an easy report each month that says how many calls each client made.
Actually, you can use access logs: since they are written to S3, you can query each IP with Athena using standard SQL, analyze your logs, and extract reports.
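For example, here's how running such a query might look from Python with boto3. This is a sketch: it assumes you've already created an Athena table over your ELB access logs as shown in the second reference below, and the table, column, database, and output-bucket names are placeholders to adapt to your setup.

```python
import boto3

athena = boto3.client("athena")

# Calls per client IP for one billing month. Column names follow the
# classic-ELB table from the linked knowledge-center article.
query = """
SELECT request_ip, COUNT(*) AS call_count
FROM elb_logs
WHERE request_timestamp BETWEEN '2019-08-01' AND '2019-08-31'
GROUP BY request_ip
ORDER BY call_count DESC;
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)
```

The results land as a CSV in the output bucket, which is easy to hand to an accounting team each month.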
References:
https://docs.aws.amazon.com/athena/latest/ug/what-is.html
https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/
Is it possible to get the URL of the exact instance of a Cloud Run process?
I want to use global state in my HTTP server, so that a user can make a second HTTP request with a URL returned from the first request. Both requests should hit the same instance.
Because the second request is immediately after the first, the instance should still be alive.
I don't think what you are asking for can be done. I looked at the following documentation on WebSockets and Cloud Run [link] and it states there:
On Cloud Run, session affinity isn't available, so WebSockets requests can potentially end up at different container instances, due to built-in load balancing.
What this tells me is that there is a front-end load balancer that is the public endpoint for a Cloud Run request, and the load balancer determines where to send the request, i.e. which back-end container. I am sensing that these containers literally have no addressable (direct) IP address or other endpoint that you can leverage. There simply is no way to specify in a subsequent HTTP request that it should go back to the same server instance as a previous request.
Is this a limitation? I'd be tempted to say no. The contract for Cloud Run is that it will service a request and scale as needed to service those requests ... but nowhere in the contract does it make any claims about the state of the server from request to request. One should assume that the container is virgin when reached for every request.
So how do you handle global state? You don't maintain it in your container/WebServer ... instead, you maintain it in a state management service. Examples would be a SQL database (e.g. Cloud SQL), a document database (e.g. Cloud Datastore) or a Redis system (Cloud Memorystore). All of those services are "managed as a service" and can be reached from Cloud Run instances.
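As a rough sketch of that pattern (Python with Flask and the redis client; the REDIS_HOST environment variable, routes, and TTL are my own assumptions, not Cloud Run specifics), state written during the first request can be read back by whichever instance happens to serve the second:

```python
import os
import uuid

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
# Cloud Memorystore endpoint, injected via an env var (assumes a VPC connector).
r = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)

@app.route("/start", methods=["POST"])
def start():
    # First request: persist the state outside the container.
    session_id = str(uuid.uuid4())
    r.setex(session_id, 3600, request.get_data())  # expire after an hour
    return jsonify({"next_url": f"/continue/{session_id}"})

@app.route("/continue/<session_id>")
def cont(session_id):
    # Second request: any instance can recover the state from Redis.
    state = r.get(session_id)
    if state is None:
        return "unknown or expired session", 404
    return state
```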
It's impossible when using Cloud Run features, but it can be hacked. I am not sure if it's suitable for your case, but here's how it can be done.
On startup, assign a random instance id to a globally accessible static variable. It will be available for the particular instance as long as the container runs.
When making the first call, also return the instance id.
When making the second call, add the instance id to the call (parameter, header, whatever). In the API endpoint code, if the instance id of the request doesn't match the running container instance id, return some pre-defined status code that doesn't indicate success.
The client code needs to handle that status code and retry the call. Eventually, the request will hit the instance you need.
The new session affinity feature makes it more reliable, as the second request will most probably hit the same instance anyway, but I'd keep the check.
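Here's a minimal sketch of that flow (Python/Flask; the X-Instance-Id header and the 409 status code are arbitrary choices of mine, not anything Cloud Run prescribes):

```python
import uuid

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Assigned once at startup; lives as long as this container instance does.
INSTANCE_ID = str(uuid.uuid4())

@app.route("/first")
def first():
    # First call: hand the instance id back to the client.
    return jsonify({"instance_id": INSTANCE_ID, "result": "..."})

@app.route("/second")
def second():
    # Second call: reject requests that landed on a different instance.
    if request.headers.get("X-Instance-Id") != INSTANCE_ID:
        abort(409)  # pre-defined "wrong instance" status code
    return "handled by the same instance"
```

And the client-side retry loop could look like:

```python
import requests

base = "https://my-service-xyz.a.run.app"  # placeholder service URL
info = requests.get(f"{base}/first").json()
for _ in range(20):  # retry until we land on the right instance
    resp = requests.get(f"{base}/second",
                        headers={"X-Instance-Id": info["instance_id"]})
    if resp.status_code != 409:
        break
```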
From what I am reading, and from looking at developer tools, I think Chrome to Cloud Run is doing HTTP/2: dev tools shows the request headers in HTTP/2 format (at least, I don't think Chrome would display them in HTTP/2 header format if it were HTTP/1, but I can't tell, as I would think this website is HTTP/1 yet I still see HTTP/2 request headers in Chrome's dev tools: https://www.w3.org/Protocols/HTTP/Performance/microscape/).
Anyways, I am wondering for Cloud Run: if I loop and keep calling a JSON endpoint to deliver pieces of a file to cloud storage, will it stay connected to the same instance the entire time, such that my upload will work with the ByteReader in the server? In this way, I could load large files as long as they load within the Cloud Run timeout window.
Does anyone know if this will work, or will Cloud Run see each JSON request from Chrome hit the firewall, and the firewall might round-robin it among Cloud Run instances?
"Anyways, I am wondering for Cloud Run: if I loop and keep calling a JSON endpoint to deliver pieces of a file to cloud storage, will it stay connected to the same instance the entire time ..."
The answer: sometimes it will and sometimes it will not. Do not design something that depends on that answer.
What you are looking for is often termed sticky sessions or session affinity.
Google Cloud Run is designed as a stateless service.
Google Cloud Run automatically scales container instances and load balances every request. Cloud Run does not offer any session stickiness between requests.
Google Cloud Run: About sticky sessions (session affinity)
Cloud Run offers bidirectional streaming and WebSocket support. The timeout is still limited to 1 hour, but such a connection is a suitable way to stream your large file into the same instance (don't blow the instance's memory limit: remember that even the file you store takes space in memory, because it's a stateless service).
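As an illustration of that streaming approach, here's a sketch (Python with FastAPI and the google-cloud-storage client; the route, bucket name, and end-of-file convention are my assumptions). Because the WebSocket is a single long-lived connection, every frame reaches the same instance:

```python
from fastapi import FastAPI, WebSocket
from google.cloud import storage
from starlette.websockets import WebSocketDisconnect

app = FastAPI()
BUCKET = "my-upload-bucket"  # placeholder bucket name

@app.websocket("/upload/{name}")
async def upload(ws: WebSocket, name: str):
    await ws.accept()
    chunks = []
    try:
        while True:
            # Each binary frame is one piece of the file; note the whole
            # file sits in this instance's memory until the socket closes.
            chunks.append(await ws.receive_bytes())
    except WebSocketDisconnect:
        pass  # client closed the socket: treat it as end-of-file
    blob = storage.Client().bucket(BUCKET).blob(f"uploads/{name}")
    blob.upload_from_string(b"".join(chunks))
```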
A bad solution is to set max instances to 1. It's a bad solution because it's not scalable: even if most of the time you will have only one instance, sometimes the Cloud Run service can provision 2 or more instances, and it only guarantees that just one is used at a time.
Currently I'm doing a project and have created a database instance in AWS RDS. We bought some sensors to monitor the water quality of some outfalls; the sensors upload the data to the vendor's server, and we can request this incremental data from the vendor's website API.
What I want is to set up a script that runs automatically to request the incremental data from the vendor's website and import it into my AWS database.
So I created a Lambda function and set up a CloudWatch rule to make it run automatically once every day. The Lambda function requests incremental data from the vendor's server and loads it into our own DB instance in AWS. Currently the system works well.
However there is a problem I found when checking my AWS billings.
https://i.stack.imgur.com/eojBo.png
As you can see, there is a cost for the NAT Gateway. In order to let the Lambda function access the public internet, I created it by following the tutorial in this article. But I didn't expect that the cost would depend on the hours it runs.
So is there a way to set up a job that starts/stops the NAT Gateway only when I need it? Since the Lambda function runs just once per day and takes only about 3 or 4 seconds, the cost would be much lower if I didn't have to maintain the NAT Gateway all the time.
You can't stop a NAT Gateway; you can only delete it. If you do this, the next time you want to access the internet you have to provision a new one and modify all the route tables to match the new NAT. Obviously this process can be automated, but it requires a custom solution.
But maybe, for very limited use of the internet, you could use a NAT instance instead of a NAT Gateway. You could set up a tiny NAT instance, and since it's an instance, you can stop it when not in use.
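If you go that route, the start/stop automation is small (Python/boto3; the instance ID is a placeholder, and this assumes the route table already targets the NAT instance, so the route should be usable again once the instance is running):

```python
import boto3

ec2 = boto3.client("ec2")
NAT_INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: your NAT instance

def start_nat(event=None, context=None):
    # Run this (e.g. from a scheduled Lambda) just before the daily job.
    ec2.start_instances(InstanceIds=[NAT_INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[NAT_INSTANCE_ID])

def stop_nat(event=None, context=None):
    # Run this once the daily job has finished.
    ec2.stop_instances(InstanceIds=[NAT_INSTANCE_ID])
```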
I have a web app which runs behind an Amazon AWS Elastic Load Balancer with 3 instances attached. The app has a /refresh endpoint to reload reference data. It needs to be run whenever new data is available, which happens several times a week.
What I have been doing is assigning a public address to all instances and doing the refresh independently (using ec2-url/refresh). I agree with Michael's answer on a different topic: EC2 instances behind an ELB shouldn't allow direct public access. Now my problem is how I can make an elb-url/refresh call reach all instances behind the load balancer.
And it would be nice if I can collect HTTP responses from multiple instances. But I don't mind doing the refresh blindly for now.
One of the ways I'd solve this problem is by:
writing the data to an AWS S3 bucket
triggering an AWS Lambda function automatically from the S3 write
using the AWS SDK to identify the instances attached to the ELB from the Lambda function, e.g. using boto3 from Python or the AWS Java SDK
calling /refresh on individual instances from the Lambda (a sketch follows after this list)
ensuring that when a new instance is created (due to autoscaling or deployment), it fetches the data from the S3 bucket during startup
ensuring that the private subnets the instances are in allow traffic from the subnets attached to the Lambda
ensuring that the security groups attached to the instances allow traffic from the security group attached to the Lambda
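A rough sketch of the identify-and-call steps (Python/boto3; the load balancer name is a placeholder, and this assumes a classic ELB and that the Lambda runs inside the VPC so the private IPs are reachable):

```python
import urllib.request

import boto3

ELB_NAME = "my-elb"  # placeholder load balancer name

def handler(event, context):
    elb = boto3.client("elb")
    ec2 = boto3.client("ec2")
    # Identify the instances currently attached to the ELB.
    states = elb.describe_instance_health(LoadBalancerName=ELB_NAME)["InstanceStates"]
    ids = [s["InstanceId"] for s in states if s["State"] == "InService"]
    # Resolve their private IPs and call /refresh on each one.
    for res in ec2.describe_instances(InstanceIds=ids)["Reservations"]:
        for inst in res["Instances"]:
            url = f"http://{inst['PrivateIpAddress']}/refresh"
            with urllib.request.urlopen(url, timeout=30) as resp:
                print(inst["InstanceId"], resp.status)
```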
The key wins of this solution are:
the process is fully automated from the instant the data is written to s3,
avoids data inconsistency due to autoscaling/deployment,
simple to maintain (you don't have to hardcode instance ip addresses anywhere),
you don't have to expose instances outside the VPC
highly available (AWS ensures the Lambda is invoked on s3 write, you don't worry about running a script in an instance and ensuring the instance is up and running)
hope this is useful.
While this may not be possible given the constraints of your application and circumstances, it's worth noting that best-practice application architecture for instances running behind an AWS ELB (particularly if they are part of an AutoScalingGroup) is to ensure that the instances are not stateful.
The idea is to make it so that you can scale out by adding new instances, or scale-in by removing instances, without compromising data integrity or performance.
One option would be to change the application to store the results of the reference data reload into an off-instance data store, such as a cache or database (e.g. Elasticache or RDS), instead of in-memory.
If the application was able to do that, then you would only need to hit the refresh endpoint on a single server - it would reload the reference data, do whatever analysis and manipulation is required to store it efficiently in a fit-for-purpose way for the application, store it to the data store, and then all instances would have access to the refreshed data via the shared data store.
While there is a latency increase from adding a round-trip to a data store, it is often well worth it for the consistency of the application. Under your current model, if one server lags behind the others in refreshing the reference data and the ELB is not using sticky sessions, requests via the ELB will return inconsistent data depending on which server they are allocated to.
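As a rough illustration of that change (Python with Flask and a Redis client pointed at, say, an ElastiCache endpoint; all names here are placeholders), the refresh endpoint writes to the shared store once and every instance reads through it:

```python
import json
import os

import redis
from flask import Flask

app = Flask(__name__)
cache = redis.Redis(host=os.environ["CACHE_HOST"], port=6379)  # e.g. ElastiCache

def load_reference_data():
    # Placeholder for the expensive reload/analysis step.
    return {"rates": {"EUR": 1.1}}

@app.route("/refresh", methods=["POST"])
def refresh():
    # Hit this on ONE instance; all instances read the shared copy.
    cache.set("reference-data", json.dumps(load_reference_data()))
    return "refreshed"

@app.route("/lookup/<key>")
def lookup(key):
    raw = cache.get("reference-data")
    if raw is None:
        return "reference data not loaded yet", 503
    return str(json.loads(raw)["rates"].get(key, "unknown"))
```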
You can't make these requests through the load balancer, so you will have to open up the security group of the instances to allow incoming traffic from sources other than the ELB. That doesn't mean you need to open it to all direct traffic, though. You could simply whitelist an IP address in the security group to allow requests from your specific computer.
If you don't want to add public IP addresses to these servers then you will need to run something like a curl command on an EC2 instance inside the VPC. In that case you would only need to open the security group to allow traffic from some server (or group of servers) that exist in the VPC.
I solved it differently, without opening up new traffic in security groups or resorting to external resources like S3. It's flexible in that it will dynamically notify instances added through ECS or ASG.
The ELB's Target Group offers a periodic health check feature to ensure that the instances behind it are live. This is a URL that your server responds on. The endpoint can include a timestamp parameter of the most recent configuration. Every server in the TG will receive the health-check ping within the configured interval threshold. If the parameter of the ping changes, it signals a refresh.
A URL may look like:
/is-alive?last-configuration=2019-08-27T23%3A50%3A23Z
Above I passed a UTC timestamp of 2019-08-27T23:50:23Z
A service receiving the request will check if the in-memory state is at least as recent as the timestamp parameter. If not, it will refresh its state and update the timestamp. The next health-check will result in a no-op since your state was refreshed.
Implementation notes
If refreshing the state can take more time than the interval window or the TG health timeout, you need to offload it to another thread to prevent concurrent updates or outright service disruption, as the health checks need to return promptly. Otherwise the node will be considered offline.
If you are using the traffic port for this purpose, make sure the URL is secured by making it impossible to guess. Anything publicly exposed can be subject to a DoS attack.
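A minimal sketch of such an endpoint (Python/Flask; the timestamp format matches the example URL above, and the thread offload plus lock follow the first implementation note):

```python
import threading
from datetime import datetime, timezone

from flask import Flask, request

app = Flask(__name__)

state_timestamp = datetime.min.replace(tzinfo=timezone.utc)
refresh_lock = threading.Lock()

def reload_state(target):
    global state_timestamp
    # ... fetch and rebuild the in-memory reference data here ...
    state_timestamp = target

@app.route("/is-alive")
def is_alive():
    target = datetime.strptime(
        request.args["last-configuration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    # Offload the refresh so the health check returns promptly; the
    # non-blocking lock prevents concurrent updates.
    if state_timestamp < target and refresh_lock.acquire(blocking=False):
        def work():
            try:
                reload_state(target)
            finally:
                refresh_lock.release()
        threading.Thread(target=work).start()
    return "OK", 200
```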
As you are using S3, you can automate your task by using the ObjectCreated notification for S3.
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-notification.html
You can install the AWS CLI and write a simple Bash script to monitor that ObjectCreated notification. Start a cron job that will look for the S3 notification for the creation of a new object.
Set up a condition in that script to curl "http://127.0.0.1/refresh": when the script detects a new object created in S3, it will curl 127.0.0.1/refresh, and you're done; you don't have to do that manually each time.
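The same idea sketched in Python with boto3: since S3 itself can't be polled for notifications, this assumes the bucket's ObjectCreated events are delivered to an SQS queue (the queue URL below is a placeholder), which a cron-driven script drains before hitting the local refresh endpoint:

```python
import urllib.request

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/refresh-events"  # placeholder

sqs = boto3.client("sqs")

def poll_once():
    msgs = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20).get("Messages", [])
    for m in msgs:
        # Any ObjectCreated event triggers a local refresh.
        urllib.request.urlopen("http://127.0.0.1/refresh")
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])

if __name__ == "__main__":
    poll_once()  # invoke this from cron on each instance
```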
I personally like the answer by @redoc, but wanted to give another alternative for anyone that is interested, which is a combination of his and the accepted answer. Using S3 object-creation events, you can trigger a Lambda, but instead of discovering the instances and calling them, which requires the Lambda to be in the VPC, you could have the Lambda use SSM (aka Systems Manager) to execute commands via a PowerShell or Bash document on EC2 instances that are targeted via tags. The document would then call 127.0.0.1/reload like the accepted answer has.
The benefit of this is that your Lambda doesn't have to be in the VPC, and your EC2s don't need inbound rules to allow the traffic from the Lambda. The downside is that it requires the instances to have the SSM agent installed, which sounds like more work than it really is. There are AWS AMIs already optimized with the SSM agent, but installing it yourself in the user data is very simple.
Another potential downside, depending on your use case, is that it uses an exponential ramp-up for simultaneous executions, which means that if you're targeting 20 instances, it runs 1, then 2 at once, then 4 at once, then 8, until they are all done or it reaches what you set for the max. This is because of the error-recovery behavior it has built in: it doesn't want to destroy all your stuff if something is wrong, like slowly putting your weight on some ice.
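The SSM call itself is short (Python/boto3; the tag key/value are placeholders for however you tag your instances):

```python
import boto3

ssm = boto3.client("ssm")

ssm.send_command(
    # Target instances by tag instead of discovering their IPs.
    Targets=[{"Key": "tag:Role", "Values": ["web"]}],
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["curl -s http://127.0.0.1/reload"]},
    MaxConcurrency="100%",  # caps how far the ramp-up described above can climb
    MaxErrors="25%",
)
```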
You could make the call multiple times in rapid succession to hit all the instances behind the Load Balancer. This would work because AWS Load Balancers use round-robin without sticky sessions by default, meaning that each call handled by the Load Balancer is dispatched to the next EC2 instance in the list of available instances. So if you're making rapid calls, you're likely to hit all the instances.
Another option is that if your EC2 instances are fairly stable, you can create a Target Group for each EC2 Instance, and then create a listener rule on your Load Balancer to target those single instance groups based on some criteria, such as a query argument, URL or header.
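Creating one of those per-instance routing rules might look roughly like this (Python/boto3; the ARNs and the header name are placeholders, and this assumes an Application Load Balancer, since listener rules with header conditions are an ALB feature):

```python
import boto3

elbv2 = boto3.client("elbv2")

LISTENER_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/abc/def"  # placeholder
TG_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/instance-1/abc"  # placeholder

# Route requests carrying a specific header to the single-instance target group.
elbv2.create_rule(
    ListenerArn=LISTENER_ARN,
    Priority=10,
    Conditions=[{
        "Field": "http-header",
        "HttpHeaderConfig": {"HttpHeaderName": "X-Refresh-Target",
                             "Values": ["instance-1"]},
    }],
    Actions=[{"Type": "forward", "TargetGroupArn": TG_ARN}],
)
```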