Is it possible to get notified when CloudFront distribution has been deployed? Or more generally when the Status (and possibly State) has changed?
In principle I could create a periodic (say, 15 minutes) checker that would poll the distribution list and see whether any property has changed, but considering most of the time this doesn't really change it feels wasteful, plus it is more work (I need to remember the previous state, so the check cannot be stateless).
So a push-based (where AWS notifies me, instead of me asking) solution is preferred.
CloudFront does not provide any push notifications for distribution configuration or state changes.
Related
Here is my case:
When my server receieve a request, it will trigger distributed tasks, in my case many AWS lambda functions (the peek value could be 3000)
I need to track each task progress / status i.e. pending, running, success, error
My server could have many replicas
I still want to know about the task progress / status even if any of my server replica down
My current design:
I choose AWS S3 as my helper
When a task start to execute, it will create marker file in a special folder on S3 e.g. running folder
When the task fail or success, it will move the marker file from running folder to fail folder or success folder
I check the marker files on S3 to check the progress of the tasks.
The problems:
There is a limit for AWS S3 concurrent access
My case is likely to exceed the limit some day
Attempt Solutions:
I had tried my best to reduce the number of request to S3
I don't want to track the progress by storing data in my DB because my DB has already been under heavy workload.
To be honest, it is kind of wierd that using marker files on S3 to track progress of the tasks. However, it worked before.
Is there any recommendations ?
Thanks in advance !
This sounds like a perfect application of persistent event queueing, specifically Kinesis. As each Lambda starts it generates a “starting” event on Kinesis. When it succeeds or fails, it generates the appropriate event. You could even create progress events along the way if you want to see how far they have gotten.
Your server can then monitor the number of starting events against ending events (success or failure) until these two numbers are equal. It can query the error events to see which processes failed and why. All servers can query the same events without disrupting each other, and any server can go down and recover without losing data.
Make sure to put an Origination Key on events that are supposed to be grouped together so they don't get mixed up with a subsequent event. Also, each Lambda should be given its own key so you can trace progress per Lambda. Guids are perfect for this.
I have a serverless backend running on lambda. The runtime usually varies betweeen 40-250s which is over the apigateway max allowed runtime (29s). As such I think my only option is to resort to asynchronous processing. I get the idea behind it, but help online seems sparse and I'd like to know if there are any best practices out there? Or what would be the simplest way for me to get around this timeout problem–using asynchronous processing or other?
It really depends on your use case. But probably an asynchronous approach is best fitted for this scenario given that it's not usually a good idea from the calling side of your API to wait 250 seconds to get the reply back (probably that's why the 29s limitation on API Gateway).
Asynchronous simply means that you will be replying back from Lambda saying that you received the request and you are going to work on it but it will be available only later.
Then, you will be changing the logic on the client side, too, to check back after some time or perform some checks in a loop until the requested resource is ready.
Depending on what work needs to be done you could create an S3 bucket on the fly and reply back to the client with an S3 presigned URL. Then your worker will upload their results to the S3 bucket and the client will poll that bucket for the results until they are present.
Application creates an api key on a per user basis, meaning the process is as follows:
Lambda function creates api key and adds to a usage plan
Api key value is returned from lambda function
Api key is then immediately used to call an Api Gateway end point
Forbidden message is returned
If I delay execution between api key creation and the http request to the api gateway end point (by around 5 seconds), then it works as intended, but less than that I get an error.
I suspect that the api key takes a few seconds to propagate to the endpoint but I can't find an AWS API method that correctly lets me know when it has done so. Has anyone come across this problem before and how did you solve it?
The best solution I have at the moment is to retry the api call on a sliding timeout until an unreasonable amount of time has passed.
How long should I wait after applying an AWS IAM policy before it is valid? is not the same question but seems likely to be similar in its underlying explanation -- it's not so much a case of the API key taking time to exist but rather taking time to propagate and become visible at every possible place where it might need to exist before being valid for any subsequent request.
If those assumptions are correct, there is no mechanism for authoritatively determining whether the key is ready for use or not, because for some period of time after the key creation request succeeds, it's in a situation arguably reminiscent of Schrödinger's cat -- the key both exists and doesn't exist -- you don't know until you try it, and (unlike the cat) even a successful test does not necessarily prove that it is fully ready for use, because of the possibility (however unlikely) of a result such as fail fail fail fail pass fail pass pass pass. Such is the characteristic behavior of many large-scale, distributed systems.
From comments:
If an API call returns the api key value then I would expect it to be able to be used instantly, or at least return only when the key has been propagated fully to the end points.
That makes sense on the surface, but it becomes problematic in implementation. What if one of the endpoints is failed, offline for maintenance, or in the middle of recovering from an outage and lagging... what then? Fail the request? Delay the response waiting for something statistically unlikely to impact you?
The resource cost of observing replication tends to outweigh the benefits in many cases and can destabilize the control plane of a system if a replication issue causes a sufficient backlog, and is often not implemented except in cases where it has a high value, viz. the GetChange action in Route 53 which allows you to verify the propagation of a change through the system -- and note that even in this case, the change request itself succeeds without waiting -- if you need to verify the sync state, you have to ask separately.
A lot of AWS services take time to create. Usually there is a way to detect if the job has been completed. In this case it looks like you get a forbidden response until the key is created.
I think you will have to handle this in your client.
I have alarms set up to tell me when my load balancers are throwing 5xxs using the HTTPCode_Backend_5XX metric with the sum statistic. The issue is that sum registers 0 as no data points, so when no 5xxs are thrown, the alarm is treated as insufficient data. This is especially frustrating, because I have SNS setup to notify me whenever we get too many 5xxs (alarm state) and whenever things go back to normal. Annoyingly, 0 5xxs means we're in INSUFFICIENT DATA status, but 1 5xx means we're in OK status, so 1 5xx triggers everyone getting notified that stuff is OK. Is there any way around this? Ideally, I'd like to just have 0 of anything show up as a zero data point instead of no data at all (insufficient data).
As of March 2017, you can treat missing data as acceptable. This will prevent the alarm from being marked as INSUFFICIENT.
You can also set this in CloudFormation using the TreatMissingData property. For example:
TreatMissingData: notBreaching
We had a similar issue for some of our alarms. You can actually avoid this behaviour with some work, if you really want to deal with the overhead.
What we have done is, instead of sending SNS notifications directly to e-mails, we have created a lambda function and triggered it once we have the notification in the SNS topic.
This way, you will have more control over the actions you can take once the alarms are triggered. As the context will provide you old state value as well.
The good news is, there is already a lambda template to get started.
https://aws.amazon.com/blogs/aws/new-slack-integration-blueprints-for-aws-lambda/
Just pick the one that is designed to send cloudwatch alarms to slack. You can then modify the code as you wish, either dismiss the slack part and just use emails, or keep it with slack. (which is what we did and it works like a charm)
I asked for this in the AWS forums two years ago :-(
https://forums.aws.amazon.com/thread.jspa?threadID=153753&tstart=0
Unfortunately you cannot create notifications based on specific state changes (in your case you want a notification when state changes from ALARM to OK, but not when state changes from INSUFFICIENT to OK). I can only suggest that you also ask for it and hopefully it will eventually be added.
For metrics that are often in the INSUFFICIENT state I generally just create notifications for ALARMS and I don't have notifications on OK for these metrics - if I want to confirm that things are OK I use the AWS mobile app to check on things and see if they have resolved.
We have a very simple AppFabric setup where there are two clients -- lets call them Server A and Server B. Server A is also the lead cache host, and both Server A and B have a local cache enabled. We'd like to be able to make an update to an item from server B and have that change propagate to the local cache of Server A within 30 seconds (for example).
As I understand it, there appears to be two different ways of getting changes propagated to the client:
Set a timeout on the client cache to evict items every X seconds. On next request for the item it will get the item from the host cache since the local cache doesn't have the item
Enable notifications and effectively subscribe to get updates from the cache host
If my requirement is to get updates to all clients within 30 seconds then setting a timeout of less than 30 seconds on the local cache appears to be the only choice if going with option #1 above. Due to the size of the cache, this would be inefficient to evict all of the cache (99.99% of which probably hasn't changed in the last 30 seconds).
I think what we need to implement is option #2 above, but I'm not sure I understand how this works. I've read all of the msdn documentation (http://msdn.microsoft.com/en-us/library/ee808091.aspx) and have looked at some examples but it is still unclear to me whether it is really necessary to write custom code or if this is only if you want to do extra handling.
So my question is: is it necessary to add code to your existing application if want to have updates propagated to all local caches via notifications, or is the callback feature just an bonus way of adding extra handling or code if a notification is pushed down? Can I just enable Notifications and set the appropriate polling interval at the client and things will just work?
It seems like the default behavior (when Notifications are enabled) should be to pull down fresh items automatically at each polling interval.
I ran some tests and am happy to say that you do NOT need to write any code to ensure that all clients are kept in sync. If you set the following as a child element of the cluster config:
In the client config you need to set sync="NotificationBased" on the element.
The element in the client config will tell the client how often it should check for new notifications on the server. In this case, every 15 seconds the client will check for notifications and pull down any items that have changed.
I'm guessing the callback logic that you can add to your app is just in case you want to add your own special logic (like emailing the president every time an item changes in the cache).