Optimize Image Storage / GET Requests from Amazon S3 with Picasso

I am creating a polling app, and each poll is going to have an associated image of its particular topic.
I am using Firebase to update polls dynamically as events occur. In Firebase I store the relevant image URL (pointing to the file in Amazon S3), and I then use Picasso to load the image onto the client's device (see code below).
I have already noticed that I may be handling this data inefficiently, resulting in unnecessary GET requests against my files in S3. I was wondering what options I have with Picasso (I am thinking some form of caching) to pull each image just once per client and then store it locally (I do not want the images to remain on the client's device permanently, however). My goal is to minimize costs without compromising performance. Below is my current code:
mPollsRef.child(mCurrentDateString).child(homePollFragmentIndexConvertedToFirebaseReferenceImmediatelyBelowDate).addListenerForSingleValueEvent(new ValueEventListener() {
    @Override
    public void onDataChange(DataSnapshot dataSnapshot) {
        int numberOfPollAnswersAtIndexBelowDate = (int) dataSnapshot.child("Poll_Answers").getChildrenCount();
        Log.e("TAG", "There are " + numberOfPollAnswersAtIndexBelowDate + " poll answers at index " + homePollFragmentIndexConvertedToFirebaseReferenceImmediatelyBelowDate);
        addRadioButtonsWithFirebaseAnswers(dataSnapshot, numberOfPollAnswersAtIndexBelowDate);

        String pollQuestion = dataSnapshot.child("Poll_Question").getValue().toString();
        mPollQuestion.setText(pollQuestion);

        // This is where the image "GET" from Amazon S3 using Picasso begins; the URL is stored
        // in Firebase, and I pass that URL to Picasso's load() method.
        final String mImageURL = (String) dataSnapshot.child("Image").getValue();
        Picasso.with(getContext())
                .load(mImageURL)
                .fit()
                .into((ImageView) rootView.findViewById(R.id.poll_image));
    }

    @Override
    public void onCancelled(FirebaseError firebaseError) {
    }
});

First, the Picasso instance holds a memory cache by default (and you can configure it).
Second, disk caching is done by the HTTP client. You should be using OkHttp 3+ in 2016. If you include OkHttp in your dependencies, Picasso will set up a reasonable default cache with it. You can also set the Downloader explicitly when creating the Picasso instance (make sure to set the cache on the client and use OkHttpDownloader or a comparable downloader).
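For example, a minimal sketch of that explicit setup, assuming Picasso 2.5 with OkHttp 2.x on the classpath (the cache directory name and 50 MB size are arbitrary choices, not requirements):

import android.content.Context;
import com.squareup.okhttp.Cache;
import com.squareup.okhttp.OkHttpClient;
import com.squareup.picasso.OkHttpDownloader;
import com.squareup.picasso.Picasso;
import java.io.File;

public class PicassoSetup {
    // Call once, e.g. from Application.onCreate().
    public static void install(Context context) {
        // The disk cache lives in the app's cache directory, so the OS may
        // evict it under pressure and images never persist on the device forever.
        OkHttpClient client = new OkHttpClient();
        client.setCache(new Cache(new File(context.getCacheDir(), "picasso-cache"),
                50L * 1024 * 1024));

        Picasso picasso = new Picasso.Builder(context)
                .downloader(new OkHttpDownloader(client))
                .build();
        // Make Picasso.with() return this configured instance everywhere.
        Picasso.setSingletonInstance(picasso);
    }
}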
Third, OkHttp will respect cache headers, so make sure max-age and max-stale carry appropriate values. For images served from S3, that means setting Cache-Control metadata on the objects themselves, for example at upload time.
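A hedged sketch of setting that metadata with the AWS SDK for Java v1; the bucket and key names are placeholders, and max-age should be tuned to how often your images change:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.File;

public class PollImageUploader {
    public static void upload(File image) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Tell HTTP caches (including OkHttp on the client) to reuse the
        // image for one day before revalidating.
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setCacheControl("public, max-age=86400");

        s3.putObject(new PutObjectRequest("my-poll-images",
                "images/" + image.getName(), image).withMetadata(metadata));
    }
}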

Related

LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413

I have a node/express + Serverless backend API which I deploy as a Lambda function.
When I call the API, the request goes through API Gateway to Lambda; the Lambda connects to S3, reads a large .bin file, parses it, and generates an output JSON object.
The response JSON object is around 8.55 MB (I verified this with Postman, running the node/express code locally). The size can vary with the .bin file size.
When I make an API request, it fails with the following message in CloudWatch:
LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413
I can't/don't want to change this pipeline: HTTP API Gateway + Lambda + S3.
What should I do to resolve the issue?
AWS Lambda functions have hard limits on the sizes of the request and response payloads. These limits cannot be increased.
The limits are:
6 MB for synchronous invocations
256 KB for asynchronous invocations
You can find additional information in the official documentation here:
https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
You might consider different solutions:
use EC2 or ECS/Fargate
use the Lambda to parse and transform the .bin file into the desired JSON, then save this JSON directly to a public S3 bucket. In the Lambda response, return to the client the public URL/URI/file name of the created JSON (see the sketch after this list).
For the last solution, if you don't want to make the JSON file visible to the whole world, you might consider using AWS Amplify in your client and/or AWS Cognito, in order to give only an authorized user access to the file they have just created.
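A minimal sketch of that second option in Node.js with the AWS SDK v2. The bucket name and key are placeholders, and as a variation on the public-bucket idea, a short-lived presigned GET URL is used here so the object can stay private:

// Parse the .bin file, store the JSON result in S3, and return only a URL,
// keeping the Lambda response body far below the 6 MB limit.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  // ... read and parse the .bin file from S3, producing `result` (omitted) ...
  const result = { /* large parsed output */ };

  const key = `outputs/${Date.now()}.json`;
  await s3.putObject({
    Bucket: 'my-output-bucket',        // placeholder bucket name
    Key: key,
    Body: JSON.stringify(result),
    ContentType: 'application/json',
  }).promise();

  // Short-lived presigned URL; with a public bucket you would instead return
  // the object's public URL directly.
  const url = s3.getSignedUrl('getObject', {
    Bucket: 'my-output-bucket',
    Key: key,
    Expires: 300,                      // seconds
  });

  return { statusCode: 200, body: JSON.stringify({ url }) };
};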
As noted in other answers, API Gateway/Lambda has limits on response sizes, and from the discussion it sounds like latency is a concern as well.
With these two requirements, Lambda is mostly out of the question: functions need some time to start up (which can be reduced with provisioned concurrency) and only have ordinary network connections (whereas EC2/EKS can have enhanced networking).
Given these requirements, it would be better (from an AWS point of view) to move away from Lambda.
Looking further, we could also question the application itself:
Large JSON objects are generated on demand. Why can't they be pre-generated asynchronously and then downloaded from S3 directly? That would give you the best latency and speed, and can be combined with CloudFront.
Why does the JSON need to be so large? Large JSON payloads also have to be parsed on the client side, costing more CPU. Maybe the output can be split and/or compressed (see the sketch below)?
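For the compression route, a hedged sketch assuming a Lambda proxy integration behind an HTTP API: gzip the JSON and return it base64-encoded, so the payload limit applies to the compressed bytes (this only helps if the compressed size actually fits under 6 MB):

// Compress the response body; API Gateway decodes the base64 payload and the
// client decompresses it based on the Content-Encoding header.
const zlib = require('zlib');

exports.handler = async () => {
  const json = JSON.stringify({ /* large parsed output */ });
  const compressed = zlib.gzipSync(json);

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
    },
    isBase64Encoded: true,
    body: compressed.toString('base64'),
  };
};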

clear old AWS MediaPackage content from Live Stream

I am using AWS MediaLive and MediaPackage to deliver an HLS livestream.
However, when the stream ends, about one minute of content always remains available in the .m3u8 playlist.
The setting "Startover window (sec.): 0" does not seem to solve this.
Deleting and creating a new .m3u8 playlist would be very inconvenient, because all players would have to be updated.
Does anyone have any advice?
Cheers, Richy
Thanks for your post. If I understand correctly, you are referring to the MediaPackage endpoint, which serves up a manifest with the last known segments (60 seconds' worth of segments by default).
There are several ways to alter or stop this behavior. I suggest testing some of these methods to see which you prefer:
[a] Delete the public-facing MediaPackage endpoint shortly (perhaps 10s) after your event ends. All subsequent requests to that endpoint will return an error. Segments already retrieved and cached by the player will not be affected, but no new data will be served. Note: you may also maintain a private endpoint on the same Channel to allow for viewing + harvesting of the streamed content if you wish.
[b] Use an AWS CloudFront CDN Distribution with a short Time to Live (TTL) in front of your MediaPackage Channel (which acts as the origin) to deliver content segments to your viewers. When the event ends, you can immediately disable or delete this CDN Distribution, and all requests for content segments will return an error. Segments already retrieved and cached by the player will not be affected, but no new data will be served from this distribution.
[c] Encrypt the content using MediaPackage encryption, then disable the keys at the end of the event. The same approach applies to CDN authorization headers, which you can mandate for event playback and then delete after the event completes.
[d] Use DNS redirection to your MediaPackage endpoint. When the event ends, remove the DNS redirector so that any calls to the old domain will fail.
I think one or a combination of these methods will work for you. Good Luck!
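For option [a], a minimal sketch using the AWS SDK for JavaScript v2; the endpoint ID and region are placeholders:

const AWS = require('aws-sdk');
const mediapackage = new AWS.MediaPackage({ region: 'us-east-1' });

// Deleting the public-facing endpoint makes all subsequent manifest and
// segment requests to it return an error; a private endpoint on the same
// Channel (if you keep one) is unaffected.
async function endEvent() {
  await mediapackage.deleteOriginEndpoint({ Id: 'my-public-endpoint' }).promise();
  console.log('Public endpoint deleted; no new data will be served.');
}

endEvent().catch(console.error);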

How do I add simple licensing to api when using AWS Cloudfront to cache queries

I have an application deployed on AWS Elastic Beanstalk. I added some simple licensing to stop abuse of the API: the user has to pass a license key as a field,
i.e.
search.myapi.com/?license=4ca53b04&query=fred
If this is not valid then the request is rejected.
However, until the monthly updates the above query will always return the same data, so I now point search.myapi.com at an AWS CloudFront distribution; only if the query is not cached does it go to the actual server as
direct.myapi.com/?license=4ca53b04&query=fred
The problem is that if two users make the same query, they won't be deemed the same by CloudFront, because the license parameter differs. So the CloudFront caching only works at a per-user level, which is of no use.
What I want is for CloudFront to ignore the license parameter for caching but not the other parameters. I don't mind too much if that means a user could access CloudFront with an invalid license, as long as they can't make a successful query to the server (CloudFront calls are cheap, but server calls are expensive, both in CPU and monetary cost).
Perhaps what I need is something in front of CloudFront that does the license check and then strips out the license parameter, but I don't know what that would be?
Two possible solutions come to mind.
The first solution feels like a hack, but would prevent unlicensed users from successfully fetching uncached query responses. If the response is cached, it would leak out, but at no cost in terms of origin server resources.
If the content is not sensitive, and you're only trying to avoid petty theft/annoyance, this might be viable.
For query parameters, CloudFront allows you to forward all, cache on whitelist.
So, whitelist query (and any other necessary fields) but not license.
Results for a given query:
valid license, cache miss: request goes to origin, origin returns response, response stored in cache
valid license, cache hit: response served from cache
invalid license, cache hit: response served from cache
invalid license, cache miss: request goes to origin, origin returns error, error stored in cache.
Oops. The last condition is problematic, because authorized users will receive the cached error if they make the same query.
But we can fix this, as long as the origin returns an HTTP error for an invalid request, such as 403 Forbidden.
As I explained in Amazon CloudFront Latency, CloudFront caches responses with HTTP errors using different timers (not min/default/max-ttl), with a default of 5 minutes. This value can be set to 0 (or other values) for each of several individual HTTP status codes, like 403. So, for the error code your origin returns, set the Error Caching Minimum TTL to 0 seconds.
At this point, the problematic condition of caching error responses and playing them back to authorized clients has been solved.
The second option seems like a better idea, overall, but would require more sophistication and probably cost slightly more.
CloudFront has a feature that connects it with AWS Lambda, called Lambda@Edge. It allows you to analyze and manipulate requests and responses using small JavaScript functions that run at specific trigger points in the CloudFront signal flow:
Viewer Request runs for each request, before the cache is checked. It can allow the request to continue into CloudFront, or it can stop processing and generate a response directly back to the viewer. Responses generated here are not stored in the cache.
Origin Request runs after the cache is checked, only for cache misses, before the request goes to the origin. If this trigger generates a response, the response is stored in the cache and the origin is not contacted.
Origin Response runs after the origin's response arrives, only for cache misses, and before the response goes into the cache. If this trigger modifies the response, the modified response is stored in the cache.
Viewer Response runs immediately before the response is returned to the viewer, for both cache misses and cache hits. If this trigger modifies the response, the modified response is not cached.
From this, you can see how this might be useful.
A Viewer Request trigger could check each request for a valid license key, and reject those without. For this, it would need access to a way to validate the license keys.
If your client base is very small or rarely changes, the list of keys could be embedded in the trigger code itself.
Otherwise, it needs to validate the key, which could be done by sending a request to the origin server from within the trigger code (the runtime environment allows your code to make outbound requests and receive responses via the Internet) or by doing a lookup in a hosted database such as DynamoDB.
Lambda@Edge triggers run in Lambda containers, and depending on traffic load, observations suggest it is very likely that subsequent requests reaching the same edge location will be handled by the same container. Each container handles only one request at a time, but it becomes available for the next request as soon as control is returned to CloudFront. As a consequence, you can cache validation results in memory in a global data structure inside each container, significantly reducing the number of times you need to check whether a license key is valid. The function either allows CloudFront to continue processing as normal, or actively rejects the invalid key by generating its own response. A single trigger will cost you a little under $1 per million requests that it handles.
This solution prevents requests with missing or unauthorized license keys from ever checking the cache or reaching the origin; a minimal trigger sketch follows. As before, you would want to customize the query string whitelist in the CloudFront cache behavior settings to eliminate license from the whitelist, and change the error caching minimum TTL to ensure that errors are not cached, even though these errors should now never occur.
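A minimal sketch of the Viewer Request trigger, assuming the simplest variant above with a small static set of license keys embedded in the code; the keys and the response body are placeholders:

'use strict';

const querystring = require('querystring');

// Placeholder keys; in the lookup variant these would come from DynamoDB or
// the origin, cached in this module-level structure between invocations.
const VALID_LICENSES = new Set(['4ca53b04', 'deadbeef']);

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const params = querystring.parse(request.querystring);

  if (!params.license || !VALID_LICENSES.has(params.license)) {
    // Generate a response directly back to the viewer; responses generated
    // in a Viewer Request trigger are never stored in the cache.
    return callback(null, {
      status: '403',
      statusDescription: 'Forbidden',
      body: 'Invalid or missing license key.',
    });
  }

  // Valid key: let CloudFront continue (cache lookup, then origin on a miss).
  callback(null, request);
};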

strongloop/loopback - Change connection string based on route value

My application's users are geographically dispersed and data is stored in various regions. Each region has its own data center and database server.
I would like to include a route value indicating the region the user wants to access and connect to, as follows:
/api/region/1/locations/
/api/region/2/locations/
/api/region/3/locations/
Depending on the region passed in, I would like to change the connection string being used. I assume this can be done somewhere in the middleware chain, but I don't know where or how. Any help is appreciated!
What should not be done
LoopBack provides a static method MyModel.attachTo (it doesn't seem to be documented, but a reference to it is made there).
But since it is a static method, it affects the entire Model, not a single instance.
So for this to work on a per-request basis, you would have to switch the DB right before the call to the datasource method and make sure nothing async starts in between. I don't think this is possible.
Here is an example using an operation hook (define all the datasources, including dbRegion1 below, in datasources.json).
Bad, don't do this. Shown for reference only:
Region.observe('loaded', function filterProperties(ctx, next) {
  // Swaps the datasource for the whole Region model, not just this request.
  app.models.Region.attachTo(app.dataSources.dbRegion1);
  next();
});
But then you will most likely face concurrency issues when your API receives multiple requests in a short time.
(Another way to see it: the server is no longer truly stateless; execution depends not only on the inputs but also on shared state.)
The hook may set region2 for request 2 while the method called after the hook was expecting to use region1 for request 1. This will happen whenever anything async is triggered between the hook and the actual call to the datasource method.
So ultimately, I don't think you should do this. I'm just putting it here because some people have recommended it in other SO posts, but it's just bad.
Potential option 1
Build an external re-routing server that re-routes requests from the API server to the appropriate region database.
Use loopback-connector-rest in your API server to consume this microservice, and use it as the single datasource for all your models (see the config sketch below). This provides an abstraction over database selection.
Then, of course, there is still the matter of implementing the microservice, but maybe you can find an ORM other than LoopBack's that supports database sharding, and use it in that microservice.
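A hedged sketch of what that single datasource entry might look like in datasources.json, assuming loopback-connector-rest; the router URL is a placeholder:

{
  "regionRouter": {
    "name": "regionRouter",
    "connector": "rest",
    "baseURL": "http://region-router.internal.example.com/api"
  }
}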
Potential option 2
Create a custom LoopBack connector that acts as a router for MySQL queries. Depending on the region value passed with the query, re-route it to the appropriate DB (a rough skeleton is sketched below).
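A very rough skeleton of such a connector, assuming the juggler's connector contract of this era (initialize() plus DAO methods such as all(model, filter, callback)); the region map in the settings and the region value in the filter are invented for illustration:

// Hypothetical routing connector; everything here is illustrative.
module.exports.initialize = function initialize(dataSource, callback) {
  dataSource.connector = new RoutingConnector(dataSource.settings);
  callback();
};

function RoutingConnector(settings) {
  // e.g. settings.regions = { '1': dsRegion1, '2': dsRegion2, '3': dsRegion3 }
  this.regions = settings.regions;
}

// Route reads to the right region's datasource; the other DAO methods
// (create, updateAll, ...) would be wrapped the same way.
RoutingConnector.prototype.all = function (model, filter, callback) {
  var region = filter && filter.where && filter.where.region;
  var ds = this.regions[region];
  if (!ds) return callback(new Error('Unknown region: ' + region));
  ds.connector.all(model, filter, callback);
};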
Option 3
Use a more distributed architecture:
write a region-specific server to persist each region's data,
run, for instance, 3 different servers, each one configured for a region,
plus 1 common server for routing.
Then build a routing middleware for your single user-facing REST API server.
Basic example:
var express = require('express');
var request = require('request');

var app = express();
// Region ids in the URL are 1-based; map them to the region servers.
var ips = ['http://127.0.0.1', 'http://127.0.0.2'];

app.all('/api/region/:id', function (req, res, next) {
  console.log('Reroute to region server ' + req.params.id);
  request(ips[req.params.id - 1], function (err, response, body) {
    if (err) return next(err);
    res.send(body);
  });
});
This option is perhaps the easiest to implement.

How do I set cookies in Load Impact?

We’ve come across this question fairly often at Load Impact, so I’m adding it to the Stack Overflow community to make it easier to find.
Q: When performing a Load Impact load test, I need the VUs to send cookies with their requests. How do I set a cookie for a VU?
Load Impact VUs automatically save and use cookies sent to them by the server (through the "Set-Cookie:" header). When the user scenario executed by the VU ends and is restarted (i.e. a new script iteration starts), the cookies stored by the VU/client are cleared.
Cookies, or more specifically the "Cookie:" header, are currently the only header set automatically by the client. Other headers, such as "If-Modified-Since:", will not be set unless the user specifies them in the load script (this is why caching is not emulated automatically: client caching behaviour has to be programmed).
You can't manipulate the stored cookies that the VU client holds, but you can override or set a cookie used by the client by specifying the "Cookie:" header in the requests you make, like this:
http.request_batch({
    {"GET", "http://example.com/", headers={["Cookie"]="name=value"}}
})