What kind of data should/can each SQS message contain? - amazon-web-services

Suppose I have a task of updating an user via a third party API call. Is it okay to put the actual user data inside the message (if it fits)? Or should I only provide an ID in the message so the worker can retrieve the updated record from my local database?

You need to check what level of compliance is required for your infrastructure, to see what kind of data you want to put in the queue.
If there aren't any compliance restrictions, you are free to put any kind of data in your own infrastructure on AWS.

Related

Best strategy to archive specific records from RDS to a cheaper storage in AWS

I have the following requirements:
For every deleted record in RDS we need to archive it into somewhere cheaper on AWS.
Reduce storage cost
Not using Glacier
Context oriented (e.g. a file per table)
re-import is not a requirement
I'm not an experienced user with AWS, so I'm still a bit lost among the amount of options it has to offer and I'd like to know if you have more ideas to help me clear it out.
Initial thoughts:
The microservice that deletes the record, might send it to a broker (RabbitMQ for e.g.) and another microservice (let's call it archiver) will listen to it, write into a file, zip and send to S3. This approach has some technical challenges though: in order to make sense create big files, I need to wait the queue to growth a bit, wrap it into a stream and zip inside S3. The transaction control is very weak as well, since file writing and ack on messages are signal based i.e. I'll remove the messages from the broker just after the file is created.
Add a new column to the "archiveble" tables as "deleted (bool)" and run a separate job fetching only those records and saving them into S3. Discarded they don't want the new microservice with access to other's databases.
Following the same approach as in the first item, but instead of save into S3, save into a cheaper database. SimpleDB?
option 1, but instead of rabbitmq, write it to a kinesis firehose and direct that to an s3 location - it doesn't get much cheaper or easier than that.

Should I store failed login attempts in AWS Cognito or Dynamo DB?

I have a requirement to build a basic "3 failed login attempts and your account gets locked" functionality. The project uses AWS Cognito for Authentication, and the Cognito PreAuth and PostAuth triggers to run a Lambda function look like they will help here.
So the basic flow is to increment a counter in the PreAuth lambda, check it and block login there, or reset the counter in the PostAuth lambda (so successful logins dont end up locking the user out). Essentially it boils down to:
PreAuth Lambda
if failed-login-count > LIMIT:
block login
else:
increment failed-login-count
PostAuth Lambda
reset failed-login-count to zero
Now at the moment I am using a dedicated DynamoDB table to store the failed-login-count for a given user. This seems to work fine for now.
Then I figured it'd be neater to use a custom attribute in Cognito (using CognitoIdentityServiceProvider.adminUpdateUserAttributes) so I could throw away the DynamoDB table.
However reading https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-dg.pdf the section titled "Configuring User Pool Attributes" states:
Attributes are pieces of information that help you identify individual users, such as name, email, and phone number. Not all information about your users should be stored in attributes. For example, user data that changes frequently, such as usage statistics or game scores, should be kept in a separate data store, such as Amazon Cognito Sync or Amazon DynamoDB.
Given that the counter will change on every single login attempt, the docs would seem to indicate I shouldn't do this...
But can anyone tell me why? Or if there would be some negative consequence of doing so?
As far as I can see, Cognito billing is purely based on storage (i.e. number of users), and not operations, whereas Dynamo charges for read/write/storage.
Could it simply be AWS not wanting people to abuse Cognito as a storage mechanism? Or am I being daft?
We are dealing with similar problem and main reason why we have decided to store extra attributes in DB is that Cognito has quotas for all the actions and "AdminUpdateUserAttributes" is limited to 25 per second.
More information here:
https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html
So if you have a pool with 100k or more it can create a bottle neck if wanted to update a Cognito user records with every login etc.
Cognito UserAttributes are meant to store information about the users. This information can then be read from the client using the AWS Cognito SDK, or just by decoding the idToken on the client-side. Every custom attribute you add will be visible on the client-side.
Another downside of custom attributes is that:
You only have 25 values to set
They cannot be removed or changed once added to the user pool.
I have personally used custom attributes and the interface to manipulate them is not excellent. But that is just a personal thought.
If you want to store this information, and not depend on DynamoDB, you can use Amazon Cognito Sync. Besides the service, it offers a client with great features that you can incorporate to your app.
AWS DynamoDb appears to be your best option, it is commonly used for such use cases. Some of the benefits of using it:
You can store separate record for each login attempt with as much info as you want such as ip address, location, user-agent etc. You can also add datetime that can be used by pre-auth Lambda to query by time range for example failed attempt within last 30 minutes
You don't need to manage table because you can set TTL for DynamoDb record so that record will be deleted automatically after specified time.
You can also archive items in S3

How to subscribe to changes in DynamoDB

I don't know how to subscribe to changes in DynamoDB database. Let me show an example: User A sends a message (which is saved in the database) to User B and in the User B's app the message automatically appears.
I know this is possible with recently released AWS AppSync, but I couldn't integrate it with Ionic (which I am using). However, there must be an alternative since AWS AppSync was released only at the end of 2017/beginning of 2018.
I've also seen something called Streams in DynamoDB but not sure if that's what I need.
DynamoDB Streams is designed specifically for capturing/subscribing to table activity. You can set up a Lambda Function with your notification logic to process the stream and send notifications accordingly.

Query AWS SNS Endpoints by User Data

Simple question, but I suspect it doesn't have a simple or easy answer. Still, worth asking.
We're creating an implementation for push notifications using AWS with our Web Server running on EC2, sending messages to a queue on SQS, which is dealt with using Lambda, which is sent finally to SNS to be delivered to the iOS/Android apps.
The question I have is this: is there a way to query SNS endpoints based on the custom user data that you can provide on creation? The only way I see to do this so far is to list all the endpoints in a given platform application, and then search through that list for the user data I'm looking for... however, a more direct approach would be far better.
Why I want to do this is simple: if I could attach a User Identifier to these Device Endpoints, and query based on that, I could avoid completely having to save the ARN to our DynamoDB database. It would save a lot of implementation time and complexity.
Let me know what you guys think, even if what you think is that this idea is impractical and stupid, or if searching through all of them is the best way to go about this!
Cheers!
There isn't the ability to have a "where" clause in ListTopics. I see two possibilities:
Create a new SNS topic per user that has some identifiable id in it. So, for example, the ARN would be something like "arn:aws:sns:us-east-1:123456789:know-prefix-user-id". The obvious downside is that you have the potential for a boat load of SNS topics.
Use a service designed for this type of usage like PubNub. Disclaimer - I don't work for PubNub or own stock but have successfully used it in multiple projects. You'll be able to target one or many users this way.
According the the [AWS documentation][1] if you try and create a new Platform Endpoint with the same User Data you should get a response with an exception including the ARN associated with the existing PlatformEndpoint.
It's definitely not ideal, but it would be a round about way of querying the User Data Endpoint attributes via exception.
//Query CustomUserData by exception
CreatePlatformEndpointRequest cpeReq = new CreatePlatformEndpointRequest().withPlatformApplicationArn(applicationArn).withToken("dummyToken").withCustomUserData("username");
CreatePlatformEndpointResult cpeRes = client.createPlatformEndpoint(cpeReq);
You should get an exception with the ARN if an endpoint with the same withCustomUserData exists.
Then you just use that ARN and away you go.

Amazon S3 conditional put object

I have a system in which I get a lot of messages. Each message has a unique ID, but it can also receives updates during its lifetime. As the time between the message sending and handling can be very long (weeks), they are stored in S3. For each message only the last version is needed. My problem is that occasionally two messages of the same id arrive together, but they have two versions (older and newer).
Is there a way for S3 to have a conditional PutObject request where I can declare "put this object unless I have a newer version in S3"?
I need an atomic operation here
That's not the use-case for S3, which is eventually-consistent. Some ideas:
You could try to partition your messages - all messages that start with A-L go to one box, M-Z go to another box. Then each box locally checks that there are no duplicates.
Your best bet is probably some kind of database. Depending on your use case, you could use a regular SQL database, or maybe a simple RAM-only database like Redis. Write to multiple Redis DBs at once to avoid SPOF.
There is SWF which can make a unique processing queue for each item, but that would probably mean more HTTP requests than just checking in S3.
David's idea about turning on versioning is interesting. You could have a daemon that periodically trims off the old versions. When reading, you would have to do "read repair" where you search the versions looking for the newest object.
Couldn't this be solved by using tags, and using a Condition on that when using PutObject? See "Example 3: Allow a user to add object tags that include a specific tag key and value" here: https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html#tagging-and-policies