AWS Lambda - Where to Keep the configuration

My system architecture looks as follows:
SNS -> AWS Lambda -> DynamoDB
So, SNS publishes messages to which the AWS Lambda function is subscribed, and the Lambda function then pushes the data into DynamoDB. In the Lambda function I do some transformation of the messages. For the transformation, I have to fetch some rules from somewhere. These rules are basically the mapping between fields of the original messages and fields of the transformed messages.
E.g.
Say the original message looks like this:
{
  "id": 1,
  "name": "dsadas",
  "house": "dsads dsadsa",
  "speciality": "asjdsa"
}
and my mapping is something like:
{
  "id": "id",
  "house": "home",
  "speciality": "area"
}
So, basically I am saying that id should be mapped to id, house to home, and so on.
So, I want to keep this mapping somewhere like DynamoDB or some config service. I do not want to keep it directly in the AWS Lambda code, because there is a chance I might have to change it. But keeping it in DynamoDB will be costly in terms of latency, I think, because I will make a call for every message. So, can anyone suggest an AWS resource that is very fast, normally used for configuration, and suitable for keeping these configs?
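For illustration, the transformation itself can be a simple dictionary-driven rename; a minimal sketch (field names taken from the hypothetical example above):

```python
# Hypothetical mapping from original field names to transformed field names.
FIELD_MAPPING = {"id": "id", "house": "home", "speciality": "area"}

def transform(message: dict) -> dict:
    """Rename fields of an incoming message according to FIELD_MAPPING.

    Fields not listed in the mapping (e.g. "name") are dropped.
    """
    return {new_key: message[old_key]
            for old_key, new_key in FIELD_MAPPING.items()
            if old_key in message}

# transform({"id": 1, "name": "dsadas", "house": "dsads dsadsa", "speciality": "asjdsa"})
# -> {"id": 1, "home": "dsads dsadsa", "area": "asjdsa"}
```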

If you need full flexibility to modify the mapping without modifying the Lambda code, you have to rely on S3, DynamoDB, or another storage service to keep the mappings, which also adds latency and cost.
You can also keep a separate mapping.json (or .js) file and upload it along with your Lambda code. The drawback is that you need to redeploy the Lambda function for every mapping.json modification.
Another option is to use environment variables to hold only the attribute-mapping key/value pairs and construct the mapping template inside the Lambda function from these variables.
You can also base64-encode the mapping template into an environment variable and decode it inside the Lambda function.
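A minimal sketch of that last approach, assuming a hypothetical MAPPING_B64 environment variable holding the base64-encoded mapping JSON:

```python
import base64
import json
import os

# Hypothetical environment variable holding the base64-encoded mapping JSON,
# e.g. the base64 of {"id": "id", "house": "home", "speciality": "area"}.
# Decoded once per execution environment, so warm invocations reuse it.
FIELD_MAPPING = json.loads(base64.b64decode(os.environ["MAPPING_B64"]))

def lambda_handler(event, context):
    # SNS delivers the published message as a JSON string inside each record.
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        transformed = {new_key: message[old_key]
                       for old_key, new_key in FIELD_MAPPING.items()
                       if old_key in message}
        # ... write `transformed` to DynamoDB here ...
```

Changing the mapping then becomes a configuration update on the function rather than a code deployment.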

Related

How to ensure that S3 upload triggers a lambda function, but copying data within the same bucket does not trigger the lambda function anymore?

Required procedure:
1. Someone uploads an object to an S3 bucket.
2. This triggers a Lambda function that does some processing on the uploaded file(s).
3. Processed objects are then copied into a "processed" folder within the same bucket.
4. The copy operation in step 3 should never re-trigger the initial Lambda function itself.
I know that the general guidance is to use a different bucket for storing the processed objects in a situation like this (but this is not possible in this case).
So my approach was to set up the S3 trigger to listen only to the PUT/POST methods and exclude the COPY method. The Lambda function itself uses python-boto (S3_CLIENT.copy_object(..)). The approach seems to work (the Lambda function does not appear to be re-triggered by the copy operation).
However, I wanted to ask if this approach is really reliable - is it?
You can filter which events trigger the S3 notification.
In general, there are two ways to trigger Lambda from an S3 event: bucket notifications and EventBridge.
Notifications: https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-filtering.html
EB: https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
In your case, a quick search doesn't show me that you can set up a "negative" rule, i.e. "everything that doesn't have the processed prefix". But you can rework your bucket structure a bit, dump unprocessed items into an unprocessed/ prefix, and set up the filter based on that prefix only.
When setting up an S3 trigger for a Lambda function, you can define which kinds of S3 events should be listened to, as sketched below.
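A sketch of that configuration with boto3's put_bucket_notification_configuration, limiting the trigger to Put/Post object-created events so that copies do not fire the function (bucket name, prefix, and function ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Subscribe the Lambda function only to Put/Post object-created events;
# s3:ObjectCreated:Copy is deliberately left out, so copy_object() calls
# (e.g. copies into the "processed/" folder) do not re-trigger the function.
s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",  # placeholder bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:eu-central-1:123456789012:function:process-upload",
                "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:Post"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "unprocessed/"}
                        ]
                    }
                },
            }
        ]
    },
)
```

One thing to keep in mind: large uploads made via multipart upload raise s3:ObjectCreated:CompleteMultipartUpload rather than Put, so you may want to include that event type as well.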

When to use lambda with api gateway and dynamodb

I am new to AWS. I read a few articles that connect API Gateway directly to DynamoDB, and a few that use Lambda to route the requests. I want to understand in which cases we need Lambda and when we can avoid it.
It is super-convenient that API Gateway can insert directly into DynamoDB. Many times that is sufficient. There may be times when you need to do "a little more" than just insert into DynamoDB. For example, you may need to:
Append some additional data to items going into a DynamoDB table (like a postal code)
Reformat some of the data (like splitting a name field into first name and last name)
Execute some other action (like also inserting into an RDS database)
Processing a DynamoDB stream is another way in which you might handle those actions. However, DynamoDB stream processing happens asynchronously, so you cannot immediately report a failure back through the original API Gateway endpoint. For example, if there's a problem in the Lambda, you may want the API Gateway endpoint to reply with an appropriate HTTP status code so that the caller knows about the problem.
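A minimal sketch of the "a little more" case, assuming a Lambda proxy integration behind API Gateway and a hypothetical users table; it splits a name field and reports failures back as HTTP status codes:

```python
import json
import boto3

# Hypothetical table name; adjust to your own.
table = boto3.resource("dynamodb").Table("users")

def lambda_handler(event, context):
    try:
        body = json.loads(event["body"])
        # Reformat: split the incoming "name" field into first/last name.
        first, _, last = body["name"].partition(" ")
        table.put_item(Item={
            "id": body["id"],
            "first_name": first,
            "last_name": last,
        })
        # With a proxy integration, API Gateway maps this dict onto the HTTP response.
        return {"statusCode": 201, "body": json.dumps({"ok": True})}
    except (KeyError, ValueError) as err:
        # Report the failure synchronously, which DynamoDB stream processing could not do.
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
```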

AWS Lambda and loading of reference data

I'm building a Lambda function in AWS that needs to load reference data from a MySQL database. There is no real issue right now, as it is a very limited amount of data. But what is best practice here? Is there a way to keep this data within Lambda (or some other similar functionality) so that I don't need to request it on every invocation of the function? I'm using Node.js, though I don't think that affects this question.
Many thanks,
Marcus
There is no built-in persistent storage for Lambda. Any data that you would like to keep reliably between invocations (not counting temporary persistence due to the Lambda execution context) has to be stored outside of Lambda itself.
You already store it in MySQL, but other popular choices are:
SSM Parameter Store
S3
EFS
DynamoDB
ElastiCache if you really need fast access to the data.
Since you already get the data from MySQL, the only advantage of using SSM or DynamoDB would be that you can use the AWS API to access and update the values, or inspect/modify them in the AWS Console. You also wouldn't need to bundle a MySQL client with your function or establish any connections to the database.
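Regarding the temporary persistence mentioned above: you can cache the reference data in a variable declared outside the handler, so warm invocations reuse it instead of querying the database again. The question mentions Node.js; this is a Python sketch of the same pattern, with a hypothetical load_reference_data() query:

```python
# Module-scope cache: lives for the lifetime of the execution environment,
# so warm invocations skip the database round trip. It is not durable storage;
# a cold start (new environment) will reload it.
_reference_data = None

def load_reference_data():
    # Hypothetical: query MySQL (or SSM / S3 / DynamoDB) and return the rows.
    ...

def lambda_handler(event, context):
    global _reference_data
    if _reference_data is None:
        _reference_data = load_reference_data()
    # ... use _reference_data to process the event ...
```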

How to fetch only recently added aws s3 objects which were not accessed before?

I have multiple folders inside a bucket. Each folder is named with a unique GUID and will always contain a single file.
I need to fetch only those files which have never been read before. If I fetch all the objects at once and then do client-side filtering, it might introduce latency in the near future, as the number of new folders added each day could be in the hundreds.
Initially I tried to list objects by specifying StartAfter, but I soon realized it only works on an alphabetically sorted listing.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
I am using the AWS C# SDK. Can someone please give me some idea of the best approach?
Thanks
Amazon S3 does not maintain a concept of "objects that have not been accessed".
However, there is a different approach to process each object only once:
Create an Amazon S3 Event that will trigger when an object is created
The Event can then trigger:
An AWS Lambda function, or
A message to an Amazon SQS queue, or
A message to an Amazon SNS topic
You could therefore trigger your custom code via one of these methods, and you will never actually need to "search" for new objects.
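For example (the question uses the C# SDK; here is a Python sketch of the Lambda route), the event already tells you exactly which object was just created, so there is nothing to list or filter:

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Each record describes one newly created object: its bucket name and key.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read()
        # ... process the single file in this GUID-named folder ...
```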

Copy data from S3 and post process

There is a service that generates data in an S3 bucket that is used for warehouse querying. Data is inserted into S3 on a daily basis.
I am interested in copying that data from S3 into my service account to further classify it. The classification needs to happen in my AWS account because it is based on information present in my account and is specific to my team/service. The service generating the data in S3 is neither concerned with the classification nor has the data to make the classification decision.
Each S3 file consists of JSON objects (records). For every record, I need to look into a DynamoDB table. Based on whether data exists in that table, I need to add an additional attribute to the JSON object and store the result in another S3 bucket in my account.
The way I am considering doing this:
Trigger a scheduled CW event periodically to invoke a Lambda that will copy the files from the source S3 bucket into a bucket (let's say Bucket A) in my account.
Then, use another scheduled CW event to invoke a Lambda to read the records in the JSON, compare them with the DynamoDB table to determine the classification, and write the updated records to another bucket (let's say Bucket B).
I have a few questions regarding this:
Are there better alternatives for achieving this?
Would using aws s3 sync in the first Lambda be a good way to achieve this? My concerns revolve around the Lambdas timing out due to the large amount of data, especially the second Lambda, which needs to compare against DDB for every record.
Rather than setting up scheduled events, you can trigger the AWS Lambda functions in real-time.
Use Amazon S3 Events to trigger the Lambda function as soon as a file is created in the source bucket. The Lambda function can call CopyObject() to copy the object to Bucket-A for processing.
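A minimal sketch of that first function, assuming placeholder bucket names (the S3 event supplies the bucket and key of the newly created file):

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")
DEST_BUCKET = "bucket-a"  # placeholder name for Bucket-A

def lambda_handler(event, context):
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Copy the newly created object into Bucket-A for processing.
        s3.copy_object(
            Bucket=DEST_BUCKET,
            Key=key,
            CopySource={"Bucket": src_bucket, "Key": key},
        )
```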
Similarly, an Event on Bucket-A could then trigger another Lambda function to process the file. Some things to note:
Lambda functions run for a maximum of 15 minutes
You can increase the memory assigned to a Lambda function, which will also increase the amount of CPU assigned. So, this might speed up the function if it is taking longer than 15 minutes.
There is a maximum of 512 MB of temporary storage space (/tmp) made available to a Lambda function.
If the data is too big, or takes too long to process, then you will need to find a way to do it outside of AWS Lambda. For example, using Amazon EC2 instances.
If you can export the data from DynamoDB (perhaps on a regular basis), you might be able to use Amazon Athena to do all the processing, but that depends on what you're trying to do. If it is simple SELECT/JOIN queries, it might be suitable.
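A sketch of the second function on Bucket-A, again with hypothetical table/bucket names and a hypothetical lookup key, to show where the per-record DynamoDB check fits:

```python
import json
import urllib.parse
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("classification-rules")  # hypothetical table
OUTPUT_BUCKET = "bucket-b"  # placeholder name for Bucket-B

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Assumes one JSON object per line in the file; adjust to your format.
        lines = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode().splitlines()
        enriched = []
        for line in lines:
            item = json.loads(line)
            # Hypothetical lookup: flag the record based on whether its id exists in DynamoDB.
            item["classified"] = "Item" in table.get_item(Key={"id": item["id"]})
            enriched.append(item)
        s3.put_object(
            Bucket=OUTPUT_BUCKET,
            Key=key,
            Body="\n".join(json.dumps(i) for i in enriched).encode(),
        )
```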