Can an AWS Lambda modify a json file on itself? - amazon-web-services

I have an AWS Lambda function. which have an array on a .json file. now the thing is that I want to modify that .json but after the run, the json remains exactly the same than before the run.
The logs I place there make me think that is actually being modified, but, I wonder if a lambda goes back to its definition before the run.
tbh the information that I need to hold in that json is going to be always just a small amount of settings but those are going to be easy to modify without making a deploy and im trying to avoid using a db or an s3 bucket.
Regards,
Daniel

You're not going to be able to do this. Lambda stores the deployment package (i.e. the .zip or .jar file you used to deploy) and uses that package for the next Lambda it spins up. This new Lambda may or may not be the one that just ran.
The easiest way will be to store this in an S3 bucket. Be aware though that just like in multi-threaded programming you may have many processes (Lambda instances) running at the same time so resource contention is something to be aware of.

I want you to consider the following behaviour of Lambda function:
Let's say you spin one lambda up ,
and then you send a second message to lambda .
If you first lambda finished before you send the second message
The same lambda will run the message .
So this is why you see it changed the file , it's on the same instance with same files .
I would suggest loading json into memory ,
and not change the file directly .
That will solve your problem.

AWS Lambda images are immutable. You need to deploy new state file (json with array) or use some kind storage for it.

Related

How to deploy AWS Lambda using Codepipeline without rebuilding it if there are no function code changes?

so we have a pipeline that zips code functions for our Lambda, uploads it to S3 and builds
the every lambda we have again with new version of zipped codes.
Now, the problem is, every single Lambda is being Built every pipeline run. even if there are no changes to other lambda code function. ex. (only 1 of 10 lambda has code change)
What would be the best approach or checking that we need to add in our pipeline in order to build the only Lambda that has code change? open for any suggestions even creating new pipeline and breaking this lambdas into pieces
I think the best way is to add a stage before the zipfile to see with files changes in the code in the last merge.
Simply take those names and check which lambda was affected.
Then it will pass the list of lambdas need to redeploy.
What we do at our company is have a single pipeline per lambda / repo. We have a few mono repos that when they deploy they deploy all the lambdas in that repo at once but still through a single pipeline. If you concerned about cost in the pipeline sticking around you could always delete them and then have another job to recreate them when you need to feploy a new change.
We've got everything done through cloud formation scripts so it's all simple scripts running here and there to create pipelines.
Curious what is the reason to have one pipeline deploy all 10 lambdas?

Can I load code from file in a AWS Lambda?

I am thinking of creating 2 generic AWS Lambda functions, one as an "Invoker" to run the other Lambda function. The invoked Lambda function loads the code of the Lambda from a file based on the parameter that is passed to it.
Invoker: Calls the invoked Lambda with a specified parameter, e.g. ID
Invoked: Based on the ID, load the appropriate text file containing
the actual code to run
Can this be done?
The reason for this thinking is that I don't want to have to deploy 100 Lambda functions if I could just save the code in 100 text files in S3 bucket and load them as required.
The code is uploaded constantly by users and so I cannot include it in the lambda. And the code can be in all languages supported by AWS (.NET, NodeJs, Python, etc.)
For security, is there a way to maybe "containerized" running the code?
Any recommendation and ideas are greatly appreciated.
Thanking you in advance.
The very first I'd like to mention is that you should pay a lot of attention to the security aspects of your app as you are going to execute code uploaded by users, meaning that they will potentially be able to access sensitive data.
My example is based on NodeJS, but I think something similar may be achieved using other runtimes, not sure. There are main two things you need to know:
AWS Lambda execution environment provides you with the /tmp folder with capacity of 512 MB and you are allowed to put there any necessary resources needed for the current particular invocation.
NodeJS allows you to require modules dynamically at any place in the app.
So, basically, you may download the desired js file into the /tmp folder and then require it from your code. I am not going to write the real code now as it could be quite big, but here is some general steps just to make things clear:
Lambda receives fileId as a parameter in event.
Lambda searches S3 for the file named fileId and then downloads it to the /tmp folder as fileId.js
Now in the app you may require that file and consider it as a module:
const dynamicModule = require("/tmp/fileId.js");
Use the the module loaded
You certainly won't be able to run Python code, or .Net code, in a Node lambda. Can you load files and dynamically run the code? Probably. Should you? Probably not. Even if trust the source of that code you don't want them all running in the same function. 1) they would share the same permissions. That means that, at a minimum, they would all have access to the same S3 bucket where the code is stored. 2) they would all log to the same place. Good luck debugging.
We have several hundred lambda functions deployed in our account. I would never even entertain this idea as an alternative.

Alternative to AWS lambda when deployment package is greater than 250MB?

When I want to launch some code serverless, I use AWS Lambda. However, this time my deployment package is greater than 250MB.
So I can't deploy it on a Lambda...
I want to know what are the alternatives in this case?
I'd question your architecture. If you are running into problems with how AWS has designed a service (i.e. lambda 250mb max size) its likely you are using the service in a way it wasn't intended.
An anti-pattern I often see is people stuffing all their code into one function. Similar to how you'd deploy all your code to a single server. This is not really the use case for AWS lambda.
Does your function do one thing? If not, refactor it out into different functions doing different things. This may help remove dependencies when you split into multiple functions.
Another thing you can look at is can you code the function in a different language (another reason to keep functions small). I once had a lambda function in python that went over 250mb. When I looked at solving the same problem with node.js, my function size dropped to 20mb.
One thing you can do is before run the lambda function you can download the dependencies to /tmp folder from s3 bucket and then add it to python path, it would give you extra 512MB, although you need to take into consideration the download time for some of the lambda invocations

Clearing out tmp folder from AWS Lambda

Hi I have an AWS Lambda environment where the temp directory is now full and I get the following:
java.lang.RuntimeException: java.nio.file.FileSystemException: /tmp/out3786803744412914689: No space left on device
It's serverless so I cannot simply log into the box and delete the contents of the directory.
Is there any way to fix this other than deploying a code change to clear out the temp folder on restart?
When an AWS Lambda function is triggered, a temporary container is created. The Lambda function is then run within the container.
If the Lambda function is triggered many times, it is possible that multiple containers could be created. For example, if the function takes 5 seconds to run and 10 functions are triggered in one second, then 50 containers might be provisioned.
Also, once a function has completed executing, the container might be kept around and used again if the Lambda function is triggered again.
So, there is no single 'server' that is used for the Lambda function. It might be many, or it might be one that is reused.
It is recommended that functions delete their temporary files from /tmp before ending execution. This way, the space will be available for the next execution.
Conversely, you might want to intentionally keep some data in the container for the next execution to act like a cache. For example, if the function downloads some reference data, it will not need to re-download the data the next time if the container is reused.
Bottom line: Program the function to clean-up after itself.
To Add to #John Rotenstein's answer, our lambdas download a large ML model and move to /tmp at the start of the invocation.
In python we do something along the lines of:
if not os.path.isdir(f'/tmp/{self.model}'):
self.download_model()
For our use case this is better than clearing the /tmp dir at the end of the lambda run as it reduces the number of calls and downloads required to/from s3, giving a performance boost for warm starts. It also means the lambdas will finish quicker as they don't need to cleanup. The caveat here is our model is static so we don't need to worry about cache invalidation. If you need to load frequently changing data then of course clear the /tmp dir.
You could potentially build a Lambda shell into your Lambda function using (or emulating) the github lambdash project.
That would allow you to invoke the Lambda with a specific set of parameters that would trigger the Lambda shell feature and execute whatever shell command you passed to it, e.g. "rm /tmp/*". I would personally only consider doing this for development environments, not for production.
That said, the 'proper' answer is #John Rotenstein's answer.
I believe you could delete the contents of the /tmp folder since it would be isolated to your instance, meaning, everything in the /tmp folder was created by your lambda.
You could also offload all this data to some type of storage if it's still relevant.
S3
Dynamo
Redis

Invoke AWS Lambda with AWS X-Ray locally

Is there a way to invoke lambda with X-Ray by using sam invoke local?
According to the idea which PaulMaddox mentioned,
I have tried the step below, and I don't know whether I misunderstood :
Run a X-Ray Daemon locally (0.0.0.0:2000) by following the document
In my lambda's template.yaml set the ENV AWS_XRAY_DAEMON_ADDRESS: 0.0.0.0:2000
Invoke the function, but still got error Missing AWS Lambda trace data for X-Ray. Expected _X_AMZN_TRACE_ID to be set
Here is part of template.yaml setting, I used the environment variable to set AWS_XRAY_DAEMON_ADDRESS
It would be nice if you could provide more information.
I'm not too familiar with SAM, but...
You need to set the _X_AMZN_TRACE_ID environment variable. Currently, the X-Ray Node SDK works by cross communicating between the Lambda runtime start-up code and the user code.
Lambda starts the segment in their start-up code, records information such as timing and exceptions, and sends the segment to the X-Ray service. Then, it forwards the trace ID/parent ID/sampling decision to the user code via setting the _X_AMZN_TRACE_ID environment variable. This allows the SDK to create a separate subsegment, inferring a connection to the original segment, which gets "weaved" into the original on the service end without actually being directly related. Both are sent out of band, asynchronously from each other.
The _X_AMZN_TRACE_ID variable fits the same format as the tracing header here as discussed here: https://docs.aws.amazon.com/xray/latest/devguide/xray-concepts.html#xray-concepts-tracingheader
If you want to send traces through the Daemon to the X-Ray service, you'll need to figure out how to get SAM to construct this Lambda segment initially and set the _X_AMZN_TRACE_ID prior to importing the SDK.
Since the SDK auto-detects the presence of Lambda (which as I understand, SAM mimics), you'll have to set the _X_AMZN_TRACE_ID variable before importing in the SDK. Which, is kind of a catch-22, because you need to import the SDK (in the non-Lambda mode) to construct the Lambda segment before you can populate the _X_AMZN_TRACE_ID.
The problem lies here: https://github.com/aws/aws-xray-sdk-node/blob/master/packages/core/lib/aws-xray.js#L361
If you flip the SDK into LOG_ERROR mode (ignoring the Lambda errors), create and send the Lambda segment (just manually create a segment, load the generated ID/Parent ID/Sampling into _X_AMZN_TRACE_ID then close the segment) and then clear cache/re-import the SDK after, then that should work.
Otherwise, I suspect there may be some work on the SAM end to have this built in. But, hopefully this work around works.