Clearing out tmp folder from AWS Lambda - amazon-web-services

Hi, I have an AWS Lambda environment where the temp directory is now full and I get the following error:
java.lang.RuntimeException: java.nio.file.FileSystemException: /tmp/out3786803744412914689: No space left on device
It's serverless so I cannot simply log into the box and delete the contents of the directory.
Is there any way to fix this other than deploying a code change to clear out the temp folder on restart?

When an AWS Lambda function is triggered, a temporary container is created. The Lambda function is then run within the container.
If the Lambda function is triggered many times, it is possible that multiple containers could be created. For example, if the function takes 5 seconds to run and it is triggered 10 times per second, then around 50 containers might be provisioned.
Also, once a function has completed executing, the container might be kept around and used again if the Lambda function is triggered again.
So, there is no single 'server' that is used for the Lambda function. It might be many, or it might be one that is reused.
It is recommended that functions delete their temporary files from /tmp before ending execution. This way, the space will be available for the next execution.
Conversely, you might want to intentionally keep some data in the container for the next execution to act like a cache. For example, if the function downloads some reference data, it will not need to re-download the data the next time if the container is reused.
Bottom line: Program the function to clean up after itself.
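As a rough illustration of cleaning up after yourself, here is a minimal Python sketch (the question uses Java, but the idea carries over). The names clear_directory and handler and the file out.bin are made up for this example. The handler does its scratch work in a per-invocation directory and removes it in a finally block, so the space is freed even when the function raises.

```python
import os
import shutil
import tempfile


def clear_directory(path="/tmp"):
    """Delete everything inside `path` but keep the directory itself."""
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            shutil.rmtree(full_path, ignore_errors=True)
        else:
            try:
                os.remove(full_path)
            except OSError:
                pass  # file vanished or cannot be removed; skip it


def handler(event, context):
    # Hypothetical handler: all scratch files live in one directory under
    # the system temp dir (/tmp on Lambda), removed when the handler exits.
    work_dir = tempfile.mkdtemp(dir=tempfile.gettempdir())
    try:
        scratch = os.path.join(work_dir, "out.bin")
        with open(scratch, "wb") as fh:
            fh.write(b"intermediate data")
        return {"bytes_written": os.path.getsize(scratch)}
    finally:
        # Runs on success and on error, so a reused container starts clean.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Calling clear_directory() at the start of the handler is a blunter option that wipes whatever a previous invocation left behind, at the cost of also discarding anything you meant to cache.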

To add to @John Rotenstein's answer, our lambdas download a large ML model and move it to /tmp at the start of the invocation.
In Python we do something along the lines of:
    if not os.path.isdir(f'/tmp/{self.model}'):
        self.download_model()
For our use case this is better than clearing the /tmp dir at the end of the lambda run, as it reduces the number of calls and downloads required to/from S3, giving a performance boost for warm starts. It also means the lambdas will finish quicker as they don't need to clean up. The caveat here is that our model is static, so we don't need to worry about cache invalidation. If you need to load frequently changing data then of course clear the /tmp dir.

You could potentially build a Lambda shell into your Lambda function using (or emulating) the GitHub lambdash project.
That would allow you to invoke the Lambda with a specific set of parameters that would trigger the Lambda shell feature and execute whatever shell command you passed to it, e.g. "rm /tmp/*". I would personally only consider doing this for development environments, not for production.
That said, the 'proper' answer is @John Rotenstein's answer.

I believe you could delete the contents of the /tmp folder since it would be isolated to your instance, meaning, everything in the /tmp folder was created by your lambda.
You could also offload all this data to some type of storage if it's still relevant.
S3
DynamoDB
Redis

Related

How can I pass different command line arguments to a Task in aws every time I run it?

What I want to accomplish
I have a computationally heavy Python function that:
takes a file from an S3 bucket
transforms it
then saves a new file to the S3 bucket
I have done functions like these as lambda functions before, but this one is very computationally heavy and takes a long time, so I decided it would be better to package it in a container, put it on ECS and run it through Fargate. (Forgive my use of these terms if I am doing it incorrectly, I haven't wrapped my head around these concepts yet)
... Practically:
So I want to be able to run the Task that has that image of my docker container like I run my container locally, passing arguments to it every time I run it, and these arguments are different for every run:
Run 1:
docker run python-image input_file_path_1 output_file_path_1
Run 2:
docker run python-image input_file_path_2 output_file_path_2
From what I understand, I can change the task definition to include passing arguments to the script, which seems like they are "hardcoded" into the task definition, meaning they cannot be changed for every run.
Question?
So in essence, my question is how do I run this task, either from the cli or the user interface or a lambda function, where I would be able to pass arguments dynamically, every time I run the Task?
Thank you :)
If you look at the ECS RunTask API, you'll see an overrides parameter. Inside that overrides parameter you can override things such as the container environment variables, and the command. In your instance it sounds most appropriate to pass an override of the command each time you call RunTask.
You mentioned both the CLI and AWS Lambda. For the CLI you can see the documentation here. You didn't mention what programming language you are using for AWS Lambda, but all the AWS SDKs have an ECS RunTask call that you can look up in their respective documentation to see how to pass in an override.
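For example, with Python and boto3 a command override could look roughly like the sketch below. The cluster name, task definition name, container name and subnet ID are placeholders invented for this example, and the network settings will depend on your VPC. The ecs argument is expected to be a boto3.client("ecs"). The command list here replaces the container's default command, much like the arguments after the image name in docker run.

```python
def build_command_override(container_name, input_path, output_path):
    """Build the `overrides` argument for ECS RunTask."""
    return {
        "containerOverrides": [
            {
                # Must match the container name in the task definition.
                "name": container_name,
                # Same arguments you would pass after the image in `docker run`.
                "command": [input_path, output_path],
            }
        ]
    }


def run_transform(ecs, input_path, output_path):
    # `ecs` is expected to be boto3.client("ecs"); every name below is a
    # placeholder that must be replaced with your own resources.
    return ecs.run_task(
        cluster="my-cluster",
        taskDefinition="python-image-task",
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
        overrides=build_command_override("python-image", input_path, output_path),
    )


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   run_transform(boto3.client("ecs"), "input_file_path_1", "output_file_path_1")
```

Each call can pass a different pair of paths, so nothing is hardcoded in the task definition itself.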

Can I load code from file in a AWS Lambda?

I am thinking of creating 2 generic AWS Lambda functions, one as an "Invoker" to run the other Lambda function. The invoked Lambda function loads the code of the Lambda from a file based on the parameter that is passed to it.
Invoker: Calls the invoked Lambda with a specified parameter, e.g. ID
Invoked: Based on the ID, load the appropriate text file containing
the actual code to run
Can this be done?
The reason for this thinking is that I don't want to have to deploy 100 Lambda functions if I could just save the code in 100 text files in S3 bucket and load them as required.
The code is uploaded constantly by users and so I cannot include it in the lambda. And the code can be in all languages supported by AWS (.NET, NodeJs, Python, etc.)
For security, is there a way to maybe "containerize" the running code?
Any recommendation and ideas are greatly appreciated.
Thanking you in advance.
The very first thing I'd like to mention is that you should pay a lot of attention to the security aspects of your app, as you are going to execute code uploaded by users, meaning that they will potentially be able to access sensitive data.
My example is based on NodeJS, but I think something similar may be achieved using other runtimes, not sure. There are two main things you need to know:
The AWS Lambda execution environment provides you with a /tmp folder with a default capacity of 512 MB, and you are allowed to put there any resources needed for the current invocation.
NodeJS allows you to require modules dynamically at any place in the app.
So, basically, you may download the desired js file into the /tmp folder and then require it from your code. I am not going to write the real code now as it could be quite big, but here are some general steps just to make things clear:
Lambda receives fileId as a parameter in event.
Lambda searches S3 for the file named fileId and then downloads it to the /tmp folder as fileId.js
Now in the app you may require that file and consider it as a module:
const dynamicModule = require("/tmp/fileId.js");
Use the loaded module.
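The steps above are for NodeJS. A comparable approach in a Python Lambda might look like the sketch below, which is an analogue written for illustration and not the code from this answer. The function names, the bucket layout (one fileId.py object per ID) and the run entry point in the test are assumptions. The s3 argument is expected to be a boto3.client("s3"). The security warning above applies just as much here: the loaded code runs with the full permissions of the function.

```python
import importlib.util
import os


def load_module_from_file(path, module_name):
    """Import the Python source file at `path` as a module object."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return module


def fetch_and_load(s3, bucket, file_id, tmp_dir="/tmp"):
    # Reject IDs that could escape tmp_dir (e.g. "../x"); this is a basic
    # guard for the sketch, not a complete security measure.
    if not file_id.isidentifier():
        raise ValueError(f"invalid file id: {file_id!r}")
    local_path = os.path.join(tmp_dir, f"{file_id}.py")
    # `s3` is expected to be boto3.client("s3"); the key layout is assumed.
    s3.download_file(bucket, f"{file_id}.py", local_path)
    return load_module_from_file(local_path, file_id)
```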
You certainly won't be able to run Python code, or .NET code, in a Node Lambda. Can you load files and dynamically run the code? Probably. Should you? Probably not. Even if you trust the source of that code, you don't want it all running in the same function: 1) They would share the same permissions. That means that, at a minimum, they would all have access to the same S3 bucket where the code is stored. 2) They would all log to the same place. Good luck debugging.
We have several hundred lambda functions deployed in our account. I would never even entertain this idea as an alternative.

Alternative to AWS lambda when deployment package is greater than 250MB?

When I want to launch some code serverless, I use AWS Lambda. However, this time my deployment package is greater than 250MB.
So I can't deploy it on a Lambda...
I want to know what are the alternatives in this case?
I'd question your architecture. If you are running into problems with how AWS has designed a service (e.g. the 250 MB maximum package size for Lambda), it's likely you are using the service in a way it wasn't intended.
An anti-pattern I often see is people stuffing all their code into one function. Similar to how you'd deploy all your code to a single server. This is not really the use case for AWS lambda.
Does your function do one thing? If not, refactor it out into different functions doing different things. This may help remove dependencies when you split into multiple functions.
Another thing you can look at is whether you can write the function in a different language (another reason to keep functions small). I once had a Lambda function in Python that went over 250 MB. When I looked at solving the same problem with Node.js, my function size dropped to 20 MB.
One thing you can do is download the dependencies from an S3 bucket to the /tmp folder before running the main logic of the Lambda function, and then add that folder to the Python path. This gives you an extra 512 MB, although you need to take the download time into account for some of the Lambda invocations (typically the cold starts).
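A minimal sketch of that idea, assuming the dependencies are packaged as a single zip archive in S3. The function name, the /tmp/deps location and the bucket/key values are made up for this example, and s3 is expected to be a boto3.client("s3"). The directory check means a warm container skips the download.

```python
import importlib
import os
import sys
import zipfile


def install_dependencies(s3, bucket, key, target_dir="/tmp/deps"):
    """Download a zip of packages into target_dir and make it importable."""
    if not os.path.isdir(target_dir):
        # Cold start: fetch and unpack. Note that the archive and the
        # extracted files both occupy /tmp until the archive is removed.
        archive_path = target_dir + ".zip"
        s3.download_file(bucket, key, archive_path)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target_dir)
        os.remove(archive_path)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
        importlib.invalidate_caches()


# Usage (needs boto3 and AWS credentials), before importing the heavy packages:
#   import boto3
#   install_dependencies(boto3.client("s3"), "my-bucket", "deps.zip")
```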

Can an AWS Lambda modify a json file on itself?

I have an AWS Lambda function which has an array in a .json file. The thing is that I want to modify that .json file, but after the run the JSON remains exactly the same as before the run.
The logs I placed there make me think that it is actually being modified, but I wonder if a Lambda reverts to its original definition after each run.
tbh the information that I need to hold in that JSON is always going to be just a small number of settings, but they need to be easy to modify without making a deploy, and I'm trying to avoid using a DB or an S3 bucket.
Regards,
Daniel
You're not going to be able to do this. Lambda stores the deployment package (i.e. the .zip or .jar file you used to deploy) and uses that package for the next Lambda it spins up. This new Lambda may or may not be the one that just ran.
The easiest way will be to store this in an S3 bucket. Be aware though that just like in multi-threaded programming you may have many processes (Lambda instances) running at the same time so resource contention is something to be aware of.
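Something along these lines would do it in Python. This is a minimal sketch: the bucket and key names are placeholders, and s3 is expected to be a boto3.client("s3"). It does nothing about concurrent writers, so if two instances save at about the same time, the last write wins.

```python
import json


def load_settings(s3, bucket, key):
    """Read the JSON settings object from S3 and return it as Python data."""
    response = s3.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


def save_settings(s3, bucket, key, settings):
    """Overwrite the JSON settings object in S3."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(settings).encode("utf-8"),
        ContentType="application/json",
    )


# Usage (needs boto3 and AWS credentials); names are placeholders:
#   import boto3
#   s3 = boto3.client("s3")
#   settings = load_settings(s3, "my-bucket", "settings.json")
#   settings["items"].append("new value")
#   save_settings(s3, "my-bucket", "settings.json", settings)
```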
I want you to consider the following behaviour of Lambda functions:
Let's say you spin one Lambda up, and then you send a second message to the Lambda.
If your first Lambda finished before you sent the second message, the same Lambda instance will likely handle the message.
This is why you see that it changed the file: it's the same instance with the same files.
I would suggest loading the JSON into memory and not changing the file directly.
That will solve your problem.
AWS Lambda images are immutable. You need to deploy a new state file (the JSON with the array) or use some kind of storage for it.

Force Discard AWS Lambda Container

How can I manually force an AWS Lambda function's container to be discarded, using the AWS console or AWS CLI, for development and testing purposes?
If you redeploy the function it'll terminate all existing containers. It could be as simple as assigning the current date/time to the description of the Lambda function and redeploying. This will allow you to redeploy as many times as you need because something is unique and it will tear down all existing containers each time you do the deployment.
With that said, Lambda functions are supposed to be stateless. You should keep that in mind when you write your code (e.g. avoid using global variables, use random file names if creating something temporary, etc.). From the sounds of things, I think you might have an issue with your design if you require the Lambda container to be torn down.
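If you want to script the description trick, a sketch with Python and boto3 could look like this. The function name is a placeholder and lambda_client is expected to be a boto3.client("lambda"). Whether the old containers are actually discarded is the behaviour described above, not something this code can verify.

```python
from datetime import datetime, timezone


def force_new_containers(lambda_client, function_name):
    """Change the function's description so its configuration is updated."""
    stamp = datetime.now(timezone.utc).isoformat()
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        # Any value that differs on each call works; a timestamp is convenient.
        Description=f"redeployed {stamp}",
    )


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   force_new_containers(boto3.client("lambda"), "my-function")
```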
If you're using the UI, then a simple way to do this is to add or alter an environment variable on the function configuration page.
When you click "Save" the function will be reloaded.
Note: this won't work if you're using the versioned functions feature.