I'm using a container image with five 170 MB AI models.
When I invoke the function for the first time, all those models are loaded into memory for later inference.
Problem: most of the time it takes about 10-25 seconds per file to load, so a cold start takes about 2 minutes.
But sometimes each model loads in the expected 1-2 seconds and the cold start takes only about 10 seconds.
After a little investigation I found that it all comes down to reading/opening the file from disk into memory: a simple "read a byte file from disk into a variable" takes 10-20 seconds. Insane.
P.S. I'm using functions with 10,240 MB of RAM, which should give them the most processing power.
Is there any way I can avoid such long load times? Why does this happen?
UPDATE:
I'm using onnxruntime and Python to load the models.
All code and models are stored in the container and opened/loaded from there.
From an experiment: if I open any model with with open("model.onnx","rb") as f: cont = f.read() it takes 20 seconds to read the file. But when I then open the same file with model = onnxruntime.InferenceSession("model.onnx") it loads instantly. So I concluded that the problem is with opening/reading the file, not with ONNX.
This also happens when reading big files in a "ZIP"-type function, so it doesn't look like a container-specific problem.
TO REPRODUCE:
If you want to see how it behaves on your side:
Create a Lambda function.
Configure it with 10,240 MB of RAM and a 30-second timeout.
Upload the ZIP from my S3: https://alxbtest.s3.amazonaws.com/file-open-test.zip
Run a test event. It took me 16 seconds to open the file.
The ZIP contains "model.onnx" (168 MB) and "lambda_function.py" with this code:
import json, time

def lambda_handler(event, context):
    # Time how long reading the model file from disk into memory takes
    tt = time.time()
    with open("model.onnx", "rb") as f:
        cont = f.read()
    tt = time.time() - tt
    print(f"Open time: {tt:0.4f} s")
    return {
        'statusCode': 200,
        'body': json.dumps(f'Open time: {tt:0.4f} s')
    }
Lambda is not designed for heavy lifting. Its design intent is small, quick-firing, narrow-scope functions. You have two options.
Use an EC2 instance. This is more expensive, but it is a server and is designed for this kind of thing.
Maybe try Elastic File System - this is another service that can be tied to Lambda and provides a 'cross-invocation' file system that Lambdas can access almost as if it were internal, and which exists outside any single invocation. It lets you keep large objects 'pre-loaded' on the file system so the Lambda can access, manipulate, and do whatever with them without first loading them into its own storage (a rough sketch of this is shown after this answer).
I noticed you also said AI models. There are services specifically for machine learning, such as SageMaker, that you may want to look into.
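As a rough illustration of the EFS idea (not the poster's code): the sketch below assumes the function has an EFS access point mounted at /mnt/models with a model.onnx file already copied there; both names are placeholders, and the VPC/EFS configuration is omitted.

import os
import onnxruntime

# Hypothetical EFS access point mounted into the function at /mnt/models
MODEL_DIR = os.environ.get("MODEL_DIR", "/mnt/models")

# Build the session once at module import time so warm invocations reuse it
session = onnxruntime.InferenceSession(os.path.join(MODEL_DIR, "model.onnx"))

def lambda_handler(event, context):
    # event["inputs"] is assumed to already match the model's input names/shapes
    outputs = session.run(None, event["inputs"])
    return {"statusCode": 200, "body": f"{len(outputs)} outputs"}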
SHORT ANSWER: you can't control the read/load speed of AWS Lambda.
First of all, this problem is about the read/write speed of the particular Lambda instance. It looks like on the first invocation AWS looks for a free instance to place the Lambda function on, and those instances have different I/O speeds.
Most of the time read speed is about 6-9 MB/s, which is insanely slow for opening and working with big files.
Sometimes you get lucky and land on an instance with 50-80 MB/s reads, but that's pretty rare. Don't count on it.
So if you want faster speed, you have to pay more:
Use Elastic File System, as mentioned by lynkfox
Use S3
BONUS:
If you're not tied to AWS, I've found Google Cloud Run much more suitable for my needs.
It uses Docker containers like AWS Lambda, is also billed per 100 ms, and can scale automatically.
Read speed is pretty stable, at about 75 MB/s.
You can select RAM and vCPU separately, which can lower costs.
You can load several big files simultaneously with multiprocessing, which makes the cold start much faster (in Lambda the multiprocessing load time was the sum of all loaded files, so it didn't help me there); see the sketch below.
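For illustration, a minimal sketch of the parallel-load idea; the file names are placeholders, and whether it actually helps depends on the platform's I/O behaviour, as noted above.

import time
from multiprocessing import Pool

# Hypothetical model files baked into the container image
MODEL_FILES = ["model_a.onnx", "model_b.onnx", "model_c.onnx"]

def read_file(path):
    # Read the whole file into memory and report how long it took
    start = time.time()
    with open(path, "rb") as f:
        data = f.read()
    print(f"{path}: {len(data)} bytes in {time.time() - start:0.2f} s")
    return len(data)

if __name__ == "__main__":
    start = time.time()
    with Pool(len(MODEL_FILES)) as pool:
        pool.map(read_file, MODEL_FILES)
    print(f"Total load time: {time.time() - start:0.2f} s")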
The Init phase ends when the runtime and all extensions signal that they are ready by sending a Next API request. The Init phase is limited to 10 seconds. If all three tasks do not complete within 10 seconds, Lambda retries the Init phase at the time of the first function invocation.
Refer: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
Check what the model load time is on an EC2 machine (or a CPU-only local machine).
If it is close to 10 seconds, there is a high chance the model is being loaded again. The retried init generally happens quickly because Lambda already has some of the content ready and the state loaded.
To make the read faster, others have suggested trying EFS. In addition, try EFS in Elastic Throughput mode. A minimal timing sketch follows.
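A minimal sketch for timing the two load paths on EC2 or locally (the model file name is a placeholder):

import time
import onnxruntime

MODEL_PATH = "model.onnx"  # placeholder model file

# Time a raw read of the file into memory
start = time.time()
with open(MODEL_PATH, "rb") as f:
    raw = f.read()
print(f"Raw read: {time.time() - start:0.2f} s ({len(raw)} bytes)")

# Time building an ONNX Runtime inference session from the same file
start = time.time()
session = onnxruntime.InferenceSession(MODEL_PATH)
print(f"InferenceSession load: {time.time() - start:0.2f} s")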
Related
I'm using Lambda with EFS and seeing very high latencies, making the whole solution unusable.
Writing small files (1 KB) starts at around 20 ms per file and adds up significantly when I write from 100 concurrent threads.
When I batch into larger files I'm still getting 50+ ms latencies.
Reading the docs (https://docs.aws.amazon.com/efs/latest/ug/performance.html), they promise sub-millisecond latencies.
Am I doing something wrong?
The code I'm using to write is a simple Python snippet:
with open(filepath, 'wb') as f:
    f.write(fact_to_write["data"])
I have a Flink job (Scala) that basically reads from a Kafka topic (1.0), aggregates data (a 1-minute event-time tumbling window, using a fold function, which I know is deprecated but is easier to implement than an aggregate function), and writes the result to 2 different Kafka topics.
The question is: when I'm using the FS state backend, everything runs smoothly and checkpoints take 1-2 seconds, with an average state size of 200 MB - that is, until the state size increases (while closing a gap, for example).
I figured I would try RocksDB (over HDFS) for checkpoints - but the throughput is SIGNIFICANTLY lower than with the FS state backend. As I understand it, Flink does not need to serialize/deserialize on every state access when using the FS state backend, because the state is kept in memory (on the heap), whereas RocksDB DOES, and I guess that is what accounts for the slowdown (and the backpressure, and checkpoints taking MUCH longer, sometimes timing out after 10 minutes).
Still, there are times when the state cannot fit in memory, and I am basically trying to figure out how to make the RocksDB state backend perform "better".
Is it because of the deprecated fold function? Do I need to fine-tune some parameters that are not easily found in the documentation? Any tips?
Each state backend holds the working state somewhere, and then durably persists its checkpoints in a distributed filesystem. The RocksDB state backend holds its working state on disk, and this can be a local disk, hopefully faster than HDFS.
Try setting state.backend.rocksdb.localdir (see https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/state/state_backends.html#rocksdb-state-backend-config-options) to somewhere on the fastest local filesystem on each taskmanager.
Turning on incremental checkpointing could also make a large difference.
Also see Tuning RocksDB.
I have a large file that I want to process using Lambda functions in AWS. Since I cannot control the size of the file, I came up with the idea of distributing the processing across multiple Lambda function calls to avoid timeouts. Here's how it works:
I dedicated a bucket to accept the new input files to be processed.
I set a trigger on the bucket that fires each time a new file is uploaded (let's call the function uploadHandler).
uploadHandler measures the size of the file and splits it into equal chunks.
Each chunk is sent to a processor Lambda function to be processed.
Notes:
The uploadHandler does not read the file content.
The data sent to each processor is just a { start: #, end: # } byte range.
Multiple instances of the processor are called in parallel.
Each processor call reads its own chunk of the file individually and generates the output for it.
So far so good. The problem is: how do I consolidate the output of all the processor calls into one output? Does anyone have any suggestions? And also, how do I know when all the processors have finished executing?
I recently had a similar problem. I solved it using AWS Lambda and Step Functions, following this solution: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-create-iterate-pattern-section.html
In this specific example the execution doesn't happen in parallel but sequentially. However, when the state machine finishes executing you have the guarantee that the file was fully processed correctly. I don't know if that's exactly what you are looking for.
Option 1:
After splitting the file, have the uploadHandler function invoke the processor functions synchronously.
Make the calls concurrent, so that you trigger all processors at once. Lambda functions have only one vCPU (or 2 vCPUs if RAM > 1,800 MB), but the invocations are I/O-bound, so a single core is enough.
The uploadHandler waits for all processors to respond, and then you can assemble all the responses (see the sketch after this option).
Pros: simpler to implement, no storage;
Cons: no visibility on what's going on until everything is finished;
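A minimal sketch of Option 1, assuming a processor function named "processor" that accepts a { start, end } payload and returns its partial result (the function name and chunk layout are placeholders):

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def invoke_processor(chunk):
    # Synchronous (RequestResponse) invocation of the hypothetical "processor" function
    response = lambda_client.invoke(
        FunctionName="processor",
        InvocationType="RequestResponse",
        Payload=json.dumps(chunk),
    )
    return json.loads(response["Payload"].read())

def process_all(chunks):
    # The calls are I/O-bound, so a thread pool on a single vCPU is enough
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        results = list(pool.map(invoke_processor, chunks))
    # Results come back in chunk order, ready to be assembled
    return results

# Example: process_all([{"start": 0, "end": 999}, {"start": 1000, "end": 1999}])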
Option 2:
Persist a processingJob in a DB (RDS, DynamoDB, whatever). The uploadHandler would create the job and save the number of parts into which the file was broken up. Save the job ID with each file part.
Each processor gets one part (with the job ID), processes it, and then stores the result of the processing in the DB.
Make each processor check whether it's the last one delivering its results; if so, have it trigger an assembler function to collect all the results and do whatever you need (a sketch of this check is shown after this option).
Pros: more visibility, as you can query your storage DB at any time to check which parts were processed and which are pending; you could store all sorts of metadata from the processor for detailed analysis, if needed;
Cons: requires a storage service and slightly more complex handling of your Lambdas;
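A rough sketch of the "last processor triggers the assembler" check from Option 2, assuming a DynamoDB table named processing_jobs with a job_id key, a total_parts attribute set by uploadHandler, and an atomically incremented completed_parts counter (the table, attribute, and function names are all placeholders):

import json

import boto3

dynamodb = boto3.resource("dynamodb")
lambda_client = boto3.client("lambda")
jobs_table = dynamodb.Table("processing_jobs")  # placeholder table name

def record_part_done(job_id):
    # Atomically increment the completed-parts counter and read back the new totals
    response = jobs_table.update_item(
        Key={"job_id": job_id},
        UpdateExpression="ADD completed_parts :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="ALL_NEW",
    )
    item = response["Attributes"]

    # If this call completed the last part, trigger the assembler asynchronously
    if item["completed_parts"] >= item["total_parts"]:
        lambda_client.invoke(
            FunctionName="assembler",  # placeholder assembler function
            InvocationType="Event",
            Payload=json.dumps({"job_id": job_id}),
        )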
I'm trying to figure out an architecture for processing rather big files (maybe a few hundred MB) on serverless AWS. This is what I've got so far:
API Gateway -> S3 -> Lambda function -> SNS -> Lambda function
In this scenario, the text file is uploaded to S3 through API Gateway. Then a Lambda function is called based on the event generated by S3. This Lambda function opens the text file and reads it line by line, generating tasks to be done as messages in an SNS topic. Each message then invokes a separate Lambda function to process the task.
My only concern is the first Lambda function call. What if it times out? How can I make sure that it's not a point of failure?
You can ask S3 to only return a particular byte range of a given object, using the Range header: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
for example:
Range: bytes=0-9
would return only the first 10 bytes of the S3 object.
To read a file line by line, you would have to decide on a specific chunk size (1 MB, for example), read one chunk of the file at a time, and split the chunk by line (by looking for newline characters). Once the whole chunk has been read, you can re-invoke the Lambda and pass the chunk pointer as a parameter. The new invocation then reads the file starting from that chunk pointer. A minimal sketch follows.
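Here is a minimal sketch of reading one chunk with a ranged GET and splitting it into lines (the bucket, key, and chunk size are placeholders):

import boto3

s3 = boto3.client("s3")

CHUNK_SIZE = 1024 * 1024  # 1 MB, chosen arbitrarily for illustration

def read_chunk(bucket, key, offset):
    # Ask S3 for only this byte range of the object
    byte_range = f"bytes={offset}-{offset + CHUNK_SIZE - 1}"
    response = s3.get_object(Bucket=bucket, Key=key, Range=byte_range)
    chunk = response["Body"].read()

    # Keep only complete lines; carry the trailing partial line into the next chunk
    lines = chunk.split(b"\n")
    complete_lines, remainder = lines[:-1], lines[-1]
    next_offset = offset + len(chunk) - len(remainder)
    return complete_lines, next_offset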
The first thing to know is that the CPU available to a Lambda function is proportional to its configured RAM size. So doubling the RAM gets you double the CPU.
If scaling up the Lambda doesn't do it... then some back-of-a-napkin ideas:
One workflow might be: if the size of the CSV is less than X (to be determined), then process it in a single Lambda. If the size is more than X, then invoke N sub-Lambdas, pointing each at 1/Nth of the input file (assuming you can split the workload like this). The sub-Lambdas use the byte-range (GET Range) feature of S3. This is a kind of map/reduce pattern.
Or maybe use Step Functions. Have a first Lambda invocation begin to process the file, keeping track of the time remaining (available from the context object), and respond to Step Functions indicating how far it got. Step Functions then invokes a subsequent Lambda to process the next part of the file, and so on, until complete.
Or use EC2, containers, or even EMR (obviously not serverless).
Also note that Lambda functions have limited disk space (500 MB), so if you need to download the file to disk in order to process it, it will need to be under 500 MB, on top of any other disk space you might need. Optionally, you can work around this disk-space limitation by simply reading the file into memory (resizing the Lambda function up to 3 GB of RAM as needed), as in the sketch below.
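A minimal sketch of the read-into-memory workaround, assuming the event carries the bucket and key of the uploaded file (both are placeholders):

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Read the whole object into memory instead of writing it to /tmp,
    # avoiding the limited ephemeral disk space
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    body = obj["Body"].read()

    # Process in memory, e.g. count the lines of a text file
    line_count = body.count(b"\n")
    return {"statusCode": 200, "body": str(line_count)}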
You can use AWS Batch instead of Lambda for the heavy stuff.
Create a Docker container with your code, push it to ECR, then create a job definition to run it.
Use Lambda to submit this job with the input file as a parameter (see the sketch after this answer).
Option 1: create a dependent job for the 2nd-stage processing, which will launch automatically when the first job succeeds.
Option 2: use Step Functions to orchestrate the whole scenario (note that the integration between Step Functions and Batch is not ideal...).
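A minimal sketch of the submitting Lambda, assuming an S3 trigger and an existing job queue and job definition (the names file-processing-queue and file-processor are placeholders):

import boto3

batch = boto3.client("batch")

def lambda_handler(event, context):
    # Triggered by S3: pass the uploaded object's location to the Batch job
    record = event["Records"][0]["s3"]
    response = batch.submit_job(
        jobName="process-input-file",
        jobQueue="file-processing-queue",  # placeholder queue name
        jobDefinition="file-processor",    # placeholder job definition
        containerOverrides={
            "environment": [
                {"name": "INPUT_BUCKET", "value": record["bucket"]["name"]},
                {"name": "INPUT_KEY", "value": record["object"]["key"]},
            ]
        },
    )
    return {"jobId": response["jobId"]}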
I am running a single-instance worker on AWS Beanstalk. It is a single-container Docker that runs some processes once every business day. Mostly, the processes sync a large number of small files from S3 and analyze those.
The setup runs fine for about a week, and then CPU load starts growing linearly in time, as in this screenshot.
The CPU load stays at a considerable level, slowing down my scheduled processes. At the same time, the resource tracking I run inside the container (privileged Docker mode enabled to allow it):
echo "%CPU %MEM ARGS $(date)" && ps -e -o pcpu,pmem,args --sort=pcpu | cut -d" " -f1-5 | tail
shows nearly no CPU load (which changes only during the time that my daily process runs, seemingly accurately reflecting system load at those times).
What am I missing here in terms of the origin of this "background" system load? I'm wondering if anybody has seen similar behavior, and/or could suggest additional diagnostics to run from inside the container.
So far I have been re-starting the setup every week to remove the "background" load, but that is sub-optimal since the first run after each restart has to collect over 1 million small files from S3 (while subsequent daily runs add only a few thousand files per day).
The profile is a bit odd, especially the linear growth. It's almost as if something is accumulating and taking progressively longer to process.
I don't have enough information to point at a specific issue. A few things that you could check:
Are you collecting files anywhere, whether intentionally or in a cache or transfer folder? It could be that the system is running background processes (antivirus, indexing, defrag, dedupe, etc.) and the "large number of small files" is accumulating into something that needs to be paged or handled inefficiently.
Does any part of your process use a weekly naming convention or housekeeping process? Might you be getting conflicts, or accumulating workload, as the week rolls over - i.e. the 2nd week is actually processing both the 1st and 2nd week's data but never completing, so that the next day it is progressively worse? I saw something similar where an inappropriate bubble-sort process was never completing (it never reached its completion condition because the slow but steady inflow of data kept resetting it), and the demand from the process got progressively higher as the array grew.
Do you have any logging on a weekly rollover cycle?
Are there any other key performance metrics following the trend (network, disk I/O, memory, paging, etc.)?
Do consider whether it is a false positive: if CPU is genuinely high, there should be other metrics mirroring the CPU behaviour - cache use, disk I/O, S3 transfer statistics/logging.
RL