Currently, I have been downloading my ONNX models from S3 like so:
s3 = boto3.client('s3')
if not os.path.isfile('/tmp/model.onnx'):
    s3.download_file('test', 'models/model.onnx', '/tmp/model.onnx')
inference_session = onnxruntime.InferenceSession('/tmp/model.onnx')
However, I want to avoid the latency of having to download this model, so I am looking to package the model in an AWS Lambda layer. However, I'm having trouble doing so.
I tried creating a ZIP file structured like so:
- python
- model.onnx
and loading it like inference_session = onnxruntime.InferenceSession('/opt/model.onnx') but I got a "File doesn't exist" error. What should I do to make sure that the model can be found in the /opt/ directory?
Note: My AWS Lambda function is running on Python 3.6.
Your file should be in /opt/python/model.onnx. Therefore, you should be able to use the following to get it:
inference_session = onnxruntime.InferenceSession('/opt/python/model.onnx')
If you don't want your file to be inside the python folder, then don't build the layer with that folder: just put model.onnx in the ZIP's root rather than inside the python folder.
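If it is unclear where the layer contents ended up, here is a hedged sketch of a loader that checks both possible layer locations before falling back to the original S3 download (the bucket, key, and paths below are just the ones from the question, treated as placeholders):

import os

import boto3
import onnxruntime

# Layers zipped with a top-level "python" folder land in /opt/python;
# layers zipped with the model at the root land directly in /opt.
CANDIDATE_PATHS = ['/opt/python/model.onnx', '/opt/model.onnx']
FALLBACK_PATH = '/tmp/model.onnx'

def load_model():
    for path in CANDIDATE_PATHS:
        if os.path.isfile(path):
            return onnxruntime.InferenceSession(path)
    # Fall back to downloading from S3 (placeholder bucket/key).
    if not os.path.isfile(FALLBACK_PATH):
        boto3.client('s3').download_file('test', 'models/model.onnx', FALLBACK_PATH)
    return onnxruntime.InferenceSession(FALLBACK_PATH)

inference_session = load_model()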
I have a folder with several files corresponding to checkpoints of an RL model trained using RLlib. I want to analyze the checkpoints in a way that requires passing a certain folder as an argument, e.g., analysis_function(folder_path). I have to run this on a SageMaker notebook. I have seen that there are some questions on SO about how to retrieve files from S3, such as this one. However, how can I retrieve a whole folder?
To read the whole folder, you just have to list all the files under the prefix and loop through them. You could either do something like this -
import boto3

s3_res = boto3.resource("s3")
my_bucket = s3_res.Bucket("<your-bucket-name>")
for obj in my_bucket.objects.filter(Prefix="<your-prefix>"):
    # your code goes here, e.g.
    print(obj.key)
Or, simply download the files to your local storage and loop over them as you see fit (see the aws s3 cp reference) -
!aws s3 cp s3://bucket/prefix/ . --recursive
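If you need the folder on disk from Python rather than via the CLI, here is a minimal sketch of downloading everything under a prefix to local storage (the bucket name, prefix, and local directory below are placeholders):

import os

import boto3

s3_res = boto3.resource("s3")
bucket = s3_res.Bucket("<your-bucket-name>")
prefix = "<your-prefix>/"
local_dir = "/tmp/checkpoints"

for obj in bucket.objects.filter(Prefix=prefix):
    # Skip "directory" placeholder keys and mirror the key layout locally.
    if obj.key.endswith("/"):
        continue
    target = os.path.join(local_dir, os.path.relpath(obj.key, prefix))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    bucket.download_file(obj.key, target)

# analysis_function(local_dir)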
I have an apache-beam==2.3.0 pipeline written using the Python SDK that works locally with the DirectRunner. When I change the runner to DataflowRunner, I get an error about the name 'storage' not being defined globally.
Checking my code I think it's because I am using the credentials stored in my environment. In my python code I just do:
class ExtractBlobs(beam.DoFn):
    def process(self, element):
        from google.cloud import storage
        client = storage.Client()
        yield list(client.get_bucket(element).list_blobs(max_results=100))
The real issue is that I need the client so I can get the bucket, so I can then list the blobs. Everything I'm doing here is just to list the blobs.
So if anyone can either point me in the right direction towards using 'storage.Client()' in Dataflow, or show how to list the blobs of a GCP bucket without needing the client, I'd appreciate it.
Thanks in advance!
[+] What I've read: https://cloud.google.com/docs/authentication/production#auth-cloud-implicit-python
Fixed:
Okay, so upon further reading and investigating, it turns out I have the required libraries installed to run my pipeline locally, but Dataflow needs to know about them in order to download them onto the workers it spins up. https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
So all I've done is create a requirements.txt file with my google-cloud-* requirements.
I then spin up my pipeline like this:
python myPipeline.py --requirements_file requirements.txt --save_main_session True
That last flag tells it to save the main session, so the imports you do in main are also available on the workers.
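As a hedged sketch (the project ID, temp location, and bucket name are placeholders, and ExtractBlobs is the DoFn from the question), the same two options can also be set programmatically instead of on the command line:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# Equivalent to --requirements_file requirements.txt --save_main_session True.
options = PipelineOptions(
    runner='DataflowRunner',
    project='<your-gcp-project>',            # placeholder
    temp_location='gs://<your-bucket>/tmp',  # placeholder
    requirements_file='requirements.txt',
)
options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=options)
(p
 | 'Buckets' >> beam.Create(['<your-bucket>'])   # placeholder bucket name
 | 'ListBlobs' >> beam.ParDo(ExtractBlobs()))
p.run().wait_until_finish()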
Hi, I am trying out OpenCV in AWS Lambda. I want to save an SVM model to a text file so that I can load it again. Is it possible to save it in the /tmp directory and load it from there whenever I need it, or will I have to use S3?
I am using Python and trying to do something like this:
# saving the model
svm.save("/tmp/svm.dat")
# Loading the model
svm = cv2.ml.SVM_load("/tmp/svm.dat")
It's not possible, because the Lambda execution environment is distributed and the same function might run on several different instances, each with its own /tmp.
The alternative is to save your svm.dat to S3 and then download it every time you start your Lambda function.
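A hedged sketch of that pattern (the bucket and key names are placeholders): download svm.dat from S3 only when it is not already in /tmp, so warm invocations of the same container can reuse it:

import os

import boto3
import cv2

MODEL_PATH = '/tmp/svm.dat'
BUCKET = '<your-bucket>'  # placeholder
KEY = 'models/svm.dat'    # placeholder

s3 = boto3.client('s3')

def get_svm():
    # /tmp survives between warm invocations of the same container,
    # so only download when the file is missing (e.g. on a cold start).
    if not os.path.isfile(MODEL_PATH):
        s3.download_file(BUCKET, KEY, MODEL_PATH)
    return cv2.ml.SVM_load(MODEL_PATH)

def handler(event, context):
    svm = get_svm()
    # ... run predictions with svm ...
    return {'statusCode': 200}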
I have an image URL (for example: http://www.myexample.come/testImage.jpg) and I would like to upload this image to Amazon S3 using Django.
I have not found a way to copy the resource to Amazon S3 directly by passing the file URL.
So, I think that I have to implement these steps in my project:
Download the file locally from the URL http://www.myexample.come/testImage.jpg. I will have a local file testImage.jpg.
Upload the local file to Amazon S3. I will have an S3 URL.
Delete the local file testImage.jpg.
Is this a good way to build this feature?
Is it possible to improve these steps?
I have to use this feature when I receive a REST request, and I have to respond with the uploaded S3 file URL... Are these steps a good approach in terms of performance?
The easiest way off the top of my head would be to use requests together with io from the Python standard library. This is a bit of code I used a while back; I just tested it with Python 2.7.9 and it works:
>>> requests_image('http://docs.python-requests.org/en/latest/_static/requests-sidebar.png')
and it works with the latest version of requests (2.6.0). I should point out that it's just a snippet; I was in full control of the image URLs being handed to the function, so there's nothing in the way of error checking (you could use Pillow to open the image and confirm it's really a JPEG, etc.).
import requests
from io import open as iopen
from urlparse import urlsplit

def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg']
    file_name = urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[1]
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
    else:
        return False
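If the goal is just to get the bytes into S3 for the REST response, here is a hedged alternative sketch (the function name, bucket, and key are placeholders, and it assumes boto3 rather than the snippet above) that skips the intermediate file by streaming the downloaded bytes straight to put_object:

import mimetypes

import boto3
import requests

def upload_image_from_url(image_url, bucket_name, key):
    # Pull the image into memory and push it straight to S3,
    # skipping the local temporary file entirely.
    response = requests.get(image_url)
    response.raise_for_status()

    content_type = (response.headers.get('Content-Type')
                    or mimetypes.guess_type(key)[0]
                    or 'application/octet-stream')

    s3 = boto3.client('s3')
    s3.put_object(Bucket=bucket_name, Key=key,
                  Body=response.content, ContentType=content_type)

    # Plain S3 URL; adjust for your region/bucket settings.
    return 'https://{}.s3.amazonaws.com/{}'.format(bucket_name, key)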
I am trying to write 50 4 KB files on an EC2 instance with S3 mounted on it.
How can I do this in Python?
I am not sure how to proceed with this.
If you have the S3 bucket mounted via FUSE or some other method that presents the S3 object space as a pseudo file system, then you write files just like anything else in Python:
with open('/path/to/s3/mount', 'w') as dafile:
    dafile.write('contents')
If you are trying to put objects into S3 from an EC2 instance, then you will want to follow the boto documentation on how to do this.
To start you off:
create an /etc/boto.cfg or ~/.boto file as the boto howto describes, then:
from boto.s3.connection import S3Connection

conn = S3Connection()
# if you want, you can instead do: conn = S3Connection('key_id_here', 'secret_here')
bucket = conn.get_bucket('your_bucket_to_store_files')

for file_name in fifty_file_names:
    bucket.new_key(file_name).set_contents_from_filename('/local/path/to/{}'.format(file_name))
This assumes you are dealing with fairly small files, like the 4 KB files you mentioned. Larger files may need to be split/chunked.
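For the mounted-filesystem approach, here is a minimal sketch of writing 50 files of 4 KB each (the mount directory and file names are placeholders, and random bytes stand in for your real content):

import os

# Placeholder for wherever the S3 bucket is mounted (e.g. via s3fs/FUSE).
MOUNT_DIR = '/path/to/s3/mount'
FILE_SIZE = 4 * 1024  # 4 KB
NUM_FILES = 50

for i in range(NUM_FILES):
    path = os.path.join(MOUNT_DIR, 'file_{:02d}.bin'.format(i))
    with open(path, 'wb') as f:
        # 4 KB of random bytes; replace with your real content.
        f.write(os.urandom(FILE_SIZE))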