How are you?
I'm trying to execute a SageMaker job but I get this error:
ClientError: Failed to download data. Cannot download s3://pocaaml/sagemaker/xsell_sc1_test/model/model_lgb.tar.gz, a previously downloaded file/folder clashes with it. Please check your s3 objects and ensure that there is no object that is both a folder as well as a file.
I have that model_lgb.tar.gz on that S3 path, as you can see here:
This is my code:
project_name = 'xsell_sc1_test'
s3_bucket = "pocaaml"
prefix = "sagemaker/"+project_name
account_id = "029294541817"
s3_bucket_base_uri = "{}{}".format("s3://", s3_bucket)
dev = "dev-{}".format(strftime("%y-%m-%d-%H-%M", gmtime()))
region = sagemaker.Session().boto_region_name
print("Using AWS Region: {}".format(region))
# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()
boto3.setup_default_session(region_name=region)
boto_session = boto3.Session(region_name=region)
s3_client = boto3.client("s3", region_name=region)
sagemaker_boto_client = boto_session.client("sagemaker")  # is this one needed?
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_boto_client
)

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1", role=role, instance_type='ml.m5.4xlarge', instance_count=1
)

PREPROCESSING_SCRIPT_LOCATION = 'funciones_altas.py'

preprocessing_input_code = sagemaker_session.upload_data(
    PREPROCESSING_SCRIPT_LOCATION,
    bucket=s3_bucket,
    key_prefix="{}/{}".format(prefix, "code")
)
preprocessing_input_data = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "data")
preprocessing_input_model = "{}/{}/{}".format(s3_bucket_base_uri, prefix, "model")
preprocessing_output = "{}/{}/{}/{}/{}".format(s3_bucket_base_uri, prefix, dev, "preprocessing", "output")

processing_job_name = params["project_name"].replace("_", "-") + "-preprocess-{}".format(strftime("%d-%H-%M-%S", gmtime()))

sklearn_processor.run(
    code=preprocessing_input_code,
    job_name=processing_job_name,
    inputs=[ProcessingInput(input_name="data",
                            source=preprocessing_input_data,
                            destination="/opt/ml/processing/input/data"),
            ProcessingInput(input_name="model",
                            source=preprocessing_input_model,
                            destination="/opt/ml/processing/input/model")],
    outputs=[
        ProcessingOutput(output_name="output",
                         destination=preprocessing_output,
                         source="/opt/ml/processing/output")],
    wait=False,
)
preprocessing_job_description = sklearn_processor.jobs[-1].describe()
And in funciones_altas.py I'm using ohe_altas.tar.gz, not model_lgb.tar.gz, which makes this error even weirder.
Can you help me?
It looks like you are using the SageMaker-generated execution role, and the error is related to S3 permissions.
Here are a couple of things you can do:
Make sure the policies attached to the role grant access to your bucket.
Check whether the objects in your bucket are encrypted; if so, also add a KMS policy to the role you are attaching to the job. https://aws.amazon.com/premiumsupport/knowledge-center/s3-403-forbidden-error/
You can also create your own role and pass its ARN to the code that runs the processing job, as in the sketch below.
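For example, a minimal sketch of passing your own role instead of the notebook's default execution role (the ARN below is a placeholder, not a real role):

from sagemaker.sklearn.processing import SKLearnProcessor

# Hypothetical ARN of a role you created with S3 (and, if needed, KMS) permissions
custom_role_arn = "arn:aws:iam::123456789012:role/MyProcessingRole"

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=custom_role_arn,          # your own role instead of get_execution_role()
    instance_type="ml.m5.4xlarge",
    instance_count=1,
)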
I am new to AWS and boto. The data I want to download is on AWS, and I have the access key and the secret key. My problem is I do not understand the approaches I found. For instance, this code:
import boto
import boto.s3.connection
def download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path):
    conn = boto.connect_s3(aws_access_key_id=access_key,
                           aws_secret_access_key=secret_key,
                           host='s3-{}.amazonaws.com'.format(region),
                           calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key)
    key.get_contents_to_filename(local_path)
    print('Downloaded File {} to {}'.format(key, local_path))

region = 'us-west-1'
access_key = '...'   # the access key here
secret_key = '...'   # the secret key here
bucket_name = 'temp_name'
key = '<folder…/filename>'  # unique identifier
local_path = '...'   # local path
download_data_connect_s3(access_key, secret_key, region, bucket_name, key, local_path)
What I don't understand is the 'key', 'bucket_name', and 'local_path'. What is 'key' compared to the access key and secret key? I was not given a 'key'. Also, is 'bucket_name' the name of the bucket on AWS (I was not provided with a bucket name), and is 'local_path' the directory where I want to save the data?
You are right.
bucket_name = the name of your S3 bucket
key = the object key. It is the full path of the file inside the bucket (for example, if you have a file named a.txt in folder x, the key is x/a.txt).
local_path = where you want to save the data on your local machine
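As a concrete sketch (the bucket name, object key, and local path below are placeholders), the same download with boto3 would look like this:

import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY',      # credential, not the object key
    aws_secret_access_key='YOUR_SECRET_KEY',  # credential, not the object key
    region_name='us-west-1',
)

# download_file(bucket_name, key, local_path):
# bucket_name = the S3 bucket, key = the object's full path inside the bucket,
# local_path = where the file is written on your machine
s3.download_file('temp_name', 'folder/filename.csv', '/tmp/filename.csv')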
It sounds like the data is stored in Amazon S3.
You can use the AWS Command-Line Interface (CLI) to access Amazon S3.
To view the list of buckets in that account:
aws s3 ls
To view the contents of a bucket:
aws s3 ls bucket-name
To copy a file from a bucket to the current directory:
aws s3 cp s3://bucket-name/filename.txt .
Or sync a whole folder:
aws s3 sync s3://bucket-name/folder/ local-folder/
I have an AWS Lambda function written in Python 2.7 in which I want to:
1) Grab an .xls file from an HTTP address.
2) Store it in a temp location.
3) Store the file in an S3 bucket.
My code is as follows:
from __future__ import print_function
import urllib
import datetime
import boto3
from botocore.client import Config
def lambda_handler(event, context):
    """Make a variable containing the date format based on YYYYMMDD"""
    cur_dt = datetime.datetime.today().strftime('%Y%m%d')

    """Make a variable containing the url and current date based on the variable cur_dt"""
    dls = "http://11.11.111.111/XL/" + cur_dt + ".xlsx"
    urllib.urlretrieve(dls, cur_dt + "test.xls")

    ACCESS_KEY_ID = 'Abcdefg'
    ACCESS_SECRET_KEY = 'hijklmnop+6dKeiAByFluK1R7rngF'
    BUCKET_NAME = 'my-bicket'
    FILE_NAME = cur_dt + "test.xls"

    data = open('/tmp/' + FILE_NAME, 'wb')

    # S3 Connect
    s3 = boto3.resource(
        's3',
        aws_access_key_id=ACCESS_KEY_ID,
        aws_secret_access_key=ACCESS_SECRET_KEY,
        config=Config(signature_version='s3v4')
    )

    # Uploaded File
    s3.Bucket(BUCKET_NAME).put(Key=FILE_NAME, Body=data, ACL='public-read')
However, when I run this function, I receive the following error:
'IOError: [Errno 30] Read-only file system'
I've spent hours trying to address this issue but I'm falling on my face. Any help would be appreciated.
'IOError: [Errno 30] Read-only file system'
You seem to lack some write access rights. If your Lambda has a different policy, try attaching this policy to your role:
arn:aws:iam::aws:policy/AWSLambdaFullAccess
It has full access to S3 as well, in case you can't write to your bucket. If this solves your issue, you can remove some of those rights afterwards.
I have uploaded the image to an S3 bucket. In the Lambda test event, I created a JSON test event which contains the BASE64 of the image to be uploaded to the S3 bucket and the image name.
The Lambda test JSON event is as follows:
{
"ImageName": "Your Image Name",
"img64":"BASE64 of Your Image"
}
Following is the code to upload an image (or any file) to S3:
import boto3
import base64
def lambda_handler(event, context):
    s3 = boto3.resource(u's3')
    bucket = s3.Bucket(u'YOUR-BUCKET-NAME')
    path_test = '/tmp/output'      # temp path in Lambda
    key = event['ImageName']       # file name to use as the S3 key
    data = event['img64']          # base64 of the image
    img = base64.b64decode(data)   # decode the base64-encoded image data

    with open(path_test, 'wb') as out_file:
        out_file.write(img)

    bucket.upload_file(path_test, key)  # upload the image directly into the bucket
    # bucket.upload_file(path_test, 'FOLDERNAME-IN-YOUR-BUCKET/{}'.format(key))  # upload inside a folder of your s3 bucket

    print('res---------------->', path_test)
    print('key---------------->', key)
    return {
        'status': 'True',
        'statusCode': 200,
        'body': 'Image Uploaded'
    }
In data = open('/tmp/' + FILE_NAME, 'wb'), change the 'wb' to 'r'.
Also, I assume your IAM user has full access to S3, right?
Or maybe the problem is in the request to that URL...
Try making the downloaded file path start with "/tmp/":
urllib.urlretrieve(dls, "/tmp/" + cur_dt + "test.xls")
The link below shows how to download the entire contents of an S3 bucket. However, how does one download only a subfolder's contents? Suppose my S3 folder has the following emulated structure.
S3Folder/S1/file1.c
S3Folder/S1/file2.h
S3Folder/S1/file1.h
S3Folder/S2/file.exe
S3Folder/S2/resource.data
Suppose I am interested only in the S2 folder. How do I isolate those keys in the bucket listing?
local backup of an S3 content
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)

# go through the list of files
bucket_list = bucket.list()
for l in bucket_list:
    keyString = str(l.key)
    d = LOCAL_PATH + keyString
    try:
        l.get_contents_to_filename(d)
    except OSError:
        # check if dir exists
        if not os.path.exists(d):
            os.mkdir(d)
You could do the following:
import os
import boto3
s3_resource = boto3.resource("s3", region_name="us-east-1")
def download_objects():
    root_dir = 'D:/'                   # local machine location
    s3_bucket_name = 'S3_Bucket_Name'  # s3 bucket name
    s3_root_folder_prefix = 'sample'   # root folder inside the bucket
    s3_folder_list = ['s3_folder_1', 's3_folder_2', 's3_folder_3']  # sub folders under the root folder

    my_bucket = s3_resource.Bucket(s3_bucket_name)
    for file in my_bucket.objects.filter(Prefix=s3_root_folder_prefix):
        if any(s in file.key for s in s3_folder_list):
            try:
                path, filename = os.path.split(file.key)
                os.makedirs(root_dir + path, exist_ok=True)  # create the local directory if it does not exist
                my_bucket.download_file(file.key, root_dir + path + '/' + filename)
            except Exception as err:
                print(err)

if __name__ == '__main__':
    download_objects()
You can download a specific subset of S3 objects by passing their common prefix as the key filter.
So, for your question, you just need to use the prefix 'S3Folder/S2/' when listing the objects to download; a short sketch follows the links below.
FYI: s3 download object using boto3
For more, check this.
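As a rough sketch of that approach (the bucket name and local folder are placeholders; the prefix follows the layout in the question), listing and downloading only the S2 keys with boto3 could look like:

import os
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('YOUR-BUCKET-NAME')  # placeholder bucket name

# Only keys under S3Folder/S2/ are returned thanks to the Prefix filter
for obj in bucket.objects.filter(Prefix='S3Folder/S2/'):
    if obj.key.endswith('/'):  # skip folder placeholder objects
        continue
    local_path = os.path.join('local-folder', obj.key)  # hypothetical local target
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    bucket.download_file(obj.key, local_path)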
I am already connected to the instance, and I want to upload the files generated by my Python script directly to S3. I have tried this:
import boto
s3 = boto.connect_s3()
bucket = s3.get_bucket('alexandrabucket')
from boto.s3.key import Key
key = bucket.new_key('s0').set_contents_from_string('some content')
but this rather creates a new object s0 with the content "some content", while I want to upload the directory s0 to my bucket.
I also had a look at s3put, but I didn't manage to get what I want.
The following function can be used to upload a directory to S3 via boto3.
import os
import boto3

s3C = boto3.client('s3')  # s3C was not defined in the original snippet; a standard boto3 S3 client is assumed

def uploadDirectory(path, bucketname):
    for root, dirs, files in os.walk(path):
        for file in files:
            s3C.upload_file(os.path.join(root, file), bucketname, file)
Provide the path to the directory and the bucket name as the inputs. The files are placed directly into the bucket under their bare file names; alter the last argument of upload_file() to place them in "directories", as in the sketch below.
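For instance, a small variation (purely illustrative; it reuses the same assumed s3C client) that keeps the folder structure by building each key from the path relative to the uploaded directory:

import os
import boto3

s3C = boto3.client('s3')

def uploadDirectoryWithKeys(path, bucketname):
    for root, dirs, files in os.walk(path):
        for file in files:
            local_file = os.path.join(root, file)
            # the key mirrors the local layout, e.g. "sub/dir/file.txt"
            key = os.path.relpath(local_file, path).replace(os.sep, '/')
            s3C.upload_file(local_file, bucketname, key)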
There is nothing in the boto library itself that would allow you to upload an entire directory. You could write your own code to traverse the directory using os.walk or similar and to upload each individual file using boto.
There is a command-line utility in boto called s3put that could handle this, or you could use the AWS CLI tool, which has many features that let you upload entire directories or even sync an S3 bucket with a local directory (or vice versa).
The s3fs package provides nice functionality for handling such cases:
import s3fs

s3_file = s3fs.S3FileSystem()
local_path = "some_dir_path/some_dir_path/"
s3_path = "bucket_name/dir_path"
s3_file.put(local_path, s3_path, recursive=True)
I built the function based on the feedback from #JDPTET; however,
I needed to keep the entire common local path from being uploaded to the bucket,
and since I wasn't sure how many path separators I would encounter, I had to use os.path.normpath.
import os
import boto3

def upload_folder_to_s3(s3bucket, inputDir, s3Path):
    print("Uploading results to s3 initiated...")
    print("Local Source:", inputDir)
    os.system("ls -ltR " + inputDir)
    print("Dest S3path:", s3Path)

    try:
        for path, subdirs, files in os.walk(inputDir):
            for file in files:
                dest_path = path.replace(inputDir, "")
                __s3file = os.path.normpath(s3Path + '/' + dest_path + '/' + file)
                __local_file = os.path.join(path, file)
                print("upload : ", __local_file, " to Target: ", __s3file, end="")
                s3bucket.upload_file(__local_file, __s3file)
                print(" ...Success")
    except Exception as e:
        print(" ... Failed!! Quitting Upload!!")
        print(e)
        raise e

s3 = boto3.resource('s3', region_name='us-east-1')
s3bucket = s3.Bucket("<<s3bucket_name>>")
upload_folder_to_s3(s3bucket, "<<Local Folder>>", "<<s3 Path>>")
You could do the following:
import os
import boto3
s3_resource = boto3.resource("s3", region_name="us-east-1")
def upload_objects():
    try:
        bucket_name = "S3_Bucket_Name"  # s3 bucket name
        root_path = 'D:/sample/'        # local folder for upload

        my_bucket = s3_resource.Bucket(bucket_name)

        for path, subdirs, files in os.walk(root_path):
            path = path.replace("\\", "/")
            directory_name = path.replace(root_path, "")
            for file in files:
                my_bucket.upload_file(os.path.join(path, file), directory_name + '/' + file)

    except Exception as err:
        print(err)

if __name__ == '__main__':
    upload_objects()
This is the code I used, which recursively uploads files from the specified folder to the specified S3 path. Just add the S3 credentials and bucket details in the script:
https://gist.github.com/hari116/4ab5ebd885b63e699c4662cd8382c314/
#!/usr/bin/python
"""Usage: Add bucket name and credentials
script.py <source folder> <s3 destination folder >"""
import os
from sys import argv
import boto3
from botocore.exceptions import NoCredentialsError
ACCESS_KEY = ''
SECRET_KEY = ''
host = ''
bucket_name = ''
local_folder, s3_folder = argv[1:3]
walks = os.walk(local_folder)
# Function to upload to s3
def upload_to_aws(bucket, local_file, s3_file):
    """local_file, s3_file can be paths"""
    s3 = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    print(' Uploading ' + local_file + ' as ' + bucket + '/' + s3_file)
    try:
        s3.upload_file(local_file, bucket, s3_file)
        print(' ' + s3_file + ": Upload Successful")
        print(' ---------')
        return True
    except NoCredentialsError:
        print("Credentials not available")
        return False

"""For file names"""
for source, dirs, files in walks:
    print('Directory: ' + source)
    for filename in files:
        # construct the full local path
        local_file = os.path.join(source, filename)
        # construct the full S3 path
        relative_path = os.path.relpath(local_file, local_folder)
        s3_file = os.path.join(s3_folder, relative_path)
        # Invoke upload function
        upload_to_aws(bucket_name, local_file, s3_file)
For reading files from a folder we can use:
import boto
from boto.s3.key import Key

keyId = 'YOUR_AWS_ACCESS_KEY_ID'
sKeyId = 'YOUR_AWS_SECRET_ACCESS_KEY'
bucketName = 'your_bucket_name'

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)

for key in bucket.list():
    print ">>>>>" + key.name
    pathV = key.name.split('/')
    if pathV[0] == "data":
        if pathV[1] != "":
            srcFileName = key.name
            filename = key.name.split('/')[1]
            destFileName = "model/data/" + filename
            k = Key(bucket, srcFileName)
            k.get_contents_to_filename(destFileName)
    elif pathV[0] == "nlu_data":
        if pathV[1] != "":
            srcFileName = key.name
            filename = key.name.split('/')[1]
            destFileName = "model/nlu_data/" + filename
            k = Key(bucket, srcFileName)
            k.get_contents_to_filename(destFileName)
Updated #user 923227's answer to (1) use the newer boto3 interface, (2) handle the nuances of Windows double backslashes, and (3) show a cleaner tqdm progress bar:
import os
from tqdm import tqdm

def upload_folder_to_s3(s3_client, s3bucket, input_dir, s3_path):
    pbar = tqdm(os.walk(input_dir))
    for path, subdirs, files in pbar:
        for file in files:
            dest_path = path.replace(input_dir, "").replace(os.sep, '/')
            s3_file = f'{s3_path}/{dest_path}/{file}'.replace('//', '/')
            local_file = os.path.join(path, file)
            s3_client.upload_file(local_file, s3bucket, s3_file)
            pbar.set_description(f'Uploaded {local_file} to {s3_file}')
    print(f"Successfully uploaded {input_dir} to S3 {s3_path}")
Usage example:
s3_client = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
upload_folder_to_s3(s3_client, 'BUCKET-NAME', <local-directory>, <s3-directory>)
Somehow the other snippets did not really work for me; this is a modification of the snippet from user 923227 that does.
This code copies all files in a directory and maintains the directory structure in S3, e.g. 2023/01/file.jpg will be stored in the bucket as 2023/01/file.jpg.
import os
import sys
import boto3
client = boto3.client('s3')
local_path = "your-path/data"
bucketname = "bucket-name"
for path, dirs, files in os.walk(local_path):
    for file in files:
        file_s3 = os.path.normpath(path + '/' + file)
        file_local = os.path.join(path, file)
        print("Upload:", file_local, "to target:", file_s3, end="")
        client.upload_file(file_local, bucketname, file_s3)
        print(" ...Success")
Another method that did not exist when this question was first asked is to use python-rclone (https://github.com/ddragosd/python-rclone/blob/master/README.md).
This requires a download of rclone and a working rclone config. Commonly used for AWS (https://rclone.org/s3/) but can be used for other providers as well.
# pip install python-rclone
import rclone

cfg_path = r'(path to rclone config file here)'

with open(cfg_path) as f:
    cfg = f.read()

# Implementation
# Local file to cloud server
result = rclone.with_config(cfg).run_cmd(command="sync", extra_args=["/home/demodir/", "AWS test:dummydir/etc/"])
# Cloud server to cloud server
result = rclone.with_config(cfg).run_cmd(command="sync", extra_args=["Gdrive:test/testing/", "AWS test:dummydir/etc/"])
This allows you to run a "sync" command similar to the AWS CLI from within your Python code by reading in the config file and passing your arguments via kwargs (extra_args).
This solution does not use boto, but I think it could do what the OP wants.
It uses awscli and Python.
import os

class AwsCredentials:
    def __init__(self, access_key: str, secret_key: str):
        self.access_key = access_key
        self.secret_key = secret_key

    def to_command(self):
        credentials = f'AWS_ACCESS_KEY_ID={self.access_key} AWS_SECRET_ACCESS_KEY={self.secret_key}'
        return credentials

def sync_s3_bucket(credentials: AwsCredentials, source_path: str, bucket: str) -> None:
    command = f'{credentials.to_command()} aws s3 sync {source_path} s3://{bucket}'
    result = os.system(command)
    assert result == 0, f'The s3 sync was not successful, error code: {result}'
Please consider getting the AWS credentials from a file or from the environment.
The documentation for the s3 sync command is here.
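For example, a short sketch (purely illustrative) of building the credentials from environment variables instead of hard-coding them:

import os

# Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in the environment
credentials = AwsCredentials(
    access_key=os.environ['AWS_ACCESS_KEY_ID'],
    secret_key=os.environ['AWS_SECRET_ACCESS_KEY'],
)
sync_s3_bucket(credentials, 'some_dir_path/', 'bucket_name')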
Simply running terminal commands using the os module with an f-string works:
import os
ActualFolderName = "FolderToBeUploadedOnS3"
os.system(f'aws s3 cp D:\<PathToYourFolder>\{ActualFolderName} s3://<BucketName>/{ActualFolderName}/ --recursive')