AWS: FileNotFoundError: [Errno 2] No such file or directory

I am trying to download a file from my S3 bucket to SageMaker.
The path of the file is:
s3://vemyone/input/dicom-images-train/1.2.276.0.7230010.3.1.2.8323329.1000.1517875165.878026/1.2.276.0.7230010.3.1.3.8323329.1000.1517875165.878025/1.2.276.0.7230010.3.1.4.8323329.1000.1517875165.878027.dcm
The path of that file is stored as a list element at train_fns[0].
The value of train_fns[0] is:
input/dicom-images-train/1.2.276.0.7230010.3.1.2.8323329.1000.1517875165.878026/1.2.276.0.7230010.3.1.3.8323329.1000.1517875165.878025/1.2.276.0.7230010.3.1.4.8323329.1000.1517875165.878027.dcm
I used the following code:
s3 = boto3.resource('s3')
bucketname = 'vemyone'
s3.Bucket(bucketname).download_file(train_fns[0][:], train_fns[0])
but I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'input/dicom-images-train/1.2.276.0.7230010.3.1.2.8323329.1000.1517875165.878026/1.2.276.0.7230010.3.1.3.8323329.1000.1517875165.878025/1.2.276.0.7230010.3.1.4.8323329.1000.1517875165.878027.dcm.5b003ba1'
I notice that some characters have been appended to the end of the path.
How do I solve this problem?

Please see the documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Bucket.download_file
According to the docs, the first argument is the object key and the second argument is the path of the local file to download to:
s3 = boto3.resource('s3')
bucketname = 'vemyone'
s3.Bucket(bucketname).download_file(train_fns[0], '/path/to/local/file')
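In the original call both arguments were the S3 key, so boto3 tried to write the download to that full key path locally, where the intermediate directories do not exist (the trailing .5b003ba1 comes from the temporary file boto3 writes to while downloading). A minimal sketch of deriving a local filename from the key; the local directory used here is only an example:

import os
import boto3

s3 = boto3.resource('s3')
bucketname = 'vemyone'

# Example local target directory (adjust to your environment)
local_dir = '/tmp/dicom-images-train'
os.makedirs(local_dir, exist_ok=True)

key = train_fns[0]  # object key, as in the question
local_path = os.path.join(local_dir, os.path.basename(key))
s3.Bucket(bucketname).download_file(key, local_path)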

Related

Getting an error: FileNotFoundError: [Errno 2] No such file or directory: 'data_images\\1 (1).jpg'

I'm working on object detection with YOLOv5. In the last step, when I save the images and their label text files into the folder, I get an error: FileNotFoundError: [Errno 2] No such file or directory: 'data_images\\1 (1).jpg'
def save_data(filename, folder_path, group_obj):
    src = os.path.join('data_images', filename)
    dst = os.path.join(folder_path, filename)
    move(src, dst)  # move image to the destination folder
    # save the labels
    text_filename = os.path.join(folder_path, os.path.splitext(filename)[0] + '.txt')
    group_obj.get_group(filename).set_index('filename').to_csv(text_filename, sep=' ', index=False, header=False)
How can I solve this? I checked that the data is present.
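A common cause of this error is that either the source image is not where the code expects it or the destination folder has not been created yet. A minimal debugging sketch, assuming the same folder layout as in the question:

import os
from shutil import move

def save_data_checked(filename, folder_path):
    src = os.path.join('data_images', filename)
    dst = os.path.join(folder_path, filename)

    # Verify the source image really exists under data_images
    if not os.path.exists(src):
        raise FileNotFoundError('source image not found: {}'.format(src))

    # Create the destination folder if it does not exist yet
    os.makedirs(folder_path, exist_ok=True)

    move(src, dst)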

Combine all txt files in an S3 bucket into 1 large file

Problem: I am trying to combine a large number of small text files into one large file in an S3 bucket, using Python.
The code I tested locally is below (obtained from another post); it works perfectly:
with open(outfilename, 'wb') as outfile:
    for filename in glob.glob('UBXEvents*'):
        if filename == outfilename:  # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)
Now, since my files are located in an S3 bucket, I have trouble referencing them. I wanted to run this code for all files in the bucket (using a wildcard *), but I am having a hard time connecting the two.
Below is the s3 object I created:
object = client.get_object(
    Bucket='my_bucket_name',
    Key='bucket_path/prefix_of_file_name*'
)
Question: How would I reference the S3 bucket/path in my combining code above?
Obtaining a list of files
You can obtain a list of files in the bucket like this:
import boto3

s3_client = boto3.client('s3')

response = s3_client.list_objects_v2(Bucket='my-bucket', Prefix='folder1/')
for object in response['Contents']:
    # Do stuff here
    print(object['Key'])
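Note that list_objects_v2() returns at most 1000 keys per call, so if the bucket or prefix can hold more objects than that, a paginator is the safer way to walk the full listing. A minimal sketch:

import boto3

s3_client = boto3.client('s3')

# Paginate through every object under the prefix (list_objects_v2 caps at 1000 keys per call)
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='folder1/'):
    for obj in page.get('Contents', []):
        print(obj['Key'])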
Reading & Writing to Amazon S3
Normally, you would need to download each file from Amazon S3 to the local disk (using download_file()) and then read the contents. However, you might instead want to use smart-open · PyPI, a library that allows files on S3 to be opened with syntax similar to the normal Python open() command.
Here's a program that uses smart-open to read files from S3 and combine them into an output file in S3:
import boto3
from smart_open import open

BUCKET = 'my-bucket'
PREFIX = 'folder1/'  # Optional

s3_client = boto3.client('s3')

# Open output file with smart-open
with open(f's3://{BUCKET}/out.txt', 'w') as out_file:
    response = s3_client.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for object in response['Contents']:
        print(f"Copying {object['Key']}")
        # Open input file with smart-open
        with open(f"s3://{BUCKET}/{object['Key']}", 'r') as in_file:
            # Read content from input file
            for line in in_file:
                # Write content to output file
                out_file.write(line)
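Note that smart-open is a third-party package and needs to be installed in your environment first; installing it with its S3 extras (for example pip install smart_open[s3]) should pull in the required dependencies.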

Error in uploading file in S3 using boto 3

I am trying to upload a file to S3 using boto3. I tried the code below:
import boto3

s3 = boto3.resource('s3')
buck_name = s3.create_bucket(Bucket='trubuckboto')
s3.Object('trubuckboto', 'tlearn.txt').upload_file(
    Filename='G:\tlearn.txt')
My bucket creation is successful, but I am not able to upload the file from location G:\tlearn.txt into that bucket. Below is the error I am getting:
return os.stat(filename).st_size
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'G:\tlearn.txt'
Can someone suggest what I am missing here?
In Python strings, the backslash "\" is a special character, also called the "escape" character. If you want a literal backslash then you need to escape the escape character, for example G:\\tlearn.txt:
import boto3

s3 = boto3.resource('s3')
# buck_name = s3.create_bucket(Bucket='trubuckboto')
s3.Object('trubuckboto', 'tlearn.txt').upload_file(
    Filename='G:\\tlearn.txt')
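Two equivalent ways to sidestep the escaping issue, sketched below, are a raw string literal or forward slashes (which Python accepts for Windows paths):

import boto3

s3 = boto3.resource('s3')

# Raw string literal: backslashes are not treated as escape characters
s3.Object('trubuckboto', 'tlearn.txt').upload_file(Filename=r'G:\tlearn.txt')

# Forward slashes are also accepted for Windows paths
s3.Object('trubuckboto', 'tlearn.txt').upload_file(Filename='G:/tlearn.txt')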

Uploading multiple files to Google Cloud Storage via Python Client Library

The GCP python docs have a script with the following function:
def upload_pyspark_file(project_id, bucket_name, filename, file):
    """Uploads the PySpark file in this directory to the configured
    input bucket."""
    print('Uploading pyspark file to GCS')
    client = storage.Client(project=project_id)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    blob.upload_from_file(file)
I've created an argument parsing function in my script that takes in multiple arguments (file names) to upload to a GCS bucket. I'm trying to adapt the above function to parse those multiple args and upload those files, but am unsure how to proceed. My confusion is with the 'filename' and 'file' variables above. How can I adapt the function for my specific purpose?
I don't suppose you're still looking for something like this?
from google.cloud import storage
import os

files = os.listdir('data-files')

client = storage.Client.from_service_account_json('cred.json')
bucket = client.get_bucket('xxxxxx')

def upload_pyspark_file(filename, file):
    # """Uploads the PySpark file in this directory to the configured
    # input bucket."""
    # print('Uploading pyspark file to GCS')
    # client = storage.Client(project=project_id)
    # bucket = client.get_bucket(bucket_name)
    print('Uploading from', file, 'to', filename)
    blob = bucket.blob(filename)
    # upload_from_filename() takes a local path; upload_from_file() expects an open file object
    blob.upload_from_filename(file)

for f in files:
    upload_pyspark_file(f, "data-files\\{0}".format(f))
The difference between file and filename is, as you may have guessed, that file is the local source file and filename is the destination name in the bucket.

IOError in Boto3 download_file

Background
I am using the following Boto3 code to download a file from S3.
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    print(key)
    if key.find('/') < 0:
        if len(key) > 4 and key[-5:].lower() == '.json':  # file is uploaded outside any folder
            download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
    else:
        download_path = '/tmp/{}/{}'.format(uuid.uuid4(), key)  # file is uploaded inside a folder
When a new file is uploaded to the S3 bucket, this code is triggered and the newly uploaded file is downloaded.
This code works fine when the file is uploaded outside any folder.
However, when I upload a file inside a directory, an IOError occurs.
Here is a dump of the IO error I am encountering.
[Errno 2] No such file or directory:
/tmp/316bbe85-fa21-463b-b965-9c12b0327f5d/test1/customer1.json.586ea9b8:
IOError
test1 is the directory inside my S3 bucket where customer1.json is uploaded.
Query
Any thoughts on how to resolve this error?
The error is raised because you attempted to download and save the file into a directory that does not exist. Use os.mkdir to create the directory before downloading the file:
# ...
else:
    item_uuid = str(uuid.uuid4())
    os.mkdir('/tmp/{}'.format(item_uuid))
    download_path = '/tmp/{}/{}'.format(item_uuid, key)  # File is uploaded inside a folder
Note: it's better to use os.path.join() when working with filesystem paths, so the code above could be rewritten as:
# ...
else:
    item_uuid = str(uuid.uuid4())
    os.mkdir(os.path.join('/tmp', item_uuid))
    download_path = os.path.join('/tmp', item_uuid, key)
The error may also be raised because you include '/tmp/' in the download path when referring to the S3 object; do not include the tmp folder in the key, as it most likely does not exist on S3. These articles can help you check that you are on the right track:
Amazon S3 upload and download using Python/Django
Python s3 examples
I faced the same issue, and the error message caused a lot of confusion (the random string appended after the file name). In my case it was caused by a directory in the download path that didn't exist.
Thanks for helping, Andriy Ivaneyko. I found a solution using boto3.
Using the following code I am able to accomplish my task:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    fn = '/tmp/xyz'
    # response['Body'].read() returns bytes, so open the local file in binary mode
    fp = open(fn, 'wb')
    response = s3_client.get_object(Bucket=bucket, Key=key)
    contents = response['Body'].read()
    fp.write(contents)
    fp.close()
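An equivalent approach, sketched below under the same assumptions (an s3_client boto3 client and a writable /tmp path), is to stream the object straight into the file with download_fileobj(), which avoids holding the whole body in memory:

import boto3

s3_client = boto3.client('s3')

def save_object_to_tmp(bucket, key, local_path='/tmp/xyz'):
    # Stream the S3 object directly into the local file
    with open(local_path, 'wb') as fp:
        s3_client.download_fileobj(bucket, key, fp)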
The problem with your code is that download_path is wrong. Whenever you try to download a file that sits under a directory in your S3 bucket, the download path becomes something like:
download_path = /tmp/<uuid><object key name>
where <object key name> = "<directory name>/<object name>"
This makes the download path:
download_path = /tmp/<uuid><directory name>/<object name>
The code fails because no directory named <uuid><directory name> exists; your code only allows downloading a file directly under the /tmp directory.
To fix the issue, consider splitting the key when building the download path; you can then also skip the check for where the file was uploaded in the bucket. This keeps only the object's file name in the download path. For example:
for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    print(key)
    download_path = '/tmp/{}{}'.format(uuid.uuid4(), key.split('/')[-1])
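If you would rather preserve the key's folder structure under /tmp instead of flattening it, a small sketch (assuming the same s3_client boto3 client used elsewhere in this thread) would be:

import os
import uuid

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']

    # Mirror the key's directory layout under a per-invocation /tmp folder
    download_path = os.path.join('/tmp', str(uuid.uuid4()), key)
    os.makedirs(os.path.dirname(download_path), exist_ok=True)

    s3_client.download_file(bucket, key, download_path)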