How to recursively delete objects - amazon-web-services

I would like to delete all .JPEG files under a specified path in an S3 bucket. For example, let's say I have a structure on S3 similar to the following:
Obj1/
    Obj2/
        Obj3/
            image_1.jpeg
            ...
            image_N.jpeg
Is it possible to specify Obj1/Obj2/Obj3 as the DeleteObjectsInput prefix and recursively delete all .JPEG files under that prefix?
Here is my code:
func (s3Obj S3) Delete() error {
    sess := session.Must(session.NewSession(&aws.Config{
        Region: aws.String(s3Obj.Region),
    }))
    svc := s3.New(sess)
    input := &s3.DeleteObjectsInput{
        Bucket: aws.String(s3Obj.Bucket),
        Delete: &s3.Delete{
            Objects: []*s3.ObjectIdentifier{
                {
                    Key: aws.String(s3Obj.ItemPath),
                },
            },
            Quiet: aws.Bool(false),
        },
    }
    result, err := svc.DeleteObjects(input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            default:
                glog.Errorf("Error occurred while trying to delete object from S3. Error message - %v", aerr.Error())
            }
        } else {
            glog.Errorf("Error occurred while trying to delete object from S3. Error message - %v", err.Error())
        }
        return err
    }
    glog.Info(result)
    return nil
}
s3Obj.ItemPath represents the Obj1/Obj2/Obj3 path from the example above. This function does not return any error; I actually get the following message:
Deleted: [{
Key: "Obj1/Obj2/Obj3"
}]
But when I check the bucket, nothing has actually been deleted. What am I doing wrong?
EDIT
I've changed my code so that my Delete function accepts a list of objects, from which I build a list of s3.ObjectIdentifier. There are roughly 50 .JPEG files in that list, and for some reason the following code ONLY DELETES THE LAST ONE. I am not sure why.
func (s3Obj S3) Delete(objects []string) error {
    sess := session.Must(session.NewSession(&aws.Config{
        Region: aws.String(s3Obj.Region),
    }))
    svc := s3.New(sess)
    var objKeys = make([]*s3.ObjectIdentifier, len(objects))
    for i, v := range objects {
        glog.Info("About to delete: ", v)
        objKeys[i] = &s3.ObjectIdentifier{
            Key: &v,
        }
    }
    input := &s3.DeleteObjectsInput{
        Bucket: aws.String(s3Obj.Bucket),
        Delete: &s3.Delete{
            Objects: objKeys,
            Quiet:   aws.Bool(false),
        },
    }
    result, err := svc.DeleteObjects(input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            default:
                glog.Errorf("Error occurred while trying to delete object from S3. Error message - %v", aerr.Error())
            }
        } else {
            glog.Errorf("Error occurred while trying to delete object from S3. Error message - %v", err.Error())
        }
        return err
    }
    glog.Info(result)
    return nil
}
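A note on the EDIT (not part of the original post, but a likely explanation): before Go 1.22, a range loop reuses a single loop variable, so Key: &v stores the address of that one variable in every s3.ObjectIdentifier, and by the time DeleteObjects runs they all hold the last key. A minimal sketch of the fix:

    for i, v := range objects {
        glog.Info("About to delete: ", v)
        objKeys[i] = &s3.ObjectIdentifier{
            // aws.String copies the value, so each identifier keeps its own key
            // (alternatively, shadow the loop variable with v := v).
            Key: aws.String(v),
        }
    }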

Related

How to delete a non-empty S3 bucket using AWS SDK for Go v2

I'm using the AWS SDK for Go v2, and I need to delete a bucket that has objects in it. What's the best way to do so? Is there some kind of force delete, or something that first deletes all the objects inside the bucket?
The AWS documentation Deleting a bucket describes how to delete a bucket that has objects. The documentation also provides an SDK example (written in Java, but mainly serves as a guideline) that performs the following steps:
Delete all objects
Delete all object versions (for versioned buckets)
Finally delete bucket
There is no "force delete" option for non-empty buckets. You would need to implement the above steps.
The following sample code shows how to completely delete a non-empty bucket:
package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion("us-east-1"))
    if err != nil {
        log.Fatalf("Failed to load config: %v", err)
    }
    bucket := aws.String("your-bucket-name")
    client := s3.NewFromConfig(cfg)

    deleteObject := func(bucket, key, versionId *string) {
        log.Printf("Object: %s/%s\n", *key, aws.ToString(versionId))
        _, err := client.DeleteObject(context.TODO(), &s3.DeleteObjectInput{
            Bucket:    bucket,
            Key:       key,
            VersionId: versionId,
        })
        if err != nil {
            log.Fatalf("Failed to delete object: %v", err)
        }
    }

    // Delete all current objects.
    in := &s3.ListObjectsV2Input{Bucket: bucket}
    for {
        out, err := client.ListObjectsV2(context.TODO(), in)
        if err != nil {
            log.Fatalf("Failed to list objects: %v", err)
        }
        for _, item := range out.Contents {
            deleteObject(bucket, item.Key, nil)
        }
        if out.IsTruncated {
            in.ContinuationToken = out.NextContinuationToken
        } else {
            break
        }
    }

    // Delete all object versions and delete markers (for versioned buckets).
    inVer := &s3.ListObjectVersionsInput{Bucket: bucket}
    for {
        out, err := client.ListObjectVersions(context.TODO(), inVer)
        if err != nil {
            log.Fatalf("Failed to list object versions: %v", err)
        }
        for _, item := range out.DeleteMarkers {
            deleteObject(bucket, item.Key, item.VersionId)
        }
        for _, item := range out.Versions {
            deleteObject(bucket, item.Key, item.VersionId)
        }
        if out.IsTruncated {
            inVer.VersionIdMarker = out.NextVersionIdMarker
            inVer.KeyMarker = out.NextKeyMarker
        } else {
            break
        }
    }

    // Finally, delete the now-empty bucket.
    _, err = client.DeleteBucket(context.TODO(), &s3.DeleteBucketInput{Bucket: bucket})
    if err != nil {
        log.Fatalf("Failed to delete bucket: %v", err)
    }
}
You should probably optimize this further and use DeleteObjects for batch calls in order to reduce request overhead.
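As a rough sketch of that optimization, assuming the same v2 client as above plus the github.com/aws/aws-sdk-go-v2/service/s3/types import (deleteBatch is an illustrative name, not part of the answer), you can collect the keys of each listed page into types.ObjectIdentifier values and delete them in one call; DeleteObjects accepts at most 1000 keys per request:

    // deleteBatch removes up to 1000 objects in a single DeleteObjects call.
    func deleteBatch(client *s3.Client, bucket *string, ids []types.ObjectIdentifier) {
        if len(ids) == 0 {
            return
        }
        _, err := client.DeleteObjects(context.TODO(), &s3.DeleteObjectsInput{
            Bucket: bucket,
            Delete: &types.Delete{Objects: ids},
        })
        if err != nil {
            log.Fatalf("Failed to batch delete objects: %v", err)
        }
    }

Inside the listing loops above you would accumulate types.ObjectIdentifier{Key: item.Key} (plus VersionId in the versions loop) for each page and call deleteBatch once per page instead of calling deleteObject per item.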

Search Files in AWS S3 by LastModified

I want to search for files in AWS S3 based on creation time (or LastModified) in Go. I know how to do this in Python using the boto3 paginator, which lets me pass a query string, but I want to achieve the same in Go.
Any suggestion or sample in Go would be appreciated.
Sample code I am using to list all files:
for s.NextContinuationToken != "" {
    maxFileRead := int64(15)
    bucket := "XXX-XXX-test"
    // To check if the previous result was truncated
    if s.IsTruncated {
        fileList, err = s.session.ListObjectsV2(&s3.ListObjectsV2Input{
            Bucket:            aws.String(bucket),
            MaxKeys:           aws.Int64(maxFileRead),
            ContinuationToken: &s.NextContinuationToken,
        })
    } else {
        fileList, err = s.session.ListObjectsV2(&s3.ListObjectsV2Input{
            Bucket:  aws.String(bucket),
            MaxKeys: aws.Int64(maxFileRead),
        })
    }
    s.IsTruncated = *fileList.IsTruncated
    if s.IsTruncated {
        s.NextContinuationToken = *fileList.NextContinuationToken
    } else {
        s.NextContinuationToken = ""
    }
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            case s3.ErrCodeNoSuchBucket:
                fmt.Println(s3.ErrCodeNoSuchBucket, aerr.Error())
            default:
                fmt.Println(aerr.Error())
            }
        } else {
            // Print the error, cast err to awserr.Error to get the Code and
            // Message from an error.
            fmt.Println(err.Error())
        }
    }
}
Now I want to modify the search to only list files created after a particular time.
Call ListObjectsV2 (https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.ListObjectsV2) on the bucket. The Contents property of the response is a list of metadata about each object in the bucket. Use the LastModified field of each entry.
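S3 does not filter ListObjectsV2 results by LastModified on the server side, so the comparison has to happen client-side on each page. A minimal sketch of that, assuming the same v1 SDK client as in the question (listFilesSince, the cutoff parameter, and the fmt/time imports are illustrative, not from the answer above):

    func listFilesSince(svc *s3.S3, bucket string, cutoff time.Time) error {
        input := &s3.ListObjectsV2Input{
            Bucket: aws.String(bucket),
        }
        // ListObjectsV2Pages handles the continuation token internally.
        return svc.ListObjectsV2Pages(input, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
            for _, obj := range page.Contents {
                if obj.LastModified != nil && obj.LastModified.After(cutoff) {
                    fmt.Printf("%s (modified %s)\n", aws.StringValue(obj.Key), obj.LastModified)
                }
            }
            return true // keep paginating
        })
    }

Because ListObjectsV2Pages paginates for you, the manual IsTruncated/NextContinuationToken bookkeeping from the question is not needed.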

Fail to upload file to SFTP host with golang

I have the following Go function to upload a file over SFTP:
func uploadObjectToDestination(sshConfig SSHConnectionConfig, destinationPath string, srcFile io.Reader) {
    // Connect to destination host via SSH
    conn, err := ssh.Dial("tcp", sshConfig.sftpHost+sshConfig.sftpPort, sshConfig.authConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // create new SFTP client
    client, err := sftp.NewClient(conn)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    log.Printf("Opening file on destination server under path %s", destinationPath)
    // create destination file
    dstFile, err := client.OpenFile(destinationPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC)
    if err != nil {
        log.Fatal(err)
    }
    defer dstFile.Close()

    log.Printf("Copying file to %s", destinationPath)
    // copy source file to destination file
    bytes, err := io.Copy(dstFile, srcFile)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%s - Total %d bytes copied\n", dstFile.Name(), bytes)
}
The code above works in about 95% of cases but fails for some files. The only thing the failing files seem to have in common is their size (3-4 KB); the files that succeed are smaller (0.5-3 KB), although in some cases files of 2-3 KB fail as well.
I was able to reproduce the same issue with different SFTP servers.
When I replace the failing call (io.Copy) with sftp's Write, I see the same behavior, except that the process does not return an error; instead 0 bytes are copied, which seems equivalent to the io.Copy failure.
By the way, when using io.Copy, the error I receive is context canceled, unexpected EOF.
The code is running from AWS lambda and there is no memory or time limit issue.
After a few hours of digging, it turned out my own code was the source of the issue.
Here is the answer for future reference:
There was another function not in the original question which downloads the object(s) from S3:
func getObjectFromS3(svc *s3.S3, bucket, key string) io.Reader {
    var timeout = time.Second * 30
    ctx := context.Background()
    var cancelFn func()
    ctx, cancelFn = context.WithTimeout(ctx, timeout)
    defer cancelFn()

    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }
    o, err := svc.GetObjectWithContext(ctx, input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok && aerr.Code() == request.CanceledErrorCode {
            log.Fatal("Download canceled due to timeout", err)
        } else {
            log.Fatal("Failed to download object", err)
        }
    }
    // Load S3 file into memory, assuming small files
    return o.Body
}
The code above uses a context with a 30-second timeout, and the root cause appears to be that cancelFn is deferred: the context is canceled as soon as getObjectFromS3 returns, while the returned o.Body is only read later during the SFTP upload, so the download is cut off partway through. That matches the context canceled, unexpected EOF error. Since I don't actually need a context here, I simply converted my code to use GetObject(input), which fixed the issue:
func getObjectFromS3(svc *s3.S3, bucket, key string) io.Reader {
    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }
    o, err := svc.GetObject(input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            case s3.ErrCodeNoSuchKey:
                log.Fatal(s3.ErrCodeNoSuchKey, aerr.Error())
            default:
                log.Fatal(aerr.Error())
            }
        } else {
            // Print the error, cast err to awserr.Error to get the Code and
            // Message from an error.
            log.Fatal(err.Error())
        }
    }
    // Load S3 file into memory, assuming small files
    return o.Body
}
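If the timeout is still wanted, an alternative sketch (my own variation, not part of the original answer, using the bytes, context, io and time packages) is to read the body into memory before the deferred cancel runs, so the returned reader no longer depends on the request context:

    func getObjectFromS3WithTimeout(svc *s3.S3, bucket, key string) (io.Reader, error) {
        ctx, cancelFn := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancelFn()

        o, err := svc.GetObjectWithContext(ctx, &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
        if err != nil {
            return nil, err
        }
        defer o.Body.Close()

        // Buffer the whole object while the context is still alive (small files only).
        var buf bytes.Buffer
        if _, err := io.Copy(&buf, o.Body); err != nil {
            return nil, err
        }
        return &buf, nil
    }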

Copy S3 Object with MultiPartUpload

I need to rename quite a lot of objects in AWS S3. For small objects the following snippet works flawlessly:
input := &s3.CopyObjectInput{
    Bucket:     aws.String(bucket),
    Key:        aws.String(targetPrefix),
    CopySource: aws.String(source),
}
_, err = svc.CopyObject(input)
if err != nil {
    panic(errors.Wrap(err, "error copying object"))
}
For larger objects I am running into the S3 size limitation. I understand I need to copy the object using a multipart upload. This is what I have tried so far:
multiPartUpload, err := svc.CreateMultipartUpload(
    &s3.CreateMultipartUploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(targetPrefix), // targetPrefix is the new name
    },
)
if err != nil {
    panic(errors.Wrap(err, "could not create MultiPartUpload"))
}
resp, err := svc.UploadPartCopy(
    &s3.UploadPartCopyInput{
        UploadId:   multiPartUpload.UploadId,
        Bucket:     aws.String(bucket),
        Key:        aws.String(targetPrefix),
        CopySource: aws.String(source),
        PartNumber: aws.Int64(1),
    },
)
if err != nil {
    panic(errors.Wrap(err, "error copying multipart object"))
}
log.Printf("copied: %v", resp)
The Go SDK bails out on me with:
InvalidRequest: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120
I have also tried the following approach but I do not get any parts listed here:
multiPartUpload, err := svc.CreateMultipartUpload(
    &s3.CreateMultipartUploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(targetPrefix), // targetPrefix is the new name
    },
)
if err != nil {
    panic(errors.Wrap(err, "could not create MultiPartUpload"))
}
err = svc.ListPartsPages(
    &s3.ListPartsInput{
        Bucket:   aws.String(bucket),       // Required
        Key:      obj.Key,                  // Required
        UploadId: multiPartUpload.UploadId, // Required
    },
    // Iterate over all parts in the `CopySource` object
    func(parts *s3.ListPartsOutput, lastPage bool) bool {
        log.Printf("parts:\n%v\n%v", parts, parts.Parts)
        // parts.Parts is an empty slice
        for _, part := range parts.Parts {
            log.Printf("copying %v part %v", source, part.PartNumber)
            resp, err := svc.UploadPartCopy(
                &s3.UploadPartCopyInput{
                    UploadId:   multiPartUpload.UploadId,
                    Bucket:     aws.String(bucket),
                    Key:        aws.String(targetPrefix),
                    CopySource: aws.String(source),
                    PartNumber: part.PartNumber,
                },
            )
            if err != nil {
                panic(errors.Wrap(err, "error copying object"))
            }
            log.Printf("copied: %v", resp)
        }
        return true
    },
)
if err != nil {
    panic(errors.Wrap(err, "something went wrong with ListPartsPages!"))
}
What am I doing wrong, or am I misunderstanding something?
I think ListPartsPages is the wrong direction, because it works on "Multipart Uploads", which is a different entity than an S3 "Object". So you're listing the already-uploaded parts of the multipart upload you just created.
Your first example is close to what's needed, but you need to manually split the original file into parts, with the range of each part specified by UploadPartCopyInput's CopySourceRange. At least that's my take from reading the documentation.
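To make that concrete, here is a rough sketch of the CopySourceRange approach, reusing the question's svc and github.com/pkg/errors (plus fmt for the range string). The multipartCopy name, the part size, and obtaining the source size up front (for example via HeadObject) are my own assumptions, not something stated in the answer:

    func multipartCopy(svc *s3.S3, bucket, source, targetKey string, size int64) error {
        const partSize = int64(256 * 1024 * 1024) // 256 MiB per part; parts must be 5 MiB to 5 GiB

        mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(targetKey),
        })
        if err != nil {
            return errors.Wrap(err, "could not create MultiPartUpload")
        }

        var completed []*s3.CompletedPart
        for start, partNum := int64(0), int64(1); start < size; start, partNum = start+partSize, partNum+1 {
            end := start + partSize - 1
            if end > size-1 {
                end = size - 1
            }
            // Copy one byte range of the source object as one part.
            resp, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
                UploadId:        mpu.UploadId,
                Bucket:          aws.String(bucket),
                Key:             aws.String(targetKey),
                CopySource:      aws.String(source),
                CopySourceRange: aws.String(fmt.Sprintf("bytes=%d-%d", start, end)),
                PartNumber:      aws.Int64(partNum),
            })
            if err != nil {
                return errors.Wrap(err, "error copying part")
            }
            completed = append(completed, &s3.CompletedPart{
                ETag:       resp.CopyPartResult.ETag,
                PartNumber: aws.Int64(partNum),
            })
        }

        _, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
            Bucket:          aws.String(bucket),
            Key:             aws.String(targetKey),
            UploadId:        mpu.UploadId,
            MultipartUpload: &s3.CompletedMultipartUpload{Parts: completed},
        })
        return errors.Wrap(err, "could not complete MultiPartUpload")
    }

The key points are that CopySourceRange takes a bytes=start-end range, every part except the last must be at least 5 MiB and at most 5 GiB, and the ETags returned by UploadPartCopy have to be passed to CompleteMultipartUpload.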

Golang Aws S3 NoSuchKey: The specified key does not exist

I'm trying to download objects from S3; the following is my code:
func listFile(bucket, prefix string) error {
    svc := s3.New(sess)
    params := &s3.ListObjectsInput{
        Bucket: aws.String(bucket), // Required
        Prefix: aws.String(prefix),
    }
    return svc.ListObjectsPages(params, func(p *s3.ListObjectsOutput, lastPage bool) bool {
        for _, o := range p.Contents {
            log.Println(*o.Key)
            download(bucket, *o.Key)
            return true
        }
        return lastPage
    })
}

func download(bucket, key string) {
    logDir := conf.Cfg.Section("share").Key("LOG_DIR").MustString(".")
    tmpLogPath := filepath.Join(logDir, bucket, key)
    s3Svc := s3.New(sess)
    downloader := s3manager.NewDownloaderWithClient(s3Svc, func(d *s3manager.Downloader) {
        d.PartSize = 2 * 1024 * 1024 // 2MB per part
    })
    f, err := os.OpenFile(tmpLogPath, os.O_CREATE|os.O_WRONLY, 0644)
    if _, err = downloader.Download(f, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }); err != nil {
        log.Fatal(err)
    }
    f.Close()
}

func main() {
    bucket := "mybucket"
    key := "myprefix"
    listFile(bucket, key)
}
I can get the object list in listFile(), but a 404 is returned when download is called. Why?
I had the same problem with recent versions of the library. Sometimes the object key is prefixed with a "./" that the SDK removes by default, making the download fail.
Try adding this to your aws.Config and see if it helps:
config := aws.Config{
    ...
    DisableRestProtocolURICleaning: aws.Bool(true),
}
I submitted an issue.
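For completeness, a minimal sketch of how that flag might be wired into session creation with the v1 SDK (the region value is just a placeholder):

    sess := session.Must(session.NewSession(&aws.Config{
        Region:                         aws.String("us-east-1"),
        DisableRestProtocolURICleaning: aws.Bool(true),
    }))
    svc := s3.New(sess)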