AWS S3 parallel download using golang

I am writing a function to download a large file (9 GB) from an AWS S3 bucket using the AWS SDK for Go. I need to optimize this so the file downloads quickly.
func DownloadFromS3Bucket(bucket, item, path string) {
    os.Setenv("AWS_ACCESS_KEY_ID", constants.AWS_ACCESS_KEY_ID)
    os.Setenv("AWS_SECRET_ACCESS_KEY", constants.AWS_SECRET_ACCESS_KEY)

    file, err := os.Create(filepath.Join(path, item))
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }
    defer file.Close()

    sess, _ := session.NewSession(&aws.Config{
        Region: aws.String(constants.AWS_REGION),
    })

    downloader := s3manager.NewDownloader(sess)

    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(item),
        })
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }

    fmt.Println("Download completed", file.Name(), numBytes, "bytes")
}
Can someone suggest a way to extend this function so the download runs in parallel?

Try altering your NewDownloader() call to this. See https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/#NewDownloader
// Create a downloader with the session and custom options
downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
    d.PartSize = 64 * 1024 * 1024 // 64MB per part
    d.Concurrency = 4
})
The full list of options that can be set on d in that function can be found here:
https://docs.aws.amazon.com/sdk-for-go/api/service/s3/s3manager/#Downloader

Related

Google Cloud Platform presignURL using Go

Trying to upload a picture to Google Cloud Platform, I always get the same error: "<?xml version='1.0' encoding='UTF-8'?><Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.</Message><StringToSign>GOOG4-RSA-SHA256 20.................951Z".
I added a service account to the bucket with the roles Storage Admin and Storage Object Admin.
I generated a key for the service account, downloaded it as a .json file, and then generated a presigned URL using this code:
// key is the downloaded .json key file from the GCP service-account
// the return string is the presignedURL
// key is the downloaded .json key file from the GCP service-account
// the return string is the presignedURL
func getPresignedURL(path, key string) (string, error) {
    sakeyFile := filepath.Join(path, key)
    saKey, err := ioutil.ReadFile(sakeyFile)
    if err != nil {
        log.Fatalln(err)
    }
    cfg, err := google.JWTConfigFromJSON(saKey)
    if err != nil {
        log.Fatalln(err)
    }

    bucket := "mybucket"
    ctx := context.Background()
    client, err := storage.NewClient(ctx)
    if err != nil {
        return "", fmt.Errorf("storage.NewClient: %v", err)
    }
    defer client.Close()

    opts := &storage.SignedURLOptions{
        Scheme: storage.SigningSchemeV4,
        Method: "PUT",
        Headers: []string{
            "Content-Type:image/jpeg",
        },
        Expires:        time.Now().Add(15 * time.Minute),
        GoogleAccessID: cfg.Email,
        PrivateKey:     cfg.PrivateKey,
    }
    u, err := client.Bucket(bucket).SignedURL("mypic.jpeg", opts)
    if err != nil {
        return "", fmt.Errorf("Bucket(%q).SignedURL: %v", bucket, err)
    }
    return u, nil
}
The presigned URL looks fine, something like this:
https://storage.googleapis.com/djedjepicbucket/mypic.jpeg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=djedje%40picstorage-363707.iam.gserviceaccount.com%2F20220926%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20220926T081951Z&X-Goog-Expires=899&X-Goog-Signature=3f330715d7a38ea08f99134a16f464fb............5ad800a7665dfb1440034ab1f5ab045252336&X-Goog-SignedHeaders=content-type%3Bhost
Then I read a file (the picture) from disk and upload it using the presigned URL:
// the uri is the presignedURL
func newfileUploadRequest(uri string, params map[string]string, paramName, path string) (*http.Request, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
body := &bytes.Buffer{}
writer := multipart.NewWriter(body)
if err != nil {
return nil, err
}
_, err = io.Copy(body, file)
for key, val := range params {
_ = writer.WriteField(key, val)
}
err = writer.Close()
if err != nil {
return nil, err
}
req, err := http.NewRequest("PUT", uri, body)
req.Header.Set("Content-Type", "image/jpeg")
return req, err
}
Then I execute the request:
// the previous func
request, err := newfileUploadRequest(purl, extraParams, "picture", filepath.Join(path, "download.jpeg"))
if err != nil {
log.Fatal(err)
}
client := &http.Client{}
resp, err := client.Do(request)
if err != nil {
log.Fatal(err)
} else {
body := &bytes.Buffer{}
_, err := body.ReadFrom(resp.Body)
if err != nil {
log.Fatal(err)
}
resp.Body.Close()
fmt.Println(resp.StatusCode)
fmt.Println(resp.Header)
fmt.Println(body)
}
Unfortunately, I always get the same error back:
403
map[Alt-Svc:[h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"] Content-Length:[884] Content-Type:[application/xml; charset=UTF-8] Date:[Mon, 26 Sep 2022 08:22:19 GMT] Server:[UploadServer] X-Guploader-Uploadid:[ADPyc......................ECL_4W]]
<?xml version='1.0' encoding='UTF-8'?><Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.</Message><StringToSign>GOOG4-RSA-SHA256
20220926T081951Z
20220926/auto/storage/goog4_request
c5f36838af4......................8ffb56329c1eb27f</StringToSign><CanonicalRequest>PUT
/djedjepicbucket/mypic.jpeg
X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=djedje%40picstorage-363707.iam.gserviceaccount.com%2F20220926%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20220926T081951Z&X-Goog-Expires=899&X-Goog-SignedHeaders=content-type%3Bhost
content-type:multipart/form-data; boundary=5be13cc........................dd6aef6823
host:storage.googleapis.com
content-type;host
UNSIGNED-PAYLOAD</CanonicalRequest></Error>
I have tried many other approaches as well, but I basically always get this (more or less) same error back. Does someone have an idea what I am forgetting? (I have been on this for two days now.) Thank you.
Found the answer: in both the getPresignedURL() and newfileUploadRequest() functions, the header must be set to "Content-Type: application/octet-stream" (or "Content-Type: image/jpeg", for instance, if the picture needs to be displayed via its URL); the picture then uploads without issue.

Fail to upload file to SFTP host with golang

I have the following golang function to upload a file to SFTP:
func uploadObjectToDestination(sshConfig SSHConnectionConfig, destinationPath string, srcFile io.Reader) {
    // Connect to destination host via SSH
    conn, err := ssh.Dial("tcp", sshConfig.sftpHost+sshConfig.sftpPort, sshConfig.authConfig)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // create new SFTP client
    client, err := sftp.NewClient(conn)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    log.Printf("Opening file on destination server under path %s", destinationPath)
    // create destination file
    dstFile, err := client.OpenFile(destinationPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC)
    if err != nil {
        log.Fatal(err)
    }
    defer dstFile.Close()

    log.Printf("Copying file to %s", destinationPath)
    // copy source file to destination file
    bytes, err := io.Copy(dstFile, srcFile)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%s - Total %d bytes copied\n", dstFile.Name(), bytes)
}
The code above works in 95% of cases but fails for some files. The only thing the failing files have in common is their size (3-4 KB); the files that succeed are smaller (0.5-3 KB), though in some cases files of 2-3 KB fail as well.
I was able to reproduce the same issue with different SFTP servers.
When I replace the failing call (io.Copy) with sftp.Write, I see the same behavior, except that no error is returned; instead 0 bytes are copied, which appears to be the same failure as with io.Copy.
Incidentally, when using io.Copy, the error I receive is "context canceled, unexpected EOF".
The code is running from AWS lambda and there is no memory or time limit issue.
After a few hours of digging, it turned out my own code was the source of the issue.
Here is the answer for future reference:
There was another function, not shown in the original question, which downloads the object(s) from S3:
func getObjectFromS3(svc *s3.S3, bucket, key string) io.Reader {
    var timeout = time.Second * 30
    ctx := context.Background()
    var cancelFn func()
    ctx, cancelFn = context.WithTimeout(ctx, timeout)
    defer cancelFn()

    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }
    o, err := svc.GetObjectWithContext(ctx, input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok && aerr.Code() == request.CanceledErrorCode {
            log.Fatal("Download canceled due to timeout", err)
        } else {
            log.Fatal("Failed to download object", err)
        }
    }
    // Load S3 file into memory, assuming small files
    return o.Body
}
The code above uses a context with a 30-second timeout, and the deferred cancelFn() runs as soon as the function returns — cancelling the context while the caller is still reading the returned o.Body, which is why the copied size was wrong. Since I don't actually need the context here, I simply converted the code to use GetObject(input), which fixed the issue.
func getObjectFromS3(svc *s3.S3, bucket, key string) io.Reader {
    var input = &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }
    o, err := svc.GetObject(input)
    if err != nil {
        if aerr, ok := err.(awserr.Error); ok {
            switch aerr.Code() {
            case s3.ErrCodeNoSuchKey:
                log.Fatal(s3.ErrCodeNoSuchKey, aerr.Error())
            default:
                log.Fatal(aerr.Error())
            }
        } else {
            // Print the error, cast err to awserr.Error to get the Code and
            // Message from an error.
            log.Fatal(err.Error())
        }
    }
    // Load S3 file into memory, assuming small files
    return o.Body
}

How to download from public s3 bucket using golang

I'm implementing a function to download a file from an S3 bucket. This worked fine when the bucket was private and I set the credentials:
os.Setenv("AWS_ACCESS_KEY_ID", "test")
os.Setenv("AWS_SECRET_ACCESS_KEY", "test")
However, I have since made the S3 bucket public as described here, and now I want to download from it without credentials.
func DownloadFromS3Bucket(bucket, item, path string) {
    file, err := os.Create(filepath.Join(path, item))
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }
    defer file.Close()

    sess, _ := session.NewSession(&aws.Config{
        Region: aws.String(constants.AWS_REGION),
    })

    // Create a downloader with the session and custom options
    downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
        d.PartSize = 64 * 1024 * 1024 // 64MB per part
        d.Concurrency = 6
    })

    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(item),
        })
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }

    fmt.Println("Download completed", file.Name(), numBytes, "bytes")
}
But now I'm getting an error.
Error in downloading from file: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Any idea how to download it without credentials?
You can set Credentials: credentials.AnonymousCredentials when creating the session. The following is the working code:
func DownloadFromS3Bucket(bucket, item, path string) {
    file, err := os.Create(filepath.Join(path, item))
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }
    defer file.Close()

    sess, _ := session.NewSession(&aws.Config{
        Region:      aws.String(constants.AWS_REGION),
        Credentials: credentials.AnonymousCredentials,
    })

    // Create a downloader with the session and custom options
    downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
        d.PartSize = 64 * 1024 * 1024 // 64MB per part
        d.Concurrency = 6
    })

    numBytes, err := downloader.Download(file,
        &s3.GetObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(item),
        })
    if err != nil {
        fmt.Printf("Error in downloading from file: %v \n", err)
        os.Exit(1)
    }

    fmt.Println("Download completed", file.Name(), numBytes, "bytes")
}

Golang Aws S3 NoSuchKey: The specified key does not exist

I'm trying to download Objects from S3, the following is my code:
func listFile(bucket, prefix string) error {
    svc := s3.New(sess)
    params := &s3.ListObjectsInput{
        Bucket: aws.String(bucket), // Required
        Prefix: aws.String(prefix),
    }
    return svc.ListObjectsPages(params, func(p *s3.ListObjectsOutput, lastPage bool) bool {
        for _, o := range p.Contents {
            log.Println(*o.Key)
            download(bucket, *o.Key)
            return true
        }
        return lastPage
    })
}

func download(bucket, key string) {
    logDir := conf.Cfg.Section("share").Key("LOG_DIR").MustString(".")
    tmpLogPath := filepath.Join(logDir, bucket, key)

    s3Svc := s3.New(sess)
    downloader := s3manager.NewDownloaderWithClient(s3Svc, func(d *s3manager.Downloader) {
        d.PartSize = 2 * 1024 * 1024 // 2MB per part
    })
    f, err := os.OpenFile(tmpLogPath, os.O_CREATE|os.O_WRONLY, 0644)
    if _, err = downloader.Download(f, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    }); err != nil {
        log.Fatal(err)
    }
    f.Close()
}

func main() {
    bucket := "mybucket"
    key := "myprefix"
    listFile(bucket, key)
}
I can get the object list in listFile(), but a 404 is returned when download() is called. Why?
I had the same problem with recent versions of the library. Sometimes the object key will be prefixed with a "./" that the SDK removes by default, making the download fail.
Try adding this to your aws.Config and see if it helps:
config := aws.Config{
    ...
    DisableRestProtocolURICleaning: aws.Bool(true),
}
I submitted an issue.

AWS S3 large file reverse proxying with golang's http.ResponseWriter

I have a request handler named Download which I want to access a large file from Amazon S3 and push it to the user's browser. My goals are:
To record some request information before granting the user access to the file
To not buffer the file into memory too much. Files may become too large.
Here is what I've explored so far:
func Download(w http.ResponseWriter, r *http.Request) {
    sess := session.New(&aws.Config{
        Region:           aws.String("eu-west-1"),
        Endpoint:         aws.String("s3-eu-west-1.amazonaws.com"),
        S3ForcePathStyle: aws.Bool(true),
        Credentials:      cred,
    })
    downloader := s3manager.NewDownloader(sess)

    // I can't write directly into the ResponseWriter. It doesn't implement WriteAt.
    // Besides, it doesn't seem like the right thing to do.
    _, err := downloader.Download(w, &s3.GetObjectInput{
        Bucket: aws.String(BUCKET),
        Key:    aws.String(filename),
    })
    if err != nil {
        log.Error(4, err.Error())
        return
    }
}
I'm wondering if there isn't a better approach (given the goals I'm trying to achieve).
Any suggestions are welcome. Thank you in advance :-)
If you do want to stream the file through your service (rather than have the user download it directly from S3, as recommended in the accepted answer):
import (
    ...
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/s3"
)

func StreamDownloadHandler(w http.ResponseWriter, r *http.Request) {
    sess, awsSessErr := session.NewSession(&aws.Config{
        Region:      aws.String("eu-west-1"),
        Credentials: credentials.NewStaticCredentials("my-aws-id", "my-aws-secret", ""),
    })
    if awsSessErr != nil {
        http.Error(w, fmt.Sprintf("Error creating aws session %s", awsSessErr.Error()), http.StatusInternalServerError)
        return
    }

    result, err := s3.New(sess).GetObject(&s3.GetObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("my-file-id"),
    })
    if err != nil {
        http.Error(w, fmt.Sprintf("Error getting file from s3 %s", err.Error()), http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", "my-file.csv"))
    w.Header().Set("Cache-Control", "no-store")

    bytesWritten, copyErr := io.Copy(w, result.Body)
    if copyErr != nil {
        http.Error(w, fmt.Sprintf("Error copying file to the http response %s", copyErr.Error()), http.StatusInternalServerError)
        return
    }
    log.Printf("Download of \"%s\" complete. Wrote %s bytes", "my-file.csv", strconv.FormatInt(bytesWritten, 10))
}
If the file is potentially large, you don't want it to go through your own server.
The best approach (in my opinion) is to have the user download it directly from S3.
You can do this by generating a presigned url:
func Download(w http.ResponseWriter, r *http.Request) {
    ...
    sess := session.New(&aws.Config{
        Region:           aws.String("eu-west-1"),
        Endpoint:         aws.String("s3-eu-west-1.amazonaws.com"),
        S3ForcePathStyle: aws.Bool(true),
        Credentials:      cred,
    })
    s3svc := s3.New(sess)

    req, _ := s3svc.GetObjectRequest(&s3.GetObjectInput{
        Bucket: aws.String(BUCKET),
        Key:    aws.String(filename),
    })
    url, err := req.Presign(5 * time.Minute)
    if err != nil {
        // handle error
    }
    http.Redirect(w, r, url, http.StatusTemporaryRedirect)
}
The presigned URL is only valid for a limited time (5 minutes in this example; adjust to your needs) and takes the user directly to S3. No need to worry about handling the download yourself anymore!