DynamoDB list all backups using AWS GoLang SDK

Based on the example given in the link below on API Operation Pagination without Callbacks
https://aws.amazon.com/blogs/developer/context-pattern-added-to-the-aws-sdk-for-go/
I am trying to list all the backups in DynamoDB, but pagination does not seem to be working: it only retrieves the first page and never goes on to the next.
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/request"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	sess, sessErr := session.NewSession()
	if sessErr != nil {
		fmt.Println(sessErr)
		fmt.Println("Could not initialize session, returning..")
		return
	}

	// Create DynamoDB client
	dynamodbSvc := dynamodb.New(sess)

	params := dynamodb.ListBackupsInput{}
	ctx := context.Background()
	p := request.Pagination{
		NewRequest: func() (*request.Request, error) {
			req, _ := dynamodbSvc.ListBackupsRequest(&params)
			req.SetContext(ctx)
			return req, nil
		},
	}

	for p.Next() {
		page := p.Page().(*dynamodb.ListBackupsOutput)
		fmt.Println("Received", len(page.BackupSummaries), "objects in page")
		for _, obj := range page.BackupSummaries {
			fmt.Println(aws.StringValue(obj.BackupName))
		}
	}
	//return p.Err()
} // end of main

It's a bit late, but I'll put it here in case it helps somebody.
Example:
var exclusiveStartARN *string
var backups []*dynamodb.BackupSummary
for {
	backup, err := svc.ListBackups(&dynamodb.ListBackupsInput{
		ExclusiveStartBackupArn: exclusiveStartARN,
	})
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	backups = append(backups, backup.BackupSummaries...)
	if backup.LastEvaluatedBackupArn != nil {
		exclusiveStartARN = backup.LastEvaluatedBackupArn
		// max 5 times a second so we don't hit the limit
		time.Sleep(200 * time.Millisecond)
		continue
	}
	break
}
fmt.Println(len(backups))
Explanation:
Pagination is done via ExclusiveStartBackupArn in the ListBackupsInput. The ListBackupsOutput returns LastEvaluatedBackupArn if there are more pages, or nil if it's the last/only page.

It could be that you're hammering the API a bit with your usage: you can call ListBackups a maximum of 5 times per second.
What is the value of p.HasNextPage() in your p.Next() loop?
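Also, since the question's code has p.Err() commented out, checking it after the loop will surface a throttling or pagination error instead of silently swallowing it. A minimal sketch against the same loop:

for p.Next() {
	page := p.Page().(*dynamodb.ListBackupsOutput)
	fmt.Println("Received", len(page.BackupSummaries), "objects in page")
}
// Surface any pagination error (throttling included) instead of dropping it.
if err := p.Err(); err != nil {
	fmt.Println("pagination error:", err)
}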

Related

How do I get pagination working with exclusiveStartKey for dynamodb aws-sdk-go-v2?

I'm trying to create a pagination endpoint for a DynamoDB table I have, but I can't get the exclusiveStartKey into the correct type for it to work; nothing I've tried so far does.
example code:
func GetPaginator(tableName string, limit int32, lastEvaluatedKey string) (*dynamodb.ScanPaginator, error) {
	svc, err := GetClient()
	if err != nil {
		logrus.Error(err)
		return nil, err
	}

	input := &dynamodb.ScanInput{
		TableName: aws.String(tableName),
		Limit:     aws.Int32(limit),
	}

	if lastEvaluatedKey != "" {
		input.ExclusiveStartKey = map[string]types.AttributeValue{
			"id": &types.AttributeValueMemberS{
				Value: lastEvaluatedKey,
			},
		}
	}

	paginator := dynamodb.NewScanPaginator(svc, input)
	return paginator, nil
}
Edit:
Okay, so I'm creating an API that requires pagination. The API needs a query parameter where the lastEvaluatedId can be passed in. I then use the lastEvaluatedId as the ExclusiveStartKey on the ScanInput. However, when I do this I still receive the same item from the database. I've created a test.go file and will post the code below:
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

type PaginateID struct {
	ID string `dynamodbav:"id" json:"id"`
}

func main() {
	lastKey := PaginateID{ID: "ae82a99d-486e-11ec-a7a7-0242ac110002"}

	key, err := attributevalue.MarshalMap(lastKey)
	if err != nil {
		fmt.Println(err)
		return
	}

	cfg, err := config.LoadDefaultConfig(context.TODO(), func(o *config.LoadOptions) error {
		o.Region = os.Getenv("REGION")
		return nil
	})
	if err != nil {
		fmt.Println(err)
		return
	}

	svc := dynamodb.NewFromConfig(cfg, func(o *dynamodb.Options) {
		o.EndpointResolver = dynamodb.EndpointResolverFromURL("http://localhost:8000")
	})

	input := &dynamodb.ScanInput{
		TableName:         aws.String("TABLE_NAME"),
		Limit:             aws.Int32(1),
		ExclusiveStartKey: key,
	}

	paginator := dynamodb.NewScanPaginator(svc, input)
	if paginator.HasMorePages() {
		data, err := paginator.NextPage(context.TODO())
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(data.Items[0]["id"])
		fmt.Println(data.LastEvaluatedKey["id"])
	}
}
When I run this test code, I get this output:
&{ae82a99d-486e-11ec-a7a7-0242ac110002 {}}
&{ae82a99d-486e-11ec-a7a7-0242ac110002 {}}
So the item that is returned has the same id that I am passing as the ScanInput.ExclusiveStartKey, which means the scan is not starting from the ExclusiveStartKey. The scan is starting from the beginning every time.
The aws-sdk-go-v2 DynamoDB query and scan paginator constructors have a bug (see my GitHub issue, which includes the fix): they do not respect the ExclusiveStartKey param.
As an interim fix, I copied the paginator type locally and added one line to the constructor: nextToken: params.ExclusiveStartKey.
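A minimal sketch of that interim fix, assuming the local copy mirrors the shape of the generated scan paginator (the field and method names below are illustrative, not the SDK's exact internals; the usual dynamodb and types imports are assumed):

type scanPaginator struct {
	client    *dynamodb.Client
	params    *dynamodb.ScanInput
	nextToken map[string]types.AttributeValue
	firstPage bool
}

func newScanPaginator(client *dynamodb.Client, params *dynamodb.ScanInput) *scanPaginator {
	if params == nil {
		params = &dynamodb.ScanInput{}
	}
	return &scanPaginator{
		client:    client,
		params:    params,
		firstPage: true,
		// The one-line fix: seed the token from the caller's ExclusiveStartKey.
		nextToken: params.ExclusiveStartKey,
	}
}

func (p *scanPaginator) HasMorePages() bool {
	return p.firstPage || len(p.nextToken) != 0
}

func (p *scanPaginator) NextPage(ctx context.Context) (*dynamodb.ScanOutput, error) {
	params := *p.params
	params.ExclusiveStartKey = p.nextToken
	out, err := p.client.Scan(ctx, &params)
	if err != nil {
		return nil, err
	}
	p.firstPage = false
	p.nextToken = out.LastEvaluatedKey
	return out, nil
}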
So basically what you need to do is take the LastEvaluatedKey from each page and pass it back in as the ExclusiveStartKey.
You cannot read it off the scan paginator itself, because its attributes are unexported; instead, use the page returned by calling NextPage.
The following snippet has an example:
func GetPaginator(ctx context.Context, tableName string, limit int32, lastEvaluatedKey map[string]types.AttributeValue) (*dynamodb.ScanOutput, error) {
	svc, err := GetClient()
	if err != nil {
		logrus.Error(err)
		return nil, err
	}

	input := &dynamodb.ScanInput{
		TableName: aws.String(tableName),
		Limit:     aws.Int32(limit),
	}

	if len(lastEvaluatedKey) > 0 {
		input.ExclusiveStartKey = lastEvaluatedKey
	}

	paginator := dynamodb.NewScanPaginator(svc, input)
	return paginator.NextPage(ctx)
}
Keep in mind that paginator.NextPage(ctx) may return nothing once there are no more pages, so you can guard it with HasMorePages().
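For completeness, a hypothetical caller for the GetPaginator above, feeding each page's LastEvaluatedKey back in until the table is exhausted (the scanAll name, table name and page size are placeholders):

func scanAll(ctx context.Context) error {
	var startKey map[string]types.AttributeValue
	for {
		page, err := GetPaginator(ctx, "TABLE_NAME", 25, startKey)
		if err != nil {
			return err
		}
		for _, item := range page.Items {
			_ = item // process the item here
		}
		if len(page.LastEvaluatedKey) == 0 {
			break // no more pages
		}
		startKey = page.LastEvaluatedKey
	}
	return nil
}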

Golang AWS Firehose using json field name instead of struct field name

I have the following struct:
type ProcessedRecords struct {
	CustIndividualID string `json:"individual id"`
	Household        string `json:"Household"`
}
And I have a slice containing many structs of this type. I'm trying to submit them using the PutRecordBatch operation from the AWS SDK:
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/firehose"
)

type ProcessedRecords struct {
	CustIndividualID string `json:"individual id"`
	Household        string `json:"Household"`
}

func main() {
	submitToFirehose(recordList)
}

func submitToFirehose(records []ProcessedRecords) {
	streamName := "processed-stream"
	sess := session.Must(session.NewSession())

	// Create a Firehose client with additional configuration
	firehoseService := firehose.New(sess, aws.NewConfig().WithRegion("us-east-1"))

	recordsBatchInput := &firehose.PutRecordBatchInput{}
	recordsBatchInput = recordsBatchInput.SetDeliveryStreamName(streamName)

	recordsInput := []*firehose.Record{}
	for i := 0; i < len(records); i++ {
		if len(recordsInput) == 500 {
			recordsBatchInput = recordsBatchInput.SetRecords(recordsInput)
			resp, err := firehoseService.PutRecordBatch(recordsBatchInput)
			if err != nil {
				fmt.Printf("PutRecordBatch err: %v\n", err)
			} else {
				fmt.Printf("FailedPuts: %v\n", *resp.FailedPutCount)
			}
			recordsInput = []*firehose.Record{}
		}

		b, err := json.Marshal(records[i])
		if err != nil {
			log.Printf("Error: %v", err)
		}
		record := &firehose.Record{Data: b}
		recordsInput = append(recordsInput, record)
	}

	// Send any remaining records (fewer than 500) left over after the loop.
	if len(recordsInput) > 0 {
		recordsBatchInput = recordsBatchInput.SetRecords(recordsInput)
		resp, err := firehoseService.PutRecordBatch(recordsBatchInput)
		if err != nil {
			fmt.Printf("PutRecordBatch err: %v\n", err)
		} else {
			fmt.Printf("FailedPuts: %v\n", *resp.FailedPutCount)
		}
	}
}
This seems to work, and it would appear that my Glue backend is set up correctly; however, CustIndividualID is not being written to S3. I suspect it's because Firehose reads the json:"individual id" tag as the column name rather than CustIndividualID.
This is a problem because Glue tables can't have spaces in column names. What am I doing wrong?
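Assuming the space in the tag really is the culprit: encoding/json uses the struct tag verbatim as the JSON key, so one way out is to marshal through a second struct whose tags are Glue-safe column names. A minimal sketch (the firehoseRecord type and its tag names are assumptions; pick whatever your Glue table actually defines) replacing the json.Marshal call inside the loop above:

// Output shape with Glue-safe column names (no spaces).
type firehoseRecord struct {
	CustIndividualID string `json:"individual_id"`
	Household        string `json:"household"`
}

b, err := json.Marshal(firehoseRecord{
	CustIndividualID: records[i].CustIndividualID,
	Household:        records[i].Household,
})

That keeps ProcessedRecords (and its json:"individual id" tag) for whatever produces the input, while the records sent to Firehose use column names Glue can accept.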

Unable to perform query on AWS Athena using Golang SDK

I am new to AWS and Golang, and I am trying to create a Lambda function which will trigger an AWS Athena query and email the result using the AWS SES service. Even after searching for an hour, I couldn't find a working example of a Lambda function (in Golang) that performs a query on Athena and gets the query's output.
While searching, I found code for the same in Java, Python and Node.js, but not in Golang.
Even the Go SDK page redirects to a Java example, and unfortunately I don't understand Java.
I have also looked into the AWS SDK for Go API Reference page, but I don't understand the flow of the program or which operation to select.
I have tried to create a program for this; it may be completely wrong, and I don't know what to do next. Below is the code -
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/athena"
)

func main() {
	// Create a new session in the us-east-1 region.
	sess, err := session.NewSession(&aws.Config{
		Region: aws.String("us-east-1")},
	)
	if err != nil {
		fmt.Println(err)
		return
	}

	// Create an Athena client.
	client := athena.New(sess)

	// Example sending a request using the StartQueryExecutionRequest method.
	query := "SELECT * FROM table1 ;"
	params := query
	req, resp := client.StartQueryExecutionRequest(params)

	err1 := req.Send()
	if err1 == nil { // resp is now filled
		fmt.Println(resp)
	}
}
I'd appreciate it if someone could help me perform an Athena query and get its result in Golang (preferably), or share some resources. Once I have that, I can send the email using AWS SES.
Use this to get started.
// run as: go run main.go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/endpoints"
	"github.com/aws/aws-sdk-go-v2/aws/external"
	"github.com/aws/aws-sdk-go-v2/service/athena"
)

const table = "textqldb.textqltable"
const outputBucket = "s3://bucket-name-here/"

func main() {
	cfg, err := external.LoadDefaultAWSConfig()
	if err != nil {
		fmt.Printf("config error: %v\n", err)
		return
	}
	cfg.Region = endpoints.UsEast2RegionID

	client := athena.New(cfg)
	query := "select * from " + table
	resultConf := &athena.ResultConfiguration{
		OutputLocation: aws.String(outputBucket),
	}
	params := &athena.StartQueryExecutionInput{
		QueryString:         aws.String(query),
		ResultConfiguration: resultConf,
	}

	req := client.StartQueryExecutionRequest(params)
	resp, err := req.Send(context.TODO())
	if err != nil {
		fmt.Printf("query error: %v\n", err)
		return
	}
	fmt.Println(resp)
}
@Everton's code executes a query on Athena, but its results are saved to the S3 bucket and not returned. So I have added code to execute the Athena query and also get the response back. Hope this helps others.
// run as: go run main.go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/endpoints"
	"github.com/aws/aws-sdk-go-v2/aws/external"
	"github.com/aws/aws-sdk-go-v2/service/athena"
)

const table = "<Database_Name>.<Table_Name>"
const outputBucket = "s3://bucket-name-here/"

// Execute the query and return the query ID.
func executeQuery(query string) *string {
	cfg, err := external.LoadDefaultAWSConfig()
	if err != nil {
		fmt.Printf("config error: %v\n", err)
		return nil
	}
	cfg.Region = endpoints.UsEast2RegionID

	client := athena.New(cfg)
	resultConf := &athena.ResultConfiguration{
		OutputLocation: aws.String(outputBucket),
	}
	params := &athena.StartQueryExecutionInput{
		QueryString:         aws.String(query),
		ResultConfiguration: resultConf,
	}

	req := client.StartQueryExecutionRequest(params)
	resp, err := req.Send(context.TODO())
	if err != nil {
		fmt.Printf("Query Error: %v\n", err)
		return nil
	}
	fmt.Println("Query Execution Response ID:", resp.QueryExecutionId)
	return resp.QueryExecutionId
}

// Takes a query ID as input and returns the query's results.
func getQueryResults(queryID *string) (*athena.GetQueryResultsResponse, error) {
	cfg, err := external.LoadDefaultAWSConfig()
	if err != nil {
		return nil, err
	}
	cfg.Region = endpoints.UsEast2RegionID

	client := athena.New(cfg)
	params := &athena.GetQueryResultsInput{
		QueryExecutionId: queryID,
	}

	req := client.GetQueryResultsRequest(params)
	resp, err := req.Send(context.TODO())
	if err != nil {
		fmt.Printf("Query Response Error: %v\n", err)
		return nil, err
	}
	return resp, nil
}

func main() {
	query := "select * from " + table

	// Execute an Athena query.
	queryID := executeQuery(query)

	// Wait for the query to complete before fetching the results.
	time.Sleep(15 * time.Second) // or poll in a loop, as sketched below

	resp, err := getQueryResults(queryID)
	if err != nil {
		fmt.Printf("Error getting query response: %v\n", err)
	} else {
		fmt.Println("\nRows:", resp.ResultSet.Rows)
	}
}
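Instead of the fixed 15-second sleep, the query status can be polled until Athena reports completion. A hedged sketch in the same preview-SDK request/Send style as the code above; the waitForQuery helper is hypothetical, and the exact response field names should be verified against the SDK version in use:

// Poll the query status every couple of seconds until it leaves the
// QUEUED/RUNNING states, then let the caller fetch the results.
func waitForQuery(queryID *string) error {
	cfg, err := external.LoadDefaultAWSConfig()
	if err != nil {
		return err
	}
	cfg.Region = endpoints.UsEast2RegionID
	client := athena.New(cfg)

	for {
		req := client.GetQueryExecutionRequest(&athena.GetQueryExecutionInput{
			QueryExecutionId: queryID,
		})
		resp, err := req.Send(context.TODO())
		if err != nil {
			return err
		}
		switch state := string(resp.QueryExecution.Status.State); state {
		case "SUCCEEDED":
			return nil
		case "FAILED", "CANCELLED":
			return fmt.Errorf("query finished in state %s", state)
		}
		time.Sleep(2 * time.Second) // poll interval
	}
}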

Increasing AWS log download speed

I'm going to start by showing the code and then explain what I'm trying to do:
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

var (
	// empty strings for security reasons
	Bucket         = ""       // Download from this bucket
	Prefix         = ""       // Using this key prefix
	LocalDirectory = "s3logs" // Into this directory
)

func main() {
	sess := session.New()
	client := s3.New(sess, &aws.Config{Region: aws.String("us-west-1")})

	params := &s3.ListObjectsInput{Bucket: &Bucket, Prefix: &Prefix}

	manager := s3manager.NewDownloaderWithClient(client, func(d *s3manager.Downloader) {
		d.PartSize = 64 * 1024 * 1024 // 64MB per part
		d.Concurrency = 8
	}) // works
	//manager := s3manager.NewDownloaderWithClient(client) // works

	d := downloader{bucket: Bucket, dir: LocalDirectory, Downloader: manager}
	client.ListObjectsPages(params, d.eachPage)
}

type downloader struct {
	*s3manager.Downloader
	bucket, dir string
}

func (d *downloader) eachPage(page *s3.ListObjectsOutput, more bool) bool {
	for _, obj := range page.Contents {
		d.downloadToFile(*obj.Key)
	}
	return true
}

func (d *downloader) downloadToFile(key string) {
	// Create the directories in the path
	file := filepath.Join(d.dir, key)
	if err := os.MkdirAll(filepath.Dir(file), 0775); err != nil {
		panic(err)
	}

	// Set up the local file
	fd, err := os.Create(file)
	if err != nil {
		panic(err)
	}
	defer fd.Close()

	// Download the file using the AWS SDK
	fmt.Printf("Downloading s3://%s/%s to %s...\n", d.bucket, key, file)
	params := &s3.GetObjectInput{Bucket: &d.bucket, Key: &key}
	_, e := d.Download(fd, params)
	if e != nil {
		panic(e)
	}
}
I'm trying to download the log files from a particular bucket, and eventually from many buckets. I need the download to be as fast as possible because there is a lot of data; the whole process is pointless if the logs can't be downloaded at a reasonable speed. What is the most effective way to download huge amounts of data quickly? Is there a faster way? It's already concurrent according to Amazon's docs. Any ideas? Also, I've noticed a curious thing: it doesn't matter whether I set Concurrency to 1, 4, or 20, everything still downloads at roughly 0.70-0.80 GB/min.
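One likely reason the Concurrency setting changes nothing: d.Concurrency only parallelizes the parts of a single object, while eachPage downloads objects strictly one at a time, so buckets full of small log files never benefit. A minimal sketch of fanning the per-object downloads out to a bounded worker pool (the cap of 16 is an assumption to tune; it needs a sync import):

func (d *downloader) eachPage(page *s3.ListObjectsOutput, more bool) bool {
	var wg sync.WaitGroup
	sem := make(chan struct{}, 16) // cap the number of in-flight downloads
	for _, obj := range page.Contents {
		wg.Add(1)
		sem <- struct{}{}
		go func(key string) {
			defer wg.Done()
			defer func() { <-sem }()
			d.downloadToFile(key)
		}(*obj.Key)
	}
	wg.Wait()
	return true
}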

Exercise: Web Crawler - concurrency not working

I am going through the Go tour and working on the final exercise: changing a web crawler to crawl in parallel without repeating a crawl (http://tour.golang.org/#73). All I have changed is the Crawl function.
var used = make(map[string]bool)

func Crawl(url string, depth int, fetcher Fetcher) {
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("\nfound: %s %q\n\n", url, body)
	for _, u := range urls {
		if used[u] == false {
			used[u] = true
			Crawl(u, depth-1, fetcher)
		}
	}
	return
}
In order to make it concurrent I added the go keyword in front of the call to Crawl, but instead of recursively calling Crawl, the program only finds the "http://golang.org/" page and no other pages.
Why doesn't the program work when I add the go keyword to the call to Crawl?
The problem seems to be that your process is exiting before all URLs can be followed by the crawler. Because of the concurrency, the main() procedure exits before the workers are finished.
To circumvent this, you could use sync.WaitGroup:
func Crawl(url string, depth int, fetcher Fetcher, wg *sync.WaitGroup) {
	defer wg.Done()
	if depth <= 0 {
		return
	}
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("\nfound: %s %q\n\n", url, body)
	for _, u := range urls {
		if used[u] == false {
			used[u] = true
			wg.Add(1)
			go Crawl(u, depth-1, fetcher, wg)
		}
	}
	return
}
And call Crawl in main as follows:
func main() {
	wg := &sync.WaitGroup{}
	wg.Add(1) // match the wg.Done() deferred by the initial synchronous call
	Crawl("http://golang.org/", 4, fetcher, wg)
	wg.Wait()
}
Also, don't rely on the map being thread safe.
Here's an approach, again using sync.WaitGroup, but wrapping the fetch in an anonymous goroutine. To make the url map thread safe (so parallel goroutines can't read and write it at the same time), wrap it in a new type that includes a sync.Mutex (the fetchedUrls type in my example), pass it around by pointer so the mutex isn't copied, and use the Lock and Unlock methods while the map is being searched/updated.
type fetchedUrls struct {
	urls map[string]bool
	mux  sync.Mutex
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, used *fetchedUrls, wg *sync.WaitGroup) {
	if depth <= 0 {
		return
	}
	used.mux.Lock()
	if used.urls[url] == false {
		used.urls[url] = true
		wg.Add(1)
		go func() {
			defer wg.Done()
			body, urls, err := fetcher.Fetch(url)
			if err != nil {
				fmt.Println(err)
				return
			}
			fmt.Printf("found: %s %q\n", url, body)
			for _, u := range urls {
				Crawl(u, depth-1, fetcher, used, wg)
			}
		}()
	}
	used.mux.Unlock()
	return
}

func main() {
	wg := &sync.WaitGroup{}
	used := &fetchedUrls{urls: make(map[string]bool)}
	Crawl("https://golang.org/", 4, fetcher, used, wg)
	wg.Wait()
}
Output:
found: https://golang.org/ "The Go Programming Language"
not found: https://golang.org/cmd/
found: https://golang.org/pkg/ "Packages"
found: https://golang.org/pkg/os/ "Package os"
found: https://golang.org/pkg/fmt/ "Package fmt"
Program exited.
I created my two implementations (different concurrency designs) of the same exercise here; they also use a thread-safe map.
playground link