AWS CLI: Issue with "delete-item" request

I am having issues getting a particular delete-item request to work through the AWS CLI.
Here is what I am trying to do:
Perform a scan operation on my DynamoDB table to return all results that match a filter expression on a field that ISN'T the partition key
For all items that match this query, delete them from the table
Here is the command I am trying to run:
aws dynamodb scan \
--filter-expression "EnvironmentGroup = :EnvironmentGroup" \
--expression-attribute-values '{":EnvironmentGroup":{"S":"deleteThisGroup"}}' \
--table-name "MyTable"
--query "Items[*]" \
# use jq to get each item on its own line
| jq --compact-output '.[]' \
# replace newlines with null terminated so
# we can tell xargs to ignore special characters
| tr '\n' '\0' \
| xargs -0 -t -I keyItem \
# use the whole item as the key to delete (dynamo keys *are* dynamo items)
aws dynamodb delete-item --table-name "MyTable" --key=keyItem
So in the above example, I want to perform a Scan on the MyTable table and return all items that have the EnvironmentGroup field set to deleteThisGroup. I then want each of these items to be deleted from the table.
This isn't working for me. If I take out the delete-item line, the command works and returns a list of all the items, but if I add the delete-item back in, I get Error parsing parameter '--key': Expected: '=', received: 'EOF' for input: keyItem
What am I doing wrong here?

delete-item deletes a single item per call.
If you want to delete multiple items, use batch-write-item.
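For reference, a minimal sketch of that scan-then-batch-delete flow, assuming the table's only key attribute is a partition key named Id (a hypothetical name) and that fewer than 25 items match; batch-write-item accepts at most 25 requests per call, so chunk the input otherwise:
# Scan for matching items, projecting only the (assumed) key attribute,
# then wrap each returned key in a DeleteRequest for batch-write-item.
aws dynamodb scan \
--table-name "MyTable" \
--filter-expression "EnvironmentGroup = :EnvironmentGroup" \
--expression-attribute-values '{":EnvironmentGroup":{"S":"deleteThisGroup"}}' \
--projection-expression "Id" \
--query "Items[*]" --output json \
| jq '{ "MyTable": [ .[] | { DeleteRequest: { Key: . } } ] }' \
> request-items.json
aws dynamodb batch-write-item --request-items file://request-items.json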

Related

AWS DynamoDB Table KeySchema RANGE key

I want to create a DynamoDB table which contains only one order entity with the following attributes:
OrderId
OrderStatus
Price
Access patterns I try to achieve:
Get single order by id
Filter orders by status
The design I have in mind for the above access patterns is a simple primary key consisting of OrderId, plus one local secondary index with a composite primary key consisting of OrderId and OrderStatus. The AWS CLI command for achieving this:
aws dynamodb create-table \
--table-name Order \
--attribute-definitions \
AttributeName=OrderId,AttributeType=S \
AttributeName=OrderStatus,AttributeType=S \
--key-schema \
AttributeName=OrderId,KeyType=HASH \
--local-secondary-indexes \
"[{\"IndexName\": \"OrderStatusIndex\",
\"KeySchema\":[{\"AttributeName\": \"OrderId\",\"KeyType\":\"HASH\"},
{\"AttributeName\":\"OrderStatus\",\"KeyType\":\"RANGE\"}],
\"Projection\":{\"ProjectionType\":\"INCLUDE\", \"NonKeyAttributes\":[\"Price\"]}}]" \
--provisioned-throughput \
ReadCapacityUnits=1,WriteCapacityUnits=1 \
--table-class STANDARD
When running this command, I get:
An error occurred (ValidationException) when calling the CreateTable
operation: One or more parameter values were invalid: Table KeySchema
does not have a range key, which is required when specifying a
LocalSecondaryIndex
How can I get rid of this error while sticking to my design? Or, if that is not possible, how should I design my table in this case?
By definition, a Local Secondary Index has the same partition key as the table but a different sort (range) key, which means the base table itself must already have a composite (partition + sort) key.
To use a partition key other than the table's own, you'll need a Global Secondary Index.
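For what it's worth, a minimal sketch of that approach: keep OrderId as the table's simple primary key and serve the "Filter orders by status" pattern from a Global Secondary Index keyed on OrderStatus. The index name and throughput values below are illustrative, not from the original question:
aws dynamodb create-table \
--table-name Order \
--attribute-definitions \
AttributeName=OrderId,AttributeType=S \
AttributeName=OrderStatus,AttributeType=S \
--key-schema \
AttributeName=OrderId,KeyType=HASH \
--global-secondary-indexes \
"[{\"IndexName\": \"OrderStatusIndex\",
\"KeySchema\":[{\"AttributeName\":\"OrderStatus\",\"KeyType\":\"HASH\"}],
\"Projection\":{\"ProjectionType\":\"INCLUDE\", \"NonKeyAttributes\":[\"Price\"]},
\"ProvisionedThroughput\":{\"ReadCapacityUnits\":1,\"WriteCapacityUnits\":1}}]" \
--provisioned-throughput \
ReadCapacityUnits=1,WriteCapacityUnits=1 \
--table-class STANDARD
"Get single order by id" then stays a GetItem on OrderId, and "Filter orders by status" becomes a Query against OrderStatusIndex.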

How can I find the size of all tables in AWS DynamoDB?

I have about 50 tables in DynamoDB and I'm looking for a way to find the size of all of them.
aws dynamodb describe-table --table-name [table name]
I know the above command returns TableSizeBytes, but is this the only way to get a table's size?
Do I have to run this command for every table?
Also, what is the cost of running this command?
Just write a script and loop over all your tables. As already stated, there's no cost for running the command.
For example:
#!/usr/bin/env bash
YOUR_DYNAMODB_TABLES=( "table1" "table2" "table3" )
TOTAL_SIZE=0
for TABLE in "${YOUR_DYNAMODB_TABLES[@]}"
do
SIZE=$(aws dynamodb describe-table --table-name "$TABLE" | jq '.Table.TableSizeBytes')
TOTAL_SIZE=$((TOTAL_SIZE + SIZE))
done
echo $TOTAL_SIZE
The DescribeTable API is free to invoke on DynamoDB. Your only option is to iterate through all the tables and sum up the values, but I don't see why that would be an issue.
If you don't want to list all tables by hand, here's a one liner, just for fun:
aws dynamodb list-tables --region us-east-1 | \
jq -r '.TableNames[]' | \
xargs -L1 -I'{}' bash -c 'aws dynamodb describe-table --table-name {} --region us-east-1 | jq -r ".Table.TableSizeBytes"' | \
awk '{S+=$1} END {print S}'

How to export Google spanner query results to .csv or google sheets?

I am new to Google Spanner. I have run a query that returned about 50k rows of data, and I want to export that result set to my local machine as a .csv file or into a Google Sheet. Previously I used TOAD, which has an export button, but here I do not see any such option. Any suggestions, please.
The gcloud spanner databases execute-sql command allows you to run SQL statements on the command line and redirect output to a file.
The --format=csv global argument should output in CSV.
https://cloud.google.com/spanner/docs/gcloud-spanner
https://cloud.google.com/sdk/gcloud/reference/
Unfortunately, gcloud spanner databases execute-sql is not quite compatible with --format=csv because of the way the data is laid out under the hood (an array instead of a map). It's much less pretty, but this works:
SQL_STRING='select * from your_table'
gcloud spanner databases execute-sql [YOURDB] --instance [YOURINSTANCE] \
--sql="$SQL_STRING" --format json > data.json
jq -r '.metadata.rowType.fields[].name' data.json | tr '\n' ',' > data.csv
echo "" >> data.csv
jq -r '.rows[] | @csv' data.json >> data.csv
This dumps the query in json form to data.json, then writes the column names to the CSV, followed by a line feed, and finally the row contents. As a bonus, jq is installed by default on cloudshell, so this shouldn't carry any extra dependencies there.
As @redpandacurios stated, you can use the gcloud spanner databases execute-sql CLI command to achieve this, though without the --format csv option, as it causes a "Format [csv] requires a non-empty projection." error on gcloud v286.0.0.
This does not produce the projection error:
gcloud spanner databases execute-sql \
[DATABASE] \
--instance [INSTANCE] \
--sql "<YOUR SQL>" \
>output.csv
But you get an output formatted as:
<column1> <column2>
<data1> <data1>
<data2> <data2>
...
<dataN> <dataN>
So not quite csv, but whitespace separated. If you want JSON, use --format json >output.json in place of the last line.
To get CSV it seems you may need to convert from JSON to CSV as stated in one of the other answers.
You could use a number of standard database tools with Google Cloud Spanner using a JDBC driver.
Have a look at this article: https://www.googlecloudspanner.com/2017/10/using-standard-database-tools-with.html
Toad is not included as an example, and I don't know if Toad supports dynamic loading of JDBC drivers and connecting to any generic JDBC database. If not, you could try one of the other tools listed in the article. Most of them would also include an export function.
Others have mentioned using --format "csv" but getting the error Format [csv] requires a non-empty projection.
I believe I discovered how to specify projections that will get --format csv to work as expected. An example:
gcloud spanner databases execute-sql [DATABASE] --instance [INSTANCE] \
--sql "select c1, c2 from t1" \
--flatten "rows" \
--format "csv(rows[0]:label=c1, rows[1]:label=c2)"
rows is the actual field name returned by execute-sql, and it is what we need to transform in the projection.
I made it with awk only; my gcloud produces "text" output by default, where values contain no whitespace and are separated by tabs:
gcloud spanner databases execute-sql \
[DATABASE] \
--instance [INSTANCE] \
--sql "<YOUR SQL>" \
| awk '{print gensub(/[[:space:]]{1,}/,",","g",$0)}' \
> output.csv
For key=value output (useful when there are many columns) I use this awk filter instead, which captures the column names from the first row and then combines them with the values:
awk 'NR==1 {split($0,columns); next} {split ($0,values); for (i=1; i<=NF; i++) printf ("row %d: %s=%s\n", NR-1, columns[i], values[i]); }'
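For context, a usage sketch piping the same execute-sql call as above (same placeholders) through that filter:
gcloud spanner databases execute-sql \
[DATABASE] \
--instance [INSTANCE] \
--sql "<YOUR SQL>" \
| awk 'NR==1 {split($0,columns); next} {split($0,values); for (i=1; i<=NF; i++) printf ("row %d: %s=%s\n", NR-1, columns[i], values[i]); }' \
> output.txt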

Export a DynamoDB table as CSV through AWS CLI (without using pipeline)

I am new to AWS CLI and I am trying to export my DynamoDB table in CSV format so that I can import it directly into PostgreSQL. Is there a way to do that using AWS CLI?
I came across this command: aws dynamodb scan --table-name <table-name> - but this does not provide an option for CSV export.
With this command, I can see the output in my terminal but I am not sure how to write it into a file.
If all items have the same attributes, e.g. id and name both of which are strings, then run:
aws dynamodb scan \
--table-name mytable \
--query "Items[*].[id.S,name.S]" \
--output text
That gives tab-separated output. You can redirect it to a file using > output.txt, and then easily convert the tabs into commas for CSV.
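For example, a minimal sketch of that redirect-and-convert step (note it does not handle values that themselves contain tabs or commas):
aws dynamodb scan \
--table-name mytable \
--query "Items[*].[id.S,name.S]" \
--output text \
| tr '\t' ',' > output.csv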
Note that you may need to paginate per the scan documentation:
If the total number of scanned items exceeds the maximum dataset size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria.
Another option is the DynamoDBtoCSV project at github.
For local DynamoDB:
aws dynamodb scan --table-name AOP --region us-east-1 \
--endpoint-url http://localhost:8000 --output json > /home/ohelig/Desktop/a.json
For DynamoDB:
aws dynamodb scan --table-name AOP --region us-east-1 --output json > /home/ohelig/Desktop/a.json
Then convert the JSON to CSV or whatever format you need.
I have modified the above answer to make it clearer.
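For the "convert JSON to CSV" step, one option is jq, assuming every item has the same two string attributes, here called id and name (hypothetical names):
jq -r '.Items[] | [.id.S, .name.S] | @csv' /home/ohelig/Desktop/a.json > /home/ohelig/Desktop/a.csv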
A better way to do a full export of all columns without listing them out is described at Dynamo db export to csv
basically
aws dynamodb scan --table-name my-table --select ALL_ATTRIBUTES --page-size 500 --max-items 100000 --output json | jq -r '.Items' | jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ].S])[] | #csv' > export.my-table.csv
You can use jq to convert the JSON output given by the AWS CLI to CSV:
aws dynamodb scan --table-name mytable --query "Items[*].[id.S,name.S]" --output json | jq -r '.[] | #csv' > dump.csv
You can use jq to convert json into csv
aws dynamodb query \
--table-name <table-name> \
--index-name <index-name> \
--select SPECIFIC_ATTRIBUTES \
--projection-expression "attributes1, attributes2,..." \
--key-condition-expression "#index1 = :index1 AND #index2 = :index2" \
--expression-attribute-names '{"#index1": "index1","#index2": "index2"}' \
--expression-attribute-values '{":index1": {"S":"key1"},":index2": {"S":"key2"}}' \
--output json | jq -r '.Items' | jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ][]?])[] | #csv' > output.csv
But be careful: if items do not all have the same attributes, the columns will be misaligned and the output will be wrong.

How to list only the EC2 instances in a given CloudFormation stack?

What can I use for $QUERY in the command below that meets the following criteria:
aws ec2 describe-instances --query $QUERY
Only prints instances with an aws:cloudformation:stack-name tag equal to test-stack.
Only prints the InstanceId property for each instance.
Doesn't resort to piping, for loops, or other shell fanciness.
There are a few parameters to use here:
Querying
--query (docs) for retrieving only the InstanceId
Filtering by stack-name tag
--filter (docs) for excluding the instances not tagged with the stack's name
tag-key - The key of a tag assigned to the resource. This filter is
independent of the tag-value filter. For example, if you use both the
filter "tag-key=Purpose" and the filter "tag-value=X", you get any
resources assigned both the tag key Purpose (regardless of what the
tag's value is), and the tag value X (regardless of what the tag's key
is). If you want to list only resources where Purpose is X, see the
tag:key=value filter.
tag-value - The value of a tag assigned to
the resource. This filter is independent of the tag-key filter.
Formatting
--output (docs) for returning only the values you queried for (so no quotes or json/table fluff)
The text format organizes the AWS CLI's output into tab-delimited
lines. It works well with traditional Unix text tools such as grep,
sed, and awk, as well as Windows PowerShell.
Using those parameters like this:
aws ec2 describe-instances \
--query "Reservations[*].Instances[*].InstanceId[]" \
--filters "Name=tag-key,Values=aws:cloudformation:stack-name" "Name=tag-value,Values=test-stack" \
--output=text
Returns:
i-sd64f52a i-das5d64a i-sad56d4
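As the quoted filter documentation hints, you can also match the key and value together with a tag:<key> filter, which avoids catching instances that merely have the key (with some other value) or the value (under some other key):
aws ec2 describe-instances \
--query "Reservations[*].Instances[*].InstanceId[]" \
--filters "Name=tag:aws:cloudformation:stack-name,Values=test-stack" \
--output=text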