I wrote the following command to scan DynamoDB (please correct me if I'm wrong). I want to get all values in the id column where the sortKey column contains the text prefix|:
aws dynamodb scan \
--table-name ProductTable \
--projection-expression "id" \
--filter-expression 'contains(sortKey,:p)' \
--expression-attribute-values '{":p":{"S":"prefix|"}}'
As a result, it returned a vim-like paged view, but how can I copy or save all of the results?
Thanks
You can output the result of a CLI command to a file using the > or >> operator in the following syntax:
aws dynamodb scan \
--table-name ProductTable \
--projection-expression "id" \
--filter-expression 'contains(sortKey,:p)' \
--expression-attribute-values '{":p":{"S":"prefix|"}}' > output.txt
In this example, output.txt is the name of the file you want to write the result to. It's important to note that when you use the > operator, the output of the command is not displayed in the terminal; it is written to the specified file instead.
It's also important to note that each time you redirect output to a file, the contents of the file are replaced with the output of the most recent command. If you'd prefer to append to the end of the file instead of replacing its contents, use the double operator >> instead of >.
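For example, to append the output of the same scan to the end of output.txt rather than overwriting it:
aws dynamodb scan \
--table-name ProductTable \
--projection-expression "id" \
--filter-expression 'contains(sortKey,:p)' \
--expression-attribute-values '{":p":{"S":"prefix|"}}' >> output.txt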
I'm getting an error when I run a command to extract data to a CSV file using the AWS CLI with jq.
Command:
aws dynamodb scan --table-name MyTable --select ALL_ATTRIBUTES --page-size 500 --max-items 100000 --output json --profile production | jq -r '.Items' | jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ].S])[] | @csv' > export.my-table.csv
Error:
'charmap' codec can't encode characters in position 1-3: character maps to <undefined>
parse error: Unfinished JSON term at EOF at line 5097, column 21
I believe that is a query I wrote previously, and it does not work on nested attributes. You will have to modify it accordingly; see the sketch below.
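If your items are flat and the attributes are plain S (string) or N (number) types, one possible adjustment is the sketch below, which merges the two jq invocations into one. The .N branch and the // "" fallback for missing attributes are assumptions about your data, not part of the original query:
aws dynamodb scan --table-name MyTable --select ALL_ATTRIBUTES --page-size 500 --max-items 100000 --output json --profile production | jq -r '.Items | (.[0] | keys_unsorted) as $keys | $keys, map([.[$keys[]] | (.S // .N // "")])[] | @csv' > export.my-table.csv
Truly nested attributes (M or L types) will still come out blank here; they need to be flattened explicitly before the @csv step.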
I want to see the data available in the Spark Streaming DataFrame, and later I want to apply business operations to that data.
So far I have tried to convert the streaming DataFrame to an RDD. Once it is converted to an RDD, I want to apply a function that transforms the data and also creates a new column with the schema (for a specific message).
dsraw = spark \
.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers", bootstrap_kafka_server) \
.option("subscribe", topic) \
.load() \
.selectExpr("CAST(value AS STRING)")
print "type (df_stream)", type(dsraw)
print "schema (dsraw)", dsraw.printSchema()
from ast import literal_eval  # literal_eval is used below to parse each value string

def show_data_fun(dsraw, epoch_id):
    dsraw.show()
    # parse each row's value string into a Python object
    row_rdd = dsraw.rdd.map(lambda row: literal_eval(row['value']))
    json_data = row_rdd.collect()
    print "From rdd : ", type(json_data)
    print "From rdd : ", json_data[0]
    print "show_data_function_call"
jsonDataQuery = dsraw \
.writeStream \
.foreach(show_data_fun)\
.queryName("df_value")\
.trigger(continuous='1 second')\
.start()
The goal is to print the first JSON message that is in the stream.
From an EC2 terminal, I am trying to list part of an S3 bucket's directory names into an array, excluding the prefix "date=", but I cannot figure out a complete solution.
I've already tried the following code and am getting close:
origin="bucket/path/to/my/directory/"
for path in $(aws s3 ls $origin --profile crossaccount --recursive | awk '{print $4}');
do echo "$path"; done
Note: the directory contains multiple directories like /date=YYYYMMDD/, and all I want returned into an array is the YYYYMMDD values where YYYYMMDD is >= a certain value.
I expect the output to be an array:
YYYYMMDD, YYYYMMDD, YYYYMMDD
actual result is:
path/to/my/directory/date=YYYYMMDD/file#1
path/to/my/directory/date=YYYYMMDD/file#2
path/to/my/directory/date=YYYYMMDD/file#3
https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
path="bucket/path/to/my/directory/date="
for i in $(aws s3 ls $path --profile crossaccount --recursive | awk -F'[=/]' '{if($6>20190000)print $6}');
do python3.6 my_python_program.py $i; done
I used awk. Inside the brackets are the field delimiters = and /, and $6 is the sixth field once the full directory name has been split on them. It gave me the date I needed to feed into my Python program.
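Since you wanted the dates in an array, here is a sketch that collects them into a bash array instead of looping over them directly (same awk expression as above; sort -u removes the duplicates you get from having several files per date):
dates=($(aws s3 ls "$path" --profile crossaccount --recursive | awk -F'[=/]' '{if($6>20190000)print $6}' | sort -u))
echo "${dates[@]}"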
In an AWS CLI JMESPath query whose output is, for example, ["a","a","b","a","b"], how do I extract the unique values to get ["a","b"]?
Unfortunately this is not currently possible in jmespath.
It's not what you asked for but I've used the following:
aws ... | jq -r ".[]" | sort | uniq
This will convert ["a", "a", "b", "a"] to:
a
b
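If you want the result to stay a JSON array rather than one value per line, jq's built-in unique filter does the sorting and deduplication in one step (assuming the command really does emit a JSON array):
aws ... | jq 'unique'
which turns ["a", "a", "b", "a"] into ["a", "b"].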
The closest I've come to "unique values"... is to deduplicate outside of JMESPath (so not really in JMESPath pipelines).
aws ec2 describe-images \
--region us-east-1 \
--filter "Name=architecture,Values=x86_64" \
--query 'Images[].ImageOwnerAlias | join(`"\n"`, @)' \
--output text \
| sort -u
Output:
amazon
aws-marketplace
If you use JMESPath standalone, you'd write things like this.
jp -u -f myjson.json 'Images[].ImageOwnerAlias | join(`"\n"`, @)' | sort -u
The idea is to get jp to spit out a list of values (on separate lines) and then apply all the power of your favorite sorter. The tricky part is to get the list (of course).
I'm trying to list files from a virtual folder in S3 within a specific date range. For example: all the files that were uploaded during the month of February.
I currently run a aws s3 ls command but that gives all the files:
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive --human-readable --summarize > c:\File.txt
How can I get it to list only the files within a given date range?
You could filter the results with a tool like awk:
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive --human-readable --summarize \
| awk -F'[-: ]' '$1 >= 2016 && $2 >= 3 { print }'
Where awk splits each record using the -, :, and space delimiters so you can address the fields as:
$1 - year
$2 - month
$3 - day
$4 - hour
$5 - minute
$6 - second
The aws cli ls command does not support filters, so you will have to bring back all of the results and filter locally.
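For the specific case in the question, only the files from a single month, here is a sketch of the same approach with an exact match on year and month (2016 is just a placeholder; substitute the year you care about):
aws s3 ls s3://Bucket/VirtualFolder/VirtualFolder --recursive --human-readable --summarize \
| awk -F'[-: ]' '$1 == 2016 && $2 == 2 { print }'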
Realizing this question was tagged command-line-interface, I have found that the best way to address non-trivial aws-cli needs is to write a Python script.
Tersest example:
$ python3 -c "import boto3; print(boto3.client('s3').list_buckets()['Buckets'][0])"
Returns: (for me)
{'Name': 'aws-glue-scripts-282302944235-us-west-1', 'CreationDate': datetime.datetime(2019, 8, 22, 0, 40, 5, tzinfo=tzutc())}
That one-liner isn't a profound script, but it can be expounded into one. (Probably with less effort than munging a bash script, much as I love bash.) After looking up a few boto3 calls, you can deduce the rest from equivalent cli commands.
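For comparison, a roughly equivalent CLI call for that one-liner would be the following (the output shape differs slightly from the boto3 dict, e.g. CreationDate comes back as an ISO 8601 string):
aws s3api list-buckets --query 'Buckets[0]' --output json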