Get number of partitions in AWS Glue for specific range - amazon-web-services

I want to list all the partitions for a given table and get a count of them, but
aws glue get-partitions --database-name ... returns detailed information about each partition, which is not very helpful in this case.
Let's say my table is partitioned by input_data_date and country, and I want to know how many partitions I have for a given day.
I can do something like this:
aws glue get-partitions --database-name MYDB --table-name MYTABLE --expression "input_data_date = '2021-07-09' "
But that still needs some scripting; I was looking for a better and cleaner way using just the AWS CLI.

The AWS CLI supports JMESPath in its --query parameter, and JMESPath has a length() function. Therefore, you can use:
aws glue get-partitions --database-name xx --table-name xx --query 'length(Partitions[])'
That will return the total number of partitions.
If you want to do something more specific ("how many partitions I have for a given day"), you can combine this query with the --expression filter from the question, or use an SDK (e.g. Python with boto3) to process the information.
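For the per-day count, here is a minimal boto3 sketch (using the placeholder database, table, and date from the question); it pushes the day filter into Expression and paginates so tables with many partitions are counted fully:

import boto3

glue = boto3.client("glue")

# Placeholder names from the question -- substitute your own.
paginator = glue.get_paginator("get_partitions")
pages = paginator.paginate(
    DatabaseName="MYDB",
    TableName="MYTABLE",
    Expression="input_data_date = '2021-07-09'",
)

# Sum the partition counts page by page instead of materializing them all.
count = sum(len(page["Partitions"]) for page in pages)
print(count)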

Related

How to programmatically get the list of SageMaker instance types ml.*?

With the AWS CLI ec2 describe-instance-types command I can get a list of all the EC2 instance types, but SageMaker instance types like ml.t3.medium, ml.t3.large, ml.m5.xlarge, etc. are not there.
aws ec2 describe-instance-types --filters "Name=instance-type,Values=ml.*" --query "InstanceTypes[].{Type: InstanceType, MaxENI: NetworkInfo.MaximumNetworkInterfaces, IPv4addr: NetworkInfo.Ipv4AddressesPerInterface}" --output table
# returns no results
I know I can get the list of SageMaker instance types from https://aws.amazon.com/sagemaker/pricing/ but I really want to get it programmatically.
How can I programmatically get the list of instance types supported in SageMaker for a given region?
You can get the list of ml instances with this CLI call (doc):
aws pricing get-products --service-code AmazonSageMaker --filters Type=TERM_MATCH,Field=location,Value="US East (N. Virginia)"
You'll need to filter the results further.
Note that a particular ml instance type might be available for a certain SageMaker feature like training, but not for inference, and available in one region but not another.
If your end goal is to get technical details, you could first find the relevant ml.* instances (maybe even with a regex), then use the EC2 describe-instance-types command to get more details (just strip off the ml. prefix), as sketched below.
You can find relevant Python code in the "Total Cost" section of my notebook here.
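As a rough boto3 sketch of that prefix-stripping step (the ml.* names below are just examples; in practice they would come from the pricing query):

import boto3

ec2 = boto3.client("ec2")

# Example ml.* names -- substitute the output of the pricing query.
ml_types = ["ml.t3.medium", "ml.m5.xlarge"]

# Strip the "ml." prefix to get the underlying EC2 instance type (Python 3.9+).
ec2_types = [t.removeprefix("ml.") for t in ml_types]

resp = ec2.describe_instance_types(InstanceTypes=ec2_types)
for it in resp["InstanceTypes"]:
    print(it["InstanceType"],
          it["VCpuInfo"]["DefaultVCpus"],
          it["MemoryInfo"]["SizeInMiB"])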
Building upon the answer from Gili, using the AWS CLI and jq:
aws --region us-east-1 pricing get-products \
--service-code AmazonSageMaker \
--filters Type=TERM_MATCH,Field=regionCode,Value=eu-north-1 \
| jq -r '.PriceList[] | fromjson | select(.product.productFamily == "ML Instance") | .product.attributes.instanceName' \
| sort \
| uniq
ml.c5.12xlarge
ml.c5.18xlarge
ml.c5.24xlarge
ml.c5.2xlarge
ml.c5.4xlarge
ml.c5.9xlarge
ml.c5.large
...
ml.t3.2xlarge
ml.t3.large
ml.t3.medium
ml.t3.xlarge
It uses:
aws pricing get-products
--region us-east-1 is important because the Pricing API is only available in a few regions.
--filters Type=TERM_MATCH,Field=regionCode,Value=eu-north-1 restricts the listing to products in the eu-north-1 region; the number of products for AmazonSageMaker alone across all regions is huge, so it's better to let AWS filter those out early.
jq is used to further filter the output; it seems it's not possible to filter by productFamily in aws pricing get-products itself, so we do it with jq:
-r removes the quotes from the output.
.PriceList[] iterates over all the prices returned by aws pricing get-products.
fromjson parses each string as JSON (.PriceList is an array of strings).
select(.product.productFamily == "ML Instance") filters out all other products.
.product.attributes.instanceName extracts the instance type from each product.
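If you prefer to stay in Python, here is a minimal boto3 equivalent of the pipeline above (same service code, region filter, and product-family check; the client region mirrors --region us-east-1):

import json
import boto3

# The Pricing API is only available in a few regions, hence us-east-1.
pricing = boto3.client("pricing", region_name="us-east-1")

paginator = pricing.get_paginator("get_products")
pages = paginator.paginate(
    ServiceCode="AmazonSageMaker",
    Filters=[{"Type": "TERM_MATCH", "Field": "regionCode", "Value": "eu-north-1"}],
)

names = set()
for page in pages:
    for entry in page["PriceList"]:  # each entry is a JSON string
        product = json.loads(entry)["product"]
        if product.get("productFamily") == "ML Instance":
            names.add(product["attributes"]["instanceName"])

for name in sorted(names):
    print(name)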

AWS Printing DynamoDB Table Via CLI

I'm trying to find the right command to use in the CLI to print the contents of a table within DynamoDB.
I've tried using the following command but it gives me a "parameter validation failed" error.
aws dynamodb get-item \
--table-name Traffic \
--key file://traffic.json \
--return-consumed-capacity TOTAL
The AWS website is giving me a 403 error at the moment, so I can't search for the solution through the official site.
To get all items in a table, use a Scan operation rather than a GetItem operation (GetItem returns a single item by key). This basic scan works fine with the CLI:
aws dynamodb scan --table-name Work
You can find all valid options here:
https://docs.aws.amazon.com/cli/latest/reference/dynamodb/scan.html
You can also run the Scan API and choose the output format; for example, as plain text:
aws dynamodb scan \
--table-name test \
--output text
If you have a list of keys to fetch in your traffic.json file, then you should use batch-get-item.
If it's a single item you need, then please share the contents of your traffic.json file.
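If you end up scripting this anyway, a small boto3 sketch (table name taken from the question) that prints every item is:

import boto3

dynamodb = boto3.client("dynamodb")

# Scan returns at most 1 MB per call, so paginate to cover the whole table.
paginator = dynamodb.get_paginator("scan")
for page in paginator.paginate(TableName="Traffic"):
    for item in page["Items"]:
        print(item)  # items come back in DynamoDB attribute-value format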

How to get list of available AWS services in a region from boto3 call

I want to use boto3 to get the list of available AWS services in a specific region. Is there any way to do this?
I tried using Session object:
session = boto3.Session(region_name='ap-south-1').get_available_services()
but it gives me all the AWS services. For example, CloudSearch is not present in ap-south-1, but this function still includes it in the output.
Also, I don't want to use the ssm service's get_parameters_by_path function, as I don't want to grant ssm permissions.
Any other way?
To be frank, I reckon your best bet actually is the Systems Manager Parameter Store.
For example, you can easily display a complete list of all available AWS services, sorted into alphabetical order and, for brevity, limited to the first 10.
$ aws ssm get-parameters-by-path \
--path /aws/service/global-infrastructure/services --output json | \
jq '.Parameters[].Name' | sort | head -10
Output:
"/aws/service/global-infrastructure/services/acm"
"/aws/service/global-infrastructure/services/acm-pca"
"/aws/service/global-infrastructure/services/alexaforbusiness"
"/aws/service/global-infrastructure/services/apigateway"
"/aws/service/global-infrastructure/services/application-autoscaling"
"/aws/service/global-infrastructure/services/appmesh"
"/aws/service/global-infrastructure/services/appstream"
"/aws/service/global-infrastructure/services/appsync"
"/aws/service/global-infrastructure/services/athena"
"/aws/service/global-infrastructure/services/autoscaling"
And here's how to get the list of services that are available in a given region, again sorted and limited to the first 10:
$ aws ssm get-parameters-by-path \
--path /aws/service/global-infrastructure/regions/us-east-1/services --output json | \
jq '.Parameters[].Name' | sort | head -10
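For completeness, the same Parameter Store lookup in boto3 could look like this (the region is the one from the question; the path mirrors the CLI call above):

import boto3

ssm = boto3.client("ssm")

# Parameter names end in the service identifier, e.g. .../services/athena.
path = "/aws/service/global-infrastructure/regions/ap-south-1/services"

paginator = ssm.get_paginator("get_parameters_by_path")
services = sorted(
    param["Name"].split("/")[-1]
    for page in paginator.paginate(Path=path)
    for param in page["Parameters"]
)
print(len(services), services[:10])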
But... if you want another way, you might want to try the AWS Price List API.
With the AWS Price List Query API, you can query specific information about AWS services, products, and pricing using an AWS SDK or the AWS CLI.
This can obviously be narrowed down to a specific region. If there's a price, there is a service.
I got this with the code below:
resp = boto3.Session().get_available_regions('cloudsearch')
This gave me the list of all the regions where the CloudSearch service is available.
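Building on that, one way to answer the original question without any ssm permissions is to invert the lookup. Keep in mind this reads botocore's bundled endpoint data rather than querying AWS live, which is also why get_available_services() alone lists everything:

import boto3

session = boto3.Session()
region = "ap-south-1"

# get_available_services()/get_available_regions() use the SDK's local
# endpoint data, so no AWS credentials or permissions are needed -- but
# the data is only as fresh as the installed SDK.
available = sorted(
    service
    for service in session.get_available_services()
    if region in session.get_available_regions(service)
)
print(available)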

Can I limit the results of an aws rds snapshot query to a certain timeframe?

I have used the following query to find all RDS snapshots that were created after a certain date:
aws rds describe-db-snapshots \
--db-instance-identifier db-identifier \
--snapshot-type awsbackup \
--query 'DBSnapshots[?SnapshotCreateTime>=`Date`].{DBSnapshotIdentifier:DBSnapshotIdentifier,SnapshotCreateTime:SnapshotCreateTime} | sort_by(@,&SnapshotCreateTime)' \
--output json
What I am trying to do now, is to limit the result to within two hours of the desired point in time, e.g. >=Date1, but <=Date2.
My approach so far has been to try to add a second condition to the query, like so:
aws rds describe-db-snapshots \
--db-instance-identifier db-identifier \
--snapshot-type awsbackup \
--query 'DBSnapshots[?SnapshotCreateTime>=`Date1`<=`Date2`].{DBSnapshotIdentifier:DBSnapshotIdentifier,SnapshotCreateTime:SnapshotCreateTime} | sort_by(@,&SnapshotCreateTime)' \
--output json
but this results in an empty list being returned.
Is what I am trying to do here even possible, without using jq?
The answer was quite simple in the end: different query conditions can easily be combined with the && operator. So in this case, the solution was:
aws rds describe-db-snapshots \
--db-instance-identifier db-identifier \
--snapshot-type awsbackup \
--query 'DBSnapshots[?SnapshotCreateTime>=`Date1`&&SnapshotCreateTime<=`Date2`].{DBSnapshotIdentifier:DBSnapshotIdentifier,SnapshotCreateTime:SnapshotCreateTime} | sort_by(@,&SnapshotCreateTime)' \
--output json
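And if you do drop into an SDK, the same range filter is straightforward in boto3 (the identifier and two-hour window below are placeholders):

from datetime import datetime, timezone

import boto3

rds = boto3.client("rds")

# Placeholder two-hour window around the desired point in time.
start = datetime(2021, 7, 9, 10, 0, tzinfo=timezone.utc)
end = datetime(2021, 7, 9, 12, 0, tzinfo=timezone.utc)

snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier="db-identifier",
    SnapshotType="awsbackup",
)["DBSnapshots"]

# SnapshotCreateTime is a timezone-aware datetime; snapshots still being
# created may not have it yet, hence the membership check.
matches = sorted(
    (s for s in snapshots
     if "SnapshotCreateTime" in s and start <= s["SnapshotCreateTime"] <= end),
    key=lambda s: s["SnapshotCreateTime"],
)
for s in matches:
    print(s["DBSnapshotIdentifier"], s["SnapshotCreateTime"])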

How can I get the name of the most recent snapshot for an RDS DB instance using the AWS CLI?

Using the AWS CLI, how can I get the most recent snapshot for a particular DB instance?
I can get them through the GUI easily but I'd like to automate it.
You can use the aws rds describe-db-snapshots CLI command to get a list of DB snapshots, and then run a local query using --query to get the latest DB snapshot using the SnapshotCreateTime field.
SnapshotCreateTime -> (timestamp)
Specifies when the snapshot was taken in Coordinated Universal Time (UTC). Changes for the copy when the snapshot is copied.
Something like this:
aws rds describe-db-snapshots \
--db-instance-identifier your-id \
--query "sort_by(DBSnapshots, &SnapshotCreateTime)[-1].{id:DBSnapshotIdentifier,time:SnapshotCreateTime}"
Note that this query sorts snapshots by their ascending SnapshotCreateTime and then simply takes the last one in the list (as dictated by [-1]), which will be the one that was last created.
[Added] if you're looking for snapshots of Aurora DB clusters then you'd have to use describe-db-cluster-snapshots in place of describe-db-snapshots, but otherwise the process is similar: use DBClusterSnapshots and DBClusterSnapshotIdentifier (in place of DBSnapshots and DBSnapshotIdentifier).
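The boto3 version of "latest snapshot" is essentially a one-liner with max() (the instance identifier is the placeholder from the answer above):

import boto3

rds = boto3.client("rds")

snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier="your-id"
)["DBSnapshots"]

# Snapshots still being created may not have SnapshotCreateTime yet.
completed = [s for s in snapshots if "SnapshotCreateTime" in s]
latest = max(completed, key=lambda s: s["SnapshotCreateTime"])
print(latest["DBSnapshotIdentifier"], latest["SnapshotCreateTime"])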