Using jmespath and given the below json, how would I filter so only JobNames starting with "analytics" are returned?
For more context, the json was returned by the aws cli command aws glue list-jobs
{
    "JobNames": [
        "analytics-job1",
        "analytics-job2",
        "team2-job"
    ]
}
I tried this:
JobNames[?starts_with(JobNames, `analytics`)]
but it failed with
In function starts_with(), invalid type for value: None, expected one
of: ['string'], received: "null"
Above I extracted just the JMESPath part; the entire AWS CLI command I tried, which failed, is this:
aws glue list-jobs --query '{"as_string": to_string(JobNames[?starts_with(JobNames, `analytics`)])}'
I couldn't test it on list-jobs, but the query part works on list-crawlers; I just replaced CrawlerNames with JobNames.
aws glue list-jobs --query 'JobNames[?starts_with(@, `analytics`) == `true`]'
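For what it's worth, the likely reason the first attempt fails: inside the [? ... ] filter, the expression is evaluated against each element of the list, so JobNames there is looked up on a plain string and comes back null. The current element is written @ in JMESPath. Below is a rough plain-Python equivalent of what the working query selects, using the sample data from the question. filter_job_names is a made-up helper for illustration, not part of the AWS CLI or the jmespath library.

```python
# Plain-Python sketch of what JobNames[?starts_with(@, `analytics`)] selects.
# filter_job_names is a hypothetical helper, not an AWS CLI or jmespath function.

def filter_job_names(response, prefix):
    """Return the entries of response["JobNames"] that start with prefix."""
    # Each element of the list plays the role of `@` in the JMESPath filter.
    return [name for name in response.get("JobNames", []) if name.startswith(prefix)]


# Sample data copied from the question.
response = {"JobNames": ["analytics-job1", "analytics-job2", "team2-job"]}
matches = filter_job_names(response, "analytics")
print(matches)  # ['analytics-job1', 'analytics-job2']
```

This is only a way to check the expected result locally; on the command line the --query expression above does the same filtering.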
I am using AWS Batch with ECS to run a job that needs to send a query to Athena. I use Python boto3 to send the query and then get its status:
start_query_execution: works fine
get_query_execution: returns an error!
When I try to get the query execution, I get the following response:
{'QueryExecution': {'QueryExecutionId': 'XXXX', 'Query': "SELECT * FROM my_table LIMIT 10 ", 'StatementType': 'DML', 'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/query_id.csv'}, 'QueryExecutionContext': {'Database': 'my_database'}, 'Status': {'State': 'FAILED', 'StateChangeReason': '**Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4.**. ; S3 Extended Request ID: ....=)'
I have given the container role all permissions (only for testing):
s3:*
athena:*
glue:*
I face this problem only in the container in AWS Batch: with the same policy and code in a Lambda, it works!
Any help will be appreciated.
For the Athena output location I have been using a bucket folder (prefix), not a file name.
Athena generates the result set itself and names it after its own query execution ID:
'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/'}
If you are not sure which bucket is used for query results, you can check in the query console --> Settings.
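To illustrate the suggestion above, here is a minimal sketch that builds the arguments for start_query_execution with a folder-style output location. to_output_prefix and build_query_args are hypothetical helpers, and the rule "a last path segment containing a dot is a file name" is my own assumption. The query, database and bucket values are the ones from the question.

```python
# Sketch: build start_query_execution arguments with a prefix as OutputLocation.
# to_output_prefix and build_query_args are hypothetical helpers for illustration.

def to_output_prefix(location):
    """Drop a trailing file name (e.g. query_id.csv) so only the S3 prefix remains."""
    if location.endswith("/"):
        return location
    head, _, tail = location.rpartition("/")
    # Assumption: a last segment containing a dot is a file name, not a folder.
    # head == "s3:/" means the last segment is the bucket name itself, so keep it.
    if "." in tail and head != "s3:/":
        return head + "/"
    return location + "/"


def build_query_args(query, database, output_location):
    """Return keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": to_output_prefix(output_location)},
    }


args = build_query_args(
    "SELECT * FROM my_table LIMIT 10",
    "my_database",
    "s3://my_bucket_name/athena-results/query_id.csv",
)
print(args["ResultConfiguration"]["OutputLocation"])

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("athena").start_query_execution(**args)
```

Whether this alone resolves the Access Denied error in the Batch container is not certain; it only shows the output location format described above.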
I am getting an error when trying to automate AWS DataSource creation from S3:
I am running a shell script:
#!/bin/bash
for k in 1 2 3 4 5
do
    aws machinelearning create-data-source-from-s3 --cli-input-json file://data/cfg/dsrc_training_00$k.json
    aws machinelearning create-data-source-from-s3 --cli-input-json file://data/cfg/dsrc_validate_00$k.json
done
and here is an example of the json file it references:
{
    "DataSourceId": "Iris_training_00{k}",
    "DataSourceName": "[DS Iris] training 00{k}",
    "DataSpec": {
        "DataLocationS3": "s3://ml-test-predicto-bucket/shuffled_{k}.csv",
        "DataSchemaLocationS3": "s3://ml-test-predicto-bucket/dsrc_iris.csv.schema",
        "DataRearrangement": {"splitting":{"percentBegin" : 0, "percentEnd" : 70}}
    },
    "ComputeStatistics": true
}
But when I run my script from the command line I get the error:
Parameter validation failed:
Invalid type for parameter DataSpec.DataRearrangement, value: {u'splitting': {u'percentEnd': u'100', u'percentBegin': u'70'}}, type: <type 'dict'>, valid types: <type 'basestring'>
Can someone please help? I have looked at the AWS ML API documentation and I think I am doing everything right, but I can't seem to solve this error. Many thanks!
The DataRearrangement element expects a JSON string, but you are passing a dictionary (a JSON object).
Change:
"DataRearrangement": {"splitting":{"percentBegin" : 0, "percentEnd" : 70}}
to:
"DataRearrangement": "{\"splitting\":{\"percentBegin\":0,\"percentEnd\":70}}"
There seems to be about a hundred AWS products available. The only way to get an authoritative listing of them is to look on the web.
Is there any API that could give me a list of all currently available AWS products, ideally with some metadata about each one (product title, description, what regions and edge locations it's available in, etc)?
You can use the Python libraries Boto3 and Botocore. Here is a code snippet that lists the services; you will have to look at the docs to get the other information you want.
>>> import boto3
>>> session = boto3.Session()
>>> session.get_available_services()
['acm', 'apigateway', 'application-autoscaling', 'appstream', 'autoscaling', 'batch', 'budgets', 'clouddirectory', 'cloudformation', 'cloudfront', 'cloudhsm', 'cloudsearch', 'cloudsearchdomain', 'cloudtrail', 'cloudwatch', 'codebuild', 'codecommit', 'codedeploy', 'codepipeline', 'cognito-identity', 'cognito-idp', 'cognito-sync', 'config', 'cur', 'datapipeline', 'devicefarm', 'directconnect', 'discovery', 'dms', 'ds', 'dynamodb', 'dynamodbstreams', 'ec2', 'ecr', 'ecs', 'efs', 'elasticache', 'elasticbeanstalk', 'elastictranscoder', 'elb', 'elbv2', 'emr', 'es', 'events', 'firehose', 'gamelift', 'glacier', 'health', 'iam', 'importexport', 'inspector', 'iot', 'iot-data', 'kinesis', 'kinesisanalytics', 'kms', 'lambda', 'lex-runtime', 'lightsail', 'logs', 'machinelearning', 'marketplacecommerceanalytics', 'meteringmarketplace', 'opsworks', 'opsworkscm', 'pinpoint', 'polly', 'rds', 'redshift', 'rekognition', 'route53', 'route53domains', 's3', 'sdb', 'servicecatalog', 'ses', 'shield', 'sms', 'snowball', 'sns', 'sqs', 'ssm', 'stepfunctions', 'storagegateway', 'sts', 'support', 'swf', 'waf', 'waf-regional', 'workspaces', 'xray']
>>> for item, service in enumerate(session.get_available_services(), 1):
...     print(item, service)
...
1 acm
2 apigateway
3 application-autoscaling
4 appstream
5 autoscaling
6 batch
7 budgets
8 clouddirectory
9 cloudformation
10 cloudfront
11 cloudhsm
12 cloudsearch
13 cloudsearchdomain
14 cloudtrail
15 cloudwatch
16 codebuild
17 codecommit
18 codedeploy
19 codepipeline
20 cognito-identity
21 cognito-idp
22 cognito-sync
23 config
24 cur
25 datapipeline
26 devicefarm
27 directconnect
28 discovery
29 dms
30 ds
31 dynamodb
32 dynamodbstreams
33 ec2
34 ecr
35 ecs
36 efs
37 elasticache
38 elasticbeanstalk
39 elastictranscoder
40 elb
41 elbv2
42 emr
43 es
44 events
45 firehose
46 gamelift
47 glacier
48 health
49 iam
50 importexport
51 inspector
52 iot
53 iot-data
54 kinesis
55 kinesisanalytics
56 kms
57 lambda
58 lex-runtime
59 lightsail
60 logs
61 machinelearning
62 marketplacecommerceanalytics
63 meteringmarketplace
64 opsworks
65 opsworkscm
66 pinpoint
67 polly
68 rds
69 redshift
70 rekognition
71 route53
72 route53domains
73 s3
74 sdb
75 servicecatalog
76 ses
77 shield
78 sms
79 snowball
80 sns
81 sqs
82 ssm
83 stepfunctions
84 storagegateway
85 sts
86 support
87 swf
88 waf
89 waf-regional
90 workspaces
91 xray
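On the "what regions" part of the question: the same Session object also has get_available_regions(service_name), so the two calls can be combined. A small sketch follows; services_with_regions is a made-up helper name. Note that both calls read the endpoint data shipped with the installed botocore, so the result is only as current as the library version.

```python
# Sketch: pair each service name with the regions the session knows for it.
# services_with_regions is a hypothetical helper; pass it a boto3.Session().

def services_with_regions(session):
    """Map service name -> sorted list of region names known to the session."""
    return {
        name: sorted(session.get_available_regions(name))
        for name in session.get_available_services()
    }


# Typical use (requires boto3 to be installed):
#   import boto3
#   for name, regions in services_with_regions(boto3.Session()).items():
#       print(name, regions)
```

Global services may come back with an empty region list here, so treat an empty list as "no regional endpoints listed" rather than "not available".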
One way is to use the AWS command line interface to get the list of available services, and then use their corresponding describe or list commands to see what is configured or available.
This can also be done using the SSM Parameter Store public parameters, which return the full service/product name.
Below is sample AWS Lambda code:
import json
import boto3

def lambda_handler(event, context):
    service_list = []
    ssmClient = boto3.client("ssm", region_name = "us-east-1")
    # First page of service paths
    list_service_path = ssmClient.get_parameters_by_path(
        Path = "/aws/service/global-infrastructure/services"
    )
    if len(list_service_path["Parameters"]) > 0:
        for pathData in list_service_path["Parameters"]:
            # Look up the parameters under each service path and keep the first value
            list_service_names = ssmClient.get_parameters_by_path(
                Path = pathData["Name"]
            )
            service_list.append(list_service_names["Parameters"][0]["Value"])
    # Keep fetching pages for as long as a NextToken is returned
    if "NextToken" in list_service_path:
        NextToken = list_service_path["NextToken"]
        while True:
            list_service_path = ssmClient.get_parameters_by_path(
                Path = "/aws/service/global-infrastructure/services",
                NextToken = NextToken
            )
            if len(list_service_path["Parameters"]) > 0:
                for pathData in list_service_path["Parameters"]:
                    list_service_names = ssmClient.get_parameters_by_path(
                        Path = pathData["Name"]
                    )
                    service_list.append(list_service_names["Parameters"][0]["Value"])
            if "NextToken" in list_service_path:
                NextToken = list_service_path["NextToken"]
            else:
                break
    print(len(service_list))
    service_list.sort(key=lambda x:(not x.islower(), x))
    return service_list
Sample output:
"AWS Data Exchange",
"AWS Data Pipeline",
"AWS DataSync",
"AWS Database Migration Service",
"AWS DeepComposer",
"AWS DeepLens",
"AWS DeepRacer",
"AWS Device Farm",
"AWS Direct Connect",
"AWS Directory Service",
"AWS Elastic Beanstalk",
"AWS Elemental MediaStore Data Plane",
"AWS Elemental MediaTailor",
"AWS EventBridge Schemas",
"Amazon CloudFront",
"Amazon CloudSearch",
"Amazon CloudWatch",
"Amazon CloudWatch Application Insights",
"Amazon CloudWatch Events",
"Amazon CloudWatch Evidently",
"Amazon CloudWatch Logs",
"Amazon CloudWatch Synthetics",
"Amazon CodeGuru",
Hope this helps.
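The manual NextToken loop above could probably be shortened with a boto3 paginator. A sketch under two assumptions that I have not verified against every service: that each service path has a longName parameter below it (for example .../services/athena/longName), and that the caller passes in an SSM client. list_service_long_names is a hypothetical helper name.

```python
# Sketch: same idea as the Lambda above, using a paginator instead of NextToken.
# Assumption: every service path has a "<path>/longName" public parameter.

SERVICES_PATH = "/aws/service/global-infrastructure/services"


def list_service_long_names(ssm_client):
    """Return the sorted long names of all services found under SERVICES_PATH."""
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    names = []
    for page in paginator.paginate(Path=SERVICES_PATH):
        for param in page["Parameters"]:
            # param["Name"] is the per-service path, e.g. SERVICES_PATH + "/athena"
            long_name = ssm_client.get_parameter(Name=param["Name"] + "/longName")
            names.append(long_name["Parameter"]["Value"])
    return sorted(names)


# Typical use (requires boto3 and AWS credentials):
#   import boto3
#   print(list_service_long_names(boto3.client("ssm", region_name="us-east-1")))
```

This still makes one extra call per service, so it is not faster than the original; it only removes the duplicated paging code.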
Interestingly enough, I suspect the most complete source for this information (at very fine level of detail) is the Price List API.
For example:
To find a list of all available offer files, download the offer index file. Note what it provides:
Offer index file – A JSON file that lists the supported AWS services, with a URL for each offer file where you can download pricing details. The file also includes metadata about the offer index file itself, URLs for service offer files, and URLs for regional offer index files.
http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/price-changes.html
In turn, the individual service files detail all of the pricing information for all possible service elements.
One particularly useful example is the case of EC2, the various instance type attributes are provided here among the pricing data -- you'll find things like processor model, clock speed, number of CPUs, etc., detailed.
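As a rough illustration of reading the offer index file, here is a short Python sketch. The URL and the field names ("offers", "currentVersionUrl") are what I believe the published index uses, but treat them as assumptions and check them against the documentation linked above. list_offers and fetch_offer_index are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Assumption: this is the public offer index URL; verify it in the Price List docs.
OFFER_INDEX_URL = "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/index.json"


def list_offers(index):
    """Return sorted (offer_code, current_version_url) pairs from a parsed offer index."""
    # Assumption: the index has an "offers" object keyed by offer code.
    offers = index.get("offers", {})
    return sorted(
        (code, entry.get("currentVersionUrl")) for code, entry in offers.items()
    )


def fetch_offer_index(url=OFFER_INDEX_URL):
    """Download and parse the offer index (network access required)."""
    with urlopen(url) as response:
        return json.load(response)


# Typical use:
#   for code, url in list_offers(fetch_offer_index()):
#       print(code, url)
```

The offer codes are billing identifiers rather than marketing names, so some mapping to product titles may still be needed.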
Here is a quick Perl script that gets the data by scraping the HTML of the /products page. It produces a nice JSON data set.
#!/usr/bin/perl
#
# This script simply pieces together a JSON file of the available AWS services.
#
use v5.16.1;
use strict;
use warnings;
use JSON;
my ($category,%data, %opts, $marker);
my $count = 1;
# Fetch the products page, one array element per line of HTML
my @foo = `curl https://aws.amazon.com/products/`;
foreach my $line (@foo) {
    # A category heading: remember it for the service links that follow
    if ($line =~ /<h6> <a href.*?>(.*?)<i class/) {
        $category = $1;
        next;
    }
    # A service link: capture the URL slug, the name and the description
    if ($line =~ /^\s*<a href="https:\/\/aws.amazon.com\/.*?\/?(.*?)\/\?nc2.*?>(.*?)<span>(.*?)<\/span/) {
        $data{category}{$category}{services}{$1}{name} = $2;
        $data{category}{$category}{services}{$1}{description} = $3;
    }
}
my $json = encode_json \%data;
say $json;
exit;
Make sure you have the Perl JSON module installed. Usage:
script_name.pl | python -m json.tool > your_json_file.json
Example output:
"Storage": {
"services": {
"ebs": {
"description": "EC2 Block Storage Volumes",
"name": "Amazon Elastic Block Store (EBS)"
},
"efs": {
"description": "Fully Managed File System for EC2",
"name": "Amazon Elastic File System (EFS)"
},
"glacier": {
"description": "Low-cost Archive Storage in the Cloud",
"name": "Amazon Glacier"
},
It will work until they change that page :)
I'm not sure this is what you need, since it lists only the products to which the caller has access, but you can use the search_products() API in boto3, or searchProducts in the JavaScript SDK, to list products. Assume a role and then call this API. For reference:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/servicecatalog.html#ServiceCatalog.Client.search_products
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/ServiceCatalog.html#searchProducts-property