Do any AWS API scripts exist in a repository somewhere? - amazon-web-services

I want to start 10 instances, get their instance IDs, and get their private IP addresses.
I know this can be done using the AWS CLI, so I'm wondering if any such scripts have already been written so I don't have to reinvent the wheel.
Thanks

I recommend using Python and the boto package for this kind of automation. Python is clearer than bash. You can use the following page as a starting point: http://boto.readthedocs.org/en/latest/ec2_tut.html
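For a flavor of what that looks like, here is a minimal sketch using boto (assuming your AWS credentials are already set in environment variables or a boto config file, and taking us-east-1 as an example region):

import boto.ec2

# Credentials are picked up from the environment or ~/.boto
conn = boto.ec2.connect_to_region('us-east-1')

# get_all_instances() returns Reservation objects; print the instance ID
# and private IP of every instance you can see
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print instance.id, instance.private_ip_address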

On the off chance that someone in the future comes across my question, I thought I'd give my (somewhat) final solution.
Using Python and the boto package that was suggested, I ended up with the following Python script.
It's pretty well commented, but feel free to ask if you have any questions.
import boto
import time
import sys
IMAGE = 'ami-xxxxxxxx'
KEY_NAME = 'xxxxx'
INSTANCE_TYPE = 't1.micro'
SECURITY_GROUPS = ['xxxxxx'] # If multiple, separate by commas
COUNT = 2 #number of servers to start
private_dns = [] # will be populated with private dns of each instance
print 'Connecting to AWS'
conn = boto.connect_ec2()
print 'Starting instances'
#start instance
reservation = conn.run_instances(IMAGE, instance_type=INSTANCE_TYPE, key_name=KEY_NAME, security_groups=SECURITY_GROUPS, min_count=COUNT, max_count=COUNT)#, dry_run=True)
#print reservation #debug
print 'Waiting for instances to start'
# ONLY CHECKS IF RUNNING, MAY NOT BE SSH READY
for instance in reservation.instances: # doing this for every instance we started
    while not instance.update() == 'running': # while it's not running (probably 'pending')
        print '.', # trailing comma is intentional to print on same line
        sys.stdout.flush() # make the thing print immediately instead of buffering
        time.sleep(2) # let the instance start up
    print 'Done\n'
for instance in reservation.instances:
    instance.add_tag("Name", "Hadoop Ecosystem") # tag the instance
    private_dns.append(instance.private_dns_name) # adding the private DNS name to the list
    print instance, 'is ready at', instance.private_dns_name # print to console
print private_dns
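If someone is doing this today, the same flow with the newer boto3 SDK would be roughly the sketch below. The AMI, key pair, security group, and region values are placeholders or assumptions, and it collects the private IP addresses the original question asked about rather than the private DNS names:

import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')  # region is an assumption

# Placeholders: substitute your own AMI, key pair and security group IDs
instances = ec2.create_instances(
    ImageId='ami-xxxxxxxx',
    KeyName='xxxxx',
    InstanceType='t2.micro',
    SecurityGroupIds=['sg-xxxxxxxx'],
    MinCount=10,
    MaxCount=10,
)

# Wait until each instance is running, then print its ID and private IP
for instance in instances:
    instance.wait_until_running()
    instance.reload()  # refresh attributes such as the assigned private IP
    print(instance.id, instance.private_ip_address)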

Related

SageMaker PyTorch estimator.fit freezes when running in local mode from EC2

I am trying to train a PyTorch model through SageMaker. I am running a script main.py (of which I have posted a minimum working example below) which calls a PyTorch Estimator. The code for training my model is saved as a separate script, train.py, which is called by the entry_point parameter of the Estimator. These scripts are hosted on an EC2 instance in the same AWS region as my SageMaker domain.
When I try running this with instance_type = "ml.m5.4xlarge", it works ok. However, I am unable to debug any problems in train.py. Any bugs in that file simply give me the error: 'AlgorithmError: ExecuteUserScriptError', and will not allow me to set breakpoint() lines in train.py (encountering a breakpoint throws the above error).
Instead I am trying to run in local mode, which I believe does allow for breakpoints. However, when I reach estimator.fit(inputs), it hangs on that line indefinitely, giving no output. Any print statements that I put at the start of the main function in train.py are not reached. This is true no matter what code I put in train.py. It also did not throw an error when I had an illegal underscore in the base_job_name parameter of the estimator, which suggests that it does not even create the estimator instance.
Below is a minimum example which replicates the issue on my instance. Any help would be appreciated.
### File structure
main.py
customcode/
|
|_ train.py
### main.py
import sagemaker
from sagemaker.pytorch import PyTorch
import boto3

try:
    # When running on Studio.
    sess = sagemaker.session.Session()
    bucket = sess.default_bucket()
    role = sagemaker.get_execution_role()
except ValueError:
    # When running from EC2 or local machine.
    print('Performing manual setup.')
    bucket = 'MY-BUCKET-NAME'
    region = 'us-east-2'
    role = 'arn:aws:iam::MY-ACCOUNT-NUMBER:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXXXX'
    iam = boto3.client("iam")
    sagemaker_client = boto3.client("sagemaker")
    boto3.setup_default_session(region_name=region, profile_name="default")
    sess = sagemaker.Session(sagemaker_client=sagemaker_client, default_bucket=bucket)

hyperparameters = {'epochs': 10}
inputs = {'data': f's3://{bucket}/features'}
train_instance_type = 'local'

hosted_estimator = PyTorch(
    source_dir='customcode',
    entry_point='train.py',
    instance_type=train_instance_type,
    instance_count=1,
    hyperparameters=hyperparameters,
    role=role,
    base_job_name='mwe-train',
    framework_version='1.12',
    py_version='py38',
    input_mode='FastFile',
)

hosted_estimator.fit(inputs)  # This is the line that freezes
### train.py
def main():
    breakpoint()  # Throws an error in non-local mode.
    return

if __name__ == '__main__':
    print('Reached')  # Never reached in local mode.
    main()
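One thing worth checking before digging deeper (not part of the original question, just a hedge): local mode runs the training job in a Docker container, so estimator.fit() depends on Docker and docker-compose being installed and usable by your user on the EC2 host. A quick sanity check might look like this:

import shutil
import subprocess

# Local mode launches the training container with Docker, so both tools
# need to be on PATH for the machine that calls estimator.fit().
for tool in ('docker', 'docker-compose'):
    path = shutil.which(tool)
    print(tool, '->', path if path else 'NOT FOUND')

# Also confirm the current user can actually talk to the Docker daemon.
subprocess.run(['docker', 'info'], check=False)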

Retrieving results from Mturk Sandbox

I'm working on retrieving my HIT results from my local computer. I followed the template of get_results.py, entered my key_id and access_key correctly, and installed xmltodict, but I still got an error message. Could anyone help me figure out why? Here is my HIT address in case anyone needs the format of my HIT: https://workersandbox.mturk.com/mturk/preview?groupId=3MKP0VNPM2VVY0K5UTNZX9OO9Q8RJE
import boto3

# Sandbox endpoint from the sample template; it must be defined before it is used below
MTURK_SANDBOX = 'https://mturk-requester-sandbox.us-east-1.amazonaws.com'

mturk = boto3.client('mturk',
                     aws_access_key_id="PASTE_YOUR_IAM_USER_ACCESS_KEY",
                     aws_secret_access_key="PASTE_YOUR_IAM_USER_SECRET_KEY",
                     region_name='us-east-1',
                     endpoint_url=MTURK_SANDBOX
                     )

# You will need the following library
# to help parse the XML answers supplied from MTurk.
# Install it in your local environment with
#   pip install xmltodict
import xmltodict

# Use the hit_id previously created
hit_id = 'PASTE_IN_YOUR_HIT_ID'

# We are only publishing this task to one Worker,
# so we will get back an array with one item if it has been completed
worker_results = mturk.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=['Submitted'])
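For reference, once list_assignments_for_hit returns, each assignment's answers come back as an XML string, which is what xmltodict is for. A rough sketch of the parsing step (the field names depend on how your HIT's question form was defined):

for assignment in worker_results['Assignments']:
    worker_id = assignment['WorkerId']
    answer_xml = xmltodict.parse(assignment['Answer'])
    # QuestionFormAnswers.Answer is a single dict for one field, a list for several
    answers = answer_xml['QuestionFormAnswers']['Answer']
    if not isinstance(answers, list):
        answers = [answers]
    for answer in answers:
        print(worker_id, answer['QuestionIdentifier'], answer.get('FreeText'))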

How to get the opened port (firewall) of an instance in GCP using gcloud?

For example, I have an instance named test, and I need to know which ports are open for it. How do I do that on the command line using gcloud?
I'm not asking about the ports open inside the instance, but about the firewall rules attached to it.
Using the Cloud Console, you can list your VM instances. If you click the vertical "three dots" menu for an instance, you will find an entry called "View network details". The panel it opens appears to show all of the firewall rules that apply to that VM instance.
It is possible to see the firewall rules associated with an instance from Cloud Shell, but it takes two steps. First, run the following command to get the instance details, which include the network (firewall) tags:
gcloud compute instances describe instance-name
In the output, you will see the instance's firewall tags, like the following:
[screenshot: output of the above command]
Then run the following command to see which firewall rules those tags are attached to:
gcloud compute firewall-rules list --format="table(name,network,
direction,priority,sourceRanges.list():label=SRC_RANGES,
destinationRanges.list():label=DEST_RANGES,
allowed[].map().firewall_rule().list():label=ALLOW,
denied[].map().firewall_rule().list():label=DENY,
sourceTags.list():label=SRC_TAGS,
sourceServiceAccounts.list():label=SRC_SVC_ACCT,
targetTags.list():label=TARGET_TAGS,
targetServiceAccounts.list():label=TARGET_SVC_ACCT,
disabled
)"
It will give output like the following:
[screenshot: output of the above command]
The Cloud SDK does not have a single command for this requirement. You can use gcloud to list firewall rules and you can use gcloud to list compute instances, but you will have to apply external logic to map the two together.
Firewall rules are associated with compute engine instances via several methods:
By a target tag
By a service account
For all instances in the network
Therefore, first display all of the compute engine instances, fetch each one's service account and tags. Then display all the firewall-rules, fetch the targets for each rule. Then match everything together and print a list of open ports for an instance, or all instances.
This is too complex a task for the CLI alone. You will either need to write a program to do this, or implement a script that processes, sorts, and joins the CLI outputs.
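To make that external logic a little more concrete, here is a rough sketch of just the tag-matching part, assuming you have dumped both lists to JSON with gcloud and --format=json (the file names are made up for the example; service-account and whole-network rules would need the extra checks described above):

import json

# Hypothetical dumps, e.g.:
#   gcloud compute instances list --format=json > instances.json
#   gcloud compute firewall-rules list --format=json > firewall-rules.json
with open('instances.json') as f:
    instances = json.load(f)
with open('firewall-rules.json') as f:
    rules = json.load(f)

for instance in instances:
    tags = set(instance.get('tags', {}).get('items', []))
    for rule in rules:
        # Tag-based association only
        if tags & set(rule.get('targetTags', [])):
            print(instance['name'], '<-', rule['name'], rule.get('allowed'))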
I realize this question is a bit old, but I wanted to add a detailed answer that fully automates what you need, as I needed it as well. Hopefully others will find it useful.
As mentioned above, GCP firewall rules can be applied via three methods:
Network tags
Service accounts
VPC membership
This data can be extracted via two different gcloud commands, but you must connect the dots between each individual compute instance and the three items I mention above.
I actually have a requirement to do this at work in order to generate nmap scripts that target instances with ports exposed to the public Internet. So I filter down a bit to include only instances that are running, have public IP addresses, and are correlated to firewall rules with a source range of 0.0.0.0/0.
Here is how I do it.
First, I generate JSON files that are specifically formatted and filtered with the data I need. I use these two commands:
gcloud compute firewall-rules list \
--format="json(name,allowed[].map().firewall_rule().list(),network,
targetServiceAccounts.list(),targetTags.list())" \
--filter="direction:INGRESS AND disabled:False AND
sourceRanges.list():0.0.0.0/0 AND
allowed[].map().firewall_rule().list():*" \
| tee ./json-data/"$i"/firewall-rules.json
gcloud compute instances list \
--format="json(name,networkInterfaces[].accessConfigs[0].natIP,
serviceAccounts[].email,tags.items[],networkInterfaces[].network)" \
--filter="networkInterfaces[].accessConfigs[0].type:ONE_TO_ONE_NAT
AND status:running" \
| tee ./json-data/"$i"/compute-instances.json
Then, I use the Python script below to process the files generated above. It will create a directory in your CWD called out-firewall-data that contains three files:
applied-rules.csv (instance,external_ip,allowed_tcp,allowed_udp)
run-nmap.sh (a script to port scan them)
run-masscan.sh (faster scanner for instances with 1-65535 exposed)
I'm hosting this script in a GitLab repo here and will probably continue development of it there. It also supports auditing many projects at once, you can view the instructions in the repository.
You can run this script like gen_nmap.py --single /path/to/jsons/
#!/usr/bin/env python3
"""
Process gcloud output to determine applied firewall rules.
Firewall rules are applied via multiple methods and Google does not provide
an easy way to script what rules are actually applied to what compute
instances.
Please see the included README for detailed instructions.
"""
import glob
import sys
import os
import json
import argparse
def process_args():
    """Handles user-passed parameters"""
    parser = argparse.ArgumentParser()
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument('--single', '-s', type=str, action='store',
                        help='Single directory containing json files.')
    target.add_argument('--multi', '-m', type=str, action='store',
                        help='Root directory contains multiple subdirectories'
                             ' of json files')
    args = parser.parse_args()
    if args.single:
        target = os.path.abspath(args.single)
    else:
        target = os.path.abspath(args.multi)
    # Before moving on, validate all the input data is present
    if not os.path.isdir(target):
        print("[!] That directory does not exist. Please try again.")
        sys.exit()
    return args
def parse_json(file):
    """
    Loads the json data from a file into memory
    """
    # If used in multi mode, there is a good chance we hit a lot of empty
    # or missing files. We'll return empty data on those so the program can
    # continue with the next directory.
    if not os.path.isfile(file):
        return {}
    with open(file, 'r') as infile:
        try:
            data = json.load(infile)
        except json.decoder.JSONDecodeError:
            return {}
    return data
def cleanup_rules(rules):
    """
    Extracts details from firewall rules for easier processing
    """
    clean_rules = []
    for rule in rules:
        name = rule['name']
        udp_ports = []
        tcp_ports = []
        if 'all' in rule['allowed']:
            tcp_ports = ['all']
            udp_ports = ['all']
        else:
            for ports in rule['allowed']:
                if 'tcp' in ports:
                    tcp_ports = [port.replace('tcp:', '') for port in ports.split(',')]
                if 'udp' in ports:
                    udp_ports = [port.replace('udp:', '') for port in ports.split(',')]
        # If a rule set has no target tags and no target svc account
        # then it is applied at the VPC level, so we grab that here.
        if 'targetServiceAccounts' not in rule and 'targetTags' not in rule:
            network = rule['network']
        # Otherwise, we are not interested in the network and can discard
        # it so that future functions will not think rules are applied
        # network-wide.
        else:
            network = ''
        # Tags and target svc accounts may or may not exist
        if 'targetTags' in rule:
            net_tags = rule['targetTags'].split(',')
        else:
            net_tags = []
        if 'targetServiceAccounts' in rule:
            svc_tags = rule['targetServiceAccounts'].split(',')
        else:
            svc_tags = []
        clean_rules.append({'name': name,
                            'tcp_ports': tcp_ports,
                            'udp_ports': udp_ports,
                            'net_tags': net_tags,
                            'svc_tags': svc_tags,
                            'network': network})
    return clean_rules
def cleanup_instances(instances):
    """
    Extracts details from instance data for easier processing
    """
    clean_instances = []
    for instance in instances:
        # The following values should exist for each instance due to the
        # gcloud filtering used.
        name = instance['name']
        networks = [interface['network'] for interface in instance['networkInterfaces']]
        external_ip = instance['networkInterfaces'][0]['accessConfigs'][0]['natIP']
        # The following values may or may not exist, it depends how the
        # instance is configured.
        if 'serviceAccounts' in instance:
            svc_account = instance['serviceAccounts'][0]['email']
        else:
            svc_account = ''
        if 'tags' in instance:
            tags = instance['tags']['items']
        else:
            tags = []
        clean_instances.append({'name': name,
                                'tags': tags,
                                'svc_account': svc_account,
                                'networks': networks,
                                'external_ip': external_ip})
    return clean_instances
def merge_dict(applied_rules, rule, instance):
    """
    Adds or updates final entries into dictionary

    Using a discrete function as several functions update this dictionary, so
    we need to check for the existence of a key and then decide to create or
    update it.
    """
    name = instance['name']
    if name in applied_rules:
        applied_rules[name]['allowed_tcp'].update(rule['tcp_ports'])
        applied_rules[name]['allowed_udp'].update(rule['udp_ports'])
    else:
        applied_rules[name] = {'external_ip': instance['external_ip'],
                               'allowed_tcp': set(rule['tcp_ports']),
                               'allowed_udp': set(rule['udp_ports'])}
    return applied_rules
def process_tagged_rules(applied_rules, rules, instances):
    """
    Extracts effective firewall rules applied by network tags on instances
    """
    for rule in rules:
        for instance in instances:
            for tag in rule['net_tags']:
                if tag in instance['tags']:
                    applied_rules = merge_dict(applied_rules, rule, instance)
    return applied_rules


def process_vpc_rules(applied_rules, rules, instances):
    """
    Extracts effective firewall rules applied by VPC membership
    """
    for rule in rules:
        for instance in instances:
            # In the cleaning function, we only applied a network tag if the
            # rule is applied to the whole VPC. So a match means it applies.
            if rule['network'] and rule['network'] in instance['networks']:
                applied_rules = merge_dict(applied_rules, rule, instance)
    return applied_rules


def process_svc_rules(applied_rules, rules, instances):
    """
    Extracts effective firewall rules applied by service accounts
    """
    for rule in rules:
        if rule['svc_tags']:
            for instance in instances:
                if instance['svc_account'] in rule['svc_tags']:
                    applied_rules = merge_dict(applied_rules, rule, instance)
    return applied_rules
def process_output(applied_rules):
    """
    Takes the python dictionary format and output several useful files
    """
    if not applied_rules:
        print("[!] No publicly exposed ports, sorry!")
        sys.exit()
    print("[*] Processing output for {} instances with exposed ports"
          .format(len(applied_rules)))
    out_dir = 'out-firewall-data'
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)

    # First, write the raw data in CSV
    with open(out_dir + '/applied-rules.csv', 'w') as outfile:
        outfile.write("name,external_ip,allowed_tcp,allowed_udp\n")
        for i in applied_rules:
            outfile.write("{},{},{},{}\n"
                          .format(i,
                                  applied_rules[i]['external_ip'],
                                  applied_rules[i]['allowed_tcp'],
                                  applied_rules[i]['allowed_udp'])
                          .replace("set()", ""))

    # Next, make an nmap script
    nmap_tcp = 'nmap --open -Pn -sV -oX {}-tcp.xml {} -p {}\n'
    nmap_tcp_common = 'nmap --open -Pn -sV -oX {}-tcp.xml {}\n'
    nmap_udp = 'sudo nmap --open -Pn -sU -sV -oX {}-udp.xml {} -p {}\n'
    nmap_udp_common = 'sudo nmap --open -Pn -sU -sV -oX {}-udp.xml {} -F\n'
    with open(out_dir + '/run-nmap.sh', 'w') as outfile:
        for name in applied_rules:
            external_ip = applied_rules[name]['external_ip']
            # "All" as a rule will apply to both TCP and UDP. These get special
            # nmap commands to do only the common ports (full range is too slow
            # and will be handled with masscan commands).
            if set(['all']) in applied_rules[name].values():
                outfile.write("echo running common TCP/UDP scans against {}\n"
                              .format(name))
                outfile.write(nmap_tcp_common.format(name, external_ip))
                outfile.write(nmap_udp_common.format(name, external_ip))
            else:
                if applied_rules[name]['allowed_tcp']:
                    ports = ','.join(applied_rules[name]['allowed_tcp'])
                    outfile.write("echo running TCP scans against {}\n"
                                  .format(name))
                    outfile.write(nmap_tcp.format(name, external_ip, ports))
                if applied_rules[name]['allowed_udp']:
                    ports = ','.join(applied_rules[name]['allowed_udp'])
                    outfile.write("echo running UDP scans against {}\n"
                                  .format(name))
                    outfile.write(nmap_udp.format(name, external_ip, ports))

    # Now, write masscan script for machines with all TCP ports open
    masscan = 'sudo masscan -p{} {} --rate=1000 --open-only -oX {}.xml\n'
    with open(out_dir + '/run-masscan.sh', 'w') as outfile:
        for name in applied_rules:
            external_ip = applied_rules[name]['external_ip']
            if set(['all']) in applied_rules[name].values():
                outfile.write("echo running full masscan against {}\n"
                              .format(name))
                outfile.write(masscan.format('1-65535', external_ip, name))
    print("[+] Wrote some files to {}, enjoy!".format(out_dir))
def main():
    """
    Main function to parse json files and write analyzed output
    """
    args = process_args()
    applied_rules = {}
    rules = []
    instances = []

    # The functions below run in a loop based on whether we are targeting json
    # files in a single directory or a tree with multiple project subdirectories.
    if args.multi:
        targets = glob.glob(args.multi + '/*')
    else:
        targets = [args.single]

    for target in targets:
        rules = parse_json(target + '/firewall-rules.json')
        instances = parse_json(target + '/compute-instances.json')
        if not rules or not instances:
            print("[!] No valid data in {}".format(target))
            continue

        # Clean the data up a bit
        rules = cleanup_rules(rules)
        print("[*] Processed {} firewall rules in {}"
              .format(len(rules), target))
        instances = cleanup_instances(instances)
        print("[*] Processed {} instances in {}"
              .format(len(instances), target))

        # Connect the dots and build out the applied rules dictionary
        applied_rules = process_tagged_rules(applied_rules, rules, instances)
        applied_rules = process_vpc_rules(applied_rules, rules, instances)
        applied_rules = process_svc_rules(applied_rules, rules, instances)

    # Process data and create various output files
    process_output(applied_rules)


if __name__ == '__main__':
    main()

Cloudfront facts using Ansible

I need to retrieve the DNS name of my CloudFront distribution (e.g. 1234567890abcd.cloudfront.net) and was wondering if there is a quick way to get this in Ansible without resorting to the AWS CLI.
From gleaning the Extra Modules source, it would appear there is not a module for this. How are other people getting this attribute?
You can either write your own module or you can write a filter plugin in a few lines and accomplish the same thing.
Here is an example of writing a filter in Ansible. Let's name this file aws.py and put it in your filter_plugins directory, i.e. filter_plugins/aws.py:
import boto3
import botocore
from ansible import errors


def get_cloudfront_dns(region, dist_id):
    """ Return the dns name of the cloudfront distribution id.

    Args:
        region (str): The AWS region.
        dist_id (str): distribution id

    Basic Usage:
        >>> get_cloudfront_dns('us-west-2', 'E123456LHXOD5FK')
        '1234567890abcd.cloudfront.net'
    """
    client = boto3.client('cloudfront', region)
    domain_name = None
    try:
        domain_name = (
            client
            .get_distribution(Id=dist_id)['Distribution']['DomainName']
        )
    except Exception as e:
        if isinstance(e, botocore.exceptions.ClientError):
            raise e
        else:
            raise errors.AnsibleFilterError(
                'Could not retrieve the dns name for CloudFront Dist ID {0}: {1}'.format(dist_id, str(e))
            )
    return domain_name


class FilterModule(object):
    ''' Ansible core jinja2 filters '''

    def filters(self):
        return {'get_cloudfront_dns': get_cloudfront_dns}
In order to use this plugin, you just need to call it.
dns_entry: "{{ 'us-west-2' | get_cloudfront_dns('123434JHJHJH') }}"
Keep in mind, you will need boto3 and botocore installed in order to use this plugin.
I have a bunch of examples located in my linuxdynasty ld-ansible-filters repo.
I ended up writing a module for this (cloudfront_facts.py) that has been accepted into Ansible 2.3.0.

EC2 instance loads my user-data script but doesn't run it

Code:
#!/usr/bin/env python
import boto.ec2

conn_ec2 = boto.ec2.connect_to_region('us-east-1')  # access keys are environment vars

my_code = """#!/usr/bin/env python
import sys
sys.stdout = open('file', 'w')
print 'test'
"""

reservation = conn_ec2.run_instances(image_id='ami-a73264ce',
                                     key_name='backendkey',
                                     instance_type='t1.micro',
                                     security_groups=['backend'],
                                     instance_initiated_shutdown_behavior='terminate',
                                     user_data=my_code)
The instance is initiated with the proper settings (it's the public Ubuntu 12.04, 64-bit, image) and I can SSH into it normally. The user-data script seems to be loaded correctly: I can see it in /var/lib/cloud/instance/user-data.txt (and also in /var/lib/cloud/instance/scripts/part-001) and on the EC2 console.
But that's it, the script doesn't seem to be executed. Following this answer I checked the /var/log/cloud-init.log file but it doesn't seem to contain any error messages related to my script (well, maybe I'm missing something - here is a gist with the contents of cloud-init.log).
What am I missing?
This is probably not relevant anymore, but still.
I've just used boto with Ubuntu and user data. Although the documentation says that the user data has to be base64 encoded, it only worked for me when I passed it as a regular (non-encoded) string.
I read the content of the user data from a file (using fh.read()) and then just passed it as the user_data parameter to run_instances.
I think it's not working for you because user data can't use just any shebang, like the "#!/usr/bin/env python" you used.
On the help page http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html there are two examples: one is the standard "#!/bin/bash", and the other looks artificial, "#cloud-config". These are probably the only two shebangs available. The bash one works for me.
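For example, keeping the rest of the run_instances call above unchanged, a bash-flavored user-data script that does the same thing as the Python one would look roughly like this (just a sketch; the output path is arbitrary):

my_code = """#!/bin/bash
# Runs once at first boot via cloud-init; writes a marker file
# we can check over SSH to confirm the script actually executed.
echo 'test' > /tmp/testfile
"""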