How to get the opened port (firewall) of an instance in GCP using gcloud?

For example, I have an instance named test and I need to know which ports are open for it. How do I do that on the command line using gcloud?
I don't mean the ports listening inside the instance, but the firewall rules attached to the instance.

In the Cloud Console, we can list the VM instances. If we click the vertical "three dots" menu for an instance, we will find an entry called "View network details". That opens a panel which appears to show all of the firewall rules that apply to this VM instance.

It is possible to see the firewall rules associated with an instance using the Cloud Shell, but it takes two steps. First, run the following command to get the instance details and see its firewall (network) tags:
gcloud compute instances describe instance-name
In the output, look for the tags section; the items listed there are the network tags that firewall rules can target.
Then run the following command to see which firewall rules those tags are attached to:
gcloud compute firewall-rules list --format="table(name,network,
direction,priority,sourceRanges.list():label=SRC_RANGES,
destinationRanges.list():label=DEST_RANGES,
allowed[].map().firewall_rule().list():label=ALLOW,
denied[].map().firewall_rule().list():label=DENY,
sourceTags.list():label=SRC_TAGS,
sourceServiceAccounts.list():label=SRC_SVC_ACCT,
targetTags.list():label=TARGET_TAGS,
targetServiceAccounts.list():label=TARGET_SVC_ACCT,
disabled
)"
It will give output listing each firewall rule; match the TARGET_TAGS column against the instance's tags to find the rules, and therefore the open ports, that apply to it.

The Cloud SDK does not have a single command for this requirement. You can use gcloud to list firewall rules and you can use gcloud to list compute instances, but you will have to apply external logic to map the two together.
Firewall rules are associated with compute engine instances via several methods:
By a target tag
By a service account
For all instances in the network
Therefore, first list all of the Compute Engine instances and fetch each one's service account and tags. Then list all of the firewall rules and fetch the targets for each rule. Finally, match everything together and print the open ports for one instance, or for all instances.
This is too complex a task for the CLI alone. You will either need to write a program to do this, or implement a script to process, sort, and join the CLI outputs.
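For illustration, here is a minimal sketch of that matching logic in Python, assuming gcloud is installed and authenticated and the active project is already set; the instance name test comes from the question, the zone is a placeholder, the gcloud_json helper is mine, and it only handles rules applied by target tag (not service accounts or VPC-wide rules):
import json
import subprocess

def gcloud_json(*args):
    """Run a gcloud command with JSON output and return the parsed result."""
    out = subprocess.check_output(["gcloud", *args, "--format=json"])
    return json.loads(out)

# Fetch the instance's network tags (zone is a placeholder).
instance = gcloud_json("compute", "instances", "describe", "test",
                       "--zone", "us-central1-a")
tags = set(instance.get("tags", {}).get("items", []))

# Keep only enabled ingress rules whose target tags overlap the instance's tags.
# (Rules applied by service account or to the whole network are ignored here.)
for rule in gcloud_json("compute", "firewall-rules", "list"):
    if rule.get("direction") != "INGRESS" or rule.get("disabled"):
        continue
    if tags & set(rule.get("targetTags", [])):
        print(rule["name"], rule.get("allowed", []))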

I realize this question is a bit old, but I wanted to add a detailed answer that fully automates what you need, as I needed it as well. Hopefully others will find it useful.
As mentioned above, GCP firewall rules can be applied via three methods:
Network tags
Service accounts
VPC membership
This data can be extracted via two different gcloud commands, but you must connect the dots between each individual compute instance and the three items I mention above.
I actually have a requirement to do this at work in order to generate nmap scripts that target instances with ports exposed to the public Internet. So I filter down a bit to include only instances that are running, have public IP addresses, and are correlated to firewall rules with a source range of 0.0.0.0/0.
Here is how I do it.
First, I generate JSON files that are specifically formatted and filtered with the data I need. I use these two commands:
gcloud compute firewall-rules list \
--format="json(name,allowed[].map().firewall_rule().list(),network,
targetServiceAccounts.list(),targetTags.list())" \
--filter="direction:INGRESS AND disabled:False AND
sourceRanges.list():0.0.0.0/0 AND
allowed[].map().firewall_rule().list():*" \
| tee ./json-data/"$i"/firewall-rules.json
gcloud compute instances list \
--format="json(name,networkInterfaces[].accessConfigs[0].natIP,
serviceAccounts[].email,tags.items[],networkInterfaces[].network)" \
--filter="networkInterfaces[].accessConfigs[0].type:ONE_TO_ONE_NAT
AND status:running" \
| tee ./json-data/"$i"/compute-instances.json
Then, I use this Python script to process the files generated above. It will create a directory in your CWD called out-firewall-data that contains three files:
applied-rules.csv (instance,external_ip,allowed_tcp,allowed_udp)
run-nmap.sh (a script to port scan them)
run-masscan.sh (faster scanner for instances with 1-65535 exposed)
I'm hosting this script in a GitLab repo here and will probably continue development of it there. It also supports auditing many projects at once, you can view the instructions in the repository.
You can run this script like gen_nmap.py --single /path/to/jsons/
#!/usr/bin/env python3
"""
Process gcloud output to determine applied firewall rules.
Firewall rules are applied via multiple methods and Google does not provide
an easy way to script what rules are actually applied to what compute
instances.
Please see the included README for detailed instructions.
"""
import glob
import sys
import os
import json
import argparse
def process_args():
"""Handles user-passed parameters"""
parser = argparse.ArgumentParser()
target = parser.add_mutually_exclusive_group(required=True)
target.add_argument('--single', '-s', type=str, action='store',
help='Single directory containing json files.')
target.add_argument('--multi', '-m', type=str, action='store',
help='Root directory contains multiple subdirectories'
' of json files')
args = parser.parse_args()
if args.single:
target = os.path.abspath(args.single)
else:
target = os.path.abspath(args.multi)
# Before moving on, validate all the input data is present
if not os.path.isdir(target):
print("[!] That directory does not exist. Please try again.")
sys.exit()
return args
def parse_json(file):
"""
Loads the json data from a file into memory
"""
# If used in multi mode, there is a good chance we hit a lot of empty
# or missing files. We'll return empty data on those so the program can
# continue with the next directory.
if not os.path.isfile(file):
return {}
with open(file, 'r') as infile:
try:
data = json.load(infile)
except json.decoder.JSONDecodeError:
return {}
return data
def cleanup_rules(rules):
"""
Extracts details from firewall rules for easier processing
"""
clean_rules = []
for rule in rules:
name = rule['name']
udp_ports = []
tcp_ports = []
if 'all' in rule['allowed']:
tcp_ports = ['all']
udp_ports = ['all']
else:
for ports in rule['allowed']:
if 'tcp' in ports:
tcp_ports = [port.replace('tcp:', '') for port in ports.split(',')]
if 'udp' in ports:
udp_ports = [port.replace('udp:', '') for port in ports.split(',')]
# If a rule set has no target tags and no target svc account
# then it is applied at the VPC level, so we grab that here.
if 'targetServiceAccounts' not in rule and 'targetTags' not in rule:
network = rule['network']
# Otherwise, we are not interested in the network and can discard
# it so that future functions will not think rules are applied
# network-wide.
else:
network = ''
# Tags and target svc accounts may or may not exist
if 'targetTags' in rule:
net_tags = rule['targetTags'].split(',')
else:
net_tags = []
if 'targetServiceAccounts' in rule:
svc_tags = rule['targetServiceAccounts'].split(',')
else:
svc_tags = []
clean_rules.append({'name': name,
'tcp_ports': tcp_ports,
'udp_ports': udp_ports,
'net_tags': net_tags,
'svc_tags': svc_tags,
'network': network})
return clean_rules
def cleanup_instances(instances):
"""
    Extracts details from instance data for easier processing
"""
clean_instances = []
for instance in instances:
# The following values should exist for each instance due to the
# gcloud filtering used.
name = instance['name']
networks = [interface['network'] for interface in instance['networkInterfaces']]
external_ip = instance['networkInterfaces'][0]['accessConfigs'][0]['natIP']
# The following values may or may not exist, it depends how the
# instance is configured.
if 'serviceAccounts' in instance:
svc_account = instance['serviceAccounts'][0]['email']
else:
svc_account = ''
if 'tags' in instance:
tags = instance['tags']['items']
else:
tags = []
clean_instances.append({'name': name,
'tags': tags,
'svc_account': svc_account,
'networks': networks,
'external_ip': external_ip})
return clean_instances
def merge_dict(applied_rules, rule, instance):
"""
Adds or updates final entries into dictionary
Using a discrete function as several functions update this dictionary, so
we need to check for the existence of a key and then decide to create or
update it.
"""
name = instance['name']
if name in applied_rules:
applied_rules[name]['allowed_tcp'].update(rule['tcp_ports'])
applied_rules[name]['allowed_udp'].update(rule['udp_ports'])
else:
applied_rules[name] = {'external_ip': instance['external_ip'],
'allowed_tcp': set(rule['tcp_ports']),
'allowed_udp': set(rule['udp_ports'])}
return applied_rules
def process_tagged_rules(applied_rules, rules, instances):
"""
Extracts effective firewall rules applied by network tags on instances
"""
for rule in rules:
for instance in instances:
for tag in rule['net_tags']:
if tag in instance['tags']:
applied_rules = merge_dict(applied_rules, rule, instance)
return applied_rules
def process_vpc_rules(applied_rules, rules, instances):
"""
Extracts effective firewall rules applied by VPC membership
"""
for rule in rules:
for instance in instances:
# In the cleaning function, we only applied a network tag if the
# rule is applied to the whole VPC. So a match means it applies.
if rule['network'] and rule['network'] in instance['networks']:
applied_rules = merge_dict(applied_rules, rule, instance)
return applied_rules
def process_svc_rules(applied_rules, rules, instances):
"""
Extracts effective firewall rules applied by service accounts
"""
for rule in rules:
if rule['svc_tags']:
for instance in instances:
if instance['svc_account'] in rule['svc_tags']:
applied_rules = merge_dict(applied_rules, rule, instance)
return applied_rules
def process_output(applied_rules):
"""
Takes the python dictionary format and output several useful files
"""
if not applied_rules:
print("[!] No publicly exposed ports, sorry!")
sys.exit()
print("[*] Processing output for {} instances with exposed ports"
.format(len(applied_rules)))
out_dir = 'out-firewall-data'
if not os.path.exists(out_dir):
os.makedirs(out_dir)
# First, write the raw data in CSV
with open(out_dir + '/applied-rules.csv', 'w') as outfile:
outfile.write("name,external_ip,allowed_tcp,allowed_udp\n")
for i in applied_rules:
outfile.write("{},{},{},{}\n"
.format(i,
applied_rules[i]['external_ip'],
applied_rules[i]['allowed_tcp'],
applied_rules[i]['allowed_udp'])
.replace("set()", ""))
# Next, make an nmap script
nmap_tcp = 'nmap --open -Pn -sV -oX {}-tcp.xml {} -p {}\n'
nmap_tcp_common = 'nmap --open -Pn -sV -oX {}-tcp.xml {}\n'
nmap_udp = 'sudo nmap --open -Pn -sU -sV -oX {}-udp.xml {} -p {}\n'
nmap_udp_common = 'sudo nmap --open -Pn -sU -sV -oX {}-udp.xml {} -F\n'
with open(out_dir + '/run-nmap.sh', 'w') as outfile:
for name in applied_rules:
external_ip = applied_rules[name]['external_ip']
# "All" as a rule will apply to both TCP and UDP. These get special
# nmap commands to do only the common ports (full range is too slow
            # and will be handled with masscan commands).
if set(['all']) in applied_rules[name].values():
outfile.write("echo running common TCP/UDP scans against {}\n"
.format(name))
outfile.write(nmap_tcp_common.format(name, external_ip))
outfile.write(nmap_udp_common.format(name, external_ip))
else:
if applied_rules[name]['allowed_tcp']:
ports = ','.join(applied_rules[name]['allowed_tcp'])
outfile.write("echo running TCP scans against {}\n"
.format(name))
outfile.write(nmap_tcp.format(name, external_ip, ports))
if applied_rules[name]['allowed_udp']:
ports = ','.join(applied_rules[name]['allowed_udp'])
outfile.write("echo running UDP scans against {}\n"
.format(name))
outfile.write(nmap_udp.format(name, external_ip, ports))
# Now, write masscan script for machines with all TCP ports open
masscan = 'sudo masscan -p{} {} --rate=1000 --open-only -oX {}.xml\n'
with open(out_dir + '/run-masscan.sh', 'w') as outfile:
for name in applied_rules:
external_ip = applied_rules[name]['external_ip']
if set(['all']) in applied_rules[name].values():
outfile.write("echo running full masscan against {}\n"
.format(name))
outfile.write(masscan.format('1-65535', external_ip, name))
print("[+] Wrote some files to {}, enjoy!".format(out_dir))
def main():
"""
Main function to parse json files and write analyzed output
"""
args = process_args()
applied_rules = {}
rules = []
instances = []
# Functions below in a loop based on whether we are targeting json files
# in a single directory or a tree with multiple project subdirectories.
if args.multi:
targets = glob.glob(args.multi + '/*')
else:
targets = [args.single]
for target in targets:
rules = parse_json(target + '/firewall-rules.json')
instances = parse_json(target + '/compute-instances.json')
if not rules or not instances:
print("[!] No valid data in {}".format(target))
continue
# Clean the data up a bit
rules = cleanup_rules(rules)
print("[*] Processed {} firewall rules in {}"
.format(len(rules), target))
instances = cleanup_instances(instances)
print("[*] Processed {} instances in {}"
.format(len(instances), target))
# Connect the dots and build out the applied rules dictionary
applied_rules = process_tagged_rules(applied_rules, rules, instances)
applied_rules = process_vpc_rules(applied_rules, rules, instances)
applied_rules = process_svc_rules(applied_rules, rules, instances)
# Process data and create various output files
process_output(applied_rules)
if __name__ == '__main__':
main()

Related

Airflow SSHOperator: How To Securely Access Pem File Across Tasks?

We are running Airflow via AWS's managed MWAA Offering. As part of their offering they include a tutorial on securely using the SSH Operator in conjunction with AWS Secrets Manager. The gist of how their solution works is described below:
Run a Task that fetches the pem file from a Secrets Manager location and store it on the filesystem at /tmp/mypem.pem.
In the SSH Connection include the extra information that specifies the file location
{"key_file":"/tmp/mypem.pem"}
Use the SSH Connection in the SSHOperator.
In short the workflow is supposed to be:
Task1 gets the pem -> Task2 uses the pem via the SSHOperator
All of this is great in theory, but it doesn't actually work. It doesn't work because Task1 may run on a different node from Task2, which means Task2 can't access the /tmp/mypem.pem file location that Task1 wrote the file to. AWS is aware of this limitation according to AWS Support, but now we need to understand another way to do this.
Question
How can we securely store and access a pem file that can then be used by Tasks running on different nodes via the SSHOperator?
I ran into the same problem. I extended the SSHOperator to do both steps in one call.
In AWS Secrets Manager, two keys are added for airflow to retrieve on execution.
{variables_prefix}/airflow-user-ssh-key : the value of the private key
{connections_prefix}/ssh_airflow_user : ssh://replace.user@replace.remote.host?key_file=%2Ftmp%2Fairflow-user-ssh-key
from typing import Optional, Sequence
from os.path import basename, splitext
from airflow.models import Variable
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.providers.ssh.hooks.ssh import SSHHook
class SSHOperator(SSHOperator):
"""
SSHOperator to execute commands on given remote host using the ssh_hook.
:param ssh_conn_id: :ref:`ssh connection id<howto/connection:ssh>`
from airflow Connections.
:param ssh_key_var: name of Variable holding private key.
Creates "/tmp/{variable_name}.pem" to use in SSH connection.
May also be inferred from "key_file" in "extras" in "ssh_conn_id".
:param remote_host: remote host to connect (templated)
Nullable. If provided, it will replace the `remote_host` which was
defined in `ssh_hook` or predefined in the connection of `ssh_conn_id`.
:param command: command to execute on remote host. (templated)
:param timeout: (deprecated) timeout (in seconds) for executing the command. The default is 10 seconds.
Use conn_timeout and cmd_timeout parameters instead.
:param environment: a dict of shell environment variables. Note that the
server will reject them silently if `AcceptEnv` is not set in SSH config.
:param get_pty: request a pseudo-terminal from the server. Set to ``True``
to have the remote process killed upon task timeout.
The default is ``False`` but note that `get_pty` is forced to ``True``
when the `command` starts with ``sudo``.
"""
template_fields: Sequence[str] = ("command", "remote_host")
template_ext: Sequence[str] = (".sh",)
template_fields_renderers = {"command": "bash"}
def __init__(
self,
*,
ssh_conn_id: Optional[str] = None,
ssh_key_var: Optional[str] = None,
remote_host: Optional[str] = None,
command: Optional[str] = None,
timeout: Optional[int] = None,
environment: Optional[dict] = None,
get_pty: bool = False,
**kwargs,
) -> None:
super().__init__(
ssh_conn_id=ssh_conn_id,
remote_host=remote_host,
command=command,
timeout=timeout,
environment=environment,
get_pty=get_pty,
**kwargs,
)
if ssh_key_var is None:
key_file = SSHHook(ssh_conn_id=self.ssh_conn_id).key_file
key_filename = basename(key_file)
key_filename_no_extension = splitext(key_filename)[0]
self.ssh_key_var = key_filename_no_extension
else:
self.ssh_key_var = ssh_key_var
def import_ssh_key(self):
with open(f"/tmp/{self.ssh_key_var}", "w") as file:
file.write(Variable.get(self.ssh_key_var))
def execute(self, context):
self.import_ssh_key()
super().execute(context)
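For context, here is a hedged sketch of how this extended operator might be used in a DAG, reusing the connection and variable names from the Secrets Manager entries above (ssh_airflow_user and airflow-user-ssh-key); the custom_ssh module name, DAG id, and command are placeholders:
from datetime import datetime

from airflow import DAG

# Assumes the extended SSHOperator above lives in a local module (placeholder name).
from custom_ssh import SSHOperator

with DAG(
    dag_id="ssh_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_remote = SSHOperator(
        task_id="run_remote_command",
        ssh_conn_id="ssh_airflow_user",      # {connections_prefix}/ssh_airflow_user
        ssh_key_var="airflow-user-ssh-key",  # {variables_prefix}/airflow-user-ssh-key
        command="uptime",
    )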
The answer by holly is good. I am sharing a different way I solved this problem. I used the strategy of converting the SSH Connection into a URI and then input that into Secrets Manager under the expected connections path, and everything worked great via the SSH Operator. Below are the general steps I took.
Generate an encoded URI
import json
from airflow.models.connection import Connection
from pathlib import Path

pem = Path("/my/pem/file.pem").read_text()
myconn = Connection(
    conn_id="connX",
    conn_type="ssh",
    host="10.x.y.z",
    login="mylogin",
    extra=json.dumps(dict(private_key=pem)),
)
print(myconn.get_uri())
Input that URI under the environment's configured connections path in Secrets Manager. The important note here is to enter the value in the Plaintext field without wrapping it in a JSON key. Example:
Store the secret at airflow/connections/connX and, under Plaintext, include only the URI value.
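If you prefer to script that step rather than use the console, a minimal boto3 sketch along these lines should work, assuming your environment's secrets backend is configured with the airflow/connections prefix and conn_uri holds the value printed by get_uri() above:
import boto3

conn_uri = "..."  # the URI printed by myconn.get_uri() above

# Uses the default region/credentials from your AWS configuration.
client = boto3.client("secretsmanager")
client.create_secret(
    Name="airflow/connections/connX",
    SecretString=conn_uri,  # plain string value, not wrapped in a JSON key
)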
Now in the SSHOperator you can reference this connection Id like any other.
remote_task = SSHOperator(
    task_id="ssh_and_execute_command",
    ssh_conn_id="connX",
    command="whoami",
)

How can I get a list of all stopped Cloud SQL (GCP) instances in Python? I am using the Google Cloud API for this purpose

from googleapiclient import discovery
PROJECT = 'gcp-test-1234'
sql_client = discovery.build('sqladmin', 'v1beta4')
resp = sql_client.instances().list(project=PROJECT).execute()
print(resp)
But in the response I am getting the state "RUNNABLE" for stopped instances, so how can I verify programmatically whether an instance is running or stopped?
I have also checked gcloud sql instances describe gcp-test-1234-test-db, and it reports the state as "STOPPED".
How can I achieve this programmatically using Python?
In the REST API, a state of RUNNABLE means that the instance is running or has been stopped by the owner, as stated here.
You need to read from the activationPolicy field, where ALWAYS means your instance is running and NEVER means it is stopped. Something like the following will work:
from pprint import pprint
from googleapiclient import discovery
service = discovery.build('sqladmin', 'v1beta4')
project = 'gcp-test-1234'
instance = 'gcp-test-1234-test-db'
request = service.instances().get(project=project,instance=instance)
response = request.execute()
pprint(response['settings']['activationPolicy'])
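To answer the original question of listing every stopped instance rather than checking a single one, here is a minimal sketch along the same lines; it assumes default application credentials and treats an activationPolicy of NEVER as stopped (pagination via nextPageToken is omitted):
from googleapiclient import discovery

service = discovery.build('sqladmin', 'v1beta4')
project = 'gcp-test-1234'

resp = service.instances().list(project=project).execute()
stopped = [
    inst['name']
    for inst in resp.get('items', [])
    if inst.get('settings', {}).get('activationPolicy') == 'NEVER'
]
print(stopped)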
Another option would be to use the Cloud SDK command directly from your python file:
import os
os.system("gcloud sql instances describe gcp-test-1234-test-db | grep state | awk {'print $2'}")
Or with subprocess:
import subprocess
subprocess.run("gcloud sql instances describe gcp-test-1234-test-db | grep state | awk {'print $2'}", shell=True)
Note that when you run gcloud sql instances describe your-instance --log-http on a stopped instance, the API response will contain "state": "RUNNABLE", while the gcloud command itself will show the status STOPPED. This is because the command derives the status from the activationPolicy in the API response, rather than from the state field, whenever the state is RUNNABLE.
If you want to check the piece of code that translates the activationPolicy to the status, you can see it in the SDK. The gcloud tool is written in python:
cat $(gcloud info --format "value(config.paths.sdk_root)")/lib/googlecloudsdk/api_lib/sql/instances.py|grep "class DatabaseInstancePresentation(object)" -A 17
You'll see the following:
class DatabaseInstancePresentation(object):
"""Represents a DatabaseInstance message that is modified for user visibility."""
def __init__(self, orig):
for field in orig.all_fields():
if field.name == 'state':
if orig.settings and orig.settings.activationPolicy == messages.Settings.ActivationPolicyValueValuesEnum.NEVER:
self.state = 'STOPPED'
else:
self.state = orig.state
else:
value = getattr(orig, field.name)
if value is not None and not (isinstance(value, list) and not value):
if field.name in ['currentDiskSize', 'maxDiskSize']:
setattr(self, field.name, six.text_type(value))
else:
setattr(self, field.name, value)

Cloudfront facts using Ansible

I need to retrieve the DNS name of my Cloudfront instance (eg. 1234567890abcd.cloudfront.net) and was wondering if there is a quick way to get this in Ansible without resorting to the AWS CLI.
From gleaning the Extra Modules source, it would appear there is no module for this. How are other people getting this attribute?
You can either write your own module or you can write a filter plugin in a few lines and accomplish the same thing.
Here is an example of writing a filter in Ansible. Let's name this file aws.py and place it at filter_plugins/aws.py:
import boto3
import botocore
from ansible import errors
def get_cloudfront_dns(region, dist_id):
""" Return the dns name of the cloudfront distribution id.
Args:
region (str): The AWS region.
dist_id (str): distribution id
Basic Usage:
>>> get_cloudfront_dns('us-west-2', 'E123456LHXOD5FK')
'1234567890abcd.cloudfront.net'
"""
client = boto3.client('cloudfront', region)
domain_name = None
try:
domain_name = (
client
.get_distribution(Id=dist_id)['Distribution']['DomainName']
)
except Exception as e:
if isinstance(e, botocore.exceptions.ClientError):
raise e
else:
raise errors.AnsibleFilterError(
                'Could not retrieve the dns name for CloudFront Dist ID {0}: {1}'.format(dist_id, str(e))
)
return domain_name
class FilterModule(object):
''' Ansible core jinja2 filters '''
def filters(self):
return {'get_cloudfront_dns': get_cloudfront_dns,}
In order to use this plugin, you just need to call it.
dns_entry: "{{ 'us-west-2' | get_cloudfront_dns('123434JHJHJH') }}"
Keep in mind, you will need boto3 and botocore installed in order to use this plugin.
I have a bunch of examples located in my linuxdynasty ld-ansible-filters repo.
I ended up writing a module for this (cloudfront_facts.py) that has been accepted into Ansible 2.3.0.

Why does AWS Elastic Beanstalk Python insert a 'static' rule ahead of all others in priority?

The 'static' routing rule for my Python application is behaving strangely in my AWS Elastic Beanstalk application (and nowhere else), appearing to override all other rules.
For example, using the two functions below on my development machines, on test servers elsewhere, and on AWS, routes lists the static rule last, and match_route shows other, non-static rules matching paths that begin with 'static/...'. As expected, if I navigate to a page whose path starts with static/... on my non-AWS machines, one of my other (non-static) rules is matched. However, only on AWS EB is the server's static rule invoked for such paths!
Why and how is AWS-EB "inserting" this rule ahead of all others? How do I either disable this behavior on AWS, or replicate it in my non-AWS systems?
application.url_map.host_matching = True
# ...
def routes(verbose, wide):
"""List routes supported by the application"""
for rule in sorted(app.url_map.iter_rules()):
if verbose:
fmt = "{:45s} {:30s} {:30s}" if wide else "{:35s} {:25s} {:25s}"
line = fmt.format(rule, rule.endpoint, ','.join(rule.methods))
else:
fmt = "{:45s}" if wide else "{:35s}"
line = fmt.format(rule)
print(line)
def match_route(host, match):
"""Match a route for a given host"""
if match is not None:
urls = app.url_map.bind(host)
try:
m = urls.match(match, "GET")
z = '{}({})'.format(m[0], ','.join(["{}='{}'".format(arg, m[1][arg]) for arg in m[1]] +
["host='{}'".format(host)]))
return z
except NotFound:
return
This is a result of the Apache server configuration in /etc/httpd/conf.d/wsgi.conf which contains
Alias /static/ /opt/python/current/app/static/
If you delete or comment out that line, the server will no longer "intercept" paths that start with 'static'.
Accomplishing this is, however, a bit trickier than one might guess, since the wsgi.conf file gets (re)created after files are uploaded and commands are executed.
One way around this is to modify the file with a post-deployment hook, being sure to restart the webserver afterwards:
files:
"/opt/elasticbeanstalk/hooks/appdeploy/post/remalias.sh" :
mode: "00775"
owner: root
group: root
content: |
sed -i.backup -e 's/^Alias\s[/]static[/]\s[a-z/]*$//g' /etc/httpd/conf.d/wsgi.conf
service httpd restart

Do any AWS API scripts exist in a repository somewhere?

I want to start 10 instances, get their instance id's and get their private IP addresses.
I know this can be done using AWS CLI, I'm wondering if there are any such scripts already written so I don't have to reinvent the wheel.
Thanks
I recommend using Python and the boto package for this kind of automation. Python is clearer than bash. You can use the following page as a starting point: http://boto.readthedocs.org/en/latest/ec2_tut.html
On the off chance that someone comes across my question in the future, I thought I'd give my (somewhat) final solution.
Using python and the Boto package that was suggested, I have the following python script.
It's pretty well commented but feel free to ask if you have any questions.
import boto
import time
import sys
IMAGE = 'ami-xxxxxxxx'
KEY_NAME = 'xxxxx'
INSTANCE_TYPE = 't1.micro'
SECURITY_GROUPS = ['xxxxxx'] # If multiple, separate by commas
COUNT = 2 #number of servers to start
private_dns = [] # will be populated with private dns of each instance
print 'Connecting to AWS'
conn = boto.connect_ec2()
print 'Starting instances'
#start instance
reservation = conn.run_instances(IMAGE, instance_type=INSTANCE_TYPE, key_name=KEY_NAME, security_groups=SECURITY_GROUPS, min_count=COUNT, max_count=COUNT)#, dry_run=True)
#print reservation #debug
print 'Waiting for instances to start'
# ONLY CHECKS IF RUNNING, MAY NOT BE SSH READY
for instance in reservation.instances: #doing this for every instance we started
while not instance.update() == 'running': #while it's not running (probably 'pending')
print '.', # trailing comma is intentional to print on same line
sys.stdout.flush() # make the thing print immediately instead of buffering
time.sleep(2) # Let the instance start up
print 'Done\n'
for instance in reservation.instances:
instance.add_tag("Name","Hadoop Ecosystem") # tag the instance
private_dns.append(instance.private_dns_name) # adding ip to array
print instance, 'is ready at', instance.private_dns_name # print to console
print private_dns
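The script above prints private DNS names; if you specifically want instance IDs and private IP addresses, here is a minimal sketch using the newer boto3 library (the AMI ID, instance type, key name, and security group are placeholders):
import boto3

ec2 = boto3.resource('ec2')

# Placeholders: replace with your AMI, key pair, and security group.
instances = ec2.create_instances(
    ImageId='ami-xxxxxxxx',
    InstanceType='t1.micro',
    KeyName='xxxxx',
    SecurityGroupIds=['sg-xxxxxxxx'],
    MinCount=10,
    MaxCount=10,
)

for instance in instances:
    instance.wait_until_running()  # block until the instance is running
    instance.reload()              # refresh attributes such as the private IP
    print(instance.id, instance.private_ip_address)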