Code:
#!/usr/bin/env python
import boto.ec2
conn_ec2 = boto.ec2.connect_to_region('us-east-1') # access keys are environment vars
my_code = """#!/usr/bin/env python
import sys
sys.stdout = open('file', 'w')
print 'test'
"""
reservation = conn_ec2.run_instances(image_id='ami-a73264ce',
                                     key_name='backendkey',
                                     instance_type='t1.micro',
                                     security_groups=['backend'],
                                     instance_initiated_shutdown_behavior='terminate',
                                     user_data=my_code)
The instance is initiated with the proper settings (it's the public Ubuntu 12.04, 64-bit, image) and I can SSH into it normally. The user-data script seems to be loaded correctly: I can see it in /var/lib/cloud/instance/user-data.txt (and also in /var/lib/cloud/instance/scripts/part-001) and on the EC2 console.
But that's it, the script doesn't seem to be executed. Following this answer I checked the /var/log/cloud-init.log file but it doesn't seem to contain any error messages related to my script (well, maybe I'm missing something - here is a gist with the contents of cloud-init.log).
What am I missing?
This is probably not relevant anymore, but still:
I've just used boto with Ubuntu and user data. Although the documentation says the user data has to be base64 encoded, it only worked for me when I passed the script as a regular string.
I read the content of the user-data script from a file (using fh.read()) and then pass it directly as the user_data parameter to run_instances.
I think it's not working for you because a user-data script can't start with an arbitrary shebang like the "#!/usr/bin/env python" you used.
On the help page http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html there are two examples: one is the standard "#!/bin/bash", and the other is the "#cloud-config" directive. Those are probably the only two headers that are recognized. The bash one works for me.
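If the shebang is indeed the issue, one possible workaround (just a sketch, not tested against the original AMI; it reuses the placeholders from the question) is to wrap the Python payload in a bash user-data script and let bash hand it to the interpreter:
#!/usr/bin/env python
import boto.ec2

conn_ec2 = boto.ec2.connect_to_region('us-east-1')  # access keys are environment vars

# Bash user-data script that writes the Python payload to a file and runs it.
my_code = """#!/bin/bash
cat > /tmp/payload.py <<'EOF'
import sys
sys.stdout = open('file', 'w')
print 'test'
EOF
python /tmp/payload.py
"""

reservation = conn_ec2.run_instances(image_id='ami-a73264ce',
                                     key_name='backendkey',
                                     instance_type='t1.micro',
                                     security_groups=['backend'],
                                     instance_initiated_shutdown_behavior='terminate',
                                     user_data=my_code)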
I'm having trouble authenticating and writing data to a Spanner database locally. All imports are up to date (google.cloud, google.oauth2, etc.). I have had someone else run this and it works fine for them, so the problem seems to be something on my end: something wrong or misconfigured on my computer, maybe where the credentials are stored or something?
Anyone have any ideas?
from google.cloud import spanner
from google.api_core.exceptions import GoogleAPICallError
from google.api_core.datetime_helpers import DatetimeWithNanoseconds
import datetime
from google.oauth2 import service_account

def write_to(database):
    record = [[
        1041613562310836275,
        'test_name'
    ]]
    columns = ("id", "name")
    insert_errors = []
    try:
        with database.batch() as batch:
            batch.insert_or_update(
                table="guild",
                columns=columns,
                values=record,
            )
    except GoogleAPICallError as e:
        print(f'error: {e}')
        insert_errors.append(e.message)
    return insert_errors

if __name__ == "__main__":
    credentials = service_account.Credentials.from_service_account_file(r'path\to\a.json')
    instance_id = 'instance-name'
    database_id = 'database-name'
    spanner_client = spanner.Client(project='project-name', credentials=credentials)
    print(f'spanner creds: {spanner_client.credentials}')
    instance = spanner_client.instance(instance_id)
    database = instance.database(database_id)
    insert_errors = write_to(database)
some credential tests:
creds = service_account.Credentials.from_service_account_file(a_json)
<google.oauth2.service_account.Credentials at 0x...>
spanner_client.credentials
<google.auth.credentials.AnonymousCredentials at 0x...>
spanner_client.credentials.signer_email
AttributeError: 'AnonymousCredentials' object has no attribute 'signer_email'
creds.signer_email
'...#....iam.gserviceaccount.com'
spanner.Client().from_service_account_json(a_json).credentials
<google.auth.credentials.AnonymousCredentials object at 0x...>
The most common reason for this is that you have accidentally set (or forgot to unset) the environment variable SPANNER_EMULATOR_HOST. If this environment variable has been set, the client library will try to connect to the emulator instead of Cloud Spanner. This will cause the client library to wait for a long time while trying to connect to the emulator (assuming that the emulator is not running on your machine). Unset the environment variable to fix this problem.
Note: This environment variable only affects Cloud Spanner client libraries, which is why other Google Cloud products will work on the same machine. The script will also in most cases work on other machines, as they are unlikely to have this environment variable set.
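A quick way to verify this from Python before creating the client (only a sketch; the shell equivalent is echo $SPANNER_EMULATOR_HOST followed by unset SPANNER_EMULATOR_HOST):
import os

# If this prints a host:port value, the Spanner client will silently target the emulator.
print(os.environ.get('SPANNER_EMULATOR_HOST'))

# Remove it for the current process, then create spanner.Client() as usual.
os.environ.pop('SPANNER_EMULATOR_HOST', None)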
I am trying to train a PyTorch model through SageMaker. I am running a script, main.py (a minimum working example is posted below), which calls a PyTorch Estimator. The code for training my model is saved as a separate script, train.py, which is passed via the entry_point parameter of the Estimator. These scripts are hosted on an EC2 instance in the same AWS region as my SageMaker domain.
When I try running this with instance_type = "ml.m5.4xlarge", it works ok. However, I am unable to debug any problems in train.py. Any bugs in that file simply give me the error: 'AlgorithmError: ExecuteUserScriptError', and will not allow me to set breakpoint() lines in train.py (encountering a breakpoint throws the above error).
Instead I am trying to run in local mode, which I believe does allow for breakpoints. However, when I reach estimator.fit(inputs), it hangs on that line indefinitely, giving no output. Any print statements that I put at the start of the main function in train.py are not reached. This is true no matter what code I put in train.py. It also did not throw an error when I had an illegal underscore in the base_job_name parameter of the estimator, which suggests that it does not even create the estimator instance.
Below is a minimum example which replicates the issue on my instance. Any help would be appreciated.
### File structure
main.py
customcode/
|
|_ train.py
### main.py
import sagemaker
from sagemaker.pytorch import PyTorch
import boto3

try:
    # When running on Studio.
    sess = sagemaker.session.Session()
    bucket = sess.default_bucket()
    role = sagemaker.get_execution_role()
except ValueError:
    # When running from EC2 or local machine.
    print('Performing manual setup.')
    bucket = 'MY-BUCKET-NAME'
    region = 'us-east-2'
    role = 'arn:aws:iam::MY-ACCOUNT-NUMBER:role/service-role/AmazonSageMaker-ExecutionRole-XXXXXXXXXX'
    iam = boto3.client("iam")
    sagemaker_client = boto3.client("sagemaker")
    boto3.setup_default_session(region_name=region, profile_name="default")
    sess = sagemaker.Session(sagemaker_client=sagemaker_client, default_bucket=bucket)

hyperparameters = {'epochs': 10}
inputs = {'data': f's3://{bucket}/features'}
train_instance_type = 'local'

hosted_estimator = PyTorch(
    source_dir='customcode',
    entry_point='train.py',
    instance_type=train_instance_type,
    instance_count=1,
    hyperparameters=hyperparameters,
    role=role,
    base_job_name='mwe-train',
    framework_version='1.12',
    py_version='py38',
    input_mode='FastFile',
)

hosted_estimator.fit(inputs)  # This is the line that freezes
### train.py
def main():
    breakpoint()  # Throws an error in non-local mode.
    return

if __name__ == '__main__':
    print('Reached')  # Never reached in local mode.
    main()
We are running Airflow via AWS's managed MWAA Offering. As part of their offering they include a tutorial on securely using the SSH Operator in conjunction with AWS Secrets Manager. The gist of how their solution works is described below:
Run a Task that fetches the pem file from a Secrets Manager location and stores it on the filesystem at /tmp/mypem.pem.
In the SSH Connection include the extra information that specifies the file location
{"key_file":"/tmp/mypem.pem"}
Use the SSH Connection in the SSHOperator.
In short the workflow is supposed to be:
Task1 gets the pem -> Task2 uses the pem via the SSHOperator
All of this is great in theory, but it doesn't actually work. It doesn't work because Task1 may run on a different node from Task2, which means Task2 can't access the /tmp/mypem.pem file location that Task1 wrote the file to. AWS is aware of this limitation according to AWS Support, but now we need to understand another way to do this.
Question
How can we securely store and access a pem file that can then be used by Tasks running on different nodes via the SSHOperator?
I ran into the same problem. I extended the SSHOperator to do both steps in one call.
In AWS Secrets Manager, two keys are added for airflow to retrieve on execution.
{variables_prefix}/airflow-user-ssh-key : the value of the private key
{connections_prefix}/ssh_airflow_user : ssh://replace.user#replace.remote.host?key_file=%2Ftmp%2Fairflow-user-ssh-key
from typing import Optional, Sequence
from os.path import basename, splitext
from airflow.models import Variable
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.providers.ssh.hooks.ssh import SSHHook


class SSHOperator(SSHOperator):
    """
    SSHOperator to execute commands on given remote host using the ssh_hook.

    :param ssh_conn_id: :ref:`ssh connection id<howto/connection:ssh>`
        from airflow Connections.
    :param ssh_key_var: name of Variable holding private key.
        Creates "/tmp/{variable_name}.pem" to use in SSH connection.
        May also be inferred from "key_file" in "extras" in "ssh_conn_id".
    :param remote_host: remote host to connect (templated)
        Nullable. If provided, it will replace the `remote_host` which was
        defined in `ssh_hook` or predefined in the connection of `ssh_conn_id`.
    :param command: command to execute on remote host. (templated)
    :param timeout: (deprecated) timeout (in seconds) for executing the command. The default is 10 seconds.
        Use conn_timeout and cmd_timeout parameters instead.
    :param environment: a dict of shell environment variables. Note that the
        server will reject them silently if `AcceptEnv` is not set in SSH config.
    :param get_pty: request a pseudo-terminal from the server. Set to ``True``
        to have the remote process killed upon task timeout.
        The default is ``False`` but note that `get_pty` is forced to ``True``
        when the `command` starts with ``sudo``.
    """

    template_fields: Sequence[str] = ("command", "remote_host")
    template_ext: Sequence[str] = (".sh",)
    template_fields_renderers = {"command": "bash"}

    def __init__(
        self,
        *,
        ssh_conn_id: Optional[str] = None,
        ssh_key_var: Optional[str] = None,
        remote_host: Optional[str] = None,
        command: Optional[str] = None,
        timeout: Optional[int] = None,
        environment: Optional[dict] = None,
        get_pty: bool = False,
        **kwargs,
    ) -> None:
        super().__init__(
            ssh_conn_id=ssh_conn_id,
            remote_host=remote_host,
            command=command,
            timeout=timeout,
            environment=environment,
            get_pty=get_pty,
            **kwargs,
        )
        if ssh_key_var is None:
            key_file = SSHHook(ssh_conn_id=self.ssh_conn_id).key_file
            key_filename = basename(key_file)
            key_filename_no_extension = splitext(key_filename)[0]
            self.ssh_key_var = key_filename_no_extension
        else:
            self.ssh_key_var = ssh_key_var

    def import_ssh_key(self):
        with open(f"/tmp/{self.ssh_key_var}", "w") as file:
            file.write(Variable.get(self.ssh_key_var))

    def execute(self, context):
        self.import_ssh_key()
        super().execute(context)
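For reference, a minimal sketch of how this extended operator might be wired into a DAG; the dag id, task id and command are hypothetical placeholders, while the connection id and Variable name follow the Secrets Manager keys described above:
from datetime import datetime
from airflow import DAG

with DAG(dag_id="ssh_example", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    run_remote = SSHOperator(  # the extended operator defined above
        task_id="run_remote_command",
        ssh_conn_id="ssh_airflow_user",      # connection stored under {connections_prefix}
        ssh_key_var="airflow-user-ssh-key",  # Variable stored under {variables_prefix}
        command="whoami",
    )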
The answer by holly is good. I am sharing a different way I solved this problem: I converted the SSH Connection into a URI and stored that in Secrets Manager under the expected connections path, and everything worked great via the SSHOperator. Below are the general steps I took.
Generate an encoded URI
import json
from airflow.models.connection import Connection
from pathlib import Path

pem = Path("/my/pem/file.pem").read_text()

myconn = Connection(
    conn_id="connX",
    conn_type="ssh",
    host="10.x.y.z",
    login="mylogin",
    extra=json.dumps(dict(private_key=pem)),
)
print(myconn.get_uri())
Input that URI under the environment's configured path in Secrets Manager. The important note here is to input the value in the plaintext field without including a key. Example:
airflow/connections/connX and under Plaintext only include the URI value
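For instance, the secret could also be created programmatically with boto3; this is only a sketch, and it assumes the environment's connections prefix is airflow/connections and the connection id is connX:
import boto3

secrets = boto3.client("secretsmanager")

# Store the URI produced by myconn.get_uri() as the plaintext secret value (no JSON key).
secrets.create_secret(
    Name="airflow/connections/connX",
    SecretString=myconn.get_uri(),
)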
Now in the SSHOperator you can reference this connection Id like any other.
remote_task = SSHOperator(
    task_id="ssh_and_execute_command",
    ssh_conn_id="connX",
    command="whoami",
)
I am using Python 2.7 and the Google+ public API to get activity data into a file. I am having trouble maintaining the JSON encoding in my file: strings are written with the u'' prefix instead of JSON double quotes. Below is my code:
from apiclient import discovery

API_KEY = 'MY API KEY'

service = discovery.build("plus", "v1", developerKey=API_KEY)
activities_resource = service.activities()
request = activities_resource.search(query='India versus South Africa', maxResults=1, orderBy='best',)

while request != None:
    activities_document = request.execute()
    if 'items' in activities_document:
        with open("output.json", mode='a') as file:
            data = str(activities_document['items'])
            file.write(data + "\n\n")
    request = service.activities().list_next(request, activities_document)
Output:
[{u'kind': u'plus#activity', u'provider': {u'title': u'Google+'}, u'titl.......
I am expecting [{"kind": "plus#activity", .....
I am running my code on Windows and have tried it both from the command prompt and from the PyCharm IDE. I have also run the code on an Ubuntu machine, but got the same output. Please let me know what I am doing wrong.
The json module is used for generating JSON. Use it.
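A minimal sketch of that change against the loop in the question: serialize the items with json.dumps instead of calling str(), so the output uses real JSON double quotes.
import json

# Replacement for the body of the `if 'items' in activities_document:` block:
with open("output.json", mode='a') as file:
    data = json.dumps(activities_document['items'], indent=2)
    file.write(data + "\n\n")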
I want to start 10 instances, get their instance id's and get their private IP addresses.
I know this can be done using AWS CLI, I'm wondering if there are any such scripts already written so I don't have to reinvent the wheel.
Thanks
I recommend using Python and the boto package for this kind of automation. Python is clearer than bash. You can use the following page as a starting point: http://boto.readthedocs.org/en/latest/ec2_tut.html
On the off chance that someone in the future comes across my question, I thought I'd give my (somewhat) final solution.
Using Python and the boto package that was suggested, I have the following script.
It's pretty well commented but feel free to ask if you have any questions.
import boto
import time
import sys

IMAGE = 'ami-xxxxxxxx'
KEY_NAME = 'xxxxx'
INSTANCE_TYPE = 't1.micro'
SECURITY_GROUPS = ['xxxxxx']  # If multiple, separate by commas
COUNT = 2  # number of servers to start

private_dns = []  # will be populated with private dns of each instance

print 'Connecting to AWS'
conn = boto.connect_ec2()

print 'Starting instances'
# start instances
reservation = conn.run_instances(IMAGE, instance_type=INSTANCE_TYPE, key_name=KEY_NAME, security_groups=SECURITY_GROUPS, min_count=COUNT, max_count=COUNT)  # , dry_run=True)
# print reservation  # debug

print 'Waiting for instances to start'
# ONLY CHECKS IF RUNNING, MAY NOT BE SSH READY
for instance in reservation.instances:  # doing this for every instance we started
    while not instance.update() == 'running':  # while it's not running (probably 'pending')
        print '.',  # trailing comma is intentional to print on same line
        sys.stdout.flush()  # make the thing print immediately instead of buffering
        time.sleep(2)  # Let the instance start up
print 'Done\n'

for instance in reservation.instances:
    instance.add_tag("Name", "Hadoop Ecosystem")  # tag the instance
    private_dns.append(instance.private_dns_name)  # adding ip to array
    print instance, 'is ready at', instance.private_dns_name  # print to console

print private_dns
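The question also asks for instance IDs and private IP addresses; the script above records private DNS names, but the same reservation object exposes both. A short sketch of that addition (boto's Instance objects carry id and private_ip_address attributes):
# Collect instance IDs and private IPs from the same reservation.
instance_ids = [instance.id for instance in reservation.instances]
private_ips = [instance.private_ip_address for instance in reservation.instances]
print instance_ids
print private_ips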