ec2-register Client.null: null - amazon-web-services

I am trying to register an Amazon image, and I keep getting the error Client.null: null.
I am able to browse to the URL and see the XML file.
The command I execute is:
ec2-register output.raw.manifest.xml -U <URL>
Client.null: null
Any idea what the problem could be?
Thanks!

Keep in mind that this command is used to register instance store images rather than EBS-backed images.
Usually the XML manifest and a series of image part files (10 MB each) are uploaded to S3 before you register the AMI. Are you sure the bundle is in one of your S3 buckets?
Did you run something like this on the instance you want to create the image from?
ec2-bundle-vol -d /<someplace-where-you-have-a-lot-of-space> -k YOUR_PRIVATE_KEY -c YOUR_CERTIFICATE -u YOUR_ACCOUNT_NUMBER
ec2-upload-bundle -b YOUR_BUCKET_NAME -m output.raw.manifest.xml -a YOUR_ACCESS_KEY -s YOUR_SECRET_KEY
Then you can run:
ec2-register YOUR_BUCKET_NAME/output.raw.manifest.xml
You can also register your image from the AWS console once you have created and uploaded the bundle.
There are several blogs that talk about how to do this too. For example:
http://www.ryannitz.org/tech-notes/2009/08/09/create-amazon-ec2-ami/
Finally, if you are registering an EBS-backed AMI, you can simply use:
ec2-create-image <instance id>
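The bundle, upload and register steps above can be chained so that a failed step stops the rest. This is a minimal sketch, not an official tool: the functions run and bundle_upload_register and the DRY_RUN switch are hypothetical names, the arguments are the same placeholders used above, and it assumes ec2-register expects the manifest as bucket/manifest.

```shell
# Sketch: bundle, upload and register an instance-store AMI in order.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: bundle_upload_register DIR PRIVATE_KEY CERT ACCOUNT BUCKET MANIFEST ACCESS_KEY SECRET_KEY
bundle_upload_register() {
  local dir="$1" key="$2" cert="$3" account="$4"
  local bucket="$5" manifest="$6" access="$7" secret="$8"
  # Each step returns the failing command's status so later steps are skipped
  run ec2-bundle-vol -d "$dir" -k "$key" -c "$cert" -u "$account" || return
  run ec2-upload-bundle -b "$bucket" -m "$dir/$manifest" -a "$access" -s "$secret" || return
  # The manifest is referenced relative to the bucket it was uploaded to
  run ec2-register "$bucket/$manifest"
}
```

For example, `DRY_RUN=1 bundle_upload_register /mnt YOUR_PRIVATE_KEY YOUR_CERTIFICATE YOUR_ACCOUNT_NUMBER YOUR_BUCKET_NAME output.raw.manifest.xml YOUR_ACCESS_KEY YOUR_SECRET_KEY` prints the three commands; pass the manifest name your bundle actually produced, and drop DRY_RUN=1 to run them.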

Related

AWS EC2 User Data doesn't work after modifying it

Note: I could not find any existing questions about modifying EC2 instance user data.
My case: I added the user data below at the EC2 instance's first launch, and it worked perfectly.
#! /bin/bash
cd ~
echo "Test" > index.html
python -m SimpleHTTPServer 80
After launching the instance, in order to modify the user data I stopped the instance, changed the user data, and restarted the instance. But this time the scripts are not working.
#! /bin/bash
cd ~
echo "Test2" > index.html
python -m SimpleHTTPServer 80
I don't understand why the modified user data didn't work.
To quote the AWS documentation on User data and shell scripts:
By default, user data scripts and cloud-init directives run only during the boot cycle when you first launch an instance. You can update your configuration to ensure that your user data scripts and cloud-init directives run every time you restart your instance. For more information, see How can I execute user data with every restart of my EC2 instance? in the AWS Knowledge Center.
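The Knowledge Center approach referred to in that quote passes user data as a MIME multipart document whose cloud-config part tells cloud-init to run the scripts-user module on every boot. Below is a minimal sketch of a generator for such a document; the function name and the "//" boundary are our own choices, and setting cloud_final_modules this way may replace the image's default list of final-stage modules.

```shell
# Sketch: wrap an existing shell script in multipart user data whose
# cloud-config part sets scripts-user to run on every boot.
make_every_boot_user_data() {
  local script_file="$1"
  local boundary="//"
  cat <<EOF
Content-Type: multipart/mixed; boundary="${boundary}"
MIME-Version: 1.0

--${boundary}
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--${boundary}
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

$(cat "$script_file")
--${boundary}--
EOF
}
```

For example, `make_every_boot_user_data script.sh > user-data.txt`, then paste the contents of user-data.txt as the user data while the instance is stopped.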
By default, user data runs only on first boot (except on instances that use instance store volumes).
If you want it to run again, remove the marker file that makes it run only once:
As per the answer from: https://serverfault.com/questions/797482/how-to-make-ec2-user-data-script-run-again-on-startup
rm /var/lib/cloud/instances/*/sem/config_scripts_user
Or
rm /var/lib/cloud/instance/sem/config_scripts_user
For Windows instances, just add <persist>true</persist> to the user data.
https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-windows-user-data.html
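The rm commands above can be wrapped in a small helper that clears the run-once marker for every instance directory and reports how many markers it removed. This is a sketch: the function name is hypothetical, and the cloud directory is a parameter (defaulting to /var/lib/cloud) only so it can be tried outside an instance. Because /var/lib/cloud/instance is normally a symlink to the current instance's directory under instances/, the glob covers both forms shown above.

```shell
# Sketch: delete cloud-init's config_scripts_user semaphore(s) so that
# user scripts run again on the next boot. Prints the number removed.
reset_user_data_semaphore() {
  local cloud_dir="${1:-/var/lib/cloud}"
  local removed=0
  local sem
  for sem in "$cloud_dir"/instances/*/sem/config_scripts_user; do
    # If the glob matched nothing, bash leaves the pattern itself; skip it
    [ -e "$sem" ] || continue
    rm -f -- "$sem"
    removed=$((removed + 1))
  done
  echo "$removed"
}
```

Run it as root on the instance before the next boot, for example `sudo bash -c "$(declare -f reset_user_data_semaphore); reset_user_data_semaphore"`.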

Decrypted vars when install a new aws instance via user-data script

I have Ansible playbooks ready; they include several encrypted vars. Normally I feed a vault password file to decrypt them with --vault-password-file ~/.vault_pass.txt and deploy the change to the remote EC2 instance, so I don't need to expose the password file.
But my requirement is different here. I need to include the ansible-playbook run in the user-data script when creating a new EC2 instance. Ideally, all settings should be in place automatically once the instance is running.
I deploy the instances with Terraform using the simple user-data script below:
#!/usr/bin/bash
yum -y update
/usr/local/bin/aws s3 cp s3://<BUCKET>/ansible.tar.gz ansible.tar.gz
gtar zxvf ansible.tar.gz
cd ansible
ansible-playbook -i inventory/ec2.py -c local ROLE.yml
So if the playbook contains encrypted vars, I have to put my password file into the user-data script as well.
Is there anything I can do to avoid this? Would Ansible Tower help here?
I tested CredStash, but it is still a chicken-and-egg problem.
If you want your instances to configure themselves they are going to either need all the credentials or another way to get the credentials, ideally with some form of one time pass.
The best I can think of off the top of my head is to use HashiCorp's Vault to store the credentials (potentially all of your secrets, or maybe just the Ansible Vault password, which can then be used to un-vault your Ansible variables) and have your deploy process create a one-time-use token that is injected into the user-data script via Terraform's templating.
To do this you'll probably want to wrap your Terraform apply command with some form of helper script that might look like this (untested):
#!/bin/bash
vault_host="10.0.0.3"
vault_port="8200"
# Ask Vault for a single-use token from the ansible_vault_read token role
response=$(curl -s \
  -X POST \
  -H "X-Vault-Token: $VAULT_TOKEN" \
  -d '{"num_uses": 1}' \
  "http://${vault_host}:${vault_port}/v1/auth/token/create/ansible_vault_read")
vault_token=$(echo "${response}" | jq -r '.auth.client_token')
# Double quotes so the shell expands the variables; backslashes continue the command
terraform apply \
  -var "vault_host=${vault_host}" \
  -var "vault_port=${vault_port}" \
  -var "vault_token=${vault_token}"
And then your user data script will want to be templated in Terraform with something like this (also untested):
template.tf:
resource "template_file" "init" {
template = "${file("${path.module}/init.tpl")}"
vars {
vault_host = "${var.vault_host}"
vault_port = "${var.vault_port}"
vault_token = "${var.vault_token}"
}
}
init.tpl:
#!/usr/bin/bash
yum -y update
# Terraform fills in the vault_* placeholders. Shell variables below are
# written as $name (no braces) so Terraform leaves them alone.
response=$(curl -s \
  -H "X-Vault-Token: ${vault_token}" \
  -X GET \
  http://${vault_host}:${vault_port}/v1/secret/ansible_vault_pass)
ansible_vault_password=$(echo "$response" | jq -r '.data.ansible_vault_pass')
echo "$ansible_vault_password" > ~/.vault_pass.txt
/usr/local/bin/aws s3 cp s3://<BUCKET>/ansible.tar.gz ansible.tar.gz
gtar zxvf ansible.tar.gz
cd ansible
ansible-playbook -i inventory/ec2.py -c local ROLE.yml --vault-password-file ~/.vault_pass.txt
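Writing the password with a plain echo gives ~/.vault_pass.txt the default permissions (typically world-readable with the usual umask of 022). A minimal sketch of a safer write follows; write_secret_file is a hypothetical helper, and it deliberately uses only $name-style variables so it can be pasted into the Terraform template without Terraform trying to interpolate it.

```shell
# Sketch: write a secret to a file that only its owner can read.
write_secret_file() {
  local secret="$1" path="$2"
  (
    # Stricter umask only inside this subshell, so new files start as 0600
    umask 077
    printf '%s\n' "$secret" > "$path"
    # Also tighten permissions if the file already existed
    chmod 600 "$path"
  )
}
```

In the template it would replace the echo line, as `write_secret_file "$ansible_vault_password" ~/.vault_pass.txt`. Alternatively, Ansible accepts an executable file for --vault-password-file and uses its standard output as the password, which avoids storing the password on disk at all.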
Alternatively you could simply have the instances call something such as Ansible Tower to trigger the playbook to be run against it. This allows you to keep the secrets on the central box doing the configuration rather than having to distribute them to every instance you are deploying.
With Ansible Tower this is done using callbacks and you will need to set up job templates and then have your user data script curl the Tower to trigger the configuration run. You could change your user data script to something like this instead:
template.tf:
resource "template_file" "init" {
template = "${file("${path.module}/init.tpl")}"
vars {
ansible_tower_host = "${var.ansible_tower_host}"
ansible_host_config_key = "${var.ansible_host_config_key}"
}
}
init.tpl:
#!/usr/bin/bash
curl \
  -X POST \
  --data "host_config_key=${ansible_host_config_key}" \
  http://${ansible_tower_host}/api/v1/job_templates/1/callback/
The host_config_key may look like a secret at first glance, but it is a shared key that multiple hosts can use to access a job template. Ansible Tower will still only run the job if the host is defined in the job template's static inventory or, when you use dynamic inventories, if the host is found in that lookup.
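Since Tower only runs the job once it can find the new host in the template's inventory, a callback made very early in boot may be rejected. A minimal sketch of a retry wrapper follows; retry is a hypothetical helper, and, like the template, it avoids ${...} so Terraform leaves it untouched.

```shell
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Do not sleep after the final attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

In init.tpl this could wrap the callback as `retry 10 30 curl -sf -X POST --data "host_config_key=${ansible_host_config_key}" http://${ansible_tower_host}/api/v1/job_templates/1/callback/`, where curl's -f flag makes HTTP error responses count as failures. The attempt count and delay are arbitrary.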

delete s3 files from a pipeline AWS

I would like to ask about a processing task I am trying to complete using a data pipeline in AWS, but I have not been able to get it to work.
Basically, I have 2 data nodes representing 2 MySQL databases, from which data is extracted periodically and placed in an S3 bucket. This copy activity works fine: every day it selects the rows added since the previous day (today - 1 day).
However, that bucket containing the collected data as CSVs should become the input for an EMR activity, which will be processing those files and aggregating the information. The problem is that I do not know how to remove or move the already processed files to a different bucket so I do not have to process all the files every day.
To clarify, I am looking for a way to move or remove already processed files in an S3 bucket from a pipeline. Can I do that? Is there any other way I can only process some files in an EMR activity based on a naming convention or something else?
Even better, create a DataPipeline ShellCommandActivity and use the aws command line tools.
Create a script with these two lines:
sudo yum -y upgrade aws-cli
aws s3 rm "$1" --recursive
The first line ensures you have the latest AWS CLI.
The second one removes a "directory" (an S3 key prefix) and all its contents. $1 is the argument passed to the script.
In your ShellCommandActivity:
"scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
"scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"
The details on how the aws s3 command works are at:
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
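Because `aws s3 rm --recursive` on a bare bucket URI deletes everything in the bucket, it may be worth validating the argument first. This is a sketch; the function names and the "bucket plus non-empty prefix" rule are our own policy, not something Data Pipeline enforces.

```shell
# Sketch: only accept s3://bucket/prefix (non-empty bucket and prefix),
# so an empty or bucket-only argument cannot wipe a whole bucket.
is_safe_s3_prefix() {
  case "$1" in
    s3://?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Delete everything under the given prefix, refusing unsafe arguments.
delete_prefix() {
  if ! is_safe_s3_prefix "$1"; then
    echo "refusing to delete '$1': expected s3://bucket/prefix" >&2
    return 1
  fi
  aws s3 rm "$1" --recursive
}
```

The script would then end with `delete_prefix "$1"`. To move the processed files to another location instead of deleting them, `aws s3 mv "$1" "$2" --recursive` can be guarded the same way.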
1) Create a script that takes an input path and deletes the files under it using hadoop fs -rm -r s3path (hadoop fs -rmr is the older, deprecated form).
2) Upload the script to S3.
In EMR, use a pre-step that does the following:
1) hadoop fs -copyToLocal s3://scriptname .
2) chmod +x scriptname
3) Run the script.
That's pretty much it.
Another approach that avoids EMR is to install the s3cmd tool through a ShellCommandActivity on a small EC2 instance; you can then use s3cmd in the pipeline to manipulate your S3 data however you want.
The tricky part of this approach is configuring s3cmd safely through a configuration file (essentially passing the access key and secret key), because you can't SSH into the EC2 instance and run 's3cmd --configure' interactively in a pipeline.
To do that, create the config file in the ShellCommandActivity using 'cat'. For example:
cat <<EOT > s3.cfg
blah
blah
blah
EOT
Then use the '-c' option to pass the config file every time you call s3cmd, like this:
s3cmd -c s3.cfg ls
It sounds complicated, but it works.
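A minimal sketch of generating that config non-interactively, readable only by the current user, follows. It assumes the keys reach the activity as environment variables named S3_ACCESS_KEY and S3_SECRET_KEY (hypothetical names); the file uses s3cmd's [default] section with access_key and secret_key entries.

```shell
# Sketch: write a minimal s3cmd config file with owner-only permissions.
write_s3cmd_config() {
  local path="$1"
  if [ -z "$S3_ACCESS_KEY" ] || [ -z "$S3_SECRET_KEY" ]; then
    echo "S3_ACCESS_KEY and S3_SECRET_KEY must be set" >&2
    return 1
  fi
  (
    # New file is created as 0600 because of the subshell-local umask
    umask 077
    cat > "$path" <<EOF
[default]
access_key = $S3_ACCESS_KEY
secret_key = $S3_SECRET_KEY
EOF
  )
}
```

Usage would then be `write_s3cmd_config s3.cfg && s3cmd -c s3.cfg ls`.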

Can't make COPY from remote host to Redshift work

I have a gzipped file on a local machine and want to load it into Redshift.
My command looks like this:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP;
But I get a message "s3:/redshift.manifests/copy_from_yb01_urlinfo.txt: No such file or directory".
But this file is even public: https://s3.amazonaws.com/redshift.manifests/copy_from_yb01_urlinfo.txt.
Moreover, the user whose credentials I use has full access to S3 and Redshift: http://c2n.me/iEnI5l.png
And even weirder, I can access that file perfectly well with the same credentials from the AWS CLI:
> aws s3 ls redshift.manifests
2014-08-01 19:32:13 137 copy_from_yb01_urlinfo.txt
How can I diagnose this further?
Just in case, I connect to my Redshift cluster via psql (PostgreSQL cli):
PAGER=more LANG=C psql -h ....us-east-1.redshift.amazonaws.com -p 5439 -U ... -d ...
edit:
I uploaded the file to S3: same error on COPY...
I uploaded it again and ran COPY with the same credentials.
\COPY url_info FROM 's3://redshift-datafiles/url_info_1.copy.gz' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' GZIP;
I am going to despair...
Since you are trying to copy to Redshift using a manifest file, you need to add the MANIFEST option at the end, like this:
\COPY tablename FROM 's3://redshift.manifests/copy_from_yb01_urlinfo.txt' REGION 'us-east-1' CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...' SSH GZIP MANIFEST;
Oh.
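The fix was to remove the backslash at the beginning of the command. In psql, \copy is a client-side meta-command that reads a file on the local machine, so psql tried to open the s3:// string as a local path (hence "No such file or directory"); without the backslash, the COPY statement is sent to Redshift, which reads the data from S3 itself.
I can't remember why I started writing it that way... Actually, I started writing it when I exported data from my local PostgreSQL installation.
This is so stupid :) One small rubber duck could have saved me a day or two.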
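The distinction can be turned into a quick check before sending a statement to psql. This is a sketch with a hypothetical function name: it only flags statements that start with \copy and read from an s3:// source, the combination that fails as described above.

```shell
# Sketch: warn when a statement uses psql's client-side \copy with an s3://
# source; \copy opens the source as a local file, while plain COPY is
# executed by the server.
check_copy_statement() {
  local sql="$1"
  local norm
  # Lower-case the statement and strip leading whitespace before matching
  norm=$(printf '%s' "$sql" | tr '[:upper:]' '[:lower:]' | sed 's/^[[:space:]]*//')
  case "$norm" in
    "\\copy "*"'s3://"*)
      echo "warning: \\copy reads a local file; use COPY for s3:// sources" >&2
      return 1 ;;
  esac
  return 0
}
```

A local-file \copy still passes, since that is the case \copy is meant for.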

How to check whether my user data passing to EC2 instance is working

While creating a new AWS EC2 instance using the EC2 command line API, I passed some user data to the new instance.
How can I know whether that user data executed or not?
You can verify using the following steps:
SSH into the launched EC2 instance.
Check the log of your user data script in:
/var/log/cloud-init.log and
/var/log/cloud-init-output.log
There you can see all the output of your user data script; the /etc/cloud folder will also be present.
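Besides reading the logs, a quick check after SSHing in is whether cloud-init has written its boot-finished marker under /var/lib/cloud/instance. This is a sketch: the function name is hypothetical, the directory is a parameter only so it can be tried elsewhere, and "finished" does not mean the script succeeded, so the logs above remain the place to check its output. Newer cloud-init releases also offer `cloud-init status --wait`.

```shell
# Sketch: report whether cloud-init has finished the current boot.
user_data_finished() {
  local cloud_dir="${1:-/var/lib/cloud}"
  if [ -f "$cloud_dir/instance/boot-finished" ]; then
    echo "cloud-init finished: $(cat "$cloud_dir/instance/boot-finished")"
    return 0
  fi
  echo "cloud-init has not finished yet" >&2
  return 1
}
```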
Just for reference, you can also check whether the user data ran by looking at the system log in the EC2 console. Right-click your instance, then:
In the new interface: Monitor and Troubleshoot > Get System Log
In the old interface: Instance Settings > Get System Log
This should open a modal window with the system logs.
It might also be useful to see what the user data looks like when it is executed during bootstrapping of the instance. This is especially true if you are passing in environment variables or flags from the CloudFormation template. You can see how the UserData is executed in two different ways:
1. From within the instance:
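# Get instance ID (IMDSv2: request a session token first)
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)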
# Print user data
sudo cat /var/lib/cloud/instances/$INSTANCE_ID/user-data.txt
2. From outside the instance
Note: this will only work if you have configured the UserData shell in such a way that it will output the commands it runs.
For bash, you can do this as follows:
"#!/bin/bash\n",
"set -x\n",
Right-click the EC2 instance in the EC2 console -> Monitor and Troubleshoot -> Get system log. Download the log file and look for a section that looks like this:
ip-172-31-76-56 login: 2021/10/25 17:13:47Z: Amazon SSM Agent v3.0.529.0 is running
2021/10/25 17:13:47Z: OsProductName: Ubuntu
2021/10/25 17:13:47Z: OsVersion: 20.04
[ 45.636562] cloud-init[856]: Cloud-init v. 21.2-3...
[ 47.749983] cloud-init[896]: + echo hello world
This is what you would see if the UserData was configured like this:
"#!/bin/bash\n",
"set -x\n",
"echo hello world"
Debugging user data scripts on Amazon EC2 is indeed a bit awkward, as there is usually no way to actively hook into the process, so ideally one would like real-time access to the user-data script output, as summarized in Eric Hammond's article Logging user-data Script Output on EC2 Instances:
The recent Ubuntu AMIs still send user-data script to the console output, so you can view it remotely, but it is no longer available in syslog on the instance. The console output is only updated a few minutes after the instance boots, reboots, or terminates, which forces you to wait to see the output of the user-data script as well as not capturing output that might come out after the snapshot.
Depending on your setup, you might want to ship the logs to a remote logging facility like Loggly right away, but getting that installed early enough can obviously be a chicken-and-egg problem (though it works great if the AMI is already configured that way).
Enable logging for your user data
Eric Hammond, in "Logging user-data Script Output on EC2 Instances (2010, Hammond)", suggests:
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
Take care to put a space between the two > > characters at the beginning of the statement.
Here’s a complete user-data script as an example:
#!/bin/bash -ex
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
echo BEGIN
date '+%Y-%m-%d %H:%M:%S'
echo END
Put this in your user data:
touch /tmp/file2.txt
Once the instance is up you can check whether the file is created or not. Based on this you can tell if the userdata is executed or not.
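Rather than checking once, you can poll for the marker file until it appears or a timeout passes. This is a sketch: wait_for_marker is a hypothetical helper, and the default timeout and polling interval are arbitrary.

```shell
# Sketch: wait until PATH exists, checking every INTERVAL seconds for at most
# TIMEOUT seconds. Returns 1 on timeout.
wait_for_marker() {
  local path="$1" timeout="${2:-300}" interval="${3:-5}"
  local waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "found $path"
}
```

It can be run remotely without copying a script, for example `ssh ec2-user@YOUR_INSTANCE "$(declare -f wait_for_marker); wait_for_marker /tmp/file2.txt 300 5"`, where YOUR_INSTANCE is a placeholder.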
Have your user data create a file on the instance to see whether it works:
bob.txt:
#!/bin/sh
echo 'Woot!' > /home/ec2-user/user-script-output.txt
Then launch with:
ec2-run-instances -f bob.txt -t t1.micro -g ServerPolicy ami-05cf5c6d -v