How to launch workers on AWS EC2 instance when running locally? - amazon-web-services

I have put together a distributed setup at my university using the Distributed package that comes with Julia for running some intensive simulations. I usually launch workers on local machines through ssh using addprocs.
I have launched a c5.24xlarge EC2 instance. The aws_key.pem file exists and I have run
chmod 400 aws_key.pem
I am able to ssh into the instance just fine.
I am trying to add workers on the EC2 instance with the following code
workervec2 = [("ubuntu@ec2-xxxx:22", 24)]
addprocs(workervec2; sshflags="-i aws_key.pem",
         tunnel=true, exename="/home/ubuntu/julia-1.0.4/bin/julia",
         dir="/home/ubuntu/simulator")
but the launch fails with the following error
Warning: Identity file aws_key.pem not accessible: No such file or directory.
ubuntu@ec2-xxxx: Permission denied (publickey).
ERROR: LoadError: Unable to read host:port string from worker. Launch command exited with error?
The warning also appears when launching workers on the local machines, but there the launch goes through. On my EC2 instance the launch fails with the error above, even though I am able to ssh into it from the terminal. What is going wrong?

Adding the SSH key from my local machine to the EC2 instance did the trick.
Then, workers can be added as usual
workervec2 = [("ubuntu@ec2-xxxx:22", 24)]
addprocs(workervec2; sshflags="-i ~/.ssh/id_rsa.pub",
         tunnel=true, exename="/home/ubuntu/julia-1.0.4/bin/julia",
         dir="/home/ubuntu/simulator")
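The warning in the question (Identity file aws_key.pem not accessible: No such file or directory) is what ssh prints when it cannot find the key at the path it was given, which can happen when a relative path is used from a different working directory. As a minimal sketch, assuming a POSIX shell, the hypothetical helper resolve_key below (not part of ssh or Julia) turns a key path into an absolute one that could then be passed to ssh -i or to sshflags:

```shell
# Sketch only: resolve_key is a hypothetical helper, not an ssh or Julia feature.
# It prints the absolute path of an identity file, or fails if the file is missing.
resolve_key() {
    key_path="$1"
    # Fail early with a clear message instead of ssh's later warning.
    if [ ! -f "$key_path" ]; then
        echo "identity file not found: $key_path" >&2
        return 1
    fi
    # Resolve the containing directory so the result no longer depends
    # on the directory the command happens to be run from.
    key_dir="$(cd "$(dirname "$key_path")" && pwd)" || return 1
    echo "$key_dir/$(basename "$key_path")"
}

# Example usage (key and host names taken from the question):
#   key="$(resolve_key aws_key.pem)" && ssh -i "$key" ubuntu@ec2-xxxx true
```

With a key in the current directory, resolve_key aws_key.pem prints the full path; for a missing file it prints a message to stderr and returns a non-zero status.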

Related

Error while launching the EC2 instance

I created a VPS for RStudio and launched an EC2 instance with the following configuration.
I first chose the AMI (the default AMI for free tier users).
Then I added the piece of code for setting up the R server with credentials.
I also defined a security group rule opening port 8787 for accessing my server.
I launched the EC2 instance and the status checks passed.
I was then able to access the R server with my credentials.
I tried to read the data in my S3 bucket.
For this, I tried installing the RCurl package in R.
I got the following error:
Warning in install.packages : installation of package ‘RCurl’ had non-zero exit status
Could someone help me resolve this issue?

ec2 instance with role AmazonEC2RoleforSSM is unable to do EC2 operations in ansible

I have an instance with the AmazonEC2RoleforSSM role. I want to run an Ansible task on this machine that commissions EC2 instances, without setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
This doesn't work as expected: it always requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to be set. Is there a way to do this?
Jaks, could you explain a little bit more about what you're trying to do?
Having an instance profile with the AmazonEC2RoleforSSM policy will allow the instance to call the Systems Manager APIs and be treated as a managed instance, allowing you to use features like Run Command, Inventory, Patch Manager and the like. It will not, however, grant the instance permission to call EC2 APIs (e.g. run-instances).
What is the specific operation you're performing that's failing and what error message are you getting?
AWS Systems Manager requires the SSM role to be attached in order to run the SSM Agent on the EC2 instance. Once the SSM Agent is installed on a particular EC2 instance, you can freely execute commands from AWS Systems Manager.
I guess that after installing the SSM Agent you can run the Ansible script freely (this is not related to the access key issue). Is that OK?
Documents to execute commands with SSM:
Executing Commands Using Systems Manager Run Command
Executing Commands from the Console

SSH connection error - Permission denied (publickey)

I'm trying to run a Spark cluster on AWS using https://github.com/amplab/spark-ec2.
I've generated a key and login credentials, and I'm using this command:
./spark-ec2 --key-pair=octavianKey4 --identity-file=credentials3.csv --region=eu-west-1 --zone=eu-west-1c launch my-instance-name
However, I keep getting this:
Warning: SSH connection error. (This could be temporary.)
Host: mec2-myHostNumber.eu-west-1.compute.amazonaws.com
SSH return code: 255
SSH output: Warning: Permanently added 'ec2-myHostNumber.eu-west-1.compute.amazonaws.com,myHostNumber' (ECDSA) to the list of known hosts.
Permission denied (publickey).
If I quit the console and then try to start the cluster again, I get this:
Setting up security groups...
Searching for existing cluster my-instance-name in region eu-west-1...
Found 1 master, 1 slave.
ERROR: There are already instances running in group my-instance-name-master or my-instance-name-slaves
The command is incorrect. The key pair name should be the one registered in AWS, and the identity file is the .pem file associated with it. You can't ssh into a machine with AWS credentials (your csv file contains credentials, not an SSH key).
./spark-ec2 --key-pair=octavianKey4 --identity-file=octavianKey4.pem --region=eu-west-1 --zone=eu-west-1c launch my-instance-name
Can you add --resume to your spark-ec2 command and try? Your slave may not have the key. --resume will make sure it is transferred to the slave.
Running Spark on EC2
If one of your launches fails due to e.g. not having the right
permissions on your private key file, you can run launch with the
--resume option to restart the setup process on an existing cluster.
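To illustrate the point about the identity file, here is a minimal sketch of a sanity check that could be run before launching. It assumes a POSIX shell, and looks_like_pem_key is a hypothetical helper name, not part of spark-ec2. It only inspects the first line of the file for a PEM private key header, so it is a rough heuristic rather than a real validation of the key:

```shell
# Sketch only: looks_like_pem_key is a hypothetical helper, not a spark-ec2 option.
# Succeeds when the first line of the file is a PEM private key header,
# e.g. "-----BEGIN RSA PRIVATE KEY-----"; fails for a credentials CSV.
looks_like_pem_key() {
    head -n 1 "$1" 2>/dev/null | grep -q -e '-----BEGIN .*PRIVATE KEY-----'
}

# Example usage (file names taken from the question and answer):
#   looks_like_pem_key octavianKey4.pem || echo "not a private key file" >&2
```

A credentials CSV such as credentials3.csv would fail this check, because its first line is a column header rather than a PEM header.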

Can't login to docker with aws

This is an extension of my last question, since I've decided to deploy a Docker container onto a large number of EC2 instances. I've set up a repository and a user with full rights, and I added the correct keys to my AWS CLI configuration. When I run the docker login command that is printed by "aws ecr get-login", it fails with status: 403 Forbidden. I have no clue what's going on, and I've spent the past 2 days trying to fix this error. Any ideas?
I would suggest checking the security group of the EC2 instance.
To allow access via SSH you have to apply the following settings for the Security Group of the EC2 Instance:
Security Groups

Why does my website stop loading on an AWS EC2 instance randomly once in a while?

I am running a t2.micro EC2 instance in us-west-2a and the instance's status is all green.
When I access my website, it stops loading once in a while. Even if I reboot the instance, the website still doesn't load. When I stop the instance and then start it again, it shows 1/2 status checks failed.
ALARM TYPE: awsec2-i-20aaa52c-High-Network-Out
I faced the same type of issue.
My EC2 instances were failing Instance Status Checks after a stop/start. The system logs available to support confirmed that the system was hitting a kernel panic and was unable to boot from the root volume.
So we launched a temporary EC2 instance and attached the EBS root volume of each affected instance to it. There we modified the grub configuration file so that it loads a previous kernel.
We used the following steps:
1. Mount the EBS volume as a secondary volume under /mnt: sudo mount /dev/xvdf1 /mnt
2. Back up the grub.cfg file: sudo cp /mnt/boot/grub2/grub.cfg grub.cfg_backup
3. Edit the grub.cfg file: sudo vim /mnt/boot/grub2/grub.cfg
4. Comment out (with #) all the lines of the first entry, which loads the new kernel.
Then we attached the original EBS volumes back to the original EC2 instances, and these instances were able to boot successfully.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstances.html#FilesystemKernel