I'm trying to run a Spark cluster on AWS using https://github.com/amplab/spark-ec2.
I've generated a key pair and login credentials, and I'm using this command:
./spark-ec2 --key-pair=octavianKey4 --identity-file=credentials3.csv --region=eu-west-1 --zone=eu-west-1c launch my-instance-name
However, I keep getting this:
Warning: SSH connection error. (This could be temporary.)
Host: ec2-myHostNumber.eu-west-1.compute.amazonaws.com
SSH return code: 255
SSH output: Warning: Permanently added 'ec2-myHostNumber.eu-west-1.compute.amazonaws.com,myHostNumber' (ECDSA) to the list of known hosts.
Permission denied (publickey).
If I quit the console and then try to start the cluster again, I get this:
Setting up security groups...
Searching for existing cluster my-instance-name in region eu-west-1...
Found 1 master, 1 slave.
ERROR: There are already instances running in group my-instance-name-master or my-instance-name-slaves
The command is incorrect. The key pair name should be the one you registered in AWS, and the identity file is the associated .pem file. You can't SSH into a machine with your AWS credentials (your .csv file contains account credentials, not an SSH key).
./spark-ec2 --key-pair=octavianKey4 --identity-file=octavianKey4.pem --region=eu-west-1 --zone=eu-west-1c launch my-instance-name
Can you add --resume to your spark-ec2 command and try? Your slave may not have the key. --resume will make sure it is transferred to the slave.
Running Spark on EC2
If one of your launches fails due to e.g. not having the right permissions on your private key file, you can run launch with the --resume option to restart the setup process on an existing cluster.
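For example, with the corrected identity file, the resumed launch would look something like this (a sketch reusing the options from the command above):
./spark-ec2 --key-pair=octavianKey4 --identity-file=octavianKey4.pem --region=eu-west-1 --zone=eu-west-1c --resume launch my-instance-name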
Related
I created a sudo user on a newly created Google Cloud compute machine (Debian) using the commands below:
I SSHed into the instance using this command: gcloud compute ssh instance-name --zone=us-central1-a
I created a sudo user by running this command: sudo adduser admin_user
I can see the new user was added by running less /etc/passwd: admin_user:x:1002:1003::/home/admin_user:/bin/sh
I also verified the user's groups by running groups admin_user; this is the output: admin_user : admin_user sudo google-sudoers
But when I try to ssh to that instance from my local machine
gcloud compute ssh --project project_name --zone us-central1-a admin_user@instance-name
it's giving the following error:
admin_user@32.29.134.441: Permission denied (publickey).
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Could anyone please help me figure out how to solve this?
The possible causes for a Permission denied (publickey) error are:
Your key expired and Compute Engine deleted your ~/.ssh/authorized_keys file.
You used an SSH key stored in metadata to connect to a VM that has OS Login enabled.
You used an SSH key stored in an OS Login profile to connect to a VM that doesn't have OS Login enabled.
You connected using a third-party tool and your SSH command is misconfigured.
The sshd daemon isn't running or isn't configured properly.
It looks like the first one fits best for you. To solve this error, add the SSH keys as explained in this link.
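As a rough sketch of re-adding the key through instance metadata (assuming the public key you generated for admin_user lives at ~/.ssh/google_compute_engine.pub; note that --metadata-from-file replaces the instance's existing ssh-keys value, so the file should contain every key you want to keep):
# Hypothetical example: re-register the public key in the instance's ssh-keys metadata
echo "admin_user:$(cat ~/.ssh/google_compute_engine.pub)" > /tmp/ssh-keys.txt
gcloud compute instances add-metadata instance-name --zone us-central1-a --metadata-from-file ssh-keys=/tmp/ssh-keys.txt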
I'm creating a new VM instance. I've cleaned all the metadata. Then I'm running the following command in the Cloud Shell:
gcloud beta compute ssh --zone "europe-west2-c" "vmname" --project "myprojectname"
Then I'm asked to enter a passphrase (which I don't know). I press Enter until I get the following error: Permission denied (publickey).
I've deleted and recreated my instance multiple times, but I always get the same error. What should I do?
Troubleshooting Steps:
Log on using the UI SSH. This creates an ephemeral SSH key, and the Google agent also executes the code path that refreshes .ssh/authorized_keys and fixes any invalid dir/file permissions for both .ssh/ and .ssh/authorized_keys. This approach addresses common gcloud compute ssh issues related to corrupted keys, missing dirs/files, or invalid dir/file permissions. Try gcloud again after performing the UI SSH.
Make sure the account has authenticated to gcloud as an IAM user with the Compute Instance Admin role; for example, run gcloud auth revoke --all and gcloud auth login [IAM-USER], then try gcloud compute ssh again.
Verify that persistent SSH Keys metadata for gcloud is set for either the project or instance. Look in Compute Engine > Metadata, then click SSH Keys. Persistent keys do not have the expireOn attribute.
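One rough way to check this from the CLI (a sketch; the format expressions are assumptions about gcloud's output schema):
# Hypothetical: list project-level metadata and look for an ssh-keys entry without expireOn
gcloud compute project-info describe --format="value(commonInstanceMetadata.items)"
# Hypothetical: list instance-level metadata for the VM from the question
gcloud compute instances describe vmname --zone europe-west2-c --format="value(metadata.items)"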
It's possible the account has lost the private key, mismatched a keypair, etc. You can force gcloud to generate a new SSH keypair by doing the following:
Move ~/.ssh/google_compute_engine and ~/.ssh/google_compute_engine.pub if present.
For example:
mv ~/.ssh/google_compute_engine.pub ~/.ssh/google_compute_engine.pub.old
mv ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.old
Try gcloud compute ssh [INSTANCE-NAME] again. A new keypair will be created and the public key will be added to the SSH keys metadata.
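If it still fails, a debugging sketch (--ssh-flag passes -vvv straight through to the underlying ssh so you can see which keys are offered and rejected; instance name and zone are taken from the question):
# Hypothetical: re-run the connection with verbose ssh output
gcloud compute ssh vmname --zone europe-west2-c --ssh-flag="-vvv"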
Verify that the Linux Google Agent scripts are installed, up-to-date, and running. See Determining Google Agent Status. If the Linux Google Agent is not installed, re-install it. See guest-environment.
Verify that the account's home owner/permissions are correct. Make sure the account's home directory has the correct ownership and is not globally writable. If not using OS Login (the default), your .ssh folder must have mode 0700 and the .ssh/authorized_keys file must have mode 0600. Review /var/log/auth.log for any errors.
Commands:
sudo chmod 700 /home/[user-id]/.ssh
sudo chmod 600 /home/[user-id]/.ssh/authorized_keys
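If the ownership is also wrong, a rough sketch (assuming the account's primary group matches its username, which may not hold on your image):
# Hypothetical: restore ownership of the .ssh directory, then check sshd's log for errors
sudo chown -R [user-id]:[user-id] /home/[user-id]/.ssh
sudo tail -n 50 /var/log/auth.log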
If OS Login is enabled and the virtual machine instance is using a service account (the default), add the following roles to the account (a gcloud sketch follows the list):
roles/compute.osLogin
roles/iam.serviceAccountUser
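A rough sketch of granting those roles with gcloud (myprojectname is the project from the question; you@example.com and SERVICE_ACCOUNT_EMAIL are placeholders for your IAM user and the VM's service account):
# Hypothetical: grant the OS Login role on the project
gcloud projects add-iam-policy-binding myprojectname --member="user:you@example.com" --role="roles/compute.osLogin"
# Hypothetical: allow the user to act as the VM's service account
gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT_EMAIL --member="user:you@example.com" --role="roles/iam.serviceAccountUser"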
For more information, see Troubleshooting SSH.
The possible causes for a Permission denied (publickey) error are:
Your key expired and Compute Engine deleted your ~/.ssh/authorized_keys file.
You used an SSH key stored in metadata to connect to a VM that has OS Login enabled.
You used an SSH key stored in an OS Login profile to connect to a VM that doesn't have OS Login enabled.
You connected using a third-party tool and your SSH command is misconfigured.
The sshd daemon isn't running or isn't configured properly.
You can find more information on how to troubleshoot SSH key errors in this link.
I have the same issue sometimes. The cause and solution, according to the GCP troubleshooting link, are:
Your key expired and Compute Engine deleted your ~/.ssh/authorized_keys file. If you manually added SSH keys to your VM and then connected to your VM using the Google Cloud Console, Compute Engine created a new key pair for your connection. After the new key pair expired, Compute Engine deleted your ~/.ssh/authorized_keys file in the VM, which included your manually added SSH key.
To resolve this issue, try one of the following:
Connect to your VM using the Google Cloud Console or the gcloud command-line tool.
Re-add your SSH key to metadata. For more information, see Add SSH keys to VMs that use metadata-based SSH keys.
I use Terraform, so in this case I instructed the workflow to destroy the VM and rebuild it.
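If you would rather re-add the key to metadata than rebuild, a minimal project-level sketch (assuming a metadata-based key at ~/.ssh/google_compute_engine.pub and that my_user is your login name; --metadata-from-file replaces the project-wide ssh-keys value, so include every key you want to keep in the file):
# Hypothetical: rebuild the project-wide ssh-keys metadata from a file
echo "my_user:$(cat ~/.ssh/google_compute_engine.pub)" > /tmp/ssh-keys.txt
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=/tmp/ssh-keys.txt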
To fix this issue when you cannot connect over SSH:
Edit VM and enable Serial port
Start serial console
Edit ~/.ssh/authorized_keys
On your desktop/client,
edit /Users/[yourdesktopuser]/.ssh/id_rsa.pub
copy contents to clipboard
Paste this content to the end of authorized_keys file in the VM serial console
Save and close
The VM will then recognize the public key from your desktop.
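If the serial console isn't enabled yet, one way to turn it on and connect from gcloud is sketched below (vmname and europe-west2-c are carried over from the question; note that gcloud's serial-port connection also authenticates with SSH keys, so the browser-based serial console in the Cloud Console is the fallback if this fails too):
# Hypothetical: enable the interactive serial console on the instance, then connect to it
gcloud compute instances add-metadata vmname --zone europe-west2-c --metadata serial-port-enable=TRUE
gcloud compute connect-to-serial-port vmname --zone europe-west2-c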
Hello, when creating the instance I missed attaching a private key to the AWS EC2 instance, and now I'm unable to log in via SSH as there is no private key attached.
What I did was clone the instance, launch it, and add the key to that instance.
I added the key to it, yet it didn't work.
I referred to articles such as https://www.youtube.com/watch?v=XfOsytNUq1w
If you're connecting via the command line, ensure that you're specifying the PEM key using the syntax below:
ssh -i path/to/key.pem ec2-user@1.2.3.4
Also ensure that path/to/key.pem has permissions of 400, with the owner as your user.
You can validate this by running ls -lah path/to/key.pem and change the permissions by running chmod 400 path/to/key.pem
I have put together a distributed setup at my university using the Distributed package that comes with Julia for running some intensive simulations. I usually launch workers on local machines through ssh using addprocs.
I have launched a c5.24xlarge EC2 instance. The aws_key.pem file exists, and I have done
chmod 400 aws_key.pem
I am able to ssh into the instance just fine.
I am trying to add workers with the following code
workervec2 = [("ubuntu#ec2-xxxx:22", 24)]
addprocs(workervec2 ; sshflags="-i aws_key.pem",
tunnel=true, exename="/home/ubuntu/julia-1.0.4/bin/julia",
dir="/home/ubuntu/simulator")
I am trying to add additional workers on my Amazon EC2 instances, but it fails with the following error:
Warning: Identity file aws_key.pem not accessible: No such file or directory.
ubuntu@ec2-xxxx: Permission denied (publickey).
ERROR: LoadError: Unable to read host:port string from worker. Launch command exited with error?
The warning appears even when launching workers on the local machines, but the launch goes through. However, launching on my EC2 instance fails with the error above, even though I am able to ssh in from the terminal. What is going wrong?
Adding the ssh key from my local machine to the EC2 instance did the trick. This helped.
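As a rough sketch (assuming your local public key is ~/.ssh/id_rsa.pub and the instance is still reachable with the original aws_key.pem):
# Hypothetical: append the local public key to the instance's authorized_keys
cat ~/.ssh/id_rsa.pub | ssh -i aws_key.pem ubuntu@ec2-xxxx 'cat >> ~/.ssh/authorized_keys'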
Then, workers can be added as usual
workervec2 = [("ubuntu#ec2-xxxx:22", 24)]
addprocs(workervec2 ; sshflags="-i ~/.ssh/id_rsa.pub",
tunnel=true, exename="/home/ubuntu/julia-1.0.4/bin/julia",
dir="/home/ubuntu/simulator")
When following the tutorial instructions for connecting to my JobFlow in EMR, I type the following:
./elastic-mapreduce --jobflow j-3FLVMX9CYE5L6 --ssh
and get this error:
Permission denied (publickey)
I'm already able to run other elastic-mapreduce commands just fine to create flows, etc., so I'm assuming there are security settings required on the actual master instance for the flow, but nothing in the tutorial explains how to configure this (after all, I need to SSH into it to do the configuration in the first place!)
I found that I need to log in as user "hadoop" using the EC2 key pair, and not as any of the usual suspects (ec2-user, root, etc.), like:
ssh -i privatekey.pem hadoop@masternode
Hope this is useful to someone.
Ok, now I feel sheepish: I was using the Amazon CloudFront key pair from my initial account setup rather than the key pair associated with my account for accessing EC2 instances, which is accessible from EC2 > Network & Security > Key Pairs in the AWS Management Console.
The command ssh -i privatekey.pem hadoop@masternode worked great. The user "hadoop" must be used for EC2 Elastic MapReduce.