I have several Salt states (base and pillars) already written and stored in Amazon S3, and I want to re-use them instead of writing the states again. I want to create an AMI image using Packer and apply the Salt states downloaded from S3 to the Packer builder EC2 instance. In addition to the salt-minion installed on the CentOS 7 machine, I have also installed the salt-master service and started both salt-minion and salt-master with the following commands:
cat > /etc/salt/minion.d/minion_id.conf <<'EOT'
id: ${host}    # salt-minion id
EOT
# Set the name of the master to connect to
cat > /etc/salt/minion.d/master_name.conf <<'EOT'
master: localhost
EOT
systemctl enable salt-minion
systemctl start salt-minion
systemctl enable salt-master
systemctl start salt-master
When I run the command below, it doesn't list any minions:
salt-key -L
Accepted Keys:
Denied Keys:
Unaccepted Keys:
Rejected Keys:
So running salt 'localhost-*' state.highstate fails with the errors:
"No minions matched the target. No command was sent, no jid was assigned.
ERROR: No return received"
This is because no minion ID ever shows up in salt-key.
Does anybody have an idea why the minion key is not appearing in salt-key, and how can I resolve this so that the existing Salt states downloaded from S3 are applied successfully in the AMI image?
Regards
Pradeep
What could be happening is that your minions can't find (resolve via DNS) the default master hostname salt.
What you could do is add the IP of your master to your minion's /etc/salt/minion, something like this:
master: 10.0.0.1
Replace 10.0.0.1 with the IP of your master.
Then restart your minion and check the master again for key requests.
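A minimal sketch of that change, reusing the drop-in file from the question (10.0.0.1 stands in for your master's real IP):
cat > /etc/salt/minion.d/master_name.conf <<'EOT'
master: 10.0.0.1
EOT
systemctl restart salt-minion
# on the master, the minion's key should now appear and can be accepted
salt-key -L
salt-key -A -y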
I am new to AWS. I launched a Red Hat instance on AWS with the free tier and logged in with an SSH client.
My login looks like this: ec2-user@ec.....bla.com
That means I am logged in as ec2-user. When I try to run a service inside the instance, it asks me for the root password.
Can anyone tell me what the root password is? I haven't been able to figure this out yet.
Here is an example:
[ec2-user@ip-172--my-aws-ip---34 ~]$ systemctl start docker.service
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ====
Authentication is required to start 'docker.service'.
Authenticating as: root
Password:
Use sudo su - to switch to root and then execute the above command, or run it directly as sudo systemctl start docker.service.
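For example (docker.service just mirrors the service from the question):
# either switch to a root shell first...
sudo su -
systemctl start docker.service
# ...or prefix the single command with sudo
sudo systemctl start docker.service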
I can do this successfully:
Bundle my app into a docker image
Build this image into a container using Google Cloud Build upon push to master
(This container is stored in the registry at, for example, gcr.io/my-project/my-container)
Deploy this container to the web using Google Cloud Run
Visit the Cloud Run url and see my website
I am now trying more sophisticated builds and I think the next step is to use Google Compute Engine.
To start, I am simply trying to deploy a single instance of the same app that I deployed to Cloud Run:
Navigate to Compute Engine > VM Instances
Enter basics like instance name
Enter my container location under "Container Image": gcr.io/my-project/my-container
(As an aside, I find it suspect that the interface does not offer a selector for your existing Container Registry items here.)
Select "Allow HTTP Traffic" and "Allow HTTPS Traffic"
Click "Create"
GCE takes a minute to create it, and then it shows the green checkmark and the instance name, and "External IP: 35.238.xxx.xxx". I visit that URL in my browser and get... "35.238.xxx.xxx refused to connect."
To inspect, I go back to the GCE page and select "SSH > Open in browser window" next to my instance, which opens a type of cloud terminal to the machine.
In this terminal window, I type ps and see that no app processes are running. The container's Dockerfile ends with CMD yarn start:prod, so I guess that's not happening here.
Further, I ls here and there and navigate around, and see that there is no /app directory from my Dockerfile's WORKDIR /app command. It seems like not only did my app not boot, but the container was never even pulled onto the VM instance?
What am I doing wrong?
For anyone having this issue: I faced the same problem and couldn't figure it out.
Reading Serhii's answer gave me the clue. I believe that as of today (Jan 2021) the GCP Console UI is a bit unhelpful: if you type in a container name when creating your VM but WITHOUT specifying a tag on the end, it doesn't complain or assume a default such as 'latest'; it just fails silently. Hence the VM comes up with no Docker container running.
At least it now works for me; hopefully this helps others.
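In practice that means spelling out the tag when you fill in the container image, for example (a sketch reusing the image name from the question, with :latest as the assumed tag):
gcr.io/my-project/my-container:latest
# or, when creating the VM from the CLI:
gcloud compute instances create-with-container my-instance \
  --container-image=gcr.io/my-project/my-container:latest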
Check whether your VM has an external IP address.
If it doesn't, the VM might not have network access to public repositories or even to Google Container Registry (gcr.io), and the Docker container will silently fail to start.
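A quick way to check whether the VM has an external IP (the instance name and zone are just the ones from the example below):
$ gcloud compute instances list
$ gcloud compute instances describe instance-3 --zone=europe-west3-a --format='get(networkInterfaces[0].accessConfigs[0].natIP)'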
I've decided to follow Deploying a container on a new VM instance again.
Please find my steps and commands below:
create a new VM that runs the Docker image gcr.io/cloud-marketplace/google/nginx1:latest with network tag http-server:
$ gcloud compute instances create-with-container instance-3 --tags=http-server,https-server --container-image=gcr.io/cloud-marketplace/google/nginx1:latest
Created [https://www.googleapis.com/compute/v1/projects/test-prj/zones/europe-west3-a/instances/instance-3].
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
instance-3 europe-west3-a n1-standard-1 10.156.0.30 35.XXX.111.XXX RUNNING
create a new firewall rule:
$ gcloud compute firewall-rules create default-allow-http --direction=INGRESS --priority=1000 --network=default --action=ALLOW --rules=tcp:80 --source-ranges=0.0.0.0/0 --target-tags=http-server
Creating firewall...⠹
Created [https://www.googleapis.com/compute/v1/projects/test-prj/global/firewalls/default-allow-http].
Creating firewall...done.
NAME NETWORK DIRECTION PRIORITY ALLOW DENY DISABLED
default-allow-http default INGRESS 1000 tcp:80 False
check that port 80 is now reachable from outside:
$ nmap -Pn 35.XXX.111.XXX
Starting Nmap 7.70 ( https://nmap.org ) at 2020-04-02 12:04 CEST
PORT STATE SERVICE
...
80/tcp open http
check if NGINX is running in the container:
$ curl -I http://35.XXX.111.XXX
HTTP/1.1 200 OK
Server: nginx/1.16.1
...
$ curl http://35.XXX.111.XXX
...
<h1>Welcome to nginx!</h1>
...
also via web browser at http://35.XXX.111.XXX
check status of the container:
$ gcloud compute ssh instance-3
...
instance-3 ~ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
...
a657c8871239 gcr.io/cloud-marketplace/google/nginx1:latest "/usr/local/bin/dock…" 14 minutes ago Up 14 minutes klt-instance-3-uwtu
attach to the container and run curl http://35.XXX.111.XXX in a separate terminal:
instance-3 ~ $ docker attach a657c8871239
YY.YY.43.203 - - [02/Apr/2020:10:18:06 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.0" "-"
YY.YY.43.203 - - [02/Apr/2020:10:18:07 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.0" "-"
I found no errors while following documentation.
To solve your issue:
Compare your steps and commands to mine.
Run the test Docker image by following the documentation in your project.
Try to replicate the steps from the documentation with your custom image.
If you still have issues, update your question with all your steps, commands and outputs.
I also had this problem: the instance was running, but it could not pull my container.
Error: Failed to start container: Error response from daemon:
{"message":"unauthorized: You don't have the needed permissions to
perform this operation, and you may have invalid credentials. To
authenticate your request, follow the steps in:
https://cloud.google.com/container-registry/docs/advanced-authentication"
I had to add an extra scope to the YAML file: https://www.googleapis.com/auth/source.full_control
steps:
- name: gcr.io/cloud-builders/docker
args: ['build', '-t', 'gcr.io/local-xxxxxxxxxxxxxx/apptraining', '.']
- name: 'gcr.io/cloud-builders/docker'
args: ["push", "gcr.io/local-xxxxxxxxxxxxxx/apptraining"]
- name: 'gcr.io/cloud-builders/gcloud'
args: ['compute', 'instances', 'create-with-container', 'instanceapptraining', '--machine-type=n1-standard-1', '--scopes=https://www.googleapis.com/auth/devstorage.full_control,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/datastore,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/source.full_control,https://www.googleapis.com/auth/source.read_only,https://www.googleapis.com/auth/compute.readonly','--zone=us-central1-a', '--preemptible', '--container-image=gcr.io/local-xxxxxxxxxxxxxx/apptraining:latest']
While registering a host to the Ambari server's cluster, I am getting the following error:
"Host checks were skipped on 1 hosts that failed to register."
I'm trying to install HDP 2.5 on an AWS instance.
I have tried to follow the Hortonworks documentation:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.0.3/bk_ambari-installation/content/set_the_hostname.html
I have added the public IP address and public hostname to the /etc/hosts file and changed the hostname in the /etc/hostname file on both the server and the host. After rebooting both, the hostnames were changed. Then I stopped iptables with:
sudo service iptables stop
After doing everything, the host registration is still failing. Kindly help. I am stuck.
Background
From my experience with Ambari (Hortonworks), you have to explicitly set up your Hadoop nodes in each other's /etc/hosts files with the actual names/IPs that the Hadoop services will bind to. NOTE: the hostnames should also be FQDNs (fully qualified domain names).
For example if you're setting up the hosts as:
node01.mydom.com (10.0.0.2)
node02.mydom.com (10.0.0.3)
node03.mydom.com (10.0.0.4)
These entries should be in all 3 servers' /etc/hosts files, and these should be the names used when referencing them within Ambari's installation/setup wizards.
If you do not pay special attention to this detail, Ambari's server will fail to find/manage any of the other nodes that you're telling it to manage.
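For example, a minimal /etc/hosts shared by the three illustrative nodes above:
10.0.0.2   node01.mydom.com   node01
10.0.0.3   node02.mydom.com   node02
10.0.0.4   node03.mydom.com   node03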
hostname of ambari-agents
The other item to look at is the ambari-agents and what hostnames they think they're running as.
$ ps -eaf|grep ambari_agent
root 3282 1 0 Jul30 ? 00:00:00 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/AmbariAgent.py start --expected-hostname=node01.mydom.com
root 3290 3282 1 Jul30 ? 08:24:29 /usr/bin/python /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=node01.mydom.com
Debugging further
In the screen where you're attempting to register the other nodes as agents, there's a full log of what's happening and you can typically get the commands from this area and attempt to run them directly. I've done this on a number of occasions. The commands will often be python ... commands which you can then copy/paste from the logs and run on the Ambari server where you're attempting to run the install.
I have a Django site which I would like to deploy to a DigitalOcean server every time a branch is merged to master. I have it mostly working and have followed this tutorial.
.travis.yml
language: python
python:
- '2.7'
env:
- DJANGO_VERSION=1.10.3
addons:
ssh_known_hosts: mywebsite.com
git:
depth: false
before_install:
- openssl aes-256-cbc -K *removed encryption details* -in travis_rsa.enc -out travis_rsa -d
- chmod 600 travis_rsa
install:
- pip install -r backend/requirements.txt
- pip install -q Django==$DJANGO_VERSION
before_script:
- cp backend/local.env backend/.env
script: python manage.py test
deploy:
skip_cleanup: true
provider: script
script: "./travis-deploy.sh"
on:
all_branches: true
travis-deploy.sh - runs when the travis 'deploy' task calls it
#!/bin/bash
# print outputs and exit on first failure
set -xe
if [ $TRAVIS_BRANCH == "master" ] ; then
# setup ssh agent, git config and remote
echo -e "Host mywebsite.com\n\tStrictHostKeyChecking no\n" >> ~/.ssh/config
eval "$(ssh-agent -s)"
ssh-add travis_rsa
git remote add deploy "travis@mywebsite.com:/home/dean/se_dockets"
git config user.name "Travis CI"
git config user.email "travis@mywebsite.com"
git add .
git status # debug
git commit -m "Deploy compressed files"
git push -f deploy HEAD:master
echo "Git Push Done"
ssh -i travis_rsa -o UserKnownHostsFile=/dev/null travis@mywebsite.com 'cd /home/dean/se_dockets/backend; echo hello; ./on_update.sh'
else
echo "No deploy script for branch '$TRAVIS_BRANCH'"
fi
Everything works fine until things get to the 'deploy' stage. I keep getting error messages like:
###########################################################
# WARNING: POSSIBLE DNS SPOOFING DETECTED! #
###########################################################
The ECDSA host key for mywebsite.com has changed,
and the key for the corresponding IP address *REDACTED FOR STACK OVERFLOW*
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
###########################################################
# WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! #
###########################################################
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
* REDACTED FOR STACK OVERFLOW *
Please contact your system administrator.
Add correct host key in /home/travis/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/travis/.ssh/known_hosts:11
remove with: ssh-keygen -f "/home/travis/.ssh/known_hosts" -R mywebsite.com
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Permission denied (publickey,password).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Script failed with status 128
INTERESTINGLY - if I re-run this job, the git push command will succeed at pushing to the deploy remote (my server). However, the next step in the deploy script, which is to SSH into the server and run some post-update commands, will fail for the same reason (host fingerprint change or something). Or it will ask for the travis@mywebsite.com password (it has none) and hang on the input prompt.
Additionally, when I debug the Travis CI build and use the SSH URL you're given to SSH into the machine Travis CI runs on, I can SSH into my own server from it. However, it takes multiple tries to get around the errors.
So this seems to be an intermittent problem, with state persisting from one build into the next on retries and causing different errors/outcomes.
As you can see in my .yml file and the deploy script, I have attempted to disable various host key checks and added the domain to the known hosts, etc... all to no avail.
I know I have things 99% set up correctly as things do mostly succeed when I retry the job a few times.
Anyone seen this before?
Cheers,
Dean
I am setting up a Kubernetes deployment using auto-scaling groups and Terraform. The kube master node is behind an ELB for some reliability in case something goes wrong. The ELB has its health check set to TCP 6443, and TCP listeners for 8080, 6443, and 9898. All of the instances and the load balancer belong to a security group that allows all traffic between members of the group, plus public traffic from the NAT gateway address. I created my AMI using the following script (from the getting started guide)...
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# apt-get update
# # Install docker if you don't have it already.
# apt-get install -y docker.io
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
I use the following user data scripts...
kube master
#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*
kubeadm init \
--external-etcd-endpoints=http://${etcd_elb}:2379 \
--token=${token} \
--use-kubernetes-version=${k8s_version} \
--api-external-dns-names=kmaster.${master_elb_dns} \
--cloud-provider=aws
until kubectl cluster-info
do
sleep 1
done
kubectl apply -f https://git.io/weave-kube
kube node
#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*
until kubeadm join --token=${token} kmaster.${master_elb_dns}
do
sleep 1
done
Everything seems to work properly. The master comes up and responds to kubectl commands, with pods for discovery, dns, weave, controller-manager, api-server, and scheduler. kubeadm has the following output on the node...
Running pre-flight checks
<util/tokens> validating provided token
<node/discovery> created cluster info discovery client, requesting info from "http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0"
<node/discovery> failed to request cluster info, will try again: [Get http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0: EOF]
<node/discovery> cluster info object received, verifying signature using given token
<node/discovery> cluster info signature and contents are valid, will use API endpoints [https://10.253.129.106:6443]
<node/bootstrap> trying to connect to endpoint https://10.253.129.106:6443
<node/bootstrap> detected server version v1.4.4
<node/bootstrap> successfully established connection with endpoint https://10.253.129.106:6443
<node/csr> created API client to obtain unique certificate for this node, generating keys and certificate signing request
<node/csr> received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:ip-10-253-130-44 | CA: false
Not before: 2016-10-27 18:46:00 +0000 UTC Not After: 2017-10-27 18:46:00 +0000 UTC
<node/csr> generating kubelet configuration
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
Node join complete:
* Certificate signing request sent to master and response
received.
* Kubelet informed of new secure connection details.
Run 'kubectl get nodes' on the master to see this machine join.
Unfortunately, running kubectl get nodes on the master only returns itself as a node. The only interesting thing I see in /var/log/syslog is
Oct 27 21:19:28 ip-10-252-39-25 kubelet[19972]: E1027 21:19:28.198736 19972 eviction_manager.go:162] eviction manager: unexpected err: failed GetNode: node 'ip-10-253-130-44' not found
Oct 27 21:19:31 ip-10-252-39-25 kubelet[19972]: E1027 21:19:31.778521 19972 kubelet_node_status.go:301] Error updating node status, will retry: error getting node "ip-10-253-130-44": nodes "ip-10-253-130-44" not found
I am really not sure where to look...
The hostnames of the two machines (the master and the node) should be different. You can check them by running cat /etc/hostname. If they do happen to be the same, edit that file to make them different and then do a sudo reboot to apply the change. Otherwise kubeadm will not be able to differentiate between the two machines and they will show up as a single node in kubectl get nodes.
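A quick way to check and fix this (kube-node-1 is just an example name):
# on each machine, check the current hostname; master and node must differ
cat /etc/hostname
# if they are identical, give the node a unique name and reboot
sudo hostnamectl set-hostname kube-node-1
sudo reboot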
Yes, I faced the same problem.
I resolved it by:
killall kubelet
running the kubeadm join command again
and starting the kubelet service
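Roughly, on the node (a sketch; the token and master DNS name are the same placeholders used in the question's user data script):
killall kubelet
kubeadm join --token=${token} kmaster.${master_elb_dns}
systemctl start kubelet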