Why can't Chef node find a private key? - amazon-web-services

I'm trying to install Chef on my AWS EC2 instances. I'm using one EC2 instance as a workstation, another as a node, and Hosted Chef as the Chef server.
On the workstation, I'm able to create a simple project (a LAMP stack) and upload it to the Chef server.
When I run knife bootstrap on the workstation with the key pair for the node's EC2 instance, it successfully converges:
Chef Client finished, 1/1 resources updated in 01 minutes 23 seconds
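For context, the bootstrap command I ran was along these lines (the host, user, key path, and node name here are placeholders, not my exact values):

knife bootstrap <node-public-dns> \
  --ssh-user ubuntu \
  --sudo \
  --ssh-identity-file ~/.ssh/node-key-pair.pem \
  --node-name lamp-node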
However, when I go to the Chef node and run chef-client, I get the following error:
Private Key Not Found:
----------------------
Your private key could not be loaded. If the key file exists, ensure that it is
readable by chef-client.
Relevant Config Settings:
-------------------------
validation_key "/etc/chef/validation.pem"
System Info:
------------
chef_version=14.12.9
ruby=ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]
program_name=/usr/bin/chef-client
executable=/opt/chef/bin/chef-client
Running handlers:
[2019-05-17T21:35:04+00:00] ERROR: Running exception handlers
Running handlers complete
[2019-05-17T21:35:04+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 00 seconds
[2019-05-17T21:35:04+00:00] FATAL: Stacktrace dumped to /home/ubuntu/.chef/cache/chef-stacktrace.out
[2019-05-17T21:35:04+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2019-05-17T21:35:04+00:00] FATAL: Chef::Exceptions::PrivateKeyMissing: I cannot read /etc/chef/client.pem, which you told me to use to sign requests!
On the workstation, I run knife list and I can see my node and validator listed. In Hosted Chef, however, I can't see the node listed under Nodes.
Please can you help me understand the error and how to fix it?
I was under the impression that bootstrap takes care of the node's certificates, so I'm surprised the node can't load the private key.
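In case it helps, here is the quick check I can run on the node (the permissions guess is mine: chef-client typically needs root to read /etc/chef/client.pem, and the stacktrace path above suggests I ran it as the ubuntu user):

# Do the key files exist, and who can read them?
ls -l /etc/chef/client.pem /etc/chef/validation.pem

# Re-run the client as root rather than as the ubuntu user
sudo chef-client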
Thank you

Related

AWS OpsWorks setup_failed for Instance - unable to deploy_branch

I've had a remote dashboard running fine for a couple of years (written for me by an external developer). It runs on an EC2 instance and is configured using OpsWorks.
Today it's not working, and I see in OpsWorks that the instance is showing as setup_failed.
According to the logs it fails here:
[2021-07-02T15:00:59+00:00] FATAL: Stacktrace dumped to /var/chef/runs/18bc4301-71c1-4393-bb26-eae958791d5a/local-mode-cache/cache/chef-stacktrace.out
[2021-07-02T15:00:59+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2021-07-02T15:00:59+00:00] ERROR: deploy_branch[/srv/api] (iparcelbox::deploy-api line 45) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '255'
---- Begin output of git fetch origin ----
STDOUT:
STDERR: error: cannot open .git/FETCH_HEAD: Permission denied
---- End output of git fetch origin ----
Ran git fetch origin returned 255
I've checked the recipe file for iparcelbox::deploy-api and line 45 calls a deploy_branch:
deploy_branch server_path do
  user userName
  group groupName
  repository node[:iparcelbox][:git_url]
  revision node[:iparcelbox][:revision]
  enable_submodules false
  migrate false
  shallow_clone true
  git_ssh_wrapper "/tmp/api_git_wrapper.sh"
  rollback_on_error false
  keep_releases 5
  symlink_before_migrate.clear
  purge_before_symlink purge_dirs
  create_dirs_before_symlink []
  symlinks({})
  action :deploy
end
So as I understand it, deploy_branch is trying to fetch a git repo and for some reason failing. I've checked my GitHub repository for the source files, and I can see an SSH 'Deploy Key' which shows as used within the last week.
If anyone could give me any suggestions as to what else to try, it would be much appreciated!
I found an answer to this - I thought the issue was a permission error accessing the git repository, but actually it was because the destination folder on my instance had had its ownership modified. Setting the ownership back to what the Chef recipe specifies, using chown, allowed the setup to complete successfully.
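For anyone hitting the same thing, the fix was along these lines (a sketch: the user, group, and path stand in for the recipe's userName, groupName, and server_path values):

# Restore the deploy target's ownership to what the recipe expects
sudo chown -R deploy_user:deploy_group /srv/api

# Confirm the deploy user can write inside it again
sudo -u deploy_user touch /srv/api/.write_test && sudo rm /srv/api/.write_test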

"puppet agent --test" on client machine aren't getting manifest from the Puppet master server

Issue
So I have two AWS instances: a Puppet master and a Puppet client. When I run sudo puppet agent --test on the client, the tasks defined in my master's manifest aren't applied to the client instance.
Where I am right now
puppetmaster is installed on the master instance
puppet is installed on client instance
Master just finished signing my client's certificate. No errors were displayed
Master has a /etc/puppet/manifests/site.pp
Client's puppet.conf file has a server=dns_of_master line
My Puppet version is 5.4.0. I'm using the default manifest configuration.
Here's the guide that I'm following: https://www.digitalocean.com/community/tutorials/getting-started-with-puppet-code-manifests-and-modules. The only changes are the site.pp content and that I'm using AWS.
If it helps, here's my AWS instances' AMI: ami-06d51e91cea0dac8d
Details
Here's the content on my master's /etc/puppet/manifests/site.pp:
node default {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    require => Package['nginx'],
  }

  file { '/tmp/hello_world':
    ensure  => present,
    content => 'Hello, World!',
  }
}
The file has permissions of 777.
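(One check worth running here, though it isn't from the guide: validate the manifest's syntax on the master.)

sudo puppet parser validate /etc/puppet/manifests/site.pp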
Here's the output when I run sudo puppet agent --test. This is after I ran sudo puppet agent --enable:
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Caching catalog for my_client_dns
Info: Applying configuration version '1578968015'
Notice: Applied catalog in 0.02 seconds
I have looked at other StackOverflow posts with this issue. I know that my catalog is not being applied, given the lack of status messages and the very quick run time. Unfortunately, the solutions didn't apply to my case (a further check is sketched after the list below):
My site.pp is named correctly and in the correct file path /etc/puppet/manifests
I didn't touch my master's puppet.conf file
I tried restarting the server with sudo systemctl, but nothing changed
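One more check that might explain the empty catalog (a guess on my part, not from the guide): newer Puppet installs read manifests from a directory environment rather than /etc/puppet/manifests, so it is worth asking the master which paths it actually uses:

# On the master: which main manifest does Puppet consult?
sudo puppet config print manifest --section master

# And where does it look for directory environments?
sudo puppet config print environmentpath --section master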
So I have fixed the issue. The guide that I was following assumed an older version of Ubuntu (16.04, rather than the 18.04 I was using), which needs a different AMI than the one I used to create the instances.

Filebeat and AWS Elasticsearch - Not Working

I have good experience working with Elasticsearch; I have worked with version 2.4 and am now trying to learn the new Elasticsearch.
I am trying to implement Filebeat to send my apache and system logs to my Elasticsearch endpoint. To save time, I launched a t2.medium single-node instance on the AWS Elasticsearch Service under the public domain, and I attached an access policy allowing everyone to access the cluster.
The AWS Elasticsearch instance is up and running healthy.
I launched an Ubuntu (18.04) server, downloaded the Filebeat tar, and made the following configuration in filebeat.yml:
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"
I enabled the required modules:
filebeat modules enable system apache
Then, as per the Filebeat documentation, I changed the ownership of the filebeat.yml file and started Filebeat with the following commands:
sudo chown root filebeat.yml
sudo ./filebeat -e
When I started Filebeat, I faced the following permission and ownership issue:
Error loading config from file '/home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/system.yml', error invalid config: config file ("/home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/system.yml") must be owned by the user identifier (uid=0) or root
To resolve this, I changed the ownership of the files which were throwing errors.
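For reference, the ownership changes looked like this (the apache.yml path is my assumption, mirroring the system.yml path from the error message):

sudo chown root /home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/system.yml
sudo chown root /home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/apache.yml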
When I restarted the Filebeat service, I started facing the following issue:
Connection marked as failed because the onConnect callback failed: cannot retrieve the elasticsearch license: unauthorized access, could not connect to the xpack endpoint, verify your credentials
Going through this link, I found that to work with AWS Elasticsearch I would need the Beats OSS versions.
So I downloaded the OSS version of the beat from this link and followed the same procedure as above, but still no luck. Now I am facing the following errors:
Error 1:
Attempting to reconnect to backoff(elasticsearch(https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443)) with 12 reconnect attempt(s)
Error 2:
Failed to connect to backoff(elasticsearch(https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443)): Connection marked as failed because the onConnect callback failed: 1 error: Error loading pipeline for fileset system/auth: This module requires an Elasticsearch plugin that provides the geoip processor. Please visit the Elasticsearch documentation for instructions on how to install this plugin. Response body: {"error":{"root_cause":[{"type":"parse_exception","reason":"No processor type exists with name [geoip]","header":{"processor_type":"geoip"}}],"type":"parse_exception","reason":"No processor type exists with name [geoip]","header":{"processor_type":"geoip"}},"status":400}
From the second error I can understand that the geoip plugin is not available, which is why I am facing this error.
What else needs to be done to get this working?
Has anyone been able to successfully connect Beats to AWS Elasticsearch?
What other steps could I take to mitigate the above issue?
Environment Details:
AWS Elasticsearch Version : 6.7
File Beat : 7.2.0
First, you need to use the OSS version of Filebeat with AWS ES: https://www.elastic.co/downloads/beats/filebeat-oss
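For example, to fetch the OSS build matching the 7.2.0 version used here (the URL follows Elastic's artifact naming scheme; adjust the version as needed):

curl -LO https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-oss-7.2.0-linux-x86_64.tar.gz
tar xzf filebeat-oss-7.2.0-linux-x86_64.tar.gz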
Second, AWS Elasticsearch does not provide the GeoIP plugin, so you will need to edit the pipelines for any of the default modules you want to use and make sure GeoIP is removed/commented out.
For example, in /usr/share/filebeat/module/system/auth/ingest/pipeline.json (that's the path when installed from the deb package - your path will be different of course), comment out:
{
  "geoip": {
    "field": "source.ip",
    "target_field": "source.geo",
    "ignore_failure": true
  }
},
Repeat the same for the apache module.
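One extra step that may be needed (an assumption on my part, since Filebeat won't overwrite a pipeline it has already loaded): delete the old pipelines from Elasticsearch and re-run setup so the edited definitions get pushed:

# Remove the previously loaded Filebeat pipelines from the cluster
curl -XDELETE "https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443/_ingest/pipeline/filebeat-*"

# Push the edited module pipelines again
./filebeat setup --pipelines --modules system,apache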
I've spent hours trying to make the Filebeat IIS module work with AWS Elasticsearch. I kept getting an ingest-geoip error; the change below fixed the issue.
For Windows IIS logs with AWS Elasticsearch, remove geoip from the Filebeat module configuration in these files:
C:\Program Files (x86)\filebeat\module\iis\access\ingest\default.json
C:\Program Files (x86)\filebeat\module\iis\access\manifest.yml
C:\Program Files (x86)\filebeat\module\iis\error\ingest\default.json
C:\Program Files (x86)\filebeat\module\iis\error\manifest.yml

Cannot access cloudera manager on port 7180

Installing Cloudera Manager on an AWS EC2 instance, following the official instructions:
http://www.cloudera.com/documentation/archive/manager/4-x/4-6-0/Cloudera-Manager-Installation-Guide/cmig_install_on_EC2.html
I successfully ran the .bin package, but when I visit IP:7180 the browser says my access has been denied... Why?
I tried to confirm the status of the CM server with service cloudera-scm-server status. At first it said:
cloudera-scm-server is dead and pid file exists
The log file mentioned "unknown host ip-10-0-0-110", so I added a mapping between ip-10-0-0-110 and the EC2 instance's **public** IP, then restarted the scm-server service. It then ran normally, but IP:7180 remained inaccessible, saying ERR_CONNECTION_REFUSED. I have disabled iptables and turned off my Windows firewall.
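For reference, the mapping I added to /etc/hosts looks like this (203.0.113.10 is a placeholder for the IP I actually used):

203.0.113.10   ip-10-0-0-110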
After a few minutes, cloudera-scm-server is dead and pid file exists appeared again...
Using: tail -40 /var/log/cloudera-scm-server/cloudera-scm-server.out
JAVA_HOME=/usr/lib/jvm/java-7-oracle-cloudera
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000794223000, 319201280, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 319201280 bytes for committing reserved memory.
An error report file with more information is saved as:
/tmp/hs_err_pid5523.log
What type of EC2 instance are you using? The error is pretty descriptive and indicates that CM is unable to allocate memory. Maybe you are using an instance type with too little RAM.
Also - the docs you are referencing are out of date. The latest docs on deploying CDH5 in the cloud can be found here: https://www.cloudera.com/documentation/director/latest/topics/director_get_started_aws.html
These docs also recommend using Cloudera Director which will simplify much of the deployment and configuration of your cluster.
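A quick way to confirm the memory theory from the shell (generic Linux commands, nothing Cloudera-specific; the 4 GB swap size is an arbitrary stopgap if you can't move to a larger instance right away):

# How much RAM does the instance actually have?
free -m

# Stopgap: add a 4 GB swap file so the JVM can start
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile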

Not able to Start/Stop Spark Worker from Remote Machine

I have two machines A and B. I am trying to run Spark Master on machine A and Spark Worker on machine B.
I have set machine B's host name in conf/slaves in my Spark directory.
When I execute start-all.sh to start the master and workers, I get the following message on the console:
abc@abc-vostro:~/spark-scala-2.10$ sudo sh bin/start-all.sh
sudo: /etc/sudoers.d is world writable
starting spark.deploy.master.Master, logging to /home/abc/spark-scala-2.10/bin/../logs/spark-root-spark.deploy.master.Master-1-abc-vostro.out
13/09/11 14:54:29 WARN spark.Utils: Your hostname, abc-vostro resolves to a loopback address: 127.0.1.1; using 1XY.1XY.Y.Y instead (on interface wlan2)
13/09/11 14:54:29 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Master IP: abc-vostro
cd /home/abc/spark-scala-2.10/bin/.. ; /home/abc/spark-scala-2.10/bin/start-slave.sh 1 spark://abc-vostro:7077
xyz@1XX.1XX.X.X's password:
xyz@1XX.1XX.X.X: bash: line 0: cd: /home/abc/spark-scala-2.10/bin/..: No such file or directory
xyz@1XX.1XX.X.X: bash: /home/abc/spark-scala-2.10/bin/start-slave.sh: No such file or directory
The master starts, but the worker fails to start.
I have set xyz@1XX.1XX.X.X in conf/slaves in my Spark directory.
Can anyone help me resolve this? I'm probably missing some configuration on my end.
However, when I run the Spark master and worker on the same machine, it works fine.
Have you copied all of Spark's files to the worker too? You also need to set up passwordless SSH access between the master and the worker.
Here are the steps I would follow:
Set up public key authentication over SSH (sketched below, after this list)
Check /etc/spark/conf.dist/spark-env.sh
scp this to computer B from computer A (the master)
Set conf/slaves with the hostname of computer B
Run ./start-all.sh
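For the first step, the usual passwordless-SSH setup looks like this (a sketch; substitute the real user and hostname for machine B):

# On machine A (the master): generate a key pair if you don't already have one
ssh-keygen -t rsa -b 4096

# Install the public key for the worker account on machine B
ssh-copy-id xyz@machine-b-hostname

# Verify login now works without a password prompt
ssh xyz@machine-b-hostname hostname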
For standalone cluster mode, you may set these options in spark-env.sh.
For example,
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=4G
See the SSH access section in Michael Noll's Hadoop multi-node cluster tutorial; setting up access the same way should solve your problem:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/