Puppet agents aren't applying changes from PuppetMaster - amazon-web-services

We have a deployment in AWS with a single PuppetMaster box that services hundreds of other servers within the AWS ecosystem. Starting yesterday, we noticed that Puppet changes were not being applied by the agents. At first we thought it was only newly provisioned boxes, but now we see that changes aren't applying anywhere, and we get no error message on any of the machines where the puppet agent runs.
# puppet agent --test --verbose
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for blarg-follower-0e5385bace7e84fe2
Info: Applying configuration version '1529498155'
Notice: Finished catalog run in 0.24 seconds
I have access to the PuppetMaster and have validated that the code there is up to date. I need help figuring out how to get better logging out of this and how to debug what is wrong between the agent and the Puppet master.

In this case the issue was that our Puppet Master's /etc/puppet/puppet.conf file had been modified, and indeed the agents weren't getting the full catalog from Puppet Master. We found a backup copy of the file, restored it, and we were back in business.
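For anyone hitting a similar silent no-op, a few commands can help narrow down where the catalog goes wrong (a sketch, not specific to this setup; the flags are standard Puppet CLI, and the site.pp path assumes a stock Puppet 3 layout):
# On an agent: full debug output for a single run, without applying changes
puppet agent --test --debug --noop
# Confirm which master the agent is actually talking to
puppet config print server --section agent
# On the master: dump the effective config and sanity-check the manifests
puppet config print --section master
puppet parser validate /etc/puppet/manifests/site.pp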

Related

Neo4J online backup error on AWS - Failed to run a backup using the available strategies

I'm testing neo4j enterprise 3.3.3 on AWS and trying to run an online backup on a db, which is located on a different server.
I run on my AWS instance:
neo4j-admin backup --backup-dir=~/backup --name=graph.db-backup --from=0.0.0.0:4444
where I replace 0.0.0.0 with the public IP of the external Neo4j server and 4444 with my port.
But then I get this error:
Failed to load private key: /var/lib/neo4j/certificates/neo4j.key
UPDATE
I fixed that by running the command with sudo (on Amazon AWS).
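For reference, the working form was simply the same command prefixed with sudo; an absolute backup directory (the path below is illustrative) also avoids any surprises with how ~ expands:
sudo neo4j-admin backup --backup-dir=/home/ubuntu/backup --name=graph.db-backup --from=0.0.0.0:4444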
However, now I'm getting another error:
Failed to run a backup using the available strategies.
The documentation on backups says that you only need to uncomment some settings in neo4j.conf, which is what I've done, both on the server being backed up and on the one actually running the backup.
Could the issue be that on AWS you have to run commands with
systemctl
And if so, how do I run neo4j-admin with it?
It doesn't work if I use
systemctl neo4j-admin ...
Somebody from Neo4j, can you please help? Backup is one of the main reasons to get the Enterprise version, but there is not enough documentation on how to use it.
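For anyone else wiring this up: the settings the backup documentation refers to are the dbms.backup.* lines in neo4j.conf on the server being backed up. For Neo4j 3.x they look something like this (the port shown is the default; the --from address must then point at this host and port):
# neo4j.conf on the server being backed up
dbms.backup.enabled=true
dbms.backup.address=0.0.0.0:6362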

AWS Elastic Beanstalk with docker incorrect version

I'm deploying a Docker image from GitHub to AWS Elastic Beanstalk using Travis. That part goes OK: the deployment exits with 0 and there is a .zip file in the S3 bucket.
The issue is that, since this is my first time using AWS, I created the app using the Sample Application (the actual code is deployed from GitHub), and after the deployment the health status shows as degraded (red exclamation sign) with this message:
ERROR
During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.
If I go to Causes I find this:
Application deployment failed at 2020-05-01T16:01:58Z with exit status 1 and error: Engine execution has encountered an error.
Incorrect application version "travis-e55e05342a8cc16f3f28f8e184735667a9531ffa-1588311901" (deployment 4). Expected version "Sample Application" (deployment 1).
I even deleted the sample application and re-deployed the uploaded version, and got that same error. As you can see in the last message, I've deployed this 3 times already, getting the same result.
Finally, I downloaded the zip file from the S3 bucket and found inside basically the src and public folders, along with all the files in the root folder such as package.json, .gitignore, all the Docker files, etc.
EDIT
I created two separate repos in github to test this.
The first repo is a static page in a Docker container, quite simple. I create an environment in EB and start everything with the sample app. Then I push the changes to GitHub, Travis does its thing and deploys the app to AWS. This works fine and the app's environment is updated with no errors. This is the repo:
https://github.com/rhernandog/docler-static-page-aws
The second repo is a simple React app. Same procedure: create the environment in EB with the sample app, push the code to GitHub, Travis does its thing and deploys to AWS. This fails and I keep getting the same error:
Environment health has transitioned from Info to Degraded. Command failed on all
instances. Incorrect application version found on all instances. Expected version
"Sample Application" (deployment 1). Application update failed 1 second ago and
took 2 minutes.
This is the repo for the react app:
https://github.com/rhernandog/react-docker-awseb
In terms of Docker, everything works fine in my local machine.
EDIT 2
Based on @stefansundin's suggestion I re-deployed the app to EB and checked the logs. I ended up looking at the full logs for more information and found this:
/var/log/cfn-hup.log
2020-05-14 17:07:42,605 [WARNING] Action for aws-eb-command-handler exited with 1, returning FAILURE
The only place where I found an error was in the engine log file:
/var/log/eb-engine.log
2020/05/14 17:07:42.514601 [INFO] Executing instruction: Docker Specific Build Application
2020/05/14 17:07:42.514605 [INFO] start build docker app
2020/05/14 17:07:42.514615 [INFO] fetch image name
2020/05/14 17:07:42.514639 [INFO] authenticate with ECR if the image is in an ECR repo
2020/05/14 17:07:42.514644 [INFO] pull docker image if update is not false in dockerrun.aws.json
2020/05/14 17:07:42.514657 [INFO] Running command /bin/sh -c docker pull node:12-alpine AS builder
2020/05/14 17:07:42.558923 [ERROR] "docker pull" requires exactly 1 argument.
So basically this is complaining about this line in the Dockerfile: FROM node:12-alpine AS builder. You can see the whole file in the repo: https://github.com/rhernandog/react-docker-awseb/blob/master/Dockerfile
The point is: why doesn't this happen on my local machine? And how can I actually get the files from the build stage and copy them to the nginx folder?
That is actually the only error I found in the log files.
I solved the issue here:
AWS Elastic Beanstalk Docker Does not support Multi-Stage Build
It is a stage-naming problem with the multi-stage Dockerfile. Just use an unnamed stage.
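In other words, the Elastic Beanstalk Docker platform chokes on the AS builder stage name. A sketch of the workaround, referencing the build stage by index instead of by name (the commands and paths are illustrative, not taken from the repo above):
# Build stage: no "AS builder" name
FROM node:12-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Final stage: reference the first stage by index instead of by name
FROM nginx:alpine
COPY --from=0 /app/build /usr/share/nginx/html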
I also got a similar error in my node app:
Incorrect application version "travis-e55e05342a8cc16f3f28f8e184735667a9531ffa-1588311901" (deployment 4). Expected version "Sample Application" (deployment 1)
What turned out to be an issue with my build and deployment scripts was corrected (debugged in Jenkins), and the application now deploys successfully to Beanstalk with no error.
Turns out the issue was not with Beanstalk or the app version but with the build mechanism. Something to look into when nothing else works :)
I had the same issue for a Java app in a Docker container.
I tried all the recommendations from this topic and the links in it, and nothing helped.
In the end, the following action helped:
Enable the enhanced health panel: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-enable.html#health-enhanced-enable-console
Go to the enhanced health panel of the desired environment.
Select the instance that crashed due to this "version" issue and click reboot.
Additionally:
In one case, I had to delete all previous application versions (the section on the left panel) and push a new one, and only after that apply the recommendations above.
Also make sure you have sufficient rights to deploy (CodePipeline/deployment).
AWS Docs say that
To solve this issue, start another deployment. You can redeploy a previous version that you know works, or configure your environment to ignore health checks during deployment and redeploy the new version to force the deployment to complete.
You can also identify and terminate the instances that are running the wrong application version. Elastic Beanstalk will launch instances with the correct version to replace any instances that you terminate. Use the EB CLI health command to identify instances that are running the wrong application version.
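With the EB CLI, that looks something like this (standard eb commands, assuming the CLI is set up for your environment):
eb health            # one-shot view of per-instance status and running application version
eb health --refresh  # live-updating view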
Can you try deleting the instances that run your application and starting a fresh install?
Also, you can use CodePipeline to deploy your code to Elastic Beanstalk: use your S3 folder for the source stage, skip the build stage (since your code is built on Travis), and use the deploy stage to install the new app to your Elastic Beanstalk environment. There might be some misconfiguration while installing the new app to your environment.
I suggest you terminate your instances and start new instances. Sorry if I got your question wrong.
I haven't used Docker on Elastic Beanstalk. When my Ruby deployments on Elastic Beanstalk fail, I usually find the problem by requesting the last 100 lines from the logs. Navigating to "Logs" -> "Request Logs" -> "Last 100 Lines" may help you.
If that fails, I SSH in to the instance and look at the logs in /var/log. docker ps and docker logs may also help you.
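As a concrete sketch of that last step (the container ID is whatever docker ps prints; eb ssh assumes the EB CLI and a key pair are set up):
eb ssh                      # or ssh in with your key pair
docker ps                   # list running containers
docker logs <container-id>  # dump a container's output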
While creating a new web server environment, under platform branch select "Docker running on 64bit Amazon Linux" and it will work.

Unable to push to Google Container Registry - Permission issue

I'm having the same problem as Vaclav. I've followed the GCR quick start to the letter, which entailed creating a new project (called gcr-project) and copying the code for a Flask (Python) app.
After building the docker image, I entered the commands:
gcloud auth configure-docker
docker tag quickstart-image gcr.io/gcr-project/quickstart-image:tag1
docker push gcr.io/gcr-project/quickstart-image:tag1
The response was:
unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
So it would be nice to know whether the issue is with the credentials (I'm using the Cloud SDK fine for other projects) or with permissions. The documentation here suggests you need storage-admin rights, but the project already has it; see the screen cap here.
I would appreciate any tips for troubleshooting this, as I was looking forward to using GCR, but this problem is a hard stop for me.
UPDATE:
I tried the same process with the cloud shell
me@cloudshell:~ (gcr-project-XXXXXX)$ docker push gcr.io/gcr-project/quickstart-image:tag1
The push refers to repository [gcr.io/gcr-project/quickstart-image]
4399528b7213: Preparing
1d10b1eeca74: Preparing
75156020d862: Preparing
c5697656a146: Preparing
2a435270de82: Preparing
c35f70b5c25a: Waiting
28e260baaf1b: Waiting
556c5fb0d91b: Waiting
denied: Token exchange failed for project 'gcr-project'. Please enable Google Container Registry API in Cloud Console at https://console.cloud.google.com/apis/api/containerregistry.googleapis.com/overview?project=gcr-project before performing this operation.
me@cloudshell:~ (gcr-project-XXXXXX)$
This prompted me to check the APIs & Services dashboard to confirm the Container Registry API was enabled. It is.
UPDATE 2:
I'm having these problems on a machine running Ubuntu 19.04. Per the comments below, I was able to do a push via the Cloud Shell. So I then went through the same exercise on a MacBook Pro, which worked with no problems.
So I then uninstalled the Cloud SDK per the docs, having used the standard Linux install instructions previously, and re-installed using the Debian/Ubuntu install instructions (version 274.0.1-0)... STILL no go.
When I do a docker pull on the image (because push worked on MBP) I get this error: Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
And when I do a push I get this error: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
So at this stage, given the success on the MBP and the lack thereof on the Linux/Ubuntu machine, the problem is constrained to Linux/Ubuntu installs.
UPDATE 3:
I got onto a separate Ubuntu server, did a clean install with sudo snap install google-cloud-sdk --classic, did everything else per the docs and still had the exact same problem. So I reckon this is a Linux-specific Google Cloud SDK problem.
Is there anyone out there in Ubuntu land who has been able to install and use the Cloud SDK with GCR recently?
I was able to replicate this issue on multiple Ubuntu machines. I tried again after the most recent Cloud SDK update (276.0.0) but had no luck.
In the end I went with the JSON key file authentication described in the docs here as a workaround, which worked fine.
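For reference, the key file workaround boils down to this (key.json is a service account key downloaded from the Cloud Console; the account needs sufficient storage rights on the project):
# Authenticate gcloud with the service account
gcloud auth activate-service-account --key-file=key.json
# Or log Docker in to GCR directly with the key
docker login -u _json_key --password-stdin https://gcr.io < key.json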

AWS EBS Deploy: Update environment operation is complete, but with errors. For more information, see troubleshooting documentation

I've been dealing with this issue since yesterday. Everything was working until 4 PM, and suddenly after that this error keeps coming up.
Creating application version archive "882a".
Uploading: [##################################################] 100% Done...
INFO: Environment update is starting.
INFO: Deploying new version to instance(s).
INFO: Command execution completed on all instances successfully.
INFO: New application version was deployed to running EC2 instances.
ERROR: Update environment operation is complete, but with errors. For more information, see troubleshooting documentation.
ERROR: Update environment operation is complete, but with errors. For more information, see troubleshooting documentation.
No logs have been written after that; there are not even any logs coming into the AWS Management Console. I SSHed into the instance and checked all the logs in /var/log/; they all stop at 4 PM.
So this is possibly a dead end I'm stuck with. I tried rebuilding the environment, but no luck. The forum discusses this, but all the troubleshooting there happens behind the scenes, with no solutions posted.
Any idea on how to resolve this?
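One avenue when the console shows nothing is to pull the event stream and the full log bundle with the EB CLI (standard eb commands; they only help if the instances are still reporting):
eb events     # recent environment events, often more detail than the console shows
eb logs --all # retrieve the complete log bundle from the instances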

WebHCat on Amazon's EMR?

Is it possible or advisable to run WebHCat on an Amazon Elastic MapReduce cluster?
I'm new to this technology and I was wondering whether it is possible to use WebHCat as a REST interface for running Hive queries. The cluster in question is running Hive.
I wasn't able to get it fully working, but WebHCat is actually installed by default on Amazon's EMR instances.
To get it running you have to do the following,
chmod u+x /home/hadoop/hive/hcatalog/bin/hcat
chmod u+x /home/hadoop/hive/hcatalog/sbin/webhcat_server.sh
export TEMPLETON_HOME=/home/hadoop/.versions/hive-0.11.0/hcatalog/
export HCAT_PREFIX=/home/hadoop/.versions/hive-0.11.0/hcatalog/
/home/hadoop/hive/hcatalog/sbin/webhcat_server.sh start
You can then confirm that it's running on port 50111 using curl,
curl -i http://localhost:50111/templeton/v1/status
To hit port 50111 from other machines, you have to open the port in the EMR EC2 security group.
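With the AWS CLI, opening that port looks something like this (the group name shown is the default EMR master group; substitute your own group and CIDR):
aws ec2 authorize-security-group-ingress \
  --group-name ElasticMapReduce-master \
  --protocol tcp --port 50111 \
  --cidr 203.0.113.0/24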
You then have to configure the users you're going to "proxy" when you run queries through HCatalog. I didn't actually save this configuration, but it is outlined in the WebHCat documentation. I wish they had some concrete examples there, but basically I ended up configuring the local 'hadoop' user as the one that runs the queries; not the most secure thing to do, I'm sure, but I was just trying to get it up and running.
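The exact lines aren't preserved here, but the standard Hadoop proxy-user settings live in core-site.xml and look something like this (hadoop is the user being allowed to impersonate others; * is the wide-open, insecure variant):
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>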
Attempting a query then gave me this error,
{"error":"Server IPC version 9 cannot communicate with client version
4"}
The workaround was to switch off the latest EMR image (3.0.4 with Hadoop 2.2.0) and onto a Hadoop 1.0 image (2.4.2 with Hadoop 1.0.3).
I then hit another issue where it couldn't find the Hive jar properly. After struggling with the configuration some more, I decided I had dumped enough time into trying to get this to work and opted to communicate with Hive directly (using RBHive for Ruby and JDBC for the JVM).
To answer my own question: it is possible to run WebHCat on EMR, but it's not documented at all (Googling led me nowhere, which is why I created this question in the first place; it's currently the first hit when you search "WebHCat EMR"), and the WebHCat documentation leaves a lot to be desired. Getting it to work seems like a pain, though my hope is that by writing up the initial steps, someone will come along, take it the rest of the way, and post a complete answer.
I did not test it, but it should be doable.
EMR lets you customize bootstrap actions, i.e. the scripts run when the nodes are started. You can use bootstrap actions to install additional software and to change the configuration of applications on the cluster.
See more details at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html.
I would create a shell script to install WebHCat and test it on a regular EC2 instance first (outside the context of EMR, just to ensure the script is OK).
You can use EC2's user-data property to test your script, typically:
#!/bin/bash
curl http://path_to_your_install_script.sh | sh
Then, once you know the script is working, make it available to the cluster in an S3 bucket and follow these instructions to include it as a custom bootstrap action for your cluster.
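With the AWS CLI, attaching such a script as a bootstrap action looks roughly like this (bucket, script, and instance settings are placeholders):
aws emr create-cluster \
  --name "WebHCat cluster" \
  --ami-version 2.4.2 \
  --instance-type m1.large --instance-count 3 \
  --bootstrap-actions Path=s3://mybucket/install-webhcat.sh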
--Seb