I am trying to install jenkins plugins from AWS S3 bucket.
Code for installing jenkins plugins :
plugin_manager_url="https://github.com/jenkinsci/plugin-installation-manager-tool/releases/download/2.12.3/jenkins-plugin-manager-2.12.3.jar"
jpath="/var/lib/jenkins"
echo "Installing Jenkins Plugin Manager..."
wget -O $${jpath}/jenkins-plugin-manager.jar $${plugin_manager_url}
chown jenkins:jenkins $${jpath}/jenkins-plugin-manager.jar
cd $${jpath}
mkdir pluginsInstalled
aws s3 cp "s3://bucket/folder-with-plugins.zip" .
unzip folder-with-plugins.zip
echo 'Installing Jenkins Plugins...'
cd plugins/
for plugin in *.jpi; do
java -jar $${jpath}/jenkins-plugin-manager.jar --war /usr/share/java/jenkins.war --plugin-download-directory $${jpath}/pluginsInstalled --plugins $(echo $plugin | cut -f 1 -d '.')
done
chown -R jenkins:jenkins $${jpath}/pluginsInstalled
systemctl start jenkins //before installing plugins Jenkins is installed, which is up and running
IN above code snippet, I unzipped s3 bucket folder, where all plugins are inside "plugins/" folder with .jpi extention so I trimmed that extention while
installing plugins and installed plugins will be in "pluginsInstalled" folder
I have DEV and PROD aws accounts. I will build an AMI using EC2 image builder in DEV account and will share/use that AMI in prod for security reasons.
So, the userdata script for installing jenkins and plugins is part of building AMI. When I check EC2 Image builder's Build Instance, I can see userdata is installed propelry.
But, when I check same AMI which is used in PROD, then I cannot see Jenkins Plugins installed.
Jenkins Version : 2.346.2
And the error log for jenkins is,
java.lang.IllegalArgumentException: No hudson.security.AuthorizationStrategy implementation found for folderBased
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$lookupDescriptor$11(HeteroDescribableConfigurator.java:211)
at io.vavr.control.Option.orElse(Option.java:321)
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lookupDescriptor(HeteroDescribableConfigurator.java:210)
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.lambda$configure$3(HeteroDescribableConfigurator.java:84)
at io.vavr.Tuple2.apply(Tuple2.java:238)
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.configure(HeteroDescribableConfigurator.java:83)
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:92)
at io.jenkins.plugins.casc.impl.configurators.HeteroDescribableConfigurator.check(HeteroDescribableConfigurator.java:55)
at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:350)
at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:286)
at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:776)
at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:712)
at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:776)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:761)
at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:637)
at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:306)
at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:298)
Caused: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:109)
Caused: java.lang.Error
at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:115)
at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:185)
at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
at jenkins.model.Jenkins$5.runTask(Jenkins.java:1158)
at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:222)
at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:121)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused: org.jvnet.hudson.reactor.ReactorException
at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:291)
at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
at jenkins.model.Jenkins.executeReactor(Jenkins.java:1193)
at jenkins.model.Jenkins.<init>(Jenkins.java:983)
at hudson.model.Hudson.<init>(Hudson.java:86)
at hudson.model.Hudson.<init>(Hudson.java:82)
at hudson.WebAppMain$3.run(WebAppMain.java:247)
Caused: hudson.util.HudsonFailedToLoad
at hudson.WebAppMain$3.run(WebAppMain.java:264)
When I check jenkins status on PROD where plugins installed AMI is used, somehow jenkins is not able to restart. It gives following error for jenkins status
Aug 18 21:08:40 ip-10-220-74-95.ec2.internal systemd[1]: Starting Jenkins Continuous Integration Server...
Aug 18 21:08:45 ip-10-220-74-95.ec2.internal jenkins[6656]: Exception in thread "Attach Listener" Agent failed to start!
Aug 18 21:08:50 ip-10-220-74-95.ec2.internal jenkins[6656]: WARNING: An illegal reflective access operation has occurred
Aug 18 21:08:50 ip-10-220-74-95.ec2.internal jenkins[6656]: WARNING: Illegal reflective access by org.codehaus.groovy.vmplugin.v7.Java7$...s,int)
Aug 18 21:08:50 ip-10-220-74-95.ec2.internal jenkins[6656]: WARNING: Please consider reporting this to the maintainers of org.codehaus.g...ava7$1
Aug 18 21:08:50 ip-10-220-74-95.ec2.internal jenkins[6656]: WARNING: Use --illegal-access=warn to enable warnings of further illegal ref...ations
Aug 18 21:08:50 ip-10-220-74-95.ec2.internal jenkins[6656]: WARNING: All illegal access operations will be denied in a future release
The issue was,
I was installing plugins using,
java -jar ./jenkins-plugin-manager.jar --war ./jenkins.war --plugin-download-directory <dir> --plugins <plugins_list>
Here, while it was installing plugins with latest jenkins version.
In my case, I updated targeted jenkins version I am using in our project
sudo java -jar ./jenkins-plugin-manager.jar --jenkins-version <JENNKINS_VERSION> --plugin-download-directory <dir> --plugins <plugins_list>
Related
Before I run eb create command, how can I tell Elastic Beanstalk to use a DIFFERENT docker-compose file?
For example, my project directory:
HelloWorldDocker
├──.elasticbeanstalk
│ └──config.yml
├──app/
├──proxy/
└──docker-compose.prod.yml
└──docker-compose.yml
My docker-compose.yml is what I use for local development
My docker-compose.prod.yml is what I want to use for production
Is there a way to define this configuration before running the eb create command from the EB CLI?
Stating the obvious: I realize I could use docker-compose.yml for my production file and a docker-compose.dev.yml for my local development but then running the docker-compose up command becomes more tedious locally (ie: docker-compose -f docker-compose.dev.yml up --build...). Further, I'm mainly interested if this is even possible as I'm learning Elastic Beanstalk, and how I could do it if I wanted to.
EDIT / UPDATE: June 11, 2021
I attempted to rename docker-compose.prod.yml to docker-compose.yml in .ebextensions/docker-settings.config with this:
container_commands:
rename_docker_compose:
command: mv docker-compose.prod.yml docker-compose.yml
>eb deploy:
2021-06-11 16:44:45 ERROR Instance deployment failed.
For details, see 'eb-engine.log'.
2021-06-11 16:44:45 ERROR Instance deployment: Both
'Dockerfile' and 'Dockerrun.aws.json' are missing in your
source bundle. Include at least one of them. The deployment
failed.
In eb-engine.log, I see:
2021/06/11 16:44:45.818876 [ERROR] An error occurred during
execution of command [app-deploy] - [Docker Specific Build
Application]. Stop running the command. Error: Dockerfile and
Dockerrun.aws.json are both missing, abort deployment
Based on my testing, this is due to AWS needing to call /bin/sh -c docker-compose config before getting to the later steps of container_commands.
Edit / Update #2
If I use commands instead of container_commands:
commands:
rename_docker_compose:
command: mv docker-compose.prod.yml docker-compose.yml
cwd: /var/app/staging
it does seem to do the replacement successfully:
2021-06-11 21:40:44,809 P1957 [INFO] Command find_docker_compose_file
2021-06-11 21:40:45,086 P1957 [INFO] -----------------------Command Output-----------------------
2021-06-11 21:40:45,086 P1957 [INFO] ./var/app/staging/docker-compose.prod.yml
2021-06-11 21:40:45,086 P1957 [INFO] ------------------------------------------------------------
2021-06-11 21:40:45,086 P1957 [INFO] Completed successfully.
but I still am hit with:
2021/06/11 21:40:45.192780 [ERROR] An error occurred during
execution of command [app-deploy] - [Docker Specific Build
Application]. Stop running the command. Error: Dockerfile and
Dockerrun.aws.json are both missing, abort deployment
EDIT / UPDATE: June 12, 2021
I'm on a Windows 10 machine. Before running eb deploy command locally, I opened up Git Bash which uses MINGW64 terminal. I cdd to the prebuild directory where build.sh exists. I ran:
chmod +x build.sh
If I do ls -l, it returns:
-rwxr-xr-x 1 Jarad 197121 58 Jun 12 12:31 build.sh*
I think this means the file is executable.
I then committed to git.
I then ran eb deploy.
I am seeing a build.sh: permission denied error in eb-engine.log. Below is an excerpt of the relevant portion.
...
2021/06/12 19:41:38.108528 [INFO] application/zip
2021/06/12 19:41:38.108541 [INFO] app source bundle is zip file ...
2021/06/12 19:41:38.108547 [INFO] extracting /opt/elasticbeanstalk/deployment/app_source_bundle to /var/app/staging/
2021/06/12 19:41:38.108556 [INFO] Running command /bin/sh -c /usr/bin/unzip -q -o /opt/elasticbeanstalk/deployment/app_source_bundle -d /var/app/staging/
2021/06/12 19:41:38.149125 [INFO] finished extracting /opt/elasticbeanstalk/deployment/app_source_bundle to /var/app/staging/ successfully
2021/06/12 19:41:38.149142 [INFO] Executing instruction: RunAppDeployPreBuildHooks
2021/06/12 19:41:38.149190 [INFO] Executing platform hooks in .platform/hooks/prebuild/
2021/06/12 19:41:38.149249 [INFO] Following platform hooks will be executed in order: [build.sh]
2021/06/12 19:41:38.149255 [INFO] Running platform hook: .platform/hooks/prebuild/build.sh
2021/06/12 19:41:38.149457 [ERROR] An error occurred during execution of command [app-deploy] - [RunAppDeployPreBuildHooks]. Stop running the command. Error: Command .platform/hooks/prebuild/build.sh failed with error fork/exec .platform/hooks/prebuild/build.sh: permission denied
2021/06/12 19:41:38.149464 [INFO] Executing cleanup logic
2021/06/12 19:41:38.149572 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Engine execution has encountered an error.","returncode":1,"events":[{"msg":"Instance deployment failed. For details, see 'eb-engine.log'.","timestamp":1623526898,"severity":"ERROR"}]}]}
2021/06/12 19:41:38.149706 [INFO] Platform Engine finished execution on command: app-deploy
...
Any idea why I am getting a permission denied error?
My Conclusion From This Madness
Elastic Beanstalk's EB CLI eb deploy command does not zip files (the app_source_bundle it creates) correctly on Windows machines.
Proof
I was able to recreate Marcin's example by zipping it locally and manually uploading it through the Elastic Beanstalk online interface. When I do that and check the source bundle, it shows that build.sh does have executable permissions (-rwxr-xr-x).
[root#ip-172-31-11-170 deployment]# zipinfo app_source_bundle
Archive: app_source_bundle
Zip file size: 993 bytes, number of entries: 5
drwxr-xr-x 3.0 unx 0 bx stor 21-Jun-13 03:08 .platform/
drwxr-xr-x 3.0 unx 0 bx stor 21-Jun-13 03:08 .platform/hooks/
drwxr-xr-x 3.0 unx 0 bx stor 21-Jun-13 03:08 .platform/hooks/prebuild/
-rwxr-xr-x 3.0 unx 58 tx defN 21-Jun-13 03:09 .platform/hooks/prebuild/build.sh
-rw-r--r-- 3.0 unx 98 tx defN 21-Jun-13 03:08 docker-compose.prod.yml
When I initialize and create using the EB CLI and the exact same files, build.sh does NOT have executable permissions (-rw-rw-rw-).
[ec2-user#ip-172-31-5-39 deployment]$ zipinfo app_source_bundle
Archive: app_source_bundle
Zip file size: 1092 bytes, number of entries: 5
drwxrwxrwx 2.0 fat 0 b- stor 21-Jun-12 20:32 ./
-rw-rw-rw- 2.0 fat 98 b- defN 21-Jun-12 20:08 docker-compose.prod.yml
-rw-rw-rw- 2.0 fat 993 b- defN 21-Jun-12 20:15 myzip.zip
drwxrwxrwx 2.0 fat 0 b- stor 21-Jun-12 20:08 .platform/hooks/prebuild/
-rw-rw-rw- 2.0 fat 58 b- defN 21-Jun-12 20:09 .platform/hooks/prebuild/build.sh
Therefore, I think this is a bug with AWS EB CLI deploy command in regards to how it zips files for Windows users.
You can't do this from command level. But I guess you could write container_commands script to rename your docker-compose file from docker-compose.dev.yml to docker-compose.yml:
You can use the container_commands key to execute commands that affect your application source code. Container commands run after the application and web server have been set up and the application version archive has been extracted, but before the application version is deployed.
UPDATE 12 Jun 2021
I tried to replicate the issue using simplified setup with just docker-compose.prod.yml and Docker running on 64bit Amazon Linux 2 3.4.1 EB platform.
docker-compose.prod.yml
version: "3"
services:
client:
image: nginx
ports:
- 80:80
I can confirm and reproduce the issue with container_commands. So in my tests, the solution was to setup prebuild deployment hook.
So my deployment zip had the structure:
├── docker-compose.prod.yml
└── .platform
└── hooks
└── prebuild
└── build.sh
where
build.sh
#!/bin/bash
mv docker-compose.prod.yml docker-compose.yml
I also made the build.sh executable before creating deployment zip.
app_source_bundle permissions (zipinfo -l)
Zip file size: 1008 bytes, number of entries: 5
drwxr-xr-x 3.0 unx 0 bx 0 stor 21-Jun-12 07:37 .platform/
drwxr-xr-x 3.0 unx 0 bx 0 stor 21-Jun-12 07:37 .platform/hooks/
drwxr-xr-x 3.0 unx 0 bx 0 stor 21-Jun-12 07:38 .platform/hooks/prebuild/
-rwxr-xr-x 3.0 unx 77 tx 64 defN 21-Jun-12 07:24 .platform/hooks/prebuild/build.sh
-rw-r--r-- 3.0 unx 92 tx 68 defN 21-Jun-12 07:01 docker-compose.prod.ym
I was able to circumvent this annoying bug by:
Using git and AWS CodeCommit
Running git add --chmod=+x .platform/hooks/prebuild/build.sh
This circumvents the Windows-related issue because:
When you configure CodeCommit with your EB CLI repository, the EB CLI
uses the contents of the repository to create source bundles. When you
run eb deploy or eb create, the EB CLI pushes new commits and uses the
HEAD revision of your branch to create the archive that it deploys to
the EC2 instances in your environment.
Source: Deploying from your CodeCommit repository
I’m trying to deploy to GCP Compute Engine by following this tutorial
https://cloud.google.com/community/tutorials/elixir-phoenix-on-google-compute-engine
Unable to connect to provided external IP after create firewall-rules
There are no errors in following the tutorial. But cannot connect to http://${external_ip}:8080 after creating firewall rules
Build release is already in Google Cloud Storage
I copy hello
gsutil cp _build/prod/rel/hello/bin/hello\
gs://${BUCKET_NAME}/hello-release
instead of hello.run
gsutil cp _build/prod/rel/hello/bin/hello.run \
gs://${BUCKET_NAME}/hello-release
My instance-startup.sh
#!/bin/sh
set -ex
export HOME=/app
mkdir -p ${HOME}
cd ${HOME}
RELEASE_URL=$(curl \
-s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/release-url" \
-H "Metadata-Flavor: Google")
gsutil cp ${RELEASE_URL} hello-release
chmod 755 hello-release
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 \
-O cloud_sql_proxy
chmod +x cloud_sql_proxy
mkdir /tmp/cloudsql
PROJECT_ID=$(curl \
-s "http://metadata.google.internal/computeMetadata/v1/project/project-id" \
-H "Metadata-Flavor: Google")
./cloud_sql_proxy -projects=${PROJECT_ID} -dir=/tmp/cloudsql &
PORT=8080 ./hello-release start
gcloud compute instances get-serial-port-output shows
...
Feb 23 18:02:35 hello-instance startup-script: INFO startup-script: + PORT=8080 ./hello-release start
Feb 23 18:02:35 hello-instance startup-script: INFO startup-script: + ./cloud_sql_proxy -projects= hello -dir=/tmp/cloudsql
Feb 23 18:02:35 hello-instance startup-script: INFO startup-script: 2019/02/23 18:02:35 Rlimits for file descriptors set to {&{8500 8500}}
Feb 23 18:02:35 hello-instance startup-script: INFO startup-script: ./hello-release: 31: exec: /app/hello_rc_exec.sh: not found
Feb 23 18:02:39 hello-instance startup-script: INFO startup-script: 2019/02/23 18:02:39 Listening on /tmp/cloudsql/hello:asia-east1:hello-db/.s.PGSQL.5432 for hello:asia-east1: hello-db
Feb 23 18:02:39 hello-instance startup-script: INFO startup-script: 2019/02/23 18:02:39 Ready for new connections
Feb 23 18:08:08 hello-instance ntpd[656]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
hello_rc_exec.sh is generated after initialize Distillery. It is stored in _build/prod/rel/hello/bin/hello_rc_exec.sh
firewall rules
NAME NETWORK DIRECTION PRIORITY ALLOW DENY DISABLED
default-allow-http-8080 default INGRESS 1000 tcp:8080 False
...
I also run in ps aux | grep erl in the instance
hello_team#hello-instance:~$ ps aux | grep erl
hello_t+ 23166 0.0 0.0 12784 1032 pts/0 S+ 08:04 0:00 grep erl
Im not sure what information is needed to fix this
Please do ask for information and I will provide them.
Thank you
For posterity, here was the solution (worked out in this forum thread).
First, the poster had uploaded the hello file instead of hello.run to cloud storage. The tutorial intentionally specifies uploading hello.run because it is a full executable archive of the entire release, whereas hello is merely a wrapper script and is by itself not capable of executing the app. So this modification to the procedure needed to be reverted.
Second, the poster's app included the elixir_bcrypt library. This library includes a NIF whose platform-specific binary code is built in the deps directory (instead of the _build directory). The tutorial's procedure doesn't properly clean out binaries in deps prior to cross-compiling for deployment, and so the poster's macOS-built bcrypt library was leaking into the build. When deployed to compute engine on Debian, this crashed on initialization. The poster fixed this problem by deleting the deps directory and re-installing dependencies while cross-compiling.
It was also noted during the discussion that the tutorial promoted a poor practice of mounting the user's app in a volume when doing a Docker cross-compilation. Instead, it should simply copy the app into the image, perform the entire build there, and use docker cp to extract the built artifact. This practice would have prevented this issue. A work item was filed to modify the tutorial accordingly.
The solution is here.
Thank you for the help everyone!
Because I want to install a new clear version of Hyperledger Fabric, I deleted old Hyperledger file of one month ago, and run "vagrant destroy".
I run "vagrant up", and "vagrant ssh" successfully.
I "make peer" successfully, when I run "peer", if failed.
When I run "make peer" and "peer" again, the error is pop up as below:
vagrant#ubuntu-1404:/opt/gopath/src/github.com/hyperledger/fabric$ make peer
make: Nothing to be done for `peer'.
vagrant#ubuntu-1404:/opt/gopath/src/github.com/hyperledger/fabric$ peer
No command 'peer' found, did you mean:
Command 'pee' from package 'moreutils' (universe)
Command 'beer' from package 'gerstensaft' (universe)
Command 'peel' from package 'ears' (universe)
Command 'pear' from package 'php-pear' (main)
peer: command not found
vagrant#ubuntu-1404:/opt/gopath/src/github.com/hyperledger/fabric$
vagrant#ubuntu-1404:/opt/gopath/src/github.com/hyperledger/fabric$ cd peer
vagrant#ubuntu-1404:/opt/gopath/src/github.com/hyperledger/fabric/peer$ ls -l
total 60
drwxr-xr-x 1 vagrant vagrant 204 Jun 26 01:16 bin
-rw-r--r-- 1 vagrant vagrant 17342 Jun 25 14:18 core.yaml
-rw-r--r-- 1 vagrant vagrant 35971 Jun 25 14:18 main.go
-rw-r--r-- 1 vagrant vagrant 1137 Jun 23 08:46 main_test.go
The binary peer file's location is ./build/bin/ folder.
For your configuration the full path is "/opt/gopath/src/github.com/hyperledger/fabric/build/bin/"
Let me tell you one thing I observed when I pulled code from gitHub last week, [Thursday to be exact].
Make command had created the executable in "/opt/gopath/src/github.com/hyperledger/fabric/build/bin/". But one pretty thing which I found was, it had copied the same to "/hyperledger/build/bin". And the $PATH variable now included "/hyperledger/build/bin" also.
So to answer your question, you have two options :-
1. one retain your current version of code & Navigate into the bin folder in the fabric directory and see whether peer executable is present there. ? If yes, then execute the rest of the code.
2. Pull the latest copy from gitHub.com and make peer from fabric directory as usual. But execute peer from anywhere. :)
I installed cloudera vm and started trying some basic stuff. First I just wanted to ls the hdfs directoires. so I issued the below command.
[cloudera#quickstart ~]$ hadoop fs -ls /
ls: Failed on local exception: java.net.SocketException: Network is unreachable; Host Details : local host is: "quickstart.cloudera/10.0.2.15"; destination host is: "quickstart.cloudera":8020;
though ps -fu hdfs says both namenode and data node is running. I checked the status using the service command.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Thinking all the problems will be resolved if I restart all the services, I executed the below command.
[cloudera#quickstart conf]$ sudo /home/cloudera/cloudera-manager --express --force
[QuickStart] Shutting down CDH services via init scripts...
[QuickStart] Disabling CDH services on boot...
[QuickStart] Starting Cloudera Manager daemons...
[QuickStart] Waiting for Cloudera Manager API...
[QuickStart] Configuring deployment...
Submitted jobs: 92
[QuickStart] Deploying client configuration...
Submitted jobs: 93
[QuickStart] Starting Cloudera Management Service...
Submitted jobs: 101
[QuickStart] Enabling Cloudera Manager daemons on boot...
Now I thought all services will be up so again checked the status of namenode service. Again it came failed.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
Now I decided to manually stop and start the namenode service. Again not much use.
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode stop
no namenode to stop
Stopped Hadoop namenode: [ OK ]
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode status
Hadoop namenode is not running [FAILED]
[cloudera#quickstart ~]$ sudo service hadoop-hdfs-namenode start
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Failed to start Hadoop namenode. Return value: 1 [FAILED]
I checked the file /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out . It just said below
log4j:ERROR Could not find value for key log4j.appender.RFA
log4j:ERROR Could not instantiate appender named "RFA".
I also checked /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-quickstart.cloudera.log.out . Found below when I searched for error. Can anyone please suggest me what is the best way to get the services back on track. Unfortunately I am not able to access cloudera manager from browser. Anything that I can do from command line?
2016-02-24 21:02:48,105 WARN com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={ROLE_TYPE=[NAMENODE], CATEGORY=[LOG_MESSAGE], ROLE=[hdfs-NAMENODE], SEVERITY=[IMPORTANT], SERVICE=[hdfs], HOST_IDS=[quickstart.cloudera], SERVICE_TYPE=[HDFS], LOG_LEVEL=[WARN], HOSTS=[quickstart.cloudera], EVENTCODE=[EV_LOG_EVENT]}, content=Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!, timestamp=1456295437905} - 1 of 17 failure(s) in last 79302s
java.io.IOException: Error connecting to quickstart.cloudera/10.0.2.15:7184
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
at com.cloudera.cmf.event.shaded.org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:133)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.checkSpecificRequestor(AvroEventStorePublishProxy.java:122)
at com.cloudera.cmf.event.publish.AvroEventStorePublishProxy.publishEvent(AvroEventStorePublishProxy.java:196)
at com.cloudera.cmf.event.publish.EventStorePublisherWithRetry$PublishEventTask.run(EventStorePublisherWithRetry.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketException: Network is unreachable
You can try this:
check witch process is using the port 7184 of namenode (i.e netstat linux command)
and kill that and then restart
Or
change you namenode port from conf and restart hadoop
I'm trying to run the spark shell on my Hadoop cluster via Yarn.
I use
Hadoop 2.4.1
Spark 1.0.0
My Hadoop cluster already works. In order to use Spark, I built Spark as described here :
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -DskipTests clean package
The compilation works fine, and I can run spark-shell without troubles. However, running it on yarn :
spark-shell --master yarn-client
gets me the following error :
14/07/07 11:30:32 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1404725422955
yarnAppState: ACCEPTED
14/07/07 11:30:33 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: -1
appStartTime: 1404725422955
yarnAppState: FAILED
org.apache.spark.SparkException: Yarn application already ended,might be killed or not able to launch application master
.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApp(YarnClientSchedulerBackend.scala:105
)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:82)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:136)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:318)
at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:957)
at $iwC$$iwC.<init>(<console>:8)
at $iwC.<init>(<console>:14)
at <init>(<console>:16)
at .<init>(<console>:20)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:121)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:263)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:120)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:913)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:142)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:104)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:56)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:930)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Spark manages to communicate with my cluster, but it doesn't work out.
Another interesting thing is that I can access my cluster using pyspark --master yarn. However, I get the following warning
14/07/07 14:10:11 WARN cluster.YarnClientClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
and an infinite computation time when doing something as simple as
sc.wholeTextFiles('hdfs://vm7x64.fr/').collect()
What may be causing this problem ?
Please check does your Hadoop cluster is running correctly.
On the master node next YARN process must be running:
$ jps
24970 ResourceManager
On slave nodes/executors:
$ jps
14389 NodeManager
Also make sure that you created a reference (or copied those files) to Hadoop configuration in Spark config directory :
$ ll /spark/conf/ | grep site
lrwxrwxrwx 1 hadoop hadoop 33 Jun 8 18:13 core-site.xml -> /hadoop/etc/hadoop/core-site.xml
lrwxrwxrwx 1 hadoop hadoop 33 Jun 8 18:13 hdfs-site.xml -> /hadoop/etc/hadoop/hdfs-site.xml
You also can check ResourceManager Web UI on port 8088 - http://master:8088/cluster/nodes. There must be a list of available nodes and resources.
You must take a look at your log files using next command (application ID you can find in Web UI):
$ yarn logs -applicationId <yourApplicationId>
Or you can look directly to entire log files on Master/ResourceManager host:
$ ll /hadoop/logs/ | grep resourcemanager
-rw-rw-r-- 1 hadoop hadoop 368414 Jun 12 18:12 yarn-hadoop-resourcemanager-master.log
-rw-rw-r-- 1 hadoop hadoop 2632 Jun 12 17:52 yarn-hadoop-resourcemanager-master.out
And on Slave/NodeManager hosts:
$ ll /hadoop/logs/ | grep nodemanager
-rw-rw-r-- 1 hadoop hadoop 284134 Jun 12 18:12 yarn-hadoop-nodemanager-slave.log
-rw-rw-r-- 1 hadoop hadoop 702 Jun 9 14:47 yarn-hadoop-nodemanager-slave.out
Also check if all environment variables are correct:
HADOOP_CONF_LIB_NATIVE_DIR=/hadoop/lib/native
HADOOP_MAPRED_HOME=/hadoop
HADOOP_COMMON_HOME=/hadoop
HADOOP_HDFS_HOME=/hadoop
YARN_HOME=/hadoop
HADOOP_INSTALL=/hadoop
HADOOP_CONF_DIR=/hadoop/etc/hadoop
YARN_CONF_DIR=/hadoop/etc/hadoop
SPARK_HOME=/spark