Not able to Start/Stop Spark Worker from Remote Machine - mapreduce

I have two machines, A and B. I am trying to run the Spark master on machine A and a Spark worker on machine B.
I have set machine B's hostname in conf/slaves in my Spark directory.
When I execute start-all.sh to start the master and workers, I get the message below on the console:
abc@abc-vostro:~/spark-scala-2.10$ sudo sh bin/start-all.sh
sudo: /etc/sudoers.d is world writable
starting spark.deploy.master.Master, logging to /home/abc/spark-scala-2.10/bin/../logs/spark-root-spark.deploy.master.Master-1-abc-vostro.out
13/09/11 14:54:29 WARN spark.Utils: Your hostname, abc-vostro resolves to a loopback address: 127.0.1.1; using 1XY.1XY.Y.Y instead (on interface wlan2)
13/09/11 14:54:29 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Master IP: abc-vostro
cd /home/abc/spark-scala-2.10/bin/.. ; /home/abc/spark-scala-2.10/bin/start-slave.sh 1 spark://abc-vostro:7077
xyz@1XX.1XX.X.X's password:
xyz@1XX.1XX.X.X: bash: line 0: cd: /home/abc/spark-scala-2.10/bin/..: No such file or directory
xyz@1XX.1XX.X.X: bash: /home/abc/spark-scala-2.10/bin/start-slave.sh: No such file or directory
The master starts, but the worker fails to start.
I have set xyz@1XX.1XX.X.X in conf/slaves in my Spark directory.
Can anyone help me resolve this? I'm probably missing some configuration on my end.
However, when I run the Spark master and worker on the same machine, it works fine.

Have you copied all of Spark's files to the worker too? You also need to set up passwordless SSH access between the master and the worker.

Here are the steps I would follow (a sketch of the SSH setup follows this list):
Set up public key authentication over SSH
Check /etc/spark/conf.dist/spark-env.sh
scp this to computer B from computer A (the master)
Set conf/slaves with the hostname of computer B
./start-all.sh
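For the first step, here is a minimal passwordless-SSH sketch, run on machine A as the user that launches the Spark scripts; xyz@computer-b is a placeholder for the worker account and host:
ssh-keygen -t rsa              # accept the defaults; creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
ssh-copy-id xyz@computer-b     # appends the public key to machine B's ~/.ssh/authorized_keys
ssh xyz@computer-b             # should now log in without a password prompt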
For standalone cluster mode, you may set these options in spark-env.sh.
For example,
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=4G

See the SSH access section in Michael Noll's Hadoop multi-node cluster setup tutorial; following that approach will solve your problem:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Related

"puppet agent --test" on client machine aren't getting manifest from the Puppet master server

Issue
So I have two AWS instances: a Puppet master and a Puppet client. When I run sudo puppet agent --test on my client, the tasks defined in my master's manifest are not applied to the client instance.
Where I am right now
puppetmaster is installed on the master instance
puppet is installed on client instance
Master just finished signing my client's certificate. No errors were displayed
Master has a /etc/puppet/manifests/site.pp
Client's puppet.conf file has a server=dns_of_master line
My Puppet version is 5.4.0. I'm using the default manifest configuration.
Here's the guide that I'm following: https://www.digitalocean.com/community/tutorials/getting-started-with-puppet-code-manifests-and-modules. The only changes are the site.pp content and that I'm using AWS.
If it helps, here's my AWS instances' AMI: ami-06d51e91cea0dac8d
Details
Here's the content of my master's /etc/puppet/manifests/site.pp:
node default {
  package { 'nginx':
    ensure => installed
  }
  service { 'nginx':
    ensure  => running,
    require => Package['nginx']
  }
  file { '/tmp/hello_world':
    ensure  => present,
    content => 'Hello, World!'
  }
}
The file has permissions of 777.
Here's the output when I run sudo puppet agent --test. This is after I ran sudo puppet agent --enable:
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Caching catalog for my_client_dns
Info: Applying configuration version '1578968015'
Notice: Applied catalog in 0.02 seconds
I have looked at other StackOverflow posts about this issue. I can tell that my catalog is not being applied because of the missing status messages and the very short apply time. Unfortunately, those solutions didn't apply to my case:
My site.pp is named correctly and is in the correct file path, /etc/puppet/manifests (see the sketch below for one way to double-check this)
I didn't touch my master's puppet.conf file
I tried restarting the server with sudo systemctl but nothing happens
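Since the expected manifest location has changed between Puppet versions, one way to double-check the first point is to ask the master where it actually looks for the main manifest. A minimal sketch, run on the master instance (using --section master is an assumption for this Puppet 5 setup):
sudo puppet config print manifest --section master
# compare the printed path with /etc/puppet/manifests/site.pp;
# if they differ, put site.pp where the master actually looks, or adjust the setting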
So I have fixed the issue. The guide I was following required an older version of Ubuntu (16.04, rather than the 18.04 I was using). This needs a different AMI than the one I used to create the instances.

Wildfly 10 restart issue on AWS EC2

I am running my Wildfly 10.1.0 server on Linux on an Amazon EC2 instance. I have written start and stop scripts for the server. Whenever I stop my server and restart it after some time, I get the following exception:
WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "rapid.ear")]) - failure description: "WFLYSRV0137: No deployment content with hash dd66eee901c4bf79dd6659873df918e1b639bc1b is available in the deployment content repository for deployment 'rapid.ear'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
When I remove the entry for that WAR from standalone.xml I am able to restart the server, but I need a more permanent solution.
The start script I have written is:
nohup /data/wildfly-10.1.0.Final/bin/standalone.sh -Djavax.net.ssl.trustStore="/usr/java/jdk1.8.0_121/jre/lib/security/jssecacerts" --server-config=standalone.xml &
And the stop script is:
sh /data/wildfly-10.1.0.Final/bin/jboss-cli.sh --connect command=:shutdown
It may not be quite as efficient in terms of I/O, but if you've got a standalone instance you can just take advantage of the deployment scanner. I have:
<subsystem xmlns="urn:jboss:domain:deployment-scanner:2.0">
    <deployment-scanner name="myapp" path="/home/wildfly/sites/www.mysite.tld" scan-interval="60000" auto-deploy-exploded="true"/>
</subsystem>
in my standalone-full.xml (you may or may not need the "-full" part). I then deploy my webapp to "/home/wildfly/sites/www.mysite.tld" and can update it as needed. The code I show only reads the directory once a minute so it isn't terrible on I/O.
Again, your deployment may be different than mine.
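For what it's worth, a minimal sketch of how deploying through that scanner might look; the build path and the myapp.war directory name are placeholders, and I'm assuming an exploded (unzipped) deployment as in the configuration above:
# first deployment: copy the exploded application into the scanned directory
cp -r /path/to/build/myapp.war /home/wildfly/sites/www.mysite.tld/
# later updates: sync the contents in place; the scanner redeploys on its next pass (every 60s here)
rsync -a --delete /path/to/build/myapp.war/ /home/wildfly/sites/www.mysite.tld/myapp.war/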

Getting files from server on AWS Using Jenkins Build

I have installed Jenkins on my local machine (on premises). My server (Linux) is in the AWS cloud. I need to share logs with developers without giving them access to the server. I want to create a Jenkins job so that by running it they can fetch the logs from the server.
How can I do that? If anyone follows a similar process to get data from the cloud, please help me solve this. Thanks in advance.
Use the SSH Agent plugin to securely set up your private key
Use SCP to copy the log files to the local workspace
Archive those files to the Jenkins job
You could write a pipeline script to do this. Something like:
node("linux") {
    sshagent(credentials: ['deploy-dev']) {
        sh 'scp user@awshostnamehere:/somepath/somelogfile .'
        archive 'somelogfile'
    }
}
Note that this requires you to fill in the blanks. To get this to work you would have to:
Set up an SSH private key credential named deploy-dev
Set up a build agent with the label 'linux', or change that to a label of an agent you do have.

Redis telling me "Failed opening .rdb for saving: Permission denied"

I'm running Redis server 2.8.17 on a Debian server 8.5. I'm using Redis as a session store for a Django 1.8.4 application.
I haven't changed the software configuration on my server for a couple of months and everything was working just fine until a week ago when Django began raising the following error:
MISCONF Redis is configured to save RDB snapshots but is currently not able to persist to disk. Commands that may modify the data set are disabled. Please check Redis logs for details...
I checked the redis log and saw this happening about once a second:
1 changes in 900 seconds. Saving...
Background saving started by pid 22213
Failed opening .rdb for saving: Permission denied
Background saving error
I've read these two SO questions 1, 2 but they haven't helped me find the problem.
ps shows that user "redis" is running the server:
redis 26769 ... /usr/bin/redis-server *.6379
I checked my config file for the redis file name and path:
grep ^dir /etc/redis/redis.conf =>
dir /var/lib/redis
grep ^dbfilename /etc =>
dbfilename dump.rdb
The permissions on /var/lib/redis are 755 and it's owned by redis:redis.
The permissions on /var/lib/redis/dump.rdb are 644 and it's owned by redis:redis too.
I also ran strace on the server process:
ps -C redis-server # pid = 26769
sudo strace -p 26769 -o /tmp/strace.out
But when I examine the output, I don't see any errors. In particular I don't see a "Permission denied" error as I would expect.
Also, /var/lib/redis is not an NFS directory.
Does anyone know what else could be causing this? I'd hate to have to stop using Redis. I know I can run "config set stop-writes-on-bgsave-error no" but that doesn't solve the problem.
This is now happening on a daily basis and the only way I can stop the error is to restart the Redis server.
Thanks.
I just had a similar issue. Despite my config file being correct, when I checked the actual dbfilename and dir in redis-client, they were incorrect.
Run redis-cli and then
CONFIG GET dbfilename
which should return something like:
1) "dbfilename"
2) "dump.rdb"
1) is just the key and 2) the value. Similarly, running CONFIG GET dir should return something like:
1) "dir"
2) "/var/lib/redis"
Confirm that these are correct and if not, set them with CONFIG SET dir /correct/path
Hope this helps!
If you have moved Redis to a new mounted volume, e.g. /mnt/data-01 (a consolidated sketch follows these steps):
sudo vim /etc/systemd/system/redis.service
Set ReadWriteDirectories=-/mnt/data-01
sudo mkdir /mnt/data-01/redis
Set chown and chmod on the new Redis data directory and rdb file.
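A consolidated sketch of those steps; the paths follow this answer, and the systemd unit name (redis here) may be redis-server on some distributions:
sudo mkdir -p /mnt/data-01/redis
sudo chown redis:redis /mnt/data-01/redis
sudo chmod 755 /mnt/data-01/redis
# after adding ReadWriteDirectories=-/mnt/data-01 to /etc/systemd/system/redis.service:
sudo systemctl daemon-reload
sudo systemctl restart redis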
Switch configurations while redis is running
$ redis-cli
127.0.0.1:6379> CONFIG SET dir /data/tmp
127.0.0.1:6379> CONFIG SET dbfilename temp.rdb
127.0.0.1:6379> BGSAVE
tail /var/log/redis/redis.cnf (verify saved)
Start Redis Server in a directory where Redis has write permissions
The answers above will definitely solve your problem, but here's what's actually going on:
The default location for storing the dump.rdb file is ./ (denoting the current directory). You can verify this in your redis.conf file. Therefore, the directory from which you start the redis server is where a dump.rdb file will be created and updated.
Since you say your redis server has been working fine for a while and this just started happening, it seems you have started running the redis server in a directory where redis does not have the correct permissions to create the dump.rdb file.
To make matters worse, redis will also probably not allow you to shut down the server either until it is able to create the rdb file to ensure the proper saving of data.
To solve this problem, you must go into the active redis client environment using redis-cli and update the dir key and set its value to your project folder or any folder where non-root has permissions to save. Then run BGSAVE to invoke the creation of the dump.rdb file.
CONFIG SET dir "/hardcoded/path/to/your/project/folder"
BGSAVE
(Now, if you need to save the dump.rdb file in the directory that you started the server in, then you will need to change permissions for the directory so that redis can write to it. You can search stackoverflow for how to do that).
You should now be able to shut down the redis server. Note that we hardcoded the path. Hardcoding is rarely a good practice, and I highly recommend starting the redis server from your project directory and changing the dir key back to ./.
CONFIG SET dir "./"
BGSAVE
That way when you need redis for another project, the dump file will be created in your current project's directory and not in the hardcoded path's project directory.
You can resolve this problem by going into redis-cli:
Type redis-cli in the terminal
Then run config set stop-writes-on-bgsave-error no; that resolved my problem.
Hope it resolves your problem too.
Up to Redis 3.2, it shipped with pretty insecure defaults which opened the port to the public. In combination with the CONFIG SET instruction, anybody can easily change your Redis config from outside. If the error starts after some time, someone has probably changed your config.
On your local machine check that
telnet SERVER_IP REDIS_PORT
is denied. Otherwise check your config, you should have the setting
bind 127.0.0.1
enabled.
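A quick way to check both of these; the config path follows the question's Debian setup, and SERVER_IP/REDIS_PORT are placeholders:
# on the server: the bind line should point at localhost
grep '^bind' /etc/redis/redis.conf
# from your local machine: this should fail or time out if the port is not exposed
redis-cli -h SERVER_IP -p REDIS_PORT ping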
Depending on the user that runs redis, you should also check for any damage that an intruder may have done.

How do I associate a Vagrant project directory with an existing VirtualBox VM?

Somehow my Vagrant project has disassociated itself from its VirtualBox VM, so that when I vagrant up Vagrant will import the base-box and create a new virtual machine.
Is there a way to re-associate the Vagrant project with the existing VM?
How does Vagrant internally associate a Vagrantfile with a VirtualBox VM directory?
For Vagrant 1.6.3 do the following (a consolidated sketch follows these steps):
1) In the directory where your Vagrantfile is located, run the command
VBoxManage list vms
You will have something like this:
"virtualMachine" {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
2) Go to the following path:
cd .vagrant/machines/default/virtualbox
3) Create a file called id with the ID of your VM xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
4) Save the file and run vagrant up
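A consolidated sketch of those four steps, assuming the default machine name and keeping the placeholder UUID from above:
VBoxManage list vms
# "virtualMachine" {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
mkdir -p .vagrant/machines/default/virtualbox
printf '%s' 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' > .vagrant/machines/default/virtualbox/id
vagrant up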
WARNING: The solution below works for Vagrant 1.0.x but not Vagrant 1.1+.
Vagrant uses the ".vagrant" file in the same directory as your "Vagrantfile" to track the UUID of your VM. This file will not exist if a VM does not exist. The format of the file is JSON. It looks like this if a single VM exists:
{
  "active":{
    "default":"02f8b71c-75c6-4f33-a161-0f46a0665ab6"
  }
}
default is the name of the default virtual machine (if you're not using multi-VM setups).
If your VM has somehow become disassociated, what you can do is do VBoxManage list vms which will list every VM that VirtualBox knows about by its name and UUID. Then manually create a .vagrant file in the same directory as your Vagrantfile and fill in the contents properly.
Run vagrant status to ensure that Vagrant picked up the proper changes.
Note: This is not officially supported by Vagrant and Vagrant may change the format of .vagrant at any time. But this is valid as of Vagrant 0.9.7 and will be valid for Vagrant 1.0.
The solution for later versions is much the same.
But first you need to launch the .vbox file by hand so that it appears in VBoxManage list vms.
Then you can check .vagrant/machines/default/virtualbox/id to verify that the UUID is the right one.
Had the issue today; my .vagrant folder was missing, and I found that there were a few more steps than simply setting the id:
Set the id:
VBoxManage list vms
Find the id and set it in {project-folder}/.vagrant/machines/default/virtualbox/id.
Note that default may be different if it is set in your Vagrantfile, e.g. config.vm.define "someothername".
Stop the machine from provisioning:
Create a file named action_provision in the same dir as the id file; set its contents to 1.5:{id}, replacing {id} with the id found in step 1.
Set up a new public/private key (see the sketch below):
Vagrant uses a private key stored in .vagrant/machines/default/virtualbox/private_key to ssh into the machine. You'll need to generate a new one.
ssh-keygen -t rsa
Name it private_key.
vagrant ssh, then copy private_key.pub into /home/vagrant/.ssh/authorized_keys on the guest.
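A minimal sketch of that key setup, assuming the default machine directory (adjust the path if your machine is named differently in the Vagrantfile):
ssh-keygen -t rsa -N "" -f .vagrant/machines/default/virtualbox/private_key
cat .vagrant/machines/default/virtualbox/private_key.pub
# then run vagrant ssh and append the printed public key to /home/vagrant/.ssh/authorized_keys in the guest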
Update, with the same problem today on Vagrant 1.7.4:
Useful thread at https://github.com/mitchellh/vagrant/issues/1755
and especially the following commands.
For example, to pair the box 'vip-quickstart_default_1431365185830_12124' to Vagrant:
$ VBoxManage list vms
"vip-quickstart_default_1431365185830_12124" {50feafd3-74cd-40b5-a170-3c976348de27}
$ echo -n "50feafd3-74cd-40b5-a170-3c976348de27" > .vagrant/machines/default/virtualbox/id
For multi-VM setups, it would look like this:
{
  "active":{
    "web":"a1fc9ae4-5d43-49cb-be31-ab3c4f74745d",
    "db":"13503bc5-76b8-4c26-95c4-32435b372212"
  }
}
You can get the vm names from the Vagrantfile used to create those VMs. Look for this line:
config.vm.define :web do |web_config|
"web" is the name of the vm in this case.
This is modified from #Petecoop's answer.
Run vagrant halt if you haven't shut down the box yet.
Then list your virtualboxes: VBoxManage list vms
It'll list all of your virtualboxes. Identify the box you want to revert to and grab the id between the curly brackets: {}.
Then edit the project id file: sudo nano .vagrant/machines/default/virtualbox/id (from the project directory)
Replace it with the id you copied from the list of VBs.
Try vagrant reload.
If that doesn't work and it gets hung on SSH authorization (where I stumbled), copy the insecure private key from the Vagrant git repository and replace the content of .vagrant/machines/default/virtualbox/private_key. Back up the original of course: cp private_key private_key-bak.
Then run vagrant reload. It'll say it has identified the insecure key and will create a new one.
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
You should be all set.
I'm using Vagrant 1.8.1 on OS X El Capitan.
My VM was not shut down correctly when my computer restarted, so when I tried vagrant up it always created a new VM. None of the solutions here worked for me, but a variation of ingmmurillo's answer did.
So instead of creating .vagrant/machines/default/virtualbox/id based on the id from running VBoxManage list vms, I had to update the id in .vagrant/machines/local/virtualbox/id.
I've got a one-liner that essentially does this for me:
echo -n `VBoxManage list vms | head -n 1 | awk '{print substr($2, 2, length($2)-2)}'` > .vagrant/machines/local/virtualbox/id
This assumes the first box listed by VBoxManage list vms is the one I need to start.
In Vagrant 1.9.1:
I had a VM in Virtual Box named 'Ubuntu 16.04.1' so I packaged it as a vagrant box with:
vagrant package --base "Ubuntu 16.04.1"
responds with...
==> Ubuntu 16.04.1: Exporting VM...
==> Ubuntu 16.04.1: Compressing package to: blah blah/package.box
I'm on macOS and found that removing the .locks on the boxes solved my problem.
For some reason
vagrant halt
did not remove these locks, and after restoring all my settings in .vagrant/machines/default/virtualbox using Time Machine and removing the locks, the right machine booted up.
Only one minor problem remains: it booted into GRUB, so I had to press Enter once. I don't know if this will persist, but I will find out soon enough.
I'm running Vagrant 1.7.4 and VirtualBox 5.0.2.
For me, deleting the id file worked:
cd yourVagrantProject/.vagrant/machines/default/virtualbox/
rm id