Issue with pywebhdfs module - python-2.7

I am trying to use the pywebhdfs module in Python to interact with the Hortonworks Hadoop sandbox. I tried the following three commands:
from pywebhdfs.webhdfs import PyWebHdfsClient
hdfs = PyWebHdfsClient(user_name="root",port=50070,host="localhost")
hdfs.make_dir('/newDirectory')
I get the following error on running the last command:
ConnectionError: ('Connection aborted.', error(10035, 'A non-blocking socket operation could not be completed immediately'))
The sandbox is running and I am able to create directories directly on it using PuTTY. However, it doesn't work through Python.
Can someone help with this error?

I believe 'root' cannot create a directory under the '/' node of HDFS, since the 'root' user is not an HDFS superuser unless, of course, you changed that.
Could you confirm whether you can create '/newDirectory' as the root user? Alternatively, create the directory somewhere root has permissions, or connect as another user.
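As a rough sketch of both workarounds (the 'hdfs' superuser name and the 'user/root/newDirectory' path are assumptions; pywebhdfs examples typically pass paths without a leading slash):
from pywebhdfs.webhdfs import PyWebHdfsClient

# Option 1: connect as the HDFS superuser (commonly 'hdfs') to create a directory directly under '/'
hdfs = PyWebHdfsClient(host='localhost', port='50070', user_name='hdfs')
hdfs.make_dir('newDirectory')

# Option 2: stay as 'root' but create the directory under a path that root owns, e.g. its HDFS home
hdfs = PyWebHdfsClient(host='localhost', port='50070', user_name='root')
hdfs.make_dir('user/root/newDirectory')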


Adding JDBC jar driver to classpath for AWS Elastic Beanstalk job

I have an Elastic Beanstalk application that I'm trying to configure to connect to a FileMaker Pro database over JDBC. The code I'm using is:
import jaydebeapi as jdp
jdbc_driver_location = '/tmp/fmjdbc.jar'
conn = jdp.connect(jdbc_driver_class,
                   jdbc_connection_type + '://' + db_url + '/' + db_name,
                   [user_name, password],
                   jdbc_driver_location)
When I attempt this, I get the following error:
java.sql.SQLException: No suitable driver found for jdbc:filemaker://10.120.120.108/carecord-<class 'jpype._jexception.java.sql.SQLExceptionPyRaisable'>
To try to solve the problem, I've added the JDBC jar to both the /tmp folder of the EC2 instance and the project directory. When I SSH into the EC2 instance and issue the command:
JAVA_HOME=/tmp/fmjdbc.jar
The program runs without issue the next time it's invoked. After a few hours it gives the original error again and needs the above command to be issued once more before it works. To fix this I tried adding the following to .ebextensions, to copy the .jar from the project directory into the /tmp folder and issue the above command on the server from the start:
commands:
  command01:
    command: sudo cp /opt/python/current/app/fmjdbc.jar /tmp/fmjdbc.jar
  command02:
    command: JAVA_HOME=/tmp/fmjdbc.jar
But the project still gives the error. Any thoughts on how I can add this driver to the classpath such that the job will run consistently?
To help folks who have this issue in the future: the answer I found was at the end of this thread.
I appended the following:
if jpype.isJVMStarted() and not jpype.isThreadAttachedToJVM():
    jpype.attachThreadToJVM()
    jpype.java.lang.Thread.currentThread().setContextClassLoader(jpype.java.lang.ClassLoader.getSystemClassLoader())
Just above the
jdbc_driver_location = '/tmp/fmjdbc.jar'
section of my original code above. This allows the application to loop and successfully find the necessary driver.
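Putting the two pieces together, the surrounding code ends up looking roughly like this (a sketch, not the exact original; the get_connection wrapper and its parameters are illustrative):
import jpype
import jaydebeapi as jdp

def get_connection(jdbc_driver_class, jdbc_connection_type, db_url, db_name,
                   user_name, password):
    # Re-attach this thread to the JVM and reset the context class loader so the
    # driver jar stays visible on calls made after the JVM has already started.
    if jpype.isJVMStarted() and not jpype.isThreadAttachedToJVM():
        jpype.attachThreadToJVM()
        jpype.java.lang.Thread.currentThread().setContextClassLoader(
            jpype.java.lang.ClassLoader.getSystemClassLoader())

    jdbc_driver_location = '/tmp/fmjdbc.jar'
    return jdp.connect(jdbc_driver_class,
                       jdbc_connection_type + '://' + db_url + '/' + db_name,
                       [user_name, password],
                       jdbc_driver_location)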
JAVA_HOME is supposed to point to the location where Java is installed on the server. You don't use JAVA_HOME to add libraries to the classpath. You shouldn't have to set any environment variables for your code to work.
The root of your problem is that you are copying the file to /tmp/fmjdbc.jar but you are setting jdbc_driver_location to be /tmp/jdbc.jar. Notice how those file names are different. To fix your code change it to this:
jdbc_driver_location = '/tmp/fmjdbc.jar'

Redis telling me "Failed opening .rdb for saving: Permission denied"

I'm running Redis server 2.8.17 on a Debian server 8.5. I'm using Redis as a session store for a Django 1.8.4 application.
I haven't changed the software configuration on my server for a couple of months and everything was working just fine until a week ago when Django began raising the following error:
MISCONF Redis is configured to save RDB snapshots but is currently not able to persist to disk. Commands that may modify the data set are disabled. Please check Redis logs for details...
I checked the redis log and saw this happening about once a second:
1 changes in 900 seconds. Saving...
Background saving started by pid 22213
Failed opening .rdb for saving: Permission denied
Background saving error
I've read these two SO questions 1, 2 but they haven't helped me find the problem.
ps shows that user "redis" is running the server:
redis 26769 ... /usr/bin/redis-server *:6379
I checked my config file for the redis file name and path:
grep ^dir /etc/redis/redis.conf =>
dir /var/lib/redis
grep ^dbfilename /etc/redis/redis.conf =>
dbfilename dump.rdb
The permissions on /var/lib/redis are 755 and it's owned by redis:redis.
The permissions on /var/lib/redis/dump.rdb are 644 and it's owned by redis:redis too.
I also ran strace on the server process:
ps -C redis-server # pid = 26769
sudo strace -p 26769 -o /tmp/strace.out
But when I examine the output, I don't see any errors. In particular I don't see a "Permission denied" error as I would expect.
Also, /var/lib/redis is not an NFS directory.
Does anyone know what else could be causing this? I'd hate to have to stop using Redis. I know I can run "config set stop-writes-on-bgsave-error no", but that only silences the error and doesn't solve the problem.
This is now happening on a daily basis and the only way I can stop the error is to restart the Redis server.
Thanks.
I just had a similar issue. Despite my config file being correct, when I checked the actual dbfilename and dir values via redis-cli, they were incorrect.
Run redis-cli and then
CONFIG GET dbfilename
which should return something like
1) "dbfilename"
2) "dump.rdb"
1) is just the key and 2) the value. Similarly, running CONFIG GET dir should return something like
1) "dir"
2) "/var/lib/redis"
Confirm that these are correct and if not, set them with CONFIG SET dir /correct/path
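For example, a correction session might look roughly like this (the values shown are illustrative; use whatever your redis.conf actually specifies):
127.0.0.1:6379> CONFIG SET dir /var/lib/redis
OK
127.0.0.1:6379> CONFIG SET dbfilename dump.rdb
OK
127.0.0.1:6379> BGSAVE
Background saving started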
Hope this helps!
If you have moved Redis to a newly mounted volume such as /mnt/data-01:
sudo vim /etc/systemd/system/redis.service
Set ReadWriteDirectories=-/mnt/data-01
sudo mkdir /mnt/data-01/redis
Set chown and chmod on the new Redis data dir and rdb file (a sketch follows below).
The permissions on /var/lib/redis are 755 and it's owned by redis:redis
The permissions on /var/lib/redis/dump.rdb are 644 and it's owned by redis:redis
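A sketch of those steps (the redis.service unit name and the redis:redis owner are assumptions; adjust to your install):
sudo mkdir -p /mnt/data-01/redis
sudo chown -R redis:redis /mnt/data-01/redis
sudo chmod 755 /mnt/data-01/redis
# reload systemd after editing the unit file, then restart Redis
sudo systemctl daemon-reload
sudo systemctl restart redis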
Switch configurations while Redis is running:
$ redis-cli
127.0.0.1:6379> CONFIG SET dir /data/tmp
127.0.0.1:6379> CONFIG SET dbfilename temp.rdb
127.0.0.1:6379> BGSAVE
Then tail the Redis log (e.g. /var/log/redis/redis-server.log) to verify the save succeeded.
Start Redis Server in a directory where Redis has write permissions
The answers above will definitely solve your problem, but here's what's actually going on:
The default location for storing the dump.rdb file is ./ (denoting the current directory). You can verify this in your redis.conf file. Therefore, the directory from which you start the redis server is where a dump.rdb file will be created and updated.
Since you say your redis server has been working fine for a while and this just started happening, it seems you have started running the redis server in a directory where redis does not have the correct permissions to create the dump.rdb file.
To make matters worse, redis will probably not allow you to shut down the server until it is able to create the rdb file, to ensure the proper saving of data.
To solve this problem, go into the active redis client environment using redis-cli, update the dir key to point to your project folder or any folder where the redis user has permission to write, and then run BGSAVE to invoke the creation of the dump.rdb file.
CONFIG SET dir "/hardcoded/path/to/your/project/folder"
BGSAVE
(Now, if you need to save the dump.rdb file in the directory that you started the server in, then you will need to change permissions for the directory so that redis can write to it. You can search stackoverflow for how to do that).
You should now be able to shut down the redis server. Note that we hardcoded the path. Hardcoding is rarely a good practice and I highly recommend starting the redis server from your project directory and changing the dir key back to ./.
CONFIG SET dir "./"
BGSAVE
That way when you need redis for another project, the dump file will be created in your current project's directory and not in the hardcoded path's project directory.
You can resolve this problem by going into redis-cli.
Type redis-cli in the terminal.
Then run config set stop-writes-on-bgsave-error no, which resolved my problem.
Hope it resolves your problem too.
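For reference, the exchange looks roughly like this (note it only silences the MISCONF error; the failed background save itself still needs fixing):
$ redis-cli
127.0.0.1:6379> CONFIG SET stop-writes-on-bgsave-error no
OK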
Up to Redis 3.2, it shipped with rather insecure defaults that left the port open to the public. Combined with the CONFIG SET command, anyone could easily change your Redis config from outside. If the error starts appearing after some time, someone has probably changed your config.
On your local machine check that
telnet SERVER_IP REDIS_PORT
is denied. Otherwise check your config; you should have the setting
bind 127.0.0.1
enabled.
Depending on the user that runs Redis, you should also check for any damage an intruder may have done.
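A quick way to check exposure and the bind setting (a sketch; substitute your server's address):
# from a machine outside the server: this should be refused or time out if the port is not public
redis-cli -h SERVER_IP -p 6379 ping

# on the server itself: confirm Redis only listens on localhost
grep '^bind' /etc/redis/redis.conf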

Error starting Spark in EMR 4.0

I created an EMR 4.0 instance in AWS with all available applications, including Spark. I did it manually, through the AWS Console. I started the cluster and SSHed into the master node when it was up. There I ran pyspark. I get the following error when pyspark tries to create the SparkContext:
2015-09-03 19:36:04,195 ERROR Thread-3 spark.SparkContext (Logging.scala:logError(96)) -
Permission denied: user=ec2-user, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
I haven't added any custom applications or bootstrap actions, and I expected everything to work without errors. I'm not sure what's going on. Any suggestions would be greatly appreciated.
Log in as the user "hadoop" (http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-connect-master-node-ssh.html). It has all the proper environment and related settings for working as expected. The error you are receiving is due to logging in as "ec2-user".
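For example (a sketch; substitute your own key file and your cluster's master public DNS name):
ssh -i ~/mykey.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com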
I've been working with Spark on EMR this week, and found a few weird things relating to user permissions and relative paths.
It seems that running Spark from a directory which you don't 'own', as a user, is problematic. In some situations Spark (or some of the underlying Java pieces) want to create files or folders, and they think that pwd - the current directory - is the best place to do that.
Try going to the home directory
cd ~
then running pyspark.

Boto.conf not found

I am running a Flask app on an AWS EC2 server, and have been using boto to access data stored in DynamoDB. After accidentally adding boto.conf to a git commit (and pushing and pulling on the server), I have found that my Python code can no longer locate the boto.conf file. I rolled back the changes with git, but the problem remains.
The python module and boto.conf file exist in the same directory, but when the module calls
boto.config.load_credential_file('boto.conf')
I get the flask error IOError: [Errno 2] No such file or directory: 'boto.conf'.
As per the documentation:
I'm not really sure why you are using boto.config.load_credential_file. In general, boto will pick up its config from a file called either ~/.boto or /etc/boto.cfg.
You can also look at this question from SO, which also answers how to get the configuration for boto: Getting Credentials File in the boto.cfg for Python
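For reference, a minimal credentials file in one of those default locations looks like this (the key values are placeholders):
# ~/.boto (or /etc/boto.cfg)
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY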

Not able to Start/Stop Spark Worker from Remote Machine

I have two machines A and B. I am trying to run Spark Master on machine A and Spark Worker on machine B.
I have set machine B's host name in conf/slaves in my Spark directory.
When I execute start-all.sh to start the master and workers, I get the message below on the console:
abc@abc-vostro:~/spark-scala-2.10$ sudo sh bin/start-all.sh
sudo: /etc/sudoers.d is world writable
starting spark.deploy.master.Master, logging to /home/abc/spark-scala-2.10/bin/../logs/spark-root-spark.deploy.master.Master-1-abc-vostro.out
13/09/11 14:54:29 WARN spark.Utils: Your hostname, abc-vostro resolves to a loopback address: 127.0.1.1; using 1XY.1XY.Y.Y instead (on interface wlan2)
13/09/11 14:54:29 WARN spark.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Master IP: abc-vostro
cd /home/abc/spark-scala-2.10/bin/.. ; /home/abc/spark-scala-2.10/bin/start-slave.sh 1 spark://abc-vostro:7077
xyz@1XX.1XX.X.X's password:
xyz@1XX.1XX.X.X: bash: line 0: cd: /home/abc/spark-scala-2.10/bin/..: No such file or directory
xyz@1XX.1XX.X.X: bash: /home/abc/spark-scala-2.10/bin/start-slave.sh: No such file or directory
The master starts, but the worker fails to start.
I have set xyz@1XX.1XX.X.X in conf/slaves in my Spark directory.
Can anyone help me resolve this? I'm probably missing some configuration on my end.
However, when I run the Spark master and worker on the same machine, everything works fine.
Have you copied all of Spark's files to the worker too? You also need to set up passwordless SSH access between the master and the worker.
Here are the steps I would follow:
Set up public key authentication over SSH (see the sketch after this list).
Check /etc/spark/conf.dist/spark-env.sh.
scp this to computer B from computer A (the master).
Set conf/slaves with the hostname of computer B.
./start-all.sh
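A sketch of the passwordless SSH setup, run on machine A as the user that launches Spark (the user and host are placeholders):
# generate a key pair with an empty passphrase if you don't already have one
ssh-keygen -t rsa -P ""
# copy the public key to machine B so the start scripts can SSH without a password
ssh-copy-id xyz@<machine-B-address>
# verify
ssh xyz@<machine-B-address> 'echo ok'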
For standalone cluster mode, you may set these options in spark-env.sh.
For example,
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=4G
See the SSH ACCESS section in Michael Noll's Hadoop multi-node cluster tutorial; setting up SSH the same way will solve your problem:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/