Hadoop command fails with python3 & works with python 2.7 - python-2.7

I have a MacBook Pro and I have installed Hadoop 2.7.3 on it, following this video:
https://www.youtube.com/watch?v=06hpB_Rfv-w
I am trying to run a Hadoop MRJob command via python3, and it is giving me this error:
bhoots21304s-MacBook-Pro:2.7.3 bhoots21304$ python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt
No configs found; falling back on auto-configuration
Looking for hadoop binary in /usr/local/Cellar/hadoop/2.7.3/bin...
Found hadoop binary: /usr/local/Cellar/hadoop/2.7.3/bin/hadoop
Using Hadoop version 2.7.3
Looking for Hadoop streaming jar in /usr/local/Cellar/hadoop/2.7.3...
Found Hadoop streaming jar: /usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar
Creating temp directory /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/Mr_Jobs.bhoots21304.20170328.165022.965610
Copying local files to hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/...
Running step 1 of 1...
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/hadoop-unjar5078580082326840824/] [] /var/folders/53/lvdfwyr52m1gbyf236xv3x1h0000gn/T/streamjob2711596457025539343.jar tmpDir=null
Connecting to ResourceManager at /0.0.0.0:8032
Connecting to ResourceManager at /0.0.0.0:8032
Total input paths to process : 1
number of splits:2
Submitting tokens for job: job_1490719699504_0003
Submitted application application_1490719699504_0003
The url to track the job: http://bhoots21304s-MacBook-Pro.local:8088/proxy/application_1490719699504_0003/
Running job: job_1490719699504_0003
Job job_1490719699504_0003 running in uber mode : false
map 0% reduce 0%
Task Id : attempt_1490719699504_0003_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Task Id : attempt_1490719699504_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Task Id : attempt_1490719699504_0003_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Task Id : attempt_1490719699504_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Task Id : attempt_1490719699504_0003_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Task Id : attempt_1490719699504_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
map 100% reduce 100%
Job job_1490719699504_0003 failed with state FAILED due to: Task failed task_1490719699504_0003_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
Job not successful!
Streaming Command Failed!
Counters: 17
Job Counters
Data-local map tasks=2
Failed map tasks=7
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Other local map tasks=6
Total megabyte-milliseconds taken by all map tasks=18991104
Total megabyte-milliseconds taken by all reduce tasks=0
Total time spent by all map tasks (ms)=18546
Total time spent by all maps in occupied slots (ms)=18546
Total time spent by all reduce tasks (ms)=0
Total time spent by all reduces in occupied slots (ms)=0
Total vcore-milliseconds taken by all map tasks=18546
Total vcore-milliseconds taken by all reduce tasks=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Scanning logs for probable cause of failure...
Looking for history log in hdfs:///tmp/hadoop-yarn/staging...
STDERR: 17/03/28 22:21:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
STDERR: ls: `hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/output/_logs': No such file or directory
STDERR: 17/03/28 22:21:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
STDERR: ls: `hdfs:///tmp/hadoop-yarn/staging/userlogs/application_1490719699504_0003': No such file or directory
STDERR: 17/03/28 22:21:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
STDERR: ls: `hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/output/_logs/userlogs/application_1490719699504_0003': No such file or directory
Probable cause of failure:
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Step 1 of 1 failed: Command '['/usr/local/Cellar/hadoop/2.7.3/bin/hadoop', 'jar', '/usr/local/Cellar/hadoop/2.7.3/libexec/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar', '-files', 'hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/Mr_Jobs.py#Mr_Jobs.py,hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/mrjob.zip#mrjob.zip,hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/files/File.txt', '-output', 'hdfs:///user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.165022.965610/output', '-mapper', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --mapper', '-reducer', 'sh -ex setup-wrapper.sh python3 Mr_Jobs.py --step-num=0 --reducer']' returned non-zero exit status 256.
The problem is that if I run the same command with python2.7, it runs fine and shows me the correct output.
Python 3 is added in my .bash_profile:
export JAVA_HOME=$(/usr/libexec/java_home)
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
# Setting PATH for Python 2.6
PATH="/System/Library/Frameworks/Python.framework/Versions/2.6/bin:${PATH}"
export PATH
# Setting PATH for Python 2.7
PATH="/System/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
# added by Anaconda2 4.2.0 installer
export PATH="/Users/bhoots21304/anaconda/bin:$PATH"
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
export HIVE_HOME=/usr/local/Cellar/hive/2.1.0/libexec
export PATH=$HIVE_HOME:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/libexec/share/hadoop/common
export PATH=$HADOOP_COMMON_LIB_NATIVE_DIR:$PATH
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/libexec/share/hadoop"
export PATH=$HADOOP_OPTS:$PATH
export PYTHONPATH="$PYTHONPATH:/usr/local/Cellar/python3/3.6.1/bin"
# Setting PATH for Python 3.6
# The original version is saved in .bash_profile.pysave
PATH="/usr/local/Cellar/python3/3.6.1/bin:${PATH}"
export PATH
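Note that .bash_profile is read only by login shells; the YARN containers that run the streaming tasks spawn a bare sh, so none of these PATH entries are visible to them. A quick way to reproduce what such a stripped-down shell sees (a hedged check, not part of the original setup):
env -i /bin/sh -c 'command -v python3 || echo "python3: not found"'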
This is my Mr_Jobs.py:
from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner(self, word, counts):
        yield (word, sum(counts))

    def reducer(self, word, counts):
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordFreqCount.run()
I am running it on Hadoop using this command:
python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py -r hadoop /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt
If I run the same file with the above command on my Ubuntu machine, it works; but when I run the same thing on my Mac, it gives me an error.
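As a sanity check (a hedged suggestion, not part of the original post), dropping -r hadoop makes mrjob fall back to its inline runner, which executes the mapper inside the same python3 process you invoked and so bypasses the cluster's PATH entirely:
python3 /Users/bhoots21304/PycharmProjects/untitled/MRJobs/Mr_Jobs.py /Users/bhoots21304/PycharmProjects/untitled/MRJobs/File.txt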
Here are the logs from my Mac machine:
2017-03-28 23:05:51,751 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-28 23:05:51,863 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-03-28 23:05:51,965 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-03-28 23:05:51,965 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-03-28 23:05:51,976 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-03-28 23:05:51,976 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1490719699504_0005, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#209da20d)
2017-03-28 23:05:52,254 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-03-28 23:05:52,632 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-03-28 23:05:52,632 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2017-03-28 23:05:52,632 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
+ __mrjob_PWD=/tmp/nm-local-dir/usercache/bhoots21304/appcache/application_1490719699504_0005/container_1490719699504_0005_01_000010
+ exec
+ python3 -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python3: command not found
2017-03-28 23:05:47,691 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-03-28 23:05:47,802 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-03-28 23:05:47,879 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-03-28 23:05:47,879 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-03-28 23:05:47,889 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-03-28 23:05:47,889 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1490719699504_0005, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier#209da20d)
2017-03-28 23:05:48,079 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-03-28 23:05:48,316 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/nm-local-dir/usercache/bhoots21304/appcache/application_1490719699504_0005
2017-03-28 23:05:48,498 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-03-28 23:05:48,805 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-03-28 23:05:48,810 INFO [main] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
2017-03-28 23:05:48,810 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : null
2017-03-28 23:05:48,908 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://localhost:9000/user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.173517.724664/files/File.txt:0+32
2017-03-28 23:05:48,923 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2017-03-28 23:05:48,983 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2017-03-28 23:05:48,984 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 100
2017-03-28 23:05:48,984 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 83886080
2017-03-28 23:05:48,984 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2017-03-28 23:05:48,984 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2017-03-28 23:05:48,989 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-03-28 23:05:49,001 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/bin/sh, -ex, setup-wrapper.sh, python3, Mr_Jobs.py, --step-num=0, --mapper]
2017-03-28 23:05:49,010 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2017-03-28 23:05:49,010 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
2017-03-28 23:05:49,011 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: job.local.dir is deprecated. Instead, use mapreduce.job.local.dir
2017-03-28 23:05:49,011 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2017-03-28 23:05:49,011 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2017-03-28 23:05:49,011 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2017-03-28 23:05:49,011 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
2017-03-28 23:05:49,012 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2017-03-28 23:05:49,012 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2017-03-28 23:05:49,012 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
2017-03-28 23:05:49,012 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
2017-03-28 23:05:49,012 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2017-03-28 23:05:49,013 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2017-03-28 23:05:49,025 INFO [main] org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2017-03-28 23:05:49,026 INFO [Thread-14] org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2017-03-28 23:05:49,027 INFO [main] org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-03-28 23:05:49,028 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2017-03-28 23:05:49,031 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2017-03-28 23:05:49,035 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://localhost:9000/user/bhoots21304/tmp/mrjob/Mr_Jobs.bhoots21304.20170328.173517.724664/output/_temporary/1/_temporary/attempt_1490719699504_0005_m_000000_2
2017-03-28 23:05:49,140 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2017-03-28 23:05:49,141 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2017-03-28 23:05:49,141 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
Mar 28, 2017 11:05:33 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Mar 28, 2017 11:05:33 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Mar 28, 2017 11:05:33 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Mar 28, 2017 11:05:33 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Mar 28, 2017 11:05:33 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Mar 28, 2017 11:05:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Mar 28, 2017 11:05:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The key line in the logs above is "setup-wrapper.sh: line 6: python3: command not found": the containers that run your tasks do not read your .bash_profile, so mrjob has to be told explicitly where python3 lives. Simply create a ~/.mrjob.conf file with this content:
runners:
  hadoop:
    python_bin: /usr/local/bin/python3
    hadoop_bin: /usr/local/opt/hadoop/bin/hadoop
    hadoop_streaming_jar: /usr/local/opt/hadoop/libexec/share/hadoop/tools/lib/hadoop-streaming-*.jar
Then run your program with this command:
python3 your_program.py -r hadoop input.txt
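The config file works because mrjob passes python_bin down to the task containers, which otherwise spawn a bare sh that knows nothing about your .bash_profile. If you'd rather not create a config file, the same override can be passed per run with mrjob's --python-bin switch (a hedged alternative; adjust the path to whatever which python3 prints, and note that python3 must exist at that path on every node of the cluster):
python3 your_program.py -r hadoop --python-bin /usr/local/bin/python3 input.txt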

Related

EMR Core nodes are not taking up map reduce jobs

I have a 2-node EMR (version 4.6.0) cluster (1 master (m4.large), 1 core (r4.xlarge)) with HBase installed. I'm using the default EMR configurations. I want to export HBase tables using:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.include.deleted.rows=true Table_Name hdfs:/full_backup/Table_Name 1
I'm getting the following output:
2022-04-04 11:29:20,626 INFO [main] util.RegionSizeCalculator: Calculating region sizes for table "Table_Name".
2022-04-04 11:29:20,900 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2022-04-04 11:29:20,900 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x17ff27095680070
2022-04-04 11:29:20,903 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x17ff27095680070
2022-04-04 11:29:20,904 INFO [main] zookeeper.ZooKeeper: Session: 0x17ff27095680070 closed
2022-04-04 11:29:20,980 INFO [main] mapreduce.JobSubmitter: number of splits:1
2022-04-04 11:29:20,994 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2022-04-04 11:29:21,192 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1649071534731_0002
2022-04-04 11:29:21,424 INFO [main] impl.YarnClientImpl: Submitted application application_1649071534731_0002
2022-04-04 11:29:21,454 INFO [main] mapreduce.Job: The url to track the job: http://ip-10-0-2-244.eu-west-1.compute.internal:20888/proxy/application_1649071534731_0002/
2022-04-04 11:29:21,455 INFO [main] mapreduce.Job: Running job: job_1649071534731_0002
2022-04-04 11:29:28,541 INFO [main] mapreduce.Job: Job job_1649071534731_0002 running in uber mode : false
2022-04-04 11:29:28,542 INFO [main] mapreduce.Job: map 0% reduce 0%
The job is stuck at this progress and does not move. However, when I add a task node and rerun the same command, it finishes within seconds.
Based on the documentation (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html), the core node itself should handle tasks as well. What could be going wrong?

What should I do to fix Sqoop if I am getting a java.lang.NoClassDefFoundError exception during export?

I am attempting to export a Hive database table into a MySQL database table on an Amazon AWS cluster using the command:
sqoop export --connect jdbc:mysql://database_hostname/universities --table 19_20 --username admin -P --export-dir '/final/hive/19_20'
I am trying to export from the folder '/final/hive/19_20', which is the Hive output directory, into the MySQL database 'universities', table '19_20'.
In response I get:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/04/11 01:42:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
21/04/11 01:42:18 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/04/11 01:42:18 INFO tool.CodeGenTool: Beginning code generation
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
/tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java:37: warning: Can't initialize javac processor due to (most likely) a class loader problem: java.lang.NoClassDefFoundError: com/sun/tools/javac/processing/JavacProcessingEnvironment
public class _19_20 extends SqoopRecord implements DBWritable, Writable {
^
at lombok.javac.apt.LombokProcessor.getJavacProcessingEnvironment(LombokProcessor.java:411)
at lombok.javac.apt.LombokProcessor.init(LombokProcessor.java:91)
at lombok.core.AnnotationProcessor$JavacDescriptor.want(AnnotationProcessor.java:124)
at lombok.core.AnnotationProcessor.init(AnnotationProcessor.java:177)
at lombok.launch.AnnotationProcessorHider$AnnotationProcessor.init(AnnotationProcessor.java:73)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$ProcessorState.<init>(JavacProcessingEnvironment.java:508)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$DiscoveredProcessors$ProcessorStateIterator.next(JavacProcessingEnvironment.java:605)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.discoverAndRunProcs(JavacProcessingEnvironment.java:698)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.access$1800(JavacProcessingEnvironment.java:91)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.run(JavacProcessingEnvironment.java:1043)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.doProcessing(JavacProcessingEnvironment.java:1184)
at com.sun.tools.javac.main.JavaCompiler.processAnnotations(JavaCompiler.java:1170)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:856)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:224)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:63)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.processing.JavacProcessingEnvironment
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at lombok.launch.ShadowClassLoader.loadClass(ShadowClassLoader.java:530)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 26 more
Note: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning
21/04/11 01:42:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/19_20.jar
21/04/11 01:42:24 INFO mapreduce.ExportJobBase: Beginning export of 19_20
21/04/11 01:42:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
21/04/11 01:42:26 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-179.ec2.internal/172.31.6.179:8032
21/04/11 01:42:26 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-6-179.ec2.internal/172.31.6.179:10200
21/04/11 01:42:28 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
21/04/11 01:42:29 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 3fb854bbfdabadafad1fa2cca072658fa097fd67]
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: number of splits:4
21/04/11 01:42:29 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1618090360850_0017
21/04/11 01:42:29 INFO conf.Configuration: resource-types.xml not found
21/04/11 01:42:29 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/04/11 01:42:29 INFO impl.YarnClientImpl: Submitted application application_1618090360850_0017
21/04/11 01:42:29 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-179.ec2.internal:20888/proxy/application_1618090360850_0017/
21/04/11 01:42:29 INFO mapreduce.Job: Running job: job_1618090360850_0017
21/04/11 01:42:37 INFO mapreduce.Job: Job job_1618090360850_0017 running in uber mode : false
21/04/11 01:42:37 INFO mapreduce.Job: map 0% reduce 0%
21/04/11 01:43:00 INFO mapreduce.Job: map 100% reduce 0%
21/04/11 01:43:01 INFO mapreduce.Job: Job job_1618090360850_0017 failed with state FAILED due to: Task failed task_1618090360850_0017_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
21/04/11 01:43:01 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=3
Killed map tasks=1
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=3779136
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=78732
Total vcore-milliseconds taken by all map tasks=78732
Total megabyte-milliseconds taken by all map tasks=120932352
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
21/04/11 01:43:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 34.8867 seconds (0 bytes/sec)
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Exported 0 records.
21/04/11 01:43:01 ERROR mapreduce.ExportJobBase: Export job failed!
21/04/11 01:43:01 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Please let me know if this can be fixed and what to do to fix it.
I was not able to fully resolve Sqoop exports on AWS; however, I stopped receiving the Lombok errors by downgrading to the prior version of EMR.
I hope this helps anyone else experiencing this issue.

Grunt - Mapreduce Mode: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximum

I'm running Apache Pig 0.17.0 in MapReduce mode to simply dump a few lines of text data from a file on HDFS (Hadoop 2.7.2).
When executing the dump command, the execution goes very slowly, although it does complete. I see some failures during execution, shown below:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1589604570386_0002]
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
[main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1589604570386_0002]
[main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
[main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
[main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
[main] WARN org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Failed to get map task report
java.io.IOException: java.net.ConnectException: Call From localhost/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:343)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:428)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:572)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.getTaskReports(MRJobStats.java:528)
at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:355)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:232)
at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:164)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:379)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1475)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1460)
at org.apache.pig.PigServer.storeEx(PigServer.java:1119)
at org.apache.pig.PigServer.store(PigServer.java:1082)
at org.apache.pig.PigServer.openIterator(PigServer.java:995)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:782)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:383)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:564)
at org.apache.pig.Main.main(Main.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Is there a way to speed up the MapReduce job?
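For what it's worth, the slow part is the client retrying its connection to 0.0.0.0:10020, the default MapReduce JobHistory Server port; the dump itself succeeded (FinalApplicationStatus=SUCCEEDED). If the history server isn't running, Hadoop 2.x ships a start script for it (a hedged sketch; the path is relative to your Hadoop install):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Setting mapreduce.jobhistory.address in mapred-site.xml may also be needed so clients don't fall back to 0.0.0.0:10020.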

Why is HBase Rest End Point not starting on AWS?

I have an HBase schema set up on an Amazon EMR cluster running 3 m3.xlarge instances with the Amazon Linux image. When I issue the command 'hbase rest start', it does not start and I get the following output. What can I do?
Output:
[hadoop#ip-10-81-13-20 ~]$ hbase rest start
2016-08-01 08:29:27,863 INFO [main] util.VersionInfo: HBase 1.2.1
2016-08-01 08:29:27,863 INFO [main] util.VersionInfo: Source code repository file:///workspace/workspace/bigtop.release-rpm-4.7.2/build/hbase/rpm/BUILD/hbase-1.2.1 revision=Unknown
2016-08-01 08:29:27,863 INFO [main] util.VersionInfo: Compiled by ec2-user on Fri Jul 8 02:16:27 UTC 2016
2016-08-01 08:29:27,863 INFO [main] util.VersionInfo: From source with checksum b1b31eefd0314d3ed5fa7036ed0201e9
2016-08-01 08:29:28,870 INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
2016-08-01 08:29:28,967 INFO [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-08-01 08:29:28,967 INFO [main] impl.MetricsSystemImpl: HBase metrics system started
2016-08-01 08:29:29,034 INFO [main] mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-08-01 08:29:29,081 INFO [main] http.HttpRequestLog: Http request log for http.requests.rest is not defined
2016-08-01 08:29:29,108 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2016-08-01 08:29:29,109 INFO [main] http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter)
2016-08-01 08:29:29,114 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context rest
2016-08-01 08:29:29,114 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-08-01 08:29:29,114 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-08-01 08:29:29,129 INFO [main] http.HttpServer: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:8085
at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1017)
at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953)
at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
at org.apache.hadoop.hbase.rest.RESTServer.main(RESTServer.java:248)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012)
... 3 more
Exception in thread "main" java.net.BindException: Port in use: 0.0.0.0:8085
at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1017)
at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:953)
at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
at org.apache.hadoop.hbase.rest.RESTServer.main(RESTServer.java:248)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:463)
at sun.nio.ch.Net.bind(Net.java:455)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1012)
... 3 more
2016-08-01 08:29:29,133 INFO [Shutdown] mortbay.log: Shutdown hook executing
2016-08-01 08:29:29,133 INFO [Shutdown] mortbay.log: Shutdown hook complete
(Answering my own question)
The default HBase ports on AWS EMR are different from those of regular HBase. Per the EMR documentation, the REST port for HBase is 8070 and the port for its web UI is 8085; one could simply use those.
That said, there's always the -p option: use hbase rest start -p portnumber to start the HBase REST server on a port number of your choice.
There's probably another process already using the port (the log above shows 8085 already in use), which is why you can't start the REST server with a bare hbase rest start.
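For example, to line up with the EMR defaults mentioned above, one could start the REST server like this (a hedged illustration; any free port works):
hbase rest start -p 8070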

Hadoop C++, error running wordcount example

I am trying to run the wordcount example in C++ on Hadoop 1.0.4, on Ubuntu 12.04, but I am getting the following error:
Command:
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input bin/input.txt -output bin/output.txt -program bin/wordcount
Error message:
13/06/14 13:50:11 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/06/14 13:50:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/14 13:50:11 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/14 13:50:11 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/14 13:50:11 INFO mapred.JobClient: Running job: job_201306141334_0003
13/06/14 13:50:12 INFO mapred.JobClient: map 0% reduce 0%
13/06/14 13:50:24 INFO mapred.JobClient: Task Id : attempt_201306141334_0003_m_000000_0, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:71)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201306141334_0003_m_000000_0: Server failed to authenticate. Exiting
13/06/14 13:50:24 INFO mapred.JobClient: Task Id : attempt_201306141334_0003_m_000001_0, Status : FAILED
I didn't find any solution, and I've been trying for quite a while to make it work.
I appreciate your help. Thanks.
I found this SO question (hadoop not running in the multinode cluster) where the user got similar errors; according to the top answer, it ended up being that they did not set a class. That was Java, however.
I also found this tutorial about running the C++ wordcount example on Hadoop; hopefully it helps you out:
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop