I am trying to run the wordcount example in C++ on Hadoop 1.0.4 (Ubuntu 12.04), but I am getting the following error:
Command:
hadoop pipes -D hadoop.pipes.java.recordreader=true \
    -D hadoop.pipes.java.recordwriter=true \
    -input bin/input.txt -output bin/output.txt -program bin/wordcount
Error message:
13/06/14 13:50:11 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/06/14 13:50:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/14 13:50:11 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/14 13:50:11 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/14 13:50:11 INFO mapred.JobClient: Running job: job_201306141334_0003
13/06/14 13:50:12 INFO mapred.JobClient:  map 0% reduce 0%
13/06/14 13:50:24 INFO mapred.JobClient: Task Id : attempt_201306141334_0003_m_000000_0, Status : FAILED
java.io.IOException
    at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
    at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
    at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:71)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201306141334_0003_m_000000_0: Server failed to authenticate. Exiting
13/06/14 13:50:24 INFO mapred.JobClient: Task Id : attempt_201306141334_0003_m_000001_0, Status : FAILED
I haven't found any solution, and I've been trying for quite a while to make this work.
I'd appreciate your help.
Thanks.
I found this SO question (hadoop not running in the multinode cluster) where the user got similar errors; according to the top answer, the cause turned out to be that they had not "set a class". That case was in Java, however.
I found this tutorial about running the C++ wordcount example in Hadoop. Hopefully this helps you out.
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop
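Beyond the tutorial, one commonly suggested check for OutputHandler.waitForAuthentication failures with pipes is whether -program points at a binary that has actually been uploaded to HDFS rather than at a local path. A dry-run sketch of the usual sequence; the HDFS paths here are assumptions, not the asker's actual layout:

```shell
# Assumed HDFS destination for the compiled wordcount binary.
hdfs_bin=/user/hduser/bin/wordcount

# On a live cluster you would first upload the binary:
#   hadoop fs -mkdir -p /user/hduser/bin
#   hadoop fs -put -f bin/wordcount "$hdfs_bin"

# Then point -program at the HDFS path, not the local one.
# Built as a string and echoed here, since this is only a dry run:
cmd="hadoop pipes \
  -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -input input.txt -output output \
  -program $hdfs_bin"
echo "$cmd"
```

Note also that -output should name a directory that does not exist yet; Hadoop creates it and refuses to overwrite an existing one.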
I am attempting to export a Hive database table into a MySQL database table on an Amazon AWS cluster using the command:
sqoop export --connect jdbc:mysql://database_hostname/universities --table 19_20 --username admin -P --export-dir '/final/hive/19_20'
I am trying to export from the folder '/final/hive/19_20' which is the Hive output directory into a MySQL database 'universities', table '19_20'.
In response I get:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/04/11 01:42:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
21/04/11 01:42:18 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/04/11 01:42:18 INFO tool.CodeGenTool: Beginning code generation
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
/tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java:37: warning: Can't initialize javac processor due to (most likely) a class loader problem: java.lang.NoClassDefFoundError: com/sun/tools/javac/processing/JavacProcessingEnvironment
public class _19_20 extends SqoopRecord implements DBWritable, Writable {
^
at lombok.javac.apt.LombokProcessor.getJavacProcessingEnvironment(LombokProcessor.java:411)
at lombok.javac.apt.LombokProcessor.init(LombokProcessor.java:91)
at lombok.core.AnnotationProcessor$JavacDescriptor.want(AnnotationProcessor.java:124)
at lombok.core.AnnotationProcessor.init(AnnotationProcessor.java:177)
at lombok.launch.AnnotationProcessorHider$AnnotationProcessor.init(AnnotationProcessor.java:73)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$ProcessorState.<init>(JavacProcessingEnvironment.java:508)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$DiscoveredProcessors$ProcessorStateIterator.next(JavacProcessingEnvironment.java:605)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.discoverAndRunProcs(JavacProcessingEnvironment.java:698)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.access$1800(JavacProcessingEnvironment.java:91)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.run(JavacProcessingEnvironment.java:1043)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.doProcessing(JavacProcessingEnvironment.java:1184)
at com.sun.tools.javac.main.JavaCompiler.processAnnotations(JavaCompiler.java:1170)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:856)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:224)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:63)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.processing.JavacProcessingEnvironment
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at lombok.launch.ShadowClassLoader.loadClass(ShadowClassLoader.java:530)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 26 more
Note: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning
21/04/11 01:42:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/19_20.jar
21/04/11 01:42:24 INFO mapreduce.ExportJobBase: Beginning export of 19_20
21/04/11 01:42:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
21/04/11 01:42:26 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-179.ec2.internal/172.31.6.179:8032
21/04/11 01:42:26 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-6-179.ec2.internal/172.31.6.179:10200
21/04/11 01:42:28 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
21/04/11 01:42:29 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 3fb854bbfdabadafad1fa2cca072658fa097fd67]
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: number of splits:4
21/04/11 01:42:29 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1618090360850_0017
21/04/11 01:42:29 INFO conf.Configuration: resource-types.xml not found
21/04/11 01:42:29 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/04/11 01:42:29 INFO impl.YarnClientImpl: Submitted application application_1618090360850_0017
21/04/11 01:42:29 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-179.ec2.internal:20888/proxy/application_1618090360850_0017/
21/04/11 01:42:29 INFO mapreduce.Job: Running job: job_1618090360850_0017
21/04/11 01:42:37 INFO mapreduce.Job: Job job_1618090360850_0017 running in uber mode : false
21/04/11 01:42:37 INFO mapreduce.Job: map 0% reduce 0%
21/04/11 01:43:00 INFO mapreduce.Job: map 100% reduce 0%
21/04/11 01:43:01 INFO mapreduce.Job: Job job_1618090360850_0017 failed with state FAILED due to: Task failed task_1618090360850_0017_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
21/04/11 01:43:01 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=3
Killed map tasks=1
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=3779136
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=78732
Total vcore-milliseconds taken by all map tasks=78732
Total megabyte-milliseconds taken by all map tasks=120932352
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
21/04/11 01:43:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 34.8867 seconds (0 bytes/sec)
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Exported 0 records.
21/04/11 01:43:01 ERROR mapreduce.ExportJobBase: Export job failed!
21/04/11 01:43:01 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Please let me know if this can be fixed and what to do to fix it.
I was not able to fully resolve Sqoop exports on AWS; however, I stopped receiving the Lombok errors after downgrading to the prior version of EMR.
I hope this helps anyone else experiencing this issue.
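For anyone who wants to try the same workaround: the EMR release is pinned with --release-label when the cluster is created. A dry-run sketch; the release label and instance settings below are placeholders, not the exact versions from this report:

```shell
# Dry run: build the create-cluster invocation as a string and echo it.
# emr-5.32.0, the instance type, and counts are assumptions; pick the
# release one step before the one that introduced the Lombok failure.
cmd="aws emr create-cluster \
  --name sqoop-export-test \
  --release-label emr-5.32.0 \
  --applications Name=Hadoop Name=Hive Name=Sqoop \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles"
echo "$cmd"
```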
I have created a runnable jar file and am executing it in Hadoop, but I am not getting any output. The code itself works fine: I checked it in Eclipse with the Hadoop jar files added, and there I got exactly the right output.
hduser#Strawhats:~$ hadoop jar /home/hduser/Desktop/project.jar /user/hduser/input /user/hduser/output
17/02/20 19:18:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/20 19:18:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/02/20 19:18:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/02/20 19:18:05 INFO mapred.FileInputFormat: Total input paths to process : 1
17/02/20 19:18:05 INFO mapreduce.JobSubmitter: number of splits:2
17/02/20 19:18:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487596891791_0003
17/02/20 19:18:06 INFO impl.YarnClientImpl: Submitted application application_1487596891791_0003
17/02/20 19:18:06 INFO mapreduce.Job: The url to track the job: http://Strawhats:8088/proxy/application_1487596891791_0003/
17/02/20 19:18:06 INFO mapreduce.Job: Running job: job_1487596891791_0003
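When a job stalls at "Running job:" like this, the usual first checks are the application status, the NodeManager list, and the container logs. A dry-run sketch (the commands are echoed rather than executed, since they need a live cluster; the application ID is taken from the log above):

```shell
# Application ID from the submission log above.
app=application_1487596891791_0003

# On the cluster you would run these directly; echoed here as a dry run:
echo "yarn application -status $app"   # stuck in ACCEPTED, or RUNNING?
echo "yarn node -list -all"            # are any NodeManagers up and healthy?
echo "yarn logs -applicationId $app"   # container logs, once containers ran
```

A job that registers on port 8088 but never progresses is very often a resource problem: no healthy NodeManager, or container requests larger than the configured node memory.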
I have downloaded Hadoop 2.6.0 and the Hadoop streaming jar from here (since I don't have space for running CDH or a Sandbox), and I ran this command:
bin/hadoop jar contrib/hadoop-streaming-2.6.0.jar \
-file ${HADOOP_HOME}/py_mapred/mapper.py -mapper ${HADOOP_HOME}/py_mapred/mapper.py \
-file ${HADOOP_HOME}/py_mapred/reducer.py -reducer ${HADOOP_HOME}/py_mapred/reducer.py \
-input /input/davinci/* -output /input/davinci-output
where I stored the downloaded streaming jar in ${HADOOP_HOME}/contrib and the Python files in py_mapred. I also used copyFromLocal to copy the input into the /input directory on HDFS. Now, when I run the command, the following lines show up:
15/08/14 17:35:45 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
15/08/14 17:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/usr/local/cellar/hadoop/2.6.0/py_mapred/mapper.py, /usr/local/cellar/hadoop/2.6.0/py_mapred/reducer.py, /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/hadoop-unjar3313567263260134566/] [] /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/streamjob9165494241574343777.jar tmpDir=null
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/14 17:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: number of splits:2
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439538212023_0002
15/08/14 17:35:49 INFO impl.YarnClientImpl: Submitted application application_1439538212023_0002
15/08/14 17:35:49 INFO mapreduce.Job: The url to track the job: http://Jonathans-MacBook-Pro.local:8088/proxy/application_1439538212023_0002/
15/08/14 17:35:49 INFO mapreduce.Job: Running job: job_1439538212023_0002
It looks like the command has been accepted. I checked localhost:8088 and the job is registered there, but it is not actually running, despite the line Running job: job_1439538212023_0002. Is there something wrong with my command? Is it due to a permission setting? Why isn't the job running?
Thank you
Here is the right way to invoke streaming:
bin/hadoop jar contrib/hadoop-streaming-2.6.0.jar \
    -file ${HADOOP_HOME}/py_mapred/mapper.py -mapper '/usr/bin/python mapper.py' \
    -file ${HADOOP_HOME}/py_mapred/reducer.py -reducer '/usr/bin/python reducer.py' \
    -input /input/davinci/* -output /input/davinci-output
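One way to rule out the scripts themselves is to simulate the streaming contract locally with a plain shell pipe before submitting to YARN. The mapper.py and reducer.py below are stand-in wordcount scripts (not the asker's actual files), and python3 is assumed:

```shell
# Simulate hadoop-streaming locally: map | sort | reduce.
workdir=$(mktemp -d)
cd "$workdir"

cat > mapper.py <<'EOF'
#!/usr/bin/env python3
import sys
# Emit "word<TAB>1" for every word on stdin, as streaming mappers do.
for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")
EOF

cat > reducer.py <<'EOF'
#!/usr/bin/env python3
import sys
from collections import Counter
# Sum the counts per word from the sorted mapper output.
counts = Counter()
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    counts[word] += int(n)
for word in sorted(counts):
    print(word + "\t" + str(counts[word]))
EOF

# Streaming runs the scripts as programs, so they need the executable bit.
chmod +x mapper.py reducer.py

echo 'the cat sat on the mat' | ./mapper.py | sort | ./reducer.py
# -> cat 1 / mat 1 / on 1 / sat 1 / the 2 (tab-separated)
```

If the local pipe works but the cluster job still hangs, the usual remaining culprits are a missing executable bit on the shipped files or a shebang pointing at a Python that does not exist on the worker nodes.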
I am trying to simulate a Hadoop environment using the latest Hadoop version, 2.6.0, with Java SDK 1.7.0 on my Ubuntu desktop. I configured Hadoop with the necessary environment parameters; all its processes are up and running, as can be seen with the following jps command:
nandu#nandu-Desktop:~$ jps
2810 NameNode
3149 SecondaryNameNode
3416 NodeManager
3292 ResourceManager
2966 DataNode
4805 Jps
I can also see the above information, plus the DFS files, through the Firefox browser. However, when I try to run a simple WordCount MapReduce job, it hangs and doesn't produce any output or show any error message(s). After a while I killed the process using the "hadoop job -kill " command. Can you please guide me to the cause of this issue and how to resolve it? I am giving below the job start and kill (end) screenshot.
If you need additional information, please let me know.
Your help will be highly appreciated.
Thanks,
===================================================================
nandu#nandu-Desktop:~/dev$ hadoop jar wc.jar WordCount /user/nandu/input /user/nandu/output
15/02/27 10:35:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 10:35:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/27 10:35:21 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/02/27 10:35:21 INFO input.FileInputFormat: Total input paths to process : 2
15/02/27 10:35:21 INFO mapreduce.JobSubmitter: number of splits:2
15/02/27 10:35:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425048764581_0003
15/02/27 10:35:22 INFO impl.YarnClientImpl: Submitted application application_1425048764581_0003
15/02/27 10:35:22 INFO mapreduce.Job: The url to track the job: http://nandu-Desktop:8088/proxy/application_1425048764581_0003/
15/02/27 10:35:22 INFO mapreduce.Job: Running job: job_1425048764581_0003
==================== at this point the job was killed ===================
15/02/27 10:38:23 INFO mapreduce.Job: Job job_1425048764581_0003 running in uber mode : false
15/02/27 10:38:23 INFO mapreduce.Job: map 0% reduce 0%
15/02/27 10:38:23 INFO mapreduce.Job: Job job_1425048764581_0003 failed with state KILLED due to: Application killed by user.
15/02/27 10:38:23 INFO mapreduce.Job: Counters: 0
I encountered a similar problem while running the MapReduce sample provided in the Hadoop package. In my case it was hanging due to low disk space on my VM (only about 1.5 GB was free). When I freed some disk space it ran fine. Also, please check that the other system resource requirements are fulfilled.
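The disk-space condition is quick to verify: YARN's disk health checker marks a node unhealthy once a local dir passes its utilization threshold (90% by default), and on a single-node setup an unhealthy node means nothing is ever scheduled. A sketch of the checks, assuming the default local-dir location:

```shell
# Threshold property (from the standard yarn-default.xml):
#   yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
# Default is 90; past that, the NodeManager reports itself unhealthy and
# the job sits at "Running job:" / 0% forever.
df -h /tmp        # or wherever yarn.nodemanager.local-dirs points

# On the cluster you would also confirm node health (dry run):
echo "yarn node -list -all"
```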
I'm trying to run a Pig script with Elephant Bird within AWS EMR. I'm using Hadoop 2.x, but I get the following message:
2014-09-09 14:53:11,001 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2014-09-09 14:53:11,029 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2014-09-09 14:53:11,040 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
at com.twitter.elephantbird.pig.util.PigCounterHelper.incrCounter(PigCounterHelper.java:55)
at com.twitter.elephantbird.pig.load.LzoBaseLoadFunc.incrCounter(LzoBaseLoadFunc.java:70)
at com.twitter.elephantbird.pig.load.JsonLoader.getNext(JsonLoader.java:128)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:544)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:775)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I've tried different versions of Elephant Bird (from 3.0.9 to 4.0.x) but none of them seems to work. When I downgraded my cluster to Hadoop 1.x, I had no problem using Elephant Bird.
Any idea?
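Not part of the original thread, but for context: "Found interface org.apache.hadoop.mapreduce.Counter, but class was expected" is the classic symptom of bytecode compiled against Hadoop 1 (where Counter was a class) running on Hadoop 2 (where it became an interface). Elephant Bird 4.x moved the Hadoop-version glue into a separate hadoop-compat artifact, so the usual direction is to register jars built for Hadoop 2. The jar names and version below are assumptions, written into a Pig fragment as a sketch:

```shell
# Hedged sketch: generate the REGISTER stanza for Hadoop-2-compatible
# Elephant Bird jars. Version 4.4 and the exact jar names are assumptions;
# check the artifacts actually published for your EMR/Hadoop combination.
cat > register_eb.pig <<'EOF'
REGISTER 'elephant-bird-core-4.4.jar';
REGISTER 'elephant-bird-pig-4.4.jar';
REGISTER 'elephant-bird-hadoop-compat-4.4.jar';
EOF
cat register_eb.pig
```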