This question already has an answer here: Sqoop on Dataproc cannot export data to Avro format (1 answer).
Closed 3 years ago.
I have submitted a Sqoop job via a GCP Dataproc cluster with the --as-avrodatafile argument, but it is failing with the error below:
19/08/12 22:34:34 INFO impl.YarnClientImpl: Submitted application application_1565634426340_0021
19/08/12 22:34:34 INFO mapreduce.Job: The url to track the job: http://sqoop-gcp-ingest-mzp-m:8088/proxy/application_1565634426340_0021/
19/08/12 22:34:34 INFO mapreduce.Job: Running job: job_1565634426340_0021
19/08/12 22:34:40 INFO mapreduce.Job: Job job_1565634426340_0021 running in uber mode : false
19/08/12 22:34:40 INFO mapreduce.Job: map 0% reduce 0%
19/08/12 22:34:45 INFO mapreduce.Job: Task Id : attempt_1565634426340_0021_m_000000_0, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
19/08/12 22:34:50 INFO mapreduce.Job: Task Id : attempt_1565634426340_0021_m_000000_1, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
19/08/12 22:34:55 INFO mapreduce.Job: Task Id : attempt_1565634426340_0021_m_000000_2, Status : FAILED
Error: org.apache.avro.reflect.ReflectData.addLogicalTypeConversion(Lorg/apache/avro/Conversion;)V
19/08/12 22:35:00 INFO mapreduce.Job: map 100% reduce 0%
19/08/12 22:35:01 INFO mapreduce.Job: Job job_1565634426340_0021 failed with state FAILED due to: Task failed task_1565634426340_0021_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
19/08/12 22:35:01 INFO mapreduce.Job: Counters: 11
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=41976
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13992
Total vcore-milliseconds taken by all map tasks=13992
Total megabyte-milliseconds taken by all map tasks=42983424
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
19/08/12 22:35:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
19/08/12 22:35:01 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 30.5317 seconds (0 bytes/sec)
19/08/12 22:35:01 INFO mapreduce.ImportJobBase: Retrieved 0 records.
19/08/12 22:35:01 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@61baa894
19/08/12 22:35:01 ERROR tool.ImportTool: Import failed: Import job failed!
19/08/12 22:35:01 DEBUG manager.OracleManager$ConnCache: Caching released connection for jdbc:oracle:thin:@10.25.42.52:1521/uataca.aaamidatlantic.com/GCPREADER
Job output is complete
Without the --as-avrodatafile argument it works fine.
To fix this issue, you need to set the mapreduce.job.classloader property to true when submitting your job:
gcloud dataproc jobs submit hadoop --cluster="${CLUSTER_NAME}" \
--class="org.apache.sqoop.Sqoop" \
--properties="mapreduce.job.classloader=true" \
. . .
-- \
--as-avrodatafile \
. . .
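For reference, a fuller invocation might look like the sketch below; the jar paths, bucket, and connection details are illustrative placeholders, not values from the job above:
# Sketch only: jar versions, bucket, and JDBC details below are placeholders.
gcloud dataproc jobs submit hadoop --cluster="${CLUSTER_NAME}" \
    --class="org.apache.sqoop.Sqoop" \
    --jars="gs://${BUCKET}/jars/sqoop-1.4.7-hadoop260.jar,gs://${BUCKET}/jars/ojdbc8.jar" \
    --properties="mapreduce.job.classloader=true" \
    -- \
    import \
    --connect "jdbc:oracle:thin:@//db-host:1521/SERVICE" \
    --username "DB_USER" \
    --password-file "gs://${BUCKET}/secrets/sqoop-password.txt" \
    --table "SOURCE_TABLE" \
    --target-dir "gs://${BUCKET}/sqoop-output/source_table" \
    --as-avrodatafile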
Related
We are running a Spark job which runs close to 30 scripts one by one. It usually takes 14-15 hours to run, but this time it failed after 13 hours. Below are the details:
Command:spark-submit --executor-memory=80g --executor-cores=5 --conf spark.sql.shuffle.partitions=800 run.py
Setup: Running spark jobs via jenkins on AWS EMR with 16 spot nodes
Error: Since the YARN log is huge (270 MB+), below are some extracts from it:
[2022-07-25 04:50:08.646]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr :
ermediates/master/email/_temporary/0/_temporary/attempt_202207250435265404741257029168752_0641_m_000599_168147 s3://memberanalytics-data-out-prod/pipelined_intermediates/master/email/_temporary/0/task_202207250435265404741257029168752_0641_m_000599 using algorithm version 1
22/07/25 04:37:05 INFO FileOutputCommitter: Saved output of task 'attempt_202207250435265404741257029168752_0641_m_000599_168147' to s3://memberanalytics-data-out-prod/pipelined_intermediates/master/email/_temporary/0/task_202207250435265404741257029168752_0641_m_000599
22/07/25 04:37:05 INFO SparkHadoopMapRedUtil: attempt_202207250435265404741257029168752_0641_m_000599_168147: Committed
22/07/25 04:37:05 INFO Executor: Finished task 599.0 in stage 641.0 (TID 168147). 9341 bytes result sent to driver
22/07/25 04:49:36 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Driver ip-10-13-52-109.bjw2k.asg:45383 disassociated! Shutting down.
22/07/25 04:49:36 INFO MemoryStore: MemoryStore cleared
22/07/25 04:49:36 INFO BlockManager: BlockManager stopped
22/07/25 04:50:06 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
22/07/25 04:50:06 ERROR Utils: Uncaught exception in thread shutdown-hook-0 java.lang.InterruptedException
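A sketch of how extracts like the above can be pulled out of a large aggregated YARN log (the application ID below is a placeholder):
# Placeholder application ID; take the real one from the YARN ResourceManager or the EMR console.
yarn logs -applicationId application_1658700000000_0001 > app.log
# Keep only warning/error lines to narrow down a 270 MB+ log.
grep -E "ERROR|WARN|Exception" app.log | less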
I have a 2-node EMR (version 4.6.0) cluster (1 master (m4.large), 1 core (r4.xlarge)) with HBase installed. I'm using the default EMR configurations. I want to export HBase tables using:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.include.deleted.rows=true Table_Name hdfs:/full_backup/Table_Name 1
I'm getting the following error:
2022-04-04 11:29:20,626 INFO [main] util.RegionSizeCalculator: Calculating region sizes for table "Table_Name".
2022-04-04 11:29:20,900 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2022-04-04 11:29:20,900 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x17ff27095680070
2022-04-04 11:29:20,903 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x17ff27095680070
2022-04-04 11:29:20,904 INFO [main] zookeeper.ZooKeeper: Session: 0x17ff27095680070 closed
2022-04-04 11:29:20,980 INFO [main] mapreduce.JobSubmitter: number of splits:1
2022-04-04 11:29:20,994 INFO [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2022-04-04 11:29:21,192 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1649071534731_0002
2022-04-04 11:29:21,424 INFO [main] impl.YarnClientImpl: Submitted application application_1649071534731_0002
2022-04-04 11:29:21,454 INFO [main] mapreduce.Job: The url to track the job: http://ip-10-0-2-244.eu-west-1.compute.internal:20888/proxy/application_1649071534731_0002/
2022-04-04 11:29:21,455 INFO [main] mapreduce.Job: Running job: job_1649071534731_0002
2022-04-04 11:29:28,541 INFO [main] mapreduce.Job: Job job_1649071534731_0002 running in uber mode : false
2022-04-04 11:29:28,542 INFO [main] mapreduce.Job: map 0% reduce 0%
It gets stuck at this point and does not progress. However, when I add a task node and rerun the same command, it finishes within seconds.
Based on the documentation (https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html), the core node itself should handle tasks as well. What could be going wrong?
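For anyone diagnosing this, two commands that show whether YARN on the single core node actually has the memory and vcores to schedule the export's map task (illustrative only; the application ID is taken from the log above):
# List all node managers with their used and available resources.
yarn node -list -all
# Show the current state and diagnostics of the stuck export job.
yarn application -status application_1649071534731_0002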
I am attempting to export a Hive database table into a MySQL database table on an Amazon AWS cluster using the command:
sqoop export --connect jdbc:mysql://database_hostname/universities --table 19_20 --username admin -P --export-dir '/final/hive/19_20'
I am trying to export from the folder '/final/hive/19_20', which is the Hive output directory, into the MySQL database 'universities', table '19_20'.
In response I get:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/aws/redshift/jdbc/redshift-jdbc42-1.2.37.1061.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/04/11 01:42:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
21/04/11 01:42:18 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/04/11 01:42:18 INFO tool.CodeGenTool: Beginning code generation
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `19_20` AS t LIMIT 1
21/04/11 01:42:19 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
/tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java:37: warning: Can't initialize javac processor due to (most likely) a class loader problem: java.lang.NoClassDefFoundError: com/sun/tools/javac/processing/JavacProcessingEnvironment
public class _19_20 extends SqoopRecord implements DBWritable, Writable {
^
at lombok.javac.apt.LombokProcessor.getJavacProcessingEnvironment(LombokProcessor.java:411)
at lombok.javac.apt.LombokProcessor.init(LombokProcessor.java:91)
at lombok.core.AnnotationProcessor$JavacDescriptor.want(AnnotationProcessor.java:124)
at lombok.core.AnnotationProcessor.init(AnnotationProcessor.java:177)
at lombok.launch.AnnotationProcessorHider$AnnotationProcessor.init(AnnotationProcessor.java:73)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$ProcessorState.<init>(JavacProcessingEnvironment.java:508)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$DiscoveredProcessors$ProcessorStateIterator.next(JavacProcessingEnvironment.java:605)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.discoverAndRunProcs(JavacProcessingEnvironment.java:698)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.access$1800(JavacProcessingEnvironment.java:91)
at com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.run(JavacProcessingEnvironment.java:1043)
at com.sun.tools.javac.processing.JavacProcessingEnvironment.doProcessing(JavacProcessingEnvironment.java:1184)
at com.sun.tools.javac.main.JavaCompiler.processAnnotations(JavaCompiler.java:1170)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:856)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:224)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:63)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: com.sun.tools.javac.processing.JavacProcessingEnvironment
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at lombok.launch.ShadowClassLoader.loadClass(ShadowClassLoader.java:530)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 26 more
Note: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/_19_20.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning
21/04/11 01:42:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/8aac2b94e7d11dc02d064c8213465c05/19_20.jar
21/04/11 01:42:24 INFO mapreduce.ExportJobBase: Beginning export of 19_20
21/04/11 01:42:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
21/04/11 01:42:26 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-179.ec2.internal/172.31.6.179:8032
21/04/11 01:42:26 INFO client.AHSProxy: Connecting to Application History server at ip-172-31-6-179.ec2.internal/172.31.6.179:10200
21/04/11 01:42:28 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO input.FileInputFormat: Total input files to process : 1
21/04/11 01:42:29 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
21/04/11 01:42:29 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 3fb854bbfdabadafad1fa2cca072658fa097fd67]
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: number of splits:4
21/04/11 01:42:29 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
21/04/11 01:42:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1618090360850_0017
21/04/11 01:42:29 INFO conf.Configuration: resource-types.xml not found
21/04/11 01:42:29 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/04/11 01:42:29 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/04/11 01:42:29 INFO impl.YarnClientImpl: Submitted application application_1618090360850_0017
21/04/11 01:42:29 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-179.ec2.internal:20888/proxy/application_1618090360850_0017/
21/04/11 01:42:29 INFO mapreduce.Job: Running job: job_1618090360850_0017
21/04/11 01:42:37 INFO mapreduce.Job: Job job_1618090360850_0017 running in uber mode : false
21/04/11 01:42:37 INFO mapreduce.Job: map 0% reduce 0%
21/04/11 01:43:00 INFO mapreduce.Job: map 100% reduce 0%
21/04/11 01:43:01 INFO mapreduce.Job: Job job_1618090360850_0017 failed with state FAILED due to: Task failed task_1618090360850_0017_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
21/04/11 01:43:01 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=3
Killed map tasks=1
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=3779136
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=78732
Total vcore-milliseconds taken by all map tasks=78732
Total megabyte-milliseconds taken by all map tasks=120932352
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
21/04/11 01:43:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 34.8867 seconds (0 bytes/sec)
21/04/11 01:43:01 INFO mapreduce.ExportJobBase: Exported 0 records.
21/04/11 01:43:01 ERROR mapreduce.ExportJobBase: Export job failed!
21/04/11 01:43:01 ERROR tool.ExportTool: Error during export:
Export job failed!
at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Please let me know if this can be fixed and what to do to fix it.
I was not able to fully resolve Sqoop exports on AWS; however, I stopped receiving the Lombok errors by downgrading to the prior version of EMR.
I hope this helps anyone else experiencing this issue.
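If it helps, the release is pinned when the cluster is created; below is a rough AWS CLI sketch (the release label, instance type, and count are placeholders, not a recommendation):
# Illustrative only: substitute the earlier EMR release label that worked for you.
aws emr create-cluster \
    --name "sqoop-export-cluster" \
    --release-label emr-5.32.0 \
    --applications Name=Hadoop Name=Hive Name=Sqoop \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles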
I would like to change s3distcp and other Hadoop commands to log only WARN messages or worse; currently they log INFO and worse.
How can I configure this on the head node of an AWS EMR cluster?
Here's an example of the output that I am trying to hide:
$ hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: Running with args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:03 INFO s3distcp.S3DistCp: S3DistCp args: --src /user/myusername/test --dest s3://some-bucket/myusername/data/test
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/output'
16/06/01 17:18:06 INFO s3distcp.S3DistCp: GET http://x.x.x.x/latest/meta-data/placement/availability-zone result: us-east-1b
16/06/01 17:18:06 INFO s3distcp.FileInfoListing: Opening new file: hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/files/1
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Created 1 files to copy 88 files
16/06/01 17:18:06 INFO s3distcp.S3DistCp: Reducer number: 15
16/06/01 17:18:06 INFO client.RMProxy: Connecting to ResourceManager at /x.x.x.x:9022
16/06/01 17:18:07 INFO input.FileInputFormat: Total input paths to process : 1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: number of splits:1
16/06/01 17:18:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464201102672_0019
16/06/01 17:18:07 INFO impl.YarnClientImpl: Submitted application application_1464201102672_0019
16/06/01 17:18:07 INFO mapreduce.Job: The url to track the job: http://x.x.x.x:9046/proxy/application_1464201102672_0019/
16/06/01 17:18:07 INFO mapreduce.Job: Running job: job_1464201102672_0019
16/06/01 17:18:13 INFO mapreduce.Job: Job job_1464201102672_0019 running in uber mode : false
16/06/01 17:18:13 INFO mapreduce.Job: map 0% reduce 0%
16/06/01 17:18:19 INFO mapreduce.Job: map 100% reduce 0%
16/06/01 17:18:30 INFO mapreduce.Job: map 100% reduce 5%
16/06/01 17:18:31 INFO mapreduce.Job: map 100% reduce 10%
16/06/01 17:18:32 INFO mapreduce.Job: map 100% reduce 22%
16/06/01 17:18:33 INFO mapreduce.Job: map 100% reduce 23%
16/06/01 17:18:34 INFO mapreduce.Job: map 100% reduce 33%
16/06/01 17:18:35 INFO mapreduce.Job: map 100% reduce 40%
16/06/01 17:18:36 INFO mapreduce.Job: map 100% reduce 50%
16/06/01 17:18:37 INFO mapreduce.Job: map 100% reduce 57%
16/06/01 17:18:38 INFO mapreduce.Job: map 100% reduce 77%
16/06/01 17:18:39 INFO mapreduce.Job: map 100% reduce 85%
16/06/01 17:18:40 INFO mapreduce.Job: map 100% reduce 90%
16/06/01 17:18:41 INFO mapreduce.Job: map 100% reduce 95%
16/06/01 17:18:42 INFO mapreduce.Job: map 100% reduce 98%
16/06/01 17:18:43 INFO mapreduce.Job: map 100% reduce 100%
16/06/01 17:18:43 INFO mapreduce.Job: Job job_1464201102672_0019 completed successfully
16/06/01 17:18:43 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=5447
FILE: Number of bytes written=1640535
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=113570708
HDFS: Number of bytes written=56776676
HDFS: Number of read operations=401
HDFS: Number of large read operations=0
HDFS: Number of write operations=206
S3: Number of bytes read=0
S3: Number of bytes written=0
S3: Number of read operations=0
S3: Number of large read operations=0
S3: Number of write operations=0
Job Counters
Launched map tasks=1
Launched reduce tasks=15
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=166005
Total time spent by all reduces in occupied slots (ms)=18351000
Total time spent by all map tasks (ms)=3689
Total time spent by all reduce tasks (ms)=203900
Total vcore-seconds taken by all map tasks=3689
Total vcore-seconds taken by all reduce tasks=203900
Total megabyte-seconds taken by all map tasks=5312160
Total megabyte-seconds taken by all reduce tasks=587232000
Map-Reduce Framework
Map input records=88
Map output records=88
Map output bytes=20500
Map output materialized bytes=5387
Input split bytes=138
Combine input records=0
Combine output records=0
Reduce input groups=88
Reduce shuffle bytes=5387
Reduce input records=88
Reduce output records=0
Spilled Records=176
Shuffled Maps =15
Failed Shuffles=0
Merged Map outputs=15
GC time elapsed (ms)=2658
CPU time spent (ms)=98620
Physical memory (bytes) snapshot=5777489920
Virtual memory (bytes) snapshot=50741022720
Total committed heap usage (bytes)=9051308032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=17218
File Output Format Counters
Bytes Written=0
16/06/01 17:18:43 INFO s3distcp.S3DistCp: Try to recursively delete hdfs:/tmp/97139b69-ea86-400e-9ce4-f0718ff2b669/tempspace
It seems that the best way to do this is to change the HADOOP_ROOT_LOGGER environment variable. You can either run this in the Linux command line for the current session, or add it to the hadoop-env.sh script if this should always be the case.
export HADOOP_ROOT_LOGGER="WARN,console"
WARN specifies that only messages WARN or worse should get logged, and console specifies that the messages should also be printed to the command line.
Note: if you want to modify the hadoop-env.sh file, you may find it in /etc/hadoop/conf/hadoop-env.sh, or in /home/hadoop/conf/hadoop-env.sh on older EMR clusters.
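You can also set it for a single invocation instead of exporting it for the whole session, for example (reusing the s3distcp command from the question):
# Applies WARN-level logging only to this one command.
HADOOP_ROOT_LOGGER="WARN,console" hadoop jar ~hadoop/lib/emr-s3distcp-1.0.jar \
    --src /user/myusername/test --dest s3://some-bucket/myusername/data/test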
I am trying to set up a Hadoop environment using the latest Hadoop version (2.6.0) and Java SDK 1.7.0 on my Ubuntu desktop. I configured Hadoop with the necessary environment parameters, and all of its processes are up and running, as can be seen with the following jps command:
nandu#nandu-Desktop:~$ jps
2810 NameNode
3149 SecondaryNameNode
3416 NodeManager
3292 ResourceManager
2966 DataNode
4805 Jps
I could also see the above information, plus the DFS files, through the Firefox browser. However, when I tried to run a simple WordCount MapReduce job, it hung and did not produce any output or show any error message(s). After a while I killed the process using the "hadoop job -kill " command. Can you please guide me on finding the cause of this issue and how to resolve it? I am including the job start and kill (end) output below.
If you need additional information, please let me know.
Your help will be highly appreciated.
Thanks,
===================================================================
nandu#nandu-Desktop:~/dev$ hadoop jar wc.jar WordCount /user/nandu/input /user/nandu/output
15/02/27 10:35:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 10:35:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/27 10:35:21 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/02/27 10:35:21 INFO input.FileInputFormat: Total input paths to process : 2
15/02/27 10:35:21 INFO mapreduce.JobSubmitter: number of splits:2
15/02/27 10:35:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425048764581_0003
15/02/27 10:35:22 INFO impl.YarnClientImpl: Submitted application application_1425048764581_0003
15/02/27 10:35:22 INFO mapreduce.Job: The url to track the job: http://nandu-Desktop:8088/proxy/application_1425048764581_0003/
15/02/27 10:35:22 INFO mapreduce.Job: Running job: job_1425048764581_0003
==================== at this point the job was killed ===================
15/02/27 10:38:23 INFO mapreduce.Job: Job job_1425048764581_0003 running in uber mode : false
15/02/27 10:38:23 INFO mapreduce.Job: map 0% reduce 0%
15/02/27 10:38:23 INFO mapreduce.Job: Job job_1425048764581_0003 failed with state KILLED due to: Application killed by user.
15/02/27 10:38:23 INFO mapreduce.Job: Counters: 0
I encountered a similar problem while running the MapReduce samples provided in the Hadoop package. In my case it was hanging due to low disk space on my VM (only about 1.5 GB was free). When I freed some disk space, it ran fine. Also, please check that the other system resource requirements are fulfilled.
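A quick way to check, assuming a typical single-node setup, is something like:
# Free space on the local filesystem (NodeManager local dirs and logs live here).
df -h
# HDFS capacity and remaining space as reported by the NameNode.
hdfs dfsadmin -report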