hadoop mapreduce job getting stuck at map 0% reduce 0%

Following the documentation for Hadoop setup on a single-node cluster, I installed Hadoop, but when I try to run one of the bundled examples, hadoop-mapreduce-examples-2.7.3.jar, the job gets stuck at map 0% reduce 0% in the RUNNING state.
My system spec is 1 TB of disk, 8 GB of RAM, and 4 cores.
yarn-site.xml is:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2000</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2500</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>3</value>
</property>
mapred-site.xml is:
<property>
<name>mapreduce.map.memory.mb</name>
<value>1000</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1000</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx800m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx800m</value>
</property>
Console Log:
16/09/30 11:38:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/30 11:38:10 INFO input.FileInputFormat: Total input paths to process : 0
16/09/30 11:38:11 INFO mapreduce.JobSubmitter: number of splits:0
16/09/30 11:38:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475215403708_0001
16/09/30 11:38:12 INFO impl.YarnClientImpl: Submitted application application_1475215403708_0001
16/09/30 11:38:12 INFO mapreduce.Job: The url to track the job: http://DeadSilence:8088/proxy/application_1475215403708_0001/
16/09/30 11:38:12 INFO mapreduce.Job: Running job: job_1475215403708_0001
16/09/30 11:38:23 INFO mapreduce.Job: Job job_1475215403708_0001 running in uber mode : false
16/09/30 11:38:23 INFO mapreduce.Job: map 0% reduce 0%
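Worth noting from the log above: FileInputFormat reports "Total input paths to process : 0" and "number of splits:0", so the job was submitted with no input at all. A minimal sketch (the input path here is an assumption, not from the original post) for checking that the input directory exists and is non-empty before submitting:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // "input" is a hypothetical directory; use whatever path you passed to the examples jar
        Path input = new Path(args.length > 0 ? args[0] : "input");
        if (!fs.exists(input) || fs.listStatus(input).length == 0) {
            System.err.println("Input path missing or empty: " + input);
        } else {
            System.out.println("Found " + fs.listStatus(input).length + " input file(s)");
        }
    }
}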

Related

Checkpointing in flink

Trying to use the checkpointing mechanism in Flink with the filesystem (fs) state backend on HDFS.
While connecting with hdfs://aleksandar/0.0.0.0:50010/shared/ I get the following error:
Caused by: java.lang.IllegalArgumentException: Pathname /0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 from hdfs://aleksandar/0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 is not a valid DFS filename.
In the core-site settings i have the following configuration
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:123</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
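The error above complains that the pathname embeds 0.0.0.0:50010, which is not a valid DFS filename. For reference, a minimal sketch of pointing Flink checkpointing at a plain HDFS directory instead (the NameNode host, port, and path are assumptions; the URI should use the NameNode RPC port from fs.defaultFS, not a DataNode port like 50010):
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // checkpoint every 10 seconds
        env.enableCheckpointing(10_000);
        // hypothetical URI: NameNode host and RPC port, plus a plain directory path
        env.setStateBackend(new FsStateBackend("hdfs://aleksandar:8020/shared/checkpoints"));
        // ... build and execute the streaming job here
    }
}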

HBase connection in mapreduce running from Oozie workflow fails

I am running my MapReduce job as a java action from an Oozie workflow.
When I run my MapReduce job directly on my Hadoop cluster it runs successfully, but when I run the same jar from the Oozie workflow it throws an exception.
This is my workflow .xml
<workflow-app name="HBaseToFileDriver" xmlns="uri:oozie:workflow:0.1">
<start to="mapReduceAction"/>
<action name="mapReduceAction">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>oozie.libpath</name>
<value>${appPath}/lib</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>root.fricadev</value>
</property>
</configuration>
<main-class>com.thomsonretuers.hbase.HBaseToFileDriver</main-class>
<arg>fricadev:FinancialLineItem</arg>
<capture-output/>
</java>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
Below is the exception I see in the YARN logs.
Even though the job shows as succeeded, the output files are not generated.
Have you looked into the Oozie Java Action documentation?
IMPORTANT: In order for a Java action to succeed on a secure cluster, it must propagate the Hadoop delegation token, as in the following code snippet (this is benign on non-secure clusters):
// propagate delegation related props from launcher job to MR job
if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
jobConf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
}
You must read HADOOP_TOKEN_FILE_LOCATION from the system environment and set it on the mapreduce.job.credentials.binary property.
HADOOP_TOKEN_FILE_LOCATION is set by Oozie at runtime.
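For context, a minimal sketch of where that snippet typically sits in the driver. This is a hypothetical Tool-based skeleton; only the HADOOP_TOKEN_FILE_LOCATION lines come from the Oozie docs above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HBaseToFileDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration jobConf = getConf();
        // propagate delegation-related props from the launcher job to the MR job
        if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
            jobConf.set("mapreduce.job.credentials.binary",
                        System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
        }
        Job job = Job.getInstance(jobConf, "HBaseToFile");
        // ... configure mapper/reducer and input/output here
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new HBaseToFileDriver(), args));
    }
}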

Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

Error: Java heap space. Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The Hadoop cluster has 3 machines: one is the master and the others are datanodes; each machine has 8 GB of RAM.
the yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Hadoop1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
</configuration>
the mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.input.dir.recursive</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3072m</value>
</property>
</configuration>
When I run the MapReduce job, I get the error: Error: Java heap space. Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The input files total 500 MB and the number of reducers is 4. When the input files are less than 300 MB, the program runs fine.
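As a hedged illustration of the usual sizing rule (an assumption, not from the original post): the -Xmx in *.java.opts is commonly kept at roughly 80% of the matching container size, and every container must still fit within yarn.nodemanager.resource.memory-mb (4096 MB here). A per-job override sketch with hypothetical values:
import org.apache.hadoop.conf.Configuration;

public class MemorySettings {
    public static Configuration withHeapHeadroom() {
        Configuration conf = new Configuration();
        // hypothetical values: keep -Xmx at ~80% of the container size,
        // and keep each container within yarn.nodemanager.resource.memory-mb (4096 MB here)
        conf.set("mapreduce.map.memory.mb", "2048");
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.memory.mb", "4096");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
        return conf;
    }
}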

In WSO2 ESB, how to pass parameter to sequence from scheduler task

I configured a scheduler task to invoke a sequence, and I need to pass a parameter to this sequence per a requirement. How can I accomplish this in WSO2 ESB? I attempted to pass the value through the message property in the scheduler task and read it off the message in the sequence, but I failed to obtain the value in the sequence. My code and output are shown below. Please let me know what I should do to make it work. Thank you for your time in advance.
<task class="org.apache.synapse.startup.tasks.MessageInjector" group="synapse.simple.quartz" name="UploadFile2Vendor1" xmlns="http://ws.apache.org/ns/synapse">
<trigger interval="15"/>
<property name="message" xmlns:task="http://www.wso2.org/products/wso2commons/tasks">
<request>
<vendorId>1</vendorId>
</request>
</property>
<property name="sequenceName" value="SendFile2VendorSeq" xmlns:task="http://www.wso2.org/products/wso2commons/tasks"/>
<property name="injectTo" value="sequence" xmlns:task="http://www.wso2.org/products/wso2commons/tasks"/>
</task>
<sequence name="SendFile2VendorSeq" trace="disable" xmlns="http://ws.apache.org/ns/synapse">
<log>
<property xmlns:m0="http://services.samples"
expression="$body/m0:request/m0:vendorId" name="vendorId"/>
</log>
<dblookup description="get vendor">
<connection>
<pool>
<dsName>jdbc/DBDS</dsName>
</pool>
</connection>
<statement>
<sql>SELECT code, name FROM vendor WHERE id = ?</sql>
<parameter expression="get-property('vendorId')" type="INTEGER"/>
<result column="code" name="code"/>
<result column="name" name="name"/>
</statement>
</dblookup>
</sequence>
Output in log file
[2016-07-27 09:25:36,446] DEBUG - StartUpController Synapse server name : localhost
[2016-07-27 09:25:36,446] DEBUG - StartUpController loaded task property : <property xmlns="http://ws.apache.org/ns/synapse" xmlns:task="http://www.wso2.org/products/wso2commons/tasks" name="injectTo" value="sequence"/>
[2016-07-27 09:25:36,446] DEBUG - StartUpController loaded task property : <property xmlns="http://ws.apache.org/ns/synapse" xmlns:task="http://www.wso2.org/products/wso2commons/tasks" name="sequenceName" value="SendFile2VendorSeq"/>
[2016-07-27 09:25:36,446] DEBUG - StartUpController loaded task property : <property xmlns="http://ws.apache.org/ns/synapse" xmlns:task="http://www.wso2.org/products/wso2commons/tasks" name="message">
<request>
<vendorId>1</vendorId>
</request>
</property>
[2016-07-27 09:25:36,446] DEBUG - PropertyHelper Setting property :: invoking method setMessage(<request xmlns="http://ws.apache.org/ns/synapse">
<vendorId>1</vendorId>
</request>)
[2016-07-27 09:25:36,446] DEBUG - MessageInjector set message <request xmlns="http://ws.apache.org/ns/synapse">
<vendorId>1</vendorId>
</request>
[2016-07-27 09:25:36,446] DEBUG - TaskScheduler TaskScheduler already initialized.
[2016-07-27 09:25:36,532] INFO - AbstractQuartzTaskManager Task scheduled: [-1234][ESB_TASK][Upload2Vendor]
[2016-07-27 09:25:36,532] INFO - NTaskTaskManager Scheduled task [NTask::-1234::Upload2Vendor]
[2016-07-27 09:25:36,532] DEBUG - StartUpController Submitted task [Upload2Vendor] to Synapse task scheduler.
[2016-07-27 09:25:36,532] DEBUG - TaskDeployer Initialized the StartupTask : Upload2Vendor
[2016-07-27 09:25:36,532] DEBUG - TaskDeployer StartupTask Deployment from file : C:\MyApps\wso2esb-4.9.0\tmp\carbonapps\-1234\1469629535899ESBCDRCApp_1.0.0.car\UploadCDR2CDG_1.0.0\Upload2Vendor-1.0.0.xml : Completed
[2016-07-27 09:25:36,532] INFO - TaskDeployer StartupTask named 'Upload2Vendor' has been deployed from file : C:\MyApps\wso2esb-4.9.0\tmp\carbonapps\-1234\1469629535899ESBCDRCApp_1.0.0.car\UploadCDR2CDG_1.0.0\Upload2Vendor-1.0.0.xml
[2016-07-27 09:25:36,532] DEBUG - SynapseArtifactDeploymentStore Added deployment artifact with file : C:\MyApps\wso2esb-4.9.0\tmp\carbonapps\-1234\1469629535899ESBCDRCApp_1.0.0.car\UploadCDR2CDG_1.0.0\Upload2Vendor-1.0.0.xml
[2016-07-27 09:25:36,532] DEBUG - AbstractSynapseArtifactDeployer Deployment of the synapse artifact from file : C:\MyApps\wso2esb-4.9.0\tmp\carbonapps\-1234\1469629535899ESBCDRCApp_1.0.0.car\UploadCDR2CDG_1.0.0\Upload2Vendor-1.0.0.xml : COMPLETED
[2016-07-27 09:25:36,532] INFO - ApplicationManager Successfully Deployed Carbon Application : ESBCDRCApp_1.0.0 {super-tenant}
[2016-07-27 09:25:36,532] DEBUG - MessageInjector execute
[2016-07-27 09:25:36,532] DEBUG - Axis2SynapseEnvironment Creating Message Context
[2016-07-27 09:25:36,542] DEBUG - MessageInjector injecting message to sequence : SendFile2VendorSeq
[2016-07-27 09:25:36,542] DEBUG - Axis2SynapseEnvironment Injecting MessageContext for asynchronous mediation using the : SendFile2VendorSeq Sequence
[2016-07-27 09:25:36,552] DEBUG - SequenceMediator Start : Sequence <SendFile2VendorSeq>
[2016-07-27 09:25:36,552] DEBUG - SequenceMediator Sequence <SequenceMediator> :: mediate()
[2016-07-27 09:25:36,552] DEBUG - SequenceMediator Mediation started from mediator position : 0
[2016-07-27 09:25:36,552] DEBUG - SequenceMediator Building message. Sequence <SequenceMediator> is content aware
[2016-07-27 09:25:36,552] DEBUG - LogMediator Start : Log mediator
[2016-07-27 09:25:36,552] INFO - LogMediator To: , MessageID: urn:uuid:c620513a-5b90-452e-8133-d5fd23e2cce0, Direction: request, vendorId =
[2016-07-27 09:25:36,552] DEBUG - LogMediator End : Log mediator
[2016-07-27 09:25:36,552] DEBUG - DBLookupMediator Start : DBLookup mediator
[2016-07-27 09:25:36,582] DEBUG - DBLookupMediator Getting a connection from DataSource jdbc/CallOneCDRDB and preparing statement :
SELECT SELECT code, name FROM vendor WHERE id = ?
[2016-07-27 09:25:36,662] DEBUG - DBLookupMediator Setting as parameter : 1 value : null as JDBC Type : 4(see java.sql.Types for valid types)
[2016-07-27 09:25:36,662] DEBUG - DBLookupMediator Successfully prepared statement :
SELECT code, name FROM vendor WHERE id = ?
One option you can try is to use org.apache.synapse.startup.tasks.TemplateMessageExecutor as the task class. This class exposes two parameters:
1. templateParams
You can set the parameters here as XML under some root element.
Example:
<root>
<user>John</user>
<age>10</age>
</root>
2. templateKey
You can set the key of the sequence template that uses the above parameters here.
Example: gov:/sequenceTemplates/getUserSequenceTemplate
Sample Sequence Template
<template name="getUserSequenceTemplate" xmlns="http://ws.apache.org/ns/synapse">
<parameter name="user"/>
<parameter name="age"/>
<sequence>
<log level = "full">
<property name="User name is" expression={$func:user} />
<property name="User age is" expression={$func:age} />
</log>
</sequence>
</template>
There seems to be an issue with the namespace. I have updated the task and sequence as follows; this worked for me.
Task
<task class="org.apache.synapse.startup.tasks.MessageInjector" group="synapse.simple.quartz" name="UploadFile2Vendor1" xmlns="http://ws.apache.org/ns/synapse">
<trigger interval="15"/>
<property name="message" xmlns:task="http://www.wso2.org/products/wso2commons/tasks">
<request xmlns="" xmlns:m0="http://services.samples">
<vendorId>1</vendorId>
</request>
</property>
<property name="sequenceName" value="SendFile2VendorSeq" xmlns:task="http://www.wso2.org/products/wso2commons/tasks"/>
<property name="injectTo" value="sequence" xmlns:task="http://www.wso2.org/products/wso2commons/tasks"/>
</task>
Sequence
<?xml version="1.0" encoding="UTF-8"?>
<sequence xmlns="http://ws.apache.org/ns/synapse"
name="SendFile2VendorSeq"
trace="disable">
<log level="full"/>
<log>
<property xmlns:m0="http://services.samples"
name="vendorId"
expression="//vendorId"/>
</log>
</sequence>

How to schedule Hbase Map-Reduce Job by oozie?

I want to schedule an HBase MapReduce job with Oozie. I am facing the following problem:
how/where do I specify these properties in the Oozie workflow?
(i) the table name for the mapper/reducer
(ii) the scan object for the mapper
Scan scan = new Scan();
scan.setMaxVersions();
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(DATE));
Job job = new Job(conf, JOB_NAME + "_" + TABLE_USER);
// These two properties :-
TableMapReduceUtil.initTableMapperJob(TABLE_USER, scan,
Mapper.class, Text.class, Text.class, job);
TableMapReduceUtil.initTableReducerJob(DETAILS_TABLE,
Reducer.class, job);
Or please let me know the best way to schedule an HBase MapReduce job with Oozie.
Thanks :) :)
The best way (according to me) to schedule an HBase MapReduce job is to schedule it as a .java file.
It works well and there is no need to write code to convert your scan to a string, etc.
So I am scheduling my jobs as java files till I get a better option.
<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker></job-tracker>
<name-node></name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>org.apache.oozie.example.DemoJavaMain</main-class>
<arg>Hello</arg>
<arg>Oozie!</arg>
<arg>This</arg>
<arg>is</arg>
<arg>Demo</arg>
<arg>Oozie!</arg>
</java>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
You can also schedule the job using the <map-reduce> tag, but it is not as easy as scheduling it as a java file. It requires considerable effort, but it can be considered an alternative approach.
<action name='jobSample'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<!-- This is required for new api usage -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- HBASE CONFIGURATIONS -->
<property>
<name>hbase.mapreduce.inputtable</name>
<value>TABLE_USER</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>${wf:actionData('get-scanner')['scan']}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>${hbaseZookeeperClientPort}</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>${hbaseZookeeperQuorum}</value>
</property>
<!-- MAPPER CONFIGURATIONS -->
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>com.hbase.mapper.MyTableMapper</value>
</property>
<!-- REDUCER CONFIGURATIONS -->
<property>
<name>mapreduce.reduce.class</name>
<value>com.hbase.reducer.MyTableReducer</value>
</property>
<property>
<name>hbase.mapred.outputtable</name>
<value>DETAILS_TABLE</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableOutputFormat</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>${mapperCount}</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>${reducerCount}</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</map-reduce>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Map/Reduce failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
To learn more about the property names and values, dump the configuration parameters.
Also, the scan property is a serialization of the scan information (a Base64-encoded version), so I am not sure how to specify this:
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));
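For what it's worth, a sketch of how that Base64 value can be produced, assuming your HBase version exposes TableMapReduceUtil.convertScanToString as public (recent releases do; it is what initTableMapperJob uses internally to populate hbase.mapreduce.scan). The family/qualifier names are placeholders:
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanSerializer {
    public static void main(String[] args) throws Exception {
        Scan scan = new Scan();
        scan.setMaxVersions();
        // placeholder family/qualifier; use your FAMILY and VALUE constants
        scan.addColumn(Bytes.toBytes("FAMILY"), Bytes.toBytes("VALUE"));
        // prints the Base64 string to plug into the hbase.mapreduce.scan property
        System.out.println(TableMapReduceUtil.convertScanToString(scan));
    }
}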