HBase connection in mapreduce running from Oozie workflow fails - mapreduce

I am running my mapreduce job as java action from Oozie workflow .
When i run my mapreduce in my hadoop cluster it runs successfully,but when i run use same jar from Oozie workflow it throw be
This is my workflow .xml
<workflow-app name="HBaseToFileDriver" xmlns="uri:oozie:workflow:0.1">
<start to="mapReduceAction"/>
<action name="mapReduceAction">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>oozie.libpath</name>
<value>${appPath}/lib</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>root.fricadev</value>
</property>
</configuration>
<main-class>com.thomsonretuers.hbase.HBaseToFileDriver</main-class>
<arg>fricadev:FinancialLineItem</arg>
<capture-output/>
</java>
<ok to="end"/>
<error to="killJob"/>
</action>
<kill name="killJob">
<message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
</kill>
<end name="end" />
</workflow-app>
Below is my exception when i see the logs in the YARN .
even though is showing as succeeded but output files are not getting generated .

Have you look into Oozie Java Action
IMPORTANT: In order for a Java action to succeed on a secure cluster, it must propagate the Hadoop delegation token like in the following code snippet (this is benign on non-secure clusters):
// propagate delegation related props from launcher job to MR job
if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
jobConf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
}
You must get HADOOP_TOKEN_FILE_LOCATION from system env variable and set to the property mapreduce.job.credentials.binary.
HADOOP_TOKEN_FILE_LOCATION is set by oozie at runtime.

Related

How to configure File Based SMS OTP Identity Provider

I need to configure file based SMS OTP identity provider. I couldn't find relevant xml syntax in the docs.
I tried following syntax.
<FederatedAuthenticatorConfigs>
<smsotp>
<Name>SMSOTPAuthenticator</Name>
<DisplayName>smsotp</DisplayName>
<IsEnabled>true</IsEnabled>
<Properties>
<property>
<Name>SMSUrl</Name>
<Value>url</Value>
</property>
<property>
<Name>HTTPMethod</Name>
<Value>POST</Value>
</property>
</Properties>
</smsotp>
</FederatedAuthenticatorConfigs>
However, it shows following error.
ERROR {org.wso2.carbon.identity.application.authentication.framework.handler.step.impl.DefaultStepHandler} - SMS URL is null
org.wso2.carbon.identity.application.authentication.framework.exception.AuthenticationFailedException: SMS URL is null

Stackdriver Logging | Missing TraceId and SpanId

I am trying to integrate SpringBootApplication(microservice) with StackdriverLogging by using spring-cloud-gcp-starter-logging. I am able to see the logs in GCP but in the logs traceId and SpanId is missing.For this I tried to use Spring-cloud-sleuth also, but as I am using apache kafka in my microservice sleuth is not working properly.
Can anyone help me how Can I add traceId and SpanId information in logs??
POM.xml Configuration :
<spring-cloud.version>Hoxton.SR1</spring-cloud.version>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${spring-cloud.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-gcp-starter-logging</artifactId>
</dependency>
logback.xml :
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<include resource="org/springframework/cloud/gcp/autoconfigure/logging/logback-json-appender.xml" />
<appender name="CONSOLE_JSON" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="org.springframework.cloud.gcp.logging.StackdriverJsonLayout">
<includeTraceId>true</includeTraceId>
<includeSpanId>true</includeSpanId>
<includeThreadName>false</includeThreadName>
</layout>
</encoder>
</appender>
<logger name="org.apache.kafka" level="warn">
<appender-ref ref="CONSOLE_JSON" />
</logger>
<root level="INFO">
<appender-ref ref="CONSOLE_JSON" />
</root>
</configuration>
By default, Sleuth doesn’t send the trace data for every single request. You probably don’t want to trace every single request either. The trace sampling rate can be adjusted by configuring the spring.sleuth.sampler.percentage property. For more details please refer to "Spring Boot and Spring Cloud Sleuth" in this link and have a look to this video.

Checkpointing in flink

Trying to use checkpointing mechanism in flink with fs HDFS.
While connecting with hdfs://aleksandar/0.0.0.0:50010/shared/ i get the following error
Caused by: java.lang.IllegalArgumentException: Pathname /0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 from hdfs://aleksandar/0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 is not a valid DFS filename.
In the core-site settings i have the following configuration
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:123</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>

Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The hadoop cluster has 3 machines, one of them is the master,others are datanode,the machine's RAM is 8G.
the yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Hadoop1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
</configuration>
the mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.input.dir.recursive</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx3072m</value>
</property>
</configuration>``
when I run the mapreduce,get the error:Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The input files are 500M,the reduce number is 4. when the input files less then 300M, the program can run good.

How to schedule Hbase Map-Reduce Job by oozie?

I want to schedule a Hbase Map-Reduce job by Oozie.I am facing following problem .
How/Where to specify these properties in oozie workflow ?
( i> Table name for Mapper/Reducer
ii> scan object for Mapper )
Scan scan = new Scan(new Get());
scan.setMaxVersions();
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(DATE));
Job job = new Job(conf, JOB_NAME + "_" + TABLE_USER);
// These two properties :-
TableMapReduceUtil.initTableMapperJob(TABLE_USER, scan,
Mapper.class, Text.class, Text.class, job);
TableMapReduceUtil.initTableReducerJob(DETAILS_TABLE,
Reducer.class, job);
or
please let me know the best way to schedule a Hbase Map-Reduce Job by Oozie .
Thanks :) :)
The best way(According to me ) to schedule a Hbase Map_Reduce job is to schedule it as a .java file .
It works well and there is no need to write code to change your scan to string , etc.
So i am scheduling my jobs like java file till i get any better option .
workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker></job-tracker>
<name-node></name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>org.apache.oozie.example.DemoJavaMain</main-class>
<arg>Hello</arg>
<arg>Oozie!</arg>
<arg>This</arg>
<arg>is</arg>
<arg>Demo</arg>
<arg>Oozie!</arg>
</java>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
You can also schedule the job using <Map-reduce> tag , but it is not as easy as scheduling it as java file. It requires a considerable effort, but can be considered as an alternate approach.
<action name='jobSample'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<!-- This is required for new api usage -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- HBASE CONFIGURATIONS -->
<property>
<name>hbase.mapreduce.inputtable</name>
<value>TABLE_USER</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>${wf:actionData('get-scanner')['scan']}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>${hbaseZookeeperClientPort}</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>${hbaseZookeeperQuorum}</value>
</property>
<!-- MAPPER CONFIGURATIONS -->
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>com.hbase.mapper.MyTableMapper</value>
</property>
<!-- REDUCER CONFIGURATIONS -->
<property>
<name>mapreduce.reduce.class</name>
<value>com.hbase.reducer.MyTableReducer</value>
</property>
<property>
<name>hbase.mapred.outputtable</name>
<value>DETAILS_TABLE</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableOutputFormat</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>${mapperCount}</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>${reducerCount}</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</map-reduce>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Map/Reduce failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
To know more about the property name and value , dump the configration parameter.
Also, the scan property is some serialization of the scan information (a Base 64 encoded version) so not sure how to specify this -
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));