I am trying to use Flink's checkpointing mechanism with HDFS as the filesystem.
When connecting with hdfs://aleksandar/0.0.0.0:50010/shared/ I get the following error:
Caused by: java.lang.IllegalArgumentException: Pathname /0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 from hdfs://aleksandar/0.0.0.0:50010/shared/972dde22148f58ec9f266fb7bdfae891 is not a valid DFS filename.
In core-site.xml I have the following configuration:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/lib/hadoop</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:123</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
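For reference, the path component of an HDFS URI cannot contain a colon, which is why 0.0.0.0:50010 (a DataNode address) embedded in the path is rejected as "not a valid DFS filename"; the checkpoint URI should carry only the NameNode as its authority. A minimal sketch of pointing Flink's checkpoint state backend at such a path (assuming aleksandar resolves to the NameNode host and that port 123 matches fs.defaultFS above; FsStateBackend is one way to configure this):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // checkpoint every 10 seconds
        // Authority is the NameNode only; no host:port inside the path.
        env.setStateBackend(new FsStateBackend("hdfs://aleksandar:123/shared/"));
        // ... build the job topology here, then execute as usual ...
        env.execute("checkpoint-sketch");
    }
}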
Related
I need to configure a file-based SMS OTP identity provider. I couldn't find the relevant XML syntax in the docs.
I tried the following syntax:
<FederatedAuthenticatorConfigs>
    <smsotp>
        <Name>SMSOTPAuthenticator</Name>
        <DisplayName>smsotp</DisplayName>
        <IsEnabled>true</IsEnabled>
        <Properties>
            <property>
                <Name>SMSUrl</Name>
                <Value>url</Value>
            </property>
            <property>
                <Name>HTTPMethod</Name>
                <Value>POST</Value>
            </property>
        </Properties>
    </smsotp>
</FederatedAuthenticatorConfigs>
However, it shows the following error:
ERROR {org.wso2.carbon.identity.application.authentication.framework.handler.step.impl.DefaultStepHandler} - SMS URL is null
org.wso2.carbon.identity.application.authentication.framework.exception.AuthenticationFailedException: SMS URL is null
I am running my MapReduce job as a Java action from an Oozie workflow.
When I run my MapReduce job directly on my Hadoop cluster it runs successfully, but when I use the same jar from the Oozie workflow it throws an error.
This is my workflow.xml:
<workflow-app name="HBaseToFileDriver" xmlns="uri:oozie:workflow:0.1">
    <start to="mapReduceAction"/>
    <action name="mapReduceAction">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.libpath</name>
                    <value>${appPath}/lib</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>root.fricadev</value>
                </property>
            </configuration>
            <main-class>com.thomsonretuers.hbase.HBaseToFileDriver</main-class>
            <arg>fricadev:FinancialLineItem</arg>
            <capture-output/>
        </java>
        <ok to="end"/>
        <error to="killJob"/>
    </action>
    <kill name="killJob">
        <message>"Killed job due to error: ${wf:errorMessage(wf:lastErrorNode())}"</message>
    </kill>
    <end name="end"/>
</workflow-app>
Below is the exception I see in the YARN logs.
Even though the job shows as succeeded, the output files are not being generated.
Have you looked into the Oozie Java action documentation?
IMPORTANT: In order for a Java action to succeed on a secure cluster, it must propagate the Hadoop delegation token like in the following code snippet (this is benign on non-secure clusters):
// propagate delegation related props from launcher job to MR job
if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
    jobConf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
}
You must read HADOOP_TOKEN_FILE_LOCATION from the system environment and set it on the mapreduce.job.credentials.binary property.
HADOOP_TOKEN_FILE_LOCATION is set by Oozie at runtime.
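Putting this together, a minimal sketch of how the driver's entry point might look (the class name matches the workflow above; everything else is illustrative). Note the blocking waitForCompletion: if a driver instead submits asynchronously and returns, the launcher can exit 0 and the action can show as succeeded before any output is produced, which would match the symptom above. That is an assumption, since the driver code is not shown:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HBaseToFileDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Propagate the delegation token from the Oozie launcher to the
        // MR job; this is benign on non-secure clusters.
        if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
            conf.set("mapreduce.job.credentials.binary",
                     System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
        }
        Job job = Job.getInstance(conf, "HBaseToFileDriver");
        // ... set mapper, input/output formats and paths as usual ...
        // Block until the job finishes so the launcher's exit status
        // reflects the real job outcome, not just a successful submit.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}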
Error: Java heap space Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The Hadoop cluster has 3 machines: one is the master, the others are DataNodes, and each machine has 8 GB of RAM.
The yarn-site.xml:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Hadoop1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether virtual memory limits will be enforced for containers</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4</value>
        <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
    </property>
</configuration>
The mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/user</value>
    </property>
    <property>
        <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024m</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx3072m</value>
    </property>
</configuration>
When I run the MapReduce job, I get the error: Error: Java heap space. Container killed by the ApplicationMaster. Container killed on request. Exit code is 143.
The input files are 500 MB and the number of reducers is 4. When the input files are less than 300 MB, the program runs fine.
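For reference, the heap flags above cap the JVM well below the container sizes: mapreduce.map.memory.mb=2048 with -Xmx1024m gives each map task only 1 GB of heap regardless of its container size, which is consistent with a Java heap space failure on larger inputs. A hedged sketch of overriding these per job from the driver instead of cluster-wide (the values are illustrative assumptions, following the common rule of thumb of -Xmx at roughly 80% of the container size):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HeapTuningSketch {
    public static void main(String[] args) throws Exception {
        // Hedged sketch: per-job memory overrides; values are illustrative
        // and must fit within yarn.nodemanager.resource.memory-mb.
        Configuration conf = new Configuration();
        conf.set("mapreduce.map.memory.mb", "3072");      // container size
        conf.set("mapreduce.map.java.opts", "-Xmx2457m"); // ~80% of container
        conf.set("mapreduce.reduce.memory.mb", "4096");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
        Job job = Job.getInstance(conf, "heap-tuning-sketch");
        // ... configure mapper/reducer and paths, then submit as usual ...
    }
}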
I have a map-only job which takes a sequence file (key is Text, value is BytesWritable) as input and outputs data into a sequence file (key is NullWritable, value is Text).
Java class:
import java.io.*;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class Test {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Test");
        job.setJarByClass(Test.class);
        job.setMapperClass(TestMapper.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0); // map-only job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.submit(); // submits asynchronously; does not wait for completion
    }

    public static class TestMapper extends Mapper<Text, BytesWritable, NullWritable, Text> {
        Text outValue = new Text("");
        int counter = 0;

        public void map(Text filename, BytesWritable data, Context context) throws IOException, InterruptedException {
            // logic
        }
    }
}
It works fine when running the job from the Unix command line, but when the same job is scheduled in Oozie I see the error below:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at Test$TestMapper.map(Test.java:56)
Job configuration in Oozie:
<configuration>
    <property>
        <name>mapred.input.dir</name>
        <value>${input}</value>
    </property>
    <property>
        <name>mapred.output.dir</name>
        <value>/temp</value>
    </property>
    <property>
        <name>mapreduce.map.class</name>
        <value>Test$TestMapper</value>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>0</value>
    </property>
    <property>
        <name>mapreduce.job.output.key.class</name>
        <value>org.apache.hadoop.io.NullWritable</value>
    </property>
    <property>
        <name>mapreduce.job.output.value.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapreduce.job.inputformat.class</name>
        <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
    </property>
    <property>
        <name>mapreduce.job.outputformat.class</name>
        <value>org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat</value>
    </property>
    <property>
        <name>mapreduce.job.mapinput.key.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapreduce.job.mapinput.value.class</name>
        <value>org.apache.hadoop.io.BytesWritable</value>
    </property>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
</configuration>
Can someone tell me what the error is here? Thank you.
The ClassCastException indicates that Oozie is still using the default input format, TextInputFormat, which has a key type of LongWritable. Since the mapper has a key type of Text, there is a type mismatch when records are fed into the mapper. So the config key mapreduce.job.inputformat.class was incorrect.
(after some trial and error)
We found that the correct property name is mapreduce.inputformat.class, i.e.:
<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
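For comparison, this is the same selection the Test driver above makes programmatically; when the Oozie property name is wrong, this setting never takes effect and the job silently falls back to the default:

// Without this taking effect, Hadoop defaults to TextInputFormat, which
// emits (LongWritable offset, Text line) records; the framework then hands
// the mapper a LongWritable key where it declared Text, producing the
// ClassCastException seen above.
job.setInputFormatClass(SequenceFileInputFormat.class);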
I want to schedule an HBase MapReduce job with Oozie. I am facing the following problem:
How/where do I specify these properties in the Oozie workflow?
i. the table name for the Mapper/Reducer
ii. the Scan object for the Mapper
Scan scan = new Scan(new Get());
scan.setMaxVersions();
scan.addColumn(Bytes.toBytes(FAMILY), Bytes.toBytes(VALUE));
scan.addColumn(Bytes.toBytes(FAMILY), Bytes.toBytes(DATE));

Job job = new Job(conf, JOB_NAME + "_" + TABLE_USER);
// These two properties:
TableMapReduceUtil.initTableMapperJob(TABLE_USER, scan, Mapper.class, Text.class, Text.class, job);
TableMapReduceUtil.initTableReducerJob(DETAILS_TABLE, Reducer.class, job);
or
Please let me know the best way to schedule an HBase MapReduce job with Oozie.
Thanks :) :)
The best way (in my opinion) to schedule an HBase MapReduce job is to schedule it as a Java action.
It works well, and there is no need to write code to convert your Scan to a string, etc.
So I am scheduling my jobs as Java actions until I find a better option.
<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker></job-tracker>
            <name-node></name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>org.apache.oozie.example.DemoJavaMain</main-class>
            <arg>Hello</arg>
            <arg>Oozie!</arg>
            <arg>This</arg>
            <arg>is</arg>
            <arg>Demo</arg>
            <arg>Oozie!</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
You can also schedule the job using the <map-reduce> tag, but it is not as easy as scheduling it as a Java action. It requires considerable effort, but can be considered an alternative approach.
<action name="jobSample">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- This is required for new api usage -->
            <property>
                <name>mapred.mapper.new-api</name>
                <value>true</value>
            </property>
            <property>
                <name>mapred.reducer.new-api</name>
                <value>true</value>
            </property>
            <!-- HBASE CONFIGURATIONS -->
            <property>
                <name>hbase.mapreduce.inputtable</name>
                <value>TABLE_USER</value>
            </property>
            <property>
                <name>hbase.mapreduce.scan</name>
                <value>${wf:actionData('get-scanner')['scan']}</value>
            </property>
            <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>${hbaseZookeeperClientPort}</value>
            </property>
            <property>
                <name>hbase.zookeeper.quorum</name>
                <value>${hbaseZookeeperQuorum}</value>
            </property>
            <!-- MAPPER CONFIGURATIONS -->
            <property>
                <name>mapreduce.inputformat.class</name>
                <value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
            </property>
            <property>
                <name>mapred.mapoutput.key.class</name>
                <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
                <name>mapred.mapoutput.value.class</name>
                <value>org.apache.hadoop.io.Text</value>
            </property>
            <property>
                <name>mapreduce.map.class</name>
                <value>com.hbase.mapper.MyTableMapper</value>
            </property>
            <!-- REDUCER CONFIGURATIONS -->
            <property>
                <name>mapreduce.reduce.class</name>
                <value>com.hbase.reducer.MyTableReducer</value>
            </property>
            <property>
                <name>hbase.mapred.outputtable</name>
                <value>DETAILS_TABLE</value>
            </property>
            <property>
                <name>mapreduce.outputformat.class</name>
                <value>org.apache.hadoop.hbase.mapreduce.TableOutputFormat</value>
            </property>
            <property>
                <name>mapred.map.tasks</name>
                <value>${mapperCount}</value>
            </property>
            <property>
                <name>mapred.reduce.tasks</name>
                <value>${reducerCount}</value>
            </property>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>
<kill name="fail">
    <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
To learn the exact property names and values, dump the job's configuration parameters, as in the sketch below.
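A minimal sketch of one way to do that from the driver (this assumes access to the Job object; Configuration is iterable over its entries):

// Print every resolved configuration entry the job will run with, so the
// exact property names and values can be copied into the Oozie action.
for (java.util.Map.Entry<String, String> entry : job.getConfiguration()) {
    System.out.println(entry.getKey() + "=" + entry.getValue());
}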
Also, the scan property is a serialized (Base64-encoded) version of the Scan information, so it is not obvious how to specify something like the following directly:
scan.addColumn(Bytes.toBytes(FAMILY), Bytes.toBytes(VALUE));
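That Base64 string is what the ${wf:actionData('get-scanner')['scan']} reference in the workflow above expects. A hedged sketch of a 'get-scanner' Java action producing it via <capture-output/>, assuming an HBase version (e.g. 0.94.x) where TableMapReduceUtil.convertScanToString is accessible:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Properties;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

public class GetScanner {
    // Placeholder column coordinates; substitute the real FAMILY/VALUE/DATE.
    private static final String FAMILY = "family";
    private static final String VALUE = "value";
    private static final String DATE = "date";

    public static void main(String[] args) throws Exception {
        Scan scan = new Scan();
        scan.setMaxVersions();
        scan.addColumn(Bytes.toBytes(FAMILY), Bytes.toBytes(VALUE));
        scan.addColumn(Bytes.toBytes(FAMILY), Bytes.toBytes(DATE));

        // Serialize the scan the same way TableMapReduceUtil does when it
        // sets hbase.mapreduce.scan internally.
        Properties props = new Properties();
        props.setProperty("scan", TableMapReduceUtil.convertScanToString(scan));

        // With <capture-output/> in the action definition, Oozie reads this
        // properties file and exposes its values via wf:actionData(...).
        File out = new File(System.getProperty("oozie.action.output.properties"));
        try (OutputStream os = new FileOutputStream(out)) {
            props.store(os, null);
        }
    }
}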