I'm trying to parse the following TSV data into a nested object, but my "title" field is always null within the nested MetaData class.
I've included the method that converts the TSV data to the object at the bottom.
value1 | metaData1 | valueA |
value2 | metaData2 | valueB |
value3 | metaData3 | valueC |
public class Data {
    @Parsed(index = 0)
    private String value0;
    @Parsed(index = 1)
    private String foo;
    @Nested
    MetaData metaData;
    public static class MetaData {
        @Parsed(index = 1)
        private String title;
    }
}
public <T> List<T> convertFileToData(File file, Class<T> clazz, boolean removeHeader) {
    BeanListProcessor<T> rowProcessor = new BeanListProcessor<>(clazz);
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setDelimiter('|');
    settings.setProcessor(rowProcessor);
    settings.setHeaderExtractionEnabled(removeHeader);
    CsvParser parser = new CsvParser(settings);
    parser.parseAll(file);
    return rowProcessor.getBeans();
}
You forgot to define an index on your MetaData.title:
public static class MetaData {
    @Parsed(index = 1)
    private String title;
}
Also, you are setting the delimiter to \t while your input uses | as the separator.
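To see why a wrong delimiter leaves a field null, here is a plain-Java sketch (standard library only, not univocity-parsers code) that splits the first sample row and reads column index 1. With a tab delimiter the row has a single column, so index 1 simply does not exist; with | it holds metaData1. The class and method names are hypothetical, and the sketch trims values only to make the output readable.

```java
import java.util.regex.Pattern;

// Plain-Java illustration, not univocity-parsers code: shows what column
// index 1 contains when a row is split with the wrong and the right delimiter.
final class DelimiterDemo {

    // Returns the trimmed value at the given column, or null when the row has
    // fewer columns (roughly how a missing column leaves a bean field null).
    static String columnAt(String row, char delimiter, int index) {
        // Pattern.quote makes '|' literal; it is a regex metacharacter otherwise.
        String[] columns = row.split(Pattern.quote(String.valueOf(delimiter)), -1);
        return index < columns.length ? columns[index].trim() : null;
    }

    public static void main(String[] args) {
        String row = "value1 | metaData1 | valueA |";
        System.out.println(columnAt(row, '\t', 1)); // null: no tab, so only one column
        System.out.println(columnAt(row, '|', 1));  // metaData1
    }
}
```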
I'd like to parse column zero of a CSV file into a particular data type, in this example a Date object.
The method below is what I currently use to parse a CSV file, but I don't know how to incorporate this requirement.
import java.sql.Date;
public class Data {
    @Parsed(index = 0)
    private Date date;
}
public <T> List<T> convertFileToData(File file, Class<T> clazz) {
    BeanListProcessor<T> rowProcessor = new BeanListProcessor<>(clazz);
    CsvParserSettings settings = new CsvParserSettings();
    settings.setProcessor(rowProcessor);
    settings.setHeaderExtractionEnabled(true);
    CsvParser parser = new CsvParser(settings);
    parser.parseAll(file);
    return rowProcessor.getBeans();
}
All you need to do is define the format(s) of your date, and you are set:
public class Data {
    @Format(formats = {"dd-MMM-yyyy", "yyyy-MM-dd"})
    @Parsed(index = 0)
    private Date date;
}
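The idea behind listing several formats is that each pattern is tried until one matches. The sketch below approximates that with java.time in plain Java; it is not univocity's implementation, and the class name is hypothetical. Locale.ENGLISH is an assumption so that "MMM" matches English month abbreviations such as "Mar".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java approximation of trying several date patterns in order.
// This is NOT univocity-parsers code; it only illustrates the idea.
final class MultiFormatDate {

    // Patterns taken from the @Format example above.
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ENGLISH));

    // Returns the first successful parse as java.sql.Date, or throws if none match.
    static java.sql.Date parse(String text) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return java.sql.Date.valueOf(LocalDate.parse(text.trim(), format));
            } catch (DateTimeParseException ignored) {
                // This pattern did not match; try the next one.
            }
        }
        throw new IllegalArgumentException("Unparseable date: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("05-Mar-2021")); // 2021-03-05
        System.out.println(parse("2021-03-05"));  // 2021-03-05
    }
}
```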
As an extra suggestion, you can also replace much of your code with the CsvRoutines class. Try this:
List<T> beanList = new CsvRoutines(settings).parseAll(clazz, file);
Hope it helps.
I am trying to split a string using MapReduce 2 (YARN) in the Hortonworks Sandbox.
It throws an ArrayIndexOutOfBoundsException if I try to access val[1]; it works fine when I don't split the input.
Mapper:
public class MapperClass extends Mapper<Object, Text, Text, Text> {
    private Text airline_id;
    private Text name;
    private Text country;
    private Text value1;
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String s = value.toString();
        if (s.length() > 1) {
            String val[] = s.split(",");
            context.write(new Text("blah"), new Text(val[1]));
        }
    }
}
Reducer:
public class ReducerClass extends Reducer<Text, Text, Text, Text> {
    private Text result = new Text();
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String airports = "";
        if (key.equals("India")) {
            for (Text val : values) {
                airports += "\t" + val.toString();
            }
            result.set(airports);
            context.write(key, result);
        }
    }
}
MainClass:
public class MainClass {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "Flights MR");
        job.setJarByClass(MainClass.class);
        job.setMapperClass(MapperClass.class);
        job.setReducerClass(ReducerClass.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Can you help?
Update:
Figured out that it doesn't convert Text to String.
If the string you are splitting does not contain a comma, the resulting String[] will have length 1, with the entire string at val[0].
Currently, you are only checking that the string is longer than one character:
if (s.length() > 1)
But you are not checking that the split actually produced an array with more than one element; you simply assume there was a split:
context.write(new Text("blah"), new Text(val[1]));
If there was no split, this causes an ArrayIndexOutOfBoundsException. A possible solution is to make sure the string contains at least one comma, instead of checking its length, like so:
String s = value.toString();
if (s.indexOf(',') > -1) {
    // A limit of -1 keeps trailing empty fields, so "abc," still gives val.length == 2
    String[] val = s.split(",", -1);
    context.write(new Text("blah"), new Text(val[1]));
}
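The standard-library snippet below shows the String.split behaviour behind this answer: without a comma the whole string lands in element 0, and with the default limit a trailing empty field is dropped, which is why the -1 limit is used above. The helper name secondField and the sample line "AI101,Air India" are hypothetical.

```java
// Standard-library demonstration of the String.split edge cases behind val[1].
final class SplitDemo {

    // Returns the second comma-separated field, or null when there is none.
    // The -1 limit keeps trailing empty strings, so "abc," yields ["abc", ""].
    static String secondField(String line) {
        String[] parts = line.split(",", -1);
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println("abc".split(",").length);      // 1: no comma, whole string in [0]
        System.out.println("abc,".split(",").length);     // 1: trailing empty string removed
        System.out.println("abc,".split(",", -1).length); // 2: trailing empty string kept
        System.out.println(secondField("AI101,Air India")); // Air India
    }
}
```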
In my mapper class I want to do a small manipulation on a string read from a file (as a line) and then send it to the reducer to get a string count. The manipulation is to replace empty strings with 0. (The current replace and join part is failing my Hadoop job.)
Here is my code:
import java.io.BufferedReader;
import java.io.IOException;
.....
public class PartNumberMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static Text partString = new Text("");
    private final static IntWritable count = new IntWritable(1);
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // Read line by line to bufferreader and output the (line,count) pair
        BufferedReader bufReader = new BufferedReader(new StringReader(line));
        String l = null;
        while ((l = bufReader.readLine()) != null) {
            /**** This part is the problem ****/
            String a[] = l.split(",");
            if (a[1] == "") { // if a[1] i.e. second string is "" then set it to "0"
                a[1] = "0";
                l = StringUtils.join(",", a); // join the string array to form a string
            }
            /**** problematic part ends ****/
            partString.set(l);
            output.collect(partString, count);
        }
    }
}
After this runs, the mapper just fails without reporting any errors.
[The code is run with yarn]
I am not sure what I am doing wrong; the same code worked without the string join part.
Could any of you explain what is wrong with the string replace/concat? Is there a better way to do it?
Here's a modified version of your Mapper class with a few changes:
Remove the BufferedReader: it is redundant, since each call to map already receives a single line, and it is never closed
Compare strings with .equals(), not ==
Declare a String array as String[] a, not String a[]
Resulting in the following code:
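public class PartNumberMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private Text partString = new Text();
    private final static IntWritable count = new IntWritable(1);
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // A limit of -1 keeps trailing empty fields, so "P100," yields ["P100", ""]
        String[] a = line.split(",", -1);
        // Guard against lines without a comma, which would otherwise throw on a[1]
        if (a.length > 1 && a[1].equals("")) {
            a[1] = "0";
            // String.join (Java 8+) avoids depending on which StringUtils.join overload is imported
            line = String.join(",", a);
        }
        partString.set(line);
        output.collect(partString, count);
    }
}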
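One way to check the string manipulation without running a Hadoop job is to pull it out into a pure function. The sketch below is a hypothetical helper (zeroFillSecondField is not from the question) that applies the same split, replace and join steps as the mapper above, so the edge cases can be exercised locally.

```java
// Hypothetical helper: the PartNumberMapper string manipulation as a pure
// function, so it can be tested outside Hadoop.
final class PartLine {

    // Replaces an empty second comma-separated field with "0" and rejoins the line.
    // Lines with fewer than two fields are returned unchanged.
    static String zeroFillSecondField(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length > 1 && fields[1].equals("")) {
            fields[1] = "0";
            return String.join(",", fields);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(zeroFillSecondField("P100,,X")); // P100,0,X
        System.out.println(zeroFillSecondField("P100,"));   // P100,0
        System.out.println(zeroFillSecondField("P100,7,X")); // P100,7,X
    }
}
```

Keeping the logic in a static method like this also lets the mapper stay a thin wrapper around it.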
I have a table called User with two columns: one called visitorId and the other called friend, which is a list of strings. I want to check whether the visitorId is in the friend list. Can anyone show me how to access the table columns in a map function?
I'm also not able to picture how data is output from a map function in HBase.
My code is as follows:
public class MapReduce {
    static class Mapper1 extends TableMapper<ImmutableBytesWritable, Text> {
        private int numRecords = 0;
        private static final IntWritable one = new IntWritable(1);
        private final IntWritable ONE = new IntWritable(1);
        private Text text = new Text();
        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
            try {
                // What should I do here??
                ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
                context.write(userKey, ONE);
                //context.write(text, ONE);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "CheckVisitor");
        job.setJarByClass(MapReduce.class);
        Scan scan = new Scan();
        Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("mId2"));
        scan.setFilter(f);
        scan.addFamily(Bytes.toBytes("visitor"));
        scan.addFamily(Bytes.toBytes("friend"));
        TableMapReduceUtil.initTableMapperJob("User", scan, Mapper1.class, ImmutableBytesWritable.class, Text.class, job);
    }
}
The Result values instance contains the full row returned by the scanner.
To get the appropriate columns from the Result, I would do something like:
KeyValue visitorIdVal = values.getColumnLatest(Bytes.toBytes(columnFamily1), Bytes.toBytes("VisitorId"));
KeyValue friendlistVal = values.getColumnLatest(Bytes.toBytes(columnFamily2), Bytes.toBytes("friendlist"));
Here visitorIdVal and friendlistVal are of type KeyValue (http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/KeyValue.html); to get their values out, you can call Bytes.toString(visitorIdVal.getValue()).
Once you have extracted the values from the columns, you can check whether the visitorId is in the friend list.
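The final membership check could look like the sketch below. It assumes, hypothetically, that the friend list cell stores a comma-separated UTF-8 string; the question does not say how the list is encoded. In the mapper, the two byte arrays would come from the Result (for example from the KeyValue values above). Here they are plain byte[] so the sketch runs without HBase. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: assumes the friend list is stored as a comma-separated
// UTF-8 string. Plain byte[] inputs stand in for the HBase cell values.
final class FriendCheck {

    // Returns true when the visitor id appears in the friend list.
    static boolean isFriend(byte[] visitorIdBytes, byte[] friendListBytes) {
        if (visitorIdBytes == null || friendListBytes == null) {
            return false; // column missing from this row
        }
        String visitorId = new String(visitorIdBytes, StandardCharsets.UTF_8).trim();
        String friendList = new String(friendListBytes, StandardCharsets.UTF_8);
        // Compare whole entries so that "u4" does not match inside "u42".
        return Arrays.stream(friendList.split(","))
                .map(String::trim)
                .anyMatch(visitorId::equals);
    }

    public static void main(String[] args) {
        byte[] visitor = "u42".getBytes(StandardCharsets.UTF_8);
        byte[] friends = "u7, u42, u99".getBytes(StandardCharsets.UTF_8);
        System.out.println(isFriend(visitor, friends)); // true
    }
}
```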
I'm trying to set up SQLite for unit testing with Fluent NHibernate as shown here, but the table names aren't being generated as I expected.
Some tables have schemas with dots in their names, which seems to break the generation. (Dots work perfectly well with Microsoft SQL Server, which I use in my production environment.)
Example:
[Foo.Bar.Schema].[TableName]
Result:
TestFixture failed: System.Data.SQLite.SQLiteException : SQLite error
unknown database Foo
How do I instruct SQLite to translate the dots to underscores or something so I can run my unit tests?
(I've tried adding brackets to the schema names with no success)
You can use a convention
http://wiki.fluentnhibernate.org/Conventions
Update:
using System;
using System.Reflection;
public static class PrivatePropertyHelper
{
    // from http://stackoverflow.com/questions/1565734/is-it-possible-to-set-private-property-via-reflection
    public static T GetPrivatePropertyValue<T>(this object obj, string propName)
    {
        if (obj == null) throw new ArgumentNullException("obj");
        PropertyInfo pi = obj.GetType().GetProperty(propName, BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
        if (pi == null) throw new ArgumentOutOfRangeException("propName", string.Format("Property {0} was not found in Type {1}", propName, obj.GetType().FullName));
        return (T)pi.GetValue(obj, null);
    }
}
public class CustomTableNameConvention : IClassConvention
{
    // Option 1: use this to set the schema to a specific value
    public void Apply(FluentNHibernate.Conventions.Instances.IClassInstance instance)
    {
        instance.Schema("My_New_Schema");
        instance.Table(instance.EntityType.Name.CamelToUnderscoreLower());
    }
    // Option 2: use this to alter the existing schema value.
    // Note that Schema is a private property, so you need reflection to read it
    public void Apply(FluentNHibernate.Conventions.Instances.IClassInstance instance)
    {
        instance.Schema(instance.GetPrivatePropertyValue<string>("Schema").Replace(".", "_"));
        instance.Table(instance.EntityType.Name.CamelToUnderscoreLower());
    }
}
You must use only one of the Apply methods.
Update 2:
I don't know that I would recommend this, but if you'd like to experiment, this seems to work. Even more reflection :)
// Extension method: place it in a static class such as PrivatePropertyHelper
public static void SetSchemaValue(this object obj, string schema)
{
    var mapping_ref = obj.GetType().GetField("mapping", BindingFlags.Instance | BindingFlags.IgnoreCase | BindingFlags.GetField | BindingFlags.NonPublic).GetValue(obj);
    var mapping = mapping_ref as ClassMapping;
    if (mapping != null)
    {
        mapping.Schema = schema;
    }
}
public void Apply(FluentNHibernate.Conventions.Instances.IClassInstance instance)
{
    var schema = instance.GetPrivatePropertyValue<string>("Schema");
    if (schema == null)
    {
        instance.Schema("My_New_Schema");
    }
    else
    {
        instance.SetSchemaValue("My_New_Schema");
    }
}