Univocity - parse each TSV file row to different Type of class object - univocity

I have a tsv file which has fixed rows but each row is mapped to different Java Class.
For example.
recordType recordValue1
recordType recordValue1 recordValue2
for First row I have follofing class:
public class FirstRow implements ItsvRecord {
#Parsed(index = 0)
private String recordType;
#Parsed(index = 1)
private String recordValue1;
public FirstRow() {
}
}
and for second row I have:
public class SecondRow implements ItsvRecord {
#Parsed(index = 0)
private String recordType;
#Parsed(index = 1)
private String recordValue1;
public SecondRow() {
}
}
I want to parse the TSV file directly to the respective objects but I am falling short of ideas.

Use an InputValueSwitch. This will match a value in a particular column of each row to determine what RowProcessor to use. Example:
Create two (or more) processors for each type of record you need to process:
final BeanListProcessor<FirstRow> firstProcessor = new BeanListProcessor<FirstRow>(FirstRow.class);
final BeanListProcessor<SecondRow> secondProcessor = new BeanListProcessor<SecondRow>(SecondRow.class);
Create an InputValueSwitch:
//0 means that the first column of each row has a value that
//identifies what is the type of record you are dealing with
InputValueSwitch valueSwitch = new InputValueSwitch(0);
//assigns the first processor to rows whose first column contain the 'firstRowType' value
valueSwitch.addSwitchForValue("firstRowType", firstProcessor);
//assigns the second processor to rows whose first column contain the 'secondRowType' value
valueSwitch.addSwitchForValue("secondRowType", secondProcessor);
Parse as usual:
TsvParserSettings settings = new TsvParserSettings(); //configure...
// your row processor is the switch
settings.setProcessor(valueSwitch);
TsvParser parser = new TsvParser(settings);
Reader input = new StringReader(""+
"firstRowType\trecordValue1\n" +
"secondRowType\trecordValue1\trecordValue2");
parser.parse(input);
Get the parsed objects from your processors:
List<FirstRow> firstTypeObjects = firstProcessor.getBeans();
List<SecondRow> secondTypeObjects = secondProcessor.getBeans();
The output will be*:
[FirstRow{recordType='firstRowType', recordValue1='recordValue1'}]
[SecondRow{recordType='secondRowType', recordValue1='recordValue1', recordValue2='recordValue2'}]
Assuming you have a sane toString() implemented in your classes
If you want to manage associations among the objects that are parsed:
If your FirstRow should contain the elements parsed for records of type SecondRow, simply override the rowProcessorSwitched method:
InputValueSwitch valueSwitch = new InputValueSwitch(0) {
#Override
public void rowProcessorSwitched(RowProcessor from, RowProcessor to) {
if (from == secondProcessor) {
List<FirstRow> firstRows = firstProcessor.getBeans();
FirstRow mostRecentRow = firstRows.get(firstRows.size() - 1);
mostRecentRow.addRowsOfOtherType(secondProcessor.getBeans());
secondProcessor.getBeans().clear();
}
}
};
The above assumes your FirstRow class has a addRowsOfOtherType method that takes a list of SecondRow as parameter.
And that's it!
You can even mix and match other types of RowProcessor. There's another example here that demonstrates this.
Hope this helps.

Related

How to use Isolationforest in weka?

I am trying to use isolationforest in weka ,but I cannot find a easy example which shows how to use it ,who can help me ?thanks in advance
import weka.classifiers.misc.IsolationForest;
public class Test2 {
public static void main(String[] args) {
IsolationForest isolationForest = new IsolationForest();
.....................................................
}
}
I strongly suggest you to study a little bit the implementation for IslationForest.
The following code work loading a CSV file with first column with Class (note: a single class value will produce only (1-anomaly score) if it's binary you will get the anomaly score too. Otherwise it just return an error). Note I skip the second column (that in my case is the uuid that is not needed for anomaly detection)
private static void findOutlier(File in, File out) throws Exception {
CSVLoader loader = new CSVLoader();
loader.setSource(new File(in.getAbsolutePath()));
Instances data = loader.getDataSet();
// setting class attribute if the data format does not provide this information
// For example, the XRFF format saves the class attribute information as well
if (data.classIndex() == -1)
data.setClassIndex(0);
String[] options = new String[2];
options[0] = "-R"; // "range"
options[1] = "2"; // first attribute
Remove remove = new Remove(); // new instance of filter
remove.setOptions(options); // set options
remove.setInputFormat(data); // inform filter about dataset **AFTER** setting options
Instances newData = Filter.useFilter(data, remove); // apply filter
IsolationForest randomForest = new IsolationForest();
randomForest.buildClassifier(newData);
// System.out.println(randomForest);
FileWriter fw = new FileWriter(out);
final Enumeration<Attribute> attributeEnumeration = data.enumerateAttributes();
for (Attribute e = attributeEnumeration.nextElement(); attributeEnumeration.hasMoreElements(); e = attributeEnumeration.nextElement()) {
fw.write(e.name());
fw.write(",");
}
fw.write("(1 - anomaly score),anomaly score\n");
for (int i = 0; i < data.size(); ++i) {
Instance inst = data.get(i);
final double[] distributionForInstance = randomForest.distributionForInstance(inst);
fw.write(inst + ", " + distributionForInstance[0] + "," + (1 - distributionForInstance[0]));
fw.write(",\n");
}
fw.flush();
}
The previous function will add at the CSV at last column the anomaly values. Please note I'm using a single class so for getting the corresponding anomaly I do 1 - distributionForInstance[0] otherwise you ca do simply distributionForInstance[1] .
A sample input.csv for getting (1-anomaly score):
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
A,2,41,61,81
A,3,61,37,34
A sample input.csv for getting (1-anomaly score) and anomaly score:
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
B,2,41,61,81
A,3,61,37,34

Using Roslyn, how to enumerate members (Namespaces, classes etc) details in Visual Basic Document?

Using Roslyn, the only mechanism for determining members of Visual Basic document appears to be:
var members = SyntaxTree.GetRoot().DescendantNodes().Where(node =>
node is ClassStatementSyntax ||
node is FunctionAggregationSyntax ||
node is IncompleteMemberSyntax ||
node is MethodBaseSyntax ||
node is ModuleStatementSyntax ||
node is NamespaceStatementSyntax ||
node is PropertyStatementSyntax ||
node is SubNewStatementSyntax
);
How do get the member name, StarLineNumber and EndLineNumber of each member?
Exists not only the one way to get it:
1) As you try: I willn't show this way for all of kind member (they count are huge and the logic is the similar), but only a one of them, for example ClassStatementSyntax:
to achive it name just get ClassStatementSyntax.Identifier.ValueText
to get start line you can use Location as one of ways:
var location = Location.Create(SyntaxTree, ClassStatementSyntax.Identifier.Span);
var startLine = location.GetLineSpan().StartLinePosition.Line;
logic for retrieving the end line looks like a logic to receive the start line but it dependents on the corresponding closing statement (some kind of end statement or self)
2) More useful way – use SemanticModel to get a data that you want:
In this way you will need to receive semantic info only for ClassStatementSyntax, ModuleStatementSyntxt and NamespaceStatementSyntax, and all of their members will be received just calling GetMembers():
...
SemanticModel semanticModel = // usually it is received from the corresponding compilation
var typeSyntax = // ClassStatementSyntax, ModuleStatementSyntxt or NamespaceStatementSyntax
string name = null;
int startLine;
int endLine;
var info = semanticModel.GetSymbolInfo(typeSyntax);
if (info.Symbol is INamespaceOrTypeSymbol typeSymbol)
{
name = typeSymbol.Name; // retrieve Name
startLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol.DeclaringSyntaxReferences[0].Span).StartLinePosition.Line; //retrieve start line
endLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol.DeclaringSyntaxReferences[0].Span).EndLinePosition.Line; //retrieve end line
foreach (var item in typeSymbol.GetMembers())
{
// do the same logic for retrieving name and lines for all others members without calling GetMembers()
}
}
else if (semanticModel.GetDeclaredSymbol(typeSyntax) is INamespaceOrTypeSymbol typeSymbol2)
{
name = typeSymbol2.Name; // retrieve Name
startLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol2.DeclaringSyntaxReferences[0].Span).StartLinePosition.Line; //retrieve start line
endLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol2.DeclaringSyntaxReferences[0].Span).EndLinePosition.Line; //retrieve end line
foreach (var item in typeSymbol2.GetMembers())
{
// do the same logic for retrieving name and lines for all others members without calling GetMembers()
}
}
But attention, when you have a partial declaration your DeclaringSyntaxReferences will have a couple items, so you need to filter SyntaxReference by your current SyntaxTree

How to get all rows containing (or equaling) a particular ID from an HBase table?

I have a method which select the row whose rowkey contains the parameter passed into.
HTable table = new HTable(Bytes.toBytes(objectsTableName), connection);
public List<ObjectId> lookUp(String partialId) {
if (partialId.matches("[a-fA-F0-9]+")) {
// create a regular expression from partialId, which can
//match any rowkey that contains partialId as a substring,
//and then get all the row with the specified rowkey
} else {
throw new IllegalArgumentException(
"query must be done with hexadecimal values only");
}
}
I don't know how to finish code above.
I just know the following code can get the row with specified rowkey in Hbase.
String rowkey = "123";
Get get = new Get(Bytes.toBytes(rowkey));
Result result = table.get(get);
You can use RowFilter filter with RegexStringComparator to do that. Or, if it is just to fetch the rows which match a given substring you can use RowFilter with SubstringComparator. This is how you use HBase filters :
public static void main(String[] args) throws IOException {
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "demo");
Scan s = new Scan();
Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("abc"));
s.setFilter(f);
ResultScanner rs = table.getScanner(s);
for(Result r : rs){
System.out.println("RowKey : " + Bytes.toString(r.getRow()));
//rest of your logic
}
rs.close();
table.close();
}
The above piece of code will give you all the rows which contain abc as a part of their rowkeys.
HTH

Qjson handling an returned array of ojbects

I'm using Qjson to parse a json object that is returned from a web service. I'm stuck on handling an array of complex ojects.
At the first level the web service returns a map consisting of "error", "id", and "return". If there are no errors I can get the first level value by using
nestedMap = m_jsonObject["result"].toMap();
group = new Group();
group->Caption = nestedMap["Caption"].toString();
group->CollectionCount = nestedMap["CollectionCount"].toInt();
I can even get a date item value that is at the second level using
group->ModifiedOn = nestedMap["ModifiedOn"].toMap()["Value"].toDateTime();
I have an object called "Elements" that consists of 29 key-value pairs. The web service is returning an array of these "Elements" and I am unable to find the right way to parse it. In the header file the container for the elements is defined as
QList<GroupElement> Elements;
The line
group->Elements = nestedMap["Elements"].toList();
causes the compiler to throw an error 'error: no match for 'operator=' in '((MyClass*)this)->MyClass::group->Group::Elements = QVariant::toMap() const()'
I would like to learn the correct syntax to put this element into the class.
Update: I wrote another function to convert the QVariantMap object to a
first:
The group-> Elements object was changed to a
class ParentClass{
QList<SharedDataPointer<Address> > Elements;
other class memmbers...
};
Second:
A method to convert the QMap object to an Address object was created
QSharedDataPointer<Address>
API_1_6::mapToAddress(QVariantMap o)
{
QSharedDataPointer<Address> address (new Address());
address-> FirstName = o["FirstName"].toString();
address->LastName = o["LastName"].toString();
address->CompanyName = o["CompanyName"].toString();
address->Street = o["Street"].toString();
address->Street2 = o["Street2"].toString();
address->City = o["City"].toString();
address->Zip = o["Zip"].toString();
address-> State = o["State"].toString();
address->Country = o["Country"].toString();
address->Phone = o["Phone"].toString();
address->Phone2 = o["Phone2"].toString();
address-> Fax = o["Fax"].toString();
address-> Url = o["Url"].toString();
address->Email = o["Email"].toString();
address->Other = o["Other"].toString();
return address;
}
third: In the code, foreach is used to walk through the list and create and store the new objects
// get the list of the elements
elementsList = nestedMap["Elements"].toList();
// Add the element, converted to the new type, to the Elements object of the'parent' class
foreach(QVariant qElement, elementsList){
group-> Elements.append(mapToAddress(qElement))
}

Hbase Map/reduce-How to access individual columns of the table?

I have a table called User with two columns, one called visitorId and the other called friend which is a list of strings. I want to check whether the VisitorId is in the friendlist. Can anyone direct me as to how to access the table columns in a map function?
I'm not able to picture how data is output from a map function in hbase.
My code is as follows:
ublic class MapReduce {
static class Mapper1 extends TableMapper<ImmutableBytesWritable, Text> {
private int numRecords = 0;
private static final IntWritable one = new IntWritable(1);
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
#Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
//What should i do here??
ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
context.write(userkey,One);
}
//context.write(text, ONE);
} catch (InterruptedException e) {
throw new IOException(e);
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "CheckVisitor");
job.setJarByClass(MapReduce.class);
Scan scan = new Scan();
Filter f = new RowFilter(CompareOp.EQUAL,new SubstringComparator("mId2"));
scan.setFilter(f);
scan.addFamily(Bytes.toBytes("visitor"));
scan.addFamily(Bytes.toBytes("friend"));
TableMapReduceUtil.initTableMapperJob("User", scan, Mapper1.class, ImmutableBytesWritable.class,Text.class, job);
}
}
So Result values instance would contain the full row from the scanner.
To get the appropriate columns from the Result I would do something like :-
VisitorIdVal = value.getColumnLatest(Bytes.toBytes(columnFamily1), Bytes.toBytes("VisitorId"))
friendlistVal = value.getColumnLatest(Bytes.toBytes(columnFamily2), Bytes.toBytes("friendlist"))
Here VisitorIdVal and friendlistVal are of the type keyValue http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/KeyValue.html, to get their values out you can do a Bytes.toString(VisitorIdVal.getValue())
Once you have extracted the values from columns you can check for "VisitorId" in "friendlist"