I have a TSV file with fixed rows, but each row maps to a different Java class.
For example (columns are tab-separated):
recordType recordValue1
recordType recordValue1 recordValue2
For the first row I have the following class:
import com.univocity.parsers.annotations.Parsed;

public class FirstRow implements ItsvRecord {
@Parsed(index = 0)
private String recordType;
@Parsed(index = 1)
private String recordValue1;
public FirstRow() {
}
}
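And for the second row I have:
import com.univocity.parsers.annotations.Parsed;

public class SecondRow implements ItsvRecord {
@Parsed(index = 0)
private String recordType;
@Parsed(index = 1)
private String recordValue1;
@Parsed(index = 2)
private String recordValue2;
public SecondRow() {
}
}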
I want to parse the TSV file directly into the respective objects, but I am running out of ideas.
Use an InputValueSwitch. This will match a value in a particular column of each row to determine what RowProcessor to use. Example:
Create one processor for each type of record you need to process:
final BeanListProcessor<FirstRow> firstProcessor = new BeanListProcessor<FirstRow>(FirstRow.class);
final BeanListProcessor<SecondRow> secondProcessor = new BeanListProcessor<SecondRow>(SecondRow.class);
Create an InputValueSwitch:
//0 means that the first column of each row has a value that
//identifies what is the type of record you are dealing with
InputValueSwitch valueSwitch = new InputValueSwitch(0);
//assigns the first processor to rows whose first column contains the 'firstRowType' value
valueSwitch.addSwitchForValue("firstRowType", firstProcessor);
//assigns the second processor to rows whose first column contains the 'secondRowType' value
valueSwitch.addSwitchForValue("secondRowType", secondProcessor);
Parse as usual:
TsvParserSettings settings = new TsvParserSettings(); //configure...
// your row processor is the switch
settings.setProcessor(valueSwitch);
TsvParser parser = new TsvParser(settings);
Reader input = new StringReader(""+
"firstRowType\trecordValue1\n" +
"secondRowType\trecordValue1\trecordValue2");
parser.parse(input);
Get the parsed objects from your processors:
List<FirstRow> firstTypeObjects = firstProcessor.getBeans();
List<SecondRow> secondTypeObjects = secondProcessor.getBeans();
The output will be*:
[FirstRow{recordType='firstRowType', recordValue1='recordValue1'}]
[SecondRow{recordType='secondRowType', recordValue1='recordValue1', recordValue2='recordValue2'}]
* Assuming you have a sensible toString() implemented in your classes.
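For intuition, here is a plain-Java sketch (standard library only, not univocity's implementation) of what the switch does: it reads the first column of each tab-separated line and routes the row to the collection registered for that value. It ignores TSV escaping, and it throws on an unknown record type; that is a choice made for this sketch, not necessarily what InputValueSwitch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the idea behind a value switch: the value in one
// column selects which collection receives the row. Not univocity's code.
class ValueSwitchSketch {

    // Routes each TSV line to the list registered for its first-column value.
    static Map<String, List<String[]>> route(List<String> lines, List<String> recordTypes) {
        Map<String, List<String[]>> byType = new LinkedHashMap<>();
        for (String type : recordTypes) {
            byType.put(type, new ArrayList<>());
        }
        for (String line : lines) {
            String[] fields = line.split("\t", -1); // -1 keeps trailing empty columns
            List<String[]> target = byType.get(fields[0]);
            if (target == null) {
                throw new IllegalArgumentException("Unknown record type: " + fields[0]);
            }
            target.add(fields);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "firstRowType\trecordValue1",
                "secondRowType\trecordValue1\trecordValue2");
        Map<String, List<String[]>> routed = route(lines, List.of("firstRowType", "secondRowType"));
        System.out.println(routed.get("firstRowType").size());         // 1
        System.out.println(routed.get("secondRowType").get(0).length); // 3
    }
}
```

The library does the same routing but hands each row to a full RowProcessor, which is what lets BeanListProcessor turn the fields into annotated beans.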
If you want to manage associations among the objects that are parsed:
If your FirstRow should contain the elements parsed for records of type SecondRow, simply override the rowProcessorSwitched method:
InputValueSwitch valueSwitch = new InputValueSwitch(0) {
@Override
public void rowProcessorSwitched(RowProcessor from, RowProcessor to) {
if (from == secondProcessor) {
List<FirstRow> firstRows = firstProcessor.getBeans();
FirstRow mostRecentRow = firstRows.get(firstRows.size() - 1);
// pass a copy: the processor's own list is cleared right after
mostRecentRow.addRowsOfOtherType(new ArrayList<SecondRow>(secondProcessor.getBeans()));
secondProcessor.getBeans().clear();
}
}
};
The above assumes your FirstRow class has an addRowsOfOtherType method that takes a list of SecondRow as a parameter.
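The association logic can also be sketched without the library. In this plain-Java sketch, Parent and the "parent"/"child" type names are hypothetical stand-ins for FirstRow and SecondRow: child rows are buffered and attached to the most recent parent whenever the type switches back, plus once more at the end of the input. If your input can end with SecondRow records, it is worth checking that they get attached too; the final flush here handles that case.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of attaching "child" rows to the most recent "parent" row.
// Parent and the "parent"/"child" type names are hypothetical.
class AssociationSketch {

    static class Parent {
        final String value;
        final List<String> children = new ArrayList<>();
        Parent(String value) { this.value = value; }
    }

    // Each input element is {type, value}; type is "parent" or "child".
    static List<Parent> group(List<String[]> rows) {
        List<Parent> parents = new ArrayList<>();
        List<String> pendingChildren = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].equals("parent")) {
                flush(parents, pendingChildren); // switching from child rows back to a parent
                parents.add(new Parent(row[1]));
            } else {
                pendingChildren.add(row[1]);
            }
        }
        flush(parents, pendingChildren); // children after the last parent
        return parents;
    }

    // Copies pending children into the latest parent, then empties the buffer.
    private static void flush(List<Parent> parents, List<String> pending) {
        if (!pending.isEmpty()) {
            if (parents.isEmpty()) {
                throw new IllegalStateException("child rows before any parent row");
            }
            parents.get(parents.size() - 1).children.addAll(pending);
            pending.clear();
        }
    }
}
```

Because addAll copies the elements, clearing the buffer afterwards does not affect the parent, which is the same reason to pass a copy of getBeans() in the override.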
And that's it!
You can even mix and match other types of RowProcessor. There's another example here that demonstrates this.
Hope this helps.
I am trying to use IsolationForest in Weka, but I cannot find a simple example showing how to use it. Can anyone help? Thanks in advance.
import weka.classifiers.misc.IsolationForest;
public class Test2 {
public static void main(String[] args) {
IsolationForest isolationForest = new IsolationForest();
.....................................................
}
}
I strongly suggest studying the IsolationForest implementation a little.
The following code works by loading a CSV file whose first column is the class. Note: with a single class value it produces only (1 - anomaly score); if the class is binary you get the anomaly score too; otherwise it just returns an error. Note also that I skip the second column (in my case a UUID, which is not needed for anomaly detection).
import java.io.File;
import java.io.FileWriter;
import weka.classifiers.misc.IsolationForest;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

private static void findOutlier(File in, File out) throws Exception {
CSVLoader loader = new CSVLoader();
loader.setSource(in);
Instances data = loader.getDataSet();
// setting class attribute if the data format does not provide this information
// For example, the XRFF format saves the class attribute information as well
if (data.classIndex() == -1)
data.setClassIndex(0);
String[] options = new String[2];
options[0] = "-R"; // "range"
options[1] = "2"; // second attribute (1-based), i.e. the uuid column
Remove remove = new Remove(); // new instance of filter
remove.setOptions(options); // set options
remove.setInputFormat(data); // inform filter about dataset **AFTER** setting options
Instances newData = Filter.useFilter(data, remove); // apply filter
IsolationForest isolationForest = new IsolationForest();
isolationForest.buildClassifier(newData);
// System.out.println(isolationForest);
try (FileWriter fw = new FileWriter(out)) {
// header: every original attribute (including the class), then the two score columns
for (int j = 0; j < data.numAttributes(); ++j) {
fw.write(data.attribute(j).name());
fw.write(",");
}
fw.write("(1 - anomaly score),anomaly score\n");
for (int i = 0; i < data.size(); ++i) {
Instance inst = data.get(i); // original row, written to the output as-is
// score the filtered row: it has the same structure the forest was trained on
final double[] distributionForInstance = isolationForest.distributionForInstance(newData.get(i));
fw.write(inst + "," + distributionForInstance[0] + "," + (1 - distributionForInstance[0]));
fw.write("\n");
}
}
}
The function above writes each row back out as CSV with the anomaly values appended as the last columns. Note that I'm using a single class, so to get the anomaly score I compute 1 - distributionForInstance[0]; with a binary class you can simply use distributionForInstance[1].
A sample input.csv for getting (1-anomaly score):
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
A,2,41,61,81
A,3,61,37,34
A sample input.csv for getting (1-anomaly score) and anomaly score:
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
B,2,41,61,81
A,3,61,37,34
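To make the "anomaly score" columns less opaque, here is a Java sketch of the textbook isolation forest score from Liu, Ting and Zhou: s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of x over the trees and c(n) is the expected path length for a subsample of n points. This is the standard formula, not Weka's source; I have not checked that Weka's implementation uses exactly these conventions (for example c(2) = 1).

```java
// Sketch of the standard isolation forest anomaly score; not Weka's code.
class IsolationScoreSketch {

    private static final double EULER_GAMMA = 0.5772156649015329;

    // c(n) = 2 H(n-1) - 2 (n-1) / n, with the harmonic number H(i) ~ ln(i) + gamma.
    // Uses the common convention c(2) = 1 and c(n) = 0 for n < 2.
    static double c(int n) {
        if (n > 2) {
            return 2.0 * (Math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n;
        }
        return n == 2 ? 1.0 : 0.0;
    }

    // Short average paths (points isolated quickly) push the score towards 1;
    // a path length equal to c(n) gives exactly 0.5.
    static double anomalyScore(double averagePathLength, int subsampleSize) {
        return Math.pow(2.0, -averagePathLength / c(subsampleSize));
    }
}
```

Read this way, values near 1 in the anomaly score column suggest outliers, while values well below 0.5 suggest ordinary points.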
Using Roslyn, the only mechanism for determining the members of a Visual Basic document appears to be:
var members = SyntaxTree.GetRoot().DescendantNodes().Where(node =>
node is ClassStatementSyntax ||
node is FunctionAggregationSyntax ||
node is IncompleteMemberSyntax ||
node is MethodBaseSyntax ||
node is ModuleStatementSyntax ||
node is NamespaceStatementSyntax ||
node is PropertyStatementSyntax ||
node is SubNewStatementSyntax
);
How do I get the member name, StartLineNumber and EndLineNumber of each member?
There is more than one way to get it:
1) The way you are trying: I won't show this for every kind of member (there are many of them and the logic is similar), only for one of them, for example ClassStatementSyntax:
To get its name, just read ClassStatementSyntax.Identifier.ValueText.
To get the start line, one option is to use Location (note that the returned line is 0-based):
var location = Location.Create(SyntaxTree, ClassStatementSyntax.Identifier.Span);
var startLine = location.GetLineSpan().StartLinePosition.Line;
The logic for retrieving the end line is similar to the logic for the start line, but it depends on the corresponding closing statement (some kind of End statement, or the node itself).
2) A more useful way: use the SemanticModel to get the data you want.
This way you only need semantic info for ClassStatementSyntax, ModuleStatementSyntax and NamespaceStatementSyntax; all of their members can then be obtained by calling GetMembers():
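Roslyn computes line numbers for you (Location.GetLineSpan, SyntaxTree.GetLineSpan). To show what that lookup amounts to, here is a small Java sketch, not Roslyn code, that maps a character offset such as a span's start or end to a 0-based line number. It only treats '\n' as a line break, whereas Roslyn handles more kinds of line terminator.

```java
import java.util.Arrays;

// Illustration of offset-to-line mapping; Roslyn's GetLineSpan does this for you.
class LineLookupSketch {

    // Offsets at which each line starts, found by scanning for '\n'.
    static int[] lineStarts(String text) {
        int count = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') count++;
        }
        int[] starts = new int[count];
        int line = 1;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') starts[line++] = i + 1;
        }
        return starts;
    }

    // 0-based line containing offset: the last line start <= offset (binary search).
    static int lineOf(int[] starts, int offset) {
        int idx = Arrays.binarySearch(starts, offset);
        return idx >= 0 ? idx : -idx - 2; // -idx - 1 is the insertion point
    }
}
```

For a declaration, the start line is lineOf(starts, span start) and the end line is lineOf(starts, span end) of whatever node closes it, which is why the end line depends on the matching End statement.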
...
SemanticModel semanticModel = // usually it is received from the corresponding compilation
var typeSyntax = // ClassStatementSyntax, ModuleStatementSyntax or NamespaceStatementSyntax
string name = null;
int startLine;
int endLine;
var info = semanticModel.GetSymbolInfo(typeSyntax);
if (info.Symbol is INamespaceOrTypeSymbol typeSymbol)
{
name = typeSymbol.Name; // retrieve Name
startLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol.DeclaringSyntaxReferences[0].Span).StartLinePosition.Line; //retrieve start line
endLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol.DeclaringSyntaxReferences[0].Span).EndLinePosition.Line; //retrieve end line
foreach (var item in typeSymbol.GetMembers())
{
// do the same logic for retrieving name and lines for all others members without calling GetMembers()
}
}
else if (semanticModel.GetDeclaredSymbol(typeSyntax) is INamespaceOrTypeSymbol typeSymbol2)
{
name = typeSymbol2.Name; // retrieve Name
startLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol2.DeclaringSyntaxReferences[0].Span).StartLinePosition.Line; //retrieve start line
endLine = semanticModel.SyntaxTree.GetLineSpan(typeSymbol2.DeclaringSyntaxReferences[0].Span).EndLinePosition.Line; //retrieve end line
foreach (var item in typeSymbol2.GetMembers())
{
// do the same logic for retrieving name and lines for all others members without calling GetMembers()
}
}
But be careful: when you have a partial declaration, DeclaringSyntaxReferences will contain several items, so you need to filter the SyntaxReferences by your current SyntaxTree.
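The walk over GetMembers() combined with the partial-declaration filter can be sketched in Java. DeclRef and Sym are hypothetical stand-ins for SyntaxReference and the Roslyn symbol types (they are not Roslyn APIs), and records need Java 16 or later. Each symbol may have several declaring references, one per partial declaration, and only those in the current tree are reported.

```java
import java.util.List;

// Hypothetical stand-ins for Roslyn symbols and their declaring references.
class MemberWalkSketch {

    record DeclRef(String treePath, int startLine, int endLine) {}

    record Sym(String name, List<DeclRef> refs, List<Sym> members) {}

    // Collects "name startLine-endLine" for the symbol and all nested members,
    // keeping only declarations located in the tree currently being analysed.
    static void walk(Sym sym, String currentTree, List<String> out) {
        for (DeclRef ref : sym.refs()) {
            if (ref.treePath().equals(currentTree)) {
                out.add(sym.name() + " " + ref.startLine() + "-" + ref.endLine());
            }
        }
        for (Sym member : sym.members()) {
            walk(member, currentTree, out);
        }
    }
}
```

Filtering per reference rather than per symbol means a partial class still contributes its members from the current file, while the parts declared in other files are skipped.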
I have a method that selects the rows whose row key contains the parameter passed in.
HTable table = new HTable(Bytes.toBytes(objectsTableName), connection);
public List<ObjectId> lookUp(String partialId) {
if (partialId.matches("[a-fA-F0-9]+")) {
// create a regular expression from partialId, which can
//match any rowkey that contains partialId as a substring,
//and then get all the row with the specified rowkey
} else {
throw new IllegalArgumentException(
"query must be done with hexadecimal values only");
}
}
I don't know how to finish the code above.
I only know that the following code gets the row with a specified row key in HBase:
String rowkey = "123";
Get get = new Get(Bytes.toBytes(rowkey));
Result result = table.get(get);
You can use a RowFilter with a RegexStringComparator to do that. Or, if you just want to fetch the rows whose keys contain a given substring, you can use a RowFilter with a SubstringComparator. This is how you use HBase filters:
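import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public static void main(String[] args) throws IOException {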
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "demo");
Scan s = new Scan();
Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("abc"));
s.setFilter(f);
ResultScanner rs = table.getScanner(s);
for(Result r : rs){
System.out.println("RowKey : " + Bytes.toString(r.getRow()));
//rest of your logic
}
rs.close();
table.close();
}
The above piece of code will give you all the rows which contain abc as a part of their rowkeys.
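To finish the lookUp method from the question, you need a regex (or substring) built from partialId. This in-memory Java sketch shows both matching styles on a plain list of keys; in HBase the matching runs server-side inside the RowFilter, and the regex string (p.pattern()) is what you would hand to new RegexStringComparator(...). Pattern.quote is not strictly needed for hex input but keeps the pattern literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// In-memory illustration of substring vs. regex row-key matching.
class RowKeyMatchSketch {

    private static final Pattern HEX = Pattern.compile("[a-fA-F0-9]+");

    // Regex that matches any key containing partialId as a substring.
    static Pattern containsPattern(String partialId) {
        if (!HEX.matcher(partialId).matches()) {
            throw new IllegalArgumentException("query must be done with hexadecimal values only");
        }
        return Pattern.compile(".*" + Pattern.quote(partialId) + ".*");
    }

    // What a SubstringComparator-based filter selects.
    static List<String> bySubstring(List<String> keys, String partialId) {
        List<String> hits = new ArrayList<>();
        for (String k : keys) {
            if (k.contains(partialId)) {
                hits.add(k);
            }
        }
        return hits;
    }

    // The same selection expressed as a regex over the whole key.
    static List<String> byRegex(List<String> keys, String partialId) {
        Pattern p = containsPattern(partialId);
        List<String> hits = new ArrayList<>();
        for (String k : keys) {
            if (p.matcher(k).matches()) {
                hits.add(k);
            }
        }
        return hits;
    }
}
```

For plain containment the SubstringComparator is the simpler choice; a regex only pays off if the lookup later needs anchoring or alternatives.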
HTH
I'm using QJson to parse a JSON object that is returned from a web service. I'm stuck on handling an array of complex objects.
At the first level the web service returns a map consisting of "error", "id", and "result". If there are no errors, I can get the first-level values by using:
nestedMap = m_jsonObject["result"].toMap();
group = new Group();
group->Caption = nestedMap["Caption"].toString();
group->CollectionCount = nestedMap["CollectionCount"].toInt();
I can even get a date value that is at the second level using:
group->ModifiedOn = nestedMap["ModifiedOn"].toMap()["Value"].toDateTime();
I have an object called "Elements" that consists of 29 key-value pairs. The web service returns an array of these "Elements" and I am unable to find the right way to parse it. In the header file the container for the elements is defined as:
QList<GroupElement> Elements;
The line
group->Elements = nestedMap["Elements"].toList();
causes the compiler to throw an error 'error: no match for 'operator=' in '((MyClass*)this)->MyClass::group->Group::Elements = QVariant::toMap() const()'
I would like to learn the correct syntax to put this element into the class.
Update: I wrote another function to convert each QVariantMap object to an Address object.
First:
The group->Elements member was changed to a list of shared pointers to Address:
class ParentClass{
QList<QSharedDataPointer<Address> > Elements;
// other class members...
};
Second:
A method was created to convert the QVariantMap object to an Address object:
QSharedDataPointer<Address>
API_1_6::mapToAddress(QVariantMap o)
{
QSharedDataPointer<Address> address (new Address());
address->FirstName = o["FirstName"].toString();
address->LastName = o["LastName"].toString();
address->CompanyName = o["CompanyName"].toString();
address->Street = o["Street"].toString();
address->Street2 = o["Street2"].toString();
address->City = o["City"].toString();
address->Zip = o["Zip"].toString();
address->State = o["State"].toString();
address->Country = o["Country"].toString();
address->Phone = o["Phone"].toString();
address->Phone2 = o["Phone2"].toString();
address->Fax = o["Fax"].toString();
address->Url = o["Url"].toString();
address->Email = o["Email"].toString();
address->Other = o["Other"].toString();
return address;
}
Third: in the code, foreach is used to walk through the list and to create and store the new objects:
// get the list of the elements
QVariantList elementsList = nestedMap["Elements"].toList();
// Convert each element to the new type and add it to the Elements member of the 'parent' class
foreach(QVariant qElement, elementsList){
group->Elements.append(mapToAddress(qElement.toMap()));
}
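The underlying pattern (a list of generic maps cannot be assigned to a list of typed objects; each element has to be converted individually) can be sketched as a Java analogue. Address here is a hypothetical cut-down class with three fields, and a missing key becomes an empty string, mirroring QVariant().toString() returning an empty QString.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Java analogue of converting a QVariantList of maps into typed objects.
class ElementsToObjectsSketch {

    static class Address {
        String firstName;
        String lastName;
        String city;
    }

    // Missing keys become "", like toString() on an invalid QVariant.
    static String str(Map<String, Object> m, String key) {
        Object v = m.get(key);
        return v == null ? "" : v.toString();
    }

    static Address mapToAddress(Map<String, Object> o) {
        Address a = new Address();
        a.firstName = str(o, "FirstName");
        a.lastName = str(o, "LastName");
        a.city = str(o, "City");
        return a;
    }

    // Each element is converted on its own; the list itself is never assigned directly.
    static List<Address> convert(List<Object> elements) {
        List<Address> result = new ArrayList<>();
        for (Object element : elements) {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) element;
            result.add(mapToAddress(map));
        }
        return result;
    }
}
```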
I have a table called User with two columns, one called visitorId and the other called friend, which is a list of strings. I want to check whether the visitorId is in the friend list. Can anyone point me to how to access the table columns in a map function?
I'm not able to picture how data is output from a map function in HBase.
My code is as follows:
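import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
public class MapReduce {
static class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {
private int numRecords = 0;
private static final IntWritable ONE = new IntWritable(1);
@Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
try {
//What should i do here??
ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
context.write(userKey, ONE);
} catch (InterruptedException e) {
throw new IOException(e);
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "CheckVisitor");
job.setJarByClass(MapReduce.class);
Scan scan = new Scan();
Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("mId2"));
scan.setFilter(f);
scan.addFamily(Bytes.toBytes("visitor"));
scan.addFamily(Bytes.toBytes("friend"));
TableMapReduceUtil.initTableMapperJob("User", scan, Mapper1.class, ImmutableBytesWritable.class, IntWritable.class, job);
// map-only job with no output files, so it can actually be submitted
job.setNumReduceTasks(0);
job.setOutputFormatClass(NullOutputFormat.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}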
The Result values instance contains the whole row returned by the scanner.
To get the appropriate columns from the Result, I would do something like:
KeyValue visitorIdVal = values.getColumnLatest(Bytes.toBytes(columnFamily1), Bytes.toBytes("VisitorId"));
KeyValue friendlistVal = values.getColumnLatest(Bytes.toBytes(columnFamily2), Bytes.toBytes("friendlist"));
Here visitorIdVal and friendlistVal are of type KeyValue (http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/KeyValue.html); to get their values out, use Bytes.toString(visitorIdVal.getValue()).
Once you have extracted the values from the columns, you can check whether the "VisitorId" value is in the "friendlist" value.
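The final check depends on how the friend list is encoded, which the question does not say. This sketch assumes, hypothetically, a comma-separated string such as "u1,u2,u3"; the two arguments stand for the Bytes.toString(...getValue()) results of the two columns. It matches whole entries, so "u2" does not match "u22".

```java
import java.util.Arrays;

// Assumes (hypothetically) that the friendlist column holds "id1,id2,...".
class FriendCheckSketch {

    static boolean isFriend(String visitorId, String friendList) {
        if (visitorId == null || friendList == null || friendList.isEmpty()) {
            return false;
        }
        String wanted = visitorId.trim();
        return Arrays.stream(friendList.split(","))
                .map(String::trim)       // tolerate spaces after commas
                .anyMatch(wanted::equals);
    }
}
```

Inside map() you could then call context.write only for rows where isFriend(...) returns true.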