How to get all rows containing (or equaling) a particular ID from an HBase table? - regex

I have a method which selects the rows whose rowkey contains the parameter passed in.
HTable table = new HTable(Bytes.toBytes(objectsTableName), connection);

public List<ObjectId> lookUp(String partialId) {
    if (partialId.matches("[a-fA-F0-9]+")) {
        // create a regular expression from partialId, which can
        // match any rowkey that contains partialId as a substring,
        // and then get all the rows with the matching rowkeys
    } else {
        throw new IllegalArgumentException(
                "query must be done with hexadecimal values only");
    }
}
I don't know how to finish the code above.
I only know that the following code can get the row with a specified rowkey in HBase:
String rowkey = "123";
Get get = new Get(Bytes.toBytes(rowkey));
Result result = table.get(get);

You can use a RowFilter with a RegexStringComparator to do that. Or, if it is just to fetch the rows whose keys contain a given substring, you can use a RowFilter with a SubstringComparator. This is how you use HBase filters:
public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "demo");
    Scan s = new Scan();
    Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("abc"));
    s.setFilter(f);
    ResultScanner rs = table.getScanner(s);
    for (Result r : rs) {
        System.out.println("RowKey : " + Bytes.toString(r.getRow()));
        // rest of your logic
    }
    rs.close();
    table.close();
}
The above piece of code will give you all the rows which contain abc as a part of their rowkeys.
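If you want the regex variant for your lookUp method, a minimal sketch could look like the following (assuming table is the HTable from your question and partialId has already been validated as hex; the pattern shown is just one way to express "contains this substring"):
Scan scan = new Scan();
// match any rowkey containing partialId as a substring
scan.setFilter(new RowFilter(CompareOp.EQUAL,
        new RegexStringComparator(".*" + partialId + ".*")));
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
    System.out.println("RowKey : " + Bytes.toString(r.getRow()));
}
scanner.close();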
HTH

How to use IsolationForest in Weka?

I am trying to use IsolationForest in Weka, but I cannot find an easy example which shows how to use it. Who can help me? Thanks in advance.
import weka.classifiers.misc.IsolationForest;

public class Test2 {
    public static void main(String[] args) {
        IsolationForest isolationForest = new IsolationForest();
        // .....................................................
    }
}
I strongly suggest you study the implementation of IsolationForest a little.
The following code works by loading a CSV file whose first column is the class (note: a single class value will produce only the (1 - anomaly score); if it is binary you will get the anomaly score too; otherwise it just returns an error). Note that I skip the second column (which in my case is a UUID that is not needed for anomaly detection).
import java.io.File;
import java.io.FileWriter;

import weka.classifiers.misc.IsolationForest;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

private static void findOutlier(File in, File out) throws Exception {
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File(in.getAbsolutePath()));
    Instances data = loader.getDataSet();
    // setting class attribute if the data format does not provide this information
    // For example, the XRFF format saves the class attribute information as well
    if (data.classIndex() == -1)
        data.setClassIndex(0);
    String[] options = new String[2];
    options[0] = "-R"; // "range"
    options[1] = "2";  // the second attribute (the uuid column)
    Remove remove = new Remove();   // new instance of filter
    remove.setOptions(options);     // set options
    remove.setInputFormat(data);    // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove); // apply filter
    IsolationForest isolationForest = new IsolationForest();
    isolationForest.buildClassifier(newData);
    FileWriter fw = new FileWriter(out);
    // header: all original attribute names plus the two score columns
    for (int j = 0; j < data.numAttributes(); j++) {
        fw.write(data.attribute(j).name());
        fw.write(",");
    }
    fw.write("(1 - anomaly score),anomaly score\n");
    for (int i = 0; i < data.size(); ++i) {
        // predict on the filtered instance, but write the original one to the CSV
        final double[] distributionForInstance = isolationForest.distributionForInstance(newData.get(i));
        fw.write(data.get(i) + "," + distributionForInstance[0] + "," + (1 - distributionForInstance[0]));
        fw.write("\n");
    }
    fw.flush();
    fw.close();
}
The previous function will append the anomaly values as the last columns of the CSV. Please note I'm using a single class, so to get the corresponding anomaly score I compute 1 - distributionForInstance[0]; otherwise you can simply use distributionForInstance[1].
A sample input.csv for getting (1-anomaly score):
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
A,2,41,61,81
A,3,61,37,34
A sample input.csv for getting (1-anomaly score) and anomaly score:
Class,ignore, feature_0, feature_1, feature_2
A,1,21,31,31
B,2,41,61,81
A,3,61,37,34
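For completeness, a minimal sketch of a call site, filling in the Test2 stub from the question (the file names input.csv and output.csv are assumptions, and findOutlier is assumed to be accessible from Test2, e.g. defined in the same class):
import java.io.File;

public class Test2 {
    public static void main(String[] args) throws Exception {
        // hypothetical file names; point these at your own data
        findOutlier(new File("input.csv"), new File("output.csv"));
    }
}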

Passing Side Input in PCollection Partition

I want to pass a side input to a PCollection Partition and, on the basis of that, divide my PCollection. Is there any way to do this?
PCollectionList<TableRow> part = merged.apply(Partition.of(/* PCollection count function called */, new PartitionFn<TableRow>() {
    @Override
    public int partitionFor(TableRow arg0, int arg1) {
        return 0;
    }
}));
Is there any other way through which I can partition my PCollection?
// Without Dynamic destination partitioning BigQuery table
merge.apply("write into target", BigQueryIO.writeTableRows()
        .to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
            @Override
            public TableDestination apply(ValueInSingleWindow<TableRow> value) {
                TableRow row = value.getValue();
                TableReference reference = new TableReference();
                reference.setProjectId("XYZ");
                reference.setDatasetId("ABC");
                System.out.println("date of row " + row.get("authorized_transaction_date_yyyymmdd").toString());
                LOG.info("date of row " + row.get("authorized_transaction_date_yyyymmdd").toString());
                String str = row.get("authorized_transaction_date_yyyymmdd").toString();
                str = str.substring(0, str.length() - 2) + "01";
                System.out.println("str value " + str);
                LOG.info("str value " + str);
                reference.setTableId("TargetTable$" + str);
                return new TableDestination(reference, null);
            }
        }).withFormatFunction(new SerializableFunction<TableRow, TableRow>() {
            @Override
            public TableRow apply(TableRow input) {
                LOG.info("format function:" + input.toString());
                return input;
            }
        })
        .withSchema(schema1).withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
Now, instead of this, I have to use DynamicDestinations and do the partitioning. Any solution?
Based on seeing TableRow in your code, I suspect that you want to write a PCollection to BigQuery, sending different elements to different BigQuery tables. BigQueryIO.write() already provides a method to do that, using BigQueryIO.write().to(DynamicDestinations). See Writing different values to different BigQuery tables in Apache Beam.
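A minimal sketch of that approach, adapted to the monthly-partition naming from your snippet (the XYZ/ABC project and dataset IDs, the column name, and schema1 are carried over from your code; everything else is an assumption, and in practice the schema may need to be carried in a serializable form such as a JSON string):
merge.apply("write into target", BigQueryIO.writeTableRows()
        .to(new DynamicDestinations<TableRow, String>() {
            @Override
            public String getDestination(ValueInSingleWindow<TableRow> element) {
                // key each element by the first day of its month, as in the TableDestination function above
                String str = element.getValue().get("authorized_transaction_date_yyyymmdd").toString();
                return str.substring(0, str.length() - 2) + "01";
            }

            @Override
            public TableDestination getTable(String partition) {
                return new TableDestination("XYZ:ABC.TargetTable$" + partition, null);
            }

            @Override
            public TableSchema getSchema(String partition) {
                return schema1; // assumes schema1 is a TableSchema reachable (and serializable) here
            }
        })
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));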

Refreshing a list of WebElements after deleting a line

I am writing an automated test for a dynamic web table using Selenium WebDriver with ChromeDriver and TestNG.
The objective is to assert that a certain table entry is there, delete it if it is, and then assert that it is deleted. This second assert is not working properly, however.
During the first assert I call the method that creates a list of WebElements and gets the number of rows. I use this number to know when to stop iterating through the table.
The second assert uses the same table to do the same thing, but now the DOM has changed, and there are only 18 rows left in my table where there were 19 before. As soon as the iteration tries to get the 19th row I get the following:
org.openqa.selenium.NoSuchElementException: no such element: Unable to
locate element: {"method":"xpath","selector":".//tr[19]/td[1]"}
I have tried creating a new instance of the MyWishlistsPage, but this new instance also sees the "old" number of table rows.
I have also tried a thread sleep and a driver refresh after the row delete, but this doesn't help either (the piece of code is still there, commented out).
I ended up altering my test class to go to another page and then return to the MyWishlistsPage. This works, but it's a sloppy workaround that I'm not happy with.
Can anyone tell me how I can get the correct number of rows after deleting an entry from the table?
This is a piece of the class for the page that I have the problem with:
public class MyWishlistsPage
{
    private WebDriver driver;

    public MyWishlistsPage(WebDriver driver)
    {
        this.driver = driver;
        // This call sets the WebElements
        PageFactory.initElements(driver, this);
    }

    public Boolean isWishlistAvailable(String nameToAssert)
    {
        // This list gets the number of rows from the table
        List<WebElement> rows = driver.findElements(By.xpath(".//tr"));
        // This loop finds the first row whose title matches nameToAssert
        for (int i = 1; i < rows.size(); i++)
        {
            String sValue = driver.findElement(By.xpath(".//tr[" + i + "]/td[1]")).getText();
            if (sValue.equalsIgnoreCase(nameToAssert))
            {
                return true;
            }
        }
        return false;
    }

    public void deleteWishlistsEntry(String sRowValue)
    {
        // This list gets the number of rows from the table
        List<WebElement> rows = driver.findElements(By.xpath(".//tr"));
        // This loop finds the first row whose title matches sRowValue
        for (int i = 1; i < rows.size(); i++)
        {
            String sValue = driver.findElement(By.xpath(".//tr[" + i + "]/td[1]")).getText();
            if (sValue.equalsIgnoreCase(sRowValue))
            {
                // If the sValue matches the description, the element in the seventh column of the row will be clicked
                driver.findElement(By.xpath(".//tr[" + i + "]/td[7]/a/i")).click();
                driver.switchTo().alert().accept();
                /*
                try
                {
                    Thread.sleep(10000);
                } catch (InterruptedException ex)
                {
                    Thread.currentThread().interrupt();
                }
                driver.navigate().refresh();
                */
            }
        }
    }
}
This is a piece of the test class I am calling the page from:
Assertions.assertThat(mywishlistspage.isWishlistAvailable(listToAssert)).as("The list you were trying to delete did not exist, and an attempt to create it failed ").isTrue();
//Deletes the chosen list
mywishlistspage.deleteWishlistsEntry(listToAssert);
//Verifies that list has been deleted
//homepage.clickMyAccountPage();
//myaccountpage.goToMyWishlistsPage();
Assertions.assertThat(mywishlistspage.isWishlistAvailable(listToAssert)).as("The list you tried to delete is still there").isFalse();
In deleteWishlistsEntry, it first gets the number of rows. After clicking on the "anchor" the line item is deleted, so you have to break out of the loop once it has clicked the anchor tag. But in your code you keep looping until the 19th element, which is why you get "Unable to locate element: {"method":"xpath","selector":".//tr[19]/td[1]"}".
public void deleteWishlistsEntry(String sRowValue)
{
    // This list gets the number of rows from the table
    List<WebElement> rows = driver.findElements(By.xpath(".//tr"));
    // This loop finds the first row whose title matches sRowValue
    for (int i = 1; i < rows.size(); i++)
    {
        String sValue = driver.findElement(By.xpath(".//tr[" + i + "]/td[1]")).getText();
        if (sValue.equalsIgnoreCase(sRowValue))
        {
            // If the sValue matches the description, the element in the seventh column of the row will be clicked
            driver.findElement(By.xpath(".//tr[" + i + "]/td[7]/a/i")).click();
            driver.switchTo().alert().accept();
            break;
        }
    }
}
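If the second isWishlistAvailable call still sees the old row count, you could also add an explicit wait on the row count in the test, between the delete and the second assert. A sketch, assuming the test has access to the driver and Selenium's support/ui package is on the classpath (the 10-second timeout is an arbitrary choice):
// remember the row count, delete, then wait until the table shrinks
int rowsBeforeDelete = driver.findElements(By.xpath(".//tr")).size();
mywishlistspage.deleteWishlistsEntry(listToAssert);
new WebDriverWait(driver, 10).until(
        ExpectedConditions.numberOfElementsToBeLessThan(By.xpath(".//tr"), rowsBeforeDelete));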

Univocity - parse each TSV file row to a different type of class object

I have a TSV file which has fixed rows, but each row is mapped to a different Java class.
For example:
recordType recordValue1
recordType recordValue1 recordValue2
For the first row I have the following class:
public class FirstRow implements ItsvRecord {
    @Parsed(index = 0)
    private String recordType;

    @Parsed(index = 1)
    private String recordValue1;

    public FirstRow() {
    }
}
and for the second row I have:
public class SecondRow implements ItsvRecord {
    @Parsed(index = 0)
    private String recordType;

    @Parsed(index = 1)
    private String recordValue1;

    @Parsed(index = 2)
    private String recordValue2; // the second row type carries an extra value

    public SecondRow() {
    }
}
I want to parse the TSV file directly to the respective objects but I am falling short of ideas.
Use an InputValueSwitch. This will match a value in a particular column of each row to determine what RowProcessor to use. Example:
Create two (or more) processors for each type of record you need to process:
final BeanListProcessor<FirstRow> firstProcessor = new BeanListProcessor<FirstRow>(FirstRow.class);
final BeanListProcessor<SecondRow> secondProcessor = new BeanListProcessor<SecondRow>(SecondRow.class);
Create an InputValueSwitch:
//0 means that the first column of each row has a value that
//identifies what is the type of record you are dealing with
InputValueSwitch valueSwitch = new InputValueSwitch(0);
//assigns the first processor to rows whose first column contain the 'firstRowType' value
valueSwitch.addSwitchForValue("firstRowType", firstProcessor);
//assigns the second processor to rows whose first column contain the 'secondRowType' value
valueSwitch.addSwitchForValue("secondRowType", secondProcessor);
Parse as usual:
TsvParserSettings settings = new TsvParserSettings(); //configure...
// your row processor is the switch
settings.setProcessor(valueSwitch);
TsvParser parser = new TsvParser(settings);
Reader input = new StringReader(""+
"firstRowType\trecordValue1\n" +
"secondRowType\trecordValue1\trecordValue2");
parser.parse(input);
Get the parsed objects from your processors:
List<FirstRow> firstTypeObjects = firstProcessor.getBeans();
List<SecondRow> secondTypeObjects = secondProcessor.getBeans();
The output will be*:
[FirstRow{recordType='firstRowType', recordValue1='recordValue1'}]
[SecondRow{recordType='secondRowType', recordValue1='recordValue1', recordValue2='recordValue2'}]
* Assuming you have a sane toString() implemented in your classes
If you want to manage associations among the objects that are parsed:
If your FirstRow should contain the elements parsed for records of type SecondRow, simply override the rowProcessorSwitched method:
InputValueSwitch valueSwitch = new InputValueSwitch(0) {
    @Override
    public void rowProcessorSwitched(RowProcessor from, RowProcessor to) {
        if (from == secondProcessor) {
            List<FirstRow> firstRows = firstProcessor.getBeans();
            FirstRow mostRecentRow = firstRows.get(firstRows.size() - 1);
            // pass a copy so that clearing the processor's list doesn't empty what was just attached
            mostRecentRow.addRowsOfOtherType(new ArrayList<SecondRow>(secondProcessor.getBeans()));
            secondProcessor.getBeans().clear();
        }
    }
};
The above assumes your FirstRow class has an addRowsOfOtherType method that takes a list of SecondRow as a parameter.
And that's it!
You can even mix and match other types of RowProcessor. There's another example here that demonstrates this.
Hope this helps.

HBase MapReduce - How to access individual columns of the table?

I have a table called User with two columns, one called visitorId and the other called friend which is a list of strings. I want to check whether the VisitorId is in the friendlist. Can anyone direct me as to how to access the table columns in a map function?
I'm not able to picture how data is output from a map function in HBase.
My code is as follows:
public class MapReduce {
    static class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {
        private int numRecords = 0;
        private static final IntWritable ONE = new IntWritable(1);
        private Text text = new Text();

        @Override
        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
            // What should I do here??
            ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
            try {
                context.write(userKey, ONE);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "CheckVisitor");
        job.setJarByClass(MapReduce.class);
        Scan scan = new Scan();
        Filter f = new RowFilter(CompareOp.EQUAL, new SubstringComparator("mId2"));
        scan.setFilter(f);
        scan.addFamily(Bytes.toBytes("visitor"));
        scan.addFamily(Bytes.toBytes("friend"));
        TableMapReduceUtil.initTableMapperJob("User", scan, Mapper1.class, ImmutableBytesWritable.class, IntWritable.class, job);
    }
}
So the Result values instance would contain the full row from the scanner.
To get the appropriate columns from the Result I would do something like:
VisitorIdVal = value.getColumnLatest(Bytes.toBytes(columnFamily1), Bytes.toBytes("VisitorId"))
friendlistVal = value.getColumnLatest(Bytes.toBytes(columnFamily2), Bytes.toBytes("friendlist"))
Here VisitorIdVal and friendlistVal are of type KeyValue (http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/KeyValue.html); to get their values out you can do Bytes.toString(VisitorIdVal.getValue()).
Once you have extracted the values from the columns you can check for "VisitorId" in "friendlist".
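Putting that together inside the map method, a sketch might look like this (the family/qualifier names and the assumption that friendlist is stored as a comma-separated string are guesses about your schema; adjust to your actual layout):
@Override
public void map(ImmutableBytesWritable row, Result values, Context context)
        throws IOException, InterruptedException {
    // assumed families/qualifiers: "visitor:VisitorId" and "friend:friendlist"
    KeyValue visitorIdVal = values.getColumnLatest(Bytes.toBytes("visitor"), Bytes.toBytes("VisitorId"));
    KeyValue friendlistVal = values.getColumnLatest(Bytes.toBytes("friend"), Bytes.toBytes("friendlist"));
    if (visitorIdVal == null || friendlistVal == null) {
        return; // row is missing one of the columns
    }
    String visitorId = Bytes.toString(visitorIdVal.getValue());
    String friendlist = Bytes.toString(friendlistVal.getValue());
    // assumption: friendlist is a comma-separated string of ids
    if (friendlist.contains(visitorId)) {
        context.write(row, new IntWritable(1));
    }
}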