Validate parsed fields using Univocity Parser

I wanted to know if there is a way to check and validate a field when using the CsvRoutines package. Basically, I want to process a row if the first column has only numbers, and skip it (or possibly throw an exception) otherwise. I'm guessing the @Validate annotation released in 2.7.0 can be used to achieve this, but I would like to know if there is any other way to achieve the same with earlier versions like 2.5.9?

Author of the library here. There's no way other than updating to the latest version. Is there any reason in particular why you can't upgrade?
Update: you can put the @Parsed annotations on the class's getters or setters and perform the validations in them. That is probably the cleanest way to go about it. For example:
class Test {

    private Integer number;

    // The setter accepts a String, so this receives the value straight from the
    // parser before anything is converted into an integer - which lets you
    // validate or throw a custom exception.
    @Parsed
    void setNumber(String number) {
        try {
            this.number = Integer.valueOf(number);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(number + " is not a valid integer");
        }
    }
}
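For context, here is a minimal sketch of how such a bean is typically parsed with CsvRoutines (the inline input and header name are made up for illustration):

import java.io.StringReader;
import java.util.List;

import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;

CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);

CsvRoutines routines = new CsvRoutines(settings);
// parseAll maps each row to a Test instance; the setter above runs for every
// row, so a value like "abc" would make it throw the IllegalArgumentException.
List<Test> rows = routines.parseAll(Test.class, new StringReader("number\n10\n20"));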
Another alternative is to use a custom conversion class. Copy the code of the ValidatedConversion class used in the newest version, then create a subclass like:
public static class RangeLimiter extends ValidatedConversion {

    int min;
    int max;

    public RangeLimiter(String[] args) {
        super(false, false); // nulls and blanks are not allowed
        min = Integer.parseInt(args[0]);
        max = Integer.parseInt(args[1]);
    }

    protected void validate(Object value) {
        super.validate(value); // runs the existing validations for not null and not blank
        int v = ((Number) value).intValue();
        if (v < min || v > max) {
            throw new DataValidationException("out of range: value " + value + " must be between " + min + " and " + max);
        }
    }
}
Now, in your code, use this:
@Parsed(field = "number")
@Convert(conversionClass = RangeLimiter.class, args = {"1", "10"}) // min = 1, max = 10
public int number;
I didn't test this against an old version. I think you may need to set the flag applyDefaultConversion = false in the @Parsed annotation, and make your conversion class convert the String into an int in addition to running the validations.
All in all, that's quite a bit of work that can easily be avoided just by upgrading to the latest version.
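For comparison, once on 2.7.0+ the check from the question (only process rows whose first column is all digits) can be expressed declaratively. A sketch, assuming the matches regex attribute introduced with @Validate in 2.7.0:

import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.annotations.Validate;

class Test {

    // rejects the row (with a DataValidationException) unless the value is all digits
    @Parsed(field = "number")
    @Validate(matches = "\\d+")
    private String number;
}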

How to use IsolationForest in Weka?

I am trying to use IsolationForest in Weka, but I cannot find an easy example which shows how to use it. Who can help me? Thanks in advance.
import weka.classifiers.misc.IsolationForest;

public class Test2 {
    public static void main(String[] args) {
        IsolationForest isolationForest = new IsolationForest();
        // ...
    }
}
I strongly suggest you study the implementation of IsolationForest a little.
The following code works by loading a CSV file whose first column contains the class (note: a single class value will produce only the (1 - anomaly score); if it is binary you will get the anomaly score too, otherwise it just returns an error). Note that I skip the second column (which in my case is a UUID that is not needed for anomaly detection).
import java.io.File;
import java.io.FileWriter;
import java.util.Enumeration;

import weka.classifiers.misc.IsolationForest;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

private static void findOutlier(File in, File out) throws Exception {
    CSVLoader loader = new CSVLoader();
    loader.setSource(in);
    Instances data = loader.getDataSet();

    // setting class attribute if the data format does not provide this information
    // For example, the XRFF format saves the class attribute information as well
    if (data.classIndex() == -1)
        data.setClassIndex(0);

    String[] options = new String[2];
    options[0] = "-R"; // "range"
    options[1] = "2";  // the second attribute (the uuid column)
    Remove remove = new Remove();  // new instance of filter
    remove.setOptions(options);    // set options
    remove.setInputFormat(data);   // inform filter about dataset **AFTER** setting options
    Instances newData = Filter.useFilter(data, remove); // apply filter

    IsolationForest isolationForest = new IsolationForest();
    isolationForest.buildClassifier(newData);
    // System.out.println(isolationForest);

    FileWriter fw = new FileWriter(out);
    final Enumeration<Attribute> attributeEnumeration = data.enumerateAttributes();
    while (attributeEnumeration.hasMoreElements()) {
        fw.write(attributeEnumeration.nextElement().name());
        fw.write(",");
    }
    fw.write("(1 - anomaly score),anomaly score\n");

    for (int i = 0; i < data.size(); ++i) {
        Instance inst = data.get(i);
        final double[] distributionForInstance = isolationForest.distributionForInstance(inst);
        fw.write(inst + ", " + distributionForInstance[0] + "," + (1 - distributionForInstance[0]));
        fw.write(",\n");
    }
    fw.flush();
    fw.close();
}
The previous function will append the anomaly values to the CSV as the last columns. Please note I'm using a single class, so to get the corresponding anomaly score I compute 1 - distributionForInstance[0]; otherwise you can simply use distributionForInstance[1].
A sample input.csv for getting (1-anomaly score):
Class,ignore,feature_0,feature_1,feature_2
A,1,21,31,31
A,2,41,61,81
A,3,61,37,34
A sample input.csv for getting (1-anomaly score) and anomaly score:
Class,ignore,feature_0,feature_1,feature_2
A,1,21,31,31
B,2,41,61,81
A,3,61,37,34
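For completeness, a minimal sketch of calling the function above (the file names are made up for illustration):

public static void main(String[] args) throws Exception {
    findOutlier(new File("input.csv"), new File("output.csv"));
}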

How do you specify multiple Sort fields with Solrj?

I have an application using Solr that needs to be able to sort on two fields. The SolrJ API is a little confusing, providing multiple different methods for sorting.
I am using Solr 4.10.4
I have tried:
for (int i = 0; i < entry.getValue().size();) {
    logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(),
            entry.getValue().get(i), entry.getValue().get(i + 1));
    if (i == 0) {
        query.setSort(new SolrQuery.SortClause(entry.getValue().get(i++),
                SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
    } else {
        query.addSort(new SolrQuery.SortClause(entry.getValue().get(i++),
                SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
    }
}
When I look at the generated URL, I only see the last SortClause: sort=sequence+asc
I also tried creating a List and using the setSorts SolrQuery method, and that too seems to output only a single sort field, always the last one.
I was able to create the correct sort clause by generating it manually with strings.
I have tried addOrUpdateSort as well. I think I've tried most of the obvious combinations of methods in the SolrJ API.
This does work:
StringBuilder sortString = new StringBuilder();
for (int i = 0; i < entry.getValue().size();) {
    if (sortString.length() > 0) {
        sortString.append(",");
    }
    logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(),
            entry.getValue().get(i), entry.getValue().get(i + 1));
    sortString.append(entry.getValue().get(i++)).append(" ")
            .append(SolrQuery.ORDER.valueOf(entry.getValue().get(i++)));
}
query.set("sort", sortString.toString());
The sort clause I want to see is: sort=is_cited+asc,sequence+asc
The SolrJ API seems to only output the final clause.
I suspect a bug in SolrJ 4.10.
Can you substitute setSort with addSort, i.e.:
for (int i = 0; i < entry.getValue().size();) {
    logger.debug("Solr({}) {}: {} {}", epName, entry.getKey(),
            entry.getValue().get(i), entry.getValue().get(i + 1));
    // use addSort for every clause, including the first one
    query.addSort(new SolrQuery.SortClause(entry.getValue().get(i++),
            SolrQuery.ORDER.valueOf(entry.getValue().get(i++))));
}
And let me know if this worked
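For reference, a minimal sketch of what the two-clause version should produce, using the SortClause factory methods available since SolrJ 4.2 (field names taken from the question):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery("*:*");
query.addSort(SolrQuery.SortClause.asc("is_cited"));
query.addSort(SolrQuery.SortClause.asc("sequence"));
// query.toString() should now contain sort=is_cited+asc,sequence+asc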
Check out addOrUpdateSort():
Updates or adds a single sort field specification to the current sort
information. If the sort field already exists in the sort information map,
its position is unchanged and the sort order is set; if it does not exist,
it is appended at the end with the specified order.
@return the modified SolrQuery object, for easy chaining
@since 4.2
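Applied to the loop above, that would look roughly like this (a sketch; the field/order pairs still come from entry.getValue() as in the question):

for (int i = 0; i < entry.getValue().size();) {
    // replaces an existing clause for the same field, or appends a new one
    query.addOrUpdateSort(entry.getValue().get(i++),
            SolrQuery.ORDER.valueOf(entry.getValue().get(i++)));
}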

How to unit test a Groovy script, used in Elasticsearch for _score calculation

I want to do unit testing for a Groovy script used in Elasticsearch.
The script itself calculates a _score based on 3 parameters and a given formula.
I want to program an automated unit test for that script, to verify its correctness.
Are there any tools available that offer such functionality?
I've solved the problem by mocking/emulating the Elasticsearch environment in a TestNG test, using Groovy "magic".
Given the following Groovy script, which should compute a custom score value based on parameters and the document's height.
es_compute_custom_score.groovy
h = doc['height']

if (h <= 50) {
    // complex logic here ;-)
} else if (h < 1000) {
    // more complex logic here ;-)
} else {
    // even more complex logic here ;-)
}

_score = a * b + h
Then this unit test lets you walk the red/green/refactor TDD road...
es_compute_custom_scoreTest.groovy (assuming default Maven project layout)
import org.codehaus.groovy.control.CompilerConfiguration
import org.testng.annotations.BeforeMethod
import org.testng.annotations.DataProvider
import org.testng.annotations.Test

class es_compute_custom_scoreTest {

    private static final String SCRIPT_UNDER_TEST = 'src/main/groovy/es_compute_custom_score.groovy'

    private CompilerConfiguration compilerConfiguration
    private Binding binding

    @BeforeMethod
    public void setUp() throws Exception {
        compilerConfiguration = new CompilerConfiguration()
        // base script class emulating the Elasticsearch scripting environment
        this.compilerConfiguration.scriptBaseClass = DocumentBaseClassMock.class.name
        binding = new Binding()
    }

    @DataProvider
    public Object[][] createTestData() {
        List<Object[]> refdata = new ArrayList<>()
        refdata.add([100, 50, 5042L] as Object[])
        refdata.add([200, 50, 10042L] as Object[])
        refdata.add([300, 50, 15042L] as Object[])
        return refdata
    }

    @Test(dataProvider = 'createTestData')
    void 'calculate a custom document score, based on parameters a and b, and documents height'(Integer a, Integer b, Long expected_score) {
        // given
        binding.setVariable("a", a)
        binding.setVariable("b", b)
        binding.setVariable("doc", new MockDocument(42))

        // when
        evaluateScriptUnderTest(this.binding)

        // then
        long score = (long) this.binding.getVariable("_score")
        assert score == expected_score
    }

    private void evaluateScriptUnderTest(Binding binding) {
        GroovyShell gs = new GroovyShell(binding, compilerConfiguration)
        gs.evaluate(new File(SCRIPT_UNDER_TEST))
    }
}
class MockDocument {

    // doc['height'] in the script resolves to this property via Groovy's subscript access
    long height

    MockDocument(long height) {
        this.height = height
    }
}

Using Conversion Studio by To-Increase to import Notes into Microsoft Dynamics AX 2009

Currently, I'm using Conversion Studio to bring in a CSV file and store the contents in an AX table. This part is working. I have a block defined and the fields are correctly mapped.
The CSV file contains several comment columns, such as Comments-1, Comments-2, etc. There are a fixed number of these. The public comments are labeled Comments-1...5, and the private comments are labeled Private-Comment-1...5.
The desired result would be to bring the data into the AX table (as is currently working) and either concatenate the comment fields or store them as separate comments into the DocuRef table as internal or external notes.
Would it not require just setting up a new block in the Conversion Studio project that I already have set up? Can you point me to a resource that shows a similar procedure or how to do this?
Thanks in advance!
After chasing the rabbit down the deepest of rabbit holes, I discovered that the easiest way to do this is like so:
Override the onEntityCommit method of your Document Handler (that extends AppDataDocumentHandler), like so:
AppEntityAction onEntityCommit(AppDocumentBlock documentBlock, AppBlock fromBlock, AppEntity toEntity)
{
    AppEntityAction ret;
    int64 recId; // Should point to the record currently being imported into CMCTRS
    ;
    ret = super(documentBlock, fromBlock, toEntity);

    recId = toEntity.getRecord().recId;
    // Do whatever you need to do with the recId now

    return ret;
}
Here is my method to insert the notes, in case you need that too:
private static boolean insertNote(RefTableId _tableId, int64 _docuRefId, str _note, str _name, boolean _isPublic)
{
    DocuRef docuRef;
    boolean insertResult = false;
    ;
    if (_docuRefId)
    {
        try
        {
            docuRef.clear();

            ttsbegin;
            docuRef.RefCompanyId = curext();
            docuRef.RefTableId   = _tableId;
            docuRef.RefRecId     = _docuRefId;
            docuRef.TypeId       = 'Note';
            docuRef.Name         = _name;
            docuRef.Notes        = _note;
            docuRef.Restriction  = (_isPublic) ? DocuRestriction::External : DocuRestriction::Internal;
            docuRef.insert();
            ttscommit;

            insertResult = true;
        }
        catch
        {
            ttsabort;
            error("Could not insert " + ((_isPublic) ? "public" : "private") + " comment:\n\n\t\"" + _note + "\"");
        }
    }
    return insertResult;
}

What are the best practices for unit testing properties with code in the setter?

I'm fairly new to unit testing and we are actually attempting to use it on a project. There is a property like this:
public TimeSpan CountDown
{
    get
    {
        return _countDown;
    }
    set
    {
        // round to the nearest whole second (10,000,000 ticks = 1 second)
        long fraction = value.Ticks % 10000000;
        value -= TimeSpan.FromTicks(fraction);
        if (fraction > 5000000)
            value += TimeSpan.FromSeconds(1);

        if (_countDown != value)
        {
            _countDown = value;
            NotifyChanged("CountDown");
        }
    }
}
My test looks like this.
[TestMethod]
public void CountDownTest_GetSet_PropChangedShouldFire()
{
    ManualRafflePresenter target = new ManualRafflePresenter();
    bool fired = false;
    string name = null;
    target.PropertyChanged += new PropertyChangedEventHandler((o, a) =>
    {
        fired = true;
        name = a.PropertyName;
    });

    TimeSpan expected = new TimeSpan(0, 1, 25);
    TimeSpan actual;
    target.CountDown = expected;
    actual = target.CountDown;

    Assert.AreEqual(expected, actual);
    Assert.IsTrue(fired);
    Assert.AreEqual("CountDown", name);
}
The question is: how do I test the code in the setter? Do I break it out into a method? If I did, it would probably be private, since no one else needs to use it, but they say not to test private methods. Do I make a class if this is the only case? Would two uses of this code make a class worthwhile? What is wrong with this code from a design standpoint? What is correct?
The way you've got it is fine (call the setter and then check that the getter returns the expected value).
Make sure you choose a selection of test values that exercises all the paths in that setter; a single set/get test isn't sufficient coverage. For example: a value just under the half-second boundary (which should round down), one just over it (which should round up), an exact whole second (which should pass through unchanged), and setting the same value twice (the PropertyChanged event should fire only once).