Univocity - Issue in file header validation during multiple reads

I'm using Univocity-Parsers' bean iterator to read a file line by line and map each record to a bean. I have observed weird behavior in the library when attempting to read the same file multiple times.
Here is the code, passing the File object to the CsvParser instance:
private static void testBeanIterator() throws Exception {
    try {
        File sampleFile = generateFile(0);
        /*
        System.out.println("Sample file content = " + FileUtils.readFileToString(sampleFile,
            Charset.defaultCharset()));
        */
        for (int i = 0; i < 1000; i++) {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(new FileInputStream(sampleFile),
                    StandardCharsets.UTF_8));
            AtomicInteger atomicInteger = new AtomicInteger();
            final BeanProcessor<CustomerSegmentMapping> rowProcessor =
                new BeanProcessor<CustomerSegmentMapping>(CustomerSegmentMapping.class) {
                    @Override
                    public void beanProcessed(@Nonnull final CustomerSegmentMapping customerSegmentMapping,
                                              @Nonnull final ParsingContext context) {
                        try {
                            System.out.println(OBJECT_MAPPER.writeValueAsString(customerSegmentMapping));
                            atomicInteger.getAndAdd(1);
                        } catch (Exception ex) {
                            throw new RuntimeException("error");
                        }
                    }
                };
            rowProcessor.setStrictHeaderValidationEnabled(true);
            final CsvParserSettings parserSettings = new CsvParserSettings();
            parserSettings.setRowProcessor(rowProcessor);
            parserSettings.setHeaderExtractionEnabled(true);
            final CsvParser parser = new CsvParser(parserSettings);
            //parser.parse(reader);
            parser.parse(sampleFile);
            System.out.println("Finished parser");
            if (atomicInteger.get() != 10) {
                throw new Exception("mismatch");
            }
            reader.close();
        }
    } catch (Exception ex) {
        throw new RuntimeException("exception = " + ex.getMessage(), ex);
    } finally {
    }
}
On executing the code, the following is the console output:
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
{"customerId":"6bc12a7a-2c28-4aea-a7be-6be45e16ffb2","segmentId":"S1"}
{"customerId":"da736310-e508-47ff-92b8-59d490e37a72","segmentId":"S1"}
{"customerId":"9a5d4454-e6d4-49a5-bb04-8354154d0493","segmentId":"S1"}
{"customerId":"ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5","segmentId":"S1"}
{"customerId":"94ea24b0-0c83-4039-a391-1d2439c88be8","segmentId":"S1"}
{"customerId":"2baef5f9-d8cd-451d-b579-a626cb58b284","segmentId":"S1"}
{"customerId":"022a184b-1b06-49aa-b1c4-b94a6f343b04","segmentId":"S1"}
{"customerId":"bcb3984c-0495-4da8-b146-9af3983cc158","segmentId":"S1"}
{"customerId":"feef62de-1aaf-43d4-a83b-afe053db97cf","segmentId":"S1"}
{"customerId":"5825c924-55d5-4fd6-8468-ca36d47a7cae","segmentId":"S1"}
Finished parser
Exception in thread "main" java.lang.RuntimeException: exception = Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:95)
at com.poppins.cube.common.UnivocityNahiHatanaHai.main(UnivocityNahiHatanaHai.java:37)
Caused by: com.univocity.parsers.common.DataProcessingException: Could not find fields [CustomerId]' in input. Names found: [ustomerId, SegmentId]
Internal state when error was thrown: line=2, column=0, record=1, charIndex=60, headers=[ustomerId, SegmentId]
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapFieldIndexes(BeanConversionProcessor.java:414)
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.mapValuesToFields(BeanConversionProcessor.java:340)
at com.univocity.parsers.common.processor.core.BeanConversionProcessor.createBean(BeanConversionProcessor.java:508)
at com.univocity.parsers.common.processor.core.AbstractBeanProcessor.rowProcessed(AbstractBeanProcessor.java:54)
at com.univocity.parsers.common.Internal.process(Internal.java:21)
at com.univocity.parsers.common.AbstractParser.rowProcessed(AbstractParser.java:596)
at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:133)
at com.univocity.parsers.common.AbstractParser.parse(AbstractParser.java:605)
at com.poppins.cube.common.UnivocityNahiHatanaHai.testBeanIterator(UnivocityNahiHatanaHai.java:83)
... 1 more
Process finished with exit code 1
The following is the content of the file:
CustomerId,SegmentId
6bc12a7a-2c28-4aea-a7be-6be45e16ffb2,S1
da736310-e508-47ff-92b8-59d490e37a72,S1
9a5d4454-e6d4-49a5-bb04-8354154d0493,S1
ec2ed5cc-cd18-443b-bd69-e56fc09ba0f5,S1
94ea24b0-0c83-4039-a391-1d2439c88be8,S1
2baef5f9-d8cd-451d-b579-a626cb58b284,S1
022a184b-1b06-49aa-b1c4-b94a6f343b04,S1
bcb3984c-0495-4da8-b146-9af3983cc158,S1
feef62de-1aaf-43d4-a83b-afe053db97cf,S1
5825c924-55d5-4fd6-8468-ca36d47a7cae,S1
From what I can understand, the issue arises because I'm passing a File object to the CsvParser. CsvParser internally creates an InputStream for it which is never closed.
If I pass a BufferedReader instead of a File object, the issue does not arise.
I'm unable to tell whether this is a known issue with Univocity-Parsers or whether there is something I'm missing.
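For completeness, the reader-based variant that does not exhibit the problem is just the commented-out line in the code above; a minimal sketch, assuming the same parserSettings:

final CsvParser parser = new CsvParser(parserSettings);
// A Reader opened (and closed) by the caller parses cleanly on every iteration:
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileInputStream(sampleFile), StandardCharsets.UTF_8))) {
    parser.parse(reader);
}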

Author of the library here. I can see from your exception that it got the header ustomerId instead of CustomerId.
This looks like a bug introduced in version 2.5.0 that was fixed in version 2.5.6, if I'm not mistaken. This plagued me for a while, as it was an internal concurrency issue that was hard to track down. Basically, when you pass a File without an explicit encoding, the parser will try to find a UTF BOM marker in the input (effectively consuming the first character) to determine the encoding automatically. This happened only for InputStreams and Files.
Anyway, this has been fixed, so simply updating to the latest version should get rid of the problem for you (please let me know if you are not using version 2.5.something).
If you want to remain on the version you currently have, the error will be gone if you call
parser.parse(sampleFile, Charset.defaultCharset());
This will prevent the parser from trying to discover whether there's a BOM marker in your file, therefore avoiding that pesky bug.
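For reference, a minimal sketch of that workaround in the context of the code above (any explicit Charset works; UTF-8 here matches the reader-based variant):

final CsvParser parser = new CsvParser(parserSettings);
// With an explicit encoding the parser never consumes input to auto-detect a BOM:
parser.parse(sampleFile, StandardCharsets.UTF_8);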
Hope this helps

Related

SolrJ - NPE when accessing SolrCloud

I'm running the following test code on SolrCloud using the SolrJ library:
public static void main(String[] args) {
    String zkHostString = "192.168.56.99:2181";
    SolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
    List<MyBean> beans = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        // creating a bunch of MyBean to be indexed
        // and temporarily storing them in a List
        // no Solr operations performed here
    }
    System.out.println("Adding...");
    try {
        solr.addBeans("myCollection", beans);
    } catch (IOException | SolrServerException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    System.out.println("Committing...");
    try {
        solr.commit("myCollection");
    } catch (SolrServerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
This code fails with the following exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1175)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1057)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:106)
at org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:357)
at org.apache.solr.client.solrj.SolrClient.addBeans(SolrClient.java:312)
at com.togather.solr.testing.SolrIndexingTest.main(SolrIndexingTest.java:83)
This is the full stack trace of the exception. I just "upgraded" from a standalone Solr installation to SolrCloud (with an external single-instance ZooKeeper, not the embedded one). With standalone Solr the same code (with just some minor differences, like the host URL) used to work perfectly.
The NPE sends me inside the SolrJ library, which I don't know.
Can anyone help me understand where the problem originates and how I can overcome it? Due to my inexperience and the brevity of the error message, I can't figure out where to start inquiring.
Looking at your code, I would suggest specifying the default collection first:
CloudSolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
solr.setDefaultCollection("myCollection");
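With that in place, the add and commit calls can rely on the default collection; a minimal sketch continuing from the code above:

// These now target "myCollection" implicitly:
solr.addBeans(beans);
solr.commit();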
Regarding the NPE you're experiencing, it is very likely due to a network error.
The exception is raised in these lines of CloudSolrClient, by the for loop for (DocCollection ext : requestedCollections):
if (wasCommError) {
    // it was a communication error. it is likely that
    // the node to which the request to be sent is down . So , expire the state
    // so that the next attempt would fetch the fresh state
    // just re-read state for all of them, if it has not been retired
    // in retryExpiryTime time
    for (DocCollection ext : requestedCollections) {
        ExpiringCachedDocCollection cacheEntry = collectionStateCache.get(ext.getName());
        if (cacheEntry == null) continue;
        cacheEntry.maybeStale = true;
    }
    if (retryCount < MAX_STALE_RETRIES) {//if it is a communication error , we must try again
        //may be, we have a stale version of the collection state
        // and we could not get any information from the server
        //it is probably not worth trying again and again because
        // the state would not have been updated
        return requestWithRetryOnStaleState(request, retryCount + 1, collection);
    }
}

Using a decision within a Java delegate

I am trying to evaluate a decision inside a Camunda Java delegate that I created. The code I am using follows. Upon executing the delegate (which runs fine without the DMN part), I get an error stating:
java.lang.NoClassDefFoundError: de/odysseus/el/util/SimpleContext
I am using Gradle and have added the following to my build file:
compile 'org.camunda.bpm.dmn:camunda-engine-dmn',
        'org.camunda.bpm.dmn:camunda-engine-feel-juel:7.5.0-alpha2',
        'de.odysseus.juel:juel-spi:2.2.7',
        'de.odysseus.juel:juel-api:2.2.7',
        'de.odysseus.juel:juel-impl:2.2.7'
Any suggestions on how I can fix this error? Thanks.
DMN Code:
DmnEngine dmnEngine = DmnEngineConfiguration.createDefaultDmnEngineConfiguration().buildEngine();
// read the DMN XML file as input stream
InputStream inputStream = CheckDatafileExistsExecutor.class.getResourceAsStream("decision1.xml");
// parse the DMN decision from the input stream
DmnDecision decision = dmnEngine.parseDecision("Decision_13nychf", inputStream);
//accessing the input variables
VariableMap variables = Variables.fromMap((Map<String, Object>) decision);
// evaluate the decision table with the input variables
DmnDecisionTableResult result = dmnEngine.evaluateDecisionTable(decision, variables);
int size = result.size();
DmnDecisionRuleResult ruleResult = result.get(0);
Remove all your dependencies and add only compile group: 'org.camunda.bpm.dmn', name: 'camunda-engine-dmn', version: '7.6.0'.
You can also try menski's example, but change camunda-engine-dmn to 7.6.0,
and change
DmnDecisionTableResult results = dmnEngine.evaluateDecisionTable("decision", "Example.dmn", variables);
to
InputStream fileAsStream = IoUtil.fileAsStream("Example.dmn");
DmnDecisionTableResult results = dmnEngine.evaluateDecisionTable("decision", fileAsStream, variables);
You can use the DecisionService from within a delegate to evaluate a decision by key or id, e.g.:
public void execute(DelegateExecution delegateExecution) throws Exception {
    DecisionService decisionService = delegateExecution.getProcessEngineServices().getDecisionService();
    // dmnToInvoke holds the key of the decision to evaluate
    decisionService.evaluateDecisionByKey(dmnToInvoke).variables(delegateExecution.getVariables()).evaluate();
}
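If you also need the evaluation result, a hypothetical follow-up sketch (the decision key is taken from the question; getSingleResult()/getSingleEntry() assume exactly one matching rule with one output column):

DmnDecisionTableResult result = decisionService
        .evaluateDecisionTableByKey("Decision_13nychf", delegateExecution.getVariables());
String output = result.getSingleResult().getSingleEntry();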

Random "error reading MIME multipart body" in Web API

Occasionally, an upload of a gzipped file from a phone app to the web service fails with the following error:
Error reading MIME multipart body part.
at System.Net.Http.HttpContentMultipartExtensions.<MultipartReadAsync>d__8.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.Http.HttpContentMultipartExtensions.<ReadAsMultipartAsync>d__0`1.MoveNext()
The endpoint itself is pretty basic:
[System.Web.Http.HttpPost]
public async Task<HttpResponseMessage> TechAppUploadPhoto()
{
    if (!Request.Content.IsMimeMultipartContent())
    {
        return Request.CreateErrorResponse(HttpStatusCode.UnsupportedMediaType, "The request isn't valid!");
    }
    try
    {
        var provider = new MultipartMemoryStreamProvider();
        await Request.Content.ReadAsMultipartAsync(provider);
        foreach (StreamContent file in provider.Contents)
        {
            Stream dataStream = await file.ReadAsStreamAsync();
            String fileName = file.Headers.ContentDisposition.FileName;
            fileName = [unique name];
            String filePath = Path.Combine(ConfigurationManager.AppSettings["PhotoUploadLocation"], fileName);
            using (var fileStream = File.Create(filePath))
            {
                dataStream.Seek(0, SeekOrigin.Begin);
                dataStream.CopyTo(fileStream);
                fileStream.Close();
                dataStream.Close();
            }
            // Enable overwriting with ZipArchive
            using (ZipArchive archive = ZipFile.OpenRead(filePath))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    entry.ExtractToFile(Path.Combine(ConfigurationManager.AppSettings["PhotoUploadLocation"], entry.FullName), true);
                }
            }
            File.Delete(filePath);
        }
        return Request.CreateResponse(HttpStatusCode.Accepted);
    }
    catch (Exception e)
    {
        return Request.CreateErrorResponse(HttpStatusCode.InternalServerError, "PostCatchErr: " + e.Message + e.StackTrace);
    }
}
This error appears to be pretty random and hard to reproduce. Unfortunately, I didn't write either end and don't have much experience here, but it doesn't appear to be related to the size of the upload either: the limit in the web config is large, and files larger than the ones that error out can be uploaded successfully. Is there something I'm missing that could cause this issue? The body only appears to be read once, which is the other possible cause I've found. Any thoughts?
This answer on a similar question helped me.
Note that the value for maxRequestLength given there is incorrect, though: the setting is specified in kilobytes, so it should be 30000 for roughly 30 MB (as pointed out in one of the comments on the answer).

C++/CX - DataReader out of bounds exception

I have the following code which opens a file. It works the first time, but after that I get exceptions thrown, and I don't know where the problem is hiding. I've been looking into this for a couple of days already, but no luck.
String^ xmlFile = "Assets\\TheXmlFile.xml";
xml = ref new XmlDocument();
StorageFolder^ InstallationFolder = Windows::ApplicationModel::Package::Current->InstalledLocation;
task<StorageFile^>(
    InstallationFolder->GetFileAsync(xmlFile)).then([this](StorageFile^ file) {
        if (nullptr != file) {
            task<Streams::IRandomAccessStream^>(file->OpenAsync(FileAccessMode::Read)).then([this](Streams::IRandomAccessStream^ stream)
            {
                IInputStream^ deInputStream = stream->GetInputStreamAt(0);
                DataReader^ reader = ref new DataReader(deInputStream);
                reader->InputStreamOptions = InputStreamOptions::Partial;
                reader->LoadAsync(stream->Size);
                strXml = reader->ReadString(stream->Size);
                MessageDialog^ dlg = ref new MessageDialog(strXml);
                dlg->ShowAsync();
            });
        }
    });
The error is triggered at this part of the code:
strXml = reader->ReadString(stream->Size);
I get the following error:
First-chance exception at 0x751F5B68 in XmlProject.exe: Microsoft C++ exception: Platform::OutOfBoundsException ^ at memory location 0x02FCD634. HRESULT:0x8000000B The operation attempted to access data outside the valid range
WinRT information: The operation attempted to access data outside the valid range
Just like I said, the first time it just works, but after that I get the error. I tried detaching the stream and buffer of the DataReader, and tried to flush the stream, but no results.
I've also asked this question on the Microsoft C++ forums and, credit to the user "Viorel_", I managed to get it working. Viorel said the following:
Since LoadAsync does not perform the operation immediately, you should probably add a corresponding “.then”. See some code: https://social.msdn.microsoft.com/Forums/windowsapps/en-US/94fa9636-5cc7-4089-8dcf-7aa8465b8047. This sample uses “create_task” and “then”: https://code.msdn.microsoft.com/vstudio/StreamSocket-Sample-8c573931/sourcecode (file Scenario1.xaml.cpp, for example).
I had to separate the content of the task<Streams::IRandomAccessStream^> and split it up into separate tasks.
I reconstructed my code and now have the following:
String^ xmlFile = "Assets\\TheXmlFile.xml";
xml = ref new XmlDocument();
StorageFolder^ InstallationFolder = Windows::ApplicationModel::Package::Current->InstalledLocation;
task<StorageFile^>(
    InstallationFolder->GetFileAsync(xmlFile)).then([this](StorageFile^ file) {
        if (nullptr != file) {
            task<Streams::IRandomAccessStream^>(file->OpenAsync(FileAccessMode::Read)).then([this](Streams::IRandomAccessStream^ stream)
            {
                IInputStream^ deInputStream = stream->GetInputStreamAt(0);
                DataReader^ reader = ref new DataReader(deInputStream);
                reader->InputStreamOptions = InputStreamOptions::Partial;
                create_task(reader->LoadAsync(stream->Size)).then([reader, stream](unsigned int size) {
                    strXml = reader->ReadString(stream->Size);
                    MessageDialog^ dlg = ref new MessageDialog(strXml);
                    dlg->ShowAsync();
                });
            });
        }
    });

GATE Embedded runtime

I want to use "GATE" through web. Then I decide to create a SOAP web service in java with help of GATE Embedded.
But for the same document and saved Pipeline, I have a different run-time duration, when GATE Embedded runs as a java web service.
The same code has a constant run-time when it runs as a Java Application project.
In the web service, the run-time will be increasing after each execution until I get a Timeout error.
Does any one have this kind of experience?
This is my Code:
@WebService(serviceName = "GateWS")
public class GateWS {
    @WebMethod(operationName = "gateengineapi")
    public String gateengineapi(@WebParam(name = "PipelineNumber") String PipelineNumber,
                                @WebParam(name = "Documents") String Docs) throws Exception {
        try {
            System.setProperty("gate.home", "C:\\GATE\\");
            System.setProperty("shell.path", "C:\\cygwin2\\bin\\sh.exe");
            Gate.init();
            File GateHome = Gate.getGateHome();
            File FrenchGapp = new File(GateHome, PipelineNumber);
            CorpusController FrenchController;
            FrenchController = (CorpusController) PersistenceManager.loadObjectFromFile(FrenchGapp);
            Corpus corpus = Factory.newCorpus("BatchProcessApp Corpus");
            FrenchController.setCorpus(corpus);
            File docFile = new File(GateHome, Docs);
            Document doc = Factory.newDocument(docFile.toURL(), "utf-8");
            corpus.add(doc);
            FrenchController.execute();
            String docXMLString = null;
            docXMLString = doc.toXml();
            String outputFileName = doc.getName() + ".out.xml";
            File outputFile = new File(docFile.getParentFile(), outputFileName);
            FileOutputStream fos = new FileOutputStream(outputFile);
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            OutputStreamWriter out;
            out = new OutputStreamWriter(bos, "utf-8");
            out.write(docXMLString);
            out.close();
            gate.Factory.deleteResource(doc);
            return outputFileName;
        } catch (Exception ex) {
            return "ERROR: -> " + ex.getMessage();
        }
    }
}
I really appreciate any help you can provide.
The problem is that you're loading a new instance of the pipeline for every request, but then not freeing it again at the end of the request. GATE maintains a list internally of every PR/LR/controller that is loaded, so anything you load with Factory.createResource or PersistenceManager.loadObjectFrom... must be freed using Factory.deleteResource once it is no longer needed, typically using a try-finally:
FrenchController = (CorpusController) PersistenceManager.loadObjectFromFile(FrenchGapp);
try {
    // ...
} finally {
    Factory.deleteResource(FrenchController);
}
But...
Rather than loading a new instance of the pipeline every time, I would strongly recommend you explore a more efficient approach to load a smaller number of instances of the pipeline but keep them in memory to serve multiple requests. There is a fully worked-through example of this technique in the training materials on the GATE wiki, in particular module number 8 (track 2 Thursday).
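A minimal sketch of that idea, not the wiki example itself; the class name, constructor signature, and corpus handling are illustrative assumptions:

import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.util.persistence.PersistenceManager;

public class PipelinePool {
    private final BlockingQueue<CorpusController> pool = new LinkedBlockingQueue<>();

    // Load a fixed number of pipeline instances once, at application startup.
    public PipelinePool(File gappFile, int size) throws Exception {
        for (int i = 0; i < size; i++) {
            pool.add((CorpusController) PersistenceManager.loadObjectFromFile(gappFile));
        }
    }

    // Borrow an instance, run it over one document, then hand it back for reuse.
    public String process(Document doc) throws Exception {
        CorpusController controller = pool.take(); // blocks until an instance is free
        Corpus corpus = Factory.newCorpus("request corpus");
        try {
            corpus.add(doc);
            controller.setCorpus(corpus);
            controller.execute();
            return doc.toXml();
        } finally {
            controller.setCorpus(null);
            Factory.deleteResource(corpus); // free the corpus, but keep the controller loaded
            pool.add(controller);
        }
    }
}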