NiFi for parsing HL7 - HDFS

I am trying to parse HL7 files stored in HDFS using NiFi, but I am not finding any examples or the proper way to achieve this. Please help.

You could pull the files out of HDFS using ListHDFS + FetchHDFS, then use the ExtractHL7Attributes processor to pull information out of each HL7 message, and from there use other processors to do whatever you want with the results.
All of the available processors along with documentation are here:
https://nifi.apache.org/docs.html

Related

How to write Parquet files on HDFS using C++?

I need to write in-memory data records to an HDFS file in Parquet format using C++. I know there is a parquet-cpp library on GitHub, but I can't find example code.
Could anybody share a copy of, or a link to, example code if you have any? Thanks.
There are examples for parquet-cpp in the GitHub repo, in the examples directory. They only deal with Parquet, though, and do not involve HDFS access.
For HDFS access from C++, you will need libhdfs from Apache Hadoop. Or you may use Apache Arrow, which has HDFS integration, as described here.

Use geospatial data type as input for Apache Beam

Is it possible to read from a geospatial data source (e.g. Shapefile, GeoPackage) using Apache Beam's Python SDK?
I found this page but don't know how to take it from there.

Solutions within WSO2 ESB for Processing Zip Files

I have the following requirement:
“Copy a zip file from an SFTP server to a directory on a local server, then unzip the file and extract 2 XML files from inside it to process in the message service we have set up within our ESB.”
I have done several searches on the internet over the past week, as well as read several topics in the WSO2 documentation, but I cannot find a clean way to implement this requirement. I found this question already asked on Stack Overflow: https://stackoverflow.com/questions/27806557/wso2-esb-extracting-and-processing-zip-files
However, I did not see any suggestions/solutions provided there. My first thought is to build a sequence with a class mediator to handle extracting the 2 XML files I need from the zip file, but maybe there is a better approach?
Are there any recommendations, links, or other references that folks could provide that would help me move forward with implementing this requirement? Or is this something I will need to handle outside of the ESB, via a script under cron control?
Please note that I'm assuming you are using ESB 4.8.1.
Since this is a specific requirement, there is no out-of-the-box solution for your scenario. However, you can easily do this using WSO2's VFS transport and a custom class mediator. The procedure would be:
Read the zip file using the VFS transport and save it on your local server.
Next, create a class mediator which unzips the file and reads your 2 XML files; a rough sketch is shown below. For more details about how to write a class mediator, please refer to the Class Mediator documentation.
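A minimal sketch of such a mediator, assuming the standard Synapse AbstractMediator API; the file paths and property names below are placeholders you would adapt to your own proxy configuration:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    import org.apache.synapse.MessageContext;
    import org.apache.synapse.mediators.AbstractMediator;

    public class UnzipMediator extends AbstractMediator {

        // Hypothetical location where the VFS proxy saved the downloaded archive
        private String zipFilePath = "/tmp/inbound/payload.zip";

        public boolean mediate(MessageContext synCtx) {
            try {
                ZipFile zip = new ZipFile(zipFilePath);
                try {
                    Enumeration<? extends ZipEntry> entries = zip.entries();
                    while (entries.hasMoreElements()) {
                        ZipEntry entry = entries.nextElement();
                        // Only pull out the XML files we care about
                        if (!entry.isDirectory() && entry.getName().endsWith(".xml")) {
                            InputStream in = zip.getInputStream(entry);
                            File out = new File("/tmp/extracted", new File(entry.getName()).getName());
                            out.getParentFile().mkdirs();
                            FileOutputStream fos = new FileOutputStream(out);
                            byte[] buf = new byte[4096];
                            int len;
                            while ((len = in.read(buf)) > 0) {
                                fos.write(buf, 0, len);
                            }
                            fos.close();
                            in.close();
                            // Expose the extracted path so later mediators in the sequence can use it
                            synCtx.setProperty("extracted." + entry.getName(), out.getAbsolutePath());
                        }
                    }
                } finally {
                    zip.close();
                }
            } catch (Exception e) {
                handleException("Failed to unzip " + zipFilePath, e, synCtx);
                return false;
            }
            return true;
        }

        public void setZipFilePath(String zipFilePath) {
            this.zipFilePath = zipFilePath;
        }
    }

The compiled class would go into a jar under repository/components/lib, and your sequence would invoke it with a <class> mediator element, setting zipFilePath as a property.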
If you need more help regarding this issue please let me know.
Thanks,
Upul
In newer versions of the ESB, the File Connector (version 2) supports zip/unzip operations: https://docs.wso2.com/display/ESBCONNECTORS/Working+with+the+File+Connector+Version+2

WSO2 CEP STREAMS

I created input and output streams in my WSO2 CEP (v3.1.0), along with an event formatter and event builder. I need to find out where these streams are created in the WSO2 CEP directory structure, because I can't find them beyond the event builder and formatter (wso2cep-3.1.0\repository\deployment\server).
Does anyone know where I can find these stream files?
Kacu
I managed to load streams via XML (only during startup) by modifying the stream-definitions.xml file in the wso2cep-3.1.0/repository/conf/data-bridge folder.
You can take a look at this page in the documentation for more details, just keep in mind that the location written in the documentation doesn't match what I found in the server.
In CEP 3.1.0, event streams are stored in the registry (which comes with CEP), not in the filesystem. Streams can be found under the governance section of the registry (see the streamdefinitions sub-directory).
Regards,
Mohan

jar containing org.apache.hadoop.hive.dynamodb

I was trying to programmatically load a DynamoDB table into HDFS (via Java, not Hive). I couldn't find examples online of how to do it, so I thought I'd download the jar containing org.apache.hadoop.hive.dynamodb and reverse engineer the process.
Unfortunately, I couldn't find that jar either :(.
Could someone answer the following questions for me (listed in order of priority)?
A Java example that loads a DynamoDB table into HDFS (one that can be passed to a mapper as a table input format).
The jar containing org.apache.hadoop.hive.dynamodb.
Thanks!
It's in hive-bigbird-handler.jar. Unfortunately, AWS doesn't provide any source, or even Javadoc, for it. But you can find the jar on any node of an EMR cluster:
/home/hadoop/.versions/hive-0.8.1/auxlib/hive-bigbird-handler-0.8.1.jar
You might want to check out this article:
Amazon DynamoDB Part III: MapReducin’ Logs
Unfortunately, Amazon haven’t released the sources for hive-bigbird-handler.jar, which is a shame considering its usefulness. Of particular note, it seems it also includes built-in support for Hadoop’s Input and Output formats, so one can write straight MapReduce jobs, writing directly into DynamoDB.
Tip: search for hive-bigbird-handler.jar to get to the interesting parts... ;-)
1- I am not aware of any such example, but you might find this library useful. It provides InputFormats, OutputFormats, and Writable classes for reading and writing data to Amazon DynamoDB tables. A rough sketch of how such an InputFormat could be wired into a job is shown after this list.
2- I don't think they have made it available publicly.
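To expand on point 1, the overall shape of such a job is a standard map-only MapReduce driver that plugs the connector's table InputFormat in as the job input and writes the items out to HDFS. The sketch below is an assumption-heavy outline: DynamoDBInputFormat, DynamoDBItemWritable, the dynamodb.* property names, and the key/value types all stand in for whatever the library you pick actually provides, and will need to be replaced accordingly.

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;

    // These two imports are assumptions about the connector library; replace them
    // with the InputFormat/Writable classes the library you choose actually provides.
    import org.apache.hadoop.dynamodb.DynamoDBItemWritable;
    import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat;

    public class DynamoDbTableToHdfs {

        // Turns each DynamoDB item into one line of text in the HDFS output.
        public static class ItemMapper extends MapReduceBase
                implements Mapper<Text, DynamoDBItemWritable, NullWritable, Text> {
            public void map(Text key, DynamoDBItemWritable item,
                            OutputCollector<NullWritable, Text> output, Reporter reporter)
                    throws IOException {
                output.collect(NullWritable.get(), new Text(item.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(DynamoDbTableToHdfs.class);
            conf.setJobName("dynamodb-table-to-hdfs");

            // Hypothetical property names -- check the connector's documentation.
            conf.set("dynamodb.table.name", "MyTable");
            conf.set("dynamodb.region", "us-east-1");

            conf.setInputFormat(DynamoDBInputFormat.class);   // the table input format
            conf.setMapperClass(ItemMapper.class);
            conf.setNumReduceTasks(0);                        // map-only export

            conf.setOutputFormat(TextOutputFormat.class);
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(Text.class);
            FileOutputFormat.setOutputPath(conf, new Path("/data/dynamodb/MyTable"));

            JobClient.runJob(conf);
        }
    }

It would be run with hadoop jar, with the connector jar available on the job's classpath.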