Does HDFS support INotify with file open event?

Does HDFS support INotify with a file open event? From the documentation it seems that HDFS inotify only supports append, close, create, and so on, and does not include file open, while Linux inotify supports IN_OPEN.

It does not. The HDFS INotify feature is based on the EditLog, which only contains write transactions. Thus INotify is only aware of modifications to the namespace, not accesses such as file opens.
Source: HDFS-6634, the JIRA that originally added this feature.

Related

How to write Parquet files on HDFS using C++?

I need to write in-memory data records to an HDFS file in Parquet format using C++. I know there is a parquet-cpp library on GitHub, but I can't find example code.
Could anybody share a copy of, or a link to, example code if you have any? Thanks.
There are examples for parquet-cpp in the GitHub repo, in the examples directory. They only deal with Parquet, though, and do not involve HDFS access.
For HDFS access from C++, you will need libhdfs from Apache Hadoop. Alternatively, you can use Apache Arrow, which has HDFS integration, as described here.
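If it helps to get something working end to end before wiring up the C++ side, Arrow's Python bindings (pyarrow) exercise the same HDFS-plus-Parquet path in a few lines. This is only a rough sketch: the NameNode host, port, and file paths are placeholders, and the HDFS API differs between pyarrow versions.

import pyarrow as pa
import pyarrow.parquet as pq

# Build an in-memory table from plain Python data (stand-in for your records).
table = pa.Table.from_pydict({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Connect to HDFS. Host and port are placeholders for your NameNode.
# Older pyarrow releases expose pyarrow.hdfs.connect(); newer ones use
# pyarrow.fs.HadoopFileSystem instead.
fs = pa.hdfs.connect(host="namenode-host", port=8020)

# Write the table as a Parquet file directly onto HDFS.
with fs.open("/data/records.parquet", "wb") as f:
    pq.write_table(table, f)

The Arrow C++ API offers the same two building blocks, an HDFS filesystem and a Parquet writer, so the overall structure carries over once you switch languages.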

How to write to ORC files using BucketingSink in Apache Flink?

I'm working on a Flink streaming program that reads Kafka messages and dumps them into ORC files on AWS S3. I found that there is no documentation on integrating Flink's BucketingSink with an ORC file writer, and there is no existing ORC file writer implementation that can be used with BucketingSink.
I'm stuck here, any ideas?
I agree, a BucketingSink writer for ORC files would be a great feature. However, it hasn't been contributed to Flink yet. You would have to implement such a writer yourself.
I'm sure the Flink community would help with designing and reviewing the writer if you would consider contributing it to Flink.

WSO2 CEP export event streams, event receivers etc

I am new to WSO2 CEP and I would like to ask whether anybody knows how to export the event streams, event receivers, etc. that I have created through the WSO2 CEP Management Console, in order to have a backup.
Thanks in advance!
CEP deployable artifacts that are created from the Management Console are stored in the file system. You can find them in the <CEP_HOME>/repository/deployment/server folder.
Event Streams can be found in directory eventstreams
Event Receivers can be found in directory eventreceivers
Event Publishers can be found in directory eventpublishers
Execution Plans can be found in directory executionplans
Then, just create a backup of the above directories.
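If you want to script the backup, a minimal sketch in Python could look like the following; CEP_HOME and the backup destination are placeholders for your own paths (dirs_exist_ok needs Python 3.8+).

import shutil
from pathlib import Path

# Placeholders: point these at your actual CEP installation and backup target.
CEP_HOME = Path("/opt/wso2cep")
BACKUP_DIR = Path("/backups/cep-artifacts")

SERVER_DIR = CEP_HOME / "repository" / "deployment" / "server"

# Copy the artifact directories listed above into the backup location.
for name in ["eventstreams", "eventreceivers", "eventpublishers", "executionplans"]:
    src = SERVER_DIR / name
    if src.exists():
        shutil.copytree(src, BACKUP_DIR / name, dirs_exist_ok=True)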
When you create artifacts from the Management Console they are persisted in the file system under CEP_HOME/repository/deployment/server. Under this directory you can find event streams, receivers, execution plans, etc., so you can take a backup of this folder to back up your current artifacts.
Regards

WSO2 CEP STREAMS

I created input and output streams in my WSO2 CEP (v3.1.0), along with an event formatter and event builder. I need to find out where these streams are created in the WSO2 CEP directory structure, because I can't find anything beyond the event builder and formatter (wso2cep-3.1.0\repository\deployment\server).
Does anyone know where I can find these stream files?
Kacu
I managed to load streams via XML (only during startup) by modifying the stream-definitions.xml file in the wso2cep-3.1.0/repository/conf/data-bridge folder.
You can take a look at this page in the documentation for more details; just keep in mind that the location given in the documentation doesn't match what I found on the server.
In CEP 3.1.0, event streams are stored in the registry (which ships with CEP); they are not stored in the file system. Streams can be found under the governance section of the registry (see the streamdefinitions subdirectory).
Regards,
Mohan

How to put input file automatically in hdfs?

In Hadoop we always put input files into HDFS manually with the -put command. Is there any way to automate this process?
There is no built-in automated process for loading files into the Hadoop file system. However, it is possible to -put or -get multiple files with one command.
Here is the website for the Hadoop shell commands
http://hadoop.apache.org/common/docs/r0.18.3/hdfs_shell.html
I am not sure how many files you are dropping into HDFS, but one solution for watching for files and then dropping them in is Apache Flume. These slides provide a decent intro.
You can think of automating this process with the Fabric library and Python. Write the hdfs put command in a function, and you can call it for multiple files and perform the same operations on multiple hosts in the network. Fabric should be really helpful for automating your scenario.
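For example, with Fabric's 1.x API it could look roughly like this; the host name, paths, and the push_to_hdfs function name are just placeholders for your environment.

# Rough sketch using the Fabric 1.x API, as suggested above.
from fabric.api import env, put, run

env.hosts = ["edge-node.example.com"]  # a host with the hadoop client installed

def push_to_hdfs(local_path, hdfs_dir="/user/hadoop/input"):
    """Copy a local file to the remote host, then put it into HDFS."""
    remote_tmp = "/tmp/" + local_path.split("/")[-1]
    put(local_path, remote_tmp)                                # ship the file to the edge node
    run("hadoop fs -put {} {}".format(remote_tmp, hdfs_dir))   # load it into HDFS
    run("rm -f {}".format(remote_tmp))                         # remove the temporary copy

You could then run it as fab push_to_hdfs:/data/input.csv, or call it from a cron job to pick up new files on a schedule.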