The Apache Nifi state is not persisted to zookeeper - state

According to the official Nifi documentation, the state allows Nifi processors to "resume from the place where it left off after NiFi is restarted. Additionally, it allows for a Processor to store some piece of information so that the Processor can access that information from all of the different nodes in the cluster".
If my understanding is good, when we configure a zookeeper Provider, the state will not be persisted locally, instead, the data will be sent to zookeeper.
I've explored the zookeeper znodes and could not find any data related to the state, all I can find are the informations about the Coordinator and Primary nodes. However, the local state directory is still filled.
The configuration is very simple, I've 3 external ZK nodes and 3 Nifi instances.
Here is an exerpt of the nifi.properties file:
nifi.cluster.is.node=true
nifi.zookeeper.connect.string=zk-node1:2181,zk-node2:2181,zk-node3:2181
nifi.state.management.embedded.zookeeper.start=false
nifi.state.management.provider.cluster=zk-provider
And here is an exerpt of the state-management.xml file:
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">zk-node1:2181,zk-node2:2181,zk-node3:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
When I try to ls the Zookeeper, I can see only 2 znodes: "components" but this znode is empty and the "leaders" zonde which contain some data about the Nifi Coordinator and Primary Nodes.
Also, when I explore the transactions logs, even after using some load balanced connections, I cannot find anything related to the Nifi State.
Could somebody explain what data goes the Zookeeper and why the local state directory is still filled even if we configure the zk provider ?
Thanks.

It depends on the processor, some cases it would never make sense to store cluster wide state because it could never be picked up by another node. For example, ListFile tracking from a local directory, another node cannot access the same directory so storing this state in ZK is not helpful.
There is always a local state provider in a write-ahead-log in the state directory, and it is up to the processor to say whether it should be cluster or local state when storing it.
The documentation for each processor should say how the state is stored. For example, from ListFile:
#Stateful(scopes = {Scope.LOCAL, Scope.CLUSTER}, description = "After performing a listing of files, the timestamp of the newest file is stored. "
+ "This allows the Processor to list only files that have been added or modified after "
+ "this date the next time that the Processor is run. Whether the state is stored with a Local or Cluster scope depends on the value of the "
+ "<Input Directory Location> property.")
If Input Directory Location is "remote" then it will use cluster state, otherwise local state.

Related

How the Hadoop History Server is working?

There are 2 properties within configuration files I am confused with:
The property yarn.nodemanager.remote-app-log-dir in yarn-site.xml:
a.) This property controls, where the logs of map/reduce tasks will be logged?
b.) This is the responsibility of Node Manager (NM)?
The property mapreduce.jobhistory.done-dir from mapred-site.xml:
a.) Job related files like configurations etc. are stored in this location?
b.) This is the responsibility of Application Master (AM)?
Does the History Server (HS) combines both of these information and shows a consolidated information in UI?
Assuming you have enabled log-aggregation,
1.a. This is the log-aggregation dir, usually HDFS where NMs aggregate container-logs to.
1.b. Yes.
2.a. Yes.
2.b. No. MR JobHistory Server will do that, by deleting JobSummary file and mv other files to ${mapreduce.jobhistory.done-dir} from ${mapreduce.jobhistory.intermediate-done-dir}.
3. Yes. MR JobHistory Server Web, includes job info(from ${mapreduce.jobhistory.done-dir}) and container logs(from ${yarn.nodemanager.remote-app-log-dir}).

Is it possible to edit configuration nodes in a Node-Red flow?

In Node-Red, I'm using some Amazon Web Services nodes (from module node-red-node-aws), and I would like to read some configuration settings from a file (e.g. the access key ID & the secret key for the S3 nodes), but I can't find a way to set everything up dynamically, as this configuration has to be made in a config node, which can't be used in a flow.
Is there a way to do this in Node-Red?
Thanks!
Unless a node implementation specifically allows for dynamic configuration, this is not something that Node-RED does generically.
One approach I have seen is to have a flow update itself using the admin REST API into the runtime - see https://nodered.org/docs/api/admin/methods/post/flows/
That requires you to first GET the current flow configuration, modify the flow definition with the desired values and then post it back.
That approach is not suitable in all cases; the config node still only has a single active configuration.
Another approach, if the configuration is statically held in a file, is to insert them into your flow configuration before starting Node-RED - ie, have a place-holding config node configuration in the flow that you insert the credentials into.
Finally, you can use environment variables: if you set the configuration node's property to be something like $(MY_AWS_CREDS), then the runtime will substitute that environment variable on start-up.
You can update your package.json start script to start Node-RED with your desired credentials as environment variables:
"scripts": {
"start": "AWS_SECRET_ACCESS_KEY=<SECRET_KEY> AWS_ACCESS_KEY_ID=<KEY_ID> ./node_modules/.bin/node-red -s ./settings.js"
}
This worked perfect for me when using the node-red-contrib-aws-dynamodbnode. Just leave the credentials in the node blank and they get picked up from your environment variables.

Syncing seconday user store in WSO2 Identity Server cluster

I have setup the cluster for WSO2-IS (2 instances on different machines) based on the information provided here - https://docs.wso2.com/display/CLUSTER44x/WSO2+Clustering+and+Deployment+Guide
Setup DB with a user store, shared registry, 2 local registries
Copied the DB driver jar to component lib
Updated the master-datasource.xml
Updated the registry.xml (made sure the master is read-only false and worker is read-only true)
Updated the AXIS2.xml and used WKA for membership scheme
Performed other changes as suggested in the link
Started the master with -Dsetup option and the worker without -Dsetup option.
Verified that the governance folder is shown as a symlink
I can see the interaction between both the nodes, there are Hazelcast messages related to node joining when the worker is started.
User created in 1 is able to login to the other instance, service provider are also automatically available when viewed through UI.
The problem is that when I create a secondary user store (JDBC) in the first node and goto the list in the second node - the secondary user store is not present and I cannot view the users in the user list too.
Am I missing something or is it the way the cluster is supposed to perform i.e. secondary user stores have to be shared in some other way?
Thanks,
Vikas
Secondary user store configurations are not synced between two nodes by default. Once you create a secondary user store from UI, it will create a file in following location.
[WSO2_IS]/repository/deployment/server/userstores/
These configuration file need to copy by manually or have to use some synchronization mechanism to copy file to other node. since this is not a frequent task better to copy this file.
Fore more information
https://docs.wso2.com/display/IS500/Configuring+Secondary+User+Stores

how to config governance registry and APIM to share registry

all. I have some confusion about the registry.
1.the remote registry mount is done like this
in [1]
but it is done like
in [2] with the port NO. and /registry. are they the same??
2.I gonna install apim and IS and GREG, apim and IS should share their infomation, so that when a new tenant is registered in apim, IS should be able to use this new tenant too. My question is whether both the config and governance of both server should be configurated to GREG? because I don't know which (config or governance) folder contains the user resources?
[1] http://docs.wso2.org/display/CLUSTER420/Clustering+API+Manager
[2] http://docs.wso2.org/display/Governance453/Governance+Partition+in+a+Remote+Registry
To answer your second question, If you wanted to share same tenant information then you need to share user store and the realm database. One way of doing that is, pointing to same ldap from all nodes and refering central database for realm.
By default, IS has embedded ldap and if you do not have external/central ldap or user database that embedded ldap of IS can be use as the central user store. To do so, copy element in <IS_HOME>/repository/conf/user-mgt.xml and replace the element of user-mgt.xml in other nodes. You need to change ConnectionURL appropriately with hostname and port. If you have started IS without any port offset and all nodes run on same server, you can use
<Property name="ConnectionURL">ldap://localhost:10389</Property>
or, if the IS run on some other machine having the ip 192.168.33.66 and start with portoffset 1, connection url would be like follows,
<Property name="ConnectionURL">ldap://192.168.33.66:10380</Property>
And you need to share registry database also. To do so create a central database and create a datasource with a name like WSO2_REALM_DB and refer to jndi name of that datasource in user-mgt xml by changing the property <Property name="dataSource">jdbc/WSO2_REALM_DB</Property>. From this post you can find steps to create database and configuring datasource for that db.
Its not clear on your first question but regarding registry mounting in general,
Local partition is used to store resources specific to a node. And its
usually do not shared among nodes.(ie. APIM node 1 may have local registry 1 and APIM node 2 have local registry 2 while IS node 1 have local registry 3)
Config partition is used to store resources for specific product and its usually shared among nodes in same cluster. (ie. APIM node 1 may have config registry 1 and APIM node 2 also points to config registry 1 while IS node 1 have config registry 2)
Governance partition is used to share resources among different products.(ie. APIM node 1, APIM node 2 and IS node 1 all points to same gov registry 1)
You can more detailed explanation in this article.

Hibernate running on separate JVM fail to read

I am implementing WebService with Hibernate to write/read data into database (MySQL). One big issue I have was when I insert data (e.g., USER table) via one JVM (example: JUNit test or directly from DBUI suite) successfully, my WebService's Hibernate running on separate JVM cannot find this new data. They all point to the same DB server. It is only if I had destroyed the WebService's Hibernate SessionFactory and recreate it, then the WebService's Hibernate layer can read the new inserted data. In contrast, the same JUnit test or a direct query from DBUI suite can find the inserted data.
Any assistance is appreciated.
This issue is resolved today with the following:
I changed our Hibernate config file (hibernate.cfg.xml) to have Isolation Level to at least "2" (READ COMMITTED). This immediately resolved the issue above. To understand further about this isolation level setting, please refer to these:
Hibernate reading function shows old data
Transaction isolation levels relation with locks on table
I ensured I did not use 2nd level caching by setting CacheMode to IGNORE for each of my Session object:
Session session = getSessionFactory().openSession();
session.setCacheMode(CacheMode.IGNORE);
Reference only: Some folks did the following in hibernate.cfg.xml to disable their 2nd level caching in their apps (BUT I didn't need to):
<property name="cache.provider_class">org.hibernate.cache.internal.NoCacheProvider</property>
<property name="hibernate.cache.use_second_level_cache">false</property>
<property name="hibernate.cache.use_query_cache">false</property>