Chainlink node: What to do when transactions are pending?

I have a chainlink node, and there are transactions that seem to be stuck. How can I fix pending outgoing confirmations?

Most often, you did not fund your Chainlink node account with ETH to pay for gas. Go to your configuration, grab the ACCOUNT_ADDRESS, and send ETH to that address.
The second most common cause is that you're running an outdated version of the Chainlink node. Please use 0.9.4 or higher.
For pending outgoing confirmations, there are three common causes:
1. You did not fund your Chainlink node account with ETH for gas.
2. The Ethereum network is congested.
3. The MIN_OUTGOING_CONFIRMATIONS variable in your .env is too high (it falls back to a default if unset).
If you see pending transactions at your oracle contract's address, it is likely #2. If you don't see any, it's likely #1.
If #1, just send some ETH to your node and it should be fine. You can find your node's ACCOUNT_ADDRESS on the configuration page of your Chainlink GUI.
If #2, you have two options:
You can delete the pending transactions from your database and hope everything clears up. (This is an acceptable solution, assuming the consuming smart contracts have a retry mechanism for requesting your data. But while your node stays stuck, no one will be able to get data.)
-- Clear job runs that are stuck waiting on outgoing confirmations:
DELETE FROM job_runs WHERE status = 'pending_outgoing_confirmations';
-- Clear transaction attempts that never confirmed:
DELETE FROM tx_attempts WHERE confirmed = 'f';
If you need those transactions to go through, you can rebroadcast them with a higher gas price to push them through faster. This is a little trickier to do manually. Chainlink nodes have a built-in system to bump and rebroadcast stuck transactions without you having to do anything, so hopefully that will kick in.
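Mechanically, "rebroadcasting with a higher gas price" means submitting a new transaction with the same nonce as the stuck one but a higher gasPrice, so miners prefer the replacement. As a rough sketch of the raw JSON-RPC shape only (all addresses and values below are placeholders; note that eth_sendTransaction requires the Ethereum client itself to manage the key, whereas a Chainlink node normally signs locally and uses eth_sendRawTransaction):
{"jsonrpc":"2.0","id":1,"method":"eth_sendTransaction","params":[{"from":"0xYourNodeAccount","to":"0xOracleContract","nonce":"0x1a","gasPrice":"0x12a05f2000","data":"0x..."}]}
Here "0x1a" stands in for the stuck transaction's nonce and "0x12a05f2000" (80 gwei) for the bumped gas price.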
If #3, set or lower MIN_OUTGOING_CONFIRMATIONS in your .env and restart the node.
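For example, to require only three confirmations (3 is just an illustration; the right value is a trade-off between speed and certainty of finality), add to your .env:
MIN_OUTGOING_CONFIRMATIONS=3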
For Pending Incoming Confirmations:
This is likely due to network congestion, or you're working on a network that doesn't have many transactions. An easy fix is to reduce the number of required incoming confirmations and restart your node.
In your .env file, add or set the following:
MIN_INCOMING_CONFIRMATIONS=0

Related

CHAINLINK NODE - Your node is overloaded and may start missing jobs ERROR

Running a test node in GCP, using Docker 9.9.4, Ubuntu, a Postgres DB, and Infura. I had issues with public/private IPs, but once I cleared that up my node was up and running. It is now throwing the error below repeatedly, potentially due to the blockchain connection. How do I fix this?
[ERROR] HeadTracker: dropping head 26085153 with hash 0xf50e19099b7e343829935d70dd7d86c5bc0398286b7a4e4f32ac033ac60c3733 because queue is full. WARNING: Your node is overloaded and may start missing jobs. logger/default.go:155 stacktrace=github.com/smartcontractkit/chainlink/core/logger.Errorf
This log output is related to an overload of your blockchain connection.
This notification is usually related to the use of public websocket connections and/or a free-tier third-party NaaS provider. To fix this connection issue you can either run your own full node or change the tier of the third-party NaaS provider. It is also recommended to use Chainlink version 0.10.8 or higher, as the HeadTracker has been revised there and performs more efficiently.
Regarding the question, let me try to give you a small technical overview, which may clarify the load a Chainlink node puts on its remote full node:
Your Chainlink node establishes a connection to a full node. The Chainlink node then initiates various subscriptions, a feature of the websocket protocol that enables bidirectional communication. More precisely, this means the Chainlink node is informed whenever the "state" of a subscription changes. The node interacts with the full node via JSON-RPC, using the following methods to initiate and process various functions internally:
eth_getBlockByNumber, eth_getBalance, eth_getTransactionReceipt, eth_getTransactionCount, eth_getLogs, eth_subscribe, eth_unsubscribe, eth_sendRawTransaction and eth_call
https://ethereum.org/uk/developers/docs/apis/json-rpc/
Most of the Chainlink node's interactions take place during the syncing process, via the internal HeadTracker service. This service initiates a "head" subscription in order to process every single incoming new block header.
During this syncing process it uses the JSON-RPC methods eth_getBlockByNumber and eth_getBalance to get all the necessary information from the block, so these two methods are called for every block. The number of requests therefore depends on the average block time of the network the Chainlink node is connected to.
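For illustration, the head subscription is opened with a standard JSON-RPC message over the websocket (this is the generic Ethereum wire format documented at the link below, not a Chainlink-specific payload):
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}
The full node then pushes a notification for every new block header, and each notification triggers the per-block follow-up calls described above.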
An example would be the Kovan testnet:
The average block time there is about 4 seconds, i.e. roughly 86,400 / 4 ≈ 21,600 new heads, or approx. 21,000 subscription updates per day, with about twice that many JSON-RPC requests for the two per-block methods.
Fulfilling job requests additionally involves the following methods: eth_getTransactionReceipt, eth_sendRawTransaction, eth_getLogs, eth_subscribe, eth_unsubscribe, eth_getTransactionCount and eth_call, which increases the total number of requests significantly, depending on the number of jobs.
It should also be noted that faster blockchains (e.g. Polygon) put an especially high load on the websocket connection, so you have to look closely at the quality of your full-node connection, as many full nodes are not provisioned to handle such a sustained request rate.

New transaction in blockchain network

I am new to blockchain technology and have a basic question.
I understand that in any blockchain network, if any node tries to commit something which is not in sync with the other nodes, it gets rejected. Then how is a new transaction committed and validated? Who has the authority to do it?
That's the thing about blockchain: there is no authority that determines which block gets added to the chain. And by blockchain, I mean public blockchain.
Blockchains are typically either public or permissioned.
Public
Public blockchains, such as Bitcoin and Ethereum, work on the principle of proof of work. In layman's terms, if a participant wants their transaction to be processed, i.e. added to the chain, they submit it to the network. The transaction is then processed by independent entities called miners, who have to solve a computational puzzle in order to produce a valid block; if the block is accepted, the miner is compensated in the network's digital currency for the work put in. Also, the longest chain is always accepted as the valid chain.
There is absolutely no criterion or organisation that oversees mining, meaning anybody can become a miner and start contributing. So the network is for the people, by the people: anybody can join and both submit and process transactions.
If the transaction is valid, that is, you own the coins and are not double-spending them, it will be processed by a miner. And if the block produced by the miner is accepted, so is your transaction.
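To make the "computational puzzle" concrete, here is a toy proof-of-work sketch in Scala (not the real Bitcoin block format; difficulty is simplified to a required number of leading zero hex digits in the SHA-256 digest):
import java.security.MessageDigest

object ToyProofOfWork {
  private def sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Search for a nonce whose hash meets the difficulty target
  // (here: the hex digest must start with `difficulty` zeros).
  def mine(blockData: String, difficulty: Int): (Long, String) = {
    val target = "0" * difficulty
    var nonce = 0L
    var hash = sha256Hex(blockData + nonce)
    while (!hash.startsWith(target)) {
      nonce += 1
      hash = sha256Hex(blockData + nonce)
    }
    (nonce, hash)
  }

  def main(args: Array[String]): Unit = {
    // "blockData" stands in for the previous block hash plus the transactions.
    val (nonce, hash) = mine("prevHash|tx1|tx2|", difficulty = 5)
    println(s"Found nonce $nonce, hash $hash")
  }
}
The asymmetry is the point: finding the nonce takes many hash attempts, but any node can verify a claimed nonce with a single hash, which is why the network can accept or reject blocks independently without a central authority.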
Private/Permissioned
On the other hand, in the case of private/permissioned blockchains like Hyperledger Fabric, participation and block processing are controlled by one or more organisations. Hence, a block is processed only if it is produced by a valid member and endorsed by nodes of all participating organisations.
You said "if any node tries to commit something which is not in sync with other nodes"; what I take from this is that you are asking about a block that one node produces but the blockchain rejects. This happens when two nodes race to find the proof of work: one finds it first and broadcasts it to the network, but due to network delay (there can be other reasons too), the other node doesn't receive that block in time and produces its own. This is how stale/uncle blocks are created. The Bitcoin blockchain considers the longest chain valid and discards the others.

How does Ordering Node Synchronization work?

How can a newly added orderer download the ledger, given that ordering nodes are not connected to each other and Kafka keeps messages for only 7 days?
Also, if I shut down an orderer node for more than 7 days and then bring it back up, it will not find the transactions from those 7 days in the Kafka partition, so how will it sync and update its local ledger?
In 1.0, Kafka brokers are to be set with log.retention.ms = -1 (source: documentation, Step 4e; see the snippet after the list below).
This disables time-based retention and prevents segments from expiring. This means that:
A partition hosts the entire transaction history of the channel.
A new orderer service node (OSN) can be added at any point in time and use the Kafka brokers to sync up with all channels in their entirety.
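Concretely, that setting lives in each Kafka broker's configuration (server.properties); a minimal sketch:
# Disable time-based retention so the partition keeps the channel's entire history
log.retention.ms=-1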
In a minor release within the 1.x track we will support ledger pruning for the OSNs. This means that the brokers will only need to maintain a pruned sequence of the transaction history (that will always start from a configuration block), and any new OSN will only be able to sync back to that configuration block.

Can Akka Cluster Client Send Messages to Cluster Nodes Not in Initial Contacts?

Using Akka 2.3.14, I'm trying to create an Akka cluster of various services. Until now, I have had all my "services" in one artifact that was clustered across multiple nodes, but now I am trying to break this artifact into multiple services that all exist on the same cluster.
So in breaking this up, we've designed it so that any node in the cluster will first try to connect to the seed nodes. If there is no seed node, it checks whether it is a candidate to run as a seed node (i.e. it's on a host a seed node can run on), in which case it grabs an open seed-node port and becomes a seed node. So in this sense, any service in the cluster can become a seed node.
At least, that was the idea. Our API into this system, which runs as a separate service, implements a ClusterClient to talk to it. The initialContacts are set to be the same as the seed nodes. The problem is that the only receptionist actors I can send a message to through the ClusterClient are the actors on the seed nodes.
Here is an example, if it helps. Let's say I have a String Service and a Double Service, and the receptionist for each service is a StringActor and a DoubleActor respectively. Now let's say I have a Client Service which sends StringMessages and DoubleMessages to the StringActor and DoubleActor.
So for simplicity, let's say I have two nodes, server1 and server2 then:
seed-nodes = ["akka.tcp://system@server1:2773", "akka.tcp://system@server2:2773"]
My ClusterClient would be initialized like so:
system.actorOf(
  ClusterClient.props(
    Set(
      system.actorSelection("akka.tcp://system@server1:2773/user/receptionist"),
      system.actorSelection("akka.tcp://system@server2:2773/user/receptionist")
    )
  ),
  "clusterClient"
)
Here are the scenarios that are happening for me:
If the StringServices start up on both servers first, then DoubleMessages from the Client Service just disappear into the ether.
If the DoubleServices start up on both servers first, then StringMessages from the Client Service just disappear into the ether.
If the StringService starts up first on serverX and the DoubleService starts up first on serverY, then all StringMessages will be sent to serverX and all DoubleMessages will be sent to serverY, which is not as bad as the above case, but it means it's not really scaling.
This isn't what I expected; it's possible it's just a defect in my code, so I would like to know whether this is expected behavior or not. And if not, is there another Akka concept that could help me with this?
Arguably, I could just make one service type my entry point, like a RoutingService that could accept StringMessages or DoubleMessages, and then send that to the correct service. But if the Client Service can only send messages to the RoutingService instances that are in the initial contacts, then I can't dynamically scale the RoutingService because no matter how many nodes I add the Client Service can only send to the initial contacts.
I'm also thinking about subscribing to ClusterEvents in my Client Service and seeing if I can add and remove initial contacts from my cluster client as nodes are started up in the cluster, but I'm not sure if this is possible, and it feels like there should be a better solution.
This is what I found out upon more troubleshooting, in case it helps anyone else:
The ClusterClient will attempt to connect to the initial contacts in order, and then only sends its messages across that one connection. If you are deploying different services on each node, you will have problems, as the messages sent from the ClusterClient will only go to the node it has connected to. In this way, you can think of the ClusterClient as a conventional client: it connects to an address you give it and then keeps communicating with the server through that address.
Reading the Distributed Workers example, I realized that my Frontend, in this case my routing service, should actually be part of the cluster rather than acting as a client. For this I used the DistributedPubSub approach instead, as sketched below.
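For reference, here is a minimal sketch of that approach against the Akka 2.3 contrib API (the actor name "stringActor" and the message handling are made up for the example): each service registers its receptionist-style actor with the mediator, and the routing service, itself a cluster member, sends by logical path without caring which node the actor lives on.
import akka.actor._
import akka.contrib.pattern.{ DistributedPubSubExtension, DistributedPubSubMediator }

class StringActor extends Actor {
  // Register this actor's path with the local mediator; the registration
  // is replicated to the mediators on all cluster nodes. Note: start this
  // actor as system.actorOf(Props[StringActor], "stringActor") so its path
  // matches the Send path below.
  val mediator = DistributedPubSubExtension(context.system).mediator
  mediator ! DistributedPubSubMediator.Put(self)

  def receive = {
    case msg: String => // ... handle the StringMessage ...
  }
}

class RoutingService extends Actor {
  val mediator = DistributedPubSubExtension(context.system).mediator

  def receive = {
    // Deliver to one registered "/user/stringActor" somewhere in the cluster;
    // localAffinity = false allows routing to an instance on any node.
    case msg: String =>
      mediator ! DistributedPubSubMediator.Send("/user/stringActor", msg, localAffinity = false)
  }
}
Because every node's mediator knows about every registered path, adding more StringActor nodes scales the routing automatically, which is exactly what the initial-contacts limitation of ClusterClient prevented.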

Akka Cluster remove heartbeat connection message

What does the INFO message of
FailureDetector(akka://MyCluster) - Remove heartbeat connection [akka://MyCluster#127.0.0.1:35250]
in an Akka cluster mean? I can't seem to find anything in the documentation. I'm seeing this a fair bit when running lots of JVMs with actors on a test machine, but not sure if it's a bad sign requiring some kind of Akka or Linux tuning.
Akka 2.1.4 on Oracle JDK 1.7
Update:
Having followed cmbaxter's advice, I investigated options for tuning heartbeats. I found that increasing or decreasing the timings associated with heartbeats had no effect on the presence of the 'Remove heartbeat connection' messages. However, I noticed the 'monitored-by-nr-of-members' configuration setting. I now believe the messages indicate that monitoring of heartbeats from a particular node is being handed off from one ActorSystem to another. Hence they indicate the current system simply stating that the monitoring is no longer its own responsibility, rather than any kind of connectivity warning. Indeed, during system start-up the first node receives a great many 'First heartbeat' messages but then removes most of those connections, as per the 'monitored-by-nr-of-members' setting, as the load is passed to other nodes.
The message you are seeing is coming from the AccrualFailureDetector class in Akka. According to the docs:
The nodes in the cluster monitor each other by sending heartbeats to detect if a
node is unreachable from the rest of the cluster. The heartbeat arrival times are
interpreted by an implementation of The Phi Accrual Failure Detector.
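For reference, the failure detector is tuned under akka.cluster.failure-detector in your configuration. A sketch with roughly the defaults of that Akka era (check your version's reference.conf for the authoritative values):
akka.cluster.failure-detector {
  # Sensitivity of the phi accrual detector; higher means fewer false
  # positives but slower detection of real crashes.
  threshold = 8.0
  # How long to tolerate pauses (GC, network hiccups) before suspecting a node.
  acceptable-heartbeat-pause = 3 s
  # How many cluster members monitor each node's heartbeats.
  monitored-by-nr-of-members = 5
}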
My guess here is that a cluster node (running locally, on port 35250) has become unreachable enough times that it has been deemed to no longer be part of the cluster. When that happens, the heartbeat check to that node is removed and thus you see this message. If you believe that this node was not unreachable and thus should not have been removed from the cluster heartbeat, then you might have an issue. Take a look at the Cluster Docs here under the Failure Detector section for more info on how to tune the failure detection.