How to modify the block time of a running private Ethereum network

I have a private Ethereum network that is running on geth 1.8 using PoA consensus. It consists of two nodes - one sealer node and one bootnode/RPC API node. When I created the genesis file I set the block time to 3s, but it generates too much data this way and I want to set it to ~10s. How can I do this without losing previous transactions and data?

Once the network has started, the block time is fixed forever in PoA consensus. There is no command-line option for it. In the genesis of Clique (geth's implementation of PoA), the block time is the "period": 3 (3 seconds) setting in
"clique": {
"period": 3,
"epoch": 30000
}
I think you are aware of this already, so unless you change the protocol itself (how it copes with existing blockchain data when the block time changes, or how the block time is set), you have no other option for now.

You will have to change the genesis block with the new period in seconds and restart the network. There is no way to do this once the network has started.
Note: Of course the network will lose all data.
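As a rough sketch (the 10 below is just the ~10 s target from the question; the data directory is a placeholder), the new genesis.json would carry the longer period:
"clique": {
"period": 10,
"epoch": 30000
}
and each node would then be re-initialised from it after its old chain data has been removed, e.g.:
geth init --datadir <datadir> genesis.json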

Related

Clickhouse 2 Nodes Cluster (Keeper Replication)

Is a 2-node ClickHouse setup possible? If yes, would it be bad or unreasonable?
Most tutorials require a 3 node setup. For example the official documentation:
https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper
I found this example with a two-node setup and I'm not sure if it's a good idea.
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper/
The alternative would be to use the second server for backups (with https://github.com/AlexAkulov/clickhouse-backup), but I would prefer a replication setup.
Thanks for any hint!
3 clickhouse-keeper nodes are required to avoid a split-brain situation, where the connection between servers is lost and each server thinks "I'm the leader".
So you can just set up two clickhouse-server nodes plus 1 separate clickhouse-keeper,
and use Engine=ReplicatedMergeTree.
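As an illustration (the table, cluster and ZooKeeper path names below are made up, not from the question), a replicated table on the two clickhouse-server nodes would look roughly like this, assuming both servers list the single clickhouse-keeper instance in their <zookeeper> config section and have the {shard}/{replica} macros defined:
CREATE TABLE events ON CLUSTER my_cluster
(
    event_date Date,
    user_id UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_date, user_id);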
I found this example with a two-node setup and I'm not sure if it's a good idea.
I made this example. Some people use this in production, and they are happy, though their ingestion rate is very low.
2 nodes are not a problem (split brain is impossible with Keeper and ZooKeeper). But with 2 nodes of Keeper or ZooKeeper you don't have high availability: if one Keeper node is down, the second node stops serving requests and Replicated tables in ClickHouse become read-only on all replicas.
Embedded Keeper, or Keeper on the same server as ClickHouse, is a bad idea if you have a heavily loaded system with a high ingestion rate. The ClickHouse server utilizes all CPU and all I/O, so Keeper or ZooKeeper produces timeout errors because it is starved of CPU and especially I/O (the disk is too slow for it).

CHAINLINK NODE - Your node is overloaded and may start missing jobs ERROR

Running a test node in GCP, using Docker 9.9.4, Ubuntu, a Postgres DB, and Infura. I had issues with public/private IPs, but once I cleared that up my node was up and running. It is now throwing the error below repeatedly, potentially due to the blockchain connection. How do I fix this?
[ERROR] HeadTracker: dropping head 26085153 with hash 0xf50e19099b7e343829935d70dd7d86c5bc0398286b7a4e4f32ac033ac60c3733 because queue is full. WARNING: Your node is overloaded and may start missing jobs. logger/default.go:155 stacktrace=github.com/smartcontractkit/chainlink/core/logger.Errorf
This log output is related to an overload of your blockchain connection.
This notification is usually related to the usage of public websocket connections and/or a free tier of a third-party NaaS provider. To fix this connection issue you can either run your own full node or change the tier or the third-party NaaS provider itself. It is also recommended to use Chainlink version 0.10.8 or higher, as the HeadTracker has been revised there and performs more efficiently.
In regard to the question, let me try to give you a small technical overview, which may clarify the load a Chainlink node puts on its remote full node:
Your Chainlink node establishes a connection to a full node. There the Chainlink node initiates various subscriptions, which are a special feature of the websocket protocol to enable bidirectional communication. More precisely, this means that the Chainlink node is informed if a certain "state" of the subscription changes. Basically, the node interacts with the full node via JSON-RPC and uses the following methods to initiate and process various functions internally:
eth_getBlockByNumber, eth_getBalance, eth_getTransactionReceipt, eth_getTransactionCount, eth_getLogs, eth_subscribe, eth_unsubscribe, eth_sendRawTransaction and eth_call
https://ethereum.org/uk/developers/docs/apis/json-rpc/
The bulk of the Chainlink node's interactions are executed during the syncing process via the internal HeadTracker service. This service initiates a "head" subscription in order to process every single incoming new block header.
During this syncing process it uses the JSON-RPC methods eth_getBlockByNumber and eth_getBalance to get all the necessary information from the block, so these two methods are used/executed for every block. The number of requests therefore depends on the average block time of the network the Chainlink node is connected to.
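At the wire level, that head subscription and the per-block follow-up are plain JSON-RPC messages over the websocket, roughly like this (illustrative payloads; the block number is made up):
{"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}
and then, for every newHeads notification, a follow-up such as:
{"jsonrpc":"2.0","id":2,"method":"eth_getBlockByNumber","params":["0x18d2f31",false]}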
An example would be the Kovan Testnet:
The avg. block time there is 6.7 s, which means you get a daily request count of approx. 21,000.
When fulfilling job requests, those requests also include the following methods: eth_getTransactionReceipt, eth_sendRawTransaction, eth_getLogs, eth_subscribe, eth_unsubscribe, eth_getTransactionCount and eth_call, which increases the total significantly, depending on the number of job requests.
It should also be noted that, especially with faster blockchains (e.g. Polygon), the WebSocket payload is very high, and you have to look closely at the quality of the full node connection, as many full nodes cannot sustain such a high number of requests permanently.

Akka design: How to add/remove routee from cluster aware router dynamically

I have the following use case and I am not sure if the Akka toolkit provides this out of the box:
I have a number of nodes (instances/machines) that can run a finite number of long-running tasks in the background and cannot accept more work while at max capacity.
Each instance can only process 50 tasks.
All instances are behind a load balancer.
Each task can respond to messages from the client who initiated it; since the client sends the messages via the load balancer, the instances need to route them to the correct instance that handles the task.
I have tried initially cluster sharding, but there doesn't seem to be a way to cap the maximum number of shard regions/actors per node (= #tasks).
Then I tried it with a cluster-aware router, which acts as a guard for accepting or rejecting work. This seems to work reasonably well; one problem is that once it reaches capacity I need to remove it as a routee and add it back once it has capacity again.
Is there something out of the box that supports this use case or should I carry on with the routing option and if so how can I achieve this?
I'll update the description if you have further questions or something is unclear.
Your scenario sounds like a good fit for the work pulling pattern. The gist of this pattern is:
A master actor coordinates units of work among a number of worker actors.
Workers register themselves to the master, meaning that workers can be added or removed dynamically.
When the master receives work to be done, the master notifies the workers that work is available. Workers pull units of work when they're ready, do what needs to be done with their respective units of work, then ask the master for more work when they're finished.
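A minimal classic-actors sketch of that flow is below; all message and actor names are illustrative, not an Akka API, and in your case the worker's pull would only happen while it is below its 50-task limit:
import akka.actor.{Actor, ActorRef, ActorSystem, Props, Terminated}
import scala.collection.mutable

case object RegisterWorker
case object WorkAvailable
case object GimmeWork
final case class Work(payload: String)
final case class WorkDone(result: String)

class Master extends Actor {
  private val workers = mutable.Set.empty[ActorRef]
  private val pending = mutable.Queue.empty[Work]

  def receive: Receive = {
    case RegisterWorker =>
      context.watch(sender())            // drop the worker again if it terminates
      workers += sender()
    case Terminated(worker) =>
      workers -= worker
    case work: Work =>
      pending.enqueue(work)
      workers.foreach(_ ! WorkAvailable) // only notify; never push work
    case GimmeWork =>
      if (pending.nonEmpty) sender() ! pending.dequeue()
    case WorkDone(result) =>
      println(s"done: $result")          // collect / forward results here
  }
}

class Worker(master: ActorRef) extends Actor {
  override def preStart(): Unit = master ! RegisterWorker

  def receive: Receive = {
    case WorkAvailable =>
      master ! GimmeWork                 // pull only while below local capacity
    case Work(payload) =>
      // the long-running task would run here (e.g. on a dedicated dispatcher)
      master ! WorkDone(payload.reverse)
      master ! GimmeWork
  }
}

object Demo extends App {
  val system = ActorSystem("work-pulling")
  val master = system.actorOf(Props(new Master), "master")
  (1 to 2).foreach(i => system.actorOf(Props(new Worker(master)), s"worker-$i"))
  master ! Work("hello")
}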
To learn more about this pattern, read the following (the first two links are listed in the Akka documentation):
The original post (by Derek Wyatt): http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2
A follow-on post (by Michael Pollmeier): http://www.michaelpollmeier.com/akka-work-pulling-pattern
An application of the pattern in a clustered environment with a cluster-aware router (by Ryan Tanner): https://www.conspire.com/blog/2013/10/akka-at-conspire-part-5-the-importance-of/

Elasticsearch percolation dead slow on AWS EC2

Recently we switched our cluster to EC2 and everything is working great... except percolation :(
We use Elasticsearch 2.2.0.
To reindex (and percolate) our data we use a separate EC2 c3.8xlarge instance (32 cores, 60GB, 2 x 160 GB SSD) and tell our index to include only this node in allocation.
Because we'll distribute it amongst the rest of the nodes later, we use 10 shards, no replicas (just for indexing and percolation).
There are about 22 million documents in the index and 15,000 percolators. The index is a tad smaller than 11GB (and so easily fits into memory).
About 16 PHP processes talk to the REST API doing multi percolate requests with 200 percolations in each (we made it smaller because of performance; it was 1,000 per request before).
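For context, one of those multi percolate calls looks roughly like this in ES 2.x (index, type and field names here are made up), with one header/doc pair per percolation and ~200 such pairs per request:
POST /_mpercolate
{"percolate": {"index": "products", "type": "product"}}
{"doc": {"name": "some product", "brand": "acme"}}
{"percolate": {"index": "products", "type": "product"}}
{"doc": {"name": "another product", "brand": "acme"}}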
One percolation request (a real one, tapped off of the php processes running) is taking around 2m20s under load (of the 16 php processes). That would've been ok if one of the resources on the EC2 was maxed out but that's the strange thing (see stats output here but also seen on htop, iotop and iostat): load, cpu, memory, heap, io; everything is well (very well) within limits. There doesn't seem to be a shortage of resources but still, percolation performance is bad.
When we back off the php processes and try the percolate request again, it comes out at around 15s. Just to be clear: I don't have a problem with a 2min+ multi percolate request. As long as I know that one of the resources is fully utilized (and I can act upon it by giving it more of what it wants).
So, ok, it's not the usual suspects, let's try different stuff:
To rule out network, coordination, etc issues we also did the same request from the node itself (enabling the client) with the same pressure from the php processes: no change
We upped the processors configuration in elasticsearch.yml and restarted the node to fake our way to a higher usage of resources: no change.
We tried tweaking the percolate and get pool size and queue size: no change.
When we looked at the hot threads, we saw that UsageTrackingQueryCachingPolicy was coming up a lot, so we did as suggested in this issue: no change.
Maybe it's the number of replicas, seeing that Elasticsearch uses those to do searches as well? We upped it to 3 and used more EC2 instances to spread them out: no change.
To determine if we could actually use all resources on EC2, we did stress tests and everything seemed fine, getting it to loads of over 40. Also IO, memory, etc showed no issues under high strain.
It could still be the batch size. Under load we tried a batch of just one percolator in a multi percolate request, directly on the data & client node (dedicated to this index) and found that it used 1m50s. When we tried a batch of 200 percolators (still in one multi percolate request) it used 2m02s (which fits roughly with the 15s result of earlier, without pressure).
This last point might be interesting! It seems that it's stuck somewhere for a loooong time and then goes through the percolate phase quite smoothly.
Can anyone make anything out of this? Anything we have missed? We can provide more data if needed.
Have a look at the thread on the Elastic Discuss forum to see the solution.
TLDR;
Use multiple nodes on one big server to get better resource utilization.

Amazon Elasticache Failover

We have been using AWS ElastiCache for about 6 months now without any issues. Every night we have a Java app that runs which flushes DB 0 of our Redis cache and then repopulates it with updated data. However, we had 3 occasions between July 31 and August 5 where our DB was successfully flushed and then we were not able to write the new data to the database.
We were getting the following exception in our application:
redis.clients.jedis.exceptions.JedisDataException: redis.clients.jedis.exceptions.JedisDataException: READONLY You can't write against a read only slave.
When we look at the cache events in Elasticache we can see
Failover from master node prod-redis-001 to replica node prod-redis-002 completed
We have not been able to diagnose the issue and since the app was running fine for the past 6 months I am wondering if it is something related to a recent Elasticache release that was done on the 30th of June.
https://aws.amazon.com/releasenotes/Amazon-ElastiCache
We have always been writing to our master node and we only have 1 replica node.
If someone could offer any insight it would be much appreciated.
EDIT: This seems to be an intermittent problem. Some days it will fail, other days it runs fine.
We have been in contact with AWS support for the past few weeks and this is what we have found.
Most Redis requests are synchronous, including the flush, so it will block all other requests. In our case we are actually flushing 19m keys and it takes more than 30 seconds.
ElastiCache performs a health check periodically, and since the flush is running, the health check is blocked, thus causing a failover.
We have been asking the support team how often the health check is performed so we can get an idea of why our flush is only causing a failover 3-4 times a week. The best answer we can get is "We think it's every 30 seconds". However, our flush consistently takes more than 30 seconds and doesn't consistently fail.
They said that they may implement the ability to configure the timing of the health check; however, this would not be done anytime soon.
The best advice they could give us is:
1) Create a completely new cluster for loading the new data on, and instead of flushing the previous cluster, re-point your application(s) to the new cluster, and remove the old one.
2) If the data that you are flushing is an updated version of the data, consider not flushing, but updating and overwriting the keys.
3) Instead of flushing the data, set the expiry of the items to be when you would normally flush, and let the keys be reclaimed (possibly with a random time to avoid thundering herd issues), and then reload the data.
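For option 3, a rough sketch of what that looks like with Jedis (which the stack trace above suggests you already use); the endpoint, key names, TTL and jitter window are placeholders:
import redis.clients.jedis.Jedis
import scala.util.Random

object RepopulateSketch extends App {
  val jedis = new Jedis("your-elasticache-endpoint", 6379) // placeholder endpoint
  val baseTtlSeconds = 24 * 60 * 60                        // roughly one load cycle
  val jitterSeconds  = 60 * 60                             // spread expiries to avoid a thundering herd

  // Instead of FLUSHDB + full rewrite, write/overwrite keys with an expiry
  // so stale entries age out on their own around the next load.
  def writeEntry(key: String, value: String): Unit = {
    val ttl = baseTtlSeconds + Random.nextInt(jitterSeconds)
    jedis.setex(key, ttl, value)
  }

  writeEntry("product:42", """{"price": 10}""")
  jedis.close()
}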
Hope this helps :)
Currently, for Redis versions from 6.2, AWS ElastiCache has a new feature of thread monitoring, so the health check doesn't happen in the same thread as all other Redis actions. Redis can continue to process a long command / Lua script but will still be considered healthy. Because of this new feature, failovers should happen less often.