Erlang - How is the creation integer (part of a distributed pid representation) actually created?

In a distributed Erlang system pids can have two different representations: i) internal; ii) external.
The internal representation has the following shape: <A.B.C>.
The external representation, used for instance when a message has to travel across different nodes, is instead composed of the following elements: <node_id, ID, serial, creation>, according to the official documentation,
where node_id is the name of the node, ID and serial identify the process on node_id, and creation is an integer used to distinguish the node from past (crashed) versions of itself.
What I could not find is how the creation integer is created by the VM.
By running a small experiment on my PC, I have seen that if I create and kill the same node several times, the counter always increases by 1, and that by creating the same node on different machines, the creation integers are different but structurally similar, for instance:
machine 1 -> creation integer = 1647595383
machine 2 -> creation integer = 1647596018
Do any of you know how this integer is created? If so, could you please explain it to me and possibly reference some (more or less) official documentation?

The creation is sent as part of the response to node registration in epmd; see the details of that protocol.
If you have a custom erl_epmd module, you can also provide your own way of creating the creation-value.
The original creation is the local time at which the node with that name is first registered, and it is then bumped by one each time the name is re-registered. That is also why the values you observed look like Unix timestamps and are similar across machines: both registrations happened at nearly the same wall-clock time.

Related

Rebalance Akka Cluster if One of the Shards Is Not Resolving

Intermittently we are receiving the following errors:
2022-05-25 08:32:30,691 ERROR app=abc a.c.s.DDataShardCoordinator - The ShardCoordinator was unable to update a distributed state within 'updating-state-timeout': 2000 millis (retrying). Perhaps the ShardRegion has not started on all active nodes yet? event=ShardRegionRegistered(Actor[akka://application@10.52.174.4:25520/system/sharding/abcapp#-1665332307])
2022-05-25 08:32:31,348 WARN app=abc a.c.s.ShardRegion - abcapp: Trying to register to coordinator at [ActorSelection[Anchor(akka://application@10.52.103.132:25520/), Path(/system/sharding/abcappCoordinator/singleton/coordinator)]], but no acknowledgement. Total [22] buffered messages. [Coordinator [Member(address = akka://application@10.52.103.132:25520, status = Up)] is reachable.]
When we check cluster members using /cluster/members, the node "10.52.174.4:25520" is reported as:
{
  "node": "akka://application@10.52.252.4:25520",
  "nodeUid": "7353086881718190138",
  "roles": [
    "dc-default"
  ],
  "status": "Up"
},
This says the node is healthy, but the problem resolves when we remove this node from the cluster using /cluster/members/{address} (a leave operation to remove 10.52.252.4 from the cluster; once it is removed, the cluster creates a new pod and rebalances).
Need help understanding the best way of handling this error.
Thanks
You can of course implement an external control plane to parse logs and take a node exhibiting this error out of the cluster.
That said, it's better to understand what's happening here. The ShardCoordinator runs on the oldest node in the cluster, and needs to ensure that there's agreement on things like which nodes own which shards. It accomplishes this by requiring that updates be acknowledged by a majority of nodes in the cluster. If a state update isn't acknowledged, then further updates to the state (e.g. rebalances) are delayed.
I said "majority", but because in clusters where there's substantial node turnover relative to the size of the cluster simple majorities can lead to data loss, it becomes more complex. Consider a cluster of 3 nodes, N1, N2, N3. N1 (the ShardCoordinator) updates state and considers it successful when it and N3 have updated state. N1 is dropped from the cluster and replaced by N4; N2 becomes the shard coordinator (being the next oldest node) and requests state from itself and the other nodes; N4 responds first. The result becomes that the state update N1 made is lost. So two other settings come into play:
akka.cluster.coordinator-state.write-majority-plus (default 3) which adds that to the majority write requirement (rounding down)
akka.cluster.distributed-data.majority-min-cap (default 5) which requires that the majority plus the added nodes must be at least this
If the computed majority is greater than the number of nodes, the majority becomes all nodes. So in a cluster with fewer than 9 nodes, the defaults effectively require all nodes (and the actual timeout for each update attempt is a quarter of the configured timeout, to allow for three retries).
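To make the arithmetic concrete, here is a minimal sketch (my own illustration of the rules described above, not Akka's actual code):

// Sketch: effective write count implied by write-majority-plus and majority-min-cap.
def effectiveWriteCount(clusterSize: Int, writeMajorityPlus: Int = 3, majorityMinCap: Int = 5): Int = {
  val majority = clusterSize / 2 + 1                        // simple majority, rounding down
  val raised   = math.max(majority + writeMajorityPlus, majorityMinCap)
  math.min(raised, clusterSize)                             // never more than all nodes
}
// effectiveWriteCount(3)  == 3 and effectiveWriteCount(8) == 8: all nodes below 9 members
// effectiveWriteCount(12) == 10: majority of 7, plus 3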
You don't say what your cluster size is, but if running in a cluster with fewer than 9 nodes, it can be a good idea to increase the akka.cluster.sharding.updating-state-timeout from the default 5 seconds to allow for the increased consistency level. Decreasing write-majority-plus and majority-min-cap can be an option, if you're willing to take the risks of violating cluster sharding's guarantees (e.g. multiple instances of the same entity running and potentially destroying their persistent state). Increasing the cluster size can also be helpful, paradoxically, if the reason other nodes are slow to respond is overload.
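For instance, raising that timeout could look like the following sketch (the 10s value is only an example, not a recommendation):

import com.typesafe.config.ConfigFactory

// Override the sharding state-update timeout, falling back to the rest of the config.
val tuned = ConfigFactory
  .parseString("akka.cluster.sharding.updating-state-timeout = 10s")
  .withFallback(ConfigFactory.load())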

INET EnergyConsumer module parameters' new values never take effect at runtime

I would like to request your help with an issue I'm facing: I want to change the energyConsumer module's parameters at runtime.
The scenario is that I need to specify an amount of energy consumed when the nodes are transmitting or receiving, depending on the state of the radio.
I came up with the piece of code below. Thanks to it, the values of the designated parameters are set correctly, but the problem is that they never take effect as intended.
The parameterization of the energyConsumer module is as follows:
// Assuming getGrandParentModule() is a helper equivalent to getParentModule()->getParentModule()
cModule *module = getGrandParentModule()->getParentModule()->getSubmodule("node", myCHIndex);
cModule *energyConsumerCHModule = module->getSubmodule("wlan", 0)->getSubmodule("radio")->getSubmodule("energyConsumer");
StateBasedEpEnergyConsumer *energyConsumption = check_and_cast<StateBasedEpEnergyConsumer *>(energyConsumerCHModule);
cModule *chmodule = module->getSubmodule("generic")->getSubmodule("np");
Sim *chsim = check_and_cast<Sim *>(chmodule);

// Update the receiving power parameter
cPar *receivingPower = &energyConsumption->par("receiverReceivingPowerConsumption");
EV_DEBUG << "MY CLUSTER HEAD RECEIVING POWER IS : " << receivingPower->doubleValue() << endl;
EV_DEBUG << "NEW VALUE CALCULATED : " << ((chsim->EnergyCH_RX) * bitrate) / K << endl;
receivingPower->setDoubleValue(((chsim->EnergyCH_RX) * bitrate) / K);
EV_DEBUG << "MY CLUSTER HEAD RECEIVING POWER APPLIED IS : " << receivingPower->doubleValue() << endl;
EV_DEBUG << "**************************************************************" << endl;

// Update the transmitting power parameter (same cast variable as above, consistently named)
cPar *transmittingPower = &energyConsumption->par("transmitterTransmittingPowerConsumption");
EV_DEBUG << "MY CURRENT TRANSMITTING POWER IS : " << transmittingPower->doubleValue() << endl;
EV_DEBUG << "NEW VALUE CALCULATED : " << (EnergyToCH * bitrate) / K << endl;
transmittingPower->setDoubleValue((EnergyToCH * bitrate) / K);
EV_DEBUG << "MY CURRENT TRANSMITTING POWER APPLIED IS : " << transmittingPower->doubleValue() << endl;
The output, shown as images in the original post (not reproduced here), confirms the new values are printed. But as those images show, no energy consumption happens (the energyStorage module still keeps the initial energy provided, 0.5 J) despite the parameters having well-defined new values. A second image shows the state of the energy storage module.
Could you please advise on how I should proceed?
Many thanks for your support...
The StateBasedEpEnergyConsumer module is not written to expect that certain parameters change during runtime (in fact, almost none of the INET modules are). A parameter is usually read only once (typically during the initialization phase). After that point, modules store the value of the parameter in an internal member variable and do not care about any later change (you can check this in StateBasedEpEnergyConsumer.cc).
If you want to change the energy consumption values during runtime, you should write your own energy consumer module by implementing the power::IEpEnergyConsumer interface, or modify the current implementation by adding a handleParameterChange() method that stores the new value in the member variable.

TopologyTestDriver with streaming groupByKey.windowedBy.reduce not working like kafka server [duplicate]

I'm trying to play with Kafka Streams to aggregate some attributes of people.
I have a Kafka Streams test like this:
val factory = new ConsumerRecordFactory[Array[Byte], Character]("input", new ByteArraySerializer(), new CharacterSerializer())
var i = 0
while (i != 5) {
  testDriver.pipeInput(factory.create("input", Character(123, 12), 15 * 10000L))
  i += 1
}
val output = testDriver.readOutput....
I'm trying to group the values by key like this:
streamBuilder.stream[Array[Byte], Character](inputKafkaTopic)
  .filter((key, _) => key == null)
  .mapValues(character => PersonInfos(character.id, character.id2, character.age)) // case class
  .groupBy((_, value) => CharacterInfos(value.id, value.id2)) // case class
  .count().toStream.print(Printed.toSysOut[CharacterInfos, Long])
When I run the code, I get this:
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 1
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 2
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 3
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 4
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 5
Why am I getting 5 rows instead of just one line with CharacterInfos and the count?
Doesn't groupBy just change the key?
If you use the TopologyTestDriver, caching is effectively disabled and thus every input record will always produce an output record. This is by design, because caching implies non-deterministic behavior, which makes it very hard to write an actual unit test.
If you deploy the code in a real application, the behavior will be different and caching will reduce the output load -- which intermediate results you will get is not defined (i.e., non-deterministic); compare Michael Noll's answer.
For your unit test, it should actually not really matter, and you can either test for all output records (i.e., all intermediate results), or put all output records into a key-value Map and only test for the last emitted record per key (if you don't care about the intermediate results), as sketched below.
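For example, here is a minimal sketch of the last-record-per-key approach, assuming an "output" topic and a CharacterInfosDeserializer (both names are illustrative, not taken from your code):

import org.apache.kafka.common.serialization.LongDeserializer

val lastPerKey = scala.collection.mutable.Map.empty[CharacterInfos, Long]
// Drain the test driver's output; later records overwrite earlier ones per key.
var record = testDriver.readOutput("output", new CharacterInfosDeserializer(), new LongDeserializer())
while (record != null) {
  lastPerKey(record.key()) = record.value()
  record = testDriver.readOutput("output", new CharacterInfosDeserializer(), new LongDeserializer())
}
// Now assert only on the final counts, e.g. lastPerKey(CharacterInfos(123, 12)) == 5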
Furthermore, you could use suppress() operator to get fine grained control over what output messages you get. suppress()—in contrast to caching—is fully deterministic and thus writing a unit test works well. However, note that suppress() is event-time driven, and thus, if you stop sending new records, time does not advance and suppress() does not emit data. For unit testing, this is important to consider, because you might need to send some additional "dummy" data to trigger the output you actually want to test for. For more details on suppress() check out this blog post: https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
Update: I didn't spot the line in the example code that refers to the TopologyTestDriver in Kafka Streams. My answer below is for the 'normal' KStreams application behavior, whereas the TopologyTestDriver behaves differently. See the answer by Matthias J. Sax for the latter.
This is expected behavior. Somewhat simplified, Kafka Streams emits by default a new output record as soon as a new input record was received.
When you are aggregating (here: counting) the input data, then the aggregation result will be updated (and thus a new output record produced) as soon as new input was received for the aggregation.
input record 1 ---> new output record with count=1
input record 2 ---> new output record with count=2
...
input record 5 ---> new output record with count=5
What to do about it: You can reduce the number of 'intermediate' outputs by configuring the size of the so-called record caches as well as the commit.interval.ms parameter. See Memory Management. However, how much reduction you will see depends not only on these settings but also on the characteristics of your input data, and because of that the extent of the reduction may also vary over time (think: could be 90% in the first hour of data, 76% in the second hour of data, etc.). That is, the reduction process is deterministic, but the resulting reduction amount is difficult to predict from the outside.
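As a sketch, those two knobs are set via the streams configuration, for example (the values are illustrative only):

import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

val props = new Properties()
// Larger record cache -> more deduplication of consecutive updates per key.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, (10 * 1024 * 1024).toString)
// Longer commit interval -> caches are flushed (and downstream records emitted) less often.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "30000")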
Note: When doing windowed aggregations (like windowed counts) you can also use the Suppress() API so that the number of intermediate updates is not only reduced, but there will only ever be a single output per window. However, in your use case/code the aggregation is not windowed, so you cannot use the Suppress API.
To help you understand why the setup is this way: you must keep in mind that a streaming system generally operates on unbounded streams of data, which means the system doesn't know 'when it has received all the input data'. So even the term 'intermediate outputs' is actually misleading: at the time the second input record was received, for example, the system believes that the result of the (non-windowed) aggregation is '2' -- it's the correct result to the best of its knowledge at this point in time. It cannot predict whether (or when) another input record might arrive.
For windowed aggregations (where Suppress is supported) this is a bit easier, because the window size defines a boundary for the input data of a given window. Here, the Suppress() API allows you to make a trade-off between better latency with multiple outputs per window (default behavior, Suppress disabled) and longer latency with only a single output per window (Suppress enabled). In the latter case, if you have 1h windows, you will not see any output for a given window until 1h later, so to speak. For some use cases this is acceptable, for others it is not.
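To illustrate, here is a hedged sketch of a windowed variant of the count with suppress(), emitting a single final result per window; the 5-minute window is arbitrary, stream stands for the filtered/mapped stream from your topology, and the implicit serdes your Scala DSL code already uses are assumed to be in scope:

import java.time.Duration
import org.apache.kafka.streams.kstream.{Printed, Suppressed, TimeWindows, Windowed}
import org.apache.kafka.streams.kstream.Suppressed.BufferConfig

stream
  .groupBy((_, value) => CharacterInfos(value.id, value.id2))
  .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
  .count()
  .suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded())) // hold updates until the window closes
  .toStream
  .print(Printed.toSysOut[Windowed[CharacterInfos], Long])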

How does the CAP Theorem apply on HDFS?

I just started reading about Hadoop and came across the CAP Theorem. Can you please throw some light on which two components of CAP would be applicable to an HDFS system?
Argument for Consistency
The document very clearly says:
"The consistency model of a Hadoop FileSystem is one-copy-update-semantics; that of a traditional local POSIX filesystem."
(One-copy update semantics means that all processes accessing or updating a given file see the file's contents as if only a single copy of the file existed.)
Moving forward, the document says:
"Create. Once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the file and its data."
"Update. Once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the new data.
"Delete. once a delete() operation on a path other than “/” has completed successfully, it MUST NOT be visible or accessible. Specifically, listStatus(), open() ,rename() and append() operations MUST fail."
The above-mentioned characteristics point towards the presence of "Consistency" in HDFS.
Source: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html
Argument for Partition Tolerance
HDFS provides High Availability for both NameNodes and DataNodes, so the system keeps operating even when individual nodes fail or become unreachable.
Source: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
Argument for Lack of Availability
It is very clearly mentioned in the documentation (under the section "Operations and failures"):
"The time to complete an operation is undefined and may depend on the implementation and on the state of the system."
This indicates that the "Availability" in the context of CAP is missing in HDFS.
Source:
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html
Given the above-mentioned arguments, I believe HDFS supports "Consistency and Partition Tolerance" and not "Availability" in the context of the CAP theorem.
C – Consistency (all nodes see the data in a homogeneous form, i.e. every node has the same view of the data at any instant in time)
A – Availability (a guarantee that every request receives a response, whether it succeeded or failed)
P – Partition Tolerance (the system continues to operate even if a message is lost or part of the system fails)
Talking about Hadoop, it supports the Availability and Partition Tolerance properties. The Consistency property is not supported because only the namenode has the information about where the replicas are placed. This information is not available on each and every node of the cluster.

List VM sizes in Microsoft Azure Compute based on Type or Category

We are trying to list all available sizes for a particular location using the API "GET https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName}/vmSizes?api-version=2017-12-01". It returns nearly 22,400 sizes. Does it really contain this many sizes for a region? Is there an elegant way to get VM sizes based on type?
For Example:
1. Get VM sizes based on General purpose, Memory optimized, Storage optimized etc.
2. Get VM Sizes based on RAM size, CPU count etc.
I used the sample posted by Laurent (link below) and it returned all available VM sizes' names, cores, disks, memory, etc. in the region (use the parameter location=region). If you put some code around it you should be able to do example 2.
Get Virtual Machine sizes list in json format using azure-sdk-for-python
def list_available_vm_sizes(compute_client, region = 'EastUS2', minimum_cores = 1, minimum_memory_MB = 768):
    vm_sizes_list = compute_client.virtual_machine_sizes.list(location=region)
    for vm_size in vm_sizes_list:
        if vm_size.number_of_cores >= int(minimum_cores) and vm_size.memory_in_mb >= int(minimum_memory_MB):
            print('Name:{0}, Cores:{1}, OSDiskMB:{2}, RSDiskMB:{3}, MemoryMB:{4}, MaxDataDisk:{5}'.format(
                vm_size.name,
                vm_size.number_of_cores,
                vm_size.os_disk_size_in_mb,
                vm_size.resource_disk_size_in_mb,
                vm_size.memory_in_mb,
                vm_size.max_data_disk_count
            ))

list_available_vm_sizes(compute_client, region = 'EastUS', minimum_cores = 2, minimum_memory_MB = 8192)