Confusion on the UML state diagram: merge vs. junction

I have read quite a lot of material online, and it seems the usage of merge and junction is almost the same. Some sites say to use the diamond shape as a merge, some say to use a junction. Can I know which one is correct? The following images are from the material I have read.
merge using diamond shape
merge using junction

In a state machine diagram there are two pseudostates that are often confused: the junction (a filled black circle) and the choice (a hollow diamond). Don't confuse them with the similar-looking shapes in an activity diagram (initial node, decision node, and merge node); they only look the same.
Now for the difference between choice and junction: a compound transition fires when its triggering event occurs AND all guards before a possible choice pseudostate evaluate to true. Then all the effect behaviors up to the choice are executed before any of the guards on its outgoing transitions are checked. This allows dynamic branching that depends on a value which is only evaluated once the transition has fired.
A junction pseudostate just connects transitions. It could be replaced with as many transitions as there are possible routes through it. Such a compound transition only fires if all guards along the route evaluate to true.
Both pseudostates can have as many incoming and outgoing transitions as you like.

As a side note (too long for a comment), the notation for the combined merge/decision node needs some clarification. UML 2.5 states on p. 387:
15.3.3.5 Merge Nodes
A MergeNode is a control node that brings together multiple flows without synchronization. A MergeNode shall have exactly one outgoing ActivityEdge but may have multiple incoming ActivityEdges.
and below (a bit more obscure!):
15.3.3.6 Decision Nodes
A DecisionNode is a ControlNode that chooses between outgoing flows. A DecisionNode shall have at least one and at most two incoming ActivityEdges, and at least one outgoing ActivityEdge. If it has two incoming edges, then one shall be identified as the decisionInputFlow, the other being called the primary incoming edge. If the DecisionNode has only one incoming edge, then it is the primary incoming edge.
Note that this is about activities and not states!
As @AxelScheithauer noted, the diagram rendering may deviate. P. 390 of UML 2.5 states:
The functionality of a MergeNode and a DecisionNode can be combined by using the same node symbol, as shown in Figure 15.34. At most one of the incoming flows may be annotated as a decisionInputFlow. This notation maps to a model containing a MergeNode with all the incoming edges shown in the diagram and one outgoing edge to a DecisionNode that has all the outgoing edges shown in the diagram.


How to write a MILP equation for this problem?

Consider the classic network flow problem where the constraint is that the outflow from a vertex is equal to the sum of the inflows to it. Consider having a more specific constraint where the flow can be split between edges.
I have two questions:
How can I use a decision variable to identify that node j is receiving items from multiple edges?
How can I create another equation to determine the cost (2 units of time per item) of joining x items from different edges at the sink node?
This is a tricky modeling question. Let's take it part by part.
Consider having a more specific constraint where the flow can be split between edges
I assume here that you have a classic flow constraint modeled with a set of real variables y_ij. Therefore, the flow can be split between two or more arcs.
How can I use a decision variable to identify that node j is receiving items from multiple edges?
You need to create an additional binary variable z_ij to indicate whether arc (i, j) carries flow. You must also create the following constraint:
Next, you will need another integer variable set, say p_j, and an additional constraint.
Then p_j will store the number of incoming arcs of node j that are used to send flow. Since you will try to minimize the cost of joining arcs (I think), you need to use <=.
How can I create another equation to determine the cost (2 units of time per item) of joining x items from different edges at the sink node?
For this, you can use the value of p_j and multiply it by the predefined cost of joining the flow.
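The two constraints referred to above appear to have been images that did not survive. Assuming a big-M constant M (an upper bound on the flow of any single arc) and writing the counting variable as p_j for the receiving node j, the standard formulation would be:

```latex
y_{ij} \le M \, z_{ij} \qquad \forall (i,j) \in A
```

```latex
\sum_{i \,:\, (i,j) \in A} z_{ij} \le p_j \qquad \forall j \in V
```

The first constraint forces z_ij to 1 whenever arc (i, j) carries flow; the second lets p_j count the used incoming arcs, with <= because minimizing the joining cost in the objective presses p_j down onto the sum.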

Skip gram in word2vec - what is the number of outputs

The following images are often used to describe the word2vec model with skip-gram:
However, after reading this discussion on stackoverflow, it seems that word2vec actually takes 1 word as input and 1 word as output. The output word is randomly sampled from the window. (And this is performed X times to generate X input/output pairs.)
It seems to me, then, that the above image does not correctly describe the network. My question is: is the 1-input/1-output structure standard (the Tensorflow word2vec tutorial takes this approach and calls it skip-gram), or do some networks actually take the structure of the above image?
It's not a great diagram.
In CBOW, those converging arrows are an averaging that happens all-at-once, to create one single 'training example' (desired prediction) that is (average(context1, context2, ..., contextN) -> target-word). (In practice averaging is more common than the 'SUM' shown in the diagram.)
In Skip-Gram, those diverging arrows are multiple training examples (desired predictions) made one-after-the-other.
And in both diagrams, while they look a bit like neural-net node-architectures, the actual hidden-layer and internal-connection weights are just implied inside the middle-column-to-right-column arrows.
Skip-gram is always 1 "input" context word used to predict 1 nearby (within the effective 'window') "output" target word.
Implementations tend to iterate through the whole effective window, so every (context -> target) pair gets used as a training-example. And in practice, it doesn't matter if you consider the central word the target-word and each word around it to be context-words, or the central word the context-word and each word around it to be target-words – both methods result in the exact same set of (word -> word) pairs being trained, just in a slightly different iteration order. (I believe the original Word2Vec paper described it one way, but then Google's released code did it the other way for reasons of slightly-better cache efficiency.)
In fact the effective window, for each central word considered, is chosen to be some random number from 1 to the configured maximum window value. This turns out to be a cheap way of essentially weighting nearer-words more: the immediate neighbors are always part of training-pairs, further words only sometimes. That is, pairs are not randomly sampled from the whole window - it's just a random window size. (There's another down-sampling where the most-frequent words will be randomly dropped so as not to overtrain them at the expense of less-frequent words, but that's a totally separate process not reflected in the above.)
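The pair-generation scheme described above (a random effective window per center word, then one training pair per in-window word) can be sketched as follows. This is an illustrative reimplementation, not the actual word2vec code:

```python
import random

def skipgram_pairs(tokens, max_window=5, seed=0):
    """Generate (center, context) training pairs the way skip-gram does:
    for each center word, draw a random effective window in [1, max_window],
    then pair the center with every word inside that window."""
    rng = random.Random(seed)
    pairs = []
    for i, center in enumerate(tokens):
        window = rng.randint(1, max_window)  # random effective window size
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Note how the immediate neighbors of each center word always end up in some pair, while words near max_window positions away only appear when the drawn window happens to be large enough, which gives the implicit distance weighting described above.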
In CBOW, instead of up-to 2*window input-output pairs of the (context-word -> target-word) form, there's a single input-output pair of (context-words-average -> target-word). (In CBOW, a loop creates the average value for a single N:1 training-example for one central word, and then splits the backpropagated error across all contributing words. In skip-gram, a loop creates multiple alternate 1:1 training-examples for one central word.)

Aggregation node in TBB

I am new to TBB, so my apologies if this question is obvious... but how do I set up an aggregating node with TBB? Among all the pre-made nodes I cannot find the right type for it.
Imagine I have a stream of incoming images. I want a node that keeps accepting images (with a FIFO buffer), does some calculation on them (i.e. it needs internal state), and whenever it has received N images (a fixed parameter), emits a single result.
I think there is no single node in TBB flow graph that accumulates with some sort of preprocessing and then, when the accumulation is done, forwards the result to its successor.
However, I believe the effect can be achieved by using several nodes. For example, consider a queue_node as the starting point of the graph. It will serve as a buffer with FIFO semantics. It is followed by a multifunction_node with N outputs. This node does the actual image preprocessing and sends the result to the output port corresponding to the image number. Then comes a join_node whose N inputs are connected to the corresponding outputs of the multifunction_node. Finally, a successor of the join_node receives all N images as its input. Since join_node aggregates its inputs in a tuple, a drawback of this design quickly becomes apparent when N is relatively large.
The other variant might be to have the same queue_node connected to a function_node with unlimited concurrency as its successor (the function_node is supposed to do the image preprocessing), followed by a multifunction_node with serial concurrency (meaning only a single instance of its body can run at a time) that accumulates the images and calls try_put on its successor from inside the body once the number N is reached.
Of course, there could be other ways to implement the desired behavior using other flow graph topologies. By the way, to package such a graph as a single node, one could use composite_node, which represents a subgraph as a single node.
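The second variant (a serial node that accumulates state and emits once every N inputs) boils down to the following logic, sketched here in Python rather than C++/TBB for brevity; the class and parameter names are illustrative:

```python
class Aggregator:
    """Sketch of the serial multifunction_node idea: fold each incoming
    item into an internal state with `combine`, and after every `n` items
    pass the accumulated result to `emit` (try_put to the successor, in
    TBB terms), then reset."""

    def __init__(self, n, combine, emit):
        self.n = n
        self.combine = combine
        self.emit = emit
        self.count = 0
        self.state = None

    def put(self, item):
        # fold the new item into the running state
        self.state = item if self.state is None else self.combine(self.state, item)
        self.count += 1
        if self.count == self.n:
            self.emit(self.state)   # forward the aggregate to the successor
            self.count = 0
            self.state = None
```

Because the body is serial (only one `put` runs at a time in the TBB version), the internal state needs no extra locking.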

How to normalize a sequence of numbers?

I am working on a user behavior project. Based on user interaction I have got some data. There is a nice sequence which smoothly increases and decreases over time. But there are small discrepancies, which are very bad. Please refer to the graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate points with the characteristics of X and Y, i.e. errors in an otherwise monotonically increasing or decreasing run.
If not eliminate, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried testing the difference between values. In some special cases it works, but for a sequence like the one presented here the distance between numbers is not such that I can cut out the errors.
I tried applying a threshold: only if the change exceeds some X is it accepted; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X, because this is based on user interaction, which I do not really control. If the user interaction is such that its plot is a zigzag pattern, I end up in a 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers occur, but we expect some range to be continuous across all examples. The solution I am seeking is generic.
I do not know how much effort you want to invest in this problem, but if you want theoretical guarantees, topological persistence seems well adapted to your problem imho.
Basically, with that method you can filter out local maxima/minima by fixing a scale, and there are theoretical proofs saying that if your sampling is close to your function, then you extract the correct number of maxima with persistence.
You can see these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a waterline starting from the maximum height and decreasing, you get some peaks.
Every peak has a time when it is born, which is when it emerges from the water, and a time when it dies, which is when it merges with a higher peak. A persistence diagram then plots a point for every peak, with x/y coordinates given by its times of birth and death (by convention the first peak never dies and is not shown).
If a peak is a global maximum, it will lie further from the diagonal in the persistence diagram than a local-maximum peak. To remove local maxima you remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data, btw), and two global ones (the first peak is not pictured in a persistence diagram):
If you add noise to your data like this:
you will still get a very decent persistence diagram that will allow you to filter out local maxima as you want:
Please ask if you want more details or references.
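For the curious, the birth/death pairing described above can be computed for 1-D data with a short union-find sweep over the points in decreasing order. This is an illustrative sketch, not a production persistence library:

```python
def persistence_pairs(values):
    """0-dimensional persistence of the peaks of a 1-D sequence.
    Sweep a waterline downward: each peak is 'born' at its height and
    'dies' when its component merges into one with a higher peak."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    parent = [-1] * n   # -1 marks indices the waterline has not reached yet
    birth = {}          # component root -> height of its highest peak
    pairs = []          # (birth, death) for every peak except the tallest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the component with the lower peak dies at this height
                    lo, hi = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
                    pairs.append((birth[lo], values[i]))
                    parent[lo] = hi
    return pairs
```

Filtering out pairs whose persistence (birth minus death) falls below a threshold removes exactly the small local maxima; the tallest peak never appears in the list, and points that are not local maxima produce zero-persistence pairs that can be ignored.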
Since you cannot decide on a cutoff frequency, or even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing that comes to mind is a running average, and you can see that there are many parameters to set to get different outputs.
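As an illustration of the running-average idea, here is a minimal centered moving average; the window size is exactly the kind of parameter you would expose to the user:

```python
def running_average(data, window=5):
    """Smooth a sequence with a centered moving average.
    Near the ends the window is truncated, so the output
    has the same length as the input."""
    half = window // 2
    out = []
    for i in range(len(data)):
        chunk = data[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Larger windows suppress more of the zigzag at the cost of flattening genuine turning points, which is why letting the user tune it is reasonable here.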

Face Recognition Using Backpropagation Neural Network?

I'm very new to image processing, and my first assignment is to make a working program which can recognize faces and their names.
So far, I have successfully made a project that detects faces, crops the detected image, applies a Sobel filter, and translates the result to an array of floats.
But I'm very confused about how to implement a backpropagation MLP to learn the images so it can recognize the correct name for a detected face.
It would be a great honor if the experts on stackoverflow could give me some examples of how to feed the image array to a backpropagation network.
It is a standard machine learning task. You have a number of arrays of floats (instances in ML terms, observations in statistics terms) and a corresponding name (label, class tag) for each array. This is enough for use in most ML algorithms. Specifically in an ANN, the elements of your array (i.e. the features) are the inputs of the network, and the labels (names) are its outputs.
If you are looking for theoretical description of backpropagation, take a look at Stanford's ml-class lectures (ANN section). If you need ready implementation, read this question.
You haven't specified what the elements of your arrays are. If you use just the pixels of the original image, this should work, but not very well. If you need a production-level system (though still using an ANN), try to extract higher-level features (e.g. the Haar-like features that OpenCV itself uses).
Have you tried writing your feature vectors to an ARFF file and feeding them to Weka, just to see whether your approach might work at all?
Weka has a lot of classifiers integrated, including MLPs.
From what I understand so far, I suspect that the features and the classifier you have chosen will not work.
To your original question: have you made any attempts to implement a neural network on your own? If so, where did you get stuck? Note that this is not the place to request a complete working implementation from the audience.
To provide a general answer to a general question:
Usually you have nodes in an MLP, specifically input nodes, output nodes, and hidden nodes. These nodes are strictly organized in layers: the input layer at the bottom, the output layer at the top, and hidden layers in between. The nodes are connected in a simple feed-forward fashion (outgoing connections are allowed only to the next higher layer).
Then you connect each of your floats to a single input node and feed the feature vectors to your network. For backpropagation you need to supply an error signal, which you specify at the output nodes. So if you have n names to distinguish, you may use n output nodes (i.e. one per name). Make them, for example, return 1 in case of a match and 0 otherwise. You could equally well use one output node and let it return n different values for the names. Probably it would even be best to use n completely separate perceptrons, i.e. one per name, to avoid certain side effects (catastrophic interference).
Note that the output of each node is a number, not a name. Therefore you need some sort of threshold to get a number-to-name mapping.
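To make this layout concrete, here is a minimal numpy sketch of such a network: one input node per float, one output node per name, one-hot targets, sigmoid units, and plain gradient-descent backpropagation. All sizes and names are illustrative, not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(X, Y, hidden=16, lr=1.0, epochs=5000):
    """One hidden layer, sigmoid units, gradient descent on squared error.
    X: (samples, features) float array; Y: (samples, names) one-hot targets.
    No bias terms, to keep the sketch minimal."""
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, Y.shape[1]))
    for _ in range(epochs):
        h = sigmoid(X @ W1)                      # hidden activations
        out = sigmoid(h @ W2)                    # one output node per name
        d_out = (Y - out) * out * (1.0 - out)    # error signal at the outputs
        d_h = (d_out @ W2.T) * h * (1.0 - h)     # backpropagated to the hidden layer
        W2 += lr * h.T @ d_out
        W1 += lr * X.T @ d_h
    return W1, W2

def predict(W1, W2, X):
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return out.argmax(axis=1)   # the number-to-name step: pick the strongest output
```

The `argmax` in `predict` is the thresholding step mentioned above: each output node yields a number, and the index of the largest one is mapped back to a name.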
Also note that you need a lot of training data to train a large network (i.e. to cope with the curse of dimensionality). It would be interesting to know the size of your float array.
Indeed, for a complex decision you may need a larger number of hidden nodes or even hidden layers.
Further note that you may need to do a lot of evaluation (i.e. cross-validation) to find the optimal configuration (number of layers, number of nodes per layer), or indeed any working configuration at all.
Good luck, anyway!