In this link ( Meta Interpreter ) I believe to have found a nifty way of solving a problem I have to tackle, but since my prolog is very bad I'd first ask if its even possible what I have in mind.
I want to transform certain parts of a workflow/graph depending on a set of rules. A graph basically consists of sequences (a->b) and split/joins, which are either parallel or conditional, i.e. two steps run in parallel in the workflow or a single branch is picked depending on a condition (the condition itself does not matter on this level) (parallel-split - (a && b) - parallel-join) etc. Now a graph usually has nodes and edges, with the form of using terms I want to get rid of edges.
Furthermore each node has a partner attribute, specifying who will execute it.
I'll try to give a simple example what I want to achieve:
A node called A, executed by a partner X, connected with a node called B, executed by a partner Y.
A_X -> B_Y
seq((A,X),(B,Y))
If I detect a pattern like this, i.e. two steps in sequence with different partners, I want this to be replaced with:
A_X -> Send_(X-Y) -> Receive_(Y-X) - B_Y // send step from X to Y and a receive step at Y waiting for something from X
seq((A,X), seq(send(X-Y), seq(receive(Y-X), B)))
If anyone could give me some pointers or help to come up with a solution I would be very thankful!
A graph basically consists of sequences (a->b) and split/joins, which are either parallel or conditional, i.e. two steps run in parallel in the workflow or a single branch is picked depending on a condition
This sounds an awful lot like an and/or graph. Prolog algorithms on these graphs are covered by Ivan Bratko in Prolog Programming for Artificial Intelligence, chapter 13. Even if your graphs aren't really and/or graphs, you may be able to adapt some of these algorithms to your task.
Related
I'm new to using Gremlin (up until now I was accessing Neptune using Opencypher and given up due to how slow it was) and I'm getting really confused over some stuff here.
Basically what I'm trying to do is -
Let us say we have some graph A-->B-->C. There are multiple such graphs in the database, so I'm looking for the specific A,B,C nodes that have the property 'idx' equals '1'. I want to add a node D{'idx' = '1'} and an edge so I will end up having
A-->B-->C-->D
It is safe to assume A,B,C already exist and are connected together.
Also, we wish to add D only if it doesn't already exist.
So what I currently have is this:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').as('c').
V().hasLabel('D').has('idx', '1').fold().
coalesce(
unfold(),
addV('D').property('idx','1')).as('d').
addE('TEST_EDGE').from('c').to('d')
now the problem is that well, this doesn't work and I don't understand Gremlin enough to understand why. This returns from Neptune as "An unexpected error has occurred in Neptune" with the code "InternalFailureException"
another thing to mention is that if the node D does exist, I don't get an error at all, and in fact th node is properly connected to the graph as it should.
furthermore, I've seen in a different post that using ".as('c')" shouldn't work since there is a 'fold' action afterwards which makes it unusable (for a reason I still don't understand, probably cause I'm not sure how this entire .as,.store,.aggregate work)
And suggests using ".aggregate('c')" instead, but doing so will change the returned error to "addE(TEST_EDGE) could not find a Vertex for from() - encountered: BulkSet". This, adding to the fact that the code I wrote actually works and connects node D to the graph if it already exists, makes me even more confused.
So I'm lost
Any help or clarification or explanation or simplification would be much appreciated
Thank you! :)
A few comments before getting to the query:
If the intent is to have multiple subgraphs of (A->B->C), then you may not want to use this labeling scheme. Labels are meant to be of lower variation - think of labels as groups of vertices of the same "type".
A lookup of a vertex by an ID is the fastest way to find a vertex in a TinkerPop-based graph database. Just be aware of that as you build your access patterns. Instead of doing something like `hasLabel('x').has('idx','y'), if both of those items combined make a unique vertex, you may also want to think of creating a composite ID of something like 'x-y' for that vertex for faster access/lookup.
On the query...
The first part of the query looks good. I think you have a good understanding of the imperative nature of Gremlin just up until you get to the second V() in the query. That V() is going to tell Neptune to start evaluating against all vertices in the graph again. But we want to continue evaluating beyond the 'C' vertex.
Unless you need to return an output in either case of existence or non-existence, you could get away with just doing the following without a coalesce() step:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').
where(not(out().hasLabel('D').has('idx','1'))).
addE('TEST_EDGE).to(
addV('D').property('idx','1'))
)
The where clause allows us to do the check for the non-existence of a downstream edge and vertex without losing our place in the traversal. It will only continue the traversal if the condition specified is not() found in this case. If it is not found, the traversal continues with where we left off (the 'C' vertex). So we can feed that 'C' vertex directly into an addE() step to create our new edge and new 'D' vertex.
Feel free to cut immediately past the first two paragraphs, they are mostly waffle explaining the situation.
I'm working on a task for my University course, while I don't want any help solving the actual problem (I feel like that's "cheating" as it were) I would like help finding a way to extend the lengths of lists shown in prolog when tracing. For example, in the task you have to make a path finder through a maze with coloured "edges" between nodes which are each assigned a unique letter from the alphabet. The edges are "two way" and there is a "start" node that connects via a red edge to the "m" node also. The goal is to reach the "g" node in the middle while going along edges from the start in a repeating order of [red,brown,yellow].
Anyway, I think my algorithm finds a correct route at the bottom of the recursion, but going through the tracer is possible thousands of steps (I was holding return for about 2 minutes before it finished). Currently it doesn't "return" the generated list of steps (and while I'm sure some of you would be able to tell me how to do so, I'd rather you didn't because it's important I learn the actual prolog myself I feel) so the only time I see what's in the list of route steps is in the trace. SO here is the problem:
path(k, [red, brown, yellow], [[start, red], [m, brown], [e, yellow], [h, red], [r, brown], [p, yellow], [n|...], [...|...]|...], [start, m, e, h, r, p, n, j|...], g)
The final list holds the route I want to know if it's valid, however:
[start, m, e, h, r, p, n, j|...]
Cuts off at j, I want the trace to show FULL lists, otherwise I will have to go back through 100s of lines of trace trying to find the broken up and "correct" nodes in the path, with lots of backtracking mixed in i.e. really hard, and really easy to make a mistake. Also the I'm using program only keeps 30 or so lines prior (no idea if this is normal but I am using a SWI-Prolog(Multi-threaded, version 7.2.3) from the official site). Which means I'd have to go through everything past the first time it reaches the j node which would take a huge amount of time.
So as I say, this could be solved by having the list be unified (or whatever it is called) as a "return" (or whatever it is called) but I don't want an answer like that spoon fed to me and would rather figure it out on my own. So if you know how to do so, please refrain from telling me and just still with the way to increase the maximum displayed list with trace thank you.
I appreciate the help, sorry for the hoops I'm asking people to jump through.
to prevent these kinds of outputs [_|...] adding the code below ;
:- set_prolog_flag(toplevel_print_options,
[quoted(true), portrayed(true), max_depth(0)]).
I am working user behavior project. Based on user interaction I have got some data. There is nice sequence which smoothly increases and decreases over the time. But there are little discrepancies, which are very bad. Please refer to graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would want to smooth out this sequence. The technique should be able to eliminate numbers with characteristic of X and Y, i.e. error in mono-increasing or mono-decreasing.
If not eliminate, technique should be able to shift them so that series is not affected by errors.
What I have tried and failed:
I tried to test difference between values. In some special cases it works, but for sequence as presented in this the distance between numbers is not such that I can cut out errors
I tried applying a counter, which is some X, then only change is accepted otherwise point is mapped to previous point only. Here I have great trouble deciding on value of X, because this is based on user-interaction, I am not really controller of it. If user interaction is such that its plot would be a zigzag pattern, I am ending up with 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: Data made available in this example is a particular case. There is no typical pattern in which numbers are going to occure, but we expect some range to be continuous with all the examples. Solution I am seeking is generic.
I do not know how much effort you want to involve in this problem but if you want theoretical guaranties,
topological persistence seems well adapted to your problem imho.
Basically with that method, you can filtrate local maximum/minimum by fixing a scale
and there are theoritical proofs that says that if you sampling is
close from your function, then you extracts correct number of maximums with persistence.
You can see these slides (mainly pages 7-9 to get the idea) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from maximum height and decreasing, you have some picks.
Every pick has a time where it is born which is the time where it becomes emerged and a time where it dies which is when it merges with an higher pick. Now a persistence diagram pictures a point for every pick where its x/y coordinates are its time of birth/death (by assumption the first pick does not die and is not shown).
If a pick is a global maximal, then it will be further from the diagonal in the persistence diagram than a local maximum pick. To remove local maximums you have to remove picks close to the diagonal. There are fours local maximums in your example as you can see with the persistence diagram of your data (thanks for providing the data btw) and two global ones (the first pick is not pictured in a persistence diagram):
If you noise your data like that :
You will still get a very decent persistence diagram that will allow you to filter local maximum as you want :
Please ask if you want more details or references.
Since you can not decide on a cut off frequency, and not even on the filter you want to use, I would implement several, and let the user set the parameters.
The first thing that I thought of is running average, and you can see that there are so many things to set, to get different outputs.
I am new in storm framework(https://storm.incubator.apache.org/about/integrates.html),
I test locally with my code and I think If I remove stop words, it will perform well, but i search on line and I can't see any example that removing stopwords in storm.
If the size of the stop words list is small enough to fit in memory, the most straighforward approach would be to simply filter the tuples with an implementation of storm Filter that knows that list. This Filter could possibly poll the DB every so often to get the latest list of stop words if this list evolves over time.
If the size of the stop words list is bigger, then you can use a QueryFunction, called from your topology with the stateQuery function, which would:
receive a batch of tuples to check (say 10000 at a time)
build a single query from their content and look up corresponding stop words in persistence
attach a boolean to each tuple specifying what to with each one
+ add a Filter right after that to filter based on that boolean.
And if you feel adventurous:
Another and faster approach would be to use a bloom filter approximation. I heard that Algebird is meant to provide this kind of functionality and targets both Scalding and Storm (how cool is that?), but I don't know how stable it is nor do I have any experience in practically plugging it into Storm (maybe Sunday if it's rainy...).
Also, Cascading (which is not directly related to Storm but has a very similar set of primitive abstractions on top of map reduce) suggests in this tutorial a method based on left joins. Such joins exist in Storm and the right branch could possibly be fed with a FixedBatchSpout emitting all stop words every time, or even a custom spout that reads the latest version of the list of stop words from persistence every time, so maybe that would work too? Maybe? This also assumes the size of the stop words list is relatively small though.
I have three types of facts:
album(code, artist, title, date).
songs(code, songlist).
musicians(code, list).
Example:
album(123, 'Rolling Stones', 'Beggars Banquet', 1968).
songs(123, ['Sympathy for the Devil', 'Street Fighting Man']).
musicians(123, [[vocals, 'Mick Jagger'], [guitar, 'Keith Richards', 'Brian Jones']].
I need to create these 4 rules:
together(X,Y) This succeeds if X and Y have played on the same album.
artistchain(X,Y) This succeeds if a chain of albums exists from X to Y;
two musicians are linked in the chain by 'together'.
role(X,Y) This succeeds if X had role Y (e.g. guitar) ever.
song(X,Y) This succeeds if artist X recorded song Y.
Any help?
I haven't been able to come up with much but for role(X,Y) I came up with:
role(X,Y) :- prole(X,Y,musicians(_,W)).
prole(X,Y,[[Y|[X|T]]|Z]).
prole(X,Y,[[Y|[H|T]]|Z]) :- prole(X,Y,[[Y|T]|Z]).
prole(X,Y,[A|Z]) :- prole(X,Y,Z).
But that doesn't work. It does work if I manually put in a list instead of musicians(_,W) like [[1,2,3],[4,5,6]].
Is there another way for me to insert the list as a variable?
As for the other rules I'm at a complete loss. Any help would really be appreciated.
You have a misconception about Prolog: Answering a goal in Prolog is not the same as calling a function!
E.g.: You expect that when "role(X,Y) :- prole(X,Y,musicians(_,W))." is executed, "musicians(_,W)" will be evaluated, because it is an argument to "prole". This is not how Prolog works. At each step, it attempts to unify the goal with a stored predicate, and all arguments are treaded either as variables or grounded terms.
The correct way to do it is:
role(X,Y) :- musicians(_, L), prole(X,Y,L).
The first goal unifies L with a list of musicians, and the second goal finds the role (assuming that the rest of your code is correct).
Little Bobby Tables is right, you need to understand the declarative style of Prolog. Your aim is to provide a set of rules that will match against the set of facts in the database.
Very simply, imagine that I have the following database
guitarist(keith).
guitarist(jim).
in_band('Rolling Stones', keith).
in_band('Rolling Stones', mick).
Supposed I want to find out who is both a guitarist and in the Rolling Stones. I could use a rule like this
stones_guitarist(X):-
guitarist(X),
in_band('Rolling Stones', X).
When a variable name is given within a rule (in this case X) it holds its value during the rule, so what we're saying is that the X which is a guitarist must also be the same X that is in a band called 'Rolling Stones'.
There are lots of possible ways for you to arrange the database. For example it might be easier if the names of the musicians were themselves a list (e.g. [guitar,[keith,brian]]).
I hope the following example for song(X,Y) is of some help. I'm using Sicstus Prolog so import the lists library to get 'member', but if you don't have that it's fairly easy to make it yourself.
:- use_module(library(lists)).
song(ARTIST,SONG):-
album(CODE,ARTIST,_,_),
songs(CODE,TRACKS),
member(SONG,TRACKS).