association list in prolog - list

I want to write a Prolog predicate that takes the following parameters:
FAssoc: an association list that holds, for each step, the number of ways to climb to it
U: the number of the last step of the stairs
Key: the current step that I am on, and
Steps: a list of the possible numbers of stairs I can climb in a single move.
I want to calculate, from the step I am on up to the last step, the number of ways I can climb the stairs.
Examples :
list_to_assoc([1-1,2-0,3-0,4-1,5-0],Fassoc),
calc_next(5, 1 , Fassoc , [1,2,6],F),
assoc_to_list(F,L).
L = [1-1, 2-1, 3-1, 4-1, 5-0].
However, the answer I expect is L = [1-1, 2-1, 3-1, 4-1, 5-2],
because I can go 1->2->3->4->5 or 1->5.
I am using the predicate
calc_next(U,Key,FAssoc,[H|T],New):-
    (   Limit is H+Key,
        Limit=<U
    ->  get_assoc(Limit,FAssoc,UntilNow),
        get_assoc(Key,FAssoc,ElH),
        Update0 is UntilNow+ElH,
        Update is Update0 mod 1000000009,
        put_assoc(Limit,FAssoc,Update,Assoc),
        get_assoc(Limit,FAssoc,Z),write('a'),
        calc_next(U,Key,Assoc,T,New)
    ),!.
calc_next(U,Key,FAssoc,[H|T],New):-
    calc_next(U,Key,FAssoc,T,New),write('c'),!.


Finding the first occurrence of 1-digit number in a list in Raku

I've got a number of lists of various lengths. Each of the lists starts with some numbers which are multiple digits but ends up with a recurring 1-digit number. For instance:
my @d = <751932 512775 64440 59994 9992 3799 423 2 2 2 2>;
my @e = <3750 3177 4536 4545 686 3 3 3>;
I'd like to find the position of the first occurrence of the 1-digit number (for @d 7 and for @e 5) without constructing any loop. Ideally a lambda (or any other practical thing) should iterate over the list using a condition such as $_.chars == 1, and as soon as the condition is fulfilled it should stop and return the position. Instead of returning the position, it might as well return the list up until the 1-digit number; changes and improvisations are welcome. How to do it?
You want the :k modifier on first:
say @d.first( *.chars == 1, :k ) # 7
say @e.first( *.chars == 1, :k ) # 5
See first for more information.
To answer your second part of the question:
say @d[^$_] with @d.first( *.chars == 1, :k );
# (751932 512775 64440 59994 9992 3799 423)
say @e[^$_] with @e.first( *.chars == 1, :k );
# (3750 3177 4536 4545 686)
Make sure that you use the with to ensure you only show the slice if first actually found an entry.
See with for more information.
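For readers more at home in Python, the same idea (stop at the first match and return its index, then slice only if something was found) can be sketched as follows. This is only an analogue of the Raku answer, not a translation of its internals; the helper name first_index is made up for this illustration.

```python
def first_index(items, predicate):
    """Return the index of the first item satisfying predicate, or None.

    Hypothetical helper: next() stops at the first match, much like
    Raku's first with the :k adverb.
    """
    return next((i for i, item in enumerate(items) if predicate(item)), None)


d = "751932 512775 64440 59994 9992 3799 423 2 2 2 2".split()
e = "3750 3177 4536 4545 686 3 3 3".split()


def is_one_digit(s):
    return len(s) == 1


k = first_index(d, is_one_digit)
print(k)  # 7
if k is not None:  # plays the role of Raku's "with"
    print(d[:k])
print(first_index(e, is_one_digit))  # 5
```

The explicit None check matters for the same reason with does in Raku: index 0 is a valid result but is falsy in Python.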

Prolog - Keep track of multiple sum counters

So I have a list that looks like this:
[
["p1", "p2", "100", "Storgatan"],
["p1", "p3", "200", "Lillgatan"],
["p2", "p4", "100", "Nygatan"],
["p3", "p4", "50", "Kungsgatan"],
["p4", "p5", "150", "Kungsgatan"]
]
The elements in each nested list represent (in order):
1st element = Start Point
2nd element = End Point
3rd element = Distance
4th element = Street Name.
I have to now write a predicate which figures out which street is the shortest and which street is the longest, along with their respective (summed up) distances.
For example the final output should look something like this:
Longest street: Kungsgatan, 200
Shortest street: Storgatan, 100
I don't really understand why the start and end points are relevant information here. My current idea is to collect all the unique street names, put them in a separate list along with a counter for each street that starts at zero and then use that list to accumulate all of the distances for each separate street.
Something like:
create_sum_list([
["p1", "p2", "100", "Storgatan"],
["p1", "p3", "200", "Lillgatan"],
["p2", "p4", "100", "Nygatan"],
["p3", "p4", "50", "Kungsgatan"],
["p4", "p5", "150", "Kungsgatan"]
], SL).
SL= [[Storgatan, 0], [Lillgatan, 0],
[Nygatan, 0], [Kungsgatan ,0]]
accumulate(SL, List).
List=[[Storgatan, 100], [Lillgatan, 200],
[Nygatan, 100], [Kungsgatan ,200]]
This is probably a stupid idea and there is probably a way better way to solve this. I have thought of many different ideas where I either reach a dead end or they are way too complex for such a "simple" task.
I can achieve this easily through "normal" imperative programming but I am new to logical programming and Prolog. I have no idea how to achieve this.
Help?
Thanks!
If you already have a list and you want to group by street name and sum the lengths, you must decide how you do the grouping. One way is to use library(pairs):
streets_lengths(S, L) :-
    maplist(street_name_and_length, S, NL),
    keysort(NL, NL_sorted),
    group_pairs_by_key(NL_sorted, G),
    maplist(total_lengths, G, GT),
    transpose_pairs(GT, By_length), % sorts!
    group_pairs_by_key(By_length, L).

street_name_and_length([_, _, N, L], L_atom-N_number) :-
    number_string(N_number, N),
    atom_string(L_atom, L).

total_lengths(S-Ls, S-T) :-
    sum_list(Ls, T).
You can use it like this:
?- streets_lengths([
["p1", "p2", "100", "Storgatan"],
["p1", "p3", "200", "Lillgatan"],
["p2", "p4", "100", "Nygatan"],
["p3", "p4", "50", "Kungsgatan"],
["p4", "p5", "150", "Kungsgatan"]
], SL).
SL = [100-['Storgatan', 'Nygatan'], 200-['Lillgatan', 'Kungsgatan']].
Since there can be many streets with the same length, the results are returned grouped by length. You can get the "shortest" and "longest" by getting the first and last element of the list, like this:
L = [First|_], last(L, Last)
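Since the question mentions being comfortable with imperative programming, here is the same group-then-sum-then-regroup idea as a rough Python sketch, which may help in reading the Prolog above. The function names street_totals and streets_by_length are invented for this illustration and are not part of any library.

```python
from collections import defaultdict

rows = [
    ["p1", "p2", "100", "Storgatan"],
    ["p1", "p3", "200", "Lillgatan"],
    ["p2", "p4", "100", "Nygatan"],
    ["p3", "p4", "50", "Kungsgatan"],
    ["p4", "p5", "150", "Kungsgatan"],
]


def street_totals(rows):
    """Sum the distances per street name (start and end points are ignored)."""
    totals = defaultdict(int)
    for _start, _end, distance, name in rows:
        totals[name] += int(distance)
    return dict(totals)


def streets_by_length(rows):
    """Group street names by total length, sorted by ascending length."""
    by_length = defaultdict(list)
    for name, total in street_totals(rows).items():
        by_length[total].append(name)
    return sorted(by_length.items())


grouped = streets_by_length(rows)
shortest, longest = grouped[0], grouped[-1]
print(shortest)  # (100, ['Storgatan', 'Nygatan'])
print(longest)   # (200, ['Lillgatan', 'Kungsgatan'])
```

As in the Prolog version, ties are kept together, so "shortest" and "longest" are each a list of streets rather than a single name.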
Since this is homework I won't give you the entire answer but the key part of the code.
As I noted in the comments, the format of the structure matters, e.g. list, terms, atoms, strings, etc.
test(Street_lengths,Shortest) :-
    List =
        [
            street(p1, p2, 100, 'Storgatan'),
            street(p1, p3, 200, 'Lillgatan'),
            street(p2, p4, 100, 'Nygatan'),
            street(p3, p4, 50, 'Kungsgatan'),
            street(p4, p5, 150, 'Kungsgatan')
        ],
    street_lengths(List,Street_lengths),
    lengths1(Street_lengths,Lengths),
    min_list(Lengths,Min),
    convlist(value_shortest2(Min),Street_lengths,Shortest).

street_lengths([H|T],Street_lengths) :-
    merge_streets(H,T,Street_lengths).

% 2 or more items in list
merge_streets(street(_,_,Length0,Name),[street(_,_,Length1,Name),street(_,_,Length2,Name2)|Streets0],[street(Length,Name)|Streets]) :-
    Length is Length0 + Length1,
    merge_streets(street(_,_,Length2,Name2),Streets0,Streets).
merge_streets(street(_,_,Length0,Name0),[street(_,_,Length1,Name1)|Streets0],[street(Length0,Name0)|Streets]) :-
    Name0 \= Name1,
    merge_streets(street(_,_,Length1,Name1),Streets0,Streets).
% 1 item in list
merge_streets(street(_,_,Length0,Name),[street(_,_,Length1,Name)],[street(Length,Name)]) :-
    Length is Length0 + Length1.
merge_streets(street(_,_,Length0,Name0),[street(_,_,Length1,Name1)],[street(Length0,Name0)|Streets]) :-
    Name0 \= Name1,
    merge_streets(street(_,_,Length1,Name1),[],Streets).
% no item in list
merge_streets(street(_,_,Length,Name),[],[street(Length,Name)]).

lengths1(List,Lengths) :-
    maplist(value_length1,List,Lengths).

value_length1(street(Length,_),Length).

value_shortest2(Min,street(Min,Name),street(Min,Name)).
Example run:
?- test(Street_lengths,Shortest).
Street_lengths = [street(100, 'Storgatan'), street(200, 'Lillgatan'), street(100, 'Nygatan'), street(200, 'Kungsgatan')],
Shortest = [street(100, 'Storgatan'), street(100, 'Nygatan')] ;
false.
I left the longest for you to do, but it should be a cakewalk.
To display the information as you noted in the question I would use format/2.
So now you either have to change how you get the data read into the format for this code, or change this code to work with how you structured the data. IMHO I would change the data to work with this structure.
If you want to know how efficient your code is, you can use time/1:
?- time(test(Street_lengths,Shortest)).
% 44 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
Street_lengths = [street(100, 'Storgatan'), street(200, 'Lillgatan'), street(100, 'Nygatan'), street(200, 'Kungsgatan')],
Shortest = [street(100, 'Storgatan'), street(100, 'Nygatan')] ;
% 17 inferences, 0.000 CPU in 0.000 seconds (?% CPU, Infinite Lips)
false.

Python: referring to each duplicate item in a list by unique index

I am trying to extract particular lines from a text output file. The lines I am interested in are a few lines above and a few lines below the key_string that I am using to search through the results. The key string is the same for each result.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')
lines = fi.readlines()
filtered_list=[]
for item in lines:
    if item.startswith("key string"):
        filtered_list.append(lines[lines.index(item)-2])
        filtered_list.append(lines[lines.index(item)+6])
        filtered_list.append(lines[lines.index(item)+10])
        filtered_list.append(lines[lines.index(item)+11])
fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but multiplied for every record available. How can I update the indexing so it can read every individual record? I've tried to find the solution but as a novice programmer I was struggling to use enumerate() function or collections package.
First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, it doesn't work at all, etc). Anyway, here's some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
    if item.startswith(key):
        relevant.append(i)  # store the index, not the line itself
enumerate is actually quite simple. It takes any iterable, such as a list, and yields (index, item) pairs. So, iterating over enumerate(['a', 'b', 'c']) gives (0, 'a'), (1, 'b'), (2, 'c').
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collect them. You are interested in the line 2 lines before each key line and in the lines 6, 10 and 11 lines after it. If one of your first two lines contains the key, then you have a problem: the computed index becomes negative, and you don't really want lines[-1], because that's the last item! Also, you need to handle the situation in which your offset would take you past the end of the list: otherwise Python will raise an IndexError.
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # index 0 is a valid line
            out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
    lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # skip offsets that fall outside the file
            out.append(lines[index])
with open('Outputfile.txt', 'a') as fo:
    fo.writelines(out)
To get rid of duplicates you can convert the list to a set; example:
x=['a','b','a']
y=set(x)
print(y)
will result in a set, not a list, and the order of its elements is not guaranteed:
{'a', 'b'}
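If the order of the lines matters, which it usually does when they are written back to a file, a set is not ideal because it does not keep the original order. A minimal sketch of an order-preserving alternative, assuming Python 3.7 or later (where dict keeps insertion order); the variable name unique_in_order is just for illustration:

```python
x = ['a', 'b', 'a']

# dict.fromkeys keeps the first occurrence of each item, in original order.
unique_in_order = list(dict.fromkeys(x))
print(unique_in_order)  # ['a', 'b']
```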

Minimal transversal of a Hypergraph

Hi This is my first post, so please go easy on me. I tried going through an algorithm Dualize and Advance for generating maximal frequent item sets. I considered an example as follows
Transactions
abcde
ace
bd
abc
and minimum frequency threshold as 2.
Now, I have a problem understanding how to generate 'minimal transversals' part of the algorithm.
I know that transversal is a subset of vertices of the hypergraph that intersects every hyper edge. So the initial set of minimal transversals should be {a,b,c,d,e} if I am not wrong.
Can you please explain to me this 'minimal transversal' part w.r.t. the transactions?
OK, I'll try to answer my own question.
At the end of first iteration of the algorithm and for the given transactions, {abc} is emanated as maximal frequent itemset. Here is how I understood,
minimal transversal X = S1' = {a,b,c,d,e}
S2 = {abc} is maximal and S2' = {de}
Find minimal transversal of S2' which is {d,e}
Now, X = {d,e}, consider 'd' ,
S3 = {bd} is maximal and S3' = {ace}
Now, consider 'e'
S4 = {ace} is maximal and meanwhile I get {ad} , {be} ,{cd} and {de} as minimal transversals which are infrequent.
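A subset T of V is a transversal (or hitting set) of H if it intersects all the hyperedges of H.
'minimal transversal' -> a transversal from which no vertex can be removed without missing some hyperedge (minimal by inclusion, not necessarily the smallest in size).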
Also, we enforce ordering in our algorithm a>b>c>d>e
Iteration 1:
S1 = {}
S1' = {abcde}
Tr = {a,b,c,d,e} [All nodes are required to cut Si']
Iteration 2:
S2 = {abc}
S2' = {de}
Tr = {d,e} [Two nodes are required to cut Si']
Iteration 3:
S3 = {abc, bd}
S3' = {de, ace}
Tr = {e, cd, ad} [Three nodes are required to cut Si']
Iteration 4:
S4 = {abc, bd, ace}
S4' = {de, bd, ace}
Tr = {de, cd, ad, be} [Four nodes are required to cut Si']
All minimal transversals Tr are infrequent so the algorithm ends.
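The Tr sets in the iterations above can be checked with a small brute-force sketch. This is not the Dualize and Advance algorithm itself, only an exhaustive enumeration of inclusion-minimal hitting sets that is feasible for five items; the function names minimal_transversals and support are made up for this illustration.

```python
from itertools import combinations


def minimal_transversals(vertices, edges):
    """Enumerate inclusion-minimal hitting sets by increasing size."""
    edge_sets = [set(e) for e in edges]
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in combinations(vertices, size):
            c = set(cand)
            if any(t <= c for t in found):
                continue  # contains a smaller transversal, so not minimal
            if all(c & e for e in edge_sets):
                found.append(c)
    return sorted("".join(sorted(t)) for t in found)


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(set(itemset) <= set(t) for t in transactions)


transactions = ["abcde", "ace", "bd", "abc"]

print(minimal_transversals("abcde", ["abcde"]))            # ['a', 'b', 'c', 'd', 'e']
print(minimal_transversals("abcde", ["de"]))               # ['d', 'e']
print(minimal_transversals("abcde", ["de", "ace"]))        # ['ad', 'cd', 'e']
last = minimal_transversals("abcde", ["de", "bd", "ace"])
print(last)                                                # ['ad', 'be', 'cd', 'de']
print([support(t, transactions) for t in last])            # [1, 1, 1, 1]
```

With the minimum frequency threshold of 2, every transversal in the last step has support 1, which matches the stopping condition stated above.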

Stata: Counting number of consecutive occurrences of a pre-defined length

Observations in my data set contain the history of moves for each player. I would like to count the number of consecutive series of moves of some pre-defined length (2, 3 and more than 3 moves) in the first and the second halves of the game. The sequences cannot overlap, i.e. the sequence 1111 should be considered as a sequence of the length 4, not 2 sequences of length 2. That is, for an observation like this:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 1 | . | . | 1 | 1 |
+-------+-------+-------+-------+-------+-------+-------+-------+
…the following variables should be generated:
Number of sequences of 2 in the first half =0
Number of sequences of 2 in the second half =1
Number of sequences of 3 in the first half =0
Number of sequences of 3 in the second half =0
Number of sequences of >3 in the first half =1
Number of sequences of >3 in the second half = 0
I have two potential options of how to proceed with this task but neither of those leads to the final solution:
Option 1: Elaborating on Nick’s tactical suggestion to use strings (Stata: Maximum number of consecutive occurrences of the same value across variables), I have concatenated all “move*” variables and tried to identify the starting position of a substring:
egen test1 = concat(move*)
gen test2 = subinstr(test1,"11","X",.) // find all consecutive series of length 2
There are several problems with Option 1:
(1) it does not account for cases with overlapping sequences (“1111” is recognized as 2 sequences of 2)
(2) it shortens the resulting string test2 so that positions of X no longer correspond to the starting positions in test1
(3) it does not account for variable length of substring if I need to check for sequences of the length greater than 3.
Option 2: Create an auxiliary set of variables to identify the starting positions of the consecutive set (or sets) of 1s of some fixed predefined length. Building on the earlier example, in order to count sequences of length 2, what I am trying to get is an auxiliary set of variables that will be equal to 1 if a sequence started at a given move, and zero otherwise:
+-------+-------+-------+-------+-------+-------+-------+-------+
| Move1 | Move2 | Move3 | Move4 | Move5 | Move6 | Move7 | Move8 |
+-------+-------+-------+-------+-------+-------+-------+-------+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+-------+-------+-------+-------+-------+-------+-------+-------+
My code looks as follows but it breaks when I am trying to restart counting consecutive occurrences:
quietly forval i = 1/42 {
    gen temprow`i' =.
    egen rowsum = rownonmiss(seq1-seq`i') // count number of occurrences
    replace temprow`i'=rowsum
    mvdecode seq1-seq`i',mv(1) if rowsum==2
    drop rowsum
}
Does anyone know a way of solving the task?
Assume a string variable, all, concatenating all moves (the name test1 is hardly evocative).
FIRST TRY: TAKING YOUR EXAMPLE LITERALLY
From your example with 8 moves, the first half of the game is moves 1-4 and the second half moves 5-8. Thus there is for each half only one way to have >3 moves, namely that there are 4 moves. In that case each substring will be "1111" and counting reduces to testing for the one possibility:
gen count_1_4 = substr(all, 1, 4) == "1111"
gen count_2_4 = substr(all, 5, 4) == "1111"
Extending this approach, there are only two ways to have 3 moves in sequence:
gen count_1_3 = inlist(substr(all, 1, 4), "111.", ".111")
gen count_2_3 = inlist(substr(all, 5, 4), "111.", ".111")
In similar style, there can't be two instances of 2 moves in sequence in each half of the game as that would qualify as 4 moves. So, at most there is one instance of 2 moves in sequence in each half. That instance must match either of two patterns, "11." or ".11". ".11." is allowed, so either includes both. We must also exclude any false match with a sequence of 3 moves, as just mentioned.
gen count_1_2 = (strpos(substr(all, 1, 4), "11.") | strpos(substr(all, 1, 4), ".11") ) & !count_1_3
gen count_2_2 = (strpos(substr(all, 5, 4), "11.") | strpos(substr(all, 5, 4), ".11") ) & !count_2_3
The result of each strpos() evaluation will be positive if a match is found and (arg1 | arg2) will be true (1) if either argument is positive. (For Stata, non-zero is true in logical evaluations.)
That's very much tailored to your particular problem, but not much worse for that.
P.S. I didn't try hard to understand your code. You seem to be confusing subinstr() with strpos(). If you want to know positions, subinstr() cannot help.
SECOND TRY
Your last code segment implies that your example is quite misleading: if there can be 42 moves, the approach above can not be extended without pain. You need a different approach.
Let's suppose that the string variable all can be 42 characters long. I will set aside the distinction between first and second halves, which can be tackled by modifying this approach. At its simplest, just split the history into two variables, one for the first half and one for the second and repeat the approach twice.
You can clone the history by
clonevar work = all
gen length1 = .
gen length2 = .
and set up your count variables. Here count_4 will hold counts of 4 or more.
gen count_4 = 0
gen count_3 = 0
gen count_2 = 0
First we look for move sequences of length 42, ..., 2. Every time we find one, we blank it out and bump up the count.
qui forval j = 42(-1)2 {
    replace length1 = length(work)
    local pattern : di _dup(`j') "1"
    replace work = subinstr(work, "`pattern'", "", .)
    replace length2 = length(work)
    if `j' >= 4 {
        replace count_4 = count_4 + (length1 - length2) / `j'
    }
    else if `j' == 3 {
        replace count_3 = count_3 + (length1 - length2) / 3
    }
    else if `j' == 2 {
        replace count_2 = count_2 + (length1 - length2) / 2
    }
}
The important details here are
If we delete (repeated instances of) a pattern and measure the change in length, we have just deleted (change in length) / (length of pattern) instances of that pattern. So, if I look for "11" and found that the length decreased by 4, I just found two instances.
Working downwards and deleting what we found ensures that we don't find false positives, e.g. if "1111111" is deleted, we don't find later "111111", "11111", ..., "11" which are included within it.
Deletion implies that we should work on a clone in order not to destroy what is of interest.
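For anyone who wants to check the logic outside Stata, the same delete-longest-first idea can be sketched in Python. This mirrors the loop above under the same assumption (no run is longer than 42 moves); the function name count_runs and the dictionary keys are invented for this illustration.

```python
def count_runs(history, max_len=42):
    """Count non-overlapping runs of "1" of length 2, 3 and 4 or more.

    Works on a copy, longest pattern first, and infers the number of
    deleted runs from the change in string length.
    """
    counts = {"2": 0, "3": 0, "4+": 0}
    work = history  # strings are immutable, so the original is untouched
    for j in range(max_len, 1, -1):
        before = len(work)
        work = work.replace("1" * j, "")
        found = (before - len(work)) // j
        if j >= 4:
            counts["4+"] += found
        elif j == 3:
            counts["3"] += found
        else:
            counts["2"] += found
    return counts


moves = "1111..11"  # the 8-move example from the question
print(count_runs(moves[:4]))  # {'2': 0, '3': 0, '4+': 1}
print(count_runs(moves[4:]))  # {'2': 1, '3': 0, '4+': 0}
```

Splitting the history into two halves before calling the function corresponds to the suggestion above of repeating the approach once per half.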