Remove Multiple Elements in a Python List Once - list

(Using Python 3)
Given this list named numList: [1,1,2,2,3,3,3,4].
I want to remove exactly one instance of “1” and “3” from numList.
In other words, I want a function that will turn numList into: [1,2,2,3,3,4].
What function will let me remove an X number of elements from a Python list once per element I want to remove?
(The elements I want to remove are guaranteed to exist in the list)
For the sake of clarity, I will give more examples:
[1,2,3,3,4]
Remove 2 and 3
[1,3,4]
[3,3,3]
Remove 3
[3,3]
[1,1,2,2,3,4,4,4,4]
Remove 2, 3 and 4
[1,1,2,4,4,4]
I’ve tried doing this:
numList=[1,2,2,3,3,4,4,4]
remList = [2,3,4]
for x in remList:
    numList.remove(x)
This turns numList to [1,2,3,4,4] which is what I want. However, each call to remove() scans the list, so this has a complexity of:
O(len(numList) * len(remList))
This is a problem because remList and numList can have a length of 10^5. The program will take a long time to run. Is there a built-in function that does what I want faster?
Also, I would prefer the optimum function which can do this job in terms of space and time because the program needs to run in less than a second and the size of the list is large.

Your approach:
for x in rem_list:
    num_list.remove(x)
is intuitive, and unless the lists are going to be very large I might do that because it is easy to read.
One alternative would be:
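result = []
for x in num_list:
    if x in rem_list:
        rem_list.remove(x)
    else:
        result.append(x)
This would be O(len(rem_list) * len(num_list)), the same worst-case bound as the first solution, although it may be faster in practice if len(rem_list) is much smaller than len(num_list). Note that it modifies rem_list as it goes.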
If rem_list was guaranteed to not contain any duplicates (as per your examples) you could use a set instead and the complexity would be O(len(num_list)).
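If rem_list may contain duplicates, one possible way to keep the running time linear is to count the pending removals first. The following is a minimal sketch for Python 3, not a built-in; the function name remove_once_each is made up for illustration.

```python
from collections import Counter

def remove_once_each(num_list, rem_list):
    """Return a copy of num_list with one occurrence removed per entry of rem_list."""
    pending = Counter(rem_list)  # how many copies of each value still need removing
    result = []
    for x in num_list:
        if pending[x] > 0:
            pending[x] -= 1      # skip this occurrence
        else:
            result.append(x)
    return result

print(remove_once_each([1, 1, 2, 2, 3, 3, 3, 4], [1, 3]))  # [1, 2, 2, 3, 3, 4]
```

Each element of both lists is visited once, so this should run in roughly O(len(num_list) + len(rem_list)) time with extra space proportional to the number of distinct values in rem_list.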

Related

Is there even a slight possibility to process two lists in one single list comprehension line?

I would like to ask if there's a possibility to process more than one list in just a single line with list comprehension? I'm using Python 2.7 .
Here is what the code looks like:
n=[1,2,3,4,5,6,7]
m=[1,7]
c=[]
for x in m:
    if x in n:
        c.append(x)
        n.pop(n.index(x))
print n
print c
The output is:
[2,3,4,5,6]
[1,7]
Now I'm wondering if I could turn the code (line 5 to line 8) into a single line using a list comprehension?
I would appreciate your advice. Let me know if my question has a duplicate. Thank you very much.
You can do it this way, since popping a value from the list returns that value:
n=[1,2,3,4,5,6,7]
m=[1,7]
c=[n.pop(n.index(x)) for x in m if x in n]
print n
print c
n=[1,2,3,4,5,6,7]
m=[1,7]
print set(n)-set(m)
> set([2, 3, 4, 5, 6])
Note that the result is a set, so the original order and any duplicates are not preserved. Assign the sets to their own variables if you need to perform additional operations. Converting to a set will take some time on a big list but then membership, subtraction, union or intersection operations should be very fast.
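If the order of n matters, a set can still be used just for the membership test. This is a small sketch written for Python 3 (the question uses 2.7, where only the print syntax would differ); the names m_set, common and remaining are chosen here for illustration.

```python
n = [1, 2, 3, 4, 5, 6, 7]
m = [1, 7]

m_set = set(m)                                 # fast membership tests
common = [x for x in n if x in m_set]          # elements of n that also appear in m
remaining = [x for x in n if x not in m_set]   # n with those elements filtered out

print(remaining)  # [2, 3, 4, 5, 6]
print(common)     # [1, 7]
```

Unlike the pop-based version, this removes every occurrence of a matching value and lists the common elements in the order they appear in n, which may or may not be what is wanted.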

Creating a table of square roots of a given list in Prolog

I am a newbie to Prolog and trying to develop a simple code to get the following output.
?-sqrt_table(7, 4, Result).
Result = [[4, 2.0],[7, 2.64575],[6, 2.44949],[5, 2.23607]] ;
false.
However the output I'm getting is,
?- sqrt_table(4,7,X).
X = [[5, 2.23606797749979], [[6, 2.449489742783178], [[7, 2.6457513110645907], []]]].
I think the issue is in the nested list created by get_sqrt/2. If I can get it flattened down to tuples I think it might work. Your ideas and help are much appreciated.
%main predicate
sqrt_table(N,M,Result):-
    full_list(N,M,Out),
    get_sqrt(Out,[Rest,Result]).
%Creates a range of outputs within the given upper and lower limits
range(Low,Low,High).
range(Out,Low,High):-
    NewLow is Low+1,NewLow=<High,
    range(Out,NewLow,High).
%Creates a list of the outputs created by range/3
full_list(Low,High,Out):-
    findall(X,range(X,Low,High),Out).
%Calculates the square root of each item in the list and gives a list consisted
%of sublists such that [Input,Squareroot of the input]
get_sqrt([],[]).
get_sqrt([H|T],[[H,Sqrt],SqrtRest]):-
    SqrtOf_H is sqrt(H),
    get_sqrt(T,SqrtRest),
    Sqrt = SqrtOf_H.
Thanks in advance.
In the head of the second clause of get_sqrt/2, simply write [[H,Sqrt]|SqrtRest], i.e., use (|)/2 instead of (,)/2.
In fact, it would be even better to use the more readable and more idiomatic [H-Sqrt|HSqrts], i.e., use (-)/2 to denote pairs.
Better still, simply state the relation for one element at a time, using for example:
integer_isqrt(I, I-Sq) :- Sq is sqrt(I).
and then to use the meta-predicate maplist/3 to relate lists of such elements to one another:
?- maplist(integer_isqrt, [0,1,2,3,4], Ls).
Ls = [0-0.0, 1-1.0, 2-1.4142135623730951, 3-1.7320508075688772, 4-2.0].
P.S.: Using flatten/2 always indicates a problem with your data structures; you should avoid flatten/2 entirely. If you need to remove one level of nesting, use append/2. But in this case, neither is needed.

Checking if a string contains an English sentence

As of right now, I decided to take a dictionary and iterate through the entire thing. Every time I see a newline, I make a string containing from that newline to the next newline, then I do string.find() to see if that English word is somewhere in there. This takes a VERY long time, each word taking about 1/2-1/4 a second to verify.
It is working perfectly, but I need to check thousands of words a second. I can run several windows, which doesn't affect the speed (Multithreading), but it still only checks like 10 a second. (I need thousands)
I'm currently writing code to pre-compile a large array containing every word in the English language, which should speed it up a lot, but still not get the speed I want. There has to be a better way to do this.
The strings I'm checking will look like this:
"hithisisastringthatmustbechecked"
but most of them contained complete garbage, just random letters.
I can't check for impossible combinations of letters, because that string would be thrown out because of the 'tm', in between 'thatmust'.
You can speed up the search by employing the Knuth–Morris–Pratt (KMP) algorithm.
Go through every dictionary word, and build a search table for it. You need to do it only once. Now your search for individual words will proceed at faster pace, because the "false starts" will be eliminated.
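As a rough illustration of the suggestion above, here is a minimal Python 3 sketch of a KMP search table and the search that uses it. The function names build_failure_table and kmp_find are made up here, and in practice Python's built-in str.find may well be fast enough on its own.

```python
def build_failure_table(word):
    """table[i] = length of the longest proper prefix of word[:i+1] that is also its suffix."""
    table = [0] * len(word)
    k = 0
    for i in range(1, len(word)):
        while k > 0 and word[i] != word[k]:
            k = table[k - 1]          # fall back to a shorter matching prefix
        if word[i] == word[k]:
            k += 1
        table[i] = k
    return table

def kmp_find(text, word, table):
    """Return the index of the first occurrence of word in text, or -1."""
    if not word:
        return 0
    k = 0                             # number of characters of word matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != word[k]:
            k = table[k - 1]          # reuse the partial match instead of restarting
        if ch == word[k]:
            k += 1
        if k == len(word):
            return i - k + 1
    return -1

text = "hithisisastringthatmustbechecked"
table = build_failure_table("string")   # built once per dictionary word
print(kmp_find(text, "string", table) != -1)  # True
```

The table depends only on the dictionary word, so it can be computed once and reused for every string that has to be checked.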
There are a lot of strategies for doing this quickly.
Idea 1
Take the string you are searching and make a copy of each possible substring beginning at some column and continuing through the whole string. Then store each one in an array indexed by the letter it begins with. (If a letter is used twice, store the longer substring.)
So the array looks like this:
a - substr[0] = "astringthatmustbechecked"
b - substr[1] = "bechecked"
c - substr[2] = "checked"
d - substr[3] = "d"
e - substr[4] = "echecked"
f - substr[5] = null // since there is no 'f' in it
... and so forth
Then, for each word in the dictionary, search in the array element indicated by its first letter. This limits the amount of stuff that has to be searched. Plus you can't ever find a word beginning with, say 'r', anywhere before the first 'r' in the string. And some words won't even do a search if the letter isn't in there at all.
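A loose Python 3 sketch of Idea 1 follows. Instead of copying substrings it stores the index of each letter's first occurrence, which identifies the same "longest substring starting with that letter"; the names first_positions and words_in are invented for this example.

```python
def first_positions(text):
    """Map each letter to the index of its first occurrence in text."""
    first = {}
    for i, ch in enumerate(text):
        first.setdefault(ch, i)       # keep only the earliest index
    return first

def words_in(text, dictionary):
    """Return the dictionary words that occur somewhere in text."""
    first = first_positions(text)
    found = []
    for word in dictionary:
        start = first.get(word[0]) if word else None
        if start is None:
            continue                  # first letter never occurs: no search needed
        if text.find(word, start) != -1:
            found.append(word)        # search only from the first possible start
    return found

print(words_in("hithisisastringthatmustbechecked", ["string", "fox", "must", "zebra"]))
# ['string', 'must']
```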
Idea 2
Expand upon that idea by noting the longest word in the dictionary and get rid of letters from those strings in the arrays that are longer than that distance away.
So you have this in the array:
a - substr[0] = "astringthatmustbechecked"
But if the longest word in the list is 5 letters, there is no need to keep any more than:
a - substr[0] = "astri"
If the letter is present several times you have to keep more letters. So this one has to keep the whole string because the "e" keeps showing up less than 5 letters apart.
e - substr[4] = "echecked"
You can expand upon this by using the longest words starting with any particular letter when condensing the strings.
Idea 3
This has nothing to do with 1 and 2. It's an idea that you could use instead.
You can turn the dictionary into a sort of regular expression stored in a linked data structure. It is possible to write the regular expression too and then apply it.
Assume these are the words in the dictionary:
arun
bob
bill
billy
body
jose
Build this sort of linked structure. (It's a binary tree, really, represented in such a way that I can explain how to use it.)
a -> r -> u -> n -> *
|
b -> i -> l -> l -> *
|    |              |
|    o -> b -> *    y -> *
|         |
|         d -> y -> *
|
j -> o -> s -> e -> *
The arrows denote a letter that has to follow another letter. So "r" has to be after an "a" or it can't match.
The lines going down denote an option. You have the "a or b or j" possible letters and then the "i or o" possible letters after the "b".
The regular expression looks sort of like: /(arun)|(b((ill(y?))|(o(b|dy))))|(jose)/. This gives the gist of creating it as a regex.
Once you build this structure, you apply it to your string starting at the first column. Try to run the match by checking for the alternatives and, if one matches, move forward tentatively and try the letter after the arrow and its alternatives. If you reach the star/asterisk, it matches. If you run out of alternatives, including backtracking, you move to the next column.
This is a lot of work but can, sometimes, be handy.
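The linked structure above behaves much like a prefix tree (trie). Here is a minimal Python 3 sketch using nested dictionaries, where an empty-string key plays the role of the asterisk; build_trie, find_words and END are names made up for this illustration, not part of the original answer.

```python
END = ""   # end-of-word marker; can never collide with a single character key

def build_trie(words):
    """Build a nested-dict trie: each key is a letter, END marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def find_words(text, trie):
    """Return (start, word) for every dictionary word found anywhere in text."""
    matches = []
    for start in range(len(text)):        # try each column in turn
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break                      # no alternative matches: next column
            if END in node:
                matches.append((start, text[start:end + 1]))
    return matches

trie = build_trie(["arun", "bob", "bill", "billy", "body", "jose"])
print(find_words("xxbillyjose", trie))  # [(2, 'bill'), (2, 'billy'), (7, 'jose')]
```

Each starting column costs at most as many steps as the longest dictionary word, regardless of how many words the dictionary holds.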
Side note: I built one of these some time back by writing a program that wrote the code that ran the algorithm directly instead of having code looking at the binary tree data structure.
Think of each set of vertical bar options being a switch statement against a particular character column and each arrow turning into a nesting. If there is only one option, you don't need a full switch statement, just an if.
That was some fast character matching and really handy for some reason that eludes me today.
How about a Bloom Filter?
A Bloom filter, conceived by Burton Howard Bloom in 1970 is a
space-efficient probabilistic data structure that is used to test
whether an element is a member of a set. False positive matches are
possible, but false negatives are not; i.e. a query returns either
"inside set (may be wrong)" or "definitely not in set". Elements can
be added to the set, but not removed (though this can be addressed
with a "counting" filter). The more elements that are added to the
set, the larger the probability of false positives.
The approach could work as follows: you create the set of words that you want to check against (this is done only once), and then you can quickly run the "in/not-in" check for every sub-string. If the outcome is "not-in", you are safe to continue (Bloom filters do not give false negatives). If the outcome is "in", you then run your more sophisticated check to confirm (Bloom filters can give false positives).
It is my understanding that some spell-checkers rely on bloom filters to quickly test whether your latest word belongs to the dictionary of known words.
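To make the idea concrete, here is a small, unoptimized Bloom filter sketch in Python 3. The class name, the default sizes and the use of SHA-256 with a seed prefix to derive several hash positions are all assumptions chosen for illustration; a real deployment would tune the bit count and hash count to the dictionary size.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

words = BloomFilter()
for w in ["hi", "this", "is", "a", "string"]:
    words.add(w)
print("string" in words)  # True
```

A "not in" answer can be trusted immediately, while an "in" answer still needs confirming against the real word list, as described above.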
This code was modified from How to split text without spaces into list of words?:
from math import log
words = open("english125k.txt").read().split()
wordcost = dict((k, log((i+1)*log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)
def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""
    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)
    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)
    # Backtrack to recover the minimal-cost string.
    costsum = 0
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        costsum += c
        i -= k
    return costsum
Using the same dictionary of that answer and testing your string outputs
>>> infer_spaces("hithisisastringthatmustbechecked")
294.99768817854056
The trick here is finding out what threshold you can use, keeping in mind that using smaller words makes the cost higher (if the algorithm can't find any usable word, it returns inf, since it would split everything to single-letter words).
In theory, I think you should be able to train a Markov model and use that to decide if a string is probably a sentence or probably garbage. There's another question about doing this to recognize words, not sentences: How do I determine if a random string sounds like English?
The only difference for training on sentences is that your probability tables will be a bit larger. In my experience, though, a modern desktop computer has more than enough RAM to handle Markov matrices unless you are training on the entire Library of Congress (which is unnecessary; even 5 or so books by different authors should be enough for very accurate classification).
Since your sentences are mashed together without clear word boundaries, it's a bit tricky, but the good news is that the Markov model doesn't care about words, just about what follows what. So, you can make it ignore spaces, by first stripping all spaces from your training data. If you were going to use Alice in Wonderland as your training text, the first paragraph would, perhaps, look like so:
alicewasbeginningtogetverytiredofsittingbyhersisteronthebankandofhavingnothingtodoonceortwiceshehadpeepedintothebookhersisterwasreadingbutithadnopicturesorconversationsinitandwhatistheuseofabookthoughtalicewithoutpicturesorconversation
It looks weird, but as far as a Markov model is concerned, it's a trivial difference from the classical implementation.
I see that you are concerned about time: Training may take a few minutes (assuming you have already compiled gold standard "sentences" and "random scrambled strings" texts). You only need to train once; you can easily save the "trained" model to disk and reuse it for subsequent runs by loading from disk, which may take a few seconds. Making a call on a string would take a trivially small number of floating point multiplications to get a probability, so after you finish training it, it should be very fast.
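A very small Python 3 sketch of this idea is shown below, using character bigrams (each letter conditioned on the previous one). The function names, the add-one smoothing and the use of an average log-probability as the score are assumptions made for the example; the threshold that separates sentences from garbage would have to be found by experiment on real data.

```python
import math
from collections import defaultdict

def train(text):
    """Count character bigrams in text, ignoring everything except a-z."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(letters, letters[1:]):
        counts[a][b] += 1
    return counts

def avg_log_prob(s, counts, alphabet_size=26):
    """Average log-probability per letter transition, with add-one smoothing."""
    letters = [ch for ch in s.lower() if "a" <= ch <= "z"]
    if len(letters) < 2:
        return float("-inf")
    total = 0.0
    for a, b in zip(letters, letters[1:]):
        row = counts.get(a, {})
        total += math.log((row.get(b, 0) + 1) / (sum(row.values()) + alphabet_size))
    return total / (len(letters) - 1)

model = train("alice was beginning to get very tired of sitting by her sister on the bank")
print(avg_log_prob("hersister", model) > avg_log_prob("xqzjxqzj", model))  # True
```

Higher (less negative) scores indicate strings whose letter transitions resemble the training text. With a corpus this tiny the scores are only illustrative.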

How to perform arithmetic on elements of a Prolog list

Background
A list of integer coefficients can be used to represent a polynomial (in X). For example, 1 + 3x + 3x^2 + 2x^3 is represented by [1,3,3,2].
Let P be one of these lists.
I need to write axioms that take these coefficients and do different things with them.
Example: axioms for a relation eval(P,A,R) where R is the result of evaluating the polynomial represented by P at X = A (expect P and A to be fully instantiated). For example, eval([3,1,2],3,R) produces R=24. (This is because 3(3)^0 + 1(3)^1 + 2(3)^2 = 3 + 3 + 18 = 24).
This Prolog tutorial discusses searching a list recursively: "It sees if something is the first item in the list. If it is, we succeed. If it is not, then we throw away the first item in the list and look at the rest".
on(Item,[Item|Rest]).
on(Item,[DisregardHead|Tail]):-
    on(Item,Tail).
Question
How does this code throw away the first item in the list?
The question then becomes, once I've found it, how do I use it to calculate as described above?
How does this code throw away the first item in the list?
By calling on recursively on the list's tail, you're ignoring the first element. And since you won't use it, you should call it _ instead of DisregardHead (some compilers will warn you of "singleton variable").
once I've found it, how do I use it to calculate as described above?
Well, on is supposed to return multiple results - one for each match - while your goal is to have a single result that takes the whole list into account. So you shouldn't disregard the first element, but incorporate it in the results. Example:
my_pred([],0).
my_pred([Item|Tail],Result) :-
    my_pred(Tail,IntermResult),
    combine(Item,IntermResult,Result). % Ex.: Result is Item + IntermResult
I haven't given a complete code since it appears you're still learning, but can do if that's what you want. This is also a very simple example, and not optimized (i.e. no tail recursion).
Additional hint: if you express your polynomial this way, it should become clear how a recursive calculation could be done:
1 + x * (3 + x * (3 + x * (2 + x * 0)))
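The nested form above is Horner's method. As a sketch of the arithmetic only (written in Python 3 rather than Prolog, so it is not the requested set of axioms), the same evaluation can be written as a loop over the coefficients from last to first; eval_poly is a name chosen here for illustration.

```python
def eval_poly(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + coeffs[2]*x**2 + ... using Horner's method."""
    result = 0                       # value for the empty list, like my_pred([], 0)
    for c in reversed(coeffs):
        result = c + x * result      # combine the head with the result for the tail
    return result

print(eval_poly([3, 1, 2], 3))  # 24
```

Each step mirrors the recursive clause: the result for the whole list is the head plus x times the result for the tail.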
I made a 90 sec video to show how you can use SWI-Prolog's guitracer to intuitively understand simple Prolog programs. On Ubuntu/Debian just do sudo apt-get install swi-prolog and you can try out the debugger yourself using the commands in the video (which are swipl, [filename] (load file), guitracer., trace. and then the query).

GNU Prolog - Build up a list in a loop

I need to build a new list with a "loop". Basically I can't use recursion explicitly, so I am using append to go through lists of lists.
I can get the element. The problem is that I need to check this element and, if something is true, it returns another element that I need to put back into the list. It does check correctly and it changes correctly.
The problem I am having is how to create a completely new list.
So, if I had
[[1,1,1],[2,6,2],[3,3,3]]
I go through each element. Say I get to the 6 and it changes. So I need to create a new list like so,
[[1,1,1],[2,10,2],[3,3,3]].
Right now my main problem is just creating each row. If I can create each row, I will be able to create a list of lists.
So to break this down a little more, let's just worry about [1,1,1].
I go through each element while appending the new element to a new list. The new list is now [1,1,1].
I have this:
set(Row,Col,Bin,TheEntry,Bout) :-
    append(ListLeft, [R|_], Bin),
    append(ListLeft2, [C|_], R),
    length(ListLeft, LenR),
    length(ListLeft2,LenC),
    CurrRow is LenR + 1,
    CurrCol is LenC + 1,
    getChar(C, Row, Col, CurrRow, CurrCol,TheEntry, NewC),
    appendhere?.
I need to create a new list there with the character returned from NewC. Not sure how to do this.
Any clues?
Thanks.
To give you an idea about how to use append/3 to extract an item from a list of lists, consider the following predicate called replace/2:
replace(In, Out) :-
    append(LL, [L|RL], In),
    append(LE, [E|RE], L),
    replaceElement(E, NewE), !,
    append(LE, [NewE|RE], NewL),
    append(LL, [NewL|RL], Out).
replace(In, In).
This non-recursive predicate takes, as Input, a list of lists, and backtracks to find an element E within an inner list L that can be replaced via replaceElement/2; if so, it is replaced by constructing the inner list first (NewL), then uses this new list in the construction of the new outer list (Out), as the result.
Note that this simply serves to demonstrate how to use append/3 to break apart a list of lists to retrieve individual elements as you need via backtracking, and not recursion, as requested. Once an element E is found to be replaceable by NewE via replaceElement/2, it is used in the construction of the list again using append/3 as shown.
Also note that this suggestion (which is intended to help you, not be your final answer) also happens to replace only a single element within an inner list, if any at all. If you want to do multiple replacements of the input list in a single call to replace/2 or similar using this technique, then you will almost certainly need a recursive definition, or the ability to use the global database via assert. I'm happy to be corrected if someone else can provide a definition as a counterexample.
With this example predicate replace/2, together with, say, the following fact:
replaceElement(6, 10).
Executing the following gives us your required behaviour:
1 ?- replace([[1,1,1],[2,6,2],[3,3,3]], Out).
Out = [[1, 1, 1], [2, 10, 2], [3, 3, 3]] ;
false.
If you cannot use cut (!), it is fine to omit it, but note that the second clause replace(In, In) will cause all calls to replace/2 to backtrack at least once to give you the input list back. If this behaviour is undesirable, omitting this second clause will cause replace/2 to fail outright if there is no replacement to be made.
If you cannot use recursion and have to do it with backtracking you should do something like this:
Assume Bin is a list of lists (each item is a full row)
~ Split input Bin in three parts (a list of 'left' rows, a Row, and a list of remaining rows). This can be done using append/3 with something like append(Left, [Item|Rest], Rows)
~ Now obtain the length of the 'left' rows
~ Test the length using 'is' operator to check whether the left list has Row - 1 items
~ Do the same but now with the Item, i.e. split it in three parts (LeftColums, ColumItem and Rest)
~ Test now the length against the required Column
~ Now you have the Item to change so all you need to do is rebuild a list using two appends (one to rebuild the chosen row and another to rebuild the output list).
So from your code you wouldn't use unnamed variables (_). Instead of that you have to use a named variable to be able to rebuild the new list with the item changed.
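The split-and-rebuild steps above can be sketched outside Prolog as well. The following Python 3 snippet is only an analogy for the data flow (slicing stands in for append/3, and there is no backtracking); the function name set_entry and its 1-based row and column arguments are assumptions made for the example.

```python
def set_entry(rows, row, col, new_value):
    """Return a new list of lists with the entry at 1-based (row, col) replaced."""
    # Split the rows into: rows to the left, the chosen row, and the remaining rows.
    left_rows, item, rest_rows = rows[:row - 1], rows[row - 1], rows[row:]
    # Split the chosen row the same way around the chosen column.
    left_cols, rest_cols = item[:col - 1], item[col:]
    # Rebuild the row, then rebuild the outer list around it.
    new_row = left_cols + [new_value] + rest_cols
    return left_rows + [new_row] + rest_rows

print(set_entry([[1, 1, 1], [2, 6, 2], [3, 3, 3]], 2, 2, 10))
# [[1, 1, 1], [2, 10, 2], [3, 3, 3]]
```

The key point carried over from the steps is that both the left and the right parts are kept under names, so they are available when the new list is put back together.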