Let X and Y be a matrix and a list respectively, both with exactly the same size, elements and layout. Assuming we need to retrieve the nth element from both, which method is faster? If I'm not mistaken, in languages like C or C++ the matrix access is computed as *(matrix_pointer + step), which should take O(1) (constant) time since it's just an addition plus a dereference, while in languages like Prolog or Haskell, when working with lists, you need to pass through the first (n-1) elements to reach the nth one, which must give O(n) complexity.
Any clarification would be much appreciated.
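To make the comparison concrete, here is a small C++ sketch of the two access patterns I have in mind (the function names are mine):

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Contiguous storage: element n is one address computation away.
double nth_of_vector(const std::vector<double>& v, std::size_t n) {
    return v[n];                 // *(v.data() + n), O(1)
}

// Singly linked list: element n is reached by following n links.
double nth_of_list(const std::forward_list<double>& xs, std::size_t n) {
    auto it = xs.begin();
    std::advance(it, n);         // walks n pointers, O(n)
    return *it;
}
```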
I am supposed to write a rule in SWI Prolog, which takes a list of characters as input and then replaces each letter by a random other character with a probability of 0.01.
Example:
?- mutate([a,b,c,d,e,f,g],MutatedList).
MutatedList = [a,b,c,a,e,f,g].
Can anyone tell me how that could be implemented? I am totally clueless so far about how this could work out in Prolog.
Thanks to anyone who can help!^^
This is relatively easy. You can use maplist/3 to relate the elements of the two lists in a pairwise way. (Take a look at some of my notes on maplist/3.)
For each pair [InputItem,OutputItem] sampled from [InputList,OutputList], maplist/3 will call a predicate, call it choose(InputItem,OutputItem).
That predicate will relate InputItem either to the same value, InputItem, or to a randomly chosen character (an atom of length 1), which can be generated by selecting it at random from a list of characters. The choice of whether to perform a mutation can be made using random_float/0, for example.
Of course, choose(InputItem,OutputItem) is not really a predicate (it is just called like one, in name and at runtime), as it does not behave "predicately" at all, i.e. it will have different outcomes depending on the time of day. It's an oracle getting information from a magic reservoir. But that's okay.
Now you are all set. Not more than 4 lines!
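The same per-element choice, sketched in C++ for concreteness rather than Prolog (the function name mutate and the lowercase alphabet are assumptions; the 0.01 threshold is the one from the question):

```cpp
#include <random>
#include <vector>

// For each input character, keep it with probability 0.99,
// otherwise replace it with a random lowercase letter.
std::vector<char> mutate(const std::vector<char>& in, std::mt19937& gen) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::uniform_int_distribution<int> letter(0, 25);
    std::vector<char> out;
    out.reserve(in.size());
    for (char c : in) {
        out.push_back(coin(gen) < 0.01 ? char('a' + letter(gen)) : c);
    }
    return out;
}
```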
Assuming I have a list, is there a built-in operator or function to select elements based on a list of indices?
For example, an operator something like this ["a", "b", "z"] = alphabet[0, 1, 25]
A naive implementation of this could be:
def select(list, indices) do
Enum.map(indices, &(Enum.at(list, &1)))
end
If it doesn't exist, is this a deliberate omission to avoid lists being treated like arrays?
An example of what I'm attempting that made me want this, in case I'm asking the wrong question: Given a list, I want to select the first, middle, and last elements, then calculate the median of the three. I was doing length(list) to calculate the length, then I wanted to use this operator/function to select the three elements I'm interested in.
As far as I know, the built-in operator does not exist, and each time I have to fetch several elements from a list, I use the same implementation as yours. It is quite short and simple to recreate, and I suspect that is the reason why there is no off-the-shelf solution in Elixir.
Another reason I can think of is, as you pointed out, the fact that lists aren't arrays: to access one element, you have to traverse all the elements before it, so accessing elements by a list of indices is not a natural operation, because lists are not optimized to be used that way.
Still, I often access a list of elements with a list of indices, which means I might not be using Elixir the right way.
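To make the cost concrete: since each indexed access walks the list from the head, fetching k elements that way is O(k·n); if the indices are sorted, one pass over the list suffices. A C++ sketch of that single-pass idea (the name select_sorted is hypothetical):

```cpp
#include <cstddef>
#include <forward_list>
#include <vector>

// Walk the list once, picking out the elements at the given
// sorted, zero-based indices. O(n) total instead of O(k*n).
template <typename T>
std::vector<T> select_sorted(const std::forward_list<T>& xs,
                             const std::vector<std::size_t>& indices) {
    std::vector<T> out;
    std::size_t pos = 0;
    auto want = indices.begin();
    for (auto it = xs.begin(); it != xs.end() && want != indices.end();
         ++it, ++pos) {
        if (pos == *want) {
            out.push_back(*it);
            ++want;
        }
    }
    return out;
}
```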
Let us say that I have a 2D matrix, given by vector<vector<double>> matrix, and that matrix has already been initialized to have R rows and C columns.
There is also a list-of-coordinates (made up of, say, N (x,y) pairs) that we are going to process, such that each co-ordinate maps to a particular row (r) and column (c) in our matrix. So we basically have [r, c] = f(x,y). The particularities of the mapping function f are not important. However, what we want to do is keep track of the rows r and columns c that are used, by inserting them into another list, called list-of-indices.
The problem is that I do not want to keep adding the same r and c to the list if that (r,c) pair already exists in it. The brute-force method would be to scan the entire list-of-indices every time I want to check, but that is going to be very time-consuming.
For example, if we have the co-ordinate (x=4, y=5), this yields (r=2, c=6). So we now add (r=2, c=6) to the list-of-indices. Now we get a new point, given by (x=-2, y=10). This also ends up falling under (r=2, c=6). However, since I have already added (r=2, c=6) to my list, I do not want to add it again! Is there a better way than a brute-force scan of the list-of-indices?
You would need a map (or a set, since you only care about the keys) to do that.
If you use C++11 you can use std::unordered_map, which is a hash map with constant-time lookup on average; if you use an older version of C++ you can use std::map, which is a tree map with logarithmic lookup.
The performance difference won't be big, if you don't have many items.
Instead of the map or unordered_map you could simply use a matrix vector<vector<bool>> with the same R and C as your other matrix, with every field initialized to false.
Instead of adding an (r,c) pair to a list, you simply set the corresponding boolean in the matrix to true.
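A sketch of that boolean-matrix idea in C++ (the helper name unique_cells is made up; the example coordinates are the ones from the question):

```cpp
#include <utility>
#include <vector>

// Deduplicate (r, c) pairs using an R x C boolean "seen" matrix:
// membership is one O(1) lookup instead of a scan of the list.
std::vector<std::pair<int, int>> unique_cells(
        const std::vector<std::pair<int, int>>& cells, int R, int C) {
    std::vector<std::vector<bool>> seen(R, std::vector<bool>(C, false));
    std::vector<std::pair<int, int>> list_of_indices;
    for (auto [r, c] : cells) {
        if (!seen[r][c]) {          // first time we meet this cell
            seen[r][c] = true;
            list_of_indices.emplace_back(r, c);
        }
    }
    return list_of_indices;
}
```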
I have two sorted arrays: one contains factors (array a) that, when multiplied with values from another array (array b), yield the desired value:
a(idx1) * b(idx2) = value
With idx2 known, I would like to find the idx1 of a that provides the factor necessary to get as close to value as possible.
I have looked at some different algorithms (like this one, for example), but I feel like they would all be subject to potential problems with floating point arithmetic in my particular case.
Could anyone suggest a method that would avoid this?
If I understand correctly, this expression
minloc(abs(a-value/b(idx2)))
will return the index into a of the first occurrence of the value in a which minimises the difference. I expect that the compiler will generate code to scan all the elements in a, so this may not be faster in execution than a search which takes advantage of the knowledge that a and b are both sorted. In compensation, it is much quicker to write and, I expect, to debug.
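For comparison, here is a C++ sketch of both options: the linear scan that the minloc expression implies, and a binary search that exploits the fact that a is sorted (both function names are made up; target stands for value/b(idx2)):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear-scan equivalent of minloc(abs(a - value/b(idx2))):
// index of the first element of a closest to the target quotient.
std::size_t closest_index(const std::vector<double>& a, double target) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (std::abs(a[i] - target) < std::abs(a[best] - target)) best = i;
    }
    return best;
}

// Because a is sorted, the same answer is found in O(log n):
// the closest element is adjacent to the lower_bound position.
std::size_t closest_index_sorted(const std::vector<double>& a, double target) {
    auto it = std::lower_bound(a.begin(), a.end(), target);
    if (it == a.begin()) return 0;
    if (it == a.end()) return a.size() - 1;
    std::size_t hi = it - a.begin();
    return (target - a[hi - 1] <= a[hi] - target) ? hi - 1 : hi;
}
```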
I've noticed that in functional languages such as Haskell and OCaml you can do two things with lists. First, x:xs, where x is an element and xs is a list; the result is a new list with x prepended to the front of xs, in constant time. Second, x++y, where both x and y are lists; the result is a new list with y appended after the elements of x, in time linear in the number of elements in x.
Now I'm no expert in how languages are designed and compilers are built, but this seems to me a lot like a simple implementation of a linked list with one pointer to the first item. If I were to implement this data structure in a language like C++, I would find it generally trivial to add a pointer to the last element. In that case, if these languages were implemented this way (assuming they do use linked lists as described), adding a "pointer" to the last item would make appending to the end of a list much more efficient and would allow pattern matching on the last element.
My question is are these data structures really implemented as linked lists, and if so why do they not add a reference to the last element?
Yes, they really are linked lists. But they are immutable. The advantage of immutability is that you don't have to worry about who else has a pointer to the same list. You might choose to write x++y, but somewhere else in the program might be relying on x remaining unchanged.
People who work on compilers for such languages (of whom I am one) don't worry about this cost because there are plenty of other data structures that provide efficient access:
A functional queue represented as two lists provides constant-time access to both ends and amortized constant time for put and get operations.
A more sophisticated data structure like a finger tree can provide several kinds of list access at very low cost.
If you just want constant-time append, John Hughes developed an excellent, simple representation of lists as functions, which provides exactly that. (In the Haskell library they are called DList.)
If you're interested in these sorts of questions you can get good info from Chris Okasaki's book Purely Functional Data Structures and from some of Ralf Hinze's less intimidating papers.
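As an illustration of the first of those structures, here is a mutable C++ analogue of the two-list queue (the persistent version keeps both lists immutable, but the amortized argument is the same; this sketch uses two vectors as stacks):

```cpp
#include <vector>

// A queue as two stacks: put pushes onto the back stack; get pops
// from the front stack, refilling it (reversed) when it runs empty.
// Each element is moved at most twice, so put/get are O(1) amortized.
class Queue {
    std::vector<int> front_, back_;
public:
    void put(int x) { back_.push_back(x); }
    int get() {
        if (front_.empty()) {
            while (!back_.empty()) {          // reverse back into front
                front_.push_back(back_.back());
                back_.pop_back();
            }
        }
        int x = front_.back();
        front_.pop_back();
        return x;
    }
    bool empty() const { return front_.empty() && back_.empty(); }
};
```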
You said: "Second is x++y where both x and y are lists and the resulting action is y gets appended to the end of x in linear time with respect to the number of elements in x."
This is not really true in a functional language like Haskell; y gets appended to a copy of x, since anything holding onto x is depending on it not changing.
If you're going to copy all of x anyway, holding onto its last node doesn't really gain you anything.
Yes, they are linked lists. In languages like Haskell and OCaml, you don't add items to the end of a list, period. Lists are immutable. There is one operation to create new lists — cons, the : operator you refer to earlier. It takes an element and a list, and creates a new list with the element as the head and the list as the tail. The reason x++y takes linear time is because it must cons the last element of x with y, and then cons the second-to-last element of x with that list, and so on with each element of x. None of the cons cells in x can be reused, because that would cause the original list to change as well. A pointer to the last element of x would not be very helpful here — we still have to walk the whole list.
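That copying behaviour can be made concrete with a tiny persistent list in C++ (a sketch, not how any real Haskell runtime works): cons shares the tail it is given, while append must rebuild every cell of its left argument.

```cpp
#include <memory>

// A minimal immutable singly linked list.
struct Node;
using List = std::shared_ptr<const Node>;
struct Node {
    int head;
    List tail;
};

List cons(int x, List xs) {          // O(1): the new cell points at xs
    return std::make_shared<const Node>(Node{x, std::move(xs)});
}

List append(List xs, List ys) {      // O(length of xs): copies every cell of xs
    if (!xs) return ys;
    return cons(xs->head, append(xs->tail, ys));
}
```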
++ is just one of dozens of "things you can do with lists". The reality is that lists are so versatile that one rarely uses other collections. Also, we functional programmers almost never feel the need to look at the last element of a list - if we need to, there is a function last.
However, just because lists are convenient does not mean that we lack other data structures. If you're really interested, have a look at this book: http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf (Purely Functional Data Structures). You'll find trees, queues, lists with O(1) append at the tail, and so forth.
Here's a bit of an explanation on how things are done in Clojure:
The easiest way to avoid mutating state is to use immutable data structures. Clojure provides a set of immutable lists, vectors, sets and maps. Since they can't be changed, 'adding' or 'removing' something from an immutable collection means creating a new collection just like the old one but with the needed change. Persistence is a term used to describe the property wherein the old version of the collection is still available after the 'change', and that the collection maintains its performance guarantees for most operations. Specifically, this means that the new version can't be created using a full copy, since that would require linear time. Inevitably, persistent collections are implemented using linked data structures, so that the new versions can share structure with the prior version. Singly-linked lists and trees are the basic functional data structures, to which Clojure adds a hash map, set and vector both based upon array mapped hash tries.
(emphasis mine)
So basically it looks like you're mostly correct, at least as far as Clojure is concerned.