How do you make an array of lists in Clojure?

So I'm working on a compiler lexer, and I am defining the transition table with
(make-array rows)
where rows is a list of list of lists.
However, I'm running into memory issues creating a tall nested list of 800 * 127 * '() rows and then converting it back to an array.
Is there a way to create an empty 2D array and then dynamically set its cells to lists? The lists in the cells would not all be the same size.

If you don't actually need to initialize each value to clojure.lang.PersistentList$EmptyList (aka '()), this can be as simple as:
(make-array clojure.lang.PersistentList 800 127)
...that said, I don't particularly recommend it. Is there a reason you can't use a vector of vectors?
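A minimal sketch of both options, assuming cells may start out as nil and are filled in later. The element type Object, the names table, empty-table and table-v, and the sample cell contents are illustrative assumptions, not part of the original question.

```clojure
;; Option 1: a Java 2D array. Every cell starts out as nil.
;; Object is used as the element type here (an assumption) so that any
;; list-like value can be stored in a cell.
(def table (make-array Object 800 127))

(aset table 3 65 '(:ident :keyword))   ; set row 3, column 65
(aget table 3 65)                      ; read the cell back

;; Option 2: a persistent vector of vectors.
;; The same inner row vector is shared by all 800 rows until a row is
;; "changed", so the empty table is cheap to build.
(def empty-table (vec (repeat 800 (vec (repeat 127 nil)))))

(def table-v
  (-> empty-table
      (assoc-in [3 65] '(:ident))
      (update-in [3 65] conj :keyword)))   ; conj on a list adds at the front

(get-in table-v [3 65])
```

The array is mutated in place, while assoc-in and update-in return a new table and leave empty-table untouched.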

Related

Using Filter on vectors

I am trying to use the filter function over a vector called dataset that is defined like so:
AK,0.89,0.98
AR,0.49,0.23
AN,0.21,0.78
...
And I want to get all of the values that contain a certain string, something like this:
(filter (contains "AK") dataset)
Which would return:
AK,0.89,0.98
Is it possible to do this using the filter function?
I already iterate over the vector using doseq but I'm required to use filter at some point in my code.
Thanks :)
The basic answer is yes, you can use filter to do this. Filter expects a
predicate function i.e. a function which returns true or false. The filter
function will iterate over the elements in the collection you pass in and pass
each element from that collection to the predicate. What you do inside the
predicate function is totally up to you (though you should make sure to avoid
side effects). Filter will collect all the elements where the predicate returned
true into a new lazy sequence.
Essentially, you have (long form)
(filter (fn [element]
          ;; some test returning true/false
          )
        col)
where col is your collection. The result will be a LAZY SEQUENCE of elements
where the predicate function returned true. It is important to understand that
things like filter and map return lazy sequences and know what that really means.
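As a small illustration of what laziness means in practice, the sketch below filters an infinite sequence; only the elements that are actually consumed get computed. The name evens is an illustrative assumption.

```clojure
;; (range) with no arguments is an infinite sequence 0, 1, 2, ...
;; filter does not try to walk all of it; it returns a lazy sequence.
(def evens (filter even? (range)))

;; Only as much of the input as is needed for two results is examined.
(take 2 evens)

;; The result of filter is a lazy sequence even for a small vector.
(instance? clojure.lang.LazySeq (filter even? [1 2 3 4]))
```

If a realized collection is needed, wrap the call in vec or use filterv, which returns a vector.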
The critical bit to understand is the structure of your collection. In your
description, you stated
I am trying to use the filter function over a vector called dataset
that is defined like so:
AK,0.89,0.98 AR,0.49,0.23 AN,0.21,0.78 ...
Unfortunately, your description is a little ambiguous. If your dataset structure
is actually a vector of vectors (not simply a vector), then things are very
straight-forward. This is because it will mean that each 'element' passed to the
predicate function will be one of your 'inner' vectors. The real definition is
more accurately represented as
[
 ["AK" 0.89 0.98]
 ["AR" 0.49 0.23]
 ["AN" 0.21 0.78]
 ...
]
In that case, what will be passed to the predicate is a vector of 3 elements. If you just want
to select all the vectors where the first element is 'AK', then the predicate
function could be as simple as
(fn [el]
  (if (= "AK" (first el))
    true
    false))
So the full line would be something like
(filter (fn [el]
          (if (= "AK" (first el))
            true
            false))
        [["AK" 0.89 0.98] ["AR" 0.49 0.23] ["AN" 0.21 0.78]])
That is only a starting point, and a very verbose one. There is a lot you can do to
make this shorter, e.g.
(filter #(= "AK" (first %)) [..])
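Put together as a runnable sketch, assuming the dataset really is a vector of vectors whose first element is a string code. The helper name first-is is hypothetical and only there to make the predicate reusable.

```clojure
(def dataset [["AK" 0.89 0.98]
              ["AR" 0.49 0.23]
              ["AN" 0.21 0.78]])

;; Hypothetical helper: builds a predicate that checks the first
;; element of a row against the given code.
(defn first-is [code]
  (fn [row] (= code (first row))))

;; Lazy sequence containing the matching rows.
(filter (first-is "AK") dataset)

;; Same test with the short anonymous-function syntax, realized as a vector.
(filterv #(= "AK" (first %)) dataset)
```

Note that the if in the verbose version is redundant, because = already returns true or false.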
If on the other hand, you really do just have a single vector, then things
become a little more complicated because you would need to somehow group the
values. This could be done by using the partition function to break up your
vector into groups of 3 items before passing them to filter e.g.
(filter pred (partition 3 col))
which would group the elements in your original vector into groups of 3 and pass
each group to the predicate function. This is where the real power of map,
filter, reduce, etc come into play - you can transform the data, passing it
through a pipeline of functions, each of which somehow manipulates the data and
a final result pops out the end.
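A sketch of that pipeline for the flat case, assuming the vector holds the values in strict groups of three. The name flat and the choice to extract the second value at the end are illustrative.

```clojure
(def flat ["AK" 0.89 0.98 "AR" 0.49 0.23 "AN" 0.21 0.78])

;; partition groups the flat values into sequences of 3.
;; A trailing group with fewer than 3 items would be dropped.
(partition 3 flat)

;; Group, keep the groups whose first element is "AK",
;; then pull out the second value of each surviving group.
(->> flat
     (partition 3)
     (filter #(= "AK" (first %)))
     (map second))
```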
The key point is to understand what filter (and other functions like this, such
as map or reduce) will understand as an 'element' in your input
collection. Basically, this is the same as what would be returned by 'first'
called on the collection. This is what is passed to the predicate function in
filter.
There are a lot of assumptions here. One of the main ones is that your data is
strictly ordered i.e. the value you are looking to test is always the first
element in each group. If this is not the case, then more work will need to be
done. Likewise, we assume the data is always in groups of 3. If it isn't, then other approaches will be needed.

Why are Clojure vectors used to pass key-value pairs?

As a newcomer to Clojure, the distinction between a vector (array-like) and a map (key-value pairs) initially seemed clear to me.
However, in a lot of situations (such as the "let" special form and functions with keyword arguments) a vector is used to pass key-value pairs.
The source code for let even includes a check to ensure that the vector contains an even number of elements.
I really don't understand why vectors are used instead of maps. When I read about the collection types, I would expect maps to be the preferred way to store any information in key-value format.
Can anyone explain to me why vectors also seem to be the preferred tool to express pairs of keys and values?
The wonderful people at the Clojure IRC channel explained to me the primary reason: maps (hashes) are not ordered.
For example, the let form allows back-references which could break if the order of the arguments is not stable:
(let [a 1 b (inc a)] (+ a b))
Ordered maps are not used for two reasons: they have no convenient literal, and vanilla Clojure has no ordered map except one that is ordered by sorting its keys (which would be weird).
Thus, the need to keep arguments in order trumps the fact that they are key-value pairs.
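The ordering argument can be seen in a small sketch. The binding names and values are illustrative; sorted-map is the key-sorted map mentioned above.

```clojure
;; Bindings are evaluated left to right, so b can refer to a.
(let [a 1
      b (inc a)]
  (+ a b))

;; The same name can even be bound twice in sequence, which a map
;; (with unique keys) could not express.
(let [a 1
      a (inc a)]
  a)

;; A sorted map orders entries by key, not by the order they were written.
(keys (sorted-map :b 2 :a 1))
```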

Flipping elements in Clojure

If I input a vector and wanted to flip the elements' order, I'd write
(reverse [1 2 3])
Now, how would I generalize this idea to be able to input nested vectors and flip the order of elements? Given a matrix A the function should reverse the elements in each column.
Based on the example you gave in the comments, this is exactly what reverse does given a collection that contains collections. You want reverse.
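A short sketch of why this works, assuming the matrix is stored as a vector of row vectors. Reversing the outer vector reverses the order of the rows, which is the same as reversing the elements within each column. The names m, flip-columns and flip-rows are hypothetical.

```clojure
(def m [[1 2]
        [3 4]
        [5 6]])

;; Reverse the order of the rows: each column is read bottom to top.
(defn flip-columns [matrix]
  (vec (reverse matrix)))

;; For contrast: reverse the elements inside each row instead.
(defn flip-rows [matrix]
  (mapv #(vec (reverse %)) matrix))

(flip-columns m)   ; column 1 2 ... is now read from the last row up
(flip-rows m)
```

For vectors, rseq gives the same order as reverse in constant time, which may matter for large matrices.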

Complexity of lists in Haskell in Data.Map

Sorry if this seems like an obvious question.
I was creating a Data.Map of lists (actually a tuple of an integer and a list, (Integer, [(Integer, Integer)])) to implement a priority queue plus adjacency list for some graph algorithms such as Dijkstra's and Prim's.
Data.Map is implemented using binary trees (so I have read), so I just want to confirm that when doing the map operations (I believe they will be rotations) the interpreter does not make deep copies of the lists, only shallow copies of the references to the lists. Is that right?
I am doing this to implement Prim's algorithm in Haskell so that it runs in O(n log n + m log n) time, where n = number of vertices and m = number of edges, in a purely functional way.
If the lists are stored in the priority queue, the algorithm will work in that time. Most Haskell implementations I found online don't achieve this complexity.
Thanks in advance.
You are correct that the lists will not be copied every time you create a new Map, at least if you're using GHC (other implementations probably do this correctly as well). This is one of the advantages of a purely functional language: because data can't be rewritten, data structures don't need to be copied to avoid problems you might have in an imperative language. Consider this snippet of Lisp:
(setf a (list 1 2 3 4 5))
(setf b a)
; a and b are now both '(1 2 3 4 5).
(setf (cadr a) 0)
; a is now '(1 0 3 4 5).
; Careful though! a and b point to the same piece of memory,
; so b is now also '(1 0 3 4 5). This may or may not be what you expected.
In Haskell, the only way to have mutable state like this is to use an explicitly mutable data structure, such as something in the State monad (and even this is sort of faking it, which is a good thing). This potentially unexpected aliasing problem cannot arise in Haskell, because once you declare that a is a particular list, it is that list now and forever. Because it is guaranteed never to change, there is no danger in sharing memory between things that are supposed to be equal, and in fact GHC will do exactly this. Therefore, when you make a new Map with the same values, only pointers to the values will be copied, not the values themselves.
For more information, read about the difference between Boxed and Unboxed types.
1) Integer is slower than Int.
2) If you have [(Integer, [(Integer, Integer)])], you could build with Data.Map not only Map Integer [(Integer, Integer)], but also Map Integer (Map Integer Integer).
3) If you use Int instead of Integer, you could use a slightly faster data structure, IntMap from Data.IntMap: IntMap (IntMap Int).
4) The complexity of each function is given in its documentation, in both Data.IntMap.Strict and Data.IntMap.Lazy, for example:
map :: (a -> b) -> IntMap a -> IntMap b
O(n). Map a function over all values in the map.

What are the differences between vector and list data types in R?

What are the main differences between vector and list data types in R? What are the advantages or disadvantages of using (or not) these two data types?
I would appreciate seeing examples that demonstrate the use cases of the data types.
Technically lists are vectors, although very few would use that term. "list" is one of several modes, with others being "logical", "character", "numeric", "integer". What you are calling vectors are "atomic vectors" in strict R parlance:
aaa <- vector("list", 3)
is.list(aaa) #TRUE
is.vector(aaa) #TRUE
Lists are a "recursive" type (of vector) whereas atomic vectors are not:
is.recursive(aaa) # TRUE
is.atomic(aaa) # FALSE
You process data objects with different functions depending on whether they are recursive, atomic or have dimensional attributes (matrices and arrays). However, I'm not sure that a discussion of the "advantages and disadvantages" of different data structures is a sufficiently focused question for SO. To add to what Tommy said, besides lists being capable of holding an arbitrary number of other vectors there is the availability of dataframes which are a particular type of list that has a dimensional attribute which defines its structure. Unlike matrices and arrays which are really folded atomic objects, dataframes can hold varying types including factor types.
There's also the caveat that the is.vector function will return FALSE when there are attributes other than names. See: what is vector?
Lists are "recursive". This means that they can contain values of different types, even other lists:
x <- list(values=sin(1:3), ids=letters[1:3], sub=list(foo=42,bar=13))
x # print the list
x$values # Get one element
x[["ids"]] # Another way to get an element
x$sub$foo # Get sub elements
x[[c(3,2)]] # Another way (gets 13)
str(x) # A "summary" of the list's content
Lists are used in R to represent data sets: the data.frame class is essentially a list where each element is a column of a specific type.
Another use is when representing a model: the result from lm returns a list that contains a bunch of useful objects.
d <- data.frame(a=11:13, b=21:23)
is.list(d) # TRUE
str(d)
m <- lm(a ~ b, data=d)
is.list(m) # TRUE
str(m)
Atomic vectors (non-list like, but numeric, logical and character) are useful since all elements are known to have the same type. This makes manipulating them very fast.
As someone who's just gotten into R, but comes from a C/Java/Ruby/PHP/Python background, here's how I think of it.
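A list is really an array + a hashmap. It's a PHP associative array. Note that single brackets return a sub-list, while $ and [[ ]] return the element itself.
> foo = list(bar='baz')
> foo[1]
$bar
[1] "baz"

> foo$bar
[1] "baz"
> foo[['bar']]
[1] "baz"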
A vector is a fixed-type array/list. Think of it like a linked list - because putting dissimilar items into a linked list is an anti-pattern anyways. It's a vector in the same sense that SIMD/MMX/vector units use the word.
This and similar introductory questions are answered in http://www.burns-stat.com/pages/Tutor/hints_R_begin.html
It is meant to be a gentle introduction that gets you up and running with R as quickly as possible. To some extent it succeeds.
--- Edit: --
An attempt to explain further; quoted from the above reference.
Atomic vector
There are three varieties of atomic vector that you are likely to
encounter:
“numeric”
“logical”
“character”
The thing to remember about atomic vectors is that all of the elements
in them are only of one type.
List
Lists can have different types of items in different components. A
component of a list is allowed to be another list, an atomic vector
(and other things).
Please also refer to this link.
A list can include multiple data types, such as character, numeric, logical, etc., but a vector contains only a single type of data.
For example:
scores <- c(20,30,40,50)
student <- c("A","B","C","D")
sc_log <- c(TRUE,FALSE,FALSE,TRUE)
For a list:
mylist <- list(scores,student,sc_log)
# check the structure of mylist using the str() function
str(mylist)
List of 3
 $ : num [1:4] 20 30 40 50
 $ : chr [1:4] "A" "B" "C" "D"
 $ : logi [1:4] TRUE FALSE FALSE TRUE
This means that mylist contains multiple data types: numeric, character and logical. In a vector, by contrast, all elements have a single data type.
For example, for a vector:
vector1 <- c(1,2,3,4)
class(vector1)
[1] "numeric"
# which means that all elements of the vector have a single data type, numeric