Usual terminology for naming a list of lists

I frequently have to deal with lists of lists of my item type in my scripts.
Most of the time, I refer to a list of my item type as items, using the plural to indicate a collection. Following this convention, I would consequently name a list of lists of my item type list_items.
However, I might also have to deal with a list of list_items.
So I was wondering whether there is any established term or name for the concept "list of lists of" (or even "sequence of sequences of" or "generator of generators of").
I first thought of 2D array, but that is not appropriate, since the lists may not all have the same length.
Any ideas?

#include <list>
using std::list;
struct Value {};  // placeholder item type
Value value;
list<Value> values;
typedef list<Value> ValueList;
ValueList valueList;
list<ValueList> valueLists;
I would not go any further. Usually you should not name your data types based on their content. The name should try to express the semantics.
Sure, you have a list of values, but what does it mean? Is it a list of features? Then you should call it that. What does the list of feature lists represent? Is it the ground-truth data of your classifier? I hope you get the point.

How to create a function that returns a list that is the union of the elements of a nested list in OCaml?

If I were given a set of lists within a list in OCaml, for example [[3;1;3]; [4]; [1;2;3]], how could I implement a function that returns a list that is the union of the values of the nested lists (so the example would return [1;2;3;4])? I tried removing duplicates from the list, but it didn't work as intended. I am also restricted to using the List module only.
Restricted to using the List module only? An arbitrary limit like that sounds like homework, so I don't want to give a fully working solution. However, if you look through the List documentation, you'll see a couple of functions that can be combined to do what you want:
concat, which takes a list of lists and flattens it into a single list, and sort_uniq, which sorts a list and removes duplicates.
So you just have to take your list of lists, turn it into a single list, and sort_uniq that (with an appropriate comparison function) to get your desired result.

how does a language *know* when a list is sorted?

Forgive me if this question is dumb, but it occurred to me I don't know how a language knows a list is sorted or not.
Say I have a list:
["Apple","Apricot","Blueberry","Cardamom","Cumin"]
and I want to insert "Cinnamon".
AFAIK The language I'm using doesn't know the list is sorted; it's just a list. And it doesn't have a "wide screen" field of view like we do, so it doesn't know where the A-chunk ends and the C-chunk begins from outside the list. So it goes through and compares the first letter of each array string to the first letter of the insert string. If the insert char is greater, it moves to the next string. If the chars match, it moves to the next letter. If it moves on to the next string and the array's char is greater than the insert's char, then the char is inserted there.
My question is, can a language KNOW when a list is sorted?
If the process for combing through an unsorted and a sorted list is the same, and the list is still iterated through, then how does sorting save time?
EDIT:
I understand that "sorting allows algorithms that rely on sorting to work"; I apologize for not making that clear. I guess I'm asking if there's anything intrinsic about sorting inside computer languages, or if it's a strategy that people built on top of it. I think it's the latter and you guys have confirmed it. A language doesn't know if it's sorting something or not, but we recognize the performance difference.
Here's the key. The language doesn't / can't / shouldn't know whether your data structure is sorted or unsorted. In fact it doesn't even care what data structure it really is.
Now consider this: What does insertion or deletion really mean? What exact steps need to be taken to insert a new item or delete an existing one? It turns out that the exact meaning of these operations depends upon the data structure that you're using. An array will insert a new element very differently than a linked list.
So it stands to reason that these operations must be defined in the context of the data structure to which they are being applied. The language in general does not supply any keywords to deal with these data structures. Rather, the accompanying libraries provide built-in implementations of these structures that contain methods to perform these operations.
Now to the original question: How does the language "know" if a list is sorted or not, and why is it more efficient to deal with sorted lists? The answer is, as evident from what I said above, that the language doesn't and shouldn't know about the internals of a list. It is the implementation of the list structure you're using that knows whether it is sorted and how to insert a new node in an ordered manner. For example, a certain data structure may use an index (much like the index of a book) to locate the position of the words starting with a certain letter, avoiding the element-by-element traversal of the entire list that an unsorted list would require.
Hope this makes it a bit clearer.
Languages don't know such things.
Some programming languages come with a standard library containing data structures, and if they don't, you generally can link to libraries that do.
Some of those data structures may be collection types that maintain order.
So given you have a data structure that represents an ordered collection type, then it is that data structure that maintains the order of the collection, whenever an item is added or removed.
If you're using the most basic collection type, an array, then neither the language nor the runtime nor the data structure itself (the array) cares in the slightest at what point you insert an item.
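To make the contrast concrete, here is a minimal C++ sketch (the function names and sample data are made up for illustration). It assumes std::set as the "ordered collection type" and std::vector as the basic sequence; other languages and libraries offer comparable pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Ordered collection: std::set decides where the new item goes.
// The caller never specifies a position.
std::vector<std::string> add_to_ordered_set(const std::string& item) {
    std::set<std::string> spices{"Cumin", "Apple", "Cardamom"};
    spices.insert(item);
    // The container, not the language, guarantees this invariant.
    assert(std::is_sorted(spices.begin(), spices.end()));
    return std::vector<std::string>(spices.begin(), spices.end());
}

// Basic sequence: the item lands exactly where the caller puts it
// (here, at the end), and no ordering is maintained.
std::vector<std::string> add_to_vector(const std::string& item) {
    std::vector<std::string> spices{"Cumin", "Apple", "Cardamom"};
    spices.push_back(item);
    return spices;
}
```

Calling both with "Cinnamon" gives a sorted sequence from the first function and the original order with "Cinnamon" appended from the second.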
can a language KNOW when a list is sorted
Do you mean a language interpreter? Of course it can check whether a list is sorted, simply by checking that each element is "larger" than the previous one. I doubt that interpreters do this; why should they care if the list is sorted or not?
In general, if you want to insert "Cinnamon" into your list, you need to either specify where to insert it, or just append it at the end. It doesn't matter to the interpreter if the list is sorted beforehand or not. It's how you use the list that determines whether a sorted list will remain sorted, and whether or not it needs to be sorted to begin with. (For example, if you try to find something in the list using a binary search, then the list must be sorted. But you must arrange for this to be the case.)
AFAIK The language I'm using ...
(which is?)
... doesn't know the list is sorted; it's just a list. And it doesn't have a "wide screen" field of view like we do, so it doesn't know where the A-chunk ends and the C-chunk begins from outside the list. So it goes through and compares the first letter of each array string to the first letter of the insert string. If the insert char is greater, it moves to the next string. If the chars match, it moves to the next letter. If it moves on to the next string and the array's char is greater than the insert's char, then the char is inserted there.
What you're saying, I think, is that it looks for the first element that is "bigger than" the one being inserted, and inserts the new element just before it. That implies that it maintains the "sorted" property of the list, if it is already sorted. This is horribly inefficient for the case of unsorted lists. Also, the technique you describe for finding the insertion point (linear search) would be inefficient, if that is truly what is happening. I suspect that your understanding of the list/language semantics is not correct.
It would help a lot if you gave a concrete example in a specific language.
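As one such concrete illustration, here is a hedged C++ sketch (the asker's language is unknown, and the function names are invented for this example). It shows the two ideas discussed above: checking sortedness is an explicit pass over the data, and a list known to be sorted lets you find the insertion point by binary search instead of a linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Explicit check: one pass over the list, comparing neighbours.
bool is_sorted_list(const std::vector<std::string>& items) {
    return std::is_sorted(items.begin(), items.end());
}

// Inserts `item` so that an already sorted list stays sorted.
// Keeping the list sorted is the caller's responsibility.
void insert_sorted(std::vector<std::string>& sorted, const std::string& item) {
    assert(is_sorted_list(sorted));  // debug-only precondition check
    // std::lower_bound is a binary search: O(log n) comparisons
    // rather than the O(n) of a front-to-back scan.
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), item);
    sorted.insert(pos, item);
}
```

With the list from the question, insert_sorted places "Cinnamon" between "Cardamom" and "Cumin". Note that inserting into a std::vector still shifts the later elements, so only the search step is logarithmic here.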

Why does Elixir have so many similar list types in the standard library?

I'm doing the Elixir koans, and already I've worked through something like five different listy data types:
List
Char list
Word list
Tuple
Keyword list
Map
MapSet
Struct
Some of these I buy, but all of them at the same time? Does anyone actually use all of these lists for strictly separated purposes?
Short answer is: yes.
Long answer is:
Lists - are a basic data structure you use everywhere. Lists are ordered and allow duplicates. The main use case is: homogeneous variable-length collections;
Charlists - where Elixir uses strings (based on binaries), Erlang usually uses charlists (lists of integer codepoints). It's mainly a compatibility interface;
Word lists - I've never heard of those;
Tuples - are another basic data structure you use everywhere. The main use case is: heterogeneous fixed-length collections;
Keyword lists - are very common, mainly used for options. They are a simple abstraction on top of lists and tuples (a list of two-element tuples). They allow duplicate keys and maintain the order of keys; because they are ordered, pattern matching on them is very impractical.
Maps - are common too. They allow for easy pattern matching on keys, but do not allow duplicate keys and are not ordered.
MapSet - sets are a basic data structure: an unordered collection of unique elements.
Structs - are the main mechanism for polymorphism in Elixir (through protocols) and allow creating more rigid structures, with the key set enforced at compile time.
With functional programming, choosing the right data structure to represent your data is often half of the problem. That's why you get so many different structures with different characteristics. Each one has its use cases and is useful in different ways.
@michalmuskala provided a great answer here; maybe I can just extend it a bit.
Lists are the workhorse in Elixir. There are plenty of problems that you will solve with lists. Lists are not arrays, where random access is the best way to get values; instead, lists in Elixir are linked data structures, and you traverse them by splitting them into head and tail (if you know LISP, Prolog or Erlang, you will feel right at home).
Charlists are just lists, but narrowed to lists of integers.
Tuples - usually they contain two to four elements. They are a common way to pass additional data while still sending one parameter. Common behaviours such as GenServer use them as the expected reply.
Keyword lists are lists of tuples, and you can use them when you need to store more than one value for one key. This is syntactic sugar.
Instead of a = [{:name, "Patryk"}] you can have a = [name: "Patryk"] and access it with a[:name].
Maps are associative arrays, hashes, dicts etc. One key holds one value and keys are unique.
Set - think of mathematical sets: an unordered collection of unique values.
Struct - as @michalmuskala wrote, they are used in protocols and they are checked by the compiler. Actually, they're maps defined for a module.
The answers are to be read from the bottom to the top :)
@michalmuskala provided a great answer here, and @patnowak extended it perfectly. I am here mostly to answer the question "Does anyone actually use all of these lists for strictly separated purposes?"
Elixir (as well as Erlang) is all about pattern matching. Having different types of lists makes it easy to narrow the pattern matching in each particular case:
List is used mostly in recursion; Erlang has no loops, instead one does recursive calls. It’s highly optimized when used properly (tail-recursion.) Usually matches as [head | tail].
charlist is used in "string" pattern matching, whatever it means. A check for "the first letter of his name is 'A'" in Erlang would be done with a pattern match such as [?A | rest] = "Aleksei" |> List.Chars.to_charlist
Tuple is used in pattern matching of different instances of the more-or-less same entity. Fail/Success would be returned as tuples {:ok, result} and {:error, message} respectively and pattern matched afterwards. GenServer simplifies handling of different messages that way as well.
Map is to be pattern matched as %{name: "Aleksei"} = generic_input to immediately extract the name. Keywords are more or less the same.
etc.

Hierarchical filtered lookup in C++

I have been pondering a data structure problem for a while, but can't seem to come up with a good solution. I cannot shake off the feeling that the solution is simple and I'm just not seeing it, however, so hopefully you guys can help!
Here is the problem: I have a large collection of objects in memory. Each of them has a number of data fields. Some of the data fields, such as an ID, are unique to each object, but others, such as a name, can appear in multiple objects.
class Object {
size_t id;
std::string name;
Histogram histogram;
Type type;
...
};
I need to organize these objects in a way that will allow me to quickly (even if the number of objects is relatively large, i.e. millions) filter the collection given a specification of an arbitrary number of object members while all members that are left unspecified count as wildcards. For example, if I specify a given name, I want to retrieve all the objects whose name member equals the given name. However, if I then add a histogram to the query, I would like the query to return only the objects that match in both the name and the histogram fields, and so on. So, for example, I'd like a function
std::set<Object*> retrieve(size_t, std::string, Histogram, Type)
that can both do
retrieve(42, WILDCARD, WILDCARD, WILDCARD)
as well as
retrieve(42, WILDCARD, WILDCARD, Type_foo)
where the second call would return at most as many objects as the first one. Which data structure allows queries like this and can be both constructed and queried in reasonable time for object counts in the millions?
Thanks for the help!
First, you could use Boost.MultiIndex to implement efficient lookup over different members of your Object. This could help limit the number of elements to consider. As a second step, you can simply use a lambda expression as the predicate for std::find_if to get the first matching element, or use std::copy_if to copy all matching elements to a target sequence. If you decide to use Boost, you can also use Boost.Range with filtering.
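The second step could look roughly like the following sketch, which uses only the standard library (C++17). Everything here is an assumption made for illustration: the Query struct, the use of an empty std::optional as the wildcard, the reduced Object with public members, and the omission of the Histogram field. It does not show the index step; in practice `candidates` would be the smaller set returned by an index lookup.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

enum class Type { foo, bar };  // hypothetical stand-in for the question's Type

struct Object {
    std::size_t id;
    std::string name;
    Type type;
};

// An empty optional plays the role of WILDCARD for that field.
struct Query {
    std::optional<std::size_t> id;
    std::optional<std::string> name;
    std::optional<Type> type;
};

// True when every field specified in the query equals the object's field.
bool matches(const Object& object, const Query& query) {
    return (!query.id || *query.id == object.id) &&
           (!query.name || *query.name == object.name) &&
           (!query.type || *query.type == object.type);
}

// Linear filter with std::copy_if and a lambda predicate.
std::vector<Object> retrieve(const std::vector<Object>& candidates,
                             const Query& query) {
    std::vector<Object> result;
    std::copy_if(candidates.begin(), candidates.end(),
                 std::back_inserter(result),
                 [&query](const Object& object) { return matches(object, query); });
    return result;
}
```

Adding a field to the query can only shrink the result, which matches the behaviour asked for in the question. For millions of objects, this filter alone is a full scan per query, so it is best applied after an index has narrowed the candidates.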

c++ last element of a structure field

I get a structure, and I don't know the size of it (every time it's different). I would like to set the last place in one of the fields of this structure to a certain value. In pseudocode, I mean something like this:
structureA.fieldB[end] = cert_value;
That is how I would do it in MATLAB, but I cannot find the proper syntax in C++. Can you help me?
In Matlab, a structure data type holds key-value pairs where the "value" may be of different types. In C++, there are some key-value containers available (associative containers like set, map, multimap), but they usually store elements of a single type. What you need if I understood it right is something like
"one" : 1
"two" : [1,2,5]
"three" : "name"
Which means that your structure resembles a Python dictionary.
In C++, the only way I have heard of using containers with truly different types is by using boost::any, which is accepted as the answer to this question.
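If you pack a sequence container with elements of different types, then you can use the container's back() member function to get the last element. Note that end() itself refers to the position one past the last element, so it must not be dereferenced; std::prev(container.end()) is the iterator to the last element.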
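You need sizeof, which gives you the size of the array in bytes. Since you want the index of the last element, divide this number by the number of bytes in one element and subtract one, because indices start at zero. Note that this only works when fieldB is a true array whose size is known at compile time, not a pointer to dynamically allocated memory. You end up with:
int index_end = sizeof(structureA.fieldB) / sizeof(structureA.fieldB[0]) - 1;
structureA.fieldB[index_end] = new_value;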
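If the size really differs every time, the field is more likely a dynamically sized container than a fixed array. Here is a minimal sketch, assuming fieldB can be a std::vector<int> (the question does not show the structure's definition, so StructureA and set_last are hypothetical):

```cpp
#include <cassert>
#include <vector>

// Hypothetical structure: fieldB is a std::vector, so its length
// can differ from one instance to the next.
struct StructureA {
    std::vector<int> fieldB;
};

// Rough equivalent of structureA.fieldB[end] = cert_value.
void set_last(StructureA& structureA, int cert_value) {
    assert(!structureA.fieldB.empty());  // back() on an empty vector is undefined
    structureA.fieldB.back() = cert_value;
}
```

back() returns a reference to the last element, so no size arithmetic is needed.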