Scheme: Detecting duplicate elements in a list

Scheme: Detecting duplicate elements in a list - list

Does R6RS or Chez Scheme v7.9.4 have a library function to check if a list contains duplicate elements?
Alternatively, do either have any built in functionality for sets (which dis-allow duplicate elements)? So far, I've only been able to find an example here.
The problem with that is that it doesn't appear to actually be part of the Chez Scheme library. Although I could write my own version of this, I'd much rather use a well known, tested, and maintained library function - especially given how basic an operation this is.
So a simple "use these built-in functions" or a "no built-in library implements this" will suffice. Thanks!

SRFI 1 on list processing has a delete-duplicates function (so you could use that and check the length afterward) and may well have other functions you might find useful.

Kyle,
Awhile back I needed to use a few SRFIs with Chez Scheme. A few that a converted for use with Chez Scheme (including SRFI-1) are at:
http://github.com/dharmatech/chez-srfi
After you add the path to 'chez-srfi' to your CHEZSCHEMELIBDIRS, you can import SRFI-1:
(import (srfi :1))
Ed

Related

What is the name of the data structure for and-or-lists (or and-or-trees) and where can I read about it?

I recently needed to make a data structure which was a nested list of and/or questions. Since most every interesting thing has been discovered by someone else previously, I’m looking for the name of this data structure. it looks something like this.
‘((a b c) (b d e) (c (a b) (f a)))
The interpretation is I want to find abc or bde or caf or caa or cbf or cba and the list encapsulates that. At the top level each item is or’ed together and sub-lists of the top level are and’ed together and sub-lists of sub-lists are or’ed again sub-lists of those are and’ed and sub-lists of those or’ed ad infinitum. Note that in my example, all the lists are the same length, in my real application the lists vary in length.
The code to walk such a “tree” is relatively simple, but I’m assuming that there is a name for that type of tree and there is stuff I can read about it.
These lists are equivalent to fixed length regular expressions (which I've seen referred to as "network expressions", but I am particularly interested in this data structure and representation thereof.

In general (in the very high level of abstraction) it is:
Context free grammar -Wiki
If you allow it to be infinitely nested, then it is not a regular expression because of presence of parentheses (left and right should match).
If you consider, that expressions inside parentheses are ordered. I mean that a and b and c is equivalent to (a and b) and c. You get then Binary expression tree -Wiki
But for your particular case, it is probably: Disjunctive normal form -Wiki
I am not sure, but my intuition says that it is regular expression again because you have only 2 levels of nesting (1st - for 'or-ed' and 2nd - for 'and-ed' parts)

The trees are also a subset of DAWGS - directed acyclic word graphs and one could construct them the same way.
In my case, I have a very small set that I have built by hand and I don't worry about getting the minimal set, but instead just want something that I can easily write down but deals with the types of simple variations I see. Basically, I have different ways of finding where I keep my .el files based upon the different directory structures of various OSes I use. (E.g. when I was working at Google, the /usr/local/emacs/site-lisp directory was actually more like /usr/local/Google/emacs/site-lisp.)
I don't need a full regex, but there are about a dozen variations, some having quite long lists of nested sub-directories (c:\users\cfclark\appData\roaming\emacs.emacs.d or some other awful thing) that I wanted to write down (and then have emacs make an automated search to find the one that is appropriate to this particular installation). And every time I go to a new job, I can simply add to the list a description of where they are in that setup.
Anyway, as that code has evolved, I found that I had I was doing (nested or's and and's and realized that the structure generalized to the alternating or/and/or/and/... case). So, my assumption is that someone must have discovered this before. I had hints of it myself several years ago, but didn't set down to implement it. The Disjunctive Normal Form link mpasko256 gave is also particularly relevant. I don't normalize to that level, I still keep nested and's and or's rather than flattening to 2, but I do have a distinct structure, or's at the top, then and's, then or's....

Use doxygen to document and compare includes and their purpose

I want to use doxygen to document my code in high detail. One thing I want to document, is the "purpose" of single instances of includes of other .c/.h files. And with any chance to find out, if the documented "usage" of the includes meets reality.
Let's say I have an Include #include <some_lib.h>, with a big set of functions but only want or be allowed to use one or two of them, I want to document which I actually use. I would like to say something like:
/**
* Used elements:
* MY_MACRO()
* myFunction()
* myGlobal_Var()
*/
#include <some_lib.h>
Then I want Doxygen to produce some sort of documentation for that file using this information. Preferably a list containing every included file and the the promised items that are used from it.
It would be even better if there is a possibility to get Doxygen to produce a list of the actually used elements. But there are for sure tools out there to do this job.
A Reviewer or a script comparing those 2 results could then easily check if other items then the promised ones used in the implementation (and if the used ones are allowed). At the same time Doxygen will deliver the documentation of the items.
Do you know about a way to achieve this? And/Or have I failed to recognize a Doxygen built-in feature to bring me close to that goal?

OCaml - issues connected with using fold__

I have a several questions connected with fold_left/right.
How to accumulate two or more values? If using a tuple is good solution ?
How to abort work of fold ? For example we must find firstly occurrence of any number, and return position. I mean a break (C++).

zurgl's comment is a great answer (maybe move it down to the answer region).
You can use an exception to terminate a fold early. It's good to try to structure your code so you don't have to do this (in my opinion). Code without exceptions is easier to understand, more composable, parallelizable, etc.

we must find firstly occurrence of any number then you must traverse all your list [1;1;1;1;5], no need to abort here. To deal with partial recursion there are other function like dropwhile, takewhile ... Or you can define the one you need. See folding as projection between type, you have a source type and a target type with a seed value (the accumulator), then folding is a procedure which realize this transformation. Yes using tuple is good solution, IMO.

How would you idiomatically extend arithmetric functions for other datatypes in Clojure?

So I want to use java.awt.Color for something, and I'd like to be able to write code like this:
(use 'java.awt.Color)
(= Color/BLUE (- Color/WHITE Color/RED Color/GREEN))
Looking at the core implementation of -, it talks specifically about clojure.lang.Numbers, which to me implies that there is nothing I do to 'hook' into the core implementation and extend it.
Looking around on the Internet, there seems to be two different things people do:
Write their own defn - function, which only knows about the data type they're interested in. To use you'd probably end up prefixing a namespace, so something like:
(= Color/BLUE (scdf.color/- Color/WHITE Color/RED Color/GREEN))
Or alternatively useing the namespace and use clojure.core/- when you want number math.
Code a special case into your - implementation that passes through to clojure.core/- when your implementation is passed a Number.
Unfortunately, I don't like either of these. The first is probably the cleanest, as the second makes the presumption that the only things you care about doing maths on is their new datatype and numbers.
I'm new to Clojure, but shouldn't we be able to use Protocols or Multimethods here, so that when people create / use custom types they can 'extend' these functions so they work seemlessly? Is there a reason that +,- etc doesn't support this? (or do they? They don't seem to from my reading of the code, but maybe I'm reading it wrong).
If I want to write my own extensions to common existing functions such as + for other datatypes, how should I do it so it plays nicely with existing functions and potentially other datatypes?

It wasn't exactly designed for this, but core.matrix might be of interest to you here, for a few reasons:
The source code provides examples of how to use protocols to define operations that work with with various different types. For example, (+ [1 2] [3 4]) => [4 6]). It's worth studying how this is done: basically the operators are regular functions that call a protocol, and each data type provides an implementation of the protocol via extend-protocol
You might be interested in making java.awt.Color work as a core.matrix implementation (i.e. as a 4D RGBA vector). I did something simiilar with BufferedImage here: https://github.com/clojure-numerics/image-matrix. If you implement the basic core.matrix protocols, then you will get the whole core.matrix API to work with Color objects. Which will save you a lot of work implementing different operations.

The probable reason for not making arithmetic operation in core based on protocols (and making them only work of numbers) is performance. A protocol implementation require an additional lookup for choosing the correct implementation of the desired function. Although from design point of view it may feel nice to have protocol based implementations and extend them whenever required, but when you have a tight loop that does these operations many times (and this is very common use case with arithmetic operations) you will start feeling the performance issues because of the additional lookup on each operation that happen at runtime.
If you have separate implementation for your own data types (ex: color/-) in their own namespace then it will be more performant due to a direct call to that function and it also make things more explicit and customizable for specific cases.
Another issue with these functions will be their variadic nature (i.e they can take any number of arguments). This is a serious issue in providing a protocol implementation as protocol extended type check only works on first parameter.

You can have a look at algo.generic.arithmetic in algo.generic. It uses multimethods.

Is everything a list in scheme?

Along with the book "Simply Scheme" (Second Edition) i'm watching the "Computer Science 61A - Lectures" on youtube. On the lectures , the tutor uses Stk interpreter, but i'm using chicken scheme interpreter.
In the first lecture he uses the "first" procedure which if it's called like :
(first 'hello)
it returns "h".
On the book of "Simply Scheme" it has an example of how first can be implemented:
(define (first sent)
(car sent))
Which to my testing and understanding works if sent is a list .
I'm trying to understand if it's proper to say that "everything is a list" in scheme.
To be more specific where's the list in 'hello and if there is one, why it doesn't work in first procedure as it's written in the book?
Also if every implementation is written with "everything is a list" in mind why the same code does not work in all scheme implementations?

No, this is a common misconception because lists are so pervasive in Scheme programming (and often functional programming in general). Most Scheme implementations come with many data types like strings, symbols, vectors, maps/tables, records, sets, bytevectors, and so on.
This code snippet (first 'hello) is unlikely to work in most Schemes because it is not valid according to the standard. The expression 'hello denotes a symbol, which is an opaque value that can't be deconstructed as a list (the main thing you do with symbols is compare them with eq?). This is probably a quirk of Stk that is unfortunately taught by your book.
See The Scheme Programming Language for a more canonical description of the language. If you just want to learn programming, I recommend HtDP.

Not everything is a list in Scheme. I'm a bit surprised that the example you're showing actually works, in other Scheme interpreters it will fail, as first is usually an alias for car, and car is defined only for cons pairs. For example, in Racket:
(first 'hello)
> first: expected argument of type <non-empty list>; given 'hello
(car 'hello)
> car: expects argument of type <pair>; given 'hello
Scheme's basic data structure is the cons pair, with it it's possible to build arbitrarily linked data structures - in particular, singly-linked lists. There are other data structures supported, like vectors and hash tables. And of course there are primitive types - booleans, symbols, numbers, strings, chars, etc. So it's erroneous to state that "everything is a list" in Scheme.

With respect to Simply Scheme: the functions first and rest are not the standard ones from the Scheme standard, nor ones that come built-into DrRacket. The Simple Scheme API is designed as part of the Simply Scheme curriculum to make it easy to work uniformly on a variety of data. We can't make too many assumptions on how the underlying, low-level implementation works from just the experience of using the Simply Scheme teaching language! There's a runtime cost involved with making things that simple: it does not come for free.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js