I want to define a constraint that updates a parameter when an assignment occurs. The constraint is as follows.
Here d is a Param and Z is a decision variable, defined as:
model.d = Param(model.V, mutable=True)
model.Z = Var(model.Vs, model.Vc2, within = Binary)
I have tried:
def Cons24_rule(model, i):
    return model.d[i] == sum(model.d[j] * model.Z[i, j] for j in model.Vc2)

model.Cons24 = Constraint(model.Vs, rule=Cons24_rule)
but I get an infeasibility error. How can I define this constraint?
Pyomo code and test data can be found here.
Thanks - Soheil
Your instance is infeasible. Your constraint says:
d[i] = sum {j in V_c2} d[j] * Z[i,j]
for all i. This means the amount shipped out of i must exactly equal its d, and the amount shipped must fully equal the d of the destinations. But for example, d[9] = 6, and there are no other nodes j such that sum {j} d[j] = 6. So, there is no way to satisfy this constraint, i.e., no way to ship exactly 6 units out of node 9.
I suspect that the real problem is in the logic of your constraint formulation, not in your data. I don't think you want to assume that if i ships to j, then it must ship all of d[j]. Either that, or you don't want to require the total shipped out of i to equal d[i] exactly.
Is it possible to create a predicate max/2 without an accumulator so that max(List, Max) is true if and only if Max is the maximum value of List (a list of integers)?
Yes, you can calculate the maximum after the recursive step. Like:
max([M],M).          % the maximum of a list with one element is that element.
max([H|T],M) :-
    max(T,M1),       % first calculate the maximum of the tail,
    M is max(H,M1).  % then calculate the real maximum as the max of
                     % the head and the maximum of the tail.
This predicate will work on floating-point numbers, for instance. Nevertheless, it is better to use an accumulator, since most Prolog interpreters perform tail-call optimization (TCO) and predicates with accumulators tend to be tail-recursive. As a result, predicates that benefit from TCO usually do not raise a stack overflow when processing huge lists.
As @Lurker says, is/2 only works if the list is fully grounded: it must be a finite list with all elements ground. You can, however, use Prolog's constraint logic programming library clp(fd):
:- use_module(library(clpfd)).
max([M],M).          % the maximum of a list with one element is that element.
max([H|T],M) :-
    max(T,M1),       % first calculate the maximum of the tail,
    M #= max(H,M1).  % then calculate the real maximum as the max of
                     % the head and the maximum of the tail.
You can then for instance call:
?- max([A,B,C],M),A=2,B=3,C=1.
A = 2,
B = M, M = 3,
C = 1 ;
false.
So after the max/2 call, by grounding A, B and C, we obtain M=3.
Neither the standard predicate is/2 nor the CLP(FD) predicate (#=)/2 can do exact mathematics here, so in the end, for certain applications such as exact geometry, they might not be suitable.
To make the point, let's consider an example and the alternative of a computer algebra system (CAS). I am doing the demonstration with the new Jekejeke Minlog 0.9.2 prototype, which provides a CAS from within Prolog.
As a preliminary, we have two predicates, eval_keys/2 and min_key/2; their code is found in the appendix of this post. Let's illustrate what these predicates do, first with integers. The first predicate simply evaluates the keys of a pair list:
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.25-3-gc3a87c2)
Copyright (c) 1990-2016 University of Amsterdam, VU Amsterdam
?- eval_keys([(1+3)-foo,2-bar],L).
L = [4-foo,2-bar]
The second predicate picks the first value whose key is minimal:
?- min_key([4-foo,2-bar],X).
X = bar
Now let's look at other key values; we will use square roots, which belong to the domain of algebraic numbers. These square roots are irrational and thus never fit exactly into a float. For the new example we therefore get the outcome foo:
?- eval_keys([(sqrt(98428513)+sqrt(101596577))-foo,sqrt(400025090)-bar],L).
L = [20000.62724016424-foo, 20000.627240164245-bar].
?- min_key([20000.62724016424-foo, 20000.627240164245-bar], X).
X = foo.
CLP(FD) has only integers, and there is no direct way to represent algebraic numbers. On the other hand, many CAS systems support radicals. Our prototype even supports comparing them, so that we can obtain the exact result bar:
Jekejeke Prolog 2, Runtime Library 1.2.2
(c) 1985-2017, XLOG Technologies GmbH, Switzerland
?- eval_keys([(sqrt(98428513)+sqrt(101596577))-foo,sqrt(400025090)-bar],L).
L = [radical(0,[98428513-1,101596577-1])-foo,
radical(0,[400025090-1])-bar]
?- min_key([radical(0,[98428513-1,101596577-1])-foo,
radical(0,[400025090-1])-bar],X).
X = bar
That bar is the exact result can be seen, for example, by using a multi-precision calculator. If we double the precision, we indeed see that the last square root is the smaller one, and not the sum of the square roots:
?- use_module(library(decimal/multi)).
% 7 consults and 0 unloads in 319 ms.
Yes
?- X is mp(sqrt(98428513)+sqrt(101596577), 32).
X = 0d20000.627240164244658958331341095
?- X is mp(sqrt(400025090), 32).
X = 0d20000.627240164244408966171597261
But a CAS need not proceed this way. For example, our Prolog implementation uses a method inspired by Swinnerton-Dyer polynomials to compare radical expressions, which works purely symbolically.
Appendix Test Code:
% :- use_module(library(groebner/generic)). /* to enable CAS */
% eval_keys(+Pairs, -Evaluated): evaluate the key of each Key-Value pair.
eval_keys([X-A|L], [Y-A|R]) :- Y is X, eval_keys(L, R).
eval_keys([], []).

% min_key(+Pairs, -Value): pick the value whose key is minimal.
min_key([X-A|L], B) :- min_key(L, X, A, B).
min_key([X-A|L], Y, _, B) :- X < Y, !, min_key(L, X, A, B).
min_key([_|L], X, A, B) :- min_key(L, X, A, B).
min_key([], _, A, A).
I'm currently writing my own AES/RSA encryption program in C++ for Unix. I've been going through the literature for about a week now, and I've started to wrap my head around it all, but I'm still left with some pressing questions:
1) Based on my understanding, an RSA key in its most basic form is the combination of the product of the two primes used (R) and the exponents. It's obvious to me that storing the key in such a form in plaintext would defeat the purpose of encrypting anything at all. Therefore, in what form can I store my generated public and private keys? Ask the user for a password and do some "simple" shift/replacing on the individual digits of the key with an ASCII table? Or is there some other standard I haven't run across? Also, when the keys are generated, are R and the respective exponent simply stored sequentially? i.e. ##primeproduct####exponent##? In that case, how would a decryption algorithm parse the key into the two separate values?
2) How would I go about programmatically generating the private exponent, given that I've decided to use 65537 as my public exponent for all encryptions? I've got the equation P*Q = 1 mod(M), where P and Q are the exponents and M is the result of Euler's totient function. Is this simply a matter of generating random numbers and testing their relative primality to the public exponent until you hit pay dirt? I know you can't simply start from 1 and increment until you find such a number, as anyone could simply do the same thing and get your private exponent themselves.
3) When generating the character equivalence set, I understand that the numbers used in the set must be less than and relatively prime to P*Q. Again, this is a matter of testing relative primality of numbers to P*Q. Is the speed of testing relative primality independent of the size of the numbers you're working with? Or are special algorithms necessary?
Thanks in advance to anyone who takes the time to read and answer, cheers!
There are some standard formats for storing/exchanging RSA keys such as RFC 3447. For better or worse, most (many, anyway) use ASN.1 encoding, which adds more complexity than most people like, all by itself. A few use Base64 encoding, which is a lot easier to implement.
As far as what constitutes a key goes: in its most basic form, you're correct; the public key includes the modulus (usually called n) and an exponent (usually called e).
To compute a key pair, you start from two large prime numbers, usually called p and q. You compute the modulus n as p * q. You also compute a number (often called r) that's (p-1) * (q-1).
e is then a more or less randomly chosen number that's relatively prime to r. Warning: you don't want e to be really small though -- log(e) >= log(n)/4 as a bare minimum.
You then compute d (the private decryption key) as a number satisfying the relation:
d * e = 1 (mod r)
You typically compute this using Euclid's algorithm, though there are other options (see below). Again, you don't want d to be really small either, so if it works out to a really small number, you probably want to try another value for e, and compute a new d to match.
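Purely as an illustration (not part of the original answer), here is a minimal sketch of that computation using the extended Euclidean algorithm; the function name mod_inverse and the fixed-width types are assumptions for the example, and it only works while the values fit in 64-bit integers:

#include <cstdint>

// Minimal sketch: extended Euclidean algorithm, returning x with
// (a * x) % m == 1, or 0 if no inverse exists (gcd(a, m) != 1).
// Assumes the inputs fit comfortably in signed 64-bit integers.
int64_t mod_inverse(int64_t a, int64_t m) {
    int64_t old_r = a, r = m;
    int64_t old_x = 1, x = 0;
    while (r != 0) {
        int64_t q = old_r / r;
        int64_t tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_x - q * x; old_x = x; x = tmp;
    }
    if (old_r != 1) return 0;          // no inverse: a and m are not coprime
    return ((old_x % m) + m) % m;      // normalize the result into [0, m)
}

// e.g. num d = mod_inverse(e, r);  // d then satisfies d * e = 1 (mod r)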
There is another way to compute your e and d. You can start by finding some number K that's congruent to 1 mod r, then factor it. Put the prime factors together to get two factors of roughly equal size, and use them as e and d.
As far as an attacker computing your d goes: you need r to compute this, and knowing r depends on knowing p and q. That's exactly why/where/how factoring comes into breaking RSA. If you factor n, then you know p and q. From them, you can find r, and from r you can compute the d that matches a known e.
So, let's work through the math to create a key pair. We're going to use primes that are much too small to be effective, but should be sufficient to demonstrate the ideas involved.
So let's start by picking a p and q (of course, both need to be primes):
p = 9999991
q = 11999989
From those we compute n and r:
n = 119999782000099
r = 119999760000120
Next we need to either pick e or else compute K, then factor it to get e and d. For the moment, we'll go with your suggestion of e = 65537 (since 65537 is prime, the only way for it and r not to be relatively prime would be if r were an exact multiple of 65537, which we can verify quite easily is not the case).
From that, we need to compute our d. We can do that fairly easily (though not necessarily very quickly) using the "Extended" version of Euclid's algorithm, (as you mentioned) Euler's Totient, Gauss' method, or any of a number of others.
For the moment, I'll compute it using Gauss' method:
// Greatest common divisor by Euclid's algorithm.
template <class num>
num gcd(num a, num b) {
    num r;
    while (b > 0) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Modular inverse of a mod p, computed with Gauss' method as described above.
template <class num>
num find_inverse(num a, num p) {
    num g, z;
    if (gcd(a, p) > 1) return 0;   // no inverse if a and p share a factor
    z = 1;
    while (a > 1) {
        z += p;
        if ((g = gcd(a, z)) > 1) {
            a /= g;
            z /= g;
        }
    }
    return z;
}
The result we get is:
d = 38110914516113
Then we can plug these into an implementation of RSA, and use them to encrypt and decrypt a message.
So, let's encrypt "Very Secret Message!". Using the e and n given above, that encrypts to:
74603288122996
49544151279887
83011912841578
96347106356362
20256165166509
66272049143842
49544151279887
22863535059597
83011912841578
49544151279887
96446347654908
20256165166509
87232607087245
49544151279887
68304272579690
68304272579690
87665372487589
26633960965444
49544151279887
15733234551614
And, using the d given above, that decrypts back to the original. Code to do the encryption/decryption (using hard-coded keys and modulus) looks like this:
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
#include <functional>
#include <string>

typedef unsigned long long num;

const num e_key = 65537;
const num d_key = 38110914516113;
const num n     = 119999782000099;

// Multiply a * b mod m without overflow, by shift-and-add.
template <class T>
T mul_mod(T a, T b, T m) {
    if (m == 0) return a * b;
    T r = T();
    while (a > 0) {
        if (a & 1)
            if ((r += b) >= m) r %= m;   // >= so a sum equal to m reduces to 0
        a >>= 1;
        if ((b <<= 1) >= m) b %= m;
    }
    return r;
}

// Modular exponentiation: a^n mod m by square-and-multiply.
template <class T>
T pow_mod(T a, T n, T m) {
    T r = 1;
    while (n > 0) {
        if (n & 1)
            r = mul_mod(r, a, m);
        a = mul_mod(a, a, m);
        n >>= 1;
    }
    return r;
}

int main() {
    std::string msg = "Very Secret Message!";
    std::vector<num> encrypted;

    std::cout << "Original message: " << msg << '\n';

    // Encrypt one byte at a time: c = m^e mod n.
    std::transform(msg.begin(), msg.end(),
                   std::back_inserter(encrypted),
                   [&](num val) { return pow_mod(val, e_key, n); });

    std::cout << "Encrypted message:\n";
    std::copy(encrypted.begin(), encrypted.end(),
              std::ostream_iterator<num>(std::cout, "\n"));
    std::cout << "\n";

    // Decrypt: m = c^d mod n.
    std::cout << "Decrypted message: ";
    std::transform(encrypted.begin(), encrypted.end(),
                   std::ostream_iterator<char>(std::cout, ""),
                   [](num val) { return pow_mod(val, d_key, n); });
    std::cout << "\n";
}
To have even a hope of security, you need to use a much larger modulus though--hundreds of bits at the very least (and perhaps a thousand or more for the paranoid). You could do that with a normal arbitrary precision integer library, or routines written specifically for the task at hand. RSA is inherently fairly slow, so at one time most implementations used code with lots of hairy optimization to do the job. Nowadays, hardware is fast enough that you can probably get away with a fairly average large-integer library fairly easily (especially since in real use, you only want to use RSA to encrypt/decrypt a key for a symmetrical algorithm, not to encrypt the raw data).
Even with a modulus of suitable size (and the code modified to support the large numbers needed), this is still what's sometimes referred to as "textbook RSA", and it's not really suitable for much in the way of real encryption. For example, right now, it's encrypting one byte of the input at a time. This leaves noticeable patterns in the encrypted data. It's trivial to look at the encrypted data above and see that the second and seventh words are identical--because both are the encrypted form of e (which also occurs a couple of other places in the message).
As it stands right now, this can be attacked as a simple substitution code. e is the most common letter in English, so we can (correctly) guess that the most common word in the encrypted data represents e (and relative frequencies of letters in various languages are well known). Worse, we can also look at things like pairs and triplets of letters to improve the attack. For example, if we see the same word twice in succession in the encrypted data, we know we're seeing a double letter, which can only be a few letters in normal English text. Bottom line: even though RSA itself can be quite strong, the way of using it shown above definitely is not.
To prevent that problem, with a (say) 512-bit key, we'd also process the input in 512-bit chunks. That means we only have a repetition if there are two places in the original input that are entirely identical for 512 bits at a time. Even if that happens, it's relatively difficult to guess what that repeated block represents, so although it's undesirable, it's not nearly as vulnerable as the byte-by-byte version shown above. In addition, you always want to pad the input to a multiple of the size being encrypted.
Reference
https://crypto.stackexchange.com/questions/1448/definition-of-textbook-rsa
I am working on a binary linear program problem.
I am not really familiar with any computer language (I have just learned Java and C++ for a few months), but I may have to use a computer anyway since the problem is quite complicated.
The first step is to declare variables m_ij for every entry in a matrix M (at least 8 x 8).
Then I assign corresponding values of each element of a matrix to each of these variables.
The next step is to generate other sets of variables, x_ij1, x_ij2, x_ij3, x_ij4, and x_ij5, whenever the value of m_ij is not 0.
The value of x_ijk variable is either 0 or 1, and I do not have to assign values for x_ijk variables.
Probably the simplest way to do it is to declare and assign a value to each variable, e.g.
int m_11 = 5, m_12 = 2, m_13 = 0, ... m_1n = 1;
int m_21 = 3, m_22 = 1, m_23 = 2, ... m_2n = 3;
and then pick the variables whose value is not 0 and declare x_ij1 ~ x_ij5 accordingly.
But this might be too much work, especially since I am going to consider many different matrices for this problem.
Is there any way to do this automatically?
I know a little bit of Java and C++, and I am considering using lp_solve package in C++(to solve binary integer linear program problem), but I am willing to use any other language or program if I could do this easily.
I am sure there must be some way to do this (probably using loops, I guess?), and this is a very simple task, but I just don't know how to do it because I do not have much programming experience.
One of my cohort wrote a program for generating a random matrix satisfying some condition we need, so if I could use that matrix as my input, it might be ideal, but just any way to do this would be okay as of now.
Say, if there is a way to do it with MS excel, like putting matrix entries to the cells in an excel file, and import it to C++ and automatically generate variables and assign values to them, then this would simplify the task by a great deal!
Matlab indeed seems very suitable for the task. Though the example offered by @Dr_Sam will indeed create the matrices on the fly, I would recommend that you initialize them before you assign the values. This way your code still ends up with the right variable if something with the same name already existed in the workspace, and your variable will always have the expected size.
Assuming you want to define a square 8x8 matrix:
m = zeros(8)
Now in general, if you want to initialize a three-dimensional matrix of size imax, jmax, kmax:
imax = 8;
jmax = 8;
kmax = 5;
x = zeros(imax,jmax,kmax);
Now assigning to or reading from these matrices is very easy; note that the length and width of m have been chosen the same as the first two dimensions of x:
m(3,4) = 4;        % Assign a value
myvalue = m(3,4)   % Read the value
m(:,1) = 1:8;      % Assign the values 1 through 8 to the first column
x(2,4,5) = 12;     % Assign a single value to the three-dimensional matrix
x(:,:,2) = m+1;    % Assign the entire matrix plus one to one of the planes in x
In C++ you could use a std::vector of vectors, like
std::vector<std::vector<int>> matrix;
You don't need to use separate variables for the matrix values, why would you when you have the matrix?
I don't understand why you need to keep all the values where your condition evaluates to true or false. Instead, just put the coordinates where your condition evaluates to true directly into a std::vector:
std::vector<std::pair<int, int>> true_values;

for (std::size_t i = 0; i < matrix.size(); i++)
{
    for (std::size_t j = 0; j < matrix[i].size(); j++)
    {
        if (some_condition_for_this_matrix_value(matrix[i][j], i, j) == true)
            true_values.emplace_back(i, j);
    }
}
Now you have a vector of all matrix coordinates where your condition is true.
If you really want to have both true and false values, you could use a std::map with a std::pair containing the matrix coordinates as the key and a bool as the value:
// Create a type alias, as this type will be used multiple times
typedef std::map<std::pair<int, int>, bool> bool_map_type;
bool_map_type bool_map;
Insert into this map all values from the matrix, with the coordinates of the matrix as the key, and the map value as true or false depending on whatever condition you have.
To keep only the true entries in bool_map, you can erase the false ones (note that std::remove_if does not work on associative containers such as std::map, so erase in a loop instead):
for (bool_map_type::iterator it = bool_map.begin(); it != bool_map.end(); )
{
    if (it->second == false)
        it = bool_map.erase(it);
    else
        ++it;
}
Now you have a map containing only entries whose value is true. Iterate over this map to get the coordinates into the matrix.
Of course, I may totally have misunderstood your problem, in which case you of course are free to disregard this answer. :)
I know both C++ and Matlab (not Python) and in your case, I would really go for Matlab because it's way easier to use when you start programming (but don't forget to come back to C++ when you will find the limitations to Matlab).
In Matlab, you can define matrices very easily: just type the name of the matrix and the index you want to set:
m(1,1) = 1
m(2,2) = 1
gives you a 2x2 identity matrix (indices start with 1 in Matlab and entries are 0 by default). You can also define 3d matrices the same way:
x(1,2,3) = 2
The import from Excel is possible: if you save your Excel file in CSV format, you can use the function dlmread to read it in Matlab. You could also try later to implement your algorithm directly in Matlab.
Finally, if you want to solve your binary integer program, there is already a built-in function in Matlab, called bintprog, which can solve it for you.
Hope it helps!
I have several huge arrays (millions++ members). All those are arrays of numbers and they are not sorted (and I cannot do that). Some are uint8_t, some uint16_t/32/64. I would like to approximate the count of distinct values in these arrays. The conditions are following:
speed is VERY important, I need to do this in one pass through the array and I must go through it sequentially (cannot jump back and forth) (I am doing this in C++, if that's important)
I don't need EXACT counts. What I want to know is that if it is an uint32_t array if there are like 10 or 20 distinct numbers or if there are thousands or millions.
I have quite a bit of memory that I can use, but the less is used the better
the smaller the array data type, the more accurate I need to be
I don't mind STL, but if I can do it without it, that would be great (no BOOST though, sorry)
if the approach can be easily parallelized, that would be cool (but it's not a mandatory condition)
Examples of perfect output:
ArrayA [uint32_t, 3M members]: ~128 distinct values
ArrayB [uint32_t, 9M members]: 100000+ distinct values
ArrayC [uint8_t, 50K members]: 2-5 distinct values
ArrayD [uint8_t, 700K members]: 64+ distinct values
I understand that some of the constraints may seem illogical, but that's the way it is.
As a side note, I also want the top X (3 or 10) most used and least used values, but that is far easier to do and I can do it on my own. However if someone has thoughts for that too, feel free to share them!
EDIT: a bit of clarification regarding STL. If you have a solution using it, please post it. Not using STL would just be a bonus for us; we don't fancy it too much. However, if it is a good solution, it will be used!
For 8- and 16-bit values, you can just make a table of the count of each value; every time you write to a table entry that was previously zero, that's a different value found.
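As a rough sketch of the table idea for small value types (not from the original answer; the function name and types are illustrative):

#include <cstdint>
#include <cstddef>
#include <vector>

// Count distinct values in a uint16_t array with a simple occurrence table.
// A value is "new" the first time its table entry goes from 0 to 1.
std::size_t count_distinct_u16(const uint16_t* data, std::size_t n) {
    std::vector<uint32_t> counts(1u << 16, 0);
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (counts[data[i]]++ == 0)
            ++distinct;
    }
    return distinct;
}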
For larger values, if you are not interested in counts above 100000, std::map is suitable, if it's fast enough. If that's too slow for you, you could program your own B-tree.
I'm pretty sure you can do it by:
Create a Bloom filter
Run through the array inserting each element into the filter (this is a "slow" O(n), since it requires computing several independent decent hashes of each value)
Count how many bits are set in the Bloom Filter
Compute back from the density of the filter to an estimate of the number of distinct values. I don't know the calculation off the top of my head, but any treatment of the theory of Bloom filters goes into this, because it's vital to the probability of the filter giving a false positive on a lookup.
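For reference (this estimator is not given in the original answer): with m filter bits, k hash functions, and X bits currently set, a commonly cited estimate of the number of distinct inserted values is n ~ -(m/k) * ln(1 - X/m). A tiny helper might look like:

#include <cmath>

// Estimate the number of distinct values inserted into a Bloom filter
// with m bits and k hash functions, given that X bits are currently set.
double bloom_distinct_estimate(double m, double k, double X) {
    return -(m / k) * std::log(1.0 - X / m);
}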
Presumably if you're simultaneously computing the top 10 most frequent values, then if there are less than 10 distinct values you'll know exactly what they are and you don't need an estimate.
I believe the "most frequently used" problem is difficult (well, memory-consuming). Suppose for a moment that you only want the top 1 most frequently used value. Suppose further that you have 10 million entries in the array, and that after the first 9.9 million of them, none of the numbers you've seen so far has appeared more than 100k times. Then any of the values you've seen so far might be the most-frequently used value, since any of them could have a run of 100k values at the end. Even worse, any two of them could have a run of 50k each at the end, in which case the count from the first 9.9 million entries is the tie-breaker between them. So in order to work out in a single pass which is the most frequently used, I think you need to know the exact count of each value that appears in the 9.9 million. You have to prepare for that freak case of a near-tie between two values in the last 0.1 million, because if it happens you aren't allowed to rewind and check the two relevant values again. Eventually you can start culling values -- if there's a value with a count of 5000 and only 4000 entries left to check, then you can cull anything with a count of 1000 or less. But that doesn't help very much.
So I might have missed something, but I think that in the worst case, the "most frequently used" problem requires you to maintain a count for every value you have seen, right up until nearly the end of the array. So you might as well use that collection of counts to work out how many distinct values there are.
One approach that can work, even for big values, is to spread them into lazily allocated buckets.
Suppose that you are working with 32 bits integers, creating an array of 2**32 bits is relatively impractical (2**29 bytes, hum). However, we can probably assume that 2**16 pointers is still reasonable (2**19 bytes: 500kB), so we create 2**16 buckets (null pointers).
The big idea therefore is to take a "sparse" approach to counting, and hope that the integers won't be too dispersed, and thus that many of the bucket pointers will remain null.
#include <cstdint>
#include <vector>
#include <utility>
#include <algorithm>

typedef std::pair<int32_t, int32_t> Pair;   // (value, count)
typedef std::vector<Pair> Bucket;
typedef std::vector<Bucket*> Vector;

struct Comparator {
    bool operator()(Pair const& left, Pair const& right) const {
        return left.first < right.first;
    }
};

// Insert `value` into the sorted bucket, or bump its count if already present.
void add(Bucket& v, int32_t value) {
    Pair const pair(value, 1);
    Bucket::iterator it = std::lower_bound(v.begin(), v.end(), pair, Comparator());
    if (it == v.end() || it->first > value) {
        v.insert(it, pair);
        return;
    }
    it->second += 1;
}

// Walk the array once, dispatching each value to a lazily allocated bucket
// selected by its high 16 bits.
void gather(Vector& v, int32_t const* begin, int32_t const* end) {
    for (; begin != end; ++begin) {
        uint16_t const index = static_cast<uint32_t>(*begin) >> 16;
        Bucket*& bucket = v[index];
        if (bucket == 0) { bucket = new Bucket(); }
        add(*bucket, *begin);
    }
}
Once you have gathered your data, then you can count the number of different values or find the top or bottom pretty easily.
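To make the flow concrete, here is a hypothetical usage sketch built on the definitions above; `data` and `size` are placeholders for one of your arrays, and the distinct count is just the sum of the bucket sizes:

// Hypothetical usage: `data` points at `size` int32_t values.
Vector buckets(1 << 16);               // 2**16 lazily allocated buckets (null)
gather(buckets, data, data + size);

std::size_t distinct = 0;
for (Vector::size_type i = 0; i < buckets.size(); ++i)
    if (buckets[i]) distinct += buckets[i]->size();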
A few notes:
the number of buckets is completely customizable (thus letting you control the amount of original memory)
the partitioning strategy is customizable as well (this is just a cheap hash table I have made here)
it is possible to monitor the number of allocated buckets and abandon, or switch gear, if it starts blowing up.
if each value is different, then it just won't work, but when you realize it, you will already have collected many counts, so you'll at least be able to give a lower bound on the number of different values, and you'll also have a starting point for the top/bottom.
If you manage to gather those statistics, then the work is cut out for you.
For 8 and 16 bit it's pretty obvious, you can track every possibility every iteration.
When you get to 32 and 64 bit integers, you don't really have the memory to track every possibility.
Here are a few natural suggestions that are likely outside the bounds of your constraints.
I don't really understand why you can't sort the array. Radix sort is O(n), and once sorted it would take one more pass to get accurate distinctness and top-X information. In reality it would be 6 passes altogether for 32-bit values if you used a 1-byte radix (1 pass for counting + 4 passes, one per byte + 1 pass for getting the values).
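As a rough illustration of the "sort, then one more pass" idea (using std::sort here rather than a radix sort, purely for brevity; the function name is illustrative):

#include <algorithm>
#include <cstdint>
#include <cstddef>
#include <vector>

// Once the data is sorted, the number of distinct values is the number of
// positions where a value differs from its predecessor, plus one.
std::size_t count_distinct_sorted(std::vector<uint32_t> v) {
    if (v.empty()) return 0;
    std::sort(v.begin(), v.end());
    std::size_t distinct = 1;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] != v[i - 1]) ++distinct;
    return distinct;
}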
Along the same lines as above, why not just use SQL. You could create a stored procedure that takes the array in as a table-valued parameter and returns the number of distinct values and the top x values in one go. This stored procedure could also be called in parallel.
-- number of distinct
SELECT COUNT(DISTINCT(n)) FROM #tmp
-- top x
SELECT TOP 10 n, COUNT(n) FROM #tmp GROUP BY n ORDER BY COUNT(n) DESC
I've just thought of an interesting solution. It's based on a law of Boolean algebra called idempotence of multiplication, which states that:
X * X = X
From it, and using the commutative property of Boolean multiplication, we can deduce that:
X * Y * X = X * X * Y = X * Y
Now, do you see where I'm going? This is how the algorithm would work (I'm terrible with pseudo-code):
make c = element1 & element2   (binary AND between the binary representations of the integers)
for i = 3 until i == size_of_array
    make b = c & element[i];
    if b != c then different_values++;
    c = b;
In the first iteration, we make (element1 * element2) * element3. We could represent it as:
(X * Y) * Z
If Z (element3) is equal to X (element1), then:
(X * Y) * Z = X * Y * X = X * Y
And if Z is equal to Y (element2), then:
(X * Y) * Z = X * Y * Y = X * Y
So, if Z isn't different from X or Y, then X * Y won't change when we multiply it by Z.
This remains valid for big expressions, like:
(X * A * Z * G * T * P * S) * S = X * A * Z * G * T * P * S
If we receive a value which is a factor of our big multiplicand (that means it has already been computed), then the big multiplicand won't change when we multiply it by the received input, so there's no new distinct value.
So that's how it will go. Each time a different value is encountered, the multiplication of our big multiplicand and that distinct value will be different from the big operand. So, with b = c & element[i], if b != c we just increment our distinct values counter.
I guess I'm not being clear enough. If that's the case, please let me know.