Best way to store multi variable polynomials in Lisp - list

I need to store polynomials in my Lisp program for adding, subtracting and multiplying, but I cannot find an easy way of storing one.
I've considered the following way:
(2x^3 + 2x + 4y^3 - 2z) in a list of lists, where each inner list holds the coefficient of each power of one variable
= ( (0 2 0 2) (0 0 0 4) (0 -2) )
But the uncertain lengths of each list and potential length could become a problem.
Is there a generally accepted way to store them in lisp which could make it as easy as possible to add, subtract and multiply them together?

Assuming you know the number of possible variables beforehand, you could express each term like this: (constant x-exponent y-exponent z-exponent ...). Then 5xy^2 would be (5 1 2 0), and a full expression would just be a list of those terms.
If you want to be able to handle any number of arbitrary variables, you would need an association list along the lines of ((constant 5) (a 0) (b 3) (z 23) (apple 13)).
Either way, if you start with individual terms, it's easy to build more complex expressions and this way you don't need to mess with multiple dimensions.
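A minimal sketch of the term-based idea, written in Python rather than Lisp purely for illustration. It assumes a fixed variable order (x, y, z) and swaps the plain list of terms for a dictionary keyed by the exponent tuple, so that like terms merge automatically. The names poly_add, poly_sub and poly_mul are hypothetical.

```python
from collections import defaultdict

# A polynomial is a dict mapping an exponent tuple (x, y, z) to a coefficient.
# The variable order is an assumption of this sketch.

def poly_add(p, q):
    """Add two polynomials, merging like terms and dropping zero coefficients."""
    result = defaultdict(int)
    for poly in (p, q):
        for exps, coeff in poly.items():
            result[exps] += coeff
    return {e: c for e, c in result.items() if c != 0}

def poly_sub(p, q):
    """Subtract q from p by adding the negation of q."""
    return poly_add(p, {e: -c for e, c in q.items()})

def poly_mul(p, q):
    """Multiply term by term: coefficients multiply, exponents add."""
    result = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            exps = tuple(a + b for a, b in zip(e1, e2))
            result[exps] += c1 * c2
    return {e: c for e, c in result.items() if c != 0}

# 2x^3 + 2x + 4y^3 - 2z
p = {(3, 0, 0): 2, (1, 0, 0): 2, (0, 3, 0): 4, (0, 0, 1): -2}
# 5xy^2
q = {(1, 2, 0): 5}
product = poly_mul(p, q)
```

Because every term carries its full exponent tuple, the operations never need to know how many terms or which powers are present.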

Maybe this idea will help you partly. You can represent a polynomial as a vector in which the index is the power and the element is the coefficient (the variable itself could be stored as an extra first element). For example, 5*x^3 + 10*x^2 + 40x + 50 will look like #(50 40 10 5). Working with such a representation is easy, but it is not very efficient for large powers like x^100.
A multivariable polynomial may be represented as an N-dimensional array, where N is the number of variables.

There are several ways of representing polynomials. As usual the choice of representation is a tradeoff.
One way is an association list from order to coefficient, usually sorted by order.
12x^2 + 11x + 10 ((2 . 12) (1 . 11) (0 . 10))
If you need to compute with sparse polynomials, then this representation is space efficient. x^200 is just ((200 . 1)).
If your calculations consist mostly of non-sparse polynomials, a vector representation is more space efficient:
12x^2 + 11x + 10 (vector 10 11 12)
The length of the vector minus one gives the order of the polynomial.
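To make the tradeoff concrete, here is a small Python sketch (not Lisp, and only an illustration) that converts between the sparse (order, coefficient) pairs and the dense coefficient vector, and multiplies two dense vectors. The function names are made up for this example.

```python
def sparse_to_dense(terms):
    """[(order, coeff), ...] -> [c0, c1, ..., cn], indexed by order."""
    if not terms:
        return []
    dense = [0] * (max(order for order, _ in terms) + 1)
    for order, coeff in terms:
        dense[order] += coeff
    return dense

def dense_to_sparse(coeffs):
    """[c0, c1, ..., cn] -> [(order, coeff), ...], highest order first, zeros dropped."""
    pairs = list(enumerate(coeffs))
    return [(order, c) for order, c in reversed(pairs) if c != 0]

def dense_mul(a, b):
    """Multiply two dense polynomials; the order of the result is the sum of the orders."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

For x^200 the sparse form holds one pair while the dense form holds 201 numbers, which is the space cost mentioned above.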
If you need polynomials of more than one variable, there are variations of these representations. In particular you can look at the representation in Maxima:
http://maxima.sourceforge.net/docs/manual/maxima_14.html
If you happen to have "Paradigms of Artificial Intelligence Programming: Case Studies in Common LISP" by Peter Norvig, there is a nice chapter on polynomials.


finding intersections in a given range?

Assume an array of N (N <= 100000) elements a1, a2, ..., an, and a given range L, R in it, where 1 <= L <= R <= N. You are required to count the values in the given range that are divisible by at least one number from a set S, which is also given; this set can be any subset of {1, 2, ..., 10}. A fast method is needed because there may be many queries Q (Q <= 100000), each with its own range and its own S, so looping over the values each time would be very slow.
I thought of storing, for each number in the big set {1, 2, ..., 10}, the count of values divisible by it in 10 arrays of N elements each, and taking cumulative sums to get the number of values divisible by any specific number in any range in O(1) time. For example, if a query asks for the number of values divisible by at least one of 2, 3, 5, then I add the counts for each of them and then remove the intersections. But I couldn't figure out how to calculate the intersections without 2^10 or 2^9 calculations each time, which would also be very slow (and possibly hugely memory consuming) because it may be done 100000 times. Any ideas?
Your idea is correct. You can use inclusion-exclusion principle and prefix sums to find the answer. There is just one more observation you need to make.
If there's a pair of numbers a and b in the set such that a divides b, we can remove b without changing the answer to the query (indeed, if b | x, then a | x). Thus, we always get a set such that no element divides any other one.
The number of such masks is smaller than 2^10. In fact, it's 102. Here's the code that computes it:
    def good(mask):
        # Bit (b - 1) of mask is set when b is in the set.
        for i in filter(lambda b: mask & (1 << (b - 1)), range(1, 11)):
            # Reject the mask if a smaller member j divides i.
            if any(i % j == 0 for j in filter(lambda b: mask & (1 << (b - 1)), range(1, i))):
                return False
        return True

    # Count the non-empty good masks.
    print(len(list(filter(good, range(1, 2 ** 10)))))
Thus, the preprocessing requires approximately 100N operations and numbers to store (which looks reasonably small).
Moreover, there are at most 5 elements in any "good" mask (this can be checked using the code above). Thus, we can answer each query using around 2^5 operations.
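One possible way to put the pieces together, as a hedged sketch. It assumes 1-based query positions, and it stores one prefix-sum array per divisor of 2520 = lcm(1, ..., 10) (48 arrays, a superset of the lcms that can actually occur) instead of one array per good mask. The function names are hypothetical.

```python
from itertools import combinations
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def reduce_set(s):
    """Drop every b that has another element a of the set dividing it."""
    return sorted(b for b in s if not any(a != b and b % a == 0 for a in s))

def build_prefix(values):
    """prefix[m][i] = how many of values[:i] are divisible by m,
    for every divisor m of 2520 (every lcm of a subset of 1..10 is one)."""
    prefix = {}
    for m in (d for d in range(1, 2521) if 2520 % d == 0):
        row = [0]
        for v in values:
            row.append(row[-1] + (v % m == 0))
        prefix[m] = row
    return prefix

def query(prefix, left, right, s):
    """Count values at 1-based positions left..right divisible by
    at least one element of s, by inclusion-exclusion over the reduced set."""
    reduced = reduce_set(s)
    total = 0
    for k in range(1, len(reduced) + 1):
        for combo in combinations(reduced, k):
            m = 1
            for x in combo:
                m = lcm(m, x)
            count = prefix[m][right] - prefix[m][left - 1]
            # Odd-sized intersections are added, even-sized ones subtracted.
            total += count if k % 2 == 1 else -count
    return total

values = list(range(1, 21))
prefix = build_prefix(values)
answer = query(prefix, 1, 20, {2, 3, 5})
```

Since a reduced set has at most 5 elements, each query visits at most 2^5 - 1 subsets.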

Good way to detect identical expressions in C++

I am writing a program that solves this puzzle game: some numbers and a goal number are given, and you make the goal number using the n numbers and the operators +, -, *, / and (). For example, given 2,3,5,7 and the goal number 10, the solutions are (2+3)*(7-5)=10, 3*5-(7-2)=10, and so on.
The catch is, if I implement it naively, I will get a bunch of identical solutions, like (2+3)*(7-5)=10 and (3+2)*(7-5)=10, and 3*5-(7-2)=10 and 5*3-(7-2)=10 and 3*5-7+2=10 and 3*5+2-7=10 and so on. So I'd like to detect those identical solutions and prune them.
I'm currently using randomly generated double numbers to detect identical solutions. What I'm doing is basically substituting those random numbers into the solutions and checking whether any pair of them calculates to the same number. I have to perform the detection at every node of my search, so it has to be fast, and I use a hashset for it now.
Now the problem is the error that comes with the calculation. Because even identical solutions do not calculate to exactly the same value, I currently round the calculated value to a fixed precision when storing it in the hashset. However, this does not seem to work well enough, and it gives a different number of solutions every time for the same problem. Sometimes the random numbers are bad and prune some completely different solutions. Sometimes the calculated value lies on the edge of the rounding function and it outputs two (or more) identical solutions. Is there a better way to do this?
EDIT:
By "identical" I mean two or more solutions (f(w,x,y,z,...) and g(w,x,y,z,...)) that calculate to the same number whatever the original numbers (w,x,y,z,...) are. For more examples, 4/3*1/2 and 1*4/3/2 and (1/2)/(3/4) are identical, but 4/3/1/2 and 4/(3*1)/2 are not identical to those, because if you change 1 to some other number they will not produce the same result.
It will be easier if you "canonicalize" the expressions before comparing them. One way would be to sort when an operation is commutative, so 3+2 becomes 2+3 whereas 2+3 remains as it was. Of course you will need to establish an ordering for parenthesized groups as well, like 3+(2*1)...does that become (1*2)+3 or 3+(1*2)? What the ordering is doesn't necessarily matter, so long as it is a total ordering.
Generate all possibilities of your expressions. Then:
When you create expressions, put them in a collection of parsed trees (this would also eliminate your parentheses). Then "push down" any division and subtraction into the leaf nodes so that all the non-leaf nodes have * and +. Apply a sorting of the branches (e.g. regular string sort) and then compare the trees to see if they are identical.
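A rough Python sketch of this kind of canonical form (the question is about C++, so treat it as pseudocode that happens to run). It assumes expressions are nested tuples (op, left, right) with numbers as leaves, flattens chains of +/- into signed terms and chains of * and / into factors with power +1 or -1, and sorts them. It does not expand products over sums or normalize signs across factors, so it only catches some of the duplicates.

```python
def terms(e, sign):
    """Flatten nested + and - into a list of (sign, canonical subtree)."""
    if isinstance(e, tuple) and e[0] in ("+", "-"):
        op, a, b = e
        return terms(a, sign) + terms(b, sign if op == "+" else -sign)
    return [(sign, canon(e))]

def factors(e, power):
    """Flatten nested * and / into a list of (power, canonical subtree)."""
    if isinstance(e, tuple) and e[0] in ("*", "/"):
        op, a, b = e
        return factors(a, power) + factors(b, power if op == "*" else -power)
    return [(power, canon(e))]

def canon(e):
    """Return a canonical, order-independent form of the expression tree."""
    if not isinstance(e, tuple):
        return ("leaf", e)
    if e[0] in ("+", "-"):
        # repr gives a total ordering over the mixed nested tuples.
        return ("sum", tuple(sorted(terms(e, 1), key=repr)))
    return ("prod", tuple(sorted(factors(e, 1), key=repr)))

# (2+3)*(7-5) and (7-5)*(3+2) get the same canonical form.
same = canon(("*", ("+", 2, 3), ("-", 7, 5))) == canon(("*", ("-", 7, 5), ("+", 3, 2)))
```

The canonical tuples are hashable, so they can go straight into a hash set in place of the rounded doubles.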
I like the idea of using doubles. The problem is in the rounding. Why not use a container SORTED by the value obtained with one random set of double inputs? When you find the place you would insert in that container, you can look at the immediately preceding and following items. Use a different set of random doubles to recompute each for the more robust comparison. Then you can have a reasonable cutoff for "close enough to be equal" without arbitrary rounding.
If a pair of expressions are close enough to be equal in both the main set of random numbers and the second set, the expressions are safely "same" and the newer one is discarded. If they are close enough to be equal in the main set but not in the new set, you have a rare problem that probably requires rekeying the entire container with a different random number set. If they are not close enough in either, then they are different.
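A sketch of that sorted-container idea in Python, under several assumptions: expressions are plain callables, inputs are drawn from [1, 2) so division is well behaved, only the two immediate neighbors are checked, and the rare "close in the main set only" case is simply treated as different rather than triggering a rekey. The class name Deduper and the tolerance are invented for the example.

```python
import bisect
import math
import random

class Deduper:
    """Keeps expressions sorted by their value on a primary random input set."""

    def __init__(self, n_inputs, rel_tol=1e-9, seed=0):
        rng = random.Random(seed)
        self.primary = [rng.uniform(1.0, 2.0) for _ in range(n_inputs)]
        self.secondary = [rng.uniform(1.0, 2.0) for _ in range(n_inputs)]
        self.rel_tol = rel_tol
        self.keys = []   # primary values, kept sorted
        self.funcs = []  # expressions, parallel to keys

    def add(self, f):
        """Insert f unless a neighbor matches on both input sets. Returns True if kept."""
        key = f(*self.primary)
        pos = bisect.bisect_left(self.keys, key)
        for j in (pos - 1, pos):  # immediately preceding and following items
            if 0 <= j < len(self.keys) and math.isclose(self.keys[j], key, rel_tol=self.rel_tol):
                other = self.funcs[j](*self.secondary)
                if math.isclose(other, f(*self.secondary), rel_tol=self.rel_tol):
                    return False  # equal on both sets: treat as the same expression
        self.keys.insert(pos, key)
        self.funcs.insert(pos, f)
        return True

d = Deduper(4)
first = d.add(lambda w, x, y, z: (w + x) * (z - y))
second = d.add(lambda w, x, y, z: (x + w) * (z - y))
third = d.add(lambda w, x, y, z: w * x - (z - y))
```

Here the second expression is rejected as a duplicate of the first, while the third is kept.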
For the larger n suggested by one of your recent comments, I think you would need the better performance that should be possible from a canonical by construction method (or maybe "almost" canonical by construction) rather than a primarily comparison based approach.
You don't want to construct an incredibly large number of expressions, then canonicalize and compare.
Define a doubly recursive function can(...) that takes as input:
A reference to a canonical expression tree.
A reference to one subexpression of that tree.
A count N of inputs to be injected.
A set of flags for prohibiting some injections.
A leaf function to call.
If N is zero, can just calls the leaf function. If N is nonzero, can patches the subtree in every possible way that produces a canonical tree with N injected variables, and calls the leaf function for each and restores the tree, undoing each part of the patch as it is done with it, so we never need massive copying.
X is the subtree and K is a leaf representing variable N-1. First can would replace the subtree temporarily one at a time with subtrees representing some of (X)+K, (X)-K, (X)*K, (X)/K and K/(X) but both flags and some other rules would cause some of those to be skipped. For each not skipped, recursively call itself with the whole tree as both top and sub, with N-1, and with 0 flags.
Next, drill into the two children of X and recursively call itself with each as the subtree, with N, and with appropriate flags.
The outer function just calls can with a single node tree representing variable N-1 of the original N, and passes N-1.
In discussion, it is easier to name the inputs forward, so A is input N-1 and B is input N-2 etc.
When we drill into X and see it is Y+Z or Y-Z we don't want to add or subtract K from Y or Z because those are redundant with X+K or X-K. So we pass a flag that suppresses direct add or subtract.
Similarly, when we drill into X and see it is Y*Z or Y/Z we don't want to multiply or divide either Y or Z by K because that is redundant with multiplying or dividing X by K.
Some cases for further clarification:
(A/C)/B and A/(B*C) are easily non canonical because we prefer (A/B)/C and so when distributing C into (A/B) we forbid direct multiplying or dividing.
I think it takes just a bit more effort to allow C/(A*B) while rejecting C/(A/B) which was covered by (B/A)*C.
It is easier if negation is inherently non canonical, so level 1 is just A and does not include -A. Then, if the whole expression yields the negative of the target value, we negate the whole expression. Otherwise we never visit the negative of a canonical expression:
Given X, we might visit (X)+K, (X)-K, (X)*K, (X)/K and K/(X) and we might drill down into the parts of X passing flags which suppress some of the above cases for the parts:
If X is a + or -, suppress '+' or '-' in its direct parts. If X is a * or /, suppress '*' or '/' in its direct parts.
But if X is a / we also suppress K/(X) before drilling into X.
Since you are dealing with integers, I'd focus on getting an exact result.
Claim: Suppose there is some f(a_1, ..., a_n) = x where a_i and x are your integer input numbers and f(a_1, ..., a_n) represents any functions of your desired form. Then clearly f(a_i) - x = 0. I claim, we can construct a different function g with g(x, a_1, ..., a_n) = 0 for the exact same x and g only uses ()s, +, - and * (no division).
I'll prove that below. Consequently you could construct g and evaluate g(x, a_1, ..., a_n) = 0 on integers only.
Example:
Suppose we have a_i = i for i = 1, ..., 4 and f(a_i) = a_4 / (a_2 - (a_3 / a_1)) (which contains divisions so far). This is how I would like to simplify:
0 = a_4 / (a_2 - (a_3 / a_1) ) - x | * (a_2 - (a_3 / a_1) )
0 = a_4 - x * (a_2 - (a_3 / a_1) ) | * a_1
0 = a_4 * a_1 - x * (a_2 * a_1 - (a_3) )
In this form, you can verify your equality for some given integer x using integer operations only.
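As a quick numeric check of the example (Python used only as a calculator here; f and g are just the two formulas above written out, and Fraction stands in for exact division):

```python
from fractions import Fraction

def f(a1, a2, a3, a4):
    """The original expression a_4 / (a_2 - (a_3 / a_1)), evaluated exactly."""
    return Fraction(a4) / (a2 - Fraction(a3, a1))

def g(x, a1, a2, a3, a4):
    """The division-free form a_4 * a_1 - x * (a_2 * a_1 - a_3)."""
    return a4 * a1 - x * (a2 * a1 - a3)

a = (1, 2, 3, 4)     # a_i = i
x = f(*a)            # 4 / (2 - 3) = -4
check = g(-4, *a)    # 4*1 - (-4)*(2*1 - 3) = 0, using integers only
```

One caveat worth keeping in mind: g can also be zero where f is undefined (a denominator is zero), so a zero of g should still be checked against the denominators that were multiplied out.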
Proof:
There is some g(x, a_i) := f(a_i) - x which is equivalent to f. Consider any equivalent g with as few divisions as possible. Assume there is at least one (otherwise we are done). Assume within g we divide by h(x, a_i) (any of your functions; it may contain divisions itself). Then (g*h)(x, a_i) := g(x, a_i) * h(x, a_i) has the same roots as g (multiplying by h preserves every root of g, i.e. every (x, a_i) where f(a_i) - x = 0). But on the other hand, g*h is composed of one division fewer. This contradicts the choice of g as having the minimum number of divisions, which is why g doesn't contain any division.
I've updated the example to visualize the strategy.
Update: This works well on rational input numbers (those represent a single division p/q). This should help you. Other input can't be provided by humans.
What are you doing to find / test f's? I'd guess some form of dynamic programming will be fast in practice.

How can I find products with a given remainder efficiently in C++?

Given a number such as 23,128,765 and two more numbers, 9 and 3, I want to count the substrings of the first number whose remainder is 3 when divided by 9. For this example they are: 3, 31,287, 12, and 876.
How can I calculate the number of such substrings using C++?
One possible way is to calculate all possible substrings but that is O(n^2), but I want something faster than that.
You asked for C++, but here's a solution in Common Lisp, which was easier to write and demonstrate in this context:
(defun quotients-with-remainder-in (n divisor remainder)
  (loop with str = (format nil "~D" n)
        for i upfrom remainder to n by divisor
        when (search (format nil "~D" i) str)
          collect i))
> (quotients-with-remainder-in 23128765 9 3)
(3 12 876 31287)
This strides upward by the divisor, converts each candidate number into a string, and searches for that string within the upper bound number's string representation.
That creates many strings along the way, but checking whether the decimal representation of one integer occurs within the decimal representation of another is not readily done with arithmetic operations. For instance, figuring out whether the number 131 "contains" a 3 can't be accomplished with arithmetic and inspection of a two's complement representation.
Well, let's see; perhaps you could multiply the candidate number by powers of 10, and for each such product subtract it from the original number, and then check whether the important digits in the difference are all zero, for which you can use the modulus operator against a power of 10 one greater than the base-10 logarithm of the product. I started writing such a solution, and it quickly went off the rails. Using the extra logarithms, multiplication, and division will likely yield a solution even slower than the string conversion and substring searching.
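If only the count is needed, a different technique avoids enumerating substrings altogether: a running table of remainder counts, which takes O(n * divisor) time for n digits. This is not what the Lisp code above does. It is a sketch in Python under the assumption that every occurrence counts separately (repeated values and substrings with leading zeros are each counted), and the function name is made up.

```python
def count_substrings_with_remainder(number, divisor, remainder):
    """Count digit substrings of `number` whose value % divisor == remainder."""
    # counts[r] = number of substrings ending at the previous digit with value % divisor == r
    counts = [0] * divisor
    total = 0
    for ch in str(number):
        digit = int(ch)
        new_counts = [0] * divisor
        # Extending a substring with remainder r by one digit gives (r * 10 + digit) % divisor.
        for r, c in enumerate(counts):
            if c:
                new_counts[(r * 10 + digit) % divisor] += c
        # The digit on its own starts a new substring.
        new_counts[digit % divisor] += 1
        total += new_counts[remainder % divisor]
        counts = new_counts
    return total

result = count_substrings_with_remainder(23128765, 9, 3)  # the substrings 3, 12, 31287, 876
```

For a small divisor such as 9 this is effectively linear in the number of digits.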

C++ two dimensional array of bitsets

I have an assignment where we're tackling the traveling salesman problem.
I'm not going to lie, I don't fully understand what they're asking in the part I'm doing right now, so sorry if I phrase this question weirdly.
I sort of get it, but not fully.
We're calculating an approximate distance for the salesman. We need to create a two-dimensional array, of bitsets I believe? Storing the values in binary anyway.
0 represents that the city hasn't been visited, and 1 represents that it has been visited.
We've been given an algorithm that helps significantly, and I should be able to finish it if anyone here can help with the first step:
Create memoisation table [N][(1 << N)]
(where N = number of cities).
I get that 1 << N means convert the number of cities (e.g. 5) to binary, then move the set to the left by one place.
My main issues are:
Converting N to binary (I think this is what I need to do?)
Moving the set to the left by one
Actually creating the 2-dimensional array of these sizes...
I could be wrong here, in fact that's probably pretty likely... any help is appreciated, thanks!
Here is the general rule: the "<<" operator means left shift and ">>" means right shift. Right shifting any number by 1 is equivalent to dividing by 2 (discarding the remainder), and left shifting any number by 1 is equivalent to multiplying by 2. For example, take the number 7 (binary 111). 7 << 1 becomes 1110, which is 7 * 2 = 14, and 7 >> 1 becomes 11, which is 7 / 2 = 3.
So the algorithm to convert a number N to a bitset array in binary is:
1. Compute N mod 2 (take the remainder when you divide N by 2).
2. Store the remainder in a collection (e.g. a list, array, or stack).
3. Divide N by 2, discarding the remainder.
4. If the quotient is greater than 0, repeat from step 1 with the quotient.
5. Otherwise, reverse the collection and you have your bitset.
To move the set left by one (if you mean a left shift by one), use N << 1.
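The conversion steps can be sketched like this (Python rather than C++, just to show the loop; to_bits is a made-up name, and the loop runs until the quotient reaches 0 so the leading bit is not lost):

```python
def to_bits(n):
    """Return the binary digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    bits = []
    while n > 0:
        bits.append(n % 2)  # remainder is the next bit, least significant first
        n //= 2             # integer division by 2
    bits.reverse()
    return bits

seven = to_bits(7)            # [1, 1, 1]
shifted_left = to_bits(7 << 1)   # 14 -> [1, 1, 1, 0]
shifted_right = to_bits(7 >> 1)  # 3  -> [1, 1]
```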
This is how you create 2 dimensional array in C++
[Variable Type] TwoDimensionalArray[size][size];
For this problem though, I believe you might want to read about the C++ bitset, which makes this easy to implement. For that you just have to figure out the size of the bitset you want to use. For example, if the highest value of N is 15 then you need a bitset of size 4, because with 4 bits the maximum number you can represent is 15 (binary 1111). Hope this helps.
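One note on the table size itself: 1 << N does not convert N to binary. It shifts the single bit 1 left by N places, which gives 2^N, one column for every possible set of visited cities. A small sketch of such a table, in Python for brevity; the layout memo[city][visited_mask] and the helper names are assumptions about what the assignment intends.

```python
N = 5  # number of cities (example value)

# 1 << N == 2**N: one column per possible subset of visited cities.
columns = 1 << N

# memo[city][visited_mask], filled with None until a value is computed.
memo = [[None] * columns for _ in range(N)]

def is_visited(mask, city):
    """Bit `city` of the mask is 1 when that city has been visited."""
    return bool(mask & (1 << city))

def visit(mask, city):
    """Return a new mask with bit `city` set to 1."""
    return mask | (1 << city)

mask = 0               # no city visited yet
mask = visit(mask, 0)  # binary 00001
mask = visit(mask, 3)  # binary 01001
```

In C++ the same masks fit in a plain unsigned integer for small N, so the table can be an ordinary two-dimensional array indexed by city and mask.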

All subsets in Subset_sum_problem

I'm stuck at solving Subset_sum_problem.
Given a set of integers(S), need to compute non-empty subsets whose sum is equal to a given target(T).
Example:
Given set, S{4, 8, 10, 16, 20, 22}
Target, T = 52.
Constraints:
The number of elements N of set S is limited to 8. Hence a NP time solution is acceptable as N has a small upperbound.
Time and space complexities are not really a concern.
Output:
Possible subsets with sum exactly equal to T=52 are:
{10, 20, 22}
{4, 10, 16, 22}
The solution given in Wiki and in some other pages tries to check whether there exists such a subset or not (YES/NO).
It doesn't really help to compute all possible subsets as outlined in the above example.
The dynamic programming approach at this link gives single such subset but I need all such subsets.
One obvious approach is to compute all 2^N combinations using brute force but that would be my last resort.
I'm looking for a programmatic example (preferably C++) or an algorithm which computes such subsets, with illustrations/examples.
When you construct the dynamic-programming table for the subset sum problem you initialize most of it like so (taken from the Wikipedia article referenced in the question):
Q(i,s) := Q(i − 1,s) or (xi == s) or Q(i − 1,s − xi)
This sets the table element to 0 or 1.
This simple formula doesn't let you distinguish between those several cases that can give you 1.
But you can instead set the table element to a value that'd let you distinguish those cases, something like this:
Q(i,s) := {Q(i − 1,s) != 0} * 1 + {xi == s} * 2 + {Q(i − 1,s − xi) != 0} * 4
Then you can traverse the table from the last element. At every element, the value tells you whether you have zero, one or two possible paths from it, and their directions. All paths together give you all combinations of numbers summing up to T. And that's at most 2^N.
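A sketch of the flagged table and the traversal, in Python for brevity even though the question prefers C++. It assumes all values and the target are positive integers; the function and variable names are invented for the example.

```python
def all_subsets_with_sum(values, target):
    """Enumerate every subset of positive integers `values` that sums to `target`."""
    n = len(values)
    # q[i][s] is a bit field describing how sum s is reachable with the first i values:
    #   1: reachable without values[i-1]
    #   2: values[i-1] == s on its own
    #   4: reachable with values[i-1] plus some of the earlier values
    q = [[0] * (target + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        x = values[i - 1]
        for s in range(1, target + 1):
            flags = 0
            if q[i - 1][s]:
                flags |= 1
            if x == s:
                flags |= 2
            if s > x and q[i - 1][s - x]:
                flags |= 4
            q[i][s] = flags

    def walk(i, s, chosen):
        """Follow every set flag backwards from q[i][s], collecting chosen values."""
        flags = q[i][s]
        if flags & 1:
            yield from walk(i - 1, s, chosen)
        if flags & 2:
            yield [values[i - 1]] + chosen
        if flags & 4:
            yield from walk(i - 1, s - values[i - 1], [values[i - 1]] + chosen)

    return list(walk(n, target, []))

found = all_subsets_with_sum([4, 8, 10, 16, 20, 22], 52)
```

For the example in the question this yields {4, 10, 16, 22} and {10, 20, 22}.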
If N <= 8, why not just go with the 2^N solution? It's only 256 possibilities, which will be very fast.
Just brute force it. If N is limited to 8, your total number of subsets is 2^8, which is only 256. They give constraints for a reason.
You can express the set inclusion as a binary string where each element is either in the set or out of the set. Then you can just increment your binary string (which can simply be represented as an integer) and then determine which elements are in the set or not using the bitwise & operator. Once you've counted up to 2^N, you know you've gone through all possible subsets.
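The counting approach can be sketched as follows (Python for brevity; the same loop and bitwise test carry over directly to C++). The function name is hypothetical.

```python
def subsets_with_sum(values, target):
    """Try every non-empty subset, encoded as the bits of an integer mask."""
    n = len(values)
    matches = []
    for mask in range(1, 1 << n):  # 1 .. 2^n - 1, skipping the empty set
        # Element i is in the subset when bit i of the mask is set.
        subset = [values[i] for i in range(n) if mask & (1 << i)]
        if sum(subset) == target:
            matches.append(subset)
    return matches

matches = subsets_with_sum([4, 8, 10, 16, 20, 22], 52)
```

With N limited to 8 the loop body runs at most 255 times.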
The best way to do it is using a dynamic programming approach. However, dynamic programming just answers whether a subset sum exists or not, as you mentioned in your question.
With dynamic programming, you can output all the solutions by backtracking. However, the overall time complexity to generate all the valid combinations is still 2^n.
So, any better algorithm than 2^n is close to impossible.
UPD:
From @Knoothe's comment:
You can modify the Horowitz-Sahni algorithm to enumerate all possible subsets. If there are M such sets whose sum equals S, then the overall time complexity is in O(N * 2^(N/2) + MN)