Efficient approach in the grid [closed] - c++

Problem: fill a 2D grid of size m*n with characters from a set S so that the number of distinct sub-matrices in the resulting grid is as close as possible to a given number k.
This question is derived from http://www.codechef.com/JULY14/problems/GERALD09
Limits:
1<=n,m<16
1<=k <=m*n*m*n
|S|=4
time limit=0.1 sec
Assumption: two sub-matrices are distinct if they have different dimensions or if at least one pair of characters at corresponding positions differs.
My approach: start with a random grid and loop until an acceptable solution is found, increasing or decreasing the randomness in each iteration depending on the current state (although this can get stuck in a local optimum).
The problem is that I don't know an efficient way to count the distinct sub-matrices of a grid. I tried hashing, which is fairly fast: O(n^2 m^2) times the cost of generating and looking up a hash value for a sub-grid.
However, hashing doesn't give exact counts because of hash collisions, and even after correcting for that as suggested in @Vaughn Cato's comment I can only run 15-25 iterations of the search for an optimum state, which is not enough.
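For reference, the exact count that the hashing approach approximates can be sketched as follows. This is only a baseline under the stated definition of "distinct", not a fast solution: it stores every sub-matrix as a string in an ordered set, so it has no collisions but is probably too slow to run many times within 0.1 sec. The function name `count_distinct_submatrices` is made up for illustration, and the grid is assumed to be rectangular.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Exact count of distinct sub-matrices (no hashing, so no collisions).
// Two sub-matrices are distinct if their dimensions differ or any cell differs.
// Assumes every row of `grid` has the same length.
std::size_t count_distinct_submatrices(const std::vector<std::string>& grid) {
    const std::size_t rows = grid.size();
    const std::size_t cols = rows ? grid[0].size() : 0;
    std::size_t total = 0;
    for (std::size_t h = 1; h <= rows; ++h) {
        for (std::size_t w = 1; w <= cols; ++w) {
            std::set<std::string> seen;  // one set per (h, w) shape
            for (std::size_t r = 0; r + h <= rows; ++r) {
                for (std::size_t c = 0; c + w <= cols; ++c) {
                    // Flatten the h x w window row by row into one key.
                    std::string key;
                    key.reserve(h * w);
                    for (std::size_t i = 0; i < h; ++i)
                        key.append(grid[r + i], c, w);
                    seen.insert(std::move(key));
                }
            }
            total += seen.size();
        }
    }
    return total;
}
```

For example, the 2x2 grid with rows "ab" and "cd" gives 9 (four 1x1, two 1x2, two 2x1 and one 2x2), while the grid with rows "aa" and "aa" gives 4. A baseline like this is mainly useful for checking a faster hashed counter against known answers.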
Recently, I learned that Simulated annealing can be used to solve these kinds of problems.
http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6
I am searching for any efficient approach for solving this optimization problem.
Thanks in advance.

I think they will post an editorial at some point, but here is a possible idea for this particular problem:
I generated locally all possible numbers of sub matrices possible for particular n and m.
For n=m=3 I got only 11 out of 81 possibilities.
For n=3,m=4 I got only 19 out of possible 144 values.
What's more, when I generated the values, I obtained all 19 possible options at the very beginning - after 263000 matrices out of possible 16M I already had them. (I generated in the lexicographical order)
So one possible solution is to precompute as many achievable values of K as possible for each n and m, and to store for each one either the random-generator seed or some other compact encoding, so that you need only O(1) characters per (n, m, k) triplet. For a particular test case you then only check the two neighboring achievable values: the nearest one above and the nearest one below the given k.
What's more, since the number of achievable K values is not large, it may be possible to generate them another way: given all achievable values of K for an n x m table, together with a table realizing each one, backtrack only over the values in the next row to obtain matrices with all the different values of K for n x (m+1).


Best ways to evaluate randomness of a shuffled list [closed]

I'm trying to qualify the randomness of some shuffled list. Given a list of distinct integers, I want to shuffle it with different random generators or methods and evaluate the quality of the resulting shuffling.
For now, I tried a kind of dice experiment. Given a list of size input_size, I pick one bucket (position) in the list to be the "observed" one and then shuffle the initial list num_runs * input_size times, always starting from a fresh copy. I then look at the frequencies of the elements that landed in the observed bucket and plot the result. You can see the results below for three different methods (line plots of the frequencies; I tried histograms but they looked bad).
[Figure: The dice experiment over three methods]
Reporting plots alone is not formal enough; I would like to report some numbers. What are the best ways to do that (or the ones used in academic publications)?
Thanks in advance.
Quantifying randomness isn't trivial.
You generate a huge amount of random numbers and then test whether they have the properties exhibited by true random numbers. There are many properties you can test for, e.g., each bit should be 1 with 50% probability.
There are randomness test suites that combine a bunch of these tests and try to find statistical flaws in pseudo-random numbers.
PractRand is to my knowledge currently the most sophisticated randomness test suite.
I'd suggest you write a program that uses your method to repeatedly shuffle an array of, e.g., [0..255] and writes the raw bytes to stdout, so that the output is a pseudo-random bit stream. Then you can pipe that into PractRand, and it will quit once it finds statistical flaws: ./a.out | ./PractRand stdin
TestU01's "Big Crush" is also a pretty good test suite, but it takes a very long time to run, and in my experience PractRand finds more statistical flaws.
I suggest not using Diehard or the newer Dieharder test suite, because they aren't as powerful and produce false positives, even with CSPRNGs or true random number generators.

Most efficient way to index true/false values in C++ [closed]

I have a list of unsigned shorts that act as local IDs for a database. I was wondering what is the most memory-efficient way to store allowed IDs. For the lifetime of my project, the allowed ID list will be dynamic, so it may have more true or more false allowed IDs as time goes on, with a range of none allowed or all allowed.
What would be the best method to store these? I've considered the following:
1. List of allowed IDs
2. Bool vector/array of true/false for allowed IDs
3. Byte array that can be iterated through, similar to 2
Let me know which of these would be best or if another, better method, exists.
Thanks
EDIT: If possible, can a vector have a value put at say, index 1234, without all 1233 previous values, or would this suit a map or similar type more?
I'm looking at using an Arduino with 2k total ram and using external storage to assist with managing a large block of data, but I'm exploring what my options are
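To make the memory cost of the bool/byte array options concrete, here is a rough sketch that packs one bit per possible ID into a plain byte array (no standard containers, since those may not be available on a small microcontroller). The type name `IdBitmap` is made up, and the code assumes IDs are 16-bit. Covering the full range takes 65,536 bits = 8,192 bytes, which is more than 2k of RAM, so on that hardware the array would have to live in external storage or cover only part of the ID range.

```cpp
#include <cstddef>
#include <cstdint>

// One bit per possible 16-bit ID: 65,536 bits = 8,192 bytes in total.
struct IdBitmap {
    static constexpr std::size_t kBytes = 65536 / 8;
    std::uint8_t bits[kBytes] = {};  // all IDs start as "not allowed"

    // Marks `id` as allowed or disallowed by setting or clearing its bit.
    void set_allowed(std::uint16_t id, bool allowed) {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (id & 7));
        if (allowed)
            bits[id >> 3] |= mask;
        else
            bits[id >> 3] &= static_cast<std::uint8_t>(~mask);
    }

    // Returns true if the bit for `id` is set.
    bool is_allowed(std::uint16_t id) const {
        return ((bits[id >> 3] >> (id & 7)) & 1u) != 0;
    }
};
```

Lookup and update are constant time and the size is fixed regardless of how many IDs are allowed, which is the trade-off against a list of allowed IDs: the list is smaller when few IDs are allowed and larger when more than about one in sixteen is.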
"Best" is opinion-based, unless you are aiming for memory efficiency at the expense of all other considerations. Is that really what you want?
First of all, I hope we're talking <vector> here, not <list> -- because a std::list< short > would be quite wasteful already.
What is the possible value range of those IDs? Do they use the full range of 0..USHRT_MAX, or is there e.g. a high bit you could use to indicate allowed ones?
If that doesn't work, or you are willing to sacrifice a bit of space (no pun intended) for a somewhat cleaner implementation, go for a vector partitioned into allowed ones first, disallowed second. To check whether a given ID is allowed, find it in the vector and compare its position against the cut-off iterator (which you got from the partitioning). That would be the most memory-efficient standard container solution, and quite close to a memory-optimum solution either way. You would need to re-shuffle and update the cut-off iterator whenever the "allowedness" of an entry changes, though.
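The partitioned-vector idea can be sketched roughly like this. The names `IdTable`, `repartition` and `is_allowed` are invented for the example, and the cut-off is kept as an index rather than an iterator so that it stays valid if the vector reallocates.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct IdTable {
    std::vector<unsigned short> ids;  // allowed IDs first, disallowed ones after
    std::size_t cutoff = 0;           // number of allowed IDs (index form of the cut-off)

    // Re-shuffles the vector so that IDs satisfying `allowed` come first,
    // and records where the allowed block ends.
    template <typename Pred>
    void repartition(Pred allowed) {
        auto mid = std::partition(ids.begin(), ids.end(), allowed);
        cutoff = static_cast<std::size_t>(mid - ids.begin());
    }

    // An ID is allowed if it is present and sits before the cut-off.
    bool is_allowed(unsigned short id) const {
        auto it = std::find(ids.begin(), ids.end(), id);
        return it != ids.end() &&
               static_cast<std::size_t>(it - ids.begin()) < cutoff;
    }
};
```

The lookup here is a linear scan, so this favours memory over speed; keeping each half sorted and using a binary search would be one way to speed it up at the cost of more work on every change.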
One suitable data structure to solve your problem is a trie (string tree) that holds your allowed or disallowed IDs.
You can treat the binary representation of an ID as the string. A trie is a compact way to store the IDs (memory-wise), and the runtime of a lookup is bounded by the longest ID length (which in your case is a constant 16 bits).
I'm not aware of a trie in the C++ standard library, but if efficiency is crucial you can find an implementation or implement one yourself.

How to implement TSP with dynamic in C++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was pointed to use Dijkstra with a priority queue to make a connectivity matrix for the 15 cities I need to visit and then do TSP on that matrix with DP. I had previously only used Dijkstra with O(n^2). After trying to figure out how to implement Dijkstra, I finally did it (enough to optimize from 240 seconds to 0.6 for 40,000 cities). But now I am stuck at the TSP part.
Here are the materials I used for learning TSP :
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), but I am having troubles implementing it. Before this I have done dynamic programming with arrays that would be dp[int] or dp[int][int]. But now when my dp matrix has to be dp[subset][int] I don't have any idea how should I do this.
My questions are :
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it). Keep in mind that I must get the exact value, not approximate.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once? If so, what are those changes? Otherwise, what should I try?
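For the dp[subset][int] part, a minimal sketch of the bitmask representation could look like the following. It assumes `dist` is the small all-pairs matrix of shortest-path distances between the cities that must be visited (e.g. the 15 x 15 matrix built with Dijkstra), and that the tour starts and ends at city 0; the function name `tsp_bitmask` is made up, and a variant without the return leg would simply skip the final `dist[last][0]` term.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// dist[i][j]: shortest-path distance between relevant cities i and j.
// Returns the length of the shortest tour that starts at city 0, visits
// every city and returns to city 0.
long long tsp_bitmask(const std::vector<std::vector<long long>>& dist) {
    const std::size_t n = dist.size();
    if (n <= 1) return 0;
    const long long INF = std::numeric_limits<long long>::max() / 4;
    const std::size_t full = (std::size_t{1} << n) - 1;
    // dp[mask][last]: cheapest way to start at 0, visit exactly the cities
    // whose bits are set in mask, and currently stand at city `last`.
    std::vector<std::vector<long long>> dp(full + 1,
                                           std::vector<long long>(n, INF));
    dp[1][0] = 0;  // only city 0 visited, standing at city 0
    for (std::size_t mask = 1; mask <= full; ++mask) {
        if (!(mask & 1)) continue;  // every valid state contains the start city
        for (std::size_t last = 0; last < n; ++last) {
            if (!((mask >> last) & 1) || dp[mask][last] >= INF) continue;
            for (std::size_t next = 0; next < n; ++next) {
                if ((mask >> next) & 1) continue;  // already visited
                const std::size_t nmask = mask | (std::size_t{1} << next);
                dp[nmask][next] = std::min(dp[nmask][next],
                                           dp[mask][last] + dist[last][next]);
            }
        }
    }
    long long best = INF;
    for (std::size_t last = 1; last < n; ++last)
        best = std::min(best, dp[full][last] + dist[last][0]);
    return best;
}
```

The subset is just an integer whose bit i says whether city i has been visited, so `dp[subset][int]` becomes an ordinary 2D array with 2^n rows. For n = 15 that is 32,768 x 15 entries; time grows as 2^n * n^2, which is why this only works for a small number of cities.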
I think you can use the dynamic programming solution and, for each pair of nodes, add a second edge weighted with the shortest path between them. See also this question: Variation of TSP which visits multiple cities.
Here is a TSP implementation; you will find the link to the problem it solves in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna's answer was good.
Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't be relevant to your algorithm in the second step.
As for whether to use a different algorithm, I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can compute upper and lower bounds for the remaining cities and use them to pick promising solutions and discard hopeless ones (i.e. pruning). A lot of research has also been done on this topic; just search the web. For example, in cases where the map is actually Cartesian (i.e. the travelling cost is the distance between two points on a plane), you can exploit this to improve the algorithms a bit.
Lastly, if you really intend to increase the number of visited cities, you will find that the time for computing it increases vastly, so you will have to abandon your requirement for an exact solution.

Ideas Related to Subset Sum with 2,3 and more integers

I've been struggling with this problem just like everyone else, and I'm quite sure there have been more than enough posts explaining it. However, in order to understand it fully, I wanted to share my thoughts on the Subset Sum problem and get more efficient solutions from all the great people here.
I've searched the Internet and there are actually a lot of sources, but I really want to re-implement an algorithm or find my own in order to understand it fully.
The key thing I'm struggling with is efficiency, considering that the set will be large (I do not have a limit, just conceptually large). The phases I'm trying to work out ideas for are finding two numbers that sum to a given integer T, then three numbers, and eventually K numbers. Some ideas I've thought of:
For the two-integer part, I'm thinking of sorting the array in O(n log n) and, for each element, searching for its negative (i.e. if the element is 3, searching for -3; this is the T = 0 case, and in general one would search for T - x). Maybe a hash table would be better, providing O(1) lookup of the element?
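The hash-table idea for the two-number case can be sketched as below: for each element x, check whether T - x has already been seen. The function name `has_pair_with_sum` is made up, and the O(1) lookup is the expected (average) cost of `std::unordered_set`, not a worst-case guarantee.

```cpp
#include <unordered_set>
#include <vector>

// Returns true if two elements at different positions sum to `target`.
// Runs in expected O(n) time with O(n) extra memory.
bool has_pair_with_sum(const std::vector<long long>& values, long long target) {
    std::unordered_set<long long> seen;  // elements visited so far
    for (long long v : values) {
        if (seen.count(target - v)) return true;  // complement seen earlier
        seen.insert(v);
    }
    return false;
}
```

Because each element is only inserted after its complement has been looked up, a single 4 does not pair with itself for T = 8, but two separate 4s do.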
For three or more integers I've found an amazing blog post: http://www.skorks.com/2011/02/algorithms-a-dropbox-challenge-and-dynamic-programming/. However, even the author states that it is not applicable to large numbers.
So I was wondering what ideas could be applied to the subset problem for 2, 3, and more integers. I'm struggling to set up a dynamic programming method that will be efficient for large inputs as well.
That blog post you linked to looked pretty great, actually. This is, after all, an NP-complete problem...
But I bet you could speed it up even further. I haven't done any benchmarks, but I'm gonna guess that his use of a matrix is his single biggest time sink. First, it'll take a huge amount of memory for some really trivial inputs (For example: [-1000, 1000] will need 2001 columns! Good grief!), and then you're wasting a ton of cycles scanning through each row looking for "T"s, which are often gonna be pretty sparse.
So instead: Use a "set" data structure. That'll keep space and iteration time to a minimum,* but store values just as well: If it's in the set, it's a "T"; otherwise, it's an "F".
Hope that helps!
*: Of course, "minimum" doesn't necessarily = "small."
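As a rough sketch of the set-based variant (my reading of the suggestion, not the blog post's code): keep a set of all sums reachable by some non-empty subset of the elements processed so far, and extend it one element at a time. The function name `subset_sums_to` is made up for the example.

```cpp
#include <unordered_set>
#include <vector>

// Returns true if some non-empty subset of `values` sums to `target`.
// `reachable` plays the role of the "T" cells in the matrix: a sum is in
// the set exactly when some subset of the processed elements produces it.
bool subset_sums_to(const std::vector<long long>& values, long long target) {
    std::unordered_set<long long> reachable;
    for (long long v : values) {
        // Collect new sums separately so the set is not modified while iterating.
        std::vector<long long> added{v};
        for (long long s : reachable) added.push_back(s + v);
        reachable.insert(added.begin(), added.end());
        if (reachable.count(target)) return true;
    }
    return false;
}
```

With the input [-1000, 1000] the set holds only three sums (-1000, 1000 and 0) instead of a 2001-column row. The set can still grow exponentially in the worst case, which is what the footnote above is getting at.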

Which linear programming package should I use for high numbers of constraints and "warm starts" [closed]

I have a "continuous" linear programming problem that involves maximizing a linear function over a curved convex space. In typical LP problems, the convex space is a polytope, but in this case the convex space is piecewise curved -- that is, it has faces, edges, and vertices, but the edges aren't straight and the faces aren't flat. Instead of being specified by a finite number of linear inequalities, I have a continuously infinite number. I'm currently dealing with this by approximating the surface by a polytope, which means discretizing the continuously infinite constraints into a very large finite number of constraints.
I'm also in the situation where I'd like to know how the answer changes under small perturbations to the underlying problem. Thus, I'd like to be able to supply an initial condition to the solver based on a nearby solution. I believe this capability is called a "warm start."
Can someone help me distinguish between the various LP packages out there? I'm not so concerned with user-friendliness as speed (for large numbers of constraints), high-precision arithmetic, and warm starts.
Thanks!
EDIT: Judging from the conversation with question answerers so far, I should be clearer about the problem I'm trying to solve. A simplified version is the following:
I have N fixed functions f_i(y) of a single real variable y. I want to find x_i (i=1,...,N) that minimize \sum_{i=1}^N x_i f_i(0), subject to the constraints:
\sum_{i=1}^N x_i f_i(1) = 1, and
\sum_{i=1}^N x_i f_i(y) >= 0 for all y>2
More succinctly, if we define the function F(y)=\sum_{i=1}^N x_i f_i(y), then I want to minimize F(0) subject to the condition that F(1)=1, and F(y) is positive on the entire interval [2,infinity). Note that this latter positivity condition is really an infinite number of linear constraints on the x_i's, one for each y. You can think of y as a label -- it is not an optimization variable. A specific y_0 restricts me to the half-space F(y_0) >= 0 in the space of x_i's. As I vary y_0 between 2 and infinity, these half-spaces change continuously, carving out a curved convex shape. The geometry of this shape depends implicitly (and in a complicated way) on the functions f_i.
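To illustrate the discretization mentioned earlier, here is a small sketch that samples the positivity condition on a uniform grid of y values and produces one row of LP coefficients per sample, so that row j encodes \sum_i f_i(y_j) x_i >= 0. Everything here is an assumption made for the example: the function name `discretize_constraints`, the uniform grid, and the finite upper bound `y_max` standing in for infinity. It is not tied to any particular LP package.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Returns a matrix whose row j holds the coefficients f_i(y_j), i = 1..N,
// for `samples` evenly spaced points y_j in [y_min, y_max].
// Each row represents one linear constraint: sum_i row[i] * x_i >= 0.
std::vector<std::vector<double>> discretize_constraints(
    const std::vector<std::function<double(double)>>& f,
    double y_min, double y_max, std::size_t samples) {
    std::vector<std::vector<double>> rows;
    if (samples < 2) return rows;  // need at least both end points
    const double step = (y_max - y_min) / static_cast<double>(samples - 1);
    for (std::size_t j = 0; j < samples; ++j) {
        const double y = y_min + step * static_cast<double>(j);
        std::vector<double> row;
        row.reserve(f.size());
        for (const auto& fi : f) row.push_back(fi(y));
        rows.push_back(std::move(row));
    }
    return rows;
}
```

The rows can then be handed to whichever solver is used as inequality constraints. A uniform grid is the simplest choice; a non-uniform grid that is denser where F(y) comes close to zero may need far fewer rows, but that depends on the f_i.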
As for LP solver recommendations, two of the best are Gurobi and CPLEX (google them). They are free for academic users, and are capable of solving large-scale problems. I believe they have all the capabilities that you need. You can get sensitivity information (to a perturbation) from the shadow prices (i.e. the Lagrange multipliers).
But I'm more interested in your original problem. As I understand it, it looks like this:
Let S = {1,2,...,N} where N is the total number of functions. y is a scalar. f_{i}:R^{1} -> R^{1}.
minimize (over x_{i})  sum{i in S} (x_{i} * f_{i}(0))
s.t.
(1) sum {i in S} x_{i} * f_{i}(1) = 1
(2) sum {i in S} x_{i} * f_{i}(y) >= 0 for all y in (2,inf]
It just seems to me that you might want to try solving this problem as a convex NLP rather than an LP. Large-scale interior point NLP solvers like IPOPT should be able to handle these problems easily. I strongly recommend trying IPOPT: http://www.coin-or.org/Ipopt
From a numerical point of view: for convex problems, warm-starting is not necessary with interior point solvers; and you don't have to worry about the combinatorial cycling of active sets. What you've described as "warm-starting" is actually perturbing the solution -- that's more akin to sensitivity analysis. In optimization parlance, warm-starting usually means supplying a solver with an initial guess -- the solver will take that guess and end up at the same solution, which isn't really what you want. The only exception is if the active set changes with a different initial guess -- but for a convex problem with a unique optimum, this cannot happen.
If you need any more information, I'd be pleased to supply it.
EDIT:
Sorry about the non-standard notation -- I wish I could type in LaTeX like on MathOverflow.net. (Incidentally, you might try posting this there -- I think the mathematicians there would be interested in this problem)
Ah now I see about the "y > 2". It isn't really an optimization constraint so much as an interval defining a space (I've edited my description above). My mistake. I'm wondering if you could somehow transform/project the problem from an infinite to a finite one? I can't think of anything right now, but I'm just wondering if that's possible.
So your approach is to discretize the problem for y in (2,inf]. I'm guessing you're choosing a very big number to represent inf and a fine discretization grid. Oooo tricky. I suppose discretization is probably your best bet. Maybe guys who do real analysis have ideas.
I've seen something similar being done for problems involving Lyapunov functions where it was necessary to enforce a property in every point within a convex hull. But that space was finite.
I encountered a similar problem. I searched the web and just found that this may be classified as a "semi-infinite" problem. MATLAB has tools for solving this kind of problem (the function "fseminf"), but I haven't checked this in detail. Surely other people have encountered this kind of question.
You shouldn't be using an LP solver and doing the discretization yourself. You can do much better by using a decent general convex solver. Check out, for example, cvxopt. This can handle a wide variety of different functions in your constraints, or allow you to write your own. This will be far better than attempting to do the linearization yourself.
As to warm start, it makes more sense for an LP than a general convex program. While warm start could potentially be useful if you hand code the entire algorithm yourself, you typically still need several Newton steps anyway, so the gains aren't that significant. Most of the benefit of warm start comes in things like active set methods, which are mostly only used for LP.