I just saw that function in code, and intuitively it should return the next prime number greater than the argument. When I call it that way, however, I get 53! And when I pass in 54, I get 97. I can't find a description of what it does online. Can anybody point me to one, or does anybody know what it does?
It returns the next prime that is sufficiently greater than the specified prime to be worth reorganizing a hash table to that number of buckets. If it returned the very next prime, you'd be reorganizing your hash tables way too often. It is an implementation detail of the hash table code and it's not meant to be used by outside code.
I have another backtracking challenge, in which I have to get all possible combinations of prime numbers that add up to a certain number. I finished the task using the general-purpose algorithm from Wikipedia, but for the number 100 it ran for more than an hour and still hadn't finished by the end of class. I was wondering: would memoisation (how do you spell that?) have significantly improved the algorithm's performance (as in, would it have made it noticeably faster)? I am using C++, and the function is called a huge number of times. I am using recursive backtracking, which I seem to remember is roughly O(n!) for simple problems.
Create an array external to the function that checks for primality, and reachable from it. Global or static, depending on the language used. That array will contain all the prime numbers found so far.
If the number in question is in the array, return true.
If the number is less than or equal to the largest number in the array, return false. (This assumes the array holds every prime up to its largest entry.)
Check for divisibility by all known primes. (This is conclusive only if the known primes reach the square root of the number.)
If the number is prime, write it into the array and return true.
Otherwise, return false.
Adding that is simple enough. Do it and measure how the running time changes.
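The steps above could be sketched roughly as follows. This is only an illustration in Python rather than the asker's C++, and the names `_primes`, `_extend_primes`, `_no_known_divisor` and `is_prime` are made up for the example. It also deviates slightly from the steps as written: instead of appending each newly found prime, it keeps the cache complete up to its largest entry, so that "not in the array" can safely mean "not prime".

```python
import bisect
import math

# Cache shared by all calls: every prime up to _primes[-1], in ascending order.
_primes = [2, 3]


def _no_known_divisor(n):
    """Return True if no cached prime p with p * p <= n divides n."""
    for p in _primes:
        if p * p > n:
            return True
        if n % p == 0:
            return False
    return True


def _extend_primes(limit):
    """Grow the cache so that it holds every prime <= limit."""
    candidate = _primes[-1] + 2
    while candidate <= limit:
        if _no_known_divisor(candidate):
            _primes.append(candidate)
        candidate += 2


def is_prime(n):
    if n < 2:
        return False
    if n <= _primes[-1]:
        # The cache is complete up to its largest entry, so a lookup decides.
        i = bisect.bisect_left(_primes, n)
        return _primes[i] == n
    # Trial division is conclusive once the cache reaches sqrt(n).
    _extend_primes(math.isqrt(n))
    return _no_known_divisor(n)
```

Whether this helps the backtracking search noticeably depends on how much of its time goes into primality tests, so timing both versions is still the way to find out.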
I'm working on a problem where I have an entire table from a database in memory at all times, with a low range and high range of 9-digit numbers. I'm given a 9-digit number that I need to use to look up the rest of the columns in the table based on whether that number falls in the range. For example, if the range was 100,000,000 to 125,000,000 and I was given the number 117,123,456, then I would know that I'm in the 100-125 million range, and whatever vector of data that points to is what I will be using.
Now the best I can think of for lookup time is O(log n). This is OK, at best, but still pretty slow. The table has at least 100,000 entries, and I will need to look up values in it tens of thousands, if not hundreds of thousands, of times per execution of this application (10+ times/day).
So I was wondering if it was possible to use an unordered_set instead, writing my own Hash function that ALWAYS returns the same hash-value for every number in range. Using the same example above, 100,000,000 through 125,000,000 will always return, for example, a hash value of AB12CD. Then when I use the lookup value of 117,123,456, I will get that same AB12CD hash and have a lookup time of O(1).
Is this possible, and if so, any ideas how?
Thanks in advance.
Yes. Assuming that you can number your intervals in order, you could fit a polynomial to your cutoff values, and receive an index value from the polynomial. For instance, with cutoffs of 100,000,000, 125,000,000, 250,000,000, and 327,000,000, you could use points (100, 0), (125, 1), (250, 2), and (327, 3), restricting the first derivative to [0, 1]. Assuming that you have decently-behaved intervals, you'll be able to fit this with an (N+2)th-degree polynomial for N cutoffs.
Have a table of desired hash values; use floor[polynomial(i)] for the index into the table.
Can you write such a hash function? Yes. Will evaluating it be slower than a search? Well there's the catch...
I would personally solve this problem as follows. I'd have a sorted vector of all values. And then I'd have a jump table of indexes into that vector based on the value of n >> 8.
So now your logic is that you look in the jump table to figure out where you are jumping to and how many values you should consider. (Just look at where you land versus the next index to see the size of the range.) If the whole range goes to the same vector, you're done. If there are only a few entries, do a linear search to find where you belong. If there are a lot of entries, do a binary search. Experiment with your data to find when binary search beats a linear search.
A vague memory suggests that the tradeoff is around 100 or so because predicting a branch wrong is expensive. But that is a vague memory from many years ago, so run the experiment for yourself.
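A minimal sketch of the jump-table idea, in Python for brevity. The class name `RangeTable`, the `(low, high, payload)` tuple layout and the default shift of 20 are assumptions made for the example (the answer suggests `n >> 8`; the larger shift just keeps the demo table small). It also always uses a binary search inside the bucket rather than switching to a linear scan for short runs.

```python
import bisect


class RangeTable:
    def __init__(self, ranges, shift=20, max_value=999_999_999):
        # ranges: iterable of (low, high, payload), assumed non-overlapping.
        self.ranges = sorted(ranges, key=lambda r: r[0])
        self.lows = [r[0] for r in self.ranges]
        self.shift = shift
        buckets = (max_value >> shift) + 2
        # jump[b] = index of the last range starting at or before the first
        # value of bucket b, or -1 if no range starts that early.
        self.jump = [bisect.bisect_right(self.lows, b << shift) - 1
                     for b in range(buckets)]

    def lookup(self, n):
        b = n >> self.shift
        lo = max(self.jump[b], 0)
        hi = self.jump[b + 1] + 1
        # Only the few ranges that can touch this bucket are searched.
        i = bisect.bisect_right(self.lows, n, lo, hi) - 1
        if i >= 0 and self.ranges[i][0] <= n <= self.ranges[i][1]:
            return self.ranges[i][2]
        return None  # n falls in a gap between ranges


table = RangeTable([
    (100_000_000, 125_000_000, "A"),
    (125_000_001, 250_000_000, "B"),
    (300_000_000, 327_000_000, "C"),
])
print(table.lookup(117_123_456))
```

A smaller shift means more buckets and fewer candidates per bucket, at the cost of a larger jump table, so the right value is something to measure on the real data.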
I want the number of ways to divide an array of positive integers such that the maximum value of the left part of the array is greater than or equal to the maximum value of the right part of the array.
For example,
6 4 1 2 1 can be divided into:
[[6,4,1,2,1]] [[6][4,1,2,1]] [[6,4][1,2,1]] [[6,4,1][2,1]] [[6,4,1,2][1]] [[6][4,1][2,1]] [[6][4][1,2,1]] [[6][4][1,2][1]] [[6][4,1,2][1]] [[6,4,1][2][1]] [[6][4,1][2][1]] [[6,4][1,2][1]]
which is a total of 12 ways of partitioning.
I tried a recursive approach, but it fails because it exceeds the time limit. It also does not always give the correct output.
In another approach, I took the array, sorted it in decreasing order, and then for each element checked whether it lies to the right in the original array; if it does, I added its partitions to those of the previous numbers too.
I want an approach to solve this. Any implementation, pseudocode, or just an idea would be appreciated.
I designed a simple recursive algorithm. I will try to explain it using your example:
First, check if [6] is a possible/valid part of a partition.
It is a valid part because the maximum element of ([6]) is greater than or equal to the maximum value of the remaining part ([4,1,2,1]).
Since it is a valid part, we can use the recursive part of the algorithm.
concatenate([6],algorithm([4,1,2,1]))
now the partitions
[[6][4,1,2,1]], [[6][4,1][2,1]], [[6][4,1][2][1]], [[6][4][1,2,1]], [[6][4][1,2][1]], [[6][4,1,2][1]]
are in our current solution set.
Check if [6,4] is a possible/valid part of a partition.
Continue like this until reaching [6,4,1,2,1].
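If only the count is needed, the same recursion can be memoised so that each suffix is solved once. The sketch below is one possible way to do that in Python, written bottom-up; the function name `count_partitions` and the helper arrays are made up for the example.

```python
def count_partitions(a):
    """Count the ways to cut `a` into contiguous parts such that each
    part's maximum is >= the maximum of everything to its right."""
    n = len(a)

    # suffix_max[i] = max(a[i:]), with -inf standing in for the empty suffix.
    suffix_max = [float("-inf")] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_max[i] = max(a[i], suffix_max[i + 1])

    # ways[i] = number of valid partitions of the suffix a[i:].
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one (empty) partition
    for start in range(n - 1, -1, -1):
        part_max = float("-inf")
        for end in range(start + 1, n + 1):
            part_max = max(part_max, a[end - 1])
            # a[start:end] is a valid first part if it dominates the rest.
            if part_max >= suffix_max[end]:
                ways[start] += ways[end]
    return ways[0]


print(count_partitions([6, 4, 1, 2, 1]))  # 12, matching the list in the question
```

This runs in O(n^2) time, since each (start, end) pair is visited once.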
I have two sorted arrays, one containing factors (array a) that when multiplied with values from another array (array b), yields the desired value:
a(idx1) * b(idx2) = value
With idx2 known, I would like to find the idx1 of a that provides the factor necessary to get as close to value as possible.
I have looked at some different algorithms (like this one, for example), but I feel like they would all be subject to potential problems with floating point arithmetic in my particular case.
Could anyone suggest a method that would avoid this?
If I understand correctly, this expression
minloc(abs(a-value/b(idx2)))
will return the index into a of the first occurrence of the value in a which minimises the difference. I expect that the compiler will write code to scan all the elements in a, so this may not be faster in execution than a search which takes advantage of the knowledge that a and b are both sorted. In compensation, this is much quicker to write and, I expect, to debug.
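For comparison, a search that does exploit the sorting could look roughly like the sketch below. It is written in Python rather than Fortran, returns a 0-based index (unlike minloc), and assumes that a is non-empty and sorted in ascending order and that the chosen element of b is positive. The name `closest_factor_index` is made up for the example. The final choice between the two neighbouring candidates compares the products against value directly, so the division is only used to locate the neighbourhood.

```python
import bisect


def closest_factor_index(a, b_val, value):
    """Index i into sorted `a` that brings a[i] * b_val closest to `value`."""
    target = value / b_val
    i = bisect.bisect_left(a, target)  # first element >= target
    if i == 0:
        return 0
    if i == len(a):
        return len(a) - 1
    # The best factor is one of the two neighbours of the insertion point.
    below = abs(a[i - 1] * b_val - value)
    above = abs(a[i] * b_val - value)
    return i - 1 if below <= above else i


print(closest_factor_index([1.0, 2.0, 4.0, 8.0], 3.0, 10.0))  # 2, since 4 * 3 = 12
```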
I have around 400,000 "items".
Each "item" consists of 16 double values.
At runtime I need to compare items with each other. To do that, I multiply their double values. This is quite time-consuming.
I have run some tests and found that there are only 40,000 possible return values, no matter which items I compare with each other.
I would like to store these values in a look-up table so that I can easily retrieve them without doing any real calculation at runtime.
My question would be how to efficiently store the data in a look-up table.
The problem is that if I create a look-up table, it gets amazingly huge, for example like this:
item-id  item-id  compare return value
1        1        499483.49834
1        2        -0.0928
1        3        499483.49834
(...)
It would sum up to around 120 million combinations.
That just looks too big for a real-world application.
But I am not sure how to avoid that.
Can anybody please share some cool ideas?
Thank you very much!
Assuming I understand you correctly, you have two inputs with 400K possibilities each, so 400K * 400K = 160B entries. Assuming you have them indexed sequentially, and that you stored your 40K possibilities in a way that allowed 2 octets each, you're looking at a table size of roughly 300GB, which I'm pretty sure is beyond current everyday computing. So you might instead research whether there is any correlation between the 400K "items", and if so, whether you can assign some kind of function to that correlation that gives you a clue (read: hash function) as to which of the 40K results might/could/should result. Clearly your hash function and lookup need to be cheaper than just doing the multiplication in the first place. Or maybe you can reduce the comparison time with some kind of intelligent reduction, like knowing the result under certain scenarios. Or perhaps some of your math can be optimized using integer math or boolean comparisons. Just a few thoughts...
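The size estimate can be checked with a few lines of arithmetic. The variable names are made up for the example, and the 2 bytes per entry is the assumption stated above (enough to index 40K distinct results).

```python
items = 400_000
bytes_per_entry = 2  # a 16-bit index is enough for 40,000 distinct results

entries = items * items
total_bytes = entries * bytes_per_entry

print(entries)                        # 160000000000 entries
print(total_bytes / 10**9)            # 320.0 GB (decimal)
print(round(total_bytes / 2**30, 1))  # 298.0 GiB
```

Storing only one triangle of the table (the comparison looks symmetric in the question's example, but that is an assumption) would roughly halve this, which is still far too large to hold in memory.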
To speed things up, you should probably compute all of the possible answers, and store the inputs to each answer.
Then, I would recommend making some sort of lookup table that uses the answer as the key (since the answers will all be unique), and then storing all of the possible inputs that produce that result.
To help visualize:
Say you had the table 'Table'. Inside Table you have keys, and associated with those keys are values. You give the keys the type of whatever format your answers are in (the keys will be all of your answers). Now, give each of your 400k inputs a unique identifier. You then store the pair of unique identifiers for a multiplication as one value associated with that particular key. When you compute that same answer again, you just add the new pair as another set of inputs that produces that key.
Example:
Table<AnswerType, vector<Input>>
Define Input like:
struct Input { IDType one; IDType two; };
Where one 'Input' might have ID's 12384, 128, meaning that the objects identified by 12384 and 128, when multiplied, will give the answer.
So, in your lookup, you'll have something that looks like:
AnswerType lookup(IDType first, IDType second)
{
    foreach(AnswerType k in table)
    {
        // table[k] is the vector of Inputs that produce answer k
        if (table[k].Contains(first, second))
            return k;
    }
    // not found: this pair was never stored
}
// Defined elsewhere
bool Contains(IDType first, IDType second)
{
    foreach(Input i in [the vector])
    {
        // the pair matches in either order
        if( (i.one == first && i.two == second ) ||
            (i.two == first && i.one == second ) )
            return true;
    }
    return false;
}
I know this isn't real C++ code, it's just meant as pseudo-code, and it's a rough cut as-is, but it might be a place to start.
While the foreach is probably going to be limited to a linear search, you can make the 'Contains' method run a binary search by sorting how the inputs are stored.
In all, you're looking at a run-once application that will run in O(n^2) time, and a lookup that scans the keys linearly and binary-searches the inputs stored under each one, so roughly O(k log m) for k distinct answers and m inputs per answer. I'm not entirely sure how the memory will look after all of that, though. Of course, I don't know much about the math behind it, so you might be able to speed up the linear search if you can somehow sort the keys as well.
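A tiny runnable version of the same structure, in Python, might look like the sketch below. The `compare` function is a hypothetical stand-in (a dot product) because the real comparison isn't shown in the question, and `build_table` and `lookup` are names made up for the example. Pairs are stored with the smaller ID first so that the order of the arguments doesn't matter, and a set replaces the sorted vector.

```python
from itertools import combinations_with_replacement


def compare(x, y):
    # Hypothetical stand-in for the real comparison: a dot product.
    return sum(p * q for p, q in zip(x, y))


def build_table(items):
    """Run-once step: map each answer to the set of ID pairs producing it."""
    table = {}
    indexed = enumerate(items)
    for (i, x), (j, y) in combinations_with_replacement(indexed, 2):
        table.setdefault(compare(x, y), set()).add((i, j))  # i <= j here
    return table


def lookup(table, first, second):
    pair = (min(first, second), max(first, second))
    for answer, pairs in table.items():  # linear scan over the answers
        if pair in pairs:
            return answer
    return None  # this pair was never stored


items = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
table = build_table(items)
print(lookup(table, 2, 0))  # 1.0
```

Note that this still stores one entry per pair of inputs, so at the full 400k scale the memory concern raised in the other answer applies here as well.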