How to divide vector into quintiles - c++

My question is not so much about code, it's more of an algorithm problem. I am trying to think of a way to divide a vector of doubles into "quintiles" i.e. 5 parts. I then need to find the average of each individual part. The vector can have any number of data points. Does anyone have an idea of how to go about determining what the beginning and end index would be for each quintile? If anyone would like to suggest code, the language I am working with is C++. Thank you.

Related

Problems in coding small straight for Yahtzee game

So this is for a school assignment obviously, and for some reason I can not wrap my head around coding to calculate a small straight for the game of yahtzee. I have done all the coding for all the other possible rolls except this one. I know this is a question asked a lot but i cant find a simple answer anywhere. I need to be able to do it using if, for, and while loops. If anyone can help it would be massively appreciated!
Sort the rolls and count the number of consecutive dice. If you have four then you have a small straight. If all five rolls are consecutive then you have a large straight.

How to implement TSP with dynamic in C++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was pointed to use Dijkstra with a priority queue to make a connectivity matrix for the 15 cities I need to visit and then do TSP on that matrix with DP. I had previously only used Dijkstra with O(n^2). After trying to figure out how to implement Dijkstra, I finally did it (enough to optimize from 240 seconds to 0.6 for 40,000 cities). But now I am stuck at the TSP part.
Here are the materials I used for learning TSP :
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), but I am having troubles implementing it. Before this I have done dynamic programming with arrays that would be dp[int] or dp[int][int]. But now when my dp matrix has to be dp[subset][int] I don't have any idea how should I do this.
My questions are :
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it). Keep in mind that I must get the exact value, not approximate.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once. If it can be done, what are those changes? Otherwise, what should I try?
I think you can use the dynamic solution and add to each pair of node a second edge with the shortest path. See also this question:Variation of TSP which visits multiple cities.
Here is a TSP implementation, you will find the link of the implemented problem in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna answer was good.
Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't be relevant to your algorithm in the second step.
Whether to use a different algorithm, I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can give an upper and lower bound to the remaining cities, and use that to pick promising solutions and to discard hopeless ones (aka pruning). There was also a bunch of research done on this topic, just search the web. For example, in cases where the map is actually carthesian (i.e. the travelling costs are the distance between two points on a plane), you can exploit this info to improve the algorithms a bit.
Lastly, if you really intend to increase the number of visited cities, you will find that the time for computing it increases vastly, so you will have to abandon your requirement for an exact solution.

Fast Jaro Winkler c++ code for numeric vectors

Is there any library or the code of a function in C++ that I can use for comparing numeric vectors in C++?
Just googled a little and found smth like that here I think you can use that to write your own which will work with numbers and vectors. Anyways, I dont know that topic but they say it is about strings(in other words letters). Also, I don't think for a project of this scale, speed will be an issue.

Ideas Related to Subset Sum with 2,3 and more integers

I've been struggling with this problem just like everyone else and I'm quite sure there has been more than enough posts to explain this problem. However in terms of understanding it fully, I wanted to share my thoughts and get more efficient solutions from all the great people in here related to Subset Sum problem.
I've searched it over the Internet and there is actually a lot sources but I'm really willing to re-implement an algorithm or finding my own in order to understand fully.
The key thing I'm struggling with is the efficiency considering the set size will be large. (I do not have a limit, just conceptually large). The two phases I'm trying to implement ideas on is finding two numbers that are equal to given integer T, finding three numbers and eventually K numbers. Some ideas I've though;
For the two integer part I'm thing basically sorting the array O(nlogn) and for each element in the array searching for its negative value. (i.e if the array element is 3 searching for -3). Maybe a hash table inclusion could be better, providing a O(1) indexing the element?
For the three or more integers I've found an amazing blog post;http://www.skorks.com/2011/02/algorithms-a-dropbox-challenge-and-dynamic-programming/. However even the author itself states that it is not applicable for large numbers.
So I was for 2 and 3 and more integers what ideas could be applied for the subset problem. I'm struggling with setting up a dynamic programming method that will be efficient for the large inputs as well.
That blog post you linked to looked pretty great, actually. This is, after all, an NP-complete problem...
But I bet you could speed it up even further. I haven't done any benchmarks, but I'm gonna guess that his use of a matrix is his single biggest time sink. First, it'll take a huge amount of memory for some really trivial inputs (For example: [-1000, 1000] will need 2001 columns! Good grief!), and then you're wasting a ton of cycles scanning through each row looking for "T"s, which are often gonna be pretty sparse.
So instead: Use a "set" data structure. That'll keep space and iteration time to a minimum,* but store values just as well: If it's in the set, it's a "T"; otherwise, it's an "F".
Hope that helps!
*: Of course, "minimum" doesn't necessarily = "small."

Graph - strongly connected components

Is there any fast way to determine the size of the largest strongly connected component in a graph?
I mean, like, the obvious approach would mean determining every SCC (could be done using two DFS calls, I suppose) and then looping through them and taking the maximum.
I'm pretty sure there has to be some better approach if I only need to have the size of that component and only the largest one, but I can't think of a good solution. Any ideas?
Thanks.
Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?
Firstly you could use Tarjan's algorithm which needs only one DFS instead of two. If you understand the algorithm clearly, the SCCs form a DAG and this algo finds them in the reverse topological sort order. So if you have a sense of the graph (like a visual representation) and if you know that relative big SCCs occur at end of the DAG then you could stop the algorithm once first few SCCs are found.