Return the sum of the max sublist of a list

I have to write a function that takes a list of integers and returns the maximum sublist sum of the list. An example would be:
l = [4,-2,-8,5,-2,7,7,2,-6,5]
returns 19
so far my code is:
count = 0
for i in range(0, len(l) - 1):
    for j in range(i, len(l) - 1):
        if l[i] >= l[j]:
            count += l[i:j]
return count
I am kind of stuck and confused, can anyone help?
Thank You!

I assume this is homework, so I won't try to google algorithms here and/or post too much code.
Some ideas (just off the top of my head, 'cause I like this kind of task :-))
As user lc already pointed out, the naive and exhaustive way is to test every single sublist. I believe your (user2101463) code goes in that direction. Just use sum() to build up the sums and compare against a known best. To prime the best known sum with a reasonable starting value, just use the first value of the list.
the_list = [4, -2, -8, 5, -2, 7, 7, 2, -6, 5]
best_value = the_list[0]
best_idx = (0, 0)
for start_element in range(0, len(the_list) + 1):
    for stop_element in range(start_element + 1, len(the_list) + 1):
        sum_sublist = sum(the_list[start_element:stop_element])
        if sum_sublist > best_value:
            best_value = sum_sublist
            best_idx = (start_element, stop_element)
print("sum(list([{}:{}])) yields the biggest sum of {}".format(best_idx[0], best_idx[1], best_value))
This is of course brute force: the two nested loops enumerate O(N^2) sublists, and because sum() itself walks each sublist, the total runtime is O(N^3). That means: if the problem size, as defined by the number of elements of the input list, grows with N, the runtime grows roughly with N*N*N, with some arbitrary coefficients.
Some heuristics for improvement:
Obviously negative numbers are not good because they decrease the achievable sum
If you encounter a sequence of negative numbers, restart your best sublist after that sequence, if the sum of the best list so far plus the negative numbers is < 0. In your example list the first three numbers cannot be part of a best list because the positive effect of the 4 is always negated by the -2, -8.
Possibly this even leads to an O(N) implementation which just iterates from start to end, memorizing the best known start index while calculating running sums of a full total from that start index as well as positive and negative subtotals of the last continuous sequence of positive and negative numbers, respectively.
Once such a best list is found, possibly this requires a final cleanup to remove a trailing negative sublist such as the -6, 5 at the end of your example.
Hope this leads in the right direction.

This is called the 'maximum subarray problem' and can be done in linear time. The Wikipedia article has your answer.
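For reference, here is a minimal sketch of that linear-time approach (Kadane's algorithm) in C++; the function name and the use of std::vector are just illustrative, not part of the original question:
#include <algorithm>
#include <iostream>
#include <vector>

// Maximum sum over all non-empty contiguous sublists (Kadane's algorithm).
int max_sublist_sum(const std::vector<int>& v)
{
    int best = v[0];      // best sum found so far
    int current = v[0];   // best sum of a sublist ending at the current element
    for (std::size_t i = 1; i < v.size(); ++i)
    {
        current = std::max(v[i], current + v[i]);
        best = std::max(best, current);
    }
    return best;
}

int main()
{
    std::vector<int> l = {4, -2, -8, 5, -2, 7, 7, 2, -6, 5};
    std::cout << max_sublist_sum(l) << '\n'; // prints 19
}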

The optimal solution takes linear runtime, that is O(n). But this problem also has an O(n log n) solution (based on a divide and conquer algorithm) and an O(n^2) solution. If you are interested in those algorithms, here is the link to Introduction to Algorithms, which is highly recommended, and here I write code in Java which has linear runtime.
import java.util.Scanner;

public class MaxSubarray {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt();
        int[] A = new int[n];
        for (int i = 0; i < n; i++) {
            A[i] = sc.nextInt();
        }
        // Kadane's algorithm: keep the best sum of a subarray ending at the
        // current index (sumvariable) and the best sum seen so far (highestsum).
        int highestsum = A[0];
        int sumvariable = A[0];
        for (int i = 1; i < n; i++) {
            sumvariable = Math.max(A[i], sumvariable + A[i]);
            highestsum = Math.max(highestsum, sumvariable);
        }
        System.out.println(highestsum);
        sc.close();
    }
}


efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link; however, that one wasn't answered to my expectations.
I have an array of integers (1,000,000 entries), and need to mask exactly 30% of the elements.
My approach is to loop over the elements and roll a die for each one. Doing it in an uninterrupted manner is good for cache coherency.
As soon as I notice that exactly 300,000 elements were indeed masked, I need to stop. However, I might reach the end of the array and have only 200,000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, while not being biased towards picking some elements?
Edit:
I need to preserve the order of elements.
For instance, I might have:
[12, 14, 1, 24, 5, 8]
Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero.
Just do a Fisher-Yates shuffle but stop at only 300,000 iterations. The last 300,000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
    std::size_t r = std::rand() % size;
    std::swap(array[r], array[size-1]);
    --size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
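For instance, a sketch of the same partial shuffle using the <random> facilities instead of std::rand (the wrapper function and the hard-coded 30% are just for illustration):
#include <cstddef>
#include <random>
#include <utility>

// Move a random 30% of the entries to the back of the array, as above,
// but with a proper engine and an unbiased distribution.
void mask_30_percent(int* array, std::size_t n)
{
    std::mt19937 gen(std::random_device{}());
    std::size_t size = n;
    for (std::size_t i = 0; i < n * 3 / 10; ++i)
    {
        std::uniform_int_distribution<std::size_t> dist(0, size - 1);
        std::size_t r = dist(gen);
        std::swap(array[r], array[size - 1]);
        --size;
    }
}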
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
    std::size_t r = rand() % 1000000;
    if(array[r] != 0)
    {
        array[r] = 0;
        ++i;
    }
}
This has no bias and does not reorder elements, but it is inferior to Fisher-Yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take a decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );

void mask_random_sublist( int *list, int list_count, int sub_count )
{
    if (list_count > SOME_SMALL_THRESHOLD)
    {
        int list_count_lo = list_count / 2; // arbitrary
        int list_count_hi = list_count - list_count_lo;
        int sub_count_lo = random_split_sub_count_lo( list_count, sub_count, list_count_lo );
        int sub_count_hi = sub_count - sub_count_lo;
        mask_random_sublist( list, list_count_lo, sub_count_lo );
        mask_random_sublist( list + list_count_lo, list_count_hi, sub_count_hi );
    }
    else
    {
        // insert here some simple/obvious/naive implementation that
        // would be ludicrous to use on a massive list due to complexity,
        // but which works great on very small lists. I'm assuming you
        // can do this part yourself.
    }
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct a collection of all the indices and then, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones):
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
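As a hedged sketch of that idea (the names and the exact count are illustrative): partially Fisher-Yates-shuffle a vector of indices, then zero the array entries at the chosen indices, which leaves the order of the original array intact.
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Zero out exactly k randomly chosen entries of data, preserving element order.
void mask_k_random(std::vector<int>& data, std::size_t k)
{
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0); // 0, 1, 2, ..., N-1

    std::mt19937 gen(std::random_device{}());
    for (std::size_t i = 0; i < k; ++i)   // partial Fisher-Yates on the indices
    {
        std::uniform_int_distribution<std::size_t> dist(i, idx.size() - 1);
        std::swap(idx[i], idx[dist(gen)]);
        data[idx[i]] = 0;                 // mask the element at the chosen index
    }
}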
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
  S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i = 1 to k
      R[i] := S[i]
  // replace elements with gradually decreasing probability
  for i = k+1 to n
      j := random(1, i) // important: inclusive range
      if j <= k
          R[j] := S[i]
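A possible C++ rendering of that pseudocode, if all you need are the k sampled values themselves (the function name is illustrative):
#include <cstddef>
#include <random>
#include <vector>

// Reservoir sampling: returns k items chosen uniformly at random from s (k <= s.size()).
std::vector<int> reservoir_sample(const std::vector<int>& s, std::size_t k)
{
    std::vector<int> r(s.begin(), s.begin() + k); // fill the reservoir
    std::mt19937 gen(std::random_device{}());
    for (std::size_t i = k; i < s.size(); ++i)
    {
        std::uniform_int_distribution<std::size_t> dist(0, i); // inclusive range
        std::size_t j = dist(gen);
        if (j < k)
            r[j] = s[i];
    }
    return r;
}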

How do I print out vectors in different order every time

I'm trying to make two vectors, where vector 1 (total1) contains some strings and vector 2 (total2) contains some random unique numbers (between 0 and total1.size() - 1).
I want to make a program that prints out total1's strings, but in a different order every turn. I don't want to use iterators or anything like that because I want to improve my problem-solving capacity.
Here is the specific function that crashes the program.
for (unsigned i = 0; i < total1.size();)
{
    v1 = rand() % total1.size();
    for (unsigned s = 0; s < total1.size(); ++s)
    {
        if (v1 == total2[s])
            ;
        else
        {
            total2.push_back(v1);
            ++i;
        }
    }
}
I'm very grateful for any help that I can get!
Can I suggest a change of algorithm? Even if your current one were correctly implemented (s, in your code, must go from 0 to total2.size(), not total1.size(), and if the element is found, break and generate a new random number), it has the following drawback: assume vectors of 1,000,000 elements and you are trying to place the last random number. You have a probability of one in 1,000,000 of finding a random number not previously used. That is a very small chance. The next-to-last number has a probability of 2 in 1,000,000, also small. In conclusion, your program will loop and expend lots of CPU resources.
Your best alternative is to follow #NathanOliver's suggestion and look at the function std::shuffle. The manual page shows the implementation algorithm, which is what you are looking for.
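For reference, a minimal std::shuffle usage sketch (the contents of total1 here are made up):
#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> total1 = {"one", "two", "three", "four"};
    std::mt19937 gen(std::random_device{}());
    std::shuffle(total1.begin(), total1.end(), gen); // different order on every run
    for (const auto& s : total1)
        std::cout << s << ' ';
    std::cout << '\n';
}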
Another simple algorithm, with some pros and cons, is:
init total2 with the sequence 0, 1, 2, ..., n, where n is the size of total1 - 1.
choose two random numbers, i1 and i2, in the range [0, n].
Swap elements i1 and i2 in total2.
repeat from (2) a fixed number of times "R".
This method lets you know a priori the number of steps needed and control the level of "randomness" of the final vector (a bigger R is more random). However, it is far from good in its randomness quality.
Another method, better in its probabilistic distribution:
fill a list L with the numbers 0, 1, 2, ..., size of total1 - 1.
choose a random number i between 0 and the size of list L - 1.
Store in total2 the i-th element of list L.
Remove this element from L.
repeat from (2) until L is empty.
If you just want to shuffle vector<string> total1, you can do this without using helping vector<int> total2. Here is an implementation based on Fisher–Yates shuffle.
for(int i=n-1; i>=1; i--) {
    int j=rand()%(i+1);
    swap(total1[j], total1[i]); // your prof might not allow use of swap:)
}
If you must use vector<int> total2, then shuffle it using the above algorithm. Next you can use it to create a new vector<string> result from total1 where result[i]=total1[total2[i]].

How to calc percentage of coverage in an array of 1-100 using C++?

This is for an assignment so I would appreciate no direct answers; rather, any logic help with my algorithms (or pointing out any logic flaws) would be incredibly helpful and appreciated!
I have a program that receives "n" number of elements from the user to put into a single-dimensional array.
The array uses random generated numbers.
IE: If the user inputs 88, a list of 88 random numbers (each between 1 and 100) is generated.
"n" has a max of 100.
I must write 2 functions.
Function #1:
Determine the percentage of numbers that appear in the array of "n" elements.
So any duplicates would decrease the percentage.
And any missing numbers would decrease the percentage.
Thus if n = 75, then you have a maximum possible %age of 0.75
(this max %age decreases if there are duplicates)
This function basically calls upon function #2.
FUNCTION HEADER(GIVEN) = "double coverage (int array[], int n)"
Function #2:
Using a linear search, search for the key (key being the current # in the list of 1 to 100, which should be from the loop in function #1), in the array.
Return the position if that key is found in the array
(IE: if this is the loop's 40th run, it will be at the variable "39",
and will go through every instance of an element in the array
and if any element is equal to 39, all of those positions will be returned?
I believe that is what our prof is asking)
Return -1 if the key is not found.
Given notes = "Only function #1 calls function #2,
and does so to find out if a certain value (key) is found among the first n elements of the array."
FUNCTION HEADER(GIVEN) = "int search (int array[], int n, int key)"
What I really need help with is the logic for the algorithm.
I would appreciate any help with this, as I would approach this problem completely differently than our professor wants us to.
My first thoughts would be to loop through function #1 for all variable keys of 1 through 100.
And in that loop, go to the search function (function #2), in which a loop would go through every number in the array and add to a counter if a number was (1)a duplicate or (2) non-existent in the array. Then I would subtract that counter from 100. Thus if all numbers were included in the array except for the #40 and #41, and then #77 was a duplicate , the total percentage of coverage would be 100 - 3 = 97%.
Although as I type this I think that may in and of itself be flawed? ^ Because with a max of 100 elements in the array, if the only number missing was 99, then you would subtract 1 for having that number missing, and then if there was a duplicate you would subtract another 1, and thus your percentage of coverage would be (100-2) = 98, when clearly it ought to be 99.
And this ^ is exactly why I would REALLY appreciate any logic help. :)
I know I am having some problems approaching this logically.
I think I can figure out the coding with a relative amount of ease; what I am struggling witht he most is the steps to take. So any pseudocode ideas would be amazing!
(I can post my entire program code so far if necessary for anyone, just ask, but it is rather long as of now as I have many other functions performing other tasks in the program)
I may be mistaken, but as I read it all you need to do is:
write a function that loops through the array of n elements to find a given number in it. It would return the index of the first occurrence, or a negative value in case the number cannot be found in the array.
write a loop to call the function for all numbers 1 to 100 and count the finds. Then divide the result by 100.
I'm not sure if I understand this whole thing right, but you can do it with one function. If you don't care about speed, it's better to put the array into a vector, loop through 1..100, and use the Boost find function http://www.boost.org/doc/libs/1_41_0/doc/html/boost/algorithm/find_nth.html. There you can compare the current value against a second occurrence in the vector: if it is present you decrease, if not you don't. If you just want to find whether a unique number is in the array, use http://www.cplusplus.com/reference/algorithm/find/. I don't understand how the percentage decreases, so that part is on your own, and I don't really understand the second function, but if it's a linear search, use find again.
P.S. Vector description http://www.cplusplus.com/reference/vector/vector/begin/.
You want to know how many numbers in the range [1, 100] appear in your given array. You can search for each number in turn:
size_t count_unique(int array[], size_t n)
{
    size_t result = 0;
    for (int i = 1; i <= 100; ++i)
    {
        if (contains(array, n, i))
        {
            ++result;
        }
    }
    return result;
}
All you still need is an implementation of the containment check contains(array, n, i), and to transform the unique count into a percentage (by using division).
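One way those missing pieces could look, sketched to match the assignment's given headers; it relies on the count_unique shown above, which in a real file would need contains declared before it:
// Linear search: return the index of the first occurrence of key, or -1 if absent.
int search(int array[], int n, int key)
{
    for (int i = 0; i < n; ++i)
    {
        if (array[i] == key)
            return i;
    }
    return -1;
}

// The containment check used by count_unique, expressed in terms of search().
bool contains(int array[], int n, int key)
{
    return search(array, n, key) != -1;
}

// Fraction of the values 1..100 that appear at least once among the first n elements.
double coverage(int array[], int n)
{
    return count_unique(array, n) / 100.0;
}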

How can I find number of consecutive sequences of various lengths satisfy a particular property?

I am given an array A[] having N elements, which are positive integers. I have to find the number of sequences of lengths 1, 2, 3, ..., N that satisfy a particular property.
I have built an interval tree with O(n log n) complexity. Now I want to count the number of sequences that satisfy a certain property.
All the properties required for the problem are related to the sum of the sequences.
Note an array will have N*(N+1)/2 sequences. How can I iterate over all of them in O(n log n) or O(n)?
If we let k be the moving index from 0 to N (elements), we will run an algorithm that is essentially looking for the MIN R that satisfies the condition (let's say I); then every other subset for L = k is also satisfied for R >= I (this is your short circuit). After you find I, simply return an output for (L=k, R>=I). This of course assumes that all numerics in your set are >= 0.
To find I, for every k, begin at element k + (N-k)/2. Figure out whether the subset defined by (L=k, R=k+(N-k)/2) satisfies your condition. If it does, then decrement R until your condition is NOT met; then R+1 is your MIN (you could choose to print these results as you go, but the results in these cases would essentially be printed backwards). If (L=k, R=k+(N-k)/2) does not satisfy your condition, then INCREMENT R until it does, and this becomes your MIN for that L=k. This degrades your search space for each L=k by a factor of 2. As k increases and approaches N, your search space continuously decreases.
// This declaration won't work unless N is either a constant or a MACRO defined above
unsigned int myVals[N];
unsigned int Ndiv2 = N / 2;
unsigned int I;
for(unsigned int k = 0; k < N; k++){
    if(TRUE == TESTVALS(myVals, k, Ndiv2)){ // It Passes: walk R downward to find the MIN
        // note: watch for unsigned wrap-around of I when k == 0
        for(I = Ndiv2; I >= k; I--){
            if(FALSE == TESTVALS(myVals, k, I)){
                I++;
                break;
            }
        }
    }else{ // It Didn't Pass: walk R upward to find the MIN
        for(I = Ndiv2; I < N; I++){
            if(TRUE == TESTVALS(myVals, k, I)){
                break;
            }
        }
    }
    // PRINT ALL PAIRS for L=k, from R=I to R=N-1
    if((k & 0x00000001) == 0) Ndiv2++;
} // END --> for(unsigned int k = 0; k < N; k++)
The complexity of the algorithm above is O(N^2). This is because for each k in N (i.e., N iterations / tests) there are no more than N/2 values that need testing for each. Big O notation isn't concerned with the N/2, nor with the fact that N truly gets smaller as k grows; it is concerned only with the gross magnitude. Thus it would say N tests for every N values, thus O(N^2).
There is an alternative approach which would be FASTER. That approach is, whenever you wish to move within the secondary (inner) for loops, to move half the remaining distance instead of one step. This gets you to O(N log N) steps: for each k in N (which all have to be tested), you run this half-distance approach to find your MIN R value in log N time. As an example, let's say you have a 1000 element array. When k = 0, we essentially begin the search for MIN R at index 500. If the test passes, instead of linearly moving downward from 500 to 0, we test 250. Let's say the actual MIN R for k = 0 is 300. Then the tests to find MIN R would look as follows:
R=500
R=250
R=375
R=312
R=280
R=296
R=304
R=300
While this is oversimplified, you are most likely going to have to optimize, and test 301 as well as 299 to make sure you're in the sweet spot. Another note is to be careful when dividing by 2 when you have to move in the same direction more than once in a row.
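A sketch of that half-distance search as a plain binary search, assuming a hypothetical predicate satisfies(A, L, R) that is monotone in R (once it holds for some R, it holds for every larger R):
#include <cstddef>

// Hypothetical predicate: does the subsequence A[L..R] satisfy the property?
bool satisfies(const unsigned int* A, std::size_t L, std::size_t R);

// Smallest R in [L, N-1] with satisfies(A, L, R), or N if no such R exists.
std::size_t min_satisfying_R(const unsigned int* A, std::size_t N, std::size_t L)
{
    std::size_t lo = L, hi = N; // search the half-open range [lo, hi)
    while (lo < hi)
    {
        std::size_t mid = lo + (hi - lo) / 2;
        if (satisfies(A, L, mid))
            hi = mid;      // mid works, try a smaller R
        else
            lo = mid + 1;  // mid fails, R must be larger
    }
    return lo; // equals N when no R satisfies the property
}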
#user1907531: First of all, if you are participating in an online contest of such importance at the national level, you should refrain from using these cheap tricks and methodologies to get ahead of other deserving guys. Second, a cheater like you is always a cheater, but all this hampers the hard work of those who put effort into making the questions and of the competitors who are unlike you. Thirdly, if #trumetlicks asks why you haven't tagged the question as homework, you tell another lie there. And finally, I don't know how so many people could answer this question without knowing the origin/website/source of it. This surely can't be given by a teacher for homework in any Indian school. To tell everyone: this cheater has asked for the complete solution of a running collegiate contest in India 6 hours before the contest ended, has surely got a lot of direct help, and on top of that has invited hundreds of others to cheat from the answers given here. So, good luck to all these cheaters.

How to get 2 random (different) elements from a c++ vector

I would like to get 2 random different elements from an std::vector. How can I do this so that:
It is fast (it is done thousands of times in my algorithm)
It is elegant
The elements selection is really uniformly distributed
For elegance and simplicity:
void Choose (const int size, int &first, int &second)
{
    // pick a random element
    first = rand () * size / RAND_MAX;
    // pick a random element from what's left (there is one fewer to choose from)...
    second = rand () * (size - 1) / RAND_MAX;
    // ...and adjust second choice to take into account the first choice
    if (second >= first)
    {
        ++second;
    }
}
using first and second to index the vector.
For uniformness, this is very tricky since as size approaches RAND_MAX there will be a bias towards the lower values and if size exceeds RAND_MAX then there will be elements that are never chosen. One solution to overcome this is to use a binary search:
int GetRand (int size)
{
    int lower = 0, upper = size;
    do
    {
        int mid = (lower + upper) / 2;
        if (rand () > RAND_MAX / 2) // not a great test, perhaps use parity of rand ()?
        {
            lower = mid;
        }
        else
        {
            upper = mid;
        }
    } while (upper != lower); // this is just to show the idea,
                              // need to cope with lower == mid and lower != upper
                              // and all the other edge conditions
    return lower;
}
What you need is to generate M uniformly distributed random numbers from [0, N) range, but there is one caveat here.
One needs to note that your statement of the problem is ambiguous. What is meant by the uniformly distributed selection? One thing is to say that each index has to be selected with equal probability (of M/N, of course). Another thing is to say that each two-index combination has to be selected with equal probability. These two are not the same. Which one did you have in mind?
If M is considerably smaller than N, the classic algorithm for selecting M numbers out of the [0, N) range is Bob Floyd's algorithm, which can be found in Bentley's "Programming Pearls" book. It looks as follows (a sketch):
for (int j = N - M; j < N; ++j) {
    int rand = random(0, j); // generate a random integer in range [0, j]
    if (`rand` has not been generated before)
        output rand;
    else
        output j;
}
In order to implement the check of whether rand has already been generated or not for relatively high M some kind of implementation of a set is necessary, but in your case M=2 it is straightforward and easy.
Note that this algorithm distributes the sets of M numbers uniformly. Also, this algorithm requires exactly M iterations (attempts) to generate M random numbers, i.e. it doesn't follow that flawed "trial-and-error" approach often used in various ad-hoc algorithms intended to solve the same problem.
Adapting the above to your specific situation, the correct algorithm will look as follows
first = random(0, N - 2);
second = random(0, N - 1);
if (second == first)
    second = N - 1;
(I leave out the internal details of random(a, b) as an implementation detail).
It might not be immediately obvious why the above works correctly and produces a truly uniform distribution, but it really does :)
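A sketch of that adaptation with random(a, b) spelled out via <random> (the helper function name is made up; bounds are inclusive, as in the description above):
#include <random>
#include <utility>

// Pick two distinct indices in [0, N), uniformly over all pairs (requires N >= 2).
std::pair<int, int> choose_two(int N, std::mt19937& gen)
{
    auto random = [&gen](int a, int b) { // inclusive range [a, b]
        return std::uniform_int_distribution<int>(a, b)(gen);
    };
    int first = random(0, N - 2);
    int second = random(0, N - 1);
    if (second == first)
        second = N - 1;
    return {first, second};
}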
How about using a std::deque (std::random_shuffle needs random-access iterators, which std::queue doesn't expose) and doing std::random_shuffle on it? Then just pop 'til your heart's content.
Not elegant, but extremely simple: just draw a random number in [0, vector.size()) and check that you don't draw the same one twice.
Simplicity is also in some way elegance ;)
What do you call fast? I guess this can be done thousands of times within a millisecond.
Whenever need something random, you are going to have various questions about the random number properties regarding uniformity, distribution and so on.
Assuming you've found a suitable source of randomness for your application, then the simplest way to generate pairs of uncorrelated entries is just to pick two random indexes and test them to ensure they aren't equal.
Given a vector of N+1 entries, another option is to generate an index i in the range 0..N. element[i] is choice one. Swap elements i and N. Generate an index j in the range 0..(N-1). element[j] is your second choice. This slowly shuffles your vector, which may be problematic, but that can be avoided by using a second vector which holds indexes into the first, and shuffling that instead, as sketched below. This method trades a swap for the index comparison and tends to be more efficient for small vectors (a dozen or fewer elements, typically) as it avoids having to do multiple comparisons as the number of collisions increases.
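A sketch of that index-vector variant (the function name is made up; the caller initializes idx with 0..N via std::iota and reuses it across calls):
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Pick two distinct positions without permanently reordering the data itself:
// shuffle a vector of indexes instead of the real vector.
std::pair<std::size_t, std::size_t> pick_two_indices(std::vector<std::size_t>& idx,
                                                     std::mt19937& gen)
{
    std::size_t n = idx.size() - 1; // highest usable slot
    std::size_t i = std::uniform_int_distribution<std::size_t>(0, n)(gen);
    std::swap(idx[i], idx[n]);      // first choice is now parked at slot n
    std::size_t j = std::uniform_int_distribution<std::size_t>(0, n - 1)(gen);
    return {idx[n], idx[j]};        // two distinct indexes into the real vector
}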
You might wanna look into the GNU Scientific Library. There are some pretty nice random number generators in there that are guaranteed to be random down to the bit level.