Find largest mode in huge data set without timing out


Description
In statistics, the mode is a measure of a distribution: it is the value that appears most often in a data set. A data set may have more than one mode, which happens when several values share the highest number of occurrences.
Mr. Dengklek gives you N integers. Find the largest mode of the numbers.
Input Format
The first line contains an integer N. The next line contains N integers.
Output Format
A single line containing one integer: the largest mode.
Input Example
6
1 3 2 4 1 4
Example Output
4
Limits
1 ≤ N ≤100,000
1≤(every integer on the second line)≤1000
#include <iostream>
#include <string>
using namespace std;
#define ll long long
int main() {
unsigned int N;
while(true){
cin >> N;
if(N > 0 && N <= 1000){
break;
}
}
int arr[N];
int input;
for (int k = 0; k < N; k++)
{
cin >> input;
if(input > 0 && input <=1000){
arr[k] = input;
}
else{
k -= 1;
}
}
int number;
int mode;
int position;
int count = 0;
int countMode = 1;
for (int i = 0; i < N; i++)
{
number = arr[i];
for (int j = 0; j < N; j++)
{
if(arr[j] == number){
++count;
}
}
if(count > countMode){
countMode = count;
mode = arr[i];
position = i;
}
else if(count == countMode){
if(arr[i] > arr[position]){
mode = arr[i];
position = i;
}
}
count = 0;
}
cout << mode << endl;
return 0;
}
With this code I got an "RTE" (run time error) and 70 pts.
Here is the code with which I got 80 pts but hit "TLE" (time limit exceeded):
#include <bits/stdc++.h>
using namespace std;
#define ll long long
int main() {
unsigned int N;
while(true){
cin >> N;
if(N > 0 && N <= 100000){
break;
}
}
int arr[N];
int input;
for (int k = 0; k < N; k++)
{
cin >> input;
if(input > 0 && input <=1000){
arr[k] = input;
}
else{
k -= 1;
}
}
int number;
vector<int> mode;
int count = 0;
int countMode = 1;
for (int i = 0; i < N; i++)
{
number = arr[i];
for (int j = 0; j < N; j++)
{
if(arr[j] == number){
++count;
}
}
if(count > countMode){
countMode = count;
mode.clear();
mode.push_back(arr[i]);
}
else if(count == countMode){
mode.push_back(arr[i]);
}
count = 0;
}
sort(mode.begin(), mode.end(), greater<int>());
cout << mode.front() << endl;
return 0;
}
How can I accelerate the program?

As already noted, the algorithm implemented in both of the posted snippets has O(N²) time complexity, while an O(N) alternative exists.
You can also take advantage of some of the algorithms in the Standard Library, like std::max_element, which returns an iterator to the greatest element in the range [first, last). If several elements in the range are equivalent to the greatest element, it returns the iterator to the first such element.
#include <algorithm>
#include <array>
#include <iostream>
int main()
{
constexpr long max_N{ 100'000L };
long N;
if ( !(std::cin >> N) or N < 1 or N > max_N )
{
std::cerr << "Error: Unable to read a valid N.\n";
return 1;
}
constexpr long max_value{ 1'000L };
std::array<long, max_value> counts{};
for (long k = 0; k < N; ++k)
{
long value;
if ( !(std::cin >> value) or value < 1 or value > max_value )
{
std::cerr << "Error: Unable to read value " << k + 1 << ".\n";
return 1;
}
++counts[value - 1];
}
auto const it_max_mode{ std::max_element(counts.crbegin(), counts.crend()) };
// If we start from the last... ^^ ^^
std::cout << std::distance(it_max_mode, counts.crend()) << '\n';
// The first is also the greatest.
return 0;
}
Compiler Explorer demo
I got a "RTE" (run time error)
Consider this fragment of the first snippet:
int number;
int mode;
int position; // <--- Note that it's uninitialized
int count = 0;
int countMode = 1;
for (int i = 0; i < N; i++)
{
number = arr[i];
// [...] Evaluate count.
if(count > countMode){
countMode = count;
mode = arr[i];
position = i; // <--- Here it's assigned a value, but...
}
else if(count == countMode){ // If this happens first...
if(arr[i] > arr[position]){
// ^^^^^^^^^^^^^ Position may be indeterminate, here
mode = arr[i];
position = i;
}
}
count = 0;
}
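The indeterminate read goes away if the current best value and its count are both initialized before the loop. The following is only a minimal sketch of that idea, not the code from the question: largest_mode is a hypothetical helper name, and it assumes every value lies in 1..1000 as the problem limits state.

```cpp
#include <array>
#include <vector>

// Hypothetical helper: returns the largest mode of `values`.
// Assumes every value lies in 1..1000, as the problem limits state.
int largest_mode(const std::vector<int>& values)
{
    std::array<int, 1001> counts{};   // counts[v] = occurrences of v, zero-initialized
    int mode = 0;                     // initialized, so it is never read while indeterminate
    int mode_count = 0;
    for (int v : values)
    {
        const int c = ++counts[v];
        // A higher count wins; on a tie the larger value wins.
        if (c > mode_count || (c == mode_count && v > mode))
        {
            mode = v;
            mode_count = c;
        }
    }
    return mode;
}
```

For the sample input 1 3 2 4 1 4, both 1 and 4 occur twice, and the tie rule picks 4.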
Finally, some resources worth reading:
Why is “using namespace std;” considered bad practice?
Why should I not #include <bits/stdc++.h>?
Using preprocessing directive #define for long long
Why aren't variable-length arrays part of the C++ standard?

You're overcomplicating things. Competitive programming is a weird beast where solutions assume limited resources and wacky amounts of input data. Often those tasks are balanced in such a way that they require constant-time alternative algorithms or sum-on-set dynamic programming. Code size is often taken into consideration. So it's a combination of math, science and dirty programming tricks. It's a game for experts, "brain porn" if you allow me to call it so: it's wrong, it's enjoyable and you're using your brain. It has little in common with production software development.
You know that there can be only 1000 different values, but there is a huge number of repeated instances. All you need is to find the largest one. What's the worst case of finding the maximum value in an array of 1000? O(1000), checking one at a time. And you already need a loop over N to read those values.
Here is an example of dirty competitive code (no input sanitation at all) to solve this problem:
#include <bits/stdc++.h>
using namespace std;
using in = unsigned short;
array<int, 1001> modes;
in biggest;
int big_m;
int N;
int main()
{
cin >> N;
in val;
while(N --> 0){
cin >> val;
if(val < 1001) {
modes[val]++;
}
else
continue;
if( modes[val] == big_m) {
if( val > biggest )
biggest = val;
}
else
if( modes[val] > big_m) {
biggest = val;
big_m = modes[val];
}
}
cout << biggest;
return 0;
}
No for loops if you don't need them, minimalistic identifiers, minimal data to store. Avoid dynamic allocation and minimize automatic creation of objects where possible, since those add execution time. Objects with static storage duration are laid out at compile time and materialized when your executable is loaded.
modes is the array of counters, biggest stores the largest value that has the current maximum count, and big_m is the current maximum count in modes. As they are global variables, they are statically (zero-) initialized.
P.S. The provided example is a stereotypical instance, and I don't guarantee it's a 100% fit for that particular judge or the closed test cases it uses. Some judges use tainted input and other tricks that complicate challengers' lives; there is always a factor of the unknown. E.g. this example would faithfully output "0" if the judge offered that among the input values, even though the value isn't in range.

Related

Binary tree and Processors (C++ Codeforces Problem)

As the title says, I am trying to solve this problem, for which I couldn't find a solution on YouTube or anywhere else.
So here is the problem statement:
Eonathan Eostar decided to learn the magic of multiprocessor systems. He has a full binary tree of tasks with height h. In the beginning, there is only one ready task in the tree — the task in the root. At each moment of time, p processes choose at most p ready tasks and perform them. After that, tasks whose parents were performed become ready for the next moment of time. Once the task becomes ready, it stays ready until it is performed.
You shall calculate the smallest number of time moments the system needs to perform all the tasks.
Input:
The first line of the input contains the number of tests t (1 ≤ t ≤ 5⋅10⁵). Each of the next t lines contains the description of a test. A test is described by two integers h (1 ≤ h ≤ 50) and p (1 ≤ p ≤ 10⁴): the height of the full binary tree and the number of processes. It is guaranteed that all the tests are different.
Output:
For each test output one integer on a separate line — the smallest number of time moments the system needs to perform all the tasks
Example:
input:
3
3 1
3 2
10 6
output:
7
4
173
I am a new C++ learner, so I thought of this way to solve this question:
I count all the nodes (pow(2,height)-1)
For each row I count the available nodes and put an if statement which says: If the available nodes at this row are smaller than the processors number then count++, else while the available nodes are bigger than zero (node_at_m -= m[i])
[node_at_m = Nodes available at this row; m[i] = processors number given in the question]
It gives correct answer for the first 2 cases which is (3 1) and (3 2) but it gives me wrong answer on the third case (10 6), so here is my code:
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
int t, node,nodeatm, count;
cin >> t;
int n[t], m[t];
for (int i = 0; i < t; i++)
{
cin >> n[i];
cin >> m[i];
node = pow(2,n[i])-1;
count = 0;
for(int q = 0; q < n[i]; q++)
{
nodeatm = pow(2,q);
if(nodeatm <= m[i])
{
count++;
}
else
while(nodeatm > 0)
{
nodeatm -= m[i];
count++;
}
}
cout << count << endl;
}
return 0;
}
I am really not a big fan of posting Codeforces questions on here, but I couldn't find any resource for this question on the Internet...
Waiting your answers, thanks.

The problem with the code above is that it handles the case where some tasks remain from the previous level incorrectly. It assumes that all tasks on one level must finish before moving on to the next level.
The following is the corrected code. You can see it working here:
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
int t, node,nodeatm, count;
cin >> t;
int n[t], m[t];
for (int i = 0; i < t; i++)
{
cin >> n[i];
cin >> m[i];
node = pow(2,n[i])-1;
count = 0;
int rem = 0;
for(int q = 0; q < n[i]; q++)
{
nodeatm = pow(2,q) + rem ;
if(nodeatm <= m[i])
{
count++;
rem = 0;
}
else
{
while(nodeatm >= m[i])
{
nodeatm -= m[i];
count++;
}
rem = nodeatm;
}
}
if( rem )
{
count++;
}
cout << count << endl;
}
return 0;
}
The following is a slightly simplified version. You can see it working here:
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
int t;
cin >> t;
for (int i = 0; i < t; i++)
{
int rem = 0, n, m, count = 0;
cin >> n >> m;
for(int q = 0; q < n; q++)
{
int nodeatm = pow(2,q) + rem;
if( nodeatm < m)
{
count++;
rem = 0;
}
else
{
count += ( nodeatm/ m );
rem = ( nodeatm % m );
}
}
if( rem )
count++;
cout << count << endl;
}
return 0;
}
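One caveat on types: h can be as large as 50, so a single level can hold up to 2^49 tasks, which does not fit in a 32-bit int. The sketch below repeats the same greedy loop with 64-bit counters. The function name min_moments is hypothetical, and the logic has only been checked against the three sample tests from the question, not against the judge.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the greedy loop above with 64-bit counters.
// h = height of the full binary tree, p = number of processes.
std::int64_t min_moments(int h, std::int64_t p)
{
    std::int64_t moments = 0;
    std::int64_t rem = 0;                     // tasks carried over from the previous level
    for (int level = 0; level < h; ++level)
    {
        const std::int64_t ready = (std::int64_t{1} << level) + rem;
        if (ready < p)
        {
            ++moments;                        // one moment clears everything that is ready
            rem = 0;
        }
        else
        {
            moments += ready / p;             // full batches of p tasks
            rem = ready % p;                  // leftovers join the next level
        }
    }
    if (rem != 0)
        ++moments;                            // final partial batch
    return moments;
}
```

Calling min_moments(3, 1), min_moments(3, 2) and min_moments(10, 6) gives 7, 4 and 173, matching the example output.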

Function not printing any solutions

So, I need to write a function that returns the chromatic number of a graph. The graph is given as an adjacency matrix that is read from a file with a given name. I have a function that should work in theory and compiles without any issues, yet when I run it, it simply prints an empty line and ends the program.
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;
int Find_Chromatic_Number (vector <vector <int>> matg, int matc[], int n) {
if (n == 0) {
return 0;
}
int result, i, j;
result = 0;
for (i = 0; i < n; i++) {
for (j = i; j < n; j++) {
if (matg[i][j] == 1) {
if (matc[i] == matc[j]) {
matc[j]++;
}
}
}
}
for (i = 0; i < n; i++) {
if (result < matc[i]) {
result = matc[i];
}
}
return result;
}
int main() {
string file;
int n, i, j, m;
cout << "unesite ime datoteke: " << endl;
cin >> file;
ifstream reader;
reader.open(file.c_str());
reader >> n;
vector<vector<int>> matg(n, vector<int>(0));
int matc[n];
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
reader >> matg[i][j];
}
matc[i] = 1;
}
int result = Find_Chromatic_Number(matg, matc, n);
cout << result << endl;
return 0;
}
The program is supposed to use an ifstream reader to convert the file into a 2D vector which represents the adjacency matrix (matg). I also made an array (matc) which holds the value of each vertex, with different numbers corresponding to different colors.
The function should go through the vector, and every time there is an edge between two vertices it should check whether their color values in matc are the same. If they are, it increases the second value (j) by one. After the function has passed through the vector, the matc array should contain n numbers, with the highest number being the chromatic number I am looking for.
I hope I have explained enough of what I am trying to accomplish, if not just ask and I will add any further explanations.

The inner vectors are created with size 0, so reader >> matg[i][j]; writes out of bounds.
Give the outer vector n rows and leave the inner vectors empty:
vector<vector<int> > matg(n);
And instead of using reader >> matg[i][j];
use:
int tmp;
reader >> tmp;
matg[i].push_back(tmp);

Splitting an array at a given value

Hello, I am trying to split an array every time there is a negative value (excluding the negative value itself) and am a bit stuck at the moment. I tried the approach shown in my code, but I am not getting the desired output.
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
using namespace std;
int main()
{
string line;
string filename;
int n,length;
std::vector<int>arr1;
fstream file("t1.txt");
if(file.is_open())
{
while(file >> n)
arr1.push_back(n);
for(int i =0; i < (int)arr1.size(); i++)
cout << arr1.at(i);
}
cout << endl;
int* arr2 = &arr1[0];
int arr3[arr1.size()/2];
int arr4[arr1.size()/2];
for(int i = 0; i < arr1.size(); i++)
{
cout << arr2[i];
}
for (int i =0; i < arr1.size(); i++)
{
if(i == -1)
break;
else
arr3[i] = arr2[i];
}
return 0;
}

The main problem is here:
int arr3[arr1.size()/2];
int arr4[arr1.size()/2];
This is not standard C++ (variable-length arrays are a compiler extension and are rejected by some compilers), and can be replaced with
std::vector<int> arr3; arr3.reserve(arr1.size() / 2);
std::vector<int> arr4; arr4.reserve(arr1.size() / 2);
I've added the "reserve" call so that the program doesn't have to reallocate memory over and over in the loop.
Next, you are checking i in your loop, but i runs from 0 to arr1.size(), so it will never be negative.
What you really want to check is the value stored in the arr1 vector at position i, which you can do with the [] operator:
for (int i =0; i < arr1.size(); i++)
{
if (arr1[i] >= 0) //if the value is positive, we push it inside our arr3 vector
arr3.push_back(arr1[i]);
else
{
i++; //skip negative value
//
while (i < arr1.size())
{
if (arr1[i] > 0)
arr4.push_back(arr1[i]);
i++;
}
//
//or
//insert all the elemenents we haven't processed yet in the arr4 vector
//this code assumes those elements are positive values
//arr4.insert(arr4.begin(), arr1.begin() + i, arr1.end());
//break;
}
}
Of course this could be done in a different way, like instead of creating 2 vectors, you could just use the one you have generated already.
Hope this helps.

There are several problems in your code:
you should not access the vector's data this way unless you really need to
you prepare arrays with predefined size without knowing where to expect the negative values
you do not assign anything to your array 4
you check the index for being negative, not the value
according to your text there could be several negative values leading to multiple result-arrays. You seem to be prepared for only two.
Here is some code that actually splits when encountering negative values:
std::vector<vector<int> > splitted;
for (int i = 0; i < arr1.size(); ++i)
{
if (i ==0 or arr1[i] < 0)
splitted.push_back(std::vector<int>());
if (arr1[i] >= 0)
splitted.back().push_back(arr1[i]);
}
Testing it:
for (int i = 0; i < splitted.size(); ++i)
{
for (int k = 0; k < splitted[i].size(); ++k)
{
std::cout << splitted[i][k];
}
if (splitted[i].empty())
std::cout << "(empty)";
std::cout << '\n';
}
Using the following test input:
1 2 3 -1 1 -1 -1
You get the following output:
123
1
(empty)
(empty)
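The same splitting idea can be wrapped in a function so it is easy to test. This is only a sketch: split_at_negatives is a hypothetical name, and unlike the loop above it always starts with one part, so a leading negative value yields an empty first part.

```cpp
#include <vector>

// Hypothetical helper: splits `input` at every negative value
// and drops the negative values themselves.
std::vector<std::vector<int>> split_at_negatives(const std::vector<int>& input)
{
    std::vector<std::vector<int>> parts(1);   // start with one (possibly empty) part
    for (int v : input)
    {
        if (v < 0)
            parts.emplace_back();             // a negative value starts a new part
        else
            parts.back().push_back(v);        // non-negative values join the current part
    }
    return parts;
}
```

For the input 1 2 3 -1 1 -1 -1 this returns four parts: {1, 2, 3}, {1}, and two empty ones.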

Exceeding memory limit or time limit

I'm attempting to create a program that multiplies the values at three distinct positions of a vector ('V1') and finds the maximum product.
I'm using 3 'for' loops to enumerate the products. The program reads the number of positions 'N', then all 'N' numbers, from 'input.txt'. After that, it finds the greatest product 'max' and writes it to 'output.txt'.
But I need to keep the program as efficient as possible: there is a 16 MB memory limit and a 1 second time limit (I get 1.004 seconds and 33 MB). Is there a more efficient way to do this?
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
using namespace std;
int main()
{
int N;
long long max = -9223372036854775807;
int input;
vector<long long> V1;
ifstream file1;
file1.open("input.txt");
file1 >> N;
V1.resize(N);
for (int i = 0; i < N; i++)
{
file1 >> V1[i];
}
for (int i = 0; i < N; i++)
for (int j = 0; j < i; j++)
for (int k = 0; k < j; k++)
if (V1[i] * V1[j] * V1[k] > max)
{
max = V1[i] * V1[j] * V1[k];
}
ofstream file2;
file2.open("output.txt");
file2 << max;
file2.close();
}
File input.txt:
5
10 10 10 -300 -300

From what you have done, it looks like you have to find the greatest product of three numbers in a given input vector.
Just sort vector V1 and take the larger of two products: the first three elements, or the first element and the last two. This is efficient in both space and time.
Like this:
sort(V1.begin(), V1.end(), greater<long long>()); // sorts in descending order
size_t n = V1.size() - 1; // index of the smallest element
long long best = max(V1[0] * V1[1] * V1[2], V1[0] * V1[n] * V1[n - 1]); // value to output
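As a self-contained sketch of the sorting approach: the helper name max_product_sorted is hypothetical, and it assumes the vector holds at least three values and that the products fit in a long long.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper; assumes v.size() >= 3 and that the products fit in long long.
long long max_product_sorted(std::vector<long long> v)
{
    std::sort(v.begin(), v.end(), std::greater<long long>());  // descending order
    const std::size_t n = v.size() - 1;                        // index of the smallest value
    return std::max(v[0] * v[1] * v[2],                        // three largest values
                    v[0] * v[n] * v[n - 1]);                   // largest times the two smallest
}
```

For the input 10 10 10 -300 -300 the candidates are 1000 and 900000, so the result is 900000.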

The first thing that comes to mind: why do you store these values? You only need the single maximum value, so there is no need to store all the products, push them, and, moreover, sort them.
Other important notes:
You have a vector of long long, but you read ints. Since you have big numbers in your input, use long long everywhere.
Pushing an item and popping it back is pointless: check it before pushing to avoid two unnecessary operations.
Anyway, you don't need to compare i, j, k for equality at all: they are never equal given your loop bounds.
Pushing items to a vector one by one when you know their number in advance is wasteful, because growing a vector takes extra time. Resize it to the given size instead.
This code will probably meet your memory/time requirements:
int N;
long long maximum = -9223372036854775807; // Subject to limits.h LLONG_MIN usage
vector<long long> V1;
ifstream file1;
file1.open("input.txt");
file1 >> N;
V1.resize(N);
for (int i = 0; i < N; i++){
file1 >> V1[i];
}
file1.close();
for (int i = 0; i < N; i++)
for (int j = 0; j < i; j++)
for (int k = 0; k < j; k++)
if (V1[i] * V1[j] * V1[k] > maximum)
maximum = V1[i] * V1[j] * V1[k];
ofstream file2;
file2.open("output.txt");
file2 << maximum;
file2.close();

Well, as soon as I see size and time reduction, I tend to remove all unnecessary language goodies, because they help with proper programming but come at a resource expense.
So if you really wanted to keep all products of different indices of a list of values, I would advise you to throw away vectors, push and pop, and use fixed-size arrays.
But before that low-level optimisation, we must think of all possible algorithmic optimisations. You only want the biggest product among all products of three different values taken from a list. But for positive numbers, a >= b <=> a * c >= b * c, and the product of two negative numbers is positive.
So the highest product may only come from:
the product of the three highest positive values
the product of the highest positive value and the two lowest negative values (highest in absolute value)
the product of the three highest negative values if there are no positive values
So you do not even need to load the full initial vector but just keep:
The three highest positive values
The three highest negative values
The two lowest negative values
You get them by updating them at read time in O(n) time, and you only store eight values. If you only have five values, it is not efficient at all, but it will be linear in time and constant in size whatever number of values you process.
Possible implementation:
#include <iostream>
#include <fstream>
#include <climits>
using namespace std;
class Max3 {
long long pmax[3];
long long nmax[3];
long long nmin[2];
void push(long long *record, long long val, size_t pos) {
for(size_t i=0; i<pos; i++) {
record[i] = record[i + 1];
}
record[pos] = val;
}
void set(long long *record, long long val, size_t sz) {
for (size_t i=1; i<sz; i++) {
if (val < record[i]) {
push(record, val, i - 1);
return;
}
}
push(record, val, sz - 1);
}
public:
Max3() {
size_t i;
for (i=0; i<sizeof(pmax)/sizeof(pmax[0]); i++)
pmax[i] = 0;
for (i=0; i<sizeof(nmin)/sizeof(nmin[0]); i++)
nmin[i] = 0;
for (i=0; i<sizeof(nmax)/sizeof(nmax[0]); i++)
nmax[i] = LLONG_MIN;
}
void test(long long val) {
if (val >= *pmax) {
set(pmax, val, 3);
}
else if (val <= 0) {
if (val <= *nmin) {
set(nmin, -val, 2);
}
if (val >= *nmax) {
set(nmax, val, 3);
}
}
}
long long getMax() {
long long max = 0, prod, pm;
if ((prod = pmax[0] * pmax[1] * pmax[2]) > max)
max = prod;
if (pmax[2] > 0)
pm = pmax[2];
else if (pmax[1] > 0)
pm = pmax[1];
else
pm = pmax[0];
if ((prod = nmin[0] * nmin[1] * pm) > max)
max = prod;
if ((prod = nmax[0] * nmax[1] * nmax[2]) > max)
max = prod;
return max;
}
};
int main() {
int N;
long long input;
Max3 m3;
ifstream file1;
file1.open("input.txt");
file1 >> N;
for (int i = 0; i < N; i++){
file1 >> input;
m3.test(input);
}
file1.close();
ofstream file2;
file2.open("output.txt");
file2 << m3.getMax();
file2.close();
return 0;
}
The code is slightly more complex, but the program size is only 35 KB, with little dynamic allocation.
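A more compact variant of the same linear-time idea tracks only the three largest and the two smallest values, since the best product is either the three largest values or the largest value times the two smallest. This is a sketch, not the class above: max_product_linear is a hypothetical name, it assumes at least three values and products that fit in a long long, and it takes a vector only to keep the example easy to test.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical helper; assumes at least three values and products that fit in long long.
long long max_product_linear(const std::vector<long long>& v)
{
    long long hi1 = std::numeric_limits<long long>::min(), hi2 = hi1, hi3 = hi1; // three largest
    long long lo1 = std::numeric_limits<long long>::max(), lo2 = lo1;            // two smallest
    for (long long x : v)
    {
        // Keep hi1 >= hi2 >= hi3 as the three largest values seen so far.
        if (x > hi1)      { hi3 = hi2; hi2 = hi1; hi1 = x; }
        else if (x > hi2) { hi3 = hi2; hi2 = x; }
        else if (x > hi3) { hi3 = x; }

        // Keep lo1 <= lo2 as the two smallest values seen so far.
        if (x < lo1)      { lo2 = lo1; lo1 = x; }
        else if (x < lo2) { lo2 = x; }
    }
    return std::max(hi1 * hi2 * hi3, hi1 * lo1 * lo2);
}
```

The loop body can run directly on each value as it is read from the file, in which case the vector is not needed and memory use stays constant.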

After replacing the 'for' loops with a sort of vector 'V1' (in descending order), the program compares the products 'V1[0] * V1[1] * V1[2]' and 'V1[0] * V1[N] * V1[N - 1]', and then writes the maximum to the file output.txt:
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <functional>
using namespace std;
int main()
{
int N;
long long max = -9223372036854775807;
int input;
vector<long long> V1;
ifstream file1;
file1.open("input.txt");
file1 >> N;
V1.resize(N);
for (int i = 0; i < N; i++){
file1 >> V1[i];
}
sort(V1.begin(), V1.end(), greater<long long>()); // compare as long long, matching the element type
N -= 1;
max = V1[0] * V1[1] * V1[2];
if (max < V1[0] * V1[N] * V1[N - 1])
max = V1[0] * V1[N] * V1[N - 1];
ofstream file2;
file2.open("output.txt");
file2 << max;
file2.close();
}

Stack Corruption error around the int array in c++

I am trying to write some c++ code that is a demo for a formula but using Recursion.
Here is my program and the error it throws.
Environment - Visual Studio 2012
Compilation - Successful
Runtime Exception -
Run-Time Check Failure #2 - Stack around the variable 'inputNumbers' was corrupted.
Code -
#include <stdlib.h>
#include <iostream>
using namespace std;
int FindNumber(int Numbers[],int index,int sum, int count)
{
if(count == 0)
return sum;
else if (count == 1)
{
sum -= Numbers[index-1];
index = index -1;
count = count-1;
return sum = FindNumber(Numbers,index,sum,count);
}
else
{
sum += Numbers[index-1];
index = index -1;
count = count-1;
return sum = FindNumber(Numbers,index,sum,count);
}
}
void main()
{
int inputNumbers[50]; //declare the series of numbers
int cnt = 0; //define and initailize an index counter for inserting the values in number series.
int position = 7; //defines the position of the number in the series whose value we want to find.
// insert the number series values in the int array.
for (int i = 1; i < 51; i++)
{
inputNumbers[cnt] = i;
cnt++;
inputNumbers[cnt] = i;
cnt++;
}
cnt=0;
for (int i = 1; i < 51; i++)
{
cout<<inputNumbers[cnt]<<endl;
cnt++;
cout<<inputNumbers[cnt]<<endl;
cnt++;
}
// set another counter variable to 3 since formula suggests that we need to substrat 3 times from the nth position
// Formula : nth = (n-1)th + (n-2)th - (n-3)th
cnt = 3;
int FoundNumber = 0;
//Check if position to be found is greater than 3.
if(position>3)
{
FoundNumber = FindNumber(inputNumbers,position,FoundNumber,cnt);
cout<< "The number found is : " << FoundNumber<< endl;
}
else
{
cout<<"This program is only applicable for finding numbers of a position value greater than 3..."<<endl;
}
}
The entire program works as I expect and gives the proper output when I debug it, but it throws an exception while exiting main() after execution is complete.
I see I am making a really silly but intricate memory management mistake (and cannot find it).
Any help is appreciated.

Aren't you filling twice the size of the array here?
for (int i = 1; i < 51; i++)
{
inputNumbers[cnt] = i;
cnt++;
inputNumbers[cnt] = i;
cnt++;
}

For an array of length 50 you cannot access beyond element 49; so code should be like:
int inputNumbers[50]; //declare the series of numbers
int cnt = 0; //define and initailize an index counter for inserting the values in number series.
// insert the number series values in the int array.
for (int i = 0; i < 50; i++)
{
inputNumbers[cnt] = i;
cnt++;
}
And indeed as in the previous answer you probably want to increment cnt only once.
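If the doubled series 1, 1, 2, 2, ... is actually what was intended, one option is to keep the 50-element array and write two slots per step, stopping after 25 values. This is a sketch under that assumption. The helper name build_series is hypothetical, and std::array is used here instead of the raw array from the question.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: builds the series 1, 1, 2, 2, ..., 25, 25.
std::array<int, 50> build_series()
{
    std::array<int, 50> numbers{};
    // Two slots are written per step, so the index advances by 2
    // and the last write lands on index 49, the final valid element.
    for (std::size_t i = 0; i + 1 < numbers.size(); i += 2)
    {
        const int value = static_cast<int>(i / 2) + 1;
        numbers[i] = value;
        numbers[i + 1] = value;
    }
    return numbers;
}
```

Because the loop bound is derived from numbers.size(), changing the array length cannot push the writes past the end.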
