Hi,
I am participating in a programming contest. My algorithm works fine with up to 5000 sets.
Each set consists of three integers.
But when I enter 300 000 sets of numbers, it takes too long.
Running time limit: 14s.
Fetching data: 576s. (Way too long)
My input is formatted as:
300000
a b c
300000 - number of sets
a, b, c - elements of the set
My algorithm (don't judge the code):
#include <iostream>
using namespace std;
int min_replacements(int n, int *ds, int *ps, int *rs);
int max(int a, int b, int c);
bool ot(int a, int b, int c);
bool ooo(int a, int b, int c);
bool to(int a, int b, int c);
int main()
{
int n = 0;
cin >> n;
int *ds, *ps, *rs;
ds = new int[n];
ps = new int[n];
rs = new int[n];
int d{}, p{}, r{};
for (int i = 0; i < n; i++)
{
scanf("%d %d %d", &ds[i], &ps[i], &rs[i]);
printf("%d", i);
}
int t = min_replacements(n, ds, ps, rs);
printf("%d\n", t);
delete[] ds;
delete[] ps;
delete[] rs;
}
bool ot(int a, int b, int c)
{
return (a != 0 && b == 0 && c == 0);
}
bool ooo(int a, int b, int c)
{
return (a == 0 && b != 0 && c == 0);
}
bool to(int a, int b, int c)
{
return (a == 0 && b == 0 && c != 0);
}
int max(int a, int b, int c)
{
int m = 0;
if (a == b && c < a)
{
m = a;
}
if (b == c && a < b)
{
m = b;
}
if (a == c && b < c)
{
m = c;
}
if (b < a && c < a)
{
m = a;
}
if (a < b && c < b)
{
m = b;
}
if (a < c && b < c)
{
m = c;
}
if (a == b && b == c)
{
m = a;
}
return m;
}
int min_replacements(int n, int *ds, int *ps, int *rs)
{
int t = 0;
if (ds[0] == ps[0] && ps[0] == rs[0] && ds[0] == rs[0])
{
return (n + ps[0]) * rs[0];
}
bool loop = true;
while (loop)
{
for (int i = 0; i < n - 1; ++i)
{
if (ot(*(ds + i), *(ps + i), *(rs + i)) || ooo(*(ds + i), *(ps + i), *(rs + i)) || to(*(ds + i), *(ps + i), *(rs + i)))
{
continue;
}
int m = max(*(ds + i), *(ps + i), *(rs + i));
if (m == *(ds + i))
{
*(ps + i + 1) += *(ps + i);
*(rs + i + 1) += *(rs + i);
*(ps + i) = *(rs + i) = 0;
t += 2;
}
if (m == *(ps + i))
{
*(ds + i + 1) += *(ds + i);
*(rs + i + 1) += *(rs + i);
*(ds + i) = *(rs + i) = 0;
t += 2;
}
if (m == *(rs + i))
{
*(ds + i + 1) += *(ds + i);
*(ps + i + 1) += *(ps + i);
*(ps + i) = *(ds + i) = 0;
t += 2;
}
}
for (int i = 0; i < n; ++i)
{
if (ot(*(ds + i), *(ps + i), *(rs + i)) || ooo(*(ds + i), *(ps + i), *(rs + i)) || to(*(ds + i), *(ps + i), *(rs + i)))
{
loop = false;
}
else
{
loop = true;
}
}
if (loop)
{
*ds += *(ds + n - 1);
*ps += *(ps + n - 1);
*rs += *(rs + n - 1);
*(ds + n - 1) = *(ps + n - 1) = *(rs + n - 1) = 0;
t -= 2;
}
}
if (t == 0)
return 0;
return t + 1;
}
I used cin in this algorithm.
Can you help me? Thank you so much.
How do you know the std::cin part is the problem? Did you profile your code? If not, I suggest doing that; it's often surprising which part of the code takes up most of the time. See e.g. "How can I profile C++ code running on Linux?".
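Short of a full profiler, a rough first check is to time each phase of the program separately. The following is only a minimal sketch: seconds_taken and sum_up_to are hypothetical names introduced for illustration, and sum_up_to is just a stand-in workload so that the example is self-contained.

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double seconds_taken(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Stand-in workload (an assumption for this sketch): sums 0..count-1.
long long sum_up_to(int count) {
    std::vector<int> values(count);
    for (int i = 0; i < count; ++i) values[i] = i;
    long long total = 0;
    for (int v : values) total += v;
    return total;
}
```

In the program from the question, one would wrap the reading loop in one seconds_taken call and the call to min_replacements in another, then print both numbers to see which phase dominates.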
You're doing a lot of unnecessary work in various parts of the code. For example, your max function does at least 7 comparisons, and looks extremely error-prone to write. You could simply replace the whole function by:
std::max({ a, b, c })
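For completeness, here is a small sketch of how that call could be wrapped. The initializer-list overload of std::max lives in <algorithm>; the wrapper name max_of_three is hypothetical and only used here for illustration.

```cpp
#include <algorithm>

// Largest of three values, using the initializer-list overload of std::max.
// `max_of_three` is a hypothetical name, not part of the question's code.
int max_of_three(int a, int b, int c) {
    return std::max({a, b, c});
}
```

Ties need no special handling: when two or three arguments are equal, that shared value is returned.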
I would also take a look at your min_replacements function and see if it can be simplified. Unfortunately, the variable names you're using are very vague, so it's pretty much impossible to understand what the code is supposed to do. I suggest using much more descriptive variable names. That way the code will become much easier to reason about. The way it's currently written, there's a very good chance even you yourself won't be able to make sense of it in a month's time.
Just glancing over the min_replacements function though, there's definitely a lot more work going on than necessary. E.g. the last for-loop:
for (int i = 0; i < n; ++i)
{
if (ot(*(ds + i), *(ps + i), *(rs + i)) || ooo(*(ds + i), *(ps + i), *(rs + i)) || to(*(ds + i), *(ps + i), *(rs + i)))
{
loop = false;
}
else
{
loop = true;
}
}
Each loop iteration overwrites the loop variable, so only the value set in the last iteration survives. Assuming this code is correct, you don't need the loop at all; just do the check once for i = n - 1. That alone changes O(n) into O(1).
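To make that concrete, here is a sketch comparing the full scan with the single check. It assumes that ot(...) || ooo(...) || to(...) from the question amounts to "exactly one of the three values is non-zero"; the function names below are hypothetical, and std::vector is used instead of raw pointers only to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

// True when exactly one of the three values is non-zero, which is what
// ot(...) || ooo(...) || to(...) tests in the question.
bool exactly_one_nonzero(int a, int b, int c) {
    return (a != 0) + (b != 0) + (c != 0) == 1;
}

// The question's loop: `loop` is overwritten on every iteration, O(n).
bool loop_flag_full_scan(const std::vector<int>& ds,
                         const std::vector<int>& ps,
                         const std::vector<int>& rs) {
    bool loop = true;
    for (std::size_t i = 0; i < ds.size(); ++i) {
        loop = !exactly_one_nonzero(ds[i], ps[i], rs[i]);
    }
    return loop;
}

// Same result from looking at the last element only, O(1).
bool loop_flag_last_only(const std::vector<int>& ds,
                         const std::vector<int>& ps,
                         const std::vector<int>& rs) {
    if (ds.empty()) return true;  // matches the full scan on empty input
    const std::size_t last = ds.size() - 1;
    return !exactly_one_nonzero(ds[last], ps[last], rs[last]);
}
```

If the intent was actually "stop only when every set has exactly one non-zero value", then the original loop is a bug rather than merely slow, and the flag should be cleared as soon as one set fails the test.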
The Karatsuba multiplication algorithm implementation does not output any result and exits with code=3221225725.
Here is the message displayed on the terminal:
[Running] cd "d:\algorithms_cpp\" && g++ karatsube_mul.cpp -o karatsube_mul && "d:\algorithms_cpp\"karatsube_mul
[Done] exited with code=3221225725 in 1.941 seconds
Here is the code:
#include <bits/stdc++.h>
using namespace std;
string kara_mul(string n, string m)
{
int len_n = n.size();
int len_m = m.size();
if (len_n == 1 && len_m == 1)
{
return to_string((stol(n) * stol(m)));
}
string a = n.substr(0, len_n / 2);
string b = n.substr(len_n / 2);
string c = m.substr(0, len_m / 2);
string d = m.substr(len_m / 2);
string p1 = kara_mul(a, c);
string p2 = kara_mul(b, d);
string p3 = to_string((stol(kara_mul(a + b, c + d)) - stol(p1) - stol(p2)));
return to_string((stol(p1 + string(len_n, '0')) + stol(p2) + stol(p3 + string(len_n / 2, '0'))));
}
int main()
{
cout << kara_mul("15", "12") << "\n";
return 0;
}
And after fixing this I would also like to know how to multiply two 664 digit integers using this technique.
There are several issues:
The exception you got is caused by infinite recursion at this call:
kara_mul(a + b, c + d)
As these variables are strings, the + is a string concatenation. This means these arguments evaluate to n and m, which were the arguments to the current execution of the function.
The correct algorithm would perform a numerical addition here, for which you need to provide an implementation (adding two string representations of potentially very long integers).
if (len_n == 1 && len_m == 1) detects the base case, but the base case should kick in when either of these sizes is 1, not necessarily both. So this should be an || operator, or should be written as two separate if statements.
The input strings should be split such that b and d are equal in size. This is not what your code does. Note how the Wikipedia article stresses this point:
The second argument of the split_at function specifies the number of digits to extract from the right
stol should never be called on strings that could potentially be too long for conversion to long. So for example, stol(p1) is not safe, as p1 could have 20 or more digits.
As a consequence of the previous point, you'll need to implement functions that add or subtract two string representations of numbers, and also one that can multiply a string representation with a single digit (the base case).
Here is an implementation that corrects these issues:
#include <algorithm>
#include <iostream>
#include <stdexcept>
#include <string>
int digit(std::string n, int i) {
return i >= n.size() ? 0 : n[n.size() - i - 1] - '0';
}
std::string add(std::string n, std::string m) {
int len = std::max(n.size(), m.size());
std::string result;
int carry = 0;
for (int i = 0; i < len; i++) {
int sum = digit(n, i) + digit(m, i) + carry;
result += (char) (sum % 10 + '0');
carry = sum >= 10;
}
if (carry) result += '1';
std::reverse(result.begin(), result.end());
return result;
}
std::string subtract(std::string n, std::string m) {
int len = n.size();
if (m.size() > len) throw std::invalid_argument("subtraction overflow");
if (n == m) return "0";
std::string result;
int carry = 0;
for (int i = 0; i < len; i++) {
int diff = digit(n, i) - digit(m, i) - carry;
carry = diff < 0;
result += (char) (diff + carry * 10 + '0');
}
if (carry) throw std::invalid_argument("subtraction overflow");
result.erase(result.find_last_not_of('0') + 1);
std::reverse(result.begin(), result.end());
return result;
}
std::string simple_mul(std::string n, int coefficient) {
if (coefficient < 2) return coefficient ? n : "0";
std::string result = simple_mul(add(n, n), coefficient / 2);
return coefficient % 2 ? add(result, n) : result;
}
std::string kara_mul(std::string n, std::string m) {
int len_n = n.size();
int len_m = m.size();
if (len_n == 1) return simple_mul(m, digit(n, 0));
if (len_m == 1) return simple_mul(n, digit(m, 0));
int len_min2 = std::min(len_n, len_m) / 2;
std::string a = n.substr(0, len_n - len_min2);
std::string b = n.substr(len_n - len_min2);
std::string c = m.substr(0, len_m - len_min2);
std::string d = m.substr(len_m - len_min2);
std::string p1 = kara_mul(a, c);
std::string p2 = kara_mul(b, d);
std::string p3 = subtract(kara_mul(add(a, b), add(c, d)), add(p1, p2));
return add(add(p1 + std::string(len_min2*2, '0'), p2), p3 + std::string(len_min2, '0'));
}
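For inputs as long as the 664-digit numbers mentioned in the question, the result cannot be checked with built-in integer types. One option is to compare kara_mul against a straightforward schoolbook multiplication on strings. The following is only a sketch of such a reference: schoolbook_mul is a hypothetical helper name, and it assumes both inputs are non-empty strings of decimal digits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference helper: multiplies two non-negative decimal
// strings with the O(n*m) schoolbook method.
std::string schoolbook_mul(const std::string& n, const std::string& m) {
    // digits[k] collects the coefficient of 10^k (least significant first).
    std::vector<int> digits(n.size() + m.size(), 0);
    for (std::size_t i = 0; i < n.size(); ++i) {
        for (std::size_t j = 0; j < m.size(); ++j) {
            int a = n[n.size() - 1 - i] - '0';
            int b = m[m.size() - 1 - j] - '0';
            digits[i + j] += a * b;
        }
    }
    // Propagate carries so every entry ends up in 0..9.
    int carry = 0;
    for (std::size_t k = 0; k < digits.size(); ++k) {
        int total = digits[k] + carry;
        digits[k] = total % 10;
        carry = total / 10;
    }
    // Drop leading zeros but keep at least one digit.
    std::size_t top = digits.size();
    while (top > 1 && digits[top - 1] == 0) --top;
    std::string result;
    for (std::size_t k = top; k > 0; --k) {
        result += static_cast<char>('0' + digits[k - 1]);
    }
    return result;
}
```

Both functions take the numbers as strings, so a check would pass the same two digit strings to kara_mul and to schoolbook_mul and compare the returned strings.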
I'm solving an algorithm problem: https://codeforces.com/contest/1671/problem/E. Although my submission passes the tests provided by the contest, it fails on a specific hack test. While trying to find the error, I noticed that if I choose "start debugging", the program runs perfectly. However, when I click "run", it gives a wrong answer. So I'm curious about what is happening.
#include<iostream>
#include <algorithm>
#include <cstring>
using namespace std;
int const NN = 1e6;
int const MOD = 998244353;
char str[NN];
long long dfs_data[NN];
int powans[20];
string myhash[NN];
int n;
long long dfs(int num) {
if (dfs_data[num] != 0) return dfs_data[num];
if (num >= powans[n - 1] - 1) {
dfs_data[num] = 1;
return 1;
}
if (myhash[num * 2 + 1] == myhash[num * 2 + 2]) {
dfs_data[num] = (dfs(num * 2 + 1) % MOD) * (dfs(num * 2 + 2) % MOD) % MOD;
} else dfs_data[num] = 2 * (dfs(num * 2 + 1) % MOD) * (dfs(num * 2 + 2) % MOD) % MOD;
return dfs_data[num];
}
void gethashcode(int t) {
if (t >= powans[n - 1] - 1) {
myhash[t] += str[t];
return;
}
if (myhash[2 * t + 1] == "") gethashcode(2 * t + 1);
if (myhash[2 * t + 2] == "") gethashcode(2 * t + 2);
if (myhash[2 * t + 1] < myhash[2 * t + 2]) myhash[t] = str[t] + myhash[2 * t + 1] + myhash[2 * t + 2];
else myhash[t] = str[t] + myhash[2 * t + 2] + myhash[2 * t + 1];
}
void solve() {
memset(dfs_data, 0, sizeof dfs_data);
cin >> n;
cin >> str;
powans[0] = 1;
for (int i = 1; i < 19; i++) {
powans[i] = 2 * powans[i - 1];
}
for (int i = 0; i < NN; i++) {
myhash[i] = "";
}
gethashcode(0);
cout << dfs(0);
}
int main() {
solve();
}
The warning message says that the writable size of the help array is (R-L+1)*4 bytes, but 8 bytes may be written. What does this mean? Is the array being accessed out of bounds? I think the code is logically correct. The specific code is as follows:
void merge(int a[], int L, int M, int R) {
int* help = new int[R - L + 1];
int i = 0;
int p = L;
int q = M + 1;
while (p <= M && q <= R) {
help[i++] = a[p] <= a[q] ? a[p++] : a[q++];
}
while (p <= M) {
help[i++] = a[i++];
}
while (q <= R) {
help[i++] = a[i++];
}
for (i = 0; i < R - L + 1; i++) {
a[L + i] = help[i];
}
}
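One thing that stands out in the code above: the two tail loops copy from a[i++] instead of a[p++] and a[q++]. That increments i twice per iteration and never advances p or q, so the loop conditions never become false and i eventually runs past the end of help, which is presumably what the warning is pointing at. A sketch of a corrected version follows; replacing new[] with std::vector is my own choice here (the original never frees help), not something the warning requires.

```cpp
#include <vector>

// Merges the sorted ranges a[L..M] and a[M+1..R] (inclusive bounds) in place.
void merge(int a[], int L, int M, int R) {
    std::vector<int> help(R - L + 1);  // released automatically on return
    int i = 0;
    int p = L;
    int q = M + 1;
    while (p <= M && q <= R) {
        help[i++] = a[p] <= a[q] ? a[p++] : a[q++];
    }
    while (p <= M) {
        help[i++] = a[p++];  // copy the rest of the left half
    }
    while (q <= R) {
        help[i++] = a[q++];  // copy the rest of the right half
    }
    for (i = 0; i < R - L + 1; i++) {
        a[L + i] = help[i];
    }
}
```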
I've been struggling to understand how the function long long number here works. The bit that I can't fully grasp is the for loops inside the ifs. Why, when we have a number in decimal, do we have to raise it to that power? Shouldn't we just sum the digits up and leave it at that? Also, why do we raise the other numbers to that power?
Here is the code:
int counter(long long n, int k) {
int counter = 0;
while (n != 0) {
counter++;
n /= k;
}
return counter;
}
int number2(long long n, int number) {
return (n / (long long) pow(10, number)) % 10;
}
int toDecimal(long long n, int k) {
long long decimal = 0;
for (int i = 0; i < counter(n, 10); i++) {
decimal += number2(n, i)*(int)pow(k, i);
}
return decimal;
}
long long number(char *arr, int start) {
int end = start;
long long number2 = 0;
while (*(arr + end) != ' ' && *(arr + end) != '\0') {
end++;
}
int numberSize = end - start;
if (*(arr + start) != '0') {
for (int i = 0; i < numberSize; i++) {
number2 += (*(arr + start + i) - '0')*pow(10, numberSize - i - 1);
}
return number2;
}
if (*(arr + start) == '0' && (*(arr + start + 1) != 'b' && *(arr + start + 1) != 'x')) {
for (int i = 1; i < numberSize; i++) {
number2 += (*(arr + start + i) - '0')*pow(10, numberSize - i - 1);
}
return toDecimal(number2, 8);
}
if (*(arr + start) == '0' && *(arr + start + 1) == 'b') {
for (int i = 2; i < numberSize; i++) {
number2 += (*(arr + start + i) - '0')*pow(10, numberSize - i - 1);
}
return toDecimal(number2, 2);
}
if (*(arr + start) == '0' && *(arr + start + 1) == 'x') {
int *hex = new int[numberSize - 2];
for (int i = 2; i < numberSize; i++) {
if (*(arr + start + i) >= '0'&&
*(arr + start + i) <= '9')
arr[i - 2] = (*(arr + start + i) - '0');
if (*(arr + start + i) >= 'A'&&
*(arr + start + i) <= 'F')
arr[i - 2] = (int)(*(arr + start + i) - '7');
number2 += arr[i - 2] * pow(16, numberSize - i - 1);
}
delete[] hex;
return number2;
}
}
int main() {
char first[1000];
cin.getline(first, 1000);
int size = strlen(first);
long numberr = number(&first[0], 0);
for (int counter = 0; counter < size; counter++) {
if (first[counter] == ' '&&first[counter + 1] == '+') {
numberr += number(&first[0], counter + 3);
}
}
cout << numberr << "\n";
return 0;
}
The number is given as a string, i.e. a sequence of single characters representing digits. You have to convert each character to its numeric value ("1" --> 1) and then multiply it by the right power of ten to move it to the right place. For example: "123" --> (1 * 10^2) + (2 * 10^1) + (3 * 10^0)
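The same positional sum can be computed without pow by multiplying the running value by the base before adding each digit (Horner's scheme). A minimal sketch, assuming the string contains only digits that are valid for the given base and that the base is at most 10; parse_digits is a hypothetical name, not a function from the question:

```cpp
#include <string>

// Hypothetical helper: interprets `digits` as a number written in `base`.
// Assumes every character is a digit '0'..'9' smaller than `base`.
long long parse_digits(const std::string& digits, int base) {
    long long value = 0;
    for (char ch : digits) {
        // Shift what we have one position to the left, then add the new digit.
        value = value * base + (ch - '0');
    }
    return value;
}
```

For "123" in base 10 this computes ((1 * 10) + 2) * 10 + 3, which expands to the same 1 * 10^2 + 2 * 10^1 + 3 * 10^0 as above. It also avoids the floating-point rounding that pow can introduce.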
I am trying to make a fraction calculator that calculates on a CUDA device. Below is first the sequential version and then my attempt at a parallel version.
It runs without error, but for some reason it does not give the result back. I have been trying to get this to work for 2 weeks now, but can't find the error!
Serialized version
int f(int x, int c, int n);
int gcd(unsigned int u, unsigned int v);
int main ()
{
clock_t start = clock();
srand ( time(NULL) );
int x = 1;
int y = 2;
int d = 1;
int c = rand() % 100;
int n = 323;
if(n % y == 0)
d = y;
while(d == 1)
{
x = f(x, c, n);
y = f(f(y, c, n), c, n);
int abs = x - y;
if(abs < 0)
abs = abs * -1;
d = gcd(abs, n);
if(d == n)
{
printf("\nd == n");
c = 0;
while(c == 0 || c == -2)
c = rand() % 100;
x = 2;
y = 2;
}
}
int d2 = n/d;
printf("\nTime elapsed: %f", ((double)clock() - start) / CLOCKS_PER_SEC);
printf("\nResult: %d", d);
printf("\nResult2: %d", d2);
int dummyReadForPause;
scanf_s("%d",&dummyReadForPause);
}
int f(int x, int c, int n)
{
return (int)(pow((float)x, 2) + c) % n;
}
int gcd(unsigned int u, unsigned int v){
int shift;
/* GCD(0,x) := x */
if (u == 0 || v == 0)
return u | v;
/* Let shift := lg K, where K is the greatest power of 2
dividing both u and v. */
for (shift = 0; ((u | v) & 1) == 0; ++shift) {
u >>= 1;
v >>= 1;
}
while ((u & 1) == 0)
u >>= 1;
/* From here on, u is always odd. */
do {
while ((v & 1) == 0) /* Loop X */
v >>= 1;
/* Now u and v are both odd, so diff(u, v) is even.
Let u = min(u, v), v = diff(u, v)/2. */
if (u < v) {
v -= u;
} else {
int diff = u - v;
u = v;
v = diff;
}
v >>= 1;
} while (v != 0);
return u << shift;
}
Parallel version
#define threads 512
#define MaxBlocks 65535
#define RunningTheads (512*100)
__device__ int gcd(unsigned int u, unsigned int v)
{
int shift;
if (u == 0 || v == 0)
return u | v;
for (shift = 0; ((u | v) & 1) == 0; ++shift) {
u >>= 1;
v >>= 1;
}
while ((u & 1) == 0)
u >>= 1;
do {
while ((v & 1) == 0)
v >>= 1;
if (u < v) {
v -= u;
} else {
int diff = u - v;
u = v;
v = diff;
}
v >>= 1;
} while (v != 0);
return u << shift;
}
__device__ bool cuda_found;
__global__ void cudaKernal(int *cArray, int n, int *outr)
{
int index = blockIdx.x * threads + threadIdx.x;
int x = 1;
int y = 2;
int d = 4;
int c = cArray[index];
while(d == 1 && !cuda_found)
{
x = (int)(pow((float)x, 2) + c) % n;
y = (int)(pow((float)y, 2) + c) % n;
y = (int)(pow((float)y, 2) + c) % n;
int abs = x - y;
if(abs < 0)
abs = abs * -1;
d = gcd(abs, n);
}
if(d != 1 && !cuda_found)
{
cuda_found = true;
outr = &d;
}
}
int main ()
{
int n = 323;
int cArray[RunningTheads];
cArray[0] = 1;
for(int i = 1; i < RunningTheads-1; i++)
{
cArray[i] = i+2;
}
int dresult = 0;
int *dev_cArray;
int *dev_result;
HANDLE_ERROR(cudaMalloc((void**)&dev_cArray, RunningTheads*sizeof(int)));
HANDLE_ERROR(cudaMalloc((void**)&dev_result, sizeof(int)));
HANDLE_ERROR(cudaMemcpy(dev_cArray, cArray, RunningTheads*sizeof(int), cudaMemcpyHostToDevice));
int TotalBlocks = ceil((float)RunningTheads/(float)threads);
if(TotalBlocks > MaxBlocks)
TotalBlocks = MaxBlocks;
printf("Blocks: %d\n", TotalBlocks);
printf("Threads: %d\n\n", threads);
cudaKernal<<<TotalBlocks,threads>>>(dev_cArray, n, dev_result);
HANDLE_ERROR(cudaMemcpy(&dresult, dev_result, sizeof(int), cudaMemcpyDeviceToHost));
HANDLE_ERROR(cudaFree(dev_cArray));
HANDLE_ERROR(cudaFree(dev_result));
if(dresult == 0)
dresult = 1;
int d2 = n/dresult;
printf("\nResult: %d", dresult);
printf("\nResult2: %d", d2);
int dummyReadForPause;
scanf_s("%d",&dummyReadForPause);
}
Let's have a look at your kernel code:
__global__ void cudaKernal(int *cArray, int n, int *outr)
{
int index = blockIdx.x * threads + threadIdx.x;
int x = 1;
int y = 2;
int d = 4;
int c = cArray[index];
while(d == 1 && !cuda_found) // always false because d is always 4
{
x = (int)(pow((float)x, 2) + c) % n;
y = (int)(pow((float)y, 2) + c) % n;
y = (int)(pow((float)y, 2) + c) % n;
int abs = x - y;
if(abs < 0)
abs = abs * -1;
d = gcd(abs, n); // never writes to d because the loop won't
// be executed
}
if(d != 1 && !cuda_found) // maybe true if cuda_found was initialized
// with false
{
cuda_found = true; // Memory race here.
outr = &d; // you are changing the address that outr
// points to; the host code does not see this
// change. Your cudaMemcpy dev -> host will copy
// back exactly the values that were uploaded
// by cudaMemcpy host -> dev.
// If you want to set the value behind outr to 4, then write:
// *outr = d;
}
}
One of the problems is that you don't return the result. In your code you only change outr, which has local scope in your kernel function (i.e. the change is not seen outside this function). You should write *outr = d; to change the value of the memory that outr points to.
Also, I'm not sure whether CUDA initializes global variables with zero. Are you sure cuda_found is always initialized to false?
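The difference between outr = &d and *outr = d is not specific to CUDA. The following plain C++ sketch (host code only, not a kernel) shows the same effect; both function names are hypothetical and exist only for this illustration.

```cpp
// Rebinds only the local copy of the pointer; the caller's memory is untouched.
void rebind_local_pointer(int* out) {
    int d = 4;
    out = &d;    // changes the parameter, which is local to this function
    (void)out;   // silences "set but not used" warnings
}

// Writes through the pointer, so the caller sees the new value.
void write_through_pointer(int* out) {
    int d = 4;
    *out = d;    // stores 4 in the memory that `out` points to
}
```

After calling rebind_local_pointer(&result) on a variable holding 0, result is still 0; after write_through_pointer(&result) it holds 4. In the kernel, the memory being written through would be the buffer allocated with cudaMalloc and later copied back with cudaMemcpy.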