Why is this Pascal Triangle implementation giving me trailing zeroes? - C++

I tried to implement it recursively (iteratively seemed less elegant, but please correct me if I am wrong). But the output has trailing zeroes and the first few rows are unexpected. I have checked the base cases and the recursive cases, but they seem to be all right. The problem is definitely within the function.
#include <iostream>

unsigned long long p[1005][1005];

void pascal(int n)
{
    if (n == 1)
    {
        p[0][0] = 1;
        return;
    }
    else if (n == 2)
    {
        p[0][0] = 1; p[0][1] = 1;
        return;
    }
    p[n][0] = 1;
    p[n][n-1] = 1;
    pascal(n-1);
    for (int i = 1; i < n; ++i)
    {
        p[n][i] = p[n-1][i-1] + p[n-1][i];
    }
    return;
}

int main()
{
    int n;
    std::cin >> n;
    pascal(n);
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < i+1; ++j)
        {
            std::cout << p[i][j] << " ";
        }
        std::cout << "\n";
    }
}
Output:
(I enter 15)
1
0 0
0 0 0
1 0 0 0
1 1 0 0 0
1 2 1 0 0 0
1 3 3 1 0 0 0
1 4 6 4 1 0 0 0
1 5 10 10 5 1 0 0 0
1 6 15 20 15 6 1 0 0 0
1 7 21 35 35 21 7 1 0 0 0
1 8 28 56 70 56 28 8 1 0 0 0
1 9 36 84 126 126 84 36 9 1 0 0 0
1 10 45 120 210 252 210 120 45 10 1 0 0 0
1 11 55 165 330 462 462 330 165 55 11 1 0 0 0

The base cases n = 1 and n = 2 are too aggressive: for a normal input like 10, n = 1 is never reached because n = 2 stops the recursion prematurely, and both branches only write into row 0, so rows 1 and 2 are left as untouched zeroes in the array. These values of n are covered automatically by the recursive case anyway. The real base case, where we do nothing, is n < 0.
void pascal(int n)
{
    if (n < 0) return;
    p[n][0] = 1;
    pascal(n - 1);
    for (int i = 1; i <= n; ++i)
    {
        p[n][i] = p[n-1][i-1] + p[n-1][i];
    }
}
Output for n = 15:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
1 11 55 165 330 462 462 330 165 55 11 1
1 12 66 220 495 792 924 792 495 220 66 12 1
1 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1
1 14 91 364 1001 2002 3003 3432 3003 2002 1001 364 91 14 1
Having said this, it's poor practice to hard code the size of the array. Consider using vectors and passing parameters to the functions so that they don't mutate global state.
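For example, a minimal sketch of a vector-based version (my own illustration, not the original code) that returns the triangle instead of writing into a global array:

#include <iostream>
#include <vector>

// Build the first n rows of Pascal's triangle without any global state.
std::vector<std::vector<unsigned long long>> pascal(int n)
{
    std::vector<std::vector<unsigned long long>> rows;
    for (int i = 0; i < n; ++i)
    {
        std::vector<unsigned long long> row(i + 1, 1);   // first and last entries are 1
        for (int j = 1; j < i; ++j)
        {
            row[j] = rows[i-1][j-1] + rows[i-1][j];
        }
        rows.push_back(row);
    }
    return rows;
}

int main()
{
    int n;
    std::cin >> n;
    for (const auto& row : pascal(n))
    {
        for (auto v : row) std::cout << v << ' ';
        std::cout << '\n';
    }
}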
We can also write it iteratively in (to me) a more intuitive way:
void pascal(int n)
{
    for (int i = 0; i < n; ++i)
    {
        p[i][0] = 1;
        for (int j = 1; j <= i; ++j)
        {
            p[i][j] = p[i-1][j-1] + p[i-1][j];
        }
    }
}

Related

counting sort algorithm not working in hackerrank [closed]

This is my code:
vector<int> countingSort(vector<int> arr) {
    int max = arr[0];
    for(unsigned int i = 1; i < arr.size(); i++) {
        if(arr[i] > arr[0]) {
            max = arr[i];
        }
    }
    max++;
    vector<int> frequency_arr;
    for(int i = 0; i < max; i++) {
        frequency_arr.push_back(0);
    }
    for(unsigned int i = 0; i < arr.size(); i++) {
        int elem = arr[i];
        frequency_arr[elem]++;
    }
    return frequency_arr;
}
Explanation: the countingSort function sorts elements using the counting sort algorithm. It takes a vector as input and returns the frequency array as output.
input array
63 25 73 1 98 73 56 84 86 57 16 83 8 25 81 56 9 53 98 67 99 12 83 89 80 91 39 86 76 85 74 39 25 90 59 10 94 32 44 3 89 30 27 79 46 96 27 32 18 21 92 69 81 40 40 34 68 78 24 87 42 69 23 41 78 22 6 90 99 89 50 30 20 1 43 3 70 95 33 46 44 9 69 48 33 60 65 16 82 67 61 32 21 79 75 75 13 87 70 33
my output
0 2 0 2 0 0 1 0 1 2 1 0 1 1 0 0 2 0 1 0 1 2 1 1 1 3 0 2 0 0 2 0 3 3 1 0 0 0 0 2 2 1 1 1 2 0 2 0 1 0 1 0 0 1 0 0 2 1 0 1 1 1 0 1 0 1 0 2 1 3 2
expected output
0 2 0 2 0 0 1 0 1 2 1 0 1 1 0 0 2 0 1 0 1 2 1 1 1 3 0 2 0 0 2 0 3 3 1 0 0 0 0 2 2 1 1 1 2 0 2 0 1 0 1 0 0 1 0 0 2 1 0 1 1 1 0 1 0 1 0 2 1 3 2 0 0 2 1 2 1 0 2 2 1 2 1 2 1 1 2 2 0 3 2 1 1 0 1 1 1 0 2 2
Constraints
length of input array -> [100, 10^6]
range of elements in array -> [1, 100]
I was solving a counting sort problem on HackerRank, but when I run the code the output does not contain all elements of frequency_arr:
it only shows a leading portion of the expected output and omits the remaining elements.
Do you know what I am doing wrong here and what the potential fixes are?
It seems you have a typo in the first for loop
int max = arr[0];
for(unsigned int i = 1; i < arr.size(); i++) {
    if(arr[i] > arr[0]) {
      ^^^^^^^^^^^^^^^^^
        max = arr[i];
    }
}
You need to write the if statement within the for loop the following way:
if(arr[i] > max) {
Moreover in general the variable i should have the type size_t instead of unsigned int.
for( size_t i = 1; i < arr.size(); i++) {
Also note that there is a standard algorithm, std::max_element, declared in the header <algorithm>.
And instead of this code snippet
vector<int> frequency_arr;
for(int i = 0; i < max; i++) {
    frequency_arr.push_back(0);
}
you could just write
vector<int> frequency_arr( max );
In this case all max elements of the vector will be zero-initialized.
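Putting these suggestions together, a possible sketch of the corrected function (untested against the HackerRank harness) could look like this:

#include <algorithm>
#include <vector>

using namespace std;

// Counting-sort frequency step: how often each value 0..max occurs in arr.
vector<int> countingSort(vector<int> arr) {
    // Largest element via the standard algorithm instead of a hand-written loop.
    int max = *max_element(arr.begin(), arr.end());

    // max + 1 slots (values 0..max), all zero-initialized by the constructor.
    vector<int> frequency_arr(max + 1);

    for (int elem : arr) {
        frequency_arr[elem]++;
    }
    return frequency_arr;
}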

How to parallelize this array correct way using OpenMP?

After I try to parallelize the code with OpenMP, the elements in the array are wrong; the order of the elements is not important to me. Or is it more convenient to use a C++ std::vector instead of an array for parallelizing? Could you suggest an easy way?
#include <stdio.h>
#include <math.h>

int main()
{
    int n = 100;
    int a[n*(n+1)/2] = {0};
    int count = 0;
    #pragma omp parallel for reduction(+:a,count)
    for (int i = 1; i <= n; i++) {
        for (int j = i + 1; j <= n; j++) {
            double k = sqrt(i * i + j * j);
            if (fabs(round(k) - k) < 1e-10) {
                a[count++] = i;
                a[count++] = j;
                a[count++] = (int) k;
            }
        }
    }
    for (int i = 0; i < count; i++)
        printf("%d %s", a[i], (i+1)%3 ? "" : ", ");
    printf("\ncount: %d", count);
    return 0;
}
Original output:
3 4 5 , 5 12 13 , 6 8 10 , 7 24 25 , 8 15 17 , 9 12 15 , 9 40 41 , 10 24 26 , 11 60 61 , 12 16 20 , 12 35 37 , 13 84 85 , 14 48 50 , 15 20 25 , 15 36 39 , 16 30 34 , 16 63 65 , 18 24 30 , 18 80 82 , 20 21 29 , 20 48 52 , 20 99 101 , 21 28 35 , 21 72 75 , 24 32 40 , 24 45 51 , 24 70 74 , 25 60 65 , 27 36 45 , 28 45 53 , 28 96 100 , 30 40 50 , 30 72 78 , 32 60 68 , 33 44 55 , 33 56 65 , 35 84 91 , 36 48 60 , 36 77 85 , 39 52 65 , 39 80 89 , 40 42 58 , 40 75 85 , 40 96 104 , 42 56 70 , 45 60 75 , 48 55 73 , 48 64 80 , 48 90 102 , 51 68 85 , 54 72 90 , 56 90 106 , 57 76 95 , 60 63 87 , 60 80 100 , 60 91 109 , 63 84 105 , 65 72 97 , 66 88 110 , 69 92 115 , 72 96 120 , 75 100 125 , 80 84 116 ,
count: 189
After using OpenMP(gcc file.c -fopenmp):
411 538 679 , 344 609 711 , 354 533 649 , 218 387 449 , 225 475 534 , 182 283 339 , 81 161 182 , 74 190 204 , 77 138 159 , 79 176 195 , 18 24 30 , 18 80 82 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 , 0 0 0 ,
count: 189
Your threads are all accessing the shared count.
You would be better off eliminating count and having each loop iteration determine where to write its output based only on the (per-thread) values of i and j.
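For example, here is a minimal sketch of that idea (my own illustration rather than the answerer's code; compile with -fopenmp): every (i, j) pair owns a fixed slot in a triangular layout computed from i and j alone, so no shared counter is needed inside the parallel loop, and unused slots simply stay zero and are skipped when printing.

#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    int const n = 100;
    // One slot of three ints per (i, j) pair with 1 <= i < j <= n.
    std::vector<int> a(3L * n * (n - 1) / 2, 0);

    #pragma omp parallel for schedule(dynamic)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                // Slot index derived only from (i, j): rows 1..i-1 contribute
                // (n-1) + (n-2) + ... + (n-i+1) pairs before row i starts.
                long const slot = static_cast<long>(i - 1) * n
                                  - static_cast<long>(i) * (i - 1) / 2
                                  + (j - i - 1);
                a[3 * slot]     = i;
                a[3 * slot + 1] = j;
                a[3 * slot + 2] = static_cast<int>(std::round(k));
            }
        }
    }

    // Serial pass: print only the slots that were actually filled.
    int count = 0;
    for (long s = 0; s < static_cast<long>(a.size()) / 3; ++s) {
        if (a[3 * s] != 0) {                 // unused slots were left at 0
            std::printf("%d %d %d , ", a[3 * s], a[3 * s + 1], a[3 * s + 2]);
            count += 3;
        }
    }
    std::printf("\ncount: %d\n", count);
}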
Alternatively, use a vector to accumulate the results:
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

#pragma omp declare \
    reduction(vec_append : std::vector<std::pair<int,int>> : \
              omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end()))

int main()
{
    constexpr int n = 100'000;
    std::vector<std::pair<int,int>> result;

    #pragma omp parallel for \
                reduction(vec_append:result) \
                schedule(dynamic)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            auto const h2 = 1LL * i * i + 1LL * j * j;  // hypotenuse squared (64-bit to avoid overflow)
            long long const h = std::sqrt(h2) + 0.5;    // integer square root
            if (h * h == h2) {
                result.emplace_back(i, j);
            }
        }
    }

    // for (auto const& v: result) {
    //     std::cout << v.first << ' '
    //               << v.second << ' '
    //               << std::hypot(v.first, v.second) << ' ';
    // }
    std::cout << "\ncount: " << result.size() << '\n';
}
As an alternative to using a critical section, this solution uses atomics and could therefore be faster.
The following code might freeze your computer due to memory consumption. Be careful!
#include <cstdio>
#include <cmath>
#include <vector>

int main() {
    int const n = 100;
    // without a better (smaller) upper_bound this is extremely
    // wasteful in terms of memory for big n
    long const upper_bound = 3L * static_cast<long>(n) *
                             (static_cast<long>(n) - 1L) / 2L;
    std::vector<int> a(upper_bound, 0);
    int count = 0;

    #pragma omp parallel for schedule(dynamic) shared(a, count)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                int my_pos;
                #pragma omp atomic capture
                my_pos = count++;
                a[3 * my_pos] = i;
                a[3 * my_pos + 1] = j;
                a[3 * my_pos + 2] = static_cast<int>(std::round(k));
            }
        }
    }
    count *= 3;

    for (int i = 0; i < count; ++i) {
        std::printf("%d %s", a[i], (i + 1) % 3 ? "" : ", ");
    }
    printf("\ncount: %d", count);
    return 0;
}
EDIT:
My answer was initially a reaction to a now-deleted answer that used a critical section in a sub-optimal way. In the following I present another solution which combines a critical section with std::vector::emplace_back() to avoid the need for upper_bound, similar to Toby Speight's solution. In general, a reduction clause like the one in Toby Speight's solution should be preferred over critical sections and atomics, as reductions should scale better to large numbers of threads. In this particular case (relatively few results are written to a) and without a large number of cores to run on, the following code might still be preferable.
#include <cstdio>
#include <cmath>
#include <tuple>
#include <vector>

int main() {
    int const n = 100;
    std::vector<std::tuple<int, int, int>> a{};
    // optional, might reduce number of reallocations
    a.reserve(2 * n); // 2 * n is an arbitrary choice

    #pragma omp parallel for schedule(dynamic) shared(a)
    for (int i = 1; i <= n; ++i) {
        for (int j = i + 1; j <= n; ++j) {
            double const k = std::sqrt(static_cast<double>(i * i + j * j));
            if (std::fabs(std::round(k) - k) < 1e-10) {
                #pragma omp critical
                a.emplace_back(i, j, static_cast<int>(std::round(k)));
            }
        }
    }

    long const count = 3L * static_cast<long>(a.size());
    for (unsigned long i = 0UL; i < a.size(); ++i) {
        std::printf("%d %d %d\n",
                    std::get<0>(a[i]), std::get<1>(a[i]), std::get<2>(a[i]));
    }
    printf("\ncount: %ld", count);
    return 0;
}
The count variable is an index into a. The reduction(+:a,count) operator sums the array elements; it is not the concatenation operation that I think you are looking for.
The count variable needs to be protected by a mutex, something like #pragma omp critical, but I am not an OpenMP expert.
Alternatively, create int a[n][n], set all of them to -1 (a sentinel value to indicate "invalid") then assign the result of the sqrt() when it is near enough to a whole number.

OpenMP integral image slower then sequential

I have implemented a Summed Area Table (integral image) in C++ using OpenMP.
The problem is that the sequential code is always faster than the parallel code, even when changing the number of threads and image sizes.
For example, I tried images from (100x100) to (10000x10000) and threads from 1 to 64, but no combination is ever faster.
I also tried this code on different machines:
Mac OSX, 1.4 GHz Intel Core i5, dual core
Mac OSX, 2.3 GHz Intel Core i7, quad core
Ubuntu 16.04, Intel Xeon E5-2620 2.4 GHz, 12 cores
The time has been measured with OpenMP function: omp_get_wtime().
For compiling I use: g++ -fopenmp -Wall main.cpp.
Here is the parallel code:
void transpose(unsigned long *src, unsigned long *dst, const int N, const int M) {
    #pragma omp parallel for
    for (int n = 0; n < N*M; n++) {
        int i = n / N;
        int j = n % N;
        dst[n] = src[M*j + i];
    }
}

unsigned long * integralImageMP(uint8_t *x, int n, int m) {
    unsigned long * out = new unsigned long[n*m];
    unsigned long * rows = new unsigned long[n*m];

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        rows[i*m] = x[i*m];
        for (int j = 1; j < m; ++j)
        {
            rows[i*m + j] = x[i*m + j] + rows[i*m + j - 1];
        }
    }

    transpose(rows, out, n, m);

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        rows[i*m] = out[i*m];
        for (int j = 1; j < m; ++j)
        {
            rows[i*m + j] = out[i*m + j] + rows[i*m + j - 1];
        }
    }

    transpose(rows, out, m, n);
    delete [] rows;
    return out;
}
Here is the sequential code:
unsigned long * integralImage(uint8_t *x, int n, int m) {
    unsigned long * out = new unsigned long[n*m];
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < m; ++j)
        {
            unsigned long val = x[i*m + j];
            if (i >= 1)
            {
                val += out[(i-1)*m + j];
                if (j >= 1)
                {
                    val += out[i*m + j - 1] - out[(i-1)*m + j - 1];
                }
            } else {
                if (j >= 1)
                {
                    val += out[i*m + j - 1];
                }
            }
            out[i*m + j] = val;
        }
    }
    return out;
}
I also tried it without the transpose, but it was even slower, probably because of the cache accesses.
An example of calling code:
int main(int argc, char **argv) {
    uint8_t* image = //read image from file (gray scale)
    int height = //height of the image
    int width = //width of the image

    double start_omp = omp_get_wtime();
    unsigned long* integral_image_parallel = integralImageMP(image, height, width); //parallel
    double end_omp = omp_get_wtime();
    double time_tot = end_omp - start_omp;
    std::cout << time_tot << std::endl;

    start_omp = omp_get_wtime();
    unsigned long* integral_image_serial = integralImage(image, height, width); //sequential
    end_omp = omp_get_wtime();
    time_tot = end_omp - start_omp;
    std::cout << time_tot << std::endl;
    return 0;
}
Each thread is working on a block of rows (an illustration of what each thread is doing is omitted here), where ColumnSum is done by transposing the matrix and repeating RowSum.
Let me first say that the results are a bit surprising to me; I would guess the problem lies in the non-local memory access required by the transpose algorithm.
You can mitigate it anyway by turning your sequential algorithm into a parallel one with a two-pass approach: the first pass calculates the 2D integral in T threads, each handling a block of N rows, and the second pass compensates for the fact that each block started from zero rather than from the accumulated result of the previous rows (a code sketch follows the example below).
An example with Matlab shows the principle in 2D.
f=fix(rand(12,8)*8) % A random matrix with 12 rows, 8 columns
5 6 1 4 7 5 4 4
4 6 0 7 1 3 2 0
7 0 2 3 0 1 6 3
5 3 1 7 4 3 7 2
6 4 3 2 7 3 5 1
3 3 2 5 5 0 2 1
3 5 7 5 1 4 4 3
6 5 7 4 2 1 0 0
0 2 0 5 3 3 7 4
1 3 5 5 7 4 7 3
1 0 2 1 1 2 6 5
3 7 3 1 6 2 2 5
ff=cumsum(cumsum(f')') % The Summed Area Table
5 11 12 16 23 28 32 36
9 21 22 33 41 49 55 59
16 28 31 45 53 62 74 81
21 36 40 61 73 85 104 113
27 46 53 76 95 110 134 144
30 52 61 89 113 128 154 165
33 60 76 109 134 153 183 197
39 71 94 131 158 178 208 222
39 73 96 138 168 191 228 246
40 77 105 152 189 216 260 281
41 78 108 156 194 223 273 299
44 88 121 170 214 245 297 328
fx=[cumsum(cumsum(f(1:4,:)')'); % The original table summed in
cumsum(cumsum(f(5:8,:)')'); % three parts -- 4 rows per each
cumsum(cumsum(f(9:12,:)')')] % "thread"
5 11 12 16 23 28 32 36
9 21 22 33 41 49 55 59
16 28 31 45 53 62 74 81
21 36 40 61 73 85 104 113 %% Notice this row #4
6 10 13 15 22 25 30 31
9 16 21 28 40 43 50 52
12 24 36 48 61 68 79 84
18 35 54 70 85 93 104 109 %% Notice this row #8
0 2 2 7 10 13 20 24
1 6 11 21 31 38 52 59
2 7 14 25 36 45 65 77
5 17 27 39 56 67 89 106
fx(4,:) + fx(8,:) %% this is the SUM of row #4 and row #8
39 71 94 131 158 178 208 222
%% and finally -- what is the difference of the piecewise
%% calculated result and the real result?
ff-fx
0 0 0 0 0 0 0 0 %% look !! the first block
0 0 0 0 0 0 0 0 %% is already correct
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
21 36 40 61 73 85 104 113 %% All these rows in this
21 36 40 61 73 85 104 113 %% block are short by
21 36 40 61 73 85 104 113 %% the row #4 above
21 36 40 61 73 85 104 113 %%
39 71 94 131 158 178 208 222 %% and all these rows
39 71 94 131 158 178 208 222 %% in this block are short
39 71 94 131 158 178 208 222 %% by the SUM of the rows
39 71 94 131 158 178 208 222 %% #4 and #8 above
Fortunately one can start integrating block #2, i.e. rows 2N..3N-1, before block #1 has been compensated -- one just has to calculate the offset, which is a relatively small sequential task.
acc_for_block_2 = row[2*N-1] + row[N-1];
acc_for_block_3 = acc_for_block_2 + row[3*N-1];
..
acc_for_block_T-1 = acc_for_block_(T-2) + row[N*(T-1)-1];
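For illustration, here is a minimal C++/OpenMP sketch of this two-pass idea (my own sketch, not the answerer's code; the function name and the row-block partition are assumptions, and it expects to be compiled with -fopenmp):

#include <cstdint>
#include <vector>
#include <omp.h>

// Two-pass parallel summed-area table.
// n = rows, m = columns; out[i*m + j] = sum of x[0..i][0..j].
std::vector<unsigned long> integralImageTwoPass(const uint8_t* x, int n, int m)
{
    std::vector<unsigned long> out(static_cast<size_t>(n) * m);

    #pragma omp parallel
    {
        int const nthreads = omp_get_num_threads();
        int const tid      = omp_get_thread_num();
        int const lo = static_cast<int>(static_cast<long long>(n) * tid / nthreads);
        int const hi = static_cast<int>(static_cast<long long>(n) * (tid + 1) / nthreads);

        // Pass 1: each thread builds a 2D integral of its own block of rows,
        // pretending the block starts at row 0 (so it is "short" by the rows above it).
        for (int i = lo; i < hi; ++i) {
            unsigned long rowsum = 0;
            for (int j = 0; j < m; ++j) {
                rowsum += x[i * m + j];
                out[i * m + j] = rowsum + (i > lo ? out[(i - 1) * m + j] : 0UL);
            }
        }

        #pragma omp barrier

        // Pass 2: compute the offset row for this block, i.e. the sum of the
        // (block-local) last rows of all blocks above it -- the small sequential part.
        std::vector<unsigned long> offset(m, 0UL);
        for (int t = 0; t < tid; ++t) {
            int const t_lo = static_cast<int>(static_cast<long long>(n) * t / nthreads);
            int const t_hi = static_cast<int>(static_cast<long long>(n) * (t + 1) / nthreads);
            if (t_hi > t_lo) {
                for (int j = 0; j < m; ++j)
                    offset[j] += out[(t_hi - 1) * m + j];
            }
        }

        #pragma omp barrier   // all offsets are read before anyone starts rewriting rows

        for (int i = lo; i < hi; ++i)
            for (int j = 0; j < m; ++j)
                out[i * m + j] += offset[j];
    }
    return out;
}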

How to loop rows and columns in pandas while replacing values with a constant increment

I am trying to replace values in a dataframe with 0. In the first column I need to replace the first 3 values, in the next column the first 6 values, and so on, increasing by 3 every time.
a = np.array([133,124,156,189,132,176,189,192,100,120,130,140,150,50,70,133,124,156,189,132])
b = pd.DataFrame(a.reshape(10, 2), columns=['s', 't'])

for columns in b:
    yy = 3
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
the outcome is the following
s t
0 0 0
1 0 0
2 0 0
3 189 189
4 132 132
5 176 176
6 189 189
7 192 192
8 100 100
9 120 120
I am clearly missing something really simple to make the loop replace 6 values instead of only 3 in column t. Any ideas?
I would do it this way:
i = 1
for c in b.columns:
    b.ix[0 : 3*i-1, c] = 0
    i += 1
Demo:
In [86]: b = pd.DataFrame(np.random.randint(0, 100, size=(20, 4)), columns=list('abcd'))
In [87]: %paste
i = 1
for c in b.columns:
    b.ix[0 : 3*i-1, c] = 0
    i += 1
## -- End pasted text --
In [88]: b
Out[88]:
a b c d
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 10 0 0 0
4 8 0 0 0
5 49 0 0 0
6 55 48 0 0
7 99 43 0 0
8 63 29 0 0
9 61 65 74 0
10 15 29 41 0
11 79 88 3 0
12 91 74 11 4
13 56 71 6 79
14 15 65 46 81
15 81 42 60 24
16 71 57 95 18
17 53 4 80 15
18 42 55 84 11
19 26 80 67 59
You need to initialize yy = 3 before the loop:
yy = 3
for columns in b:
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
Python 3 solution:
yy = 3
for columns in b:
    for i in range(yy):
        b[columns][i] = 0
    yy += 3
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
Another solution:
yy = 3
for i, col in enumerate(b.columns):
    b.ix[:i*yy+yy-1, col] = 0
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132

Floyd's Algorithm (Shortest Paths) Issue - C++

Basically, I'm tasked with implementing Floyd's algorithm to find the shortest paths in a graph given as a matrix. A value, in my case arg, is read in and the matrix becomes size arg*arg. The next string of values is applied to the matrix in the order received. Lastly, a -1 represents infinity.
To be quite honest, I've no idea where my problem is coming from. When run through the tests, the first couple pass, but the rest fail. I'll only post the first two failures along with the passes, and I'll just post the relevant segment of code.
int arg, var, i, j;
cin >> arg;
int arr[arg][arg];
for (i = 0; i < arg; i++)
{
    for (j = 0; j < arg; j++)
    {
        cin >> var;
        arr[i][j] = var;
    }
}
for (int pivot = 0; pivot < arg; pivot++)
{
    for (i = 0; i < arg; i++)
    {
        for (j = 0; j < arg; j++)
        {
            if ((arr[i][j] > (arr[i][pivot] + arr[pivot][j])) && ((arr[i][pivot] != -1) && arr[pivot][j] != -1))
            {
                arr[i][j] = (arr[i][pivot] + arr[pivot][j]);
                arr[j][i] = (arr[i][pivot] + arr[pivot][j]);
            }
        }
    }
}
And here are the failures that I'm receiving. The rest of them get longer and longer, up to a 20*20 matrix, so I'll spare you from that:
floyd>
* * * Program successfully started and correct prompt received.
floyd 2 0 14 14 0
0 14 14 0
floyd> PASS : Input "floyd 2 0 14 14 0" produced output "0 14 14 0".
floyd 3 0 85 85 85 0 26 85 26 0
0 85 85 85 0 26 85 26 0
floyd> PASS : Input "floyd 3 0 85 85 85 0 26 85 26 0" produced output "0 85 85 85 0 26 85 26 0".
floyd 3 0 34 7 34 0 -1 7 -1 0
0 34 7 34 0 -1 7 -1 0
floyd> FAIL : Input "floyd 3 0 34 7 34 0 -1 7 -1 0" did not produce output "0 34 7 34 0 41 7 41 0".
floyd 4 0 -1 27 98 -1 0 41 74 27 41 0 41 98 74 41 0
0 -1 27 68 -1 0 41 74 27 41 0 41 68 74 41 0
floyd> FAIL : Input "floyd 4 0 -1 27 98 -1 0 41 74 27 41 0 41 98 74 41 0" did not produce output "0 68 27 68 68 0 41 74 27 41 0 41 68 74 41 0".
Imagine the situation arr[i][j] == -1: obviously (arr[i][j] > (arr[i][pivot] + arr[pivot][j])) && ((arr[i][pivot] != -1) && arr[pivot][j] != -1) fails, but it shouldn't if arr[i][pivot] and arr[pivot][j] are not -1.
Since you are using -1 instead of infinity, you have to have something like if ((arr[i][j] == -1 || arr[i][j] > (arr[i][pivot] + arr[pivot][j])) && ((arr[i][pivot] != -1) && arr[pivot][j] != -1)), i.e. you check two things: the first is your original condition, and the second is the situation when arr[i][j] is infinity and the path through the pivot exists, as in that case any valid path is less than infinity.
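For reference, a sketch of the relaxation loop with that corrected condition, adapted from the code posted above (not verified against the grader):

for (int pivot = 0; pivot < arg; pivot++)
{
    for (i = 0; i < arg; i++)
    {
        for (j = 0; j < arg; j++)
        {
            // Only relax through the pivot if both legs exist (are not "infinity").
            if (arr[i][pivot] != -1 && arr[pivot][j] != -1)
            {
                int through = arr[i][pivot] + arr[pivot][j];
                // Also accept the new path when arr[i][j] is still -1 (infinity).
                if (arr[i][j] == -1 || arr[i][j] > through)
                {
                    arr[i][j] = through;
                    arr[j][i] = through;   // the matrix is treated as symmetric, as in the original code
                }
            }
        }
    }
}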