I have written this quicksort algorithm, but it doesn't sort correctly and I need help finding the error.
I also tried not using the built-in swap function, but that didn't work either.
The FindPivot function returns the median of three elements (first, middle and last) of the sub-array, after swapping them so that the smallest is at the start, the largest is in the middle and the median is at the end.
#include <bits/stdc++.h>
using namespace std;
int pivotIdx, pivot, n;
int partitn(int* arr, int first, int last)
{
pivot = FindPivot(arr,first,last-1);
// pivot = arr[last-1];
cout << "Pivot: " << pivot << endl;
int L = first;
int R = last-2;
while(L<R)
{
while(arr[L]<=pivot)
L++;
while(arr[R]>=pivot)
R--;
swap(arr[L++],arr[R--]);
}
swap(arr[L],arr[last-1]);
return L;
}
void QuickSort(int* arr,int first,int last)
{
if(last-first < 6){
InsertionSort(arr,first,last);
}else{
pivotIdx = partitn(arr, first, last);
QuickSort(arr, first, pivotIdx);
QuickSort(arr, pivotIdx+1, last);
}
}
int main()
{
cin >> n;
int arr[100];
for(int i=0;i<n;i++)
cin>>arr[i];
QuickSort(arr,0,n);
for(int i=0;i<n;i++){
cout << arr[i] << " ";
}
return 0;
}
Input:
8
5 1 6 2 7 3 8 4
Output:
Pivot: 4
1 2 3 5 4 6 7 8
There may be other issues, but the current partition code can scan past the boundaries of the sub-array. Assuming the pivot is not at either end of the sub-array, the code can use < instead of <= and > instead of >=, and the scans will then not run past the boundaries:
while(arr[L]<pivot) // not <=
L++;
while(arr[R]>pivot) // not >=
R--;
Input:
5 1 6 2 7 3 8 4
This becomes:
QuickSort([5 1 6 2 7 3 8 4], 0, 8)
partitn([5 1 6 2 7 3 8 4], 0, 8)
pivot = 4
L = 0 R = 6 [5 1 6 2 7 3 8 4]
// Enter Loop
L = 0 R = 5 swap [3 1 6 2 7 5 8 4]
L = 2 R = 3 swap [3 1 2 6 7 5 8 4]
L = 3 R = 2 swap [3 1 6 2 7 5 8 4]
// Exit Loop
L = 3 swap [3 1 6 4 7 5 8 2]
This looks wrong: the 2 and 6 are in the wrong place.
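One way to address both points is to use strict comparisons in the scans and to stop, without swapping, once the scans have crossed. The sketch below makes some assumptions. It uses a std::vector<int> instead of the question's raw array and keeps the question's half-open range [first, last). FindPivot and InsertionSort are not shown in the question, so medianOfThreeToEnd, partitionRange, insertionSort and quickSort are hypothetical stand-ins. The median-of-three step follows the question's description (smallest at the start, largest in the middle, median at the end).

```cpp
#include <algorithm>  // std::is_sorted, handy for checking results
#include <cassert>
#include <utility>    // std::swap
#include <vector>

// Hypothetical stand-in for FindPivot: orders a[first], a[mid], a[hi] so the smallest
// is at a[first], the largest at a[mid] and the median at a[hi], then returns the median.
int medianOfThreeToEnd(std::vector<int>& a, int first, int hi) {
    assert(hi - first >= 2);  // needs three distinct positions
    int mid = first + (hi - first) / 2;
    if (a[mid] < a[first]) std::swap(a[mid], a[first]);
    if (a[hi] < a[first]) std::swap(a[hi], a[first]);  // a[first] is now the minimum
    if (a[mid] < a[hi]) std::swap(a[mid], a[hi]);      // a[hi] is now the median
    return a[hi];
}

// Partitions [first, last) around the median-of-three pivot; returns the pivot's index.
int partitionRange(std::vector<int>& a, int first, int last) {
    int hi = last - 1;
    int pivot = medianOfThreeToEnd(a, first, hi);
    int L = first;
    int R = hi - 1;
    while (true) {
        while (a[L] < pivot) ++L;  // a[hi] == pivot stops this scan
        while (a[R] > pivot) --R;  // a[first] <= pivot stops this scan
        if (L >= R) break;         // scans crossed: do NOT swap
        std::swap(a[L++], a[R--]);
    }
    std::swap(a[L], a[hi]);        // place the pivot between the two parts
    return L;
}

void insertionSort(std::vector<int>& a, int first, int last) {
    for (int i = first + 1; i < last; ++i) {
        int key = a[i];
        int j = i - 1;
        while (j >= first && a[j] > key) {
            a[j + 1] = a[j];
            --j;
        }
        a[j + 1] = key;
    }
}

// Sorts [first, last); small ranges fall back to insertion sort, as in the question.
void quickSort(std::vector<int>& a, int first, int last) {
    if (last - first < 6) {
        insertionSort(a, first, last);
        return;
    }
    int p = partitionRange(a, first, last);
    quickSort(a, first, p);
    quickSort(a, p + 1, last);
}
```

For example, calling quickSort(v, 0, 8) on v = {5, 1, 6, 2, 7, 3, 8, 4} should leave v as {1, 2, 3, 4, 5, 6, 7, 8}. The median-of-three step also gives both scans a sentinel, so no explicit bounds checks are needed.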
I need to write a program that takes three positive integers as input and finds their distinct permutations in increasing order, e.g.
{
{1 2 3}
{1 3 2}
{2 1 3}
{2 3 1}
{3 1 2}
{3 2 1}
}
Then, insert the operators '+' or '-' between each pair of adjacent numbers to form an equation, and print its result. At each operator position, both + and - should be considered. For example, for the permutation 1 2 3, we need to consider four cases: 1 + 2 + 3 = 6, 1 + 2 - 3 = 0, 1 - 2 + 3 = 2, and 1 - 2 - 3 = -4.
The output should display all possible equations and their results, sorted in increasing order of the result. If two equations have the same result, they are further sorted in increasing order of their permutations. For example, 1 + 2 + 3 = 6 should be output before 1 + 3 + 2 = 6 because the permutation 1 2 3 is smaller than 1 3 2 according to the definition below.
The order of permutations is defined as follows:
• Compare the leftmost numbers first. The permutation with the smaller leftmost number is the smaller one, e.g., 1 3 2 is smaller than 2 1 3.
• If the leftmost numbers are the same, compare the next one, e.g., 2 1 3 is smaller than 2 3 1.
My code:
I wrote two functions:
permutation, to find the distinct permutations of the 3 integers.
sortPermutation, to sort the permutations in increasing order. I am not sure whether this function does the right thing.
I have no idea how to apply the + and - operations to the permutations, sort the resulting equations in increasing order, and display them. I have searched for resources online but still have no idea. Any help will be appreciated!
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
vector<vector<int>> sortPermutation(vector<vector<int>> &result); // must match the definition below
vector<vector<int>> permutation(vector<int>num);
int main()
{
int a, b, c;
cout << "Enter three different positive numbers: ";
cin >> a >> b >> c;
vector<int> num = { a,b,c };
int size = num.size();
vector<vector<int>> result = permutation(num);
for (int i = 0; i < result.size(); i++)
{
for (int j = 0; j < result[i].size(); j++)
{
cout << result[i][j];
}
cout << endl;
}
}
vector<vector<int>> sortPermutation(vector<vector<int>>& result)
{
sort(result.begin(),
result.end(),
[](const std::vector<int>& a, const std::vector<int>& b)
{
return a[0] < b[0];
});
return result;
}
vector<vector<int>> permutation(vector<int>num)
{
vector<vector<int>> permu;
if (num.size() <= 1)
{
return { num };
}
for (int i = 0; i < num.size(); i++)
{
vector<int>v(num.begin(), num.end()); // 1,2,3
v.erase(v.begin() + i); //2,3 // 3 // 2
auto res = permutation(v);//3 // 2 // 1 // 3 // 1 // 2
for (int j = 0; j < res.size(); j++)
{
vector<int>_v = res[j];
_v.insert(_v.begin(), num[i]);
permu.push_back(_v);
}
}
return sortPermutation(permu);
}
This is a sample output for the input 1 2 3:
1 - 2 - 3 = -4
1 - 3 - 2 = -4
2 - 1 - 3 = -2
2 - 3 - 1 = -2
1 + 2 - 3 = 0
1 - 3 + 2 = 0
2 + 1 - 3 = 0
2 - 3 + 1 = 0
3 - 1 - 2 = 0
3 - 2 - 1 = 0
1 - 2 + 3 = 2
1 + 3 - 2 = 2
3 + 1 - 2 = 2
3 - 2 + 1 = 2
2 - 1 + 3 = 4
2 + 3 - 1 = 4
3 - 1 + 2 = 4
3 + 2 - 1 = 4
1 + 2 + 3 = 6
1 + 3 + 2 = 6
2 + 1 + 3 = 6
2 + 3 + 1 = 6
3 + 1 + 2 = 6
3 + 2 + 1 = 6
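One possible approach, offered as a sketch rather than the assignment's expected solution, replaces the recursive permutation function with std::next_permutation on the sorted input. That call yields the distinct permutations in increasing lexicographic order. The sketch tries all four operator combinations for each permutation and then uses std::stable_sort by value, so equations with equal results keep their permutation order. The Equation struct and buildEquations are hypothetical names, and the code assumes exactly three input numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One candidate equation: a permutation of the three numbers plus two operators.
struct Equation {
    std::vector<int> perm;
    char op1;
    char op2;
    int value;

    // Formats the equation like the sample output, e.g. "1 + 2 - 3 = 0".
    std::string text() const {
        return std::to_string(perm[0]) + " " + op1 + " " + std::to_string(perm[1]) +
               " " + op2 + " " + std::to_string(perm[2]) + " = " + std::to_string(value);
    }
};

// Builds every equation for every distinct permutation of nums, ordered by value
// and then by permutation (lexicographically).
std::vector<Equation> buildEquations(std::vector<int> nums) {
    assert(nums.size() == 3);
    std::sort(nums.begin(), nums.end());  // next_permutation must start from the smallest order
    const char ops[] = {'+', '-'};
    std::vector<Equation> eqs;
    do {
        for (char o1 : ops) {
            for (char o2 : ops) {
                int v = (o1 == '+') ? nums[0] + nums[1] : nums[0] - nums[1];
                v = (o2 == '+') ? v + nums[2] : v - nums[2];
                eqs.push_back(Equation{nums, o1, o2, v});
            }
        }
    } while (std::next_permutation(nums.begin(), nums.end()));
    // Permutations were generated in increasing order, so a stable sort by value
    // keeps equations with equal values in permutation order.
    std::stable_sort(eqs.begin(), eqs.end(),
                     [](const Equation& a, const Equation& b) { return a.value < b.value; });
    return eqs;
}
```

Printing e.text() for each element of buildEquations({1, 2, 3}) should reproduce the 24 lines of the sample output above.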
I'm trying to debug an EXC_BAD_ACCESS error in a C++ program. The program reads numbers as input and stores them as strings. I then want to return the ordering of those numbers that produces the largest possible number when they are concatenated.
The main function is:
int main() {
// Number of ints to input
int n;
std::cin >> n;
vector<string> a(n);
// Input numbers as strings
for (size_t i = 0; i < a.size(); i++) {
std::cin >> a[i];
}
std::sort(a.begin(), a.end(), is_greater_than_or_equal);
std::cout << largest_number(a);
std::cout << std::endl;
return 0;
}
Using lldb, it looks like the error comes from std::sort or from the is_greater_than_or_equal function.
is_greater_than_or_equal is:
bool is_greater_than_or_equal(string n1, string n2) {
int l1 = n1.length();
int l2 = n2.length();
int min = (l1 < l2) ? l1 : l2;
for (int i = 0; i < min; i++) {
// First digit is strictly larger
if (n1[i] > n2[i]) {
return true;
// First digit is strictly lower
} else if (n1[i] < n2[i]) {
return false;
// First digits are the same
} else {
// If first is single digit
if (l1 == 1) {
return true;
// If the second is single digit
} else if (l2 == 1) {
return false;
// If they're both multiple digits
} else {
int j = i + 1;
// Keep checking until hitting min
while (j < min) {
if (n1[j] > n2[j]) {
return true;
} else if (n1[j] < n2[j]) {
return false;
} else {
j++;
}
}
// If min was hit and nothing was returned
// choose integer with lowest length
if (l1 > l2) {
return false;
}
return true;
}
}
}
return false;
}
After checking with lldb, it seems that the sort keeps going well past a.end(). I believe this is what's causing the error, but I'm not sure why it would happen. I think so because examining the calls to is_greater_than_or_equal reveals frames like
(lldb) fr v
(std::__1::string) n1 = "9"
(std::__1::string) n2 = "2"
(int) l1 = 0
(int) l2 = 0
(int) min = 0
at first, but after a while they become
(lldb) fr v
(std::__1::string) n1 = ""
(std::__1::string) n2 = ""
(int) l1 = 0
(int) l2 = 0
(int) min = 0
which clearly should not happen. (I'm setting the breakpoint before l1 and l2 are assigned, which is why the lengths above don't match; the point is just to see n1 and n2.)
The output from lldb is
(lldb) r
Process 7405 launched: '/Users/etc' (x86_64)
100
2 8 2 3 6 4 1 1 10 6 3 3 6 1 3 8 4 6 1 10 8 4 10 4 1 3 2 3 2 6 1 5 2 9 8 5 10 8 7 9 6 4 2 6 3 8 8 9 8 2 9 10 3 10 7 5 7 1 7 5 1 4 7 6 1 10 5 4 8 4 2 7 8 1 1 7 4 1 1 9 8 6 5 9 9 3 7 6 3 10 8 10 7 2 5 1 1 9 9 5
Process 7405 stopped
* thread #1: tid = 0x2f2a6, 0x00007fff97c82051 libsystem_platform.dylib`_platform_memmove$VARIANT$Ivybridge + 49, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00007fff97c82051 libsystem_platform.dylib`_platform_memmove$VARIANT$Ivybridge + 49
libsystem_platform.dylib`_platform_memmove$VARIANT$Ivybridge:
-> 0x7fff97c82051 <+49>: rep
0x7fff97c82052 <+50>: movsb (%rsi), %es:(%rdi)
0x7fff97c82053 <+51>: popq %rbp
0x7fff97c82054 <+52>: retq
I'm not sure how to interpret this output, and I don't know much about the inner workings of std::sort.
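One likely culprit, offered as a sketch rather than a diagnosis confirmed by the lldb output, is the comparator itself. std::sort requires a strict weak ordering, and in particular comp(x, x) must be false. is_greater_than_or_equal returns true when both arguments are equal (for example "9" and "9", which both appear in the input), which makes the call undefined behavior. In practice the sort's inner loops can then run past the ends of the range, which would be consistent with the empty strings seen in the debugger. A common ordering for building the largest concatenation puts a before b when a + b > b + a. The names concatenatesLarger and largestNumber below are hypothetical, since the question's largest_number is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Comparator for std::sort: a goes before b if the concatenation a+b is larger than b+a.
// Both concatenations have the same length, so string comparison matches numeric comparison.
// It returns false for equal strings, as std::sort requires (comp(x, x) must be false).
bool concatenatesLarger(const std::string& a, const std::string& b) {
    return a + b > b + a;
}

// Hypothetical replacement for the question's largest_number(), which is not shown.
// Assumes every element is a non-empty string of decimal digits.
std::string largestNumber(std::vector<std::string> parts) {
    for (const std::string& p : parts) assert(!p.empty());  // "" would break the ordering
    std::sort(parts.begin(), parts.end(), concatenatesLarger);
    std::string result;
    for (const std::string& p : parts) result += p;
    return result;
}
```

With this ordering, {"3", "30", "34", "5", "9"} becomes "9534330" and {"1", "10"} becomes "110".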
I am trying to generate all r-element subsets of a set. For example, with n = 6 and r = 4 there are 15 possible combinations:
0 1 2 3
0 1 2 4
0 1 2 5
0 1 3 4
0 1 3 5
0 1 4 5
0 2 3 4
0 2 3 5
0 2 4 5
0 3 4 5
1 2 3 4
1 2 3 5
1 2 4 5
1 3 4 5
2 3 4 5
My current code works for the subsets above (n = 6, r = 4), and it also works for any other combination with n - r = 2. It does not work for anything else, and I'm having trouble debugging it because the code makes perfect sense to me. The code is:
int array[r];
int difference = n-r;
for(int i = 0; i < r; i++){
array[i] = i;
}
while (array[0] < difference){
print (array, r);
for(int i = r-1; i >= 0; i--){
if ((array[i] - i) == 0){
array[i] = array[i] + 1;
for (int j = i+1; j < r; j++){
array[j] = j + 1;
}
i = r;
}
else{
array[i] = array[i] + 1;
}
print (array, r);
}
}
}
To give some context, when I plug in n=6 and r=3, I am supposed to have 20 combinations as the output. Only 14 are printed, however:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
2 3 4
2 3 5
2 4 5
3 4 5
It prints the first and last combinations correctly, but I need all of the outputs printed and correct. After the third iteration the code goes wrong: it goes from 0 1 4 to 0 2 3 when it should go to 0 1 5. Any suggestions as to what I'm doing wrong?
Here's what I think you are trying to do. As far as I can tell, your main problem is that the main for loop should start over after incrementing an array element to a valid value, rather than continuing.
So this version only calls print in one place and uses break to get out of the main for loop. It also counts the combinations found.
#include <iostream>
void print(int array[], int r) {
for(int i=0; i<r; ++i) {
std::cout << array[i] << ' ';
}
std::cout << '\n';
}
int main() {
static const int n = 6;
static const int r = 3;
static const int difference = n-r;
int array[r];
for(int i = 0; i < r; i++) {
array[i] = i;
}
int count = 0;
while(array[0] <= difference) {
++count;
print(array, r);
for(int i=r-1; i>=0; --i) {
++array[i];
if(array[i] <= difference + i) {
for(int j=i+1; j<r; ++j) {
array[j] = array[j-1] + 1;
}
break;
}  // if
}  // for
}  // while
std::cout << "count: " << count << '\n';
}
Output:
0 1 2
0 1 3
0 1 4
0 1 5
0 2 3
0 2 4
0 2 5
0 3 4
0 3 5
0 4 5
1 2 3
1 2 4
1 2 5
1 3 4
1 3 5
1 4 5
2 3 4
2 3 5
2 4 5
3 4 5
count: 20
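For comparison, a different technique can be used; it is not the one in the answer above. Keep a 0/1 selection mask that starts as r ones followed by n - r zeros, and step through its permutations with std::prev_permutation. Each lexicographically smaller mask corresponds to the next combination in increasing order. This is a minimal sketch, and combinations is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns every r-element subset of {0, ..., n-1}, in lexicographic order.
// The mask starts as 1...10...0 (r ones first); std::prev_permutation moves it to the
// next lexicographically smaller mask, which selects the next larger combination.
std::vector<std::vector<int>> combinations(int n, int r) {
    assert(0 <= r && r <= n);
    std::vector<int> mask(n, 0);
    std::fill(mask.begin(), mask.begin() + r, 1);
    std::vector<std::vector<int>> result;
    do {
        std::vector<int> combo;
        for (int i = 0; i < n; ++i) {
            if (mask[i]) combo.push_back(i);  // position i is selected
        }
        result.push_back(combo);
    } while (std::prev_permutation(mask.begin(), mask.end()));
    return result;
}
```

combinations(6, 3) yields the same 20 lines as the output above, in the same order.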
I'm relatively new to CUDA programming. I understand the programming model and have already written a few basic kernels. I know how to apply a kernel to each element of a matrix (stored as a 1D array), but now I'm trying to figure out how to apply the same operation to every row or column of the input matrix.
Let's say I have an MxN matrix and a vector of length N. I would like to add the vector to each row of the matrix (but it could be any other math operation).
The serial code of such operation is:
for (int c = 0; c < columns; c++)
{
for (int r = 0; r < rows; r++)
{
M[r * rows + c] += V[c];
}
}
The CUDA code for this operation should be quite straightforward: I spawn as many CUDA threads as there are matrix elements and apply this kernel:
__global__ void kernel(const unsigned int size, float* matrix, const float* vector)
{
// get the current element index for the thread
unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < size)
{
// sum the current element with the
matrix[idx] += vector[threadIdx.x];
}
}
It runs but the result is not correct. Actually, it's correct if I transpose the matrix after the kernel completes its work. Unfortunately, I have no clue why it works in this way. Could you help me to figure out this problem? Thanks in advance.
EDIT #1
I launch the kernel using:
int block_size = 64;
int grid_size = (M * N + block_size - 1) / block_size;
kernel<<<grid_size, block_size>>>(M * N, matrix, vector);
EDIT #2
I solved the problem by fixing the CPU code as suggested by @RobertCrovella:
M[r * columns + c] += V[c];
The row stride must be the number of columns (the bound of the outer for loop), not the number of rows.
The kernel shown in the question could be used without modification to sum a vector to each of the rows of a matrix (assuming c-style row-major storage), subject to certain limitations. A demonstration is here.
The main limitation of that approach is that the maximum vector length and therefore matrix width that can be handled is equal to the maximum number of threads per block, which on current CUDA 7-supported GPUs is 1024.
We can eliminate that limitation with a slight modification to the vector indexing, and by passing the row width (number of columns) as a parameter to the kernel. With this modification, we should be able to handle arbitrary matrix (and vector) sizes.
EDIT: Based on the discussion in the comments, the OP wants to know how to handle row-major or column-major underlying storage. The following example uses a templated kernel to select either row-major or column-major underlying storage, and also shows one possible CUBLAS method for doing an add-vector-to-each-matrix-row operation using a rank-1 update function:
$ cat t712.cu
#include <iostream>
#include <cublas_v2.h>
#define ROWS 20
#define COLS 10
#define nTPB 64
#define ROW_MAJOR 0
#define COL_MAJOR 1
template <int select, typename T>
__global__ void vec_mat_row_add(const unsigned int height, const unsigned int width, T* matrix, const T* vector)
{
// get the current element index for the thread
unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < height*width)
{
// sum the current element with the
if (select == ROW_MAJOR)
matrix[idx] += vector[idx%width];
else // COL_MAJOR
matrix[idx] += vector[idx/height];
}
}
int main(){
float *h_mat, *d_mat, *h_vec, *d_vec;
const unsigned int msz = ROWS*COLS*sizeof(float);
const unsigned int vsz = COLS*sizeof(float);
h_mat = (float *)malloc(msz);
h_vec = (float *)malloc(vsz);
cudaMalloc(&d_mat, msz);
cudaMalloc(&d_vec, vsz);
for (int i=0; i<COLS; i++) h_vec[i] = i; // set vector to 0,1,2, ...
cudaMemcpy(d_vec, h_vec, vsz, cudaMemcpyHostToDevice);
// test row-major case
cudaMemset(d_mat, 0, msz); // set matrix to zero
vec_mat_row_add<ROW_MAJOR><<<(ROWS*COLS + nTPB -1)/nTPB, nTPB>>>(ROWS, COLS, d_mat, d_vec);
cudaMemcpy(h_mat, d_mat, msz, cudaMemcpyDeviceToHost);
std::cout << "Row-major result: " << std::endl;
for (int i = 0; i < ROWS; i++){
for (int j = 0; j < COLS; j++) std::cout << h_mat[i*COLS+j] << " ";
std::cout << std::endl;}
// test column-major case
cudaMemset(d_mat, 0, msz); // set matrix to zero
vec_mat_row_add<COL_MAJOR><<<(ROWS*COLS + nTPB -1)/nTPB, nTPB>>>(ROWS, COLS, d_mat, d_vec);
cudaMemcpy(h_mat, d_mat, msz, cudaMemcpyDeviceToHost);
std::cout << "Column-major result: " << std::endl;
for (int i = 0; i < ROWS; i++){
for (int j = 0; j < COLS; j++) std::cout << h_mat[j*ROWS+i] << " ";
std::cout << std::endl;}
// test CUBLAS, doing matrix-vector add using <T>ger
cudaMemset(d_mat, 0, msz); // set matrix to zero
float *d_ones, *h_ones;
h_ones = (float *)malloc(ROWS*sizeof(float));
for (int i =0; i<ROWS; i++) h_ones[i] = 1.0f;
cudaMalloc(&d_ones, ROWS*sizeof(float));
cudaMemcpy(d_ones, h_ones, ROWS*sizeof(float), cudaMemcpyHostToDevice);
cublasHandle_t ch;
cublasCreate(&ch);
float alpha = 1.0f;
cublasStatus_t stat = cublasSger(ch, ROWS, COLS, &alpha, d_ones, 1, d_vec, 1, d_mat, ROWS);
if (stat != CUBLAS_STATUS_SUCCESS) {std::cout << "CUBLAS error: " << (int)stat << std::endl; return 1;}
cudaMemcpy(h_mat, d_mat, msz, cudaMemcpyDeviceToHost);
std::cout << "CUBLAS Column-major result: " << std::endl;
for (int i = 0; i < ROWS; i++){
for (int j = 0; j < COLS; j++) std::cout << h_mat[j*ROWS+i] << " ";
std::cout << std::endl;}
return 0;
}
$ nvcc -o t712 t712.cu -lcublas
$ ./t712
Row-major result:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
Column-major result:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
CUBLAS Column-major result:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
$
For brevity of presentation, I've not included proper CUDA error checking, but that is always a good idea any time you are having trouble with a CUDA code. As a proxy/shortcut, you can run your code with cuda-memcheck as a quick check for any CUDA errors.
Note that we expect all 3 printouts to be identical because that is actually the correct way to display the matrix, regardless of whether the underlying storage is row-major or column-major. The difference in underlying storage is accounted for in the for-loops handling the display output.
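To check results like these without a GPU, the kernel's linear-index arithmetic can be mirrored on the host. This plain C++ sketch uses a hypothetical function name, addVectorToRows. Its column-major branch assumes the same layout as the COL_MAJOR case above, where element (r, c) is stored at c * height + r.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference for adding a length-width vector to every row of a height x width
// matrix stored in one flat array, using the same formulas as vec_mat_row_add:
// row-major -> vector[idx % width], column-major -> vector[idx / height].
void addVectorToRows(std::vector<float>& matrix, std::size_t height, std::size_t width,
                     const std::vector<float>& vec, bool columnMajor) {
    assert(matrix.size() == height * width && vec.size() == width);
    for (std::size_t idx = 0; idx < matrix.size(); ++idx) {
        std::size_t col = columnMajor ? idx / height : idx % width;  // column of element idx
        matrix[idx] += vec[col];
    }
}
```

Comparing this function's output with the array copied back from the device is one way to validate a kernel on small sizes.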
Robert Crovella has already answered this question with examples using explicit CUDA kernels and cuBLAS.
For future reference, I find it useful to also show how to perform row-wise or column-wise operations using CUDA Thrust. In particular, I'm focusing on two problems:
Summing a column vector to all matrix columns;
Summing a row vector to all matrix rows.
The generality of thrust::transform makes it easy to extend the example below to elementwise operations other than the sum (e.g., multiplication, division, subtraction, etc.).
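#include <cstdio>    // printf
#include <iostream>  // std::cout
#include <thrust/device_vector.h>
#include <thrust/functional.h>  // thrust::plus, placeholders
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/random.h>
#include <thrust/transform.h>
using namespace thrust::placeholders;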
/*************************************/
/* CONVERT LINEAR INDEX TO ROW INDEX */
/*************************************/
template <typename T>
struct linear_index_to_row_index : public thrust::unary_function<T,T> {
T Ncols; // --- Number of columns
__host__ __device__ linear_index_to_row_index(T Ncols) : Ncols(Ncols) {}
__host__ __device__ T operator()(T i) { return i / Ncols; }
};
/********/
/* MAIN */
/********/
int main()
{
/**************************/
/* SETTING UP THE PROBLEM */
/**************************/
const int Nrows = 10; // --- Number of rows
const int Ncols = 3; // --- Number of columns
// --- Random uniform integer distribution between 0 and 100
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist1(0, 100);
// --- Random uniform integer distribution between 1 and 4
thrust::uniform_int_distribution<int> dist2(1, 4);
// --- Matrix allocation and initialization
thrust::device_vector<float> d_matrix(Nrows * Ncols);
for (size_t i = 0; i < d_matrix.size(); i++) d_matrix[i] = (float)dist1(rng);
// --- Column vector allocation and initialization
thrust::device_vector<float> d_column(Nrows);
for (size_t i = 0; i < d_column.size(); i++) d_column[i] = (float)dist2(rng);
// --- Row vector allocation and initialization
thrust::device_vector<float> d_row(Ncols);
for (size_t i = 0; i < d_row.size(); i++) d_row[i] = (float)dist2(rng);
printf("\n\nOriginal matrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << "[ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix[i * Ncols + j] << " ";
std::cout << "]\n";
}
printf("\n\nColumn vector\n");
for(int i = 0; i < Nrows; i++) std::cout << d_column[i] << "\n";
printf("\n\nRow vector\n");
for(int i = 0; i < Ncols; i++) std::cout << d_row[i] << " ";
/*******************************************************/
/* ADDING THE SAME COLUMN VECTOR TO ALL MATRIX COLUMNS */
/*******************************************************/
thrust::device_vector<float> d_matrix2(d_matrix);
thrust::transform(d_matrix.begin(), d_matrix.end(),
thrust::make_permutation_iterator(
d_column.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator(0), linear_index_to_row_index<int>(Ncols))),
d_matrix2.begin(),
thrust::plus<float>());
printf("\n\nColumn + Matrix -> Result matrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << "[ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix2[i * Ncols + j] << " ";
std::cout << "]\n";
}
/*************************************************/
/* ADDING THE SAME ROW VECTOR TO ALL MATRIX ROWS */
/*************************************************/
thrust::device_vector<float> d_matrix3(d_matrix);
thrust::transform(thrust::make_permutation_iterator(
d_matrix.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator(0),(_1 % Nrows) * Ncols + _1 / Nrows)),
thrust::make_permutation_iterator(
d_matrix.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator(0),(_1 % Nrows) * Ncols + _1 / Nrows)) + Nrows * Ncols,
thrust::make_permutation_iterator(
d_row.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator(0), linear_index_to_row_index<int>(Nrows))),
thrust::make_permutation_iterator(
d_matrix3.begin(),
thrust::make_transform_iterator(thrust::make_counting_iterator(0),(_1 % Nrows) * Ncols + _1 / Nrows)),
thrust::plus<float>());
printf("\n\nRow + Matrix -> Result matrix\n");
for(int i = 0; i < Nrows; i++) {
std::cout << "[ ";
for(int j = 0; j < Ncols; j++)
std::cout << d_matrix3[i * Ncols + j] << " ";
std::cout << "]\n";
}
return 0;
}
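The permutation-iterator index maps are the least obvious part of the Thrust code, so here is a host-side mirror of both transforms as a plain C++ sketch. addColumnVector and addRowVector are hypothetical names. The matrix is assumed to be row-major, as in the Thrust example, and i plays the role of the counting iterator's value.

```cpp
#include <cassert>
#include <vector>

// Column vector: element i lies in row i / Ncols, so it gets column[i / Ncols].
std::vector<float> addColumnVector(const std::vector<float>& m, int Nrows, int Ncols,
                                   const std::vector<float>& column) {
    assert(static_cast<int>(m.size()) == Nrows * Ncols);
    std::vector<float> out(m.size());
    for (int i = 0; i < Nrows * Ncols; ++i) out[i] = m[i] + column[i / Ncols];
    return out;
}

// Row vector: i walks the matrix column by column; (i % Nrows) * Ncols + i / Nrows is
// the row-major position of the i-th element of that walk, and i / Nrows is its column.
std::vector<float> addRowVector(const std::vector<float>& m, int Nrows, int Ncols,
                                const std::vector<float>& row) {
    assert(static_cast<int>(m.size()) == Nrows * Ncols);
    std::vector<float> out(m.size());
    for (int i = 0; i < Nrows * Ncols; ++i) {
        const int pos = (i % Nrows) * Ncols + i / Nrows;
        out[pos] = m[pos] + row[i / Nrows];
    }
    return out;
}
```

On row-major storage, the row-vector case can also be written without any permutation, as out[i] = m[i] + row[i % Ncols]. The column-wise walk used above is one of several valid choices.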
I tried solving this interview question. My code passes the sample test cases but fails all of the real (hidden) test cases. I tried hard to find the mistake but was unable to. My code is below the problem statement.
Bob loves sorting very much. He is always thinking of new ways to sort an array. His friend Ram gives him a challenging task: he gives Bob an array and an integer K, and the challenge is to produce the lexicographically minimal array after at most K swaps. Only consecutive pairs of elements can be swapped. Help Bob by returning the lexicographically minimal array possible after at most K swaps.
Input: The first line contains an integer T, the number of test cases. T test cases follow. Each test case has 2 lines. The first line contains N (the number of elements in the array) and K (the number of swaps). The second line contains the N integers of the array.
Output: Print the lexicographical minimal array.
Constraints:
1 <= T <= 10
1 <= N,K <= 1000
1 <= A[i] <= 1000000
Sample Input
2
3 2
5 3 1
5 3
8 9 11 2 1
Sample Output
1 5 3
2 8 9 11 1
Explanation
After swap 1:
5 1 3
After swap 2:
1 5 3
{1,5,3} is lexicographically smaller than {5,1,3}.
Example 2:
Swap 1: 8 9 2 11 1
Swap 2: 8 2 9 11 1
Swap 3: 2 8 9 11 1
#include <iostream>
using namespace std;
void trySwap(int a[], int start, int end, int *rem)
{
//cout << start << " " << end << " " << *rem << endl;
while (*rem != 0 && end > start)
{
if (a[end - 1] > a[end])
{
swap(a[end - 1], a[end]);
(*rem)--;
end--;
}
else end--;
}
}
void getMinimalLexicographicArray(int a[], int size, int k)
{
int start , rem = k, window = k;
for (start = 0; start < size; start++)
{
window = rem;
if (rem == 0)
return;
else
{
//cout << start << " " << rem << endl;
int end = start + window;
if (end >= size)
{
end = size - 1;
}
trySwap(a, start, end, &rem);
}
}
}
int main()
{
int T, N, K;
int a[1000];
int i, j;
cin >> T;
for (i = 0; i < T; i++)
{
cin >> N;
cin >> K;
for (j = 0; j < N; j++)
{
cin >> a[j];
}
getMinimalLexicographicArray(a, N, K);
for (j = 0; j < N; j++)
cout << a[j] << " ";
cout << endl;
}
return 0;
}
This Python solution can easily be translated to C++:
def findMinArray(arr, k):
    i = 0
    n = len(arr)
    while k > 0 and i < n:
        min_idx = i
        hi = min(n, i + k + 1)
        for j in range(i, hi):
            if arr[j] < arr[min_idx]:
                min_idx = j
        for j in range(min_idx, i, -1):
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
        k -= min_idx - i
        i += 1
    return arr
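A direct C++ translation of the Python function, as a sketch follows. It keeps the name findMinArray, and taking the vector by value is a choice made here. For each position i, it looks at most k places ahead, moves the leftmost minimum in that window to position i using adjacent swaps, and subtracts those swaps from k.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns the lexicographically smallest array reachable with at most k swaps
// of adjacent elements (translation of the Python findMinArray above).
std::vector<int> findMinArray(std::vector<int> arr, int k) {
    const int n = static_cast<int>(arr.size());
    for (int i = 0; i < n && k > 0; ++i) {
        // Only elements at most k positions to the right can reach position i.
        const int hi = std::min(n, i + k + 1);
        int minIdx = i;
        for (int j = i; j < hi; ++j) {
            if (arr[j] < arr[minIdx]) minIdx = j;  // strict <: leftmost minimum, fewest swaps
        }
        // Bubble the minimum left to position i, one adjacent swap at a time.
        for (int j = minIdx; j > i; --j) std::swap(arr[j - 1], arr[j]);
        k -= minIdx - i;
    }
    return arr;
}
```

Its worst case is O(N^2) comparisons and swaps, which should be acceptable for N <= 1000. The problem's reading and printing loop can wrap it unchanged.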
Here are two failed test cases.
2
2 2
2 1
5 3
3 2 1 5 4
In the first, your code makes no swaps, because K >= N. In the second, your code swaps 5 and 4 when it should spend its third swap on 3 and 2.
EDIT: the new version is still too greedy. The correct output for
1
10 10
5 4 3 2 1 10 9 8 7 6
is
1 2 3 4 5 10 9 8 7 6
.