Reading serial bytes into an array on Arduino - C++

I've read the TX line of a controller with a logic analyzer. I know that it works at 1200 baud and I have identified the frames according to this photo:
Frames
I have identified in the frame:
Byte 1 - always 54
Byte 2 - sequential, incremented by one per frame
Byte 3 - always 0
Bytes 4, 5 and 6 - data
Byte 7 - always 0
Bytes 8, 9, 10 and 11 - data
Bytes 12, 13, 14 and 15 - they vary (I understand that byte 15 is a checksum)
I cannot identify the checksum (I suspect Checksum8 XOR, due to its similarity to another controller).
I am trying to store each byte received by the Arduino in a position of an array, knowing that the first byte is constant (54) and the frame is always the same length.
Could it be that, because the Arduino loop runs faster than the serial port, the same data gets duplicated into every position of the array?
Looking for information, I have read that a plain Serial.print works correctly (each byte is not repeated), but when I fill an array inside while (Serial.available()) it fails.
Here are some frames obtained through the Arduino with:
#include <Arduino.h>

void setup() {
  Serial.begin(9600);
  Serial1.begin(1200);
}

void loop() {
  if (Serial1.available()) {
    int test = Serial1.read();
    Serial.println(test);
  }
}
54 154 0 84 84 84 0 84 84 84 84 201 133 84 224
54 155 0 89 89 89 0 89 89 89 89 206 138 89 233
54 156 0 2 2 2 0 2 2 2 2 119 51 2 238
54 157 0 7 7 7 0 7 7 7 7 124 56 7 239
54 158 0 0 0 0 0 0 0 0 0 117 49 0 236
54 159 0 5 5 5 0 5 5 5 5 122 54 5 229
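For reference, here is a minimal sketch of the kind of frame assembly I am aiming for, assuming every frame is exactly 15 bytes and always starts with 54 (the buffering scheme and names are only illustrative):

#include <Arduino.h>

const uint8_t FRAME_LENGTH = 15;  // assumption: every frame is 15 bytes long
const uint8_t START_BYTE   = 54;  // first byte of every frame

uint8_t frame[FRAME_LENGTH];
uint8_t pos = 0;

void setup() {
  Serial.begin(9600);
  Serial1.begin(1200);
}

void loop() {
  while (Serial1.available()) {
    uint8_t b = Serial1.read();

    // wait for the start byte before filling the buffer
    if (pos == 0 && b != START_BYTE) continue;

    frame[pos++] = b;

    if (pos == FRAME_LENGTH) {
      // a complete frame is now in frame[0..14]; print it once
      for (uint8_t i = 0; i < FRAME_LENGTH; i++) {
        Serial.print(frame[i]);
        Serial.print(' ');
      }
      Serial.println();
      pos = 0;  // start collecting the next frame
    }
  }
}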
Any help is welcome
Thank you very much

Related

How to correctly compress a vector using ZSTD simple API?

I'm new to C++ and I wanted to compress a vector with the ZSTD compression library. I used the ZSTD simple API, ZSTD_compress and ZSTD_decompress, in the same way as the example. But I found a weird issue: when I compressed and then decompressed a vector, the decompressed vector was not the same as the original vector. I'm not sure which part of my operation went wrong.
I looked at ZSTD's GitHub homepage and didn't find an answer. Please help or try to give some ideas on how to solve it.
Example C code: https://github.com/facebook/zstd/blob/dev/examples/simple_compression.c
// Initialize a vector
vector<int> NumToCompress;
NumToCompress.resize(10000);
for (int i = 0; i < 10000; i++)
{
    NumToCompress[i] = rand() % 255;
}

// Compress
int* com_ptr = NULL;
size_t NumSize = NumToCompress.size();
size_t Boundsize = ZSTD_compressBound(NumSize);
com_ptr = (int*)malloc(Boundsize);
size_t ComSize;
ComSize = ZSTD_compress(com_ptr, Boundsize, NumToCompress.data(), NumToCompress.size(), ZSTD_fast);

// Decompress
int* decom_ptr = NULL;
unsigned long long decom_Boundsize;
decom_Boundsize = ZSTD_getFrameContentSize(com_ptr, ComSize);
decom_ptr = (int*)malloc(decom_Boundsize);
size_t DecomSize;
DecomSize = ZSTD_decompress(decom_ptr, decom_Boundsize, com_ptr, ComSize);
vector<int> NumAfterDecompress(decom_ptr, decom_ptr + DecomSize);

// Check if the two vectors are the same
if (NumToCompress == NumAfterDecompress)
{
    cout << "Two vectors are same" << endl;
}
else
{
    cout << "Two vectors are insame" << endl;
}

free(com_ptr);
free(decom_ptr);
Question 1: Can zstd compress a std::vector directly?
Question 2: If it cannot compress a std::vector directly, how do I properly compress a vector with zstd?
Two vectors are insame
Original vector:
163 151 162 85 83 190 241 252 249 121 107 82 20 19 233 226 45 81 142 31 86 8 87 39 167 5 212 208 82 130 119 117 27 153 74 237 88 61 106 82 54 213 36 74 104 142 173 149 95 60 53 181 196 140 221 108 17 50 61 226 180 180 89 207 206 35 61 39 223 167 249 150 252 30 224 102 44 14 123 140 202 48 66 143 188 159 123 206 209 184 177 135 236 138 214 187 46 21 99 14
Decompressed vector:
163 151 162 85 83 190 241 252 249 121 107 82 20 19 233 226 45 81 142 31 86 8 87 39 167 0 417 0551929248 21916 551935408 21916 551933352 21916 551939512 21916 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It seems likely that the API expects the sizes to be in bytes, but you give the size as the number of elements. So you need to multiply the number of elements by the size of each element, like this:
ComSize = ZSTD_compress(com_ptr, Boundsize, NumToCompress.data(),
NumToCompress.size()*sizeof(int), ZSTD_fast);
and similarly when you decompress you need to divide by the element size
DecomSize = ZSTD_decompress(decom_ptr, decom_Boundsize, com_ptr, ComSize);
vector<int> NumAfterDecompress(decom_ptr, decom_ptr+DecomSize/sizeof(int));
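For completeness, here is a hedged, self-contained sketch of the corrected round trip; the error checks and the plain compression level of 1 (used here instead of ZSTD_fast) are additions for illustration only:

#include <cstdlib>
#include <iostream>
#include <vector>
#include <zstd.h>

int main()
{
    std::vector<int> NumToCompress(10000);
    for (int i = 0; i < 10000; i++)
        NumToCompress[i] = rand() % 255;

    // Every size handed to zstd is in bytes, not in elements.
    size_t srcBytes = NumToCompress.size() * sizeof(int);
    size_t boundBytes = ZSTD_compressBound(srcBytes);
    std::vector<char> compressed(boundBytes);

    size_t comSize = ZSTD_compress(compressed.data(), boundBytes,
                                   NumToCompress.data(), srcBytes, 1);
    if (ZSTD_isError(comSize)) {
        std::cout << "compress failed: " << ZSTD_getErrorName(comSize) << std::endl;
        return 1;
    }

    unsigned long long decomBytes = ZSTD_getFrameContentSize(compressed.data(), comSize);
    std::vector<int> NumAfterDecompress(decomBytes / sizeof(int));

    size_t decomSize = ZSTD_decompress(NumAfterDecompress.data(), decomBytes,
                                       compressed.data(), comSize);
    if (ZSTD_isError(decomSize)) {
        std::cout << "decompress failed: " << ZSTD_getErrorName(decomSize) << std::endl;
        return 1;
    }

    if (NumToCompress == NumAfterDecompress)
        std::cout << "Two vectors are same" << std::endl;
    else
        std::cout << "Two vectors are insame" << std::endl;
    return 0;
}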

CLion + std::ios_base::sync_with_stdio(false) makes std::getline read only the first 509 characters

Please forgive me for my naive English and ignorance as I am new to programming and C++.
Trying my best to provide basic information:
I use CLion (student version) to code in C++.
In my Windows 10 notebook it seems to be named CLion 2019.3.5 x64.
It is recently updated to Version 2020.1.
Installed MinGW (I believe latest, but I'm not sure, sorry).
My question:
Well... As in title, I tried to use std::getline() to read some text.
Not std::cin.getline(), but std::getline() (sorry that I don't quite know the difference, but I found them to be different functions requiring different parameters).
I use std::getline() because (obviously) I want to read a string with spaces between.
However, it seems that it only reads the first 509 characters of my input...
But WHY???????
It would seem more plausible if it read the first 512 = 2^9 characters or something like that.
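(A side note on that parenthesis above: as far as I can tell, std::cin.getline() is the istream member function that reads into a fixed-size char buffer, while std::getline() is the free function that reads into a std::string. A tiny illustration, unrelated to the bug itself:)

#include <iostream>
#include <string>

int main() {
    char buf[100];
    std::cin.getline(buf, sizeof(buf));  // member function: fills a fixed-size char array
    std::string line;
    std::getline(std::cin, line);        // free function: fills a std::string of any length
    std::cout << buf << '\n' << line << '\n';
    return 0;
}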
Example:
This is just one of the code snippets I tested.
#include <iostream>

int main() {
    std::ios_base::sync_with_stdio(false); // THIS IS SOMETHING REMARKABLE in the context of my problem, which I explain below
    std::string testing;
    std::getline(std::cin, testing); // There is no '\n' at the beginning, so it should work well.
    std::cout << "testing.max_size() = " << testing.max_size() << '\n' // Just testing; should be 2147483647 or so, so the limit on how many characters a string can store is not the problem.
              << "testing.length() = " << testing.length() << '\n'
              << R"(testing = ")" << testing << '"' << std::endl;
    return 0;
}
Input:
(Some long string with length 689, actually for my own purpose)
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 60 29 61 31 62 33 63 35 64 37 65 39 66 41 67 43 68 45 69 47 70 49 71 51 72 53 73 55 74 19 75 29 76 7 77 31 78 15 79 33 80 7 81 35 82 11 83 37 84 19 85 39 86 15 87 41 88 11 89 43 90 27 91 45 92 25 93 47 94 23 95 49 96 21 97 51 98 3 99 53 100 3 101 55 102 17 103 13 104 27 105 25 106 23 107 21 108 9 109 5 110 13 111 17 112 5 113 9 114 0 115 0 116 0 117 0 118 0 119 0 120 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Output:
testing.max_size() = 2147483647
testing.length() = 509
testing = "60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 9
6 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 60 29 61 31 62 33 63 35 6
4 37 65 39 66 41 67 43 68 45 69 47 70 49 71 51 72 53 73 55 74 19 75 29 76 7 77 31 78 15 79 33 80 7 81 35 82 11 83 37 84
19 85 39 86 15 87 41 88 11 89 43 90 27 91 45 92 25 93 47 94 23 95 49 96 21 97 51 98 3 99 53 100 3 101 55 102 17 103 13 1
04 27 105 25 106 23 107 21 108 9 109 5 1"
(In the CLion console, lines are often broken in two; I don't know why, but it doesn't matter in the context of my question.)
Explanation:
As you can see, testing.length() == 509. I have tested with C++14 as well as C++20, in this CLion project as well as in other projects, and of course with other texts (there is no special character at position 509; I tried replacing it with other characters and still got the same result).
And by the way, I am in the habit of using std::ios_base::sync_with_stdio(false) to increase the speed of input and output, which works nicely for me (I am used to putting this line at the beginning of my programs).
However (and this may save your testing time), the problem is with std::ios_base::sync_with_stdio(false). When I remove this line, the program works very well, even for strings a few times longer.
However, I don't like omitting this line, as I think it does increase the speed of i/o and it is not necessary to remove it in general.
Also, it is very specific to CLion (as far as I know about C++ and programming).
I tested (with std::ios_base::sync_with_stdio(false)) in some online C++ compiler and it works fine (testing.length() == 689 for my input):
https://www.onlinegdb.com/online_c++_compiler
https://www.jdoodle.com/online-compiler-c++/
Summary of my problem:
Why, in CLion specifically (but not in some online compilers), does std::ios_base::sync_with_stdio(false) make std::getline() read at most exactly 509 characters (I tested on several projects)?
This doesn't seem reasonable to me...
More:
I tried to search for similar problems, but sorry, I can't find any (very) relevant posts.
I would like some detailed but comprehensible answers, if possible, as I am not an expert in C++ (though I am willing to learn some more about it).
I hope I can solve this problem without giving up CLion, std::ios_base::sync_with_stdio(false) (competitive programming!), or std::getline(). Thanks.
Yes, I know the performance increase is normally small (see the answer to this post), but most of the time std::ios_base::sync_with_stdio(false) is important to me, so is there some fix or workaround that doesn't require std::ios_base::sync_with_stdio(true)? Even if not, I would wonder why...
Any help would be appreciated, thank you!

Next higher number with one zero bit

Today I ran into this problem and couldn't solve it after some time. I need some help.
I have a number N. The problem is to find the next higher number (> N) with only one zero bit in binary.
Example:
Number 1 can be represented in binary as 1.
Next higher number with only one zero bit is 2 - Binary 10
A few other examples:
N = 2 (10), next higher number with one zero bit is 5 (101)
N = 5 (101), next higher number is 6 (110)
N = 7 (111), next higher number is 11 (1011)
List of the first 200 numbers:
1 1
2 10 - 1
3 11
4 100
5 101 - 1
6 110 - 1
7 111
8 1000
9 1001
10 1010
11 1011 - 1
12 1100
13 1101 - 1
14 1110 - 1
15 1111
16 10000
17 10001
18 10010
19 10011
20 10100
21 10101
22 10110
23 10111 - 1
24 11000
25 11001
26 11010
27 11011 - 1
28 11100
29 11101 - 1
30 11110 - 1
31 11111
32 100000
33 100001
34 100010
35 100011
36 100100
37 100101
38 100110
39 100111
40 101000
41 101001
42 101010
43 101011
44 101100
45 101101
46 101110
47 101111 - 1
48 110000
49 110001
50 110010
51 110011
52 110100
53 110101
54 110110
55 110111 - 1
56 111000
57 111001
58 111010
59 111011 - 1
60 111100
61 111101 - 1
62 111110 - 1
63 111111
64 1000000
65 1000001
66 1000010
67 1000011
68 1000100
69 1000101
70 1000110
71 1000111
72 1001000
73 1001001
74 1001010
75 1001011
76 1001100
77 1001101
78 1001110
79 1001111
80 1010000
81 1010001
82 1010010
83 1010011
84 1010100
85 1010101
86 1010110
87 1010111
88 1011000
89 1011001
90 1011010
91 1011011
92 1011100
93 1011101
94 1011110
95 1011111 - 1
96 1100000
97 1100001
98 1100010
99 1100011
100 1100100
101 1100101
102 1100110
103 1100111
104 1101000
105 1101001
106 1101010
107 1101011
108 1101100
109 1101101
110 1101110
111 1101111 - 1
112 1110000
113 1110001
114 1110010
115 1110011
116 1110100
117 1110101
118 1110110
119 1110111 - 1
120 1111000
121 1111001
122 1111010
123 1111011 - 1
124 1111100
125 1111101 - 1
126 1111110 - 1
127 1111111
128 10000000
129 10000001
130 10000010
131 10000011
132 10000100
133 10000101
134 10000110
135 10000111
136 10001000
137 10001001
138 10001010
139 10001011
140 10001100
141 10001101
142 10001110
143 10001111
144 10010000
145 10010001
146 10010010
147 10010011
148 10010100
149 10010101
150 10010110
151 10010111
152 10011000
153 10011001
154 10011010
155 10011011
156 10011100
157 10011101
158 10011110
159 10011111
160 10100000
161 10100001
162 10100010
163 10100011
164 10100100
165 10100101
166 10100110
167 10100111
168 10101000
169 10101001
170 10101010
171 10101011
172 10101100
173 10101101
174 10101110
175 10101111
176 10110000
177 10110001
178 10110010
179 10110011
180 10110100
181 10110101
182 10110110
183 10110111
184 10111000
185 10111001
186 10111010
187 10111011
188 10111100
189 10111101
190 10111110
191 10111111 - 1
192 11000000
193 11000001
194 11000010
195 11000011
196 11000100
197 11000101
198 11000110
199 11000111
200 11001000
There are three cases.
The number x has more than one zero bit in its binary representation. All but one of these zero bits must be "filled in" with 1 to obtain the required result. Notice that all numbers obtained by taking x and filling in one or more of its low-order zero bits are numerically closer to x compared to the number obtained by filling just the top-most zero bit. Therefore the answer is the number x with all-but-one of its zero bits filled: only its topmost zero bit remains unfilled. For example if x=110101001 then the answer is 110111111. To get the answer, find the index i of the topmost zero bit of x, and then calculate the bitwise OR of x and 2^i - 1.
C code for this case:
// warning: this assumes x is known to have *some* (>1) zeros!
unsigned next(unsigned x)
{
    unsigned topmostzero = 0;
    unsigned bit = 1;
    while (bit && bit <= x) {
        if (!(x & bit)) topmostzero = bit;
        bit <<= 1;
    }
    return x | (topmostzero - 1);
}
The number x has no zero bits in binary. It means that x=2^n - 1 for some number n. By the same reasoning as above, the answer is then 2^n + 2^(n-1) - 1. For example, if x=111, then the answer is 1011.
The number x has exactly one zero bit in its binary representation. We know that the result must be strictly larger than x, so x itself is not allowed to be the answer. If x has the only zero in its least-significant bit, then this case reduces to case #2. Otherwise, the zero should be moved one position to the right. Assuming x has zero in its i-th bit, the answer should have its zero in i-1-th bit. For example, if x=11011, then the result is 11101.
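Putting the three cases together, one possible combined function could look like the following hedged sketch (nextOneZero is an illustrative name; it assumes x > 0 and that the result fits in an unsigned int):

#include <iostream>

unsigned nextOneZero(unsigned x)
{
    unsigned zeros = 0;       // zero bits strictly below the most significant 1
    unsigned topZeroBit = 0;  // mask of the topmost such zero bit
    for (unsigned bit = 1; bit && bit <= x; bit <<= 1) {
        if (!(x & bit)) {
            ++zeros;
            topZeroBit = bit;
        }
    }

    if (zeros > 1)                        // case 1: fill all but the topmost zero
        return x | (topZeroBit - 1);

    if (zeros == 0 || topZeroBit == 1) {  // case 2, or case 3 with the zero in the LSB
        unsigned n = 0;
        while ((1u << n) <= x) ++n;       // n = bit length of x
        return (1u << n) + (1u << (n - 1)) - 1;
    }

    // case 3: exactly one zero bit, not in the LSB: move it one place to the right
    return (x | topZeroBit) & ~(topZeroBit >> 1);
}

int main()
{
    // matches the examples above: 2 -> 5, 5 -> 6, 7 -> 11, 27 -> 29
    std::cout << nextOneZero(2) << ' ' << nextOneZero(5) << ' '
              << nextOneZero(7) << ' ' << nextOneZero(27) << '\n';
    return 0;
}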
You could also use another approach:
Every number with exactly one zero bit can be represented as
2^n - 1 - 2^m
Now the task is easy:
1. Find an n large enough that at least 2^n - 1 - 2^0 > x, which is equivalent to 2^n > x + 2.
2. Find the greatest m for which 2^n - 1 - 2^m is still greater than x.
As code:
#include <iostream>
#include <math.h>
using namespace std;

// binary representation
void bin(unsigned n)
{
    for (int i = floor(log2(n)); i >= 0; --i)
        (n & (1 << i)) ? printf("1") : printf("0");
}

// outputs the next greater int to x with exactly one 0 in binary representation
int nextHigherOneZero(int x)
{
    unsigned int n = 0;
    while ((1 << n) <= x + 2) ++n;
    unsigned int m = 0;
    while ((1 << n) - 1 - (1 << (m + 1)) > x && m < n - 2)
        ++m;
    return (1 << n) - 1 - (1 << m);
}

int main()
{
    int r = 0;
    for (int i = 1; i < 100; ++i) {
        r = nextHigherOneZero(i);
        printf("\nX: %i=", i);
        bin(i);
        printf(";\tnextHigherOneZero(x):%i=", r);
        bin(r);
        printf("\n");
    }
    return 0;
}
You can try it here (with some additional Debug-Output):
http://ideone.com/6w3fAN
As a note: it's probably possible to get m and n faster with some good binary logic; feel free to contribute...
Pros of this approach:
No assumptions need to be made
Cons:
Ugly while loops
I couldn't miss the opportunity to brush up on binary logic :), so here's my solution.
Here's main:
#include <iostream>

bool oneZero(int i);  // defined below

int main(int argc, char** argv)
{
    int i = 139261;
    i++;
    while (!oneZero(i))
    {
        i++;
    }
    std::cout << i;
}
And here's all the logic to find whether a number has exactly one zero:
bool oneZero(int i)
{
    int count = 0;
    while (i != 0)
    {
        // check whether the last bit is zero
        if ((1 & i) == 0) {
            count++;
            if (count > 1) return false;
        }
        // make the number shorter :)
        i = i >> 1;
    }
    return (count == 1);
}

Regarding Standard Oxford Format for vlfeat sift

One of my upperclassmates has given me a data set for experimenting with vlfeat's SIFT; however, her extracted SIFT data contains 5 dimensions for the frame part. An example is given below:
192
9494
262.08 749.211 0.00295391 -0.00030945 0.00583025 0 0 0 45 84 107 86 8 10 49 31 21 32 37 46 50 11 23 49 60 29 30 24 17 4 15 67 25 28 47 13 11 27 9 0 40 117 99 27 3 117 117 39 19 11 18 16 32 8 27 50 117 102 20 23 18 2 10 36 45 47 84 37 16 36 31 9 50 112 52 12 9 117 36 6 4 3 15 54 117 9 3 2 31 94 101 92 23 0 20 47 36 38 14 1 0 34 19 39 52 27 0 0 31 6 14 18 29 24 13 11 11 12 10 3 1 4 25 29 5 0 5 6 3 12 29 35 2 93 73 61 50 123 118 100 109 58 44 79 122 120 108 103 87 92 61 28 33 55 107 123 123 37 73 60 32 93 123 123 89 118 118 77 66 118 118 63 96 118 94 60 27 41 74 108 118 107 81 107 118 118 43 73 64 118 118 118 56 45 38 27 58
432.424 57.2287 0.00285143 -0.00048992 0.00292525 10 12 19 26 88 43 14 10 3 4 44 50 125 74 0 1 2 4 47 34 17 3 0 0 3 3 8 6 1 0 0 1 11 12 14 17 43 37 10 6 35 36 125 77 47 10 5 13 2 7 125 125 125 29 0 2 1 3 11 15 33 5 1 0 36 14 7 8 102 64 37 27 41 8 2 2 55 53 103 125 4 2 2 5 125 125 41 28 1 3 4 7 32 11 3 1 46 29 6 7 125 57 3 3 49 11 0 1 90 34 19 31 10 3 3 6 122 33 10 9 0 2 11 10 7 2 2 1 35 64 129 129 129 93 48 44 24 55 129 117 129 71 41 19 44 65 76 58 129 129 129 89 42 48 57 96 129 129 90 55 133 118 58 42 58 42 133 133 133 62 24 17 18 12 133 133 133 133 133 125 78 33 17 29 133 133 82 45 23 11 13 44
... // the list keeps on going for all keypoints.
This file is simply the descriptor data of an image. There are a few things I need to know:
What are the first two values, '192' and '9494'?
What is the 5th value for the keypoint? vlfeat's SIFT normally gives out 4 values for a keypoint's frame.
So I asked her what this 5th dimension is, and she told me to search for the "standard Oxford format" for SIFT features.
The thing is, I tried to search around regarding this standard Oxford format and SIFT features, but I had no luck finding it at all. If somebody knows anything regarding this, could you please point me in the right direction?
192 represents the descriptor length, and 9494 represents the number of keypoints you have in the file.
The other lines consist of [WORD_ID] [X] [Y] [A] [B] [C]
X and Y are the feature centroid, and A, B, C define the parameters of
the ellipse in the following equation: A*(x-X)^2 + 2*B*(x-X)*(y-Y) + C*(y-Y)^2 = 1
You can check the official website for the format here.
If you are using the VLFeat package, you can read here how to read a file in the Oxford format.
If you are very curious how the file format is read by VLFeat's vl_ubcread function, here is the code.
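For illustration only, here is a minimal sketch of reading such a file, assuming the layout described in the question: a header with the descriptor length and the keypoint count, then one line per keypoint holding the 5 frame values followed by descriptor-length values (the file name and all other names are made up):

#include <fstream>
#include <iostream>
#include <vector>

struct Keypoint {
    double frame[5];                 // x, y and the three ellipse parameters
    std::vector<double> descriptor;  // descriptor-length values
};

int main()
{
    std::ifstream in("features.oxford.txt");  // made-up file name
    if (!in) { std::cerr << "cannot open file\n"; return 1; }

    int descriptorLength = 0, keypointCount = 0;
    in >> descriptorLength >> keypointCount;  // e.g. 192 and 9494

    std::vector<Keypoint> keypoints(keypointCount);
    for (Keypoint& kp : keypoints) {
        for (double& f : kp.frame)
            in >> f;                          // the 5 frame values
        kp.descriptor.resize(descriptorLength);
        for (double& d : kp.descriptor)
            in >> d;
    }

    std::cout << "read " << keypoints.size() << " keypoints\n";
    return 0;
}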

Why does referencing outer memory from within a __global__ function mess everything up?

I'm writing some code in CUDA (the Huffman algorithm to be exact, but it's totally irrelevant to the case). I've got a file Paralellel.cu with two functions: one (WriteDictionary) is an ordinary function, and the second (wrtDict) is a special CUDA __global__ function running on the CUDA GPU. Here are the bodies of these functions:
// I know the body of this function looks kinda unrelated
// to the program's main topic, but it's just for tests.
__global__ void wrtDict(Node** nodes, unsigned char* str)
{
    int i = threadIdx.x;
    Node* n = nodes[i];
    char c = n->character;
    str[6 * i] = 1;//c; !!!
    str[6 * i + 1] = 2;
    str[6 * i + 2] = 0;
    str[6 * i + 3] = 0;
    str[6 * i + 4] = 0;
    str[6 * i + 5] = 0;
}
I know these first two lines seem pointless, since I don't use this Node object n here, but just let them be for a while. And there's a super secret comment marked by "!!!". Here is WriteDictionary:
void WriteDictionary(NodeList* nodeList, unsigned char* str)
{
    Node** nodes = nodeList->elements;
    int N = nodeList->getCount();
    Node** cudaNodes;
    unsigned char* cudaStr;
    cudaMalloc((void**)&cudaStr, 6 * N * sizeof(unsigned char));
    cudaMalloc((void**)&cudaNodes, N * sizeof(Node*));
    cudaMemcpy(cudaStr, str, 6 * N * sizeof(char), cudaMemcpyHostToDevice);
    cudaMemcpy(cudaNodes, nodes, N * sizeof(Node*), cudaMemcpyHostToDevice);
    dim3 block(1);
    dim3 thread(N);
    std::cout << N << "\n";
    wrtDict<<<block, thread>>>(cudaNodes, cudaStr);
    cudaMemcpy(str, cudaStr, 6 * N * sizeof(unsigned char), cudaMemcpyDeviceToHost);
    cudaFree(cudaNodes);
    cudaFree(cudaStr);
}
As one can see, the function WriteDictionary is kind of a proxy between CUDA and the rest of the program. I've got a bunch of objects of my class Node somewhere in ordinary host memory, pointed to by the Node* array elements kept within my NodeList object. For now it's enough to know about Node that it has a public field char character. The char* str is for now going to be filled with some test data. It has memory allocated for 6 * N chars, where N = the count of all elements in the elements array. So I allocate in CUDA memory space for 6 * N chars and N Node pointers. Then I copy my Node pointers there; they are still pointing into ordinary host memory. I run the function. Within the function wrtDict I extract the character into the char c variable, but this time I do NOT try to put it into the output array str.
So, when I print the contents of the output array str (outside the WriteDictionary function), I get a perfectly correct answer, i.e.:
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0 1 2 0 0 0 0
1 2 0 0 0 0
Yeah, here we've got 39 correct groups of six chars (shown in hex). BUT when we slightly change our super secret comment within the wrtDict function, like this:
__global__ void wrtDict(Node** nodes, unsigned char* str)
{
    int i = threadIdx.x;
    Node* n = nodes[i];
    char c = n->character;
    str[6 * i] = c;//1; !!!
    str[6 * i + 1] = 2;
    str[6 * i + 2] = 0;
    str[6 * i + 3] = 0;
    str[6 * i + 4] = 0;
    str[6 * i + 5] = 0;
}
we will see strange things. I now expect the first char of every six to be a character from the Node pointed to by the array, each one different. Or, even if it failed, I would expect only the first char of every six to be messed up, with the rest left intact: ? 2 0 0 0 0. But NO! When I do this, EVERYTHING gets completely messed up, and now the content of the output array str looks like this:
70 21 67 b7 70 21 67 b7 0 0 0 0
0 0 0 0 18 d7 85 8 b8 d7 85 8
78 d7 85 8 38 d9 85 8 d8 d7 85 8
f8 d5 85 8 58 d6 85 8 d8 d5 85 8
78 d6 85 8 b8 d6 85 8 98 d7 85 8
98 d6 85 8 38 d6 85 8 d8 d6 85 8
38 d5 85 8 18 d6 85 8 f8 d6 85 8
58 d9 85 8 f8 d7 85 8 78 d9 85 8
98 d9 85 8 d8 d4 85 8 b8 d8 85 8
38 d8 85 8 38 d7 85 8 78 d8 85 8
f8 d8 85 8 d8 d8 85 8 18 d5 85 8
61 20 75 6c 74 72 69 63 65 73 20 6d
6f 6c 65 73 74 69 65 20 73 69 74 20
61 6d 65 74 20 69 64 20 73 61 70 69
65 6e 2e 20 4d 61 75 72 69 73 20 73
61 70 69 65 6e 20 65 73 74 2c 20 64
69 67 6e 69 73 73 69 6d 20 61 63 20
70 6f 72 74 61 20 75 74 2c 20 76 75
6c 70 75 74 61 74 65 20 61 63 20 61
6e 74 65 2e 20 46
I'm asking now: why? Is it because I tried to reach ordinary host memory from within the CUDA GPU? I'm getting a warning, probably about exactly this case, saying:
Cannot tell what pointer points to, assuming global memory space
I've googled this and found only that CUDA reaches for ordinary global memory because it couldn't figure out where the pointer points, and that in 99.99% of cases this warning should be ignored. So I'm ignoring it, thinking it'll be fine, but it isn't - is my case within that 0.01%?
How can I solve this problem? I know I could just copy the Nodes, not the pointers to them, into CUDA, but I assume copying them would cost me more time than I save by parallelizing what's being done to them inside. I could also extract the character from every Node, put them all into an array and then copy it to CUDA, but - the same problem as in the previous statement.
I just completely don't know what to do and, what's worse, the deadline for the CUDA project at my college is today, approx. 17:00 (I just haven't had enough time to make it earlier, damn it...).
PS. If it helps: I'm compiling using a pretty simple command (without any switches):
nvcc -o huff ArchiveManager.cpp IOManager.cpp Node.cpp NodeList.cpp Program.cpp Paraleller.cu
This is a terrible question, see talonmies' comment.
Check the error values from every CUDA API call. You will get a launch failure message on the cudaMemcpy after your kernel launch.
Run cuda-memcheck to help debug the error (which is basically a segmentation fault).
Realise that you are dereferencing an (unmapped) pointer into host memory from the GPU; you need to copy the nodes, not just the pointers to the nodes (see the sketch below).
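A hedged sketch of that last point, assuming Node is default-constructible and trivially copyable (if Node itself holds pointers to other host objects, e.g. tree children, those would still be invalid on the device); the names follow the question and the error check is only illustrative:

#include <cuda_runtime.h>
#include <iostream>
#include <vector>
// Node and NodeList are the classes from the question.

// The kernel now indexes an array of Node values living in device memory,
// instead of dereferencing host pointers that the GPU cannot reach.
__global__ void wrtDict(Node* nodes, unsigned char* str)
{
    int i = threadIdx.x;
    char c = nodes[i].character;
    str[6 * i]     = c;
    str[6 * i + 1] = 2;
    str[6 * i + 2] = 0;
    str[6 * i + 3] = 0;
    str[6 * i + 4] = 0;
    str[6 * i + 5] = 0;
}

void WriteDictionary(NodeList* nodeList, unsigned char* str)
{
    Node** nodes = nodeList->elements;
    int N = nodeList->getCount();

    // Gather the Node objects themselves into one contiguous host array first.
    std::vector<Node> hostNodes(N);
    for (int i = 0; i < N; ++i)
        hostNodes[i] = *nodes[i];

    Node* cudaNodes;
    unsigned char* cudaStr;
    cudaMalloc((void**)&cudaStr, 6 * N * sizeof(unsigned char));
    cudaMalloc((void**)&cudaNodes, N * sizeof(Node));
    cudaMemcpy(cudaNodes, hostNodes.data(), N * sizeof(Node), cudaMemcpyHostToDevice);

    wrtDict<<<1, N>>>(cudaNodes, cudaStr);

    // Check errors instead of ignoring them; the original code never did.
    cudaError_t err = cudaMemcpy(str, cudaStr, 6 * N * sizeof(unsigned char),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        std::cerr << cudaGetErrorString(err) << "\n";

    cudaFree(cudaNodes);
    cudaFree(cudaStr);
}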
You can also run your program from inside cuda-gdb. cuda-gdb will show you what error you're hitting. Also, right at the beginning in cuda-gdb, do a "set cuda memcheck on", it will turn on memcheck inside cuda-gdb.
In the latest cuda-gdb version (5.0 as of today), you can also see warnings if you're not checking return codes from API calls and those API calls are failing.