C++ To Cuda Conversion/String Generation And Comparison


So I am in a basic high school coding class. We had to think up one of our semester projects. I chose to base mine on ideas and applications that aren't used in traditional code. This brought up the idea of using CUDA. One of the best ways I know to compare the speed of traditional methods versus unconventional ones is string generation and comparison. One could demonstrate the generation and matching speed of traditional CPU generation with timers and output, and then show the increase (or decrease) in speed and the output of GPU processing.
I wrote this C++ code to generate random characters that are put into a character array and then match that array against a predetermined string. However, like most CPU programming, it is incredibly slow compared to GPU programming. I've looked over the CUDA API and could not find anything that would lead me in the right direction for what I'm looking to do.
Below is the code I have written in C++. If anyone could point me in the direction of such things as a random number generator that I can convert to chars using ASCII codes, that would be excellent.
#include <cstdlib>
#include <iostream>
#include <string>

int sLength = 0;
int count = 0;      // characters generated since the last wrap-around
int stop = 0;
int maxValue = 0;   // how many times count has wrapped around
std::string inString = "aB1#";

// 70 possible characters
static const char alphanum[] =
    "0123456789"
    "!@#$%^&*"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;   // exclude the terminating '\0'

// Returns one random character from alphanum.
char genRandom()
{
    return alphanum[std::rand() % stringLength];
}

int main()
{
    std::cout << "Length of string to match?" << std::endl;
    std::cin >> sLength;
    std::string sMatch(sLength, ' ');
    while (true)
    {
        // Fill sMatch with random characters.
        for (int x = 0; x < sLength; x++)
        {
            sMatch[x] = genRandom();
            //std::cout << sMatch[x];
            count++;
            if (count == 2147000000)
            {
                count = 0;      // assignment; the original "count == 0;" had no effect
                maxValue++;
            }
        }
        if (sMatch == inString)
        {
            // 2147000000LL makes the product a long long, so it cannot overflow int.
            std::cout << "It took " << count + (maxValue * 2147000000LL)
                      << " randomly generated characters to match the strings." << std::endl;
            std::cin >> stop;
        }
        //std::cout << std::endl;
    }
}

If you want to implement a pseudorandom number generator using CUDA, have a look over here. If you want to generate chars from a predetermined set of characters, you can just put all possible chars into that array and create a random index (just as you are doing it right now).
But I think a more valuable comparison might be one that uses brute force. You could adapt your program to try not random strings, but one string after another in some systematic order.
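A minimal sketch of the "one string after another" idea, assuming the same 70-character set as the question's alphanum array: a counter is converted into a candidate by writing it in base 70, one digit per character, so every string of the given length is visited exactly once. The names kAlphabet, indexToCandidate and bruteForce are illustrative and not part of the original code.

```cpp
#include <cstdint>
#include <string>

// Same character set as the question (70 characters).
static const std::string kAlphabet =
    "0123456789"
    "!@#$%^&*"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

// Converts a counter into the candidate string it stands for:
// the counter is written in base kAlphabet.size(), one digit per character.
std::string indexToCandidate(std::uint64_t index, std::size_t length)
{
    std::string candidate(length, kAlphabet[0]);
    for (std::size_t pos = length; pos-- > 0;)
    {
        candidate[pos] = kAlphabet[index % kAlphabet.size()];
        index /= kAlphabet.size();
    }
    return candidate;
}

// Tries every string of target.size() characters in order and returns how many
// candidates were generated up to and including the match (0 if none matched,
// i.e. the target uses characters outside kAlphabet).
std::uint64_t bruteForce(const std::string& target)
{
    std::uint64_t total = 1;
    for (std::size_t i = 0; i < target.size(); ++i)
        total *= kAlphabet.size();

    for (std::uint64_t index = 0; index < total; ++index)
        if (indexToCandidate(index, target.size()) == target)
            return index + 1;
    return 0;
}
```

Unlike random generation, this needs at most 70^length candidates, so bruteForce("aB1#") finishes after at most 24,010,000 attempts.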
Then, on the other hand, you could implement the brute-force search on the GPU using CUDA. This can be tricky, since you will probably want to stop all CUDA threads as soon as one of them finds a solution. I could imagine the brute-force process in CUDA the following way: one thread tries aa as the first two letters and brute-forces all following characters, the next thread tries ab as the first two letters and brute-forces all following characters, the next thread tries ac, and so on. All these threads run in parallel. Of course, you could vary the number of predetermined characters, so that e.g. the first thread tries aaaa and the second aaab. Then you could compare different input values.
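The partitioning scheme described above can be tried out on the CPU before writing any CUDA. The following is a sketch under stated assumptions, not a CUDA implementation: std::thread workers stand in for CUDA threads, each worker owns some first characters (round-robin, one fixed character rather than two) and brute-forces the rest, and a shared std::atomic<bool> lets all workers stop once one of them finds the target. The function names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

static const std::string kAlphabet =
    "0123456789"
    "!@#$%^&*"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

// Base-70 conversion of a counter into a string of `length` characters.
std::string indexToCandidate(std::uint64_t index, std::size_t length)
{
    std::string candidate(length, kAlphabet[0]);
    for (std::size_t pos = length; pos-- > 0;)
    {
        candidate[pos] = kAlphabet[index % kAlphabet.size()];
        index /= kAlphabet.size();
    }
    return candidate;
}

// Worker w tries every first character at alphabet position w, w + workers, ...
// and all suffixes after it. Returns the match, or "" if there is none.
std::string parallelBruteForce(const std::string& target, unsigned workers)
{
    if (target.empty() || workers == 0)
        return "";
    const std::size_t base = kAlphabet.size();
    std::uint64_t suffixCount = 1;                 // number of possible suffixes
    for (std::size_t i = 1; i < target.size(); ++i)
        suffixCount *= base;

    std::atomic<bool> found{false};                // shared "stop now" flag
    std::string result;                            // written only by the winning worker
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w)
    {
        threads.emplace_back([&, w] {
            for (std::size_t first = w; first < base && !found; first += workers)
                for (std::uint64_t s = 0; s < suffixCount && !found; ++s)
                {
                    std::string candidate =
                        kAlphabet[first] + indexToCandidate(s, target.size() - 1);
                    // exchange() ensures only one worker writes the result.
                    if (candidate == target && !found.exchange(true))
                        result = candidate;
                }
        });
    }
    for (auto& t : threads)
        t.join();                                  // join makes `result` safe to read
    return result;
}
```

In a CUDA kernel, the value of `first` would presumably come from blockIdx and threadIdx instead of a loop, and the stop flag would be a device-side variable; that translation is not shown here.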
Anyway, if you have never dealt with CUDA, I recommend the vector addition sample, a very basic CUDA example that serves very well for getting a basic understanding of what's going on with CUDA. Moreover, you should read the CUDA programming guide to make yourself familiar with CUDA's concept of a grid of thread blocks, each of which contains a number of threads. Once you understand this, I think it becomes clearer how CUDA organizes work. In short, in CUDA you replace loops with a kernel that is executed by many threads at once.

First off, I am not sure what your actual question is. Do you need a faster random number generator, or one with a longer period? In that case I would recommend boost::random; the "Mersenne Twister" is generally considered state of the art. It is a little hard to get started with, but Boost is a great library, so it is worth the effort.
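The Mersenne Twister recommended here is also part of the C++11 standard library as std::mt19937 in <random>, so Boost is not strictly required. A minimal sketch of the question's genRandom() on top of it, assuming the same character set; the engine is seeded once from std::random_device and then reused.

```cpp
#include <random>
#include <string>

// Character set from the question (70 characters).
static const char alphanum[] =
    "0123456789"
    "!@#$%^&*"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

// Returns one random character from alphanum using std::mt19937.
char genRandom()
{
    // Created and seeded once, on the first call, then reused by every later call.
    static std::mt19937 engine{std::random_device{}()};
    // sizeof(alphanum) counts the '\0', so the last valid index is sizeof - 2.
    static std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphanum) - 2);
    return alphanum[pick(engine)];
}
```

Unlike rand() % stringLength, uniform_int_distribution does not introduce modulo bias, although with only 70 characters that bias is tiny anyway.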
I think the method you are using should be fairly efficient. Be aware that on average it takes (#characters)^(length of string) attempts to hit the target string (here 70^4 = 24010000 four-character attempts, i.e. about 96 million generated characters), and since the draws are independent there is no hard upper bound. The GPU should have an advantage here, since this process is a Monte Carlo simulation and trivially parallelizable.
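The 70^4 figure can be computed directly. This sketch assumes whole strings are drawn independently and uniformly, in which case the number of attempts until the target appears is geometrically distributed with p = 1/N, and its mean is N, the size of the search space. The function name is illustrative.

```cpp
#include <cstdint>

// alphabetSize^length: the number of distinct strings, and also the expected
// number of independent uniform attempts until one fixed target string appears.
std::uint64_t searchSpace(std::uint64_t alphabetSize, unsigned length)
{
    std::uint64_t n = 1;
    for (unsigned i = 0; i < length; ++i)
        n *= alphabetSize;   // overflows for long strings; fine for the lengths used here
    return n;
}
```

Because the question's program counts generated characters rather than strings, its expected counter value for "aB1#" is about 4 * searchSpace(70, 4) = 96,040,000.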
Have you compiled the code with optimizations?

Related

How to test if a particular C++ statement is faster or slower than other? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
For example, when printing a single character such as a newline with cout in C++, which is faster: passing it as a string or as a character?
cout << "\n";
Or
cout << '\n';
This video motivated me to write efficient code.
How would you go about testing such things? I might want to test other things to see which is faster, so it would be helpful to know how I can test these things myself.

In my measurement, yes: using '\n' instead of "\n" was noticeably faster when timing the printing of 1000 newlines:
Remember: a single char can hardly be slower than a string. A char is passed by value, whereas a string literal is passed as a pointer (const char*) to its first character, which has to be followed and read up to the terminating null.
// Works with C++17 and above (std::invoke)...
#include <chrono>
#include <functional>
#include <iostream>
#include <utility>

// Calls functor(arguments...) once and returns how long it took, as Duration.
template<typename Duration = std::chrono::milliseconds, typename T, typename ...Args>
static auto TimeElapsedOnOperation(T&& functor, Args&&... arguments)
{
    auto const start = std::chrono::steady_clock::now();
    std::invoke(std::forward<T>(functor), std::forward<Args>(arguments)...);
    return std::chrono::duration_cast<Duration>(std::chrono::steady_clock::now() - start);
}

int main()
{
    // 1000 newlines written as a one-character string
    std::cout << TimeElapsedOnOperation([]
    {
        for (auto i = 0; i < 1000; i++)
            std::cout << "\n";
    }).count() << std::endl;
    std::cin.get();

    // 1000 newlines written as a single character
    std::cout << TimeElapsedOnOperation([]
    {
        for (auto i = 0; i < 1000; i++)
            std::cout << '\n';
    }).count() << std::endl;
    std::cin.get();
    return 0;
}
It gave the following output (results will vary):
<1000> newlines follow...
2195 milliseconds For the string "\n"
More <1000> newlines follow...
852 milliseconds For the character '\n'
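2195 - 852 = 1343 milliseconds
The string version took 1343 milliseconds (1.343 seconds) longer. In other words, printing '\n' took about 61% less time than printing "\n" (1343 / 2195 * 100 ≈ 61.18%); equivalently, the string version took about 2.6 times as long (2195 / 852 ≈ 2.58).
This is just one measurement; the numbers will differ on other machines, and most of this time is probably spent by the console drawing the output rather than by operator<< itself.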
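One way to take the console out of the measurement is to insert into an in-memory std::ostringstream instead of std::cout. The sketch below assumes that the cost of operator<< is what we want to compare; timeInsertions is a hypothetical helper, and running each variant several times (and in both orders) is still advisable.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Times `count` insertions of `value` into a string stream and returns the
// elapsed time; the stream contents are copied to `out` so they are used.
template <typename T>
std::chrono::nanoseconds timeInsertions(const T& value, int count, std::string& out)
{
    std::ostringstream sink;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
        sink << value;               // '\n' uses the char overload, "\n" the const char* one
    const auto stop = std::chrono::steady_clock::now();
    out = sink.str();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

For example, `std::string s; auto tChar = timeInsertions('\n', 1000000, s); auto tString = timeInsertions("\n", 1000000, s);` compares the two overloads without any terminal output.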
As to why this happens:
A single character constant (1 byte) is passed by value, whereas a string literal such as "\n" is passed as a pointer to char (const char*) that points to its characters in memory.
There is a difference in how a character and a string are read: a character is used directly, while a string has to be scanned character by character until the terminating null as its characters are written.
A string literal is always a char array, while a char is essentially an integer holding the character's numerical code (ASCII or an extended encoding). A one-character string literal such as "\n" is an array of two chars ('\n' followed by the terminating '\0'), which is not the same thing as a single char.
So maybe (just maybe) you are on the better side of using '\n' instead...
However, a clever compiler or library implementation may turn "\n" into the equivalent of '\n', so we can never be sure in advance; still, it is considered good practice to write a single character as a char.

Theoretical considerations only:
A single character can be just printed out as is, a string needs to be iterated over to find the terminating null character.
A single character can be passed and used directly as value; a string is passed by pointer, so the address must be resolved before the character(s) can be used.
So at least, the single character cannot be slower. However, a sufficiently clever compiler might spot the constant one-character string and optimise any difference away (especially if operator<< is inline).
How to test: first, you want a system that disturbs the test as little as possible (context switches between threads are expensive), so it is best to close any open applications.
A very simple test program might repeatedly use both operators sufficiently often, something like:
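#include <chrono>
#include <cstdint>
#include <iostream>

int main()
{
    const std::uint32_t SomeLimit = 100;     // outer repetitions (example value)
    const std::uint32_t Iterations = 1000;   // outputs per timed inner loop (example value)
    std::uint64_t charSum = 0, stringSum = 0;   // accumulated nanoseconds

    for (std::uint32_t loop = 0; loop < SomeLimit; ++loop)
    {
        auto start = std::chrono::steady_clock::now();   // take timestamp
        for (std::uint32_t i = 0; i < Iterations; ++i)
            std::cout << '\n';                            // single character
        charSum += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start).count();

        start = std::chrono::steady_clock::now();        // take timestamp
        for (std::uint32_t i = 0; i < Iterations; ++i)
            std::cout << "\n";                            // string
        stringSum += std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start).count();
    }
    // Report on stderr so the result is not buried among the newlines.
    std::cerr << "char: " << charSum << " ns, string: " << stringSum << " ns\n";
}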
Interleaving character and string output may give a better average over the runtime if OS activity varies during the test. The inner loops should run long enough to give a reasonable time interval for measurement.
The longer the program runs, the more precise the output will be. To prevent overflow, use uint64_t to collect the sums (your program would have to run more than 200000 days even with ns precision to overflow...).

How to improve this random number generator code in c++?

I am a C++ student and I am working on creating a random number generator.
In fact, I should say my algorithm selects a number within a defined range.
I am writing this just out of curiosity.
I am not challenging existing library functions.
I always use library functions when writing applications based on randomness, but again, I just want to make this out of curiosity.
I would also like to know if there is something wrong with my algorithm or with my approach.
I googled how PRNGs work, and some sites said that there is a mathematical algorithm and a predefined series of numbers; a seed just sets the pointer to a different point in the series, and after some interval the sequence repeats itself.
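The description above (a deterministic formula, a seed that picks the starting point, and a sequence that eventually repeats) can be seen in a toy linear congruential generator. The constants below are assumptions chosen only for illustration: x' = (5x + 3) mod 16 meets the Hull-Dobell conditions, so it has full period 16, which is far too short for real use.

```cpp
// Toy LCG: x' = (5 * x + 3) mod 16. Every value 0..15 appears exactly once
// before the sequence repeats (full period).
unsigned nextToy(unsigned x)
{
    return (5u * x + 3u) % 16u;
}

// Counts the steps needed to return to `seed`; seed must be in 0..15,
// otherwise the sequence never comes back to it.
unsigned periodFrom(unsigned seed)
{
    unsigned x = nextToy(seed);
    unsigned steps = 1;
    while (x != seed)
    {
        x = nextToy(x);
        ++steps;
    }
    return steps;
}
```

Starting from 0 the sequence is 3, 2, 13, 4, 7, 6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0, and then it starts over; a different seed only changes where in this cycle you begin.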
My algorithm just moves back and forth through the array of possible values, and the seed breaks the loop at a different point each time. I don't think this approach is wrong. I got answers suggesting a different algorithm, but they didn't explain what's wrong with my current algorithm.
Yes, there was a problem with my seed: it was not precise and made the results a little predictable, as here:
cout << rn(50, 100);
The results in running four times are 74,93,56,79.
See the pattern of "increasing order".
And for large ranges the patterns can be seen easily. I got an answer on getting good seeds, but that too recommended a new algorithm (without saying why).
An alternative could be to shuffle my array randomly, generating a new sequence every time, so that the pattern of increasing order goes away. Any help with that rearranging would also be good. Here is the code below. And if my function is not possible, please let me know.
Thanking you in anticipation.
#include <ctime>
#include <vector>

int rn(int lowerlt, int upperlt)
{
    /* Over short ranges, results are satisfactory.
     * I want to make it effective for big ranges.
     */
    const int size = upperlt - lowerlt + 1; // Number of values in [lowerlt, upperlt]; without "+ 1" the array was one element too small.
    std::vector<int> ar(size);              // All possible values within the range (a variable-length array is not standard C++).
    int i, x, ret = lowerlt;                // Variables to control loops and the return value.
    long pointer = 0;                       // Step counter; the main loop ends once it passes the seed.

    // Loop to initialize the array with the possible values.
    for (i = 0, x = lowerlt; x <= upperlt; i++, x++)
        ar[i] = x;

    long seed = std::time(0);               // Seconds since 1970, so the loop below runs over a billion times.
    // Main loop: walk forward through the array until pointer passes seed.
    for (i = 0; pointer <= seed; i++, pointer++)
    {
        ret = ar[i];
        if (i == size - 1)
        {
            // Reverse loop back to the start (i ends at -1, so i++ restarts at 0).
            for (; i >= 0; i--)
            {
                ret = ar[i];
            }
        }
    }
    return ret;
}
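The shuffling asked about above can be sketched with std::iota and std::shuffle (implementations typically use a Fisher-Yates shuffle), driven by std::mt19937. This is an illustration under that assumption, not a fix for rn(); shuffledRange is a hypothetical name, and note that the shuffle itself still needs a random source.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns every value in [lowerlt, upperlt] exactly once, in random order.
std::vector<int> shuffledRange(int lowerlt, int upperlt, std::mt19937& engine)
{
    std::vector<int> values(upperlt - lowerlt + 1);
    std::iota(values.begin(), values.end(), lowerlt);   // lowerlt, lowerlt + 1, ..., upperlt
    std::shuffle(values.begin(), values.end(), engine);
    return values;
}
```

Usage: create the engine once, e.g. `std::mt19937 engine{std::random_device{}()};`, then `shuffledRange(50, 100, engine)` yields the 51 values 50..100 without an increasing pattern.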

Caveat: From your post, aside from your random generator algorithm, one of your problems is getting a good seed value, so I'll address that part of it.
You could use /dev/random to get a seed value. That would be a great place to start [and would be sufficient on its own], but might be considered "cheating" from some perspective.
So, here are some other sources of "entropy":
Use a higher resolution time of day clock source: gettimeofday or clock_gettime(CLOCK_REALTIME,...) call it "cur_time". Use only the microsecond or nanosecond portion respectively, call it "cur_nano". Note that cur_nano is usually pretty random all by itself.
Do a getpid(2). This has a few unpredictable bits because between invocations other programs are starting and we don't know how many.
Create a new temp file and get the file's inode number [then delete it]. This varies slowly over time. It may be the same on each invocation [or not]
Get the high resolution value for the system's time of day clock when the system was booted, call it "sysboot".
Get the high resolution value for the start time of your "session": When your program's parent shell was started, call it "shell_start".
If you were using Linux, you could compute a checksum of /proc/interrupts as that's always changing. For other systems, get some hash of the number of interrupts of various types [should be available from some type of syscall].
Now, create some hash of all of the above (e.g.):
dev_random * cur_nano * (cur_time - sysboot) * (cur_time - shell_start) *
getpid * inode_number * interrupt_count
That's a simple equation. You could enhance it with some XOR and/or sum operations. Experiment until you get one that works for you.
Note: This only gives you the seed value for your PRNG. You'll have to create your PRNG from something else (e.g. earl's linear algorithm)
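As an alternative to multiplying the sources together (a product discards information, and a single zero factor makes the whole seed zero), C++11 offers std::seed_seq to mix several values into a seed. This sketch uses portable stand-ins for some of the sources above (std::random_device, a high-resolution clock and the thread id); getpid, inode numbers and /proc/interrupts are left out. makeSeededEngine is a hypothetical name.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>

// Gathers a few entropy sources and lets std::seed_seq mix them into
// the initial state of a Mersenne Twister.
std::mt19937 makeSeededEngine()
{
    std::random_device device;   // implementation-defined source, usually OS entropy
    const auto now = std::chrono::high_resolution_clock::now().time_since_epoch().count();
    const auto threadHash = std::hash<std::thread::id>{}(std::this_thread::get_id());

    std::seed_seq seq{
        static_cast<std::uint32_t>(device()),
        static_cast<std::uint32_t>(device()),
        static_cast<std::uint32_t>(now),                                    // low clock bits
        static_cast<std::uint32_t>(static_cast<std::uint64_t>(now) >> 32),  // high clock bits
        static_cast<std::uint32_t>(threadHash)};
    return std::mt19937(seq);
}
```

Call it once at startup and reuse the returned engine; reseeding on every draw would throw the mixing away.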

unsigned int Random::next() {
s = (1664525 * s + 1013904223);
return s;
}
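If 's' is wider than 32 bits, it grows with every call of that function and has to be reduced modulo some m. (If 's' is a 32-bit unsigned int, the arithmetic already wraps modulo 2^32, which is the modulus these constants were chosen for.)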
Correct is
unsigned int Random::next() {
s = (1664525 * s + 1013904223) % xxxxxx;
return s;
}
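A self-contained sketch of the 32-bit case mentioned above, assuming a class around next() (the class itself is not shown in the answer, so this one is hypothetical): with a std::uint32_t state, the multiplication and addition wrap modulo 2^32 by definition for unsigned types, so the state cannot keep growing and no explicit % is needed.

```cpp
#include <cstdint>

// LCG with a = 1664525 and c = 1013904223, modulus 2^32 via unsigned wrap-around.
class Random
{
public:
    explicit Random(std::uint32_t seed) : s(seed) {}

    std::uint32_t next()
    {
        s = 1664525u * s + 1013904223u;   // unsigned overflow is well-defined (mod 2^32)
        return s;
    }

private:
    std::uint32_t s;
};
```

Seeded with 0, the first two outputs are 1013904223 and 1196435762.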
Maybe use this function
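long long Factor = 279470273LL, Divisor = 4294967291LL;
long long seed = 1;   // must be in [1, Divisor - 1]; a seed of 0 would stay 0 forever
long long next()
{
    seed = (seed * Factor) % Divisor;   // seed * Factor < 2^61, so it fits in long long
    return seed;
}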

C++ Multithreaded prime counter between specified range

#include <chrono>
#include <iostream>
#include <mutex>
#include <sstream>
#include <thread>

bool isPrime(int number) {
    if (number < 2)       // 0 and 1 are not prime (the original counted 1)
        return false;
    for (int i = 2; i < number; i++) {
        if (number % i == 0) {
            return false;
        }
    }
    return true;
}

std::mutex myMutex;
int pCnt = 0;
int icounter = 0;
int limit = 0;

int getNext() {
    std::lock_guard<std::mutex> guard(myMutex);
    icounter++;
    return icounter;
}

void primeCnt() {
    std::lock_guard<std::mutex> guard(myMutex);
    pCnt++;
}

void primes() {
    while (getNext() <= limit)
        if (isPrime(icounter))
            primeCnt();
}

int main(int argc, char *argv[]) {
    // The thread count is read from argv[2] and the limit from argv[4].
    if (argc < 5) {
        std::cerr << "Expected at least 4 arguments" << std::endl;
        return 1;
    }
    std::stringstream ss(argv[2]);
    int tCount;
    ss >> tCount;
    std::stringstream ss1(argv[4]);
    int lim;
    ss1 >> lim;
    limit = lim;
    auto t1 = std::chrono::high_resolution_clock::now();
    std::thread *arr = new std::thread[tCount];
    for (int i = 0; i < tCount; i++)
        arr[i] = std::thread(primes);
    for (int i = 0; i < tCount; i++)
        arr[i].join();
    delete[] arr;
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "Primes: " << pCnt << std::endl;
    std::cout << "Program took: " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() <<
        " milliseconds" << std::endl;
    return 0;
}
Hello, I'm trying to find the number of primes in a user-specified range, e.g. 1-1000000, using a user-specified number of threads to speed up the process. However, it seems to take the same amount of time for any number of threads as for one thread. I'm not sure if it's supposed to be that way or if there's a mistake in my code. Thank you in advance!

You don't see a performance gain because the time spent in isPrime() is much smaller than the time the threads spend fighting over the mutex.
One possible solution is to use atomic operations, as @The Badger suggested. The other way is to partition your task into smaller ones and distribute them over your thread pool.
For example, if you have n threads, then thread i should test the numbers from i*(limit/n) to (i+1)*(limit/n), with the last thread also taking any remainder when limit is not divisible by n. This way you wouldn't need any synchronization at all, and your program would (theoretically) scale linearly.
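A minimal sketch of this static partitioning, assuming the question's trial-division isPrime() (with 0 and 1 treated as not prime): each worker counts primes in its own contiguous range into a local variable and writes one slot of a vector at the end, so no mutex is involved. countPrimesPartitioned is a hypothetical name.

```cpp
#include <thread>
#include <vector>

// Trial division as in the question; 0 and 1 are not prime.
bool isPrime(int number)
{
    if (number < 2) return false;
    for (int i = 2; i < number; i++)
        if (number % i == 0) return false;
    return true;
}

// Counts primes in [1, limit] with `threads` workers (threads >= 1). Worker i
// handles [i*chunk + 1, (i+1)*chunk]; the last worker also takes the remainder.
int countPrimesPartitioned(int limit, int threads)
{
    std::vector<int> tallies(threads, 0);   // one result slot per worker
    std::vector<std::thread> workers;
    const int chunk = limit / threads;
    for (int i = 0; i < threads; ++i)
    {
        const int first = i * chunk + 1;
        const int last = (i == threads - 1) ? limit : (i + 1) * chunk;
        workers.emplace_back([&tallies, i, first, last] {
            int count = 0;                  // local tally, nothing shared in the loop
            for (int n = first; n <= last; ++n)
                if (isPrime(n)) ++count;
            tallies[i] = count;             // each worker writes only its own slot
        });
    }
    int total = 0;
    for (int i = 0; i < threads; ++i)
    {
        workers[i].join();                  // after join, tallies[i] is safe to read
        total += tallies[i];
    }
    return total;
}
```

One caveat: with trial division, larger numbers cost more to test, so the thread holding the highest range finishes last; handing out numbers in strides or in small chunks balances the load better.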

Multithreaded algorithms work best when threads can do a lot of work on their own.
Imagine doing this in real life: you have a group of 20 humans that will do work for you, and you want them to test whether each number up to 1000 is prime. How will you do this?
Would you hand each person a single number at a time, and ask them to come back to you to tell you if it's prime and to receive another number?
Surely not; you would give each person a bunch of numbers to work on at once, and have them come back and tell you how many were prime and to receive another bunch of numbers.
Maybe even you'd divide up the entire set of numbers into 20 groups and tell each person to work on a group. (but then you run the risk of one person being slow and having everyone else sitting idle while you wait for that one person to finish... although there are so-called "work stealing" algorithms, but that's complicated)
The same thing applies here; you want each thread to do a lot of work on its own and keep its own tally, and only have to check back with the centralized information once in a while.

A better solution would be to use the Sieve of Atkin to find the primes (even the Sieve of Eratosthenes, which is easier to understand, is better); your basic algorithm is very poor to start with. For every number n in your interval it does up to n checks to determine whether it's prime, and it does this limit times. This means you're doing about limit*limit/2 checks in the worst case; that's what we call O(n^2) complexity. The Sieve of Atkin, on the other hand, only has to do O(n) operations to find all primes. If n is large, it is hard to beat the algorithm that has fewer steps by performing the steps faster. Trying to fix a poor algorithm by throwing more resources at it is a bad strategy.
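As a sketch of the simpler of the two sieves mentioned, here is the Sieve of Eratosthenes (not Atkin): instead of trial-dividing every number, it crosses out the multiples of each prime once, which takes O(n log log n) time. countPrimesSieve is a hypothetical name.

```cpp
#include <vector>

// Counts the primes in [2, limit] with the Sieve of Eratosthenes.
int countPrimesSieve(int limit)
{
    if (limit < 2) return 0;
    std::vector<bool> composite(limit + 1, false);
    int count = 0;
    for (long long i = 2; i <= limit; ++i)
    {
        if (composite[i]) continue;
        ++count;                                   // not crossed out by a smaller prime: prime
        for (long long j = i * i; j <= limit; j += i)
            composite[j] = true;                   // smaller multiples were already crossed out
    }
    return count;
}
```

Single-threaded, countPrimesSieve(1000000) returns 78498 in a few milliseconds on typical hardware, far faster than the threaded trial division.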
Another problem with your implementation is that it has race conditions and is therefore broken to start with. There is little use in optimizing something unless you first make sure it works correctly. The problem is in the primes function:
void primes() {
while (getNext() <= limit)
if( isPrime(icounter) )
primeCnt();
}
Between getNext() and isPrime(), another thread may have increased icounter, which causes the program to skip candidates. This results in the program giving a different result each time. In addition, icounter is read in isPrime(icounter) outside the mutex, which is a data race and therefore undefined behavior; declaring it volatile would not fix that, because in C++ volatile does not synchronize threads. The candidate should be taken while the lock is held, for example by using the value returned by getNext().
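A minimal fix for this race, keeping the question's mutex-based structure: store the value returned by getNext() in a local variable and test that, instead of re-reading the shared icounter. The runPrimes driver is hypothetical (it replaces the question's argv handling) and isPrime treats 0 and 1 as not prime.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex myMutex;
int pCnt = 0;
int icounter = 0;
int limit = 0;

bool isPrime(int number)
{
    if (number < 2) return false;
    for (int i = 2; i < number; i++)
        if (number % i == 0) return false;
    return true;
}

int getNext()
{
    std::lock_guard<std::mutex> guard(myMutex);
    return ++icounter;                 // incremented and read while the lock is held
}

void primeCnt()
{
    std::lock_guard<std::mutex> guard(myMutex);
    pCnt++;
}

// The candidate lives in a local variable, so other threads cannot change it
// between getNext() and isPrime().
void primes()
{
    int candidate;
    while ((candidate = getNext()) <= limit)
        if (isPrime(candidate))
            primeCnt();
}

// Hypothetical driver: resets the globals and runs `threads` workers.
int runPrimes(int lim, int threads)
{
    pCnt = 0;
    icounter = 0;
    limit = lim;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i)
        workers.emplace_back(primes);
    for (auto& w : workers)
        w.join();
    return pCnt;
}
```

This makes the result deterministic, but it still takes the mutex once per candidate, so it does not address the performance problem discussed in this answer.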
Since the problem is CPU intensive (almost all of the time is spent executing CPU instructions), multithreading won't help unless you have multiple CPUs (or cores) on which the OS schedules threads of the same process. This means there is a limit to the number of threads (it can be as low as 1; I, for example, see an improvement only for two threads, and beyond that there is none) up to which you can expect improved performance. If you have more threads than cores, the OS will just let one thread run on a core for a while, then switch and let the next thread execute for a while.
The problem that may arise when scheduling threads on different cores is in addition that each core may have separate cache (which is faster than the shared cache). In effect if two threads are going to access the same memory the separated cache has to be flushed as part of the synchronization of the data involved - this may be time consuming.
That is you have to strive to keep the data that the different threads are working on separate and minimize the frequent use of common variable data. In your example it would mean that you should avoid the global data as much as possible. The counter for example need only be accessed when the counting has finished (to add the threads contribution to the count). Also you could minimize the use of icounter by not reading it for each candidate, but get a bunch of candidates in one go. Something like:
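// Uses myMutex, icounter, pCnt, limit and isPrime() from the question.
// Reserves n consecutive candidates and returns the first of them.
int getNext(int n) {
    std::lock_guard<std::mutex> guard(myMutex);
    int first = icounter + 1;
    icounter += n;
    return first;
}
// Adds one thread's local tally to the shared count.
void primeCnt(int n) {
    std::lock_guard<std::mutex> guard(myMutex);
    pCnt += n;
}
void primes() {
    int next;
    int count = 0;                           // thread-local tally
    while ((next = getNext(1000)) <= limit) {
        for (int j = next; j < next + 1000 && j <= limit; j++)
            if (isPrime(j))
                count++;
    }
    primeCnt(count);
}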
where getNext is the same, but it reserves a number of candidates (by increasing icounter by the supplied count) and primeCnt adds count to pCnt.
Consequently, you may end up in a situation where a core runs one thread, then after a while switches to another thread, and so on. The result is that you have to run all the code for your problem plus the code for switching between threads. Add to that the extra cache misses you will probably get, and it will probably be even slower.

Perhaps instead of a mutex try to use an atomic integer for the counter. It might speed it up a bit, not sure by how much.
#include <atomic>
#include <cstdint>
std::atomic<std::uint64_t> pCnt{0};      // Made uint64 for bigger range as @IgnisErus mentioned
std::atomic<std::uint64_t> icounter{0};
std::uint64_t getNext() {
    return ++icounter;                   // atomically increments and returns the new value
}
void primeCnt() {
    ++pCnt;
}
When benchmarking, the processor often needs to warm up before it reaches its best performance, so timing the code once is not always a good representation of the actual performance. Try running the code many times and taking an average. You can also do some heavy work before the calculation (a long for-loop computing powers of some counter?).
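The "warm up, then run many times" advice can be wrapped in a small helper. This is a sketch under stated assumptions: one untimed warm-up call, then `runs` timed calls whose durations are returned sorted, so the median (less sensitive to outliers than the mean) or the minimum can be reported; computing an average from the same vector is straightforward. timeRuns is a hypothetical name.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Calls `work` once untimed as a warm-up, then `runs` timed times, and returns
// the measured durations in ascending order.
template <typename F>
std::vector<std::chrono::microseconds> timeRuns(F work, int runs)
{
    work();                                    // warm-up (caches, CPU frequency scaling)
    std::vector<std::chrono::microseconds> samples;
    for (int r = 0; r < runs; ++r)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration_cast<std::chrono::microseconds>(stop - start));
    }
    std::sort(samples.begin(), samples.end());
    return samples;
}
```

For example, `auto s = timeRuns([] { /* run the prime count */ }, 11); auto median = s[s.size() / 2];`.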
Getting accurate benchmark results is also a topic of interest for me since I do not yet know how to do it.

Generating random word made of letters and digits

I'm looking for help with my code. I'm writing an app that downloads random images from Imgur.com, and I'm stuck on the generator for their names.
This is the code that I have:
char letter;
unsigned short int asciiCode = 0;
std::string imageName = "";
std::ofstream fileToStoreImageNames;
if (!fileToStoreImageNames.is_open())
return -1;
for (auto i = 0; i < 6; i++)
{
/* if getTrueOrFalse()==0 return capitalLetter() if not return smallLetter() */
asciiCode = random.getTrueOrFalse() == 0 ? random.upperCase() : random.lowerCase();
letter = static_cast <char>(asciiCode);
if (imageName.size() > 0)
imageName += letter;
else
imageName = letter;
}
fileToStoreImageNames << imageName << std::endl;
I made some generators that return numbers from a defined range (for random.upperCase() it is the range 65 to 90); there is a 50% chance for upperCase and 50% for lowerCase. Later I convert those numbers to char with static_cast.
For now I'm only writing those names to a file, and I can see it isn't working as intended. If I just compile and run this code, it writes something like this to the file:
bbbbbb
rrrrrr
YYYYYY
vvvvvv
UUUUUU
EEEEEE
rrrrrr
but when I debug it step by step, it works as it should and I get random letters. Here is my file after 11 attempts; lines 8 and 11 are the result of step-by-step debugging.

Your program seems to be too fast. Your Random Number Generator (RNG) is apparently seeded from the current time: when you debug your code, time passes and the RNG gets a different time value from your OS, while a normal run makes all its calls within the same time value.
You can use the RNGs from the C++11 standard pseudo-random number generation library (<random>). The RNG object is instantiated once and will provide a different random number in every iteration of the loop.

Instead of seeding your RNGs on every call to the random functions, do it once in the constructor of that class, then store and reuse the generator all over the class member functions. Your problem occurs because if they get seeded with the same timestamp they produce the same results, thus you get the same characters calling the function multiple times within the same second. When using C++11 random library you shouldn't even use a timestamp as seed.
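A minimal sketch of "seed once in the constructor, then reuse", assuming ASCII letters as in the question. RandomGen is a hypothetical stand-in for the question's `random` object (whose class is not shown); its member names mirror the question's calls, but here they return char and bool directly instead of ASCII codes, and the engine is seeded from std::random_device rather than a timestamp.

```cpp
#include <random>
#include <string>

class RandomGen
{
public:
    // The engine is seeded exactly once, when the object is created.
    RandomGen() : engine(std::random_device{}()) {}

    bool getTrueOrFalse() { return std::uniform_int_distribution<int>(0, 1)(engine) == 1; }
    char upperCase() { return static_cast<char>(std::uniform_int_distribution<int>('A', 'Z')(engine)); }
    char lowerCase() { return static_cast<char>(std::uniform_int_distribution<int>('a', 'z')(engine)); }

    // Builds a name of `length` letters, each upper- or lower-case with 50% chance.
    std::string imageName(int length = 6)
    {
        std::string name;
        for (int i = 0; i < length; ++i)
            name += getTrueOrFalse() ? upperCase() : lowerCase();
        return name;
    }

private:
    std::mt19937 engine;   // reused by all member functions
};
```

Create one RandomGen before the loop and call imageName() inside it; creating a new object (and so reseeding) for every name would bring back the original problem whenever the seed source repeats.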

What's the Right Way to use the rand() Function in C++?

I'm doing a book exercise that says to write a program that generates pseudorandom numbers. I started off simple with:
#include "std_lib_facilities.h"
int randint()
{
int random = 0;
random = rand();
return random;
}
int main()
{
char input = 0;
cout << "Press any character and enter to generate a random number." << endl;
while (cin >> input)
cout << randint() << endl;
keep_window_open();
}
I noticed that each time the program was run, there would be the same "random" output. So I looked into random number generators and decided to try seeding by including this first in randint().
srand(5355);
Which just generated the same number over and over (I feel stupid now for implementing it.)
So I thought I'd be clever and implement the seed like this.
srand(rand());
This basically did the same as the program did in the first place, but output a different set of numbers (which makes sense, since with my compiler the first number generated by rand() is always 41).
The only thing I could think of to make this more random is to:
Have the user input a number and set that as the seed (which would be easy to implement, but this is a last resort)
OR
Somehow have the seed be set to the computer clock or some other constantly changing number.
Am I in over my head and should I stop now? Is option 2 difficult to implement? Any other ideas?
Thanks in advance.

Option 2 isn't difficult, here you go:
srand(time(NULL));
you'll need to include stdlib.h for srand() and time.h for time().

srand() should only be used once:
int randint()
{
int random = rand();
return random;
}
int main()
{
// To get a unique sequence the random number generator should only be
// seeded once during the life of the application.
// As long as you don't try and start the application multiple times a second
// you can use time() to get an ever-changing seed point that only repeats every
// 60 or so years (assuming 32 bit clock).
srand(time(NULL));
// Comment the above line out if you need to debug with deterministic behavior.
char input = 0;
cout << "Press any character and enter to generate a random number." << endl;
while (cin >> input)
{
cout << randint() << endl;
}
keep_window_open();
}

It is common to seed the random number generator with the current time. Try:
srand(time(NULL));

The problem is that if you don't seed the generator, it behaves as if it had been seeded with 1 (as if srand(1) were called). PRNGs are designed to generate the same sequence when seeded the same way (because PRNGs are not really random; they're deterministic algorithms, and partly because this is quite useful for testing).
When you're trying to seed it with a random number using
srand(rand());
you're in effect doing:
srand(1);
x = rand(); // x will always be the same.
srand(x);
As FigBug mentioned, using the time to seed the generator is commonly used.
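The "same seed, same sequence" behaviour described in this answer also holds for the C++11 engines, which is what makes a fixed seed useful for reproducible tests. A small sketch with std::mt19937 (firstOutputs is a hypothetical helper):

```cpp
#include <random>
#include <vector>

// Returns the first `count` outputs of a Mersenne Twister seeded with `seed`.
std::vector<unsigned> firstOutputs(unsigned seed, int count)
{
    std::mt19937 engine(seed);
    std::vector<unsigned> out;
    for (int i = 0; i < count; ++i)
        out.push_back(static_cast<unsigned>(engine()));
    return out;
}
```

Calling firstOutputs(5355, 10) twice yields identical vectors, just as srand(5355) followed by rand() does; the C++ standard even specifies that the 10000th output of a default-seeded std::mt19937 (seed 5489) is 4123659995.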

I think that the point of these articles is to have a go at implementing the algorithm that is in rand() not how to seed it effectively.
Producing (pseudo) random numbers is non-trivial, and it is worth investigating different techniques for generating them. I don't think that simply using rand() is what the authors had in mind.
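As a starting point for implementing the algorithm behind rand() yourself, the C standard includes a portable example implementation of rand() and srand() (assuming RAND_MAX is 32767). It is sketched below under hypothetical names so it does not clash with the library; real library implementations use different constants (MSVC's, for instance, returns 41 first, which matches the question).

```cpp
#include <cstdint>

// State of the generator; starts at 1, as if srand(1) had been called.
static std::uint32_t next_state = 1;

// Example rand() from the C standard: a linear congruential step,
// then bits 16..30 of the state are returned (range 0..32767).
int my_rand()
{
    next_state = next_state * 1103515245u + 12345u;   // wraps modulo 2^32
    return static_cast<int>((next_state / 65536u) % 32768u);
}

// Example srand(): the seed simply becomes the new state.
void my_srand(unsigned seed)
{
    next_state = seed;
}
```

Without a call to my_srand(), the first value is always 16838, which shows the same repeat-on-every-run behaviour discussed above.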
