Extremely slow random string generator - c++

I came up with the code below to generate 100001 random strings. The strings must be unique. However, the code takes hours to do the job. Can someone tell me how I can optimize it, and why it is so slow?
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <string>
#include <vector>
using namespace std;
string getRandomString(int length) {
    static string charset = "abcdefghijklmnopqrstuvwxyz";
    string result;
    result.resize(length);
    for (int i = 0; i < length; i++) {
        result[i] = charset[rand() % charset.length()];
    }
    return result;
}
int main() {
    srand(time(NULL));
    vector<string> storeUnigrams;
    int numUnigram = 100001;
    string temp = "";
    int minLen = 3;
    int maxLen = 26;
    int range = maxLen - minLen + 1;
    int i = 0;
    while (i < numUnigram) {
        int lenOfRanString = rand() % range + minLen;
        temp = getRandomString(lenOfRanString);
        bool doesithave = false;
        // linear search through the (sorted) vector
        for (size_t j = 0; j < storeUnigrams.size(); j++) {
            if (temp.compare(storeUnigrams[j]) == 0) {
                doesithave = true;
                break;
            }
            if (temp.compare(storeUnigrams[j]) < 0) {
                break;
            }
        }
        if (!doesithave) {
            storeUnigrams.push_back(temp);
            sort(storeUnigrams.begin(), storeUnigrams.end()); // re-sorts everything on each insert
            i++;
        }
    }
}

There are two factors that make your code slow:
Checking by linear search whether the string already exists: O(n) per candidate string
Sorting the whole vector after every insertion: O(n log n) per insertion, so roughly O(n² log n) for all 100001 strings
Use, e.g., a set for storing the strings. It's sorted automatically, and checking for existence is fast (O(log n)):
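#include <cstdlib>
#include <ctime>
#include <set>
#include <string>
using namespace std;
// paste getRandomString() from the question here
int main(){
    srand(time(NULL));
    set<string> storeUnigrams;
    size_t numUnigram = 100001;
    int minLen = 3;
    int maxLen = 26;
    int range = maxLen - minLen + 1;
    // insert() silently ignores strings that are already in the set
    while(storeUnigrams.size() < numUnigram){
        int lenOfRanString = rand()%range + minLen;
        storeUnigrams.insert(getRandomString(lenOfRanString));
    }
}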

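This code generates each unique random number only once and stores it in random_once[i].
The first for loop generates and stores the random numbers.
The second for loop prints the pre-generated random numbers stored in the random_once array.
And yes, generating 100001 unique random numbers this way would take hours, if not days: every new number is compared against all the previous ones, and near the end almost every draw is a duplicate that has to be retried. (On platforms where RAND_MAX is only 32767, rand() % 100001 can never produce the larger values, so the loop would never finish at all.)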
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;
int main()
{
    int numUnigram = 3001;
    int size = numUnigram;
    int random_once[100001];
    cout << "Please wait: Generating " << numUnigram << " random numbers ";
    cout << '-' << flush;
    srand(time(0));
    for (int i = 0; i < size; i++)
    {
        // Draw a candidate; if it was already used, retry this slot (i--),
        // so each number is stored in random_once[i] only once
        random_once[i] = rand() % size;
        for (int j = 0; j < i; j++)
            if (random_once[j] == random_once[i]) { i--; break; }
        // loading animation
        cout << "\b\\" << flush;
        cout << "\b|" << flush;
        cout << "\b/" << flush;
        cout << "\b-" << flush;
    }
    cout << " \n";
    // this code displays the unique random numbers stored in random_once[i]
    for (int i = 0; i < size; i++) cout << " " << random_once[i] << "\t";
    cout << " \n";
    return 0;
}

Philipp's answer is fine. Another approach would be to use a self-balancing binary search tree, such as a red-black tree, instead of a vector (std::set is usually implemented as one). You can search and insert in O(log n) time: if the search finds nothing, insert the element.

Define your variables outside the while loop, because they are being redefined at each iteration:
int lenOfRanString = rand()%range + minLen;
bool doesithave = false;
Update
Though it's advised in many books, in practice, with modern compilers, this will not significantly improve performance.

Use char arrays instead of strings (the string class does a lot of stuff behind the scenes)

Related

C++: How to Generate All Combinations of a Vector of Digits of Length N, disregarding order?

So I need to combine a specific number (not a string) of digits from a vector of possible ones (all 0-9, not characters) into an N-digit number (not binary, then). However, I cannot have any extra permutations appear: for example, 1234, 4321, 3124... are now considered the same and cannot all be output; only one of them can be. This is hard because other questions cover permutations by using std::next_permutation, but I still need the different combinations. My attempts at removing the permutations have failed, so how do you do this? Here is my current code with comments:
#include <iostream>
#include <vector>
using namespace std;
#define ll long long
int n = 0, m = 0, temp; //n is number of available digits
//m is the length of the desired numbers
//temp is used to cin
vector <int> given;
//vector of digits that can be used
vector <int> num;
//the vector to contain a created valid number
void generate(vector <int> vec, int m) {
//recursive function to generate all numbers
if (m == 0) {
for (int x : num) {
cout << x;
}
cout << '\n';
return;
}
for (int i = 0; i < given.size(); i++) {
num.push_back(given[i]); //add digit to number
int save = given[i];
given.erase(given.begin() + i);
//no repeating digits, save the used one and delete
//however, permutations can still pass, which is undesirable
generate(vec, m - 1);
//recursive
num.pop_back();
//undo move
given.insert(given.begin() + i, save);
//redo insert deleted digit
}
}
int main () {
cin >> n;
//input number of available digits
for (int i = 0; i < n; i++) {
cin >> temp;
given.push_back(temp); //input digits
}
cin >> m;
//input length of valid numbers
generate(given, m); //start recursive generation function
return 0;
}
I tried deleting permutations before printing them and erasing more digits to stop generating permutations, but they all failed. Lots of other questions still used std::next_permutation, which was not helpful.
Instead of the binary strings some people suggested in the comments, you can write a recursive function that branches two ways, like an on/off switch, to decide whether or not to include each given digit. Because the digits are always visited in their input order, each combination is produced only once, never as a reordering. I personally like using a recursive function for this, with a length check at the end so that only numbers of the desired length are printed, as demonstrated in the code below:
#include <iostream>
#include <vector>
#include <string>
using namespace std;
int givencount = 0, temp = 0, len = 0;
vector <int> given;
string creatednum;
void generate(int m) {
if (m == givencount) {
if (creatednum.length() == len) {
cout << creatednum << '\n';
}
return;
}
for (int i = 0; i < 2; i++) {
if (i == 1) {
generate(m + 1);
continue;
}
creatednum = creatednum + ((char) ('0' + given[m]));
generate(m + 1);
creatednum.erase(creatednum.begin() + creatednum.length() - 1);
}
}
int main () {
cin >> givencount;
for (int i = 0; i < givencount; i++) {
cin >> temp;
given.push_back(temp);
}
cin >> len;
generate(0);
return 0;
}

why won't my array print correctly (bubble sort)

This program is supposed to print the ramen flavors from most popular to least popular, based on the number of bowls bought.
However, if I input the number of bowls sold as follows:
(1 sold for the first flavor in the array)
(2 sold for the second flavor in the array)
(3 sold for the third flavor in the array)
(4 sold for the fourth flavor in the array)
the output is:
chicken 4
__ 3
__ 2
__ 1
But if I enter the amounts sold in descending order, the program works.
I would really appreciate your feedback.
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string flavor[]={"fish","lamp","steak" ,"chicken"} ;
int scoops[100]={};
int sum=0;
int x=0;
for(x=0;x<4;x++)
{
cout <<"enter amount of bowls for the following ramen flavor :"<<flavor[x] <<endl;
cin>>scoops[x];
sum=scoops[x]+sum;
}
cout <<"total number of bowls is "<<sum<<endl;
cout <<"list of the most popular flavors to least popular flavors "<<endl;//bubble sort
int i=0,j=0,temp,char tempf;
if(scoops[j]<scoops[j+1])
{
temp=scoops[j];
scoops[j]=scoops[j+1];
flavor[j]=flavor[j+1];
scoops[j+1]=temp;
flavor[j+1]=tempf;
}
for (int a=0;a<4;a++)
{
cout <<flavor[a] <<"\t"<<scoops[a]<<endl;
}
}
You could implement bubble sort in your scenario like this. Swap the flavor names along with the counts so they stay paired, and compare with > so the most popular flavor ends up first:
int i = 0;
bool is_sorted = true;
int number_of_scoop_records = 4;
// We keep looping over the array until all the elements are sorted
while(true) {
    if(i >= (number_of_scoop_records-1)) {
        // A full pass made no swaps: all elements are sorted
        if(is_sorted)
            break;
        // Let's go around again
        i = 0;
        is_sorted = true;
        continue;
    }
    // Out-of-order neighbors found (descending order wanted)
    if(scoops[i+1] > scoops[i]) {
        is_sorted = false;
        std::swap(scoops[i+1], scoops[i]);   // std::swap is declared in <utility>
        std::swap(flavor[i+1], flavor[i]);
    }
    i++;
}
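I think you should iterate over the scoops[] array, compare the values, and use the std::swap() function that the standard library provides (in <utility> since C++11, <algorithm> before that). Swap the counts and the flavor names together:
int length = sizeof(flavor)/sizeof(flavor[0]);
for (int i = 0; i < length-1; ++i)
{
    for (int j = i+1; j < length; ++j)
    {
        // larger counts move to the front (most popular first)
        if (scoops[i] < scoops[j])
        {
            swap(scoops[i], scoops[j]);
            swap(flavor[i], flavor[j]);
        }
    }
}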

Ambiguous errors when trying to compile C++ code

When trying to compile the code below I get three errors.
'iterator_category': is not a member of any direct or indirect base class of 'std::iterator_traits<_InIt>'
'_Iter_cat_t' : Failed to specialize alias template
type 'unknown-type' unexpected
I'm quite new to C++ and have gone over the code many times changing snippets but nothing helps. Any help deciphering these error messages is much appreciated.
#include "../../std_lib_facilities.h"
class Puzzle {
public:
vector<char> letters;
Puzzle(int my_size);
void generate(void);
void enter_letters(void);
void feedback(Puzzle puzzle);
private:
int size = 4;
};
Puzzle::Puzzle(int my_size)
{
size = my_size;
}
//Generate size unique letters for the letters array.
void Puzzle::generate(void)
{
for (int i = 0; i < size; ++i) {
char rand = randint(26) - 1 + 'a';
while ((find(letters[0], letters[size], rand) != letters[size])) {
rand = randint(26) - 1 + 'a';
}
letters[i] = rand;
}
}
//Let the user enter size unique letters.
void Puzzle::enter_letters(void)
{
cout << "Enter four different letters seperated by spaces:\n";
for (int i = 0; i < size; ++i) {
char letter;
cin >> letter;
letters[i] = letter;
}
}
//Tell the user how many bulls and cows they got.
void Puzzle::feedback(Puzzle puzzle)
{
int cows = 0, bulls = 0;
for (int i = 0; i < size; i++) { //input
for (int j = 0; j < size; ++j) { //puzzle
if (i == j && letters[i] == puzzle.letters[j]) {
++bulls;
break;
}
else if (letters[i] == puzzle.letters[j]) {
++cows;
break;
}
}
}
cout << "Bulls: " << bulls << "\nCows: " << cows << "\n";
}
//Seed the random function.
void seed(void)
{
int sum = 0;
cout << "Enter a random string of characters:\n";
string str;
cin >> str;
for (char& c : str)
sum += c;
srand(sum);
}
int main()
{
constexpr int GAME_SIZE = 4;
seed();
Puzzle puzzle(GAME_SIZE);
puzzle.generate();
Puzzle input(GAME_SIZE);
input.enter_letters();
while (puzzle.letters != input.letters) {
input.feedback(puzzle);
input.enter_letters();
}
cout << "Congragulations, you did it!\n";
keep_window_open();
return 0;
}
You're using find() wrong.
find(letters[0], letters[size], rand)
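Your letters is a std::vector. You're passing the value of its first element, letters[0], and the value of letters[size] to std::find.
This is actually undefined behavior: even when the vector holds size elements, the valid indices only go up to size-1. And since your letters vector is never resized, it is empty, so every letters[i] in your code (in enter_letters() too) is out of range. So you'll be getting random crashes as an extra bonus, in addition to your compilation error. Add elements with letters.push_back(...) instead of assigning to letters[i].
The first two parameters to std::find are iterators delimiting the sequence to search, not values. Passing two chars makes the library look up std::iterator_traits<char>, which has no iterator_category member; that is what your error messages are complaining about.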
This should be:
find(letters.begin(), letters.end(), rand)
Also, your overall algorithm is broken. Once letters reaches a certain size, your random number generating code will take ... a significant time to find some new letter to add to letters, that's not used already. Once letters manages to acquire all 26 characters of the alphabet, this will turn into an infinite loop. But that would be a different question...

Finding the most common k-mers and their number of appearances in C++

Regarding kmer https://en.wikipedia.org/wiki/K-mer
I am trying to find the most frequent k-mers in a large FASTQ file. My plan was to use the Misra-Gries algorithm to find the most frequent k-mers, then count each frequent k-mer's occurrences in the file with a second pass. Yet I don't think my algorithm is efficient enough. Here is my first draft below. I try to be as memory efficient as possible (the program must not run out of memory).
I also found the DSK algorithm, but it is too hard for me to understand without seeing a simple implementation. http://minia.genouest.org/dsk/
Note: the ID of each counter will be an integer, not a string; I am going to change that in my final draft.
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
using namespace std;
struct node {
string id;
int count;
};
void searchCount(vector <node>&memory, string line,int k) {
int count = 0;
string kMer;
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id != "") {
for (int j = 0; j < line.length() - k + 1; j++) {
kMer = line.substr(j, k);
if (kMer == memory[i].id) {
count++;
}
}
}
memory[i].count = count;
count = 0;
}
}
int doWeHaveSpace(vector <node> memory) {
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id == "") {
return i;
}
}
return -1;
}
void MisraGries(string element, vector <node> &memory) {
bool isHere = false;
int index;
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id == element) {
isHere = true;
index = i;
}
}
if (isHere) {
memory[index].count++;
}
else {
int freeIndex = doWeHaveSpace(memory);
if (freeIndex+1) {
memory[freeIndex].count++;
memory[freeIndex].id = element;
}
else {
for (int i = 0; i < memory.size(); i++) {
if (memory[i].count != 0) {
memory[i].count--;
if (memory[i].count == 0) {
memory[i].id = "";
}
}
}
}
}
}
void filecheck(ifstream & input, string prompt) // this function checks and opens input files
{
string filename;
cout << "Please enter file directory and name for " << prompt << ": ";
do
{
getline(cin, filename);
input.open(filename.c_str());
if (input.fail())
cout << " wrong file directory. Please enter real directory. ";
} while (input.fail());
}
int main() {
int line = 1;
string filename;
ifstream input;
ofstream output;
string search;
vector <node> frequent(1000);
for (int i = 0; i < frequent.size(); i++) {
frequent[i].id = "";
frequent[i].count = 0;
}
int k = 30;
string kMer;
filecheck(input, "input file");
while (!input.eof())
{
getline(input, search); // it gets infos line by line to count lines
line++;
if (line == 3) {
for (int i = 0; i < search.length() - k + 1; i++) {
kMer = search.substr(i, k);
MisraGries(kMer, frequent);
}
line = -1;
}
}
return 0;
}
You can speed up your code by storing the most frequent k-mers in a hash table instead of an array. This way, you'll be able to process a k-mer in O(1) time on average (assuming that k is constant) if it's already in the cache. If it's not, decrementing the counters still requires a linear pass, but this might give a big improvement on average.
If there are a lot of misses, you could make it even faster by keeping additional information in an auxiliary data structure (like a priority queue), so that you can find the elements whose count drops to 0 and remove them without checking all the other elements.
Taking into account that k is pretty small in your example, you could increase the size of your in-memory cache (a typical computer should easily keep a few million such strings in memory) so that there are fewer misses.
You could store even more data during the first pass by hashing the k-mers (this way, you only need to store integers in memory instead of strings).
To sum it up, I'd recommend making the cache larger (as long as it fits in memory) and using a more suitable data structure that supports fast lookups, like a hash table (std::unordered_map in C++).

Algorithm to print asterisks for duplicate characters [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I was asked this question in an interview:
Given an array with the input string, display the output as shown below
Input
INDIA
Output
INDA
****
*
I iterated through the array and stored each character as a key in a std::map, with the number of occurrences as the value. Then I iterated over the map, printed the asterisks, and decremented the value in the map for each character.
Initially, I was asked not to use any library. I gave a solution which needed a lot of iterations: for every character, iterate over the array up to that index to find previous occurrences, and so on.
Is there a better way, e.g. one with better complexity, to achieve this?
Essentially what you are asking is how to implement map without using the STL code, as using some kind of data structure which replicates the basic functionality of map is pretty much the most reasonable way of solving this problem.
There are a number of ways of doing this. If your keys (here the possible characters) come from a very large set where most elements of the set don't appear (such as the full Unicode character set), you would probably want to use either a tree or a hash table. Both of these data structures are very important, with lots of variations and different ways of implementing them, and there is plenty of information and example code about both available.
As @PeterG said in a comment, if the only characters you are going to see come from a set of 256 8-bit chars (e.g. ASCII or similar), or some other limited collection like the upper-case alphabet, you should just use an array of 256 ints and store a count for each char in it.
Here is another one:
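#include <stdio.h>
int main()
{
    int i, j = 0, f = 1;
    char input[50] = {'I','N','D','I','A','N','A','N'};
    char letters[256] = {0};
    int counter[256] = {0};
    /* Count each character; print it the first time it is seen. */
    for (i = 0; i < 50 && input[i]; i++)
    {
        unsigned char c = (unsigned char)input[i]; /* never a negative index */
        counter[c]++;
        if (counter[c] == 1)
        {
            putchar(c);
            letters[j] = input[i];
            j++;
        }
    }
    putchar('\n');
    /* One row per pass: a star for each letter that still has occurrences left. */
    while (f)
    {
        f = 0;
        for (i = 0; i < j; i++)
            if (counter[(unsigned char)letters[i]])
            {
                putchar('*');
                counter[(unsigned char)letters[i]]--;
                f = 1;
            }
            else
            {
                putchar(' ');
            }
        putchar('\n');
    }
    return 0;
}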
If the alphabet under consideration is fixed, it can be done in two passes:
1. Create an integer array A with the size of the alphabet, initialized with all zeros.
2. Create a boolean array B with the size of the alphabet (one flag per character), initialized with all false.
3. Iterate over the input; for every character, increment the corresponding entry of A.
4. Iterate over the input; output a character if its entry in B is false, and set that entry in B to true. Finally, output a newline.
5. Reset B.
6. Iterate over the input as in step 4 (visiting each distinct character once), but print a star if the character's count in A is positive, then decrement that count; print a space otherwise.
7. Output a newline; loop back to step 5 as long as the row just generated contained any stars.
This is interesting. You shouldn't use a std::map here, because it is not a hash map: std::map is an ordered container, typically implemented as a balanced binary search tree. std::unordered_map is the hash map. In this case we don't need either; we can use a simple array for the character counts.
#include <iostream>
#include <string>
void printAstr(const std::string& str){
    int count[256] = {0};          // assuming 8-bit characters
    bool seen[256] = {false};
    std::string letters;           // distinct characters in order of first appearance
    int astrCount = 0;             // stars still to be printed
    for (unsigned char c : str) {
        count[c]++;
        astrCount++;
        if (!seen[c]) {
            seen[c] = true;
            letters += static_cast<char>(c);
        }
    }
    std::cout << letters << std::endl;
    // One row per iteration: a star under each letter that still has occurrences left
    while (astrCount != 0) {
        for (unsigned char c : letters) {
            if (count[c] > 0) {
                std::cout << '*';
                count[c]--;
                astrCount--;
            } else {
                std::cout << ' ';
            }
        }
        std::cout << std::endl;
    }
}
Pretty simple: add all the counts to the array, then print a star for each time you saw a character.
EDIT: Sorry, I just made some logic changes. This works now.
The following code works correctly. I am assuming that you can't use std::string. Note that it doesn't guard against overflowing its fixed-size arrays, since I didn't use dynamic containers. It also assumes that every character can be represented by a char.
#include <iostream>
int main()
{
char input[100];
unsigned int input_length = 0;
char letters[100];
unsigned int num_of_letters = 0;
std::cin >> input;
while (input[input_length] != '\0')
{
input_length += 1;
}
//This array acts like a hash map.
unsigned int occurrences[256] = {0};
unsigned int max_occurrences = 1;
for (int i = 0; i < input_length; ++i)
{
if ((occurrences[static_cast<unsigned char>(input[i])] += 1) == 1)
{
std::cout<< " " << (letters[num_of_letters] = input[i]) << " ";
num_of_letters += 1;
}
if (occurrences[static_cast<unsigned char>(input[i])] > max_occurrences)
{
max_occurrences = occurrences[static_cast<unsigned char>(input[i])];
}
}
std::cout << std::endl;
for (int row = 1; row <= max_occurrences; ++row)
{
for (int i = 0; i < num_of_letters; ++i)
{
if (occurrences[static_cast<unsigned char>(letters[i])] >= row)
{
std::cout << " * ";
}
else
{
std::cout << " ";
}
}
std::cout << std::endl;
}
return 0;
}
The question is tagged c++, but it seems to me that not all of the answers are very C++-ish; then again, it can be quite difficult to write good C++ code under a weird requirement like "don't use any library". In my approach I've used some nice C++11 features like in-class member initializers and nullptr. Here is the code:
#include <iostream>
struct letter_count
{
    char letter = '\0';
    int count = 0;
};
// Increments the count for 'letter' (claiming the first free slot if it is new)
// and returns the updated count.
int add(letter_count *begin, letter_count *end, char letter)
{
    while (begin != end)
    {
        if (begin->letter == letter)
        {
            return ++begin->count;
        }
        else if (begin->letter == '\0')
        {
            std::cout << letter; // Print the first appearance of each char
            begin->letter = letter;
            return ++begin->count;
        }
        ++begin;
    }
    return 0;
}
int max (int a, int b)
{
    return a > b ? a : b;
}
int main()
{
    letter_count *buffer = nullptr;
    auto testString = "supergalifragilisticoespialidoso";
    int len = 0, index = 0, greater = 0;
    while (testString[index++]) // hand-written strlen
        ++len;
    buffer = new letter_count[len];
    for (index = 0; index < len; ++index)
        greater = max(add(buffer, buffer + len, testString[index]), greater);
    std::cout << '\n';
    for (int count = 0; count < greater; ++count)
    {
        // check index < len first so buffer is never read past its end
        for (index = 0; index < len && buffer[index].letter; ++index)
            std::cout << (count < buffer[index].count ? '*' : ' ');
        std::cout << '\n';
    }
    delete [] buffer;
}
Since "no libraries are allowed" (except for <iostream>?), I've avoided std::pair<char, int> (which could have replaced the letter_count struct), and I had to code utilities such as max and strlen by hand. The output of the program above is:
supergaliftcod
**************
* *******   *
*     ***   *
*       *
        *
        *
#include <iostream>
#include <string>
using namespace std;
int
main ()
{
string input = "";
string tempstring = "";
string process = "";
string output = "";
bool test = false;
cout << "Enter your word below: " << endl;
cin >> input;
for (unsigned int i = 0; i < input.length (); i++)
{ //for the traversed letter, traverse through subsequent letters
for (unsigned int z = i + 1; z < input.length (); z++)
{
//avoid analyzing nonsense characters
if (input[i] != '!')
{
if (input[i] == input[z])
{ //matched letter; replace with nonsense character
input[z] = '!';
test = true; //for string management later
}
}
}
if (test)
{
tempstring += input[i];
input[i] = '*';
test = false; //reset bool for subsequent loops
}
}
//remove garbage symbols and save to a processing string
for (unsigned int i = 0; i < input.size (); i++)
if (input[i] != '!')
process += input[i];
//create the modified output string
unsigned int temp = 0;
for (unsigned int i = 0; i < process.size (); i++)
if (process[i] == '*')
{ //replace asterisks with letters stored in tempstring
output += tempstring[temp];
temp++;
}
else
output += process[i];
//output word with no repeated letters
cout << output << endl;
//output asterisks equal to output.length
for (unsigned int a = 0; a < output.length (); a++)
cout << "*";
cout << endl;
//output asterisks for the letter instances removed
for (unsigned int i = 0; i < process.size (); i++)
if (process[i] != '*')
process[i] = ' ';
cout << process << endl << endl;
}
Sample output I received by running the code:
Enter your word below:
INDIA
INDA
****
*
Enter your word below:
abcdefgabchijklmnop
abcdefghijklmnop
****************
***
It is possible using just a simple array to keep count of the values.
#include<iostream>
#include<string>
using namespace std;
int main(){
string s;
char arr[10000];
cin>>s;
int count1[256]={0},count2[256]={0};
for(int i=0;i<s.size();++i){
count1[s[i]]++;
count2[s[i]]++;
}
long max=-1;
int j=0;
for(int i=0;i<s.size();++i){
if(count1[s[i]]==count2[s[i]]){ //check if not printing duplicate
cout<<s[i];
arr[j++]=s[i];
}
if(count2[s[i]]>max)
max=count2[s[i]];
--count1[s[i]];
}
cout<<endl;
for(int i =1; i<=max;++i){
for(int k=0;k<j;++k){
if(count2[arr[k]]){
cout<<"*";
count2[arr[k]]--;
}
else
cout<<" ";
}
cout<<endl;
}
}