Finding most common k-mers and their number of appearances in C++ - c++

Regarding kmer https://en.wikipedia.org/wiki/K-mer
I am trying to find the most frequent k-mers in a large FASTQ file. My plan was to use the Misra-Gries algorithm to find the most frequent k-mers, then look up each frequent k-mer's count in the file with a second pass. Yet I don't think my algorithm is efficient enough. My first draft is below. I am trying to be as memory efficient as possible (the program must not run out of memory).
I also found this DSK algorithm, yet this one is too hard for me to understand without seeing a simple implementation. http://minia.genouest.org/dsk/
Note: the ID of each counter will be an integer, not a string; I am going to change that in my final draft.
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
using namespace std;
struct node {
string id;
int count;
};
void searchCount(vector <node>&memory, string line,int k) {
int count = 0;
string kMer;
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id != "") {
for (int j = 0; j < line.length() - k + 1; j++) {
kMer = line.substr(j, k);
if (kMer == memory[i].id) {
count++;
}
}
}
memory[i].count = count;
count = 0;
}
}
int doWeHaveSpace(vector <node> memory) {
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id == "") {
return i;
}
}
return -1;
}
void MisraGries(string element, vector <node> &memory) {
bool isHere = false;
int index;
for (int i = 0; i < memory.size(); i++) {
if (memory[i].id == element) {
isHere = true;
index = i;
}
}
if (isHere) {
memory[index].count++;
}
else {
int freeIndex = doWeHaveSpace(memory);
if (freeIndex+1) {
memory[freeIndex].count++;
memory[freeIndex].id = element;
}
else {
for (int i = 0; i < memory.size(); i++) {
if (memory[i].count != 0) {
memory[i].count--;
if (memory[i].count == 0) {
memory[i].id = "";
}
}
}
}
}
}
void filecheck(ifstream & input, string prompt) // this function checks and opens input files
{
string filename;
cout << "Please enter file directory and name for " << prompt << ": ";
do
{
getline(cin, filename);
input.open(filename.c_str());
if (input.fail())
cout << " wrong file directory. Please enter real directory. ";
} while (input.fail());
}
int main() {
int line = 1;
string filename;
ifstream input;
ofstream output;
string search;
vector <node> frequent(1000);
for (int i = 0; i < frequent.size(); i++) {
frequent[i].id = "";
frequent[i].count = 0;
}
int k = 30;
string kMer;
filecheck(input, "input file");
while (!input.eof())
{
getline(input, search); // it gets infos line by line to count lines
line++;
if (line == 3) {
for (int i = 0; i < search.length() - k + 1; i++) {
kMer = search.substr(i, k);
MisraGries(kMer, frequent);
}
line = -1;
}
}
return 0;
}

You can speed up your code by storing the most frequent k-mers in a hash table instead of an array. This way, you'll be able to process one k-mer in O(1) time (assuming that the length is constant) if it's already in the cache (if it's not, it would still require a linear pass, but it might give a big improvement on average).
You could also make it even faster if there are a lot of misses by keeping additional information in some kind of auxiliary data structure (like a priority queue) so that you can find the elements with count = 0 and remove them without checking all the other elements.
Taking into account that k is pretty small in your example, you could increase the size of your in-memory cache (a typical computer should easily keep a few million such strings in memory) so that there are fewer misses.
You could store even more data during the first pass by hashing k-mers (this way, you'll just need to store integers in memory instead of strings).
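One way to store integers instead of strings is sketched below. It assumes the reads contain only A, C, G and T and that k is at most 32, so each base fits in 2 bits and a whole k-mer fits in a 64-bit integer. Note that this is an exact packing rather than a hash, swapped in for the hashing suggested above; baseCode and encodeKmers are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Maps a base to its 2-bit code, or -1 for anything else (e.g. 'N').
inline int baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Returns every k-mer of `read` packed into a 64-bit integer (requires 1 <= k <= 32).
// Windows that contain a non-ACGT character are skipped.
std::vector<std::uint64_t> encodeKmers(const std::string& read, int k) {
    std::vector<std::uint64_t> out;
    if (k < 1 || k > 32) return out;
    const std::uint64_t mask = (k == 32) ? ~0ULL : ((1ULL << (2 * k)) - 1);
    std::uint64_t value = 0;
    int valid = 0;  // number of consecutive valid bases ending at the current position
    for (char c : read) {
        int code = baseCode(c);
        if (code < 0) { valid = 0; value = 0; continue; }
        // Slide the window: shift in the new base and drop the oldest one.
        value = ((value << 2) | static_cast<std::uint64_t>(code)) & mask;
        if (++valid >= k) out.push_back(value);
    }
    return out;
}
```

With k = 30 as in the question, each k-mer takes 8 bytes instead of a 30-character string, and the rolling update avoids calling substr for every position.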
To sum it up, I recommend making the cache larger (as long as it fits into memory) and using a more suitable data structure that supports fast lookups, like a hash table (std::unordered_map in C++).
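A minimal sketch of that recommendation, assuming string keys as in the question's draft: the Misra-Gries counters live in a std::unordered_map, so a hit costs O(1) on average and only a miss on a full table triggers the linear decrement pass. The class name MisraGries and its members are illustrative, not taken from the original code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Misra-Gries summary holding at most `capacity` candidate keys.
class MisraGries {
public:
    explicit MisraGries(std::size_t capacity) : capacity_(capacity) {}

    void add(const std::string& key) {
        auto it = counters_.find(key);
        if (it != counters_.end()) {
            ++it->second;                       // already tracked: O(1) on average
        } else if (counters_.size() < capacity_) {
            counters_.emplace(key, 1);          // a free slot is available
        } else {
            // Table full: decrement every counter and drop those that reach zero.
            for (auto cur = counters_.begin(); cur != counters_.end();) {
                if (--cur->second == 0) cur = counters_.erase(cur);
                else ++cur;
            }
        }
    }

    // Candidates only; the counts are lower bounds, so a second pass is still needed.
    const std::unordered_map<std::string, long long>& candidates() const {
        return counters_;
    }

private:
    std::size_t capacity_;
    std::unordered_map<std::string, long long> counters_;
};
```

Erasing zeroed entries keeps the "do we have space" check a simple size comparison, which replaces the linear doWeHaveSpace scan in the draft.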

Related

Function not printing any solutions

So, I need to make a function that returns the chromatic number of a graph. The graph is given as an adjacency matrix, which the program reads from a file whose name the user enters. I have a function that should work in theory and that compiles without any issues, yet when I run it, it simply prints an empty line and the program ends.
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;
int Find_Chromatic_Number (vector <vector <int>> matg, int matc[], int n) {
if (n == 0) {
return 0;
}
int result, i, j;
result = 0;
for (i = 0; i < n; i++) {
for (j = i; j < n; j++) {
if (matg[i][j] == 1) {
if (matc[i] == matc[j]) {
matc[j]++;
}
}
}
}
for (i = 0; i < n; i++) {
if (result < matc[i]) {
result = matc[i];
}
}
return result;
}
int main() {
string file;
int n, i, j, m;
cout << "unesite ime datoteke: " << endl;
cin >> file;
ifstream reader;
reader.open(file.c_str());
reader >> n;
vector<vector<int>> matg(n, vector<int>(0));
int matc[n];
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
reader >> matg[i][j];
}
matc[i] = 1;
}
int result = Find_Chromatic_Number(matg, matc, n);
cout << result << endl;
return 0;
}
The program is supposed to use an ifstream reader to convert the file into a 2D vector which represents the adjacency matrix (matg). I also made an array (matc) which holds the value of each vertex, with different numbers corresponding to different colors.
The function should go through the vector, and every time there is an edge between two vertices it should check whether their color values in matc are the same. If they are, it increases the second value (j) by one. After the function has passed through the vector, the matc array should contain n numbers, with the highest number being the chromatic number I am looking for.
I hope I have explained enough of what I am trying to accomplish; if not, just ask and I will add further explanations.
Try it like this.
The inner vectors are created with size 0, so reader >> matg[i][j]; writes out of bounds. Keep the n rows, but don't give the rows a size:
vector<vector<int> > matg(n);
And instead of using reader >> matg[i][j];
use:
int tmp;
reader >> tmp;
matg[i].push_back(tmp);
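An alternative, sketched here under the assumption that the file holds n followed by n*n integers: size the matrix n x n up front, so that indexing with matg[i][j] is valid, and check every read. The helper names readMatrix and readMatrixFromString are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads n followed by n*n integers; returns an empty matrix on malformed input.
std::vector<std::vector<int>> readMatrix(std::istream& in) {
    int n = 0;
    if (!(in >> n) || n < 0) return {};
    std::vector<std::vector<int>> m(n, std::vector<int>(n, 0));  // n rows of n zeros
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (!(in >> m[i][j])) return {};  // stop on a short or corrupt file
    return m;
}

// Convenience wrapper for a matrix held in a string (handy for quick checks).
std::vector<std::vector<int>> readMatrixFromString(const std::string& text) {
    std::istringstream in(text);
    return readMatrix(in);
}
```

Because readMatrix takes a std::istream&, the same function works with the ifstream from the question. An empty result also covers the case where the file could not be opened.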

Algorithm to print asterisks for duplicate characters [closed]

I was asked this question in an interview:
Given an array with the input string, display the output as shown below
Input
INDIA
Output
INDA
****
*
I iterated through the array and stored each character as a key in a std::map, with the number of occurrences as the value. Later I iterated over the map, printed the asterisks, and reduced the value in the map for each character.
Initially, I was asked not to use any library. I gave a solution which needed a lot of iterations: for every character, iterate the array up to that index to find previous occurrences, and so on.
Is there a better way, e.g. one with better complexity, by which this can be achieved?
Essentially what you are asking is how to implement map without using the STL code, as using some kind of data structure which replicates the basic functionality of map is pretty much the most reasonable way of solving this problem.
There are a number of ways of doing this. If your keys (here the possible characters) come from a very large set where most elements of the set don't appear (such as the full Unicode character set), you would probably want to use either a tree or a hash table. Both of these data structures are very important with lots of variations and different ways of implementing them. There is lots of information and example code about the two structures around.
As @PeterG said in a comment, if the only characters you are going to see come from a set of 256 8-bit chars (e.g. ASCII or similar), or from some other limited collection like the upper-case alphabet, you should just use an array of 256 ints and store a count for each char in that.
here is another one:
#include <stdio.h>
int main()
{
int i,j=0,f=1;
char input[50]={'I','N','D','I','A','N','A','N'};
char letters[256]={0};
int counter[256]={0};
for(i=0;i<50;i++)
{
if(input[i])
counter[input[i]]++;
if(counter[input[i]]==1)
{
putchar(input[i]);
letters[j]=input[i];
j++;
}
}
putchar('\n');
while(f)
{
f=0;
for(i=0;i<j;i++)
if(counter[letters[i]])
{
putchar('*');
counter[letters[i]]--;
f=1;
}
else
{
putchar(' ');
}
putchar('\n');
}
return 0;
}
If the alphabet under consideration is fixed, it can be done in two passes:
1. Create an integer array A with the size of the alphabet, initialized with all zeros.
2. Create a boolean array B with the size of the input, initialized with all false.
3. Iterate the input; for every character, increase the corresponding entry of A.
4. Iterate the input; output a character if its value in B is false and set its value in B to true. Finally, output a carriage return.
5. Reset B.
6. Iterate the input as in 4., but print a star if the character's count in A is positive, then decrease this count; print a space otherwise.
7. Output a carriage return; loop to 5. as long as there are any stars in the output generated.
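The steps above can be sketched as follows, assuming 8-bit characters. As a simplification, the sketch records the distinct characters in order of first appearance instead of keeping the boolean array B, and it returns the rows as a string rather than printing them; starHistogram is a hypothetical name.

```cpp
#include <string>

// Returns the rows as one string: the distinct characters in order of first
// appearance, then one row of '*' / ' ' for each remaining occurrence level.
std::string starHistogram(const std::string& input) {
    int counts[256] = {0};   // array A: occurrences per character
    std::string letters;     // distinct characters, first-appearance order
    for (unsigned char c : input)
        if (counts[c]++ == 0) letters += static_cast<char>(c);

    std::string out = letters + "\n";
    bool printed = true;
    while (printed) {        // stop once a row would contain no stars
        printed = false;
        std::string row;
        for (unsigned char c : letters) {
            if (counts[c] > 0) { row += '*'; --counts[c]; printed = true; }
            else row += ' ';
        }
        if (printed) out += row + "\n";
    }
    return out;
}
```

Each row visits only the distinct characters, so the total work is proportional to the input length plus the number of distinct characters times the highest count.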
This is interesting. You shouldn't use a std::map because that is not a hash map: std::map is a binary tree, whereas std::unordered_map is actually a hash map. In this case we don't need either. We can use a simple array for the char counts.
void printAstr(std::string str){
int array[256] ;// assumining it is an ascii string
memset(array, 0, sizeof(array));
int astrCount = 0;
for(int i = 0; i < str.length()-1; i++){
array[(int) str[i]]++;
if(array[(int) str[i]] > 1) astrCount++;
}
std::cout << str << std::endl;
for(int i = 0; i < str.length()-1;i++) std::cout << "* ";
std::cout << std::endl;
while(astrCount != 0){
for(int i= 0; i< str.length() - 1;i++){
if(array[(int) str[i]] > 1){
std::cout << "* ";
array[(int) str[i]]--;
astrCount--;
}else{
std::cout << " ";
}
}
std::cout << std::endl;
}
}
Pretty simple: just add all the values to the array, then print them out the number of times you see them.
EDIT: sorry just made some logic changes. This works now.
The following code works correctly. I am assuming that you can't use std::string and take note that this doesn't take overflowing into account since I didn't use dynamic containers. This also assumes that the characters can be represented with a char.
#include <iostream>
int main()
{
char input[100];
unsigned int input_length = 0;
char letters[100];
unsigned int num_of_letters = 0;
std::cin >> input;
while (input[input_length] != '\0')
{
input_length += 1;
}
//This array acts like a hash map.
unsigned int occurrences[256] = {0};
unsigned int max_occurrences = 1;
for (int i = 0; i < input_length; ++i)
{
if ((occurrences[static_cast<unsigned char>(input[i])] += 1) == 1)
{
std::cout<< " " << (letters[num_of_letters] = input[i]) << " ";
num_of_letters += 1;
}
if (occurrences[static_cast<unsigned char>(input[i])] > max_occurrences)
{
max_occurrences = occurrences[static_cast<unsigned char>(input[i])];
}
}
std::cout << std::endl;
for (int row = 1; row <= max_occurrences; ++row)
{
for (int i = 0; i < num_of_letters; ++i)
{
if (occurrences[static_cast<unsigned char>(letters[i])] >= row)
{
std::cout << " * ";
}
else
{
std::cout << " ";
}
}
std::cout << std::endl;
}
return 0;
}
The question is tagged c++, but it seems to me that not all of the answers are very C++-ish; then again, it can be quite difficult to write good C++ code under a weird requirement like "not to use any library". In my approach I've used some C++11 features like in-class initialization and nullptr. The code is below:
struct letter_count
{
char letter = '\0';
int count = 0;
};
int add(letter_count *begin, letter_count *end, char letter)
{
while (begin != end)
{
if (begin->letter == letter)
{
return ++begin->count;
}
else if (begin->letter == '\0')
{
std::cout << letter; // Print the first appearance of each char
begin->letter = letter; // claim this free slot for the new character
return ++begin->count;
}
++begin;
}
return 0;
}
int max (int a, int b)
{
return a > b ? a : b;
}
letter_count *buffer = nullptr;
auto testString = "supergalifragilisticoespialidoso";
int len = 0, index = 0, greater = 0;
while (testString[index++])
++len;
buffer = new letter_count[len];
for (index = 0; index < len; ++index)
greater = max(add(buffer, buffer + len, testString[index]), greater);
std::cout << '\n';
for (int count = 0; count < greater; ++count)
{
for (index = 0; index < len && buffer[index].letter; ++index) // check the bound before reading buffer[index]
std::cout << (count < buffer[index].count ? '*' : ' ');
std::cout << '\n';
}
delete [] buffer;
Since "no libraries are allowed" (except for <iostream>?) I've avoided the use of std::pair<char, int> (which could have replaced the letter_count struct), and we have to code many utilities ourselves (such as max and strlen). The output of the program above is:
supergaliftcod
**************
* ******* *
* *** *
* *
*
*
My general solution would be to traverse the word and replace repeated characters with an unused nonsense character. A simple example is below, where I used an exclamation point (!) for the nonsense character (the input could be more robust, some character that is not easily typed, disallowing the nonsense character in the answer, error checking, etc). After traversal, the final step would be removing the nonsense character. The problem is keeping track of the asterisks while retaining the original positions they imply. For that I used a temp string to save the letters and a process string to create the final output string and the asterisks.
#include <iostream>
#include <string>
using namespace std;
int
main ()
{
string input = "";
string tempstring = "";
string process = "";
string output = "";
bool test = false;
cout << "Enter your word below: " << endl;
cin >> input;
for (unsigned int i = 0; i < input.length (); i++)
{ //for the traversed letter, traverse through subsequent letters
for (unsigned int z = i + 1; z < input.length (); z++)
{
//avoid analyzing nonsense characters
if (input[i] != '!')
{
if (input[i] == input[z])
{ //matched letter; replace with nonsense character
input[z] = '!';
test = true; //for string management later
}
}
}
if (test)
{
tempstring += input[i];
input[i] = '*';
test = false; //reset bool for subsequent loops
}
}
//remove garbage symbols and save to a processing string
for (unsigned int i = 0; i < input.size (); i++)
if (input[i] != '!')
process += input[i];
//create the modified output string
unsigned int temp = 0;
for (unsigned int i = 0; i < process.size (); i++)
if (process[i] == '*')
{ //replace asterisks with letters stored in tempstring
output += tempstring[temp];
temp++;
}
else
output += process[i];
//output word with no repeated letters
cout << output << endl;
//output asterisks equal to output.length
for (unsigned int a = 0; a < output.length (); a++)
cout << "*";
cout << endl;
//output asterisks for the letter instances removed
for (unsigned int i = 0; i < process.size (); i++)
if (process[i] != '*')
process[i] = ' ';
cout << process << endl << endl;
}
Sample output I received by running the code:
Enter your word below:
INDIA
INDA
****
*
Enter your word below:
abcdefgabchijklmnop
abcdefghijklmnop
****************
***
This is possible using just simple arrays to keep the counts.
#include<iostream>
#include<string>
using namespace std;
int main(){
string s;
char arr[10000];
cin>>s;
int count1[256]={0},count2[256]={0};
for(int i=0;i<s.size();++i){
count1[s[i]]++;
count2[s[i]]++;
}
long max=-1;
int j=0;
for(int i=0;i<s.size();++i){
if(count1[s[i]]==count2[s[i]]){ //check if not printing duplicate
cout<<s[i];
arr[j++]=s[i];
}
if(count2[s[i]]>max)
max=count2[s[i]];
--count1[s[i]];
}
cout<<endl;
for(int i =1; i<=max;++i){
for(int k=0;k<j;++k){
if(count2[arr[k]]){
cout<<"*";
count2[arr[k]]--;
}
else
cout<<" ";
}
cout<<endl;
}
}

VC++ Runtime Error : Debug Assertion Failed

Currently I am getting a runtime "assertion error".
Here is the error:
I'm reading words from a text file into dynamically allocated arrays.
This block of code is where I am filling the new arrays.
I know the problem is caused by this block of code and that something about my logic is off; I just can't see what it is.
//fill new arrays
for( int y = 0; y < new_numwords; y++)
{
for( int i = 0; i < NUM_WORDS; i++)
{
if (!strcmp(SentenceArry[i], EMPTY[0]) == 0)
{
New_SentenceArry[y] = SentenceArry[i];
New_WordCount[y] = WordCount[i];
y++;
}
}
}
Also how would I pass this dynamically allocated 2D array to a function? (the code really needs to be cleaned up as a whole)
char** SentenceArry = new char*[NUM_WORDS]; //declare pointer for the sentence
for( int i = 0; i < NUM_WORDS; i++)
{
SentenceArry[i] = new char[WORD_LENGTH];
}
Here is the full extent of the code.. help would be much appreciated!
Here is what is being read in:
and the current output (the output is how it's supposed to be):
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <fstream>
#include <cstring>
#include <cctype>
#include <iomanip>
using std::setw;
using std::left;
using std::cout;
using std::cin;
using std::endl;
using std::ifstream;
int main()
{
const int NUM_WORDS = 17;//constant for the elements of arrays
const int WORD_LENGTH = 50;//constant for the length of the cstrings (NEED TO GIVE THE VALUE ZERO STILL!)
short word_entry = 0; //declare counter
short new_numwords= 0; //declare new word count
char EMPTY[1][4]; //NULL ARRAY
EMPTY[0][0] = '\0';//define it as null
char** SentenceArry = new char*[NUM_WORDS]; //declare pointer for the sentence
for( int i = 0; i < NUM_WORDS; i++)
{
SentenceArry[i] = new char[WORD_LENGTH];
}
int WordCount[NUM_WORDS];//declare integer array for the word counter
for(int i = 0; i < NUM_WORDS; i++)//fill int array
{
WordCount[i] = 1;
}
int New_WordCount[NUM_WORDS] = {0};
ifstream read_text("DataFile.txt"); //read in our text file
if (read_text.is_open()) //check if the the file was opened
{
read_text >> SentenceArry[word_entry];
//REMOVE PUNCTUATION BEFORE BEING READ INTO THE ARRAY
while (!read_text.eof())
{
word_entry++; //increment counter
read_text >> SentenceArry[word_entry]; //read in single words of the text file into the array SentenceArry
char* ptr_ch;//declare our pointer that will find chars
ptr_ch = strstr( SentenceArry[word_entry], ",");//look for "," within the array
if (ptr_ch != NULL)//if true replace it with a null character
{
strncpy( ptr_ch, "\0" , 1);
}//end if
else
{
ptr_ch = strstr( SentenceArry[word_entry], ".");//look for "." within the array
if (ptr_ch != NULL)//if true replace it with a null character
{
strncpy( ptr_ch, "\0" , 1);
}//end if
}//end else
} //end while
}//end if
else
{
cout << "The file could not be opened!" << endl;//display error message if file doesn't open
}//end else
read_text.close(); //close the text file after eof
//WORD COUNT NESTED FOR LOOP
for(int y = 0; y < NUM_WORDS; y++)
{
for(int i = y+1; i < NUM_WORDS; i++)
{
if (strcmp(SentenceArry[y], EMPTY[0]) == 0)//check if the arrays match
{
y++;
}
else
{
if (strcmp(SentenceArry[y], SentenceArry[i]) == 0)//check if the arrays match
{
WordCount[y]++;
strncpy(SentenceArry[i], "\0" , 3);
}//end if
}//end if
}//end for
}//end for
//find how many arrays still contain chars
for(int i = 0; i < NUM_WORDS; i++)
{
if (!strcmp(SentenceArry[i], EMPTY[0]) == 0)
{
new_numwords++;
}
}
//new dynamic array
char** New_SentenceArry = new char*[new_numwords]; //declare pointer for the sentence
for( int i = 0; i < new_numwords; i++)
{
New_SentenceArry[i] = new char[new_numwords];
}
//fill new arrays
for( int y = 0; y < new_numwords; y++)
{
for( int i = 0; i < NUM_WORDS; i++)
{
if (!strcmp(SentenceArry[i], EMPTY[0]) == 0)
{
New_SentenceArry[y] = SentenceArry[i];
New_WordCount[y] = WordCount[i];
y++;
}
}
}
//DISPLAY REPORT
cout << left << setw(15) << "Words" << left << setw(9) << "Frequency" << endl;
for(int i = 0; i < new_numwords; i++) //compare i to the array constant NUM_WORDS
{
cout << left << setw(15) << New_SentenceArry[i] << left << setw(9) << New_WordCount[i] << endl; //display the contents of the array SentenceArry
}
//DEALLOCATION
for( int i = 0; i < NUM_WORDS; i++)//deallocate the words inside the arrays
{
delete [] SentenceArry[i];
}
for(int i = 0; i < new_numwords; i++)
{
delete [] New_SentenceArry[i];
}
delete [] SentenceArry; //deallocate the memory allocation made for the array SentenceArry
delete [] New_SentenceArry;//deallocate the memory allocation made for the array New_SentenceArry
}//end main
There are several issues with the code, notwithstanding that this could be written in C++ rather than in C with a sprinkling of C++ I/O.
Issue 1:
Since you're using C-style strings, any copying of string data requires function calls such as strcpy(), strncpy(), etc. You did not do that in this code:
for( int y = 0; y < new_numwords; y++)
{
for( int i = 0; i < NUM_WORDS; i++)
{
if (!strcmp(SentenceArry[i], EMPTY[0]) == 0)
{
New_SentenceArry[y] = SentenceArry[i]; // This is wrong
New_WordCount[y] = WordCount[i];
y++;
}
}
}
You should be using strcpy(), not = to copy strings.
strcpy(New_SentenceArry[y], SentenceArry[i]);
Issue 2:
You should allocate WORD_LENGTH for both the original and new arrays. The length of the strings is independent of the number of strings.
char** New_SentenceArry = new char*[new_numwords]; //declare pointer for the sentence
for( int i = 0; i < new_numwords; i++)
{
New_SentenceArry[i] = new char[new_numwords];
}
This should be:
char** New_SentenceArry = new char*[new_numwords]; //declare pointer for the sentence
for( int i = 0; i < new_numwords; i++)
{
New_SentenceArry[i] = new char[WORD_LENGTH];
}
Issue 3:
Your loops do not check to see if the index is going out of bounds of your arrays.
It seems that you coded your program in accordance to the data that you're currently using, instead of writing code regardless of what the data will be. If you have limited yourself to 17 words, where is the check to see if the index goes above 16? Nowhere.
For example:
while (!read_text.eof() )
Should be:
while (!read_text.eof() && word_entry + 1 < NUM_WORDS) // word_entry is incremented before the next read
Issue 4:
You don't process the first string found correctly:
read_text >> SentenceArry[word_entry]; // Here you read in the first word
while (!read_text.eof() )
{
word_entry++; //increment counter
read_text >> SentenceArry[word_entry]; // What about the first word you read in?
Summary:
Even with these changes, I can't guarantee that the program won't crash. Even if it doesn't crash with these changes, I can't guarantee it will work 100% of the time -- a guarantee would require further analysis.
The proper C++ solution, given what this assignment was about, is to use a std::map<std::string, int> to keep the word frequency. The map would automatically store similar words in one entry (given that you remove the junk from the word), and would bump up the count to 1 automatically, when the entry is inserted into the map.
Something like this:
#include <string>
#include <map>
#include <algorithm>
typedef std::map<std::string, int> StringMap;
using namespace std;
bool isCharacterGarbage(char ch)
{ return ch == ',' || ch == '.'; }
int main()
{
StringMap sentenceMap;
//...
std::string temp;
read_text >> temp;
temp.erase(std::remove_if(temp.begin(), temp.end(), isCharacterGarbage),temp.end());
sentenceMap[temp]++;
//...
}
That code alone does everything your original code did -- keep track of the strings, bumps up the word count, removes the junk characters from the word before being processed, etc. But best of all, no manual memory management. No calls to new[], delete[], nothing. The code just "works". That is effectively 5 lines of code that you would just need to write a "read" loop around.
I won't go through every detail, you can do that for yourself since the code is small, and there are vast amounts of resources available explaining std::map, remove_if(), etc.
Then printing out is merely going through the map and printing each entry (string and count). If you add the printing, that may be 4 lines of extra code. So in all, practically all of the assignment is done with effectively 10 or so lines of code.
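A fuller sketch of that approach, with the read loop and the printing filled in. The function names countWords and formatReport are illustrative; the punctuation filter is the same isCharacterGarbage test as above, and the input is taken from any std::istream so that the ifstream from the question can be passed in directly.

```cpp
#include <algorithm>
#include <istream>
#include <map>
#include <sstream>
#include <string>

bool isCharacterGarbage(char ch) { return ch == ',' || ch == '.'; }

// Counts how often each word occurs in the stream, ignoring ',' and '.'.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> freq;
    std::string word;
    while (in >> word) {  // stops cleanly at end of input, no eof() check needed
        word.erase(std::remove_if(word.begin(), word.end(), isCharacterGarbage),
                   word.end());
        if (!word.empty()) ++freq[word];  // operator[] creates the entry at 0 first
    }
    return freq;
}

// Formats the report, one "word count" pair per line, in the map's sorted order.
std::string formatReport(const std::map<std::string, int>& freq) {
    std::ostringstream out;
    for (const auto& entry : freq)
        out << entry.first << ' ' << entry.second << '\n';
    return out.str();
}
```

Looping on `in >> word` also sidesteps the first-word problem from Issue 4, because every successfully read word goes through the same code path.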
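Remove the code below. Because of the pointer assignment New_SentenceArry[y] = SentenceArry[i], both arrays point at the same buffers, so this loop deletes memory that the SentenceArry loop has already freed.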
for(int i = 0; i < new_numwords; i++)
{
delete [] New_SentenceArry[i];
}

BinSearch fails after Bubble Sort

My program seems to be behaving oddly all of a sudden and I cannot figure out why no matter how I look.
Let's begin with the header
//inventoryData.h
//This is the second edition of inventory data, now featuring an actual description
//This header will load an array, sort it, and then be used in InventorySearch to produce parts and prices.
//by Robert Moore on [DATE]
#include <iostream>
#include <fstream>
#include <iomanip>
using namespace std;
class InventoryData{
//Variables
private:
int partNum[1000];
double price[1000];
int invCount;
public:
InventoryData();//Build Up
void loadArrays(); //Feed the data from the database into our arrays
void arraySort(); //Bubblesort for the array
int seqSearch(int); //Our one by one search method
int binSearch(int); //The other search
int returnpart(int); //Return Part Number
double returnPrice(int); //Return price
//Incorportate a search counter to both these searches?
//IE: bin search found [x] (completed after [y] records)
};
InventoryData::InventoryData()
{
//Load the array
invCount = 0;
for (int count = 0; count < 1000; count++)
{
partNum[count] = 0;
price[count] = 0;
}
}
void InventoryData::arraySort()
{
int counter = 0; //Used to keep track of subscripts
int temp = 0; //Used to sort subscript contents
double tempPrice = 0;
int maxSub = invCount;
int lastKnown = 0; //Used to indicate what the last swapped value was
char swap = 'Y'; //used to indicate if a swap was made or not
while (swap == 'Y')
{
swap = 'N';
counter = 0;
while (counter < maxSub){
if (partNum[counter] < partNum[counter+1])
{
//Swap the part number
temp = partNum[counter];
partNum[counter] = partNum[counter+1];
partNum[counter+1] = temp;
//Swap the price
tempPrice = price[counter];
price[counter] = price[counter+1];
price[counter+1] = tempPrice;
//Report the swap occured
swap = 'Y';
lastKnown = counter;
}
counter++;
}//End of While Loop
maxSub = lastKnown;
}//End this While Loop Too
cout<<"File sort complete."<<endl;
}
void InventoryData::loadArrays()
{
ifstream partIn;
partIn.open("masterInventory.dat");
cout<<"Loading..."<<endl;
if (partIn.is_open())
{
//Prime Read
partIn >> partNum[invCount]
>> price[invCount];
//cout<<partNum[invCount]<<" and "<<price[invCount] <<" have been loaded."<<endl;
while(!partIn.eof())
{
invCount++;
partIn >> partNum[invCount]
>> price[invCount];
// cout<<partNum[invCount]<<" and "<<price[invCount] <<" have been loaded."<<endl;
} //END While
partIn.close();
cout<<"All files loaded successfully."<<endl;
} //END IF*/
else
{
invCount = -1;
cout<<"File failed to open."<<endl;
}
//arraySort();
}
int InventoryData::seqSearch(int searchKey)
{
int index = 0;
int found = -1;
int counter = 0;
while(index < invCount)
{
counter++;
if (searchKey == partNum[index]
)
{
found = index;
index = invCount;
}
else
{
index++;
}
}
cout<<"(Sequential completed after reading "<< counter<<" files.)"<<endl;
return found;
}
int InventoryData::binSearch(int searchKey)
{
int first = 0;
int last = invCount;
int found = 0;
int mid = 0;
int counter = 0;
while (first <= last && found == 0)
{
counter++;
mid = (first + last)/2;
if (searchKey == partNum[mid] ){
found = 1;
return mid;
}
else
{
if (partNum[mid] < searchKey)
{
first = mid+1;
}
else
{
last = mid - 1;
}
}
}
if (found == 0)
{
mid = -1;
}
cout<<"(Binary completed after reading "<< counter <<" files.)"<<endl;
return mid;
}
int InventoryData::returnpart(int value)
{
return partNum[value];
}
double InventoryData::returnPrice(int value)
{
setprecision(2);
return price[value];
}
With this set up, the program loads numbers from a database (any random combination of digits and another set of "prices"), then we call the function to load, sort, and search the array, as found in the CPP file
//InventorySearch
/*This file is used to search our databases
and return a value for whatever our search may
be looking for.*/
//by Robert Moore
#include "inventoryData.h"
#include <iomanip>
int main()
{
//Declare Variable
int tempSeq = 0;
int tempBin = 0;
int search = 0;
char confirmation = 'Y';
int searchCounter = 0;
int partsFound = 0;
int partsLost = 0;
//Build Object and Load Array
InventoryData invent;
invent.loadArrays();
invent.arraySort();
//Introduction
cout<<"Welcome to Part Search."<<endl;
//Begin Loop Here
while(confirmation != 'N')
{
cout<<"Please enter a part number: ";
searchCounter++;
cin>>search;
cout<<endl;
tempSeq = invent.seqSearch(search);
if (tempSeq != -1)
{
std::cout << std::fixed;
cout<<"Sequential found part number "<<invent.returnpart(tempSeq)<< ", and it's price is "<<setprecision(2)<<invent.returnPrice(tempSeq)<<endl;
partsFound++;
}
else
{
cout<<"Sequential search failed to find part number "<<search<<endl;
partsLost++;
}
tempBin = invent.binSearch(search);
if (tempBin != -1)
{
std::cout << std::fixed;
cout<<"Binary found part number "<<invent.returnpart(tempBin)<<", and it's price is "<<setprecision(2)<<invent.returnPrice(tempBin)<<endl;
partsFound++;
}
else
{
cout<<"Binary search failed to find part number "<<search<<endl;
partsLost++;
}
cout<<"Would you like to search again? (Plese enter Y/N): ";
cin>>confirmation;
confirmation = toupper(confirmation);
}
cout<<"Today's Summary: "<<endl;
cout<<setw(5)<<"Total searches: "<<setw(25)<<searchCounter<<endl;
cout<<setw(5)<<"Total successful searches:"<<setw(15)<<(partsFound/2)<<endl;
cout<<setw(5)<<"Total unsuccessful searches:"<<setw(12)<<(partsLost/2)<<endl;
cout<<"Thank you for using Part Search. Have a nice day."<<endl;
return 0;
}
However, the output runs into the following problem: the sequential search scours the entire database and finds our value, but binSearch only checks up to 8 values and fails. At first I thought this was due to the way the sort was loaded, but once I coded it out, it continued to fail. Worse yet, aside from adding the sort, the program functioned just fine prior to this.
I'm running out of ideas as to where the program is wrong, as this code worked just fine up until arraySort() was added.
In your arraySort() method, you should take note of the fact that for instance if maxSub=10, then for the part where you write
while (counter < maxSub){
if (partNum[counter] < partNum[counter+1])
{
.....
}
}
you might end up performing
if(partNum[9]<partNum[10]){
....
}
Since C++ does not perform bounds checking on arrays, your code, although buggy, compiles successfully and may (or may not) produce the correct result. Thus you need to change the loop condition to
while((counter+1)<maxSub){
.....
}
Besides, your arraySort() is sorting in the Descending order, and your binSearch() has been implemented for an array sorted in ascending order. You can change either of the methods as per your requirement.
Hope this helps.
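Both points can be sketched together as follows. This is a minimal version that uses std::vector in place of the fixed arrays and assumes both vectors have the same length; the names mirror the question but the signatures are illustrative. The sort compares with > so the result is ascending, and the search starts with last at the final valid index.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort into ascending order, keeping prices aligned with part numbers.
void sortAscending(std::vector<int>& partNum, std::vector<double>& price) {
    bool swapped = true;
    for (std::size_t pass = 0; swapped && pass + 1 < partNum.size(); ++pass) {
        swapped = false;
        // After each pass the largest remaining element sits at the end.
        for (std::size_t i = 0; i + 1 < partNum.size() - pass; ++i) {
            if (partNum[i] > partNum[i + 1]) {
                std::swap(partNum[i], partNum[i + 1]);
                std::swap(price[i], price[i + 1]);
                swapped = true;
            }
        }
    }
}

// Binary search over an ascending array; returns the index or -1.
int binSearch(const std::vector<int>& partNum, int key) {
    int first = 0;
    int last = static_cast<int>(partNum.size()) - 1;  // last valid index, not the count
    while (first <= last) {
        int mid = first + (last - first) / 2;
        if (partNum[mid] == key) return mid;
        if (partNum[mid] < key) first = mid + 1;
        else last = mid - 1;
    }
    return -1;
}
```

The inner loop condition i + 1 < size - pass is what keeps partNum[i + 1] inside the array.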
Your sorting algorithm seems faulty to me. If you want a simple nested-loop sort, the implementation should be like this. Note the > comparison, which leaves the array in ascending order, as binSearch() expects.
for(int counter1 = 0;counter1<invCount; ++counter1)
{
for(int counter2 = counter1+1; counter2<invCount; ++counter2)
{
if(partNum[counter1] > partNum[counter2])
{
//swap partNum and price for counter1 and counter2 here.
}
}
}

Max Repeated Word in a string

I was trying to do a very common interview problem, "finding the max repeated word in a string", and could not find many resources on the net for a C/C++ implementation, so I coded it myself here.
I have tried to do most of the coding from scratch for better understanding.
Could you review my code and comment on my algorithm? Some people have suggested using hash tables for storing the counts, but I am not using hash tables here.
#include<stdafx.h>
#include<stdlib.h>
#include<stdio.h>
#include<string>
#include<iostream>
using namespace std;
string word[10];
//splitting string into words
int parsestr(string str)
{
int index = 0;
int i = 0;
int maxlength = str.length();
int wordcnt = 0;
while(i < maxlength)
{
if(str[i]!= ' ')
{
word[index] = word[index]+str[i];
}
else
{
index++;//new word
wordcnt = index;
}
i++;
}
return wordcnt;
}
//find the max word count out of the array and return the word corresponding to that index.
string maxrepeatedWord(int wordcntArr[],int count)
{
int max = 0;
int index = 0;
for(int i=0;i<=count;i++)
{
if(wordcntArr[i] > max)
{
max = wordcntArr[i];
index = i;
}
}
return word[index];
}
void countwords(int count)
{
int wordcnt = 0;
int wordcntArr[10];
string maxrepeatedword;
for(int i=0;i<=count ;i++)
{
for(int j=0;j<=count;j++)
{
if(word[i]==word[j])
{
wordcnt++;
//word[j] = "";
}
else
{}
}
cout<<" word "<< word[i] <<" occurs "<< wordcnt <<" times "<<endl;
wordcntArr[i] = wordcnt;
wordcnt = 0;
}
maxrepeatedword = maxrepeatedWord(wordcntArr,count);
cout<< " Max Repeated Word is " << maxrepeatedword;
}
int main()
{
string str = "I am am am good good";
int wordcount = 0;
wordcount = parsestr(str);
countwords(wordcount);
}
Just for the sake of comparison, the most obvious way to do this in C++ is:
#include <map>
#include <string>
#include <iostream>
#include <sstream>
int main()
{
std::istringstream input("I am am am good good");
std::map<std::string, int> count;
std::string word;
decltype(count)::const_iterator most_common;
while (input >> word)
{
auto iterator = count.emplace(word, 0).first;
++iterator->second;
if (count.size() == 1 ||
iterator->second > most_common->second)
most_common = iterator;
}
std::cout << '\'' << most_common->first << "' repeated "
<< most_common->second << " times\n";
}
Notes:
map::emplace returns a pair<iterator,bool> indicating where the word and its count are in the map, and whether it's newly inserted. We only care about where, so we capture emplace(...).first.
As we update the count, we check if that makes the word the most-common word seen so far. If so we copy the iterator to the local variable most_common, so we have a record of both the most commonly seen word so far and its count.
Some things you're doing that are worth thinking about:
word is a global variable - it's a good habit to pass things as function arguments unless it's terribly inconvenient, means the code can be reused more easily from async signal handlers or other threads, and it's more obvious in looking at a function call site what the inputs and outputs might be. As is, the call countwords(wordcount) makes it look like countwords' only input is the int wordcount.
fixed sized arrays: if you've more than 10 words, you're sunk. C++ Standard containers can grow on demand.
there are a few convenience functions you could use, such as std::string::operator+=(char) to append a char more concisely ala my_string += my_char;
Generally though, your code is quite sensible and shows a good understanding of iteration and problem solving, doing it all very low-level but that's good stuff to understand in a very hands-on way.
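To illustrate the points about globals and fixed-size arrays, here is a sketch that passes the words as function arguments and lets a std::vector grow on demand. It counts with std::unordered_map, i.e. the hash-table approach mentioned in the question; splitWords and mostRepeated are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Splits on whitespace; the vector grows as needed, so there is no 10-word limit.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Returns the most frequent word (the earliest one on ties), or "" for no words.
std::string mostRepeated(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> count;
    for (const auto& w : words) ++count[w];
    std::string best;
    int bestCount = 0;
    for (const auto& w : words) {  // walk the input order so ties are deterministic
        if (count[w] > bestCount) { bestCount = count[w]; best = w; }
    }
    return best;
}
```

At the call site, mostRepeated(splitWords(str)) makes the inputs and outputs visible, and counting takes one pass over the words instead of the nested loops in countwords.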
Code Snippet :
void mostRepeat(string words[], int n)
{
vector<int> hash(n, 0); // needs <vector>; a variable-length array with an initializer is not standard C++
for(int j=0; j<n; j++)
{
for(int i=0; i<n; i++)
{
if(words[j]==words[i]) hash[j]++;
}
}
int maxi = hash[0];
int index = 0;
for(int i=0; i<n; i++)
{
if(maxi<hash[i])
{
maxi=hash[i];
index = i;
}
}
cout<<words[index]<<endl;
}
import java.util.*;
public class StringWordDuplicates {
static void duplicate(String inputString){
HashMap<String, Integer> wordCount = new HashMap<String,Integer>();
String[] words = inputString.split(" ");
for(String word : words){
if(wordCount.containsKey(word)){
wordCount.put(word, wordCount.get(word)+1);
}
else{
wordCount.put(word, 1);
}
}
//Extracting of all keys of word count
Set<String> wordsInString = wordCount.keySet();
for(String word : wordsInString){
if(wordCount.get(word)>1){
System.out.println(word+":"+wordCount.get(word));
}
}
}
public static void main(String args[]){
duplicate("I am Java Programmer and IT Server Programmer with Java as Best Java lover");
}
}