Stop Words in C++


The following C++ program takes two text files, stop_words.txt and story.txt, and removes all occurrences of the stop words from story.txt. For instance,
Monkey is a common name that may refer to groups or species of mammals, in part, the simians of infraorder L. The term is applied descriptively to groups of primates, such as families of new world monkeys and old world monkeys. Many monkey species are tree-dwelling (arboreal), although there are species that live primarily on the ground, such as baboons. Most species are also active during the day (diurnal). Monkeys are generally considered to be intelligent, especially the old world monkeys of Catarrhini.
The text above is story.txt, and stop_words.txt is given below:
is
are
be
When I run my code, it doesn't delete all the stop words; it keeps some of them. The code also creates a file called stop_words_count.txt, which should display the number of occurrences of each stop word, like so:
is 2
are 4
be 1
But my output file shows the following:
is 1
are 4
be 1
I would be very grateful for some help regarding this code! I have posted it below for your reference.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
const int MAX_NUM_STOPWORDS = 100;
struct Stop_word
{
string word; // stop word
int count; // removal count
};
int stops[100];
string ReadLineFromStory(string story_filename )
{
string x = "";
string b;
ifstream fin;
fin.open(story_filename);
while(getline(fin, b))
{
x += b;
}
return x;
}
void ReadStopWordFromFile(string stop_word_filename, Stop_word words[], int &num_words)
{
ifstream fin;
fin.open(stop_word_filename);
string a;
int i = 0;
if (fin.fail())
{
cout << "Failed to open "<< stop_word_filename << endl;
exit(1);
}
words[num_words].count = 0;
while (fin >> words[num_words].word)
{
++num_words;
}
fin.close();
}
void WriteStopWordCountToFile(string wordcount_filename, Stop_word words[], int num_words)
{
ofstream fout;
fout.open(wordcount_filename);
for (int i = 0; i < 1; i++)
{
fout << words[i].word << " "<< stops[i] + 1 << endl;
}
for (int i = 1; i < num_words; i++)
{
fout << words[i].word << " "<< stops[i] << endl;
}
fout.close();
}
int RemoveWordFromLine(string &line, string word)
{
int length = line.length();
int counter = 0;
int wl = word.length();
for(int i=0; i < length; i++)
{
int x = 0;
if(line[i] == word[0] && (i==0 || (i != 0 && line[i-1]==' ')))
{
for(int j = 1 ; j < wl; j++)
if (line[i+j] != word[j])
{
x = 1;
break;
}
if(x == 0 && (i + wl == length || (i + wl != length && line[i+wl] == ' ')))
{
for(int k = i + wl; k < length; k++)
line[k -wl] =line[k];
length -= wl;
counter++;
}
}
}
line[length] = 0;
char newl[1000] = {0};
for(int i = 0; i < length; i++)
newl[i] = line[i];
line.assign(newl);
return counter;
}
int RemoveAllStopwordsFromLine(string &line, Stop_word words[], int num_words)
{
int counter[100];
int final = 0;
for(int i = 1; i <= num_words; i++)
{
counter[i] = RemoveWordFromLine(line, words[i].word);
final += counter[i];
stops[i] = counter[i];
}
return final;
}
int main()
{
Stop_word stopwords[MAX_NUM_STOPWORDS]; // an array of struct Stop_word
int num_words = 0, total = 0;
// read in two filenames from user input
string a, b, c;
cin >> a >> b;
// read stop words from stopword file and
// store them in an array of struct Stop_word
ReadStopWordFromFile(a, stopwords, num_words);
// open text file
c = ReadLineFromStory(b);
// open cleaned text file
ofstream fout;
fout.open("story_cleaned.txt");
// read in each line from text file, remove stop words,
// and write to output cleaned text file
total = RemoveAllStopwordsFromLine(c, stopwords, num_words) + 1 ;
fout << c;
// close text file and cleaned text file
fout.close();
// write removal count of stop words to files
WriteStopWordCountToFile("stop_words_count.txt", stopwords, num_words);
// output to screen total number of words removed
cout << "Number of stop words removed = " << total << endl;
return 0;
}

There is one major bug in your code.
In the function RemoveAllStopwordsFromLine, you are using the wrong array indices. In C++, the first element of an array has index 0, and the loop condition must use "less than" the size, not "less than or equal to":
for (int i = 1; i <= num_words; i++)
So the first stop word, "is", is never checked or counted, and the last iteration reads words[num_words], which is past the last stop word that was read in.
Please change it to
for (int i = 0; i < num_words; i++)
But then you also need to remove your patch in the function WriteStopWordCountToFile. You made a special case for element 0, and that is wrong.
Please remove
for (int i = 0; i < 1; i++)
{
fout << words[i].word << " " << stops[i] + 1 << endl;
}
Then start the next for loop at 0, and remove the "+ 1" when calculating the total in main.
Because you are using C-style arrays, magic numbers, and overly complex code, I will show you a modern C++ solution.
C++ has many useful algorithms, and some are designed for exactly your requirements, so please use them. Try to move away from C and migrate to C++.
#include <string>
#include <iostream>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <regex>
#include <sstream>

// The filenames. Whatever you want
const std::string storyFileName{ "r:\\story.txt" };
const std::string stopWordFileName{ "r:\\stop_words.txt" };
const std::string stopWordsCountFilename{ "r:\\stop_words_count.txt" };
const std::string storyCleanedFileName{ "r:\\story_cleaned.txt" };

// Because of the simplicity of the task, put everything in main
int main() {
    // Open all 4 needed files
    std::ifstream storyFile(storyFileName);
    std::ifstream stopWordFile(stopWordFileName);
    std::ofstream stopWordsCountFile(stopWordsCountFilename);
    std::ofstream storyCleanedFile(storyCleanedFileName);

    // Check if the files could be opened
    if (storyFile && stopWordFile && stopWordsCountFile && storyCleanedFile) {

        // 1. Read the complete source file with the story into a std::string
        std::string story(std::istreambuf_iterator<char>(storyFile), {});

        // 2. Read all "stop words" into a std::vector of std::strings
        std::vector<std::string> stopWords(std::istream_iterator<std::string>(stopWordFile), {});

        // 3. Count the occurrences of each "stop word" and write them into the destination file
        std::for_each(stopWords.begin(), stopWords.end(), [&story, &stopWordsCountFile](const std::string& sw) {
            std::regex re{ sw }; // One of the "stop words"
            stopWordsCountFile << sw << " --> " // Write count to output
                << std::distance(std::sregex_token_iterator(story.begin(), story.end(), re), {}) << "\n";
        });

        // 4. Remove the "stop words" from the story and write the new story into a file
        std::ostringstream wordsToReplace; // Build a list of all stop words, each followed by an optional whitespace
        std::copy(stopWords.begin(), stopWords.end(), std::ostream_iterator<std::string>(wordsToReplace, "\\s?|"));
        std::string pattern = wordsToReplace.str();
        if (!pattern.empty()) pattern.pop_back(); // Drop the trailing "|", which would add an empty alternative
        storyCleanedFile << std::regex_replace(story, std::regex(pattern), "");
    }
    else {
        // In case any of the files could not be opened.
        std::cerr << "\n*** Error: Could not open one of the files\n";
    }
    return 0;
}
Please try to study and understand this code. This is a very simple solution.

Related

Removing all the vowels in a string in c++

I've written code that removes all vowels from a string in C++, but for some reason it doesn't remove the vowel 'o' for one particular input: zjuotps.
Here's the code:
#include<iostream>
#include<string>
using namespace std;
int main(){
string s;
cin >> s;
string a = "aeiouyAEIOUY";
for (int i = 0; i < s.length(); i++){
for(int j = 0; j < a.length(); j++){
if(s[i] == a[j]){
s.erase(s.begin() + i);
}
}
}
cout << s;
return 0;
}
When I input: zjuotps
The output I get is: zjotps

This is a cleaner approach using the C++ standard library:
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string input = "zjuotps";
std::string vowels = "aeiouyAEIOUY";
auto predicate = [&vowels](char c) { return vowels.find(c) != std::string::npos; };
auto iterator = std::remove_if(input.begin(), input.end(), predicate);
input.erase(iterator, input.end());
cout << input << endl;
}
Edit:
As @RemyLebeau pointed out, C++20 introduced std::erase_if, with which the answer becomes one line of code:
std::erase_if(input, [&vowels](char c) { return vowels.find(c) != std::string::npos; });

You can develop a solution by copying the non-matching characters into a new string object. The eliminate() method appends a character from the input object to the result object only if it does not match any character in the remove object.
#include <iostream>
#include <string>

/**
 * @brief This method scans the characters in the "input" object and writes
 * the characters not in the "remove" object to the "result" object.
 * @param input This object contains the characters to be scanned.
 * @param remove This object contains the characters to be removed.
 * @param result Non-matching characters are written to this object.
 */
void eliminate(std::string input, std::string remove, std::string &result);

int main()
{
    std::string input = "zjuotpsUK", remove = "aeiouyAEIOUY", result;
    eliminate(input, remove, result);
    std::cout << result << std::endl;
    return 0;
}

void eliminate(std::string input, std::string remove, std::string &result)
{
    for (size_t i = 0, j = 0; i < input.length(); i++)
    {
        for (j = 0; j < remove.length(); j++)
            if (input[i] == remove[j])
                break;
        // j reached the end: input[i] matched none of the characters to remove
        if (j == remove.length())
            result += input[i];
    }
}

In your code here, I replaced s with input_str, and a with vowels, for readability:
for (int i = 0; i < input_str.length(); i++){
for(int j = 0; j < vowels.length(); j++){
if(input_str[i] == vowels[j]){
input_str.erase(input_str.begin() + i);
}
}
}
The problem with your current code is what happens after you erase a char from the input string. Erasing a char left-shifts every char to its right, so the same i position now holds a new char that has not been fully checked yet. Your code, however, keeps running the vowels j loop and then increments i, so that new char is skipped: if two vowels are in a row, as "uo" is in "zjuotps", the second one is left in the string. Instead, each time you erase a char, you should break out of the vowels j loop and check the same i position again, against all the vowels. Here is the fix to your immediate code from the question:
int i = 0;
while (i < s.length()){
bool char_is_a_vowel = false;
for(int j = 0; j < a.length(); j++){
if(s[i] == a[j]){
char_is_a_vowel = true;
break; // exit j loop
}
}
if (char_is_a_vowel){
s.erase(s.begin() + i);
continue; // Do NOT increment i below! Skip that.
}
i++;
}
However, there are many other, better ways to do this, and I'll present some below. I personally find the most-upvoted answer's code difficult to read: it requires extra study and looking things up to do something so simple. So, I'll show some alternative approaches to that answer.
Approach 1 of many: copy non-vowel chars to new string:
So, here is an alternative, simple, more-readable approach where you simply scan through all chars in the input string, check to see if the char is in the vowels string, and if it is not, you copy it to an output string since it is not a vowel:
Just the algorithm:
std::string output_str;
for (const char c : input_str) {
if (vowels.find(c) == std::string::npos) {
output_str.push_back(c);
}
}
Full, runnable example:
#include <iostream> // For `std::cin`, `std::cout`, `std::endl`, etc.
#include <string>
int main()
{
std::string input_str = "zjuotps";
std::string vowels = "aeiouyAEIOUY";
std::string output_str;
for (const char c : input_str)
{
if (vowels.find(c) == std::string::npos)
{
// char `c` is NOT in the `vowels` string, so append it to the
// output string
output_str.push_back(c);
}
}
std::cout << "input_str = " << input_str << std::endl;
std::cout << "output_str = " << output_str << std::endl;
}
Output:
input_str = zjuotps
output_str = zjtps
Approach 2 of many: remove vowel chars in input string:
Alternatively, you could remove the vowel chars in place, as you originally tried to do. But you must NOT increment the index, i, for the input string when a char is erased: erasing the vowel char left-shifts the remaining chars in the string, so the same index location must be checked again on the next iteration in order to read the next char. See the note in the comments below.
Just the algorithm:
size_t i = 0;
while (i < input_str.length()) {
char c = input_str[i];
if (vowels.find(c) != std::string::npos) {
input_str.erase(input_str.begin() + i);
continue;
}
i++;
}
Full, runnable example:
#include <iostream> // For `std::cin`, `std::cout`, `std::endl`, etc.
#include <string>
int main()
{
std::string input_str = "zjuotps";
std::string vowels = "aeiouyAEIOUY";
std::cout << "BEFORE: input_str = " << input_str << std::endl;
size_t i = 0;
while (i < input_str.length())
{
char c = input_str[i];
if (vowels.find(c) != std::string::npos)
{
// char `c` IS in the `vowels` string, so remove it from the
// `input_str`
input_str.erase(input_str.begin() + i);
// do NOT increment `i` here since erasing the vowel char above just
// left-shifted the remaining chars in the string, meaning that we
// need to check the *same* index location again the next
// iteration!
continue;
}
i++;
}
std::cout << "AFTER: input_str = " << input_str << std::endl;
}
Output:
BEFORE: input_str = zjuotps
AFTER: input_str = zjtps
Approach 3 of many: high-speed C-style arrays: modify input string in-place
I borrowed this approach from "Approach 1" of my previous answer here: Removing elements from array in C
If you ever need high speed, I'd bet this is one of the fastest approaches. It uses C-style strings (char arrays). It scans through the input string, detecting any vowels. Each char that is NOT a vowel is copied to the next write position at the far left of the input string, so the string is filtered in place and all vowels are dropped. When done, it null-terminates the input string at its new end. In case you need a C++ std::string type in the end, I create one from the C-string when done.
Just the algorithm:
size_t i_write = 0;
for (size_t i_read = 0; i_read < ARRAY_LEN(input_str) - 1; i_read++) {  // - 1: skip the '\0'
    bool char_is_a_vowel = false;
    for (size_t j = 0; j < ARRAY_LEN(vowels) - 1; j++) {  // loop over `vowels`, not `input_str`
        if (input_str[i_read] == vowels[j]) {
            char_is_a_vowel = true;
            break;
        }
    }
    if (!char_is_a_vowel) {
        input_str[i_write] = input_str[i_read];
        i_write++;
    }
}
input_str[i_write] = '\0';
Full, runnable example:
#include <iostream> // For `std::cin`, `std::cout`, `std::endl`, etc.
#include <string>

/// Get the number of elements in an array
#define ARRAY_LEN(array) (sizeof(array)/sizeof(array[0]))

int main()
{
    char input_str[] = "zjuotps";
    char vowels[] = "aeiouyAEIOUY";
    std::cout << "BEFORE: input_str = " << input_str << std::endl;

    // Iterate over all chars in the input string, excluding its terminating
    // '\0' (hence the `- 1`)
    size_t i_write = 0;
    for (size_t i_read = 0; i_read < ARRAY_LEN(input_str) - 1; i_read++)
    {
        // Iterate over all chars in the vowels string (again excluding its
        // '\0'). Only retain in the input string (copying chars into the left
        // side of the input string) all chars which are NOT vowels!
        bool char_is_a_vowel = false;
        for (size_t j = 0; j < ARRAY_LEN(vowels) - 1; j++)
        {
            if (input_str[i_read] == vowels[j])
            {
                char_is_a_vowel = true;
                break;
            }
        }
        if (!char_is_a_vowel)
        {
            input_str[i_write] = input_str[i_read];
            i_write++;
        }
    }
    // null-terminate the input string at its new end location; the number of
    // chars in it (its new length) is now equal to `i_write`!
    input_str[i_write] = '\0';

    std::cout << "AFTER: input_str = " << input_str << std::endl;

    // Just in case you need it back in this form now:
    std::string str(input_str);
    std::cout << " C++ str = " << str << std::endl;
}
Output:
BEFORE: input_str = zjuotps
AFTER: input_str = zjtps
C++ str = zjtps
See also:
[a similar answer of mine in C] Removing elements from array in C

should I launch multiple threads if my CPU load suggests otherwise?

I'm writing a program in C++ that counts NGS read alignments against a reference annotation. Basically, the program reads both the annotation file and the alignment file into memory, iterates through the annotation, binary searches the alignment file for a probable location, and, upon finding this location, linearly searches a frame around it.
Typically I want to keep this frame somewhat large (10000 alignments), so I had the idea to split the frame up and hand parts of it to separate threads.
Everything compiles and runs, but my multithreading doesn't seem to work as intended: my computer uses only one core for the job. Could anyone help me figure out where I implemented the threading wrong?
https://sourceforge.net/projects/fast-count/?source=directory
#include <iostream>
#include <cstdlib>
#include <vector>
#include <string>
#include <thread>
#include <sstream>
#include <fstream>
#include <math.h>
#include "api/BamReader.h"
using namespace std;
using namespace BamTools;
int hit_count = 0;
struct bam_headers{
string chr;
int start;
};
struct thread_data{
int thread_id;
int total_thread;
int start_gtf;
int stop_gtf;
};
struct gtf_headers{
string chr;
string source;
string feature;
string score;
string strand;
string frame;
string annotation;
int start;
int end;
};
void process(int* start_holder, int size, int gtf_start, int gtf_stop){
//threaded counter process
for (int t = 0; t < size; t++){
if((start_holder[t] >= gtf_start) && (start_holder[t] <= gtf_stop)){
hit_count++;
}
}
}
vector <string> find_index(vector <vector <bam_headers> > bams){
//define vector for bam_index to chromosome
vector <string> compute_holder;
for (int bam_idx = 0; bam_idx < bams.size();bam_idx++){
compute_holder.push_back(bams[bam_idx][0].chr);
}
return compute_holder;
}
vector <gtf_headers> load_gtf(char* filename){
//define matrix to memory holding gtf annotations by assoc. header
vector<gtf_headers> push_matrix;
gtf_headers holder;
ifstream gtf_file(filename);
string line;
cout << "Loading GTF to memory" << "\n";
if (gtf_file.is_open()){
int sub_count = 0;
string transfer_hold[8];
while(getline(gtf_file,line)){
//iterate through file
istringstream iss(line);
string token;
//iterate through line, and tokenize by tab delimitor
while(getline(iss,token,'\t')){
if (sub_count == 8){
//assign to hold struct, and push to vector
holder.chr = transfer_hold[0];
holder.source = transfer_hold[1];
holder.feature = transfer_hold[2];
holder.start = atoi(transfer_hold[3].c_str());
holder.end = atoi(transfer_hold[4].c_str());
holder.score = transfer_hold[5];
holder.strand = transfer_hold[6];
holder.frame = transfer_hold[7];
holder.annotation = token;
push_matrix.push_back(holder);
sub_count = 0;
} else {
//temporarily hold tokens
transfer_hold[sub_count] = token;
++sub_count;
}
}
}
cout << "GTF successfully loaded to memory" << "\n";
gtf_file.close();
return(push_matrix);
}else{
cout << "GTF unsuccessfully loaded to memory. Check path to file, and annotation format. Exiting" << "\n";
exit(-1);
}
}
vector <vector <bam_headers>> load_bam(char* filename){
//parse individual bam file to chromosome bins
vector <vector <bam_headers> > push_matrix;
vector <bam_headers> iter_chr;
int iter_refid = -1;
bam_headers bam_holder;
BamReader reader;
BamAlignment al;
const vector<RefData>& references = reader.GetReferenceData();
cout << "Loading " << filename << " to memory" << "\n";
if (reader.Open(filename)) {
while (reader.GetNextAlignmentCore(al)) {
if (al.IsMapped()){
//bam file must be sorted by chr. otherwise the lookup will segfault
if(al.RefID != iter_refid){
//check if chr. position has advanced in the bam file, if true, push empty vector
iter_refid++;
push_matrix.push_back(iter_chr);
}else{
//if chr. position hasn't advanced push to current index in 2d vector
bam_holder.chr = references[al.RefID].RefName;
bam_holder.start = al.Position;
push_matrix.at(iter_refid).push_back(bam_holder);
}
}
}
reader.Close();
cout << "Successfully loaded " << filename << " to memory" << "\n";
return(push_matrix);
}else{
cout << "Could not open input BAM file. Exiting." << endl;
exit(-1);
}
}
short int find_bin(const string & gtf_chr, const vector <string> mapping){
//determines which chr. bin the gtf line is associated with
int bin_compare = -1;
for (int i = 0; i < mapping.size(); i++){
if(gtf_chr == mapping[i]){
bin_compare = i;
}
}
return(bin_compare);
}
int find_frame(gtf_headers gtf_matrix, vector <bam_headers> bam_file_bin){
//binary search to find alignment index with greater and less than gtf position
int bin_size = bam_file_bin.size();
int high_end = bin_size;
int low_end = 0;
int binary_i = bin_size / 2;
int repeat = 0;
int frame_start;
bool found = false;
while (found != true){
if ((bam_file_bin[binary_i].start >= gtf_matrix.start) && (bam_file_bin[binary_i].start <= gtf_matrix.end)){
frame_start = binary_i;
found = true;
}else{
if(repeat != binary_i){
if(bam_file_bin[binary_i].start > gtf_matrix.end){
if(repeat != binary_i){
repeat = binary_i;
high_end = binary_i;
binary_i = ((high_end - low_end) / 2) + low_end;
}
}else{
if(repeat != binary_i){
repeat = binary_i;
low_end = binary_i;
binary_i = ((high_end - low_end) / 2) + low_end;
}
}
}else{
frame_start = low_end;
found = true;
}
}
}
return(frame_start);
}
vector <int > define_frame(int frame_size, int frame_start, int bam_matrix){
//define the frame for the search
vector <int> push_ints;
push_ints.push_back(frame_start - (frame_size / 2));
push_ints.push_back(frame_start + (frame_size / 2));
if(push_ints[0] < 0){
push_ints[0] = 0;
push_ints[1] = frame_size;
if(push_ints[1] > bam_matrix){
push_ints[1] = frame_size;
}
}
if(push_ints[1] > bam_matrix){
push_ints[1] = bam_matrix;
push_ints[0] = bam_matrix - (frame_size / 2);
if(push_ints[0] < 0){
push_ints[0] = 0;
}
}
return(push_ints);
}
void thread_handler(int nthread, vector <int> frame, vector <bam_headers> bam_matrix, gtf_headers gtf_record){
int thread_divide = frame[1]-frame[0];//frame_size / nthread;
int thread_remain = (frame[1]-frame[0]) % nthread;
int* start_holder = new int[thread_divide];
for(int i = 0; i < nthread; i++){
if (i < nthread - 1){
for (int frame_index = 0; frame_index < thread_divide; frame_index++){
start_holder[frame_index] = bam_matrix[frame[0]+frame_index].start;
}
frame[0] = frame[0] + thread_divide;
thread first(process, start_holder,thread_divide,gtf_record.start,gtf_record.end);
first.join();
}else{
for (int frame_index = 0; frame_index < thread_divide + thread_remain; frame_index++){
start_holder[frame_index] = bam_matrix[frame[0]+frame_index].start;
}
thread last(process, start_holder,thread_divide + thread_remain,gtf_record.start,gtf_record.end);
last.join();
}
}
}
int main (int argc, char *argv[])
{
// usage
// ./count threads frame_size gtf_file files
//define matrix to memory holding gtf annotations by assoc. header
vector <gtf_headers> gtf_matrix = load_gtf(argv[3]);
//load bam, perform counts
for(int i = 4;i < argc;i++){
//iterate through filenames in argv, define matrix to memory holding bam alignments chr and bp position
vector <vector <bam_headers> > bam_matrix = load_bam(argv[i]);
//map chromosome to bam matrix index
vector <string> index_mapping = find_index(bam_matrix);
//iterate through gtf matrix, find corresponding bins for chr, set search frames, and count
for(int gtf_i = 0; gtf_i < gtf_i < gtf_matrix.size();gtf_i++){ //gtf_i < gtf_matrix.size()
hit_count = 0;
//find corresponding bins for gtf chr
short int bin_compare = find_bin(gtf_matrix[gtf_i].chr,index_mapping);
if(bin_compare != -1){
//find start of search frame
int frame_start = find_frame(gtf_matrix[gtf_i], bam_matrix[bin_compare]);
//get up lower bounds of search frame;
vector <int> full_frame = define_frame(atoi(argv[2]),frame_start,bam_matrix[bin_compare].size());
//create c array of bam positional data for the frame, and post to thread process
thread_handler(atoi(argv[1]),full_frame,bam_matrix[bin_compare],gtf_matrix[gtf_i]);
}
//counts displayed in STOUT
cout << gtf_matrix[gtf_i].chr << "\t" << gtf_matrix[gtf_i].source << "\t" << gtf_matrix[gtf_i].feature << "\t" << gtf_matrix[gtf_i].start << "\t" << gtf_matrix[gtf_i].end << "\t" << gtf_matrix[gtf_i].score << "\t" << gtf_matrix[gtf_i].strand << "\t" << gtf_matrix[gtf_i].frame << "\t" << gtf_matrix[gtf_i].annotation << "\t" << hit_count << "\n";
}
}
}

The answer to your question is very simple:
thread last(process, start_holder,thread_divide + thread_remain,gtf_record.start,gtf_record.end);
last.join();
Here, the parent task creates a new thread, and ... immediately waits for the thread to finish. That's what join() does: it waits for the thread to terminate.
So, your code starts a new thread and immediately waits for it to finish before doing anything else, like starting the next thread.
You need to rewrite thread_handler() to instantiate all std::thread instances first and then, after instantiating all of them, call join() on each one to wait for all of them to finish.
The typical approach is to pre-create a std::vector of all thread instances using std::thread's default constructor, then loop over them to initialize each one (for example, by move-assigning a newly constructed std::thread), then loop over them again, calling join() on each one.

How to solve Project 12.15 in Walter Savitch Absolute C++ 5th Ed

I have been working on Project 15 in Chapter 12 of Walter Savitch's Absolute C++ (5th ed.) for a long time. The problem statement is at the bottom, if you are interested in my problem.
I figured out one way to solve this problem: read the file containing the paragraph many times, and each time locate a keyword, its line number, and its context. But that seems very tedious to me. So I tried to read the file only twice: the first time, I locate all the keywords and their line numbers; the second time, I determine the contexts. But I failed.
I'm not asking you to fix the bugs in my code (
// header file for project 15
#ifndef PROJECT15_H
#define PROJECT15_H
#include<iostream>
#include<fstream>
#include<string>
#include<sstream>
#include<cstring>
#include<iomanip>
#include<vector>
using std::ios;
using std::cout;
using std::endl;
using std::setw;
using std::string;
using std::istream;
using std::ostream;
using std::ifstream;
using std::ofstream;
using std::stringstream;
const int MAX_CHARACTERS = 72;
const int N_CHARACTERS_BEFORE = 2;
const int N_CHARACTERS_AFTER = 1;
namespace
{
void sortKWIX(string** list, int size);
void advanceChain(string chain[], string newStr);
void printKWIX(string** list, int size, ostream& outStream);
} // namespace
namespace project15
{
void getKWIX(string keyWords[], int size, string file, ostream& outStream);
} //namespace project15
#endif // PROJECT15_H
/******************************************************************************/
// function definitions for project 15
#include"Project15.h"
namespace
{
void sortKWIX(string** list, int size)
{
string tmp;
for (int outer = 0; outer < size - 1; outer++)
for (int inner = outer + 1; inner < size; inner++)
if (strcmp(list[outer][0].c_str(), list[inner][0].c_str()) > 0)
{
for (int index = 0; index < 3; index++)
{
tmp = list[outer][index];
list[outer][index] = list[inner][index];
list[inner][index] = tmp;
}
}
}
void advanceChain(string chain[], string newStr)
{
for (int index = 0; index < N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE; index++)
chain[index] = chain[index + 1];
chain[N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE] = newStr;
}
void printKWIX(string** list, int size, ostream& outStream)
{
outStream.setf(ios::left);
outStream.width(12);
outStream << "KWIX Listing:\n"
<< "Keyword " << "Line Number " << "Keyword in Context\n";
for (int index = 0; index < size; index++)
{
for (int ncol = 0; ncol < 3; ncol++)
outStream << list[index][ncol] << " ";
outStream << endl;
}
}
} // namespace
namespace project15
{
void getKWIX(string keyWords[], int size, string file, ostream& outStream)
{
using namespace std;
ifstream fin(file.c_str());
if (fin.fail())
{
cout << "Input file openning failed.\n";
exit(1);
}
string inp;
char next;
int nChars = 0;
int nline = 0;
int position = 0;
vector<string> keys;
vector<int> positions;
vector<int> nlines;
while (fin >> inp)
{
position++;
if (!fin.eof())
fin.get(next);
else
next = '\0';
// determine line number
if ((nChars + inp.length() > MAX_CHARACTERS) || (next == '\n'))
{
nline++;
nChars = inp.length();
}
else
nChars += inp.length();
if (next != '\n')
nChars++;
if (!fin.eof() && (fin.peek() == '\n'))
nline++;
// determine whether inp is a keyword
// first, determine if the last character in inp is not an alphabet
if (!(isalpha(inp[inp.length() - 1])))
inp = inp.substr(0, inp.length() - 1);
for (int index = 0; index < size;index++)
if (inp == keyWords[index])
{
keys.push_back(inp);
positions.push_back(position);
nlines.push_back(nline);
break;
}
}
fin.close();
// open the file for the 2nd time to get contexts
fin.open(file.c_str());
if (fin.fail())
{
cout << "Input file openning failed.\n";
exit(1);
}
position = 0;
int index = 0;
vector<string> contexts;
string tmp[N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE + 1];
for (int iter = 0; iter < N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE + 1; iter++)
tmp[iter] = "";
while (fin >> inp)
{
//position++;
advanceChain(tmp, inp);
if (!fin.eof())
fin.get(next);
else
next = '\0';
// determine where to start to save contexts
if (positions[index] - position <= N_CHARACTERS_BEFORE)
{
// determine where to start to save strings
string str = "";
if (positions[index] - N_CHARACTERS_BEFORE < 0)
{
for (int iter = N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE - position; iter < N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE + 1; iter++)
str += tmp[iter];
position++;
}
else
{
for (int iter = N_CHARACTERS_AFTER - position + positions[index]; iter < N_CHARACTERS_AFTER + N_CHARACTERS_BEFORE + 1; iter++)
str += tmp[iter];
position++;
}
for (int iter = 0; iter < N_CHARACTERS_AFTER; iter++)
{
if (!fin.eof())
{
fin >> inp;
advanceChain(tmp, inp);
str += inp;
position++;
}
}
contexts.push_back(str);
index++;
}
}
fin.close();
// form a new list
string** list = new string*[keys.size()];
for (int i = 0; i < keys.size(); i++)
list[i] = new string[3];
stringstream ss;
for (int nrow = 0; nrow < keys.size(); nrow++)
{
list[nrow][0] = keys[nrow];
ss.str("");
ss << nlines[nrow];
list[nrow][1] = ss.str();
list[nrow][2] = contexts[nrow];
}
sortKWIX(list, keys.size());
printKWIX(list, keys.size(), outStream);
}
} //namespace project15
) right now; instead, I would like to hear what method you would use to solve this problem. Could you give me some hints?
Problem:
In this program you are to process text to create a KWIX table (Key Word In conteXt table). The idea is to produce a list of keywords (not programming language keywords, but rather words that have important technical meaning in a discussion), then for each instance of each keyword, place the keyword, the line number of the context, and the keyword in its context in the table. There may be more than one context for a given keyword. The sequence of entries within a keyword is to be the order of occurrence in the text. For this problem, “context” is a user-selected number of words before the keyword, the keyword itself, and a user-selected number of words after the keyword. The table has an alphabetized column of keywords followed by the line number(s) where the keyword occurs, followed by a column of all contexts within which the keyword occurs. See the following example.
Hints: To get your list of keywords, you should choose and type in several paragraphs from the text, then omit from your paragraph “boring” words such as forms of the verb “to be”; pronouns such as I, me, he, she, her, you, us, them, who, which, etc. Finally, sort the keyword list and remove duplicates. The better job you do at this, the more useful output you will get.
Example: A paragraph and its KWIX Listing:
There are at least two complications when reading and writing with random access via an fstream: (1) You normally work in bytes using the type char or arrays of char and need to handle type conversions on your own, and (2) you typically need to position a pointer (indicating where the read or write begins) before each read or write.
KWIX Listing:
Keyword      Line Number  Keyword in Context
access       2            with random access via
arrays       3            char or arrays of
bytes        2            work in bytes using
char         3            the type char or
char         3            arrays of char and
conversions  3            handle type conversions on
The table is longer than these sample entries.
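One possible approach, sketched under a few assumptions: the keyword set already exists, words are matched after lower-casing and stripping punctuation, and a context may run across a line break (as “work in bytes using” does in the sample). `KwixEntry`, `matchKey` and `buildKwix` are hypothetical names. The idea is to flatten the text into (word, line number) pairs, emit one entry per keyword occurrence, and then use a stable sort so that entries for the same keyword keep their order of occurrence.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One row of the KWIX table.
struct KwixEntry {
    std::string keyword;
    int line;             // 1-based line number where the keyword occurs
    std::string context;  // `before` words, the keyword, `after` words
};

// Hypothetical helper: the form of a token used for matching
// (letters only, lower case), so "fstream:" matches "fstream".
std::string matchKey(const std::string& token) {
    std::string out;
    for (unsigned char c : token)
        if (std::isalpha(c)) out += static_cast<char>(std::tolower(c));
    return out;
}

std::vector<KwixEntry> buildKwix(const std::vector<std::string>& lines,
                                 const std::set<std::string>& keywords,
                                 std::size_t before, std::size_t after) {
    // Flatten the text into (token, line number) pairs so that a context
    // can cross a line break.
    std::vector<std::pair<std::string, int>> words;
    for (std::size_t n = 0; n < lines.size(); ++n) {
        std::istringstream in(lines[n]);
        std::string token;
        while (in >> token) words.emplace_back(token, static_cast<int>(n + 1));
    }
    std::vector<KwixEntry> table;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::string key = matchKey(words[i].first);
        if (keywords.count(key) == 0) continue;
        // Clamp the context window to the start and end of the text.
        std::size_t lo = i >= before ? i - before : 0;
        std::size_t hi = std::min(words.size() - 1, i + after);
        std::string context;
        for (std::size_t k = lo; k <= hi; ++k) {
            if (!context.empty()) context += ' ';
            context += words[k].first;
        }
        table.push_back({key, words[i].second, context});
    }
    // stable_sort alphabetizes by keyword but keeps the entries for one
    // keyword in their order of occurrence, as the problem requires.
    std::stable_sort(table.begin(), table.end(),
                     [](const KwixEntry& a, const KwixEntry& b) { return a.keyword < b.keyword; });
    return table;
}
```

Printing each entry as `keyword line context` gives the three columns of the listing. With two words before and one word after, the sample paragraph split into three lines should reproduce the example rows.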

Recursive generation of all “words” of arbitrary length

I have a working function that generates all possible “words” of a specific length, i.e.
AAAAA
BAAAA
CAAAA
...
ZZZZX
ZZZZY
ZZZZZ
I want to generalize this function to work for arbitrary lengths.
In the compilable C++ code below
iterative_generation() is the working function and
recursive_generation() is the WIP replacement.
Keep in mind that the output of the two functions not only differs slightly, but is also mirrored (which doesn’t really make a difference for my implementation).
#include <iostream>
using namespace std;
const int alfLen = 26; // alphabet length
const int strLen = 5; // string length
char word[strLen + 1]; // the word that we generate using either of the
// functions (+1 leaves room for the terminating '\0' that cout << word needs)
void iterative_generation() { // all loops in this function are
    for (int f=0; f<alfLen; f++) { // essentially the same
        word[0] = f+'A';
        for (int g=0; g<alfLen; g++) {
            word[1] = g+'A';
            for (int h=0; h<alfLen; h++) {
                word[2] = h+'A';
                for (int i=0; i<alfLen; i++) {
                    word[3] = i+'A';
                    for (int j=0; j<alfLen; j++) {
                        word[4] = j+'A';
                        cout << word << endl;
                    }
                }
            }
        }
    }
}
void recursive_generation(int a) {
    for (int i=0; i<alfLen; i++) { // the i variable should be accessible
        if (0 < a) { // in every recursion of the function
            recursive_generation(a-1); // will run for a == 0
        }
        word[a] = i+'A';
        cout << word << endl;
    }
}
int main() {
    for (int i=0; i<strLen; i++) {
        word[i] = 'A';
    }
    // uncomment the function you want to run
    //recursive_generation(strLen-1); // this produces duplicate words
    //iterative_generation(); // this yields the desired result
}
I think the problem might be that I use the same i variable in all the recursions. In the iterative function every for loop has its own variable.
What the exact consequences of this are, I can’t say, but the recursive function sometimes produces duplicate words (e.g. ZAAAA shows up twice in a row, and **AAA gets generated twice).
Can you help me change the recursive function so that its result is the same as that of the iterative function?
EDIT
I realised I only had to print the results of the innermost function. Here’s what I changed it to:
#include <iostream>
using namespace std;
const int alfLen = 26;
const int strLen = 5;
char word[strLen + 1]; // +1 for the terminating '\0'
void recursive_generation(int a) {
    for (int i=0; i<alfLen; i++) {
        word[a] = i+'A';
        if (0 < a) {
            recursive_generation(a-1);
        }
        if (a == 0) {
            cout << word << endl;
        }
    }
}
int main() {
    for (int i=0; i<strLen; i++) {
        word[i] = 'A';
    }
    recursive_generation(strLen-1);
}

It turns out you don't need recursion after all to generalize your algorithm to words of arbitrary length.
All you need to do is "count" through the possible words. Given an arbitrary word, how would you go to the next word?
Remember how counting works for natural numbers. If you want to go from 123999 to its successor 124000, you replace the trailing nines with zeros and then increment the next digit:
123999
|
123990
|
123900
|
123000
|
124000
Note how we treated a number as a string of digits from 0 to 9. We can use exactly the same idea for strings over other alphabets, for example the alphabet of characters from A to Z:
ABCZZZ
|
ABCZZA
|
ABCZAA
|
ABCAAA
|
ABDAAA
All we did was replace the trailing Zs with As and then increment the next character. Nothing magic.
I suggest you now go implement this idea yourself in C++. For comparison, here is my solution:
#include <iostream>
#include <string>
void generate_words(char first, char last, int n)
{
    std::string word(n, first);
    while (true)
    {
        std::cout << word << '\n';
        std::string::reverse_iterator it = word.rbegin();
        while (*it == last)
        {
            *it = first;
            ++it;
            if (it == word.rend()) return;
        }
        ++*it;
    }
}
int main()
{
    generate_words('A', 'Z', 5);
}
If you want to count from left to right instead (as your example seems to suggest), simply replace reverse_iterator with iterator, rbegin with begin and rend with end.
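The counting analogy can be taken one step further: the k-th word in this order is just k written in base 26, with A..Z as the digits. The sketch below is not part of the answer above; `word_at` is a hypothetical helper, and it assumes a non-negative index smaller than the number of words, (last - first + 1)^n.

```cpp
#include <string>

// Hypothetical helper: returns the word at 0-based position `index` in the
// order produced by generate_words (rightmost character changes fastest).
// It writes `index` in base (last - first + 1), least significant digit
// at the right. Assumes 0 <= index < (last - first + 1)^n.
std::string word_at(long long index, char first, char last, int n) {
    const long long base = last - first + 1;
    std::string word(n, first);
    for (int pos = n - 1; pos >= 0 && index > 0; --pos) {
        word[pos] = static_cast<char>(first + index % base);  // current digit
        index /= base;
    }
    return word;
}
```

Looping index from 0 to 26^n - 1 and printing word_at(index, 'A', 'Z', n) should give the same sequence as generate_words. Incrementing the word in place, as the answer does, is cheaper than converting every index from scratch. The conversion is mainly useful when you need random access to the k-th word.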

Your recursive solution has two errors:
If you need to print in alphabetical order, a needs to go from 0 up, not the other way around.
You only need to print at the last level; otherwise you get duplicates.
// Call as recursive_generation(0); uses the globals alfLen, strLen and word
// from the question.
void recursive_generation(int a) {
    for (int i=0; i<alfLen; i++)
    {
        word[a] = i+'A'; // fix the character at position a
        if (a<strLen-1)
            recursive_generation(a+1); // fill the remaining positions
        else
            cout << word << '\n'; // last position: the word is complete
    }
}
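The same fix can also be written without global state. In the sketch below, `generate_recursive` and `all_words` are hypothetical names. The word and the current position are passed explicitly, and the results are collected in a vector instead of being printed, which makes it easy to check that every word appears exactly once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fills positions pos..end of `word` with every combination of the
// characters first..last and appends each completed word to `out`.
// Only the deepest call (pos == word.size()) emits a word, so no word
// is produced twice.
void generate_recursive(std::string& word, std::size_t pos, char first,
                        char last, std::vector<std::string>& out) {
    if (pos == word.size()) {
        out.push_back(word);
        return;
    }
    for (int c = first; c <= last; ++c) {  // int avoids char overflow at the end
        word[pos] = static_cast<char>(c);
        generate_recursive(word, pos + 1, first, last, out);
    }
}

// Hypothetical convenience wrapper: all words of length n over first..last,
// in alphabetical order (the leftmost character changes slowest).
std::vector<std::string> all_words(char first, char last, std::size_t n) {
    std::vector<std::string> out;
    std::string word(n, first);
    generate_recursive(word, 0, first, last, out);
    return out;
}
```

For example, all_words('A', 'C', 2) yields AA, AB, AC, BA, BB, BC, CA, CB, CC.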

Inspired by @fredoverflow's answer, I wrote the following code, which does the same thing somewhat faster.
#include <iostream>
#include <cstdlib>
#include <cstring>
#include <ctime>
void printAllPossibleWordsOfLength(char firstChar, char lastChar, int length) {
    // +1 so both buffers can be null-terminated before they are printed
    char *word = new char[length + 1];
    memset(word, firstChar, length);
    word[length] = '\0';
    char *lastWord = new char[length + 1];
    memset(lastWord, lastChar, length);
    lastWord[length] = '\0';
    long long count = 0; // 26^n overflows int for n >= 7
    std::cout << word << " -> " << lastWord << std::endl;
    while(true) {
        std::cout << word << std::endl;
        count += 1;
        if(memcmp(word, lastWord, length) == 0) {
            break;
        }
        if(word[length - 1] != lastChar) {
            word[length - 1] += 1; // common case: bump the last character
        } else {
            // find the rightmost character that is not lastChar,
            // increment it and reset everything to its right
            for(int i=1; i<length; i++) {
                int index = length - i - 1;
                if(word[index] != lastChar) {
                    word[index] += 1;
                    memset(word+index+1, firstChar, length - index - 1);
                    break;
                }
            }
        }
    }
    std::cout << "count: " << count << std::endl;
    delete[] word;
    delete[] lastWord;
}
int main(int argc, char* argv[]) {
    int length;
    if(argc > 1) {
        length = std::atoi(argv[1]);
        if(length <= 0) {
            std::cout << "Please enter a valid length (i.e., greater than zero)" << std::endl;
            return 1;
        }
    } else {
        std::cout << "Usage: go <length>" << std::endl;
        return 1;
    }
    clock_t t = clock();
    printAllPossibleWordsOfLength('A', 'Z', length);
    t = clock() - t;
    std::cout << "Duration: " << t << " clock ticks (" << ((float)t)/CLOCKS_PER_SEC << " seconds)" << std::endl;
    return 0;
}

File Storage and Retrieval

I am a high school student who programs as a hobby. I make free stuff, and I am working on a game using OpenGL. I need to save and load data, and when I ran into difficulty I wrote the following program to test my methods.
The save file shiptest.txt is correct, but when I open the second file, shipout.txt, which is written from the data loaded from shiptest.txt, only the first line is there. At first I thought that my array wasn't loading any new data and that the CLEAR function wasn't getting rid of the first elements. I ruled this out by overwriting those lines after saving the data and observing that the saved lines were loaded after all. My new assumption is that getline is only getting the first line each time it's called, but I do not know how to fix this.
#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>
unsigned short int shipPart;
float editShip[256][3];//part ID, x relative, y relative, r,g,b
float activeShip[256][3];
void CLEAR(bool edit)
{
    for (int n = 0; n < 256; n++)
    {
        if (edit)
            editShip[n][0] = -1;
        else
            activeShip[n][0] = -1;
    }
}
void saveEdit(std::string name)
{
    std::ofstream out;
    out.open ("ship" + name + ".txt", std::ofstream::out);
    for (int n = 0; n < 256; n++)
    {
        for (int i = 0; i < 3; i++)
        {
            if (editShip[n][0] == -1)
                break;
            out << editShip[n][i] << " ";
        }
        out << "\n";
    }
    out.close();
}
void load(std::string name, bool edit)
{
    CLEAR(edit);
    std::ifstream in;
    in.open ("ship" + name + ".txt", std::ifstream::in);
    std::string line, buf;
    std::stringstream ss;
    int i;
    for (int n = 0; n < 3; n++)
    {
        getline(in, line);
        ss << line;
        i=0;
        while (ss >> buf)
        {
            if (edit)
                editShip[n][i] = atof(buf.c_str());
            else
                activeShip[n][i] = atof(buf.c_str());
            i++;
        }
    }
    in.close();
}
int main()
{
    for (int n = 0; n < 256; n++)
    {
        editShip[n][0] = -1;
        activeShip[n][0] = -1;
    }
    editShip[0][0] = 5;
    editShip[0][1] = .11;
    editShip[0][2] = .22;
    editShip[1][0] = 4;
    editShip[1][1] = .33;
    editShip[1][2] = .44;
    editShip[2][0] = 3;
    editShip[2][1] = .55;
    editShip[2][2] = .66;
    saveEdit("test");
    editShip[0][0] = 5000;
    editShip[0][1] = 8978;
    editShip[0][2] = 8888;
    load("test",1);
    saveEdit("out");
    std::cout << "Hello world!" << std::endl;
    return 0;
}

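In load(), you reuse the same stringstream ss for every line. The inner while (ss >> buf) loop only stops when an extraction fails, and that failure sets ss's failbit and eofbit. Those flags stay set on the following iterations, so both ss << line and ss >> buf fail, even though the file has more data. If you call ss.clear() at the top of the for() loop, the flags are reset and each new line is read as you expect. Note that clear() only resets the state flags. It does not empty the buffer, so the old (already consumed) text stays in it. To discard that as well, call ss.str(""), or simply construct a fresh std::istringstream from each line.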
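The "fresh stream per line" variant might look like the sketch below. It assumes the three-floats-per-line format that saveEdit writes. `Part` and `readParts` are hypothetical names, and the function returns the parsed rows instead of writing into the global arrays.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One saved ship part: part ID, x relative, y relative.
using Part = std::array<float, 3>;

// Hypothetical replacement for the body of load(): reads every line from
// `in` and parses up to three numbers from it. A new istringstream is
// created for each line, so no eofbit/failbit or old text carries over.
std::vector<Part> readParts(std::istream& in) {
    std::vector<Part> parts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        Part p{-1.0f, 0.0f, 0.0f};  // -1 marks an unused slot, as in the question
        std::size_t i = 0;
        float value;
        while (i < p.size() && ss >> value) p[i++] = value;
        if (i > 0) parts.push_back(p);  // skip the empty lines saveEdit writes
    }
    return parts;
}
```

In load() this could be called with the opened std::ifstream, and the returned rows copied into editShip or activeShip.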

In your load() function:
for (int n = 0; n < 3; n++)
{
    ss.clear(); //< Reset ss's eofbit/failbit here before you reuse it!
    getline(in, line);
    ss << line;
    i=0;
    while (ss >> buf)
    {
        if (edit)
            editShip[n][i] = atof(buf.c_str());
        else
            activeShip[n][i] = atof(buf.c_str());
        i++;
    }
}
getline() was working just fine. Just clear the stringstream's state before you use it and you're good to go. I ran this code on my computer and it works as desired.
EDIT: Ack! I just saw that phonetagger said the same thing while I was writing my answer. He deserves the +1s, not me.
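To see why the clear() call matters, the sketch below reuses one std::stringstream the same way load() does and counts how many tokens it manages to read. The function name `countTokensReusingStream` is hypothetical, and the input lines imitate what saveEdit writes.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Counts whitespace-separated tokens in `lines`, reusing ONE stringstream
// like the question's load(). Without clear(), the eofbit/failbit set by the
// first line's final failed extraction make every later << and >> fail.
int countTokensReusingStream(const std::vector<std::string>& lines, bool clearEachLine) {
    std::stringstream ss;
    std::string buf;
    int count = 0;
    for (const std::string& line : lines) {
        if (clearEachLine) ss.clear();  // resets state flags; does not empty the buffer
        ss << line;
        while (ss >> buf) ++count;
    }
    return count;
}
```

With the three saved lines, the version without clear() reads only the three tokens of the first line, while the version with clear() reads all nine.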
