As per request of the fantastic fellas over at the C++ chat lounge, what is a good way to break down a file (which in my case contains a string with roughly 100 lines, and about 10 words in each line) and insert all these words into a std::set?
The easiest way to construct any container from a source that holds a series of elements is to use the constructor that takes a pair of iterators. Use istream_iterator to iterate over a stream.
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
//I create an iterator that retrieves `string` objects from `cin`
auto begin = istream_iterator<string>(cin);
//I create an iterator that represents the end of a stream
auto end = istream_iterator<string>();
//and iterate over the file, and copy those elements into my `set`
set<string> myset(begin, end);
//this line copies the elements in the set to `cout`
//I have this to verify that I did it all right
copy(myset.begin(), myset.end(), ostream_iterator<string>(cout, "\n"));
return 0;
}
http://ideone.com/iz1q0
Assuming you've read your file into a string, boost::split will do the trick:
#include <iostream>
#include <set>
#include <string>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
std::string astring = "abc 123 abc 123\ndef 456 def 456"; // your string
std::set<std::string> tokens; // this will receive the words
boost::split(tokens, astring, boost::is_any_of("\n ")); // split on space & newline
// Print the individual words
BOOST_FOREACH(std::string token, tokens){
std::cout << "\n" << token << std::endl;
}
Lists or Vectors can be used instead of a Set if necessary.
Also note this is almost a dupe of:
Split a string in C++?
#include <set>
#include <iostream>
#include <string>
int main()
{
std::string temp, mystring;
std::set<std::string> myset;
while(std::getline(std::cin, temp))
mystring += temp + ' ';
temp = "";
for (size_t i = 0; i < mystring.length(); i++)
{
if (mystring.at(i) == ' ' || mystring.at(i) == '\n' || mystring.at(i) == '\t')
{
if (!temp.empty()) // skip empty tokens produced by consecutive delimiters
myset.insert(temp);
temp = "";
}
else
{
temp.push_back(mystring.at(i));
}
}
if (!temp.empty())
myset.insert(temp);
for (std::set<std::string>::iterator i = myset.begin(); i != myset.end(); i++)
{
std::cout << *i << std::endl;
}
return 0;
}
Let's start at the top. First off, you need a few variables to work with. temp is just a placeholder for the string while you build it from each character in the string you want to parse. mystring is the string you are looking to split up and myset is where you will be sticking the split strings.
So then we read the file (input through < piping) and insert the contents into mystring.
Now we want to iterate down the length of the string, searching for spaces, newlines, or tabs to split the string up with. If we find one of those characters, then we need to insert the string into the set, and empty our placeholder string, otherwise, we add the character to the placeholder, which will build up the string. Once we finish, we need to add the last string to the set.
Finally, we iterate down the set, and print each string, which is simply for verification, but could be useful otherwise.
Edit: A significant improvement on my code provided by Loki Astari in a comment which I thought should be integrated into the answer:
#include <set>
#include <iostream>
#include <string>
#include <utility>
int main()
{
std::set<std::string> myset;
std::string word;
while(std::cin >> word)
{
myset.insert(std::move(word));
}
for(std::set<std::string>::const_iterator it=myset.begin(); it!=myset.end(); ++it)
std::cout << *it << '\n';
}
I would like to store a dictionary in a vector of lists. Each list contains all words that have the same starting letter in the alphabet (e.g. ananas, apple).
My problem is that I cannot read any words starting with "z" in my const char* array into the list.
Could someone explain to me why, and how to fix this? Is there a way to do it with const char*? Thank you!
#include <iostream>
#include <list>
#include <vector>
#include <iterator>
#include <algorithm>
#include <string>
#include <fstream>
std::pair<bool, std::vector<std::list<std::string>> > loadwithList()
{
const char* prefix = "abcdefghijklmnopqrstuvwxyz";
std::vector<std::list<std::string>> dictionary2;
std::ifstream infile("/Users/User/Desktop/Speller/Dictionaries/large", std::ios::in);
if (infile.is_open())
{
std::list<std::string> data;
std::string line;
while (std::getline(infile, line))
{
if (line.starts_with(*prefix) && *prefix != '\0')
{
data.push_front(line);
}
else
{
dictionary2.push_back(data);
data.clear();
prefix++;
}
}
infile.close();
return std::make_pair(true, dictionary2);
}
std::cout << "Cant find file\n";
return std::make_pair(false, dictionary2);
}
int main()
{
auto [loaded, dictionary2] = loadwithList();
if (!loaded) return 1;
}
An answer has already been given and the problems are explained.
Basically you would need a doubly nested loop. The outer loop would read word by word; the inner loop would check for a match against each of the characters in "prefix". This would be a lot of looping...
And it is not very efficient. It would be better to use a std::map for storing the data in the first place. If you really need a std::vector of std::lists, then we can move the data afterwards. We will take care to store only lowercase alpha characters as the key of the std::map.
For test purposes I loaded a list of words from here. There are roughly 450,000 words in this list.
I used this for my demo program.
Please see one possible solution below:
#include <iostream>
#include <fstream>
#include <map>
#include <list>
#include <vector>
#include <utility>
#include <string>
#include <cctype>
std::pair<bool, std::vector<std::list<std::string>> > loadwithList() {
std::vector<std::list<std::string>> data{};
bool resultOK{};
// Open File and check, if it could be opened
if (std::ifstream ifs{ "r:\\words.txt" }; ifs) {
// Here we will store the dictionary
std::map<char, std::list<std::string>> dictionary{};
// Fill dictionary. Read the complete file and sort by first character
for (std::string line{}; std::getline(ifs, line); )
// Store only words that start with an alphabetic character
if (not line.empty() and std::isalpha(static_cast<unsigned char>(line.front())))
// Use the lowercase start character as map key. Add all words starting with that character
dictionary[static_cast<char>(std::tolower(static_cast<unsigned char>(line.front())))].push_back(line);
// Reserve space for resulting vector
data.reserve(dictionary.size());
// Move result to vector
for (auto& [letter, words] : dictionary)
data.push_back(std::move(words));
// All good
resultOK = true;
}
else
std::cerr << "\n\n*** Error: Could not open source file\n\n";
// And give back the result
return { resultOK , data };
}
int main() {
auto [result, data] = loadwithList();
if ( result)
for (const std::list<std::string>&wordList : data)
std::cout << (char)std::tolower(wordList.front().front()) << " has " << wordList.size() << "\twords\n";
}
You lose the first word of each letter after 'a'. This is because when you reach a word of the next letter, the if(line.starts_with(*prefix) && *prefix != '\0') fails, and only then do you go to the next letter, but you also go to the next word.
You lose the whole letter 'z' because after the last line in your file (the if(line.starts_with(*prefix) && *prefix != '\0') has succeeded at this point) the while (std::getline(infile, line)) terminates and you miss the dictionary2.push_back(data);.
I am missing the last word of the string. This is the code I used to store the words in an array.
string arr[10];
int Add_Count = 0;
string sentence = "I am unable to store last word";
string Words = "";
for (int i = 0; i < sentence.length(); i++)
{
if (sentence[i] == ' ')
{
arr[Add_Count] = Words;
Words = "";
Add_Count++;
}
else if (isalpha(sentence[i]))
{
Words = Words + sentence[i];
}
}
Let's print the arr:
for(int i =0; i<10; i++)
{
cout << arr[i] << endl;
}
You are inserting the word found when you see a blank character.
Since the end of the string is not a blank character, the insertion for the last word never happens.
What you can do is:
(1) If the current character is blank, skip to the next character.
(2) Look at the character after the current one.
(2-1) If the next character is blank, insert the accumulated word.
(2-2) If the next character doesn't exist (end of the sentence), insert the accumulated word.
(2-3) If the next character is not blank, accumulate word.
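The steps above can be sketched roughly as follows. This is one possible reading of them, not the answerer's own code: the helper name split_on_blanks is hypothetical, a std::vector stands in for the fixed array of 10, and the isalpha filter from the question is left out to keep the look-ahead logic visible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper following the steps above; a std::vector replaces the
// fixed-size array so the word count is not limited to 10.
std::vector<std::string> split_on_blanks(const std::string& sentence)
{
    std::vector<std::string> words;
    std::string word;
    for (std::size_t i = 0; i < sentence.size(); ++i)
    {
        if (sentence[i] == ' ')
            continue;                    // (1) skip blanks
        word += sentence[i];             // accumulate the current character
        const bool at_end = (i + 1 == sentence.size());
        if (at_end || sentence[i + 1] == ' ')
        {                                // (2-1) and (2-2): the word is complete
            words.push_back(word);
            word.clear();
        }
    }
    return words;
}
```

Because the end of the sentence is treated like a blank, the last word is stored even when no space follows it.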
You lose the last word because when the loop reaches the end of the string, the last word has not been stored yet. You can add these lines after the loop to store it:
if (Words.length() != 0) {
arr[Add_Count] = Words;
Words = "";
}
Following on from the very good approach by @Casey, but adding the use of std::vector instead of an array, allows you to break a line into as many words as it contains. Using std::stringstream and extracting with >> gives a simple way to tokenize the sentence while ignoring leading, repeated, and trailing whitespace.
For example, you could do:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main (void) {
std::string sentence = " I am unable to store last word ",
word {};
std::stringstream ss (sentence); /* create stringstream from sentence */
std::vector<std::string> words {}; /* vector of strings to hold words */
while (ss >> word) /* read word */
words.push_back(word); /* add word to vector */
/* output original sentence */
std::cout << "sentence: \"" << sentence << "\"\n\n";
for (const auto& w : words) /* output all words in vector */
std::cout << w << '\n';
}
Example Use/Output
$ ./bin/tokenize_sentence_ss
sentence: " I am unable to store last word "
I
am
unable
to
store
last
word
If you need more fine-grained control, you can use std::string::find_first_not_of and std::string::find_first_of with a set of delimiters to work your way through a string: find the first character of a token with std::string::find_first_not_of, find the end of that token with std::string::find_first_of, then skip over delimiters to the start of the next token with std::string::find_first_not_of again. That involves a bit more arithmetic, but is a more flexible alternative.
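A minimal sketch of that find_first_not_of / find_first_of approach; the function name tokenize and the default delimiter set are assumptions chosen for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` on any character contained in `delims`.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& delims = " \t\n")
{
    std::vector<std::string> tokens;
    // First character that is not a delimiter: start of the first token.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // First delimiter after the token start: one past the token's end.
        const std::string::size_type end = text.find_first_of(delims, start);
        // substr takes a length; npos as the length means "to the end".
        const std::string::size_type len =
            (end == std::string::npos) ? std::string::npos : end - start;
        tokens.push_back(text.substr(start, len));
        // Skip the run of delimiters to reach the next token, if any.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Passing a delimiter set such as " ," would also treat commas as separators, which the stream extraction operator cannot do without extra work.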
This happens because the last word has no space after it. Just add this line after the for loop:
arr[Add_Count] = Words;
My version :
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main() {
std::istringstream iss("I am unable to store last word");
std::vector<std::string> v(std::istream_iterator<std::string>(iss), {});
std::copy(v.begin(), v.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Sample Run :
I
am
unable
to
store
last
word
If you know you won't have to worry about punctuation, the easiest way to handle it is to throw the string into an istringstream. You can use the extraction operator overload to extract the "words". The extraction operator defaults to splitting on whitespace and automatically terminates at the end of the stream:
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>
std::string sentence = // ... Get the string from cin, a file, or hard-code it here.
std::istringstream ss(sentence);
std::vector<std::string> arr;
arr.reserve(1 + std::count(std::cbegin(sentence), std::cend(sentence), ' '));
std::string word;
while(ss >> word) {
arr.push_back(word);
}
I am creating a function that splits a sentence into words, and believe the way to do this is to use str.substr, starting at str[0] and then using str.find to find the index of the first " " character. Then update the starting position parameter of str.find to start at the index of that " " character, until the end of str.length().
I am using two variables to mark the beginning position and end position of the word, and update the beginning position variable with the ending position of the last. But it is not updating as desired in the loop as I currently have it, and cannot figure out why.
#include <iostream>
#include <string>
using namespace std;
void splitInWords(string str);
int main() {
string testString("This is a test string");
splitInWords(testString);
return 0;
}
void splitInWords(string str) {
int i;
int beginWord, endWord, tempWord;
string wordDelim = " ";
string testWord;
beginWord = 0;
for (i = 0; i < str.length(); i += 1) {
endWord = str.find(wordDelim, beginWord);
testWord = str.substr(beginWord, endWord);
beginWord = endWord;
cout << testWord << " ";
}
}
It is easier to use a string stream.
#include <vector>
#include <string>
#include <sstream>
using namespace std;
vector<string> split(const string& s, char delimiter)
{
vector<string> tokens;
string token;
istringstream tokenStream(s);
while (getline(tokenStream, token, delimiter))
{
tokens.push_back(token);
}
return tokens;
}
int main() {
string testString("This is a test string");
vector<string> result=split(testString,' ');
return 0;
}
You can write it using the existing C++ libraries:
#include <string>
#include <vector>
#include <iterator>
#include <sstream>
int main()
{
std::string testString("This is a test string");
std::istringstream wordStream(testString);
std::vector<std::string> result(std::istream_iterator<std::string>{wordStream},
std::istream_iterator<std::string>{});
}
Couple of issues:
The substr() method's second parameter is a length (not a position).
// Here you are using `endWord`, which is a position in the string.
// This only works when beginWord is 0
// for all other values you are providing an incorrect len.
testWord = str.substr(beginWord, endWord);
The find() method starts searching at the position given by its second parameter.
// If str[beginWord] contains one of the delimiter characters
// Then it will return beginWord
// i.e. you are not moving forward.
endWord = str.find(wordDelim, beginWord);
// So you end up stuck on the first space.
Assuming you got the above fixed. You would be adding space at the front of each word.
// You need to actively search and remove the spaces
// before reading the words.
Nice things you could do:
Here:
void splitInWords(string str) {
You are passing the parameter by value. This means you are making a copy. A better technique would be to pass by const reference (you are not modifying the original or the copy).
void splitInWords(string const& str) {
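Putting the points above together, a corrected find/substr loop might look like the sketch below. It is not the asker's final code: returning a std::vector instead of printing is an assumption made here so the result can be inspected.

```cpp
#include <string>
#include <vector>

// Hypothetical corrected version; returns the words instead of printing them.
std::vector<std::string> splitInWords(const std::string& str)
{
    const std::string wordDelim = " ";
    std::vector<std::string> words;
    std::string::size_type beginWord = 0;
    while (beginWord < str.length())
    {
        std::string::size_type endWord = str.find(wordDelim, beginWord);
        if (endWord == std::string::npos)
            endWord = str.length();               // last word runs to the end
        if (endWord > beginWord)                  // ignore empty pieces between spaces
            words.push_back(str.substr(beginWord, endWord - beginWord)); // a length, not a position
        beginWord = endWord + wordDelim.length(); // step past the delimiter
    }
    return words;
}
```

The two changes that matter are passing endWord - beginWord to substr() and moving beginWord past the delimiter so find() cannot return the same position again.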
An Alternative
You can use the stream functionality.
void split(std::istream& stream)
{
std::string word;
stream >> word; // This drops leading space.
// Then reads characters into `word`
// until a "white space" character is
// found.
// Note: it empties `word` before adding any characters
}
I want to find a specific string in a list of sentences. Each sentence is a line terminated by a \n. When the newline is reached, the current search should stop and start again on the next line.
My program is:
#include <iostream>
#include <string>
using namespace std;
int main(){
string filename;
string list = "hello.txt\n abc.txt\n check.txt\n";
cin >> filename;
// Suppose I run the program twice: the first time I enter abc.txt
// and the second time I enter abc
if(list.find(filename) != std::string::npos){
// I want this condition to be true only when the user enters a complete
// file name. It is also true for 'abc', 'ab', or even 'a'
cout << filename << " exists in list";
}
else cout << "file does not exist in list";
return 0;
}
Is there any way around this? I want to match only complete file names in the list.
list.find only looks for a substring within list. If you want to compare against each whole entry up to the \n, you can tokenize the list and put the tokens in a vector.
For that, you can put the string list in std::istringstream and make a std::vector<std::string> out of it by using std::getline like:
std::istringstream ss(list);
std::vector<std::string> tokens;
std::string temp;
while (std::getline(ss, temp)){
tokens.emplace_back(temp);
}
If there are leading or trailing spaces in the tokens, you can trim the tokens before adding them to the vector. For trimming, see What's the best way to trim std::string?, find a trimming solution from there that suits you.
And after that, you can use find from <algorithm> to check for complete string in that vector.
if (std::find(tokens.begin(), tokens.end(), filename) != tokens.end())
std::cout << "found" << std::endl;
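The pieces above could be combined into one function along these lines. A sketch only: the helper names trim and list_contains are hypothetical, and the trimming shown (spaces and tabs) is just one simple choice among those discussed in the linked question.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strip leading and trailing spaces and tabs.
std::string trim(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: true only if `filename` equals a whole line of `list`.
bool list_contains(const std::string& list, const std::string& filename)
{
    std::istringstream ss(list);
    std::vector<std::string> tokens;
    std::string temp;
    while (std::getline(ss, temp))
        tokens.emplace_back(trim(temp));
    return std::find(tokens.begin(), tokens.end(), filename) != tokens.end();
}
```

With the list from the question, abc.txt matches a whole entry while abc, ab, and a do not.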
First of all I wouldn't keep the list of files in a single string, but I would use any sort of list or vector.
Then if keeping the list in a string is a necessity of yours (for some kind of reason in your application logic) I would separate the string in a vector, then cycle through the elements of the vector checking if the element is exactly the one searched.
To split the elements I would do:
std::vector<std::string> split_string(const std::string& str,
const std::string& delimiter)
{
std::vector<std::string> strings;
std::string::size_type pos = 0;
std::string::size_type prev = 0;
while ((pos = str.find(delimiter, prev)) != std::string::npos)
{
strings.push_back(str.substr(prev, pos - prev));
prev = pos + delimiter.size(); // skip the whole delimiter, not just one character
}
// To get the last substring (or only, if delimiter is not found)
strings.push_back(str.substr(prev));
return strings;
}
You can see an example of the function working here
Then just use the function and change your code to:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main(){
string filename;
string list = "hello.txt\n abc.txt\n check.txt\n";
cin >> filename;
vector<string> fileList = split_string(list, "\n");
bool found = false;
// Note: entries keep any leading space from the list (" abc.txt"),
// so trim them first if the input will not contain it
for(size_t i = 0; i < fileList.size(); i++){
if(fileList.at(i) == filename){
found = true;
}
}
if(found){
cout << filename << " exists in list";
} else {
cout << "file does not exist in list";
}
return 0;
}
Obviously you need to declare and implement the function split_string somewhere in your code. Possibly before main declaration.
How can I find the position of a character in a string? Ex. If I input "abc*ab" I would like to create a new string with just "abc". Can you help me with my problem?
C++ standard string provides a find method:
s.find(c)
returns the position of the first instance of character c in string s, or std::string::npos if the character is not present at all. You can also pass the starting index for the search, i.e.
s.find(c, x0)
will return the first index of character c but starting the search from position x0.
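As an illustration of the starting-index form, the sketch below collects every position of a character by restarting the search just past the previous hit. The helper name find_all is an assumption, not part of std::string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: positions of every occurrence of `c` in `s`.
std::vector<std::string::size_type> find_all(const std::string& s, char c)
{
    std::vector<std::string::size_type> positions;
    for (std::string::size_type pos = s.find(c);
         pos != std::string::npos;
         pos = s.find(c, pos + 1))   // resume the search after the last hit
    {
        positions.push_back(pos);
    }
    return positions;
}
```

For the string "abc*ab", the first position reported for '*' is the index at which the wanted prefix "abc" ends.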
std::find returns an iterator to the first element it finds that compares equal to what you're looking for (or the second argument if it doesn't find anything, in this case the end iterator.) You can construct a std::string using iterators.
#include <iostream>
#include <string>
#include <algorithm>
int main()
{
std::string s = "abc*ab";
std::string s2(s.begin(), std::find(s.begin(), s.end(), '*'));
std::cout << s2;
return 0;
}
If you are working with std::string type, then it is very easy to find the position of a character, by using std::find algorithm like so:
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
string first_string = "abc*ab";
string truncated_string = string( first_string.cbegin(), find( first_string.cbegin(), first_string.cend(), '*' ) );
cout << truncated_string << endl;
}
Note: if your character is found multiple times in your std::string, then the find algorithm will return the position of the first occurrence.
Elaborating on existing answers, you can use string.find() and string.substr():
#include <iostream>
#include <string>
int main() {
std::string s = "abc*ab";
size_t index = s.find("*");
if (index != std::string::npos) {
std::string prefix = s.substr(0, index);
std::cout << prefix << "\n"; // => abc
}
}