C++ split string - c++

I am trying to split a string using spaces as a delimiter. I would like to store each token in an array or vector.
I have tried:
string tempInput;
cin >> tempInput;
string input[5];
stringstream ss(tempInput); // Insert the string into a stream
int i=0;
while (ss >> tempInput){
input[i] = tempInput;
i++;
}
The problem is that if I input "this is a test", the array only seems to store input[0] = "this". It does not contain values for input[1] through input[3].
I have also tried using a vector but with the same result.

See the duplicate questions for other ways to split a string into words, but your splitting method is actually correct. The actual problem lies in how you are reading the input before trying to split it:
string tempInput;
cin >> tempInput; // !!!
When you use cin >> tempInput, you only get the first word from the input, not the whole text. There are two ways out of that. The simplest is to forget about the stringstream and iterate directly over the input:
std::string tempInput;
std::vector< std::string > tokens;
while ( std::cin >> tempInput ) {
tokens.push_back( tempInput );
}
// alternatively, including algorithm and iterator headers:
std::vector< std::string > tokens;
std::copy( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>(),
std::back_inserter(tokens) );
This approach will give you all the tokens in the input in a single vector. If you need to work with each line separately, use std::getline from the <string> header instead of cin >> tempInput:
std::string tempInput;
while ( getline( std::cin, tempInput ) ) { // read line
// tokenize the line, possibly with your own code or
// any answer in the 'duplicate' question
}
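Filling in the placeholder above, one possible sketch that splits each line with its own std::istringstream and keeps the lines apart. The function name tokenize_lines is hypothetical:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one inner vector of whitespace-separated words per input line.
std::vector<std::vector<std::string>> tokenize_lines(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line)) {   // a whole line, spaces included
        std::istringstream ss(line);   // a fresh stream for each line
        std::vector<std::string> words;
        std::string word;              // separate from the line buffer
        while (ss >> word)             // operator>> skips any run of whitespace
            words.push_back(word);
        lines.push_back(words);
    }
    return lines;
}
```

Reading "this is a test" this way gives one inner vector holding the four words.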

Notice that it’s much easier just to use copy:
vector<string> tokens;
copy(istream_iterator<string>(cin),
istream_iterator<string>(),
back_inserter(tokens));
As for why your code doesn’t work: you’re reusing tempInput both for the raw input and for each token. That happens to work, but it is confusing, so don’t do that. The actual bug is that you’re first reading a single word from cin, not the whole string. That’s why only a single word is put into the stringstream.

The easiest way: Boost.Tokenizer
std::vector<std::string> tokens;
std::string s = "This is, a test";
boost::tokenizer<> tok(s);
for(boost::tokenizer<>::iterator it=tok.begin(); it != tok.end(); ++it)
{
tokens.push_back(*it);
}
// tokens is ["This", "is", "a", "test"]
You can parameterize the delimiters and escape sequences to split only on spaces if you wish; by default it tokenizes on both spaces and punctuation.

Here is a little algorithm that splits a string into a list, much like Python's str.split(sep) does.
#include <list>
#include <string>
std::list<std::string> split(std::string text, std::string split_word) {
std::list<std::string> list;
std::string word = "";
if (split_word.empty()) { // nothing to split on
list.push_back(text);
return list;
}
//number of characters of a just-matched separator that still have to be skipped
std::size_t skip = 0;
for (std::size_t i = 0; i < text.length(); i++) { // stop before text[length()]
//now we want that it jumps the rest of the split characters
if (skip > 0) {
skip--;
continue;
}
//compare the text starting at i with the separator (compare clamps at the end of text)
if (text.compare(i, split_word.length(), split_word) == 0) {
list.push_back(word);
word = "";
skip = split_word.length() - 1;
}
else {
word += text[i];
}
}
list.push_back(word);
return list;
}
There probably exists a more optimal way to write this.
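One possibly simpler variant, sketched here as an alternative rather than the author's code, lets std::string::find locate each separator instead of comparing at every position. It keeps empty fields between adjacent separators, as Python's str.split(sep) does; split_find is a hypothetical name:

```cpp
#include <string>
#include <vector>

// Split text on every occurrence of sep; empty fields are kept.
// An empty sep returns the whole text as a single field.
std::vector<std::string> split_find(const std::string& text, const std::string& sep)
{
    std::vector<std::string> parts;
    if (sep.empty()) {
        parts.push_back(text);
        return parts;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(sep, start)) != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));  // field before the separator
        start = pos + sep.size();                           // resume after the separator
    }
    parts.push_back(text.substr(start));                    // last field, possibly empty
    return parts;
}
```

For example, split_find("a,,b", ",") gives {"a", "", "b"}.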

Related

How to read input from file and pair into a map in C++

I am trying to read through a text file that can possibly look like below.
HI bye
goodbye
foo bar
boy girl
one two three
I am trying to take the lines with only two words and store them in a map; the first word would be the key and the second word would be the value.
Below is the code I came up with, but I can't figure out how to ignore the lines that do not have two words on them.
This only works properly if every line has two words. I understand why, but I'm not sure what condition I can add to prevent this.
pair<string, string> myPair;
map<string, string> myMap;
while(getline(file2, line, '\0'))
{
stringstream ss(line);
string word;
while(!ss.eof())
{
ss >> word;
myPair.first = word;
ss >> word;
myPair.second = word;
myMap.insert(myPair);
}
}
map<string, string>::iterator it=myMap.begin();
for(it=myMap.begin(); it != myMap.end(); it++)
{
cout<<it->first<<" "<<it->second<<endl;
}
Read two words into a temporary pair. If you can't, do not add the pair to the map. If you can read two words, see if you can read a third word. If you can, you have too many words on the line. Do not add.
Example:
while(getline(file2, line)) // one line at a time; a '\0' delimiter would read the whole file at once
{
stringstream ss(line);
pair<string,string> myPair;
string junk;
if (ss >> myPair.first >> myPair.second && !(ss >> junk))
{ // successfully read into pair, but not into a third junk variable
myMap.insert(myPair);
}
}
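Put together as a self-contained sketch (the function name read_pairs and reading from a generic std::istream are assumptions for illustration), the same two-words-but-not-three check applied line by line:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: keep only lines with exactly two words, mapping first -> second.
std::map<std::string, std::string> read_pairs(std::istream& in)
{
    std::map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {          // default '\n' delimiter
        std::istringstream ss(line);
        std::string key, value, extra;
        // Two words must be readable, and a third must NOT be.
        if (ss >> key >> value && !(ss >> extra))
            result.emplace(key, value);       // keeps the first value for a repeated key
    }
    return result;
}
```

On the sample file from the question this keeps HI, foo and boy as keys and skips "goodbye" and "one two three".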
Let me suggest a slightly different implementation:
std::string line;
while (std::getline(infile, line)) {
// Vector of string to save tokens
vector <string> tokens;
// stringstream class check1
stringstream check1(line);
string intermediate;
// Tokenizing w.r.t. space ' '
while(getline(check1, intermediate, ' ')) {
tokens.push_back(intermediate);
}
if (tokens.size() == 2) {
// your condition of 2 words in a line apply
// process 1. and 2. item of vector here
}
}
You can use fscanf to read input from a file and sscanf to read input from a string with a format. sscanf returns how many fields it successfully read with the given format, so you can easily check how many words a line has.
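#include <cstdio>
#include <iostream>
using namespace std;
int main()
{
char line[100];
FILE *fp = fopen("inp.txt", "r");
if (fp == NULL) return 1;
// " %99[^\n]" skips leading whitespace (so blank lines too) and reads at most 99 chars of a line
while(fscanf(fp, " %99[^\n]", line) == 1)
{
cout<<line<<endl;
char s1[100], s2[100], s3[100];
// a third %s lets sscanf report 3 for lines with too many words; exactly 2 means a pair
int take = sscanf(line, "%99s %99s %99s", s1, s2, s3);
cout<<take<<endl;
}
fclose(fp);
return 0;
}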

Reading in only letters from a text file

I am trying to read in from a text file a poem that contains commas, spaces, periods, and newline characters. I am trying to use getline to read in each separate word. I do not want to read in any of the commas, spaces, periods, or newline characters. As I read in each word I am capitalizing each letter, then calling my insert function to insert each word into a binary search tree as a separate node. I do not know the best way to separate each word. I have been able to separate each word by spaces, but the commas, periods, and newline characters keep being read in.
Here is my text file:
Roses are red,
Violets are blue,
Data Structures is the best,
You and I both know it is true.
The code I am using is this:
string inputFile;
cout << "What is the name of the text file?";
cin >> inputFile;
ifstream fin;
fin.open(inputFile);
//Input once
string input;
getline(fin, input, ' ');
for (int i = 0; i < input.length(); i++)
{
input[i] = toupper(input[i]);
}
//check for duplicates
if (tree.Find(input, tree.Current, tree.Parent) == true)
{
tree.Insert(input);
countNodes++;
countHeight = tree.Height(tree.Root);
}
Basically I am using the getline(fin,input, ' ') to read in my input.
I was able to figure out a solution. I read an entire line of text into the variable line, then checked each character and kept only the letters, storing them into word. Then I was able to call my insert function to insert the node into my tree.
const int MAXWORDSIZE = 50;
const int MAXLINESIZE = 1000;
char word[MAXWORDSIZE], line[MAXLINESIZE];
int lineIdx, wordIdx, lineLength; // strlen needs <cstring>; isalpha/toupper need <cctype>
//get a line
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
while (fin)
{
for (lineIdx = 0; lineIdx < lineLength;)
{
//skip over non-alphas, and check for end of line null terminator
while (!isalpha((unsigned char)line[lineIdx]) && line[lineIdx] != '\0')
++lineIdx;
//make sure not at the end of the line
if (line[lineIdx] != '\0')
{
//copy alphas to word c-string, leaving room for the terminator
wordIdx = 0;
while (isalpha((unsigned char)line[lineIdx]))
{
if (wordIdx < MAXWORDSIZE - 1)
word[wordIdx++] = toupper((unsigned char)line[lineIdx]);
lineIdx++;
}
//make it a c-string with the null terminator
word[wordIdx] = '\0';
//THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
if (tree.Find(word) == false)
{
tree.Insert(word);
totalNodes++;
//output word
//cout << word << endl;
}
else
{
tree.Counter();
}
}
}
//read the next line; without this the outer loop never ends
fin.getline(line, MAXLINESIZE - 1);
lineLength = strlen(line);
}
This is a good time for a technique I've posted a few times before: define a ctype facet that treats everything but letters as white space (searching for imbue will show several examples).
From there, it's a matter of std::transform with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.
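A minimal sketch of that technique, assuming ASCII letters (the 'A'-'Z' and 'a'-'z' ranges are treated as contiguous). The facet name letters_only and the helper upper_words are hypothetical, and the lambda upper-cases the whole word, since the question capitalizes every letter:

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <locale>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// ctype<char> facet whose table classifies every character except ASCII letters
// as whitespace, so operator>> stops at commas, periods, digits, newlines, ...
struct letters_only : std::ctype<char> {
    letters_only() : std::ctype<char>(get_table()) {}

    static const mask* get_table()
    {
        // Built once; the table must outlive every facet that points at it.
        static const std::vector<mask> table = [] {
            std::vector<mask> t(table_size, mask(space));
            for (int c = 'A'; c <= 'Z'; ++c) t[c] = static_cast<mask>(alpha | upper);
            for (int c = 'a'; c <= 'z'; ++c) t[c] = static_cast<mask>(alpha | lower);
            return t;
        }();
        return table.data();
    }
};

// Read every letter-only word from in, upper-cased, into a sorted set (duplicates dropped).
std::set<std::string> upper_words(std::istream& in)
{
    in.imbue(std::locale(in.getloc(), new letters_only));  // the locale deletes the facet
    std::set<std::string> words;
    std::transform(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::inserter(words, words.end()),
                   [](std::string w) {
                       for (char& ch : w)
                           ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
                       return w;
                   });
    return words;
}
```

Run on a std::istringstream holding the poem, this yields 17 distinct words such as ROSES and TRUE. A std::set drops duplicates; counting how often each word occurs would need a std::map<std::string, int> instead.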
You can make a custom getline function for multiple delimiters:
std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
str.clear();
// the 3rd parameter type and the delimiter check below
// should be all that differs from std::getline
for(char c; is.get(c); )
{
if (delims.find(c) != std::string::npos)
return is; // delimiter found and consumed
str.push_back(c);
}
// end of input: like std::getline, report success if anything was read
if (!str.empty())
is.clear(std::ios_base::eofbit);
return is;
}
And use it:
getline(fin, input, " \n,.");
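A sketch of using it for the poem, with two adjustments that are assumptions rather than part of the answer: consecutive delimiters such as ",\n" produce empty strings, which the loop skips, and a final word with no trailing delimiter is still returned. The names getline_any and collect_words are hypothetical (the rename just avoids confusion with std::getline):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read characters until one of delims is found (the delimiter is consumed).
std::istream& getline_any(std::istream& is, std::string& str, const std::string& delims)
{
    str.clear();
    for (char c; is.get(c); ) {
        if (delims.find(c) != std::string::npos)
            return is;
        str.push_back(c);
    }
    // End of input: succeed if something was read, as std::getline does.
    if (!str.empty())
        is.clear(std::ios_base::eofbit);
    return is;
}

// Collect the non-empty pieces between delimiters.
std::vector<std::string> collect_words(std::istream& is, const std::string& delims)
{
    std::vector<std::string> words;
    for (std::string w; getline_any(is, w, delims); )
        if (!w.empty())                // skip the gap between "," and "\n"
            words.push_back(w);
    return words;
}
```

Each collected word can then be upper-cased and inserted into the tree as in the question.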
You can use std::regex to select your tokens.
Depending on the size of your file, you can read it either line by line or entirely into an std::string.
To read the whole file you can use:
std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
and this will match the tokens, i.e. runs of characters that are neither commas nor whitespace (needs <regex>):
std::regex word_regex("[^,\\s]+");
auto what =
std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (; what != wend; ++what) { // advance the match iterator, not the end iterator
std::smatch match = *what;
v.push_back(match.str());
}
To keep only the words of the poem, dropping commas, periods, spaces, and newlines alike, it may be simpler to match runs of letters with the pattern [[:alpha:]]+ instead of describing the separators; that pattern also leaves out periods such as the one after "true".
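For the poem in the question, a letter-only pattern, [[:alpha:]]+, is one way to pick out just the words. A minimal sketch (letter_words is a hypothetical name):

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every maximal run of letters; commas, periods, spaces and newlines
// are never part of a match, so they are dropped automatically.
std::vector<std::string> letter_words(const std::string& text)
{
    static const std::regex word_regex("[[:alpha:]]+");
    std::vector<std::string> words;
    for (std::sregex_iterator it(text.begin(), text.end(), word_regex), end;
         it != end; ++it)
        words.push_back(it->str());
    return words;
}
```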

Using erase() in a while loop and segfault C++

Okay, so I'm having a bit of a problem here. The thing is this code works on a friend's computer but I'm getting segmentation faults when I try to run it.
I am reading a file looking like so:
word 2 wor ord
anotherword 7 ano oth the her erw wor ord
...
And I want to parse every word of the file. The first two words (e.g. word and 2) are to be erased but saving the first one in another variable in the process.
I've looked around a bit on accomplishing this, and I've come up with this half-assed piece of code that seems to work on my friend's computer but not mine.
Dictionary::Dictionary() {
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
string buf;
vector<string> tokens; // Holds words
while(getline(ip, input)){
if(input != " ") {
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin()); // Remove the word from the vector
tokens.erase(tokens.begin()); // Remove the number indicating trigrams
Word curr(werd, tokens);
words[werd.length()].push_back(curr); // Put the word at the vector with word length i.
tokens.clear();
}
}
ip.close();
}
What's the best way of parsing this kind of structure in a file, removing the first two elements but keeping the others? As you can see, I'm making a Word object that contains a string and a vector for later use.
Regards
EDIT: It seems to add the first line fine, but on removal of the second element, it crashes with a segmentation fault.
EDIT: words.txt contains this:
addict 4 add ddi dic ict
sinister 6 ini ist nis sin ste ter
test 2 est tes
cplusplus 7 cpl lus lus plu plu spl usp
There are no leading or trailing blank spaces. Not that it reads that far anyway.
Word.cc:
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include "word.h"
using namespace std;
Word::Word(const string& w, const vector<string>& t) : word(w), trigrams(t) {}
string Word::get_word() const {
return word;
}
unsigned int Word::get_matches(const vector<string>& t) const {
vector<string> sharedTrigrams;
set_intersection(t.begin(),t.end(), trigrams.begin(), trigrams.end(), back_inserter(sharedTrigrams));
return sharedTrigrams.size();
}
First of all, there is an error in the number of closing }s in your posted code. If you indent it properly, you will see that your code is:
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
Assuming that is a small typo in posting, the other problem is that tokens is an empty list when input == " " yet you continue to use tokens as though it has 2 or more items in it.
You can fix that by moving everything inside the if statement.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
I would add further checks to make it more robust.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd;
if ( !tokens.empty() )
{
werd = tokens.at(0);
tokens.erase(tokens.begin());
}
if ( !tokens.empty() )
{
tokens.erase(tokens.begin());
}
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
You forgot to include the initialization of the variable "words" in your code. Just looking at it, I am guessing you are initializing "words" to be a fixed-length array of vectors, but then read a word that is off the end of the array. Bang, you're dead. Add a check to "werd.length()" to ensure it is strictly less than the length of "words".
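A minimal sketch of that check, assuming words is indexed by word length. Here it is modelled as a std::vector of std::vector<std::string> with a fixed number of slots; the real code stores Word objects, and the helper name add_by_length is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for "words": slot i holds the words of length i.
// Returns false instead of writing past the end when a word is too long.
bool add_by_length(std::vector<std::vector<std::string>>& by_length, const std::string& w)
{
    if (w.length() >= by_length.size())   // index must be strictly less than the size
        return false;
    by_length[w.length()].push_back(w);
    return true;
}
```

Using by_length.at(w.length()) instead would turn the same mistake into a std::out_of_range exception rather than a silent overrun.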
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
while(getline(ip, input)){
istringstream iss(input);
string str;
unsigned int count = 0;
if(iss >> str >> count) {
vector<string> tokens { istream_iterator<string>(iss), istream_iterator<string>() }; // Holds words
if(tokens.size() == count)
words[str.length()].emplace_back(str, tokens);
}
}
ip.close();
This is what I used to make it work.
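The same idea as a self-contained helper that can be tested without the Dictionary class; the name parse_entry and its out-parameter interface are assumptions for illustration:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for one "word count tri1 tri2 ..." line. Fills word and
// trigrams, and returns true only if the stated count matches the trigrams present.
bool parse_entry(const std::string& line, std::string& word, std::vector<std::string>& trigrams)
{
    std::istringstream iss(line);
    unsigned int count = 0;
    if (!(iss >> word >> count))
        return false;                     // blank line or missing count
    trigrams.assign(std::istream_iterator<std::string>(iss),
                    std::istream_iterator<std::string>());
    return trigrams.size() == count;
}
```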

Fast, Simple CSV Parsing in C++

I am trying to parse a simple CSV file, with data in a format such as:
20.5,20.5,20.5,0.794145,4.05286,0.792519,1
20.5,30.5,20.5,0.753669,3.91888,0.749897,1
20.5,40.5,20.5,0.701055,3.80348,0.695326,1
So, a very simple, fixed-format file. I am storing each column of this data in an STL vector. I've tried to stick to the C++ way using the standard library, and my implementation within a loop looks something like:
string field;
getline(file,line);
stringstream ssline(line);
getline( ssline, field, ',' );
stringstream fs1(field);
fs1 >> cent_x.at(n);
getline( ssline, field, ',' );
stringstream fs2(field);
fs2 >> cent_y.at(n);
getline( ssline, field, ',' );
stringstream fs3(field);
fs3 >> cent_z.at(n);
getline( ssline, field, ',' );
stringstream fs4(field);
fs4 >> u.at(n);
getline( ssline, field, ',' );
stringstream fs5(field);
fs5 >> v.at(n);
getline( ssline, field, ',' );
stringstream fs6(field);
fs6 >> w.at(n);
The problem is, this is extremely slow (there are over 1 million rows per data file), and seems to me to be a bit inelegant. Is there a faster approach using the standard library, or should I just use stdio functions? It seems to me this entire code block would reduce to a single fscanf call.
Thanks in advance!
Using 7 string streams when you can do it with just one sure doesn't help wrt. performance.
Try this instead:
string line;
getline(file, line);
istringstream ss(line); // note we use istringstream, we don't need the o part of stringstream
char c1, c2, c3, c4, c5; // to eat the commas
ss >> cent_x.at(n) >> c1 >>
cent_y.at(n) >> c2 >>
cent_z.at(n) >> c3 >>
u.at(n) >> c4 >>
v.at(n) >> c5 >>
w.at(n);
If you know the number of lines in the file, you can resize the vectors prior to reading and then use operator[] instead of at(). This way you avoid bounds checking and thus gain a little performance.
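A self-contained sketch of this one-stream-per-line approach, using reserve() plus push_back() as a variant of the resize-then-operator[] idea (only an estimate of the row count is needed). The Columns struct and read_columns name are hypothetical:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the six columns the question stores
// (the seventh field of each row is left unread, as in the answer's code).
struct Columns {
    std::vector<double> cent_x, cent_y, cent_z, u, v, w;
};

// One istringstream per line; char variables swallow the commas.
// Rows that do not start with six comma-separated numbers are skipped.
Columns read_columns(std::istream& file, std::size_t expected_rows)
{
    Columns cols;
    // reserve() avoids repeated reallocation when the row count is known up front
    cols.cent_x.reserve(expected_rows); cols.cent_y.reserve(expected_rows);
    cols.cent_z.reserve(expected_rows); cols.u.reserve(expected_rows);
    cols.v.reserve(expected_rows);      cols.w.reserve(expected_rows);

    std::string line;
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        double x, y, z, uu, vv, ww;
        char c1, c2, c3, c4, c5;
        if (ss >> x >> c1 >> y >> c2 >> z >> c3 >> uu >> c4 >> vv >> c5 >> ww) {
            cols.cent_x.push_back(x); cols.cent_y.push_back(y); cols.cent_z.push_back(z);
            cols.u.push_back(uu);     cols.v.push_back(vv);     cols.w.push_back(ww);
        }
    }
    return cols;
}
```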
I believe the major bottleneck (setting aside the getline()-based I/O) is the string parsing. Since you have the "," symbol as a delimiter, you can perform a linear scan over the string, replace each "," with "\0" (the end-of-string marker, zero-terminator), and convert each field in place with atof.
Something like this:
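// tmp array for the line part values (7 fields per row in this file); atof needs <cstdlib>
const int MAX_PARTS = 7;
double parts[MAX_PARTS];
std::string line;
while(getline(file, line))
{
if(line.empty()) { continue; }
size_t len = line.length();
const char* last_start = &line[0];
int num_parts = 0;
for(size_t j = 0; j < len && num_parts < MAX_PARTS; j++)
{
if(line[j] == ',')
{
line[j] = '\0'; // terminate the current field in place
parts[num_parts++] = atof(last_start);
last_start = &line[j + 1]; // the next field starts right after the comma
}
}
// the last field has no trailing comma; it ends at the string's own terminator
if(num_parts < MAX_PARTS) { parts[num_parts++] = atof(last_start); }
/// do whatever you need with the parts[] array
}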
I don't know if this will be quicker than the accepted answer, but I might as well post it anyway in case you wish to try it.
You can load the entire contents of the file with a single read call, once you know the size of the file from some fseek/ftell magic. This is likely to be much faster than many small read calls.
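A minimal sketch of that single-read approach with the C stdio calls (read_whole_file is a hypothetical name). It opens the file in binary mode so that ftell reports a byte count, which holds on common platforms such as POSIX systems and Windows:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Read the whole file into one string with a single fread call, using
// fseek/ftell to learn the size first. Returns an empty string on failure.
std::string read_whole_file(const char* path)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return std::string();
    std::string data;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        long size = std::ftell(f);               // -1 on error
        if (size > 0 && std::fseek(f, 0, SEEK_SET) == 0) {
            data.resize(static_cast<std::size_t>(size));
            std::size_t got = std::fread(&data[0], 1, data.size(), f);
            data.resize(got);                    // in case fewer bytes were read
        }
    }
    std::fclose(f);
    return data;
}
```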
You could then do something like this to parse your string:
//Delimited string to vector (delimiter must not be empty)
vector<string> dstov(const string& str, const string& delimiter)
{
//Vector to populate
vector<string> ret;
//Current position in str
size_t pos = 0;
//Position of the next delimiter
size_t found;
//Search from pos in place; str.substr(pos) would copy the rest of the string on every pass
while((found = str.find(delimiter, pos)) != string::npos)
{
//Insert the substring from pos to the start of the found delimiter to the vector
ret.push_back(str.substr(pos, found - pos));
//Move pos past the found delimiter so the search can continue
pos = found + delimiter.size();
}
//Push back the final element in str when str contains no more delimiters
ret.push_back(str.substr(pos));
return ret;
}
string rawfiledata; //fill this with the file contents from the single read
//This call will parse the raw data into a vector containing lines of
//20.5,30.5,20.5,0.753669,3.91888,0.749897,1 by treating the newline
//as the delimiter
vector<string> lines = dstov(rawfiledata, "\n");
//You can then iterate over the lines and parse them into variables and do whatever you need with them.
for(size_t itr = 0; itr < lines.size(); ++itr)
{
if(lines[itr].empty()) continue; //a trailing newline leaves one empty last line
vector<string> line_variables = dstov(lines[itr], ",");
//convert with e.g. stod(line_variables[0]) and store the values here
}
std::ifstream file{ InputFilename };
std::vector<std::string> line_elements;
for (std::string line; std::getline(file, line);)
{
line_elements.clear();
std::istringstream ss(line);
for (std::string value; std::getline(ss, value, ',');)
{
line_elements.push_back(std::move(value));
}
// Do something with the line_elements.
}

C++, How to get multiple input divided by whitespace?

I have a program that needs to get multiple cstrings. Currently I get one at a time and then ask if you want to input another word. I cannot find a simple way to get a single input with words divided by whitespace, i.e. "one two three", and save the input in an array of cstrings.
typedef char cstring[20]; cstring myWords[50];
At the moment I am trying to use getline and save the input to a cstring and then I am trying to use the string.h library to manipulate it. Is that the right approach? How else could this be done?
If you really have to use c-style strings, you could use istream::getline, strtok and strcpy functions:
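typedef char cstring[20]; // are you sure that 20 chars will be enough?
cstring myWords[50];
char line[2048]; // what's the max length of line?
std::cin.getline(line, 2048);
int i = 0;
char* nextWord = strtok(line, " \t\r\n"); // strtok and strncpy come from <cstring>
while (nextWord != NULL && i < 50) // don't run past the end of myWords
{
// copy at most 19 chars and always terminate; longer words are truncated
strncpy(myWords[i], nextWord, sizeof(cstring) - 1);
myWords[i][sizeof(cstring) - 1] = '\0';
i++;
nextWord = strtok(NULL, " \t\r\n");
}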
But much better would be to use std::string, std::getline, std::istringstream and >> operator instead:
using namespace std;
vector<string> myWords;
string line;
if (getline(cin, line))
{
istringstream is(line);
string word;
while (is >> word)
myWords.push_back(word);
}
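std::vector<std::string> strings;
// MAX_STRINGS is assumed to be defined elsewhere. Testing the extraction itself
// (rather than cin.eof()) stops cleanly at end of input or on a read error.
// Note that this keeps reading across lines until MAX_STRINGS words or end of input.
for (std::string str; strings.size() < MAX_STRINGS && std::cin >> str; )
strings.push_back(str);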