Delimiting a string

Delimiting a string - c++

In a text file, I have several lines, and each line looks like the one below (with different numbers):
[275, 61],[279, 56],[285, 54],[292, 55],[298, 57],[315, 57],[321, 54],[328, 54],[335, 56]
For each line, I want to get each point separately.
For example:
first I should get [275, 61]
then I should get [279, 56]
and then [285, 54]
However, when I tried the following:
istringstream linestream(line);
while (getline (linestream, item, ','))
{
...............
}
what it gives me is:
first [275
and then 61]
next [279
etc.
Can anyone tell me how to modify the while loop so that I get the required output?

The behavior you see is the expected one, since your delimiter is ','. If you want to delimit on every second ',', you need to concatenate your tokens back together two by two.
For example: "[275" + "," + " 61]".
Given your current output, you should be able to do so with a simple for loop over your tokens.

The behaviour you are experiencing using getline is the correct one, since you are delimiting with ,.
In order to get the desired behaviour, and if your compiler supports C++11, you could use the regular expressions library (<regex>) as in the example below:
#include <iostream>
#include <regex>
#include <string>
int main() {
std::string str("[275, 61],[279, 56],[285, 54],[292, 55],[298, 57],[315, 57],[321, 54],[328, 54],[335, 56]");
// "[", digits, ",", digits, "]", with optional whitespace around the numbers
std::regex e(R"(\[\s*\d+\s*,\s*\d+\s*\])");
std::smatch sm;
std::cout << "the matches were:\n";
// each search finds the next match; continue searching in the text after it
while (std::regex_search(str, sm, e)) {
std::cout << sm.str() << '\n';
str = sm.suffix().str();
}
}

Assuming that the text is coherent (no missing '[' or ']'), you can manually search for the position of each '[' and ']' and then copy the sub-string between them for further analysis.
The following C code demonstrates how to extract and print those sub-strings:
#include <stdio.h>
#include <string.h>
int main(void)
{
const char* test=" [275, 61],[279, 56],[285, 54],[292, 55],[298, 57],[315, 57],[321, 54],[328, 54],[335, 56] ";
char pair[20];
int i, pos1 = 0, pos2 = 0;
for (i = 0; test[i] != '\0'; i++)
{
if (test[i] == '[') pos1 = i; /* remember the position of '[' */
if (test[i] == ']') /* found the closing ']' */
{
pos2 = i;
int len = pos2 - pos1 + 1; /* length of "[x, y]" */
/* printf("%.*s\n", len, test + pos1); direct print from memory */
if (len < (int)sizeof(pair)) /* make sure it fits, including the '\0' */
{
memcpy(pair, test + pos1, len); /* copy result sub-string to "pair" */
pair[len] = '\0';
printf("%s\n", pair); /* print result on screen */
}
}
}
return 0;
}

Related

Splitting of strings

How do I separate a string into two parts, the first before a "," (or "." or " ", etc.) and the second after it, and then assign both of them to two different variables?
For example:
string s = "154558,ABCDEF"; // this is to be input by the user
string a = "154558"; // it should be split like this after conversion
string b = "ABCDEF";

I believe it can be as simple as using rfind + substr:
size_t pos = str.rfind('.');
std::string new_str = str.substr(0, pos);
Essentially, the code searches for the last '.' (rfind searches from the end of the string; use find to locate the first one) and then uses substr to extract the substring before it.

The two primary ways to split the string on ',' would be (1) create a std::basic_stringstream from the string and then use std::basic_istream::getline with the delimiter of ',' to separate the two strings, e.g.
#include <iostream>
#include <string>
#include <sstream>
int main (void) {
std::string s {"154558,ABCDEF"};
std::stringstream ss(s);
std::string sub {};
while (getline (ss, sub, ','))
std::cout << sub << '\n';
}
Example Use/Output
$ ./bin/str_split_ss
154558
ABCDEF
Or the second and equally easy way would be to use std::basic_string::find_first_of and find the position of ',' within the string and then use the position with std::basic_string::substr to extract the substring on either side of the comma, e.g.
#include <iostream>
#include <string>
int main (void) {
std::string s {"154558,ABCDEF"};
size_t pos = s.find_first_of (",");
if (pos != std::string::npos) {
std::cout << "first: " << s.substr(0, pos) <<
"\nsecond: " << s.substr(pos+1) << '\n';
}
}
Example Use/Output
$ ./bin/str_split_find_first_of
first: 154558
second: ABCDEF
Either way works fine. Look things over and let me know if you have further questions.

Missing last word of string when I split the sentence into word [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 2 years ago.
I am missing the last word of the string. This is the code I used to store the words into an array:
string arr[10];
int Add_Count = 0;
string sentence = "I am unable to store last word";
string Words = "";
for (int i = 0; i < sentence.length(); i++)
{
if (sentence[i] == ' ')
{
arr[Add_Count] = Words;
Words = "";
Add_Count++;
}
else if (isalpha(sentence[i]))
{
Words = Words + sentence[i];
}
}
Let's print the arr:
for(int i =0; i<10; i++)
{
cout << arr[i] << endl;
}

You are inserting the word found when you see a blank character.
Since the end of the string is not a blank character, the insertion for the last word never happens.
What you can do is:
(1) If the current character is blank, skip to the next character.
(2) See the next character of current character.
(2-1) If the next character is blank, insert the accumulated word.
(2-2) If the next character doesn't exist (end of the sentence), insert the accumulated word.
(2-3) If the next character is not blank, accumulate word.

You lose the last word because when the loop reaches the end of the string, the last word has not been stored yet. You can add these lines after the for loop to store it:
if (Words.length() != 0) {
arr[Add_Count] = Words;
Words = "";
}

Following on from the very good approach by @Casey, but using std::vector instead of an array, you can break a line into as many words as it contains. Using a std::stringstream and extracting with >> is a simple way to tokenize the sentence while ignoring leading, repeated, and trailing whitespace.
For example, you could do:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main (void) {
std::string sentence = " I am unable to store last word ",
word {};
std::stringstream ss (sentence); /* create stringstream from sentence */
std::vector<std::string> words {}; /* vector of strings to hold words */
while (ss >> word) /* read word */
words.push_back(word); /* add word to vector */
/* output original sentence */
std::cout << "sentence: \"" << sentence << "\"\n\n";
for (const auto& w : words) /* output all words in vector */
std::cout << w << '\n';
}
Example Use/Output
$ ./bin/tokenize_sentence_ss
sentence: " I am unable to store last word "
I
am
unable
to
store
last
word
If you need more fine-grained control, you can use std::string::find_first_not_of and std::string::find_first_of with a set of delimiters to work your way through a string: std::string::find_first_not_of skips over delimiters to the first character of the next token, and std::string::find_first_of then finds the delimiter that ends that token. That involves a bit more arithmetic, but is a more flexible alternative.

This happens because the last word has no space after it; just add this line after the for loop:
arr[Add_Count] = Words;

My version:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main() {
std::istringstream iss("I am unable to store last word");
std::vector<std::string> v(std::istream_iterator<std::string>(iss), {});
std::copy(v.begin(), v.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Sample run:
I
am
unable
to
store
last
word

If you know you won't have to worry about punctuation, the easiest way to handle it is to put the string into an istringstream. You can use the overloaded extraction operator (>>) to extract the "words". The extraction operator splits on whitespace by default and stops automatically at the end of the stream:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main() {
std::string sentence = "I am unable to store last word"; // or get the string from cin, a file, etc.
std::istringstream ss(sentence);
std::vector<std::string> arr;
// reserve 1 + number of spaces: enough when words are separated only by spaces
arr.reserve(1 + std::count(std::cbegin(sentence), std::cend(sentence), ' '));
std::string word;
while(ss >> word) {
arr.push_back(word);
}
for (const auto& w : arr) std::cout << w << '\n';
}

Splitting sentences and placing in vector

I was given some code by my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input, split them into sentences at the periods, and put those sentences into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.

This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.

You have:
string l; // string to hold a line
cin >> l; // get line
That second line does not read a line: cin >> l reads a single whitespace-delimited word, so it only gets the whole line when the line contains no whitespace. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.

I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
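while (std::getline(std::cin, temp, '.')) {
auto start = temp.find_first_not_of(" \t\n");
if (start != std::string::npos) // skip pieces that are only whitespace (e.g. the final newline)
s.push_back(temp.substr(start)); // store the sentence without leading whitespace
}
std::copy(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"));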
}
This does behave differently in one circumstance: if you have a period somewhere other than at the end of a word, the original code will ignore that period (it won't treat it as the end of a sentence), but this code will. The obvious place this would make a difference is input containing a number with a decimal point (e.g., 1.234): this code would split at the decimal point, treating the 1 as the end of one sentence and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

How to remove first word from a string?

Let's say I have
string sentence{"Hello how are you."};
And I want string sentence to have "how are you" without the "Hello". How would I go about doing that?
I tried doing something like:
stringstream ss(sentence);
ss>> string junkWord;//to get rid of first word
But when I did:
cout<<sentence;//still prints out "Hello how are you"
It's pretty obvious that the stringstream doesn't change the actual string. I also tried using strtok but it doesn't work well with string.

Try the following
#include <iostream>
#include <string>
int main()
{
std::string sentence{"Hello how are you."};
std::string::size_type n = 0;
n = sentence.find_first_not_of( " \t", n );
n = sentence.find_first_of( " \t", n );
sentence.erase( 0, sentence.find_first_not_of( " \t", n ) );
std::cout << '\"' << sentence << "\"\n";
return 0;
}
The output is
"how are you."

str=str.substr(str.find_first_of(" \t")+1);
Tested:
string sentence="Hello how are you.";
cout<<"Before:"<<sentence<<endl;
sentence=sentence.substr(sentence.find_first_of(" \t")+1);
cout<<"After:"<<sentence<<endl;
Execution:
> ./a.out
Before:Hello how are you.
After:how are you.
This assumes the line does not start with whitespace; if it does, this approach does not work.
find_first_of("<list of characters>") searches for the first occurrence of any of the listed characters (here, a space and a tab) and returns its index as a size_t, not an iterator. Adding 1 moves that position forward by one character, so it points to the start of the second word.
substr(pos) then returns the substring from that position to the end of the string. (If no space or tab is found, find_first_of returns std::string::npos, and npos + 1 wraps around to 0, so the whole string is returned unchanged.)

You can for example take the remaining substring
string sentence{"Hello how are you."};
stringstream ss{sentence};
string junkWord;
ss >> junkWord;
cout<<sentence.substr(junkWord.length()+1); //string::substr
However, the best approach also depends on what you want to do with the string afterwards.

There are countless ways to do this. I think I would go with this:
#include <iostream>
#include <string>
int main() {
std::string sentence{"Hello how are you."};
// First, find the index for the first space:
auto first_space = sentence.find(' ');
// The part of the string we want to keep
// starts at the index after the space:
auto second_word = first_space + 1;
// If you want to write it out directly, write the part of the string
// that starts at the second word and lasts until the end of the string:
std::cout.write(
sentence.data() + second_word, sentence.length() - second_word);
std::cout << std::endl;
// Or, if you want a string object, make a copy from the start of the
// second word. substr copies until the end of the string when you give
// it only one argument, like here:
std::string rest{sentence.substr(second_word)};
std::cout << rest << std::endl;
}
Of course, unless you have a really good reason not to, you should check that first_space != std::string::npos, which would mean the space was not found. The check is omitted in my sample code for clarity :)

You could use string::find() to locate the first space. Once you have its index, then get the sub string with string::substr() from the index after the index of the space up to the end of the string.

One liner:
std::string subStr = sentence.substr(sentence.find_first_not_of(" \t\r\n", sentence.find_first_of(" \t\r\n", sentence.find_first_not_of(" \t\r\n"))));
working example:
#include <iostream>
#include <string>
int main()
{
std::string sentence{ "Hello how are you." };
char whiteSpaces[] = " \t\r\n";
std::string subStr = sentence.substr(sentence.find_first_not_of(whiteSpaces, sentence.find_first_of(whiteSpaces, sentence.find_first_not_of(whiteSpaces))));
std::cout << subStr;
std::cin.ignore();
}

Here's how to use a stringstream to extract the junk word while ignoring any whitespace before or after it (using std::ws), then get the rest of the sentence, with robust error handling:
std::string sentence{"Hello how are you."};
std::stringstream ss{sentence};
std::string junkWord;
if (ss >> junkWord >> std::ws && std::getline(ss, sentence, '\0'))
std::cout << sentence << '\n';
else
std::cerr << "the sentence didn't contain ANY words at all\n";

#include <iostream> // cout
#include <string> // string
#include <sstream> // string stream
using namespace std;
int main()
{
string testString = "Hello how are you.";
istringstream iss(testString); // note: istringstream, NOT stringstream
char c; // receives the next non-whitespace character (>> skips the space itself)
string firstWord;
iss>>firstWord>>c; // reads "Hello"; >> c then skips the space and reads 'h' (c is not used afterwards)
cout << "The first word in \"" << testString << "\" is \"" << firstWord << "\""<<endl;
cout << "The rest of the words is \"" <<testString.substr(firstWord.length()+1) << "\""<<endl;
return 0;
}
output
The first word in "Hello how are you." is "Hello"
The rest of the words is "how are you."

converting individual string elements to their decimal equivalents in c++

I have a string str ("1 + 2 = 3"). I want to obtain the individual numbers in the string as their decimal values (not ASCII). I have tried atoi and c_str(), but both of them require the entire string to consist only of numbers. I am writing my code in C++.
Any help would be great.
My challenge is to evaluate a prefix expression. I am reading from a file where each line contains a prefix expression. My code snippet to tokenize and store the values is shown below. Each line of the file contains numbers and operators (+, -, *) separated by spaces.
Ex - line = ( * + 2 3 4);
ifstream file;
string line;
file.open(argv[1]);
while(!file.eof())
{
getline(file,line);
if(line.length()==0)
continue;
else
{
vector<int> vec;
string delimiters = " ";
size_t current;
size_t next = -1;
do
{
current = next + 1;
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
vec.push_back(atoi((line.substr( current, next - current )).c_str()));
}while (next != string::npos);
cout << vec[0] << endl;
}
}
file.close();
In this case vec[0] prints 50 not 2.

You need to learn to delimit a string. Your delimiting characters would be the mathematical operators (e.g., +, -, =). See:
C: creating array of strings from delimited source string
http://www.gnu.org/software/libc/manual/html_node/Finding-Tokens-in-a-String.html
In the case of the second link, you would do something like:
const char delimiters[] = "+-=";
With this knowledge, you can create an array of strings, and call atoi() on each string to get the numeric equivalent. Then you can use the address (array index) of each delimiter to determine which operator is there.
For just things like addition and subtraction, this will be dead simple. If you want order of operations and multiplication, parentheses, etc, your process flow logic will be more complicated.
For a more in-depth example, please see this final link, a simple command-line calculator in C. That should make it crystal clear:
http://stevehanov.ca/blog/index.php?id=26

You will not fall into your if, since your next position will be at a delimiter.
string delimiters = " ";
...
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
...
Since your delimiters consist of " ", then line[next] will be a space character.
From the description of your problem, you are missing code that will save away your operators. There is no code to attempt to find the operators.
You don't have to assume ASCII when testing for a digit. You can use std::isdigit() from <cctype>, for example, or you can compare against '0' and '9'.
When you print your vector element, you may be accessing the vector out of bounds, because no item may ever have been inserted into it.

Don't use file.eof() to control a loop. That function is only useful after a read has failed.
There are a number of ways to get ints from a std::string; I'm choosing std::stoi() from the C++11 standard in this case.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
typedef std::vector<int> ints;
bool is_known_operator(std::string const& token)
{
static char const* tokens[] = {"*", "/", "+", "-"};
return std::find(std::begin(tokens), std::end(tokens), token) != std::end(tokens);
}
ints tokenise(std::string const& line)
{
ints vec;
std::string token;
std::istringstream iss(line);
while (iss >> token)
{
if (is_known_operator(token))
{
std::cout << "Handle operator [" << token << "]" << std::endl;
}
else
{
try
{
auto number = std::stoi(token);
vec.push_back(number);
}
catch (const std::invalid_argument&)
{
std::cerr << "Unexpected item in the bagging area ["
<< token << "]" << std::endl;
}
}
}
return vec;
}
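int main(int argc, char *argv[])
{
if (argc < 2)
{
std::cerr << "usage: " << argv[0] << " <file>\n";
return 1;
}
std::ifstream file(argv[1]);
std::string line;
ints vec;
while (std::getline(file, line)) // loop on the read itself, not on eof()
{
ints line_numbers = tokenise(line);
vec.insert(vec.end(), line_numbers.begin(), line_numbers.end()); // keep numbers from every line
}
std::cout << "The following " << vec.size() << " numbers were read:\n";
std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\n"));
}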
