Parse a string to get the nth field - c++

I'm trying to parse the string located in /proc/stat on a Linux filesystem using C++.
I have read the string and saved it as a variable in my C++ program.
I want to lift individual values from the string. Each value is separated by a space.
I want to know how I would, for example, lift the 15th value from the string.

A std::string of space-separated values can be parsed with any std::istream. Simply put the entire line into a std::istringstream and extract strings until you reach the nth one.
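#include <sstream>
#include <string>
// Return the 15th space-separated field of `line`, or "" if there are fewer.
std::string fifteenth(const std::string& line)
{
    std::istringstream ss(line);
    std::string nth;
    for (int i = 0; i < 15; ++i)
        if (!(ss >> nth))
            return "";
    return nth;
}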

#include <string>
#include <sstream>
#include <iostream>
using namespace std;
// return the n'th field (counting from 0) or an empty string
string Get( const std::string & s, unsigned int n ) {
istringstream is( s );
string field;
do {
if ( ! ( is >> field ) ) {
return "";
}
} while( n-- != 0 );
return field;
}
int main() {
string s = "one two three four";
cout << Get( s, 2 ) << endl;
}

I would use the split algorithm from the Boost String Algorithms library here:
#include <string>
#include <vector>
#include <boost/algorithm/string/classification.hpp>
#include <boost/algorithm/string/split.hpp>
std::string line = "...."; // parsed line
std::vector<std::string> splits;
// token_compress_on merges runs of spaces, so two adjacent spaces do not produce an empty field
boost::algorithm::split( splits, line, boost::is_any_of( " " ), boost::token_compress_on );
std::string value;
if ( splits.size() >= 15 ) {
value = splits.at( 14 );
}

You could use boost::tokenizer with space as a separator and iterate over the values.

See this SO and that should answer most of your question.

You could use the strtok function with a counter, stopping when you reach the nth value.
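A minimal sketch of the strtok approach, assuming the line is held in a std::string. Because strtok writes '\0' characters into its input, the function copies the line into a writable buffer first. The name nth_field_strtok is hypothetical, and n counts from 1:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Return the n'th (1-based) space-separated field of `line` using strtok,
// or an empty string if the line has fewer than n fields.
std::string nth_field_strtok(const std::string& line, unsigned n)
{
    // strtok modifies its argument, so work on a writable, NUL-terminated copy.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    unsigned count = 0;
    for (char* tok = std::strtok(buf.data(), " "); tok != nullptr;
         tok = std::strtok(nullptr, " ")) {
        if (++count == n)
            return tok;   // stop as soon as the counter reaches n
    }
    return "";
}
```

strtok treats consecutive delimiters as one, so repeated spaces do not produce empty fields. It keeps hidden state between calls, so it is not safe to use from several threads at once.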

You could use std::string::find to find a space and repeat until the 15th value is found.
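A sketch of the repeated-find approach, assuming fields may be separated by more than one space (find_first_not_of is used to skip each run of spaces). The function name nth_field_find is hypothetical, and n counts from 1:

```cpp
#include <string>

// Return the n'th (1-based) field of `line`, where fields are separated by one
// or more spaces; returns an empty string if there are fewer than n fields.
std::string nth_field_find(const std::string& line, unsigned n)
{
    std::string::size_type start = line.find_first_not_of(' ');
    while (start != std::string::npos) {
        std::string::size_type end = line.find(' ', start);   // space after this field
        if (--n == 0) {
            // When there is no trailing space, take the rest of the string.
            std::string::size_type len =
                (end == std::string::npos) ? std::string::npos : end - start;
            return line.substr(start, len);
        }
        if (end == std::string::npos)
            break;                                             // no more fields
        start = line.find_first_not_of(' ', end);              // skip the run of spaces
    }
    return "";
}
```

For the question's case this would be called as nth_field_find(line, 15).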

Related

Trouble getting two variables to update in C++ for loop

I am creating a function that splits a sentence into words, and I believe the way to do this is to use str.substr, starting at str[0], and then use str.find to find the index of the first " " character. Then I update the starting-position parameter of str.find to the index of that " " character, and repeat until the end of the string.
I am using two variables to mark the beginning and end positions of the word, and I update the beginning position with the ending position of the previous word. But it is not updating as desired in the loop as I currently have it, and I cannot figure out why.
#include <iostream>
#include <string>
using namespace std;
void splitInWords(string str);
int main() {
string testString("This is a test string");
splitInWords(testString);
return 0;
}
void splitInWords(string str) {
int i;
int beginWord, endWord, tempWord;
string wordDelim = " ";
string testWord;
beginWord = 0;
for (i = 0; i < str.length(); i += 1) {
endWord = str.find(wordDelim, beginWord);
testWord = str.substr(beginWord, endWord);
beginWord = endWord;
cout << testWord << " ";
}
}
It is easier to use a string stream.
#include <vector>
#include <string>
#include <sstream>
using namespace std;
vector<string> split(const string& s, char delimiter)
{
vector<string> tokens;
string token;
istringstream tokenStream(s);
while (getline(tokenStream, token, delimiter))
{
tokens.push_back(token);
}
return tokens;
}
int main() {
string testString("This is a test string");
vector<string> result=split(testString,' ');
return 0;
}
You can write it using the existing C++ libraries:
#include <string>
#include <vector>
#include <iterator>
#include <sstream>
int main()
{
std::string testString("This is a test string");
std::istringstream wordStream(testString);
std::vector<std::string> result(std::istream_iterator<std::string>{wordStream},
std::istream_iterator<std::string>{});
}
A couple of issues:
The substr() method's second parameter is a length (not a position).
// Here you are using `endWord`, which is a position in the string.
// This only works when beginWord is 0;
// for all other values you are providing an incorrect length.
testWord = str.substr(beginWord, endWord);
The find() method starts searching at the position given by its second parameter.
// If str[beginWord] contains one of the delimiter characters
// Then it will return beginWord
// i.e. you are not moving forward.
endWord = str.find(wordDelim, beginWord);
// So you end up stuck on the first space.
Even with the above fixed, every word after the first would start with a space.
// You need to actively search and remove the spaces
// before reading the words.
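Putting those fixes together, a corrected find()/substr() loop might look like the sketch below. It is an illustration rather than the answer's own code: it returns the words in a std::vector instead of printing them, which changes the question's signature, and it treats runs of spaces as a single separator:

```cpp
#include <string>
#include <vector>

// Split `str` into words separated by one or more spaces, using find()/substr().
std::vector<std::string> splitInWords(std::string const& str)
{
    std::vector<std::string> words;
    std::string::size_type beginWord = str.find_first_not_of(' ');  // skip leading spaces
    while (beginWord != std::string::npos) {
        std::string::size_type endWord = str.find(' ', beginWord);  // space after the word
        // substr() takes a length, so subtract the start position.
        std::string::size_type len =
            (endWord == std::string::npos) ? std::string::npos : endWord - beginWord;
        words.push_back(str.substr(beginWord, len));
        // Move past the space(s) so the next find() does not stop on the same one;
        // find_first_not_of returns npos when nothing is left.
        beginWord = str.find_first_not_of(' ', endWord);
    }
    return words;
}
```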
Nice things you could do:
Here:
void splitInWords(string str) {
You are passing the parameter by value. This means you are making a copy. A better technique would be to pass by const reference (you are not modifying the original or the copy).
void splitInWords(string const& str) {
An Alternative
You can use the stream functionality.
#include <iostream>
#include <string>
void split(std::istream& stream)
{
    std::string word;
    // operator>> drops leading white space, then reads characters
    // into `word` until a white-space character is found.
    // Note: it empties `word` before adding any characters.
    while (stream >> word)
        std::cout << word << '\n';
}

What is an efficient method for extracting data from a string into a Map?

This is in C++. Let's say I have a string that looks like this "[05]some words here [13]some more words here [17]and so on"
I want to split this string into a std::map<int, std::string> with the number as the key and the text up to the next code as the value. The brackets are to be completely ignored.
So far I've been getting by with the standard library and SDL (I'm making a small game), but I'm willing to install boost or any other library that would help.
My first thought was either to use some of Boost's regex functions to do a kind of regex find and replace, or to simply convert it to a char array and go through every character looking for the brackets and recording the number inside, but that seems like it would be inefficient, especially since I'm sure there's probably a popular method to do this in C++.
You can use a regex_token_iterator for this. Here's the basic idea:
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <regex>
using namespace std;
map<int, string> extract( const std::string & s )
{
    map<int, string> m;
    // Optional whitespace, a bracketed number (captured as group 1), optional whitespace.
    static const regex r( "\\s*\\[(\\d+)\\]\\s*" );
    // -1 yields the text before each match (and after the last one); 1 yields the number.
    sregex_token_iterator tok( s.begin(), s.end(), r, { -1, 1 } );
    sregex_token_iterator end;
    if( tok != end ) ++tok; // Skip the text before the first "[nn]" (empty here).
    while( tok != end )
    {
        int num = stoi( tok->str() );
        if( ++tok != end )
        {
            m.emplace( num, tok->str() );
            ++tok;
        }
    }
    return m;
}
int main()
{
auto m = extract("[05]some words here [13]some more words here [17]and so on");
for( auto & p : m ) cout << p.first << ": '" << p.second << "'" << endl;
return 0;
}
Here, this is searching for and extracting the pattern \s*\[(\d+)\]\s*, which drops any whitespace before and after the square brackets and captures one or more digits in group 1.
By using {-1, 1} on the iterator, we're asking the iteration sequence to provide the text prior to each match, followed by matching group 1, and finally any text after the last match.
Output:
5: 'some words here'
13: 'some more words here'
17: 'and so on'
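To see exactly what { -1, 1 } produces, this small sketch collects the raw token sequence into a vector instead of a map (the helper name tokens_of is hypothetical). For the sample input, the first token is the empty text before "[05]", which is why the answer skips one token before its loop:

```cpp
#include <regex>
#include <string>
#include <vector>

// Return every token that sregex_token_iterator yields for submatches {-1, 1}:
// the text before each match, then capture group 1, and finally any non-empty
// text after the last match.
std::vector<std::string> tokens_of(const std::string& s)
{
    static const std::regex r("\\s*\\[(\\d+)\\]\\s*");
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(s.begin(), s.end(), r, {-1, 1}), end;
         it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

For "[05]some words here [13]some more words here [17]and so on" the sequence is "", "05", "some words here", "13", "some more words here", "17", "and so on", so the numbers and texts simply alternate after the first token.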
You can utilize substr() and find_first_of() to extract the actual data from a string as follows:
#include <string>
#include <iostream>
#include <map>
using std::string;
using std::cout;
using std::endl;
using std::map;
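map<int,string> StrToMap(const string& str)
{
    map<int, string> temMap;
    for (string::size_type i = 0; i < str.size(); ++i) {
        if (str[i] != '[')
            continue;
        string::size_type close = str.find(']', i);      // end of the number
        if (close == string::npos)
            break;                                        // unmatched '['
        string::size_type next = str.find('[', close);   // start of the next code
        if (next == string::npos)
            next = str.size();
        string::size_type end = next;
        while (end > close + 1 && str[end - 1] == ' ')  // trim spaces before the next '['
            --end;
        int idx = std::stoi(str.substr(i + 1, close - i - 1));
        temMap[idx] = str.substr(close + 1, end - close - 1);
    }
    return temMap;
}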
int main(int argc, char* argv[])
{
map<int, string> temMap = StrToMap("[05]some words here [13]some more words here [17]and so on");
for (std::map<int, string>::const_iterator it=temMap.begin(); it!=temMap.end(); ++it)
std::cout << it->first << " " << it->second << '\n';
return 0;
}
The result is
5 some words here
13 some more words here
17 and so on
You can split the string on '[' characters and collect the parts in a vector. Then split each element of the vector into two parts (before ']' and after), convert the first part to a number, and put everything in a map. All of this can be done with standard library functions.
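A sketch of that approach, assuming every '[' is followed by a number and a matching ']'. It uses std::getline with '[' as the delimiter and processes each piece directly instead of storing the pieces in a vector first; the name splitCodes is hypothetical:

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse "[05]some words here [13]..." by splitting on '[' and then on ']'.
std::map<int, std::string> splitCodes(const std::string& s)
{
    std::map<int, std::string> result;
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, '[')) {
        std::string::size_type close = part.find(']');
        if (close == std::string::npos)
            continue;                       // e.g. the empty piece before the first '['
        std::string text = part.substr(close + 1);
        // Drop the space(s) that separated this text from the next '['.
        std::string::size_type last = text.find_last_not_of(' ');
        text.erase(last == std::string::npos ? 0 : last + 1);
        result[std::stoi(part.substr(0, close))] = text;
    }
    return result;
}
```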

Create a new string from prefix up to character position in C++

How can I find the position of a character in a string? Ex. If I input "abc*ab" I would like to create a new string with just "abc". Can you help me with my problem?
C++ standard string provides a find method:
s.find(c)
returns the position of the first instance of character c in string s, or std::string::npos if the character is not present at all. You can also pass the starting index for the search, i.e.
s.find(c, x0)
will return the index of the first occurrence of character c at or after position x0.
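As a sketch of the second form, restarting the search one past the previous hit visits every occurrence of a character (the function name positions_of is hypothetical):

```cpp
#include <string>
#include <vector>

// Collect the index of every occurrence of `c` in `s` by restarting
// find() one past the previous hit.
std::vector<std::string::size_type> positions_of(const std::string& s, char c)
{
    std::vector<std::string::size_type> result;
    for (auto pos = s.find(c); pos != std::string::npos; pos = s.find(c, pos + 1))
        result.push_back(pos);
    return result;
}
```

For "abc*ab" and 'a' this yields 0 and 4; the first position on its own is what you would pass as the length to s.substr(0, pos) to get the prefix.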
std::find returns an iterator to the first element it finds that compares equal to what you're looking for (or the second argument if it doesn't find anything, in this case the end iterator.) You can construct a std::string using iterators.
#include <iostream>
#include <string>
#include <algorithm>
int main()
{
std::string s = "abc*ab";
std::string s2(s.begin(), std::find(s.begin(), s.end(), '*'));
std::cout << s2;
return 0;
}
If you are working with std::string type, then it is very easy to find the position of a character, by using std::find algorithm like so:
#include <string>
#include <algorithm>
#include <iostream>
using namespace std;
int main()
{
string first_string = "abc*ab";
string truncated_string = string( first_string.cbegin(), find( first_string.cbegin(), first_string.cend(), '*' ) );
cout << truncated_string << endl;
}
Note: if your character is found multiple times in your std::string, then the find algorithm will return the position of the first occurrence.
Elaborating on existing answers, you can use string.find() and string.substr():
#include <iostream>
#include <string>
int main() {
std::string s = "abc*ab";
size_t index = s.find("*");
if (index != std::string::npos) {
std::string prefix = s.substr(0, index);
std::cout << prefix << "\n"; // => abc
}
}

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?
Read in a string instead of a c-style string. This means that you can use the handy std methods.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
You can get rid of the quotation marks by starting the substring one character after the start of the thing and making it two characters shorter than the quoted thing; you can also use
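A sketch of the find()/substr() approach for one line of the file, assuming fields never contain commas themselves. The function name parseQuoted is hypothetical:

```cpp
#include <string>
#include <vector>

// Split a line such as "thing1","thing2","thing3" at the commas and strip
// the surrounding quotation marks from each field.
std::vector<std::string> parseQuoted(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (start <= line.size()) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos)
            comma = line.size();            // last field runs to the end of the line
        std::string field = line.substr(start, comma - start);
        // Start one past the opening quote and take two characters fewer.
        if (field.size() >= 2 && field.front() == '"' && field.back() == '"')
            field = field.substr(1, field.size() - 2);
        fields.push_back(field);
        start = comma + 1;                  // continue after the comma
    }
    return fields;
}
```

Each element's size() then gives the letter count the question asks about.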
If you have to use strtok then this code snippet should give enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}
For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
csv_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
rc[','] = std::ctype_base::space;
return &rc[0];
}
};
int main()
{
std::ifstream fin("yourFile.txt");
std::string line;
std::vector<std::vector<std::string>> values;
while (std::getline(fin, line))
{
std::istringstream iss(line);
// The locale takes ownership of the facet and deletes it, so allocate it with new.
iss.imbue(std::locale(std::locale(), new csv_reader));
std::vector<std::string> vec;
std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
values.push_back(vec);
}
// values now contains a vector for each line that has the strings split by their commas
fin.close();
return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc mask (also treating them as whitespace), or you can strip them out afterwards (either directly or with an algorithm such as std::for_each):
std::for_each(vec.begin(), vec.end(), [](std::string& s)
{
std::string::iterator pend = std::remove_if(s.begin(), s.end(), [](char c)
{
return c == '"';
});
s.erase(pend, s.end());
});
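A sketch of the first option, marking '"' as whitespace in the facet's table so the stream never returns the quotation marks. The names quoted_csv_reader and splitLine are hypothetical; as in the answer's table, the space character is not marked as whitespace, so a field such as "thing 2" stays in one piece:

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// ctype facet in which only '\n', ',' and '"' count as whitespace,
// so operator>> splits on commas and never returns the quotation marks.
struct quoted_csv_reader : std::ctype<char>
{
    quoted_csv_reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc[','] = std::ctype_base::space;
        rc['"'] = std::ctype_base::space;
        return &rc[0];
    }
};

std::vector<std::string> splitLine(const std::string& line)
{
    std::istringstream iss(line);
    // The locale takes ownership of the facet and deletes it when no longer used.
    iss.imbue(std::locale(std::locale(), new quoted_csv_reader));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

A limitation of this design is that an empty field ("") disappears entirely, because it consists only of "whitespace".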

CString Parsing Carriage Returns

Let's say I have a string that has multiple carriage returns in it, e.g.:
394968686
100630382
395950966
335666021
I'm still pretty amateur hour with C++. Would anyone be willing to show me how to go about parsing through each "line" in the string, so I can do something with it later (add the desired line to a list)? I'm guessing using Find("\n") in a loop?
Thanks guys.
while (!str.IsEmpty())
{
CString one_line = str.SpanExcluding(_T("\r\n"));
// do something with one_line
str = str.Right(str.GetLength() - one_line.GetLength()).TrimLeft(_T("\r\n"));
}
Blank lines will be eliminated with this code, but that's easily corrected if necessary.
You could try it using a stringstream. Note that getline has an overload that lets you use any delimiter you want.
string line;
stringstream ss;
ss << yourstring;
while ( getline(ss, line, '\n') )
{
cout << line << endl;
}
Alternatively you could use the boost library's tokenizer class.
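If the text really contains carriage returns (Windows-style "\r\n" line endings), std::getline with '\n' leaves a trailing '\r' on each line. A small sketch that removes it (the function name splitLines is hypothetical):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` into lines, accepting both "\n" and "\r\n" line endings.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream ss(text);
    std::string line;
    while (std::getline(ss, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();               // drop the '\r' of a "\r\n" ending
        lines.push_back(line);
    }
    return lines;
}
```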
You can use stringstream class in C++.
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
int main()
{
string str = "394968686\n"
             "100630382\n"
             "395950966\n"
             "335666021";
stringstream ss(str);
vector<string> v;
string token;
// read one whitespace-separated token at a time (here, one per line)
while (ss >> token)
{
// insert current line into a std::vector
v.push_back(token);
// print out current line
cout << token << endl;
}
}
Output of the program above:
394968686
100630382
395950966
335666021
Note that operator>> splits on any whitespace, so no whitespace is included in the parsed tokens, and a line that contains spaces would be split into several tokens.
If your string is stored in a C-style char* or a std::string, then you can simply search for \n.
std::string s;
size_t pos = s.find('\n');
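You can use string::substr() to get each substring and store it in a list. For example:
#include <list>
#include <string>
int main()
{
    std::string s = " .... ";
    std::list<std::string> lines;
    std::string::size_type begin = 0, pos;
    while ((pos = s.find('\n', begin)) != std::string::npos) {
        lines.push_back(s.substr(begin, pos - begin)); // 2nd argument is a length
        begin = pos + 1;                                // continue after the '\n'
    }
    lines.push_back(s.substr(begin));                   // last line has no '\n'
}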