Get string between 2 delimiters using C++ STL TR1 regular expressions - c++

I want to extract the strings between opening and closing brackets. So if the content is
145(5)(7)
the output must be
5
7
I tried with C++ STL TR1 using the code below:
const std::tr1::regex pattern("\\((.*?)\\)");
// the source text
std::string text= "145(5)(7)";
const std::tr1::sregex_token_iterator end;
for (std::tr1::sregex_token_iterator i(text.begin(),
text.end(), pattern);
i != end;
++i)
{
std::cout << *i << std::endl;
}
I am getting the output as,
(5)
(7)
I want the output without delimiters.
Please help me to meet my requirement using STL TR1.

You need to use sregex_iterator instead of sregex_token_iterator and then access the submatches via .str(n):
const std::tr1::regex pattern1("\\((.*?)\\)");
std::string text= "145(5)(7)";
const std::tr1::sregex_iterator end;
for (std::tr1::sregex_iterator i(text.begin(),
text.end(), pattern1);
i != end;
++i)
{
std::cout << (*i).str(1) << std::endl;
}
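If you would rather keep the token iterator from the question, here is a minimal sketch. It assumes a C++11 compiler, so std::regex stands in for std::tr1::regex, and the helper name between_parens is made up for illustration. Passing the submatch index 1 as the fourth constructor argument makes the iterator yield the capture group instead of the whole match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the text found inside each "(...)" pair.
std::vector<std::string> between_parens(const std::string& text) {
    static const std::regex pattern("\\((.*?)\\)");
    // The trailing 1 asks the iterator to yield capture group 1
    // instead of the whole match (which would include the brackets).
    std::sregex_token_iterator it(text.begin(), text.end(), pattern, 1);
    const std::sregex_token_iterator end;
    std::vector<std::string> result;
    for (; it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

Calling between_parens("145(5)(7)") should give the two strings 5 and 7.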

Related

Get all numbers from string c++

I know this question has been asked several times, but none of the answers fits my needs.
So I have this string
Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20, I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20, V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95, T_SA2=-4.86
I want to get all of the numbers after the "=" signs and make a new string like
1,7.40,559.63,7.20,0.55,500.25,4.95,446.20,3.28,3.45,0,25.24,22.95,4.68
Can anyone help me solve this problem? I have used stringstream, but I got all 0s in my output.
Thank you
Based on a corrected understanding of what's actually desired, I'd do things quite differently than I originally suggested. Under the circumstances, I agree with Stephen Webb that a regular expression is probably the right way to go, though I differ as to the right regex to use, and a bit in how to use it (though the latter is probably as much about habits I've formed as anything else).
#include <regex>
#include <iostream>
#include <string>
int main()
{
using iter = std::regex_token_iterator<std::string::const_iterator>;
std::string s = "Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20,"
" I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20,"
" V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95,"
" T_SA2=-4.86";
std::regex re(R"#([A-Z][^=]*=([-\.\d]+))#");
auto begin = iter(s.begin(), s.end(), re, 1);
iter end;
for (auto i = begin; i!= end; ++i)
std::cout << *i << ", ";
std::cout << '\n';
}
Result:
1, 7.40, -559.63, 7.20, -0.55, 500.25, 4.95, 446.20, 3.28, 3.45, 0, 25.24, 22.95, -4.86,
If the number of arguments and their order are known, you can use snprintf like this:
char str[100];
int Sep=1;
double V_Batt = 7.40, I_Batt = 559.63;// etc ...
snprintf(str, 100, "%d,%.2f,%.2f", Sep, V_Batt, I_Batt); //etc...
// str = 1,7.40,559.63
Open your file with the fopen() function.
It returns a FILE* pointer. Of course, if you already have the characters in memory, just skip this step.
Use this FILE* to read one character at a time, for example with fgetc().
Check each character you obtain and do what you want with it, inserting a comma into your new string where necessary.
That's exactly what std::regex_iterator is for.
#include <regex>
#include <iostream>
#include <string>
int main()
{
const std::string s = "Sep=1, V_Batt=7.40, I_Batt=-559.63, V_SA=7.20, I_SA=-0.55, I_MB=500.25, V_5v=4.95, I_5v=446.20, V_3v=3.28, I_3v=3.45, S=0, T_Batt=25.24, T_SA1=22.95, T_SA2=-4.86";
std::regex re("[-\\d\\.]+");
auto words_begin = std::sregex_iterator(s.begin(), s.end(), re);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i)
std::cout << (*i).str() << ',';
std::cout << "\n";
}
The output of the above complete program is this.
1,7.40,-559.63,7.20,-0.55,500.25,5,4.95,5,446.20,3,3.28,3,3.45,0,25.24,1,22.95,2,-4.86,
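The stray 5, 3, 1 and 2 in that output come from digits inside the names (V_5v, T_SA1 and so on). One way to avoid them, sketched here under the assumption that every wanted number directly follows an "=", is to anchor the pattern on "=" and read capture group 1. The helper name values_after_equals is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: joins every number that directly follows '=' with commas.
std::string values_after_equals(const std::string& s) {
    // '=' anchors the match, so digits inside names such as V_5v are skipped.
    static const std::regex re("=(-?\\d+(?:\\.\\d+)?)");
    std::string out;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        if (!out.empty()) out += ',';
        out += (*it)[1].str();  // group 1 is the number without the '='
    }
    return out;
}
```

This keeps the minus signs. If the signs should be dropped, as the sample output in the question suggests, moving the -? outside the capture group should do it.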

C++: Separating a char* with '\t' delimiter

I've been fighting this problem for a while now, and can't seem to find a simple solution that doesn't involve parsing a char * by hand. I need to split my char* variable by '\t', and I've tried the following ways:
Method 1:
char *splitentry;
std::string ss;
splitentry = strtok(read_msg_.data(), "\\t");
while(splitentry != NULL)
{
std::cout << splitentry << std::endl;
splitentry = strtok(NULL, "\\t");
}
Using the input '\tthis\tis\ta\ttest'
results in this output:
his
is
a
es
Method 2:
std::string s(read_msg_.data());
boost::algorithm::split(strs, s, boost::is_any_of("\\t"));
for (int i = 0; i < strs.size(); i++)
std::cout << strs.at(i) << std::endl;
Which creates an identical output.
I've tried using boost::split_regex and used "\\t" as my regex value, but nothing gets split. Will I have to split it on my own, or am I going about this incorrectly?
I would try to make things a little simpler by sticking to std:: functions. (p.s. you never use this: std::string ss;)
Why not do something like this?
Method 1: std::istringstream
std::istringstream ss(read_msg_.data());
std::string line;
while( std::getline(ss,line,ss.widen('\t')) )
std::cout << line << std::endl;
Method 2: std::string::substr (my preferred method as it is lighter)
std::string data(read_msg_.data());
std::size_t SPLITSTART(0); // signifies the start of the cell
std::size_t SPLITEND(0); // signifies the end of the cell
while( SPLITEND != std::string::npos ) {
SPLITEND = data.find('\t',SPLITSTART);
// SPLITEND-SPLITSTART signifies the size of the string
std::cout << data.substr(SPLITSTART,SPLITEND-SPLITSTART) << std::endl;
SPLITSTART = SPLITEND+1;
}
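As background on why both attempts in the question lose characters: in C++ source, "\\t" is the two-character string backslash + t, and strtok (like boost::is_any_of) treats its argument as a set of single-character delimiters, so both the backslash and the letter t act as separators. The sketch below uses a hypothetical helper, split_with_strtok, to make that visible. It assumes the real data contains actual tab characters, in which case the delimiter should be "\t":

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits a copy of `text` with strtok and returns the tokens.
std::vector<std::string> split_with_strtok(std::string text, const char* delims) {
    std::vector<std::string> tokens;
    // strtok writes '\0' into the buffer, which is why we work on a copy.
    for (char* tok = std::strtok(&text[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

With the delimiter "\\t" and an input that contains a literal backslash followed by t, this reproduces the his / is / a / es output from the question. With "\t" and real tabs it yields this / is / a / test. Note that strtok skips empty fields, unlike the std::getline and substr methods above.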

How to use regular expressions to deal with Chinese punctuation symbols in C++

I want to achieve such a result:
Before:
有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”
After:
有人可能会问 那情绪 欲望 冲动 强迫症有什么区别呢
That is, I want to replace the Chinese punctuation symbols with spaces.
I tried to use the replace and replace_if functions but failed. The code looks like this:
char myints[] = "有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”";
std::vector<char> myvector ;
std::replace_if (myvector.begin(), myvector.end(), "\\pP", " ");
std::cout << "myvector contains:";
for (std::vector<char>::iterator it=myvector.begin(); it!=myvector.end(); ++it)
std::cout << ' ' << *it;
std::cout << '\n';
Assuming you did mean to use a regular expression, rather than a character-by-character replacement function... Here's what I meant by using std::regex_replace. There's probably a more elegant regex that generalizes with fewer surprises, but at least this works for your example.
#include <regex>
#include <string>
int main()
{
std::wstring s(L"有人可能会问:“那情绪、欲望、冲动、强迫症有什么区别呢?”");
// Replace each run of punctuation with a space; use ECMAScript grammar
s = std::regex_replace(s, std::wregex(L"[[:punct:]]+"), L" ");
// Remove extra space at ends of line
s = std::regex_replace(s, std::wregex(L"^ | $"), L"");
return (s != L"有人可能会问 那情绪 欲望 冲动 强迫症有什么区别呢"); // returns 0
}
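If a character-by-character replacement is what you want after all, std::replace_if needs a predicate and a replacement value rather than a pattern string. Here is a rough sketch. The helper name replace_cjk_punct and the list of punctuation marks are assumptions for illustration, and the list is certainly not complete:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replaces a hand-picked set of punctuation marks with spaces.
std::wstring replace_cjk_punct(std::wstring s) {
    // Assumed list, written as escapes to avoid source-encoding issues:
    // ASCII colon, fullwidth colon, curly quotes, ideographic comma,
    // ASCII and fullwidth question marks, ideographic full stop,
    // fullwidth comma, fullwidth exclamation mark.
    static const std::wstring punct =
        L":\uFF1A\u201C\u201D\u3001?\uFF1F\u3002\uFF0C\uFF01";
    std::replace_if(s.begin(), s.end(),
                    [](wchar_t c) { return punct.find(c) != std::wstring::npos; },
                    L' ');
    return s;
}
```

Unlike the regex version, this turns every mark into its own space, so runs of punctuation are not collapsed and the ends are not trimmed.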

Split string by regex in VC++

I am using VC++ 10 in a project. Being new to C/C++, I just Googled, and it appears that standard C++ doesn't have regex? VC++ 10 seems to have regex. However, how do I do a regex split? Do I need Boost just for that?
Searching the web, I found that many recommend Boost for many things: tokenizing/splitting strings, parsing (PEG), and now even regex (though this should be built in ...). Can I conclude Boost is a must-have? It's 180MB for just trivial things that are supported natively in many languages?
The C++11 standard has std::regex. It is also included in TR1 for Visual Studio 2010. Actually, TR1 has been available since VS2008, where it is hidden under the std::tr1 namespace. So you don't need Boost.Regex for VS2008 or later.
Splitting can be performed using regex_token_iterator:
#include <iostream>
#include <string>
#include <regex>
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separator("-");
const std::tr1::sregex_token_iterator endOfSequence;
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
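The same idea can be wrapped in a small reusable function. This is only a sketch, assuming a C++11 compiler (so std::regex rather than std::tr1::regex); the name split_by_regex is made up here:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` wherever `separator` matches.
std::vector<std::string> split_by_regex(const std::string& s,
                                        const std::regex& separator) {
    // Submatch index -1 selects the text between matches rather than the matches.
    std::sregex_token_iterator it(s.begin(), s.end(), separator, -1);
    const std::sregex_token_iterator end;
    std::vector<std::string> parts;
    for (; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

For the string above and the separator "-", this should return the six words from The to everything.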
If you also need the separator itself, you can obtain it from the sub_match object that token points to; it is a pair containing the start and end iterators of the token.
while(token != endOfSequence)
{
const std::tr1::sregex_token_iterator::value_type& subMatch = *token;
if(subMatch.first != s.begin())
{
const char sep = *(subMatch.first - 1);
std::cout << "Separator: " << sep << std::endl;
}
std::cout << *token++ << std::endl;
}
This sample covers the case of a single-character separator. If the separator itself can be any substring, you need to do some more complex iterator work and possibly store the previous token's sub_match object.
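For separators of arbitrary length, one possible shortcut is to request both submatch -1 (the text before each match) and submatch 0 (the match itself) from the token iterator, which yields tokens and separators interleaved. A sketch, assuming C++11 and a made-up helper name split_keep_separators:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns tokens and separators interleaved, in input order.
std::vector<std::string> split_keep_separators(const std::string& s,
                                               const std::regex& separator) {
    // -1 yields the text before each match, 0 yields the match itself.
    const std::vector<int> submatches = {-1, 0};
    std::sregex_token_iterator it(s.begin(), s.end(), separator, submatches);
    const std::sregex_token_iterator end;
    std::vector<std::string> pieces;
    for (; it != end; ++it) {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

With the input "a-b--c" and the separator pattern "-+", the pieces should come out as a, -, b, --, c.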
Or you can use regex groups and place separators in first group and the real token in second:
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separatorAndStr("(-*)([^-]*)");
const std::tr1::sregex_token_iterator endOfSequence;
// Separators will be the 0th, 2nd, 4th... tokens
// Real tokens will be the 1st, 3rd, 5th... tokens
int subMatches[] = { 1, 2 };
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
Not sure it is 100% correct, but just to illustrate the idea.
Here is an example from this blog.
You'll have all your matches in res
std::tr1::cmatch res;
std::string str = "<h2>Egg prices</h2>";
std::tr1::regex rx("<h(.)>([^<]+)");
std::tr1::regex_search(str.c_str(), res, rx);
std::cout << res[1] << ". " << res[2] << "\n";

How can I extract the file name and extension from a path in C++

I have a list of files stored in a .log in this syntax:
c:\foto\foto2003\shadow.gif
D:\etc\mom.jpg
I want to extract the name and the extension from these files. Can you give an example of a simple way to do this?
To extract a filename without extension, use boost::filesystem::path::stem instead of ugly std::string::find_last_of(".")
boost::filesystem::path p("c:/dir/dir/file.ext");
std::cout << "filename and extension : " << p.filename() << std::endl; // file.ext
std::cout << "filename only : " << p.stem() << std::endl; // file
For C++17:
#include <filesystem>
std::filesystem::path p("c:/dir/dir/file.ext");
std::cout << "filename and extension: " << p.filename() << std::endl; // "file.ext"
std::cout << "filename only: " << p.stem() << std::endl; // "file"
Reference about filesystem: http://en.cppreference.com/w/cpp/filesystem
std::filesystem::path::filename
std::filesystem::path::stem
As suggested by @RoiDanto, regarding the output formatting: std::cout may surround the output with quotation marks, e.g.:
filename and extension: "file.ext"
You can convert std::filesystem::path to std::string by p.filename().string() if that's what you need, e.g.:
filename and extension: file.ext
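As a small sketch of that conversion (assuming a C++17 compiler with <filesystem> available; the helper names file_name_of and stem_of are made up for illustration), both parts can be returned as plain std::string values, which print without quotation marks:

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: file name with extension, as a plain std::string.
std::string file_name_of(const std::filesystem::path& p) {
    return p.filename().string();
}

// Hypothetical helper: file name without its last extension.
std::string stem_of(const std::filesystem::path& p) {
    return p.stem().string();
}
```

For "c:/dir/dir/file.ext" these should give file.ext and file respectively.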
If you want a safe way (i.e. portable between platforms and not making assumptions about the path), I'd recommend using boost::filesystem.
It would look something like this:
boost::filesystem::path my_path( filename );
Then you can extract various data from this path. Here's the documentation of path object.
BTW: Also remember that in order to use path like
c:\foto\foto2003\shadow.gif
you need to escape the \ in a string literal:
const char* filename = "c:\\foto\\foto2003\\shadow.gif";
Or use / instead:
const char* filename = "c:/foto/foto2003/shadow.gif";
This only applies to specifying literal strings in "" quotes, the problem doesn't exist when you load paths from a file.
You'll have to read your filenames from the file into a std::string. You can use the string extraction operator of std::istream. Once you have your filename in a std::string, you can use the std::string::find_last_of method to find the last separator.
Something like this:
std::ifstream input("file.log"); // needs #include <fstream> and <string>
std::string path;
// Test the read itself, so a failed read at end of file is not processed.
// Note that >> stops at whitespace, so paths containing spaces need std::getline.
while (input >> path)
{
size_t sep = path.find_last_of("\\/");
if (sep != std::string::npos)
path = path.substr(sep + 1);
size_t dot = path.find_last_of(".");
std::string name, ext;
if (dot != std::string::npos)
{
name = path.substr(0, dot);
ext = path.substr(dot); // includes the leading '.'
}
else
{
name = path; // no extension, ext stays empty
}
}
Not the code, but here is the idea:
Read a std::string from the input stream (std::ifstream), each instance read will be the full path
Do a find_last_of on the string for the \
Extract a substring from this position to the end, this will now give you the file name
Do a find_last_of for ., and a substring either side will give you name + extension.
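The steps above can be sketched roughly as follows. The helper name split_name_ext is hypothetical, and accepting '/' as well as '\' as a separator is an assumption added here:

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {name, extension} for the last path component.
std::pair<std::string, std::string> split_name_ext(const std::string& path) {
    // Keep what follows the last '\' or '/' (the whole string if there is none).
    const std::string::size_type sep = path.find_last_of("\\/");
    const std::string file =
        (sep == std::string::npos) ? path : path.substr(sep + 1);
    // Split at the last '.', if there is one.
    const std::string::size_type dot = file.find_last_of('.');
    if (dot == std::string::npos) {
        return std::make_pair(file, std::string());
    }
    return std::make_pair(file.substr(0, dot), file.substr(dot + 1));
}
```

Each line read from the log would be passed to this function; for c:\foto\foto2003\shadow.gif it should return shadow and gif.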
The following trick extracts the file name, without its extension, from a file path in C++ (no external libraries required):
#include <iostream>
#include <string>
using std::string;
string getFileName(const string& s) {
char sep = '/';
#ifdef _WIN32
sep = '\\';
#endif
// If there is no separator, the whole string is the file name.
size_t i = s.rfind(sep);
string filename = (i != string::npos) ? s.substr(i + 1) : s;
// Strip the extension, if any; substr(0, npos) keeps the whole name.
size_t lastindex = filename.find_last_of('.');
return filename.substr(0, lastindex);
}
int main(int argc, char** argv) {
string path = "/home/aymen/hello_world.cpp";
string ss = getFileName(path);
std::cout << "The file name is \"" << ss << "\"\n";
}
I also use this snippet to determine the appropriate slash character:
boost::filesystem::path slash("/");
boost::filesystem::path::string_type preferredSlash = slash.make_preferred().native();
and then replace the slashes with the preferred slash for the OS. Useful if one is constantly deploying between Linux/Windows.
On Linux or Unix machines, the OS provides two functions for dealing with paths and file names. Use man 3 basename to get more information about these functions.
The advantage of using the system-provided functionality is that you don't have to install Boost or write your own functions.
#include <libgen.h>
char *dirname(char *path);
char *basename(char *path);
Example code from the man page:
char *dirc, *basec, *bname, *dname;
char *path = "/etc/passwd";
dirc = strdup(path);
basec = strdup(path);
dname = dirname(dirc);
bname = basename(basec);
printf("dirname=%s, basename=%s\n", dname, bname);
Because of the non-const argument type of the basename() function, using it inside C++ code is not entirely straightforward. Here is a simple example from my code base:
string getFileStem(const string& filePath) const {
char* buff = new char[filePath.size()+1];
strcpy(buff, filePath.c_str());
string tmp = string(basename(buff));
string::size_type i = tmp.rfind('.');
if (i != string::npos) {
tmp = tmp.substr(0,i);
}
delete[] buff;
return tmp;
}
The use of new/delete is not good style. I could have put it into a try/catch
block in case something happened between the two calls.
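One way to sidestep the new/delete concern, sketched here as a free function with a made-up name (file_stem) and assuming a POSIX system that provides <libgen.h>, is to let a std::vector<char> own the writable copy, so the buffer is released even if an exception is thrown:

```cpp
#include <libgen.h>
#include <string>
#include <vector>

// Hypothetical helper: base name of `file_path` without its last extension.
std::string file_stem(const std::string& file_path) {
    // basename() may modify its argument, so hand it a writable,
    // null-terminated copy owned by a vector.
    std::vector<char> buf(file_path.begin(), file_path.end());
    buf.push_back('\0');
    std::string name = basename(buf.data());
    const std::string::size_type dot = name.rfind('.');
    if (dot != std::string::npos) {
        name.erase(dot);  // drop the extension, including the dot
    }
    return name;
}
```

The result of basename() is copied into name before the vector goes out of scope, so nothing points into freed memory afterwards.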
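Nickolay Merkin's and Yuchen Zhong's answers are great, but as you can see from the comments, they are not fully accurate.
Printing a path with operator<< wraps the file name in quotation marks; no implicit conversion to std::string takes place. The comments aren't accurate either.
path::filename() and path::stem() return a new path object. std::filesystem::path::string() returns a std::string by value, while boost::filesystem::path::string() can return a reference to the path's internal string on POSIX builds. A single expression such as std::cout << file_path.filename().string() << "\n" is safe, because the temporary path lives until the end of the full expression. A dangling reference arises only if you keep that reference (or a c_str() pointer taken from it) and use it after the statement ends, since the temporary it points into has been destroyed by then.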