Matching and Extracting data using regular expression - c++

Problem: To find a matching string and to extract data from the matched string. There are a number of command strings which has keywords and data.
Command Examples:
Ask name to call me
Notify name that do this action
Message name that request
Keywords: Ask, Notify, Message, to, that. Data:
Input strings:
Ask peter to call me
Notify Jenna that I am going to be away
Message home that I am running late
My problem consists of two problems
1) Find matching command
2) Extract data
Here is what I am doing:
I create multiple regular expressions:
"Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+" or "Ask([^\t\n]+?)to([^\t\n]+?)"
"Notify[[:s:]][[:w:]]+[[:s:]]that[[:s:]][[:w:]]+" or "Notify([^\t\n]+?)that([^\t\n]+?)"
void searchExpression(const char *regString)
{
std::string str;
boost::regex callRegEx(regString, boost::regex_constants::icase);
boost::cmatch im;
while(true) {
std::cout << "Enter String: ";
getline(std::cin, str);
fprintf(stderr, "str %s regstring %s\n", str.c_str(), regString);
if(boost::regex_search(str.c_str(), im, callRegEx)) {
int num_var = im.size() + 1;
fprintf(stderr, "Matched num_var %d\n", num_var);
for(int j = 0; j <= num_var; j++) {
fprintf(stderr, "%d) Found %s\n",j, std::string(im[j]).c_str());
}
}
else {
fprintf(stderr, "Not Matched\n");
}
}
}
I am able to Find a matching string, I am not able to extract the data.
Here is the output:
input_string: Ask peter to call Regex Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+
Matched num_var 2
0) Found Ask peter to call
1) Found
2) Found
I would like to extract peter and call from Ask Peter to call.

Since you're really wanting to parse a grammar, you should consider Boost's parser generator.
You'd simply write the whole thing top-down:
auto sentence = [](auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return confix(verb, particle) [ name ];
};
auto ask = sentence("ask", "to") >> lexeme[+char_];
auto notify = sentence("notify", "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];
auto command = ask | notify | message;
This is a Spirit X3 grammar for it. Read lexeme as "keep whole word" (don't ignore spaces).
Here, "name" is taken to be anything up to the expected particle¹
If you just want to return the raw string matched, this is enough:
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>
namespace x3 = boost::spirit::x3;
namespace commands {
namespace grammar {
using namespace x3;
auto sentence = [](auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return confix(verb, particle) [ name ];
};
auto ask = sentence("ask", "to") >> lexeme[+char_];
auto notify = sentence("notify", "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];
auto command = ask | notify | message;
auto parser = raw [ skip(space) [ command ] ];
}
}
int main() {
for (std::string const input : {
"Ask peter to call me",
"Notify Jenna that I am going to be away",
"Message home that I am running late",
})
{
std::string matched;
if (parse(input.begin(), input.end(), commands::grammar::parser, matched))
std::cout << "Matched: '" << matched << "'\n";
else
std::cout << "No match in '" << input << "'\n";
}
}
Prints:
Matched: 'Ask peter to call me'
Matched: 'Notify Jenna that I am going to be away'
Matched: 'Message home that I am running late'
BONUS
Of course, you'd actually want to extract the relevant bits of information.
Here's how I'd do that. Let's parse into a struct:
struct Command {
enum class Type { ask, message, notify } type;
std::string name;
std::string message;
};
And let's write our main() as:
commands::Command cmd;
if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
std::cout << "No match in '" << input << "'\n";
Live On Coliru
#include <iostream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>
namespace x3 = boost::spirit::x3;
namespace commands {
struct Command {
enum class Type { ask, message, notify } type;
std::string name;
std::string message;
friend std::ostream& operator<<(std::ostream& os, Type t) { return os << static_cast<int>(t); } // TODO
};
}
BOOST_FUSION_ADAPT_STRUCT(commands::Command, type, name, message)
namespace commands {
namespace grammar {
using namespace x3;
auto sentence = [](auto type, auto&& v, auto&& p) {
auto verb = lexeme [ no_case [ as_parser(v) ] ];
auto name = lexeme [ +graph ];
auto particle = lexeme [ no_case [ as_parser(p) ] ];
return attr(type) >> confix(verb, particle) [ name ];
};
using Type = Command::Type;
auto ask = sentence(Type::ask, "ask", "to") >> lexeme[+char_];
auto notify = sentence(Type::notify, "notify", "that") >> lexeme[+char_];
auto message = sentence(Type::message, "message", "that") >> lexeme[+char_];
auto command // = rule<struct command, Command> { }
= ask | notify | message;
auto parser = skip(space) [ command ];
}
}
int main() {
for (std::string const input : {
"Ask peter to call me",
"Notify Jenna that I am going to be away",
"Message home that I am running late",
})
{
commands::Command cmd;
if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
std::cout << "No match in '" << input << "'\n";
}
}
Prints
Matched: 0|peter|call me
Matched: 2|Jenna|I am going to be away
Matched: 1|home|I am running late
¹ I'm no English linguist so I don't know whether that is the correct grammatical term :)

This code reads the command strings from the file "commands.txt", searches for the regular expressions and prints the parts whenever there is a match.
#include <iostream>
#include <fstream>
#include <string>
#include <boost/regex.hpp>
const int NumCmdParts = 4;
std::string CommandPartIds[] = {"Verb", "Name", "Preposition", "Content"};
int main(int argc, char *argv[])
{
std::ifstream ifs;
ifs.open ("commands.txt", std::ifstream::in);
if (!ifs.is_open()) {
std::cout << "Error opening file commands.txt" << std::endl;
exit(1);
}
std::string cmdStr;
// Pieces of regular expression pattern
// '(?<Verb>' : This is to name the capture group as 'Verb'
std::string VerbPat = "(?<Verb>(Ask)|(Notify|Message))";
std::string SeparatorPat = "\\s*";
std::string NamePat = "(?<Name>\\w+)";
// Conditional expression. if (Ask) (to) else (that)
std::string PrepositionPat = "(?<Preposition>(?(2)(to)|(that)))";
std::string ContentPat = "(?<Content>.*)";
// Put the pieces together to compose pattern
std::string TotalPat = VerbPat + SeparatorPat + NamePat + SeparatorPat
+ PrepositionPat + SeparatorPat + ContentPat;
boost::regex actions_re(TotalPat);
boost::smatch action_match;
while (getline(ifs, cmdStr)) {
bool IsMatch = boost::regex_search(cmdStr, action_match, actions_re);
if (IsMatch) {
for (int i=1; i <= NumCmdParts; i++) {
std::cout << CommandPartIds[i-1] << ": " << action_match[CommandPartIds[i-1]] << "\n";
}
}
}
ifs.close();
}

Related

boost spirit x3 match an end of lexeme? [duplicate]

How does one prevent X3 symbol parsers from matching partial tokens? In the example below, I want to match "foo", but not "foobar". I tried throwing the symbol parser in a lexeme directive as one would for an identifier, but then nothing matches.
Thanks for any insights!
#include <string>
#include <iostream>
#include <iomanip>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : {
"foo",
"foobar",
"barfoo"
})
{
using namespace boost::spirit::x3;
std::cout << "\nParsing " << std::left << std::setw(20) << ("'" + input + "':");
int v;
auto iter = input.begin();
auto end = input.end();
bool ok;
{
// what's right rule??
// this matches nothing
// auto r = lexeme[sym - alnum];
// this matchs prefix strings
auto r = sym;
ok = phrase_parse(iter, end, r, space, v);
}
if (ok) {
std::cout << v << " Remaining: " << std::string(iter, end);
} else {
std::cout << "Parse failed";
}
}
}
Qi used to have distinct in their repository.
X3 doesn't.
The thing that solves it for the case you showed is a simple lookahead assertion:
auto r = lexeme [ sym >> !alnum ];
You could make a distinct helper easily too, e.g.:
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
Now you can just parse kw(sym).
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
int main() {
boost::spirit::x3::symbols<int> sym;
sym.add("foo", 1);
for (std::string const input : { "foo", "foobar", "barfoo" }) {
std::cout << "\nParsing '" << input << "': ";
auto iter = input.begin();
auto const end = input.end();
int v = -1;
bool ok;
{
using namespace boost::spirit::x3;
auto kw = [](auto p) { return lexeme [ p >> !(alnum | '_') ]; };
ok = phrase_parse(iter, end, kw(sym), space, v);
}
if (ok) {
std::cout << v << " Remaining: '" << std::string(iter, end) << "'\n";
} else {
std::cout << "Parse failed";
}
}
}
Prints
Parsing 'foo': 1 Remaining: ''
Parsing 'foobar': Parse failed
Parsing 'barfoo': Parse failed

tokenizing string , accepting everything between given set of characters in CPP

I have the following code:
int main()
{
string s = "server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')";
regex re("(\'[!-~]+\')");
sregex_token_iterator i(s.begin(), s.end(), re, 1);
sregex_token_iterator j;
unsigned count = 0;
while(i != j)
{
cout << "the token is "<<*i<< endl;
count++;
}
cout << "There were " << count << " tokens found." << endl;
return 0;
}
Using the above regex, I wanted to extract the string between the paranthesis and single quote:, The out put should look like :
the token is 'm1.labs.teradata.com'
the token is 'use\')r_*5'
the token is 'u" er 5'
the token is 'default'
There were 4 tokens found.
Basically, the regex supposed to extract everything between " (' " and " ') ". It can be anything space , special character, quote or a closing parathesis.
I has earlier used the following regex:
boost::regex re_arg_values("(\'[!-~]+\')");
But is was not accepting space. Please can someone help me out with this. Thanks in advance.
Here's a sample of using Spirit X3 to create grammar to actually parse this. I'd like to parse into a map of (key->value) pairs, which makes a lot more sense than just blindly assuming the names are always the same:
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
Now, we setup some grammar rules using X3:
namespace parser {
using namespace boost::spirit::x3;
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
The helpers as<> and quoted are simple lambdas:
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *('\\' >> char_ | char_ - q) >> q]; };
Now we can parse the string into a map directly:
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
And the demo program
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Prints
Key dbname has value default
Key password has value u" er 5
Key server has value m1.labs.teradata.com
Key username has value use')r_*5
LIVE DEMO
Live On Coliru
#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <map>
using Config = std::map<std::string, std::string>;
using Entry = std::pair<std::string, std::string>;
namespace parser {
using namespace boost::spirit::x3;
template <typename T> auto as = [](auto p) { return rule<struct _, T> {} = p; };
auto quoted = [](auto q) { return lexeme[q >> *(('\\' >> char_) | (char_ - q)) >> q]; };
auto value = quoted("'") | quoted('"');
auto key = lexeme[+alpha];
auto pair = key >> '(' >> value >> ')';
auto config = skip(space) [ *as<Entry>(pair) ];
}
Config parse_config(std::string const& cfg) {
Config parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::config, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
Config cfg = parse_config("server ('m1.labs.teradata.com') username ('use\\')r_*5') password('u\" er 5') dbname ('default')");
for (auto& setting : cfg)
std::cout << "Key " << setting.first << " has value " << setting.second << "\n";
}
Bonus
If you want to learn how to extract the raw input: just try
auto source = skip(space) [ *raw [ pair ] ];
as in this:
using RawSettings = std::vector<std::string>;
RawSettings parse_raw_config(std::string const& cfg) {
RawSettings parsed;
auto f = cfg.begin(), l = cfg.end();
if (!parse(f, l, parser::source, parsed))
throw std::invalid_argument("Parse failed at " + std::string(f,l));
return parsed;
}
int main() {
for (auto& setting : parse_raw_config(text))
std::cout << "Raw: " << setting << "\n";
}
Which prints: Live On Coliru
Raw: server ('m1.labs.teradata.com')
Raw: username ('use\')r_*5')
Raw: password('u" er 5')
Raw: dbname ('default')
Fixing a few syntax and style issues:
you need to escape \ in C strings
you had a " in s, making a syntax error
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <iostream>
int main() {
std::string s = "server ('m1.labs.teradata.com') username ('use\')r_*5') password('u' er 5') dbname ('default')";
boost::regex re(R"(('([^'\\]*(?:\\[\s\S][^'\\]*)*)'))");
size_t count = 0;
for (auto tok : boost::make_iterator_range(boost::sregex_token_iterator(s.begin(), s.end(), re, 1), {})) {
std::cout << "Token " << ++count << " is " << tok << "\n";
}
}
Prints
Token 1 is 'm1.labs.teradata.com'
Token 2 is 'use'
Token 3 is ') password('
Token 4 is ' er 5'
Token 5 is 'default'

parsing a single value into an ast node with a container

My problem is the following. I have an ast node which is defined as like the following:
struct foo_node{
std::vector<std::string> value;
}
and I have a parser like this for parsing into the struct, which works fine:
typedef x3::rule<struct foo_node_class, foo_node> foo_node_type;
const foo_node_type foo_node = "foo_node";
auto const foo_node_def = "(" >> +x3::string("bar") >> ")";
Now I want to achieve that the parser also parses "bar", without brackets, but only if its a single bar. I tried to do it like this:
auto const foo_node_def = x3::string("bar")
| "(" > +x3::string("bar") > ")";
but this gives me a compile time error, since x3::string("bar") returns a string and not a std::vector<std::string>.
My question is, how can I achieve, that the x3::string("bar") parser (and every other parser which returns a string) parses into a vector?
The way to parse a single element and expose it as a single-element container attribute is x3::repeat(1) [ p ]:
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
namespace x3 = boost::spirit::x3;
struct foo_node {
std::vector<std::string> value;
};
BOOST_FUSION_ADAPT_STRUCT(foo_node, value)
namespace rules {
auto const bar
= x3::string("bar");
auto const foo_node
= '(' >> +bar >> ')'
| x3::repeat(1) [ +bar ]
;
}
int main() {
for (std::string const input : {
"bar",
"(bar)",
"(barbar)",
})
{
auto f = input.begin(), l = input.end();
foo_node data;
bool ok = x3::parse(f, l, rules::foo_node, data);
if (ok) {
std::cout << "Parse success: " << data.value.size() << " elements\n";
} else {
std::cout << "Parse failed\n";
}
if (f != l)
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}
Prints
Parse success: 1 elements
Parse success: 1 elements
Parse success: 2 elements

"Ends_with" function while reading an input file is not working

I am trying to create a C++ code that using boost libraries reads an input file like the following,
1 12 13 0 0 1 0 INLE
.
.
.
In this case, I must do an action if the condition specified on the last column of the right is INLE.
I have the following code,
#include <iostream>
#include <fstream>
#include <string>
#include <boost/algorithm/string/predicate.hpp>
int main(int argc, const char * argv[])
{
std::string line;
const std::string B_condition = "INLE";
std::ifstream myfile ("ramp.bnd");
if (myfile.is_open())
{
while ( getline (myfile,line) )
{
if (boost::algorithm::ends_with(line,B_condition)==true)
{
std::cout << "Its True! \n"; // just for testing
//add complete code
}
}
myfile.close();
}
else std::cout << "Unable to open file";
return 0;
}
while compiling there are no issues, but when I run, it doesnt shows anything.
By the other side, if I modify my boolean condition to false, it will print "Its true!" the number of lines that my input file has.
What am I doing wrong?
Thanks!!
I can only assume that:
your file contains whitespace at the end (use trim)
your file has windows line ends (CRLF) but you're reading it as UNIX text files, meaning that the lines will include a trailing `\r' (CR) (often shown as ^M in various text editors/pagers).
So, either
fix the line endings
trim whitespace from the lines before comparing
or both
Best: use a 'proper' parser to do the work.
Update adding a quick & dirty approach using Boost Spirit: see it Live On Coliru
int main()
{
std::ifstream myfile("ramp.bnd");
myfile.unsetf(std::ios::skipws);
boost::spirit::istream_iterator f(myfile), l;
using namespace qi;
bool ok = phrase_parse(f, l,
(repeat(7) [ int_ ] >> as_string[lexeme[+(char_ - eol)]])
[ phx::bind(process_line, _1, _2) ]
% eol, // supports CRLF and LF
blank);
if (!ok)
std::cerr << "Parse errors\n";
if (f!=l)
std::cerr << "Remaing input: '" << std::string(f,l) << "'\n";
}
As you can see, it validates the whole line, assuming (for now) that the columns are 7 integer values and a string (e.g. "INLE"). Now, the actual work is much simpler and can be implemented in a separate function:
void process_line(std::vector<int> const& values, std::string const& kind)
{
if (kind == "INLE")
{
std::cout << "Column 1: " << values[0] << "\n";
}
}
The actual processing function doesn't have to meddle with trimming, line ends, even parsing the details columns :)
Full Code for reference
#include <iostream>
#include <fstream>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
static const std::string B_condition = "INLE";
void process_line(std::vector<int> const& values, std::string const& kind)
{
if (kind == "INLE")
{
std::cout << "Column 1: " << values[0] << "\n";
}
}
int main()
{
std::ifstream myfile("ramp.bnd");
myfile.unsetf(std::ios::skipws);
boost::spirit::istream_iterator f(myfile), l;
using namespace qi;
bool ok = phrase_parse(f, l,
(repeat(7) [ int_ ] >> as_string[lexeme[+(char_ - eol)]])
[ phx::bind(process_line, _1, _2) ]
% eol, // supports CRLF and LF
blank);
if (!ok)
std::cerr << "Parse errors\n";
if (f!=l)
std::cerr << "Remaing input: '" << std::string(f,l) << "'\n";
}
You don't need a library like boost at all. A solution with pur standard C++ is possible in some lines of code too:
const std::string B_condition = "INLE";
std::ifstream myfile ("ramp.bnd");
for( char c; myfile >> c; )
{
if( std::isdigit(c, myfile.getloc() ) ) // needs #include <locale>
{
int i;
if( myfile.putback(c) >> i )
std::cout << "read " << i << std::endl; // do something with 'i'
}
else
{
std::string token;
if( myfile.putback(c) >> token )
{
if( token == B_condition )
std::cout << B_condition << " found\n";
else
; // no number, no B_condition -> what ever You want to do
}
}
}

How to expand environment variables in .ini files using Boost

I have a INI file like
[Section1]
Value1 = /home/%USER%/Desktop
Value2 = /home/%USER%/%SOME_ENV%/Test
and want to parse it using Boost. I tried using Boost property_tree like
boost::property_tree::ptree pt;
boost::property_tree::ini_parser::read_ini("config.ini", pt);
std::cout << pt.get<std::string>("Section1.Value1") << std::endl;
std::cout << pt.get<std::string>("Section1.Value2") << std::endl;
But it didn't expand the environment variable. Output looks like
/home/%USER%/Desktop
/home/%USER%/%SOME_ENV%/Test
I was expecting something like
/home/Maverick/Desktop
/home/Maverick/Doc/Test
I am not sure if it is even possible with boost property_tree.
I would appreciate any hint to parse this kind of file using boost.
And here's another take on it, using the old crafts:
not requiring Spirit, or indeed Boost
not hardwiring the interface to std::string (instead allowing any combination of input iterators and output iterator)
handling %% "properly" as a single % 1
The essence:
#include <string>
#include <algorithm>
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
template <typename It, typename Out>
Out expand_env(It f, It l, Out o)
{
bool in_var = false;
std::string accum;
while (f!=l)
{
switch(auto ch = *f++)
{
case '%':
if (in_var || (*f!='%'))
{
in_var = !in_var;
if (in_var)
accum.clear();
else
{
accum = safe_getenv(accum);
o = std::copy(begin(accum), end(accum), o);
}
break;
} else
++f; // %% -> %
default:
if (in_var)
accum += ch;
else
*o++ = ch;
}
}
return o;
}
#include <iterator>
std::string expand_env(std::string const& input)
{
std::string result;
expand_env(begin(input), end(input), std::back_inserter(result));
return result;
}
#include <iostream>
#include <sstream>
#include <list>
int main()
{
// same use case as first answer, show `%%` escape
std::cout << "'" << expand_env("Greeti%%ng is %HOME% world!") << "'\n";
// can be done streaming, to any container
std::istringstream iss("Greeti%%ng is %HOME% world!");
std::list<char> some_target;
std::istreambuf_iterator<char> f(iss), l;
expand_env(f, l, std::back_inserter(some_target));
std::cout << "Streaming results: '" << std::string(begin(some_target), end(some_target)) << "'\n";
// some more edge cases uses to validate the algorithm (note `%%` doesn't
// act as escape if the first ends a 'pending' variable)
std::cout << "'" << expand_env("") << "'\n";
std::cout << "'" << expand_env("%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%") << "'\n";
std::cout << "'" << expand_env("%HOME% ") << "'\n";
std::cout << "'" << expand_env("%HOME%%HOME%") << "'\n";
std::cout << "'" << expand_env(" %HOME%%HOME% ") << "'\n";
std::cout << "'" << expand_env(" %HOME% %HOME% ") << "'\n";
}
Which, on my box, prints:
'Greeti%ng is /home/sehe world!'
Streaming results: 'Greeti%ng is /home/sehe world!'
''
'/home/sehe'
' /home/sehe'
'/home/sehe '
'/home/sehe/home/sehe'
' /home/sehe/home/sehe '
' /home/sehe /home/sehe '
1 Of course, "properly" is subjective. At the very least, I think this
would be useful (how else would you configure a value legitimitely containing %?)
is how cmd.exe does it on Windows
I'm pretty sure that this could be done trivally (see my newer answer) using a handwritten parser, but I'm personally a fan of Spirit:
grammar %= (*~char_("%")) % as_string ["%" >> +~char_("%") >> "%"]
[ _val += phx::bind(safe_getenv, _1) ];
Meaning:
take all non-% chars, if any
then take any word from inside %s and pass it through safe_getenv before appending
Now, safe_getenv is a trivial wrapper:
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
Here's a complete minimal implementation:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
static std::string safe_getenv(std::string const& macro) {
auto var = getenv(macro.c_str());
return var? var : macro;
}
std::string expand_env(std::string const& input)
{
using namespace boost::spirit::qi;
using boost::phoenix::bind;
static const rule<std::string::const_iterator, std::string()> compiled =
*(~char_("%")) [ _val+=_1 ]
% as_string ["%" >> +~char_("%") >> "%"] [ _val += bind(safe_getenv, _1) ];
std::string::const_iterator f(input.begin()), l(input.end());
std::string result;
parse(f, l, compiled, result);
return result;
}
int main()
{
std::cout << expand_env("Greeting is %HOME% world!\n");
}
This prints
Greeting is /home/sehe world!
on my box
Notes
this is not optimized (well, not beyond compiling the rule once)
replace_regex_copy would do as nicely and more efficient (?)
see this answer for a slightly more involved 'expansion' engine: Compiling a simple parser with Boost.Spirit
using output iterator instead of std::string for accumulation
allowing nested variables
allowing escapes