boost xpressive: wrong match? - c++

I want to match a simple expression with boost, but it behaves strangely. The code below should match and display "a" for the first and second strings:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
#include "stdio.h"
using namespace boost::xpressive;
void xmatch_action( const char *line ) {
cregex g_re_var;
cmatch what;
g_re_var = cregex::compile( "\\s*var\\s+([\\w]+)\\s*=.*?" );
if (regex_match(line, what, g_re_var )) {
printf("OK\n");
printf(">%s<\n", what[1] );
}
else {
printf("NOK\n");
}
}
int main()
{
xmatch_action("var a = qqq");
xmatch_action(" var a = aaa");
xmatch_action(" var abc ");
}
but my actual output is:
OK
>a = qqq<
OK
>a = aaa<
NOK
and it should be
OK
>a<
OK
>a<
NOK

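Instead of printf(), use the << operator to print the sub_match object (what[1]), or pass what[1].str().c_str() to printf(). A sub_match is a pair of iterators, not a C string, so passing it to %s is undefined behavior; in practice printf ends up reading the iterator that points at the start of the match, which is why the output runs on to the end of the line.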
See the docs: sub_match, match_results, regex_match

The square brackets around \w in the regex are redundant and can be removed, but the actual fix is to print the sub-match with std::cout instead of printf. Then you will get the result you want.

Related

Cross platform file list using wildcard

I'm looking for a cross-platform function that lists a directory's contents using a wildcard, similar to what FindFirstFile does on Windows.
Is the wildcard pattern accepted on Windows specific to Windows? I want something that supports the FindFirstFile wildcard pattern but also works on Linux.
If C++17 and above:
You can "walk" a directory using a directory iterator, and match walked file names with a regex, like this:
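#include <filesystem>
#include <optional>
#include <regex>
#include <string>
// Returns the path of the first regular file whose extension matches regex, if any.
static std::optional<std::string> find_file(const std::string& search_path, const std::regex& regex) {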
const std::filesystem::directory_iterator end;
try {
for (std::filesystem::directory_iterator iter{search_path}; iter != end; iter++) {
const std::string file_ext = iter->path().extension().string();
if (std::filesystem::is_regular_file(*iter)) {
if (std::regex_match(file_ext, regex)) {
return (iter->path().string());
}
}
}
}
catch (std::exception&) {}
return std::nullopt;
}
For example, to find the first file whose extension is .txt:
auto first_file = find_file("DocumentsDirectory", std::regex("\\.(?:txt)"));
Similarly, if you are interested in more than matching by extension, the function line
const std::string file_ext = iter->path().extension().string();
should be modified to something that captures the part of the filename you are interested in (or the whole path to the file)
This could then be used in a function, which performs the wildcard listing by directory.
Here is a recursive variant.
It calls a function object f for each matching file and returns the number of files found.
It descends into subdirectories up to the maximum depth specified.
Note that the regex is matched against the filename only, not the full path.
The try-catch block is removed so that the caller can catch and handle any errors.
#include <string>
#include <regex>
#include <filesystem>
#include <functional> // std::function
// recursively call a functional *f* for each file that matches the expression
inline int foreach_file(const std::string& search_path, const std::regex& regex, int depth, std::function<void(std::string)> f) {
int n = 0;
const std::filesystem::directory_iterator end;
for (std::filesystem::directory_iterator iter{ search_path }; iter != end; iter++) {
const std::string filename = iter->path().filename().string();
if (std::filesystem::is_regular_file(*iter)) {
if (std::regex_match(filename, regex)) {
n++;
f(iter->path().string());
}
}
else if (std::filesystem::is_directory(*iter) && depth>0) {
n += foreach_file(iter->path().string(), regex, depth - 1, f);
}
}
return n;
}
Example:
using namespace std; // the examples use unqualified string and regex
void do_something(string filename) {
...
}
void do_all_json_that_start_with_z() {
// regex matches the whole filename; the dot before "json" is escaped
regex r("z.*\\.json", regex::ECMAScript | regex::icase); // ignoring case
foreach_file(R"(C:\MyFiles\)", r, 99, do_something); // max depth 99
}
// can use lambdas (renamed so the two examples can live in one file)
void list_json_that_start_with_z() {
int n=0;
foreach_file(
R"(C:\MyFiles\)", // using raw string - for windows
regex("z.*\\.json"),
0, // do not descend to sub-directories
[&n](string s) { printf("%d) %s\n", ++n, s.c_str()); });
}

How to test a string that contains a random value?

My mock class looks like this:
struct StringEater {
MOCK_CONST_METHOD1( ExecuteCommand, void( const char* ) );
};
The string consists of a part that doesn't change and a small part that I cannot control in the test. Something like this:
Command 825 finished
but it can be
Command 123 finished
or "Command " + any number + " finished".
The method from the mock class is always called.
So, how do I set up the test? This obviously cannot be used:
StringEater mock;
EXPECT_CALL( mock, ExecuteCommand( StrEq( expectedJsonCmd ) ) ).Times( 1 );
What do I need to put for the matcher?
This works (thanks to J):
TEST( abc, some )
{
struct StringEater {
MOCK_CONST_METHOD1( ExecuteCommand, void( const char* ) );
};
StringEater eater;
EXPECT_CALL( eater, ExecuteCommand( MatchesRegex( "Command\\s([0-9]*)\\sfinished" ) ) ).Times( 1 );
eater.ExecuteCommand( "Command 643 finished" );
}
So it will always be the string "Command" followed by an integral number followed by the string "finished"?
Then it could be tested by attempting to extract these three parts from the string, comparing the first and third parts to the expected strings.
Something like
std::istringstream iss(the_input_string);
std::string s1, s3;
int i2;
if (iss >> s1 >> i2 >> s3)
{
if (s1 == "Command" && s3 == "finished")
{
// Test succeeded
}
else
{
// Test failed
}
}
else
{
// Failed, not correct format
}
You could try using a regex that checks whether the string begins with "Command" and ends with "finished", and extracts the number as a group.
A similar problem was described in another post:
Regex C++: extract substring

How can I make this regex replace function efficient?

I am using this function to perform regex replace on std::string:
String regexReplace(String s,String search,String replace,String modifier,int user){
bool case_sensitive=true,global=false;
String replaced_string=s;
if(modifier=="gi" || modifier=="ig"){global=true;case_sensitive=false;}
else if(modifier=="i"){case_sensitive=false;}
else if(modifier=="g"){global=true;}
try {
std::regex re (search);
if(user==1){re=createRegex(search,case_sensitive);}
else if(!case_sensitive){re= Regex (search, REGEX_DEFAULT | ICASE);}
if(global){
replaced_string=std::regex_replace (s,re,replace,std::regex_constants::format_default);
}
else{
replaced_string=std::regex_replace (s,re,replace,NON_RECURSIVE_REGEX_REPLACE);
}
}
catch (std::regex_error& e) {
printErrorLog("Invalid replace string regex: "+search);
Exit(1);
}
return replaced_string;
}
typedefs and #defines used:
typedef std::regex Regex;
typedef std::string String;
#define REGEX_DEFAULT std::regex::ECMAScript
#define ICASE std::regex::icase
#define NON_RECURSIVE_REGEX_REPLACE std::regex_constants::format_first_only
But this function takes approximately 0.3 seconds for 14x4 consecutive executions of the following calls:
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+$","$1","i",0);
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+(e.*)$","$1$2","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*$","$1","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*(e.*)$","$1$2","i",0);
Can I make it more efficient to lessen the execution time?
Note:
The createRegex() function is not being called (user=0 by default).

Clang Tool: rewrite ObjCMessageExpr

I want to rewrite all message expressions in my code.
I only need to replace the selectors, but I need to be able to handle nested expressions,
for example:
[super foo:[someInstance someMessage:@""] foo2:[someInstance someMessage2]];
I tried to do it with clang::Rewriter ReplaceText by generating a new string,
but there is a problem: it does not work if I change the selector length, because I replace nested messages using their old positions.
So I assumed that I need to use clang::Rewriter ReplaceStmt(originalStatement, newStatement);
I am using RecursiveASTVisitor to visit all messages, and I want to copy those message objects and replace their selectors.
How can I do that?
I tried to use ObjCMessageExpr::Create, but there are so many arguments; I don't know how to get the ASTContext &Context, ArrayRef<SourceLocation> SelLocs, and Expr *Receiver parameters from the original message.
What is the proper way to replace selectors in nested messages using a Clang tool (the Clang tooling interface)?
Update:
Should I use the ReplaceStmtWithStmt callback and ASTMatchFinder?
Update:
I am using the following function to rewrite text in the file:
void ReplaceText(SourceLocation start, unsigned originalLength, StringRef string) {
m_rewriter.ReplaceText(start, originalLength, string);
m_rewriter.overwriteChangedFiles();
}
And I want to replace every messageExpr in the code with a new selector, for example:
how it was:
[object someMessage:[object2 someMessage:obj3 calculate:obj4]];
how it should be:
[object newSelector:[object2 newSelector:obj3 newSelector:obj4]];
I am using RecursiveASTVisitor:
bool VisitStmt(Stmt *statement) {
if (ObjCMessageExpr *messageExpr = dyn_cast<ObjCMessageExpr>(statement)) {
ReplaceMessage(*messageExpr);
}
return true;
}
I created a method for generating the new message expression string:
string StringFromObjCMessageExpr(ObjCMessageExpr& messageExpression) {
std::ostringstream stringStream;
const string selectorString = messageExpression.getSelector().getAsString();
cout << selectorString << endl;
vector<string> methodParts;
split(selectorString, ParametersDelimiter, methodParts);
stringStream << "[" ;
const string receiver = GetStringFromLocations(m_compiler, messageExpression.getReceiverRange().getBegin(), messageExpression.getSelectorStartLoc());
stringStream << receiver;
clang::ObjCMessageExpr::arg_iterator argIterator = messageExpression.arg_begin();
for (vector<string>::const_iterator partsIterator = methodParts.begin();
partsIterator != methodParts.end();
++partsIterator) {
stringStream << "newSelector";
if (messageExpression.getNumArgs() != 0) {
const clang::Stmt *argument = *argIterator;
stringStream << ":" << GetStatementString(*argument) << " ";
++argIterator;
}
}
stringStream << "]";
return stringStream.str();
}
void ReplaceMessage(ObjCMessageExpr& messageExpression) {
SourceLocation locStart = messageExpression.getLocStart();
SourceLocation locEnd = messageExpression.getLocEnd();
string newExpr = StringFromObjCMessageExpr(messageExpression);
const int exprStringLegth = m_rewriter.getRangeSize(SourceRange(locStart, locEnd));
ReplaceText(locStart, exprStringLegth, newExpr);
}
The problem occurs when I try to replace nested messages, like this:
[simpleClass doSomeActionWithString:string3 andAnotherString:string4];
[simpleClass doSomeActionWithString:str andAnotherString:str2];
[simpleClass doSomeActionWithString:@"" andAnotherString:@"asdasdsad"];
[simpleClass setSimpleClassZAZAZAZAZAZAZAZA:[simpleClass getSimpleClassZAZAZAZAZAZAZAZA]];
the result is:
[simpleClass newSelector:string3 newSelector:string4 ];
[simpleClass newSelector:str newSelector:str2 ];
[simpleClass newSelector:@"" newSelector:@"asdasdsad" ];
[simpleClass newSelector:[simpleClass getSimp[simpleClass newSelector]];
because messageExpression still has the "old" values of getLocStart() and getLocEnd(). How can I fix it?
You can rewrite the selector by replacing only the selector pieces themselves and leaving the rest of the expression untouched. For example, replace only the underlined parts:
[object someMessage:[object2 someMessage:obj3 calculate:obj4]];
        ^~~~~~~~~~          ^~~~~~~~~~      ^~~~~~~~
To achieve this you only need
the number of selector parts - ObjCMessageExpr::getNumSelectorLocs()
their locations - ObjCMessageExpr::getSelectorLoc(index)
their lengths - ObjCMessageExpr::getSelector().getNameForSlot(index).size().
Overall, you can rewrite ObjCMessageExpr with the following RecursiveASTVisitor:
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Rewrite/Core/Rewriter.h"
namespace clang_tooling
{
using clang::SourceLocation;
class RewritingVisitor : public clang::ASTConsumer,
public clang::RecursiveASTVisitor<RewritingVisitor>
{
public:
// You can obtain SourceManager and LangOptions from CompilerInstance when
// you are creating visitor (which is also ASTConsumer) in
// clang::ASTFrontendAction::CreateASTConsumer.
RewritingVisitor(clang::SourceManager &sourceManager,
const clang::LangOptions &langOptions)
: _sourceManager(sourceManager), _rewriter(sourceManager, langOptions)
{}
virtual void HandleTranslationUnit(clang::ASTContext &context)
{
TraverseDecl(context.getTranslationUnitDecl());
_rewriter.overwriteChangedFiles();
}
bool VisitObjCMessageExpr(clang::ObjCMessageExpr *messageExpr)
{
// Note: getLocStart() was removed in Clang 8; newer releases use getBeginLoc().
if (_sourceManager.isInMainFile(messageExpr->getLocStart()))
{
clang::Selector selector = messageExpr->getSelector();
for (unsigned i = 0, end = messageExpr->getNumSelectorLocs();
i < end; ++i)
{
SourceLocation selectorLoc = messageExpr->getSelectorLoc(i);
_rewriter.ReplaceText(selectorLoc,
selector.getNameForSlot(i).size(),
"newSelector");
}
}
return Base::VisitObjCMessageExpr(messageExpr);
}
private:
typedef clang::RecursiveASTVisitor<RewritingVisitor> Base;
clang::SourceManager &_sourceManager;
clang::Rewriter _rewriter;
};
} // end namespace clang_tooling

extract domain between two words

I have in a log file some lines like this:
11-test.domain1.com Logged ...
37-user1.users.domain2.org Logged ...
48-me.server.domain3.net Logged ...
How can I extract each domain without the subdomains? Something between "-" and "Logged".
I have the following C++ code (on Linux), but it doesn't extract correctly. A function that returns the extracted string would be great, if you have an example.
regex_t preg;
regmatch_t mtch[1];
size_t rm, nmatch;
char tempstr[1024] = "";
int start;
rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
nmatch = 1;
while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
{
strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
printf("%s\n", tempstr);
start +=mtch[0].rm_eo;
memset(host, '\0', strlen(host));
}
regfree(&preg);
Thank you!
P.S. no, I cannot use perl for this because this part is inside of a larger c program which was made by someone else.
EDIT:
I replaced the code with this:
const char *p1 = strstr(buffer, "-")+1;
const char *p2 = strstr(p1, " Logged");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len+1));
strncpy(res, p1, len);
res[len] = '\0';
This correctly extracts the whole domain, including subdomains.
How can I extract just domain.com or domain.net from abc.def.domain.com?
Is strtok a good option, and how can I find the last dot?
#include <algorithm> // std::for_each
#include <iostream>
#include <vector>
#include <string>
#include <boost/regex.hpp>
int main()
{
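// Skip any leading subdomain labels and capture only the last two labels.
boost::regex re("-(?:[^.\\s]+\\.)*(?<domain>[^.\\s]+\\.[^.\\s]+)\\s+Logged");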
std::string examples[] =
{
"11-test.domain1.com Logged ...",
"37-user1.users.domain2.org Logged ..."
};
std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
{
boost::smatch match;
if (boost::regex_search(s, match, re))
{
std::cout << match["domain"] << std::endl;
}
});
}
http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5
Something like this with boost::regex. I don't know about PCRE.
Is the log in a standard format?
It appears so. Is there a split function?
Edit:
Here is some logic.
Iterate through each line to be parsed.
Find the index of the first delimiter, "-".
Next, find the index of the second delimiter, "Logged".
The text between those two indexes is the full domain.
Once you have the full domain, split it on "." into a collection of your choice (I used an array).
With the array in hand, take the last two elements and concatenate them to get only the domain.
NOTE: Written in C#.
Main method, which defines the first value and the second value:
static void Main(string[] args)
{
string firstValue = "-";
string secondValue = "Logged";
List<string> domains = new List<string> { "11-test.domain1.com Logged", "37-user1.users.domain2.org Logged", "48-me.server.domain3.net Logged" };
foreach (string dns in domains)
{
Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
}
}
Method to parse the string:
public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
{
string domain = string.Empty;
if(string.IsNullOrEmpty(str))
{
//throw an exception, return gracefully, whatever you determine
}
else
{
//This can all be done in one line, but I broke it apart so it can be better understood.
//returns the first occurrence.
//int start = str.IndexOf(firstStringToFind) + 1;
//int end = str.IndexOf(secondStringToFind);
//domain = str.Substring(start, end - start);
//i.e. Definitely not quite as legible, but doesn't create object unnecessarily
//Trim() drops the space that precedes "Logged"
domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1)).Trim();
string[] dArray = domain.Split('.');
if (dArray.Length > 0)
{
if (dArray.Length > 2)
{
domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
}
}
}
return domain;
}