C++ regex search specific chinese pattern - c++

I need to search a C++ string for a specific Chinese pattern using a regex.
For example, I have the source string "什么手机好" and the pattern "什么(.*)好".
I use boost::regex_search with wstring to do this, but something goes wrong.
When the source string contains English letters or digits, the code doesn't work. For example, with the source string "abc什么efg手机好" and the pattern "什么(.*)好", the code does ACT2. And with the source string "" (empty string) and the same pattern, the code does ACT1.
I want to know how to fix it.
func
std::wstring string2wstring(const std::string& s) {
setlocale(LC_CTYPE, "");
int iWLen = std::mbstowcs(NULL, s.c_str(), s.length());
wchar_t *lpwsz= new wchar_t[iWLen + 1];
std::mbstowcs(lpwsz, s.c_str(), s.length());
std::wstring wstrResult(lpwsz);
delete []lpwsz;
return wstrResult;
}
std::wstring ws = string2wstring(s);
boost::wregex wpattern(string2wstring(pattern));
if (boost::regex_search(ws, wpattern) == true) {
do ACT1;
} else {
do ACT2;
}

Embarrassingly, it turns out I don't need wstring to handle Chinese regexes at all.
Simply keeping both the query and the pattern as GBK-encoded std::string fixes it:
boost::regex_search(query, boost::regex(pattern))
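A minimal sketch of the byte-string approach described above, with two assumptions that differ from the original: it uses std::regex in place of boost::regex, and it assumes UTF-8 (rather than GBK) for both the source file and the strings. The helper name extract_between is made up for illustration. The literal characters of the pattern are compared byte by byte, so no wide-string conversion is involved.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by (.*) in "什么(.*)好",
// or an empty string when the pattern is not found.
// Assumes this source file and the input text are both UTF-8 encoded.
std::string extract_between(const std::string& text) {
    static const std::regex pattern("什么(.*)好");
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        return m[1].str();
    }
    return std::string();
}
```

One caveat with byte-oriented matching: "." matches a single byte, not a whole character, so a capture can only be trusted when it is delimited by complete literal characters, as it is here.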

Related

Writing normal C++ String to Rapid JSON is resulting in string with backslash

string str = "{\"appId\":\"com.mymusic.app\",\"Connectivity\":True,\"DistractionLevel\":2,\"display\":True}";
if(!str.empty())
{
    StringBuffer bf;
    PrettyWriter<StringBuffer> writer (bf);
    writer.StartObject();
    writer.Key("info");
    writer.String(str.c_str());
    writer.EndObject();
    cout<<"request string is:"<<bf.GetString();
}
cout prints the line below, with backslashes:
{"info":"\"appId\":\"com.mymusic.app\",\"checkConnectivity\":True,\"driverDistractionLevel\":2,\"display\":True}"}
What I was expecting is:
{"info": {"appId":"com.mymusic.app","Connectivity":True,"DistractionLevel":2,"display":True} }
You are using the wrong function. The String function adds a string value to the JSON object, and in that context escaping " as \" is expected.
I think what you actually want is to add the string as a JSON sub-object. From what I found in the RapidJSON documentation, the function to use for that is RawValue.
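To see why the backslashes appear, here is a small stand-alone sketch that does not use RapidJSON at all. The three helper functions are hypothetical; they only mimic the difference between writing a value as a JSON string (quotes get escaped) and splicing already-formed JSON in verbatim, which is roughly what a raw-value call does.

```cpp
#include <string>

// Hypothetical helper: escape text so it can sit inside a JSON string literal.
// Only quotes and backslashes are handled; a real writer covers more cases.
std::string escape_for_json_string(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Treats payload as plain text: the result holds one escaped string value.
std::string wrap_as_string(const std::string& key, const std::string& payload) {
    return "{\"" + key + "\":\"" + escape_for_json_string(payload) + "\"}";
}

// Treats payload as JSON that is already well formed and embeds it unchanged.
std::string wrap_as_raw(const std::string& key, const std::string& payload) {
    return "{\"" + key + "\":" + payload + "}";
}
```

With the payload {"a":1}, wrap_as_string yields {"info":"{\"a\":1}"} while wrap_as_raw yields {"info":{"a":1}}. The raw variant does no validation, so the payload has to be valid JSON on its own.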

Conditional replace string using boost::regex_replace

I want to simplify the signs in a mathematical expression using regex_replace. Here is some sample code:
string entry="6+-3++5";
boost::regex signs("[-+]+");
cout<<boost::regex_replace(entry,signs,"?")<<endl;
The output is then 6?3?5. My question is: How can I get the proper result of 6-3+5 with some neat regular expression tools? Thanks a lot.
I tried something else with sregex_iterator and smatch, but it still has a problem:
string s="63--17--42+5555";
collect_sign(s);
Output is
63+17--42+5555+42+5555+5555
i.e.
63+(17--42+5555)+(42+5555)+5555
It seems to me that the problem is related to match.suffix(). Could anybody help, please? The collect_sign function basically iterates through every run of signs, converts it to "-"/"+" if the number of "-" is odd/even, and then appends the suffix that follows the signs.
void collect_sign(string& entry)
{
boost::regex signs("[-+]+");
string output="";
auto signs_begin = boost::sregex_iterator(entry.begin(), entry.end(), signs);
auto signs_end = boost::sregex_iterator();
for (boost::sregex_iterator it = signs_begin; it != signs_end; ++it)
{
boost::smatch match = *it;
if (it ==signs_begin)
output+=match.prefix().str();
string match_signs = match.str();
int n_minus=count(match_signs.begin(),match_signs.end(),'-');
if (n_minus%2==0)
output+="+";
else
output+="-";
output+=match.suffix();
}
cout<<"simplify to: "<<output<<endl;
}
Use:
[+\-*\/]*([+\-*\/])
Replace:
$1
If you just want a mathematical simplification, you can use:
s = boost::regex_replace(s, boost::regex("(?:\\+\\+|--)"), "+", boost::format_all);
s = boost::regex_replace(s, boost::regex("(?:\\+-|-\\+)"), "-", boost::format_all);
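The iterator-based attempt in the question goes wrong because match.suffix() spans everything from the end of the match to the end of the input, so appending it on every iteration duplicates text. A minimal sketch of one way around that, assuming std::regex instead of boost::regex and a hypothetical function name simplify_signs: track where the previous match ended and copy only the text in between.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Collapse every run of '+'/'-' into one sign:
// an odd number of '-' gives "-", an even number gives "+".
std::string simplify_signs(const std::string& entry) {
    static const std::regex signs("[-+]+");
    std::string output;
    auto last = entry.begin();  // end of the previous match
    for (std::sregex_iterator it(entry.begin(), entry.end(), signs), end;
         it != end; ++it) {
        const std::smatch& match = *it;
        // Copy only the text between the previous match and this one.
        output.append(last, match[0].first);
        const std::string run = match.str();
        const auto minus_count = std::count(run.begin(), run.end(), '-');
        output += (minus_count % 2 == 0) ? '+' : '-';
        last = match[0].second;
    }
    output.append(last, entry.end());  // trailing text after the final match
    return output;
}
```

For the inputs in the question, simplify_signs("6+-3++5") gives "6-3+5" and simplify_signs("63--17--42+5555") gives "63+17+42+5555".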

How do I replace the extensions of filenames using regex in c++?

I would like to replace file extensions, changing .nef to .bmp. How do I do it using regex?
My code is something like this:
string str("abc.NEF");
regex e("(.*)(\\.)(N|n)(E|e)(F|f)");
string st2 = regex_replace(str, e, "$1");
cout<<regex_match (str,e)<<"REX:"<<st2<<endl;
regex_match (str,e) gets me a hit, but st2 turns out blank. I am not very familiar with regex, but I expected something to appear in st2. What am I doing wrong?
Try this; it will match .NEF or .nef:
string str("abc.NEF");
regex e(".*(\\.(NEF)|\\.(nef))");
string st2 = regex_replace(str,e,"$1");
$1 will capture .NEF or .nef
Try this
string test = "abc.NEF";
regex reg("\\.(nef|NEF)");
test = regex_replace(test, reg, "your_string");
I suggest not to use regex for such a simple task.
Try this function:
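#include <algorithm>
#include <cctype>
#include <string>
std::string Rename(const std::string& name){
    std::string newName(name);
    static const std::string oldSuffix = "nef";
    static const std::string newSuffix = "bmp";
    auto dotPos = newName.rfind('.');
    // Check npos explicitly: for a 3-character name without a dot,
    // size() - oldSuffix.size() - 1 wraps around to npos as well.
    if (dotPos != std::string::npos && dotPos == newName.size() - oldSuffix.size() - 1){
        auto suffix = newName.substr(dotPos + 1);
        // Lower-case the extension so that "NEF", "Nef", ... all match.
        std::transform(suffix.begin(), suffix.end(), suffix.begin(),
                       [](unsigned char c){ return static_cast<char>(std::tolower(c)); });
        if (suffix == oldSuffix)
            newName.replace(dotPos + 1, std::string::npos, newSuffix);
    }
    return newName;
}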
First we find the position of the delimiter, then fetch the whole file extension (suffix), convert it to lower case and compare it to oldSuffix.
Of course, you can make oldSuffix and newSuffix arguments instead of static consts.
Here is a test program: http://ideone.com/D09NVL
I think boost offers the simplest and most readable solution with
auto result = boost::algorithm::ireplace_last_copy(input, ".nef", ".bmp");
I think this
string str("abc.NEF");
regex e("(.*)\\.[Nn][Ee][Ff]$");
string st2 = regex_replace(str, e, "$1.bmp");
cout<<regex_match(str, e)<<"REX:"<<st2<<endl;
will work out better for you.
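As a variation on the regex answers above, the case handling can be left to the regex engine instead of spelling out [Nn][Ee][Ff]. This is a sketch assuming std::regex; the function name nef_to_bmp is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: swap a trailing ".nef" (any letter case) for ".bmp".
// Names that do not end in ".nef" are returned unchanged.
std::string nef_to_bmp(const std::string& name) {
    static const std::regex ext("\\.nef$",
                                std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(name, ext, ".bmp");
}
```

Because the pattern matches only the extension, the replacement text does not need a $1 back-reference to keep the base name.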

Parsing a string by a delimiter in C++

OK, so I need some info parsed, and I would like to know the best way to do it.
Here is the string that I need to parse. The delimiter is "^":
John Doe^Male^20
I need to parse the string into name, gender, and age variables. What would be the best way to do it in C++? I was thinking about looping with the condition while(!string.empty()),
then assigning all characters up to the '^' to a string, and then erasing what I have already assigned. Is there a better way of doing this?
You can use getline on a C++ stream:
istream& getline(istream& is, string& str, char delim);
Pass '^' as the delimiter instead of the default '\n'.
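A minimal sketch of the getline approach for the record in the question. The Person struct and the parse_person function are hypothetical names introduced only for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type for the three fields in the question.
struct Person {
    std::string name;
    std::string gender;
    int age = 0;
};

// Split "name^gender^age" on '^' using std::getline with a custom delimiter.
// Returns false when a field is missing or the age is not a number.
bool parse_person(const std::string& line, Person& out) {
    std::istringstream in(line);
    std::string age_text;
    if (!std::getline(in, out.name, '^')) return false;
    if (!std::getline(in, out.gender, '^')) return false;
    if (!std::getline(in, age_text, '^')) return false;
    std::istringstream age_in(age_text);
    return static_cast<bool>(age_in >> out.age);
}
```

Calling parse_person("John Doe^Male^20", p) fills p.name, p.gender and p.age without modifying or erasing the original string.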
You have a few options. One good option, if you can use Boost, is the split algorithm provided in its string library. You can check out this SO question to see the Boost answer in action: How to split a string in c
If you cannot use boost, you can use string::find to get the index of a character:
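string str = "John Doe^Male^20";
size_t last = 0;   // start of the current field
size_t cPos;       // position of the next '^'
while ((cPos = str.find('^', last)) != string::npos)
{
    string sub = str.substr(last, cPos - last);
    // Do something with the string
    last = cPos + 1;
}
string tail = str.substr(last); // the final field ("20") has no trailing '^'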
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "This is a sample string";
char * pch;
printf ("Looking for the 's' character in \"%s\"...\n",str);
pch=strchr(str,'s');
while (pch!=NULL)
{
printf ("found at %d\n",(int)(pch-str+1));
pch=strchr(pch+1,'s');
}
return 0;
}
Do something like this in an array.
You have a number of choices but I would use strtok(), myself. It would make short work of this.
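A sketch of what the strtok route could look like, wrapped in a hypothetical function split_with_strtok. Since strtok writes into its input, the tokens are cut from a private copy of the string.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split text on '^' with std::strtok. strtok overwrites delimiters with '\0',
// so it operates on a private, null-terminated copy of the input.
std::vector<std::string> split_with_strtok(const std::string& text) {
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');
    std::vector<std::string> fields;
    for (char* token = std::strtok(buffer.data(), "^"); token != nullptr;
         token = std::strtok(nullptr, "^")) {
        fields.emplace_back(token);
    }
    return fields;
}
```

Two things to keep in mind with strtok: it skips empty fields (so "a^^b" yields two tokens, not three), and it keeps internal state between calls, which makes it unsuitable for concurrent or nested use.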

Regular expressions question

I've got the following string:
const std::string args = "cmdLine=\"-d ..\\data\\configFile.cfg\" rootDir=\"C:\\abc\\def\""; // please note the space after -d
I'd like to split it into 2 substrings:
std::string str1 = "cmdLine=...";
and
std::string str2 = "rootDir=...";
using boost/algorithm/string.hpp. I figured regular expressions would be best for this, but unfortunately I have no idea how to construct one, so I need to ask.
Is anyone able to help me out with this one?
The easiest way to solve the problem in your question is to use strstr to find a substring and string::substr to copy it. But if you really want to use Boost and regular expressions, you could do it as in the following sample:
#include <boost/regex.hpp>
...
const std::string args = "cmdLine=\"-d ..\\data\\configFile.cfg\" rootDir=\"C:\\abc\\def\"";
boost::regex exrp( "(cmdLine=.*) (rootDir=.*)" );
boost::match_results<std::string::const_iterator> what;
if( boost::regex_search( args, what, exrp ) ) {
    std::string str1( what[1].first, what[1].second ); // cmdLine="-d ..\data\configFile.cfg"
    std::string str2( what[2].first, what[2].second ); // rootDir="C:\abc\def"
}
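If Boost is not a requirement, the same idea can be sketched with std::regex by collecting every key="value" token instead of hard-coding the two key names. The function name split_args is hypothetical, and the pattern assumes that values never contain an embedded double quote.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every key="value" token; spaces inside the quotes are kept.
// Assumes values contain no embedded double quotes.
std::vector<std::string> split_args(const std::string& args) {
    static const std::regex token(R"re(\w+="[^"]*")re");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(args.begin(), args.end(), token), end;
         it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

For the string in the question this returns two elements, the cmdLine=... part and the rootDir=... part, and it keeps working if more key="value" pairs are appended.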
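Code sample:
std::string buf(args);                  // mutable copy: writing through args.c_str() is undefined behaviour
char *cstr1 = &buf[0];
char *cstr2 = strstr(cstr1, "=\"");     // first ="  (cmdLine=")
cstr2 = strstr(cstr2 + 1, "=\"");       // second =" (rootDir=")
while (*cstr2 != ' ') --cstr2;          // back up to the space between " and rootDir
*cstr2++ = '\0';                        // terminate the first part, step to the second
//then save to your strings
std::string str1 = cstr1;
std::string str2 = cstr2;
That's all.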
Notes:
The code above supports strings such as
"cmdLine=\"-d ..\\data\\configFile.cfg\" rootDir=\"C:\\abc\\def\"" or
"ABCwhatever=\"-d ..\\data\\configFile.cfg\" XYZ=\"C:\\abc\\def\""