CRichEditCtrl - RegEx - regex

How do I use regex search in a CRichEditCtrl?
The problem I have is to highlight the first instance of text matching each entry in a list of regular expressions. The list can contain duplicates; in that case the first copy matches the first instance, the second copy matches the second instance, and so on.
Since FindText does not support regex, I get all the text starting at index 0, run the first regular expression against it, take the matched string, and call FindText on that string to get the indices to highlight. I then repeat the search with the next regular expression, starting from the end index of the previous match.
int iSearchStart = 0;
for (auto &regexString : regexStrings) {
    CString text_cstr;
    int txtLength = myRichEdit.GetTextLength();
    // I am getting an exception on second regex on the following statement
    myRichEdit.GetTextRange(iSearchStart, txtLength-iSearchStart, text_cstr);
    string text = text_cstr;
    std::smatch match;
    std::regex regexObj(regexString);
    //look for the first match in the text
    string matchedString;
    if (std::regex_search(text, match, regexObj)) {
        matchedString = match.str();
        FINDTEXTEX ft;
        ft.chrg.cpMin = iSearchStart;
        ft.chrg.cpMax = -1;
        //ft.lpstrText = _T(tw.c_str());
        ft.lpstrText = _T(matchedString.c_str());
        int iFound = myRichEdit.FindText(FR_DOWN | FR_MATCHCASE | FR_WHOLEWORD, &ft);
        if (iFound != -1) {
            myRichEdit.SetSel(ft.chrgText);
            CHARFORMAT2 cf;
            ::memset(&cf, 0, sizeof(cf));
            cf.cbSize = sizeof(cf);
            cf.dwMask = CFM_BACKCOLOR;
            cf.crBackColor = RGB(255, 160, 160); // pale red
            myRichEdit.SetSelectionCharFormat(cf);
            iSearchStart = ft.chrgText.cpMax + 1;
        }
    }
}

I found the problem: I thought the second parameter of GetTextRange was the length of the text, but it is actually the index of the end of the range.
So if I change
myRichEdit.GetTextRange(iSearchStart, txtLength-iSearchStart, text_cstr);
to
myRichEdit.GetTextRange(iSearchStart, txtLength, text_cstr);
it works!!
I am keeping the code here so the community can see one way to use regex with CRichEditCtrl.
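The index bookkeeping above can also be done without a second FindText pass, because std::smatch reports where the match sits relative to the start of the searched range. The following is only a sketch in plain standard C++ (no MFC); the helper name findSequentialMatches is hypothetical, and it assumes the control's text has already been copied into a std::string whose character indices line up with the control's character positions.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of MFC or the original post).
// Applies each pattern in order, each one searching only the text after the
// previous match, and returns absolute [begin, end) offsets into `text`.
std::vector<std::pair<std::size_t, std::size_t>>
findSequentialMatches(const std::string& text,
                      const std::vector<std::string>& patterns) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t searchStart = 0;
    for (const auto& pattern : patterns) {
        if (searchStart > text.size()) break;
        std::regex re(pattern);
        std::smatch m;
        // Search only the tail that has not been consumed yet.
        if (std::regex_search(text.cbegin() + searchStart, text.cend(), m, re)) {
            // m.position(0) is relative to the start of the searched range,
            // so add searchStart to get an absolute offset.
            std::size_t begin = searchStart + static_cast<std::size_t>(m.position(0));
            std::size_t end = begin + static_cast<std::size_t>(m.length(0));
            ranges.emplace_back(begin, end);
            searchStart = end;  // end is exclusive, so no +1 is needed
        }
    }
    return ranges;
}
```

Each returned pair could then be passed to SetSel as (begin, end) before applying the character format, assuming the offsets map one-to-one onto the control's character positions.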

Related

Save all regex matches on a vector

So, I need to create a function that gets all occurrence matches on one string based on a regex, then store them in an array to ultimately choose an arbitrary capture group number within an individual match. I tried this:
std::string match(std::string basestring, std::string regex, int index, int group) {
std::vector<std::smatch> match;
(here I would need to create a while statement that iterates over all matches, but I'm not sure what overload of 'regex_search' I have to use)
return match.at(index)[group]; }
I thought of getting a match and then starting the next search right after the end position of that match. When no match is found, there are no more matches and the while loop ends; the index and group arguments then select the desired capture group within a match. But I can't seem to find a 'regex_search' overload that takes a starting position (or starting and end positions) in addition to the target string.
I found the solution myself after some hours of digging. This code will do the job:
std::string match(std::string s, std::string r, int index = 0, int group = 0) {
    std::vector<std::smatch> match;
    std::regex rx(r);
    auto matches_begin = std::sregex_iterator(s.begin(), s.end(), rx);
    auto matches_end = std::sregex_iterator();
    for (std::sregex_iterator i = matches_begin; i != matches_end; ++i) {
        match.push_back(*i);
    }
    return match.at(index)[group];
}
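For completeness, the approach described in the question (searching again from the end of the previous match) is possible too: std::regex_search has an overload that takes a pair of iterators instead of a whole string. The sketch below uses a hypothetical helper name, collectGroup, and returns the requested capture group from every match. std::sregex_iterator is usually the safer choice, since restarting the search by hand changes how anchors behave at the restart point.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group `group` from every match of
// `pattern` in `s`, restarting the search after each match by hand.
std::vector<std::string> collectGroup(const std::string& s,
                                      const std::string& pattern,
                                      std::size_t group = 0) {
    std::vector<std::string> out;
    std::regex rx(pattern);
    std::smatch m;
    auto pos = s.cbegin();
    // Iterator overload: search only the range [pos, end).
    while (std::regex_search(pos, s.cend(), m, rx)) {
        out.push_back(m[group].str());
        pos = m[0].second;  // continue right after this match
        if (m[0].first == m[0].second) {
            // Empty match: step forward one character to avoid looping forever.
            if (pos == s.cend()) break;
            ++pos;
        }
    }
    return out;
}
```

Indexing the returned vector with at(index) gives the same kind of lookup as the index and group parameters in the answer above.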

Qt- checking regular expression width

I'm trying to replace each match of a regular expression with a different string, depending on the width of the line and the width of the box.
Here is my code:
//mangledText is my text that I've searching on it
//rx is regular expression
QRegExp rx("<lms([^<]*)/>");
while ((pos = rx.indexIn(mangledText)) != -1){
    for (int j = 0; j < tempLayout->lineCount(); j++){
        QTextLine tl = tempLayout->lineAt(j);
        //here is width of each line
        int naturalTextWidth = tl.naturalTextWidth();
        //rect width is maximum width of box
        if (naturalTextWidth < rectWidth)
            mangledText.replace(pos, rx.matchedLength(), "replace Text");
        else
            mangledText.replace(pos, rx.matchedLength(), "\n replace Text");
    }
}
mangledText.replace('\n', QChar::LineSeparator);
I want to replace a match with "\n replace Text" if the text on that line overflows the box; otherwise I want to replace it with "replace Text". The problem is that every match gets shifted to the next line, because rectWidth is smaller than naturalTextWidth. What I want is to check each match individually before replacing it.
UPDATED:
For example :
111111111111111111111111111<lms8><lms3><lms2>
is showing :
111111111111111111111111111
<lms8>
<lms3>
<lms2>
and I want this:
111111111111111111111111111
<lms8><lms3><lms2>
Any suggestion?
Try this.
(.[^<]*)(<lms.*>)
Replace with following syntax.
$1\n$2
The result would be
111111111111111111111111111
<lms8><lms3><lms2>
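As a quick check of the suggested pattern and replacement, here is a sketch using std::regex rather than the QRegExp from the question (an assumption made so the snippet needs only the standard library). The function name breakBeforeTags is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: inserts a line break between the leading text and the
// run of <lms...> tags, using the pattern and replacement suggested above.
std::string breakBeforeTags(const std::string& line) {
    // Group 1: everything up to the first '<'. Group 2: from "<lms" through
    // the last '>' on the line (the .* is greedy).
    static const std::regex rx("(.[^<]*)(<lms.*>)");
    return std::regex_replace(line, rx, "$1\n$2");
}
```

Because the second group is greedy, all consecutive tags stay together on the new line instead of each being moved to its own line.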

Regular expression for highlighting words in quotes in Qt5

I use the QSyntaxHighlighter class, and used a QRegExp to highlight words in quotes:
void Highlighter::highlightBlock(const QString &text)
{
    QRegExp expr("\"(.*?)\"");
    int index = expr.indexIn(text);
    while(index >=0)
    {
        int length = expr.matchedLength();
        setFormat(index, length, Qt::red);
        index = expr.indexIn(text, index+length);
    }
}
It doesn't work. This works:
"\".*\""
But it highlights more than necessary. Which regular expression is correct?
To highlight everything between quotes:
QRegExp("\"([^\"]*)\"");
To highlight single words (run in a loop with an offset to match each word):
QRegExp("\"(\\w)*\"");
How to match words in quotes:
('|")[^\1]*?\1
Example:
http://regex101.com/r/iF5aA1
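The negated character class pattern from the first answer can be tried outside Qt. The sketch below uses std::regex instead of QRegExp (an assumption, so that it runs with only the standard library) and returns (index, length) pairs like the ones highlightBlock would hand to setFormat. The name quotedSpans is hypothetical. As far as I know, QRegExp has no per-quantifier lazy matching such as .*? (it offers setMinimal() instead), which would explain why the original pattern failed.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (index, length) for every double-quoted run
// in `text`, quotes included.
std::vector<std::pair<int, int>> quotedSpans(const std::string& text) {
    // "[^"]*" : a quote, any number of non-quote characters, a closing quote.
    static const std::regex rx("\"[^\"]*\"");
    std::vector<std::pair<int, int>> spans;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        spans.emplace_back(static_cast<int>(it->position(0)),
                           static_cast<int>(it->length(0)));
    }
    return spans;
}
```

Since [^"]* can never cross a quote character, each match stops at the nearest closing quote without needing a lazy quantifier.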

C++11 regex replace

I have an XML string that I wish to write to a log. This XML contains some sensitive data that I'd like to mask before sending it to the log file. I am currently using std::regex to do this:
std::regex reg("<SensitiveData>(\\d*)</SensitiveData>");
return std::regex_replace(xml, reg, "<SensitiveData>......</SensitiveData>");
Currently the data is replaced by exactly 6 '.' characters, but what I really want is to replace the sensitive data with the matching number of dots. That is, I'd like to get the length of the capture group and emit exactly that many dots.
Can this be done?
The C++11 regex_replace does not have the capability you are asking for: the replacement format argument must be a string. Some regular expression APIs allow the replacement to be a function that receives a match, which could perform exactly the substitution you need.
But regexps are not the only way to solve the problem, and in C++ it's not exactly hard to look for two fixed strings and replace the characters in between:
#include <cstring>  // strlen
#include <string>

const char* const PREFIX = "<SensitiveData>";
const char* const SUFFIX = "</SensitiveData>";

void replace_sensitive(std::string& xml) {
    size_t start = 0;
    while (true) {
        size_t pref, suff;
        if ((pref = xml.find(PREFIX, start)) == std::string::npos)
            break;
        if ((suff = xml.find(SUFFIX, pref + strlen(PREFIX))) == std::string::npos)
            break;
        // replace stuff between prefix and suffix with '.'
        for (size_t i = pref + strlen(PREFIX); i < suff; i++)
            xml[i] = '.';
        start = suff + strlen(SUFFIX);
    }
}
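If you would rather keep the regular expression, a length-preserving replacement can be assembled by hand by walking the matches with std::sregex_iterator. This is only a sketch: the function name maskSensitive is hypothetical, and it assumes the same digits-only element shown in the question.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `xml` in which the digits inside
// each <SensitiveData> element are replaced by the same number of dots.
std::string maskSensitive(const std::string& xml) {
    static const std::regex rx("(<SensitiveData>)(\\d*)(</SensitiveData>)");
    std::string out;
    auto last = xml.cbegin();  // end of the previously copied region
    for (std::sregex_iterator it(xml.cbegin(), xml.cend(), rx), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text between matches, unchanged
        out += m[1].str();             // opening tag
        out.append(static_cast<std::size_t>(m[2].length()), '.');  // one dot per digit
        out += m[3].str();             // closing tag
        last = m[0].second;
    }
    out.append(last, xml.cend());      // trailing text after the last match
    return out;
}
```

Unlike replace_sensitive above, this version returns a new string instead of modifying its argument in place.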

Use GNU libc regexec() to count substring

Is it possible to count how many times a substring appears in a string using regex matching with GNU libc regexec()?
No, regexec() only finds one match per call. If you want to find the next match, you have to call it again further along the string.
If you only want to search for plain substrings, you are much better off using the standard C string.h function strstr(); then you won't have to worry about escaping special regex characters.
regexec returns the matches in its fourth parameter, "pmatch".
"pmatch" is a fixed-size structure; if there are more matches, you call the function again.
I found this code with two nested loops and modified it. The original code can be found at http://www.lemoda.net/c/unix-regex/index.html:
static int match_regex (regex_t * r, const char * to_match)
{
    /* "P" is a pointer into the string which points to the end of the
       previous match. */
    const char * p = to_match;
    /* "N_matches" is the maximum number of matches allowed. */
    const int n_matches = 10;
    /* "M" contains the matches found. */
    regmatch_t m[n_matches];
    int number_of_matches = 0;
    while (1) {
        int i = 0;
        int nomatch = regexec (r, p, n_matches, m, 0);
        if (nomatch) {
            printf ("No more matches.\n");
            /* Leave the loop so the count is returned, not the error code. */
            break;
        }
        for (i = 0; i < n_matches; i++) {
            if (m[i].rm_so == -1) {
                break;
            }
            number_of_matches ++;
        }
        p += m[0].rm_eo;
    }
    return number_of_matches ;
}
Sorry for posting another answer; I do not have 50 reputation, so I cannot comment on @Oscar Raig Colon's answer.
pmatch does not hold all the matching substrings. It is used to save the offsets of subexpressions. The key is to understand what a subexpression is: it is "\(\)" in BRE and "()" in ERE. If the regular expression contains no subexpression, regexec() only returns the offsets of the first matching string and puts them in pmatch[0].
You can find an example at http://pubs.opengroup.org/onlinepubs/007908799/xsh/regcomp.html:
The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in a line that match a pattern supplied by a user. (For simplicity of the example, very little error checking is done.)
(void) regcomp (&re, pattern, 0);
/* this call to regexec() finds the first match on the line */
error = regexec (&re, &buffer[0], 1, &pm, 0);
while (error == 0) { /* while matches found */
    /* substring found between pm.rm_so and pm.rm_eo */
    /* This call to regexec() finds the next match */
    error = regexec (&re, buffer + pm.rm_eo, 1, &pm, REG_NOTBOL);
}
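Putting the two answers together, a counting loop might look like the sketch below. It is written as C++ calling the POSIX <regex.h> functions; the function name countMatches and the choice of REG_EXTENDED are assumptions for illustration. Note that rm_so and rm_eo are offsets relative to the pointer passed to that particular regexec() call, so the sketch advances a running pointer rather than always adding to the start of the buffer.

```cpp
#include <regex.h>

// Hypothetical helper: counts non-overlapping matches of `pattern` in `text`.
// Returns -1 if the pattern fails to compile.
int countMatches(const char* pattern, const char* text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }
    int count = 0;
    const char* p = text;  // start of the part not yet searched
    regmatch_t pm;         // pm holds the whole match only (nmatch == 1)
    int flags = 0;
    while (regexec(&re, p, 1, &pm, flags) == 0) {
        ++count;
        if (pm.rm_eo == pm.rm_so) {
            // Empty match: step one character forward to avoid looping forever.
            if (p[pm.rm_eo] == '\0') break;
            p += pm.rm_eo + 1;
        } else {
            p += pm.rm_eo;  // offsets are relative to p, so accumulate
        }
        flags = REG_NOTBOL;  // p no longer points at the beginning of the line
    }
    regfree(&re);
    return count;
}
```

Only pmatch[0] is requested here, so subexpressions in the pattern do not inflate the count.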