Not matching string of regular expression in C programming - regex

The requirement: if a string (held in the variable plan) does not contain the substring "TT", the pattern match should report success (the "Regular expression not having TT" message on the console).
I tried [^(TT)] and other forms of negation.
I also tried ^(?!.*TT).*$:
#include <stdio.h>
#include "regex.h"
int main()
{
    regex_t exps;
    int r1 = -1;
    int r2 = -1;
    char *pattern = "\\^(\\?\\!.*TT).*\\$";
    char *plan = "TEST QBSE US 5USD charge sample conv offer";
    r1 = regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
    if (r1 == 0)
    {
        printf("Regular expression is parsed successfully:%s \n", pattern);
    }
    else
    {
        printf("Regular expression parsing failed.\n");
    }
    r2 = regexec(&exps, plan, (size_t)0, NULL, 0);
    if (r2 == 0)
    {
        printf("Regular expression not having TT \n");
    }
    else
    {
        printf("Regular expression is not matched.\n");
    }
    regfree(&exps);
    return 0;
}
Current output:
Regular expression is parsed successfully:\^(\?\!.*TT).*\$
Regular expression is not matched.

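POSIX regular expressions, as used by regcomp/regexec, do not support lookaheads such as (?!...), and escaping ^, ? and $ with backslashes makes them match literal characters, so your pattern cannot work here. You may use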
char *pattern="^[^T]*(T($|[^T]+))*$";
See the C code demo and the regex demo.
Details
^ - start of string
[^T]* - 0 or more chars other than T
(T($|[^T]+))* - 0 or more repetitions of
T - a T char...
($|[^T]+) - ... followed with the end of string ($) or (|) any 1 or more chars other than T ([^T]+)
$ - end of string.
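To check the pattern above against a few inputs, a small helper along these lines may help. It is only a sketch: lacks_tt is a hypothetical name (not from the question), and it is written as C++ around the same POSIX regcomp/regexec calls.

```cpp
#include <regex.h>

// Hypothetical helper: returns 1 if `s` contains no "TT", 0 if it does,
// and -1 if the pattern fails to compile.
int lacks_tt(const char *s)
{
    regex_t re;
    // Same pattern as above: no T may be directly followed by another T.
    if (regcomp(&re, "^[^T]*(T($|[^T]+))*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, nullptr, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For the plan string from the question the helper reports 1 (no "TT"), while a string such as "A TT B" yields 0.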
NOTE: If you need to negate an arbitrary pattern in this way (for validation), it is easier to do the negation in code: use the plain pattern char *pattern="TT"; and treat a non-zero return value from regexec as success (if (r2 != 0)):
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main (void)
{
    regex_t exps;
    int r1 = -1;
    int r2 = -1;
    char *pattern = "TT"; // <-- The regex is simple
    char *plan = "TEST QBSE US 5USD charge sample conv offer";
    r1 = regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
    if (r1 == 0)
    {
        printf("Regular expression is parsed successfully:%s \n", pattern);
    }
    else
    {
        printf("Regular expression parsing failed.\n");
        return EXIT_FAILURE; // do not call regexec on a regex that failed to compile
    }
    r2 = regexec(&exps, plan, (size_t)0, NULL, 0);
    if (r2 != 0) // <-- Here goes the negation
    {
        printf("Regular expression not having TT \n");
    }
    else
    {
        // regexec matched, so the string contains "TT"
        printf("Regular expression is not matched.\n");
    }
    regfree(&exps);
    return 0;
}
See the C demo online.

Related

Simple text file formatter crashes under Linux, but fine in Windows

I've made a simple .acf file to .json file formatter. It runs correctly under Windows (GCC via msys2), but under Linux it segfaults every time after executing a string insert or replace.
What it does is convert the file below into a JSON-compatible format. It appends a comma after each entry, inserts the separator between each key and value, and puts braces around the whole thing.
Save as test.acf:
"AppState"
{
    "appid" "730"
    "Universe" "1"
    "name" "Counter-Strike: Global Offensive"
    "StateFlags" "4"
    "installdir" "Counter-Strike Global Offensive"
    "LastUpdated" "1462547468"
    "UpdateResult" "0"
    "SizeOnDisk" "14990577143"
    "buildid" "1110931"
    "LastOwner" "76561198013962068"
    "BytesToDownload" "8768"
    "BytesDownloaded" "8768"
    "AutoUpdateBehavior" "1"
    "AllowOtherDownloadsWhileRunning" "0"
    "UserConfig"
    {
        "Language" "english"
    }
    "MountedDepots"
    {
        "731" "205709710082221598"
        "734" "5169984513691014102"
    }
}
Minimal main code, with the crashing lines commented out with triple slashes:
#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
int main(int argc, char* argv[])
{
    std::ifstream file;
    file.open("test.acf");
    std::string data((std::istreambuf_iterator<char>(file)), (std::istreambuf_iterator<char>()));
    int indexQuote = 0;
    int index[4];
    int insertCommaNext = -1;
    std::string delims = "\"{}"; // It skips between braces and quotes only
    std::size_t found = data.find_first_of(delims);
    while (found != std::string::npos)
    {
        int inc = 1; // 0-4 depending on the quote - 0"key1" 2"value3" 4{
        char c = data.at(found);
        if (c != '"') {
            if (c == '}')
                insertCommaNext = found + 1; // Record index to insert comma after (following closing brace)
            else if (c == '{') {
                ///data.insert(index[1] + 1, ":");
                ///inc++;
            }
            indexQuote = 0;
        } else {
            if (insertCommaNext != -1) {
                ///data.insert(insertCommaNext, ",");
                ///inc++;
                insertCommaNext = -1;
            }
            index[indexQuote] = found;
            if (indexQuote == 2) { // Join 'key: value' by placing the colon
                ///data.replace(index[1] + 1, 1, ":");
            } else if (indexQuote == 4) { // Add comma after each key/value entry
                indexQuote = 0;
                ///data.insert(index[3] + 1, ",");
                ///inc++;
            }
            indexQuote++;
        }
        found = data.find_first_of(delims, found + inc);
    }
    data = "{" + data + "}";
}
If you uncomment any of the triple-slashed /// lines containing an insert/replace, it will crash.
I'm certain the code quality is not great, and there are probably better ways to achieve this. Cheers.
The problem is that indexQuote reaches 4, so index[indexQuote] = found; writes past the end of the four-element array index. The branch further down resets indexQuote to 0, but that reset has to happen before index[indexQuote] is accessed.
For reference, I debugged this by adding prints everywhere and printing all the variables until I found where it crashed.
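A minimal sketch of that reordering, assuming the four quote positions are the only thing index needs to hold. QuoteTracker and record are hypothetical names, not part of the question's code, and here the "pair complete" signal fires on the fourth quote rather than on the fifth as in the original loop:

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: stores the position of each quote and wraps the
// counter *before* the write, so the subscript always stays in 0..3.
struct QuoteTracker {
    std::array<std::size_t, 4> index{};
    std::size_t indexQuote = 0;

    // Returns true when the fourth quote of a "key" "value" pair was stored.
    bool record(std::size_t found) {
        if (indexQuote == index.size())  // reset first, then index
            indexQuote = 0;
        index[indexQuote] = found;
        ++indexQuote;
        return indexQuote == index.size();
    }
};
```

With this ordering a fifth call to record overwrites index[0] instead of touching memory behind the array.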

QRegular expression

I can't find a way to match the text before an opening curly bracket (i.e. p) using regex and Qt. My input file reads:
solvers
{
    p
    {
        solver PCG;
        preconditioner DIC;
        tolerance 1e-06;
        relTol 0.05;
    }
    q
    {
        solver PCG;
        relTol 0.03;
    }
}
and corresponding code from .cpp is :
rule.pattern = QRegularExpression("\\b(\\w+)(?=[\\s+\n]?\\{)",
QRegularExpression::MultilineOption);
Can anyone with better knowledge of Qt and regex explain how to achieve that?
EDIT #1
Thanks for the reply and comment. Two things:
I mistyped: my input file has no ">" symbol, so I removed it from the input above.
I was trying to match the "p" of the p-block and the "q" of the q-block. A more extended version of my input is now shown above.
I found that \}\s*(\w+)(?=\s*\{) matches the "q" of the q-block, but it does not work in the code.
It seems to struggle with the line break between "p" and the bracket "{".
EDIT #2: show the code
In highlighter.cpp:
#include "highlighter.h"
Highlighter::Highlighter(QTextDocument *parent)
    : QSyntaxHighlighter(parent)
{
    HighlightingRule rule;
    (...)
    varFormat.setFontWeight(QFont::Bold);
    varFormat.setForeground(Qt::darkMagenta);
    rule.pattern = QRegularExpression("^\\s+(\\w+)\\s*$", QRegularExpression::MultilineOption);
    rule.format = varFormat;
    highlightingRules.append(rule);
    (...)
}
void Highlighter::highlightBlock(const QString &text)
{
    foreach (const HighlightingRule &rule, highlightingRules) {
        QRegularExpressionMatchIterator matchIterator = rule.pattern.globalMatch(text);
        while (matchIterator.hasNext()) {
            QRegularExpressionMatch match = matchIterator.next();
            setFormat(match.capturedStart(), match.capturedLength(), rule.format);
        }
    }
    setCurrentBlockState(0);
    int startIndex = 0;
    if (previousBlockState() != 1)
        startIndex = text.indexOf(commentStartExpression);
    while (startIndex >= 0) {
        QRegularExpressionMatch match = commentEndExpression.match(text, startIndex);
        int endIndex = match.capturedStart();
        int commentLength = 0;
        if (endIndex == -1) {
            setCurrentBlockState(1);
            commentLength = text.length() - startIndex;
        } else {
            commentLength = endIndex - startIndex
                            + match.capturedLength();
        }
        setFormat(startIndex, commentLength, multiLineCommentFormat);
        startIndex = text.indexOf(commentStartExpression, startIndex + commentLength);
    }
}
Have a look at [\\s+\n]?: it matches one or zero occurrences of a whitespace or + character. But there can be more than one whitespace character between the block name and {.
Replacing (?=[\\s+\n]?\\{) with (?=\\s*\\{) will already fix the issue. But you may also use
QRegularExpression("^\\s*(\\w+)\\s*\\{", QRegularExpression::MultilineOption)
to match the
^ - start of a line
\\s* - 0+ whitespaces
(\\w+) - Group 1 (you can get it via match.captured(1)): one or more word chars
\\s* - 0+ whitespaces followed with
\\{ - a literal {.
See the regex demo.
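To try the lookahead variant without a Qt build, here is a minimal sketch that swaps in std::regex for QRegularExpression (the pattern syntax used here is the same in both). blockNames is a hypothetical helper name, and the whole file is assumed to be available as one string:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every word that is followed, after optional
// whitespace (including line breaks), by an opening brace.
std::vector<std::string> blockNames(const std::string& text) {
    static const std::regex re(R"(\b(\w+)(?=\s*\{))");
    std::vector<std::string> names;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        names.push_back((*it)[1].str());
    return names;
}
```

On the input from the question this should yield solvers, p and q. Note that the sketch sees the whole text at once, whereas QSyntaxHighlighter::highlightBlock is handed one text block (normally one line) at a time, so inside the highlighter a lookahead cannot see a { that sits on the following line.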
That is because p does not come after }, it comes after {.
You can go this way:
[\{\}]\s*(\w+)(?=\s*\{) see https://regex101.com/r/wA1vu2/3
Or this one:
(?P<tagname>[^{}\s]*)(?P<postspace>\s*)(?P<json_item>\{[^{}]*\})
?P<name> - gives the capture group a name
(?P<tagname>[^{}\s]*) - leaf name
(?P<postspace>\s*) - space between leaf name and leaf item
(?P<json_item>\{[^{}]*\}) - leaf-level item
https://regex101.com/r/wA1vu2/1/

How can I make this regex replace function efficient?

I am using this function to perform regex replace on std::string:
String regexReplace(String s, String search, String replace, String modifier, int user) {
    bool case_sensitive = true, global = false;
    String replaced_string = s;
    if (modifier == "gi" || modifier == "ig") { global = true; case_sensitive = false; }
    else if (modifier == "i") { case_sensitive = false; }
    else if (modifier == "g") { global = true; }
    try {
        std::regex re(search);
        if (user == 1) { re = createRegex(search, case_sensitive); }
        else if (!case_sensitive) { re = Regex(search, REGEX_DEFAULT | ICASE); }
        if (global) {
            replaced_string = std::regex_replace(s, re, replace, std::regex_constants::format_default);
        }
        else {
            replaced_string = std::regex_replace(s, re, replace, NON_RECURSIVE_REGEX_REPLACE);
        }
    }
    catch (std::regex_error& e) {
        printErrorLog("Invalid replace string regex: " + search);
        Exit(1);
    }
    return replaced_string;
}
typedefs and #defines used:
typedef std::regex Regex;
typedef std::string String;
#define REGEX_DEFAULT std::regex::ECMAScript
#define ICASE std::regex::icase
#define NON_RECURSIVE_REGEX_REPLACE std::regex_constants::format_first_only
But this function consumes approximately 0.3 seconds on 14x4 consecutive executions:
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+$","$1","i",0);
res=regexReplace(res,"([^\\.]*\\.\\d+?)0+(e.*)$","$1$2","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*$","$1","i",0);
res=regexReplace(res,"([^\\.]*)\\.0*(e.*)$","$1$2","i",0);
Can I make it more efficient to lessen the execution time?
Note:
The createRegex() function is not being called (user=0 by default).
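One thing that may be worth trying: regexReplace builds a new std::regex from the pattern string on every call (and a second one when the i modifier is given), and compiling a std::regex is often far more expensive than running it on a short string. A minimal sketch, assuming the four patterns are fixed; stripTrailingZeros is a hypothetical helper name, and how much of the 0.3 seconds this recovers would have to be measured:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the four replacements from the question,
// compiling each pattern only once instead of on every call.
std::string stripTrailingZeros(std::string s) {
    const auto flags = std::regex::ECMAScript | std::regex::icase | std::regex::optimize;
    // Function-local statics are constructed on the first call and reused afterwards.
    static const std::regex fracZeros(R"(([^\.]*\.\d+?)0+$)", flags);
    static const std::regex fracZerosExp(R"(([^\.]*\.\d+?)0+(e.*)$)", flags);
    static const std::regex bareDot(R"(([^\.]*)\.0*$)", flags);
    static const std::regex bareDotExp(R"(([^\.]*)\.0*(e.*)$)", flags);
    // Same behaviour as the "i" modifier in the question: replace the first match only.
    const auto first = std::regex_constants::format_first_only;
    s = std::regex_replace(s, fracZeros, "$1", first);
    s = std::regex_replace(s, fracZerosExp, "$1$2", first);
    s = std::regex_replace(s, bareDot, "$1", first);
    s = std::regex_replace(s, bareDotExp, "$1$2", first);
    return s;
}
```

Taking the String parameters of regexReplace by const reference instead of by value would avoid a few copies as well, though that is probably a smaller effect.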

regex get comma separated values between two words

I have the following query and PCRE regex, from which I want to get the table names.
FROM student s, #prefix#.sometable, subject s, marks s WHERE ...
(?<=\sfrom)\s+\K(\w*)(?=\s+where)
Desired result: student s subject s marks s
I can't figure out how to extract anything beyond the first match.
I'm trying to find and replace in the Sublime Text editor.
Try this: \s+(\w*\s)*s
// subject and result (both const char *) are assumed to be declared elsewhere
pcre *myregexp;
const char *error;
int erroroffset;
myregexp = pcre_compile("\\s+(\\w*\\s)*s", PCRE_CASELESS | PCRE_EXTENDED | PCRE_MULTILINE | PCRE_DUPNAMES | PCRE_UTF8, &error, &erroroffset, NULL);
if (myregexp) {
    int offsets[2*3]; // (max_capturing_groups+1)*3
    int offsetcount = pcre_exec(myregexp, NULL, subject, strlen(subject), 0, 0, offsets, 2*3);
    if (offsetcount > 0) {
        // pass the offsets array itself (int *), not its address
        pcre_get_substring(subject, offsets, offsetcount, 1, &result);
        // group offset = offsets[1*2];
        // group length = offsets[1*2+1] - offsets[1*2];
    } else {
        result = NULL;
    }
} else {
    // Syntax error in the regular expression at erroroffset
    result = NULL;
}
Using @bobblebubble's solution, which worked 90% of the time, I added a few more conditions to match my case. It works, but it is very aggressive and hangs the editor on large or multiple files. I can live with what I have, though. Here's the solution:
(?is)(?:\bFROM\b|\G(?!^))(?:[\s,]|#[^\s,]++)*(\b\K(?:\s*(?!WHERE|LEFT\b)\w+){4,})\b(?=.*?\bWHERE\b)

nginx module regex unable to match

Hi guys, I'm trying to build a module for nginx and need to match a substring. Here is what I'm using to try and match:
int match_chan(ngx_http_request_t *r, ngx_pool_t *temp_pool, ngx_str_t *body, ngx_str_t *channel) {
    u_char errstr[NGX_MAX_CONF_ERRSTR];
    ngx_regex_compile_t *rc;
    int captures[2];
    if ((rc = ngx_pcalloc(temp_pool, sizeof(ngx_regex_compile_t))) == NULL) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "unable to allocate memory to compile agent patterns");
        return 0;
    }
    //ngx_memzero(rc, sizeof(ngx_regex_compile_t));
    ngx_str_t pat = ngx_string("test(:|%3[Aa])([a-zA-Z0-9]+)");
    rc->pattern = pat;
    rc->pool = temp_pool;
    rc->err.len = NGX_MAX_CONF_ERRSTR;
    rc->err.data = errstr;
    if (ngx_regex_compile(rc) != NGX_OK) {
        // %V expects a pointer to ngx_str_t
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "unable to compile regex pattern %V", &rc->pattern);
        return 0;
    }
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "%V, %V", &pat, body);
    if (ngx_regex_exec(rc->regex, body, captures, 2) >= 0) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "It Matched");
        //ngx_memcpy(channel->data, body->data + captures[0], body->len);
        return 1;
    }
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "It did not match");
    return 0;
}
ngx_str_t *channel = NULL;
if (match_chan(r, temp_pool, aux, channel)) {
    //ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, " match: %c", match);
}
and the message that is passed looks like this (taken from the nginx log):
2014/07/04 13:28:49 [error] 10695#0: *38 test:([a-z0-9]+), MSG%0Atest%3Ahello%0A%0A%0Awins%00
2014/07/04 13:28:49 [error] 10695#0: *38 It did not match
I've tested the regex in a pure C app and it worked fine. I thought nginx would be similar, but I guess it has its differences.
I've looked all over Google and at other nginx modules, still with no luck. Please help me :)
Thanks
Dave
The problem is that the string you are trying to match is URL-encoded, so it doesn't match the pattern provided. There are two options:
Construct a regular expression that matches the encoded string as well ("test(:|%3[Aa])([a-zA-Z0-9]+)" will match both unescaped and escaped forms);
Unescape the string you are matching. In nginx, this is done with the ngx_unescape_uri() function.
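To see the first option in isolation, here is a minimal sketch that uses std::regex as a stand-in for nginx's regex wrapper, so it runs outside nginx. extractChannel is a hypothetical helper name, and the sample body is the one from the log above:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the channel name that follows "test:" or
// "test%3A", or an empty string when the body contains neither form.
std::string extractChannel(const std::string& body) {
    static const std::regex re("test(:|%3[Aa])([a-zA-Z0-9]+)");
    std::smatch m;
    if (std::regex_search(body, m, re))
        return m[2].str();  // group 2 is the channel name
    return std::string();
}
```

Called with MSG%0Atest%3Ahello%0A%0A%0Awins%00 it returns hello, whereas the older pattern test:([a-z0-9]+) shown in the log has no way to get past the %3A.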