nginx module regex unable to match

Hi guys, I'm trying to build a module for nginx and need to match a substring. Here is what I'm using to try to match it:
int match_chan(ngx_http_request_t *r, ngx_pool_t *temp_pool, ngx_str_t *body, ngx_str_t *channel) {
    u_char                errstr[NGX_MAX_CONF_ERRSTR];
    ngx_regex_compile_t  *rc;
    ngx_int_t             n;
    /* PCRE needs (1 + number of groups) * 3 ints; this pattern has 2 groups */
    int                   captures[(1 + 2) * 3];
    ngx_str_t             pat = ngx_string("test(:|%3[Aa])([a-zA-Z0-9]+)");

    /* ngx_pcalloc() returns zeroed memory, so no ngx_memzero() is needed */
    if ((rc = ngx_pcalloc(temp_pool, sizeof(ngx_regex_compile_t))) == NULL) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "unable to allocate memory to compile agent patterns");
        return 0;
    }
    rc->pattern = pat;
    rc->pool = temp_pool;
    rc->err.len = NGX_MAX_CONF_ERRSTR;
    rc->err.data = errstr;
    if (ngx_regex_compile(rc) != NGX_OK) {
        /* %V expects a pointer to an ngx_str_t, not the struct itself */
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "unable to compile regex pattern %V: %V", &rc->pattern, &rc->err);
        return 0;
    }
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "%V, %V", &pat, body);
    n = ngx_regex_exec(rc->regex, body, captures, (1 + 2) * 3);
    if (n >= 0) {
        /* group 2 (the channel name) spans captures[4] .. captures[5];
           channel points into body, so it is only valid while body is */
        channel->data = body->data + captures[4];
        channel->len = captures[5] - captures[4];
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "It matched: %V", channel);
        return 1;
    }
    if (n != NGX_REGEX_NO_MATCHED) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, ngx_regex_exec_n " failed: %i", n);
        return 0;
    }
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "It did not match");
    return 0;
}
ngx_str_t channel = ngx_null_string;   /* a real struct, not a NULL pointer */
if (match_chan(r, temp_pool, aux, &channel)) {
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "match: %V", &channel);
}
The message that is passed looks like this (taken from the nginx log):
2014/07/04 13:28:49 [error] 10695#0: *38 test:([a-z0-9]+), MSG%0Atest%3Ahello%0A%0A%0Awins%00
2014/07/04 13:28:49 [error] 10695#0: *38 It did not match
I've tested the regex in a pure C app and it worked fine. I thought nginx would behave similarly, but I guess it has its differences.
I've looked all over Google and at other nginx modules, still with no luck. Please help me :)
Thanks
Dave

The problem is that the string you are trying to match is URL-encoded, and due to this it doesn't match the pattern provided. There are two options:
Construct a regular expression that matches the encoded string as well ("test(:|%3[Aa])([a-zA-Z0-9]+)" matches both the unescaped and the escaped forms);
Unescape the string you are matching. In nginx, this is done with the ngx_unescape_uri() function.
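To see both options side by side outside nginx, here is a minimal plain-C sketch. It is a stand-in, not nginx code: it assumes POSIX <regex.h> instead of the ngx_regex_* API, and the hypothetical percent_decode() helper only approximates what ngx_unescape_uri() does. The encoded body needs the escape-aware pattern, while after decoding the original test:([a-z0-9]+) matches.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes %XX sequences in place, as a plain-C stand-in
 * for ngx_unescape_uri(). A "%00" decodes to a NUL byte, which ends the
 * C string at that point. */
static void percent_decode(char *s)
{
    char *out = s;

    assert(s != NULL);
    while (*s != '\0') {
        if (s[0] == '%' && isxdigit((unsigned char)s[1])
                && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: runs a POSIX extended regex over subject and copies
 * capture group `group` (1 or 2) into out. Returns 1 on a match, else 0. */
static int extract_group(const char *pattern, const char *subject,
                         size_t group, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[3];          /* whole match + up to two groups */
    int found = 0;

    if (group >= 3 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, subject, 3, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < outlen) {
            memcpy(out, subject + m[group].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

Called on "MSG%0Atest%3Ahello", the plain pattern fails, the escape-aware pattern yields "hello" as group 2, and after percent_decode() the plain pattern yields "hello" as group 1. Inside the module you would decode a copy of the body with ngx_unescape_uri() instead of a hand-written decoder.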


If statement fails with regex comparison

public list[str] deleteBlockComments(list[str] fileLines)
{
bool blockComment = false;
list[str] sourceFile = [];
for(fileLine <- fileLines)
{
fileLine = trim(fileLine);
println(fileLine);
if (/^[\t]*[\/*].*$/ := fileLine)
{
blockComment = true;
}
if (/^[\t]*[*\/].*$/ := fileLine)
{
blockComment = false;
}
println(blockComment);
if(!blockComment)
{
sourceFile = sourceFile + fileLine;
}
}
return sourceFile;
}
For some reason, I am not able to detect /* at the beginning of a string, although it seems to work fine when I run it on the command line.
Can someone tell me what I am doing wrong? In the output, the string being compared is printed directly above the comparison result (false).
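[\/*] is a character set that matches a forward slash or a star, not both one after the other. Remove the square brackets and escape the star, so that it matches a literal * instead of acting as a quantifier on the slash, and your pattern should behave as you expect.
While we're at it, let's also get rid of the superfluous square brackets around \t:
^\t*\/\*.*$
The closing check works the same way: ^\t*\*\/.*$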

Not matching a string with a regular expression in C programming

The requirement is:
For a string (the variable plan) that does not contain the substring "TT", the pattern match should succeed and print the success message ("Regular expression not having TT" in the console).
I tried [^(TT)], I tried negation, and
I tried ^(?!.*TT).*$
#include <stdio.h>
#include <regex.h>
int main()
{
regex_t exps;
int r1 =-1;
int r2= -1;
char *pattern="\\^(\\?\\!.*TT).*\\$";
char *plan="TEST QBSE US 5USD charge sample conv offer";
r1=regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
if (r1 == 0)
{
printf("Regular expression is parsed sucessfully:%s \n",pattern);
}
else
{
printf("Regular expression parsing failed.\n");
}
r2=regexec(&exps, plan, (size_t)0, NULL, 0);
if (r2 == 0)
{
printf("Regular expression not having TT \n");
}
else
{
printf("Regular expression is not matched.\n");
}
regfree(&exps);
return 0;
}
Current output:
Regular expression is parsed sucessfully:\^(\?\!.*TT).*\$
Regular expression is not matched.
You may use
char *pattern="^[^T]*(T($|[^T]+))*$";
Details
^ - start of string
[^T]* - 0 or more chars other than T
(T($|[^T]+))* - 0 or more repetitions of
T - a T char...
($|[^T]+) - ... followed by the end of string ($) or (|) one or more chars other than T ([^T]+)
$ - end of string.
NOTE: If you need to negate an arbitrary regex pattern this way (i.e. validate that a string does not match it), it is easier to do in code: use the plain pattern char *pattern="TT"; and treat a non-zero regexec return value as success (if (r2 != 0)):
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main (void)
{
regex_t exps;
int r1 =-1;
int r2= -1;
char *pattern="TT"; // <-- The regex is simple
char *plan="TEST QBSE US 5USD charge sample conv offer";
r1=regcomp(&exps, pattern, REG_EXTENDED | REG_NOSUB);
if (r1 == 0)
{
printf("Regular expression is parsed sucessfully:%s \n",pattern);
}
else
{
printf("Regular expression parsing failed.\n");
return 1; // exps is not valid, so don't call regexec()/regfree() on it
}
r2=regexec(&exps, plan, (size_t)0, NULL, 0);
if (r2 != 0) // <-- Here goes the negation
{
printf("Regular expression not having TT \n");
}
else
{
printf("Regular expression is not matched.\n");
}
regfree(&exps);
return 0;
}

regex get comma separated values between two words

I have the following query and PCRE regex, from which I want to get the table names:
FROM student s, #prefix#.sometable, subject s, marks s WHERE ...
(?<=\sfrom)\s+\K(\w*)(?=\s+where)
Desired result: student s subject s marks s
I can't figure out how to extract them from the first match.
I'm trying to do this with find and replace in the Sublime Text editor.
Try this: \s+(\w*\s)*s
const char *subject = "FROM student s, #prefix#.sometable, subject s, marks s WHERE ...";
const char *result = NULL;
pcre *myregexp;
const char *error;
int erroroffset;
myregexp = pcre_compile("\\s+(\\w*\\s)*s", PCRE_CASELESS | PCRE_EXTENDED | PCRE_MULTILINE | PCRE_DUPNAMES | PCRE_UTF8, &error, &erroroffset, NULL);
if (myregexp) {
int offsets[2*3]; // (max_capturing_groups+1)*3
int offsetcount = pcre_exec(myregexp, NULL, subject, (int)strlen(subject), 0, 0, offsets, 2*3);
if (offsetcount > 0) {
pcre_get_substring(subject, offsets, offsetcount, 1, &result); // release with pcre_free_substring()
// group offset = offsets[1*2];
// group length = offsets[1*2+1] - offsets[1*2];
} else {
result = NULL;
}
pcre_free(myregexp);
} else {
// Syntax error in the regular expression at erroroffset
result = NULL;
}
Using @bobblebubble's solution, which worked 90% of the way, I added a few more conditions to match my case. It works, but it is very aggressive and hangs the editor on large or multiple files. I can live with what I've got, though. Here's the solution:
(?is)(?:\bFROM\b|\G(?!^))(?:[\s,]|#[^\s,]++)*(\b\K(?:\s*(?!WHERE|LEFT\b)\w+){4,})\b(?=.*?\bWHERE\b)

extract domain between two words

I have some lines like this in a log file:
11-test.domain1.com Logged ...
37-user1.users.domain2.org Logged ...
48-me.server.domain3.net Logged ...
How can I extract each domain without the subdomains, i.e. the part between "-" and "Logged"?
I have the following code in C++ (Linux), but it doesn't extract correctly. A function that returns the extracted string would be great, if you have an example.
regex_t preg;
regmatch_t mtch[1];
size_t rm, nmatch;
char tempstr[1024] = "";
int start = 0;
rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
nmatch = 1;
while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
{
strncpy(tempstr, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
printf("%s\n", tempstr);
start +=mtch[0].rm_eo;
memset(tempstr, '\0', sizeof(tempstr));
}
regfree(&preg);
Thank you!
P.S. No, I cannot use Perl for this, because this part is inside a larger C program that was written by someone else.
EDIT:
I replaced the code with this:
const char *p1 = strstr(buffer, "-")+1;
const char *p2 = strstr(p1, " Logged");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len+1));
strncpy(res, p1, len);
res[len] = '\0';
This extracts the whole domain, including subdomains, very well.
How can I extract just domain.com or domain.net from abc.def.domain.com?
Is strtok a good option, and how can I find the last dot?
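#include <algorithm>
#include <iostream>
#include <vector>
#include <string>
#include <boost/regex.hpp>
int main()
{
// \S+ stops at the blank before "Logged", so the capture has no trailing space
boost::regex re(".+-(?<domain>\\S+)\\s+Logged");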
std::string examples[] =
{
"11-test.domain1.com Logged ...",
"37-user1.users.domain2.org Logged ..."
};
std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
{
boost::smatch match;
if (boost::regex_search(s, match, re))
{
std::cout << match["domain"] << std::endl;
}
});
}
http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5
Something like this, using boost::regex. I don't know about PCRE.
Is the log in a standard format?
It appears so. Is there a split function?
Edit:
Here is the logic:
Iterate through each line to be parsed.
Locate the index of the first string, "-".
Next, locate the index of the second string, "Logged"; the text between the two is the full domain.
Split the full domain on "." into the structure of your choice (I used an array).
With the array broken apart, reassemble (concatenate) the last two elements to get only the domain.
NOTE: Written in C#.
Main method, which defines the first and second values:
// Requires: using System.Collections.Generic; using System.Diagnostics;
static void Main(string[] args)
{
    string firstValue = "-";
    string secondValue = "Logged";
    List<string> domains = new List<string> { "11-test.domain1.com Logged", "37-user1.users.domain2.org Logged", "48-me.server.domain3.net Logged" };
    foreach (string dns in domains)
    {
        Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
    }
}
Method to parse the string:
public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
{
    string domain = string.Empty;
    if (string.IsNullOrEmpty(str))
    {
        //throw an exception, return gracefully, whatever you determine
    }
    else
    {
        //This can all be done in one line, but I broke it apart so it can be better understood.
        //Returns the first occurrence.
        //int start = str.IndexOf(firstStringToFind) + firstStringToFind.Length;
        //int end = str.IndexOf(secondStringToFind);
        //domain = str.Substring(start, end - start).Trim();
        //i.e. Definitely not quite as legible, but doesn't create objects unnecessarily.
        //Trim() drops the space that sits before "Logged".
        domain = str.Substring(str.IndexOf(firstStringToFind) + firstStringToFind.Length, str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + firstStringToFind.Length)).Trim();
        string[] dArray = domain.Split('.');
        if (dArray.Length > 2)
        {
            domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
        }
    }
    return domain;
}
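Since the asker's program is written in C, the same "keep the last two labels" logic can be done with strrchr(), which finds the last dot directly; strtok() isn't needed and would modify the buffer. This is a minimal sketch with hypothetical helper names. Like the C# version, it naively assumes the registered domain is exactly the last two labels, so it would return co.uk for www.example.co.uk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the last two dot-separated
 * labels of host, e.g. "domain.com" inside "abc.def.domain.com". */
static const char *last_two_labels(const char *host)
{
    const char *last, *p;

    assert(host != NULL);
    last = strrchr(host, '.');
    if (last == NULL)
        return host;                  /* no dot at all: return as-is */
    for (p = last; p > host; p--) {
        if (p[-1] == '.')
            return p;                 /* start of the second-to-last label */
    }
    return host;                      /* only one dot: already "name.tld" */
}

/* Hypothetical helper: copies the registered domain from a line such as
 * "37-user1.users.domain2.org Logged ..." into out. Returns 1 on success. */
static int domain_from_log_line(const char *line, char *out, size_t outlen)
{
    const char *start, *end, *dom;
    char host[256];
    size_t len;

    start = strchr(line, '-');
    if (start == NULL)
        return 0;
    start++;                          /* skip the '-' */
    end = strstr(start, " Logged");
    if (end == NULL)
        return 0;
    len = (size_t)(end - start);
    if (len >= sizeof host)
        return 0;
    memcpy(host, start, len);
    host[len] = '\0';
    dom = last_two_labels(host);
    if (strlen(dom) >= outlen)
        return 0;
    strcpy(out, dom);
    return 1;
}
```

last_two_labels() returns a pointer into the string it is given, so no allocation is needed; domain_from_log_line() copies the result out because its host buffer is local.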

Regex to replace slashes inside JSON

I have some JSON I am parsing that looks like this:
{
"dhkplhfnhceodhffomolpfigojocbpcb": {
"external_crx": "C:\Program Files\Babylon\Babylon-Pro\Utils\BabylonChrome.crx",
"external_version": "1.1"
}
}
Unfortunately, JSON.NET gives me an error because of the single backslashes. Is there a way to allow single backslashes? If not, what regex can I use to double the backslashes in the file path safely, without messing up other entries that might already have correct double backslashes?
Update: The error (using JsonTextReader) is "Bad JSON escape sequence: \P. Line 4, position 25." It turns out there is more to this issue than meets the eye, because the backslash is there to support hex and octal values (http://json.codeplex.com/discussions/244265). How will I know when I'm looking at a hex/octal escape and not just a file path backslash that someone forgot to double? Here is the relevant excerpt of the reader's escape handling:
case 'u':
// ...
case 'x':
hexValues = new char[2];
for (int i = 0; i < hexValues.Length; i++)
{
if ((currentChar = MoveNext()) != '\0' || !_end)
hexValues[i] = currentChar;
else
throw CreateJsonReaderException("Unexpected end while parsing unicode character. Line {0}, position {1}.", _currentLineNumber, _currentLinePosition);
}
hexChar = Convert.ToChar(int.Parse(new string(hexValues), NumberStyles.HexNumber, NumberFormatInfo.InvariantInfo));
_buffer.Append(hexChar);
break;
default:
var octValues = new char[3];
var octLength = 0;
for (int i = 0; i < octValues.Length; i++)
{
var octalChar = i==0 ? currentChar : PeekNext();
if ((octalChar > 1 || !_end) && octalChar>='0' && octalChar<'8')
{
octValues[i] = (char)octalChar;
if(i!=0) MoveNext();
octLength++;
}
else
{
break;
}
}
if (octLength>0)
{
hexChar = Convert.ToChar(Convert.ToInt32(new string(octValues, 0, octLength), 8));
_buffer.Append(hexChar);
break;
}
throw CreateJsonReaderException("Bad JSON escape sequence: {0}. Line {1}, position {2}.", @"\" + currentChar, _currentLineNumber, _currentLinePosition);
}
}
else
{
throw CreateJsonReaderException("Unterminated string. Expected delimiter: {0}. Line {1}, position {2}.", quote, _currentLineNumber, _currentLinePosition);
}
break;
To replace single backslashes with double backslashes, but leave existing doubles alone, search for
(?<!\\)\\(?!\\)
and replace that with
\\\\
For C#, RegexBuddy creates this code snippet:
resultString = Regex.Replace(subjectString,
@"(?<!\\) # lookbehind: Check that previous character isn't a \
\\ # match a \
(?!\\) # lookahead: Check that the following character isn't a \",
@"\\", RegexOptions.IgnorePatternWhitespace);
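For comparison, the same replacement can be done without lookarounds, which POSIX regexes in C do not have. This sketch scans the string and doubles each backslash that has no backslash on either side, which is what (?<!\\)\\(?!\\) replaced by \\\\ does. The function name is hypothetical. Like the regex, it cannot tell a forgotten path separator from a deliberate escape such as \" or \n, so it would double those too.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: doubles every backslash that has no backslash
 * neighbour, mirroring the lookaround regex above (existing double
 * backslashes are left alone). Returns a malloc'd string; caller frees. */
static char *double_lone_backslashes(const char *in)
{
    size_t n, i, j = 0;
    char *out;

    assert(in != NULL);
    n = strlen(in);
    out = malloc(2 * n + 1);          /* worst case: every char is doubled */
    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        int prev = i > 0 && in[i - 1] == '\\';
        int next = i + 1 < n && in[i + 1] == '\\';
        out[j++] = in[i];
        if (in[i] == '\\' && !prev && !next)
            out[j++] = '\\';          /* lone backslash: emit a second one */
    }
    out[j] = '\0';
    return out;
}
```

Given C:\Program Files\Babylon it produces C:\\Program Files\\Babylon, and a path that already uses double backslashes comes back unchanged.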
What is the error?
What does your DeserializeObject call look like?
If you use something like:
data = JsonConvert.DeserializeObject<Dictionary<Object, Object>>(jsonText);
You shouldn't have any problems.