Regex - Match Host and User-Agent regardless of which is read first

I'm trying to parse the values of two keywords, Host and User-Agent, regardless of which one appears first.
In my case there are two scenarios: Host and User-Agent are always next to each other, but sometimes User-Agent comes first and sometimes Host comes first, as shown below.
How can I parse them without duplicating the regex as (Regex1|Regex2)? I tried a lookbehind, but it failed.
Sample 1:
Word1\r\nHost:10.1.1.2\r\nUser-Agent: Microsoft Office/16.0 (Windows NT)\r\nWord2
Sample 2:
Word1\r\nUser-Agent: Microsoft Office/16.0 (Windows NT)\r\nHost:10.1.1.2\r\nWord2
Regex:
([^H]*(?:H(?!ost:)[^H]*)*Host:(?<IP>[^\\]*)\\)([^U]*(?:U(?!ser-Agent:)[^U]*)*User-Agent:(?<UserAgent>[^\\]*)\\)
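One possible way to avoid the (Regex1|Regex2) duplication, sketched here in Python rather than taken from the original post, is to anchor two independent lookaheads at the start of the string so that each one finds its own header. The names EITHER_ORDER and parse_headers are hypothetical, the named groups use Python's (?P<name>...) spelling instead of (?<name>...), and the sketch assumes the \r\n separators are literal backslash sequences, as the [^\\]* in the question's regex suggests. It does not check that the two headers are adjacent.

```python
import re

# Samples from the question; raw strings keep "\r\n" as literal backslash sequences.
SAMPLE_1 = r"Word1\r\nHost:10.1.1.2\r\nUser-Agent: Microsoft Office/16.0 (Windows NT)\r\nWord2"
SAMPLE_2 = r"Word1\r\nUser-Agent: Microsoft Office/16.0 (Windows NT)\r\nHost:10.1.1.2\r\nWord2"

# Two lookaheads anchored at the start: each scans forward for its own header,
# so the order in which the headers appear does not matter.
EITHER_ORDER = re.compile(
    r"^(?=.*?Host:\s*(?P<IP>[^\\]*))"
    r"(?=.*?User-Agent:\s*(?P<UserAgent>[^\\]*))"
)


def parse_headers(text):
    """Return (ip, user_agent) if both headers are present, else None."""
    m = EITHER_ORDER.search(text)
    return (m.group("IP"), m.group("UserAgent")) if m else None


print(parse_headers(SAMPLE_1))
print(parse_headers(SAMPLE_2))
```

Both samples should yield the same pair of values, because each capture group lives inside its own lookahead and is filled independently of the other.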

Related

Improve exim regex to catch everything but specified addresses

I'm using this regex to catch any incoming e-mails, excluding mails from specific people.
^(.(?!(zulgrib#exemple.com|zulgrib#example.org)).)*$/i
This regex correctly lets through these scenarios:
Zulgrib at example.com <Zulgrib#example.com>
<Zulgrib#example.com>
<Zulgrib#example.com> In behalf of Robot
The regex correctly catches these kinds of headers:
Associate#example.org
Your Associate Associate#example.com
If an excluded e-mail address is alone, the regex will catch it; I would like to prevent that. Example:
zulgrib#exemple.org
What should be modified to make this work, and why is my current method not correct?
If I understand the documentation, . matches any character and the empty string is not a character, but using * is not working.
First, some issues in your current regex:
exemple has a different spelling than example
Literal dots need to be escaped. So \.com instead of .com.
There are two dots (.) in the outermost group, which means you only capture text with an even number of characters, and don't exclude the case where the email addresses start at the beginning of the string. The first dot should not be there.
To make an exception for when the email address is the only thing in the input, I fear you'll have to specify that as a separate alternative in which (unfortunately) you'll have to repeat those email addresses:
^(?:zulgrib#example\.com|zulgrib#example\.org)$|^(?!(?:.*(?:zulgrib#example\.com|zulgrib#example\.org))).*$
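As a rough check of the pattern above, here is a small Python sketch run against the headers from the question. Python's re module and the helper name is_caught are assumptions made for illustration (exim uses its own PCRE-based matching), and the trailing /i is expressed as re.IGNORECASE.

```python
import re

# The two excluded addresses, written once and reused in both alternatives.
EXCLUDED = r"(?:zulgrib#example\.com|zulgrib#example\.org)"

# First alternative: the address is the only thing in the input.
# Second alternative: the address appears nowhere in the input.
PATTERN = re.compile(rf"^{EXCLUDED}$|^(?!.*{EXCLUDED}).*$", re.IGNORECASE)


def is_caught(header):
    """True when the pattern matches the header value."""
    return PATTERN.search(header) is not None


for header in [
    "Zulgrib at example.com <Zulgrib#example.com>",
    "<Zulgrib#example.com>",
    "Associate#example.org",
    "Your Associate Associate#example.com",
    "zulgrib#example.org",
]:
    print(header, "->", is_caught(header))
```

Building the pattern from one EXCLUDED string keeps the repeated address list in a single place, even though the final regex still contains it twice.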

Regex pattern to validate the host section of a URL

I am trying to validate the host section of a URL (not the entire URL),
so in the case of http://www.example.com/some/path, all I want to validate is 'www.example.com'.
I have the following regex, ^((?:&#|[[:alnum:]]|[\-_])(?:&#|[[:alnum:]]|[\-\._~\?#\[\]#!$&'\(\)\*\+,;=])*(?::[0-9]{2,})?)$, and it works well in all cases (including http://localhost and so on).
Looking at https://mathiasbynens.be/demo/url-regex, this all works fine except for 'sites' like http://उदाहरण.परीक्षा and http://⌘.ws (are those actually allowed?).
If the given host names are possible, what regex could I use, over and above [[:alnum:]], to validate host names like उदाहरण.परीक्षा and उदाहरण.परीक्षा:80?

Filter by regex example

Could anyone provide an example of a regex filter for the Google Chrome Developer toolbar?
I especially need exclusion. I've tried many regexes, but somehow they don't seem to work.
It turned out that Google Chrome actually didn't support this until early 2015, see Google Code issue. With newer versions it works great, for example excluding everything that contains banners:
/^(?!.*?banners)/
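To illustrate how this negative lookahead behaves, here is a small sketch in Python. DevTools evaluates the filter with JavaScript's regex engine, but the lookahead semantics are the same for this pattern; the URLs below are made-up examples.

```python
import re

# Same pattern as the DevTools filter, without the surrounding slashes.
EXCLUDE_BANNERS = re.compile(r"^(?!.*?banners)")

# Hypothetical request URLs, only for demonstration.
urls = [
    "https://example.com/img/banners/top.png",
    "https://example.com/js/app.js",
]

# The lookahead fails at position 0 whenever "banners" occurs anywhere later,
# so only entries without that substring are kept.
kept = [u for u in urls if EXCLUDE_BANNERS.search(u)]
print(kept)
```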
It's possible -- at least in Chrome 58 Dev. You just need to wrap your regex with forward-slashes: /my-regex-string/
For example, this is one I'm currently using: /^(.(?!fallback font))+$/
It successfully filters out any messages that contain the substring "fallback font".
EDIT
Something else to note is that if you want to use the ^ (caret) symbol to search from the start of the log message, you have to first match the "fileName.js?someUrlParam:lineNumber " part of the string.
That is to say, the regex is matching against not just the log message, but also the stack-entry for the line which made the log.
So this is the regex I use to match all log messages where the actual message starts with "Dog":
/^.+?:[0-9]+ Dog/
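A quick way to see why the prefix has to be matched first is to try the pattern on strings shaped like the ones described above. This is a Python sketch with invented console lines; the exact text DevTools matches against may differ between Chrome versions.

```python
import re

# Skip the "fileName.js?someUrlParam:lineNumber " prefix, then require "Dog".
STARTS_WITH_DOG = re.compile(r"^.+?:[0-9]+ Dog")

# Hypothetical lines in the "source:line message" shape described above.
lines = [
    "app.js?v=3:42 Dog barked",
    "app.js?v=3:57 Cat chased Dog",
    "Dog barked",
]

matching = [line for line in lines if STARTS_WITH_DOG.search(line)]
print(matching)
```

The third line has no source prefix, so the pattern does not match it even though the message itself starts with "Dog".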
The negative or exclusion case is much easier to write and think about when using DevTools' native syntax. To provide the exclusion logic you need, simply use this:
-/app/ -/some\sother\sregex/
The "-" prior to the regex makes the result negative.
Your expression should not contain the forward slashes and /s; these are not needed for crafting a filter.
I believe your regex should finally read:
!(appl)
Depending on what exactly you want to filter.
The regex above will filter out all lines without the string "appl" in them.
edit: apparently exclusion is not supported?

Regex to separate request URIs into 'real' page requests and image/js/css requests

I want to filter out all entries in my access logs whose request URI asks for static resources like images, JS, CSS, or XML. The goal is a minified access log which contains only 'real' page requests.
I'm trying to find a regex that fits my purpose, and this is where I would like to ask for your help.
Here are some examples:
Strings I want to match:
r:GET / HTTP/1.1
r:GET /m HTTP/1.1
r:GET /autor/William-Mills/Deep-Hunting-Shallow-Fishing-8163700-t/ HTTP/1.1
r:GET /?wicket:interface=:1::IActivePageBehaviorListener:15:2&wicket:ignoreIfNotActive=true HTTP/1.1
Strings I do not want to match (one could say everything that contains something like .xxx before the ?querystring):
r:GET /js/global.js?v=17 HTTP/1.1
r:GET /js/global.js HTTP/1.1
r:GET /img/icon_action_arrow.png HTTP/1.1
r:GET /img/icon_action_arrow.PNG HTTP/1.1
I ended up with a regex like this:
"r:[A-Z]+ \\S*(?!(?i)\\.jpg|\\.png|\\.gif|\\.js|\\.css|\\.xml)(\\?| )"
(With a real whitespace at the end)
But this matches exactly the opposite: it matches everything I do not want to match and nothing I want to keep.
Thanks in advance for any hints, help, or advice!
How about something like this?
r:[A-Z]+\s([^\s\.]+)\s
It's a slight twist on yours, allowing a space before the path and then paths that do not contain full stops followed by another space. It really depends on whether you can simply ignore paths with a full stop or need to be more definitive.
Edit :
r:[A-Z]+\s((?:[^\s\.]+)|(?:[^\s\.]+\?.*))\s
Does that fit the bill? Tried to make it easier by splitting it into two. First part matches anything without a full stop, second part would match anything in the querystring (including your full stop) but makes sure there are no full stops before the question mark.
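The two-part pattern can be tried against the sample lines from the question. This is a Python sketch for illustration only; the original regex in the question is written as a Java string, and the names PAGE_REQUEST, wanted and unwanted are hypothetical.

```python
import re

# The edited pattern: a path without full stops, optionally followed by a query string.
PAGE_REQUEST = re.compile(r"r:[A-Z]+\s((?:[^\s\.]+)|(?:[^\s\.]+\?.*))\s")

wanted = [
    "r:GET / HTTP/1.1",
    "r:GET /m HTTP/1.1",
    "r:GET /autor/William-Mills/Deep-Hunting-Shallow-Fishing-8163700-t/ HTTP/1.1",
    "r:GET /?wicket:interface=:1::IActivePageBehaviorListener:15:2&wicket:ignoreIfNotActive=true HTTP/1.1",
]
unwanted = [
    "r:GET /js/global.js?v=17 HTTP/1.1",
    "r:GET /js/global.js HTTP/1.1",
    "r:GET /img/icon_action_arrow.png HTTP/1.1",
    "r:GET /img/icon_action_arrow.PNG HTTP/1.1",
]

for line in wanted + unwanted:
    print(bool(PAGE_REQUEST.search(line)), line)
```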
I believe this is what you need:
r:(?!GET [^?]+\.((?i)js|css|xml|jpg|gif|png))
This will produce a match whenever none of the six file endings are found in the path, though they are allowed in arguments, after a ?. Should you want to disallow these file endings only at the end of the path, you can use this version instead:
r:(?!GET [^?]+\.((?i)js|css|xml|jpg|gif|png)(\?| ))
The difference is that the first version would not match the following line, but the second version would:
r:GET /img/icon_action_arrow.png.tar.gz HTTP/1.1
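The difference between the two versions can be checked with a short Python sketch. One adaptation is an assumption of this sketch: recent Python versions reject an inline (?i) in the middle of a pattern, so the extension list uses the scoped form (?i:...) instead, which should behave the same way for these alternatives.

```python
import re

# Case-insensitive list of static file endings (scoped flag instead of a bare (?i)).
EXT = r"(?i:js|css|xml|jpg|gif|png)"

# Version 1: reject when an ending appears anywhere in the path.
ANYWHERE_IN_PATH = re.compile(rf"r:(?!GET [^?]+\.{EXT})")
# Version 2: reject only when the ending is followed by "?" or a space.
END_OF_PATH = re.compile(rf"r:(?!GET [^?]+\.{EXT}(\?| ))")

for line in [
    "r:GET /m HTTP/1.1",
    "r:GET /js/global.js?v=17 HTTP/1.1",
    "r:GET /img/icon_action_arrow.PNG HTTP/1.1",
    "r:GET /img/icon_action_arrow.png.tar.gz HTTP/1.1",
]:
    print(bool(ANYWHERE_IN_PATH.search(line)), bool(END_OF_PATH.search(line)), line)
```

Only the last line is treated differently: the first pattern rejects it because of the embedded .png, while the second accepts it because .png is not at the end of the path.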

Regex to see if ip starts with 156.21.x.x

I'm writing a regex for Google Analytics and I need to block any IP from 156.21.x.x. I don't care about the last two octets, just the first two. I would like to keep the regex to as few characters as possible, as Google only allows 255 characters and my regex is already pretty large.
I'm not sure what flavor of regex or what language you're using, but this will work on most regex engines:
156\.21\.\d{1,3}\.\d{1,3}
Of course, this will match invalid IPs like 156.21.777.888, but if the list you're parsing doesn't contain invalid IP addresses, you should be OK. Or:
156\.21(\.\d{1,3}){2}
If you are running short on space, this would work, though you would match non-IP addresses as well. If you can assume Google will give you valid IP addresses, this is your shortest option:
^156\.21\.
Matches things like: 156.21.1.1, 156.21.1000.1000, 156.21.ABC
But does not match: http://156.21.1.1, ehlo 156.21.1000.1000
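The trade-off between the anchored prefix and the longer two-octet pattern can be seen in a short Python sketch. Python's re module stands in for whatever engine Google Analytics uses, which is an assumption; the names PREFIX_ONLY and TWO_OCTETS are only for illustration.

```python
import re

# Shortest option: anchored prefix, no check on the last two octets.
PREFIX_ONLY = re.compile(r"^156\.21\.")
# Longer option: unanchored, requires two more groups of 1-3 digits.
TWO_OCTETS = re.compile(r"156\.21(\.\d{1,3}){2}")

for value in [
    "156.21.1.1",
    "156.21.777.888",
    "156.21.1000.1000",
    "156.21.ABC",
    "http://156.21.1.1",
]:
    print(bool(PREFIX_ONLY.search(value)), bool(TWO_OCTETS.search(value)), value)
```

Neither pattern validates octet ranges, so both accept 156.21.777.888. The anchored prefix is shorter but accepts any text after the second dot, while the unanchored pattern also matches when the address appears inside a longer string.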
The following regex would match (almost) valid IPv4 addresses that start with 156.21:
(156\.21(?:\.[\d]{1,3}){2})