Why does this regex not match in Python? - regex

I have the regular expression
(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\\r\\n\\r\\n)(\S+)?
which I'm trying to match against HTTP GET and HTTP POST requests. I'm using the helpful regex101.com website to format my regular expression, and according to it, the regular expression should match both the formats I'm seeking.
Here's my regular expression on regex101.com.
However, when I put it into Python itself and call re.split() on my input strings, it doesn't split the POST request; it only splits the GET request. I thought it had something to do with the way regex101 parses \r\n (CRLF) versus how Python does it, so I double-checked and made sure that in Python I actually type \r\n inside the regex, and not \\r\\n as I did in regex101. Yet it still doesn't work.
How can I get the regular expression to work inside Python?

You're just missing an additional \r\n after HTTP/1.0. This will work:
'POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000'
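To see why the extra CRLF matters, here is a minimal Python sketch. It assumes the original POST request had only a single \r\n after the request line (the question does not show it). By default Python's "." does not match "\n", so ".*" cannot run across header lines. With the answer's request the pattern does match, but note that the optional last group then captures "Content-Length:" rather than the body. The re.DOTALL variant at the end is an alternative that is not part of the answer.

```python
import re

# The question's pattern as a raw string, so the regex engine (not the
# Python string parser) interprets \r, \n, \w, \d and \S.
PATTERN = r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\r\n\r\n)(\S+)?"

# Assumed original request: only one CRLF after the request line.
original = "POST /api/gettime HTTP/1.0\r\nContent-Length: 13\r\n\r\n100000+200000"
# The request from the answer, with the extra CRLF after HTTP/1.0.
fixed = "POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000"

# "." does not match "\n" by default, so ".*" cannot cross the
# Content-Length line and the original request does not match at all.
no_match = re.match(PATTERN, original)
print(no_match)  # None

# With the extra CRLF, ".*" matches the empty string, "\r\n\r\n" is the
# blank line right after the request line, and (\S+)? grabs the next token.
fixed_groups = re.match(PATTERN, fixed).groups()
print(fixed_groups)  # ('POST', '/api/gettime', 'HTTP/1.0', 'Content-Length:')

# Alternative (not from the answer): re.DOTALL lets ".*" span lines, so the
# greedy match backtracks to the last blank line and captures the body.
dotall_groups = re.match(PATTERN, original, re.DOTALL).groups()
print(dotall_groups)  # ('POST', '/api/gettime', 'HTTP/1.0', '100000+200000')
```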

Related

How to correlate/use regular expression for empty or new line right boundary in JMeter

Example:
In the response headers I see
state=a5d73a14-a728-4f0f-afae-de5fda55d002
Here I can use state= as the left boundary (LB), and there is no right boundary.
So I tried using a Regular Expression Extractor with:
state=(.+)
and replaced the hard-coded value in the next request with ${state}.
The requests are failing because ${state} itself is sent in the request URL, which means the Regular Expression Extractor is not working. I know I have placed it correctly, but I still don't know what I am doing wrong here. Any suggestion would really help!
The extractor should search the response in the main sample and sub-samples. Before, I was searching only the main sample's response, so it failed.
The same regular expression worked, both when there is no right boundary and when the right boundary is a new line:
state=(.+)
Actually, your regex should work; maybe it fails due to a line break or something similar. To be on the safe side, try using state=(.*) as the regular expression, which is less restrictive.
Also, the value of your "state" looks like a GUID, so you can try matching a GUID-like structure instead:
state=([A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12})
Last but not least, by default the Regular Expression Extractor looks into the response body, so you need to change "Field to check" to Response Headers.
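JMeter uses its own Perl 5-style regex engine rather than Python's, but a short Python sketch can illustrate one plausible "line break" problem. HTTP header lines end in \r\n, and "." stops at "\n" but still matches "\r", so state=(.+) can capture a trailing carriage return, while the GUID-shaped pattern cannot. The header names and layout below are hypothetical.

```python
import re

# Hypothetical raw response headers; HTTP header lines end in CRLF.
headers = (
    "HTTP/1.1 302 Found\r\n"
    "Location: /callback?state=a5d73a14-a728-4f0f-afae-de5fda55d002\r\n"
    "Content-Length: 0\r\n"
)

# ".+" stops at "\n" but still includes the "\r" that precedes it.
loose = re.search(r"state=(.+)", headers).group(1)

# GUID shape: 8-4-4-4-12 hexadecimal digits, so "\r" is never captured.
GUID = r"state=([A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12})"
strict = re.search(GUID, headers).group(1)

print(repr(loose))   # 'a5d73a14-a728-4f0f-afae-de5fda55d002\r'
print(repr(strict))  # 'a5d73a14-a728-4f0f-afae-de5fda55d002'
```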
More information:
JMeter Regular Expressions
Perl 5 Regex Cheat sheet
Using Regular Expressions to Extract Tokens and Session IDs to Variables

Regular Expression Syntax for URL extraction

I need to extract the string to the right of the equal sign in an Apache JMeter project. I am not familiar with regular expression syntax at all, but I think this would be the easiest way to extract it. The URL is: https://myserver.com:portnum/im;jsessionid=48E10C95151BFB84D795C90FBC31E8E6
I only need the string to the right of the equal sign, not including the equal sign. My question is, what regular expression should I use to extract the string? Thanks!
It is as simple as =(\w+)
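A quick check of the pattern with Python's re module. JMeter's extractor uses its own Perl 5-style engine, but a pattern this simple should behave the same way. Group 1 holds the value without the "=" sign.

```python
import re

url = "https://myserver.com:portnum/im;jsessionid=48E10C95151BFB84D795C90FBC31E8E6"

# "=" followed by one or more word characters; group 1 excludes the "=".
session_id = re.search(r"=(\w+)", url).group(1)
print(session_id)  # 48E10C95151BFB84D795C90FBC31E8E6
```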
References:
JMeter - Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet
Also be aware that there are "smarter" ways of handling JSESSIONID:
If it comes as a cookie, you can get its value via the HTTP Cookie Manager
If it is passed as a URL parameter, you could use the HTTP URL Re-writing Modifier

Regular expression not working in google analytics

I'm trying to build a regular expression to capture URLs which contain a certain parameter value, 7136D38A-AA70-434E-A705-0F5C6D072A3B.
I've set up a simple regex to capture a URL with anything before and anything after this parameter (i.e., just all URLs which contain this parameter). I've tested it on an online checker, http://scriptular.com/, and it seems to work fine. However, Google Analytics says it is invalid when I try to use it. Any idea what is causing this?
URLs will be in the format
/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd
so I just want to capture URLs that contain that specific "z" parameter.
regex
^.+(?=7136D38A-AA70-434E-A705-0F5C6D072A3B).+$
You just need
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B.+$
Or (a bit safer):
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&.+$)
And I think you can even use
=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)
Your regex is invalid because GA regex flavor does not support look-arounds (and you have a (?=...) positive look-ahead in yours).
Here is a good GA regex cheatsheet.
To match /home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd you can use:
\S*7136D38A-AA70-434E-A705-0F5C6D072A3B\S*
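The patterns can be compared with a small Python sketch. Python's re module supports lookahead, which is probably why an online checker accepts the question's pattern even though, per the answer, GA does not; so this only shows what each pattern matches, not whether GA will accept it. The second URL, with the parameter at the very end, is hypothetical and shows why the "safer" variant exists: the first answer pattern requires at least one character after the ID.

```python
import re

GUID = "7136D38A-AA70-434E-A705-0F5C6D072A3B"
url = f"/home/index?x=23908123890123&y=kjdfhjhsfd&z={GUID}&p=kljdaslkjasd"
# Hypothetical URL where the parameter is the last one.
at_end = f"/home/index?x=23908123890123&z={GUID}"

patterns = {
    "lookahead (question)": rf"^.+(?={GUID}).+$",
    "answer 1": rf"^.+={GUID}.+$",
    "answer 1, safer": rf"^.+={GUID}($|&.+$)",
    "answer 1, shortest": rf"={GUID}($|&)",
    "answer 2": rf"\S*{GUID}\S*",
}

# For each pattern: (matches url, matches at_end)
results = {
    name: (bool(re.search(p, url)), bool(re.search(p, at_end)))
    for name, p in patterns.items()
}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```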

Regex for string plus number range

I'm using a regular expression to extract data from our reporting tool.
Here is the range:
cid=300000[195-429]
I tried ?cid=[300000195]-[300000429]
But it is not working.
cid is part of the string. So, for example, it should return ?cid=300000197 and ?cid=300000300, and everything in between.
What would be the correct regex syntax?
Try this:
cid=300000(19[5-9]|[2-3]\d{2}|4[0-2]\d)
Paste the regex into an online regex tester and give it a try.
Google Analytics' regular expression engine is rather weak compared to those used by Perl, PHP, JavaScript, and so on, so this took some tweaking. But as long as you're sure your URLs will be following the expected format, this should get the job done.
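One way to gain confidence in a numeric-range regex is to test it exhaustively. This Python sketch (the helper in_range is hypothetical) checks every cid from 300000000 to 300000999 and keeps the ones the pattern accepts. Note that the prefix has to be 300000, a 3 followed by five zeros; a prefix with one zero fewer never matches these IDs.

```python
import re

# Six-digit prefix "300000", then 195-199, 200-399 or 400-429.
CID = re.compile(r"cid=300000(19[5-9]|[2-3]\d{2}|4[0-2]\d)")

def in_range(n: int) -> bool:
    # fullmatch rejects trailing digits (e.g. cid=3000001950);
    # an unanchored re.search would still accept such longer numbers.
    return CID.fullmatch(f"cid={n}") is not None

matched = [n for n in range(300_000_000, 300_001_000) if in_range(n)]
print(matched[0], matched[-1], len(matched))  # 300000195 300000429 235
```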

Parsing words separated with hyphen

I need to parse the string below using regular expressions. I came up with two variants, both of which seem a bit ugly to me. Which one is better suited for the job?
The main task is to parse the URL in Scrapy.
Sample expression -
/article/2014/01/16/hcl-tech-earnings-shares-idINDEEA0F02920140116
Regex -
/article/(\d+)/(\d+)/(\d+)/([0-9A-Za-z-]+)
/article/(\d+)/(\d+)/(\d+)/\w+(-\w+)*
And yes, I need to capture the whole ending expression, so the 1st regex handles that perfectly. I verified both regexes using https://pythex.org/.
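A Python sketch of the two variants run against the sample path shows why only the first captures the whole ending: a repeated capturing group such as (-\w+)* keeps only its last repetition. The third pattern, which wraps the slug in a single group with a non-capturing (?:...) repeat, is a suggested alternative and not from the question.

```python
import re

path = "/article/2014/01/16/hcl-tech-earnings-shares-idINDEEA0F02920140116"

# First regex: a character class that includes the hyphen.
first = re.match(r"/article/(\d+)/(\d+)/(\d+)/([0-9A-Za-z-]+)", path).groups()

# Second regex: the repeated group (-\w+)* keeps only its LAST repetition.
second = re.match(r"/article/(\d+)/(\d+)/(\d+)/\w+(-\w+)*", path).groups()

# Alternative: one outer group for the slug, non-capturing inner repeat.
wrapped = re.match(r"/article/(\d+)/(\d+)/(\d+)/(\w+(?:-\w+)*)", path).groups()

print(first)    # ('2014', '01', '16', 'hcl-tech-earnings-shares-idINDEEA0F02920140116')
print(second)   # ('2014', '01', '16', '-idINDEEA0F02920140116')
print(wrapped)  # ('2014', '01', '16', 'hcl-tech-earnings-shares-idINDEEA0F02920140116')
```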
Edit -
Expected Format -
/article/(yyyy)/(mm)/(dd)/(words-separated-by-hyphen)
I want to capture all the stuff separated by / after /article
Simply use:
/article/(\d+)/(\d+)/(\d+)/(.*)
The hyphens don't seem to have anything to do with what you need to extract from the URL, so there's no need to match them explicitly.
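A short Python sketch of this pattern. The second path, with a query string, is hypothetical: it illustrates the trade-off that (.*) captures everything after the date, whereas a character class such as [0-9A-Za-z-]+ stops at the first character outside it.

```python
import re

DOT_STAR = r"/article/(\d+)/(\d+)/(\d+)/(.*)"
CHAR_CLASS = r"/article/(\d+)/(\d+)/(\d+)/([0-9A-Za-z-]+)"

path = "/article/2014/01/16/hcl-tech-earnings-shares-idINDEEA0F02920140116"
# Hypothetical path with a query string appended.
with_query = "/article/2014/01/16/hcl-tech-earnings?utm_source=feed"

slug = re.match(DOT_STAR, path).group(4)
loose = re.match(DOT_STAR, with_query).group(4)      # keeps the query string
strict = re.match(CHAR_CLASS, with_query).group(4)   # stops at "?"

print(slug)    # hcl-tech-earnings-shares-idINDEEA0F02920140116
print(loose)   # hcl-tech-earnings?utm_source=feed
print(strict)  # hcl-tech-earnings
```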