Regex for string plus number range

I'm using a regular expression to extract data from our reporting tool.
Here is the range:
cid=300000[195-429]
I tried ?cid=[300000195]-[300000429]
But neither works.
cid is part of the string. For example, the regex should match ?cid=300000197 and ?cid=300000300, and every value in between.
What would be the correct regex syntax?

Try this:
cid=300000(19[5-9]|[2-3]\d{2}|4[0-2]\d)
Paste the regex into a regex tester and give it a try.
Google Analytics' regular expression engine is rather weak compared to those used by Perl, PHP, JavaScript, and so on, so this took some tweaking. But as long as you're sure your URLs will be following the expected format, this should get the job done.
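As a sanity check, here is a minimal Python sketch that runs the alternation against every three-digit suffix. It assumes the prefix is 300000, as in the IDs from the question, and it uses Python's re module rather than the Google Analytics engine, so it only checks the logic of the pattern. The names CID_RANGE and in_range are illustrative.

```python
import re

# Prefix "300000" followed by a three-digit suffix in 195..429:
#   19[5-9]    -> 195-199
#   [2-3]\d{2} -> 200-399
#   4[0-2]\d   -> 400-429
CID_RANGE = re.compile(r"cid=300000(19[5-9]|[2-3]\d{2}|4[0-2]\d)")

def in_range(query: str) -> bool:
    # fullmatch rejects anything with extra characters before or after.
    return CID_RANGE.fullmatch(query) is not None

# Collect every three-digit suffix the pattern accepts.
matched = [n for n in range(100, 1000) if in_range(f"cid=300000{n}")]
print(matched[0], matched[-1])
```

fullmatch is used so that a longer ID such as cid=3000004299 is rejected. A tool that only does substring matching may accept such an ID, so check how trailing digits are treated there.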

Related

Regular expression to find last match in XML output

I have been working for days to learn regex so that I can extract the last match out of an xml output of a test from a scientific instrument. The instrument buffer can hold multiple tests and I am only interested in the last (most recent) test. I can't figure it out!
<Ticket class="SAMPLE" serialno="6000SP210134" versions="FP6000;Main:V1.25;COM:V1.7;D:V1.11;TEC:V1.6">
<Measurement>
<SampleId>6</SampleId>
<DateTime>2022-10-28T15:16:22</DateTime>
<Value>300</Value>
<Unit>mOsmol/kg</Unit>
<DeviceCode>6000SP210134</DeviceCode>
<CheckSum>50c5656fd477cbcd3b7a5036ba98a542</CheckSum>
</Measurement>
</Ticket>
<Ticket class="SAMPLE" serialno="6000SP210134" versions="FP6000;Main:V1.25;COM:V1.7;D:V1.11;TEC:V1.6">
<Measurement>
<SampleId>7</SampleId>
<DateTime>2022-10-28T15:18:55</DateTime>
<Value>425</Value>
<Unit>mOsmol/kg</Unit>
<DeviceCode>6000SP210134</DeviceCode>
<CheckSum>50c5656fd477cbcd3b7a5036ba98a542</CheckSum>
</Measurement>
</Ticket>
I need to match and return the last value from the last test <Ticket></Ticket> (the number of Tickets is variable). In this example it would be 425.
I thought this might work, but it doesn't...
\<Value>\d{2,4}<\/Value>.*\n$\
This regular expression is executed and interpreted in a lab information management system called LabVantage, not in a language such as Perl, PHP, or C. A regular expression is the only option I have.
LabVantage does not seem to publicly document its regex engine, but if you have access to lookarounds, this should work:
<Value>\d{2,4}<\/Value>(?![\s\S]*<\/Value>)
<Value>\d{2,4}<\/Value> - you know what this does, you wrote it =)
(?![\s\S]*<\/Value>) - negative lookahead: no other </Value> appears anywhere after this point
https://regex101.com/r/XpbOdR/1
If lookbehinds are supported then you can get fancy like this to extract only the digits:
(?<=<Value>)\d{2,4}(?=<\/Value>(?![\s\S]*<\/Value>))
https://regex101.com/r/VCDURX/1
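For readers who want to try the lookaround version outside regex101, here is a minimal sketch in Python. It assumes Python's re module, which supports these lookarounds. LabVantage's engine may behave differently. The sample is a trimmed copy of the output in the question, and the name LAST_VALUE is illustrative.

```python
import re

# Trimmed copy of the instrument output from the question.
xml = """<Ticket class="SAMPLE">
<Measurement>
<SampleId>6</SampleId>
<Value>300</Value>
</Measurement>
</Ticket>
<Ticket class="SAMPLE">
<Measurement>
<SampleId>7</SampleId>
<Value>425</Value>
</Measurement>
</Ticket>
"""

# Digits that sit between <Value> and a </Value> with no later </Value> after it.
LAST_VALUE = re.compile(r"(?<=<Value>)\d{2,4}(?=</Value>(?![\s\S]*</Value>))")

match = LAST_VALUE.search(xml)
last_value = match.group() if match else None
print(last_value)  # prints 425
```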
I was not able to coax LabVantage to work with a regular expression in the ways recommended above. However, if any LabVantage user is looking to solve a similar issue, the way it was resolved was to use a Value Extraction Rule like this:
extract /regex/ extract /regex/
or
extract /regex/ extract last number
This type of expression is not explicitly made visible to the user, but it still works. So the final code that did work is this:
extract /(?s).*Value>/ extract last number
Thanks to all who contributed.
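One way to read that rule is as two steps: a greedy match up to the final "Value>", then the last number inside that match. The Python sketch below only emulates this reading. It is not LabVantage code, and the function name and sample text are hypothetical.

```python
import re
from typing import Optional

def last_number_up_to_value(text: str) -> Optional[str]:
    # Step 1: greedy match up to the final "Value>" (the last closing tag).
    head = re.search(r"(?s).*Value>", text)
    if head is None:
        return None
    # Step 2: take the last run of digits inside that match.
    numbers = re.findall(r"\d+", head.group())
    return numbers[-1] if numbers else None

# Shortened sample; the checksum after the last </Value> also contains digits.
sample = (
    "<Value>300</Value>\n<Unit>mOsmol/kg</Unit>\n"
    "<Value>425</Value>\n<CheckSum>50c5</CheckSum>\n"
)
result = last_number_up_to_value(sample)
print(result)  # prints 425
```

Cutting the text at the last "Value>" first is what keeps the digits in the trailing checksum from being picked up as the last number.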

Why does this regex not match in Python?

I have the regular expression
(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\\r\\n\\r\\n)(\S+)?
which I'm trying to match against HTTP GET and HTTP POST requests. I'm using the helpful regex101.com website to format my regular expression, and according to it, the regular expression should match both the formats I'm seeking.
Here's my regular expression on regex101.com.
However, when I put it into Python and call re.split() on the input strings, it splits only the GET request, not the POST request. I thought it had something to do with the way regex101 parses \r\n (CRLF) versus how Python does it, so I double-checked and made sure that in Python I actually type \r\n inside the regex, and not \\r\\n as I did on regex101. Yet it still doesn't work.
How can I get the regular expression to work inside Python?
You're just missing an additional \r\n after HTTP/1.0. This will work:
'POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000'
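The difference can be checked with a short Python sketch. The first input is the one from the answer. The second is an assumed version of the original request, since the question does not show it, with the headers starting after a single CRLF. In Python, "." does not match \n unless re.DOTALL is set, so ".*" cannot span the header lines. The DOTALL variant at the end is one possible alternative, not something the answer proposes.

```python
import re

REQUEST = re.compile(r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\r\n\r\n)(\S+)?")

# Input from the answer: two CRLFs directly after the request line.
with_blank_line = "POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000"
# Assumed original input (not shown in the question): headers after one CRLF.
headers_first = "POST /api/gettime HTTP/1.0\r\nContent-Length: 13\r\n\r\n100000+200000"

m1 = REQUEST.search(with_blank_line)   # matches
m2 = REQUEST.search(headers_first)     # no match: ".*" stops at "\n"

# Same pattern with DOTALL, so ".*" can cross line breaks.
REQUEST_DOTALL = re.compile(REQUEST.pattern, re.DOTALL)
m3 = REQUEST_DOTALL.search(headers_first)

print(m1.groups() if m1 else None)
print(m2)
print(m3.group(4) if m3 else None)
```

Note that with the answer's input the last group captures "Content-Length:" rather than the body, because the first blank line now comes straight after the request line.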

Regular expression: find abc.com except xyz.abc.com or #abc.com

In Eclipse I want to find a string, but the normal search returns hundreds of irrelevant results. So I'm trying to use regular expressions, but so far they haven't given me the right results.
This is what I need: find "abc.com", but not "xyz.abc.com" or "#abc.com". To make it clear, it should return www.abc.com.
I've tried the following regex but I'm not sure if this is how it should be:
[^#xyz\.]abc.com
Using a negative lookbehind should suit your needs:
(?<!xyz[.]|#)abc[.]com
This matches every "abc.com" that is preceded neither by "xyz." nor by "#".
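The same idea can be tried in Python, with one adjustment. Python's re module requires fixed-width lookbehinds and rejects the alternation xyz[.]|# because its branches differ in length, so this sketch uses two separate lookbehinds instead. Eclipse uses Java's regex engine, so the pattern above does not need this change there. The sample strings and names are illustrative.

```python
import re

# Two fixed-width negative lookbehinds stand in for (?<!xyz[.]|#).
NOT_PREFIXED = re.compile(r"(?<!xyz[.])(?<!#)abc[.]com")

samples = ["www.abc.com", "xyz.abc.com", "#abc.com", "abc.com"]
hits = [s for s in samples if NOT_PREFIXED.search(s)]
print(hits)  # prints ['www.abc.com', 'abc.com']
```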

Validate Regex Input, preferably using Regex

I'm looking to have the (admin) user enter some pattern matching string, to give different users of my website access to different database rows, depending on if the text in a particular field of the row matches the pattern matching string against that user.
I decided on Regex because it is trivial to integrate into the MySQL statements directly.
I don't really know where to start with validating that a string is a valid regular expression, with a regular expression.
I did some searching for similar questions, couldn't see one. Google produced the comical answer, sadly not so helpful.
Do people do this in the wild, or avoid it?
Is it able to be done with a simple regex, or will the set of all valid regex need to be limited to a usable subset?
Validating a regex is an incredibly complex task. A regex would not be able to do it.
A simple approach would be to catch any errors that occur when you try to run the SQL statement, then report an appropriate error back to the user.
I am assuming that the 'admin' is a trusted user here. It is quite dangerous to give a non-trusted user the ability to enter regexes, because it is easy to attack your system with a regex that is constructed to take a really long time to execute. And that is before you even start to worry about the Bobby Tables problems.
In JavaScript:
const input = "hello**";
try {
  new RegExp(input); // throws a SyntaxError if the pattern is invalid
  // submit the regex
} catch (err) {
  // regex is not valid
}
You cannot validate that a string contains a valid regular expression with a regular expression. But you might be able to compromise.
If you only need to know that only characters which are valid in regular expressions were used in the string, you can use the regex:
^[\d\w \-\}\{\)\(\+\*\?\|\.\$\^\[\]\\]*$
This might be enough depending on the application.
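To see how far the character check gets you, here is a minimal Python sketch that compares it with actually compiling the pattern. It assumes Python's re dialect, which is not identical to MySQL's, so a pattern that compiles here is not guaranteed to be accepted by the database. The function names are illustrative.

```python
import re

# The character whitelist from above.
WHITELIST = re.compile(r"^[\d\w \-\}\{\)\(\+\*\?\|\.\$\^\[\]\\]*$")

def passes_whitelist(pattern: str) -> bool:
    return WHITELIST.match(pattern) is not None

def compiles(pattern: str) -> bool:
    # Let the regex engine itself decide whether the pattern is valid.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

candidate = "hello**"
print(passes_whitelist(candidate))  # prints True
print(compiles(candidate))          # prints False
```

The pattern "hello**" uses only allowed characters but is still rejected by the engine, which is why the whitelist is a compromise and not a real validity check.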

Why won't this regexp work in google spreadsheets?

I'm trying to extract from a url using a regexp in google spreadsheets. However the spreadsheet returns #VALUE! with the following error: Invalid regular expression: invalid perl operator: (?<
Here is the regexp I'm using: (?<=raid_boss=)[a-zA-Z0-9_]+
A sample url will contain a variable in it that says raid_boss=name. This regexp should extract name. It works in my testing program, but not in google spreadsheet.
Here is the exact contents of the cell in google spreadsheets: =REGEXEXTRACT( B1 ; "/(?<=raid_boss=)[-a-zA-Z0-9_]+" )
Any insight or help would be much appreciated, thank you!
Sounds like whatever regular-expression engine Google Docs is using doesn't support lookbehind assertions. They are a relatively rare feature.
But if you use captures, REGEXEXTRACT will return the captured text, so you can do it that way:
=REGEXEXTRACT( B1 ; "raid_boss=([a-zA-Z0-9_]+)" )
JavaScript is not the issue - Google Sheets uses RE2, which lacks lookbehind along with other useful things.
You could use:
regexextract(B1, ".*raid_boss=(.*)")
or else fall back on native sheet functions such as FIND and SUBSTITUTE if that doesn't work.
Finding a good regex testing tool is tricky - for example, you can write something that works in http://rubular.com/ but fails in GSheets. You need to make sure your tool supports the RE2 flavour, e.g. https://regoio.herokuapp.com/
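The two capture-group patterns from the answers behave differently when the URL has more parameters after raid_boss. Here is a minimal Python sketch of the difference. The URL is hypothetical, and Python's re is not RE2, although both patterns use only capture groups and character classes, which RE2 also supports.

```python
import re

# Hypothetical URL; only the raid_boss parameter comes from the question.
url = "http://example.com/raid?raid_boss=dragon_king&level=5"

# Character-class capture stops at the first character outside [a-zA-Z0-9_].
narrow = re.search(r"raid_boss=([a-zA-Z0-9_]+)", url)
# (.*) captures everything up to the end of the string.
broad = re.search(r".*raid_boss=(.*)", url)

print(narrow.group(1))  # prints dragon_king
print(broad.group(1))   # prints dragon_king&level=5
```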