Regex for page sorting query string


I am trying to match
"SomeField:asc"
"SomeField:desc"
"SomeField:asc,SomeField:asc"
"SomeField:desc,SomeField:desc" ...
It should not match:
""
SomeField:desc,SomeField
SomeField,SomeField:asc
SomeField:desc,SomeField:des, (extra comma)
My current regex is [A-Za-z]+:(asc|desc), but I am stuck. I am sure it is a really simple regex, but I am new to this, so please be patient! Thank you

Maybe you can use this regex ^(?:[A-Za-z]+:(?:asc|desc),?)+$
From the beginning of the string ^
Inside a non capturing group (?:
One or more characters [A-Za-z]+
Followed by a colon :
Inside a non capturing group (?:
asc or desc asc|desc
with an optional comma ,?
Repeat the outer group one or more times +
Until the end of the string $
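One caveat with the pattern above: because the comma is optional, it also accepts a trailing comma (SomeField:desc,SomeField:desc,), which the question lists as a non-match. A minimal Python sketch of a stricter variant, assuming field names are letters only as in the question, requires a comma only between items. The names ITEM, SORT_RE and is_valid_sort are made up for illustration.

```python
import re

# One "Field:asc" or "Field:desc" item (letters only, as in the question).
ITEM = r"[A-Za-z]+:(?:asc|desc)"

# First item, then zero or more ",item" repetitions, anchored at both ends.
SORT_RE = re.compile(rf"^{ITEM}(?:,{ITEM})*$")


def is_valid_sort(value: str) -> bool:
    """Return True when value is a comma-separated list of Field:asc|desc items."""
    return SORT_RE.match(value) is not None


if __name__ == "__main__":
    samples = [
        "SomeField:asc",
        "SomeField:desc,SomeField:desc",
        "",
        "SomeField:desc,SomeField",
        "SomeField:desc,SomeField:desc,",
    ]
    for sample in samples:
        print(repr(sample), is_valid_sort(sample))
```

Putting the comma inside the repeated group means a comma can only appear when another full item follows it.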

I think this will do the trick:
([\w+]+:(asc|desc))(,([\w+]+:(asc|desc)))*
It will match one or more fields, ignoring those that do not meet the spec.

Related

Using JOIN and REGEXEXTRACT with ARRAYFORMULA to Switch First and Last Names Not Working

I have a column of names in my spreadsheet that are structured like this...
Albarran Basten, Thalia Aylin
I'm using the formula below to extract every word BEFORE the comma (last name), and then only the first word AFTER the comma (first name), and then switch their places. It works great.
=join(" ",REGEXEXTRACT(D2,",\s(\S+)"),REGEXEXTRACT(D2,"^(.*?),"))
The formula above returns the name mentioned above like this, exactly as I need it to...
Thalia Albarran Basten
But when I try to apply it to the entire column of names with ARRAYFORMULA, it joins all the names in the column into a single string and repeats that string in every cell down the column. Here's the formula that doesn't work...
={"Student Full Name";arrayformula(if(D2:D="",,join(" ",REGEXEXTRACT(D2:D,",\s(\S+)"),REGEXEXTRACT(D2:D,"^(.*?),"))))}
Any idea on what I could change in this arrayformula to make it work? Thanks for your help.

You can replace your REGEXEXTRACTs with a single REGEXREPLACE:
REGEXREPLACE(D2:D, "^(.*?),\s*(\S+).*", "$2 $1")
Or,
REGEXREPLACE(D2:D, "^([^,]*),\s*(\S+).*", "$2 $1")
Details:
^ - start of string
(.*?) - Group 1 ($1): zero or more chars other than line break chars as few as possible
, - a comma
\s* - zero or more whitespaces
(\S+) - Group 2 ($2): one or more non-whitespaces
.* - zero or more chars other than line break chars as many as possible.
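To check the pattern outside the spreadsheet, here is a small Python sketch of the same replacement. It assumes Python's re module, where the replacement groups are written \2 and \1 instead of $2 and $1; the names NAME_RE and swap_name are hypothetical.

```python
import re

# Same pattern as in the REGEXREPLACE call above.
NAME_RE = re.compile(r"^(.*?),\s*(\S+).*")


def swap_name(full_name: str) -> str:
    """Turn 'Last Names, First Middle' into 'First Last Names'."""
    # Group 2 is the first word after the comma, group 1 is everything before it.
    return NAME_RE.sub(r"\2 \1", full_name)


if __name__ == "__main__":
    print(swap_name("Albarran Basten, Thalia Aylin"))
```

A value without a comma does not match at all, so it is returned unchanged.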

With your shown samples, please try the following regex with REGEXREPLACE.
REGEXREPLACE(D2:D, "^([^,]*),\s*([^\s]+)\s\S*$", "$2 $1")
Explanation of the regex:
^ ## Match from the start of the value.
([^,]*) ## 1st capturing group: everything before the first comma.
,\s* ## A comma followed by zero or more whitespace characters.
([^\s]+) ## 2nd capturing group: one or more non-whitespace characters.
\s\S*$ ## A whitespace character followed by zero or more non-whitespace characters up to the end of the value.

Comma separated prefix list with commas inside

I'm trying to match a comma-separated list of prefixed values, where the values themselves can also contain commas.
So far I have managed to match all occurrences that don't contain a comma.
Sample String (With NL for visualization - original string doesn't have NL):
field01=Value 1,
field02=Value 2,
field03=<xml value>,
field04=127.0.0.1,
field05=User-Agent: curl/7.28.0\r\nHost: example.org\r\nAccept: */*,
field06=Location, Resource,
field07={Item 1},{Item 2}
My current, unoptimized regex looks like this:
(?'fields'(field[0-9]{2,3})=?([\s\w\d_<>.:="*?\-\/\\(){}<>'#]+))([^,](?&fields))*
Does anyone have a clue how to solve this?
EDIT:
The first pattern is near to my expected result.
This is an anonymized full example of the string:
asm01=Predictable Resource Location,Information Leakage,asm02=N/A,asm04=Uncategorized,asm08=2021-02-15 09:18:16,asm09=127.0.0.1,asm10=443,asm11=N/A,asm15=,asm16=DE,asm17=User-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n,asm18=/Common/_www.example.com_live_v1,asm20=127.0.0.1,asm22=,asm27=HEAD,asm34=/Common/_www.example.com_live_v1,asm35=HTTPS,asm39=blocked,asm41=0,asm42=3,asm43=0,asm44=Error,asm46=200000028,200100015,asm47=Unix hidden (dot-file) access,.htaccess access,asm48={Unix/Linux Signatures},{Apache/NCSA HTTP Server Signatures},asm50=40622,asm52=200000028,asm53=Unix hidden (dot-file) access,asm54={Unix/Linux Signatures},asm55=,asm61=,asm62=,asm63=8985143867830069446,asm64=example-waf.example.com,asm65=/.htaccess,asm67=Attack signature detected,asm68=<?xml version='1.0' encoding='UTF-8'?><BAD_MSG><violation_masks><block>13020008202d8a-f803000000000000</block><alarm>417020008202f8a-f803000000000000</alarm><learn>13000008202f8a-f800000000000000</learn><staging>200000-0</staging></violation_masks><request-violations><violation><viol_index>42</viol_index><viol_name>VIOL_ATTACK_SIGNATURE</viol_name><context>request</context><sig_data><sig_id>200000028</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>2</length></kw_data></sig_data><sig_data><sig_id>200000028</sig_id><blocking_mask>4</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>0</offset><length>3</length></kw_data></sig_data><sig_data><sig_id>200100015</sig_id><blocking_mask>7</blocking_mask><kw_data><buffer>Ly5odGFjY2Vzcw==</buffer><offset>1</offset><length>9</length></kw_data></sig_data></violation></request-violations></BAD_MSG>,asm69=5,asm71=/Common/_dev.example.com_SSL,asm75=127.0.0.1,asm100=,asm101=HEAD /.htaccess HTTP/1.1\r\nUser-Agent: curl/7.29.0\r\nHost: dev.example.com\r\nAccept: */*\r\nX-Forwarded-For: 127.0.0.1\r\n\r\n#015

The pattern does not work because the fields group matches the literal string field, and the example strings do not contain it (they use asm instead), so repeating the named group fields cannot match either.
Note that [^,] matches any char except a comma. You can omit the capture group inside the named group fields, as the named group is already a group, and \w also matches \d, so \d is redundant in the character class.
With 2 capture groups:
\b(asm[0-9]+)=(.*?)(?=,asm[0-9]+=|$)
\b A word boundary
(asm[0-9]+) Capture group 1, match asm and 1+ digits
= Match literally
(.*?) Capture group 2, match any char, as few as possible
(?= Positive lookahead, assert what is at the right is
,asm[0-9]+= Match ,asm followed by 1+ digits and =
| Or
$ Assert the end of the string
) Close lookahead
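As a rough Python sketch of how the two capture groups could be collected into key/value pairs, assuming the record is a single line as in the question. The sample string is a shortened excerpt of the question's example, and FIELD_RE and parse_fields are hypothetical names.

```python
import re

# Key, "=", then a lazy value that stops right before ",asmNN=" or the end.
FIELD_RE = re.compile(r"\b(asm[0-9]+)=(.*?)(?=,asm[0-9]+=|$)")


def parse_fields(record: str) -> dict:
    """Map each asmNN key to its value; values may contain commas or be empty."""
    return dict(FIELD_RE.findall(record))


if __name__ == "__main__":
    sample = (
        "asm01=Predictable Resource Location,Information Leakage,"
        "asm02=N/A,asm15=,asm46=200000028,200100015,"
        "asm48={Unix/Linux Signatures},{Apache/NCSA HTTP Server Signatures}"
    )
    for key, value in parse_fields(sample).items():
        print(key, "->", repr(value))
```

The lookahead does not consume the next key, so each search resumes at the comma that precedes it.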

A simple solution would be (see regexr.com/5mg1b):
/((asm\d{2,3})=(.*?))(?=,asm|$)/g
Match groupings will be:
group #1 - asm01=Predictable Resource Location,Information Leakage
group #2 - asm01
group #3 - Predictable Resource Location,Information Leakage
Conditions:
This will match everything including empty values
The key here is to make sure that each match is delimited by either a comma and your field descriptor, or an end of string. A look ahead will be handy here: (?=,asm|$).

RegEx - Return pattern to the right of a text string for URL

I'm looking to return the URL string to the right of a specific set of text using RegEx:
URL:
www.websitename/countrycode/websitename/contact/thank-you/whitepaper/countrycode/whitepapername.pdf
What I would like to just return:
/whitepapername.pdf
I've tried using ^\w+"countrycode"(\w.*) but the match won't recognize countrycode.
In Google Data Studio, I want to create a new field to remove the beginning of the URL using the REGEX_REPLACE function.
Ideally using:
REGEX_REPLACE(Page,......)

The REGEXP_REPLACE function below does the trick: it captures all the characters after the last countrycode with (.*), where Page represents the respective field:
REGEXP_REPLACE(Page, ".*(countrycode)(.*)$", "\\2")
Alternatively - Adapting the RegEx by The fourth bird to Google Data Studio:
REGEXP_REPLACE(Page, "^.*/countrycode(/[^/]+\\.\\w+)$", "\\1")

You could use a capturing group and replace with group 1. You could match /countrycode literally, or use a pattern that matches two chars a-z, an underscore, and two more chars a-z, like /[a-z]{2}_[a-z]{2}
In the replacement use group 1 \\1
^.*/countrycode(/[^/]+\.\w+)$
Or using a country code pattern from the comments:
^.*/[a-z]{2}_[a-z]{2}(/[^/]+\.\w+)$
The second pattern in parts
^ Start of string
.*/ Match until the last occurrence of a forward slash
[a-z]{2}_[a-z]{2} Match the country code part, an underscore between 2 times 2 chars a-z
( Capture group 1
/[^/]+ Match a forward slash, then match 1+ occurrences of any char except / using a negated character class
\.\w+ Match a dot and 1+ word chars
) Close group
$ End of string
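Both patterns can be tried outside Data Studio with a short Python sketch. This assumes Python's re module rather than REGEXP_REPLACE, and the en_us URL in the second call is a made-up example, as are the names LITERAL_RE, CODE_RE and file_part.

```python
import re

# Literal "/countrycode" variant and the two-letter_two-letter variant.
LITERAL_RE = re.compile(r"^.*/countrycode(/[^/]+\.\w+)$")
CODE_RE = re.compile(r"^.*/[a-z]{2}_[a-z]{2}(/[^/]+\.\w+)$")


def file_part(url: str, pattern: "re.Pattern" = LITERAL_RE) -> str:
    """Replace the whole URL with capture group 1 (the trailing /file.ext)."""
    return pattern.sub(r"\1", url)


if __name__ == "__main__":
    page = (
        "www.websitename/countrycode/websitename/contact/thank-you/"
        "whitepaper/countrycode/whitepapername.pdf"
    )
    print(file_part(page))
    # Hypothetical URL with a country code such as en_us.
    print(file_part("www.example.com/en_us/contact/en_us/paper.pdf", CODE_RE))
```

Because .* is greedy, the match backtracks from the end and settles on the last country code segment that is directly followed by a file name.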

Regex - optional capture group after wildcard

Say I have the following list:
No 1 And Your Bird Can Sing (4)
No 2 Baby, You're a Rich Man (5)
No 3 Blue Jay Way S
No 4 Everybody's Got Something to Hide Except Me and My Monkey (1)
And I want to extract the number, the title and the number of weeks in the parenthesis if it exists.
Works, but the last group is not optional (regstorm):
No (?<no>\d{1,3}) (?<title>.*?) \((?<weeks>\d)\)
Last group optional, only matches number (regstorm):
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?
Combining one pattern with week capture with a pattern without week capture works, but there gotta be a better way:
(No (?<no>\d{1,3}) (?<title>.*) \((?<weeks>\d)\))|(No (?<no>\d{1,3}) (?<title>.*))
I use C# and javascript but I guess this is a general regex question.

Your regex is almost there!
First and most importantly, you should add a $ at the end. This makes (?<title>.*?) match all the way towards the end of the string. Currently, (?<title>.*?) matches an empty string and then stops, because it realises that it has reached a point where the rest of the regex matches. Why does the rest of the regex match? Because the optional group can match an empty string. By adding the $, you are making the rest of the regex "harder" to match.
Secondly, you forgot to match an open parenthesis \(.
This is what your regex should look like:
No (?<no>\d{1,3}) (?<title>.*?)( \((?<weeks>\d)\))?$
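The effect of the $ can be seen in a small Python sketch. Note that Python spells named groups (?P<name>...) rather than (?<name>...) as in C# and JavaScript; the names WITHOUT_END, WITH_END and parse_line are hypothetical.

```python
import re

# Without "$" the lazy title stops immediately; with "$" it must reach the end.
WITHOUT_END = re.compile(r"No (?P<no>\d{1,3}) (?P<title>.*?)( \((?P<weeks>\d)\))?")
WITH_END = re.compile(r"No (?P<no>\d{1,3}) (?P<title>.*?)( \((?P<weeks>\d)\))?$")


def parse_line(line: str, pattern: "re.Pattern" = WITH_END):
    """Return a dict with no, title and weeks (weeks is None when absent)."""
    match = pattern.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for line in ["No 1 And Your Bird Can Sing (4)", "No 3 Blue Jay Way S"]:
        print(parse_line(line, WITHOUT_END))
        print(parse_line(line, WITH_END))
```

With the anchored pattern, the lazy title grows one character at a time until the optional weeks group plus the end of the string can match.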

You may use this regex with an optional last part:
^No (?<no>\d{1,3}) (?<title>.*?\S)(?: \((?<weeks>\d)\))?$

Another option is for the title to match either any char other than (, or a ( that is not directly followed by a digit and a closing parenthesis.
^No (?<no>\d{1,3}) (?<title>(?:[^(\r\n]+|\((?!\d\)))+)(?:\((?<weeks>\d)\))?
In parts
^No
(?<no>\d{1,3}) Group no and space
(?<title>
(?: Non capturing group
[^(\r\n]+ Match any char except ( or newline
| Or
\((?!\d\)) Match ( if not directly followed by a digit and )
)+ Close group and repeat 1+ times
) Close group title
(?: Non capturing group
\((?<weeks>\d)\) Group weeks between parenthesis
)? Close group and make it optional
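A Python sketch of this variant, again using (?P<name>...) for the named groups. Since the title group also captures the space before the weeks, the sketch strips it afterwards. The sample with parentheses inside the title is made up, and TITLE_RE and parse_entry are hypothetical names.

```python
import re

# The title is built from runs of non-"(" chars, or a "(" that does not
# start a "(digit)" weeks suffix.
TITLE_RE = re.compile(
    r"^No (?P<no>\d{1,3}) "
    r"(?P<title>(?:[^(\r\n]+|\((?!\d\)))+)"
    r"(?:\((?P<weeks>\d)\))?"
)


def parse_entry(line: str):
    """Return (no, title, weeks); weeks is None when there is no suffix."""
    match = TITLE_RE.match(line)
    if match is None:
        return None
    # The title group keeps the space before "(n)", so trim it here.
    return match["no"], match["title"].strip(), match["weeks"]


if __name__ == "__main__":
    print(parse_entry("No 1 And Your Bird Can Sing (4)"))
    print(parse_entry("No 3 Blue Jay Way S"))
    # Made-up title that itself contains parentheses.
    print(parse_entry("No 5 Hello (Goodbye) (3)"))
```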
If you don't want to trim the last space of the title you could exclude it from matching before the weeks.
Regex demo

Regex optional group

I am using this regex:
((?:[a-z][a-z]+))_(\d+)_((?:[a-z][a-z]+)\d+)_(\d{13})
to match strings like this:
SH_6208069141055_BC000388_20110412101855
separating into 4 groups:
SH
6208069141055
BC000388
20110412101855
Question: How do I make the first group optional, so that the resulting group is an empty string?
I want to get 4 groups in every case, when possible.
Input string for this case: (no underscore after the first group)
6208069141055_BC000388_20110412101855

To make a non-capturing group optional (matching zero or one time), append ? to it.
(?: ..... )?
^          ^____ optional
|____ group

You can easily simplify your regex to be this:
(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$
^              ^^
|--------------||
|  first group ||- quantifier for 0 or 1 time (essentially making it optional)
I'm not sure whether the input string without the first group will have the underscore or not, but you can use the above regex if it's the whole string.
As you can see, the matched group 1 in the second match is empty and starts at matched group 2.
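A quick Python sketch of the simplified regex, under two assumptions: the pattern is used case-insensitively (the sample strings are upper case while the classes are [a-z]), and an unmatched first group should be reported as an empty string, which Match.groups(default="") does. CODE_RE and split_code are hypothetical names.

```python
import re

# Optional "letters_" prefix, then digits, letters+digits, and trailing digits.
# IGNORECASE is an assumption: the samples are upper case, the classes are [a-z].
CODE_RE = re.compile(
    r"(?:([a-z]{2,})_)?(\d+)_([a-z]{2,}\d+)_(\d+)$",
    re.IGNORECASE,
)


def split_code(code: str):
    """Return the 4 groups as a tuple, with "" for a missing first group."""
    match = CODE_RE.search(code)
    # Groups that did not participate are None by default; ask for "" instead.
    return match.groups(default="") if match else None


if __name__ == "__main__":
    print(split_code("SH_6208069141055_BC000388_20110412101855"))
    print(split_code("6208069141055_BC000388_20110412101855"))
```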
