Regex with special chars like |

I'm looking for a way to construct a long regex expression that includes several items with special chars that are regex commands as well, for example:
"Bath Tub | Black" and "Bath Tub | Green".
How do I use them as "string" with the | inside?
I thought about
(Bath Tub|Showers) \| (Black|Green)
but then again there are items like "Bath Tub | Green (light)" and "Bath Tub | Green (dark)" so I need to add that too, so I thought in some cases it would be the best solution to just take it like
"Bath Tub | Green (light)"|"Bath Tub | Green (dark)"
if that would work.
Any idea how to do it? I'm really horrible with regex and it is giving me a lot of headache, so I would appreciate any help!
Kind regards

You can add them all into alternation list:
Bath Tub|Showers|Black|Green(?: \((?:light|dark)\))?
Have a look at the example.
The main point is that all of these alternatives can be added to the list (separated with |), and in case you have additional words after these strings, you can add them as optional non-capturing groups ((?: \((?:light|dark)\))?). Also, pay attention to the escaped round brackets - they will be matched as literals.
If you want to match a literal pipe symbol inside a string |, you should escape it: \|. See this example.
To match all entries from "Bath Tub | Green (light)"|"Bath Tub | Green (dark)", together with quotes, you can add optional groups with quotes at start and end of the pattern: (?:")?(?:Bath Tub|Showers|Black|Green(?: \((?:light|dark)\))?)(?:")?. See yet another example.
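If the full item names are known up front, an alternative to escaping by hand is to let the language do it. The following is a minimal Python sketch, not part of the answer above; the item list and the helper name build_item_pattern are made up for illustration. It relies on re.escape so that |, ( and ) are treated as literal text, and it puts longer names first so that a shorter name cannot shadow a longer one.

```python
import re

# Hypothetical item names taken from the question; adjust to your own data.
ITEMS = [
    "Bath Tub | Black",
    "Bath Tub | Green",
    "Bath Tub | Green (light)",
    "Bath Tub | Green (dark)",
]


def build_item_pattern(items):
    """Return a compiled regex that matches any of the items literally."""
    # Longest first, so "Bath Tub | Green (light)" wins over "Bath Tub | Green".
    ordered = sorted(items, key=len, reverse=True)
    # re.escape turns |, ( and ) into \|, \( and \) so they match as text.
    return re.compile("|".join(re.escape(item) for item in ordered))


pattern = build_item_pattern(ITEMS)
text = "In stock: Bath Tub | Green (light), Bath Tub | Black"
print(pattern.findall(text))
```

The same idea works in most languages that offer an escaping helper; the ordering step only matters when one item is a prefix of another.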

Related

REGEX to find match with any whitespace plus a special character plus a single whitespace plus anything with exceptions:

Intro:
I am looking to make a code hinter in Javascript for Vue i18n localizations.
Details:
Using Node readline to read line by line through a Vue file I want to find one pattern using REGEX (of the many patterns I am looking for) which story-wise is as follows:
For a single string,
find any amount of whitespace (spaces or indents)
PLUS
exactly one closing parenthesis
PLUS
exactly one space (for now, this might change)
PLUS (tricky part, bear with me)
any amount of characters, numbers, or special characters except {{$t('[anything here]')}} or {{ $t('[anything here]') }}; if there is nothing after the closing parenthesis at all, the line should also fail to match the pattern.
1 | )
2 | )
3 | ) {
4 | ) {{
5 | ) Cancel
6 | ) .[];\`\'.;l][
7 | ) {{ $t('common.cancel') }}
8 | ) {{$t('common.cancel')}}
Lines 1-2 and lines 7-8 should not match. Only lines 3-6 should match.
Attempted Solution:
So far my REGEX pattern is this:
\s+\)\s{1}(.*) which does not match Lines 1 and 2 (good thing) because of the lack of a single whitespace after the closing parenthesis.
Problem:
It allows Lines 7 and 8 to pass. I can't figure out how to say anything is allowed BUT the three exception scenarios mentioned in the story of what I am trying to achieve.
My brain now:
Thinking in baby steps, I want to reject a { right after the single whitespace. If I try \s+\)\s{1}(.*)[^\{], I would expect the negated class to stop every line with an opening curly bracket from matching. But that's not the case, I assume because the (.*) portion renders the negated class useless. I can't even seem to make this baby step. Please help.
Following your requirements, I came up with this pattern:
^\s+\)\s(?!.*{{ ?\$t\('[^']*'\) ?}}).+$
It uses the common 'everything but...' approach, with extra bits added at the front and at the end.
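To check the pattern against the eight sample lines, here is a small Python sketch. It is only an illustration: the question targets JavaScript, the four-space indentation of the samples is an assumption (the pattern needs at least one leading whitespace character), and the braces are escaped here purely for readability.

```python
import re

# The answer's pattern, with the curly braces escaped for readability.
PATTERN = re.compile(r"^\s+\)\s(?!.*\{\{ ?\$t\('[^']*'\) ?\}\}).+$")

# Sample lines from the question, with an assumed four-space indentation.
LINES = [
    "    )",
    "    ) ",
    "    ) {",
    "    ) {{",
    "    ) Cancel",
    "    ) .[];\\`\\'.;l][",
    "    ) {{ $t('common.cancel') }}",
    "    ) {{$t('common.cancel')}}",
]

# Collect the 1-based numbers of the lines that match.
matched = [number for number, line in enumerate(LINES, start=1) if PATTERN.match(line)]
print(matched)
```

The negative lookahead runs right after the single whitespace and rejects the line if a $t('...') interpolation appears anywhere later on it, while the final .+ rejects lines with nothing after that whitespace.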

How to use REGEXEXTRACT to remove specific text between multiple brackets in Google Sheets

I have to extract campaign names from this text:
[Ads] | [Bing] | [Leaderboard] | UCL MATCH | 29 September - 31 Desember 2019
Ideally, I only want to extract UCL MATCH and remove all the others, how do I do this with regex? Or is there some other way to do it on Google Sheets?
I only managed to do this:
=REGEXEXTRACT(K8,"\[(.*)\]\ | \ w+\|\[(.+)\]\|")
which resulted in Ads] | [Bing] | [Leaderboard.
Please read my comment to your original post.
However, assuming that your answer to all questions in that comment is YES, this should work:
=TRIM(REGEXEXTRACT(K8,"([^\|]+)\|[^\|]+$"))
You can extract the 4th pipe-separated item with
=REGEXEXTRACT(K8,"^(?:[^|]*\|){3}\s*([^|]*[^|\s])")
See the regex demo. Details:
^ - start of string
(?:[^|]*\|){3} - three sequences of zero or more chars other than | and then a | char
\s* - zero or more whitespaces
([^|]*[^|\s]) - Group 1 (the actual return value): zero or more chars other than | and then a char other than whitespace and | char.
try:
=INDEX(TRIM(SPLIT(A1; "|"));;4)
for array:
=INDEX(TRIM(SPLIT(A1:A; "|"));;4)
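For anyone who wants to try both ideas outside a spreadsheet, here is a rough Python sketch of the "4th pipe-separated item" regex and of the split-based approach. The function names are invented for this example, and Python's re module is not the engine Google Sheets uses, so treat it as an approximation of the formulas rather than a drop-in equivalent.

```python
import re

# Same idea as the REGEXEXTRACT pattern: skip three "...|" chunks, then
# capture the fourth item without its surrounding whitespace.
FOURTH_ITEM = re.compile(r"^(?:[^|]*\|){3}\s*([^|]*[^|\s])")


def fourth_item_regex(text):
    """Return the 4th pipe-separated item via the regex, or None."""
    match = FOURTH_ITEM.search(text)
    return match.group(1) if match else None


def fourth_item_split(text):
    """Return the 4th pipe-separated item via split, like SPLIT + TRIM + INDEX."""
    parts = [part.strip() for part in text.split("|")]
    return parts[3] if len(parts) > 3 else None


sample = "[Ads] | [Bing] | [Leaderboard] | UCL MATCH | 29 September - 31 Desember 2019"
print(fourth_item_regex(sample))
print(fourth_item_split(sample))
```

Both approaches assume the campaign name is always the fourth item; if its position varies, neither will pick it out reliably.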

Regex: Select the content of tags without some words, until |

I have several tags, and I want to select their content without some of the words and replace it with something else. For example:
<title>WORD_1 WORD_2 | Blahhhhhh<title>
<title>WORD_3 WORD_4<title>
<title>WORD_5 WORD_6<title>
<title>WORD_7 WORD_8 | Dammmmmm <title>
The desired selection to replace:
WORD_1 WORD_2
WORD_3 WORD_4
WORD_5 WORD_6
WORD_7 WORD_8
Or, in other terms, I want to select all content of tags until the second part (until |)
You could accomplish this using the following regex ...
(?<=<title>).*?(?=\||<title>)
(?<=<title>) looks behind for <title>
.*? matches any character, as few times as possible (lazy)
(?=\||<title>) looks ahead for | or <title>
see regex demo
EDIT 1 :
To keep only the words until | and delete all the tags ...
search with : .*?(?<=<title>)(.*?)(?=\||<title>).*
replace by : $1
EDIT 2 :
To keep only the words after | and delete all the tags ...
search with : .*?(?<=\|)(.*?)(?:\||<title>)
replace by : $1
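As a quick way to experiment with the lookbehind/lookahead pattern, here is a hedged Python sketch. It assumes the titles are processed one line at a time and keeps the sample data exactly as posted, including the closing tag written as <title>; the strip() call is an addition of this sketch to drop the space left in front of the pipe.

```python
import re

# Lookbehind for the opening tag, lazy match, lookahead for "|" or the next tag.
# The sample data closes with "<title>" rather than "</title>", as in the question.
TITLE_HEAD = re.compile(r"(?<=<title>).*?(?=\||<title>)")

LINES = [
    "<title>WORD_1 WORD_2 | Blahhhhhh<title>",
    "<title>WORD_3 WORD_4<title>",
    "<title>WORD_5 WORD_6<title>",
    "<title>WORD_7 WORD_8 | Dammmmmm <title>",
]

heads = []
for line in LINES:
    match = TITLE_HEAD.search(line)
    if match:
        # strip() drops the space left in front of the pipe.
        heads.append(match.group(0).strip())

print(heads)
```

With real markup that closes as </title>, the lookahead alternative would need to be changed to match that closing tag.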
While the previous answer is good, I would suggest a faster (optimized) regex pattern:
(<title>).+?(?=\||<title>)
https://regex101.com/r/8gCnCy/1
Performance comparison:
with PHP(PCRE) flavor:
(<title>).+?(?=\||<title>) - 4 matches, 260 steps (~229ms)
(?<=<title>).*?(?=\||<title>) - 4 matches, 433 steps (~288ms)
with Python flavor:
(<title>).+?(?=\||<title>) - 4 matches, 370 steps (~270ms)
(?<=<title>).*?(?=\||<title>) - 4 matches, 973 steps (~529ms)

Regex mask with varying ignored characters

I have a series of strings which look something like this:
foobar | ABC Some text 123
barfoo | DEF Some te 456
And I want to mask it such that I get the results
ABC123
DEF456
respectively. The text in between will always be a substring Some text which could potentially contain numbers (e.g. S0m3 t3xt or S0m3 t3). It will always be a substring starting from the left, so never me te.
So clearly I need to start the Regex with something like
(?<=\| )[A-Z]{3}
which gets me ABC and DEF but I am at a loss of how to effectively concatenate the numbers at the end of the string.
Is there any way to do this with a single expression?
See http://regexr.com?375u8
(?<=\| )([A-Z]{3}).*(\d{3})
This will give you three characters in the range A-Z and three digits in two capturing groups, allowing you to use these groups to concatenate both into your desired output: $1$2
This will even work if your Some text contains three digits in between, because the greedy .* backtracks to the last three digits.
In case you want to replace everything with both of your capturing groups, add .* in front of the regex:
.*(?<=\| )([A-Z]{3}).*(\d{3})
Another javascript version
[
  'foobar | ABC Some text 123',
  'barfoo | DEF Some te 456'
].map(function(v) {
  return v.replace(/^.*\| ([A-Z]{3}) .* (\d{3})$/, '$1$2');
})
Gives
["ABC123", "DEF456"]
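For comparison, here is a minimal Python sketch of the lookbehind variant. It assumes the pipe in the lookbehind is meant literally and therefore escapes it (an unescaped | inside the group would be read as alternation); the function name mask is made up for this example.

```python
import re

# The lookbehind needs an escaped pipe; group 1 is the three-letter code,
# group 2 the last three digits of the string.
MASK = re.compile(r"(?<=\| )([A-Z]{3}).*(\d{3})")


def mask(text):
    """Return the three letters after '| ' joined with the final three digits."""
    match = MASK.search(text)
    return match.group(1) + match.group(2) if match else None


for sample in ["foobar | ABC Some text 123", "barfoo | DEF Some te 456"]:
    print(mask(sample))
```

Because .* is greedy, digits inside the middle text (as in S0m3 t3) are skipped and only the trailing three digits end up in the second group.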

Custom HTML5 validation pattern: YYYY.anynumber

I googled a lot, but I'm stuck.
There is a cool thing in HTML5, required patterns. It's great for emails / phones / dates validation. I use it in my small project for checking numbers. What I need is a pattern for:
YYYY.ordernumber
Order number may be any number from 1 to 1000000.
I tried to modify some YYYY.MM patterns for my case, but with no luck. Whatever I type in does not pass the validation.
Can anyone please help?
UPDATE: Added a lookahead to ensure 'ordernumber' is > 0 (thanks to M42's remark in comments).
You can use those two attributes with your <input>:
pattern="^[0-9]{4}\.(?!0+$)([0-9]{1,6}|1000000)$"
required
E.g.
<input type="text" placeHolder="YYYY.ordernumber" title="YYYY.ordernumber"
pattern="^[0-9]{4}\.(?!0+$)([0-9]{1,6}|1000000)$" required />
See, also, this short demo.
Short explanation of the regex:
^[0-9]{4}\.(?!0+$)([0-9]{1,6}|1000000)$
^ - match the beginning of the string
[0-9]{4} - match exactly 4 digits
\. - match a dot (*1)
(?!0+$) - negative lookahead (*2)
[0-9]{1,6} - match 1 to 6 digits
|1000000 - or match 1000000
$ - match the end of the string
(*1): '.' is a special character in regex, so it has to be escaped ('\.').
(*2): This is a negative lookahead, which does not consume any characters but looks ahead and makes sure that the rest of the string does not consist of zeros only.
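To see which inputs the pattern accepts, here is a small Python sketch. It is only an approximation of what the browser does: the pattern attribute is evaluated by the JavaScript regex engine, but this pattern uses only constructs that behave the same way in Python. The helper name is_valid and the sample values are invented for the example.

```python
import re

# Same pattern as in the answer; with fullmatch the ^ and $ anchors are
# redundant but harmless.
ORDER_ID = re.compile(r"^[0-9]{4}\.(?!0+$)([0-9]{1,6}|1000000)$")


def is_valid(value):
    """Return True if value looks like YYYY.ordernumber with 1 <= number <= 1000000."""
    return ORDER_ID.fullmatch(value) is not None


for candidate in ["2014.1", "2014.0001", "2014.1000000", "2014.0", "2014.1000001", "14.7"]:
    print(candidate, is_valid(candidate))
```

Note that leading zeros in the order number are accepted (2014.0001 is valid), since the pattern only rules out all-zero numbers.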
Just for the sake of completeness:
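I must point out that [0-9] matches only the ASCII digits 0-9. If you also need to match other digit characters, such as Eastern Arabic numerals (٠١٢٣٤٥٦٧٨٩), note that \d only does so in flavors where it is Unicode-aware (for example .NET, or Python 3 with str patterns). In JavaScript, and therefore in the pattern attribute, \d is equivalent to [0-9], so there you would need \p{Nd} instead, in browsers that compile the pattern with Unicode support.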