Exclude a combination of characters with regex or add a letter


I'm trying to adjust KODI's search filter with regex so the scrapers recognize TV shows from their original file names.
They come in one of two patterns:
"TV show name S04E01 some extra info" or "TV show name 01 some extra info"
The first is not recognized because "S04" scrambles the search in a number of ways; it needs to go.
The second is not recognized because it needs an 'e' before the number; otherwise it won't be recognized as an episode number.
So I see two approaches:
Make the filter ignore s01-s99
Prepend an 'e' to any freestanding two-digit number, though I'm not sure regex can even do that.
I have no experience with regex, but after playing around I came up with this, which unsurprisingly doesn't do the trick:
^(?!s{00,99})\d{2}$

You can either find matches of the regex \b([0-9]{2})\b and replace them with E$1, or use the pattern \bs(0[1-9]|[1-9][0-9])\b in an ignore filter.
Details
\b([0-9]{2})\b - matches and captures into Group 1 any two digits that are neither preceded nor followed by a letter, digit or _. The E$1 replacement means that the matched text (two digits) is replaced with itself ($1 refers to the Group 1 value) with E prepended.
\bs(0[1-9]|[1-9][0-9])\b - matches an s followed by a number between 01 and 99, because (0[1-9]|[1-9][0-9]) is a capturing group matching either 0 followed by any digit from 1 to 9 ([1-9]), or (|) any digit from 1 to 9 ([1-9]) followed by any digit ([0-9]). Because of the trailing \b, the number must be followed by a non-word character, so the S04 in S04E01 is not matched.
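To try the two patterns outside Kodi, here is a minimal sketch using Python's re module (this is not how Kodi applies its filter; the function names are hypothetical). Python writes the group reference as \1 where the answer uses $1, and the re.IGNORECASE flag on the season pattern is an assumption so that an uppercase S04 also matches.

```python
import re

# Approach 1: prepend "E" to any freestanding two-digit number.
# Python writes the group-1 reference as \1 where the answer uses $1.
EPISODE = re.compile(r"\b([0-9]{2})\b")

# Approach 2: a season token s01..s99.
# IGNORECASE is an assumption so that "S04" matches as well as "s04".
SEASON = re.compile(r"\bs(0[1-9]|[1-9][0-9])\b", re.IGNORECASE)


def add_episode_prefix(name: str) -> str:
    """Turn '... 01 ...' into '... E01 ...'."""
    return EPISODE.sub(r"E\1", name)


def strip_season(name: str) -> str:
    """Remove a standalone season token such as 'S04'."""
    return SEASON.sub("", name)


print(add_episode_prefix("TV show name 01 some extra info"))
print(strip_season("TV show name S04 E01 some extra info"))
# The trailing \b needs a non-word character after the digits,
# so S04 glued to E01 is left alone:
print(strip_season("TV show name S04E01 some extra info"))
```

The first call prints TV show name E01 some extra info. The second removes S04 but leaves a double space behind, and the third leaves the name untouched because of the trailing \b.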
NOTE: If you need to generate a number range regex, you may use this JSFiddle of mine.

Related

Regular Expression Stopping at Specified Value

I have to use a regular expression to parse values out of a SWIFT message, and there are some situations where the behaviour is not what I want.
Let's say I am after something with a particular pattern, in this case a BIC (6 letters, followed by 2 letters or digits, followed by an optional XXX or 3 digits):
([A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
This is fine, but now I want to look for these bank codes in particular fields. In SWIFT, a field is denoted with : and has some digits and sometimes a letter.
So I want to match a BIC value in field 52A.
I can do the following:
(52A:[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
which would match 52A:AAAAAAAAXXX
My problem is that there can be other things before and after this value, and the value itself might not exist in the field I want.
So I can add a wildcard to the regex to allow for things before it, for example:
(52A:.*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
which matches 52A:somerubbishAAAAAAAAXXX
But if the value is not within this field, the regex keeps searching for the pattern past the end of the field, and this is where I have a problem.
For example, the above regex matches 52A:somerubbish:57D:AAAAAAAAXXX
Question
I need the regex to stop at the first field that follows (it might not always be 57D, but it will always follow the format [0-9]{2}[A-Z]{0,1}),
so the above example shouldn't return a match, as the pattern I am after is not contained in the 52A section.
Does anyone know how I can do this?

Change .*? to [^:]*?, so the pattern becomes:
(52A:[^:]*?[A-Z]{6}[A-Z0-9]{2}[XXX0-9]{0,3})
[^:] means "any character except :", which ensures the match doesn't run into the next field.
See live demo.
Also, unless your situation requires you to match your target as group 1, you don't need the outer brackets: the entire match (i.e. group 0) will be your target.
I suspect instead of [XXX0-9]{0,3} you want (XXX|\d{3})? (XXX or 3 digits, but optionally) or perhaps (XXX|\d{1,3})? (XXX or up to 3 digits, but optionally)
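A minimal Python sketch of the [^:] fix, assuming the message is a plain string; find_bic_52a is a hypothetical helper, and the optional suffix uses the suggested (XXX|\d{3})? written as a non-capturing group.

```python
import re

# Field 52A, then anything except ':' (so the match cannot run into the next
# field), then the BIC. (?:XXX|\d{3})? replaces [XXX0-9]{0,3} as suggested.
BIC_IN_52A = re.compile(r"52A:[^:]*?[A-Z]{6}[A-Z0-9]{2}(?:XXX|\d{3})?")


def find_bic_52a(message: str):
    """Return the whole match (group 0), or None if field 52A holds no BIC."""
    m = BIC_IN_52A.search(message)
    return m.group(0) if m else None


print(find_bic_52a("52A:somerubbishAAAAAAAAXXX"))
print(find_bic_52a("52A:somerubbish:57D:AAAAAAAAXXX"))  # None
```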

[XXX0-9]{0,3} (which is the same as [X0-9]{0,3}) is a character class that matches an X or a digit, repeated 0 to 3 times.
If the value itself can also contain a colon, you can match any character as "rubbish" as long as what is directly to the right is not the field format.
52A:(?:(?![0-9]{2}[A-Z]?:).)*[A-Z]{6}[A-Z0-9]{2}(?:[0-9]{3}|XXX)?
The pattern matches:
52A: Match literally
(?:(?![0-9]{2}[A-Z]?:).)* Match any character, as long as 2 digits, an optional char A-Z and : are not directly to the right
[A-Z]{6}[A-Z0-9]{2} Match 6 chars A-Z and 2 chars A-Z or 0-9
(?:[0-9]{3}|XXX)? Optionally match 3 digits or XXX
See a regex demo.
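To compare the two answers, here is a small Python sketch that runs the lookahead pattern and the simpler [^:] pattern side by side. It assumes the message sits on one line: Python's . does not match a newline unless re.DOTALL is set. The matches helper is hypothetical.

```python
import re

# Lookahead ("tempered dot") pattern: the rubbish may contain ':' as long as a
# field header (2 digits, optional letter, colon) is not directly ahead.
TEMPERED = re.compile(
    r"52A:(?:(?![0-9]{2}[A-Z]?:).)*[A-Z]{6}[A-Z0-9]{2}(?:[0-9]{3}|XXX)?"
)
# The simpler [^:] variant, kept here only for comparison.
NO_COLON = re.compile(r"52A:[^:]*?[A-Z]{6}[A-Z0-9]{2}(?:XXX|\d{3})?")


def matches(pattern: re.Pattern, message: str) -> bool:
    return pattern.search(message) is not None


samples = [
    "52A:somerubbishAAAAAAAAXXX",       # both patterns match
    "52A:some:rubbishAAAAAAAAXXX",      # only the lookahead pattern matches
    "52A:somerubbish:57D:AAAAAAAAXXX",  # neither: the BIC is in field 57D
]
for s in samples:
    print(s, matches(TEMPERED, s), matches(NO_COLON, s))
```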

Notepad++ replace specific number with up to its first 4 digits

I want to find numbers that contain 5 or more digits and replace them with their first 4 digits.
I used the regex below to find numbers that contain 5 or more digits:
[0-9]{5,}
How can I achieve the output below?
99999999 -> this will replace with 9999
12345.66 -> this will replace with 1234.66
1234 -> Remains unchanged

This one should do it:
The regex
([0-9]{4})[0-9]+
captures the first four digits as the first (and only) group
requires at least one more digit after them
replaces the complete match with the first (and only) group (use $1 or \1 as the replacement in Notepad++)
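The same replacement can be checked in Python with re.sub, where the group reference is written \1; truncate_to_four is a hypothetical name.

```python
import re

# Group 1 keeps the first four digits; [0-9]+ requires at least one more digit.
FIVE_PLUS = re.compile(r"([0-9]{4})[0-9]+")


def truncate_to_four(text: str) -> str:
    # Replace the whole match with group 1 (written \1 in Python).
    return FIVE_PLUS.sub(r"\1", text)


for sample in ["99999999", "12345.66", "1234"]:
    print(sample, "->", truncate_to_four(sample))
```

This prints 99999999 -> 9999, 12345.66 -> 1234.66 and 1234 -> 1234, matching the expected output in the question.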

Using Notepad++, you can match 4 digits, then use \K to discard what has been matched so far from the reported match, and then match 1 or more digits.
\d{4}\K\d+
See a regex demo.
In the replacement, use an empty string.
If you don't want partial matches, you can add word boundaries \b around the pattern.
\b\d{4}\K\d+\b
See another regex demo
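Python's built-in re module does not support \K, so this sketch swaps in a fixed-width lookbehind, which likewise requires the four leading digits without including them in the match; drop_extra_digits is a hypothetical helper.

```python
import re

# Lookbehind stands in for \d{4}\K: the four leading digits must be present,
# but only the digits after them are matched (and removed).
EXTRA_DIGITS = re.compile(r"(?<=\d{4})\d+")
# Word-boundary variant, mirroring \b\d{4}\K\d+\b (no partial matches).
EXTRA_DIGITS_WHOLE = re.compile(r"(?<=\b\d{4})\d+\b")


def drop_extra_digits(text: str, whole_numbers_only: bool = False) -> str:
    pattern = EXTRA_DIGITS_WHOLE if whole_numbers_only else EXTRA_DIGITS
    return pattern.sub("", text)  # empty replacement, as in the answer


print(drop_extra_digits("12345.66"))                          # 1234.66
print(drop_extra_digits("x123456"))                           # x1234
print(drop_extra_digits("x123456", whole_numbers_only=True))  # x123456
```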

Regex capture group that excludes optional substring?

I'm trying to construct a regex to extract Swedish organization numbers from data. These numbers can be of the following formats:
999999999999 // 12 digits, first two should be ignored.
9999999999 // 10 digits, all should be included.
99999999-9999 // 12 digits with a dash, first two digits and the dash should be ignored
999999-9999 // 10 digits with a dash, dash should be ignored.
For the 12 digit cases, the first two digits are always 16, 19 or 20. My current attempt is:
(?:16|19|20)?(\d{6}\-?\d{4})
This will return a ten digit organization number in $1, but it will contain the dash if it's present. I want the dash to be stripped (or possibly added if it's missing), so that $1 has the same format regardless of dash or no dash in the input.
The regex is in a config and will be used in code that simply extracts $1, so I can't solve this in code; I need the regex to do it "by itself".
As a last resort, I could modify the code to allow config to specify a "replace string" in addition to the search regex, and have the code use the result of the replace as the end result of the extraction. In that case I could use this:
Regex: (?:16|19|20)?(\d{6})\-?(\d{4})
Replace string: $1$2
But this causes other problems, because for other config items, the regex will return multiple "data fields", one for each capture group. To get this to work I would need, in that case, to provide a sequence of replace strings, e.g. for a tab separated format with organization number in the middle:
Regex: ([^\t]*)\t(?:16|19|20)?(\d{6})\-?(\d{4})\t([\d]*)
Replace string 1: $1 (free text field)
Replace string 2: $2-$3 (the organization number with dash "enforced")
Replace string 3: $4 (numeric field)
Workable, but rather awkward... So, any way to solve it within the search regex?
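For illustration, the "last resort" described above (a search regex plus one replacement template per output field) can be sketched with Python's Match.expand, which fills a template with group values. Python writes \1 where the config uses $1; the function names and the TEMPLATES list are hypothetical.

```python
import re

# Regex from the question: optional century prefix, 6 digits, optional dash, 4 digits.
ORG_NO = re.compile(r"(?:16|19|20)?(\d{6})\-?(\d{4})")


def normalize_org_no(text: str):
    """Apply the single replace string $1$2 (written \\1\\2 in Python)."""
    m = ORG_NO.search(text)
    return m.expand(r"\1\2") if m else None


# Tab-separated variant: one replacement template per output field.
ROW = re.compile(r"([^\t]*)\t(?:16|19|20)?(\d{6})\-?(\d{4})\t([\d]*)")
TEMPLATES = [r"\1", r"\2-\3", r"\4"]  # $1, $2-$3, $4 in the question's notation


def extract_fields(line: str):
    m = ROW.search(line)
    return [m.expand(t) for t in TEMPLATES] if m else None


print(normalize_org_no("19123456-7890"))
print(extract_fields("foo\t19123456-7890\t42"))
```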

Validation of international telephone numbers with REGEXMATCH

I'm trying to apply a data validation formula to a column, checking if the content is a valid international telephone number. The problem is that I can't start the content with +1 or any other +dial code, because it's interpreted as an operator. So I'm looking for a regex that accepts all of these, with the dial code in parentheses:
(+1)-234-567-8901
(+61)-234-567-89-01
(+46)-234 5678901
(+1) (234) 56 89 901
(+1) (234) 56-89 901
(+46).234.567.8901
(+1)/234/567/8901
A starting regex could be this one (which is also where I took the examples from).

This regex matches all the examples you gave (tested with https://fr.functions-online.com/preg_match_all.html):
/^\(\+\d+\)[\/\. \-]\(?\d{3}\)?[\/\. \-][\d\- \.\/]{7,11}$/m
^ Match the beginning of the string or of a new line.
To match (+1) and (+61): \(\+\d+\): the plus sign and the parentheses have to be escaped since they have a special meaning in regex. \d+ stands for one or more digit (\d) characters (the plus could be replaced by {1,3}, since country dial codes have at most three digits).
[\/\. \-] Match exactly one dot, space, slash or hyphen.
\(?\d{3}\)?: The question marks make the parentheses optional (? = 0 or 1 time). It expects three digits.
[\/\. \-] Same as step 3.
[\d\- \.\/]{7,11}: Expect digits, hyphens, spaces, dots or slashes, between 7 and 11 times.
$ Match the end of the line or the end of the string.
The m modifier makes the caret (^) and dollar sign ($) also match at line breaks. Remove it if you want those symbols to match only at the beginning and the end of the string.
Slashes are used as delimiters for this regex (other characters can be used as well).
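A quick way to check the pattern against the question's examples is Python's re module (used here only as a test harness). The PHP-style / delimiters and the m flag are dropped because each number is checked on its own; is_valid_phone is a hypothetical helper.

```python
import re

# The answer's pattern without the / delimiters and the m flag.
PHONE = re.compile(r"^\(\+\d+\)[\/\. \-]\(?\d{3}\)?[\/\. \-][\d\- \.\/]{7,11}$")

EXAMPLES = [
    "(+1)-234-567-8901",
    "(+61)-234-567-89-01",
    "(+46)-234 5678901",
    "(+1) (234) 56 89 901",
    "(+1) (234) 56-89 901",
    "(+46).234.567.8901",
    "(+1)/234/567/8901",
]


def is_valid_phone(text: str) -> bool:
    return PHONE.match(text) is not None


print(all(is_valid_phone(e) for e in EXAMPLES))  # True
print(is_valid_phone("+1-234-567-8901"))          # False: no parentheses
# Weakness: the last class also accepts separators only, so this passes too.
print(is_valid_phone("(+1)-234-......."))          # True
```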
I must admit I don't like the last part of the regex, as it does not ensure that you have at least 7 digits.
It would probably be better to remove all the separators (for example with the PHP function str_replace) and deal only with the parentheses and digits, using this regex:
/(\(\+\d+\))(\(?\d{3}\)?)(\d{3})(\d{4})/m
Notice that in this last regex I used 4 capturing groups to match the four sections of the phone number. This regex keeps the parentheses and the plus sign in the first group and the optional parentheses in the second group. To keep only the digit groups, you can use this regex:
/\(\+(\d+)\)\(?(\d{3})\)?(\d{3})(\d{4})/m
Note: The groups are for formatting the phone number after validating it. It is probably better to keep all the phone numbers in your database in the same format.
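The strip-then-match approach can be sketched in Python, with re.sub standing in for PHP's str_replace. Using fullmatch so that nothing may follow the last four digits is an assumption, as is the phone_parts helper name.

```python
import re

# Remove the separators the first pattern allowed: space, dot, slash, hyphen.
SEPARATORS = re.compile(r"[ ./-]")
# Digits-only variant from the answer: dial code, area code, then 3 + 4 digits.
PARTS = re.compile(r"\(\+(\d+)\)\(?(\d{3})\)?(\d{3})(\d{4})")


def phone_parts(text: str):
    """Return (dial code, area code, 3 digits, 4 digits), or None."""
    compact = SEPARATORS.sub("", text)
    m = PARTS.fullmatch(compact)
    return m.groups() if m else None


print(phone_parts("(+61)-234-567-89-01"))   # ('61', '234', '567', '8901')
print(phone_parts("(+1) (234) 56 89 901"))  # ('1', '234', '568', '9901')
```

Once the number is split into groups, it can be rebuilt in a single stored format.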
Those are the different possibilities you can use.
Note: These regexes should be compatible with all regex engines, but it is good practice to specify which language you are working with, because regex engines don't all handle advanced features the same way.
For example, lookbehind was not supported in JavaScript before ES2018, and .NET allows more powerful lookbehind than PHP.
Let me know if you need more information.

Regex to match number(s) or UUID

I need a regex that loosely matches UUIDs and numbers. I expect my filename to be formatted like:
results_SOMETHING.csv
Ideally, SOMETHING should be a number (a count of how many times a script has been run) or a UUID.
This regex encompasses a huge set of filenames:
^results_?.*.csv$
and this one:
^results_?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.csv$
matches only UUIDs. I want a regex whose range is somewhere in between. Mostly I don't want matches like result__123.csv.

Note: This doesn't directly answer the OP's question, but given the title, it will appear in searches.
Here's a proper regex to match a UUID based on this format, without the hex character constraint:
(\w{8}(-\w{4}){3}-\w{12})
If you want it to match only hex characters, use:
/([a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12})/i
(Note the / delimiters used in JavaScript and the /i flag to denote case-insensitivity; depending on your language, you may need to write this differently, but you definitely want to handle both lower and upper case letters.)
If you're prepending results_ and appending .csv to it, that would look like this (with the dot escaped so that it only matches a literal .):
^results_([a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12})\.csv$
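A Python sketch of the hex-only filename pattern. Instead of a case-insensitive flag (which in Python would also make the results_ prefix case-insensitive), it spells out both letter cases in the character class; that choice and the uuid_from_filename name are assumptions.

```python
import re

# 8-4-4-4-12 hex digits in either case, wrapped in results_ ... .csv.
UUID_FILE = re.compile(
    r"results_([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\.csv"
)


def uuid_from_filename(name: str):
    """Return the UUID part if the whole filename matches, else None."""
    m = UUID_FILE.fullmatch(name)
    return m.group(1) if m else None


print(uuid_from_filename("results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv"))
print(uuid_from_filename("results_123.csv"))  # None
```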

-----EDITED / UPDATED-----
Based on the comments you left, there are some other patterns you want to match (this was not clear to me from the question). This makes it a little more challenging. To summarize my current understanding:
results.csv - match (NEW)
results_1A.csv - match (NEW)
results_ABC.csv - ? no match (I assume)
result__123.csv - no match
results_123.csv - match
Results_123.cvs - ? no match
results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv - match
You will find the following modification works according to the above "specification":
results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}|(?=.*[0-9])[A-Z0-9]+))?\.csv
Breaking it down:
results matches characters "results" literally
(?:_ ….)? non-capturing group, repeated zero or one time:
"this is either there, or there is nothing"
[0-9a-f]{8}- exactly 8 characters from the group [0-9a-f]
followed by hyphen "-"
(?:[0-9a-f]{4}-){3} ditto but group of 4, and repeated three times
[0-9a-f]{12} ditto, but group of 12
| OR...
(?=.*[0-9]) lookahead: at least one digit must follow somewhere after this point
[A-Z0-9]+ one or more capital letters or digits
\.csv the literal string ".csv" (the '.' has to be escaped)
demonstration on regex101.com
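The "specification" above can be turned into a quick table-driven check in Python. Using fullmatch so that the whole filename has to match is an assumption, since the answer's pattern has no ^ and $ anchors.

```python
import re

RESULTS_RE = re.compile(
    r"results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}"
    r"|(?=.*[0-9])[A-Z0-9]+))?\.csv"
)

# Filename -> whether it should match, taken from the list above.
EXPECTED = {
    "results.csv": True,
    "results_1A.csv": True,
    "results_ABC.csv": False,
    "result__123.csv": False,
    "results_123.csv": True,
    "Results_123.cvs": False,
    "results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv": True,
}

for name, want in EXPECTED.items():
    got = RESULTS_RE.fullmatch(name) is not None
    print(f"{name}: {got} (expected {want})")
```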
