How can I allow hyphens in this RegEx

I know a bit of RegEx but this one's a bit too complicated for me.
All I need to change is for it to allow for a single hyphen too.
replace(/[^\p{L}\s]+/gu, '')

You may use
.replace(/^([^-]*-)|-/g, '$1').replace(/[^\p{L}\s-]+/gu, '')
It keeps the first - in the input string along with any Unicode letters (\p{L}) and whitespace (\s). The first call, .replace(/^([^-]*-)|-/g, '$1'), matches and captures, from the start of the string, all chars other than - up to and including the first - (with ^([^-]*-)), and also matches every other - in the string. Each match is replaced with the value of Group 1, which is empty whenever the matched - is not the first hyphen, so only the first hyphen survives. The second call, .replace(/[^\p{L}\s-]+/gu, ''), then removes any run of one or more chars other than letters, whitespace and hyphens.
See the ECMAScript 2018+ JS demo below:
console.log( "12-3-**(Виктор Викторович)**...".replace(/^([^-]*-)|-/g, '$1').replace(/[^\p{L}\s-]+/gu, '') )
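To see what each of the two replacements contributes, here is a small sketch that wraps them in a helper and also returns the intermediate string. The name keepFirstHyphen and the second sample input are made up for illustration; the two patterns are the ones shown above.

```javascript
// Hypothetical helper: keeps letters, whitespace and only the first hyphen.
function keepFirstHyphen(input) {
  // Step 1: keep everything up to and including the first hyphen (Group 1)
  // and drop every later hyphen (Group 1 is empty for those matches).
  const oneHyphen = input.replace(/^([^-]*-)|-/g, '$1');
  // Step 2: remove runs of chars that are not letters, whitespace or hyphens.
  const cleaned = oneHyphen.replace(/[^\p{L}\s-]+/gu, '');
  return { oneHyphen, cleaned };
}

console.log(keepFirstHyphen('12-3-**(Виктор Викторович)**...'));
console.log(keepFirstHyphen('Anne-Marie-Claire').cleaned); // "Anne-MarieClaire"
```

Note that the first hyphen is kept wherever it occurs, so the first sample ends up with a leading hyphen.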

Related

Pattern to match everything except a string of 5 digits

I only have access to a function that can match a pattern and replace it with some text:
Syntax
regexReplace('text', 'pattern', 'new text')
And I need to return only the 5 digit string from text in the following format:
CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1
So from this text, I want to replace everything with "" except the "54871".
I have come up with the following:
regexReplace("{{ticket.description}}", "\w*[^\d\W]\w*", "")
This almost works, but it doesn't match the symbols. How can I change it to match any word that includes a letter or a symbol?
The pattern I have is very close; I just need it to include special characters as well as letters, whereas currently it only matches letters.
You can match the whole string but capture the 5-digit number into a capturing group and replace with the backreference to the captured group:
regexReplace("{{ticket.description}}", "^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$", "$1")
See the regex demo.
Details:
^ - start of string
(?:[\w\W]*\s)? - an optional substring of any zero or more chars as many as possible and then a whitespace char
(\d{5}) - Group 1 ($1 contains the text captured by this group pattern): five digits
(?:\s[\w\W]*)? - an optional substring of a whitespace char and then any zero or more chars as many as possible.
$ - end of string.
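The capture-and-backreference idea can be tried outside the ticketing tool with a sketch like the one below. The regexReplace function here is only a stand-in built on JavaScript's String.prototype.replace; how the real platform function behaves is an assumption. The sample text is the one from the question.

```javascript
// Sample ticket text from the question.
const description = [
  'CRITICAL - 192.111.6.4: rta nan, lost 100%',
  'Created Time Tue, 5 Jul 8:45',
  'Integration Name CheckMK Integration',
  'Node 192.111.6.4',
  'Metric Name POS1',
  'Metric Value DOWN',
  'Resource 54871',
  'Alert Tags 54871, POS1',
].join('\n');

// Stand-in for the platform's regexReplace (an assumption, not its real code).
function regexReplace(text, pattern, replacement) {
  return text.replace(new RegExp(pattern), replacement);
}

// The whole string is matched, and only Group 1 (the five digits) is put back.
const resource = regexReplace(
  description,
  String.raw`^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$`,
  '$1'
);
console.log(resource); // "54871"
```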
The easiest regex is probably:
^(.*\D)?(\d{5})(\D.*)?$
You can then replace the string with "$2" ("\2" in other languages) to only place the contents of the second capture group (\d{5}) back.
The only issue is that . doesn't match newline characters by default. Normally you can pass a flag to change . to match ALL characters. For most regex variants this is the s (single line) flag (PCRE, Java, C#, Python). Other variants use the m (multi line) flag (Ruby). Check the documentation of the regex variant you are using for verification.
However, the question suggests that you're not able to pass flags separately, in which case you can pass them as part of the regex itself:
(?s)^(.*\D)?(\d{5})(\D.*)?$
regex101 demo
(?s) - Set the s (single line) flag for the remainder of the pattern, which enables . to match newline characters ((?m) for Ruby).
^ - Match the start of the string (\A for Ruby).
(.*\D)? - [optional] Match anything followed by a non-digit and store it in capture group 1.
(\d{5}) - Match 5 digits and store it in capture group 2.
(\D.*)? - [optional] Match a non-digit followed by anything and store it in capture group 3.
$ - Match the end of the string (\z for Ruby).
This regex will result in the last 5-digit number being stored in capture group 2. If you want to use the first 5-digit number instead, you'll have to use a lazy quantifier in (.*\D)?, so that it becomes (.*?\D)?.
(?s) is supported by most regex variants, but not all. Refer to the regex variant documentation to see if it's available for you.
One example where the inline (?s) flag is not available is JavaScript. In that scenario you need to replace . with something that matches ALL characters. In JavaScript [^] can be used. In other variants this might not work, and you need to use [\s\S] instead.
With all this out of the way, and assuming a language that can use "$2" as the replacement, where you do not need to escape backslashes, and a regex variant that supports the inline (?s) flag, the answer would be:
regexReplace("{{ticket.description}}", "(?s)^(.*\D)?(\d{5})(\D.*)?$", "$2")
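For the JavaScript case mentioned above, a minimal sketch could look like this. It uses [^] in place of . and contrasts the greedy and the lazy prefix. The sample text is invented so that the two 5-digit numbers differ.

```javascript
// Greedy prefix: group 2 ends up holding the LAST 5-digit number.
const lastFive = /^([^]*\D)?(\d{5})(\D[^]*)?$/;
// Lazy prefix: group 2 ends up holding the FIRST 5-digit number.
const firstFive = /^([^]*?\D)?(\d{5})(\D[^]*)?$/;

// Made-up sample with two different 5-digit numbers on separate lines.
const text = 'Resource 54871\nAlert Tags 10234, POS1';

console.log(text.replace(lastFive, '$2'));  // "10234"
console.log(text.replace(firstFive, '$2')); // "54871"
```

The \D guards on both sides mean that a longer digit run, such as a 6-digit number, is not cut down to five digits; in that case nothing matches and the text is returned unchanged.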

Regex to pick a value from url

I am having difficulty building a regex that can extract a value from a URL. The condition is to get the value between the last "/" and ".html". Please help.
Sample URL1 - https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html - The value I want to extract is bingo
Sample URL2 - www.example.com/we/b345g.html - The value I want to extract is b345g
I tried to build a regex and was able to get "bingo.html" and "b345g.html" using [^\/]+$ but was not able to remove or skip ".html".
Here you are:
\/([^\/]+?)(?>\..+)?$
Explanation:
\/ - literal character '/'
([^\/]+?) - first group: at least one character that is not a '/', matched lazily (as few characters as possible)
[^\/] - any character that is not a '/'
+ - at least one occurrence
? - lazy modifier (expand the match only as far as needed)
(?>\..+)? - second, optional group: '.' followed by any characters (like '.html' or '.exe' or '.png')
?> - atomic, non-capturing group (its content is matched but not captured)
\. - literal character '.'
. - any character (except line terminators)
+ - at least one occurrence
? - optionality (note that this one is outside the parentheses)
$ - end of the string
If you want also to exclude query strings you can expand it like this:
\/([^\/]+?)(?>\..+)?(?>\?.*)?$
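If the URL includes a protocol, the patterns above can match at the second '/' of '//' and capture the start of the host name (for example www) instead, because .+ also runs across the later slashes. To skip the protocol part of the URL you can use this:
(?<!\/)\/([^\/]+?)(?>\..+)?(?>\?.*)?$
Here (?<!\/) is a negative lookbehind that checks there is no '/' immediately before the start of the match.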
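As a rough way to try these patterns in JavaScript, the sketch below swaps the atomic groups (?>...) for plain non-capturing groups (?:...), since JavaScript has no atomic groups. That swap is an approximation made for this example, not part of the answer. The helper name lastSegment is also made up.

```javascript
// (?>...) is not valid JavaScript syntax, so (?:...) is used as a stand-in here.
const basic = /\/([^\/]+?)(?:\..+)?(?:\?.*)?$/;
// Same pattern with the negative lookbehind that skips the "//" of the protocol.
const skipProtocol = /(?<!\/)\/([^\/]+?)(?:\..+)?(?:\?.*)?$/;

// Hypothetical helper: returns Group 1 or null when nothing matches.
function lastSegment(url, pattern) {
  const m = url.match(pattern);
  return m ? m[1] : null;
}

const url1 = 'https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html';
const url2 = 'www.example.com/we/b345g.html';

console.log(lastSegment(url2, basic));        // "b345g"
console.log(lastSegment(url1, basic));        // "www" (matches right after "//")
console.log(lastSegment(url1, skipProtocol)); // "bingo"
```

The middle call shows why the lookbehind matters once a protocol is present.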
You are only matching using [^\/]+$ but not differentiating between the part before and after the dot.
To make that different, you could use for example a capture group to get the part after the last slash and before the first dot.
\S*\/([^\/\s.]+)\.[^\/\s]+$
\S*\/ Match optional non whitespace chars till the last occurrence of /
([^\/\s.]+) Capture group 1 Match 1+ times any char except a / whitespace char or .
\. Match a dot
[^\/\s]+ Match 1+ times any char except a / or a whitespace char
$ End of string
See a regex demo.
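This capture-group pattern uses only syntax that JavaScript supports, so it can be tried directly. The helper name extractName is an assumption for this sketch; the two URLs are the samples from the question.

```javascript
// Group 1 holds the part after the last "/" and before the first "." that follows it.
const fileStem = /\S*\/([^\/\s.]+)\.[^\/\s]+$/;

// Hypothetical helper: returns the captured name, or null if the URL does not match.
function extractName(url) {
  const m = fileStem.exec(url);
  return m ? m[1] : null;
}

console.log(extractName('https://www.example.com/fgf/sdf/sdf/as/dwe/we/bingo.html')); // "bingo"
console.log(extractName('www.example.com/we/b345g.html')); // "b345g"
console.log(extractName('www.example.com/we/noext'));      // null
```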

Regular expression to extract string before and the string after the third hyphen

Am trying to parse strings similar to these variations:
"AB-19-027654-A-1"
"AB-19-027654-A-1-2"
"ABC-19-027654-A-1"
"ABC-19-027654-A-1-2"
I am looking for a way to use a regular expression to split the above strings at the third hyphen into two strings.
"AB-19-027654-A-1" split into "AB-19-027654" and "A-1"
"AB-19-027654-A-1-2" split into "AB-19-027654" and "A-1-2"
"ABC-19-027654-A-1" split into "ABC-19-027654" and "A-1"
"ABC-19-027654-A-1-2" split into "ABC-19-027654" and "A-1-2"
Have tried something like this ^(?'STRING1'.+[\d-}])-(?'STRING2'.*)-??$
but it does not work for all the combinations listed.
The only consistency I can find in the original strings is that there are always at least three hyphens and the two strings I need are before and after that third hyphen accordingly.
Any ideas would be appreciated.
You can use this regex with two capture groups:
/^((?:[^-]+-?){3})-(.*)$/
Explanation:
^ - start of string
( - start capture group 1
(?:[^-]+-?){3} - non-capturing group of characters other than - followed by optional -, repeated 3 times
) - end capture group 1
- - literal -
(.*) - capture group 2: everything to end of string
$ - end of string
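A quick way to check the pattern against all four sample strings is a sketch like the following. The helper name splitAtThirdHyphen is made up for the example.

```javascript
// Group 1: three hyphen-separated chunks; Group 2: everything after the third hyphen.
const thirdHyphen = /^((?:[^-]+-?){3})-(.*)$/;

// Hypothetical helper: returns [before, after] or null when the pattern does not match.
function splitAtThirdHyphen(s) {
  const m = s.match(thirdHyphen);
  return m ? [m[1], m[2]] : null;
}

const samples = [
  'AB-19-027654-A-1',
  'AB-19-027654-A-1-2',
  'ABC-19-027654-A-1',
  'ABC-19-027654-A-1-2',
];
for (const s of samples) {
  console.log(s, '->', splitAtThirdHyphen(s));
}
```

The pattern relies on backtracking: the third repetition first swallows its trailing hyphen, then gives it back so that the literal - can match it.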

RegEx for name: Any language but first letter must be capital

I have a requirement to accept a first name as input and check that the first letter is caps and that there can be 1 space after the end of the string.
This RegEx works for 'Bob ':
^[A-Z][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$
An extra requirement is then to allow any language / character which then involves allowing unicode.
This RegEx works for a russian name: 'Афанасий'
^[A-Z\p{L}][A-Za-z\p{L}]+[\s,.'\-]?[a-zA-Z\p{L}]*$
... However, while it allows for unicode, it also allows me to enter 'bob' with a small first letter and the RegEx allows this through.
Is there any way to allow both unicode and still flag up the first letter when it is not capital? ( Using a RegEx)
I could make some code changes to get round this issue but it would be nice to be able to keep it all in the RegEx value without making code changes.
Any Unicode uppercase letter can be matched with \p{Lu}.
Use
^\p{Lu}\p{L}+[\s,.'\-]?\p{L}*$
or
^\p{Lu}\p{L}+(?:[\s,.'-]\p{L}+)?$
See the regex demo 1 and regex demo 2. The second regex is more precise as it won't allow trailing whitespace, comma, etc. (what is defined in the [\s,.'-] character class).
Note that there is no point in using [A-Za-z\p{L}] since \p{L} already matches [a-zA-Z].
Pattern details:
^ - start of string
\p{Lu} - an uppercase Unicode letter
\p{L}+ - one or more Unicode letters
(?:[\s,.'-]\p{L}+)? - one or zero (optional) sequence of
[\s,.'-] - a whitespace, ,, ., ' or a hyphen
\p{L}+ - 1 or more Unicode letters
$ - end of string.
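The difference between the two patterns can be checked with a short sketch. It assumes a JavaScript engine, where \p{...} requires the u flag; the question does not say which regex flavor is actually in use, and the extra sample names are invented.

```javascript
// First pattern: an optional separator may be the last character.
const lenient = /^\p{Lu}\p{L}+[\s,.'\-]?\p{L}*$/u;
// Second pattern: a separator must be followed by at least one letter.
const strict = /^\p{Lu}\p{L}+(?:[\s,.'-]\p{L}+)?$/u;

for (const name of ['Bob', 'bob', 'Афанасий', 'Bob ', 'Mary-Ann']) {
  console.log(JSON.stringify(name), lenient.test(name), strict.test(name));
}
```

'Bob ' (with a trailing space) passes the first pattern but not the second, while 'bob' is rejected by both.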

Sublime Regex extract

<.*>|\n.*\s.*\sid="(\w*)".*\n+|.*>\n|\n.+
and replace $1
This regex extracts every id from a file like this:
<a href="java" class="total" id="maker" placeholder="getTheResult('local6')">master6<a>
Result is maker
How can I extract the key name passed to getTheResult instead,
so that my result is local6?
I tried <.*>|\n.*\s.*\sgetTheResult('(\w*)').*\n+|.*>\n|\n.+ but it didn't help.
I assume that:
you have files with text like getTheResult('local6')
you may have several values like that on a line
you'd like to keep those text only, one value per line.
I suggest
getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*
and replace with $1\n. The \n will insert a newline between the values. You can then use ^\n regex (to replace with empty string) to remove empty lines.
Pattern details:
getTheResult\(' - matches getTheResult(' as a literal string (note the ( is escaped)
([^']*) - Group 1 capturing 0+ chars other than '
'\) - a literal ')
| - or
(?:(?!getTheResult\(')[\s\S])* - 0+ chars that are not starting chars of the getTheResult(' character sequence (this is a tempered greedy token).
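Outside Sublime Text, the same two-step replacement can be sketched in JavaScript. The helper name extractKeys and the second sample are made up, and JavaScript's replace may emit a different number of empty lines than Sublime does, which is why the cleanup step is kept.

```javascript
// Either capture a key (Group 1) or consume text that contains no getTheResult(' call.
const pattern = /getTheResult\('([^']*)'\)|(?:(?!getTheResult\(')[\s\S])*/g;

// Hypothetical helper: returns the keys, one per line.
function extractKeys(text) {
  return text
    .replace(pattern, '$1\n')  // keep only the captured keys, each followed by a newline
    .replace(/^\n/gm, '');     // drop the empty lines left by the non-key matches
}

const html =
  '<a href="java" class="total" id="maker" placeholder="getTheResult(\'local6\')">master6<a>';
console.log(extractKeys(html));

// Made-up sample with two values on one line.
console.log(extractKeys("getTheResult('a') x getTheResult('b')\n<p>none</p>"));
```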