Regex to match repeated pattern after a string - regex

I need a regex that extracts a repeated pattern after a specific word (here, Limits::).
I have a test string in which the text is always enclosed between delimiters, as in !Limits::****!:
*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin!
fasdfakl skdfkas sflas fasf sdf afasf
I want only the words:
WLo1
WHi1
WHi1
WHi1
WLo2
WHi2
WHi
WHi2
.
.
.
WLon
WHin
CLon
CHin
I tried (?:!\w+::(?:(\w+)/(\w+)/(\w+)/(\w+)))|(?:,(\w+)/(\w+)/(\w+)/(\w+))+.*! without success.

Regular expressions:
/(W.*|C.*)(?=\/|!|,)/g : greedily matches from the first W or C up to the last position that is followed by /, ! or ,
/\/|,.*(?=,)|,/ : splits the string returned by the first RegExp on /, on a comma plus everything up to (but not including) the last comma, or on a single comma
var str = "*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin! fasdfakl skdfkas sflas fasf sdf afasf";
var res = str.match(/(W.*|C.*)(?=\/|!|,)/g)[0].split(/\/|,.*(?=,)|,/);
document.body.textContent = res.join(" ")

I don't know what the ending delimiter is, so if it matters, update your question and I'll amend this expression:
/(?<=Limits::)(?:(.+?)\/)+/i
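This searches for Limits:: and then for repeated strings that each end with /; your words will be in group 1. Note that most engines keep only the last repetition of a repeated capturing group (.NET exposes all of them through the group's Captures collection).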
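A two-step approach may be easier to reason about than a single pattern: first isolate the text between the delimiters, then pull out the tokens. The Python sketch below assumes the block starts with !Limits:: and ends at the next !, and that tokens are separated by / or ,. The function name extract_limits is hypothetical.

```python
import re

def extract_limits(text):
    """Return the tokens found between '!Limits::' and the next '!'."""
    # Step 1: isolate the delimited block (the delimiters are an assumption).
    block = re.search(r"!Limits::([^!]*)!", text)
    if block is None:
        return []
    # Step 2: every run of characters that is not '/' or ',' is one token.
    return re.findall(r"[^/,]+", block.group(1))

sample = "bla bla !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2! more text"
print(extract_limits(sample))
```

Splitting the work this way avoids relying on a repeated capturing group, which most engines collapse to its last repetition.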

Related

match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should (even if present in match string).
If I replace the matches with an empty string, the result should be:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution:
(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)
If your regex engine supports the \K escape, then we can try:
/blue\K|.*?(?=blue|$)/gm
This pattern says to match:
blue : match "blue"
\K : but then forget that match
| : OR
.*? : match anything else until reaching
(?=blue|$) : the next "blue" or the end of the string
Edit:
In JavaScript, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
console.log(output);
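Python's built-in re module does not support \K, but the callback idea from the JavaScript snippet carries over. The following is a minimal sketch of that idea; the function name keep_only is hypothetical, and the word is escaped on the assumption that it should be treated as literal text.

```python
import re

def keep_only(text, word):
    """Delete everything from text except whole occurrences of word."""
    w = re.escape(word)
    # Either the word itself, or the shortest run of characters that ends
    # right before the next occurrence of the word or at the end of the text.
    pattern = re.compile(rf"{w}|.+?(?={w}|$)", re.DOTALL)
    # Keep matches that are the word; replace every other match with nothing.
    return pattern.sub(lambda m: m.group() if m.group() == word else "", text)

print(keep_only("blue, blueblue, b l u e", "blue"))  # blueblueblue
```

Because the word is tried first at every position, the lazy branch can never start in the middle of an occurrence it should have kept.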

Dart Regex: Only allow dot and numbers

I need to format the price string in dart.
String can be: ₹ 2,19,990.00
String can be: $1,114.99
String can be: $14.99
What I tried:
void main() {
  String str = "₹ 2,19,990.00";
  RegExp regexp = RegExp("(\\d+[,.]?[\\d]*)");
  RegExpMatch? match = regexp.firstMatch(str);
  str = match!.group(1)!;
  print(str);
}
Actual output: 2,19
Actual output: 1,114
Actual output: 14.99
Expected output: 219990.00
Expected output: 1114.99
Expected output: 14.99 (This one is correct because there is no comma)
The simplest solution would be to replace all non-digit/non-dot characters with nothing.
The most efficient way to do that is:
final re = RegExp(r"[^\d.]+");
String sanitizeCurrency(String input) => input.replaceAll(re, "");
You can't do it by matching alone, because a match is always contiguous in the source string, and you want to omit the embedded commas.
You can use this regex for search:
^\D+|(?<=\d),(?=\d)
And replace with an empty string i.e. "".
RegEx Details:
^: Start
\D+: Match 1+ non-digit characters
|: OR
(?<=\d),(?=\d): Match a comma if it is surrounded by digits on both sides
Code: Using replaceAll method:
str = str.replaceAll(RegExp(r'^\D+|(?<=\d),(?=\d)'), '');
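For comparison, here is a minimal Python sketch of both replacement patterns from the answers above, run against the three sample prices. The function names are hypothetical; the patterns themselves are taken unchanged from the Dart code.

```python
import re

def strip_non_numeric(price):
    # First approach: drop every character that is not a digit or a dot.
    return re.sub(r"[^\d.]+", "", price)

def strip_prefix_and_commas(price):
    # Second approach: drop leading non-digits and commas between digits.
    return re.sub(r"^\D+|(?<=\d),(?=\d)", "", price)

for raw in ["₹ 2,19,990.00", "$1,114.99", "$14.99"]:
    print(raw, "->", strip_non_numeric(raw), strip_prefix_and_commas(raw))
```

Both give the same result on these inputs. They can differ when text follows the number, since the first removes all other characters while the second only touches the prefix and the commas.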

Remove given string from both start and end of a word

Data :
col 1
AL GHAITHA
AL ASEEL
EMARAT AL
LOREAL
ISLAND CORAL
My code :
import re
def remove_words(df, col, letters):
    regular_expression = '^' + '|'.join(letters)
    df[col] = df[col].apply(lambda x: re.sub(regular_expression, "", x))
Desired output :
col 1
GHAITHA
ASEEL
EMARAT
LOREAL
ISLAND CORAL
SUNRISE
Function call :
letters = ['AL','SUPERMARKET']
remove_words(df=df, col='col 1', letters=letters)
Basically, I want to remove the given words when they appear at the start or at the end (note: only when they are a separate word).
For example, "EMARAT AL" should become "EMARAT".
Note that "LOREAL" should not become "LORE".
Code to build the df :
raw_data = {'col1': ['AL GHAITHA', 'AL ASEEL', 'EMARAT AL', 'LOREAL UAE',
                     'ISLAND CORAL', 'SUNRISE SUPERMARKET']
            }
df = pd.DataFrame(raw_data)
You may use
pattern = r'(?s)^(?:{0})\b|(.*)\b(?:{0})$'.format("|".join(map(re.escape, letters)))
df['col 1'] = df['col 1'].str.replace(pattern, r'\1', regex=True).str.strip()
The r'(?s)^(?:{0})\b|(.*)\b(?:{0})$'.format("|".join(map(re.escape, letters))) call builds a pattern like (?s)^(?:word1|word2)\b|(.*)\b(?:word1|word2)$, which matches any of the words as a whole word at the start or at the end of the string. The (?:...) groups keep ^, \b and $ applied to every alternative, and regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default.
When checking the word at the end of the string, the whole text before it is captured into Group 1, hence the replacement pattern contains the \1 placeholder to restore that text in the resulting string.
If your letters list contains only items composed of word characters, you may omit re.escape and pass letters directly to "|".join.
The .str.strip() call removes any resulting leading/trailing whitespace.
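The same whole-word idea can be tried without pandas. The sketch below is a variant, not the answer's exact pattern: it drops the capture group and instead consumes the adjacent whitespace directly, so no strip() is needed. The function name strip_edge_words is hypothetical, and the words are assumed to consist of word characters so that \b behaves as expected.

```python
import re

def strip_edge_words(text, words):
    """Remove any of the given words when it is the first or last whole word."""
    alternatives = "|".join(map(re.escape, words))
    # Left branch: word at the start plus the whitespace after it.
    # Right branch: word at the end plus the whitespace before it.
    pattern = rf"^(?:{alternatives})\b\s*|\s*\b(?:{alternatives})$"
    return re.sub(pattern, "", text)

names = ["AL GHAITHA", "AL ASEEL", "EMARAT AL", "LOREAL UAE",
         "ISLAND CORAL", "SUNRISE SUPERMARKET"]
for name in names:
    print(name, "->", strip_edge_words(name, ["AL", "SUPERMARKET"]))
```

With pandas, a function like this could be passed to Series.apply in place of the str.replace call.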

How to regex match everything but long words?

I would like to select all long words from a string: re.findall("[a-z]{3,}")
However, for certain reasons I can only use substitution. Hence I need to replace everything except words of 3 or more letters with a space (e.g. abc de1 fgh ij -> abc fgh).
What would such a regex look like?
The result should be all "[a-z]{3,}" concatenated by spaces. However, you can use substitution only.
Or in Python: Find a regex such that
re.sub(regex, " ", text) == " ".join(re.findall("[a-z]{3,}", text))
Here are some test cases:
import re
solution_regex="..."
for test_str in ["aaa aa aaa aa",
                 "aaa aa11",
                 "11aaa11 11aa11",
                 "aa aa1aa aaaa"
                 ]:
    expected_str = " ".join(re.findall("[a-z]{3,}", test_str))
    print(test_str, "->", expected_str)
    if re.sub(solution_regex, " ", test_str)!=expected_str:
        print("ERROR")
->
aaa aa aaa aa -> aaa aaa
aaa aa11 -> aaa
11aaa11 11aa11 -> aaa
aa aa1aa aaaa -> aaaa
Note that space is no different than any other symbol.
\b(?:[a-z,A-Z,_]{1,2}|\w*\d+\w*)\b
Explanation:
\b : the match must start and end at a word boundary
(?: ) : non-capturing group
[a-z,A-Z,_]{1,2} : one or two letters or underscores (the commas inside the class are literal, so a comma matches as well)
\w*\d+\w* : any word that contains at least one digit and consists of digits, '_' and letters
You can use the regex
(\s\b(\d*[a-z]\d*){1,2}\b)|(\s\b\d+\b)
and replace with an empty string. Here is Python code that does this:
import re
regex = r"(\s\b(\d*[a-z]\d*){1,2}\b)|(\s\b\d+\b)"
test_str = "abcd abc ad1r ab a11b a1 11a 1111 1111abcd a1b2c3d"
subst = ""
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0)
if result:
    print(result)
In AutoIt, this works for me:
#include <Array.au3>
$a = StringRegExp('abc de1 fgh ij 234234324 sdfsdfsdf wfwfwe', '(?i)[a-z]{3,}', 3)
ConsoleWrite(_ArrayToString($a, ' ') & @CRLF)
Result ==> abc fgh sdfsdfsdf wfwfwe
import re
regex = r"(?:^|\s)[^a-z\s]*[a-z]{0,2}[^a-z\s]*(?:\s|$)"
str = "abc de1 fgh ij"
subst = " "
result = re.sub(regex, subst, str)
print (result)
Output:
abc fgh
Explanation:
(?:^|\s) : non capture group, start of string or space
[^a-z\s]* : 0 or more any character that is not letter or space
[a-z]{0,2} : 0, 1 or 2 letters
[^a-z\s]* : 0 or more any character that is not letter or space
(?:\s|$) : non capture group, space or end of string
With the other ideas posted here, I came up with an answer. I can't believe I missed that:
([^a-z]+|(?<![a-z])[a-z]{1,2}(?![a-z]))+
https://regex101.com/r/IIxkki/2
Match either non-letters, or up to two letters bounded by non-letters.
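To check this pattern against the test cases from the question, a small harness like the one below can help. It is a sketch under one assumption that is not in the question: the result is trimmed with strip(), because a match at the very start or end of the string is replaced by a single space. The function name long_words_by_substitution is hypothetical.

```python
import re

# Pattern from the answer above: runs of non-letters and of isolated
# one- or two-letter words, merged into a single match.
short_or_nonletter = re.compile(r"([^a-z]+|(?<![a-z])[a-z]{1,2}(?![a-z]))+")

def long_words_by_substitution(text):
    # Assumption: trimming the edges afterwards is acceptable.
    return short_or_nonletter.sub(" ", text).strip()

for test_str in ["aaa aa aaa aa", "aaa aa11", "11aaa11 11aa11", "aa aa1aa aaaa"]:
    expected = " ".join(re.findall("[a-z]{3,}", test_str))
    actual = long_words_by_substitution(test_str)
    print(repr(test_str), "->", repr(actual), "OK" if actual == expected else "ERROR")
```

Without the strip() call the comparison fails whenever the input starts or ends with something other than a long word, since a substitution by a space cannot produce an empty edge.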

.Net Regular Expression(Regex)

VB.NET separate strings using regex split?
I'm getting a logic error with the pattern string variable; the error occurs after I extend the string from "(-)" to "(-)(+)(/)(*)".
Dim input As String = txtInput.Text
Dim pattern As String = "(-)(+)(/)(*)"
Dim substrings() As String = Regex.Split(input, pattern)
For Each match As String In substrings
    lstOutput.Items.Add(match)
Next
This is my output when my pattern string variable is "-"; it works fine:
input: dog-
output: dog
-
This is my desired output, but something is wrong with the code: it throws an error once I change the pattern to "(-)(+)(/)(*)", or even to
"(-)" + "(+)" + "(/)" + "(*)"
input: dog+cat/tree
output: dog
+
cat
/
tree
When the input from the textbox contains a space character:
input: dog+cat/ tree
output: dog
+
cat
/
tree
You need a character class, not a sequence of subpatterns inside separate capturing groups:
Dim pattern As String = "([+/*-])"
This pattern will match and capture into Group 1 (and thus, all the captured values will be part of the resulting array) a char that is either a +, /, * or -. Note the position of the hyphen: since it is the last char in the character class, it is treated as a literal -, not a range operator.
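The same behaviour can be tried quickly in Python, whose re.split also keeps captured separators in the result. This is only an illustrative sketch in a different language than the question; the variable names are hypothetical. It also shows why the original pattern fails: outside a character class, + and * are quantifiers, so "(+)" has nothing to repeat.

```python
import re

# The original pattern does not even compile: '+' and '*' need something to repeat.
try:
    re.compile("(-)(+)(/)(*)")
    original_compiles = True
except re.error:
    original_compiles = False

# Inside a character class the operators are literal; the trailing '-' is literal too.
operator_pattern = re.compile(r"([+/*-])")

print(original_compiles)
print(operator_pattern.split("dog+cat/tree"))
print(operator_pattern.split("dog-"))
```

A separator at the very end of the input yields a trailing empty string in the result, which may need to be filtered out before it is added to a list.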