test_str = '**Amount** : $25k **Name** : James **Excess** : None Returned \n **In Suit?** Y **Venue** : SF **Insurance** : N/A \n **FTSA** : None listed'
import re
regex = r"(?:^|[^.?*,!-]*(?<=[.?\s*,!-]))(n/a)(?=[\s.?*!,-])[^.?*,!-]*[.?*,!-]"
subst = ""
result = re.sub(regex, subst, test_str, 0, re.IGNORECASE | re.MULTILINE)
I tried to extract '**Insurance** : N/A' from the string, but the code above doesn't work. How can I make it work?
Thanks in advance!

I would treat the content like a (semi-structured) key-value file format.
You can match the key-value pairs with a regex like this:
(\*\*[a-zA-Z ?]+\*\*) : ((?:(?!\*\*).)*)(?= |$)
(\*\*[a-zA-Z ?]+\*\*) the key; you may have to adjust the character range
 :  the key-value separator, surrounded by spaces
((?:(?!\*\*).)*) the value, captured with a tempered greedy token: everything except a literal **
(?= |$) a lookahead requiring a separating space or the end of the string ($)
Sample Code:
import re
regex = r"(\*\*[a-zA-Z ?]+\*\*) : ((?:(?!\*\*).)*)(?= |$)"
test_str = "**Amount** : $25k **Name** : James **Excess** : None Returned \\n **In Suit?** : Y **Venue** : SF **Insurance** : N/A \\n **FTSA** : None listed"
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    # group(1) is the key, group(2) is the value
    if match.group(1) == "**Insurance**":
        print(match.group(2))
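As a rough sketch of the key-value idea, the pairs can also be collected into a dictionary and looked up by key. The pattern below is a simplified variant, not the exact one above: it assumes every key is wrapped in ** and followed by " : ". The names PAIR and parse_pairs are made up for illustration.

```python
import re

# Hypothetical helper pattern: a key between ** markers, then " : ", then a lazy
# value that stops (ignoring trailing whitespace) at the next ** or at the end.
PAIR = re.compile(r"\*\*(.+?)\*\* : (.*?)\s*(?=\*\*|$)")

def parse_pairs(text):
    """Return a dict mapping each key (without the ** markers) to its value."""
    return {m.group(1): m.group(2) for m in PAIR.finditer(text)}

test_str = ("**Amount** : $25k **Name** : James **Excess** : None Returned \n "
            "**In Suit?** : Y **Venue** : SF **Insurance** : N/A \n **FTSA** : None listed")
pairs = parse_pairs(test_str)
print("**Insurance** : " + pairs["Insurance"])  # prints: **Insurance** : N/A
```

Working from a dictionary means the same parse can answer lookups for any other key as well.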


match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should (even if present in match string).
From this, If I replace the matches with an empty string, it should return:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution: if your regex engine supports \K, we can try:
blue\K|.*?(?=blue|$)
This pattern says to match:
blue match "blue"
\K but then forget that match
| OR
.*? match anything else until reaching
(?=blue|$) the next "blue" or the end of the string
In JavaScript, which does not support \K, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
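The same callback idea can be sketched in Python, whose built-in re module has no \K either. This is only an illustration of the approach; the function name keep_blue is an assumption, not part of the original answer.

```python
import re

text = "blue, blueblue, b l u e"

def keep_blue(match):
    # Keep a whole "blue"; drop any other matched run of text.
    return match.group(0) if match.group(0) == "blue" else ""

# Either match "blue" itself, or lazily match everything up to the next "blue"
# (or the end of the string) so that it can be removed.
result = re.sub(r"blue|.*?(?=blue|$)", keep_blue, text)
print(result)  # blueblueblue
```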

Replace '-' with space if the next character is a letter not a digit and remove when it is at the start

I have a list of strings, i.e.
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
I want to remove the '-' when it is the first character of a string and is followed by letters rather than digits. If the '-' is preceded by a digit or letter and followed by letters, it should be replaced with a space.
So for the list slist I want the output as
["args", "-111111", "20 args", "20 - 20", "20-10", "args deep"]
I have tried
import re
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
nlist = list()
for estr in slist:
    nlist.append(re.sub("((^-[a-zA-Z])|([0-9]*-[a-zA-Z]))", "", estr))
print(nlist)
and I get the output
['rgs', '-111111', 'rgs', '20 - 20', '20-10', 'argseep']
You may use either of the following lines inside your loop:
nlist.append(re.sub(r"-(?=[a-zA-Z])", " ", estr).lstrip())
nlist.append(re.sub(r"-(?=[^\W\d_])", " ", estr).lstrip())
Result: ['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']
The -(?=[a-zA-Z]) pattern matches a hyphen before an ASCII letter (-(?=[^\W\d_]) matches a hyphen before any letter), and replaces the match with a space. Since - may be matched at the start of a string, the space may appear at that position, so .lstrip() is used to remove the space(s) there.
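Put together as a complete script, the lookahead approach might look like the sketch below. It uses the ASCII-only variant and the sample list from the question.

```python
import re

slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]

# Replace a hyphen that is directly followed by an ASCII letter with a space,
# then strip the space that a leading hyphen leaves at the start.
nlist = [re.sub(r"-(?=[a-zA-Z])", " ", estr).lstrip() for estr in slist]
print(nlist)
```

Hyphens followed by a digit or a space are never matched, so "-111111", "20 - 20" and "20-10" come through unchanged.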
Here, we might just want to capture the first letter after a starting -, then replace it with that letter only, maybe with an i flag expression similar to:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"^-([a-z])"
test_str = ("-args\n"
            "20 - 20\n")
subst = "\\1"
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0, re.MULTILINE | re.IGNORECASE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
const regex = /^-([a-z])/gmi;
const str = `-args
20 - 20`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
One option is to do two replacements. First, match the hyphen at the start of the string when only letters follow:
^-(?=[a-zA-Z]+$)
In the replacement, use an empty string.
Then capture one or more letters or digits in group 1, match -, and capture one or more letters in group 2:
^([a-zA-Z0-9]+)-([a-zA-Z]+)$
In the replacement, use r"\1 \2".
For example
import re
regex1 = r"^-(?=[a-zA-Z]+$)"
regex2 = r"^([a-zA-Z0-9]+)-([a-zA-Z]+)$"
slist = ["-args", "-111111", "20-args", "20 - 20", "20-10", "args-deep"]
slist = list(map(lambda s: re.sub(regex2, r"\1 \2", re.sub(regex1, "", s)), slist))
print(slist)
Output:
['args', '-111111', '20 args', '20 - 20', '20-10', 'args deep']

How to regex match everything but long words?

I would like to select all long words from a string: re.findall("[a-z]{3,}")
However, for some reason I can only use substitution. Hence I need to replace everything except words of three or more letters with a space (e.g. abc de1 fgh ij -> abc fgh).
What would such a regex look like?
The result should be all "[a-z]{3,}" matches joined by spaces, produced using substitution only.
Or in Python: Find a regex such that
re.sub(regex, " ", text) == " ".join(re.findall("[a-z]{3,}", text))
Here are some test cases:
import re
# solution_regex stands for the regex being asked for
for test_str in ["aaa aa aaa aa",
                 "aaa aa11",
                 "11aaa11 11aa11",
                 "aa aa1aa aaaa"]:
    expected_str = " ".join(re.findall("[a-z]{3,}", test_str))
    print(test_str, "->", expected_str)
    if re.sub(solution_regex, " ", test_str) != expected_str:
        print("ERROR")
aaa aa aaa aa -> aaa aaa
aaa aa11 -> aaa
11aaa11 11aa11 -> aaa
aa aa1aa aaaa -> aaaa
Note that space is no different than any other symbol.
\b means that the substring you are looking for starts and ends at a word boundary
(?: ) is a non-capturing group
\w*\d+\w* matches any word that contains at least one digit and consists of digits, '_' and letters
You can use the regex
(\s\b(\d*[a-z]\d*){1,2}\b)|(\s\b\d+\b)
and replace the matches with an empty string. Here is Python code for the same:
import re
regex = r"(\s\b(\d*[a-z]\d*){1,2}\b)|(\s\b\d+\b)"
test_str = "abcd abc ad1r ab a11b a1 11a 1111 1111abcd a1b2c3d"
subst = ""
# You can manually specify the number of replacements by changing the 4th argument
result = re.sub(regex, subst, test_str, 0)
if result:
    print(result)
In AutoIt this works for me:
#include <Array.au3>
$a = StringRegExp('abc de1 fgh ij 234234324 sdfsdfsdf wfwfwe', '(?i)[a-z]{3,}', 3)
ConsoleWrite(_ArrayToString($a, ' ') & @CRLF)
Result ==> abc fgh sdfsdfsdf wfwfwe
import re
regex = r"(?:^|\s)[^a-z\s]*[a-z]{0,2}[^a-z\s]*(?:\s|$)"
str = "abc de1 fgh ij"
subst = " "
result = re.sub(regex, subst, str)
print (result)
abc fgh
(?:^|\s) : non capture group, start of string or space
[^a-z\s]* : 0 or more any character that is not letter or space
[a-z]{0,2} : 0, 1 or 2 letters
[^a-z\s]* : 0 or more any character that is not letter or space
(?:\s|$) : non capture group, space or end of string
With the other ideas posted here, I came up with an answer. I can't believe I missed that:
Match either non-letters, or up to two letters bounded by non-letters.
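The pattern this answer refers to is not shown here. One possible reading of that description, offered only as an assumption, is the candidate below. Because a substitution leaves a space where leading or trailing text was removed, this sketch compares the results after stripping, so it satisfies the stated equality only up to surrounding whitespace.

```python
import re

# Assumed candidate, not necessarily the original author's pattern:
# a run of non-letters, or 1-2 letters with no letter on either side, repeated.
candidate = r"(?:[^a-z]+|(?<![a-z])[a-z]{1,2}(?![a-z]))+"

tests = ["aaa aa aaa aa", "aaa aa11", "11aaa11 11aa11", "aa aa1aa aaaa"]
results = {}
for text in tests:
    expected = " ".join(re.findall("[a-z]{3,}", text))
    # strip() drops the single space left where leading/trailing text was replaced
    results[text] = re.sub(candidate, " ", text).strip()
    print(repr(text), "->", repr(results[text]), "expected", repr(expected))
```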

Regex to match repeated pattern after a string

I need a regex that extracts a repeated pattern after a specific word (here, Limits::).
I have a test string. Let's say the relevant text is always delimited as !Limits::****!:
*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin!
fasdfakl skdfkas sflas fasf sdf afasf
I want to extract only the words.
I have tried (?:!\w+::(?:(\w+)/(\w+)/(\w+)/(\w+)))|(?:,(\w+)/(\w+)/(\w+)/(\w+))+.*! without success.
Regular expressions:
/(W.*|C.*)(?=\/|!|,)/g : match text beginning with W or C, up to the last position that is followed by /, ! or ,
/\/|,.*(?=,)|,/ : split the string returned by the first RegExp on /, on a comma together with any characters up to the last comma, or on a single comma
var str = "*ksjfl kfj sdfasdfaf dfasf asd sdf a dfasd fdaf ad f afdfaf dfad bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,WLo2/WHi2/WHi/WHi2,.hier repeated pattern..,WLon/WHin/CLon/CHin! fasdfakl skdfkas sflas fasf sdf afasf";
var res = str.match(/(W.*|C.*)(?=\/|!|,)/g)[0].split(/\/|,.*(?=,)|,/);
document.body.textContent = res.join(" ")
I don't know what the ending delimiter is, so if it matters, update your question and I'll amend this expression:
Searches for Limits::, then repeating strings ending with /, your words will be in group 1.
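As a hedged Python sketch of the same idea, one can first isolate the text between !Limits:: and the closing !, then split it into groups and words. The sample string is simplified (the ".hier repeated pattern.." placeholder from the question is left out), and the helper name extract_limits is made up for illustration.

```python
import re

# Simplified sample based on the question; the placeholder segment is omitted.
text = ("bla bla ksfajs ldsfskj !Limits::WLo1/WHi1/WHi1/WHi1,"
        "WLo2/WHi2/WHi/WHi2,WLon/WHin/CLon/CHin! fasdfakl skdfkas")

def extract_limits(s):
    """Return the slash-separated words of each comma-separated group."""
    block = re.search(r"!Limits::([^!]*)!", s)
    if block is None:
        return []
    return [group.split("/") for group in block.group(1).split(",")]

groups = extract_limits(text)
words = [word for group in groups for word in group]
print(words)
```

Splitting in two steps avoids the problem that a repeated capture group in a single regex only retains its last repetition.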

Regex that matches specific spaces

I've been trying to write this regex for a while now. I'd like to create one that matches all the spaces in a text, except those inside a string literal.
123 Foo "String with spaces"
Space between 123 and Foo would match, as well as the one between Foo and "String with spaces", but only those two.
A common, simple strategy for this is to count the number of quotes leading up to your location in the string. If the count is odd, you are inside a quoted string; if the amount is even, you are outside a quoted string. I can't think of a way to do this in regular expressions, but you could use this strategy to filter the results.
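A minimal sketch of that quote-counting filter, assuming the text contains no escaped quotes; the function name spaces_outside_quotes is hypothetical.

```python
def spaces_outside_quotes(text):
    """Return the indexes of spaces that are not inside a double-quoted string.

    Assumes quotes are never escaped: every '"' toggles the inside/outside state,
    which is the same as checking whether the quote count so far is odd or even.
    """
    positions = []
    inside = False
    for index, char in enumerate(text):
        if char == '"':
            inside = not inside
        elif char == " " and not inside:
            positions.append(index)
    return positions

print(spaces_outside_quotes('123 Foo "String with spaces"'))  # [3, 7]
```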
You could use re.findall to match either a string or a space and then afterwards inspect the matches:
import re
hits = re.findall("\"(?:\\\\.|[^\\\"])*\"|[ ]", 'foo bar baz "another\\" test\" and done')
for h in hits:
    print("found: [%s]" % h)
found: [ ]
found: [ ]
found: [ ]
found: ["another\" test"]
found: [ ]
found: [ ]
A short explanation:
" # match a double quote
(?: # start non-capture group 1
\\\\. # match a backslash followed by any character (except line breaks)
| # OR
[^\\\"] # match any character except a '\' and '"'
)* # end non-capture group 1 and repeat it zero or more times
" # match a double quote
| # OR
[ ] # match a single space
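Building on the pattern explained above, a Python 3 sketch can put the space alternative in a capture group, so that only spaces outside string literals are reported. The pattern is written as a raw string here, and the names TOKEN and outside_space_positions are assumptions made for illustration.

```python
import re

# Either consume a whole double-quoted string (with backslash escapes),
# or capture a single space in group 1.
TOKEN = re.compile(r'"(?:\\.|[^\\"])*"|( )')

def outside_space_positions(text):
    """Return the indexes of spaces that lie outside double-quoted strings."""
    # group(1) is None when the match was a quoted string
    return [m.start(1) for m in TOKEN.finditer(text) if m.group(1)]

print(outside_space_positions('123 Foo "String with spaces"'))  # [3, 7]
```

Because the quoted string is consumed as a single match, the spaces inside it are never visited individually.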
If 123 Foo "String with spaces" is the structure of every line, that is, unquoted text followed by quoted text, you could create two groups, one for the unquoted text and one for the quoted text, and handle them separately.
For example, with the regex (.*)(".*"), $1 should contain 123 Foo and $2 should contain "String with spaces".
Java example:
String aux = "123 Foo \"String with spaces\"";
String regex = "(.*)(\".*\")";
String unquoted = aux.replaceAll(regex, "$1").replace(" ", "");
String quoted = aux.replaceAll(regex, "$2");
JavaScript example:
var str = '1 23 Foo "String with spaces"';
var re = new RegExp('(.*)(".*")');
var unquoted = str.replace(re, "$1");
var quoted = str.replace(re, "$2");
document.write(unquoted.split(' ').join('') + quoted);