Delete multiple characters between separators with regular expressions - regex

I have a string like "some_{abcd_etc}_text".
Everything between { } should be removed, including the {} itself.
I need only the string "some_text" at the end.
How can this be done with a regex?

You could use this expression:
{.*?}

Sure, just replace this with an empty string:
{[^}]+}
Here is a Python example:
>>> from re import sub
>>> s = r'some_{abcd_etc}_text'
>>> sub(r'{[^}]+}', '', s)
'some__text'
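Note that the substitution leaves both underscores in place, which is why the result is 'some__text' rather than the 'some_text' asked for. A minimal sketch of one way to get the latter, assuming it is acceptable to also drop one underscore directly before the braces (the helper name strip_braced is hypothetical):

```python
import re

def strip_braced(text):
    # Remove "{...}" together with one optional underscore directly before it.
    # The braces are escaped for readability; [^}]* also handles an empty "{}".
    return re.sub(r'_?\{[^}]*\}', '', text)

print(strip_braced('some_{abcd_etc}_text'))  # some_text
```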

Related

Regex to not match a specific string, but with additional check

So for example I have this string
var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'
I want to be able to find all \r\n and replace it with temp\r\n, but I want to ignore column3\r\n
I tried ^(?!.*column3).*$\r\n but the \r\n part does not work.
You want to use a negative lookbehind, that is, make the substitution only when \r\n is not preceded by column3:
re.sub(r'(?<!column3)\r\n', r'temp\r\n', var)
For example:
>>> import re
>>>
>>> var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'
>>> new_text = re.sub(r'(?<!column3)\r\n', r'temp\r\n', var)
>>> new_text
'column1;column2;column3\r\nval1;val2;val3temp\r\n;val4;val5;val6temp\r\n'
>>>
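The same idea can be wrapped in a small helper. This is only a sketch: the function name add_before_newline and its parameters are made up for illustration, and it relies on the skipped text being a fixed literal, since a lookbehind in the standard re module must have a fixed width.

```python
import re

def add_before_newline(text, suffix, skip_after):
    # Build a negative lookbehind from a literal; re.escape keeps any
    # special characters in skip_after from being read as regex syntax.
    pattern = r'(?<!' + re.escape(skip_after) + r')\r\n'
    # A function replacement avoids escape processing of the suffix.
    return re.sub(pattern, lambda m: suffix + m.group(0), text)

var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'
print(repr(add_before_newline(var, 'temp', 'column3')))
```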

Comparing strings with regex

I basically want to match strings like: "something", "some,thing", "some,one,thing", but I want to not match expressions like: ',thing', '_thing,' , 'some_thing'.
The pattern I want to match is: A string beginning with only letters and the rest of the body can be a comma, space or letters.
Here's what I did:
import re
x=re.compile('^[a-zA-z][a-zA-z, ]*') #there's space in the 2nd expression here
stri='some_thing'
x.match(stri)
It gives me:
<_sre.SRE_Match object; span=(0, 4), match='some'>
My regex somewhat works, but it only extracts the part of the string that matches. I want to compare the entire string against the pattern and get False if it does not match. How do I do this?
You use [A-z], which matches more than you think: the range also covers the characters [, \, ], ^, _ and `.
If you want to match [a-zA-Z] in both places, you can use the case-insensitive flag, and anchor the pattern with $ so that the whole string has to match:
import re
x=re.compile('^[a-z][a-z, ]*$', re.IGNORECASE)
stri='some,thing'
if x.match(stri):
    print("Match")
else:
    print("No match")
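To see why the original character class is too permissive, this short sketch lists the characters that sit between 'Z' and 'a' in ASCII and are therefore accepted by a range written as A-z. The names loose and strict are illustrative only.

```python
import re

# Characters with code points strictly between 'Z' and 'a'.
extras = [chr(c) for c in range(ord('Z') + 1, ord('a'))]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# With the A-z range the underscore is accepted, so the whole string matches.
loose = re.compile(r'^[a-zA-z][a-zA-z, ]*$')
strict = re.compile(r'^[a-zA-Z][a-zA-Z, ]*$')
print(bool(loose.match('some_thing')))   # True
print(bool(strict.match('some_thing')))  # False
```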
The easiest way would be to just compare the result to the original string.
import re
x=re.compile('^[a-zA-Z][a-zA-Z, ]*')
s='some_thing'
x.match(s).group(0) == s #-> False
s = 'some thing'
x.match(s).group(0) == s #-> True
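Comparing against group(0) raises an AttributeError when match() returns None, for example for ',thing'. A hedged alternative is re.fullmatch (available since Python 3.4), which only succeeds if the entire string matches. The helper name is_valid is hypothetical.

```python
import re

pattern = re.compile(r'[a-zA-Z][a-zA-Z, ]*')

def is_valid(text):
    # fullmatch returns None unless the pattern covers the whole string.
    return pattern.fullmatch(text) is not None

for candidate in ['something', 'some,one,thing', ',thing', '_thing,', 'some_thing']:
    print(candidate, is_valid(candidate))
```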

Finding out unknown matched words

I have a regex pattern:
import regex as re
re.sub(r'(.*)\bHello (.*) BGC$\b', "OTR", 'Hello People BGC')
This replaces the whole string with OTR, but how do I find out which characters were matched by each (.*)?
Using regex==2016.1.10, Python 3.5.1
Compile the pattern and then call match() and sub() separately:
>>> pattern = re.compile(r'^Hello (.*?) BGC$')
>>> s = 'Hello People BGC'
>>> pattern.match(s).group(1)
'People'
>>> pattern.sub("OTR", s)
'OTR'
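Another option, sketched here with the standard re module rather than the regex package used in the question, is to pass a function as the replacement and record the groups as each match is replaced. The names captured and record are illustrative only.

```python
import re

captured = []

def record(match):
    # Store the text of every capturing group, then return the replacement.
    captured.append(match.groups())
    return 'OTR'

result = re.sub(r'(.*)\bHello (.*) BGC$', record, 'Hello People BGC')
print(result)    # OTR
print(captured)  # [('', 'People')]
```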

Also get the string before a numeric search using a regular expression in Python

I have a string from which I need to extract "PASS_MAX_DAYS 180" with re and then replace it with some other value using re.sub, but my regex does not return the whole string:
>>> _file = '#\nPASS_MAX_DAYS 180\nPASS_MIN_DAYS 1\nPASS_WARN_AGE 8\n'
>>> re.findall(r'PASS_MAX_DAYS\s*\b([0-9]{1,2}|1[0-7][0-9]|180)\b', _file, re.M)
['180']
I am not sure where I am going wrong. Any suggestions, please?
Turn the capturing group into a non-capturing group: when the pattern contains capturing groups, re.findall returns only the text captured by those groups rather than the whole match.
r"PASS_MAX_DAYS\s*\b(?:[0-9]{1,2}|1[0-7][0-9]|180)\b"
Example:
>>> _file = '#\nPASS_MAX_DAYS 180\nPASS_MIN_DAYS 1\nPASS_WARN_AGE 8\n'
>>> re.findall(r'PASS_MAX_DAYS\s*\b(?:[0-9]{1,2}|1[0-7][0-9]|180)\b', _file, re.M)
['PASS_MAX_DAYS 180']
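For the replacement step mentioned in the question, one possible sketch keeps the key in a capturing group and swaps only the number. The new value 90 is an arbitrary placeholder, and \g<1> is used instead of \1 so that the digits that follow are not read as part of the group number.

```python
import re

_file = '#\nPASS_MAX_DAYS 180\nPASS_MIN_DAYS 1\nPASS_WARN_AGE 8\n'

# Group 1 holds the key and the whitespace after it; \d+ is the old value.
updated = re.sub(r'^(PASS_MAX_DAYS\s+)\d+\b', r'\g<1>90', _file, flags=re.M)
print(repr(updated))
```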

Regex for matching this string

With Python (regex module), I am trying to substitute 'x' for each letter 'c' in those substrings of a text that are:
delimited by 'a', at the left, and 'b' at the right, and
with no more 'a's and 'b's in them.
Example:
cuacducucibcl -> cuaxduxuxibcl
How can I do this?
Thank you.
With the standard re module in Python, you can use a[^ab]+b to match a substring that starts with a, ends with b, and has no other a or b in between, then supply a replacement function that takes care of replacing c:
>>> import re
>>> re.sub('a[^ab]+b', lambda m: m.group(0).replace('c', 'x'), 'cuacducucibcl')
'cuaxduxuxibcl'
See the documentation of re.sub for reference.
Use the regex below and replace the matched c's with x. For this, you need to install the external regex module.
>>> import regex
>>> s = 'cuacducucibcl'
>>> regex.sub(r'((?:a|(?<!^)\G)[^abc\n]*)c', r'\1x', s)
'cuaxduxuxibcl'