Regex for matching this string

With Python (the regex module), I am trying to substitute 'x' for each letter 'c' in those substrings of a text that are:
delimited by 'a' on the left and 'b' on the right, and
contain no other 'a's or 'b's.
Example:
cuacducucibcl -> cuaxduxuxibcl
How can I do this?
Thank you.

With the standard re module in Python, you can use a[^ab]+b to match a substring that starts with a, ends with b, and has no occurrence of a or b in between, then supply a replacement function to handle the replacement of c:
>>> import re
>>> re.sub('a[^ab]+b', lambda m: m.group(0).replace('c', 'x'), 'cuacducucibcl')
'cuaxduxuxibcl'
See the documentation of re.sub for reference.
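As a hedged sketch, the same call can be wrapped in a small helper to check how it behaves when a text holds several delimited runs. The helper name replace_c_between_a_and_b and the second test string are illustrative assumptions, not part of the original answer.

```python
import re

def replace_c_between_a_and_b(text):
    """Replace 'c' with 'x' inside runs that start with 'a', end with 'b',
    and contain no other 'a' or 'b' (hypothetical helper name)."""
    return re.sub(r'a[^ab]+b',
                  lambda m: m.group(0).replace('c', 'x'),
                  text)

print(replace_c_between_a_and_b('cuacducucibcl'))  # cuaxduxuxibcl
print(replace_c_between_a_and_b('cacb-c-acacb'))   # caxb-c-acaxb
```

In the second string the run 'aca' is left alone because it is never closed by a 'b' before the next 'a'.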

Use the regex below to replace each matched c with x. It relies on \G, so you need to install the third-party regex module.
>>> import regex
>>> s = 'cuacducucibcl'
>>> regex.sub(r'((?:a|(?<!^)\G)[^abc\n]*)c', r'\1x', s)
'cuaxduxuxibcl'

Related

python3: regex needs a character to match, but I don't want it in the output

I have this string:
Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/
I am trying to extract 839518730.47873.0000 from it. My regex works for this exact string, but if there is any digit before the first =, it goes wrong.
Without a digit:
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'839518730.47873.0000'
With a digit:
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group()
'2'
Is there any way to extract only '839518730.47873.0000', no matter what else is in the string?
I also tried:
>>> m=re.search('=[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'=839518730.47873.0000'
but then the output starts with '=', which I don't want.
Any ideas?
Thank you.

If your substring always comes after the first =, you can just use a capture group with the =([\d.]+) pattern:
import re
result = ""
m = re.search(r'=([0-9.]+)','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
if m:
    result = m.group(1)  # Get the Group 1 value only
print(result)
The main point is that you match everything you do not need, and match and capture (with unescaped parentheses) the part of the pattern you do need. The value you need is in Group 1.
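A related option, sketched here as an alternative rather than as part of the answer above, is a lookbehind: (?<==) requires an = immediately before the match but does not include it, so the whole match is already the bare value. The variable names header and value are illustrative.

```python
import re

header = 'Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/'

# (?<==) asserts that '=' sits immediately before the match without consuming it,
# so group(0) is the value on its own and no capture group is needed.
m = re.search(r'(?<==)[0-9.]+', header)
value = m.group(0) if m else ""
print(value)  # 839518730.47873.0000
```

The capture-group form is usually easier to read; the lookbehind form is handy when the calling code only ever looks at the full match.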

You can use word boundaries:
\b[\d.]+
Or, to make the match more targeted, use a lookahead for the next semicolon after the matched text:
\b[\d.]+(?=\s*;)
Update:
>>> m = re.search(r'\b[\d.]+', 'Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group(0)
'839518730.47873.0000'
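To see what the lookahead variant adds, here is a small sketch comparing the two patterns. The header string is made up for illustration (it is not from the question) and contains a stray number before the cookie value.

```python
import re

# Made-up header for illustration: a standalone number appears before the value.
header = 'Set-Cookie: id 42 BIGip=1.2.3; path=/'

# Word boundary only: the first standalone run of digits and dots wins.
loose = re.search(r'\b[\d.]+', header).group(0)

# With the lookahead: the run must be followed by optional whitespace and ';'.
targeted = re.search(r'\b[\d.]+(?=\s*;)', header).group(0)

print(loose)     # 42
print(targeted)  # 1.2.3
```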

Finding out unknown matched words

I have a regex pattern:
import regex as re
re.sub(r'(.*)\bHello (.*) BGC$\b', "OTR", 'Hello People BGC')
This performs the replacement and returns OTR, but how do I find out which characters were matched by the (.*) groups?
Using regex==2016.1.10, Python 3.5.1

Compile the pattern and then call match() and sub() separately:
>>> pattern = re.compile(r'^Hello (.*?) BGC$')
>>> s = 'Hello People BGC'
>>> pattern.match(s).group(1)
'People'
>>> pattern.sub("OTR", s)
'OTR'
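If both the captured text and the replaced string are needed in a single pass, one possible approach is to give sub() a function instead of a string. This is only a sketch: it uses the standard re module (the question imports regex as re, which accepts the same call), and the names captured and record_and_replace are hypothetical.

```python
import re

pattern = re.compile(r'^Hello (.*?) BGC$')
captured = []

def record_and_replace(match):
    # Remember what group 1 matched, then return the replacement text.
    captured.append(match.group(1))
    return "OTR"

result = pattern.sub(record_and_replace, 'Hello People BGC')
print(result)    # OTR
print(captured)  # ['People']
```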

Regex to catch a string without () in 3 patterns like abc(ef), (ef)abc and (ef)abc(gh)

I have tested this regex:
(?<=\))(.+?)(?=\()|(?<=\))(.+?)\b|(.+?)(?=\()
but it doesn't work for strings like (ef)abc(gh).
I get the result "(ef)abc".
But these three regexes, (?<=\))(.+?)(?=\(), (?<=\))(.+?)\b and (.+?)(?=\(),
do work separately for "(ef)abc(gh)", "(ef)abc" and "abc(ef)", respectively.
Can anyone tell me where the problem is, or how I can get the expected result?

Assuming you want to match the text between the parenthesized elements, try this:
^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$
^ - beginning of string
(?:\(\w*\))? - non-capturing group, matches 0 or more word characters (letters, digits, underscore) within parens; the whole group is optional
([\w]*) - capturing group, matches 0 or more word characters
(?:\(\w*\))? - non-capturing group, matches 0 or more word characters within parens; the whole group is optional
$ - end of string
You haven't specified what language you might be using, but here is an example in Python:
>>> import re
>>> string = "(ef)abc(gh)"
>>> string2 = "(ef)abc"
>>> string3 = "abc(gh)"
>>> p = re.compile(r'^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$')
>>> m = re.search(p, string)
>>> m2 = re.search(p, string2)
>>> m3 = re.search(p, string3)
>>> print(m.groups()[0])
abc
>>> print(m2.groups()[0])
abc
>>> print(m3.groups()[0])
abc

\([^)]+\)|([^()\n]+)
Try this and just grab the capture group. See the demo:
https://regex101.com/r/tX2bH4/6
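A minimal Python sketch of "grabbing the capture group" with this pattern, assuming the three sample strings from the question; the helper name text_outside_parens is hypothetical.

```python
import re

pattern = re.compile(r'\([^)]+\)|([^()\n]+)')

def text_outside_parens(s):
    # findall returns group 1 for every match. Parenthesized chunks match the
    # left alternative and leave group 1 empty, so the empty strings are dropped.
    return [g for g in pattern.findall(s) if g]

for s in ['(ef)abc(gh)', '(ef)abc', 'abc(ef)']:
    print(s, '->', text_outside_parens(s))
```

The left alternative consumes each parenthesized part so that the right alternative never sees it; only the right alternative captures.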

Your problem is that (.+?)(?=\() matches "(ef)abc" in "(ef)abc(gh)".
The easiest solution to this problem is to be more explicit about what you are looking for, in this case by replacing "any character" (.) with "any character that is not a parenthesis" ([^\(\)]).
(?<=\))([^\(\)]+?)(?=\()|(?<=\))([^\(\)]+?)\b|([^\(\)]+?)(?=\()
A cleaner regexp would be
(?:(?<=^)|(?<=\)))([^\(\)]+)(?:(?=\()|(?=$))

Regular expression to match all strings that don't contain a pattern

I have a pattern 'NewTree' and I want to get all strings that don't contain it. How do I use a regex to do the filtering?
So if I have 1.BoostKite 2.SetTree 3. ComeNewTreeNow
Then the output should be BoostKite and SetTree.
Any suggestions? I want a regex that works anywhere and does not rely on any language-specific function.

You can try using a negative lookahead if you want to use a regular expression:
^(?!.*NewTree).*$
Alternatively, you can use alternation: place what you want to exclude on the left (in effect saying "throw this away, it's garbage") and what you want to match in a capturing group on the right.
\w*NewTree\w*|([a-zA-Z]+)
In Python:
(The strings are in a list here, since you mentioned an 'array' in your comment.)
>>> import re
>>> regex = re.compile(r'^(?!.*NewTree).*$')
>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if regex.match(x)]
>>> matches
['BoostKite', 'SetTree']
If it is just one long string of multiple words and you want to ignore the words that contain NewTree:
>>> s = '1.BoostKite 2.SetTree 3. ComeNewTreeNow 4. foo 5. bar'
>>> list(filter(None, re.findall(r'\w*NewTree\w*|([a-zA-Z]+)', s)))
['BoostKite', 'SetTree', 'foo', 'bar']
You can do this without a regular expression as well.
>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if "NewTree" not in x]
>>> matches
['BoostKite', 'SetTree']

Match each word against the regex \w+NewTree\b. It matches if the word ends with NewTree.
Use the i modifier for a case-insensitive match.
Use \w* instead of \w+ in the regex above if you also want to match the word NewTree itself.
If you want to test whether a word contains NewTree, try the regex \w*NewTree\w*\b.
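Since these patterns detect words that do contain NewTree, the filtering itself happens by keeping the words that fail to match. A rough Python sketch, where the word list and the names contains_newtree and kept are made up for illustration:

```python
import re

# re.IGNORECASE plays the role of the "i modifier" mentioned above.
contains_newtree = re.compile(r'\w*NewTree\w*\b', re.IGNORECASE)

words = ['BoostKite', 'SetTree', 'ComeNewTreeNow', 'newtree']

# Keep only the words in which the pattern is NOT found.
kept = [w for w in words if not contains_newtree.search(w)]
print(kept)  # ['BoostKite', 'SetTree']
```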

I think you can do this in general in the manner of the following example for your specific case:
^(([^N]|N[^e]|Ne[^w]|New[^T]|NewT[^r]|NewTr[^e]|NewTre[^e])+)?(.|..|...|....|.....)?$
So far what I have here is a near miss. It will not match any string that has substring NewTree. But it will not match every string that is free of the substring NewTree. In particular it will not match Nvwxyz.

How to match regex with same format but different in terms of character set?

Suppose I have a string and I want to match only the parameters whose value is empty, not those whose value is present.
For example: &lang=&val=1233
I need only &lang, not &val, since it has an actual value.
I have this regex:
&(.+)=(?!\s\S)
which matches &lang=&val= in the string.
Can anyone help me out?

Use the following regular expression:
(?:(?<=\?)|&)[^=]+=(?=&|$)
It can be explained as:
(?: ....): non-capturing (does not create a group); this may not be needed, depending on your purpose.
\?: escaped ? to match ? literally.
(?<=\?): means "preceded by ?"; the ? is not included in the result.
(?=&|$): means "followed by &" or "at the end of the input".
Below are sample tests in the Python interactive shell:
>>> import re
>>> pattern = r'(?:(?<=\?)|&)[^=]+=(?=&|$)'
>>> re.findall(pattern, '&lang=&val=')
['&lang=', '&val=']
>>> re.findall(pattern, '&lang=&val=1233')
['&lang=']
>>> re.findall(pattern, '&lang=&val=&val2=123&val3=')
['&lang=', '&val=', '&val3=']
>>> re.findall(pattern, '?lang=&val=&val2=123&val3=')
['lang=', '&val=', '&val3=']
>>> re.findall(pattern, '?lang=blah&val=&val2=123&val3=')
['&val=', '&val3=']
>>> re.findall(pattern, 'www.html.com?user=&lang=eng&code=.in')

Do you mean
(&|\?)([^&=]+)=(&|$)
(You can use non-capturing groups if you need to.)
But I would just build a hash of all query string parameters and pick the keys without values. It is cheaper.
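The "parse the query string, then pick the keys without values" idea could look roughly like this in Python. This is a sketch under assumptions: it uses urllib.parse.parse_qsl from the standard library and works on a list of (name, value) pairs rather than a hash, and the helper name empty_params is hypothetical.

```python
from urllib.parse import parse_qsl

def empty_params(query):
    """Return the names of query parameters whose value is empty."""
    # keep_blank_values=True keeps parameters such as "lang=" that
    # parse_qsl would otherwise drop.
    pairs = parse_qsl(query.lstrip('?&'), keep_blank_values=True)
    return [name for name, value in pairs if value == '']

print(empty_params('&lang=&val=1233'))             # ['lang']
print(empty_params('?lang=&val=&val2=123&val3='))  # ['lang', 'val', 'val3']
```

Working on pairs instead of a dict keeps repeated parameter names visible; build a dict from the pairs if only the last value per name matters.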

Try this:
[?&]([^&]+)=(&|$)
The first group will have the name of your parameter.
Note that this regex will also catch an empty first parameter (val1 in foo.php?val1=&val2=ok)

Try this one:
(&([^=]+))=(?=&)
