Regular Expression for matching word - regex

This may be very easy, but for some reason I am unable to get the expression right. I want to find the position/index of all matching words in a given string, for example
"THIS IS AND NAND XOR NOR AATD". I want to find the index of each matching string that starts with A, can have any character A-Z in between, and must end with T or D. So the result should look like [9,AND][14,AND][26,AAT][27,ATD]
My expression (?s)(A.[TD]) is missing the last index. Thanks in advance. I am using Python.

If you are trying to do this by using a regular expression, you need a Positive Lookahead assertion. I replaced the dot in your regular expression with [A-Z] since you stated you want to match word characters.
>>> import re
>>> p = re.compile(r'(?=(A[A-Z][TD]))')
>>> for m in p.finditer('THIS IS AND NAND XOR NOR AATD'):
...     print([m.start() + 1, m.group(1)])
...
[9, 'AND']
[14, 'AND']
[26, 'AAT']
[27, 'ATD']

You're not actually matching words but sequences, and the problem is that you are looking at capturing overlapping sequences.
See Overlapping regex matches for a discussion on the subject.
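To see why a plain pattern misses the last index, it can help to run both variants side by side. The following is a minimal sketch; the helper name `find_overlapping` is made up for illustration, and positions are 1-based as in the question.

```python
import re

def find_overlapping(pattern, text):
    """Return [1-based position, match] pairs, including overlapping matches."""
    # Wrapping the pattern in a lookahead makes each match zero-width, so the
    # scan advances one character at a time instead of skipping past a match.
    lookahead = re.compile(r'(?=(' + pattern + r'))')
    return [[m.start() + 1, m.group(1)] for m in lookahead.finditer(text)]

text = 'THIS IS AND NAND XOR NOR AATD'

# A plain scan consumes 'AAT', so the overlapping 'ATD' is never tried.
plain = [m.group() for m in re.finditer(r'A[A-Z][TD]', text)]
overlapping = find_overlapping(r'A[A-Z][TD]', text)

print(plain)        # ['AND', 'AND', 'AAT']
print(overlapping)  # [[9, 'AND'], [14, 'AND'], [26, 'AAT'], [27, 'ATD']]
```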

First match the text using:
/^(.*)(A[A-Z]*[TD])/g
The index of the matched element is then the length of the first captured group.
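The prefix-length idea can be sketched in Python as follows. This is only an illustration, under two assumptions that are not in the answer above: the prefix group is made lazy (`.*?`) so that the first match is found rather than the last, and the middle is limited to a single `[A-Z]` as in the question. It reports one match only.

```python
import re

text = 'THIS IS AND NAND XOR NOR AATD'

# Group 1 is everything before the match, group 2 is the match itself.
m = re.match(r'^(.*?)(A[A-Z][TD])', text)
if m:
    index = len(m.group(1))       # 0-based index of the match
    print(index + 1, m.group(2))  # 9 AND
```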

Related

python regular expression: how to extract 'A =BC= D' -> 'BC'

I'm at a loss because I don't know how to write a Python regular expression to extract particular strings, such as A =BC= D =EF= -> 'BC', 'EF'. I searched a lot but couldn't work out how to do this. Please help.
Something like this
=..=
regex.101
result:
Match 1
Full match 2-6 =BC=
Match 2
Full match 9-13 =EF=
Here is a nice tutorial:
Regex tutorial — A quick cheatsheet by examples
You could use =([^=]+)= to match one or more characters other than = between a pair of equals signs. The contents between the equals signs can then be extracted using the capture group.
If you want to match exactly two characters within equal signs, =([^=]{2})= should do.
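A short sketch of both patterns with `re.findall`, which returns the contents of the capture group when the pattern has exactly one. The extra `G =HIJ=` at the end of the sample string is an assumption added here to show where the two patterns differ.

```python
import re

text = "A =BC= D =EF= G =HIJ="

# One or more non-'=' characters between a pair of equals signs.
any_length = re.findall(r'=([^=]+)=', text)

# Exactly two non-'=' characters between a pair of equals signs.
exactly_two = re.findall(r'=([^=]{2})=', text)

print(any_length)   # ['BC', 'EF', 'HIJ']
print(exactly_two)  # ['BC', 'EF']
```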
First you'll need to import the re module
import re
Then you can use re.findall(pattern, string) to get a list of all the substrings that match your pattern.
It's not clear from your question what defines the 'particular strings' you are looking for. Assuming you are looking for everything between two equals signs, but not greedily (not including equals signs inside), you could use the regex "=(.*?)=".
import re
m = re.findall("=(.*?)=", "A =BC= D =EF=")
Result:
>>> m
['BC', 'EF']

regular expression to contain all strings that don't contain a pattern

I have a pattern 'NewTree' and I want to get all strings that don't contain this pattern 'NewTree'. How do I use regex to do the filter?
So if I have 1.BoostKite 2.SetTree 3. ComeNewTreeNow
Then the output should be BoostKite and SetTree.
Any suggestions? I wanted regex that can work anywhere and not use any language specific function.
You can try using a Negative Lookahead if you want to use a regular expression.
^(?!.*NewTree).*$
Alternatively, you can use the alternation operator in context: place what you want to exclude on the left (in effect saying "throw this away, it's garbage") and place what you want to match in a capturing group on the right.
\w*NewTree\w*|([a-zA-Z]+)
In Python:
(Assuming the strings are in a list, since you mentioned an 'array' in your comment.)
>>> import re
>>> regex = re.compile(r'^(?!.*NewTree).*$')
>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if regex.match(x)]
>>> matches
['BoostKite', 'SetTree']
If it is just a long string of multiple words and you want to ignore the words that contain NewTree:
>>> s = '1.BoostKite 2.SetTree 3. ComeNewTreeNow 4. foo 5. bar'
>>> list(filter(None, re.findall(r'\w*NewTree\w*|([a-zA-Z]+)', s)))
['BoostKite', 'SetTree', 'foo', 'bar']
You can do this without a regular expression as well.
>>> mylst = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree']
>>> matches = [x for x in mylst if "NewTree" not in x]
>>> matches
['BoostKite', 'SetTree']
Match each word with the regex \w+NewTree\b. It matches if the word ends with NewTree.
Use the i modifier for a case-insensitive match (it ignores the case of [a-zA-Z]).
Use \w* instead of \w+ in the above regex if you want to match the word NewTree itself as well.
If you are looking for words that contain NewTree, try the regex \w*NewTree\w*\b
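The variants above can be compared on a small word list. This is a sketch under a few assumptions: each word is tested on its own with `fullmatch`, the word list is made up for illustration, and `re.IGNORECASE` stands in for the i modifier.

```python
import re

words = ['BoostKite', 'SetTree', 'ComeNewTree', 'NewTree',
         'ComeNewTreeNow', 'comenewtree']

ends_with = re.compile(r'\w+NewTree\b')        # something, then NewTree at the end
ends_with_or_is = re.compile(r'\w*NewTree\b')  # also accepts the bare word NewTree
contains = re.compile(r'\w*NewTree\w*\b')      # NewTree anywhere in the word
contains_any_case = re.compile(r'\w*NewTree\w*\b', re.IGNORECASE)

def matching(regex):
    """Return the words that the regex matches in full."""
    return [w for w in words if regex.fullmatch(w)]

print(matching(ends_with))          # ['ComeNewTree']
print(matching(ends_with_or_is))    # ['ComeNewTree', 'NewTree']
print(matching(contains))           # ['ComeNewTree', 'NewTree', 'ComeNewTreeNow']
print(matching(contains_any_case))  # the three above plus 'comenewtree'

# To answer the original question, keep the words that do NOT match.
kept = [w for w in words if not contains.fullmatch(w)]
print(kept)                         # ['BoostKite', 'SetTree', 'comenewtree']
```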
I think you can do this in general in the manner of the following example for your specific case:
^(([^N]|N[^e]|Ne[^w]|New[^T]|NewT[^r]|NewTr[^e]|NewTre[^e])+)?(.|..|...|....|.....)?$
So far, what I have here is only a near miss. It rejects NewTree and ComeNewTreeNow, but it is not a reliable test: it still matches NNewTree, because N[^e] consumes NN and the remaining characters are then matched one at a time by [^N].

Regular expression which will match if there is no repetition

I would like to construct a regular expression which will match a password if no character is repeated 4 or more times.
I have come up with a regex which matches if a character or group of characters is repeated 4 times:
(?:([a-zA-Z\d]{1,})\1\1\1)
Is there any way to match only if the string doesn't contain such repetitions? I tried the approach suggested in Regular expression to match a line that doesn't contain a word? as I thought some combination of positive/negative lookaheads would do it. But I haven't found a working example yet.
By repetition I mean any number of characters anywhere in the string.
Example - should not match
aaaaxbc
abababab
x14aaaabc
Example - should match
abcaxaxaz
(a occurs 4 times here, but that is not a problem; I want to filter out repeating patterns)
That link was very helpful, and I was able to use it to create the regular expression from your original expression.
^(?:(?!(?<char>[a-zA-Z\d]+)\k<char>{3,}).)+$
or
^(?:(?!([a-zA-Z\d]+)\1{3,}).)+$
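A quick check of the numbered-group version against the examples from the question, as a sketch in Python. The numbered form is used because Python's `re` module spells named groups as `(?P<name>...)` and `(?P=name)` rather than `(?<char>...)` and `\k<char>`.

```python
import re

# At every position, refuse to continue if a group of letters/digits
# starting there is immediately repeated three or more further times.
no_repeats = re.compile(r'^(?:(?!([a-zA-Z\d]+)\1{3,}).)+$')

samples = ['aaaaxbc', 'abababab', 'x14aaaabc', 'abcaxaxaz']
results = {s: bool(no_repeats.match(s)) for s in samples}

for s, ok in results.items():
    print(s, 'match' if ok else 'no match')
```

Only `abcaxaxaz` should be reported as a match, which agrees with the expected behaviour listed in the question.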
Nota bene: this solution doesn't answer the question exactly; it does more than was asked, since it rejects any character that occurs four or more times anywhere in the string, not only consecutive repetitions.
-----
In Python:
import re
# For each character, assert that it does not occur three more times later on.
pat = r'(?:(.)(?!.*?\1.*?\1.*?\1.*\Z))+\Z'
regx = re.compile(pat)
for s in (':1*2-3=4#',
          ':1*1-3=4#5',
          ':1*1-1=4#5!6',
          ':1*1-1=1#',
          ':1*2-a=14#a~7&1{g}1'):
    m = regx.match(s)
    if m:
        print(m.group())
    else:
        print('--No match--')
Result:
:1*2-3=4#
:1*1-3=4#5
:1*1-1=4#5!6
--No match--
--No match--
This gives the regex engine a lot of work, because the principle of the pattern is that, for each character of the string it runs through, it must verify that the current character isn't found three more times in the rest of the string.
But it works, apparently.
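If a regular expression is not a hard requirement, the same rule (no character occurring four or more times anywhere in the string) can be checked by counting instead. This swaps the regex for `collections.Counter` from the standard library; the function name `no_char_repeats` and the `limit` parameter are made up for this sketch.

```python
from collections import Counter

def no_char_repeats(s, limit=4):
    """Return True if no character occurs `limit` or more times in s."""
    return all(count < limit for count in Counter(s).values())

for s in (':1*2-3=4#',
          ':1*1-3=4#5',
          ':1*1-1=4#5!6',
          ':1*1-1=1#',
          ':1*2-a=14#a~7&1{g}1'):
    print(s if no_char_repeats(s) else '--No match--')
```

Counting makes a single pass over the string, so it avoids the repeated forward scans described above.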

how to create regular expression for this sentence?

I have the following statement: {$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");} and I want to get the strings from it. I have written the following regex, but it is not working.
Please help!
(?=[\$("#]?)[\w]*(?<=[")]?)
Your lookaround assertions are using character classes by mistake, and you've confused lookbehind and lookahead. Try the following:
(?<=\$\(")\w*(?="\))
You could use this simpler one:
'{$("#aprilfoolc").val("HoliWed27");}'.match(/\$\(\"#(\w+)\"[^"]*"(\w+)"/)
This returns
["$("#aprilfoolc").val("HoliWed27"", "aprilfoolc", "HoliWed27"]
where the strings you want are at indexes 1 and 2.
This construction
(?=[\$("#]?)
is a lookahead, but an optional one, because the character class is followed by a ?. An optional lookahead always succeeds, so it constrains nothing. It is also at odds with the next part,
[\w]
which matches word characters only. Similarly, this part
(?<=[")]?)
has no effect either: logically there can never be one of the characters " or ) at the end of a string that matches \w only, and since this portion is also optional (that ? at the end again), it always succeeds without checking anything.
It's a bit unclear what you are after. Strings inside double quotes, yes, but in the first one you want to skip the hash -- why? Given your input and desired output, this ought to work:
\w+(?=")
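Applied to the full statement from the question, this pattern can be tried out as follows (a small sketch using `re.findall`):

```python
import re

s = '{$("#aprilfoolc").val("HoliWed27"); $("#UgadHieXampp").val("ugadicome");}'

# Runs of word characters that are immediately followed by a double quote.
found = re.findall(r'\w+(?=")', s)
print(found)  # ['aprilfoolc', 'HoliWed27', 'UgadHieXampp', 'ugadicome']
```

`val` is skipped because it is followed by `(` rather than by a double quote.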
Also possible:
/\("[#]?(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("[#]?(.*?)"\)',s)
for m in f:
    print(m)
I don't know why you would need it, but if you want to capture both groups at once:
/\("#(.*?)"\).*?\("(.*?)"\)/
import re
s='{$("#aprilfoolc").val("HoliWed27");}'
f = re.findall(r'\("#(.*?)"\).*?\("(.*?)"\)',s)
for m in f:
    print(m[0], m[1])
In JavaScript:
var s='{$("#aprilfoolc").val("HoliWed27")';
var re=/\("#(.*?)"\).*?\("(.*?)"\)/;
alert(s.match(re));

Python: RE only captures first and last match

I'm trying to make a Regular Expression that captures the following:
- XX or XX:XX, up to 6 repetitions (XX:XX:XX:XX:XX:XX), where X is a hexadecimal number.
In other words, I'm trying to capture MAC addresses that can range from 1 to 6 bytes.
regex = re.compile("^([0-9a-fA-F]{2})(?:(?:\:([0-9a-fA-F]{2})){0,5})$")
The problem is that if I enter, for example, "11:22:33", it only captures the first match and the last, which results in ["11", "33"].
The question: is there any way to make the {0,5} quantifier capture all repetitions, and not only the last one?
Thanks!
Not in Python, no. But you can first check the correct format with your regex, and then simply split the string at the colons:
result = s.split(':')
Also note that you should always write regular expressions as raw strings (otherwise you get problems with escaping). And your outer non-capturing group does nothing.
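Validating first and splitting afterwards could look like the sketch below. The function name `split_mac` is made up for illustration, and the pattern is written as a raw string with non-capturing groups only, since the captures are no longer needed.

```python
import re

# Assumed format: 1 to 6 colon-separated pairs of hex digits.
MAC_PART = re.compile(r'[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){0,5}')

def split_mac(s):
    """Return the list of hex byte strings, or None if s is not valid."""
    if MAC_PART.fullmatch(s) is None:
        return None
    return s.split(':')

print(split_mac('11:22:33'))  # ['11', '22', '33']
print(split_mac('11:22:'))    # None
```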
Technically there is a way to do it with regex only, but the regex is quite horrible:
r"^([0-9a-fA-F]{2})(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?(?::([0-9a-fA-F]{2}))?$"
But here you would always get six captures, just that some might be empty (None in Python).
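To see what the six captures look like, here is a small sketch that builds such a pattern from a repeated fragment instead of typing it out. It assumes each optional group starts with a colon; the names `HEX` and `six_groups` are made up for illustration.

```python
import re

HEX = r'[0-9a-fA-F]{2}'

# One mandatory byte, then five optional ":byte" groups, each with its own capture.
six_groups = re.compile(r'^(' + HEX + r')' + (r'(?::(' + HEX + r'))?') * 5 + r'$')

m = six_groups.match('11:22:33')
print(m.groups())  # ('11', '22', '33', None, None, None)
```

Groups that did not take part in the match come back as `None`, so they can be filtered out afterwards.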