I need to find brackets that contain any letter.
For example:
a17(1d34) xc
The brackets in this string contain the letter d.
So I need to find: (1d34)
The following regex can do the job:
\([^a-z]*[a-z]+[^a-z]*\) with flags g and i
You can test it with the live demo at regex101 to check if it works with all the cases you expect.
Also, I don't know which language you are using; regex101 lets you generate code for several of them.
Breakdown
\( matches the literal opening bracket
[^a-z]* matches any character before the letter that is not a letter (can be nothing)
the ^ character right after the opening [ of a character class negates it (it matches anything not listed)
[a-z]+ matches at least one letter
[^a-z]* matches any character after the letter that is not a letter (can be nothing)
\) matches the literal closing bracket
the flag i (case-insensitive) extends the range a to z to uppercase letters as well
the flag g (global match) lets you match multiple times
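A minimal JavaScript sketch of the pattern above, using String.prototype.matchAll (which requires the g flag). The second sample string is made up for illustration. It also shows a caveat worth testing for: [^a-z] matches any non-letter, including ( and ), so a match can run across two bracket groups. The tightenedRegex variant, which also excludes the parentheses, is a hypothetical adjustment, not part of the answer.

```javascript
// Pattern from the answer: g = find every match, i = case-insensitive
const bracketRegex = /\([^a-z]*[a-z]+[^a-z]*\)/gi;

// Hypothetical variant: also exclude ( and ) so a match stays inside one pair
const tightenedRegex = /\([^a-z()]*[a-z]+[^a-z()]*\)/gi;

// Return the full text of every match
function findAll(regex, text) {
  return [...text.matchAll(regex)].map((m) => m[0]);
}

console.log(findAll(bracketRegex, "a17(1d34) xc")); // [ '(1d34)' ]

const sample = "a17(1d34) xc (99) (Ab2)";
console.log(findAll(bracketRegex, sample));   // [ '(1d34)', '(99) (Ab2)' ]
console.log(findAll(tightenedRegex, sample)); // [ '(1d34)', '(Ab2)' ]
```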
Hope it helps!
/\((?:\d*[A-Z]+\d*)+\)/gi will match brackets that contain at least one letter (and otherwise only digits).
var rgx = /\((?:\d*[A-Z]+\d*)+\)/gi;
var match = rgx.exec("a17(1d34) xc");
console.log(match[0]); // (1d34)
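The two answers accept different bracket contents, which may matter depending on your data. A small JavaScript comparison, with test strings made up for illustration: the first pattern allows any non-letters around a single run of letters, while the second allows only digits and letters but any number of letter runs.

```javascript
// First answer: non-letters, one run of letters, non-letters
const anyNonLetters = /\([^a-z]*[a-z]+[^a-z]*\)/i;
// Second answer: only digits and letters, letter runs may repeat
const digitsAndLetters = /\((?:\d*[A-Z]+\d*)+\)/i;

for (const text of ["a17(1d34) xc", "x(ab1c)", "x(d-4)"]) {
  const m1 = text.match(anyNonLetters);
  const m2 = text.match(digitsAndLetters);
  console.log(text, "|", m1 && m1[0], "|", m2 && m2[0]);
}
// a17(1d34) xc | (1d34) | (1d34)
// x(ab1c) | null | (ab1c)
// x(d-4) | (d-4) | null
```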
Related
There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have this string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is finding AB33(20) and then capturing everything up to the first whitespace. But AB33(20) can also be AB-33(20), AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why do I get an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
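JavaScript regexes do not support \K, so a portable equivalent of this answer is to match the marker and capture the value in a group instead. This is a sketch; the variable names and the extra sample lines (built from the variants in the question) are illustrative.

```javascript
// Same marker as the \K pattern, but the value is captured in group 1
const markerRegex = /AB-?\d+(?:\(-?\d+\))? ([^ ]+)/;

const inputs = [
  "Name Subname 11X22 88X620 AB33(20) YA5619 77,66",
  "Name Subname 11X22 88X620 AB-33(20) YA5619 77,66",
  "Name Subname 11X22 88X620 AB33(-1) YA5619 77,66",
];

for (const text of inputs) {
  const m = text.match(markerRegex);
  console.log(m ? m[1] : "no match"); // YA5619 for each line
}
```

In PHP the same idea applies: with preg_match($pattern, $subject, $matches), the value would be in $matches[1].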
It depends on the language. In .NET, for example, it matches, because .NET allows variable-length lookbehind.
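JavaScript (ES2018 and later) is another flavor that accepts variable-length lookbehind, so the pattern from the question works there unchanged. A quick check, with the second line adapted from the variants mentioned in the question:

```javascript
// The asker's pattern, with variable-length quantifiers inside the lookbehind
const lookbehindRegex = /(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)/;

const line1 = "Name Subname 11X22 88X620 AB33(20) YA5619 77,66";
const line2 = "Name Subname 11X22 88X620 AB-33(-20) YA5619 77,66";

console.log(line1.match(lookbehindRegex)[0]); // YA5619
console.log(line2.match(lookbehindRegex)[0]); // YA5619
```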
Another solution might be to use a character class containing the characters you want to allow. Then match a whitespace character, reset the match with \K, and match \S+, which matches 1+ times any character that is not whitespace.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match AB literally, preceded by a word boundary so that AB is not part of a larger word
[()\d-]+ Match 1+ times any of the characters listed in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ times a character that is not whitespace
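A JavaScript sketch of the same character-class idea. Since JavaScript has no \K, the value after the whitespace is captured in group 1 instead. The sample lines reuse the marker variants from the question.

```javascript
// [()\d-]+ accepts any run of digits, parentheses and hyphens after AB
const classRegex = /\bAB[()\d-]+\s(\S+)/;

const markers = ["AB33(20)", "AB-33(20)", "AB33(-20)", "AB33(-1)"];

for (const marker of markers) {
  const text = `Name Subname 11X22 88X620 ${marker} YA5619 77,66`;
  const m = text.match(classRegex);
  console.log(marker, "->", m ? m[1] : "no match"); // ... -> YA5619
}
```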
Regex demo | Php demo
How do you match any one character with a regular expression?
A number of other questions on Stack Overflow sound like they promise a quick answer, but they are actually asking something more specific:
Regex for a string of repeating characters and another optional one at the end
regex to match a single character that is anything but a space
Replace character in regex match only
Match any single character
Use the dot . character as a wildcard to match any single character (by default, any character except a line break).
Example regex: a.c
abc // match
a c // match
azc // match
ac // no match
abbc // no match
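A quick way to try these examples in JavaScript: RegExp.prototype.test returns true if the pattern matches anywhere in the string. The last two lines show that, by default, the dot does not match a line break unless the s (dotAll) flag is set.

```javascript
const dotRegex = /a.c/;

for (const s of ["abc", "a c", "azc", "ac", "abbc"]) {
  console.log(s, dotRegex.test(s) ? "match" : "no match");
}

// By default . does not match a line break; the s (dotAll) flag changes that
console.log(/a.c/.test("a\nc"));  // false
console.log(/a.c/s.test("a\nc")); // true
```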
Match any specific character in a set
Use square brackets [] to match any single character from a set.
Use \w to match any single alphanumeric character: 0-9, a-z, A-Z, and _ (underscore).
Use \d to match any single digit.
Use \s to match any single whitespace character.
Example 1 regex: a[bcd]c
abc // match
acc // match
adc // match
ac // no match
abbc // no match
Example 2 regex: a[0-7]c
a0c // match
a3c // match
a7c // match
a8c // no match
ac // no match
a55c // no match
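The two set examples can be checked the same way in JavaScript. The last two lines are an extra illustration, not from the answer: each shorthand class matches exactly one character, so ^ and $ are added there to test a whole one-character string.

```javascript
const setRegex = /a[bcd]c/;
const rangeRegex = /a[0-7]c/;

for (const s of ["abc", "acc", "adc", "ac", "abbc"]) {
  console.log("a[bcd]c", s, setRegex.test(s) ? "match" : "no match");
}
for (const s of ["a0c", "a3c", "a7c", "a8c", "ac", "a55c"]) {
  console.log("a[0-7]c", s, rangeRegex.test(s) ? "match" : "no match");
}

// Shorthand classes: \w word char, \d digit, \s whitespace (one char each)
console.log(/^\w$/.test("_"), /^\d$/.test("7"), /^\s$/.test("\t")); // true true true
console.log(/^\w$/.test("-"), /^\d$/.test("x"), /^\s$/.test("a"));  // false false false
```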
Match any character except ...
Use the hat in square brackets [^] to match any single character except for any of the characters that come after the hat ^.
Example regex: a[^abc]c
aac // no match
abc // no match
acc // no match
a c // match
azc // match
ac // no match
azzc // no match
(Don't confuse the ^ here in [^] with its other usage as the start of line character: ^ = line start, $ = line end.)
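The negated-set example in JavaScript, followed by a one-line contrast with ^ used outside brackets as a start-of-string anchor:

```javascript
const notRegex = /a[^abc]c/;

for (const s of ["aac", "abc", "acc", "a c", "azc", "ac", "azzc"]) {
  console.log(s, notRegex.test(s) ? "match" : "no match");
}

// Outside brackets, ^ is an anchor: /^a/ only matches an "a" at the start
console.log(/^a/.test("abc"), /^a/.test("bac")); // true false
```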
Match any character optionally
Put the ? quantifier after a character to match zero or one occurrence of it, i.e. to make it optional. Thus, you would use .? to match any single character optionally.
Example regex: a.?c
abc // match
a c // match
azc // match
ac // match
abbc // no match
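The same examples in JavaScript. The colou?r line is an extra illustration that ? applies only to the token directly before it.

```javascript
const optRegex = /a.?c/;

for (const s of ["abc", "a c", "azc", "ac", "abbc"]) {
  console.log(s, optRegex.test(s) ? "match" : "no match");
}

// ? makes only the preceding "u" optional
console.log(/colou?r/.test("color"), /colou?r/.test("colour")); // true true
```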
See also
A quick tutorial to teach you the basics of regex
A practice sandbox to try things out
Simple answer
If you want to match a single specific character, you can put it inside square brackets [ ]
Examples
match + ...... [+] or \+
match a ...... a
match & ...... &
...and so on. You can check your regular expression online on this site: https://regex101.com/
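A short JavaScript check of why + needs brackets or a backslash: unescaped, + is a quantifier, and with nothing before it to repeat the pattern is invalid.

```javascript
// Two equivalent ways to match a literal plus sign
console.log(/[+]/.test("1+1")); // true
console.log(/\+/.test("1+1"));  // true

// An unescaped + at the start has nothing to repeat, so it is a syntax error
try {
  new RegExp("+");
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```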
(updated based on comment)
If you are searching for a single isolated character or a set of isolated characters within a string, you can use this:
\b[a-zA-Z]\s
This will find single standalone English letters that are followed by whitespace (a letter at the very end of the string is not found, because nothing follows it).
Similarly, use
\b[0-9]\s
to find single digits: it will pick 9 but not 98, and so on.
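A JavaScript sketch of both patterns; the sample sentences are made up. The last line uses \b on both sides, a hypothetical variant that also finds a single letter at the end of the string.

```javascript
const singleLetter = /\b[a-zA-Z]\s/g;
const singleDigit = /\b[0-9]\s/g;

console.log("I have 9 cats and 98 dogs".match(singleDigit)); // [ '9 ' ]
console.log("I saw a dog x".match(singleLetter));           // [ 'I ', 'a ' ]

// Variant: word boundary on both sides also catches the trailing "x"
console.log("I saw a dog x".match(/\b[a-zA-Z]\b/g));        // [ 'I', 'a', 'x' ]
```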
I am using the regex
(.*)\d.txt
on the expression
MyFile23.txt
Now, the online tester says that the above regex accepts (selects) this string. My understanding is that it should not, because there are two digits, 2 and 3, while the regex contains only one \d; it should have been \d+. As I read it, my expression says: zero or more of any character, followed by one digit, followed by .txt. Why does the string match the regex?
This regex (.*)\d.txt will still match MyFile23.txt because of .* which will match 0 or more of any character (including a digit).
So for the given input MyFile23.txt, here is the breakdown:
.* # matches MyFile2
\d # matches 3
. # matches a dot (though it can match anything here due to unescaped dot)
txt # will match literal txt
To make sure it only matches names with a single digit, such as MyFile2.txt, you can use:
^\D*\d\.txt$
Where ^ and $ are anchors to match start and end. \D* will match 0 or more non-digit.
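A JavaScript check of the breakdown and the fix. MyFile2Xtxt is a made-up name to show the effect of the unescaped dot.

```javascript
const original = /(.*)\d.txt/;
const fixed = /^\D*\d\.txt$/;

const m = "MyFile23.txt".match(original);
console.log(m[0], m[1]);                   // MyFile23.txt MyFile2
console.log(original.test("MyFile2Xtxt")); // true: the unescaped . matched "X"

console.log(fixed.test("MyFile2.txt"));  // true
console.log(fixed.test("MyFile23.txt")); // false
console.log(fixed.test("MyFile2Xtxt"));  // false
```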
The pattern you have has one group, (.*), which in your example would match MyFile2,
because the . allows any character.
Furthermore, the . after \d is not escaped, so it matches any character, not just a literal period.
To avoid this use:
(\D*)\d+\.txt
the group (\D*) would now match only non-digit characters.
Here is why "MyFile23.txt" matches the regex pattern:
A literal period . should be escaped as \. (outside a character class), otherwise it matches any character.
And finally, (.*) matches the string from the beginning up to, but not including, the last digit (MyFile2). Have a look at the "MATCH INFORMATION" area on the right of this page.
So, I'd suggest the following fix:
^\D*\d\.txt$ = the beginning of the line/string, zero or more non-digit characters, one digit, a literal period, the literal text txt, and the end of the string/line. Whether ^ and $ match at line boundaries depends on the m flag: use it if your input is a list of file names on separate lines, not if it is a single file name.
Here is a working example.
I noticed some interesting behaviour with some regex work I am doing, and I'd like some insight.
From what I understand, the word character class \w should match [a-zA-Z_0-9].
Given this input,
0000000060399301+0000000042456971+0000000
What should this regex
(\d+)\w
Capture?
I would expect it to capture 0000000060399301 but it actually captures 000000006039930
Is there something I am missing? Why is the 1 dropped from the end?
I noticed if I changed the regex to
(\d+\w)
It captures correctly i.e. including the 1
Anyone care to explain? Thanks
You require the regex to match a trailing word character - that would be the 1.
It cannot be another character, because
+ is not a word character
+ is not a digit
matching is greedy
\d+ - matches one or more digit characters.
\w - matches a single word character ([A-Za-z\d_]).
So with the string 0000000060399301+, the \d+ in (\d+)\w first matches all the digits, including the 1 before the +. The next token, \w, cannot match +, so the regex engine backtracks one character and gives the 1 to \w. The captured group therefore contains 000000006039930, and the last 1 is matched by \w.
The 1 is being dropped because \w isn't in the capture group.
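A JavaScript check of the backtracking described above: with \w outside the group, the full match still includes the 1, but group 1 does not; moving \w inside the group keeps it.

```javascript
const input = "0000000060399301+0000000042456971+0000000";

const outside = input.match(/(\d+)\w/); // \w outside the capture group
const inside = input.match(/(\d+\w)/);  // \w inside the capture group

console.log(outside[0], outside[1]); // 0000000060399301 000000006039930
console.log(inside[1]);              // 0000000060399301
```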
How can I match words that have digits or other non-letter characters inside them, while excluding words where a digit or non-letter character (\/°†#*()'\s+&;±|-\^) is at the end? I need to match dAS2a but not dASI6. I could not adapt the solution from "Regex to match string not ending with pattern".
dA/Sa
dAS2a
dASI/
dASI6
My attempt at http://regex101.com/r/qM4dV7/1 failed.
This should work just fine (if you use the gmi modifiers):
^.*[a-z]$
Demo
You said each word is on a new line. Using the m modifier, we can anchor each expression to the beginning/end of a line with the ^ and $ anchors (without the modifier, they mean the beginning/end of the string). Then you said a word can essentially be anything (.*) as long as it ends in a character that is neither a digit nor a special character (I took that to mean a "letter", [a-z] with the i modifier).
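A JavaScript sketch of this answer on the four sample words from the question, joined with newlines as the answer assumes. Note that the pattern only checks the last character, so a plain word such as "hello" would match as well.

```javascript
const words = "dA/Sa\ndAS2a\ndASI/\ndASI6";

// g: all matches, i: [a-z] also covers A-Z, m: ^ and $ work per line
const endsWithLetter = /^.*[a-z]$/gim;

console.log(words.match(endsWithLetter)); // [ 'dA/Sa', 'dAS2a' ]
```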