I'm sorry, I know this is a simple question that may not be appropriate here, but I am terrible at regex.
I want to use preg_replace with a pattern of the form (number A) to wrap the matching substrings, as in the following replacements:
2A -> <i>2A</i>
100 A -> <i>100 A</i>
84.55A -> <i>84.55A</i>
92.1 A -> <i>92.1 A</i>
The number may or may not be separated from the letter by a space.
The number can be a decimal.
The letter must not be the beginning of a word (4 All should not match;
in fact, A should be followed by a space, a period, or a line break).
My problem is how to combine OR conditions for characters that may or may not be present, so that a single match can be replaced with:
$str = preg_replace($pattern, '<i>$1</i>', $str);
I can suggest
'~\b(?<![\d.])\d*\.?\d+\s*A\b~'
See the regex demo. Replace with '<i>$0</i>' where the $0 is the backreference to the whole match.
Details:
\b - leading word boundary
(?<![\d.]) - a negative lookbehind that fails the match if there is a dot or digit before the current location (NOTE: this is added to avoid matching 33.333.4444 A like strings, just remove if not necessary)
\d*\.?\d+ - a usual simplified float/int value regex (0+ digits, an optional . and 1+ digits) (NOTE: if you need a more sophisticated regex for this, see Matching Floating Point Numbers with a Regular Expression)
\s* - 0+ whitespaces
A\b - a whole word A (here, \b is a trailing word boundary).
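If you want to try the same pattern outside PHP, here is a minimal Python sketch. The names VALUE_PATTERN and italicize_values are made up for illustration; the only real differences from the PHP call are that the ~ delimiters are dropped and the whole-match backreference is written \g<0> instead of $0.

```python
import re

# Same pattern as in the answer above, minus the PHP "~" delimiters.
# VALUE_PATTERN and italicize_values are hypothetical names for this sketch.
VALUE_PATTERN = re.compile(r"\b(?<![\d.])\d*\.?\d+\s*A\b")


def italicize_values(text: str) -> str:
    # \g<0> is Python's spelling of the whole-match backreference ($0 in PHP).
    return VALUE_PATTERN.sub(r"<i>\g<0></i>", text)


for sample in ["2A", "100 A", "84.55A", "92.1 A", "4 All", "33.333.4444 A"]:
    print(f"{sample!r} -> {italicize_values(sample)!r}")
```

The last two samples should come back unchanged: 4 All fails on the trailing \b, and 33.333.4444 A is rejected by the lookbehind.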
I appreciate regex but am still a beginner.
I have tried many workarounds but can't figure out how to solve my problem.
String A : 4 x 120glgt
String B : 120glgt
I'd like a regex that returns 120, the number after the "x".
But sometimes there won't be an "x". So, whether the input is [A] or [B], I'm looking for a single approach.
I tried:
starting the search from the END
starting right after the "x"
I clearly have some syntax issues and didn't quite get the logic of (?=):
(?=[^x])(?=[0-9]+)
Looking forward to learning with your help.
As you tagged pcre, you could optionally match the leading digits followed by x and use \K to clear the match buffer to only match the digits after it.
^(?:\d+\h*x\h*)?\K\d+
The pattern matches:
^ Start of string
(?:\d+\h*x\h*)? Optionally match 1+ digits followed by x, with optional horizontal whitespace on either side of the x
\K Forget what is matched so far
\d+ Match 1+ digits
See a regex demo.
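Note that \K and \h are PCRE features, so this pattern will not run as-is in every engine. As a rough sketch for Python's re module (which, as far as I know, supports neither), the same idea can be written with a capture group: [ \t]* is my stand-in for \h*, and group 1 plays the role of everything after \K. AFTER_X and number_after_x are hypothetical names.

```python
import re

# Adaptation of ^(?:\d+\h*x\h*)?\K\d+ for Python's re module:
# [ \t]* approximates \h*, and capture group 1 replaces the \K trick.
AFTER_X = re.compile(r"^(?:\d+[ \t]*x[ \t]*)?(\d+)")


def number_after_x(text: str):
    """Return the digits after an optional leading '<digits> x', or None."""
    match = AFTER_X.search(text)
    return match.group(1) if match else None


print(number_after_x("4 x 120glgt"))
print(number_after_x("120glgt"))
```

Both calls should print 120, since the leading "4 x " part is optional.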
If you want to use a lookahead variant, you might use
\d+(?=[^\r\n\dx]*$)
This pattern matches:
\d+ Match 1+ digits
(?= Positive lookahead, assert what is to the right is
[^\r\n\dx]*$ Match optional repetitions of any char except a digit, x or a newline, up to the end of the string
) Close the lookahead
See another regex demo.
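The lookahead variant uses no PCRE-only syntax, so it should carry over to other engines unchanged. A small Python sketch, with TRAILING_NUMBER and last_number as made-up names:

```python
import re

# The lookahead pattern from the answer, used verbatim.
TRAILING_NUMBER = re.compile(r"\d+(?=[^\r\n\dx]*$)")


def last_number(text: str):
    """Return the run of digits that has no digit or 'x' after it, or None."""
    match = TRAILING_NUMBER.search(text)
    return match.group(0) if match else None


print(last_number("4 x 120glgt"))
print(last_number("120glgt"))
```

The leading 4 is skipped because the lookahead runs into the x before it can reach the end of the string.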
I need to match only section numbers in text with a dot at the end. For example, having a string:
'A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500'
What I want to match: [A.8., 8.4.2.4.1.2., 9.1., 9., 10.0.1.1., 100., 100.5.]
What I have matched: [A.8., 8.4.2.4.1.2., 9.1., 10.0.1.1., 100.5.]
My regex is (?:\d+|A)\.[\d+\.]*\.
Numbers without a dot at the end are not matched, which is correct. However, single numbers followed by a dot should be matched but are not (such as '9.' and '100.').
How can I update my regex to make it work?
You may use
\b(?:\d+|A)(?:\.\d+)*\.\B
See the regex demo.
If the matches are whitespace-separated, you can also use the following (provided the regex engine supports lookbehinds):
(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)
See the regex demo.
Details:
\b - a word boundary (start of string or a non-word char should immediately precede the next digit or A)
(?:\d+|A) - one or more digits, or an A
(?:\.\d+)* - zero or more repetitions of a dot and then one or more digits
\. - a dot
\B - a position other than a word boundary (end of string or a non-word char should follow immediately)
The (?<!\S) / (?!\S) require a whitespace or start/end of string positions on both ends of the match.
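To check both patterns against the sample string from the question, here is a short Python sketch (the variable names are mine, not part of either pattern):

```python
import re

TEXT = "A.8. 8.4.2.4.1.2. 9.1. 9. 10.0.1.1. 9 0.1 100. 100.5. A.500"

# Word-boundary version and whitespace-lookaround version from the answer.
BOUNDARY = re.compile(r"\b(?:\d+|A)(?:\.\d+)*\.\B")
WHITESPACE = re.compile(r"(?<!\S)(?:\d+|A)(?:\.\d+)*\.(?!\S)")

# Neither pattern has capturing groups, so findall returns whole matches.
boundary_hits = BOUNDARY.findall(TEXT)
whitespace_hits = WHITESPACE.findall(TEXT)

print(boundary_hits)
print(whitespace_hits)
```

On this input the two should agree. They differ once punctuation is involved: in a string like (9.) the \b version still finds 9., while the lookaround version does not, because the match is not surrounded by whitespace.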
I'm trying to write a regex for an alphanumeric string of length 7 (with letters at positions 1, 3, 4 and digits at positions 2, 5, 6, 7).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?
The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}
If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - redo the regex in the first group twice (recursive subpattern)
(?2){3} - redo the regex in the second group 3 times (recursive subpattern)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
Which does the same thing, but has some copy-paste involved
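For a non-PCRE example, here is a minimal Python sketch of the plain version. As far as I know the standard re module has no (?1) subroutine calls, so the quantified form is used; fullmatch takes the place of the ^ and $ anchors. CODE and is_valid_code are hypothetical names.

```python
import re

# letter, digit, letter, letter, digit, digit, digit
CODE = re.compile(r"[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}")


def is_valid_code(text: str) -> bool:
    # fullmatch requires the whole string to match, like ^...$ in the answer.
    return CODE.fullmatch(text) is not None


for sample in ["A1BC234", "AB1C234", "A1BC2345"]:
    print(sample, is_valid_code(sample))
```

Only the first sample should be accepted: the second has a letter in position 2 and the third is eight characters long.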
I want to match a pattern with regex, the pattern is:
A-Za-z1-9[0-9-0-9]
so for example:
test1[1-50]
Can you help me?
Solution update:
^[A-Za-z0-9]+\[[0-9]+-[0-9]+]$
Use this regex: [A-Za-z]+[1-9]\[[0-9]+-[0-9]+\]. You might also want to add \b at the start of the regex so that it matches only after a non-word character.
[A-Za-z]+ matches things like test: only letters are accepted, one or more times
[1-9] matches any single digit except 0
\[[0-9]+-[0-9]+\] matches two runs of one or more digits separated by -, all enclosed in square brackets. (You need to escape the brackets with \ because they are metacharacters.)
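Here is a quick Python sketch comparing the two patterns above. ANCHORED is the pattern from the solution update, and EMBEDDED is the second pattern with the suggested leading \b added; both names are made up for this example.

```python
import re

# Pattern from the solution update: the whole string must be name[from-to].
ANCHORED = re.compile(r"^[A-Za-z0-9]+\[[0-9]+-[0-9]+]$")

# Second pattern, with the suggested leading \b, for matching inside text.
EMBEDDED = re.compile(r"\b[A-Za-z]+[1-9]\[[0-9]+-[0-9]+\]")

print(bool(ANCHORED.search("test1[1-50]")))
print(bool(ANCHORED.search("test1[1-50")))

found = EMBEDDED.search("see test1[1-50] here")
print(found.group(0) if found else None)

# [1-9] excludes 0, so a name ending in 0 is rejected by EMBEDDED.
print(bool(EMBEDDED.search("test0[1-50]")))
```

Note that the two patterns are not equivalent: the anchored one allows any mix of letters and digits before the bracket, while the second one requires letters followed by exactly one non-zero digit.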
I am using the regex
(.*)\d.txt
on the expression
MyFile23.txt
The online tester says that the above regex matches the mentioned string. My understanding is that it should not match, because the string contains two digits, 2 and 3, while the regex has only one digit in it, i.e. \d; to match both it should have been \d+. As I read my current expression, it means: zero or more of any character, followed by one digit, followed by .txt. My question is: why does the above string pass the regex?
This regex (.*)\d.txt will still match MyFile23.txt because of .* which will match 0 or more of any character (including a digit).
So for the given input: MyFile23.txt here is the breakup:
.* # matches MyFile2
\d # matches 3
. # matches a dot (though it can match any character here, because the dot is unescaped)
txt # matches the literal txt
To make sure it only matches MyFile2.txt you can use:
^\D*\d\.txt$
Where ^ and $ are anchors to match start and end. \D* will match 0 or more non-digit.
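You can see both effects (the greedy .* swallowing a digit, and the unescaped dot) in a short Python sketch; the variable names are only for illustration:

```python
import re

NAME = "MyFile23.txt"

# Original pattern: the greedy .* takes "MyFile2", leaving "3" for \d.
loose = re.search(r"(.*)\d.txt", NAME)
print(loose.group(1))

# The unescaped dot accepts any character, so this string matches too.
print(bool(re.search(r"(.*)\d.txt", "MyFile2Xtxt")))

# Anchored fix: exactly one digit, preceded only by non-digits.
STRICT = re.compile(r"^\D*\d\.txt$")
print(bool(STRICT.search(NAME)))
print(bool(STRICT.search("MyFile2.txt")))
```

With the anchored pattern, MyFile23.txt is rejected because \D* cannot consume the 2, and only one \d is allowed before the literal dot.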
The pattern you have has one group, (.*), which in your example matches MyFile2,
because the . allows any character, including digits.
Furthermore, the . after this group is not escaped, so it allows one more character of any kind.
To avoid this use:
(\D*)\d+\.txt
The group (\D*) now matches only non-digit characters.
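As a quick check in Python (a sketch, with variable names of my own choosing), the group now stops before the first digit, and \d+ takes all the digits:

```python
import re

match = re.search(r"(\D*)\d+\.txt", "MyFile23.txt")
stem = match.group(1)   # what the (\D*) group captured
whole = match.group(0)  # the entire match
print(stem, whole)

# With the dot escaped, a string without a literal "." no longer matches.
no_dot = re.search(r"(\D*)\d+\.txt", "MyFile2Xtxt")
print(no_dot)
```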
Here is the explanation of why your "MyFile23.txt" matches the regex pattern:
A literal period . should always be escaped as \. or else it will match "any character".
And finally, (.*) matches everything from the beginning of the string up to, but not including, the last digit (MyFile2). Have a look at the "MATCH INFORMATION" area on the right at this page.
So, I'd suggest the following fix:
^\D*\d\.txt$ = beginning of a line/string, non-digit character, any number of repetitions, a digit, a literal period, a literal txt, and the end of the string/line (depending on the m switch, which depends on the input string, whether you have a list of words on separate lines, or just a separate file name).
Here is a working example.