TextMate: Regex replacing $1 with following 0 - regex

I'm trying to fix a file full of 1- and 2-digit numbers to make them all 2 digits long.
The file is of the form:
10,5,2
2,4,5
7,7,12
...
I've managed to match the problem numbers with:
(^|,)(\d)(,|$)
All I want to do now is replace the offending string with:
${1}0$2$3
but TextMate gives me:
10${1}05,2
Any ideas?
Thanks in advance,
Ross

According to this, TextMate supports word boundary anchors, so you could also search for \b\d\b and replace all with 0$0. (Thanks to Peter Boughton for the suggestion!)
This has the advantage of catching all the numbers in one go - your solution will have to be applied at least twice because the regex engine has already consumed the comma before the next number after a successful replace.

Note: Tim's solution is simpler and solves this problem, but I'll leave this here for reference, in case someone has a similar but more complex problem that lookarounds can handle.
A simpler way than your expression is to replace:
(?<!\d)\d(?!\d)
With:
0$0
Which is "replace all single digits with 0 then itself".
The regex is:
Negative lookbehind to not find a digit (?<!\d)
A single digit: \d
Negative lookahead to not find a digit (?!\d)
Since this is a positional match (not a character match), it caters for both comma and start/end positions.
The $0 part says "entire match" - since the lookbehind/ahead match positions, this will contain the single digit that was matched.
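For anyone who wants to try this outside TextMate, here is a minimal sketch using Python's re module (my assumption; the thread itself is about TextMate, whose replacement syntax differs). It pads the numbers in one pass with the lookaround pattern and, for comparison, applies the question's pattern once and twice. In Python the whole match is written \g<0>, and \g<1>0 keeps group 1 separate from the literal 0 that follows it.

```python
import re

text = "10,5,2\n2,4,5\n7,7,12"

# Lookaround version: a digit with no digit directly before or after it.
padded = re.sub(r"(?<!\d)\d(?!\d)", r"0\g<0>", text)

# The question's pattern consumes the separators, so neighbouring
# single digits cannot all be matched in the same pass.
question_pattern = re.compile(r"(^|,)(\d)(,|$)", flags=re.MULTILINE)
once = question_pattern.sub(r"\g<1>0\2\3", text)
twice = question_pattern.sub(r"\g<1>0\2\3", once)

print(padded)
print(once)
print(twice)
```

The first pass of the question's pattern leaves some numbers unpadded (for example the 4 in 2,4,5), and the second pass catches them.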

To anyone coming here: as @Amarghosh suggested, it's a bug, or at least intentional behavior that leads to problems.
I just had this problem and had to use the following workaround: If you set up another capture group, and then use a conditional insertion, it will work. For example, I had a string like <WebObject name=Frage01 and wanted to replace the 01 with 02, so I captured the main string in $1 and the end number in $2, which gave me a regex that looked like (<WebObject name=(Frage|Antwort))(01).
Then the replace was $1(?2:02).
The (?2:02) is the conditional insertion, and in this instance will always find something, but it was necessary in order to work around the odd conundrum of appending a number to the end of $n. Hope that helps someone. There is documentation on the conditional insertion here

In TextMate 1.5.11 (1635) ${1} does not work (like the OP described).
I appreciate the many suggestions re altering the query string, however there is a much simpler solution, if you want to break between a capture group and a number: \u.
It is a TextMate-specific replacement syntax that converts the following character to uppercase. As there is no uppercase for numbers, it does nothing and moves on. It is described in the link from Tim Pietzcker's answer.
In my case I had to clean up a csv file, where box measurements were given in cm x cm x mm. Thus I had to add a zero to the first two numbers.
Text: "80 x 40 x 5 mm"
Desired text: "800 x 400 x 5 mm"
Find: (\d+) x (\d+) x (\d+)
Replace: $1\u0 x $2\u0 x $3 mm
Regarding the support of more than 10 capture groups, I do not know if this is a bug. But as the OP and @rossmcf wrote, $10 is replaced with null.

You don't need ${1}: replacement strings support at most nine groups, so $1 followed by 0 won't be mistaken for $10.
Replace with $10$2$3

Related

regex to match specific pattern of string followed by digits

Sample input:
___file___name___2000___ed2___1___2___3
DIFFERENT+FILENAME+(2000)+1+2+3+ed10
Desired output (e.g., all letters, 4-digit numbers, and the literal 'ed' followed immediately by a number of arbitrary length):
file name 2000 ed2
DIFFERENT FILENAME 2000 ed10
I am using:
[A-Za-z]+|[\d]{4}|ed\d+ which only returns:
file name 2000 ed
DIFFERENT FILENAME 2000 ed
I see that there is a related Q+A here:Regular Expression to match specific string followed by number?
e.g. using ed[0-9]* would match ed#, but I'm unsure why it does not match in the above.
As written, your regex is valid. Remember, however, that a regex tries its alternatives from left to right. Your ed\d+ is never going to match, because the ed was already consumed by your [A-Za-z]+ alternative. Reorder your regex and it'll work just fine:
ed\d+|[a-zA-Z]+|\d{4}
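A quick way to see the effect of the ordering is to run both versions side by side. This is a small sketch in Python (an assumption, since the question does not name its regex flavour); the sample strings are the ones from the question.

```python
import re

samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# Original order: the letters alternative wins and swallows "ed".
original = re.compile(r"[A-Za-z]+|[\d]{4}|ed\d+")
# Reordered: "ed" followed by digits is tried first.
reordered = re.compile(r"ed\d+|[a-zA-Z]+|\d{4}")

original_results = [original.findall(s) for s in samples]
reordered_results = [reordered.findall(s) for s in samples]

for before, after in zip(original_results, reordered_results):
    print(" ".join(before), "->", " ".join(after))
```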
Demo
Nick's answer is right, but because in-order matching can be a less readable "gotcha", the best (order-insensitive) ways to do this kind of search are 1) with specified delimiters, and 2) by making each search term unique.
Jan's answer handles #1 well. But you would have to specify each specific delimiter, including its length (e.g. ___). It sounds like you may have some unusual delimiters, so this may not be ideal.
For #2, then, you can make each search term unique. (That is, you want the thing matching "file" and "name" to be distinct from the thing matching "2000", and both to be distinct from the thing matching "ed2".)
One way to do this is [A-Za-z]+(?![0-9a-zA-Z])|[\d]{4}|ed\d+. This is saying that for the first type of search term, you want an alphabet string which is not followed by an alphanumeric character. This keeps it distinct from the third search term, which is an alphabet string followed by some digit(s). It also means any delimiter is accepted, because the negative lookahead only rules out letters and digits.
demo
You might very well use (just grab the first capturing group):
(?:^|___|[+(]) # delimiter before
([a-zA-Z0-9]{2,}) # the actual content
(?=$|___|[+)]) # delimiter afterwards
See a demo on regex101.com
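The commented pattern above needs a free-spacing mode to run as written. As a sketch, assuming Python (the question does not say which engine is in use), re.VERBOSE allows the whitespace and comments, and findall returns only the first capturing group:

```python
import re

delimited = re.compile(
    r"""
    (?:^|___|[+(])       # delimiter before
    ([a-zA-Z0-9]{2,})    # the actual content
    (?=$|___|[+)])       # delimiter afterwards
    """,
    re.VERBOSE,
)

samples = [
    "___file___name___2000___ed2___1___2___3",
    "DIFFERENT+FILENAME+(2000)+1+2+3+ed10",
]

# With a single capturing group, findall returns that group's text.
delimited_results = [delimited.findall(s) for s in samples]
for tokens in delimited_results:
    print(" ".join(tokens))
```

The {2,} quantifier is what drops the trailing single-digit fields 1, 2 and 3.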

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators can't even confirm the flavour of Regex it is - they believe it's Python but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I also tested each one live (rather than only in a regex tester, in case it behaved differently there), but that still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
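If the tool does turn out to be Python (which the asker could not confirm), the two operations could be sketched like this. The sample lines are from the question; the variable names are only illustrative.

```python
import re

lines = ["T123?214567", "T?211234567", "T1234567"]

found = []
for line in lines:
    # Step 1: remove the positioner (a question mark plus two digits).
    cleaned = re.sub(r"\?[0-9]{2}", "", line)
    # Step 2: match T followed by seven digits in the cleaned text.
    match = re.search(r"T[0-9]{7}", cleaned)
    found.append(match.group(0) if match else None)

print(found)
```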
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
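The concatenation step could look like the following sketch, again assuming Python. The helper name strip_positioner is made up for illustration, and lines without a positioner are returned unchanged because the pattern does not match them.

```python
import re

POSITIONER = re.compile(r"(T.*?)\?\d{2}(.*)")

def strip_positioner(line):
    """Join the text before and after a single positioner."""
    match = POSITIONER.search(line)
    if match is None:
        # No positioner present: keep the line as it is.
        return line
    return match.group(1) + match.group(2)

joined = [strip_positioner(s) for s in ["T123?214567", "T?211234567", "T1234567"]]
print(joined)
```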
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Demo
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
Demo, in which the target string is captured in group 1 (or \1).
Finally, replace # with ?21 or something you like.
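A Python script may look like this:
import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# The positioner that gets swapped for the placeholder '#'
rexpre = re.compile(r'\?21')
# T plus 7 digits, or T plus 8 characters of digits and '#', followed by whitespace
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Put the positioner back into each match
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))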
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
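A minimal sketch of that replacement, assuming Python's re.sub is available (the asker was not sure which engine the tool uses):

```python
import re

text = "T123?214567\nT?211234567\nT0000001"

# Keep group 1 and group 2, dropping the optional positioner between them.
normalized = re.sub(r"\b(T\d*)(?:\?\d{2})?(\d+)\b", r"\1\2", text)
print(normalized)
```

A code without a positioner, such as T0000001, still matches (group 2 takes the last digit) and comes out unchanged.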
Example Python
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

Get last characters up to specific character

Let's say I have a string something-123.
I need to get the last 5 (or fewer) characters of it, but only up to - if there is one in the string, so the result would be like thing, but if the string has no - in it, like something123, then the result would be ng123, and if the string is like 123 then the result would be 123.
I know how to match the last 5 characters:
/.{5}$/
I know how to match everything up to the first -:
/[^-]*/
But I can not figure out how to combine them, and to make things worse I need to get the match without extracting it from specific groups and similar advanced regex stuff because I want to use it in SQL Anywhere, please help.
Thank you all for the help, but it looks like a complete regex solution is going to be too complicated for my problem, so I did it very simply: SELECT right(regexp_substr('something-123', '[^-]*'), 4).
One option is to group the result:
(.{4})-
Now you have captured the result but without the -.
Or using lookarounds you can:
.{4}(?=-)
which matches any 4 characters that appears before "-".
You can use:
.{5}(?=(?:-[^-]*)?$)
See the regex demo
We match 5 symbols other than a newline only before the last - in the string or at the very end of the string ((?=(?:-[^-]*)?$)). You only need to collect the matches, no need checking groups/submatches.
UPDATE
To match any 1 to 5 characters other than a hyphen before the first hyphen (if present in the string), you can use
([^-]{1,5})(?:(?:-[^-]*)*)?$
See demo. We rely on an optional non-capturing group here that checks whether hyphen + non-hyphen sequences follow the expected substring.
A faster alternative:
^[^-]*?([^-]{1,5})(?:-|$)
This regex lazily skips characters other than - and then captures the last 1 to 5 such characters before the first hyphen or the end of the string.
Note that here, the value we need is in Group 1.
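To check this pattern against the three cases from the question, here is a small sketch in Python (an assumption for testing only; the asker's target is SQL Anywhere, where group access may not be available). The function name last_chunk is made up for illustration.

```python
import re

TAIL = re.compile(r"^[^-]*?([^-]{1,5})(?:-|$)")

def last_chunk(value):
    """Return up to 5 characters before the first hyphen (or the end)."""
    match = TAIL.search(value)
    return match.group(1) if match else None

chunks = {s: last_chunk(s) for s in ["something-123", "something123", "123"]}
print(chunks)
```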
How about:
(.{5})(?:-[^-]+)?$
The result is in group 1
Try this regex:
(.{1,5})(?:-.*|$)
Group 1 has the result you need
demo

Interesting easy looking Regex

I am re-phrasing my question to clear confusions!
I want to match if a string has certain letters; for this I use the character class:
[ACD]
and it works perfectly!
but I want to match only if the string has those letter(s) 2 or more times, either the same letter repeated or 2 different letters from the set
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those are there but only once!
that's why I was trying to use:
[ACD]{2,}
But it's not working; it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it in MySQL. A different approach is also welcome, but I'd like to use regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K, L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA, there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier that is not supported at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
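The pattern logic can be checked against the question's examples with a short sketch. This uses Python's re module rather than MySQL, so it only confirms which strings the pattern accepts, not how MySQL's engine evaluates it.

```python
import re

pattern = re.compile(r"[AKL].*[AKL]")

should_match = ["ABCVL", "AAGHF", "KKUI", "AKL"]
should_not_match = ["ABCD", "KHID", "LOVE"]

# search() succeeds when two set letters occur anywhere in the string.
results = {s: bool(pattern.search(s)) for s in should_match + should_not_match}
for text, matched in results.items():
    print(text, matched)
```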
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for two or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (UVWXZ]), so Z cannot be found, since it is not adjacent to another character from your set; the same principle applies to A, because B ([ACD) is also excluded from your expression. For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA
If those were not excluded on purpose, it would probably be better to use a range like [A-Z]
If you want 2 or more matches of [AKL], then you can use just [AKL] and check that the match count is >= 2.
I am not good at SQL regex, but maybe something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
To put it in simple English, use [AKL] as your regex, and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
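import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns true when the string contains at least two characters from the set
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[ACD]").matcher(string);
    int counter = 0;
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}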
You can't use [ACD]{2,} because it requires two or more characters from the set to appear consecutively, so it fails when the matching characters are separated by other characters.
Your question is not very clear, but here is my trial pattern:
\b(\S*[AKL]\S*[AKL]\S*)\b
Demo
I'm pretty sure this should work in any case:
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
Replacing AKL with the letters you need can easily be done dynamically; tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters surrounded by anything.
It is a .NET regex, but it should be the same for anything else.
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
For example, if you needed a minimum of 7 matches and you were using your previous character set [ACDF-KM-XZ]:
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two consecutive characters from the set [ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters B, E, L and Y do not match your set:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you were to do a global search in the string, only the runs CD, FGHIJK and MNOPQRSTUVWX would be matched.
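The adjacency requirement of {2,} is easy to verify with a global search. A minimal sketch in Python (used here only for checking; the question targets MySQL):

```python
import re

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# {2,} needs at least two set characters in a row, so the isolated
# A and Z (next to the excluded B and Y) are not part of any match.
runs = re.findall(r"[ACDFGHIJKMNOPQRSTUVWXZ]{2,}", alphabet)
print(runs)
```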

How to extract a numeric substring from a string but only if the previous string part matches a target

So I am trying to extract defect numbers from changeset comments in TFS. However, there are several ways people have entered the numbers:
"Defect 1321: blah blah blah"
"Fixes HPQC 1427. Logic modified"
"- Bug 976 - Customer"
I am not great with regexes, so any help would be appreciated. I prepare the string ahead of time by lowercasing it and stripping out the # and ., so I can be assured I am looking for something that starts with (defect|hpqc|bug), has an optional space (\s), then a number (\d), then ends with a space (\s), but this didn't work:
(defect|hpqc|bug)\s\d\s
I only want to find the first match.
I want to extract the numeric component but only if the previous word is a match.
I am sure this is a result of my trivial knowledge of regex creation.
Case matters (usually), you want more than one digit (\d+), and there is an optional number sign too, so something like this should work, depending on your system:
(Defect|HPQC|Bug)\s*#?\s*(\d+)
This allows spaces and # or neither before the digits, and captures the digits. It would help to know if you are using python or something else (tag your question).
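As a sketch of how the first match could be pulled out, here is a Python version (an assumption, since the question does not say which language reads the TFS comments). The helper name first_defect_number is made up, and re.IGNORECASE is added so the pattern also works on the lowercased strings the asker mentions.

```python
import re

DEFECT = re.compile(r"(Defect|HPQC|Bug)\s*#?\s*(\d+)", re.IGNORECASE)

def first_defect_number(comment):
    """Return the digits after the first keyword, or None if absent."""
    match = DEFECT.search(comment)
    return match.group(2) if match else None

comments = [
    "Defect 1321: blah blah blah",
    "Fixes HPQC 1427. Logic modified",
    "- Bug 976 - Customer",
    "Refactored 42 files",
]
numbers = [first_defect_number(c) for c in comments]
print(numbers)
```

search() stops at the first occurrence, which covers the "only the first match" requirement, and a number without a keyword in front of it is ignored.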
I believe this regex should work for you:
(?:defect|hpqc|bug)\s+(\d+)\s+
Defect/Bug # is available in matched group #1
If you are looking only for the number after the keyword, here is a regex that should help:
(?<=(Defect|HPQC|Bug)\s*#?\s*)\d+
Good Luck!
To refine Beroe's answer:
(?:Defect|HPQC|Bug)\s*\#?\s*(\d+)
(?:Defect|HPQC|Bug) : match but don't capture
\# : the backslash escapes the # so that it is not treated as the start of a comment
It works for me on Expresso