Regex add text to code [duplicate] - regex

I am using the following expression:
Find what: [0-9]
But what should I write in Replace with field if I want to add specific sup tag to all the digits?
Thanks in advance!

The replacement can be
<sup>$0</sup>
or
<sup>$&</sup>
Note that $0, ${0}, $&, and even $MATCH and ${^MATCH} are all backreferences that insert the whole match.
See the Substitutions section:
$&, $MATCH, ${^MATCH}
The whole matched text.
and
$n, ${n}, \n
Returns what matched the subexpression numbered n. Negative indices are not allowed.
Note that a match value is usually stored as Group 0 inside a match object.
However, \0 does not currently work (Notepad++ v6.9): it appears to be treated as a NUL character and truncates the replacement pattern at the point where it occurs.
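For comparison, here is a minimal Perl sketch of the same whole-match substitution. Notepad++ uses its own regex engine, so this only illustrates the idea; the helper name sup_digits and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every digit in <sup>...</sup>.
# $& holds the whole match, like $0 or $& in the Notepad++ replacement.
sub sup_digits {
    my ($text) = @_;
    (my $out = $text) =~ s{[0-9]}{<sup>$&</sup>}g;
    return $out;
}

print sup_digits("H2O and CO2"), "\n";
# prints: H<sup>2</sup>O and CO<sup>2</sup>
```

Because the pattern is [0-9] rather than [0-9]+, each digit of a longer number gets its own tag.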

Related

Regular Expression required for matching a string that should not be followed by another specific string

I am using the code below to match a string (e.g. <jdgdt\s+mdy=.*?>\s*) that should not be followed by another specific string (<jdg>), but I am unable to get the desired output. Can anyone help me with this?
Input file :
<dckt>Docket No. 7677-12.</dckt>
<jdgdt mdy='02/25/2014'>
<jdg>Opinion by Marvel, <e>J.</e></jdg>
<taxyr></taxyr>
<disp></disp>
</tcpar>
<dckt>Docket No. 7237-13.</dckt>
<jdgdt mdy='02/24/2014'>
</tcpar>
Desired Output:
<dckt>Docket No. 7677-12.</dckt>
<jdgdt mdy='02/25/2014'>
<jdg>Opinion by Marvel, <e>J.</e></jdg>
<taxyr></taxyr>
<disp></disp>
</tcpar>
<dckt>Docket No. 7237-13.</dckt>
<jdgdt mdy='02/24/2014'>
<jdg>Opinion by Marvel, <e>J.</e></jdg>
<taxyr></taxyr>
<disp></disp>
</tcpar>
Code:
#!/usr/bin/perl
my $filename = $ARGV[0];
my $ext = $ARGV[1];
my $InputFile = "$filename" . "\." . "$ext";
my $document = do {
    local $/ = undef;
    open my $fh, "<", $InputFile or die "Error: Could Not Open File $InputFile: $!";
    <$fh>;
};
$document =~ s/(<jdgdt\s+mdy=.*?>\s*)(?!<jdg>)/$1<jdg>Opinion by Marvel,<e>J.<\/e><\/jdg>\n<taxyr><\/taxyr>\n<disp><\/disp>/isg;
print $document;
I had to make two minor adjustments to your regex to get the desired output:
$document =~ s{(<jdgdt\s+mdy\=[^>]*>\s*)(?!\s*<jdg>)}{$1<jdg>Opinion by Marvel,<e>J.</e></jdg>\n<taxyr></taxyr>\n<disp></disp>}isg;
Also, to clean up the code, I switched from using / to using {} to delimit the regex; that way, you don't need to backslash all the slashes that you actually want there in your replacement.
Explanation of what I changed:
First off, negative lookahead is tricky. What you have to remember is that perl will backtrack and try every possible way to make the expression as a whole match. Because you had this initially:
/(<jdgdt\s+mdy\=.*?>\s*)(?!<jdg>)/
What would happen is that in that first clause you'd get this match:
<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>
^^^^^^^^^^^^^^^^^^^^^^^^
(this part matched by paren. Note the \n is not matched!)
Perl would consider this a match because after the first parenthesized expression, you have "\n<jdg>". Well, that doesn't match the expression "<jdg>" (because of the initial newline), so yay! found a match.
In other words, initially, perl would have the \s* that you end your parenthesized expression with match the empty string, and therefore it would find a match and you'd end up stuffing things into the first clause that you didn't want. Another way to put it is that because of the freedom to choose what went into \s*, perl would choose the amount that allowed the expression as a whole to match. (and would fill \s* with the empty string for the first docket record, and newline for the second docket record)
To get perl to never find a match on the first docket record, I repeated the \s* in the negative lookahead as well. That way, no choice of what to put in \s* could make the expression as a whole match on the initial docket record, and perl had to give up and move to the second docket record.
But then there was a second problem! Remember how I said perl was really aggressive about finding matches anywhere it could? Well, next perl would expand your mdy\=.*?> bit to still find a result in the first docket record. After I added \s* to the negative lookahead, the first docket was still matching (but in a different spot) with:
<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>
^^^^^^^^^^^???????????????????^
(Underlined part matched by paren. ? denotes the bit matched by .*?)
See how perl expanded your .*? way beyond what you had intended? You'd intended that bit to match only stuff up to the first > character, but perl will stretch your non-greedy matches as far as necessary so that the whole pattern matches. This time, it stretched your .*? to cover the > that closed the <jdg> tag so that it could find a spot where the negative lookahead didn't block the match.
To keep perl from stretching your .*? pattern that far, I replaced .*? with [^>]*, which is really what you meant.
After these two changes, we then only found a match in the second docket record, as initially desired.
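The difference between the two patterns can be checked with a small script. This is only a sketch: the input is a shortened copy of the sample from the question, and count_matches is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Shortened copy of the sample input from the question.
my $input = <<'END';
<dckt>Docket No. 7677-12.</dckt>
<jdgdt mdy='02/25/2014'>
<jdg>Opinion by Marvel, <e>J.</e></jdg>
</tcpar>
<dckt>Docket No. 7237-13.</dckt>
<jdgdt mdy='02/24/2014'>
</tcpar>
END

# Hypothetical helper: count how often a compiled pattern matches.
sub count_matches {
    my ($re, $text) = @_;
    my $n = 0;
    $n++ while $text =~ /$re/g;
    return $n;
}

# The pattern from the question and the adjusted pattern from the answer.
my $original = qr{(<jdgdt\s+mdy=.*?>\s*)(?!<jdg>)}is;
my $fixed    = qr{(<jdgdt\s+mdy=[^>]*>\s*)(?!\s*<jdg>)}is;

print "original: ", count_matches($original, $input), "\n";  # 2
print "fixed:    ", count_matches($fixed, $input), "\n";     # 1
```

The original pattern matches in both docket records because \s* can back off to the empty string; the adjusted pattern matches only in the record that lacks <jdg>.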
Use negative lookahead, (?!<jdg>) or something similar; look it up.

Replace specific capture group instead of entire regex in Perl

I've got a regular expression with capture groups that matches what I want in a broader context. I then take capture group $1 and use it for my needs. That's easy.
But how to use capture groups with s/// when I just want to replace the content of $1, not the entire regex, with my replacement?
For instance, if I do:
$str =~ s/prefix (something) suffix/42/
prefix and suffix are removed. Instead, I would like something to be replaced by 42, while keeping prefix and suffix intact.
As I understand, you can use look-ahead or look-behind that don't consume characters. Or save data in groups and only remove what you are looking for. Examples:
With look-ahead:
s/your_text(?=ahead_text)//;
Grouping data:
s/(your_text)(ahead_text)/$2/;
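Applied to the prefix/suffix example from the question, the two approaches might look like the sketch below. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "prefix something suffix";

# Grouping: capture the surrounding text and put it back.
# ${1} is required here, because "$142" would be read as group number 142.
(my $grouped = $str) =~ s/(prefix )something( suffix)/${1}42$2/;

# Lookahead: the suffix is only asserted, so it is never consumed or replaced.
(my $ahead = $str) =~ s/(prefix )something(?= suffix)/${1}42/;

print "$grouped\n";  # prefix 42 suffix
print "$ahead\n";    # prefix 42 suffix
```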
If you only need to replace one capture, then using @LAST_MATCH_START and @LAST_MATCH_END (with use English; see perldoc perlvar) together with substr might be a viable choice:
use English qw(-no_match_vars);
$your_string =~ m/aaa (bbb) ccc/;
substr $your_string, $LAST_MATCH_START[1], $LAST_MATCH_END[1] - $LAST_MATCH_START[1], "new content";
# replaces "bbb" with "new content"
This is an old question, but I found the approach below easier for replacing lines that start with >something with >something_else. It is handy for changing the headers of FASTA sequences:
while ($filelines =~ />(.*)\s/g) {
    unless ($1 =~ /else/i) {
        # \Q...\E keeps regex metacharacters in the header from being interpreted
        $filelines =~ s/(\Q$1\E)/${1}_else/;
    }
}
I use something like this:
s/(?<=prefix)(group)(?=suffix)/$1 =~ s|text|rep|gr/e;
Example:
In the following text I want to normalize the whitespace but only after ::=:
some text ::=   a  b   c d    e ;
Which can be achieved with:
s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e
Results with:
some text ::= a b c d e ;
Explanation:
(?<=::=): Look-behind assertion to match ::=
(.*): Everything after ::=
$1 =~ s|\s+| |gr: With the captured group normalize whitespace. Note the r modifier which makes sure not to attempt to modify $1 which is read-only. Use a different sub delimiter (|) to not terminate the replacement expression.
/e: Treat the replacement text as a perl expression.
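A runnable version of this nested substitution could look as follows. It is a sketch that assumes Perl 5.14 or later (for the r modifier), and the input line with its irregular spacing is invented for the example.

```perl
use strict;
use warnings;

# Made-up input with irregular whitespace on both sides of ::=
my $line = "some   text ::=  a   b  c d    e ;";

# Only the part after ::= is captured; the inner s|||gr returns a
# normalized copy of $1, which /e then uses as the replacement.
(my $out = $line) =~ s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e;

print "$out\n";  # some   text ::= a b c d e ;
```

The whitespace before ::= is left alone because the look-behind only anchors the match and is not part of it.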
Use lookaround assertions. Quoting the documentation:
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
If the beginning of the string has a fixed length, you can thus do:
s/(?<=prefix)(your capture)(?=suffix)/$1/
However, ?<= does not work for variable length patterns (starting from Perl 5.30, it accepts variable length patterns whose length is smaller than 255 characters, which enables the use of |, but still prevents the use of *). The work-around is to use \K instead of (?<=):
s/.*prefix\K(your capture)(?=suffix)/$1/
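As a small sketch of the \K approach (available since Perl 5.10), the following replaces only the middle word, first with a fixed prefix and then with a variable-length one. The sample strings are made up.

```perl
use strict;
use warnings;

# Fixed prefix: everything matched before \K is kept, not replaced.
my $str = "prefix something suffix";
(my $fixed = $str) =~ s/prefix \Ksomething(?= suffix)/42/;

# Variable-length prefix, which a look-behind could not express with + or *.
my $spaced = "prefix   something suffix";
(my $variable = $spaced) =~ s/pre\w*\s+\Ksomething(?=\s+suffix)/42/;

print "$fixed\n";     # prefix 42 suffix
print "$variable\n";  # prefix   42 suffix
```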

Regular expression for number search

I need a regular expression that will find a number (or numbers) that is not inside parentheses.
Example abcd 1 (35) (df)
It would only see the 1.
Is this very complex? I've tried and had no luck.
Thanks for any help
An easy solution is to first remove the unwanted values:
my $string = "abcd 12 (35) (df) 2311,22";
$string =~ s/\(\d+\)//g; # remove numbers within parens
my @numbers = $string =~ /\d+/g; # extract the numbers
This is quite hard but something like this will probably do:
^(?:\()(\d+)(?:[^)])|(?:[^(0-9]|^)(\d+)(?:[^)0-9]|^)|(?:[^(])(\d+)(?:\))$
The problem is to match (123, 123) and also to not match the string 123 as the number 2 between the non-parentheses characters 1 and 3. Also there are probably some edge cases for start of and end of string.
My suggestion is to not use a regex for this. Maybe a regex that matches numbers and then use the capture info to check if the surrounding characters are not parentheses.
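One way to sketch that suggestion: match every run of digits, then use the match offsets in @- and @+ to inspect the neighbouring characters. The function name is hypothetical, and the check deliberately looks only at the characters directly adjacent to the number, so input such as "(35 36)" or "( 35 )" is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the numbers that are not directly
# wrapped in parentheses.
sub numbers_outside_parens {
    my ($text) = @_;
    my @found;
    while ($text =~ /\d+/g) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my ($start, $end) = ($-[0], $+[0]);
        my $before = $start > 0 ? substr($text, $start - 1, 1) : '';
        my $after  = $end < length($text) ? substr($text, $end, 1) : '';
        next if $before eq '(' || $after eq ')';
        push @found, substr($text, $start, $end - $start);
    }
    return @found;
}

print join(",", numbers_outside_parens("abcd 1 (35) (df)")), "\n";  # 1
```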
The regular expression would be:
^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$
The result is the first (and only) matching group of the regex.
You may want to remove the ^ and $ if the pattern should also match when it is only part of a line rather than the whole line. You can also use [a-zA-Z] or [[:alpha:]] instead of [a-z]. This depends on the regular expression engine you use and, of course, on the content you want to match.
Example perl code:
if (m/^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$/) {
print("$1\n");
}
Please note that your question does not contain enough information to make a good answer possible (you did not say anything about the general format of your expression, for example whether you want to match integers or floating-point numbers).
How about this?
/(?:^|[^\d(])(\d+)(?:[^\d)]|$)/
This matches a string of digits (\d+) that is
preceded by the beginning of the string, or a character that is not a digit or an open parenthesis ((?:^|[^\d(]))
succeeded by the end of the string, or by a character that is not a digit or a close parenthesis ((?:[^\d)]|$))
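One caveat when this pattern is used with /g: the delimiter after a number is consumed, so a second number separated from the first by a single character is skipped. The sketch below compares it with a lookaround variant that asserts the same conditions without consuming anything. The variant and both function names are assumptions added for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# The pattern from the answer: neighbouring characters are consumed.
sub consuming {
    my ($text) = @_;
    return $text =~ /(?:^|[^\d(])(\d+)(?:[^\d)]|$)/g;
}

# Assumed alternative: the same conditions as zero-width assertions.
sub lookaround {
    my ($text) = @_;
    return $text =~ /(?<![\d(])\d+(?![\d)])/g;
}

my $sample = "abcd 1 (35) (df) 7 8";
print join(",", consuming($sample)), "\n";   # 1,7
print join(",", lookaround($sample)), "\n";  # 1,7,8
```

Like the original, the lookaround variant only inspects the characters directly next to the digits.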