Replacing backslashes with two backslashes using regex [duplicate]

How do I replace single backslashes in a string with double backslashes?
I've tried things such as
s/\\(?!\\)/\\\\/g
s/\\/\\\\/g
s/[^//]/\\\\/g
But they all produce multiple backslashes after each other.
So I want:
\test
to be replaced with
\\test
Edit: Sorry, I should also mention that the regex runs in a loop, so I need a regex that matches only where there is exactly ONE backslash. Where there are two or more backslashes in a row, the regex should not match. Apologies.

The most helpful thing to note is to use a different delimiter for the regex, so things don't get jumbled by all the leaning towers:
my $str = '\test';
$str =~ s{\\}{\\\\}g;
print $str;
Outputs:
\\test
Update
Per your revised specification, if you only want to escape a single backslash, and ignore all others, then just use a negative lookahead and lookbehind assertion:
my $str = <<'END_STR';
\one \\two \\\three
END_STR
print $str;
$str =~ s{(?<!\\)\\(?!\\)}{\\\\}g;
print $str;
Outputs:
\one \\two \\\three
\\one \\two \\\three
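Because the question mentions running the substitution in a loop, here is a minimal sketch showing that the lookaround version can be applied repeatedly without piling up backslashes. The helper name escape_lone_backslashes is hypothetical and only wraps the substitution shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): doubles only those
# backslashes that have no backslash directly before or after them.
sub escape_lone_backslashes {
    my ($str) = @_;
    $str =~ s{(?<!\\)\\(?!\\)}{\\\\}g;
    return $str;
}

my $once  = escape_lone_backslashes('\test');   # lone backslash gets doubled
my $twice = escape_lone_backslashes($once);     # already doubled, left alone

print "$once\n";
print "$twice\n";
```

Both lines print the same text, since the second pass finds no backslash that stands alone.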

echo '\replace' | perl -pe 's/\\/\\\\/g'
\\replace
Or with sed:
echo '\replace' | sed 's/\\/\\\\/g'
\\replace

Related

RegEx match all word characters, with umlauts from different languages [duplicate]

I want to check if a person's name is valid.
It should accept Latin letters, including umlauts and accented letters (e.g. öäüÖÄÜé).
Unfortunately, nothing I've tried works.
According to several sources, such as
https://www.regular-expressions.info/unicode.html
Regex for word characters in any language
\p{L} should work, but it doesn't work for me.
Do I have to use a library for this?
use strict;
use warnings;
my $test = "testString";
print $1 if ($test =~ m/^(\p{L}+)$/); #testString
$test = "testStringö";
print $1 if ($test =~ m/^(\p{L}+)$/); #no print msg
$test = "testéString";
print $1 if ($test =~ m/^(\p{L}+)$/); #no print msg
You need to tell Perl that the source code of your file is encoded in UTF-8. Add
use utf8;
after
use strict;
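A minimal sketch of the fix, assuming the script file itself is saved as UTF-8. The helper name is_letters_only is hypothetical, and the binmode line is an addition of this sketch so that the non-ASCII names are encoded again when printed.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8 text
binmode(STDOUT, ':encoding(UTF-8)');    # assumption: the terminal expects UTF-8 output

# Hypothetical helper: true if the name consists only of Unicode letters.
sub is_letters_only {
    my ($name) = @_;
    return $name =~ /^\p{L}+$/ ? 1 : 0;
}

for my $name ('testString', 'testStringö', 'testéString', 'test123') {
    printf "%s: %s\n", $name, is_letters_only($name) ? 'valid' : 'invalid';
}
```

With use utf8 in effect, ö and é are each a single character to the regex engine, so \p{L} matches them. Without it, each is seen as two separate bytes and the match fails.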

Regex to match end of the string and capture the part before end of the string perl [duplicate]

I'm trying to match any string with the regex that ends with /? and extract the string before /?
Below is my code:
$input = "boringinterestingboring/?";
if($input =~ /(.*)\/?$/) {
print "$1\n";
}
else {
print "not matched";
}
I'm trying to capture "boringinterestingboring" using (.*) but it's not doing that, instead it captures the whole string.
How do I get only the part of the string before /?
Please help.
To match everything up to, but not including, a /?:
.*(?=/\?)
If you’re not sure about escaping, you can use a character class to do the escaping for you:
.*(?=/[?])
This may be a duplicate, but to answer your question, your regex needs to be:
/(.*)\/\?$/
or
/(.*)(?=\/\?$)/
Example:
$input = "boringinterestingboring/?";
print "Use \$1: $1\n" if($input =~ /(.*)\/\?$/);
print "Use \$1: $1\n" if($input =~ /(.*)(?=\/\?$)/);
print "Use \$&: $&\n" if($input =~ /.*(?=\/\?$)/);
Output:
Use $1: boringinterestingboring
Use $1: boringinterestingboring
Use $&: boringinterestingboring
Different ways, same destination. But either way, you should escape ? too, or put it in [].
Using positive lookahead. The assertion (?=..) will match /? but will not make it part of the capturing group even if it is nested in another group.
$ echo "boringinterestingboring/?" | perl -ne ' ($x)=/(boringinterestingboring(?=\/\?))/ ; print $x '
boringinterestingboring
$
Negative test case. Below prints nothing
$ echo "boringinterestingboring#?" | perl -ne ' ($x)=/(boringinterestingboring(?=\/\?))/ ; print $x '
$
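To see why the original pattern captured the whole string, this small sketch runs the unescaped and the escaped pattern side by side. The helper name before_marker is hypothetical. In the unescaped form the ? is a quantifier that makes the slash optional, so the greedy (.*) is free to consume everything.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text before a trailing "/?",
# or undef if the string does not end with "/?".
sub before_marker {
    my ($input) = @_;
    return $input =~ /(.*)\/\?$/ ? $1 : undef;
}

my $input = 'boringinterestingboring/?';

# Unescaped: "\/?" means "an optional slash", so (.*) grabs the whole string.
my ($unescaped) = $input =~ /(.*)\/?$/;
print "unescaped: $unescaped\n";

# Escaped: "\/\?" means a literal slash followed by a literal question mark.
my $escaped = before_marker($input);
print "escaped:   $escaped\n";
```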

How to group string of characters by 4?

I have string 1234567890 and I want to format it as 1234 5678 90
I wrote this regex:
$str =~ s/(.{4})/$1 /g;
But for the input 12345678 this does not work. I get excess whitespace at the end:
>>1234 5678 <<
I tried to rewrite the regex with a lookahead:
s/((?:.{4})?=.)/$1 /g;
How to rewrite regex to fix that case?
Just use unpack
use strict;
use warnings 'all';
for ( qw/ 12345678 1234567890 / ) {
printf ">>%s<<\n", join ' ', unpack '(A4)*';
}
output
>>1234 5678<<
>>1234 5678 90<<
Context is your friend:
join(' ', $str =~ /(.{1,4})/g)
In list context, the match returns all four-character chunks (and anything shorter than that at the end of the string, thanks to greediness). join ensures the chunks are separated by spaces and that there is no trailing space at the end.
If $str is huge and the temporary list increases the memory footprint too much, then you might just want to do the s///g and strip the trailing space.
My preference is for using the simplest possible patterns in regexes. Also, I haven't measured but with long strings, just a single chop might be cheaper than a conditional pattern in the s///g:
$ echo $'12345678\n123456789' | perl -lnE 's/(.{1,4})/$1 /g; chop; say ">>$_<<"'
>>1234 5678<<
>>1234 5678 9<<
You had the syntax almost right. Instead of just ?=., you need (?=.) (parens are part of the lookahead syntax). So:
s/((?:.{4})(?=.))/$1 /g
But you don't need the non-capturing grouping:
s/(.{4}(?=.))/$1 /g
And I think it is more clear if the capture doesn't include the lookahead:
s/(.{4})(?=.)/$1 /g
And given your example data, a non-word-boundary assertion works too:
s/(.{4})\B/$1 /g
Or using \K to automatically Keep the matched part:
s/.{4}\B\K/ /g
To fix the regex I should write:
$str =~ s/(.{4}(?=.))/$1 /g;
I just needed to add parentheses around ?=. to make it a lookahead. Without them, the ? is parsed as a quantifier that makes the preceding group optional, followed by a literal = and any character.
So we match four characters and append space after them. Then I look ahead that there are still characters. For example, the regex will not match for string 1234
Just use a look ahead to see that you have at least one character remaining:
$ echo $'12345678\n123456789' | perl -lnE 's/.{4}\K(?=.{1})/ /g; say ">>$_<<"'
>>1234 5678<<
>>1234 5678 9<<
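As a rough comparison, this sketch wraps two of the approaches above (the lookahead substitution and the list-context join) in hypothetical helper functions and runs them on a few inputs, including one whose length is an exact multiple of four.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space after each group of four characters,
# but only if at least one more character follows.
sub group_lookahead {
    my ($str) = @_;
    $str =~ s/(.{4})(?=.)/$1 /g;
    return $str;
}

# Hypothetical helper: split into chunks of up to four and join with spaces.
sub group_join {
    my ($str) = @_;
    return join ' ', $str =~ /(.{1,4})/g;
}

for my $in (qw(1234 12345678 1234567890)) {
    printf ">>%s<< >>%s<<\n", group_lookahead($in), group_join($in);
}
```

Both helpers give the same result for these inputs, with no trailing space.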

perl regex partial word match

I am trying to remove all words that contain two keys (in Perl).
For example, the string
garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2
Should become
garble variable10 variable1
To just remove the normal, unappended/prepended keys is easy:
$var =~ s/(vssx|vddx)/ /g;
However I cannot figure out how to get it to remove the entire xi_21_vssx part. I tried:
$var =~ s/\s.*(vssx|vddx).*\s/ /g
Which does not work correctly. I do not understand why: it seems like \s should match the space, then .* matches anything up to one of the patterns, then the pattern, then .* matches anything following the pattern until the next space.
I also tried replacing \s (whitespace) with \b (word boundary), but that did not work either. Another attempt:
$var =~ s/ .*(vssx|vddx).* / /g
$var =~ s/(\s.*vssx.*\s|\s.*vddx.*\s)/ /g
As well as a few other mungings.
Any pointers/help would be greatly appreciated.
-John
I think the regex will just be
$var =~ s/\S*(vssx|vddx)\S*/ /g;
You can use
\s*\S*(?:vssx|vddx)\S*\s*
The problems with your regex were:
The .* should have been non-greedy.
The .* in front of (vssx|vddx) mustn't match whitespace characters, so you have to use \S*.
Note that there's no way to properly preserve the space between words - i.e. a vssx b will become ab.
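One way around the spacing issue, as a sketch rather than a fix to the pattern itself, is to delete the matching words first and tidy up the whitespace in separate steps. The helper name remove_words_with_keys is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every whitespace-delimited word that contains
# vssx or vddx, then normalize the spacing that is left behind.
sub remove_words_with_keys {
    my ($text) = @_;
    $text =~ s/\S*(?:vssx|vddx)\S*//g;   # drop each word containing a key
    $text =~ s/\s+/ /g;                  # collapse runs of whitespace
    $text =~ s/^ | $//g;                 # trim a leading or trailing space
    return $text;
}

my $var = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';
print remove_words_with_keys($var), "\n";
```

Splitting the work into three small substitutions keeps each pattern simple, at the cost of scanning the string more than once.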
I am trying to remove all words that [...]
This type of problem lends itself well to grep, which can be used to find the elements in a list that match a condition. You can use split to convert your string to a list of words and then filter it like this:
use strict;
use warnings;
use 5.010;
my $string = 'garble variable10 variable1 vssx vddx xi_21_vssx vddx_garble_21 xi_blahvssx_grbl_2';
my @words = split ' ', $string;
my @filtered = grep { $_ !~ /(?:vssx|vddx)/ } @words;
say "@filtered";
Output:
garble variable10 variable1
Try this as the regex:
\b[\w]*(vssx|vddx)[\w]*\b

What does s-/-- and s-/\Z-- in perl mean?

I am a beginner in perl and I have a query regarding pattern matching.
I came across a line in perl where it was written
$variable =~ s-/\Z--;
And as the code goes ahead some another variable was assigned
$variable1 =~ s-/--;
Can you please tell me what these 2 lines do?
I want to know what does s-/\Z-- and s-/-- mean.
$variable =~ s-/\Z--;
- is used as a delimiter here. However, best practice suggests that you either use / or {} as delimiters.
It could be re-written as:
$variable =~ s{/\Z}{}; # remove a / at the end of a string
Consider:
$variable1 =~ s-/--;
Again, it could be re-written as:
$variable1 =~ s{/}{}; # remove the first /
The s/// operator in Perl is a substitution operation, which performs a search-and-replace on a string using a special kind of pattern called a regular expression. You can read more about regular expressions and Perl's pattern matching in the man pages that come with Perl:
man perlretut
man perlre
If you don't have these on your system, try searching Google for the same.
Applying a substitution to a variable is done with the =~ operator. So the following replaces the first instance of 'foo' in the variable $var with 'bar' (add the /g modifier to replace all instances).
$var =~ s/foo/bar/;
All the Perl operators are documented on the 'perlop' man page.
Even though the most common separator character is a slash (hence s///), you can also use any other punctuation character as a separator. So in this case, the author has decided to use the dash (-) as the separator.
Here's the same line of code above using dash as a separator:
$var =~ s-foo-bar-;
In your case, the dash doesn't seem to add any clarity to the code, so it might be best to update it to use the conventional slashes instead.
The s/// search and replace function in perl can be used with different delimiters, which is what is done in this case. They have replaced / with the minus sign -, or dash.
The s-/-- removes the first / from the string.
The s-/\Z-- matches and removes a slash at the end of the line. I think this is better written: s{/$}{}.
$variable1 =~ s-/--; could be written as
$variable1 =~ s{/}{}xms;
or this
$variable1 =~ s/ \/ //xms;
It means delete the first / in the string.
Regarding s-/\Z--, it is usually written like this
$variable =~ s{/ \Z}{}xms;
or this
$variable =~ s/ \/ \Z //xms;
It means delete a / if it is at the end of the string. Note that \Z also matches just before a final trailing newline.
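The following sketch puts both substitutions from the question into small helper functions so their effect can be seen on a sample path. The helper names and the sample path are hypothetical. It also shows the difference between \Z and \z when the string ends in a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: same as s-/\Z-- but with brace delimiters.
sub strip_trailing_slash {
    my ($s) = @_;
    $s =~ s{/\Z}{};    # \Z: end of string, or just before a final newline
    return $s;
}

# Hypothetical helper: same as s-/-- but with brace delimiters.
sub strip_first_slash {
    my ($s) = @_;
    $s =~ s{/}{};      # no /g, so only the first slash is removed
    return $s;
}

print strip_trailing_slash('/usr/local/'), "\n";   # /usr/local
print strip_first_slash('/usr/local/'), "\n";      # usr/local/

# With a trailing newline, \Z still matches but \z (absolute end) does not.
my $line = "/usr/local/\n";
(my $strict = $line) =~ s{/\z}{};
print strip_trailing_slash($line) eq "/usr/local\n" ? "Z removed it\n" : "Z kept it\n";
print $strict eq $line ? "z kept it\n" : "z removed it\n";
```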