Regex searching and adding characters


I'm trying to use regex to add $ to the start of words in a string such that:
Answer = partOne + partTwo
becomes
$Answer = $partOne + $partTwo
I'm using / [a-z]/ to locate them, but I'm not sure what I'm meant to replace it with.
Is there any way to do it with regex, or am I supposed to just split up my string and put in the $?
I'm using Perl right now.

You can match a word boundary \b that is followed by a word character \w:
my $s = 'Answer = partOne + partTwo';
$s =~ s|\b (?= \w)|\$|xg;
print $s;
output
$Answer = $partOne + $partTwo

You could use a lookahead to match only a whitespace character, or the start-of-line anchor, that is immediately followed by a letter. Replace the match with itself followed by a $ symbol.
use strict;
use warnings;
while(my $line = <DATA>) {
$line =~ s/(^|\s)(?=[A-Za-z])/$1\$/g;
print $line;
}
__DATA__
Answer = partOne + partTwo
Output:
$Answer = $partOne + $partTwo
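The two approaches above differ when a word is not preceded by whitespace. The following is only a sketch to show that difference; the sample string with parentheses is an assumption made for illustration and is not part of the question.

```perl
use strict;
use warnings;

# Hypothetical input: a word directly after "(" has no whitespace before it.
my $input = 'total = (a + b)';

# Word-boundary version: inserts $ at every position where a word starts.
(my $by_boundary = $input) =~ s/\b(?=\w)/\$/g;

# Whitespace version: only words at the start or after whitespace are prefixed.
(my $by_space = $input) =~ s/(^|\s)(?=[A-Za-z])/$1\$/g;

print "$by_boundary\n";   # $total = ($a + $b)
print "$by_space\n";      # $total = (a + $b)
```

Which behaviour is right depends on the input; for expressions with brackets or operators written without spaces, the word-boundary form is probably the safer choice.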

Perl's regexes have a word character class \w that is meant for exactly this sort of thing. It matches upper-case and lower-case letters, decimal digits, and the underscore _.
So if you prefix all occurrences of one or more such characters with a dollar sign, it will achieve what you ask. It would look like this:
use strict;
use warnings;
my $str = 'Answer = partOne + partTwo';
$str =~ s/(\w+)/\$$1/g;
print $str, "\n";
output
$Answer = $partOne + $partTwo
But please note that, if the text you're processing is a programming language, this will also process all comments and string literals in a way you probably don't want.
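Because \w also matches digits, numeric literals get a dollar sign too. A minimal sketch of one way around that, assuming identifiers always start with a letter or underscore (the input string is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical input that mixes identifiers and a numeric literal.
my $expr = 'total = count2 + 10';

# Plain \w+ also prefixes the number.
(my $all_words = $expr) =~ s/(\w+)/\$$1/g;

# Requiring a letter or underscore at the word boundary skips the number.
(my $names_only = $expr) =~ s/\b(?=[A-Za-z_])/\$/g;

print "$all_words\n";    # $total = $count2 + $10
print "$names_only\n";   # $total = $count2 + 10
```

This still does nothing about comments or string literals; handling those would need a real parser for the language being processed.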

(\w+)
You can use this. Replace with \$$1.
See demo.
http://regex101.com/r/lS5tT3/40

Related

How to get the string which is starting after front slash in Perl regex?

For example, I have a below string starting with two front slashes. Now I want to get the string "foo_foo". How do I do that? Thanks in advance.
my $str = "// filename : foo_foo";
if ($_ =~ m/^filename\s+:\s+(.+)/) {print "regex $1 \n";}

You populate $str but bind the match against $_.
Use a different delimiter so you don't have to escape the slashes.
my $str = "// filename : foo_foo";
if ($str =~ m{^/+\s+filename\s+:\s+(.+)}) {
print "regex: '$1'\n";
}

You can use
my $str = "// filename : foo_foo";
if ($str =~ m{^//\h*filename\s*:\s*(.+)}) {
print "regex $1 \n";
}
Here, I used {...} regex delimiters instead of /.../, and the pattern now looks like ^//\h*filename\s*:\s*(.+), matching:
^ - start of string
// - a // substring
\h* - zero or more horizontal whitespaces
filename - some fixed string
\s*:\s* - a : char enclosed with zero or more whitespaces
(.+) - Group 1: one or more chars other than line break chars as many as possible (greedy dot).
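The same breakdown can be kept next to the pattern itself with the /x modifier, which ignores whitespace and allows comments inside the regex. This is a sketch of that idea, not the answerer's code; the variable names $re and $name are chosen here for illustration.

```perl
use strict;
use warnings;

my $str = "// filename : foo_foo";

# Same pattern as above, spread out with /x so each part carries a comment.
my $re = qr{
    ^ //        # two slashes at the start of the string
    \h*         # optional horizontal whitespace
    filename    # the literal label
    \s* : \s*   # a colon with optional whitespace around it
    (.+)        # group 1, the rest of the line
}x;

# In list context the match returns the captured groups.
my ($name) = $str =~ $re;
print defined $name ? "$name\n" : "no match\n";   # foo_foo
```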

Something along the lines of the following sample code should produce the desired result.
use strict;
use warnings;
use feature 'say';
my $str = "// filename : foo_foo";
my($fname) = $str =~ m|// filename : (.*)\z|;
say $fname;
Output
foo_foo

I would like to use regex to insert specific characters in a regex expression?

I'd like to be able to use regex in Perl to insert characters into words.
So that the word "TABLE" would become "T%A%B%L%E%"
Can I ask for the syntax for such a feat?
Many thanks

Break the string into characters, then join them with what you want in between; also append that character at the end:
my $res = ( join '%', split //, $string ) . '%';
A simple-minded way with a regex:
$string =~ s/(.)/$1%/g;
With the /r modifier (Perl 5.14+) you can preserve $string and return the changed string instead:
my $res = $string =~ s/(.)/$1%/gr;

You can use this command,
echo TABLE|perl -pe 's/\w/$&%/g'
This outputs T%A%B%L%E%
OR (in case your data is contained in a file)
perl -pe 's/\w/$&%/g' test.pl
You may replace \w with [a-zA-Z] if you only want to mark letters, as \w matches letters, digits, and the underscore.
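To see the difference between the two character classes, here is a small sketch with a hypothetical input that contains an underscore and a digit. It uses a capture group instead of $& and the /r modifier, so it assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical input containing an underscore and a digit.
my $word = 'AB_1';

# The /r modifier returns the result and leaves $word untouched.
my $with_w       = $word =~ s/(\w)/$1%/gr;
my $letters_only = $word =~ s/([a-zA-Z])/$1%/gr;

print "$with_w\n";         # A%B%_%1%
print "$letters_only\n";   # A%B%_1
```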

You can also use a look-behind:
my $s = "table";
$s=~s/(?<=.)/%/g;
print $s;
If your Perl version is 5.10 or later, you can use \K:
$s=~s/.\K/%/g;
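A short sketch that runs both forms side by side on the same input. The /r modifier is used here only to keep the original string intact, which assumes Perl 5.14 or later; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "table";

# Look-behind: match the empty position after any character.
my $via_lookbehind = $s =~ s/(?<=.)/%/gr;

# \K: match a character, then keep it out of the replaced text.
my $via_keep = $s =~ s/.\K/%/gr;

print "$via_lookbehind\n";   # t%a%b%l%e%
print "$via_keep\n";         # t%a%b%l%e%
```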

Perl Regex Remove Hyphen but Ignore Specific Hyphenated words

I have a perl regex which converts hyphens to spaces eg:-
$string =~ s/-/ /g;
I need to modify this to ignore specific hyphenated phrases and not replace the hyphen e.g. in a string like this:
"use-either-dvi-d-or-dvi-i"
I wish to NOT replace the hyphen in dvi-d and dvi-i so it reads:
"use either dvi-d or dvi-i"
I have tried various negative look ahead matches but failed miserably.

You can use this PCRE regex with verbs (*SKIP)(*F) to skip certain words from your match:
dvi-[id](*SKIP)(*F)|-
This skips the words dvi-i and dvi-d because of (*SKIP)(*F), so their hyphens are never replaced.
For your code:
$string =~ s/dvi-[id](*SKIP)(*F)|-/ /g;
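If the set of protected phrases grows, the alternation can be built from a list. This is a sketch under that assumption; the @keep array and the $protected variable are hypothetical names, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical list of phrases whose hyphens must survive.
my @keep = ('dvi-d', 'dvi-i');

# Longest phrases first, so a longer phrase wins over its own prefix.
my $protected = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keep;

my $string = 'use-either-dvi-d-or-dvi-i';

# Protected phrases are matched and then skipped; any other hyphen is replaced.
$string =~ s/(?:$protected)(*SKIP)(*F)|-/ /g;

print "$string\n";   # use either dvi-d or dvi-i
```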
There is an alternative lookaround-based solution as well:
/(?<!dvi)-|-(?![di])/
This matches a hyphen if it is not preceded by dvi OR if it is not followed by d or i, which makes sure - is not matched when dvi is on the left-hand side and [di] is on the right-hand side.
Perl code:
$string =~ s/(?<!dvi)-|-(?![di])/ /g;

$string =~ s/(?<!dvi)-(?![id])|(?<=dvi)-(?![id])|(?<!dvi)-(?=[id])/ /g;
Using just (?<!dvi)-(?![id]) would also leave the hyphens in dvi-x and x-i untouched, where x can be any character.
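The difference is easy to see on the string from the question. A small sketch, assuming Perl 5.14+ for the /r modifier; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = 'use-either-dvi-d-or-dvi-i';

# Both conditions at once: too strict, since any hyphen before d or i is kept.
my $too_strict = $string =~ s/(?<!dvi)-(?![id])/ /gr;

# Replace unless dvi is on the left AND d or i is on the right.
my $wanted = $string =~ s/(?<!dvi)-|-(?![id])/ /gr;

print "$too_strict\n";   # use either-dvi-d or-dvi-i
print "$wanted\n";       # use either dvi-d or dvi-i
```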

It is unlikely that you could get a simple and straightforward regex solution to this. However, you could try the following:
#!/usr/bin/env perl
use strict;
use warnings;
my %whitelist = map { $_ => 1 } qw( dvi-d dvi-i );
my $string = 'use-either-dvi-d-or-dvi-i';
while ( $string =~ m{ ( [^-]+ ) ( - ) ( [^-]+ ) }gx ) {
my $segment = substr($string, $-[0], $+[0] - $-[0]);
unless ( $whitelist{ $segment } ) {
substr( $string, $-[2], 1, ' ');
}
pos( $string ) = $-[ 3 ];
}
print $string, "\n";
The @- array contains the starting offsets of matched groups, and the @+ array contains the end offsets. In both cases, element 0 refers to the whole match.
I had to resort to something like this because of how \G works:
Note also that s/// will refuse to overwrite part of a substitution that has already been replaced; so for example this will stop after the first iteration, rather than iterating its way backwards through the string:
$_ = "123456789";
pos = 6;
s/.(?=.\G)/X/g;
print; # prints 1234X6789, not XXXXX6789
Maybe @tchrist can figure out how to bend various assertions to his will.

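We can ignore specific words using negative look-ahead and negative look-behind assertions.
Example:
(?!pattern)
is a negative look-ahead assertion, and (?<!pattern) is the corresponding negative look-behind.
In your case the substitution is
$string =~ s/(?<!dvi)-(?<![id])/ /g;
Note that the trailing (?<![id]) is a look-behind that inspects the hyphen itself, so it always succeeds; the pattern behaves like (?<!dvi)- and keeps every hyphen that directly follows dvi.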
output :
use either dvi-d or dvi-i
Reference : http://www.perlmonks.org/?node_id=518444
Hope this will help you.

Perl simple regex uppercase words separated by underscore

Consider I have string like print_this_text_in_camel_case and I want to uppercase the first word and every word after the underscore, so the result will be Print_This_Text_In_Camel_Case. The below test does not work on the first word.
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.)/uc($1)/ge;
print $str, "\n";

Just modify the regex to match the first char as well:
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.|^.)/uc($1)/ge;
print $str, "\n";
will print out:
Print_This_Text_In_Camel_Case

You need to add a beginning-of-string anchor as an alternative to the underscore.
For Perl 5.10+, I'd use a \K (keep) escape to emulate variable-width look-behind and only uppercase the letter. I'd also use \U to perform the uppercase in the replacement text instead of uc and the /e (eval) modifier.
$str =~ s/(?:^|_)\K(.)/\U$1/g;
If you're using an older version of Perl (without \K) you could do it this way:
$str =~ s/(^|_)(.)/$1\U$2/g;
Another alternative is using split and join instead of a regex:
$str = join '_', map { ucfirst } split /_/, $str;
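For comparison, the three variants can be run on the same input. This is a sketch with illustrative variable names; each variant works on its own copy of the string.

```perl
use strict;
use warnings;

my $str = "print_this_text_in_camel_case";

# \K variant (Perl 5.10+): only the character after ^ or _ is replaced.
(my $with_keep = $str) =~ s/(?:^|_)\K(.)/\U$1/g;

# Two-group variant for older Perls: put the prefix back unchanged.
(my $with_groups = $str) =~ s/(^|_)(.)/$1\U$2/g;

# No substitution at all: split on _, capitalize each piece, rejoin.
my $with_split = join '_', map { ucfirst($_) } split /_/, $str;

# Each line prints Print_This_Text_In_Camel_Case
print "$_\n" for $with_keep, $with_groups, $with_split;
```

One difference worth knowing: split discards trailing empty fields, so the split and join variant would drop underscores at the very end of the string, while the two substitutions leave them alone.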

It is tidiest to use a negative look-behind. This code fragment upper-cases all letters that aren't preceded by a letter.
my $str = "print_this_text_in_camel_case";
$str =~ s/ (?<!\p{alpha}) (\p{alpha}) /uc $1/xgei;
print $str, "\n";
output
Print_This_Text_In_Camel_Case
If you prefer, or if you have a very old copy of Perl that doesn't support Unicode properties, you can use [a-z] instead of \p{alpha}, like this:
$str =~ s/ (?<![a-z]) ([a-z]) /uc $1/xige;
which produces the same result.

You could also use ucfirst
use feature 'say';
my $str = "print_this_text_in_camel_case";
my @split = map(ucfirst, (split/(_)/, $str));
say @split;

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'

Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;

Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $`, and $' variables have well-known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-performance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so you don't match an entire string that has no digits), you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
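Wrapped in a small function, the last form makes the failure case explicit. This is a sketch; prefix_before_digit is a hypothetical helper name, and the extra test strings are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading non-digit text, or undef when
# the string does not contain a digit after at least one other character.
sub prefix_before_digit {
    my ($string) = @_;
    # A failed match in list context returns an empty list, so $prefix stays undef.
    my ($prefix) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
    return $prefix;
}

for my $input ('AAAA_BBBB_12_13_14', 'no digits here', '42abc') {
    my $prefix = prefix_before_digit($input);
    printf "%-20s => %s\n", $input, defined $prefix ? "'$prefix'" : 'undef';
}
```

Only the first input yields a prefix ('AAAA_BBBB_'); the other two return undef, one because there is no digit at all and one because the string starts with a digit.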

$str =~ /(\d)/; print $`;
This code prints the part of the string that precedes the match, that is, everything before the first digit.

perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_
