Perl simple regex uppercase words separated by underscore


Suppose I have a string like print_this_text_in_camel_case and I want to capitalize the first word and every word after an underscore, so the result will be Print_This_Text_In_Camel_Case. The code below does not work on the first word.
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.)/uc($1)/ge;
print $str, "\n";

Just modify the regex to match the first char as well:
#!/usr/bin/perl
my $str = "print_this_text_in_camel_case";
$str =~ s/(_.|^.)/uc($1)/ge;
print $str, "\n";
will print out:
Print_This_Text_In_Camel_Case

You need to add a beginning-of-string anchor as an alternative to the underscore.
For Perl 5.10+, I'd use a \K (keep) escape to emulate variable-width look-behind and only uppercase the letter. I'd also use \U to perform the uppercasing in the replacement text instead of uc and the /e (eval) modifier.
$str =~ s/(?:^|_)\K(.)/\U$1/g;
If you're using an older version of Perl (without \K) you could do it this way:
$str =~ s/(^|_)(.)/$1\U$2/g;
Another alternative is using split and join instead of a regex:
$str = join '_', map { ucfirst } split /_/, $str;
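To check that the approaches in these answers agree, here is a small self-contained script that runs the alternation from the first answer, the \K and capture-group substitutions, and split/join on the question's string. The variable names are made up for the sketch; the \K line needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "print_this_text_in_camel_case";

# Alternation: match "_x" or the first character, upper-case it with /e.
(my $with_alt = $input) =~ s/(_.|^.)/uc($1)/ge;

# Perl 5.10+: \K drops "^" or "_" from what gets replaced, so only the
# following character is replaced, and \U upper-cases it.
(my $with_keep = $input) =~ s/(?:^|_)\K(.)/\U$1/g;

# Older Perls: capture the (possibly empty) prefix and put it back.
(my $with_capture = $input) =~ s/(^|_)(.)/$1\U$2/g;

# No substitution at all: split on "_", ucfirst each piece, re-join.
my $with_split = join '_', map { ucfirst } split /_/, $input;

print "$_\n" for $with_alt, $with_keep, $with_capture, $with_split;
```

All four lines print Print_This_Text_In_Camel_Case for this input.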

It is tidiest to use a negative look-behind. This code fragment upper-cases all letters that aren't preceded by a letter.
my $str = "print_this_text_in_camel_case";
$str =~ s/ (?<!\p{alpha}) (\p{alpha}) /uc $1/xgei;
print $str, "\n";
output
Print_This_Text_In_Camel_Case
If you prefer, or if you have a very old copy of Perl that doesn't support Unicode properties, you can use [a-z] instead of \p{alpha}, like this:
$str =~ s/ (?<![a-z]) ([a-z]) /uc $1/xige;
which produces the same result.
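The look-behind version is not quite equivalent to anchoring on the underscore. It capitalizes any letter not preceded by a letter, so a letter that follows a digit is also upper-cased. The sketch below uses a hypothetical input, print_2nd_word, to show the difference.

```perl
use strict;
use warnings;

# Hypothetical input with a "word" that starts with a digit.
my $input = "print_2nd_word";

# Upper-case only the character at the start or right after "_".
(my $by_underscore = $input) =~ s/(?:^|_)\K(.)/\U$1/g;

# Upper-case every letter that is not preceded by a letter.
(my $by_lookbehind = $input) =~ s/ (?<!\p{alpha}) (\p{alpha}) /uc $1/xge;

print "$by_underscore\n";   # Print_2nd_Word
print "$by_lookbehind\n";   # Print_2Nd_Word
```

For strings made only of letters and underscores, both give the same result.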

You could also use ucfirst
use feature 'say';
my $str = "print_this_text_in_camel_case";
my @split = map(ucfirst, (split /(_)/, $str));
say @split;
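This works because a capturing group in the split pattern makes split return the separators as well as the pieces. ucfirst leaves "_" unchanged, so joining the list with nothing in between rebuilds the string. A minimal sketch with an explicit join (variable names are illustrative):

```perl
use strict;
use warnings;
use feature 'say';

my $str = "print_this_text_in_camel_case";

# The (_) capture keeps the separators: ("print", "_", "this", "_", ...).
my @parts = split /(_)/, $str;

# ucfirst each element ("_" is unaffected) and join with the empty string.
my $result = join '', map { ucfirst } @parts;
say $result;   # Print_This_Text_In_Camel_Case
```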

Related

I would like to use regex to insert specific characters in a regex expression?

I'd like to be able to use regex in Perl to insert characters into words.
So that the word "TABLE" would become "T%A%B%L%E%"
Can I ask for the syntax for such a feat?
Many thanks

Break the string into characters, then join them with what you want in between, and also append it at the end:
my $res = ( join '%', split //, $string ) . '%';
A simple-minded way with regex
$string =~ s/(.)/$1%/g;
With the /r modifier (Perl 5.14+) you can preserve $string and get the changed string returned instead:
my $res = $string =~ s/(.)/$1%/gr;
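Both techniques from this answer fit in one runnable script. It also confirms that /r leaves the original variable untouched:

```perl
use strict;
use warnings;

my $string = "TABLE";

# split // yields single characters; join puts '%' between them and
# the trailing . '%' adds the last one.
my $joined = ( join '%', split //, $string ) . '%';

# /r (Perl 5.14+) returns a modified copy; $string stays "TABLE".
my $substituted = $string =~ s/(.)/$1%/gr;

print "$joined\n$substituted\n$string\n";
```

One difference: for an empty string, the join version returns "%" (the appended suffix), while the substitution returns an empty string because there is nothing to match.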

You can use this command,
echo TABLE|perl -pe 's/\w/$&%/g'
This outputs T%A%B%L%E%
OR (in case your data is contained in a file)
perl -pe 's/\w/$&%/g' test.pl
You may replace \w with [a-zA-Z] if you only want to match letters, since \w matches letters, digits, and the underscore.
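To see the difference between \w and [a-zA-Z], here is the same substitution written as a script and run on a hypothetical input that contains an underscore and a digit:

```perl
use strict;
use warnings;

# Hypothetical input mixing letters, an underscore and a digit.
my $input = "TAB_1";

(my $word_chars = $input) =~ s/\w/$&%/g;        # \w: letters, digits, "_"
(my $letters    = $input) =~ s/[a-zA-Z]/$&%/g;  # ASCII letters only

print "$word_chars\n";  # T%A%B%_%1%
print "$letters\n";     # T%A%B%_1
```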

You can use look-behind also
my $s = "table";
$s=~s/(?<=.)/%/g;
print $s;
If your Perl is 5.10 or later, you can use \K:
$s=~s/.\K/%/g;
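Both forms insert at positions rather than replacing characters. The look-behind matches the empty position after each character. With \K, the match consumes one character and then keeps it, so only the empty remainder is "replaced" with '%'. A quick check that they agree:

```perl
use strict;
use warnings;

my $s = "table";

# Look-behind: every position that has some character before it.
(my $lookbehind = $s) =~ s/(?<=.)/%/g;

# \K (Perl 5.10+): match one character, keep it, insert '%' after it.
(my $keep = $s) =~ s/.\K/%/g;

print "$lookbehind\n$keep\n";   # t%a%b%l%e% (twice)
```

Note that . does not match a newline without /s, so neither version puts a '%' after a trailing "\n".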

How to replace stuff in a Perl regex

I have a string $text and want to modify it with a regex. The string contains multiple sections like <NAME>John</NAME>.
I want to search for those sections, which I would normally do with something like
$text =~ m/<NAME>(.*?)<\/NAME>/g
but then make sure that there are no leading and trailing blanks and no leading non-word characters, which I would normally ensure with something like
$temp =~ s/^\s+|\s+$//g; # trim leading and trailing whitespaces
$temp =~ s/^\W*//g; # remove all leading non-word chars
Now my question is: How do I actually make this happen? Is it possible to use a s/// regex instead of the m//?

This is possible in a single substitution, but it's unnecessarily complex. I suggest you do a two-tier substitution using an executable replacement.
my $text = '<NAME> %^John^%
</NAME>';
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
(my $new = $1) =~ s/\A\s+|\s+\z//g;
$new =~ s/\A\W+//;
$new;
}xeg;
print $text;
output
<NAME>John^%</NAME>
This is even simpler if you have Perl 5.14 or later and want to use the non-destructive substitution mode (the /r modifier).
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{ $1 =~ s/\A\s+|\s+\z//gr =~ s/\A\W+//r }exg;
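The two-tier approach applies to every <NAME> section because of /g. Here it is on a hypothetical string with two sections. The pattern contains spaces for readability, so it needs the /x modifier; without /x those spaces would have to match literally and nothing would be replaced.

```perl
use strict;
use warnings;

# Hypothetical sample with two <NAME> sections.
my $text = "<NAME> %^John^%\n</NAME> and <NAME>\t--Jane  </NAME>";

# Outer s///xeg finds each section's content. The replacement code copies
# it, trims leading/trailing whitespace, then strips leading non-word chars.
(my $cleaned = $text) =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
    (my $new = $1) =~ s/\A\s+|\s+\z//g;
    $new =~ s/\A\W+//;
    $new;
}xeg;

print "$cleaned\n";   # <NAME>John^%</NAME> and <NAME>Jane</NAME>
```

Trailing non-word characters such as ^% are kept, because only leading ones were asked to be removed.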

If I understand correctly, what you want to do is merely "clean up" the text inside the tag (insofar as it's possible to "parse" XML using regular expressions). This should do the trick:
$text =~ s/(<NAME>)\s*\W*(.*?)\s*(<\/NAME>)/$1$2$3/sgi;
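A quick run of this one-substitution version on a hypothetical string with two sections. The lazy .*? matters: with a greedy .* and /s, the first match would run on to the last </NAME> and merge the sections. /s only matters if the name itself spans lines. The newline before </NAME> is absorbed by the \s* in front of the closing tag.

```perl
use strict;
use warnings;

# Hypothetical sample: two sections, the first with a newline before </NAME>.
my $text = "<NAME> %^John^%\n</NAME><NAME>--Jane </NAME>";

(my $cleaned = $text) =~ s/(<NAME>)\s*\W*(.*?)\s*(<\/NAME>)/$1$2$3/sgi;

print "$cleaned\n";   # <NAME>John^%</NAME><NAME>Jane</NAME>
```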

Regex searching and adding characters

I'm trying to use regex to add $ to the start of words in a string such that:
Answer = partOne + partTwo
becomes
$Answer = $partOne + $partTwo
I'm using / [a-z]/ to locate them but not sure what I'm meant to replace it with.
Is there any way to do it with regex, or am I supposed to just split up my string and put in the $?
I'm using Perl right now.

You can match a word boundary \b that is followed by a word character \w (checked with a look-ahead, so nothing is consumed):
my $s = 'Answer = partOne + partTwo';
$s =~ s|\b (?= \w)|\$|xg;
print $s;
output
$Answer = $partOne + $partTwo

You could use a lookahead so that only a space or the start-of-line anchor immediately followed by a letter is matched. The captured space (empty at the start of the line) is put back, followed by a $ symbol.
use strict;
use warnings;
while(my $line = <DATA>) {
$line =~ s/(^|\s)(?=[A-Za-z])/$1\$/g;
print $line;
}
__DATA__
Answer = partOne + partTwo
Output:
$Answer = $partOne + $partTwo

Perl's regexes have a word character class \w that is meant for exactly this sort of thing. It matches upper-case and lower-case letters, decimal digits, and the underscore _.
So if you prefix every run of one or more such characters with a dollar sign, you get what you asked for. It would look like this:
use strict;
use warnings;
my $str = 'Answer = partOne + partTwo';
$str =~ s/(\w+)/\$$1/g;
print $str, "\n";
output
$Answer = $partOne + $partTwo
But please note that, if the text you're processing is a programming language, this will also process all comments and string literals in a way you probably don't want.
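The three answers give the same result on the question's string but behave differently on numbers. \w also matches digits, while the look-ahead answer requires a letter. The comparison below uses a hypothetical input, total = price * 2:

```perl
use strict;
use warnings;

# Hypothetical input containing a numeric literal.
my $input = 'total = price * 2';

(my $boundary  = $input) =~ s|\b (?= \w)|\$|xg;           # \b before a \w
(my $lookahead = $input) =~ s/(^|\s)(?=[A-Za-z])/$1\$/g;  # letters only
(my $words     = $input) =~ s/(\w+)/\$$1/g;               # every \w+ run

print "$boundary\n";   # $total = $price * $2
print "$lookahead\n";  # $total = $price * 2
print "$words\n";      # $total = $price * $2
```

If numeric literals should stay unprefixed, the letter-based look-ahead is the safer choice.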

(\w+)
You can use this and replace with \$$1.
See demo.
http://regex101.com/r/lS5tT3/40

Perl: How to replace only matched part of string?

I have a string foo_bar_not_needed_string_part_123. Now in this string I want to remove not_needed_string_part only when foo_ is followed by bar.
I used the below regex:
my $str = "foo_bar_not_needed_string_part_123";
say $str if $str =~ s/foo_(?=bar)bar_(.*?)_\d+//;
But it removes the whole string and just prints a newline.
So what I need is to remove only the part matched by (.*?), so that the output is
foo_bar__123.

There's another way, and it's quite simple:
my $str = "foo_bar_not_needed_string_part_123";
$str =~ s/(?<=foo_bar_)\D+//gi;
print $str;
The trick is to use a look-behind as an anchor (it matches a position, not characters) and remove all the non-digit characters that follow it. With this pattern you match only the characters you want removed, so there is no need for capturing groups. Note that \D+ also consumes the underscore just before the digits, so the result is foo_bar_123 rather than foo_bar__123.
As a side note, the (?=bar)bar construct in the original regex is redundant: the look-ahead succeeds only if the current position is followed by bar, which is exactly what the literal bar after it checks anyway.

You can capture the parts you do not want to remove:
my $str = "foo_bar_not_needed_string_part_123";
$str =~ s/(foo_bar_).*?(_\d+)/$1$2/;
print $str;

You can try this:
my $str = "foo_bar_not_needed_string_part_123";
say $str if $str =~ s/(foo_(?=bar)bar_).*?(_\d+)/$1$2/;
Outputs:
foo_bar__123
PS: I am new to Perl and regex, so I would be interested to know whether there is a way to replace only the matched part directly. What I have done is capture everything that is required and then replace the whole string with it.
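On replacing only the matched part directly: one option, not used in any of these answers, is Perl's built-in @- and @+ arrays. After a successful match, $-[1] and $+[1] hold the start and end offsets of capture group 1, and 4-argument substr can delete exactly that span. A minimal sketch:

```perl
use strict;
use warnings;

my $str = "foo_bar_not_needed_string_part_123";

# Match, then cut out only what capture group 1 matched.
if ($str =~ /foo_bar_(.*?)_\d+/) {
    substr($str, $-[1], $+[1] - $-[1], '');
}

print "$str\n";   # foo_bar__123
```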

How about dividing the string into three parts and deleting only the middle one?
$str =~ s/(foo_(?=bar)bar_)(.*?)(_\d+)/$1$3/;

Try this:
(?<=foo_bar_).*(?=_\d)
This pattern matches everything (.*) between foo_bar_ and the last underscore that is followed by a digit; the look-behind and look-ahead only check the surrounding text and are not part of the match.
In your regex, the match includes:
foo_
Then it checks for bar after foo_:
(?=bar)
But this check does NOT consume anything. bar is consumed in the next step:
bar_
And then the rest of the line is consumed by (.*?)_\d+.
So, in general, the match includes everything you typed EXCEPT (?=bar), which only checks that bar follows.
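On the question's string, the greedy .* and a lazy .*? give the same answer because there is only one _\d. With more than one (hypothetical input below), greedy .* runs to the last position followed by _\d and lazy .*? stops at the first:

```perl
use strict;
use warnings;

# Hypothetical input with two "_<digit>" occurrences.
my $input = "foo_bar_a_1_b_2";

(my $greedy = $input) =~ s/(?<=foo_bar_).*(?=_\d)//;   # removes "a_1_b"
(my $lazy   = $input) =~ s/(?<=foo_bar_).*?(?=_\d)//;  # removes "a"

print "$greedy\n";  # foo_bar__2
print "$lazy\n";    # foo_bar__1_b_2
```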

Go with:
echo "foo_bar_not_needed_string_part_123" | perl -pe 's/(?<=foo_bar_)[^\d]+//'

You can use look-behind/look-ahead in this case
$str =~ s/(?<=foo_bar_).*?(?=_\d+)//;
and the look-behind can be replaced with \K (keep) to make it a little tidier:
$str =~ s/foo_bar_\K.*?(?=_\d+)//;
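Several of the answers in this thread can be checked side by side on the question's string. The capture-group, look-around and \K versions all give foo_bar__123. The \D+ variant gives foo_bar_123 because the underscore before the digits is also a non-digit. The hash keys are just labels for this sketch.

```perl
use strict;
use warnings;

my $orig = "foo_bar_not_needed_string_part_123";

my %result;
($result{capture}    = $orig) =~ s/(foo_bar_).*?(_\d+)/$1$2/;
($result{lookaround} = $orig) =~ s/(?<=foo_bar_).*?(?=_\d+)//;
($result{keep}       = $orig) =~ s/foo_bar_\K.*?(?=_\d+)//;   # Perl 5.10+
($result{non_digit}  = $orig) =~ s/(?<=foo_bar_)\D+//;        # eats last "_"

printf "%-10s %s\n", $_, $result{$_} for sort keys %result;
```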

How to pull out every comma, hyphen, underscore, and space and join the remaining words without spaces?

I guess this would be a rather long regular expression, but is there a way to take out underscores, spaces, commas, and hyphens from a string and then join the words together in Perl?
Example:
_Car - Eat, Tree
Becomes:
CarEatTree

You can use a simple substitution:
$string =~ s/[_ ,-]//g;

This can also be done without regular expressions, using the transliteration operator tr///:
use warnings;
use strict;
my $s = '_Car - Eat, Tree';
$s =~ tr/_ ,\-//d;
print "$s\n";
__END__
CarEatTree

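If you're looking to strip any punctuation, you can use s/[[:punct:]]//g. Note that [[:punct:]] does not include whitespace, so use s/[[:punct:]\s]//g if you also want to remove the spaces.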

Search for [_, -] and replace it with the empty string "":
$str = "_Car - Eat, Tree";
$str =~ s/[_, -]//g;

my $str = '_Car - Eat, Tree';
$str =~ s/[_\-,\s]+//g;

Using the transliteration operator to (d)elete the (c)omplement of the letter ranges:
#!/usr/bin/perl
use strict;
use warnings;
use 5.012;
my $str = '_Car - Eat, Tree';
$str =~ tr/a-zA-Z//cd;
print $str;
__END__
C:\Old_Data\perlp>perl t6.pl
CarEatTree
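The approaches in this thread can be compared in one script. The explicit character list, the tr///d list, and tr///cd ("keep only ASCII letters") all give CarEatTree for the example. [[:punct:]] alone leaves the spaces in place. Also note that tr/a-zA-Z//cd would remove digits and non-ASCII letters too.

```perl
use strict;
use warnings;

my $input = '_Car - Eat, Tree';

(my $char_class = $input) =~ s/[_ ,-]//g;       # delete the listed characters
(my $tr_list    = $input) =~ tr/_ ,\-//d;       # same set, via tr///d
(my $tr_keep    = $input) =~ tr/a-zA-Z//cd;     # delete all but ASCII letters
(my $punct_only = $input) =~ s/[[:punct:]]//g;  # punctuation only; spaces stay

print "$_\n" for $char_class, $tr_list, $tr_keep, $punct_only;
```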
