Better way to remove specific characters from a Perl string

Better way to remove specific characters from a Perl string - regex

I have dynamically generated strings like ###!efq#!#!, and I want to remove specific characters from the string using Perl.
Currently I am doing something this (replacing the characters with nothing):
$varTemp =~ s/['\$','\#','\#','\~','\!','\&','\*','\(','\)','\[','\]','\;','\.','\,','\:','\?','\^',' ', '\`','\\','\/']//g;
Is there a better way of doing this? I am fooking for something clean.

You've misunderstood how character classes are used:
$varTemp =~ s/[\$##~!&*()\[\];.,:?^ `\\\/]+//g;
does the same as your regex (assuming you didn't mean to remove ' characters from your strings).
Edit: The + allows several of those "special characters" to match at once, so it should also be faster.

You could use the tr instead:
$p =~ tr/fo//d;
will delete every f and every o from $p. In your case it should be:
$p =~ tr/\$##~!&*()[];.,:?^ `\\\///d
See Perl's tr documentation.
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found (or not found if the /c modifier is specified) in the search list with the positionally corresponding character in the replacement list, possibly deleting some, depending on the modifiers specified.
[…]
If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.

With a character class this big it is easier to say what you want to keep. A caret in the first position of a character class inverts its sense, so you can write
$varTemp =~ s/[^"%'+\-0-9<=>a-z_{|}]+//gi
or, using the more efficient tr
$varTemp =~ tr/"%'+\-0-9<=>A-Z_a-z{|}//cd
tr docs

Well if you're using the randomly-generated string so that it has a low probability of being matched by some intentional string that you might normally find in the data, then you probably want one string per file.
You take that string, call it $place_older say. And then when you want to eliminate the text, you call quotemeta, and you use that value to substitute:
my $subs = quotemeta $place_holder;
s/$subs//g;

Related

matching cond in perl using double exclaimation

if ($a =~ m!^$var/!)
$var is a key in a two dimensional hash and $a is a key in another hash.
What is the meaning of this expressions?

This is a regular expression ("regex"), where the ! character is used as the delimiter for the pattern that is to be matched in the string that it binds to via the =~ operator (the $a† here).
It may clear it up to consider the same regex with the usual delimiter instead, $a =~ /^$var\// (then m may be omitted); but now any / used in the pattern clearly must be escaped. To avoid that unsightly and noisy \/ combo one often uses another character for the delimiter, as nearly any character may be used (my favorite is the curlies, m{^$var/}). ‡ §
This regex in the question tests whether the value in the variable $a begins with (by ^ anchor) the value of the variable $var followed by / (variables are evaluated and the result used). §
† Not a good choice for a variable name since $a and $b are used by the builtin sort
‡ With the pattern prepared ahead of time the delimiter isn't even needed
my $re = qr{^$var/};
if ($string =~ $re) ...
(but I do like to still use // then, finding it clearer altogether)
Above I use qr but a simple q() would work just fine (while I absolutely recommend qr). These take nearly any characters for the delimiter, as well.
§ Inside a pattern the evaluated variables are used as regex patterns, what is wrong in general (when this is intended they should be compiled using qr and thus used as subpatterns).
An unimaginative example: a variable $var = q(\s) (literal backslash followed by letter s) evaluated inside a pattern yields the \s sequence which is then treated as a regex pattern, for whitespace. (Presumably unintended; we just wanted \ and s.)
This is remedied by using quotemeta, /\Q$var\E/, so that possible metacharacters in $var are escaped; this results in the correct pattern for the literal characters, \\s. So a correct way to write the pattern is m{^\Q$var\E/}.
Failure to do this also allows the injection bug. Thanks to ikegami for commenting on this.

The match operator (m/.../) is one of Perl's "quote-like" operators. The standard usage is to use slashes before and after the regex that goes in the middle of the operator (and if you use slashes, then you can omit the m from the start of the operator). But if the regex itself contains a slash then it is convenient to use a different delimiter instead to avoid having to escape the embedded slash. In your example, the author has decided to use exclamation marks, but any non-whitespace character can be used.
Many Perl operators work like this - m/.../, s/.../.../, tr/.../.../, q/.../, qq/.../, qr/.../, qw/.../, qx/.../ (I've probably forgotten some).

Perl: Regex with Variables inside?

Is there a more elegant way of bringing a variable into a pattern than this (put the patterin in a string before instead of using it directly in //)??
my $z = "1"; # variable
my $x = "foo1bar";
my $pat = "^foo".$z."bar\$";
if ($x =~ /$pat/)
{
print "\nok\n";
}

The qr operator does it
my $pat = qr/^foo${z}bar$/;
unless the delimiters are ', in which case it doesn't interpolate variables.
This operator is the best way to build patterns ahead of time, as it builds a proper regex pattern, accepting everything that one can use in a pattern in a regex. It also takes modifiers, so for one the above can be written as
my $pat = qr/^foo $z bar$/x;
for a little extra clarity (but careful with omitting those {}).†
The initial description from the above perlop link (examples and discussion follow):
qr/STRING/msixpodualn
This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If "'" is used as the delimiter, no variable interpolation is done. Returns a Perl value which may be used instead of the corresponding /STRING/msixpodualn expression. The returned value is a normalized version of the original pattern. It magically differs from a string containing the same characters: ref(qr/x/) returns "Regexp"; however, dereferencing it is not well defined (you currently get the normalized version of the original pattern, but this may change).
† Once a variable is getting interpolated it may be a good idea to escape special characters that may be hiding in it, that could throw off the regex, using quotemeta based \Q ... \E
my $pat = qr/^foo \Q$z\E bar$/x;
If this is used then the variable name need not be delimited either
my $pat = qr/^foo\Q$z\Ebar$/;
Thanks to Håkon Hægland for bringing this up.

Regex to find text between second and third slashes

I would like to capture the text that occurs after the second slash and before the third slash in a string. Example:
/ipaddress/databasename/
I need to capture only the database name. The database name might have letters, numbers, and underscores. Thanks.

How you access it depends on your language, but you'll basically just want a capture group for whatever falls between your second and third "/". Assuming your string is always in the same form as your example, this will be:
/.*/(.*)/
If multiple slashes can exist, but a slash can never exist in the database name, you'd want:
/.*/(.*?)/

/.*?/(.*?)/
In the event that your lines always have / at the end of the line:
([^/]*)/$
Alternate split method:
split("/")[2]

The regex would be:
/[^/]*/([^/]*)/
so in Perl, the regex capture statement would be something like:
($database) = $text =~ m!/[^/]*/([^/]*)/!;
Normally the / character is used to delimit regexes but since they're used as part of the match, another character can be used. Alternatively, the / character can be escaped:
($database) = $text =~ /\/[^\/]*\/([^\/]*)\//;

You can even more shorten the pattern by going this way:
[^/]+/(\w+)
Here \w includes characters like A-Z, a-z, 0-9 and _
I would suggest you to give SPLIT function a priority, since i have experienced a good performance of them over RegEx functions wherever it is possible to use them.

you can use explode function with PHP or split with other languages to so such operation.
anyways, here is regex pattern:
/[\/]*[^\/]+[\/]([^\/]+)/

I know you specifically asked for regex, but you don't really need regex for this. You simply need to split the string by delimiters (in this case a backslash), then choose the part you need (in this case, the 3rd field - the first field is empty).
cut example:
cut -d '/' -f 3 <<< "$string"
awk example:
awk -F '/' {print $3} <<< "$string"
perl expression, using split function:
(split '/', $string)[2]
etc.

remove up to _ in perl using regex?

How would I go about removing all characters before a "_" in perl? So if I had a string that was "124312412_hithere" it would replace the string as just "hithere". I imagine there is a very simple way to do this using regex, but I am still new dealing with that so I need help here.

Remove all characters up to and including "_":
s/^[^_]*_//;
Remove all characters before "_":
s/^[^_]*(?=_)//;
Remove all characters before "_" (assuming the presence of a "_"):
s/^[^_]*//;

This is a bit more verbose than it needs to be, but would be probably more valuable for you to see what's going on:
my $astring = "124312412_hithere";
my $find = "^[^_]*_";
my $replace = "_";
$astring =~ s/$find/$replace/;
print $astring;
Also, there's a bit of conflicting requirements in your question. If you just want hithere (without the leading _), then change it to:
$astring =~ s/$find//;

I know it's slightly different than what was asked, but in cases like this (where you KNOW the character you are looking for exists in the string) I prefer to use split:
$str = '124312412_hithere';
$str = (split (/_/, $str, 2))[1];
Here I am splitting the string into parts, using the '_' as a delimiter, but to a maximum of 2 parts. Then, I am assigning the second part back to $str.
There's still a regex in this solution (the /_/) but I think this is a much simpler solution to read and understand than regexes full of character classes, conditional matches, etc.

You can try out this: -
$_ = "124312412_hithere";
s/^[^_]*_//;
print $_; # hithere
Note that this will also remove the _(as I infer from your sample output). If you want to keep the _ (as it seems doubtful what you want as per your first statement), you would probably need to use look-ahead as in #ikegami's answer.
Also, just to make it little more clear, any substitution and matching in regex is applied by default on $_. So, you don't need to bind it to $_ explicitly. That is implied.
So, s/^[^_]*_//; is essentially same as - $_ =~ s/^[^_]*_//;, but later one is not really required.

Regex to match suffixes to english words

I'm searching for the word "move" and i want to match "moved" as well when I print.
The way I'm going about this is:
if ($sentence =~ /($search_key)d$/i) {
$search_key = $search_keyd;
}
$subsentences[$i] =~ s/$search_key/ **$search_key** /i;
$subsentences[$i] =~ s/\b$parsewords[1]_\w+/ --$parsewords[1]--/i;
print "MATCH #$count\n",split(/_\S+/,$subsentences[$i]), "\n";
$count++;
This is part of a longer code so if anything is unclear let me know. The _ is because the words in the sentence are tagged (ex. I_NN move_VB to_PREP ....).
Where $search_keyd will be $search_key."d", which worked!
A nice addition would be to check if the word ended in e and therefore only a d would need to be appended. I'd guess it'd look something like this: e?$/d$
Even a general answer will suffice.
I'm new to Perl. So sorry if this is elementary. Thanks in advance!!!

If I understand you correctly, you want to search for "move" and add a highlight, but also include any variation of the basic word, such as "moves" "moved".
When you are replacing words in a text like this, you usually want to replace all the words, and then you need the /g operator on the regex, like so:
$subsentences[$i] =~ s/$search_key/ **$search_key** /ig
Also, you should make sure to not match partials of words. E.g. you want to match "move", but not perhaps "remove". For this, you can use \b to mark word boundry:
$subsentences[$i] =~ s/\b$search_key/ **$search_key** /ig
In order to match certain suffixes, you need a character class with valid characters or combination of characters. move[sd] will find "moves" and "moved". However, for a word like "jump", you would need to be a bit more specific: "jump(s|ed)". Note that [sd] can be replaced with (s|d). So barring any bad spelling in your text, you can get away with:
$subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
Note that $1 matches whatever is found inside the first matching parenthesis.
To find the number of matching words:
my $matches = $subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
If you want to be more specific with the suffixes, i.e. make it not match badly spelled words like "moveed", you'd need to do some special matching. Something like:
if ($search_key =~ /e$/i) { $suffix = '(s|d)' }
else { $suffix = '(s|ed)' }
my $matches = $subsentences[$i] =~ s/\b$search_key$suffix/ **$search_key$1** /ig
It can probably become very complicated the more search words you add.
Some help about regexes here

If what you want is to match all complete words which begin with your search term, i.e. 'move' matches 'move', 'moved', 'movers', etc, then you want to use a character class to detect the end of the word.
So, instead of:
if ($sentence =~ /($search_key)d$/i)
Try using:
if ($sentence =~ /($search_key\w*)\W$/i)
The \w* will match any number of standard word characters and the \W should prevent you from including other characters, such as whitespace or punctuation.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Better way to remove specific characters from a Perl string - regex

With a character class this big it is easier to say what you want to keep. A caret in the first position of a character class inverts its sense, so you can write $varTemp =~ s/[^"%'+\-0-9<=>a-z_{|}]+//gi or, using the more efficient tr $varTemp =~ tr/"%'+\-0-9<=>A-Z_a-z{|}//cd tr docs

Related

matching cond in perl using double exclaimation

Perl: Regex with Variables inside?

Regex to find text between second and third slashes

remove up to _ in perl using regex?

Regex to match suffixes to english words

Categories

Resources