how to replace specific character(s) in string by number(s) - regex

I would like to replace specific characters in a string with digits.
Let's assume I have a format string such as "B######", so it has one letter and six "#" characters. I first need to figure out how many "#" characters it contains and, based on that number, generate a random token:
Session::Token->new(alphabet => ['0'..'9'], length => $length_from_format_string);
Then I need to replace the "#" characters with the generated number. BUT...
the format string could also be something like B##CDE###1, where the "#" characters are split into groups, so the generated number must be distributed according to the format :( and all of this should be as efficient as possible.
Thanks for your hints

In Perl, the replacement side of a substitution can be evaluated as code if you use the e flag. Adding the g modifier applies it to every match.
So:
my $string = "B##CDE###1";
$string =~ s/\#/int rand(10)/ge;
print $string;
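If the digits have to come from a single pre-generated token, as the question describes, a minimal sketch could count the placeholders first and then hand out the token's digits one by one. The helper name fill_format is made up for illustration, and the token is produced with rand here only as a stand-in; with Session::Token you would presumably pass the counted length to the constructor shown in the question.

```perl
use strict;
use warnings;

# fill_format is a hypothetical helper: it replaces each '#' in $format
# with the next digit taken from $digits, left to right.
sub fill_format {
    my ($format, $digits) = @_;
    my @queue = split //, $digits;
    $format =~ s/#/shift @queue/ge;
    return $format;
}

my $format = 'B##CDE###1';
my $length = ($format =~ tr/#//);   # tr counts the '#' placeholders

# Stand-in for the token generator (an assumption, not Session::Token itself).
my $token = join '', map { int rand 10 } 1 .. $length;

print fill_format($format, $token), "\n";
```

Counting with tr avoids a second regex pass, and the substitution consumes the token in order, so the digits are spread across the groups exactly as the format dictates.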

Related

Matching the last digits of a number in Perl

I have a file in which there are a lot of GUIDs mentioned like this
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0
I want to replace the last eight digits of these GUIDs with the last eight digits of a new GUID, which has already been generated using a tool. What I have tried so far follows.
Read the last eight digits of the generated GUID like this:
$GUID =~ /[0-9a-fA-F]{8}/;
Assign it to a new variable like:
$newGUID = $1;
Now try to replace this with the old GUID inside the file:
if ($line =~ /^.* {(.*)}/) {
    $line =~ s/[0-9a-fA-F]{8}}/$newGUID/;
}
But it does not seem to be working. It replaces the last eight digits of the old GUID with 32 digits of the new GUID. How can I fix this?
You now have this:
$line =~s/[0-9a-fA-F]{8}}/$newGUID/;
You say that this replaces the last eight characters of your GUID with the entire 32-digit new GUID. That means you're finding and replacing the right characters, but what you're replacing them with is wrong.
What is $newGUID equal to? Is it an entire 32 digit GUID? If so, you need to pull off the last 8 characters.
Two things I would recommend.
If you are matching hexadecimal digits in your regular expression, use [[:xdigit:]] rather than [0-9a-fA-F]. The two are pretty much equivalent, but [[:xdigit:]] is cleaner and easier to understand.
In Perl, we love regular expressions. Heck, Perl regular expression syntax has invaded and found homes in almost all other programming languages. However, regular expressions can be difficult to get right, to test, and to understand. Sometimes there is a cleaner, easier-to-understand way of doing something than a regular expression.
In this case, you should use substr rather than regular expressions. You know exactly what you want, and you know the location in the string. The substr command would make what you're doing easier to understand and even cleaner:
use feature 'say';
use constant {
    GUID_RE => qr/^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}$/,
};
my $old_guid = '929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7';
my $new_guid = 'oooooooo-oooo-oooo-oooo-ooooXXXXXXXX';
# Regular expressions are great for verifying formats!
if ( not $old_guid =~ GUID_RE ) {
    die qq(Old GUID "$old_guid" is not a GUID string);
}
if ( not $new_guid =~ GUID_RE ) { # Yes, I know this will die in this case
    die qq(New GUID "$new_guid" is not a GUID string);
}
# Easy to understand, I'm removing the last eight characters of $old_guid
# and appending the last eight digits of $new_guid
my $munged_guid = substr( $old_guid, 0, -8 ) . substr( $new_guid, -8 );
say $munged_guid; # Prints 929EC5C7-0A40-4BE4-8F0A-60C3XXXXXXXX
I'm using regular expressions to verify that the GUIDs are correctly formatted, which is a great task for regular expressions.
I define a GUID_RE constant. You can look at how it's defined and verify that it describes the correct format (8 hex digits, then three groups of 4 hex digits, then 12 hex digits, all separated by dashes).
Then I can use that GUID_RE constant in my program, and it's easy to see what I'm doing: is my GUID actually in the GUID_RE format?
Using substr instead of regular expressions makes it easy to see exactly what I am doing. I am removing the last eight characters from $old_guid and appending the last eight characters of $new_guid.
Again, your immediate issue is that your s/.../.../ is finding the right characters, but your substitution string isn't correct. However, this isn't the best use for regular expressions.
I think your problem is that you're not correctly setting $1 to the last eight digits (if it's coming from that regex, it would match the first eight digits and isn't setting any groups). You could instead try something like $newGUID = substr($GUID, -8);. I also think something like $GUIDTail makes more sense for the variable since it doesn't store an entire GUID.
Also, at the moment you're eating the closing curly brace. You should either include it in $newGUID/$GUIDTail, include it in the replacement part of the s/// call, or change the curly brace in the match to (?=\}) (which means "match this, but don't include it in the match").
P.S.: You're assuming that there's only one GUID on the line. You may want to add a global modifier to the substitution if there's any chance of multiple GUIDs (or otherwise disambiguate which one you want to modify; as written, this will just replace the first one).
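Putting those suggestions together, a minimal sketch might look like the following. The new GUID value is made up for illustration; the line is the sample from the question. The capture is anchored with $ so it picks up the last eight hex digits rather than the first, and the lookahead leaves the closing brace in place.

```perl
use strict;
use warnings;

my $new_guid = 'A1B2C3D4-0000-1111-2222-333344445555';   # made-up value
my $line     = 'Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0';

# Capture the last eight hex digits; list assignment avoids relying on $1.
my ($tail) = $new_guid =~ /([[:xdigit:]]{8})$/
    or die "no trailing hex digits in $new_guid";

# The lookahead (?=\}) checks for the closing brace without consuming it.
$line =~ s/[[:xdigit:]]{8}(?=\})/$tail/;

print "$line\n";
```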
Here's a small code snippet that demonstrates the principle I think you are after. First off, I start with a given string, and take the last 8 characters of it and store it in a new variable, $insert. Then I perform a somewhat strict substitution on the input data (here in the internal file handle DATA, which is convenient when demonstrating), and print the altered string.
The regex in the substitution looks for curly brackets { ... } with a mixture of hex digits [:xdigit:] and dashes \- between them ([[:xdigit:]\-]+), followed by 8 hex digits. The \K escape allows us to "keep" the matched string before it, so all we need to do is insert our stored string, and replace the closing curly bracket.
If you wish to try this on a file, change <DATA> to <> and run it like so:
perl script.pl input
Code:
use strict;
use warnings;
my $new = "929EC5C7-0A40-4BE4-8F0A-1234567890";
my $insert = substr($new, -8);
while (<DATA>) {
    s/\{[[:xdigit:]\-]+\K[[:xdigit:]]{8}\}/$insert}/i;
    print;
}
__DATA__
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0
Output:
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C334567890}-SdWelcome-0

Check if string is subset of a bunch of characters? (RegEx)?

I have a little problem, I have 8 characters, for example "a b c d a e f g", and a list of words, for example:
mom, dad, bad, fag, abac
How can I check whether I can compose these words with the letters I have?
In my example, I can compose bad, abac and fag, but I cannot compose dad (I don't have two Ds) or mom (I have neither M nor O).
I'm pretty sure it can be done using a regex, but a solution using some Perl functions would be helpful too.
Thanks in advance guys! :)
This is done most simply by forming a regular expression from the word that is to be tested.
This sorts the list of available characters and forms a string by concatenating them. Then each candidate word is split into characters, sorted, and rejoined with the regex term .* as separator. So, for instance, abac will be converted to a.*a.*b.*c.
Then the validity of the word is determined by testing the string of available characters against the derived regex.
use strict;
use warnings;
my @chars = qw/ a b c d a e f g /;
my $chars = join '', sort @chars;
for my $word (qw/ mom dad bad fag abac /) {
    my $re = join '.*', sort $word =~ /./g;
    print "$word is ", $chars =~ /$re/ ? 'valid' : 'NOT valid', "\n";
}
output
mom is NOT valid
dad is NOT valid
bad is valid
fag is valid
abac is valid
This is meant to demonstrate that it is possible, not to endorse the regex method. Please consider other, saner solutions.
First step, you need to count the number of characters available.
Then construct your regex as such (this is not Perl code!):
Start with start of input anchor, this matches the start of the string (a single word from the list):
^
Append as many of these as the number of unique characters:
(?!(?:[^<char>]*+<char>){<count + 1>})
Example: (?!(?:[^a]*+a){3}) if the number of a is 2.
I used an advanced regex construct here called zero-width negative look-ahead, (?!pattern). It does not consume text; it checks that nothing ahead in the string matches the specified pattern (?:[^a]*+a){3}. Basically, the idea is to check that I cannot find 3 'a' characters ahead in the string. If I really can't find 3 instances of 'a', the string can only contain 2 or fewer.
Note that I use *+, which is the 0-or-more quantifier made possessive. This avoids unnecessary backtracking.
Put the characters that can appear within []:
[<unique_chars_in_list>]+
Example: For a b c d a e f g, this will become [abcdefg]+. This part will actually consume the string, and make sure the string only contains characters in the list.
End with end of input anchor, which matches the end of the string:
$
So for your example, the regex will be:
^(?!(?:[^a]*+a){3})(?!(?:[^b]*+b){2})(?!(?:[^c]*+c){2})(?!(?:[^d]*+d){2})(?!(?:[^e]*+e){2})(?!(?:[^f]*+f){2})(?!(?:[^g]*+g){2})[abcdefg]+$
You must also specify the i flag for case-insensitive matching.
Note that this only considers the English alphabet (a-z) in the list of words to match. Spaces and hyphens are not (yet) considered here.
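The construction steps above could be sketched in Perl roughly as follows. This is only an illustration under the same assumption (plain a-z letters, so no escaping is needed); the variable names are made up, and the pattern is assembled by string concatenation before being compiled.

```perl
use strict;
use warnings;

my @letters = qw/ a b c d a e f g /;

# Count how often each letter is available.
my %count;
$count{$_}++ for @letters;
my @unique = sort keys %count;

# One negative lookahead per letter: forbid (count + 1) occurrences.
my $lookaheads = join '',
    map { sprintf '(?!(?:[^%s]*+%s){%d})', $_, $_, $count{$_} + 1 } @unique;

# The consuming part only allows letters from the list.
my $class   = join '', @unique;
my $pattern = '^' . $lookaheads . '[' . $class . ']+$';
my $re      = qr/$pattern/i;

for my $word (qw/ mom dad bad fag abac /) {
    printf "%s is %s\n", $word, $word =~ $re ? 'valid' : 'NOT valid';
}
```

For the eight sample letters this produces the same pattern as the one written out above.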
How about sorting both strings into alphabetical order and then, for the string you want to check, inserting .* between each letter, like so:
'aabcdefg' =~ m/a.*b.*d.*/
True
'aabcdefg' =~ m/m.*m.*u.*/
False
'aabcdefg' =~ m/a.*d.*d.*/
False
Some pseudocode:
Sort the available characters into alphabetical order
for each word:
    Sort the characters of the word into alphabetical order
    For each character of the word, search forwards through the available characters to find a matching character.
    Note that this search never goes back to the start of the available characters: matched characters are consumed.
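The pseudocode above might translate into Perl along these lines. This is a sketch rather than a tuned implementation, and can_build is a hypothetical name chosen for the example.

```perl
use strict;
use warnings;

# can_build is a hypothetical name for the sorted scan described above.
sub can_build {
    my ($word, @available) = @_;
    my @pool = sort @available;
    my $i = 0;    # position in the sorted pool; it never moves back
    for my $ch (sort split //, $word) {
        $i++ while $i < @pool && $pool[$i] lt $ch;    # skip smaller letters
        return 0 if $i >= @pool || $pool[$i] ne $ch;  # letter not available
        $i++;                                         # consume the matched letter
    }
    return 1;
}

my @letters = qw/ a b c d a e f g /;
for my $word (qw/ mom dad bad fag abac /) {
    print "$word: ", can_build($word, @letters) ? 'yes' : 'no', "\n";
}
```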
Or even better, use frequency counts of characters.
For your available characters, construct a map from each character to its occurrence count.
Do the same for each candidate word and compare it against the available map. If the word map contains a character that the available map does not, or a character's count is larger in the word map than in the available map, then the word cannot be constructed from the available characters.
Here's a really simple script that would be rather easy to generalize:
#!/usr/bin/env perl
use strict;
use warnings;
sub check_word {
    my $word = shift;
    my %chars;
    $chars{$_}++ for @_;
    $chars{$_}-- or return for split //, $word;
    return 1;
}
print check_word( 'cab', qw/a b c/ ) ? "Good" : "Bad";
And of course the performance of this function could be greatly enhanced if the letters list is going to be the same every time. Actually for eight characters, copying the hash vs building a new one each time is probably the same speed.
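If the letter list really is fixed, one way to build the counts only once is a closure that copies the prepared hash for each word. This is a sketch of that idea, not a benchmarked improvement; make_checker is a made-up name.

```perl
use strict;
use warnings;

# make_checker is a hypothetical helper: it counts the letters once and
# returns a closure that tests words against a copy of those counts.
sub make_checker {
    my %available;
    $available{$_}++ for @_;
    return sub {
        my ($word) = @_;
        my %left = %available;              # cheap copy for a small letter set
        for my $ch (split //, $word) {
            return 0 unless $left{$ch};     # missing or already used up
            $left{$ch}--;
        }
        return 1;
    };
}

my $check = make_checker(qw/ a b c d a e f g /);
print "$_: ", ($check->($_) ? 'Good' : 'Bad'), "\n" for qw/ mom dad bad fag abac /;
```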
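pseudocode:
string[] chars = { "a", "b", "c" }
foreach word in words
{
    bool possible = true
    foreach char in word.chars
    {
        possible = possible && chars.contains(char)
    }
    // possible now says whether every character of word is available
    // (this does not account for how many times a letter is available)
}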

Regular expression help in Perl

I have following text pattern
(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf
(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf
I want the number 1224 or 1234, 4657 from the above text after the text >.
I have this
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\@\w*\.domain.com>\s\d+:
which matches the text up to the :, but I want only the part after the email address, up to the :
Is there an easy regular expression to do this, or should I use split?
Thanks
Edit: The whole text is returned by a command line tool.
(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf
(3333) - Unique ID
First Last - First and last names
<first.last@site.domain.com> - Email address in format FirstName.LastName@sub.domain.com
1234, 4657 - database primary keys
: xxxx - Headline
What I have to do is process the above, get the database IDs (in the example: 1234 and 4657, two separate IDs) and query the tables.
The above is the output from the tool that I call from my Perl script (I will get many entries like this).
My idea was to use a regular expression to get the database IDs. I guess I could use a regular expression for this.
You can fudge the stuff you don't care about to make the expression easier: just 'glob' the parts between the parentheses (and between the email delimiters) using non-greedy quantifiers:
/(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/ (not tested!)
There are only two capture groups: the first captures the ID in the leading parentheses (e.g. 3333), and the second captures the numbers before the colon (e.g. 1234, 4657). I can only assume from your pattern that the second means "a digit string followed by zero or more comma-separated digit strings".
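To try that expression, a small sketch could run it over the two sample lines from the question (with the email addresses written with @) and split the second capture into individual IDs. The %ids_for hash is just an illustrative container.

```perl
use strict;
use warnings;

my @lines = (
    '(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf',
    '(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf',
);

my %ids_for;   # unique ID => array ref of database keys
for my $line (@lines) {
    if ($line =~ /(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/) {
        my ($uid, $keys) = ($1, $2);
        $ids_for{$uid} = [ split /,\s*/, $keys ];
        print "$uid => @{ $ids_for{$uid} }\n";
    }
}
```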
Well, a simple fix is to just allow all the possible characters in a character class. Which is to say change \d to [\d, ] to allow digits, commas and space.
Your regex as it is, though, does not match the first sample line, because it has a dash - in it (ab-cd/ABC1 does not match \w*\/\w+\d*\). Also, it is not a good idea to rely too heavily on the * quantifier, because it does match the empty string (it matches zero or more times), and should only be used for things which are truly optional. Use + otherwise, which matches (1 or more times).
You have a rather strict regex, and with slight variations in your data like this, it will fail. Only you know what your data looks like, and if you actually do need a strict regex. However, if your data is somewhat consistent, you can use a loose regex simply based on the email part:
sub extract_nums {
    my $string = shift;
    if ($string =~ /<[^>]*> *([\d, ]+):/) {
        return $1 =~ /\d+/g; # return the extracted digits in a list
        # return $1; # just return the string as-is
    } else { return } # empty list (undef in scalar context) when nothing matches
}
This assumes, of course, that you cannot have <> tags in front of the email part of the line. It will capture any digits, commas and spaces found between a <> tag and a colon, and then return a list of any digits found in the match. You can also just return the string, as shown in the commented line.
There would appear to be something missing from your examples. Is this what they're supposed to look like, with email?
(1234) First Last (ab-cd/ABC1), <foo.bar@domain.com> 1224: efadsfadsfdsf
(1234) First Last (abcd/ABC12), <foo.bar@domain.com> 1234, 4657: efadsfadsfdsf
If so, this should work:
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\@\w*\.domain\.com>\s\d+(?:,\s(\d+))?:
$string =~ /.*>\s*(.+):.+/;
$numbers = $1;
That's it.
Tested.
With number catching:
$string =~ /.*>\s*([0-9, ]+):.+/;
$numbers = $1;
Not tested but you get the idea.

How to match the pattern between a specified characters?

I have a FIX message delimited by "|"; each tag=value pair sits between the delimiters:
(8=FIX.4.2|9=0360|35=8|49=BLPFT|56=ESP|34=8415|52=20110201-15:59:59|50=MBA|143=LN|115=MSET|57=2457172|30=CHIX|60=20110201-15:59:59.121|150=1|31=56.3100|151=71785|32=137|6=56.4058|37=9D9ZIhgu4BGU9sBtfHcYeQA|38=97370|39=1|40=1|11=20110201-05529|12=0.0012|13=2|14=25585|15=EUR|76=CHIXCCP|17=272674|47=A|167=CS|18=1|48=FR0000131104|20=0|21=1|22=4|113=N|54=1|55=BNP|207=FP|29=1|59=0|10=205|)
How can I extract the data between "11=" and the first occurrence of "|" after it?
For example, I want the data
20110201-05529
which is between "|11=" and "|".
Can you please tell me the regular expression?
The best approach will depend on how much you know about the data you are trying to match. If you know it will be comprised of numbers and dashes only:
m/11=([0-9\-]+)/
Conversely, if the data could contain any kind of characters, use:
m/11=([^|]+)/
Which matches anything that isn't a pipe character. This is probably the most reliable expression.
In both cases, the data you want is captured into the $1 special variable.
If you don't always want to match the value for the key 11, you can use variables in the pattern, so:
my $key = 11; # or any other tag number
if ($text =~ m/$key=([^|]+)/) {
    print "I found $1"; # prints "I found 20110201-05529" for key 11
}
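One caveat with an unanchored key is that "11=" also occurs inside a longer tag such as "111=". A hedged sketch of a stricter variant follows; the message is shortened and tag 111 is invented here purely to show the pitfall.

```perl
use strict;
use warnings;

# Shortened, partly made-up message: tag 111 is added only to show the pitfall.
my $text = '8=FIX.4.2|111=WRONG|11=20110201-05529|12=0.0012|';
my $key  = 11;

my ($loose)  = $text =~ /$key=([^|]+)/;              # also matches inside "111="
my ($strict) = $text =~ /(?:^|\|)\Q$key\E=([^|]*)/;  # tag must start a field

print "loose:  $loose\n";
print "strict: $strict\n";
```

Requiring either the start of the string or a preceding pipe ensures the key is matched as a whole tag.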
As always, there is more than 1 way to solve the problem. Therefore, there is no such thing as "the regular expression".
But you will definitely want to read perldoc -f split.
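As a rough illustration of the split route, the message can be broken into fields on the pipe and each field into tag and value on the first equals sign. The message below is a shortened copy of the one in the question, and %field is just an illustrative name.

```perl
use strict;
use warnings;

# Shortened copy of the message from the question.
my $msg = '8=FIX.4.2|9=0360|35=8|11=20110201-05529|12=0.0012|10=205|';

# Split into fields on '|', then each field into tag and value on the first '='.
my %field = map  { my ($tag, $value) = split /=/, $_, 2; ($tag => $value) }
            grep { length }
            split /\|/, $msg;

print "$field{11}\n";
```

With the whole message in a hash, any other tag can be looked up without writing another pattern.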
Something like this will match everything other than =, then the =, and then capture everything other than |:
[^=]+=([^|]+)

How to use a REGEX pattern to remove a specific word "THE" only if at beginning of text string?

I have a text input field for titles of various things, and to help minimize false negatives in search results (the internal search is not the best), I need a regex pattern that looks at the first four characters of the input string and removes the word "the" (and the space after it), but only if it is at the beginning.
For example, if we are talking about the names of bands and someone enters The Rolling Stones, what I need is for the entry to say only Rolling Stones.
Can a regex be used to automatically strip these four characters?
Applying the regex
^(?:\s*the\s+)?(.*)$
will match any string and capture it in backreference no. 1, unless it starts with the word "the" (optionally preceded by whitespace and followed by at least one whitespace character), in which case backreference no. 1 will contain whatever follows.
You need to set the case-insensitive option in your regex engine for this to work.
You can use the ^ anchor to match a pattern at the beginning of a line; however, for what you are doing, a regex can be considered overkill.
A lot of languages support string manipulations, which is a more suitable choice. I can provide an example to demonstrate in Python,
>>> def func(n):
...     n = n[4:len(n)] if n[0:4] == "The " else n
...     return n
...
>>> func("The Rolling Stones")
'Rolling Stones'
>>> func("They Might Be Giants")
'They Might Be Giants'
Since you didn't specify a language, here is a solution in Perl:
use feature 'say';
my $str = "The Rolling Stones";
$str =~ s/^the //i;
say $str; # Rolling Stones
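A slightly more defensive sketch, assuming titles may carry leading whitespace or more than one space after the article, could wrap the substitution in a small helper. The name strip_leading_the is made up for this example.

```perl
use strict;
use warnings;

# strip_leading_the is a hypothetical helper name.
sub strip_leading_the {
    my ($title) = @_;
    # \s+ requires whitespace after "the", so "They" and "Theory" are left alone.
    (my $clean = $title) =~ s/^\s*the\s+//i;
    return $clean;
}

print strip_leading_the($_), "\n"
    for 'The Rolling Stones', 'the who', 'They Might Be Giants', 'Theory of a Deadman';
```

Working on a copy keeps the caller's original title intact, which may matter if the unstripped form is still displayed elsewhere.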