Regex default value if not found - regex

I would like to supply my regular expression with a 'default' value, so if the thing I was looking for is not found, it will return the default value as if it had found it.
Is this possible to do using regex?

It sounds like you want some sort of regex syntax that says "if the regexp does not match any part of the given string pretend that it matched the following substring: 'foobar'". Such a feature does not exist in any regexp syntax I've seen.
You'll probably need to do something like this:
matched_string = string.find_regex_match(regex);
if(matched_string == null) {
matched_string = "default";
}
(This will of course need to be adjusted to the language you're using)
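For what it's worth, here is a minimal Python sketch of that same "match, else fall back" idea. The helper name search_or_default and its parameters are made up for illustration; only re.search and Match.group come from the standard library.

```python
import re

def search_or_default(pattern, text, default, group=0):
    """Return the requested group of the first match, or `default` if nothing matched."""
    m = re.search(pattern, text)
    if m is None:
        # The pattern did not match anywhere in the text.
        return default
    value = m.group(group)
    # An optional group that did not participate in the match yields None.
    return default if value is None else value
```

Calling search_or_default(r"\d+", "hello", "1234") falls back to the default because "hello" contains no digits, while a string that does contain digits returns the matched text.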

It's hard to answer this without a specific language, but in Perl at least, something like this works:
$string='hello';
$default = 1234;
$match = $string =~ m/(\d+)/ ? $1 : $default;
print "$match\n";
1234
Not strictly part of the regex, but avoids the extra conditional block.

As far as I know, you can't do that with regular expressions, at least not with Perl Compatible Regular Expressions.
You can see for yourself here.

Here's what I did in javascript...
function match(regx, str, dflt, index = 0) {
if (!str) return dflt
let x = str.match(regx)
return x ? x[index] || dflt : dflt
}

Related

Regular expression checking URLs, only allowing lowercase [duplicate]

What is the regular expression (in JavaScript if it matters) to only match if the text is an exact match? That is, there should be no extra characters at either end of the string.
For example, if I'm trying to match for abc, then 1abc1, 1abc, and abc1 would not match.
Use the start and end delimiters: ^abc$
It depends. You could
string.match(/^abc$/)
But that would not match the following string: 'the first 3 letters of the alphabet are abc. not abc123'
I think you would want to use \b (word boundaries):
var str = 'the first 3 letters of the alphabet are abc. not abc123';
var pat = /\b(abc)\b/g;
console.log(str.match(pat));
Live example: http://jsfiddle.net/uu5VJ/
If the former solution works for you, I would advise against using it.
That means you may have something like the following:
var strs = ['abc', 'abc1', 'abc2']
for (var i = 0; i < strs.length; i++) {
if (strs[i] == 'abc') {
//do something
}
else {
//do something else
}
}
While you could use
if (strs[i].match(/^abc$/g)) {
//do something
}
It would be considerably more resource-intensive. For me, a general rule of thumb is for a simple string comparison use a conditional expression, for a more dynamic pattern use a regular expression.
More on JavaScript regexes: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
"^" matches the beginning of the line and "$" matches the end of it. E.g.:
var re = /^abc$/;
Would match "abc" but not "1abc" or "abc1". You can learn more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
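The two readings of "exact match" discussed above (the whole string versus a whole word inside a longer string) can be sketched in Python as follows. This is only an illustration; the function names is_exact and find_whole_words are made up for it.

```python
import re

def is_exact(text):
    # Whole-string match: equivalent in spirit to anchoring with ^abc$.
    return re.fullmatch(r"abc", text) is not None

def find_whole_words(text):
    # Word-boundary match: finds "abc" as a standalone word inside longer text.
    return re.findall(r"\babc\b", text)
```

With these, is_exact accepts "abc" but rejects "1abc" and "abc1", while find_whole_words picks out the standalone "abc" in a sentence and skips "abc123".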

Unanchored substring searching: index vs regex?

I am writing some Perl scripts where I need to do a lot of string matching.
For example:
my $str1 = "this is a test string";
my $str2 = "test";
To see if $str1 contains $str2 - I found that there are 2 approaches:
Approach 1:
use Index function:
if ( index($str1, $str2) != -1 ) { .... }
Approach 2:
use regular expression:
if( $str1 =~ /$str2/ ) { .... }
Which is better? and when should we use each of these over the other?
Here is the result of Benchmark:
use Benchmark qw(:all) ;
my $count = -1;
my $str1 = "this is a test string";
my $str2 = "test";
my $str3 = qr/test/;
cmpthese($count, {
'type1' => sub { if ( index($str1, $str2) != -1 ) { 1 } },
'type2' => sub { if( $str1 =~ $str3 ) { 1 } },
});
Result (when a match happens):
           Rate type2 type1
type2 1747627/s    --  -70%
type1 5770465/s  230%    --
To be able to draw a conclusion, test not to match:
my $str2 = "text";
my $str3 = qr/text/;
Result (when a match does not happen):
           Rate type2 type1
type2 1857295/s    --  -67%
type1 5560630/s  199%    --
Conclusion:
The index function is much faster than the regexp match.
When I see code that uses index, I usually see an index within an index within an index, etc. There's also more branching too: "if found, look for this; otherwise since not found, look for that." Almost always a single regex would have worked. So, for me, I almost always use a regex unless there's some specific reason I want to use an index.
Unfortunately, most programmers I run into don't read regexes well, so for maintainability index should probably be used more often than I use it.
If you need a substring match, use index. If you need a regexp match (with special meaning for regexp metacharacters), use =~. A substring match is usually faster, but regexps in Perl are quite well optimized, and simple regexp matches can be surprisingly fast. Benchmark it for yourself.
Since Perl 5.6, Perl is smart enough to recompile the regexp in $str =~ /$str2/ iff $str2 has changed since the last compilation. To fully control when your regexp is compiled, use qr/$str2/. See Does the 'o' modifier for Perl regular expressions still provide any benefit? for /.../o (obsolete) and qr/.../ (not needed most of the time, but can be useful).
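The same substring-versus-regex trade-off exists outside Perl. As a rough sketch in Python (not a port of the benchmark above), the in operator plays the role of index and a precompiled pattern plays the role of qr/.../. The function names are made up for illustration, and the timings depend on the machine, so none are quoted here.

```python
import re
import timeit

def contains_substring(haystack, needle):
    # Plain substring test, the counterpart of Perl's index(...) != -1.
    return needle in haystack

def contains_regex(haystack, pattern):
    # Regex test against a precompiled pattern, similar to matching a qr/.../ object.
    return pattern.search(haystack) is not None

if __name__ == "__main__":
    text = "this is a test string"
    compiled = re.compile("test")
    cases = (
        ("substring", lambda: contains_substring(text, "test")),
        ("regex", lambda: contains_regex(text, compiled)),
    )
    for label, fn in cases:
        # Prints the total seconds for 100,000 calls of each approach.
        print(label, timeit.timeit(fn, number=100_000))
```

Note that the two are not interchangeable when the needle contains metacharacters: "a.c" is not a substring of "abc", but the regex a.c does match it.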

Passing a parameter to a regular expression to match the first letter in a word in perl

So here is what I'm doing. This is for homework, and I know I can't come on here and get you guys to do my homework for me, but I'm stuck. We have to use Perl (first time ever using it, so forgive my stupidity) to make a function starts_with that takes the parameters $str0 and $prefix. If $str0 starts with $prefix, the function returns true; if it doesn't, it returns false. Pretty simple. We have to use regular expressions because that is the whole point of the exercise, so here is my code:
sub starts_with
{
$str0 = $_[0];
$prefix = $_[1];
if($prefix =~ /^($str0)/)
{
print $str0."\n";
print m/^(prefix)/."\n";
$startsWith = "Y"
}
if ($startsWith eq "Y")
{
print $str0." starts with ".$prefix."\n";
}
else
{
print $str0." does not start with ".$prefix."\n";
}
}
I'm almost ashamed to put this up here because I have no idea what I'm doing yet, but I am trying to learn. I don't know how to do true/false in Perl; that's why I have the $startsWith variable. You can fix that if you want. The part I need to fix is the line
if(str0 =~ /^($prefix)/)
I also need to find out how to refer to the first letter in str0...I think
A couple points without giving away the answer:
1) Arguments to functions are passed in a special array called @_, which is what you are accessing when you say $_[0] and $_[1]. This can be written much more concisely by assigning the argument list (@_) to your variables in list context:
sub starts_with {
my ($str0, $prefix) = @_;
...
}
2) This statement: if($prefix =~ /^($str0)/) tests the exact opposite condition you are trying to prove. It says does the prefix start with the value of the variable $str0. What you really want to test is if $str0 starts with $prefix.
It might also be useful to write your pattern with the explicit match operator, m/PATTERN/, which means "match this pattern".
3) You don't have a return statement in your function. (As @M42 points out,) the result of the last expression evaluated is returned; here that expression is a print, which returns true. You probably want to return true or false explicitly.
See if you can use this to get started.
What I would do :
use Modern::Perl; # or use strict; use warnings; use feature qw/say/;
sub starts_with {
# better to use @_, the default array, instead of individual elements of it
# ...like $_[0]
my ($str, $pref) = @_;
# very short expression, the pattern matching return a boolean.
# \Q\E is there to treat the prefix as-is (no metacharacters)
return $str =~ /^\Q$pref\E/;
}
# using our function
if (starts_with("foobar", "f")) {
say "TRUE";
}
else {
say "FALSE";
}
Golfing it a bit...
sub starts_with { $_[0] =~ /^\Q$_[1]/ }
Don't hand that version in though :-)
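For comparison only (the exercise itself has to be in Perl), the same idea can be sketched in Python, where re.escape plays the role of \Q...\E. The name starts_with mirrors the Perl function above; nothing else here is from the original answers.

```python
import re

def starts_with(text, prefix):
    # re.escape treats the prefix literally, like \Q...\E in Perl;
    # re.match only matches at the start of the string, like a leading ^.
    return re.match(re.escape(prefix), text) is not None
```

Because the prefix is escaped, starts_with("abc", "a.") is false: the dot is a literal dot, not "any character".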

regular expression help: catch this: |TrxId=475665|

For example I have a string:
MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|
and I want to catch this: |TrxId=475665|
after TrxId= it could be any numbers and any amount of them, so regex should catch as well:
|TrxId=111333| and |TrxId=0000011112222| and |TrxId=123|
TrxId=(\d+)
That would give a group (1) with the TrxId.
PS: Use global modifier.
The regex should look somewhat like this:
TrxId=[0-9]+
It will match TrxId= followed by at least one digit.
An example solution in Python:
In [107]: data = 'MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|'
In [108]: m = re.search(r'\|TrxId=(\d+)\|', data)
In [109]: m.group(0)
Out[109]: '|TrxId=475665|'
In [110]: m.group(1)
Out[110]: '475665'
/MsgNam\=.*?\|(TrxId\=\d+)\|.*/
for example in perl:
$a = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100111|";
$a =~ /MsgNam\=.*?\|(TrxId\=\d+)\|.*/;
print $1;
will print TrxId=475665
You know what your delimiters look like, so you don't need a regex, you need to split. Here's an implementation in Perl.
use strict;
use warnings;
my $input = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|";
my @first_array = split(/\|/,$input); #splitting $input on "|"
#Note: split drops trailing empty fields, so the final "|" of $input
#does not add an extra element to this array.
#The defined check below is only a safeguard.
@first_array = grep{defined}@first_array;
#Also filter out elements that do not have an equals sign appearing.
@first_array = grep{/=/}@first_array;
#Now, put these elements into an associative array:
my %assoc_array;
foreach(@first_array)
{
if(/^([^=]+)=(.+)$/)
{
$assoc_array{$1} = $2;
}
else
{
#Something weird may be happening...
#we may have an element starting with "=" for example.
#Do what you want: throw a warning, die, silently move on, etc.
}
}
if(exists $assoc_array{TrxId})
{
print "|TrxId=" . $assoc_array{TrxId} . "|\n";
}
else
{
print "Sorry, TrxId not found!\n";
}
The code above yields the expected output:
|TrxId=475665|
Now, obviously this is more complex than some of the other answers, but it's also a bit more robust in that it allows you to search for more keys as well.
This approach does have a potential issue if your keys appear more than once. In that case, it's easy enough to modify the code above to collect an array reference of values for each key.
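As a rough sketch of that last modification, here is the split-based approach in Python, collecting a list of values per key so that repeated keys are not overwritten. The function name parse_fields is made up for illustration. Unlike the Perl code above, this version keeps keys with empty values (such as WepNr=) instead of treating them as odd input; that is a choice made for the sketch, not something the original specifies.

```python
from collections import defaultdict

def parse_fields(message):
    """Split a 'key=value|key=value|' message into {key: [values...]}."""
    fields = defaultdict(list)
    for part in message.split("|"):
        key, sep, value = part.partition("=")
        if not sep or not key:
            # Skip empty pieces and anything without a 'key=' prefix.
            continue
        fields[key].append(value)
    return dict(fields)
```

Looking up parse_fields(data).get("TrxId") then gives a list with every TrxId value found, or None if the key never appears.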

What regular expression can I use to match a cell reference?

For one of my projects I want to use a regular expression to match a string like "REF:Sheet1!$C$6".
So far I have done
public static bool IsCellReference()
{
string CELL_REFERENCE_PATTERN = @"REF:Sheet[1-9]!$[A-Z]$[0-9]";
Regex r = new Regex(CELL_REFERENCE_PATTERN);
Match m = r.Match("REF:Sheet1!$C$6");
if (m.Success) return true;
else return false;
}
but it is not working. It is returning false.
Where am I wrong?
You need to escape your $ signs.
REF:Sheet[1-9]!\$[A-Z]\$[0-9]
See Regular Expression Language Elements for more information
Also, this page is good for testing your regexes: A better .NET Regular Expression Tester
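The effect of escaping the $ signs can be checked quickly in Python, whose re module treats an unescaped $ as an end-of-string anchor much as .NET does. This is only a sketch: the name is_cell_reference is made up, and the use of fullmatch (which requires the whole string to match) is an assumption that goes beyond the unanchored pattern in the answer.

```python
import re

# Escaped $ signs match literal dollar characters.
CELL_REFERENCE = re.compile(r"REF:Sheet[1-9]!\$[A-Z]\$[0-9]")

def is_cell_reference(text):
    # fullmatch requires the entire string to be a cell reference.
    return CELL_REFERENCE.fullmatch(text) is not None
```

With the unescaped pattern, $ asserts the end of the input, so it can never be followed by [A-Z], and the match fails just as in the question.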