Regex replace that preserves capitalization

Is it possible to replace "this" with "that" and "This" with "That" in one regex?
Perl extensions and other special tricks are allowed.
Edit: It's too easy if the replacement starts with the same letter. How about server to node and Server to Node?
Assuming I only care about the first letter and using Perl, I came up with:
s/(s)erver/($1 eq uc $1 ? "N" : "n") . "ode"/ie

Yeah, you can do it like:
var z = "This this";
z.replace(/([Tt])his/g, '$1hat');
The result would be: That that

Related

Keep case with regex find and replace

My question is pretty straightforward, using only a regular expression find and replace, is it possible to keep the case of the original words.
So if I have the string: "Pretty pretty is so pretty"
How can I turn it into: "Lovely lovely is so lovely"
All I have so far is find /(P|p)retty/g and replace with $1ovely, but I don't know how to replace capital P with L and lowercase p with l.
I am not interested in accomplishing this in any particular language, I want to know if it is possible to do with pure regex.

Regex alone can't map a captured uppercase or lowercase letter to a different letter of the same case. It is possible, though, by combining a regex with the language's built-in functions.
In PHP, I would do it like this:
$str = "Pretty pretty is so pretty";
echo preg_replace_callback('~([pP])retty~', function ($m)
{
    // Choose the replacement whose case matches the captured first letter.
    if ($m[1] == "P") {
        return "Lovely";
    } else {
        return "lovely";
    }
}, $str);
Output:
Lovely lovely is so lovely
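The same callback idea can be sketched in Python. This is only an illustration, not part of the original answer: the helper name replace_keep_case is made up here, and it looks only at the first letter of each match, as the PHP version does.

```python
import re

def replace_keep_case(text, old, new):
    """Replace `old` with `new`, copying the case of each match's first letter."""
    def swap(match):
        found = match.group(0)
        if found[0].isupper():
            return new[0].upper() + new[1:]
        return new[0].lower() + new[1:]
    # re.escape keeps `old` literal; IGNORECASE lets one pattern match both cases.
    return re.sub(re.escape(old), swap, text, flags=re.IGNORECASE)

print(replace_keep_case("Pretty pretty is so pretty", "pretty", "lovely"))
print(replace_keep_case("Server and server", "server", "node"))
```

Because the replacement is computed in a function, it also covers the server/Server to node/Node case, where the first letters differ.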

Trying to match a string in the format of domain\username using Lua and then mask the pattern with '#'

I am trying to match a string in the format of domain\username using Lua and then mask the pattern with #.
So if the input is sample.com\admin; the output should be ######.###\#####;. The string can end with either a ;, ,, . or whitespace.
More examples:
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
I tried ([a-zA-Z][a-zA-Z0-9.-]+)\.?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b, which works perfectly on http://regexr.com/, but it doesn't work in the Lua demo. What is wrong with the pattern?
Below is the code I used to check in Lua:
test_text="I have the 123 name as domain.com\admin as 172.19.202.52 the credentials"
pattern="([a-zA-Z][a-zA-Z0-9.-]+).?([a-zA-Z0-9]+)\\([a-zA-Z0-9 ]+)\b"
res=string.match(test_text,pattern)
print (res)
It is printing nil.

A Lua pattern isn't a regular expression; that's why your regex doesn't work.
\b isn't supported; you can use the more powerful %f frontier pattern if needed.
In the string test_text, \ isn't escaped, so \a is interpreted as an escape sequence (the bell character).
. is a magic character in patterns, so it needs to be escaped as %. to match a literal dot.
This code isn't exactly equivalent to your pattern; you can tweak it if needed:
test_text = "I have the 123 name as domain.com\\admin as 172.19.202.52 the credentials"
pattern = "(%a%w+)%.?(%w+)\\([%w]+)"
print(string.match(test_text,pattern))
Output: domain com admin
After fixing the pattern, the task of replacing them with # is easy, you might need string.sub or string.gsub.

As already mentioned, pure Lua does not have regex, only patterns.
Your regex however can be matched with the following code and pattern:
--[[
sample.net\user1,hello -> ######.###\#####,hello
test.org\testuser. Next -> ####.###\########. Next
]]
s1 = [[sample.net\user1,hello]]
s2 = [[test.org\testuser. Next]]
s3 = [[abc.domain.org\user1]]
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
return ('#'):rep(#a)..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
print(s1,'=>',mask_domain(s1))
print(s2,'=>',mask_domain(s2))
print(s3,'=>',mask_domain(s3))
The last example does not end with ;, ,, . or whitespace. If it must end with one of these, simply remove the final ? from the pattern.
UPDATE: If in the domain (e.g. abc.domain.org) you need to also reveal any dots before that last one you can replace the above function with this one:
function mask_domain(s)
s = s:gsub('(%a[%a%d%.%-]-)%.?([%a%d]+)\\([%a%d]+)([%;%,%.%s]?)',
function(a,b,c,d)
a = a:gsub('[^%.]','#')
return a..'.'..('#'):rep(#b)..'\\'..('#'):rep(#c)..d
end)
return s
end
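For comparison, here is a rough Python sketch of the same masking with a real regex engine. The pattern is an assumption about what counts as a domain (starts with a letter, contains at least one dot) and as a user name (alphanumeric only). The function reuses the name mask_domain just for readability.

```python
import re

# Assumed shape: domain (letter first, dot-separated labels), a backslash, then the user name.
DOMAIN_USER = re.compile(r"([A-Za-z][A-Za-z0-9-]*(?:\.[A-Za-z0-9-]+)+)\\([A-Za-z0-9]+)")

def mask_domain(text):
    def mask(match):
        domain, user = match.groups()
        # Keep the dots in the domain visible, mask every other character.
        return re.sub(r"[^.]", "#", domain) + "\\" + "#" * len(user)
    return DOMAIN_USER.sub(mask, text)

print(mask_domain(r"sample.net\user1,hello"))
print(mask_domain(r"test.org\testuser. Next"))
print(mask_domain(r"abc.domain.org\user1"))
```

Whatever follows the user name (;, ,, . or whitespace) is simply left in place, since it is not part of the match.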

TCL regsub isn't working when the expression has [0]

I tried the following code:
set exp {elem[0]}
set temp {elem[0]}
regsub $temp $exp "1" exp
if {$exp} {
puts "######### 111111111111111 ################"
} else {
puts "########### 0000000000000000 ############"
}
Of course, this is the easiest regsub possible (the words match completely), and still it doesn't work: no substitution is done. If I write elem instead of elem[0], everything works fine.
I tried using {elem[0]}, elem[0], "elem[0]", etc., and none of them worked.
Any clue, anyone?

This is the easiest regsub possible (the words match completely)
Actually, no, the words don't match. You see, in a regular expression, square brackets have meaning. Your expression {elem[0]} actually means:
match the sequence of letters 'e'
followed by 'l'
followed by 'e'
followed by 'm'
followed by '0' (the character for the number zero)
So it would match the string "elem0" not "elem[0]" since the character after 'm' is not '0'.
What you want is {elem\[0\]} <-- backslash escapes special meaning.
Read the manual for tcl's regular expression syntax, re_syntax, for more info on how regular expressions work in tcl.
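The bracket behaviour is not specific to Tcl. As a small illustration in Python (whose re module treats [0] as a character class in the same way), the unescaped pattern matches "elem0" and only the escaped one matches "elem[0]":

```python
import re

unescaped = r"elem[0]"    # [0] is a character class matching the single character 0
escaped = r"elem\[0\]"    # the backslashes make the brackets literal

print(re.search(unescaped, "elem[0]") is not None)  # False
print(re.search(unescaped, "elem0") is not None)    # True
print(re.search(escaped, "elem[0]") is not None)    # True
```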

In addition to @slebetman's answer, if you want any special characters in your regular expression to be treated like plain text, there is special syntax for that:
set word {abd[0]}
set regex $word
regexp $regex $word ;# => returns 0, did not match
regexp "(?q)$regex" $word ;# => returns 1, matched
That (?q) marker must be the first part of the RE.
Also, if you're really just comparing literal strings, consider the simpler if {$str1 eq $str2} ... or the glob-style matching of [string match]

How can I convert a string into a regular expression that matches itself in Perl?

How can I convert a string to a regular expression that matches itself in Perl?
I have a set of strings like these:
Enter your selection:
Enter Code (Navigate, Abandon, Copy, Exit, ?):
and I want to convert them to regular expressions so I can match something else against them. In most cases the string is the same as the regular expression, but not in the second example above because the ( and ? have meaning in regular expressions. So that second string needs to become an expression like:
Enter Code \(Navigate, Abandon, Copy, Exit, \?\):
I don't need the matching to be too strict, so something like this would be fine:
Enter Code .Navigate, Abandon, Copy, Exit, ..:
My current thinking is that I could use something like:
s/[\?\(\)]/./g;
but I don't really know what characters will be in the list of strings and if I miss a special char then I might never notice the program is not behaving as expected. And I feel that there should exist a general solution.
Thanks.

As Brad Gilbert commented use quotemeta:
my $regex = qr/^\Q$string\E$/;
or
my $quoted = quotemeta $string;
my $regex2 = qr/^$quoted$/;
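For readers outside Perl, the same idea can be sketched in Python, where re.escape plays roughly the role of quotemeta. The prompt string is taken from the question; the variable names are just for illustration.

```python
import re

prompt = "Enter Code (Navigate, Abandon, Copy, Exit, ?):"

# Used as-is, the parentheses form a group and " ?" means "optional space",
# so the raw string does not match itself.
raw_matches = re.fullmatch(prompt, prompt) is not None

# After escaping, every metacharacter is taken literally.
quoted = re.compile(re.escape(prompt))
quoted_matches = quoted.fullmatch(prompt) is not None

print(raw_matches, quoted_matches)  # False True
```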

There is a function for that: quotemeta.
quotemeta EXPR
Returns the value of EXPR with all non-"word" characters backslashed. (That is, all characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.) This is the internal function implementing the \Q escape in double-quoted strings.
If EXPR is omitted, uses $_.

From http://www.regular-expressions.info/characters.html :
there are 11 characters with special meanings: the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket )
In Perl (and PHP) there is a special function quotemeta that will escape all these for you.

To put Brad Gilbert's suggestion into an answer instead of a comment: you can use the quotemeta function. All credit to him.

Why use a regular expression at all? Since you aren't doing any capturing and it seems you will not be going to allow for any variations, why not simply use the index builtin?
$s1 = 'hello, (world)?!';
$s2 = 'he said "hello, (world)?!" and nothing else.';
if ( -1 != index $s2, $s1 ) {
print "we've got a match\n";
}
else {
print "sorry, no match.\n";
}

Exclusive Or in Regular Expression

Looking for a bit of regex help.
I'd like to design an expression that matches a string with "foo" OR "bar", but not both "foo" AND "bar"
If I do something like...
/((foo)|(bar))/
It'll match "foobar". Not what I'm looking for. So, how can I make regex match only when one term or the other is present?
Thanks!

This is what I use:
/^(foo|bar){1}$/
See: http://www.regular-expressions.info/quickstart.html under repetition

If your regex language supports it, use negative lookaround:
(?<!foo|bar)(foo|bar)(?!foo|bar)
This will match "foo" or "bar" that is not immediately preceded or followed by "foo" or "bar", which I think is what you wanted.
It's not clear from your question or examples if the string you're trying to match can contain other tokens: "foocuzbar". If so, this pattern won't work.
Here are the results of your test cases ("true" means the pattern was found in the input):
foo: true
bar: true
foofoo: false
barfoo: false
foobarfoo: false
barbar: false
barfoofoo: false
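These results can be reproduced with a short Python check. This is a sketch, and it relies on Python's re module accepting the lookbehind because both alternatives have the same length; other engines may differ.

```python
import re

ISOLATED = re.compile(r"(?<!foo|bar)(foo|bar)(?!foo|bar)")

cases = ["foo", "bar", "foofoo", "barfoo", "foobarfoo", "barbar", "barfoofoo"]
results = {text: bool(ISOLATED.search(text)) for text in cases}

for text, found in results.items():
    print(f"{text}: {str(found).lower()}")
```

Note that an input such as "foocuzbar" is still reported as found, which is the limitation mentioned above.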

You can do this with a single regex but I suggest for the sake of readability you do something like...
(/foo/ and not /bar/) || (/bar/ and not /foo/)

This will take 'foo' and 'bar' but not 'foobar' and not 'blafoo' and not 'blabar':
/^(foo|bar)$/
^ = mark start of string (or line)
$ = mark end of string (or line)
This will take 'foo' and 'bar' and 'foo bar' and 'bar-foo' but not 'foobar' and not 'blafoo' and not 'blabar':
/\b(foo|bar)\b/
\b = mark word boundary

You haven't specified behaviour regarding content other than "foo" and "bar" or repetitions of one in the absence of the other. e.g., Should "food" or "barbarian" match?
Assuming that you want to match strings which contain only one instance of either "foo" or "bar", but not both and not multiple instances of the same one, without regard for anything else in the string (i.e., "food" matches and "barbarian" does not match), then you could use a regex which returns the number of matches found and only consider it successful if exactly one match is found. e.g., in Perl:
@matches = ($value =~ /(foo|bar)/g); # @matches now holds all foos or bars present
if (scalar @matches == 1) { # exactly one match found
...
}
If multiple repetitions of that same target are allowed (i.e., "barbarian" matches), then this same general approach could be used by then walking the list of matches to see whether the matches are all repeats of the same text or if the other option is also present.
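The counting approach translates directly to other languages. A minimal Python sketch (the function name exactly_one is made up for this example):

```python
import re

def exactly_one(text):
    """True when 'foo' or 'bar' occurs exactly once in total."""
    return len(re.findall(r"foo|bar", text)) == 1

for sample in ["food", "barbarian", "foobar", "nothing here"]:
    print(sample, exactly_one(sample))
```

As described above, "food" passes (one match) while "barbarian" does not (two matches of "bar").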

You might want to consider the ? conditional test.
(?(?=regex)then|else)
Regular Expression Conditionals

If you want a true exclusive or, I'd just do that in code instead of in the regex. In Perl:
/foo/ xor /bar/
But your comment:
Matches: "foo", "bar" nonmatches:
"foofoo" "barfoo" "foobarfoo" "barbar"
"barfoofoo"
indicates that you're not really looking for exclusive or. You actually mean
"Does /foo|bar/ match exactly once?"
my $matches = 0;
while (/foo|bar/g) {
last if ++$matches > 1;
}
my $ok = ($matches == 1);

I know this is a late entry, but just to help others who may be looking:
(\b(?:(?:(?!foo)bar)|(?:(?!bar)foo))\b)

I'd use something like this. It just checks for space around the words, but you could use the \b or \B to check for a border if you use \w. This would match " foo " or " bar ", so obviously you'd have to replace the whitespace as well, just in case. (Assuming you're replacing anything.)
/\s((foo)|(bar))\s/

I don't think this can be done with a single regular expression. And boundaries may or may not work depending on what you're matching against.
I would match against each regex separately, and do an XOR on the results.
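import re
text = "some foo here"  # example value; the string being tested
has_foo = re.search("foo", text) is not None
has_bar = re.search("bar", text) is not None
if has_foo ^ has_bar:
    # do something...
    pass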

I tried with Regex Coach against:
x foo y
x bar y
x foobar y
If I check the g option, indeed it matches all three words, because it searches again after each match.
If you don't want this behavior, you can anchor the expression, for example matching only on word boundaries:
\b(foo|bar)\b
Giving more context on the problem (what the data looks like) might give better answers.

\b(foo)\b|\b(bar)\b
And use only the first capture group.

Using the word boundaries, you can get the single word...
me@home ~
$ echo "Where is my bar of soap?" | egrep "\bfoo\b|\bbar\b"
Where is my bar of soap?
me@home ~
$ echo "What the foo happened here?" | egrep "\bfoo\b|\bbar\b"
What the foo happened here?
me@home ~
$ echo "Boy, that sure is foobar\!" | egrep "\bfoo\b|\bbar\b"
