Use dict to replace word in string - regex

I'm trying to replace one part of my string using a dict.
s = 'I am a string replaceme'
d = {
'replaceme': 'replace me'
}
I've tried lots of variations, like
s = s.replace(d, d[other])
That raises NameError: name 'other' is not defined. If I do
s = s.replace('replaceme', 'replace me')
it works. How can I achieve my goal?

You have to replace each key of your dict with its associated value. What is the variable other supposed to hold? Is it a valid key of your substitutions dict?
You can try this solution:
for k in d:
    s = s.replace(k, d[k])
Each key in the dictionary is the text to be replaced, and d[k] is its replacement.
If the dictionary is big, this loop will perform poorly, because it scans the whole string once per key.
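One way to avoid a pass per key is to compile all keys into a single alternation and let re.sub look each match up in the dict. This is a minimal sketch rather than a benchmarked solution; the helper name replace_all and the second dictionary entry are made up for illustration. Keys are escaped with re.escape and sorted longest first, so a longer key wins when one key is a prefix of another.

```python
import re

def replace_all(text, mapping):
    """Replace every key of mapping found in text, in a single pass."""
    if not mapping:
        return text
    # Longest keys first, so 'replaceme' is tried before a shorter 'replace'.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The callback receives the match object; group(0) is the matched key.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

s = 'I am a string replaceme'
d = {'replaceme': 'replace me', 'string': 'phrase,'}
print(replace_all(s, d))  # I am a phrase, replace me
```

Unlike the loop, the replacement text is never rescanned here, so a value that happens to contain another key is left alone.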

You could split the string and rejoin:
s = 'I am a string replaceme'
d = {
'replaceme': 'replace me'
}
print(" ".join([w if w not in d else d[w] for w in s.split(" ")]))
That won't match substrings, whereas str.replace will. If you are trying to match substrings, iterate over d.items() and replace each key with its value:
d = {
'replaceme': 'replace me'
}
for k, v in d.items():
    s = s.replace(k, v)
print(s)
I am a string replace me
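The difference between the two variants shows up as soon as a key is attached to punctuation. A small sketch with a made-up input string:

```python
d = {'replaceme': 'replace me'}
s = 'please replaceme, now'

# Word-based: 'replaceme,' (with the comma) is not a key, so nothing changes.
word_based = " ".join(d.get(w, w) for w in s.split(" "))

# Substring-based: str.replace finds the key inside 'replaceme,'.
substring_based = s
for k, v in d.items():
    substring_based = substring_based.replace(k, v)

print(word_based)       # please replaceme, now
print(substring_based)  # please replace me, now
```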

Here is a different approach, using reduce:
from functools import reduce  # reduce is not a builtin in Python 3
s = 'I am a string replaceme'
d = {'replaceme': 'replace me', 'string': 'phrase,'}
s = reduce(lambda text, old_new_pair: text.replace(*old_new_pair), d.items(), s)
# s is now 'I am a phrase, replace me'

Related

Regex that extract string of length that is encoded in string

I have the following string to parse:
X4IitemX6Nabc123
that is structured as follows:
X... marker for 'field identifier'
4... length of item (name), will change according to length of item name
I... identifier for item name, must not be extracted, fixed
item... value that should be extracted as "name"
X... marker for 'field identifier'
6... length of item number, will change according to length of item number
N... identifier for item number, must not be extracted, fixed
abc123... value that should be extracted as "num"
Only these two values will be contained in the string, and the sequence is always the same (name, number).
What I have so far is
\AX(?<namelen>\d+)I(?<name>.+)X(?<numlen>\d+)N(?<num>.+)$
But that does not take into account that the length of the name is contained in the string itself. Somehow the .+ in the name group should be replaced by .{4}. I tried {$1} and {${namelen}}, but neither yields the result I expect (on rubular.com or regex101.com).
Any ideas or further references?
What you ask for is only possible in languages that allow code insertions in the regex pattern.
Here is a Perl example:
#!/usr/bin/perl
use warnings;
use strict;
my $text = "X4IitemX6Nabc123";
if ($text =~ m/^X(?<namelen>[0-9]+)I(?<name>(??{".{".$^N."}"}))X(?<numlen>[0-9]+)N(?<num>.+)$/) {
    print $text . ": PASS!\n";
} else {
    print $text . ": FAIL!\n";
}
# -> X4IitemX6Nabc123: PASS!
In other languages, use a two-step approach:
1. Extract the number after X.
2. Build a regex dynamically using the result of the first step.
See a JavaScript example:
const text = "X4IitemX6Nabc123";
const rx1 = /^X(\d+)/;
const m1 = rx1.exec(text);
if (m1) {
  // Second step: build the full pattern with the extracted length
  const rx2 = new RegExp(`^X(?<namelen>\\d+)I(?<name>.{${m1[1]}})X(?<numlen>\\d+)N(?<num>.+)$`);
  if (rx2.test(text)) {
    console.log(text, '-> MATCH!');
  } else {
    console.log(text, '-> FAIL!');
  }
} else {
  console.log(text, '-> FAIL!');
}
See the Python demo:
import re
text = "X4IitemX6Nabc123"
rx1 = r'^X(\d+)'
m1 = re.search(rx1, text)
if m1:
    rx2 = fr'^X(?P<namelen>\d+)I(?P<name>.{{{m1.group(1)}}})X(?P<numlen>\d+)N(?P<num>.+)$'
    if re.search(rx2, text):
        print(text, '-> MATCH!')
    else:
        print(text, '-> FAIL!')
else:
    print(text, '-> FAIL!')
# => X4IitemX6Nabc123 -> MATCH!
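The two-step demos above only confirm that the name has the announced length. If both lengths should be honored and the values returned, one option is to skip the dynamic regex and walk the string field by field. This is a sketch, assuming the format is exactly as described in the question (X, a decimal length, a one-letter identifier I or N, then that many characters); parse_fields is a hypothetical helper name.

```python
import re

# Field header: the marker X, the length, then the identifier letter.
HEADER = re.compile(r'X(\d+)([IN])')

def parse_fields(text):
    """Return a dict mapping each identifier letter to its value."""
    fields = {}
    pos = 0
    while pos < len(text):
        m = HEADER.match(text, pos)
        if m is None:
            raise ValueError(f'no field header at offset {pos}')
        length = int(m.group(1))
        start = m.end()
        value = text[start:start + length]
        if len(value) != length:
            raise ValueError(f'field {m.group(2)} is shorter than {length}')
        fields[m.group(2)] = value
        pos = start + length
    return fields

print(parse_fields("X4IitemX6Nabc123"))  # {'I': 'item', 'N': 'abc123'}
```

Because each value is sliced by its declared length, a value may itself contain X or digits without confusing the parser.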

Return first instance of capturing group if found, otherwise empty string

My inputs are strings that may or may not contain a pattern:
p = r'(\d)'
s = 'abcd3f'
I want to return the capturing group for the first match of this pattern if it is found, and an empty string otherwise.
result = re.search(p, s)[1]
This returns the captured group of the first match. But if s = 'abcdef', then search returns None and the indexing throws an exception. Instead, I'd like to get an empty string. I can do:
g = re.search(p, s)
result = ''
if g is not None: result = g[1]
Or even:
try:
    result = re.search(p, s)[1]
except TypeError:
    result = ''
But both of these seem pretty complicated for something so simple. Is there a more elegant way of accomplishing what I want, preferably in one line?
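You could test the match object with is None to accomplish that. For example:
if m is None: result = ''
Example for Python:
import re
m = re.search(r'(\d)', 'ab1cdf')
# search returns None when nothing matches, so fall back to an empty string
result = '' if m is None else m.group(1)
print(result)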
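For the one-line form the question asks about, one possibility is to combine re.finditer with next and a default value. A sketch; first_group is a hypothetical helper name, and the pattern is assumed to contain at least one capturing group.

```python
import re

def first_group(pattern, text, default=''):
    # finditer is lazy, so only the first match is actually computed;
    # next() falls back to the default when there is no match at all.
    return next((m.group(1) for m in re.finditer(pattern, text)), default)

print(first_group(r'(\d)', 'abcd3f'))        # 3
print(repr(first_group(r'(\d)', 'abcdef')))  # ''
```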

Selectively uppercasing a string

I have a string with some XML tags in it, like:
"hello <b>world</b> and <i>everyone</i>"
Is there a good Scala/functional way of uppercasing the words, but not the tags, so that it looks like:
"HELLO <b>WORLD</b> AND <i>EVERYONE</i>"
We can use dustmouse's regex with Regex.replaceAllIn to replace all the text outside the XML tags. Regex.Match.matched gives us the matched text, which can then be uppercased with toUpperCase.
val xmlText = """(?<!<|<\/)\b\w+(?!>)""".r
val string = "hello <b>world</b> and <i>everyone</i>"
xmlText.replaceAllIn(string, _.matched.toUpperCase)
// String = HELLO <b>WORLD</b> AND <i>EVERYONE</i>
val string2 = "<h1>>hello</h1> <span>world</span> and <span><i>everyone</i>"
xmlText.replaceAllIn(string2, _.matched.toUpperCase)
// String = <h1>>HELLO</h1> <span>WORLD</span> AND <span><i>EVERYONE</i>
Using dustmouse's updated regex:
val xmlText = """(?:<[^<>]+>\s*)(\w+)""".r
val string3 = """<h1>>hello</h1> <span id="test">world</span>"""
xmlText.replaceAllIn(string3, m =>
  m.group(0).dropRight(m.group(1).length) + m.group(1).toUpperCase)
// String = <h1>>hello</h1> <span id="test">WORLD</span>
Okay, how about this. It just prints the results, and takes into consideration some of the scenarios brought up by others. Not sure how to capitalize the output without mercilessly poaching from Peter's answer:
val string = "<h1 id=\"test\">hello</h1> <span>world</span> and <span><i>everyone</i></span>"
val pattern = """(?:<[^<>]+>\s*)(\w+)""".r
pattern.findAllIn(string).matchData foreach {
  m => println(m.group(1))
}
The main thing here is that it is extracting the correct capture group.
Working example: http://ideone.com/2qlwoP
Credit also goes to the answer to "Scala capture group using regex" for showing how to get capture groups in Scala.
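For comparison, the same effect can be had without lookarounds by splitting the text on the tags and uppercasing only the pieces in between. The sketch below is in Python, not Scala, and uses a different technique from the regexes above; upper_outside_tags is a hypothetical name, and a "tag" is assumed to be anything of the form <...> with no nested angle brackets.

```python
import re

# The capturing group makes re.split keep the tags in the result list.
TAG = re.compile(r'(<[^<>]+>)')

def upper_outside_tags(text):
    parts = TAG.split(text)
    # Tags land at the odd indices; everything else is plain text.
    return "".join(p if i % 2 else p.upper() for i, p in enumerate(parts))

print(upper_outside_tags("hello <b>world</b> and <i>everyone</i>"))
# HELLO <b>WORLD</b> AND <i>EVERYONE</i>
```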

Regex as key in Dictionary in VB.NET

Is there a way to use Regex as a key in a Dictionary? Something like Dictionary(Of Regex, String)?
I'm trying to find a Regex in a list (let's say there is no dictionary at first) by a string that it matches.
I can do it by manually iterating through the list of Regex objects. I'm just looking for an easier way to do that, such as TryGetValue on a Dictionary.
When you use Regex as the type for the key in a Dictionary, it will work, but it compares the key by object instance, not by the expression string. In other words, if you create two separate Regex objects, using the same expression for both, and then add them to the dictionary, they will be treated as two different keys (because they are two different objects).
Dim d As New Dictionary(Of Regex, String)()
Dim r As New Regex(".*")
Dim r2 As New Regex(".*")
d(r) = "1"
d(r2) = "2"
d(r) = "overwrite 1"
Console.WriteLine(d.Count) ' Outputs "2"
If you want to use the expression as the key, rather than the Regex object, then you need to create your dictionary with a key type of String, for instance:
Dim d As New Dictionary(Of String, String)()
d(".*") = "1"
d(".*") = "2"
d(".*") = "3"
Console.WriteLine(d.Count) ' Outputs "1"
Then, when you are using the expression string as the key, you can use TryGetValue, like you described:
Dim d As New Dictionary(Of String, String)()
d(".*") = "1"
Dim value As String = Nothing
' Outputs "1"
If d.TryGetValue(".*", value) Then
    Console.WriteLine(value)
Else
    Console.WriteLine("Not found")
End If
' Outputs "Not found"
If d.TryGetValue(".+", value) Then
    Console.WriteLine(value)
Else
    Console.WriteLine("Not found")
End If
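Note that a dictionary lookup compares keys for equality; it never asks a pattern whether it matches the input. Finding the entry whose regex matches a given string therefore still means iterating, although the loop can be wrapped so that it reads like TryGetValue. A sketch in Python (not VB.NET), with made-up patterns and values; lookup is a hypothetical helper name.

```python
import re

# Ordered (pattern, value) pairs; the first pattern that matches wins.
rules = [
    (re.compile(r'\d+'), 'number'),
    (re.compile(r'[a-z]+'), 'word'),
]

def lookup(text, default=None):
    for pattern, value in rules:
        # fullmatch requires the whole input to match the pattern.
        if pattern.fullmatch(text):
            return value
    return default

print(lookup('123'))  # number
print(lookup('abc'))  # word
print(lookup('?'))    # None
```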

Using regex replace, how to include characters in the match pattern but exclude them in the value to replace

I use the following method to replace variables in a testscript (simplified):
Dim myDict
Set myDict = createobject("scripting.dictionary")
mydict.add "subject", "testobject"
mydict.add "name", "callsign"
mydict.add "testobject callsign here", "Chell"
mydict.add "subject hometown here", "City 17"
Class cls_simpleReplacer
    Private re_
    Public Sub Class_Initialize
        Set re_ = new regexp
        re_.global = true
        re_.pattern = "\{[a-zA-Z][\w ]*\}"
    End Sub
    Public Function Parse(myString)
        dim tStr
        tStr = re_.replace(myString, getref("varreplace"))
        ' see if there was a change
        If tStr <> myString Then
            ' see if we can replace more
            tStr = Parse(tStr)
        End If
        Parse = tStr
    End Function
End Class
Private Function varreplace(a, b, c)
    varreplace = mydict.Item(mid(a, 2, len(a) - 2))
End Function
MsgBox ((new cls_simpleReplacer).Parse("Unbelievable. You, {{subject} {name} here}, must be the pride of {subject hometown here}!"))
' This will output `Unbelievable. You, Chell, must be the pride of City 17!`.
In varreplace, I have to strip the braces {}, which makes it ugly (if I change the pattern to use double braces, for example, I also have to change the varreplace function).
Is there a way to put the braces in the pattern without including them in the string passed to the replacement function? I want a generic varreplace function that doesn't need to know the format and length of the variable delimiters.
To get rid of the {}, change the pattern to
re_.pattern = "\{([a-zA-Z][\w ]*)\}"
and the replace function to:
Private Function varreplace(sMatch, sGroup1, nPos, sSrc)
    varreplace = mydict(sGroup1)
End Function
Capturing the sequence between the braces with a (plain) capture group () and adding a parameter to the function makes the dictionary key directly accessible.
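The same idea carries over to other regex engines: put the delimiters in the pattern, capture only the name, and let the callback read the group. A sketch in Python using the data from the question; the names VARIABLE, variables and expand are made up here, and an unknown variable name would raise KeyError.

```python
import re

# The braces are part of the pattern, but only the name is captured.
VARIABLE = re.compile(r"\{([a-zA-Z][\w ]*)\}")

variables = {
    "subject": "testobject",
    "name": "callsign",
    "testobject callsign here": "Chell",
    "subject hometown here": "City 17",
}

def expand(text):
    # group(1) is the variable name without its delimiters.
    result = VARIABLE.sub(lambda m: variables[m.group(1)], text)
    # Repeat until nothing changes, so nested variables get resolved too.
    return expand(result) if result != text else result

print(expand("Unbelievable. You, {{subject} {name} here}, "
             "must be the pride of {subject hometown here}!"))
# Unbelievable. You, Chell, must be the pride of City 17!
```

Switching to double braces would then only require changing the pattern; the callback stays the same.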