Include multiple patterns in regex word break - regex

I have the following program, which uses a regex to search for a pattern and replace it with a keyword.
The sample below replaces names like "Incorp", "Inc.", and "Inc corp" with "Inc".
package main
import (
"fmt"
"regexp"
)
func replaceWholeWord(input string, patterns map[string]string) string {
for searchPattern, replacePattern := range patterns {
re, _ := regexp.Compile(`(?i)(^|\s)` + regexp.QuoteMeta(searchPattern) + `(\s|$)`)
input = re.ReplaceAllString(input, "${1}"+replacePattern+"${2}")
}
return input
}
func main() {
patterns := map[string]string{"Inc.": "Inc", "Incorp.": "Inc", "Incorporation": "Inc", ", Incorpa.": "Inc"}
fmt.Println(replaceWholeWord("ABC Inc.", patterns))
fmt.Println(replaceWholeWord("ABC Incorp.", patterns))
fmt.Println(replaceWholeWord("ABC InCorp.", patterns))
fmt.Println(replaceWholeWord("ABC InCorporation", patterns))
fmt.Println(replaceWholeWord("ABC , InCorpa.", patterns))
}
As you can see, this becomes performance-intensive as the number of patterns increases, because a regex is compiled for every pattern on every call. I want to build the regular expression only once and then do the search and replace. I am having a tough time combining those multiple
patterns into a single regex without breaking the functionality.
Edit:
I modified my program to build the regexes only if the word contains the pattern; this way I have avoided the performance hit.
Please feel free to close the question.

I am not a Go developer, but a single regular expression pattern for what you have shown would be:
(In(c|C)(\.|orp(\.|a\.|oration)))$
UPDATE: Found the Go way.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?i)^(.*)(?:Inc(?:\.|orp(?:\.|a|oration)??\.))(.*)$`)
fmt.Println(re.ReplaceAllString("ABC Inc.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC Incorp.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC InCorporation.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC InCorpa.", "${1}Inc${2}"))
}
ABC Inc
ABC Inc
ABC Inc
ABC Inc

Why not use an "or":
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?i)^(.*)(?:Inc\.|Incorp\.|Incorporation\.|Incorpa\.)(.*)$`)
fmt.Println(re.ReplaceAllString("ABC Inc.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC Incorp.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC InCorporation.", "${1}Inc${2}"))
fmt.Println(re.ReplaceAllString("ABC InCorpa.", "${1}Inc${2}"))
}
Output:
ABC Inc
ABC Inc
ABC Inc
ABC Inc
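The alternation does not have to be written by hand. A minimal sketch, assuming the keys are plain ASCII strings: the helper below (the name `newReplacer` is hypothetical) quotes every key of the map, joins them with `|`, compiles the result once, and looks up the replacement for whichever key matched. Keys are sorted longest first so that a longer alternative is tried before a shorter one that shares its prefix.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// newReplacer compiles one alternation from all map keys, once, and
// returns a function that applies it. The helper name is hypothetical.
func newReplacer(patterns map[string]string) func(string) string {
	keys := make([]string, 0, len(patterns))
	lookup := make(map[string]string, len(patterns))
	for k, v := range patterns {
		keys = append(keys, k)
		lookup[strings.ToLower(k)] = v // lower-cased for case-insensitive lookup
	}
	// Longest keys first; ties broken alphabetically for a stable pattern.
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j]
	})
	for i, k := range keys {
		keys[i] = regexp.QuoteMeta(k)
	}
	re := regexp.MustCompile(`(?i)(^|\s)(` + strings.Join(keys, "|") + `)(\s|$)`)

	return func(input string) string {
		return re.ReplaceAllStringFunc(input, func(m string) string {
			// Re-match the matched text on its own to recover the groups.
			sub := re.FindStringSubmatch(m)
			return sub[1] + lookup[strings.ToLower(sub[2])] + sub[3]
		})
	}
}

func main() {
	replace := newReplacer(map[string]string{
		"Inc.": "Inc", "Incorp.": "Inc", "Incorporation": "Inc", ", Incorpa.": "Inc",
	})
	fmt.Println(replace("ABC Inc."))
	fmt.Println(replace("ABC InCorp."))
	fmt.Println(replace("ABC InCorporation"))
	fmt.Println(replace("ABC , InCorpa."))
}
```

As with the per-pattern version in the question, the surrounding whitespace is part of each match, so two keys separated by a single space would not both be replaced in one pass.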

If all of your search-and-replace operations are done on whole words, you can simply split your string into a slice of words and build a new string in which each word present in your map is replaced with its counterpart:
var buffer bytes.Buffer
for _, word := range words {
if val, ok := patterns[word]; ok {
word = val
}
buffer.WriteString(word)
buffer.WriteString(" ")
}
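The fragment above leaves out how the words are obtained. A minimal runnable sketch, assuming words are separated by whitespace and that exact, case-sensitive matches are acceptable (the function name `replaceWords` is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// replaceWords swaps every whitespace-separated word that appears as a
// key in patterns for its mapped value. Matching is exact and
// case-sensitive, and runs of whitespace collapse to a single space.
func replaceWords(input string, patterns map[string]string) string {
	words := strings.Fields(input)
	for i, word := range words {
		if val, ok := patterns[word]; ok {
			words[i] = val
		}
	}
	return strings.Join(words, " ")
}

func main() {
	patterns := map[string]string{"Inc.": "Inc", "Incorp.": "Inc"}
	fmt.Println(replaceWords("ABC Inc. and XYZ Incorp.", patterns))
}
```

Using strings.Join avoids the trailing space that writing " " after every word would leave. Keys that contain a space, such as ", Incorpa." in the question, cannot match with this approach.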

Related

How to select first chars with a custom word boundary?

I have test cases with a series of words like this:
{
input: "Halley's Comet",
expected: "HC",
},
{
input: "First In, First Out",
expected: "FIFO",
},
{
input: "The Road _Not_ Taken",
expected: "TRNT",
},
With a single regex, I want to match the first letter of each of these words, without matching the "_" character as a first letter, and treating a single quote as part of the word.
Currently, I have this regex, which works with PCRE syntax but not with Go's regexp package: (?<![a-zA-Z0-9'])([a-zA-Z0-9'])
I know lookarounds aren't supported by Go, but I'm looking for a good way to do this.
I also use this call to get a slice of all matches: re.FindAllString(s, -1)
Thanks for helping.
Something that plays with character classes and word boundaries should suffice:
\b_*([a-z])[a-z]*(?:'s)?_*\b\W*
Usage:
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?i)\b_*([a-z])[a-z]*(?:'s)?_*\b\W*`)
fmt.Println(re.ReplaceAllString("O'Brian's dog", "$1"))
}
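Another way to work around the missing lookbehind is to let the regex consume the boundary character and capture only the letter after it. This is a sketch rather than a drop-in equivalent of the PCRE pattern: the capture class here leaves out the apostrophe so that a quote is never returned as a first letter, and the names `firstLetterRe` and `acronym` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// firstLetterRe consumes either the start of the string or one character
// that cannot be part of a word, then captures the letter or digit that
// follows it. The apostrophe counts as part of a word; "_" does not.
var firstLetterRe = regexp.MustCompile(`(?:^|[^a-zA-Z0-9'])([a-zA-Z0-9])`)

// acronym joins the captured first letters of every word in s.
func acronym(s string) string {
	var sb strings.Builder
	for _, m := range firstLetterRe.FindAllStringSubmatch(s, -1) {
		sb.WriteString(m[1])
	}
	return sb.String()
}

func main() {
	fmt.Println(acronym("Halley's Comet"))
	fmt.Println(acronym("First In, First Out"))
	fmt.Println(acronym("The Road _Not_ Taken"))
}
```

FindAllStringSubmatch is used instead of FindAllString because the whole match includes the consumed boundary character; only group 1 holds the letter.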
For the record, a regexp-free solution:
package main
import (
"fmt"
)
func main() {
inputs := []string{"Hallمرحباey's Comet", "First In, First Out", "The Road _Not_ Taken", "O'Brian's Dog"}
c := [][]string{}
w := [][]string{}
for _, input := range inputs {
c = append(c, firstLet(input))
w = append(w, words(input))
}
fmt.Printf("%#v\n", w)
fmt.Printf("%#v\n", c)
}
func firstLet(in string) (out []string) {
var inword bool
for _, r := range in {
if !inword {
if isChar(r) {
inword = true
out = append(out, string(r))
}
} else if r == ' ' {
inword = false
}
}
return out
}
func words(in string) (out []string) {
var inword bool
var w []rune
for _, r := range in {
if !inword {
if isChar(r) {
w = append(w, r)
inword = true
}
} else if r == ' ' {
if len(w) > 0 {
out = append(out, string(w))
w = w[:0]
}
inword = false
} else if r != '_' {
w = append(w, r)
}
}
if len(w) > 0 {
out = append(out, string(w))
}
return out
}
func isChar(r rune) bool {
return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}
Output:
[][]string{[]string{"Hallمرحباey's", "Comet"}, []string{"First", "In,", "First", "Out"}, []string{"The", "Road", "Not", "Taken"}, []string{"O'Brian's", "Dog"}}
[][]string{[]string{"H", "C"}, []string{"F", "I", "F", "O"}, []string{"T", "R", "N", "T"}, []string{"O", "D"}}

Regex Replace within Sub Match

Given a string (a line in a log file):
Date=2017-06-29 03:10:01.140 -700 PDT,clientDataRate="12.0,18.0,24.0,36.0,48.0,54.0",host=superawesomehost.foo,foo=bar
I'd like to replace the commas with a single space, but only within double quotes.
Desired result:
Date=2017-06-29 03:10:01.140 -700 PDT,clientDataRate="12.0 18.0 24.0 36.0 48.0 54.0",host=superawesomehost.foo,foo=bar
I've begun with a basic combination of regex and ReplaceAllString but am rapidly realizing I don't understand how to implement the match group (?) needed to accomplish this.
package main
import (
"fmt"
"log"
"regexp"
)
func main() {
logLine := "Date=2017-06-29 03:10:01.140 -700 PDT,clientDataRate=\"12.0,18.0,24.0,36.0,48.0,54.0\",host=superawesomehost.foo,foo=bar"
fmt.Println("logLine: ", logLine)
reg, err := regexp.Compile("[^A-Za-z0-9=\"-:]+")
if err != nil {
log.Fatal(err)
}
repairedLogLine := reg.ReplaceAllString(logLine, ",")
fmt.Println("repairedLogLine:", repairedLogLine )
}
All help is much appreciated.
You'll want to use Regexp.ReplaceAllStringFunc, which allows you to use a function result as the replacement of a substring:
package main
import (
"fmt"
"log"
"regexp"
"strings"
)
func main() {
logLine := `Date=2017-06-29 03:10:01.140 -700 PDT,clientDataRate="12.0,18.0,24.0,36.0,48.0,54.0",host=superawesomehost.foo,foo=bar`
fmt.Println("logLine: ", logLine)
reg, err := regexp.Compile(`"([^"]*)"`)
if err != nil {
log.Fatal(err)
}
repairedLogLine := reg.ReplaceAllStringFunc(logLine, func(entry string) string {
return strings.Replace(entry, ",", " ", -1)
})
fmt.Println("repairedLogLine:", repairedLogLine)
}
https://play.golang.org/p/BsZxcrrvaR

Remove quotes between letters

In Go, how can I remove quotes between two letters, like this:
import (
"testing"
)
func TestRemoveQuotes(t *testing.T) {
var a = "bus\"zipcode"
var mockResult = "bus zipcode"
a = RemoveQuotes(a)
if a != mockResult {
t.Error("Error or TestRemoveQuotes: ", a)
}
}
Function:
import (
"fmt"
"strings"
)
func RemoveQuotes(s string) string {
s = strings.Replace(s, "\"", "", -1) //here I removed all quotes. I'd like to remove only quotes between letters
fmt.Println(s)
return s
}
For example:
"bus"zipcode" = "bus zipcode"
You can use a simple \b"\b regex, which matches a double quote only when it has a word boundary on both sides, that is, when it sits directly between two word characters:
package main
import (
"fmt"
"regexp"
)
func main() {
var a = "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
func RemoveQuotes(s string) string {
re := regexp.MustCompile(`\b"\b`)
return re.ReplaceAllString(s, "")
}
This prints "test1","test2","test3".
I am not sure what you meant when you commented "I want to only quote inside test3".
This code removes all the quotes, as yours did, but then adds the outer quotes back with fmt.Sprintf():
package main
import (
"fmt"
"strings"
)
func main() {
var a = "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
func RemoveQuotes(s string) string {
s = strings.Replace(s, "\"", "", -1) //here I removed all quotes. I'd like to remove only quotes between letters
return fmt.Sprintf(`"%s"`, s)
}
https://play.golang.org/p/dKB9DwYXZp
In your example you define a string variable, so the outer quotes are not part of the actual string. If you ran fmt.Println("bus\"zipcode"), the output on the screen would be bus"zipcode. If your goal is to replace the quotes in a string with a space, then you need to replace each quote with a space rather than with an empty string, as you do now: s = strings.Replace(s, "\"", " ", -1). If you instead want to remove the inner quotes entirely while keeping each comma-separated item quoted, you can do something like this:
package main
import (
"fmt"
"strings"
)
func RemoveQuotes(s string) string {
result := ""
arr := strings.Split(s, ",")
for i := 0; i < len(arr); i++ {
sub := strings.Replace(arr[i], "\"", "", -1)
result = fmt.Sprintf("%s,\"%s\"", result, sub)
}
return result[1:]
}
func main() {
a := "\"test1\",\"test2\",\"tes\"t3\""
fmt.Println(RemoveQuotes(a))
}
Note however that this is not very efficient, but I assume it's more about learning how to do it in this case.

How to concatenate Service metadata for consul-template with commas

Does anyone know how to concatenate strings from consul for consul-template?
If I have a service 'foo' registered in Consul
{
"Node": "node1",
"Address": "192.168.0.1",
"Port": 3333
},
{
"Node": "node2",
"Address": "192.168.0.2",
"Port": 4444
}
I would like consul-template to generate the following line:
servers=192.168.0.1:3333,192.168.0.2:4444/bogus
The following attempt does not work, since it leaves a trailing comma:
servers={{range service "foo"}}{{.Address}}:{{.Port}},{{end}}/bogus
# renders
servers=192.168.0.1:3333,192.168.0.2:4444,/bogus
# What I actually want
servers=192.168.0.1:3333,192.168.0.2:4444/bogus
I know consul-template uses Go template syntax, but I simply cannot figure out the syntax to get this working. It's likely that I should use consul-template's join, but how do I pass both .Address and .Port to join? This is just a trivial example, and I'm intentionally not using indexes, since the number of services could be more than two. Any ideas?
This should work.
{{$foo_srv := service "foo"}}
{{if $foo_srv}}
{{$last := len $foo_srv | subtract 1}}
servers=
{{- range $i := loop $last}}
{{- with index $foo_srv $i}}{{.Address}}:{{.Port}},{{end}}
{{- end}}
{{- with index $foo_srv $last}}{{.Address}}:{{.Port}}{{end}}/bogus
{{end}}
I was wondering whether "join" could be used here.
Note that "{{-" trims the whitespace (such as ' ', \t, \n) immediately preceding the action.
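Since consul-template uses Go template syntax, a common Go template idiom may also apply: range with an index variable and emit the comma before every element except the first. The sketch below demonstrates the idiom with the standard text/template package only. The `service` struct and the plain slice are stand-ins for what consul-template's service function returns, so whether `{{range $i, $s := service "foo"}}` behaves identically there is an assumption to verify.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// service is a hypothetical stand-in for one consul-template service entry.
type service struct {
	Address string
	Port    int
}

// {{if $i}} is false for index 0, so the comma is written only between
// elements and never after the last one.
const tmplText = `servers={{range $i, $s := .}}{{if $i}},{{end}}{{$s.Address}}:{{$s.Port}}{{end}}/bogus`

// render executes the template against the given services.
func render(services []service) (string, error) {
	tmpl, err := template.New("servers").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, services); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render([]service{
		{Address: "192.168.0.1", Port: 3333},
		{Address: "192.168.0.2", Port: 4444},
	})
	if err != nil {
		fmt.Println("err:", err)
		return
	}
	fmt.Println(out)
}
```

This avoids computing the last index separately and works for any number of services, including one.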
You can use a custom plugin.
servers={{service "foo" | toJSON | plugin "path/to/plugin"}}
The plugin code:
package main
import (
"encoding/json"
"fmt"
"os"
)
type InputEntry struct {
Node string
Address string
Port int
}
func main() {
arg := []byte(os.Args[1])
var input []InputEntry
if err := json.Unmarshal(arg, &input); err != nil {
fmt.Fprintf(os.Stderr, "err: %s\n", err)
os.Exit(1)
}
var output string
for i, entry := range input {
output += fmt.Sprintf("%v:%v", entry.Address, entry.Port)
if i != len(input)-1 {
output += ","
}
}
fmt.Fprintln(os.Stdout, output)
os.Exit(0)
}

Swift 2.1+ return String array, with emojis \\w+ expression

The problem is that "\\w+" works fine with plain text only. The goal is to have runs of emoji returned as matches too, rather than skipped as if they were whitespace.
Example:
"This is some text 🏈🏈".regex("\\w+")
Desired output:
["This","is","some","text","🏈🏈"]
Code:
extension String {
func regex (pattern: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
let nsstr = self as NSString
let all = NSRange(location: 0, length: nsstr.length)
var matches : [String] = [String]()
regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
(result : NSTextCheckingResult?, _, _) in
if let r = result {
let result = nsstr.substringWithRange(r.range) as String
matches.append(result)
}
}
return matches
} catch {
return [String]()
}
}
}
The code above gives the following output:
"This is some text 🏈🏈".regex("\\w+")
// Yields: ["This", "is", "some", "text"]
// Note the 🏈🏈 are missing.
Is it a coding issue, regex issue, or both? Other answers seem to show the same problem.
func matchesForRegexInText(regex: String!, text: String!) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex, options: [])
let nsString = text as NSString
let results = regex.matchesInString(text,
options: [], range: NSMakeRange(0, nsString.length))
return results.map { nsString.substringWithRange($0.range)}
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
let string = "This is some text 🏈🏈"
let matches = matchesForRegexInText("\\w+", text: string)
// Also yields ["This", "is", "some", "text"]
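My mistake:
\w+ matches only word characters (letters, digits, and underscore), and emoji are not word characters. Matching runs of characters that are neither a space nor a tab works:
"This is some text \t 🏈🏈".regex("[^ \t]+")
// Gives the correct answer: ["This", "is", "some", "text", "🏈🏈"]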