Include a string into another string using regex

I have a set of strings. A string might have items listed between square brackets. I'd like to add a constant number of extra items to the strings that have brackets. The brackets might be empty or absent. For example:
string1 --> string1 # added nothing
string2[] --> string2[extra1="1",extra2="2"] # added two items
string3[item="1"] --> string3[item="1",extra1="1",extra2="2"] # added two items
Currently I achieve this with the following code (Golang):
str1 := "test"
str2 := `test[]`
str3 := `test[item1="1"]`
re := regexp.MustCompile(`\[(.+)?\]`)
for _, s := range []string{str1, str2, str3} {
s = re.ReplaceAllString(s, fmt.Sprintf(`[item1="a",item2="b",$1]`))
fmt.Println(s)
}
But in the output, in the case of empty brackets, I also get an unwanted comma "," at the end:
test
test[item1="a",item2="b",]
test[item1="a",item2="b",item1="1"]
Is it possible to avoid inserting the comma when the brackets are empty?
Of course, it's possible to parse the string again and trim the comma, but that seems suboptimal.

You can use two regexes: one that matches empty brackets [] and another
that matches brackets with text inside. The tested code is below and at
https://play.golang.org/p/_DOOGDMUOCm
A second way is to inspect the string after the replacement: if the
last two characters are ,] then cut the string before the comma and
append ]. I guess you already know this approach.
package main
import (
	"fmt"
	"regexp"
)
func main() {
	str1 := "test"
	str2 := `test[]`
	str3 := `test[item1="1"]`
	// re matches any bracketed list; nonEmpty matches only non-empty brackets.
	re := regexp.MustCompile(`\[(.*)\]`)
	nonEmpty := regexp.MustCompile(`\[(.+)\]`)
	for _, s := range []string{str1, str2, str3} {
		if nonEmpty.MatchString(s) {
			// Keep the existing items after the new ones, separated by a comma.
			s = re.ReplaceAllString(s, `[item1="a",item2="b",$1]`)
		} else {
			// Empty brackets: insert the new items without a trailing comma.
			s = re.ReplaceAllString(s, `[item1="a",item2="b"]`)
		}
		fmt.Println(s)
	}
}
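A single-pass alternative, as a minimal sketch: ReplaceAllStringFunc lets a callback decide whether a comma is needed, so only one regex is involved. The helper name addExtras and the package-level bracketRe are illustrative assumptions, not part of the original code, and the new items are appended after the existing ones, as in the question's desired output.

```go
package main

import (
	"fmt"
	"regexp"
)

// bracketRe matches a bracketed list, empty or not.
var bracketRe = regexp.MustCompile(`\[(.*)\]`)

// addExtras (hypothetical helper) appends extras to the bracketed list in s.
// Strings without brackets are returned unchanged.
func addExtras(s, extras string) string {
	return bracketRe.ReplaceAllStringFunc(s, func(m string) string {
		inner := m[1 : len(m)-1] // strip the surrounding brackets
		if inner == "" {
			return "[" + extras + "]"
		}
		return "[" + inner + "," + extras + "]"
	})
}

func main() {
	for _, s := range []string{"test", "test[]", `test[item="1"]`} {
		fmt.Println(addExtras(s, `extra1="1",extra2="2"`))
	}
}
```

The callback receives the whole match, so the comma is only emitted when there is existing content to separate.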

Related

How to replace symbol AND make next letter uppercase in Go

I'm a beginner trainee in Go.
I can't figure out how to not just replace a symbol but also make the next letter uppercase in Go.
Task:
Complete the method/function so that it converts dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case, also often referred to as Pascal case).
I tried to implement regexp methods with:
re, _ := regexp.Compile(`/[-_]\w/ig`)
res := re.FindAllStringSubmatch(s, -1)
return res
But I can't return res because it's a slice, and I need to return just a string.
My code:
package main
import (
"fmt"
"strings"
)
func ToCamelCase(s string) string {
s = strings.ReplaceAll(s, "-", "")
s = strings.ReplaceAll(s, "_", "")
return s
}
func main() {
var s string
fmt.Scan(&s)
fmt.Println(ToCamelCase(s))
}
Input:
"the-stealth-warrior" or "the_stealth_warrior"
Output:
"theStealthWarrior" or "TheStealthWarrior"
My Output: thestealthwarrior
You need to define the regex without regex delimiters in Go string literals, and it is more convenient to use the ReplaceAllStringFunc function:
package main
import (
"fmt"
"regexp"
"strings"
)
func ToCamelCase(s string) string {
re, _ := regexp.Compile(`[-_]\w`)
res := re.ReplaceAllStringFunc(s, func(m string) string {
return strings.ToUpper(m[1:])
})
return res
}
func main() {
s := "the-stealth-warrior"
fmt.Println(ToCamelCase(s))
}
The output is theStealthWarrior.
The [-_]\w pattern matches a - or _ and then any word char. If you want to exclude _ from \w, use [^\W_] instead of \w.
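As a hedged sketch of the [^\W_] variant mentioned above: with this pattern a delimiter is only consumed when a letter or digit follows it, so a doubled underscore is not swallowed as "delimiter plus word char". The function name toCamelCaseStrict is a hypothetical name chosen for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// delimRe matches a dash or underscore followed by an ASCII letter or digit.
var delimRe = regexp.MustCompile(`[-_][^\W_]`)

// toCamelCaseStrict (hypothetical name) drops each delimiter and
// upper-cases the character that follows it.
func toCamelCaseStrict(s string) string {
	return delimRe.ReplaceAllStringFunc(s, func(m string) string {
		return strings.ToUpper(m[1:]) // m[0] is the one-byte delimiter
	})
}

func main() {
	fmt.Println(toCamelCaseStrict("the_stealth_warrior"))
	fmt.Println(toCamelCaseStrict("The-Stealth-Warrior"))
	fmt.Println(toCamelCaseStrict("a__b"))
}
```

The first word keeps its original capitalization because the pattern never matches at the start of a word that has no preceding delimiter.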

Czech characters in regexp search

I am trying to implement a very simple text matcher for Czech words. Since the Czech language is very suffix-heavy, I want to define the start of the word and then greedily match the rest of the word. This is my implementation so far:
r := regexp.MustCompile("(?i)\\by\\w+\\b")
text := "x yž z"
matches := r.FindAllString(text, -1)
fmt.Println(matches) //have [], want [yž]
I studied Go's regexp syntax:
https://github.com/google/re2/wiki/Syntax
but I don't know how to define Czech characters there. Using \w matches only ASCII characters, not Czech UTF-8 characters.
Can you please help me?
In RE2, neither \w nor \b is Unicode-aware:
\b at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\w word characters (== [0-9A-Za-z_])
A more general approach is to split the string on any chunk of one or more non-letter chars, and then collect only those items that meet your criteria:
package main
import (
	"fmt"
	"regexp"
	"strings"
)
func main() {
	output := []string{}
	// Split on runs of one or more non-letter characters.
	r := regexp.MustCompile(`\P{L}+`)
	str := "x--++yž,,,.z..00"
	words := r.Split(str, -1)
	for _, w := range words {
		// Keep only the words that start with y or Y.
		if strings.HasPrefix(w, "y") || strings.HasPrefix(w, "Y") {
			output = append(output, w)
		}
	}
	fmt.Println(output)
}
Note that a naive approach like
package main
import (
"fmt"
"regexp"
)
func main() {
output := []string{}
r := regexp.MustCompile(`(?i)(?:\P{L}|^)(y\p{L}*)(?:\P{L}|$)`)
str := "x--++yž,,,.z..00..."
matches := r.FindAllStringSubmatch(str, -1)
for _, v := range matches {
output = append(output, v[1])
}
fmt.Println(output)
}
won't work when the string contains consecutive matches such as match1,match2 match3: it fetches only every other occurrence, because the last non-capturing group consumes the char that the first non-capturing group needs in order to start the next match.
A workaround for the above code is to append an extra non-letter char to each streak of non-letter chars, say
package main
import (
"fmt"
"regexp"
)
func main() {
output := []string{}
r := regexp.MustCompile(`(?i)(?:\P{L}|^)(u\p{L}*)(?:\P{L}|$)`)
str := "uhličitá,uhličité,uhličitou,uhličitého,yz,my"
matches := r.FindAllStringSubmatch(regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `), -1)
for _, v := range matches {
output = append(output, v[1])
}
fmt.Println(output)
}
// => [uhličitá uhličité uhličitou uhličitého]
Here, regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `) adds a space after all chunks of non-letter chars.

replace all characters in string except last 4 characters

Using Go, how do I replace all characters in a string with "X" except the last 4 characters?
This works fine for php/javascript but not for golang as "?=" is not supported.
\w(?=\w{4,}$)
Tried this, but does not work. I couldn't find anything similar for golang
(\w)(?:\w{4,}$)
A simple yet efficient solution that handles multi-byte UTF-8 characters is to convert the string to []rune, overwrite the runes with 'X' (except the last 4), then convert back to string.
func maskLeft(s string) string {
rs := []rune(s)
for i := 0; i < len(rs)-4; i++ {
rs[i] = 'X'
}
return string(rs)
}
Testing it:
fmt.Println(maskLeft("123"))
fmt.Println(maskLeft("123456"))
fmt.Println(maskLeft("1234世界"))
fmt.Println(maskLeft("世界3456"))
Output:
123
XX3456
XX34世界
XX3456
Also see related question: How to replace all characters in a string in golang
Let's say inputString is the string whose characters you want to mask (all except the last four). This approach indexes by byte, so it assumes ASCII input that is at least four characters long.
First get the last four characters of the string:
last4 := inputString[len(inputString)-4:]
Then replace every word character in the rest of the string with X:
re := regexp.MustCompile(`\w`)
maskedPart := re.ReplaceAllString(inputString[:len(inputString)-4], "X")
Then combine maskedPart and last4 to get your result:
maskedString := maskedPart + last4
Simpler approach without regex and looping
package main
import (
"fmt"
"strings"
)
func main() {
s := "thisisarandomstring"
head := s[:len(s)-4]
tail := s[len(s)-4:]
mask := strings.Repeat("x", len(head))
fmt.Printf("%v%v", mask, tail)
}
// Output:
// xxxxxxxxxxxxxxxring
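The slicing above works on bytes and panics on inputs shorter than four bytes. A hedged sketch that combines strings.Repeat with rune slicing avoids both issues; the name maskAllButLast and its keep parameter are assumptions made for this example, and keep is expected to be non-negative.

```go
package main

import (
	"fmt"
	"strings"
)

// maskAllButLast (hypothetical helper) replaces every character of s with
// "X" except the last keep characters. It assumes keep >= 0.
func maskAllButLast(s string, keep int) string {
	rs := []rune(s) // index by character, not by byte
	if len(rs) <= keep {
		return s // nothing to mask
	}
	return strings.Repeat("X", len(rs)-keep) + string(rs[len(rs)-keep:])
}

func main() {
	fmt.Println(maskAllButLast("123", 4))
	fmt.Println(maskAllButLast("123456", 4))
	fmt.Println(maskAllButLast("1234世界", 4))
}
```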
Create a Regexp with
re := regexp.MustCompile(`\w{4}$`)
Let's say inputString is the string you want to remove the last four characters from. Use this code to return a copy of inputString without the last 4 characters:
re.ReplaceAllString(inputString, "")
Note: if it's possible that your input string could start out with less than four characters, and you still want those characters removed since they are at the end of the string, you should instead use:
re := regexp.MustCompile(`\w{0,4}$`)

apostrophe in word not being recognized for string replace

I am having a problem replacing the word "you're" with regexp.
All of the other words are changing correctly just the word "you're".
I think it is not parsing after the apostrophe.
I have to replace the word "you" to "I" and "you're" to "I'm".
It will change "you" to "I" but "you're" becomes "I're" because it is not going past the apostrophe and it thinks that is the end of the word for some reason. I have to escape the apostrophe somehow.
Please see below for the code in question.
package main
import (
"fmt"
"math/rand"
"regexp"
"strings"
"time"
)
//Function ElizaResponse to take in and return a string
func ElizaResponse(str string) string {
// replace := "How do you know you are"
/*Regex MatchString function with isolation of the word "father"
*with a boundry ignore case regex command.
*/
if matched, _ := regexp.MatchString(`(?i)\bfather\b`, str);
//Condition to replace the original string if it has the word "father"
matched {
return "Why don’t you tell me more about your father?"
}
r1 := regexp.MustCompile(`(?i)\bI'?\s*a?m\b`)
//Match the words "I am" and capture for replacement
matched := r1.MatchString(str)
//condition if "I am" is matched
if matched {
capturedString := r1.ReplaceAllString(str, "$1")
boundaries := regexp.MustCompile(`\b`)
tokens := boundaries.Split(capturedString, -1)
// List the reflections.
reflections := [][]string{
{`I`, `you`},
{`you're`, `I'm`},
{`your`, `my`},
{`me`, `you`},
{`you`, `I`},
{`my`, `your`},
}
// Loop through each token, reflecting it if there's a match.
for i, token := range tokens {
for _, reflection := range reflections {
if matched, _ := regexp.MatchString(reflection[0], token); matched {
tokens[i] = reflection[1]
break
}
}
}
// Put the tokens back together.
return strings.Join(tokens, ``)
}
//Get random number from the length of the array of random struct
//an array of strings for the random response
response := []string{"I’m not sure what you’re trying to say. Could you explain it to me?",
"How does that make you feel?",
"Why do you say that?"}
//Return a random index of the array
return response[rand.Intn(len(response))]
}
func main() {
rand.Seed(time.Now().UTC().UnixNano())
fmt.Println("Im supposed to just take what you're saying at face value?")
fmt.Println(ElizaResponse("Im supposed to just take what you're saying at face value?"))
}
Note that the apostrophe character creates a word boundary, so your use of \b in regular expressions is probably tripping you up. That is, the string "I'm" has four word boundaries, one before and after each character.
┏━┳━┳━┓
┃I┃'┃m┃
┗━┻━┻━┛
│ │ │ └─ end of line creates a word boundary
│ │ └─── after punctuation character creates a word boundary
│ └───── before punctuation character creates a word boundary
└─────── start of line creates a word boundary
There is no way to change the behavior of the word boundary metacharacter so you might be better off mapping regexes that include the full word with punctuation to the desired replacement, e.g.:
type Replacement struct {
rgx *regexp.Regexp
rpl string
}
replacements := []Replacement{
{regexp.MustCompile("\\bI\\b"), "you"},
{regexp.MustCompile("\\byou're\\b"), "I'm"},
// etc...
}
Note also that one of your examples contains a UTF-8 "right single quotation mark" (U+2019, 0xe28099), not to be confused with the UTF-8/ASCII apostrophe (U+0027, 0x27)!
fmt.Sprintf("% x", []byte("'’")) // => "27 e2 80 99"
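One possible way to guard against the two apostrophe variants, sketched under the assumption that it is acceptable to rewrite the input before matching: replace U+2019 with the ASCII apostrophe first. The helper name normalizeApostrophes is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeApostrophes (hypothetical helper) converts the right single
// quotation mark (U+2019) to the ASCII apostrophe (U+0027).
func normalizeApostrophes(s string) string {
	return strings.ReplaceAll(s, "\u2019", "'")
}

func main() {
	in := "I\u2019m not sure what you\u2019re trying to say."
	fmt.Println(normalizeApostrophes(in))
}
```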
What you want to achieve here is to replace specific strings with specific replacements. It is easier to achieve that with a map of string keys and values, where each unique key is a literal phrase to search and the values are the texts to replace with.
This is how you may define the reflections:
reflections := map[string]string{
`you're`: `I'm`,
`your`: `my`,
`me`: `you`,
`you`: `I`,
`my`: `your`,
`I` : `you`,
}
Next, you need to get the keys sorted by length in descending order (here is some sample code):
type ByLenDesc []string
func (a ByLenDesc) Len() int {
return len(a)
}
func (a ByLenDesc) Less(i, j int) bool {
return len(a[i]) > len(a[j])
}
func (a ByLenDesc) Swap(i, j int) {
a[i], a[j] = a[j], a[i]
}
And then in the function:
var keys []string
for key := range reflections {
keys = append(keys, key)
}
sort.Sort(ByLenDesc(keys))
Then build the pattern:
pat := "\\b(" + strings.Join(keys, `|`) + ")\\b"
// fmt.Println(pat) // => \b(you're|your|you|me|my|I)\b
The pattern matches you're, your, you, me, my, or I as whole words.
res := regexp.MustCompile(pat).ReplaceAllStringFunc(capturedString, func(m string) string {
return reflections[m]
})
The above code creates a regex object and replaces all matches with the corresponding reflections values.
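The fragments above can be assembled into one runnable program roughly as follows. This is a sketch, not the answer's exact code: it swaps the ByLenDesc type for sort.Slice, escapes the keys with regexp.QuoteMeta as a precaution, and the names buildPattern, reflectRe, and reflectPronouns are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// reflections maps each literal phrase to its replacement.
var reflections = map[string]string{
	"you're": "I'm",
	"your":   "my",
	"me":     "you",
	"you":    "I",
	"my":     "your",
	"I":      "you",
}

// reflectRe is compiled once from the map keys.
var reflectRe = buildPattern()

// buildPattern joins the keys, longest first, into one whole-word alternation.
func buildPattern() *regexp.Regexp {
	keys := make([]string, 0, len(reflections))
	for k := range reflections {
		keys = append(keys, regexp.QuoteMeta(k))
	}
	sort.Slice(keys, func(i, j int) bool { return len(keys[i]) > len(keys[j]) })
	return regexp.MustCompile(`\b(` + strings.Join(keys, "|") + `)\b`)
}

// reflectPronouns replaces every matched phrase with its reflection.
func reflectPronouns(s string) string {
	return reflectRe.ReplaceAllStringFunc(s, func(m string) string {
		return reflections[m]
	})
}

func main() {
	fmt.Println(reflectPronouns("you're saying my name"))
	fmt.Println(reflectPronouns("do you like me"))
}
```

Sorting the keys longest first matters because the alternation tries its branches in order, so "you're" has to be tried before "you".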
I found that I just needed to change these two lines of code:
boundaries := regexp.MustCompile(`(\b[^\w']|$)`)
return strings.Join(tokens, ` `)
This stops the Split function from splitting at the ' character.
The tokens then need to be joined with a space; otherwise the output would be one continuous string.

How do you replace a character in Go using the Regexp package ReplaceAll function?

I am not familiar with C-like syntaxes and would like to write code to find & replace, say, all 'A's to 'B's in a source string, say 'ABBA' with the Regexp package ReplaceAll or ReplaceAllString functions? How do I set up type Regexp, src and repl? Here's the ReplaceAll code snippet from the Go documentation:
// ReplaceAll returns a copy of src in which all matches for the Regexp
// have been replaced by repl. No support is provided for expressions
// (e.g. \1 or $1) in the replacement text.
func (re *Regexp) ReplaceAll(src, repl []byte) []byte {
lastMatchEnd := 0; // end position of the most recent match
searchPos := 0; // position where we next look for a match
buf := new(bytes.Buffer);
for searchPos <= len(src) {
a := re.doExecute("", src, searchPos);
if len(a) == 0 {
break // no more matches
}
// Copy the unmatched characters before this match.
buf.Write(src[lastMatchEnd:a[0]]);
// Now insert a copy of the replacement string, but not for a
// match of the empty string immediately after another match.
// (Otherwise, we get double replacement for patterns that
// match both empty and nonempty strings.)
if a[1] > lastMatchEnd || a[0] == 0 {
buf.Write(repl)
}
lastMatchEnd = a[1];
// Advance past this match; always advance at least one character.
_, width := utf8.DecodeRune(src[searchPos:len(src)]);
if searchPos+width > a[1] {
searchPos += width
} else if searchPos+1 > a[1] {
// This clause is only needed at the end of the input
// string. In that case, DecodeRuneInString returns width=0.
searchPos++
} else {
searchPos = a[1]
}
}
// Copy the unmatched characters after the last match.
buf.Write(src[lastMatchEnd:len(src)]);
return buf.Bytes();
}
This is a routine to do what you want:
package main
import (
	"fmt"
	"os"
	"regexp"
)
func main() {
	// Compile returns an error if the pattern is not valid.
	reg, err := regexp.Compile("A")
	if err != nil {
		fmt.Printf("Compile failed: %s\n", err)
		os.Exit(1)
	}
	// ReplaceAll works on byte slices, so convert to and from string.
	output := string(reg.ReplaceAll([]byte("ABBA"), []byte("B")))
	fmt.Println(output)
}
Here is a small example. You can also find good examples in the regexp package's tests.
package main
import (
	"fmt"
	"regexp"
)
func main() {
	re := regexp.MustCompile("e")
	input := "hello"
	replacement := "a"
	// ReplaceAll operates on byte slices, so convert to and from string.
	actual := string(re.ReplaceAll([]byte(input), []byte(replacement)))
	fmt.Printf("new pattern %s\n", actual)
}