How to replace symbol AND make next letter uppercase in Go - regex

I'm a beginner in Go.
I can't figure out how to not just replace a symbol, but also make the next letter uppercase in Go.
Task:
Complete the method/function so that it converts dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case, also often referred to as Pascal case).
I tried to implement regexp methods with:
re, _ := regexp.Compile(`/[-_]\w/ig`)
res := re.FindAllStringSubmatch(s, -1)
return res
But I can't return res because it's a slice, and I need to return a single string.
My code:
package main
import (
"fmt"
"strings"
)
func ToCamelCase(s string) string {
s = strings.ReplaceAll(s, "-", "")
s = strings.ReplaceAll(s, "_", "")
return s
}
func main() {
var s string
fmt.Scan(&s)
fmt.Println(ToCamelCase(s))
}
Input:
"the-stealth-warrior" or "the_stealth_warrior"
Output:
"theStealthWarrior" or "TheStealthWarrior"
My Output: thestealthwarrior

In Go, the pattern is written in the string literal without regex delimiters or trailing flags (no /.../ig), and ReplaceAllStringFunc is the more convenient method here:
package main
import (
"fmt"
"regexp"
"strings"
)
func ToCamelCase(s string) string {
re, _ := regexp.Compile(`[-_]\w`)
res := re.ReplaceAllStringFunc(s, func(m string) string {
return strings.ToUpper(m[1:])
})
return res
}
func main() {
s := "the-stealth-warrior"
fmt.Println(ToCamelCase(s))
}
See the Go playground.
The output is theStealthWarrior.
The [-_]\w pattern matches a - or _ and then any word char. If you want to exclude _ from \w, use [^\W_] instead of \w.
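As a minimal sketch of the [^\W_] variant mentioned above, the version below only capitalizes an ASCII letter or digit that directly follows a delimiter, so a delimiter followed by another delimiter is not matched. The names delimThenAlnum and toCamelCaseStrict are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// delimThenAlnum matches one '-' or '_' followed by a word character
// other than '_' (that is, an ASCII letter or digit).
var delimThenAlnum = regexp.MustCompile(`[-_][^\W_]`)

// toCamelCaseStrict is a hypothetical helper name, not part of any library.
func toCamelCaseStrict(s string) string {
	return delimThenAlnum.ReplaceAllStringFunc(s, func(m string) string {
		// m is two bytes long: the delimiter and the ASCII character after it.
		return strings.ToUpper(m[1:])
	})
}

func main() {
	fmt.Println(toCamelCaseStrict("the-stealth-warrior")) // theStealthWarrior
	fmt.Println(toCamelCaseStrict("The_stealth_warrior")) // TheStealthWarrior
}
```

Compiling the pattern once at package level avoids recompiling it on every call.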

Related

Is it possible to match a string with two equal parts and a separator

I'm trying to come up with a regular expression that would allow me to match strings that have equal parts and a separator between them. For example:
foo;foo <- match
foobar;foobar <- match
foo;foobar <- no match
foo;bar <- no match
This could be easily done with PCRE by using a positive look-ahead assertion:
([^;]+);(?=\1$)
The problem is that I need this for a program written in Go, using the RE2 library, which doesn't support look-around assertions. I cannot change the code; I can only feed it a regex string.
I am afraid the problem cannot be solved with a regex alone, since RE2 supports neither look-around assertions nor backreferences. So I have two solutions for you.
Solution 1 (using regex)
NOTE: This solution works only if the string contains exactly one separator.
package main
import (
"fmt"
"regexp"
)
func regexMatch(str string) bool {
pattern1 := regexp.MustCompile(`^([^;]+);`)
pattern2 := regexp.MustCompile(`;([^;]+)$`)
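match1 := pattern1.FindStringSubmatch(str)
match2 := pattern2.FindStringSubmatch(str)
// FindStringSubmatch returns nil when there is no match; bail out before indexing.
if match1 == nil || match2 == nil {
return false
}
// Index 1 holds the capture group, i.e. the part without the separator.
return match1[1] == match2[1]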
}
func main() {
fmt.Println(regexMatch("foo;foo")) // true
fmt.Println(regexMatch("foobar;foobar")) // true
fmt.Println(regexMatch("foo;foobar")) // false
fmt.Println(regexMatch("foo;bar")) // false
}
Solution 2 (using split)
This solution is more compact, and if there can be more than one separator the logic is easy to adapt.
package main
import (
"fmt"
"strings"
)
func splitMatch(str string) bool {
matches := strings.Split(str, ";")
if len(matches) != 2 {
return false
}
return matches[0] == matches[1]
}
func main() {
fmt.Println(splitMatch("foo;foo")) // true
fmt.Println(splitMatch("foobar;foobar")) // true
fmt.Println(splitMatch("foo;foobar")) // false
fmt.Println(splitMatch("foo;bar")) // false
}
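As a rough sketch of how the split approach might be adapted to more than one separator, the function below accepts two or more parts and requires all of them to be identical. The name allPartsEqual is hypothetical, and rejecting empty parts is an assumption made here to mirror the [^;]+ in the question's pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// allPartsEqual is a hypothetical generalization of splitMatch: it reports
// whether str consists of two or more identical, non-empty parts separated by sep.
func allPartsEqual(str, sep string) bool {
	parts := strings.Split(str, sep)
	if len(parts) < 2 || parts[0] == "" {
		return false
	}
	for _, p := range parts[1:] {
		if p != parts[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allPartsEqual("foo;foo", ";"))     // true
	fmt.Println(allPartsEqual("foo;foo;foo", ";")) // true
	fmt.Println(allPartsEqual("foo;foobar", ";"))  // false
	fmt.Println(allPartsEqual("foo", ";"))         // false
}
```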

Czech characters in regexp search

I am trying to implement a very simple text matcher for Czech words. Since the Czech language is very suffix-heavy, I want to define the start of the word and then greedily match the rest of it. This is my implementation so far:
r := regexp.MustCompile("(?i)\\by\\w+\\b")
text := "x yž z"
matches := r.FindAllString(text, -1)
fmt.Println(matches) //have [], want [yž]
I studied Go's regexp syntax:
https://github.com/google/re2/wiki/Syntax
but I don't know how to define Czech characters there. \w matches only ASCII characters, not the Czech letters outside ASCII.
Can you please help me?
In RE2, neither \w nor \b is Unicode-aware:
\b at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\w word characters (== [0-9A-Za-z_])
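The difference can be seen with a small sketch that runs \w+ and \p{L}+ over the string from the question. The helper names asciiWords and letterWords are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWordRe = regexp.MustCompile(`\w+`)     // ASCII letters, digits, underscore
	letterRe    = regexp.MustCompile(`\p{L}+`) // any Unicode letter
)

// asciiWords returns the runs matched by \w+ (hypothetical helper).
func asciiWords(s string) []string { return asciiWordRe.FindAllString(s, -1) }

// letterWords returns the runs matched by \p{L}+ (hypothetical helper).
func letterWords(s string) []string { return letterRe.FindAllString(s, -1) }

func main() {
	text := "x yž z"
	fmt.Println(asciiWords(text))  // [x y z]
	fmt.Println(letterWords(text)) // [x yž z]
}
```

\w+ stops before "ž" because that letter is outside ASCII, while \p{L}+ keeps it as part of the word.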
A more general approach is to split on any run of one or more non-letter chars, and then keep only the items that meet your criteria:
package main
import (
"fmt"
"strings"
"regexp"
)
func main() {
output := []string{}
r := regexp.MustCompile(`\P{L}+`)
str := "x--++yž,,,.z..00"
words := r.Split(str, -1)
for i := range words {
if len(words[i]) > 0 && (strings.HasPrefix(words[i], `y`) || strings.HasPrefix(words[i], `Y`)) {
output = append(output, words[i])
}
}
fmt.Println(output)
}
See the Go demo.
Note that a naive approach like
package main
import (
"fmt"
"regexp"
)
func main() {
output := []string{}
r := regexp.MustCompile(`(?i)(?:\P{L}|^)(y\p{L}*)(?:\P{L}|$)`)
str := "x--++yž,,,.z..00..."
matches := r.FindAllStringSubmatch(str, -1)
for _, v := range matches {
output = append(output, v[1])
}
fmt.Println(output)
}
won't work when the string contains consecutive matches such as match1,match2 match3: it will only fetch the odd-numbered occurrences (first, third, ...), because the trailing non-capturing group consumes the char that the leading non-capturing group would need for the next match.
A workaround for the above code is to append an extra non-letter char to each run of non-letter chars, say
package main
import (
"fmt"
"regexp"
)
func main() {
output := []string{}
r := regexp.MustCompile(`(?i)(?:\P{L}|^)(u\p{L}*)(?:\P{L}|$)`)
str := "uhličitá,uhličité,uhličitou,uhličitého,yz,my"
matches := r.FindAllStringSubmatch(regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `), -1)
for _, v := range matches {
output = append(output, v[1])
}
fmt.Println(output)
}
// => [uhličitá uhličité uhličitou uhličitého]
See this Go demo.
Here, regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `) adds a space after all chunks of non-letter chars.
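The skipping behaviour and the effect of the padding can be compared side by side. This is only a sketch on a made-up input ("ya,yb,yc"); the names wordRe, nonLetters, and firstGroups are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same shape as the patterns above, looking for words that start with "y".
	wordRe = regexp.MustCompile(`(?i)(?:\P{L}|^)(y\p{L}*)(?:\P{L}|$)`)
	// Runs of non-letter characters.
	nonLetters = regexp.MustCompile(`\P{L}+`)
)

// firstGroups collects capture group 1 of every match of wordRe in s.
func firstGroups(s string) []string {
	out := []string{}
	for _, m := range wordRe.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	str := "ya,yb,yc"
	// Unpadded: the comma after "ya" is consumed, so "yb" has no boundary left.
	fmt.Println(firstGroups(str)) // [ya yc]
	// Padded: every non-letter run gets an extra space appended first.
	fmt.Println(firstGroups(nonLetters.ReplaceAllString(str, `$0 `))) // [ya yb yc]
}
```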

Regex to match empty string or pattern

I'm trying to build an application that reads lines of CSV text from the network and inserts them into an SQLite database. I need to extract all strings that appear between commas, including empty strings.
For example, a line of text that I need to parse looks like:
"1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05"
My code snippet is below. I figured I need to use a regex, since I'm trying to split the line at the "," character but a comma may also appear inside a quoted field.
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
re := regexp.MustCompile(`^|[^,"']+|"([^"]*)"|'([^']*)`)
txt := "1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05"
arr := re.FindAllString(txt, -1)
arr2 := strings.Split(txt, ",")
fmt.Println("Array lengths: ", len(arr), len(arr2))
}
The correct length of the split array in this case should be 13.
Like Marc and Flimzy said, regex isn't the right tool here. Since you don't require a regex for extracting the data, here's a snippet that parses the line with encoding/csv and gives the result you're looking for:
package main
import (
"bytes"
"encoding/csv"
"fmt"
)
func main() {
var testdata = `1/17/09 1:23,"Soap, Shampoo and cleaner",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05`
var reader = csv.NewReader(bytes.NewBufferString(testdata))
var content, err = reader.Read()
if err != nil {
panic(err)
}
fmt.Println(len(content)) // 13
}
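Since the question mentions reading several lines, here is a small sketch that parses multiple records in one go with ReadAll. The helper name parseCSV and the second sample line are made up; csv.NewReader accepts any io.Reader, so a network connection could presumably be passed in place of the strings.Reader used here.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseCSV reads every record from data. The name and the sample input
// in main are illustrative only.
func parseCSV(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	return r.ReadAll()
}

func main() {
	data := "1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200\n" +
		"1/18/09 9:08,Towel,,300\n"
	records, err := parseCSV(data)
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		// Prints the field count and the second field of each record.
		fmt.Println(len(rec), rec[1])
	}
}
```

By default the reader expects every record to have as many fields as the first one and returns an error otherwise.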

Split string using regular expression in Go

I'm trying to find a good way to split a string using a regular expression instead of a string. Thanks
http://nsf.github.io/go/strings.html?f:Split!
You can use the Split method of a compiled *regexp.Regexp to split a string into a slice of strings, with the regex pattern as the delimiter.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile("[0-9]+")
txt := "Have9834a908123great10891819081day!"
split := re.Split(txt, -1)
set := []string{}
for i := range split {
set = append(set, split[i])
}
fmt.Println(set) // [Have a great day!]
}
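The second argument to Split limits the number of pieces. As a small sketch (splitN is a made-up wrapper name), a negative value returns all substrings, while a positive n returns at most n substrings with the last one holding the unsplit remainder:

```go
package main

import (
	"fmt"
	"regexp"
)

// splitN is a hypothetical wrapper that compiles pattern and splits text
// into at most n pieces (all pieces when n is negative).
func splitN(pattern, text string, n int) []string {
	return regexp.MustCompile(pattern).Split(text, n)
}

func main() {
	fmt.Printf("%q\n", splitN("[0-9]+", "a1b22c333d", -1)) // ["a" "b" "c" "d"]
	fmt.Printf("%q\n", splitN("[0-9]+", "a1b22c333d", 2))  // ["a" "b22c333d"]
}
```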
I made a regex-split function based on the behavior of the regex split functions in Java, C#, and PHP. It returns only a slice of strings, without the index information.
func RegSplit(text string, delimiter string) []string {
reg := regexp.MustCompile(delimiter)
indexes := reg.FindAllStringIndex(text, -1)
laststart := 0
result := make([]string, len(indexes)+1)
for i, element := range indexes {
result[i] = text[laststart:element[0]]
laststart = element[1]
}
result[len(indexes)] = text[laststart:]
return result
}
example:
fmt.Println(RegSplit("a1b22c333d", "[0-9]+"))
result:
[a b c d]
If you just want to split on certain characters, you can use strings.FieldsFunc, otherwise I'd go with regexp.FindAllString.
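For the strings.FieldsFunc route, a minimal sketch could look like this. It splits on runs of digits using unicode.IsDigit; the helper name splitOnDigits is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnDigits splits s around runs of Unicode digits and drops empty pieces.
func splitOnDigits(s string) []string {
	return strings.FieldsFunc(s, unicode.IsDigit)
}

func main() {
	fmt.Printf("%q\n", splitOnDigits("Have9834a908123great10891819081day!"))
	// ["Have" "a" "great" "day!"]
}
```

Unlike a regex split, FieldsFunc never returns empty strings, so leading or trailing delimiters do not produce empty elements.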
The regexp.Split() function would be the best way to do this.
You should be able to create your own split function that loops over the results of Regexp.FindAllString, placing the intervening substrings into a new slice.
http://nsf.github.com/go/regexp.html?m:Regexp.FindAllString!
I found this old post while looking for an answer. I'm new to Go but these answers seem overly complex for the current version of Go. The simple function below returns the same result as those above.
package main
import (
"fmt"
"regexp"
)
func goReSplit(text string, pattern string) []string {
regex := regexp.MustCompile(pattern)
result := regex.Split(text, -1)
return result
}
func main() {
fmt.Printf("%#v\n", goReSplit("Have9834a908123great10891819081day!", "[0-9]+"))
}

How do you replace a character in Go using the Regexp package ReplaceAll function?

I am not familiar with C-like syntax and would like to write code that finds and replaces, say, all 'A's with 'B's in a source string such as 'ABBA', using the regexp package's ReplaceAll or ReplaceAllString functions. How do I set up the Regexp value, src, and repl? Here's the ReplaceAll code snippet from the Go documentation:
// ReplaceAll returns a copy of src in which all matches for the Regexp
// have been replaced by repl. No support is provided for expressions
// (e.g. \1 or $1) in the replacement text.
func (re *Regexp) ReplaceAll(src, repl []byte) []byte {
lastMatchEnd := 0; // end position of the most recent match
searchPos := 0; // position where we next look for a match
buf := new(bytes.Buffer);
for searchPos <= len(src) {
a := re.doExecute("", src, searchPos);
if len(a) == 0 {
break // no more matches
}
// Copy the unmatched characters before this match.
buf.Write(src[lastMatchEnd:a[0]]);
// Now insert a copy of the replacement string, but not for a
// match of the empty string immediately after another match.
// (Otherwise, we get double replacement for patterns that
// match both empty and nonempty strings.)
if a[1] > lastMatchEnd || a[0] == 0 {
buf.Write(repl)
}
lastMatchEnd = a[1];
// Advance past this match; always advance at least one character.
_, width := utf8.DecodeRune(src[searchPos:len(src)]);
if searchPos+width > a[1] {
searchPos += width
} else if searchPos+1 > a[1] {
// This clause is only needed at the end of the input
// string. In that case, DecodeRuneInString returns width=0.
searchPos++
} else {
searchPos = a[1]
}
}
// Copy the unmatched characters after the last match.
buf.Write(src[lastMatchEnd:len(src)]);
return buf.Bytes();
}
This is a routine to do what you want:
package main
import (
"fmt"
"os"
"regexp"
)
func main() {
// Compile returns an error for an invalid pattern.
reg, err := regexp.Compile("B")
if err != nil {
fmt.Printf("Compile failed: %s\n", err)
os.Exit(1)
}
// ReplaceAll works on byte slices; convert with []byte(...) and back with string(...).
// This replaces every B with A.
output := string(reg.ReplaceAll([]byte("ABBA"), []byte("A")))
fmt.Println(output)
}
Here is a small example. You can also find good examples in the regexp package's tests.
package main
import (
"fmt"
"regexp"
)
func main() {
re, _ := regexp.Compile("e")
input := "hello"
replacement := "a"
// ReplaceAll takes and returns []byte, so convert the strings explicitly.
actual := string(re.ReplaceAll([]byte(input), []byte(replacement)))
fmt.Printf("new pattern %s", actual)
}
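For the case in the question (turning every A in ABBA into B), a minimal sketch with current Go might look like this. ReplaceAllString takes and returns strings, so no byte-slice conversion is involved; the helper name replaceAWithB is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// aRe matches the literal character "A".
var aRe = regexp.MustCompile("A")

// replaceAWithB is an illustrative helper using the string-based method.
func replaceAWithB(src string) string {
	return aRe.ReplaceAllString(src, "B")
}

func main() {
	fmt.Println(replaceAWithB("ABBA")) // BBBB
	// For a fixed literal like this, no regular expression is needed.
	fmt.Println(strings.ReplaceAll("ABBA", "A", "B")) // BBBB
}
```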