How can I use/access capture groups in Go? [duplicate] - regex

I have a file with dates in the format dd.mm.yyyy (e.g. 31.12.2019).
I want to transform them into the format yyyy-mm-dd (e.g. 2019-12-31).
In Notepad++ I can do a Search and Replace with these strings using back references:
Search:
(\d{2}).(\d{2}).(\d{4})
Replace:
\3-\2-\1
How would I do this with Go?

You could do it by slicing your input string, and assembling the parts in different order:
func transform(s string) string {
d, m, y := s[:2], s[3:5], s[6:]
return y + "-" + m + "-" + d
}
Note: the above function does not validate the input; it panics if the input is shorter than 6 bytes.
If you need input validation (including date validation), you could use the time package to parse the date, and format it into your expected output:
func transform2(s string) (string, error) {
t, err := time.Parse("02.01.2006", s)
if err != nil {
return "", err
}
return t.Format("2006-01-02"), nil
}
Testing the above functions:
fmt.Println(transform("31.12.2019"))
fmt.Println(transform2("31.12.2019"))
Output (try it on the Go Playground):
2019-12-31
2019-12-31 <nil>

Regex might be overkill since you have such well-defined input. How about this:
var dmy = strings.Split("31.12.2019", ".")
var ymd = []string{dmy[2], dmy[1], dmy[0]}
fmt.Println(strings.Join(ymd, "-"))
https://play.golang.org/p/Ak3TlCAGHUv
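For reference, the Notepad++ back references from the question translate fairly directly to Go's regexp package. The following is a minimal sketch, assuming the dates may appear inside longer text. The names dateRe and reformatDates are made up for illustration, and the dots are escaped so that they only match a literal '.'. In the replacement template, ${3} refers to the third capture group.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRe captures day, month and year as groups 1, 2 and 3.
// The dots are escaped so they match only a literal '.'.
var dateRe = regexp.MustCompile(`(\d{2})\.(\d{2})\.(\d{4})`)

// reformatDates (hypothetical helper) rewrites every dd.mm.yyyy
// occurrence in s as yyyy-mm-dd.
func reformatDates(s string) string {
	return dateRe.ReplaceAllString(s, "${3}-${2}-${1}")
}

func main() {
	fmt.Println(reformatDates("from 31.12.2019 to 01.02.2020"))
}
```

The braces in ${3} are optional here, but they avoid surprises when a group reference is followed directly by a letter, digit, or underscore.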


golang regex to capture text between a quote and a / [duplicate]

My strings:
12345:cluster/ecs52v-usw2-tst",
12345:cluster/test32-euw1-stg",
The output I'm looking for is:
ecs52v-usw2-tst
test32-euw1-stg
I have multiple cluster names that I'm trying to capture in a slice.
I've gotten it to work in Ruby with (?<=\/)(.*(tst|stg|prd))(?=") but I'm having trouble in Go.
https://go.dev/play/p/DyYr3igu2CF
You could use FindAllStringSubmatch to find all submatches.
func main() {
var re = regexp.MustCompile(`(?mi)(/(.*?)")`)
var str = `cluster/ecs52v-usw2-tst",
cluster/ecs52v-usw2-stg",
cluster/ecs52v-usw2-prd",`
matches := re.FindAllStringSubmatch(str, -1)
for _, match := range matches {
fmt.Printf("val = %s \n", match[2])
}
}
Output
val = ecs52v-usw2-tst
val = ecs52v-usw2-stg
val = ecs52v-usw2-prd
https://go.dev/play/p/_LRVYaM2r7z
package main
import (
"fmt"
"regexp"
)
func main() {
var re = regexp.MustCompile(`(?mi)/(.*?)"`)
var str = `cluster/ecs52v-usw2-tst",
cluster/ecs52v-usw2-stg",
cluster/ecs52v-usw2-prd",`
for i, match := range re.FindAllStringSubmatch(str, -1) {
fmt.Println(match[1], "found at line", i)
}
}
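Go's regexp package uses RE2 syntax, which does not support lookbehind or lookahead, so the Ruby pattern from the question does not compile in Go. A capture group can play the same role. The sketch below collects the names into a slice and, as an assumption about the intended filter, only accepts names ending in -tst, -stg or -prd. The names clusterRe and clusterNames are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// clusterRe captures the text between a '/' and the closing quote,
// but only when it ends in one of the assumed environment suffixes.
var clusterRe = regexp.MustCompile(`/([^/"]+-(?:tst|stg|prd))"`)

// clusterNames (hypothetical helper) returns group 1 of every match.
func clusterNames(s string) []string {
	var names []string
	for _, m := range clusterRe.FindAllStringSubmatch(s, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	input := `12345:cluster/ecs52v-usw2-tst",
12345:cluster/test32-euw1-stg",`
	fmt.Println(clusterNames(input))
}
```

The (?:...) group is non-capturing, so the suffix alternation does not add an extra entry to each match slice.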

How to extract different variables from a regex without compiling each expression

I have a struct representing sizes of computer objects. Objects of this struct are constructed from string values input by users; e.g. "50KB" would be tokenised into the int value 50 and the string value "KB".
type SizeUnit string
const (
B = "B"
KB = "KB"
MB = "MB"
GB = "GB"
TB = "TB"
)
type ObjectSize struct {
NumberOfUnits int
Unit SizeUnit
}
func NewObjectSizeFromString(input_str string) (*ObjectSize, error)
In the body of this function, I first check if the input value is in the valid format; i.e. any number of digits, followed by any one of "B", "KB", "MB", "GB" or "TB". I then extract the int and string components separately and return a pointer to a struct.
In order to do these three things though, I'm having to compile the regex three times.
The first time to check the format of the input string
rg, err := regexp.Compile(`^[0-9]+B$|KB$|MB$|GB$|TB$`)
And then compile again to fetch the int component:
rg, err := regexp.Compile(`^[0-9]+`)
rg.FindString(input_str)
And then compile again to fetch the string/units component:
rg, err := regexp.Compile(`B$|KB$|MB$|GB$|TB$`)
rg.FindString(input_str)
Is there any way to get the two components from the input string with a single regex compilation?
The full code can be found on the Go Playground.
I should point out that this is an academic question as I'm experimenting with Go's regex library. For a simple use-case of this sort, I would probably use a simple for loop to parse the input string.
You can capture both the values with a single expression using regexp.FindStringSubmatch:
func NewObjectSizeFromString(input_str string) (*ObjectSize, error) {
var defaultReturn *ObjectSize = nil
full_search_pattern := `^([0-9]+)([KMGT]?B)$`
rg, err := regexp.Compile(full_search_pattern)
if err != nil {
return defaultReturn, errors.New("Could not compile search expression")
}
matched := rg.FindStringSubmatch(input_str)
if matched == nil {
return defaultReturn, errors.New("Not in valid format")
}
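i, err := strconv.ParseInt(matched[1], 10, 32)
if err != nil {
// The regex guarantees digits, so this only fires when the number is out of range.
return defaultReturn, err
}
return &ObjectSize{int(i), SizeUnit(matched[2])}, nil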
}
See the playground.
The ^([0-9]+)([KMGT]?B)$ regex matches
^ - start of string
([0-9]+) - Group 1 (this value will be held in matched[1]): one or more digits
([KMGT]?B) - Group 2 (it will be in matched[2]): an optional K, M, G, T letter, and then a B letter
$ - end of string.
Note that matched[0] will hold the whole match.
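Since the question is about avoiding repeated compilation, one common option is to compile the pattern once at package level with regexp.MustCompile and reuse it on every call. The following is a sketch of that variation rather than the answer's original code; the variable name sizeRe is an assumption, and the types are repeated only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type SizeUnit string

type ObjectSize struct {
	NumberOfUnits int
	Unit          SizeUnit
}

// sizeRe is compiled once, when the package is initialised.
// MustCompile panics on an invalid pattern, which is acceptable
// for a constant pattern like this one.
var sizeRe = regexp.MustCompile(`^([0-9]+)([KMGT]?B)$`)

func NewObjectSizeFromString(input string) (*ObjectSize, error) {
	m := sizeRe.FindStringSubmatch(input)
	if m == nil {
		return nil, fmt.Errorf("%q is not in a valid format", input)
	}
	n, err := strconv.Atoi(m[1]) // fails only if the number is out of range
	if err != nil {
		return nil, err
	}
	return &ObjectSize{NumberOfUnits: n, Unit: SizeUnit(m[2])}, nil
}

func main() {
	size, err := NewObjectSizeFromString("50KB")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(size.NumberOfUnits, size.Unit)
}
```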

Selecting text between borders using regexp in Go [duplicate]

content := `{null,"Age":24,"Balance":33.23}`
rule,_ := regexp.Compile(`"([^\"]+)"`)
results := rule.FindAllString(content,-1)
fmt.Println(results[0]) //"Age"
fmt.Println(results[1]) //"Balance"
There is a JSON string containing a ``null`` value, as shown above.
This JSON comes from a web API and I don't want to replace anything inside it.
I want to use a regex to match all the keys in this JSON without the surrounding double quotes, so that the output is ``Age`` and ``Balance`` rather than ``"Age"`` and ``"Balance"``.
How can I achieve this?
One solution is to use a regular expression that matches any characters between quotes (such as your example, or ".*?"). Then either put a capturing group (a "submatch") inside the quotes and call Regexp.FindAllStringSubmatch(...), or call Regexp.FindAllString(...) and take the relevant substring of each match.
For example (Go Playground):
func main() {
str := `{null,"Age":24,"Balance":33.23}`
fmt.Printf("OK1: %#v\n", getQuotedStrings1(str))
// OK1: []string{"Age", "Balance"}
fmt.Printf("OK2: %#v\n", getQuotedStrings2(str))
// OK2: []string{"Age", "Balance"}
}
var re1 = regexp.MustCompile(`"(.*?)"`) // Note the matching group (submatch).
func getQuotedStrings1(s string) []string {
ms := re1.FindAllStringSubmatch(s, -1)
ss := make([]string, len(ms))
for i, m := range ms {
ss[i] = m[1]
}
return ss
}
var re2 = regexp.MustCompile(`".*?"`)
func getQuotedStrings2(s string) []string {
ms := re2.FindAllString(s, -1)
ss := make([]string, len(ms))
for i, m := range ms {
ss[i] = m[1 : len(m)-1] // Note the substring of the match.
}
return ss
}
Note that the second version (without a submatching group) may be slightly faster based on a simple benchmark, if performance is critical.
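Both versions above return every quoted string, which would include quoted string values as well as keys. If only keys are wanted, one option is to require a colon after the closing quote. This is a sketch under the assumption that keys contain no escaped quotes; the names keyRe and quotedKeys, and the extra "Name" field in the sample input, are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyRe matches a quoted string followed by a colon and captures
// the text between the quotes.
var keyRe = regexp.MustCompile(`"([^"]+)"\s*:`)

// quotedKeys (hypothetical helper) returns the captured key names.
func quotedKeys(s string) []string {
	var keys []string
	for _, m := range keyRe.FindAllStringSubmatch(s, -1) {
		keys = append(keys, m[1])
	}
	return keys
}

func main() {
	// "Bob" is a quoted value, not a key, so it is skipped.
	fmt.Println(quotedKeys(`{null,"Age":24,"Name":"Bob","Balance":33.23}`))
}
```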

Golang Regex: FindAllStringSubmatch to []string

I download a multiline file from Amazon S3 in a format like:
ColumnAv1 ColumnBv1 ColumnCv1 ...
ColumnAv2 ColumnBv2 ColumnCv2 ...
The file is of type []byte. Then I want to parse it with a regex:
matches := re.FindAllSubmatch(file,-1)
Then I want to feed the result row by row to a function that takes []string as input (string[0] is ColumnAv1, string[1] is ColumnBv1, ...).
How should I convert the [][][]byte result into a []string for the first row, the second row, and so on? I suppose I should do it in a loop, but I cannot get this working:
for i := 0; i < len(matches); i++ {
tmp:=myfunction(???)
}
BTW, why does FindAllSubmatch return [][][]byte whereas FindAllStringSubmatch returns [][]string?
(Sorry, I don't have access to my real example right now, so the syntax may not be correct.)
It's all explained extensively in the package's documentation.
Read the paragraph which explains:
There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
In your case, you probably want to use FindAllStringSubmatch.
In Go, a string is just a read-only []byte.
You can choose to either keep passing []byte variables around,
or convert the []byte values to string:
var byteSlice = []byte{'F','o','o'}
var str string
str = string(byteSlice)
You can simply iterate through the bytes result as you would for the strings result, using two nested loops, and convert each slice of bytes to a string in the inner loop:
package main
import "fmt"
func main() {
f := [][][]byte{{{'a', 'b', 'c'}}}
for _, line := range f {
for _, match := range line { // match is a type of []byte
fmt.Println(string(match))
}
}
}
Playground
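Applied to the question, the same conversion can build one []string per row. The sketch below is only an illustration: the pattern rowRe (three space- or tab-separated columns per line) and the helper name rowsAsStrings are assumptions, since the original regex was not shown. The extra dimension in [][][]byte arises because each submatch is itself a []byte, where the String variant has a string.

```go
package main

import (
	"fmt"
	"regexp"
)

// rowRe is an assumed pattern: three columns per line,
// separated by spaces or tabs.
var rowRe = regexp.MustCompile(`(?m)^(\S+)[ \t]+(\S+)[ \t]+(\S+)$`)

// rowsAsStrings converts each match's submatches to a []string,
// dropping index 0 (the whole match).
func rowsAsStrings(file []byte) [][]string {
	var rows [][]string
	for _, match := range rowRe.FindAllSubmatch(file, -1) {
		row := make([]string, 0, len(match)-1)
		for _, col := range match[1:] {
			row = append(row, string(col))
		}
		rows = append(rows, row)
	}
	return rows
}

func main() {
	file := []byte("ColumnAv1 ColumnBv1 ColumnCv1\nColumnAv2 ColumnBv2 ColumnCv2\n")
	for _, row := range rowsAsStrings(file) {
		fmt.Println(row) // each row is a []string that can be passed on
	}
}
```

Alternatively, converting the input once with string(file) and calling FindAllStringSubmatch yields [][]string directly, at the cost of one copy of the data.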

Split string using regular expression in Go

I'm trying to find a good way to split a string using a regular expression instead of a string. Thanks
http://nsf.github.io/go/strings.html?f:Split!
You can use regexp.Split to split a string into a slice of strings with the regex pattern as the delimiter.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile("[0-9]+")
txt := "Have9834a908123great10891819081day!"
split := re.Split(txt, -1)
set := []string{}
for i := range split {
set = append(set, split[i])
}
fmt.Println(set) // [Have a great day!]
}
I made a regex-split function based on the behavior of the regex split functions in Java, C#, PHP, and so on. It returns only an array of strings, without the index information.
func RegSplit(text string, delimeter string) []string {
reg := regexp.MustCompile(delimeter)
indexes := reg.FindAllStringIndex(text, -1)
laststart := 0
result := make([]string, len(indexes) + 1)
for i, element := range indexes {
result[i] = text[laststart:element[0]]
laststart = element[1]
}
result[len(indexes)] = text[laststart:len(text)]
return result
}
example:
fmt.Println(RegSplit("a1b22c333d", "[0-9]+"))
result:
[a b c d]
If you just want to split on certain characters, you can use strings.FieldsFunc; otherwise I'd go with regexp.FindAllString.
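As a rough illustration of the strings.FieldsFunc suggestion, the sketch below splits around runs of digits without any regex. The helper name splitOnDigits is made up. Note two differences from a [0-9]+ regex split: unicode.IsDigit also matches non-ASCII decimal digits, and FieldsFunc never returns empty strings.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnDigits (hypothetical helper) splits s around runs of digits.
// FieldsFunc treats every rune for which the function returns true
// as a separator and drops empty fields.
func splitOnDigits(s string) []string {
	return strings.FieldsFunc(s, unicode.IsDigit)
}

func main() {
	fmt.Printf("%q\n", splitOnDigits("Have9834a908123great10891819081day!"))
}
```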
The regexp.Split() function would be the best way to do this.
You should be able to create your own split function that loops over the results of Regexp.FindAllString, placing the intervening substrings into a new slice.
http://nsf.github.com/go/regexp.html?m:Regexp.FindAllString!
I found this old post while looking for an answer. I'm new to Go but these answers seem overly complex for the current version of Go. The simple function below returns the same result as those above.
package main
import (
"fmt"
"regexp"
)
func goReSplit(text string, pattern string) []string {
regex := regexp.MustCompile(pattern)
result := regex.Split(text, -1)
return result
}
func main() {
fmt.Printf("%#v\n", goReSplit("Have9834a908123great10891819081day!", "[0-9]+"))
}
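The second argument to Split is worth a brief illustration, since the examples above always pass -1. In this sketch (the variable name digits is arbitrary), a negative n returns all pieces, a positive n returns at most n pieces with the unsplit remainder last, and delimiters at the very start or end of the text produce empty strings.

```go
package main

import (
	"fmt"
	"regexp"
)

var digits = regexp.MustCompile(`[0-9]+`)

func main() {
	text := "a1b22c333d"

	// n < 0: return all pieces.
	fmt.Printf("%q\n", digits.Split(text, -1))

	// n > 0: at most n pieces; the last one is the unsplit remainder.
	fmt.Printf("%q\n", digits.Split(text, 2))

	// Leading and trailing delimiters yield empty strings.
	fmt.Printf("%q\n", digits.Split("1a2", -1))
}
```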