Selecting text between borders using regexp in Go

content := `{null,"Age":24,"Balance":33.23}`
rule,_ := regexp.Compile(`"([^\"]+)"`)
results := rule.FindAllString(content,-1)
fmt.Println(results[0]) //"Age"
fmt.Println(results[1]) //"Balance"
There is a JSON string with a ``null`` value that looks like this.
This JSON comes from a web API and I don't want to replace anything inside it.
I want to use a regex to match all the keys in this JSON without their double quotes, so that the output is ``Age`` and ``Balance`` rather than ``"Age"`` and ``"Balance"``.
How can I achieve this?

One solution would be to use a regular expression that matches any character between quotes (such as your example or ".*?") and either put a matching group (aka "submatch") inside the quotes or return the relevant substring of the match, using regexp.FindAllStringSubmatch(...) or regexp.FindAllString(...), respectively.
For example (Go Playground):
package main
import (
"fmt"
"regexp"
)
func main() {
str := `{null,"Age":24,"Balance":33.23}`
fmt.Printf("OK1: %#v\n", getQuotedStrings1(str))
// OK1: []string{"Age", "Balance"}
fmt.Printf("OK2: %#v\n", getQuotedStrings2(str))
// OK2: []string{"Age", "Balance"}
}
var re1 = regexp.MustCompile(`"(.*?)"`) // Note the matching group (submatch).
func getQuotedStrings1(s string) []string {
ms := re1.FindAllStringSubmatch(s, -1)
ss := make([]string, len(ms))
for i, m := range ms {
ss[i] = m[1] // m[0] is the whole match, m[1] the first group
}
return ss
}
var re2 = regexp.MustCompile(`".*?"`)
func getQuotedStrings2(s string) []string {
ms := re2.FindAllString(s, -1)
ss := make([]string, len(ms))
for i, m := range ms {
ss[i] = m[1 : len(m)-1] // Note the substring of the match (quotes dropped).
}
return ss
}
Note that, based on a simple benchmark, the second version (without a capturing group) may be slightly faster, which may matter if performance is critical.

Related

How to extract different variables from a regex without compiling each expression

I have a struct representing sizes of computer objects. Objects of this struct are constructed from string values input by users; e.g. "50KB" would be tokenised into the int value 50 and the string value "KB".
type SizeUnit string
const (
B = "B"
KB = "KB"
MB = "MB"
GB = "GB"
TB = "TB"
)
type ObjectSize struct {
NumberOfUnits int
Unit SizeUnit
}
func NewObjectSizeFromString(input_str string) (*ObjectSize, error)
In the body of this function, I first check if the input value is in the valid format; i.e. any number of digits, followed by any one of "B", "KB", "MB", "GB" or "TB". I then extract the int and string components separately and return a pointer to a struct.
To do these three things, though, I have to compile a regex three times.
The first time to check the format of the input string:
rg, err := regexp.Compile(`^[0-9]+B$|KB$|MB$|GB$|TB$`)
And then compile again to fetch the int component:
rg, err := regexp.Compile(`^[0-9]+`)
rg.FindString(input_str)
And then compile again to fetch the string/units component:
rg, err := regexp.Compile(`B$|KB$|MB$|GB$|TB$`)
rg.FindString(input_str)
Is there any way to get the two components from the input string with a single regex compilation?
The full code can be found on the Go Playground.
I should point out that this is an academic question, as I'm experimenting with Go's regex library. For a simple use case of this sort, I would probably use a simple for loop to parse the input string.
You can capture both the values with a single expression using regexp.FindStringSubmatch:
func NewObjectSizeFromString(input_str string) (*ObjectSize, error) {
full_search_pattern := `^([0-9]+)([KMGT]?B)$`
rg, err := regexp.Compile(full_search_pattern)
if err != nil {
return nil, errors.New("could not compile search expression")
}
matched := rg.FindStringSubmatch(input_str)
if matched == nil {
return nil, errors.New("not in valid format")
}
// matched[1] holds the digits, matched[2] the unit.
i, err := strconv.ParseInt(matched[1], 10, 32)
if err != nil {
return nil, err // e.g. the number does not fit in 32 bits
}
return &ObjectSize{int(i), SizeUnit(matched[2])}, nil
}
See the playground.
The ^([0-9]+)([KMGT]?B)$ regex matches:
^ - start of string
([0-9]+) - Group 1 (this value will be held in matched[1]): one or more digits
([KMGT]?B) - Group 2 (it will be in matched[2]): an optional K, M, G or T, followed by B
$ - end of string.
Note that matched[0] will hold the whole match.
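Because the question is about avoiding repeated compilation, it may also help to note the common Go idiom of compiling a constant pattern once, in a package-level variable created with regexp.MustCompile, and reusing it on every call. The sketch below assumes the same types as the question (with Unit declared as SizeUnit) and swaps strconv.ParseInt for strconv.Atoi; sizeRe is an illustrative name.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

type SizeUnit string

type ObjectSize struct {
	NumberOfUnits int
	Unit          SizeUnit
}

// sizeRe is compiled once, at package initialisation, and reused by every
// call. MustCompile panics if the pattern is invalid.
var sizeRe = regexp.MustCompile(`^([0-9]+)([KMGT]?B)$`)

func NewObjectSizeFromString(s string) (*ObjectSize, error) {
	m := sizeRe.FindStringSubmatch(s)
	if m == nil {
		return nil, errors.New("not in valid format")
	}
	// Atoi returns an error if the digits overflow int.
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return nil, err
	}
	return &ObjectSize{NumberOfUnits: n, Unit: SizeUnit(m[2])}, nil
}

func main() {
	for _, in := range []string{"50KB", "7B", "50XB"} {
		size, err := NewObjectSizeFromString(in)
		if err != nil {
			fmt.Println(in, "->", err)
			continue
		}
		fmt.Println(in, "->", size.NumberOfUnits, size.Unit)
	}
}
```

MustCompile is generally appropriate only for patterns fixed in source code; a pattern supplied at run time should still go through regexp.Compile so the error can be handled.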

Regex extracting sets of numbers from string when prefix occurs, while not matching said prefix

As stated in the title, given a situation where I have a string like so:
"somestring~200~122"
I want a regex that matches the numbers when the prefix "~" occurs, so that I can end up with [200, 122].
Matching the prefix is necessary because a string like the one below should not be matched:
"somestring~abc200~def122"
For additional context: as stated in the title, I am using Go, so I plan to do something like the following to obtain the numbers within the string:
pattern := regexp.MustCompile("regex i need help with")
numbers := pattern.FindAllString(host, -1)
You can use FindAllStringSubmatch to extract the group containing just the digits. Below is an example that finds all instances of ~ followed by digits. It additionally converts all the matches to ints and inserts them into a slice:
package main
import (
"fmt"
"regexp"
"strconv"
)
func main() {
host := "somestring~200~122"
pattern := regexp.MustCompile(`~(\d+)`)
numberStrings := pattern.FindAllStringSubmatch(host, -1)
numbers := make([]int, len(numberStrings))
for i, numberString := range numberStrings {
number, err := strconv.Atoi(numberString[1])
if err != nil {
panic(err)
}
numbers[i] = number
}
fmt.Println(numbers)
}
https://play.golang.org/p/09YyewtRXz

Golang regular expression for parsing key value pair into a string map

I'm looking to parse the following string into a map[string]string using a regular expression:
time="2017-05-30T19:02:08-05:00" level=info msg="some log message" app=sample size=10
I'm trying to create a map that would have
m["time"] = "2017-05-30T19:02:08-05:00"
m["level"] = "info"
etc
I have tried using regexp.FindAllStringIndex but can't quite come up with an appropriate regex. Is this the correct way to go?
This doesn't use a regex; it is just an example of achieving the same result with strings.FieldsFunc.
https://play.golang.org/p/rr6U8xTJZT
package main
import (
"fmt"
"strings"
"unicode"
)
const foo = `time="2017-05-30T19:02:08-05:00" level=info msg="some log message" app=sample size=10`
func main() {
lastQuote := rune(0)
f := func(c rune) bool {
switch {
case c == lastQuote:
lastQuote = rune(0)
return false
case lastQuote != rune(0):
return false
case unicode.In(c, unicode.Quotation_Mark):
lastQuote = c
return false
default:
return unicode.IsSpace(c)
}
}
// splitting string by space but considering quoted section
items := strings.FieldsFunc(foo, f)
// create and fill the map
m := make(map[string]string)
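for _, item := range items {
// SplitN keeps any "=" inside the value; skip items that have no "=".
x := strings.SplitN(item, "=", 2)
if len(x) != 2 {
continue
}
// Drop the surrounding double quotes from quoted values.
m[x[0]] = strings.Trim(x[1], `"`)
}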
// print the map
for k, v := range m {
fmt.Printf("%s: %s\n", k, v)
}
}
Instead of writing your own regex, you could simply use the github.com/kr/logfmt package.
The package implements decoding of logfmt key-value pairs.
Example logfmt message:
foo=bar a=14 baz="hello kitty" cool%story=bro f %^asdf
Example result in JSON:
{
"foo": "bar",
"a": 14,
"baz": "hello kitty",
"cool%story": "bro",
"f": true,
"%^asdf": true
}
Use named capturing groups in your regular expression together with the FindStringSubmatch and SubexpNames methods. E.g.:
s := `time="2017-05-30T19:02:08-05:00" level=info msg="some log message" app=sample size=10`
re := regexp.MustCompile(`time="(?P<time>.*?)"\slevel=(?P<level>.*?)\s`)
values := re.FindStringSubmatch(s)
keys := re.SubexpNames()
// create map; values is nil if the regexp does not match
d := make(map[string]string)
if values != nil {
for i := 1; i < len(keys); i++ {
d[keys[i]] = values[i]
}
}
fmt.Println(d)
// OUTPUT: map[level:info time:2017-05-30T19:02:08-05:00]
values is a slice containing all submatches: the first element is the text matched by the whole expression, followed by one submatch for each capturing group.
You can wrap the code in a function if you need it more often (i.e. if you need something like Python's match.groupdict()):
package main
import (
"fmt"
"regexp"
)
func groupmap(s string, r *regexp.Regexp) map[string]string {
values := r.FindStringSubmatch(s)
keys := r.SubexpNames()
// create map; it stays empty if the regexp does not match
d := make(map[string]string)
if values == nil {
return d
}
for i := 1; i < len(keys); i++ {
d[keys[i]] = values[i]
}
return d
}
func main() {
s := `time="2017-05-30T19:02:08-05:00" level=info msg="some log message" app=sample size=10`
re := regexp.MustCompile(`time="(?P<time>.*?)"\slevel=(?P<level>.*?)\s`)
fmt.Println(groupmap(s, re))
// OUTPUT: map[level:info time:2017-05-30T19:02:08-05:00]
}

Split string using regular expression in Go

I'm trying to find a good way to split a string using a regular expression as the separator instead of a fixed string.
http://nsf.github.io/go/strings.html?f:Split!
You can use regexp.Split to split a string into a slice of strings with the regex pattern as the delimiter.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile("[0-9]+")
txt := "Have9834a908123great10891819081day!"
split := re.Split(txt, -1)
fmt.Println(split) // [Have a great day!]
}
I made a regex split function modeled on the behavior of the regex split functions in Java, C#, PHP, etc. It returns only a slice of strings, without the index information.
func RegSplit(text string, delimiter string) []string {
reg := regexp.MustCompile(delimiter)
indexes := reg.FindAllStringIndex(text, -1)
laststart := 0
result := make([]string, len(indexes)+1)
for i, element := range indexes {
// element[0] and element[1] are the start and end of the i-th delimiter match
result[i] = text[laststart:element[0]]
laststart = element[1]
}
result[len(indexes)] = text[laststart:]
return result
}
example:
fmt.Println(RegSplit("a1b22c333d", "[0-9]+"))
result:
[a b c d]
If you just want to split on certain characters, you can use strings.FieldsFunc; otherwise I'd go with regexp.FindAllString.
The (*Regexp).Split method would be the best way to do this.
You should be able to create your own split function that loops over the results of Regexp.FindAllStringIndex, placing the intervening substrings into a new slice.
http://nsf.github.com/go/regexp.html?m:Regexp.FindAllString!
I found this old post while looking for an answer. I'm new to Go, but these answers seem overly complex for the current version of Go. The simple function below returns the same result as those above.
package main
import (
"fmt"
"regexp"
)
func goReSplit(text string, pattern string) []string {
regex := regexp.MustCompile(pattern)
result := regex.Split(text, -1)
return result
}
func main() {
fmt.Printf("%#v\n", goReSplit("Have9834a908123great10891819081day!", "[0-9]+"))
}

How do you replace a character in Go using the Regexp package ReplaceAll function?

I am not familiar with C-like syntax and would like to write code to find and replace, say, all 'A's with 'B's in a source string such as 'ABBA', using the regexp package's ReplaceAll or ReplaceAllString functions. How do I set up the Regexp, src and repl arguments? Here's the ReplaceAll code snippet from the Go documentation:
// ReplaceAll returns a copy of src in which all matches for the Regexp
// have been replaced by repl. No support is provided for expressions
// (e.g. \1 or $1) in the replacement text.
func (re *Regexp) ReplaceAll(src, repl []byte) []byte {
lastMatchEnd := 0; // end position of the most recent match
searchPos := 0; // position where we next look for a match
buf := new(bytes.Buffer);
for searchPos <= len(src) {
a := re.doExecute("", src, searchPos);
if len(a) == 0 {
break // no more matches
}
// Copy the unmatched characters before this match.
buf.Write(src[lastMatchEnd:a[0]]);
// Now insert a copy of the replacement string, but not for a
// match of the empty string immediately after another match.
// (Otherwise, we get double replacement for patterns that
// match both empty and nonempty strings.)
if a[1] > lastMatchEnd || a[0] == 0 {
buf.Write(repl)
}
lastMatchEnd = a[1];
// Advance past this match; always advance at least one character.
_, width := utf8.DecodeRune(src[searchPos:len(src)]);
if searchPos+width > a[1] {
searchPos += width
} else if searchPos+1 > a[1] {
// This clause is only needed at the end of the input
// string. In that case, DecodeRuneInString returns width=0.
searchPos++
} else {
searchPos = a[1]
}
}
// Copy the unmatched characters after the last match.
buf.Write(src[lastMatchEnd:len(src)]);
return buf.Bytes();
}
This is a routine to do what you want:
package main
import (
"fmt"
"os"
"regexp"
)
func main() {
reg, err := regexp.Compile("A")
if err != nil {
fmt.Printf("Compile failed: %s\n", err)
os.Exit(1)
}
// ReplaceAll works on byte slices, so convert the input and the result.
output := string(reg.ReplaceAll([]byte("ABBA"), []byte("B")))
fmt.Println(output) // BBBB
}
Here is a small example. You can also find good examples in the regexp package's test files.
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile("e") // MustCompile panics on an invalid pattern
input := "hello"
replacement := "a"
actual := string(re.ReplaceAll([]byte(input), []byte(replacement)))
fmt.Printf("new pattern %s\n", actual) // new pattern hallo
}