Issues with regex: nil slice and FindStringSubmatch

I'm trying to match this pattern with a regex:
random-text (800)
I'm doing something like this:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	rando := "random-text (800)"
	parsedThing := regexp.MustCompile(`\((.*?)\)`)
	match := parsedThing.FindStringSubmatch(rando)
	// match is nil when nothing matches, so match[1] panics in that case.
	if match[1] == "" {
		fmt.Println("do a thing")
	}
	if match[1] != "" {
		fmt.Println("do a thing")
	}
}
I only want to capture what's inside the parentheses, but FindString returns the parentheses too. I've also tried FindStringSubmatch, which is great because I can pick the capture group out of the slice, but then my unit test fails because the slice is nil. I need to test for an empty string, since that can happen. Is there a better regex I can use that captures only what's inside the parentheses? Or is there a better way to handle a nil slice?

I usually compare against nil, based on the documentation:
A return value of nil indicates no match.
package main
import (
	"fmt"
	"regexp"
)
func main() {
	re := regexp.MustCompile(`\((.+)\)`)
	find := re.FindStringSubmatch("random-text (800)")
	// FindStringSubmatch returns nil when there is no match; check before indexing.
	if find != nil {
		fmt.Println(find[1] == "800")
	}
}
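Note that `.+` does not match empty parentheses at all, so `()` would also come back as nil. If an empty capture is a legitimate case, as the question says, a minimal sketch (assuming the text inside the parentheses never contains `)`) is to use `[^)]*` and check for nil before indexing. The helper name `inParens` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// parenRe captures everything between "(" and the next ")"; the capture may be empty.
var parenRe = regexp.MustCompile(`\(([^)]*)\)`)

// inParens (hypothetical helper) returns the text inside the first pair of
// parentheses and whether a pair was found at all.
func inParens(s string) (string, bool) {
	m := parenRe.FindStringSubmatch(s)
	if m == nil {
		// nil means no match; indexing m[1] here would panic.
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{"random-text (800)", "random-text ()", "random-text"} {
		inner, ok := inParens(s)
		switch {
		case !ok:
			fmt.Printf("%q: no parentheses\n", s)
		case inner == "":
			fmt.Printf("%q: empty parentheses\n", s)
		default:
			fmt.Printf("%q: captured %q\n", s, inner)
		}
	}
}
```

Returning a separate boolean keeps "no match" and "matched an empty string" distinguishable, which is exactly what a unit test for the empty case needs.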


Is it possible to match a string with two equal parts and a separator?

I'm trying to come up with a regular expression that would allow me to match strings that have equal parts and a separator between them. For example:
foo;foo <- match
foobar;foobar <- match
foo;foobar <- no match
foo;bar <- no match
This could easily be done in PCRE by using a positive look-ahead assertion:
([^;]+);(?=\1$)
The problem is that I need this for a program written in Go, using the RE2 library, which doesn't support look-around assertions. I cannot change the code; I can only feed it regex strings.
I am afraid the problem cannot be solved with a regex alone. So I have two solutions for you.
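One way to confirm that a single RE2 pattern cannot do this: Go's regexp package (RE2 syntax) supports neither look-around nor backreferences such as \1, so both the PCRE pattern from the question and the simpler backreference form are rejected at compile time. A small sketch; `compiles` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-based regexp package accepts the pattern.
func compiles(p string) bool {
	_, err := regexp.Compile(p)
	return err == nil
}

func main() {
	for _, p := range []string{
		`([^;]+);(?=\1$)`, // look-ahead from the question
		`^([^;]+);\1$`,    // plain backreference
	} {
		if _, err := regexp.Compile(p); err != nil {
			fmt.Println(p, "->", err)
		}
	}
	fmt.Println(compiles(`^([^;]+);([^;]+)$`)) // two independent groups are fine
}
```

Capturing the two sides with independent groups is allowed, which is why both solutions compare the captured parts in Go code instead of inside the regex.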
Solution 1 (using regex)
NOTE: This solution works only if the string contains exactly one separator.
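package main
import (
	"fmt"
	"regexp"
)
func regexMatch(str string) bool {
	pattern1 := regexp.MustCompile(`^([^;]+);`) // text before the first ';'
	pattern2 := regexp.MustCompile(`;([^;]+)$`) // text after the last ';'
	match1 := pattern1.FindStringSubmatch(str)
	match2 := pattern2.FindStringSubmatch(str)
	// nil means one side is missing (e.g. no ';' at all), so it is not a match.
	if match1 == nil || match2 == nil {
		return false
	}
	// Compare the captured groups, without the separator.
	return match1[1] == match2[1]
}
func main() {
	fmt.Println(regexMatch("foo;foo"))       // true
	fmt.Println(regexMatch("foobar;foobar")) // true
	fmt.Println(regexMatch("foo;foobar"))    // false
	fmt.Println(regexMatch("foo;bar"))       // false
}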
Solution 2 (using split)
This solution is more compact, and if there can be more than one separator you can easily change the logic.
package main
import (
	"fmt"
	"strings"
)
func splitMatch(str string) bool {
	matches := strings.Split(str, ";")
	// Exactly one separator means exactly two parts.
	if len(matches) != 2 {
		return false
	}
	return matches[0] == matches[1]
}
func main() {
	fmt.Println(splitMatch("foo;foo"))       // true
	fmt.Println(splitMatch("foobar;foobar")) // true
	fmt.Println(splitMatch("foo;foobar"))    // false
	fmt.Println(splitMatch("foo;bar"))       // false
}

Why is this regular expression in Golang non-greedy? [duplicate]

This question already has answers here:
Lazy quantifier {,}? not working as I would expect
Here is a simple regular expression:
package main
import (
	"fmt"
	"regexp"
)
const data = "abcdefghijklmn"
func main() {
	r, err := regexp.Compile(".{1,6}")
	if err != nil {
		panic(err)
	}
	for _, d := range r.FindAllIndex([]byte(data), -1) {
		fmt.Println(data[d[0]:d[1]])
	}
}
And we know it is greedy:
abcdef
ghijkl
mn
Now we can add a ? after the quantifier to make it non-greedy:
package main
import (
	"fmt"
	"regexp"
)
const data = "abcdefghijklmn"
func main() {
	r, err := regexp.Compile(".{1,6}?")
	if err != nil {
		panic(err)
	}
	for _, d := range r.FindAllIndex([]byte(data), -1) {
		fmt.Println(data[d[0]:d[1]])
	}
}
And we can get:
a
b
c
d
e
f
g
h
i
j
k
l
m
n
However, if we add other chars after the expression, it becomes greedy:
package main
import (
	"fmt"
	"regexp"
)
const data = "abcdefghijklmn"
func main() {
	r, err := regexp.Compile(".{1,6}?k")
	if err != nil {
		panic(err)
	}
	for _, d := range r.FindAllIndex([]byte(data), -1) {
		fmt.Println(data[d[0]:d[1]])
	}
}
And we get:
efghijk
So why does it become greedy when we add a character after it?
Adding ? after a repetition count makes it lazy: it changes from matching as many characters as possible to matching as few as possible.
However, this does not change the fact that the engine scans the string from left to right and returns the leftmost match. This is where your two cases differ:
.{1,6}? returns one character at a time because, at each starting position, a single character is already enough for a match, so the engine has no reason to consume more.
.{1,6}?k finds no match starting at a, b, c or d, because k is too far away from each of them. Starting at e, it matches efghijk. Being lazy does not let the engine skip to a later starting position (for example j, which would give the shorter jk) just to find a shorter match.
In short: a match at the earliest possible position takes precedence over a shorter match at a later position.
As for your question about making it lazy again, you can't. You'll have to find a different regular expression for the output you want.
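The leftmost-first behaviour can be checked directly: a sketch that prints where the first match starts for the lazy pattern and for its greedy counterpart. `firstMatch` is a hypothetical helper built on Regexp.FindStringIndex.

```go
package main

import (
	"fmt"
	"regexp"
)

const data = "abcdefghijklmn"

// firstMatch returns the leftmost match of pattern in data and its start offset
// (or "", -1 if there is none).
func firstMatch(pattern string) (string, int) {
	re := regexp.MustCompile(pattern)
	loc := re.FindStringIndex(data)
	if loc == nil {
		return "", -1
	}
	return data[loc[0]:loc[1]], loc[0]
}

func main() {
	// Lazy and greedy agree here: starting at offset 4 ("e"), there is only one
	// way to reach "k" within 1..6 characters, and no earlier start works.
	for _, p := range []string{".{1,6}?k", ".{1,6}k"} {
		m, start := firstMatch(p)
		fmt.Printf("%-9s start=%d match=%q\n", p, start, m)
	}
}
```

Because both versions return the same match, the lazy quantifier is not "becoming greedy"; it only decides how much to consume from a given start position, never which start position wins.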

IPv4 regexp capturing the incorrect parts of the address [duplicate]

This question already has answers here:
Match exact string
I'm trying to write a program that prints the invalid part or parts of an IPv4 address from terminal input.
Here is my code:
package chapter4
import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
	"time"
)
func IPV4() {
	var f *os.File
	f = os.Stdin
	defer f.Close()
	scanner := bufio.NewScanner(f)
	fmt.Println("Exercise 1, Chapter 4 - Detecting incorrect parts of IPv4 Addresses, enter an address!")
	for scanner.Scan() {
		if scanner.Text() == "STOP" {
			fmt.Println("Initializing Level 4...")
			time.Sleep(5 * time.Second)
			break
		}
		expression := "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
		matchMe, err := regexp.Compile(expression)
		if err != nil {
			fmt.Println("Could not compile!", err)
			return // matchMe is nil here, so stop instead of using it
		}
		s := strings.Split(scanner.Text(), ".")
		for _, value := range s {
			fmt.Println(value)
			str := matchMe.FindString(value)
			if len(str) == 0 {
				fmt.Println(value)
			}
		}
	}
}
My thought process is that for every IP address entered at the terminal, I split the string on '.'.
Then I iterate over the resulting []string and match each value against the regular expression.
For some reason, the only case where the expression doesn't match is when the input contains letters. Every number, no matter its size or composition, matches my expression.
I'm hoping you can help me identify the problem, and if there's a better way to do it, I'm all ears. Thanks!
This expression may be closer to what you have in mind:
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
Test
package main
import (
	"fmt"
	"regexp"
)
func main() {
	// (?m) makes ^ and $ match at the start and end of each line of str.
	var re = regexp.MustCompile(`(?m)^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$`)
	var str = `127.0.0.1
192.168.1.1
192.168.1.255
255.255.255.255
0.0.0.0
1.1.1.01
30.168.1.255.1
127.1
192.168.1.256
-1.2.3.4
3...3`
	// i counts matches; it is not a position in str.
	for i, match := range re.FindAllString(str, -1) {
		fmt.Println(match, "is match number", i)
	}
}
regex101.com explains the expression in its top-right panel, if you wish to explore, simplify or modify it, and also lets you watch how it matches against sample inputs.
Reference:
Validating IPv4 addresses with regexp
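I am pretty sure your expression needs anchors. Without them, FindString looks for the pattern anywhere inside the value, so the last alternative, [1-9]?[0-9], matches one or two digits of any number (for example "30" inside "300") and succeeds. Try adding ^ at the front and $ at the end.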
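Following that suggestion, a minimal sketch that keeps the question's per-octet approach and only adds the anchors. `invalidOctets` is a hypothetical helper name, and the sketch does not check that there are exactly four parts.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// octetRe is the question's expression wrapped in ^...$ so it must match the
// whole octet rather than any substring of it.
var octetRe = regexp.MustCompile(`^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$`)

// invalidOctets (hypothetical helper) splits addr on "." and returns the parts
// that are not decimal numbers in 0..255 written without leading zeros.
func invalidOctets(addr string) []string {
	var bad []string
	for _, part := range strings.Split(addr, ".") {
		if !octetRe.MatchString(part) {
			bad = append(bad, part)
		}
	}
	return bad
}

func main() {
	for _, addr := range []string{"192.168.1.1", "192.168.1.256", "10.0.a.300", "1.1.1.01"} {
		fmt.Printf("%s -> invalid parts: %q\n", addr, invalidOctets(addr))
	}
}
```

With the anchors in place, values such as 256, 300 and 01 are reported as invalid instead of partially matching.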

Split string using regular expression in Go

I'm trying to find a good way to split a string using a regular expression instead of a fixed string. Thanks!
http://nsf.github.io/go/strings.html?f:Split!
You can use the Regexp.Split method to split a string into a slice of strings, with the regex pattern as the delimiter.
package main
import (
	"fmt"
	"regexp"
)
func main() {
	re := regexp.MustCompile("[0-9]+")
	txt := "Have9834a908123great10891819081day!"
	// n = -1 returns all substrings between matches.
	split := re.Split(txt, -1)
	fmt.Println(split) // [Have a great day!]
}
I made a regex-split function based on the behavior of the regex split functions in Java, C#, PHP, etc. It returns only a slice of strings, without the index information.
func RegSplit(text string, delimiter string) []string {
	reg := regexp.MustCompile(delimiter)
	// Start and end offsets of every delimiter match.
	indexes := reg.FindAllStringIndex(text, -1)
	laststart := 0
	result := make([]string, len(indexes)+1)
	for i, element := range indexes {
		// Text between the previous delimiter and this one.
		result[i] = text[laststart:element[0]]
		laststart = element[1]
	}
	// Whatever follows the last delimiter.
	result[len(indexes)] = text[laststart:]
	return result
}
example:
fmt.Println(RegSplit("a1b22c333d", "[0-9]+"))
result:
[a b c d]
If you just want to split on certain characters, you can use strings.FieldsFunc; otherwise, I'd go with Regexp.FindAllString.
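For the digit example used in this thread, a sketch of the strings.FieldsFunc route, with unicode.IsDigit as the "split here" predicate. `splitOnDigits` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitOnDigits splits s around runs of digits without using regexp.
// Unlike Regexp.Split, strings.FieldsFunc never returns empty fields, so
// leading or trailing digits do not produce "" entries.
func splitOnDigits(s string) []string {
	return strings.FieldsFunc(s, unicode.IsDigit)
}

func main() {
	fmt.Printf("%q\n", splitOnDigits("Have9834a908123great10891819081day!"))
}
```

The difference in empty-field handling is the main thing to check before swapping one approach for the other.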
The Regexp.Split method would be the best way to do this.
You should be able to write your own split function that loops over the results of Regexp.FindAllString (or Regexp.FindAllStringIndex, which also reports where each match is), placing the intervening substrings into a new slice.
http://nsf.github.com/go/regexp.html?m:Regexp.FindAllString!
I found this old post while looking for an answer. I'm new to Go but these answers seem overly complex for the current version of Go. The simple function below returns the same result as those above.
package main
import (
	"fmt"
	"regexp"
)
func goReSplit(text string, pattern string) []string {
	regex := regexp.MustCompile(pattern)
	result := regex.Split(text, -1)
	return result
}
func main() {
	// Prints []string{"Have", "a", "great", "day!"}
	fmt.Printf("%#v\n", goReSplit("Have9834a908123great10891819081day!", "[0-9]+"))
}

How do you replace a character in Go using the Regexp package ReplaceAll function?

I am not familiar with C-like syntax and would like to write code to find and replace, say, all 'A's with 'B's in a source string such as 'ABBA', using the Regexp package's ReplaceAll or ReplaceAllString functions. How do I set up the Regexp, src and repl? Here's the ReplaceAll code snippet from the Go documentation:
// ReplaceAll returns a copy of src in which all matches for the Regexp
// have been replaced by repl. No support is provided for expressions
// (e.g. \1 or $1) in the replacement text.
func (re *Regexp) ReplaceAll(src, repl []byte) []byte {
lastMatchEnd := 0; // end position of the most recent match
searchPos := 0; // position where we next look for a match
buf := new(bytes.Buffer);
for searchPos <= len(src) {
a := re.doExecute("", src, searchPos);
if len(a) == 0 {
break // no more matches
}
// Copy the unmatched characters before this match.
buf.Write(src[lastMatchEnd:a[0]]);
// Now insert a copy of the replacement string, but not for a
// match of the empty string immediately after another match.
// (Otherwise, we get double replacement for patterns that
// match both empty and nonempty strings.)
if a[1] > lastMatchEnd || a[0] == 0 {
buf.Write(repl)
}
lastMatchEnd = a[1];
// Advance past this match; always advance at least one character.
_, width := utf8.DecodeRune(src[searchPos:len(src)]);
if searchPos+width > a[1] {
searchPos += width
} else if searchPos+1 > a[1] {
// This clause is only needed at the end of the input
// string. In that case, DecodeRuneInString returns width=0.
searchPos++
} else {
searchPos = a[1]
}
}
// Copy the unmatched characters after the last match.
buf.Write(src[lastMatchEnd:len(src)]);
return buf.Bytes();
}
This is a routine to do what you want:
package main
import (
	"fmt"
	"os"
	"regexp"
)
func main() {
	// Pattern to find: every "A".
	reg, err := regexp.Compile("A")
	if err != nil {
		fmt.Printf("Compile failed: %s\n", err)
		os.Exit(1)
	}
	// Replace each match with "B"; ReplaceAll works on []byte.
	output := string(reg.ReplaceAll([]byte("ABBA"), []byte("B")))
	fmt.Println(output) // BBBB
}
Here is a small example. You can also find good examples in the regexp package's tests.
package main
import (
	"fmt"
	"regexp"
)
func main() {
	re := regexp.MustCompile("e")
	input := "hello"
	replacement := "a"
	// ReplaceAllString works on strings directly, so no []byte conversion is needed.
	actual := re.ReplaceAllString(input, replacement)
	fmt.Printf("new pattern %s\n", actual) // new pattern hallo
}