What regex would match all values after =, when the pairs are space-separated?

I have /components/component[name=fan/10 index=55]/cpu
I want a regex that gives me fan/10 and 55.
I tried something like =(.*)\s, but it doesn't work. I'm guessing it has to be done with capturing groups (the parentheses) somehow?

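Your =(.*)\s only finds fan/10: it needs whitespace after each value, but 55 is followed by ], not whitespace. Besides, .* is greedy, so with more space-separated pairs it would run up to the last space and swallow several pairs. Use a negated character class instead: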
=([^\]\s]+)
Details
= - an equals sign
([^\]\s]+) - Capturing group 1: one or more characters other than ] and whitespace.
Go demo:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	s := "/components/component[name=fan/10 index=55]/cpu"
	rx := regexp.MustCompile(`=([^\]\s]+)`)
	matches := rx.FindAllStringSubmatch(s, -1)
	for _, v := range matches {
		fmt.Println(v[1]) // v[0] is the whole match, v[1] is group 1
	}
}
Output:
fan/10
55
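If you also need to know which key each value belongs to, a small variation captures the key in a second group. This is only a sketch: it assumes keys are word characters (\w+) and that values never contain ] or whitespace, and the helper name keyValues is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// kvRe captures the key (group 1) and the value (group 2) of each key=value
// pair; a value stops at whitespace or a closing bracket.
var kvRe = regexp.MustCompile(`(\w+)=([^\]\s]+)`)

// keyValues is a hypothetical helper that collects every key=value pair in s.
func keyValues(s string) map[string]string {
	out := make(map[string]string)
	for _, m := range kvRe.FindAllStringSubmatch(s, -1) {
		out[m[1]] = m[2] // m[0] is the whole match, m[1] the key, m[2] the value
	}
	return out
}

func main() {
	kv := keyValues("/components/component[name=fan/10 index=55]/cpu")
	fmt.Println(kv["name"], kv["index"]) // fan/10 55
}
```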

You may try to use something like this:
s := "/components/component[name=fan/10 index=55]/cpu"
re := regexp.MustCompile(`=([^\s\]]*)`)
matches := re.FindAllStringSubmatch(s, -1)
fmt.Println(matches)
The result holds one [full match, group 1] pair per match:
[[=fan/10 fan/10] [=55 55]]

Related

Is it possible to match a string with two equal parts and a separator

I'm trying to come up with a regular expression that would allow me to match strings that have equal parts and a separator between them. For example:
foo;foo <- match
foobar;foobar <- match
foo;foobar <- no match
foo;bar <- no match
This could easily be done in PCRE with a backreference inside a positive lookahead assertion:
([^;]+);(?=\1$)
The problem is that I need this for a program written in Go, which uses the RE2 library and doesn't support lookaround assertions. I cannot change the code; I can only feed it regex strings.
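I am afraid the problem cannot be solved with regex alone. RE2 supports neither lookarounds nor backreferences such as \1, and matching two equal parts of arbitrary length requires a backreference, since strings of the form w;w do not form a regular language. So I have two solutions for you.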
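You can check this directly by asking Go's regexp package to compile the PCRE-style patterns: both are rejected at compile time, while a pattern with two independent groups is accepted. The equality check then has to happen in Go code. The helper name compiles is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-based regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	if err != nil {
		fmt.Println("rejected:", err)
		return false
	}
	return true
}

func main() {
	compiles(`([^;]+);(?=\1$)`)   // lookahead: not supported by RE2
	compiles(`^([^;]+);\1$`)      // backreference alone: not supported either
	compiles(`^([^;]+);([^;]+)$`) // two independent groups: accepted
}
```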
Solution 1 (using regex)
NOTE: This solution only works correctly if the string contains exactly one separator.
package main
import (
	"fmt"
	"regexp"
)
var (
	pattern1 = regexp.MustCompile(`^([^;]+);`) // text before the first ;
	pattern2 = regexp.MustCompile(`;([^;]+)$`) // text after the last ;
)
func regexMatch(str string) bool {
	m1 := pattern1.FindStringSubmatch(str)
	m2 := pattern2.FindStringSubmatch(str)
	if m1 == nil || m2 == nil {
		// No separator, or an empty part on either side.
		return false
	}
	return m1[1] == m2[1]
}
func main() {
	fmt.Println(regexMatch("foo;foo"))       // true
	fmt.Println(regexMatch("foobar;foobar")) // true
	fmt.Println(regexMatch("foo;foobar"))    // false
	fmt.Println(regexMatch("foo;bar"))       // false
}
Solution 2 (using split)
This solution is more compact, and if there can be more than one separator, the logic is easy to adapt.
package main
import (
	"fmt"
	"strings"
)
func splitMatch(str string) bool {
	parts := strings.Split(str, ";")
	if len(parts) != 2 {
		return false // no separator, or more than one
	}
	return parts[0] == parts[1]
}
func main() {
	fmt.Println(splitMatch("foo;foo"))       // true
	fmt.Println(splitMatch("foobar;foobar")) // true
	fmt.Println(splitMatch("foo;foobar"))    // false
	fmt.Println(splitMatch("foo;bar"))       // false
}
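As an example of adapting the split approach to several separators, here is a sketch that assumes the requirement becomes "every part equals the first one". The function name allPartsEqual is hypothetical, and note that it treats empty parts as equal (";" returns true).

```go
package main

import (
	"fmt"
	"strings"
)

// allPartsEqual reports whether str contains at least one sep and every
// sep-delimited part equals the first one.
func allPartsEqual(str, sep string) bool {
	parts := strings.Split(str, sep)
	if len(parts) < 2 {
		return false // no separator at all
	}
	for _, p := range parts[1:] {
		if p != parts[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allPartsEqual("foo;foo;foo", ";")) // true
	fmt.Println(allPartsEqual("foo;foo;bar", ";")) // false
	fmt.Println(allPartsEqual("foo", ";"))         // false
}
```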

Czech characters in regexp search

I am trying to implement a very simple text matcher for Czech words. Since Czech is very suffix-heavy, I want to define the start of the word and then greedily match the rest of the word. This is my implementation so far:
r := regexp.MustCompile("(?i)\\by\\w+\\b")
text := "x yž z"
matches := r.FindAllString(text, -1)
fmt.Println(matches) //have [], want [yž]
I studied Go's regexp syntax:
https://github.com/google/re2/wiki/Syntax
but I don't know how to specify Czech characters there. \w only matches ASCII characters, not Czech letters such as ž.
Can you please help me?
In RE2, both \w and \b are not Unicode-aware:
\b at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other)
\w word characters (== [0-9A-Za-z_])
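To see the difference in practice, compare the ASCII-only \w class with the Unicode letter class \p{L}, which RE2 does support. A minimal sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWord   = regexp.MustCompile(`^\w+$`)    // \w is ASCII-only: [0-9A-Za-z_]
	unicodeWord = regexp.MustCompile(`^\p{L}+$`) // \p{L} is any Unicode letter
)

func main() {
	for _, w := range []string{"yz", "yž"} {
		// prints "yz true true", then "yž false true"
		fmt.Println(w, asciiWord.MatchString(w), unicodeWord.MatchString(w))
	}
}
```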
A more general approach is to split on every run of one or more non-letter characters and then keep only the items that meet your criteria:
package main
import (
	"fmt"
	"regexp"
	"strings"
)
func main() {
	output := []string{}
	r := regexp.MustCompile(`\P{L}+`) // one or more non-letter characters
	str := "x--++yž,,,.z..00"
	words := r.Split(str, -1)
	for _, w := range words {
		// Keep words starting with y or Y (HasPrefix is false for empty strings).
		if strings.HasPrefix(w, "y") || strings.HasPrefix(w, "Y") {
			output = append(output, w)
		}
	}
	fmt.Println(output)
}
Note that a naive approach like
package main
import (
	"fmt"
	"regexp"
)
func main() {
	output := []string{}
	r := regexp.MustCompile(`(?i)(?:\P{L}|^)(y\p{L}*)(?:\P{L}|$)`)
	str := "x--++yž,,,.z..00..."
	matches := r.FindAllStringSubmatch(str, -1)
	for _, v := range matches {
		output = append(output, v[1])
	}
	fmt.Println(output)
}
won't work when matches are separated by a single non-letter character, as in match1,match2 match3: it only fetches every other match, because the trailing (?:\P{L}|$) consumes the character that the leading (?:\P{L}|^) of the next match needs.
A workaround for the above code is to add an extra non-letter character to the end of every run of non-letters, e.g.:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	output := []string{}
	r := regexp.MustCompile(`(?i)(?:\P{L}|^)(u\p{L}*)(?:\P{L}|$)`)
	str := "uhličitá,uhličité,uhličitou,uhličitého,yz,my"
	matches := r.FindAllStringSubmatch(regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `), -1)
	for _, v := range matches {
		output = append(output, v[1])
	}
	fmt.Println(output)
}
// => [uhličitá uhličité uhličitou uhličitého]
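Here, regexp.MustCompile(`\P{L}+`).ReplaceAllString(str, `$0 `) adds a space after every run of non-letter characters, so the trailing group of one match and the leading group of the next no longer compete for the same character.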

Find all strings in between two strings in Go

I am working on extracting multiple matches between two strings.
In the example below, I am trying to extract the A B C substring from my string.
Here is my code:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	str := "Movies: A B C Food: 1 2 3"
	re := regexp.MustCompile(`[Movies:][^Food:]*`)
	match := re.FindAllString(str, -1)
	fmt.Println(match)
}
I am clearly doing something wrong in my regex. I am trying to get the A B C string between Movies: and Food:.
What is the proper regex to get all strings between two strings?
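Your pattern fails because [Movies:] is a character class that matches a single character from the set M, o, v, i, e, s, :, and [^Food:]* matches any run of characters other than F, o, d and :. In Go, since the RE2-based regexp package does not support lookarounds, you need a capturing group and the FindAllStringSubmatch method: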
left := "LEFT_DELIMITER_TEXT_HERE"
right := "RIGHT_DELIMITER_TEXT_HERE"
rx := regexp.MustCompile(`(?s)` + regexp.QuoteMeta(left) + `(.*?)` + regexp.QuoteMeta(right))
matches := rx.FindAllStringSubmatch(str, -1)
Note the use of regexp.QuoteMeta that automatically escapes all special regex metacharacters in the left- and right-hand delimiters.
The (?s) flag makes . match newlines too, and (.*?) lazily captures everything between the left and right delimiters into group 1.
So, here you can use
package main
import (
	"fmt"
	"regexp"
)
func main() {
	str := "Movies: A B C Food: 1 2 3"
	// \s* on both sides keeps the surrounding spaces out of group 1.
	r := regexp.MustCompile(`Movies:\s*(.*?)\s*Food`)
	matches := r.FindAllStringSubmatch(str, -1)
	for _, v := range matches {
		fmt.Println(v[1])
	}
}
Output: A B C

skip regex chars until search using golang

This skips the first two characters and then matches from left to right:
re := regexp.MustCompile("(^.{2})(\\/path\\/subpath((\\/.*)|()))")
fmt.Println(re.MatchString("c:/path/subpath/path/subpath/")) // true
fmt.Println(re.MatchString("c:/patch/subpath/path/subpath/")) // false
Notice that the second one doesn't match, even though /path/subpath exists in the string. This is perfect.
Now, if I don't know how many characters to skip and want to start the search at the first '/', I tried this:
re2 := regexp.MustCompile("([^\\/])(\\/path\\/subpath((\\/.*)|()))")
fmt.Println(re2.MatchString("cddddd:/path/subpath/path/subpath")) // true
which is perfect. But if I change the first path:
fmt.Println(re2.MatchString("cddddd:/patch/subpath/path/subpath")) // this is true as well
I don't want the last one to match the second /path/subpath. I want to search in the first group, start the second group from there and do a left-to-right match.
Any help would be greatly appreciated.
It pays to be precise about what you want and to state it in absolute terms, not relative ones like "the second should not match the third". Instead, say:
I want to capture the path if it begins with /path/subpath in the second group. If a path contains /path/subpath somewhere other than at the beginning, I don't want it to match.
Also, slashes are not special in Go regexes, so there is no need to escape them, let alone double-escape them; raw string literals (backticks) also spare you from doubling backslashes.
The third expression (re3 below) does this:
from the start of the string, capture one or more characters that are not a slash (group 1)
require a : right after group 1
require /path/subpath at the very top of the path
capture whatever remains, starting with the next / (group 2)
This may be what you want:
package main
import (
	"fmt"
	"regexp"
)
func main() {
	paths := []string{
		"c:/path/subpath/path/subpath/",
		"c:/patch/subpath/path/subpath/",
		"cddddd:/path/subpath/path/subpath",
	}
	re1 := regexp.MustCompile(`(^.{2})(/path/subpath(/.*))`)
	re2 := regexp.MustCompile(`([^/])(/path/subpath((/.*)|()))`)
	re3 := regexp.MustCompile(`^([^/]+):/path/subpath(/.*)`)
	for i, re := range []*regexp.Regexp{re1, re2, re3} {
		for _, s := range paths {
			// FindStringSubmatch returns nil when there is no match.
			matches := re.FindStringSubmatch(s)
			fmt.Println(i+1, matches != nil, s)
			// Print each capturing group; index 0 is the whole match.
			for g := 1; g < len(matches); g++ {
				fmt.Printf("\t%d %v\n", g, matches[g])
			}
		}
		fmt.Println()
	}
}
Output
$ go run so-regex-path.go
(...)
3 true c:/path/subpath/path/subpath/
	1 c
	2 /path/subpath/
3 false c:/patch/subpath/path/subpath/
3 true cddddd:/path/subpath/path/subpath
	1 cddddd
	2 /path/subpath
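For comparison, the reason re2 from the question still matches the /patch/ path is that it is unanchored: ([^/]) can match any non-slash character, including the h that ends "subpath" right before the second /path/subpath. A sketch of an anchored variant, assuming the prefix before the first / may be any length (even empty) and that /path/subpath must be followed by either nothing or another / segment:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored, as in the question: ([^/]) may match any non-slash character.
	unanchored = regexp.MustCompile(`([^/])(/path/subpath((/.*)|()))`)
	// Anchored: everything before the first slash, then /path/subpath must
	// follow immediately, optionally followed by /more/segments.
	anchored = regexp.MustCompile(`^([^/]*)(/path/subpath(/.*)?)$`)
)

func main() {
	s := "cddddd:/patch/subpath/path/subpath"
	fmt.Printf("%q\n", unanchored.FindStringSubmatch(s)[1])               // "h"
	fmt.Println(anchored.MatchString(s))                                  // false
	fmt.Println(anchored.MatchString("cddddd:/path/subpath/path/subpath")) // true
}
```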

Regex to find string and backslashes

I have these strings and they can come in a variety of ways such as:
id=PS\\ Old\\ Gen, value=34 and id=Code\\ Cache,value=22 etc.
I would like a regex that extracts anything after the = up to the comma, so basically: PS\\ Old\\ Gen and Code\\ Cache etc.
I have written the following regex but can't seem to get the last word before the ,.
(([a-zA-z]+)\\{2})+
Any thoughts? This is for Go.
You can use this regex and take your text from capturing group 1:
id=([^,=]*),
Explanation:
id= - Matches id= literally
([^,=]*) - Matches any character except , or = zero or more times and captures it in group 1
, - Matches a comma
Sample Go code:
var re = regexp.MustCompile(`id=([^,=]*),`)
var str = `id=PS\\ Old\\ Gen, value=34 id=Code\\ Cache,value=22`
res := re.FindAllStringSubmatch(str, -1)
for _, m := range res {
	fmt.Printf("Match: %s\n", m[1]) // m[1] is capturing group 1
}
Prints:
Match: PS\\ Old\\ Gen
Match: Code\\ Cache
Does something like id=([^,]+), do the trick?
Capturing group 1 will contain your match.
How about this?
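package main
import (
	"fmt"
	"regexp"
)
func main() {
	var re = regexp.MustCompile(`(?mi)id=([^,]+)`)
	var str = `id=PS\\ Old\\ Gen, value=34 and id=Code\\ Cache,value=22`
	// Each loc holds byte offsets: loc[0]:loc[1] is the whole match,
	// loc[2]:loc[3] is capturing group 1.
	for _, loc := range re.FindAllStringSubmatchIndex(str, -1) {
		fmt.Println(str[loc[2]:loc[3]], "found at index", loc[2])
	}
}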