Golang Regex extract text between 2 delimiters - including delimiters - regex

As stated in the title, I have a program in Go where I have a string with a recurring pattern. I have beginning and end delimiters for this pattern, and I would like to extract every occurrence, delimiters included, from the string. The following is pseudocode:
string := "... This is preceding text
PATTERN BEGINS HERE (
pattern can continue for any number of lines...
);
this is trailing text that is not part of the pattern"
In short, what I am attempting to do with the example above is extract all occurrences of the pattern that begins with "PATTERN BEGINS HERE" and ends with ");". I need help figuring out what the regex for this looks like.
Please let me know if any additional info or context is needed.

The regex is:
(?s)PATTERN BEGINS HERE.*?\);
where (?s) is a flag that lets . match newlines, so the lazy .*? can span multiple lines and stops at the first ); it finds (see Go regex syntax).
See demo
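As a minimal sketch of how this pattern might be used from Go (the sample text and the helper name extractBlocks are made up for illustration), FindAllString returns every non-overlapping match with both delimiters included:

```go
package main

import (
	"fmt"
	"regexp"
)

// blockRe matches from the start delimiter through the first following ");".
// (?s) lets "." match newlines; ".*?" is lazy so each block ends at the nearest ");".
var blockRe = regexp.MustCompile(`(?s)PATTERN BEGINS HERE.*?\);`)

// extractBlocks is a hypothetical helper that returns every delimited block,
// delimiters included.
func extractBlocks(text string) []string {
	return blockRe.FindAllString(text, -1)
}

func main() {
	text := "... preceding text\nPATTERN BEGINS HERE (\nline one\nline two\n);\ntrailing text\nPATTERN BEGINS HERE (x);"
	for i, block := range extractBlocks(text) {
		fmt.Printf("block %d: %q\n", i, block)
	}
}
```

If no block is present, FindAllString returns nil, so the loop simply does not run.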

Not regex, but works
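// findInString returns the bytes between the first occurrence of start
// and the next occurrence of end (delimiters excluded).
// It needs the "errors" and "strings" packages.
func findInString(str, start, end string) ([]byte, error) {
	var match []byte
	index := strings.Index(str, start)
	if index == -1 {
		return match, errors.New("start delimiter not found")
	}
	index += len(start)
	for index < len(str) {
		// Stop as soon as the remaining text begins with the end delimiter.
		if strings.HasPrefix(str[index:], end) {
			return match, nil
		}
		match = append(match, str[index])
		index++
	}
	// Reaching the end of the string means the end delimiter never appeared.
	return match, errors.New("end delimiter not found")
}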
EDIT: Best to handle individual character as bytes and return a byte array
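A variant of the same non-regex idea, sketched under the assumption that you want every occurrence with the delimiters included (as the question asks). The name findAllDelimited is hypothetical; it relies only on strings.Index:

```go
package main

import (
	"fmt"
	"strings"
)

// findAllDelimited returns every substring that begins with start and ends
// with the next occurrence of end, delimiters included.
func findAllDelimited(s, start, end string) []string {
	if start == "" || end == "" {
		return nil // empty delimiters would never advance the search
	}
	var out []string
	for {
		i := strings.Index(s, start)
		if i == -1 {
			return out
		}
		j := strings.Index(s[i+len(start):], end)
		if j == -1 {
			return out // unterminated block: ignore it
		}
		stop := i + len(start) + j + len(end)
		out = append(out, s[i:stop])
		s = s[stop:] // continue searching after this block
	}
}

func main() {
	text := "before PATTERN BEGINS HERE (\nbody\n); after"
	fmt.Printf("%q\n", findAllDelimited(text, "PATTERN BEGINS HERE", ");"))
}
```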

Related

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions.
Search for any pattern myvalue=123& and extract myvalue=123
If "myvalue" is present at the end of the line without "&", extract myvalue=123
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex: myvalue=(.*?(?=[&,""]))
I am looking for how to modify this regex to cover my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead expects a single char to be present to the right, which is why it fails at the end of the line
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &
myvalue=[^&\s]*
Regex demo
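For readers working in Go: its regexp package (RE2 syntax) does not support lookaheads, so the lookahead pattern from the question would not compile there, while the negated character class works unchanged. A minimal sketch, with extractValue as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// valueRe matches "myvalue=" followed by any run of characters
// that are neither "&" nor whitespace.
var valueRe = regexp.MustCompile(`myvalue=[^&\s]*`)

// extractValue returns the first key=value match, or "" if there is none.
func extractValue(s string) string {
	return valueRe.FindString(s)
}

func main() {
	fmt.Println(extractValue("abcdmyvalue=123&xyz"))
	fmt.Println(extractValue("abcdmyvalue=123"))
}
```

Both calls print myvalue=123, because the class simply stops at & or at the end of the input.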
function regex(data) {
var test = data.match(/=(.*)&/);
if (test === null) {
return data.split('=')[1]
} else {
return test[1]
}
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code: if there is no & after the value, match returns null and the if branch splits the string on = to get the value. If & is present, the regex extracts the text between = and &.
If you want to keep a single regex, you can do it like this:
var test = data.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

Search through whole line and change words with ize to ise using regex in Notepad++

I want to search all the words in a line/sentence and detect any word with ize and convert it to ise except for certain words listed.
Find: ^(?!size)(?!resize)(?!Belize)(?!Bizet)(?!Brize)(?!Pfizer)(?!assize)(?!baize)(?!bedizen)(?!citizen)(?!denizen)(?!filesize)(?!maize)(?!prize)(?!netizen)(?!seize)(?!wizen)(?!outsize)(?!oversize)(?!misprize)(?!supersize)(?!undersize)(?!unsized)(?!upsize)([a-zA-Z-\s]+)ize
Replace: $1ise
So far all I get is the first word of the line with ize to work, or the last word with ize to work.
Example: Organize to socialize whatever size.
Should become: Organise to socialise whatever size.
Find (?i)(?!size|resize|Belize|so&so|unsized|upsize)(?<!\w)(\w+)ize
Replace $1ise
This worked as intended. I added (?i) to deal with capitalisation issues.
The regex ([a-zA-Z-\s]+)ize has the whitespace marker in it (\s), so it will match anything beyond the word boundary. You might want to work with \w and/or \b to match only characters from the word where the "ize" is located. Additionally, you don't want the ^ at the beginning, since it anchors the match to the start of the line and so only the first word can match.
Possible regex: (?!....your list....)(\w+)ize
Example input: "Organize to socialize whatever size."
Found matches: "Organize" and "socialize", but not "size", see https://regex101.com/r/UIfoa8/1
After that you can use your replacement $1ise to replace the found string with the captured group and "ise".
Make a Whitelist Array
Make the excluded words (whitelist) an array of strings
.split(' ') the text being searched through (searchStr) into an array
then .map() through each word of the array
using .indexOf() to compare a word vs. the whitelist
using .test() to see if it's a x+"ize" word to .replace()
Once the searchArray is complete, .join() it into a string (resultString).
Demo
"organize", "mesmerized", "socialize", and "baptize" were mixed into the search string among some whitelist words
var searchStr = `organize Belize Bizet mesmerized Brize Pfizer assize baize bedizen citizen denizen filesize socialize maize prize netizen seize wizen outsize baptize`;
var whitelist = ["size", "resize", "Belize", "Bizet", "Brize", "Pfizer", "assize", "baize", "bedizen", "citizen", "denizen", "filesize", "maize", "prize", "netizen", "seize", "wizen", "outsize", "oversize", "misprize", "supersize", "undersize", "unsized", "upsize"];
var searchArray = searchStr.split(' ').map(function(word) {
var match;
if (whitelist.indexOf(word) !== -1) {
match = word;
} else if (/([a-z]+?)ize/i.test(word)) {
match = word.replace(/([a-z]+?)ize/i, '$1ise');
} else {
match = word;
}
return match;
});
var resultString = searchArray.join(', ');
console.log(resultString);
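The same whitelist idea can be sketched in Go, where lookaheads are not available and an explicit whitelist is the natural route. The names toIse, wordRe and izeRe are hypothetical, and the whitelist below is a shortened subset of the one in the question. Note that Go needs ${1}ise rather than $1ise, because $1ise would be read as a reference to a group named "1ise":

```go
package main

import (
	"fmt"
	"regexp"
)

// whitelist holds words that must keep their "ize" spelling
// (an illustrative subset of the list in the question).
var whitelist = map[string]bool{
	"size": true, "resize": true, "Belize": true, "citizen": true,
	"prize": true, "seize": true, "maize": true,
}

var (
	wordRe = regexp.MustCompile(`[A-Za-z]+`)         // one word at a time
	izeRe  = regexp.MustCompile(`(?i)([a-z]+?)ize`) // letters followed by "ize"
)

// toIse rewrites "ize" to "ise" in every word that is not whitelisted,
// leaving spaces and punctuation untouched.
func toIse(text string) string {
	return wordRe.ReplaceAllStringFunc(text, func(word string) string {
		if whitelist[word] {
			return word
		}
		return izeRe.ReplaceAllString(word, "${1}ise")
	})
}

func main() {
	fmt.Println(toIse("Organize to socialize whatever size."))
}
```

Like the JavaScript demo, the whitelist lookup is case-sensitive and matches whole words only, so inflected forms such as "resized" would need their own entries.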

Regex Replace everything except between the first " and the last "

I need a regex that replaces everything except the content between the first " and the last ".
I need it like this:
Input String:["Key:"Value""]
And after the regex i only need this:
Output String:Key:"Value"
Thanks!
You can try something like this.
Pattern:
^.*?"(.*)".*$
Substitution:
$1
On Regex101
Explanation:
the first part ^.*?" matches as few characters as possible between the start of the string and a double quote
the second part (.*)" makes the largest match it can that ends in a double quote, and stores everything before that quote in a capture group
the last part .*$ grabs whatever is left and includes it in the match
Finally you replace the entire match with the contents of the first capture group
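The pattern and substitution above carry over to Go's regexp package unchanged. A small sketch, with betweenQuotes as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedRe captures everything between the first and the last double quote.
var quotedRe = regexp.MustCompile(`^.*?"(.*)".*$`)

// betweenQuotes replaces the whole input with capture group 1.
// Input without two double quotes does not match and is returned unchanged.
func betweenQuotes(input string) string {
	return quotedRe.ReplaceAllString(input, "${1}")
}

func main() {
	fmt.Println(betweenQuotes(`["Key:"Value""]`)) // prints Key:"Value"
}
```

Because . does not match newlines by default, this sketch assumes single-line input; a (?s) prefix would be needed otherwise.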
Can you say why you need a RegExp?
A function like:
String unquote(String input) {
int start = input.indexOf('"');
if (start < 0) return input; // or throw.
int end = input.lastIndexOf('"');
if (start == end) return input; // or throw
return input.substring(start + 1, end);
}
is going to be faster and easier to understand than a RegExp.
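For comparison, the same index-based function might look like this in Go (a sketch only; the answer's code is Dart, and this port uses strings.IndexByte and strings.LastIndexByte):

```go
package main

import (
	"fmt"
	"strings"
)

// unquote returns the text between the first and the last double quote.
// If there are fewer than two quotes, the input is returned unchanged.
func unquote(input string) string {
	start := strings.IndexByte(input, '"')
	end := strings.LastIndexByte(input, '"')
	if start < 0 || start == end {
		return input
	}
	return input[start+1 : end]
}

func main() {
	fmt.Println(unquote(`["Key:"Value""]`)) // prints Key:"Value"
}
```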
Anyway, for the challenge, let's say we do want a RegExp that replaces the part up to the first " and from the last " with nothing. That's two replaces, so you can do:
input.replaceAll(RegExp(r'^[^"]*"|"[^"]*$'), "")
or you can use a capturing group and a computed replacement like:
input.replaceFirstMapped(RegExp(r'^[^"]*"([^]*)"[^"]*$'), (m) => m[1])
Alternatively, you can use the capturing group to select the text between the two and extract it in code, instead of doing string replacement:
String unquote(String input) {
var re = RegExp(r'^[^"]*"([^]*)"[^"]*$');
var match = re.firstMatch(input);
if (match == null) return input; // or throw.
return match[1];
}

apostrophe in word not being recognized for string replace

I am having a problem replacing the word "you're" with regexp.
All of the other words are changing correctly just the word "you're".
I think it is not parsing after the apostrophe.
I have to replace the word "you" to "I" and "you're" to "I'm".
It will change "you" to "I" but "you're" becomes "I're" because it is not going past the apostrophe and it thinks that is the end of the word for some reason. I have to escape the apostrophe somehow.
Please see below for the code in question.
package main
import (
"fmt"
"math/rand"
"regexp"
"strings"
"time"
)
//Function ElizaResponse to take in and return a string
func ElizaResponse(str string) string {
// replace := "How do you know you are"
/*Regex MatchString function with isolation of the word "father"
*with a boundry ignore case regex command.
*/
if matched, _ := regexp.MatchString(`(?i)\bfather\b`, str);
//Condition to replace the original string if it has the word "father"
matched {
return "Why don’t you tell me more about your father?"
}
r1 := regexp.MustCompile(`(?i)\bI'?\s*a?m\b`)
//Match the words "I am" and capture for replacement
matched := r1.MatchString(str)
//condition if "I am" is matched
if matched {
capturedString := r1.ReplaceAllString(str, "$1")
boundaries := regexp.MustCompile(`\b`)
tokens := boundaries.Split(capturedString, -1)
// List the reflections.
reflections := [][]string{
{`I`, `you`},
{`you're`, `I'm`},
{`your`, `my`},
{`me`, `you`},
{`you`, `I`},
{`my`, `your`},
}
// Loop through each token, reflecting it if there's a match.
for i, token := range tokens {
for _, reflection := range reflections {
if matched, _ := regexp.MatchString(reflection[0], token); matched {
tokens[i] = reflection[1]
break
}
}
}
// Put the tokens back together.
return strings.Join(tokens, ``)
}
//Get random number from the length of the array of random struct
//an array of strings for the random response
response := []string{"I’m not sure what you’re trying to say. Could you explain it to me?",
"How does that make you feel?",
"Why do you say that?"}
//Return a random index of the array
return response[rand.Intn(len(response))]
}
func main() {
rand.Seed(time.Now().UTC().UnixNano())
fmt.Println("Im supposed to just take what you're saying at face value?")
fmt.Println(ElizaResponse("Im supposed to just take what you're saying at face value?"))
}
Note that the apostrophe character creates a word boundary, so your use of \b in regular expressions is probably tripping you up. That is, the string "I'm" has four word boundaries, one before and after each character.
┏━┳━┳━┓
┃I┃'┃m┃
┗━┻━┻━┛
│ │ │ └─ end of line creates a word boundary
│ │ └─── after punctuation character creates a word boundary
│ └───── before punctuation character creates a word boundary
└─────── start of line creates a word boundary
There is no way to change the behavior of the word boundary metacharacter so you might be better off mapping regexes that include the full word with punctuation to the desired replacement, e.g.:
type Replacement struct {
rgx *regexp.Regexp
rpl string
}
replacements := []Replacement{
{regexp.MustCompile("\\bI\\b"), "you"},
{regexp.MustCompile("\\byou're\\b"), "I'm"},
// etc...
}
Note also that one of your examples contains a UTF-8 "right single quotation mark" (U+2019, 0xe28099), not to be confused with the UTF-8/ASCII apostrophe (U+0027, 0x27)!
fmt.Sprintf("% x", []byte("'’")) // => "27 e2 80 99"
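To see the boundary behaviour in isolation, here is a small sketch (naiveReflect is a hypothetical name) that reproduces the "I're" result from the question: \byou\b matches inside "you're" because the apostrophe is not a word character.

```go
package main

import (
	"fmt"
	"regexp"
)

// youRe matches "you" as a whole word. The apostrophe counts as a
// non-word character, so there is a boundary between "you" and "'re".
var youRe = regexp.MustCompile(`\byou\b`)

// naiveReflect replaces the whole word "you" with "I".
func naiveReflect(s string) string {
	return youRe.ReplaceAllString(s, "I")
}

func main() {
	fmt.Println(naiveReflect("you're late")) // prints I're late
	fmt.Println(naiveReflect("your turn"))   // prints your turn
}
```

The second call is left alone because "u" and "r" are both word characters, so no boundary follows "you" there.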
What you want to achieve here is to replace specific strings with specific replacements. It is easier to achieve that with a map of string keys and values, where each unique key is a literal phrase to search and the values are the texts to replace with.
This is how you may define the reflections:
reflections := map[string]string{
`you're`: `I'm`,
`your`: `my`,
`me`: `you`,
`you`: `I`,
`my`: `your`,
`I` : `you`,
}
Next, you need to sort the keys by length in descending order, so that longer alternatives such as you're are tried before shorter ones such as you (here is sample code):
type ByLenDesc []string
func (a ByLenDesc) Len() int {
return len(a)
}
func (a ByLenDesc) Less(i, j int) bool {
return len(a[i]) > len(a[j])
}
func (a ByLenDesc) Swap(i, j int) {
a[i], a[j] = a[j], a[i]
}
And then in the function:
var keys []string
for key, _ := range reflections {
keys = append(keys, key)
}
sort.Sort(ByLenDesc(keys))
Then build the pattern:
pat := "\\b(" + strings.Join(keys, `|`) + ")\\b"
// fmt.Println(pat) // => \b(you're|your|you|me|my|I)\b
The pattern matches you're, your, you, me, my, or I as whole words.
res := regexp.MustCompile(pat).ReplaceAllStringFunc(capturedString, func(m string) string {
return reflections[m]
})
The above code creates a regex object and replaces all matches with the corresponding reflections values.
See the Go demo.
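Put together, the steps above might look like the following self-contained sketch. It swaps the custom sort type for sort.Slice and adds regexp.QuoteMeta as a precaution; reflectPronouns and buildReflector are hypothetical names, not part of the original demo:

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var reflections = map[string]string{
	"you're": "I'm",
	"your":   "my",
	"me":     "you",
	"you":    "I",
	"my":     "your",
	"I":      "you",
}

// buildReflector joins the keys, longest first, into \b(key1|key2|...)\b.
// Longest-first order matters: Go picks the first alternative that matches.
func buildReflector() *regexp.Regexp {
	keys := make([]string, 0, len(reflections))
	for key := range reflections {
		keys = append(keys, regexp.QuoteMeta(key))
	}
	sort.Slice(keys, func(i, j int) bool { return len(keys[i]) > len(keys[j]) })
	return regexp.MustCompile(`\b(` + strings.Join(keys, "|") + `)\b`)
}

var reflector = buildReflector()

// reflectPronouns swaps each matched word for its reflection in one pass,
// so a replacement is never itself replaced again.
func reflectPronouns(s string) string {
	return reflector.ReplaceAllStringFunc(s, func(m string) string {
		return reflections[m]
	})
}

func main() {
	fmt.Println(reflectPronouns("you're saying my name to me"))
}
```

The single pass is the design point: applying the pairs one after another would turn "I" into "you" and then straight back into "I".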
I have found that I just needed to change these two lines of code.
boundaries := regexp.MustCompile(`(\b[^\w']|$)`)
return strings.Join(tokens, ` `)
It stops the split function from splitting at the ' character.
The tokens then need to be joined with a space, otherwise the result would be one continuous string.

Golang regexp to match multiple patterns between keyword pairs

I have a string which has two keywords: "CURRENT NAME(S)" and "NEW NAME(S)" and each of these keywords are followed by a bunch of words. I want to extract those set of words beyond each of these keywords. To elaborate with a code:
s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
re := regexp.MustCompile(`"CURRENT NAME(S).*",,"NEW NAME(S).*"`)
segs := re.FindAllString(s, -1)
fmt.Println("segs:", segs)
segs2 := re.FindAllStringSubmatch(s, -1)
fmt.Println("segs2:", segs2)
As you can see, the string 's' has the input. "Name1,Name2" is the current names list and "NewName1, NewName2" is the new names list. I want to extract these two lists. The two lists are separated by a comma. Each of the keywords are beginning with a double quote and their reach ends, when their corresponding double quote ends.
What is the way to use regexp such that the program can print "Name1, Name2" and "NewName1,NewName2" ?
The issue with your regex is that the input string contains newline characters, and . in a Go regex does not match a newline by default. Another issue is that .* is a greedy pattern and will match as many characters as it can, up to the last occurrence of whatever follows it. Also, you need to escape the parentheses in the regex pattern to match literal ( and ) characters.
The best way to solve the issue is to change .* into a negated character class pattern [^"]* and place it inside a pair of non-escaped ( and ) to form a capturing group (a construct to get submatches from the match).
Here is a Go demo:
package main
import (
"fmt"
"regexp"
)
func main() {
s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
re := regexp.MustCompile(`"CURRENT NAME\(S\)\s*([^"]*)",,"NEW NAME\(S\)\s*([^"]*)"`)
segs2 := re.FindAllStringSubmatch(s,-1)
fmt.Printf("segs2: [%s; %s]", segs2[0][1], segs2[0][2])
}
Now, the regex matches:
"CURRENT NAME\(S\) - a literal string "CURRENT NAME(S)
\s* - zero or more whitespaces
([^"]*) - Group 1 capturing 0+ chars other than "
",,"NEW NAME\(S\) - a literal string ",,"NEW NAME(S)
\s* - zero or more whitespaces
([^"]*) - Group 2 capturing 0+ chars other than "
" - a literal "
If your input doesn't change then the simplest way would be to use submatches (groups). You can try something like this:
// (?s) is a flag that enables '.' to match newlines
var r = regexp.MustCompile(`(?s)CURRENT NAME\(S\)(.*)",,"NEW NAME\(S\)(.*)"`)
fmt.Println(r.MatchString(s))
m := r.FindSubmatch([]byte(s)) // FindSubmatch requires []byte
for i, match := range m {
s := string(match)
fmt.Printf("Match - %d: %s\n", i, strings.Trim(s, "\n")) //remove the newline
}
Output (note that the first entry is the full match, i.e. everything the whole regex matched; see https://golang.org/pkg/regexp/#Regexp.FindSubmatch):
Match - 0: CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"
Match - 1: Name1, Name2
Match - 2: NewName1,NewName2
Example: https://play.golang.org/p/0cgBOMumtp
For a fixed format like in the example, you can also avoid regular expressions and perform explicit parsing as in this example - https://play.golang.org/p/QDIyYiWJHt:
package main
import (
"fmt"
"strings"
)
func main() {
s := `"CURRENT NAME(S)
Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
names := []string{}
parts := strings.Split(s, ",,")
for _, part := range parts {
part = strings.Trim(part, `"`)
part = strings.TrimPrefix(part, "CURRENT NAME(S)")
part = strings.TrimPrefix(part, "NEW NAME(S)")
part = strings.TrimSpace(part)
names = append(names, part)
}
fmt.Println("Names:")
for _, name := range names {
fmt.Println(name)
}
}
Output:
Names:
Name1, Name2
NewName1,NewName2
It uses a few more lines of code but may make it easier to understand the processing logic at a first glance.