Regex to match empty string or pattern - regex

I'm trying to build an application that reads lines of CSV text from the network and inserts them into a SQLite database. I need to extract all strings that appear between commas, including empty strings.
For example, a line of text that I need to parse looks like:
"1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05"
My code snippet is below. I figured I need to use a regex, since I'm splitting each line at the "," character but a comma may also appear inside a quoted field.
package main
import (
    "fmt"
    "regexp"
    "strings"
)
func main() {
    re := regexp.MustCompile(`^|[^,"']+|"([^"]*)"|'([^']*)`)
    txt := "1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05"
    arr := re.FindAllString(txt, -1)
    arr2 := strings.Split(txt, ",")
    fmt.Println("Array lengths: ", len(arr), len(arr2))
}
The correct length of the split array in this case should be 13.

As Marc and Flimzy said, regex isn't the right tool here. Since you don't require regex specifically, here is a snippet that uses the standard encoding/csv package to extract the fields and produce the result you're looking for:
package main
import (
    "bytes"
    "encoding/csv"
    "fmt"
)
func main() {
    testdata := `1/17/09 1:23,"Soap, Shampoo and cleaner",,1200,Amex,Steven O' Campbell,,Kuwait,1/16/09 14:26,1/18/09 9:08,29.2891667,,48.05`
    // The CSV reader handles quoted fields, so the comma inside "Soap, Shampoo and cleaner" does not split the field.
    reader := csv.NewReader(bytes.NewBufferString(testdata))
    content, err := reader.Read()
    if err != nil {
        panic(err)
    }
    fmt.Println(len(content)) // 13
}
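Since the question mentions reading many lines from the network, here is a minimal sketch of looping over every record rather than a single one. The helper name parseCSVText is hypothetical, and strings.NewReader merely stands in for the network connection; csv.NewReader accepts any io.Reader, so a net.Conn could be passed instead.

```go
package main

import (
    "encoding/csv"
    "errors"
    "fmt"
    "io"
    "strings"
)

// parseCSVText is a hypothetical helper that parses every CSV record in text.
// In a real application the reader would wrap the network connection instead
// of an in-memory string.
func parseCSVText(text string) ([][]string, error) {
    reader := csv.NewReader(strings.NewReader(text))
    // Accept records with differing field counts; by default the reader
    // requires every record to have as many fields as the first one.
    reader.FieldsPerRecord = -1

    var records [][]string
    for {
        record, err := reader.Read()
        if errors.Is(err, io.EOF) {
            return records, nil // input exhausted
        }
        if err != nil {
            return nil, err // malformed CSV or an I/O failure
        }
        records = append(records, record)
    }
}

func main() {
    input := "1/17/09 1:23,\"Soap, Shampoo and cleaner\",,1200\n" +
        "a,,b\n"
    records, err := parseCSVText(input)
    if err != nil {
        panic(err)
    }
    for _, record := range records {
        // Empty strings between commas are kept as empty fields.
        fmt.Println(len(record), record)
    }
}
```

Each record is a []string, so its fields can be bound directly as parameters of an insert statement.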

How to replace symbol AND make next letter uppercase in Go

I'm a beginner in Go.
I can't figure out how to not just replace a symbol but also make the next letter uppercase.
Task:
Complete the method/function so that it converts dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case, also often referred to as Pascal case).
I tried to implement regexp methods with:
re, _ := regexp.Compile(`/[-_]\w/ig`)
res := re.FindAllStringSubmatch(s, -1)
return res
But I can't return res because it's a slice, and I need to return just a string.
My code:
package main
import (
    "fmt"
    "strings"
)
func ToCamelCase(s string) string {
    s = strings.ReplaceAll(s, "-", "")
    s = strings.ReplaceAll(s, "_", "")
    return s
}
func main() {
    var s string
    fmt.Scan(&s)
    fmt.Println(ToCamelCase(s))
}
Input:
"the-stealth-warrior" or "the_stealth_warrior"
Output:
"theStealthWarrior" or "TheStealthWarrior"
My Output: thestealthwarrior
In Go, the pattern is written as a plain string literal, without regex delimiters such as /.../ or trailing flags such as ig. It is also more convenient to use the ReplaceAllStringFunc method:
package main
import (
    "fmt"
    "regexp"
    "strings"
)
func ToCamelCase(s string) string {
    // MustCompile panics on an invalid pattern instead of silently discarding the error.
    re := regexp.MustCompile(`[-_]\w`)
    return re.ReplaceAllStringFunc(s, func(m string) string {
        // m is the separator plus the next character: drop the separator, upper-case the rest.
        return strings.ToUpper(m[1:])
    })
}
func main() {
    s := "the-stealth-warrior"
    fmt.Println(ToCamelCase(s))
}
The output is theStealthWarrior.
The [-_]\w pattern matches a - or _ and then any word char. If you want to exclude _ from \w, use [^\W_] instead of \w.
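To make the difference between \w and [^\W_] concrete, here is a small sketch that applies the same replacement with both patterns. The helper camelWith and the two variable names are made up for this illustration.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var (
    // \w includes the underscore, so "__" counts as separator + word char.
    withWordChar = regexp.MustCompile(`[-_]\w`)
    // [^\W_] is "a word char that is not an underscore".
    withoutUnderscore = regexp.MustCompile(`[-_][^\W_]`)
)

// camelWith drops the separator of every match and upper-cases what follows it.
func camelWith(re *regexp.Regexp, s string) string {
    return re.ReplaceAllStringFunc(s, func(m string) string {
        return strings.ToUpper(m[1:])
    })
}

func main() {
    s := "the__stealth-warrior"
    fmt.Println(camelWith(withWordChar, s))      // the_stealthWarrior
    fmt.Println(camelWith(withoutUnderscore, s)) // the_StealthWarrior
}
```

With \w, the doubled underscore is consumed as one match and the following letter stays lowercase. With [^\W_], the first underscore is left alone and the second one pairs with the letter.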

IPv4 regexp capturing the incorrect parts of the address [duplicate]

I'm trying to write a program that prints the invalid part or parts of an IPv4 address from terminal input.
Here is my code:
package chapter4
import (
    "bufio"
    "fmt"
    "os"
    "regexp"
    "strings"
    "time"
)
func IPV4() {
    var f *os.File
    f = os.Stdin
    defer f.Close()
    scanner := bufio.NewScanner(f)
    fmt.Println("Exercise 1, Chapter 4 - Detecting incorrect parts of IPv4 Addresses, enter an address!")
    for scanner.Scan() {
        if scanner.Text() == "STOP" {
            fmt.Println("Initializing Level 4...")
            time.Sleep(5 * time.Second)
            break
        }
        expression := "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
        matchMe, err := regexp.Compile(expression)
        if err != nil {
            fmt.Println("Could not compile!", err)
        }
        s := strings.Split(scanner.Text(), ".")
        for _, value := range s {
            fmt.Println(value)
            str := matchMe.FindString(value)
            if len(str) == 0 {
                fmt.Println(value)
            }
        }
    }
}
My thought process: for every IP address entered in the terminal, I split the string on '.'.
Then I iterate over the resulting []string and match each value against the regular expression.
For some reason, the only case where the expression doesn't match is when the input contains letters. Every number, no matter its size or composition, is a valid match for my expression.
I'm hoping you can help me identify the problem, and if there's a better way to do it, I'm all ears. Thanks!
This expression might be closer to what you have in mind:
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
Test
package main
import (
    "fmt"
    "regexp"
)
func main() {
    var re = regexp.MustCompile(`(?m)^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$`)
    var str = `127.0.0.1
192.168.1.1
192.168.1.255
255.255.255.255
0.0.0.0
1.1.1.01
30.168.1.255.1
127.1
192.168.1.256
-1.2.3.4
3...3`
    for i, match := range re.FindAllString(str, -1) {
        fmt.Println(match, "found at index", i)
    }
}
The expression is explained in the top-right panel of regex101.com, where you can also explore, simplify, or modify it and watch how it matches sample inputs.
Reference:
Validating IPv4 addresses with regexp
I am pretty sure that your expression needs anchors. Without them, the last alternative, [1-9]?[0-9], matches any one or two digits anywhere in the value, so any input containing a digit succeeds. Try using ^ on the front and $ on the back.
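A minimal sketch of that suggestion, keeping the questioner's split-on-dots approach and only adding the anchors around the original expression. The function name invalidParts is an assumption made for this example.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// octet matches a decimal number in the range 0-255 and nothing else.
// The ^ and $ anchors force the whole string to match, not just part of it.
var octet = regexp.MustCompile(`^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$`)

// invalidParts returns the dot-separated parts of addr that are not valid octets.
func invalidParts(addr string) []string {
    var bad []string
    for _, part := range strings.Split(addr, ".") {
        if !octet.MatchString(part) {
            bad = append(bad, part)
        }
    }
    return bad
}

func main() {
    for _, addr := range []string{"192.168.1.1", "192.168.1.256", "10.x.999.4"} {
        fmt.Printf("%s -> invalid parts: %q\n", addr, invalidParts(addr))
    }
}
```

This only checks each part in isolation. It does not verify that the address has exactly four parts, which would need a separate length check on the split result.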

Regex extracting sets of numbers from string when prefix occurs, while not matching said prefix

As stated in the title, given a situation where I have a string like so:
"somestring~200~122"
I want a regex that matches the numbers only when they are prefixed by "~", so that I ultimately end up with [200, 122].
Matching the prefix is necessary because I need to protect against a case like the one below, which should not be matched:
"somestring~abc200~def122"
For additional context: I am using Go, so I am planning on doing something like the following to obtain the numbers within the string:
pattern := regexp.MustCompile("regex i need help with")
numbers := pattern.FindAllString(host, -1)
You can use FindAllStringSubmatch to extract the group containing just the digits. Below is an example that finds all instances of ~ followed by numbers. It additionally converts all the matches to ints and inserts them into a slice:
package main
import (
    "fmt"
    "regexp"
    "strconv"
)
func main() {
    host := "somestring~200~122"
    pattern := regexp.MustCompile(`~(\d+)`)
    numberStrings := pattern.FindAllStringSubmatch(host, -1)
    numbers := make([]int, len(numberStrings))
    for i, numberString := range numberStrings {
        number, err := strconv.Atoi(numberString[1])
        if err != nil {
            panic(err)
        }
        numbers[i] = number
    }
    fmt.Println(numbers)
}
https://play.golang.org/p/09YyewtRXz
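As a sketch of how the same pattern behaves on the string from the question that should not match, the logic can be wrapped in a function that returns an error instead of panicking. The names tildeNumber and extractNumbers are invented for this example.

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
)

var tildeNumber = regexp.MustCompile(`~(\d+)`)

// extractNumbers returns every run of digits that directly follows a "~".
func extractNumbers(s string) ([]int, error) {
    var numbers []int
    for _, m := range tildeNumber.FindAllStringSubmatch(s, -1) {
        // m[0] is the whole match ("~200"); m[1] is the captured digits ("200").
        n, err := strconv.Atoi(m[1])
        if err != nil {
            return nil, err // for example, the digits overflow int
        }
        numbers = append(numbers, n)
    }
    return numbers, nil
}

func main() {
    for _, host := range []string{"somestring~200~122", "somestring~abc200~def122"} {
        numbers, err := extractNumbers(host)
        fmt.Println(host, numbers, err)
    }
}
```

Note that this pattern only constrains what comes before the digits. An input such as "x~200abc" would still yield 200; if that should be rejected, the pattern would need an additional condition on what follows the digits.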

Golang Regex: FindAllStringSubmatch to []string

I download a multiline file from Amazon S3 in format like:
ColumnAv1 ColumnBv1 ColumnCv1 ...
ColumnAv2 ColumnBv2 ColumnCv2 ...
The file contents are of type []byte. I then want to parse them with a regex:
matches := re.FindAllSubmatch(file,-1)
Then I want to feed the result row by row to a function that takes a []string as input (element 0 is ColumnAv1, element 1 is ColumnBv1, ...).
How should I convert the [][][]byte result into a []string for the first row, the second row, and so on? I suppose I should do it in a loop, but I cannot get this working:
for i := 0; i < len(matches); i++ {
tmp:=myfunction(???)
}
BTW, Why does function FindAllSubmatch return [][][]byte whereas FindAllStringSubmatch return [][]string?
(Sorry I don't have right now access to my real example, so the syntax may not be proper)
It's all explained extensively in the package's documentation.
Read the paragraph that explains:
There are 16 methods of Regexp that match a regular expression and identify the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
In your case, you probably want to use FindAllStringSubmatch.
In Go, a string is in effect a read-only slice of bytes.
You can either keep passing []byte values around
or convert the []byte values to string (the conversion copies the bytes):
var byteSlice = []byte{'F','o','o'}
var str string
str = string(byteSlice)
You can iterate through the [][][]byte result just as you would for the string result, using two nested loops, and convert each slice of bytes to a string in the inner loop:
package main
import "fmt"
func main() {
    f := [][][]byte{{{'a', 'b', 'c'}}}
    for _, line := range f {
        for _, match := range line { // match has type []byte
            fmt.Println(string(match))
        }
    }
}
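Applied to the question, the nested loops can build one []string per row. In a FindAllSubmatch result the outer slice holds one entry per match, the middle slice holds the whole match followed by its capture groups, and each of those is a []byte; the String variants return a string in that last position, which is why they have one fewer slice level. The sketch below assumes a hypothetical three-column pattern, since the question does not show the real regex, and rowsAsStrings is an invented name.

```go
package main

import (
    "fmt"
    "regexp"
)

// rowPattern is a hypothetical pattern: three whitespace-separated columns per line.
var rowPattern = regexp.MustCompile(`(?m)^(\S+)[ \t]+(\S+)[ \t]+(\S+)`)

// rowsAsStrings converts the capture groups of each match into a []string,
// skipping index 0, which holds the whole match rather than a column.
func rowsAsStrings(file []byte) [][]string {
    matches := rowPattern.FindAllSubmatch(file, -1)
    rows := make([][]string, 0, len(matches))
    for _, match := range matches {
        row := make([]string, 0, len(match)-1)
        for _, group := range match[1:] {
            row = append(row, string(group))
        }
        rows = append(rows, row)
    }
    return rows
}

func main() {
    file := []byte("ColumnAv1 ColumnBv1 ColumnCv1\nColumnAv2 ColumnBv2 ColumnCv2\n")
    for _, row := range rowsAsStrings(file) {
        // Each row is a []string that could be passed to another function.
        fmt.Println(row)
    }
}
```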

Split string using regular expression in Go

I'm trying to find a good way to split a string using a regular expression as the delimiter instead of a fixed string. Thanks
http://nsf.github.io/go/strings.html?f:Split!
You can use the Split method of *regexp.Regexp to split a string into a slice of strings, with the regex pattern as the delimiter.
package main
import (
    "fmt"
    "regexp"
)
func main() {
    re := regexp.MustCompile("[0-9]+")
    txt := "Have9834a908123great10891819081day!"
    // A negative count returns all substrings.
    split := re.Split(txt, -1)
    fmt.Println(split) // [Have a great day!]
}
I made a regex-split function based on the behavior of the regex split functions in Java, C#, PHP, and so on. It returns only a slice of strings, without the index information. It needs the regexp package to be imported.
func RegSplit(text string, delimiter string) []string {
    reg := regexp.MustCompile(delimiter)
    // Each element is a [start, end] pair locating one delimiter match.
    indexes := reg.FindAllStringIndex(text, -1)
    laststart := 0
    result := make([]string, len(indexes)+1)
    for i, element := range indexes {
        result[i] = text[laststart:element[0]]
        laststart = element[1]
    }
    // Whatever follows the last delimiter is the final piece.
    result[len(indexes)] = text[laststart:]
    return result
}
example:
fmt.Println(RegSplit("a1b22c333d", "[0-9]+"))
result:
[a b c d]
If you just want to split on certain characters, you can use strings.FieldsFunc, otherwise I'd go with regexp.FindAllString.
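For the character-based case, a minimal sketch with strings.FieldsFunc might look like this. The function name splitOnDigits is made up here, and unicode.IsDigit is used as the predicate, which also treats non-ASCII Unicode digits as separators, unlike the [0-9] class.

```go
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// splitOnDigits splits s around every run of digits without using a regular expression.
func splitOnDigits(s string) []string {
    return strings.FieldsFunc(s, unicode.IsDigit)
}

func main() {
    fmt.Printf("%q\n", splitOnDigits("Have9834a908123great10891819081day!"))
}
```

One behavioral difference to keep in mind: FieldsFunc never returns empty strings, whereas splitting with a regex can produce an empty first element when the text starts with a delimiter.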
The Regexp.Split method would be the best way to do this.
You should be able to create your own split function that loops over the results of Regexp.FindAllString, placing the intervening substrings into a new slice.
http://nsf.github.com/go/regexp.html?m:Regexp.FindAllString!
I found this old post while looking for an answer. I'm new to Go but these answers seem overly complex for the current version of Go. The simple function below returns the same result as those above.
package main
import (
    "fmt"
    "regexp"
)
func goReSplit(text string, pattern string) []string {
    regex := regexp.MustCompile(pattern)
    result := regex.Split(text, -1)
    return result
}
func main() {
    fmt.Printf("%#v\n", goReSplit("Have9834a908123great10891819081day!", "[0-9]+"))
}