How to extract substrings [duplicate] - regex

I wrote the following Go code to extract two values from inside the string.
I used two regular expressions to find the numbers (float64).
The first result is correct: only the number. But the second is wrong.
This is the code:
package main
import (
"fmt"
"regexp"
)
func main() {
// regexp uses the RE2 syntax
pat1 := regexp.MustCompile(`[^m2!3d][\d\.-]+`)
s1 := pat1.FindString(`Torre+Eiffel!8m2!3d-48.8583701!4d-2.2944813!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d-48.8583701!4d-2.2944813`)
pat2 := regexp.MustCompile(`[^!4d][\d\.-]+`)
s2 := pat2.FindString(`Torre+Eiffel!8m2!3d-48.8583701!4d-2.2944813!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d-48.8583701!4d-2.2944813`)
fmt.Println(s1) // Print -> -48.8583701
fmt.Println(s2) // Print -> m2 (The correct answer is "-2.2944813")
}
Here I modified the pattern:
pat2 := regexp.MustCompile(`!4d[\d\.-]+`)
and I got the following result:
!4d-2.2944813
but it's not what I'm expecting.

It seems you are only interested in the latitude and longitude of an attraction, and not really in the regex itself.
Maybe you could just use something like this:
package main
import (
"fmt"
"strconv"
"strings"
)
var replacer = strings.NewReplacer("3d-", "", "4d-", "")
func main() {
var str = `Torre+Eiffel!8m2!3d-48.8583701!4d-2.2944813!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d-48.8583701!4d-2.2944813`
fmt.Println(getLatLong(str))
}
func getLatLong(str string) (float64, float64, error) {
parts := strings.Split(str, "!")
if latFloat, err := strconv.ParseFloat(replacer.Replace(parts[2]), 64); err != nil {
return 0, 0, err
} else if lngFloat, err := strconv.ParseFloat(replacer.Replace(parts[3]), 64); err != nil {
return 0, 0, err
} else {
return latFloat, lngFloat, nil
}
}
https://play.golang.org/p/UOIwGbl6nrb

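You were almost there. Try (?m)(?:3d|4d)-([\d\.-]+)(?:!|$)
https://regex101.com/r/8KgirB/1
All you need is a capturing group around the [\d\.-]+ part. With this group you can access the number directly. Note that the hyphen after 3d or 4d sits outside the group, so the captured text carries no leading minus sign: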
package main
import (
"fmt"
"regexp"
)
func main() {
// (?:!|$) also accepts the end of the input, so the final 4d value is matched too.
var re = regexp.MustCompile(`(?m)(?:3d|4d)-([\d\.-]+)(?:!|$)`)
var str = `Torre+Eiffel!8m2!3d-48.8583701!4d-2.2944813!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d-48.8583701!4d-2.2944813`
for _, match := range re.FindAllStringSubmatch(str, -1) {
fmt.Println(match[1])
}
}
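If the sign of each coordinate matters, as the question's expected result ("-2.2944813") suggests, one option is to capture the optional minus sign together with the digits. The following is only a sketch under that assumption; the helper name latLng and the exact pattern are illustrative and not taken from either answer.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// coordRe captures the signed numbers that follow the "!3d" and "!4d" markers.
var coordRe = regexp.MustCompile(`!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)`)

// latLng is a hypothetical helper: it returns the first
// latitude/longitude pair found in s, keeping the sign.
func latLng(s string) (lat, lng float64, err error) {
	m := coordRe.FindStringSubmatch(s)
	if m == nil {
		return 0, 0, fmt.Errorf("no coordinates found in %q", s)
	}
	if lat, err = strconv.ParseFloat(m[1], 64); err != nil {
		return 0, 0, err
	}
	if lng, err = strconv.ParseFloat(m[2], 64); err != nil {
		return 0, 0, err
	}
	return lat, lng, nil
}

func main() {
	s := `Torre+Eiffel!8m2!3d-48.8583701!4d-2.2944813!3m4!1s0x47e66e2964e34e2d:0x8ddca9ee380ef7e0!8m2!3d-48.8583701!4d-2.2944813`
	fmt.Println(latLng(s))
}
```

Returning float64 values rather than strings means the caller gets a parse error instead of silently carrying a malformed number forward.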

Related

How to optimize a CSV loader with pprof?

I am trying to optimize a CSV loading process that basically runs a regex search over a large CSV file (over 4 GB, 31033993 records in my experiment).
I managed to build multiprocessing logic to read the CSV, but when I analyze the CPU profile using pprof I think my regex search is not optimized. Could you help me improve this code so that it reads the CSV much more quickly?
Here is my code so far:
package main
import (
"bufio"
"flag"
"fmt"
"log"
"os"
"regexp"
"runtime"
"runtime/pprof"
"strings"
"sync"
)
func processFile(path string) [][]string {
file, err := os.Open(path)
if err != nil {
log.Println("Error:", err)
}
var pattern = regexp.MustCompile(`^.*foo.*$`)
numCPU := runtime.NumCPU()
jobs := make(chan string, numCPU+1)
fmt.Printf("Strategy: Parallel, %d Workers ...\n", numCPU)
results := make(chan []string)
wg := new(sync.WaitGroup)
for w := 1; w <= numCPU; w++ {
wg.Add(1)
go parseRecord(jobs, results, wg, pattern)
}
go func() {
scanner := bufio.NewScanner(file)
for scanner.Scan() {
jobs <- scanner.Text()
}
close(jobs)
}()
go func() {
wg.Wait()
close(results)
}()
lines := [][]string{}
for line := range results {
lines = append(lines, line)
}
return lines
}
func parseRecord(jobs <-chan string, results chan<- []string, wg *sync.WaitGroup, pattern *regexp.Regexp) {
defer wg.Done()
for j := range jobs {
if pattern.MatchString(j) {
x := strings.Split(string(j), "\n")
results <- x
}
}
}
func split(r rune) bool {
return r == ','
}
func main() {
f, err := os.Create("perf.data")
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
pathFlag := flag.String("file", "", `The CSV file to operate on.`)
flag.Parse()
lines := processFile(*pathFlag)
fmt.Println("loaded", len(lines), "records")
}
When I process the file without any regex constraint I am getting a reasonable computing time (I simply load the parsed string into the 2D array without any pattern.MatchString())
Strategy: Parallel, 8 Workers ...
loaded 31033993 records
2018/10/09 11:46:38 readLines took 30.611246035s
Instead, when I run the above code with the Regex constraint I am getting this result:
Strategy: Parallel, 8 Workers ...
loaded 143090 records
2018/10/09 12:04:32 readLines took 1m24.029830907s
MatchString looks for a match anywhere in the string,
so you can drop the anchors and the wildcards.
Wildcards at both ends of a pattern are usually slow in regexp engines.
Here is an example showing this on Go 1.10:
package reggie
import (
"regexp"
"testing"
)
var pattern = regexp.MustCompile(`^.*foo.*$`)
var pattern2 = regexp.MustCompile(`foo`)
func BenchmarkRegexp(b *testing.B) {
for i := 0; i < b.N; i++ {
pattern.MatchString("youfathairyfoobar")
}
}
func BenchmarkRegexp2(b *testing.B) {
for i := 0; i < b.N; i++ {
pattern2.MatchString("youfathairyfoobar")
}
}
$ go test -bench=.
goos: darwin
goarch: amd64
BenchmarkRegexp-4 3000000 471 ns/op
BenchmarkRegexp2-4 20000000 101 ns/op
PASS
ok _/Users/jsandrew/wip/src/reg 4.031s
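If the search term really is a fixed substring such as foo, a regular expression may not be needed at all. The sketch below assumes that is the case and uses strings.Contains instead; the helper name matchLines is made up for illustration, and whether this is faster on your data is worth confirming with a benchmark like the one above.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLines is a hypothetical helper that keeps only the lines
// containing needle as a plain substring (no regexp involved).
func matchLines(lines []string, needle string) []string {
	var out []string
	for _, l := range lines {
		if strings.Contains(l, needle) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	lines := []string{"a,foo,b", "c,d,e", "youfathairyfoobar"}
	fmt.Println(matchLines(lines, "foo"))
}
```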

Matching serialized strings [duplicate]

I am receiving this string "\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80J\x13\x80SQ\x80L\xe0\x80#\x92\x80L?\x80H\xe0" from a function (which runs a GET command on a redis bitmap and gives me the serialized string)
But because of the escape sequences I am having trouble matching this kind of pattern. Can someone please tell me the regex that will match this kind of string?
First, I'd try to find out how to get that same data from Redis in its direct, binary form (as []byte); the rest would be much simpler then.
But to deal with this data in its present form, I would first normalize the input string, replacing all those backslash-escaped hex-encoded characters with their equivalent bytes.
This would make it easy to search for the exact values of these bytes, possibly using backslash-escaped hex-encoded characters in the patterns:
package main
import (
"fmt"
"strconv"
"strings"
)
func main() {
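// Raw string literal: the backslashes stay as literal text, matching the escaped form described above.
s := `\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80J\x13\x80SQ\x80L\xe0\x80#\x92\x80L?\x80H\xe0`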
s, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
panic(err)
}
fmt.Println(strings.Index(s, "\x80SQ\x80L"))
}
Playground link.
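If the data can be obtained as raw bytes, as suggested at the start of this answer, the search needs no unquoting step. This is a minimal sketch under that assumption; findPattern is a hypothetical helper and the sample payload is built by hand for illustration, not read from Redis.

```go
package main

import (
	"bytes"
	"fmt"
)

// findPattern returns the offset of pattern in data, or -1 if it is absent.
func findPattern(data, pattern []byte) int {
	return bytes.Index(data, pattern)
}

func main() {
	// Sample payload for illustration: 0x01, ten zero bytes, then a few more bytes.
	data := append([]byte{0x01}, bytes.Repeat([]byte{0x00}, 10)...)
	data = append(data, "\x80J\x13\x80SQ\x80L\xe0"...)

	fmt.Println(findPattern(data, []byte("\x80SQ\x80L")))
}
```

Working on []byte avoids any question of how invalid UTF-8 sequences are treated, since bytes.Index compares raw bytes.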
An introduction to Redis data types and abstractions
Bitmaps
Bitmaps are not an actual data type, but a set of bit-oriented
operations defined on the String type.
Regular expressions are not the best solution. Write a simple Go function to do the conversion. For example,
package main
import (
"fmt"
"strconv"
)
func redisBits(s string) (string, error) {
s, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
return "", err
}
return s, nil
}
func main() {
s := "\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80J\x13\x80SQ\x80L\xe0\x80#\x92\x80L?\x80H\xe0"
fmt.Printf("%x\n", s)
b, err := redisBits(s)
if err != nil {
fmt.Println(err)
return
}
fmt.Printf("%x\n", b)
}
Playground: https://play.golang.org/p/rbE9iG3tOTx
Output:
0100000000000000000000804a13805351804ce0804092804c3f8048e0
0100000000000000000000804a13805351804ce0804092804c3f8048e0

How can I unit test that a text will appear in the center of the screen?

This is a little script in Go.
package bashutil
import (
"fmt"
"github.com/nsf/termbox-go"
)
func Center(s string) {
if err := termbox.Init(); err != nil {
panic(err)
}
w, _ := termbox.Size()
termbox.Close()
fmt.Printf(
fmt.Sprintf("%%-%ds", w/2),
fmt.Sprintf(fmt.Sprintf("%%%ds", w/2+len(s)/2), s),
)
}
Can I unit test it, and how? I know it seems pointless to test such a small snippet, but suppose I wanted to: how can I check that the output equals what I expect?
Can I test that fmt prints what I expect?
What does "test" mean here?
I think a test needs to check the output of a function.
Your function's output goes to stdout, so we need to capture that output first.
We can do this simply:
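// Requires the imports "io/ioutil", "os" and "testing".
func TestCenter(t *testing.T) {
stdoutBak := os.Stdout
r, w, err := os.Pipe()
if err != nil {
t.Fatal(err)
}
os.Stdout = w
Center("hello")
w.Close()
os.Stdout = stdoutBak
// Read everything Center wrote to the pipe, as a byte slice.
outstr, err := ioutil.ReadAll(r)
if err != nil {
t.Fatal(err)
}
// Compare outstr against the expected text here; for now it is only logged.
t.Logf("%q", outstr)
}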
Thus, you can check output format, spelling, etc.
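Another approach, sketched here as an assumption rather than as part of the answer, is to move the padding logic into a pure function that takes the terminal width as a parameter. The test then needs neither termbox nor a stdout pipe. The name centered is hypothetical, and like the original code it counts bytes with len(s), so it is only accurate for ASCII text.

```go
package main

import "fmt"

// centered is a hypothetical pure helper: it left-pads s so that it sits
// roughly in the middle of a line that is width columns wide.
func centered(s string, width int) string {
	return fmt.Sprintf("%*s", width/2+len(s)/2, s)
}

func main() {
	fmt.Printf("%q\n", centered("hello", 20))
}
```

Center could then obtain the width from termbox and print centered(s, w), leaving only a thin, untested wrapper around the terminal call.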

How do I unit test command line flags in Go?

I would like a unit test that verifies a particular command line flag is within an enumeration.
Here is the code I would like to write tests against:
var formatType string
const (
text = "text"
json = "json"
hash = "hash"
)
func init() {
const (
defaultFormat = "text"
formatUsage = "desired output format"
)
flag.StringVar(&formatType, "format", defaultFormat, formatUsage)
flag.StringVar(&formatType, "f", defaultFormat, formatUsage+" (shorthand)")
}
func main() {
flag.Parse()
}
The desired test would pass only if -format equalled one of the const values given above. This value would be available in formatType. An example correct call would be: program -format text
What is the best way to test the desired behaviors?
Note: Perhaps I have phrased this poorly, but the displayed code is not the unit test itself; it is the code I want to write unit tests against. This is a simple example from the tool I am writing, and I wanted to ask whether there is a good way to test valid inputs to the tool.
Custom testing and processing of flags can be achieved with the flag.Var function in the flag package.
flag.Var "defines a flag with the specified name and usage string. The type and value of the flag are represented by the first argument, of type Value, which typically holds a user-defined implementation of Value."
A flag.Value is any type that satisfies the Value interface, defined as:
type Value interface {
String() string
Set(string) error
}
There is a good example in the example_test.go file in the flag package source
For your use case you could use something like:
package main
import (
"errors"
"flag"
"fmt"
)
type formatType string
func (f *formatType) String() string {
return fmt.Sprint(*f)
}
func (f *formatType) Set(value string) error {
if len(*f) > 0 && *f != "text" {
return errors.New("format flag already set")
}
if value != "text" && value != "json" && value != "hash" {
return errors.New("Invalid Format Type")
}
*f = formatType(value)
return nil
}
var typeFlag formatType
func init() {
typeFlag = "text"
usage := `Format type. Must be "text", "json" or "hash". Defaults to "text".`
flag.Var(&typeFlag, "format", usage)
flag.Var(&typeFlag, "f", usage+" (shorthand)")
}
func main() {
flag.Parse()
fmt.Println("Format type is", typeFlag)
}
This is probably overkill for such a simple example, but may be very useful when defining more complex flag types (The linked example converts a comma separated list of intervals into a slice of a custom type based on time.Duration).
EDIT: In answer to how to run unit tests against flags, the most canonical example is flag_test.go in the flag package source. The section related to testing custom flag variables starts at Line 181.
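As a rough illustration of how such a custom flag type could be exercised from a test, the sketch below parses arguments with a private flag.FlagSet instead of the global flag set, so nothing depends on os.Args. The names format and parseFormat are invented for this sketch, and the validation is a simplified version of the Set method shown earlier.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// format mirrors the idea of the custom flag type; the name is chosen for this sketch.
type format string

func (f *format) String() string { return string(*f) }

// Set accepts only the three known format names.
func (f *format) Set(v string) error {
	switch v {
	case "text", "json", "hash":
		*f = format(v)
		return nil
	}
	return errors.New("invalid format type")
}

// parseFormat parses args with its own FlagSet, so a test can call it
// repeatedly without touching flag.CommandLine or os.Args.
func parseFormat(args []string) (string, error) {
	f := format("text")
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the test output
	fs.Var(&f, "format", "output format")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return string(f), nil
}

func main() {
	fmt.Println(parseFormat([]string{"-format", "json"}))
	fmt.Println(parseFormat([]string{"-format", "xml"}))
}
```

A table-driven test can then call parseFormat with each argument list and assert on the returned value or error.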
You can do this
func main() {
var name string
var password string
flag.StringVar(&name, "name", "", "")
flag.StringVar(&password, "password", "", "")
flag.Parse()
for _, v := range os.Args {
fmt.Println(v)
}
if len(strings.TrimSpace(name)) == 0 || len(strings.TrimSpace(password)) == 0 {
log.Panicln("no name or no password")
}
fmt.Printf("name:%s\n", name)
fmt.Printf("password:%s\n", password)
}
func TestMainApp(t *testing.T) {
os.Args = []string{"test", "-name", "Hello", "-password", "World"}
main()
}
You can test main() by:
Making a test that runs a command
Which then calls the app test binary, built from go test, directly
Passing the desired flags you want to test
Passing back the exit code, stdout, and stderr which you can assert on.
NOTE: This only works when main exits; otherwise the test would run indefinitely or get caught in a recursive loop.
Given your main.go looks like:
package main
import (
"flag"
"fmt"
"os"
)
var formatType string
const (
text = "text"
json = "json"
hash = "hash"
)
func init() {
const (
defaultFormat = "text"
formatUsage = "desired output format"
)
flag.StringVar(&formatType, "format", defaultFormat, formatUsage)
flag.StringVar(&formatType, "f", defaultFormat, formatUsage+" (shorthand)")
}
func main() {
flag.Parse()
fmt.Printf("format type = %v\n", formatType)
os.Exit(0)
}
Your main_test.go may then look something like:
package main
import (
"fmt"
"os"
"os/exec"
"path"
"runtime"
"strings"
"testing"
)
// This will be used to pass args to app and keep the test framework from looping
const subCmdFlags = "FLAGS_FOR_MAIN"
func TestMain(m *testing.M) {
// Only runs when this environment variable is set.
if os.Getenv(subCmdFlags) != "" {
runAppMain()
}
// Run all tests
exitCode := m.Run()
// Clean up
os.Exit(exitCode)
}
func TestMainForCorrectness(tester *testing.T) {
var tests = []struct {
name string
wantCode int
args []string
}{
{"formatTypeJson", 0, []string{"-format", "json"}},
}
for _, test := range tests {
tester.Run(test.name, func(t *testing.T) {
cmd := getTestBinCmd(test.args)
cmdOut, cmdErr := cmd.CombinedOutput()
got := cmd.ProcessState.ExitCode()
// Debug
showCmdOutput(cmdOut, cmdErr)
if got != test.wantCode {
t.Errorf("unexpected exit code: want %d, got %d", test.wantCode, got)
}
})
}
}
// private helper methods.
// Used for running the application's main function from other test.
func runAppMain() {
// The test framework has already processed its own flags,
// so now we can remove them and replace them with the flags we want to pass to main.
// We pull them out of the environment variable we set.
args := strings.Split(os.Getenv(subCmdFlags), " ")
os.Args = append([]string{os.Args[0]}, args...)
// Debug stmt, can be removed
fmt.Printf("\nos args = %v\n", os.Args)
main() // will run and exit, signaling the test framework to stop and return the exit code.
}
// getTestBinCmd returns a command that runs your app (test) binary directly; `TestMain` will be run automatically.
func getTestBinCmd(args []string) *exec.Cmd {
// Call the generated test binary directly
// and have it run the function runAppMain.
cmd := exec.Command(os.Args[0], "-args", strings.Join(args, " "))
// Run in the context of the source directory.
_, filename, _, _ := runtime.Caller(0)
cmd.Dir = path.Dir(filename)
// Set an environment variable
// 1. Only exist for the life of the test that calls this function.
// 2. Passes arguments/flag to your app
// 3. Lets TestMain know when to run the main function.
subEnvVar := subCmdFlags + "=" + strings.Join(args, " ")
cmd.Env = append(os.Environ(), subEnvVar)
return cmd
}
func showCmdOutput(cmdOut []byte, cmdErr error) {
if cmdOut != nil {
fmt.Printf("\nBEGIN sub-command out:\n%v", string(cmdOut))
fmt.Print("END sub-command\n")
}
if cmdErr != nil {
fmt.Printf("\nBEGIN sub-command stderr:\n%v", cmdErr.Error())
fmt.Print("END sub-command\n")
}
}
I'm not sure whether we agree on the term 'unit test'. What you want to achieve seems to me more like an ordinary validation check inside the program. You probably want to do something like this:
func main() {
flag.Parse()
// Reject the value only if it matches none of the allowed formats.
if formatType != text && formatType != json && formatType != hash {
flag.Usage()
return
}
// ...
}
Sadly, it is not easy to extend the flag parser with your own value verifiers, so you have to stick with this for now.
See Intermernet's answer for a solution that defines a custom format type and its validator.

Golang regex replace does nothing

I want to replace any non-alphanumeric character sequences with a dash. A snippet of what I wrote is below. However it does not work and I'm completely clueless why. Could anyone explain me why the snippet behaves not like I expect it to and what would be the correct way to accomplish this?
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
reg, _ := regexp.Compile("/[^A-Za-z0-9]+/")
safe := reg.ReplaceAllString("a*-+fe5v9034,j*.AE6", "-")
safe = strings.ToLower(strings.Trim(safe, "-"))
fmt.Println(safe) // Output: a*-+fe5v9034,j*.ae6
}
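Go regular expressions are not wrapped in delimiters, so the forward slashes in your pattern are treated as literal characters, and your string contains none for them to match. Remove them: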
package main
import (
"fmt"
"log"
"regexp"
"strings"
)
func main() {
reg, err := regexp.Compile("[^A-Za-z0-9]+")
if err != nil {
log.Fatal(err)
}
safe := reg.ReplaceAllString("a*-+fe5v9034,j*.AE6", "-")
safe = strings.ToLower(strings.Trim(safe, "-"))
fmt.Println(safe) // Output: a-fe5v9034-j-ae6
}
Output
a-fe5v9034-j-ae6
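To see that the slashes are taken literally, the following small sketch compares the question's pattern against inputs with and without slashes. The helper matches is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern, compiled exactly as given, matches s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// No "/" in the input, so the delimited pattern cannot match.
	fmt.Println(matches("/[^A-Za-z0-9]+/", "a*-+b"))
	// Here the run of symbols is enclosed in literal slashes.
	fmt.Println(matches("/[^A-Za-z0-9]+/", "a/*-+/b"))
	// Without the slashes the pattern matches the symbols directly.
	fmt.Println(matches("[^A-Za-z0-9]+", "a*-+b"))
}
```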