Runtime optimization for regular expression - regex

Most regular expressions are "constant" for their whole lifetime. Is it a good idea to use a global regular expression to speed up execution? For example:
func work() {
r := regexp.MustCompile(`...`)
if r.MatchString(...) {
...
}
}
compared with:
var r *regexp.Regexp
func work() {
if r.MatchString(...) {
...
}
}
func init() {
r = regexp.MustCompile(`...`)
}
Do these two versions have any meaningful difference?
Regular expression compilation is so cheap that a global regex is not worth using, in terms of both CPU cost and garbage collection (suppose work() is called heavily).
It is better to use a global regular expression whenever appropriate.
Which of the above is correct, or is the answer not simply black and white?

If you use the same regular expression (e.g. "\d+") just once, a global regex is not worth it.
If you use the same regular expression (e.g. "\d+") often, it is worth it, as this benchmark shows:
func Benchmark01(b *testing.B) {
for i := 0; i < b.N; i++ {
r := regexp.MustCompile(`\d+`)
r.MatchString("aaaaaaa123bbbbbbb")
}
}
func Benchmark02(b *testing.B) {
r := regexp.MustCompile(`\d+`)
for i := 0; i < b.N; i++ {
r.MatchString("aaaaaaa123bbbbbbb")
}
}
Benchmark01
Benchmark01-4 886909 1361 ns/op
Benchmark02
Benchmark02-4 5368380 232.8 ns/op
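For a pattern that is reused, a common way to compile it once is a package-level variable initialised with regexp.MustCompile, which avoids a separate init() function. A minimal sketch, where hasDigits is a hypothetical stand-in for work(); a compiled *regexp.Regexp is documented as safe for concurrent use by multiple goroutines:

```go
package main

import (
	"fmt"
	"regexp"
)

// digits is compiled once, during package initialisation.
var digits = regexp.MustCompile(`\d+`)

// hasDigits is a hypothetical stand-in for work(): it only reuses the
// already compiled pattern.
func hasDigits(s string) bool {
	return digits.MatchString(s)
}

func main() {
	fmt.Println(hasDigits("aaaaaaa123bbbbbbb")) // true
	fmt.Println(hasDigits("no digits here"))    // false
}
```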

Related

Pass list of one of two structures to the function

I'm new to Go and couldn't find an intuitive way of doing this.
I have this piece of code:
tx = getTx()
for _, record := range tx.a {
// do a lot with record.Important
}
for _, record := range tx.b {
// do a lot with record.Important
}
for _, record := range tx.c {
// do a lot with record.Important
}
And the following structs:
type Record1 struct {
// fields of Record1
Important string
}
type Record2 struct {
// fields of Record2
Important string
}
type TX struct {
a []Record1
b []Record1
c []Record2
}
The logical next step is to extract the body of each for loop into a function:
func helper(records) { // Here is the problem
// do a lot with record.Important
}
Problem:
records is either a []Record1 or a []Record2. But it looks like union types don't exist in Go. So I thought I could pass a []string into the helper, but I cannot even find an elegant way to get something equivalent to map(lambda r: r.Important, tx.a). There is no higher-order map function and no list comprehension. I am not convinced that a raw for loop is the right way to solve this.
One way to loop over multiple types is to use interfaces together with generics. Have each Record type implement a getter method for the important field. Then declare an interface that includes that getter method in its method set. Then you can make your helper generic by using the interface as the constraint on its type parameter.
func (r Record1) GetImportant() string { return r.Important }
func (r Record2) GetImportant() string { return r.Important }
type ImportantGetter interface {
GetImportant() string
}
func helper[T ImportantGetter](s []T) {
for _, v := range s {
_ = v.GetImportant()
}
}
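To show how the generic helper would be called with the slices from the question, here is a small runnable sketch (it needs Go 1.18 or later). The name collect and the sample values are assumptions made for illustration; collect is a variant of helper that returns the values it visits so the result can be printed.

```go
package main

import "fmt"

type Record1 struct{ Important string }
type Record2 struct{ Important string }

func (r Record1) GetImportant() string { return r.Important }
func (r Record2) GetImportant() string { return r.Important }

type ImportantGetter interface {
	GetImportant() string
}

// TX mirrors the struct from the question.
type TX struct {
	a []Record1
	b []Record1
	c []Record2
}

// collect is a hypothetical variant of helper that returns the values it visits.
func collect[T ImportantGetter](s []T) []string {
	out := make([]string, 0, len(s))
	for _, v := range s {
		out = append(out, v.GetImportant())
	}
	return out
}

func main() {
	tx := TX{
		a: []Record1{{Important: "a1"}},
		b: []Record1{{Important: "b1"}, {Important: "b2"}},
		c: []Record2{{Important: "c1"}},
	}
	// []Record1 and []Record2 are passed directly; T is inferred per call.
	fmt.Println(collect(tx.a), collect(tx.b), collect(tx.c)) // [a1] [b1 b2] [c1]
}
```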
Unless I'm misunderstanding your question, it seems like you want to extract all the values in column X from a set of records and then pass those values in as a slice to some function. I'm basing that assumption on your wish that Go had something like map().
If what you're after is type-agnosticism, you could certainly use an interface approach like the one suggested by mkopriva, but you aren't going to get out of using a for loop: iteration over list types is core to idiomatic Go. If you need a mapping function, you're going to have to write one that performs the mapping you want.
I'd note that you do not need generics to do what mkopriva suggests; you can just use an interface without muddying the waters with generics, though the caller then has to build a []ImportantGetter, because a []Record1 cannot be passed where a []ImportantGetter is expected (Go playground):
package main
import "fmt"
type Record1 struct {
Important string
}
type Record2 struct {
Important string
}
func (r Record1) GetImportant() string { return r.Important }
func (r Record2) GetImportant() string { return r.Important }
type ImportantGetter interface {
GetImportant() string
}
func helper(s []ImportantGetter) {
for _, v := range s {
fmt.Println(v.GetImportant())
}
}
func main() {
records := []ImportantGetter{Record1{Important: "foo"}, Record2{Important: "bar"}}
helper(records)
}
Another approach to type-agnosticism, and one that's a bit more (IMHO) idiomatic for "I expect all of these types to have a common property," is to use struct embedding and type assertions to build up your own Map() function (Go playground):
type CommonFields struct {
Important string
}
type Record1 struct {
CommonFields
FieldSpecificToRecord1 string
}
type Record2 struct {
CommonFields
FieldSpecificToRecord2 int
}
func main() {
r1 := Record1{
CommonFields{Important: "I'm r1!"},
"foo",
}
r2 := Record2{
CommonFields{Important: "I'm r2!"},
5,
}
records := []interface{}{r1, r2, "this is not a valid record type"}
fmt.Println(Map(records))
}
func Map(source []interface{}) []string {
destination := make([]string, len(source))
for i, sourceRecord := range source {
if rr, ok := sourceRecord.(Record1); ok {
destination[i] = rr.Important
} else if rr, ok := sourceRecord.(Record2); ok {
destination[i] = rr.Important
} else {
destination[i] = "undefined"
}
}
return destination
}
You'd likely want to make your implementation of Map() accept an argument specifying the field to extract to conform to what you have in other languages, or possibly even just pass in a helper function which does most of the type-specific value extraction.
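The last suggestion, passing in a function that does the type-specific extraction, could look roughly like this with type parameters (Go 1.18 or later). This is a sketch, not the answer's own code: the generic signature of Map and the sample records are assumptions.

```go
package main

import "fmt"

type Record1 struct{ Important string }
type Record2 struct{ Important string }

// Map applies f to every element of s and returns the results in order.
func Map[T, U any](s []T, f func(T) U) []U {
	out := make([]U, len(s))
	for i, v := range s {
		out[i] = f(v)
	}
	return out
}

func main() {
	a := []Record1{{Important: "foo"}, {Important: "bar"}}
	c := []Record2{{Important: "baz"}}
	// The function literal carries the type-specific field access.
	fmt.Println(Map(a, func(r Record1) string { return r.Important })) // [foo bar]
	fmt.Println(Map(c, func(r Record2) string { return r.Important })) // [baz]
}
```

The loop still exists, but it is written once, and the compiler rejects an element type that the extractor does not accept instead of falling back to "undefined" at run time.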

Not seeing the expected side effects from goroutines

I'm trying to get a grasp on goroutines. Take this code:
package main
import (
"fmt"
"math/rand"
)
var (
b1 []float64
b2 []float64
)
func main() {
go fill(&b1, 10)
go fill(&b2, 10)
fmt.Println(b1,b2)
var s string
fmt.Scanln(&s)
}
func fill(a *[]float64, n int) {
for i:=0; i<n; i++ {
*a = append(*a, rand.Float64()*100)
}
}
As you see, I'm trying to fill two slices. But when run this way (with go fill()), it prints two empty slices. Why is this not working?
Any goroutines you start aren't guaranteed to have finished (or even started!) until you've explicitly waited on them using a sync.WaitGroup, channel, or other mechanism. This works:
package main
import (
"fmt"
"math/rand"
"sync"
)
var (
b1 []float64
b2 []float64
)
func main() {
wg := new(sync.WaitGroup)
wg.Add(2)
go fill(&b1, 10, wg)
go fill(&b2, 10, wg)
wg.Wait()
fmt.Println(b1)
fmt.Println(b2)
}
func fill(a *[]float64, n int, wg *sync.WaitGroup) {
for i := 0; i < n; i++ {
*a = append(*a, rand.Float64()*100)
}
wg.Done()
}
(Just speaking of style, if it were me I'd make this function return the enlarged slice so it's similar to append() itself, and Go's Code Review Comments suggest passing values, though it's not at all unconventional to extend a slice passed as a pointer receiver ("this") parameter.)
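The style suggestion above, having fill return the enlarged slice the way append() does, might look like the following sketch. Using local variables and anonymous goroutines is my assumption about how the caller would be adapted.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// fill appends n random values to a and returns the extended slice,
// mirroring how the built-in append is used.
func fill(a []float64, n int) []float64 {
	for i := 0; i < n; i++ {
		a = append(a, rand.Float64()*100)
	}
	return a
}

func main() {
	var b1, b2 []float64
	var wg sync.WaitGroup
	wg.Add(2)
	// Each goroutine assigns to its own variable, so the two do not race with each other.
	go func() { defer wg.Done(); b1 = fill(b1, 10) }()
	go func() { defer wg.Done(); b2 = fill(b2, 10) }()
	wg.Wait() // once Wait returns, both assignments are visible to main
	fmt.Println(len(b1), len(b2)) // 10 10
}
```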

What is the equivalent of Go's range time.Tick?

I'm new to Rust and currently studying it, coming from Go. How do I implement something like long-running concurrent polling?
// StartGettingWeather initialize weather getter and setter
func StartGettingWeather() {
// start looping
for i := range time.Tick(time.Second * time.Duration(delay)) {
_ = i
loopCounter++
fmt.Println(time.Now().Format(time.RFC850), " counter: ", loopCounter)
mainWeatherGetter()
}
}
I would call this function as go StartGettingWeather().
Rust threads are OS threads scheduled by the operating system, so you can emulate this with thread::sleep (the older thread::sleep_ms is deprecated):
use std::thread;
use std::time::Duration;
fn start_getting_weather() {
let mut loop_counter = 0;
loop {
loop_counter += 1;
println!("counter: {}", loop_counter);
main_weather_getter();
// delay is assumed to be a u64 number of seconds, as in the Go version
thread::sleep(Duration::from_secs(delay));
}
}
thread::spawn(move || start_getting_weather());

Parsing Perl regex with golang

http://play.golang.org/p/GM0SWo0qGs
This is my code and playground.
func insert_comma(input_num int) string {
temp_str := strconv.Itoa(input_num)
var validID = regexp.MustCompile(`\B(?=(\d{3})+$)`)
return validID.ReplaceAllString(temp_str, ",")
}
func main() {
fmt.Println(insert_comma(1000000000))
}
Basically, my desired output is 1,000,000,000.
The regular expression works in JavaScript, but I do not know how to make this Perl-style regex work in Go. I would greatly appreciate any help. Thanks.
Since Go's regexp package does not support lookahead assertions, here is a different algorithm that does not use a regexp:
Perl code:
sub insert_comma {
my $x=shift;
my $l=length($x);
for (my $i=$l%3==0?3:$l%3;$i<length($x);$i+=3) {
substr($x,$i++,0)=',';
}
return $x;
}
print insert_comma(1000000000);
Go code: Disclaimer: I have zero experience with Go, so bear with me if I have errors and feel free to edit my post!
func insert_comma(input_num int) string {
temp_str := strconv.Itoa(input_num)
var result []string
i := len(temp_str) % 3
if i == 0 {
i = 3
}
for index, element := range strings.Split(temp_str, "") {
if i == index {
result = append(result, ",")
i += 3
}
result = append(result, element)
}
return strings.Join(result, "")
}
func main() {
fmt.Println(insert_comma(1000000000))
}
http://play.golang.org/p/7pvo7-3G-s
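The same grouping idea can also be written with slicing and a strings.Builder. The sketch below is a variant of the answer's algorithm, not the answer's code: the name insertComma and the handling of a leading minus sign are my additions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// insertComma groups the decimal digits of n in threes, counting from the right.
func insertComma(n int) string {
	s := strconv.Itoa(n)
	sign := ""
	if strings.HasPrefix(s, "-") {
		// Keep the sign out of the digit grouping.
		sign, s = "-", s[1:]
	}
	var b strings.Builder
	b.WriteString(sign)
	// head is the length of the leading group (1 to 3 digits).
	head := len(s) % 3
	if head == 0 {
		head = 3
	}
	b.WriteString(s[:head])
	// Every remaining group is exactly three digits, preceded by a comma.
	for i := head; i < len(s); i += 3 {
		b.WriteByte(',')
		b.WriteString(s[i : i+3])
	}
	return b.String()
}

func main() {
	fmt.Println(insertComma(1000000000)) // 1,000,000,000
	fmt.Println(insertComma(-1234))      // -1,234
	fmt.Println(insertComma(999))        // 999
}
```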

Go concurrent slice access

I'm doing some stream processing in Go and got stuck trying to figure out how to do this the "Go way" without locks.
This contrived example shows the problem I'm facing.
We get one thing at a time.
There is a goroutine which buffers them into a slice called things.
When things becomes full (len(things) == 100), it is processed somehow and reset.
There are n concurrent goroutines that need to access things before it's full.
Access to the "incomplete" things from other goroutines is not predictable.
Neither doSomethingWithPartial nor doSomethingWithComplete needs to mutate things
Code:
var m sync.Mutex
var count int64
things := make([]int64, 0, 100)
// slices of data are constantly being generated and used
go func() {
for {
m.Lock()
if len(things) == 100 {
// doSomethingWithComplete does not modify things
doSomethingWithComplete(things)
things = make([]int64, 0, 100)
}
things = append(things, count)
m.Unlock()
count++
}
}()
// doSomethingWithPartial needs to access the things before they're ready
for {
m.Lock()
// doSomethingWithPartial does not modify things
doSomethingWithPartial(things)
m.Unlock()
}
I know that slices are immutable, so does that mean I can remove the mutex and expect it to still work? (I assume not.)
How can I refactor this to use channels instead of a mutex?
Edit: Here's the solution I came up with that does not use a mutex
package main
import (
"fmt"
"sync"
"time"
)
func Incrementor() chan int {
ch := make(chan int)
go func() {
count := 0
for {
ch <- count
count++
}
}()
return ch
}
type Foo struct {
things []int
requests chan chan []int
stream chan int
C chan []int
}
func NewFoo() *Foo {
foo := &Foo{
things: make([]int, 0, 100),
requests: make(chan chan []int),
stream: Incrementor(),
C: make(chan []int),
}
go foo.Launch()
return foo
}
func (f *Foo) Launch() {
for {
select {
case ch := <-f.requests:
ch <- f.things
case thing := <-f.stream:
if len(f.things) == 100 {
f.C <- f.things
f.things = make([]int, 0, 100)
}
f.things = append(f.things, thing)
}
}
}
func (f *Foo) Things() []int {
ch := make(chan []int)
f.requests <- ch
return <-ch
}
func main() {
foo := NewFoo()
var wg sync.WaitGroup
wg.Add(10)
for i := 0; i < 10; i++ {
go func(i int) {
time.Sleep(time.Millisecond * time.Duration(i) * 100)
things := foo.Things()
fmt.Println("got things:", len(things))
wg.Done()
}(i)
}
go func() {
for range foo.C {
// do something with things
}
}()
wg.Wait()
}
It should be noted that the "Go way" is probably just to use a mutex for this. It's fun to work out how to do it with a channel but a mutex is probably simpler and easier to reason about for this particular problem.
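For comparison, the mutex version can be wrapped in a small type so that the locking stays in one place. This is only a sketch: Buffer, Add and Snapshot are hypothetical names, and returning a copy from Snapshot is a conservative choice of mine, made because Go slices are mutable and share their backing array with later appends.

```go
package main

import (
	"fmt"
	"sync"
)

// Buffer is a hypothetical type that collects values until it holds limit items.
type Buffer struct {
	mu     sync.Mutex
	things []int
	limit  int
}

// Add appends v. If the buffer was already full, the completed batch is
// returned and a new one is started; otherwise the result is nil.
func (b *Buffer) Add(v int) (full []int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.things) == b.limit {
		full = b.things
		b.things = make([]int, 0, b.limit)
	}
	b.things = append(b.things, v)
	return full
}

// Snapshot returns a copy of the partial batch, so the caller's slice is
// independent of later appends.
func (b *Buffer) Snapshot() []int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return append([]int(nil), b.things...)
}

func main() {
	b := &Buffer{limit: 100}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer
		defer wg.Done()
		for i := 0; i < 250; i++ {
			if full := b.Add(i); full != nil {
				fmt.Println("complete batch:", len(full))
			}
		}
	}()
	go func() { // reader looking at partial batches
		defer wg.Done()
		for i := 0; i < 10; i++ {
			_ = b.Snapshot()
		}
	}()
	wg.Wait()
	fmt.Println("partial batch:", len(b.Snapshot())) // partial batch: 50
}
```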