I'm trying to implement a pool of workers in Go. The go-wiki (and Effective Go, in the Channels section) features excellent examples of bounding resource use: simply make a channel with a buffer as large as the worker pool, fill that channel with workers, and send them back into the channel when they're done. Receiving from the channel blocks until a worker is available. So the channel and a loop are the entire implementation -- very cool!
Alternatively one could block on sending into the channel, but it's the same idea.
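For reference, a minimal sketch of what I mean (identifiers like sem and poolSize are my own, not from Effective Go; this is the send-to-acquire variant):

package main

import (
    "fmt"
    "sync"
)

func main() {
    const poolSize = 3
    sem := make(chan struct{}, poolSize) // the buffer size bounds concurrent work
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        sem <- struct{}{} // blocks while poolSize jobs are already in flight
        go func(n int) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot when done
            fmt.Println("processing job", n)
        }(i)
    }
    wg.Wait()
}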
My question is about changing the size of the worker pool while it's running. I don't believe there's a way to change the size of a channel. I have some ideas, but most of them seem way too complicated. This page actually implements a semaphore using a channel and empty structs in much the same way, but it has the same problem (these things come up all the time while Googling for "golang semaphore").
I would do it the other way round. Instead of spawning many goroutines (which still require a considerable amount of memory) and using a channel to block them, I would model the workers as goroutines and use a channel to distribute the work. Something like this:
package main

import (
    "fmt"
    "sync"
)

type Task string

func worker(tasks <-chan Task, quit <-chan bool, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        select {
        case task, ok := <-tasks:
            if !ok {
                return
            }
            fmt.Println("processing task", task)
        case <-quit:
            return
        }
    }
}

func main() {
    tasks := make(chan Task, 128)
    quit := make(chan bool)
    var wg sync.WaitGroup

    // spawn 5 workers
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go worker(tasks, quit, &wg)
    }

    // distribute some tasks
    tasks <- Task("foo")
    tasks <- Task("bar")

    // remove two workers
    quit <- true
    quit <- true

    // add three more workers
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go worker(tasks, quit, &wg)
    }

    // distribute more tasks
    for i := 0; i < 20; i++ {
        tasks <- Task(fmt.Sprintf("additional_%d", i+1))
    }

    // end of tasks. the workers should quit afterwards
    close(tasks)
    // use "close(quit)", if you do not want to wait for the remaining tasks

    // wait for all workers to shut down properly
    wg.Wait()
}
It might be a good idea to create a separate WorkerPool type with some convenient methods. Also, instead of type Task string it is quite common to use a struct that also contains a done channel, which is used to signal that the task has been executed successfully.
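For illustration, a rough sketch of such a Task type; the field names are made up, not from any particular library:

package main

import "fmt"

// Task carries a payload plus a channel on which completion is reported back.
type Task struct {
    Payload string
    Done    chan error
}

func worker(tasks <-chan Task) {
    for t := range tasks {
        fmt.Println("processing", t.Payload)
        t.Done <- nil // report success (or an error) back to the producer
    }
}

func main() {
    tasks := make(chan Task)
    go worker(tasks)

    t := Task{Payload: "foo", Done: make(chan error, 1)}
    tasks <- t
    if err := <-t.Done; err != nil {
        fmt.Println("task failed:", err)
    }
    close(tasks)
}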
Edit: I've played around a bit more and came up with the following: http://play.golang.org/p/VlEirPRk8V. It's basically the same example, with a nicer API.
A simple change I can think of is to have a channel that controls how big the semaphore is.
The relevant part is the select statement. If there is more work in the queue, process it with the current semaphore. If there is a request to change the size of the semaphore, create a new one of the requested size and continue processing the request queue with it. Note that the old semaphore is going to be garbage collected.
package main

import (
    "fmt"
    "time"
)

type Request struct{ num int }

var quit chan struct{} = make(chan struct{})

func Serve(queue chan *Request, resize chan int, semsize int) {
    sem := make(chan struct{}, semsize) // create the semaphore once, outside the loop
    for {
        var req *Request
        select {
        case semsize = <-resize:
            // Replace the semaphore; the old one is garbage collected once the
            // in-flight handlers holding its slots have finished.
            sem = make(chan struct{}, semsize)
            fmt.Println("changing semaphore size to", semsize)
        case req = <-queue:
            sem <- struct{}{}   // Block until there's capacity to process a request.
            go handle(req, sem) // Don't wait for handle to finish.
        case <-quit:
            return
        }
    }
}

func process(r *Request) {
    fmt.Println("Handled Request", r.num)
}

func handle(r *Request, sem chan struct{}) {
    process(r) // May take a long time & use a lot of memory or CPU
    <-sem      // Done; enable next request to run.
}

func main() {
    workq := make(chan *Request, 1)
    ctrlq := make(chan int)

    go func() {
        for i := 0; i < 20; i++ {
            <-time.After(100 * time.Millisecond)
            workq <- &Request{i}
        }
        <-time.After(500 * time.Millisecond)
        quit <- struct{}{}
    }()

    go func() {
        <-time.After(500 * time.Millisecond)
        ctrlq <- 10
    }()

    Serve(workq, ctrlq, 1)
}
http://play.golang.org/p/AHOLlAv2LH
New to Go. I'm using 1.5.1. I'm trying to accumulate a word list based on an incoming channel. However, my input channel (wdCh) sometimes receives the empty string ("") during testing. I'm perplexed. I'd rather not have to test for the empty string before adding to its count in my map; that feels like a hack to me.
package accumulator

import (
    "fmt"
    "github.com/stretchr/testify/assert"
    "testing"
)

var words map[string]int

func Accumulate(wdCh chan string, closeCh chan bool) {
    words = make(map[string]int)
    for {
        select {
        case word := <-wdCh:
            fmt.Printf("word = %s\n", word)
            words[word]++
        case <-closeCh:
            return
        }
    }
}

func pushWords(w []string, wdCh chan string) {
    for _, value := range w {
        fmt.Printf("sending word = %s\n", value)
        wdCh <- value
    }
    close(wdCh)
}

func TestAccumulate(t *testing.T) {
    sendWords := []string{"one", "two", "three", "two"}
    wMap := make(map[string]int)
    wMap["one"] = 1
    wMap["two"] = 2
    wMap["three"] = 1

    wdCh := make(chan string)
    closeCh := make(chan bool)

    go Accumulate(wdCh, closeCh)
    pushWords(sendWords, wdCh)
    closeCh <- true
    close(closeCh)
    assert.Equal(t, wMap, words)
}
Check out this article about channel axioms. It looks like there's a race between closing wdCh and sending true on the closeCh channel.
So the outcome depends on what gets scheduled first once pushWords returns: the send on closeCh, or Accumulate's next pass through its select.
Once TestAccumulate is blocked sending true on closeCh, both cases of Accumulate's select are ready (pushWords has closed wdCh, so receiving from it succeeds immediately), and select picks one of them at random.
A receive from a closed channel returns the zero value immediately.
So until closeCh is received, Accumulate may randomly put one or more empty "" words in the map.
If Accumulate runs first, it's likely to put many empty strings in the word map, as it keeps looping until TestAccumulate finally sends a signal on closeCh.
An easy fix would be to move
close(wdCh)
after sending true on closeCh. That way wdCh can't return the zero value until after you've signaled on closeCh. Additionally, closeCh <- true blocks because closeCh has no buffer, so wdCh won't get closed until you've guaranteed that Accumulate has stopped looping.
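A minimal sketch of that reordering, reusing Accumulate and the rest of the setup from your test unchanged (so this is an adaptation, not a full drop-in file):

func pushWords(w []string, wdCh chan string) {
    for _, value := range w {
        fmt.Printf("sending word = %s\n", value)
        wdCh <- value
    }
    // note: no close(wdCh) here any more
}

func TestAccumulate(t *testing.T) {
    sendWords := []string{"one", "two", "three", "two"}
    wMap := map[string]int{"one": 1, "two": 2, "three": 1}

    wdCh := make(chan string)
    closeCh := make(chan bool)

    go Accumulate(wdCh, closeCh)
    pushWords(sendWords, wdCh)
    closeCh <- true // unbuffered: returns only once Accumulate has received it and is done looping
    close(wdCh)     // safe now: nothing selects on wdCh any more
    close(closeCh)
    assert.Equal(t, wMap, words)
}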
I think the reason is that when you close a channel, a receive on it in the select will still succeed.
So when you close wdCh in pushWords, the loop in Accumulate will keep receiving the zero value from <-wdCh.
Maybe you should add some code to handle the case where the channel has been closed:
for {
    select {
    case word, ok := <-wdCh:
        if !ok {
            fmt.Println("channel wdCh is closed!")
            continue
        }
        fmt.Printf("word = %s\n", word)
        words[word]++
    case <-closeCh:
        return
    }
}
I'm writing an application that the user can start with a number of "jobs" (URLs, actually). At the beginning (in the main routine), I add these URLs to a queue, then start x goroutines that work on these URLs.
In special cases, the resource a URL points to may contain even more URLs, which have to be added to the queue. The 3 workers are waiting for new jobs to come in and process them. The problem is: once EVERY worker is waiting for a job (and none is producing any), the workers should stop altogether. So either all of them work or no one works.
My current implementation looks something like this, and I don't think it's elegant. Unfortunately I couldn't think of a better way that wouldn't include race conditions, and I'm not entirely sure whether this implementation actually works as intended:
var queue // from somewhere
const WORKER_COUNT = 3
var done chan struct{}

func work(working chan int) {
    absent := make(chan struct{}, 1)
    // if x>1 jobs in sequence are popped, send to "absent" channel only 1 struct.
    // This implementation also assumes that the select statement will be evaluated "in-order"
    // (channel 2 only if channel 1 yields nothing) - is this actually correct?
    // EDIT: It is, according to the specs.
    one := false
    for {
        select {
        case u, ok := <-queue.Pop():
            if !ok {
                close(absent)
                return
            }
            if !one {
                // I have started working (delta + 1)
                working <- 1
                absent <- struct{}{}
                one = true
            }
            // do work with u (which may lead to queue.Push(urls...))
        case <-absent: // no jobs at the moment. consume absent => wait
            one = false
            working <- -1
        }
    }
}

func Start() {
    working := make(chan int)
    for i := 0; i < WORKER_COUNT; i++ {
        go work(working)
    }
    // the amount of actually working workers...
    sum := 0
    for {
        delta := <-working
        sum += delta
        if sum == 0 {
            queue.Close() // close channel -> kill workers.
            done <- struct{}{}
            return
        }
    }
}
Is there a better way to tackle this problem?
You can use a sync.WaitGroup (see docs) to control the lifetime of the workers, and use a non-blocking send so workers can't deadlock when they try to queue up more jobs:
package main

import "sync"

const workers = 4

type job struct{}

func (j *job) do(enqueue func(job)) {
    // do the job, calling enqueue() for subtasks as needed
}

func main() {
    jobs, wg := make(chan job), new(sync.WaitGroup)
    var enqueue func(job)

    // workers
    for i := 0; i < workers; i++ {
        go func() {
            for j := range jobs {
                j.do(enqueue)
                wg.Done()
            }
        }()
    }

    // how to queue a job
    enqueue = func(j job) {
        wg.Add(1)
        select {
        case jobs <- j: // another worker took it
        default: // no free worker; do the job now
            j.do(enqueue)
            wg.Done()
        }
    }

    todo := make([]job, 1000)
    for _, j := range todo {
        enqueue(j)
    }
    wg.Wait()
    close(jobs)
}
The difficulty with trying to avoid deadlocks with a buffered channel is that you have to allocate a big enough channel up front to definitely hold all pending tasks without blocking. That's problematic unless, say, you have a small and known number of URLs to crawl.
When you fall back to doing ordinary recursion in the current goroutine, you don't have that static buffer-size limit. Of course, there are still limits: you'd probably run out of RAM if too much work were pending, and in theory you could exhaust the stack with deep recursion (but that's hard!). So you'd need to track pending tasks in some more sophisticated way if you were, say, crawling the Web at large.
Finally, as a more complete example, I'm not super proud of this code, but I happened to write a function to kick off a parallel sort that's recursive in the same way your URL fetching is.
This is a function I wrote that adds a request to a request queue:
func (self *RequestQueue) addRequest(request *Request) {
    self.requestLock.Lock()
    self.queue[request.NormalizedUrl()] = request.ResponseChannel
    self.requestLock.Unlock()
}
and this is one of its tests:
func TestAddRequest(t *testing.T) {
    before := len(rq.queue)
    r := SampleRequests(1)[0]
    rq.addRequest(&r)
    if (len(rq.queue) - 1) != before {
        t.Errorf("Failed to add request to queue")
    }
}
When I run this test, the application hangs. If I comment out this test, everything works fine.
I think the problem is the locking inside the function. Is there something that I'm doing wrong?
Thanks for your help!
The problem was an infinite loop in the SampleRequests() function:
func SampleRequests(num int) []Request {
    requests := make([]Request, num, num+10)
    for i := 0; i < len(requests); i++ {
        r := NewRequest("GET", "http://api.openweathermap.org/data/2.5/weather", nil)
        r.Params.Set("lat", "35")
        r.Params.Add("lon", "139")
        r.Params.Add("units", "metric")
        requests = append(requests, r)
    }
    return requests
}
I was checking whether i was less than the length of the slice in the continuation condition of the for loop. But with each iteration an item was appended to the slice, so the length increased and the for loop kept executing.
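For completeness, a sketch of one way to fix it (assuming the same NewRequest helper from my code); the loop now bounds on num instead of on the growing slice:

func SampleRequests(num int) []Request {
    requests := make([]Request, 0, num) // length 0, so append fills exactly num slots
    for i := 0; i < num; i++ {          // bound on num, not on the growing slice
        r := NewRequest("GET", "http://api.openweathermap.org/data/2.5/weather", nil)
        r.Params.Set("lat", "35")
        r.Params.Add("lon", "139")
        r.Params.Add("units", "metric")
        requests = append(requests, r)
    }
    return requests
}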
This is doing my head in, I can't figure out how to solve it:
I want to have a fixed number N of goroutines running in parallel.
From a never-ending queue I will fetch X messages describing jobs to process.
I want to let the N goroutines process these X jobs, and as soon as one of the routines has nothing more to do, I want to fetch another X jobs from the never-ending queue.
The code in the answer below (see URL) works brilliantly to process the tasks, but the workers will die once the task list is empty. I want them to stay alive and somehow notify the main code that they are out of work, so I can fetch more jobs to fill the task list.
How would you define a pool of goroutines to be executed at once in Golang?
Using user Jsor's example code from below, I tried to create a simple program, but I am confused.
import (
    "fmt"
    "strconv"
)

//workChan - read only that delivers work
//requestChan - ??? what is this
func Worker(myid string, workChan <-chan string, requestChan chan<- struct{}) {
    for {
        select {
        case work := <-workChan:
            fmt.Println("Channel: " + myid + " do some work: " + work)
        case requestChan <- struct{}{}:
            //hm? how is the requestChan used?
        }
    }
}

func Logic() {
    workChan := make(chan string)
    requestChan := make(chan struct{})

    //Create the workers
    for i := 1; i < 5; i++ {
        Worker(strconv.Itoa(i), workChan, requestChan)
    }

    //Give the workers some work
    for i := 100; i < 115; i++ {
        workChan <- "workid" + strconv.Itoa(i)
    }
}
This is what the select statement is for.
func Worker(workChan <-chan Work, requestChan chan<- struct{}) {
    for {
        select {
        case work := <-workChan:
            // Do work
        case requestChan <- struct{}{}:
        }
    }
}
This worker will run forever and ever. If work is available, it will pull it from the worker channel. If there's nothing left it will send a request.
Note that since it runs forever and ever, if you want to be able to kill a worker you need to do something else. One possibility is to always check ok when receiving from workChan, and quit the function if that channel is closed. Another option is to use an individual quit channel for each worker.
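A rough sketch of the first option (checking ok), assuming the same Work type and channels as above:

func Worker(workChan <-chan Work, requestChan chan<- struct{}) {
    for {
        select {
        case work, ok := <-workChan:
            if !ok {
                return // workChan was closed: shut this worker down
            }
            _ = work // do the actual work here
        case requestChan <- struct{}{}:
            // the dispatcher accepted our request for more work
        }
    }
}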
Compared to the other solution you posted, you just need (first) not to close the channel, and to keep feeding items into it.
Then you need to answer the following question: (a) is it absolutely necessary that you fetch the next X items from your queue only once one of the workers has "nothing more to do" (or, equivalently, once the first X items are either fully processed or assigned to a worker); or (b) is it okay if you keep the second set of X items in memory and feed them to the workers as new work items are needed?
As I understand it, only (a) needs the requestChan you’re wondering about (see below). For (b), something as simple as the following would suffice:
// Version (b)
type WorkItem int

const (
    N = 5  // Number of workers
    X = 15 // Number of work items to get from the infinite queue at once
)

func Worker(id int, workChan <-chan WorkItem) {
    for {
        item := <-workChan
        doWork(item)
        fmt.Printf("Worker %d processes item #%v\n", id, item)
    }
}

func Dispatch(workChan chan<- WorkItem) {
    for {
        items := GetNextFromQueue(X)
        for _, item := range items {
            workChan <- item
            fmt.Printf("Dispatched item #%v\n", item)
        }
    }
}

func main() {
    workChan := make(chan WorkItem) // Shared amongst all workers; could make it buffered if GetNextFromQueue() is slow.

    // Start N workers.
    for i := 0; i < N; i++ {
        go Worker(i, workChan)
    }

    // Dispatch items to the workers.
    go Dispatch(workChan)

    time.Sleep(20 * time.Second) // Ensure main(), and our program, finish.
}
(I’ve uploaded to the Playground a full working solution for (b).)
As for (a), the workers change to say: do work, or if there’s no more work, tell the dispatcher to get more via the reqChan communication channel. That “or” is implemented via select. Then, the dispatcher waits on reqChan before making another call to GetNextFromQueue(). It’s more code, but ensures the semantics that you might be interested in. (The previous version is overall simpler, though.)
// Version (a)
func Worker(id int, workChan <-chan WorkItem, reqChan chan<- int) {
    for {
        select {
        case item := <-workChan:
            doWork(item)
            fmt.Printf("Worker %d processes item #%v\n", id, item)
        case reqChan <- id:
            fmt.Printf("Worker %d thinks they requested more work\n", id)
        }
    }
}

func Dispatch(workChan chan<- WorkItem, reqChan <-chan int) {
    for {
        items := GetNextFromQueue(X)
        for _, item := range items {
            workChan <- item
            fmt.Printf("Dispatched item #%v\n", item)
        }
        id := <-reqChan
        fmt.Printf("Polling the queue in Dispatch() at the request of worker %d\n", id)
    }
}
(I’ve also uploaded to the Playground a full working solution for (a).)
I'm writing a function where I'm trying to increment a channel. In a much larger program, this is not working and it actually hangs on a line that looks like:
current = <-channel
The go funcs are running, but the program seems to halt on this line.
I tried to write a smaller SSCCE, but now I'm having a different problem. Here it is:
package main

import (
    "fmt"
)

func main() {
    count := make(chan int)
    go func(count chan int) {
        current := 0
        for {
            current = <-count
            current++
            count <- current
            fmt.Println(count)
        }
    }(count)
}
However, in the above, the go func does not actually seem to be called at all. If I put a fmt.Println statement before for {, it does not print out. If I put fmt.Println statements before or after the go func block, they both print out.
Why does the self-calling block in the above example not seem to run at all?
If it were running, why would it block on current = <-count? How could I properly increment the channel?
I can't answer the first issue without more info. The code you did show has two problems. First, the program exits right after the goroutine is started. Second, the goroutine is waiting for something to be sent on count, so main has to send a value into the channel before it can receive one; just receiving will deadlock.
Here is an example showing the deadlock (http://play.golang.org/p/cRgjZt7U2A):
package main

import (
    "fmt"
)

func main() {
    count := make(chan int)
    go func() {
        current := 0
        for {
            current = <-count
            current++
            count <- current
            fmt.Println(count)
        }
    }()
    fmt.Println(<-count)
}
Here is an example of it working the way I think you are expecting (http://play.golang.org/p/QQnRpCDODu)
package main

import (
    "fmt"
)

func main() {
    count := make(chan int)
    go func() {
        current := 0
        for {
            current = <-count
            current++
            count <- current
            fmt.Println(count)
        }
    }()
    count <- 1
    fmt.Println(<-count)
}
Channel: a channel does not store values permanently; it can only buffer values in transit, so its basic use is to send and receive values. So when you declare count := make(chan int) it does not contain any value, and the statement current = <-count blocks, eventually giving you the error that all goroutines are asleep. Channels were designed as a way for different goroutines to communicate, and your main function runs in a goroutine of its own.
So the answers to your questions are:
1. Why does the self-calling block in the above example not seem to run at all?
Answer - The goroutine runs concurrently with main, so if your main function finishes executing before your goroutine does, you will never get a result from the goroutine, because the program exits as soon as main returns. So I am providing a web example related to your example of incrementing a counter. In this example you create a server and listen on port 8000. Run it, go to your web browser, type localhost:8000, and it will show you the incrementing counter that the channel buffers. This example should give you an idea of how channels work.
2. If it were running, why would it block on current = <-count? How could I properly increment the channel?
Answer - You are receiving from the channel, but the channel does not have anything in its buffer, so you will get the error "all goroutines are asleep - deadlock!". First you should send a value into the channel and then receive it correspondingly; otherwise it will deadlock again.
package main

import (
    "fmt"
    "net/http"
)

type webCounter struct {
    count chan int
}

func NewCounter() *webCounter {
    counter := new(webCounter)
    counter.count = make(chan int, 1)
    go func() {
        for i := 1; ; i++ {
            counter.count <- i
        }
    }()
    return counter
}

func (w *webCounter) ServeHTTP(r http.ResponseWriter, rq *http.Request) {
    if rq.URL.Path != "/" {
        r.WriteHeader(http.StatusNotFound)
        return
    }
    fmt.Fprintf(r, "You are visitor %d", <-w.count)
}

func main() {
    http.ListenAndServe(":8000", NewCounter())
}