Goroutine Exit and Leaks: How to Detect and Prevent Them

Published: 2023-07-03 12:42:28 Author: compiled by site users

The following article comes from Golang技術分享, written by 機器鈴砍菜刀.

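How goroutines exit

In Go, whether a goroutine finishes (exits) is decided by the goroutine itself. Other goroutines can only ask it to stop by sending it a message; they cannot forcibly terminate a running goroutine from the outside. There is one special case in which running goroutines are terminated because another goroutine ends: the main function returns.

The common ways a goroutine comes to exit in Go are the following.

// Return of the main function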

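package main

import (
	"fmt"
	"time"
)

// G1 sleeps for one second before printing, so main returns first.
func G1() {
	time.Sleep(time.Second)
	fmt.Println("G1 exit")
}

func main() {
	go G1()
	fmt.Println("main exit") // main returns here without waiting for G1
}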

$ go run main.go
main exit

As shown above, the program does not wait for G1 to finish; G1 stops as soon as the main function returns.
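If main should instead wait for G1 to finish, one common option is sync.WaitGroup. The following is a minimal sketch; the helper name runAndWait is hypothetical and not part of the original article.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runAndWait is a hypothetical helper: it starts fn in a new goroutine
// and blocks until fn has returned.
func runAndWait(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // runs even if fn returns early
		fn()
	}()
	wg.Wait()
}

// G1 simulates a short piece of work.
func G1() {
	time.Sleep(100 * time.Millisecond)
	fmt.Println("G1 exit")
}

func main() {
	runAndWait(G1)
	fmt.Println("main exit")
}
```

With this arrangement "G1 exit" is printed before "main exit", because wg.Wait does not return until the goroutine has called wg.Done.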

// Exit requested through a context

package main

import (
	"context"
	"fmt"
	"time"
)

// G1 counts one-second waits until ctx is cancelled.
func G1(ctx context.Context) {
	num := 0
	for {
		select {
		case <-ctx.Done():
			fmt.Println("G1 exit")
			return
		case <-time.After(time.Second):
			num++
			fmt.Printf("G1 wait times: %d\n", num)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go G1(ctx)
	time.Sleep(3 * time.Second)
	cancel()                // ask G1 to stop
	time.Sleep(time.Second) // give G1 time to print its exit message
	fmt.Println("main exit")
}

$ go run main.go
G1 wait times: 1
G1 wait times: 2
G1 exit
main exit

// Abnormal exit through a panic

package main

import (
	"fmt"
	"os"
	"time"
)

func G1() {
	// Recover from the panic so that it does not crash the whole program.
	defer func() {
		if err := recover(); err != nil {
			fmt.Printf("G1 exit by panic: %v\n", err)
		}
	}()

	_, err := os.Open("notExistFile.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println("G1 exit naturally")
}

func main() {
	go G1()
	time.Sleep(time.Second)
	fmt.Println("main exit")
}

$ go run main.go
G1 exit by panic: open notExistFile.txt: no such file or directory
main exit

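In G1 above, the deferred function uses recover to catch the panic. When a panic occurs, this lets the goroutine regain control, so the panic does not propagate to the top of the goroutine's call stack and crash the program.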
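The same recover pattern can be factored into a small wrapper so that a caller learns whether a goroutine ended normally or by panic. This is only a sketch; runRecovered is a hypothetical helper name, not something from the original article or the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// runRecovered is a hypothetical helper: it runs fn in a new goroutine and
// sends on the returned channel either nil (fn returned normally) or an
// error describing the recovered panic.
func runRecovered(fn func()) <-chan error {
	result := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				result <- fmt.Errorf("recovered panic: %v", r)
				return
			}
			result <- nil
		}()
		fn()
	}()
	return result
}

func main() {
	fmt.Println(<-runRecovered(func() { panic(errors.New("boom")) }))
	fmt.Println(<-runRecovered(func() {}))
}
```

The buffered result channel is a deliberate choice: even if the caller never reads the result, the goroutine can still send and exit, so the wrapper itself does not leak.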

// Exit after the work is done

package main

import (
	"fmt"
	"time"
)

func G1() {
	for i := 0; i < 10000; i++ {
		// do some work
	}
	fmt.Println("G1 exit")
}

func main() {
	go G1()
	time.Sleep(time.Second) // long enough for G1 to finish
	fmt.Println("main exit")
}

$ go run main.go
G1 exit
main exit

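Once the task inside the goroutine has finished, the goroutine ends.

What is a goroutine leak

If you start a goroutine and it does not exit when you expect it to, but only ends when the program itself ends, that is a goroutine leak. While a goroutine is leaked, its stack stays allocated and cannot be released, and the heap memory that its functions allocated and still reference cannot be reclaimed by the garbage collector. If such goroutines keep accumulating while the program runs, memory usage rises steadily, available memory shrinks, and the process can eventually crash.

In most cases a goroutine leak has one of two causes: a blocked channel operation, or a goroutine stuck in an infinite loop.

// Blocked channels

1. Reading from a channel that nothing writes to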

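package main

import (
	"fmt"
	"runtime"
	"time"
)

func G1() {
	c := make(chan int)
	go func() {
		<-c // blocks forever: nothing ever sends on c
	}()
	time.Sleep(2 * time.Second)
	fmt.Println("G1 exit")
}

func main() {
	go G1()

	// Print the number of live goroutines once per second.
	c := time.Tick(time.Second)
	for range c {
		fmt.Printf("goroutine [nums]: %d\n", runtime.NumGoroutine())
	}
}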

$ go run main.go
goroutine [nums]: 3
goroutine [nums]: 3
G1 exit
goroutine [nums]: 2
goroutine [nums]: 2
...

2. Writing to a full channel that nothing reads from

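package main

import (
	"flag"
	"fmt"
	"runtime"
	"time"
)

func G2(size int) {
	c := make(chan int, size)
	go func() {
		<-c // reads a single value, then exits
	}()

	go func() {
		// Blocks once the buffer is full and nobody is reading.
		for i := 0; i < 10; i++ {
			c <- i
		}
	}()

	fmt.Println("G2 exit")
}

var size = flag.Int("c", 0, "define channel size")

func main() {
	flag.Parse()
	go G2(*size)

	// Print the number of live goroutines once per second.
	c := time.Tick(time.Second)
	for range c {
		fmt.Printf("goroutine [nums]: %d\n", runtime.NumGoroutine())
	}
}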

$ go run main.go -c 2
G2 exit
goroutine [nums]: 2
goroutine [nums]: 2
...
$ go run main.go -c 11
G2 exit
goroutine [nums]: 1
goroutine [nums]: 1
...
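With a buffer of 2 the writer blocks after the single read and the two buffered slots are used up, so it is leaked; with a buffer of 11 all ten writes fit and both goroutines exit. Relying on the buffer size is fragile, though. A sketch of one alternative, in which the producer closes the channel and the consumer drains it with range, so both exit for any buffer size (produceAndConsume is a hypothetical name):

```go
package main

import (
	"fmt"
	"sync"
)

// produceAndConsume sends the values 0..n-1 over a channel with the given
// buffer size and returns the sum seen by the consumer. Both goroutines
// have exited by the time it returns, whatever the buffer size.
func produceAndConsume(n, size int) int {
	c := make(chan int, size)
	var wg sync.WaitGroup
	sum := 0

	wg.Add(2)
	go func() { // producer
		defer wg.Done()
		defer close(c) // signals "no more values" to the consumer
		for i := 0; i < n; i++ {
			c <- i
		}
	}()
	go func() { // consumer
		defer wg.Done()
		for v := range c { // keeps reading until c is closed
			sum += v
		}
	}()

	wg.Wait() // after this, reading sum is safe
	return sum
}

func main() {
	fmt.Println(produceAndConsume(10, 0))
	fmt.Println(produceAndConsume(10, 2))
}
```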

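// Infinite loops

When the exit condition of a loop can never be reached, the goroutine running it is stuck in an infinite loop, so its resources are never released and it leaks. In real projects, such infinite loops tend to occur in long-running background services.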

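Preventing and detecting goroutine leaks

// Prevention

1. Most importantly, when you create a goroutine you should already know when it will end.

2. For goroutine leaks caused by channels, the key question is whether a goroutine blocked on a channel is blocked normally (temporarily), or may never get a chance to run again. In the latter case a leak is very likely.

In practice channels are commonly used in two models: producer-consumer and master-worker. The usual solution is as follows. When the main goroutine finishes, it notifies the producer goroutine, which cleans up and then exits. Each worker task is given a timeout; when the timeout fires, the worker reports the timeout to the master and the worker goroutine ends. Concretely, this is implemented with context.

3. When writing a loop, be clear about its exit condition so that it cannot loop forever.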

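// Detection

1. The pprof tool that ships with Go.

2. The runtime.NumGoroutine function, which reports the number of goroutines currently alive in the program.

3. Third-party open-source tools and libraries.

gops: https://github.com/google/gops

goleak: https://github.com/uber-go/goleak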
