Notes on a pitfall with Golang's strings.Split()

Posted on 2022-05-29 by WalkonNet

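Background

At work, when we need to split a string into a slice of strings on some separator, we usually reach for strings.Split().

I recently hit a pitfall while using it. Afterwards I analyzed the cause and collected the other pitfalls you may run into with strings.Split. This post is a retrospective and a write-up to share.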

Scenario

I needed to read a field from a struct and split it on ",". The overall logic looked roughly like this:

type Info struct{
   Ids string // Ids: 123,456
}

func test3(info Info){
   ids := info.Ids
   idList := strings.Split(ids , ",")
   if len(idList) < 1 {
      return
   }
   log.Println("ids-not-empty")
   // ***
}

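When ids = "", the console printed ids-not-empty. I was baffled at the time: by rights the function should have hit the return. This piqued my curiosity, so I decided to investigate properly.

Prerequisites

Before investigating, here is a quick look at the basic structure of a string in Go.

At runtime, a Go string is represented by a small header whose layout is described by reflect.StringHeader. It has the following shape: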

type stringHeader struct {
   Data unsafe.Pointer
   Len  int
}

Data is a pointer to the underlying byte array, and Len is the length of the string in bytes.
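As a small illustration of the Len field, the sketch below uses the built-in len, which reports the byte length of a string, and compares it with the rune count. The helper name byteAndRuneLen and the sample strings are made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen (a hypothetical helper) returns the byte length of s,
// which is what the Len field holds, and the number of Unicode code points.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(byteAndRuneLen(""))     // 0 0
	fmt.Println(byteAndRuneLen("123"))  // 3 3
	fmt.Println(byteAndRuneLen("你好")) // 6 2
}
```

An empty string therefore has Len 0, which matters for the rest of the investigation.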

Investigation

Verification

Since the if condition in the code evaluated to false, let's print the actual length of idList and see.

func test3(info Info) {
    ids := info.Ids
    idList := strings.Split(ids, ",")
    log.Printf("idList length: [%d], idList: [%v]", len(idList), idList)
    for index := range idList {
       log.Printf("idList[%d]:[%v]", index, idList[index])
    }
    // ***
}
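To see the behavior in isolation, here is a minimal, self-contained reproduction. The helper name describeSplit is made up for this sketch; it reports the same information as the log line above, but uses %q so that empty strings are visible.

```go
package main

import (
	"fmt"
	"strings"
)

// describeSplit (a hypothetical helper) splits ids on "," and reports
// the length and the quoted contents of the result.
func describeSplit(ids string) string {
	idList := strings.Split(ids, ",")
	return fmt.Sprintf("len=%d elems=%q", len(idList), idList)
}

func main() {
	fmt.Println(describeSplit(""))        // len=1 elems=[""]
	fmt.Println(describeSplit("123,456")) // len=2 elems=["123" "456"]
}
```

With %v the single empty-string element prints as [], which is easy to mistake for an empty slice; %q makes it explicit.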

Printing the underlying information

Growing more curious, let's print the low-level details of ids and idList.

const (
  basePrintInfoV3 = "%s string pointer address:[%v], string buf array address:[%v], Len field address:[%p], Len field value:[%v]"
  basePrintInfoV2 = "%s slice pointer address:[%p], slice array address:[%p], Len field address:[%p], Len field value:[%v]"
)

func test3(info Info) {
  ids := info.Ids
  idList := strings.Split(ids, ",")
  getStringPtr("ids ", &ids)
  getStringSliceAllPtr("idList ", &idList)
  // ***
}
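// getStringPtr prints the header fields of the string that str points to.
func getStringPtr(name string, str *string) {
   s2 := (*reflect.StringHeader)(unsafe.Pointer(str))
   log.Printf(basePrintInfoV3, name, unsafe.Pointer(str), unsafe.Pointer(s2.Data), unsafe.Pointer(&s2.Len), s2.Len)
}

// getStringSliceAllPtr prints the header fields of the slice that s1 points to.
func getStringSliceAllPtr(name string, s1 *[]string) {
   // A slice header is a reflect.SliceHeader (Data, Len, Cap), not a StringHeader.
   s2 := (*reflect.SliceHeader)(unsafe.Pointer(s1))
   // Pass s1 itself: &s1 would be the address of the local pointer variable.
   log.Printf(basePrintInfoV2, name, unsafe.Pointer(s1), unsafe.Pointer(s2.Data), unsafe.Pointer(&s2.Len), s2.Len)
}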

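Reading the source

The slice produced by splitting ids is not what I expected, so the Split source must be handling this case in some special way. Let's trace the source.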

func Split(s, sep string) []string { return genSplit(s, sep, 0, -1) }

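A quick read of the source makes the approach of genSplit clear:

Work out in advance that s can be split into n pieces
Create a slice of length n
Walk through s, placing each piece into the slice
Return the slice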

func genSplit(s, sep string, sepSave, n int) []string {
   if n == 0 {
      return nil
   }
   if sep == "" {
      return explode(s, n)
   }
   if n < 0 {
      // count how many pieces s splits into on sep
      n = Count(s, sep) + 1
   }

   a := make([]string, n)
   n--
   i := 0
   for i < n {
      // find the position of the first sep in s
      m := Index(s, sep)
      if m < 0 {
         break
      }
      // store the piece in the result slice
      a[i] = s[:m+sepSave]
      // cut the consumed part off s
      s = s[m+len(sep):]
      i++
   }
   a[i] = s
   return a[:i+1]
}

So the answer should lie in the Count function.

Following it in: Count computes how many non-overlapping occurrences of substr the string s contains.

func Count(s, substr string) int {
   // special case
   if len(substr) == 0 {
      return utf8.RuneCountInString(s) + 1
   }
   if len(substr) == 1 {
      return bytealg.CountString(s, substr[0])
   }
   n := 0
   for {
      i := Index(s, substr)
      if i == -1 {
         return n
      }
      n++
      s = s[i+len(substr):]
   }
}

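Here Count takes the len(substr) == 1 branch, where CountString counts how many times the byte substr[0] occurs in s. Stepping through it, the result was 0, which is as expected.

Combined with n = Count(s, sep) + 1 in genSplit, we can see that the slice pre-allocated in genSplit has length 0 + 1 = 1. Since no separator is found, the loop body never runs, a[0] is set to the remaining s (the empty string), and a[:1] is returned, that is, []string{""}. Mystery solved.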
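This relationship can be checked directly. The sketch below compares len(strings.Split(s, sep)) with strings.Count(s, sep) + 1; the function name piecesVsCount is made up for illustration, and it assumes sep is non-empty.

```go
package main

import (
	"fmt"
	"strings"
)

// piecesVsCount (a hypothetical helper) returns the number of pieces
// Split produces and the value Count(s, sep) + 1 that genSplit uses to
// size its result slice. It assumes sep is non-empty.
func piecesVsCount(s, sep string) (pieces, countPlusOne int) {
	return len(strings.Split(s, sep)), strings.Count(s, sep) + 1
}

func main() {
	fmt.Println(piecesVsCount("", ","))        // 1 1
	fmt.Println(piecesVsCount("123,456", ",")) // 2 2
	fmt.Println(piecesVsCount("a,,b,", ","))   // 4 4
}
```

For a non-empty separator, the result always has one more element than there are separators, so it can never be empty.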

Similar cases

After some research, here is a summary of other pitfalls you may run into with strings.Split.

s := strings.Split("", "")
fmt.Println(s, len(s)) // [] 0 // returns an empty slice

s = strings.Split("abc,abc", "")
fmt.Println(s, len(s)) // [a b c , a b c] 7 // returns 7 elements

s = strings.Split("", ",")
fmt.Println(s, len(s)) // [] 1 // one element: the empty string

s = strings.Split("abc,abc", ",")
fmt.Println(s, len(s)) // [abc abc] 2

s = strings.Split("abc,abc", "|")
fmt.Println(s, len(s)) // [abc,abc] 1

fmt.Println(len("")) // 0
fmt.Println(len([]string{""})) // 1 

str := ""
fmt.Println(str[0]) // panic
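If empty pieces are never wanted, two common ways to avoid the surprise are sketched below: guard against the empty string before calling strings.Split, or use strings.FieldsFunc, which drops empty fields and returns an empty slice for an empty input. The helper names splitNonEmpty and splitFields are hypothetical and not part of the original code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNonEmpty (hypothetical) returns nil for an empty input; otherwise
// it behaves like strings.Split, so "a,,b" still yields an empty middle
// element.
func splitNonEmpty(s, sep string) []string {
	if s == "" {
		return nil
	}
	return strings.Split(s, sep)
}

// splitFields (hypothetical) splits on commas and discards empty fields
// entirely.
func splitFields(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool { return r == ',' })
}

func main() {
	fmt.Println(len(splitNonEmpty("", ",")))     // 0
	fmt.Println(len(splitNonEmpty("a,,b", ","))) // 3
	fmt.Println(len(splitFields("")))            // 0
	fmt.Println(len(splitFields("a,,b")))        // 2
}
```

Which one fits depends on whether an empty element between two separators is meaningful in your data.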

Summary

This small pitfall turned out to involve a bit of a detour: simply reading the source would have been enough, haha.
