Learning Elasticsearch: Compound Queries, Query Validation, and Worked Examples

Compound queries

bool

Elasticsearch uses the bool query to combine multiple conditions. A bool query supports the following clauses:

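  • must: documents must match this condition.
  • must_not: documents must not match this condition.
  • should: documents should match this condition. should clauses adjust the relevance score of the results. Note that if the bool query has no must (or filter) clause, a document has to match at least one should clause. If a must clause is present, should clauses do not have to match and serve only to adjust the score.
  • filter: documents must match this condition, but filter does not contribute to the score. It is used purely for filtering.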
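The matching rules in the list above can be modeled in a few lines of Python. This is only a toy sketch of the yes/no matching logic, not how Elasticsearch implements it, and it ignores scoring entirely. The function bool_matches and the predicate names are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Optional[List[Clause]] = None,
    must_not: Optional[List[Clause]] = None,
    should: Optional[List[Clause]] = None,
    filter_: Optional[List[Clause]] = None,
) -> bool:
    """Toy model of whether a document matches a bool query (no scoring)."""
    must, must_not = must or [], must_not or []
    should, filter_ = should or [], filter_ or []

    # Every must and filter clause has to match.
    if not all(clause(doc) for clause in must + filter_):
        return False
    # No must_not clause may match.
    if any(clause(doc) for clause in must_not):
        return False
    # should is only required when there is no must or filter clause.
    if should and not (must or filter_):
        return any(clause(doc) for clause in should)
    return True


# Hypothetical predicates standing in for match, term and range clauses.
name_has_apple = lambda d: "apple" in str(d.get("name", "")).lower().split()
name_has_k = lambda d: "k" in str(d.get("name", "")).lower().split()
num_is_5 = lambda d: d.get("num") == 5
num_in_0_10 = lambda d: isinstance(d.get("num"), int) and 0 <= d["num"] <= 10

doc = {"name": "I eat apple so haochi1~", "num": 1}
print(bool_matches(doc, must=[name_has_apple], must_not=[num_is_5],
                   should=[name_has_k], filter_=[num_in_0_10]))  # True
print(bool_matches(doc, should=[name_has_k]))  # False
```

In the first call the should clause does not match, yet the document is still a hit because a must clause is present. In the second call should is the only clause, so failing it rejects the document.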

The following example shows how to use them:

GET class_1/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ],
      "must_not": [
        {"term": {
          "num": {
            "value": "5"
          }
        }}
      ],
      "should": [
        {"match": {
          "name": "k"
        }}
      ],
      "filter": [
        {"range": {
          "num": {
            "gte": 0,
            "lte": 10
          }
        }}
      ]
    }
  }
}

Response:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

constant_score

A constant_score query gives every matching document a fixed score, set through boost. It is typically used in place of a bool query that contains only filter clauses.

Here is how it is used:

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 6
        }
      },
      "boost": 1.2
    }
  }
}

Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : 1.2,
        "_source" : {
          "name" : "b",
          "num" : 6
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.2,
        "_source" : {
          "name" : "l",
          "num" : 6
        }
      }
    ]
  }
}
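To make the relationship between constant_score and a filter-only bool query concrete, the sketch below builds both request bodies from the same filter clause. The helper names as_constant_score and as_bool_filter are hypothetical. The code only assembles dictionaries and does not talk to a cluster.

```python
import json
from typing import Any, Dict


def as_constant_score(filter_clause: Dict[str, Any], boost: float = 1.0) -> Dict[str, Any]:
    """Wrap one filter clause in a constant_score request body."""
    return {"query": {"constant_score": {"filter": filter_clause, "boost": boost}}}


def as_bool_filter(filter_clause: Dict[str, Any]) -> Dict[str, Any]:
    """Build the filter-only bool query that constant_score stands in for."""
    return {"query": {"bool": {"filter": [filter_clause]}}}


term = {"term": {"num": 6}}
print(json.dumps(as_constant_score(term, boost=1.2), indent=2))
print(json.dumps(as_bool_filter(term), indent=2))
```

Both bodies select the same documents. The difference lies in the score: the constant_score form reports the boost value for every hit, while the filter-only bool form reports 0.0.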

Query validation & analysis

Validation

Elasticsearch provides the /_validate/query endpoint to check whether a query is valid. Note that it only validates the query; it does not execute it.

Example:

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ]
    }
  }
}

Normal response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+name:apple"
    }
  ]
}

Change the name field to name1 and run the request again:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : """+MatchNoDocsQuery("unmapped fields [name1]")"""
    }
  ]
}

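Note that valid is still true here. The problem only shows up in the explanation: the query was rewritten to MatchNoDocsQuery because name1 is an unmapped field, so it would match no documents.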
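Because valid stays true for an unmapped field, a script that checks queries has to look at the explanation text as well. A minimal sketch, assuming the response shape shown above; the helper name no_match_explanations is hypothetical, and the response is pasted in as a Python dict rather than fetched from a cluster.

```python
from typing import Any, Dict, List


def no_match_explanations(validate_response: Dict[str, Any]) -> List[str]:
    """Return the explanations that were rewritten to MatchNoDocsQuery."""
    return [
        item["explanation"]
        for item in validate_response.get("explanations", [])
        if "MatchNoDocsQuery" in item.get("explanation", "")
    ]


# Response copied from the name1 request above.
response = {
    "valid": True,
    "explanations": [
        {
            "index": "class_1",
            "valid": True,
            "explanation": '+MatchNoDocsQuery("unmapped fields [name1]")',
        }
    ],
}

print(no_match_explanations(response))
```

An empty list means no explanation was rewritten to a match-nothing query. A non-empty list points at the field names worth double-checking.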

Analysis

Elasticsearch uses /_validate/query?explain to show how a query is interpreted.

Example:

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple so"
        }}
      ]
    }
  }
}

Response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+(name:apple name:so)"
    }
  ]
}

The explanation is "+(name:apple name:so)": the query text apple so was tokenized into two terms, name:apple and name:so.
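The step from query text to explanation string can be imitated with a toy tokenizer. This is a rough sketch that assumes the name field uses an analyzer which lowercases the text and splits on non-alphanumeric characters. The article does not show the actual mapping, and toy_tokens and toy_match_explanation are hypothetical helpers, not part of Elasticsearch.

```python
import re
from typing import List


def toy_tokens(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def toy_match_explanation(field: str, text: str) -> str:
    """Build an explanation-like string for a match query used as a must clause."""
    terms = [f"{field}:{token}" for token in toy_tokens(text)]
    if not terms:
        return ""
    # A single term stays bare; several terms are grouped as alternatives.
    inner = terms[0] if len(terms) == 1 else "(" + " ".join(terms) + ")"
    return "+" + inner  # the leading + marks the clause as required (must)


print(toy_match_explanation("name", "apple"))     # +name:apple
print(toy_match_explanation("name", "apple so"))  # +(name:apple name:so)
```

The terms inside the parentheses carry no + of their own, which reflects that a match query treats its terms as alternatives: a document containing either apple or so can match.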

Sorting

Default sorting

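In the examples so far, results are sorted by _score in descending order by default, so the best matches come first. Computing _score, however, adds a noticeable cost to every query.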

When _score is not needed, build the query with filter or constant_score instead.

filter example:

GET class_1/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {
          "num": 1
        }}
      ]
    }
  }
}

Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

Every _score in the result is 0.0, which shows that no scoring took place.

constant_score example:

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 1
        }
      },
      "boost": 1.2
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

Here every returned score equals the value given in boost.

Custom sorting

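Custom sorting covers most scenarios. Elasticsearch takes a sort parameter to define the order. It defaults to ascending, and descending has to be requested explicitly: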

  • Ascending
{"sort":["num"]}
  • Descending (desc means descending order)
{"sort":[{"num":{"order":"desc"}}]}

Tips

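  • Elasticsearch implements field sorting on top of doc values, a columnar storage structure.
  • text fields do not build doc values, so you cannot sort on a text field directly.
  • Setting fielddata=true on a text field enables sorting on it, but this is not recommended: sorting on a text field is extremely expensive at query time and seldom gives the ordering you actually want.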
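The shorthand and explicit forms of sort are easy to mix up, so here is a small sketch that expands shorthand entries into the explicit form. The helper normalize_sort is hypothetical and only rewrites the request body on the client side. It relies on the default order being asc for ordinary fields and desc for _score.

```python
from typing import Any, Dict, List, Union

SortEntry = Union[str, Dict[str, Any]]


def normalize_sort(entries: List[SortEntry]) -> List[Dict[str, Any]]:
    """Expand shorthand sort entries into the {"field": {"order": ...}} form."""
    normalized: List[Dict[str, Any]] = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare field name: ascending, except _score which defaults to descending.
            order = "desc" if entry == "_score" else "asc"
            normalized.append({entry: {"order": order}})
        else:
            for field, spec in entry.items():
                if isinstance(spec, str):  # short form such as {"num": "desc"}
                    spec = {"order": spec}
                normalized.append({field: dict(spec)})
    return normalized


print(normalize_sort(["num"]))
print(normalize_sort([{"num": {"order": "desc"}}, "age"]))
```

The first call prints the explicit ascending form of {"sort":["num"]}. The second shows a descending num followed by an ascending age.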

Single-field sorting

GET class_1/_search
{
    "sort": [
        "num"
    ]
}

Response:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年級",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          9223372036854775807
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          9223372036854775807
        ]
      }
    ]
  }
}

The results are sorted by num in ascending order, which is the default.

Now in descending order:

GET class_1/_search
{
    "sort": [
        {"num": {"order":"desc"}}
    ]
}

Response:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年級",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          -9223372036854775808
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          -9223372036854775808
        ]
      }
    ]
  }
}

Now the results are in descending order.
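In both responses the documents without a num field come last, with sort values of 9223372036854775807 (ascending) and -9223372036854775808 (descending). These are the largest and smallest signed 64-bit integers, which is consistent with missing values being sorted last by default. The sketch below imitates that behavior in plain Python; sort_by_long_field is a hypothetical helper and the documents are trimmed-down versions of the ones above.

```python
from typing import Any, Dict, List

LONG_MAX = 2**63 - 1  # 9223372036854775807
LONG_MIN = -(2**63)   # -9223372036854775808


def sort_by_long_field(
    docs: List[Dict[str, Any]], field: str, order: str = "asc"
) -> List[Dict[str, Any]]:
    """Sort docs on a numeric field, placing docs without the field last."""
    descending = order == "desc"
    # A missing value gets the extreme that sorts last for the chosen order.
    placeholder = LONG_MIN if descending else LONG_MAX
    return sorted(docs, key=lambda d: d.get(field, placeholder), reverse=descending)


docs = [
    {"name": "g", "age": 8},
    {"name": "f", "num": 10},
    {"name": "b", "num": 6},
]
print([d["name"] for d in sort_by_long_field(docs, "num")])          # ['b', 'f', 'g']
print([d["name"] for d in sort_by_long_field(docs, "num", "desc")])  # ['f', 'b', 'g']
```

The document g has no num, so it ends up last in both directions, just like the two documents at the end of each response above.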

Multi-field sorting

GET class_1/_search
{
    "sort": [
        "num", "age"
    ]
}

Scroll pagination

Recall the from+size pagination covered earlier. By default Elasticsearch allows from+size to reach at most 10000 results, and pulling larger volumes of data with from+size is very expensive.
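A little arithmetic shows both the limit and the cost. The sketch below is a rough model: it assumes the limit is the default of 10000 (the index.max_result_window setting) and that every shard has to produce its own top from+size hits before the results are merged. The function names are hypothetical.

```python
def fits_result_window(from_: int, size: int, max_result_window: int = 10_000) -> bool:
    """True when a from+size page stays within the result window."""
    return from_ + size <= max_result_window


def hits_collected(from_: int, size: int, shards: int) -> int:
    """Rough count of hits gathered across shards to serve a single page."""
    return (from_ + size) * shards


print(fits_result_window(9_990, 10))       # True
print(fits_result_window(9_991, 10))       # False
print(hits_collected(9_990, 10, shards=3)) # 30000
```

With the 3 shards used in this article, serving the last allowed page of 10 documents means gathering about 30000 candidate hits, and the figure grows with every page.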

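Many real workloads, however, involve huge volumes of data, such as querying system logs. For these, Elasticsearch offers scroll pagination through the /_search/scroll endpoint. It works much like taking a snapshot of the cluster's data (covering every shard) at the moment the query starts and keeping it for a while. Once the configured expiry time has passed, the snapshot is deleted when Elasticsearch is idle.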

Because the query runs against a snapshot, changes made to the data after the snapshot was created are not visible during that scroll.

Initialize the snapshot & keep it for 10 minutes

Example query:

GET class_1/_search?scroll=10m
{
  "query": {
    "match_phrase": {
      "name": "apple"
    }
  },
  "size": 2
}

Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      }
    ]
  }
}

The response contains 2 documents and a snapshot ID (_scroll_id), which later requests use to keep scrolling:

Scroll using the snapshot ID

GET /_search/scroll
{
  "scroll": "10m",
  "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw=="
}

Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

Scroll once more:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [ ]
  }
}

Some readers may wonder how the scrolling advances, since every later request uses the same scroll_id. The results make it clear:

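  • First, a 10-minute snapshot was created with a page size of 2, and the initial request returned 2 documents.
  • Scrolling with the scroll_id then returned 1 document: the snapshot holds only 3 documents and the initial request had already returned 2, so only 1 was left.
  • Scrolling again returned an empty list, because all the data had already been read.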
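The three steps above amount to a simple loop: keep requesting the next page until one comes back empty. The sketch below shows that loop against an in-memory stand-in, so it runs without a cluster. FakeScroll and iter_scroll_hits are hypothetical names. In a real client, the fetch_next callable would send the scroll_id to /_search/scroll and return the parsed response.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def iter_scroll_hits(first_page: Page, fetch_next: Callable[[str], Page]) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page until a scroll request returns no hits."""
    page = first_page
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always pass on the most recently received _scroll_id.
        page = fetch_next(page["_scroll_id"])


class FakeScroll:
    """In-memory stand-in for the server side: a frozen list plus a cursor."""

    def __init__(self, docs: List[Dict[str, Any]], size: int) -> None:
        self._snapshot = list(docs)  # later changes to docs are not visible
        self._size = size
        self._cursor = 0

    def next_page(self, scroll_id: str = "fake-scroll-id") -> Page:
        batch = self._snapshot[self._cursor:self._cursor + self._size]
        self._cursor += len(batch)
        return {"_scroll_id": scroll_id, "hits": {"hits": batch}}


docs = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]
scroll = FakeScroll(docs, size=2)
first = scroll.next_page()
docs.append({"_id": "d"})  # added after the snapshot, so never returned
print([hit["_id"] for hit in iter_scroll_hits(first, scroll.next_page)])  # ['a', 'b', 'c']
```

The position is tracked on the server side, which is why the same scroll_id can return different pages on successive calls. The document added after the snapshot never shows up, matching the visibility rule described earlier.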
