postgresql 中的 like 查詢優化方案
當時數量量比較龐大的時候,做模糊查詢效率很慢,為瞭優化查詢效率,嘗試如下方法做效率對比
一、對比情況說明:
1、數據量100w條數據
2、執行sql
二、對比結果
explain analyze SELECT c_patent, c_applyissno, d_applyissdate, d_applydate, c_patenttype_dimn, c_newlawstatus, c_abstract FROM public.t_knowl_patent_zlxx_temp WHERE c_applicant LIKE '%本溪滿族自治縣連山關鎮安平安養殖場%';
1、未建索時執行計劃:
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3) Filter: ((c_applicant)::text ~~ '%本溪滿族自治縣連山關鎮安平安養殖場%'::text) Rows Removed by Filter: 333333 Planning time: 0.272 ms Execution time: 228.116 ms"
2、btree索引
建索引語句
CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
執行計劃
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3) Filter: ((c_applicant)::text ~~ '%本溪滿族自治縣連山關鎮安平安養殖場%'::text) Rows Removed by Filter: 333333 Planning time: 0.116 ms Execution time: 218.189 ms"
但是如果將查詢sql稍微改動一下,把like查詢中的前置%去掉是這樣的
Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1) Index Cond: (((c_applicant)::text ~>=~ '本溪滿族自治縣連山關鎮安平安養殖場'::text) AND ((c_applicant)::text ~<~ '本溪滿族自治縣連山關鎮安平安養殖圻'::text)) Filter: ((c_applicant)::text ~~ '本溪滿族自治縣連山關鎮安平安養殖場%'::text) Planning time: 0.710 ms Execution time: 0.378 ms
3、gin索引
創建索引語句(postgresql要求在9.6版本及以上)
create extension pg_trgm; CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);
執行計劃
Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1) Recheck Cond: ((c_applicant)::text ~~ '%本溪滿族自治縣連山關鎮安平安養殖場%'::text) -> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1) Index Cond: ((c_applicant)::text ~~ '%本溪滿族自治縣連山關鎮安平安養殖場%'::text) Planning time: 0.673 ms Execution time: 0.740 ms
三、結論
btree索引可以讓後置% “abc%”的模糊匹配走索引,gin + gp_trgm可以讓前後置% “%abc%” 走索引。但是gin 索引也有弊端,以下情況可能導致無法命中:
搜索字段少於3個字符時,不會命中索引,這是gin自身機制導致。
當搜索字段過長時,比如email檢索,可能也不會命中索引,造成原因暫時未知。
補充:PostgreSQL LIKE 查詢效率提升實驗
一、未做索引的查詢效率
作為對比,先對未索引的查詢做測試
EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1) Filter: ((author)::text = '曹志耘'::text) Rows Removed by Filter: 71315 Planning time: 0.194 ms Execution time: 39.879 ms (5 rows) Time: 40.599 ms EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1) Filter: ((author)::text ~~ '曹志耘'::text) Rows Removed by Filter: 71315 Planning time: 0.188 ms Execution time: 41.669 ms (5 rows) Time: 42.457 ms EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1) Filter: ((author)::text ~~ '曹志耘%'::text) Rows Removed by Filter: 71315 Planning time: 0.307 ms Execution time: 41.633 ms (5 rows) Time: 42.676 ms
很顯然都會做全表掃描
二、創建btree索引
PostgreSQL默認索引是btree
CREATE INDEX ix_gallery_map_author ON gallery_map (author); EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.457..1.312 rows=1031 loops=1) Recheck Cond: ((author)::text = '曹志耘'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.358..0.358 rows=1031 loops=1) Index Cond: ((author)::text = '曹志耘'::text) Planning time: 0.416 ms Execution time: 1.422 ms (7 rows) Time: 2.462 ms EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.752..2.119 rows=1031 loops=1) Filter: ((author)::text ~~ '曹志耘'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.560..0.560 rows=1031 loops=1) Index Cond: ((author)::text = '曹志耘'::text) Planning time: 0.270 ms Execution time: 2.295 ms (7 rows) Time: 3.444 ms EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..41.389 rows=1031 loops=1) Filter: ((author)::text ~~ '曹志耘%'::text) Rows Removed by Filter: 71315 Planning time: 0.260 ms Execution time: 41.518 ms (5 rows) Time: 42.430 ms EXPLAIN ANALYZE select * from gallery_map where author like '%研究室'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=2282 width=621) (actual time=0.064..52.824 rows=2152 loops=1) Filter: ((author)::text ~~ '%研究室'::text) Rows Removed by Filter: 70194 Planning time: 0.254 ms Execution time: 53.064 ms (5 rows) Time: 53.954 ms
可以看到,等於、like的全匹配是用到索引的,like的模糊查詢還是全表掃描
三、創建gin索引
CREATE EXTENSION pg_trgm; CREATE INDEX ix_gallery_map_author ON gallery_map USING gin (author gin_trgm_ops); EXPLAIN ANALYZE select * from gallery_map where author like '曹%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=19.96..2705.69 rows=1028 width=621) (actual time=0.419..1.771 rows=1031 loops=1) Recheck Cond: ((author)::text ~~ '曹%'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..19.71 rows=1028 width=0) (actual time=0.312..0.312 rows=1031 loops=1) Index Cond: ((author)::text ~~ '曹%'::text) Planning time: 0.358 ms Execution time: 1.916 ms (7 rows) Time: 2.843 ms EXPLAIN ANALYZE select * from gallery_map where author like '%耘%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..51.641 rows=1031 loops=1) Filter: ((author)::text ~~ '%耘%'::text) Rows Removed by Filter: 71315 Planning time: 0.268 ms Execution time: 51.957 ms (5 rows) Time: 52.899 ms EXPLAIN ANALYZE select * from gallery_map where author like '%研究室%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=31.83..4788.42 rows=2559 width=621) (actual time=0.914..4.195 rows=2402 loops=1) Recheck Cond: ((author)::text ~~ '%研究室%'::text) Heap Blocks: exact=868 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..31.19 rows=2559 width=0) (actual time=0.694..0.694 rows=2402 loops=1) Index Cond: ((author)::text ~~ '%研究室%'::text) Planning time: 0.306 ms Execution time: 4.403 ms (7 rows) Time: 5.227 ms
gin_trgm索引的效果好多瞭
由於pg_trgm的索引是把字符串切成多個3元組,然後使用這些3元組做匹配,所以gin_trgm索引對於少於3個字符(包括漢字)的查詢,隻有前綴匹配會走索引
另外,還測試瞭btree_gin,效果和btree一樣
註意:
gin_trgm要求數據庫必須使用UTF-8編碼
demo_v1 # \l demo_v1 List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ---------+-----------+----------+-------------+-------------+------------------- demo_v1 | wmpp_user | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
以上為個人經驗,希望能給大傢一個參考,也希望大傢多多支持WalkonNet。如有錯誤或未考慮完全的地方,望不吝賜教。
推薦閱讀:
- postgresql insert into select無法使用並行查詢的解決
- 淺談pg_hint_plan定制執行計劃
- 解決postgresql insert into select無法使用並行查詢的問題
- postgresql模糊匹配大殺器(推薦)
- PostgreSQL 對IN,EXISTS,ANY/ALL,JOIN的sql優化方案