Why MySQL kill Sometimes Fails to Terminate a Thread

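Background

In day-to-day operation you will occasionally see a few connections, or a large number of them, piling up in MySQL. The usual response is to run the kill command to forcibly terminate these long-running connections, freeing connection slots and CPU on the database server as quickly as possible.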

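Problem description

When you actually run the kill command, you sometimes find that the connection is not killed right away. It still shows up in the processlist, but its Command is shown as Killed rather than the usual Query, Execute, and so on. For example: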

mysql> show processlist;
+----+------+--------------------+--------+---------+------+--------------+---------------------------------+
| Id | User | Host               | db     | Command | Time | State        | Info                            |
+----+------+--------------------+--------+---------+------+--------------+---------------------------------+
| 31 | root | 192.168.1.10:50410 | sbtest | Query   |    0 | starting     | show processlist                |
| 32 | root | 192.168.1.10:50412 | sbtest | Query   |   62 | User sleep   | select sleep(3600) from sbtest1 |
| 35 | root | 192.168.1.10:51252 | sbtest | Killed  |   47 | Sending data | select sleep(100) from sbtest1  |
| 36 | root | 192.168.1.10:51304 | sbtest | Query   |   20 | Sending data | select sleep(3600) from sbtest1 |
+----+------+--------------------+--------+---------+------+--------------+---------------------------------+

Root cause analysis

When in doubt, check the official documentation first. Here is the relevant excerpt:

When you use KILL, a thread-specific kill flag is set for the thread. In most cases, it might take some time for the thread to die because the kill flag is checked only at specific intervals:

  • During SELECT operations, for ORDER BY and GROUP BY loops, the flag is checked after reading a block of rows. If the kill flag is set, the statement is aborted.
  • ALTER TABLE operations that make a table copy check the kill flag periodically for each few copied rows read from the original table. If the kill flag was set, the statement is aborted and the temporary table is deleted.
    The KILL statement returns without waiting for confirmation, but the kill flag check aborts the operation within a reasonably small amount of time. Aborting the operation to perform any necessary cleanup also takes some time.
  • During UPDATE or DELETE operations, the kill flag is checked after each block read and after each updated or deleted row. If the kill flag is set, the statement is aborted. If you are not using transactions, the changes are not rolled back.
  • GET_LOCK() aborts and returns NULL.
  • If the thread is in the table lock handler (state: Locked), the table lock is quickly aborted.
  • If the thread is waiting for free disk space in a write call, the write is aborted with a “disk full” error message.

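The first paragraph of the official documentation states clearly how kill works: it sets a thread-level kill flag on the connection's thread, and the flag only takes effect at the next "flag check". This also means that if the next flag check is a long time coming, you can end up with the symptom shown in the problem description.

The official documentation lists quite a few scenarios. Based on its description, here are some of the more common problem cases:

  • When a select statement performs order by or group by and the server is short on CPU, reading a block of rows takes longer, which pushes back the next flag check.
  • When a DML statement touches a large amount of data, killing it triggers a transaction rollback (InnoDB engine). The statement itself is killed, but the rollback can also take a very long time.
  • When an alter operation is killed on a heavily loaded server, processing each batch of rows takes longer, which pushes back the next flag check.
  • To generalize from the way kill works: anything that blocks or slows down the normal execution of a SQL statement delays the next flag check or keeps it from happening at all, and until that check runs the kill does not take effect.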
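
The point about flag checks can be illustrated with a toy model. This is only a sketch in Python, not MySQL's implementation: the function run_statement and its parameters total_rows, block_size and kill_at_row are hypothetical names invented for the example. It assumes a statement that processes rows in blocks and looks at its kill flag only between blocks, as the documentation excerpt describes for SELECT.

```python
def run_statement(total_rows, block_size, kill_at_row):
    """Toy model of a statement that checks its kill flag between blocks.

    kill_at_row is the row being processed at the moment another session
    issues KILL (hypothetical parameter, for illustration only).
    Returns a (status, rows_processed) tuple.
    """
    kill_flag = False
    processed = 0
    while processed < total_rows:
        # The only check point: before starting the next block.
        if kill_flag:
            return ("killed", processed)
        block_end = min(processed + block_size, total_rows)
        for row in range(processed, block_end):
            if row == kill_at_row:
                # KILL only sets the flag and returns; nothing stops yet.
                kill_flag = True
            # ... per-row work would happen here ...
        processed = block_end
    return ("finished", processed)


print(run_statement(1000, 10, 25))    # ('killed', 30)
print(run_statement(1000, 500, 25))   # ('killed', 500)
print(run_statement(100, 500, 25))    # ('finished', 100)
```

In this model the kill always arrives at row 25, yet the statement stops at row 30, at row 500, or not at all, depending purely on how far away the next check point is. A slow or blocked block has the same effect as a large one.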

Reproducing the problem

Here the innodb_thread_concurrency parameter is used to simulate a situation in which normal execution of SQL statements is blocked:

Defines the maximum number of threads permitted inside of InnoDB. A value of 0 (the default) is interpreted as infinite concurrency (no limit). This variable is intended for performance tuning on high concurrency systems.

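According to the official documentation, when this parameter is set low, InnoDB queries beyond the limit are blocked. For this simulation it is therefore set to a very low value.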

mysql> show variables like '%innodb_thread_concurrency%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_thread_concurrency | 1     |
+---------------------------+-------+
1 row in set (0.00 sec)

Next, open two database connections (Session 1 and Session 2), run select sleep(3600) from sbtest.sbtest1 in each, and then kill Session 2's query from a third connection:

Session 1:
mysql> select sleep(3600) from sbtest.sbtest1;

Session 2:
mysql> select sleep(3600) from sbtest.sbtest1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>

Session 3:
mysql> show processlist;
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
| Id | User | Host               | db   | Command | Time | State        | Info                                   |
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
| 44 | root | 172.16.64.10:39290 | NULL | Query   |   17 | User sleep   | select sleep(3600) from sbtest.sbtest1 |
| 45 | root | 172.16.64.10:39292 | NULL | Query   |    0 | starting     | show processlist                       |
| 46 | root | 172.16.64.10:39294 | NULL | Query   |    5 | Sending data | select sleep(3600) from sbtest.sbtest1 |
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
3 rows in set (0.00 sec)

mysql> kill 46;
Query OK, 0 rows affected (0.00 sec)

mysql> show processlist;
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
| Id | User | Host               | db   | Command | Time | State        | Info                                   |
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
| 44 | root | 172.16.64.10:39290 | NULL | Query   |   26 | User sleep   | select sleep(3600) from sbtest.sbtest1 |
| 45 | root | 172.16.64.10:39292 | NULL | Query   |    0 | starting     | show processlist                       |
| 46 | root | 172.16.64.10:39294 | NULL | Killed  |   14 | Sending data | select sleep(3600) from sbtest.sbtest1 |
+----+------+--------------------+------+---------+------+--------------+----------------------------------------+
3 rows in set (0.00 sec)

mysql>

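As shown, Session 2's client connection drops immediately after the kill command runs, but the query that Session 2 started still lingers in MySQL. Of course, if innodb_thread_concurrency is what causes a problem like this, raising the limit with set global, or simply setting it to 0, resolves it; a change to this parameter takes effect immediately for all connections.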
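
The behaviour seen in this simulation can be mimicked with a small thread model. This is a hedged sketch, not InnoDB's actual code: the semaphore merely stands in for innodb_thread_concurrency = 1, and the name simulate_blocked_kill is hypothetical. It assumes that a session waiting for a concurrency slot cannot reach its next flag check until it gets the slot.

```python
import threading


def simulate_blocked_kill():
    """Toy model: a killed session stays visible while it waits for a slot."""
    slot = threading.Semaphore(1)   # stands in for innodb_thread_concurrency = 1
    kill_flag = threading.Event()   # Session 2's thread-level kill flag
    result = {}

    def session2():
        slot.acquire()              # blocked here, so no flag check can run
        try:
            # First check point after getting a slot.
            result["outcome"] = "killed" if kill_flag.is_set() else "ran"
        finally:
            slot.release()

    slot.acquire()                  # Session 1 occupies the only slot
    worker = threading.Thread(target=session2)
    worker.start()

    kill_flag.set()                 # "kill 46" returns immediately
    worker.join(timeout=0.2)
    still_listed = worker.is_alive()  # the thread has not gone away yet

    slot.release()                  # limit raised, or Session 1 finishes
    worker.join(timeout=5)
    return still_listed, result.get("outcome")


print(simulate_blocked_kill())      # (True, 'killed')
```

The first value corresponds to the Killed row that remains in the processlist, and the second to the query finally ending once the wait is over, which is what raising the limit with set global achieves in the real server.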

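Summary

MySQL's kill operation does not forcibly tear down a database connection as one might imagine; it only sends a termination signal. If the SQL itself runs too slowly, or is affected by other factors (high server load, a large rollback being triggered), the kill may well fail to terminate the problem queries in time. Worse, once the application-side connections are dropped, the application may reconnect and issue even more inefficient queries, dragging the database down further.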
