CLOSE_WAIT狀態解決方案

這個問題之前沒有怎麼留意過,是最近在面試過程中遇到的一個問題,面瞭兩傢公司,兩傢公司竟然都面到到瞭這個問題,不得不使我開始關註這個問題。說起CLOSE_WAIT狀態,如果不知道的話,還是先瞧一下TCP的狀態轉移圖吧。

 

關閉socket分為主動關閉(Active closure)和被動關閉(Passive closure)兩種情況。前者是指有本地主機主動發起的關閉;而後者則是指本地主機檢測到遠程主機發起關閉之後,作出回應,從而關閉整個連接。將關閉部分的狀態轉移摘出來,就得到瞭下圖:

TCP 狀態變化

產生原因
通過圖上,我們來分析,什麼情況下,連接處於CLOSE_WAIT狀態呢?
在被動關閉連接情況下,在已經接收到FIN,但是還沒有發送自己的FIN的時刻,連接處於CLOSE_WAIT狀態。
通常來講,CLOSE_WAIT狀態的持續時間應該很短,正如SYN_RCVD狀態。但是在一些特殊情況下,就會出現連接長時間處於CLOSE_WAIT狀態的情況。

出現大量close_wait的現象,主要原因是某種情況下對方關閉瞭socket鏈接,但是我方忙與讀或者寫,沒有關閉連接。代碼需要判斷socket,一旦讀到0,斷開連接,read返回負,檢查一下errno,如果不是AGAIN,就斷開連接。

參考資料4中描述,通過發送SYN-FIN報文來達到產生CLOSE_WAIT狀態連接,沒有進行具體實驗。不過個人認為協議棧會丟棄這種非法報文,感興趣的同學可以測試一下,然後把結果告訴我;-)

為瞭更加清楚的說明這個問題,我們寫一個測試程序,註意這個測試程序是有缺陷的。
隻要我們構造一種情況,使得對方關閉瞭socket,我們還在read,或者是直接不關閉socket就會構造這樣的情況。

server.c:

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
 
#define MAXLINE 80
#define SERV_PORT 8000
 
int main(void)
{
    struct sockaddr_in servaddr, cliaddr;
    socklen_t cliaddr_len;
    int listenfd, connfd;
    char buf[MAXLINE];
    char str[INET_ADDRSTRLEN];
    int i, n;
 
    listenfd = socket(AF_INET, SOCK_STREAM, 0);
 
        int opt = 1;
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
 
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(SERV_PORT);
    
    bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
 
    listen(listenfd, 20);
 
    printf("Accepting connections ...\n");
    while (1) {
        cliaddr_len = sizeof(cliaddr);
        connfd = accept(listenfd, 
                (struct sockaddr *)&cliaddr, &cliaddr_len);
        //while (1) 
                {
            n = read(connfd, buf, MAXLINE);
            if (n == 0) {
                printf("the other side has been closed.\n");
                break;
            }
            printf("received from %s at PORT %d\n",
                   inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)),
                   ntohs(cliaddr.sin_port));
    
            for (i = 0; i < n; i++)
                buf[i] = toupper(buf[i]);
            write(connfd, buf, n);
        }
        //這裡故意不關閉socket,或者是在close之前加上一個sleep都可以
        //sleep(5);
        //close(connfd);
    }
}

client.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
#define MAXLINE 80
#define SERV_PORT 8000
 
int main(int argc, char *argv[])
{
    struct sockaddr_in servaddr;
    char buf[MAXLINE];
    int sockfd, n;
    char *str;
    
    if (argc != 2) {
        fputs("usage: ./client message\n", stderr);
        exit(1);
    }
    str = argv[1];
    
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
 
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr);
    servaddr.sin_port = htons(SERV_PORT);
    
    connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr));
 
    write(sockfd, str, strlen(str));
 
    n = read(sockfd, buf, MAXLINE);
    printf("Response from server:\n");
    write(STDOUT_FILENO, buf, n);
    write(STDOUT_FILENO, "\n", 1);
 
    close(sockfd);
    return 0;
}

結果如下:

debian-wangyao:~$ ./client a
Response from server:
A
debian-wangyao:~$ ./client b
Response from server:
B
debian-wangyao:~$ ./client c
Response from server:
C
debian-wangyao:~$ netstat -antp | grep CLOSE_WAIT
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        1      0 127.0.0.1:8000          127.0.0.1:58309         CLOSE_WAIT  6979/server     
tcp        1      0 127.0.0.1:8000          127.0.0.1:58308         CLOSE_WAIT  6979/server     
tcp        1      0 127.0.0.1:8000          127.0.0.1:58307         CLOSE_WAIT  6979/server  

解決方法
基本的思想就是要檢測出對方已經關閉的socket,然後關閉它。

1.代碼需要判斷socket,一旦read返回0,斷開連接,read返回負,檢查一下errno,如果不是AGAIN,也斷開連接。(註:在UNP 7.5節的圖7.6中,可以看到使用select能夠檢測出對方發送瞭FIN,再根據這條規則就可以處理CLOSE_WAIT的連接)

2.給每一個socket設置一個時間戳last_update,每接收或者是發送成功數據,就用當前時間更新這個時間戳。定期檢查所有的時間戳,如果時間戳與當前時間差值超過一定的閾值,就關閉這個socket。

3.使用一個Heart-Beat線程,定期向socket發送指定格式的心跳數據包,如果接收到對方的RST報文,說明對方已經關閉瞭socket,那麼我們也關閉這個socket。

4.設置SO_KEEPALIVE選項,並修改內核參數

前提是啟用socket的KEEPALIVE機制:
//啟用socket連接的KEEPALIVE
int iKeepAlive = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *)&iKeepAlive, sizeof(iKeepAlive));

tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
The number of seconds between TCP keep-alive probes.

tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
The  maximum  number  of  TCP  keep-alive  probes  to  send before giving up and killing the connection if no response is obtained from the other end.

tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
The number of seconds a connection needs to be idle before TCP begins sending out  keep-alive  probes.   Keep-alives  are only  sent when the SO_KEEPALIVE socket option is enabled.  The default value is 7200 seconds (2 hours).  An idle connec‐tion is terminated after approximately an additional 11 minutes (9 probes an interval of 75  seconds  apart)  when  keep-alive is enabled.

echo 120 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 2 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes

除瞭修改內核參數外,可以使用setsockopt修改socket參數,參考man 7 socket。

int KeepAliveProbes=1;
int KeepAliveIntvl=2;
int KeepAliveTime=120;
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (void *)&KeepAliveProbes, sizeof(KeepAliveProbes));
setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&KeepAliveTime, sizeof(KeepAliveTime));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&KeepAliveIntvl, sizeof(KeepAliveIntvl));

到此這篇關於CLOSE_WAIT狀態解決方案的文章就介紹到這瞭,更多相關CLOSE_WAIT狀態內容請搜索WalkonNet以前的文章或繼續瀏覽下面的相關文章希望大傢以後多多支持WalkonNet!

推薦閱讀: