
ZooKeeper client prints null as the address of the currently connected server


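Problem description
During several data center disaster recovery drills at our company, we usually simulated the loss of a whole data center by cutting off its network, so that machines inside the data center could still talk to each other but had no connectivity to the outside. During these drills we found a large number of log entries like the following in applications that use the ZK client: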

An exception was thrown while closing send thread for session 0x for server null, unexpected error, closing socket connection and attempting

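In this log entry, the server address (the part after "for server") is null. I expected that, under normal circumstances, this field would show the IP of a ZK machine deployed in the isolated data center, so seeing null here was very confusing.
A detailed description is also available here: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
Locating the problem
Looking at the ZooKeeper code in version 3.4.3 and earlier, I found where the problem lies. The logging logic is here: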

} catch (Throwable e) {  
    if (closing) {  
        if (LOG.isDebugEnabled()) {  
            // closing so this is expected  
            LOG.debug("An exception was thrown while closing send thread for session 0x"  
                    + Long.toHexString(getSessionId())  
                    + " : " + e.getMessage());  
        }  
        break;  
    } else {  
        // this is ugly, you have a better way speak up  
        if (e instanceof SessionExpiredException) {  
            LOG.info(e.getMessage() + ", closing socket connection");  
        } else if (e instanceof SessionTimeoutException) {  
            LOG.info(e.getMessage() + RETRY_CONN_MSG);  
        } else if (e instanceof EndOfStreamException) {  
            LOG.info(e.getMessage() + RETRY_CONN_MSG);  
        } else if (e instanceof RWServerFoundException) {  
            LOG.info(e.getMessage());  
        } else {  
            LOG.warn(  
                    "Session 0x"  
                            + Long.toHexString(getSessionId())  
                            + " for server "  
                            + clientCnxnSocket.getRemoteSocketAddress()  
                            + ", unexpected error"  
                            + RETRY_CONN_MSG, e);  
        }

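As you can see, the log statement obtains the address of the currently connected server through clientCnxnSocket.getRemoteSocketAddress(). Now let's look at that method, together with the java.net.Socket method it delegates to: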

// org.apache.zookeeper.ClientCnxnSocketNIO
/**
 * Returns the address to which the socket is connected.
 * @return ip address of the remote side of the connection or null if not connected
 */
@Override
SocketAddress getRemoteSocketAddress() {
    // a lot could go wrong here, so rather than put in a bunch of code
    // to check for nulls all down the chain let's do it the simple
    // yet bulletproof way
    try {
        return ((SocketChannel) sockKey.channel()).socket()
                .getRemoteSocketAddress();
    } catch (NullPointerException e) {
        return null;
    }
}

// java.net.Socket
/**
 * Returns the address of the endpoint this socket is connected to, or
 * <code>null</code> if it is unconnected.
 * @return a <code>SocketAddress</code> reprensenting the remote endpoint of this
 *         socket, or <code>null</code> if it is not connected yet.
 * @see #getInetAddress()
 * @see #getPort()
 * @see #connect(SocketAddress, int)
 * @see #connect(SocketAddress)
 * @since 1.4
 */
public SocketAddress getRemoteSocketAddress() {
    if (!isConnected())
        return null;
    return new InetSocketAddress(getInetAddress(), getPort());
}

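So the problem can now basically be pinned down: if the server side closes the socket connection abnormally (for example, when the data center network is cut during a disaster recovery drill), the client no longer has a connected socket, getRemoteSocketAddress returns null, and that is why null appears in the log.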
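The null in the log can be reproduced without ZooKeeper at all. The following is a minimal sketch using only the JDK: it asks a never-connected java.net.Socket, and a never-connected SocketChannel's socket adapter, for their remote address and concatenates the result into a message. The class name UnconnectedSocketDemo, the helper names, and the "0x0" session id are made up for illustration and are not part of ZooKeeper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not ZooKeeper code.
class UnconnectedSocketDemo {

    // Remote address of a plain socket that was never connected.
    static SocketAddress remoteOfUnconnectedSocket() {
        try (Socket socket = new Socket()) {
            return socket.getRemoteSocketAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same question asked through the NIO socket adapter, which is the
    // call chain used in the getRemoteSocketAddress() excerpt above.
    static SocketAddress remoteOfUnconnectedChannel() {
        try (SocketChannel channel = SocketChannel.open()) {
            return channel.socket().getRemoteSocketAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // String concatenation turns a null reference into the text "null".
    static String logLine(SocketAddress remote) {
        return "Session 0x0 for server " + remote + ", unexpected error";
    }

    public static void main(String[] args) {
        System.out.println(logLine(remoteOfUnconnectedSocket()));
        System.out.println(logLine(remoteOfUnconnectedChannel()));
    }
}
```

Both lines print "Session 0x0 for server null, unexpected error", which matches the shape of the entries seen during the drills.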

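Solving the problem
This log output is very important to developers: when troubleshooting, it shows clearly which server had the problem at that moment. Once it prints null, though, there is nothing to go on. I made some improvements to ensure that, when a problem occurs, the client can print the IP of the server involved. The patch can be downloaded here: https://github.com/downloads/nileader/taokeeper/getCurrentZooKeeperAddr_for_3.4.3.patch
First, add two methods to the org.apache.zookeeper.client.HostProvider interface: one returns the index of the address currently in use within the address list, and the other returns the full list of addresses.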

public interface HostProvider {
    // ... ...

    /**
     * Get the index of the server address that is currently connecting or connected.
     * @see ZOOKEEPER-1480:https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public int getCurrentIndex();

    /**
     * Get all server addresses that were configured for the ZooKeeper client.
     * @return List
     * @see ZOOKEEPER-1480:https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public List<InetSocketAddress> getAllServerAddress();

}
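To make the two new methods concrete, here is a minimal sketch of how an implementation might keep track of the index. It is an assumption-laden stand-in, not the patch itself: the class IndexedHostList is hypothetical, it does not implement the real HostProvider interface (so that it compiles without ZooKeeper on the classpath), and its simple round-robin next() only approximates how a host provider hands out addresses.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a HostProvider implementation; not ZooKeeper code.
class IndexedHostList {
    private final List<InetSocketAddress> servers;
    private int currentIndex = -1; // -1 until next() has been called once

    IndexedHostList(List<InetSocketAddress> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server address is required");
        }
        // Defensive copy so callers cannot change the list afterwards.
        this.servers = Collections.unmodifiableList(new ArrayList<>(servers));
    }

    // Advances round-robin to the next address and remembers its index.
    synchronized InetSocketAddress next() {
        currentIndex = (currentIndex + 1) % servers.size();
        return servers.get(currentIndex);
    }

    // Index of the address most recently handed out, or -1 if none yet.
    synchronized int getCurrentIndex() {
        return currentIndex;
    }

    // All configured addresses, in the order they were supplied.
    List<InetSocketAddress> getAllServerAddress() {
        return servers;
    }
}
```

Because the index is recorded when an address is handed out, it stays available even after the socket has gone away, which is exactly the moment the log statement needs it.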

Next, modify the log output logic in the org.apache.zookeeper.ClientCnxn class:

/**
 * Get the current ZooKeeper address that the client is connected or connecting to.<br>
 * Note: this method returns null if the host address cannot be determined.
 */
private InetSocketAddress getCurrentZooKeeperAddr() {
    try {
        InetSocketAddress addr = null;
        if (null == hostProvider || null == hostProvider.getAllServerAddress())
            return addr;
        int index = hostProvider.getCurrentIndex();
        if (index >= 0) {
            addr = hostProvider.getAllServerAddress().get(index);
        }
        return addr;
    } catch (Exception e) {
        return null;
    }
}

// ... ...

// get current ZK host to log
InetSocketAddress addr = getCurrentZooKeeperAddr();

LOG.warn(
        "Session 0x"
                + Long.toHexString(getSessionId())
                + " for server ip: " + addr + ", detail conn: "
                + clientCnxnSocket.getRemoteSocketAddress()
                + ", unexpected error"
                + RETRY_CONN_MSG, e);
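As a design variation, the lookup can check the index bounds explicitly instead of relying on catch (Exception e). The sketch below is self-contained and uses hypothetical names (ServerAddressLogSketch, currentAddr, warnMessage); it takes the address list and index as plain parameters rather than reading them from a real HostProvider, and it omits RETRY_CONN_MSG because that constant lives inside ClientCnxn.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.List;

// Hypothetical helper, not ZooKeeper code.
class ServerAddressLogSketch {

    // Returns the configured address at the given index, or null when the
    // list is missing or the index is out of range.
    static InetSocketAddress currentAddr(List<InetSocketAddress> all, int index) {
        if (all == null || index < 0 || index >= all.size()) {
            return null;
        }
        return all.get(index);
    }

    // Builds the warning text: the configured address comes first, and the
    // socket's own view (possibly null) is kept as extra detail.
    static String warnMessage(long sessionId, InetSocketAddress configured, SocketAddress remote) {
        return "Session 0x" + Long.toHexString(sessionId)
                + " for server ip: " + configured
                + ", detail conn: " + remote
                + ", unexpected error";
    }
}
```

With this shape, a dropped connection still yields a message that names the configured server, while the "detail conn" field shows null to indicate that the socket itself had no remote address.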

Permalink: http://www.chepoo.com/zookeeper-client-print-null-error.html | IT技术精华网
