詳解Java ES多節點任務的高效分發與收集實現

一、概述

我們知道,當我們對es發起search請求或其他操作時,往往都是隨機選擇一個coordinator發起請求。而這請求,可能是該節點能處理,也可能是該節點不能處理的,也可能是需要多節點共同處理的,可以說是情況比較復雜。

所以,coordinator的重要工作是,做請求分發與結果收集。那麼,如何高性能和安全準確地實現這一功能則至關重要。

二、請求分發的簡單思路

我們這裡所說的請求分發,一般是針對多個網絡節點而言的。那麼,如何將請求發往多節點,並在最終將結果合並起來呢?

同步請求各節點,當第一個節點響應後,再向第二個節點發起請求,以此類推,直到所有節點請求完成,然後再將結果聚合起來。就完成瞭需求瞭,不費吹灰之力。簡單不?

無腦處理自有無腦處理的缺點。依次請求各節點,無法很好利用系統的分佈式特點,變並行為串行瞭,好不厲害。另外,對於當前請求,當其未處理完成這所有節點的分發收集工作時,當前線程將會一直被占用。從而,下遊請求將無法再接入,從而將你瞭並發能力,使其與線程池大小同日而語瞭。這可不好。

我們依次想辦法優化下。

首先,我們可以將串行分發請求變成並行分發,即可以使用多線程,向多節點發起請求,當某線程處理完成時,就返回結果。使用類似於CountDownLatch的同步工具,保證所有節點都處理完成後,再由外單主線程進行結果合並操作。

以上優化,看起來不錯,避免瞭同步的性能問題。但是,當有某個節點響應非常慢時,它將阻塞後續節點的工作,從而使整個請求變慢,從而同樣變成線程池的大小即是並發能力的瓶頸。可以說,治標不治本。

再來,繼續優化。我們可以釋放掉主線程的持有,讓每個分發線程處理完成當前任務時,都去檢查任務隊列,是否已完成。如果未完成則忽略,如果已完成,則啟動合並任務。

看起來不錯,已經有完全並發樣子瞭。但還能不能再優化?各節點的分發,同樣是同步請求,雖然處理簡單,但在這server響應期間,該線程仍是無法被使用的,如果類似請求過多,則必然是不小的消耗。如果能將單節點的請求,能夠做到異步處理,那樣豈不完美?但這恐怕不好做吧!不過,終歸是一個不錯的想法瞭。

三、es中search的多節點分發收集

我們以search的分發收集為出發點,觀看es如何辦成這件。原因是search在es中最為普遍與經典,雖說不得每個地方實現都一樣,但至少參考意義還是有的。故以search為切入點。search的框架工作流程,我們之前已經研究過,本節就直接以核心開始講解,它是在 TransportSearchAction.executeRequest() 中的。

// org.elasticsearch.action.search.TransportSearchAction#executeRequest
    private void executeRequest(Task task, SearchRequest searchRequest,
                                SearchAsyncActionProvider searchAsyncActionProvider, ActionListener<SearchResponse> listener) {
        final long relativeStartNanos = System.nanoTime();
        final SearchTimeProvider timeProvider =
            new SearchTimeProvider(searchRequest.getOrCreateAbsoluteStartMillis(), relativeStartNanos, System::nanoTime);
        ActionListener<SearchSourceBuilder> rewriteListener = ActionListener.wrap(source -> {
            if (source != searchRequest.source()) {
                // only set it if it changed - we don't allow null values to be set but it might be already null. this way we catch
                // situations when source is rewritten to null due to a bug
                searchRequest.source(source);
            }
            final ClusterState clusterState = clusterService.state();
            final SearchContextId searchContext;
            final Map<String, OriginalIndices> remoteClusterIndices;
            if (searchRequest.pointInTimeBuilder() != null) {
                searchContext = SearchContextId.decode(namedWriteableRegistry, searchRequest.pointInTimeBuilder().getId());
                remoteClusterIndices = getIndicesFromSearchContexts(searchContext, searchRequest.indicesOptions());
            } else {
                searchContext = null;
                remoteClusterIndices = remoteClusterService.groupIndices(searchRequest.indicesOptions(),
                    searchRequest.indices(), idx -> indexNameExpressionResolver.hasIndexAbstraction(idx, clusterState));
            }
            OriginalIndices localIndices = remoteClusterIndices.remove(RemoteClusterAware.LOCAL_CLUSTER_GROUP_KEY);
            if (remoteClusterIndices.isEmpty()) {
                executeLocalSearch(
                    task, timeProvider, searchRequest, localIndices, clusterState, listener, searchContext, searchAsyncActionProvider);
            } else {
                // 多節點數據請求
                if (shouldMinimizeRoundtrips(searchRequest)) {
                    // 通過 parentTaskId 關聯所有子任務
                    final TaskId parentTaskId = task.taskInfo(clusterService.localNode().getId(), false).getTaskId();
                    ccsRemoteReduce(parentTaskId, searchRequest, localIndices, remoteClusterIndices, timeProvider,
                        searchService.aggReduceContextBuilder(searchRequest),
                        remoteClusterService, threadPool, listener,
                        (r, l) -> executeLocalSearch(
                            task, timeProvider, r, localIndices, clusterState, l, searchContext, searchAsyncActionProvider));
                } else {
                    AtomicInteger skippedClusters = new AtomicInteger(0);
                    // 直接分發多shard請求到各節點
                    collectSearchShards(searchRequest.indicesOptions(), searchRequest.preference(), searchRequest.routing(),
                        skippedClusters, remoteClusterIndices, remoteClusterService, threadPool,
                        ActionListener.wrap(
                            searchShardsResponses -> {
                                // 當所有節點都響應後,再做後續邏輯處理,即此處的後置監聽
                                final BiFunction<String, String, DiscoveryNode> clusterNodeLookup =
                                    getRemoteClusterNodeLookup(searchShardsResponses);
                                final Map<String, AliasFilter> remoteAliasFilters;
                                final List<SearchShardIterator> remoteShardIterators;
                                if (searchContext != null) {
                                    remoteAliasFilters = searchContext.aliasFilter();
                                    remoteShardIterators = getRemoteShardsIteratorFromPointInTime(searchShardsResponses,
                                        searchContext, searchRequest.pointInTimeBuilder().getKeepAlive(), remoteClusterIndices);
                                } else {
                                    remoteAliasFilters = getRemoteAliasFilters(searchShardsResponses);
                                    remoteShardIterators = getRemoteShardsIterator(searchShardsResponses, remoteClusterIndices,
                                        remoteAliasFilters);
                                }
                                int localClusters = localIndices == null ? 0 : 1;
                                int totalClusters = remoteClusterIndices.size() + localClusters;
                                int successfulClusters = searchShardsResponses.size() + localClusters;
                                // 至於後續搜索實現如何,不在此間
                                executeSearch((SearchTask) task, timeProvider, searchRequest, localIndices, remoteShardIterators,
                                    clusterNodeLookup, clusterState, remoteAliasFilters, listener,
                                    new SearchResponse.Clusters(totalClusters, successfulClusters, skippedClusters.get()),
                                    searchContext, searchAsyncActionProvider);
                            },
                            listener::onFailure));
                }
            }
        }, listener::onFailure);
        if (searchRequest.source() == null) {
            rewriteListener.onResponse(searchRequest.source());
        } else {
            Rewriteable.rewriteAndFetch(searchRequest.source(), searchService.getRewriteContext(timeProvider::getAbsoluteStartMillis),
                rewriteListener);
        }
    }

可以看到,es的search功能,會被劃分為幾種類型,有點會走集群分發,而有的則不需要。我們自然是希望走集群分發的,所以,隻需看 collectSearchShards() 即可。這裡面其實就是對多個集群節點的依次請求,當然還有結果收集。

// org.elasticsearch.action.search.TransportSearchAction#collectSearchShards
static void collectSearchShards(IndicesOptions indicesOptions, String preference, String routing, AtomicInteger skippedClusters,
                                Map<String, OriginalIndices> remoteIndicesByCluster, RemoteClusterService remoteClusterService,
                                ThreadPool threadPool, ActionListener<Map<String, ClusterSearchShardsResponse>> listener) {
    // 使用該計數器進行結果控制
    final CountDown responsesCountDown = new CountDown(remoteIndicesByCluster.size());
    final Map<String, ClusterSearchShardsResponse> searchShardsResponses = new ConcurrentHashMap<>();
    final AtomicReference<Exception> exceptions = new AtomicReference<>();
    // 迭代各節點,依次發送請求
    for (Map.Entry<String, OriginalIndices> entry : remoteIndicesByCluster.entrySet()) {
        final String clusterAlias = entry.getKey();
        boolean skipUnavailable = remoteClusterService.isSkipUnavailable(clusterAlias);
        Client clusterClient = remoteClusterService.getRemoteClusterClient(threadPool, clusterAlias);
        final String[] indices = entry.getValue().indices();
        ClusterSearchShardsRequest searchShardsRequest = new ClusterSearchShardsRequest(indices)
            .indicesOptions(indicesOptions).local(true).preference(preference).routing(routing);
        // 向集群中 clusterAlias 異步發起請求處理 search
        clusterClient.admin().cluster().searchShards(searchShardsRequest,
            new CCSActionListener<ClusterSearchShardsResponse, Map<String, ClusterSearchShardsResponse>>(
                clusterAlias, skipUnavailable, responsesCountDown, skippedClusters, exceptions, listener) {
                @Override
                void innerOnResponse(ClusterSearchShardsResponse clusterSearchShardsResponse) {
                    // 每次單節點響應時,將結果存放到 searchShardsResponses 中
                    searchShardsResponses.put(clusterAlias, clusterSearchShardsResponse);
                }

                @Override
                Map<String, ClusterSearchShardsResponse> createFinalResponse() {
                    // 所有節點都返回時,將結果集返回
                    return searchShardsResponses;
                }
            }
        );
    }
}
// org.elasticsearch.client.support.AbstractClient.ClusterAdmin#searchShards
@Override
public void searchShards(final ClusterSearchShardsRequest request, final ActionListener<ClusterSearchShardsResponse> listener) {
    // 發起請求 indices:admin/shards/search_shards, 其對應處理器為 TransportClusterSearchShardsAction
    execute(ClusterSearchShardsAction.INSTANCE, request, listener);
}

以上是es向集群中多節點發起請求的過程,其重點在於所有的請求都是異步請求,即向各節點發送完成請求後,當前線程即為斷開狀態。這就體現瞭無阻塞的能力瞭,以listner形式進行處理後續業務。這對於發送自然沒有問題,但如何進行結果收集呢?實際上就是通過listner來處理的。在遠程節點響應後,listener.onResponse()將被調用。

3.1、多節點響應結果處理

這是我們本文討論的重點。前面我們看到es已經異步發送請求出去瞭(且不論其如何發送),所以如何收集結果也很關鍵。而es中的做法則很簡單,使用一個 ConcurrentHashMap 收集每個結果,一個CountDown標識是否已處理完成。

// org.elasticsearch.action.search.TransportSearchAction.CCSActionListener#CCSActionListener
CCSActionListener(String clusterAlias, boolean skipUnavailable, CountDown countDown, AtomicInteger skippedClusters,
                    AtomicReference<Exception> exceptions, ActionListener<FinalResponse> originalListener) {
    this.clusterAlias = clusterAlias;
    this.skipUnavailable = skipUnavailable;
    this.countDown = countDown;
    this.skippedClusters = skippedClusters;
    this.exceptions = exceptions;
    this.originalListener = originalListener;
}

// 成功時的響應
@Override
public final void onResponse(Response response) {
    // inner響應為將結果放入 searchShardsResponses 中
    innerOnResponse(response);
    // maybeFinish 則進行結果是否完成判定,如果完成,則調用回調方法,構造結果
    maybeFinish();
}

private void maybeFinish() {
    // 使用一個 AtomicInteger 進行控制
    if (countDown.countDown()) {
        Exception exception = exceptions.get();
        if (exception == null) {
            FinalResponse response;
            try {
                // 創建響應結果,此處 search 即為 searchShardsResponses
                response = createFinalResponse();
            } catch(Exception e) {
                originalListener.onFailure(e);
                return;
            }
            // 成功響應回調,實現結果收集後的其他業務處理
            originalListener.onResponse(response);
        } else {
            originalListener.onFailure(exceptions.get());
        }
    }
}
// CountDown 實現比較簡單,隻有最後一個返回true, 其他皆為false, 即實現瞭 At Most Once 語義
/**
* Decrements the count-down and returns <code>true</code> iff this call
* reached zero otherwise <code>false</code>
*/
public boolean countDown() {
    assert originalCount > 0;
    for (;;) {
        final int current = countDown.get();
        assert current >= 0;
        if (current == 0) {
            return false;
        }
        if (countDown.compareAndSet(current, current - 1)) {
            return current == 1;
        }
    }
}

可見,ES中的結果收集,是以一個 AtomicInteger 實現的CountDown來處理的,當所有節點都響應時,就處理最終結果,否則將每個節點的數據放入ConcurrentHashMap中暫存起來。

而通過一個Client通用的異步調用框架,實現多節點的異步提交。整個節點響應以 CCSActionListener 作為接收者。可以說是比較簡潔的瞭,好像也沒有我們前面討論的復雜性。因為:大道至簡。

3.2、異步提交請求實現

我們知道,如果本地想實現異步提交請求,隻需使用另一個線程或者線程池技術,即可實現。而對於遠程Client的異步提交,則還需要借助於外部工具瞭。此處借助於Netty的channel.write()實現,節點響應時再回調回來,從而恢復上下文。整個過程,沒有一點阻塞同步,從而達到瞭高效的處理能力,當然還有其他的一些異常處理,自不必說。

具體樣例大致如下:因最終的處理器是以 TransportClusterSearchShardsAction 進行處理的,所以直接轉到 TransportClusterSearchShardsAction。

// org.elasticsearch.action.admin.cluster.shards.TransportClusterSearchShardsAction
public class TransportClusterSearchShardsAction extends
    TransportMasterNodeReadAction<ClusterSearchShardsRequest, ClusterSearchShardsResponse> {

    private final IndicesService indicesService;

    @Inject
    public TransportClusterSearchShardsAction(TransportService transportService, ClusterService clusterService,
                                              IndicesService indicesService, ThreadPool threadPool, ActionFilters actionFilters,
                                              IndexNameExpressionResolver indexNameExpressionResolver) {
        super(ClusterSearchShardsAction.NAME, transportService, clusterService, threadPool, actionFilters,
            ClusterSearchShardsRequest::new, indexNameExpressionResolver, ClusterSearchShardsResponse::new, ThreadPool.Names.SAME);
        this.indicesService = indicesService;
    }

    @Override
    protected ClusterBlockException checkBlock(ClusterSearchShardsRequest request, ClusterState state) {
        return state.blocks().indicesBlockedException(ClusterBlockLevel.METADATA_READ,
                indexNameExpressionResolver.concreteIndexNames(state, request));
    }

    @Override
    protected void masterOperation(final ClusterSearchShardsRequest request, final ClusterState state,
                                   final ActionListener<ClusterSearchShardsResponse> listener) {
        ClusterState clusterState = clusterService.state();
        String[] concreteIndices = indexNameExpressionResolver.concreteIndexNames(clusterState, request);
        Map<String, Set<String>> routingMap = indexNameExpressionResolver.resolveSearchRouting(state, request.routing(), request.indices());
        Map<String, AliasFilter> indicesAndFilters = new HashMap<>();
        Set<String> indicesAndAliases = indexNameExpressionResolver.resolveExpressions(clusterState, request.indices());
        for (String index : concreteIndices) {
            final AliasFilter aliasFilter = indicesService.buildAliasFilter(clusterState, index, indicesAndAliases);
            final String[] aliases = indexNameExpressionResolver.indexAliases(clusterState, index, aliasMetadata -> true, true,
                indicesAndAliases);
            indicesAndFilters.put(index, new AliasFilter(aliasFilter.getQueryBuilder(), aliases));
        }

        Set<String> nodeIds = new HashSet<>();
        GroupShardsIterator<ShardIterator> groupShardsIterator = clusterService.operationRouting()
            .searchShards(clusterState, concreteIndices, routingMap, request.preference());
        ShardRouting shard;
        ClusterSearchShardsGroup[] groupResponses = new ClusterSearchShardsGroup[groupShardsIterator.size()];
        int currentGroup = 0;
        for (ShardIterator shardIt : groupShardsIterator) {
            ShardId shardId = shardIt.shardId();
            ShardRouting[] shardRoutings = new ShardRouting[shardIt.size()];
            int currentShard = 0;
            shardIt.reset();
            while ((shard = shardIt.nextOrNull()) != null) {
                shardRoutings[currentShard++] = shard;
                nodeIds.add(shard.currentNodeId());
            }
            groupResponses[currentGroup++] = new ClusterSearchShardsGroup(shardId, shardRoutings);
        }
        DiscoveryNode[] nodes = new DiscoveryNode[nodeIds.size()];
        int currentNode = 0;
        for (String nodeId : nodeIds) {
            nodes[currentNode++] = clusterState.getNodes().get(nodeId);
        }
        listener.onResponse(new ClusterSearchShardsResponse(groupResponses, nodes, indicesAndFilters));
    }
}
// doExecute 在父類中完成
// org.elasticsearch.action.support.master.TransportMasterNodeAction#doExecute
@Override
protected void doExecute(Task task, final Request request, ActionListener<Response> listener) {
    ClusterState state = clusterService.state();
    logger.trace("starting processing request [{}] with cluster state version [{}]", request, state.version());
    if (task != null) {
        request.setParentTask(clusterService.localNode().getId(), task.getId());
    }
    new AsyncSingleAction(task, request, listener).doStart(state);
}

// org.elasticsearch.action.support.master.TransportMasterNodeAction.AsyncSingleAction#doStart
AsyncSingleAction(Task task, Request request, ActionListener<Response> listener) {
    this.task = task;
    this.request = request;
    this.listener = listener;
    this.startTime = threadPool.relativeTimeInMillis();
}

protected void doStart(ClusterState clusterState) {
    try {
        final DiscoveryNodes nodes = clusterState.nodes();
        if (nodes.isLocalNodeElectedMaster() || localExecute(request)) {
            // check for block, if blocked, retry, else, execute locally
            final ClusterBlockException blockException = checkBlock(request, clusterState);
            if (blockException != null) {
                if (!blockException.retryable()) {
                    listener.onFailure(blockException);
                } else {
                    logger.debug("can't execute due to a cluster block, retrying", blockException);
                    // 重試處理
                    retry(clusterState, blockException, newState -> {
                        try {
                            ClusterBlockException newException = checkBlock(request, newState);
                            return (newException == null || !newException.retryable());
                        } catch (Exception e) {
                            // accept state as block will be rechecked by doStart() and listener.onFailure() then called
                            logger.trace("exception occurred during cluster block checking, accepting state", e);
                            return true;
                        }
                    });
                }
            } else {
                ActionListener<Response> delegate = ActionListener.delegateResponse(listener, (delegatedListener, t) -> {
                    if (t instanceof FailedToCommitClusterStateException || t instanceof NotMasterException) {
                        logger.debug(() -> new ParameterizedMessage("master could not publish cluster state or " +
                            "stepped down before publishing action [{}], scheduling a retry", actionName), t);
                        retryOnMasterChange(clusterState, t);
                    } else {
                        delegatedListener.onFailure(t);
                    }
                });
                // 本地節點執行結果,直接以異步線程處理即可
                threadPool.executor(executor)
                    .execute(ActionRunnable.wrap(delegate, l -> masterOperation(task, request, clusterState, l)));
            }
        } else {
            if (nodes.getMasterNode() == null) {
                logger.debug("no known master node, scheduling a retry");
                retryOnMasterChange(clusterState, null);
            } else {
                DiscoveryNode masterNode = nodes.getMasterNode();
                final String actionName = getMasterActionName(masterNode);
                // 發送到master節點,以netty作為通訊工具,完成後回調 當前listner
                transportService.sendRequest(masterNode, actionName, request,
                    new ActionListenerResponseHandler<Response>(listener, responseReader) {
                        @Override
                        public void handleException(final TransportException exp) {
                            Throwable cause = exp.unwrapCause();
                            if (cause instanceof ConnectTransportException ||
                                (exp instanceof RemoteTransportException && cause instanceof NodeClosedException)) {
                                // we want to retry here a bit to see if a new master is elected
                                logger.debug("connection exception while trying to forward request with action name [{}] to " +
                                        "master node [{}], scheduling a retry. Error: [{}]",
                                    actionName, nodes.getMasterNode(), exp.getDetailedMessage());
                                retryOnMasterChange(clusterState, cause);
                            } else {
                                listener.onFailure(exp);
                            }
                        }
                });
            }
        }
    } catch (Exception e) {
        listener.onFailure(e);
    }
}

可見,es中確實有兩種異步的提交方式,一種是當前節點就是執行節點,直接使用線程池提交;另一種是遠程節點則起網絡調用,最終如何實現異步且往下看。

// org.elasticsearch.transport.TransportService#sendRequest
public final <T extends TransportResponse> void sendRequest(final DiscoveryNode node, final String action,
                                                            final TransportRequest request,
                                                            final TransportRequestOptions options,
                                                            TransportResponseHandler<T> handler) {
    final Transport.Connection connection;
    try {
        // 假設不是本節點,則獲取遠程的一個 connection, channel
        connection = getConnection(node);
    } catch (final NodeNotConnectedException ex) {
        // the caller might not handle this so we invoke the handler
        handler.handleException(ex);
        return;
    }
    sendRequest(connection, action, request, options, handler);
}
// org.elasticsearch.transport.TransportService#getConnection
/**
    * Returns either a real transport connection or a local node connection if we are using the local node optimization.
    * @throws NodeNotConnectedException if the given node is not connected
    */
public Transport.Connection getConnection(DiscoveryNode node) {
    if (isLocalNode(node)) {
        return localNodeConnection;
    } else {
        return connectionManager.getConnection(node);
    }
}

// org.elasticsearch.transport.TransportService#sendRequest
/**
    * Sends a request on the specified connection. If there is a failure sending the request, the specified handler is invoked.
    *
    * @param connection the connection to send the request on
    * @param action     the name of the action
    * @param request    the request
    * @param options    the options for this request
    * @param handler    the response handler
    * @param <T>        the type of the transport response
    */
public final <T extends TransportResponse> void sendRequest(final Transport.Connection connection, final String action,
                                                            final TransportRequest request,
                                                            final TransportRequestOptions options,
                                                            final TransportResponseHandler<T> handler) {
    try {
        final TransportResponseHandler<T> delegate;
        if (request.getParentTask().isSet()) {
            // If the connection is a proxy connection, then we will create a cancellable proxy task on the proxy node and an actual
            // child task on the target node of the remote cluster.
            //  ----> a parent task on the local cluster
            //        |
            //         ----> a proxy task on the proxy node on the remote cluster
            //               |
            //                ----> an actual child task on the target node on the remote cluster
            // To cancel the child task on the remote cluster, we must send a cancel request to the proxy node instead of the target
            // node as the parent task of the child task is the proxy task not the parent task on the local cluster. Hence, here we
            // unwrap the connection and keep track of the connection to the proxy node instead of the proxy connection.
            final Transport.Connection unwrappedConn = unwrapConnection(connection);
            final Releasable unregisterChildNode = taskManager.registerChildConnection(request.getParentTask().getId(), unwrappedConn);
            delegate = new TransportResponseHandler<T>() {
                @Override
                public void handleResponse(T response) {
                    unregisterChildNode.close();
                    handler.handleResponse(response);
                }

                @Override
                public void handleException(TransportException exp) {
                    unregisterChildNode.close();
                    handler.handleException(exp);
                }

                @Override
                public String executor() {
                    return handler.executor();
                }

                @Override
                public T read(StreamInput in) throws IOException {
                    return handler.read(in);
                }

                @Override
                public String toString() {
                    return getClass().getName() + "/[" + action + "]:" + handler.toString();
                }
            };
        } else {
            delegate = handler;
        }
        asyncSender.sendRequest(connection, action, request, options, delegate);
    } catch (final Exception ex) {
        // the caller might not handle this so we invoke the handler
        final TransportException te;
        if (ex instanceof TransportException) {
            te = (TransportException) ex;
        } else {
            te = new TransportException("failure to send", ex);
        }
        handler.handleException(te);
    }
}

// org.elasticsearch.transport.TransportService#sendRequestInternal
private <T extends TransportResponse> void sendRequestInternal(final Transport.Connection connection, final String action,
                                                                final TransportRequest request,
                                                                final TransportRequestOptions options,
                                                                TransportResponseHandler<T> handler) {
    if (connection == null) {
        throw new IllegalStateException("can't send request to a null connection");
    }
    DiscoveryNode node = connection.getNode();

    Supplier<ThreadContext.StoredContext> storedContextSupplier = threadPool.getThreadContext().newRestorableContext(true);
    ContextRestoreResponseHandler<T> responseHandler = new ContextRestoreResponseHandler<>(storedContextSupplier, handler);
    // TODO we can probably fold this entire request ID dance into connection.sendReqeust but it will be a bigger refactoring
    final long requestId = responseHandlers.add(new Transport.ResponseContext<>(responseHandler, connection, action));
    final TimeoutHandler timeoutHandler;
    if (options.timeout() != null) {
        timeoutHandler = new TimeoutHandler(requestId, connection.getNode(), action);
        responseHandler.setTimeoutHandler(timeoutHandler);
    } else {
        timeoutHandler = null;
    }
    try {
        if (lifecycle.stoppedOrClosed()) {
            /*
                * If we are not started the exception handling will remove the request holder again and calls the handler to notify the
                * caller. It will only notify if toStop hasn't done the work yet.
                */
            throw new NodeClosedException(localNode);
        }
        if (timeoutHandler != null) {
            assert options.timeout() != null;
            timeoutHandler.scheduleTimeout(options.timeout());
        }
        connection.sendRequest(requestId, action, request, options); // local node optimization happens upstream
    } catch (final Exception e) {
        // usually happen either because we failed to connect to the node
        // or because we failed serializing the message
        final Transport.ResponseContext<? extends TransportResponse> contextToNotify = responseHandlers.remove(requestId);
        // If holderToNotify == null then handler has already been taken care of.
        if (contextToNotify != null) {
            if (timeoutHandler != null) {
                timeoutHandler.cancel();
            }
            // callback that an exception happened, but on a different thread since we don't
            // want handlers to worry about stack overflows. In the special case of running into a closing node we run on the current
            // thread on a best effort basis though.
            final SendRequestTransportException sendRequestException = new SendRequestTransportException(node, action, e);
            final String executor = lifecycle.stoppedOrClosed() ? ThreadPool.Names.SAME : ThreadPool.Names.GENERIC;
            threadPool.executor(executor).execute(new AbstractRunnable() {
                @Override
                public void onRejection(Exception e) {
                    // if we get rejected during node shutdown we don't wanna bubble it up
                    logger.debug(
                        () -> new ParameterizedMessage(
                            "failed to notify response handler on rejection, action: {}",
                            contextToNotify.action()),
                        e);
                }
                @Override
                public void onFailure(Exception e) {
                    logger.warn(
                        () -> new ParameterizedMessage(
                            "failed to notify response handler on exception, action: {}",
                            contextToNotify.action()),
                        e);
                }
                @Override
                protected void doRun() throws Exception {
                    contextToNotify.handler().handleException(sendRequestException);
                }
            });
        } else {
            logger.debug("Exception while sending request, handler likely already notified due to timeout", e);
        }
    }
}
// org.elasticsearch.transport.RemoteConnectionManager.ProxyConnection#sendRequest
@Override
public void sendRequest(long requestId, String action, TransportRequest request, TransportRequestOptions options)
    throws IOException, TransportException {
    connection.sendRequest(requestId, TransportActionProxy.getProxyAction(action),
        TransportActionProxy.wrapRequest(targetNode, request), options);
}
// org.elasticsearch.transport.TcpTransport.NodeChannels#sendRequest
@Override
public void sendRequest(long requestId, String action, TransportRequest request, TransportRequestOptions options)
    throws IOException, TransportException {
    if (isClosing.get()) {
        throw new NodeNotConnectedException(node, "connection already closed");
    }
    TcpChannel channel = channel(options.type());
    outboundHandler.sendRequest(node, channel, requestId, action, request, options, getVersion(), compress, false);
}
// org.elasticsearch.transport.OutboundHandler#sendRequest
/**
    * Sends the request to the given channel. This method should be used to send {@link TransportRequest}
    * objects back to the caller.
    */
void sendRequest(final DiscoveryNode node, final TcpChannel channel, final long requestId, final String action,
                    final TransportRequest request, final TransportRequestOptions options, final Version channelVersion,
                    final boolean compressRequest, final boolean isHandshake) throws IOException, TransportException {
    Version version = Version.min(this.version, channelVersion);
    OutboundMessage.Request message = new OutboundMessage.Request(threadPool.getThreadContext(), features, request, version, action,
        requestId, isHandshake, compressRequest);
    ActionListener<Void> listener = ActionListener.wrap(() ->
        messageListener.onRequestSent(node, requestId, action, request, options));
    sendMessage(channel, message, listener);
}
// org.elasticsearch.transport.OutboundHandler#sendMessage
private void sendMessage(TcpChannel channel, OutboundMessage networkMessage, ActionListener<Void> listener) throws IOException {
    MessageSerializer serializer = new MessageSerializer(networkMessage, bigArrays);
    SendContext sendContext = new SendContext(channel, serializer, listener, serializer);
    internalSend(channel, sendContext);
}
private void internalSend(TcpChannel channel, SendContext sendContext) throws IOException {
    channel.getChannelStats().markAccessed(threadPool.relativeTimeInMillis());
    BytesReference reference = sendContext.get();
    // stash thread context so that channel event loop is not polluted by thread context
    try (ThreadContext.StoredContext existing = threadPool.getThreadContext().stashContext()) {
        channel.sendMessage(reference, sendContext);
    } catch (RuntimeException ex) {
        sendContext.onFailure(ex);
        CloseableChannel.closeChannel(channel);
        throw ex;
    }
}
// org.elasticsearch.transport.netty4.Netty4TcpChannel#sendMessage
@Override
public void sendMessage(BytesReference reference, ActionListener<Void> listener) {
    // netty 發送數據,異步回調,完成異步請求
    channel.writeAndFlush(Netty4Utils.toByteBuf(reference), addPromise(listener, channel));

    if (channel.eventLoop().isShutdown()) {
        listener.onFailure(new TransportException("Cannot send message, event loop is shutting down."));
    }
}

簡單說,就是依托於netty的pipeline機制以及eventLoop實現遠程異步請求,至於具體實現如何,請參考之前文章或各網文。

以上就是詳解Java ES多節點任務的高效分發與收集實現的詳細內容,更多關於Java ES多節點任務的高效分發 收集實現的資料請關註WalkonNet其它相關文章!

推薦閱讀: