Dubbo Series: Cluster Fault Tolerance (Cluster)

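Overview

What is Cluster for?

Cluster disguises the multiple invokers in a Directory as a single invoker and adds fault tolerance on top, for example retrying when a call fails.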
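The idea can be sketched without Dubbo itself. The following is a minimal, self-contained illustration; `SimpleInvoker` and `ClusterSketch` are hypothetical names invented here, not Dubbo types, and the real `Invoker` and `Directory` interfaces carry much more (URLs, routing, load balancing).

```java
import java.util.Arrays;
import java.util.List;

public class ClusterSketch {
    /** Hypothetical stand-in for Dubbo's Invoker: one callable provider. */
    interface SimpleInvoker {
        String invoke(String request);
    }

    /**
     * Wraps many invokers behind the single-invoker interface.
     * The caller sees one invoker; failures are handled inside.
     */
    static SimpleInvoker join(List<SimpleInvoker> invokers) {
        return request -> {
            RuntimeException last = null;
            for (SimpleInvoker invoker : invokers) {
                try {
                    return invoker.invoke(request);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try the next invoker
                }
            }
            throw new IllegalStateException("all invokers failed", last);
        };
    }

    public static void main(String[] args) {
        SimpleInvoker broken = req -> { throw new IllegalStateException("connection refused"); };
        SimpleInvoker healthy = req -> "echo:" + req;
        SimpleInvoker cluster = join(Arrays.asList(broken, healthy));
        System.out.println(cluster.invoke("hello")); // prints "echo:hello"
    }
}
```

The caller only ever holds `cluster`, which has the same shape as a single invoker; which provider actually answered is hidden from it.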

Source structure

The Cluster interface

@SPI(FailoverCluster.NAME)
public interface Cluster {
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;

}

**@SPI(FailoverCluster.NAME)** makes failover the default extension: when a call fails, it is retried against another server.
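How the default takes effect can be pictured as a lookup with a fallback. The sketch below is an illustration only, not Dubbo's generated adaptive code: `ClusterNameResolution` and `resolve` are hypothetical names, and the URL is reduced to a plain parameter map. It assumes the extension name is read from the `cluster` parameter, with `failover` (the value of `FailoverCluster.NAME`) used when nothing is configured.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterNameResolution {
    /** Mirrors the value passed to @SPI on the Cluster interface. */
    static final String DEFAULT_CLUSTER = "failover";

    /** Returns the configured cluster name, falling back to the SPI default. */
    static String resolve(Map<String, String> urlParameters) {
        String name = urlParameters.get("cluster");
        return (name == null || name.isEmpty()) ? DEFAULT_CLUSTER : name;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(resolve(params)); // prints "failover"
        params.put("cluster", "failfast");
        System.out.println(resolve(params)); // prints "failfast"
    }
}
```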

Next, look at Cluster's inheritance hierarchy.

Cluster implementations

Excluding MockClusterWrapper, there are eight implementation classes:

FailoverCluster:

(Default) Failover. When a call fails, it is retried against another server. Typically used for read operations, but retries add latency.

FailfastCluster:

Fail fast. Only one call is made, and a failure is reported immediately. Typically used for non-idempotent write operations.

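FailbackCluster:

Automatic recovery on failure. Failed requests are recorded in the background and resent on a timer. Typically used for message notification.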

FailsafeCluster:

Fail safe. When an exception occurs, it is simply ignored. Typically used for operations such as writing audit logs.

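ForkingCluster:

Parallel calls. It returns as soon as any one call succeeds. Typically used for operations with strict real-time requirements, at the cost of more service resources.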

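BroadcastCluster:

Broadcast call. It iterates over all invokers and calls them one by one; an exception from one call is caught so that it does not affect the calls to the other invokers.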

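MergeableCluster:

Group aggregation: results are merged by group. For example, a menu service has one interface but several implementations distinguished by group. The consumer calls each group once and merges the results, which yields an aggregated list of menu items.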

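AvailableCluster:

Picks an available invoker. It iterates over all invokers and checks Invoker.isAvailable(); the first one that returns true is invoked and its result is returned directly, whether or not the call succeeds.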

This post focuses only on FailoverCluster and FailfastCluster.

The FailoverClusterInvoker class

/**
 * Failover: when a call fails, retry against another server. Typically used for read operations, but retries add latency.
 */
public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {
 	// .......
    public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        List<Invoker<T>> copyinvokers = invokers;
        checkInvokers(copyinvokers, invocation);
        // len is the total number of attempts (retries + 1); with the default of 2 retries it is 3
        int len = getUrl().getMethodParameter(invocation.getMethodName(), Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
        if (len <= 0) {
            len = 1;
        }
        // retry loop.
        RpcException le = null; // last exception.
        // Track the invokers already tried and the provider addresses already called
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // invoked invokers.
        Set<String> providers = new HashSet<String>(len);
        for (int i = 0; i < len; i++) {
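            // On a retry, list and select again, in case the invoker list has changed in the meantime.
            // Note: if the list has changed, the check against "invoked" no longer works, because the invoker instances are different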
            if (i > 0) {
                checkWhetherDestroyed();
                copyinvokers = list(invocation);
                // check again
                checkInvokers(copyinvokers, invocation);
            }
            Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
            invoked.add(invoker);
            RpcContext.getContext().setInvokers((List) invoked);
            try {
                Result result = invoker.invoke(invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Although retry the method " + invocation.getMethodName()
                            + " in the service " + getInterface().getName()
                            + " was successful by the provider " + invoker.getUrl().getAddress()
                            + ", but there have been failed providers " + providers
                            + " (" + providers.size() + "/" + copyinvokers.size()
                            + ") from the registry " + directory.getUrl().getAddress()
                            + " on the consumer " + NetUtils.getLocalHost()
                            + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                            + le.getMessage(), le);
                }
                return result;
            } catch (RpcException e) {
                if (e.isBiz()) { // biz exception.
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            } finally {
                providers.add(invoker.getUrl().getAddress());
            }
        }
        throw new RpcException(le != null ? le.getCode() : 0, "Failed to invoke the method "
                + invocation.getMethodName() + " in the service " + getInterface().getName()
                + ". Tried " + len + " times of the providers " + providers
                + " (" + providers.size() + "/" + copyinvokers.size()
                + ") from the registry " + directory.getUrl().getAddress()
                + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
                + Version.getVersion() + ". Last error is: "
                + (le != null ? le.getMessage() : ""), le != null && le.getCause() != null ? le.getCause() : le);
    }

}

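The FailfastClusterInvoker class

/**
 * Fail fast: only one call is made, and a failure is reported immediately. Typically used for non-idempotent write operations.
 */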
public class FailfastClusterInvoker<T> extends AbstractClusterInvoker<T> {
	// ......
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
                throw (RpcException) e;
            }
            throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName() + " select from all providers " + invokers + " for service " + getInterface().getName() + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost() + " use dubbo version " + Version.getVersion() + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
        }
    }
}

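From the code we can see that FailoverClusterInvoker (failover) contains a for loop. When an exception occurs it is not thrown right away; it is stored in `le` and the loop moves on to the next attempt. By default there are up to three attempts in total: the first call plus two retries. Business exceptions (`e.isBiz()`) are the exception to this and are rethrown immediately. FailfastClusterInvoker (fail fast), by contrast, throws as soon as an exception occurs: a failure is reported immediately and nothing is retried.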
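The contrast between the two control flows can be reproduced in a few lines. This is a simplified sketch under stated assumptions, not the Dubbo implementation: `SimpleInvoker`, `failover` and `failfast` are hypothetical names, a naive round-robin stands in for `LoadBalance`, and exceptions are left unwrapped instead of being converted to `RpcException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailoverVsFailfast {
    /** Hypothetical stand-in for a provider-side invoker. */
    interface SimpleInvoker {
        String invoke(String request);
    }

    /** Failover: store the last exception and move on to the next attempt. */
    static String failover(List<SimpleInvoker> invokers, String request, int retries) {
        int attempts = Math.max(retries + 1, 1); // first call plus retries
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            // Assumption: naive round-robin instead of a real load balancer
            SimpleInvoker invoker = invokers.get(i % invokers.size());
            try {
                return invoker.invoke(request);
            } catch (RuntimeException e) {
                last = e; // do not throw yet, keep it for the final error
            }
        }
        throw new RuntimeException("Tried " + attempts + " times. Last error is: "
                + (last != null ? last.getMessage() : ""), last);
    }

    /** Failfast: a single attempt; any failure propagates immediately. */
    static String failfast(List<SimpleInvoker> invokers, String request) {
        return invokers.get(0).invoke(request);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        SimpleInvoker a = req -> { calls.add("a"); throw new IllegalStateException("a is down"); };
        SimpleInvoker b = req -> { calls.add("b"); return "b handled " + req; };
        List<SimpleInvoker> invokers = Arrays.asList(a, b);

        System.out.println(failover(invokers, "query", 2)); // prints "b handled query"
        System.out.println(calls);                           // prints "[a, b]"
        try {
            failfast(invokers, "query");
        } catch (IllegalStateException e) {
            System.out.println("failfast: " + e.getMessage()); // prints "failfast: a is down"
        }
    }
}
```

With the same failing first provider, the failover path succeeds on its second attempt, while the failfast path surfaces the error after a single call.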

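Use cases

**What is the difference between a "read" interface and a "write" interface in Dubbo?** The answer is straightforward. The default FailoverCluster retries, so under network jitter a "write" interface may end up writing the same value more than once, for example when the request reaches the provider but the response times out. "Write" interfaces should therefore be switched to FailfastCluster.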
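The duplicate-write risk is easy to demonstrate with a toy provider. The sketch below is hypothetical throughout: `FlakyStore` simulates a provider that performs the write but whose reply is lost, and the two `insertWith...` helpers imitate the failover and failfast strategies rather than calling Dubbo.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryDuplicateWrite {
    /** Hypothetical provider: stores the value, then "loses" the reply a given number of times. */
    static class FlakyStore {
        final List<String> rows = new ArrayList<>();
        int repliesToLose;

        FlakyStore(int repliesToLose) {
            this.repliesToLose = repliesToLose;
        }

        void insert(String value) {
            rows.add(value); // the write itself succeeds on the provider
            if (repliesToLose > 0) {
                repliesToLose--;
                throw new IllegalStateException("timeout waiting for response");
            }
        }
    }

    /** Failover-style call: retry on any failure, up to retries + 1 attempts. */
    static void insertWithFailover(FlakyStore store, String value, int retries) {
        int attempts = Math.max(retries + 1, 1);
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                store.insert(value);
                return;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw last; // non-null here: every attempt failed
    }

    /** Failfast-style call: one attempt, the error goes straight to the caller. */
    static void insertWithFailfast(FlakyStore store, String value) {
        store.insert(value);
    }

    public static void main(String[] args) {
        FlakyStore retried = new FlakyStore(1);
        insertWithFailover(retried, "order-1", 2);
        System.out.println(retried.rows); // prints "[order-1, order-1]"

        FlakyStore single = new FlakyStore(1);
        try {
            insertWithFailfast(single, "order-1");
        } catch (IllegalStateException e) {
            System.out.println("caller sees: " + e.getMessage());
        }
        System.out.println(single.rows); // prints "[order-1]"
    }
}
```

In the failfast case the caller still gets an error, but it is the caller, not the framework, that decides whether repeating the write is safe.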