zookeeper leader选举

Leader选举

在实际使用ZooKeeper开发中,我们最常用的是Apache Curator。 在使用ZK API开发时会遇到让人头疼的几个问题,ZK连接管理、SESSION失效等一些异常问题的处理,Curator替我们解决了这些问题,通过对ZK连接状态的监控来做出相应的重连等操作,并触发事件。

更好的地方是Curator对ZK的一些应用场景提供了非常好的实现,而且有很多扩充,这些都符合ZK使用规范。
它的主要组件为:

  • Recipes: ZooKeeper的系列recipe实现, 基于 Curator Framework.**
  • Framework: 封装了大量ZooKeeper常用API操作,降低了使用难度, 基于Zookeeper增加了一些新特性,对ZooKeeper链接的管理,对链接丢失自动重新链接。**
  • Utilities: 一些ZooKeeper操作的工具类包括ZK的集群测试工具路径生成等非常有用,在Curator-Client包下org.apache.curator.utils。
  • Client: ZooKeeper的客户端API封装,替代官方 ZooKeeper class,解决了一些繁琐低级的处理,提供一些工具类。
  • Errors: 异常处理, 连接异常等
  • Extensions: 对curator-recipes的扩展实现,拆分为curator-: stuck_out_tongue_closed_eyes: iscovery和curator-: stuck_out_tongue_closed_eyes: iscovery-server提供基于RESTful的Recipes WEB服务.

在分布式计算中, leader election是很重要的一个功能, 这个选举过程是这样子的: 指派一个进程作为组织者,将任务分发给各节点。 在任务开始前, 哪个节点都不知道谁是leader或者coordinator. 当选举算法开始执行后, 每个节点最终会得到一个唯一的节点作为任务leader.
除此之外, 选举还经常会发生在leader意外宕机的情况下,新的leader要被选举出来。

Curator 有两种选举recipe, 可以根据需求选择合适的。

Leader latch

使用LeaderLatch类选举的例子.

构造函数如下:

1
2
public LeaderLatch(CuratorFramework client, String latchPath)
public LeaderLatch(CuratorFramework client, String latchPath, String id)

启动LeaderLatch: leaderLatch.start();

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/**
* Add this instance to the leadership election and attempt to acquire leadership.
*
* @throws Exception errors
*/
public void start() throws Exception
{
Preconditions.checkState(state.compareAndSet(State.LATENT, State.STARTED), "Cannot be started more than once");
startTask.set(AfterConnectionEstablished.execute(client, new Runnable()
{
@Override
public void run()
{
try
{
internalStart();
}
finally
{
startTask.set(null);
}
}
}));
}

一旦启动, LeaderLatch会和其它使用相同latch path的其它LeaderLatch交涉,然后随机的选择其中一个作为leader。 可以随时查看一个给定的实例是否是leader:

1
public boolean hasLeadership()

LeaderLatch在请求成为leadership时有block方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
public void await()
throws InterruptedException, EOFException
{
synchronized (this)
{
while ((this.state.get() == State.STARTED) && (!this.hasLeadership.get()))
{
wait();
}
}
if (this.state.get() != State.STARTED)
{
throw new EOFException();
}
}
Causes the current thread to wait until this instance acquires leadership
unless the thread is interrupted or closed.
public boolean await(long timeout,
TimeUnit unit)
throws InterruptedException

一旦不使用LeaderLatch了,必须调用close方法。 如果它是leader,会释放leadership, 其它的参与者将会重新选举一个leader。

异常处理
LeaderLatch实例可以增加ConnectionStateListener来监听网络连接问题。 当 SUSPENDED 或 LOST 时, leader不再认为自己还是leader.当LOST 连接重连后 RECONNECTED,LeaderLatch会删除先前的ZNode然后重新创建一个.
LeaderLatch用户必须考虑导致leadership丢失的连接问题。 强烈推荐你使用ConnectionStateListener。

LeaderLatch案例

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
package com.zsr.test.curator;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.utils.CloseableUtils;
import org.apache.curator.test.TestingServer;
import com.google.common.collect.Lists;
public class LeaderLatchExample {
private static final int CLIENT_QTY = 10;
private static final String PATH = "/examples/leader";
public static void main(String[] args) throws Exception {
List<CuratorFramework> clients = Lists.newArrayList();
List<LeaderLatch> examples = Lists.newArrayList();
TestingServer server = new TestingServer();
try {
for (int i = 0; i < CLIENT_QTY; ++i) {
CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), new ExponentialBackoffRetry(1000, 3));
clients.add(client);
LeaderLatch example = new LeaderLatch(client, PATH, "Client #" + i);
examples.add(example);
client.start();
example.start();
}
Thread.sleep(2000);
LeaderLatch currentLeader = null;
for (int i = 0; i < CLIENT_QTY; ++i) {
LeaderLatch example = examples.get(i);
if (example.hasLeadership())
currentLeader = example;
}
System.out.println("current leader is " + currentLeader.getId());
System.out.println("release the leader " + currentLeader.getId());
currentLeader.close();
examples.get(0).await(2, TimeUnit.SECONDS);
System.out.println("Client #0 maybe is elected as the leader or not although it want to be");
System.out.println("the new leader is " + examples.get(0).getLeader().getId());
System.out.println("Press enter/return to quit\n");
new BufferedReader(new InputStreamReader(System.in)).readLine();
} catch (Exception e) {
e.printStackTrace();
} finally {
System.out.println("Shutting down...");
for (LeaderLatch exampleClient : examples) {
CloseableUtils.closeQuietly(exampleClient);
}
for (CuratorFramework client : clients) {
CloseableUtils.closeQuietly(client);
}
CloseableUtils.closeQuietly(server);
}
}
}

首先创建了10个LeaderLatch,启动后它们中的一个会被选举为leader。 因为选举会花费一些时间,start后并不能马上就得到leader。
通过hasLeadership查看自己是否是leader, 如果是的话返回true。
可以通过.getLeader().getId()可以得到当前的leader的ID。
只能通过close释放当前的领导权。
await是一个阻塞方法, 尝试获取leader地位,但是未必能选中。

Leader Election

Curator还提供了另外一种选举方法。
主要涉及以下四个类:

  • LeaderSelector
  • LeaderSelectorListener
  • LeaderSelectorListenerAdapter
  • CancelLeadershipException

重要的是LeaderSelector类,它的构造函数为:

1
2
public LeaderSelector(CuratorFramework client, String mutexPath,LeaderSelectorListener listener)
public LeaderSelector(CuratorFramework client, String mutexPath, ThreadFactory threadFactory, Executor executor, LeaderSelectorListener listener)

类似LeaderLatch,必须start: leaderSelector.start();
一旦启动,当实例取得领导权时你的listener的takeLeadership()方法被调用. 而takeLeadership()方法只有领导权被释放时才返回。
当你不再使用LeaderSelector实例时,需要调用它的close方法

异常处理
LeaderSelectorListener类继承ConnectionStateListener。LeaderSelector必须小心连接状态的改变. 如果实例成为leader, 它应该响应SUSPENDED 或 LOST. 当 SUSPENDED 状态出现时, 实例必须假定在重新连接成功之前它可能不再是leader了。 如果LOST状态出现, 实例不再是leader, takeLeadership方法返回.

注意:推荐处理方式是当收到SUSPENDED 或 LOST时抛出CancelLeadershipException异常. 这会导致LeaderSelector实例中断并取消执行takeLeadership方法的异常. 这非常重要, 你必须考虑扩展LeaderSelectorListenerAdapter. LeaderSelectorListenerAdapter提供了推荐的处理逻辑。

LeaderSelector案例

首先创建一个ExampleClient类, 它继承LeaderSelectorListenerAdapter, 它实现了takeLeadership方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
package com.zsr.test.curator;
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
/**
* An example leader selector client. Note that {@link LeaderSelectorListenerAdapter} which has the
* recommended handling for connection state issues
*/
public class ExampleClient extends LeaderSelectorListenerAdapter implements Closeable {
private final String name;
private final LeaderSelector leaderSelector;
private final AtomicInteger leaderCount = new AtomicInteger();
public ExampleClient(CuratorFramework client, String path, String name) {
this.name = name;
// create a leader selector using the given path for management
// all participants in a given leader selection must use the same path
// ExampleClient here is also a LeaderSelectorListener but this isn't required
leaderSelector = new LeaderSelector(client, path, this);
// for most cases you will want your instance to requeue when it relinquishes leadership
leaderSelector.autoRequeue();
}
public void start() throws IOException {
// the selection for this instance doesn't start until the leader selector is started
// leader selection is done in the background so this call to leaderSelector.start() returns
// immediately
leaderSelector.start();
}
@Override
public void close() throws IOException {
leaderSelector.close();
}
@Override
public void takeLeadership(CuratorFramework client) throws Exception {
// we are now the leader. This method should not return until we want to relinquish leadership
final int waitSeconds = (int) (5 * Math.random()) + 1;
System.out.println(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
System.out.println(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
} catch (InterruptedException e) {
System.err.println(name + " was interrupted.");
Thread.currentThread().interrupt();
} finally {
System.out.println(name + " relinquishing leadership.\n");
}
}
}

可以在takeLeadership进行任务的分配等等,并且不要返回,如果你想要要此实例一直是leader的话可以加一个死循环。
leaderSelector.autoRequeue();:保证在此实例释放领导权之后还可能获得领导权。
在这里我们使用AtomicInteger来记录此client获得领导权的次数, 它是公平的, 每个client有平等的机会获得领导权。

测试代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
package com.zsr.test.curator;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.test.TestingServer;
import org.apache.curator.utils.CloseableUtils;
import com.google.common.collect.Lists;
public class LeaderSelectorExample {
private static final int CLIENT_QTY = 10;
private static final String PATH = "/examples/leader";
public static void main(String[] args) throws Exception {
// all of the useful sample code is in ExampleClient.java
System.out.println("Create " + CLIENT_QTY
+ " clients, have each negotiate for leadership and then wait a random number of seconds before letting another leader election occur.");
System.out.println(
"Notice that leader election is fair: all clients will become leader and will do so the same number of times.");
List<CuratorFramework> clients = Lists.newArrayList();
List<ExampleClient> examples = Lists.newArrayList();
TestingServer server = new TestingServer();
try {
for (int i = 0; i < CLIENT_QTY; ++i) {
CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(),
new ExponentialBackoffRetry(1000, 3));
clients.add(client);
ExampleClient example = new ExampleClient(client, PATH, "Client #" + i);
examples.add(example);
client.start();
example.start();
}
System.out.println("Press enter/return to quit\n");
new BufferedReader(new InputStreamReader(System.in)).readLine();
} finally {
System.out.println("Shutting down...");
for (ExampleClient exampleClient : examples) {
CloseableUtils.closeQuietly(exampleClient);
}
for (CuratorFramework client : clients) {
CloseableUtils.closeQuietly(client);
}
CloseableUtils.closeQuietly(server);
}
}
}

与LeaderLatch相比, 通过LeaderSelectorListener可以对领导权进行控制, 在适当的时候释放领导权,这样每个节点都有可能获得领导权。 而LeaderLatch除非调用close方法,否则它不会释放领导权。

热评文章