1. Fixing ZooKeeper data inconsistency
2. ZK currentEpoch&acceptedEpoch
3. ZooKeeper theoretical foundations
Fixing ZooKeeper data inconsistency
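ZooKeeper is an open-source coordination service for distributed systems, commonly used for leader election, pub/sub, distributed locks, and similar functions. At its core is the Zab protocol, the ZooKeeper Atomic Broadcast protocol. ZooKeeper stores its data in a ZKDatabase whose data structure is the DataTree, which maintains a hash table from paths to DataNodes. A Snapshot is the DataTree serialized to a set of files on disk; at startup ZooKeeper rebuilds the DataTree in memory from the Snapshot on disk.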
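A ZooKeeper client performing a read gets the data from the server it is connected to, which differs from how non-leader nodes behave in the Raft model. Every transaction is identified by a zxid, a globally unique 64-bit integer made up of a 32-bit Epoch and a 32-bit auto-incrementing ID (Counter). Each committed transaction increments the Counter by 1, and each time a new member is elected leader the Epoch increments by 1. When the Counter part of the zxid overflows, an election is triggered and the Counter is reset to 0.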
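The zxid layout described above can be sketched with plain bit operations. This is an illustrative helper, not ZooKeeper's own code: the class name `ZxidSketch` and its methods are made up for this example, and it assumes the Epoch sits in the high 32 bits and the Counter in the low 32 bits.

```java
// Hypothetical helper for illustration; names are not taken from ZooKeeper.
final class ZxidSketch {
    private ZxidSketch() {}

    // Pack a 32-bit epoch (high bits) and a 32-bit counter (low bits) into one zxid.
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // The epoch is the high 32 bits.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // The counter is the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // A newly elected leader starts a new epoch with the counter reset to 0.
    static long nextEpoch(long zxid) {
        return make(epochOf(zxid) + 1, 0);
    }

    // True when one more increment would spill the counter into the epoch bits.
    static boolean counterExhausted(long zxid) {
        return counterOf(zxid) == 0xffffffffL;
    }
}
```

Because the epoch occupies the high bits, comparing two zxids as plain numbers orders them by epoch first and by counter second.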
ZooKeeper offers a file-system-like API for organizing and manipulating Znodes. Znodes are arranged in a tree and support operations such as create, delete, getData, and setData.
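To make the "hash table from paths to nodes" idea and the four operations concrete, here is a toy in-memory model. It is only a sketch: `ToyDataTree` is a hypothetical class, it stores raw bytes instead of DataNodes, and it ignores versions, ACLs, watches, and ephemeral nodes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a path -> data table; not ZooKeeper's DataTree.
final class ToyDataTree {
    private final Map<String, byte[]> nodes = new HashMap<>();

    ToyDataTree() {
        nodes.put("/", new byte[0]); // the root always exists
    }

    // Accept only absolute, non-root paths such as "/app" or "/app/lock".
    private static void check(String path) {
        if (!path.startsWith("/") || path.endsWith("/")) {
            throw new IllegalArgumentException("bad path: " + path);
        }
    }

    // Parent of "/a/b" is "/a"; parent of "/a" is "/".
    private static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    void create(String path, byte[] data) {
        check(path);
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        if (!nodes.containsKey(parentOf(path))) {
            throw new IllegalStateException("no parent for: " + path);
        }
        nodes.put(path, data.clone());
    }

    byte[] getData(String path) {
        byte[] data = nodes.get(path);
        if (data == null) {
            throw new IllegalStateException("no node: " + path);
        }
        return data.clone();
    }

    void setData(String path, byte[] data) {
        if (!nodes.containsKey(path)) {
            throw new IllegalStateException("no node: " + path);
        }
        nodes.put(path, data.clone());
    }

    void delete(String path) {
        check(path);
        String prefix = path + "/";
        for (String p : nodes.keySet()) {
            if (p.startsWith(prefix)) {
                throw new IllegalStateException("node has children: " + path);
            }
        }
        if (nodes.remove(path) == null) {
            throw new IllegalStateException("no node: " + path);
        }
    }
}
```

A flat map keyed by full path keeps lookups to a single hash probe; the tree shape is enforced only by the parent and children checks in create and delete.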
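ZooKeeper's consistency guarantee is ordered sequential consistency: writes are linearizable, while reads are only sequentially consistent. This means that after client A updates Znode Z, client B reading Z may not see the latest value immediately, but once B has read the latest value, it should never read stale data again.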
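While investigating a case where a service updated data on ZooKeeper node A and then kept reading stale data for some time, we found a similar issue tracked as ZOOKEEPER-. It has been fixed in newer ZooKeeper versions. Walking through how ZooKeeper works shows that the Zab protocol has three phases: election, recovery, and broadcast.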
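Every member enters the election state at startup, and the node with the largest lastZxid is elected leader. A follower then enters the recovery phase: it first uses the votes to find the current leader, then synchronizes committed transactions with the leader so that its local copy of the data matches the leader's. There are three sync strategies: DIFF Sync, TRUNC Sync, and SNAP Sync. DIFF Sync synchronizes by sending a series of PROPOSAL and COMMIT messages. TRUNC Sync synchronizes by deleting transactions newer than the leader's. SNAP Sync restores data by sending a snapshot. When recovery completes, the leader sends NEWLEADER to the followers, waits for a majority of followers to acknowledge it, and then sends UPTODATE; a follower starts serving clients after it receives and acknowledges UPTODATE.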
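The text does not say how a leader picks among the three strategies. A minimal sketch, assuming the choice depends on where the follower's last zxid falls relative to the range of committed proposals the leader still holds, could look like this. `SyncChoice`, `minCommitted`, and `maxCommitted` are hypothetical names for this example, not ZooKeeper's API.

```java
// Illustrative only: a simplified decision rule, not ZooKeeper's actual sync logic.
final class SyncChoice {
    enum Mode { DIFF, TRUNC, SNAP }

    private SyncChoice() {}

    // followerLastZxid: last zxid in the follower's log.
    // minCommitted / maxCommitted: assumed zxid range of committed proposals
    // that the leader can still replay.
    static Mode choose(long followerLastZxid, long minCommitted, long maxCommitted) {
        if (followerLastZxid > maxCommitted) {
            // The follower holds transactions newer than the leader's: cut them off.
            return Mode.TRUNC;
        }
        if (followerLastZxid >= minCommitted) {
            // The follower is behind but inside the replayable range: send the missing tail.
            return Mode.DIFF;
        }
        // The follower is too far behind for a replay: ship a full snapshot.
        return Mode.SNAP;
    }
}
```

For example, with a leader range of 100 to 110, a follower at 105 would get DIFF, one at 120 would get TRUNC, and one at 50 would get SNAP.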
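The fix is that after a follower receives NEWLEADER, it must persist every uncommitted DIFF Sync proposal before it ACKs NEWLEADER. Otherwise, if the leader goes offline permanently, requests that clients already consider committed could be dropped during synchronization; persisting first keeps the data consistent in that case. This is how ZooKeeper fixed ZOOKEEPER-.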
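Having understood the cause and the fix, our company resolved the problem in practice by backporting the key patch to v3.5.9 and upgrading from 3.4. to 3.5.9. After a round of debugging and corrections we also fixed the unit-test failures caused by Socket closing, which completed the fix and backport for ZOOKEEPER-.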
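In summary, designing and maintaining distributed systems is challenging and demands meticulous debugging and deep understanding. This experience gave me a real appreciation for the complexity of distributed systems and the hard work behind them, and for how much maintaining and tuning a distributed environment matters.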
ZK currentEpoch&acceptedEpoch
During a multi-data-center Kafka and ZK drill, we found that when a zk node from the original cluster joined the new cluster, it reported this error:
Leaders epoch, 6 is less than accepted epoch, 9
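Looking in the /data/zookeeper/data/version-2 directory, there are indeed two files, acceptedEpoch and currentEpoch, and both contain the value 9.
Why is that? What are these two files for?
The two files record, respectively, the epoch number that the given server process has seen and the one it has participated in. Although they contain no application-level data, they matter for data consistency and determine whether the cluster's leader election can succeed.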
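A quick way to inspect the two files is to read them as text. The sketch below assumes each file holds a single decimal number (consistent with both files showing 9 above); `EpochFiles` and its methods are hypothetical helpers written for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes each epoch file contains one decimal number.
final class EpochFiles {
    private EpochFiles() {}

    // Reads a file such as ".../version-2/acceptedEpoch" and parses its content as a long.
    static long readEpoch(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    // Returns {acceptedEpoch, currentEpoch} read from the given version-2 directory.
    static long[] readBoth(Path versionDir) throws IOException {
        return new long[] {
            readEpoch(versionDir.resolve("acceptedEpoch")),
            readEpoch(versionDir.resolve("currentEpoch"))
        };
    }
}
```

Calling `EpochFiles.readBoth(Paths.get("/data/zookeeper/data/version-2"))` on the node from the drill would be expected to return the pair 9 and 9 described above.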
These two variables exist mainly to handle cluster failure recovery.
As mentioned, the implementation up to version 3.3.3 has not included epoch variables acceptedEpoch and currentEpoch. This omission has generated problems [5] (issue ZOOKEEPER- in Apache's issue tracking system) in a production version and was noticed by many ZooKeeper clients. The origin of this problem is at the beginning of Recovery Phase (Algorithm 4 line 2), when the leader increments its epoch (contained in lastZxid) even before acquiring a quorum of successfully connected followers (such leader is called false leader). Since a follower goes back to FLE if its epoch is larger than the leader's epoch (line ), when a false leader drops leadership and becomes a follower of a leader from a previous epoch, it finds a smaller epoch (line
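Put simply: earlier versions did not distinguish acceptedEpoch from currentEpoch; the epoch was extracted directly from the high-order bits of the zxid. This causes a problem. Suppose there are three servers, s1, s2, and s3. s1 and s2 are in contact, s1 is the leader, and s3 is LOOKING:
1. s2 restarts and, together with s3's vote, elects s3 as leader.
2. s3 takes itself to be the leader and increments the epoch, but cannot reach the other servers. Meanwhile s1 still believes it is the leader (the reason is explained below).
3. s2 cannot reach s3, and it receives s1's LEADING message, so it returns to s1's old cluster.
4. s3 cannot reach anyone, gives up leadership, goes back to FLE, receives a message from the old cluster's leader s1, and rejoins the old cluster as a follower.
5. As a follower, s3 finds that its own epoch is larger than the old leader's epoch, so it goes back to FLE again.
From then on s3 bounces between steps 4 and 5, cycling endlessly between the FLE phase and the RECOVER phase.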
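As for why s1 still considers itself the leader: a leader has a grace period, so that a transient failure does not end its term.
This grace period works through heartbeats.
Within the heartbeat window, s1 cannot detect that the LearnerHandler threads for s2 and s3 have died, so its leader state stays valid. That state is only a flag and does not let writes through, because a write needs responses from more than half of the nodes, and during this window that requirement cannot be met.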
So how do acceptedEpoch and currentEpoch solve the failure-recovery problem?
if (newEpoch > self.getAcceptedEpoch()) {
    // The leader proposes a newer epoch: reply with our current epoch and accept the new one.
    wrappedEpochBytes.putInt((int) self.getCurrentEpoch());
    self.setAcceptedEpoch(newEpoch);
} else if (newEpoch == self.getAcceptedEpoch()) {
    // since we have already acked an epoch equal to the leaders, we cannot ack
    // again, but we still need to send our lastZxid to the leader so that we can
    // sync with it if it does assume leadership of the epoch.
    // the -1 indicates that this reply should not count as an ack for the new epoch
    wrappedEpochBytes.putInt(-1);
} else {
    // The leader's epoch is older than one we have already accepted: refuse to follow it.
    throw new IOException("Leaders epoch, "
            + newEpoch
            + " is less than accepted epoch, "
            + self.getAcceptedEpoch());
}
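In other words, it throws an error outright: a node whose accepted epoch is greater than the leader's epoch is not allowed to join the cluster.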
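The check above depends on ZooKeeper internals (`self`, `wrappedEpochBytes`), so it cannot be run in isolation. The following self-contained model reproduces only the three-way comparison, to show how the error from the drill arises. `EpochCheckSketch` and its fields are hypothetical; the return value stands in for what the follower would write into its reply.

```java
import java.io.IOException;

// Self-contained model of the three-way epoch comparison; not ZooKeeper code.
final class EpochCheckSketch {
    long acceptedEpoch;
    long currentEpoch;

    EpochCheckSketch(long acceptedEpoch, long currentEpoch) {
        this.acceptedEpoch = acceptedEpoch;
        this.currentEpoch = currentEpoch;
    }

    // Returns currentEpoch when the new epoch is accepted,
    // or -1 when this epoch was already acknowledged.
    long onLeaderEpoch(long newEpoch) throws IOException {
        if (newEpoch > acceptedEpoch) {
            acceptedEpoch = newEpoch;
            return currentEpoch;
        } else if (newEpoch == acceptedEpoch) {
            return -1;
        }
        throw new IOException("Leaders epoch, " + newEpoch
                + " is less than accepted epoch, " + acceptedEpoch);
    }
}
```

With `new EpochCheckSketch(9, 9).onLeaderEpoch(6)`, the model throws an IOException carrying the message "Leaders epoch, 6 is less than accepted epoch, 9", the same text seen when the old node tried to join the new cluster.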
ZooKeeper theoretical foundations
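ZooKeeper was developed by Yahoo Research and later donated to Apache. It is an open-source coordination server for distributed applications that provides consistency services for distributed systems. Its consistency is achieved through the ZAB protocol, which is based on the Paxos algorithm. Its main functions include configuration maintenance, naming service, distributed synchronization, and cluster management.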
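The 3PC execution of the Paxos algorithm is divided into three phases: the prepare phase, the accept phase, and the commit phase.
If the proposer receives feedback from more than half of the acceptors, it broadcasts two kinds of messages:
The difference between 2PC and 3PC is this: once the proposer has received prepare-phase feedback from more than half of the acceptors, it sends the actual proposal to all acceptors. In 2PC, an acceptor that accepts the proposal applies it locally straight away, without waiting for a commit message.
So why not simply use 2PC instead of 3PC? Because 2PC has a number of drawbacks (not covered here). That is why many industrial Paxos implementations use 3PC commit. 2PC commit is more efficient than 3PC commit, though, so it can be used when it is guaranteed not to cause problems.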
In real engineering use, the Paxos algorithm described above has many inconveniences depending on the requirements, so a number of optimized variants of basic Paxos have appeared, such as Multi Paxos, Fast Paxos, and EPaxos.
For example, Paxos has a "livelock problem". Fast Paxos improves on Paxos by allowing only one process to submit proposals, that is, this process has the sole right to operate on N. This resolves the livelock problem.
ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash-recovery support designed specifically for ZooKeeper. ZooKeeper relies mainly on ZAB to achieve distributed data consistency.
ZooKeeper uses a single primary process to receive and handle all transaction requests from clients, that is, write requests. When the state of the server data changes, the cluster uses the ZAB atomic broadcast protocol to broadcast the change to all replica processes as a transaction Proposal. ZAB guarantees a global order of changes: each transaction is assigned a globally increasing number, xid.
When a ZooKeeper client connects to a node in the cluster and submits a read request, the node answers directly from the data it holds. If the request is a write and the node is not the Leader, the node forwards it to the Leader. The Leader broadcasts the write as a proposal, and once more than half of the nodes agree, the write is committed. The Leader then broadcasts again to all subscribers, the Learners, telling them to synchronize the data.
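ZAB is an industrial implementation of the Paxos algorithm, but the two have somewhat different design goals. ZAB is mainly used to build a highly available distributed primary-backup data system: the Followers are backups of the Leader, a new Leader can be elected immediately if the Leader fails, and in normal operation all of them serve clients. Fast Paxos, by contrast, is used to build a distributed consistent state machine system, ensuring that the state of every node in the system is consistent.
In addition, ZAB uses Google's Chubby algorithm as its distributed lock implementation, and Google's Chubby is itself an application of Paxos.
The way a zk cluster handles transaction requests reflects Fast Paxos, in that only the Leader may make proposals. This is a 3PC commit.
Leader election, however, reflects Paxos, because after the Leader goes down every Follower may submit a proposal; they all start out as "proposers". This is a 2PC commit.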
To avoid a single point of failure, zk also runs as a cluster. The roles in a zk cluster fall into the following three categories:
Learner: a learner, that is, a synchronizer.
Learner = Follower + Observer
QuorumPeer = Participant = Leader + Follower
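Three pieces of data are very important in ZAB:
The ZAB protocol describes the state of a zkServer with three modes. The boundaries between them are not sharp; they are interwoven.
Every host in a zk cluster is in a different state at different stages. Each host has four states.
During cluster startup, or after the Leader goes down, the cluster enters recovery mode. The most important stage of recovery mode is Leader election.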
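A. serverId
This is the unique identifier of a server in the zk cluster, also called sid; it is in fact the myid configured in zk. For example, with three zk servers the ids are 1, 2, and 3.
B. Logical clock
The logical clock (Logicalclock) is an integer. The concept is called logicalclock during an election and epoch once the election has finished. In other words, epoch and logicalclock are the same value under different names in different situations.
The Leader election process (algorithm) during cluster startup differs slightly from the one that runs after contact with the Leader is lost, but the two are basically the same.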
A. Leader election during cluster startup
For Server1, its own vote is (1, 0) and the vote it receives from Server2 is (2, 0). It first compares the ZXIDs, which are both 0, and then compares the myids. Server2's myid is the largest, so Server1 updates its vote to (2, 0) and votes again. Server2 does not need to update its vote; it simply sends its previous vote to all hosts in the cluster again.
(4) Count the votes. After each round of voting, every server tallies the votes and checks whether more than half of the machines have received the same vote. Server1 and Server2 both find that two hosts in the cluster have accepted the vote (2, 0), so the new Leader is considered elected: Server2.
(5) Change server state. Once the Leader is determined, each server updates its state: a Follower changes to FOLLOWING, and the Leader changes to LEADING.
(6) Add a host. Server3 starts after the new Leader has been elected and wants to start a new round of election. But the hosts in the cluster are no longer LOOKING; each is doing its normal job, so Server3 can only join the cluster as a Follower.
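The vote comparison and tally in the steps above can be sketched as follows. This is a toy model of the rule as stated in the text (compare ZXID first, then myid; more than half wins), using the same (myid, ZXID) notation. `VoteSketch` is a hypothetical class, and the sketch leaves out the logical clock and all networking.

```java
// Toy model of the vote rule described above; not ZooKeeper's election code.
final class VoteSketch {
    final long myid; // id of the server this vote proposes as Leader
    final long zxid; // last ZXID of the proposed Leader

    VoteSketch(long myid, long zxid) {
        this.myid = myid;
        this.zxid = zxid;
    }

    // True when 'other' should replace 'mine': the larger ZXID wins,
    // and a tie goes to the larger myid.
    static boolean shouldSwitch(VoteSketch mine, VoteSketch other) {
        if (other.zxid != mine.zxid) {
            return other.zxid > mine.zxid;
        }
        return other.myid > mine.myid;
    }

    // A vote is decided once more than half of the ensemble backs it.
    static boolean hasQuorum(int supporters, int ensembleSize) {
        return supporters > ensembleSize / 2;
    }
}
```

In the walkthrough, Server1 holding (1, 0) switches to (2, 0) because the ZXIDs tie and 2 is the larger myid, while Server2 keeps its vote; two supporters out of three servers is then enough to elect Server2.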
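B. Leader election after a crash
While ZooKeeper is running, the Leader and the non-Leader servers each do their own job, and a non-Leader server going down or newly joining does not affect the Leader. If the Leader server fails, however, the whole cluster suspends external service and enters a new round of Leader election, which is basically the same as the election at startup.
As mentioned earlier, recovery mode has two stages: Leader election and initial synchronization. Right after the election, the Leader is still only a prospective Leader; it becomes the real Leader only after initial synchronization.
The detailed process is as follows: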
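Once the Learners in the cluster have completed initial state synchronization, the whole zk cluster enters normal working mode.
If a Learner node in the cluster receives a transaction request from a client, it forwards the request to the Leader server, and then the following steps are carried out: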
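More Observers is not always better; their number is usually the same as the number of Followers. Adding Observers does not increase the load of transaction operations, but Observers do need to synchronize data from the Leader, and the time an Observer has for synchronization is less than or equal to the time the Followers take. When the Followers finish synchronizing, the Observer hosts in the Leader's Observer list stop synchronizing. Observers that completed synchronization move to a separate list of hosts that serve clients. Observers that did not finish synchronizing cannot serve clients, so they are wasted resources.
Therefore, for systems with frequent transaction operations, using too many Observers is not recommended.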
The Leader actually keeps two Observer lists:
all: contains every Observer.
service: Observers that have finished synchronizing data from the Leader. service <= all. This list is dynamic.
The Leader also keeps two Follower lists:
all: more than half of the Followers in it must send an ACK back to the Leader.
service:
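When the cluster is starting up, or after the Leader crashes, the cluster enters recovery mode. The data state to be recovered must follow three principles.
If the Leader receives heartbeats from fewer than half of the Followers, it concludes that its own connection to the cluster is broken, changes its state to LOOKING on its own initiative, and goes looking for a new Leader.
Meanwhile, because more than half of the other Servers believe the Leader is lost, they start a new Leader election and elect a new Leader.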
Normally, once the Leader has received ACKs from more than half of the Followers, it broadcasts a COMMIT message to the Followers, authorizing each Server to execute the write transaction. On receiving the Leader's COMMIT, each Server executes the write locally and then responds to the client that the write succeeded.
But if the Leader fails before all Followers have received the COMMIT, some Servers have executed the transaction while others have not received the COMMIT and so have not executed it. After a new Leader is elected and the cluster has gone through recovery mode, every Server must have executed the transactions that some Servers had already executed.
Suppose a new transaction has been approved at the Leader and the Leader has applied it locally, but the Leader goes down before any Follower receives the COMMIT. At that point no Follower knows the Proposal exists. After a new Leader is elected and the cluster returns to normal service, the old Leader host restarts and registers as a Follower. If the Proposal that nobody else knows about is still kept on that host, its data has more content than the other hosts, making the overall system state inconsistent. So that Proposal should be discarded. Transactions like this that should be discarded must not appear in the cluster again; they should be cleared.
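As mentioned earlier, both write votes and Leader election votes need more than half of the hosts to pass, so if half or more of the hosts are down, no vote can ever pass. It follows that a cluster of 5 hosts tolerates at most 2 failures, and a cluster of 6 hosts also tolerates at most 2. That is, 6 hosts and 5 hosts have the same fault tolerance. For this reason an odd number of hosts is recommended, to avoid wasting resources.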
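The arithmetic behind the 5-versus-6 comparison is short enough to write down. A minimal sketch, with `QuorumMath` as a hypothetical helper name:

```java
// Majority arithmetic for an ensemble of a given size.
final class QuorumMath {
    private QuorumMath() {}

    // Smallest number of servers that is more than half of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still alive.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }
}
```

For 5 hosts the quorum is 3 and 2 failures are tolerated; for 6 hosts the quorum rises to 4, so the tolerated failures stay at 2.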
In terms of read throughput, however, 6 hosts generally outperform 5, so running 6 hosts is not necessarily a waste of resources.
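A highly available system needs more than multiple hosts deployed as a cluster to avoid a single point of failure; the cluster should also be spread across multiple data centers or buildings. A cluster cannot be placed arbitrarily across them, though. Multi-data-center deployment is analyzed below.
A multi-data-center design must take the "majority principle" fully into account: as far as possible, more than half of the machines in the zk cluster should be able to keep running.
In production, a three-data-center deployment is the most common option and the one with the best disaster tolerance. It requires that the number of hosts in each data center be less than half of the cluster total.
The zk project offers no good disaster-tolerance scheme for a two-data-center deployment. The only option is to give one data center more than half of the hosts and make it the primary, leaving the other with fewer than half. Of course, if the primary data center fails, the whole cluster goes down.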
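The "each data center holds less than half" rule can be checked mechanically for a planned layout. The sketch below is an illustration of that rule only; `DatacenterLayout` is a hypothetical name, and the input is simply the number of zk hosts placed in each data center.

```java
// Checks whether a layout keeps a majority alive after losing any one data center.
final class DatacenterLayout {
    private DatacenterLayout() {}

    static boolean survivesAnySingleLoss(int[] hostsPerDatacenter) {
        int total = 0;
        for (int hosts : hostsPerDatacenter) {
            total += hosts;
        }
        for (int lost : hostsPerDatacenter) {
            int remaining = total - lost;
            if (remaining <= total / 2) {
                return false; // the survivors are not more than half of the ensemble
            }
        }
        return true;
    }
}
```

A 2 + 2 + 1 layout across three data centers passes the check, while a 3 + 2 layout across two data centers fails it because losing the primary leaves only 2 of 5 hosts.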
The CAP theorem, also called the CAP principle, states that in a distributed system Consistency, Availability, and Partition tolerance cannot all be achieved at once.
In a distributed system the network is largely beyond control and partitions are unavoidable, so the system must be partition tolerant. It then cannot guarantee both consistency and availability. Under CAP a distributed system can satisfy only two of the three, so it is either CP or AP.
BASE is short for three phrases: Basically Available, Soft state, and Eventually consistent.
The core idea of BASE theory is that even when real-time consistency is not achievable, each system can use an approach suited to its own business characteristics to reach eventual consistency.
Basically available means that a distributed system is allowed to lose part of its availability when an unforeseen failure occurs.
Loss of response time:
Loss of functionality:
Soft state means allowing intermediate states in system data and assuming they do not affect overall availability; that is, data synchronization between hosts is allowed some delay. Soft state is really a gray, transitional state.
Eventual consistency emphasizes that all replicas of the data in the system reach a consistent state after a period of synchronization. Its essence is that the system must guarantee the data becomes consistent eventually, without guaranteeing consistency in real time.
By the time taken to reach consistency, it can be divided into:
By what the client sees, it can be divided into:
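zk follows the CP principle: it guarantees consistency at the expense of availability. Where does this show?
When the Leader goes down, the zk cluster immediately elects a new Leader. The election usually finishes within milliseconds and takes at most seconds, but for its whole duration the zk cluster does not accept client reads or writes; the cluster is effectively down. So it does not satisfy availability.
The claim that zk may suffer split brain refers to multi-data-center deployments: if network connectivity problems create multiple partitions, split brain may occur and may lead to data inconsistency.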
(1) Case 1
(2) Case 2
(5) Case 5