Part - 2 | Uses
In this second part of the blog, we'll look at how ZK is actually used in distributed systems to implement the primitives they need.
We'll go over some common use cases, like:
- Configuration management
- Group membership
- Simple locks
- Locks without unfairness due to contention
- Write preferring, Read-Write locks.
Later, we'll take a quick look at how Yahoo! actually used these in their production systems.
Configuration management
One of the simplest primitives to implement. During startup, each process reads a znode, say Zc, and sets the watch flag on it. Any change to the configuration is then propagated to everyone watching Zc.
What happens when the znode Zc doesn't exist? How would one handle that? A question for thought.
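One possible answer to that question, as a sketch only: ZooKeeper lets you leave a watch on a path that doesn't exist yet (through exists), and watches fire once, so they have to be re-registered after every notification. The toy below models just that behaviour in memory. ToyStore, ConfigReader, the path /app/config and the default value are all made-up names for illustration, not a real ZooKeeper client.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str], None]  # called with the path that changed


class ToyStore:
    """In-memory stand-in for a ZooKeeper-like store (hypothetical, not a real client)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.watches: Dict[str, List[Watcher]] = {}

    def exists(self, path: str, watcher: Optional[Watcher] = None) -> bool:
        # A watch may be left on a path that does not exist yet.
        if watcher is not None:
            self.watches.setdefault(path, []).append(watcher)
        return path in self.data

    def get(self, path: str) -> bytes:
        return self.data[path]

    def set(self, path: str, value: bytes) -> None:
        # Create or update the znode, then fire each pending watch exactly once.
        self.data[path] = value
        for watcher in self.watches.pop(path, []):
            watcher(path)


class ConfigReader:
    """Keeps a local copy of the configuration stored at `path`."""

    def __init__(self, store: ToyStore, path: str, default: bytes) -> None:
        self.store, self.path, self.default = store, path, default
        self.current = default
        self.refresh(path)

    def refresh(self, _path: str) -> None:
        # Watches are one-shot, so a new one is registered on every refresh.
        if self.store.exists(self.path, watcher=self.refresh):
            self.current = self.store.get(self.path)
        else:
            self.current = self.default  # Zc is missing: fall back, keep watching


store = ToyStore()
reader = ConfigReader(store, "/app/config", default=b"timeout=5")
before = reader.current              # the default, since the znode is missing
store.set("/app/config", b"timeout=30")
after_create = reader.current        # picked up through the watch
store.set("/app/config", b"timeout=60")
after_update = reader.current        # seen because the watch was re-registered
```

The design choice here is to start from a built-in default and treat the creation of Zc as just another configuration change. Blocking startup until Zc appears would be an equally valid choice.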
Group membership
We start by creating a znode for the group, Zg. When a member process starts, it creates an ephemeral sequential znode under Zg. If the process fails, its ephemeral znode is simply deleted.
Each process that needs information about the group can list the children of Zg and set a watch on it to be notified when the membership changes.
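Once a watcher is notified, it has to list the children of Zg again and work out what changed. Here is a minimal sketch of that comparison, assuming the two child lists have already been fetched. The function name and the member_ prefix are made up for this example.

```python
from typing import Iterable, Set, Tuple


def membership_change(previous: Iterable[str],
                      current: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (joined, left) between two snapshots of Zg's children."""
    prev, cur = set(previous), set(current)
    return cur - prev, prev - cur


# Two hypothetical snapshots of the children of Zg.
old = ["member_0000000001", "member_0000000002"]
new = ["member_0000000002", "member_0000000003"]
joined, left = membership_change(old, new)
```

Here joined holds the process that just registered itself, and left holds the one whose ephemeral znode disappeared.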
Simple locks
Things get interesting now. The simplest implementation has each interested process try to create the same ephemeral znode. If the create fails, the process knows the lock is held by someone else. It can then simply set a watch on this ephemeral znode. When it is notified that the znode has been deleted, it can try creating the znode again.
If you observe closely, when a large number of processes are trying to acquire the lock, there is no saying who will get it. There is no fairness in this process and no FIFO ordering. It also suffers from the herd effect: when the lock is released, every waiting process is notified and all of them vie for the lock at the same time, even though only one of them can get it.
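To put a rough number on the herd effect, here is a small counting model. It is a simplification with a made-up function name: it assumes every waiter is watching the lock znode at each release, and that exactly one waiter wins each round.

```python
def wakeups_simple_lock(waiters: int) -> int:
    """Total notifications sent while the lock is handed to every waiter in turn."""
    total = 0
    remaining = waiters
    while remaining > 0:
        total += remaining   # a release notifies every process still waiting
        remaining -= 1       # exactly one of them wins and stops waiting
    return total


total_for_100 = wakeups_simple_lock(100)
```

With 100 waiters, the model counts 5050 notifications for just 100 lock hand-offs.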
So how do we fix this?
Locks without herd effect
1. n = create(g + "/lock_", EPHEMERAL|SEQUENTIAL) // g is the group znode
2. C = getChildren(g, false)
3. if n is lowest in C
4.     // Do something, as it has the lock
5.     exit
6. p = znode in g just before n
7. while exists(p, true) wait // watching the previous znode
8. goto 2
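Each process here watches only the znode just before its own, so a release wakes up exactly one waiter, and the lock is handed out in the order the znodes were created. Steps 3 and 6 boil down to sorting the children by their sequence suffix. Below is a sketch of just that decision, with no ZooKeeper calls in it. The function names are made up, and it assumes the lock_ prefix from the pseudocode plus the zero-padded counter ZooKeeper appends to sequential znodes.

```python
from typing import List, Optional


def sequence_number(name: str) -> int:
    # Sequential znodes end in a zero-padded counter, e.g. lock_0000000012.
    return int(name.rsplit("_", 1)[1])


def node_to_watch(children: List[str], me: str) -> Optional[str]:
    """Return None if `me` holds the lock, else the znode just before it."""
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(me)
    return None if index == 0 else ordered[index - 1]


# getChildren gives no ordering guarantee, so the list is deliberately shuffled.
children = ["lock_0000000012", "lock_0000000010", "lock_0000000011"]
holder = node_to_watch(children, "lock_0000000010")   # lowest, so it has the lock
waiter = node_to_watch(children, "lock_0000000012")   # watches the one before it
```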
Write preferring read-write locks :
Write lock :
1. n = create(g + "/write_", EPHEMERAL|SEQUENTIAL) // g is the group znode
2. C = getChildren(g, false)
3. if n is lowest in C
4.     // Do some write, as it has the lock
5.     exit
6. p = znode in g just before n
7. while exists(p, true) wait // watching the previous znode
8. goto 2
Read lock:
1. n = create(g + "/read_", EPHEMERAL|SEQUENTIAL) // g is the group znode
2. C = getChildren(g, false)
3. if no write znode is lower than n in C
4.     // Do some read, as it has the lock
5.     exit
6. p = write znode in g just before n
7. while exists(p, true) wait // watching the previous write znode
8. goto 2
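The only difference between the two procedures is which znode a process waits on: a writer waits on whatever znode comes just before it, while a reader only waits on the closest write znode before it, so several readers can hold the lock together. Here is a sketch of both decisions as plain functions. The names are made up, and it assumes the read_ and write_ prefixes from the pseudocode and that both kinds of znode share one sequence counter because they live under the same parent g.

```python
from typing import List, Optional


def seq(name: str) -> int:
    # Sequential znodes end in a zero-padded counter shared by all children of g.
    return int(name.rsplit("_", 1)[1])


def write_waits_on(children: List[str], me: str) -> Optional[str]:
    """A writer waits on the znode just before it, whatever its kind."""
    lower = [c for c in children if seq(c) < seq(me)]
    return max(lower, key=seq) if lower else None


def read_waits_on(children: List[str], me: str) -> Optional[str]:
    """A reader waits only on the closest write znode before it."""
    lower_writes = [c for c in children
                    if c.startswith("write_") and seq(c) < seq(me)]
    return max(lower_writes, key=seq) if lower_writes else None


children = [
    "read_0000000001", "write_0000000002", "read_0000000003",
    "read_0000000004", "write_0000000005",
]
first_reader = read_waits_on(children, "read_0000000001")    # no writer ahead
later_reader = read_waits_on(children, "read_0000000004")    # skips read_...03
last_writer = write_waits_on(children, "write_0000000005")   # waits on a reader
```

A return value of None means the process can go ahead. Anything else is the znode to pass to exists with the watch flag set.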
Some Yahoo! applications using ZK :
Fetching service
Yahoo! being a search engine, they need to crawl the internet. They have many worker processes that crawl the internet, monitored by a master node. If a worker node fails, the master node has to reassign its work to some other node.
Hence, FS mainly uses ZK for configuration metadata, membership and leader election (what to do if the master fails).
YMB - Yahoo Message Broker
Each broker has an ephemeral znode under a nodes znode, which provides the membership info. Similarly, each topic has a znode, under which the brokers create a primary and a backup znode, all of them ephemeral. When a broker fails, its ephemeral znodes disappear, and a master that is watching them can react to the failure.
Adios