Designing Data-Intensive Applications 6: Partition
Contents
Partition
Partition 两种主流方式
- partition by key range: efficient at range query but may got skewed and hotspot issue
- partition by hashing key: less skewed and hostspot issue but sacrifices range query performance
Partitioning secondary indexes
- By Document(local index): Cassandra, MongoDB, ElasticSearch,VoltDB,Riak
- By Term(global index): DynamoDB
- pros and cons of the two methods
Rebalaning partitions
- 解释下为啥需要 rebalancing
rebalancing 的目标
- When rebalancing, the database should continue accept read writes requests.
- No more data than necessay should be moved during reblancing.
- After rebalancing, the load (data storage and read, write requests) should be shared fairly amond nodes.
rebalancing 的三种 partition 模式:
- fixed partitioning: Riak, ElastichSearch, CouchBase, Voldermot
- dynamic partition: HBase and MongoDB
- Partitioning proportionally to Nodes: Cassandra
- 三者对比,优缺点,适用场景
Operations, manual or automatic: 了解下 fully automatic 存在的缺陷 - cascading faliure
Request Routing
三种模式
- Allow clients to send requests to all nodes, using round-robin
- Create a routing tier
- Require client to be aware of partitioning and how the partitioning is conducted
三种方式都有一个难点:how does the component making the routing decision (which may be one of the nodes, or the routing tier, or the client) learn about changes in the assignment of partitions to nodes
- MongoDB, HBase 等用方式 2,依赖 zookeeper 做管理
- Cassandra 用了自己的 goosip protocol
MPP (Massively parallel processing) 了解下
Author lekai
LastMod 2019-01-21