
Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

Published: at 12:00 AM


DynamoDB 101

Before getting into the paper itself, here is a brief introduction to DynamoDB.

What is DynamoDB

DynamoDB Core Components

DynamoDB Components

Primary Key

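There are two kinds of primary key: a simple primary key (partition key only) and a composite primary key (partition key and sort key).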

Secondary Indexes

There are two kinds of secondary index: global secondary indexes (GSI) and local secondary indexes (LSI).

Others

Introduction

Amazon DynamoDB is a very important NoSQL database that grew out of Amazon's Dynamo. It is widely used inside Amazon and AWS, and many AWS services, such as AWS Lambda, rely on it. DynamoDB gives up some functionality in exchange for fast and predictable performance at any scale.

History

DynamoDB Timeline

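There is also a 2007 paper, "Dynamo: Amazon's Highly Available Key-value Store", which describes the design and implementation of Dynamo. (I have not read it.) Although the names are similar and both are key-value databases, they are two different systems. Dynamo was a database that was hard to operate.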

SimpleDB

SimpleDB was AWS's first Database-as-a-Service (DBaaS) product, but it had some limitations:

The Birth of DynamoDB

Dynamo + SimpleDB = DynamoDB

DynamoDB was designed to overcome SimpleDB's limitations while keeping Dynamo's strengths.

Architecture

hash(partition key) + sort key => where the item will be stored
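The rule above can be sketched in a few lines of Python. This is a toy model, not DynamoDB's implementation: the hash function (MD5), the fixed `NUM_PARTITIONS`, the modulo mapping, and the `ToyTable` class are all assumptions made for illustration. The real service uses its own internal hash and assigns key ranges to partitions that split and merge over time.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4  # hypothetical fixed count; real partitions split dynamically


def partition_for(partition_key: str) -> int:
    """Hash the partition key and map it to a partition (illustrative hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


class ToyTable:
    """Items with the same partition key live together, ordered by sort key."""

    def __init__(self) -> None:
        # One dict per partition: partition_key -> (sorted sort keys, values)
        self.partitions = [{} for _ in range(NUM_PARTITIONS)]

    def put(self, partition_key: str, sort_key: str, value) -> None:
        part = self.partitions[partition_for(partition_key)]
        keys, values = part.setdefault(partition_key, ([], []))
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            values[i] = value  # overwrite an existing item
        else:
            keys.insert(i, sort_key)  # keep sort keys ordered
            values.insert(i, value)

    def query(self, partition_key: str) -> list:
        """Return all items for one partition key, in sort-key order."""
        part = self.partitions[partition_for(partition_key)]
        keys, values = part.get(partition_key, ([], []))
        return list(zip(keys, values))
```

The partition key alone decides which partition serves the request, while the sort key only orders items inside that partition key. This is why a query for one partition key touches a single partition in this model.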

Storage

The replication group uses Multi-Paxos for leader election and consensus. Any replica can trigger a round of the election. Once elected leader, a replica can maintain leadership as long as it periodically renews its leadership lease.
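The lease part of that description can be illustrated with a small sketch. It does not implement Multi-Paxos: the consensus round is replaced by a plain method call, and the `LeaderLease` class, its method names, and the lease duration are hypothetical. The only point is that a leader stays leader while it keeps renewing, and another replica can take over only after the lease expires.

```python
class LeaderLease:
    """Tracks which replica holds the leadership lease and until when."""

    def __init__(self, lease_duration: float) -> None:
        self.lease_duration = lease_duration
        self.leader = None
        self.expires_at = 0.0

    def try_elect(self, replica: str, now: float) -> bool:
        """Stand-in for an election round: succeeds only if no lease is active."""
        if self.leader is not None and now < self.expires_at:
            return False  # the current leader's lease is still valid
        self.leader = replica
        self.expires_at = now + self.lease_duration
        return True

    def renew(self, replica: str, now: float) -> bool:
        """Only the current leader can renew, and only before expiry."""
        if self.leader != replica or now >= self.expires_at:
            return False
        self.expires_at = now + self.lease_duration
        return True
```

Time is passed in explicitly so the behaviour is easy to follow: with a 10-second lease, replica A elected at `t=0` and renewing at `t=8` stays leader until `t=18`, and replica B can only be elected from that point on.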

Storage Node and Log Node

Storage Node Log Node

Microservices

DynamoDB Architecture

DynamoDB consists of tens of microservices.

Provisioned to on-demand

If you can’t measure it, you can’t manage it - Peter Drucker

Read capacity units (RCUs) and write capacity units (WCUs) are collectively called provisioned throughput, that is, pre-allocated capacity.
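To make the units concrete, here is a rough capacity calculation. It assumes the standard definitions: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one standard write per second of an item up to 1 KB. The function names are made up for this sketch, and transactional requests are ignored.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers a read of up to 4 KB
WRITE_UNIT_BYTES = 1 * 1024  # one WCU covers a write of up to 1 KB


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed to sustain the given read rate for items of this size."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    if not strongly_consistent:
        units_per_read = units_per_read / 2  # eventually consistent costs half
    return math.ceil(units_per_read * reads_per_second)


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed to sustain the given write rate for items of this size."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second
```

For example, reading 6 KB items 10 times per second with strong consistency needs 20 RCUs, because each read rounds up to two 4 KB units.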

The partition abstraction proved to be really valuable and continues to be central to the design of DynamoDB. However, early versions tightly coupled capacity and performance to individual partitions, which caused challenges.

DynamoDB's admission control was originally designed around provisioned throughput, but real workloads are usually non-uniform.

Data distribution matters too: requests are generally not spread evenly across partitions, which leads to hot partitions.

Since throughput was allocated statically and enforced at a partition level, these nonuniform workloads occasionally resulted in an application’s reads and writes being rejected, called throttling, even though the total provisioned throughput of the table was sufficient to meet its needs.
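A small numeric sketch shows how this happens. It assumes the table's throughput is split evenly across partitions and enforced per one-second window, with no bursting or capacity sharing between partitions. The function name and the numbers are illustrative only.

```python
def throttled_per_partition(table_throughput: int,
                            requests_per_partition: list) -> list:
    """Requests rejected in one second under a static, even per-partition split."""
    per_partition_limit = table_throughput // len(requests_per_partition)
    return [max(0, requests - per_partition_limit)
            for requests in requests_per_partition]


# A table provisioned with 100 units and 4 partitions gets 25 units each.
workload = [40, 10, 10, 10]  # skewed traffic: 70 requests in total
rejected = throttled_per_partition(100, workload)
print(sum(workload), rejected)
```

The table as a whole receives 70 requests against 100 provisioned units, yet the hot partition exceeds its 25-unit share and rejects 15 requests while the other three partitions sit mostly idle.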

Other Points

Summary

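At its core, system design is about making tradeoffs: you have to give something up to gain something. DynamoDB Serverless is very pleasant to use. DynamoDB DAX is essentially a cache placed in front of DynamoDB.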