DynamoDB: Guidelines for faster reads and writes

This is post #4 of the series aimed at exploring DynamoDB in detail.

If you haven’t read the earlier posts, you can find them here.

- DynamoDB: Basics
- DynamoDB: Why migrate to DynamoDB from Cassandra?
- DynamoDB: Data Modeling

This post presents some efficient design patterns, along with the key considerations involved in designing read/write operations on DynamoDB tables.

If you are not interested in reading through the entire blog and want to jump straight to the summary, see the "To Summarize" section at the end of this post.

Guidelines to efficient and cost-effective read/writes:

While a good data model is key to the performance of DynamoDB, it is not the only key factor.

The design of your read/write operations also plays a major role in ensuring that your services get the best performance out of DynamoDB.

In this section, we’ll take a look at some of the key factors that affect the performance/cost of read/write operations.

Consistency:

Write consistency is not configurable in DynamoDB, but read consistency is.

You have two consistency options for reads.

- Eventual consistency
- Strong consistency

Key differences between these two consistency levels are listed in the table below:

DynamoDB: Consistency Levels

If your service can function satisfactorily without strong consistency, it is better to go with eventually consistent reads, for both cost and performance reasons: one RCU covers two eventually consistent 4 KB reads per second, but only one strongly consistent read.

Please note, if you’re using Global tables, you will have to be content with eventual consistency globally.

So, there is little benefit in being strongly consistent on individual replicas.

Use eventual consistency everywhere.
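For illustration, here is a minimal sketch of choosing the read consistency level per request with boto3 (the Python AWS SDK). The table name and key schema are assumptions, borrowed from the hypothetical "Landmarks" example below:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Landmarks")  # hypothetical table name

# Default: eventually consistent read (cheaper, lower latency)
response = table.get_item(
    Key={"Hotel_ID": 1, "Landmark": "Central Park"}  # hypothetical key schema
)

# Strongly consistent read: opt in per request; consumes twice the RCUs
response = table.get_item(
    Key={"Hotel_ID": 1, "Landmark": "Central Park"},
    ConsistentRead=True,
)
```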

Parallel Requests:

As mentioned in the "Partitioning" section of my earlier post, different partition keys can go into different partitions, and each partition has its own share of the provisioned throughput.

Consider the example of a hypothetical “Landmarks” table shown below.

DynamoDB: Sample Table for Illustration

Below is an example partition view of this table.

DynamoDB: Partition View of the Sample Table

Assume you had provisioned 6 WCUs for the table and, post-partitioning, each partition has 1 WCU provisioned.

Now, if your access pattern is to update the "Description" attribute of every landmark of every hotel ID, you might do this in a couple of ways.

Approach 1: Iterate over each hotel ID and update the "Description" attribute of each landmark

DynamoDB: Maximum Throughput Achievable via Approach 1

A typical write path, in this case, might look like the one shown below.

DynamoDB: Write Path with Approach 1

As can be seen above, the approach of updating all the items of one partition key first and then moving on to the next might not be the most efficient.

Because Hotel_IDs 1 and 2 are in Partition-1, the maximum write rate on that partition is limited to 1 WCU (based on what you provisioned).

So, even though you still have 5 WCUs unused, you cannot get more than 1 WCU of throughput.

This is inefficient.

Approach 2: Update the attributes of different hotel IDs in parallel

DynamoDB: Maximum Throughput Achievable via Approach 2

A typical write path, in this case, might look like the one shown below.

As can be seen from the above figure, with this approach, because you are writing to different partitions concurrently, you can fully utilize the write capacity of the table and achieve a maximum of 6 WCU.

By merely changing your approach to writing you could increase your DynamoDB throughput by several times (6 times in this case), without making any changes to your data model or increasing the provisioned throughput.

The same logic applies for reads as well.

Because DynamoDB is a managed service, you do not have any visibility on which partition key goes into which partition.

So, the best approach to writing parallel requests is to randomize your partition keys as much as possible, so you increase the probability of writing to different partitions.
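As a sketch of Approach 2, the snippet below fans the updates out over a thread pool so that writes to different partition keys run concurrently. The table, key schema, and sample data are assumptions based on the "Landmarks" example above:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Landmarks")  # hypothetical table name

def update_description(hotel_id, landmark, description):
    # Update a single item's Description attribute; each call may
    # land on a different partition.
    table.update_item(
        Key={"Hotel_ID": hotel_id, "Landmark": landmark},
        UpdateExpression="SET Description = :d",
        ExpressionAttributeValues={":d": description},
    )

updates = [  # hypothetical sample data
    (1, "Central Park", "Large urban park"),
    (2, "Times Square", "Busy commercial junction"),
    (3, "Brooklyn Bridge", "Historic suspension bridge"),
]

# Submitting the updates concurrently spreads the writes across
# partitions instead of saturating one partition at a time.
# (For brevity, errors from the submitted futures are not checked here.)
with ThreadPoolExecutor(max_workers=8) as pool:
    for hotel_id, landmark, description in updates:
        pool.submit(update_description, hotel_id, landmark, description)
```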

Note: DynamoDB adaptive capacity can "loan" IO provisioning across partitions, but this can take several minutes to kick in. Additionally, it wouldn't work at all for the type of update in the above example, because you would be writing more than the provisioned throughput, successively, to different partitions. Adaptive capacity cannot really handle this, as it looks for consistent throttling against a single partition.

Note: DynamoDB on-demand capacity mode would allow all the writes in the example to execute without throttling, but it is much more expensive; and at high IO rates you will still encounter the same problem, as per-partition throughput is limited to 1000 WCU or 3000 RCU even in on-demand mode.

Document store or attribute store?

With DynamoDB, there are costs to reading and writing data.

These costs are packaged into RCU (Read Capacity Units) and WCU (Write Capacity Units).

1 RCU = one read of up to 4 KB
1 WCU = one write of up to 1 KB

So, if you have a wide-column table with a number of attributes per item, it pays to retrieve only the attributes that are required.
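For example, a strongly consistent read of a 10 KB item consumes ceil(10 / 4) = 3 RCUs, even if you only need one attribute from it, while writing that item back consumes 10 WCUs.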

The same applies to writes.

With DynamoDB, you have the option to update individual attributes of an item.

So, it is more cost-efficient not to rewrite whole items, but rather to update only the required attributes.

Example Scenario: Assume your service has a JSON document (shown below) that contains customer information, and you want to save this in DynamoDB for future reference.

{"firstName": "John","lastName": "Smith","age": 25,"address":{"streetAddress": "21 2nd Street","city": "New York","state": "NY","postalCode": "10021"},"phoneNumber":[{"type": "home","number": "212 555-1234"},{"type": "fax","number": "646 555-4567"}]}Credit: SitepointYou can store this in DynamoDB in a couple of ways:You can either store this entire document as an attributeDynamoDB: Document as an AttributeConsidering this table structure, if you want to retrieve only the first name of a given customer, you have to retrieve the entire document and parse it, in order to get the first name.

If the size of the document is greater than 4KB, this might consume multiple RCUs, not to mention the additional time it spends over the wire.

Alternatively, you can store each parameter within the JSON document as a separate attribute in DynamoDB:

DynamoDB: Document Parameters Stored as Individual DynamoDB Attributes

In this case, if you want to retrieve only the first name of the customer, you can retrieve the single attribute "First_Name", which will cost you just 1 RCU (even if strongly consistent), as long as the attribute size is < 4 KB.

Note: If your JSON document is < 4 KB in size, there will be no read cost savings, as DynamoDB rounds each read up to a minimum of 4 KB per RCU.
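As a sketch of the attribute-store approach, the snippet below reads just one attribute via a ProjectionExpression and updates a single attribute in place. The "Customers" table and its key are hypothetical:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Customers")  # hypothetical table name

# Fetch only First_Name instead of pulling the whole document
# over the wire and parsing it client-side.
response = table.get_item(
    Key={"Customer_ID": "12345"},  # hypothetical key
    ProjectionExpression="First_Name",
)
first_name = response["Item"]["First_Name"]

# Update one attribute without rewriting the rest of the item.
table.update_item(
    Key={"Customer_ID": "12345"},
    UpdateExpression="SET Age = :a",
    ExpressionAttributeValues={":a": 26},
)
```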

Batching:

DynamoDB gives you the option to run multiple reads/writes as a single batch.

There are a few fundamental concepts to keep in mind, while using DynamoDB batches.

Unlike in some other NoSQL datastores, DynamoDB batches are not atomic (i.e., if one of the reads/writes in a batch fails, it does not fail the entire batch); instead, the client receives info on the failed operations, so it can retry them.

The DynamoDB client (driver/CLI) does not group the batch into a single command and send it over to DynamoDB.

Instead, the Client sends each request separately over to DynamoDB.

So, even if you group 100 reads into a single batch at the client, DynamoDB receives 100 individual read requests.

Considering the above facts, if you're wondering why use batching at all, there are a couple of reasons:

Reads/writes in DynamoDB batches are sent and processed in parallel.

This is certainly faster than individual requests sent sequentially, and it also saves the developer the overhead of managing thread pools and multi-threaded execution.

While reads and writes in batch operations are similar to individual reads and writes, they do not behave in exactly the same way; some features are traded off to improve performance for large-scale operations.

For example, you cannot specify conditions on individual put and delete requests with BatchWriteItem, and BatchWriteItem does not return deleted items in the response.

If your use case involves a need to run multiple read/write operations to DynamoDB, batching might be a more performant option than individual read/write requests.

Note: There is a limit of 16 MB of payload and 25 write requests (or 100 read requests) per batch.
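As a sketch, here is a batch read with boto3 that retries unprocessed keys, which is how the partial failures described above surface to the client. The table name and keys are hypothetical:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

request = {
    "Landmarks": {  # hypothetical table name and keys
        "Keys": [
            {"Hotel_ID": 1, "Landmark": "Central Park"},
            {"Hotel_ID": 2, "Landmark": "Times Square"},
        ]
    }
}

items = []
# Batches are not atomic: anything DynamoDB could not process is
# returned in UnprocessedKeys, and the client is expected to retry.
while request:
    response = dynamodb.batch_get_item(RequestItems=request)
    items.extend(response["Responses"].get("Landmarks", []))
    request = response.get("UnprocessedKeys") or None
```

In production you would add exponential backoff between the retry iterations, as AWS recommends for unprocessed items.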

Avoid full table scans:

DynamoDB offers two commands that can retrieve multiple items per request:

- Scan
- Query

To run a Scan request against a DynamoDB table, you do not need to specify any criteria (not even the Partition Key).

Scan requests navigate the entire table, item by item, until they reach a limit of 1 MB, at which point a response is sent to the client with the scanned items, a LastEvaluatedKey pointer to resume from, and a count of the items scanned.

Query operations are slightly different.

To run a Query request against a table, you need to at least specify the Partition Key.

Query requests attempt to retrieve all the items belonging to a single Partition Key, up to a limit of 1 MB, beyond which you need to use the LastEvaluatedKey to paginate the results.

For the key differences between Query and Scan operations, refer to the below table.

DynamoDB: Query vs Scan Operations

Because you do not need to specify any key criteria to retrieve items, Scan requests can be an easy way to start getting items from the table.

But, this comes at a cost.

As you’re not specifying the Partition Key, Scan requests will have to navigate through all the items in all the partitions.

This can be a very expensive way to find what you’re looking for.

Query requests are a better option for retrieving multiple items when they all belong to the same Partition Key: not only is the query run against a single partition, but you can also specify the sort key to narrow down the results even further.

So, Query requests are expected to be much faster than Scan requests.
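A minimal sketch of a paginated Query, assuming the hypothetical "Landmarks" table with Hotel_ID as the Partition Key and Landmark as the sort key:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Landmarks")  # hypothetical table name

items = []
kwargs = {
    # The Partition Key condition is mandatory; the sort key
    # condition narrows the results within that partition.
    "KeyConditionExpression": (
        Key("Hotel_ID").eq(1) & Key("Landmark").begins_with("Central")
    )
}

# Each response holds at most 1 MB of data; follow
# LastEvaluatedKey to paginate through the remainder.
while True:
    response = table.query(**kwargs)
    items.extend(response["Items"])
    if "LastEvaluatedKey" not in response:
        break
    kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```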

Avoid Scan operations.

Use Query operations instead (along with indexes*, if required), wherever possible.

If for any reason you need to get all the items in the table, you can use Scan requests, but please note that this puts extreme stress on your provisioned throughput.

So, look at mitigating measures like rate limiting, parallel scans, reducing the page size, etc., some of which are sketched below.

* Please refer to my other blog post, DynamoDB: Efficient Indexes, to learn more about indexes.
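If a full Scan is unavoidable, here is a minimal sketch combining two of the mitigations above, a parallel scan and a reduced page size. The table name and segment count are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Landmarks")  # hypothetical table name

TOTAL_SEGMENTS = 4  # hypothetical; tune to your table size and throughput

def scan_segment(segment):
    # Each worker scans its own logical slice of the table.
    # Limit caps the page size, smoothing the throughput consumed.
    items = []
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS, "Limit": 100}
    while True:
        response = table.scan(**kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]

# Scan all segments in parallel and flatten the results
with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    all_items = [
        item
        for segment in pool.map(scan_segment, range(TOTAL_SEGMENTS))
        for item in segment
    ]
```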

To Summarize:

- If you do not need strongly consistent reads, always go with eventually consistent reads. Eventually consistent reads are faster and cost less than strongly consistent reads.
- You can increase your DynamoDB throughput several times over by parallelizing reads/writes across multiple partitions.
- Use DynamoDB as an attribute store rather than as a document store. This will not only reduce read/write costs but also considerably improve the performance of your operations.
- Use batching, wherever you can, to parallelize requests to DynamoDB. Batching offers optimized, parallel request execution without burdening the developer with the overhead of managing thread pools.
- Avoid full table scans. Consider using Query operations along with indexes as an alternative, wherever possible.

I hope this blog gave you a reasonable insight into designing faster read/write operations on DynamoDB tables.

The next and final article — part 5 of this series will focus on the key considerations involved in creating and using DynamoDB indexes.
