Thursday, June 25, 2020

Migration to Aerospike from Redis Clusters & HBase

We recently moved our production database infrastructure from Redis and HBase to Aerospike. In this post I share what we learned from the exercise.

What use case are we solving?

I am responsible for a large-scale, low-latency service. It handles 350,000 requests per second at peak with an average latency of 10 ms. It is one monolithic service that serves bids for RTB auctions at scale. Within that time frame I need to query a large amount of data from different data sources and compute multiple models with hundreds of trees on the fly. The data is heterogeneous in nature and cannot be stored together, and some of it is too large to be kept in memory. It is distributed between many Redis clusters and HBase based on the type of data.


What data is being stored in Redis Cluster

Redis Cluster is used for storing a large number of key-value pairs where the key is between 32 and 50 characters long. Some values are simple strings; others are maps, lists, sets or sorted sets. Value sizes range from a few bytes to large sorted sets of a few gigabytes.

Write patterns in Redis

In a few Redis clusters data is written once every hour; in the others data is inserted, updated and read on the fly with every call to the service.

Read Patterns in Redis 

A few of the operations we perform:
  • Read data against one key 
  • Read data against many keys 
  • Get one value from sorted set
  • Get one/all values from map 
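
The read patterns above map onto standard Redis commands. Here is a minimal sketch, assuming a redis-py style client object (anything that exposes `get`, `mget`, `zscore`, `hget` and `hgetall`). The function names are hypothetical and not taken from our service, and "one value from a sorted set" is interpreted here as the score of one member:

```python
from typing import Any, List, Optional


def read_one(client: Any, key: str) -> Optional[bytes]:
    """Read the value stored against a single key (GET)."""
    return client.get(key)


def read_many(client: Any, keys: List[str]) -> List[Optional[bytes]]:
    """Read several keys in one round trip (MGET)."""
    return client.mget(keys)


def read_sorted_set_score(client: Any, key: str, member: str) -> Optional[float]:
    """Get the score of one member of a sorted set (ZSCORE)."""
    return client.zscore(key, member)


def read_map(client: Any, key: str, field: Optional[str] = None):
    """Get one field (HGET) or every field (HGETALL) of a hash."""
    if field is not None:
        return client.hget(key, field)
    return client.hgetall(key)
```

In cluster mode a multi-key read like `read_many` only works in a single command when all keys hash to the same slot.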

Many operations are performed directly on the Redis server using Lua scripts. These scripts are used for read-update-write patterns.
We make heavy use of the pipeline and TTL functionality of Redis.
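
To make the read-update-write and pipeline patterns concrete, here is a small sketch. It assumes a redis-py style client (`register_script`, `pipeline`, and `set` with an `ex` argument). The Lua script, the counter use case and the function names are illustrative assumptions, not our production scripts:

```python
from typing import Any, Dict

# Lua runs atomically on the Redis server: read the counter, add a delta,
# write it back and refresh the TTL in a single step.
READ_UPDATE_WRITE_LUA = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
local updated = current + tonumber(ARGV[1])
redis.call('SET', KEYS[1], updated, 'EX', ARGV[2])
return updated
"""


def add_with_ttl(client: Any, key: str, delta: int, ttl_seconds: int) -> int:
    """Run the read-update-write script for one key.

    Real code would register the script once and reuse the returned object.
    """
    script = client.register_script(READ_UPDATE_WRITE_LUA)
    return script(keys=[key], args=[delta, ttl_seconds])


def write_batch_with_ttl(client: Any, items: Dict[str, str], ttl_seconds: int) -> list:
    """Queue many SETs with a TTL and send them in one pipelined round trip."""
    pipe = client.pipeline(transaction=False)
    for key, value in items.items():
        pipe.set(key, value, ex=ttl_seconds)
    return pipe.execute()
```

Running the logic in Lua avoids a round trip between the read and the write, and the pipeline amortises network latency over many commands.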
 

What data is being stored in HBase

Simple key-value pairs where the number of keys is in the billions, so storing them in memory would be very expensive. These keys are written once with a TTL and read many times.

Why we migrated from Redis to Aerospike:

We were running tens of Redis clusters, with the number of shards in each cluster ranging from 3 to hundreds.
  • Since individual shards need to be added to or removed from a Redis cluster, this was becoming a management nightmare.
  • Our data was growing and keeping all the data in memory was becoming very expensive. 

Why we migrated from HBase to Aerospike:

  • HBase was slow for our use case. 
  • Managing tail latencies was becoming more and more challenging because of HBase's JVM GC.
  • HBase was costly: the number of requests served per host was very low.

Some issues and restrictions of Aerospike:

  • Aerospike is good for storing small values against keys. On every update Aerospike rewrites the whole value stored against the key. This is something to keep in mind, because if we store a large data structure against one key it will slow the system down a lot.
  • For the hybrid and in-memory configurations Aerospike reserves 64 bytes of memory for each key, so storing 1 billion keys needs ~64 GB of memory for the index alone.
  • There is no control over how Aerospike distributes data across its shards, which can lead to unbalanced sharding. In our testing we found a difference of 10% in the number of slots between the machine with the most slots and the machine with the second most.
  • If we use the in-memory version of Aerospike and a machine goes down, we lose the data, whereas with Redis the data can be recovered (though not fully) if persistence is on.
  • Redis supports key tags, which let us force multiple keys onto one shard by specifying the sharding key as a substring of the original key. Aerospike doesn't support this functionality, which is very useful for specific query patterns.
  • Redis supports pipeline/batch calls for inserts and updates. Aerospike doesn't support this.
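
For readers unfamiliar with key tags, the sketch below reproduces how Redis Cluster picks a slot: if the key contains a non-empty `{...}` section, only that substring is hashed (CRC16, XMODEM variant, modulo 16384). The helper names and example keys are hypothetical, and the sketch assumes Python's `binascii.crc_hqx` with an initial value of 0 as the CRC16 implementation:

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def hash_tag(key: str) -> str:
    """Return the part of the key that Redis Cluster actually hashes."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the hashed part, modulo 16384."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % HASH_SLOTS


# Both keys share the tag "user1000", so they land in the same slot.
same_slot = key_slot("{user1000}.following") == key_slot("{user1000}.followers")
```

Because the slot depends only on the tag, related keys can be read together with one multi-key command or one Lua script on a single shard.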

Some benefits of Aerospike 

  • Aerospike is multithreaded, unlike Redis, so we don't have to manage hundreds of Redis instances; we can use a few big hosts to make a cluster. This reduces management overhead.
  • Aerospike is optimized for reads & writes to SSD / NVMe which makes it blazing fast. 
  • Aerospike is written in C, so there is no JVM or garbage collection.
  • Tail latencies are lower in Aerospike than in HBase.
  • Aerospike supports nested data structures, so you can create keys whose value is a map, where each value of that map is itself a map.

Migration of HBase

Our Data model for large number of keys in Aerospike

We have 8 billion records to store with a TTL. The key is a hex string of 32 characters and the value is a string of 10 to 20 characters. To store these records with a replication factor of 3 we needed 3 × 8 × 10^9 × 64 bytes, or about 1.5 TB (1.4 TiB), of memory for the Aerospike index alone. With 1.4 TB of memory we could just as well store all of this data in memory.
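
The index estimate is plain arithmetic. Here is a quick check, assuming the 64 bytes per index entry mentioned earlier and counting every replica. It prints the result in both decimal terabytes and binary tebibytes, the latter being where the 1.4 figure comes from:

```python
RECORDS = 8 * 10**9          # 8 billion records
REPLICATION_FACTOR = 3       # every record is stored three times
INDEX_BYTES_PER_RECORD = 64  # index entry size per record copy

index_bytes = RECORDS * REPLICATION_FACTOR * INDEX_BYTES_PER_RECORD

print(f"{index_bytes / 10**12:.2f} TB")  # decimal terabytes
print(f"{index_bytes / 2**40:.2f} TiB")  # binary tebibytes
```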
To reduce memory usage we decided to use the nested map functionality of Aerospike: the first 7 characters of the key become the primary key, and the record keeps a map inside it, with the rest of the key as the map key and the value as the map value, plus another key holding a timestamp. With 16^7 (about 268 million) possible primary keys, each one holds roughly 30 of the 8 billion records if keys are spread evenly, so this data format reduces our memory footprint by a factor of 30, but we lose Aerospike's TTL functionality.
We decided to go with this data model for the following reasons:
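
A minimal sketch of this key layout, using a plain Python dict as a stand-in for the database so the example runs without a server. The helper names and the `"value"`/`"ts"` field names are hypothetical, and storing one timestamp per map entry is an assumption about the layout; with a real cluster the same reads and writes would go through the Aerospike client's map operations:

```python
import time
from typing import Dict, Optional, Tuple

PREFIX_LEN = 7  # leading hex characters used as the Aerospike primary key

# Stand-in for the database: primary key -> map of {map key: entry}.
Store = Dict[str, Dict[str, Dict[str, object]]]


def split_key(full_key: str) -> Tuple[str, str]:
    """Split a 32-character hex key into (primary key, map key)."""
    return full_key[:PREFIX_LEN], full_key[PREFIX_LEN:]


def put(store: Store, full_key: str, value: str, now: Optional[float] = None) -> None:
    """Store the value and a timestamp under the nested map key."""
    primary, inner = split_key(full_key)
    ts = time.time() if now is None else now
    store.setdefault(primary, {})[inner] = {"value": value, "ts": ts}


def get(store: Store, full_key: str) -> Optional[object]:
    """Look up the value through the primary key, then the map key."""
    primary, inner = split_key(full_key)
    entry = store.get(primary, {}).get(inner)
    return None if entry is None else entry["value"]


# 8 billion records spread over 16**7 possible prefixes: roughly 30 entries
# per primary key, hence roughly 30 times fewer index entries.
records_per_primary_key = 8 * 10**9 / 16**PREFIX_LEN
```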
  • Our cost dropped a lot.
  • There was no degradation of performance because of this. 
  • Instead of a TTL we want to expire a key based on its last read time, which is not available in Aerospike, so we decided to build a hybrid solution to expire the keys ourselves. If you are interested in our hybrid solution for expiring keys, please check this.
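
To illustrate what expiring by last read time means, here is a toy sketch over one record's map held in a Python dict. It is not our hybrid solution, which is described separately. The function names, the `"last_read"` field and the idle threshold are all assumptions made for the example:

```python
from typing import Dict, List

# One record's map: map key -> {"value": ..., "last_read": epoch seconds}.
RecordMap = Dict[str, Dict[str, object]]


def touch(record: RecordMap, map_key: str, now: float):
    """Read one entry and stamp it with the read time."""
    entry = record.get(map_key)
    if entry is None:
        return None
    entry["last_read"] = now
    return entry["value"]


def expire_unread(record: RecordMap, max_idle_seconds: float, now: float) -> List[str]:
    """Drop entries not read within max_idle_seconds; return the removed keys."""
    stale = [k for k, e in record.items() if now - e["last_read"] > max_idle_seconds]
    for k in stale:
        del record[k]
    return stale
```

Every read refreshes the timestamp, so entries that keep being read never expire, which is the behaviour a write-time TTL cannot provide.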

To be continued