Monday, October 29, 2018

Twemproxy vs RedisCluster for high performance large datasets

This project demonstrates the difference between Twemproxy and RedisCluster when it comes to high-performance, large in-memory datasets.
RedisCluster solves the issues that originally come with Twemproxy while providing all (or most) of Twemproxy's current features.


Twemproxy : 


Twemproxy is responsible for sharding based on a given configuration. We can start multiple Twemproxy servers with the same configuration. A call first goes to Twemproxy, which then calls Redis to execute the command. It supports most Redis commands.


An architecture based on Twemproxy has several issues:


Data has to be on the same host as Twemproxy to avoid a network call, so data cannot be sharded across different servers.
Twemproxy takes more CPU than Redis itself and adds an extra hop. This is especially a problem on AWS, where Redis machines have less CPU power than compute machines.
Adding another shard and resharding is a very difficult process.
Adding another Twemproxy instance is difficult because it has to be added to the client code manually and shipped with a production build.



RedisCluster : 


It makes Twemproxy obsolete by putting the sharding intelligence in the client and the Redis servers. It supports adding new shards through auto-sharding, and it automatically syncs shard information between client and server when another shard is added.


Advantages of using RedisCluster instead of Twemproxy:


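Data can be sharded across multiple hosts.
The logic for computing the correct shard lives on the client itself. This does two things: it removes the extra hop, and it moves the CPU requirement to a different box (the client).
It is easy to add a new shard.
Shard information is synced with the client automatically, so no code upload is needed.
Redis command time is cut by about 50% for plain GET calls (140 sec down to 67 sec in the test results below); the gain for pipelined GET calls is smaller (35 sec down to 29 sec).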
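To make the "shard logic on the client" point concrete, here is a minimal sketch of how a cluster-aware client can map a key to one of the 16384 hash slots that Redis Cluster uses: CRC16 of the key (XMODEM variant), modulo 16384, with the hash-tag rule for keys containing {...}. The class and method names (ClusterSlot, crc16, slotFor) are made up for illustration and are not taken from Jedis or from the test project.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper (hypothetical name): maps a key to a Redis Cluster hash slot. */
final class ClusterSlot {
    static final int SLOT_COUNT = 16384;

    /** CRC16, XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the value within 16 bits
            }
        }
        return crc;
    }

    /** Slot for a key; if the key has a non-empty {tag}, only the tag is hashed. */
    static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }
}
```

With this sketch, ClusterSlot.slotFor("foo") gives slot 12182, and keys such as "{user1000}.following" and "{user1000}.followers" land in the same slot because only "user1000" is hashed. The client then only needs a slot-to-node table to pick the right server without a proxy in between.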


Note: Jedis, the Java client for RedisCluster, doesn't provide pipeline support at the time of writing. I have written my own wrapper to support pipelining with Jedis. This pipeline wrapper is not fully consistent during node failure or resharding.
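For readers wondering what such a wrapper has to do, one common approach (not necessarily the one used in this project) is to split a batch of keys by the node that owns each key's slot and then send one pipeline per node. The sketch below shows only the grouping step. PipelineBatcher, slotOf and nodeOfSlot are hypothetical names, and the two lookup functions are passed in as assumptions rather than taken from Jedis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: split a batch of keys into one batch per cluster node. */
final class PipelineBatcher {
    private final ToIntFunction<String> slotOf;   // key -> hash slot (assumed given)
    private final IntFunction<String> nodeOfSlot; // hash slot -> node id, e.g. "host:port"

    PipelineBatcher(ToIntFunction<String> slotOf, IntFunction<String> nodeOfSlot) {
        this.slotOf = slotOf;
        this.nodeOfSlot = nodeOfSlot;
    }

    /** Groups keys by owning node, keeping the original key order inside each group. */
    Map<String, List<String>> groupByNode(List<String> keys) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String key : keys) {
            String node = nodeOfSlot.apply(slotOf.applyAsInt(key));
            batches.computeIfAbsent(node, n -> new ArrayList<>()).add(key);
        }
        return batches;
    }
}
```

The grouping relies on a snapshot of the slot-to-node table. If a slot moves or a node fails between grouping and sending, some commands in an already-built batch go to the wrong node and have to be retried, which is a plausible reason why a wrapper like this is hard to keep fully consistent during resharding.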



Test Results : 


The tests ran with 10 client threads. The first setup used 4 Redis servers behind 4 Twemproxy servers; the second setup used 4 Redis servers in a Redis cluster.


Setup         Call Type      Time Taken (Sec)  Redis CPU * 4  Client CPU  Twemproxy CPU * 4  Total CPU on given time  Total CPU cycles
Twemproxy     GET            140               22             90          50                 378                      52920
Twemproxy     GET Pipeline   35                25             70          50                 370                      12950
RedisCluster  GET            67                44             190         0                  366                      24522
RedisCluster  GET Pipeline   29                44             180         0                  356                      10324
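The last two columns follow from the others: total CPU is four times the per-server Redis figure, plus the client figure, plus four times the per-server Twemproxy figure, and total CPU cycles is that sum multiplied by the time taken. A small sketch of the arithmetic, with class and method names (CpuTotals, totalCpu, totalCpuCycles) invented for illustration:

```java
/** Illustrative arithmetic for the derived columns of the results table. */
final class CpuTotals {
    /** Combined CPU at a point in time: 4 Redis servers + client + 4 Twemproxy servers. */
    static int totalCpu(int redisCpu, int clientCpu, int twemproxyCpu) {
        return 4 * redisCpu + clientCpu + 4 * twemproxyCpu;
    }

    /** Combined CPU multiplied by the run time in seconds. */
    static int totalCpuCycles(int timeSec, int redisCpu, int clientCpu, int twemproxyCpu) {
        return timeSec * totalCpu(redisCpu, clientCpu, twemproxyCpu);
    }
}
```

For the first row, totalCpu(22, 90, 50) is 4 * 22 + 90 + 4 * 50 = 378, and totalCpuCycles(140, 22, 90, 50) is 378 * 140 = 52920. The instantaneous CPU is similar across all four rows, so the difference in total cycles comes almost entirely from the shorter run time.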


To reproduce the results, follow the commands here: https://github.com/uniqueuser/RedisClusterTest

PS: This is not in production; all tests were done offline only.
