Monday, October 29, 2018

Twemproxy vs RedisCluster for high performance large datasets

This project demonstrates the difference between Twemproxy and RedisCluster for high-performance, large in-memory datasets.
RedisCluster solves the issues that come with Twemproxy while still providing most of Twemproxy's current features.


Twemproxy:


Twemproxy is responsible for sharding based on the given configuration. We can start multiple Twemproxy servers with the same configuration. A call first goes to Twemproxy, which then calls Redis to execute the command; it supports all types of commands.
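
As a rough sketch of that flow (the host and port below are placeholders, not values from the original setup), a Jedis client treats the proxy as if it were a single Redis server and lets Twemproxy forward each command to the owning shard:

    import redis.clients.jedis.Jedis;

    public class TwemproxyClientExample {
        public static void main(String[] args) {
            // The client only knows about the proxy; Twemproxy picks the shard.
            // 127.0.0.1:22121 is a placeholder for the listen address in your nutcracker config.
            try (Jedis proxy = new Jedis("127.0.0.1", 22121)) {
                proxy.set("user:42:name", "alice");
                System.out.println(proxy.get("user:42:name"));
            }
        }
    }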


An architecture based on Twemproxy has several issues:


Data has to be on the same host as Twemproxy to avoid a network call, so data cannot be sharded across different servers.
Twemproxy takes more CPU than Redis itself and adds an extra hop. This is especially a problem on AWS, where Redis machines have less CPU power than compute machines.
Adding another shard and resharding is a very difficult process.
Adding another Twemproxy instance is difficult because we have to add it to the client code manually via a production build.



RedisCluster:


It makes Twemproxy obsolete by putting the sharding intelligence in the client and the Redis servers. It supports adding new shards through auto-sharding, and the shard mapping is automatically synced between client and server when another shard is added.
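
As a minimal sketch (node addresses are placeholders), the Jedis cluster client only needs one or two seed nodes; it pulls the slot map from the servers and routes each key to the owning shard itself:

    import java.util.HashSet;
    import java.util.Set;

    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.JedisCluster;

    public class RedisClusterClientExample {
        public static void main(String[] args) {
            // Seed nodes only; the client fetches the full slot map from the cluster
            // and keeps it in sync, so new shards are picked up without a code change.
            Set<HostAndPort> seedNodes = new HashSet<>();
            seedNodes.add(new HostAndPort("127.0.0.1", 7000));
            seedNodes.add(new HostAndPort("127.0.0.1", 7001));

            try (JedisCluster cluster = new JedisCluster(seedNodes)) {
                cluster.set("user:42:name", "alice");
                System.out.println(cluster.get("user:42:name"));
            }
        }
    }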


Advantages of using RedisCluster instead of Twemproxy. 


Data can be sharded across multiple hosts.
The logic for computing the correct shard is in the client itself. This does two things:
  1. Removes the extra hop.
  2. Moves the CPU requirement onto a different box.
It is easy to add a new shard.
Shard information is auto-synced with the client, which means we don't have to do any code upload.
Redis command time is cut by about 50%.


Note: The RedisCluster Java client Jedis doesn't provide pipeline support. I have written my own wrapper to support pipelining with Jedis. This pipeline wrapper is not fully consistent during node failure or resharding.
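
The wrapper itself is not included in this post, but a minimal sketch of the idea looks like the following, assuming Jedis 3.x and a pre-built slot-to-pool routing table (for example built once from CLUSTER SLOTS). It groups keys by owning node, runs one pipeline per node, and stitches the replies back in request order; it does not follow MOVED/ASK redirections, which is why such a wrapper is not fully consistent during node failure or resharding:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.Pipeline;
    import redis.clients.jedis.Response;
    import redis.clients.jedis.util.JedisClusterCRC16;

    public class ClusterPipelineSketch {

        // Hypothetical slot->pool routing table; assumed to cover all 16384 slots.
        private final Map<Integer, JedisPool> slotToPool;

        public ClusterPipelineSketch(Map<Integer, JedisPool> slotToPool) {
            this.slotToPool = slotToPool;
        }

        /** Pipelined multi-GET that preserves the order of the requested keys. */
        public List<String> mget(List<String> keys) {
            // Group keys by the pool (node) that owns their hash slot.
            Map<JedisPool, List<String>> keysByNode = new HashMap<>();
            for (String key : keys) {
                JedisPool pool = slotToPool.get(JedisClusterCRC16.getSlot(key));
                keysByNode.computeIfAbsent(pool, p -> new ArrayList<>()).add(key);
            }

            // One pipeline per node; remember the Response handle for each key.
            Map<String, Response<String>> responses = new HashMap<>();
            for (Map.Entry<JedisPool, List<String>> entry : keysByNode.entrySet()) {
                try (Jedis jedis = entry.getKey().getResource()) {
                    Pipeline pipeline = jedis.pipelined();
                    for (String key : entry.getValue()) {
                        responses.put(key, pipeline.get(key));
                    }
                    pipeline.sync();
                }
            }

            // Re-assemble results in the caller's key order.
            List<String> results = new ArrayList<>(keys.size());
            for (String key : keys) {
                results.add(responses.get(key).get());
            }
            return results;
        }
    }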



Test Results:


Running with 10 client threads: the first setup has 4 Redis servers behind 4 Twemproxy instances; the second setup has 4 Redis servers in a Redis Cluster.


Setup        | Call Type    | Time Taken (sec) | Redis CPU * 4 | Client CPU | Twemproxy CPU * 4 | Total CPU at a given time | Total CPU cycles
Twemproxy    | GET          | 140              | 22            | 90         | 50                | 378                       | 52920
Twemproxy    | GET Pipeline | 35               | 25            | 70         | 50                | 370                       | 12950
RedisCluster | GET          | 67               | 44            | 190        | 0                 | 366                       | 24522
RedisCluster | GET Pipeline | 29               | 44            | 180        | 0                 | 356                       | 10324


To reproduce the results, you can follow the commands here: https://github.com/uniqueuser/RedisClusterTest

PS: This setup is not in production; all tests were done offline only.

How to do a load test?

A load test is done by simulating production calls, in both type and number. This is done to find how many calls the production setup can serve, what the bottlenecks are, and to set SLA expectations.

Some of the prerequisites for doing a load test:

  1. You should have a significant number of calls from access logs, enough to simulate production call patterns. If you have a small number of calls and plan to loop over them, the results of the load test might be biased for the following reasons:
    1. Caches in the system will show an inflated hit rate because the same calls come in many times over.
    2. If the number of calls is very small, traffic will concentrate on a single shard (or skip some shards entirely) because the same key always goes to the same shard.
  2. Understand your load test client. You should understand at least the following before starting the load test:
    1. How to check/log the response of your service.
    2. How to check the Qps at which the client is hitting the server.
    3. How to add an extra parameter to the call. This is required for analysis and for validating the load test (see the sketch after this list).
  3. Your load test client should be in the same geographical area as your production clients.
  4. Have a way to do component testing, i.e. if you are using databases/HBase/application servers/etc. as separate components, you should load test each component, or at least have a way to load test them independently. It is good practice to test each component independently before load testing the full setup, similar to unit testing before integration testing.
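
As an example of prerequisite 2.3, a replayed call can carry an extra, load-test-only marker so that the test traffic can be filtered out of access logs during analysis. The URL, header name, and run id below are all made up for illustration (Java 11 HttpClient assumed):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LoadTestCallExample {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();

            // Replay a call from the access logs, tagged with a made-up header and run id
            // so load-test traffic can be separated from organic traffic later.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://service.internal/api/items?id=42&loadtest_run=2018-10-29-a"))
                    .header("X-Load-Test", "true")
                    .GET()
                    .build();

            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

            // Log status and body so responses can be validated (prerequisite 2.1).
            System.out.println(response.statusCode() + " " + response.body());
        }
    }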


Set the right expectations.

You need to estimate how much Qps your system can handle. One way to do this is to test each component of the system by sub-scaling it to a single machine (if possible). For example, for application servers, hit one application server while keeping the other components as they are (without reducing their capacity) and check at what Qps its performance drops below an acceptable level.

Now it is easy to calculate how much load each independent component can handle. While calculating, you need to consider two things (a worked example follows this list):

  1. The average number of calls to each component per single external call. For example, you might hit the DB 2 times per API call after removing cached returns; you need to account for this while calculating the overall expected Qps of the system.
  2. Discounting the integrated-system Qps. The Qps computed for each component independently is the maximum for that component while the other components are unloaded; in practice, when all systems are loaded, the performance of each component degrades, i.e. its Qps drops.
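
A worked example of this arithmetic, with made-up numbers: if one DB node handles 5,000 Qps in isolation, the tier has 4 such nodes, each external call turns into 2 DB calls after cache hits, and we discount the isolated numbers by 20% for the fully loaded case, then the DB tier caps the system at roughly 8,000 external Qps:

    public class QpsEstimate {
        public static void main(String[] args) {
            // All numbers are made up for illustration.
            double perNodeQps = 5000;       // max Qps of one DB node measured in isolation
            int nodes = 4;                  // nodes in the DB tier
            double callsPerRequest = 2.0;   // DB calls per external call, after cache hits
            double loadedDiscount = 0.8;    // degradation when every component is loaded

            double componentQps = perNodeQps * nodes * loadedDiscount;   // 16000 component Qps
            double externalQps = componentQps / callsPerRequest;         // 8000 external Qps

            System.out.printf("DB tier caps the system at ~%.0f external Qps%n", externalQps);
        }
    }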

How to Load Test:

  1. Start with a very small Qps and log the responses. Validate that the responses are as expected.
  2. Then move to a higher Qps, something like half of the expected maximum Qps of the setup. It is very important that you do not start hitting with a large Qps right away; test your system with 1/2 of the traffic first.
  3. While computing the Qps of your client, make sure you are not averaging over long time ranges. You should use the rolling max and mean per second over a period of 5 minutes to compute the Qps (see the sketch after this list).
  4. If things are working smoothly, verify the execution time/load/etc. by running at the same Qps for at least 10-15 minutes.
  5. Gradually increase the Qps, and every time verify that things are working by checking both the responses and the load/execution time over a period of 10-15 minutes.
  6. You can increase in larger steps while the Qps is low, and in smaller steps once the Qps is high.
  7. Find the point where the service's execution time/load/etc. or responses fall below expectations. Don't forget to take stats for each component to find which one is choking.
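
For step 3, one rough sketch of measuring client Qps in the suggested way is to bin completed requests per second and report the per-second max and mean over the most recent 5 minutes, rather than averaging over the whole run:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    /** Bins completed requests per second and reports max/mean Qps over the last 5 minutes. */
    public class QpsMeter {
        private static final long WINDOW_SECONDS = 300;

        private final Map<Long, AtomicLong> countsBySecond = new ConcurrentHashMap<>();

        /** Call once per completed request. */
        public void record() {
            long second = System.currentTimeMillis() / 1000;
            countsBySecond.computeIfAbsent(second, s -> new AtomicLong()).incrementAndGet();
        }

        /** Max and mean per-second Qps over the most recent 5-minute window. */
        public String report() {
            long now = System.currentTimeMillis() / 1000;
            long max = 0;
            long total = 0;
            for (long s = now - WINDOW_SECONDS + 1; s <= now; s++) {
                AtomicLong c = countsBySecond.get(s);
                long count = (c == null) ? 0 : c.get();
                max = Math.max(max, count);
                total += count;
            }
            // Old buckets are never evicted here; good enough for a short test run.
            return String.format("max=%d qps, mean=%.1f qps over last %d s",
                    max, total / (double) WINDOW_SECONDS, WINDOW_SECONDS);
        }
    }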