Monday, October 29, 2018

How to do load test ?

Load Test is done by simulating the calls both type and number in production environment. This is done to find how many calls production setup can serve and what are the bottlenecks and to set SLA expectations.

Some of the perquisites of doing load test. 

  1. You should have significant number of calls from access logs which can be simulate production call patterns. If you have small number of calls and plan to loop over them, results of load test might be biased because of following reason. 
    1. Caches in the system will show increased hit rate because same call is coming many times over. 
    2. If number of calls are very small then it will lead to high hitting single or not all shard because same key will be going to same shard. 
  2. Understand your load test client. You should at least understand following things before starting the load test
    1. How to check/log the response of your service. 
    2. How to check the Qps at which client is hitting the server. 
    3. How to add extra parameter in the call. This is required for analysis and validating the load test. 
  3. Your load test client should be in same geographical area as your production clients. 
  4. Way to do component testing i.e. if you are using database/hbase/application servers/etc as different components you should do load test of each component or at least have a way to do load test them independently. This is good practise to test each component independently before doing full setup load test. This is similar to Unit testing before doing integration testing.


Set right expectations. 

You need estimate how much Qps your system can handle. One way to do this to test each component of the system by sub-scaling it to single machine(if possible). For example for application servers hit one application server by keeping other component as it(without reducing capacity) and check at what Qps its performance drops below acceptable level.

Now its easy to calculate how much load each of independent component can handle. While calculating you need to consider 2 things

  1. Average number of calls to each component goes for single external call. i.e. you might be hitting DB 2 times from API after removing cached returns. You need to consider it while calculating the overall expected Qps of system. 
  2. Discounting the integrated system Qps. Qps computed for each component independently is the max limit of that component while other components are not loaded but in practise when all systems are loaded the performance of each component degrades. i.e. Qps will drop. 

How to Load Test:

  1. Start with very small Qps and log the response. Validate the response is as expected.
  2. Start with higher number of Qps something like half of max expected Qps of setup. This is very important that you do no start hitting with large Qps in the start and test you system with 1/2 of the traffic.
  3. While computing Qps of your client make sure you are not averaging over higher time ranges. you should rolling consider Max and Mean per second over a period of 5 min to compute the Qps. 
  4. If things are working smooth verify the execution time/load/etc by running on same Qps for atleast 10-15 min. 
  5. Gradually increase the Qps and everytime verify things are working by both checking response and load/execution time over a period of 10-15 min. 
  6. You can increase more when Qps is less and slower when Qps is high. 
  7. Find the point where service execution time/load/etc or response are below expectation. Don't forget to take stats of each component to find which component is choking. 


No comments:

Post a Comment