Monday, April 29, 2019

Go Vs Java Benchmark

Disclaimer

I have 9 YEARS of experience in Java and 3 DAYS of experience in Go. This comparison might be highly biased.

What is compared here:


  • Ease of coding
  • Execution Time
  • Memory used

What is being done here?

I recently (3 days ago) started learning Go and wanted to know whether Go is actually as good as it is publicized to be. So, to learn while testing, I translated one Java project into Go and benchmarked both. Below are the results.

The machine used here is an Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz (2 physical cores, 4 logical cores with hyperthreading).

My understanding of Go (based on 3 days):

Go is a better C. Go cherry-picks the good parts of C, Java, and Python while adding new concepts like goroutines.

Go seems very simple if you already know C and Java.


Results
                              Go      Java
Serial Execution (seconds)    86      171
Parallel Execution (seconds)  105     207
Memory (MB)                   3112    4423


Conclusion:

Go is really useful for small projects with relatively little code; I think as the codebase grows and the number of features increases, managing the code might become a little difficult.
Go is easy to code in and considerably more efficient than Java.






Thursday, April 4, 2019

How to do LightGBM predictions in realtime at high scale in Java

We recently ran into a problem where we needed to serve predictions from a LightGBM model in real time in Java. Training in Python and predicting in Java is nothing new, but this case was unique because the popular approach of using PMML for prediction ran into the following issues.

Issues

  • Validation at PMML generation time is very strict, even stricter than LightGBM itself. Some examples:
    • PMML doesn't allow single-valued features while LightGBM does.
    • PMML has odd restrictions on value ranges.
  • At prediction time, the PMML predictor was doing validation that made predictions very costly.
  • The PMML generator is not able to parse feature values containing special characters from the model dump.
  • PMML generation was taking 20 minutes and 60 GB of memory for a 160 MB model file. This is a big issue if we need to update the model every hour.
  • Clients of PMML have to be aware of the data type of every feature.
We were able to find workarounds for some of the above but not all, and we were in a bad position to scale this with PMML. Finally we gave up on PMML and wrote our own parser and predictor in Java: it parses the LightGBM model dump, loads it into our own objects, and does the predictions.

Problems faced:

LightGBM uses its own format and representation of trees in its model dump. The lack of documentation, along with a fairly complex tree representation, makes this hard to understand. Understanding this dump was the main thing keeping us from taking this approach from the beginning.
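For illustration, here is a minimal sketch of what the prediction side can look like once a single tree from the dump has been loaded into flat arrays. The field names, the array layout, and the negative-index convention for leaves are assumptions for this sketch, not necessarily how LightGBM's dump or our actual parser represents them.

// Minimal sketch of a single-tree predictor, assuming the model dump has been
// parsed into flat arrays (one entry per internal node). Field names and the
// leaf-encoding convention are illustrative assumptions, not LightGBM's exact format.
public class TreePredictor {
    private final int[] splitFeature;   // feature index tested at each internal node
    private final double[] threshold;   // split threshold at each internal node
    private final int[] leftChild;      // child index; negative values mark leaves
    private final int[] rightChild;
    private final double[] leafValue;   // output value for each leaf

    public TreePredictor(int[] splitFeature, double[] threshold,
                         int[] leftChild, int[] rightChild, double[] leafValue) {
        this.splitFeature = splitFeature;
        this.threshold = threshold;
        this.leftChild = leftChild;
        this.rightChild = rightChild;
        this.leafValue = leafValue;
    }

    /** Walks the tree from the root until a leaf is reached. */
    public double predict(double[] features) {
        int node = 0;
        while (true) {
            int next = features[splitFeature[node]] <= threshold[node]
                    ? leftChild[node] : rightChild[node];
            if (next < 0) {                  // negative index encodes a leaf
                return leafValue[-next - 1];
            }
            node = next;
        }
    }
}

A full model is an ensemble of such trees; the raw prediction is the sum of the per-tree outputs (with, for example, a sigmoid applied on top for binary classification).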

What we achieved:

  • Memory required for the loaded model is reduced by 50%.
  • Prediction time is reduced by 50%, cutting the server cost of computing predictions by 50%.
  • The model is more debuggable: we can log anything we want, add breakpoints, and debug the model better.
  • We removed the PMML middleman completely, along with the dependency on an external library.
  • Resource requirements are significantly reduced, and the 20-minute PMML generation step is gone from the process.





Monday, October 29, 2018

Twemproxy vs RedisCluster for high performance large datasets

This project demonstrates the difference between Twemproxy and RedisCluster when it comes to high-performance, large in-memory datasets.
RedisCluster solves all the issues that originally come with Twemproxy while providing all (well, most) of Twemproxy's current features.


Twemproxy:

Twemproxy is responsible for sharding based on the given configuration. We can start multiple Twemproxy servers with the same configuration. A call first goes to Twemproxy, and Twemproxy then calls Redis to fire the command; it supports all types of commands.


An architecture based on Twemproxy has many issues:

  • Data has to be on the same host as Twemproxy to avoid a network call, so data cannot be sharded across different servers.
  • Twemproxy takes more CPU than Redis itself and adds an extra hop. This is especially a problem in AWS, where Redis machines have less CPU power than compute machines.
  • Adding another shard and resharding is a very difficult process.
  • Adding another Twemproxy is difficult because we have to add it to the client code manually and push a production build.



RedisCluster:

It makes Twemproxy obsolete by moving the sharding intelligence into the client and the Redis servers. It supports adding new shards by providing auto-sharding, and it automatically syncs the shard map between client and servers when we add another shard.


Advantages of using RedisCluster instead of Twemproxy (a short client example follows this list):

  • Data can be sharded across multiple hosts.
  • The logic of computing the correct shard is on the client itself. This does two things:
    • removes the extra hop;
    • moves the CPU requirement onto a different box.
  • It is easy to add a new shard.
  • The shard information is auto-synced with the client, which means we don't have to do any code upload.
  • Redis command time is cut by 50%.
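
For reference, a minimal sketch of what the client side can look like with Jedis (the node addresses are placeholders, and exact constructor overloads vary between Jedis versions):

import java.util.HashSet;
import java.util.Set;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

// Minimal client-side example, assuming a cluster is reachable on these
// host:port pairs (addresses are placeholders). The client discovers the
// slot-to-node mapping itself, so there is no proxy in the path.
public class RedisClusterClientExample {
    public static void main(String[] args) {
        Set<HostAndPort> seedNodes = new HashSet<>();
        seedNodes.add(new HostAndPort("10.0.0.1", 6379));
        seedNodes.add(new HostAndPort("10.0.0.2", 6379));

        JedisCluster cluster = new JedisCluster(seedNodes);
        cluster.set("user:42", "hello");
        System.out.println(cluster.get("user:42"));   // routed to the right shard by the client
    }
}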


Note: the RedisCluster Java client, Jedis, doesn't provide pipeline support in cluster mode. I have written my own wrapper to support pipelining with Jedis. This pipeline wrapper is not fully consistent during node failure or resharding.
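
The wrapper itself isn't shown here, but a rough sketch of the idea is below: group keys by the node that owns their slot, pipeline against each node separately, then merge the results. slotFor() and nodeForSlot() are hypothetical placeholders standing in for Jedis' CRC16 slot calculation and the cluster's slot-to-node table; they are not real Jedis calls.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

// Rough sketch of a cluster-aware pipeline: bucket keys by owning node, run one
// ordinary Jedis pipeline per node, then stitch the responses back together.
public class ClusterPipeline {

    public Map<String, String> getAll(List<String> keys) {
        // 1. Bucket keys by the node that owns their hash slot.
        Map<Jedis, List<String>> keysByNode = new HashMap<>();
        for (String key : keys) {
            Jedis node = nodeForSlot(slotFor(key));
            keysByNode.computeIfAbsent(node, n -> new ArrayList<>()).add(key);
        }

        // 2. One pipeline per node; collect the deferred responses.
        Map<String, Response<String>> pending = new HashMap<>();
        List<Pipeline> pipelines = new ArrayList<>();
        for (Map.Entry<Jedis, List<String>> entry : keysByNode.entrySet()) {
            Pipeline pipeline = entry.getKey().pipelined();
            for (String key : entry.getValue()) {
                pending.put(key, pipeline.get(key));
            }
            pipelines.add(pipeline);
        }

        // 3. Flush all pipelines, then read the results.
        for (Pipeline pipeline : pipelines) {
            pipeline.sync();
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Response<String>> entry : pending.entrySet()) {
            result.put(entry.getKey(), entry.getValue().get());
        }
        return result;
    }

    // Hypothetical helpers: in practice these would use Jedis' CRC16 slot
    // calculation and the cluster's slot-to-node table.
    private int slotFor(String key) { throw new UnsupportedOperationException(); }
    private Jedis nodeForSlot(int slot) { throw new UnsupportedOperationException(); }
}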



Test Results:


The tests run with 10 threads; the first setup has 4 Redis servers and 4 Twemproxy servers, and the second setup has 4 Redis servers in a Redis cluster.


Setup          Call Type      Time Taken (s)   Redis CPU x 4   Client CPU   Twemproxy CPU x 4   Total CPU at given time   Total CPU cycles
Twemproxy      GET            140              22              90           50                  378                       52920
Twemproxy      GET Pipeline   35               25              70           50                  370                       12950
RedisCluster   GET            67               44              190          0                   366                       24522
RedisCluster   GET Pipeline   29               44              180          0                   356                       10324


To reproduce the results you can follow the commands here: https://github.com/uniqueuser/RedisClusterTest

PS: this is not running in production; all tests were done offline.

How to do a load test?

A load test is done by simulating production calls, both in type and in number. This is done to find how many calls the production setup can serve, what the bottlenecks are, and to set SLA expectations.

Some of the prerequisites of doing a load test:

  1. You should have a significant number of calls from access logs that can simulate production call patterns. If you have a small number of calls and plan to loop over them, the results of the load test might be biased for the following reasons:
    1. Caches in the system will show an inflated hit rate because the same calls come in many times over.
    2. If the number of calls is very small, a single shard (rather than all shards) will take most of the hits, because the same key always goes to the same shard.
  2. Understand your load test client. You should understand at least the following before starting the load test:
    1. How to check/log the response of your service.
    2. How to check the Qps at which the client is hitting the server.
    3. How to add an extra parameter to the call. This is required for analysing and validating the load test.
  3. Your load test client should be in the same geographical area as your production clients.
  4. Have a way to do component testing, i.e. if you use databases/HBase/application servers/etc. as different components, you should load test each component, or at least have a way to load test them independently. It is good practice to test each component independently before doing a full-setup load test. This is similar to unit testing before integration testing.


Set the right expectations.

You need to estimate how much Qps your system can handle. One way to do this is to test each component of the system by sub-scaling it to a single machine (if possible). For example, for application servers, hit one application server while keeping the other components as they are (without reducing their capacity) and check at what Qps its performance drops below an acceptable level.

Now it is easy to calculate how much load each independent component can handle. While calculating, you need to consider two things (a toy calculation follows the list):

  1. The average number of calls to each component per single external call. For example, you might be hitting the DB 2 times per API call after discounting cached returns; you need to factor this in while calculating the overall expected Qps of the system.
  2. Discounting the integrated-system Qps. The Qps computed for each component independently is the maximum limit of that component while the other components are not loaded; in practice, when all systems are loaded, the performance of each component degrades, i.e. the Qps will drop.
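
As a toy calculation combining both points (all numbers are made up for illustration):

// Toy example: component capacity measured in isolation, adjusted for fan-out
// and a degradation discount when the whole system is loaded. All numbers are
// made up for illustration.
public class CapacityEstimate {
    public static void main(String[] args) {
        double dbQpsIsolated = 5000;      // one DB shard tested alone
        double dbCallsPerRequest = 2;     // average DB calls per external call
        double loadedDiscount = 0.8;      // assume ~20% degradation when everything is loaded

        double expectedSystemQps = dbQpsIsolated / dbCallsPerRequest * loadedDiscount;
        System.out.println("expected Qps limited by the DB layer: " + expectedSystemQps);  // 2000.0
    }
}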

How to Load Test:

  1. Start with a very small Qps and log the responses. Validate that the responses are as expected.
  2. Then move to a higher Qps, something like half of the maximum expected Qps of the setup. It is very important that you do not start hitting with a large Qps right away; begin by testing your system with half of the expected traffic.
  3. While computing the Qps of your client, make sure you are not averaging over long time ranges; consider the rolling max and mean per second over a period of 5 minutes to compute the Qps (see the sketch after this list).
  4. If things are working smoothly, verify the execution time/load/etc. by running at the same Qps for at least 10-15 minutes.
  5. Gradually increase the Qps, and every time verify things are working by checking both the responses and the load/execution time over a period of 10-15 minutes.
  6. You can increase in bigger steps while the Qps is low, and in smaller steps when the Qps is high.
  7. Find the point where the service execution time/load/etc. or the responses fall below expectations. Don't forget to take stats of each component to find which component is choking.
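
As a small illustration of point 3, here is a sketch that buckets request timestamps into one-second bins and reports the max and mean Qps. In practice you would restrict it to a rolling 5-minute window and feed it timestamps parsed from your load-test client's logs; the input format here is an assumption.

import java.util.Map;
import java.util.TreeMap;

// Minimal sketch: bucket request timestamps (epoch millis) into one-second
// bins, then report the max and mean requests-per-second over the run.
public class QpsStats {
    public static void main(String[] args) {
        long[] requestTimestamps = { /* epoch millis taken from the load-test client log */ };

        Map<Long, Integer> perSecond = new TreeMap<>();
        for (long ts : requestTimestamps) {
            perSecond.merge(ts / 1000, 1, Integer::sum);   // count requests per second
        }

        int max = 0;
        long total = 0;
        for (int count : perSecond.values()) {
            max = Math.max(max, count);
            total += count;
        }
        double mean = perSecond.isEmpty() ? 0 : (double) total / perSecond.size();
        System.out.println("max Qps = " + max + ", mean Qps = " + mean);
    }
}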


Saturday, July 30, 2011

HTTP Range header and Download managers

Prerequisite

Java RandomAccessFile Class

Overview

This article details the HTTP Range header and its use in download managers.

HTTP Range Header

The HTTP Range header is available in HTTP/1.1 and allows you to specify a range of bytes: the starting byte and the ending byte of the requested stream.
The Range header allows many types of ranges; we use only byte ranges here. The syntax of a byte range is "Range: bytes=0-100", where 0 is the starting byte and 100 is the ending byte. If the ending byte is missing (Range: bytes=0-), the range extends to the last byte of the file (EOF).
For more information, refer to the HTTP/1.1 specification of the Range header.


Range header uses in download managers

  • Continuing a download: if a download breaks in the middle for some external or internal reason, without the Range header you have to start from the beginning; the Range header allows you to specify the starting byte of the download.
  • Multi-threaded download: you can divide the download of a large file into multiple chunks and download them in parallel.
    Below is code which divides the file into chunks and downloads them in parallel; after proper configuration this program gave me about 50% better performance than the Orbit download manager.
package www.directi.com.junk;

import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

/*
* This class downloads a file from the given URL in multiple parts. It uses the HTTP Range header.
* $author = Paras Malik(masterofmasters22@gmail.com)
* $date = june 25, 2011
* */

public class DownloadInChunks implements Runnable {
    private static String fileURL = "sourceFileLink"; //file to be downloaded
    private static final String targetFileName = "targetFileName";
    private static final int MAX_BUFFER_SIZE = 1024 * 64 * 4; //maximum number of bytes read from the connection per iteration
    private static int chunkOfFile = 1024 * 1024;  //size of one chunk of file to be downloaded
    private int downloadStartingByte;
    private int downloadEndingByte;
    static int fs;

    public DownloadInChunks(int downloadStartingByte, int downloadEndingByte) {
        this.downloadStartingByte = downloadStartingByte;
        this.downloadEndingByte = downloadEndingByte;
    }

    public static void main(String s[]) throws IOException {
        System.out.println(new Date());
        int downloadStartingByte = 0;
        int downloadEndingByte = chunkOfFile;
        int fileSize = getFileSize();
        fs = fileSize;
        while (fileSize > downloadStartingByte) {
            if (downloadEndingByte == fileSize)
                new Thread(new DownloadInChunks(downloadStartingByte, -1)).start();
            else
                new Thread(new DownloadInChunks(downloadStartingByte, downloadEndingByte)).start();
            downloadStartingByte = downloadEndingByte;
            if (downloadEndingByte + chunkOfFile <= fileSize)
                downloadEndingByte = downloadEndingByte + chunkOfFile;
            else
                downloadEndingByte = fileSize;
        }
    }

    static int getFileSize() throws IOException {
        URL url = new URL(fileURL);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.connect();
        if (connection.getResponseCode() != 200) {
            System.out.println("Error return code != 200");
        }
        int contentLength = connection.getContentLength();
        connection.disconnect();
        return contentLength;
    }

    public void downloadPartOfFile(int startingByte, String endingByte) throws IOException {
        URL url = new URL(fileURL);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        int downloaded = 0;
        connection.setRequestProperty("Range", "bytes=" + startingByte + "-" + endingByte); //if endingByte is an empty string, the request covers all bytes from the starting byte till EOF
        connection.connect();
        if (connection.getResponseCode() != 206) {   // HTTP returns 206 (Partial Content) when the request is sent with a Range header
            System.out.println("Error return code != 206");
        }
        int contentLength = connection.getContentLength();
        RandomAccessFile file = new RandomAccessFile(targetFileName, "rw");
        file.seek(startingByte);
        InputStream stream = connection.getInputStream();
        while (true) {
            byte buffer[];
            if (contentLength - downloaded > MAX_BUFFER_SIZE) {
                buffer = new byte[MAX_BUFFER_SIZE];
            } else {
                buffer = new byte[contentLength - downloaded];
            }
            int read = stream.read(buffer);
            if (read == -1 || downloaded == contentLength)
                break;
            file.write(buffer, 0, read);
            downloaded += read;
        }
        file.close();
    }

    @Override
    public void run() {
        try {
            if (downloadEndingByte == -1)
                downloadPartOfFile(downloadStartingByte, "");
            else
                downloadPartOfFile(downloadStartingByte, String.valueOf(downloadEndingByte));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


Sunday, June 19, 2011

RandomAccessFile Read/Write Concurrency

Prerequisite

  • Java Basics I/O

Overview

This article details some useful features of the Java RandomAccessFile class.

RandomAccessFile

Instances of this class support both reading and writing to a random access file. A random access file behaves like a large array of bytes stored in the file system. There is a kind of cursor, or index into the implied array, called the file pointer; input operations read bytes starting at the file pointer and advance the file pointer past the bytes read.
If the random access file is created in read/write mode, then output operations are also available; output operations write bytes starting at the file pointer and advance the file pointer past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be set by the seek method.

seek operation

Sets the file-pointer offset, measured from the beginning of this file, at which the next read or write occurs. The offset can be set beyond the end of the file. Setting the offset beyond the end of the file does not change the file length. The file length will change only by writing after the offset has been set beyond the end of the file.
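
A small demo of this behaviour (the file name is just a placeholder):

import java.io.IOException;
import java.io.RandomAccessFile;

// Seeking past EOF does not change the file length until a write happens there.
public class SeekDemo {
    public static void main(String[] args) throws IOException {
        RandomAccessFile file = new RandomAccessFile("demo.bin", "rw");  // created empty if absent
        file.seek(1024);                       // move past EOF; length is still 0
        System.out.println(file.length());     // prints 0
        file.write(42);                        // writing extends the file
        System.out.println(file.length());     // prints 1025
        file.close();
    }
}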

Concurrent Read/Write

This is a useful but very dangerous feature. It goes like this: "if you create different instances of RandomAccessFile for the same file, you can concurrently write to different parts of the file."
You can create multiple threads pointing to different parts of the file using the seek method, and multiple threads can update the file at the same time. Seek allows you to move to any part of the file even if it doesn't exist yet (after EOF), hence you can move to any location in a newly created file and write bytes at that location. You can open multiple instances of the same file, seek to different locations, and write to multiple locations at the same time. Below is a working example that copies one file to another using multiple threads. You can tune this code to perform better than Windows.
Note: writing concurrently is only a performance hack. If not properly implemented, this can lead to race conditions and many other critical bugs. You should avoid doing this unless necessary.
package www.directi.com.junk;

import java.io.IOException;
import java.io.RandomAccessFile;

/*
 * Copies file (filePath) to (targetFileName) in chunks. It divides the file into parts of chunkOfFile size and copies each part concurrently.
 * $author = Paras Malik(masterofmasters22@gmail.com)
 * $Date = 20 June, 2011
*/
public class CopyFile implements Runnable {

    public static String filePath = "source.mkv";  //file to copy
    public static String targetFileName = "target.mkv";
    private static final int MAX_BUFFER_SIZE = 1024 * 1024 * 5;
    private int startingByte;
    private int endingByte;
    private static int chunkOfFile = 1024 * 1024 * 70;  //size of one chunk of file

    public CopyFile(int startingByte, int endingByte) {
        this.startingByte = startingByte;
        this.endingByte = endingByte;
    }

    public static void main(String s[]) throws IOException {
        int copyStartingByte = 0;
        int copyEndingByte = chunkOfFile;
        int fileSize = (int) getFileSize();
        while (fileSize > copyStartingByte) {
            new Thread(new CopyFile(copyStartingByte, copyEndingByte)).start();
            copyStartingByte = copyEndingByte;
            if (copyEndingByte + chunkOfFile <= fileSize)
                copyEndingByte = copyEndingByte + chunkOfFile;
            else
                copyEndingByte = fileSize;
        }
    }

    static long getFileSize() throws IOException {
        RandomAccessFile f = new RandomAccessFile(filePath, "r");
        long contentLength = f.length();
        f.close();
        return contentLength;
    }

    /* copy one part of the file starting from startingByte till endingByte*/
    public void copyPartOfFile(int startingByte, int endingByte) throws IOException {
        int copied = 0;
        int contentLength = endingByte - startingByte;
        RandomAccessFile sourceFile = new RandomAccessFile(filePath, "r");   // source is only read
        RandomAccessFile targetFile = new RandomAccessFile(targetFileName, "rw");
        targetFile.seek(startingByte);
        sourceFile.seek(startingByte);
        while (true) {
            byte buffer[];
            if (contentLength - copied > MAX_BUFFER_SIZE) {
                buffer = new byte[MAX_BUFFER_SIZE];
            } else {
                buffer = new byte[contentLength - copied];
            }
            int read = sourceFile.read(buffer, 0, buffer.length);

            if (read == -1 || copied == contentLength)
                break;
            targetFile.write(buffer, 0, read);
            copied += read;
        }
        sourceFile.close();
        targetFile.close();
    }

    @Override
    public void run() {
        try {
            copyPartOfFile(startingByte, endingByte);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}