Tuesday 20 January 2015

How fast StringBuilder really is and why

All Java developers should know the class StringBuilder for creating strings. What, you don't? Don't worry. I have found a lot of people developing Java who don't.

StringBuilder is a class from the Java API, introduced in version 1.5. Its name says it all: it is a class for building Strings. It offers a wide range of methods for manipulating the string it holds, which will make your life easier in a string-heavy application such as a J2EE app. Notice that all the methods in StringBuilder are instance methods, so you create a new StringBuilder and then modify that object.
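To make the "create it, then modify it" idea concrete, here is a small sketch. The class name BuilderMutation and the sample text are made up for illustration; the methods called (insert, append, setCharAt, deleteCharAt, reverse) are all part of the standard StringBuilder API.

```java
class BuilderMutation {
    // Every call below changes the same StringBuilder object in place.
    static String demo() {
        StringBuilder sb = new StringBuilder("hello");
        sb.insert(0, "[");                 // "[hello"
        sb.append("]");                    // "[hello]"
        sb.setCharAt(1, 'H');              // "[Hello]"
        sb.deleteCharAt(sb.length() - 1);  // "[Hello"
        sb.reverse();                      // "olleH["
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints olleH[
    }
}
```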

One of the most used methods of StringBuilder is .append(param). Append just adds the parameter at the end of the string. It has 13 overloads (append(int), append(String), append(long), etc.), so take a look at the API. And why is it the most used method? Well, because it's one way to ensure performance while concatenating strings.
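As a quick illustration of those overloads, the sketch below chains a few of them on one builder. The class AppendDemo, the method describe, and its parameters are hypothetical names used only for this example.

```java
class AppendDemo {
    // Builds one string by chaining several append overloads on a single builder.
    static String describe(String name, int count, double ratio, boolean ok) {
        StringBuilder sb = new StringBuilder();
        sb.append(name)          // append(String)
          .append('=')           // append(char)
          .append(count)         // append(int)
          .append(", ratio=")
          .append(ratio)         // append(double)
          .append(", ok=")
          .append(ok);           // append(boolean)
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("items", 3, 0.5, true)); // prints items=3, ratio=0.5, ok=true
    }
}
```

Each append returns the same builder, which is why the calls can be chained.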

How fast is it really?

To answer this question I have written a simple Java program that concatenates 65000 strings in three different ways: Method 1 uses the + operator on the string, Method 2 uses the String.concat() function, and Method 3 uses the StringBuilder.append() function. Here is the code.
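The listing itself did not survive in this copy of the post. As a stand-in, here is a minimal sketch of what such a benchmark could look like. Only the three techniques, the 65000 iterations, and the appended "a" come from the text; the method names (withPlus, withConcat, withBuilder), the messages, and the use of System.currentTimeMillis() are assumptions.

```java
class StringConcat {
    static final int COUNT = 65000;

    // Method 1: the + operator on a String.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "a";
        }
        return s;
    }

    // Method 2: String.concat().
    static String withConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s.concat("a");
        }
        return s;
    }

    // Method 3: a single StringBuilder and append().
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("a");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Appending " + COUNT + " strings with different methods...");

        long start = System.currentTimeMillis();
        String plus = withPlus(COUNT);
        System.out.println("Total time +: " + (System.currentTimeMillis() - start) + " milliseconds");

        start = System.currentTimeMillis();
        String concat = withConcat(COUNT);
        System.out.println("Total time calling concat: " + (System.currentTimeMillis() - start) + " milliseconds");

        start = System.currentTimeMillis();
        String built = withBuilder(COUNT);
        System.out.println("Total time StringBuilder: " + (System.currentTimeMillis() - start) + " milliseconds");

        // Use the results so the work cannot be skipped, and confirm all three agree.
        System.out.println("Same result: " + (plus.equals(concat) && concat.equals(built)));
    }
}
```

A single timed run like this is only a rough measurement; the numbers depend on the JVM, its warm-up, and the machine.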


Compile with javac StringConcat.java and run it with java -cp . StringConcat. The output on my laptop is as follows:

Appending 65000 string with different methods... 

Total time +: 3594 milliseconds 
Total time calling concat: 965 milliseconds 
Total time StringBuilder: 1 milliseconds

WOW! The StringBuilder code did the same job in less than 1% of the time that "+" took. This doesn't mean that StringBuilder is always going to be that much faster, but you get the point.

But why?

Let's dig a little into the bytecode generated by the compiler to understand this difference. Running javap -c StringConcat.class will show us the bytecode. In my case it has 209 lines, so for the sake of space and simplicity I will focus on the three for loops. Notice that different compilers can generate different bytecode.

First loop


We can identify the loop by the instructions if_icmpge and goto. The first one corresponds to the for line and jumps to line 61 when the loop ends. The goto just jumps back to line 26.
If you look closely at the bytecode, there is a StringBuilder there. So, what's the difference with method 3? Well, there is a new on line 33, so this code is creating a new StringBuilder instance on every iteration of the loop. Then it appends the current string and, after that, the string "a". Notice that the default constructor of StringBuilder allocates a capacity of 16 chars, so beyond that length the StringBuilder has to expand its capacity. Expanding the capacity means allocating a new, larger array and copying the contents into it. Also, the toString() method of StringBuilder copies its char array to a new array and returns a String built on it. That's a total of 3 new arrays on each iteration past the 16th.
The compiled code is almost the same as writing this:
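The original snippet is missing here, so what follows is a sketch of the hand-written equivalent, based on the description above: a new StringBuilder per iteration, two appends, and a toString(). It assumes a Java 5 to 8 era javac (newer compilers emit an invokedynamic call for string concatenation instead). The class and method names are made up.

```java
class PlusLoopExpanded {
    // Roughly what `s = s + "a"` inside a loop expands to on older compilers.
    static String build(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            // A brand new builder on every pass, thrown away right after toString().
            s = new StringBuilder().append(s).append("a").toString();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(build(5)); // prints aaaaa
    }
}
```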
That doesn't look very optimal, right?

Second loop


The second loop's code is simpler than the first one. It iterates calling String.concat(). Why is it faster than method 1? Well, you need to take a look at the String.concat code. It copies the content of both strings into a new array and creates a new String from the resulting array. The internal String constructor used there just stores the reference to the array, so no additional array is created. That means around a third of the work of method 1.
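The single-array idea can be sketched on plain char arrays. This is a simplified illustration, not the JDK source: ConcatSketch and its concat method are hypothetical, and the real String.concat works on the String's internal storage.

```java
import java.util.Arrays;

class ConcatSketch {
    // Joins two char arrays while allocating exactly one new array.
    static char[] concat(char[] left, char[] right) {
        if (right.length == 0) {
            return left; // nothing to add, so nothing is allocated
        }
        // Copy `left` into a new array that already has room for `right`.
        char[] buf = Arrays.copyOf(left, left.length + right.length);
        // Copy `right` into the free space at the end.
        System.arraycopy(right, 0, buf, left.length, right.length);
        return buf;
    }

    public static void main(String[] args) {
        char[] joined = concat("abc".toCharArray(), "de".toCharArray());
        System.out.println(new String(joined)); // prints abcde
    }
}
```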

Third loop


In the third loop the code just calls StringBuilder.append(). What makes it so fast is that the StringBuilder grows to capacity * 2 + 2 when it runs out of capacity for the append operation. That means it only has to grow about 12 times to be able to store the 65000 chars.
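The count of 12 is easy to check by simulating the growth rule. The sketch below assumes the rule exactly as described (start at the default capacity of 16, then capacity * 2 + 2 on each expansion); it does not touch a real StringBuilder, and the names are made up.

```java
class GrowthCount {
    // Counts how many expansions are needed before the capacity can hold targetLength chars.
    static int expansionsNeeded(int targetLength) {
        int capacity = 16; // default StringBuilder capacity
        int expansions = 0;
        while (capacity < targetLength) {
            capacity = capacity * 2 + 2; // 34, 70, 142, ... 36862, 73726
            expansions++;
        }
        return expansions;
    }

    public static void main(String[] args) {
        System.out.println(expansionsNeeded(65000)); // prints 12
    }
}
```

Compare that with method 1, which allocates new arrays on every one of the 65000 iterations.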
