Sunday, 13 November 2011

Scala traits

If you are a Java programmer (like me), you have probably used interfaces. In Java, an interface defines an abstract type: it outlines a contract by declaring method signatures and carries no implementation. Scala traits are similar to interfaces, but they may also contain partial implementation.

So what is a trait?

A trait encapsulates field and method definitions. It can be mixed into a class to add new features or to modify the class's existing behavior.

Here is a simple example:

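trait DiscountPerson {

  // Abstract: each class that mixes in the trait decides who qualifies
  def isDiscountApplied(): Boolean

  // Concrete: shared by every class that mixes in the trait
  def discountAmount(): Double = 10.5

}

class Person(age: Int) extends DiscountPerson {

  var myAge: Int = age

  def isDiscountApplied() = myAge > 60

}

// App replaces the Application trait, which was deprecated in Scala 2.9
object DiscountTraitTest extends App {

  val seniorPerson = new Person(65)

  if (seniorPerson.isDiscountApplied())
    println(seniorPerson.discountAmount())
}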

If you look carefully, you will see that of the two methods in the DiscountPerson trait, discountAmount has an implementation and isDiscountApplied does not.

Any other class can extend the DiscountPerson trait and provide its own implementation of isDiscountApplied. It does not need to define discountAmount again to get the same discount amount. This is the benefit of using traits: code reuse.

How does a trait work?

By now you know how to write a trait in Scala. A trait definition looks like a class definition except that it uses the keyword trait. In the above example the trait is named DiscountPerson. It does not declare a superclass, so it has the default superclass, AnyRef.

Once a trait is defined, it can be "mixed in" to a class using either the extends or the with keyword. When we use extends to mix in a trait, we implicitly inherit the trait's superclass. In the above example, Person subclasses AnyRef (the superclass of DiscountPerson) and mixes in DiscountPerson.

It is also possible to mix a trait into a class that explicitly extends a superclass. In this case, extends names the superclass and with mixes in the trait. For example:

trait Programmer {

  // Explicit Unit result type instead of the deprecated procedure syntax
  def profile(): Unit = {
    println("I am programmer.")
  }

}

class Blogger

class John extends Blogger with Programmer {

  override def toString = "blogger"

}

The example below shows how to mix in multiple traits:

trait Programmer {

  // Explicit Unit result type instead of the deprecated procedure syntax
  def profile(): Unit = {
    println("I am programmer.")
  }

}

class Blogger

trait Photographer

class John extends Blogger with Programmer
                           with Photographer {

  override def toString = "blogger"

}

A few more points to note about traits:

1. As already mentioned, a trait is similar to an interface but can have implementation. We can also declare fields in a trait, which means that a trait can maintain state. In fact, a trait definition can look exactly like a class definition, with two exceptions:

(a) A trait cannot have any "class" parameters.

class Person(age: Int)         // will compile

trait AnotherPerson(age: Int)  // will not compile

(b) In traits, "super" calls are dynamically bound, whereas in classes they are statically bound. When we write "super.toString" in a class, we know exactly which method implementation will be invoked. When we write the same call in a trait, the implementation to invoke is undefined at the point where the trait is defined. It is determined anew each time the trait is mixed into a concrete class.

2. Traits make it easy to add new features to a class.

3. Traits play an important role in implementing DCI (Data, Context and Interaction) in Scala. You can read more at http://www.artima.com/articles/dci_vision.html

When to use a trait?

The rules for deciding when to use a trait are simple:

(a) You might have noticed that I used the term "code reusability". Remember it. If the behavior might be reused in multiple, unrelated classes, make it a trait, because a trait can be mixed into different parts of the class hierarchy. Otherwise, make it a concrete class.

(b) The second thing to consider is efficiency. When efficiency is very important, lean towards a class. Traits get compiled to interfaces, so there may be a slight performance overhead. However, make this choice only if there is evidence that the trait in question is a performance bottleneck and that a class solves the problem.

(c) If you are still not sure, start by making it a trait. You can always change it later.

Conclusion

In this article I have explained Scala traits: what they are, how they work, how to use them, and the benefits of using them. With a trait we create a unit of code that can be reused through inheritance. Starting a Scala project with traits keeps more options open. I hope that you have enjoyed reading this article. Any comment is welcome.

Sunday, 31 July 2011

Scala - Lists

In this session I am going to tell you about Scala's List. Lists support fast addition and removal of items at the beginning of the list, but they do not provide fast access to arbitrary indexes because the implementation must iterate through the list linearly.

Scala's List (scala.List) differs from Java's java.util.List in that Scala Lists are always immutable, whereas Java Lists can be mutable.

Since Lists are immutable in Scala, they behave a bit like Java's strings: when a method is called on a list, it creates and returns a new list with the new value. The immutability of lists helps in developing correct, efficient algorithms because you never need to make copies of a list.

The examples below show how to create lists in Scala:

scala> val capital = List("London", "Delhi")
capital: List[java.lang.String] = List(London, Delhi)

scala> println(capital)
List(London, Delhi)

scala> val testMatrix =
            List(
             List(1, 0, 0),
             List(0, 1, 0),
             List(0, 0, 1)
         )
testMatrix: List[List[Int]] = List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1))

scala> println(testMatrix)
List(List(1, 0, 0), List(0, 1, 0), List(0, 0, 1))
  
scala> val empty = List()
empty: List[Nothing] = List()

Lists are homogeneous - the elements of a list all have the same type. The type of a list that has elements of type T is written as List[T].

The list type in Scala is covariant. This means that for each pair of types S and T, if S is a subtype of T, then List[S] is a subtype of List[T].

For Example:

List[String] is a subtype of List[Object].

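List[Nothing] is a subtype of List[T] for every type T, because Nothing is the bottom type in Scala's class hierarchy and is therefore a subtype of every other Scala type. This is why the empty list List() has type List[Nothing].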

Constructing lists

All lists are built from two fundamental building blocks:

(1) Nil  =>  represents the empty list
(2) :: (pronounced "cons")  =>  an infix operator that expresses list extension at the front

For example:

 x :: xs represents a list whose first element is x, followed by (the elements of) the list xs.

A concrete list can be built up like this:

scala> val myfruits = "apple" :: ("orange" :: Nil)
myfruits: List[java.lang.String] = List(apple, orange)
  
scala> println(myfruits)
List(apple, orange)

Because :: associates to the right, the parentheses can be dropped.

For example, the list below is the same as the one above:

scala> val myfruits = "apple" :: "orange" :: Nil
myfruits: List[java.lang.String] = List(apple, orange)

Basic Operations on lists

head  =>     returns the first element of a list

tail  =>    returns a list consisting of all elements except the first

isEmpty =>    returns true if the list is empty

scala> myfruits.head
res3: java.lang.String = apple

scala> myfruits.tail
res4: List[java.lang.String] = List(orange)

scala> myfruits.isEmpty
res5: Boolean = false

scala> myfruits.tail.head
res6: java.lang.String = orange

scala> empty.isEmpty
res7: Boolean = true

The head and tail methods are defined only for non-empty lists. When selected from an empty list, they throw an exception.

List Patterns

List(...) can be used to match all elements of a list.

scala> val fruits = List("orange", "apple", "pear")
fruits: List[java.lang.String] = List(orange, apple, pear)

scala> val List(a, b, c) = fruits
a: java.lang.String = orange
b: java.lang.String = apple
c: java.lang.String = pear

In the above example the pattern List(a, b, c) matches lists of length 3, and binds the three elements to the pattern variables a, b, and c.

If the number of list elements is not known beforehand, it is better to match with :: instead.

For example, the pattern e :: f :: rest matches lists of length 2 or greater.

scala> val e :: f :: rest = fruits
e: java.lang.String = orange
f: java.lang.String = apple
rest: List[java.lang.String] = List(pear)

Concatenating two lists

List has a method named ':::' for list concatenation.

scala> val oneTwo = List(1,2)
oneTwo: List[Int] = List(1, 2)

scala> val threeFour = List(3,4)
threeFour: List[Int] = List(3, 4)

scala> val oneTwoThreeFour = oneTwo ::: threeFour
oneTwoThreeFour: List[Int] = List(1, 2, 3, 4)

Length of a List

scala> List(2, 3, 5).length
res3: Int = 3

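length is a relatively expensive operation: it has to traverse the whole list to find its end, so it takes time proportional to the number of elements. To test whether a list is empty, prefer isEmpty to length == 0.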

Accessing the end of a list: init and last

last returns the last element of a list.

init returns the rest of the list except the last one.

scala> val abcde = List('a', 'b', 'c', 'd', 'e')
abcde: List[Char] = List(a, b, c, d, e)

scala> abcde.last
res0: Char = e

scala> abcde.init
res1: List[Char] = List(a, b, c, d)

Like head and tail, these methods throw an exception when applied to an empty list.

Unlike head and tail, which both run in constant time, init and last need to traverse the whole list to compute their result. They therefore take time proportional to the length of the list.

Reversing a list

The reverse method reverses a list.

scala> abcde.reverse
res2: List[Char] = List(e, d, c, b, a)

reverse creates a new list rather than changing the one it operates on. reverse is its own inverse.

scala> abcde.reverse.reverse
res3: List[Char] = List(a, b, c, d, e)

Prefixes and suffixes: drop, take and splitAt

The expression xs take n returns the first n elements of the list xs.

scala> abcde take 2
res4: List[Char] = List(a, b)

The operation xs drop n returns all elements of the list xs except the first n ones.

scala> abcde drop 2
res5: List[Char] = List(c, d, e)

The operation xs splitAt n splits the list at a given index, returning a pair of two lists.

scala> abcde splitAt 2
res6: (List[Char], List[Char]) = (List(a, b),List(c, d, e))

Element selection: apply and indices

The apply method selects the element at a given index.

scala> abcde apply 2
res7: Char = c

The apply method is rarely used on lists in Scala because xs apply n takes time proportional to the index n. In fact, apply is simply defined as a combination of drop and head:

xs apply n  =>  (xs drop n).head

The indices method returns the valid indices of a list, which range from 0 up to the length of the list minus one.

scala> abcde.indices
res8: scala.collection.immutable.Range = Range(0, 1, 2, 3, 4)

Thank you for reading this article. I hope you have enjoyed it.

Friday, 8 July 2011

Java Surprises

This post is all about the Java surprises that I collected while reading Java Puzzlers: Traps, Pitfalls, and Corner Cases by Joshua Bloch and Neal Gafter. You can use this post as a quick reference. I have listed only 40 items, so please read the book for more details. I believe that you will enjoy it as much as I did.

1. A char is an unsigned 16-bit primitive integer, nothing more.

2. The + operator performs string concatenation if and only if at least one of its operands is of type String; otherwise, it performs addition.
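Here is a small sketch of item 2. The class and method names are made up for this illustration; the point is only that two char operands are added as integers unless a String operand is involved.

```java
class PlusOperatorDemo {

    // Both operands are char, so + performs integer addition:
    // 'H' is 72 and 'a' is 97.
    static int addChars() {
        return 'H' + 'a';
    }

    // The leading empty String makes every + a string concatenation.
    static String concatChars() {
        return "" + 'H' + 'a';
    }

    public static void main(String[] args) {
        System.out.println(addChars());    // 169
        System.out.println(concatChars()); // Ha
    }
}
```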

3. char arrays are not strings. To convert a char array to a string, invoke String.valueOf(char[]).
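A minimal sketch of item 3, with invented class and method names. Concatenating a char array to a String calls the array's toString method inherited from Object, so the result contains the type and a hash code rather than the characters.

```java
class CharArrayDemo {

    // String concatenation uses Object.toString() on the array,
    // which yields something like "[C@1b6d3586", not the characters.
    static String implicitConversion(char[] letters) {
        return "" + letters;
    }

    // String.valueOf(char[]) copies the characters into a new String.
    static String explicitConversion(char[] letters) {
        return String.valueOf(letters);
    }

    public static void main(String[] args) {
        char[] letters = {'a', 'b', 'c'};
        System.out.println(implicitConversion(letters));
        System.out.println(explicitConversion(letters)); // abc
    }
}
```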

4. When applied to object references, the == operator does not test whether two objects are equal; it tests whether the two references are identical. In other words, it tests whether they refer to precisely the same object.

5. Compile-time constants of type String are interned [JLS 15.28]. In other words, any two constant expressions of type String that designate the same character sequence are represented by identical object references.

 class Test {
   public static void main(String[] args) {

     String pig = "length: 10";
     String dog = "length: 10";

     System.out.println("Animals are equal: " + (pig == dog));
   }
 }

Output
Animals are equal: true

6. Java provides no special treatment for Unicode escapes within string literals. The compiler translates Unicode escapes into the characters they represent before it parses the program into tokens, such as string literals [JLS 3.2].

 public class Test {

  public static void main(String[] args) {

   // \u0022 is the Unicode escape for double quote (")

   System.out.println("a\u0022.length() + \u0022b".length());
     
   System.out.println("a".length() + "b".length());
  
  }
 }

Output
2
2

7. Unicode escapes must be well formed, even if they appear in comments.

8. Every time a byte sequence is translated to a String, a charset is used, whether it is specified explicitly or not. To make the program behave predictably, a charset should be specified.

9. A charset is the combination of a coded character set and a character-encoding scheme. In other words, a charset is a bunch of characters, the numerical codes that represent them, and a way to translate back and forth between a sequence of character codes and a sequence of bytes. The translation scheme differs greatly among charsets. Some have a one-to-one mapping between characters and bytes; most do not. In the book's puzzle, which converts the byte values 0 to 255 to a String and prints the resulting character codes, the only default charset that makes the program print the integers from 0 to 255 in order is ISO-8859-1, more commonly known as Latin-1.
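Items 8 and 9 can be tried out with a sketch like the one below. It is my own reconstruction of the idea, not the book's program: the class and method names are invented, and it passes the charset explicitly (using java.nio.charset.StandardCharsets, available since Java 7) instead of relying on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Decodes the byte values 0..255 with the given charset and reports
    // whether every byte came back as the character with the same code.
    static boolean roundTrips(Charset charset) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        String decoded = new String(bytes, charset);
        if (decoded.length() != bytes.length) {
            return false;
        }
        for (int i = 0; i < bytes.length; i++) {
            if (decoded.charAt(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(StandardCharsets.UTF_8));      // false
    }
}
```

With UTF-8 the bytes above 127 do not form valid sequences on their own, so they are decoded to replacement characters and the round trip fails.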

10. String.replaceAll takes a regular expression as its first parameter, not a literal sequence of characters. (Regular expressions were added to the Java platform in release 1.4.) The regular expression "." matches any single character of a string.
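A short sketch of item 10, with invented names. It shows the surprise and two ways around it that I know of: quoting the pattern with Pattern.quote, or using String.replace, which treats its first argument literally.

```java
import java.util.regex.Pattern;

class ReplaceAllDemo {

    // "." is a regular expression that matches any character,
    // so every character is replaced.
    static String regexDot(String s) {
        return s.replaceAll(".", "/");
    }

    // Pattern.quote turns the dot into a literal pattern.
    static String quotedDot(String s) {
        return s.replaceAll(Pattern.quote("."), "/");
    }

    // String.replace(CharSequence, CharSequence) never uses regular expressions.
    static String literalReplace(String s) {
        return s.replace(".", "/");
    }

    public static void main(String[] args) {
        System.out.println(regexDot("a.b"));       // ///
        System.out.println(quotedDot("a.b"));      // a/b
        System.out.println(literalReplace("a.b")); // a/b
    }
}
```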

11. In the program below, the URL that appears in the middle is a statement label [JLS 14.7] followed by an end-of-line comment [JLS 3.7].

  public class Test {

    public static void main(String[] args) {

        System.out.print("I like ");

        http://www.google.com;

        System.out.println("firefox");

    }
  }

Output:

I like firefox

12. The specification for Random.nextInt(int) says: "Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive)" [Java-API].

   Random rnd = new Random();

This means that the only possible values of the expression rnd.nextInt(2) are 0 and 1.

13. StringBuffer(char) constructor does not exist. There is a parameterless constructor, one that takes a String indicating the initial contents of the string buffer and one that takes an int indicating its initial capacity.
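The practical consequence of item 13 is easy to miss, so here is a sketch with invented names. Passing a char to the constructor compiles, because the char is widened to an int and is taken as the initial capacity.

```java
class StringBufferDemo {

    // 'M' is a char, which is widened to the int 77, so this call selects
    // StringBuffer(int capacity) and the buffer starts out empty.
    static String fromChar() {
        StringBuffer word = new StringBuffer('M');
        word.append("ain");
        return word.toString();
    }

    // Passing a String gives the buffer its intended initial contents.
    static String fromString() {
        StringBuffer word = new StringBuffer("M");
        word.append("ain");
        return word.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromChar());   // ain
        System.out.println(fromString()); // Main
    }
}
```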

14. The float and double types have a special NaN value to represent a quantity that is not a number.

15. NaN is not equal to any floating-point value, including itself.
    float i = Float.NaN;

    /* This loop never terminates, because i != i is always true for NaN */

    while (i != i) { }
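The same facts in a form that terminates. This is only a sketch with invented names; Double.isNaN is the standard way to test for NaN, since == cannot be used.

```java
class NaNDemo {

    // False only for NaN: every other double is equal to itself.
    static boolean equalsItself(double d) {
        return d == d;
    }

    // Arithmetic with a NaN operand produces NaN again.
    static double addOne(double d) {
        return d + 1.0;
    }

    public static void main(String[] args) {
        System.out.println(equalsItself(Double.NaN));         // false
        System.out.println(equalsItself(1.0));                // true
        System.out.println(Double.isNaN(addOne(Double.NaN))); // true
    }
}
```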

16. Any floating-point operation evaluates to NaN if one or more of its operands are NaN.

17. The + operator is overloaded: For the String type, it performs not addition but string concatenation. If one operand in the concatenation is of some type other than String, that operand is converted to a string prior to concatenation [JLS 15.18.1].

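18. Comparing two boxed values with == compares the object references, not the values:

    System.out.println(new Integer(0) == new Integer(0));

Output: false

The == operator performs an identity comparison on the object references, and the two operands are distinct objects.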

19. It is illegal to apply the unary minus operator to a non-numeric operand.

20. In a try-finally statement, the finally block is always executed when control leaves the try block [JLS 14.20.2]. This is true whether the try block completes normally or abruptly. Abrupt completion of a statement or block occurs when it throws an exception, executes a break or continue to an enclosing statement, or executes a return from the method as in this program. These are called abrupt completions because they prevent the program from executing the next statement in sequence.
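One consequence of item 20 is that a return in a finally block replaces a return that is already pending from the try block. A minimal sketch, with invented names:

```java
class FinallyDemo {

    // The try block wants to return true, but the finally block always
    // runs afterwards and its own return discards the pending result.
    @SuppressWarnings("finally")
    static boolean decision() {
        try {
            return true;
        } finally {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decision()); // false
    }
}
```

The compiler warns about this pattern for good reason: a finally block should complete normally.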

21. It is a compile-time error for a catch clause to catch a checked exception type E if the corresponding try clause can't throw an exception of some subtype of E [JLS 11.2.3].

22. The set of checked exceptions that a method can throw is the intersection of the sets of checked exceptions that it is declared to throw in all applicable types, not the union.

23. Instance initializers run before constructor bodies. Any exceptions thrown by instance initializers propagate to constructors. If initializers throw checked exceptions, constructors must be declared to throw them too.

24. StackOverflowError is a subtype of Error rather than Exception.

25. Java's exception checking is not enforced by the virtual machine. It is a compile-time facility designed to make it easier to write correct programs, but it can be circumvented at run time.

26. VM loads and initializes the class containing its main method. In between loading and initialization, the VM must link the class [JLS 12.3]. The first phase of linking is verification. Verification ensures that a class is well formed and obeys the semantic requirements of the language. Verification is critical to maintaining the guarantees that distinguish a safe language like Java from an unsafe language like C or C++.

27. To write a program that can detect when a class is missing, use reflection to refer to the class rather than the usual language constructs.

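  // The enclosing method must also declare or catch the other checked
  // exceptions thrown by newInstance(), for example with "throws Exception".
  try {
    Object m = Class.forName("Missing").newInstance();
  } catch (ClassNotFoundException ex) {
    System.err.println("Got it!");
  }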

28. Class.newInstance method instantiates a class reflectively. Class.newInstance invokes a class's parameterless constructor. Class.newInstance can throw checked exceptions that it does not declare.

29. Java's overload resolution process operates in two phases. The first phase selects all the methods or constructors that are accessible and applicable. The second phase selects the most specific of the methods or constructors selected in the first phase. One method or constructor is less specific than another if it can accept any parameters passed to the other [JLS 15.12.2.5].

The test for which method or constructor is most specific does not use the actual parameters: the parameters appearing in the invocation. They are used only to determine which overloadings are applicable. Once the compiler determines which overloadings are applicable and accessible, it selects the most specific overloading, using only the formal parameters: the parameters appearing in the declaration.

More generally, to force the compiler to select a specific overloading, cast actual parameters to the declared types of the formal parameters.
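A sketch of item 29 with two invented overloads. The argument null is applicable to both, so the compiler picks the more specific parameter type, double[], unless a cast says otherwise.

```java
class OverloadDemo {

    static String describe(Object o) {
        return "Object";
    }

    static String describe(double[] array) {
        return "double array";
    }

    // Both overloads accept null; double[] is more specific than Object.
    static String mostSpecific() {
        return describe(null);
    }

    // The cast changes the compile-time type of the argument to Object,
    // so only describe(Object) is applicable.
    static String forced() {
        return describe((Object) null);
    }

    public static void main(String[] args) {
        System.out.println(mostSpecific()); // double array
        System.out.println(forced());       // Object
    }
}
```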

30. A single copy of each static field is shared among its declaring class and all subclasses.
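Item 30 in code. The classes below are made up for illustration: both subclasses increment what looks like their own counter, but there is only one count field.

```java
class Counter {
    // One copy of this field exists, shared by Counter, Dog and Cat.
    static int count = 0;

    static void increment() {
        count++;
    }
}

class Dog extends Counter { }

class Cat extends Counter { }

class SharedStaticDemo {

    // Returns the count as seen through Dog and through Cat.
    static int[] run() {
        Counter.count = 0;
        Dog.increment();
        Dog.increment();
        Cat.increment();
        return new int[] { Dog.count, Cat.count };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " dogs, " + counts[1] + " cats"); // 3 dogs, 3 cats
    }
}
```

If each class needs its own counter, each class has to declare its own static field.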

31. When in doubt, favor composition over inheritance.

32. There is no dynamic dispatch on static methods [JLS 15.12.4.4]. When a program calls a static method, the method to be invoked is selected at compile time, based on the compile-time type of the qualifier, which is the name we give to the part of the method invocation expression to the left of the dot.
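A sketch of item 32 with invented classes. Both variables have the compile-time type Dog, so both calls go to Dog.bark(), even though one of the objects is a Basenji.

```java
class Dog {
    static String bark() {
        return "woof ";
    }
}

class Basenji extends Dog {
    // This hides Dog.bark(); it does not override it.
    static String bark() {
        return "";
    }
}

class StaticDispatchDemo {

    static String barkBoth() {
        Dog woofer = new Dog();
        Dog nipper = new Basenji();
        // The qualifier's compile-time type (Dog) selects the method.
        return woofer.bark() + nipper.bark();
    }

    public static void main(String[] args) {
        System.out.println(barkBoth()); // woof woof
    }
}
```

Calling a static method through the class name (Dog.bark()) avoids the confusion.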

33. The instanceof operator is defined to return false when its left operand is null.

  String s = null;
  
  System.out.println(s instanceof String);

Output: false

34. Class initialization executes static initializers in the order they appear in the source.

35. A qualifying expression for a static method invocation is evaluated, but its value is ignored.

  public class Null {
    
    public static void greet() {
      System.out.println("Hello world!");
    }
    
    public static void main(String[] args) {
      ((Null) null).greet();
    }
 
  }

Output: Hello world!

36. Java language syntax does not allow a local variable declaration statement as the statement repeated by a for, while, or do loop [JLS 14.12-14]. A local variable declaration can appear only as a statement directly within a block. (A block is a pair of curly braces and the statements and declarations contained within it.)

  // Does not compile

  for (int i = 0; i < 100; i++)
    Creature creature = new Creature();

There are two ways to fix this problem:

  for (int i = 0; i < 100; i++) {
    Creature creature = new Creature();
  }

or

  for (int i = 0; i < 100; i++)
    new Creature();

37. BigInteger instances are immutable. So are instances of String, BigDecimal, and the wrapper types: Integer, Long, Short, Byte, Character, Boolean, Float, and Double. Their values cannot be changed. Instead of modifying existing instances, operations on these types return new instances.

  BigInteger fiveThousand = new BigInteger("5000");

  BigInteger total = BigInteger.ZERO;

  total.add(fiveThousand);   // the result is discarded

  System.out.println(total);

Output: 0

  BigInteger fiveThousand = new BigInteger("5000");

  BigInteger total = BigInteger.ZERO;

  total = total.add(fiveThousand);   // keep the new instance

  System.out.println(total);

Output: 5000

38. The hashCode() method must be overridden whenever the equals() method is overridden; otherwise the hashCode() implementation from Object is inherited. That implementation returns an identity-based hash code. In other words, distinct objects are likely to have unequal hash values, even if they are equal.
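Item 38 shows up as soon as such objects are put into a hash-based collection. The two classes below are invented for this sketch; they differ only in whether hashCode() is overridden.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Overrides equals() but not hashCode(): equal objects get unrelated hashes.
class BrokenName {
    private final String first, last;

    BrokenName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BrokenName)) return false;
        BrokenName other = (BrokenName) o;
        return first.equals(other.first) && last.equals(other.last);
    }
}

// Overrides both, so equal objects always have equal hash codes.
class FixedName {
    private final String first, last;

    FixedName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FixedName)) return false;
        FixedName other = (FixedName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }
}

class HashCodeDemo {

    // Almost certainly false: the lookup hashes to a different bucket.
    static boolean brokenLookup() {
        Set<BrokenName> names = new HashSet<>();
        names.add(new BrokenName("Mickey", "Mouse"));
        return names.contains(new BrokenName("Mickey", "Mouse"));
    }

    // Always true: equal names share a hash code.
    static boolean fixedLookup() {
        Set<FixedName> names = new HashSet<>();
        names.add(new FixedName("Mickey", "Mouse"));
        return names.contains(new FixedName("Mickey", "Mouse"));
    }
}
```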

39. When a variable and a type have the same name and both are in scope, the variable name takes precedence [JLS 6.5.2]. The variable name is said to obscure the type name [JLS 6.3.2]. Similarly, variable and type names can obscure package names.

40. A package-private method cannot be directly overridden by a method in a different package [JLS 8.4.8.1].

Friday, 10 June 2011

Scala - tutorial for newbies (Part 2)

This is the next part of this tutorial. In this session I will show you how to compile a Scala program and describe a few more characteristics of Scala. I hope that you will enjoy it. So let's begin.

Scala is a pure object-oriented language. Every value in Scala is an object and every operation is a method call. So when you type 2 + 3 in the Scala interpreter, you are actually invoking a method named "+" defined in class Int. With this little bit of Scala knowledge, let us write and compile our first Scala program.

object HelloWorld {
  def main (args: Array[String]) {
     println("Hello World")
  }
}

Type the above program in your favorite text editor and save it as HelloWorld.scala. Now it is time to compile the program. We will use scalac for this purpose. Execute the command below at your command prompt:

E:\>scalac HelloWorld.scala

E:\>

This will generate a few class files in your current directory. One of them will be HelloWorld.class. It contains a class that can be executed directly using the scala command.

E:\>scala HelloWorld
Hello World

E:\>


If you have done a little bit of Java programming before, the above program is not difficult to understand. It consists of a main method that takes the command-line arguments, an array of Strings, as its parameter. The main method does not return any value, which is why no return type is declared. Its body is a single call to the predefined method println with the friendly greeting as its argument.

There are a few things to notice in the above program. One of them is the object declaration. This declaration introduces a singleton object: a class with a single instance. So the above declaration declares both a class called HelloWorld and an instance of that class, also called HelloWorld. The next thing to notice is that the main method is not declared as static. In Scala, static members (methods or fields) do not exist. You declare such members in singleton objects instead.

By now I think you know how to define a function. In Scala, a function definition starts with def, followed by the function's name and a comma-separated list of parameters in parentheses.

Let's try the example below in the Scala interpreter.


scala> def min(x: Int, y: Int): Int = {
      |   if( x < y ) x
      |   else y
      | }
 min: (x: Int,y: Int)Int


In the above example, the function named min takes two parameters, x and y, both of type Int. You might have noticed that after the closing parenthesis of min's parameters list there is another : Int type annotation. This defines the result type of the min function. In Java, the type of the value returned from a method is its return type. In Scala it is called result type.

Did you see the equals sign in the above function? It says that a function defines an expression that results in a value. Since the body of the above function consists of a single expression, you can leave off the curly braces. The result type can also be left off, because Scala can infer it. The function can then be written like this:

scala> def min2(x: Int, y: Int) = if (x < y) x else y
min2: (x: Int,y: Int)Int

There may be situations when you need to define a method that takes no parameters and returns no value. Here is an example:


scala> def sayHelloToScala() = println("Hello Scala")
sayHelloToScala: ()Unit


The Scala interpreter has responded with sayHelloToScala: ()Unit. Here sayHelloToScala is the function's name. The empty parentheses indicate that the function does not take any parameters. Unit is the result type of the function. A result type of Unit indicates that the function does not return any interesting value. It is similar to Java's void type.

I will stop here for now. Thank you very much for reading. Please give me your feedback.

Sunday, 5 June 2011

Scala - tutorial for newbies (Part 1)

For the last couple of days I have been thinking of sharing my Scala notes. I am still in the learning phase, so any feedback from you will definitely help me to improve my understanding of Scala. There are many articles about Scala available on the Internet, and you can read them to learn more. In this article I will show you how to program in Scala. This tutorial will have several parts. This is the first part.

A brief introduction:
  • Scala stands for "scalable language".
  • It is both object-oriented and functional.
  • It is Java compatible: Scala programs compile to JVM bytecode.
  • Scala programs can call Java methods, access Java fields, inherit from Java classes, and implement Java interfaces. This is good news for Java programmers.
  • Scala is concise, with less ceremony.
  • Scala is statically typed.
  • And it is smart Java.

Please download the latest Scala installation from http://www.scala-lang.org/downloads and follow the instructions for your platform. I will use the Scala interpreter to try all the examples, so let's start writing some Scala code. On my machine I have Scala 2.8.1.final. If you have installed Scala correctly and enter "scala" at a command prompt, you should see something like this:

E:\>scala
Welcome to Scala version 2.8.1.final (Java HotSpot(TM) Client VM, Java 1.6.0_23)

Type in expressions to have them evaluated.
Type :help for more information.

scala>

Probably every programmer on this planet started their programming journey by typing "Hello World!". Let me show you how to do it in Scala.

scala> println("Hello World!")
Hello World!

scala>

You have probably noticed that I did not type 'System.out.println("Hello World!")' as in Java. I also did not use any semicolon (";"). By now I believe you understand why I said that Scala is concise and less ceremonial. You will see more of this later.

Now type 2 + 3.

scala> 2 + 3
res1: Int = 5

In the above output you will find that an automatically generated name, "res1", is created and 5 is assigned to it. "res1" means result 1. "Int" indicates the integer type.

You can use the "res1" identifier in further expressions. For example:

scala> res1 * 4
res2: Int = 20

A new identifier "res2" is created and 20 is assigned to it. The value of "res1" is not changed; it is still 5. You can confirm this by typing "println(res1)" at the prompt.

scala> println(res1)
5

Now it is time to experiment with variables. Scala supports two kinds of variables: val and var. Once a val has been assigned, it can never be reassigned, so it is similar to a final variable in Java. A var, on the other hand, can be reassigned. Try the examples below:

scala> val mymsg = "Hello to Scala"
mymsg: java.lang.String = Hello to Scala

A val variable "mymsg" of type java.lang.String is created and "Hello to Scala" is assigned to it. Now if you try to reassign it with a new value you will get an error message.

scala> mymsg = "Good Morning!!"
<console>:6: error: reassignment to val
mymsg = "Good Morning!!"
^

If you try the same thing with a var, you will not see any error message.

scala> var mymsg2 = "Hello to Scala"
mymsg2: java.lang.String = Hello to Scala

scala> mymsg2 = "Good Morning"
mymsg2: java.lang.String = Good Morning

If you look carefully at the above examples, you will find that I have not specified "java.lang.String" anywhere in my variable definitions. This shows Scala's ability to figure out the types that you leave off, which is known as "type inference".

A few more points before I stop for today:

Packages in Scala are similar to packages in Java: they partition the global namespace and provide a mechanism for information hiding.

All of Java's primitive types have corresponding classes in the scala package.

For Example:

scala.Boolean => Java's boolean
scala.Float => Java's float

Once again thank you very much for reading this article. I hope that you enjoy it and like me you will also keep learning Scala. Any feedback is welcome.

Sunday, 22 May 2011

Spring Batch in a Web Container

In this post I will show how to use Spring Batch in a web container (Tomcat). I will upload vacancy related data from a flat file to the database using Spring Batch. Before I show how I have done this, a brief introduction to Spring Batch is necessary.

Spring Batch - An Introduction

Spring Batch is a lightweight batch processing framework. It is designed for the bulk processing needed to perform business operations. It also provides logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. The diagram below shows the processing strategy provided by Spring Batch (source: http://static.springsource.org/spring-batch/reference/html/whatsNew.html).


A batch Job has one or more Steps.

A JobInstance represents a logical run of a Job. JobInstances are distinguished from each other by their JobParameters, the set of parameters used to start a batch job. Each run of a JobInstance is a JobExecution.

A Step contains all of the information necessary to define and control the actual batch processing. In our case the "vacancy_step" is responsible for uploading vacancy data from a flat file to the database.

An ItemReader is responsible for retrieving the input for a Step, one item at a time, whereas an ItemWriter represents the output of a Step, one batch or chunk of items at a time.

A JobLauncher is used to launch a Job with a given set of JobParameters.

A JobRepository is used to store runtime information related to the batch execution.

A Tasklet is an object containing any custom logic to be executed as part of a job.
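To make the "one item at a time in, one chunk at a time out" idea concrete, here is a plain-Java sketch. It does not use Spring Batch at all: SketchReader, SketchWriter and ChunkSketch are hypothetical names that I invented for this illustration, and the real framework adds transactions, restart and skip handling around the same basic loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Spring Batch interfaces.
interface SketchReader<T> {
    T read();                      // returns null when the input is exhausted
}

interface SketchWriter<T> {
    void write(List<T> chunk);     // receives a whole chunk of items at once
}

class ChunkSketch {

    // Reads items one at a time and hands them to the writer in chunks
    // of at most chunkSize items. Returns the number of chunks written.
    static <T> int run(SketchReader<T> reader, SketchWriter<T> writer, int chunkSize) {
        int chunksWritten = 0;
        List<T> chunk = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                chunksWritten++;
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {    // write the final, partially filled chunk
            writer.write(chunk);
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

With five input items and a chunk size of 2, the writer in this sketch is called three times, with 2, 2 and 1 items.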

I have used SpringSource Tool Suite (STS) and Spring Roo to develop a simple web application that initiates the batch processing upon receiving a request from a user. The figure below shows how batch processing is started upon receiving the request (source: http://static.springsource.org/spring-batch/reference/html/).

Spring Roo is very good for developing a prototype application in a short period of time using Spring best practices. You can also use Eclipse to implement this.

If you have STS, open it and create a Spring Roo project:

File -> New -> Spring Roo Project.

Give the project a name and a top-level package name.

Now open the Roo shell in your STS and execute the below commands:

roo > persistence setup --database MYSQL --provider HIBERNATE
roo > entity --class ~.model.Vacancy --testAutomatically
roo > field string --fieldName referenceNo
roo > field string --fieldName title
roo > field string --fieldName salary

Here is my Vacancy Entity Class

@RooJavaBean
@RooToString
@RooEntity
public class Vacancy {

      private String referenceNo;

      private String title;

      private String salary;
}

I have used MySQL as my backend database (you can use any database). I have created a database called "batchsample". So please create a database and enter the details below in the "database.properties" file

database.password=admin
database.url=jdbc\:mysql\://localhost\:3306/batchsample
database.username=root
database.driverClassName=com.mysql.jdbc.Driver

I have also written a simple integration test to find out whether my database configuration is ok or not.

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:/META-INF/spring/applicationContext.xml")
@Transactional
public class VacancyIntegrationTest {

    private SimpleJdbcTemplate jdbcTemplate;

    @Autowired
    public void initializeJdbcTemplate(DataSource ds) {
        jdbcTemplate = new SimpleJdbcTemplate(ds);
    }

    @Test
    public void testBatchDbConfig() {
        Assert.assertEquals(0, jdbcTemplate.queryForInt("select count(0) from vacancy"));
    }
}

Run this test. If the test passes then execute the roo command below to create the web infrastructure for this application.

roo > controller all --package ~.web

Roo will create the necessary web structure. A controller called "VacancyController" will also be created by Roo to handle the request.

I have slightly modified the VacancyController to meet my needs. Here is the controller:


@Controller
@RequestMapping("/vacancy/*")
public class VacancyController {
   
    private static Log log = LogFactory.getLog(VacancyController.class);
   
    @Autowired
    private ApplicationContext context;
   
    @RequestMapping("list")
    public String list(Model model) {
       
        model.addAttribute("vacancies", Vacancy.findAllVacancys());
       
        return "vacancy/list";
    }
   
    @RequestMapping("handle")
    public String jobLauncherHandle(){
       
        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");

        Job job = (Job) context.getBean("vacancyjob");

        log.info(jobLauncher);
        log.info(job);

        ExitStatus exitStatus = null;

        try {
            JobExecution jobExecution = jobLauncher.run(
                    job,
                    new JobParametersBuilder()
                            .addDate("date", new Date())
                            .toJobParameters());

            exitStatus = jobExecution.getExitStatus();

            log.info(exitStatus.getExitCode());
        }
        catch(JobExecutionAlreadyRunningException jobExecutionAlreadyRunningException) {
            log.info("Job execution is already running.");
        }   
        catch(JobRestartException jobRestartException) {
            log.info("Job restart exception happens.");
        }
        catch(JobInstanceAlreadyCompleteException jobInstanceAlreadyCompleteException) {
            log.info("Job instance is already completed.");
        }
        catch(JobParametersInvalidException jobParametersInvalidException){
            log.info("Job parameters invalid exception");
        }
        catch(BeansException beansException) {
            log.info("Bean is not found.");
        }
       
        return "vacancy/handle";
    }
}


Now it is time to include the batch configuration in the applicationContext.xml.

applicationContext.xml

<context:property-placeholder location="classpath*:META-INF/spring/*.properties"/>

<context:spring-configured/>

<context:component-scan base-package="com.mega">
    <context:exclude-filter expression=".*_Roo_.*" type="regex"/>
    <context:exclude-filter expression="org.springframework.stereotype.Controller" type="annotation"/>
</context:component-scan>

<bean class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="dataSource">
    <property name="driverClassName" value="${database.driverClassName}"/>
    <property name="url" value="${database.url}"/>
    <property name="username" value="${database.username}"/>
    <property name="password" value="${database.password}"/>
    <property name="validationQuery" value="SELECT 1 FROM DUAL"/>
    <property name="testOnBorrow" value="true"/>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactory"/>
</bean>

<tx:annotation-driven mode="aspectj" transaction-manager="transactionManager"/>

<bean class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="entityManagerFactory">
    <property name="dataSource" ref="dataSource"/>
</bean>

<import resource="classpath:/META-INF/spring/batch-context.xml"/>

<bean class="org.springframework.batch.core.launch.support.SimpleJobLauncher" id="jobLauncher">
    <property name="jobRepository" ref="jobRepository"/>
    <property name="taskExecutor">
        <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
    </property>
</bean>

<bean class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean" id="jobRepository" p:dataSource-ref="dataSource" p:tablePrefix="BATCH_" p:transactionManager-ref="transactionManager">
    <property name="isolationLevelForCreate" value="ISOLATION_DEFAULT"/>
</bean>

I have kept the batch job related configuration in a separate file, "batch-context.xml"

batch-context.xml

<description>Batch Job Configuration</description>

<job id="vacancyjob" xmlns="http://www.springframework.org/schema/batch">
<step id="vacancy_step" parent="simpleStep">
<tasklet>
<chunk reader="vacancy_reader" writer="vacancy_writer"/>
</tasklet>
</step>
</job>

<bean id="vacancy_reader" class="org.springframework.batch.item.file.FlatFileItemReader">
<property name="resource" value="classpath:META-INF/data/vacancies.csv"/>
<property name="linesToSkip" value="1" />
<property name="lineMapper">
<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
<property name="lineTokenizer">
<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
<property name="names" value="reference,title,salary"/>
</bean>
</property>
<property name="fieldSetMapper">
<bean class="com.mega.batch.fieldsetmapper.VacancyMapper"/>
</property>
</bean>
</property>
</bean>

<bean id="vacancy_writer" class="com.mega.batch.item.VacancyItemWriter" />

<bean id="simpleStep"
class="org.springframework.batch.core.step.item.SimpleStepFactoryBean"
abstract="true">
<property name="transactionManager" ref="transactionManager" />
<property name="jobRepository" ref="jobRepository" />
<property name="startLimit" value="100" />
<property name="commitInterval" value="1" />
</bean>
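To show roughly what the "vacancy_reader" configuration above amounts to, here is a plain-Java sketch that skips the header line, splits each remaining line on commas and maps the three tokens to named fields. It is not Spring Batch code: FlatFileSketch, Row and parse are hypothetical names, the sample lines are made up, and the naive split does not handle quoted fields the way a real tokenizer would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the reader configuration; NOT Spring Batch code.
class FlatFileSketch {

    // Hypothetical holder for one parsed line (the post itself maps to the Vacancy entity).
    static final class Row {
        final String reference;
        final String title;
        final String salary;

        Row(String reference, String title, String salary) {
            this.reference = reference;
            this.title = title;
            this.salary = salary;
        }
    }

    // Skips the first linesToSkip lines, then maps each line to reference,title,salary.
    static List<Row> parse(List<String> lines, int linesToSkip) {
        List<Row> rows = new ArrayList<>();
        int start = Math.min(linesToSkip, lines.size());
        for (String line : lines.subList(start, lines.size())) {
            String[] tokens = line.split(",", -1);  // naive split: no support for quoted commas
            if (tokens.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields but got: " + line);
            }
            rows.add(new Row(tokens[0].trim(), tokens[1].trim(), tokens[2].trim()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data, standing in for vacancies.csv.
        List<String> lines = Arrays.asList(
                "reference,title,salary",
                "REF-1,Java Developer,40000",
                "REF-2,Tester,30000");

        for (Row row : parse(lines, 1)) {
            System.out.println(row.reference + " | " + row.title + " | " + row.salary);
        }
    }
}
```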

I have written VacancyItemWriter to save the vacancy-related data in the database.

public class VacancyItemWriter implements ItemWriter<Vacancy> {

    private static final Log log = LogFactory.getLog(VacancyItemWriter.class);
   
    /**
     * @see ItemWriter#write(List)
     */
    public void write(List<? extends Vacancy> vacancies) throws Exception {

        for (Vacancy vacancy : vacancies) {
            log.info(vacancy);
            vacancy.persist();
            log.info("Vacancy is saved.");
        }
    }
}

You will find additional helper classes such as VacancyMapper, ProcessorLogAdvice and SimpleMessageApplicationEvent in the attached ZIP file. Once the configuration is complete, please run the application in your tc / Tomcat server.

In this article I have demonstrated Spring Batch in a web container by building a simple Spring application. Additional information is available in the Spring Batch reference documentation. Please download the application by clicking the link below and have fun!


Note: The Spring Batch monitoring tables can be created by running, at your MySQL command prompt, the commands found in the "schema-mysql.sql" file, which is available in spring-batch-core-2.1.1.RELEASE.jar.

References:

1. http://static.springsource.org/spring-batch/reference/html/
2. http://java.dzone.com/news/spring-batch-hello-world-1
3. http://static.springsource.org/spring-roo/reference/html/

 

Sunday, 15 May 2011

Java Garbage Collection Process

Efficient memory management is important to run a software system smoothly. In this article I will write my understanding of Java garbage collection process. Any feedback is welcome.  I hope that you will enjoy reading it.

So, What is garbage collection?

An application can create a large number of short-lived objects during its life span. These objects consume memory and memory is not unlimited. Garbage collection (GC) is a process of reclaiming memory occupied by objects that are no longer needed (garbage) and making it available for new objects. An object is considered garbage when it can no longer be reached from any pointer in the running program.

The heap plays a very important role in this process. Objects are allocated on the heap. In fact, to understand how Java garbage collection works we need to know how the heap is organised in the Java Virtual Machine (JVM).

Heap

The heap is the memory area within the virtual machine where objects are born, live and die. It is divided into two parts:

(a) First part - the young space contains recent objects, also called children.
(b) Second part - the tenured space holds objects with a long life span, also called ancestors.

There is another memory area next to the heap, called Perm, in which the binary code of each loaded class is stored.

[Figure: heap layout. Young: Eden, Survivor, Survivor, Virtual. Tenured: Objects, Virtual. Perm: Virtual.]

The young space is divided into Eden and two survivor spaces. Whenever a new object is allocated on the heap, the JVM puts it in Eden. GC treats the two survivor spaces as temporary storage buckets. The young space/generation is for recent objects and the tenured space/generation is for old objects. Both the young and tenured space contain a virtual space - a zone of memory available to the JVM but free of any data. This means those spaces might grow and shrink with time.
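If you want to see these areas on your own machine, the standard java.lang.management API can list the memory pools of the running JVM. This is only a small sketch (the class name MemoryPoolsSketch is mine). The pool names depend on the JVM version and the collector in use, so do not expect them to match the diagram exactly; on Java 8 and later, for example, Perm has been replaced by Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Lists the memory pools (Eden, Survivor, old generation, ...) of the running JVM.
class MemoryPoolsSketch {

    // Prints one line per valid pool and returns how many pools were printed.
    static int printPools() {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;  // getUsage() returns null when the pool is no longer valid
            }
            System.out.printf("%-32s %-16s used=%d KB committed=%d KB%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed() / 1024, usage.getCommitted() / 1024);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(printPools() + " memory pools found");
    }
}
```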

How the garbage collection (GC) process works:

Memory is managed in “generations” or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. The vast majority of objects are allocated in a pool dedicated to young objects (the young generation/space), and most objects die there. When the young generation fills up it causes a “minor collection”. During a minor collection, the GC runs through every object in both Eden and the occupied survivor space to determine which ones are still alive, in other words which still have external references to themselves. Each one of them will then be copied into empty survivor space.

At the end of a minor collection, both Eden and the explored survivor space are considered empty. As minor collections are performed, living objects move from one survivor space to the other. When an object reaches a given age, dynamically defined at runtime by HotSpot, or when the survivor space is too small to hold the surviving objects, it is copied to the tenured space. Yet most objects are still born and die right in the young space.

Eventually, the tenured space/generation will fill up and must be collected, resulting in a major collection, in which the entire heap is collected. It is done with the help of the Mark-Sweep-Compact algorithm. During this process the GC will run through all the objects in the heap, mark the candidates for memory reclaiming and run through the heap again to compact remaining objects and avoid memory fragmentation. At the end of this cycle, all living objects exist side by side in the tenured space.
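The mark, sweep and compact steps can be sketched with a toy model. This is purely illustrative and an assumption-laden simplification: a real collector works on raw memory inside the JVM, whereas here the "heap" is just an array of hypothetical Obj nodes and the roots are passed in as a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model for illustration only; not how HotSpot is implemented.
class MarkSweepCompactSketch {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();  // outgoing references
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    // Returns the heap after a collection: live objects packed at the front, free slots (null) at the end.
    static Obj[] collect(Obj[] heap, List<Obj> roots) {
        // Mark: everything reachable from the roots is alive.
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (current.marked) {
                continue;  // already visited, this also makes cycles harmless
            }
            current.marked = true;
            pending.addAll(current.refs);
        }

        // Sweep and compact: drop unmarked objects and slide the survivors together.
        Obj[] compacted = new Obj[heap.length];
        int next = 0;
        for (Obj slot : heap) {
            if (slot != null && slot.marked) {
                slot.marked = false;  // reset the mark for the next cycle
                compacted[next++] = slot;
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");  // nothing refers to c, so it is garbage
        Obj d = new Obj("d");
        a.refs.add(b);

        Obj[] after = collect(new Obj[] {a, c, b, d}, Arrays.asList(a, d));

        StringBuilder layout = new StringBuilder();
        for (Obj slot : after) {
            layout.append(slot == null ? "-" : slot.name).append(' ');
        }
        System.out.println(layout.toString().trim());  // prints "a b d -"
    }
}
```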

Performance

Throughput and Pauses are the two primary measures of garbage collection performance.

Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation.

Pauses are the times when an application appears unresponsive because garbage collection is happening.

For example: in an interactive graphics program even short pauses may negatively affect the user experience, whereas pauses during garbage collection may be tolerable in a web server.

Two other issues should be taken into consideration: Footprint and Promptness.
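A rough throughput figure for a running JVM can be derived from the standard management beans. The sketch below (the class name GcThroughputSketch is mine) divides the accumulated collection time reported by each GarbageCollectorMXBean by the JVM uptime. Treat the result as an approximation only: the reported collection time is an approximate elapsed time, and for a concurrent collector it is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Approximates throughput as the share of JVM uptime not spent in garbage collection.
class GcThroughputSketch {

    static double throughputPercent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();  // -1 when the collector does not report it
            if (time > 0) {
                gcMillis += time;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 100.0;  // nothing measured yet
        }
        // Clamp at zero in case the reported GC time exceeds the uptime.
        return 100.0 * Math.max(0, uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.printf("Approximate throughput: %.2f%%%n", throughputPercent());
    }
}
```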

Footprint is the working set of a process, measured in pages and cache lines.

Promptness is the time between when  an object becomes dead and when the memory becomes available.

A very large young generation may maximize throughput at the expense of footprint, promptness and pause times. On the other hand young generation pauses can be minimized by using a small young generation at the expense of throughput.

There is no one right way to size generations. The best choice is determined by the way the application uses memory as well as user requirements.

Available Collectors

The Java HotSpot VM includes three different collectors, each with different performance characteristics:

(1) Serial Collector

  • it uses a single thread to perform all garbage collection work.
  • there is no communication overhead between threads.
  • it is best suited to single-processor machines.
  • it can be useful on multiprocessors for applications with small data sets (up to approximately 100MB).
  • it can be explicitly enabled with the option -XX:+UseSerialGC.

(2) Parallel/Throughput Collector

  • it performs minor collections in parallel, which can significantly reduce garbage collection overhead.
  • it is useful for applications with medium- to large-sized data sets that are run on multiprocessor or multithreaded hardware.
  • it can be explicitly enabled with the option -XX:+UseParallelGC

(3) Concurrent Collector

  • it performs most of its work concurrently (i.e. while the application is still running) to keep garbage collection pauses short.
  • it is designed for applications with medium- to large-sized data sets for which response time is more important than overall throughput.
  • it can be explicitly enabled with the option -XX:+UseConcMarkSweepGC.

Default Settings

By default the following selections were made in the J2SE platform version 1.4.2

    - Serial Garbage Collector
    - Initial heap size of 4 Mbyte
    - Maximum heap size of 64 Mbyte
    - Client runtime compiler

In the J2SE platform version 1.5 a class of machine referred to as a server-class machine has been defined as a machine with

    1.  >= 2 physical processors
    2.  >= 2 Gbytes of physical memory

Default settings for this type of machine:

    - Throughput Garbage Collector
    - Initial heap size of 1/64 of physical memory up to 1Gbyte
    - Maximum heap size of ¼ of physical memory up to 1 Gbyte
    - Server runtime compiler

Some of the HotSpot VM options can be used for tuning:

-Xms<size>

    - specifies the minimal size of the heap.
    - this option is used to avoid frequent resizing of the heap when the application needs a lot of memory.

-Xmx<size>

    - specifies the maximum size of the heap.
    - this option is used mainly by server side applications that sometimes need several gigabytes of memory.

So the heap is allowed to grow and shrink between these two values defined by -Xms and -Xmx.
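To check what the JVM actually ended up with, the Runtime class exposes the current and maximum heap sizes. This is a minimal sketch (HeapSizeSketch is a name I chose); the values are what the JVM reports and will not necessarily equal the option values exactly.

```java
// Prints the heap sizes the running JVM reports, in megabytes.
class HeapSizeSketch {

    private static final long MB = 1024L * 1024L;

    // Returns { current heap size, maximum heap size, free memory in the current heap } in MB.
    static long[] heapSizesMb() {
        Runtime runtime = Runtime.getRuntime();
        return new long[] {
            runtime.totalMemory() / MB,  // heap currently reserved by the JVM
            runtime.maxMemory() / MB,    // upper limit the heap may grow to
            runtime.freeMemory() / MB    // unused part of the current heap
        };
    }

    public static void main(String[] args) {
        long[] sizes = heapSizesMb();
        System.out.println("current=" + sizes[0] + " MB, max=" + sizes[1]
                + " MB, free=" + sizes[2] + " MB");
    }
}
```

After compiling, running it as, say, java -Xms64m -Xmx256m HeapSizeSketch should report a maximum close to 256 MB.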

-XX:NewRatio=<a number>

    - specifies the size ratio between the tenured and young space.

For example: with a 96 MB heap, -XX:NewRatio=2 would yield a 64 MB tenured space and a 32 MB young space.

-XX:SurvivorRatio=<a number>

    - specifies the size ratio between the eden and one survivor space.

For example: with a ratio of 2 and a young space of 64 MB, the eden will need 32 MB of memory whereas each survivor space will use 16MB.
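The arithmetic behind the two ratio examples above can be written down as a few lines of Java. This is a simplified sketch under my own assumptions: the class and method names are hypothetical, sizes are whole megabytes, and a real JVM also applies alignment and minimum sizes that are ignored here.

```java
// Simplified arithmetic for -XX:NewRatio and -XX:SurvivorRatio (ignores JVM alignment rules).
class GenerationSizingSketch {

    // NewRatio = tenured / young, so the young space is heap / (NewRatio + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long tenuredMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    // SurvivorRatio = eden / one survivor space, and young = eden + 2 survivor spaces,
    // so one survivor space is young / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // prints "young=32 MB, tenured=64 MB"
        System.out.println("young=" + youngMb(96, 2) + " MB, tenured=" + tenuredMb(96, 2) + " MB");
        // prints "eden=32 MB, survivor=16 MB each"
        System.out.println("eden=" + edenMb(64, 2) + " MB, survivor=" + survivorMb(64, 2) + " MB each");
    }
}
```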

-XX:+PrintGCDetails

    - causes additional information about the collections to be printed.

-XX:MaxPermSize=<N>

    - using this option the maximum permanent generation size can be increased.

References:

1. Know Your Worst Friend, the Garbage Collector by Romain Guy.

2. Virtual Machine Garbage Collection Tuning

3. Ergonomics in the 5.0 Java Virtual Machine