Sunday, 22 May 2011

Spring Batch in a Web Container

In this post I show how to run Spring Batch inside a web container (Tomcat). The example uploads vacancy-related data from a flat file into a database using Spring Batch. Before walking through the implementation, a brief introduction to Spring Batch is in order.

Spring Batch - An Introduction

Spring Batch is a lightweight batch processing framework designed for the bulk processing that business operations require. It also provides logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. The diagram below shows the processing strategy provided by Spring Batch (source: http://static.springsource.org/spring-batch/reference/html/whatsNew.html).


A batch Job consists of one or more Steps.

A JobInstance represents one logical run of a Job. JobInstances are distinguished from one another by their JobParameters, the set of parameters used to start a batch job. Each attempt to run a JobInstance is a JobExecution.
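The relationship between parameters, instances and executions can be illustrated with a toy model. This is only a sketch in plain Java, not the Spring Batch API: the class JobInstanceSketch and its methods are hypothetical names made up for this illustration. It assumes that an instance is identified by the job name plus its parameters, and it simply counts executions per instance.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of JobInstance identity; it does not use the Spring Batch API. */
final class JobInstanceSketch {

    // Instance key (job name + parameters) -> number of executions recorded so far.
    private final Map<String, Integer> executionsPerInstance = new HashMap<>();

    /** Records one execution and returns how many executions that instance now has. */
    int run(String jobName, Map<String, String> parameters) {
        // A TreeMap prints its entries in key order, so equal parameter sets yield equal keys.
        String instanceKey = jobName + new TreeMap<String, String>(parameters);
        return executionsPerInstance.merge(instanceKey, 1, Integer::sum);
    }

    /** Number of distinct instances seen so far. */
    int instanceCount() {
        return executionsPerInstance.size();
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, String> first = Collections.singletonMap("date", "2011-05-22");
        Map<String, String> second = Collections.singletonMap("date", "2011-05-23");

        repository.run("vacancyjob", first);   // new instance, first execution
        repository.run("vacancyjob", first);   // same instance, second execution
        repository.run("vacancyjob", second);  // different parameters, new instance

        System.out.println(repository.instanceCount()); // prints 2
    }
}
```

Spring Batch itself is stricter than this model: it refuses to rerun an instance that has already completed, which is why the controller later in this post passes the current date as a job parameter on every request.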

A Step contains all of the information necessary to define and control the actual batch processing. In our case the "vacancy_step" is responsible for uploading vacancy data from a flat file to the database.

ItemReader is responsible for retrieving the input for a Step, one item at a time, whereas ItemWriter represents the output of a Step, one batch or chunk of items at a time.
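The "read one item at a time, write one chunk at a time" pattern can be sketched in plain Java. This is a simplified model and not Spring Batch code: the class ChunkLoopSketch is a hypothetical name, a java.util.Iterator stands in for the reader, and collecting chunks into a list stands in for the writer. Transactions, skips and restarts are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Simplified model of chunk-oriented processing; not the Spring Batch API. */
final class ChunkLoopSketch {

    /** Reads items one at a time and "writes" them in chunks of commitInterval items. */
    static <T> List<List<T>> process(Iterator<T> reader, int commitInterval) {
        List<List<T>> writtenChunks = new ArrayList<>();
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());              // reader: one item per call
            if (chunk.size() == commitInterval) {
                writtenChunks.add(chunk);          // writer: the whole chunk at once
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(chunk);              // final, possibly shorter chunk
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(process(items.iterator(), 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

With a commit interval of 1, as configured for this post's step, every chunk holds a single item, so the writer is called once per record.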

JobLauncher is used to launch a Job with a given set of JobParameters.

JobRepository is used to store runtime information related to the batch execution.

A tasklet is an object containing any custom logic to be executed as a part of a job.

I used SpringSource Tool Suite (STS) and Spring Roo to develop a simple web application that starts the batch processing when it receives a request from a user. The figure below shows how batch processing is started upon receiving the request (source: http://static.springsource.org/spring-batch/reference/html/).




Spring Roo is well suited to developing a prototype application quickly while following Spring best practices. You can also use Eclipse to implement this.

If you have STS installed, open it and create a Spring Roo project:

File -> New -> Spring Roo Project.

Enter a project name and a top-level package name.

Now open the Roo shell in STS and execute the commands below:

roo > persistence setup --database MYSQL --provider HIBERNATE
roo > entity --class ~.model.Vacancy --testAutomatically
roo > field string --fieldName referenceNo
roo > field string --fieldName title
roo > field string --fieldName salary

Here is my Vacancy entity class:

@RooJavaBean
@RooToString
@RooEntity
public class Vacancy {

      private String referenceNo;

      private String title;

      private String salary;
}

I used MySQL as the backend database (you can use any database) and created a database named "batchsample". Create a database and enter the details below in the "database.properties" file:

database.password=admin
database.url=jdbc\:mysql\://localhost\:3306/batchsample
database.username=root
database.driverClassName=com.mysql.jdbc.Driver
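The backslashes in the URL are escapes used by the properties file format. The following sketch, which uses only java.util.Properties from the JDK, shows that the escaped colons are read back as plain colons. The class name DatabasePropertiesSketch is made up for this illustration, and the sketch assumes the file is ultimately loaded through java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Shows how java.util.Properties unescapes "\:" in a value. */
final class DatabasePropertiesSketch {

    /** Parses properties-file text the same way a .properties file is loaded. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        // Two backslashes in Java source are a single backslash in the file text.
        String fileText = "database.url=jdbc\\:mysql\\://localhost\\:3306/batchsample\n"
                + "database.username=root\n";
        Properties properties = parse(fileText);
        System.out.println(properties.getProperty("database.url"));
        // prints jdbc:mysql://localhost:3306/batchsample
    }
}
```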

I also wrote a simple integration test to check whether the database configuration is correct.

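import javax.sql.DataSource;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.simple.SimpleJdbcTemplate;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.transaction.annotation.Transactional;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:/META-INF/spring/applicationContext.xml")
@Transactional
public class VacancyIntegrationTest {

    private SimpleJdbcTemplate jdbcTemplate;

    // Build the template from the DataSource defined in applicationContext.xml.
    @Autowired
    public void initializeJdbcTemplate(DataSource ds) {
        jdbcTemplate = new SimpleJdbcTemplate(ds);
    }

    // Passes only if the vacancy table exists and is still empty.
    @Test
    public void testBatchDbConfig() {
        Assert.assertEquals(0, jdbcTemplate.queryForInt("select count(0) from vacancy"));
    }
}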

Run this test. If it passes, execute the Roo command below to create the web infrastructure for this application.

roo > controller all --package ~.web

Roo will create the necessary web structure, including a controller called "VacancyController" to handle the requests.

I have slightly modified the VacancyController to meet my needs. Here is the controller:


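import java.util.Date;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

// Vacancy is the Roo entity from the project's own ~.model package.
@Controller
@RequestMapping("/vacancy/*")
public class VacancyController {

    private static final Log log = LogFactory.getLog(VacancyController.class);

    @Autowired
    private ApplicationContext context;

    @RequestMapping("list")
    public String list(Model model) {
        model.addAttribute("vacancies", Vacancy.findAllVacancys());
        return "vacancy/list";
    }

    @RequestMapping("handle")
    public String jobLauncherHandle() {
        try {
            // Look the beans up inside the try block so that the BeansException
            // handler below can actually catch a failed lookup.
            JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
            Job job = (Job) context.getBean("vacancyjob");

            log.info(jobLauncher);
            log.info(job);

            // A fresh date parameter makes every request start a new JobInstance.
            JobExecution jobExecution = jobLauncher.run(
                    job,
                    new JobParametersBuilder()
                            .addDate("date", new Date())
                            .toJobParameters());

            // If the launcher runs jobs asynchronously, the job may not have
            // finished yet when this status is read.
            ExitStatus exitStatus = jobExecution.getExitStatus();
            log.info(exitStatus.getExitCode());
        }
        catch (JobExecutionAlreadyRunningException e) {
            log.error("Job execution is already running.", e);
        }
        catch (JobRestartException e) {
            log.error("The job could not be restarted.", e);
        }
        catch (JobInstanceAlreadyCompleteException e) {
            log.error("Job instance is already completed.", e);
        }
        catch (JobParametersInvalidException e) {
            log.error("The job parameters are invalid.", e);
        }
        catch (BeansException e) {
            log.error("A required bean was not found.", e);
        }

        return "vacancy/handle";
    }
}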


Now it is time to include the batch configuration in applicationContext.xml.

applicationContext.xml

<context:property-placeholder location="classpath*:META-INF/spring/*.properties"/>

<context:spring-configured/>

<context:component-scan base-package="com.mega">
    <context:exclude-filter expression=".*_Roo_.*" type="regex"/>
    <context:exclude-filter expression="org.springframework.stereotype.Controller" type="annotation"/>
</context:component-scan>

<bean class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close" id="dataSource">
    <property name="driverClassName" value="${database.driverClassName}"/>
    <property name="url" value="${database.url}"/>
    <property name="username" value="${database.username}"/>
    <property name="password" value="${database.password}"/>
    <property name="validationQuery" value="SELECT 1 FROM DUAL"/>
    <property name="testOnBorrow" value="true"/>
</bean>

<bean class="org.springframework.orm.jpa.JpaTransactionManager" id="transactionManager">
    <property name="entityManagerFactory" ref="entityManagerFactory"/>
</bean>

<tx:annotation-driven mode="aspectj" transaction-manager="transactionManager"/>

<bean class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean" id="entityManagerFactory">
    <property name="dataSource" ref="dataSource"/>
</bean>

<import resource="classpath:/META-INF/spring/batch-context.xml"/>

<bean class="org.springframework.batch.core.launch.support.SimpleJobLauncher" id="jobLauncher">
    <property name="jobRepository" ref="jobRepository"/>
    <property name="taskExecutor">
        <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
    </property>
</bean>

<bean class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean" id="jobRepository" p:dataSource-ref="dataSource" p:tablePrefix="BATCH_" p:transactionManager-ref="transactionManager">
    <property name="isolationLevelForCreate" value="ISOLATION_DEFAULT"/>
</bean>
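The jobLauncher bean above is given a SimpleAsyncTaskExecutor, so a launch hands the job to another thread and returns to the web request straight away. The effect can be sketched with the JDK's own ExecutorService. This is a plain-Java illustration and not Spring Batch code: the class AsyncLaunchSketch, its methods and the status strings are assumptions chosen to mirror the idea that the status is not yet known when the call returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Plain-Java illustration of an asynchronous launch; not the Spring Batch API. */
final class AsyncLaunchSketch {

    /** Starts the work on another thread and returns a status holder immediately. */
    static AtomicReference<String> launch(ExecutorService executor, Runnable work) {
        AtomicReference<String> status = new AtomicReference<>("UNKNOWN");
        executor.submit(() -> {
            try {
                work.run();
                status.set("COMPLETED");
            } catch (RuntimeException e) {
                status.set("FAILED");
            }
        });
        return status;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the status seen right after launch and the status after the work finished. */
    static String[] demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            AtomicReference<String> status = launch(executor, () -> awaitQuietly(release));
            String rightAfterLaunch = status.get();  // the work is still blocked here
            release.countDown();                     // let the work finish
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
            return new String[] { rightAfterLaunch, status.get() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] statuses = demo();
        System.out.println("right after launch: " + statuses[0]); // UNKNOWN
        System.out.println("after completion:   " + statuses[1]); // COMPLETED
    }
}
```

For a web application this is usually the desired behaviour, because the HTTP request is not held open while the file is loaded. The price is that the final result has to be looked up later, for example in the job repository tables.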

I have kept the batch job configuration in a separate file, "batch-context.xml".

batch-context.xml

<description>Batch Job Configuration</description>

<job id="vacancyjob" xmlns="http://www.springframework.org/schema/batch">
    <step id="vacancy_step" parent="simpleStep">
        <tasklet>
            <chunk reader="vacancy_reader" writer="vacancy_writer"/>
        </tasklet>
    </step>
</job>

<bean id="vacancy_reader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:META-INF/data/vacancies.csv"/>
    <property name="linesToSkip" value="1"/>
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="reference,title,salary"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.mega.batch.fieldsetmapper.VacancyMapper"/>
            </property>
        </bean>
    </property>
</bean>

<bean id="vacancy_writer" class="com.mega.batch.item.VacancyItemWriter"/>

<bean id="simpleStep"
      class="org.springframework.batch.core.step.item.SimpleStepFactoryBean"
      abstract="true">
    <property name="transactionManager" ref="transactionManager"/>
    <property name="jobRepository" ref="jobRepository"/>
    <property name="startLimit" value="100"/>
    <property name="commitInterval" value="1"/>
</bean>
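What the reader configuration does to each line of the file can be sketched in plain Java: skip the header line, split every remaining line into the three named fields, and map them onto an object. This is a rough sketch only. VacancyLineSketch and VacancyRow are hypothetical names, the sample rows are invented for the illustration, and a simple split on commas stands in for the tokenizer, so quoted fields that contain commas are not handled here. The real mapping class, VacancyMapper, is not shown in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of "skip header, tokenize, map"; not the Spring Batch API. */
final class VacancyLineSketch {

    /** Simple value holder standing in for the Vacancy entity. */
    static final class VacancyRow {
        final String referenceNo;
        final String title;
        final String salary;

        VacancyRow(String referenceNo, String title, String salary) {
            this.referenceNo = referenceNo;
            this.title = title;
            this.salary = salary;
        }
    }

    /** Maps every line after the header to a VacancyRow. */
    static List<VacancyRow> parse(List<String> lines) {
        List<VacancyRow> rows = new ArrayList<>();
        // Start at index 1: the first line is the header, as with linesToSkip="1".
        for (int i = 1; i < lines.size(); i++) {
            String[] fields = lines.get(i).split(",", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException(
                        "Expected 3 fields in line " + (i + 1) + ": " + lines.get(i));
            }
            rows.add(new VacancyRow(fields[0].trim(), fields[1].trim(), fields[2].trim()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Invented sample content; the real vacancies.csv is not shown in this post.
        List<String> lines = Arrays.asList(
                "reference,title,salary",
                "V001,Java Developer,40000",
                "V002,Test Analyst,35000");
        for (VacancyRow row : parse(lines)) {
            System.out.println(row.referenceNo + " | " + row.title + " | " + row.salary);
        }
    }
}
```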

I wrote VacancyItemWriter to save the vacancy data in the database.

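import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.springframework.batch.item.ItemWriter;

// Vacancy is the Roo entity from the project's own ~.model package.
public class VacancyItemWriter implements ItemWriter<Vacancy> {

    private static final Log log = LogFactory.getLog(VacancyItemWriter.class);

    /**
     * Persists every vacancy in the chunk handed over by the step.
     *
     * @see ItemWriter#write(List)
     */
    public void write(List<? extends Vacancy> vacancies) throws Exception {
        for (Vacancy vacancy : vacancies) {
            log.info(vacancy);
            vacancy.persist();
            log.info("Vacancy is saved.");
        }
    }
}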

You will find the additional helper classes, such as VacancyMapper, ProcessorLogAdvice and SimpleMessageApplicationEvent, in the attached ZIP file. Once the configuration is complete, run the application on your tc Server or Tomcat server.

In this article I have demonstrated Spring Batch in a web container by building a simple Spring application. Additional information is available in the Spring Batch reference documentation. Please download the application by clicking the link below and have fun!


Note: The Spring Batch monitoring tables can be created by running the commands from the "schema-mysql.sql" file, which is included in spring-batch-core-2.1.1.RELEASE.jar, at your MySQL command prompt.

References:

1. http://static.springsource.org/spring-batch/reference/html/
2. http://java.dzone.com/news/spring-batch-hello-world-1
3. http://static.springsource.org/spring-roo/reference/html/

 

4 comments:

  1. Great article. It would be better if you use a syntax highlighter like the one from http://alexgorbatchev.com/SyntaxHighlighter/

  2. It does not work for me. I have this in my pom file:

    roo.version 1.1.4.RELEASE
    spring.batch.version 2.1.8.RELEASE
    spring.version 3.0.5.RELEASE
    aspectj.version 1.6.11
    slf4j.version 1.6.1

    ApplicationConversionServiceFactoryBean does not compile and gives this error:

    The type FactoryBean is not generic; it cannot be parameterized with arguments

  3. Hi Orfeo,

    Sorry for the delayed reply. On my machine I have

    STS 2.5.1
    ROO 1.1.0
    SPRING 3.0.5
    ASPECTJ 1.6.10
    SLF4J 1.6.1
    BATCH 2.1.1

    I will try to update everything and find out what went wrong (if anything). Thank you.

    Sanjoy

  4. Has got something to do with jdk version I think
