Spring Batch processing: overview
What is Spring Batch?
Spring Batch is a framework for processing large volumes of records. It provides reusable functions such as transaction management, job processing statistics, and resource management, and it supports high-volume data processing through chunk-based processing and partitioning.
Spring Batch flow:
JobLauncher: Starts a job execution for the job that is passed to it as a parameter. It also accepts job parameters that can be used during the job execution.
jobLauncher.run(job, jobParameters); // job is the Job object (bean) to run, not its name
Job: Executed by the JobLauncher. It represents the configuration of the batch process. While it runs, meta-information such as status and statistics is updated in the job execution context as necessary. A JobExecution represents one "physical" execution of a Job.
Step: Executed by the Job. It reads the input data with an ItemReader, processes it with an ItemProcessor, and writes the processed data with an ItemWriter.
A Job can have more than one Step in its flow.
JobRepository: The JobLauncher registers a JobInstance in the database through the JobRepository. Through the same repository, the framework records that the job execution has started, updates information such as input/output record counts and status, and records that the job execution has completed.
Develop a Spring Batch application using Spring Boot
In this section, we will develop a Spring Boot application that runs a Spring Batch job. The job reads a list of student details from a CSV file, transforms it using a custom processor, and stores the final result in an in-memory database.
Prerequisites:
- JDK 1.8 or later
- Gradle 4+
- IntelliJ IDEA (IDE)
Getting started:
1. Add Gradle dependencies for Spring Batch and HyperSQL Database.
build.gradle:
implementation 'org.springframework.boot:spring-boot-starter-batch'
runtimeOnly 'org.hsqldb:hsqldb'
2. CSV file and SQL script
student-data.csv: This file contains the data to transform (one student per line: first name, last name).
Smit,Doe
Joe,Doe
Justin,Drake
Jane,Dyane
John,Shaw
student-schema.sql: SQL script that creates the student table in which the data is stored.
DROP TABLE student IF EXISTS;
CREATE TABLE student (
person_id BIGINT IDENTITY NOT NULL PRIMARY KEY,
first_name VARCHAR(20),
last_name VARCHAR(20)
);
Add the following property to the application.yml file.
file.input: student-data.csv
3. Student.java: Model class to hold the student details.
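The listing for this step is not shown, so the following is only a minimal sketch of what such a model class could look like. The field names firstName and lastName are assumptions chosen to mirror the first_name and last_name columns of the student table. In a real project the class would normally be declared public in its own Student.java file.

```java
// Student.java: plain model class holding one row of student-data.csv.
// Field names are assumptions that mirror the first_name / last_name columns.
class Student {

    private String firstName;
    private String lastName;

    // No-argument constructor so that bean-mapping utilities can create instances.
    Student() {
    }

    Student(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        return "firstName: " + firstName + ", lastName: " + lastName;
    }
}
```

The getters and setters follow the JavaBeans naming convention, which is what property-based mappers typically rely on when they copy CSV fields into the object and object properties into SQL parameters.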
4. Job configuration: A single class that holds all the batch configuration.
@EnableBatchProcessing gives us access to many useful beans that support jobs and saves us a lot of legwork.
It also provides us with access to two useful factories:
- JobBuilderFactory: creates a job builder and initializes its job repository.
- StepBuilderFactory: creates a step builder and initializes its job repository and transaction manager.
5. StudentItemProcessor.java: Transforms the student details.
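The article does not say which transformation the processor applies, so the sketch below assumes a simple one: converting both names to upper case. To keep it compilable without the Spring dependency, a one-method ItemProcessor interface with the same shape as Spring Batch's org.springframework.batch.item.ItemProcessor is declared locally, and a minimal Student class is included. In the real project you would import Spring's interface and reuse the project's own Student class instead.

```java
import java.util.Locale;

// Local stand-in with the same shape as Spring Batch's
// org.springframework.batch.item.ItemProcessor<I, O> (assumption: used only
// so that this sketch compiles without the Spring Batch dependency).
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Minimal model class so the sketch is self-contained.
class Student {
    private final String firstName;
    private final String lastName;

    Student(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }
}

// Assumed business logic: upper-case the first and last name.
class StudentItemProcessor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student student) {
        String firstName = student.getFirstName().toUpperCase(Locale.ROOT);
        String lastName = student.getLastName().toUpperCase(Locale.ROOT);
        // Return a new object rather than mutating the input item.
        return new Student(firstName, lastName);
    }
}
```

With this logic, the row Justin,Drake would be written to the database as JUSTIN, DRAKE. Any other per-record rule (validation, enrichment, filtering) would go in the same process method.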
What the step configuration does:
- We configure our step so that it writes up to ten records at a time, using the chunk(10) declaration.
- We read in the student data using our reader bean, which we set using the reader method.
- We pass each of our student items to a custom processor, where we apply some custom business logic.
- We write each student item to the database using our writer bean, which we set using the writer method.
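To make the read, process, write cycle described above more concrete, here is a plain-Java model of chunk-oriented processing. It is not Spring Batch code: ChunkSketch and runStep are hypothetical names, and the reader, processor, and writer are represented by standard Iterator, Function, and Consumer types. It only illustrates the control flow: read up to chunkSize items, process each of them, then hand the whole chunk to the writer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of a chunk-oriented step (hypothetical, not Spring Batch API).
class ChunkSketch {

    // Runs the step and returns the number of chunks handed to the writer.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int chunksWritten = 0;
        while (true) {
            // 1. Read: collect up to chunkSize items from the reader.
            List<I> inputs = new ArrayList<>(chunkSize);
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            if (inputs.isEmpty()) {
                break; // the reader is exhausted
            }
            // 2. Process: transform every item of the chunk.
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // 3. Write: pass the whole chunk to the writer in one call.
            //    In Spring Batch, each chunk is committed in its own transaction.
            writer.accept(outputs);
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

For example, feeding the five CSV rows through this model with a chunk size of 2 results in three writer calls (2, 2, and 1 items). With chunk(10), as in the step above, all five rows fit into a single chunk.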
6. Job notification listener: We use it to get notified when the job completes. It also reports the execution start time and end time along with the job status.
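The listener listing is not shown, so the following is a plain-Java model of what it reports, under stated assumptions. A real listener would implement Spring Batch's JobExecutionListener and read the values from the JobExecution passed to afterJob (getStartTime, getEndTime, getStatus), then log them. Here JobExecutionInfo, BatchStatus, and JobNotificationListener are hypothetical stand-ins, and afterJob returns the message instead of logging it so that the sketch stays dependency-free.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical stand-in for the job status values the listener reports.
enum BatchStatus { COMPLETED, FAILED }

// Hypothetical stand-in for the parts of a job execution that the listener reads.
class JobExecutionInfo {
    final LocalDateTime startTime;
    final LocalDateTime endTime;
    final BatchStatus status;

    JobExecutionInfo(LocalDateTime startTime, LocalDateTime endTime, BatchStatus status) {
        this.startTime = startTime;
        this.endTime = endTime;
        this.status = status;
    }
}

// Model of the notification listener: builds the completion message.
class JobNotificationListener {

    // Invoked after the job has finished, whether it succeeded or failed.
    String afterJob(JobExecutionInfo execution) {
        long millis = Duration.between(execution.startTime, execution.endTime).toMillis();
        return "Job finished with status " + execution.status
                + ", started at " + execution.startTime
                + ", ended at " + execution.endTime
                + " (" + millis + " ms)";
    }
}
```

Checking the status inside afterJob is also the usual place to add follow-up actions, for example querying the student table to verify the imported rows once the status is COMPLETED.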
7. Spring Batch application:
8. Run the job:
You can run the job either from the command line:
java -jar <Jar-file-name.jar>
Or:
Write a test controller that uses the JobLauncher to trigger the job, and then call the controller from Postman.
Applications of Spring Batch:
- Moving high volumes of data between databases and transforming it along the way.
- Processing a high volume of data/records.
- Batch work that needs built-in transaction management and resource management.