Feed Your Database

by Aparna Chaudhary, September 10, 2009

The priority that performance testing gets among other development activities is always less; it is an “afterthought”, not a critical, ongoing part of the development process. Typically, it is done in higher environments like ST, QA. Some attention is given to performance in the prior SDLC cycles like design by making proper selection of frameworks. But during development its often limited to following the right coding standards. With this approach, the performance issues accumulate over time and then these bottlenecks become project killers.

The only solution to this problem is Tune Early, Tune Often. A process for injecting performance testing into all stages of the development process – making it, appropriately, a key part of application development is the prime necessity for successful execution of the project. Especially in database oriented applications, even if the developers are willing to test their application, they often face issues with the scarcity of test data.

Brand new application planned to go-live for the first time
Because of some security norms, the customer is not ready to share the production data with the development team.
Application undergoes major schema refactoring and its lot of efforts to map the existing data to the new schema
Dependency on the database team.

Because of these and such similar problems, application developers cannot go beyond Unit testing. The result is they end up writing queries with complex joins, totally unaware of how it would perform with large volumetrics. And then lot of time and cost is spent on refactoring.

Scarcity of test data in development enviroment is a common problem. In the following article I’ll introduce you to the use of Benerator. Benerator is a data generator tool that can be used to feed database with pseudo-random test data.

Installing Benerator:

Unzip the benerator installation in an appropriate directory. Set BENERATOR_HOME that points to the path you extracted benerator.

Sample Project:

Execute benerator-gui.bat. Configure the database details in the project creation pop-up.

In the generated project, rename benerator-demo.ben.xml to benerator.xml. Add the database configuration details in benerator-demo.properties file.

dbDriver=com.mysql.jdbc.Driver
dbUrl=jdbc:mysql://localhost/datagenerator_demo
dbSchema=datagenerator_demo
dbUser=build
dbPassword=build
dbBatch=true
benerator.defaultPagesize=1000

To run the datagenerator from command line, go to the benerator installation directory and run:

bin\benerator.bat ${project_home}\benerator.xml

Or through IDE, run org.databene.benerator.main.Benerator

Benerator ships with domain packages. Default generators are available for the commonly used entities like Organization, Address, Bank Account Number, Person, Email Address etc. Its possible to configure region and locale for these generators.

  <attribute name="email" generator="EMailAddressGenerator" />

Generate Fresh Data:


     <execute target="db">
        drop table if exists customer;
     </execute>

     <execute target="db">
        create table customer (
            id bigint not null auto_increment,
            name varchar(255),
            pword varchar(20),
            address varchar(100),
            phone varchar(20),
            primary key (id)
        );
    </execute>

    <import defaults="true"/>
    <import domain="person"/>
    <import domain="address"/>

    <!-- Populate Customer Entity -->
    <create-entities name="customer" count="{ftl:${customer_count}}" consumer="db">
        <variable name="person" generator="PersonGenerator" dataset="{${country}}" locale="{${locale}}"/>
        <variable name="address" generator="AddressGenerator" dataset="{${country}}"/>
        <id name="id" strategy="{${idStrategy}}" param="{${customerIdParam}}" source="db" />
        <attribute name="name" script="{ftl:${person.givenName} ${person.familyName}}" />
        <attribute name="pword" pattern="[A-Za-z0-9]{8,12}" />
        <attribute name="address" script="{ftl:${address.street} ${address.houseNumber} ${address.city} ${address.country} ${address.zipCode} }" />
        <attribute name="phone" script="{ftl:${address.mobilePhone}}" />
    </create-entities>

Generating Data for Entities with Parent Child Relationships:

For generating parent child entities e.g. customer and customer order, the following configuration can be used.

    <create-entities name="cust_order" count="{ftl:${customer_count * orders_per_customer}}" consumer="db">
        <id name="id" strategy="{${idStrategy}}" source="db" param="{${custOrderIdParam}}" />
        <attribute name="customer_id" source="db" selector="select id from customer" cyclic="true" />
        <attribute name="created" nullable="false" generator="CurrentDateGenerator"/>
    </create-entities>

But this configuration is not optimal if you need to generate data for a table with few million items. The easiest and the most optimum way to generate references is to control the id range in which referenced entities are generated and then generate the ids in that range without querying the database.

    <create-entities name="cust_order" count="{ftl:${customer_count * orders_per_customer}}" consumer="db">
        <id name="id" strategy="{${idStrategy}}" source="db" param="{${custOrderIdParam}}" />
        <attribute name="customer_id" type="int" min="1" max="{ftl:${customer_count}}" cyclic="true" />
        <attribute name="created" nullable="false" generator="CurrentDateGenerator"/>
    </create-entities>

As you can see the customer_id is generated without querying the customer table here.

With benerator it is also possible to feed the database with DBUnit export.

      <create-entities source="shop.dbunit.xml" consumer="db" />

Performance could be the issue while populating mass test data. Few tips for generating mass test data:

Set batch mode to true {batch=true in the database definition}.
Tweak JVM Heap Size. If you are executing benerator using command line then use bin\benerator ${project_home}\benerator.xml –Xmx512m. Make sure you modify the heap size based on your system configurations.

With maximum heap size of 512MB and batch size of 1000, I could generate 10000 customer, 4000 products and 50000 customer orders in 1382 sec.

So with the usage of benerator, its possible to generate mass test data. With the availability of mass random test data, its easy to find out the poorly performing queries, using database features like EXPLAIN. EXPLAIN is the database command that shows the optimum execution path chosen by the database server. The query plan gives the details about the indices used in the query execution, type of join performed, cost of the overall query execution and much more.

Conclusion:

Benerator definitely provides a solution to address one of the performance anti-pattern “Load more data than needed“. With the usage of benerator, it is possible to generate test data that resembles production data and to solve performance problems early in the development cycle.