{"id":847,"date":"2009-09-10T12:06:06","date_gmt":"2009-09-10T11:06:06","guid":{"rendered":"http:\/\/blog.jteam.nl\/?p=847"},"modified":"2009-09-10T12:06:06","modified_gmt":"2009-09-10T11:06:06","slug":"feed-your-database","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/feed-your-database\/","title":{"rendered":"Feed Your Database"},"content":{"rendered":"<p>The priority that performance testing gets among other development activities is always less; it is an &#8220;afterthought&#8221;, not a critical, ongoing part of the development process.\u00a0Typically, it is done in higher environments like ST, QA. Some attention is given to performance in the prior SDLC cycles like design by making proper selection of frameworks. But during development its often limited to following the right coding standards. With this approach, the performance issues accumulate over time and then these bottlenecks become project killers.<br \/>\n<!--more--><br \/>\nThe only solution to this problem is <strong><em>Tune Early, Tune Often<\/em><\/strong>. A process for injecting performance testing into all stages of the development process &#8211; making it, appropriately, a key part of application development is the prime necessity for successful execution of the project.\u00a0Especially in database oriented applications, even if the developers are willing to test their application, they often face issues with the scarcity of test data.<\/p>\n<ol>\n<li>Brand new application planned to go-live for the first time<\/li>\n<li>Because of some security norms, the customer is not ready to share the production data with the development team.<\/li>\n<li>Application undergoes major schema refactoring and its lot of efforts to map the existing data to the new schema<\/li>\n<li>Dependency on the database team.<\/li>\n<\/ol>\n<p>Because of these and such similar problems, application developers cannot go beyond Unit testing. The result is they end up writing queries with complex joins, totally unaware of how it would perform with large volumetrics. And then lot of time and cost is spent on refactoring.<\/p>\n<p>Scarcity of test data in development enviroment is a common problem. In the following article I&#8217;ll introduce you to the use of Benerator. Benerator is a data generator tool that can be used to feed database with pseudo-random test data.<\/p>\n<p><strong>Installing Benerator:<\/strong><\/p>\n<p>Unzip the <a href=\"http:\/\/databene.org\/databene-benerator\/112-installing-the-benerator-distribution.html\">benerator<\/a> installation in an appropriate directory. Set BENERATOR_HOME that points to the path you extracted benerator.<\/p>\n<p><strong>Sample Project:<\/strong><\/p>\n<div>Execute benerator-gui.bat. Configure the database details in the project creation pop-up.<\/div>\n<div><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-937\" title=\"benerator\" src=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG\" alt=\"benerator\" width=\"414\" height=\"427\" \/><\/div>\n<p style=\"text-align: center\">\n<p><strong><br \/>\n<\/strong><\/p>\n<p>In the generated project, rename benerator-demo.ben.xml to <em>benerator.xml<\/em>. Add the database configuration details in <em>benerator-demo.properties<\/em> file.<\/p>\n<p>dbDriver=com.mysql.jdbc.Driver<br \/>\ndbUrl=jdbc:mysql:\/\/localhost\/datagenerator_demo<br \/>\ndbSchema=datagenerator_demo<br \/>\ndbUser=build<br \/>\ndbPassword=build<br \/>\ndbBatch=true<br \/>\nbenerator.defaultPagesize=1000<\/p>\n<p>To run the datagenerator from command line, go to the benerator installation directory and run:<\/p>\n<p>bin\\benerator.bat ${project_home}\\benerator.xml<\/p>\n<p>Or through IDE, run org.databene.benerator.main.Benerator<\/p>\n<p>Benerator ships with domain packages. Default generators are available for the commonly used entities like Organization, Address, Bank Account Number, Person, Email Address etc. Its possible to configure region and locale for these generators.<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\u00a0 &lt;attribute name=&quot;email&quot; generator=&quot;EMailAddressGenerator&quot; \/&gt;\n<\/pre>\n<p><strong>Generate Fresh Data:<\/strong><\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\n     &lt;execute target=&quot;db&quot;&gt;\n        drop table if exists customer;\n     &lt;\/execute&gt;\n\n     &lt;execute target=&quot;db&quot;&gt;\n        create table customer (\n            id bigint not null auto_increment,\n            name varchar(255),\n            pword varchar(20),\n            address varchar(100),\n            phone varchar(20),\n            primary key (id)\n        );\n    &lt;\/execute&gt;\n\n    &lt;import defaults=&quot;true&quot;\/&gt;\n    &lt;import domain=&quot;person&quot;\/&gt;\n    &lt;import domain=&quot;address&quot;\/&gt;\n\n    &lt;!-- Populate Customer Entity --&gt;\n    &lt;create-entities name=&quot;customer&quot; count=&quot;{ftl:${customer_count}}&quot; consumer=&quot;db&quot;&gt;\n        &lt;variable name=&quot;person&quot; generator=&quot;PersonGenerator&quot; dataset=&quot;{${country}}&quot; locale=&quot;{${locale}}&quot;\/&gt;\n        &lt;variable name=&quot;address&quot; generator=&quot;AddressGenerator&quot; dataset=&quot;{${country}}&quot;\/&gt;\n        &lt;id name=&quot;id&quot; strategy=&quot;{${idStrategy}}&quot; param=&quot;{${customerIdParam}}&quot; source=&quot;db&quot; \/&gt;\n        &lt;attribute name=&quot;name&quot; script=&quot;{ftl:${person.givenName} ${person.familyName}}&quot; \/&gt;\n        &lt;attribute name=&quot;pword&quot; pattern=&quot;&#x5B;A-Za-z0-9]{8,12}&quot; \/&gt;\n        &lt;attribute name=&quot;address&quot; script=&quot;{ftl:${address.street} ${address.houseNumber} ${address.city} ${address.country} ${address.zipCode} }&quot; \/&gt;\n        &lt;attribute name=&quot;phone&quot; script=&quot;{ftl:${address.mobilePhone}}&quot; \/&gt;\n    &lt;\/create-entities&gt;\n\n<\/pre>\n<p><strong>Generating Data for Entities with Parent Child Relationships:<\/strong><\/p>\n<p>For generating parent child entities e.g. customer and customer order, the following configuration can be used.<\/p>\n<pre class=\"brush: xml; highlight: [3]; title: ; notranslate\" title=\"\">\n    &lt;create-entities name=&quot;cust_order&quot; count=&quot;{ftl:${customer_count * orders_per_customer}}&quot; consumer=&quot;db&quot;&gt;\n        &lt;id name=&quot;id&quot; strategy=&quot;{${idStrategy}}&quot; source=&quot;db&quot; param=&quot;{${custOrderIdParam}}&quot; \/&gt;\n        &lt;attribute name=&quot;customer_id&quot; source=&quot;db&quot; selector=&quot;select id from customer&quot; cyclic=&quot;true&quot; \/&gt;\n        &lt;attribute name=&quot;created&quot; nullable=&quot;false&quot; generator=&quot;CurrentDateGenerator&quot;\/&gt;\n    &lt;\/create-entities&gt;\n<\/pre>\n<p>But this configuration is not optimal if you need to generate data for a table with few million items. The easiest and the most optimum way to generate references is to control the id range in which referenced entities are generated and then generate the ids in that range without querying the database.<\/p>\n<pre class=\"brush: xml; highlight: [3]; title: ; notranslate\" title=\"\">\n    &lt;create-entities name=&quot;cust_order&quot; count=&quot;{ftl:${customer_count * orders_per_customer}}&quot; consumer=&quot;db&quot;&gt;\n        &lt;id name=&quot;id&quot; strategy=&quot;{${idStrategy}}&quot; source=&quot;db&quot; param=&quot;{${custOrderIdParam}}&quot; \/&gt;\n        &lt;attribute name=&quot;customer_id&quot; type=&quot;int&quot; min=&quot;1&quot; max=&quot;{ftl:${customer_count}}&quot; cyclic=&quot;true&quot; \/&gt;\n        &lt;attribute name=&quot;created&quot; nullable=&quot;false&quot; generator=&quot;CurrentDateGenerator&quot;\/&gt;\n    &lt;\/create-entities&gt;\n<\/pre>\n<p>As you can see the customer_id is generated without querying the customer table here.<\/p>\n<p>With benerator it is also possible to feed the database with DBUnit export.<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n      &lt;create-entities source=&quot;shop.dbunit.xml&quot; consumer=&quot;db&quot; \/&gt;\n<\/pre>\n<p>Performance could be the issue while populating mass test data. Few tips for generating mass test data:<\/p>\n<ul>\n<li>Set batch mode to true {batch=true in the database definition}.<\/li>\n<li>Tweak JVM Heap Size. If you are executing benerator using command line then use\u00a0<span style=\"font-family: -webkit-monospace\">bin\\benerator ${project_home}\\benerator.xml\u00a0<strong>&#8211;<\/strong>Xmx512m.<\/span> Make sure you modify the heap size based on your system configurations.<\/li>\n<\/ul>\n<p>With maximum heap size of 512MB and batch size of 1000, I could generate 10000 customer, 4000 products and 50000 customer orders in 1382 sec.<\/p>\n<p>So with the usage of benerator, its possible to generate mass test data. With the availability of mass random test data, its easy to find out the poorly performing queries, using database features like EXPLAIN. EXPLAIN is the database command that shows the optimum execution path chosen by the database server. The query plan gives the details about the indices used in the query execution, type of join performed, cost of the overall query execution and much more.<\/p>\n<p><strong>Conclusion:<\/strong><\/p>\n<p>Benerator definitely provides a solution to address one of the performance anti-pattern &#8220;<a href=\"http:\/\/www.infoq.com\/articles\/Anti-Patterns-Alois-Reitbauer\">Load more data than needed<\/a>&#8220;. With the usage of benerator, it is possible to generate test data that resembles production data and to solve performance problems early in the development cycle.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The priority that performance testing gets among other development activities is always less; it is an &#8220;afterthought&#8221;, not a critical, ongoing part of the development process.\u00a0Typically, it is done in higher environments like ST, QA. Some attention is given to performance in the prior SDLC cycles like design by making proper selection of frameworks. But [&hellip;]<\/p>\n","protected":false},"author":15,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[10],"tags":[36,174,73,11,9,175,121],"class_list":["post-847","post","type-post","status-publish","format-standard","hentry","category-development","tag-apm","tag-benerator","tag-database","tag-java","tag-open-source","tag-performance-tuning","tag-testing-tools"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Feed Your Database - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/feed-your-database\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Feed Your Database - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"The priority that performance testing gets among other development activities is always less; it is an &#8220;afterthought&#8221;, not a critical, ongoing part of the development process.\u00a0Typically, it is done in higher environments like ST, QA. Some attention is given to performance in the prior SDLC cycles like design by making proper selection of frameworks. But [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/feed-your-database\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2009-09-10T11:06:06+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG\" \/>\n<meta name=\"author\" content=\"Aparna Chaudhary\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Aparna Chaudhary\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/\",\"url\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/\",\"name\":\"Feed Your Database - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG\",\"datePublished\":\"2009-09-10T11:06:06+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/b5abf896e440714ab5e2fbc22bf3c8ef\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/feed-your-database\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage\",\"url\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG\",\"contentUrl\":\"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/feed-your-database\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Feed Your Database\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/b5abf896e440714ab5e2fbc22bf3c8ef\",\"name\":\"Aparna Chaudhary\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/2ac0079622cd2529eae81f1b29d41a72?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/2ac0079622cd2529eae81f1b29d41a72?s=96&d=mm&r=g\",\"caption\":\"Aparna Chaudhary\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/aparna\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Feed Your Database - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/feed-your-database\/","og_locale":"en_US","og_type":"article","og_title":"Feed Your Database - Trifork Blog","og_description":"The priority that performance testing gets among other development activities is always less; it is an &#8220;afterthought&#8221;, not a critical, ongoing part of the development process.\u00a0Typically, it is done in higher environments like ST, QA. Some attention is given to performance in the prior SDLC cycles like design by making proper selection of frameworks. But [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/feed-your-database\/","og_site_name":"Trifork Blog","article_published_time":"2009-09-10T11:06:06+00:00","og_image":[{"url":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG","type":"","width":"","height":""}],"author":"Aparna Chaudhary","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Aparna Chaudhary","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/","url":"https:\/\/trifork.nl\/blog\/feed-your-database\/","name":"Feed Your Database - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage"},"image":{"@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage"},"thumbnailUrl":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG","datePublished":"2009-09-10T11:06:06+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/b5abf896e440714ab5e2fbc22bf3c8ef"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/feed-your-database\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/#primaryimage","url":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG","contentUrl":"http:\/\/blog.jteam.nl\/wp-content\/uploads\/2009\/09\/benerator.JPG"},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/feed-your-database\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Feed Your Database"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/b5abf896e440714ab5e2fbc22bf3c8ef","name":"Aparna Chaudhary","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/2ac0079622cd2529eae81f1b29d41a72?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2ac0079622cd2529eae81f1b29d41a72?s=96&d=mm&r=g","caption":"Aparna Chaudhary"},"url":"https:\/\/trifork.nl\/blog\/author\/aparna\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=847"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/847\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}