{"id":3662,"date":"2011-08-01T10:13:45","date_gmt":"2011-08-01T08:13:45","guid":{"rendered":"http:\/\/blog.jteam.nl\/?p=3662"},"modified":"2011-08-01T10:13:45","modified_gmt":"2011-08-01T08:13:45","slug":"cleaning-up-your-maven-repository","status":"publish","type":"post","link":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/","title":{"rendered":"Cleaning up your maven repository"},"content":{"rendered":"<p>A few days ago I was looking at a warning that my disk was getting too full. I had just upgraded to Apple OS X Lion. A few of the large items were related to the upgrade, but another large directory was the Maven repository directory. The easy way out is to just remove everything, but I do not want to do that every week. Then I started to think about a way to delete only part of the repository. As I like playing around with Groovy, it had to be a Groovy script.<\/p>\n<p>So which libraries or artifacts should be removed? I want to remove all old snapshots, and I want to remove old artifacts for which newer versions have been installed. In this blog post I explain the script: what it can do and how it works.<\/p>\n<p><!--more--><\/p>\n<h2>Requirements<\/h2>\n<p>Which libraries should be cleaned? The Maven repository contains a lot of libraries that we no longer need. Of course there is a risk that we delete too much, but how big is that risk? If we remove too much, we just have to download it again. Not too big a risk if you ask me. Why not just remove everything at once? Well, that is a bit too much, and not something you&#8217;d do once a week.<\/p>\n<p>I separate the requirements into two parts: libraries that are versioned and libraries that are snapshots.<\/p>\n<h3>Snapshots<\/h3>\n<p>I want to throw away all snapshots that are older than 60 days.<\/p>\n<h3>Other libraries<\/h3>\n<p>I want to throw away all versions that were created more than 90 days ago and for which a more recent version exists.<\/p>\n<h2>The script<\/h2>\n<p>I have chosen to do everything in one file. 
You can find it on GitHub:<\/p>\n<p><a href=\"https:\/\/github.com\/jettro\/small-scripts\/blob\/master\/groovy\/CleanDir.groovy\">https:\/\/github.com\/jettro\/small-scripts\/blob\/master\/groovy\/CleanDir.groovy<\/a><\/p>\n<h3>Configure the script<\/h3>\n<p>I started by creating a mechanism to provide command-line parameters, but that became impractical as I needed more and more parameters. Therefore I made it easy to change the configuration in one spot in the source code: you change the Configuration object to influence what the script does.<\/p>\n<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">\nnow = new Date()\nconfiguration = new Configuration()\ncleanedSize = 0\ndetails = &#x5B;]\ndirectoryFilter = new DirectoryFilter()\nnonSnapshotDirectoryFilter = new NonSnapshotDirectoryFilter()\n\ndef class Configuration {\n    def homeFolder = System.getProperty(&quot;user.home&quot;)\n    def path = homeFolder + &quot;\/.m2\/repository&quot;\n    def dryRun = true\n    def printDetails = true\n    def maxAgeSnapshotsInDays = 60\n    def maxAgeInDays = 90\n    def versionsToKeep = &#x5B;&quot;3.1.0.M1&quot;]\n    def snapshotsOnly = true\n}\n<\/pre>\n<p>The first variables, the ones declared without def, are global variables. This is a bit strange: if you do not use the keyword def in a Groovy script, the variable ends up in the script binding and is therefore available to all your methods. The Configuration object contains the parameters that configure the script. These are the ones to alter in order to change the result of a run. You can change the location of the repository to clean, make the script do a dry run only, change how much detail is printed, change the maximum age of the artifacts to keep, and choose between removing only snapshots or all kinds of artifacts. The final parameter is versionsToKeep, where you can add version strings that you do not want to remove. 
These are meant for unusual versions, like the milestone version 3.1.0.M1 in the example.<\/p>\n<h3>Which directories to delete<\/h3>\n<p>Cleaning the repository starts by going through all directories and checking whether a directory contains old artifacts. The definition of old should be flexible, but if the artifacts are old, we remove the directory. The following code block shows the recursive function that checks a directory for sub-directories. If no sub-directories are found, we have reached a leaf folder, which in the repository layout is normally a version folder, and we check its age. The behavior is different for snapshot folders and normal version folders.<\/p>\n<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">\nprivate def cleanMavenRepository(File file) {\n    def lastModified = new Date(file.lastModified());\n    def ageInDays = now - lastModified;\n    def directories = file.listFiles(directoryFilter);\n\n    if (directories.length &gt; 0) {\n        directories.each {\n            cleanMavenRepository(it);\n        }\n    } else {\n        if (ageInDays &gt; configuration.maxAgeSnapshotsInDays &amp;&amp; file.canonicalPath.endsWith(&quot;-SNAPSHOT&quot;)) {\n            int size = removeDirAndReturnFreedKBytes(file)\n            details.add(&quot;About to remove directory $file.canonicalPath with total size $size and $ageInDays days old&quot;);\n        } else if (ageInDays &gt; configuration.maxAgeInDays &amp;&amp; !file.canonicalPath.endsWith(&quot;-SNAPSHOT&quot;) &amp;&amp; !configuration.snapshotsOnly) {\n            String highest = obtainHighestVersionOfArtifact(file)\n            if (file.name != highest &amp;&amp; !configuration.versionsToKeep.contains(file.name)) {\n                int size = removeDirAndReturnFreedKBytes(file)\n                details.add(&quot;About to remove directory $file.canonicalPath with total size $size and $ageInDays days old and not highest version $highest&quot;);\n            }\n        }\n    }\n}\n<\/pre>\n<p>Lines 7-9 show the recursive behavior: for each sub-directory we call this method again. 
In line 11 we check whether we are dealing with an old snapshot folder. If so, we call the removeDirAndReturnFreedKBytes function and store a message in the details list, which we use later on for reporting.<\/p>\n<p>In line 14 we check for normal version artifacts and whether the script is configured to clean those as well (snapshotsOnly set to false). If so, we have to do a little more work. We want to remove all versions that are older than the configured number of days and that have a higher version in the repository. So we need to find the highest version first. Then we check whether the current folder is the highest version. If it is not, and it is not in the versionsToKeep list, we delete the folder just as we did with the snapshots.<\/p>\n<p>So how do we determine the highest version?<\/p>\n<h3>Determine the highest version<\/h3>\n<p>The first step in finding the highest version is finding all the versions.<\/p>\n<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">\nprivate String obtainHighestVersionOfArtifact(File file) {\n    def folderWithVersions = file.parentFile\n    \/\/ List all non-snapshot version folders of this artifact\n    def versionsFolders = folderWithVersions.listFiles(nonSnapshotDirectoryFilter)\n    def highest = '0'\n    versionsFolders.each {\n        if (higherThan(highest, it.name)) {\n            highest = it.name\n        }\n    }\n    return highest\n}\n\ndef class NonSnapshotDirectoryFilter implements FileFilter {\n    boolean accept(File file) {\n        return file.directory &amp;&amp; !file.name.endsWith(&quot;-SNAPSHOT&quot;)\n    }\n}\n<\/pre>\n<p>In order to find all the versions, we move up one folder and ask for all sub-directories whose names do not end with -SNAPSHOT. We do this by passing a filter to the listFiles method. The filter is an instance of the <em>NonSnapshotDirectoryFilter<\/em> class shown above. Then we go through the folders and compare each folder with the highest version found so far. 
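<\/p>
<p>As a side note, Groovy can coerce a closure into a <em>FileFilter<\/em>, so the filter does not need a class of its own. The sketch below is not taken from the original script. <em>listNonSnapshotVersions<\/em> is a hypothetical helper, and it assumes it is handed a version folder inside the usual group\/artifact\/version folder layout of the local repository.<\/p>
<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">
\/\/ Hypothetical helper, not part of CleanDir.groovy: returns the names of all
\/\/ non-snapshot version folders next to the given version folder.
List listNonSnapshotVersions(File versionFolder) {
    def artifactFolder = versionFolder.parentFile
    \/\/ A closure coerced to FileFilter does the same job as NonSnapshotDirectoryFilter
    def filter = { File f -&gt; f.directory &amp;&amp; !f.name.endsWith('-SNAPSHOT') } as FileFilter
    \/\/ Sorted alphabetically (not by version) only to get a predictable order
    return artifactFolder.listFiles(filter)*.name.sort()
}
<\/pre>
<p>Take an artifact folder that contains the version folders 1.0, 1.1 and 2.0-SNAPSHOT. Calling the helper with the 1.0 folder would return 1.0 and 1.1. Plain files in the artifact folder, such as metadata files, are skipped because they are not directories.<\/p>
<p>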
Comparing the versions is done in the following code block.<\/p>\n<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">\nprivate boolean higherThan(highestVersion, newVersion) {\n    def highestArr = highestVersion.tokenize('.')\n    def newArr = newVersion.tokenize('.')\n    if (highestVersion.endsWith(&quot;RELEASE&quot;) &amp;&amp; !newVersion.endsWith(&quot;RELEASE&quot;)) {\n        return false\n    }\n    return compareTwoIntegersInArray(highestArr, newArr, 0)\n}\n\nprivate boolean compareTwoIntegersInArray(highestArr, newArr, counter) {\n    def counterPlus1 = counter + 1\n    if (highestArr&#x5B;counter] == newArr&#x5B;counter]) {\n        if (highestArr.size() &gt; counterPlus1 &amp;&amp; newArr.size() &gt; counterPlus1) {\n            return compareTwoIntegersInArray(highestArr, newArr, counterPlus1)\n        } else if (newArr.size() &gt; counterPlus1) {\n            return true\n        }\n    } else {\n        def highest = highestArr&#x5B;counter]\n        def newest = newArr&#x5B;counter]\n        if (highest.isInteger() &amp;&amp; newest.isInteger()) {\n            if (highest.toInteger() &lt; newest.toInteger()) {\n                return true\n            }\n        } else {\n            if (highest &lt; newest) {\n                return true\n            }\n        }\n    }\n    return false\n}\n<\/pre>\n<p>The most important things to grasp here are what a version looks like and how we determine the highest one. In the higherThan method we tokenize the version string on the dots. A simple version looks like 3.1.0, but we can also have 3.1.0.RELEASE or 3.1.0-RC1 (which splits into 3, 1 and 0-RC1, so its last part can only be compared as a string). The higherThan method has one extra rule: if the highest version found so far ends with RELEASE and the new one does not, the new one never wins. After that, the compareTwoIntegersInArray method compares the two token lists part by part. If both parts can be compared as numbers, we do that; if not, we compare them as strings. If one of the versions is shorter (has fewer dots) than the other, e.g. 3.0 and 3.0.1, and all the shared parts are equal, the longer version wins. 
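<\/p>
<p>For comparison, the same token rules can also be written as a comparator, which lets Groovy&#8217;s <em>max<\/em> method pick the highest version in one call. This is only a sketch, not the script&#8217;s own code. <em>compareVersions<\/em> is a hypothetical name, and the sketch deliberately leaves out the RELEASE check of higherThan. That rule only applies in one direction, so it does not map directly onto a symmetric comparator.<\/p>
<pre class=\"brush: groovy; title: ; notranslate\" title=\"\">
\/\/ Hypothetical comparator, not part of CleanDir.groovy. Returns a negative number,
\/\/ zero or a positive number when version a is lower than, equal to or higher than b.
\/\/ The RELEASE special case of higherThan is deliberately left out.
int compareVersions(String a, String b) {
    def left = a.tokenize('.')
    def right = b.tokenize('.')
    int common = Math.min(left.size(), right.size())
    for (int i = 0; i &lt; common; i++) {
        String x = left.get(i)
        String y = right.get(i)
        if (x == y) {
            continue
        }
        \/\/ Compare as numbers when both parts are integers, otherwise as strings
        if (x.isInteger() &amp;&amp; y.isInteger()) {
            return x.toInteger() &lt;=&gt; y.toInteger()
        }
        return x &lt;=&gt; y
    }
    \/\/ All shared parts are equal: the version with more parts wins (3.0.1 beats 3.0)
    return left.size() &lt;=&gt; right.size()
}

def versions = &#x5B;'3.0.1', '3.1', '3.0']
\/\/ max treats a two-argument closure as a Comparator
println versions.max { p, q -&gt; compareVersions(p, q) }   \/\/ prints 3.1
<\/pre>
<p>Comparing the parts as numbers is the important design choice here. Compared as plain strings, 3.10 would come out lower than 3.9, because the character 1 sorts before 9.<\/p>
<p>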
So 3.0.1 is higher than 3.0, but 3.1 is higher than 3.0.1.<\/p>\n<h3>Final remarks<\/h3>\n<p>Within the script we also do some reporting. This is fairly trivial, so refer to the code if you want to see how it works.<\/p>\n<p>I ended up cleaning about 7 GB of my repository, so the mission succeeded. The catch is that I do not know yet what I removed that I still needed :-).<\/p>\n<p>As always, feel free to ask questions or leave comments with suggestions for improvements.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few days a go I was looking at a warning that my disk was getting to full. I just upgraded to apple osx lion. There were a few things that were related to the upgrade, but another large directory was the maven repository directory. The easy way out is to just remove everything, but [&hellip;]<\/p>\n","protected":false},"author":60,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[83,31,10],"tags":[82,102],"class_list":["post-3662","post","type-post","status-publish","format-standard","hentry","category-groovy","category-java","category-development","tag-groovy","tag-maven"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Cleaning up your maven repository - Trifork Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cleaning up your maven repository - Trifork Blog\" \/>\n<meta property=\"og:description\" content=\"A few days a go I was looking at a warning that my disk was getting to full. I just upgraded to apple osx lion. 
There were a few things that were related to the upgrade, but another large directory was the maven repository directory. The easy way out is to just remove everything, but [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/\" \/>\n<meta property=\"og:site_name\" content=\"Trifork Blog\" \/>\n<meta property=\"article:published_time\" content=\"2011-08-01T08:13:45+00:00\" \/>\n<meta name=\"author\" content=\"Jettro Coenradie\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jettro Coenradie\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/\",\"url\":\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/\",\"name\":\"Cleaning up your maven repository - Trifork Blog\",\"isPartOf\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#website\"},\"datePublished\":\"2011-08-01T08:13:45+00:00\",\"author\":{\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/198c00ea654e6a5e38e33511d983613d\"},\"breadcrumb\":{\"@id\":\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/trifork.nl\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Cleaning up your maven 
repository\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/trifork.nl\/blog\/#website\",\"url\":\"https:\/\/trifork.nl\/blog\/\",\"name\":\"Trifork Blog\",\"description\":\"Keep updated on the technical solutions Trifork is working on!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/trifork.nl\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/198c00ea654e6a5e38e33511d983613d\",\"name\":\"Jettro Coenradie\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/bfce5dacae07c9ed6b0283448d22fee7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/bfce5dacae07c9ed6b0283448d22fee7?s=96&d=mm&r=g\",\"caption\":\"Jettro Coenradie\"},\"url\":\"https:\/\/trifork.nl\/blog\/author\/jettro\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Cleaning up your maven repository - Trifork Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/","og_locale":"en_US","og_type":"article","og_title":"Cleaning up your maven repository - Trifork Blog","og_description":"A few days a go I was looking at a warning that my disk was getting to full. I just upgraded to apple osx lion. There were a few things that were related to the upgrade, but another large directory was the maven repository directory. 
The easy way out is to just remove everything, but [&hellip;]","og_url":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/","og_site_name":"Trifork Blog","article_published_time":"2011-08-01T08:13:45+00:00","author":"Jettro Coenradie","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jettro Coenradie","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/","url":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/","name":"Cleaning up your maven repository - Trifork Blog","isPartOf":{"@id":"https:\/\/trifork.nl\/blog\/#website"},"datePublished":"2011-08-01T08:13:45+00:00","author":{"@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/198c00ea654e6a5e38e33511d983613d"},"breadcrumb":{"@id":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/trifork.nl\/blog\/cleaning-up-your-maven-repository\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/trifork.nl\/blog\/"},{"@type":"ListItem","position":2,"name":"Cleaning up your maven repository"}]},{"@type":"WebSite","@id":"https:\/\/trifork.nl\/blog\/#website","url":"https:\/\/trifork.nl\/blog\/","name":"Trifork Blog","description":"Keep updated on the technical solutions Trifork is working on!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/trifork.nl\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/198c00ea654e6a5e38e33511d983613d","name":"Jettro 
Coenradie","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/trifork.nl\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/bfce5dacae07c9ed6b0283448d22fee7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/bfce5dacae07c9ed6b0283448d22fee7?s=96&d=mm&r=g","caption":"Jettro Coenradie"},"url":"https:\/\/trifork.nl\/blog\/author\/jettro\/"}]}},"_links":{"self":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3662","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/users\/60"}],"replies":[{"embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/comments?post=3662"}],"version-history":[{"count":0,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/posts\/3662\/revisions"}],"wp:attachment":[{"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/media?parent=3662"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/categories?post=3662"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trifork.nl\/blog\/wp-json\/wp\/v2\/tags?post=3662"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}