Apache Lucene in Google Summer of Code – The Apache Way

by Simon Willnauer, August 29, 2011

In 2011, Google invited open source projects around the globe to take part in its 7th Google Summer of Code: “Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We have worked with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together over 4500 successful student participants and over 3000 mentors from over 100 countries worldwide, all for the love of code.”

In 2011, Apache Lucene received five slots to accept student work on several long-standing issues:

  • NumericRange support for new Query Parser (Vinicius Barros)
  • Ability to separately specify a field’s type (Nikola Tankovic)
  • Implementing State of the Art Ranking for Lucene (David Mark Nemeskey)
  • Enable Lucene to take advantage of low-level IO options (direct IO) and generalize its Directory implementation (Varun Thacker)
  • Simplify configuration API of contrib Query Parser (Phillipe Ramalho)

None of the students who attended GSoC for Apache Lucene had any open source development experience before these projects, which, as far as I know, was a one-off. You might think that writing the code is the biggest challenge in Google Summer of Code, and that may be true for some of the projects. However, adopting the Apache Way was certainly one of the bigger hurdles. Since this applies not only to these students but to many other potential contributors too, let me quickly explain how OSS development works here at the Apache Software Foundation.

Within the ASF, development happens exclusively in public, and every contributor is expected to communicate publicly. We consider private emails non-existent, or in other words: “if it didn’t happen on the mailing list, it didn’t happen”. We stick to this rule to get the best possible visibility and to encourage other interested community members to jump into bug, issue, and improvement discussions. As always, it’s the idea that counts, not where it came from, although of course we pay attention to people too!

When new recruits join a project like Apache Lucene, with its huge install base and use in countless applications, they tend either to be afraid to contribute or to try to rewrite a large portion of the code, which usually ends in a dead issue. Open source projects generally gain more from a smaller, well-discussed, and carefully crafted and tested feature than from a half-baked patch that’s too large to work with. We try to take every person and every issue seriously and iterate towards a committable solution. So nobody should be afraid of creating new issues, contributing to discussions, or submitting code to our issue tracker.

Especially in Lucene-Land, we prefer “progress over perfection”. Nobody should hesitate to describe an overall vision, but when the rubber meets the road, the work should be done in small steps. A 20 KB patch is likely to be reviewed quickly and get rapid feedback, while even a 60 KB patch can take a little longer. So break up your vision into pieces, and the community will work with you to get things done!

The five students listed above did an amazing job of getting themselves into the community and working with committers to push their projects forward. I’m proud that all our GSoC students passed their final evaluation and that most of the code is already committed to the current development branch. I’m curious to see how these first-time contributors evolve, since the hardest part is sticking around. It’s a fast-moving community!

Thank you to all the GSoC students, not only those who worked on Lucene: you are doing a great job of pushing this (software) world to be a little more open!