Verifying Coding Standards Using Static Code Analysis Techniques

by Wilco Koorn, May 27, 2019

In a previous blog we described how we used JUnit and Reflection to verify correct implementation of coding standards addressing Axon Upcaster consistency. We concluded that, although possible, it felt a bit like a hack because we had to change modifiers of Java variables defined as “private static final” resulting in Java security manager warnings. We also promised to get back to the subject, this time using static code analysis technology as found in e.g. SonarQube.

Before we present our solution we recap the case at hand and we will of course close with a comparison and draw some conclusions.

Source code consistency

The system we were building in my project is using the Axon framework (Axon Framework) . In Axon all events in the system end up in an event store to be used for event sourcing later. We for example defined “Persons” and “Organisations” where Persons work for Organisations. During development such entities evolve, for example, they get a new property. Events such as “OrganisationUpdated” therefore become versioned. When new versions, in Axon terms revisions, contain new properties we have older events in our event store without those properties. This is a problem if the new property is mandatory. In Axon terminology they need to be upcasted by Java classes called Upcasters to allow for event replaying. Over time there will be several upcasters taking an event first from revision 0 to revision 1, then from 1 to 2 and so on until we reach the current one.

In our project we introduced a generic way of managing upcasters and we introduced a coding standard for it as follows. We have an event in Kotlin defining the current revision like this:

@Revision("4")
data class OrganisationUpdatedEvent(val id: String, val
 name: String, …

Next we have a series of upcasters where the name immediately tells what it does, like:

OrganisationUpdatedEventUpcasterNullTo1.java

And

OrganisationUpdatedEventUpcaster1To2.java

Next, each of these upcasters contains code like this:

private static final SimpleSerializedType INPUT = new
	  SimpleSerializedType(
		OrganisationUpdatedEvent.class.getName(), "1");
private static final SimpleSerializedType OUTPUT = new
	  SimpleSerializedType(
		OrganisationUpdatedEvent.class.getName(), "2");

Upcasters use this information to “recognize” if a particular revision of an event can be upcasted. Finally upcasters are Spring Components requiring a particular ordering, so we have code like this:

@Component
@Order(1)
public class OrganisationUpdatedEventUpcaster1To2

We need this ordering because Axon runs all events through a list of all upcasters it knows about at event replay time. Axon does not impose anything in terms of ordering on that list, but we want to apply upcaster “…NullTo1” before “…1To2” etc. Managing Axon upcasters over time will be another blog post, as there are a couple of alternative implementations all with different pros and cons.

Let us get back to our current subject. To summarize, in our coding standard we have:

The current revision in an annotation on an event (e.g. “@Revision(“4”)”)
A series of classes with a particular name (e.g. …NullTo1, …1To2, ….)
Per upcaster an INPUT and OUTPUT revision (e.g. “1” and “2”)
The previous revision in an annotation on an upcaster (e.g. @Order(1))

Given an event with a Revision “4” this implies:

There should be 4 upcasters
The name of each upcaster (OrganisationUpdatedEventUpcaster1To2) contains 2 revisions that should match values specified in INPUT (1) and OUTPUT (2) constants
Likewise, the name of each upcaster (OrganisationUpdatedEventUpcaster1To2) contains the TypeName used in INPUT and OUTPUT (OrganisationUpdatedEvent)
The value in the “@Order” annotation should equal the revision used in the INPUT (1)

Now how do we know this standard is properly implemented? A static analysis test would add value to catch mistakes that are easily made and easily overlooked at review time such as copy-paste errors:

Forgetting to change the type in INPUT and OUTPUT constants
Forgetting to update the @Order annotation

And other errors like omitting the upcaster in the first place.

Static code analysis

One of the best-known tools in the area of static code analysis is SonarQube. It catches potential bugs, reports on code coverage, analyses complexity and the like. As SonarQube is easily extensible we looked at it for implementing a so-called CustomRule verifying our coding standard. This appeared harder than expected for two reasons:

SonarQube uses plugins per language
SonarQube “unit of analysis” is a single file

Let us elaborate on these two before we explain in detail how SonarQube rules work.

SonarQube uses plugins per language

The first hoop we have to jump through is the fact that we have both Kotlin as well as Java source code. SonarQube supports Java since ages but for Kotlin we need to look at other solutions. In the end we used parts of Detekt. But we probably could have used other solutions as well. The problem is that adding a CustomRule to the Java plugin is unnatural, as it can’t understand Kotlin files and vice versa.

SonarQube’s “unit of analysis” is a single file

SonarQube takes a path that points at some directory and then recursively analyses each relevant file on a per file basis. So here is the second hoop to jump through: information from a single Kotlin file (the current revision in an annotation on an event) needs to be combined with information coming from several Java files.

How SonarQube rules work

A SonarQube rule basically defines a set of tree types we are interested in and a visitor in which we examine the tree and possibly produce errors and warnings. For example, to verify a maximum number of parameters in a method we say we are interested in Java methods. In a visitor we get each method, count the number of parameters and produce an error or a warning when the count exceeds a particular value.

Key to observe here is that the visitor gets trees. These are sub-trees of the Abstract Syntax Tree derived by parsing the source code file.

Statically deriving information from Upcaster (java) files

As said upcasters are Java files and we need to obtain information from an annotation (@Order) and from two variables (INPUT and OUTPUT). Our CustomRule becomes:

public class UpcasterInformationRule extends
		IssuableSubscriptionVisitor {

With:

@Override
public List<Tree.Kind> nodesToVisit() {
  return ImmutableList.of(
		Tree.Kind.VARIABLE,
		Tree.Kind.ANNOTATION);
}

And:

@Override
public void visitNode(Tree tree) {
  if (tree instanceof VariableTree) {
    VariableTree variableTree = (VariableTree)tree;
    …
   } else if (tree instanceof AnnotationTree) {
    AnnotationTree annotationTree = (AnnotationTree)tree;
    …
   }
}

Cool. Now we need to look inside the variableTree and inside the annotationTree to obtain the information we are after. The latter is simpler so let us start with that one to get the hang of it. The annotationTree in our case is shown in Figure 1.

Figure 1. Abstract Syntax Tree representation of “@Order(1)”.

To verify we have an Annotation with name “Order” and then derive its value (1) we get, given the annotationTree:

IdentifierTree identifierTree = (IdentifierTree)
  annotationTree.annotationType();
if (identifierTree.name().equals("Order")) {
  Arguments arguments = annotationTree.arguments();
  LiteralTree literalTree = (LiteralTree)arguments.get(0);
  valueOfOrderAnnotation = literalTree.value();
}

Getting the values of the INPUT and OUTPUT variables is more complicated. Let us first show the tree representation of the INPUT variable declaration in Figure 2.

Figure 2. Abstract Syntax Tree representation of variable declaration “INPUT”.

To derive the name of the event and the revision value give the variableTree we now get:

if (variableTree.simpleName().name().equals("INPUT")) {
  inputInfo = getEventAndRevision(variableTree);
}

And:

private EventAndRevision getEventAndRevision(
     VariableTree variableTree) {
  NewClassTree newClass =
    (NewClassTree) variableTree.initializer();
  if (((IdentifierTree) newClass.identifier()).name()
        .equals("SimpleSerializedType")) {
    // get the name of the event from argument 0
    Arguments arguments = newClass.arguments();
    MethodInvocationTree zero = (MethodInvocationTree)
       arguments.get(0);
    MemberSelectExpressionTree expressionTree =
       (MemberSelectExpressionTree) zero.methodSelect();
    MemberSelectExpressionTree child = (MemberSelectExpressionTree) expressionTree.expression();
    IdentifierTree id = (IdentifierTree) child.expression();
    String eventName = id.name();

    // now get the revision value from argument 1
    LiteralTree value = (LiteralTree) arguments.get(1);
    String revision = value.value();
    return new EventAndRevision(eventName, revision);
  }
  return null;
}

To draw an early conclusion, that was quite a bit of coding. Figuring out how to traverse the variableTree is also cumbersome, as up front we do not have any idea about the tree shape and the types involved.

Statically deriving information from Event (Kotlin) files

Above we showed how to obtain information from a Java file, so here is its peer for handling Kotlin files. Recall we have an event like this:

@Revision("4")
data class OrganisationUpdatedEvent(val id: String, val
     name: String, …

And we need to obtain the value “4” from the Revision annotation. The tree representation is shown in Figure 3.

Figure 3. Abstract Syntax Tree representation of an annotated constructor.

The code to obtain the value of the revision annotation looks like this.

KtCompiler ktCompiler = new KtCompiler();
Path path = new File("src/test/files").toPath();
KtFile ktFile = ktCompiler.compile(path, path.resolve("events.kt"));

List<KtClass> constructors =
  PsiTreeUtil.getChildrenOfTypeAsList(ktFile, KtClass.class);
for (KtClass constructor : constructors) {
  if (constructor.getAnnotationEntries().size() > 0) {
    KtAnnotationEntry annotationEntry =
      constructor.getAnnotationEntries().get(0);
    String perhapsRevision = annotationEntry.getShortName().asString();
    if (perhapsRevision.equals("Revision")) {
      KtValueArgumentList revisionArgList =
        (KtValueArgumentList) annotationEntry.
          getChildren()[1];
      String revisionValue = revisionArgList.
        getArguments().get(0).getText();
…
    }
  }
}

Here we have used KtCompiler from Detekt. All other types such as KtClass, KtFile etc are from the “org.jetbrains.kotlin.psi” package. The PsiTreeUtil is from IntelliJ package “org.jetbrains.kotlin.com.intellij.psi.util”.

The careful reader may notice a slight mismatch between the tree from Figure 3 and the code above. The code uses some shortcuts by using “getText”; that is, the textual representation of a tree as it saves some tree walking.

Comparison and Conclusion

For our use case, verifying a coding standard for Axon upcasters, our first implementation used JUnit and Reflection. It felt a bit hacky, especially because obtaining information from constants is bound to break in the future when security managers will no longer allow changing modifiers of Java variables defined as “private static final”. Our second implementation based on SonarQube style parsing and tree analysis can in our opinion also not be qualified as “perfect”. It works, but as we have seen above it is quite tedious and calls for usage of the third party library Detekt.

Concluding, we prefer the JUnit/Reflection approach for our use case mainly because it uses already compiled code and thus does not rely on parsing which may fail. In general however, in cases where more complexity is involved we feel static code analysis is the only way to get the job done.