Using Gradle Enterprise to Detect Configuration Regressions In An Android Gradle Build

Nate Ebel
Engineering at Premise
12 min read · Jun 9, 2021


By Nate Ebel, Android Developer

Photo by iMattSmart on Unsplash

The Mobile Engineering team here at Premise has made a concerted effort to improve developer efficiency over the past year.

One of the biggest areas of tech investment has been understanding and reducing the Gradle build times of our Android project. We worked with engineers from Gradle to help us optimize and monitor our build using tools such as Gradle Enterprise and gradle-profiler. The results from this work have been terrific: local and CI build times are now 50–80% lower than when we first started.

Now that we have our build under control, the story has changed a bit. Instead of a laser-focus on speeding up our builds, we now look towards adopting new build optimizations and towards maintaining existing build performance.

How can we monitor build trends over time? How do we debug local build issues for a team spread across multiple continents? What can we do to help identify and fix regressions?

These are the questions we now find ourselves considering more often.

In this post, I want to walk you through a bit of a case study into this build monitoring process. We’re going to explore how we used Gradle Enterprise and other tools to:

  • Discover a build regression
  • Identify the root cause of that regression
  • Fix the issue
  • Improve monitoring so the issue is easier to deal with in the future

If you want to jump to the final summary analysis, you can skip to the end.

If you want to walk through the full case study, then hopefully as you read, you’ll gain some insight into how you might identify Gradle build issues in your own projects and how you can leverage tools such as Gradle buildscans to diagnose and fix those issues.

Discovering A Build Regression

The issue we uncovered was this:

  • Over the course of a few days, our PR builds in GitHub Actions saw a 3x increase in Gradle configuration time
  • This resulted in a ~20% increase in total PR build times in GitHub Actions
  • Local build times didn’t seem to be impacted

How Did We Detect The Increase In Gradle Configuration Time?

We detected this issue by reviewing build trend data in our Gradle Enterprise dashboard. If you aren’t familiar, Gradle Enterprise is a tool to help collect and analyze Gradle build data for your team or organization.

A sample of Gradle Enterprise buildscan trends for PR builds over the past 90 days

On reviewing our median PR build times over the past several weeks, we noticed a significant increase had occurred and persisted over several days. Without our Gradle Enterprise server collecting data for all Android builds, we likely would not have discovered this issue on our own.

Digging deeper into performance data for individual builds showed a significant increase in configuration time for one of our Gradle modules. This happened to be the one module in our app with native C++ code.

This was our first clue that the issue might somehow be related to the NDK.

Identifying The Root Cause Of The Regression

With several days of increased build times for our :nativecode module, how did we then identify the root cause?

We started running through our list of Gradle debugging tools.

Trying to reproduce locally

First, we ran ./gradlew help locally to try and replicate the problem.

Running the help task runs just the configuration phase of a Gradle build, so it was the first place for us to try to reproduce the issue locally.
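If you want to try the same check on your own project, the standard Gradle flags for capturing timing data look like this:

./gradlew help --scan       # runs configuration (plus the trivial help task) and publishes a buildscan
./gradlew help --profile    # or writes a local HTML profile report under build/reports/profile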

Unfortunately, this did not help as we were unable to replicate the build regression during a local build. Great for individual developer productivity, but not helpful in identifying our mystery issue.

Identifying the causal commit

Our next step was to pinpoint when the build regression started and to identify any commits from around that time that seemed likely to impact build performance.

Again, here we turned to Gradle Enterprise. We were able to view buildscan data over the previous two weeks and identify the approximate date when the build times started going up.

A sample buildscan within Gradle Enterprise

Once we had a timeframe in mind, we turned to our git history in GitHub.

On investigating the changes from that period, one change stood out as the likely culprit: the update of AGP to version 4.1 and Gradle to version 6.7.

Benchmarking build performance around possible problem commit

To determine if the build regression was due to our bumping of the AGP/Gradle versions, we turned to gradle-profiler to locally benchmark build performance before and after the version bump commit.

Sample gradle-profiler output from benchmarking the `help` task

If you aren’t familiar, gradle-profiler automates the running of Gradle tasks and the collection of build metrics to provide aggregate performance data and comparisons. By default, gradle-profiler will run a target task 10 times and summarize that build data for you to compare against other builds.

We identified several commits around the suspected problem commit and ran our benchmarks against each of them.
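Concretely, each measurement looked roughly like this (the commit references are placeholders):

git checkout <commit before the AGP/Gradle bump>
gradle-profiler --benchmark --project-dir . help

git checkout <commit with the AGP/Gradle bump>
gradle-profiler --benchmark --project-dir . help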

Given that we couldn’t reproduce the issue locally to begin with, it’s not surprising that the results of our benchmarking were inconclusive. We weren’t seeing any statistically significant difference in build times across any of the commits we tested.

Then, we realized we may be using gradle-profiler incorrectly.

We were measuring total build time, but our issue was specifically at configuration time. Maybe if we isolated configuration time, our benchmark results would turn up something more interesting?

As it turns out, gradle-profiler can be configured to include configuration time in its output.
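At the time of writing, this is exposed as a command-line flag (check the gradle-profiler README for your version, since flags can change between releases):

gradle-profiler --benchmark --measure-config-time --project-dir . help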

After re-running our benchmarks with configuration time reported, sadly, we still saw nothing conclusive.

At this time, we considered admitting defeat and calling in reinforcements from the developer community. We decided to sleep on it, and give it one more try later that week.

Reviewing buildscan output

After taking a break, we decided to take a step back from the benchmarking and deep dives into commit history. Instead, we opened up one of the impacted buildscans and gave it a thorough examination.

We noticed something important upon review of the console output for that build. The output logs mentioned the download and installation of the NDK.

Example buildscan highlighting the automatic download of the NDK by Android Gradle Plugin

This struck us as odd.

Why are we downloading the NDK if our pinned version matches what is pre-installed on our virtual machine?

Examining the pre-installed tooling of the GitHub Actions virtual environment

We needed to double check the pre-installed version(s) of the NDK available in the GitHub Actions virtual environments.

The list of current GitHub Actions virtual environments is available on GitHub. We found the listing for ubuntu-latest, which at the time mapped to an ubuntu-18.* image.

If you open the README for a specific virtual environment, you can find the pre-installed tooling descriptions. And in the case of our investigation, we searched for NDK to determine which NDK versions were available to us.

We compared the downloaded NDK version to the version pinned in our application module, and the two didn’t match.

Why, and how, could this be?

Reviewing AGP changes

We checked the AGP release notes — specifically for NDK changes. We found that, sure enough, our updated version of AGP had a new default NDK version — a version not pre-installed on the GitHub Actions virtual environment.

Forming Our Hypothesis

We now knew what was happening, but not why.

The Android Gradle Plugin was downloading and installing a version of the NDK during the configuration phase of our build. The install was required because that version of the NDK was not available on the GitHub Actions virtual machine.

But why was AGP downloading a version different than what we had pinned in our application module?

We formed the following hypothesis:

Our desired NDK version was not set for all Android modules in the project — therefore leaving AGP to download whatever versions it needed if it didn’t detect a pinned version.
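To make that concrete, the situation we suspected looked something like this (an illustrative sketch, not our actual build files):

// app/build.gradle: the NDK version is pinned here
android {
    ndkVersion "<our pinned version>"
}

// nativecode/build.gradle: no ndkVersion is set, so AGP falls back to its own default
android {
    // ...native build configuration, but no ndkVersion
}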

How Could We Reproduce And Test The Issue Locally?

To validate this hypothesis, we tried again to reproduce the issue locally.

We checked the installed NDK versions on a local machine and found that the new default NDK had, in fact, been downloaded to the machine.

This would explain why local configuration times didn’t seem to be impacted. The NDK had already been installed, either by a dev or by AGP during a previous build.

We then deleted that version of the NDK from the local test machine, and ran ./gradlew help to replicate the issue.

This time, we saw that the task kicked off a download of the new NDK version.

Successful repro!!!
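For reference, the repro steps looked roughly like this (the SDK location and version are placeholders; side-by-side NDK installs live under the ndk/ directory of the Android SDK):

ls $ANDROID_SDK_ROOT/ndk/                            # list the NDK versions currently installed
rm -rf $ANDROID_SDK_ROOT/ndk/<AGP default version>   # remove the new default NDK
./gradlew help                                       # configuration now re-triggers the NDK download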

We then tested a possible fix. We again removed the new NDK version from the local machine. Then, we pinned our desired NDK version in each project module. Once again we ran the help task, and this time the project configured without downloading or installing any other version of the NDK.

Fixing The Issue For Impacted CI Builds

To fix this issue for good, and keep our CI builds from facing increased build times, we needed to ensure we were pinning the same NDK version for each of our modules.

To accomplish this, we created a precompiled script plugin, android-library.gradle, to consolidate any configuration that should be commonly applied to our Android library modules.

This included a single pinning of the NDK version, which was then reused across modules to ensure consistency.

android {
    ndkVersion "<our desired NDK version>"
}

We then applied this android-library.gradle file across our build.gradle files.
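A simplified sketch of what that looks like as a Groovy precompiled script plugin (the paths and shared configuration shown are illustrative; buildSrc also needs the groovy-gradle-plugin applied and AGP on its classpath for this to work):

// buildSrc/src/main/groovy/android-library.gradle
plugins {
    id 'com.android.library'
}

android {
    ndkVersion "<our desired NDK version>"
    // ...any other configuration shared by our Android library modules
}

// a library module's build.gradle then only needs to apply the plugin by name
plugins {
    id 'android-library'
}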

Validating The Fix

Once we consolidated our buildscript logic, we pushed a branch with our changes and generated several Gradle buildscans for the new branch.

We examined the buildscans in Gradle Enterprise and verified that configuration time once again looked normal and that the output logs no longer included an NDK installation.

Summary Analysis

The Android Gradle Plugin (AGP) has a default version of the Native Development Kit (NDK). This is the version of the NDK that will be used if your project requires the NDK but doesn’t specify a particular version.

If you don’t specifically pin an NDK version for each module in your app, AGP will automatically download the default NDK if it doesn’t exist on your local machine.

The virtual environments provided by GitHub Actions have pre-installed versions of the NDK available. And for a while, the pre-installed versions matched AGP’s default NDK version for our builds.

However, after updating our Android Gradle Plugin version, the default NDK version was no longer present on the GitHub Actions virtual machine and had to be downloaded by AGP on each run.

Because our CI builds were not caching installed NDK versions (due to the large download size), every CI build had to download this new NDK from scratch.

This download and installation happens during the configuration phase of the Gradle build for an Android project, which is what was leading to the 3x increase in configuration time for our CI builds.

To avoid this, ensure that you’re explicitly pinning a consistent NDK version across all modules in your app so a future AGP update doesn’t silently start downloading a new default version.

How Were We Able To Improve For The Future?

At this point, there was some minor celebration and we took pleasure in our newfound understanding of how Android Gradle Plugin works.

But very quickly, we started asking ourselves how we could identify and fix this kind of issue more efficiently in the future. In trying to answer that question, we identified a number of ideas:

Ensure the NDK version is pinned properly across modules

This is the approach we implemented and helps us ensure that we’re only dealing with a single NDK version.

Clean up installed NDK versions

By keeping multiple NDK versions around on local dev machines, we run the risk of not catching these types of issues during local development in the future. We took this as an opportunity to clean up unused versions.
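On machines with the Android command-line tools installed, something like the following helps with that audit (package names are examples):

sdkmanager --list                              # lists installed packages, including any ndk;<version> entries
sdkmanager --uninstall "ndk;<unused version>"  # removes an NDK version no longer referenced by the build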

Pay closer attention to the default NDK version when updating AGP

Now that we know that Android Gradle Plugin will download and install missing versions of the NDK, we pay closer attention to the versions of the NDK that are installed on our GitHub Actions virtual environments.

If either the environment changes, or AGP’s default NDK version changes, we run the risk of facing the same configuration time regression.

Benchmarking before merging

This experience led to a greater understanding of the gradle-profiler tool. We now have this as a tool at our disposal when we think a change might impact build times.

Before merging in major changes to our build, we can benchmark build performance and better understand if our build will be significantly impacted in any way.

Automated configuration checks

Because the impact to configuration time is so dramatic when a new version of the NDK is downloaded, it’s pretty easy to spot the change if you are looking for it.

With this in mind, we created a workflow in GitHub Actions to help monitor and identify this issue in the future.

On a regular schedule, we run a job that simply runs Gradle’s help task.

GitHub Action workflow runs to test the configuration of our project

This does several useful things for us:

  • Generates regular buildscans for the help task — making trends easier to analyze in Gradle Enterprise
  • Generates CI builds that have very consistent runtime characteristics. Because this job always completes in ~1 min, it’s very obvious that something is wrong when the job takes ~3–4 min.

We’ve found that simply reviewing the job duration in GitHub has been enough to identify this regression — without even needing to review data in Gradle Enterprise.
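A minimal version of that scheduled workflow might look like the following (the schedule, action versions, and JDK setup are illustrative rather than our exact configuration):

# .github/workflows/configuration-check.yml
name: configuration-check
on:
  schedule:
    - cron: '0 6 * * *'   # once a day
jobs:
  gradle-help:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: 'zulu'
          java-version: '11'
      - run: ./gradlew help --scan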

Suggesting improvements to Gradle Enterprise

Gradle Enterprise already breaks build trends into several categories like “serial execution” and “cache avoidance”. We recognized that if Gradle Enterprise surfaced “configuration time” in this same way, it might make detecting these kinds of issues even easier.

We suggested the idea to Gradle engineers as a feature request for Gradle Enterprise.

Takeaways for your own projects

Hopefully you’ve found this to be an interesting and practical exploration of how to debug an Android Gradle build regression.

I just want to highlight a few key things for you that may be most helpful for your own builds:

  • Gradle Enterprise is a terrific tool for monitoring, debugging, and optimizing Gradle builds. It collects buildscan data from across your team or organization and helps identify issues and trends that may otherwise go unnoticed.
  • Gradle buildscans are an invaluable tool for learning about a specific build, and for diagnosing issues in that build. We rely on them heavily for maintaining/improving our builds. If you don’t use Gradle Enterprise, you can still capture a buildscan by adding --scan to the end of your Gradle wrapper invocation.
  • gradle-profiler can be a very helpful tool for comparing the build impact of any project changes. You can test simple scenarios like our help task. Or, you can simulate complex scenarios like incremental builds across a multi-module project.
  • Be aware that the Android Gradle Plugin will download missing NDK versions in the background if the needed version is not present on the building machine. The NDK is multiple GB in size, so downloading it on every CI build is likely a cost you’d rather not pay if you can avoid it.
  • It’s a very good idea to use the same NDK version across all project modules — and to be explicit in setting that version across modules. Consolidating common build logic into a precompiled script plugin is one very viable approach for ensuring Android Gradle Plugin is configured properly across all your modules.

Premise is constantly looking to hire top-tier engineering talent to help us improve our front-end, mobile offerings and cloud infrastructure. Come find out why we’re consistently named among the top places to work in Silicon Valley by visiting our Careers page.
