ProGuard & R8: Part 1

Tools such as R8 and ProGuard are available for free, but to many they are a black box that code goes into and stuff comes out of. My goal is to demystify the black box and teach developers how to properly diagnose the code going into and coming out of these build tools. This series will also explain why you should care and why applying these tools should be a priority before shipping any code into the wild.

This blog post is the first in a series to teach about the build tools ProGuard and R8, and will primarily be centered around understanding the build pipeline and applying protections to Android Applications and Code.

Understanding the Android Build Pipeline

The majority of software development blogs and tutorials teach developers how to build something: the languages, architectures, and patterns a developer should apply to get their product from idea to reality. But not much is written on how to protect your code once it is in your customers’ hands, or worse, your competitors’.

My goal with this series of posts is to get you semi-comfortable with reading Dalvik/Smali, and to help you understand the compilation stages and how they play into optimizing, shrinking, and most importantly, protecting and obfuscating your code. These posts will also help readers understand the stages that transform source code into bytecode.

In future iterations of this blog, I plan to dive into various ProGuard/R8 settings and show the resulting bytecode. I realized that very little has been written about these tools that literally touch every developer’s code, so I hope that through these posts, I can bring some attention to these powerful tools.

My goal is NOT to teach you how to hack, nor is it to show how to configure your proguard-rules.pro file to ensure you are secure (those will be later lessons).

This first post is an explanation of the build pipeline and where R8 and ProGuard fit into it. I think it is important to know the fundamental difference between these two tools and where each takes place in the pipeline, as that makes a big difference in the way they perform their functions.

But let’s take a step back… tools such as R8/ProGuard are available to apply optimizations and obfuscation to your code, but what exactly are these tools doing? Why should you care?

The first question I will get to, but the “why should you care?” question I want to address now.

Why should you care?

Once your code is in the wild, it is out there for the world to see. In the lessons ahead, I will quickly demonstrate how easy it is to take any .apk and crack it open using just the IDE to read and/or modify the contents of the code. And for even the most novice of adversaries, it is trivial to take that code, add some malicious content, repackage and re-sign the application, then advertise YOUR APP on a service somewhere for your users to mistakenly download via a phishing campaign or some other means. (I will either cover this in a future lesson or refer you, readers, to a different blog that covers this concept).

And let’s just quickly get this out of the way… THIS DOES NOT REQUIRE A ROOTED DEVICE.

It is one thing if you provide some sort of Open Source library where you do not require your code to be obfuscated. In that instance, the individuals consuming your library can handle the obfuscation/optimizations. But for anyone else, ensuring that your app has proper protections and multiple layers of security is YOUR RESPONSIBILITY to YOUR CUSTOMERS, and that all starts with the code that you are pushing out into the world.

A line I always use is that applying this first layer of obfuscation moves your fruit higher up on the tree of low-hanging fruit, and may just be enough to deter malicious intent.

To read more about this threat that has existed since the dawn of Android, look here and here. Also, this growing trend of Mobile Malware and Fake Apps is suspected to be one of the TOP threats in 2020. You can read about that in the previous link from McAfee, as well as these two links here and here.

All of this literature is not here to scare you, but instead to show you that you need to start implementing protections into your application NOW.

But ProGuard/R8 also have the added bonuses of speeding up your application and shrinking the resulting APK size… so why not add it?

Now that I have your attention…

Fear is a good motivator, right? But don’t worry, in this series of blog posts, I aim to help you to at least get started down the path of better securing your app, and it all starts with ensuring you get these basic protections in place first.

Concepts and Definitions

What’s in an APK?

In a nutshell, an Android Application Package, or an APK, is a container file that includes application code, resources, and the Android Manifest file.

What we will primarily be focusing on during these talks is the classes.dex file (or files), which are numbered depending on how large your application is or how you have your code organized.

That’s great and all…. but why `.dex`?

Well, that’s a great question!

.dex stands for Dalvik Executable. I will explain what the Dalvik part of that is here in a second.

We all know that native Android applications are built using Java or Kotlin (which compiles to the same Java bytecode under the hood). But if you have any experience developing plain old Java applications, you know that a .jar is just a zip-style container that holds the compiled Java code. Java bytecode is held in .class files. If you essentially “unzip” any .jar, you will see these files. We will experiment with doing this in the lessons ahead so you have familiarity with this concept.
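To make the "a .jar is a container of .class files" idea concrete, here is a minimal sketch that lists the .class entries of a .jar using only the JDK's zip classes. The class name JarClassLister and the path you pass in are placeholders of mine, not something from a real project.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: a .jar is a zip archive, so we can walk its entries directly.
class JarClassLister {

    // Returns the names of all entries that end in ".class".
    static List<String> listClassEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassLister <path-to-jar>");
            return;
        }
        for (String name : listClassEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run it against any .jar on your machine and you should see one line per compiled class, with the package structure showing up as folder paths.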

Above is a quick diagram of how this is accomplished when writing plain old Java/Kotlin Applications.

What this means is that each class you write in a .java or .kt file ends up as a corresponding .class file. When the Java Virtual Machine (JVM) launches a .jar, it essentially just looks up the class that contains the main() method (named by the Main-Class entry in the .jar’s manifest) and off it goes. Unless obfuscation is applied, this makes unpacking and looking through .jar applications relatively easy. So why doesn’t Android have a similar construct and runtime environment?

Because Android doesn’t use the JVM to run applications. Instead, from the beginning Android has used its own VM. At first this was the Dalvik VM, but these days it is the Android Runtime (ART).

In the early days of Android, devices did not have the processing power of your typical stand-alone desktop/laptop, so this custom VM was required in order to run the applications… especially since the original intent for Android was for it to be an operating system for cameras.

So instead, Android (or Gradle, I should say) first compiles all of your .java/.kt files into their respective .class files, and then runs them through a process known as “DEXing”. The original tool that handled this process was known as DX. So Android’s build pipeline at a high level looked something like this.

Ok… but why .dex!?

I hear you, I hear you…

What exactly is a .dex file? Essentially, the DEXer makes everything super simple for the Dalvik/ART VM running the code. It doesn’t care about your fancy architectures, your design patterns, your dependency injection framework, etc… it throws all of that out the window and tries to combine everything into as few files as possible (ideally one), which reduces the amount of loading and jumping around that the VM has to do.

So you might be asking yourself:

“If we’re all about streamlining for Android’s VM, then why are there still multiple classes.dex files?”

Well, that leads to another good point.

The primary reason for having multiple .dex files is the limit on the number of methods a single .dex file can reference. The limit is 65,536 methods (64 x 1024). These methods include everything you reference from the Android Framework and external libraries, plus the methods that you declare in your own code. Between the Android Framework and external libraries alone, a large share of that limit can already be taken up.

So because of this, you will need to allow for multiple .dex files. With a min API level of 21 or higher, multidex is enabled by default; below that, you have to opt in yourself… hence multiDexEnabled true :)
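If you want to see how close a given classes.dex is to that limit, the count of method references is stored in the file's header. The sketch below reads it directly. It assumes a standard little-endian .dex file with the usual 0x70-byte header, where the method_ids_size field sits at offset 0x58. The class name DexMethodCount and its helper methods are my own, and minDexFiles is only a rough lower bound, since real build tools split classes across files in their own way.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper that peeks at the header of a classes.dex file.
class DexMethodCount {
    static final int DEX_HEADER_SIZE = 0x70;        // size of the standard dex header
    static final int METHOD_IDS_SIZE_OFFSET = 0x58; // header field: number of method references
    static final int MAX_METHOD_REFS = 65_536;      // 64 x 1024

    // Reads the number of method references declared in the dex header.
    static int methodIdsSize(byte[] dexBytes) {
        if (dexBytes.length < DEX_HEADER_SIZE) {
            throw new IllegalArgumentException("too short to be a dex file");
        }
        if (dexBytes[0] != 'd' || dexBytes[1] != 'e' || dexBytes[2] != 'x' || dexBytes[3] != '\n') {
            throw new IllegalArgumentException("missing dex magic");
        }
        // Assumption: standard little-endian dex file.
        return ByteBuffer.wrap(dexBytes)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(METHOD_IDS_SIZE_OFFSET);
    }

    // How many more method references would fit into this dex file.
    static int remainingMethodRefs(byte[] dexBytes) {
        return MAX_METHOD_REFS - methodIdsSize(dexBytes);
    }

    // Rough lower bound on the number of dex files needed for a total reference count.
    static int minDexFiles(int totalMethodRefs) {
        return (totalMethodRefs + MAX_METHOD_REFS - 1) / MAX_METHOD_REFS;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DexMethodCount <path-to-classes.dex>");
            return;
        }
        byte[] dex = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("method references: " + methodIdsSize(dex));
        System.out.println("remaining before the limit: " + remainingMethodRefs(dex));
    }
}
```

To try it, unzip an .apk, then point the program at one of the classes.dex files inside.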

For large projects, you will see double-digit numbers of classes.dex files… leading to .apk’s that are many MB in size, and potentially sluggish performance. So because of this, Android quickly realized they needed a way to optimize and shrink this code… but HOW?!

Introducing ProGuard

As you can see above, there was a part of the build process that I intentionally skipped over, and that’s the Java bytecode transformers stage. This stage is where developers/third parties can plug in transformers that process the bytecode after the compilers finish processing your code. These are things like:

Annotation Processors
Jacoco Code Coverage
And yes, ProGuard…

So after the compiler spits out the .class files, those same files are read in again by the transformer in this stage, and the transformer manipulates that bytecode prior to it being handed off to the DEXer and continuing on down the pipeline.

So what exactly does ProGuard do?

ProGuard has three primary phases:

Shrinking
Optimizing
Obfuscating

Shrinking

Shrinking is commonly referred to by Guardsquare as “tree shaking”. Essentially, the tool looks for dead code paths and removes that code. This can dramatically decrease the size of the resulting APK. If you’re only using one small function of a third-party library, guess what… the rest can shrink away. The Android build tools can do the same for unused resources as well (shrinkResources in your Gradle config).

As discussed above, this can also help you to decrease the number of classes.dex files present in your application, since you will be removing methods that are not used.
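One way to picture tree shaking is as a reachability walk over a call graph: start from the entry points, follow every call, and whatever you never visit is dead code. The sketch below is a toy model of that idea under my own assumptions, not how ProGuard is actually implemented. The method names in the usage note are made up.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of shrinking: methods are strings, the call graph maps caller -> callees.
class TreeShakeSketch {

    // Walks the call graph from the entry points and collects everything reachable.
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 Collection<String> entryPoints) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (!seen.add(method)) {
                continue; // already visited
            }
            for (String callee : callGraph.getOrDefault(method, List.of())) {
                work.push(callee);
            }
        }
        return seen;
    }

    // Everything mentioned in the graph that is never reached can be removed.
    static Set<String> removable(Map<String, List<String>> callGraph,
                                 Collection<String> entryPoints) {
        Set<String> all = new TreeSet<>(callGraph.keySet());
        callGraph.values().forEach(all::addAll);
        all.removeAll(reachable(callGraph, entryPoints));
        return all;
    }
}
```

For example, if a hypothetical MainActivity.onCreate calls Repo.load, which calls Json.parse, while Json.prettyPrint and Json.indent are never called from anywhere reachable, then removable(...) returns just those last two methods.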

Optimizing

The optimizer changes access modifiers (private, static, final) to make it easier to access certain items at run time, and it can remove unused parameters. If the optimizer can trace a code path and see that a method always returns a certain value, it can inline that method and remove the method itself, further reducing code size and streamlining execution.
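Here is a hand-written before/after illustration of that idea. It is not actual ProGuard output, and all of the names are invented for the example. The Before class has a helper that always returns the same value plus a parameter nobody uses; the After class shows roughly what is left once the constant is inlined, the dead branch is dropped, and the unused parameter is removed.

```java
// Hand-written illustration only; real optimizer output will look different.
class InlineSketch {

    static class Before {
        // Always returns the same value, so callers can use the constant directly.
        private static boolean isDebugBuild() {
            return false;
        }

        // 'unusedFlags' is never read inside the method.
        static String greet(String name, int unusedFlags) {
            if (isDebugBuild()) {
                return "[debug] Hello, " + name;
            }
            return "Hello, " + name;
        }
    }

    static class After {
        // isDebugBuild() inlined to 'false', dead branch removed, unused parameter dropped.
        static String greet(String name) {
            return "Hello, " + name;
        }
    }
}
```

Both versions return the same string for the same name, which is the whole point: the behavior stays the same while the code gets smaller.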

Obfuscating

For obfuscation, ProGuard renames classes and class members that are not entry points to the code. If entry points, or seeds, are obfuscated then your application will crash. This is where the various -keep rules come into play, but more on those later.

Below is an example of a mapping.txt file, which showcases what obfuscation is doing under the hood.

You can see that it is renaming class packages and the methods inside to something like a or a.a.a.a.b.

Renaming serves two purposes. It shrinks the size of the application by renaming your super descriptive function names from

// Original, descriptive source (Fragment return type assumed for illustration)
fun launchFragmentAfterUserDoesSomething(userDidSomething: Int): Fragment {
    return SuperCoolFragmentFromTouchEvent()
}

to something like

a(b){return c}

From an attacker’s perspective, it also raises the bar to entry for determining exactly what is happening within the application. It’s easier to find LoginActivity and home in on that functionality than it is to figure out what a is doing.
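To make the renaming a bit more tangible, here is a minimal sketch that reads the class lines of a mapping.txt and builds a lookup from obfuscated name back to original name. It assumes the usual layout where a class line looks like "original.Name -> obfuscated.name:", member lines are indented, and comment lines start with #. The class MappingSketch and the sample names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: only handles class lines, member lines are skipped.
class MappingSketch {

    // Returns a map of obfuscated class name -> original class name.
    static Map<String, String> obfuscatedToOriginal(String mappingText) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : mappingText.split("\\R")) {
            // Skip blanks, comments, and indented member lines.
            if (line.isEmpty() || line.startsWith("#")
                    || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            int arrow = line.indexOf(" -> ");
            if (arrow < 0 || !line.endsWith(":")) {
                continue; // not a class line in the assumed format
            }
            String original = line.substring(0, arrow);
            String obfuscated = line.substring(arrow + 4, line.length() - 1);
            result.put(obfuscated, original);
        }
        return result;
    }
}
```

Given a line such as "com.example.LoginActivity -> a.a.a.b:", the map lets you look up a.a.a.b and get com.example.LoginActivity back. That is handy for you as the developer, and it is also why you never want to ship mapping.txt alongside your app.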

But note… obfuscation by itself is not super difficult to defeat. As mentioned before, application entry points have to be maintained for the runtime VM to be able to determine your application’s starting point, so attackers can still trace the application from that point forward. It’s as simple as cracking open an APK, taking a peek into the AndroidManifest.xml and looking for the activity associated with the Launcher <intent-filter>.

<activity
    android:name=".HabitTrackerActivity"
    android:label="Habit Tracker">
    <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
        <category android:name="android.intent.category.LAUNCHER"/>
    </intent-filter>
</activity>

Obfuscation does raise the bar to entry for casual attackers, but it will not stop persistent ones. I will go into this more in future posts.
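The manifest lookup described above can be sketched in a few lines. This assumes you already have the manifest as plain text XML (inside an .apk it is stored in a binary form, so a tool such as the IDE's APK viewer has to decode it first). The class name LauncherFinder is mine, and the sketch only checks for the LAUNCHER category on <activity> elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: finds the activity that carries the LAUNCHER category.
class LauncherFinder {
    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";
    static final String LAUNCHER = "android.intent.category.LAUNCHER";

    // Returns the android:name of the launcher activity, or null if none is found.
    static String findLauncherActivity(String manifestXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to resolve the android: prefix
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
            NodeList activities = doc.getElementsByTagName("activity");
            for (int i = 0; i < activities.getLength(); i++) {
                Element activity = (Element) activities.item(i);
                NodeList categories = activity.getElementsByTagName("category");
                for (int j = 0; j < categories.getLength(); j++) {
                    Element category = (Element) categories.item(j);
                    if (LAUNCHER.equals(category.getAttributeNS(ANDROID_NS, "name"))) {
                        return activity.getAttributeNS(ANDROID_NS, "name");
                    }
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse manifest", e);
        }
    }
}
```

Fed a manifest containing the activity shown earlier, this returns .HabitTrackerActivity, which is exactly the thread an attacker would start pulling on.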

The Result

All of these functions help to provide a much smaller, more optimized, faster application that will then be run by the Dalvik/ART VM. Once on the device, that same .dex file has to be converted to machine language, which is done using Just-In-Time (JIT) and/or Ahead-of-Time (AOT) compilation… but more on those later.

Recap

So that was a decent amount of information to digest. If you were unfamiliar with any of these topics, I’m sure your head is spinning with all of this newfound knowledge. Let’s quickly recap what we covered:

Android Build Pipeline
Why Android has a different build pipeline from typical Java/Kotlin applications
What is in an APK
Why Android uses .dex files
How .dex files are created
Why tools like ProGuard exist
What ProGuard is doing (high level)

To Be Continued…

In the next post, I will reinforce what I discussed in this blog with hands-on exercises, and following that I will show how D8 and R8 fit into this pipeline.
