Java Under the Hood — Part 1: Java Virtual Machine


The expression “under the hood” is an allusion to an automobile. The “hood” is the hatch covering the engine compartment, where one finds the engine and the internal parts that make up most of the vehicle's functional electrical and mechanical components.

Most people never look at their car's engine, which lives under the hood. As senior Java developers, however, we need to know what is happening under the hood. In our case, the hood covers Java rather than a car. In this article, we’ll peek under the hood to understand the components that make up the Java Virtual Machine (JVM). Think of the JVM as the engine that drives our Java programs from source code to execution.

When we start learning Java, we don't need to understand the JVM to be a Java developer. However, understanding its internals can really improve our coding ability, which can lead to better software.

If you want to reason about performance, on the other hand, you definitely need a deep understanding of the JVM's internals. To measure and argue about performance, you must understand the complex ecosystem that exists behind the scenes.

This post offers a brief theoretical introduction to the Java language and an overview of the Java Virtual Machine. It describes the JVM's main aspects and the whole process that precedes bytecode execution.

Java in a Nutshell

Java is a general-purpose programming language first released in 1995. The main goals that drove its creation strongly influenced its design and evolution. These goals were:

  • To provide a container for simple execution of object-oriented application code.

  • To remove tedious bookkeeping from the hands of developers and make the platform responsible for accounting for memory.

  • To remove C/C++ platform security vulnerabilities wherever possible.

  • To allow cross-platform execution.

Java is a blue collar language. It’s not PhD thesis material but a language for a job. (James Gosling)

These goals were pursued even at the expense of low-level developer control and raw performance.

Java was not designed to be a high-performance language. Instead, it was meant to provide a consistent and simple programming model. Today, high performance is possible precisely because of that consistency. A predictable programming model allows the JVM's internal components to perform optimizations automatically, which is difficult or impossible in many other languages.

The portability goal, allowing cross-platform execution, became known as Write Once, Run Anywhere (WORA). It expresses the idea that unaltered Java classes can be executed on multiple distinct platforms. It relies on a Java Virtual Machine (JVM) being available on each of those platforms.

Running Java code on a platform is therefore just a matter of implementing, for that platform, a JVM that complies with the virtual machine specification (VMSpec).
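To see the "run anywhere" side in action, here is a small sketch (not from the original article). It prints a few standard system properties that identify the specification, the JVM implementation and the platform. The class name WhichJvm is only an illustration. Compile it once and run the same .class file on two operating systems: the implementation and platform values should differ even though the bytecode is identical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WhichJvm {
    // Standard system properties provided by every Java runtime; the values
    // depend on the JVM implementation and the platform running this class.
    static final String[] KEYS = {
        "java.vm.specification.name",
        "java.vm.specification.version",
        "java.vm.name",
        "java.vm.vendor",
        "os.name",
        "os.arch"
    };

    // Collects the properties in a stable, insertion-ordered map.
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        for (String key : KEYS) {
            info.put(key, System.getProperty(key));
        }
        return info;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```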

In the early days, the designs of the Java language and the JVM influenced each other. For example, JVM bytecode is typed, and its types are essentially those of the Java language. Nowadays, Java has moved away from this tight coupling. It is better described as the “first high-level language to run on the JVM”.
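The typed nature of bytecode is easy to observe. In the sketch below, javac compiles the same + operator into a different type-specific instruction for int, long and double. The class and method names are illustrative. The comments show the relevant part of what javap -c reports for these static methods.

```java
public class TypedBytecode {
    // The same "+" operator compiles to a different, type-specific instruction.
    // Output of `javac TypedBytecode.java && javap -c TypedBytecode` (abridged):
    //   addInts:    iload_0, iload_1, iadd, ireturn
    //   addLongs:   lload_0, lload_2, ladd, lreturn   (a long occupies two local slots)
    //   addDoubles: dload_0, dload_2, dadd, dreturn   (so does a double)
    static int addInts(int a, int b) { return a + b; }
    static long addLongs(long a, long b) { return a + b; }
    static double addDoubles(double a, double b) { return a + b; }

    public static void main(String[] args) {
        System.out.println(addInts(2, 3));        // 5
        System.out.println(addLongs(2L, 3L));     // 5
        System.out.println(addDoubles(2.5, 0.5)); // 3.0
    }
}
```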

Many factors contributed to Java’s popularity and to the growth of the ecosystem around the language and the components that support it, the JVM among them.

The Java Virtual Machine

Because Java is, by design, a high-level programming language, we as developers give up low-level control and simply don’t need to care about the details. Of course, the low-level work still has to be done, so we must delegate it to something else.

The Java Virtual Machine is an abstract computing machine, composed of multiple managed subsystems. It is responsible for Java's independence from hardware and operating systems. The JVM subsystems are active at runtime, and their interplay produces complex effects that can make the JVM a surprisingly unpredictable environment.

The JVM is composed of four key aspects:

  • The interpreter

  • The classloading mechanism

  • The JIT compiler

  • The garbage collector

The image below represents Oracle's HotSpot Java Virtual Machine and its aspects during execution. It shows the following components: Java source code, javac, the ".class" file, the classloader, the method cache, the emitter, the profiler, the code cache and the interpreter. It also shows the compilation process that takes place prior to execution. We will analyze each step of the process in detail.

The Compiler

One of the most important and well-known characteristics of Java is that it is a compiled language: source code written in Java must be processed by the Java compiler before it can be executed. This is required because the JVM knows nothing about the Java language itself. It understands only a particular binary format: the .class file.
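That binary format is easy to recognize. Every .class file starts with the same four-byte magic number, 0xCAFEBABE, followed by the minor and major versions of the class file format. The sketch below reads that header from java/lang/Object.class through the system class loader. The class name ClassFileHeader is illustrative, and the exact major version printed depends on the JDK you run it with.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassFileHeader {
    // Returns {magic, minor, major} read from the start of a class file resource.
    static int[] readHeader(String resource) throws IOException {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("Class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // always 0xCAFEBABE
            int minor = in.readUnsignedShort(); // minor version of the class file format
            int major = in.readUnsignedShort(); // major version, e.g. 52 = Java 8, 61 = Java 17
            return new int[] { magic, minor, major };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] header = readHeader("java/lang/Object.class");
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```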

javac (short for "Java compiler") is an external tool. It transforms the source code in .java files into .class files containing Java Virtual Machine instructions. These instructions are called bytecode.
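javac is also available programmatically through the standard javax.tools API. The following minimal sketch assumes it runs on a JDK, since a plain JRE has no system compiler, and on Java 11 or later, for Files.writeString. It does three things:

- writes a hypothetical Hello.java into a temporary directory;
- compiles it in-process;
- checks that Hello.class was produced.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileWithJavac {
    // Writes a tiny (hypothetical) Hello.java into dir, compiles it with the
    // in-process javac, and returns the path of the generated class file.
    static Path compile(Path dir) throws IOException {
        Path source = dir.resolve("Hello.java");
        Files.writeString(source,
                "public class Hello { public static void main(String[] a) { System.out.println(\"hi\"); } }");

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler: run this on a JDK, not a JRE");
        }
        // Same arguments as on the command line: javac -d <dir> Hello.java
        int exitCode = javac.run(null, null, null, "-d", dir.toString(), source.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("javac failed with exit code " + exitCode);
        }
        return dir.resolve("Hello.class");
    }

    public static void main(String[] args) throws IOException {
        Path classFile = compile(Files.createTempDirectory("javac-demo"));
        System.out.println(classFile.getFileName() + " exists: " + Files.exists(classFile));
    }
}
```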

Other languages, such as Scala and Kotlin, can use the JVM as a delivery vehicle simply because their programs can be represented as valid .class files. Each language just needs its own compiler that translates its language-specific source code into bytecode.

If the bytecode is valid, the JVM can execute it. The image below shows how source code written in different languages can be compiled to ".class" files by the appropriate compiler.

The Interpreter

The JVM is a stack-based interpreted machine.

Instead of using registers, it pushes operands onto a stack. It performs calculations by taking the top value (or values) off that stack, in "last in, first out" (LIFO) order.
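To make the stack model concrete, here is a sketch of a static method alongside the bytecode javac emits for it, as shown by javap -c. The comments also track the operand stack after each instruction. The class name StackBytecodeDemo is illustrative.

```java
public class StackBytecodeDemo {
    // Bytecode javac emits for compute (view it with `javap -c StackBytecodeDemo`),
    // with the operand stack after each instruction (top of stack on the right):
    //   iload_0   push a                      [a]
    //   iload_1   push b                      [a, b]
    //   iload_2   push c                      [a, b, c]
    //   imul      pop c and b, push b * c     [a, b*c]
    //   iadd      pop b*c and a, push sum     [a + b*c]
    //   ireturn   pop the result and return it
    static int compute(int a, int b, int c) {
        return a + b * c;
    }

    public static void main(String[] args) {
        System.out.println(compute(2, 3, 4)); // 14
    }
}
```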

When the java binary is used to run a compiled .class file, the OS starts a new virtual machine process. That process sets up the Java virtual environment and initializes the execution stack that will be used to run the code.
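Each java invocation is a separate OS process. Each thread inside that process gets its own JVM stack, made of one frame per active method call, and each frame holds its own local variables and operand stack. The sketch below assumes Java 9 or later, for ProcessHandle and StackWalker. It prints the process id and the method names currently on the calling thread's stack. The helper methods inner and outer are hypothetical and exist only to push a couple of extra frames.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StackFramesDemo {
    // Method names on the current thread's JVM stack, innermost (most recent) frame first.
    static List<String> currentFrames() {
        return StackWalker.getInstance()
                .walk(frames -> frames.map(StackWalker.StackFrame::getMethodName)
                                      .collect(Collectors.toList()));
    }

    // Helper methods that exist only to push a couple of extra frames.
    static List<String> inner() { return currentFrames(); }
    static List<String> outer() { return inner(); }

    public static void main(String[] args) {
        System.out.println("JVM process id: " + ProcessHandle.current().pid());
        // Starts with [currentFrames, inner, outer, main ...]; launchers may add frames below main
        System.out.println("Frames: " + outer());
    }
}
```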

The interpreter can essentially be understood as a "switch inside a while loop". It processes each bytecode instruction in turn, using the stack to hold the values the instructions operate on.
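The "switch inside a while loop" idea fits in a few lines. The sketch below is a toy interpreter with made-up opcodes: PUSH, ADD, MUL and HALT are hypothetical, not real JVM instructions. It fetches an opcode, dispatches on it with a switch and manipulates an operand stack. It evaluates 2 + 3 * 4 in the same order as the bytecode for a + b * c. HotSpot's production interpreter is a far more elaborate template interpreter, but the fetch, decode and execute loop is the same idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyStackInterpreter {
    // Hypothetical opcodes for illustration only; real JVM opcodes differ.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Runs the program and returns the value left on top of the operand stack.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {                      // the "while loop"...
            int opcode = code[pc++];        // fetch
            switch (opcode) {               // ...with a "switch" inside: decode and execute
                case PUSH:
                    stack.push(code[pc++]); // the operand follows the opcode
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // 2 + 3 * 4, in the same order javac would emit for a + b * c
        int[] program = { PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD, HALT };
        System.out.println(run(program)); // 14
    }
}
```

ADD and MUL are commutative, so the order in which the two operands are popped does not matter here. A SUB instruction would have to pop the right-hand operand first.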

The entry point of most Java programs is the main() method of a Java class. Before any code of this class can run, the class must be loaded into the JVM. This is the job of the classloading mechanism.

The Classloaders

Ever wondered how the JVM finds and loads all the classes used in your program? Meet the ClassLoader — your personal class butler.

The ClassLoader locates .class files, brings them into memory and turns the Java bytecode into actual, usable classes for your program. You rarely need to load classes manually, but when you do, just ask your ClassLoader:

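```java
ClassLoader loader = ClassLoader.getSystemClassLoader(); // the application class loader
Class<?> c = loader.loadClass("MyClass"); // throws ClassNotFoundException if MyClass cannot be found
```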

Classloaders are special components with their own runtime and type system that bring other classes into existence. This arrangement avoids a “chicken or the egg” causality dilemma, known to software people as a “circular dependency”. The very first loader, the Bootstrap loader, is part of the JVM itself rather than a Java class (in HotSpot it is implemented in native code). So no Java class has to be loaded before class loading can begin.

Classloading starts with the Bootstrap loader, which delivers the core Java runtime. This loader is responsible for loading fundamental classes such as java.lang.Object, java.lang.Class and java.lang.ClassLoader. These classes make it possible to boot the other classloaders, which provide the rest of the system.
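Classes defined by the Bootstrap loader are easy to spot: Class.getClassLoader() returns null for them. The minimal sketch below compares a few core classes with a user-defined class. The class name is illustrative.

```java
public class BootstrapLoaderDemo {
    // getClassLoader() returns null for classes defined by the bootstrap loader.
    static boolean loadedByBootstrap(Class<?> c) {
        return c.getClassLoader() == null;
    }

    public static void main(String[] args) {
        Class<?>[] coreClasses = { Object.class, Class.class, ClassLoader.class, String.class };
        for (Class<?> c : coreClasses) {
            System.out.println(c.getName() + " loaded by bootstrap: " + loadedByBootstrap(c)); // true
        }
        // A user-defined class such as this one comes from a non-bootstrap loader
        System.out.println("BootstrapLoaderDemo loaded by bootstrap: "
                + loadedByBootstrap(BootstrapLoaderDemo.class)); // false
    }
}
```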

The process continues with the Extension loader, which has the Bootstrap loader as its parent. This loader is not widely used. It supplies overrides and native code required by a specific OS or platform.

Classloading ends with the Application loader, which is responsible for loading the initial user-defined class. The JVM then initializes that class and invokes its public static main() method. The Application loader is the one used most frequently, and it has the Extension loader as its parent.
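The parent chain can be walked at runtime with ClassLoader.getParent(). One caveat: the Extension loader described above belongs to Java 8 and earlier. Since Java 9 the extension mechanism has been removed, and the middle link is the platform class loader. On a JDK 9+ HotSpot runtime, the sketch below should list three entries:

1. jdk.internal.loader.ClassLoaders$AppClassLoader;
2. jdk.internal.loader.ClassLoaders$PlatformClassLoader;
3. the bootstrap loader, which Java represents as null.

The class name ClassLoaderChain is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderChain {
    // Walks from the application (system) class loader up through getParent().
    static List<String> chain() {
        List<String> loaders = new ArrayList<>();
        for (ClassLoader loader = ClassLoader.getSystemClassLoader();
             loader != null;
             loader = loader.getParent()) {
            loaders.add(loader.getClass().getName());
        }
        // getParent() returns null once the bootstrap loader is reached
        loaders.add("bootstrap (represented as null)");
        return loaders;
    }

    public static void main(String[] args) {
        chain().forEach(System.out::println);
    }
}
```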

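Classes are loaded lazily. The JVM typically loads a class only when running code first references it, for example when it first executes an instruction such as new or invokestatic that refers to that class. Loading follows a parent-first delegation model: a classloader first asks its parent to load the class and searches for it itself only if the parent cannot find it. If no loader in the chain can find the class, a ClassNotFoundException is thrown.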
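Laziness can be observed with a static initializer. Strictly speaking, a static initializer runs when a class is initialized, which happens after it is loaded. This sketch therefore shows initialization on first use. To watch loading itself, run any program with java -verbose:class. The names LazyLoadingDemo, Heavy and com.example.DoesNotExist are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyLoadingDemo {
    // Records what happens, in order, so the laziness becomes visible.
    static final List<String> events = new ArrayList<>();

    static class Heavy {
        static {
            // Runs once, when Heavy is initialized on its first active use.
            events.add("Heavy initialized");
        }

        static void work() {
            events.add("Heavy.work() called");
        }
    }

    static List<String> run() {
        events.add("before first use");
        Heavy.work(); // the first invokestatic on Heavy triggers its initialization
        events.add("after first use");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // [before first use, Heavy initialized, Heavy.work() called, after first use]

        try {
            // No loader in the delegation chain can find this (hypothetical) class
            Class.forName("com.example.DoesNotExist");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }
    }
}
```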

To minimize classloading problems, developers should compile against the exact same classpath that will be used in production. Build automation tools such as Gradle and Maven help prevent this issue.

In the JVM, classes themselves are represented as objects. Loading a class means creating an object of type java.lang.Class that represents that class in the runtime environment.
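Because a class is represented by a Class object, every instance of a class points to the same Class object (for a given classloader). That Class object is itself an ordinary Java object. A small sketch with illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

public class ClassObjectDemo {
    // True when both objects share the single Class object that represents their class.
    static boolean sameClassObject(Object x, Object y) {
        return x.getClass() == y.getClass();
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        Class<?> arrayListClass = ArrayList.class;               // class literal
        System.out.println(sameClassObject(first, second));      // true
        System.out.println(first.getClass() == arrayListClass);  // true
        System.out.println(arrayListClass.getName());            // java.util.ArrayList
        // The Class object is itself an ordinary object, an instance of java.lang.Class
        System.out.println(arrayListClass.getClass().getName()); // java.lang.Class
    }
}
```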

The same class can be loaded twice by different classloaders. For that reason, a class is identified at runtime by its fully qualified name (which includes the package name) together with the classloader that loaded it.
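The sketch below makes this concrete under two assumptions: it needs a JDK, and it needs Java 11 or later. It compiles a hypothetical empty Greeter class with the javax.tools API. It then loads Greeter through two independent URLClassLoader instances. The two Class objects share a name but are different classes, because their defining loaders differ.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoLoadersDemo {
    // Compiles a hypothetical empty Greeter class and loads it through two
    // independent class loaders, returning both resulting Class objects.
    static Class<?>[] loadTwice() throws Exception {
        Path dir = Files.createTempDirectory("two-loaders");
        Path source = dir.resolve("Greeter.java");
        Files.writeString(source, "public class Greeter {}");

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null || javac.run(null, null, null, "-d", dir.toString(), source.toString()) != 0) {
            throw new IllegalStateException("Could not compile Greeter (a JDK is required)");
        }

        URL[] classpath = { dir.toUri().toURL() };
        // A null parent makes the bootstrap loader the parent, so neither loader can
        // delegate to the application class loader: each defines Greeter itself.
        try (URLClassLoader first = new URLClassLoader(classpath, null);
             URLClassLoader second = new URLClassLoader(classpath, null)) {
            return new Class<?>[] { first.loadClass("Greeter"), second.loadClass("Greeter") };
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?>[] classes = loadTwice();
        System.out.println("same name:  " + classes[0].getName().equals(classes[1].getName())); // true
        System.out.println("same class: " + (classes[0] == classes[1]));                        // false
    }
}
```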

In short, classloading relies on a hierarchy of ClassLoaders. The primordial Bootstrap ClassLoader loads the core Java classes first. Then the Extension loader handles any extensions you added. Finally, the Application ClassLoader loads the classes of your specific app.

By loading classes only on demand, the JVM can start quickly and stay light.

To be continued…

So far, we have described the whole process that precedes execution and had a taste of the JVM's interpreter.

In the next part, we will walk through the bytecode execution and understand how the Java Virtual Machine behaves while it is processing the instructions of a .class file.

References

This post summarizes my understanding of the following references. I strongly recommend reading them if you want to go deeper into this subject. They are:

“Under the Hood of the JVM” Series

source:
[1] https://medium.com/@caique.me/8b10fae2a468
[2] https://naveen-metta.medium.com/8f5b98747486