# Java Under the Hood — Part 1: Java Virtual Machine

The expression **under the hood** is an allusion to an automobile. The “hood” refers to the hatch covering the engine compartment wherein one finds the engine and internal components that make up the bulk of the functional electronic and mechanical components of the vehicle.

Most people never look at a car's engine, which lives under the hood, but as senior Java developers we need to know what's happening under the hood. Of course, the engine here is not a car's but Java's. In this article, we’ll peek under the hood to understand the components that make up the ***Java Virtual Machine*** (JVM). Think of the JVM as the engine that drives our Java programs from source code to execution.

When we start learning the Java language, we don't necessarily need to understand the JVM to be a Java developer. However, understanding its internals can really improve our coding ability, which can lead to better software.

On the other hand, if you want to reason about performance, you definitely must understand the *JVM*'s internals deeply. To be able to measure and argue about performance, you must understand the complex ecosystem that exists behind the scenes.

This post offers a brief theoretical introduction to the Java language and an overview of the ***Java Virtual Machine*** describing its aspects and the whole process prior to *bytecode* execution.

### Java in a Nutshell

Java is a general-purpose programming language first released in 1995. The main goals that drove its creation highly influenced its design and evolution. These goals were:

* To provide a container for simple execution of object-oriented application code.
    
* To remove tedious bookkeeping from the hands of developers and make the platform responsible for accounting for memory.
    
* To remove C/C++ platform security vulnerabilities wherever possible.
    
* To allow cross-platform execution.
    

> *Java is a blue collar language. It’s not PhD thesis material but a language for a job. (*[*James Gosling*](https://en.wikipedia.org/wiki/James_Gosling)*)*

These goals were pursued even at the expense of low-level developer control and performance.

Java was not designed to be a high-performance language; instead, it was designed to provide a consistent and simple programming model. Today, high performance is possible thanks to the consistency of Java’s programming model, which allows its internal components to perform optimizations automatically (something that is not possible in many other languages).

The portability goal (to allow cross-platform execution) was referred to as ***Write Once, Run Anywhere*** (WORA) and comes with the idea that unaltered Java classes could be executed on multiple distinct platforms. It relies on the existence and availability of a ***Java Virtual Machine*** (*JVM*).

Running Java code on a platform is just a matter of implementing a *JVM* for that platform that complies with the virtual machine specification (*VMSpec*).

In the early days, the Java language and the *JVM* designs influenced each other. For example, the *JVM* bytecode is typed and the types are essentially those of the Java language. Nowadays, the Java language has moved from this deep relation with the *JVM* to the position of the “*first high-level language to run on the JVM*”.

Many factors contributed to Java’s popularity and also to the growth of the ecosystem around the language and the components that support it, among them, the *JVM*.

### The Java Virtual Machine

Because Java is, by design, a high-level programming language, we, as developers, give up low-level control and simply don’t need to care about the details. Of course, in order to give up low-level control, we must delegate it to something, since the work still has to be done.

The ***Java Virtual Machine*** is an abstract computing machine, composed of multiple managed subsystems, and responsible for Java's hardware and operating system independence. The *JVM* subsystems are required at runtime, and their existence implies complex effects that make the *JVM* such an unpredictable environment.

The *JVM* is composed of four key aspects:

* The interpreter
    
* The *classloading* mechanism
    
* The *JIT* compiler
    
* The garbage collector
    

The image below is a representation of Oracle's *HotSpot Java Virtual Machine* and its aspects during execution. *The image shows the components: Java source code, javac, ".class" file, classloader, method cache, emitter, profiler, code cache, and interpreter.* It also shows the compilation process prior to execution. We will analyze in detail each step of the process.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1708408571471/d46543d2-280b-4c2e-afa6-3b814e7eadc2.webp align="center")

### The Compiler

One of the most important and best-known characteristics of Java is that it's a [compiled language](https://en.wikipedia.org/wiki/Compiled_language). The term "*compiled language*" means that the source code written in Java must be processed by the Java compiler prior to execution. This is required because the *JVM* knows nothing about the Java language itself and understands only a particular binary format: the *.class* files.

`javac` (think of "*Java compiler*") is an external tool responsible for transforming the source code located in the *.java* files into *.class* files containing ***Java Virtual Machine* instructions**. These instructions are called *bytecode*.
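To make the "particular binary format" a bit more concrete, here is a minimal sketch that reads the first eight bytes of a compiled class. It relies on the class file layout from the JVM specification, where a *.class* file starts with a 4-byte magic number followed by the minor and major version. The `ClassFileHeader` name and its `read` method are made up for this illustration, and the sketch assumes the class file can be opened as a resource, which should hold for JDK classes such as `java.lang.Object`.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for this article; not part of any library.
class ClassFileHeader {
    /** Returns {magic, minorVersion, majorVersion} read from the class file of the given type. */
    static int[] read(Class<?> type) {
        // Absolute resource path of the compiled class, e.g. "/java/lang/Object.class".
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No class file found for " + type.getName());
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // first 4 bytes: the magic number
            int minor = data.readUnsignedShort(); // next 2 bytes: minor version
            int major = data.readUnsignedShort(); // next 2 bytes: major version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = read(Object.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The magic number of a valid class file is `0xCAFEBABE`; the version numbers depend on the JDK that produced the file.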

Other languages like [Scala](http://scala-lang.com/) and [Kotlin](https://kotlinlang.org/) are able to use the *JVM* as a delivery vehicle simply because their definitions can be represented in terms of a valid *.class* file. Basically, each language just needs to implement its own compiler that translates the language-specific source code to *bytecode*.

As long as the *bytecode* is valid, the *JVM* can execute it. The image below demonstrates how source code written in different languages can be compiled to ".class" files by using the proper compiler.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1708408904929/7b298a7d-b6fc-4501-bded-b0580fc25f59.webp align="center")

### The Interpreter

*<mark>JVM</mark>* <mark> is a </mark> [<mark>stack-based interpreted machine</mark>](https://www.optapy.org/docs/latest/stack-machines/stack-machines.html)<mark>.</mark>

Instead of using [registers](https://en.wikipedia.org/wiki/Processor_register), it pushes operands and intermediate results onto a stack and performs calculations by processing the top value (or values) of the stack in a "[*last in, first out*](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))" manner.

When the `java` binary is used to execute a compiled *.class* file, the OS starts a new virtual machine process that sets the Java virtual environment and initializes the execution stack that will be used to execute the code.

The interpreter can essentially be understood as a "*switch* inside a *while* loop" that processes each bytecode instruction in turn.
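The "*switch* inside a *while* loop" idea can be sketched with a toy stack machine. This is only an illustration of the technique, not how HotSpot is implemented: the `ToyInterpreter` class and its four opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) are invented for this example and are not real JVM bytecodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack-based interpreter; opcodes are hypothetical, not real JVM bytecodes.
class ToyInterpreter {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;  // pops two values, pushes their sum
    static final int MUL = 2;  // pops two values, pushes their product
    static final int HALT = 3; // pops the result and stops

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        int pc = 0;                                // program counter
        while (true) {
            int opcode = code[pc++];
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Encodes (2 + 3) * 4 in postfix order, as a stack machine expects.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Note that the program never names a register: every instruction finds its inputs on top of the stack and leaves its result there for the next one.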

The entry point of [most Java programs](https://stackoverflow.com/questions/2896322/is-the-main-method-must-needed-in-a-java-program/2897323#2897323) will be the [*main()*](https://www.baeldung.com/java-main-method) method of a Java class. In order to execute any code of this class, it must be loaded into the *JVM*. This is achieved by the *classloading* mechanism.

### The Classloaders

Ever wondered how the JVM finds and loads all the classes used in your program? Meet the ClassLoader — your personal class butler.

The ClassLoader locates .class files, brings them into memory, and converts the Java bytecode into actual, usable classes for your program. No need to manually load classes, just ask your ClassLoader!

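```java
// loadClass throws the checked ClassNotFoundException, so the caller must catch or declare it.
ClassLoader loader = ClassLoader.getSystemClassLoader();
Class<?> c = loader.loadClass("MyClass"); // "MyClass" must be on the classpath
```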

*Classloaders* are special objects that come with their own runtime and type system, so they are able to bring other classes into existence without requiring Java itself. This self-contained characteristic avoids a "[the chicken or the egg](https://en.wikipedia.org/wiki/Chicken_or_the_egg)" causality dilemma, commonly known by software people as a "[circular dependency](https://en.wikipedia.org/wiki/Circular_dependency)".

*Classloading* starts by first booting the [*Bootstrap* loader](https://docs.oracle.com/javase/8/docs/technotes/tools/findingclasses.html#bootclass) that delivers the core Java runtime. This loader is responsible for bringing in fundamental classes (such as `java.lang.Object`, `java.lang.Class`, and `java.lang.ClassLoader`) that allow the boot of the other *classloaders* responsible for providing the rest of the system.

The process continues by loading the [*Extension* loader](https://docs.oracle.com/javase/8/docs/technotes/tools/findingclasses.html#extclass), which defines the *Bootstrap* loader as its parent. This one is not widely used and supplies overrides and native code required by a specific OS or platform.

The classloading ends by loading the [*Application* loader](https://docs.oracle.com/javase/8/docs/technotes/tools/findingclasses.html#userclass) that is responsible for loading the initial user-defined class. Then, it initializes the initial class and invokes the public *main()* method. This loader is frequently used and has the *Extension* loader as its parent.

Java loads a class lazily, the first time it is referenced at runtime (for example, when it encounters an invoke instruction that refers to a method of this class). In the default delegation model, a *classloader* first asks its parent to load the class and only searches for it itself when the parent cannot find it. If no loader in the chain, up to and including the *Bootstrap* loader, can find the class, a `ClassNotFoundException` is thrown.
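The parent chain described above can be inspected at runtime. The sketch below walks `getParent()` starting from a given loader; the `LoaderChain` class is a made-up name for this article. It assumes a HotSpot-style JVM, where the *Bootstrap* loader is implemented natively and shows up as `null` in Java code, so the sketch adds a `"bootstrap"` label by hand. The concrete loader class names it prints differ between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this article; not part of any library.
class LoaderChain {
    /** Lists the class names of a loader and its ancestors, ending with a "bootstrap" label. */
    static List<String> of(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader current = loader; current != null; current = current.getParent()) {
            chain.add(current.getClass().getName());
        }
        // Assumption: a null parent stands for the natively implemented Bootstrap loader.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        // Chain of the loader that loaded this application class.
        System.out.println(of(LoaderChain.class.getClassLoader()));
        // Core classes such as String are loaded by the Bootstrap loader.
        System.out.println(String.class.getClassLoader());
    }
}
```

Asking for a class that none of these loaders can find, for instance through `Class.forName` with a name that does not exist, ends in the `ClassNotFoundException` mentioned above.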

To minimize problems related to *classloading*, developers must compile using the exact same [*classpath*](https://en.wikipedia.org/wiki/Classpath_(Java)) that will be used in production. Build automation tools such as [Gradle](https://gradle.org/) and [Maven](https://maven.apache.org/) help to prevent this issue.

Everything that lives in the *JVM* is represented as an object. Accordingly, loading a class results in an object of type `Class` being created to represent that class in the runtime environment.

Since a class can be loaded twice by different *classloaders*, a class is identified by its loader as well as the fully qualified name — which includes the package name.

In short, the ClassLoader uses a hierarchy to load classes. The primordial Bootstrap ClassLoader loads core Java classes first. Then the Extension loader handles any extensions you added. Finally, the Application ClassLoader loads classes for your specific app.

By only loading classes on-demand, the JVM starts and runs light and fast.
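On-demand loading can be observed, at least roughly, through the standard `ClassLoadingMXBean`. The sketch below compares the JVM's total loaded-class count before and after a nested class is used for the first time. `LazyLoadingDemo` and its `Helper` class are hypothetical names for this illustration, and the exact numbers are an assumption about JVM behavior: other classes may be loaded in the same window, so the difference is a lower bound rather than an exact figure.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class for this article.
class LazyLoadingDemo {
    // Not referenced anywhere until touch() is called below.
    static class Helper {
        static int touch() {
            return 42;
        }
    }

    /** Returns how many classes the JVM loaded while Helper was used for the first time. */
    static long classesLoadedByFirstUse() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        Helper.touch(); // first use: the JVM has to load Helper at this point
        long after = bean.getTotalLoadedClassCount();
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("Classes loaded on first use: " + classesLoadedByFirstUse());
    }
}
```

On a typical HotSpot JVM the first call reports at least one newly loaded class, because `Helper` was not loaded when `LazyLoadingDemo` itself was.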

**To be continued…**

So far, we have described the whole process required prior to execution and had a taste of the interpreter aspect of the *JVM*.

In the next part, we will walk through the *bytecode* execution and understand how the *Java Virtual Machine* behaves while it is processing the instructions of a *.class* file.

**References**

This post is a summary of my understanding of the following references. I strongly recommend reading them if you want to go deeper into this subject. They are:

* [***The Java® Virtual Machine Specification (Java SE 8 Edition)***](https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf) by Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley
    
* [***Java: The Legend***](https://learning.oreilly.com/library/view/java-the-legend/9781492048299/) by Ben Evans
    
* [***Optimizing Java***](https://www.amazon.com/Optimizing-Java-Techniques-Application-Performance/dp/1492025798) by Ben Evans, Chris Newland, and James Gough
    

**“Under the Hood of the JVM” Series**

* [Java Under the Hood](https://dev.ilenlab.com/series/java-under-the-hood)
    

```yaml
source:
[1] https://medium.com/@caique.me/8b10fae2a468
[2] https://naveen-metta.medium.com/8f5b98747486
```
