Java Under the Hood — Part 2: Memory Management

In the first part, we discussed, in theory, what happens between the compilation of a .java file and the loading of the .class file into the Java runtime.

We also briefly discussed how the interpreter, one of the JVM's components, uses an execution stack to process bytecode instructions in a "last in, first out" manner.

Now we are going to go deeper into the interpreter to understand what bytecode is and how the Java Virtual Machine executes the instructions located in a .class file.

The .class file

It's important to highlight that many transformations occur between the human-readable code written in a .java file and its execution. The first of these transformations is compilation: the process of transforming source code into a .class file.

Each .class file groups a set of instructions known as bytecode and represents a single class or interface.

Be aware that not every class or interface has an external representation. However, we are going to refer to any valid representation of a class or interface as a .class file.

A .class file is a stream of 8-bit bytes. 16-bit, 32-bit, and 64-bit quantities are built by reading two, four, or eight consecutive bytes, stored in big-endian order (high byte first).
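As a minimal sketch of that idea, the snippet below assembles 16-bit and 32-bit quantities from single bytes, high byte first. The class and method names (BigEndianDemo, u2, u4) are hypothetical helpers chosen for illustration, not part of any JDK API.

```java
public class BigEndianDemo {
    // Combine two unsigned 8-bit bytes into one 16-bit quantity (high byte first).
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Combine four unsigned 8-bit bytes into one 32-bit quantity (high byte first).
    // A long is used so the result stays non-negative.
    static long u4(byte[] b, int off) {
        return ((long) u2(b, off) << 16) | u2(b, off + 2);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x34};
        System.out.printf("u4 = 0x%08X%n", u4(bytes, 0)); // prints u4 = 0xCAFEBABE
        System.out.printf("u2 = %d%n", u2(bytes, 4));     // prints u2 = 52
    }
}
```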

The bytecode

The bytecode is platform agnostic: any valid .class file is executable on any OS or platform that provides a Java Virtual Machine implementation that complies with the VMSpec.

It's also important to highlight that the bytecode is independent of the Java language itself. Many other languages are capable of compiling their own source code into bytecode, and the generated .class file is executable on any JVM.

Let's write a little piece of code to help us understand what occurs from the initial Java code until the end of the java process.

First, create a HelloWorld.java containing the following code.

class HelloWorld {
  public static void main(String[] args) {
      System.out.println("Hello World!");
  }
}

To compile this code, execute javac HelloWorld.java on the command line. This will generate the HelloWorld.class file.

The .class file is a binary file, so you won't be able to read it properly by simply opening this file with a text editor.

To analyze its structure, let's dump the content in hexadecimal format by running hexdump -C HelloWorld.class > HelloWorld.hexdump. This will write the .class content into a new HelloWorld.hexdump file.

Running cat HelloWorld.hexdump outputs something similar to:

A good exercise that is pretty straightforward is: write a "Hello World!" function in Kotlin and/or Scala, and execute the same steps above: compile with kotlinc or scalac and then use hexdump to generate a hexadecimal representation of your .class file.

You will see that each output file is slightly different, but all of them follow the same structure and the same runtime concepts apply to all.

Like many binary formats on Unix systems, a .class file starts with a magic number that identifies its type. Looking at the first line, we have:

00000000  ca fe ba be 00 00 00 34  00 1d 0a 00 06 00 0f 09  |.......4........|

Remember that a .class file "is a stream of 8-bit bytes"? So, from left to right, grouping bytes we have:

  • 00000000 as the offset, within the file, of the first of the 16 bytes shown on this line

  • ca fe ba be as the magic number that indicates this file is a Java binary

  • 00 00 as the minor version of the .class file format

  • 00 34 as the major version of the .class file format (0x34 is 52, which corresponds to Java 8)

  • 00 1d as the constant pool count

  • 0a 00 06 00 0f 09 as the first bytes of the constant pool
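The breakdown above can be sketched in code. The example below decodes the first 10 bytes of the dump line with java.nio.ByteBuffer, which reads big-endian by default. The class name ClassHeaderDemo and the array layout of the result are assumptions made for this illustration; the "major minus 44" rule is the usual mapping for class file versions from Java 5 (major 49) onward.

```java
import java.nio.ByteBuffer;

public class ClassHeaderDemo {
    // The first 10 bytes of the dump line shown above.
    static final byte[] HEADER = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
        0x00, 0x00,                                         // minor_version
        0x00, 0x34,                                         // major_version
        0x00, 0x1D                                          // constant_pool_count
    };

    // Returns {magic, minor, major, constantPoolCount}.
    static long[] parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);      // big-endian by default
        long magic = buf.getInt() & 0xFFFFFFFFL;     // treat the 4 bytes as unsigned
        int minor = buf.getShort() & 0xFFFF;         // treat each 2-byte value as unsigned
        int major = buf.getShort() & 0xFFFF;
        int constantPoolCount = buf.getShort() & 0xFFFF;
        return new long[] {magic, minor, major, constantPoolCount};
    }

    public static void main(String[] args) {
        long[] h = parse(HEADER);
        System.out.printf("magic=0x%08X minor=%d major=%d constant_pool_count=%d%n",
                h[0], h[1], h[2], h[3]);
        // For major versions 49 and above, the Java release is (major - 44).
        System.out.println("Java release: " + (h[2] - 44)); // prints Java release: 8
    }
}
```

To try it on a real file, the HEADER array could be swapped for the bytes returned by java.nio.file.Files.readAllBytes on HelloWorld.class.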

We are not going to evaluate every bit of this file. However, it is important to know that a valid .class file has this structure:

  • 4-byte magic number

  • 2-byte minor_version

  • 2-byte major_version

  • 2-byte constant_pool_count

  • variable-size constant_pool[constant_pool_count-1]

  • 2-byte access_flags

  • 2-byte this_class

  • 2-byte super_class

  • 2-byte interfaces_count

  • variable-size interfaces[interfaces_count] (2 bytes per entry)

  • 2-byte fields_count

  • variable-size fields[fields_count]

  • 2-byte methods_count

  • variable-size methods[methods_count]

  • 2-byte attributes_count

  • variable-size attributes[attributes_count]

The variable-size items have a length that depends on the class structure and its members.

Running javap -c HelloWorld > HelloWorld.bytecode, we can output the bytecode in a more human-readable format:

Each instruction in this representation begins with an opcode, a predefined operation code that determines the operand type, the operation, and how the instruction interacts with the local variables, the constant pool, and the stack.

All instructions are predefined and specified by the Java Virtual Machine Specification (chapter 6).
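To make the role of opcodes more concrete, here is a toy sketch of an interpreter loop that understands only four real opcodes (iload_0, iload_1, iadd, ireturn). The class TinyInterpreter and its run method are hypothetical and vastly simpler than a real JVM; the opcode values are the ones defined by the instruction set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {
    // Opcode values as defined by the JVM instruction set.
    static final int ILOAD_0 = 0x1a, ILOAD_1 = 0x1b, IADD = 0x60, IRETURN = 0xac;

    // Executes a tiny subset of bytecode against a local-variable array and an operand stack.
    static int run(int[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ILOAD_0: stack.push(locals[0]); break;               // push local variable 0
                case ILOAD_1: stack.push(locals[1]); break;               // push local variable 1
                case IADD:    stack.push(stack.pop() + stack.pop()); break; // pop two ints, push the sum
                case IRETURN: return stack.pop();                         // return the top of the stack
                default: throw new IllegalArgumentException("unsupported opcode " + code[pc]);
            }
        }
        throw new IllegalStateException("code ended without a return instruction");
    }

    public static void main(String[] args) {
        // The instruction sequence javac typically emits for:
        //   static int add(int a, int b) { return a + b; }
        int[] add = {ILOAD_0, ILOAD_1, IADD, IRETURN};
        System.out.println(run(add, new int[] {2, 3})); // prints 5
    }
}
```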

The Execution

The execution of a class or interface consists of three processes: loading, linking and initializing.

Loading is the process of finding the proper binary representation with a particular name and creating a class or interface from that representation.

Linking is the process of taking a class or interface and combining it into the runtime state of the Java Virtual Machine so that it can be executed.

Initializing is the process of executing the <clinit> method of a given class or interface.

To be executed, a class or interface must have one of its methods invoked as a result of another class's execution, or it must be the initial class of a java process. In either case, the class is initialized first: its <clinit> method runs before any of its other code is executed.

Every constructor written in Java is compiled into an instance initialization method, which has the special name <init>. In addition, every class or interface has at most one class or interface initialization method, which takes no arguments and returns void. This method is called <clinit>.

The <clinit> method is named by the compiler and cannot be referenced in Java code. Only the JVM, during a class or interface initialization, is capable of invoking the <clinit> of a given class or interface.
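A small sketch can show the difference between the two initialization methods. In the example below, the static block is compiled into Widget's <clinit> and the constructor into <init>. The names InitOrder, Widget, and demo are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrder {
    static final List<String> EVENTS = new ArrayList<>();

    static class Widget {
        // Static initializers are compiled into the <clinit> method.
        static { EVENTS.add("<clinit>"); }

        // Each constructor is compiled into an <init> method.
        Widget() { EVENTS.add("<init>"); }
    }

    // Creates two instances (only on the first call) and returns what was recorded.
    static List<String> demo() {
        if (EVENTS.isEmpty()) {
            new Widget(); // first use triggers class initialization, then the constructor runs
            new Widget(); // the class is already initialized, so only the constructor runs
        }
        return EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [<clinit>, <init>, <init>]
    }
}
```

The class initializer runs once, no matter how many instances are created afterwards.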

A Java Virtual Machine process starts by initializing the Bootstrap classloader and consequently the initial class is loaded, linked, and initialized by invoking its <clinit> method.

The VMSpec specifies many ways to define the initial class, such as a command-line argument for the java binary.

Finally, the method public static void main(String[] args) is invoked. Any other instruction is executed as a result of the main() method invocation. This invocation may cause the linking and initialization of other classes and/or interfaces, as well as the invocation of additional methods.

The execution process of our HelloWorld.java code is described in the image below:

Runtime Areas

Each instruction touches at least one of the six runtime data areas: the heap, the method area, the Java stack, the native method stack, the program counter (pc) register, or the run-time constant pool.

The JVM defines multiple runtime data areas that are used during execution. Some areas are created when the JVM starts and are destroyed only when it exits while others are created when a thread is created and destroyed when the respective thread ends.

All areas are equally important and each one of them plays a fundamental role in the JVM's execution flow. However, two of them deserve our attention because they will help us understand how the multithreading aspect of the JVM works.

These two areas are: the Java Virtual Machine Stack (or stack) and the Java Virtual Machine Heap (or heap).

The Stack

One of the most important aspects of the JVM is its support for multithreaded execution.

Each thread has its own private Java Virtual Machine Stack, created when the thread starts. It consists of frames, which are used to store data and partial results, as well as to perform dynamic linking, return values from methods, and dispatch exceptions. A frame holds local variables, object references, method parameters, and other method-specific data during the execution of a method.

Local variables and method invocations use the thread's stack, and each thread gets its own. Other areas, like the method area, store static variables and class-level data. The size of each stack is bounded: it is either fixed or limited by a maximum. The JVM allocates and frees these memory areas automatically, so you never manage raw memory yourself.

void myMethod() {
  int x; // local variable stored in the current frame on the thread's stack
}

The Java Virtual Machine Stack is never directly manipulated and only operates in terms of push and pop of frames.
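One way to observe frames being pushed is to compare the length of the stack trace at two call depths. This is only a sketch: the class FrameDemo and its methods are hypothetical, and it assumes the JVM reports every frame in the stack trace, which the Thread.getStackTrace documentation does not strictly guarantee.

```java
public class FrameDemo {
    // Number of frames currently reported for this thread's stack.
    static int depth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Measures the depth from one call level down.
    static int oneLevel() {
        return depth();
    }

    // Measures the depth from two call levels down.
    static int twoLevels() {
        return oneLevel();
    }

    // How many extra frames the additional method call pushed.
    static int extraFrames() {
        return twoLevels() - oneLevel();
    }

    public static void main(String[] args) {
        System.out.println("extra frames: " + extraFrames());
    }
}
```

On a typical JVM the difference is 1: each method invocation pushes exactly one frame, and that frame is popped when the method returns.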

The Java SE 8 specification allows Java Virtual Machine Stacks to either have a fixed size or expand and contract dynamically; in the latter case, an implementation may let the user control the minimum and maximum sizes.

Two important exceptions come from this:

  • If the execution of a thread requires a larger stack than permitted, a StackOverflowError is thrown

  • If expansion is allowed but there is not enough memory to support it, an OutOfMemoryError is thrown
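The first case is easy to reproduce with unbounded recursion. The sketch below (the class StackDepthDemo is hypothetical) counts how many frames fit before the error is thrown. The number varies between machines and runs, and on HotSpot it can be changed with the -Xss option. Catching StackOverflowError is done here only for demonstration purposes.

```java
public class StackDepthDemo {
    private static int depth;

    // Each call pushes a new frame; there is no base case on purpose.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames were popped as the error propagated up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + maxDepth());
    }
}
```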

The Heap

The Java Virtual Machine Heap is a runtime data area shared by all threads.

The heap is used to allocate memory for all class instances and arrays. This area is created when the JVM process starts and is destroyed only when it ends.

Objects are never explicitly deallocated from a given memory space and this space is reclaimed by an automatic storage management system known as the garbage collector.

int[] arr = new int[5]; //object allocated in Heap
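To contrast the two areas, here is a minimal sketch in which two threads each keep a private counter in their own stack frame and then publish the result into one array that lives on the shared heap. The class SharedHeapDemo and its run method are hypothetical names for this illustration.

```java
public class SharedHeapDemo {
    static int[] run() {
        int[] shared = new int[2]; // one array object on the heap, reachable from both threads
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> {
                int local = 0; // lives in this thread's own stack frame
                for (int n = 0; n < 1000; n++) {
                    local++;
                }
                shared[slot] = local; // each thread writes its own slot of the same heap array
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for the worker; join also makes its writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " " + result[1]); // prints 1000 1000
    }
}
```

The local counters never interfere with each other because each thread has its own stack, while the array is visible to both because the heap is shared.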

As with the stack, the user can define at startup whether the memory allocated for the heap has a fixed size or can grow and shrink dynamically.

Only an OutOfMemoryError can occur in the heap area, since this area is not tied to a specific thread.
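The current heap limits can be inspected from inside a running program through the Runtime class. The sketch below uses a hypothetical HeapInfo class; the values it prints depend on the machine and on startup options such as HotSpot's -Xms (initial size) and -Xmx (maximum size).

```java
public class HeapInfo {
    // Returns {totalMemory, maxMemory, freeMemory} in bytes.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.totalMemory(), rt.maxMemory(), rt.freeMemory()};
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mib = 1024L * 1024L;
        System.out.println("heap currently reserved: " + s[0] / mib + " MiB");
        System.out.println("heap maximum:            " + s[1] / mib + " MiB");
        System.out.println("free within reserved:    " + s[2] / mib + " MiB");
    }
}
```

When the initial and maximum sizes are set to the same value, the heap is effectively fixed; otherwise the JVM can grow it up to the maximum.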

It's important to highlight that both memory spaces, for stack and heap allocation, don't necessarily need to be contiguous.

The Exit

The Java Virtual Machine process exits when any thread invokes one of the following:

  • the exit() or halt() methods of the Runtime class

  • the exit() method of the System class

The Java security manager must allow the exit or halt operation.

To be continued…

From a static perspective, we distilled what bytecode is and how the JVM executes it using the runtime areas in a stacked manner (last in, first out) until the end of the java process.

In the next post, we are going to investigate the memory management system: the garbage collector and the JVM's multithreading characteristics.

References

This post is a summary of my understanding of the contents of the following references. I strongly recommend reading them if you want to go deeper into this subject. They are:

"Under the Hood of the JVM" Series

source:
[1] https://medium.com/@caique.me/8b10fae2a468
[2] https://naveen-metta.medium.com/8f5b98747486