DRLVM Jitrino Just-in-time Compiler

Revision History
About This Document
Overview
1. Key Features
2. About Compilation
Jitrino.OPT
1. Architecture
2. Processes
Jitrino.JET
1. Architecture
  1. Run-time Support
2. Processes
  1. Baseline Compilation
Utilities
Public Interfaces
1. JIT_VM Interface
2. JIT_EM Interface
References

Revision History

Version	Version Information	Date
Initial version	Intel, Nadya Morozova: document created.	September 4, 2006

About This document

Purpose

This document describes the internal structure of the Jitrino just-in-time compiler deployed with the virtual machine as part of the DRL (Dynamic Runtime Layer) initiative. The description covers the internal design of this JIT compiler and its interaction with other DRLVM components. In this document, you can find implementation-specific details of the Jitrino compiler. General information on the JIT role in overall virtual machine design and VM-level requirements are out of scope of this document and are covered in the DRLVM Developer's Guide supplied with the VM source code package.

Intended Audience

The document is targeted at DRLVM developers with special interest in code compilation algorithms. The information can be helpful for future development of DRL compilation techniques and can serve as an example for those implementing a JIT compiler from scratch. The document assumes that readers understand the concepts of just-in-time compilation and optimization algorithms.

Using This Document

The DRLVM just-in-time compiler description has the following major sections:

Overview: a definition of the JIT compiler component and its key features
Jitrino.OPT:
- Architecture: a description of the Jitrino.OPT internal architecture, its subcomponents and the interfaces it uses, as well as other implementation-specific data
- Processes: an overview and a step-by-step description of compilation, including optimizations and code generation
Jitrino.JET:
- Architecture: a description of the Jitrino.OPT internal architecture, its subcomponents and the interfaces it uses, as well as other implementation-specific data
- Processes: an overview and a step-by-step description of baseline compilation
Utilities: a description of JIT-specific utilities, such as the control flow graph, the timers, and the memory manager
Public interfaces: a definition of major functional groups that the JIT compiler exports for interaction with other components

Conventions and Symbols

This document uses the unified conventions for the DRL documentation kit.

Overview

Jitrino is the code name for the just-in-time (JIT) compiler [2] currently shipped with DRLVM. Jitrino comprises two distinct JIT compilers that share source code and are packaged in a single library:

Jitrino.JET baseline compiler translates Java* bytecode into native code with practically no optimizations. The compiler emulates operations of the stack-based machine using a combination of the native stack and registers.
Jitrino.OPT optimizing compiler can performs bytecode compilation with a variety of optimizations [7]. The compiler features two types of code intermediate representation (IR): platform-independent high-level IR (HIR) [8] and platform-dependent low-level IR (LIR). Jitrino.OPT incorporates an extensive set of code optimizations for each IR type. This JIT compiler has a distinct internal interface between the bytecode translator operating on HIR and the code generator operating on LIR. This enables easy re-targeting of Jitrino to different processors and preserving all the optimizations done at the HIR level.

This document describes both compilers and their operation. All references to Jitrino with no subtitle (JET or OPT) specified equally apply to both compilers.

Key Features

Key features of the JIT compiler include:

A clear interface to plug in different front-end components
A clear interface to plug in code generators for different platforms
High configurability via command-line options and a configuration file

Jitrino.OPT also features the following capabilities:

A two-level intermediate representation with most optimizations run at the platform-independent high level
The ability to plug in new optimization passes at both intermediate representation levels
A flexible logging system enables tracing of major Jitrino activities, including detailed IR dumps during compilation and run-time interaction with other DRL components on the per-thread basis, as well as gathering execution time statistics of compiler code at a very fine granularity level
Configurable self-check capabilities to facilitate bug detection

About the Compilation Process

Jitrino compilers provide means to compile and optimize code distributed for Java* run-time environments and to adapt it to various hardware architectures. Figure 1 demonstrates the architecture of the compilers and their interaction with the virtual machine.

Both Jitrino.JET and Jitrino.OPT compilers have a platform-independent Java front-end and a platform-dependent back-end. Compilation connects these and propagates type information extracted by the front-end from the original bytecode to the platform-specific back-ends. Supporting a new hardware platform requires implementation of a new platform-dependent back-end.

Jitrino can follow different code compilation strategies. The compilation process can be optimized for the smallest compilation time, for the best performance or for a compromise of affordable compilation time with reasonable performance. The compilation process can involve the Jitrino.JET baseline compiler, Jitrino.OPT optimizing compiler or both. In most applications, only a few methods consume the majority of time at run time, so that overall performance benefits when Jitrino aggressively optimizes these methods. The Execution Manager defines the actual compilation strategy.

The Jitrino.JET baseline compiler provides the fastest compilation time by translating Java* bytecode directly to native code. This compiler performs a very fast and simple compilation and applies almost no optimizations.

Jitrino.OPT, the main Jitrino compilation engine, provides the most optimized native code by the cost of greater compilation time. The compilation process of the Jitrino.OPT is also shown in Figure 1, with focus on the following:

The run-time environment bytecode is translated into the high-level intermediate representation (HIR) by the Java* bytecode translator and then optimized by the high-level optimizer. HIR and the optimizer make up the language- and platform-independent part of the Jitrino.OPT.
Note

The Jitrino architecture is modular, which facilitates implementation of more front-ends, such as the Common Language Infrastructure (CLI) bytecode front-end.
After optimization, a platform-specific code generator translates HIR into the platform-specific low-level intermediate representation (LIR). The code generator then performs platform-specific optimizations and register allocation over LIR, and finally emits native code.

This document describes the internal structure of the Jitrino.JET and Jitrino.OPT compilers and the processes running inside them.

JIT Architecture

Figure 1. Jitrino Compiler Architecture

Jitrino.OPT

Architecture

This part of the document describes the internals of the optimizing compiler Jitrino.OPT.

Pipeline Management Framework

The pipeline management framework (PMF) defines how the compilation process goes inside Jitrino.OPT. With PMF, the compilation process is represented as a pipeline, which is a linear sequence of steps. Each step stores a reference to an action object, its parameters and other information. Actions represent independent transformations of code, such as optimization passes. Different steps in a pipeline can reference to the same action, for example, to run the same transformation several times. Sequences of steps can vary between pipelines.

To select a pipeline for compiling a given Java* method, the system uses method filters consisting of class and method names and method signatures as the selection criteria. Each JIT instance has one common pipeline with an empty method filter that accepts all methods for compilation. Additionally, optional pipelines with unique and non-empty filter expressions can be created for compiling specific Java * methods sets.

Pipelines in Jitrino.OPT are configured using the VM properties mechanism. PMF parses properties, constructs pipelines and passes parameters to actions. The OPT compiler has no hard-coded pipeline, so you need to configure pipelines in EM configuration files or through VM properties. Understanding pipeline configuration rules is required for using the Jitrino command-line interface and effectively exercising the Jitrino logging system. For details on PMF internals, refer to the PMF Detailed Description.

Logical Components

This section defines the key parts of the compiler. This is only an abstract, logical division matching the key compilation stages. Each logical component includes action(s) that are used consecutively in compilation pipelines.

Translator

The bytecode translator is responsible for converting incoming bytecode instructions into a high-level intermediate representation [8]. This IR is of a lower level than the bytecode and breaks complex bytecode operations into several simple instructions to expose more opportunities to later high-level optimization phases. For example, loading an object field is broken up into operations that perform a null check of the object reference, load the base address of the object, compute the address of the field, and load the value at that computed address.

For details on the conversion process, see section Bytecode Translation.

Optimizer

The optimizer includes a set of optimizations independent of the original Java* bytecode and the hardware architecture. A single optimization framework for Java* and CLI programs is used. The optimizer performs a series of transformation passes to optimize the incoming high-level intermediate representation. For a description of applied transformations, see section High-level Optimizations.

After the high-level optimizations (HLO) are applied, the code selector translates the high-level intermediate representation to a low-level intermediate representation. The component is designed so that code generators for different architectures can be plugged into the compiler. To be pluggable, a code generator must implement code selector callback interfaces for each structural entity of a method, such as the whole method, basic blocks, and instructions. During code selection, the selector uses the callback interfaces to translate these entities from HIR to LIR. See section Code Selection for details on the process.

Code Generator

The code generator (CG) is responsible for generation of machine code out of the input high-level intermediate representation. CG accepts the HIR information via the code selector callback interfaces. For details on how the resulting code is produced, see section Code Generation.

The code generator also performs several auxiliary operations, such as:

Creation of a data area with constants used in code
Generation of auxiliary structures necessary for run-time support, such as the stack layout description, the GC map and registers
Registration of exception handlers in VM
Registration of direct calls for patching

Internal Representations

An intermediate representation (IR) is an internal compiler representation for code being compiled. Jitrino.JET has no intermediate representation of code and directly compiles bytecode into the native code. Jitrino.OPT uses two IR forms: the high-level intermediate representation (HIR) and the low-level intermediate representation (LIR). To compile a method's code, the Jitrino.OPT compiler translates Java* bytecode into a graph-based structure with nodes, edges and instructions. The nodes and edges in the graph denote the control flow of the program. Every node in the graph is populated with instructions that denote the primitive operations.

Example

Here is an example of corresponding Java* code, Java* bytecode and the low-level intermediate representations used in Jitrino.OPT:

Java* code:

    public static int max(int x, int y) {
        if (x > y) {
            return x;
        }
        return y;
    }

Java* bytecode:

public static int max(int, int);
  Code:
   0:   iload_0
   1:   iload_1
   2:   if_icmple       7
   5:   iload_0
   6:   ireturn
   7:   iload_1
   8:   ireturn

Jitrino high-level intermediate representation of code:

HIR representation of code - example

Jitrino low-level intermediate representation of code:

LIR representation of code - example

Both HIR and LIR use a common Control Flow Graph structures and its algorithms; see section Control Flow Graph for the details. This section describes the two intermediate representations currently used in Jitrino in greater detail.

High-Level IR

The Jitrino high-level intermediate representation (HIR) is a platform-independent representation of the code being compiled. In HIR, each basic block node consists of a list of instructions, and each instruction includes an operator and a set of operands. HIR supports a single static assignment (SSA) form where each operand has exactly one assignment. The SSA form provides explicit use-def links between operands and their defining instructions, which simplifies and speeds up high-level optimizations. Each HIR instruction and each operand have detailed type information propagated to the back-end at further compilation stages.

The compiler also maintains dominator and loop structure information on HIR for use in optimization and code generation.

Low-level IR: Jitrino low-level intermediate representations (LIR) are specific for code generators implementing them. The specifics of the Jitrino IA-32/Intel® 64 CG LIR is that unlike HIR, it does not support SSA form and is designed to be very close to the IA-32 and Intel® 64 architectures.

Processes

This part of the document describes the key processes that go inside the Jitrino optimizing compiler.

Bytecode Translation

The initial compilation step is the translation of bytecode into HIR, which goes in the following phases:

The bytecode translator establishes the basic block boundaries and exception handling regions, and infers type information for variables and operators. At this phase, the translator generates type information for variables and virtual Java* stack locations, similarly to the bytecode verification algorithm described in the JVM specification [1].
The bytecode translator generates HIR and performs simple optimizations, including constant and copy propagation, folding, devirtualization and in-lining of method calls, elimination of redundant class initialization checks, and value numbering-based redundancy elimination [3].

High-level Optimizations

High-level optimizations are platform-independent transformations performed by the optimizer. The optimizer applies a set of classical object-oriented optimizations balancing the effectiveness of optimizations with their compilation time. Every high-level optimization is represented as a separate transformation pass over HIR. Each Jitrino.OPT optimization aims at one or more goals, as follows:

Scope enhancement extends the scope of other optimizations.
Redundancy elimination gets rid of redundant operations, which can be removed without changing code semantics.
HIR simplification cleans up the intermediate representation between passes.

Optimization Modes

The Jitrino high-level optimizer supports various optimization modes, which differ by the optimization path and profile used to optimize the code. Different optimization modes are customized for different application types: client applications usually require fast startup time and reasonable response time, whereas server applications require top-level performance in the long run. A particular optimization mode is defined by the following:

The optimization path - a part of the compilation pipeline representing the set of optimization passes performed during compilation of a method. The default optimization path set in the configuration file is based on Java* properties. You can override the default settings in the configuration file or on the command line.
The profile - a static or a dynamic profile that a JIT compiler can generate and use. Currently, Jitrino.JET provides the method hotness/backedge profile, and Jitrino.OPT provides the edge profile. For a description of profiles and profile-related processes, see the EM component description.

Several pre-defined Jitrino optimization modes are stored in the execution manager configuration files, as follows:

The client mode uses the dynamic method hotness/backedge profile and is tuned for fast application startup and reasonable performance. This is the default optimization mode.
The server mode is tuned for decent performance of long-running server applications. This mode is represented in two variants to play with: the dynamic server mode using the dynamic edge profile and the static server mode using the profile based on heuristics.

You can define the profile to use on the command line. For example, to set JIT to use the server dynamic mode, specify the following option:

-Xem:server

This section defines all optimizations that are currently available in the Jitrino.OPT compiler. Related optimizations are gathered in groups, as follows:

Scope Enhancement Passes

The high-level optimization begins with a set of transformations to enhance the scope of further optimizations, as follows:

Guarded devirtualization (devirt) of virtual method calls reduces their run-time cost and enables the compiler to inline their targets.
Provided exact type information, this optimization can convert a virtual call into a more efficient direct call. When no type information is available, the most probable target of the virtual method can be predicted, and the optimization devirtualizes the call by guarding it with a cheap run-time class test to verify that the predicted method is in fact the target.
Inlining (inline) removes the overhead of a direct call and builds the code of the called method into the code of the caller in place of its call site. Inlining is an iterative process involving other optimizations. Inlining goes as follows:
- The inliner selects candidates for inlining in the following sequence:
  - Examines each direct call site in the IR form, including those exposed by guarded devirtualization.
  - Heuristically estimates the potential benefit of inlining.
  - Checks whether the benefit exceeds a certain threshold, and, if it does, registers the call in a priority queue.
- The inliner selects the top candidate, if any, for inlining.
- The translator generates an intermediate representation for the method selected for inlining.
- The optimizer runs over HIR of the method using the inliner pipeline.
- The inliner finds further inline candidates, if any, in the analyzed representation and replicates it in the representation of the caller.
- The inliner selects a new inline candidate from the queue and repeats the cycle.
  The inliner stops its work when the queue is empty or after code IR reaches a certain size limit.
The example below illustrates the inlining algorithm.
```
Inline(HIR_of_compiled_method) {
    current_bytecode_size = HIR_of_compiled_method.get_method().bytecode_size()
    find_inline_candidates(HIR_of_compiled_method)
    while (true) {
        callee = NULL
        while (!inline_candidates.empty()) {
            callee = inline_candidates.pop()   
            callee_bytecode_size = callee.bytecode_size()
            if ((current_bytecode_size + callee_bytecode_size) < SIZE_THRESHOLD) {
                current_bytecode_size += callee_bytecode_size
                break;
            }
        }
        if (callee = NULL) {
            break;
        }
        HIR_of_callee = Translator.translate(callee)
        Optimizer.optimize(HIR_of_callee, inliner_pipeline)
        find_inline_candidates(HIR_of_callee)
        HIR_of_compiled_method.integrate(HIR_of_callee)
    }
}

find_inline_candidates(method_HIR) {
    foreach direct_call in method_HIR {
        inline_benefit = compute_inline_benefit(direct_call)
        if (inline_benefit > BENEFIT_THRESHOLD) {
            inline_candidates.push(direct_call)
        }
    }
}
```
Lowering (lower) performs basic instruction-level transformations to replace common helper calls with the corresponding HIR code. A helper call generally is performance-expensive, so that inlining the operation performed by a helper method can improve performance. This is especially true for operations that are proved to be redundant afterwards.

Redundancy Elimination Passes

This set of optimizations aims at eliminating redundant and partially redundant operations. If JIT can prove that some operations are redundant and have no side effects, they might be removed from the code. This way, time for execution of the redundant operations is saved and the resulting code executes faster. This optimization group consists of the following passes:

Memory optimization (memopt) reduces the number of operations with memory by removing redundant loading and storing instructions.
Firstly, memopt works on the SSA form to combine all locations of an object into one alias. After that, the optimization updates use-def dependencies with the alias instead of locations. According to these new dependencies, memopt deletes redundant stores. Finally, it performs scoped hash-value numbering on the resulting control flow graph to eliminate redundant load operations.
Lazy exceptions optimization (lazyexc) eliminates redundant creation of exception objects. In cases when an exception object is not used in the exception handler, time spent on creating the exception object and creating and recording the stack trace in the exception object is wasted. If the constructor of the exception object has no side effects and the exception object is not used before it is thrown, then the creation of the exception object is delayed until the exception object is really used.
Loop-oriented optimizations are the following:
- Loop peeling moves one or more iterations to the loop header to reduce the looping overhead for a small loop count and to enable optimizations in peeled iterations.
- Load invariant hoisting moves operations that are invariant across loop iterations outside the loop.
- Loop unrolling expands the loop body by combining several iterations into one to reduce the loop overhead and to expand the scope for optimizations in the loop body.
Array-bounds check elimination (abcd) analyzes method code and removes redundant checks of array bounds. Normally, these checks identify situations when a program tries to access an element beyond the array bounds, and throw ArrayIndexOutOfBoundsException. The JIT compiler inserts such checks before every access to an array element and some of these checks are redundant. [5].
Global code motion (gcm) moves computational instructions between basic blocks. The goal is to move each movable instruction to the basic block with minimal probability of execution. Probabilities are provided by a profile based on static heuristics or on run-time execution. To preserve semantics, only instructions without side effects are considered movable. Instructions can be moved up and down the dominator tree.

HIR Simplification Passes

HIR simplification passes are a set of fast optimizations that the Jitrino optimizer performs several times over the intermediate representation to reduce its size and complexity. Simplification passes improve code quality and efficiency of more expensive optimizations. HIR simplifications are often grouped in a series of simplification passes to be performed at various points in the optimization path.

Unreachable code elimination (uce) detects and removes unreachable code by traversing the control flow graph.
Dead code elimination detects and removes dead code by using a sparse liveness traversal over use-def links of the SSA form [3].
Simplification (simplify) includes the following:
- Simplification of arithmetic expressions
- Copy and constant propagation
- Constant folding
- Subexpression re-association
- Simplification of trivial branches and calls [3]
Hash value numbering (hvn) eliminates common sub-expressions [4]. This pass uses an in-order depth-first traversal of the dominator tree instead of the more expensive iterative data flow analysis. High-level value numbering effectively eliminates redundant address computation and check instructions. For example, chkzero(), chknull(), and chkcast() HIR instructions are redundant if guarded by explicit conditional branches.

Static profile estimator (statprof)

Many optimizations can use the edge profile information for greater efficiency. When the execution manager is configured to use a dynamic profiling mode, the profile is gathered by the JIT. But even in static mode, when a dynamic profile is not available, Jitrino.OPT can use the statprof optimization pass to update HIR with a profile based on heuristics. In the dynamic profiling mode, some optimizations may break profile information by changing the CFG structure. In this case, statprof can be used to fix the profile information and keep it consistent.

Code Selection

After the optimization passes, HIR is translated to LIR. This code selection (CS) is based on the HIR hierarchical structure of the compiled method, as shown in Figure 2.

Code Selector work flow

Figure 2. Code Selector Framework

Where:

White boxes indicate parts of the code selector. Nesting of boxes reflects the hierarchy of elements, see details below.
Yellows boxes with dashed borders indicate IR entities analyzed or created by the corresponding code selector parts.
Arrows indicate IR element conversion via callback interface calls.

For the method, the set of operands of multiple definitions, the control flow graph, and the set of CFG basic block nodes, the code selector framework defines the following:

An abstract class representing a code selector of the HIR element, and the default implementation of the abstract class on the optimizer side. This part of the implementation is responsible for HIR traversal.
A callback interface to transform direct sub-elements of the analyzed entity from HIR to LIR. The callback is implemented on the code generator side. This part is responsible for LIR creation

Thus, the CS framework establishes a well-defined boundary between the optimizer and a pluggable code generator. The code selector framework also enables a structural approach to IR conversion, which CG can override at several levels.

Figure 3 shows the process of code selection, with loops highlighted using the yellow color.

Sequence of code selection with objects and method calls shown

Figure 3. The Code Selection Sequence Diagram

Figure 3 illustrates specifics of the conversion process, as follows:

All instances of the code selector (CS) classes responsible for HIR traversal are created on the optimizer side.
All instances of the callback interface implementations responsible for building LIR are created on the CG side.
The top-down IR conversion is performed as follows:
- Variable CS traverses all HIR operands with multiple definitions and calls the Variable CS callback for each such operand to create the corresponding LIR operand.
- CFG CS traverses all HIR nodes and edges and calls methods from the CFG CS callback to create the corresponding LIR nodes and edges.
- For each basic block node, CFG CS creates an instance of Basic Block CS, which then calls methods of the Basic Block CS callback to translate HIR instructions from the basic block to LIR. Basic Block CS is also known as the Instruction Selector.

Code Generation

The code generation process is specific to the pluggable code generator implementing it. This section briefly describes the current implementation of Jitrino IA-32/Intel® 64 code generator, as well as measures taken to ensure that it is thread-safe.

To generate code for a method, the code generator performs a number of steps that are roughly divided into the following stages:

LIR Creation

At this stage, the code generator creates the LIR corresponding to the input HIR in its implementation of the code selector callback interfaces. The resulting LIR is quite compact and possesses the following properties:

Most 2-operand instructions are generated in the extended 3-operand form with a separate destination operand.
Each call site is represented as a single instruction without explicit stack creation for callee arguments.
64-bit integer arithmetic is represented by pseudo-instructions (macros)
Address arithmetic is mostly explicit without usage of the complex address forms
Most operand copy instructions are represented by pseudo-instructions which do not impose any constraints on its operands

LIR Transformations

At this stage, the code generator performs a number of transformations and optimizations over LIR, as follows:

Inserts yield points at back branches of certain kinds of loops to enable safe thread suspension.
Performs the first pass of GC safe-point analysis, which transforms code to ensure correct GC map creation at the end of the code generation process regardless of in-process transformations.
Folds address arithmetic into complex address forms as needed.
Expands pseudo-instructions for 64-bit integer arithmetic to real native instruction sequences with some optimizations applied.
Translates LIR instructions from the extended 3-address form to the native 2-address form.
Analyses instruction and calling convention constraints. Based on analysis results, the code generator splits operands so that each operand satisfies the constraints of the instructions where it is used.
Performs global register allocation to assign most frequently used operands to general-purpose or XMM registers as needed.
Performs local register allocation and spill-code generation on each basic block taking into account instruction constraints. This pass ensures that all operands are assigned to physical locations, in a register or on the stack. This pass can produce correct code with no prior global register allocation.
Linearizes CFG basic blocks according to profile information, if any.
Expands copy pseudo-instructions to real native instruction sequences. Copies of stack operands with non-overlapping live ranges are coalesced.
Goes over the stack layout to assign offsets to stack operands and to create the stack layout description.

The actual code generation process can also include different optimization passes, such as constant and copy propagation, dead code elimination, and redundant comparison elimination. Optimizations are enabled via EM configuration files and the command-line interface.

Code and Data Emission

At this stage, the code generator does the necessary preparations and translates LIR into machine code, as follows:

Generates all required binary chunks from LIR and links the generated code to VM for further run-time support. Specifically, the code generator does the following:
1. Creates a constant data area with switch tables, floating point constants, and other data that might be needed for CG debugging features.
2. Links LIR to VM data structures and the constant data area.
3. Translates LIR into machine code.
4. Registers direct calls to other managed code to enable patching in case the target of a direct call is recompiled later.
5. Registers try blocks and corresponding exception handlers with VM.
6. Registers information about inlined methods with VM.
Updates the stack layout description with additional stack information, such as stack depth bound to offsets of CALL instructions.
Creates the GC map to describe root set locations for each GC safe point.
Note

Only call sites are considered GC safe points in the current implementation
Writes the stack layout description, the GC map, and the bytecode map into the memory chunk associated with the compiled method. These data are further used at run time for the following:
- The stack layout description is used in stack unwinding for exception handling, GC root set enumeration, and other stack iteration operations.
- The GC map is used for root set enumeration by a precise garbage collector.
- The bytecode map is used for mapping between native code and Java* bytecode.

Global Lock

Because memory allocation routines are not thread-safe in the current VM implementation, Jitrino sets a global lock for the code generation stage to ensure correct allocation of memory for compiled method data. The global lock must be taken into account when working in a multi-threaded environment, for example, when compilation of a method starts simultaneously in several threads. The global lock is shared between Jitrino.JET and Jitrino.OPT and ensures that only a single thread tries to allocate memory for a method at once. The lock is taken in the lock_method Action object and released in the unlock_method Action object.

The lock_method action also checks whether a code block is already allocated by the current JIT instance for the method being compiled. If the code block is already allocated, the method has already been compiled in another thread. In this case, the lock_method action does not place the lock, but stops compilation with the COMPILATION_FINISHED status. The action unlock_method releases the lock taken by the lock_method action.

The global lock imposes the following requirements:

No Action object in the code generator can stop compilation with the COMPILATION_FINISHED or COMPILATION_FAILED condition. Otherwise, the lock remains set and blocks method compilation in other threads.
Resources with the live time equal to the method’s life time must be allocated only in the code generator, and not in the optimizer.
Code generation actions must not invoke VM methods that might lead to execution of Java code (for example, resolve_static_method); otherwise, the action might lead to a deadlock.

JITRINO.JET

Architecture

The Jitrino.JET baseline compiler is the Jitrino subcomponent used for translating Java* bytecode into native code with practically no optimizations. The compiler emulates operations of stack-based machine using a combination of the native stack and registers.

Run-time Support

During the code generation phase, the state of the method's operand stack is mimic. This state helps to calculate the GC map, which is used later at run time to support GC operation.

The GC map shows whether the local variables or the stack slots contain an object. The GC map for local variables is updated on each defining operation with a local slot, as follows:

If an object is stored, the appropriate bit in the GC map is set (code is generated to set the bit).
If a number is stored, the appropriate bit gets cleared.

The GC map for the stack is updated only at GC points, that is, before an instruction that may lead to a GC event, for example, a VM helper call. The stack depth and the stack state calculated during method compilation get saved before invocation: code is generated to save the state. The state is saved into the special fields that are pre-allocated on the native stack of the method. These fields include GC information, namely the depth of operand stack, the stack GC map, and the locals GC map.

Additionally, Jitrino.JET prepares and stores a specific structure, the method info block, for each method during compilation. This structure is later used to support run-time operations, such as stack unwinding and mapping between bytecode and native code.

Processes

Baseline Compilation

Baseline compilation is the process of compiling code with minimal optimization. The Jitrino.JET subcomponent performs this operation as described below.

Jitrino.JET performs two passes over bytecode, as shown in Figure 4. The compiler establishes basic block boundaries during the first pass, and generates native code during the second.

Example of two-pass compilation process

Figure 4. Baseline Compilation Path

Subsequent sections provide a description of these passes.

Pass 1

During the first pass over bytecode of a method, the compiler finds basic block boundaries and counts references for these blocks.

Note

The reference count is the number of ways for reaching a basic block (BB).

To find basic blocks boundaries, Jitrino.JET does a linear scan over the bytecode and analyses instructions by using the following rules:

Instructions athrow, return, goto, conditional branches, switches, ret, and jsr end a basic block.
Basic block leader instructions immediately follow the instructions ending a basic block or serve as targets for branches. Exception handler entries are also among the basic block leaders.

During the first pass, the compiler also finds the reference count for each block.

Example

Figure 4 illustrates an example with reference counts. The reference count ref_count for the second basic block (BB2) is equal to 1 because this block can only be reached from the first basic block (BB1). The other reference count is equal to 2, because the third basic block can be reached as a branch target from BB1 or a fall-through from BB2.

Example of reference counters reached from different basic blocks.

Figure 5. Reference Count for Basic Blocks

Jitrino.JET uses the reference count during code generation to reduce the number of memory transfers.

Pass 2

During the second pass, Jitrino.JET performs the code generation, as follows:

Walks over the basic blocks found at Pass 1 in the depth-first search order
Generates code for each bytecode instruction and mimics the Java* operand stack
Matches the native code layout and the bytecode layout
Updates relative addressing instructions, such as CALL and JMP instructions.

For details on the implementation of baseline compilation, generate reference documentation from the source code by using Doxygen.

Utilities

The JIT compiler relies on the following utilities:

The memory manager minimizes the number of calls to system heap allocations and automatically frees all allocated memory in the destructor.
A set of Standard Template Library (STL) containers use the memory manager as their allocator class.
Timers gather compilation and execution statistics.
The logging system support diagnostics inside the compiler.
The control flow graph represents the flow of a method and give basis for IR forms.
Internal Profiler of the code generator instruments code so that per-method counters of executed instructions can be dumped.

Note

The JIT compiler utilities are similar to, but not identical with the VM utilities. For example, the JIT compiler and the VM core use different loggers.

Memory Manager

In the Jitrino.OPT compiler, memory allocation is done using custom memory manager routines. This mechanism ensures that all memory allocated during a compilation process is freed after the compilation is finished. In addition, the memory manager decreases the number of system calls by using the fast thread-local memory allocation algorithm. Memory manager code and operators for overloaded memory allocation are in .h and .cpp files in the jitrino/src/shared/ directory.

To start using the memory manager, a JIT compiler developer must create an instance of it providing the initial heap size and the name to be used for logging.

The memory manager allocates memory from the operating system in large chunks called arenas. The minimal size of an arena used in MemoryManager is 4096 bytes. When the JIT compiler requests to allocate memory for an object, the memory is taken from the current arena with no system calls. When the current arena does not have enough free space, the memory manager allocates another arena.

Here is a typical pattern for using the MemoryManager class:

void optABC() {
    //the temporary memory manager used for optABC optimization data
    MemoryManager tmpMM(10000, "mm::optABC");

    StlVector<int> myData1(tmpMM, 1000);
    int* myData2 = new (tmpMM) int[1000];
    //JIT compiler code follows
}

The memory allocated with the memory manager is de-allocated in its destructor and no destructors are called for objects allocated with the memory manager. This feature of the memory manager enforces the following rules upon JIT compiler code:

Never allocate MemoryManager using another memory manager. Otherwise, the memory of MemoryManager is never freed.
Mix objects allocated with different memory managers carefully. Lifetime of such objects can be different.
Destructors of objects allocated with MemoryManager are never called. Leave the destructors empty.
To avoid out-of-memory errors, remember that the memory allocated with MemoryManager is de-allocated only when MemoryManager is destroyed.

Jitrino.OPT has two dedicated memory managers:

The global memory manager created when Jitrino is initialized. This memory manager is used with objects having the same lifetime as the JIT compiler. See jitrino/src/main/Jitrino.cpp file and global_mm static field for details.
The compilation time memory manager created every time the compilation process starts. This memory manager allocates objects with the lifetime equal to compilation time, such as instructions, nodes, edges and other structures related to the compilation context.

Using MemoryManager, you might not get system notifications on memory corruption.

Example

Memory corruption can happen when a value is stored to the array by the index that is out of the array's range:

    MemoryManager tmpMM(10000, "myMM");
    int* myData2 = new (tmpMM) int[10];
    myData[10] = 1;

This code is executed successfully because the default memory chunk allocated by the memory manager is greater than the array size.

To enable the checking of memory corruption errors, define the JIT_MEM_CHECK macro in the MemoryManager.cpp file. After this macro is defined, the memory manager fills all the arena's space with the predefined value and adds the padding space between objects. Every time an object is allocated, the memory manager checks these predefined values in the arena. If a write operation has been performed in the restricted area, the memory manager reports an error.

Counters and Timers

Jitrino maintains counters to collect statistics. A counter can be used in any Jitrino action to count a particular event in all pipelines and during the whole VM session. Each counter has a name to distinguish it from other counters.

To sum up execution times of a Jitrino action, Jitrino also provides timers, a specialized form of counters. To activate counters and time measurement, use the following command syntax:

-XDjit.<JIT>.arg.time=on

Note

This option is off by default.

The execution time of all instances of each action is measured independently and summed up at VM shutdown. Resulting data on action execution times are printed into a table and sorted by the action name.

Note

Currently, to print the action execution times and counter values tables, you need to specify the following VM command-line option:

–XcleanupOnExit

Logging System

The Jitrino logging system does the following:

Provides diagnostic output of important Jitrino internal data structures
Supports user-defined diagnostic
Provides a flexible control over a diagnostic process via command-line options.

The logging system is an integral part of Jitrino PMF. Logging consists of two interfaces:

The program interface that provides stream output methods
The command-line interface that provides filtration of output logging streams

The logging system is based on streams. Each stream has a name used to address it in a program and command-line options. Jitrino provides several frequently used streams with predefined names. These streams produce specific output when enabled, as follows:

Name	Output
`info`	The protocol of compilation: JIT and pipeline names, the method name and number, and so on
`rt`	Run-time output not related to a compiled method
`ct`	Compile-time diagnostic
`irdump`	The dump of internal Jitrino structures for a compiled method
`dotdump`	The dump of internal Jitrino structures in the .dot format
`dbg`	Debug information

The general syntax of the logging command follows PMF rules:

-XDjit.<JIT>.<pipeline>.arg.<path>.log=<list of stream names>

In this command syntax, <path> represents the set of Jitrino actions, for which the stream is enabled. When no path is specified, the command applies to all existing actions.

Example

To enable compilation protocols for all pipelines of all JITs, type

-XDjit.arg.log=info

To dump compile-time diagnostic together with IR dumps, type

-XDjit.arg.log=ct,irdump

CFG Visualization

Debugging the JIT requires information on the compilation inter-stage modification of the control flow graph for the compiled method, including instructions and operands. For that, the Jitrino compiler enables generation of dot files representing the control flow graph at both IR levels. The text .dot files can be converted into descriptive pictures, which represent the CFG graphically; see section Internal Representations for an example of graphically visualized code. A variety of graph visualization tools are available for dot files conversion, such as Graphviz [6].

To enable dumping .dot files, use the following command:

-XDjit.arg.log=dotdump

For more details on the Jitrino logging system, refer to the corresponding section in the PMF description.

Control Flow Graph

The high-level and low-level intermediate representations use a unified basis structure to represent the logic structure of a compiled method, the control flow graph (CFG). This unification enables Jitrino to avoid duplication of code in its internals, reduce code size and improve quality of produced code.
The current CFG implementation is located in jitrino/src/shared/ControlFlowGraph .h and .cpp files. These files contain core structures and algorithms and can be directly re-used and extended with custom functionality.

The goal of the control flow graph implementation is to provide the following:

The base structures and routines to build control flow graph: nodes, edges, and instructions
Implementations of various algorithms: nodes ordering, purging unreachable or empty nodes, and so on
Extensibility of the structures representing the control flow

The control flow graph supports two types of control flow:

The conventional control flow represents transfer of control due to jumps and conditional branches of the code being compiled. The conventional control flow is modeled via the edges between basic block nodes.
The exceptional control flow reflects transfer of control due to thrown and caught exceptions. This job is modeled via the dispatch nodes: a thrown exception is represented by an edge from a basic block node to a dispatch node, and a caught exception is represented by an edge from a dispatch node to a block node.

Because IR can represent the exceptional control flow, the optimizer and code generators take it into account and optimize exceptions and exception handlers. Explicit modeling of the exception control flow in the control flow graph enables the compiler to optimize across throw-catch boundaries. For locally handled exceptions, the compiler can replace expensive throw-catch combinations with cheaper direct branches.

CFG Structures

The CFG structures are nodes, edges and instructions represented as Node, Edge and CFGInst classes respectively.

Subclassing

All CFG classes can be subclassed by user code. The CFGInst class has pure virtual functions, so it must be subclassed before use. Node and Edge classes can be subclassed and extended with arbitrary user data, except a limited set of node and edge types that must not be extended.

Nodes

CFG uses the following kinds of nodes:

Block nodes are the usual nodes with user code translated into IR.
Dispatch nodes represent exception throwing paths.
Exit nodes are for method exits.

The Exit node is an artificial node that post-dominates all exceptional and normal exits from the method. A graph always has only one Exit node that represents all exceptional and normal exits from the method.

In addition to that, CFG has dedicated block and dispatch nodes. The Entry block node marks the start of a method and dominates over all nodes in the graph. The optional Return block node post-dominates all normal paths, whereas the optional Unwind dispatch node - all exceptional paths leading out of the method. The Return and Unwind nodes always have only one outgoing edge that always points to the Exit node.

Edges

An edge connects two nodes and its kind is calculated at run time depending on the kinds of nodes it connects, as follows:

Dispatch edges connect a block or dispatch node with another dispatch node.
Only one dispatch edge per block node is allowed.
Unconditional edges connect two block nodes and depict an unconditional transition from one block node to another, as with unconditional direct and indirect jumps, control transfer in the switch statement or fall-through transition. The edges that connect Return and Unwind nodes with the Exit node are also treated as unconditional.
True and False edges connect two non-empty block nodes and represent the conditional flow or branches. To detect whether an edge is True or False, CFG requests the last instruction in the current block about the edge kind.
Catch edges connect dispatch and block nodes. These edges represent paths where exceptions are caught and normal execution of user code resumes. The kind of the exception caught by this class is specific for the IR implementation.

Instructions

Every node keeps a linked list of instructions that provide edge type information for outgoing edges of the node. The instructions interface also has functions to provide information about positioning an instruction in a node that can be checked by CFG algorithms during runtime. Instructions are subclasses of the CFGInst class.

When the edge type information is unclear in the current context, CFG gets this information from the last instruction of a node. For example, for branches with two or more edges connecting block nodes, the edge type is always requested from the last instruction in the block. When an edge is removed or retargeted, the last instruction in the block is notified to track the CFG changes. On the contrary, for an edge that connects block and dispatch node, it is clear that the edge is of the Dispatch type and no request is sent.

Positioning information for an instruction is used to track possible errors when an ordinary instruction is prepended to a node before a .h critical instruction. Examples of .h critical instructions are exception catches, method prologues instructions and label instructions used in HIR to mark nodes.

Graph Algorithms

The current CFG implementation provides the following graph algorithms:

Ordering of nodes: post- and pre-order of nodes, tracking graph modifications to enable caching of ordering information
Managing nodes: removing empty and unreachable nodes, merging adjacent nodes
Propagating the execution count using edge probabilities
Sub-graph inlining
Accessors for nodes and edges
Dominator and loop information

For details on individual algorithms, generate documentation from the ControlFlowGraph.h file using Doxygen.

Dominator Tree

A dominator tree represents (post)dominance information computed for a control flow graph. The tree can be used to query or navigate dominance relationships of nodes. The root of the tree is the DominatorNode of the entry or exit node in the graph, and the parent of any dominator node other than the root is the immediate dominator or post-dominator.

Dominator tree source files are jitrno/src/shared/DominatorTree .h and .cpp files.

Note

A dominator tree is invalidated when the underlying CFG is modified. The dominator tree can still be queried and navigated, but may no longer reflect the current state of the control flow graph.

Loop Tree

A loop tree contains information about loops in control flow graph. It can be used to query loop information for every node in a control flow graph. The loop tree is represented with LoopTree and LoopNode classes. Every LoopNode instance has an associated node from the graph and represents a header of a loop. All child nodes of the LoopNode are loop headers of the nested loops. The root node of the loop tree is an artificial node used to join method’s top-level loops and its child nodes.

The loop tree sources are located in jitrino/src/shared/LoopTree .h and .cpp files.

Note

The loop tree is invalidated when the underlying CFG is modified. You cannot navigate the loop tree after CFG modification.

Public Interfaces

This section describes the interfaces that the JIT compiler exports to communicate with other components. Jitrino exposes all necessary interfaces to work as a part of the run-time environment. Jitrino explicitly supports precise moving garbage collectors requiring the JIT to enumerate live references.

JIT_VM Interface

This interface comprises functions that the JIT compiler exports for communication with the virtual machine core component. Functions inside the JIT_VM interface can be grouped into the following categories:

Bytecode Compilation

Functions in this set are responsible for the primary JIT compiler task of running just-in-time compilation to produce native executable code from a method bytecode. A request to compile a method can come from the VM core or the execution manager. For details on individual functions of this interface, generate documentation for the file vmcore/include/jit_export.h using Doxygen.

Root Set Enumeration

This set of functions supports the garbage collector by enumerating and reporting live object references. The JIT compiler provides these functions to report locations of object references and interior pointers that are live at a given location in the JIT-compiled code. The object references and interior pointers constitute the root set that the GC uses to traverse all live objects. The interface requires reporting locations of the values rather than the values, to enable a moving garbage collector to update the locations while moving objects.

Note

Unlike reference pointers that always point to the object’s header, interior pointers actually point to a field that is inside the target object. If the JIT reports an Interior Pointer without the Reference Pointer, then the burden is upon the GC to actually reconstruct the Reference Pointer.

For more information on GC related activities, see sections Root Set Enumeration and Garbage Collector section in the Developer's Guide document. For details on individual functions of this interface, generate documentation for the file vmcore/include/jit_export_rt.h using Doxygen.

Stack Unwinding

The virtual machine requires support from the compiler to perform stack unwinding, that is, an iteration over the stack from a managed frame to the frame of the caller.

To facilitate stack walking, the JIT stack unwinding interface does the following:

Updates the JIT stack frame context reflecting an iteration of stack walking. Different JIT compilers can implement different physical stack frame layouts and the JIT is responsible for proper interpretation of them.
Reports the current location of a given pointer to allow VM to remove monitors for synchronized non-static methods during exception handling.
Updates the JIT frame context to reflect the stack state at a catch handler.

For more information about the stack, see section Stack Support in the Developer's Guide.

Note

Root set enumeration and stack unwinding are run-time routines called only during execution of compiled code.

JVMTI Support

The set of JIT functions responsible for JVMTI support is exported for interaction with the VM JVMTI component. These functions do the following:

Process requests about JIT capabilities for JVMTI support.
For profiling and debugging, the JVMTI support component in the VM core requires the JIT compiler to have a set of capabilities. The VM core can query the JIT about supported capabilities.
Provide bytecode to native code mapping to identify native and bytecode locations for the compiled method.
Access and modify local variables in compiled code.

The VM core can request the JIT to compile a method and to support generation of specific JVMTI events in compiled code. To facilitate these actions, additional parameters are passed to the bytecode compilation interface.

For a description of functions that the VM core exports to interact with the JIT compiler, see VM Core Public Interfaces in the Developer's Guide. For details on the functions making up each interface group, generate documentation from the files vmcore/include/jit_export_jpda.h and vmcore/include/jit_export.h using Doxygen.

JIT_EM Interface

The JIT compiler exports this interface to support the execution manager. Functions of this set are responsible for the following operations:

Compilation interface enabling the EM to request compilation of a method
Profile interface enabling the EM to request the JIT to collect or to use the specified profile type

For a description of the functions that the execution manager exports to interact with the JIT compiler, see section Public Interfaces in the Execution Manager description. For details on the functions that the JIT exports for EM, generate documentation from the include/open/ee_em_intf.h file using Doxygen.

References

This section lists the external references to various sources used in Jitrino documentation, and to standards applied to the Jitrino implementation.

[1] Java* Virtual Machine Specification, http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html

[2] JIT Compiler Interface Specification, Sun Microsystems, http://java.sun.com/docs/jit_interface.html

[3] S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, CA, 1997.

[4] P. Briggs, K.D., Cooper and L.T. Simpson, Value Numbering. Software-Practice and Experience, vol. 27(6), June 1997, http://www.informatik.uni-trier.de/~ley/db/journals/spe/spe27.html

[5] R. Bodik, R. Gupta, and V. Sarkar, ABCD: Eliminating Array-Bounds Checks on Demand, in proceedings of the SIGPLAN ’00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000, http://portal.acm.org/citation.cfm?id=349342&dl=acm&coll=&CFID=15151515&CFTOKEN=6184618

[6] Graphviz, Graph Visualization Software, http://www.graphviz.org/

[7] Detailed list of optimizations performed by Jitrino.OPT, Harmony WiKi

[8] Descriptions of High-Level IR instructions of Jitrino.OPT, Harmony WiKi

* Other brands and names are the property of their respective owners.