Apache Harmony has been retired at the Apache Software Foundation since Nov 16, 2011.

The information on these pages may be out of date, or may refer to resources that have moved or have been made read-only.
For more information, please refer to the Apache Attic.

Encoder Library Description for IA-32/Intel64

  1. Revision History
  2. About This Document
  3. Overview
  4. Goals and Targets
  5. Structure
  6. How It Works
    1. Usage Model
    2. Under the Hood
      1. Fast Opcode Lookup
      2. Fast Code Generation

Revision History

Version          Version Information                                           Date
Initial version  Alexander Astapchuk, Svetlana Konovalova: document created.   January 30, 2007

About This Document

This document introduces the encoder library component delivered as a part of the DRL (Dynamic Runtime Layer) initiative. This document focuses on the specifics of the current implementation showing the encoder library structure and role inside the DRL virtual machine.

Overview

The encoder library is a DRLVM component for code generation, or encoding. This library is separate, static and mostly independent from other components. The following components use the encoder library:

Goals and Targets

The encoder library meets the following requirements:

Structure

The encoder library includes the following modules:

The encoder library consists of the following files located at vm/port/src/encoder/ia32_em64t:

Filename      Description
dec_base.cpp  decoding routines
dec_base.h    decoding routines declarations
enc_base.cpp  base encoding engine
enc_base.h    base encoding engine declarations
enc_prvt.h    internal stuff of encoding engine
encoder.cpp   handy adapter for use in programs
encoder.h     handy adapter declaration
encoder.inl   implementation of most of encoder.h functions that are normally inline
enc_defs.h    complete instructions list including miscellaneous definitions of register names, sizes, etc.
enc_tabl.cpp  comprehensive step-by-step comments on how to add new instructions

How It Works

Usage Model

The base encoding interface EncoderBase::encode() is a common generic interface, which is not used in programs directly. Normally, applications use an adapter interface to connect specific client needs to the EncoderBase generic interface. Currently, the following adapters are available:

All the adapters are trivial - they fill out arguments as EncoderBase::Operands, and then invoke EncoderBase::encode().

Example

The encoder.h file, which declares functions with human-readable names, serves as an adapter. To generate a simple code sequence, use the encoder.h interface.
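The adapter idea can be sketched as follows. This is a minimal, self-contained illustration and not the actual encoder.h or EncoderBase API: the `Mnemonic`, `Reg`, and `Operands` types, the free function `encode()`, and the adapters `push()`, `mov()`, and `ret()` are hypothetical stand-ins, and only three register-only IA-32 forms are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the generic interface (not the real EncoderBase).
enum class Mnemonic { PUSH, MOV, RET };
enum class Reg : std::uint8_t { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5 };

struct Operands {
    std::vector<Reg> regs;                 // register operands only, for brevity
    void add(Reg r) { regs.push_back(r); }
};

// Generic entry point: a single function for every instruction.
// Appends the encoded bytes to `out`.
void encode(std::vector<std::uint8_t>& out, Mnemonic mn, const Operands& ops) {
    switch (mn) {
    case Mnemonic::PUSH:                   // PUSH r32 is encoded as 50+rd
        assert(ops.regs.size() == 1);
        out.push_back(static_cast<std::uint8_t>(0x50 + static_cast<int>(ops.regs[0])));
        break;
    case Mnemonic::MOV:                    // MOV r32, r/m32 is 8B /r (register-direct ModRM)
        assert(ops.regs.size() == 2);
        out.push_back(0x8B);
        out.push_back(static_cast<std::uint8_t>(
            0xC0 | (static_cast<int>(ops.regs[0]) << 3) | static_cast<int>(ops.regs[1])));
        break;
    case Mnemonic::RET:                    // near RET is C3
        assert(ops.regs.empty());
        out.push_back(0xC3);
        break;
    }
}

// Thin adapters: fill out the operands, then call the generic encode().
void push(std::vector<std::uint8_t>& out, Reg r) {
    Operands ops;
    ops.add(r);
    encode(out, Mnemonic::PUSH, ops);
}

void mov(std::vector<std::uint8_t>& out, Reg dst, Reg src) {
    Operands ops;
    ops.add(dst);
    ops.add(src);
    encode(out, Mnemonic::MOV, ops);
}

void ret(std::vector<std::uint8_t>& out) {
    encode(out, Mnemonic::RET, Operands{});
}
```

Calling `push(code, Reg::EBP); mov(code, Reg::EBP, Reg::ESP); ret(code);` on an empty vector produces the bytes 55 8B EC C3. The client code reads like assembly, while all encoding knowledge stays behind the single generic entry point.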

The same usage model applies to the decoder engine: the basic generic interface is declared in vm/port/src/encoder/ia32_em64t/dec_base.h and the specific adapter for JVMTI needs is in vm/vmcore/src/jvmti/jvmti_dasm.cpp, vm/vmcore/include/jvmti_dasm.h.

Under the Hood

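The engine gets its input as an operation and a set of operands, and performs two steps: it looks up the opcode record that matches the operation and its operands, and then generates the binary code from that record.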

Both steps involve performance-intensive compare and memory access operations. The master table is kept plain and elementary so that it stays easy to maintain. To reduce the run-time workload, a special version of the data, which requires fewer manipulations, is pre-compiled at run time before the first usage. The pre-compiled version provides the fast opcode lookup and the fast code generation.
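The split between a plain master table and a pre-compiled run-time form can be sketched as follows. This is an illustration under stated assumptions, not the layout used in enc_tabl.cpp: `MasterEntry`, `kMasterTable`, `lookup_table()`, and `find_opcode()` are hypothetical names, the operand hash values are placeholders, and a `std::map` stands in for whatever pre-compiled structure the library actually builds.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical mnemonics and master table: plain and easy to edit by hand.
enum Mnem { MN_MOV = 0, MN_ADD = 1 };

struct MasterEntry {
    Mnem          mnemonic;
    std::uint32_t opnd_hash;   // placeholder operand-combination hash
    std::uint8_t  opcode;      // primary opcode byte
};

const MasterEntry kMasterTable[] = {
    {MN_MOV, 0xDD, 0x8B},      // MOV r32, r/m32
    {MN_ADD, 0xDD, 0x01},      // ADD r/m32, r32
};

using Key = std::pair<int, std::uint32_t>;

// Built once, on first use; a function-local static is initialized
// exactly once, even with concurrent callers (C++11 and later).
const std::map<Key, std::uint8_t>& lookup_table() {
    static const std::map<Key, std::uint8_t> table = [] {
        std::map<Key, std::uint8_t> t;
        for (const MasterEntry& e : kMasterTable) {
            t.emplace(Key{e.mnemonic, e.opnd_hash}, e.opcode);
        }
        return t;
    }();
    return table;
}

// Returns a pointer to the opcode byte, or nullptr when no record matches.
const std::uint8_t* find_opcode(Mnem mnemonic, std::uint32_t opnd_hash) {
    const std::map<Key, std::uint8_t>& t = lookup_table();
    auto it = t.find(Key{mnemonic, opnd_hash});
    return it == t.end() ? nullptr : &it->second;
}
```

The master table stays a flat list that is simple to extend, and the cost of building the faster form is paid once rather than on every encoding call.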

Fast Opcode Lookup

Every operand gets a hash based on its size and on its location kind: memory, register, or immediate. If an instruction has more than one operand, the hashes of its operands are shifted and combined with the OR operation, by the following formula:

hash = opnd1.hash() | (opnd2.hash() << N) | (opnd3.hash() << (N*2));

A pair of a mnemonic and its hash identifies the needed record. The hash is calculated in EncoderBase::Operands methods, outside of the hot execution path.
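The formula can be sketched in runnable form as follows. The concrete values are assumptions made for illustration: the source does not state the width N or the per-operand hash layout, so this sketch assumes N = 4 bits, with 2 bits for the size and 2 bits for the location kind. `OpndKind`, `OpndSize`, `Opnd`, and `combine_hash()` are hypothetical names, not the EncoderBase::Operands API.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical operand categories; all values are nonzero so that
// a hash of 0 can mean "no operand in this position".
enum class OpndKind : std::uint32_t { Reg = 1, Mem = 2, Imm = 3 };   // 2 bits
enum class OpndSize : std::uint32_t { S8 = 1, S16 = 2, S32 = 3 };    // 2 bits

constexpr unsigned N = 4;   // assumed number of bits per operand hash

struct Opnd {
    OpndKind kind;
    OpndSize size;
    // Per-operand hash: size in the upper 2 bits, kind in the lower 2 bits.
    std::uint32_t hash() const {
        return (static_cast<std::uint32_t>(size) << 2) | static_cast<std::uint32_t>(kind);
    }
};

// Combines up to three operand hashes: h1 | h2 << N | h3 << N*2.
std::uint32_t combine_hash(std::initializer_list<Opnd> opnds) {
    assert(opnds.size() <= 3);
    std::uint32_t h = 0;
    unsigned shift = 0;
    for (const Opnd& o : opnds) {
        h |= o.hash() << shift;
        shift += N;
    }
    return h;
}
```

Because each operand occupies its own N-bit field, the combined value depends on operand order: a (register, memory) pair and a (memory, register) pair get different hashes and therefore select different records.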

Fast Code Generation

For fast generation of code, the data is separated into static data, which does not depend on operands, and dynamic data, which depends on operands. This separation speeds up code generation by removing a loop like the following one from the hot execution path:

for (bytes-in-opcode-data) {
    // short loop with many branches inside and many mispredictions: too slow
    if (is_constant_byte) { copy_the_byte(); }
    if (is_operand_data) { encode_operand_data(); }
    ...
}

The static data, which is a set of bytes, is copied into the output buffer with no analysis.
The dynamic data requires several if statements in the source code, but this is much cheaper than the loop shown above.

Encoding runs as shown in the following pseudo-code:

memcpy(buf, static_data);
if (opcode_has_dyn_data_1) { gen_opcode_dyn_data(); }
if (opcode_has_dyn_data_2) { gen_opcode_dyn_data(); }
return;
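The pseudo-code above can be turned into a small runnable sketch. The record layout is an assumption made for illustration: `OpcodeRecord`, `Args`, `emit()`, and the two sample records are hypothetical names, and only the register-direct ModRM form and a 32-bit immediate are handled. The byte values follow the standard IA-32 encodings for MOV r32, r/m32 (8B /r) and ADD r/m32, imm32 (81 /0 id).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical pre-compiled record: static bytes plus flags for dynamic parts.
struct OpcodeRecord {
    std::uint8_t static_bytes[3];
    std::uint8_t static_len;
    bool         has_modrm;    // dynamic part 1: ModRM byte built from operands
    bool         has_imm32;    // dynamic part 2: 32-bit immediate
};

struct Args {
    std::uint8_t  reg;         // ModRM reg field (register or opcode extension)
    std::uint8_t  rm;          // ModRM r/m field (register-direct only)
    std::uint32_t imm;         // immediate value, used when has_imm32 is set
};

// Straight-line encoding: one copy, then a fixed number of ifs, no loop
// over opcode bytes. Returns a pointer past the last byte written.
std::uint8_t* emit(std::uint8_t* buf, const OpcodeRecord& rec, const Args& a) {
    assert(rec.static_len <= sizeof rec.static_bytes);
    std::memcpy(buf, rec.static_bytes, rec.static_len);   // static data, no analysis
    buf += rec.static_len;
    if (rec.has_modrm) {
        assert(a.reg < 8 && a.rm < 8);
        *buf++ = static_cast<std::uint8_t>(0xC0 | (a.reg << 3) | a.rm);
    }
    if (rec.has_imm32) {
        for (int i = 0; i < 4; ++i) {                      // little-endian immediate
            *buf++ = static_cast<std::uint8_t>(a.imm >> (8 * i));
        }
    }
    return buf;
}

const OpcodeRecord MOV_R32_R32   = {{0x8B}, 1, true, false};
const OpcodeRecord ADD_R32_IMM32 = {{0x81}, 1, true, true};   // reg field holds /0
```

With these records, `emit(buf, MOV_R32_R32, Args{0, 1, 0})` writes 8B C1 (mov eax, ecx), and `emit(buf, ADD_R32_IMM32, Args{0, 1, 0x12345678})` writes 81 C1 78 56 34 12 (add ecx, 0x12345678).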

For more details on hash calculation and internal structures, refer to enc_tabl.cpp.
