Scoring runtimes
This page describes the scoring runtimes available for model deployment, including configuration options and usage instructions for each runtime type.
To learn how to manage, version, and create custom scoring runtimes, see Understand runtime management.
Runtime options
The selection of available runtimes is determined by the artifact type that you specify. The following list provides information on the available options when selecting an artifact type and runtime.
Selecting an incorrect runtime causes the deployment to fail.
| Artifact type | Runtime option(s) | Notes |
|---|---|---|
| Driverless AI MOJO pipeline / MLflow Driverless AI MOJO pipeline | MOJO Scorer | Supports all Shapley contribution types. |
| Driverless AI Python pipeline / MLflow Driverless AI Python pipeline | Python Pipeline Scorer [Driverless AI 1.11.0–1.11.1.1, 2.0.0–2.4.0] | Python scorer version must match the Driverless AI version used to build the model. |
| H2O-3 MOJO / MLflow H2O-3 MOJO | H2O-3 MOJO Scorer | |
| MLflow zipped | [PY-3.10][CPU] HT Flexible Runtime; [PY-3.10][GPU] HT Flexible Runtime; [PY-3.12] HT Flexible Runtime (Auto CPU/GPU); HT Static Runtime (Auto CPU/GPU) [HT 1.6]; [Py 3.10–3.12] Dynamic MLflow Model Scorer | For usage details, see MLflow Dynamic Runtime. |
- The MOJO Scorer supports a wide range of algorithms Driverless AI uses, including BERT, GrowNet, and TensorFlow models. To score these model types, link the experiment from Driverless AI. Manually uploaded artifacts for these model types are not supported.
- MLflow runtimes support Python 3.10 through 3.12.
- For end-of-support information on H2O Driverless AI runtimes, see the Driverless AI Prior Releases page.
Artifact names mapping
The following table maps artifact type display names to their storage artifact types and internal artifact type identifiers.
| Artifact type name | Storage artifact type | Artifact type |
|---|---|---|
| DAI MOJO Pipeline | dai/mojo_pipeline | dai_mojo_pipeline |
| DAI Python Pipeline | dai/scoring_pipeline | dai_python_scoring_pipeline |
| H2O-3 MOJO | h2o3/mojo | h2o3_mojo |
| MLflow zipped | python/mlflow | python/mlflow.zip |
| MLflow DAI MOJO Pipeline | mlflow/mojo_pipeline | mlflow_mojo_pipeline |
| MLflow DAI Python Pipeline | mlflow/scoring_pipeline | mlflow_scoring_pipeline |
| MLflow H2O-3 MOJO | mlflow/h2o3_mojo | mlflow_h2o3_mojo |
Runtime names mapping
The following table maps compatible model types to their runtime display names and internal runtime identifiers.
| Compatible model | Runtime name | Runtime |
|---|---|---|
| Driverless AI MOJO models - supports all Shapley contribution types | MOJO Scorer | dai-mojo-scorer |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 1.11.0 | Python Pipeline Scorer [Driverless AI 1.11.0] | python-scorer_dai_pipelines_1110 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 1.11.1.1 | Python Pipeline Scorer [Driverless AI 1.11.1.1] | python-scorer_dai_pipelines_11111 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.0.0 | Python Pipeline Scorer [Driverless AI 2.0.0] | python-scorer_dai_pipelines_200 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.1.0 | Python Pipeline Scorer [Driverless AI 2.1.0] | python-scorer_dai_pipelines_210 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.2.3 | Python Pipeline Scorer [Driverless AI 2.2.3] | python-scorer_dai_pipelines_223 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.2.4 | Python Pipeline Scorer [Driverless AI 2.2.4] | python-scorer_dai_pipelines_224 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.2.5 | Python Pipeline Scorer [Driverless AI 2.2.5] | python-scorer_dai_pipelines_225 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.3.1 | Python Pipeline Scorer [Driverless AI 2.3.1] | python-scorer_dai_pipelines_231 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.3.2 | Python Pipeline Scorer [Driverless AI 2.3.2] | python-scorer_dai_pipelines_232 |
| Driverless AI Python Scoring Pipeline models created by Driverless AI 2.4.0 | Python Pipeline Scorer [Driverless AI 2.4.0] | python-scorer_dai_pipelines_240 |
| H2O-3 MOJO models | H2O-3 MOJO Scorer | h2o3-mojo-scorer |
| H2O Hydrogen Torch MLflow models | [PY-3.10][CPU] HT Flexible Runtime | python-scorer_hydrogen_torch_cpu_py310 |
| H2O Hydrogen Torch MLflow models | [PY-3.10][GPU] HT Flexible Runtime | python-scorer_hydrogen_torch_gpu_py310 |
| H2O Hydrogen Torch MLflow models | [PY-3.12] HT Flexible Runtime (Auto CPU/GPU) | python-scorer_hydrogen_torch_py312 |
| H2O Hydrogen Torch MLflow models (HT 1.6.x) | HT Static Runtime (Auto CPU/GPU) [HT 1.6] | python-scorer_hydrogen_torch_static_ht16 |
| MLFlow non-H2O.ai models created with Python 3.10 | [Py 3.10] Dynamic MLflow Model Scorer | python-scorer_mlflow_dynamic_310 |
| MLFlow non-H2O.ai models created with Python 3.11 | [Py 3.11] Dynamic MLflow Model Scorer | python-scorer_mlflow_dynamic_311 |
| MLFlow non-H2O.ai models created with Python 3.12 | [Py 3.12] Dynamic MLflow Model Scorer | python-scorer_mlflow_dynamic_312 |
Driverless AI MOJO Scorer (dai-mojo-scorer)
The dai-mojo-scorer replaces the legacy Java and C++ MOJO runtimes for scoring Driverless AI MOJO pipeline models. It uses a Go+Worker architecture: a Go HTTP server handles all API traffic, while a Python worker process handles model loading and inference via Unix socket IPC.
Supported capabilities
The dai-mojo-scorer supports the following capabilities. Shapley and prediction interval support is auto-detected per model:
- SCORE — Always available. Standard model predictions.
- CONTRIBUTION_ORIGINAL — Auto-detected. Shapley values for original features.
- CONTRIBUTION_TRANSFORMED — Auto-detected. Shapley values for transformed features.
- SCORE_PREDICTION_INTERVAL — Auto-detected. Upper/lower prediction bounds (where supported by the model's algorithm).
Requirements
- The DRIVERLESS_AI_LICENSE_KEY environment variable must be set.
- The model must be a Driverless AI MOJO pipeline artifact.
Docker usage
docker run -d -p 8080:8080 \
-v /path/to/mojo:/models/my-model \
-e MODEL_PATH=/models/my-model \
-e DRIVERLESS_AI_LICENSE_KEY="${DRIVERLESS_AI_LICENSE_KEY}" \
h2oai/dai-mojo-scorer:<version>
Configuration
| Variable | Default | Description |
|---|---|---|
| MODEL_PATH | (required) | Path to the MOJO pipeline model directory |
| PORT | 8080 | Server port |
| HOST | 0.0.0.0 | Server bind address |
| LOG_LEVEL | INFO | Python worker log level (DEBUG, INFO, WARNING, ERROR) |
| DRIVERLESS_AI_LICENSE_KEY | (required) | Driverless AI license key |
The H2O_SCORER_WORKERS environment variable is no longer used. For performance tuning, use SCORING_CONCURRENCY and BATCH_WORKERS instead. For details, see Concurrency and performance tuning.
H2O-3 MOJO Scorer (h2o3-mojo-scorer)
The h2o3-mojo-scorer is a dedicated Java/Spring Boot runtime for scoring H2O-3 open-source MOJO models (.mojo files produced by H2O-3, not Driverless AI).
Supported capabilities
- SCORE — Standard model predictions
- CONTRIBUTION_TRANSFORMED — Shapley values for transformed features (must be explicitly enabled via configuration)
- SCORE_PREDICTION_INTERVAL — Upper/lower prediction bounds (auto-detected per model)
Requirements
- H2O-3 MOJO model file (.mojo)
Docker usage
docker run -d -p 8080:8080 \
-v /path/to/mojos:/mojos \
h2oai/h2o3-mojo-scorer:<version> \
-Dmojo.path=/mojos/pipeline.mojo
Configuration
| Variable / Property | Default | Description |
|---|---|---|
| SCORER_MOJO_PATH / mojo.path | (required) | Path to the .mojo file |
| server.port | 8080 | Server port |
| SHAPLEY_ENABLE / shapley.enable | false | Enable all Shapley contribution types |
| SHAPLEY_TYPES_ENABLED / shapley.types.enabled | NONE | Specific types: TRANSFORMED or NONE. H2O-3 models support only transformed Shapley values. |
Enabling Shapley contributions loads the MOJO file multiple times (once per enabled contribution type). This increases memory usage proportionally. Plan resource limits accordingly.
Common REST API
Both dai-mojo-scorer and h2o3-mojo-scorer share the same REST API, which is backward-compatible with the legacy Java MOJO runtime. Your existing client integrations work without modification.
| Endpoint | Method | Description |
|---|---|---|
| /model/id | GET | Get model UUID |
| /model/schema | GET | Get input/output schema |
| /model/capabilities | GET | List supported capabilities |
| /model/sample_request | GET | Get a sample scoring request for testing |
| /model/score | POST | Score data rows |
| /model/contribution | POST | Get Shapley contributions |
| /readyz | GET | Health/readiness check |
| /openapi.json | GET | OpenAPI 3.0 specification |
| /docs | GET | Interactive Swagger UI |
Scoring example
curl -X POST http://localhost:8080/model/score \
-H "Content-Type: application/json" \
-d '{
"fields": ["feature1", "feature2"],
"rows": [["1.0", "2.0"], ["3.0", "4.0"]]
}'
For Python client usage, see Deployment scorer.
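The same request can be issued directly from Python. The following is a minimal sketch using only the standard library; the endpoint path and payload shape match the curl example above, but the helper names (build_score_request, score) are illustrative, not part of any official client.

```python
import json
import urllib.request


def build_score_request(fields, rows):
    """Build the JSON payload expected by POST /model/score.

    Values are stringified, matching the row format in the curl example.
    """
    return {"fields": fields, "rows": [[str(v) for v in row] for row in rows]}


def score(base_url, fields, rows, timeout=30):
    """POST a scoring request to a running scorer and return the decoded response."""
    data = json.dumps(build_score_request(fields, rows)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/model/score",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Payload construction alone (no running scorer required):
payload = build_score_request(["feature1", "feature2"], [[1.0, 2.0], [3.0, 4.0]])
```

Calling `score("http://localhost:8080", ...)` against a running scorer returns the decoded JSON predictions.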
Concurrency and performance tuning
Go+Worker architecture
The dai-mojo-scorer provides fine-grained concurrency tuning. The following diagram shows the request flow:
HTTP requests (Go goroutines — unbounded)
│
▼
IPC connection pool (SCORING_CONCURRENCY connections)
│ pool exhausted → HTTP 503 (if SCORING_TIMEOUT > 0)
│ pool slow → WARN log (if SCORING_WARN_TIMEOUT exceeded)
▼
InferenceWorker queue (one per BATCH_WORKERS)
│ waits up to MAX_QUEUE_DELAY_US to coalesce rows into a batch
▼
inferrer.score(merged_batch) ← single model call per batch
The architecture runs one Python worker process per pod by design. Achieve horizontal scaling through Kubernetes replicas rather than running multiple worker processes inside a single pod. This avoids model memory duplication and simplifies thread-safety guarantees.
Tuning parameters
| Variable | Default | Description |
|---|---|---|
| SCORING_CONCURRENCY | 4 | Number of concurrent IPC connections from Go to the Python worker. Increase for high-concurrency deployments. |
| BATCH_WORKERS | 1 | Number of parallel batch-worker threads. 0 disables batching and uses the direct scoring path, where up to SCORING_CONCURRENCY calls run concurrently (model must be thread-safe). See Python GIL considerations. |
| MAX_BATCH_SIZE | 64 | Maximum rows per batch before flushing. Lower for memory-constrained models. |
| MAX_QUEUE_DELAY_US | 10000 (10 ms) | How long the worker waits to coalesce incoming requests into a batch. |
| SCORING_TIMEOUT | 0 (unlimited) | Seconds before returning HTTP 503 when the connection pool is exhausted. Set to a positive value in production to prevent request pile-up. |
| SCORING_WARN_TIMEOUT | 30 | Seconds before emitting a WARN log when a request waits too long for a pool connection. |
| SCORE_ROW_COUNT_VALIDATION | strict | Row count validation mode after inferrer.score(). See Row count validation. |
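As a rough capacity check, the worst-case number of rows handed to inferrer.score() at once follows directly from these variables. The helper below is an illustrative back-of-the-envelope sketch, not part of the scorer:

```python
def max_inflight_rows(scoring_concurrency=4, batch_workers=1,
                      max_batch_size=64, avg_request_rows=1):
    """Estimate the worst-case rows being scored simultaneously.

    Illustrative only: with batching enabled, each worker thread flushes
    at MAX_BATCH_SIZE rows; on the direct path (batch_workers == 0), each
    of the SCORING_CONCURRENCY pooled connections scores its own request.
    """
    if batch_workers == 0:
        return scoring_concurrency * avg_request_rows
    return batch_workers * max_batch_size


# Defaults: one batch worker flushing 64-row batches.
assert max_inflight_rows() == 64
# Direct path with 8 connections and 100-row requests:
assert max_inflight_rows(scoring_concurrency=8, batch_workers=0,
                         avg_request_rows=100) == 800
```

Multiplying the result by the per-row memory footprint of your model gives a starting point for pod resource limits.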
Row count validation
All scoring runtimes enforce an N-in = N-out contract: the number of rows returned by predict() must equal the number of input rows. After every call to inferrer.score(), the runtime validates that len(output_rows) == len(input_rows). If the counts don't match, the request fails with a 500 error instead of returning truncated or misassigned results.
This validation is critical for the batching path (BATCH_WORKERS >= 1). When batching is enabled, multiple scoring requests are merged into a single batch. The runtime splits results back to individual requests by row offset, so a row count mismatch means rows get assigned to the wrong requests, causing silent data corruption.
If you implement a custom MLflow Python model (mlflow.pyfunc.PythonModel), your predict() method must return exactly one output row for each input row. This applies to all scoring paths but is especially important for:
- Models used with batch scoring, where large payloads are common
- Deployments with batching enabled (BATCH_WORKERS >= 1), where multiple requests are merged
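As a sketch of the contract, the following toy model returns exactly one output row per input row. It deliberately does not subclass mlflow.pyfunc.PythonModel, and plain lists stand in for pandas DataFrames, so the sketch stays dependency-free; only the predict signature mirrors the real one.

```python
class RowPreservingModel:
    """Toy stand-in for an mlflow.pyfunc.PythonModel subclass."""

    def predict(self, context, model_input):
        # One output row per input row: never aggregate, filter, or drop rows.
        return [{"score": sum(row)} for row in model_input]


model = RowPreservingModel()
rows_in = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
rows_out = model.predict(None, rows_in)

# The runtime performs this check after every inferrer.score() call:
assert len(rows_out) == len(rows_in)
```

A model that aggregated all three rows into one summary row would fail this check and return a 500 error under the default strict mode.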
The SCORE_ROW_COUNT_VALIDATION environment variable controls validation strictness:
| Scoring path | Case | strict (default) | warn |
|---|---|---|---|
| Batching | 0 rows returned | Error | Error |
| Batching | Row count mismatch | Error | Error |
| Direct | 0 rows returned | Error | Error |
| Direct | Row count mismatch | Error | Warning only |
The batching path always returns an error on mismatch regardless of the validation mode.
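The decision table above can be condensed into a few lines. The helper below is illustrative only, not the runtime's actual implementation:

```python
def validate_row_count(n_in, n_out, mode="strict", path="direct"):
    """Return 'ok', 'warn', or 'error' per the SCORE_ROW_COUNT_VALIDATION table."""
    if n_out == n_in:
        return "ok"
    if n_out == 0 or path == "batching":
        # Zero rows is always an error, and the batching path always
        # errors on any mismatch, regardless of mode.
        return "error"
    # Only the direct path honors warn mode on a non-zero mismatch.
    return "warn" if mode == "warn" else "error"


assert validate_row_count(10, 10) == "ok"
assert validate_row_count(10, 0, mode="warn") == "error"
assert validate_row_count(10, 7, mode="warn", path="batching") == "error"
assert validate_row_count(10, 7, mode="warn", path="direct") == "warn"
```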
Workaround for models that can't satisfy N-in = N-out: If your model inherently returns a different number of rows than it receives (for example, an aggregation model), disable batching and relax validation:
BATCH_WORKERS=0 # Disable batching (use direct scoring path)
SCORE_ROW_COUNT_VALIDATION=warn # Warn instead of error on mismatch in direct mode
Both settings are required: BATCH_WORKERS=0 switches to the direct path where the warn mode is honored, because the batching path always errors on mismatch.
With BATCH_WORKERS=0, each connection thread calls inferrer.score() directly — up to SCORING_CONCURRENCY calls can be in-flight simultaneously. Your model must be thread-safe. If your model is not thread-safe, keep BATCH_WORKERS=1 (the default).
Python GIL considerations
Python's GIL means only one thread executes Python bytecode at a time. Whether batching or direct scoring is better depends on whether the model's scoring library releases the GIL and on its per-call overhead:
| Model type | GIL released? | Recommendation |
|---|---|---|
| DAI MOJO2 (daimojo C++ runtime) | Yes | BATCH_WORKERS=0 — low per-call overhead, direct path gives best latency and throughput |
| XGBoost, LightGBM, ONNX Runtime | Yes | BATCH_WORKERS=0 — cheap per-call cost, true parallelism via direct path |
| scikit-learn, pure Python | No | Keep BATCH_WORKERS=1 — multiple threads compete for the GIL and reduce throughput |
| DAI Python pipeline | Depends on base model | Usually XGBoost/LightGBM underneath — BATCH_WORKERS=0 if no custom recipes; use BATCH_WORKERS=1 if the pipeline contains custom Python transformers/recipes (not thread-safe) |
| GPU models — thread-unsafe runtime (for example, FAISS) | Yes (CUDA calls) | BATCH_WORKERS=1 — serial execution required; GPU batching amortizes kernel launch overhead |
| GPU models — thread-safe runtime (for example, HydrogenTorch) | Yes (CUDA calls) | BATCH_WORKERS=1 or higher — GPU batching amortizes kernel launch and CPU→GPU transfer overhead; tune upward for higher throughput |
Safe default: BATCH_WORKERS=1 works correctly for all model types. For GIL-releasing CPU models, BATCH_WORKERS=0 is recommended — it avoids the MAX_QUEUE_DELAY_US wait and allows up to SCORING_CONCURRENCY parallel score() calls with no batching overhead.
Choosing a BATCH_WORKERS strategy
The following options achieve concurrent inference differently for GIL-releasing models:
BATCH_WORKERS=0 (direct path) — each connection thread calls inferrer.score() directly. Up to SCORING_CONCURRENCY calls run in parallel. No queue, no merge/split, no delay window. Best for:
- GIL-releasing CPU models (MOJO2, XGBoost, LightGBM, ONNX) where per-call overhead is negligible
- Latency-sensitive workloads — no MAX_QUEUE_DELAY_US penalty
- Low-to-moderate concurrency — with few requests at a time, the batch window mostly adds wasted latency
BATCH_WORKERS=1 (default, serial batching) — all requests funnel through a single inference thread. One score() call at a time, with rows coalesced across concurrent requests. Best for:
- Thread-unsafe models (scikit-learn, pure Python, DAI pipelines with custom recipes, thread-unsafe GPU runtimes like FAISS) — guarantees serial execution, no concurrency risk
- GPU models that benefit from batching — kernel launch overhead amortized over larger batches
- Any model you're unsure about — safe default, works with everything
BATCH_WORKERS > 1 (parallel batching) — N inference threads each collect and score batches in parallel. Pays MAX_QUEUE_DELAY_US latency to fill batches. Requires a thread-safe model runtime. Best for:
- Thread-safe GPU models — kernel launch and CPU→GPU transfer overhead is significant; scoring N rows in one call is much cheaper than N separate calls, so coalescing concurrent requests into a single batch is a big throughput win. (Do not use with thread-unsafe GPU runtimes like FAISS — use BATCH_WORKERS=1 instead.)
- Models with high fixed cost per score() call — for example, computation graph setup, JIT warmup, external service connection
- High-concurrency, small-request workloads on GPU — 1000 req/s of single-row requests benefits from coalescing into fewer batch calls/s
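The three strategies above can be distilled into a rule of thumb. The helper below is illustrative only; the branch order reflects the guidance in this section, and the GPU case returns the starting value of 1, which you can tune upward for throughput.

```python
def recommend_batch_workers(thread_safe, releases_gil, on_gpu):
    """Illustrative rule of thumb for choosing BATCH_WORKERS."""
    if not thread_safe:
        return 1  # Serial batching is the only safe choice.
    if on_gpu:
        return 1  # Start at 1; raise it so batching amortizes kernel-launch overhead.
    if releases_gil:
        return 0  # Direct path: no MAX_QUEUE_DELAY_US penalty for cheap calls.
    return 1      # GIL-bound CPU model: extra threads would only contend.


# XGBoost on CPU (thread-safe, releases the GIL): direct path.
assert recommend_batch_workers(True, True, False) == 0
# scikit-learn (GIL-bound): serial batching.
assert recommend_batch_workers(True, False, False) == 1
```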
Horizontal scaling
Scale horizontally through Kubernetes replicas rather than increasing per-pod concurrency:
Load Balancer
/ | \
Pod 1 Pod 2 Pod 3
[Go+Worker] [Go+Worker] [Go+Worker]
Each pod runs one Go server and one Python worker. Kubernetes handles health checks, restarts, resource limits, and node placement.
MLflow Dynamic Runtime
The MLflow Dynamic Runtime lets you deploy MLflow models with diverse dependencies in H2O MLOps. The following steps describe how to create a dynamic MLflow runtime deployment in H2O MLOps.
For an example of how to train a dynamic runtime, see Train a dynamic runtime.
1. Save your model using the mlflow.pyfunc.save_model function call. Use the pip_requirements parameter to specify the Python package dependencies required by the model.

   mlflow.pyfunc.save_model(
       path=...,
       python_model=...,
       artifacts=...,
       signature=...,
       pip_requirements=...,  # <- Use this parameter to override libs for dynamic runtime
   )

2. After saving the model, create a zip archive of the saved model directory. Ensure that a requirements file (requirements.txt) listing all dependencies is included in the archive. The following is an example of the expected structure for the zip file from a TensorFlow model:

   tf-model-py310
   ├── MLmodel
   ├── artifacts
   │   └── tf.h5
   ├── conda.yaml
   ├── python_env.yaml
   ├── python_model.pkl
   └── requirements.txt

3. Depending on whether you are using Python 3.10, Python 3.11, or Python 3.12, select one of the following runtimes:
   - [Py 3.10] Dynamic MLflow Model Scorer
   - [Py 3.11] Dynamic MLflow Model Scorer
   - [Py 3.12] Dynamic MLflow Model Scorer
The MLflow Dynamic Runtime has a fixed MLflow dependency (MLflow 2.14.2). This means the runtime is not guaranteed to work with models saved by a different MLflow version.
Example: Train a dynamic runtime model
The following example demonstrates how to train a dynamic runtime with TensorFlow:
# Import libraries
import mlflow
import pandas as pd
import shutil
import tensorflow as tf
from sklearn import datasets
# Load and prepare data
diabetes = datasets.load_diabetes()
X = diabetes.data[:, 2:3] # Use only one feature for simplicity
y = diabetes.target
# Build and train TensorFlow model
tf_model = tf.keras.models.Sequential([
tf.keras.layers.Dense(1, input_dim=1)
])
tf_model.compile(optimizer='adam', loss='mean_squared_error')
tf_model.fit(X, y, epochs=10)
tf_model_path = "tf.h5"
tf_model.save(tf_model_path, save_format="h5")
# Enable the TensorFlow model to be used in the Pyfunc format
class PythonTFmodel(mlflow.pyfunc.PythonModel):
def load_context(self, context):
import tensorflow as tf
self.model = tf.keras.models.load_model(context.artifacts["model"])
def predict(self, context, model_input):
tf_out = self.model.predict(model_input)
return pd.DataFrame(tf_out, columns=["db_progress"])
# Generate signature from your model definition
model = PythonTFmodel()
context = mlflow.pyfunc.PythonModelContext(model_config=dict(), artifacts={"model": tf_model_path})
model.load_context(context)
x = pd.DataFrame(X, columns=["dense_input"])
y = model.predict(context, x)
signature = mlflow.models.signature.infer_signature(x, y)
# Specify a file path where the model will be saved
mlflow_model_path = "./tf-model-py310"
# Save model using MLflow
mlflow.pyfunc.save_model(
path=mlflow_model_path,
python_model=PythonTFmodel(),
signature=signature,
artifacts={"model": tf_model_path},
pip_requirements=["tensorflow"]
)
# Package model as a zip archive
shutil.make_archive(
mlflow_model_path, "zip", mlflow_model_path
)
The following is the structure of the zip file that is generated in the preceding example:
tf-model-py310
├── MLmodel
├── artifacts
│ └── tf.h5
├── conda.yaml
├── python_env.yaml
├── python_model.pkl
└── requirements.txt
Generic Ephemeral volumes
The custom additional volumes feature now supports emptyDir volumes and ephemeral volumes.
The storageClassName property for volumes is optional. If not provided, the default storage class will be used.
Example configuration
# Custom additional volumes with selected mount paths.
# This section, as well as each of its fields, is optional.
volume-mounts = [
{
name = "ephemeral_volume"
type = "ephemeral"
properties = [
{ name = "size", value = "1Gi" }
]
paths = ["/ephemeral_volume_1", "/ephemeral_volume_2"]
},
{
name = "emptyDir_volume"
type = "emptyDir"
properties = [
{ name = "medium", value = "Memory" }
]
paths = ["/emptyDir_volume_1", "/emptyDir_volume_2"]
}
]
YAML configuration
The volumeMounts section should be added to the runtime specification of the Helm Chart.
runtimes:
volumeMounts:
- name: "dev-shm"
type: "ephemeral"
properties:
size: "1Gi"
paths: ["/tmp"]