Migration guide
This guide helps you update H2O MLOps when moving between versions. It outlines migration steps for each version upgrade and focuses on changes that require updates to your code, configuration, or workflows.
For detailed changes to the H2O MLOps Python client, see the Python client migration guide.
HT scorer runtime 1.7.x to 2.0.x
This section applies to you if you run HydrogenTorch (HT) models on the HT scorer runtime and your client code parses scoring responses. Specifically, you are affected if your code does any of the following:
- Expects a single row in the response that contains the entire batch as one JSON blob
- Indexes into nested prediction arrays (for example, `blob["predictions"][row_index]`)
- Unwraps an extra list layer for single-row responses (for example, `blob["predictions"][0]`)
What changed
Runtime 2.0.x adds output parsing that splits the legacy single-blob HT response into per-row JSON strings. This change is required for the batching feature introduced in 2.0.x and for row count validation in the scoring API. For more information, see Scoring runtimes.
All currently released HT models are affected because they produce a single JSON blob for the entire batch. Future HT model versions will produce per-row JSON strings natively, at which point the runtime parsing becomes a no-op passthrough.
Multi-row requests
Runtime 1.7.x returns 1 row containing a single JSON blob for the entire batch:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [[0.3, 0.3, 0.4], [0.5, 0.3, 0.2], [0.1, 0.8, 0.1]], \"labels\": [\"neg\", \"neu\", \"pos\"]}"]
]
}
Runtime 2.0.x returns N rows, one per input row, with flat per-row predictions:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [0.3, 0.3, 0.4], \"labels\": [\"neg\", \"neu\", \"pos\"]}"],
["{\"predictions\": [0.5, 0.3, 0.2], \"labels\": [\"neg\", \"neu\", \"pos\"]}"],
["{\"predictions\": [0.1, 0.8, 0.1], \"labels\": [\"neg\", \"neu\", \"pos\"]}"]
]
}
Single-row requests
Runtime 1.7.x returns 1 row with nested predictions (outer list wrapping):
{
"fields": ["output"],
"rows": [
["{\"predictions\": [[0.9, 0.1]], \"labels\": [\"pos\", \"neg\"]}"]
]
}
Runtime 2.0.x returns 1 row with flat predictions, consistent with the multi-row format:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [0.9, 0.1], \"labels\": [\"pos\", \"neg\"]}"]
]
}
Migration steps
Multi-row requests
Update your client code to read one row per input instead of indexing into a single blob:
Before (runtime 1.7.x):
# One blob returned — manually index predictions by row
import json

response = score(rows)
blob = json.loads(response["rows"][0][0])  # single row always
row_0_predictions = blob["predictions"][0]
row_1_predictions = blob["predictions"][1]
After (runtime 2.0.x):
# One row returned per input — each row is already split
import json

response = score(rows)
row_0 = json.loads(response["rows"][0][0])
row_0_predictions = row_0["predictions"]
row_1 = json.loads(response["rows"][1][0])
row_1_predictions = row_1["predictions"]
Single-row requests
Before (runtime 1.7.x):
# Single row returned — predictions are nested in an extra list layer
import json

response = score(rows)
blob = json.loads(response["rows"][0][0])
predictions = blob["predictions"][0]  # unwrap the extra list layer
After (runtime 2.0.x):
# Single row returned — predictions are flat
import json

response = score(rows)
row_0 = json.loads(response["rows"][0][0])
predictions = row_0["predictions"]  # no extra list layer to unwrap
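If your client must work against both runtime versions during a rollout, one option is to normalize the response shape at parse time. The helper below is a minimal sketch (the function name is invented, not part of the MLOps client): it treats a single response row whose predictions are nested lists as the legacy 1.7.x blob and splits it, and passes 2.0.x per-row responses through unchanged. Note the detection is a heuristic; a 2.0.x model whose per-row prediction is itself a list of lists would be mis-detected.

```python
import json


def normalize_ht_rows(response):
    """Return one parsed dict per input row for both 1.7.x and 2.0.x responses."""
    parsed = [json.loads(row[0]) for row in response["rows"]]
    if len(parsed) == 1:
        blob = parsed[0]
        preds = blob.get("predictions", [])
        # Legacy 1.7.x shape: one row whose predictions nest one list per input row.
        if preds and isinstance(preds[0], list):
            return [{"predictions": p, "labels": blob.get("labels")} for p in preds]
    return parsed
```

With this in place, downstream code can always read `result[i]["predictions"]` regardless of which runtime produced the response.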
Why this change was made
- Batching support: Runtime 2.0.x introduces request batching where multiple requests are merged into a single model call and results are split back by row offset. The old single-blob format made this incompatible — the runtime couldn't determine which predictions belonged to which request without parsing model-specific JSON.
- Row count validation: The scoring API validates that the number of output rows matches the number of input rows. The old single-blob format (1 output row for N input rows) would fail this validation.
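The row-offset bookkeeping that the per-row format enables can be illustrated with a small sketch (a hypothetical helper, not runtime code): merged batch output rows are sliced back into one group per original request using each request's row count.

```python
def split_rows_by_request(merged_rows, request_sizes):
    """Slice merged batch output back into per-request row groups by offset."""
    groups, offset = [], 0
    for size in request_sizes:
        groups.append(merged_rows[offset:offset + size])
        offset += size
    if offset != len(merged_rows):
        # This is exactly the check the old single-blob format could not satisfy.
        raise ValueError("output row count does not match summed request sizes")
    return groups
```

The final check mirrors the scoring API's row count validation: with one output row per input row, the split is unambiguous.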
From 1.0 to 1.1
Runtimes
Removal of MLflow Runtime for Python 3.9
The MLflow Runtime for Python 3.9 is no longer available. Python 3.9 has reached end of life. Update your deployments to use an MLflow Runtime based on Python 3.10 or later.
Removal of Java MOJO Runtime
H2O MLOps version 1.1 removes the Java MOJO Runtime, as previously announced in version 0.67.0.
The following Java-based runtimes are no longer available:
- Driverless AI MOJO Scorer (`dai_mojo_runtime`)
- Driverless AI MOJO Scorer - Shapley original only (`mojo_runtime_shapley_original`)
- Driverless AI MOJO Scorer - Shapley transformed only (`mojo_runtime_shapley_transformed`)
- Driverless AI MOJO Scorer - Shapley all (`mojo_runtime_shapley_all`)
Migrate to the MOJO Scorer (dai-mojo-scorer), which replaces both the Java and C++ MOJO runtimes. The MOJO Scorer supports all Shapley contribution types and accepts a wider range of algorithms, including BERT, GrowNet, and TensorFlow models.
- To deploy BERT, GrowNet, or TensorFlow models, link the experiment from Driverless AI. Manually uploaded artifacts for these model types are not supported.
- The `H2O_SCORER_WORKERS` environment variable is no longer used. To tune scoring performance for `dai-mojo-scorer`, use `SCORING_CONCURRENCY` and `BATCH_WORKERS` instead. For details, see Concurrency and performance tuning.
H2O-3 MOJO models are now scored using the H2O-3 MOJO Scorer (h2o3-mojo-scorer). For full runtime details, configuration, and migration steps, see Scoring runtimes.
Migrating Driverless AI MOJO deployments
- Switch the runtime to MOJO Scorer (`dai-mojo-scorer`).
- Set the `MODEL_PATH` environment variable to your MOJO pipeline model directory and ensure the `DRIVERLESS_AI_LICENSE_KEY` environment variable is configured.
- Shapley contributions and prediction intervals are auto-detected per model — no additional configuration is needed.
- Replace `H2O_SCORER_WORKERS` with `SCORING_CONCURRENCY` and `BATCH_WORKERS`. For details, see Concurrency and performance tuning.
- The REST API endpoints (`/model/score`, `/model/contribution`, etc.) are unchanged.
Migrating H2O-3 MOJO deployments
- Switch the runtime to H2O-3 MOJO Scorer (`h2o3-mojo-scorer`).
- Set the `SCORER_MOJO_PATH` environment variable (or `-Dmojo.path` system property) to point to your `.mojo` file.
- If you need Shapley contributions, set `SHAPLEY_ENABLE=true` or configure `SHAPLEY_TYPES_ENABLED`. For details, see H2O-3 MOJO Scorer configuration.
- The REST API endpoints are unchanged.
API compatibility
Both dai-mojo-scorer and h2o3-mojo-scorer implement the same REST API as the legacy Java MOJO runtime. The endpoints, request/response formats, and OpenAPI specification are unchanged. Your existing client integrations work without modification.
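Because the REST API is unchanged, existing scoring calls keep working. As an illustration only, a request body can be assembled in the row-based shape used throughout this guide (a hedged sketch: the helper name is invented, and the stringified cells follow the string-typed rows shown in the response examples, not a confirmed request schema):

```python
import json


def build_score_request(fields, rows):
    """Serialize a row-based payload: {"fields": [...], "rows": [[...]]}.

    Cells are stringified, mirroring the string-typed rows in the examples.
    """
    return json.dumps(
        {"fields": list(fields), "rows": [[str(cell) for cell in row] for row in rows]}
    )
```

The resulting body can then be POSTed to the unchanged `/model/score` endpoint with your usual HTTP client.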
Hydrogen Torch scorer runtime response format changes
Upgrading the HydrogenTorch scorer runtime from 1.7.x to 2.0.x introduces breaking changes to the scoring API response format. All currently released HT models are affected.
Multi-row requests (N > 1 input rows)
Runtime 1.7.x returns one row containing a single JSON blob with the entire batch:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [[0.3, 0.3, 0.4], [0.5, 0.3, 0.2]], \"labels\": [\"neg\", \"neu\", \"pos\"]}"]
]
}
Runtime 2.0.x returns N rows, one per input row, with flat per-row predictions:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [0.3, 0.3, 0.4], \"labels\": [\"neg\", \"neu\", \"pos\"]}"],
["{\"predictions\": [0.5, 0.3, 0.2], \"labels\": [\"neg\", \"neu\", \"pos\"]}"]
]
}
Single-row requests (1 input row)
Runtime 1.7.x returns nested predictions with outer list wrapping:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [[0.9, 0.1]], \"labels\": [\"pos\", \"neg\"]}"]
]
}
Runtime 2.0.x returns flat predictions:
{
"fields": ["output"],
"rows": [
["{\"predictions\": [0.9, 0.1], \"labels\": [\"pos\", \"neg\"]}"]
]
}
Migrating client code
Update response parsing from:
# Old (runtime 1.7.x): one blob, manually index by row
import json

response = score(rows)
blob = json.loads(response["rows"][0][0])  # single row always
row_0_predictions = blob["predictions"][0]
row_1_predictions = blob["predictions"][1]
To:
# New (runtime 2.0.x): one row per input, already split
import json

response = score(rows)
row_0 = json.loads(response["rows"][0][0])
row_0_predictions = row_0["predictions"]
row_1 = json.loads(response["rows"][1][0])
row_1_predictions = row_1["predictions"]
Header forwarding changes in runtimes 2.0.x
Runtimes 2.0.x introduces a new Go+Worker architecture that changes how HTTP request headers are accessed in model code. The Python HTTP server (FastAPI/Gunicorn) no longer handles requests directly. Models that access HTTP headers from the request context must be updated.
Before (runtimes 1.x — no longer works in 2.0.x):
# In MLflow pyfunc model predict()
token = request.headers.get("Authorization", "")
Deployment configuration:
H2O_SCORER_AUTHORIZATION_HEADER="Authorization"
After (runtimes 2.0.x):
from h2o_scorer_core.context import request_headers
# In MLflow pyfunc model predict()
headers = request_headers.get({})
token = headers.get("Authorization", "")
Deployment configuration:
H2O_SCORER_FORWARD_HEADERS="Authorization"
Key differences
| Aspect | Runtimes 1.x | Runtimes 2.0.x |
|---|---|---|
| HTTP server | FastAPI / Gunicorn (Python) | Go HTTP server + Python worker (Unix socket IPC) |
| Environment variable | H2O_SCORER_AUTHORIZATION_HEADER | H2O_SCORER_FORWARD_HEADERS |
| Multiple headers | No (single header) | Yes (comma-separated) |
| Access in model code | FastAPI/Starlette request context | h2o_scorer_core.context.request_headers ContextVar |
Only explicitly listed headers are forwarded. Per-request tracing headers (x-request-id, traceparent, etc.) are automatically excluded to avoid degrading batching performance.
HTTP request header forwarding
Starting with H2O MLOps version 1.1 (which ships with runtimes 2.0.x), the scoring runtime HTTP server has changed from a Python-based stack (FastAPI/Gunicorn) to a Go HTTP server.
Scope
This change affects you if you:
- Use the `H2O_SCORER_AUTHORIZATION_HEADER` environment variable
- Have MLflow pyfunc models that read HTTP headers during scoring (via FastAPI/Starlette request context)
- Rely on forwarded auth tokens to call external APIs from within model code
Pyfunc models that read HTTP headers through the FastAPI Request object or rely on the H2O_SCORER_AUTHORIZATION_HEADER environment variable no longer receive headers in runtimes 2.0.x. You must update your model code and environment configuration to use the new header forwarding mechanism described below.
Before (runtimes 1.x)
# Accessing headers via FastAPI request context
from starlette.requests import Request

def predict(self, context, model_input):
    request: Request = context.request
    token = request.headers.get("Authorization", "")
    ...
# Configuring a single header to forward
H2O_SCORER_AUTHORIZATION_HEADER="Authorization"
After (runtimes 2.0.x)
from h2o_scorer_core.context import request_headers

def predict(self, context, model_input):
    headers = request_headers.get({})  # dict[str, str]
    token = headers.get("Authorization", "")
    ...
Deployment configuration
If you're configuring the runtime container, replace H2O_SCORER_AUTHORIZATION_HEADER with H2O_SCORER_FORWARD_HEADERS:
# Configuring headers to forward (comma-separated)
H2O_SCORER_FORWARD_HEADERS="Authorization,X-Custom-Token"
H2O_SCORER_FORWARD_HEADERS accepts a comma-separated list of header names. The runtime automatically blocks internal tracing headers (for example, x-request-id, traceparent) from being forwarded to the model code, regardless of what you configure in this variable.
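The forwarding rules can be approximated with a short sketch (illustrative only, not the runtime's actual implementation; the blocked set here is limited to the two tracing headers named above):

```python
BLOCKED_TRACING_HEADERS = {"x-request-id", "traceparent"}


def forwardable_headers(env_value, incoming):
    """Select the headers that would reach model code, matched case-insensitively."""
    allowed = {h.strip().lower() for h in env_value.split(",") if h.strip()}
    allowed -= BLOCKED_TRACING_HEADERS  # tracing headers are always excluded
    return {name: value for name, value in incoming.items() if name.lower() in allowed}
```

Even if a tracing header is listed in the environment variable, it is dropped before reaching model code.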
The following table summarizes the key differences between runtimes 1.x and 2.0.x:
| Aspect | Runtimes 1.x | Runtimes 2.0.x |
|---|---|---|
| HTTP server | FastAPI/Gunicorn (Python) | Go HTTP server + Python worker (Unix socket) |
| Header access mechanism | FastAPI/Starlette Request object | h2o_scorer_core.context.request_headers ContextVar |
| Configuration env var | H2O_SCORER_AUTHORIZATION_HEADER (single header) | H2O_SCORER_FORWARD_HEADERS (comma-separated list) |
| Supports multiple headers | No (single header) | Yes (comma-separated) |
| Batching safety | N/A | Automatic batch-key partitioning by header values |
Batch scoring Secure Store integration
Starting with H2O MLOps version 1.1, secret fields in batch scoring source and sink configurations must reference Secure Store Secret IDs rather than raw sensitive values.
Using secrets in batch scoring
- Create secrets in the Secure Store.
- Reference Secret IDs in your batch scoring source or sink configuration.
This requirement applies to both the UI and the Python client.
Removal of pre-AuthZ server AuthZ capability from the gRPC gateway
The pre-AuthZ server authorization capability has been removed from the gRPC gateway. The configuration options apiGateway.config.userJwtClaim and apiGateway.config.allowedUserRoles, which previously allowed you to configure role and group values for accessing the MLOps API Gateway, have been removed.
To achieve the same functionality, define an AuthZ policy that denies all MLOps actions across all workspaces.
From 0.70.0 to 1.0.0
Workspace integration
MLOps 1.0.0 is integrated with the Workspace service. All projects have been migrated to Workspaces, and both the user interface and Python client have been updated accordingly.
Python client
The legacy, automatically generated Python client is no longer compatible with MLOps 1.0.0. Only h2o-mlops 1.4.0 and higher is supported. Migrate your workflows to the new Python client.
To migrate from version 1.3.x to 1.4.x, see the Python client migration guide from v1.3.x to v1.4.x.
Removal of Wave UI
Starting from H2O MLOps version 1.0.0, the legacy Wave-based user interface is no longer available. The official and supported MLOps user interface is now part of the H2O AI Cloud user interface. The Admin Analytics Wave app has also been removed, and its capabilities have been migrated to the new interface.
Helm chart changes
In the Affinity and Tolerations configuration, the field `matchExpression` has been replaced by `matchExpressions`.
The option `apiGateway.authorization.enabled` has been removed, as authorization is now always enabled on the API Gateway. If this option was set to `false` in a previous MLOps version, make sure `apiGateway.authorization.allowedUserRoles` remains set to `[]`, which is the default value of that option.
Monitoring setup changes
Starting from H2O MLOps version 1.0.0, the legacy monitoring setup has been removed and replaced with a new configuration method using the MonitoringOptions class in the Python client. Monitoring is disabled by default and must be explicitly enabled during or after deployment.
Kafka sink configuration is now more granular and can be set on a per-model basis.
For more information, see Monitoring setup.
Migrate from old monitoring to new monitoring
The new monitoring system differs from the legacy setup. To migrate legacy monitoring data and existing deployments, MLOps 1.0.0 includes a migration job. You must explicitly enable this job during or after installation using Helm chart parameters.
Migration is optional. New monitoring works for newly created deployments without migration. However, deployments created before MLOps 1.0.0 do not have monitoring enabled, and historical scoring metrics are unavailable unless you run the migration. To enable the migration, set the following Helm parameters:
Before you run the migration, scale all deployments with monitoring enabled to at least 1 replica. The migration requires active pods to transfer monitoring data and configurations. Deployments scaled to 0 are not migrated.
global:
components:
influxdb:
enabled: true
superset:
enabled: true
mlops:
config:
models:
monitoringEnabled: true
monitoringMigrationEnabled: true
To enable the migration, you must also enable the new monitoring. You cannot enable only the migration.
Hash security option changes
Starting from H2O MLOps version 1.0.0, hash-based security options require you to provide the passphrase directly. The hashing is now handled automatically in the backend.
Make sure to store the passphrase in a secure location, as you won't be able to retrieve it after it's submitted.
From 0.69.x to 0.70.0
Transition from Scoring Client to native batch scoring
Starting from H2O MLOps version 0.70, batch scoring functionality has been natively integrated into H2O MLOps, replacing the H2O MLOps Scoring Client. The native batch scoring implementation is available through the official H2O MLOps Python client.
For added convenience, batch scoring can also be performed through the new H2O MLOps UI. Contact H2O.ai support if you need guidance with this migration.
Workload identity and IAM authentication
Starting from H2O MLOps version 0.70, workload identity and IAM authentication will be managed using the github.com/h2oai/go-pkg/database/postgres/v2 library for the mlops-storage, mlops-telemetry, and mlops-deployer components.
Update the connection strings for these components to match the formats shown in the examples below:
Example of the mlops-storage and mlops-telemetry database connection string:
storage_db_connection_string = "postgres://${var.mlops_db_username}@${var.mlops_db_address}:5432/${var.mlops_storage_db}?aws_iam_auth_enabled=true&aws_iam_auth_region=${var.aws_region}&aws_iam_auth_user=${var.mlops_db_username}&aws_iam_auth_endpoint=${var.mlops_db_address}:5432"
Example of the mlops-deployer database connection string:
deployment_db_connection_string = "postgres://${var.mlops_deployment_db_address}/${var.mlops_deployment_db_name}?sslmode=${var.db_connection_ssl_mode}&user=${urlencode(var.mlops_deployment_db_username)}&password=${urlencode(var.mlops_deployment_db_password)}"
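The `urlencode` calls in the deployer example matter whenever credentials contain reserved URL characters. The Python-side equivalent is `urllib.parse.quote`; the sketch below (a hypothetical helper with placeholder values, not the Terraform interpolation above) builds a string in the same shape:

```python
from urllib.parse import quote


def deployer_conn_string(address, db, ssl_mode, user, password):
    """Build a deployer-style connection string with URL-encoded credentials."""
    return (
        f"postgres://{address}/{db}?sslmode={ssl_mode}"
        f"&user={quote(user, safe='')}&password={quote(password, safe='')}"
    )
```

Without the encoding, characters such as `@`, `&`, or spaces in a password would corrupt the query string.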
Removal of mTLS
mTLS is no longer managed by Kubernetes jobs. For environments requiring mTLS communication between MLOps services, this should now be handled by a service mesh solution (such as Istio).
Previous versions used SPIFFE for service-to-service authentication. Version 0.70+ now uses service account tokens instead.
Migration steps
If your deployment requires mTLS:
- Remove any existing Kubernetes job configurations for mTLS.
- Implement a service mesh solution to manage mTLS between services.
- Configure your service mesh to handle the TLS certificate management.
Changes in Helm
- Config `tls` has been removed.
- Config `storage.tls` has been removed.
- Config `deployer.tls` has been removed.
- Config `ingest.tls` has been removed.
- Config `apiGateway.tls` has been removed.
- Config `monitoringAppBackend.tls` has been removed.
- Config `deployer.telemetry.auth.tlsEnabled` has been removed.
- Config `telemetry.serverSecurityEnabled` has been removed.
- Config `storage.auth.service` has been added:

service:
  # -- Issuer URL for service authentication.
  # In general, this should be set to whatever the "issuer" field is in the cluster's
  # OIDC discovery document.
  # `kubectl get --raw /.well-known/openid-configuration` can be used to retrieve
  # the discovery document.
  issuerURL: "https://kubernetes.default.svc"
  # -- Configures whether to validate the issuer URL for service authentication.
  # The issuer of some service account variants does not need to be the issuer
  # specified by the issuerURL.
  validateIssuer: false
  # -- Configures whether to use the Kubernetes HTTP client with TLS and token for issuer discovery
  # and downloading the signing keys.
  # Disable this when the issuer is not the Kubernetes API server.
  useKubernetesHTTPClient: true
Removal of support for older H2O Driverless AI versions
In MLOps version 0.70, support for H2O Driverless AI versions 1.10.6.3 and earlier has been discontinued. This change affects the following versions:
- 1.10.5-cuda11.2.2
- 1.10.5.1-cuda11.2.2
- 1.10.6-cuda11.2.2
- 1.10.6.1-cuda11.2.2
- 1.10.6.2-cuda11.2.2
- 1.10.6.3-cuda11.2.2
Removal of Pickle Runtime
Starting from H2O MLOps version 0.70, the Pickle Runtime has been removed.
Removal of environment from Python Client and UI
Starting with MLOps version 0.70.0, the environment feature has been removed from the user perspective in both the Python client and the UI. This change does not apply to the backend, and environment-related functionalities remain intact.
Changes in the UI
- Users no longer need to select an environment (e.g., PROD or DEV) when creating a deployment.
- The environment now defaults internally to `PROD`.
- Environment-related details are no longer visible in the UI.
Changes in the Python Client
- The `environments` property of the `MLOpsProject` instance is no longer available starting from client version 1.3.0.
- The environment now defaults internally to `PROD`.
- When using the updated client, make the following adjustments in your code. Here, `project` refers to an instance of `MLOpsProject`, and `client` refers to an instance of `h2o_mlops.Client`:
  - Replace `project.environments.get(uid).deployments` with `project.deployments`.
  - Replace `project.environments.get(uid).endpoints` with `project.endpoints`.
  - Replace `project.environments.get(uid).allowed_affinities` with `client.allowed_affinities`.
  - Replace `project.environments.get(uid).allowed_tolerations` with `client.allowed_tolerations`.

With these adjustments, your code remains compatible with both the updated and older versions of the MLOps backend.
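If the same script must run against both old and new client versions during the transition, a small shim can pick the right access path (a hypothetical helper; `environment_uid` stands for whatever UID your legacy code already passed to `environments.get`):

```python
def get_deployments(project, environment_uid=None):
    """Return deployments via the new flat API, falling back to the legacy environments API."""
    if hasattr(project, "environments") and environment_uid is not None:
        # Old client (< 1.3.0): deployments hang off an environment object.
        return project.environments.get(environment_uid).deployments
    # New client: deployments are available directly on the project.
    return project.deployments
```

The same pattern applies to `endpoints`, `allowed_affinities`, and `allowed_tolerations`.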
From 0.68.x to 0.69.0
MLOps runtimes
All runtime images must be updated to at least version 1.5.3, which was released with H2O MLOps version 0.69.0.
Starting with version 0.69.0, all runtime images must always be updated to the runtime images released with the corresponding H2O MLOps version. For example, H2O MLOps 0.69.1 was released with runtime images v1.5.4, so all deployment images must be updated to runtimes v1.5.4.
MLOps storage
Starting with MLOps version 0.69.0, only blob storages are supported as the backend. Support for other storage options has been discontinued. This change impacts the configuration parameters used in the MLOps Helm charts.
Changes to storage configuration parameters
storage:
persistence:
# All parameters under this section are no longer supported.
cloudPersistence:
# The 'enabled' parameter was removed since cloudPersistence is now the only option supported.
enabled:
pvcMigration:
# All parameters under this section are no longer supported.
After upgrading to MLOps 0.69.0, you can safely delete your existing PVC that was used as the storage backend prior to the 0.68.0 release. Perform this step manually to prevent any unintended data loss.
PBKDF2 hash support
H2O MLOps v0.69.0 now supports the PBKDF2 passphrase hash algorithm for more secure hashing. Note the following details:
- The PBKDF2 hash should follow the format `pbkdf2:<hashFunc>:<iterations>$<salt>$<hash>`.
- The `salt` and `hash` should be base64 encoded.
- PBKDF2 hashing replaces bcrypt when creating deployments with the Passphrase (Stored hashed) security option.
- The Passphrase (Stored hashed) security option is listed as an available option in the Create Deployment panel dropdown only if `PASSPHRASE_HASH_TYPE_PBKDF2` is included under `securityOptions.activated` in the `values.yaml`. Having `PASSPHRASE_HASH_TYPE_BCRYPT` is neither sufficient nor required.
- Older deployments created with bcrypt hashing remain accessible without requiring any additional configuration.
From 0.67.x to 0.68.0
(Optional) Vertical Pod Autoscaler (VPA) support
MLOps version 0.68.0 introduces Vertical Pod Autoscaler (VPA) support for the Deployer. Note that VPA activation is optional and performed upon request. VPA allows dynamic scaling of CPU and memory resources based on application usage, improving resource efficiency and optimizing costs.
Before activating VPA in MLOps, verify that VPA is supported in the cluster and that the VPA CRDs and controllers are up and running alongside the Metrics Server.
For more information, see the Installation section of the VPA GitHub README and the Metrics Server installation instructions.
Note: For a list of known limitations, see the Known limitations section of the VPA GitHub README.
Key changes
- VPA Resource Specifications: Added VPA resource specification logic to the Scoring Apps and App Composer, allowing for the dynamic adjustment of their resource limits based on real-time demand.
- API Updates: New API logic has been added for specifying and validating VPA resources.
- New VPA Utility Functions: Implemented utility methods for creating and managing VPA resources, including validation and resource quantity handling.
- Deprecated Function Removal: Removed the deprecated Fabric8 createOrReplace usage in the Scoring Apps.
Removal of HT runtime based on Python 3.8
The Hydrogen Torch (HT) runtime based on Python 3.8, which was available by default in MLOps version 0.67.x, has been removed as of MLOps version 0.68.0. However, you can still use this runtime by registering it through extra runtimes.
The following requirements need to be met so that the runtime registered through extra runtimes is also visible in the UI:
- The `mlflow/flavors/python_function/loader_module` must match `mlflow.pyfunc.model`.
- The runtime name must adhere to this pattern: `(python-scorer_hydrogen_torch_)(\w*)(38)(\w*)`.
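You can check a candidate runtime name against that pattern before registering it. The example names below are hypothetical; only the pattern itself comes from the requirements above:

```python
import re

# The naming pattern required for the runtime to be visible in the UI.
HT_38_NAME = re.compile(r"(python-scorer_hydrogen_torch_)(\w*)(38)(\w*)")


def is_visible_ht38_runtime(name):
    """True if the runtime name fully matches the required pattern."""
    return HT_38_NAME.fullmatch(name) is not None
```

Note that `fullmatch` is used so that trailing characters outside `\w` (such as dots or dashes) cause the check to fail.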
Configure maximum number of Kubernetes replicas
With H2O MLOps v0.68.0, you can configure the maximum number of Kubernetes replicas that can be specified when creating a new deployment. To do this, update maxDeploymentReplicas in the values.yaml file (charts/mlops/values.yaml). By default, the maxDeploymentReplicas value is set to 5.
Removal of MLflow runtimes based on Python 3.8
MLflow runtimes based on Python 3.8 have been removed in MLOps version 0.68.0. Python 3.8 has officially reached end of life as of October 07, 2024.
Pickle runtime based on Python 3.12
MLOps version 0.68.0 introduces a pickle runtime using Python 3.12. Choose one of the following options:
- Update your models to work with Python 3.12.
- If you cannot update your models, you can configure the original pickle runtime based on Python 3.8.18 during MLOps installation by replacing the `pickle-3.12.7` image with `pickle-3.8.18`.
Deployment of MLOps Telemetry as a long-running microservice
In MLOps version 0.67 and earlier, the MLOps telemetry component was configured as a cron job within the MLOps storage component in the Helm configuration. Starting with MLOps version 0.68, the MLOps telemetry component must be deployed as a separate long-running microservice that publishes event data at scheduled intervals.
To migrate from MLOps version 0.67 to 0.68:
- Remove the cron job configuration from the MLOps `storage` component in the Helm configuration.
- Implement it as a separate `telemetry` component within Helm.
Helm values must be set as follows:
# Telemetry Configurations
telemetry = {
enabled = true
image = {
repository = "h2oai-modelstorage-telemetry${local.shared_services_repository_suffix}"
tag = local.component_version.mlops_telemetry_version
}
replicaCount = 1
nodeSelector = {
"hac.h2o.ai/provisioner" = "karpenter"
}
tolerations = [
{
key = "type"
operator = "Equal"
value = "cpu-consolidation"
effect = "NoSchedule"
}
]
podSecurityContext = {
enabled = true
}
containerSecurityContext = {
enabled = true
}
serviceAccount = {
name = "hac-mlops-storage-telemetry-service-account"
}
serverAddress = "hac-telemetry-service.telemetry.svc.cluster.local:80"
config = {
logLevel = "error"
}
}
Scheduler routine for MLOps Telemetry
MLOps version 0.68.0 introduces the `SCHEDULER_INTERVAL_SECONDS` environment variable to run the scheduler routine inside the application itself, replacing the use of a cron job. As a result, MLOps Telemetry is deployed as a long-running deployment in the Kubernetes cluster that publishes event data at scheduled intervals. The default value is as follows:
SCHEDULER_INTERVAL_SECONDS=300
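The behavior described above amounts to a simple in-process loop. As an illustrative sketch only (not the telemetry service's actual code), the interval is read once from the environment with the documented default of 300 seconds:

```python
import os
import time


def run_scheduler(publish, stop_after=None):
    """Call `publish` every SCHEDULER_INTERVAL_SECONDS seconds (default 300)."""
    interval = int(os.environ.get("SCHEDULER_INTERVAL_SECONDS", "300"))
    runs = 0
    while stop_after is None or runs < stop_after:
        publish()  # publish the pending telemetry event data
        runs += 1
        if stop_after is not None and runs >= stop_after:
            break
        time.sleep(interval)
    return runs
```

The `stop_after` parameter exists only to make the sketch testable; a real long-running service would loop until terminated.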
Restructured environment security options
Environment-related security options are now configured in a different way. Prior to v0.68.0, security options were specified using their corresponding numerical values. For example:
securityOptions: [1,2,3]
From v0.68.0 onwards, activated security options are configured in the values.yaml file (charts/mlops/values.yaml) using the security option name. For example:
securityOptions:
activated:
- .......
- "AUTHORIZATION_PROTOCOL_OIDC"
- .......
You can also set the default security option in the values.yaml file (charts/mlops/values.yaml) using the security option name. The default option serves as the security setting applied in the UI when creating a deployment, and it must be part of the Activated Security Options List.
securityOptions:
activated:
- .......
- "PASSPHRASE_HASH_TYPE_PLAINTEXT"
- .......
default: "PASSPHRASE_HASH_TYPE_PLAINTEXT"
The following security options are supported in v0.68.0:
- DISABLED: No security options are activated.
- PASSPHRASE_HASH_TYPE_PLAINTEXT: Passphrase hash type is plaintext.
- PASSPHRASE_HASH_TYPE_BCRYPT: Passphrase hash type is bcrypt.
- AUTHORIZATION_PROTOCOL_OIDC: OIDC authorization protocol is activated.
- The Activated Security Options List cannot be empty.
- The default option must be part of the Activated Security Options List.
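Those two constraints, together with the set of supported option names, can be expressed as a quick configuration check (an illustrative sketch, not part of the Helm chart tooling):

```python
# The security options supported in v0.68.0, as listed above.
VALID_OPTIONS = {
    "DISABLED",
    "PASSPHRASE_HASH_TYPE_PLAINTEXT",
    "PASSPHRASE_HASH_TYPE_BCRYPT",
    "AUTHORIZATION_PROTOCOL_OIDC",
}


def validate_security_options(activated, default):
    """Enforce: non-empty activated list, known names, default is activated."""
    if not activated:
        raise ValueError("the activated security options list cannot be empty")
    unknown = set(activated) - VALID_OPTIONS
    if unknown:
        raise ValueError(f"unknown security options: {sorted(unknown)}")
    if default not in activated:
        raise ValueError("the default option must be part of the activated list")
```

Running such a check before applying the chart catches misconfigured `securityOptions` blocks early.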
From v0.68.0 onwards, the way to create a deployment with No Security via API call also differs from previous versions. This change includes the following modifications to the h2o-mlops Python Client:
- `security_options` is now a required field for the `create_single` method of the `MLOpsScoringDeployments` class.
- To ensure backward compatibility, v0.68.0 includes a new attribute for the `SecurityOptions` class, called `disabled_security`. This attribute handles the No Security case when set to `True`, instead of treating `None` or `SecurityOptions()` as No Security.
- Users of MLOps assembly v0.68.0 or above must set `disabled_security=True` to use the No Security option. For users on older versions, No Security mode can be accessed by using `SecurityOptions` with default values.
Helm changes
- As of version 0.68.0, the `ENABLE_USER_EXTERNALID_UPDATE` environment variable has been removed from storage, as it is no longer necessary.
- `deploymentEnvironment.corsOrigin` has been removed. Use `global.cors.allowedOrigin` instead.
Default deployment security option
As of version 0.68.0, the default security option for deployment is PASSPHRASE_HASH_TYPE_PLAINTEXT. Prior to this version, deployments were not secured by default.
Cloud migration information: MLOps storage
Starting with version 0.68.0, H2O MLOps no longer supports PVCs for storage, transitioning instead to cloud blob storage. MLOps storage supports blob storage from all three major cloud providers—AWS, Azure, and GCP—as well as Minio for on-premises installations. Consequently, all existing data must be migrated from the PVC to blob storage during the upgrade to MLOps 0.68.0. All data migration steps are handled by MLOps when MLOps storage is deployed in MIGRATE mode, and no manual user intervention is needed. End users shouldn't experience any downtime or data loss while the migration is in progress.
Installation instructions
Deploy storage in MIGRATE mode
Note: Only follow the instructions in this section if MLOps storage was previously deployed with LOCAL mode using a Kubernetes PVC as the storage.
For AWS environments with S3
IAM auth is used to access the bucket. The following annotation must be set on the storage service account:
eks.amazonaws.com/role-arn: <iam-role-arn>

Helm values must be set as follows:
storage:
serviceAccount:
create: true
annotations: {
eks.amazonaws.com/role-arn: <iam-role-arn>
}
persistence:
enabled: true
cloudPersistence:
enabled: true
url: s3://<bucket-name>?region=<bucket-region>&prefix=<optional-prefix>
pvcMigration:
enabled: true
cloudProvider: s3
bucketName: <bucket-name>
region: <bucket-region>
prefix: <optional-prefix>
For GCP environments with Google Cloud Storage
Workload identity is used to access the bucket. The following annotation must be set on the storage service account:
iam.gke.io/gcp-service-account: <service_account_email>
Helm values must be set as follows:
```yaml
storage:
  serviceAccount:
    create: true
    annotations:
      iam.gke.io/gcp-service-account: <service_account_email>
  persistence:
    enabled: true
    cloudPersistence:
      enabled: true
      url: gs://<bucket-name>
    pvcMigration:
      enabled: true
      cloudProvider: gcs
      bucketName: <bucket-name>
      region: <bucket-region>
```
For Azure environments with Azure Blob Storage
Workload Identity is used to access the bucket. The following annotation must be set on the storage service account:
azure.workload.identity/client-id: <client-id>
The following label must be set on the storage pods (service and migrator job):
azure.workload.identity/use: "true"
Helm values must be set as follows:
```yaml
storage:
  serviceAccount:
    create: true
    annotations:
      azure.workload.identity/client-id: <client-id>
  extraPodLabels:
    azure.workload.identity/use: "true"
  persistence:
    enabled: true
    cloudPersistence:
      enabled: true
      url: azblob://<bucket-name>
    pvcMigration:
      enabled: true
      cloudProvider: azureblob
      bucketName: <bucket-name>
      region: <bucket-region>
      accountName: <storage-account-name>
```
For on-premises environments with Minio
```yaml
storage:
  persistence:
    enabled: true
    cloudPersistence:
      enabled: true
      url: s3://<minio-bucket-name>?endpoint=<minio-url>&region=<minio-region>&hostname_immutable=true
      access_key_id: <minio-access-key-id>
      secret_access_key: <minio-secret-access-key>
    pvcMigration:
      enabled: true
      cloudProvider: minio
      bucketName: <bucket-name>
      region: <minio-region>
      endpoint: <minio-url>
      access_key_id: <minio-access-key-id>
      secret_access_key: <minio-secret-access-key>
```
From 0.66.1 to 0.67.0
Announcement: Java MOJO Runtime removal
The Java MOJO Runtime was removed in H2O MLOps version 1.1; version 0.68.0 was the last release to include it. For migration steps, see Removal of Java MOJO Runtime. The C++ MOJO runtime has also been replaced by `dai-mojo-scorer`.
Scoring runtimes
- MLflow runtime images are now twice as large, so deployments of these runtimes can take longer due to longer image pull times.
- Runtimes for DAI 1.10.4.3 and older are removed as of MLOps version 0.67.0.
- MLflow runtimes support Python 3.8 and later starting with MLOps version 0.67.0.
For more information on scoring runtimes in H2O MLOps, see Scoring runtimes.
Python client
Starting with version 0.67.0, the official Python client of H2O MLOps is h2o-mlops. The minimum Python version required for the client is Python 3.9.
Built on top of the legacy Python client, h2o-mlops retains all previous functionality. You can continue to access the legacy client's features through h2o-mlops as needed.
Note that users of the legacy client can switch to the new Python client (h2o-mlops) by importing h2o-mlops before using any features of the legacy client. This switch can be made without modifying any existing code or import statements.
Removal of Conda from Wave app
With the removal of Conda as of MLOps version 0.67.0, third-party models can no longer be uploaded to the MLOps frontend using serialized Pickle files. However, you can still upload models from frameworks like scikit-learn, PyTorch, XGBoost, LightGBM, and TensorFlow using MLflow packaged files.
Monitoring data retention
- Starting with version 0.67.0, a per-project data retention duration can be set for monitoring data stored in InfluxDB. To enable this feature, set the `MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD` environment variable on the deployer with a valid duration string. The minimum retention period is `1h` and the maximum is `INF`. If `MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD` is not set, `INF` is the default.
- The `monitor_influxdb_per_project_data_retention_period` value is exposed in the H2O MLOps Helm charts to set `MONITOR_INFLUXDB_PER_PROJECT_DATA_RETENTION_PERIOD` for the deployer.
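For example, a 30-day retention period might be configured in the Helm values as follows (a sketch; the exact placement of the key in the chart may differ in your version):

```yaml
# Hypothetical sketch: 720h = 30 days. Valid values range from 1h to INF.
# The nesting under `deployer` is an assumption about the chart layout.
deployer:
  monitor_influxdb_per_project_data_retention_period: 720h
```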
Emissary
H2O MLOps has switched from Emissary to the Gateway API:
- Emissary's CRDs are no longer used.
- Deployments are mapped to HTTP routes using the Gateway API's HTTPRoute CRD.
- The Gateway API is implemented with Envoy Gateway.
- (Breaking change) The Gateway API doesn't support custom error responses. This means that if a deployment is scaled down, the following custom error body is no longer displayed: `Deployment is scaled down to zero replicas. Please increase the number of replicas to use the deployment.` For more information, see Custom error responses.
- (Breaking change) If a deployment is scaled down, error code 500 is returned instead of 503.
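For reference, a minimal HTTPRoute for a deployment endpoint might look like the following sketch. The names, namespace, path, and backend service here are illustrative assumptions, not values generated by MLOps:

```yaml
# Hypothetical sketch of a Gateway API HTTPRoute (not MLOps-generated).
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-deployment-route   # assumed name
  namespace: h2o-mlops        # assumed namespace
spec:
  parentRefs:
    - name: mlops-gateway     # assumed Gateway name
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /<deployment-id>
      backendRefs:
        - name: <deployment-service>
          port: 8080
```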
Other changes
- External model registry is removed as of version 0.67.0.