Create a new evaluation

Overview

In H2O Eval Studio, you can evaluate a model and generate executive dashboards for model comparisons and advanced insights. There are two ways to create a new evaluation: create one from scratch, or import an existing evaluation.

Create a new evaluation

To create a new evaluation:

  1. In the left navigation menu, click Evaluations.
  2. Click New evaluation.
  3. Enter a name for the evaluation.
  4. Enter a description for the evaluation.
  5. From the Model host drop-down menu, select the model you want to evaluate.
  6. Select the tests you want to use. For more information, see Tests.
  7. Select the LLM models you want to use for the evaluation.
  8. (Optional) From the Existing collection drop-down menu, select a collection if you want to reuse an H2OGPTe collection instead of creating a new one. A new collection is created only if no existing collection has the specified name.
  9. Select the evaluators you want to use. For more information, see Evaluators.
  10. (Optional) Set advanced model settings. For more details, see model host–specific Advanced settings.
  11. Click Create.

Import an existing evaluation

You can import the JSON representation of an existing Test Lab (a Test Suite with resolved actual answers) into H2O Eval Studio to reuse its test cases, including their actual answers. Importing an evaluation is useful when you want to reuse the same test cases in a new evaluation.
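Because an imported Test Lab supplies the actual answers itself, it can help to confirm that those answers are present before importing. The following is a minimal sketch, not part of H2O Eval Studio: the helper name `unresolved_keys` is hypothetical, and the field names (`dataset`, `inputs`, `key`, `actual_output`) are taken from the example Test Lab JSON on this page.

```python
import json


def unresolved_keys(test_lab: dict) -> list[str]:
    """Return the keys of test cases in "dataset" that have no actual answer.

    Hypothetical helper: it only inspects the fields visible in the example
    Test Lab JSON and does not call any H2O Eval Studio API.
    """
    inputs = test_lab.get("dataset", {}).get("inputs", [])
    return [
        case.get("key", "")
        for case in inputs
        # Treat a missing, null, or whitespace-only answer as unresolved.
        if not (case.get("actual_output") or "").strip()
    ]


# A tiny in-memory Test Lab with one resolved and one unresolved test case.
sample = json.loads(
    """
    {
      "name": "Sample TestLab",
      "dataset": {
        "inputs": [
          {"key": "case-1", "input": "Question 1", "actual_output": "Answer 1"},
          {"key": "case-2", "input": "Question 2", "actual_output": ""}
        ]
      }
    }
    """
)

print(unresolved_keys(sample))
```

An empty result suggests that every test case in the file already carries an answer; how H2O Eval Studio treats test cases without one is not covered here.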

To import a Test Lab in H2O Eval Studio, follow these steps:

  1. In the main navigation, click Evaluations.

  2. Click the Import evaluation button.

  3. Leave Model host empty; the actual answers are taken from the imported JSON file.

  4. Enter a name and a description for the evaluation, and select the evaluators you want to use.

  5. Scroll down and provide the Test Lab to import: upload a file, paste the JSON, or specify the URL of the Test Lab file.

  6. The following is an example of Test Lab JSON:

    {
      "name": "Fact Checking TestLab",
      "description": "Test lab for RAG / LLM / agent evaluation.",
      "raw_dataset": {
        "inputs": [
          {
            "key": "9c3a7df3-67df-4819-babb-20636611f077",
            "input": "What is the boiling temperature of H2O?",
            "corpus": [],
            "context": [],
            "categories": [
              "question-answering"
            ],
            "relationships": [],
            "expected_output": "",
            "output_condition": "",
            "actual_output": "",
            "actual_duration": 0.0,
            "cost": 0.0,
            "model_key": "d4a7c0dd-a3ff-487e-86e5-57718b812b54"
          }
        ]
      },
      "dataset": {
        "inputs": [
          {
            "key": "9c3a7df3-67df-4819-babb-20636611f077",
            "input": "What is the boiling temperature of H2O?",
            "corpus": [],
            "context": [],
            "categories": [
              "question-answering"
            ],
            "relationships": [],
            "expected_output": "",
            "output_condition": "",
            "actual_output": "The boiling point of water (H2O) is 100 degrees Celsius (212 degrees Fahrenheit) at standard atmospheric pressure.",
            "actual_duration": 4.015823125839233,
            "cost": 0.0022799999999999487,
            "model_key": "d4a7c0dd-a3ff-487e-86e5-57718b812b54"
          }
        ]
      },
      "models": [
        {
          "connection": "c8c036a0-659b-4d3c-9309-1d2a47042950",
          "model_type": "h2ogpte_llm",
          "name": "LLM model h2oai/h2ogpt-4096-llama2-70b-chat",
          "llm_model_name": "h2oai/h2ogpt-4096-llama2-70b-chat",
          "key": "d4a7c0dd-a3ff-487e-86e5-57718b812b54"
        }
      ],
      "llm_model_names": [
        "h2oai/h2ogpt-4096-llama2-70b-chat"
      ]
    }
  7. Click Import.

The file is imported and a new evaluation runs.
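In the example Test Lab JSON, each test case's `model_key` matches the `key` of an entry in `models`. The sketch below checks that relationship in a file before you import it. It is only an illustration: the helper name `dangling_model_keys` is hypothetical, and whether H2O Eval Studio requires every `model_key` to resolve is an assumption based on the example, not a documented rule.

```python
import json


def dangling_model_keys(test_lab: dict) -> set[str]:
    """Return model_key values used by test cases but absent from "models".

    Hypothetical helper based only on the structure of the example JSON;
    it does not call any H2O Eval Studio API.
    """
    # Keys declared in the top-level "models" list.
    known = {m["key"] for m in test_lab.get("models", []) if "key" in m}
    # Keys referenced by test cases in both dataset sections.
    used = {
        case["model_key"]
        for section in ("raw_dataset", "dataset")
        for case in test_lab.get(section, {}).get("inputs", [])
        if case.get("model_key")
    }
    return used - known


# Minimal in-memory Test Lab: "m-2" is referenced but never declared.
test_lab = json.loads(
    """
    {
      "dataset": {
        "inputs": [
          {"key": "case-1", "model_key": "m-1"},
          {"key": "case-2", "model_key": "m-2"}
        ]
      },
      "models": [
        {"key": "m-1", "model_type": "h2ogpte_llm"}
      ]
    }
    """
)

print(dangling_model_keys(test_lab))
```

To run the same check on a real file, load it with `json.load` from the path of your Test Lab JSON and pass the resulting dictionary to the helper.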

