Skip to content

A Toolkit for Annotating and Visualizing LLM Hallucinations

License

Notifications You must be signed in to change notification settings

kasnerz/factgenie

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

factgenie

GitHub GitHub issues Github stars

Visualize and annotate errors in LLM outputs.

🚧 The project is in progress; use at your own risk. 🚧

Main screen

Intro

Outputs from large language models (LLMs) may contain errors: semantic, factual, and lexical.

With factgenie, you can have the errors highlighted 🌈:

  • From humans through a crowdsourcing service.
  • From LLMs through an API.

How does factgenie help with that?

  1. It helps you create a user-friendly website for collecting annotations from human crowdworkers.
  2. It helps you with LLM API calls for collecting equivalent annotations from LLM-based evaluators.
  3. It provides you with visualization interface for inspecting the annotated outputs.

What does factgenie not help with is collecting the data or model outputs (we assume that you already have these), starting the crowdsourcing campaign (for that, you need to use a service such as Prolific.com) or running the LLM evaluators (for that, you need a local framework such as Ollama or a proprietary API).


This project is a framework and template for you, dear researcher. Help us improve it! 😉

Quickstart

Make sure you have Python 3 installed (the project is tested with Python 3.10).

The following commands install the package, start the web server, and open the front page in the browser:

pip install -e .
factgenie run --host=127.0.0.1 --port 5000
xdg-open http://127.0.0.1:5000  # for Linux it opens the page for you

Step-by-step guide

Each project is unique. That is why this framework is partially DIY: we assume that it will be customized for a particular use case.

0) Setup Dependencies

The factgenie uses ollama to run local LLMs and openai-python API to OpenAI LLMs to gather annotations. For crowdsourcing campaigns, it was designed to easily integrate into prolific.com workflow.

Read their documentation to set it up, we will prepare step-by-step guides in the future.

For setting up the LLMs for annotation, the factgenie needs just the ollama or openai URL to connect to. From Prolific, one needs to obtain a completion code, which will be displayed to the annotators as proof of completed work for the prolific.com

1) Gather your inputs and outputs

Make sure you have input data and corresponding model outputs from the language model.

By input data, we mean anything that will help the annotators with assessing the factual accuracy of the output.

See the factgenie/data folder for example inputs and the factgenie/outputs folder for example model outputs.

The input data can have any format visualizable in the web interface - anything from plain text to advanced charts. The model outputs should be in plain text.

2) Prepare a data loader

Write a data loader class for your dataset. The class needs to subclass the Dataset class in factgenie/loaders/dataset.py and implement its methods.

Notably, you need to implement:

  • load_data() for loading the input data,
  • load_generated_outputs() for loading the model outputs,
  • render() for rendering the inputs in HTML,
  • get_info() returning information about your dataset.

You can get inspired by the example datasets in factgenie/loaders/dataset.py.

3) Run the web interface

To check that everything works as expected, fire up the web interface 🔥

First, install the Python package (the project is tested with Python 3.10):

pip install -e .

Start the local web server:

factgenie run --host=127.0.0.1 --port 8890

After opening the page http://127.0.0.1:8890 in your browser, you should be able to see the front page:

Main screen

Go to /browse. Make sure that you can select your dataset in the navigation bar and browse through the examples.

4) Annotate the outputs with LLMs

For collecting the annotations from an LLM, you will first need to get access to one. The options we recommend are:

  • OpenAI API: After you create an account, set the OPENAI_API_KEY environment variable to your API key. Note that you will need to pay per token for the service.
  • Ollama: An open-source framework for running LLMs locally. After you start the model, create a config file in factgenie/llm-eval with the respective API URL (see factgenie/llm-eval/ollama-llama3.yaml for an example).

In general, you can integrate factgenie with any API that allows decoding responses as JSON (or any API as long as you can get a JSON by postprocessing the response).

You also need to customize the YAML configuration file in factgenie/llm-eval by setting the model prompt optionally along with the system message, model parameters, etc. Keep in mind the prompt needs to ask the model to produce JSON outputs in the following format:

{
  "errors": [
    { 
      "text": [TEXT_SPAN],
      "type": [ERROR_CATEGORY]},
    ...
  ]
}

The provided examples should help you with setting up the prompt.

Once you have the configuration file ready, you should:

  • Go to factgenie /llm_eval webpage.
  • Click on New LLM eval and select the campaign identifier.
  • In the Data section:
    • Select the datasets and splits you want to annotate.
  • In the LLM config section:
    • Select your customized configuration file.
  • In the Error categories section:
    • Select the error categories you specified in the prompt.

Your eval should appear in the list:

Main screen

Now you need to go to the campaign details and run the evaluation. The annotated examples will be marked as finished:

Main screen

5) Annotate the outputs with human crowd workers

For collecting the annotations from human crowd workers, you typically need to:

  • prepare user-friendly web interface for collecting the annotations ,
  • monitor the progress of the crowdworkers.

👉️ With factgenie, you won't need to spend almost any time with any of these!

Starting a campaign

First, we will start a new campaign:

  • Go to /crowdsourcing.
  • Click on New Campaign and select the campaign identifier.
  • In the Data section:
    • Select the datasets and splits you want to annotate.
    • Examples per batch: the number of examples the annotator will see,
    • Group outputs: whether you want to shuffle all the available outputs (Random) or keep the outputs grouped by input examples (Example-wise (shuffled)).
  • In the Prolific section:
    • Idle time: number of minutes after which an assigned example will be freed and offered to a new annotator
    • Prolific completion code: the code that will appear to the annotator after the annotation is completed.
  • In the Error categories section:
    • Select which error categories you want the annotators to annotate along with the corresponding colors.

Your campaign should appear in the list:

Main screen

You can now preview the annotation page by clicking on the 👁️‍🗨️ icon. If a crowd worker opens this page, the corresponding batch of examples will be assigned to them.

Since we are using the dummy PROLIFIC_PID parameter (test), we can preview the page and submit annotations without having this particular batch assigned.

Customizing the annotation page

And now it's your turn. To customize the annotation page, go to factgenie/templates/campaigns/<your_campaign_id> and modify the annotate.html file.

You will typically need to write custom instructions for the crowd workers, include Javascript libraries necessary for rendering your inputs, or write custom Javascript code.

You can get inspired by the example campaign in factgenie/templates/campaigns/.

Submit the annotations from the Preview page (and delete the resulting files) to ensure that everything works from your point of view.

Main screen

Launch the crowdsourcing campaign

By clicking on the Details button, you can get the link that you can paste on Prolific. By now, you need to run the server with a public URL so that it is accessible to the crowdworkers.

On the details page, you can monitor how individual batches get assigned and completed.

6) View the results

Once the annotations are collected, you can view them on the /browse. The annotations from each campaign can be selected in the drop-down menu above model outputs.

Main screen

Core Developers

Optional use of Git Large File Storage (git lfs)

About

A Toolkit for Annotating and Visualizing LLM Hallucinations

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published