The obvs library is a Python package that provides tools and utilities for analyzing and interpreting language models using the patchscope framework. It offers a range of functionalities to probe and understand the internal representations and behaviors of language models at different layers and positions.

Welcome to obvs’s documentation!

Contents:

Indices and tables

Detail

obvs directory:

  • The obvs directory contains the main components of the library, including the patchscope, patchscope_base, lenses, logging, and metrics modules.

patchscope and patchscope_base:

  • The patchscope module implements the core functionality of the patchscope framework, which allows for mapping and patching representations between different language models.

  • The patchscope_base module serves as an abstract base class for the patchscope framework, providing common functionality and abstractions.

  • Together, these modules enable the analysis and interpretation of language models by mapping representations from a source model to a target model and studying the effects of interventions.

lenses:

  • The lenses module provides various lenses for analyzing language models, such as TokenIdentity, BaseLogitLens, PatchscopeLogitLens, and ClassicLogitLens.

  • Lenses are techniques that allow for probing and understanding the internal representations and behaviors of language models at different layers and positions.

  • The module includes methods for running lens analyses, computing metrics like surprisal and precision@1, and visualizing the results using heatmaps and plots.

logging:

  • The logging module provides utility functions and classes for configuring and customizing the logging behavior in the application.

  • It includes a custom logging handler (TqdmLoggingHandler) that integrates with the tqdm progress bar library, allowing log messages to be displayed alongside the progress bar.

  • The module also configures a specific logger named “patchscope” with a file handler that logs messages to a file named “experiments.log”.

metrics:

  • The metrics module defines evaluation metrics for language modeling tasks.

  • It includes classes like PrecisionAtK and Surprisal for computing precision@k and surprisal metrics, respectively.

  • These metrics can be used to assess the performance of language models and evaluate the effectiveness of different interpretation techniques.

scripts directory:

  • The scripts directory contains a collection of scripts that serve as a cookbook for reproducing standard results using the obvs library and the patchscope framework.

  • The scripts demonstrate how to use different lenses and techniques to analyze and interpret language models.

  • Some notable scripts include: - activation_patching_ioi: Uses activation patching to study indirect object identification (IOI) on gpt2-small. - future_lens: Generates the future lens at a single position. - generate_next_token_prediction_data: Generates data for next token prediction tasks. - replicate_figure_2: Replicates Figure 2 from a specific research paper. - reproduce_logitlens_results: Reproduces the results of the original logitlens blog post. - token_identity_prompts and token_identity: Demonstrate the usage of the token identity lens.

Overall, the obvs library provides a comprehensive set of tools and utilities for analyzing and interpreting language models using the patchscope framework. It offers a range of lenses, metrics, and visualization techniques to study the internal representations and behaviors of models at different layers and positions. The included scripts serve as practical examples and a starting point for conducting interpretability experiments and reproducing standard results.