obvs.patchscope_base

🩺 Patchscope Base Module

This module provides a base class, PatchscopeBase, with helper functions and abstract methods for implementing patchscope-based models. Patchscope is a technique for evaluating and analyzing language models: a hidden representation from a source model's forward pass is patched into a target model's forward pass, and the target model's output is used to interpret that representation.

The PatchscopeBase class offers the following functionality:

  • Retrieving model-specific attributes based on the model name

  • Abstract methods for performing forward passes on the source and target models

  • Abstract methods for mapping and running the patchscope process

  • Properties for accessing source and target token IDs and tokens

  • Methods for retrieving top-k tokens, logits, and probabilities from the target model

  • Methods for finding the position of a substring in the source and target prompts

  • Properties for accessing the number of layers in the source and target models

  • Methods for computing Precision@1 and Surprisal metrics

Subclasses of PatchscopeBase should implement the abstract methods to provide the specific functionality for their patchscope-based models.
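The subclassing pattern described above can be sketched as follows. This is a minimal illustration that does not import obvs: PatchscopeSketchBase and EchoPatchscope are hypothetical stand-ins that mirror only the four abstract method names documented in this module, and the call order inside run() is an assumption rather than the library's confirmed behavior.

```python
from abc import ABC, abstractmethod


class PatchscopeSketchBase(ABC):
    """Hypothetical stand-in for PatchscopeBase (not the obvs class)."""

    @abstractmethod
    def source_forward_pass(self) -> None: ...

    @abstractmethod
    def map(self) -> None: ...

    @abstractmethod
    def target_forward_pass(self) -> None: ...

    @abstractmethod
    def run(self) -> None: ...


class EchoPatchscope(PatchscopeSketchBase):
    """Toy subclass that records the order in which the steps are called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def source_forward_pass(self) -> None:
        self.calls.append("source_forward_pass")

    def map(self) -> None:
        self.calls.append("map")

    def target_forward_pass(self) -> None:
        self.calls.append("target_forward_pass")

    def run(self) -> None:
        # Assumed ordering: read from the source, map, then patch the target.
        self.source_forward_pass()
        self.map()
        self.target_forward_pass()


scope = EchoPatchscope()
scope.run()
```

Because the stand-in base class declares its methods with @abstractmethod, instantiating it directly, or instantiating a subclass that leaves any of the four methods unimplemented, raises TypeError.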

Note: The code assumes the presence of a tokenizer object (self.tokenizer) and source and target model objects (self.source_model and self.target_model) with specific attributes.

Module Contents

Classes

PatchscopeBase

A base class with lots of helper functions

class obvs.patchscope_base.PatchscopeBase

Bases: abc.ABC

A base class with lots of helper functions

property _source_position: collections.abc.Sequence[int]
property _target_position: collections.abc.Sequence[int]
property source_token_ids: list[int]

Return the source token IDs

property target_token_ids: list[int]

Return the target token IDs

property source_tokens: list[str]

Return the input to the source model

property target_tokens: list[str]

Return the input to the target model

property n_layers: int
property n_layers_source: int
property n_layers_target: int
get_model_specifics(model_name)

Get the model-specific attributes. This works for gpt2, llama2 and mistral models.

abstract source_forward_pass() → None
abstract map() → None
abstract target_forward_pass() → None
abstract run() → None
top_k_tokens(k: int = 10) → list[str]

Return the top k tokens from the target model

top_k_logits(k: int = 10) → list[int]

Return the top k logits from the target model

top_k_probs(k: int = 10) → list[float]

Return the top k probabilities from the target model
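As a rough illustration of what the three top-k helpers return, the sketch below ranks a toy probability vector and maps the winning indices back to tokens. It uses plain Python lists in place of torch.Tensor values; the top_k function, toy_vocab, and toy_probs are hypothetical names invented for this example and are not part of obvs.

```python
def top_k(values: list[float], k: int = 10) -> list[tuple[int, float]]:
    """Return (index, value) pairs for the k largest values, largest first."""
    ranked = sorted(enumerate(values), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Invented four-token vocabulary and probabilities, for illustration only.
toy_vocab = ["the", "cat", "sat", "mat"]
toy_probs = [0.1, 0.6, 0.25, 0.05]

best = top_k(toy_probs, k=2)
best_indices = [index for index, _ in best]
best_tokens = [toy_vocab[index] for index in best_indices]
```

Here best_indices plays the role of the vocabulary positions and best_tokens the role of the decoded strings.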

logits() → torch.Tensor

Return the logits from the target model (size [pos, d_vocab])

probabilities() → torch.Tensor

Return the probabilities from the target model (size [pos, d_vocab])
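The relationship between a [pos, d_vocab] logits array and the matching probabilities can be sketched with a row-wise softmax. This is an assumption about how the two methods relate, written with nested Python lists rather than torch.Tensor; the softmax helper and the example values are illustrative only.

```python
import math


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one position's logits."""
    shift = max(row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits with shape [pos=2, d_vocab=3].
toy_logits = [
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

# One probability distribution per position, same shape as the logits.
toy_probabilities = [softmax(row) for row in toy_logits]
```

Each row of toy_probabilities sums to 1, and the ordering of values within a row matches the ordering of the logits it was computed from.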

output() → list[str]

Return the generated output from the target model

_output_token_ids() → list[int]
llama_output() → list[str]

For Llama tokenizers, the tokens must be decoded together; decoding them one at a time drops the spaces between them.

full_output_tokens() → list[str]

Return the generated output from the target model. This is a bit hacky and not well supported: all the inputs have to be concatenated and the input tokens added to them.

full_output() → str

Return the generated output from the target model. This is a bit hacky and not well supported: all the inputs have to be concatenated and the input tokens added to them.

find_in_source(substring: str) → int

Find the position of the substring tokens in the source prompt

Note: This only works if the substring’s tokenization matches the tokenization of the same text within the source prompt.
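The caveat above follows from searching at the token level. A minimal sketch of such a search, assuming the lookup is a contiguous sublist match over token IDs: find_sublist is a hypothetical helper, the IDs are invented, and returning -1 on failure is a choice made for this example, not a statement about how obvs reports a miss.

```python
def find_sublist(haystack: list[int], needle: list[int]) -> int:
    """Return the first index where needle occurs contiguously, or -1."""
    if not needle:
        return -1
    width = len(needle)
    for start in range(len(haystack) - width + 1):
        if haystack[start:start + width] == needle:
            return start
    return -1


# Invented token IDs standing in for a tokenized prompt.
prompt_ids = [10, 11, 12, 13, 14, 15]

# The substring tokenizes to IDs that appear contiguously in the prompt.
found = find_sublist(prompt_ids, [12, 13])

# Same text, but tokenized differently on its own: no contiguous match.
missing = find_sublist(prompt_ids, [12, 99])
```

The second call shows the failure mode: if the tokenizer splits the substring differently in isolation than it does inside the full prompt, the IDs never line up and the search fails even though the text is present.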

source_position_tokens(substring: str) → tuple[int, list[int]]

Find the position of a substring in the source prompt, and return the substring tokenized

NB: The try: except block handles the difference between gpt2 and llama tokenization. This might be better handled by a separate tokenizer class that deals with the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.

find_in_target(substring: str) → int

Find the position of the substring tokens in the target prompt

Note: This only works if the substring’s tokenization matches the tokenization of the same text within the target prompt.

target_position_tokens(substring) → tuple[int, list[int]]

Find the position of a substring in the target prompt, and return the substring tokenized

NB: The try: except block handles the difference between gpt2 and llama tokenization. This might be better handled by a separate tokenizer class that deals with the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.

compute_precision_at_1(estimated_probs: torch.Tensor, true_token_index)

Compute the Precision@1 metric by comparing the output of the target (patched) model (estimated_probs) against the output of the source model, i.e. the ‘true’ token.

Args:

  • estimated_probs: The estimated probabilities for each token as a torch.Tensor.

  • true_token_index: The index of the true token in the vocabulary.

Returns:

  • precision_at_1: Precision@1 metric result.

This is the token-identity evaluation method from Patchscopes (https://arxiv.org/abs/2401.06102). It is used for running evaluations over large datasets.
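A minimal sketch of Precision@1 under the usual definition: score 1 when the highest-probability token is the true token and 0 otherwise, then average over a dataset. The function below works on plain lists rather than torch.Tensor and is an illustration, not the obvs implementation; the example distributions are invented.

```python
def precision_at_1(estimated_probs: list[float], true_token_index: int) -> int:
    """Return 1 if the most probable token is the true token, else 0."""
    predicted_index = max(
        range(len(estimated_probs)), key=lambda i: estimated_probs[i]
    )
    return int(predicted_index == true_token_index)


# Invented (probabilities, true index) pairs standing in for a dataset.
examples = [
    ([0.7, 0.2, 0.1], 0),     # argmax is 0, true is 0: hit
    ([0.1, 0.3, 0.6], 1),     # argmax is 2, true is 1: miss
    ([0.2, 0.5, 0.3], 1),     # argmax is 1, true is 1: hit
    ([0.4, 0.35, 0.25], 0),   # argmax is 0, true is 0: hit
]

scores = [precision_at_1(probs, true_index) for probs, true_index in examples]
mean_precision = sum(scores) / len(scores)
```

Averaging the per-example scores gives the dataset-level figure; in this toy set three of the four predictions match.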

compute_surprisal(estimated_probs: torch.Tensor, true_token_index)

Compute the Surprisal metric by comparing the output of the target (patched) model (estimated_probs) against the output of the source model, i.e. the ‘true’ token.

Args:

  • estimated_probs: The estimated probabilities for each token as a torch.Tensor.

  • true_token_index: The index of the true token in the vocabulary.

Returns:

  • surprisal: Surprisal metric result.
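Surprisal is conventionally the negative log-probability assigned to the true token, so lower values mean the patched model found the true token more likely. The sketch below assumes that definition and a natural logarithm; the log base used by obvs is not stated in this reference, and the function operates on plain lists rather than torch.Tensor.

```python
import math


def surprisal(estimated_probs: list[float], true_token_index: int) -> float:
    """Negative natural-log probability of the true token (assumed definition)."""
    return -math.log(estimated_probs[true_token_index])


# Invented distribution over a three-token vocabulary.
toy_distribution = [0.5, 0.25, 0.25]

likely = surprisal(toy_distribution, 0)    # -ln(0.5)
unlikely = surprisal(toy_distribution, 1)  # -ln(0.25)
```

A token with probability 1 has zero surprisal, and a probability of exactly 0 would raise ValueError in this sketch, so a real implementation may want to clamp or add a small epsilon.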