`obvs.patchscope_base`

🩺 Patchscope Base Module

This module provides a base class, PatchscopeBase, with helper functions and abstract methods for implementing patchscope-based models. Patchscopes is a technique for inspecting language models: a hidden representation taken from a source model's forward pass is patched into a target model's forward pass, and the target model's output is then analyzed.

The PatchscopeBase class offers the following functionality:

- Retrieving model-specific attributes based on the model name
- Abstract methods for performing forward passes on the source and target models
- Abstract methods for mapping and running the patchscope process
- Properties for accessing source and target token IDs and tokens
- Methods for retrieving top-k tokens, logits, and probabilities from the target model
- Methods for finding the position of a substring in the source and target prompts
- Properties for accessing the number of layers in the source and target models
- Methods for computing Precision@1 and Surprisal metrics

Subclasses of PatchscopeBase should implement the abstract methods to provide the specific functionality for their patchscope-based models.

Note: The code assumes the presence of a tokenizer object (self.tokenizer) and source and target model objects (self.source_model and self.target_model) with specific attributes.
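As a rough illustration of the subclassing contract described above, the sketch below uses a stand-in abstract base that declares the same four abstract methods, plus a toy subclass that implements them. The names PatchscopeLike and ToyPatchscope, the list-of-floats "hidden states", and the order in which run calls the other methods are all assumptions made for this example; none of them are taken from obvs itself.

```python
from abc import ABC, abstractmethod


class PatchscopeLike(ABC):
    """Hypothetical stand-in mirroring the four abstract methods of PatchscopeBase."""

    @abstractmethod
    def source_forward_pass(self) -> None: ...

    @abstractmethod
    def map(self) -> None: ...

    @abstractmethod
    def target_forward_pass(self) -> None: ...

    @abstractmethod
    def run(self) -> None: ...


class ToyPatchscope(PatchscopeLike):
    """Toy subclass: 'hidden states' are plain lists of floats, not real activations."""

    def __init__(self, source_hidden: list[float]) -> None:
        self.source_hidden = source_hidden
        self.cached: list[float] = []
        self.patched: list[float] = []
        self.target_output: list[float] = []

    def source_forward_pass(self) -> None:
        # Pretend to run the source model and cache its representation.
        self.cached = list(self.source_hidden)

    def map(self) -> None:
        # Identity mapping (an assumption): pass the cached representation through unchanged.
        self.patched = list(self.cached)

    def target_forward_pass(self) -> None:
        # Pretend to run the target model on the patched representation.
        self.target_output = [2.0 * x for x in self.patched]

    def run(self) -> None:
        # Assumed call order for this example: source pass, mapping, target pass.
        self.source_forward_pass()
        self.map()
        self.target_forward_pass()


toy = ToyPatchscope([0.5, 1.5])
toy.run()
print(toy.target_output)
```

Because every method is marked abstract on the stand-in base, Python refuses to instantiate a subclass that leaves any of them unimplemented, which is the behavior the abc module provides for any abc.ABC subclass.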

Module Contents

Classes

PatchscopeBase

A base class with lots of helper functions

class obvs.patchscope_base.PatchscopeBase

Bases: abc.ABC

A base class with lots of helper functions

property _source_position: collections.abc.Sequence[int]

property _target_position: collections.abc.Sequence[int]

property source_token_ids: list[int]: Return the source tokens

property target_token_ids: list[int]: Return the target tokens

property source_tokens: list[str]: Return the input to the source model

property target_tokens: list[str]: Return the input to the target model

property n_layers: int

property n_layers_source: int

property n_layers_target: int

get_model_specifics(model_name): Get the model-specific attributes. The following works for gpt2, llama2 and mistral models.

abstract source_forward_pass() → None

abstract map() → None

abstract target_forward_pass() → None

abstract run() → None

top_k_tokens(k: int = 10) → list[str]: Return the top k tokens from the target model

top_k_logits(k: int = 10) → list[int]: Return the top k logits from the target model

top_k_probs(k: int = 10) → list[float]: Return the top k probabilities from the target model

logits() → torch.Tensor: Return the logits from the target model (size [pos, d_vocab])

probabilities() → torch.Tensor: Return the probabilities from the target model (size [pos, d_vocab])
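The relationship between logits, probabilities, and the top-k helpers can be sketched in plain Python. This is not the library's implementation: it uses lists instead of torch.Tensor to stay dependency-free, works on a single position's logits (assumed here to be the last position), and the four-token vocabulary is made up for the example.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert one position's logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(values: list[float], k: int = 10) -> list[int]:
    """Indices of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


vocab = ["the", "cat", "sat", "mat"]         # hypothetical vocabulary
last_position_logits = [1.0, 3.0, 0.5, 2.0]  # hypothetical logits, size d_vocab

probs = softmax(last_position_logits)
top = top_k_indices(probs, k=2)

print([vocab[i] for i in top])                   # top-k tokens
print([last_position_logits[i] for i in top])    # their logits
print([round(probs[i], 3) for i in top])         # their probabilities
```

Softmax preserves ordering, so ranking by logits and ranking by probabilities select the same tokens.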

output() → list[str]: Return the generated output from the target model

_output_token_ids() → list[int]

llama_output() → list[str]: For llama, the output tokens have to be decoded all together; decoded one at a time, they lose the spaces between them.

full_output_tokens() → list[str]: Return the generated output from the target model. This is a bit hacky. It's not super well supported. I have to concatenate all the inputs and add the input tokens to them.

full_output() → str: Return the generated output from the target model. This is a bit hacky. It's not super well supported. I have to concatenate all the inputs and add the input tokens to them.

find_in_source(substring: str) → int

Find the position of the substring tokens in the source prompt

Note: only works if substring’s tokenization happens to match that of the source prompt’s tokenization
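The caveat in the note can be made concrete with a small sketch of a token-subsequence search. The helper find_sublist and the token IDs below are hypothetical and are not the library's code; the point is only that the search succeeds when the substring tokenizes to the same IDs that appear in the prompt, and fails otherwise.

```python
def find_sublist(haystack: list[int], needle: list[int]) -> int:
    """Return the first index at which needle occurs in haystack, or raise ValueError."""
    if not needle:
        raise ValueError("needle must not be empty")
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        if haystack[start:start + n] == needle:
            return start
    raise ValueError("token sequence not found")


prompt_ids = [101, 7, 42, 42, 9, 3]  # made-up token IDs for a prompt

# Case 1: the substring tokenizes to the same IDs that appear in the prompt.
matching_ids = [42, 9]
position = find_sublist(prompt_ids, matching_ids)
print(position)

# Case 2: the same text tokenized on its own yields a different ID
# (for example, because of a leading-space variant), so the search fails.
mismatched_ids = [429]
try:
    find_sublist(prompt_ids, mismatched_ids)
    found = True
except ValueError:
    found = False
print(found)
```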

source_position_tokens(substring: str) → tuple[int, list[int]]

Find the position of a substring in the source prompt, and return the substring tokenized

NB: The try: except block handles the difference between gpt2 and llama tokenization. Perhaps this could be better dealt with by a separate tokenizer class that handles the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.

find_in_target(substring: str) → int

Find the position of the substring tokens in the target prompt

Note: only works if substring’s tokenization happens to match that of the target prompt’s tokenization

target_position_tokens(substring) → tuple[int, list[int]]

Find the position of a substring in the target prompt, and return the substring tokenized

NB: The try: except block handles the difference between gpt2 and llama tokenization. Perhaps this could be better dealt with by a separate tokenizer class that handles the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.

compute_precision_at_1(estimated_probs: torch.Tensor, true_token_index)

Compute the Precision@1 metric from the outputs of the target (patched) model (estimated_probs), measured against the output of the source model, aka the ‘true’ token.

Args: - estimated_probs: The estimated probabilities for each token as a torch.Tensor. - true_token_index: The index of the true token in the vocabulary.

Returns: - precision_at_1: Precision@1 metric result.

This is the token identity evaluation method from Patchscopes: https://arxiv.org/abs/2401.06102 It's used for running an evaluation over large datasets.
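One common reading of Precision@1 is: score 1 if the highest-probability token is the true token, otherwise 0, then average over a dataset. The sketch below follows that reading in plain Python, using lists instead of torch.Tensor to stay dependency-free. It is an illustration under that assumption, not the library's code, and the probability vectors are made up.

```python
def precision_at_1(estimated_probs: list[float], true_token_index: int) -> int:
    """Return 1 if the most probable token is the true token, else 0."""
    predicted_index = max(range(len(estimated_probs)), key=lambda i: estimated_probs[i])
    return int(predicted_index == true_token_index)


# Hypothetical (estimated_probs, true_token_index) pairs standing in for a dataset.
examples = [
    ([0.1, 0.7, 0.2], 1),  # argmax is index 1, true token is 1: hit
    ([0.6, 0.3, 0.1], 2),  # argmax is index 0, true token is 2: miss
    ([0.2, 0.2, 0.6], 2),  # argmax is index 2, true token is 2: hit
]

scores = [precision_at_1(probs, true_index) for probs, true_index in examples]
mean_precision = sum(scores) / len(scores)
print(scores, round(mean_precision, 3))
```

Averaging the per-example scores gives the fraction of examples where the patched model's top prediction matches the true token.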

compute_surprisal(estimated_probs: torch.Tensor, true_token_index)

Compute the Surprisal metric from the outputs of the target (patched) model (estimated_probs), measured against the output of the source model, aka the ‘true’ token.

Args: - estimated_probs: The estimated probabilities for each token as a torch.Tensor. - true_token_index: The index of the true token in the vocabulary.

Returns: - surprisal: Surprisal metric result.
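Surprisal is conventionally the negative log of the probability assigned to the true token, so lower values mean the patched model found the true token more expected. The sketch below assumes that definition with the natural logarithm and uses plain lists instead of torch.Tensor. It is an illustration, not the library's code, and the probabilities are made up.

```python
import math


def surprisal(estimated_probs: list[float], true_token_index: int) -> float:
    """Negative natural log of the probability given to the true token."""
    # Note: a probability of exactly 0 would make math.log raise ValueError.
    return -math.log(estimated_probs[true_token_index])


probs = [0.5, 0.25, 0.25]  # hypothetical distribution over a 3-token vocabulary

likely = surprisal(probs, 0)    # true token has probability 0.5
unlikely = surprisal(probs, 1)  # true token has probability 0.25
print(round(likely, 4), round(unlikely, 4))
```

Halving the probability of the true token adds log 2 to the surprisal, which is why the second value is exactly twice the first in this example.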
