obvs.patchscope_base
🩺 Patchscope Base Module
This module provides a base class PatchscopeBase with helper functions and abstract methods for implementing patchscope-based models. Patchscope is a technique used for evaluating and analyzing language models by comparing the outputs of a source model with a target model.
The PatchscopeBase class offers the following functionality:
- Retrieving model-specific attributes based on the model name
- Abstract methods for performing forward passes on the source and target models
- Abstract methods for mapping and running the patchscope process
- Properties for accessing source and target token IDs and tokens
- Methods for retrieving top-k tokens, logits, and probabilities from the target model
- Methods for finding the position of a substring in the source and target prompts
- Properties for accessing the number of layers in the source and target models
- Methods for computing Precision@1 and Surprisal metrics
Subclasses of PatchscopeBase should implement the abstract methods to provide the specific functionality for their patchscope-based models.
Note: The code assumes the presence of a tokenizer object (self.tokenizer) and source and target model objects (self.source_model and self.target_model) with specific attributes.
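The subclassing contract described above can be sketched with a stand-in base class. This is only an illustration: SketchBase and ToyPatchscope are hypothetical names, not part of obvs, and the call order inside run() is an assumption rather than something this page specifies.

```python
from abc import ABC, abstractmethod


class SketchBase(ABC):
    """Hypothetical stand-in for PatchscopeBase; not the library class."""

    @abstractmethod
    def source_forward_pass(self) -> None: ...

    @abstractmethod
    def map(self) -> None: ...

    @abstractmethod
    def target_forward_pass(self) -> None: ...

    @abstractmethod
    def run(self) -> None: ...


class ToyPatchscope(SketchBase):
    """Hypothetical subclass that records the order in which steps are called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def source_forward_pass(self) -> None:
        self.calls.append("source_forward_pass")

    def map(self) -> None:
        self.calls.append("map")

    def target_forward_pass(self) -> None:
        self.calls.append("target_forward_pass")

    def run(self) -> None:
        # Assumed ordering: read from the source, map, then patch into the target.
        self.source_forward_pass()
        self.map()
        self.target_forward_pass()
```

Because every abstract method is overridden, ToyPatchscope can be instantiated, whereas instantiating SketchBase directly raises TypeError.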
Module Contents
Classes
PatchscopeBase | A base class with lots of helper functions
- class obvs.patchscope_base.PatchscopeBase
Bases: abc.ABC
A base class with lots of helper functions
- property _source_position: collections.abc.Sequence[int]
- property _target_position: collections.abc.Sequence[int]
- property source_token_ids: list[int]
Return the source tokens
- property target_token_ids: list[int]
Return the target tokens
- property source_tokens: list[str]
Return the input to the source model
- property target_tokens: list[str]
Return the input to the target model
- property n_layers: int
- property n_layers_source: int
- property n_layers_target: int
- get_model_specifics(model_name)
Get the model-specific attributes. This works for gpt2, llama2 and mistral models.
- abstract source_forward_pass() → None
- abstract map() → None
- abstract target_forward_pass() → None
- abstract run() → None
- top_k_tokens(k: int = 10) → list[str]
Return the top k tokens from the target model
- top_k_logits(k: int = 10) → list[int]
Return the top k logits from the target model
- top_k_probs(k: int = 10) → list[float]
Return the top k probabilities from the target model
- logits() → torch.Tensor
Return the logits from the target model (size [pos, d_vocab])
- probabilities() → torch.Tensor
Return the probabilities from the target model (size [pos, d_vocab])
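As a rough illustration of how logits, probabilities and top-k selection relate, here is a dependency-free sketch. It uses plain Python lists instead of torch tensors, and it assumes the top-k values are read from a single position's logits; the helper names, vocabulary and numbers are hypothetical and are not taken from obvs.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(scores: list[float], k: int = 10) -> list[int]:
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical four-token vocabulary and one position's logits.
vocab = ["the", "cat", "sat", "mat"]
position_logits = [1.0, 3.0, 0.5, 2.0]

probs = softmax(position_logits)
best = top_k_indices(position_logits, k=2)
top_tokens = [vocab[i] for i in best]   # tokens ranked by logit
top_probs = [probs[i] for i in best]    # their probabilities, same order
```

Since softmax preserves ordering, ranking by logits and ranking by probabilities select the same tokens.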
- output() → list[str]
Return the generated output from the target model
- _output_token_ids() → list[int]
- llama_output() → list[str]
For llama, the output tokens must be decoded all together; decoded one at a time, the spaces between them are not added.
- full_output_tokens() → list[str]
Return the generated output from the target model. This is a bit hacky and not well supported: all the inputs have to be concatenated and the input tokens added to them.
- full_output() → str
Return the generated output from the target model. This is a bit hacky and not well supported: all the inputs have to be concatenated and the input tokens added to them.
- find_in_source(substring: str) → int
Find the position of the substring tokens in the source prompt
Note: only works if substring’s tokenization happens to match that of the source prompt’s tokenization
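The caveat about matching tokenizations can be illustrated with a plain sublist search over token IDs. This is a sketch under assumptions: find_sublist is a hypothetical helper, the token IDs are made up, and raising ValueError on a miss is a choice made here, not documented library behaviour.

```python
def find_sublist(haystack: list[int], needle: list[int]) -> int:
    """Return the index where needle first occurs in haystack, or raise ValueError."""
    if not needle:
        raise ValueError("needle is empty")
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    raise ValueError("needle not found")


# Made-up token IDs for illustration; real IDs come from a tokenizer.
prompt_ids = [101, 7, 42, 13, 99]
match_ids = [42, 13]       # substring tokenized the same way as in the prompt
mismatch_ids = [42, 14]    # substring tokenized differently, so no match exists

position = find_sublist(prompt_ids, match_ids)
```

With mismatch_ids the search fails even if the underlying text appears in the prompt, which is the situation the note warns about.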
- source_position_tokens(substring: str) → tuple[int, list[int]]
Find the position of a substring in the source prompt, and return the substring tokenized
NB: The try/except block handles the difference between gpt2 and llama tokenization. This might be better handled by a separate tokenizer class that deals with the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.
- find_in_target(substring: str) → int
Find the position of the substring tokens in the target prompt
Note: only works if substring’s tokenization happens to match that of the target prompt’s tokenization
- target_position_tokens(substring) → tuple[int, list[int]]
Find the position of a substring in the target prompt, and return the substring tokenized
NB: The try/except block handles the difference between gpt2 and llama tokenization. This might be better handled by a separate tokenizer class that deals with the differences between the tokenizers. There are a few subtleties there, and tokenizing properly is important for getting the best out of your model.
- compute_precision_at_1(estimated_probs: torch.Tensor, true_token_index)
Compute the Precision@1 metric from the outputs of the target (patched) model (estimated_probs) against the output of the source model, aka the ‘true’ token.
Args:
- estimated_probs: The estimated probabilities for each token as a torch.Tensor.
- true_token_index: The index of the true token in the vocabulary.
Returns:
- precision_at_1: Precision@1 metric result.
This is the token identity evaluation method from the Patchscopes paper (https://arxiv.org/abs/2401.06102). It's used for running an evaluation over large datasets.
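A minimal sketch of Precision@1 under its usual definition: score 1 when the most probable token equals the true token and 0 otherwise, then average over a dataset. It uses plain lists rather than a torch.Tensor, and the function names and example numbers are hypothetical; the library's own implementation may differ in detail.

```python
def precision_at_1(estimated_probs: list[float], true_token_index: int) -> int:
    """Return 1 if the most probable token is the true token, else 0."""
    predicted = max(range(len(estimated_probs)), key=lambda i: estimated_probs[i])
    return int(predicted == true_token_index)


def mean_precision_at_1(examples: list[tuple[list[float], int]]) -> float:
    """Average Precision@1 over (probabilities, true index) pairs."""
    return sum(precision_at_1(p, t) for p, t in examples) / len(examples)


# Hypothetical dataset of two examples over a three-token vocabulary.
examples = [
    ([0.1, 0.7, 0.2], 1),  # argmax is 1, true is 1: hit
    ([0.6, 0.3, 0.1], 2),  # argmax is 0, true is 2: miss
]
score = mean_precision_at_1(examples)
```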
- compute_surprisal(estimated_probs: torch.Tensor, true_token_index)
Compute the Surprisal metric from the outputs of the target (patched) model (estimated_probs) against the output of the source model, aka the ‘true’ token.
Args:
- estimated_probs: The estimated probabilities for each token as a torch.Tensor.
- true_token_index: The index of the true token in the vocabulary.
Returns:
- surprisal: Surprisal metric result.
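Surprisal is conventionally the negative log probability assigned to the true token, so a confident correct prediction scores near zero. The sketch below assumes the natural logarithm and plain Python lists; the function name is hypothetical, and whether the library uses the same log base is not stated on this page.

```python
import math


def surprisal(estimated_probs: list[float], true_token_index: int) -> float:
    """Return -log(p) for the true token, using the natural log (an assumption)."""
    p = estimated_probs[true_token_index]
    if p == 0.0:
        # math.log(0) raises ValueError, so report infinite surprisal explicitly.
        return math.inf
    return -math.log(p)


# Hypothetical distribution over a three-token vocabulary.
probs = [0.25, 0.5, 0.25]
likely = surprisal(probs, 1)     # true token has probability 0.5
unlikely = surprisal(probs, 0)   # true token has probability 0.25
```

Lower values mean the patched model found the true token less surprising, so unlike Precision@1 this metric rewards partial probability mass on the right answer.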