dataclass
 Source code in vllm/v1/worker/utils.py
  
 __init__(
    backend: type[AttentionBackend],
    metadata_builders: list[AttentionMetadataBuilder],
    layer_names: list[str],
    kv_cache_spec: KVCacheSpec,
) -> None
 staticmethod
 create_with_metadata_builders(
    backend: type[AttentionBackend],
    layer_names: list[str],
    kv_cache_spec: KVCacheSpec,
    vllm_config: VllmConfig,
    device: device,
    num_metadata_builders: int = 1,
) -> AttentionGroup
  
 get_metadata_builder(
    ubatch_id: int = 0,
) -> AttentionMetadataBuilder
 
 Helper class to calculate budget information for multi-modal models.
 instance-attribute  ¶
   instance-attribute  ¶
   
 __init__(
    model_config: ModelConfig,
    scheduler_config: SchedulerConfig,
    mm_registry: MultiModalRegistry,
) -> None
  
 add_kv_sharing_layers_to_kv_cache_groups(
    shared_kv_cache_layers: dict[str, str],
    kv_cache_groups: list[KVCacheGroupSpec],
    runner_only_attn_layers: set[str] | None = None,
) -> None
Sets up KV cache sharing by reusing the allocated KV caches in kv_caches for layers that do not allocate their own KV cache, based on the mapping in shared_kv_cache_layers. It adds these layers to the corresponding KV cache group, which is needed so that attention metadata is assigned to them later.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| shared_kv_cache_layers | dict[str, str] | Layer pairings for cross-layer KV sharing. If an Attention layer  | required | 
| kv_cache_groups | list[KVCacheGroupSpec] | The KV cache groups of the model. | required | 
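
The grouping step described above can be sketched in a few lines. This is a simplified illustration, not the code in vllm/v1/worker/utils.py: `GroupSketch` is a hypothetical stand-in for KVCacheGroupSpec that models only a `layer_names` list, the layer names are made up, and the sketch assumes each key of shared_kv_cache_layers is a layer without its own cache while the value is the layer whose cache it reuses. The runner_only_attn_layers argument is left out.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSketch:
    """Hypothetical stand-in for KVCacheGroupSpec; only layer_names is modeled."""

    layer_names: list[str] = field(default_factory=list)


def add_sharing_layers_sketch(
    shared_kv_cache_layers: dict[str, str],
    kv_cache_groups: list[GroupSketch],
) -> None:
    # Index every layer that owns a KV cache by the group it belongs to.
    layer_to_group: dict[str, GroupSketch] = {}
    for group in kv_cache_groups:
        for name in group.layer_names:
            layer_to_group[name] = group

    # Assumption: key = layer without its own cache, value = layer it reuses.
    for layer_name, target_name in shared_kv_cache_layers.items():
        layer_to_group[target_name].layer_names.append(layer_name)


groups = [GroupSketch(["layers.0.attn"]), GroupSketch(["layers.1.attn"])]
add_sharing_layers_sketch({"layers.2.attn": "layers.0.attn"}, groups)
print(groups[0].layer_names)
```

The function mutates the groups in place and returns nothing, matching the `-> None` signature documented above.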
  
 bind_kv_cache(
    kv_caches: dict[str, Tensor],
    forward_context: dict[str, Attention],
    runner_kv_caches: list[Tensor],
    num_attn_module: int | None = 1,
) -> None
Bind the allocated KV cache to both ModelRunner and forward context so that the KV cache can be used in the forward pass.
This function:
1) Fills the ModelRunner's KV cache list (runner_kv_caches) with kv_caches.
2) Associates each attention layer in the forward_context with its corresponding KV cache in kv_caches.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| kv_caches | dict[str, Tensor] | The allocated kv_caches with layer names as keys. | required | 
| forward_context | dict[str, Attention] | The global forward context containing all Attention layers with layer names as keys. | required | 
| runner_kv_caches | list[Tensor] | The kv_cache declared by ModelRunner. | required | 
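
The two binding steps can be illustrated with plain Python objects. This is a minimal sketch under stated assumptions, not the actual implementation: `LayerSketch` is a hypothetical stand-in for an Attention layer, plain lists stand in for tensors, the runner list is filled in dict insertion order (the real function may order layers differently), and num_attn_module is ignored.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LayerSketch:
    """Hypothetical stand-in for an Attention layer."""

    kv_cache: Any = None


def bind_kv_cache_sketch(
    kv_caches: dict[str, Any],
    forward_context: dict[str, LayerSketch],
    runner_kv_caches: list[Any],
) -> None:
    # Step 1: fill the runner's (initially empty) list in place.
    # Simplification: dict insertion order decides the list order here.
    assert not runner_kv_caches
    runner_kv_caches.extend(kv_caches.values())

    # Step 2: attach each cache to the layer registered under the same name.
    for name, cache in kv_caches.items():
        forward_context[name].kv_cache = cache


caches = {"layers.0.attn": [0.0] * 4, "layers.1.attn": [0.0] * 4}
context = {name: LayerSketch() for name in caches}
runner_list: list[Any] = []
bind_kv_cache_sketch(caches, context, runner_list)
```

After the call, the runner list and the forward context refer to the same cache objects, so a write through one is visible through the other.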
  
  Reconstructs the embeddings from the placeholder tokens.
This is the inverse operation of scatter_mm_placeholders.
  
 is_residual_scattered_for_sp(
    vllm_config: VllmConfig, num_input_tokens: int
) -> bool
Check if the residual tensor is scattered for sequence parallelism.
The residual tensor is scattered across tensor parallel ranks when both sequence parallelism and tensor parallelism are enabled.
This follows the same logic as SequenceParallelismPass.is_applicable():
- In full-graph compilation mode (no splitting ops or using inductor graph partition), SP is always applied.
- Otherwise, SP is only applied for specific shapes in compile_sizes.
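
The decision rule above can be written out as a small predicate. This is a sketch under stated assumptions: `SPConfigSketch` is a hypothetical flat config whose field names are chosen for illustration (the real VllmConfig is structured differently), and "tensor parallelism is enabled" is read here as a tensor parallel size greater than 1.

```python
from dataclasses import dataclass, field


@dataclass
class SPConfigSketch:
    """Hypothetical flat config; field names are illustrative only."""

    enable_sequence_parallelism: bool = False
    tensor_parallel_size: int = 1
    splitting_ops: list[str] = field(default_factory=list)
    use_inductor_graph_partition: bool = False
    compile_sizes: list[int] = field(default_factory=list)


def is_residual_scattered_sketch(cfg: SPConfigSketch, num_input_tokens: int) -> bool:
    # SP only matters when it is switched on and there is more than one TP rank
    # (assumption: "tensor parallelism enabled" means tensor_parallel_size > 1).
    if not cfg.enable_sequence_parallelism or cfg.tensor_parallel_size <= 1:
        return False

    # Full-graph mode: no splitting ops, or inductor graph partition in use.
    if not cfg.splitting_ops or cfg.use_inductor_graph_partition:
        return True

    # Otherwise SP applies only to the explicitly compiled shapes.
    return num_input_tokens in cfg.compile_sizes


piecewise = SPConfigSketch(
    enable_sequence_parallelism=True,
    tensor_parallel_size=2,
    splitting_ops=["attention"],
    compile_sizes=[8],
)
print(is_residual_scattered_sketch(piecewise, 8))
print(is_residual_scattered_sketch(piecewise, 16))
```

With splitting ops present, only the token counts listed in compile_sizes report a scattered residual; clearing splitting_ops or enabling graph partition makes the answer independent of the token count.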
  
 sanity_check_mm_encoder_outputs(
    mm_embeddings: MultiModalEmbeddings,
    expected_num_items: int,
) -> None
Perform sanity checks for the result of vllm.model_executor.models.SupportsMultiModal.get_multimodal_embeddings.
  
  Scatter the multimodal embeddings into a contiguous tensor that represents the placeholder tokens.
See also vllm.multimodal.processing.PromptUpdateDetails.is_embed.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| embeds | Tensor | The multimodal embeddings. Shape:  | required | 
| is_embed | Tensor \| None | A boolean mask indicating which positions in the placeholder tokens need to be filled with multimodal embeddings. Shape:  | required |
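
To make the scatter step and the reverse "reconstruct the embeddings" step concrete, here is a minimal sketch that uses plain Python lists in place of tensors. The names `scatter_sketch` and `gather_sketch` are hypothetical, and filling unmasked positions with `None` is a choice of this sketch rather than a statement about what the library writes there. It also assumes that a mask of `None` means every position holds an embedding.

```python
from typing import Optional

Row = list[float]


def scatter_sketch(
    embeds: list[Row], is_embed: Optional[list[bool]]
) -> list[Optional[Row]]:
    # Assumption: with no mask, every placeholder position is an embedding.
    if is_embed is None:
        return list(embeds)

    # The mask must select exactly as many positions as there are embeddings.
    assert sum(is_embed) == len(embeds)

    rows = iter(embeds)
    # Masked positions take the next embedding row; the rest get a filler.
    return [next(rows) if flag else None for flag in is_embed]


def gather_sketch(
    placeholders: list[Optional[Row]], is_embed: Optional[list[bool]]
) -> list[Optional[Row]]:
    if is_embed is None:
        return list(placeholders)
    # Keep only the rows at masked positions, in order.
    return [row for row, flag in zip(placeholders, is_embed) if flag]


embeds = [[1.0, 2.0], [3.0, 4.0]]
mask = [True, False, True]
placeholders = scatter_sketch(embeds, mask)
recovered = gather_sketch(placeholders, mask)
```

Scattering produces one row per placeholder position, and gathering with the same mask returns the original embedding rows.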