[Feat] return hidden states #3364

Jackmin801 · 2025-02-07T05:38:18Z

Motivation

This PR adds a return_hidden_states argument to ServerArgs. When it is enabled, each result contains the last-layer hidden states in output["meta_info"]["hidden_states"].
These hidden states are useful, for example, for verifying computations (e.g. https://arxiv.org/abs/2501.16007).
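A minimal client-side sketch of reading the new field. The helper name get_hidden_states is hypothetical and not part of sglang. It assumes only what is stated above: the data sits under output["meta_info"]["hidden_states"] and is present only when return_hidden_states is enabled.

```python
from typing import Any, List, Mapping


def get_hidden_states(output: Mapping[str, Any]) -> List[Any]:
    """Return the hidden states attached to one generate() result.

    Hypothetical helper: assumes the field lives at
    output["meta_info"]["hidden_states"] and is only present when the
    engine was started with return_hidden_states enabled.
    """
    meta_info = output.get("meta_info", {})
    if "hidden_states" not in meta_info:
        # Fail loudly instead of silently returning nothing.
        raise KeyError(
            "no hidden_states in meta_info; was the engine started "
            "with return_hidden_states=True?"
        )
    return list(meta_info["hidden_states"])
```

Raising instead of returning None makes a misconfigured engine show up immediately in a test, rather than as an empty comparison later.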

Modifications

Add return_hidden_states to ServerArgs
Change the logic that determines capture_hidden_mode to accommodate return_hidden_states
Modify the scheduler's process_batch_results to save the hidden states to the Req
Add return_hidden_states and hidden_states fields to the necessary dataclasses
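The capture-mode and scheduler changes listed above can be pictured with the following hypothetical sketch. CaptureHiddenMode, choose_capture_mode, Req and save_hidden_states are illustrative stand-ins, not sglang's actual classes or signatures. The idea is that return_hidden_states raises the capture requirement to every position, any other requirement is kept by taking the most demanding mode, and each forward pass appends one per-request chunk of hidden states to the request.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Sequence

Row = List[float]


class CaptureHiddenMode(IntEnum):
    # Hypothetical stand-in, ordered from least to most demanding.
    NULL = 0  # capture nothing
    LAST = 1  # capture only the last position of each sequence
    FULL = 2  # capture every position


def choose_capture_mode(
    return_hidden_states: bool, other: CaptureHiddenMode
) -> CaptureHiddenMode:
    """Combine return_hidden_states with any other capture requirement."""
    needed = CaptureHiddenMode.FULL if return_hidden_states else CaptureHiddenMode.NULL
    return max(needed, other)  # the most demanding requirement wins


@dataclass
class Req:
    # Hypothetical, trimmed-down request record.
    rid: str
    return_hidden_states: bool = False
    hidden_states: List[List[Row]] = field(default_factory=list)


def save_hidden_states(
    reqs: Sequence[Req], batch_rows: Sequence[Row], rows_per_req: Sequence[int]
) -> None:
    """Split one forward pass's hidden states by request and store each slice.

    rows_per_req is the number of positions each request contributed:
    its extend length during prefill, 1 during decode.
    """
    if sum(rows_per_req) != len(batch_rows):
        raise ValueError("rows_per_req does not add up to the batch size")
    offset = 0
    for req, n in zip(reqs, rows_per_req):
        chunk = list(batch_rows[offset:offset + n])
        offset += n
        if req.return_hidden_states:
            req.hidden_states.append(chunk)  # one entry per forward pass
```

Storing one entry per forward pass is one way to end up with a list whose entries have different shapes (a prefill chunk, then single-row decode chunks), which is consistent with the test script below printing [i.shape for i in output["meta_info"]["hidden_states"]].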

Script used to test changes

# Launch the offline engine and request hidden states alongside the generated tokens.
from transformers import AutoTokenizer
import sglang as sgl


def main():
    MODEL_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    llm = sgl.Engine(
        model_path=MODEL_NAME,
        skip_tokenizer_init=True,  # the engine takes and returns token ids, not text
        disable_cuda_graph=False,
        return_hidden_states=True,  # the ServerArgs field added by this PR
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 10}

    # With skip_tokenizer_init=True the prompts must be tokenized on the client side.
    input_ids = tokenizer(prompts).input_ids
    outputs = llm.generate(input_ids=input_ids, sampling_params=sampling_params)
    for input_id, output in zip(input_ids, outputs):
        print("===============================")
        print(input_id)
        print(output)
        print()
        if "token_ids" in output:
            print(input_id, output["token_ids"], len(input_id), len(output["token_ids"]))
        else:
            print(output["text"])
        if "hidden_states" in output["meta_info"]:
            # Shape of each returned entry, then the number of entries.
            print(
                [i.shape for i in output["meta_info"]["hidden_states"]],
                len(output["meta_info"]["hidden_states"]),
            )

    llm.shutdown()


if __name__ == "__main__":
    main()

Checklist

Format your code according to the Code Formatting with Pre-Commit guide.
Add unit tests as outlined in the Running Unit Tests guide.
Update documentation / docstrings / example tutorials as needed, according to the Writing Documentation guide.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to the Benchmark and Profiling and Accuracy Results guides.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

zhaochenyang20 · 2025-02-07T06:35:04Z

This is good to see. Could you update our docs to demonstrate the usage and add unit tests for your feature?

docs/backend/server_arguments.md

zhaochenyang20 · 2025-02-07T08:43:07Z

test/srt/test_srt_engine.py

@@ -184,6 +184,28 @@ def test_7_engine_offline_throughput(self):
        result = throughput_test(server_args=server_args, bench_args=bench_args)
        self.assertGreater(result["total_throughput"], 3000)

+    def test_8_return_hidden_states(self):


Quite a strange name.

Also, the test should assert that the hidden states are allclose to the Hugging Face ones.
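The assertion requested here can be made more informative than a bare assertTrue, which in the failing run below only reports "False is not true", by naming the first position that is out of tolerance. This is a hedged, dependency-free sketch. assert_hidden_states_close is a hypothetical name, the elementwise rule mirrors torch.allclose (|actual - expected| <= atol + rtol * |expected|), and the default tolerances are placeholders rather than values from the PR. In a real test, torch.testing.assert_close on the two tensors does the same job.

```python
from typing import Sequence


def assert_hidden_states_close(
    actual: Sequence[Sequence[float]],
    expected: Sequence[Sequence[float]],
    rtol: float = 1e-2,
    atol: float = 5e-2,
) -> None:
    """Raise AssertionError naming the first position whose row is out of tolerance."""
    if len(actual) != len(expected):
        raise AssertionError(f"row count differs: {len(actual)} vs {len(expected)}")
    for pos, (row_a, row_e) in enumerate(zip(actual, expected)):
        if len(row_a) != len(row_e):
            raise AssertionError(
                f"position {pos}: hidden size {len(row_a)} vs {len(row_e)}"
            )
        # Same elementwise rule as torch.allclose: |a - e| <= atol + rtol * |e|.
        excess = max(
            (abs(a - e) - (atol + rtol * abs(e)) for a, e in zip(row_a, row_e)),
            default=0.0,
        )
        if excess > 0:
            raise AssertionError(f"position {pos}: exceeds tolerance by {excess:.4f}")
```

With the values visible in the log below, the first three rows pass at these tolerances and the check stops at position 3.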

Yeah, all the names in this file are strange. Maybe we should remove the numbers?

I can't quite get the Hugging Face tensor to match. It only works with at most 1 decode step; with 2 decode steps, the test fails. The values are wildly different, so it may not be a numerics issue. Will debug further.

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  2.25it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  2.25it/s]
100%|██████████| 23/23 [00:06<00:00,  3.36it/s]
[128000, 15724, 374] [264, 1938, 315]
tensor([[[ 0.9727,  0.1924,  0.6133,  ..., -0.8438, -0.2852,  0.0645],
         [ 0.6797,  3.2656,  1.9453,  ..., -5.3750, -4.0625,  1.9922],
         [ 0.7812,  1.3750,  1.3047,  ..., -5.0625, -3.6094, -0.2715],
         [ 2.8906,  1.3516,  1.9609,  ..., -4.0312, -3.0312, -0.3809],
         [-0.2949,  4.1250,  0.2578,  ..., -3.0469, -5.5938,  0.9219]]],
       device='cuda:0', dtype=torch.bfloat16)
===
tensor([[ 1.0078,  0.1855,  0.6328,  ..., -0.8750, -0.2812,  0.0601],
        [ 0.6953,  3.2500,  1.9375,  ..., -5.3750, -4.0312,  1.9453],
        [ 0.7656,  1.4062,  1.3203,  ..., -5.0938, -3.6250, -0.2715],
        [ 0.1807, -0.6289,  0.6953,  ...,  0.5977, -0.5742, -0.0889],
        [-0.1406, -0.0605, -0.3848,  ...,  0.4863, -0.4062,  0.0684]],
       dtype=torch.bfloat16)
F
======================================================================
FAIL: test_8_engine_return_hidden_states (test_srt_engine.TestSRTEngine.test_8_engine_return_hidden_states)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/sglang/test/srt/test_srt_engine.py", line 238, in test_8_engine_return_hidden_states
    self.assertTrue(
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 18.811s
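To line up the two tensors position by position, the Hugging Face forward pass needs the prompt followed by every generated token except the last one, because the last sampled token is never fed back into the model. The sketch below assumes the engine returns one hidden state per prompt token and one per decode step; reference_input_ids and label_positions are hypothetical helpers. Applied to the log above (prompt [128000, 15724, 374], output [264, 1938, 315]), it gives 5 reference positions, matching the 5 rows in each printed tensor, and it labels rows 3 and 4, the two rows that disagree, as decode steps 1 and 2.

```python
from typing import List, Sequence, Tuple


def reference_input_ids(prompt_ids: Sequence[int], output_ids: Sequence[int]) -> List[int]:
    """Token ids for a Hugging Face forward pass that yields one hidden state
    per position the engine computed.

    Assumption: the final sampled token is never fed back into the model,
    so it has no hidden state of its own.
    """
    return list(prompt_ids) + list(output_ids[:-1])


def label_positions(prompt_len: int, num_rows: int) -> List[Tuple[int, str]]:
    """Tag each returned row as coming from the prefill or from decode step k."""
    labels = []
    for pos in range(num_rows):
        if pos < prompt_len:
            labels.append((pos, "prefill"))
        else:
            labels.append((pos, f"decode step {pos - prompt_len + 1}"))
    return labels


prompt, generated = [128000, 15724, 374], [264, 1938, 315]
ids = reference_input_ids(prompt, generated)
for pos, source in label_positions(len(prompt), len(ids)):
    print(pos, ids[pos], source)
```

Labelling rows this way separates prefill mismatches from decode mismatches, which is where the failure described above appears.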


zhaochenyang20 · 2025-02-07T18:11:16Z

Thanks. I will try to get someone familiar with hidden states to help.

zhaochenyang20 · 2025-02-07T18:12:45Z

https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_generation_models.py

You can check this; it may help.

Jackmin801 added 8 commits February 6, 2025 08:14

extract hidden states

54b524d

include meow.py

f5a8a4d

allow cuda graph runner

e6414cc

add return hidden states as engine arg

73e5305

change meow script

eb4f93a

lint

5e9ce35

add cli arg

52dc2cb

forward from detokenizer

371fe0e

Jackmin801 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners February 7, 2025 05:38

Jackmin801 added 3 commits February 7, 2025 05:41

fix: dont error on embedding model

9cb1111

remove testing script

64232cb

style

7c73a30

Jackmin801 force-pushed the feat-hidden_states branch from fdbd188 to 7c73a30 Compare February 7, 2025 05:41

Merge branch 'main' into feat-hidden_states

b0e4765

Jackmin801 added 3 commits February 7, 2025 08:26

add docs in server args

ef315c2

add example

fc64fdc

test: add test

5ff6edc

zhaochenyang20 requested changes Feb 7, 2025


Jackmin801 added 4 commits February 7, 2025 10:02

add example to offline engine api

9dfdbff

Revert "add example"

09be3af

This reverts commit fc64fdc.
Revert "add docs in server args"
This reverts commit ef315c2.

add comparison to hf [skip ci]

496b572

add 1 decode to test [skip ci]

a061616
