
Aciddelgado/continuous #867

Draft

wants to merge 63 commits into main
Conversation

aciddelgado
Contributor

No description provided.

BowenBao and others added 21 commits August 2, 2024 16:53
Results are validated with model-generate.py by using an int4 quantized
model as the original model's assistant. The output sequence is identical
and increased tokens per second (tps) is observed.

NOTE: Only MHA decoder-only models, batch size 1, CPU, and greedy (select top)
sampling are supported in this initial version. GQA needs microsoft/onnxruntime#21523
to support seqlen > 1 in the token phase.

* Updated builder.py to produce an MHA graph that supports seqlen > 1
  in the token phase.
* Introduced speculative decoding, currently through a separate Generator
  class (see the sketch after this list). This could potentially be merged
  with the existing Generator at either the API level or the implementation
  level.
* Extended various components to support speculative search. Previously,
  most methods were hardcoded to assume seqlen == 1 in the token phase.
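The changes above implement the standard greedy draft-and-verify loop for speculative decoding. The sketch below only illustrates that loop under this PR's stated constraints (batch size 1, greedy select top); the `speculative_generate` function, the `draft_next`/`target_next` callables, and the `num_draft_tokens` parameter are assumptions made for the example and are not the API introduced by this PR.

```python
# Illustrative sketch of greedy speculative decoding (batch size 1, greedy select top).
# The callables below are hypothetical stand-ins for the assistant and original models.
from typing import Callable, List

def speculative_generate(
    draft_next: Callable[[List[int]], int],   # assistant (draft) model: greedy next token
    target_next: Callable[[List[int]], int],  # original (target) model: greedy next token
    prompt: List[int],
    max_new_tokens: int = 32,
    num_draft_tokens: int = 4,
    eos_id: int = 0,
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens and tokens[-1] != eos_id:
        # 1) Draft phase: the assistant proposes a short continuation.
        draft = []
        ctx = list(tokens)
        for _ in range(num_draft_tokens):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify phase: the target checks the drafted positions and accepts the
        #    longest prefix whose greedy choices agree with the draft; its own token
        #    is emitted at the first disagreement.
        accepted = list(tokens)
        for t in draft:
            target_t = target_next(accepted)
            accepted.append(target_t)
            generated += 1
            if target_t != t or target_t == eos_id or generated >= max_new_tokens:
                break
        else:
            # All draft tokens accepted: the target contributes one bonus token.
            if generated < max_new_tokens:
                accepted.append(target_next(accepted))
                generated += 1
        tokens = accepted
    return tokens

if __name__ == "__main__":
    # Toy models: both emit an arithmetic sequence, so every draft token is accepted.
    toy = lambda ids: (ids[-1] + 1) % 100
    print(speculative_generate(toy, toy, prompt=[1], max_new_tokens=10))
```

Because the verify phase feeds several drafted tokens to the original model in one step, the token phase must accept seqlen > 1, which is why builder.py is updated and why GQA needs microsoft/onnxruntime#21523.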