NVIDIA Megatron-LM · Discussions · GitHub

Discussions

[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class?
Label: stale (No activity in 60 days on issue or PR)
starkhu asked Apr 9, 2024 in Q&A · Unanswered

[QUESTION] Where does the attention_mask come from when the gpt_model is not the first or last pipeline stage?

janelu9 asked Jun 8, 2024 in Q&A · Unanswered

Incorrect shuffling of documents across epochs in GPTDataset
Label: stale (No activity in 60 days on issue or PR)
argitrage asked Feb 20, 2024 in Q&A · Unanswered

[QUESTION] Why must f and g be conjugates of each other?
Label: stale (No activity in 60 days on issue or PR)
bescks asked Mar 9, 2024 in Q&A · Unanswered

[QUESTION] Why does it take so much time to sync barrier information between ranks?
Label: stale (No activity in 60 days on issue or PR)
yanminjia asked Mar 20, 2024 in Q&A · Unanswered

[QUESTION] In RotaryEmbedding, should the datatype of inv_freq and the corresponding sin/cos computations be kept as torch.float32?
Label: stale (No activity in 60 days on issue or PR)
rchardx asked Mar 21, 2024 in Q&A · Unanswered

[QUESTION] Why is the time of one iteration in nsys longer than in the output log?
Label: stale (No activity in 60 days on issue or PR)
hanwen-sun asked Mar 14, 2024 in Q&A · Unanswered

[QUESTION] What is the difference between using and not using the mcore model in pretrain_gpt.py?
Label: stale (No activity in 60 days on issue or PR)
TING2938 asked Feb 22, 2024 in Q&A · Unanswered

Does Megatron have plans to support Gemma?
Label: stale (No activity in 60 days on issue or PR)
anlongfei asked Feb 26, 2024 in Q&A · Unanswered

How to convert a Llama-2 Hugging Face checkpoint to the Megatron format
Label: stale (No activity in 60 days on issue or PR)
ken-arf asked Jan 10, 2024 in Q&A · Unanswered

[QUESTION] What are the retrieval datasets when evaluating downstream tasks?
Label: stale (No activity in 60 days on issue or PR)
ZihaoLin0123 asked Feb 27, 2024 in Q&A · Unanswered

[QUESTION] Megatron-LM installation with CUDA 11.6
Label: stale (No activity in 60 days on issue or PR)
ghtaro asked Feb 22, 2024 in Q&A · Unanswered

[ENHANCEMENT] Do you have plans to support Mixtral 8x7B?
Label: stale (No activity in 60 days on issue or PR)
matrixssy started Jan 4, 2024 in Ideas

[QUESTION] Why can't forward_backward_pipelining_without_interleaving enable config.overlap_p2p_comm?
Label: stale (No activity in 60 days on issue or PR)
zhouyiyuan-mt asked Feb 4, 2024 in Q&A · Unanswered

[QUESTION] How to release the model and optimizer memory manually?
Label: stale (No activity in 60 days on issue or PR)
robotsp asked Jan 15, 2024 in Q&A · Unanswered

[QUESTION] How to set --rotary-seq-len-interpolation-factor for rope scaling?
Label: stale (No activity in 60 days on issue or PR)
eagle705 asked Jan 26, 2024 in Q&A · Unanswered

[QUESTION] How to re-initialize the process group after destroy_process_group()?
Label: stale (No activity in 60 days on issue or PR)
robotsp asked Feb 2, 2024 in Q&A · Unanswered

How to split the dataset when running pretrain_bert.py
Label: stale (No activity in 60 days on issue or PR)
Druva24 asked Jan 23, 2024 in Q&A · Unanswered

[QUESTION] Why write a special LinearWithFrozenWeight?
Label: stale (No activity in 60 days on issue or PR)
wangbo233 asked Jan 22, 2024 in Q&A · Unanswered

Question about test_global_memory_buffer
Label: stale (No activity in 60 days on issue or PR)
wangxicoding asked Jan 5, 2024 in Q&A · Unanswered
