Running llama2-7B on an Ascend NPU in a container

Author: @Yikun

0. Prerequisites

Set up the PyTorch environment following https://github.com/cosdt/cosdt.github.io/issues/7, then verify that the NPU is visible and that a basic tensor op runs on it:

(.llm-venv) # npu-smi info

(.llm-venv) # python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 1.2800,  1.3105,  0.4513, -1.1650],
        [ 3.5199, -0.2590,  2.6664, -1.9602],
        [ 2.3262, -2.4671,  2.3252, -2.1502]], device='npu:0')

1. Install Transformers

python3 -m pip install --upgrade pip
pip install transformers accelerate xformers
# "sentencepiece" and "protobuf==3.20.0" are needed by convert_llama_weights_to_hf.py
pip install sentencepiece protobuf==3.20.0
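
A quick way to confirm the install worked (prints the installed transformers version; any recent release that ships convert_llama_weights_to_hf.py should do):

python3 -c "import transformers; print(transformers.__version__)"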

2. Prepare the llama model

Prepare the model:

# tree llama/llama-2-7b/
llama/llama-2-7b/
├── checklist.chk
├── consolidated.00.pth
└── params.json

cd llama/llama-2-7b
mkdir 7B
mv *.* 7B
cp ../tokenizer.model .

# tree -h llama/llama-2-7b/
llama/llama-2-7b/
|-- [4.0K] 7B
| |-- [ 100] checklist.chk
| |-- [ 13G] consolidated.00.pth
| `-- [ 102] params.json
`-- [488K] tokenizer.model
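
Optionally, verify the download before converting. checklist.chk ships alongside the weights and, in Meta's release, contains standard MD5 sums, so a plain md5sum check should work (an assumption about the checklist format; skip this if your checklist differs):

cd 7B
md5sum -c checklist.chk   # should report OK for each listed file
cd ..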

Convert the model:

# find / -name convert_llama_weights_to_hf.py
/root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py

python /root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir llama/llama-2-7b --model_size 7B --output_dir transformer/llama-2-7b

The converted model structure looks like this:

# tree -h transformer/llama-2-7b/
transformer/llama-2-7b/
|-- [ 578] config.json
|-- [ 132] generation_config.json
|-- [9.3G] pytorch_model-00001-of-00002.bin
|-- [3.3G] pytorch_model-00002-of-00002.bin
|-- [ 26K] pytorch_model.bin.index.json
|-- [ 411] special_tokens_map.json
|-- [1.8M] tokenizer.json
|-- [488K] tokenizer.model
`-- [ 745] tokenizer_config.json
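
Before loading the full 13 GB of weights, a quick sanity check that the conversion produced a loadable config and tokenizer (a minimal sketch; adjust model_path to your output dir):

from transformers import AutoConfig, AutoTokenizer

model_path = "transformer/llama-2-7b"
config = AutoConfig.from_pretrained(model_path)        # reads config.json
tokenizer = AutoTokenizer.from_pretrained(model_path)  # reads tokenizer.* files
print(config.model_type, tokenizer.vocab_size)         # expected: llama 32000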

3. Run the model

# Note: transformers must be imported before torch and torch_npu,
# see https://github.com/cosdt/llm/issues/5
from transformers import AutoTokenizer, LlamaForCausalLM
import torch
import torch_npu

# Avoid ReduceProd operator core dump, see more in: https://github.com/cosdt/llm/issues/4
option={}
option["NPU_FUZZY_COMPILE_BLACKLIST"]="ReduceProd"
torch.npu.set_option(option)

# Be careful to set the right NPU ID: once a wrong ID is set, calls keep
# failing even after correcting it, see https://github.com/cosdt/llm/issues/3
npu_id = 0
torch.npu.set_device(npu_id)

device = "npu:{}".format(npu_id)
model_path = "/opt/yikun/transformer/llama-2-7b"
model = LlamaForCausalLM.from_pretrained(model_path).to(device)

tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
generate_ids = model.generate(inputs.input_ids, max_length=50)

tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
'Deep learning is a branch of machine learning that is based on artificial neural networks. Deep learning is a subset of machine learning that is based on artificial neural networks. Neural networks are a type of machine learning algorithm that is inspired by the structure and'
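
Note that max_length above counts prompt tokens plus generated tokens. To bound only the newly generated text, transformers' generate also accepts max_new_tokens (a small variant of the call above):

# max_new_tokens bounds only the generated continuation, not the prompt
generate_ids = model.generate(inputs.input_ids, max_new_tokens=50)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                             clean_up_tokenization_spaces=False)[0])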

Pitfalls encountered:

  1. torch.npu.set_device: once a wrong NPU ID is set, it keeps raising errors even after switching back to the correct ID: https://github.com/cosdt/llm/issues/3
  2. torch ReduceProd operator issue: https://github.com/cosdt/llm/issues/4
  3. transformers must be imported before torch and torch_npu (see the sketch after this list): https://github.com/cosdt/llm/issues/5
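
For pitfall 3, the safe import order is simply (a minimal sketch, matching the script above):

# transformers (or anything from it) first, then torch, then torch_npu;
# reversing the order triggers https://github.com/cosdt/llm/issues/5
import transformers
import torch
import torch_npu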
