Author: @Yikun
## 0. Prerequisites

Set up the PyTorch environment following https://github.com/cosdt/cosdt.github.io/issues/7, then verify that the NPU and torch_npu work:
```bash
(.llm-venv) # npu-smi info
(.llm-venv) # python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
Warning: Device do not support double dtype now, dtype cast repalce with float.
tensor([[ 1.2800,  1.3105,  0.4513, -1.1650],
        [ 3.5199, -0.2590,  2.6664, -1.9602],
        [ 2.3262, -2.4671,  2.3252, -2.1502]], device='npu:0')
```
## 1. Install dependencies

```bash
python3 -m pip install --upgrade pip
pip install transformers accelerate xformers
# "sentencepiece" and "protobuf==3.20.0" are needed when running convert_llama_weights_to_hf
pip install sentencepiece protobuf==3.20.0
```
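To confirm the installs, a quick import-and-version check can be run. This is a minimal sketch; the exact versions printed will differ per environment:

```python
# Minimal sanity check: each freshly installed package imports and reports
# a version. Exact versions will vary per environment.
import transformers
import accelerate
import sentencepiece
import google.protobuf

print("transformers :", transformers.__version__)
print("accelerate   :", accelerate.__version__)
print("sentencepiece:", sentencepiece.__version__)
print("protobuf     :", google.protobuf.__version__)  # should print 3.20.0
```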
## 2. Prepare the llama model

Prepare the model:
```bash
# tree llama/llama-2-7b/
llama/llama-2-7b/
├── checklist.chk
├── consolidated.00.pth
└── params.json

cd llama/llama-2-7b
mkdir 7B
mv *.* 7B
cp ../tokenizer.model .

# tree -h llama/llama-2-7b/
llama/llama-2-7b/
|-- [4.0K]  7B
|   |-- [ 100]  checklist.chk
|   |-- [ 13G]  consolidated.00.pth
|   `-- [ 102]  params.json
`-- [488K]  tokenizer.model
```
Convert the model. First locate the conversion script that ships with transformers:
```bash
# find / -name convert_llama_weights_to_hf.py
/root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py
```
Then run it (the script looks for the weights under `<input_dir>/<model_size>/` and for `tokenizer.model` directly under `<input_dir>`, which is why the files were rearranged above):

```bash
python /root/.llm-venv/lib/python3.8/site-packages/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir llama/llama-2-7b --model_size 7B --output_dir transformer/llama-2-7b
```
The generated model structure looks like this:
```bash
# tree -h transformer/llama-2-7b/
transformer/llama-2-7b/
|-- [ 578]  config.json
|-- [ 132]  generation_config.json
|-- [9.3G]  pytorch_model-00001-of-00002.bin
|-- [3.3G]  pytorch_model-00002-of-00002.bin
|-- [ 26K]  pytorch_model.bin.index.json
|-- [ 411]  special_tokens_map.json
|-- [1.8M]  tokenizer.json
|-- [488K]  tokenizer.model
`-- [ 745]  tokenizer_config.json
```
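Before moving on to the NPU, the converted checkpoint can be verified on the CPU side. A minimal sketch, assuming the relative `transformer/llama-2-7b` path from the `--output_dir` above:

```python
# Minimal sketch: verify the converted Hugging Face checkpoint is readable
# without touching the NPU. The path matches --output_dir from the step above.
from transformers import AutoConfig, AutoTokenizer

model_path = "transformer/llama-2-7b"
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

print(config.model_type, config.num_hidden_layers, config.hidden_size)
print(tokenizer("Deep learning is", return_tensors="pt").input_ids)
```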
## 3. Run the model

```python
from transformers import AutoTokenizer, LlamaForCausalLM
import torch
import torch_npu

# Avoid ReduceProd operator core dump, see more in: https://github.com/cosdt/llm/issues/4
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "ReduceProd"
torch.npu.set_option(option)

npu_id = 0
torch.npu.set_device(npu_id)
device = "npu:{}".format(npu_id)

model_path = "/opt/yikun/transformer/llama-2-7b"
model = LlamaForCausalLM.from_pretrained(model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "Deep learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

generate_ids = model.generate(inputs.input_ids, max_length=50)
tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
```

Output:

```
'Deep learning is a branch of machine learning that is based on artificial neural networks. Deep learning is a subset of machine learning that is based on artificial neural networks. Neural networks are a type of machine learning algorithm that is inspired by the structure and'
```
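The greedy decode above visibly repeats itself. As a hedged variation, standard Hugging Face `generate()` sampling arguments usually help; the parameter values here are illustrative, not from the original run:

```python
# Optional: sample instead of greedy decoding to reduce the verbatim
# repetition seen in the greedy output above. do_sample/temperature/top_p
# are standard generate() arguments; the values are illustrative.
generate_ids = model.generate(
    inputs.input_ids,
    max_length=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```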
Pitfalls encountered:
- `torch.npu.set_device`: once a wrong NPU ID has been set, the process keeps erroring, even after setting it back to a valid ID: https://github.com/cosdt/llm/issues/3
- torch `ReduceProd` operator problem: https://github.com/cosdt/llm/issues/4
- `import transformers` must come before `torch` and `torch_npu` (see the sketch below): https://github.com/cosdt/llm/issues/5
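Regarding the last pitfall, a minimal sketch of the import order that worked in the script above:

```python
# Import order that avoided https://github.com/cosdt/llm/issues/5:
# transformers first, then torch, then torch_npu.
from transformers import AutoTokenizer, LlamaForCausalLM  # 1. transformers
import torch                                              # 2. torch
import torch_npu                                          # 3. torch_npu
```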