
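Introduction to ChatGLM3-6B

ChatGLM3 is a new generation of pre-trained dialogue models jointly released by Zhipu AI and the KEG Lab of Tsinghua University. ChatGLM3-6B is the open-source model in the ChatGLM3 series; after registering by filling in a questionnaire, it may also be used commercially free of charge.

Questionnaire:

Quoted from: https://github.com/THUDM/ChatGLM3

Use the following command to download the ChatGLM3-6B model to your local machine (for example, to drive D):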

git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
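After cloning, it can be worth confirming that the large weight files were actually downloaded. The sketch below assumes the repository stores its weights through Git LFS (common for model repositories, but not stated in this article); in that case a clone made without Git LFS leaves small text "pointer" files in place of the real weights. The function name `find_lfs_pointers` and the 1 KiB size cutoff are illustrative choices, not part of any tool used here.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that look like Git LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(model_dir).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue
        # Pointer stubs are tiny text files; real weight shards are far larger.
        if path.stat().st_size > 1024:
            continue
        with path.open("rb") as handle:
            if handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(path)
    return stubs
```

Calling `find_lfs_pointers("d:/chatglm3-6b")` (the example path used later in this article) and getting an empty list suggests the weights are real files; any path it returns would need to be fetched again.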


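Introduction to BigDL-LLM

BigDL-LLM is an open-source software toolkit, released under the Apache 2.0 license, for accelerating large language model (LLM) inference on Intel hardware platforms. It builds on the original BigDL framework and was designed to meet the high performance and resource demands of LLM inference. Through optimization and hardware acceleration, BigDL-LLM aims to make LLMs run more efficiently, with lower inference latency and lower resource consumption.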
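To make "lower resource consumption" concrete, here is a back-of-the-envelope estimate of how much memory the model weights alone occupy at different precisions, including the INT4 format used later in this article. It is a rough sketch: the parameter count takes "6B" literally as 6e9 (an assumption), and it ignores activations, the KV cache, and the per-block scale factors that real INT4 formats store alongside the weights.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: "6B" is read literally as 6e9 parameters.
NOMINAL_PARAMS = 6e9

for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"{label}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights shrink from roughly 11 GiB at FP16 to under 3 GiB at INT4, which is the kind of reduction that lets a 6B model fit on an entry-level discrete GPU.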

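This article describes in detail how to quantize and deploy the ChatGLM3-6B model on an Intel discrete GPU with BigDL-LLM.

Deployment platform:

The 算力魔方 is a DIY-friendly mini PC with a drawer-style design: later assembly, upgrades, and maintenance only require plugging and unplugging modules.

Choosing a compute module version and pairing it with different I/O modules yields a wide range of configurations for different scenarios. When performance falls short, the compute module can be upgraded for more compute; when the I/O ports do not match, the I/O module can be swapped to change functionality, all without rebuilding the whole system. The steps in this article were verified on a 算力魔方 equipped with an A380 discrete GPU.

Deploying ChatGLM3-6B on an Intel discrete GPU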

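4.1 Setting up the development environment

Step 1: Download and install Visual Studio 2022 Community Edition. During installation, be sure to select "Desktop development with C++". Note: do not change the default installation path!

Download link:

Step 2: Download and install the Intel discrete GPU driver.

Download link: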

https://www.intel.cn/content/www/cn/zh/download/785597/intel-arc-iris-xe-graphics-windows.html

Step 3: Download and install the Intel oneAPI Base Toolkit.

Download link:

https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html

Step 4: Download and install Anaconda, then create a virtual environment named "bigdl" with the commands below.

conda create -n bigdl python=3.9 libuv
conda activate bigdl

4.2 Installing BigDL-LLM[xpu]

Step 1: Use a download manager (for example, Xunlei) to download the *.whl packages to your local machine.

Download links:

https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torch-2.1.0a0%2Bcxx11.abi-cp39-cp39-win_amd64.whl

https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/torchvision-0.16.0a0%2Bcxx11.abi-cp39-cp39-win_amd64.whl

https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/intel_extension_for_pytorch-2.1.10%2Bxpu-cp39-cp39-win_amd64.whl
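The URLs above encode the `+` in each version string as `%2B`, and the `cp39` tag in each filename ties the wheel to Python 3.9, which is why the conda environment was created with `python=3.9`. The following sketch shows how the local filename and its compatibility tags can be derived from a URL. The helper names are illustrative, and the tag parsing follows the standard wheel filename layout (distribution, version, optional build tag, then Python, ABI, and platform tags).

```python
import posixpath
import sys
from urllib.parse import unquote, urlparse


def wheel_filename_from_url(url):
    """Decode the last path segment of a URL into a wheel filename."""
    return unquote(posixpath.basename(urlparse(url).path))


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are always the compatibility tags.
    _, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    return python_tag, abi_tag, platform_tag


url = ("https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/xpu/"
       "torch-2.1.0a0%2Bcxx11.abi-cp39-cp39-win_amd64.whl")
name = wheel_filename_from_url(url)
python_tag, abi_tag, platform_tag = wheel_tags(name)
running = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(name)
print(f"wheel targets {python_tag} on {platform_tag}; this interpreter is {running}")
```

If the interpreter tag printed on the last line is not `cp39`, pip would be expected to reject these wheels as unsupported on that interpreter.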

Step 2: Run the following commands:

# Install the downloaded .whl packages from the local directory
pip install torch-2.1.0a0+cxx11.abi-cp39-cp39-win_amd64.whl
pip install torchvision-0.16.0a0+cxx11.abi-cp39-cp39-win_amd64.whl
pip install intel_extension_for_pytorch-2.1.10+xpu-cp39-cp39-win_amd64.whl

# Install bigdl-llm with Intel GPU support
pip install --pre --upgrade bigdl-llm[xpu] -i https://mirrors.aliyun.com/pypi/simple/
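Before moving on, a quick sanity check of the installed stack can save time. The sketch below is not from the original article; it assumes that importing `intel_extension_for_pytorch` makes the `torch.xpu` device API available, as it does for the IPEX XPU builds installed above, and it reports an error instead of raising if either package is missing. Depending on the setup, the oneAPI environment script described in the next section may need to be run first so the required runtime libraries can be found.

```python
def check_xpu_stack():
    """Report installed versions and visible XPU devices without raising."""
    report = {"torch": None, "ipex": None, "xpu_available": False, "devices": []}
    try:
        import torch
        import intel_extension_for_pytorch as ipex  # registers torch.xpu
    except (ImportError, OSError) as exc:
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report
    report["torch"] = torch.__version__
    report["ipex"] = ipex.__version__
    report["xpu_available"] = bool(torch.xpu.is_available())
    if report["xpu_available"]:
        report["devices"] = [
            torch.xpu.get_device_name(i) for i in range(torch.xpu.device_count())
        ]
    return report


print(check_xpu_stack())
```

An empty `devices` list or an `error` entry usually points back to the driver, oneAPI, or wheel installation steps above.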


For details, see:

https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html

4.3 Running the sample program

First, run the following commands to configure the environment variables:

conda activate bigdl
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1


If the system also has an integrated GPU, run the command below so that the Intel discrete GPU is the compute device that "xpu" refers to.

For details, see:

https://github.com/intel-analytics/BigDL/issues/9768

set ONEAPI_DEVICE_SELECTOR=level_zero:1
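Because these variables are set in the command prompt session, they only take effect for programs launched from that same session. As a small illustrative sketch (the function name and messages are made up for this example), a script could verify them at startup. It only understands the simple `backend:devices` selector form used above, not the full selector syntax.

```python
import os

# Values taken from the commands shown in this section.
EXPECTED = {"SYCL_CACHE_PERSISTENT": "1", "BIGDL_LLM_XMX_DISABLED": "1"}


def check_environment(environ=os.environ):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, wanted in EXPECTED.items():
        actual = environ.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    selector = environ.get("ONEAPI_DEVICE_SELECTOR")
    if selector is not None:
        # Only the simple "backend:devices" form is checked here.
        backend, _, devices = selector.partition(":")
        if not backend or not devices:
            problems.append(f"ONEAPI_DEVICE_SELECTOR={selector!r} is malformed")
    return problems


for problem in check_environment():
    print("warning:", problem)
```

`ONEAPI_DEVICE_SELECTOR` is treated as optional here because the article only requires it on systems that also have an integrated GPU.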


Then download the sample program and run it:

https://gitee.com/Pauntech/chat-glm3/blob/master/chatglm3_infer_gpu.py

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
# Importing intel_extension_for_pytorch makes the 'xpu' device available in PyTorch
import intel_extension_for_pytorch as ipex
import torch

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"
# Set this to the local path of chatglm3-6b
model_path = "d:/chatglm3-6b"
# Load the ChatGLM3-6B model and quantize it to INT4
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# Run the optimized model on the Intel GPU
model = model.to('xpu')
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
# Build a prompt in the ChatGLM3 format
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")
# Encode the prompt and move it to the GPU
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')
st = time.time()
# Run inference and generate tokens
output = model.generate(input_ids, max_new_tokens=32)
# Wait for queued GPU work to finish so the measured time is accurate
torch.xpu.synchronize()
end = time.time()
# Decode the generated tokens and print the result
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)
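The script above times a single `generate` call. On a GPU, the first call typically also pays one-time costs such as kernel compilation, so a single measurement can overstate steady-state latency. The helper below is a minimal sketch of timing with a warm-up run. `time_generation` and its parameters are hypothetical names introduced for illustration, and the function is deliberately independent of any model so it can be tried without a GPU.

```python
import time


def time_generation(generate, warmup_runs=1, synchronize=None):
    """Time one call to `generate` after optional warm-up calls.

    generate:    zero-argument callable returning the number of new tokens.
    synchronize: optional zero-argument callable that blocks until the
                 device has finished its queued work.
    Returns (new_tokens, seconds, tokens_per_second).
    """
    for _ in range(warmup_runs):
        generate()
        if synchronize is not None:
            synchronize()
    start = time.perf_counter()
    new_tokens = generate()
    if synchronize is not None:
        synchronize()
    elapsed = time.perf_counter() - start
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, elapsed, rate


# With the sample script, the callable could be (assumption, not run here):
#   lambda: model.generate(input_ids, max_new_tokens=32).shape[1] - input_ids.shape[1]
# with synchronize=torch.xpu.synchronize.
```

Reporting tokens per second rather than total seconds makes runs with different `max_new_tokens` values easier to compare.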

The output looks like this:

4.4 Running the ChatGLM3-6B WebUI demo

First, install the required packages:

pip install gradio mdtex2html streamlit -i https://mirrors.aliyun.com/pypi/simple/

Next, run the following commands to configure the environment variables:

conda activate bigdl
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1

If the system also has an integrated GPU, run the command below so that the Intel discrete GPU is the compute device that "xpu" refers to.

For details, see:

https://github.com/intel-analytics/BigDL/issues/9768

set ONEAPI_DEVICE_SELECTOR=level_zero:1

Finally, download the sample program:

https://gitee.com/Pauntech/chat-glm3/blob/master/chatglm3_web_demo_gpu.py

and run it:

streamlit run chatglm3_web_demo_gpu.py


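The result looks like this:

Summary

The BigDL-LLM toolkit is simple to use: in just three steps you can set up the development environment, install bigdl-llm[xpu], and quantize the ChatGLM3-6B model to INT4 and deploy it on an Intel discrete GPU.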

Reviewed and edited by: 劉清
