BladeDISC
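BladeDISC is an end-to-end machine learning compiler open-sourced by Alibaba. This document describes how to use BladeDISC with DeepRec. The BladeDISC project is hosted at https://github.com/alibaba/BladeDISC.
DeepRec and BladeDISC cannot yet be built together directly from source; we plan to refactor the build to support this later. For now, build a BladeDISC whl package and import blade_disc in user code to use it. For C++ serving, the serving framework must link against the BladeDISC .so produced by the build. The steps are as follows.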
Building DeepRec
sudo nvidia-docker run -it --name=deeprec --net=host --gpus all -v /home/workspace:/home/workspace registry.cn-shanghai.aliyuncs.com/pai-dlc-share/deeprec-developer:deeprec-dev-gpu-py36-cu110-ubuntu18.04 bash
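For detailed build steps, see DeepRec-Compile-And-Install. The build produces a whl package. Install the DeepRec whl inside the docker container, because building BladeDISC depends on an installed DeepRec.
Note: DeepRec and BladeDISC currently require different Bazel versions. This is one reason they cannot yet be built together from source, and we plan to move both to the same version later. For this reason, BladeDISC is built below inside a virtualenv environment.
Building BladeDISC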
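The build steps are as follows:
Install the generated DeepRec whl package (this is done inside the virtualenv in the build-environment step below)
Clone the BladeDISC code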
git clone https://github.com/alibaba/BladeDISC.git
git -C BladeDISC checkout features/deeprec2208-cu114
git -C BladeDISC submodule update --init --recursive
Set up the build environment
# prepare venv
pip3 install virtualenv
python3 -m virtualenv /opt/venv_disc/
source /opt/venv_disc/bin/activate
# Install the DeepRec whl package built above
pip3 install tensorflow-1.15.5+deeprec2208-cp36-cp36m-linux_x86_64.whl
# Install bazel
cd BladeDISC
apt-get update
bash ./docker/scripts/install-bazel.sh
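The remaining build steps assume the virtualenv created above is still active. A small guard can catch a shell where it is not. The following is a minimal sketch: `check_disc_venv` is a hypothetical helper name, and the only thing it relies on is the `VIRTUAL_ENV` variable that virtualenv's `activate` script exports.

```shell
# Hypothetical helper: succeed only if the given virtualenv is the active one.
# virtualenv's bin/activate exports VIRTUAL_ENV with the environment's path.
check_disc_venv() {
  local expected=${1%/}            # drop a trailing slash, e.g. /opt/venv_disc/
  local actual=${VIRTUAL_ENV:-}
  actual=${actual%/}
  if [ "$actual" != "$expected" ]; then
    echo "expected virtualenv $expected, got '${actual:-none}'" >&2
    return 1
  fi
}

# Usage, after `source /opt/venv_disc/bin/activate`:
#   check_disc_venv /opt/venv_disc/ && python3 -c "import tensorflow as tf; print(tf.__version__)"
```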
Build BladeDISC
# configure
./scripts/python/tao_build.py /opt/venv_disc/ --compiler-gcc default --bridge-gcc default -s configure
# Build libtao_ops.so; output path: tao/bazel-bin/libtao_ops.so
./scripts/python/tao_build.py /opt/venv_disc/ -s build_tao_bridge
# Build tao_compiler_main
# Output path: tf_community/bazel-bin/tensorflow/compiler/decoupling/tao_compiler_main
./scripts/python/tao_build.py /opt/venv_disc/ -s build_tao_compiler
# Build the disc whl package
cp tf_community/bazel-bin/tensorflow/compiler/decoupling/tao_compiler_main tao/python/blade_disc_tf
cp tao/bazel-bin/libtao_ops.so tao/python/blade_disc_tf
cd tao
python3 setup.py bdist_wheel
The built whl package is placed in the dist directory (tao/dist).
Install the generated whl package
pip install dist/blade_disc_tf1155-0.2.0-py3-none-any.whl
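The wheel file name includes the package version (0.2.0 above), so the `pip install` command has to change whenever the version does. The sketch below locates the wheel instead of hard-coding its name. `find_disc_wheel` is a hypothetical helper. It assumes that the wheel name starts with `blade_disc_tf`, as in the command above, and that `dist/` holds exactly one such wheel.

```shell
# Hypothetical helper: print the single blade_disc_tf*.whl in the given dist
# directory; fail if there is none or more than one.
find_disc_wheel() {
  local dist_dir=${1:-dist} count=0 found= whl
  for whl in "$dist_dir"/blade_disc_tf*.whl; do
    [ -e "$whl" ] || continue      # the glob matched nothing
    count=$((count + 1))
    found=$whl
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one blade_disc_tf wheel in $dist_dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Usage, from the tao directory:
#   whl=$(find_disc_wheel dist) && pip install "$whl"
```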
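Using DISC for C++ inference
C++ inference code must link against libtao_ops.so at build time. At run time, set the following two environment variables to enable DISC optimization: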
export BRIDGE_ENABLE_TAO=true
export TAO_COMPILER_PATH=/path-to/tao_compiler_main
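A misspelled value or a wrong path in these variables is easy to miss, so a pre-flight check before starting the server can help. This is a minimal sketch. `check_disc_runtime_env` is a hypothetical helper, and it only accepts the literal value `true` shown above. Whether BladeDISC also accepts other spellings is not covered here.

```shell
# Hypothetical pre-flight check for the two variables above: passes only when
# BRIDGE_ENABLE_TAO is exactly "true" and TAO_COMPILER_PATH names an
# executable regular file (the tao_compiler_main binary).
check_disc_runtime_env() {
  if [ "${BRIDGE_ENABLE_TAO:-}" != "true" ]; then
    echo "BRIDGE_ENABLE_TAO is not set to true" >&2
    return 1
  fi
  if [ ! -f "${TAO_COMPILER_PATH:-}" ] || [ ! -x "${TAO_COMPILER_PATH:-}" ]; then
    echo "TAO_COMPILER_PATH does not point to an executable file" >&2
    return 1
  fi
}

# Usage, right before launching the serving binary:
#   check_disc_runtime_env || exit 1
```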
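Taking tensorflow_serving as an example, there are two ways to link libtao_ops.so into tensorflow_serving. One is to pass its directory with -L at build time (-L/xxx/xxx/mylib/). The other is to copy libtao_ops.so into a system library path (for example, /usr/local/lib/).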
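The `-L` option adds a directory to the linker's search path. `-l<name>` tells the linker to look for `lib<name>.so` (or `lib<name>.a`) in those directories, which is why the library is passed as `-ltao_ops` rather than by file name. The small sketch below derives both flags from the .so path. `tao_link_flags` is a hypothetical helper name, and it assumes an absolute path of the form `/dir/libNAME.so`.

```shell
# Hypothetical helper: turn /path/to/libNAME.so into "-L/path/to -lNAME".
tao_link_flags() {
  local so_path=$1
  local dir=${so_path%/*}          # directory part: /path/to
  local base=${so_path##*/}        # file part: libNAME.so
  local name=${base#lib}           # strip the "lib" prefix
  name=${name%.so}                 # strip the ".so" suffix
  printf '%s\n' "-L$dir -l$name"
}

# Usage:
#   tao_link_flags /xxx/libtao_ops.so    # prints: -L/xxx -ltao_ops
```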
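Assuming the built libtao_ops.so is located at /xxx/libtao_ops.so, process it as follows. The patchelf step removes the library's dependency on libtensorflow_framework.so.1.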
apt-get update && apt-get install patchelf
patchelf --remove-needed libtensorflow_framework.so.1 /xxx/libtao_ops.so
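# Runtime setup: make /xxx/ visible to the dynamic loader
# (${VAR:+...} avoids a leading ":" when LD_LIBRARY_PATH is empty)
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/xxx/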
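To confirm that the patchelf step worked, inspect the dynamic section of libtao_ops.so. `readelf -d` (from binutils) prints one `(NEEDED)` line per required shared library, and `patchelf --print-needed` lists the same entries. This is a minimal sketch that checks `readelf -d` output for a given entry. `lists_needed` is a hypothetical helper, and it assumes binutils is installed in the container.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and succeed if it
# lists the shared library named in $1 as a NEEDED dependency.
lists_needed() {
  grep -F '(NEEDED)' | grep -qF "[$1]"
}

# Usage: after the patchelf step, this should print nothing.
#   readelf -d /xxx/libtao_ops.so | lists_needed libtensorflow_framework.so.1 \
#     && echo "libtao_ops.so still depends on libtensorflow_framework.so.1"
```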
Additional environment setup
apt-get update
apt-get install autotools-dev
apt-get install automake
apt-get install libtool
export TF_CUDA_COMPUTE_CAPABILITIES="7.0,7.5,8.0"
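TF_CUDA_COMPUTE_CAPABILITIES is a comma-separated list of CUDA compute capabilities in major.minor form. In the value above, 7.0, 7.5 and 8.0 correspond to Volta (V100), Turing (T4) and Ampere (A100) GPUs. The sketch below is a format check for values written like the one above. `valid_cuda_caps` is a hypothetical helper, and its rule (no spaces, no empty entries) is an assumption based on that example.

```shell
# Hypothetical sanity check: accept only a comma-separated list of
# major.minor values such as "7.0,7.5,8.0" (no spaces, no empty entries).
valid_cuda_caps() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+(,[0-9]+\.[0-9]+)*$'
}

# Usage:
#   valid_cuda_caps "$TF_CUDA_COMPUTE_CAPABILITIES" || echo "unexpected format" >&2
```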
Make the following changes to tensorflow_serving/model_servers/BUILD in tensorflow_serving:
 cc_binary(
     name = "tensorflow_model_server_main_lib",
     ...
     deps = [
         ...
         "@org_tensorflow//tensorflow/core/platform/hadoop:hadoop_file_system",
         "@org_tensorflow//tensorflow/core/platform/s3:s3_file_system",
+        "@org_tensorflow//tensorflow/stream_executor",
+        "@org_tensorflow//tensorflow/stream_executor:stream_executor_impl",
+        "@org_tensorflow//tensorflow/stream_executor:stream_executor_internal",
+        "@org_tensorflow//tensorflow/stream_executor:stream_executor_pimpl",
+        "@org_tensorflow//tensorflow/stream_executor:kernel_spec",
+        "@org_tensorflow//tensorflow/stream_executor:kernel",
+        "@org_tensorflow//tensorflow/stream_executor:scratch_allocator",
+        "@org_tensorflow//tensorflow/stream_executor:timer",
+        "@org_tensorflow//tensorflow/stream_executor/host:host_platform",
+    ],
+    linkopts = [
+        # GNU ld applies -no-as-needed only to libraries that appear after it,
+        # so it is listed before -ltao_ops.
+        "-Wl,-no-as-needed",
+        "-L/xxx/ -ltao_ops",
     ],
     ...
The BUILD file above now depends on stream_executor, whose visibility is currently "friends" rather than "public". Change ./tensorflow/stream_executor/BUILD in the DeepRec source that tensorflow_serving references as follows:
 package(
-    default_visibility = [":friends"],
+    default_visibility = ["//visibility:public"],
     licenses = ["notice"],  # Apache 2.0
 )
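The same visibility change can be applied non-interactively, which helps when the DeepRec source is re-fetched. This is a minimal sketch that uses GNU sed's in-place editing, as found on the Ubuntu image above. `make_stream_executor_public` is a hypothetical helper name.

```shell
# Hypothetical helper: switch default_visibility in the given BUILD file from
# [":friends"] to ["//visibility:public"], editing the file in place (GNU sed -i).
make_stream_executor_public() {
  sed -i 's|default_visibility = \[":friends"\]|default_visibility = ["//visibility:public"]|' "$1"
}

# Usage, from the DeepRec source tree that tensorflow_serving references:
#   make_stream_executor_public tensorflow/stream_executor/BUILD
```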
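When BladeDISC is built against DeepRec, it uses _GLIBCXX_USE_CXX11_ABI=1 by default, while tensorflow_serving defaults to _GLIBCXX_USE_CXX11_ABI=0, so the two must be made consistent. This document does so by editing tensorflow_serving's .bazelrc: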
- build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0
+ build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1
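The .bazelrc edit can be scripted the same way. This is a minimal sketch that uses GNU sed in-place editing. `use_cxx11_abi` is a hypothetical helper name, and it only rewrites a line that matches the original flag exactly.

```shell
# Hypothetical helper: flip the C++11 ABI define in tensorflow_serving's
# .bazelrc from 0 to 1 so it matches the BladeDISC/DeepRec build (GNU sed -i).
use_cxx11_abi() {
  sed -i 's/^build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0$/build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=1/' "$1"
}

# Usage, from the tensorflow_serving checkout:
#   use_cxx11_abi .bazelrc
```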
Final build command:
bazel build -c opt --config=cuda tensorflow_serving/...
For details on building tensorflow_serving, see: Building TFS