PyTorch 2.6 for Intel Mac with Metal acceleration on AMD GPU (Python 3.10)
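1️⃣ Abstract

1. The PyTorch build that Apple officially documents for Intel Macs only goes up to PyTorch 2.2: https://developer.apple.com/metal/pytorch/
2. I therefore built PyTorch 2.6 for Intel Mac with Metal acceleration on AMD GPU, so that older Macs get MPS acceleration together with newer PyTorch and TorchVision versions.
3. If you only need the whl files, download them from Release. They support Python 3.10 and TorchVision v0.21.0: https://github.com/Kinghammer1/Pytorch2.6-for-intel-Mac-with-Metal-acceleration-MPS-in-AMD-GPU
4. Source: https://github.com/pytorch/pytorch
5. A simple tensor-computation comparison, CPU vs MPS (AMD GPU, PyTorch 2.6):

    Performance comparison:
    CPU total time: 0.6217 s
    MPS total time: 0.0069 s
    Speedup (CPU/MPS): 89.74x
    🎉 MPS is 89.74 times faster than CPU
    Maximum difference between CPU and MPS results: 0.00023651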
2️⃣ Using Directly

Download the whl files from Release, then install them in a Python 3.10 environment:

    pip install torch-2.6.0a0+git1eba9b3-cp310-cp310-macosx_11_0_x86_64.whl
    pip install torchvision-0.21.0+7af6987-cp310-cp310-macosx_11_0_x86_64.whl
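Before installing, it can help to confirm that the tags in a wheel's filename match the interpreter in use. The sketch below splits a wheel filename into its standard fields. `parse_wheel_filename` and `interpreter_tag` are hypothetical helpers written for illustration (they are not part of pip or of this repository), and the parser assumes the five-field filename form without the optional build tag.

```python
import platform
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its fields (hypothetical helper).

    Assumes the five-field form name-version-python-abi-platform.whl,
    i.e. a filename without the optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp310'.

    Assumes a CPython interpreter.
    """
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


wheel = "torch-2.6.0a0+git1eba9b3-cp310-cp310-macosx_11_0_x86_64.whl"
info = parse_wheel_filename(wheel)
print("wheel python tag:", info["python"], "| interpreter:", interpreter_tag())
print("wheel platform:", info["platform"], "| machine:", platform.machine())
```

If the two Python tags differ, or the machine is not `x86_64`, pip would be expected to reject the wheel as unsupported on this platform.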
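3️⃣ Methods (build from source)

These steps came from DeepSeek and have been verified to work. If you only want to use the prebuilt packages, follow 2️⃣ Using Directly instead.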
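Environment preparation

1. Start from a clean environment and install the dependencies

    # Create and activate a fresh conda environment
    conda create -n pytorch-build-2.6 python=3.10
    conda activate pytorch-build-2.6

    # Build dependencies
    conda install cmake ninja numpy pyyaml mkl mkl-include setuptools cffi typing_extensions future six requests dataclasses
    pip install -U pip

    # Tools and the OpenMP runtime from Homebrew
    brew install cmake ninja git wget
    brew install libomp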
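2. Make sure the Xcode toolchain is set up

    # Check that a full Xcode installation is present
    xcodebuild -version

    # Point the command-line tools at it
    sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer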
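The preparation steps above can be sanity-checked with a few lines of Python before starting a long build. This is a minimal sketch: `missing_tools` and the `REQUIRED` list are hypothetical names introduced here, and the check only confirms that each tool is on `PATH`, not that its version is suitable.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Tools named in the preparation steps above.
REQUIRED = ["cmake", "ninja", "git", "wget", "xcodebuild"]

missing = missing_tools(REQUIRED)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```

Run it inside the activated `pytorch-build-2.6` environment so that the conda-installed `cmake` and `ninja` are the ones being detected.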
Build PyTorch 2.6 with MPS enabled

1. Get the PyTorch 2.6 source

    # Clone PyTorch together with its submodules
    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch

    # Switch to the 2.6.0 release tag
    git checkout v2.6.0

    # Bring the submodules in line with that tag
    git submodule sync
    git submodule update --init --recursive

    # Check the submodule state
    git submodule status
2. Create a build configuration tuned for Intel + AMD

Create the build script build_pytorch_2.6_mps.sh:

    #!/bin/bash

    # Let CMake find the libraries installed in the conda environment
    export CMAKE_PREFIX_PATH="${CONDA_PREFIX:-$(dirname "$(which conda)")/../}"
    export MACOSX_DEPLOYMENT_TARGET=11.0

    # Metal / MPS flags
    export USE_MPS=1
    export USE_METAL=1
    export PYTORCH_ENABLE_MPS=1
    export USE_PYTORCH_METAL_EXPORT=1
    export PYTORCH_ENABLE_MPS_AMD=1
    export MPS_AMD_FORCE=1

    # Disable the CUDA and ROCm backends
    export USE_CUDA=0
    export USE_CUDNN=0
    export USE_NCCL=0
    export USE_ROCM=0

    # CPU acceleration libraries
    export USE_MKLDNN=1
    export USE_NNPACK=1
    export USE_QNNPACK=1
    export USE_PYTORCH_QNNPACK=1
    export USE_XNNPACK=1

    # Metal framework and SDK locations
    export METAL_LIBRARY_PATH="/System/Library/Frameworks/Metal.framework"
    export METAL_SDK_PATH="/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"

    # Use every CPU core for the build
    export MAX_JOBS=$(sysctl -n hw.ncpu)

    echo "=== PyTorch 2.6 MPS build configuration ==="
    echo "USE_MPS: $USE_MPS"
    echo "USE_METAL: $USE_METAL"
    echo "USE_PYTORCH_METAL_EXPORT: $USE_PYTORCH_METAL_EXPORT"
    echo "METAL_LIBRARY_PATH: $METAL_LIBRARY_PATH"
    echo "MAX_JOBS: $MAX_JOBS"

    # Clean any previous build, then build in develop mode
    python setup.py clean
    python setup.py build develop
3. Apply a patch for AMD GPUs

Because you are using an AMD GPU, a few adjustments may be needed:

    # Write the patch file
    cat > mps_amd_fix.patch << 'EOF'
    --- a/cmake/Dependencies.cmake
    +++ b/cmake/Dependencies.cmake
    @@ -1234,6 +1234,15 @@
     if (USE_METAL)
       if (NOT METAL_LIBRARY)
         message(WARNING "Metal library not found. Disabling Metal support.")
         set(USE_METAL OFF)
    +  else()
    +    message(STATUS "Found Metal library: ${METAL_LIBRARY}")
    +
    +    list(APPEND Caffe2_PRIVATE_DEPENDENCY_LIBS ${METAL_LIBRARY})
    +
    +    find_library(MPS_LIBRARY MetalPerformanceShaders)
    +    if (MPS_LIBRARY)
    +      list(APPEND Caffe2_PRIVATE_DEPENDENCY_LIBS ${MPS_LIBRARY})
    +    endif()
       endif()
     endif()
    EOF

    # Apply it; the build continues even if the patch does not apply cleanly
    git apply mps_amd_fix.patch || echo "The patch may not apply cleanly; continuing with the build..."
4. Run the build

    # Make the script executable, then run it
    chmod +x build_pytorch_2.6_mps.sh
    ./build_pytorch_2.6_mps.sh
Alternative build methods (if the method above fails)

Method B: build directly with setup.py

    python setup.py clean

    # Pass the backend switches to CMake through CMAKE_ARGS
    CMAKE_ARGS="-DUSE_MPS=ON -DUSE_METAL=ON -DUSE_PYTORCH_METAL_EXPORT=ON -DUSE_CUDA=OFF -DUSE_ROCM=OFF" \
    python setup.py build develop
Method C: step-by-step CMake build

    # Configure in a separate build directory
    mkdir build && cd build
    cmake .. \
      -DUSE_MPS=ON \
      -DUSE_METAL=ON \
      -DUSE_PYTORCH_METAL_EXPORT=ON \
      -DUSE_CUDA=OFF \
      -DUSE_ROCM=OFF \
      -DUSE_MKLDNN=ON \
      -DUSE_NNPACK=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -DPYTHON_EXECUTABLE=$(which python) \
      -DCMAKE_PREFIX_PATH=${CONDA_PREFIX} \
      -DMETAL_LIBRARY_PATH="/System/Library/Frameworks/Metal.framework"

    # Compile on every CPU core
    make -j$(sysctl -n hw.ncpu)

    # Install into the current environment in develop mode
    cd ..
    python setup.py develop
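Methods B and C pass the same kind of `-DNAME=ON/OFF` switches, which are easy to mistype by hand. As a small illustration, the sketch below renders a dictionary of options into a `CMAKE_ARGS`-style string. `cmake_args` and `METHOD_B` are hypothetical names used only here; the flag values are copied from Method B.

```python
def cmake_args(options):
    """Render options as '-DNAME=ON' / '-DNAME=OFF' for booleans, '-DNAME=value' otherwise."""
    rendered = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        rendered.append(f"-D{name}={value}")
    return " ".join(rendered)


# The same switches as Method B above.
METHOD_B = {
    "USE_MPS": True,
    "USE_METAL": True,
    "USE_PYTORCH_METAL_EXPORT": True,
    "USE_CUDA": False,
    "USE_ROCM": False,
}

print(cmake_args(METHOD_B))
```

The printed string can be assigned to `CMAKE_ARGS` in the shell before calling `python setup.py build develop`.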
Verify the build

Create the verification script verify_mps_2.6.py:

    import platform
    import sys
    import time

    import torch

    print("=== PyTorch 2.6 MPS verification ===")
    print(f"PyTorch version: {torch.__version__}")
    print(f"Python: {sys.version}")
    print(f"macOS: {platform.mac_ver()[0]}")
    print(f"Architecture: {platform.machine()}")

    # Dump the build configuration
    print("\n=== Build configuration ===")
    print(f"Build settings: {torch.__config__.show()}")

    # Check MPS support
    print("\n=== MPS support ===")
    print(f"MPS available: {torch.backends.mps.is_available()}")
    print(f"MPS built: {torch.backends.mps.is_built()}")

    if torch.backends.mps.is_available():
        device = torch.device("mps")
        print(f"MPS device: {device}")

        size = 3000
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)

        # Warm up so one-time setup cost is not measured
        for _ in range(3):
            _ = a @ b
        if hasattr(torch, "mps"):
            torch.mps.synchronize()

        # Time 10 matrix multiplications on MPS
        start_time = time.time()
        for _ in range(10):
            c = a @ b
        if hasattr(torch, "mps"):
            torch.mps.synchronize()
        mps_time = time.time() - start_time
        print(f"MPS matmul time: {mps_time:.4f} s")

        # Time the same work on the CPU
        a_cpu, b_cpu = a.cpu(), b.cpu()
        start_time = time.time()
        for _ in range(10):
            c_cpu = a_cpu @ b_cpu
        cpu_time = time.time() - start_time
        print(f"CPU matmul time: {cpu_time:.4f} s")
        print(f"Speedup: {cpu_time / mps_time:.2f}x")

        # Report MPS memory usage
        if hasattr(torch, "mps"):
            try:
                current_mem = torch.mps.current_allocated_memory()
                driver_mem = torch.mps.driver_allocated_memory()
                print(f"MPS current memory: {current_mem / 1024**2:.1f} MB")
                print(f"MPS driver memory: {driver_mem / 1024**2:.1f} MB")
            except Exception as e:
                print(f"Failed to read memory info: {e}")
    else:
        print("MPS is not available")

    # Look for the key build flags in the configuration string
    print("\n=== Key build flags ===")
    build_string = str(torch.__config__.show())
    key_flags = ["MPS", "METAL", "USE_PYTORCH_METAL_EXPORT"]
    for flag in key_flags:
        if flag in build_string:
            print(f"✅ {flag}: enabled")
        else:
            print(f"❌ {flag}: not found")
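The verification script calls `torch.mps.synchronize()` before reading the clock because GPU work is queued asynchronously; without it, the timer may stop before the queued multiplications have finished. The pattern can be pulled out into a small helper, sketched below. `time_calls` is a hypothetical name, and a plain Python workload stands in for the matmul so the sketch runs without PyTorch installed.

```python
import time


def time_calls(fn, sync=None, warmup=3, iters=10):
    """Return the wall-clock seconds taken by `iters` calls of `fn`.

    `sync` is an optional callable that blocks until queued work has
    finished; on an MPS device this would be torch.mps.synchronize.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain the warm-up work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued work so that it is included in the timing
    return time.perf_counter() - start


# Stand-in workload so the sketch runs without PyTorch.
elapsed = time_calls(lambda: sum(range(10_000)))
print(f"10 calls took {elapsed:.6f} s")

# With PyTorch and tensors a, b on an MPS device, the call would look like:
#   time_calls(lambda: a @ b, sync=torch.mps.synchronize)
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.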
Troubleshooting

    # Point xcode-select at the full Xcode installation
    sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

    # Set the Metal framework path
    export METAL_LIBRARY_PATH="/System/Library/Frameworks/Metal.framework"
Common issue 2: link errors

    # Remove every untracked and ignored file, in the repo and in all submodules
    git clean -xdf
    git submodule foreach --recursive git clean -xdf
    python setup.py clean
Common issue 3: Python package conflicts

    # Recreate the conda environment from scratch
    conda deactivate
    conda env remove -n pytorch-build-2.6
    conda create -n pytorch-build-2.6 python=3.10
    conda activate pytorch-build-2.6
Common issue 4: submodule problems

    # Reinitialize all submodules
    git submodule deinit -f .
    git submodule update --init --recursive
Signs of a successful build

After a successful build, the verification script should show:
✅ MPS available: True
✅ MPS built: True
✅ Key flags such as USE_MPS and METAL appear in the build configuration
✅ Tensors can be created with device='mps'
✅ Computation is faster than on the CPU
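Installing into other environments

After a successful build, you can create a wheel package and install it into other environments:

    # Build the wheel
    python setup.py bdist_wheel

    # Install it in the target environment
    pip install dist/torch-2.6.0*.whl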
4️⃣ Test script

    import torch
    print(torch.__version__)  # should start with 2.6.0 (e.g. 2.6.0a0+git1eba9b3)
    print(torch.backends.mps.is_available())  # should be True
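5️⃣ Testing Result

    PyTorch version: 2.6.0a0+git1eba9b3
    MPS available: True
    MPS device:
    Checking the device of the torch.matmul operator…

    CPU matmul test…
    CPU matmul total time for 100 runs: 0.6217 s
    CPU matmul average time per run: 0.006217 s
    CPU matmul result device: cpu

    MPS matmul test…
    MPS matmul total time for 100 runs: 0.0069 s
    MPS matmul average time per run: 0.000069 s
    MPS matmul result device: mps:0

    Performance comparison:
    CPU total time: 0.6217 s
    MPS total time: 0.0069 s
    Speedup (CPU/MPS): 89.74x
    🎉 MPS is 89.74 times faster than CPU
    Maximum difference between CPU and MPS results: 0.00023651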
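The reported numbers can be cross-checked with a little arithmetic. Dividing the printed totals gives about 90.10x rather than 89.74x. That gap is consistent with the MPS total being shown rounded to four decimals, assuming the speedup was computed from the unrounded timings (an assumption, since the benchmark script itself is not shown here). A minimal sketch:

```python
# Values copied from the results above.
cpu_total = 0.6217         # seconds for 100 CPU matmuls
mps_total = 0.0069         # seconds for 100 MPS matmuls (printed to 4 decimals)
reported_speedup = 89.74
runs = 100

# Per-run averages, matching the reported 0.006217 s and 0.000069 s.
cpu_avg = cpu_total / runs
mps_avg = mps_total / runs

# Speedup recomputed from the rounded totals.
speedup_from_rounded = cpu_total / mps_total

# MPS total implied by the reported speedup (assumes it used unrounded timings).
implied_mps_total = cpu_total / reported_speedup

print(f"CPU average per run: {cpu_avg:.6f} s")
print(f"MPS average per run: {mps_avg:.6f} s")
print(f"Speedup from rounded totals: {speedup_from_rounded:.2f}x")
print(f"MPS total implied by 89.74x: {implied_mps_total:.6f} s")
```

The implied MPS total rounds to the printed 0.0069 s, so the two speedup figures do not contradict each other.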
6️⃣ Links and original repository

https://github.com/shmthechengguang/pytorch-for-intel-Mac-with-Metal-acceleration-MPS-in-AMD-GPU