Working ComfyUI with ROCm on 9070XT - a quick tutorial and an ask.

I've mostly been putting my shiny new 9070XT to work over the last few days playing with ComfyUI on DirectML/Win11. It's cool, and faster than the last time I tried it (on a 3070 Ti), but still not fast, so off we go to Ubuntu.

Steps I took:

  • Install Ubuntu 24.04 LTS.
  • Install AMD ROCm using their quickstart instructions, including the reboot to register for Secure Boot (you might not need to do this). Verify it's working with rocminfo, which should report your GPU.
  • Install ComfyUI in a venv using comfy-cli.
  • Activate the venv, then run: pip install --pre -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3 (note the added -U, which upgrades the torch already in the venv to the ROCm 6.3 build).
  • Start ComfyUI.
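
Before starting ComfyUI it can be worth checking that the torch in the venv really is the ROCm build and can see the card. This is only a rough sketch, not something from the ROCm or ComfyUI docs: describe_torch_backend is a helper name I made up, and it assumes a ROCm build reports a version string in torch.version.hip and exposes the GPU through the usual torch.cuda calls.

```python
def describe_torch_backend():
    """Return a small dict describing the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment (wrong venv?).
        return {"installed": False}

    gpu_available = torch.cuda.is_available()
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # Assumption: a version string on ROCm builds, None on CUDA/CPU builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds expose the GPU through the torch.cuda API.
        "gpu_available": gpu_available,
        "gpu_name": torch.cuda.get_device_name(0) if gpu_available else None,
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

Run it with the venv activated. If hip_version comes back as None or gpu_available is False, the pip step above probably didn't replace the original torch.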

This is hugely quicker than DirectML (only tried SDXL so far, but 3-4x). Edit: see below for a bit more detail; the improvement is closer to 45% (i.e. a 35s flow on Windows runs in about 20s on Linux).
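
To put numbers on that, here's the arithmetic as a tiny sketch. compare_runtimes is just an illustrative helper, and the 35s and 20s figures are the rough timings quoted above, not careful benchmarks.

```python
def compare_runtimes(baseline_s, new_s):
    """Return (fraction of time saved, speedup factor) for two run times."""
    time_saved = (baseline_s - new_s) / baseline_s
    speedup = baseline_s / new_s
    return time_saved, speedup


# Rough timings from the post: ~35s on Windows/DirectML, ~20s on Linux/ROCm.
time_saved, speedup = compare_runtimes(35.0, 20.0)
print(f"time saved: {time_saved:.0%}, speedup: {speedup:.2f}x")
# prints "time saved: 43%, speedup: 1.75x"
```

So the same flow takes a bit over 40% less time, which works out to roughly 1.75x rather than 3-4x.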

Here's where the ask comes in, however: it's slowed down hugely by falling back to tiled VAE Decode due to a lack of memory. I've tried it with lowvram to no effect, and although it's still faster overall, the additional time spent decoding is frustrating. Edit: the TAESD VAE gets round this to a degree. It's not perfect, but it works a lot better and doesn't seem to affect image quality noticeably.

Has anyone run into anything like this before and come up with a solution? CoreCtrl says I'm not actually running out of RAM (min 3GB free).
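
If anyone wants to compare notes, one way to cross-check CoreCtrl is to ask PyTorch what it thinks is free on the card. This is a sketch under assumptions: gpu_memory_report is a made-up helper, and it assumes torch.cuda.mem_get_info behaves on the ROCm nightly the same way it does on CUDA builds, which I haven't verified on this card.

```python
def gpu_memory_report(device_index=0):
    """Return (free_gib, total_gib) as seen by PyTorch, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None  # torch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no GPU visible to this torch build

    # mem_get_info returns (free_bytes, total_bytes) for the given device.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


if __name__ == "__main__":
    report = gpu_memory_report()
    if report is None:
        print("No GPU visible to PyTorch in this environment.")
    else:
        free_gib, total_gib = report
        print(f"free: {free_gib:.1f} GiB of {total_gib:.1f} GiB")
```

Running this in a separate terminal only shows the driver-level free memory at that moment, so it would need to be sampled during the VAE Decode step to say anything useful.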