RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

Sat, 13 Jun 2026 10:03:44 +0200

A year ago, I bought an RTX 5080 for both gaming and AI experiments. Little did I know back then that I would be giving in to the joys of local LLM setups.
Fast forward to 2026: with Qwen 3.5, Gemma, and Qwen 3.6 around, I needed more than 16GB. So I got myself a refurbished RTX 3090 with 24GB. I could then run Qwen 3.6 Q4 quants, first at ~30 tok/s, then 50-60 with MTP (multi-token prediction). Not bad. But it still felt limited, and my 5080 was barely used.
So I began digging into what kind of setup could take advantage of those two cards together. I already had DDR4 sticks and SSDs ready; I only needed a motherboard capable of handling both cards.
Enter the Asus Prime X570-Pro. The “Pro” is important: it is what ensures the x16 PCIe link can be split into x8/x8.
The 5080 being the monster it is, I bought a good-quality PCIe 4.0 riser to plug it into the second slot.
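
To put rough numbers on why a single card was not enough, here is a back-of-the-envelope sketch of the weight sizes involved. It is an estimate under stated assumptions, not a measurement from this setup: it assumes GGUF-style quants (Q8_0 at 8.5 bits per weight, Q4_0 at 4.5 bits per weight), takes the "27B" in the title as exactly 27 billion parameters, and counts weights only, ignoring KV cache and runtime overhead. The helper name `weights_gib` is made up for illustration.

```python
# Weight-only VRAM estimate. Real usage is higher: KV cache, activations
# and runtime buffers are NOT included here.

def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Assumption: GGUF block formats. Q8_0 packs 32 int8 weights plus one fp16
# scale (34 bytes per 32 weights = 8.5 bits); Q4_0 packs 32 4-bit weights
# plus one fp16 scale (18 bytes per 32 weights = 4.5 bits).
Q8_BPW = 8.5
Q4_BPW = 4.5

PARAMS_B = 27  # assumption: "27B" taken literally
VRAM_GIB = {"RTX 5080": 16, "RTX 3090": 24}

q8 = weights_gib(PARAMS_B, Q8_BPW)  # roughly 26.7 GiB
q4 = weights_gib(PARAMS_B, Q4_BPW)  # roughly 14.1 GiB

for name, size in (("Q8", q8), ("Q4", q4)):
    print(f"{name}: {size:.1f} GiB of weights")
    for card, vram in VRAM_GIB.items():
        print(f"  fits on {card} alone: {size < vram}")
    print(f"  fits across both cards: {size < sum(VRAM_GIB.values())}")
```

Under these assumptions the Q8 weights alone exceed the 3090's 24GB but sit well within the 40GB the two cards offer together, while the Q4 weights leave almost no headroom on a 16GB card once context is added.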

AI on iMil.net
