<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>AI on iMil.net</title>
    <link>http://imil.net/blog/tags/ai/</link>
    <description>Recent content in AI on iMil.net</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Sat, 13 Jun 2026 10:03:44 +0200</lastBuildDate>
    <atom:link href="http://imil.net/blog/tags/ai/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>RTX 5080 &#43; RTX 3090 Setup: 80&#43; Tok/s on Qwen 3.6 27B Q8</title>
      <link>http://imil.net/blog/posts/2026/rtx-5080-&#43;-rtx-3090-setup-80&#43;-tok-s-on-qwen-3.6-27b-q8/</link>
      <pubDate>Sat, 13 Jun 2026 10:03:44 +0200</pubDate>
      <guid>http://imil.net/blog/posts/2026/rtx-5080-&#43;-rtx-3090-setup-80&#43;-tok-s-on-qwen-3.6-27b-q8/</guid>
      <description>&lt;p&gt;A year ago, I bought an RTX 5080 for both gaming and AI experiments. Little did I know back then that I would be giving in to the joys of local LLM setups.&lt;br&gt;&#xA;Fast forward to 2026: with Qwen 3.5, Gemma, and Qwen 3.6 out, I needed more than 16GB of VRAM. So I got myself a refurbished RTX 3090 with 24GB. I could then run Qwen 3.6 Q4 quants, first at ~30 tok/s, then at 50-60 with MTP. Not bad, but it still felt limited while my 5080 sat barely used.&lt;br&gt;&#xA;So I began digging into what kind of setup could take advantage of those two cards together. I already had DDR4 sticks and SSDs ready; I only needed a mobo capable of handling the two cards.&lt;br&gt;&#xA;Enter the Asus Prime X570-Pro. The &amp;ldquo;Pro&amp;rdquo; is important: it is what ensures the x16 PCIe lanes can be split into x8/x8.&lt;br&gt;&#xA;The 5080 being the monster it is, I bought a good-quality PCIe 4 riser to plug it into the second slot.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
