_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
   URI   FuryGpu – Custom PCIe FPGA GPU
       
       
        anon115 wrote 1 day ago:
        can you run valorant on it?
       
        allanrbo wrote 1 day ago:
        What an inspiring passion project! Very ambitious first Verilog
        project.
       
        bobharris wrote 1 day ago:
        beyond amazing.
        i've dreamt of this.
        so inspiring. 
         it reminds me of a lot of time I spent thinking about this: [1] I
         actually wrote one of the professors asking for more info. Didn't get a
         reply.
        my dream EE class I never got to take.
        
   URI  [1]: https://rcl.ece.iastate.edu/sites/default/files/papers/SteJon1...
       
        userbinator wrote 1 day ago:
         > Supporting hardware features equivalent to a high-end graphics card
         > of the mid-1990s
        
        I see no one else has asked this question yet, so I will: How
        VGA-compatible is it? Would I be able to e.g. plug it into any PC with
        a PCIe slot, boot to DOS and play DOOM with it?
       
        raphlinus wrote 2 days ago:
        Very cool project, and I love to see more work in this space.
        
        Something else to look at is the Vortex project from Georgia Tech[1].
        Rather than recapitulating the fixed-function past of GPU design, I
        think it looks toward the future, as it's at heart a highly parallel
        computer, based on RISC-V with some extensions to handle GPU workloads
        better. The boards it runs on are a few thousand dollars, so it's not
         exactly hobbyist-friendly, but it certainly is more accessible than
        closed, proprietary development. There's a 2.0 release that just landed
        a few months ago.
        
        [1] 
        
   URI  [1]: https://vortex.cc.gatech.edu/
       
        PfhorSlayer wrote 2 days ago:
        So, this is my project! Was somewhat hoping to wait until there was a
        bit more content up on the site before it started doing the rounds, but
        here we are! :)
        
        To answer what seems to be the most common question I get asked about
         this, I intend to open-source the entire stack (PCB
        schematic/layout, all the HDL, Windows WDDM drivers, API runtime
        drivers, and Quake ported to use the API) at some point, but there are
        a number of legal issues that need to be cleared (with respect to my
        job) and I need to decide the rest of the particulars (license, etc.) -
        this stuff is not what I do for a living, but it's tangentially-related
        enough that I need to cover my ass.
        
         The first commit for this project was on August 22, 2021. I've been
         working on it for a bit over two and a half years, and while I
        didn't write anything up during that process, there are a fair number
        of videos in my YouTube FuryGpu playlist ( [1] ) that can kind of give
        you an idea of how things progressed.
        
        The next set of blog posts that are in the works concern the PCIe
        interface. It'll probably be a multi-part series starting at the PCB
        schematic/layout and moving through the FPGA design and ending with the
        Windows drivers. No timeline on when that'll be done, though. After
         having written just that one post on how the Texture Units work, I've
         got even more respect for those who can write up technical stuff like
         that on any kind of consistent schedule.
        
        I'll answer the remaining questions in the threads where they were
        asked.
        
        Thanks for the interest!
        
   URI  [1]: https://www.youtube.com/playlist?list=PL4FPA1MeZF440A9CFfMJ7F6...
       
          pocak wrote 1 day ago:
          In the post about the texture unit, that ROM table for mip level
          address offsets seems to use quite a bit of space. Have you
          considered making the mip base addresses a part of the texture spec
          instead?
       
            PfhorSlayer wrote 1 day ago:
            The problem with doing that is it would require significantly more
            space in that spec. At a minimum, one offset for each possible mip
            level. That data needs to be moved around the GPU internally quite
            a bit, crossing clock domains and everything else, and would
             require a ton of extra registers to keep track of. Putting it in a
             ROM is basically free - a pair of BRAMs versus a ton of registers
             (and the associated timing considerations); the BRAM wins almost
             every time.
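
             To make the trade concrete, here is a rough, hypothetical sketch
             (not FuryGpu's actual table layout) of that kind of offset ROM in
             Verilog - a registered read from an initialized array is all the
             tools need to infer block RAM:

               module mip_offset_rom (
                   input  wire        clk,
                   input  wire [3:0]  size_log2,  // texture is 2^size_log2 texels square
                   input  wire [3:0]  mip_level,
                   output reg  [31:0] offset      // texel offset of that mip's base
               );
                   // 256 precomputed entries; entry {s, m} holds the summed
                   // sizes of mips 0..m-1 for a 2^s-texel-square texture.
                   reg [31:0] rom [0:255];
                   integer s, m;
                   reg [31:0] acc;
                   initial begin
                       for (s = 0; s < 16; s = s + 1) begin
                           acc = 0;
                           for (m = 0; m < 16; m = m + 1) begin
                               rom[s*16 + m] = acc;
                               if (m <= s)
                                   acc = acc + (32'd1 << (2*(s - m)));
                           end
                       end
                   end
                   // Registered read; synthesis infers a BRAM rather than LUTs.
                   always @(posedge clk)
                       offset <= rom[{size_log2, mip_level}];
               endmodule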
       
          ruslan wrote 1 day ago:
           How much does it depend on hard IP blocks? I mean, can it be ported
           to FPGAs from other vendors, like the Lattice ECP5? Did you implement
           PCIe in HDL or use a vendor-specific IP block? Please provide some
           resource utilization statistics. Thanks.
       
            alexforencich wrote 1 day ago:
            The GPU uses [1] + the Xilinx PCIe hard IP core.  When using the
            device-independent DMA engine, that library supports both Xilinx
            and Intel FPGAs.
            
   URI      [1]: https://github.com/alexforencich/verilog-pcie
       
            PfhorSlayer wrote 1 day ago:
            Implementing PCIe in the fabric without using the hard IP would be
            foolish, and definitely not the kind of thing I'd enjoy spending my
            time on! The design makes extensive use of the DSP48E2 and various
            BRAM/URAM blocks available in the fabric. I don't have exact
            numbers off the top of my head, but roughly it's ~500 DSP units
            (primarily for multiplication), ~70k LUTs, ~135k FFs, and ~90
            BRAMs. Porting it to a different device would be a pretty
            significant undertaking, but would not be impossible. Many of the
            DSP resources are inferred, but there is a lot of timing stuff that
            depends on the DSP48E2's behavior - multiple register stages
            following the multiplies, the inputs are sized appropriately for
            those specific DSP capabilities, etc.
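
             As a hypothetical illustration (not FuryGpu source) of that
             inference pattern: a multiply flanked by register stages is the
             shape the tools map onto the DSP48E2's internal A/B, M, and P
             registers, which is what lets the multiply path close timing at
             high clock rates:

               module pipelined_mul #(
                   parameter WIDTH = 18  // a DSP48E2 natively does 27x18 signed multiplies
               ) (
                   input  wire                      clk,
                   input  wire signed [WIDTH-1:0]   a,
                   input  wire signed [WIDTH-1:0]   b,
                   output reg  signed [2*WIDTH-1:0] p
               );
                   reg signed [WIDTH-1:0]   a_r, b_r;  // input registers (A/B regs)
                   reg signed [2*WIDTH-1:0] m_r;       // multiplier register (M reg)
                   always @(posedge clk) begin
                       a_r <= a;
                       b_r <= b;
                       m_r <= a_r * b_r;  // absorbed into the DSP multiplier
                       p   <= m_r;        // output register (P reg)
                   end
               endmodule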
       
          rustybolt wrote 2 days ago:
          I have seen semi-regular updates from you on discord and it is
          awesome to see how far this project has come (and also a bit
          frustrating to see how relatively little progress I have made on my
          FPGA projects in the same time!). I was hoping you'd do a writeup,
          can't wait!
       
          michaelt wrote 2 days ago:
           Googling the Xilinx Zynq UltraScale+, it seems kinda expensive.
          
          Of course plenty of hobbies let people spend thousands (or more) so
          there's nothing wrong with that if you've got the money. But is it
          the end target for your project? Or do you have ambitions to go
          beyond that?
       
            0xcde4c3db wrote 1 day ago:
            I've been told by several people that distributor pricing for FPGAs
            is ridiculously inflated compared to what direct customers pay, and
            considering that one can apparently get a dev board on AliExpress
            for about $110 [1] while Digikey lists the FPGA alone for about
            $1880 [2], I believe it (this example isn't an UltraScale chip, but
            it is significantly bigger than the usual low-end Zynq 7000 boards
             sold to undergrads and tinkerers).
            
   URI      [1]: https://www.aliexpress.us/item/3256806069467487.html
   URI      [2]: https://www.digikey.com/en/products/detail/amd/XC7K325T-1F...
       
              mips_r4300i wrote 1 day ago:
              This is both true and false. While I work with Intel/Altera,
              Xilinx is basically the same.
              
              That devboard is using recycled chips 100 percent. Their cost is
              almost nothing.
              
               The Kintex-7 part in question can probably be bought in volume
               quantities for around $190. Think 100k EAU (estimated annual
               usage).
              
              This kind of price break comes with volume and is common with
              many other kinds of silicon besides FPGAs. Some product lines
              have more pricing pressure than others. For example, very popular
              MCUs may not get as wide of a price break. Some manufacturers
              price more fairly to distributors, some allow very large
              discounts.
       
              bangaladore wrote 1 day ago:
              I have some first- and second-hand experience with this, and you
              are correct. I'm not sure who benefits from this practice. It's
              anywhere from 5-25x cheaper in even small-ish quantities.
       
                oasisaimlessly wrote 1 day ago:
                What magnitude of a quantity is "small-ish"? How does a
                business go about becoming a "direct customer" / bypassing the
                distributors?
       
                  0xcde4c3db wrote 1 day ago:
                  I'm personally too far from those negotiations to offer any
                  likely-pivotal insight (such as a concrete quantity), but my
                  very rough understanding is that there's some critical volume
                  beyond which a customer basically becomes "made" with the
                  Xilinx/Altera sales channels via a financially significant
                  design win, at which point sales engineers etc. all but have
                  a blank check to do things like comp development boards,
                  advance a tray of whatever device is relevant to the design,
                   etc.
                  
                  Basically, as George Carlin put it, "it's a big club, and you
                  ain't in it".
       
            PfhorSlayer wrote 1 day ago:
            Let's be clear here, this is a toy. Beyond being a fun project to
            work on that could maybe get my foot in the door were I ever to
            decide to change careers and move into hardware design, this is not
            going to change the GPU landscape or compete with any of the
            commercial players. What it might do is pave the way for others to
            do interesting things in this space. A board with all of the video
            hardware that you can plug into a computer with all the
            infrastructure available to play around with accelerating graphics
            could be a fun, if extremely niche, product. That would also
            require a *significant* time and money investment from me, and
            that's not something I necessarily want to deal with. When this is
            eventually open-sourced, those who really are interested could make
            their own boards.
            
             One thing to note is that while the US+ line is generally quite
             expensive (the higher-end parts sit in the five-figure range for a
             one-off purchase! No one actually buying these is paying that
             price, but still!), the Kria SOMs are quite cheap in comparison.
             They've got a reasonably-powerful Zynq US+ for about $400, or just
             $350ish for the dev boards (which do not expose some of the
             high-speed interfaces like PCIe). I'm starting to sound like a
             Xilinx shill
            given how many times I've re-stated this, but for anyone serious
            about getting into this kind of thing, those devboards are an
            amazing deal.
       
              chrsw wrote 1 day ago:
              >could maybe get my foot in the door were I ever to decide to
              change careers and move into hardware design
              
              With a project like this I think you're well past a "foot in the
              door".
       
              belter wrote 1 day ago:
              "...I'm doing a (free) operating system (just a hobby, won't be
              big and professional like gnu) for 386(486) AT clones..."
       
                Rinzler89 wrote 1 day ago:
                 Yeah, you're referring to the Linux kernel, but software is
                 much cheaper to design, test, build, scale, and turn profitable
                 than HW, especially GPUs.

                 Open source GPUs won't threaten Nvidia/AMD/Intel anytime soon
                 or ever. They're way too far ahead in the game and also backed
                 by patents if any new player were to become a threat.
       
            kanetw wrote 2 days ago:
            The Kria SOM in use here is like $300.
       
        detuur wrote 2 days ago:
        I can't believe that this is the closest we have to a compact,
         stand-alone GPU option. There's nothing like an M.2-format GPU out
        there. All I want is a stand-alone M.2 GPU with modest performance,
        something on the level of embedded GPUs like Intel UHD Graphics, AMD
        Radeon, or Qualcomm's Adreno.
        
        I have an idea for a small embedded product which needs a lot of
        compute and networking, but only very modest graphical capabilities.
        The NXP Layerscape LX2160A [1] would be perfect, but I have to pass on
        it because it doesn't come with an embedded GPU. I just want a small
        GPU!
        
        [1] 
        
   URI  [1]: https://www.nxp.com/products/processors-and-microcontrollers/a...
       
          cpgxiii wrote 1 day ago:
          There's at least one m.2 GPU based on the Silicon Motion SM750
           controller, made by Asrock Rack. Similar products exist in the mPCIe
           form factor.
          
          Performance is nowhere near a modern iGPU, because an iGPU has access
          to all of the system memory and caches and power budget, and a simple
           m.2 device has none of that. Even low-end PCIe GPUs (single slot,
          half-length/half-height) struggle to outperform better iGPUs and
          really only make sense when you have to use them for basic display
          functionality.
       
          t-3 wrote 2 days ago:
          Maybe a little bit too low-powered for you, but:
          
   URI    [1]: https://www.matrixorbital.com/ftdi-eve
       
          magixx wrote 2 days ago:
          What about MXM GPUs that used to be found in gaming laptops?
           I know the standard is very niche and thus expensive ($400 for a
           3080M used on eBay) but it does exist, and you could convert them to
           PCIe and thus M.2.
       
        KallDrexx wrote 2 days ago:
        This is my dream!
        
         For the last year I've been working on a 2D-focused GPU for
         I/O-constrained microcontrollers ( [1] ).  I've been able to use it to
         get user interfaces on machines with slow SPI links to render on large
         displays, and it's
        been fascinating to work on.
        
         But seeing the limitations of processor pipelines, I've had the
         thought for a while that FPGAs could make this faster.  I've recently
         gotten some low-end FPGAs to start learning, with the aim of turning
         my microgpu from an ESP32-based one into an FPGA one.
        
        I don't know if I"ll ever get to this level due to kids and free time
        constraints, but man, I would love to get even a hundredth of this
        level.
        
   URI  [1]: https://github.com/KallDrexx/microgpu
       
          Chabsff wrote 2 days ago:
          You probably know this already, but for anyone else curious about
          going down that road: For this type of use, it's definitely worth it
          to constrain yourself to FPGAs with dedicated high-bandwidth
          transceivers. A "basic" 1080p RGB signal at 60hz requires  some
          high-frequency signal processing that's really hard to contend with
          in pure FPGA-land.
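
           For rough numbers (assuming the standard CEA timing): 1080p at
           60 Hz with blanking is a 148.5 MHz pixel clock, and TMDS
           serialization is 10 bits per lane per pixel, so each data lane
           toggles at 1.485 Gbps - which is why the dedicated transceivers
           matter so much.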
       
            KallDrexx wrote 2 days ago:
             That's good to know, actually.  I'm still very, very early in my
             FPGA journey (learning the FPGA basics) and I am intending to
             start with standard 640x480 VGA before expanding.
       
        notorandit wrote 2 days ago:
         It must be very fancy to write text in light gray on white.
        
        I am not sure your product will be a success.
        
         I am sure your web design skills need a good overhaul.
       
          nicolas_17 wrote 23 hours 51 min ago:
          It's not a "product" that will be "sold" or has intention of being
          "successful" in a commercial sense.
       
        nxobject wrote 2 days ago:
        I hope the author goes into some detail about how he implements the
        PCIe interface! I doubt I'll ever do hardware work at that level of
        sophistication, but for general cultural awareness I think it's worth
        looking under the hood of PCIe.
       
          alexforencich wrote 1 day ago:
          It uses [1] on top of the Xilinx PCIe hard IP core, which provides
          everything below the transaction layer.
          
   URI    [1]: https://github.com/alexforencich/verilog-pcie
       
          PfhorSlayer wrote 2 days ago:
          Next blog post will be covering exactly that! Probably going to do a
          multi-part series - first one will be the PCB schematic/layout, then
          the FPGA interfaces and testing, followed by Windows drivers.
       
          gorkish wrote 2 days ago:
           The FPGA he is using has native PCIe, so usually all you get on this
           front is an interface to a vendor-proprietary IP block. The state of
           open interfaces in FPGA land is abysmal. I think the best I’ve seen
           fully open source is a gigabit MAC.
       
            alexforencich wrote 1 day ago:
             The GPU uses this: [1] .  And there is an open-source 100G NIC
             here [2], including open-source 10G/25G MACs:
            
   URI      [1]: https://github.com/alexforencich/verilog-pcie
   URI      [2]: https://github.com/corundum/corundum
       
              gorkish wrote 4 hours 21 min ago:
              Thank you very much for the references. These look like great
              projects and I am happy to see that I’m a bit out of date. The
               vendors don’t appear to be making anything easier though; it
               appears these projects are still supporting devices by making the
               brute-force effort to build abstractions over vendor-specific
               stuff themselves.
       
            0xcde4c3db wrote 1 day ago:
            There is an open-source DisplayPort transmitter [1] that apparently
            supports multiple 2.7 Gbps lanes (albeit using family-specific
            SERDES/differential transceiver blocks, but I doubt that's
            avoidable at these speeds). This isn't PCIe, but it's also
            surprisingly close to PCIe 1.0 (2.5 Gbps/lane, and IIRC they use
            the same 8b/10b code and scrambling algorithm).
            
   URI      [1]: https://github.com/hamsternz/FPGA_DisplayPort
       
        bloatfish wrote 2 days ago:
        This is insane! As a hobby hardware designer myself, I can imagine how
        much work must have gone into reaching this stage. Well done!
       
        MalphasWats wrote 2 days ago:
        It's incredible how influential Ben Eater's breadboard computer series
        has been in hobby electronics. I've been similarly inspired to try to
        design my own "retro" CPU.
        
        I desperately want something as easy to plug into things as the 6502,
         but with jussst a little more capability - a few more registers,
         hardware division, that sort of thing. It's a really daunting task.
        
         I always end up coming back to just using an MCU and being done with
         it, and then I hit the How To Generate Graphics problem.
       
          bArray wrote 2 days ago:
          Registers can be worked around by using the stack and/or memory.
          Division could always be implemented as a simple function. It's part
          of the fun of working at that level.
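
           If you do eventually want the division in hardware, a
           one-bit-per-cycle restoring divider is the classic small block. A
           rough, hypothetical Verilog sketch (module and port names invented;
           divide-by-zero not handled):

             module div16 (
                 input  wire        clk,
                 input  wire        start,      // pulse with operands applied
                 input  wire [15:0] dividend,
                 input  wire [15:0] divisor,
                 output wire [15:0] quotient,   // valid once done is high
                 output wire [15:0] remainder,
                 output reg         done
             );
                 reg [4:0]  count;
                 reg [32:0] acc;  // {17-bit partial remainder, 16-bit working dividend}
                 always @(posedge clk) begin
                     if (start) begin
                         acc   <= {17'd0, dividend};
                         count <= 5'd16;
                         done  <= 1'b0;
                     end else if (count != 0) begin
                         // Shift left; when the divisor fits, subtract it and
                         // shift a 1 into the quotient.
                         if (acc[31:15] >= {1'b0, divisor})
                             acc <= {acc[31:15] - {1'b0, divisor}, acc[14:0], 1'b1};
                         else
                             acc <= {acc[31:0], 1'b0};
                         count <= count - 5'd1;
                         if (count == 5'd1)
                             done <= 1'b1;
                     end
                 end
                 assign quotient  = acc[15:0];
                 assign remainder = acc[31:16];
             endmodule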
          
           Regarding graphics, initially output over serial. Abstract the
           problem away until you are ready to deal with it. If you sneak up on
           an Arduino and make it scream, you can make it into a very basic VGA
           graphics card [1]. Even easier is ESP32 to VGA (which also gives you
           keyboard and mouse) [2].
          
   URI    [1]: https://www.instructables.com/Arduino-Basic-PC-With-VGA-Outp...
   URI    [2]: https://www.aliexpress.us/item/1005006222846299.html
       
          PfhorSlayer wrote 2 days ago:
          Funny enough, that's exactly where this project started. After I
           built his 8-bit breadboard computer, I started looking into what
          might be involved in making something a bit more interesting. Can't
          do a whole lot of high-speed anything with discrete logic gates, so I
          figured learning what I could do with an FPGA would be far more
          interesting.
       
          MenhirMike wrote 2 days ago:
          I was about to recommend the Parallax Propeller (the first one that's
          available in DIP format), but arguably, that one is way more complex
          to program for (and also significantly more powerful, and at that
          point you might as well look into an ESP32 and that is "just use an
          MCU" :))
          
          And yeah, video output is a significant issue because of the required
          bandwidth for digital outputs (unless you're okay with composite or
          VGA outputs, I guess they can still be done with readily available
          chips?). The recent Commander X16 settled for an FPGA for this.
       
            MalphasWats wrote 2 days ago:
            I feel like the CX16 lost its way about a week after the project
            started and it suddenly became an expensive FPGA-based blob. But at
            the same time, I'm not sure what other option there is for a
            project like that.
            
            I always got the impression that David sort of got railroaded by
            the other members of the team that wanted to keep adding features
            and MOAR POWAH, and didn't have a huge amount of choice because
             those features quickly grew beyond his own areas of knowledge.
       
              hakfoo wrote 14 hours 43 min ago:
              You might find the Sentinel 65X interesting in that the guy
              behind it basically said "the X16 is big and clunky and
              expensive, let's cut out that stuff". [1] It's not yet a
              deliverable product but watching the developers work on it has
              been an entertaining part of my doomscrolling diet.
              
   URI        [1]: https://github.com/studio8502/Sentinel-65X
       
              rzzzt wrote 2 days ago:
               The first choice was the Gameduino, also an FPGA-based solution.
               I have misplaced my bookmark for the documentation covering the
               previous hardware revision, but the current version 3X is MOAR
               POWAH just on its own; this seems to be a natural tendency: [1]
               Edit: found it! [2]
              
   URI        [1]: https://excamera.com/sphinx/gameduino3/index.html#about-...
   URI        [2]: https://excamera.com/sphinx/gameduino/index.html
       
                erik wrote 1 day ago:
                Modern retro computer designs run into the problem of
                 generating a video signal.  Ideally you'd have tile- and
                 sprite-based rendering.  And you'd like to support HDMI or at
                least VGA.  But there are no modern parts that offer this and
                building the functionality out of discrete components is
                impractical and unwieldy.
                
                 An FPGA is really just the right tool for solving the video
                problem.  Or some projects do it with a micro-controller.  But
                it's sort of too bad as it kind of undercuts the spirit of the
                 whole design.  If your video processor is orders of magnitude
                more powerful than the rest of the computer, then one starts to
                ask why not just implement the entire computer inside the video
                processor?
       
                  MenhirMike wrote 1 day ago:
                   It's one of the funny things about the Raspberry Pi Pico W:
                   the
                  Infineon CYW4343 has an integrated ARM Cortex-M3 CPU, so the
                  WiFi/BT chip is technically more advanced than the actual
                  RP2040 (which is a Cortex-M0+) and also has more built-in
                  ROM/RAM than what's on the Pico board for the RP2040 to use.
                  
                  And yeah, you can't really buy sprite-based video chips
                  anymore, and you don't even have to worry about stuff like
                  "Sprites per Scanline" because you can get a proper
                  framebuffer for essentially free - but now you might as well
                  go further and use one microprocessor to be the CPU, GPU, and
                  FM Synthesizer Sound Chip and "just" add the logic to
                  generate the actual video/audio signals.
       
              MenhirMike wrote 2 days ago:
               I think so too - it must have been a great learning experience
               for him, but for me, the idea of "The best C64-like computer
               that ever existed" died pretty quickly.
              
               He also ran into the same problem that I did when I tried
               something like that: sound chips. Building a system around a
               Yamaha FM synthesizer is perfect, but I too found that most of
               the chips out there are broken, fake, or both, and that no one
               makes them anymore. Which makes sense, because if you want a
               sound chip these days you use an AC97 or HD Audio codec and call
               it a day, but that goes against the spirit.
              
               I think that the spirit of hobby electronics is really found in
               FPGAs these days instead of rarer and rarer DIP parts. Which is a
               bit sad, but I guess that's just the passage of time. I wonder if
               that's how some people felt in the 70s when CPUs replaced many
               discrete layouts, or if they rejoiced and embraced it instead.
              
              I've given up trying to build a system on a breadboard and think
              that MiSTer is the modern equivalent of that.
       
                dragontamer wrote 2 days ago:
                 > I think that the spirit of hobby electronics is really found
                 in FPGAs these days instead of rarer and rarer DIP parts. Which
                 is a bit sad, but I guess that's just the passage of time. I
                 wonder if that's how some people felt in the 70s when CPUs
                 replaced many discrete layouts, or if they rejoiced and
                 embraced it instead.
                
                 Microcontrollers have taken over. When 8 kB SRAM, 20 MHz
                 microcontrollers exist below 50 cents, at minuscule 25 mm^2
                 chip sizes, drawing only 500 uA of current... there's very
                 little reason to use a collection of 30 chips to get equivalent
                 functionality.
                
                Except performance. If you need performance then bam, FPGA land
                comes in and Zynq just has too much performance at too low a
                cost (though not quite as low as the microcontroller gang).
                
                ----------
                
                Hobby Electronics is great now. You have so many usable parts
                 at very low costs. A lot of problems are "solved", yes, but
                that's a good thing. That means you can focus on solving your
                hobby problem rather than trying to invent a new display driver
                or something.
       
                  gnramires wrote 2 days ago:
                   Another advantage of hobby anything is that you can just
                   build and reinvent whatever you want. Sure, fast CPUs/MCUs
                   exist
                  now and can do whatever you want. But if you feel like
                  reinventing the wheel just for the sake of it, no one will
                  stop you![1]
                  
                  I do think some people that remember fondly the user
                  experience of those old machines might be better served by
                  using modern machines (like a raspberry pi or even a standard
                  pc) in a different way instead of trying to use old hardware.
                  That's from the good old Turing machine universality (you can
                  simulate practically any machine you like using newer
                  hardware, if what you're interested in is software). You can
                  even add artificial limitations like PICO-8 or TIC-80 does.
                  
                   See also uxn: [1] and (WIP) picotron: [2] I think there's a
                   general concept here of making 'Operating environments' that
                   are pleasant to work within (or have fun limitations), which
                   I think are more practical than a dedicated Operating System
                   optionally with a dedicated machine. Plus (unless you
                   particularly want to!) you don't need to worry about all the
                   complex parts of operating systems like network stacks,
                   drivers and such.

                   [1] Maybe we should call that Hobby universality (or
                   immortality?) :P If it's already been made/discovered, you
                   can always make it again just for fun.
                  
   URI            [1]: https://100r.co/site/uxn.html
   URI            [2]: https://www.lexaloffle.com/picotron.php
       
          verticalscaler wrote 2 days ago:
          True, can't think of much else this popular.
          
          He started posting videos again recently with some regularity after a
          lull. Audience is in the low hundreds of thousands. I assume fewer
          than 100k actually finish videos and fewer still do anything with it.
          
          Hobby electronics seems surprisingly small in this era.
       
            TillE wrote 1 day ago:
            Even if you're not much of a tinkerer, Ben Eater's videos are
            massively helpful if you want to truly understand how computers
            work. As long as you come in knowing the rudiments of digital
            electronics, just watching his stuff is a whole education in 8-bit
            computer design. You won't quite learn how modern computers work
            with their fancy caches and pipelines and such, but it's a really
            strong foundation to build on.
            
            I've built stuff with microcontrollers (partially aided by
            techniques learned here), but that was very purpose-driven and I'm
            not super interested in just messing around for fun.
       
              Bene592 wrote 22 hours 39 min ago:
              If you do want to learn about pipelines, or want a more powerful
              breadboard computer without 6502 you should check out James
              Sharman[1] on YouTube. He has VGA and audio output too.
              
   URI        [1]: https://youtube.com/@weirdboyjim
       
            hedora wrote 2 days ago:
            I wonder if there’s much overlap between people that watch
            YouTube to get deep technical content (instead of reading), and
            people that care about hobby electronics.
            
             I’m having trouble wrapping my head around how/why you’d use
             YouTube to present analog electrical engineering formulas and
             pin-out diagrams instead of using LaTeX or a diagram.
       
              jpc0 wrote 1 day ago:
               For some things there is a lot of nuance lost in just writing.
               The unknown unknowns.

               There have been a lot of times where I am showing someone new to
               my field something, and they stop me before I get to what I
               thought was the "educational" point to ask about what I just did.

               Video can portray that pretty well because the information is
               there for you to see; with a schematic or write-up, if the author
               didn't put it there, the information isn't there.
       
              robinsonb5 wrote 2 days ago:
              I consider YouTube (or rather, video in general) a fantastic
              platform for showcasing something cool, demonstrating what it can
              do, and even demonstrating how to drive a piece of software - but
              for actual technical learning I loathe the video format - it's so
              hard to skim, re-read, pause, recap and digest at your own speed.
              
              The best compromise seems to be webpages with readable technical
              info and animated video illustrations - such as the one posted
              here yesterday about how radio works.
       
          jsheard wrote 2 days ago:
          I've been looking into graphics on MCUs and was disappointed to learn
          that the little "NeoChrom" GPU they're putting on newer STM32 parts
          is completely undocumented. Historically they have been good about
          not putting black boxes in their chips, but I guess it's probably an
          IP block they've licensed from a third party.
       
            MrBuddyCasino wrote 1 day ago:
             That sucks. There are other MCUs with 2D graphics peripherals, e.g.
            the NXP i.MX line.
       
            unwind wrote 2 days ago:
            Agreed. It is so, so, so very disappointing. I was deeply surprised
             (not in a pleasant way) when I first opened up a Reference Manual
            for one of those chips and saw that the GPU chapter was, like, four
            pages. :(
       
              nick__m wrote 2 days ago:
               On the ST forum the company clearly said that they will only
               release documentation to some selected partners. That's sad.
       
            gchadwick wrote 2 days ago:
             The RP2040 is a great MCU for playing with graphics as it can
             bit-bang VGA and DVI/HDMI. There's some info on the DVI here: [1] I
             wrote a couple of articles on how to do bit-banged VGA on the
             RP2040 from scratch: [2] and [3] plus an intro to PIO [4]
            
   URI      [1]: https://github.com/Wren6991/PicoDVI
   URI      [2]: https://gregchadwick.co.uk/blog/playing-with-the-pico-pt5/
   URI      [3]: https://gregchadwick.co.uk/blog/playing-with-the-pico-pt6/
   URI      [4]: https://gregchadwick.co.uk/blog/playing-with-the-pico-pt4/
       
              CarVac wrote 2 days ago:
              I used "composite" (actually monochrome) video output software
              someone wrote on the RP2040 for an optional feature on the
              PhobGCC custom gamecube controller motherboard to allow easy
              calibration, configuration, and high-frequency input recording
              and graphing.
              
              Pictures of the output here:
              
   URI        [1]: https://github.com/PhobGCC/PhobGCC-doc/blob/main/For_Use...
       
              jsheard wrote 2 days ago:
              You can do something similar on STM32 parts that have an LCD
              controller, which can be abused to drive a VGA DAC or a DVI
              encoder chip. The LCD controller at least is fully documented,
              but many of their parts pair that with a small GPU, which would
              be an advantage over the GPU-less RP2040... if there were any
              public documentation at all for the GPU :(
       
        codedokode wrote 2 days ago:
        "UltraScale" in name assumes ultra price? FPGAs seem to be an expensive
        toy.
       
          PfhorSlayer wrote 2 days ago:
          In general, yes. However, the Kria series are amazingly good deals
          for what you get - a quite powerful Zynq US+ part and a dev board for
          like $350.
       
          nxobject wrote 2 days ago:
          It's worth mentioning that it's easy enough to find absurdly cheap
          (~$20) early-generation dev boards for Zynq FPGAs with embedded ARM
          cores on Aliexpress, shucked from obsolete Bitcoin miners [1].
          Interfaces include SD, Ethernet, 3 banks of GPIO.
          
   URI    [1]: https://github.com/xjtuecho/EBAZ4205
       
            thrtythreeforty wrote 2 days ago:
            Zynq is deeply annoying to work with, though.  Unfortunately the
            hard ARM core bootloads the FPGA fabric, rather than the other way
            around (or having the option to initialize both separately).  This
            means you have to muck with software on the target to update FPGA
            bitstreams.
       
              CamperBob2 wrote 2 days ago:
              Isn't it mostly just boilerplate code that does the FPGA
              configuration, though?
       
          varispeed wrote 2 days ago:
           Ages ago I bought a TinyFPGA, which is like £40, and I was able to
           synthesize a RISC-V CPU on it. It was fun.
       
          mattalex wrote 2 days ago:
           Not in the grand scheme of things: you can get FPGA dev boards for
          $50 that are already useable for this type of thing (you can go even
          lower, but those aren't really useable for "CPU like" operation and
          are closer to "a whole lot of logic gates in a single chip"). Of
          course the "industry grade" solutions pack significantly more of a
          punch, but they can also be had for <$500.
       
        sylware wrote 2 days ago:
        Hopefully their hardware programming model is going full hardware
        circular command/interrupt buffers (even for GPU register programming).
        
         That's how it is done on AMD GPUs; that said, I have no idea what the
         Nvidia hardware programming model looks like.
       
        wpwpwpw wrote 2 days ago:
        Excellent job. Would be amazing if this became an open source hardware
        project.
       
        spuz wrote 2 days ago:
         This looks like an incredible achievement. I'd love to see some photos
         of the physical device. I'm also slightly confused about which FPGA
         module is being used. The blog mentions the Xilinx Kria SoMs but if you
         follow the links to the specs of those modules, you see they have ARM
         SoCs rather than Xilinx FPGAs. The whole world of FPGAs is pretty
         unfamiliar to me so maybe I'm missing something.
        
   URI  [1]: https://www.amd.com/en/products/system-on-modules/kria/k26/k26...
       
          PfhorSlayer wrote 2 days ago:
          You're in luck! [1] As mentioned in the rest of this thread, the Kria
          SoMs are FPGA fabric with hardened ARM cores running the show. Beyond
          just being what was available (for oh so cheap, the Kria devboards
          are like $350!), these devices also include things like hardened
          DisplayPort IP attached to the ARM cores allowing me to offload
          things like video output and audio to the firmware. A previous
          version of this project was running on a Zynq 7020, for which I
          needed to write my own HDMI stuff that, while not super complicated,
          takes up a fair amount of logic and also gets way more complex if it
          needs to be configurable.
          
   URI    [1]: https://imgur.com/a/BE0h9cZ
       
          chiral-anomaly wrote 2 days ago:
          Xilinx doesn't mention the exact FPGA p/n used in the Kria SoMs.
          However according to their public specs they appear to match [1] the
          ZU3EG-UBVA530-2L and ZU5EV-SFVC784-2L devices, with the latter being
          the only one featuring PCIe support.
          
          Designing and bringing-up the FPGA board as described in the blog
          post is already a high bar to clear. I hope the author will at some
          point publish schematics and sources.
          
   URI    [1]: https://docs.amd.com/v/u/en-US/zynq-ultrascale-plus-product-...
       
          crote wrote 2 days ago:
           > you see they have ARM SoCs rather than Xilinx FPGAs
          
           It's a mixed chip: FPGA and traditional SoC glued together. This means
          you don't have a softcore MCU taking up precious FPGA resources just
          to do some basic management tasks.
       
            chrsw wrote 2 days ago:
            I didn't see any mention of what the software on the Zynq's ARM
            core is doing, which made me wonder why use Zynq at all.
       
              PfhorSlayer wrote 1 day ago:
              The hardened DisplayPort IP is connected to the ARM cores, and
              requires a significant amount of configuration and setup.
              FuryGpu's firmware primarily handles interfacing with that block:
              setting up descriptor sets to DMA video frame and audio data from
              memory (where the GPU has written it for video, or where the host
              has DMA'd it for audio), responding to requests to reconfigure
              things for different resolutions, etc. There's also a small
              command processor there that lets me do various things that
              building out hardware for doesn't make sense - moving memory
              around with the hardened DMA peripheral, setting up memory
              buffers used internally by the GPU, etc. If I ever need to expose
              a VGA interface in order to have motherboards treat this as a
              primary graphics output device during boot, I'd also be handling
              all of that in the firmware.
       
            spuz wrote 2 days ago:
            Ah that makes sense. It's slightly ironic then that the ARM SoC
            includes a Mali GPU which presumably easily outperforms what can be
             achieved with the FPGA.
       
        iAkashPaul wrote 2 days ago:
        FPGAs for native FP4 will change the entire landscape
       
          imtringued wrote 1 day ago:
          How? NPUs are going to be included in every PC in 2025. The only
          differentiators will be how much SRAM and memory bandwidth you have
          or whether you use processing in memory or not. AMD is already
          shipping APUs with 16 TOPS or 4 TFLOPS (bfloat16) and that is more
          than enough for inference considering the limited memory bandwidth.
          Strix Halo will have around 12 TFLOPS (bfloat16) and four memory
          channels.
          
          llama.cpp already supports 4 bit quantization. They unpack the
          quantization back to bfloat16 at runtime for better accuracy. The
          best use case for an FPGA I have seen so far was to pair it with SK
          Hynix's AI GDDR and even that could be replaced by an even cheaper
          inference chip specializing in multi board communication and as many
          memory channels as possible.
       
          blacklion wrote 2 days ago:
          Entire landscape of open graphic chips?
          
          Not every GPU should be used to train or infer so-called AI.
          
          Please, stop, we need some hardware to put images on the screens.
       
          Y_Y wrote 2 days ago:
          Four-bit floats are not as useful as Nvidia would have you believe.
          Like structured sparsity it's mainly a trick to make newer-gen cards
          look faster in the absence of an improvement in the underlying tech.
          If you're using it for NN inference you have to carefully tune the
          weights to get good accuracy and it offers nothing over fixed-point.
       
            imtringued wrote 1 day ago:
            The actual problem is that nobody uses these low precision floats
            for training their models. When you do quantization you are merely
            compressing the weights to minimize memory usage and to use memory
            bandwidth more efficiently. You still have to run the model at the
            original precision for the calculations so nobody gives a damn
            about the low precision floats for now.
       
              Y_Y wrote 6 hours 59 min ago:
              That's not entirely true. Current-gen Nvidia hardware can use fp8
              and newly announced Blackwell can do fp4. Lots of existing
              specialized inference hardware uses int8 and some int4.
              
               You're right that low-precision training still doesn't seem to
              work, presumably because you lose the smoothness required for
              SGD-type optimization.
       
          jsheard wrote 2 days ago:
          Very briefly, until someone makes an ASIC that does the same thing
          and FPGAs are relegated to niche use-cases once again.
          
          FPGAs only make long-term sense in applications that are so
          low-volume that it's not worth spinning an ASIC for them.
       
            iAkashPaul wrote 2 days ago:
            Absolutely
       
          luma wrote 2 days ago:
          How so?
       
            CamperBob2 wrote 2 days ago:
             4-bit values (or 6-bit values, nowadays) are interesting
            because they're small enough to address a single LUT, which is the
            lowest-level atomic element of an FPGA.  That gives them major
            advantages in the timing and resource-usage departments.
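
             As a toy illustration (a hypothetical sketch, assuming the common
             E2M1 "fp4" encoding): decoding an fp4 value is just a 16-entry
             table, so each output bit is a function of only 4 inputs and fits
             comfortably within a single LUT:

               module fp4_decode (
                   input  wire [3:0]        code,   // {sign, exp[1:0], mantissa} (E2M1)
                   output reg  signed [4:0] val_x2  // decoded value times 2, exact
               );
                   always @* begin
                       case (code[2:0])  // magnitude, scaled by 2
                           3'b000: val_x2 = 5'sd0;   // 0.0
                           3'b001: val_x2 = 5'sd1;   // 0.5 (subnormal)
                           3'b010: val_x2 = 5'sd2;   // 1.0
                           3'b011: val_x2 = 5'sd3;   // 1.5
                           3'b100: val_x2 = 5'sd4;   // 2.0
                           3'b101: val_x2 = 5'sd6;   // 3.0
                           3'b110: val_x2 = 5'sd8;   // 4.0
                           3'b111: val_x2 = 5'sd12;  // 6.0
                       endcase
                       if (code[3])
                           val_x2 = -val_x2;  // apply the sign bit
                   end
               endmodule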
       
            iAkashPaul wrote 2 days ago:
             Reduced memory requirements and dropping higher-precision IP
             blocks, for starters.
       
        gchadwick wrote 2 days ago:
         Cool! I found the hello blog post here illuminating for understanding
         the creator's intentions: [1] As I read it, it's first and foremost a
         fun hobby project for them, and it looks like they're intending to
         write a whole bunch more about how they built it.
        
         It's certainly an impressive piece of work, in particular as they've
         got the full stack working: a Windows driver implementing a custom
         graphics API and then Quake running on top of that. A shame they've not
         got some DX/GL support, but I can certainly understand why they went
         the custom API route.
        
        I wonder if they'll open source the design?
        
   URI  [1]: https://www.furygpu.com/blog/hello
       
          PfhorSlayer wrote 2 days ago:
          I'm in the process of actually trying to work out what would be
           feasible performance-wise if I were to spend the considerable effort
           to add the features required for base D3D support. It's not looking
           good, unfortunately. Beyond just "shaders", there are a significant
           number of other requirements that even just the OS's window manager
          needs to function at all. It's all built up on 20+ years of evolving
          tech and for the normal players in this space (AMD, Nvidia, Intel,
          Imagination, etc.) it's always been an iterative process.
       
        jamesu wrote 2 days ago:
         Similarly there is this: [1] Honestly, it would be neat if someone
         made an FPGA GPU which had a shader pipeline.
        
   URI  [1]: https://github.com/ToNi3141/Rasterix
       
          danbruc wrote 2 days ago:
          If you are going to that effort, you might also want a decent
          resolution. Say we aim for one megapixel (720p) and 30 frames per
          second, then we have to calculate 27.7 megapixel per second. If you
          get your FPGA to run at 500 MHz, that gives you 18 clock cycles per
          pixel. So you would probably want something like 100 cores keeping in
           mind that we also have to run vertex shaders. We also need quick
           access to a sizable amount of memory, and I am not sure if one can
           get away with integer or fixed-point arithmetic, or whether floating
           point is pretty much necessary. Another
          complication that I would expect is that it is probably much easier
          to build a long execution pipeline if you are implementing a fixed
          function pipeline as compared to a programmable processor. Things
          like out-of-order execution are probably best off-loaded to the
          compiler in order to keep the design simpler and more compact.
          
          So my guess is that it would be quite challenging to implement a
          modern GPU in an affordable FPGA if you want more than a proof of
          concept.
       
            PfhorSlayer wrote 2 days ago:
            You've nailed the problem directly on the head. For hitting 60Hz in
            FuryGpu, I actually render at 640x360 and then pixel-double (well,
            pixel->quad) the output to the full 720p. Even with my GPU cores
            running at 400MHz and the texture units at 480MHz with fully
            fixed-function pipelines, it can still struggle to keep up at
            times.
            
            I do not doubt that a shader core could be built, but I have
            reservations about the ability to run it fast enough or have as
            many of them as would be needed to get similar performance out of
            them. FuryGpu does its front-end (everything up through primitive
            assembly) in full fp32. Because that's just a simple fixed
            modelview-projection matrix transform it can be done relatively
            quickly, but having every single vertex/pixel able to run full fp32
            shader instructions requires the ability to cover instruction
            latency with additional data sets - it gets complicated, fast!
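
             For scale: 640x360 is 230,400 pixels, so 60 Hz works out to about
             13.8 Mpix/s of output; at 400 MHz that is a budget of roughly 29
             cycles per displayed pixel, before accounting for any overdraw.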
       
            d_tr wrote 2 days ago:
            There's a new board by Trenz with a Versal chip which can do 440
            GFLOPS just with the DSP58 slices (the lowest speed grade) and it
            costs under 1000 Euros, but you also need to buy a Vivado license
            currently.
            
            Cheaper boards are definitely possible since there are smaller
            parts in that family, but they need to offer support for some of
            them in the free version of Vivado...
       
          actionfromafar wrote 2 days ago:
          How good would a Ryzen with 32 cores be if it did just graphics?
       
            __alexs wrote 2 days ago:
            15 fps on an oldish Epyc 64 core
            
   URI      [1]: https://www.youtube.com/watch?v=2tn0bZcQf0E
       
            immibis wrote 2 days ago:
            Wasn't Intel Larrabee something like that? Get a bunch of dumb x86
            cores together and tell them to do graphics?
       
              erik wrote 1 day ago:
              Larrabee was mostly x86 cores, but it did have sampling/texturing
              hardware because it's way more efficient to do those particular
               things in the 3D pipeline with dedicated hardware.
       
              actionfromafar wrote 2 days ago:
              I'm so sad Larrabee or similar things never took off. No, it
              might not have benchmarked well against contemporary graphics
               cards, but I think these matrices of x86 cores could have been
               put to great use for cool things not necessarily related to
               graphics.
       
                fancyfredbot wrote 2 days ago:
                 Intel launched Larrabee as Xeon Phi for non-graphics purposes.
                Turns out it wasn't especially good at those either. You can
                still pick one up on eBay today for not very much.
       
                  bee_rider wrote 2 days ago:
                  Probably not aided by the fact that conventional Xeon core
                  counts were sneaking up on them—not quite caught up, but
                  anybody could see the trajectory—and offered a much more
                  familiar environment.
       
                    actionfromafar wrote 1 day ago:
                    Yes, I agree. Still unfortunate. I think the concept was
                    very promising. But Intel had no appetite for burning money
                    on it to see where it would go in the long run.
       
                  actionfromafar wrote 2 days ago:
                  That's where we have to agree to (potentially) disagree. I
                  lament that these or similar designs didn't last longer in
                  the market, so people could learn how to harness them.
                  
                   Imagine, for instance, hard real-time tasks, each task
                   running on its own separate core.
       
                    fancyfredbot wrote 1 day ago:
                     I think Intel have similar designs? The Xeon Phi had 60
                     cores, and their high-core-count CPUs have 56. The GPU Max
                     1550 has 128 low-power Xe cores.
       
                    rjsw wrote 2 days ago:
                    I think Intel should have made more effort to get cheap
                     Larrabee dev boards onto the market; they could have been
                    using chips that didn't run at full speed or with too many
                    broken cores to sell at full price.
       
                  Y_Y wrote 2 days ago:
                   The novelty of SSHing into a PCI card is nice though. I
                   remember trying to use them at an HPC cluster: all the
                   convenience of wrangling GPUs but at a fraction of the
                   performance.
       
            tux3 wrote 2 days ago:
            You can run Crysis in software rendering on a high core count AMD
            CPU.
            
             It's a terrible use of the hardware and the performance is far from
            stellar, but you can!
       
        snvzz wrote 2 days ago:
         The pipeline seems retro, but far better than nothing.
        
        There's no open hardware GPU to speak of. Depending on license (can't
        find information?), this could be the first, and a starting point for
        more.
       
          userbinator wrote 1 day ago:
           Depends what you mean by "GPU". [1] is an MDA/CGA-compatible
           adapter, [2] is a VGA core, and [3] is a whole IBM PC-compatible SoC,
           including a VGA.
          
   URI    [1]: https://github.com/schlae/graphics-gremlin
   URI    [2]: https://github.com/OmarMongy/VGA
   URI    [3]: https://github.com/archlabo/Frix
       
          Hazematman wrote 1 day ago:
           There's also Nyuzi [1], which is more GPGPU-focused, but the author
           also experimented with having it do 3D graphics.
          
   URI    [1]: https://github.com/jbush001/NyuziProcessor
       
          mips_r4300i wrote 2 days ago:
           Number Nine's Ticket2Ride is a fixed-function GPU from the late 90s
           that was completely open-sourced under the GPL.
       
          monocasa wrote 2 days ago:
          > There's no open hardware GPU to speak of. Depending on license
          (can't find information?), this could be the first, and a starting
          point for more.
          
           There's this, which is about the same kind of GPU:
          
   URI    [1]: https://github.com/asicguy/gplgpu
       
            the_panopticon wrote 1 day ago:
             What about projects like [1]?
            
   URI      [1]: https://github.com/VerticalResearchGroup/miaow
       
              monocasa wrote 23 hours 25 min ago:
              That's purely GPGPU and doesn't contain hardware like the
              rasterizers or texture samplers.
              
              Cool project though.
       
          crote wrote 2 days ago:
          It all depends on your definition of "open", of course. As far as I
          know there is no open-source toolchain for any remotely-recent FPGA,
           so you're still stuck with proprietary (paid?) tooling to actually
          modify it. You're pretty much out of luck if you need more than an
          iCE40 UP5k.
       
            bajsejohannes wrote 2 days ago:
             The up-and-coming GateMate seems interesting to me. They are
             leaning heavily on open-source tooling.

             chip: [1] board: [2]
            
   URI      [1]: https://colognechip.com/programmable-logic/gatemate/
   URI      [2]: https://www.olimex.com/Products/FPGA/GateMate/GateMateA1-E...
       
            snvzz wrote 2 days ago:
            >You're pretty much out of luck if you need more than an iCE40
            UP5k.
            
            Lattice ECP5 (which goes up to 85k LUT or so?) and Nexus have more
            than decent support.
            
            Gowin FPGAs are supported via project apicula up to 20k LUT models.
            Some new models go above 200k LUT so there's hope there.
       
              robinsonb5 wrote 2 days ago:
              Yeah I've used yosys / nextpnr on an ECP5-85 with great results -
              it's pretty mature and dependable now.
       
            rwmj wrote 2 days ago:
            At least some Xilinx 7-series FPGAs have been reverse engineered:
            
   URI      [1]: https://yosyshq.readthedocs.io/projects/yosys/en/latest/cm...
       
              robinsonb5 wrote 2 days ago:
              There's been some interesting recent work to get the QMTech
              Kintex7-325 board (among others) supported under yosys/nextpnr -
              [1] It works well enough now to build a RISC-V SoC capable of
              running Linux.
              
   URI        [1]: https://github.com/openXC7
       
       