_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
   URI   Open source RISC-V GPGPU
       
       
        zackmorris wrote 2 days ago:
        I want the opposite of this - a multicore CPU that runs on GPU or FPGA.
        Vortex looks really cool, but if they jump over a level of abstraction
        by only offering an OpenCL interface instead of access to the
        underlying cores, then I'm afraid I'm not interested.
        
        I just need a chip that can run at least 256 streams of execution, each
        with their own local memory (virtualized to appear contiguous). This
        would initially be for running something like Docker, but would
        eventually run a concurrent version of something like GNU Octave
        (Matlab), or languages like Julia that at least make an attempt to
        self-parallelize. If there is a way to do this with Vortex, I'm all
        ears.
        
        I've gone into this at length in my previous comments. The problem is
        that everyone jumped on the SIMD bandwagon when what we really wanted
         was MIMD. SIMD limits us to a very narrow niche of problems, like
         neural nets and rasterization, and it keeps us from exploring the
         emergent behavior of large stochastic networks running things like
         genetic algorithms, or elegant/simple algorithms like ray tracing.
         That's not handwaving; I'm being very specific here, and I feel that
         this domination of the market by a handful of profit chasers like
         Nvidia has set computing back at least 20 years.
       
          JonChesterfield wrote 2 days ago:
          I think this is available now. The waves/wavefronts on a GPU run
           independently. Communication between them isn't great; keeping them
           independent is better.
          
          Given chips from a couple of years ago have ~64 compute units, each
          running ~32 wavefronts, your 256 target looks fine. It's one block of
          contiguous memory, but using it as 256 separate blocks would work
          great.
          
           I don't know of a ready-made language targeting the GPU like that.
       
          klelatti wrote 2 days ago:
          I may be missing something here but what do you mean by a CPU that
          runs on a GPU?
          
          Also how does "256 streams of execution, each with their own local
          memory (virtualized to appear contiguous)" differ in practice from
          one of the recent CPUs with lots of cores - e.g. AMD / AWS Arm?
       
            zackmorris wrote 1 day ago:
            Well, this all goes back to when I was heavily into C++, assembly
            and blitters in the mid to late 90s when I was trying to run a
            shareware game business. I realized almost immediately that the
            real bottleneck in games is memory bandwidth, not processing power.
            This was right at the time that Quake III came out and everyone was
            trying to get a Voodoo2, I think it was? CPUs with FPUs had only
            gone mainstream maybe 5 years before that, and people were still
            arguing about Pentium vs 486 DX4. I was on Mac, but I don't think I
            even had a PowerPC yet.
            
            Then everyone got video cards and CPU performance stopped improving
            almost overnight. Sure, we got 200 MHz Pentium IIs, and then Intel
             jumped warp speed into 1 GHz and then 2 GHz and then 3 GHz... but
             single-threaded performance wasn't any faster, and even today is
            only maybe 3 times faster than it was then, per clock cycle. What
            really happened is that all of the chip area went to branch
            prediction and caching.
            
            When chips went from a few million transistors to a billion, I
            started asking why we couldn't just put dozens or hundreds of the
            old CPU cores on the new chips. As we all saw though, nobody
            listened or cared about that. So today we have behemoth chips that
            still choke when the web browser has a lot of tabs open.
            
            Chips today have maybe 8 or 16 cores, and that's great. But it's 2
            orders of magnitude less than the transistor budget could support.
            Apple's M1 is loosely trying to do what I'm asking. But it's making
            the mistake of having all of these proprietary/dedicated cores for
            SIMD stuff. I would scrap all of that, and go with a 2D array of
            general-purpose cores, each with their own local memories,
            communicating using web metaphors like content-addressable memory.
            
             In fairness, I think the reason that real multicore CPUs never
             caught on is that we didn't have the languages to utilize them.
            But today we have Matlab and various Lisps and higher order methods
            that auto-parallelize loops by treating them as transformations on
            arrays. All of our languages should have been auto-parallelized by
            now anyway. And not with SIMD optimization magic, I mean by
            statically analyzing code and converting it all first into higher
            order methods, then optimizing that intermediate code (I-code) so
            that the block copies are spread over multiple cores and memories.
             I can't remember the term for this; it's basically divide and
             conquer, for example if fork/join scope were limited to a single
             function by the runtime. Scatter-gather and map-reduce are other
             terms for the same idea.
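             
             Roughly the shape I have in mind, as a toy C/pthreads sketch (the
             thread count, names and the function being mapped are made up for
             illustration): the loop body becomes a pure function mapped over
             an array, slices get scattered across workers, and the join
             happens inside a single function.
             
               /* Toy sketch: a loop expressed as a map over an array,
                  split across a few worker threads -- the scatter/gather
                  shape described above. */
               #include <pthread.h>
               #include <stdio.h>
               
               #define N_ELEMS   1024
               #define N_WORKERS 4        /* stand-in for "256+ cores" */
               
               static double in[N_ELEMS], out[N_ELEMS];
               
               /* the old loop body, now a pure function */
               static double f(double x) { return x * x + 1.0; }
               
               struct slice { int begin, end; };
               
               static void *map_worker(void *arg)
               {
                   struct slice *s = arg;
                   for (int i = s->begin; i < s->end; i++)
                       out[i] = f(in[i]);   /* each worker owns a chunk */
                   return NULL;
               }
               
               int main(void)
               {
                   pthread_t tid[N_WORKERS];
                   struct slice part[N_WORKERS];
               
                   for (int i = 0; i < N_ELEMS; i++)
                       in[i] = (double)i;
               
                   /* scatter: hand each worker a contiguous slice */
                   for (int w = 0; w < N_WORKERS; w++) {
                       part[w].begin = w * (N_ELEMS / N_WORKERS);
                       part[w].end   = (w + 1) * (N_ELEMS / N_WORKERS);
                       pthread_create(&tid[w], NULL, map_worker, &part[w]);
                   }
               
                   /* gather: the fork/join scope ends in this function */
                   for (int w = 0; w < N_WORKERS; w++)
                       pthread_join(tid[w], NULL);
               
                   printf("out[10] = %f\n", out[10]);
                   return 0;
               }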
            
            So right now we have to deal with promises and async and other
            patterns (I consider patterns an anti-pattern) when we could just
             be using an ordinary language like JavaScript or C,
            auto-parallelized to run on 256+ cores with something like
            terabytes per second of bandwidth, running many thousands of times
            faster than computers today, for far less effort because it appears
            as a single thread of execution. Then OpenCL or OpenGL or anything
            else could run like any other library above that, for people that
            prefer a higher-level interface.
       
              klelatti wrote 12 hours 11 min ago:
              Hi, Thanks for the extensive reply - a lot to digest and reflect
              on!
              
               First of all I think I broadly agree with the direction of your
              argument. In the early 2000s the decision was made to focus on
              single core performance and SIMD extensions rather than embrace a
              massively multicore future. I guess Intel got burned by Itanium
              and decided that 100% compatibility with existing software was
              essential.
              
              I think that road has run out now. Single core performance
               improvements have slowed and big SIMD is dying (hello AVX-512!).
              Desktop core counts are stuck but on the server you can use 128
              core EC2 instances. How long before this appears in a box on your
              desk?
              
              Massively multicore GPUs have taken over ML but having tried to
              use GPUs for general purpose computing there are huge issues - eg
              the overhead in transferring data and limited GPU memory sizes.
               The good news is that with the right tools you can, say, use
               OpenCL and write code that runs on both CPU and GPU and takes
               advantage of increasing core counts on both.
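               
               For instance, a rough (untested) C sketch of what I mean by
               writing once for both: the host code asks OpenCL for a GPU and
               falls back to a CPU device, and the same kernels would then run
               on whichever device was found.
               
                 /* Sketch only: pick a GPU if present, else fall back to
                    the CPU -- the same kernel source runs on either. */
                 #include <CL/cl.h>
                 #include <stdio.h>
                 
                 int main(void)
                 {
                     cl_platform_id platform;
                     cl_device_id device;
                     cl_int err;
                 
                     clGetPlatformIDs(1, &platform, NULL);
                 
                     err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU,
                                          1, &device, NULL);
                     if (err != CL_SUCCESS)   /* no GPU? use the CPU */
                         err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU,
                                              1, &device, NULL);
                     if (err != CL_SUCCESS) {
                         fprintf(stderr, "no OpenCL device found\n");
                         return 1;
                     }
                 
                     cl_context ctx = clCreateContext(NULL, 1, &device,
                                                      NULL, NULL, &err);
                     /* ... build the same kernel source and enqueue it
                        on whichever device we got ... */
                     clReleaseContext(ctx);
                     return 0;
                 }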
              
              So I think we’re on the cusp of a change: much higher CPU core
              counts and developers having the tools to make use of those
              cores.
              
               A couple of postscripts:
              
              It will be interesting to see whether someone tries putting lots
               of simple in-order cores on a single die (I think there are
               early RISC-V attempts at this).
              
              The transputer in the 1980s was an early experiment in massively
              multicore CPU systems.
              
              The Arm team knew early on that memory bandwidth was key and
              focused on that with the Arm1 (and were rejected by Intel when
              they asked for a higher bandwidth x86 core). The rest is history!
       
        d_tr wrote 2 days ago:
        The two supported FPGA families are a blessing for this kind of
        project, since they have hardware floating-point units. Unfortunately
        they are quite expensive, like the Xilinx ones with this feature...
       
        R0b0t1 wrote 2 days ago:
        I've tried looking up the hardware they run on. Anyone have a price?
       
          detaro wrote 2 days ago:
          Expensive. Exact parts aren't clear, but hundreds of dollars for a
          single chip and 5k+ for a devkit from a quick look?
          
          But running on FPGA is really only the testing stage for putting it
          in an ASIC if something like this wants to be competitive in any way.
       
            R0b0t1 wrote 2 days ago:
            I thought so. Hundreds for the chip isn't insane (depending on how
            many) but $5k for the dev kit, oof.
       
              detaro wrote 2 days ago:
               Yeah. From what I know, FPGA pricing is very weird in that
               prices for single units are far worse than if you buy a lot,
               even more so than for other chips.
       
        NotCamelCase wrote 2 days ago:
        This is an amazing project considering the scope of work required on
        both sides of the aisle -- HW and SW.
        
         I find the choice of RISC-V pretty interesting for this use case, as
         it's a fixed-size ISA and there is a significant amount of auxiliary
         data usually passed from drivers to HW in typical GPU settings, even
         for GPGPU scenarios alone. If you look at one of their papers, it
         shows how they pass extra texture parameters via CSRs. I think this
         might become a bottleneck and limiting factor in the design for
         future expansions. I am currently doing similar work (>10x smaller in
         comparison) on a more limited feature set, so I am really curious how
         it'll turn out.
       
          zozbot234 wrote 2 days ago:
          RISC-V is not "fixed size", the encoding has room for larger
          instructions (48-bit, 64-bit or more).
       
            NotCamelCase wrote 2 days ago:
             I guess you're referring to variable-length encoding support?
             It's fixed in the sense that they only implement the RV32IMF
             subset here. Even then, code density may be a source of
             bottlenecks along the way.
       
        ksec wrote 2 days ago:
         Nice. Instead of trying to tackle the CPU space, RISC-V should really
         be doing more work in the GPGPU space with open source drivers.
         
         Current GPUs are the biggest black box and mystery in modern
         computing.
       
          sitkack wrote 2 days ago:
           RISC-V (with no vectors) was the base ISA built to support the real
           goal of making vector processors; it was supposed to be a short side
           quest. It took much longer than expected, but was 1000% worth it.
           
           RVV (the RISC-V Vector Extension) is the real coup and ultimately
           what the base ISA is there to support. [1] [2] GPUs might be complex
           beasts, but ultimately it is lots of FMAs (fused multiply-adds) [3]
           that do most of our calculations.
          
   URI    [1]: https://youtu.be/V7fuE1yXUxk?t=104
   URI    [2]: https://www.youtube.com/watch?v=oTaOd8qr53U
   URI    [3]: https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_op...
       
        chalcolithic wrote 2 days ago:
         Wow! Just add NaN-boxing support (for JavaScript and possibly other
         dynamic languages) and it'll be the CPU I dreamed about.
       
          sitkack wrote 2 days ago:
           For those unfamiliar with NaN-boxing [1]:
           
           > One use is NaN-boxing, which is where you stick all the other
           > non-floating point values in a language + their type information
           > into the payload of NaNs. It’s a beautiful hack.
          
   URI    [1]: https://anniecherkaev.com/the-secret-life-of-nan
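           
           A tiny illustration in C of the trick (my own sketch, not from the
           article): stuff a 32-bit integer plus a type tag into the payload
           bits of a quiet NaN, and the result still travels around as an
           ordinary double.
           
             #include <stdint.h>
             #include <stdio.h>
             #include <string.h>
             
             /* all-ones exponent + quiet bit => a quiet NaN */
             #define QNAN    0x7ff8000000000000ULL
             /* one tag bit inside the NaN payload */
             #define TAG_INT 0x0001000000000000ULL
             
             static double box_int(int32_t v)
             {
                 uint64_t bits = QNAN | TAG_INT | (uint32_t)v;
                 double d;
                 memcpy(&d, &bits, sizeof d);  /* bit copy, no convert */
                 return d;
             }
             
             static int is_boxed_int(double d)
             {
                 uint64_t bits;
                 memcpy(&bits, &d, sizeof bits);
                 return (bits & (QNAN | TAG_INT)) == (QNAN | TAG_INT);
             }
             
             static int32_t unbox_int(double d)
             {
                 uint64_t bits;
                 memcpy(&bits, &d, sizeof bits);
                 return (int32_t)(uint32_t)bits;
             }
             
             int main(void)
             {
                 double v = box_int(-42);  /* still just a double... */
                 if (is_boxed_int(v))      /* ...but carries an int  */
                     printf("boxed value: %d\n", unbox_int(v));
                 return 0;
             }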
       
          nynx wrote 2 days ago:
          It’s a GPU.
       
            chalcolithic wrote 2 days ago:
             Yes, and I wanted a GPU-style CPU that could handle all the tasks
             in the system so that no host CPU is necessary.
       
        pabs3 wrote 2 days ago:
         OpenCL seems to be kind of dying (e.g. Blender abandoned it); I
         wonder what is going to replace it.
       
          my123 wrote 2 days ago:
          CUDA is what ended up replacing it, or rather, OpenCL had always
          failed to make a dent over the long term.
          
          (with AMD ROCm being a CUDA API clone, without the PTX layer)
       
            DeathArrow wrote 2 days ago:
            But is there anyone using ROCm in production? Is ROCm up to par
            with CUDA?
       
              my123 wrote 2 days ago:
              No. It isn’t up to par. But that’s AMD’s problem.
              
               No standard spec would solve a lack of software development
               investment by a hardware vendor, especially for a device as
               complex as a GPU.
              
              (meanwhile, on the Intel side, oneAPI looks to be very
              serviceable, but has a problem for now: where is the fast
              hardware to run it on?)
       
        unsigner wrote 2 days ago:
        We should really have another word for “chip that runs OpenCL but has
        no rasterizer”.
        
        I see the title was edited to call it a “GPGPU”, or a
         “general-purpose GPU”, but that’s not a thing; GPGPU was an early
        moniker for when people tried to do non-graphics work on GPUs many
        years ago, but it was a word for techniques, never for a specific type
        of hardware. Plus it feels to me that “general purpose” should be
        something more than a GPU, while this is strictly less.
       
          zbendefy wrote 2 days ago:
           OpenCL has a category called CL_DEVICE_TYPE_ACCELERATOR for that,
           so something like 'Accelerator' seems to fit.
       
          raphlinus wrote 2 days ago:
          I don't really agree. I think it's completely valid to explore a GPU
           architecture in which rasterization is done in software, with
           perhaps a bit of support in the ISA. That's what they've done here,
          and they do demonstrate running OpenGL ES.
          
          The value of this approach depends on the workload. If it's mostly
          rasterizing large triangles with simple shaders, then a hardware
          rasterizer buys you a lot. However, as triangles get smaller, a pure
          software rasterizer can win (as demonstrated by Nanite). And as you
          spend more time in shaders, the relative amount of overhead from
          software rasterization decreases; this was shown in the cudaraster
          paper[1].
          
          Overall, if we can get simpler hardware with more of a focus on
          compute power, I think that's a good thing, and I think it's
          completely fine to call that a GPU.
          
   URI    [1]: https://research.nvidia.com/publication/high-performance-sof...
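           
           For a feel of what rasterizing in software means on such a chip,
           here's a rough C sketch (mine, heavily simplified: no clipping,
           fill rules, or perspective) of the per-pixel edge-function loop a
           compute core would run instead of fixed-function hardware:
           
             #include <stdio.h>
             
             /* signed area test: same sign for all points on one
                side of the edge a->b */
             static int edge(int ax, int ay, int bx, int by,
                             int px, int py)
             {
                 return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
             }
             
             /* loops over the whole tile here; a real rasterizer
                would clip to the triangle's bounding box */
             static void fill_tri(char *buf, int w, int h,
                                  int x0, int y0, int x1, int y1,
                                  int x2, int y2)
             {
                 for (int y = 0; y < h; y++)
                     for (int x = 0; x < w; x++) {
                         int e0 = edge(x0, y0, x1, y1, x, y);
                         int e1 = edge(x1, y1, x2, y2, x, y);
                         int e2 = edge(x2, y2, x0, y0, x, y);
                         if (e0 >= 0 && e1 >= 0 && e2 >= 0)
                             buf[y * w + x] = '#';
                     }
             }
             
             int main(void)
             {
                 enum { W = 32, H = 16 };
                 char buf[W * H];
                 for (int i = 0; i < W * H; i++) buf[i] = '.';
             
                 /* vertex order chosen so the edge values are
                    positive inside the triangle */
                 fill_tri(buf, W, H, 2, 14, 16, 1, 30, 12);
             
                 for (int y = 0; y < H; y++)
                     printf("%.*s\n", W, &buf[y * W]);
                 return 0;
             }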
       
            justsid wrote 1 day ago:
            This is essentially where Sony was trying to go with their Cell
            architecture in the PS3. Only at the very end did they realize that
            they actually needed a GPU that can do rasterization in hardware.
            In fact, a lot of games actually did graphics workloads on the SPEs
            to help out the pretty weak GPU. The concept can definitely work,
            especially if the driver takes care of all the programmable bits
            and exposes a more classical graphics pipeline to the host.
       
          avianes wrote 2 days ago:
          That term is "SIMT architecture."
          
           Modern GPUs (or GPGPUs) are based on the SIMT programming model,
           which requires an SIMT architecture.
       
            zozbot234 wrote 2 days ago:
            "SIMT" is not an architecture, it's just a programming model that
            ultimately boils down to wide SIMD instructions with conditional
            execution.  Add that to a barrel processor that can hide memory
            latency across a sizeable amount of hardware threads, and you've
            got the basics of a GPU "core".
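             
             As a rough C sketch of that lowering (lane count and values made
             up): a per-lane branch becomes a predicate mask, both sides of
             the branch execute for every lane, and the mask selects what each
             lane actually commits.
             
               #include <stdio.h>
               
               #define LANES 8
               
               int main(void)
               {
                   int x[LANES] = {3, -1, 4, -5, 9, -2, 6, -7};
                   int out[LANES];
                   int mask[LANES];
               
                   /* "if (x < 0)" becomes a per-lane predicate */
                   for (int i = 0; i < LANES; i++)
                       mask[i] = (x[i] < 0);
               
                   /* both sides run for all lanes; the mask picks
                      which result each lane keeps */
                   for (int i = 0; i < LANES; i++) {
                       int then_val = -x[i];     /* when x < 0 */
                       int else_val = x[i] * 2;  /* otherwise  */
                       out[i] = mask[i] ? then_val : else_val;
                   }
               
                   for (int i = 0; i < LANES; i++)
                       printf("%d ", out[i]);
                   printf("\n");
                   return 0;
               }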
       
              avianes wrote 2 days ago:
              SIMT is a programming model, you are right.
              
              But in the literature the term "SIMT architecture" is used to
              describe architectures optimized for the SIMT programming model.
              
               Just search for "SIMT architecture" on Google Scholar or any
               other search engine dedicated to academic research, and you will
               see that it's indeed a term used for this kind of architecture.
       
          nine_k wrote 2 days ago:
          Vector processors? Follow the early Cray nomenclature.
       
            avianes wrote 2 days ago:
             The terminology "vector processor" refers to a completely
             different type of architecture. Using it for an SIMT architecture
             would be confusing.
       
              dahart wrote 2 days ago:
              What definition of vector processor are you thinking of?
              Wikipedia’s definition appears to agree with the parent, and
              even states “Modern graphics processing units […] can be
              considered vector processors”
              
   URI        [1]: https://en.wikipedia.org/wiki/Vector_processor
       
                avianes wrote 2 days ago:
                 Yes, the definition matches. But that doesn't mean that the
                 architecture and the micro-architecture used are similar.
                 
                 So... yes! We can say that these architectures are some kind
                 of "vector processor", but it will be ambiguous regarding the
                 programming model and the architecture used.
       
                  dahart wrote 2 days ago:
                  I’m interested to hear what you mean by “vector
                  processor”. What does that imply to the lay person, and how
                  is it different enough to be confusing when applied to GPUs?
                  What does the term imply to you in terms of architecture?
       
                    avianes wrote 2 days ago:
                    The term "vector processor" generally refers to a processor
                    with a traditional programming model, but which features a
                    "vector" unit capable of performing operations on large
                    vectors of fixed or variable size. It can occupy the vector
                    unit for a significant amount of cycles.
                    The RISC-V Vector extension is a good example of what makes
                    a vector processor.
                    
                    However, and this is a source of confusion, the standard
                    definition is abstract enough so that many other
                    architecture can be called "vector processor".
                    
                    Regarding modern GPGPU (with SIMT architecture) we are
                    dealing with a programming model named SIMT
                    (Single-Instruction Multiple-Thread) in which the
                    programmer must take into account that there is one code
                    for multiple thread (a block of thread), each instructions
                    will be executed by several "core" simultaneously.
                    
                    This has implications, the hardware has a limited number of
                    "core" so it must split the block of threads into
                    sub-blocks called wraps (1 wrap = 32 threads on Nvidia
                    machines).
                    When we offload compute to a GPU we send him several blocks
                    of wrap. And all wraps will be executed progressively, the
                    GPU scheduler's job is to pick a ready wrap, execute one
                    instruction from the wrap, then pick a new wrap and repeat.
                    
                    This means that wraps from a block have the ability to get
                    out of sync. With a classical vector processor this kind of
                    situation is not possible (or not visible architecturally),
                    it is not possible for a portion of the vector to be 5
                    instructions ahead for example.
                    Therefore, GPU includes instructions to resynchronize wraps
                    from a group, while vector processors don't need this. But
                    it also means that you expose much more unintentional
                    dependency between data with a vector processor.
       
                      dahart wrote 2 days ago:
                      > the standard definition is abstract enough
                      
                      It seems like you’re making the case that the term
                      “vector processor” should be interpreted as something
                      general, and not something specific? Since the Cray
                      vector processor predates RISC-V by ~35 years, isn’t
                      the suggestion above to use it the way Cray did fairly
                      reasonable? It doesn’t seem like it’s really adding
                      much confusion to include GPUs under this already
                      existing umbrella term...
                      
                      > With a classical vector processor […] it is not
                      possible for a portion of the vector to be 5 instructions
                      ahead
                      
                      Just curious here, the WP article talks about how one
                      difference between “vector” processing and SIMD is
                      that vector computers are about variable length vectors
                      by design, where SIMD vectors are usually fixed length.
                      How does that square up with what you’re saying about
                      not having any divergence?
                      
                      This feels like it’s comparing apples to oranges a
                      little… a SIMT machine has different units ahead of
                      others because they’re basically mini independent
                      co-processors. If you have a true vector processor
                      according to your definition, but simply put several of
                      them together, then you would end up with one being ahead
                      of the others. That’s all a modern GPU SIMT machine is:
                      multiple vector processors on one chip, right? It seems
                      like time and scale and Moore’s Law would inevitably
                      have turned vector processors into a machine that can
                      handle divergent and/or independent blocks of execution.
                      
                      BTW, not sure if it was just auto-correct, but you mean
                      “warp” and not “wrap”, right?
       
                        avianes wrote 1 day ago:
                        > BTW, not sure if it was just auto-correct, but you
                        mean “warp” and not “wrap”, right?
                        
                        Oh, sorry I totally meant "warp" not "wrap", I don't
                        know how I introduced that typo.
                        
                        > It seems like you’re making the case that the term
                        “vector processor” should be interpreted as
                        something general, and not something specific?
                        
                        Not exactly, I am in favor of using a specific term,
                        and in particular keeping the use of the term "vector
                        processor" for machines similar to the Cray ones. But I
                        admit that the term is used in a more abstract way. For
                        instance the Intel AVX extension means "Advanced Vector
                        Extensions" while it is definitely a SIMD extension.
                         Computer architecture lacks accurate/strict
                        definitions, probably because there are often many
                        possible implementations of the same idea. Then we
                        sometimes find ourselves using words that are a bit
                        disconnected from their original idea.
                         The architectures that Cray's engineers came up with
                         don't have much to do with the modern SIMT
                         architecture. That's why I find it confusing.
                        
                        > vector computers are about variable length vectors by
                        design, where SIMD vectors are usually fixed length.
                        How does that square up with what you’re saying about
                        not having any divergence?
                        
                         Not sure if I understood the question correctly. But
                         after execution of a vector or SIMD instruction, the
                         vector or SIMD register is seen as containing the
                         outcome of the operation; it's not possible to observe
                         in the register a temporary value or an old value that
                         has not been processed yet. With a SIMT programming
                         model and architecture, on the other hand, it is
                         possible to observe one if we omit synchronization.
                         This is a very clear difference in observable
                         architectural state.
                        
                        > If you have a true vector processor according to your
                        definition, but simply put several of them together,
                        then you would end up with one being ahead of the
                        others.
                        
                        Of course you can reproduce a model similar to SIMT
                        with a lot of vector or scalar processors by changing
                        the programming model and the architecture
                        significantly.
                         But then it seems reasonable to me to call that an
                         SIMT programming model & architecture.
                        
                        > That’s all a modern GPU SIMT machine is: multiple
                        vector processors on one chip, right?
                        
                         Sort of... But splitting the compute into groups and
                         warps is not negligible; it implies big differences in
                         the architecture, the uarch, the design, and the
                         programming model. So it makes sense to give it a
                         different name when there are so many significant
                         changes.
       
            eqvinox wrote 2 days ago:
            VPU?  With network processors being called NPU these days...
       
              zmix wrote 2 days ago:
              > VPU?
              
              Already taken: Video Processing Unit.
       
                techdragon wrote 2 days ago:
                What’s your point? NPU was mentioned earlier and I routinely
                see NPU used as an acronym for “neural processing unit” in
                modern embedded hardware containing hardware either on chip or
                on module for accelerating neural networks for various edge
                computing applications of machine learning models.
                
                 I’m pretty sure “vector processing unit” may predate
                 “video processing unit”, given that people were developing
                 vector processing hardware for high performance computing back
                 before even the Amiga was released, which is the earliest
                 thing I can think of that had serious video-related hardware.
                 I leave room for someone to have called their text-mode
                 display driver chips a “video processing unit”, but I
                 don’t think it would have been common, given that the
                 nominal terminology at the time was to call the screen a
                 “display” and the hardware was typically called a
                 “display adaptor” or “display adapter”... at least in
                 my experience, which I will admit is limited since I didn’t
                 live through it, merely learned about it after the fact by
                 being interested in retro computing.
       
        bullen wrote 2 days ago:
         I think there is another project doing a RISC-V GPU: [1]
         
         Also, the recently announced RVB-ICE should have an OpenGL ES 3+
         capable Vivante GC8000UL GPU (I did not manage to find documentation
         for this exact version, but all GC8000 variants seem to be): [2]
         
         Disclaimer: expensive if you don't know whether it's vapourware and
         how well the drivers and Linux work!
        
   URI  [1]: https://www.pixilica.com/graphics
   URI  [2]: https://www.aliexpress.com/item/1005003395978459.html
       
        raphlinus wrote 2 days ago:
         This is a research project from Georgia Tech. There's a homepage at
         [1] and a paper at [2]. It is specialized to run OpenCL, but with a
         bit of support for the graphics pipeline, mostly a texture fetch
         instruction.
        It appears to be fairly vanilla RISC-V overall, with a small number of
        additional instructions to support GPU. I'm very happy to see this kind
        of thing, as I think there's a lot of design space to explore, and it's
        great that some of that is happening in academic spaces.
        
   URI  [1]: https://vortex.cc.gatech.edu/
   URI  [2]: https://vortex.cc.gatech.edu/publications/vortex_micro21_final...
       
          hajile wrote 2 days ago:
           Intel's Larrabee/Xeon Phi shows that there's a ton of potential here.
          
           Intel's big issue is that x86 is incredibly inefficient.
           Implementing the base instruction set is very difficult, and trying
           to speed it up at all starts drastically increasing core size. This
           means that the ratio of SIMD to per-core overhead is pretty poor.
          
          RISC-V excels at tiny implementations and power efficiency. The ratio
          of SIMD to the rest of the core should be much higher resulting in
          overall better efficiency.
          
          The final design (at a high level) seems somewhat similar to AMD's
          RDNA with a scalar ALU doing the flow control while a very wide SIMD
          does the bulk of the calculations.
       
        throwaway81523 wrote 2 days ago:
        A GPGPU in an FPGA.  Interesting, but 100x slower than a commodity AMD
        or NVidia card.
       
          fahadkhan wrote 2 days ago:
           It's a research project. It's open source. FPGAs are often used for
           developing hardware. If it gets good enough for someone's use case,
           they will fab the chips.
       
          gumby wrote 2 days ago:
          Perfect way to prototype hardware.
       
        akmittal wrote 2 days ago:
         It's great to see RISC-V making a lot of progress.
         A lot of research is coming from China because of US bans, but
         hopefully this will be good for the whole world.
       
          zucker42 wrote 2 days ago:
          Which U.S. bans are you talking about? Is there anywhere I can read
          more about this?
       
            bee_rider wrote 2 days ago:
            We occasionally ban companies that make HPC parts (Intel, NVIDIA,
            AMD) from selling to Chinese research centers, generally citing
            concerns that they could be used for weapons R&D (nuclear weapons
            simulation for example).
            
             2015: [1]  2019: [2]  2021: [3]
            
   URI      [1]: https://spectrum.ieee.org/us-blacklisting-of-chinas-superc...
   URI      [2]: https://www.yahoo.com/now/trump-bans-more-chinese-tech-211...
   URI      [3]: https://www.bloomberg.com/news/articles/2021-04-08/u-s-add...
       
              monocasa wrote 2 days ago:
               And then China made its own domestic supercomputing cluster
               that topped the charts when it came online.
       
                hajile wrote 2 days ago:
                 Those cores are 28nm and I believe they were made at TSMC.
                 
                 Today, the manufacturing ban has China still on 28 or maybe
                 22nm as their most advanced node.
                 
                 They had some old 14nm equipment, but last I heard, the people
                 who owned it greatly over-promised to the Chinese government.
                 
                 China has no modern fabrication processes and no way forward
                 toward designing one. At present, they are headed for a state
                 of being perpetually 5 nodes (one decade) behind.
       
                  fomine3 wrote 1 day ago:
                   The latest Sunway OceanLight performance figures indicate
                   they use a better process, likely at least around 7nm rather
                   than 28nm.
                  
   URI            [1]: https://www.hpcwire.com/2021/11/24/three-chinese-exa...
       
                jpgvm wrote 2 days ago:
                Yeah. People forget that capital allocation works differently
                in China. You ban something they want, they simply make it
                themselves.
                
                I think banning their access to EUV lithography via the export
                ban of ASML machines is going to backfire horribly on the US.
                China has started allocating absolutely ridiculous amounts of
                money to hard science and have also changed the way they fund
                and prioritize projects to make them more commercially
                targeted. The end result of this is they now have a very large
                number of very smart scientists with nearly infinite funding
                being told to solve for the rest of the fab pipeline.
                
                ASML might retain their monopoly for a few more years but I
                think this move will eventually result in Chinese building even
                better machines and likely more practical and for less money -
                as is the Chinese way.
       
                  phkahler wrote 2 days ago:
                  >> ASML might retain their monopoly for a few more years but
                  I think this move will eventually result in Chinese building
                  even better machines and likely more practical and for less
                  money - as is the Chinese way.
                  
                  The complexity of ASML EUV light sources is incredible. China
                  might just use a synchrotron for that. Sure there are issues
                   to resolve, but it seems like it has to be simpler in the
                   end.
       
                  sitkack wrote 2 days ago:
                  I think the bans are strategic in that we need a capable foe.
                  By banning the right things, it ensures that they are at
                  parity with us.
                  
                  A smart colonial power doesn't cut off access 100% but
                  rations and controls access to resources.
                  
                   I can't wait to buy a refrigerator-sized fab on Alibaba for
                   25k in five years.
       
                cyounkins wrote 2 days ago:
                I hadn't heard about this:
                
   URI          [1]: https://en.wikipedia.org/wiki/Sunway_TaihuLight
       
            grawlinson wrote 2 days ago:
             It'll probably be something like this [1] and this [2]. I think
             there are more export restrictions than these two examples.
            
   URI      [1]: https://en.wikipedia.org/wiki/Export_of_cryptography_from_...
   URI      [2]: https://edition.cnn.com/2020/12/18/tech/smic-us-sanctions-...
       
       