_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
URI Visit Hacker News on the Web
COMMENT PAGE FOR:
URI Glassworm Is Back: A New Wave of Invisible Unicode Attacks Hits Repositories
hananova wrote 41 min ago:
My hot take is that all programming languages should go back to only
accepting source code saved in 7-bit ASCII. With perhaps an exception
for comments.
like_any_other wrote 52 min ago:
Invisible characters, lookalike characters, reversing text order
attacks [1].. the only way to use unicode safely seems to be by
whitelisting a small subset of it.
And please, everyone arguing the code snippet should never have passed
review - do you honestly believe this is the only kind of attack that
can exploit invisible characters?
URI [1]: https://attack.mitre.org/techniques/T1036/002/
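The whitelisting idea above can be sketched in a few lines of JavaScript; the allowlist used here (printable ASCII plus common whitespace) is an assumption and would need tuning per project:

```javascript
// Sketch of the allowlist approach: flag every character outside an
// explicit set (here: printable ASCII plus \n, \r, \t). The allowlist
// itself is an assumption -- widen it for your project's needs.
function findDisallowed(source) {
  const allowed = /^[\x20-\x7E\n\r\t]$/;
  const hits = [];
  let index = 0;
  for (const ch of source) {        // iterates by code point
    if (!allowed.test(ch)) {
      hits.push({ index, codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase() });
    }
    index += ch.length;             // 1 or 2 UTF-16 units per code point
  }
  return hits;
}
```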
bawolff wrote 1 hour 14 min ago:
I feel like the threat of this type of thing is really overstated.
Sure the payload is invisible (although tbh I'm surprised it is. PUA
characters usually show up as boxes with hexcodes for me), but the part
where you put an "empty" string through eval isn't.
If you are not reviewing your code closely enough to notice something
as nonsensical as eval() on an empty string, would you really notice
the non-obfuscated payload either?
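For context, a string of such characters really does look empty while being non-empty; a quick illustration (these specific variation-selector code points are illustrative, not the actual Glassworm encoding):

```javascript
// An "empty-looking" string: Unicode variation selectors render as
// nothing in most fonts, yet the string is non-empty. These specific
// code points are examples, not the real payload scheme.
const hidden = "\u{E0101}\u{E0102}\u{E0103}";
console.log(hidden === "");        // false -- it only looks empty
console.log(hidden.length);        // 6 (UTF-16 code units)
console.log([...hidden].length);   // 3 (code points)
```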
mhitza wrote 1 hour 35 min ago:
Their button animations almost "crash" Firefox mobile. As soon as I
reach them the entire page scrolls at single digit FPS.
NoMoreNicksLeft wrote 1 hour 40 min ago:
Why can't code editors have a default-on feature where they show any
invisible character (other than newlines)? I seem to remember Sublime
doing this at least in some cases... the characters were rendered as a
lozenge shape with the hex value of the character.
Is there ever a circumstance where the invisible characters are both
legitimate and you as a software developer wouldn't want to see them in
the source code?
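The rendering described above (Sublime's hex lozenges) amounts to substituting a visible escape for each invisible character; a rough sketch using Unicode property escapes:

```javascript
// Sketch of that editor feature: replace format (Cf), private-use
// (Co), and control (Cc) characters with a visible \u{...} escape,
// keeping ordinary newlines, carriage returns, and tabs as-is.
function revealInvisibles(text) {
  return text.replace(/[\p{Cf}\p{Co}\p{Cc}]/gu, ch =>
    ch === "\n" || ch === "\t" || ch === "\r"
      ? ch
      : `\\u{${ch.codePointAt(0).toString(16).toUpperCase()}}`);
}
```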
chairmansteve wrote 1 hour 51 min ago:
eval() used to be evil....
Are people using eval() in production code?
zzo38computer wrote 1 hour 54 min ago:
I use non-Unicode mode in the terminal emulator (and text editors,
etc), I use a non-Unicode locale, and will always use ASCII for most
kind of source code files (mainly C) (in some cases, other character
sets will be used such as PC character set, but usually it will be
ASCII). Doing this will mitigate many of these issues when maintaining your own
software. I am apparently not the only one; I have seen others suggest
similar things. (If you need non-ASCII text (e.g. for documentation)
you might store them in separate files instead. If you only need a
small number of them in a few string literals, then you might use the
\x escapes; add comments if necessary to explain it.)
The article is about JavaScript, although it can apply to other
programming languages as well. However, even in JavaScript, you can use
\u escapes in place of the non-ASCII characters. (One of my ideas in a
programming language design intended to be a better C is that
it forces visible ASCII (and a few control characters, with some
restrictions on their use), unless you specify by a directive or switch
that you want to allow non-ASCII bytes.)
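The \u-escape approach is easy to demonstrate in JavaScript; the runtime strings are identical, only the source bytes differ:

```javascript
// The escape approach: the source file stays pure ASCII while the
// runtime string is identical to one written with literal non-ASCII.
const withEscape = "caf\u00E9";   // e-acute via escape; file is ASCII
const withLiteral = "café";       // same string with a literal character
console.log(withEscape === withLiteral); // true
console.log(withEscape.length);          // 4
```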
max_ wrote 2 hours 46 min ago:
I don't have to worry about any of this.
My clawbot & other AI agents already have this figured out.
/s
codechicago277 wrote 3 hours 3 min ago:
I wonder if this could be used for prompt injection, if you copy and
paste the seemingly empty string into an LLM does it understand? Maybe
the affected Unicode characters aren't tokenized.
tolciho wrote 3 hours 43 min ago:
Attacks employing invisible characters are not a new thing. Prior
efforts here include terminal escape sequences, possibly hidden with
CSS that if blindly copied and pasted would execute who knows what if
the particular terminal allowed escape sequences to do too much (a
common feature of featuritis) or the terminal had errors in its
invisible character parsing code.
For data or code hiding the Acme::Bleach Perl module is an old example
though by no means the oldest example of such. This is largely
irrelevant given how relevant not learning from history is for most.
Invisible characters may also cause hard to debug issues, such as
lpr(1) not working for a user, who turned out to have a control
character hiding in their .cshrc. Such things as hex viewers and OCD
levels of attention to detail are suggested.
WalterBright wrote 3 hours 48 min ago:
Unicode should be for visible characters. Invisible characters are an
abomination. So are ways to hide text by using Unicode so-called
"characters" to cause the cursor to go backwards.
Things that vanish on a printout should not be in Unicode.
Remove them from Unicode.
bawolff wrote 1 hour 6 min ago:
Good luck with that given there are invisible characters in ascii.
Also this attack doesnt seem to use invisible characters just
characters that dont have an assigned meaning.
tetha wrote 1 hour 16 min ago:
That ship has sailed, but I consider Unicode a good thing, yet I
consider it problematic to support Unicode in every domain.
I should be able to use à as a cursed smiley in text, and many more
writing systems supported by Unicode support even more funny things.
That's a good thing.
On the other hand, if technical file names and display file names
(shown to GUI users) were separate, my need for crazy characters in
file names, code bases and such would be very limited. Lower ASCII for
actual file names consumed by technical people is sufficient for me.
eviks wrote 1 hour 20 min ago:
So you'd remove space and tab from Unicode?
luke-stanley wrote 3 hours 5 min ago:
So we need a new standard problem due to the complexity of the last
standard? Isn't unicode supposed to be a superset of ASCII, which
already has control characters like space, CR, and newline? xD
WalterBright wrote 2 hours 28 min ago:
The only ones people use any more are newline and space. A tab key
is fine in your editor, but it's been more or less abandoned as a
character. I haven't used a form feed character since the 1970s.
uhoh-itsmaciek wrote 3 hours 17 min ago:
>Remove them from Unicode.
Do you honestly think this is a workable solution?
WalterBright wrote 2 hours 24 min ago:
Yes, absolutely. See my other replies.
pvillano wrote 3 hours 25 min ago:
Unicode is "designed to support the use of text in all of the world's
writing systems that can be digitized"
Unicode needs tab, space, form feed, and carriage return.
Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK
to switch between left-to-right and right-to-left languages.
Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL
JUNGSEONG FILLER to typeset Korean.
Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two
characters should not be connected by a ligature.
Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break
opportunity without actually inserting a visible space.
Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the
traditional Mongolian alphabet.
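One item from the list above, the ZWNJ, can be seen directly as JavaScript strings (rendering aside, the strings are distinct):

```javascript
// ZERO WIDTH NON-JOINER between "f" and "i" asks renderers not to
// form the "fi" ligature, so the two strings can look alike on
// screen while being distinct.
const ligated = "film";
const unligated = "f\u200Cilm";   // f + ZWNJ + ilm
console.log(ligated === unligated);                        // false
console.log(unligated.replace(/\u200C/g, "") === ligated); // true
```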
WalterBright wrote 2 hours 30 min ago:
> Unicode needs tab, space, form feed, and carriage return.
Those are legacied in with ASCII. And only space and newline are
needed. Before I check in code to git, I run a program that removes
the tabs and linefeeds.
> Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT
MARK to switch between left-to-right and right-to-left languages.
!!tfel ot thgir ,am ,kooL
> Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL
JUNGSEONG FILLER to typeset Korean.
I don't believe it.
> Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two
characters should not be connected by a ligature.
Not needed.
> Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break
opportunity without actually inserting a visible space.
How on earth did people read printed matter without that?
> Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the
traditional Mongolian alphabet.
Somehow people didn't need invisible characters when printing
books.
jmusall wrote 1 hour 44 min ago:
The fact is that there were so many character sets in use before
Unicode because all these things were needed or at least wanted
by a lot of people. Here's a great blog post by Nikita Prokopov
about it:
URI [1]: https://tonsky.me/blog/unicode/
chongli wrote 2 hours 3 min ago:
Unicode is for human beings, not machines.
bulbar wrote 2 hours 9 min ago:
That's a very narrow view of the world. One example: In the past
I have handled bilingual english-arabic files with switches
within the same line, and Arabic is written from right to left.
There are also languages that are written from top to bottom.
Unicode is not exclusively for coding, to the contrary, pretty
sure it's only a small fraction of how Unicode is used.
> Somehow people didn't need invisible characters when printing
books.
They didn't need computers either so "was seemingly not needed in
the past" is not a good argument.
WalterBright wrote 4 min ago:
> That's a very narrow view of the world.
Yes, it is. Unicode has undergone major mission creep, thinking
it is now a font language and a formatting language. Naturally,
this has led to making it a vector for malicious actors. (The
direction reversing thing has been used to insert malicious
text that isn't visible to the reader.)
> Unicode is not exclusively for coding
I never mentioned coding.
> They didn't need computers
Unicode is for characters, not formatting. Formatting is what
HTML is for, and many other formatting standards. Neither is it
for meaning.
pibaker wrote 30 min ago:
> That's a very narrow view of the world.
But not one that would surprise anyone familiar with
WalterBright's antics on this website…
WalterBright wrote 2 hours 15 min ago:
Look Ma
xt! N !
e tee S
T larip
(No Unicode needed.)
WalterBright wrote 3 hours 38 min ago:
Another dum dum Unicode idea is having multiple code points with
identical glyphs.
Rule of thumb: two Unicode sequences that look identical when printed
should consist of the same code points.
estebank wrote 1 hour 19 min ago:
If anything, Unicode should have had more disambiguated characters.
Han unification was a mistake, and lower case dotted Turkish i and
upper case dotless Turkish I should exist so that toUpper and
toLower didn't need to know/guess at a locale to work correctly.
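The locale problem described above shows up in JavaScript's own API (this assumes a runtime with full ICU locale data, as default Node builds have):

```javascript
// The default toUpperCase() cannot know that Turkish "i" uppercases
// to dotted "İ" (U+0130); only the locale-aware variant gets it right.
console.log("i".toUpperCase());              // "I" (wrong for Turkish)
console.log("i".toLocaleUpperCase("tr-TR")); // "İ" (U+0130)
console.log("I".toLocaleLowerCase("tr-TR")); // "ı" (U+0131, dotless)
```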
wcoenen wrote 2 hours 9 min ago:
As far as I know, glyphs are determined by the font and rendering
engine. They're not in the Unicode standard.
nswango wrote 3 hours 25 min ago:
So you think that the letters in the Greek and Cyrillic alphabets
which are printed identically to the Latin A should not exist?
And, for example, Greek words containing this letter should be
encoded with a mix of Latin and Greek characters?
Yokohiii wrote 1 hour 55 min ago:
What about numbers? Would they be assigned to arabic only? I
guess someone will be offended by that.
While at it we could also unify I, | and l. It's too confusing
sometimes.
WalterBright wrote 2 hours 25 min ago:
> So you think that the letters in the Greek and Cyrillic
alphabets which are printed identically to the Latin A should not
exist?
Yes. Unicode should not be about semantic meaning, it should be
about the visual. Like text in a book.
> And, for example, Greek words containing this letter should be
encoded with a mix of Latin and Greek characters?
Yup. Consider a printed book. How can you tell if a letter is a
Greek letter or a Latin letter?
Those Unicode homoglyphs are a solution looking for a problem.
bawolff wrote 1 hour 1 min ago:
> Yes. Unicode should not be about semantic meaning, it should
be about the visual. Like text in a book.
Do you think 1, l and I should be encoded as the same
character, or does this logic only extend to characters pesky
foreigners use.
Yokohiii wrote 1 hour 50 min ago:
Unicode is about semantics not appearance. If you don't need
semantics then use something different.
Muromec wrote 1 hour 57 min ago:
>Yup. Consider a printed book. How can you tell if a letter is
a Greek letter or a Latin letter?
I can absolutely tell the Cyrillic к from the Latin k, and the Latin u
from the Cyrillic и.
>should not be about semantic meaning,
It's always better to be able to preserve more information in a
text and not less.
jeltz wrote 3 hours 27 min ago:
I don't think that would help much. There are also characters which
are similar but not the same and I don't think humans can spot the
differences unless they are actively looking for them which most of
the time people are not. If only one of two glyphs which are
similar appear in the text nobody would likely notice, expectation
bias will fuck you over.
WalterBright wrote 2 hours 25 min ago:
I wonder how anybody got by with printed books.
moritzruth wrote 3 hours 45 min ago:
greatidea,whoneedsspacesanyway
WalterBright wrote 3 hours 44 min ago:
Spaces appear on a printout.
abujazar wrote 3 hours 45 min ago:
Invisible characters are there for visible characters to be printed
correctly...
WalterBright wrote 2 hours 24 min ago:
I'll grant that a space and a newline are necessary. The rest,
nope.
abujazar wrote 1 hour 54 min ago:
You're talking about a subset of ASCII then. Unicode is supposed
to support different languages and advanced typography, for which
those characters are necessary. You can't write e.g. Arabic or
Hebrew without those "unnecessary" invisible characters.
vitus wrote 3 hours 59 min ago:
Looks like the repo owner force-pushed a bad commit to replace an
existing one. But then, why not forge it to maintain the existing
timestamp + author, e.g. via `git commit --amend -C df8c18`?
Innocuous PR (but do note the line about "pedronauck pushed a commit
that referenced this pull request last week"): [1] Original commit: [2]
Amended commit: [3] Either way, pretty clear sign that the owner's
creds (and possibly an entire machine) are compromised.
URI [1]: https://github.com/pedronauck/reworm/pull/28
URI [2]: https://github.com/pedronauck/reworm/commit/df8c18
URI [3]: https://github.com/pedronauck/reworm/commit/d50cd8
chrismorgan wrote 3 hours 29 min ago:
The value of the technique, I suppose, is that it hides a large
payload a bit better. The part you can see stinks (a bunch of magic
numbers and eval), but I suppose it's still easier to overlook than
a 9000-character line of hexadecimal (if still encoded or even
decoded but still encrypted) or stuff mentioning Solana and Russian
timezones (I just decoded and decrypted the payload out of
curiosity).
But really, it still has to be injected after the fact. Even the most
superficial code review should catch it.
vitus wrote 3 hours 11 min ago:
Agreed on all those fronts. I'm just dismayed by all the comments
suggesting that maintainers just merged PRs with this trojan, when
the attack vector implies a more mundane form of credential
compromise (and not, as the article implies, AI being used to sneak
malicious changes past code review at scale).
jeltz wrote 2 hours 55 min ago:
Yeah, the attack vector seems to be stolen credentials. I would
be much more interested in an attack which actually uses
Invisible characters as the main vector.
ocornut wrote 4 hours 5 min ago:
It baffles me that any maintainer would merge code like the one
highlighted in the issue, without knowing what it does. That's
regardless of being or not being able to see the "invisible"
characters. There's a transforming function here and an eval() call.
The mere fact that a software maintainer would merge code without
knowing what it does says more about the terrible state of software.
pdonis wrote 18 min ago:
Wish I could upvote this more.
faangguyindia wrote 4 hours 23 min ago:
Back in the day I was on hacking forums where a lot of script kiddies
made malicious code.
Now that they have LLMs, I am wondering: are people using them to make
new kinds of malicious code, more sophisticated than before?
Yokohiii wrote 4 hours 3 min ago:
In this case LLMs were obviously used to dress the code up as more
legitimate, adding more human or project relevant noise. It's social
engineering, but you leave the tedious bits to an LLM. The
sophisticated part is the obscurity in the whole process, not the
code.
btown wrote 4 hours 42 min ago:
IMO while the bar is high to say "it's the responsibility of the
repository operator itself to guard against a certain class of attack"
- I think this qualifies. The same way GitHub provides Secret Scanning
[0], it should alert upon spans of zero-width characters that are not
used in a linguistically standard way (don't need an LLM for this, just
n-tuples).
Sure, third-party services like the OP can provide bots that can scan.
But if you create an ecosystem in which PRs can be submitted by threat
actors, part of your commitment to the community should be to provide
visibility into attacks that cannot be seen by the naked eye, and make
that protection the norm rather than the exception.
[0]
URI [1]: https://docs.github.com/en/get-started/learning-about-github/a...
zzo38computer wrote 1 hour 46 min ago:
I think a "force visible ASCII for files whose names match a specific
pattern" mode would be a simple thing to help. (You might be able to
use the "encoding" command in the .gitattributes file for this,
although I don't know if this would cause errors or warnings to be
reported, and it might depend on the implementation.)
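For illustration, such a .gitattributes entry might look like this (a hypothetical sketch; as noted above, whether the `encoding` attribute causes any warning or error depends on the tool reading it):

```
# Hypothetical: declare source files as ASCII; enforcement depends on tooling
*.js  encoding=us-ascii
*.c   encoding=us-ascii
```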
andrewflnr wrote 4 hours 15 min ago:
Regardless of the thorny question of whether it's Github's
responsibility, it sure would be a good thing for them to do ASAP.
godelski wrote 2 hours 8 min ago:
Here's the big reason GitHub should do it:
It makes the product better
I know people love to talk money and costs and "value", but HN is a
space for developers, not the business people. Our primary concern,
as developers, is to make the product better. The business people
need us to make the product better, keep the company growing, and
beat out the competition. We need them to keep us from fixating on
things that are useful but low priority and ensuring we keep having
money. The contention between us is good, it keeps balance. It even
ensures things keep getting better even if an effective monopoly
forms as they still need us, the developers, to make the company
continue growing (look at monopolies people aren't angry at and how
they're different). And they need us more than we need them.
So I'd argue it's the responsibility of the developers, hired by
GitHub, to create this feature because it makes the product better.
Because that's the thing you've been hired for: to make the product
better. Your concern isn't about the money, your concern is about
the product. That's what you're hired for.
btown wrote 1 hour 0 min ago:
I'd say that this is also true from a money-and-costs-and-value
perspective. Sure, all press is good press... but any number of
stakeholders would agree that "we got some mindshare by
proactively protecting against an emerging threat" is higher-ROI
press than "Ars did a piece on how widespread this problem is,
and we're mentioned in the context of our interface making the
attack hard to detect."
And when the incremental cost to build a feature is low in an age
of agentic AI, there should be no barrier to a member of the
technical staff (and hopefully they're not divided into
devs/test/PM like in decades past) putting a prototype together
for this.
tapland wrote 2 hours 6 min ago:
Tldr:
Yeah it would make it better!
godelski wrote 1 hour 43 min ago:
I hope I left the lead as the lead.
But I also think we've had a culture shift that's hurting our
field. Where engineers are arguing about if we should implement
certain features based on the monetary value (which are all
fictional anyways). But that's not our job. At best, it's the
job of the engineering manager to convince the business people
that it has not only utility value, but monetary.
jacquesm wrote 3 hours 36 min ago:
It absolutely is. They are simply spreading malware. You can't
claim to be a 'dumb pipe' when your whole reason for existence is
to make something people deemed 'too complex' simple enough for
others to use; at that point you have an immediate responsibility
not only to reduce complexity but also to ensure safety. Dumbing
stuff down comes with a duty of care.
minus7 wrote 4 hours 53 min ago:
The `eval` alone should be enough of a red flag
godelski wrote 1 hour 51 min ago:
I'm not a JS person, but taking the line at face value, shouldn't it
do nothing? Which, if I understand correctly, means it should never be
merged. Why would you merge no-ops?
jeltz wrote 3 hours 32 min ago:
Yeah, I would have loved to see an example where it was not obvious
that there is an exploit. Where it would be possible for a reviewer
to actually miss it.
kordlessagain wrote 4 hours 41 min ago:
No it's not.
godelski wrote 1 hour 48 min ago:
The parent didn't say "there's no legitimate uses of eval", they
said "using eval should make people pay more attention." A red flag
is a warning. An alert. Not a signal saying "this is 100% no doubt
malicious code."
Yes, it's a red flag. Yes, there's legitimate uses. Yes, you should
always interrogate evals more closely. All these are true
jacquesm wrote 3 hours 35 min ago:
While there are valid use cases for eval they are so rare that it
should be disabled by default and strongly discouraged as a
pattern. Only in very rare cases is eval the right choice and even
then it will be fraught with risk.
simonreiff wrote 4 hours 7 min ago:
OWASP disagrees: See [1] , listing `eval()` first in its small list
of examples of "JavaScript functions that are dangerous and should
only be used where necessary or unavoidable". I'm unaware of any
such uses, myself. I can't think of any scenario where I couldn't
get what I wanted by using some combination of `vm`, the `Function`
constructor, and a safe wrapper around `JSON.parse()` to do
anything I might have considered doing unsafely with `eval()`. Yes,
`eval()` is a blatant red flag and definitely should be avoided.
URI [1]: https://cheatsheetseries.owasp.org/cheatsheets/Nodejs_Secu...
pavel_lishin wrote 4 hours 32 min ago:
When is an eval not at least a security "code smell"?
SahAssar wrote 4 hours 32 min ago:
It really is. There are very few proper use-cases for eval.
nswango wrote 3 hours 23 min ago:
For a long time the standard way of loading JSON was using eval.
bawolff wrote 1 hour 7 min ago:
Not that long, browsers implemented JSON.parse() back in 2009.
JSON was only invented back in 2001 and took a while to become
popular. It was a very short window more than a decade ago when
eval made sense here.
Eval for JSON also led to other security issues like XSSI.
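The history above made concrete (the malicious "JSON" string here is a contrived example):

```javascript
// eval() executes arbitrary code hidden in "data", while JSON.parse()
// accepts only the JSON grammar.
const good = '{"a": 1}';
const viaParse = JSON.parse(good);
console.log(viaParse.a);          // 1
// A malicious "JSON" document -- eval("(" + evil + ")") would run the
// console.log; JSON.parse refuses it with a SyntaxError instead.
const evil = 'console.log("pwned"), {"a": 1}';
let threw = false;
try { JSON.parse(evil); } catch (e) { threw = true; }
console.log(threw);               // true
```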
_flux wrote 2 hours 55 min ago:
And why do we not anymore make use of it, but instead
implemented separate JSON loading functionality in JavaScript?
Can you think of any reasons beyond performance?
bawolff wrote 1 hour 12 min ago:
I'd be surprised if there is a performance benefit of
processing json with eval(). Browsers optimize the heck out
of JSON.
bulbar wrote 2 hours 17 min ago:
Why did you opt for such a comment when a straightforward response
without the belittling tone would have achieved the same?
_flux wrote 2 hours 12 min ago:
I actually gave it some thought. I had written the actual
reason first, but I realized that the person I was
responding to must know this, yet keeps arguing that eval is just
fine.
I would say they are arguing that in bad faith, so I wanted
to enter a dialogue where they are either forced to agree,
or more likely, not respond at all.
gnabgib wrote 5 hours 36 min ago:
Small discussion yesterday (9+9 points, 9+4 comments) [1]
URI [1]: https://news.ycombinator.com/item?id=47374479
URI [2]: https://news.ycombinator.com/item?id=47385244
DropDead wrote 5 hours 45 min ago:
Why doesn't someone make an AV rule to find stuff like this? They are
just plain text files.
charcircuit wrote 56 min ago:
Isn't that what this article is about? Advertising an av rule in
their product that catches this.
nine_k wrote 4 hours 54 min ago:
The rule must be very simple: any occurrence of `eval()` should be a
BIG RED FLAG. It should be handled like a live bomb, which it is.
Then, any appearance of unprintable characters should also be
flagged. There are rather few legitimate uses of some zero-width
characters, like ZWJ in emoji composition. Ideally all such
characters should be inserted as \uNNNN escape sequences, and not as
literal characters.
Simple lint rules would suffice for that, with zero AI involvement.
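A minimal sketch of both lint rules suggested above (real setups would use ESLint's no-eval rule plus a character scan; this is just the bare logic):

```javascript
// Flag eval() calls and any invisible format (Cf) or private-use (Co)
// characters in a source string.
function lintSource(source) {
  const findings = [];
  if (/\beval\s*\(/.test(source)) findings.push("eval() call");
  const m = source.match(/[\p{Cf}\p{Co}]/u);
  if (m) findings.push("invisible character U+" + m[0].codePointAt(0).toString(16).toUpperCase());
  return findings;
}
```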
hamburglar wrote 3 hours 29 min ago:
I think there's debate (which I don't want to participate in)
over whether or not invisible characters have their uses in
Unicode. But I hope we can all agree that invisible characters
have no business in code, and banishing them is reasonable.
WalterBright wrote 3 hours 45 min ago:
> There are rather few legitimate uses of some zero-width
characters, like ZWJ in emoji composition.
Emojis are another abomination that should be removed from Unicode.
If you want pictures, use a gif.
_flux wrote 2 hours 50 min ago:
Arguably them not being in Unicode would be an accessibility issue,
unless we thought to standardize GIF names, and then that already
sounds a lot like Unicode.
WalterBright wrote 2 hours 39 min ago:
How is it an accessibility issue? HTML allows things like
little gif files. I've done this myself when I wrote text that
contained Egyptian hieroglyphs. It works just fine!
_flux wrote 2 hours 16 min ago:
I mean if you don't have sight.
WalterBright wrote 2 hours 13 min ago:
Then use words. Or tooltips (HTML supports that). I use
tooltips on my web pages to support accessibility for
screen readers. Unicode should not be attempting to badly
reinvent HTML.
sghitbyabazooka wrote 2 hours 56 min ago:
( ê¿ ï¹ ê¿ ; )
trollbridge wrote 4 hours 34 min ago:
In our repos, we have some basic stuff like ruff that runs, and
that includes a hard error on any Unicode characters. We mostly did
this after some un-fun times when byte order marks somehow ended up
in a file and it made something fail.
I have considered allowing a short list that does not include
emojis, joining characters, and so on - basically just currency
symbols, accent marks, and everything else you'd find in CP-1252 -
but never got around to it.
abound wrote 5 hours 32 min ago:
Yeah it would have been nice to end with "and here's a five-line
shell script to check if your project is likely affected". But to
their credit, they do have an open-source tool [1], I'm just not
willing to install a big blob of JavaScript to look for vulns in my
other big blobs of JavaScript
URI [1]: https://github.com/AikidoSec/safe-chain
nine_k wrote 4 hours 29 min ago:
Something like this should work, assuming your encoding is Unicode
(normally UTF-8), which grep would interpret:
grep -P '[\x{200B}\x{200C}\x{200D}\x{FEFF}]' code.ts
See
URI [1]: https://stackoverflow.com/q/78129129/223424