_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
   URI   A curious case of O(N^2) behavior which should be O(N) (2023)
       
       
        eska wrote 14 hours 53 min ago:
        2 questions I would look into before anything else:
        
        1. Does this have to be sorted?
        
        2. Why is it sorted alphabetically instead of naturally?
       
        rokkamokka wrote 15 hours 17 min ago:
         While taking a computational complexity course I wrote some Python
         code that should have run in O(n) but was closer to O(n^2). After
         asking on StackOverflow and some more experimentation, it turned out
         the garbage collector made it O(n^2) - turning it off manually
         yielded the expected O(n) runtime.
       
        purplesyringa wrote 16 hours 28 min ago:
        Ah yes, the most reliable solution to a wrong choice of a data
        structure: add data-specific hacks until it works. For the life of me I
        can't figure out why people never consider replacing linked lists. Even
        without worst-case O(n) insertion, they usually behave worse than
        vectors, deques, or hives.
       
          xigoi wrote 15 hours 2 min ago:
          What is a hive in the context of data structures?
       
            purplesyringa wrote 13 hours 10 min ago:
             I meant something close to [1]. Perhaps a simple way to look at
             it is that it's like a dynamic array, but when the capacity is
             exceeded, you don't reallocate the array, you just allocate a
             new (exponentially larger) chunk and keep the old one as-is.
             Then just link the chunks in a (very short) linked list, or keep
             a separate small list of chunks (you're never gonna need more
             than 48, so you can just have a fixed allocation for it), or
             what have you. The bonus here is that it reduces latency on
             pushing and has more predictable performance.
            
   URI      [1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p...
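             
             A rough sketch of the chunked-growth idea described above (my
             own illustrative code, not the actual std::hive proposal):
             
               #include <cstddef>
               #include <stdexcept>
               #include <vector>
               
               // Push-only "chunked vector" sketch: when the last chunk fills up,
               // allocate a new chunk with double the capacity instead of
               // reallocating, so existing elements never move and pushes never
               // copy old data.
               template <typename T>
               class ChunkedVector {
                   std::vector<std::vector<T>> chunks_;  // each chunk: fixed capacity
                   std::size_t next_cap_ = 16;           // capacity of the next chunk
               
               public:
                   void push_back(const T& value) {
                       if (chunks_.empty() ||
                           chunks_.back().size() == chunks_.back().capacity()) {
                           chunks_.emplace_back();
                           chunks_.back().reserve(next_cap_);  // never grown afterwards
                           next_cap_ *= 2;
                       }
                       chunks_.back().push_back(value);  // O(1), old chunks untouched
                   }
               
                   // Indexing walks the (very short) chunk list; a few dozen
                   // chunks at most, so this is effectively constant time.
                   T& operator[](std::size_t idx) {
                       for (auto& chunk : chunks_) {
                           if (idx < chunk.size()) return chunk[idx];
                           idx -= chunk.size();
                       }
                       throw std::out_of_range("ChunkedVector index");
                   }
               };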
       
            masklinn wrote 14 hours 52 min ago:
            I assume it's an autocorrect of heap.
       
          mjevans wrote 15 hours 27 min ago:
           There's also the choice of a data structure that appears to force
           multiple copies of the same thing rather than thin references
           and, ideally, copy-on-modify.
       
        ojbyrne wrote 19 hours 5 min ago:
        “alphebatically” is used 3 times, and otherwise no typos that I
        noticed. So I wondered if it’s a term I’m not familiar with.
       
        geuis wrote 19 hours 31 min ago:
        "Now, it should be clear that most of the time (28s-53s on the time
        line) of importing that particular USD file is spent in
        id_sort_by_name".
        
        Luckily there's a graph screenshot. But the graph it displays is
        incomprehensible. Without "id_sort_by_name" being on the one bar, I
        wouldn't even know what I'm looking at.
       
          wtallis wrote 18 hours 32 min ago:
          The timeline view in the upper screenshot is fairly straightforward:
          the region of 28-53s is selected, other parts of the timeline are
          grayed out, and the statistics for the selected time region down
          below show 94.4% of "Total CPU" being in id_sort_by_name.
          
          The lower screenshot is a flame graph. If you haven't encountered one
          of those before, it's totally reasonable to not know how to read it.
          But it's a standard and common way to present this sort of data. See
          [1] for more information.
          
   URI    [1]: https://www.brendangregg.com/flamegraphs.html
       
        nrdvana wrote 21 hours 34 min ago:
        Why was the solution not to get rid of the linked list and replace it
        with a red/black tree?
       
          Epa095 wrote 14 hours 57 min ago:
           I got the impression that the expected performance of the doubly
           linked list insertion is actually O(1), since in most cases the
           elements arrive in sorted order. It's been a while since my
           algorithm courses, but I think all the 'normal' trees have log(n)
           insertion in that case.
       
            nrdvana wrote 10 hours 51 min ago:
             O(1) is generally indistinguishable from O(log n) in practice,
             so if there is any chance of behavior different from the
             expected optimum, go with the better algorithm.
       
          thayne wrote 20 hours 57 min ago:
           Or a hash table, or any other kind of tree.
           
           My guess would be because it was implemented in C, where the usual
           practice if you need a container type other than a fixed-size
           array is to implement it yourself. IME, C code tends to use linked
           lists for a lot of things that in most other languages would be
           implemented using a better-suited, and more performant, container
           type from the standard library.
           
           One way that other languages can outperform C is that it is easier
           to use the right data structure.
       
            masklinn wrote 14 hours 54 min ago:
             Blender's not even in C (the snippets are clearly C++). I wonder
             what the logic of having a sorted doubly linked list is: unless
             it's a skip list, it's not like you can do bisection searches.
             
             I guess a flat array could still be debatable if you're usually
             inserting near but not at the end, as it'll still need to move
             all the following elements. But it seems dubious.
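             
             For illustration, a flat sorted array version might look
             something like this (a sketch under my own assumptions, not
             Blender's actual code): bisect for the slot, then shift the
             tail, which is cheap when new names land near the end.
             
               #include <algorithm>
               #include <string>
               #include <vector>
               
               // Keep names sorted in a flat vector. Finding the slot is
               // O(log n); the insert shifts everything after it, which is
               // cheap when new names usually land at or near the end.
               void insert_sorted(std::vector<std::string>& names,
                                  std::string name) {
                   auto pos = std::upper_bound(names.begin(), names.end(), name);
                   names.insert(pos, std::move(name));
               }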
       
        dmitrygr wrote 23 hours 0 min ago:
         Why not just change the “%s.%u” to “%s.%010u”, with no other code
         changes?
       
          consp wrote 17 hours 31 min ago:
           You've just created a fix that works up to a large, but arbitrary,
           number, pushing the problem down the road instead of fixing it.
       
          andrelaszlo wrote 22 hours 42 min ago:
          If I understood the article correctly it caused problems when the
          same file was imported multiple times, or when another file with the
          same base name was imported.
          
   URI    [1]: https://gist.github.com/bssrdf/397900607028bffd0f8d223a7acdc...
       
            dmitrygr wrote 21 hours 49 min ago:
             Yes, TFA says the issue is that "12345" sorts before "9888".
            
            My solution avoids that.
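             
             A quick demo of the padding idea (a hypothetical snippet, with
             "Cube" as a made-up base name): with a fixed-width, zero-padded
             suffix, string order and numeric order agree.
             
               #include <cstdio>
               
               int main() {
                   char a[64], b[64], c[64], d[64];
               
                   // Unpadded "%s.%u": "Cube.12345" sorts before "Cube.9888"
                   // lexicographically, even though 12345 > 9888.
                   std::snprintf(a, sizeof a, "%s.%u", "Cube", 9888u);
                   std::snprintf(b, sizeof b, "%s.%u", "Cube", 12345u);
               
                   // Zero-padded "%s.%010u": string order matches numeric order.
                   std::snprintf(c, sizeof c, "%s.%010u", "Cube", 9888u);
                   std::snprintf(d, sizeof d, "%s.%010u", "Cube", 12345u);
               
                   std::printf("%s %s\n", a, b);  // unpadded pair
                   std::printf("%s %s\n", c, d);  // padded pair, sorts numerically
                   return 0;
               }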
       
        Liftyee wrote 23 hours 32 min ago:
        Neat find! This proves to me a huge benefit of open source software:
        individual developers can satisfy their curiosity towards particular
        issues (with additional motivation from being personally affected) and
        improve the package for everyone once it's fixed.
       
          bla3 wrote 21 hours 58 min ago:
          Except it wasn't improved in this case, if I'm reading the gist
          correctly. The author didn't like their fix enough to make a pull
          request.
       
            EdSchouten wrote 19 hours 45 min ago:
            Just like “this entire meeting could have been an email”, all
            this effort writing this gist could have been spent creating a PR.
       
        Sharlin wrote 1 day ago:
        More accidentally quadratic stories:
        
   URI  [1]: https://www.tumblr.com/accidentallyquadratic
       
          samatman wrote 20 hours 19 min ago:
          A classic.  Every time I see that link I read the whole thing
          starting from the new beginning.
          
          ...oops
       
            itronitron wrote 19 hours 20 min ago:
            Is there a way to do that without signing in?
       
            dgfitz wrote 19 hours 43 min ago:
            I opened the link and just started reading. I have a really dumb
            question that may expose common knowledge I don’t have, about
            this quote:
            
            > The total amount of space needed to represent this collection of
            strings is O(k n^2).
            
             I haven’t ever seen O-notation represent RAM usage, just
             algorithm complexity. Is this common?
       
              masklinn wrote 15 hours 1 min ago:
              > Is this common?
              
               Very. For instance, if you look at sorting algorithms on
               Wikipedia, they pretty much all list performance (best, worst,
               average) but also worst-case space complexity, in O notation.
       
              nneonneo wrote 19 hours 41 min ago:
              Yes, very common. You've seen "time complexity"; it's very common
              to talk about "space complexity" as well.
       
                consp wrote 17 hours 37 min ago:
                Fun bonus: they can be interchangeable, e.g. increasing space
                to reduce time.
       
                  asboans wrote 10 hours 57 min ago:
                  And any operation that takes n bits as input can be trivially
                  turned into an O(1) time and O(2^n) space algorithm through
                  tabulation.
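                   
                   A toy example of that trade-off (my own, not from the
                   thread): popcount over n = 8 input bits via a table of 2^8
                   precomputed answers, so each query is one O(1) lookup.
                   
                     #include <array>
                     #include <cstdint>
                     #include <cstdio>
                     
                     // Precompute popcount for every possible 8-bit input:
                     // 2^8 entries (O(2^n) space for n = 8 bits), then one
                     // O(1) lookup per query.
                     static const std::array<std::uint8_t, 256> kPopcount = [] {
                         std::array<std::uint8_t, 256> t{};
                         for (int i = 1; i < 256; ++i)
                             t[i] = t[i >> 1] + (i & 1);  // build from smaller index
                         return t;
                     }();
                     
                     int main() {
                         std::printf("popcount(0xB7) = %d\n", kPopcount[0xB7]);  // 6
                         return 0;
                     }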
       
                    nneonneo wrote 4 hours 9 min ago:
                    Assuming you ignore or amortize the time necessary to
                    create the table in the first place, of course.
                    
                    This is the basis for rainbow tables: precomputed tables
                    for mapping hashes to passwords, with a space-saving
                    “hash chaining” trick to effect a constant factor
                    reduction in table size. Such tables are the reason why
                    passwords must be hashed with a unique salt when stored in
                    a database.
       
                  dgoldstein0 wrote 16 hours 38 min ago:
                  Yes, but total time is never going to be less than total
                  space, when expressed in big-O notation
       
                    deltaknight wrote 14 hours 29 min ago:
                     I’m not sure this definition of Big-O for space
                     complexity is universal. When I’ve seen/used it, the
                     size of the initial data wasn’t relevant; it was more
                     about the additional memory required for the algorithm.
                    
                    For example, an in-place algorithm like Bubble Sort would
                    have a O(1) space complexity, because it requires no extra
                    memory (and 0 memory is a constant). Merge sort on the
                    other hand is O(n) because it always uses additional memory
                    for its intermediate stages, and that additional memory
                    scales with n.
                    
                     Doing a quick Google search, the first few sites I find
                     seem to use a similar understanding:
                    
   URI              [1]: https://www.geeksforgeeks.org/time-and-space-compl...
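                     
                     For instance (my own toy example): reversing an array in
                     place needs O(1) auxiliary space, while returning a
                     reversed copy needs O(n), even though both read the same
                     O(n) input.
                     
                       #include <cstddef>
                       #include <utility>
                       #include <vector>
                       
                       // O(1) auxiliary space: only two indices beyond the input.
                       void reverse_in_place(std::vector<int>& v) {
                           for (std::size_t i = 0, j = v.size(); i + 1 < j; ++i, --j)
                               std::swap(v[i], v[j - 1]);
                       }
                       
                       // O(n) auxiliary space: allocates a whole second array.
                       std::vector<int> reversed_copy(const std::vector<int>& v) {
                           return std::vector<int>(v.rbegin(), v.rend());
                       }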
       
                      aleksiy123 wrote 8 hours 11 min ago:
                      The confusion is around space complexity vs auxiliary
                      space complexity (extra space).
                      
                      space complexity is O(n) but auxiliary space complexity
                      uses Theta for notation instead.
                      
                      But people aren't too picky on the notation and usually
                      say something like "O(1) extra space" instead of using
                      theta.
                      
   URI                [1]: https://en.m.wikipedia.org/wiki/Space_complexity
       
                        nneonneo wrote 4 hours 17 min ago:
                        That’s not quite accurate. Big O notation and Theta
                        notation are different ways of expressing the growth
                        rates of functions - the choice of using Big O or Theta
                        is independent of whether you’re trying to specify
                        total space complexity or auxiliary space complexity.
                        
                        Saying something is O(n) tells you it grows at most
                        linearly, but this would also admit e.g. log n.
                        
                        Saying something is Theta(n) tells you it grows exactly
                        linearly: that is, it is not a slower growth rate like
                        log n, nor a faster growth rate like n^2.
       
                          aleksiy123 wrote 2 hours 45 min ago:
                           Yeah, you are correct; I misinterpreted the
                           article.
                           
                           But yeah, I guess space complexity vs auxiliary
                           space complexity is just a bit ambiguous.
       
                    xigoi wrote 15 hours 4 min ago:
                    Why couldn’t it?
       
                      yxhuvud wrote 14 hours 41 min ago:
                      Because it takes n time to access n units of memory.
                      
                       Heavily simplified due to caches etc., to the point
                       where people sometimes measure in cache misses
                       instead, as that is usually what actually matters.
       
                        xigoi wrote 14 hours 36 min ago:
                        What if you allocate a huge chunk of memory and only
                        use a small part of it? For example, checking if a list
                        of numbers contains duplicates using a boolean array.
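                         
                         Something like this sketch (using made-up names):
                         the flag array is sized by the largest possible
                         value, not by how much of it ever gets touched.
                         
                           #include <cstddef>
                           #include <cstdint>
                           #include <vector>
                           
                           // Detect duplicates among values in [0, max_value] with
                           // one flag per possible value. Space grows with max_value
                           // even if only a handful of flags are ever set.
                           bool has_duplicates(const std::vector<std::uint32_t>& nums,
                                               std::uint32_t max_value) {
                               std::vector<bool> seen(std::size_t{max_value} + 1, false);
                               for (std::uint32_t x : nums) {
                                   if (x > max_value) continue;  // ignore out-of-range
                                   if (seen[x]) return true;     // second occurrence
                                   seen[x] = true;
                               }
                               return false;
                           }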
       
                          Sharlin wrote 13 hours 25 min ago:
                          If your algorithm requires O(n) memory, any O(1)
                          amount of memory can never be enough, no matter how
                          huge. That's the entire point of O notation.
                          
                           And if your implementation of an algorithm
                           allocates more space in the big-O sense than it
                           can actually touch (e.g. O(n) space for O(log n)
                           time or whatever), that's just a wasteful
                           implementation. That doesn't make the algorithm
                           itself require more space than it has time to
                           actually use.
       
       