       
       
       COMMENT PAGE FOR:
   URI   Microsoft Python Driver for SQL Server
       
       
        wiseowise wrote 14 min ago:
          [1] Is this generated by an LLM? The comments read like generic
         LLM slop.
        
   URI  [1]: https://github.com/microsoft/mssql-python/blob/main/mssql_pyth...
       
        jollyllama wrote 1 hour 47 min ago:
        Oooh, has it got sql_variant support? And how far back does SQL Server
        compatibility go?
       
        gigatexal wrote 2 hours 52 min ago:
         If MSSQL really wanted to become more mainstream, they'd release a
         properly free version to compete with MySQL/MariaDB and PostgreSQL.
        
        I've not used MSSQL since 2015/2016 and haven't missed much.
        
        Now I live in the OLAP space so I think of it far, far less.
       
          stackskipton wrote 2 hours 1 min ago:
          Money here means this won’t happen.
          
           Sure, greenfield projects don't use MSSQL, but there are a ton of
           companies stuck with MSSQL that will continue to fork over big
           licensing money.
       
            gigatexal wrote 1 hour 23 min ago:
             Which means it'll still stay irrelevant.
       
        denis_dolya wrote 3 hours 48 min ago:
        I’ve been working with SQL Server from Python on various platforms
        for several years. The new Microsoft driver looks promising,
        particularly for constrained environments where configuring ODBC has
        historically been a source of friction.
        
        For large data transfers — for example, Pandas or Polars DataFrames
        with millions of rows — performance and reliability are critical. In
        my experience, fast_executemany in combination with SQLAlchemy helps,
        but bulk operations via OpenRowSets or BCP are still the most
        predictable in production, provided the proper permissions are set.
        
        It’s worth noting that even with a new driver, integration complexity
        often comes from platform differences, TLS/SSL requirements, and
        corporate IT policies rather than the library itself. For teams looking
        to simplify workflows, a driver that abstracts these nuances while
        maintaining control over memory usage and transaction safety would be a
        strong improvement over rolling your own ODBC setup.
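         
         A minimal sketch of the fast_executemany + SQLAlchemy route mentioned
         above, assuming a pyodbc-backed engine and ODBC Driver 18 (the
         connection string, table name, and chunk size are placeholders):
         
             # Sketch only: connection string and table name are placeholders.
             import pandas as pd
             from sqlalchemy import create_engine
             
             engine = create_engine(
                 "mssql+pyodbc://user:password@myserver/mydb"
                 "?driver=ODBC+Driver+18+for+SQL+Server",
                 fast_executemany=True,  # batch the INSERTs at the ODBC layer
             )
             
             df = pd.DataFrame({"id": range(1_000_000), "val": "x"})
             df.to_sql("target_table", engine, if_exists="append",
                       index=False, chunksize=50_000)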
       
          th0ma5 wrote 3 hours 39 min ago:
           This is the correct perspective. Driver issues often transcend
           technical and political boundaries. My old team dropped a vendor
           who changed the features of a driver, then spent several years
           trying to find another while making that vendor reapply and make a
           new case, which didn't work out for them.
       
        __mharrison__ wrote 4 hours 30 min ago:
         Very cool. It used to be a huge pain to connect to SQL Server from
         Python (especially on non-Windows platforms).
       
          brewmarche wrote 2 hours 44 min ago:
          I have very limited Python experience but I remember using pymssql in
          the past. Are there any problems with it?
       
            yread wrote 23 min ago:
             IIRC it had problems with named instances, but only on some newer
             versions of SQL Server.
       
            mrweasel wrote 1 hour 50 min ago:
            We used it heavily 10 years ago. It was okay, but it had a rocky
            history. For a long time it seemed abandoned, then Python 3
            happened and we had to patch our own version for a while. Then a
             new maintainer took over, and stuff would just break or APIs
             would be rewritten between versions. We ran our own PyPI instance
             to deal with pymssql specifically.
            
            In the later years it became rather good though and with Python 3
            many of the issues with character encoding went away.
       
          qsort wrote 4 hours 4 min ago:
           I do expect this package to make connecting easier, but it was
           okay even before. ODBC connectivity via pyodbc has always worked
           quite well and wasn't really any different from any other ODBC
           source. I'm more on the data engineering side and I'm very picky
           about this kind of stuff, and I don't expect the average user would
           even notice beyond the initial pain of configuring ODBC from
           scratch.
       
            tracker1 wrote 3 hours 20 min ago:
            IIRC, I had trouble if I installed the MS ODBC driver and some of
            the updates for Ubuntu (WSL) out of order.  I generally prefer a
            language driver package where available.
            
             It would be nice if MS and Deno could figure things out to get
             SQL Server working in Deno.
       
        abirch wrote 4 hours 44 min ago:
         What my work self would love is to easily dump pandas or Polars data
         frames to tables in SQL Server as fast as possible. I see bcp
         mentioned, but I don't see an example of uploading a large pandas
         DataFrame to SQL Server.
       
          bob1029 wrote 1 hour 56 min ago:
           With BULK INSERT, the data source doesn't have to live on the
           actual MSSQL box. It can be something on a UNC path or even in
           Azure: [1]
           
           > Beginning with SQL Server 2017 (14.x), the data_file can be in
           Azure Blob Storage.
          
          You don't need to have access to the MSSQL box to invoke high
          performance batch loads. UNC should give you what you need if your IT
          department is willing to work with it.
          
   URI    [1]: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-...
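           
           A rough sketch of that route from Python, assuming pyodbc and a UNC
           share that the SQL Server service account can read (server, table,
           and file path are placeholders):
           
               # Sketch only: server, database, table, and UNC path are placeholders.
               import pyodbc
               
               conn = pyodbc.connect(
                   "DRIVER={ODBC Driver 18 for SQL Server};"
                   "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
               )
               conn.execute(r"""
                   BULK INSERT dbo.target_table
                   FROM '\\fileserver\share\stage.tsv'
                   WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2);
               """)
               conn.commit()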
       
          kermatt wrote 2 hours 36 min ago:
           While bcp lacks some features that would make this as
           straightforward as PostgreSQL (e.g. piping data into bcp from
           stdin), it is a fast ingest option for MSSQL.
          
          We wound up staging a local tab delimited file, and importing via
          bcp:
          
              bcp "$DESTINATION_TABLE" in "$STAGE_FILE.dat" -u -F 2 -c -t'\t'
          
          Not elegant, but it works.
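           
           A rough sketch of that staging step from Python; it shells out to
           the bcp utility, which must be installed, and the server, table,
           and trusted-connection auth are placeholders:
           
               # Sketch only: stage a DataFrame as a tab-delimited file, then call bcp.
               import subprocess
               import pandas as pd
               
               df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
               stage = "stage_file.dat"
               df.to_csv(stage, sep="\t", index=False)  # header row skipped via -F 2
               
               subprocess.run(
                   ["bcp", "mydb.dbo.target_table", "in", stage,
                    "-S", "myserver", "-T",    # -T: trusted (Windows) authentication
                    "-c", "-t\t", "-F", "2"],  # char mode, tab delimiter, skip header
                   check=True,
               )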
       
          RaftPeople wrote 2 hours 55 min ago:
          > What my workself would love is to easily dump Pandas or Polar data
          frames to SQL Tables in SQL Server as fast as possible
          
          We run into this issue also where we want to upload volumes of data
          but don't want to assume access to BCP on every DB server.
          
           We wrote a little utility that actually works pretty fast compared
           to other methods we've tested (fast = about 1,000,000 rows per
           minute for a table with 10 columns of random data). Here's the
           approach:
          
           1-Convert rows into fixed-length strings so each row is uploaded
           as one single varchar column (which makes parsing and execution of
           the SQL statement during upload much quicker).
           
           2-Repeatedly upload groups of fixed-length rows into a temp table
           until all are uploaded.
          
          Details:
          
          Multiple fixed length rows are combined into one fixed length varchar
          column that will be uploaded as one single raw buffer row. We found a
          buffer size of 15,000 to be the sweet spot.
          
          Multiple threads will each process a subset of source data rows.  We
          found 5 threads to be generally pretty good.
          
           At the end of this step, the destination temp table will have X
           rows of buffers (the buffer column is just a varchar(15000)), and
           inside each of those buffers are Y source data rows with Z columns
           in fixed format.
          
           3-Once the buffer rows are all uploaded, split out the source data
           rows and columns using a temp sproc generated for the exact schema
           (e.g. substring(Buffer_Data, x, y) as Cust_Name).
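           
           A toy sketch of the packing side of that approach; column widths,
           buffer size, and names are placeholders, and the threading and
           temp-sproc generation are omitted:
           
               # Sketch only: widths, buffer size, and column names are placeholders.
               widths = {"Cust_Id": 10, "Cust_Name": 40, "Balance": 12}
               BUFFER_SIZE = 15000
               
               def pack_row(row):
                   # Pad or truncate each value to its fixed width, then concatenate.
                   return "".join(str(row[c]).ljust(w)[:w] for c, w in widths.items())
               
               def pack_buffers(rows):
                   # Pack as many fixed-length rows as fit into each 15,000-char buffer.
                   row_len = sum(widths.values())
                   per_buffer = BUFFER_SIZE // row_len
                   packed = [pack_row(r) for r in rows]
                   for i in range(0, len(packed), per_buffer):
                       yield "".join(packed[i:i + per_buffer])
               
               # Each yielded buffer is inserted as one varchar(15000) value into a
               # temp table; server-side, expressions like
               # SUBSTRING(Buffer_Data, 11, 40) AS Cust_Name split the columns out.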
       
          qsort wrote 3 hours 57 min ago:
          How large? In many cases dumping to file and bulk loading is good
          enough. SQL Server in particular has openrowsets that support bulk
          operations, which is especially handy if you're transferring data
          over the network.
       
            abirch wrote 3 hours 50 min ago:
             Millions of rows large. I tried the OPENROWSET approach but
             encountered permission issues with the shared directory. Using
             fast_executemany with SQLAlchemy has helped, but it sometimes
             takes a few minutes. I tried bcp locally as well, but IT has not
             wanted to deploy it to production.
       
          sceadu wrote 4 hours 1 min ago:
           You might be able to do it with ibis. Don't know about the
           performance, though.
       
            abirch wrote 3 hours 55 min ago:
            Thank you, I'll look into this. Yes performance is the main driver
            when some data frames have millions of rows.
       
          A4ET8a8uTh0_v2 wrote 4 hours 37 min ago:
           Honestly, what I find myself doing more often than not lately is
           not having problems with the actual data/code/schema, but instead
           fighting with layers of bureaucracy, restrictions, data leakage
           prevention systems, and specific file limitations imposed by the
           previously listed items...
          
          There are times I miss being a kid and just doing things.
       
        zurfer wrote 4 hours 57 min ago:
         This is really timely. I just needed to build a connector to Azure
         Fabric, and it requires ODBC Driver 18, which in turn requires
         OpenSSL to allow deprecated and old versions of TLS.
         Now I can revert all of that and make it clean :)
       
          zurfer wrote 2 hours 40 min ago:
           Actually, bad luck: it seems this doesn't support Microsoft Fabric
           with the data warehouse engine. It fails because Fabric doesn't
           support the DECLARE CURSOR operation, which this driver relies on.
       
       