_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   DIR   Launch HN: Tamarind Bio (YC W24) – AI Inference Provider for Drug Discovery
       
       
        dannykwells wrote 8 min ago:
         Tamarind is the best! Truly a YC company that provides bona fide
         value to biologists like us. (Hi Sherry!)
       
        tmychow wrote 8 hours 37 min ago:
        Congrats on the launch - excited to try it out!
       
        conradry wrote 1 day ago:
        You may find this library I wrote a couple years ago interesting: [1] .
        Curious about why you chose to make separate images for each model
        instead of copy-pasting source code into a big monorepo (similar to
        Huggingface transformers).
        
   URI  [1]: https://github.com/conradry/prtm
       
          denizkavi wrote 1 day ago:
          Oh yeah, I've seen this before! Cool stuff
          
           I would say the primary concerns were: dependency issues, and
           needing more than model weights to consume a model (Multiple
           Sequence Alignment needs to be split out into its own always-on
           server, and so on). It's more convenient when the inputs and
           outputs are hardened interfaces between separate environments.
          
           Our general finding in the BioML space is that the models are not
           at all standardized, especially compared to, say, the diffusion
           model world, so treating each model as its own environment with its
           own often-weird dependencies helped us ship more tools quicker.
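           The "hardened interfaces" idea above can be sketched roughly like
           this (function and key names are hypothetical, not Tamarind's
           actual code): each model ships in its own container image, and the
           platform only ever sees a normalized JSON-style contract, so a
           model's odd dependencies, like a resident MSA server, stay behind
           that boundary.

```python
# Sketch of a per-model interface contract (hypothetical names, not
# Tamarind's actual code). Each model container accepts a normalized
# job dict and returns a normalized result dict; dependency quirks
# (e.g. a long-running MSA server) stay inside the container image.

REQUIRED_INPUT_KEYS = {"model", "inputs"}

def validate_job(job):
    """Check that a job conforms to the shared contract before dispatch."""
    missing = REQUIRED_INPUT_KEYS - job.keys()
    if missing:
        raise ValueError(f"job missing keys: {sorted(missing)}")
    return job

def normalize_result(raw):
    """Wrap a model-specific result into the shared output schema."""
    return {"status": "succeeded", "outputs": raw}

# The dispatcher only ever deals in this shape, regardless of model:
job = validate_job({
    "model": "esmfold",                   # selects the container image
    "inputs": {"sequence": "MKTAYV"},     # model-specific payload
})
result = normalize_result({"pdb": "structure.pdb", "plddt": 91.2})
```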
       
        johnsillings wrote 1 day ago:
         Selling to big pharma companies as a startup is hard, so huge props
         on getting adoption there. The product looks very slick.
       
        t_serpico wrote 1 day ago:
        nice stuff! how do you handle security concerns big pharma may have?
        wouldn't they just run their stuff on-prem?
       
          denizkavi wrote 1 day ago:
          It certainly was an investment for us to meet the security and
          enterprise-readiness criteria for our enterprise users. As an n of 1,
           we don't tend to do on-prem, and even many of the most skeptical
          companies will find a way to use cloud if they want your product
          enough.
          
          I think most large companies have similar expectations around
          security requirements, so once those are resolved most IT teams are
          on your side. We occasionally do some specific things like allowing
          our product to be run in a VPC on the customer cloud, but I imagine
          this is just what most enterprise-facing companies do.
       
        the__alchemist wrote 1 day ago:
        Cool project! I have a question based on the video: What sort of work
        is it doing from the "Upload mmCIF file and specify number of molecules
         to generate" query? That seems like a broad ask. For example, is it
         performing ML inference on a data set of protein characteristics, or
        pockets in that protein? Using a ligand DB, or generating ligands? How
        long does that run take?
       
          denizkavi wrote 1 day ago:
           In this case the input to the model is the structure of the
           protein target, i.e. you can define the whole search space for it
           to try to find a binder/drug against. We let you pick a preset
           recipe at the top, which covers the common ways people use this
           protocol. The model itself can find a pocket, or the user can
           specify one if they know it ahead of time. There is a very
           customizable variant of this tool, where you can set distances
           between individual atoms or make a custom scaffold for your
           starting molecule, but 90% of the time the presets tend to be
           sufficient.
          
           Runs vary significantly between models/protocols: some generative
           models can take several hours, while others finish in a few
           seconds. We have tools that screen against DBs if the goal is to
           find an existing molecule to act against the target, but often
           people will import an existing starting point and modify it, or
           design completely novel ones on the platform.
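           As a rough illustration of the preset-vs-custom distinction
           described above, here is a hypothetical job-spec builder (all
           field and preset names are invented for illustration; this is not
           Tamarind's actual API):

```python
# Hypothetical binder-design job spec (field/preset names invented for
# illustration; not Tamarind's actual API). The user either picks a
# preset recipe, specifies the pocket explicitly, or leaves both out
# and lets the model find a pocket on its own.

def binder_design_job(target_cif, num_designs, preset=None,
                      pocket_residues=None):
    """Build a job dict; preset and explicit pocket are mutually exclusive."""
    if preset is not None and pocket_residues is not None:
        raise ValueError("choose either a preset recipe or an explicit pocket")
    job = {"target": target_cif, "num_designs": num_designs}
    if preset is not None:
        job["preset"] = preset             # e.g. a common recipe name
    elif pocket_residues is not None:
        job["pocket"] = pocket_residues    # residues the binder must target
    # else: the model searches the whole surface for a pocket
    return job

# Common case: a preset recipe over an uploaded mmCIF structure.
job = binder_design_job("target.cif", 100, preset="minibinder-default")
```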
       
        washedDeveloper wrote 1 day ago:
         The org I work for develops HTCondor. We have a lot of scientists
         who end up running AlphaFold and other bio-related models on our
         pool of GPUs and CPUs. I am curious how and why your team
         implemented yet another job scheduler. HTCondor is agnostic to the
         software being run, so maybe there is more clever scheduling you can
         come up with. That said, HTCondor also has pretty high flexibility
         with regard to policy.
       
          denizkavi wrote 1 day ago:
           That’s interesting. We’ve developed a Kubernetes-based scheduler
           that we’ve found better accounts for our custom job-priority
           needs, allows stricter data isolation between tenants, and gives
           us a production-grade control plane, though the core scheduling
           could certainly be implemented in something like HTCondor.
          
           Originally, my first instinct was to use Slurm or AWS Batch, but
           we started having problems once we tried to go multi-cloud. We're
           also optimizing for being able to onboard an arbitrary codebase as
           fast as possible, so building a custom system natively compatible
           with our containers (which are now automatically built from Linux
           machines with the relevant models deployed) has been helpful.
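           A minimal sketch of what such a Kubernetes-native setup might look
           like (the manifest fields are standard Kubernetes; the namespace
           scheme, PriorityClass name, and values are invented for
           illustration): one namespace per tenant gives strict data
           isolation, and a PriorityClass carries the custom job-priority
           policy.

```python
# Sketch of a Kubernetes batch/v1 Job manifest for a per-tenant
# inference run (standard Kubernetes fields; the namespace scheme and
# PriorityClass name are invented). One namespace per tenant isolates
# data; the PriorityClass encodes custom scheduling priority.

def inference_job_manifest(tenant, model_image, job_name, gpus=1):
    """Build a Job manifest dict for one model container run."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": job_name,
            "namespace": f"tenant-{tenant}",   # per-tenant isolation
        },
        "spec": {
            "backoffLimit": 2,                 # retry transient failures
            "template": {
                "spec": {
                    "priorityClassName": "inference-standard",
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "model",
                        "image": model_image,  # one image per model
                        "resources": {
                            "limits": {"nvidia.com/gpu": str(gpus)},
                        },
                    }],
                },
            },
        },
    }

manifest = inference_job_manifest(
    "acme", "registry.example/alphafold2:v1", "fold-123")
```

           A dict like this could be submitted via any Kubernetes client;
           swapping the PriorityClass per job is what lets a control plane
           express policies a generic queue would not know about.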
       
        Akshay0308 wrote 1 day ago:
        That's really cool! How much do scientists at big pharma use
        open-source models as opposed to models trained on their proprietary
        data?  Do you guys have tie-ups to provide inference for models used
        internally at big pharma trained on proprietary data?
       
          denizkavi wrote 1 day ago:
           A good amount of both! I would say proprietary models tend to be
           fine-tuned versions of the published ones, although many are
           completely new architectures. We also let folks fine-tune models
           with their proprietary data on Tamarind directly.
          
           We do let people onboard their own models too: users just see a
           separate tab for their org, which is where all the scripts, Docker
           images, and notebooks their developers have built interfaces for
           live on Tamarind.
       
        machbio wrote 1 day ago:
         Looks good - would have really appreciated it if the pricing page
         contained some example prices instead of "book a meeting".
       
          denizkavi wrote 1 day ago:
          That's fair, I wish we were able to just add in a calculator for
          getting a price on a per hour basis, given your models of interest
          and intended volume.
          
           We actually did have this available early on; our rationale for
           structuring it differently now is basically that there is a lot of
           diversity in how people use us. We have cases where a twenty-person
           biotech company will consume more inference than a
           several-hundred-person org. Each tool has very different compute
           requirements, and people may not know which model exactly they
           will be using. Basically we weren't able to let people calculate
           the usage, annual commitment, integration, and security
           requirements in one place.
          
           We do have a free tier, which tends to give a decent estimate of
           usage hours, and a form you can fill out so we can get back to you
           with a more precise price.
       
        brandonb wrote 1 day ago:
        Congrats on the launch. I always love to see smart ML founders applying
        their talents to health and bio.
        
        What were the biggest challenges in getting major pharma companies
        onboard? How do you think it was the same or different compared to
        previous generations of YC companies (like Benchling)?
       
          denizkavi wrote 1 day ago:
           Thanks! I think an advantage we had over previous generations of
           companies is that the demand and value for software has become
           much clearer to biopharma. The models are beginning to actually
           work for practical problems, most companies have AI, data science,
           or bioinformatics teams that apply these workflows, and AI has
           management buy-in.
          
           Some of the same problems exist: large enterprises don't want to
           process their unpatented, future billion-dollar drug via a
           startup, because leaking data could destroy 10,000 times the value
           of the product being bought.
          
           Pharma companies are especially not used to buying products vs.
           research services. There's also a history of the industry not
           being served with high-quality software, so it is kind of a habit
           to build custom things internally.
          
          But I think the biggest unlock was just that the tools are actually
          working as of a few years ago.
       
            idontknowmuch wrote 1 day ago:
            What tools are "actually working" as of a few years ago? Foundation
            models, LLMs, computer vision models? Lab automation software and
            hardware?
            
            If you look at the recent research on ML/AI applications in
            biology, the majority of work has, for the most part, not provided
            any tangible benefit for improving the drug discovery pipeline
            (e.g. clinical trial efficiency, drugs with low ADR/high efficacy).
            
            The only areas showing real benefit have been off-the-shelf LLMs
            for streamlining informatic work, and protein folding/binding
            research. But protein structure work is arguably a tiny fraction of
            the overall cost of bringing a drug to market, and the space is
            massively oversaturated right now with dozens of startups chasing
            the same solved problem post-AlphaFold.
            
            Meanwhile, the actual bottlenecks—predicting in vivo efficacy,
            understanding complex disease mechanisms, navigating clinical
            trials—remain basically untouched by current ML approaches. The
            capital seems to be flowing to technically tractable problems
            rather than commercially important ones.
            
            Maybe you can elaborate on what you're seeing? But from where I'm
            sitting, most VCs funding bio startups seem to be extrapolating
            from AI success in other domains without understanding where the
            real value creation opportunities are in drug discovery and
            development.
       
              unignorant wrote 1 day ago:
              These days it's almost trivial to design a binder against a
              target of interest with computation alone (tools like boltzgen,
              many others). While that's not the main bottleneck to drug
              development (imo you are correct about the main bottlenecks),
              it's still a huge change from the state of technology even 1 or 2
              years ago, where finding that same binder could take months or
              years, and generally with a lot more resources thrown at the
              problem. These kinds of computational tools only started working
              really well quite recently (e.g., high enough hit rates for small
              scale screening where you just order a few designs, good Kd,
              target specificity out of the box).
              
              So both things can be true: the more important bottlenecks
              remain, but progress on discovery work has been very exciting.
       
                idontknowmuch wrote 13 hours 52 min ago:
                 As noted, I agree on the great strides made in the protein
                 space. However, the oversaturation and redundancy of tools
                 and products in this space should make it pretty obvious
                 that selling API calls and compute time for protein binding,
                 and related tasks, isn't a viable business beyond the short
                 term.
       
       
   DIR <- back to front page