Just wrapped up my first-ever Databricks Data+AI Summit last week in SF. It was great to connect with colleagues and nerd out about machine learning (ML) and ML operations (MLOps). I learned a whole lot about the maturation of scalable ML across various industries (tech, communications, media, entertainment), and also got a feel for where the sector is heading in the next 2-3 years. Here are some major takeaways and interesting ideas I gathered from the summit.

  • The ML model is not the product; the MLOps system is the product. This reminds me of the saying “amateurs talk strategy, professionals talk logistics.” While fancy ML models may get the most attention in the media, real business value is only created when a model is delivered along with its operational pipeline; one simply cannot succeed without the other. A well-architected MLOps system is what minimizes the hidden technical debt in machine learning systems (to borrow the title of the well-known Google paper) and lets the model do its job.
  • Most companies are still finding ways to integrate ML into their business. Outside of the tech behemoths that started their ML journey early on, most companies – tech and otherwise – are still experimenting with ways to operationalize ML. Some are building fully in-house solutions, others are leaning on vendors like Databricks and H2O.ai, and the rest are somewhere in between. The landscape is evolving fast, and it’s hard to foresee what it’ll look like a few years from now (exciting!).
  • Data-centric ML can deliver more business value than fancy new algorithms. As most ML algorithms mature and gain widespread use, they are no longer the bottleneck in value creation; poor data quality is what slows down time-to-value. Hence the concerted push to go from big data to good data: improving the data curation process with human-in-the-loop labeling that incrementally improves model performance throughout the ML lifecycle (see the first sketch after this list).
  • Big data and AI will permeate to edge devices. So far, most big data and AI technologies have been accessible mainly through specialized applications hosted on centralized servers. Now, lightweight edge devices (think Raspberry Pi) and web applications can start to enjoy the same capabilities through thin-client APIs that expose the full power of Spark, essentially giving these devices a gateway to the distributed execution engine without hosting any of the heavy backend infrastructure themselves (see the second sketch after this list).
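
To make the data-centric point a bit more concrete, here is a minimal sketch of one way a human-in-the-loop labeling loop can work: uncertainty-based sampling, where only the examples the current model is least sure about get routed to human labelers. This is my own illustration rather than a workflow shown at the summit, and ask_human_to_label is a hypothetical stand-in for whatever labeling tool you actually use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def ask_human_to_label(samples):
    """Hypothetical stand-in for a real labeling tool or review queue."""
    raise NotImplementedError("route these samples to your labeling UI")


def labeling_round(model, X_labeled, y_labeled, X_pool, budget=100):
    # Train on everything labeled so far.
    model.fit(X_labeled, y_labeled)

    # Score the unlabeled pool; the lower the top-class probability,
    # the less confident the model is about that example.
    uncertainty = 1.0 - model.predict_proba(X_pool).max(axis=1)
    ask_idx = np.argsort(uncertainty)[-budget:]  # most uncertain rows

    # Only these rows go to humans; fold the new labels back in and repeat.
    new_labels = ask_human_to_label(X_pool[ask_idx])
    X_labeled = np.vstack([X_labeled, X_pool[ask_idx]])
    y_labeled = np.concatenate([y_labeled, new_labels])
    X_pool = np.delete(X_pool, ask_idx, axis=0)
    return model, X_labeled, y_labeled, X_pool


# Start from a small labeled seed set and iterate until the labeling
# budget runs out or validation metrics plateau.
model = LogisticRegression(max_iter=1000)
```

The design choice is the whole point of data-centric ML: spend the limited labeling budget where it moves the model the most, instead of labeling everything up front.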
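
And for the edge-device point, here is what a thin-client connection might look like using Spark Connect (available in PySpark 3.4+), which is, as far as I can tell, the kind of thin-client API in question. The endpoint, bucket, and column names below are made up for illustration; the key idea is that the client only builds a query and ships it to a remote cluster, which does all the heavy lifting.

```python
# Minimal sketch of a thin client talking to a remote Spark cluster via
# Spark Connect. The endpoint and data path are hypothetical.
from pyspark.sql import SparkSession

# The device holds only a lightweight connection; the query is shipped to
# the server, where analysis and execution happen -- not on the Raspberry Pi.
spark = (
    SparkSession.builder
    .remote("sc://spark-server.example.com:15002")  # hypothetical Spark Connect endpoint
    .getOrCreate()
)

readings = spark.read.parquet("s3://example-bucket/sensor-readings/")  # hypothetical path
daily_avg = readings.groupBy("device_id", "reading_date").avg("temperature")
daily_avg.show()
```

The same DataFrame code would run unmodified against a local Spark session, which is exactly the appeal: the device gets the familiar Spark API without carrying the cluster runtime.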