LinkedIn open-sourced Dagli, a machine learning library for Java (and other JVM languages) that ostensibly makes it easier to write bug-resistant, readable, modifiable, maintainable, and deployable model pipelines without incurring technical debt.
While machine learning maturity in the enterprise is generally increasing, a 2019 survey from Algorithmia found that half of companies (50%) spend between 8 and 90 days deploying a single machine learning model, with 18% taking longer than 90 days. Most respondents blamed failure to scale, followed by model reproducibility challenges, a lack of executive buy-in, and poor tooling.
With Dagli, the model pipeline is defined as a directed acyclic graph (a graph of vertices and edges in which each edge points from one vertex to another without forming cycles) that is used for both training and inference. The Dagli environment provides pipeline definitions, static typing, near-ubiquitous immutability, and other features that prevent most potential logic errors.
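To make the idea concrete, here is a minimal sketch of what defining a pipeline as a statically typed, immutable graph of composable nodes can look like in Java. This is a hypothetical illustration, not Dagli's actual API: the `Node` interface, `then` method, and `pipeline` helper are invented for this example.

```java
// Hypothetical sketch (NOT Dagli's real API): each vertex in the graph is an
// immutable, statically typed node; wiring one node's output into another
// adds a directed edge. The same graph definition can then serve both
// training and inference without duplicated "glue" code.
public class DagSketch {
    // A node that produces a value of type R from an input of type T.
    interface Node<T, R> {
        R apply(T input);

        // Composing nodes adds a directed edge: this node's output
        // becomes the next node's input. The types must line up, so
        // many wiring mistakes are caught at compile time.
        default <V> Node<T, V> then(Node<R, V> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // A two-vertex graph: a lowercasing step feeding a length "feature".
    static Node<String, Integer> pipeline() {
        Node<String, String> lowercase = s -> s.toLowerCase();
        Node<String, Integer> length = String::length;
        return lowercase.then(length);
    }

    public static void main(String[] args) {
        System.out.println(pipeline().apply("Dagli")); // prints 5
    }
}
```

Because each node is immutable and the edge types are checked by the compiler, a mis-wired graph fails at build time rather than in production, which is the class of logic error the article says Dagli is designed to prevent.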
“Models are typically part of an integrated pipeline … and constructing, training, and deploying these pipelines to production remains more cumbersome than it should be,” LinkedIn natural language processing research scientist Jeff Pasternack wrote in a blog post. “Duplicated or extraneous work is often required to accommodate both training and inference, engendering brittle ‘glue’ code that complicates future evolution and maintenance of the model.”
Dagli works on servers, Hadoop, command-line interfaces, IDEs, and other typical JVM contexts. Plenty of pipeline components are ready to use right out of the box, including neural networks, logistic regression, gradient boosted decision trees, FastText, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformations.
For experienced data scientists, Dagli offers a path to performant, production-ready AI models that are maintainable and extensible in the long term and can leverage an existing JVM technology stack. For software engineers with less experience, Dagli provides an API that can be used with any JVM language and tooling that is designed to avoid typical logic bugs.
“With Dagli, we hope to make efficient, production-ready models easier to write, revise, and deploy, avoiding the technical debt and long-term maintenance challenges that so often accompany them,” Pasternack continued. “Dagli takes full advantage of modern, highly multicore processors and … powerful graphics cards for effective single-machine training of real-world models.”
The release of Dagli comes after LinkedIn made available the LinkedIn Fairness Toolkit (LiFT), an open source software library designed to enable the measurement of fairness in AI and machine learning workflows. Prior to LiFT, LinkedIn debuted DeText, an open source framework for natural language processing-related ranking, classification, and language generation tasks that leverages semantic matching, using deep neural networks to understand member intents in search and recommender systems.
Sourced from VB - written by Kyle Wiggers