The Foundational Value of Connected Data
Underlying much of the excitement around Legal AI is a flawed assumption: that large language models (LLMs) are the foundational tools that “know” the law, and that solving contracting is primarily a matter of deploying a sufficiently powerful model. This framing misunderstands both the nature of LLMs and the nature of transactional legal work.
The Myth of the All-Knowing Model
LLMs do not possess understanding or judgment. They are probabilistic systems trained to predict the next most likely token based on patterns in large bodies of text. At their core, they are sophisticated pattern completers—highly effective at producing fluent language, but indifferent to truth, strategy, or institutional nuance unless those things are explicitly supplied.
When used generically, LLMs behave like any other commodity source. Commodity tools, applied in commodity ways, produce commodity results. If every firm is querying the same models with similar prompts and abstract notions of “the market,” the outputs inevitably converge toward the average.
Transactional practice is not built on the average. Clients do not pay for generic summaries of standard terms; they pay for judgment shaped by precedent, by experience, and by a firm’s accumulated decisions over time.
Without grounding, a model has no way of knowing that a firm’s partners have recently shifted their stance on Limitation of Liability for SaaS clients, or that, for a recurring counterparty, the firm routinely concedes on Governing Law to win on Indemnification. That knowledge is not encoded in the model, and it cannot be inferred from language alone.
It lives elsewhere. It lives in the firm’s Document Management System (DMS), in its templates and best practices, in executed agreements, in redlines exchanged over years of negotiation, in the cumulative record of what the firm has actually done, and in the heads of its lawyers.
In this light, the LLM is not the law professor. It is the bridge to the data, the people, and the standards that represent contract intelligence. Its role is to connect practitioners to institutional knowledge by functioning as a sophisticated logic layer. It navigates the document corpus to retrieve, assemble, and present a firm’s precedent at the moment it matters, rather than attempting to replace that hard-won knowledge with probabilistic text generation.
The “Drafting Fingerprint” as a Competitive Moat
Every elite transactional practice has a distinct drafting fingerprint. It shows up in how risk is allocated, how leverage is applied, and where a firm consistently draws lines in a negotiation.
Clients choose firms for that specificity. A sponsor hires a firm known for its private equity practice because of its aggressive deal posture and deep familiarity with sponsor-friendly structures. A public company turns to another firm in moments of existential risk for judgment shaped by decades of high-stakes investigations and regulatory crises. Life sciences companies rely on a firm like Ropes & Gray because it understands how regulatory, IP, and financing considerations converge—and how that convergence should be reflected in the documents.
These distinctions are not abstract. They are the cumulative result of thousands of drafting and negotiation decisions: what a firm treats as non-negotiable, where it regularly concedes (the default position and the fallback), and how it adapts “market” terms to specific clients, counterparties, and economic conditions.
That judgment is contextual, and it lives in the firm’s document corpus—in executed agreements, redlines, and evolving clause language over time. Historically, this institutional knowledge was fragmented and informal. Connecting AI directly to historical contract data turns it into a durable, firm-wide asset:
- Demonstrating Market Authority: Providing data-backed proof of “standard” terms based on a firm’s actual volume of similar deals, moving beyond anecdotal evidence.
- Maintaining Client Loyalty: Surfacing client-specific negotiation preferences ensures a level of “concierge” service that makes it difficult for clients to switch firms.
- Preserving Intellectual Property: Ensuring that years of hard-won negotiation outcomes are accessible and deployable on every page of every new draft.
This is the competitive moat. Not the model, but the context embedded in the firm’s own work and the ability to deploy it on demand.
The Engineering of Scale: Context Is the System
Firms struggle to operationalize their deal history because their contracts aren't natively searchable. They are unstructured text files, and treating them as collections of paragraphs misses how transactional lawyers actually work. The structure of a Word document or PDF with paragraphs of text does not directly encode meaningful atomic units; instead, it is clauses, defined terms, and their relationships that matter. To make a contract usable as data, an AI system must extract semantic structure from free-form text—identifying what a provision is, not just where it appears.
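As a minimal sketch of what “extracting semantic structure” might look like, the snippet below splits raw contract text into numbered clauses and pulls out quoted defined terms. The heading pattern, the `Clause` fields, and the numbering convention are illustrative assumptions, not a production parser:

```python
# Sketch: turning free-form contract text into clause-level units.
# The regexes assume a "7.2. Heading. Body..." convention and curly-quoted
# defined terms -- both hypothetical simplifications.
import re
from dataclasses import dataclass, field

@dataclass
class Clause:
    number: str                 # e.g. "7.2"
    heading: str                # e.g. "Limitation of Liability"
    text: str
    defined_terms: list[str] = field(default_factory=list)

HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.\s+([A-Z][^\n.]*)\.", re.MULTILINE)
DEFINED_TERM_RE = re.compile(r"\u201c([A-Z][A-Za-z ]+)\u201d")  # matches “Defined Term”

def extract_clauses(raw: str) -> list[Clause]:
    """Split a contract into clauses keyed by number and heading."""
    matches = list(HEADING_RE.finditer(raw))
    clauses = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        body = raw[m.end():end].strip()
        clauses.append(Clause(
            number=m.group(1),
            heading=m.group(2).strip(),
            text=body,
            defined_terms=sorted(set(DEFINED_TERM_RE.findall(body))),
        ))
    return clauses
```

Once text is normalized into units like these, a provision can be compared across deals by what it *is* (a liability cap, a governing-law election) rather than by where it happens to sit in a Word file.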
That structure only becomes valuable when placed in a broader context. The signal rarely lives in a single agreement. It emerges across the firm’s entire body of work: how similar clauses appear across deal types, clients, counterparties, and market cycles. If that context must be manually uploaded or curated on demand, the system is forcing the user to do a large portion of the work themselves.
Even expert human curation does not solve this at scale. Lawyers rely on partial recall and local precedent—what they have personally seen, what their group tends to do, what a few senior partners remember. That is not institutional knowledge; it is an incomplete and uneven approximation. It’s just one step removed from plain hearsay.
Brute force is not an option. No system can meaningfully reason over millions of documents by passing them wholesale into a context window. Despite foundation-model providers’ advances in expanding context windows, accuracy does not come from tossing more unstructured text into a single prompt. To support large-scale understanding of a firm’s DMS, AI must be equipped with discovery tools that mirror how humans find information: semantic indexing, contextual retrieval, and the ability to assemble the right subset of documents dynamically, based on the question being asked.
That data layer is what enables the LLM to play its most impactful role: coordinating access to structured data, pulling in relevant context, and synthesizing it into usable form. The same infrastructure that allows lawyers to navigate the firm’s knowledge base enables the AI to do the same. And by acting as the glue, it gives the lawyer the context and interface needed to apply human judgment where the leverage is highest.
The Governance of Intelligence: The Challenge of Connected Data
The promise of "connected data" is often at odds with the fundamental reality of legal work: institutional data is not a monolith. Whether in a global law firm or a multi-national corporation, high-value knowledge is trapped behind a complex web of information barriers, jurisdictional restrictions, and strict "need-to-know" access controls.
The true challenge of AI in this environment isn't just processing information; it is ensuring the system respects the same privacy and compliance protocols that govern the rest of the organization. If an AI tool surfaces a relevant precedent but bypasses a matter-level permission or an internal data silo, it’s a liability, not an asset.
Building an architecture that can operate across an enterprise DMS requires more than a secure cloud; it requires deep, real-time integration with existing security logic. For a firm’s data to be useful, it must be governed by the same granular permissions that protect the organization’s most sensitive work. This means the system must verify permissions at the moment of every query, ensuring that insights are surfaced only to those with explicit authorization to see them. There is no room for “leaky” abstractions or delayed syncing when it comes to confidentiality.
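The query-time check described above can be sketched as follows. The `acl` mapping and `matter_id` field are hypothetical; in practice these checks would delegate to the DMS’s own authorization service rather than a local table:

```python
# Sketch: enforce matter-level permissions at query time, not at index time,
# so an ethical wall can never be bypassed by a stale or over-broad index.

def authorized(user: str, doc: dict, acl: dict[str, set[str]]) -> bool:
    """A document is visible only if the user is on its matter's access list."""
    return user in acl.get(doc["matter_id"], set())

def query_with_permissions(user: str, docs: list[dict], acl: dict[str, set[str]], predicate) -> list[dict]:
    # Permission filtering happens before relevance filtering, so no content
    # the user cannot see ever reaches the model or the results list.
    return [d for d in docs if authorized(user, d, acl) and predicate(d)]
```

The design choice worth noting is ordering: authorization runs before any retrieval or synthesis step, which is what makes the system safe even when the index itself is shared firm-wide.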
At scale, and with data this sensitive, security is inseparable from architecture. Any system operating across a DMS must respect ethical walls, matter-level permissions, and access controls in real time. Institutional knowledge only has value if it can be surfaced instantly, accurately, and safely.