This page is a synthesis artifact of the AI Commons December 2025 Kickoff · Cambridge, MA · Berkman Klein Center & Parsnip · Not a formal consensus statement
December 15–16, 2025 · Berkman Klein Center & Parsnip · Cambridge, MA

the ai commons

§ kickoff synthesis · a shared substrate for consent-aware, provenance-rich public ai
§ the gathering
  • 2 days of working sessions
  • 30+ researchers, builders, archivists & legal experts
  • 3 interdependent workstreams that emerged

"A central governance failure of modern AI is epistemic: we lack shared, verifiable ways to know what went into systems and what that implies."

Over two days in Cambridge, we convened researchers, archivists, open-source builders, legal and policy experts, and cultural institutions to explore what it would take to build a consent-aware, provenance-rich substrate for public AI.

Many frontier AI systems are trained on vast corpora with unclear provenance, ambiguous consent, uneven cultural representation, and no shared canonical grounding. Legal and policy interventions lag behind technical reality, while private licensing solutions risk accelerating enclosure of the open web.

Human institutions historically solved analogous problems through libraries, archives, editorial systems, courts, peer review, and shared public memory. AI systems today lack a widely adopted equivalent. The AI Commons exists to prototype that missing layer.

§ 01

what the ai commons is (and is not)

what it is
  • A modular infrastructure layer for lawful, culturally grounded AI development
  • A consent-aware, provenance-rich substrate combining legal tools, technical standards, registries, and institutions
  • Designed to work first and best for the public domain
  • An interoperability layer built with libraries and archives — not a replacement for them
  • A coalition of people architecting the foundations together
  • A cultural protocol as much as a technical one
what it is not
  • A single centralized repository or global database
  • A claim that one legal theory (fair use, TDM exceptions, licenses) will win
  • A closed consortium that gates participation by default
  • An attempt to outspend hyperscalers in frontier model training
  • An imposition of one governance model across every sector or jurisdiction
  • A system that asks institutions to trust invisible pipelines
§ 02

three pillars of convergence

By the end of Day 2, discussions consistently resolved into three interdependent workstreams. They are not sequential — each depends on the others.

§ pillar i
data & provenance standards
"Make existence, origin, and transformation legible."

Provenance is more than licensing — it includes existence, access conditions, versioning, and transformation history. Metadata itself may become the primary IP and governance artifact.

Registries must support human vs. AI-generated distinctions, training preferences and consent signals, third-party attestations, and federation rather than a single global database. Alignment with libraries, Europeana, DPLA, IIIF, and the Internet Archive is essential.

§ mvp shape
A federated registry supporting machine-readable declarations of origin, consent, and transformation — usable by both humans and training pipelines.
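As a concrete illustration of such a declaration, the sketch below shows what one registry entry might look like. All field names, values, and the schema itself are hypothetical assumptions for illustration, not a published Commons standard:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProvenanceRecord:
    """One hypothetical federated-registry entry covering existence,
    origin, consent, and transformation history. Field names are
    illustrative, not a standardized schema."""
    identifier: str          # stable ID within a registry shard
    origin: str              # custodian institution or source collection
    human_generated: bool    # human- vs. AI-generated distinction
    consent_signal: str      # e.g. "opt-in", "opt-out", "conditional"
    access_conditions: str   # e.g. "public-domain", "on-site-only"
    transformations: list[str] = field(default_factory=list)  # e.g. OCR steps
    attestations: list[str] = field(default_factory=list)     # third-party attestations

    def to_json(self) -> str:
        # Machine-readable form usable by both humans and training pipelines
        return json.dumps(asdict(self), indent=2)

record = ProvenanceRecord(
    identifier="example-registry:0001",
    origin="Example Public Library, digitized newspaper collection",
    human_generated=True,
    consent_signal="opt-in",
    access_conditions="public-domain",
    transformations=["scan", "ocr", "dehyphenation"],
)
print(record.to_json())
```

Federation here would mean that many institutions publish records in a shared shape like this rather than depositing them into one global database.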
§ pillar ii
legal & policy infrastructure
"Build something that works even if the law changes."

Copyright law is insufficient to express moral authorship, consent, or downstream impact. AMPL and related approaches define AI-specific legal primitives — not just licenses.

This pillar starts with public-domain material and lawful custodianship as the lowest-risk wedge. It creates legible options for creators (opt-in, opt-out, or conditional participation) and separates restrictions on use from restrictions on sharing or training.
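The opt-in / opt-out / conditional distinction can be sketched as a simple training gate. This is a hedged illustration only; the signal names and the conservative default are assumptions, not a Commons specification:

```python
def may_train_on(consent: str, conditions_met: bool = False) -> bool:
    """Sketch of a per-record consent gate for a training pipeline.

    "opt-in"      -> usable for training
    "opt-out"     -> excluded from training (may still be shareable)
    "conditional" -> usable only if the declared conditions are satisfied

    Unknown or missing signals default to exclusion (conservative).
    """
    if consent == "opt-in":
        return True
    if consent == "conditional":
        return conditions_met
    return False  # "opt-out" and anything unrecognized
```

A pipeline would apply a gate like this per record, independently of whatever rules govern sharing or display, which is what keeps restrictions on use separate from restrictions on training.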

§ mvp shape
A hardened legal toolkit (not a single license) pairing provenance, consent signals, and institutional assurances — initially optimized for public-domain and library-held collections.
§ pillar iii
shared infrastructure & training pilots
"Demonstrate the stack end-to-end."

Compute access is a bottleneck, but coordination and tooling are equally constraining. Small, high-quality, well-documented data can be disproportionately powerful. Evaluation infrastructure is a democratization lever.

Candidate pilots include a public-domain "Vintage" model (world knowledge constrained to roughly 1930, with the exact cutoff depending on jurisdiction), improved OCR pipelines for public-domain books and newspapers, and transparent training pipelines with data manifests and versioned releases.

§ mvp shape
One or more fully documented, legally conservative training runs demonstrating how Commons infrastructure works in practice.
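A data manifest for such a documented training run could be as simple as a content-hashed file listing. The sketch below is a minimal illustration; the release tag, field names, and layout are assumptions, not an agreed Commons format:

```python
import hashlib
import json

def manifest_entry(name: str, content: bytes, source: str) -> dict:
    """One manifest row: file name, source, size, and content hash,
    so a training run can be audited and reproduced byte-for-byte."""
    return {
        "name": name,
        "source": source,
        "bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }

manifest = {
    "release": "vintage-pd-0.1",        # hypothetical versioned release tag
    "cutoff": "pre-1930 public domain",  # jurisdiction-dependent cutoff
    "files": [
        manifest_entry("gazette_1921.txt", b"...OCR text...", "Example State Archive"),
    ],
}
print(json.dumps(manifest, indent=2))
```

Publishing the manifest alongside each versioned release is what makes a "fully documented, legally conservative" run verifiable by third parties rather than a matter of trust.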
§ 03

core tensions that shaped the workshop

§ tension a
fair use vs. consent vs. legibility
While fair use may ultimately protect many AI training activities, institutions still need legibility, credible risk-reduction, and alignment with public values — not just legal minimums. The Commons does not assume one legal theory will prevail. It makes multiple lawful paths visible and auditable.
§ tension b
open participation vs. quality & safety
True openness risks fragmentation, duplication, and low-quality contributions. The Commons counters this through modularity over monoliths, clear contribution surfaces with different expertise thresholds, and shared evaluation and metadata infrastructure to keep quality high without centralization.
§ tension c
technical possibility vs. power concentration
Training an LLM from public-domain material is technically feasible today. The harder problem is preventing the hourglass shape: many contributors → few trainers → universal downstream dependence. The Commons is deliberately designed to counter concentration dynamics by broadening who can contribute, verify, and reproduce training in the open.
§ 04

who was there

Researchers, archivists, legal scholars, open-source leaders, and public-interest technologists from across the ecosystem — attending in person at Berkman Klein Center and Parsnip, and virtually.

MIT Media Lab Bayspring Cosmos Institute Creative Commons EleutherAI Tidelift The Lean Startup Anaconda Public Knowledge Advanced AI Society Harvard BKC IBM Funding the Commons Emergent Research MIT (CSAIL) Public AI Italy OpenMined Collaborative Futures Institute Prime Intellect Model Corp AI Common Crawl Wikimedia Foundation Harvard Library Innovation Lab Open Source Initiative Air Signal Santa Fe Institute (fmr.)
§ 05

cultural & global commitments

  • Multilinguality and multicultural representation are core, not optional. Historical gaps in the record must be actively addressed, not passively inherited.
  • Libraries and archives are not "data sources" — they are institutional stewards. The Commons is built with them, not around them.
  • Commons participation must offer real agency to those without bargaining power. This means legible opt-in, opt-out, and conditional participation.
  • This must be emotionally compelling — not just correct. Fun, rituals, visibility, and recognition matter. The Commons is as much a cultural protocol as a technical one.
  • No single legal entity can hold this alone. The emerging structure favors resilience over purity: a nonprofit core, public-benefit entities, and a federated network of partners.
  • AI should not depend on trust without evidence. Consent must be operational, provenance verifiable, and accountability enforceable.
§ 06

what success looks like

"Build things that matter even if we fail."

Participants repeatedly emphasized that success is not total victory — it is irreversibility. The goal is to reach a point where Commons infrastructure becomes hard to ignore, and impossible to erase.

§ near-term markers of success
  • A public artifact or service that institutions genuinely rely on
  • A training run others can reproduce, critique, and extend
  • A legal or provenance primitive that becomes a reference point
  • A coalition that keeps meeting, building, and converging
§ get involved

join the commons

We are researchers, builders, archivists, and legal architects working to build the shared infrastructure layer that modern AI has been missing. Reach out if you want to contribute.

hello@aicommons.cc

aicommons.cc