Web Machine Learning CG Meeting Briefing – March 31, 2025

This briefing document summarizes the main themes and important ideas discussed during the Web Machine Learning Community Group (CG) teleconference held on March 31, 2025, focusing on the EU-APAC timezone-friendly session. The information is derived from the meeting agenda and the minutes recorded for that meeting.

Key Themes:

  • Progress and Focus on Built-in AI APIs: The CG is actively exploring and developing several built-in AI APIs for the web platform. There is a strong emphasis on identifying valuable use cases, ensuring implementability, and aiming for interoperability across browser engines.
  • Incubation of New APIs: The Proofreader API is being proposed as a new CG deliverable, with positive initial feedback and a plan to initiate the adoption process.
  • Review and Feedback on Existing Proposals: Discussions revolved around the Technical Architecture Group (TAG) review feedback for the Writing Assistance APIs, as well as earlier feedback from Mozilla and the WICG.
  • Exploring Features and Directions for the Prompt API: Several new feature requests for the Prompt API were discussed, including exposing maximum image/audio limits, enabling multimodal real-time capabilities, and exploring DOM integration.
  • Cross-cutting Considerations: The meeting touched upon broader topics relevant to all incubated APIs, such as on-device vs. cloud-based processing, workload placement, custom models, privacy, and transparent model reporting.
  • Importance of Use Cases and Interoperability: Throughout the discussions, the need to ground API proposals in concrete user needs and to strive for interoperable implementations across different browsers was consistently highlighted.

Most Important Ideas and Facts:

1. Proofreader API Moving Towards Adoption:

  • The Proofreader API, designed to find and correct grammar, spelling, and punctuation errors in text, is being considered for adoption as a CG deliverable (a rough usage sketch follows this list).
  • Anssi, the chair, stated that based on the discussion, the Proofreader API appears ready for adoption, noting that it has “real-world use cases valuable to users,” is “demonstrated to be implementable,” and “aims to produce a spec that allows interoperable implementation between multiple browser engines.”
  • Domenic from Google Chrome expressed support for incubating this API within the group and suggested it could potentially be polyfilled on top of the Prompt API, though the immediate shipping of the Prompt API is uncertain.
  • Tarek from Mozilla expressed the need for a “deeper look at this API” before the next meeting.
  • Anssi will soon initiate the adoption process, which involves a 30-day group vote.
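
To make the proposal concrete, here is a minimal usage sketch. The `Proofreader`, `create()`, and `proofread()` names and the result shape are assumptions based on the direction discussed, not a settled API.

```typescript
// Hypothetical sketch of the Proofreader API surface discussed in the meeting.
// The names (Proofreader, create, proofread, corrections) follow the general
// direction of the explainer but are assumptions; the final shape may differ.
declare const Proofreader: {
  create(options?: { expectedInputLanguages?: string[] }): Promise<{
    proofread(input: string): Promise<{
      correctedInput: string;
      corrections: Array<{ startIndex: number; endIndex: number; correction: string }>;
    }>;
  }>;
};

async function checkDraft(draft: string): Promise<string> {
  const proofreader = await Proofreader.create({ expectedInputLanguages: ["en"] });
  const result = await proofreader.proofread(draft);
  // Each correction points back into the original string, so an editor UI could
  // underline the offending range and offer the suggested replacement.
  for (const c of result.corrections) {
    console.log(`"${draft.slice(c.startIndex, c.endIndex)}" -> "${c.correction}"`);
  }
  return result.correctedInput;
}
```

Domenic’s polyfill remark suggests the same surface could, in principle, be backed by a Prompt API session asked to return corrections in a structured form, which is one reason the group treats the two proposals as related.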

2. Writing Assistance APIs Under Review:

  • The initial TAG review feedback for the Writing Assistance APIs was expected but not yet available for this meeting.
  • Domenic mentioned that naming feedback from the TAG has been addressed and that the privacy feedback has an open PR (webmachinelearning/writing-assistance-apis#47).
  • Mozilla’s earlier feedback (mozilla/standards-positions#1067) indicates that their position is still a “WIP” because their Web Extension work, which addresses similar tasks, is still new. Tarek highlighted Mozilla’s interest in the Prompt API, which aligns more closely with their current focus on letting web developers experiment with the models they choose, in contrast to built-in APIs that hide model details: “…on our side the intent is to make sure web developers can experiment with models they want, built-in AI APIs hide the model details.”
  • Feedback from the WICG raised the question of whether multiple built-in AI APIs are necessary if a more general Prompt API exists. Domenic clarified that the future of the Prompt API in browsers is still uncertain.
  • The Writing Assistance APIs specification is reported to be “getting very complete this week” and should be in good shape; the main remaining task is adding an introduction section.

3. Prompt API Feature Requests and Discussions:

  • Exposing Max Image/Audio Limits: Domenic raised the question of whether the Prompt API should expose the maximum number of images or audio inputs a model can accept before throwing an error. While a partner requested this for displaying a character counter, Domenic leaned towards the API design principle of “When in doubt, leave it out,” which Anssi supported by referencing Postel’s law and Joshua Bloch’s principle. Christian suggested it could be reasonable if specific partners have a need (a sketch of what exposing such a limit might look like follows this list).
  • Multimodal Real-time Capabilities: Christian advocated for exploring real-time interactions with the Prompt API, noting that on-device capabilities currently lag behind cloud-based offerings. He highlighted use cases such as “talk to your model” and showed a demo of an insurance claim being filled in by voice using a cloud-based real-time API. Tarek suggested a “hybrid approach” in which easier tasks are handled locally and harder ones on the server, and Domenic acknowledged the interest in hybrid solutions as a fallback (a minimal fallback sketch follows this list).
  • DOM Integration: Adam Sobieski presented a proposal to extend the Prompt API’s multimodal support to DocumentFragments, allowing parts of the DOM to be processed. Domenic raised two main concerns: 1) the lack of clear use cases that cannot be met with existing methods, and 2) the fact that LLMs operate on text, so feeding the HTML text (e.g., using .outerHTML) might achieve similar results, since the underlying implementation would likely convert the DocumentFragment to text anyway (a serialization sketch follows this list). Christian found the proposal interesting but suggested it might relate to the broader “Agentic AI” space and raised questions about serialization at larger scale.
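
To illustrate the trade-off in the first item above, the sketch below assumes a hypothetical `maxImageInputs` field on the Prompt API’s params object; without such a field (the “leave it out” position), an app only discovers the limit by attaching inputs and handling the resulting error.

```typescript
// Hypothetical: if the Prompt API exposed per-model input limits, a UI could
// render a counter up front. LanguageModel.params() and maxImageInputs are
// assumed names for illustration, not settled API.
declare const LanguageModel: {
  params(): Promise<{ maxImageInputs?: number; maxAudioInputs?: number }>;
};

async function describeImageBudget(alreadyAttached: number): Promise<string> {
  const params = await LanguageModel.params();
  if (params.maxImageInputs === undefined) {
    // "When in doubt, leave it out": without the field, the app learns the
    // limit only by attaching inputs and catching the rejection.
    return "image limit unknown";
  }
  return `${params.maxImageInputs - alreadyAttached} image slots remaining`;
}
```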
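
The hybrid approach from the second item can be sketched as a simple availability check: use an on-device session when one is available and fall back to a cloud endpoint otherwise. The `LanguageModel` methods and the `/api/prompt` endpoint are assumed names for illustration.

```typescript
// Sketch of the hybrid placement discussed above: latency-sensitive or simple
// tasks run on-device, with a cloud service as fallback for everything else.
// LanguageModel.availability()/create()/prompt() and /api/prompt are assumptions.
declare const LanguageModel: {
  availability(): Promise<string>;
  create(): Promise<{ prompt(input: string): Promise<string> }>;
};

async function promptHybrid(input: string): Promise<string> {
  if ((await LanguageModel.availability()) === "available") {
    const session = await LanguageModel.create();
    return session.prompt(input); // on-device path
  }
  // Cloud fallback (hypothetical endpoint) for unavailable or harder tasks.
  const response = await fetch("/api/prompt", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input }),
  });
  const { output } = await response.json();
  return output;
}
```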
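
On the third item, Domenic’s counterpoint is that a page can already serialize the markup it wants the model to see and pass it as text, which is roughly what an engine would do with a DocumentFragment internally. A minimal sketch, assuming a Prompt API session with a text `prompt()` method:

```typescript
// Serialize part of the DOM to HTML text and prompt with it directly,
// approximating what built-in DocumentFragment support would do under the hood.
// LanguageModel.create()/prompt() are assumed names for a Prompt API session.
declare const LanguageModel: {
  create(): Promise<{ prompt(input: string): Promise<string> }>;
};

async function summarizeRegion(selector: string): Promise<string> {
  const element = document.querySelector(selector);
  if (!element) throw new Error(`No element matches ${selector}`);
  // For an Element, outerHTML yields the serialized markup; a DocumentFragment
  // would need XMLSerializer or a temporary container instead.
  const html = element.outerHTML;
  const session = await LanguageModel.create();
  return session.prompt(`Summarize the content of this HTML snippet:\n\n${html}`);
}
```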

4. Cross-cutting Open Questions:

  • Anssi initiated a brainstorming session on cross-cutting topics relevant to the incubated APIs, encouraging participant input based on their interests.
  • Several potential topics were listed, including:
      • On-device vs. cloud-based processing (with the Web Speech API SpeechRecognitionMode as an example evolutionary path).
      • Workload placement (using WebNN MLPowerPreference and WebGPU GPUPowerPreference as hints; see the sketch after this list).
      • Custom models (use cases, cross-origin sharing, developer interest signals, and experiments such as Mozilla’s WebExtensions AI API).
      • Custom model storage (learnings from the Firefox AI Runtime model cache experiment).
      • Interoperability testing (proposed WebDriver BiDi protocol extensions).
      • Privacy story (assessments for the Writing Assistance, Translator, and Language Detector APIs).
      • Transparent model reporting (the Firefox AI Runtime experiment and TAG feedback on Model Cards).
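
On the workload placement item, both WebNN and WebGPU already accept a power-preference hint when acquiring a context or adapter, and something similar could steer where built-in AI workloads run. A minimal sketch using those existing hints; the inline typings are only there because neither API is in the default TypeScript lib yet.

```typescript
// Existing placement hints: WebNN's MLPowerPreference and WebGPU's
// GPUPowerPreference. Both are hints, not guarantees, about how the workload
// should be scheduled.
type PowerPreference = "low-power" | "high-performance";

const ml = (navigator as any).ml as {
  createContext(options?: { powerPreference?: "default" | PowerPreference }): Promise<unknown>;
};
const gpu = (navigator as any).gpu as {
  requestAdapter(options?: { powerPreference?: PowerPreference }): Promise<unknown>;
};

async function acquireLowPowerBackends() {
  // WebNN: hint that the ML context should favor power efficiency over speed.
  const mlContext = await ml.createContext({ powerPreference: "low-power" });
  // WebGPU: the equivalent hint when requesting an adapter.
  const gpuAdapter = await gpu.requestAdapter({ powerPreference: "low-power" });
  return { mlContext, gpuAdapter };
}
```

The open question listed above is whether the built-in AI APIs should accept a similar hint.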

5. Participants and Collaboration:

  • The meeting had representation from several organizations, including Intel, Google Chrome, Mozilla, Microsoft, and individuals interested in AI in education.
  • Anssi highlighted the recent progress, new contributors, and new ideas within the CG.
  • Domenic (Google) emphasized his spec experience and interest in helping with built-in AI APIs.
  • Tarek (Mozilla) positioned their Firefox AI Runtime work (based on ONNX Runtime) and Web Extension APIs as related to the CG’s goals, emphasizing a desire for web developers to experiment with their own models.
  • Zoltan (Intel) expressed interest in the application side of AI and the built-in AI APIs.
  • Maxim (Microsoft) is focused on AI on the Web and developer productivity.
  • Adam Sobieski brought a user perspective from the education space.

This meeting indicates active engagement within the Web Machine Learning CG, with concrete steps being taken towards incubating new web platform APIs for machine learning functionalities. The discussions highlight a balance between exploring ambitious new features and ensuring practical implementability and developer utility, with a strong emphasis on collaboration and gathering diverse perspectives.

Sources

2025-03-31-cg-minutes.md