Bringing AI into your browser
Last year, we built The Crooked Tankard, a text adventure game that runs entirely in the browser using Chrome's built-in Gemini Nano AI. It became one of the first public projects to leverage the Chrome Prompt API and went on to win "Most Innovative Web Application" at the Google Chrome Built-in AI Challenge 2024.
That award validated what we believed: modern browsers can host sophisticated, privacy-first AI experiences without server infrastructure.
The Crooked Tankard: AI-Augmented Gaming
The Crooked Tankard isn't a "fully client-side text adventure game"—it's a text adventure game augmented by AI. The game logic remains deterministic and rule-based, but AI powers the dynamic parts:
- World generation - Creating varied, believable locations and environments
- NPC interactions - Generating dialogue and character responses
- Event narration - Describing what happens in the game world
The game was written before structured outputs existed in Gemini Nano, so we relied heavily on N-shot prompting—providing the model with multiple examples to guide its responses. This technique proved remarkably effective for generating consistent, contextually appropriate content.
The result: infinite replayability with a vast, procedurally generated world that feels alive.
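To make the technique concrete, here's a minimal sketch of N-shot prompting with today's Prompt API, seeding a session with example exchanges via initialPrompts. The prompts below are invented for illustration; they are not the game's actual prompts, which were written against an earlier version of the API.

// N-shot prompting sketch: prime the session with example exchanges so later
// prompts follow the demonstrated style. All example content here is invented.
const session = await window.LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You narrate locations in a fantasy tavern adventure.' },
    { role: 'user', content: 'Describe: a smoky cellar' },
    { role: 'assistant', content: 'Barrels crowd the walls; the air tastes of old ale and candle smoke.' },
    { role: 'user', content: 'Describe: a moonlit courtyard' },
    { role: 'assistant', content: 'Cracked flagstones glisten under the moon; somewhere a lute plays off-key.' },
  ],
});

// New prompts now tend to follow the style shown in the examples.
const description = await session.prompt('Describe: the tavern taproom at closing time');
session.destroy();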
FittyFiritti: Real-Time Transcription and Presentation
Our latest project, FittyFiritti, takes browser-based AI in a completely different direction. It's a full-page application that provides:
- Real-time transcription of microphone and system audio
- Instant translation of transcriptions into other languages
- Automatic presentation generation from transcribed speech
- Intent-driven presentation control using structured outputs
- Voice-controlled diagram editing with a custom diagram editor
- Meeting summarization based on complete transcriptions
- Export capabilities for transcriptions, translations, and presentations
Unlike The Crooked Tankard's N-shot prompting approach, FittyFiritti leverages structured outputs introduced in later versions of Gemini Nano. This allows precise control over the AI's response format, enabling reliable parsing of user intent for presentation control.
The application captures audio through the screen-sharing API (including system audio) and microphone input, transcribes it in real-time, translates it on the fly, and can automatically build a presentation as you speak. The diagram editor lets you say things like "add a node called Database" or "connect API to Database," and the AI interprets these commands into diagram operations.
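As a rough sketch (not FittyFiritti's actual schema), intent parsing with structured outputs looks like this: a JSON Schema passed as a responseConstraint forces the model to answer in a machine-readable shape the app can act on directly.

// Structured-output sketch: the schema and voice command below are illustrative only.
const intentSchema = {
  type: 'object',
  properties: {
    action: { type: 'string', enum: ['add_node', 'connect_nodes', 'next_slide', 'none'] },
    source: { type: 'string' },
    target: { type: 'string' },
  },
  required: ['action'],
};

const session = await window.LanguageModel.create();
const raw = await session.prompt(
  'Extract the intent from this voice command: "connect API to Database"',
  { responseConstraint: intentSchema },
);
const intent = JSON.parse(raw); // e.g. { action: 'connect_nodes', source: 'API', target: 'Database' }
session.destroy();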
At the end of a meeting, FittyFiritti generates a comprehensive summary based on all transcriptions, and you can download everything for further processing.
Where Gemini Nano Excels
Gemini Nano is a lightweight model compared to GPT-4 or Claude, but it excels at focused, compact tasks:
- Speech transcription in multiple languages (English, Japanese, Spanish, with surprisingly good results for Hungarian and German)
- Language translation for continuous speech
- Text classification and reasoning for intent detection
- Structured output generation when given a JSON schema
These capabilities make it possible to build responsive, intelligent applications that run entirely in the browser without any server dependency.
Enabling Gemini Nano in Chrome
Gemini Nano is currently available only in Chrome Canary and requires enabling experimental flags. Here's how to set it up:
Step 1: Enable Chrome AI Flags
Navigate to chrome://flags and enable:
- Prompt API for Gemini Nano (chrome://flags/#prompt-api-for-gemini-nano) - Set to: Enabled Multilingual
- Prompt API for Gemini Nano Multimodal Input (chrome://flags/#prompt-api-for-gemini-nano-multimodal-input) - Set to: Enabled Multilingual
- Translation API (chrome://flags/#translation-api) - Set to: Enabled
- Summarization API for Gemini Nano (chrome://flags/#summarization-api-for-gemini-nano) - Set to: Enabled Multilingual
- Optimization Guide On Device Model (chrome://flags/#optimization-guide-on-device-model) - Set to: Enabled BypassPerfRequirement
Click Relaunch to restart Chrome.
Step 2: Update Chrome Components
Navigate to chrome://components and:
- Find Optimization Guide On Device Model
- Click Check for update
- Wait for the model to download (typically 1-2 GB, may take several minutes)
- Verify the component shows a recent version
Step 3: Check API Availability
You can check if the APIs are available programmatically:
// Check Prompt API availability (guard against the API object missing entirely)
const availability = window.LanguageModel
  ? await window.LanguageModel.availability()
  : 'unavailable';
console.log('Prompt API:', availability); // 'available', 'downloadable', 'downloading', or 'unavailable'
// Check Summarizer API
const summarizerAvailable = !!window.Summarizer;
console.log('Summarizer API:', summarizerAvailable);
// Check Translation API
const translatorAvailable = !!window.Translator;
console.log('Translation API:', translatorAvailable);
Using the Prompt API
Here's a simple example using the Prompt API for transcription:
// Create a session
const session = await window.LanguageModel.create({
  temperature: 0.5,
  topK: 3,
  expectedInputs: [{ type: 'audio', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});

// Transcribe audio
// (audioBlob is an audio Blob captured elsewhere, e.g. from the microphone)
const transcription = await session.prompt([
  {
    role: 'user',
    content: [
      { type: 'text', value: 'Please transcribe this audio:' },
      { type: 'audio', value: audioBlob },
    ],
  },
]);

console.log('Transcription:', transcription);

// Clean up
session.destroy();
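For real-time use you usually want partial results as they are generated rather than one awaited string; the Prompt API also offers promptStreaming, which returns a stream of text chunks. A minimal sketch, assuming the same audioBlob as above:

// Streaming sketch: promptStreaming returns a ReadableStream of text chunks,
// which suits live transcription UIs better than a single awaited response.
const streamingSession = await window.LanguageModel.create({
  expectedInputs: [{ type: 'audio', languages: ['en'] }],
});

const stream = streamingSession.promptStreaming([
  {
    role: 'user',
    content: [
      { type: 'text', value: 'Please transcribe this audio:' },
      { type: 'audio', value: audioBlob },
    ],
  },
]);

for await (const chunk of stream) {
  console.log('Partial transcription:', chunk);
}

streamingSession.destroy();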
Using the Summarizer API
The Summarizer API makes it easy to generate summaries:
// Check availability
if (!window.Summarizer) {
  console.error('Summarizer API not available');
  return;
}

const availability = await window.Summarizer.availability();
if (availability === 'unavailable') {
  console.error('Summarizer not available');
  return;
}

// Create summarizer
const summarizer = await window.Summarizer.create({
  type: 'tldr',
  format: 'plain-text',
  length: 'long',
  sharedContext: 'This is a meeting transcription.',
});

// Generate summary
const fullText = transcriptions.join('\n\n');
const summary = await summarizer.summarize(fullText);
console.log('Summary:', summary);

// Clean up
summarizer.destroy();
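The Translation API (the window.Translator global checked earlier) follows the same create-and-use pattern. A minimal sketch with an example language pair:

// Translator sketch: create a translator for a language pair, then translate.
// The en -> ja pair here is only an example.
const translatorAvailability = await window.Translator.availability({
  sourceLanguage: 'en',
  targetLanguage: 'ja',
});

if (translatorAvailability !== 'unavailable') {
  const translator = await window.Translator.create({
    sourceLanguage: 'en',
    targetLanguage: 'ja',
  });

  const translated = await translator.translate('Hello from the browser!');
  console.log('Translation:', translated);

  translator.destroy();
}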
Handling AI State and Downloads
Properly handling the AI model state is crucial:
// Check availability
const availability = await window.LanguageModel.availability();

switch (availability) {
  case 'available': {
    // Ready to use
    const session = await window.LanguageModel.create();
    break;
  }
  case 'downloadable': {
    // Need to download - show UI and monitor progress
    const session = await window.LanguageModel.create({
      monitor: (monitor) => {
        monitor.addEventListener('downloadprogress', (event) => {
          console.log(`Download progress: ${event.loaded * 100}%`);
          updateProgressBar(event.loaded);
        });
      },
    });
    break;
  }
  case 'unavailable':
    // Not supported on this device
    showError('AI not available on this device');
    break;
}
Developer Considerations
When building with Gemini Nano, keep in mind:
- Model state detection - Check if the model is available, downloading, or unavailable
- Download handling - The initial model is large (~1-2 GB) and requires time to download
- Session lifecycle - Properly create and destroy sessions to manage resources
- Fallback logic - Consider cloud-based alternatives when Gemini Nano isn't available
- Error handling - Network issues, quota limits, and API errors need proper handling
Users will experience a brief initialization phase on first use, but this is a reasonable trade-off for completely local, privacy-first AI inference.
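To make the fallback and error-handling points concrete, here's a hedged sketch that tries Gemini Nano first and falls back to a server endpoint when it isn't available; the /api/summarize URL is a placeholder, not a real service.

// Fallback sketch: use the local Summarizer when the model is ready, otherwise
// call a cloud endpoint. The endpoint URL below is a placeholder.
async function summarizeWithFallback(text) {
  try {
    if ('Summarizer' in window && (await window.Summarizer.availability()) === 'available') {
      const summarizer = await window.Summarizer.create({ type: 'tldr', format: 'plain-text' });
      const summary = await summarizer.summarize(text);
      summarizer.destroy();
      return summary;
    }
  } catch (error) {
    console.warn('Local summarization failed, falling back:', error);
  }

  // Cloud fallback (placeholder endpoint).
  const response = await fetch('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  return (await response.json()).summary;
}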
Browser Support
Currently, Gemini Nano runs only in Chrome Canary with experimental flags enabled. There are no official plans from Safari, Firefox, or Edge, though other Chromium-based browsers may adopt it once the APIs stabilize.
Despite being experimental, this represents a fundamental shift: the browser is becoming an AI runtime.
What's Next at SIOCODE
We're expanding our suite of AI-powered, fully local web tools at ai.sio.sh. These tools join our existing collection:
- qr.sio.sh - QR Code Generator
- notes.sio.sh - Private Note-Taking App
- barcode.sio.sh - Barcode Generator
We're integrating local AI capabilities where they make sense and building entirely new experiences that leverage browser-native AI. The possibilities are vast, and we're just scratching the surface.
The age of browser-native AI has arrived, and we're building applications that push the boundaries of what's possible without ever touching a server.
What AI-powered browser application would you build if you could run a capable language model entirely client-side with complete privacy?
Have thoughts or questions about building with Gemini Nano? Contact us at info@siocode.hu.
