Bringing AI into your browser
Last year, we built The Crooked Tankard, a text adventure game that runs entirely in the browser using Chrome's built-in Gemini Nano AI. It became one of the first public projects to leverage the Chrome Prompt API and went on to win "Most Innovative Web Application" at the Google Chrome Built-in AI Challenge 2024.
That award validated what we believed: modern browsers can host sophisticated, privacy-first AI experiences without server infrastructure.
The Crooked Tankard: AI-Augmented Gaming
The Crooked Tankard isn't a "fully client-side text adventure game"—it's a text adventure game augmented by AI. The game logic remains deterministic and rule-based, but AI powers the dynamic parts:
- World generation - Creating varied, believable locations and environments
- NPC interactions - Generating dialogue and character responses
- Event narration - Describing what happens in the game world
The game was written before structured outputs existed in Gemini Nano, so we relied heavily on N-shot prompting—providing the model with multiple examples to guide its responses. This technique proved remarkably effective for generating consistent, contextually appropriate content.
The result: infinite replayability with a vast, procedurally generated world that feels alive.
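To make the technique concrete, here's a minimal sketch of N-shot prompting with today's Prompt API, seeding a session with example exchanges via initialPrompts. The prompts below are invented for illustration; they are not the game's actual prompts, which were written against an earlier version of the API.

// N-shot prompting sketch: prime the session with example exchanges so later
// prompts follow the demonstrated style. All example content here is invented.
const session = await window.LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You narrate locations in a fantasy tavern adventure.' },
    { role: 'user', content: 'Describe: a smoky cellar' },
    { role: 'assistant', content: 'Barrels crowd the walls; the air tastes of old ale and candle smoke.' },
    { role: 'user', content: 'Describe: a moonlit courtyard' },
    { role: 'assistant', content: 'Cracked flagstones glisten under the moon; somewhere a lute plays off-key.' },
  ],
});

// New prompts now tend to follow the style shown in the examples.
const description = await session.prompt('Describe: the tavern taproom at closing time');
session.destroy();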
FittyFiritti: Real-Time Transcription and Presentation
Our latest project, FittyFiritti, takes browser-based AI in a completely different direction. It's a full-page application that provides:
- Real-time transcription of microphone and system audio
- Instant translation of transcriptions into other languages
- Automatic presentation generation from transcribed speech
- Intent-driven presentation control using structured outputs
- Voice-controlled diagram editing with a custom diagram editor
- Meeting summarization based on complete transcriptions
- Export capabilities for transcriptions, translations, and presentations
Unlike The Crooked Tankard's N-shot prompting approach, FittyFiritti leverages structured outputs introduced in later versions of Gemini Nano. This allows precise control over the AI's response format, enabling reliable parsing of user intent for presentation control.
The application captures audio through the screen-sharing API (including system audio) and microphone input, transcribes it in real-time, translates it on the fly, and can automatically build a presentation as you speak. The diagram editor lets you say things like "add a node called Database" or "connect API to Database," and the AI interprets these commands into diagram operations.
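As a rough sketch (not FittyFiritti's actual schema), intent parsing with structured outputs looks like this: a JSON Schema passed as a responseConstraint forces the model to answer in a machine-readable shape the app can act on directly.

// Structured-output sketch: the schema and voice command below are illustrative only.
const intentSchema = {
  type: 'object',
  properties: {
    action: { type: 'string', enum: ['add_node', 'connect_nodes', 'next_slide', 'none'] },
    source: { type: 'string' },
    target: { type: 'string' },
  },
  required: ['action'],
};

const session = await window.LanguageModel.create();
const raw = await session.prompt(
  'Extract the intent from this voice command: "connect API to Database"',
  { responseConstraint: intentSchema },
);
const intent = JSON.parse(raw); // e.g. { action: 'connect_nodes', source: 'API', target: 'Database' }
session.destroy();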
At the end of a meeting, FittyFiritti generates a comprehensive summary based on all transcriptions, and you can download everything for further processing.
Where Gemini Nano Excels
Gemini Nano is a lightweight model compared to GPT-4 or Claude, but it excels at focused, compact tasks:
- Speech transcription in multiple languages (English, Japanese, Spanish, with surprisingly good results for Hungarian and German)
- Language translation for continuous speech
- Text classification and reasoning for intent detection
- Structured output generation when given a JSON schema
These capabilities make it possible to build responsive, intelligent applications that run entirely in the browser without any server dependency.
Enabling Gemini Nano in Chrome
Gemini Nano is currently available only in Chrome Canary and requires enabling experimental flags. Here's how to set it up:
Step 1: Enable Chrome AI Flags
Navigate to chrome://flags and enable:
- Prompt API for Gemini Nano (chrome://flags/#prompt-api-for-gemini-nano) - Set to: Enabled Multilingual
- Prompt API for Gemini Nano Multimodal Input (chrome://flags/#prompt-api-for-gemini-nano-multimodal-input) - Set to: Enabled Multilingual
- Translation API (chrome://flags/#translation-api) - Set to: Enabled
- Summarization API for Gemini Nano (chrome://flags/#summarization-api-for-gemini-nano) - Set to: Enabled Multilingual
- Optimization Guide On Device Model (chrome://flags/#optimization-guide-on-device-model) - Set to: Enabled BypassPerfRequirement
Click Relaunch to restart Chrome.
Step 2: Update Chrome Components
Navigate to chrome://components and:
- Find Optimization Guide On Device Model
- Click Check for update
- Wait for the model to download (typically 1-2 GB, may take several minutes)
- Verify the component shows a recent version
Step 3: Check API Availability
You can check if the APIs are available programmatically:
// Check Prompt API availability (guard against the API object missing entirely)
const availability = window.LanguageModel
  ? await window.LanguageModel.availability()
  : 'unavailable';
console.log('Prompt API:', availability); // 'available', 'downloadable', 'downloading', or 'unavailable'
// Check Summarizer API
const summarizerAvailable = !!window.Summarizer;
console.log('Summarizer API:', summarizerAvailable);
// Check Translation API
const translatorAvailable = !!window.Translator;
console.log('Translation API:', translatorAvailable);
Using the Prompt API
Here's a simple example using the Prompt API for transcription:
// Create a session
const session = await window.LanguageModel.create({
  temperature: 0.5,
  topK: 3,
  expectedInputs: [{ type: 'audio', languages: ['en'] }],
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});

// Transcribe audio
// (audioBlob is an audio Blob captured elsewhere, e.g. from the microphone)
const transcription = await session.prompt([
  {
    role: 'user',
    content: [
      { type: 'text', value: 'Please transcribe this audio:' },
      { type: 'audio', value: audioBlob },
    ],
  },
]);

console.log('Transcription:', transcription);

// Clean up
session.destroy();
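For real-time use you usually want partial results as they are generated rather than one awaited string; the Prompt API also offers promptStreaming, which returns a stream of text chunks. A minimal sketch, assuming the same audioBlob as above:

// Streaming sketch: promptStreaming returns a ReadableStream of text chunks,
// which suits live transcription UIs better than a single awaited response.
const streamingSession = await window.LanguageModel.create({
  expectedInputs: [{ type: 'audio', languages: ['en'] }],
});

const stream = streamingSession.promptStreaming([
  {
    role: 'user',
    content: [
      { type: 'text', value: 'Please transcribe this audio:' },
      { type: 'audio', value: audioBlob },
    ],
  },
]);

for await (const chunk of stream) {
  console.log('Partial transcription:', chunk);
}

streamingSession.destroy();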
Using the Summarizer API
The Summarizer API makes it easy to generate summaries:
// Check availability
if (!window.Summarizer) {
  console.error('Summarizer API not available');
  return;
}

const availability = await window.Summarizer.availability();
if (availability === 'unavailable') {
  console.error('Summarizer not available');
  return;
}

// Create summarizer
const summarizer = await window.Summarizer.create({
  type: 'tldr',
  format: 'plain-text',
  length: 'long',
  sharedContext: 'This is a meeting transcription.',
});

// Generate summary
const fullText = transcriptions.join('\n\n');
const summary = await summarizer.summarize(fullText);
console.log('Summary:', summary);

// Clean up
summarizer.destroy();
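The Translation API (the window.Translator global checked earlier) follows the same create-and-use pattern. A minimal sketch with an example language pair:

// Translator sketch: create a translator for a language pair, then translate.
// The en -> ja pair here is only an example.
const translatorAvailability = await window.Translator.availability({
  sourceLanguage: 'en',
  targetLanguage: 'ja',
});

if (translatorAvailability !== 'unavailable') {
  const translator = await window.Translator.create({
    sourceLanguage: 'en',
    targetLanguage: 'ja',
  });

  const translated = await translator.translate('Hello from the browser!');
  console.log('Translation:', translated);

  translator.destroy();
}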
Handling AI State and Downloads
Properly handling the AI model state is crucial:
// Check availability
const availability = await window.LanguageModel.availability();

switch (availability) {
  case 'available': {
    // Ready to use
    const session = await window.LanguageModel.create();
    break;
  }
  case 'downloadable': {
    // Need to download - show UI and monitor progress
    const session = await window.LanguageModel.create({
      monitor: (monitor) => {
        monitor.addEventListener('downloadprogress', (event) => {
          console.log(`Download progress: ${event.loaded * 100}%`);
          updateProgressBar(event.loaded);
        });
      },
    });
    break;
  }
  case 'unavailable':
    // Not supported on this device
    showError('AI not available on this device');
    break;
}
Developer Considerations
When building with Gemini Nano, keep in mind:
- Model state detection - Check if the model is available, downloading, or unavailable
- Download handling - The initial model is large (~1-2 GB) and requires time to download
- Session lifecycle - Properly create and destroy sessions to manage resources
- Fallback logic - Consider cloud-based alternatives when Gemini Nano isn't available
- Error handling - Network issues, quota limits, and API errors need proper handling
Users will experience a brief initialization phase on first use, but this is a reasonable trade-off for completely local, privacy-first AI inference.
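To make the fallback and error-handling points concrete, here's a hedged sketch that tries Gemini Nano first and falls back to a server endpoint when it isn't available; the /api/summarize URL is a placeholder, not a real service.

// Fallback sketch: use the local Summarizer when the model is ready, otherwise
// call a cloud endpoint. The endpoint URL below is a placeholder.
async function summarizeWithFallback(text) {
  try {
    if ('Summarizer' in window && (await window.Summarizer.availability()) === 'available') {
      const summarizer = await window.Summarizer.create({ type: 'tldr', format: 'plain-text' });
      const summary = await summarizer.summarize(text);
      summarizer.destroy();
      return summary;
    }
  } catch (error) {
    console.warn('Local summarization failed, falling back:', error);
  }

  // Cloud fallback (placeholder endpoint).
  const response = await fetch('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  return (await response.json()).summary;
}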
Browser Support
Currently, Gemini Nano runs only in Chrome Canary with experimental flags enabled. There are no official plans from Safari, Firefox, or Edge, though other Chromium-based browsers may adopt it once the APIs stabilize.
Despite being experimental, this represents a fundamental shift: the browser is becoming an AI runtime.
What's Next at SIOCODE
We're expanding our suite of AI-powered, fully local web tools at ai.sio.sh. These tools join our existing collection:
- qr.sio.sh - QR Code Generator
- notes.sio.sh - Private Note-Taking App
- barcode.sio.sh - Barcode Generator
We're integrating local AI capabilities where they make sense and building entirely new experiences that leverage browser-native AI. The possibilities are vast, and we're just scratching the surface.
The age of browser-native AI has arrived, and we're building applications that push the boundaries of what's possible without ever touching a server.
What AI-powered browser application would you build if you could run a capable language model entirely client-side with complete privacy?
Have thoughts or questions about building with Gemini Nano? Contact us at info@siocode.hu.
