
How It Works

Why So Small?

The biggest question: How can Suzume tokenize Japanese text with less than 250KB of data when MeCab needs 50MB+?

The Short Answer

| Traditional (MeCab) | Suzume |
| --- | --- |
| Stores every word with all metadata | Stores only essential words |
| Pre-computed connection costs for all word pairs | Computes connections on the fly |
| Requires full dictionary to handle any input | Uses pattern rules for unknown words |

Key Insight

MeCab's dictionary is like a phone book with every person's name. Suzume is like knowing the rules of how Japanese names are formed — you can recognize new names without listing them all.

The Three Pillars

What is Tokenization?

Breaking text into meaningful units (tokens) and identifying their parts of speech. For Japanese, this means segmenting continuous text like "東京に行く" into "東京 / に / 行く".
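As a rough sketch of what that output looks like in code (the token shape shown here is an assumption for illustration, not Suzume's actual types):

```typescript
// Hypothetical token shape for illustration; Suzume's real types may differ.
interface Token {
  surface: string; // the token text as it appears in the input
  pos: string;     // part of speech, e.g. "noun", "particle", "verb"
}

// Expected segmentation of "東京に行く":
const expected: Token[] = [
  { surface: "東京", pos: "noun" },
  { surface: "に", pos: "particle" },
  { surface: "行く", pos: "verb" },
];
```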

1. Minimal Dictionary

Traditional analyzers store exhaustive word lists:

```
# MeCab dictionary entry (simplified)
東京,noun,proper,place,*,*,*,東京,トウキョウ,トーキョー,0/3,C1
```

Suzume stores only high-frequency function words and particles. For content words, it relies on patterns.

| Category | MeCab | Suzume |
| --- | --- | --- |
| Particles (は, が, を...) | ~50 entries | ~50 entries |
| Common verbs | ~30,000 entries | ~500 entries |
| Nouns | ~200,000 entries | Pattern-based |
| Proper nouns | ~100,000 entries | Pattern-based |
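
A minimal dictionary of this kind can be sketched as a small lookup table. The entries and cost values below are illustrative assumptions, not Suzume's actual data:

```typescript
// Illustrative sketch of a minimal dictionary: only particles and a
// handful of very common words are stored; everything else falls
// through to pattern rules. Cost values are invented for the example.
interface DictEntry {
  pos: string;   // part of speech
  cost: number;  // lower = more likely
}

const MINIMAL_DICT = new Map<string, DictEntry>([
  ["は", { pos: "particle", cost: 10 }],
  ["が", { pos: "particle", cost: 10 }],
  ["を", { pos: "particle", cost: 10 }],
  ["に", { pos: "particle", cost: 12 }],
  ["行く", { pos: "verb", cost: 40 }],
  ["する", { pos: "verb", cost: 30 }],
  // ...a few hundred high-frequency entries, not hundreds of thousands
]);
```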

2. Rule-based Pattern Recognition

Instead of storing every word, Suzume recognizes patterns:

| Pattern | Rule | Result |
| --- | --- | --- |
| [カタカナ]+ | Foreign loanwords are nouns | noun |
| [漢字]+ | Kanji compounds are usually nouns | noun |
| [漢字]+する | Kanji + する = verbal noun | verb |
| [ひらがな]+い | Ending in い = adjective candidate | adjective |

Why This Works

Japanese has regular patterns. Katakana words are almost always nouns (loanwords). Kanji compounds are usually nouns. This regularity lets us infer POS without storing each word.
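
A sketch of how such rules might be expressed, assuming a simple regex-per-rule design; the Unicode ranges mirror the table above, while the cost values are invented for illustration:

```typescript
// Sketch of pattern-based POS guessing for words not in the dictionary.
// Rules are checked in order, so the more specific kanji+する rule wins
// over the plain kanji-compound rule.
interface PatternRule {
  pattern: RegExp;
  pos: string;
  cost: number; // penalty relative to dictionary entries
}

const PATTERN_RULES: PatternRule[] = [
  { pattern: /^[\u30A0-\u30FF]+$/, pos: "noun", cost: 100 },        // katakana run
  { pattern: /^[\u4E00-\u9FFF]+する$/, pos: "verb", cost: 120 },    // kanji + する
  { pattern: /^[\u4E00-\u9FFF]+$/, pos: "noun", cost: 110 },        // kanji compound
  { pattern: /^[\u3040-\u309F]+い$/, pos: "adjective", cost: 150 }, // hiragana ending in い
];

function guessPos(surface: string): PatternRule | undefined {
  return PATTERN_RULES.find((rule) => rule.pattern.test(surface));
}

guessPos("スカイツリー"); // matches the katakana rule → "noun"
```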

3. On-the-fly Computation

MeCab pre-computes a huge connection cost matrix:

```
# Which word can follow which? (simplified)
noun → particle: cost 100
noun → verb: cost 500
particle → noun: cost 50
...millions of combinations
```

Suzume computes these costs dynamically using simple rules.
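
As a sketch, such a rule set can be as small as a function over part-of-speech pairs. The pairs and values below reuse the numbers from the matrix example above and are purely illustrative:

```typescript
// Illustrative sketch: connection cost computed from POS categories at
// lookup time instead of being stored for every word pair.
function connectionCost(leftPos: string, rightPos: string): number {
  if (leftPos === "noun" && rightPos === "particle") return 100;
  if (leftPos === "particle" && rightPos === "noun") return 50;
  if (leftPos === "particle" && rightPos === "verb") return 80;
  if (leftPos === "noun" && rightPos === "verb") return 500;
  return 300; // default penalty for unlisted pairs
}
```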

The Trade-off

Accuracy vs Size

Suzume trades roughly 2-3% accuracy for a ~99% reduction in size. For most applications (search, tagging, tokenization), this trade-off is worth it.

| Use Case | MeCab | Suzume |
| --- | --- | --- |
| Academic research | ✓ Best choice | |
| Browser apps | ✗ Too large | ✓ Best choice |
| Search indexing | | ✓ |
| Hashtag generation | | ✓ |
| Real-time UI | ✗ Needs server | ✓ |

Technical Deep Dive

What is a Lattice?

A graph structure representing all possible ways to segment text. Each path through the lattice is a potential tokenization. For "すもも", possible paths include "すもも" (plum) or "す/もも" (vinegar + peach).

What is Viterbi Algorithm?

A dynamic programming algorithm that finds the optimal path through the lattice. Instead of evaluating every possible combination, it efficiently finds the best segmentation by reusing previous calculations.
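
To make both ideas concrete, here is a compact sketch of a lattice for "すもも" and a Viterbi pass over it. The types, costs, and simplifications (connection costs omitted) are assumptions for illustration, not Suzume's internals:

```typescript
// A lattice node is a candidate token covering [start, end) of the input.
interface LatticeNode {
  surface: string;
  start: number;    // start position in characters
  end: number;      // end position (exclusive)
  wordCost: number; // cost of the word itself; lower = more likely
}

// Candidate segmentations of "すもも":
const lattice: LatticeNode[] = [
  { surface: "すもも", start: 0, end: 3, wordCost: 50 }, // plum
  { surface: "す", start: 0, end: 1, wordCost: 80 },     // vinegar
  { surface: "もも", start: 1, end: 3, wordCost: 60 },   // peach
];

// Viterbi: find the lowest-cost path from position 0 to the end of the
// input, reusing the best cost found so far at each position. A full pass
// would also add connectionCost(prev, next) at every step.
function viterbi(nodes: LatticeNode[], length: number): LatticeNode[] {
  const bestCost: number[] = new Array(length + 1).fill(Infinity);
  const bestPath: LatticeNode[][] = Array.from({ length: length + 1 }, () => []);
  bestCost[0] = 0;
  for (let pos = 0; pos < length; pos++) {
    if (bestCost[pos] === Infinity) continue;
    for (const node of nodes.filter((n) => n.start === pos)) {
      const cost = bestCost[pos] + node.wordCost;
      if (cost < bestCost[node.end]) {
        bestCost[node.end] = cost;
        bestPath[node.end] = [...bestPath[pos], node];
      }
    }
  }
  return bestPath[length];
}

viterbi(lattice, 3); // → ["すもも"]: cost 50 beats す + もも at 80 + 60 = 140
```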

Analysis Pipeline
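
In outline, analysis runs through the stages described in the rest of this page:

  1. Dictionary lookup: match stored entries against the input
  2. Pattern matching: generate candidates for spans with no dictionary entry
  3. Lattice construction: arrange all candidates as a graph of possible segmentations
  4. Viterbi search: select the lowest-cost path through the lattice
  5. Output: emit the chosen tokens with their POS tags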

Unknown Word Handling

When Suzume encounters an unknown word like "スカイツリー":

  1. Not in dictionary — no stored entry
  2. Pattern match — recognized as katakana sequence
  3. Generate candidate — create noun hypothesis
  4. Compete in lattice — scored against other possibilities
  5. Select best — Viterbi finds optimal segmentation
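
A rough sketch of steps 2-4, assuming a simple regex rule for katakana; the penalty values and function names are illustrative, not Suzume's actual code:

```typescript
// Illustrative sketch: an unknown katakana run becomes a noun hypothesis
// with a penalty cost, then competes in the lattice against any other
// segmentation of the same span.
const KATAKANA = /^[\u30A0-\u30FF]+$/;

interface Candidate {
  surface: string;
  start: number;
  end: number;
  pos: string;
  cost: number;
}

function unknownWordCandidates(text: string, start: number): Candidate[] {
  const out: Candidate[] = [];
  // Extend the katakana run one character at a time, proposing a
  // candidate for each length; Viterbi later picks the best one.
  for (let end = start + 1; end <= text.length; end++) {
    const surface = text.slice(start, end);
    if (!KATAKANA.test(surface)) break;
    out.push({ surface, start, end, pos: "noun", cost: 200 + surface.length });
  }
  return out;
}

unknownWordCandidates("スカイツリーに行く", 0);
// proposes "ス", "スカ", ..., "スカイツリー"; the Viterbi step selects the best
```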

Verb Conjugation

Suzume recognizes 800+ conjugation patterns without storing each form:

```
Base: 食べる (to eat)
├── 食べ + ない → negative
├── 食べ + ます → polite
├── 食べ + た → past
├── 食べ + て → te-form
└── 食べ + れば → conditional
```

The rules are stored, not every conjugated form.
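
Conjugation handling can be sketched as suffix rules applied to a stem. The tiny rule set below covers only the ichidan forms from the tree above and is an illustrative assumption, not Suzume's rule format:

```typescript
// Illustrative sketch: a few ichidan-verb suffix rules. Storing rules like
// these covers every verb of the class, instead of storing every
// conjugated form of every verb.
interface ConjugationRule {
  suffix: string;
  form: string;
}

const ICHIDAN_RULES: ConjugationRule[] = [
  { suffix: "ない", form: "negative" },
  { suffix: "ます", form: "polite" },
  { suffix: "た", form: "past" },
  { suffix: "て", form: "te-form" },
  { suffix: "れば", form: "conditional" },
];

// Given a stem such as 食べ, recognize 食べない, 食べます, 食べた, ...
function matchConjugation(word: string, stem: string): string | null {
  if (!word.startsWith(stem)) return null;
  const rest = word.slice(stem.length);
  const rule = ICHIDAN_RULES.find((r) => r.suffix === rest);
  return rule ? rule.form : null;
}

matchConjugation("食べた", "食べ");   // → "past"
matchConjugation("食べれば", "食べ"); // → "conditional"
```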

Summary

| Question | Answer |
| --- | --- |
| Why is MeCab big? | Stores every word + pre-computed costs |
| Why is Suzume small? | Stores rules + minimal dictionary |
| Is accuracy affected? | ~2-3% lower, acceptable for most uses |
| When to use MeCab? | Academic research, maximum accuracy |
| When to use Suzume? | Browser apps, real-time, size-sensitive |

Released under the Apache 2.0 License.