Grammar & Error Checking

This document describes how Phosphor Notes detects and surfaces grammar, style, and writing errors to users.

Key files

Editor extension that wires linting into CodeMirror: src/renderer/src/editor/extensions/grammar.ts
Worker that runs the grammar pipeline: src/renderer/workers/grammar.ts
Site-specific/custom checks: src/renderer/workers/customChecks.ts
Settings UI (toggles exposed to users): src/renderer/src/components/SettingsModal.tsx

Overview

The grammar and error-checking feature is implemented as a client-side, worker-based pipeline that combines three main sources of diagnostics:

A configurable set of retext-based style checks (passive voice, simplification, inclusive language, readability, profanities, redundancies, intensifiers).
Harper (harper.js), an optional WASM-based linter used for additional grammar checks and, where available, suggested fixes.
A set of custom, project-specific heuristics and rules implemented in JavaScript (clichés, common usage errors, capitalization rules, time-format checks, and miscellaneous grammar heuristics).

The pieces are orchestrated inside a dedicated web worker; the editor extension simply posts the document text (and the current grammar settings) to the worker and receives an array of diagnostics back.

Editor integration (how linting is started)

The editor-side integration lives in src/renderer/src/editor/extensions/grammar.ts. The exported createGrammarLint() returns a CodeMirror linter function.
For each editor instance a fresh worker is created (new grammarWorkerModule()), so each editor has an isolated worker instance.
The linter callback reads view.state.doc.toString() and exits early, without running the pipeline, for documents over a hard limit of 50,000 characters. See the length guard in the extension.
Requests are debounced: the linter uses a 750 ms delay (so checks run after 750 ms of typing inactivity).
The extension sends a single message to the worker with { text, settings } and installs a one-time message listener to resolve the Promise with the returned diagnostics array.
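The request/response flow above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: WorkerLike, MAX_LINT_LENGTH, and requestDiagnostics are hypothetical names, and whether the real guard uses > or >= is an assumption.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

type GrammarSettings = Record<string, boolean>;
type MessageListener = (event: { data: Diagnostic[] }) => void;

// Hypothetical subset of the Worker interface, so the sketch runs outside a browser.
interface WorkerLike {
  postMessage(message: unknown): void;
  addEventListener(type: "message", listener: MessageListener): void;
  removeEventListener(type: "message", listener: MessageListener): void;
}

// 50,000-character limit from the text above; the constant name is assumed.
const MAX_LINT_LENGTH = 50000;

function requestDiagnostics(
  worker: WorkerLike,
  text: string,
  settings: GrammarSettings
): Promise<Diagnostic[]> {
  // Early exit: oversized documents are never sent to the worker.
  if (text.length > MAX_LINT_LENGTH) return Promise.resolve([]);

  return new Promise<Diagnostic[]>((resolve) => {
    const onMessage: MessageListener = (event) => {
      // One-time listener: detach before resolving.
      worker.removeEventListener("message", onMessage);
      resolve(event.data);
    };
    worker.addEventListener("message", onMessage);
    worker.postMessage({ text, settings });
  });
}
```

In CodeMirror, a source like this would typically be passed to linter() from @codemirror/lint together with a delay option of 750 to get the debounce described above.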

Worker pipeline (what happens inside the worker)

The worker is the heart of the feature; it performs three categories of checks and merges their results.

retext pipeline (retext-*/unified plugins)

The worker builds a unified() processor and conditionally registers retext plugins depending on settings (the toggles the user sets). This is done in createProcessor(settings) in src/renderer/workers/grammar.ts.
The following plugins are used when enabled:
- retext-passive (passive voice detection)
- retext-simplify (suggest simpler phrasing)
- retext-equality (inclusive language checks; the code configures a long ignore list to reduce noisy hits)
- retext-readability (readability scoring and long/complex sentence detection)
- retext-profanities (profane / offensive language detection)
- retext-intensify (weak/hedging/intensifying words detection)
- retext-redundant-acronyms and retext-syntax-urls (redundant acronyms and URL issues; these two are always included, regardless of settings)
After processing the document text the worker filters and converts retext messages into diagnostics. Messages are filtered to reduce false positives (for example, skipping some retext-contractions messages and ignoring some retext-readability hits that are list items).
retext reports positions as line/column ranges; the worker converts these to absolute offsets using calculateOffset(text, line, column).
For each message the worker maps the source (retext plugin name) to a human-readable source string like “Passive Voice” or “Readability” so the editor UI shows sensible labels.
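As an illustration of the position conversion and label mapping, here is a minimal sketch. It assumes retext's 1-based line and column numbers and "\n" line endings; the body of calculateOffset and the labelForSource helper are assumptions, and only the two labels named above are included.

```typescript
// Convert a 1-based line/column pair into a 0-based absolute offset.
function calculateOffset(text: string, line: number, column: number): number {
  let lineStart = 0;
  for (let current = 1; current < line; current++) {
    const newline = text.indexOf("\n", lineStart);
    if (newline === -1) return text.length; // requested line is past the end
    lineStart = newline + 1;
  }
  // Clamp so a column past the end of the text cannot overflow it.
  return Math.min(lineStart + column - 1, text.length);
}

// Plugin name to human-readable label; further entries would follow the same pattern.
const SOURCE_LABELS: Record<string, string> = {
  "retext-passive": "Passive Voice",
  "retext-readability": "Readability",
};

// Hypothetical helper: fall back to the raw plugin name when no label is known.
function labelForSource(source: string): string {
  return Object.prototype.hasOwnProperty.call(SOURCE_LABELS, source)
    ? SOURCE_LABELS[source]
    : source;
}
```

For the text "ab\ncde", line 2, column 2 maps to offset 4 (the letter "d").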

Harper (harper.js)

The worker lazily imports harper.js at runtime (so the heavy WASM component is only pulled in if needed). The import is wrapped in a getHarperLinter() helper that caches a promise to avoid re-initializing the linter multiple times.
The Harper linter is configured with some rules disabled by default: setLintConfig({ SpellCheck: false, DefiniteArticle: false, UseTitleCase: false }) turns off the SpellCheck, DefiniteArticle, and UseTitleCase rules. The linter is initialized with a dialect inferred from the runtime locale (American, British, Canadian, Australian, or Indian) using navigator/Intl detection.
Harper produces Lint objects that include spans and optional suggestions; these are mapped into the same diagnostic shape (absolute offsets, severity ‘warning’, a message and a source string). Suggestions are formatted (insert/replace/remove) and appended to the message text when available.
Harper is called concurrently with retext to speed up processing.
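The lazy, cached initialization and the dialect inference can be sketched as below. This does not call harper.js: createLazyLoader is a hypothetical generic helper standing in for getHarperLinter(), and the region-to-dialect table in detectDialect is an assumption about how locales map to the five dialects listed above.

```typescript
type Dialect = "American" | "British" | "Canadian" | "Australian" | "Indian";

// Pick a dialect from BCP 47 locale tags such as "en-GB"; the first tag with
// a recognized region wins, and American is the assumed default.
function detectDialect(locales: readonly string[]): Dialect {
  for (const locale of locales) {
    const region = locale.split("-")[1]?.toUpperCase();
    switch (region) {
      case "US": return "American";
      case "GB": return "British";
      case "CA": return "Canadian";
      case "AU": return "Australian";
      case "IN": return "Indian";
    }
  }
  return "American";
}

// Cache the promise itself, so concurrent callers share a single load.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = load();
    return cached;
  };
}
```

Caching the promise rather than the resolved value means a second request that arrives while the WASM module is still loading reuses the in-flight load instead of starting another. One trade-off of this simple form is that a failed load stays cached as a rejected promise.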

Custom checks (project heuristics)

A set of custom JavaScript checks runs synchronously on the input text (runCustomChecks() in src/renderer/workers/customChecks.ts). Each check returns an array of diagnostics; these checks include:
- Indefinite article checks (suggesting a vs an based on locale-aware heuristics and silent-h handling).
- Cliché detection (matching against a large CLICHES list).
- Common usage issues (mapping known misspellings & poor phrase choices to suggested replacements).
- Time format suggestions (AM/PM formatting guidance and ambiguity warnings for 12 a.m./12 p.m.).
- Paragraph start checks (flagging paragraphs that begin with “But”).
- Capitalization heuristics (suggest words that should be all-caps or initial-capitalized using shouldAllCapitalized() / shouldCapitalize()).
- Grammar heuristics like confusing “sense”/“since” and incorrect “they’re/there/their” usage.
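To make the shape of a custom check concrete, here is a minimal sketch of the paragraph-start rule. The regular expression, message wording, "Style" label, and 'info' severity are assumptions for illustration; the real check in customChecks.ts may differ.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

// Flag "But" when it is the first word of the text or follows a blank line.
function checkParagraphStart(text: string): Diagnostic[] {
  const diagnostics: Diagnostic[] = [];
  // Group 1: start of text or a blank-line break; group 2: indentation; group 3: the word.
  const pattern = /(^|\n\s*\n)([ \t]*)(But)\b/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const from = match.index + match[1].length + match[2].length;
    diagnostics.push({
      from,
      to: from + match[3].length,
      severity: "info", // stylistic guidance rather than a likely mistake
      message: "Consider not starting a paragraph with “But”.",
      source: "Style", // label is assumed
    });
  }
  return diagnostics;
}
```

A "But" that follows a single line break is left alone here, since it does not start a new paragraph under this sketch's definition.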

Merging and prioritization

After collecting results the worker performs a merge step:
- customDiagnostics are computed first.
- harperDiagnostics are filtered to remove any diagnostics that exactly overlap a span produced by a custom check. This prevents duplicate or conflicting diagnostics for the exact same text range (custom checks take precedence on identical spans).
- The final allDiagnostics array is [...customDiagnostics, ...harperFiltered, ...retextDiagnostics] and is posted back to the editor.
The editor-side listener receives that array and resolves the linter promise; CodeMirror then displays the diagnostics as inline highlights/tooltips according to its lint UI.
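The merge step can be sketched as a small pure function. The array names follow the text above; mergeDiagnostics itself is a hypothetical name, and the string-keyed span lookup is one possible way to implement the exact-overlap test.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

function mergeDiagnostics(
  customDiagnostics: Diagnostic[],
  harperDiagnostics: Diagnostic[],
  retextDiagnostics: Diagnostic[]
): Diagnostic[] {
  // Record every exact span already covered by a custom check.
  const customSpans = new Set(customDiagnostics.map((d) => `${d.from}:${d.to}`));

  // Drop Harper results only when both endpoints match a custom span.
  const harperFiltered = harperDiagnostics.filter(
    (d) => !customSpans.has(`${d.from}:${d.to}`)
  );

  return [...customDiagnostics, ...harperFiltered, ...retextDiagnostics];
}
```

Note that only identical spans are deduplicated in this sketch: a Harper diagnostic that partially overlaps a custom one is kept, and retext diagnostics are never filtered against the other two sources.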

Diagnostic shape and offsets

Diagnostics use the following minimal shape (TypeScript types are in the workers):
- from (number): absolute start offset in document
- to (number): absolute end offset in document
- severity: 'warning' | 'info' | 'error'
- message (string)
- source (string)
Note: retext reports positions as “line:col - line:col” in message strings. The worker parses those and converts them to absolute offsets with calculateOffset().
The worker guarantees that to is always greater than from: when the computed to would be less than or equal to from, it forces a one-character range.
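A minimal sketch of the two steps in this section, parsing a position range out of a message string and enforcing a non-empty range, might look like this. Both function names are hypothetical, and the accepted string format (with or without spaces around the hyphen) is an assumption.

```typescript
interface PositionRange {
  startLine: number;
  startColumn: number;
  endLine: number;
  endColumn: number;
}

// Extract "line:col-line:col" (spaces around the hyphen are optional).
function parsePositionRange(value: string): PositionRange | null {
  const match = /(\d+):(\d+)\s*-\s*(\d+):(\d+)/.exec(value);
  if (match === null) return null;
  const [startLine, startColumn, endLine, endColumn] = match.slice(1).map(Number);
  return { startLine, startColumn, endLine, endColumn };
}

// Force a one-character range whenever `to` would not be greater than `from`.
function normalizeRange(from: number, to: number): { from: number; to: number } {
  return to > from ? { from, to } : { from, to: from + 1 };
}
```

The parsed line and column values would then be passed through calculateOffset() to obtain the absolute from and to offsets.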

Settings and user controls

The app exposes several toggles in the Settings modal under “Grammar & Style”. See the UI in src/renderer/src/components/SettingsModal.tsx.
The settings are expressed as a GrammarSettings object passed from the editor extension to the worker and include booleans such as:
- checkPassiveVoice
- checkSimplification
- checkInclusiveLanguage
- checkReadability
- checkProfanities
- checkCliches
- checkIntensify
The worker conditionally adds or omits retext plugins based on these flags; the custom checks also observe checkCliches when deciding whether to run cliché detection.
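The flag-to-plugin relationship can be illustrated without pulling in unified itself. In this sketch, selectRetextPlugins is a hypothetical helper that only returns plugin names, and the pairing of each flag with a plugin is inferred from the names above rather than confirmed from createProcessor(settings).

```typescript
interface GrammarSettings {
  checkPassiveVoice: boolean;
  checkSimplification: boolean;
  checkInclusiveLanguage: boolean;
  checkReadability: boolean;
  checkProfanities: boolean;
  checkCliches: boolean; // consumed by the custom checks, not by retext
  checkIntensify: boolean;
}

// Plugins registered regardless of settings.
const ALWAYS_ON = ["retext-redundant-acronyms", "retext-syntax-urls"];

function selectRetextPlugins(settings: GrammarSettings): string[] {
  // Assumed flag-to-plugin pairing.
  const toggled: Array<[boolean, string]> = [
    [settings.checkPassiveVoice, "retext-passive"],
    [settings.checkSimplification, "retext-simplify"],
    [settings.checkInclusiveLanguage, "retext-equality"],
    [settings.checkReadability, "retext-readability"],
    [settings.checkProfanities, "retext-profanities"],
    [settings.checkIntensify, "retext-intensify"],
  ];
  const enabled = toggled.filter(([on]) => on).map(([, name]) => name);
  return [...enabled, ...ALWAYS_ON];
}
```

In the real worker each selected entry would correspond to a .use() call on the unified() processor rather than a string in a list.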

Performance and failure handling

The editor-side check avoids running on extremely large documents by using an early length guard (50k characters).
Each editor instance gets its own worker to keep tasks isolated and avoid cross-talk between documents.
The worker’s getHarperLinter() helper caches the harper import/initialization promise so the expensive WASM setup only happens once per worker instance.
All worker exceptions are caught; on error the worker logs to console and posts an empty diagnostics array back to the editor so the UI remains responsive.
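The failure-handling behavior can be sketched as a wrapper around the pipeline. The names runSafely, run, and post are hypothetical; in the worker, post would presumably be a call to postMessage.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

// Run the pipeline; on any failure, log it and fall back to an empty result
// so the editor-side promise always resolves.
async function runSafely(
  run: () => Promise<Diagnostic[]>,
  post: (diagnostics: Diagnostic[]) => void
): Promise<void> {
  let diagnostics: Diagnostic[] = [];
  try {
    diagnostics = await run();
  } catch (error) {
    console.error("Grammar worker failed:", error);
  }
  // Post exactly once, whether or not the pipeline succeeded.
  post(diagnostics);
}
```

Posting outside the try block keeps a single reply per request, which matters because the editor side installs a one-time listener for each request.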

Rules, overrides, and false-positive mitigation

The implementation is intentionally conservative in some places to avoid alert fatigue:
- retext-equality is configured with a long ignore list to avoid flagging many family/medical/generic terms that would otherwise cause noise.
- The worker filters a subset of retext messages (examples: contractions apostrophe warnings and some retext-readability hits inside list bullets) to reduce irrelevant alerts.
- Custom checks often use ‘info’ severity for stylistic guidance and ‘warning’ for stronger, likely mistakes.
- Custom checks take precedence over Harper when they produce identical spans to avoid duplicate/conflicting messages.

Extending or modifying checks

To add a new retext rule, add/enable the relevant package in the worker and register it inside createProcessor(settings).
To add a new custom heuristic, add a CustomCheck implementation in src/renderer/workers/customChecks.ts and append it to the customChecks list. The function should accept (text: string, settings?: CustomCheckSettings): Diagnostic[] and return absolute offsets.
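As a rough sketch of what adding a check involves, the example below defines a settings-gated check and appends it to a list that a runner iterates. SAMPLE_CLICHES is a two-entry stand-in for the real CLICHES list, the "Cliché" label and message text are invented for illustration, and treating a missing settings object as "enabled" is an assumption.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

interface CustomCheckSettings {
  checkCliches?: boolean;
}

type CustomCheck = (text: string, settings?: CustomCheckSettings) => Diagnostic[];

// Stand-in for the real CLICHES list.
const SAMPLE_CLICHES = ["at the end of the day", "think outside the box"];

const checkCliches: CustomCheck = (text, settings) => {
  if (settings?.checkCliches === false) return []; // respect the user toggle
  // Assumes lowercasing preserves string length (true for ASCII text).
  const lower = text.toLowerCase();
  const found: Diagnostic[] = [];
  for (const phrase of SAMPLE_CLICHES) {
    let index = lower.indexOf(phrase);
    while (index !== -1) {
      found.push({
        from: index,
        to: index + phrase.length,
        severity: "info",
        message: `“${phrase}” is a cliché.`,
        source: "Cliché",
      });
      index = lower.indexOf(phrase, index + phrase.length);
    }
  }
  return found;
};

// New checks are appended here; the runner concatenates their results.
const customChecks: CustomCheck[] = [checkCliches];

function runCustomChecks(text: string, settings?: CustomCheckSettings): Diagnostic[] {
  const all: Diagnostic[] = [];
  for (const check of customChecks) all.push(...check(text, settings));
  return all;
}
```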
If you need Harper to behave differently, update getHarperLinter() in src/renderer/workers/grammar.ts — for example adjusting setLintConfig() options or dialect detection.

Testing and debugging tips

You can instrument the worker by adding temporary console logs; worker console messages appear in renderer devtools (the web worker’s console).
Unit tests for many checks exist in the repo under src/main/__tests__ and similar test folders — check the repo test suite for examples of expected diagnostics.
To reproduce locale-specific behavior (Harper dialect or article inference), set navigator.language/navigator.languages in the environment or override detectDialect() in getHarperLinter() when testing.

Conclusion

The grammar & error checking system is a hybrid approach that combines community tooling (retext and plugins), an optional WASM-based linter (harper.js) for stronger grammar suggestions, and a set of lightweight, maintainable custom checks for app-specific heuristics. The system is designed to be configurable, to minimize false positives, and to run off the main thread to preserve editor responsiveness.