Grammar & Error Checking
This document describes how Phosphor Notes detects and surfaces grammar, style, and writing errors to users.
## Key files
- Editor extension that wires linting into CodeMirror: src/renderer/src/editor/extensions/grammar.ts
- Worker that runs the grammar pipeline: src/renderer/workers/grammar.ts
- Site-specific/custom checks: src/renderer/workers/customChecks.ts
- Settings UI (toggles exposed to users): src/renderer/src/components/SettingsModal.tsx
## Overview
The grammar and error-checking feature is implemented as a client-side, worker-based pipeline that combines three main sources of diagnostics:
- A configurable set of retext-based style checks (passive voice, simplification, inclusive language, readability, profanities, redundancies, intensifiers).
- Harper (harper.js): an optional WASM-based linter used for additional grammar checks and, where available, suggested fixes.
- A set of custom, project-specific heuristics and rules implemented in JavaScript (clichés, common usage errors, capitalization rules, time-format checks, and miscellaneous grammar heuristics).
The pieces are orchestrated inside a dedicated web worker; the editor extension simply posts the document text (and the current grammar settings) to the worker and receives an array of diagnostics back.
## Editor integration (how linting is started)
- The editor-side integration lives in src/renderer/src/editor/extensions/grammar.ts. The exported `createGrammarLint()` returns a CodeMirror `linter` function.
- For each editor instance a fresh worker is created (`new grammarWorkerModule()`), so each editor has an isolated worker instance.
- The linter callback reads `view.state.doc.toString()` and skips grammar checking for very large documents: a hard limit of 50,000 characters prevents the pipeline from running on huge notes (early exit). See the length guard in the extension.
- Requests are debounced: the linter uses a 750 ms delay, so checks run after 750 ms of typing inactivity.
- The extension sends a single message to the worker with `{ text, settings }` and installs a one-time `message` listener to resolve the Promise with the returned diagnostics array.
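The request/response flow described above can be sketched as follows. This is a minimal illustration, not the code in the extension: `requestDiagnostics`, `WorkerLike`, and `MAX_LENGTH` are hypothetical names, the `Diagnostic` fields mirror the shape documented later in this file, and the use of `>` for the length guard is an assumption. In CodeMirror the debounce is normally supplied through the `delay` option of `linter()`, so it is not reproduced here.

```typescript
// Hypothetical shapes for illustration; the real types live in the worker sources.
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

type GrammarSettings = Record<string, boolean>;

// The minimal subset of the Worker interface this sketch relies on.
interface WorkerLike {
  postMessage(message: unknown): void;
  addEventListener(
    type: "message",
    listener: (event: { data: unknown }) => void,
    options?: { once?: boolean }
  ): void;
}

// Assumed constant name; the 50,000-character limit comes from the text above.
const MAX_LENGTH = 50000;

function requestDiagnostics(
  worker: WorkerLike,
  text: string,
  settings: GrammarSettings
): Promise<Diagnostic[]> {
  // Early exit: very large notes are never sent to the worker.
  if (text.length > MAX_LENGTH) {
    return Promise.resolve([]);
  }
  return new Promise((resolve) => {
    // One-time listener: resolve with whatever array the worker posts back.
    worker.addEventListener(
      "message",
      (event) => resolve(event.data as Diagnostic[]),
      { once: true }
    );
    worker.postMessage({ text, settings });
  });
}
```

Registering the listener before calling `postMessage` means a fast reply cannot arrive before the listener is in place.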
## Worker pipeline (what happens inside the worker)
The worker is the heart of the feature; it performs three categories of checks and merges their results.
- retext pipeline (retext-*/unified plugins)
  - The worker builds a `unified()` processor and conditionally registers retext plugins depending on `settings` (the toggles the user sets). This is done in `createProcessor(settings)` in src/renderer/workers/grammar.ts.
  - The following plugins are used when enabled:
    - `retext-passive` (passive voice detection)
    - `retext-simplify` (suggest simpler phrasing)
    - `retext-equality` (inclusive language checks; the code configures a long `ignore` list to reduce noisy hits)
    - `retext-readability` (readability scoring and long/complex sentence detection)
    - `retext-profanities` (profane / offensive language detection)
    - `retext-intensify` (weak/hedging/intensifying words detection)
    - `retext-redundant-acronyms` and `retext-syntax-urls` are always included to catch redundancy and URL issues.
  - After processing the document text the worker filters and converts retext messages into diagnostics. Messages are filtered to reduce false positives (for example, skipping some `retext-contractions` messages and ignoring some `retext-readability` hits that are list items).
  - retext reports positions as line/column ranges; the worker converts these to absolute offsets using `calculateOffset(text, line, column)`.
  - For each message the worker maps the `source` (retext plugin name) to a human-readable source string like “Passive Voice” or “Readability” so the editor UI shows sensible labels.
- Harper (harper.js)
  - The worker lazily imports `harper.js` at runtime (so the heavy WASM component is only pulled in if needed). The import is wrapped in a `getHarperLinter()` helper that caches a promise to avoid re-initializing the linter multiple times.
  - The Harper linter is configured with some features disabled by default (for example, spelling is disabled via `setLintConfig({ SpellCheck: false, DefiniteArticle: false, UseTitleCase: false })`). The linter is initialized with a dialect inferred from the runtime locale (American, British, Canadian, Australian, Indian) using `navigator`/`Intl` detection.
  - Harper produces `Lint` objects that include spans and optional suggestions; these are mapped into the same diagnostic shape (absolute offsets, severity ‘warning’, a message and a `source` string). Suggestions are formatted (insert/replace/remove) and appended to the message text when available.
  - Harper is called concurrently with retext to speed up processing.
- Custom checks (project heuristics)
  - A set of custom JavaScript checks runs synchronously on the input text (`runCustomChecks()` in src/renderer/workers/customChecks.ts). Each check returns an array of diagnostics; these checks include:
    - Indefinite article checks (suggesting `a` vs `an` based on locale-aware heuristics and silent-`h` handling).
    - Cliché detection (matching against a large `CLICHES` list).
    - Common usage issues (mapping known misspellings & poor phrase choices to suggested replacements).
    - Time format suggestions (AM/PM formatting guidance and ambiguity warnings for 12 a.m./12 p.m.).
    - Paragraph start checks (flagging sentences that begin with “But” at paragraph start).
    - Capitalization heuristics (suggest words that should be all-caps or initial-capitalized using `shouldAllCapitalized()` / `shouldCapitalize()`).
    - Grammar heuristics like confusing “sense”/“since” and incorrect “they’re/there/their” usage.
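To make the custom-check contract concrete, here is a minimal sketch of one heuristic and the runner that collects results. The body of `paragraphStartBut`, its regular expression, its message text, and the "Paragraph Start" label are illustrative assumptions, not the project's implementation; only the general idea (each check takes the text and returns diagnostics with absolute offsets) comes from the description above.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

interface CustomCheckSettings {
  checkCliches?: boolean;
}

type CustomCheck = (text: string, settings?: CustomCheckSettings) => Diagnostic[];

// Hypothetical check: flags "But" at the very start of the text or right after a blank line.
const paragraphStartBut: CustomCheck = (text) => {
  const diagnostics: Diagnostic[] = [];
  // Group 1 is the paragraph boundary, group 2 is the flagged word.
  const pattern = /(^|\n[ \t]*\n)(But)\b/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const from = match.index + match[1].length;
    diagnostics.push({
      from,
      to: from + match[2].length,
      severity: "info",
      message: 'Consider not starting a paragraph with "But".',
      source: "Paragraph Start",
    });
  }
  return diagnostics;
};

const customChecks: CustomCheck[] = [paragraphStartBut];

// Runs every registered check synchronously and concatenates the results.
function runCustomChecks(text: string, settings?: CustomCheckSettings): Diagnostic[] {
  const all: Diagnostic[] = [];
  for (const check of customChecks) {
    all.push(...check(text, settings));
  }
  return all;
}
```

For the input `"But first.\n\nBut then. But no."` this sketch reports two diagnostics, one per paragraph start; the mid-paragraph "But" is left alone.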
## Merging and prioritization
- After collecting results the worker performs a merge step:
  - `customDiagnostics` are computed first.
  - `harperDiagnostics` are filtered to remove any diagnostics that exactly overlap a span produced by a custom check. This prevents duplicate or conflicting diagnostics for the exact same text range (custom checks take precedence on identical spans).
  - The final `allDiagnostics` array is `[...customDiagnostics, ...harperFiltered, ...retextDiagnostics]` and is posted back to the editor.
- The editor-side listener receives that array and resolves the linter promise; CodeMirror then displays the diagnostics as inline highlights/tooltips according to its lint UI.
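The merge step can be sketched as a small pure function. `mergeDiagnostics` and the string span key are hypothetical choices for illustration; the behavior shown (Harper results dropped only on identical spans, retext results passed through unfiltered, custom results first) follows the description above.

```typescript
interface Diagnostic {
  from: number;
  to: number;
  severity: "warning" | "info" | "error";
  message: string;
  source: string;
}

function mergeDiagnostics(
  customDiagnostics: Diagnostic[],
  harperDiagnostics: Diagnostic[],
  retextDiagnostics: Diagnostic[]
): Diagnostic[] {
  // Record every exact span produced by a custom check.
  const customSpans = new Set(customDiagnostics.map((d) => `${d.from}:${d.to}`));
  // Drop a Harper diagnostic only when its span is identical; partial overlaps are kept.
  const harperFiltered = harperDiagnostics.filter(
    (d) => !customSpans.has(`${d.from}:${d.to}`)
  );
  return [...customDiagnostics, ...harperFiltered, ...retextDiagnostics];
}
```

Keying on the exact `from`/`to` pair keeps the filter cheap and predictable, at the cost of letting partially overlapping diagnostics from different sources through.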
## Diagnostic shape and offsets
- Diagnostics use the following minimal shape (TypeScript types are in the workers):
  - `from` (number): absolute start offset in document
  - `to` (number): absolute end offset in document
  - `severity`: `'warning' | 'info' | 'error'`
  - `message` (string)
  - `source` (string)
- Note: retext reports positions as “line:col - line:col” in message strings. The worker parses those and converts them to absolute offsets with `calculateOffset()`.
- The worker guarantees `to > from` by forcing a one-character range when the computed `to` would be <= `from`.
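One way to perform the line/column conversion and the range guard is sketched below. It assumes 1-based line and column numbers (as unist positions use) and `\n` line breaks; the body of `calculateOffset` is an illustration of the idea rather than the worker's code, and `toRange` and `Point` are hypothetical names.

```typescript
// Converts a 1-based line/column pair into an absolute character offset.
function calculateOffset(text: string, line: number, column: number): number {
  const lines = text.split("\n");
  let offset = 0;
  // Add the length of every preceding line plus its newline character.
  for (let i = 0; i < line - 1 && i < lines.length; i++) {
    offset += lines[i].length + 1;
  }
  // Clamp so that out-of-range positions never point past the document.
  return Math.min(offset + (column - 1), text.length);
}

interface Point {
  line: number;
  column: number;
}

// Hypothetical helper: builds a range and forces at least one character of width.
function toRange(text: string, start: Point, end: Point): { from: number; to: number } {
  const from = calculateOffset(text, start.line, start.column);
  const to = calculateOffset(text, end.line, end.column);
  return { from, to: to > from ? to : from + 1 };
}
```

For the text `"ab\ncde"`, line 2, column 2 maps to offset 4 (the character `d`), and an empty range at that position is widened to `{ from: 4, to: 5 }`.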
## Settings and user controls
- The app exposes several toggles in the Settings modal under “Grammar & Style”. See the UI in src/renderer/src/components/SettingsModal.tsx.
- The settings are expressed as a `GrammarSettings` object passed from the editor extension to the worker and include booleans such as:
  - `checkPassiveVoice`
  - `checkSimplification`
  - `checkInclusiveLanguage`
  - `checkReadability`
  - `checkProfanities`
  - `checkCliches`
  - `checkIntensify`
- The worker conditionally adds or omits retext plugins based on these flags; the custom checks also observe `checkCliches` when deciding whether to run cliché detection.
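The flag-to-plugin relationship can be illustrated with a function that returns plugin names instead of building a real processor. The pairing of each flag with a plugin is inferred from the names and is an assumption, as are `selectRetextPlugins` and the ordering of the result; in the worker, `createProcessor(settings)` would register the corresponding plugins on the `unified()` processor rather than collect strings.

```typescript
interface GrammarSettings {
  checkPassiveVoice: boolean;
  checkSimplification: boolean;
  checkInclusiveLanguage: boolean;
  checkReadability: boolean;
  checkProfanities: boolean;
  checkCliches: boolean; // consumed by the custom checks, not by retext
  checkIntensify: boolean;
}

// Hypothetical helper: lists the retext plugins that would be active for these settings.
function selectRetextPlugins(settings: GrammarSettings): string[] {
  // These two are described as always included.
  const plugins = ["retext-redundant-acronyms", "retext-syntax-urls"];
  if (settings.checkPassiveVoice) plugins.push("retext-passive");
  if (settings.checkSimplification) plugins.push("retext-simplify");
  if (settings.checkInclusiveLanguage) plugins.push("retext-equality");
  if (settings.checkReadability) plugins.push("retext-readability");
  if (settings.checkProfanities) plugins.push("retext-profanities");
  if (settings.checkIntensify) plugins.push("retext-intensify");
  return plugins;
}
```

With every toggle off, only the two always-on plugins remain, which makes it easy to see in a test or log what a given settings object enables.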
## Performance and failure handling
- The editor-side check avoids running on extremely large documents by using an early length guard (50k characters).
- Each editor instance gets its own worker to keep tasks isolated and avoid cross-talk between documents.
- The worker's `getHarperLinter()` helper caches the harper import/initialization promise so the expensive WASM setup only happens once per worker.
- All worker exceptions are caught; on error the worker logs to console and posts an empty diagnostics array back to the editor so the UI remains responsive.
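The promise-caching and fail-soft behavior can be sketched without loading Harper at all. Everything here other than the name `getHarperLinter` is a stand-in: `LinterLike`, `createLinter`, `initCount`, and `safeLint` are hypothetical, and `createLinter` merely simulates the dynamic import and WASM setup.

```typescript
// Hypothetical stand-in for the linter object; not the harper.js API.
interface LinterLike {
  lint(text: string): Promise<string[]>;
}

let linterPromise: Promise<LinterLike> | null = null;
let initCount = 0; // counts how often the expensive setup runs

// Simulates the dynamic import and WASM initialization.
async function createLinter(): Promise<LinterLike> {
  initCount++;
  return {
    lint: async (text: string) => (text.indexOf("teh") !== -1 ? ["teh"] : []),
  };
}

// Caches the promise itself, so concurrent callers share a single initialization.
function getHarperLinter(): Promise<LinterLike> {
  if (linterPromise === null) {
    linterPromise = createLinter();
  }
  return linterPromise;
}

// Any failure is logged and turned into an empty result so the editor stays responsive.
async function safeLint(text: string): Promise<string[]> {
  try {
    const linter = await getHarperLinter();
    return await linter.lint(text);
  } catch (error) {
    console.error("Grammar worker error:", error);
    return [];
  }
}
```

Caching the promise rather than the resolved linter avoids a second initialization when two requests arrive before the first one finishes. One trade-off of this pattern is that a rejected promise is cached too, so a failed setup is not retried unless the cache is cleared.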
## Rules, overrides, and false-positive mitigation
- The implementation is intentionally conservative in some places to avoid alert fatigue:
  - `retext-equality` is configured with a long `ignore` list to avoid flagging many family/medical/generic terms that would otherwise cause noise.
  - The worker filters a subset of `retext` messages (examples: contraction apostrophe warnings and some `retext-readability` hits inside list bullets) to reduce irrelevant alerts.
  - Custom checks often use ‘info’ severity for stylistic guidance and ‘warning’ for stronger, likely mistakes.
- Custom checks take precedence over Harper when they produce identical spans to avoid duplicate/conflicting messages.
## Extending or modifying checks
- To add a new retext rule, add/enable the relevant package in the worker and register it inside `createProcessor(settings)`.
- To add a new custom heuristic, add a `CustomCheck` implementation in src/renderer/workers/customChecks.ts and append it to the `customChecks` list. The function should accept `(text: string, settings?: CustomCheckSettings): Diagnostic[]` and return absolute offsets.
- If you need Harper to behave differently, update `getHarperLinter()` in src/renderer/workers/grammar.ts, for example by adjusting `setLintConfig()` options or dialect detection.
## Testing and debugging tips
- You can instrument the worker by adding temporary console logs; worker console messages appear in renderer devtools (the web worker’s console).
- Unit tests for many checks exist in the repo under `src/main/__tests__` and similar test folders; check the repo test suite for examples of expected diagnostics.
- To reproduce locale-specific behavior (Harper dialect or article inference), set `navigator.language`/`navigator.languages` in the environment or override `detectDialect()` in `getHarperLinter()` when testing.
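Locale-dependent logic is easier to test when the locale list is passed in rather than read from `navigator` directly. The sketch below shows that idea; the region-to-dialect table, the fallback to American, the decision to ignore the language part of the tag, and the parameterized signature are all assumptions, and the string union stands in for whatever dialect type the worker actually uses.

```typescript
type Dialect = "American" | "British" | "Canadian" | "Australian" | "Indian";

// Assumed mapping from region subtag to dialect.
const REGION_TO_DIALECT: Record<string, Dialect> = {
  US: "American",
  GB: "British",
  CA: "Canadian",
  AU: "Australian",
  IN: "Indian",
};

// Returns the dialect for the first locale whose region is recognized.
function detectDialect(locales: readonly string[]): Dialect {
  for (const locale of locales) {
    const parts = locale.split(/[-_]/);
    // A bare language tag such as "fr" has no region to inspect.
    if (parts.length < 2) continue;
    const region = parts[parts.length - 1].toUpperCase();
    const dialect = REGION_TO_DIALECT[region];
    if (dialect !== undefined) return dialect;
  }
  return "American"; // assumed default when nothing matches
}

// In a browser or worker this could be called as detectDialect(navigator.languages).
```

A test can then exercise each dialect by passing a different array, without modifying any globals.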
## Conclusion
The grammar & error checking system is a hybrid approach that combines community tooling (retext and plugins), an optional WASM-based linter (harper.js) for stronger grammar suggestions, and a set of lightweight, maintainable custom checks for app-specific heuristics. The system is designed to be configurable, to minimize false positives, and to run off the main thread to preserve editor responsiveness.