Welcome to CLEF: Catalogue of Lexico-grammatical English Features
CLEF synthesizes feature sets from multiple foundational sources into 447 features across 25 categories. Each feature comes with a human-readable definition, machine-readable detection rules, annotated examples, and source provenance. The catalogue is implementation-agnostic: it declaratively defines what each feature is and how to detect it.
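To make the declarative idea more concrete, the sketch below shows one way a single catalogue entry could be modelled as plain data, carrying the four components listed above (definition, detection rules, annotated examples, provenance). Everything in it is a hypothetical illustration: the class names, field names, the feature ID `PASSIVE_AGENTLESS`, the category label, the CQL pattern, and the source label are assumptions, not CLEF's actual schema or content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRule:
    # Hypothetical rule shape: a detection method name plus a method-specific pattern.
    method: str   # e.g. "text", "lemma", "pos", "cql", "semgrex", "usas"
    pattern: str


@dataclass(frozen=True)
class Feature:
    # Hypothetical entry shape: definition, rules, examples, and provenance.
    feature_id: str
    category: str
    definition: str
    rules: tuple[DetectionRule, ...]
    examples: tuple[str, ...] = ()
    sources: tuple[str, ...] = ()


# Illustrative entry only; the ID, category, pattern, and source are made up.
agentless_passive = Feature(
    feature_id="PASSIVE_AGENTLESS",
    category="Voice",
    definition="A passive clause with no by-phrase expressing the agent.",
    rules=(
        DetectionRule(
            method="cql",
            pattern='[lemma="be"] [tag="RB.*"]* [tag="VBN"] [word!="by"]',
        ),
    ),
    examples=("The results [were reported] in 2019.",),
    sources=("example-source",),  # placeholder provenance label
)


def rules_by_method(feature: Feature) -> dict[str, list[str]]:
    """Group a feature's rule patterns by detection method, e.g. for dispatch."""
    grouped: dict[str, list[str]] = {}
    for rule in feature.rules:
        grouped.setdefault(rule.method, []).append(rule.pattern)
    return grouped
```

Keeping entries as immutable data, with no matching logic attached, is what lets any engine decide which methods it supports.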
The catalogue's 654 detection rules range from text, lemma, and POS-tag matching to Corpus Query Language (CQL) patterns, dependency-tree patterns (Semgrex), and semantic tagging (USAS), with provisions for LLM-based and human annotation. The forthcoming companion library pyclef will provide a Python engine that interprets the catalogue’s rules against text using pluggable taggers.
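As a rough illustration of how an engine with pluggable taggers might interpret token-level rules, here is a minimal sketch. It is not pyclef's API, which has not been released; the `Tagger` protocol, `ToyTagger`, `find_matches`, and the `(method, pattern)` rule format are all hypothetical. Only text, lemma, and POS matching are implemented, and rules for other methods (CQL, Semgrex, USAS) are simply skipped.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    pos: str


class Tagger(Protocol):
    # Hypothetical plug-in interface: anything that turns a string into tokens.
    def tag(self, text: str) -> list[Token]: ...


class ToyTagger:
    """Stand-in tagger with a tiny hard-coded lexicon, for illustration only."""

    LEXICON = {
        "the": ("the", "DT"),
        "cats": ("cat", "NNS"),
        "were": ("be", "VBD"),
        "fed": ("feed", "VBN"),
    }

    def tag(self, text: str) -> list[Token]:
        tokens = []
        for word in re.findall(r"\w+", text):
            # Unknown words get their lowercase form as lemma and a placeholder tag.
            lemma, pos = self.LEXICON.get(word.lower(), (word.lower(), "X"))
            tokens.append(Token(word, lemma, pos))
        return tokens


# Each matcher tests one token attribute against a regular-expression pattern.
MATCHERS: dict[str, Callable[[Token, str], bool]] = {
    "text": lambda tok, pat: re.fullmatch(pat, tok.text) is not None,
    "lemma": lambda tok, pat: re.fullmatch(pat, tok.lemma) is not None,
    "pos": lambda tok, pat: re.fullmatch(pat, tok.pos) is not None,
}


def find_matches(rules: list[tuple[str, str]], text: str, tagger: Tagger) -> list[int]:
    """Return indices of tokens matched by any supported (method, pattern) rule.

    Rules whose method has no matcher here (e.g. "cql", "semgrex", "usas") are skipped.
    """
    tokens = tagger.tag(text)
    hits = []
    for i, tok in enumerate(tokens):
        for method, pattern in rules:
            matcher = MATCHERS.get(method)
            if matcher is not None and matcher(tok, pattern):
                hits.append(i)
                break  # one hit per token is enough
    return hits


rules = [("lemma", "be"), ("pos", "VBN"), ("semgrex", "{tag:VBN} >nsubjpass {}")]
print(find_matches(rules, "The cats were fed", ToyTagger()))  # prints [2, 3]
```

Swapping `ToyTagger` for a wrapper around a real tagger changes nothing else in the engine, which is the point of keeping the tagger behind a small interface.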
Browse by Category
447 features across 25 categories.