WHQU Direct WH-questions

Definition

Direct questions introduced by a WH-word (who, what, where, when, why, how).

Detection Rules

Matches a WH-tagged token (WRB, WP, WP$, or WDT) followed by a question mark within 15 tokens, approximating MFTE’s approach. The window limit keeps matches from crossing sentence boundaries in most cases.

cql[pos="WRB|WP|WP\\$|WDT"] []{0,14} [word="?"]

Requires: word, pos
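
As a rough illustration of the windowed pattern above, the sketch below applies the same logic to a plain list of (word, pos) pairs. The function name match_whqu_window and the tuple layout are assumptions made for this example and do not come from any of the tools cited here; a real CQL engine evaluates the query against an indexed corpus instead.

```python
import re

# Penn Treebank WH tags targeted by the query: WRB, WP, WP$, WDT.
WH_POS = re.compile(r"WRB|WP|WP\$|WDT")


def match_whqu_window(tokens, max_gap=14):
    """Return indices of WH-tagged tokens followed by "?" after at most max_gap tokens.

    tokens is a list of (word, pos) pairs (hypothetical layout).
    """
    hits = []
    for i, (_, pos) in enumerate(tokens):
        # CQL attribute regexes match the whole value, hence fullmatch.
        if not WH_POS.fullmatch(pos):
            continue
        # The "?" may sit 1 to max_gap + 1 positions after the WH-word.
        window = tokens[i + 1 : i + max_gap + 2]
        if any(word == "?" for word, _ in window):
            hits.append(i)
    return hits


example = [("Why", "WRB"), ("do", "VBP"), ("n't", "RB"), ("we", "PRP"),
           ("call", "VB"), ("the", "DT"), ("game", "NN"), ("off", "RP"),
           ("?", ".")]
print(match_whqu_window(example))
```

With max_gap=14 the question mark can be at most the 15th token after the WH-word, which mirrors the []{0,14} quantifier in the query.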

mfte

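MFTE checks for ? within 16 tokens of a WH-word (lines 842-857), but QUTAG runs first and “claims” the question marks in tag questions, which prevents WHQU from matching them. Because this sequential claiming cannot be replicated in a single query, this rule uses a tighter window (9 tokens) to avoid matching WH-words in matrix clauses that are followed by a tag question (e.g., “That is what he would tell me anyhow, is n’t it?”). Short direct WH-questions fit within 9 tokens, but longer ones fall outside the window and are missed: in “And who is Dinah, if I might venture to ask the question?” the question mark comes 12 tokens after “who” (counting the comma).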

cql[word="[Ww]hat|[Ww]here|[Ww]hen|[Hh]ow|[Ww]hy|[Ww]ho|[Ww]hom|[Ww]hose|[Ww]hich"] []{0,8} [word="?"]

Requires: word, pos

Sentence-scoped
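
To make the effect of the tighter window concrete, the sketch below compares a gap of 8 (this rule) with a gap of 14 (the wider rule) on three pre-tokenized sentences. The helper has_wh_question and the tokenization shown are assumptions for illustration only; each list stands for one sentence, in keeping with the sentence scope of the rule.

```python
import re

# Same word list as the CQL query above.
WH_WORD = re.compile(
    r"[Ww]hat|[Ww]here|[Ww]hen|[Hh]ow|[Ww]hy|[Ww]ho|[Ww]hom|[Ww]hose|[Ww]hich"
)


def has_wh_question(words, max_gap):
    """True if any WH-word is followed by "?" after at most max_gap tokens."""
    for i, word in enumerate(words):
        if WH_WORD.fullmatch(word) and "?" in words[i + 1 : i + max_gap + 2]:
            return True
    return False


tag_question = ["That", "is", "what", "he", "would", "tell", "me", "anyhow",
                ",", "is", "n't", "it", "?"]
short_question = ["What", "'s", "happening", "?"]
long_question = ["And", "who", "is", "Dinah", ",", "if", "I", "might",
                 "venture", "to", "ask", "the", "question", "?"]

for label, words in [("tag", tag_question), ("short", short_question),
                     ("long", long_question)]:
    # Second column: 9-token window; third column: 15-token window.
    print(label, has_wh_question(words, max_gap=8), has_wh_question(words, max_gap=14))
```

Under this tokenization the tag question is rejected by the 9-token window and accepted by the 15-token one, which is the intended gain in precision. The long question is rejected by the 9-token window as well, which shows the cost in recall.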

pybiber

pybiber condition (a): W-tagged non-DET token in a sentence containing “?”. Uses s-attribute for sentence-level property check.

cql[pos="W.*" & upos!="DET" & sent_has_q="true"]

Requires: pos, upos

S-attributes: sent_has_q[word="?"]
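
The sentence-level check in condition (a) can be sketched in plain Python as follows. This is not pybiber's code: the function name count_condition_a, the dict-based token layout, and the tags in the sample data are assumptions chosen to mirror the query.

```python
def count_condition_a(sentences):
    """Count W-tagged, non-DET tokens in sentences that contain "?".

    sentences is a list of sentences; each token is a dict with
    "word", "pos" and "upos" keys (hypothetical layout).
    """
    count = 0
    for sent in sentences:
        # Sentence-level property, analogous to the sent_has_q s-attribute.
        sent_has_q = any(tok["word"] == "?" for tok in sent)
        if not sent_has_q:
            continue
        count += sum(
            1 for tok in sent
            if tok["pos"].startswith("W") and tok["upos"] != "DET"
        )
    return count


doc = [
    [  # "Who is Dinah ?" counts once
        {"word": "Who", "pos": "WP", "upos": "PRON"},
        {"word": "is", "pos": "VBZ", "upos": "AUX"},
        {"word": "Dinah", "pos": "NNP", "upos": "PROPN"},
        {"word": "?", "pos": ".", "upos": "PUNCT"},
    ],
    [  # "Which book ?" is excluded because the W-token is a DET
        {"word": "Which", "pos": "WDT", "upos": "DET"},
        {"word": "book", "pos": "NN", "upos": "NOUN"},
        {"word": "?", "pos": ".", "upos": "PUNCT"},
    ],
    [  # "I know why ." is excluded because the sentence has no "?"
        {"word": "I", "pos": "PRP", "upos": "PRON"},
        {"word": "know", "pos": "VBP", "upos": "VERB"},
        {"word": "why", "pos": "WRB", "upos": "ADV"},
        {"word": ".", "pos": ".", "upos": "PUNCT"},
    ],
]
print(count_condition_a(doc))
```

Unlike the windowed rules, this condition places no limit on the distance between the W-token and the question mark; only sentence membership matters.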

pybiber

pybiber condition (b): W-tagged non-DET token immediately followed by a token with dep=aux.

cql[pos="W.*" & upos!="DET"] [dep="aux"]

Requires: pos, upos, dep
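
Condition (b) is a simple adjacency check, which might look like this outside a CQL engine. The function name count_condition_b, the token layout, and the dependency labels in the sample data are illustrative assumptions, not output from a real parser.

```python
def count_condition_b(tokens):
    """Count W-tagged, non-DET tokens whose next token has dep == "aux".

    tokens is a flat list of dicts with "pos", "upos" and "dep" keys
    (hypothetical layout).
    """
    count = 0
    # Pair every token with its immediate successor.
    for current, following in zip(tokens, tokens[1:]):
        if (current["pos"].startswith("W")
                and current["upos"] != "DET"
                and following["dep"] == "aux"):
            count += 1
    return count


# "Where did he go ?" with illustrative dependency labels.
direct = [
    {"word": "Where", "pos": "WRB", "upos": "ADV", "dep": "advmod"},
    {"word": "did", "pos": "VBD", "upos": "AUX", "dep": "aux"},
    {"word": "he", "pos": "PRP", "upos": "PRON", "dep": "nsubj"},
    {"word": "go", "pos": "VB", "upos": "VERB", "dep": "ROOT"},
    {"word": "?", "pos": ".", "upos": "PUNCT", "dep": "punct"},
]
# "I know what happened ." has no auxiliary after the W-token.
embedded = [
    {"word": "I", "pos": "PRP", "upos": "PRON", "dep": "nsubj"},
    {"word": "know", "pos": "VBP", "upos": "VERB", "dep": "ROOT"},
    {"word": "what", "pos": "WP", "upos": "PRON", "dep": "nsubj"},
    {"word": "happened", "pos": "VBD", "upos": "VERB", "dep": "ccomp"},
    {"word": ".", "pos": ".", "upos": "PUNCT", "dep": "punct"},
]
print(count_condition_b(direct), count_condition_b(embedded))
```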

pybiber

pybiber condition (d): W-tagged non-DET token in a sentence that contains an AUX, preceded by a token whose dep is neither ccomp nor advcl. Uses document-level adjacency (no sentence_scope) to match pybiber’s lag behavior: a sentence-initial token sees the previous sentence’s final period, which trivially passes the dep filter. Uses an s-attribute for the sentence-level AUX check.

cql[dep!="ccomp|advcl"] [pos="W.*" & upos!="DET" & sent_has_aux="true"]

Requires: pos, upos, dep

Anchor: last

S-attributes: sent_has_aux[upos="AUX"]
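
The interaction between the sentence-level AUX check and the document-level look-behind in condition (d) can be sketched as below. The function name count_condition_d, the token layout, and the dependency labels are assumptions for illustration. The sketch follows the CQL query rather than pybiber's source, so a token with no predecessor (the first token of the document) never matches.

```python
def count_condition_d(sentences):
    """Count W-tagged, non-DET tokens in AUX-containing sentences whose
    preceding token (in document order) is not a ccomp or advcl.

    sentences is a list of sentences; each token is a dict with
    "pos", "upos" and "dep" keys (hypothetical layout).
    """
    count = 0
    prev = None  # previous token, carried across sentence boundaries
    for sent in sentences:
        # Sentence-level property, analogous to the sent_has_aux s-attribute.
        sent_has_aux = any(tok["upos"] == "AUX" for tok in sent)
        for tok in sent:
            if (prev is not None
                    and prev["dep"] not in ("ccomp", "advcl")
                    and tok["pos"].startswith("W")
                    and tok["upos"] != "DET"
                    and sent_has_aux):
                count += 1
            prev = tok
    return count


def t(word, pos, upos, dep):
    """Small helper to keep the sample data readable."""
    return {"word": word, "pos": pos, "upos": upos, "dep": dep}


# "He left . Why did he leave ?" -> "Why" follows the period of sentence 1.
two_sentences = [
    [t("He", "PRP", "PRON", "nsubj"), t("left", "VBD", "VERB", "ROOT"),
     t(".", ".", "PUNCT", "punct")],
    [t("Why", "WRB", "ADV", "advmod"), t("did", "VBD", "AUX", "aux"),
     t("he", "PRP", "PRON", "nsubj"), t("leave", "VB", "VERB", "ROOT"),
     t("?", ".", "PUNCT", "punct")],
]
# "She said he knew what would happen ." -> "what" follows a ccomp verb.
embedded = [
    [t("She", "PRP", "PRON", "nsubj"), t("said", "VBD", "VERB", "ROOT"),
     t("he", "PRP", "PRON", "nsubj"), t("knew", "VBD", "VERB", "ccomp"),
     t("what", "WP", "PRON", "nsubj"), t("would", "MD", "AUX", "aux"),
     t("happen", "VB", "VERB", "ccomp"), t(".", ".", "PUNCT", "punct")],
]
print(count_condition_d(two_sentences), count_condition_d(embedded))
```

Keeping prev alive across the sentence loop is what reproduces the document-level adjacency: resetting it at each sentence start would exclude every sentence-initial W-token.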

Normalization

Per finite_verbs
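
A minimal sketch of this normalization is shown below. The source only states that the basis is the finite-verb count; the scale factor of 100, the function name, and the choice to return 0.0 for texts without finite verbs are all assumptions made for this example.

```python
def normalize_per_finite_verbs(feature_count, finite_verb_count, scale=100.0):
    """Express a raw WHQU count relative to the number of finite verbs.

    scale is an assumed reporting unit (occurrences per `scale` finite verbs).
    """
    if finite_verb_count == 0:
        # Assumed convention: no finite verbs means a rate of zero.
        return 0.0
    return feature_count * scale / finite_verb_count


print(normalize_per_finite_verbs(3, 120))
```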

Examples

What’s happening?

Source: le_foll_2024

Why don’t we call the game off?

Source: le_foll_2024

And who is Dinah, if I might venture to ask the question?

Source: le_foll_2024

Sources

  • biber_1988 — Biber, Douglas (1988) : Variation across Speech and Writing
  • mfte — Le Foll, Elen & Shakir, Muhammad (2023/2025) : Multi-Feature Tagger of English (MFTE) — Python version
  • xiao_2009 — Xiao, Richard (2009) : Multidimensional analysis and the study of world Englishes

Notes

pybiber counts W-tagged non-DET tokens that satisfy any of four conditions: (a) the token is in a sentence containing “?”, (b) the next token has dep=aux, (c) the token is near the sentence start with a preceding AUX, (d) the sentence contains an AUX and the previous token’s dep is not ccomp or advcl. The pybiber-source rules listed above cover conditions (a), (b), and (d), using s-attributes for sentence-level containment checks and document-level adjacency (no sentence_scope) to match the behavior of pybiber’s lag column.