CONT YAML source
code: CONT
biber_number: N59
xiao_number: N99
mfte_code: CONT
bohmann_number: 165
name: Verbal contractions
definition: "Contracted verb forms: n’t, ’ll, ’re, ’ve, ’m, ’s (verbal), ’d."
normalization: finite_verbs
detection:
- source: mfte
requires:
- word
- pos
parts:
p1:
cql: '[word="''s|''re|''ve|''m" & pos="VBZ|VBP|VB|VBD|VBN|VBG|MD"]'
p2:
cql: '[word="n''t"]'
p3:
cql: '[word="''ll|''d"]'
combine: "p1 | p2 | p3"
description: >-
MFTE counts 's/'re/'ve/'m only when tagged as a verb or modal (VB*, MD);
n't, 'll and 'd are counted regardless of tag.
- requires:
- word
regex: "\\b\\w+'(t|ll|re|ve|s|d)\\b"
description: Raw-text regex for contractions; no NLP pipeline needed. Also matches possessive 's and does not match 'm.
- source: pybiber
requires:
- word
- pos
parts:
p1:
cql: '[word="''s|''re|''ve|''m" & pos="VB.*"]'
p2:
cql: '[word="''ll" & pos="MD"]'
p3:
cql: '[word="''d" & pos="MD|VB.*"]'
p4:
cql: '[word="n''t"]'
p5:
cql: '[word=".*''ll" & pos="VB.*"]'
combine: "p1 | p2 | p3 | p4 | p5"
description: >-
pybiber blob patterns: `'\S*_v\S*` (verb-tagged contractions),
`'\S*_md` (modal contractions), `n't_\S*` (any-tagged negation).
The verb-tag constraint excludes possessive 's, including cases where
spaCy assigns it a tag other than POS (possessive ending). p5 catches
unsplit contractions such as "birds'll" tagged as VBP.
sources:
- biber_1988
- mfte
- pybiber
- xiao_2009
- grieve_2023
- bohmann_2019
notes: Third-highest loading on Dimension 1 (D1).
examples:
- text: I _don't_ know.
source: le_foll_2024
- text: It _isn't_ my problem.
source: le_foll_2024
- text: _You'll_ have to deal with it.
source: le_foll_2024
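
The raw-text regex entry above can be tried without any tagger. The following is a minimal sketch, assuming Python's `re` module and the pattern as it reads after YAML unescapes the double-quoted string; the helper name `find_contractions` and the sample sentences are illustrative only and are not part of MFTE or pybiber. The underscore markup used in the examples is left out of the inputs, because `_` is a word character and would stop the trailing `\b` from matching.

```python
import re

# Pattern from the raw-text detection entry, after YAML unescaping
# ("\\b" in the double-quoted YAML string becomes "\b").
CONTRACTION_RE = re.compile(r"\b\w+'(t|ll|re|ve|s|d)\b")


def find_contractions(text: str) -> list[str]:
    """Return every substring matched by the raw-text pattern.

    Hypothetical helper for illustration; it does no POS filtering.
    """
    return [m.group(0) for m in CONTRACTION_RE.finditer(text)]


if __name__ == "__main__":
    samples = [
        "I don't know.",
        "It isn't my problem.",
        "You'll have to deal with it.",
        "I'm here.",          # 'm is not in the alternation, so no match
        "the dog's bone",     # possessive 's is matched as well
    ]
    for sentence in samples:
        print(sentence, find_contractions(sentence))
```

Because the pattern has no access to POS tags, it over-counts possessive 's and under-counts 'm relative to the tagged mfte and pybiber rules, and it only recognises the straight apostrophe (').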