CONT YAML source

code: CONT
biber_number: N59
xiao_number: N99
mfte_code: CONT
bohmann_number: 165
name: Verbal contractions
definition: "Contracted verb forms: n’t, ’ll, ’re, ’ve, ’m, ’s (verbal), ’d."
normalization: finite_verbs
detection:
- source: mfte
  requires:
  - word
  - pos
  parts:
    p1:
      cql: '[word="''s|''re|''ve|''m" & pos="VBZ|VBP|VB|VBD|VBN|VBG|MD"]'
    p2:
      cql: '[word="n''t"]'
    p3:
      cql: '[word="''ll|''d"]'
  combine: "p1 | p2 | p3"
  description: >-
    MFTE counts 's/'re/'ve/'m only when POS-tagged as a verb or modal
    (VB*, MD); n't, 'll, and 'd are always counted.
- requires:
  - word
  regex: "\\b\\w+'(t|ll|re|ve|s|d)\\b"
  description: "Raw-text regex for contractions; no NLP pipeline needed. Without POS tags it also matches possessive 's, and it does not cover 'm."
- source: pybiber
  requires:
  - word
  - pos
  parts:
    p1:
      cql: '[word="''s|''re|''ve|''m" & pos="VB.*"]'
    p2:
      cql: '[word="''ll" & pos="MD"]'
    p3:
      cql: '[word="''d" & pos="MD|VB.*"]'
    p4:
      cql: '[word="n''t"]'
    p5:
      cql: '[word=".*''ll" & pos="VB.*"]'
  combine: "p1 | p2 | p3 | p4 | p5"
  description: >-
    pybiber blob patterns: `'\S*_v\S*` (verb-tagged contractions),
    `'\S*_md` (modal contractions), `n't_\S*` (any-tagged negation).
    The verb POS constraint keeps possessive 's out of the count, including
    cases where spaCy gives it a non-verb tag other than POS. p5 catches
    unsplit contractions such as "birds'll" that are tagged as VBP.
sources:
- biber_1988
- mfte
- pybiber
- xiao_2009
- grieve_2023
- bohmann_2019
notes: Third-highest loading on Dimension 1 (D1) in Biber (1988).
examples:
- text: I _don't_ know.
  source: le_foll_2024
- text: It _isn't_ my problem.
  source: le_foll_2024
- text: _You'll_ have to deal with it.
  source: le_foll_2024
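
The detectors above can be compared with a small Python sketch. It is only an illustration under stated assumptions, not the MFTE or pybiber implementation: the raw regex is copied from the record with the YAML escaping removed, the MFTE rule (`p1 | p2 | p3`) is re-expressed as a plain function over hand-tagged `(word, pos)` pairs instead of being run through a CQL engine, and the helper names `count_raw`, `count_mfte`, and `is_mfte_contraction` are hypothetical. Matching is assumed to be case-sensitive on whole tokens.

```python
import re

# Raw-text pattern from the regex detector (YAML escaping removed).
RAW_CONTRACTION = re.compile(r"\b\w+'(t|ll|re|ve|s|d)\b")

# POS tags accepted by MFTE part p1.
VERB_TAGS = {"VBZ", "VBP", "VB", "VBD", "VBN", "VBG", "MD"}


def is_mfte_contraction(word, pos):
    """Hypothetical re-expression of the MFTE rule p1 | p2 | p3."""
    if word in {"'s", "'re", "'ve", "'m"}:
        return pos in VERB_TAGS              # p1: needs a verb or modal tag
    return word in {"n't", "'ll", "'d"}      # p2 and p3: counted regardless of tag


def count_raw(text):
    """Count raw-regex matches in untokenized text."""
    return sum(1 for _ in RAW_CONTRACTION.finditer(text))


def count_mfte(tagged_tokens):
    """Count tokens satisfying the MFTE rule in (word, pos) pairs."""
    return sum(1 for word, pos in tagged_tokens if is_mfte_contraction(word, pos))


# The three examples from this record each contain one contraction.
for sentence in ["I don't know.", "It isn't my problem.",
                 "You'll have to deal with it."]:
    print(sentence, count_raw(sentence))

# Hand-tagged tokens (tags are assumed here, not produced by a tagger).
tagged = [("John", "NNP"), ("'s", "POS"), ("car", "NN"),
          ("is", "VBZ"), ("n't", "RB"), ("here", "RB")]
print(count_raw("John's car isn't here"))  # possessive 's is matched too
print(count_mfte(tagged))                  # only n't is counted
```

The last two calls show why the POS-constrained detectors exist: the raw regex cannot tell possessive 's from verbal 's, while the tag-aware rule skips the possessive.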