SPLIT YAML source

code: SPLIT
xiao_number: N100
mfte_code: SPLIT
name: Split auxiliaries
definition: >-
  An adverb or other material inserted between an auxiliary and the main verb
  (e.g., "has never seen", "will probably go", "is always talking").
normalization: finite_verbs
parent: SPLIT_ALL
detection:
- requires:
  - pos
  - dep
  semgrex: '{pos:/VB.*/}=verb >aux ({}=aux . {pos:RB}=adv)'
  description: Auxiliary + adverb + main verb; the adverb (RB) immediately follows an auxiliary dependent of the main verb.
- source: mfte
  requires:
  - pos
  - cat
  - word
  parts:
    # 3-token patterns: AUX + ADV (not n't/not) + VERB
    # Match pos="RB" rather than cat="RB", because refinement may already have moved adverbs to more specific cats
    md3:
      cql: '[pos="MD"] [pos="RB" & word!="n''t|not"] [pos="VB.*"]'
      anchor: 3
    do3:
      cql: '[cat="DOAUX"] [pos="RB" & word!="n''t|not"] [pos="VB.*"]'
      anchor: 3
    have3:
      cql: '[word="have|has|ve|had|having" & pos="VB.*"] [pos="RB" & word!="n''t|not"] [pos="VB.*"]'
      anchor: 3
    be3:
      cql: '[word="be|am|is|are|was|were|been|being|m|re" & pos="VB.*"] [pos="RB" & word!="n''t|not"] [pos="VB.*"]'
      anchor: 3
    # 4-token patterns: AUX + (RB|XX0) + RB + VERB
    # MFTE allows XX0 (n't/not) in position 2 for 4-token patterns
    md4:
      cql: '[pos="MD"] [pos="RB"] [pos="RB"] [pos="VB.*"]'
      anchor: 4
    do4:
      cql: '[cat="DOAUX"] [pos="RB"] [pos="RB"] [pos="VB.*"]'
      anchor: 4
    have4:
      cql: '[word="have|has|ve|had|having" & pos="VB.*"] [pos="RB"] [pos="RB"] [pos="VB.*"]'
      anchor: 4
    be4:
      cql: '[word="be|am|is|are|was|were|been|being|m|re" & pos="VB.*"] [pos="RB"] [pos="RB"] [pos="VB.*"]'
      anchor: 4
  combine: "md3 | do3 | have3 | be3 | md4 | do4 | have4 | be4"
  description: >-
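    MFTE surface pattern: (MD|DOAUX|have|be) + RB (excluding n't/not) + V,
    or (MD|DOAUX|have|be) + RB + RB + V with no n't/not exclusion, so that
    e.g. "can't ever imagine" matches. Adverbs are matched on pos rather than
    cat because cat refinement runs in parallel and may already have assigned
    them a more specific category.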
- source: pybiber
  requires:
  - pos
  - upos
  - dep
  parts:
    p1:
      cql: '[dep="aux.*"] [upos="ADV"] [upos="VERB"]'
    p2:
      cql: '[dep="aux.*"] [upos="ADV"] [upos="ADV"] [upos="VERB"]'
  combine: "p1 | p2"
  description: >-
    pybiber anchors on the auxiliary token (`dep_rel` contains "aux"),
    then looks ahead for ADV + VERB (or ADV + ADV + VERB).
    Uses UPOS tags (ADV, VERB) rather than fine-grained Penn tags.
sources:
- biber_1988
- mfte
- pybiber
- xiao_2009
notes: >-
  Loads .44 on Biber's (1988) Dimension 4 (overt expression of persuasion).
  MFTE merges this feature with split infinitives ([[SPINF]]).
examples:
- text: I _would actually drive_.
  source: le_foll_2024
- text: You _can just so tell_.
  source: le_foll_2024
- text: I _can't ever imagine_ arguing with Jill.
  source: le_foll_2024