THATD YAML source

code: THATD
biber_number: N60
xiao_number: N102
mfte_code: THATD
name: That deletion
definition: >-
  Omission of the subordinator "that" before a complement clause where it
  could have appeared (e.g., "I know [that] he left").
normalization: finite_verbs
detection:
- requires:
  - pos
  - dep
  semgrex: '{pos:/VB.*/}=verb >ccomp {}=comp'
  description: >-
    Verb with a ccomp dependent. The semgrex only finds candidate pairs; a
    post-filter must then confirm that no token with word "that" appears
    between the verb and the head of the complement clause.
  caveat: >-
    One of the harder features to detect automatically. Direct speech
    complements produce false positives, and the semgrex alone cannot confirm
    that "that" is absent. A purely word-based post-filter also rejects
    clauses whose subject is demonstrative "that" (e.g., "I know that's not
    his thing").
- requires:
  - llm
  description: >-
    Prompt an LLM to identify verb complement clauses where "that" has been
    omitted. Provide the sentence as context and ask whether "that" could be
    inserted before the clause.
- source: mfte
  requires:
  - lemma
  - pos
  words:
  - accept
  - acknowledge
  - add
  - admit
  - affirm
  - agree
  - allege
  - allow
  - announce
  - anticipate
  - appear
  - argue
  - arrange
  - ascertain
  - ask
  - assert
  - beg
  - believe
  - bet
  - boast
  - calculate
  - certify
  - check
  - claim
  - command
  - comment
  - complain
  - concede
  - conclude
  - confess
  - confide
  - confirm
  - conjecture
  - consider
  - contend
  - convey
  - decide
  - declare
  - decree
  - deduce
  - deem
  - demand
  - demonstrate
  - deny
  - desire
  - determine
  - discern
  - disclose
  - discover
  - doubt
  - dream
  - enjoin
  - ensure
  - entreat
  - establish
  - estimate
  - exclaim
  - expect
  - explain
  - fancy
  - fear
  - feel
  - find
  - forecast
  - foresee
  - foretell
  - forget
  - gather
  - grant
  - guarantee
  - guess
  - hear
  - hint
  - hold
  - hope
  - imagine
  - imply
  - indicate
  - infer
  - insist
  - instruct
  - insure
  - intend
  - judge
  - know
  - learn
  - maintain
  - mean
  - mention
  - move
  - note
  - notice
  - object
  - observe
  - ordain
  - order
  - perceive
  - pledge
  - pray
  - predict
  - prefer
  - presume
  - presuppose
  - pretend
  - proclaim
  - promise
  - pronounce
  - prophesy
  - propose
  - protest
  - prove
  - realise
  - realize
  - reason
  - recall
  - reckon
  - recognise
  - recognize
  - recommend
  - reflect
  - remark
  - remember
  - repeat
  - reply
  - report
  - request
  - require
  - resolve
  - reveal
  - rule
  - see
  - seem
  - sense
  - show
  - signify
  - state
  - stipulate
  - submit
  - suggest
  - suppose
  - suspect
  - swear
  - testify
  - think
  - understand
  - urge
  - vote
  - vow
  - warn
  - write
  parts:
    p1:
      cql: '[lemma={words} & pos="VB.*"] [pos="DT|PRP|NN|NNS|NNP|NNPS"] [pos="MD|VB.*"]'
    p2:
      cql: '[lemma={words} & pos="VB.*"] [pos="JJ.*|RB.*|DT|CD|PRP|PRP\\$"] [pos="NN.*|CD"] [pos="MD|VB.*"]'
    p3:
      cql: '[lemma={words} & pos="VB.*"] [pos="JJ.*|RB.*|DT|CD|PRP|PRP\\$"] [pos="JJ.*"] [pos="NN.*"] [pos="MD|VB.*"]'
  combine: "p1 | p2 | p3"
  description: >-
    MFTE that-deletion: public/private/suasive verb followed by a subject NP
    and then a verb or modal, with no intervening "that" (lines 1028-1031).
    Three patterns of increasing NP complexity: (1) verb + NP-head + verb,
    (2) verb + modifier + NP-head + verb, (3) verb + modifier + adj + noun + verb.
    POS-based positional matching (no dependency parsing needed).
- source: pybiber
  requires:
  - lemma
  - pos
  - upos
  - dep
  words:
  - accept
  - acknowledge
  - add
  - admit
  - affirm
  - agree
  - allege
  - allow
  - announce
  - anticipate
  - appear
  - argue
  - arrange
  - ascertain
  - ask
  - assert
  - beg
  - believe
  - bet
  - boast
  - calculate
  - certify
  - check
  - claim
  - command
  - comment
  - complain
  - concede
  - conclude
  - confess
  - confide
  - confirm
  - conjecture
  - consider
  - contend
  - convey
  - decide
  - declare
  - decree
  - deduce
  - deem
  - demand
  - demonstrate
  - deny
  - desire
  - determine
  - discern
  - disclose
  - discover
  - doubt
  - dream
  - enjoin
  - ensure
  - entreat
  - establish
  - estimate
  - exclaim
  - expect
  - explain
  - fancy
  - fear
  - feel
  - find
  - forecast
  - foresee
  - foretell
  - forget
  - gather
  - grant
  - guarantee
  - guess
  - hear
  - hint
  - hold
  - hope
  - imagine
  - imply
  - indicate
  - infer
  - insist
  - instruct
  - insure
  - intend
  - judge
  - know
  - learn
  - maintain
  - mean
  - mention
  - move
  - note
  - notice
  - object
  - observe
  - ordain
  - order
  - perceive
  - pledge
  - pray
  - predict
  - prefer
  - presume
  - presuppose
  - pretend
  - proclaim
  - promise
  - pronounce
  - prophesy
  - propose
  - protest
  - prove
  - realise
  - realize
  - reason
  - recall
  - reckon
  - recognise
  - recognize
  - recommend
  - reflect
  - remark
  - remember
  - repeat
  - reply
  - report
  - request
  - require
  - resolve
  - reveal
  - rule
  - see
  - seem
  - sense
  - show
  - signify
  - state
  - stipulate
  - submit
  - suggest
  - suppose
  - suspect
  - swear
  - testify
  - think
  - understand
  - urge
  - vote
  - vow
  - warn
  - write
  parts:
    p1:
      cql: '[lemma={words} & upos="VERB"] [dep="nsubj" & pos!="WP"] [upos="VERB" & pos!="VBG"]'
    p2:
      cql: '[lemma={words} & upos="VERB"] [pos="DT"] [dep="nsubj"] [upos="VERB"]'
    p3:
      cql: '[lemma={words} & upos="VERB"] [pos="DT"] [dep="amod"] [dep="nsubj"] [upos="VERB"]'
  combine: "p1 | p2 | p3"
  description: >-
    Three pybiber that-deletion patterns sharing the same verb matchlist:
    (1) verb + nsubj + verb, (2) verb + DT + nsubj + verb,
    (3) verb + DT + amod + nsubj + verb.
    Pattern 1 also excludes wh-pronoun subjects (`pos!="WP"`) and -ing forms
    of the following verb (`pos!="VBG"`).
    Uses `upos="VERB"` throughout to exclude auxiliaries (UPOS=AUX).
    pybiber's verb_matchlist does not include "say" (unlike MFTE's own list,
    although the MFTE words list reproduced above omits it as well).
examples:
- text: I _thought_ he just meant our side.
  source: le_foll_2024
- text: I _know_ that's not his thing.
  source: le_foll_2024
- text: I _mean_ you'll do everything.
  source: le_foll_2024
sources:
- biber_1988
- mfte
- pybiber
- xiao_2009
notes: Second-highest loading on D1. Strong involvement marker.
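
The MFTE patterns p1 to p3 and the word-based post-filter described for the semgrex detector can be sketched in plain Python. This is a minimal illustration, not the MFTE or pybiber implementation: the `Token` record, the `VERBS` subset, and the functions `match_at`, `find_that_deletions`, and `that_is_absent` are hypothetical names introduced here, and the sketch assumes Penn Treebank tags and CQL-style whole-string regex matching on each attribute.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Token(NamedTuple):
    """Hypothetical token record (not a real tagger's API)."""
    word: str
    lemma: str
    pos: str  # Penn Treebank tag (assumed)


# Illustrative subset only; the full matchlist is the `words` key above.
VERBS: Set[str] = {"think", "know", "mean", "believe", "hope"}

# Shared modifier slot of p2 and p3.
MOD = r"JJ.*|RB.*|DT|CD|PRP|PRP\$"

# POS regexes for the tokens that must follow the matched verb.
PATTERNS: Dict[str, List[str]] = {
    "p1": [r"DT|PRP|NN|NNS|NNP|NNPS", r"MD|VB.*"],
    "p2": [MOD, r"NN.*|CD", r"MD|VB.*"],
    "p3": [MOD, r"JJ.*", r"NN.*", r"MD|VB.*"],
}


def match_at(tokens: List[Token], i: int) -> Optional[str]:
    """Return the first pattern name matching at verb position i, else None."""
    verb = tokens[i]
    if verb.lemma not in VERBS or not re.fullmatch(r"VB.*", verb.pos):
        return None
    for name, tail in PATTERNS.items():
        window = tokens[i + 1 : i + 1 + len(tail)]
        # fullmatch mimics CQL, where a value regex must cover the whole tag.
        if len(window) == len(tail) and all(
            re.fullmatch(rx, tok.pos) for rx, tok in zip(tail, window)
        ):
            return name
    return None


def find_that_deletions(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Return (verb_index, pattern_name) for every p1 | p2 | p3 hit."""
    hits = []
    for i in range(len(tokens)):
        name = match_at(tokens, i)
        if name is not None:
            hits.append((i, name))
    return hits


def that_is_absent(tokens: List[Token], verb_i: int, comp_i: int) -> bool:
    """Word-based post-filter: no "that" strictly between verb and clause head."""
    lo, hi = sorted((verb_i, comp_i))
    return all(tok.word.lower() != "that" for tok in tokens[lo + 1 : hi])


# "I mean you'll do everything." with assumed tags.
sentence = [
    Token("I", "I", "PRP"),
    Token("mean", "mean", "VBP"),
    Token("you", "you", "PRP"),
    Token("'ll", "will", "MD"),
    Token("do", "do", "VB"),
    Token("everything", "everything", "NN"),
]
print(find_that_deletions(sentence))  # [(1, 'p1')]
```

Because the positional patterns look only at the tags immediately after the verb, an intervening adverb (as in "I thought he just meant our side") is not matched by this sketch, and a demonstrative "that" tagged DT satisfies the first slot of p1.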