DMA YAML source

code: DMA
biber_number: K50
xiao_number: Z141
mfte_code: DMA
name: Discourse/pragmatic markers
definition: >-
  Discourse markers and pragmatic particles: well, okay, actually, anyway, right,
  etc.
normalization: words
detection:
- requires:
  - word
  - pos
  parts:
    p1:
      cql: '[word={words} & pos!="VB|VBD|VBG|VBN|VBP|VBZ"]'
    p2:
      cql: '[word="right" & pos="UH"]'
    p3:
      cql: '[word="right" & pos="JJ"]'
    p4:
      cql: '[word="well" & pos="UH"]'
    p5:
      cql: '[word="no" & pos="UH"]'
    p6:
      cql: '[word="sure" & pos="JJ|RB"]'
    p7:
      cql: '[word="of"] [word="course"]'
  combine: "p1 | p2 | p3 | p4 | p5 | p6 | p7"
  words:
  - anyway
  - anyways
  - damn
  - goodness
  - gosh
  - lol
  - nope
  - omg
  - whatever
  - wtf
  - yeah
  - yep
  - "yes"
  refines: RB
  description: Default discourse markers rule (based on MFTE). Refines RB.
- source: mfte
  requires:
  - word
  - pos
  parts:
    p1:
      cql: '[word={words} & pos!="VB|VBD|VBG|VBN|VBP|VBZ"]'
    p2:
      cql: '[word="right" & pos="UH"]'
    p3:
      cql: '[word="right" & pos="JJ"]'
    p4:
      cql: '[word="well" & pos="UH"]'
    p5:
      cql: '[word="no" & pos="UH"]'
    p6:
      cql: '[word="sure" & pos="JJ|RB"]'
    p7:
      cql: '[word="of"] [word="course"]'
  combine: "p1 | p2 | p3 | p4 | p5 | p6 | p7"
  words:
  - anyway
  - anyways
  - damn
  - goodness
  - gosh
  - lol
  - nope
  - omg
  - whatever
  - wtf
  - yeah
  - yep
  - "yes"
  description: >-
    MFTE discourse/pragmatic markers, in two groups: (1) a simple word
    list tagged DMA whenever the token is not tagged as a verb, and
    (2) context-sensitive patterns (sure, right, of course, mind you).
    "now" is excluded because MFTE tags it as TIME first; "anyhow" is
    excluded because MFTE tags it as RBother, not DMA. Anchor on "right"
    (not "all") and "course" (not "of") to match MFTE's anchoring
    convention.
- source: pybiber
  requires:
  - word
  - pos
  - dep
  cql: '[dep="punct"] [word="well|now|anyhow|anyways"]'
  description: >-
    pybiber replaces every token with dep_rel == "punct" with _punct_
    in its blob, so any punctuation mark (including quotation marks)
    qualifies as the token preceding the marker.
examples:
- text: _Well_, that changes everything.
- text: _Now_, where were we?
- text: _Well_ _no_ they didn't say actually.
  source: le_foll_2024
- text: _Okay_ I guess we'll see how things go right?
  source: le_foll_2024
non_examples:
- text: She now understood the full extent of the problem.
  note: now = time adverb, not discourse marker — no longer a false positive (mfte rule excludes "now")
- text: The well had dried up during the summer.
  note: well = noun (a well); no longer triggers the detector (mfte rule requires UH POS)
sources:
- biber_1988
- mfte
- pybiber
- xiao_2009
- grieve_2023
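
A minimal sketch of how the `parts` and `combine` fields of the mfte rule above might be evaluated. It assumes that `{words}` expands to a quoted regex alternation, that `combine` is a plain union of the parts, and that word matching is case-insensitive. None of these is confirmed by this file. The names `expand_words` and `is_dma` are hypothetical and do not belong to any loader. The sketch reports the first token of each match and does not model MFTE's anchoring convention.

```python
import re

# Word list copied from the rule's `words` field.
WORDS = ["anyway", "anyways", "damn", "goodness", "gosh", "lol", "nope",
         "omg", "whatever", "wtf", "yeah", "yep", "yes"]
VERB_TAGS = "VB|VBD|VBG|VBN|VBP|VBZ"


def expand_words(cql, words):
    """Replace the {words} placeholder with a quoted alternation (assumed behavior)."""
    return cql.replace("{words}", '"' + "|".join(words) + '"')


def is_dma(tokens, i):
    """Return True if token i starts a match for p1 | ... | p7.

    `tokens` is a list of (word, pos) pairs. Lowercasing is an assumption.
    """
    word, pos = tokens[i]
    w = word.lower()
    if w in WORDS and not re.fullmatch(VERB_TAGS, pos):      # p1
        return True
    if w == "right" and pos in ("UH", "JJ"):                 # p2, p3
        return True
    if w in ("well", "no") and pos == "UH":                  # p4, p5
        return True
    if w == "sure" and pos in ("JJ", "RB"):                  # p6
        return True
    if (w == "of" and i + 1 < len(tokens)
            and tokens[i + 1][0].lower() == "course"):       # p7
        return True
    return False


tokens = [("Well", "UH"), (",", ","), ("of", "IN"), ("course", "NN"),
          ("the", "DT"), ("well", "NN"), ("dried", "VBD")]
hits = [i for i in range(len(tokens)) if is_dma(tokens, i)]
```

Here `hits` contains the interjection "Well" and the start of "of course", while the noun "well" is skipped because p4 requires the UH tag.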