• kromem@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    ·
    19 hours ago

    There is a reluctance to discuss at a weight level - this graphs out refusals for criticism of different countries for different models:

    https://x.com/xlr8harder/status/1884705342614835573

    But the OP’s refusal is occurring at a provider level and is the kind that would intercept even when the model relaxes in longer contexts (which happens for nearly every model).

    At a weight level, nearly all alignment lasts only a few pages of context.

    But intercepted refusals occur across the context window.