Unable to disable thinking mode for self-hosted models that use OpenAI as the provider (please read the quick description) #9102

@AayushSameerShah

Before submitting your bug report

Relevant environment info

- OS: Windows 10
- Continue version: 1.2.11
- IDE version: VS Code 1.106.3
- Model: Qwen3-8B
- config:
  

C:\Users\aayush\.continu
name: My Config
version: 1.0.0
schema: v1
models:
  - name: Qwen3-8B-chat  
    provider: openai
    model: Qwen/Qwen3-8B
    roles:
      - chat
      - edit
      - autocomplete
    apiKey: RUNPOD_API_KEY
    apiBase: https://api.runpod.ai/v2/XXYYZZ/openai/v1
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 4096
      contextLength: 8192
      topP: 0.9
      reasoning: false

    chatOptions:
      baseSystemMessage: |
        Your primary objective is to assist users in their code-writing process through code completion while offering explanations...
      
    requestOptions:
      extraBodyProperties:
        think: false
  
        
    autocompleteOptions:
      temperature: 0.3
      maxTokens: 512
      maxPromptTokens: 2048
      topP: 0.95
      debounceDelay: 200
      onlyMyCode: true
      maxSuffixPercentage: 0.4
      prefixPercentage: 0.6
      useCache: true

    promptTemplates:
      edit: |
        `Here is the code before editing:
        \`\`\`{{{language}}}
        {{{codeToEdit}}}
        \`\`\`

        Here is the edit requested:
        "{{{userInput}}}"

        ## Instructions:
        * DO NOT think
        * Perform edits only.
        * Write the new edited code.

        Here is the code after editing:`
      autocomplete: |
        You are an AI assistant that helps users with code completion. The code context starts with {{{ prefix }}}, ends with {{{ suffix }}}, is part of the file {{{ filename }}}, located in the {{{ reponame }}} repository, and written in {{{ language }}}. Please provide code suggestions that complete the current line or block, considering the surrounding code and the programming language.
context:
  - provider: diff
  - provider: file
  - provider: code

Description

ℹ A little context about my setup

  • I host my model on RunPod as a serverless endpoint that uses vLLM as the backend.
  • I can enable or disable thinking at will through the OpenAI Python client, as shown below ⬇️
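import os
from openai import OpenAI

# Same endpoint as in the Continue config above; reading the key from an
# environment variable is an assumption made to keep the snippet runnable.
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/XXYYZZ/openai/v1",
)
system = "You are a helpful coding assistant."  # placeholder prompt
user = "Write a function that reverses a string."  # placeholder prompt

chat_response = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
    max_tokens=1024,
    temperature=0.6,
    top_p=0.95,
    # extra_body keys are merged into the JSON request body sent to vLLM.
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},  # <--- by using this
    },
)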
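To make clear what actually reaches the server, here is a minimal sketch of the JSON body that a call like the one above produces. `build_chat_body` is a hypothetical helper written for illustration, not part of the `openai` package; it only mimics how the client merges `extra_body` keys into the top level of the request body.

```python
import json


def build_chat_body(model, messages, extra_body=None, **params):
    """Approximate the JSON body posted to /chat/completions (illustrative only)."""
    body = {"model": model, "messages": messages, **params}
    # extra_body keys end up at the top level of the body, next to "model".
    body.update(extra_body or {})
    return body


body = build_chat_body(
    "Qwen/Qwen3-8B",
    [{"role": "user", "content": "hello"}],
    extra_body={"top_k": 20, "chat_template_kwargs": {"enable_thinking": False}},
    max_tokens=1024,
    temperature=0.6,
    top_p=0.95,
)
print(json.dumps(body, indent=2))
```

The point is that the switch vLLM honours in my setup is the nested `chat_template_kwargs.enable_thinking` key at the top level of the body, so this is presumably what Continue would need to send as well.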

🍒 Problem

As shown in the YAML above, I have tried to disable thinking in every way I could find:

➡️ Either by:

defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 4096
      contextLength: 8192
      topP: 0.9
      reasoning: false

➡️ Or by:

requestOptions:
      extraBodyProperties:
        think: false

But the model still thinks no matter which of these I set.

My guess is that it works for autocomplete but not for chat.
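A possible explanation, sketched below under assumptions: if `extraBodyProperties` is merged into the top level of the request body, then `think: false` adds a `think` key, which is not the key my working client call uses. `disables_qwen3_thinking` is a hypothetical check written for this issue, and the bodies are hand-built stand-ins, not captured Continue traffic. How `reasoning: false` is serialized is unknown to me, so it is not modelled here.

```python
def disables_qwen3_thinking(body):
    """True only if the body carries the switch used by the working client call."""
    kwargs = body.get("chat_template_kwargs")
    return isinstance(kwargs, dict) and kwargs.get("enable_thinking") is False


base = {"model": "Qwen/Qwen3-8B", "messages": []}

# Assumed result of `extraBodyProperties: {think: false}` in the config.
attempt_think = {**base, "think": False}

# Body shape that works when sent through the OpenAI Python client.
working = {**base, "chat_template_kwargs": {"enable_thinking": False}}

for name, candidate in [("think: false", attempt_think), ("chat_template_kwargs", working)]:
    print(name, "->", disables_qwen3_thinking(candidate))
```

If that assumption holds, placing `chat_template_kwargs` with `enable_thinking: false` under `extraBodyProperties` might be worth trying, though I have not confirmed that Continue forwards nested objects unchanged.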

🤔 Problem with Edit

  • When I highlight some code and press CTRL+I to make an inline edit, the model emits a <think> block and the output stops after </think>, without any actual code.
  • When I instruct the model NOT to think for edits (as shown in the YAML), it returns an empty <think> ... </think> block and nothing else.
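For reference, the two symptoms above can be reproduced on plain strings. The sketch below is only an illustration of what the Edit output looks like once the `<think>` block is removed; `strip_think` is a hypothetical helper, not something Continue is known to implement.

```python
import re

# Match a <think>...</think> block plus the blank lines that follow it,
# while leaving the indentation of the first code line untouched.
_THINK_BLOCK = re.compile(r"<think>.*?</think>(?:[ \t]*\n)*", re.DOTALL)


def strip_think(text):
    """Remove <think> blocks from a model response (illustrative only)."""
    return _THINK_BLOCK.sub("", text)


# Expected behaviour: an empty think block followed by the edited code.
good = "<think>\n\n</think>\n\n    return x + 1\n"
# Reported behaviour: the response ends right after </think>.
bad = "<think>\nThe user wants an increment.\n</think>"

print(repr(strip_think(good)))
print(repr(strip_think(bad)))
```

In the reported case nothing is left after stripping, which suggests the generation itself ends at `</think>` rather than the code being hidden by the UI.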

🙏🏻 Request

Please fix this, especially in chat mode.

Thanks

Labels: area:chat, area:edit, ide:vscode, kind:bug, os:windows
