AttributeError: 'GPT2Model' object has no attribute '_get_initial_cache_position' in AudioLDM2Pipeline #12630

@chaowenguo

Describe the bug

! python3 -m pip install -U phonemizer
! sudo apt install -y --no-install-recommends espeak-ng
import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile.write resolves
import transformers
import torch
from diffusers import AudioLDM2Pipeline

repo_id = "anhnct/audioldm2_gigaspeech"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts
prompt = "An female actor say with angry voice"
transcript = "wish you have a good day, i hope you never forget me"
negative_prompt = "low quality"

# run the generation
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    transcription=transcript,
    num_inference_steps=200,
    audio_length_in_s=8.0,
    num_waveforms_per_prompt=1,
    max_new_tokens=512
).audios

# save the best audio sample (index 0) as a .wav file
scipy.io.wavfile.write("techno_2.wav", rate=16000, data=audio[0])
from IPython.display import Audio
Audio("techno_2.wav")

Running this keeps raising AttributeError: 'GPT2Model' object has no attribute '_get_initial_cache_position'.
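
A quick way to confirm the missing attribute without loading the pipeline is to probe the class directly. This is only a diagnostic sketch: `missing_attributes` is a hypothetical helper written for this report, and it assumes the method would be visible as a class-level attribute on `GPT2Model` when it is available.

```python
def missing_attributes(obj, names):
    """Return the entries of `names` that `obj` does not expose via hasattr."""
    return [name for name in names if not hasattr(obj, name)]


if __name__ == "__main__":
    try:
        from transformers import GPT2Model
    except ImportError:
        print("transformers is not installed")
    else:
        # A non-empty list means the attribute named in the traceback is absent
        # from the installed transformers version.
        print(missing_attributes(GPT2Model, ["_get_initial_cache_position"]))
```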

The problem seems to originate in transformers: downgrading to transformers==4.47.0 makes it work.
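
As a rough guard, the installed transformers version can be compared against the two data points in this report before running the pipeline. This is a sketch, not a compatibility matrix: `REPORTED`, `reported_status`, and `installed_transformers_status` are hypothetical names, and every version other than the two mentioned here is treated as untested.

```python
from importlib.metadata import PackageNotFoundError, version

# The only two versions this report has information about.
REPORTED = {
    "4.47.0": "works",  # downgrade that made the pipeline run
    "4.53.3": "fails",  # version that raised the AttributeError
}


def reported_status(transformers_version: str) -> str:
    """Map a transformers version string to what this report observed."""
    return REPORTED.get(transformers_version, "untested")


def installed_transformers_status() -> str:
    """Look up the installed transformers version and classify it."""
    try:
        return reported_status(version("transformers"))
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print(installed_transformers_status())
```

In a notebook, the downgrade itself would be `! python3 -m pip install transformers==4.47.0`, run before importing diffusers.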

Reproduction

See the code above.

Logs

System Info

diffusers 0.34.0, transformers 4.53.3. Tested on a Kaggle P100 GPU.

Who can help?

@yiyixuxu @DN6
