We don’t have quite enough information to fully identify the problem…
If the image is bit-identical (same seed, same prompt, same steps) with and without the LoRA, then the LoRA did not affect the forward pass at all. In Diffusers terms, either (1) the LoRA weights never got attached (key mismatch or loader path issue), or (2) they attached to the wrong component, or (3) you hit a known Flux + 4-bit LoRA bug/regression.
Below is the shortest path to a working setup, with the background for why each change matters.
Background: how LoRA “changes the image” in Diffusers
A LoRA is a set of small matrices that get injected into specific linear layers of the base model. For diffusion models that means:
- Denoiser (UNet for SD1.5/SDXL, or Transformer/DiT for Flux)
- Sometimes also text encoder
Diffusers' load_lora_weights() is supposed to:
- read the LoRA tensors,
- match their keys to modules in the pipeline,
- install adapter hooks so inference uses modified weights.
If matching fails, Diffusers can end up effectively doing nothing. Some Flux LoRA issues even describe “no error, but the style isn’t applied and output is identical.” (GitHub)
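A quick way to see whether anything attached at all is to ask the pipeline which adapters it knows about. This is a sketch assuming a recent Diffusers release with the PEFT backend and an already-built pipeline pipe; the path, file name, and adapter name are placeholders.
# Sanity check right after loading: which pipeline components actually received adapters?
pipe.load_lora_weights(
    "path/to/lora_dir", weight_name="my_lora.safetensors", adapter_name="test"
)

print(pipe.get_list_adapters())    # e.g. {"transformer": ["test"]}; an empty dict means nothing attached
print(pipe.get_active_adapters())  # adapters that will actually be used during inference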
The two biggest problems in your code
1) You load the Flux.2 transformer as AutoModel
For Flux.2, Hugging Face’s own blog shows you should instantiate the denoiser as Flux2Transformer2DModel, not a generic AutoModel. (Hugging Face)
Why it matters:
- LoRA injection depends on model class structure and module naming.
- A generic auto class can still run inference but can break adapter injection or key matching.
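For illustration, the difference is just which class builds the denoiser. The repo id is the one from your code; the commented-out lines mirror your current AutoModel setup.
import torch
from diffusers import Flux2Transformer2DModel

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"

# What you have now (generic class; inference runs, but adapter injection may not line up):
#   from diffusers import AutoModel
#   transformer = AutoModel.from_pretrained(repo_id, subfolder="transformer", torch_dtype=torch.bfloat16)

# Explicit Flux.2 class, matching the reference code:
transformer = Flux2Transformer2DModel.from_pretrained(
    repo_id, subfolder="transformer", torch_dtype=torch.bfloat16
)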
2) You are using the “single file” LoRA path in a risky way
The Diffusers docs show load_lora_weights(pretrained_model_name_or_path, weight_name=...) where the first argument is typically a repo id or directory, and weight_name selects the file. (Hugging Face)
Passing a full file path as the first argument together with a weight_name is easy to get wrong across versions and loaders. If Diffusers treats the first argument as a directory, it can fail to locate the file cleanly or end up loading an empty state dict.
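For comparison, the two call shapes look like this. Paths and file names are placeholders; pipe is your already-built pipeline.
# Risky: a full file path as the first argument plus weight_name; some loader versions
# treat the first argument as a directory/repo id and quietly fail to find the file.
pipe.load_lora_weights(
    "/models/jimfitzpatrick-fluxlora.safetensors",
    weight_name="jimfitzpatrick-fluxlora.safetensors",
)

# Documented pattern: directory (or Hub repo id) first, weight_name selects the file inside it.
pipe.load_lora_weights(
    "/models",
    weight_name="jimfitzpatrick-fluxlora.safetensors",
    adapter_name="jimfitz",
)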
Flux-specific pitfall: “LoRA scaling” is not what you think
In Flux pipelines, lora_scale is documented as applying to text encoder LoRA layers. It is not a guaranteed “global LoRA strength” knob for the transformer denoiser. (GitHub)
There is also a long-standing Flux issue where changing lora_scale produced no change with a fixed seed. (GitHub)
So for Flux, the reliable test (sketched after this list) is:
- confirm adapters actually loaded,
- force a huge adapter weight,
- confirm output changes.
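Here is a minimal sketch of that A/B test. It assumes pipe and prompt already exist and that the LoRA was loaded with adapter_name="jimfitz"; disable_lora()/enable_lora() are part of the Diffusers LoRA API.
import numpy as np
import torch

def render(pipe, prompt, seed=42, steps=8):
    # Fixed seed and step count so the adapter state is the only variable.
    return pipe(
        prompt=prompt,
        generator=torch.Generator("cpu").manual_seed(seed),
        num_inference_steps=steps,
    ).images[0]

pipe.set_adapters("jimfitz", 2.0)   # exaggerated weight so any effect is unmistakable
with_lora = render(pipe, prompt)

pipe.disable_lora()                 # same pipeline, adapter switched off
without_lora = render(pipe, prompt)
pipe.enable_lora()

identical = np.array_equal(np.array(with_lora), np.array(without_lora))
print("outputs identical:", identical)   # True means the LoRA never reached the forward pass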
Flux + 4-bit: you may be on a version where LoRA is broken
Diffusers v0.32.2 explicitly says it “fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models.” (GitHub)
There are also Flux issues about LoRA failing when transformers are quantized to 4-bit. (GitHub)
So even if your code is “conceptually right”, an older Diffusers release can give exactly your symptom: LoRA loads but has no effect.
What I would do for your exact case (in order)
Step 1: Print versions first
You want to know if you are before or after the 4-bit Flux LoRA regression fix.
import diffusers, transformers
print("diffusers", diffusers.__version__)
print("transformers", transformers.__version__)
If you are older than the fix, upgrade Diffusers to at least v0.32.2. (GitHub)
Also be careful with nightly/dev builds: there are reports where load_lora_weights works in one version, breaks in the next, and works again after a downgrade. (GitHub)
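If you want the script to fail fast instead of silently producing un-LoRA'd images, a small guard works; 0.32.2 is the release whose notes mention the fix.
from packaging import version
import diffusers

# Refuse to run on a diffusers build that predates the 4-bit Flux LoRA regression fix.
if version.parse(diffusers.__version__) < version.parse("0.32.2"):
    raise RuntimeError(
        f"diffusers {diffusers.__version__} predates the 4-bit Flux LoRA fix; "
        "run: pip install -U diffusers"
    )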
Step 2: Build the pipeline exactly like the Flux.2 reference code (critical)
Replace AutoModel with Flux2Transformer2DModel exactly as shown in the official Flux.2 post. (Hugging Face)
Step 3: Load the LoRA as “directory + weight_name”
This matches the documented pattern and avoids ambiguity. (Hugging Face)
Step 4: Force adapter activation and do an A/B test
Diffusers supports setting adapter scaling. The docs show set_adapters() for controlling scale. (Hugging Face)
Even if you load only one LoRA, forcing adapter weight removes ambiguity.
A “known-good” rewritten version of your function
import torch
from transformers import Mistral3ForConditionalGeneration
from diffusers import Flux2Pipeline, Flux2Transformer2DModel

def generate_image_using_flux2(prompt: str):
    repo_id = "diffusers/FLUX.2-dev-bnb-4bit"
    device = "cuda:0"
    torch_dtype = torch.bfloat16

    # Match the official Flux.2 4-bit loading pattern
    transformer = Flux2Transformer2DModel.from_pretrained(
        repo_id, subfolder="transformer", torch_dtype=torch_dtype, device_map="cpu"
    )
    text_encoder = Mistral3ForConditionalGeneration.from_pretrained(
        repo_id, subfolder="text_encoder", dtype=torch_dtype, device_map="cpu"
    )

    pipe = Flux2Pipeline.from_pretrained(
        repo_id,
        transformer=transformer,
        text_encoder=text_encoder,
        torch_dtype=torch_dtype,
    )

    # Load LoRA BEFORE offload to reduce device-map weirdness.
    # Use directory + weight_name (doc pattern).
    pipe.load_lora_weights(
        ".",
        weight_name="jimfitzpatrick-fluxlora.safetensors",
        adapter_name="jimfitz",
    )

    # Force a very obvious strength for testing
    pipe.set_adapters("jimfitz", 2.0)

    pipe.enable_model_cpu_offload()

    img = pipe(
        prompt=prompt,
        generator=torch.Generator(device=device).manual_seed(42),
        num_inference_steps=24,
        guidance_scale=4,
    ).images[0]

    return img
Key points this fixes:
- Correct transformer class for Flux.2 loading. (Hugging Face)
- LoRA load path matches Diffusers adapter-loading docs. (Hugging Face)
- Explicit adapter activation and exaggerated weight.
If this still produces identical output, then either the LoRA is incompatible (wrong base model family), or you are still hitting a version-specific LoRA bug for quantized Flux. (GitHub)
If it still doesn’t work: diagnose “incompatible LoRA” quickly
1) The LoRA may be for Flux.1 or SDXL, not Flux.2
This happens a lot with Civitai LoRAs. The symptom matches a known Diffusers issue: “silent erroring, doesn’t error but style isn’t applied and output is identical.” (GitHub)
Because I cannot open the Civitai page from here (site access fails), I cannot confirm what base it was trained for. You can still confirm locally:
Inspect keys inside the safetensors file.
- If you see lots of unet keys, it is an SDXL or SD1.5 style LoRA.
- If you see Flux-specific transformer keys but with Flux.1 shapes or names, it may need conversion.
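A quick way to do that inspection locally, using the safetensors library. Only the file name comes from your setup, and the substring heuristics are rough guesses, not definitive checks.
from safetensors import safe_open

# Peek at the tensor keys to guess which base model family the LoRA targets.
with safe_open("jimfitzpatrick-fluxlora.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors")
for k in keys[:15]:
    print(" ", k)

print("UNet-style (SD1.5/SDXL)?", any("unet" in k for k in keys))
print("Flux transformer-style? ", any("transformer" in k or "double_blocks" in k or "single_blocks" in k for k in keys))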
Flux LoRA key mismatch and conversion is a known topic (kohya conversion scripts are mentioned directly in Diffusers issues). (GitHub)
2) Try the LoRA on a non-quantized Flux.2 pipeline
Quantization is a known boundary where LoRA breaks. There are multiple Flux LoRA issues involving quantized transformers. (GitHub)
If it works unquantized but not on *-bnb-4bit, you have confirmed it is a quantization-path problem, not the LoRA file itself.
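A sketch of that cross-check, assuming the non-quantized weights live under the black-forest-labs/FLUX.2-dev repo (verify the exact repo id on the model card); this needs a lot of VRAM even with offload.
import torch
from diffusers import Flux2Pipeline

# Same LoRA, non-quantized Flux.2 pipeline.
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",   # assumed base repo id -- check the model card
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

pipe.load_lora_weights(".", weight_name="jimfitzpatrick-fluxlora.safetensors", adapter_name="jimfitz")
pipe.set_adapters("jimfitz", 2.0)

image = pipe(prompt="test prompt", num_inference_steps=24, guidance_scale=4).images[0]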
Prompt-trigger pitfall (less likely for “identical output”, but real)
Some LoRAs are trained with a trigger token and expect you to include it in the prompt. Diffusers docs explicitly show an example where you must include a token (“cnmt”) to trigger the LoRA. (Hugging Face)
If you never use the trigger token, the effect can be weak or look like “nothing changed,” especially at low CFG.
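If you are unsure whether the LoRA expects a trigger word, the safetensors header sometimes carries training metadata you can read locally. This is a sketch; the metadata keys depend on the training tool and may be absent entirely.
from safetensors import safe_open

with safe_open("jimfitzpatrick-fluxlora.safetensors", framework="pt") as f:
    meta = f.metadata()   # may be None

# kohya-trained LoRAs often store tag/caption statistics that reveal the trigger word.
if meta:
    for key in ("ss_tag_frequency", "ss_output_name", "modelspec.title"):
        if key in meta:
            print(key, "->", meta[key][:200])
else:
    print("no embedded metadata; check the Civitai page for the trigger word")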
Similar “no effect” cases online (the ones that match your symptom)
- Flux LoRA key mismatch leading to identical output, described as “silent erroring.” (GitHub)
- Flux lora_scale not changing output with fixed seed. (GitHub)
- Flux LoRA failing when transformer is quantized. (GitHub)
- Diffusers release explicitly fixing a 4-bit Flux LoRA loading regression. (GitHub)
- Version-specific load_lora_weights breakage reported on dev builds. (GitHub)
Summary
- Your strongest fix: load the denoiser as Flux2Transformer2DModel, not AutoModel. (Hugging Face)
- Load the LoRA as directory + weight_name, then force set_adapters(..., 2.0) to prove it affects output. (Hugging Face)
- Ensure Diffusers includes the 4-bit Flux LoRA regression fix (v0.32.2 mentions it explicitly). (GitHub)
- If still identical, assume LoRA incompatibility (Flux.1 or SDXL) or a quantization-path LoRA failure. (GitHub)