Failing to import LightOnOCRForConditionalGeneration

#15
by fhaDL - opened

This was working alright. But today I created a new venv and installed transformers through
pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main

But now I am getting this error:
ImportError: cannot import name 'LightOnOCRForConditionalGeneration' from 'transformers'

It was renamed to LightOnOcrForConditionalGeneration


Thanks! Got it working now

I am also facing this issue. Can you tell me the solution please?

LightOn AI org

Hi @maazahmed10,

The class was renamed to follow transformers' naming conventions. Change LightOnOCRForConditionalGeneration to LightOnOcrForConditionalGeneration (note: OCR → Ocr) 😀
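
If you need code that works across both versions of the fork, a small sketch that tries the new name first (the alias OcrModel is just an illustrative name, not part of the library):

try:
    from transformers import LightOnOcrForConditionalGeneration as OcrModel
except ImportError:
    # older builds of the fork exposed the all-caps spelling
    from transformers import LightOnOCRForConditionalGeneration as OcrModel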

Bapt120 changed discussion status to closed
Bapt120 changed discussion status to open

I am still facing the issue:

import torch
from PIL import Image
from transformers import AutoProcessor, LightOnOcrForConditionalGeneration
import requests
from io import BytesIO

# Load model

model_id = "lightonai/LightOnOCR-1B-1025"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(model_id, dtype=torch.bfloat16, device_map=device, attn_implementation="sdpa")
model.eval();

ImportError: cannot import name 'LightOnOcrForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)


Hi @Bapt120,

Now we are facing another issue. After processor.apply_chat_template, there are no 'pixel_values'.

---> 14 inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
15
16 outputs = model.generate(

/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py in __getitem__(self, item)
261 """
262 if isinstance(item, str):
--> 263 return self.data[item]
264 elif self._encodings is not None:
265 return self._encodings[item]

KeyError: 'pixel_values'
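
A quick way to debug this is to print the keys the processor actually returned before indexing into them (chat here stands for the chat-template input from your script); if 'pixel_values' is missing, the image content was never processed:

inputs = processor.apply_chat_template(
    chat,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
print(inputs.keys())  # expect input_ids, attention_mask, pixel_values, ...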

This is an awesome model; I have fine-tuned it for Doc VQA, and there too it performs really well. But because of some breaking changes, it is getting difficult to convert it to GGUF format or to run it with vLLM. I have seen many pending PRs, and hopefully this will all get fixed soon.

You guys have done an amazing job building this model. πŸ˜€

I am also facing the same issue. After processor.apply_chat_template, there are no 'pixel_values'.

LightOn AI org

Hi,
It seems the latest commits introduced breaking changes. A simple temporary workaround is to install from a previous commit, for example:

pip install git+https://github.com/baptiste-aubertin/transformers.git@83d01ff90112ea3a0d8a6679aa9383e80f31c1db

This will be fixed soon but in the meantime you can use the above!
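
You can also confirm which commit you actually have installed, since pip records the SHA for git installs:

pip freeze | grep transformers

This should print something like transformers @ git+https://github.com/baptiste-aubertin/transformers.git@<commit-sha>.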

Well, I guess the pixel_values KeyError was probably a tokenizer issue (probably, idk 🤷), because after changing tokenize to True, switching to the conventional chat template (after checking the HF Space for this model), and some other changes, the model worked (with this version of transformers: !pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main)

def clean_output_text(text):
    # Strip the chat-role markers ("system"/"user"/"assistant") that the
    # chat template leaves in the decoded output.
    markers = ["system", "user", "assistant"]
    lines = text.splitlines()

    filtered = [
        line for line in lines
        if line.strip().lower() not in markers
    ]

    cleaned = "\n".join(filtered).strip()

    # If the decoded text still contains the echoed prompt, keep only the
    # part after the assistant marker.
    if "assistant" in text.lower():
        cleaned = text.split("assistant", 1)[-1].strip()

    return cleaned



def extract_text_from_image(image, temperature=0.0):
    """
    image: PIL.Image
    """
    # Single-turn chat with the image as the only content.
    chat = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image},
            ],
        }
    ]

    # tokenize=True + return_dict=True so the processor returns input_ids,
    # attention_mask, and pixel_values in one dict.
    inputs = processor.apply_chat_template(
        chat,
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )

    # Move tensors to the device; cast floating-point tensors (pixel_values)
    # to bfloat16 so they match the dtype the model was loaded with.
    inputs = {
        k: (
            v.to(device=device, dtype=torch.bfloat16)
            if isinstance(v, torch.Tensor) and v.is_floating_point()
            else v.to(device) if isinstance(v, torch.Tensor) else v
        )
        for k, v in inputs.items()
    }


    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=temperature > 0,  # greedy decoding when temperature == 0
        temperature=temperature if temperature > 0 else None,
        use_cache=True,
    )

    text = processor.decode(
        outputs[0],
        skip_special_tokens=True
    )

    return clean_output_text(text)
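
# Note: load_pdf_page (used by run_ocr below) is not defined anywhere in
# this post. A minimal sketch of such a helper, assuming pypdfium2 is
# installed; the name and signature come from the call site, the rendering
# details are my assumption.
import pypdfium2 as pdfium

def load_pdf_page(file_path, page_num=1, scale=2.0):
    pdf = pdfium.PdfDocument(file_path)
    total_pages = len(pdf)
    image = pdf[page_num - 1].render(scale=scale).to_pil()
    return image.convert("RGB"), total_pages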


def run_ocr(
    file_path,
    page_num=1,
    temperature=0.0,
):
    """
    file_path: image or pdf
    """
    if file_path.lower().endswith(".pdf"):
        image, total_pages = load_pdf_page(file_path, page_num)
        print(f"PDF page {page_num}/{total_pages}")
    else:
        image = Image.open(file_path).convert("RGB")
        print("Image loaded")

    text = extract_text_from_image(image, temperature)
    return text
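
# Usage sketch (the file name is a placeholder):
text = run_ocr("sample.pdf", page_num=1)
print(text)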
