Failing to import LightOnOCRForConditionalGeneration
This was working fine. But today I created a new venv and installed transformers with
pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main
Now I am getting this error:
ImportError: cannot import name 'LightOnOCRForConditionalGeneration' from 'transformers'
+1
It was renamed to LightOnOcrForConditionalGeneration.
Thanks! Got it working now
I am also facing this issue. Can you tell me the solution, please?
Hi @maazahmed10,
The class was renamed to follow transformers' naming conventions. Change LightOnOCRForConditionalGeneration to LightOnOcrForConditionalGeneration (note: OCR → Ocr).
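So, with a recent install from the fork, the import should look like this (assuming the class is still exported at the top level):

from transformers import AutoProcessor, LightOnOcrForConditionalGeneration  # Ocr, not OCR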
i am still facing the issue:
import torch
from PIL import Image
from transformers import AutoProcessor, LightOnOcrForConditionalGeneration
import requests
from io import BytesIO
# Load model
model_id = "lightonai/LightOnOCR-1B-1025"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="sdpa",
)
model.eval()
ImportError: cannot import name 'LightOnOcrForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
Hi @Bapt120,
Now we are facing another issue: after processor.apply_chat_template, there is no 'pixel_values' key in the output.
---> 14 inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
15
16 outputs = model.generate(
/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py in __getitem__(self, item)
261 """
262 if isinstance(item, str):
--> 263 return self.data[item]
264 elif self._encodings is not None:
265 return self._encodings[item]
KeyError: 'pixel_values'
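A quick diagnostic that may help narrow this down (assuming the usual chat-template call from the model card): print the keys the processor returns; if 'pixel_values' is missing, the image content never reached the image processor.

inputs = processor.apply_chat_template(
    chat,  # the same conversation passed to the model
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
print(inputs.keys())  # expected to include 'input_ids' and 'pixel_values'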
This is an awesome model. I have fine-tuned it for Doc VQA, where it also performs really well. But because of some breaking changes, it is getting difficult to convert it to GGUF format or to run it with vLLM. I have seen many pending PRs, and hopefully this will all be fixed soon.
You guys have done an amazing job building this model.
I am also facing the same issue: after processor.apply_chat_template, there are no 'pixel_values'.
Hi,
It seems the latest changes are breaking; a simple temporary workaround is to install from a previous commit, for example:
pip install git+https://github.com/baptiste-aubertin/transformers.git@83d01ff90112ea3a0d8a6679aa9383e80f31c1db
This will be fixed soon but in the meantime you can use the above!
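To double-check which build actually ended up in the venv (a generic sanity check, nothing fork-specific):

import transformers
print(transformers.__version__)  # git installs usually report a .dev version string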
Well, I guess the pixel_values KeyError was probably a tokenizer/chat-template issue (not sure 🤷). After setting tokenize=True, switching to the conventional chat template (after checking the HF Space for this model), and some other changes, the model worked for me with this version of transformers: !pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main
import torch
from PIL import Image

# processor, model, and device are assumed to be set up as in the earlier snippet
dtype = torch.bfloat16  # match the dtype the model was loaded with


def clean_output_text(text):
    # Drop bare role-marker lines, then keep only what follows the assistant turn
    markers = ["system", "user", "assistant"]
    lines = text.splitlines()
    filtered = [
        line for line in lines
        if line.strip().lower() not in markers
    ]
    cleaned = "\n".join(filtered).strip()
    if "assistant" in text.lower():
        cleaned = text.split("assistant", 1)[-1].strip()
    return cleaned


def extract_text_from_image(image, temperature=0.0):
    """
    image: PIL.Image
    """
    chat = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image},
            ],
        }
    ]

    inputs = processor.apply_chat_template(
        chat,
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )

    # Move tensors to the device; cast floating-point tensors (e.g. pixel_values) to bfloat16
    inputs = {
        k: (
            v.to(device=device, dtype=dtype)
            if isinstance(v, torch.Tensor) and v.is_floating_point()
            else v.to(device)
            if isinstance(v, torch.Tensor)
            else v
        )
        for k, v in inputs.items()
    }

    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=temperature > 0,
        temperature=temperature if temperature > 0 else None,
        use_cache=True,
    )

    text = processor.decode(
        outputs[0],
        skip_special_tokens=True,
    )
    return clean_output_text(text)


def run_ocr(file_path, page_num=1, temperature=0.0):
    """
    file_path: path to an image or a PDF
    """
    if file_path.lower().endswith(".pdf"):
        # load_pdf_page is a helper not shown here; a sketch follows below
        image, total_pages = load_pdf_page(file_path, page_num)
        print(f"PDF page {page_num}/{total_pages}")
    else:
        image = Image.open(file_path).convert("RGB")
        print("Image loaded")

    text = extract_text_from_image(image, temperature)
    return text
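For completeness: the snippet assumes a load_pdf_page helper that isn't shown. Here is a minimal stand-in using pdf2image and pypdf (my own sketch, not necessarily what the original notebook used), plus a usage example:

from pdf2image import convert_from_path
from pypdf import PdfReader

def load_pdf_page(file_path, page_num, dpi=200):
    # Render only the requested page; convert_from_path returns a list of PIL images
    pages = convert_from_path(file_path, dpi=dpi, first_page=page_num, last_page=page_num)
    total_pages = len(PdfReader(file_path).pages)
    return pages[0].convert("RGB"), total_pages

print(run_ocr("sample.pdf", page_num=1))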