Failing to import LightOnOCRForConditionalGeneration
This was working fine. But today I created a new venv and installed transformers with
pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main
Now I am getting this error:
ImportError: cannot import name 'LightOnOCRForConditionalGeneration' from 'transformers'
+1
It was renamed to LightOnOcrForConditionalGeneration.
Thanks! Got it working now
I am also facing this issue. Can you tell me the solution, please?
Hi @maazahmed10,
The class was renamed to follow transformers' naming conventions. Change LightOnOCRForConditionalGeneration to LightOnOcrForConditionalGeneration (note: OCR → Ocr).
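So, with a recent install from the fork, the import should look like this (assuming the class is still exported at the top level):

from transformers import AutoProcessor, LightOnOcrForConditionalGeneration  # Ocr, not OCR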
i am still facing the issue:
import torch
from PIL import Image
from transformers import AutoProcessor, LightOnOcrForConditionalGeneration
import requests
from io import BytesIO
# Load model
model_id = "lightonai/LightOnOCR-1B-1025"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="sdpa",
)
model.eval()
ImportError: cannot import name 'LightOnOcrForConditionalGeneration' from 'transformers' (/usr/local/lib/python3.12/dist-packages/transformers/__init__.py)
Hi @Bapt120,
Now we are facing another issue: after processor.apply_chat_template, there is no 'pixel_values' key in the output.
---> 14 inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)
15
16 outputs = model.generate(
/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py in __getitem__(self, item)
261 """
262 if isinstance(item, str):
--> 263 return self.data[item]
264 elif self._encodings is not None:
265 return self._encodings[item]
KeyError: 'pixel_values'
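A quick diagnostic that may help narrow this down (assuming the usual chat-template call from the model card): print the keys the processor returns; if 'pixel_values' is missing, the image content never reached the image processor.

inputs = processor.apply_chat_template(
    chat,  # the same conversation passed to the model
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
print(inputs.keys())  # expected to include 'input_ids' and 'pixel_values'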
This is an awesome model. I have fine-tuned it for Doc VQA, where it also performs really well. But because of some breaking changes, it is getting difficult to convert it to GGUF format or to run it with vLLM. I have seen many pending PRs, and hopefully this will all be fixed soon.
You guys have done an amazing job building this model.
I am also facing the same issue: after processor.apply_chat_template, there are no 'pixel_values'.
Hi,
It seems the latest changes are breaking; a simple temporary workaround is to install from a previous commit, for example:
pip install git+https://github.com/baptiste-aubertin/transformers.git@83d01ff90112ea3a0d8a6679aa9383e80f31c1db
This will be fixed soon but in the meantime you can use the above!
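To double-check which build actually ended up in the venv (a generic sanity check, nothing fork-specific):

import transformers
print(transformers.__version__)  # git installs usually report a .dev version string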
Well, I guess the pixel_values KeyError was probably a tokenizer/chat-template issue (not sure 🤷). After setting tokenize=True, switching to the conventional chat template (after checking the HF Space for this model), and some other changes, the model worked for me with this version of transformers: !pip install -q -U git+https://github.com/baptiste-aubertin/transformers.git@main
import torch
from PIL import Image

# processor, model, and device are assumed to be set up as in the earlier snippet
dtype = torch.bfloat16  # match the dtype the model was loaded with


def clean_output_text(text):
    # Drop bare role-marker lines, then keep only what follows the assistant turn
    markers = ["system", "user", "assistant"]
    lines = text.splitlines()
    filtered = [
        line for line in lines
        if line.strip().lower() not in markers
    ]
    cleaned = "\n".join(filtered).strip()
    if "assistant" in text.lower():
        cleaned = text.split("assistant", 1)[-1].strip()
    return cleaned


def extract_text_from_image(image, temperature=0.0):
    """
    image: PIL.Image
    """
    chat = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image},
            ],
        }
    ]

    inputs = processor.apply_chat_template(
        chat,
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    )

    # Move tensors to the device; cast floating-point tensors (e.g. pixel_values) to bfloat16
    inputs = {
        k: (
            v.to(device=device, dtype=dtype)
            if isinstance(v, torch.Tensor) and v.is_floating_point()
            else v.to(device)
            if isinstance(v, torch.Tensor)
            else v
        )
        for k, v in inputs.items()
    }

    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=temperature > 0,
        temperature=temperature if temperature > 0 else None,
        use_cache=True,
    )

    text = processor.decode(
        outputs[0],
        skip_special_tokens=True,
    )
    return clean_output_text(text)


def run_ocr(file_path, page_num=1, temperature=0.0):
    """
    file_path: path to an image or a PDF
    """
    if file_path.lower().endswith(".pdf"):
        # load_pdf_page is a helper not shown here; a sketch follows below
        image, total_pages = load_pdf_page(file_path, page_num)
        print(f"PDF page {page_num}/{total_pages}")
    else:
        image = Image.open(file_path).convert("RGB")
        print("Image loaded")

    text = extract_text_from_image(image, temperature)
    return text
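For completeness: the snippet assumes a load_pdf_page helper that isn't shown. Here is a minimal stand-in using pdf2image and pypdf (my own sketch, not necessarily what the original notebook used), plus a usage example:

from pdf2image import convert_from_path
from pypdf import PdfReader

def load_pdf_page(file_path, page_num, dpi=200):
    # Render only the requested page; convert_from_path returns a list of PIL images
    pages = convert_from_path(file_path, dpi=dpi, first_page=page_num, last_page=page_num)
    total_pages = len(PdfReader(file_path).pages)
    return pages[0].convert("RGB"), total_pages

print(run_ocr("sample.pdf", page_num=1))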