libs support modules
Shared helpers that sit alongside the feature packages above: rasterizing PDFs, parsing layout JSON, geometry types, success metrics, and debug PDF overlays.
- formhtr.libs.pdf_to_image.convert_pdf_to_image(pdf_path, page=0, dpi=300)[source]
Convert a PDF to an image (assumes only one page).
- formhtr.libs.pdf_to_image.get_image_size(logsheet_image)[source]
Find the size of the image.
- Parameters:
logsheet_image (np.array) – image of interest
- Returns:
size in bytes
- Return type:
- class formhtr.libs.logsheet_config.LogsheetConfig(regions, residuals, height=None, width=None)[source]
Bases:
object
Class to store and represent the whole config.
- add_roi(start_x, start_y, end_x, end_y, varname=None, content_type=None)[source]
Append a new ROI rectangle to regions.
- Parameters:
start_x – Left edge (inclusive).
start_y – Top edge (inclusive).
end_x – Right edge.
end_y – Bottom edge.
varname – Optional variable label.
content_type – Optional type string (e.g. Handwritten).
- Returns:
None.
- class formhtr.libs.region.Region(start_x, start_y, end_x, end_y)[source]
Bases:
object
Class to represent a single ROI.
- class formhtr.libs.region.Residual(start_x, start_y, end_x, end_y, expected_content)[source]
Bases:
Region
- class formhtr.libs.region.ROI(start_x, start_y, end_x, end_y, varname=None, content_type=None)[source]
Bases:
Region
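The relationship between these types can be sketched with small stand-in classes. This is an illustrative sketch, not the formhtr source: the constructor signatures mirror the ones listed above, while the attribute names, the tuple layout returned by `get_coords()`, and the way `LogsheetConfig` stores its arguments are assumptions. The page dimensions and ROI coordinates are made-up example values.

```python
class Region:
    """Stand-in for formhtr.libs.region.Region: a rectangle on the page."""

    def __init__(self, start_x, start_y, end_x, end_y):
        self.start_x = start_x
        self.start_y = start_y
        self.end_x = end_x
        self.end_y = end_y

    def get_coords(self):
        # Assumed layout: (start_x, start_y, end_x, end_y).
        return (self.start_x, self.start_y, self.end_x, self.end_y)


class Residual(Region):
    """Stand-in for Residual: a region with known, expected content."""

    def __init__(self, start_x, start_y, end_x, end_y, expected_content):
        super().__init__(start_x, start_y, end_x, end_y)
        self.expected_content = expected_content


class ROI(Region):
    """Stand-in for ROI: a region of interest with optional metadata."""

    def __init__(self, start_x, start_y, end_x, end_y,
                 varname=None, content_type=None):
        super().__init__(start_x, start_y, end_x, end_y)
        self.varname = varname
        self.content_type = content_type


class LogsheetConfig:
    """Stand-in for LogsheetConfig: holds ROIs, residuals and page size."""

    def __init__(self, regions, residuals, height=None, width=None):
        self.regions = regions
        self.residuals = residuals
        self.height = height
        self.width = width

    def add_roi(self, start_x, start_y, end_x, end_y,
                varname=None, content_type=None):
        # Append a new ROI rectangle to regions; returns None.
        self.regions.append(
            ROI(start_x, start_y, end_x, end_y, varname, content_type)
        )


# Example values only: an A4 page rasterized at 300 dpi is 2480 x 3508 px.
config = LogsheetConfig(regions=[], residuals=[], height=3508, width=2480)
config.add_roi(100, 200, 400, 260,
               varname="sample_id", content_type="Handwritten")
print(config.regions[0].get_coords())
```

Keeping `Residual` and `ROI` as subclasses of `Region` means code that only needs coordinates can treat both uniformly.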
- formhtr.libs.statistics.compute_success_ratio(contents, artefacts)[source]
Compute the ratio between the number of identified regions and the extra content (artefacts).
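The exact formula is not stated here, so the following is only one plausible reading, written as a hypothetical helper (`success_ratio_sketch` is not part of formhtr). It assumes both arguments are sized collections and reports the share of all detected items that are identified regions; the real function may define the ratio differently.

```python
def success_ratio_sketch(contents, artefacts):
    """Share of detected items that are identified regions (assumed definition)."""
    total = len(contents) + len(artefacts)
    if total == 0:
        # Nothing detected at all: report 0.0 instead of dividing by zero.
        return 0.0
    return len(contents) / total


# Three identified regions and one artefact give 3 / 4.
print(success_ratio_sketch(["a", "b", "c"], ["x"]))  # 0.75
```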
- formhtr.libs.visualise_regions.load_font()[source]
Return a TrueType font for overlay labels.
- Returns:
A PIL ImageFont instance (Arial if available, else default bitmap font).
Note
May download Arial.ttf into the current working directory once.
- formhtr.libs.visualise_regions.create_debug_dir()[source]
Create debug/ in the current working directory if missing.
- Returns:
None.
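An idempotent "create if missing" step like this is commonly written with `os.makedirs(..., exist_ok=True)`. The sketch below is a hypothetical equivalent, not the formhtr implementation: the name `ensure_debug_dir`, the `base_dir` parameter, and the returned path are additions made so the example can run in a temporary directory, whereas the documented function takes no arguments and returns None.

```python
import os
import tempfile


def ensure_debug_dir(base_dir=None):
    """Create <base_dir>/debug if it does not exist and return its path."""
    base = base_dir if base_dir is not None else os.getcwd()
    path = os.path.join(base, "debug")
    # exist_ok=True makes repeated calls safe when the directory is present.
    os.makedirs(path, exist_ok=True)
    return path


# Run against a throwaway directory instead of the real working directory.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_debug_dir(tmp)
    second = ensure_debug_dir(tmp)  # second call is a no-op
    print(os.path.isdir(first), first == second)
```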
- formhtr.libs.visualise_regions.annotate_pdfs(identified_content, logsheet_image, front)[source]
Write one debug PDF per OCR provider under debug/.
- Parameters:
identified_content – Dict google/amazon/azure mapping to iterables of regions with get_coords() and content (may be None).
logsheet_image – Raster image (numpy) to draw on.
front – If False, suffix output filenames with _back.
- Returns:
None.
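The per-provider naming can be sketched as follows. The actual filename stems are not documented here, so using the provider key as the stem is an assumption, and `debug_pdf_paths` is a hypothetical helper rather than a formhtr function; only the `debug/` location and the `_back` suffix for `front=False` come from the description above.

```python
def debug_pdf_paths(providers, front=True):
    """Map each provider key to an assumed debug PDF path."""
    # Back-side pages get a "_back" suffix; front pages get none.
    suffix = "" if front else "_back"
    return {name: f"debug/{name}{suffix}.pdf" for name in providers}


paths = debug_pdf_paths(["google", "amazon", "azure"], front=False)
print(paths["azure"])  # debug/azure_back.pdf
```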
- formhtr.libs.visualise_regions.visualise_regions(regions, image, output_pdf)[source]
Draw bounding boxes and labels, save as PDF in debug/.
- Parameters:
regions – Iterable of objects with get_coords(), get_start(), content.
image – Source numpy image (RGB/BGR as supported by PIL).
output_pdf – Filename only; written as debug/{output_pdf}.
- Returns:
None.