Other API modules
Alignment, export-without-OCR, small PDF utilities, and environment checks. For the main OCR pipeline see Core API.
- class formhtr.export_logsheet.ExportEntry(varname: str, content: str, x: int, y: int, width: int, height: int, page: int = 0)
Bases: object
Single ROI row for export (name, optional fixed text, box, page).
- class formhtr.export_logsheet.ExportConfig(width: int | None, height: int | None, entries: list[ExportEntry])
Bases: object
Parsed export JSON (legacy data schema or current content schema).
- entries
List of regions to crop and export.
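The two classes can be pictured as plain records. The sketch below uses local stand-in dataclasses that mirror the documented constructor signatures; they are not imported from formhtr. The crop_box helper and the top-left pixel origin are assumptions, shown only to illustrate how x, y, width, and height might describe a region.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntrySketch:
    """Stand-in mirroring the documented ExportEntry fields (not the real class)."""
    varname: str
    content: str
    x: int
    y: int
    width: int
    height: int
    page: int = 0


@dataclass
class ConfigSketch:
    """Stand-in mirroring the documented ExportConfig fields (not the real class)."""
    width: Optional[int]
    height: Optional[int]
    entries: list


def crop_box(entry: EntrySketch) -> tuple:
    """Hypothetical helper: (left, top, right, bottom), assuming a top-left origin."""
    return (entry.x, entry.y, entry.x + entry.width, entry.y + entry.height)


# Illustrative values only; real configs come from the export JSON.
config = ConfigSketch(
    width=2550,
    height=3300,
    entries=[EntrySketch("sample_id", "", x=100, y=200, width=300, height=50)],
)
print(crop_box(config.entries[0]))  # (100, 200, 400, 250)
```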
- formhtr.export_logsheet.export_logsheet_to_xlsx(*, scanned_logsheet_pdf: str, template_pdf: str, config_json: str, output_xlsx: str, already_aligned: bool = False, alignment_config_path: str | None = None, backside: bool = False, backside_template_pdf: str | None = None, backside_config_json: str | None = None, backside_alignment_config_path: str | None = None) → None
Crop ROI regions from a scan into an XLSX (no OCR).
- Parameters:
scanned_logsheet_pdf – Path to the scanned PDF.
template_pdf – Front template PDF path.
config_json – Front export/ROI config JSON.
output_xlsx – Output .xlsx path.
already_aligned – Skip alignment when True.
alignment_config_path – Optional front alignment JSON.
backside – Include back-side entries when config/template are provided.
backside_template_pdf – Back template PDF path.
backside_config_json – Back config JSON path.
backside_alignment_config_path – Optional back alignment JSON.
- Returns:
None; writes output_xlsx.
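Because every argument is keyword-only, it can help to assemble the arguments in one place before calling the function. The following is a minimal sketch: build_export_kwargs and run_export are hypothetical helpers, the file names are placeholders, and enabling backside only when both back-side files are supplied is an assumption based on the parameter description above.

```python
from typing import Optional


def build_export_kwargs(
    scan_pdf: str,
    template_pdf: str,
    config_json: str,
    output_xlsx: str,
    backside_template_pdf: Optional[str] = None,
    backside_config_json: Optional[str] = None,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for export_logsheet_to_xlsx."""
    has_back = backside_template_pdf is not None and backside_config_json is not None
    return {
        "scanned_logsheet_pdf": scan_pdf,
        "template_pdf": template_pdf,
        "config_json": config_json,
        "output_xlsx": output_xlsx,
        # Assumption: request the back side only when both back-side files are given.
        "backside": has_back,
        "backside_template_pdf": backside_template_pdf if has_back else None,
        "backside_config_json": backside_config_json if has_back else None,
    }


def run_export(kwargs: dict) -> None:
    """Call the documented function; formhtr is imported lazily, only when invoked."""
    from formhtr.export_logsheet import export_logsheet_to_xlsx

    export_logsheet_to_xlsx(**kwargs)


front_only = build_export_kwargs("scan.pdf", "template.pdf", "rois.json", "out.xlsx")
print(front_only["backside"])  # False
```

Calling run_export(front_only) would then perform the export, provided formhtr is installed and the placeholder paths point to real files.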
- formhtr.auto_align.get_page_alignment_data(*, scanned_logsheet_pdf: str, template_pdf: str, page: int = 0, dpi: int = 300) → dict
Corners and image dimensions for automatic alignment of one scan page.
- Parameters:
scanned_logsheet_pdf – Path to the scanned PDF.
template_pdf – Path to the template PDF.
page – Scan page index (0 = front, 1 = back when applicable).
dpi – Rasterization resolution.
- Returns:
Dict with templatePoints, targetPoints, imageWidth, imageHeight (JSON-friendly point dicts and integers).
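Since the result is described as JSON-friendly, a caller might check the documented keys and serialize the dict before handing it to another tool. This is a sketch under assumptions: check_alignment_data is a hypothetical helper, the sample values are invented, and the {"x": ..., "y": ...} layout of each point is a guess, as the exact point format is not specified here.

```python
import json

REQUIRED_KEYS = ("templatePoints", "targetPoints", "imageWidth", "imageHeight")


def check_alignment_data(data: dict) -> str:
    """Verify the documented keys are present and return the data as a JSON string."""
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"alignment data is missing keys: {missing}")
    return json.dumps(data, sort_keys=True)


# Hypothetical result; the {"x": ..., "y": ...} point layout is an assumption.
sample = {
    "templatePoints": [{"x": 0, "y": 0}, {"x": 2550, "y": 0},
                       {"x": 2550, "y": 3300}, {"x": 0, "y": 3300}],
    "targetPoints": [{"x": 12, "y": 9}, {"x": 2540, "y": 15},
                     {"x": 2535, "y": 3290}, {"x": 8, "y": 3284}],
    "imageWidth": 2550,
    "imageHeight": 3300,
}
restored = json.loads(check_alignment_data(sample))
print(restored["imageWidth"], restored["imageHeight"])  # 2550 3300
```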
- formhtr.auto_align.build_alignment_payload(*, scanned_logsheet_pdf: str, template_pdf: str, backside_template_pdf: str | None = None, dpi: int = 300) → dict
Build a payload with front (and optional back) alignment corner data.
- Parameters:
scanned_logsheet_pdf – Path to the scanned PDF (at least two pages if back is used).
template_pdf – Front template PDF path.
backside_template_pdf – Optional back template PDF path.
dpi – Rasterization resolution.
- Returns:
Dict with keys frontside (dict) and backside (dict or None).
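Because backside may be None, code that consumes the payload should treat that key as optional. The sketch below shows one way to do so; sides_present is a hypothetical helper and the inner dict contents are placeholders rather than real output.

```python
def sides_present(payload: dict) -> list:
    """Return the names of the sides that carry alignment data."""
    return [side for side in ("frontside", "backside") if payload.get(side) is not None]


# Hypothetical payloads; the inner dicts stand in for real alignment data.
front_only = {"frontside": {"imageWidth": 2550, "imageHeight": 3300}, "backside": None}
both_sides = {
    "frontside": {"imageWidth": 2550, "imageHeight": 3300},
    "backside": {"imageWidth": 2550, "imageHeight": 3300},
}

print(sides_present(front_only))  # ['frontside']
print(sides_present(both_sides))  # ['frontside', 'backside']
```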
- formhtr.manual_align.align_page(target, template, *, backside: bool = False, template_points: list[tuple[int, int]] | None = None, target_points: list[tuple[int, int]] | None = None)
Warp target onto template using four-point homography.
- Parameters:
target – Scanned page image (numpy BGR).
template – Template image of the same logical size.
backside – Affects GUI window labels only.
template_points – Four template corners, or None to pick in a GUI.
target_points – Four scan corners, or None to pick in a GUI.
- Returns:
Warped target with template dimensions.
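To make the four-point homography concrete, the sketch below solves for the 3x3 matrix that maps four scan corners onto four template corners, using NumPy only. It illustrates the underlying math and is not a description of how align_page is implemented; the function names and corner coordinates are invented for the example.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve for the 3x3 matrix H mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with H[2, 2] fixed to 1.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, points):
    """Map (x, y) points through H, dividing by the projective coordinate."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Invented corners: a slightly skewed scan mapped onto a 100 x 120 template.
scan_corners = [(10, 20), (110, 25), (105, 140), (5, 130)]
template_corners = [(0, 0), (100, 0), (100, 120), (0, 120)]

H = homography_from_points(scan_corners, template_corners)
print(np.round(apply_homography(H, scan_corners), 6))
```

Warping a full image additionally requires resampling every pixel through the matrix, which image libraries typically provide as a perspective-warp routine.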
- formhtr.manual_align.manual_align_pdf(*, template_pdf: str, scanned_logsheet_pdf: str, output_pdf: str, backside_template_pdf: str | None = None, template_points: list[tuple[int, int]] | None = None, target_points: list[tuple[int, int]] | None = None) → None
Write a PDF whose pages are the aligned scan (front and optionally back).
- Parameters:
template_pdf – Front template PDF path.
scanned_logsheet_pdf – Scanned PDF (at least one page; two if backside is used).
output_pdf – Output PDF path to create or overwrite.
backside_template_pdf – Optional back template; if omitted, page 2 of the scan is copied.
template_points – Optional shared four template corners (else GUI, per align_page).
target_points – Optional shared four scan corners (else GUI, per align_page).
- Returns:
None; writes output_pdf on disk.
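When corners come from a file or another tool, checking them before the call avoids passing a malformed list. A minimal sketch follows: validate_corners and align_with_points are hypothetical helpers, and the expectation that supplying both point lists skips the GUI is inferred from the parameter descriptions above.

```python
from typing import Optional, Sequence


def validate_corners(points: Optional[Sequence[Sequence[int]]]) -> Optional[list]:
    """Return four (x, y) integer tuples, or None to leave corner picking to the GUI."""
    if points is None:
        return None
    corners = [(int(x), int(y)) for x, y in points]
    if len(corners) != 4:
        raise ValueError(f"expected 4 corners, got {len(corners)}")
    return corners


def align_with_points(template_pdf, scan_pdf, output_pdf, template_points, target_points):
    """Hypothetical wrapper around manual_align_pdf; formhtr is imported lazily."""
    from formhtr.manual_align import manual_align_pdf

    manual_align_pdf(
        template_pdf=template_pdf,
        scanned_logsheet_pdf=scan_pdf,
        output_pdf=output_pdf,
        template_points=validate_corners(template_points),
        target_points=validate_corners(target_points),
    )


print(validate_corners([[0, 0], [2550, 0], [2550, 3300], [0, 3300]]))
# [(0, 0), (2550, 0), (2550, 3300), (0, 3300)]
```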
- formhtr.pdf_utils.get_pdf_dimensions(*, pdf_file: str, dpi: int = 300) → dict[str, int]
Pixel size of the first PDF page after rasterization.
- Parameters:
pdf_file – Path to the PDF.
dpi – Rasterization resolution.
- Returns:
Dict with keys height and width in pixels.
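The relationship between page size and pixel size follows from the PDF default of 72 points per inch. The sketch below estimates the expected result for a page of known size; expected_pixels is a hypothetical helper, and the rounding used by the actual rasterizer may differ by a pixel.

```python
POINTS_PER_INCH = 72  # default PDF user-space units per inch


def expected_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> dict:
    """Estimate the rasterized pixel size of a page given its size in points."""
    return {
        "height": round(height_pt / POINTS_PER_INCH * dpi),
        "width": round(width_pt / POINTS_PER_INCH * dpi),
    }


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(expected_pixels(612, 792))  # {'height': 3300, 'width': 2550}
```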