options

Option-handling and configuration definitions for page_dewarp.

This package provides:

  • A global config instance (cfg) containing default parameters.
  • The Config class, which defines the structure and types of these parameters.

core

Core configuration structures for page-dewarp.

Defines:

  • A helper function (desc) that annotates msgspec.Struct fields with a description.
  • A global Config class specifying various parameters (camera, edge detection, etc.).

Config

Bases: Struct

Global configuration for page-dewarp.

Holds parameters controlling camera focal length, contour detection, output size, page margin, debug verbosity, etc.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| OPT_MAX_ITER | int | Maximum optimisation iterations. |
| OPT_METHOD | str | Name of the JAX/SciPy optimisation method to use. |
| FOCAL_LENGTH | float | Normalized focal length of camera. |
| TEXT_MIN_WIDTH | int | Minimum reduced pixel width of detected text contour. |
| TEXT_MIN_HEIGHT | int | Minimum reduced pixel height of detected text contour. |
| TEXT_MIN_ASPECT | float | Filter out text contours below this width/height ratio. |
| TEXT_MAX_THICKNESS | int | Maximum reduced pixel thickness of detected text contour. |
| DEBUG_LEVEL | int | Debug verbosity level (0 = none). |
| DEBUG_OUTPUT | str | Output mode for debug information ('file' by default). |
| EDGE_MAX_OVERLAP | float | Maximum horizontal overlap of contours in a span. |
| EDGE_MAX_LENGTH | float | Maximum length of edges connecting contours. |
| EDGE_ANGLE_COST | float | Cost of angles in edges (tradeoff vs length). |
| EDGE_MAX_ANGLE | float | Maximum allowed change in angle between contours. |
| SCREEN_MAX_W | int | Viewing screen maximum width (for resizing to screen). |
| SCREEN_MAX_H | int | Viewing screen maximum height (for resizing to screen). |
| PAGE_MARGIN_X | int | Pixels to ignore near left/right edge. |
| PAGE_MARGIN_Y | int | Pixels to ignore near top/bottom edge. |
| ADAPTIVE_WINSZ | int | Window size for adaptive thresholding. |
| OUTPUT_ZOOM | float | Zoom factor for output relative to original image. |
| OUTPUT_DPI | int | Stated DPI of output PNG (does not affect appearance). |
| REMAP_DECIMATE | int | Downscaling factor for remapping images. |
| NO_BINARY | int | Disable output conversion to binary thresholded image. |
| SHEAR_COST | float | Penalty against camera tilt (shear distortion). |
| RVEC_IDX | tuple[int, int] | Index slice of rotation vector in parameter vector. |
| TVEC_IDX | tuple[int, int] | Index slice of translation vector in parameter vector. |
| CUBIC_IDX | tuple[int, int] | Index slice of cubic slopes in parameter vector. |
| SPAN_MIN_WIDTH | int | Minimum width of a span in reduced pixels. |
| SPAN_PX_PER_STEP | int | Pixel spacing for sampling along spans. |

OPT_MAX_ITER class-attribute instance-attribute

OPT_MAX_ITER: desc(int, 'Maximum optimisation iterations') = 600000

Maximum optimisation iterations.

Tip

For a fast 'draft' preview, set this to a low value such as 1 (-it 1).

Note

This value is passed as maxiter to JAX or scipy.optimize.minimize. The default there is N*1000, where N is the number of parameter variables (in our case 600, which gives the 600000 used here).

OPT_METHOD class-attribute instance-attribute

OPT_METHOD: desc(str, 'Name of the JAX/SciPy optimisation method to use.') = 'auto'

Name of the JAX/SciPy optimisation method to use.

JAX supports L-BFGS-B only (its default). It is typically several times faster than Powell's method (SciPy's default), and more accurate than SciPy's L-BFGS-B.

Tip

Install the jax Python package to use JAX reverse-mode autodifferentiation to produce gradients for L-BFGS-B (recommended). It is much faster than Powell's method with SciPy, typically with far fewer function evaluations and a better result.

In SciPy, Powell's method is slower than gradient-based methods such as L-BFGS-B, but it avoids local minima better in high-dimensional parameter spaces. SciPy's gradients are lower quality, so gradient-based methods such as L-BFGS-B produce worse optimisations when they rely on them.

Note

This name is passed as method to scipy.optimize.minimize, and defaults to "Powell" if unset.

All options:

  • Nelder-Mead
  • Powell
  • CG
  • BFGS
  • Newton-CG
  • L-BFGS-B
  • TNC
  • COBYLA
  • COBYQA
  • SLSQP
  • trust-constr
  • dogleg
  • trust-ncg
  • trust-exact
  • trust-krylov
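
As a rough illustration of how an OPT_METHOD value might be checked before being handed to the optimiser, the sketch below validates a name against the SciPy method names and resolves 'auto'. The helper resolve_method and its jax_available parameter are hypothetical, and the rule that 'auto' picks L-BFGS-B when JAX is installed and Powell otherwise is an assumption drawn from the notes above, not confirmed behaviour of page_dewarp.

```python
from importlib.util import find_spec
from typing import Optional

# Method names accepted by scipy.optimize.minimize.
SCIPY_METHODS = {
    "Nelder-Mead", "Powell", "CG", "BFGS", "Newton-CG", "L-BFGS-B", "TNC",
    "COBYLA", "COBYQA", "SLSQP", "trust-constr", "dogleg", "trust-ncg",
    "trust-exact", "trust-krylov",
}


def resolve_method(name: str, jax_available: Optional[bool] = None) -> str:
    """Hypothetical helper: turn an OPT_METHOD value into a concrete method name."""
    if jax_available is None:
        # Detect JAX without importing it.
        jax_available = find_spec("jax") is not None
    if name == "auto":
        # Assumption: prefer JAX's L-BFGS-B when available, else SciPy's Powell.
        return "L-BFGS-B" if jax_available else "Powell"
    if name not in SCIPY_METHODS:
        raise ValueError(f"Unknown optimisation method: {name!r}")
    return name
```

Passing jax_available explicitly keeps the helper deterministic in tests; leaving it as None probes the current environment.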

FOCAL_LENGTH class-attribute instance-attribute

FOCAL_LENGTH: desc(float, 'Normalized focal length of camera') = 1.2

Normalized focal length of camera.

TEXT_MIN_WIDTH class-attribute instance-attribute

TEXT_MIN_WIDTH: desc(int, 'Min reduced px width of detected text contour') = 15

Min reduced px width of detected text contour.

Contours narrower than this are filtered out.

Tip

Decrease for small text, increase to filter out noise.

Question

#78 - Discussion of text detection robustness

TEXT_MIN_HEIGHT class-attribute instance-attribute

TEXT_MIN_HEIGHT: desc(int, 'Min reduced px height of detected text contour') = 2

Min reduced px height of detected text contour.

Contours shorter than this are filtered out.

Question

#78 - Discussion of text detection robustness

TEXT_MIN_ASPECT class-attribute instance-attribute

TEXT_MIN_ASPECT: desc(float, 'Filter out text contours below this w/h ratio') = 1.5

Filter out text contours below this w/h ratio.

Note

Text is typically wider than tall, so this filters vertical artifacts. Decrease for languages with tall characters or rotated text.

Question

#78 - Discussion of text detection robustness

TEXT_MAX_THICKNESS class-attribute instance-attribute

TEXT_MAX_THICKNESS: desc(int, 'Max reduced px thickness of detected text contour') = 10

Max reduced px thickness of detected text contour.

Contours thicker than this are filtered out (likely not text).

Tip

For bold letters or close-up photos where letters are large, the morphological smearing may not connect letters into word blobs effectively. Consider adjusting this alongside TEXT_MIN_WIDTH.

Question

#78 - Discussion of text detection limitations with close-up photos.
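
To make the interaction of the four TEXT_* thresholds concrete, here is a minimal sketch of a contour filter. The TextFilter class and keep_contour function are illustrative names, not part of page_dewarp, and the thickness is taken as a precomputed number rather than measured from a contour mask.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFilter:
    """Illustrative container holding the documented default thresholds."""
    min_width: int = 15       # TEXT_MIN_WIDTH
    min_height: int = 2       # TEXT_MIN_HEIGHT
    min_aspect: float = 1.5   # TEXT_MIN_ASPECT (width / height)
    max_thickness: int = 10   # TEXT_MAX_THICKNESS


def keep_contour(width: int, height: int, thickness: int,
                 f: TextFilter = TextFilter()) -> bool:
    """Return True if a contour passes all four filters (sizes in reduced px)."""
    if width < f.min_width or height < f.min_height:
        return False  # too narrow or too short
    if width < f.min_aspect * height:
        return False  # not wide enough relative to its height
    return thickness <= f.max_thickness  # too thick is likely not text
```

For example, a 40 x 8 blob with thickness 6 is kept, while a 40 x 40 blob is rejected by the aspect test.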

EDGE_MAX_OVERLAP class-attribute instance-attribute

EDGE_MAX_OVERLAP: desc(float, 'Max reduced px horiz. overlap of contours in span') = 1.0

Max reduced px horiz. overlap of contours in span.

EDGE_MAX_LENGTH class-attribute instance-attribute

EDGE_MAX_LENGTH: desc(float, 'Max reduced px length of edge connecting contours') = 100.0

Max reduced px length of edge connecting contours.

EDGE_ANGLE_COST class-attribute instance-attribute

EDGE_ANGLE_COST: desc(float, 'Cost of angles in edges (tradeoff vs. length)') = 10.0

Cost of angles in edges (tradeoff vs. length).

EDGE_MAX_ANGLE class-attribute instance-attribute

EDGE_MAX_ANGLE: desc(float, 'Maximum change in angle allowed between contours') = 7.5

Maximum change in angle allowed between contours.
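
The four EDGE_* parameters can be read together as a gate plus a score for a candidate edge between two contours. The following is a sketch under assumptions: the function edge_score is hypothetical, the angle is assumed to be in degrees, and the linear score (length plus weighted angle change) is one plausible reading of "tradeoff vs. length".

```python
from typing import Optional

EDGE_MAX_OVERLAP = 1.0   # documented defaults
EDGE_MAX_LENGTH = 100.0
EDGE_ANGLE_COST = 10.0
EDGE_MAX_ANGLE = 7.5


def edge_score(dist: float, x_overlap: float,
               delta_angle: float) -> Optional[float]:
    """Return a cost for a candidate edge, or None if it is rejected.

    dist and x_overlap are in reduced px; delta_angle is assumed to be degrees.
    """
    if (dist > EDGE_MAX_LENGTH
            or x_overlap > EDGE_MAX_OVERLAP
            or delta_angle > EDGE_MAX_ANGLE):
        return None  # violates one of the hard limits
    # Lower is better: short edges with little change in angle win.
    return dist + delta_angle * EDGE_ANGLE_COST
```

With these defaults, one degree of angle change costs as much as 10 reduced px of extra length.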

SCREEN_MAX_W class-attribute instance-attribute

SCREEN_MAX_W: desc(int, 'Viewing screen max width (for resizing to screen)') = 1280

Viewing screen max width (for resizing to screen).

SCREEN_MAX_H class-attribute instance-attribute

SCREEN_MAX_H: desc(int, 'Viewing screen max height (for resizing to screen)') = 700

Viewing screen max height (for resizing to screen).
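
One way to use the two screen limits is to derive a single integer downscale factor so that the image fits in both dimensions. This is only a sketch: screen_scale is a hypothetical helper, and using an integer factor is an assumption.

```python
import math


def screen_scale(width: int, height: int,
                 max_w: int = 1280, max_h: int = 700) -> int:
    """Smallest integer factor by which to shrink an image to fit the screen."""
    factor = math.ceil(max(width / max_w, height / max_h))
    return max(factor, 1)  # never upscale images that already fit
```

A 2448 x 3264 photo would be shrunk by a factor of 5 here, because the height limit is the binding one.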

PAGE_MARGIN_X class-attribute instance-attribute

PAGE_MARGIN_X: desc(int, 'Reduced px to ignore near L/R edge') = 50

Reduced px to ignore near L/R edge.

Tip

Set to 0 when text extends to the page edges and you don't want content cropped from the sides at all.

Question

#83: Dewarp failure example using -x 0 -y 0

PAGE_MARGIN_Y class-attribute instance-attribute

PAGE_MARGIN_Y: desc(int, 'Reduced px to ignore near T/B edge') = 20

Reduced px to ignore near T/B edge.

Tip

Set to 0 when text extends to the top/bottom of the frame and you don't want content cropped from either end.

Question

#83: Dewarp failure example using -x 0 -y 0.
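
The two margins can be pictured as shrinking the usable rectangle of the reduced image. The sketch below computes that rectangle; page_extents is an illustrative name and the exact masking used by page_dewarp may differ.

```python
from typing import Tuple


def page_extents(width: int, height: int,
                 margin_x: int = 50, margin_y: int = 20) -> Tuple[int, int, int, int]:
    """Return (xmin, ymin, xmax, ymax) of the region kept after margins."""
    xmin, ymin = margin_x, margin_y
    xmax, ymax = width - margin_x, height - margin_y
    if xmax <= xmin or ymax <= ymin:
        raise ValueError("Margins leave no usable page area")
    return xmin, ymin, xmax, ymax
```

With both margins set to 0 (as in -x 0 -y 0) the whole frame is kept.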

ADAPTIVE_WINSZ class-attribute instance-attribute

ADAPTIVE_WINSZ: desc(int, 'Window size for adaptive threshold in reduced px') = 55

Window size for adaptive threshold in reduced px.

Warning

Must be an odd number.

Tip

Increase this value when dealing with varying text sizes or when the default threshold produces poor results. For example, -wz 105 resolved issues with mixed text sizes in #48.
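
Because the window size must be odd, a small guard like the following can catch a bad value early. validate_winsz is a hypothetical helper, and the lower bound of 3 is an assumption.

```python
def validate_winsz(winsz: int) -> int:
    """Return winsz unchanged if it is a usable window size, else raise."""
    if winsz < 3 or winsz % 2 == 0:
        raise ValueError(f"ADAPTIVE_WINSZ must be an odd number >= 3, got {winsz}")
    return winsz
```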

OUTPUT_ZOOM class-attribute instance-attribute

OUTPUT_ZOOM: desc(float, 'How much to zoom output relative to *original* image') = 1.0

How much to zoom output relative to original image.

Note

This controls output resolution, so 2.0 roughly (not exactly) doubles the size. For example in #19:

  • 1 => 800 x 1248 px (default)
  • 2 => 1568 x 2480 px
  • 3 => 2352 x 3712 px
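
One possible reason the sizes are not exact multiples of each other: every dimension listed is a multiple of 16, the default REMAP_DECIMATE, which suggests the output size is rounded to that factor. The sketch below checks this observation and shows a rounding helper; round_up_to_multiple is hypothetical, and rounding up (rather than to nearest) is an assumption.

```python
def round_up_to_multiple(value: int, factor: int) -> int:
    """Round value up to the next multiple of factor (assumed behaviour)."""
    rem = value % factor
    return value if rem == 0 else value + factor - rem


# Sizes reported for zoom 1, 2 and 3.
reported = [(800, 1248), (1568, 2480), (2352, 3712)]
all_multiples_of_16 = all(w % 16 == 0 and h % 16 == 0 for w, h in reported)
```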

OUTPUT_DPI class-attribute instance-attribute

OUTPUT_DPI: desc(int, 'Just affects stated DPI of PNG, not appearance') = 300

Just affects stated DPI of PNG, not appearance.

REMAP_DECIMATE class-attribute instance-attribute

REMAP_DECIMATE: desc(int, 'Downscaling factor for remapping image') = 16

Downscaling factor for remapping image.

NO_BINARY class-attribute instance-attribute

NO_BINARY: desc(int, 'Disable output conversion to binary thresholded image') = 0

Disable output conversion to binary thresholded image.

SHEAR_COST class-attribute instance-attribute

SHEAR_COST: desc(float, 'Penalty against camera tilt (shear distortion).') = 0.0

Penalty against camera tilt (shear distortion).

Adds a penalty term to the optimization objective that discourages X-rotation, which manifests as sheared/slanted output.

Tip

Increase if output appears sheared (parallelogram instead of rectangle).

Note

The optimizer can mistake page curvature for camera tilt, producing sheared output even from flat scans. This penalty encourages modeling curvature via the cubic params instead of rotation.

Warning

Any non-zero value may overcorrect, producing non-parallel sides. If edges look worse after enabling it (the effect is typically mild), reduce the value.

Question

#83: Discussion of shear distortion in flat document scans.
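
To illustrate what a penalty against X-rotation could look like, the sketch below wraps an objective function and adds a weighted term. Everything here is an assumption for illustration: the wrapper with_shear_penalty is hypothetical, the X-rotation is taken to be the first element of the rotation vector, and the quadratic form is a guess rather than the term page_dewarp actually uses.

```python
from typing import Callable, Sequence


def with_shear_penalty(objective: Callable[[Sequence[float]], float],
                       shear_cost: float,
                       rvec_start: int = 0) -> Callable[[Sequence[float]], float]:
    """Wrap objective so that X-rotation is penalised by shear_cost."""
    def penalised(params: Sequence[float]) -> float:
        value = objective(params)
        if shear_cost > 0.0:
            x_rotation = params[rvec_start]  # assumed: first rvec component
            value += shear_cost * x_rotation ** 2
        return value
    return penalised
```

With shear_cost left at its default of 0.0 the wrapped objective is unchanged.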

RVEC_IDX class-attribute instance-attribute

RVEC_IDX: desc(tuple[int, int], 'Index of rvec in params vector (slice: pair of values)') = (0, 3)

Index of rvec in params vector (slice: pair of values).

TVEC_IDX class-attribute instance-attribute

TVEC_IDX: desc(tuple[int, int], 'Index of tvec in params vector (slice: pair of values)') = (3, 6)

Index of tvec in params vector (slice: pair of values).

CUBIC_IDX class-attribute instance-attribute

CUBIC_IDX: desc(tuple[int, int], 'Index of cubic slopes in params vector (slice: pair of values)') = (6, 8)

Index of cubic slopes in params vector (slice: pair of values).

Note

These parameters control the cubic spline dewarping model. The cubic slopes determine how the page curvature is estimated.

Todo

Document cubic param clamp control (#67)
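
The three index pairs are (start, stop) bounds, so they can be expanded into Python slices over the parameter vector. A minimal sketch, using a plain list in place of the real array and a hypothetical split_params helper:

```python
RVEC_IDX = (0, 3)   # documented defaults
TVEC_IDX = (3, 6)
CUBIC_IDX = (6, 8)


def split_params(params):
    """Split a flat parameter vector into its named parts."""
    rvec = params[slice(*RVEC_IDX)]    # rotation vector, 3 values
    tvec = params[slice(*TVEC_IDX)]    # translation vector, 3 values
    cubic = params[slice(*CUBIC_IDX)]  # cubic slopes, 2 values
    rest = params[CUBIC_IDX[1]:]       # whatever follows the cubic slopes
    return rvec, tvec, cubic, rest
```

The same slice(*IDX) pattern works unchanged on NumPy arrays.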

SPAN_MIN_WIDTH class-attribute instance-attribute

SPAN_MIN_WIDTH: desc(int, 'Minimum reduced px width for span') = 30

Minimum reduced px width for span.

SPAN_PX_PER_STEP class-attribute instance-attribute

SPAN_PX_PER_STEP: desc(int, 'Reduced px spacing for sampling along spans') = 20

Reduced px spacing for sampling along spans.
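
As an illustration of sampling at a fixed pixel spacing, the sketch below returns the offsets at which a span of a given length would be sampled. sample_offsets is a hypothetical helper, and centring the samples within the span is an assumption.

```python
from typing import List


def sample_offsets(length: int, step: int = 20) -> List[int]:
    """Offsets (in reduced px) sampled along a span of the given length."""
    # Start part-way in so the samples sit roughly centred in the span.
    start = ((length - 1) % step) // 2
    return list(range(start, length, step))
```

A span 50 px long would be sampled at three points with the default spacing of 20.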

desc

desc(typ, /, description: str)

Annotate a msgspec.Struct field with a description.

Returns an Annotated[typ, Meta(description=...)] for additional metadata.

Source code in src/page_dewarp/options/core.py
def desc(typ, /, description: str):
    """Annotate a `msgspec.Struct` field with a description.

    Returns an `Annotated[typ, Meta(description=...)]` for additional metadata.
    """
    return Annotated[typ, Meta(description=description)]
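
To show how a description attached this way can be read back, here is a standard-library sketch. The Meta class below is a stand-in for msgspec.Meta so the example runs without msgspec installed, and Demo is a hypothetical class used only for illustration.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Meta:
    """Stand-in for msgspec.Meta, carrying only a description."""
    description: str


def desc(typ, /, description: str):
    """Same shape as the helper above, using the stand-in Meta."""
    return Annotated[typ, Meta(description=description)]


class Demo:
    FOCAL_LENGTH: desc(float, "Normalized focal length of camera") = 1.2


# include_extras=True keeps the Annotated wrapper and its metadata.
hints = get_type_hints(Demo, include_extras=True)
meta = hints["FOCAL_LENGTH"].__metadata__[0]
description = meta.description
```

The default value still lives on the class as usual, so Demo.FOCAL_LENGTH is 1.2 while the description travels with the type annotation.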

k_opt

Defines the default intrinsic camera matrix (K) using a configurable focal length.

This module exports a single function, K, which returns a 3x3 NumPy array representing the camera's intrinsic matrix based on Config.FOCAL_LENGTH.

K

K(cfg: Config) -> np.ndarray

Return the default intrinsic parameter matrix, derived from cfg.FOCAL_LENGTH.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cfg | Config | The configuration object, which includes FOCAL_LENGTH. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | A 3x3 NumPy array representing the intrinsic camera matrix. |

Source code in src/page_dewarp/options/k_opt.py
def K(cfg: Config) -> np.ndarray:
    """Return the default intrinsic parameter matrix, derived from `cfg.FOCAL_LENGTH`.

    Args:
        cfg: The configuration object, which includes `FOCAL_LENGTH`.

    Returns:
        A 3x3 NumPy array representing the intrinsic camera matrix.

    """
    return np.array(
        [
            [cfg.FOCAL_LENGTH, 0, 0],
            [0, cfg.FOCAL_LENGTH, 0],
            [0, 0, 1],
        ],
        dtype=np.float32,
    )
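
To illustrate what this matrix does, the sketch below applies it to a 3D point in camera coordinates using plain Python lists instead of NumPy. The helpers intrinsic_matrix and project are hypothetical. With a zero principal point, as in K above, the projection reduces to scaling X/Z and Y/Z by the focal length.

```python
from typing import List, Sequence, Tuple


def intrinsic_matrix(focal_length: float = 1.2) -> List[List[float]]:
    """Plain-list version of the 3x3 matrix returned by K."""
    f = focal_length
    return [[f, 0.0, 0.0],
            [0.0, f, 0.0],
            [0.0, 0.0, 1.0]]


def project(point: Sequence[float], k: List[List[float]]) -> Tuple[float, float]:
    """Project a camera-space point (X, Y, Z) to normalized image coordinates."""
    u, v, w = (sum(row[i] * point[i] for i in range(3)) for row in k)
    return u / w, v / w  # perspective divide
```

For the default focal length, the point (1, 2, 2) lands at approximately (0.6, 1.2).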