Package `taulu`

Taulu - segment tables from images

Taulu is a Python package designed to segment images of tables into their constituent rows and columns (and cells).

To use this package, you first need to make an annotation of the headers in your table images. The idea is that these headers will be similar across your full set of images, and they will be used as a starting point for the search algorithm that finds the table grid.

Here is an example python script of how to use Taulu:

from taulu import Taulu
import os


def setup():
    # create an Annotation file of the headers in the image
    # (one for the left header, one for the right)
    # and store them in the examples directory
    print("Annotating the LEFT header...")
    Taulu.annotate("../data/table_00.png", "table_00_header_left.png")

    print("Annotating the RIGHT header...")
    Taulu.annotate("../data/table_00.png", "table_00_header_right.png")


def main():
    taulu = Taulu(("table_00_header_left.png", "table_00_header_right.png"))
    table = taulu.segment_table("../data/table_00.png",  cell_height_factor=0.8, debug_view=True)

    table.show_cells("../data/table_00.png")


if __name__ == "__main__":
    if os.path.exists("table_00_header_left.png") and os.path.exists(
        "table_00_header_right.png"
    ):
        main()
    else:
        setup()
        main()

If you want a high-level overview of how to use Taulu, see the Taulu class

Sub-modules

taulu.grid: Implements the grid finding algorithm, that is able to find the intersections of horizontal and vertical rules.
taulu.header_aligner: Header alignment functionality
taulu.header_template: A HeaderTemplate defines the structure of a table header.
taulu.split: A module that provides a Split class to handle data with left and right variants …
taulu.table_indexer: Defines an abstract class TableIndexer, which provides methods for mapping pixel coordinates in an image to table cell indices and for cropping images …
taulu.taulu: The Taulu class is a convenience class that hides the inner workings of taulu as much as possible.

Classes

class GridDetector (kernel_size: int = 21, cross_width: int = 6, cross_height: int | None = None, morph_size: int | None = None, sauvola_k: float = 0.04, sauvola_window: int = 15, scale: float = 1.0, search_region: int = 40, distance_penalty: float = 0.4, min_rows: int = 5, grow_threshold: float = 0.3, look_distance: int = 4)

Expand source code

class GridDetector:
    """
    Implements a filters result in high activation where the image has an intersection of a vertical
    and horizontal rule, useful for finding the bounding boxes of cells.

    Also implements the search algorithm that uses the output of this filter to build a tabular structure of
    corner points (in row major order).
    """

    def __init__(
        self,
        kernel_size: int = 21,
        cross_width: int = 6,
        cross_height: Optional[int] = None,
        morph_size: Optional[int] = None,
        sauvola_k: float = 0.04,
        sauvola_window: int = 15,
        scale: float = 1.0,
        search_region: int = 40,
        distance_penalty: float = 0.4,
        min_rows: int = 5,
        grow_threshold: float = 0.3,
        look_distance: int = 4,
    ):
        """
        Args:
            kernel_size (int): the size of the cross kernel
                a larger kernel size often means that more penalty is applied, often leading
                to more sparse results
            cross_width (int): the width of one of the edges in the cross filter, should be
                roughly equal to the width of the rules in the image after morphology is applied
            cross_height (int | None): useful if the horizontal rules and vertical rules
                have different sizes
            morph_size (int | None): the size of the morphology operators that are applied before
                the cross kernel. 'bridges the gaps' of broken-up lines
            sauvola_k (float): threshold parameter for sauvola thresholding
            sauvola_window (int): window_size parameter for sauvola thresholding
            scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly)
            search_region (int): area in which to search for a new max value in `find_nearest` etc.
            distance_penalty (float): how much the point finding algorithm penalizes points that are further in the region [0, 1]
            min_rows (int): minimum number of rows to find before stopping the table finding algorithm
            grow_threshold (float): the threshold for accepting a new point when growing the table
            look_distance (int): how many points away to look when calculating the median slope
        """
        self._validate_parameters(
            kernel_size,
            cross_width,
            cross_height,
            morph_size,
            search_region,
            sauvola_k,
            sauvola_window,
            distance_penalty,
        )

        self._kernel_size = kernel_size
        self._cross_width = cross_width
        self._cross_height = cross_width if cross_height is None else cross_height
        self._morph_size = morph_size if morph_size is not None else cross_width
        self._search_region = search_region
        self._sauvola_k = sauvola_k
        self._sauvola_window = sauvola_window
        self._distance_penalty = distance_penalty
        self._scale = scale
        self._min_rows = min_rows
        self._grow_threshold = grow_threshold
        self._look_distance = look_distance

        self._cross_kernel = self._create_cross_kernel()

    def _validate_parameters(
        self,
        kernel_size: int,
        cross_width: int,
        cross_height: Optional[int],
        morph_size: Optional[int],
        search_region: int,
        sauvola_k: float,
        sauvola_window: int,
        distance_penalty: float,
    ) -> None:
        """Validate initialization parameters."""
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd")
        if (
            kernel_size <= 0
            or cross_width <= 0
            or search_region <= 0
            or sauvola_window <= 0
        ):
            raise ValueError("Size parameters must be positive")
        if cross_height is not None and cross_height <= 0:
            raise ValueError("cross_height must be positive")
        if morph_size is not None and morph_size <= 0:
            raise ValueError("morph_size must be positive")
        if not 0 <= distance_penalty <= 1:
            raise ValueError("distance_penalty must be in [0, 1]")
        if sauvola_k <= 0:
            raise ValueError("sauvola_k must be positive")

    def _create_gaussian_weights(self, region_size: int) -> NDArray:
        """
        Create a 2D Gaussian weight mask.

        Args:
            shape (tuple[int, int]): Shape of the region (height, width)
            p (float): Minimum value at the edge = 1 - p

        Returns:
            NDArray: Gaussian weight mask
        """
        if self._distance_penalty == 0:
            return np.ones((region_size, region_size), dtype=np.float32)

        y = np.linspace(-1, 1, region_size)
        x = np.linspace(-1, 1, region_size)
        xv, yv = np.meshgrid(x, y)
        dist_squared = xv**2 + yv**2

        # Prevent log(0) when distance_penalty is 1
        if self._distance_penalty >= 0.999:
            sigma = 0.1  # Small sigma for very sharp peak
        else:
            sigma = np.sqrt(-1 / (2 * np.log(1 - self._distance_penalty)))

        weights = np.exp(-dist_squared / (2 * sigma**2))

        return weights.astype(np.float32)

    def _create_cross_kernel(self) -> NDArray:
        kernel = np.zeros((self._kernel_size, self._kernel_size), dtype=np.uint8)
        center = self._kernel_size // 2

        # Create horizontal bar
        h_start = max(0, center - self._cross_height // 2)
        h_end = min(self._kernel_size, center + (self._cross_height + 1) // 2)
        kernel[h_start:h_end, :] = 255

        # Create vertical bar
        v_start = max(0, center - self._cross_width // 2)
        v_end = min(self._kernel_size, center + (self._cross_width + 1) // 2)
        kernel[:, v_start:v_end] = 255

        return kernel

    def _apply_morphology(self, binary: MatLike) -> MatLike:
        # Define a horizontal kernel (adjust width as needed)
        kernel_hor = cv.getStructuringElement(cv.MORPH_RECT, (self._morph_size, 1))
        kernel_ver = cv.getStructuringElement(cv.MORPH_RECT, (1, self._morph_size))

        # Apply dilation
        dilated = cv.dilate(binary, kernel_hor, iterations=1)
        dilated = cv.dilate(dilated, kernel_ver, iterations=1)

        return dilated

    def _apply_cross_matching(self, img: MatLike) -> MatLike:
        """Apply cross kernel template matching."""
        pad_y = self._cross_kernel.shape[0] // 2
        pad_x = self._cross_kernel.shape[1] // 2

        padded = cv.copyMakeBorder(
            img, pad_y, pad_y, pad_x, pad_x, borderType=cv.BORDER_CONSTANT, value=0
        )

        filtered = cv.matchTemplate(padded, self._cross_kernel, cv.TM_SQDIFF_NORMED)
        # Invert and normalize to 0-255 range
        filtered = cv.normalize(1.0 - filtered, None, 0, 255, cv.NORM_MINMAX)
        return filtered.astype(np.uint8)

    def apply(self, img: MatLike, visual: bool = False) -> MatLike:
        """
        Apply the grid detection filter to the input image.

        Args:
            img (MatLike): the input image
            visual (bool): whether to show intermediate steps

        Returns:
            MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules
        """

        if img is None or img.size == 0:
            raise ValueError("Input image is empty or None")

        binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window)

        if visual:
            imu.show(binary, title="thresholded")

        binary = self._apply_morphology(binary)

        if visual:
            imu.show(binary, title="dilated")

        filtered = self._apply_cross_matching(binary)

        return filtered

    @log_calls(level=logging.DEBUG, include_return=True)
    def find_nearest(
        self, filtered: MatLike, point: Point, region: Optional[int] = None
    ) -> Tuple[Point, float]:
        """
        Find the nearest 'corner match' in the image, along with its score [0,1]

        Args:
            filtered (MatLike): the filtered image (obtained through `apply`)
            point (tuple[int, int]): the approximate target point (x, y)
            region (None | int): alternative value for search region,
                overwriting the `__init__` parameter `region`
        """

        if filtered is None or filtered.size == 0:
            raise ValueError("Filtered image is empty or None")

        region_size = region if region is not None else self._search_region
        x, y = point

        # Calculate crop boundaries
        crop_x = max(0, x - region_size // 2)
        crop_y = max(0, y - region_size // 2)
        crop_width = min(region_size, filtered.shape[1] - crop_x)
        crop_height = min(region_size, filtered.shape[0] - crop_y)

        # Handle edge cases
        if crop_width <= 0 or crop_height <= 0:
            logger.warning(f"Point {point} is outside image bounds")
            return point, 0.0

        cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width]

        if cropped.size == 0:
            return point, 0.0

        # Always apply Gaussian weighting by extending crop if needed
        if cropped.shape[0] == region_size and cropped.shape[1] == region_size:
            # Perfect size - apply weights directly
            weights = self._create_gaussian_weights(region_size)
            weighted = cropped.astype(np.float32) * weights
        else:
            # Extend crop to match region_size, apply weights, then restore
            extended = np.zeros((region_size, region_size), dtype=cropped.dtype)

            # Calculate offset to center the cropped region in extended array
            offset_y = (region_size - cropped.shape[0]) // 2
            offset_x = (region_size - cropped.shape[1]) // 2

            # Place cropped region in center of extended array
            extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ] = cropped

            # Apply Gaussian weights to extended array
            weights = self._create_gaussian_weights(region_size)
            weighted_extended = extended.astype(np.float32) * weights

            # Extract the original region back out
            weighted = weighted_extended[
                offset_y : offset_y + cropped.shape[0],
                offset_x : offset_x + cropped.shape[1],
            ]

        best_idx = np.argmax(weighted)
        best_y, best_x = np.unravel_index(best_idx, cropped.shape)

        result_point = (
            int(crop_x + best_x),
            int(crop_y + best_y),
        )
        result_confidence = float(weighted[best_y, best_x]) / 255.0

        return result_point, result_confidence

    def find_table_points(
        self,
        img: MatLike | PathLike[str],
        left_top: Point,
        cell_widths: list[int],
        cell_heights: list[int] | int,
        visual: bool = False,
        window: str = WINDOW,
        goals_width: Optional[int] = None,
    ) -> "TableGrid":
        """
        Parse the image to a `TableGrid` structure that holds all of the
        intersections between horizontal and vertical rules, starting near the `left_top` point

        Args:
            img (MatLike): the input image of a table
            left_top (tuple[int, int]): the starting point of the algorithm
            cell_widths (list[int]): the expected widths of the cells (based on a header template)
            cell_heights (list[int]): the expected height of the rows of data.
                The last value from this list is used until the image has no more vertical space.
            visual (bool): whether to show intermediate steps
            window (str): the name of the OpenCV window to use for visualization
            goals_width (int | None): the width of the goal region when searching for the next point.
                If None, defaults to 1.5 * search_region

        Returns:
            a TableGrid object
        """

        if goals_width is None:
            goals_width = self._search_region * 3 // 2

        if not cell_widths:
            raise ValueError("cell_widths must contain at least one value")

        if not isinstance(img, np.ndarray):
            img = cv.imread(os.fspath(img))

        filtered = self.apply(img, visual)

        if visual:
            imu.show(filtered, window=window)

        if isinstance(cell_heights, int):
            cell_heights = [cell_heights]

        left_top, confidence = self.find_nearest(
            filtered, left_top, int(self._search_region * 3)
        )

        if confidence < 0.1:
            logger.warning(
                f"Low confidence for the starting point: {confidence} at {left_top}"
            )

        # resize all parameters according to scale
        img = cv.resize(img, None, fx=self._scale, fy=self._scale)

        if visual:
            imu.push(img)

        filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale)
        cell_widths = [int(w * self._scale) for w in cell_widths]
        cell_heights = [int(h * self._scale) for h in cell_heights]
        left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale))
        self._search_region = int(self._search_region * self._scale)

        img_gray = ensure_gray(img)
        filtered_gray = ensure_gray(filtered)

        table_grower = TableGrower(
            img_gray,
            filtered_gray,
            cell_widths,  # pyright: ignore
            cell_heights,  # pyright: ignore
            left_top,
            self._search_region,
            self._distance_penalty,
            self._look_distance,
            self._grow_threshold,
            self._min_rows,
        )

        def show_grower_progress(wait: bool = False):
            img_orig = np.copy(img)
            corners = table_grower.get_all_corners()
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        img_orig = imu.draw_points(
                            img_orig,
                            [corners[y][x]],
                            color=(0, 0, 255),
                            thickness=30,
                        )

            edge = table_grower.get_edge_points()

            for point, score in edge:
                color = (100, int(clamp(score * 255, 0, 255)), 100)
                imu.draw_point(img_orig, point, color=color, thickness=20)

            imu.show(img_orig, wait=wait)

        if visual:
            threshold = self._grow_threshold
            look_distance = self._look_distance

            # python implementation of rust loops, for visualization purposes
            # note this is a LOT slower
            while table_grower.grow_point(img_gray, filtered_gray) is not None:
                show_grower_progress()

            show_grower_progress(True)

            original_threshold = threshold

            loops_without_change = 0

            while not table_grower.is_table_complete():
                loops_without_change += 1

                if loops_without_change > 50:
                    break

                if table_grower.extrapolate_one(img_gray, filtered_gray) is not None:
                    show_grower_progress()

                    loops_without_change = 0

                    grown = False
                    while table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        grown = True
                        threshold = min(0.1 + 0.9 * threshold, original_threshold)
                        table_grower.set_threshold(threshold)

                    if not grown:
                        threshold *= 0.9
                        table_grower.set_threshold(threshold)

                else:
                    threshold *= 0.9
                    table_grower.set_threshold(threshold)

                    if table_grower.grow_point(img_gray, filtered_gray) is not None:
                        show_grower_progress()
                        loops_without_change = 0

        else:
            table_grower.grow_table(img_gray, filtered_gray)

        table_grower.smooth_grid()
        corners = table_grower.get_all_corners()
        logger.info(
            f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns"
        )
        # rescale corners back to original size
        if self._scale != 1.0:
            for y in range(len(corners)):
                for x in range(len(corners[y])):
                    if corners[y][x] is not None:
                        corners[y][x] = (
                            int(corners[y][x][0] / self._scale),  # pyright:ignore
                            int(corners[y][x][1] / self._scale),  # pyright:ignore
                        )

        return TableGrid(corners)  # pyright: ignore

    @log_calls(level=logging.DEBUG, include_return=True)
    def _build_table_row(
        self,
        gray: MatLike,
        filtered: MatLike,
        start_point: Point,
        cell_widths: List[int],
        row_idx: int,
        goals_width: int,
        previous_row_points: Optional[List[Point]] = None,
        visual: bool = False,
    ) -> List[Point]:
        """Build a single row of table points."""
        row = [start_point]
        current = start_point

        for col_idx, width in enumerate(cell_widths):
            next_point = self._find_next_column_point(
                gray,
                filtered,
                current,
                width,
                goals_width,
                visual,
                previous_row_points,
                col_idx,
            )
            if next_point is None:
                logger.warning(
                    f"Could not find point for row {row_idx}, col {col_idx + 1}"
                )
                return []  # Return empty list to signal failure
            row.append(next_point)
            current = next_point

        return row

    def _clamp_point_to_img(self, point: Point, img: MatLike) -> Point:
        """Clamp a point to be within the image bounds."""
        x = max(0, min(point[0], img.shape[1] - 1))
        y = max(0, min(point[1], img.shape[0] - 1))
        return (x, y)

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_next_column_point(
        self,
        gray: MatLike,
        filtered: MatLike,
        current: Point,
        width: int,
        goals_width: int,
        visual: bool = False,
        previous_row_points: Optional[List[Point]] = None,
        current_col_idx: int = 0,
    ) -> Optional[Point]:
        """Find the next point in the current row."""

        if previous_row_points is not None and current_col_idx + 1 < len(
            previous_row_points
        ):
            # grow an astar path downwards from the previous row point that is
            # above and to the right of current
            # and ensure all points are within image bounds
            bottom_right = [
                self._clamp_point_to_img(
                    (
                        current[0] + width - goals_width // 2 + x,
                        current[1] + goals_width,
                    ),
                    gray,
                )
                for x in range(goals_width)
            ]
            goals = self._astar(
                gray, previous_row_points[current_col_idx + 1], bottom_right, "down"
            )

            if goals is None:
                logger.warning(
                    f"A* failed to find path going downwards from previous row's point at idx {current_col_idx + 1}"
                )
                return None
        else:
            goals = [
                self._clamp_point_to_img(
                    (current[0] + width, current[1] - goals_width // 2 + y), gray
                )
                for y in range(goals_width)
            ]

        path = self._astar(gray, current, goals, "right")

        if path is None:
            logger.warning(
                f"A* failed to find path going rightward from {current} to goals"
            )
            return None

        next_point, _ = self.find_nearest(filtered, path[-1], self._search_region)

        # show the point and the search region on the image for debugging
        if visual:
            self._visualize_path_finding(
                goals + path,
                current,
                next_point,
                current,
                path[-1],
                self._search_region,
            )

        return next_point

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_next_row_start(
        self,
        gray: MatLike,
        filtered: MatLike,
        top_point: Point,
        row_idx: int,
        cell_heights: List[int],
        goals_width: int,
        visual: bool = False,
    ) -> Optional[Point]:
        """Find the starting point of the next row."""
        if row_idx < len(cell_heights):
            row_height = cell_heights[row_idx]
        else:
            row_height = cell_heights[-1]

        if top_point[1] + row_height >= filtered.shape[0] - 10:  # Near bottom
            return None

        goals = [
            (top_point[0] - goals_width // 2 + x, top_point[1] + row_height)
            for x in range(goals_width)
        ]

        path = self._astar(gray, top_point, goals, "down")
        if path is None:
            return None

        next_point, _ = self.find_nearest(
            filtered, path[-1], region=self._search_region * 3 // 2
        )

        if visual:
            self._visualize_path_finding(
                path, top_point, next_point, top_point, path[-1], self._search_region
            )

        return next_point

    def _visualize_grid(self, img: MatLike, points: List[List[Point]]) -> None:
        """Visualize the detected grid points."""
        all_points = [point for row in points for point in row]
        drawn = imu.draw_points(img, all_points)
        imu.show(drawn, wait=True)

    def _visualize_path_finding(
        self,
        path: List[Point],
        current: Point,
        next_point: Point,
        previous_row_target: Optional[Point] = None,
        region_center: Optional[Point] = None,
        region_size: Optional[int] = None,
    ) -> None:
        """Visualize the path finding process for debugging."""
        global show_time

        screen = imu.pop()

        # if gray, convert to BGR
        if len(screen.shape) == 2 or screen.shape[2] == 1:
            debug_img = cv.cvtColor(screen, cv.COLOR_GRAY2BGR)
        else:
            debug_img = cast(MatLike, screen)

        debug_img = imu.draw_points(debug_img, path, color=(200, 200, 0), thickness=2)
        debug_img = imu.draw_points(
            debug_img, [current], color=(0, 255, 0), thickness=3
        )
        debug_img = imu.draw_points(
            debug_img, [next_point], color=(0, 0, 255), thickness=2
        )

        # Draw previous row target if available
        if previous_row_target is not None:
            debug_img = imu.draw_points(
                debug_img, [previous_row_target], color=(255, 0, 255), thickness=2
            )

        # Draw search region if available
        if region_center is not None and region_size is not None:
            top_left = (
                max(0, region_center[0] - region_size // 2),
                max(0, region_center[1] - region_size // 2),
            )
            bottom_right = (
                min(debug_img.shape[1], region_center[0] + region_size // 2),
                min(debug_img.shape[0], region_center[1] + region_size // 2),
            )
            cv.rectangle(
                debug_img,
                top_left,
                bottom_right,
                color=(255, 0, 0),
                thickness=2,
                lineType=cv.LINE_AA,
            )

        imu.push(debug_img)

        show_time += 1
        if show_time % 10 != 1:
            return

        imu.show(debug_img, title="Next column point", wait=False)
        # time.sleep(0.003)

    @log_calls(level=logging.DEBUG, include_return=True)
    def _astar(
        self,
        img: np.ndarray,
        start: tuple[int, int],
        goals: list[tuple[int, int]],
        direction: str,
    ) -> Optional[List[Point]]:
        """
        Find the best path between the start point and one of the goal points on the image
        """

        if not goals:
            return None

        if self._scale != 1.0:
            img = cv.resize(img, None, fx=self._scale, fy=self._scale)
            start = (int(start[0] * self._scale), int(start[1] * self._scale))
            goals = [(int(g[0] * self._scale), int(g[1] * self._scale)) for g in goals]

        # calculate bounding box with margin
        all_points = goals + [start]
        xs = [p[0] for p in all_points]
        ys = [p[1] for p in all_points]

        margin = 30
        top_left = (max(0, min(xs) - margin), max(0, min(ys) - margin))
        bottom_right = (
            min(img.shape[1], max(xs) + margin),
            min(img.shape[0], max(ys) + margin),
        )

        # check bounds
        if (
            top_left[0] >= bottom_right[0]
            or top_left[1] >= bottom_right[1]
            or top_left[0] >= img.shape[1]
            or top_left[1] >= img.shape[0]
        ):
            return None

        # transform coordinates to cropped image
        start_local = (start[0] - top_left[0], start[1] - top_left[1])
        goals_local = [(g[0] - top_left[0], g[1] - top_left[1]) for g in goals]

        cropped = img[top_left[1] : bottom_right[1], top_left[0] : bottom_right[0]]

        if cropped.size == 0:
            return None

        path = rust_astar(cropped, start_local, goals_local, direction)

        if path is None:
            return None

        if self._scale != 1.0:
            path = [(int(p[0] / self._scale), int(p[1] / self._scale)) for p in path]
            top_left = (int(top_left[0] / self._scale), int(top_left[1] / self._scale))

        return [(p[0] + top_left[0], p[1] + top_left[1]) for p in path]

Implements a filters result in high activation where the image has an intersection of a vertical and horizontal rule, useful for finding the bounding boxes of cells.

Also implements the search algorithm that uses the output of this filter to build a tabular structure of corner points (in row major order).

Args

kernel_size : int: the size of the cross kernel a larger kernel size often means that more penalty is applied, often leading to more sparse results
cross_width : int: the width of one of the edges in the cross filter, should be roughly equal to the width of the rules in the image after morphology is applied
cross_height : int | None: useful if the horizontal rules and vertical rules have different sizes
morph_size : int | None: the size of the morphology operators that are applied before the cross kernel. 'bridges the gaps' of broken-up lines
sauvola_k : float: threshold parameter for sauvola thresholding
sauvola_window : int: window_size parameter for sauvola thresholding
scale : float: image scale factor to do calculations on (useful for increasing calculation speed mostly)
search_region : int: area in which to search for a new max value in find_nearest etc.
distance_penalty : float: how much the point finding algorithm penalizes points that are further in the region [0, 1]
min_rows : int: minimum number of rows to find before stopping the table finding algorithm
grow_threshold : float: the threshold for accepting a new point when growing the table
look_distance : int: how many points away to look when calculating the median slope

Methods

def apply(self, img: cv2.Mat | numpy.ndarray, visual: bool = False) ‑> cv2.Mat | numpy.ndarray

Expand source code

def apply(self, img: MatLike, visual: bool = False) -> MatLike:
    """
    Apply the grid detection filter to the input image.

    Args:
        img (MatLike): the input image
        visual (bool): whether to show intermediate steps

    Returns:
        MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules
    """

    if img is None or img.size == 0:
        raise ValueError("Input image is empty or None")

    binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window)

    if visual:
        imu.show(binary, title="thresholded")

    binary = self._apply_morphology(binary)

    if visual:
        imu.show(binary, title="dilated")

    filtered = self._apply_cross_matching(binary)

    return filtered

Apply the grid detection filter to the input image.

Args

img : MatLike: the input image
visual : bool: whether to show intermediate steps

Returns

MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules

def find_nearest(self, filtered: cv2.Mat | numpy.ndarray, point: Tuple[int, int], region: int | None = None) ‑> Tuple[Tuple[int, int], float]

Expand source code

@log_calls(level=logging.DEBUG, include_return=True)
def find_nearest(
    self, filtered: MatLike, point: Point, region: Optional[int] = None
) -> Tuple[Point, float]:
    """
    Find the nearest 'corner match' in the image, along with its score [0,1]

    Args:
        filtered (MatLike): the filtered image (obtained through `apply`)
        point (tuple[int, int]): the approximate target point (x, y)
        region (None | int): alternative value for search region,
            overwriting the `__init__` parameter `region`
    """

    if filtered is None or filtered.size == 0:
        raise ValueError("Filtered image is empty or None")

    region_size = region if region is not None else self._search_region
    x, y = point

    # Calculate crop boundaries
    crop_x = max(0, x - region_size // 2)
    crop_y = max(0, y - region_size // 2)
    crop_width = min(region_size, filtered.shape[1] - crop_x)
    crop_height = min(region_size, filtered.shape[0] - crop_y)

    # Handle edge cases
    if crop_width <= 0 or crop_height <= 0:
        logger.warning(f"Point {point} is outside image bounds")
        return point, 0.0

    cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width]

    if cropped.size == 0:
        return point, 0.0

    # Always apply Gaussian weighting by extending crop if needed
    if cropped.shape[0] == region_size and cropped.shape[1] == region_size:
        # Perfect size - apply weights directly
        weights = self._create_gaussian_weights(region_size)
        weighted = cropped.astype(np.float32) * weights
    else:
        # Extend crop to match region_size, apply weights, then restore
        extended = np.zeros((region_size, region_size), dtype=cropped.dtype)

        # Calculate offset to center the cropped region in extended array
        offset_y = (region_size - cropped.shape[0]) // 2
        offset_x = (region_size - cropped.shape[1]) // 2

        # Place cropped region in center of extended array
        extended[
            offset_y : offset_y + cropped.shape[0],
            offset_x : offset_x + cropped.shape[1],
        ] = cropped

        # Apply Gaussian weights to extended array
        weights = self._create_gaussian_weights(region_size)
        weighted_extended = extended.astype(np.float32) * weights

        # Extract the original region back out
        weighted = weighted_extended[
            offset_y : offset_y + cropped.shape[0],
            offset_x : offset_x + cropped.shape[1],
        ]

    best_idx = np.argmax(weighted)
    best_y, best_x = np.unravel_index(best_idx, cropped.shape)

    result_point = (
        int(crop_x + best_x),
        int(crop_y + best_y),
    )
    result_confidence = float(weighted[best_y, best_x]) / 255.0

    return result_point, result_confidence

Find the nearest 'corner match' in the image, along with its score [0,1]

Args

filtered : MatLike: the filtered image (obtained through apply)
point : tuple[int, int]: the approximate target point (x, y)
region : None | int: alternative value for search region, overwriting the __init__ parameter region

def find_table_points(self, img: cv2.Mat | numpy.ndarray | os.PathLike[str], left_top: Tuple[int, int], cell_widths: list[int], cell_heights: list[int] | int, visual: bool = False, window: str = 'taulu', goals_width: int | None = None) ‑> TableGrid

Expand source code

def find_table_points(
    self,
    img: MatLike | PathLike[str],
    left_top: Point,
    cell_widths: list[int],
    cell_heights: list[int] | int,
    visual: bool = False,
    window: str = WINDOW,
    goals_width: Optional[int] = None,
) -> "TableGrid":
    """
    Parse the image to a `TableGrid` structure that holds all of the
    intersections between horizontal and vertical rules, starting near the `left_top` point

    Args:
        img (MatLike): the input image of a table
        left_top (tuple[int, int]): the starting point of the algorithm
        cell_widths (list[int]): the expected widths of the cells (based on a header template)
        cell_heights (list[int]): the expected height of the rows of data.
            The last value from this list is used until the image has no more vertical space.
        visual (bool): whether to show intermediate steps
        window (str): the name of the OpenCV window to use for visualization
        goals_width (int | None): the width of the goal region when searching for the next point.
            If None, defaults to 1.5 * search_region

    Returns:
        a TableGrid object
    """

    if goals_width is None:
        goals_width = self._search_region * 3 // 2

    if not cell_widths:
        raise ValueError("cell_widths must contain at least one value")

    if not isinstance(img, np.ndarray):
        img = cv.imread(os.fspath(img))

    filtered = self.apply(img, visual)

    if visual:
        imu.show(filtered, window=window)

    if isinstance(cell_heights, int):
        cell_heights = [cell_heights]

    left_top, confidence = self.find_nearest(
        filtered, left_top, int(self._search_region * 3)
    )

    if confidence < 0.1:
        logger.warning(
            f"Low confidence for the starting point: {confidence} at {left_top}"
        )

    # resize all parameters according to scale
    img = cv.resize(img, None, fx=self._scale, fy=self._scale)

    if visual:
        imu.push(img)

    filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale)
    cell_widths = [int(w * self._scale) for w in cell_widths]
    cell_heights = [int(h * self._scale) for h in cell_heights]
    left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale))
    self._search_region = int(self._search_region * self._scale)

    img_gray = ensure_gray(img)
    filtered_gray = ensure_gray(filtered)

    table_grower = TableGrower(
        img_gray,
        filtered_gray,
        cell_widths,  # pyright: ignore
        cell_heights,  # pyright: ignore
        left_top,
        self._search_region,
        self._distance_penalty,
        self._look_distance,
        self._grow_threshold,
        self._min_rows,
    )

    def show_grower_progress(wait: bool = False):
        img_orig = np.copy(img)
        corners = table_grower.get_all_corners()
        for y in range(len(corners)):
            for x in range(len(corners[y])):
                if corners[y][x] is not None:
                    img_orig = imu.draw_points(
                        img_orig,
                        [corners[y][x]],
                        color=(0, 0, 255),
                        thickness=30,
                    )

        edge = table_grower.get_edge_points()

        for point, score in edge:
            color = (100, int(clamp(score * 255, 0, 255)), 100)
            imu.draw_point(img_orig, point, color=color, thickness=20)

        imu.show(img_orig, wait=wait)

    if visual:
        threshold = self._grow_threshold
        look_distance = self._look_distance

        # python implementation of rust loops, for visualization purposes
        # note this is a LOT slower
        while table_grower.grow_point(img_gray, filtered_gray) is not None:
            show_grower_progress()

        show_grower_progress(True)

        original_threshold = threshold

        loops_without_change = 0

        while not table_grower.is_table_complete():
            loops_without_change += 1

            if loops_without_change > 50:
                break

            if table_grower.extrapolate_one(img_gray, filtered_gray) is not None:
                show_grower_progress()

                loops_without_change = 0

                grown = False
                while table_grower.grow_point(img_gray, filtered_gray) is not None:
                    show_grower_progress()
                    grown = True
                    threshold = min(0.1 + 0.9 * threshold, original_threshold)
                    table_grower.set_threshold(threshold)

                if not grown:
                    threshold *= 0.9
                    table_grower.set_threshold(threshold)

            else:
                threshold *= 0.9
                table_grower.set_threshold(threshold)

                if table_grower.grow_point(img_gray, filtered_gray) is not None:
                    show_grower_progress()
                    loops_without_change = 0

    else:
        table_grower.grow_table(img_gray, filtered_gray)

    table_grower.smooth_grid()
    corners = table_grower.get_all_corners()
    logger.info(
        f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns"
    )
    # rescale corners back to original size
    if self._scale != 1.0:
        for y in range(len(corners)):
            for x in range(len(corners[y])):
                if corners[y][x] is not None:
                    corners[y][x] = (
                        int(corners[y][x][0] / self._scale),  # pyright:ignore
                        int(corners[y][x][1] / self._scale),  # pyright:ignore
                    )

    return TableGrid(corners)  # pyright: ignore

Parse the image to a TableGrid structure that holds all of the intersections between horizontal and vertical rules, starting near the left_top point

Args

img : MatLike: the input image of a table
left_top : tuple[int, int]: the starting point of the algorithm
cell_widths : list[int]: the expected widths of the cells (based on a header template)
cell_heights : list[int]: the expected height of the rows of data. The last value from this list is used until the image has no more vertical space.
visual : bool: whether to show intermediate steps
window : str: the name of the OpenCV window to use for visualization
goals_width : int | None: the width of the goal region when searching for the next point. If None, defaults to 1.5 * search_region

Returns

a TableGrid object

Expand source code

class HeaderAligner:
    """
    Calculates a transformation matrix to transform points from header-template-image-space to
    subject-image-space.
    """

    def __init__(
        self,
        template: None | MatLike | PathLike[str] | str = None,
        max_features: int = 25_000,
        patch_size: int = 31,
        match_fraction: float = 0.6,
        scale: float = 1.0,
        max_dist: float = 1.00,
        k: float | None = 0.05,
    ):
        """
        Args:
            template (MatLike | str): (path of) template image, with the table template clearly visible
            max_features (int): maximal number of features that will be extracted by ORB
            patch_size (int): for ORB feature extractor
            match_fraction (float): best fraction of matches that are kept
            scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly)
            max_dist (float): maximum distance (relative to image size) of matched features.
                Increase this value if the warping between image and template needs to be more agressive
            k (float | None): sauvola thresholding threshold value. If None, no sauvola thresholding is done
        """

        if type(template) is str or type(template) is PathLike:
            value = cv.imread(fspath(template))
            template = value

        self._k = k
        if scale > 1.0:
            raise TauluException(
                "Scaling up the image for header alignment is useless. Use 0 < scale <= 1.0"
            )
        if scale == 0:
            raise TauluException("Use 0 < scale <= 1.0")

        self._scale = scale
        self._template = self._scale_img(cast(MatLike, template))
        self._template_orig: None | MatLike = None
        self._preprocess_template()
        self._max_features = max_features
        self._patch_size = patch_size
        self._match_fraction = match_fraction
        self._max_dist = max_dist

    def _scale_img(self, img: MatLike) -> MatLike:
        if self._scale == 1.0:
            return img

        return cv.resize(img, None, fx=self._scale, fy=self._scale)

    def _unscale_img(self, img: MatLike) -> MatLike:
        if self._scale == 1.0:
            return img

        return cv.resize(img, None, fx=1 / self._scale, fy=1 / self._scale)

    def _unscale_homography(self, h: np.ndarray) -> np.ndarray:
        if self._scale == 1.0:
            return h

        scale_matrix = np.diag([self._scale, self._scale, 1.0])
        # inv_scale_matrix = np.linalg.inv(scale_matrix)
        inv_scale_matrix = np.diag([1.0 / self._scale, 1.0 / self._scale, 1.0])
        # return inv_scale_matrix @ h @ scale_matrix
        return inv_scale_matrix @ h @ scale_matrix

    @property
    def template(self):
        """The template image that subject images are aligned to"""
        return self._template

    @template.setter
    def template(self, value: MatLike | str):
        """Set the template image as a path or an image"""

        if type(value) is str:
            value = cv.imread(value)
            self._template = value

        # TODO: check if the image has the right properties (dimensions etc.)
        self._template = cast(MatLike, value)

        self._preprocess_template()

    def _preprocess_template(self):
        self._template_orig = cv.cvtColor(self._template, cv.COLOR_BGR2GRAY)
        if self._k is not None:
            self._template = imu.sauvola(self._template, self._k)
            self._template = cv.bitwise_not(self._template)
        else:
            _, _, self._template = cv.split(self._template)

    def _preprocess_image(self, img: MatLike):
        if self._template_orig is None:
            raise TauluException("process the template first")

        if self._k is not None:
            img = imu.sauvola(img, self._k)
            img = cv.bitwise_not(img)
        else:
            _, _, img = cv.split(img)

        return img

    @log_calls(level=logging.DEBUG, include_return=True)
    def _find_transform_of_template_on(
        self, im: MatLike, visual: bool = False, window: str = WINDOW
    ):
        im = self._scale_img(im)
        # Detect ORB features and compute descriptors.
        orb = cv.ORB_create(
            self._max_features,  # type:ignore
            patchSize=self._patch_size,
        )
        keypoints_im, descriptors_im = orb.detectAndCompute(im, None)
        keypoints_tg, descriptors_tg = orb.detectAndCompute(self._template, None)

        # Match features
        matcher = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(descriptors_im, descriptors_tg)

        # Sort matches by score
        matches = sorted(matches, key=lambda x: x.distance)

        # Remove not so good matches
        numGoodMatches = int(len(matches) * self._match_fraction)
        matches = matches[:numGoodMatches]

        if visual:
            final_img_filtered = cv.drawMatches(
                im,
                keypoints_im,
                self._template,
                keypoints_tg,
                matches[:10],
                None,  # type:ignore
                cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
            )
            imu.show(final_img_filtered, title="matches", window=window)

        # Extract location of good matches
        points1 = np.zeros((len(matches), 2), dtype=np.float32)
        points2 = np.zeros((len(matches), 2), dtype=np.float32)

        for i, match in enumerate(matches):
            points1[i, :] = keypoints_tg[match.trainIdx].pt
            points2[i, :] = keypoints_im[match.queryIdx].pt

        # Prune reference points based upon distance between
        # key points. This assumes a fairly good alignment to start with
        # due to the protocol used (location of the sheets)
        p1 = pd.DataFrame(data=points1)
        p2 = pd.DataFrame(data=points2)
        refdist = abs(p1 - p2)

        mask_x = refdist.loc[:, 0] < (im.shape[0] * self._max_dist)
        mask_y = refdist.loc[:, 1] < (im.shape[1] * self._max_dist)
        mask = mask_x & mask_y
        points1 = points1[mask.to_numpy()]
        points2 = points2[mask.to_numpy()]

        # Find homography
        h, _ = cv.findHomography(points1, points2, cv.RANSAC)

        return self._unscale_homography(h)

    def view_alignment(self, img: MatLike, h: NDArray):
        """
        Show the alignment of the template on the given image
        by transforming it using the supplied transformation matrix `h`
        and visualising both on different channels

        Args:
            img (MatLike): the image on which the template is transformed
            h (NDArray): the transformation matrix
        """

        im = imu.ensure_gray(img)
        header = imu.ensure_gray(self._unscale_img(self._template))
        height, width = im.shape

        header_warped = cv.warpPerspective(header, h, (width, height))

        merged = np.full((height, width, 3), 255, dtype=np.uint8)

        merged[..., 1] = im
        merged[..., 2] = header_warped

        return imu.show(merged)

    @log_calls(level=logging.DEBUG, include_return=True)
    def align(
        self, img: MatLike | str, visual: bool = False, window: str = WINDOW
    ) -> NDArray:
        """
        Calculates a homogeneous transformation matrix that maps pixels of
        the template to the given image
        """

        logger.info("Aligning header with supplied table image")

        if type(img) is str:
            img = cv.imread(img)
        img = cast(MatLike, img)

        img = self._preprocess_image(img)

        h = self._find_transform_of_template_on(img, visual, window)

        if visual:
            self.view_alignment(img, h)

        return h

    def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]:
        """
        Transform the given point (in template-space) using the transformation h
        (obtained through the `align` method)

        Args:
            h (NDArray): transformation matrix of shape (3, 3)
            point (Iterable[int]): the to-be-transformed point, should conform to (x, y)
        """

        point = np.array([[point[0], point[1], 1]])  # type:ignore
        transformed = np.dot(h, point.T)  # type:ignore

        transformed /= transformed[2]

        return int(transformed[0][0]), int(transformed[1][0])

Calculates a transformation matrix to transform points from header-template-image-space to subject-image-space.

Args

template : MatLike | str: (path of) template image, with the table template clearly visible
max_features : int: maximal number of features that will be extracted by ORB
patch_size : int: for ORB feature extractor
match_fraction : float: best fraction of matches that are kept
scale : float: image scale factor to do calculations on (useful for increasing calculation speed mostly)
max_dist : float: maximum distance (relative to image size) of matched features. Increase this value if the warping between image and template needs to be more agressive
k : float | None: sauvola thresholding threshold value. If None, no sauvola thresholding is done

Instance variables

prop template

Expand source code

@property
def template(self):
    """The template image that subject images are aligned to"""
    return self._template

The template image that subject images are aligned to

Methods

def align(self, img: cv2.Mat | numpy.ndarray | str, visual: bool = False, window: str = 'taulu') ‑> numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]]

Expand source code

@log_calls(level=logging.DEBUG, include_return=True)
def align(
    self, img: MatLike | str, visual: bool = False, window: str = WINDOW
) -> NDArray:
    """
    Calculates a homogeneous transformation matrix that maps pixels of
    the template to the given image
    """

    logger.info("Aligning header with supplied table image")

    if type(img) is str:
        img = cv.imread(img)
    img = cast(MatLike, img)

    img = self._preprocess_image(img)

    h = self._find_transform_of_template_on(img, visual, window)

    if visual:
        self.view_alignment(img, h)

    return h

Calculates a homogeneous transformation matrix that maps pixels of the template to the given image

def template_to_img(self, h: numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]], point: Iterable[int]) ‑> tuple[int, int]

Expand source code

def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]:
    """
    Transform the given point (in template-space) using the transformation h
    (obtained through the `align` method)

    Args:
        h (NDArray): transformation matrix of shape (3, 3)
        point (Iterable[int]): the to-be-transformed point, should conform to (x, y)
    """

    point = np.array([[point[0], point[1], 1]])  # type:ignore
    transformed = np.dot(h, point.T)  # type:ignore

    transformed /= transformed[2]

    return int(transformed[0][0]), int(transformed[1][0])

Transform the given point (in template-space) using the transformation h (obtained through the align method)

Args

h : NDArray: transformation matrix of shape (3, 3)
point : Iterable[int]: the to-be-transformed point, should conform to (x, y)

def view_alignment(self, img: cv2.Mat | numpy.ndarray, h: numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]])

Expand source code

def view_alignment(self, img: MatLike, h: NDArray):
    """
    Show the alignment of the template on the given image
    by transforming it using the supplied transformation matrix `h`
    and visualising both on different channels

    Args:
        img (MatLike): the image on which the template is transformed
        h (NDArray): the transformation matrix
    """

    im = imu.ensure_gray(img)
    header = imu.ensure_gray(self._unscale_img(self._template))
    height, width = im.shape

    header_warped = cv.warpPerspective(header, h, (width, height))

    merged = np.full((height, width, 3), 255, dtype=np.uint8)

    merged[..., 1] = im
    merged[..., 2] = header_warped

    return imu.show(merged)

Show the alignment of the template on the given image by transforming it using the supplied transformation matrix h and visualising both on different channels

Args

img : MatLike: the image on which the template is transformed
h : NDArray: the transformation matrix

class HeaderTemplate (rules: Iterable[Iterable[int]])

Expand source code

class HeaderTemplate(TableIndexer):
    def __init__(self, rules: Iterable[Iterable[int]]):
        """
        A TableTemplate is a collection of rules of a table. This class implements methods
        for finding cell positions in a table image, given the template the image adheres to.

        Args:
            rules: 2D array of lines, where each line is represented as [x0, y0, x1, y1]
        """

        super().__init__()
        self._rules = [_Rule(*rule) for rule in rules]
        self._h_rules = sorted(
            [rule for rule in self._rules if rule._is_horizontal()], key=lambda r: r._y
        )
        self._v_rules = sorted(
            [rule for rule in self._rules if rule._is_vertical()], key=lambda r: r._x
        )

    @log_calls(level=logging.DEBUG)
    def save(self, path: PathLike[str]):
        """
        Save the HeaderTemplate to the given path, as a json
        """

        data = {"rules": [r.to_dict() for r in self._rules]}

        with open(path, "w") as f:
            json.dump(data, f)

    @staticmethod
    @log_calls(level=logging.DEBUG)
    def from_saved(path: PathLike[str]) -> "HeaderTemplate":
        with open(path, "r") as f:
            data = json.load(f)
            rules = data["rules"]
            rules = [[r["x0"], r["y0"], r["x1"], r["y1"]] for r in rules]

            return HeaderTemplate(rules)

    @property
    def cols(self) -> int:
        return len(self._v_rules) - 1

    @property
    def rows(self) -> int:
        return len(self._h_rules) - 1

    @staticmethod
    @log_calls(level=logging.DEBUG)
    def annotate_image(
        template: MatLike | str, crop: Optional[PathLike[str]] = None, margin: int = 10
    ) -> "HeaderTemplate":
        """
        Utility method that allows users to create a template form a template image.

        The user is asked to click to annotate lines (two clicks per line).

        Args:
            template: the image on which to annotate the header lines
            crop (str | None): if str, crop the template image first, then do the annotation.
                The cropped image will be stored at the supplied path
            margin (int): margin to add around the cropping of the header
        """

        if type(template) is str:
            value = cv.imread(template)
            template = value
        template = cast(MatLike, template)

        if crop is not None:
            cropped = HeaderTemplate._crop(template, margin)
            cv.imwrite(os.fspath(crop), cropped)
            template = cropped

        start_point = None
        lines: list[list[int]] = []

        anno_template = np.copy(template)

        def get_point(event, x, y, flags, params):
            nonlocal lines, start_point, anno_template
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                if start_point is not None:
                    line: list[int] = [start_point[1], start_point[0], x, y]

                    cv.line(  # type:ignore
                        anno_template,  # type:ignore
                        (start_point[1], start_point[0]),
                        (x, y),
                        (0, 255, 0),
                        2,
                        cv.LINE_AA,
                    )
                    cv.imshow(constants.WINDOW, anno_template)  # type:ignore

                    lines.append(line)
                    start_point = None
                else:
                    start_point = (y, x)
            elif event == cv.EVENT_RBUTTONDOWN:
                start_point = None

                # remove the last annotation
                lines = lines[:-1]

                anno_template = np.copy(anno_template)

                for line in lines:
                    cv.line(
                        template,
                        (line[0], line[1]),
                        (line[2], line[3]),
                        (0, 255, 0),
                        2,
                        cv.LINE_AA,
                    )

                cv.imshow(constants.WINDOW, template)

        print(ANNO_HELP)

        imu.show(anno_template, get_point, title="annotate the header")

        return HeaderTemplate(lines)

    @staticmethod
    @log_calls(level=logging.DEBUG, include_return=True)
    def _crop(template: MatLike, margin: int = 10) -> MatLike:
        """
        Crop the image to contain only the annotations, such that it can be used as the header image in the taulu workflow.
        """

        points = []
        anno_template = np.copy(template)

        def get_point(event, x, y, flags, params):
            nonlocal points, anno_template
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                point = (x, y)

                cv.circle(  # type:ignore
                    anno_template,  # type:ignore
                    (x, y),
                    4,
                    (0, 255, 0),
                    2,
                )
                cv.imshow(constants.WINDOW, anno_template)  # type:ignore

                points.append(point)
            elif event == cv.EVENT_RBUTTONDOWN:
                # remove the last annotation
                points = points[:-1]

                anno_template = np.copy(anno_template)

                for p in points:
                    cv.circle(
                        anno_template,
                        p,
                        4,
                        (0, 255, 0),
                        2,
                    )

                cv.imshow(constants.WINDOW, anno_template)

        print(CROP_HELP)

        imu.show(anno_template, get_point, title="crop the header")

        assert len(points) == 4, (
            "you need to annotate the four corners of the table in order to crop it"
        )

        # crop the image to contain all of the points (just crop rectangularly, x, y, w, h)
        # Convert points to numpy array
        points_np = np.array(points)

        # Find bounding box
        x_min = np.min(points_np[:, 0])
        y_min = np.min(points_np[:, 1])
        x_max = np.max(points_np[:, 0])
        y_max = np.max(points_np[:, 1])

        # Compute width and height
        width = x_max - x_min
        height = y_max - y_min

        # Ensure integers and within image boundaries
        x_min = max(int(x_min), 0)
        y_min = max(int(y_min), 0)
        width = int(width)
        height = int(height)

        # Crop the image
        cropped = template[
            y_min - margin : y_min + height + margin,
            x_min - margin : x_min + width + margin,
        ]

        return cropped

    @staticmethod
    def from_vgg_annotation(annotation: str) -> "HeaderTemplate":
        """
        Create a TableTemplate from annotations made in [vgg](https://annotate.officialstatistics.org/), using the polylines tool.

        Args:
            annotation (str): the path of the annotation csv file
        """

        rules = []
        with open(annotation, "r") as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                shape_attributes = json.loads(row["region_shape_attributes"])
                if shape_attributes["name"] == "polyline":
                    x_points = shape_attributes["all_points_x"]
                    y_points = shape_attributes["all_points_y"]
                    if len(x_points) == 2 and len(y_points) == 2:
                        rules.append(
                            [x_points[0], y_points[0], x_points[1], y_points[1]]
                        )

        return HeaderTemplate(rules)

    def cell_width(self, i: int) -> int:
        self._check_col_idx(i)
        return int(self._v_rules[i + 1]._x - self._v_rules[i]._x)

    def cell_widths(self, start: int = 0) -> list[int]:
        return [self.cell_width(i) for i in range(start, self.cols)]

    def cell_height(self, header_factor: float = 0.8) -> int:
        return int((self._h_rules[1]._y - self._h_rules[0]._y) * header_factor)

    def cell_heights(self, header_factors: list[float] | float) -> list[int]:
        if isinstance(header_factors, float):
            header_factors = [header_factors]
        header_factors = cast(list, header_factors)
        return [
            int((self._h_rules[1]._y - self._h_rules[0]._y) * f) for f in header_factors
        ]

    def intersection(self, index: tuple[int, int]) -> tuple[float, float]:
        """
        Returns the interaction of the index[0]th horizontal rule and the
        index[1]th vertical rule
        """

        ints = self._h_rules[index[0]].intersection(self._v_rules[index[1]])
        assert ints is not None
        return ints

    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        """
        Get the cell index (row, col) that corresponds with the point (x, y) in the template image

        Args:
            point (tuple[float, float]): the coordinates in the template image

        Returns:
            tuple[int, int]: (row, col)
        """

        x, y = point

        row = -1
        col = -1

        for i in range(self.rows):
            y0 = self._h_rules[i]._y_at_x(x)
            y1 = self._h_rules[i + 1]._y_at_x(x)
            if min(y0, y1) <= y <= max(y0, y1):
                row = i
                break

        for i in range(self.cols):
            x0 = self._v_rules[i]._x_at_y(y)
            x1 = self._v_rules[i + 1]._x_at_y(y)
            if min(x0, x1) <= x <= max(x0, x1):
                col = i
                break

        if row == -1 or col == -1:
            return (-1, -1)

        return (row, col)

    def cell_polygon(
        self, cell: tuple[int, int]
    ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]:
        """
        Return points (x,y) that make up a polygon around the requested cell
        (top left, top right, bottom right, bottom left)
        """

        row, col = cell

        self._check_col_idx(col)
        self._check_row_idx(row)

        top_rule = self._h_rules[row]
        bottom_rule = self._h_rules[row + 1]
        left_rule = self._v_rules[col]
        right_rule = self._v_rules[col + 1]

        # Calculate corner points using intersections
        top_left = top_rule.intersection(left_rule)
        top_right = top_rule.intersection(right_rule)
        bottom_left = bottom_rule.intersection(left_rule)
        bottom_right = bottom_rule.intersection(right_rule)

        if not all(
            [
                point is not None
                for point in [top_left, top_right, bottom_left, bottom_right]
            ]
        ):
            raise TauluException("the lines around this cell do not intersect")

        return top_left, top_right, bottom_right, bottom_left  # type:ignore

    def region(
        self, start: tuple[int, int], end: tuple[int, int]
    ) -> tuple[Point, Point, Point, Point]:
        self._check_row_idx(start[0])
        self._check_row_idx(end[0])
        self._check_col_idx(start[1])
        self._check_col_idx(end[1])

        # the rules that surround this row
        top_rule = self._h_rules[start[0]]
        bottom_rule = self._h_rules[end[0] + 1]
        left_rule = self._v_rules[start[1]]
        right_rule = self._v_rules[end[1] + 1]

        # four points that will be the bounding polygon of the result,
        # which needs to be rectified
        top_left = top_rule.intersection(left_rule)
        top_right = top_rule.intersection(right_rule)
        bottom_left = bottom_rule.intersection(left_rule)
        bottom_right = bottom_rule.intersection(right_rule)

        if (
            top_left is None
            or top_right is None
            or bottom_left is None
            or bottom_right is None
        ):
            raise TauluException("the lines around this row do not intersect properly")

        def to_point(pnt) -> Point:
            return (int(pnt[0]), int(pnt[1]))

        return (
            to_point(top_left),
            to_point(top_right),
            to_point(bottom_right),
            to_point(bottom_left),
        )

    def text_regions(
        self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -20
    ) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        raise TauluException("text_regions should not be called on a HeaderTemplate")

Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index.

A TableTemplate is a collection of rules of a table. This class implements methods for finding cell positions in a table image, given the template the image adheres to.

Args

rules: 2D array of lines, where each line is represented as [x0, y0, x1, y1]

Ancestors

TableIndexer
abc.ABC

Static methods

def annotate_image(template: cv2.Mat | numpy.ndarray | str, crop: os.PathLike[str] | None = None, margin: int = 10) ‑> HeaderTemplate

Expand source code

@staticmethod
@log_calls(level=logging.DEBUG)
def annotate_image(
    template: MatLike | str, crop: Optional[PathLike[str]] = None, margin: int = 10
) -> "HeaderTemplate":
    """
    Utility method that allows users to create a template form a template image.

    The user is asked to click to annotate lines (two clicks per line).

    Args:
        template: the image on which to annotate the header lines
        crop (str | None): if str, crop the template image first, then do the annotation.
            The cropped image will be stored at the supplied path
        margin (int): margin to add around the cropping of the header
    """

    if type(template) is str:
        value = cv.imread(template)
        template = value
    template = cast(MatLike, template)

    if crop is not None:
        cropped = HeaderTemplate._crop(template, margin)
        cv.imwrite(os.fspath(crop), cropped)
        template = cropped

    start_point = None
    lines: list[list[int]] = []

    anno_template = np.copy(template)

    def get_point(event, x, y, flags, params):
        nonlocal lines, start_point, anno_template
        _ = flags
        _ = params
        if event == cv.EVENT_LBUTTONDOWN:
            if start_point is not None:
                line: list[int] = [start_point[1], start_point[0], x, y]

                cv.line(  # type:ignore
                    anno_template,  # type:ignore
                    (start_point[1], start_point[0]),
                    (x, y),
                    (0, 255, 0),
                    2,
                    cv.LINE_AA,
                )
                cv.imshow(constants.WINDOW, anno_template)  # type:ignore

                lines.append(line)
                start_point = None
            else:
                start_point = (y, x)
        elif event == cv.EVENT_RBUTTONDOWN:
            start_point = None

            # remove the last annotation
            lines = lines[:-1]

            anno_template = np.copy(anno_template)

            for line in lines:
                cv.line(
                    template,
                    (line[0], line[1]),
                    (line[2], line[3]),
                    (0, 255, 0),
                    2,
                    cv.LINE_AA,
                )

            cv.imshow(constants.WINDOW, template)

    print(ANNO_HELP)

    imu.show(anno_template, get_point, title="annotate the header")

    return HeaderTemplate(lines)

Utility method that allows users to create a template form a template image.

The user is asked to click to annotate lines (two clicks per line).

Args

template: the image on which to annotate the header lines
crop : str | None: if str, crop the template image first, then do the annotation. The cropped image will be stored at the supplied path
margin : int: margin to add around the cropping of the header

def from_saved(path: os.PathLike[str]) ‑> HeaderTemplate

Expand source code

@staticmethod
@log_calls(level=logging.DEBUG)
def from_saved(path: PathLike[str]) -> "HeaderTemplate":
    with open(path, "r") as f:
        data = json.load(f)
        rules = data["rules"]
        rules = [[r["x0"], r["y0"], r["x1"], r["y1"]] for r in rules]

        return HeaderTemplate(rules)

def from_vgg_annotation(annotation: str) ‑> HeaderTemplate

Expand source code

@staticmethod
def from_vgg_annotation(annotation: str) -> "HeaderTemplate":
    """
    Create a TableTemplate from annotations made in [vgg](https://annotate.officialstatistics.org/), using the polylines tool.

    Args:
        annotation (str): the path of the annotation csv file
    """

    rules = []
    with open(annotation, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            shape_attributes = json.loads(row["region_shape_attributes"])
            if shape_attributes["name"] == "polyline":
                x_points = shape_attributes["all_points_x"]
                y_points = shape_attributes["all_points_y"]
                if len(x_points) == 2 and len(y_points) == 2:
                    rules.append(
                        [x_points[0], y_points[0], x_points[1], y_points[1]]
                    )

    return HeaderTemplate(rules)

Create a TableTemplate from annotations made in vgg, using the polylines tool.

Args

annotation : str: the path of the annotation csv file

Instance variables

prop cols : int

Expand source code

@property
def cols(self) -> int:
    return len(self._v_rules) - 1

prop rows : int

Expand source code

@property
def rows(self) -> int:
    return len(self._h_rules) - 1

Methods

def cell(self, point: tuple[float, float]) ‑> tuple[int, int]

Expand source code

def cell(self, point: tuple[float, float]) -> tuple[int, int]:
    """
    Get the cell index (row, col) that corresponds with the point (x, y) in the template image

    Args:
        point (tuple[float, float]): the coordinates in the template image

    Returns:
        tuple[int, int]: (row, col)
    """

    x, y = point

    row = -1
    col = -1

    for i in range(self.rows):
        y0 = self._h_rules[i]._y_at_x(x)
        y1 = self._h_rules[i + 1]._y_at_x(x)
        if min(y0, y1) <= y <= max(y0, y1):
            row = i
            break

    for i in range(self.cols):
        x0 = self._v_rules[i]._x_at_y(y)
        x1 = self._v_rules[i + 1]._x_at_y(y)
        if min(x0, x1) <= x <= max(x0, x1):
            col = i
            break

    if row == -1 or col == -1:
        return (-1, -1)

    return (row, col)

Get the cell index (row, col) that corresponds with the point (x, y) in the template image

Args

point : tuple[float, float]: the coordinates in the template image

Returns

tuple[int, int]: (row, col)

def cell_height(self, header_factor: float = 0.8) ‑> int

Expand source code

def cell_height(self, header_factor: float = 0.8) -> int:
    return int((self._h_rules[1]._y - self._h_rules[0]._y) * header_factor)

def cell_heights(self, header_factors: list[float] | float) ‑> list[int]

Expand source code

def cell_heights(self, header_factors: list[float] | float) -> list[int]:
    if isinstance(header_factors, float):
        header_factors = [header_factors]
    header_factors = cast(list, header_factors)
    return [
        int((self._h_rules[1]._y - self._h_rules[0]._y) * f) for f in header_factors
    ]

def cell_polygon(self, cell: tuple[int, int]) ‑> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]

Expand source code

def cell_polygon(
    self, cell: tuple[int, int]
) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]:
    """
    Return points (x,y) that make up a polygon around the requested cell
    (top left, top right, bottom right, bottom left)
    """

    row, col = cell

    self._check_col_idx(col)
    self._check_row_idx(row)

    top_rule = self._h_rules[row]
    bottom_rule = self._h_rules[row + 1]
    left_rule = self._v_rules[col]
    right_rule = self._v_rules[col + 1]

    # Calculate corner points using intersections
    top_left = top_rule.intersection(left_rule)
    top_right = top_rule.intersection(right_rule)
    bottom_left = bottom_rule.intersection(left_rule)
    bottom_right = bottom_rule.intersection(right_rule)

    if not all(
        [
            point is not None
            for point in [top_left, top_right, bottom_left, bottom_right]
        ]
    ):
        raise TauluException("the lines around this cell do not intersect")

    return top_left, top_right, bottom_right, bottom_left  # type:ignore

Return points (x,y) that make up a polygon around the requested cell (top left, top right, bottom right, bottom left)

def cell_width(self, i: int) ‑> int

Expand source code

def cell_width(self, i: int) -> int:
    self._check_col_idx(i)
    return int(self._v_rules[i + 1]._x - self._v_rules[i]._x)

def cell_widths(self, start: int = 0) ‑> list[int]

Expand source code

def cell_widths(self, start: int = 0) -> list[int]:
    return [self.cell_width(i) for i in range(start, self.cols)]

def intersection(self, index: tuple[int, int]) ‑> tuple[float, float]

Expand source code

def intersection(self, index: tuple[int, int]) -> tuple[float, float]:
    """
    Returns the interaction of the index[0]th horizontal rule and the
    index[1]th vertical rule
    """

    ints = self._h_rules[index[0]].intersection(self._v_rules[index[1]])
    assert ints is not None
    return ints

Returns the interaction of the index[0]th horizontal rule and the index[1]th vertical rule

def save(self, path: os.PathLike[str])

Expand source code

@log_calls(level=logging.DEBUG)
def save(self, path: PathLike[str]):
    """
    Save the HeaderTemplate to the given path, as a json
    """

    data = {"rules": [r.to_dict() for r in self._rules]}

    with open(path, "w") as f:
        json.dump(data, f)

Save the HeaderTemplate to the given path, as a json

Inherited members

TableIndexer:
- crop_region
- region
- text_regions

class Split (left: ~T | None = None, right: ~T | None = None)

Expand source code

class Split(Generic[T]):
    """Wrapper for data that has both a left and a right variant"""

    def __init__(self, left: T | None = None, right: T | None = None):
        self._left = left
        self._right = right

    @property
    def left(self) -> T:
        assert self._left is not None
        return self._left

    @left.setter
    def left(self, value: T):
        self._left = value

    @property
    def right(self) -> T:
        assert self._right is not None
        return self._right

    @right.setter
    def right(self, value: T):
        self._right = value

    def append(self, value: T):
        if self._left is None:
            self._left = value
        else:
            self._right = value

    def __repr__(self) -> str:
        return f"left: {self._left}, right: {self._right}"

    def __iter__(self):
        assert self._left is not None
        assert self._right is not None
        return iter((self._left, self._right))

    def __getitem__(self, index: bool) -> T:
        assert self._left is not None
        assert self._right is not None
        if int(index) == 0:
            return self._left
        else:
            return self._right

    def apply(
        self,
        funcs: "Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V]",
        *args,
        **kwargs,
    ) -> "Split[V]":
        if not isinstance(funcs, Split):
            funcs = Split(funcs, funcs)

        def get_arg(side: str, arg):
            if isinstance(arg, Split):
                return getattr(arg, side)
            return arg

        def call(side: str):
            func = getattr(funcs, side)
            target = getattr(self, side)

            side_args = [get_arg(side, arg) for arg in args]
            side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()}

            return func(target, *side_args, **side_kwargs)

        return Split(call("left"), call("right"))

    def __getattr__(self, attr_name: str):
        if attr_name in self.__dict__:
            return getattr(self, attr_name)

        def wrapper(*args, **kwargs):
            return self.apply(
                Split(
                    getattr(self.left.__class__, attr_name),
                    getattr(self.right.__class__, attr_name),
                ),
                *args,
                **kwargs,
            )

        return wrapper

Wrapper for data that has both a left and a right variant

Ancestors

typing.Generic

Instance variables

prop left : ~T

Expand source code

@property
def left(self) -> T:
    assert self._left is not None
    return self._left

prop right : ~T

Expand source code

@property
def right(self) -> T:
    assert self._right is not None
    return self._right

Methods

def append(self, value: ~T)

Expand source code

def append(self, value: T):
    if self._left is None:
        self._left = value
    else:
        self._right = value

def apply(self, funcs: Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V], *args, **kwargs) ‑> Split[V]

Expand source code

def apply(
    self,
    funcs: "Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V]",
    *args,
    **kwargs,
) -> "Split[V]":
    if not isinstance(funcs, Split):
        funcs = Split(funcs, funcs)

    def get_arg(side: str, arg):
        if isinstance(arg, Split):
            return getattr(arg, side)
        return arg

    def call(side: str):
        func = getattr(funcs, side)
        target = getattr(self, side)

        side_args = [get_arg(side, arg) for arg in args]
        side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()}

        return func(target, *side_args, **side_kwargs)

    return Split(call("left"), call("right"))

class TableGrid (points: list[list[typing.Tuple[int, int]]], right_offset: int | None = None)

Expand source code

class TableGrid(TableIndexer):
    """
    A data class that allows segmenting the image into cells
    """

    _right_offset: int | None = None

    def __init__(self, points: list[list[Point]], right_offset: Optional[int] = None):
        """
        Args:
            points: a 2D list of intersections between hor. and vert. rules
        """
        self._points = points
        self._right_offset = right_offset

    @property
    def points(self) -> list[list[Point]]:
        return self._points

    def row(self, i: int) -> list[Point]:
        assert 0 <= i and i < len(self._points)
        return self._points[i]

    @property
    def cols(self) -> int:
        if self._right_offset is not None:
            return len(self.row(0)) - 2
        else:
            return len(self.row(0)) - 1

    @property
    def rows(self) -> int:
        return len(self._points) - 1

    @staticmethod
    def from_split(
        split_grids: Split["TableGrid"], offsets: Split[Point]
    ) -> "TableGrid":
        """
        Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image

        Args:
            split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table
            offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened
        """

        def offset_points(points, offset):
            return [
                [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points
            ]

        split_points = split_grids.apply(
            lambda grid, offset: offset_points(grid.points, offset), offsets
        )

        points = []

        rows = min(split_grids.left.rows, split_grids.right.rows)

        for row in range(rows + 1):
            row_points = []

            row_points.extend(split_points.left[row])
            row_points.extend(split_points.right[row])

            points.append(row_points)

        table_grid = TableGrid(points, split_grids.left.cols)

        return table_grid

    def save(self, path: str | Path):
        with open(path, "w") as f:
            json.dump({"points": self.points, "right_offset": self._right_offset}, f)

    @staticmethod
    def from_saved(path: str | Path) -> "TableGrid":
        with open(path, "r") as f:
            points = json.load(f)
            right_offset = points.get("right_offset", None)
            points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]]
            return TableGrid(points, right_offset)

    def add_left_col(self, width: int):
        for row in self._points:
            first = row[0]
            new_first = (first[0] - width, first[1])
            row.insert(0, new_first)

    def add_top_row(self, height: int):
        new_row = []
        for point in self._points[0]:
            new_row.append((point[0], point[1] - height))

        self.points.insert(0, new_row)

    def _surrounds(self, rect: list[Point], point: tuple[float, float]) -> bool:
        """point: x, y"""
        lt, rt, rb, lb = rect
        x, y = point

        top = _Rule(*lt, *rt)
        if top._y_at_x(x) > y:
            return False

        right = _Rule(*rt, *rb)
        if right._x_at_y(y) < x:
            return False

        bottom = _Rule(*lb, *rb)
        if bottom._y_at_x(x) < y:
            return False

        left = _Rule(*lb, *lt)
        if left._x_at_y(y) > x:
            return False

        return True

    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        for r in range(len(self._points) - 1):
            offset = 0
            for c in range(len(self.row(0)) - 1):
                if self._right_offset is not None and c == self._right_offset:
                    offset = -1
                    continue

                if self._surrounds(
                    [
                        self._points[r][c],
                        self._points[r][c + 1],
                        self._points[r + 1][c + 1],
                        self._points[r + 1][c],
                    ],
                    point,
                ):
                    return (r, c + offset)

        return (-1, -1)

    def cell_polygon(self, cell: tuple[int, int]) -> tuple[Point, Point, Point, Point]:
        r, c = cell

        self._check_row_idx(r)
        self._check_col_idx(c)

        if self._right_offset is not None and c >= self._right_offset:
            c = c + 1

        return (
            self._points[r][c],
            self._points[r][c + 1],
            self._points[r + 1][c + 1],
            self._points[r + 1][c],
        )

    def region(
        self, start: tuple[int, int], end: tuple[int, int]
    ) -> tuple[Point, Point, Point, Point]:
        r0, c0 = start
        r1, c1 = end

        self._check_row_idx(r0)
        self._check_row_idx(r1)
        self._check_col_idx(c0)
        self._check_col_idx(c1)

        if self._right_offset is not None and c0 >= self._right_offset:
            c0 = c0 + 1

        if self._right_offset is not None and c1 >= self._right_offset:
            c1 = c1 + 1

        lt = self._points[r0][c0]
        rt = self._points[r0][c1 + 1]
        rb = self._points[r1 + 1][c1 + 1]
        lb = self._points[r1 + 1][c0]

        return lt, rt, rb, lb

    def visualize_points(self, img: MatLike):
        """
        Draw the detected table points on the image for visual verification
        """
        import colorsys

        def clr(index, total_steps):
            hue = index / total_steps  # Normalized hue between 0 and 1
            r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
            return int(r * 255), int(g * 255), int(b * 255)

        for i, row in enumerate(self._points):
            for p in row:
                cv.circle(img, p, 4, clr(i, len(self._points)), -1)

        imu.show(img)

    def text_regions(
        self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -3
    ) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        def vertical_rule_crop(row: int, col: int):
            self._check_col_idx(col)
            self._check_row_idx(row)

            if self._right_offset is not None and col >= self._right_offset:
                col = col + 1

            top = self._points[row][col]
            bottom = self._points[row + 1][col]

            left = int(min(top[0], bottom[0]))
            right = int(max(top[0], bottom[0]))

            return img[
                int(top[1]) - margin_y : int(bottom[1]) + margin_y,
                left - margin_x : right + margin_x,
            ]

        result = []

        start = None
        for col in range(self.cols):
            crop = vertical_rule_crop(row, col)
            text_over_score = imu.text_presence_score(crop)
            text_over = text_over_score > -0.10

            if not text_over:
                if start is not None:
                    result.append(((row, start), (row, col - 1)))
                start = col

        if start is not None:
            result.append(((row, start), (row, self.cols - 1)))

        return result

    def anneal(
        self, img: MatLike, look_distance_main: int = 3, look_distance_alt: int = 3
    ):
        # how far to look in the main direction of the line
        # that is currently being examined
        LOOK_MAIN = look_distance_main

        # how far to look in the perpendicular direction of the line
        # that is currently being examined
        LOOK_ALT = look_distance_alt

        def _left_at(col: int, offset: int = LOOK_ALT) -> int:
            if self._right_offset is not None and col > self._right_offset:
                return int(clamp(col - offset, self._right_offset + 1, self.cols + 1))
            else:
                return int(clamp(col - offset, 0, self.cols + 1))

        def _right_at(col: int, offset: int = LOOK_ALT) -> int:
            if self._right_offset is not None and col <= self._right_offset:
                return int(clamp(col + offset, 0, self._right_offset))
            else:
                return int(clamp(col + offset, 0, self.cols + 1))

        def _median_slope(index: Point) -> Optional[float]:
            (r, c) = index

            left = _left_at(c)
            right = _right_at(c)

            if left == right:
                return None

            lines = []
            for row in range(r - LOOK_MAIN, r + LOOK_MAIN):
                if row < 0 or row == r or row >= len(self.points):
                    continue

                left_point = self.points[row][int(left)]
                right_point = self.points[row][int(right)]

                lines.append((left_point, right_point))

            return _core_median_slope(lines)

        new_points = []
        for row in self.points:
            new_points.append(row.copy())

        for row in range(len(self.points)):
            for col in range(len(self.points[0])):
                slope = _median_slope((row, col))

                if slope is None:
                    continue

                left = _left_at(col, 1)
                left_point = self.points[row][int(left)]

                right = _right_at(col, 1)
                right_point = self.points[row][int(right)]

                # img_ = np.copy(img)
                # # draw a line through the left point with that slope
                # cv.line(
                #     img_,
                #     (int(left_point[0]), int(left_point[1])),
                #     (
                #         int(right_point[0]),
                #         int(slope * (right_point[0] - left_point[0]) + left_point[1]),
                #     ),
                #     (0, 255, 0),
                #     3,
                #     cv.LINE_AA,
                # )
                # imu.show(img_)

                # extrapolate left point to this points x coordinate
                new_y = (
                    slope * (self.points[row][col][0] - left_point[0]) + left_point[1]
                )

                new_y = (
                    new_y / 2
                    + (
                        slope * (right_point[0] - self.points[row][col][0])
                        + right_point[1]
                    )
                    / 2
                )

                movement = new_y - self.points[row][col][1]

                new_points[row][col] = (
                    self.points[row][col][0],
                    self.points[row][col][1] + movement * 0.8,
                )

        self._points = new_points

A data class that allows segmenting the image into cells

Args

points: a 2D list of intersections between hor. and vert. rules

Ancestors

TableIndexer
abc.ABC

Static methods

def from_saved(path: str | pathlib.Path) ‑> TableGrid

Expand source code

@staticmethod
def from_saved(path: str | Path) -> "TableGrid":
    with open(path, "r") as f:
        points = json.load(f)
        right_offset = points.get("right_offset", None)
        points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]]
        return TableGrid(points, right_offset)

def from_split(split_grids: Split[ForwardRef('TableGrid')], offsets: Split[typing.Tuple[int, int]]) ‑> TableGrid

Expand source code

@staticmethod
def from_split(
    split_grids: Split["TableGrid"], offsets: Split[Point]
) -> "TableGrid":
    """
    Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image

    Args:
        split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table
        offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened
    """

    def offset_points(points, offset):
        return [
            [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points
        ]

    split_points = split_grids.apply(
        lambda grid, offset: offset_points(grid.points, offset), offsets
    )

    points = []

    rows = min(split_grids.left.rows, split_grids.right.rows)

    for row in range(rows + 1):
        row_points = []

        row_points.extend(split_points.left[row])
        row_points.extend(split_points.right[row])

        points.append(row_points)

    table_grid = TableGrid(points, split_grids.left.cols)

    return table_grid

Convert two TableGrid objects into one, that is able to segment the original (non-cropped) image

Args

split_grids : Split[TableGrid]: a Split of TableGrid objects of the left and right part of the table
offsets : Split[tuple[int, int]]: a Split of the offsets in the image where the crop happened

Instance variables

prop cols : int

Expand source code

@property
def cols(self) -> int:
    if self._right_offset is not None:
        return len(self.row(0)) - 2
    else:
        return len(self.row(0)) - 1

prop points : list[list[typing.Tuple[int, int]]]

Expand source code

@property
def points(self) -> list[list[Point]]:
    return self._points

prop rows : int

Expand source code

@property
def rows(self) -> int:
    return len(self._points) - 1

Methods

def add_left_col(self, width: int)

Expand source code

def add_left_col(self, width: int):
    for row in self._points:
        first = row[0]
        new_first = (first[0] - width, first[1])
        row.insert(0, new_first)

def add_top_row(self, height: int)

Expand source code

def add_top_row(self, height: int):
    new_row = []
    for point in self._points[0]:
        new_row.append((point[0], point[1] - height))

    self.points.insert(0, new_row)

def anneal(self, img: cv2.Mat | numpy.ndarray, look_distance_main: int = 3, look_distance_alt: int = 3)

Expand source code

def anneal(
    self, img: MatLike, look_distance_main: int = 3, look_distance_alt: int = 3
):
    # how far to look in the main direction of the line
    # that is currently being examined
    LOOK_MAIN = look_distance_main

    # how far to look in the perpendicular direction of the line
    # that is currently being examined
    LOOK_ALT = look_distance_alt

    def _left_at(col: int, offset: int = LOOK_ALT) -> int:
        if self._right_offset is not None and col > self._right_offset:
            return int(clamp(col - offset, self._right_offset + 1, self.cols + 1))
        else:
            return int(clamp(col - offset, 0, self.cols + 1))

    def _right_at(col: int, offset: int = LOOK_ALT) -> int:
        if self._right_offset is not None and col <= self._right_offset:
            return int(clamp(col + offset, 0, self._right_offset))
        else:
            return int(clamp(col + offset, 0, self.cols + 1))

    def _median_slope(index: Point) -> Optional[float]:
        (r, c) = index

        left = _left_at(c)
        right = _right_at(c)

        if left == right:
            return None

        lines = []
        for row in range(r - LOOK_MAIN, r + LOOK_MAIN):
            if row < 0 or row == r or row >= len(self.points):
                continue

            left_point = self.points[row][int(left)]
            right_point = self.points[row][int(right)]

            lines.append((left_point, right_point))

        return _core_median_slope(lines)

    new_points = []
    for row in self.points:
        new_points.append(row.copy())

    for row in range(len(self.points)):
        for col in range(len(self.points[0])):
            slope = _median_slope((row, col))

            if slope is None:
                continue

            left = _left_at(col, 1)
            left_point = self.points[row][int(left)]

            right = _right_at(col, 1)
            right_point = self.points[row][int(right)]

            # img_ = np.copy(img)
            # # draw a line through the left point with that slope
            # cv.line(
            #     img_,
            #     (int(left_point[0]), int(left_point[1])),
            #     (
            #         int(right_point[0]),
            #         int(slope * (right_point[0] - left_point[0]) + left_point[1]),
            #     ),
            #     (0, 255, 0),
            #     3,
            #     cv.LINE_AA,
            # )
            # imu.show(img_)

            # extrapolate left point to this points x coordinate
            new_y = (
                slope * (self.points[row][col][0] - left_point[0]) + left_point[1]
            )

            new_y = (
                new_y / 2
                + (
                    slope * (right_point[0] - self.points[row][col][0])
                    + right_point[1]
                )
                / 2
            )

            movement = new_y - self.points[row][col][1]

            new_points[row][col] = (
                self.points[row][col][0],
                self.points[row][col][1] + movement * 0.8,
            )

    self._points = new_points

def row(self, i: int) ‑> list[typing.Tuple[int, int]]

Expand source code

def row(self, i: int) -> list[Point]:
    assert 0 <= i and i < len(self._points)
    return self._points[i]

def save(self, path: str | pathlib.Path)

Expand source code

def save(self, path: str | Path):
    with open(path, "w") as f:
        json.dump({"points": self.points, "right_offset": self._right_offset}, f)

def visualize_points(self, img: cv2.Mat | numpy.ndarray)

Expand source code

def visualize_points(self, img: MatLike):
    """
    Draw the detected table points on the image for visual verification
    """
    import colorsys

    def clr(index, total_steps):
        hue = index / total_steps  # Normalized hue between 0 and 1
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        return int(r * 255), int(g * 255), int(b * 255)

    for i, row in enumerate(self._points):
        for p in row:
            cv.circle(img, p, 4, clr(i, len(self._points)), -1)

    imu.show(img)

Draw the detected table points on the image for visual verification

Inherited members

TableIndexer:
- cell
- cell_polygon
- crop_region
- region
- text_regions

class TableIndexer

Expand source code

class TableIndexer(ABC):
    """
    Subclasses implement methods for going from a pixel in the input image to a table cell index,
    and cropping an image to the given table cell index.
    """

    def __init__(self):
        self._col_offset = 0

    @property
    def col_offset(self) -> int:
        return self._col_offset

    @col_offset.setter
    def col_offset(self, value: int):
        assert value >= 0
        self._col_offset = value

    @property
    @abstractmethod
    def cols(self) -> int:
        pass

    @property
    @abstractmethod
    def rows(self) -> int:
        pass

    def cells(self) -> Generator[tuple[int, int], None, None]:
        for row in range(self.rows):
            for col in range(self.cols):
                yield (row, col)

    def _check_row_idx(self, row: int):
        if row < 0:
            raise TauluException("row number needs to be positive or zero")
        if row >= self.rows:
            raise TauluException(f"row number too high: {row} >= {self.rows}")

    def _check_col_idx(self, col: int):
        if col < 0:
            raise TauluException("col number needs to be positive or zero")
        if col >= self.cols:
            raise TauluException(f"col number too high: {col} >= {self.cols}")

    @abstractmethod
    def cell(self, point: tuple[float, float]) -> tuple[int, int]:
        """
        Returns the coordinate (row, col) of the cell that contains the given position

        Args:
            point (tuple[float, float]): a location in the input image

        Returns:
            tuple[int, int]: the cell index (row, col) that contains the given point
        """
        pass

    @abstractmethod
    def cell_polygon(
        self, cell: tuple[int, int]
    ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]:
        """returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position"""
        pass

    def _highlight_cell(
        self,
        image: MatLike,
        cell: tuple[int, int],
        color: tuple[int, int, int] = (0, 0, 255),
        thickness: int = 2,
    ):
        polygon = self.cell_polygon(cell)
        points = np.int32(list(polygon))  # type:ignore
        cv.polylines(image, [points], True, color, thickness, cv.LINE_AA)  # type:ignore
        cv.putText(
            image,
            str(cell),
            (int(polygon[3][0] + 10), int(polygon[3][1] - 10)),
            cv.FONT_HERSHEY_PLAIN,
            2.0,
            (255, 255, 255),
            2,
        )

    def highlight_all_cells(
        self,
        image: MatLike,
        color: tuple[int, int, int] = (0, 0, 255),
        thickness: int = 1,
    ) -> MatLike:
        img = np.copy(image)

        for cell in self.cells():
            self._highlight_cell(img, cell, color, thickness)

        return img

    def select_one_cell(
        self,
        image: MatLike,
        window: str = WINDOW,
        color: tuple[int, int, int] = (255, 0, 0),
        thickness: int = 2,
    ) -> tuple[int, int] | None:
        clicked = None

        def click_event(event, x, y, flags, params):
            nonlocal clicked

            img = np.copy(image)
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                cell = self.cell((x, y))
                if cell[0] >= 0:
                    clicked = cell
                else:
                    return
                self._highlight_cell(img, cell, color, thickness)
                cv.imshow(window, img)

        imu.show(image, click_event=click_event, title="select one cell", window=window)

        return clicked

    def show_cells(
        self, image: MatLike | os.PathLike[str] | str, window: str = WINDOW
    ) -> list[tuple[int, int]]:
        if not isinstance(image, np.ndarray):
            image = cv.imread(os.fspath(image))

        img = np.copy(image)

        cells = []

        def click_event(event, x, y, flags, params):
            _ = flags
            _ = params
            if event == cv.EVENT_LBUTTONDOWN:
                cell = self.cell((x, y))
                if cell[0] >= 0:
                    cells.append(cell)
                else:
                    return
                self._highlight_cell(img, cell)
                cv.imshow(window, img)

        imu.show(
            img,
            click_event=click_event,
            title="click to highlight cells",
            window=window,
        )

        return cells

    @abstractmethod
    def region(
        self,
        start: tuple[int, int],
        end: tuple[int, int],
    ) -> tuple[Point, Point, Point, Point]:
        """
        Get the bounding box for the rectangular region that goes from start to end

        Returns:
            4 points: lt, rt, rb, lb, in format (x, y)
        """
        pass

    def crop_region(
        self,
        image: MatLike,
        start: tuple[int, int],
        end: tuple[int, int],
        margin: int = 0,
        margin_top: int | None = None,
        margin_bottom: int | None = None,
        margin_left: int | None = None,
        margin_right: int | None = None,
        margin_y: int | None = None,
        margin_x: int | None = None,
    ) -> MatLike:
        """Crop the input image to a rectangular region with the start and end cells as extremes"""

        region = self.region(start, end)

        lt, rt, rb, lb = _apply_margin(
            *region,
            margin=margin,
            margin_top=margin_top,
            margin_bottom=margin_bottom,
            margin_left=margin_left,
            margin_right=margin_right,
            margin_y=margin_y,
            margin_x=margin_x,
        )

        # apply margins according to priority:
        # margin_top > margin_y > margin (etc.)

        w = (rt[0] - lt[0] + rb[0] - lb[0]) / 2
        h = (rb[1] - rt[1] + lb[1] - lt[1]) / 2

        # crop by doing a perspective transform to the desired quad
        src_pts = np.array([lt, rt, rb, lb], dtype="float32")
        dst_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype="float32")
        M = cv.getPerspectiveTransform(src_pts, dst_pts)
        warped = cv.warpPerspective(image, M, (int(w), int(h)))  # type:ignore

        return warped

    @abstractmethod
    def text_regions(
        self, img: MatLike, row: int, margin_x: int = 0, margin_y: int = 0
    ) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        """
        Split the row into regions of continuous text

        Returns
            list[tuple[int, int]]: a list of spans (start col, end col)
        """

        pass

    def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) -> MatLike:
        return self.crop_region(image, cell, cell, margin)

Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index.

Ancestors

abc.ABC

Subclasses

Instance variables

prop col_offset : int

Expand source code

@property
def col_offset(self) -> int:
    return self._col_offset

prop cols : int

Expand source code

@property
@abstractmethod
def cols(self) -> int:
    pass

prop rows : int

Expand source code

@property
@abstractmethod
def rows(self) -> int:
    pass

Methods

def cell(self, point: tuple[float, float]) ‑> tuple[int, int]

Expand source code

@abstractmethod
def cell(self, point: tuple[float, float]) -> tuple[int, int]:
    """
    Returns the coordinate (row, col) of the cell that contains the given position

    Args:
        point (tuple[float, float]): a location in the input image

    Returns:
        tuple[int, int]: the cell index (row, col) that contains the given point
    """
    pass

Returns the coordinate (row, col) of the cell that contains the given position

Args

point : tuple[float, float]: a location in the input image

Returns

tuple[int, int]: the cell index (row, col) that contains the given point

def cell_polygon(self, cell: tuple[int, int]) ‑> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]

Expand source code

@abstractmethod
def cell_polygon(
    self, cell: tuple[int, int]
) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]:
    """returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position"""
    pass

returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position

def cells(self) ‑> Generator[tuple[int, int], None, None]

Expand source code

def cells(self) -> Generator[tuple[int, int], None, None]:
    for row in range(self.rows):
        for col in range(self.cols):
            yield (row, col)

def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) ‑> cv2.Mat | numpy.ndarray

Expand source code

def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) -> MatLike:
    return self.crop_region(image, cell, cell, margin)

Expand source code

def crop_region(
    self,
    image: MatLike,
    start: tuple[int, int],
    end: tuple[int, int],
    margin: int = 0,
    margin_top: int | None = None,
    margin_bottom: int | None = None,
    margin_left: int | None = None,
    margin_right: int | None = None,
    margin_y: int | None = None,
    margin_x: int | None = None,
) -> MatLike:
    """Crop the input image to a rectangular region with the start and end cells as extremes"""

    region = self.region(start, end)

    lt, rt, rb, lb = _apply_margin(
        *region,
        margin=margin,
        margin_top=margin_top,
        margin_bottom=margin_bottom,
        margin_left=margin_left,
        margin_right=margin_right,
        margin_y=margin_y,
        margin_x=margin_x,
    )

    # apply margins according to priority:
    # margin_top > margin_y > margin (etc.)

    w = (rt[0] - lt[0] + rb[0] - lb[0]) / 2
    h = (rb[1] - rt[1] + lb[1] - lt[1]) / 2

    # crop by doing a perspective transform to the desired quad
    src_pts = np.array([lt, rt, rb, lb], dtype="float32")
    dst_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype="float32")
    M = cv.getPerspectiveTransform(src_pts, dst_pts)
    warped = cv.warpPerspective(image, M, (int(w), int(h)))  # type:ignore

    return warped

Crop the input image to a rectangular region with the start and end cells as extremes

def highlight_all_cells(self, image: cv2.Mat | numpy.ndarray, color: tuple[int, int, int] = (0, 0, 255), thickness: int = 1) ‑> cv2.Mat | numpy.ndarray

Expand source code

def highlight_all_cells(
    self,
    image: MatLike,
    color: tuple[int, int, int] = (0, 0, 255),
    thickness: int = 1,
) -> MatLike:
    img = np.copy(image)

    for cell in self.cells():
        self._highlight_cell(img, cell, color, thickness)

    return img

def region(self, start: tuple[int, int], end: tuple[int, int]) ‑> tuple[typing.Tuple[int, int], typing.Tuple[int, int], typing.Tuple[int, int], typing.Tuple[int, int]]

Expand source code

@abstractmethod
def region(
    self,
    start: tuple[int, int],
    end: tuple[int, int],
) -> tuple[Point, Point, Point, Point]:
    """
    Get the bounding box for the rectangular region that goes from start to end

    Returns:
        4 points: lt, rt, rb, lb, in format (x, y)
    """
    pass

Get the bounding box for the rectangular region that goes from start to end

Returns

4 points: lt, rt, rb, lb, in format (x, y)

def select_one_cell(self, image: cv2.Mat | numpy.ndarray, window: str = 'taulu', color: tuple[int, int, int] = (255, 0, 0), thickness: int = 2) ‑> tuple[int, int] | None

Expand source code

def select_one_cell(
    self,
    image: MatLike,
    window: str = WINDOW,
    color: tuple[int, int, int] = (255, 0, 0),
    thickness: int = 2,
) -> tuple[int, int] | None:
    clicked = None

    def click_event(event, x, y, flags, params):
        nonlocal clicked

        img = np.copy(image)
        _ = flags
        _ = params
        if event == cv.EVENT_LBUTTONDOWN:
            cell = self.cell((x, y))
            if cell[0] >= 0:
                clicked = cell
            else:
                return
            self._highlight_cell(img, cell, color, thickness)
            cv.imshow(window, img)

    imu.show(image, click_event=click_event, title="select one cell", window=window)

    return clicked

def show_cells(self, image: cv2.Mat | numpy.ndarray | os.PathLike[str] | str, window: str = 'taulu') ‑> list[tuple[int, int]]

Expand source code

def show_cells(
    self, image: MatLike | os.PathLike[str] | str, window: str = WINDOW
) -> list[tuple[int, int]]:
    if not isinstance(image, np.ndarray):
        image = cv.imread(os.fspath(image))

    img = np.copy(image)

    cells = []

    def click_event(event, x, y, flags, params):
        _ = flags
        _ = params
        if event == cv.EVENT_LBUTTONDOWN:
            cell = self.cell((x, y))
            if cell[0] >= 0:
                cells.append(cell)
            else:
                return
            self._highlight_cell(img, cell)
            cv.imshow(window, img)

    imu.show(
        img,
        click_event=click_event,
        title="click to highlight cells",
        window=window,
    )

    return cells

def text_regions(self, img: cv2.Mat | numpy.ndarray, row: int, margin_x: int = 0, margin_y: int = 0) ‑> list[tuple[tuple[int, int], tuple[int, int]]]

Expand source code

@abstractmethod
def text_regions(
    self, img: MatLike, row: int, margin_x: int = 0, margin_y: int = 0
) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """
    Split the row into regions of continuous text

    Returns
        list[tuple[int, int]]: a list of spans (start col, end col)
    """

    pass

Split the row into regions of continuous text

Returns list[tuple[int, int]]: a list of spans (start col, end col)

class Taulu (header_path: os.PathLike[str] | str | Tuple[os.PathLike[str] | str, os.PathLike[str] | str], sauvola_k: float = 0.25, search_region: int = 60, distance_penalty: float = 0.4, cross_width: int = 10, morph_size: int = 4, kernel_size: int = 41, processing_scale: float = 1.0, min_rows: int = 5, look_distance: int = 3, grow_threshold: float = 0.3)

Expand source code

class Taulu:
    """
    The Taulu class is a convenience class that hides the inner workings of taulu as much as possible.

    For more advanced use cases, it might be useful to implement the workflow directly yourself,
    in order to have control over the intermediate steps.
    """

    def __init__(
        self,
        header_path: PathLike[str]
        | str
        | Tuple[PathLike[str] | str, PathLike[str] | str],
        sauvola_k: float = 0.25,
        search_region: int = 60,
        distance_penalty: float = 0.4,
        cross_width: int = 10,
        morph_size: int = 4,
        kernel_size: int = 41,
        processing_scale: float = 1.0,
        min_rows: int = 5,
        look_distance: int = 3,
        grow_threshold: float = 0.3,
    ):
        self._processing_scale = processing_scale

        if isinstance(header_path, Tuple):
            header = Split(Path(header_path[0]), Path(header_path[1]))

            if not exists(header.left.with_suffix(".png")) or not exists(
                header.right.with_suffix(".png")
            ):
                raise TauluException("The header images you provided do not exist")
            if not exists(header.left.with_suffix(".json")) or not exists(
                header.right.with_suffix(".json")
            ):
                raise TauluException(
                    "You need to annotate the headers of your table first\n\nsee the Taulu.annotate method"
                )

            template_left = HeaderTemplate.from_saved(header.left.with_suffix(".json"))
            template_right = HeaderTemplate.from_saved(
                header.right.with_suffix(".json")
            )

            self._header = Split(
                cv2.imread(os.fspath(header.left)), cv2.imread(os.fspath(header.right))
            )

            self._aligner = Split(
                HeaderAligner(self._header.left, scale=self._processing_scale),
                HeaderAligner(self._header.right, scale=self._processing_scale),
            )

            self._template = Split(template_left, template_right)

        else:
            header_path = Path(header_path)
            self._header = cv2.imread(os.fspath(header_path))
            self._aligner = HeaderAligner(self._header)
            self._template = HeaderTemplate.from_saved(header_path.with_suffix(".json"))

        # TODO: currently, these parameters are fixed and optimized for the example
        #       image specifically (which is probably a good starting point,
        #       espeicially after normalizing the image size)
        self._grid_detector = GridDetector(
            kernel_size=kernel_size,
            cross_width=cross_width,
            morph_size=morph_size,
            search_region=search_region,
            sauvola_k=sauvola_k,
            distance_penalty=distance_penalty,
            scale=self._processing_scale,
            min_rows=min_rows,
            look_distance=look_distance,
            grow_threshold=grow_threshold,
        )

        if isinstance(self._template, Split):
            self._grid_detector = Split(self._grid_detector, self._grid_detector)

    @staticmethod
    def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str):
        """
        Annotate the header of a table image.

        Saves the annotated header image and a json file containing the
        header template to the output path.

        Args:
            image_path (PathLike[str]): the path of the image which you want to annotate
            output_path (PathLike[str]): the path where the output files should go (image files and json files)
        """

        if not exists(image_path):
            raise TauluException(f"Image path {image_path} does not exist")

        if os.path.isdir(output_path):
            raise TauluException("Output path should be a file")

        output_path = Path(output_path)

        template = HeaderTemplate.annotate_image(
            os.fspath(image_path), crop=output_path.with_suffix(".png")
        )

        template.save(output_path.with_suffix(".json"))

    # TODO: check if PathLike works like this
    # TODO: get rid of cell_height and make this part of the header template
    def segment_table(
        self,
        image: MatLike | PathLike[str] | str,
        cell_height_factor: float | List[float] | Dict[str, float | List[float]],
        debug_view: bool = False,
    ) -> TableGrid:
        """
        Main function of the class, segmenting the input image into cells.

        Returns a TableGrid object, which has methods with which you can find
        the location of cells in the table

        Args:
            image (MatLike | PathLike[str]): The image to segment (path or np.ndarray)

            cell_height_factor (float | list[float] | dict[str, float | list[float]]): The height factor of a row. This factor is the fraction of the header height each row is.
                If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument.
                Also accepts a list of heights, useful if your row heights are not constant (often, the first row is
                higher than the others). The last entry in the list is used repeatedly when there are more
                rows in the image than there are entries in your list.

                By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor
                for the different sides of your table.

            debug_view (bool): By setting this setting to True, an OpenCV window will open and show the results of intermediate steps.
                Press `n` for advancing to the next image, and `q` to quit.
        """

        if not isinstance(image, MatLike):
            image = cv2.imread(os.fspath(image))

        # TODO: perform checks on the image

        now = perf_counter()
        h = self._aligner.align(image, visual=debug_view)
        align_time = perf_counter() - now
        logger.info(f"Header alignment took {align_time:.2f} seconds")

        # find the starting point for the table grid algorithm
        left_top_template = self._template.intersection((1, 0))
        if isinstance(left_top_template, Split):
            left_top_template = Split(
                (int(left_top_template.left[0]), int(left_top_template.left[1])),
                (int(left_top_template.right[0]), int(left_top_template.right[1])),
            )
        else:
            left_top_template = (int(left_top_template[0]), int(left_top_template[1]))

        left_top_table = self._aligner.template_to_img(h, left_top_template)

        if isinstance(cell_height_factor, dict):
            if not isinstance(self._template, Split):
                raise TauluException(
                    "You provided a cell_height_factor dictionary, but the header is not a Split"
                )
            if "left" not in cell_height_factor or "right" not in cell_height_factor:
                raise TauluException(
                    "When providing a cell_height_factor dictionary, it should contain both 'left' and 'right' keys"
                )
            cell_heights = Split(
                self._template.left.cell_heights(cell_height_factor.get("left", 1.0)),
                self._template.right.cell_heights(cell_height_factor.get("right", 1.0)),
            )
        else:
            cell_heights = self._template.cell_heights(cell_height_factor)

        now = perf_counter()
        table = self._grid_detector.find_table_points(
            image,
            left_top_table,
            self._template.cell_widths(0),
            cell_heights,
            visual=debug_view,
        )
        grid_time = perf_counter() - now
        logger.info(f"Grid detection took {grid_time:.2f} seconds")

        if isinstance(table, Split):
            table = TableGrid.from_split(table, (0, 0))

        return table

The Taulu class is a convenience class that hides the inner workings of taulu as much as possible.

For more advanced use cases, it might be useful to implement the workflow directly yourself, in order to have control over the intermediate steps.

Static methods

def annotate(image_path: os.PathLike[str] | str, output_path: os.PathLike[str] | str)

Expand source code

@staticmethod
def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str):
    """
    Annotate the header of a table image.

    Saves the annotated header image and a json file containing the
    header template to the output path.

    Args:
        image_path (PathLike[str]): the path of the image which you want to annotate
        output_path (PathLike[str]): the path where the output files should go (image files and json files)
    """

    if not exists(image_path):
        raise TauluException(f"Image path {image_path} does not exist")

    if os.path.isdir(output_path):
        raise TauluException("Output path should be a file")

    output_path = Path(output_path)

    template = HeaderTemplate.annotate_image(
        os.fspath(image_path), crop=output_path.with_suffix(".png")
    )

    template.save(output_path.with_suffix(".json"))

Annotate the header of a table image.

Saves the annotated header image and a json file containing the header template to the output path.

Args

image_path : PathLike[str]: the path of the image which you want to annotate
output_path : PathLike[str]: the path where the output files should go (image files and json files)

Methods

Expand source code

def segment_table(
    self,
    image: MatLike | PathLike[str] | str,
    cell_height_factor: float | List[float] | Dict[str, float | List[float]],
    debug_view: bool = False,
) -> TableGrid:
    """
    Main function of the class, segmenting the input image into cells.

    Returns a TableGrid object, which has methods with which you can find
    the location of cells in the table

    Args:
        image (MatLike | PathLike[str]): The image to segment (path or np.ndarray)

        cell_height_factor (float | list[float] | dict[str, float | list[float]]): The height factor of a row. This factor is the fraction of the header height each row is.
            If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument.
            Also accepts a list of heights, useful if your row heights are not constant (often, the first row is
            higher than the others). The last entry in the list is used repeatedly when there are more
            rows in the image than there are entries in your list.

            By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor
            for the different sides of your table.

        debug_view (bool): By setting this setting to True, an OpenCV window will open and show the results of intermediate steps.
            Press `n` for advancing to the next image, and `q` to quit.
    """

    if not isinstance(image, MatLike):
        image = cv2.imread(os.fspath(image))

    # TODO: perform checks on the image

    now = perf_counter()
    h = self._aligner.align(image, visual=debug_view)
    align_time = perf_counter() - now
    logger.info(f"Header alignment took {align_time:.2f} seconds")

    # find the starting point for the table grid algorithm
    left_top_template = self._template.intersection((1, 0))
    if isinstance(left_top_template, Split):
        left_top_template = Split(
            (int(left_top_template.left[0]), int(left_top_template.left[1])),
            (int(left_top_template.right[0]), int(left_top_template.right[1])),
        )
    else:
        left_top_template = (int(left_top_template[0]), int(left_top_template[1]))

    left_top_table = self._aligner.template_to_img(h, left_top_template)

    if isinstance(cell_height_factor, dict):
        if not isinstance(self._template, Split):
            raise TauluException(
                "You provided a cell_height_factor dictionary, but the header is not a Split"
            )
        if "left" not in cell_height_factor or "right" not in cell_height_factor:
            raise TauluException(
                "When providing a cell_height_factor dictionary, it should contain both 'left' and 'right' keys"
            )
        cell_heights = Split(
            self._template.left.cell_heights(cell_height_factor.get("left", 1.0)),
            self._template.right.cell_heights(cell_height_factor.get("right", 1.0)),
        )
    else:
        cell_heights = self._template.cell_heights(cell_height_factor)

    now = perf_counter()
    table = self._grid_detector.find_table_points(
        image,
        left_top_table,
        self._template.cell_widths(0),
        cell_heights,
        visual=debug_view,
    )
    grid_time = perf_counter() - now
    logger.info(f"Grid detection took {grid_time:.2f} seconds")

    if isinstance(table, Split):
        table = TableGrid.from_split(table, (0, 0))

    return table

Main function of the class, segmenting the input image into cells.

Returns a TableGrid object, which has methods with which you can find the location of cells in the table

Args

image : MatLike | PathLike[str]

The image to segment (path or np.ndarray)

cell_height_factor : float | list[float] | dict[str, float | list[float]]

The height factor of a row. This factor is the fraction of the header height each row is. If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument. Also accepts a list of heights, useful if your row heights are not constant (often, the first row is higher than the others). The last entry in the list is used repeatedly when there are more rows in the image than there are entries in your list.

By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor for the different sides of your table.

debug_view : bool

By setting this setting to True, an OpenCV window will open and show the results of intermediate steps. Press n for advancing to the next image, and q to quit.