Package taulu
Taulu - segment tables from images
Taulu is a Python package designed to segment images of tables into their constituent rows and columns (and cells).
To use this package, you first need to make an annotation of the headers in your table images. The idea is that these headers will be similar across your full set of images, and they will be used as a starting point for the search algorithm that finds the table grid.
Here is an example python script of how to use Taulu:
from taulu import Taulu
import os
def setup():
# create an Annotation file of the headers in the image
# (one for the left header, one for the right)
# and store them in the examples directory
print("Annotating the LEFT header...")
Taulu.annotate("../data/table_00.png", "table_00_header_left.png")
print("Annotating the RIGHT header...")
Taulu.annotate("../data/table_00.png", "table_00_header_right.png")
def main():
taulu = Taulu(("table_00_header_left.png", "table_00_header_right.png"))
table = taulu.segment_table("../data/table_00.png", cell_height_factor=0.8, debug_view=True)
table.show_cells("../data/table_00.png")
if __name__ == "__main__":
if os.path.exists("table_00_header_left.png") and os.path.exists(
"table_00_header_right.png"
):
main()
else:
setup()
main()
If you want a high-level overview of how to use Taulu, see the Taulu class
Sub-modules
taulu.grid
-
Implements the grid finding algorithm, that is able to find the intersections of horizontal and vertical rules.
taulu.header_aligner
-
Header alignment functionality
taulu.header_template
-
A HeaderTemplate defines the structure of a table header.
taulu.split
-
A module that provides a Split class to handle data with left and right variants …
taulu.table_indexer
-
Defines an abstract class TableIndexer, which provides methods for mapping pixel coordinates in an image to table cell indices and for cropping images …
taulu.taulu
-
The Taulu class is a convenience class that hides the inner workings of taulu as much as possible.
Classes
class GridDetector (kernel_size: int = 21,
cross_width: int = 6,
cross_height: int | None = None,
morph_size: int | None = None,
sauvola_k: float = 0.04,
sauvola_window: int = 15,
scale: float = 1.0,
search_region: int = 40,
distance_penalty: float = 0.4,
min_rows: int = 5,
grow_threshold: float = 0.3,
look_distance: int = 4)-
Expand source code
class GridDetector: """ Implements a filters result in high activation where the image has an intersection of a vertical and horizontal rule, useful for finding the bounding boxes of cells. Also implements the search algorithm that uses the output of this filter to build a tabular structure of corner points (in row major order). """ def __init__( self, kernel_size: int = 21, cross_width: int = 6, cross_height: Optional[int] = None, morph_size: Optional[int] = None, sauvola_k: float = 0.04, sauvola_window: int = 15, scale: float = 1.0, search_region: int = 40, distance_penalty: float = 0.4, min_rows: int = 5, grow_threshold: float = 0.3, look_distance: int = 4, ): """ Args: kernel_size (int): the size of the cross kernel a larger kernel size often means that more penalty is applied, often leading to more sparse results cross_width (int): the width of one of the edges in the cross filter, should be roughly equal to the width of the rules in the image after morphology is applied cross_height (int | None): useful if the horizontal rules and vertical rules have different sizes morph_size (int | None): the size of the morphology operators that are applied before the cross kernel. 'bridges the gaps' of broken-up lines sauvola_k (float): threshold parameter for sauvola thresholding sauvola_window (int): window_size parameter for sauvola thresholding scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly) search_region (int): area in which to search for a new max value in `find_nearest` etc. distance_penalty (float): how much the point finding algorithm penalizes points that are further in the region [0, 1] min_rows (int): minimum number of rows to find before stopping the table finding algorithm grow_threshold (float): the threshold for accepting a new point when growing the table look_distance (int): how many points away to look when calculating the median slope """ self._validate_parameters( kernel_size, cross_width, cross_height, morph_size, search_region, sauvola_k, sauvola_window, distance_penalty, ) self._kernel_size = kernel_size self._cross_width = cross_width self._cross_height = cross_width if cross_height is None else cross_height self._morph_size = morph_size if morph_size is not None else cross_width self._search_region = search_region self._sauvola_k = sauvola_k self._sauvola_window = sauvola_window self._distance_penalty = distance_penalty self._scale = scale self._min_rows = min_rows self._grow_threshold = grow_threshold self._look_distance = look_distance self._cross_kernel = self._create_cross_kernel() def _validate_parameters( self, kernel_size: int, cross_width: int, cross_height: Optional[int], morph_size: Optional[int], search_region: int, sauvola_k: float, sauvola_window: int, distance_penalty: float, ) -> None: """Validate initialization parameters.""" if kernel_size % 2 == 0: raise ValueError("kernel_size must be odd") if ( kernel_size <= 0 or cross_width <= 0 or search_region <= 0 or sauvola_window <= 0 ): raise ValueError("Size parameters must be positive") if cross_height is not None and cross_height <= 0: raise ValueError("cross_height must be positive") if morph_size is not None and morph_size <= 0: raise ValueError("morph_size must be positive") if not 0 <= distance_penalty <= 1: raise ValueError("distance_penalty must be in [0, 1]") if sauvola_k <= 0: raise ValueError("sauvola_k must be positive") def _create_gaussian_weights(self, region_size: int) -> NDArray: """ Create a 2D Gaussian weight mask. Args: shape (tuple[int, int]): Shape of the region (height, width) p (float): Minimum value at the edge = 1 - p Returns: NDArray: Gaussian weight mask """ if self._distance_penalty == 0: return np.ones((region_size, region_size), dtype=np.float32) y = np.linspace(-1, 1, region_size) x = np.linspace(-1, 1, region_size) xv, yv = np.meshgrid(x, y) dist_squared = xv**2 + yv**2 # Prevent log(0) when distance_penalty is 1 if self._distance_penalty >= 0.999: sigma = 0.1 # Small sigma for very sharp peak else: sigma = np.sqrt(-1 / (2 * np.log(1 - self._distance_penalty))) weights = np.exp(-dist_squared / (2 * sigma**2)) return weights.astype(np.float32) def _create_cross_kernel(self) -> NDArray: kernel = np.zeros((self._kernel_size, self._kernel_size), dtype=np.uint8) center = self._kernel_size // 2 # Create horizontal bar h_start = max(0, center - self._cross_height // 2) h_end = min(self._kernel_size, center + (self._cross_height + 1) // 2) kernel[h_start:h_end, :] = 255 # Create vertical bar v_start = max(0, center - self._cross_width // 2) v_end = min(self._kernel_size, center + (self._cross_width + 1) // 2) kernel[:, v_start:v_end] = 255 return kernel def _apply_morphology(self, binary: MatLike) -> MatLike: # Define a horizontal kernel (adjust width as needed) kernel_hor = cv.getStructuringElement(cv.MORPH_RECT, (self._morph_size, 1)) kernel_ver = cv.getStructuringElement(cv.MORPH_RECT, (1, self._morph_size)) # Apply dilation dilated = cv.dilate(binary, kernel_hor, iterations=1) dilated = cv.dilate(dilated, kernel_ver, iterations=1) return dilated def _apply_cross_matching(self, img: MatLike) -> MatLike: """Apply cross kernel template matching.""" pad_y = self._cross_kernel.shape[0] // 2 pad_x = self._cross_kernel.shape[1] // 2 padded = cv.copyMakeBorder( img, pad_y, pad_y, pad_x, pad_x, borderType=cv.BORDER_CONSTANT, value=0 ) filtered = cv.matchTemplate(padded, self._cross_kernel, cv.TM_SQDIFF_NORMED) # Invert and normalize to 0-255 range filtered = cv.normalize(1.0 - filtered, None, 0, 255, cv.NORM_MINMAX) return filtered.astype(np.uint8) def apply(self, img: MatLike, visual: bool = False) -> MatLike: """ Apply the grid detection filter to the input image. Args: img (MatLike): the input image visual (bool): whether to show intermediate steps Returns: MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules """ if img is None or img.size == 0: raise ValueError("Input image is empty or None") binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window) if visual: imu.show(binary, title="thresholded") binary = self._apply_morphology(binary) if visual: imu.show(binary, title="dilated") filtered = self._apply_cross_matching(binary) return filtered @log_calls(level=logging.DEBUG, include_return=True) def find_nearest( self, filtered: MatLike, point: Point, region: Optional[int] = None ) -> Tuple[Point, float]: """ Find the nearest 'corner match' in the image, along with its score [0,1] Args: filtered (MatLike): the filtered image (obtained through `apply`) point (tuple[int, int]): the approximate target point (x, y) region (None | int): alternative value for search region, overwriting the `__init__` parameter `region` """ if filtered is None or filtered.size == 0: raise ValueError("Filtered image is empty or None") region_size = region if region is not None else self._search_region x, y = point # Calculate crop boundaries crop_x = max(0, x - region_size // 2) crop_y = max(0, y - region_size // 2) crop_width = min(region_size, filtered.shape[1] - crop_x) crop_height = min(region_size, filtered.shape[0] - crop_y) # Handle edge cases if crop_width <= 0 or crop_height <= 0: logger.warning(f"Point {point} is outside image bounds") return point, 0.0 cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width] if cropped.size == 0: return point, 0.0 # Always apply Gaussian weighting by extending crop if needed if cropped.shape[0] == region_size and cropped.shape[1] == region_size: # Perfect size - apply weights directly weights = self._create_gaussian_weights(region_size) weighted = cropped.astype(np.float32) * weights else: # Extend crop to match region_size, apply weights, then restore extended = np.zeros((region_size, region_size), dtype=cropped.dtype) # Calculate offset to center the cropped region in extended array offset_y = (region_size - cropped.shape[0]) // 2 offset_x = (region_size - cropped.shape[1]) // 2 # Place cropped region in center of extended array extended[ offset_y : offset_y + cropped.shape[0], offset_x : offset_x + cropped.shape[1], ] = cropped # Apply Gaussian weights to extended array weights = self._create_gaussian_weights(region_size) weighted_extended = extended.astype(np.float32) * weights # Extract the original region back out weighted = weighted_extended[ offset_y : offset_y + cropped.shape[0], offset_x : offset_x + cropped.shape[1], ] best_idx = np.argmax(weighted) best_y, best_x = np.unravel_index(best_idx, cropped.shape) result_point = ( int(crop_x + best_x), int(crop_y + best_y), ) result_confidence = float(weighted[best_y, best_x]) / 255.0 return result_point, result_confidence def find_table_points( self, img: MatLike | PathLike[str], left_top: Point, cell_widths: list[int], cell_heights: list[int] | int, visual: bool = False, window: str = WINDOW, goals_width: Optional[int] = None, ) -> "TableGrid": """ Parse the image to a `TableGrid` structure that holds all of the intersections between horizontal and vertical rules, starting near the `left_top` point Args: img (MatLike): the input image of a table left_top (tuple[int, int]): the starting point of the algorithm cell_widths (list[int]): the expected widths of the cells (based on a header template) cell_heights (list[int]): the expected height of the rows of data. The last value from this list is used until the image has no more vertical space. visual (bool): whether to show intermediate steps window (str): the name of the OpenCV window to use for visualization goals_width (int | None): the width of the goal region when searching for the next point. If None, defaults to 1.5 * search_region Returns: a TableGrid object """ if goals_width is None: goals_width = self._search_region * 3 // 2 if not cell_widths: raise ValueError("cell_widths must contain at least one value") if not isinstance(img, np.ndarray): img = cv.imread(os.fspath(img)) filtered = self.apply(img, visual) if visual: imu.show(filtered, window=window) if isinstance(cell_heights, int): cell_heights = [cell_heights] left_top, confidence = self.find_nearest( filtered, left_top, int(self._search_region * 3) ) if confidence < 0.1: logger.warning( f"Low confidence for the starting point: {confidence} at {left_top}" ) # resize all parameters according to scale img = cv.resize(img, None, fx=self._scale, fy=self._scale) if visual: imu.push(img) filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale) cell_widths = [int(w * self._scale) for w in cell_widths] cell_heights = [int(h * self._scale) for h in cell_heights] left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale)) self._search_region = int(self._search_region * self._scale) img_gray = ensure_gray(img) filtered_gray = ensure_gray(filtered) table_grower = TableGrower( img_gray, filtered_gray, cell_widths, # pyright: ignore cell_heights, # pyright: ignore left_top, self._search_region, self._distance_penalty, self._look_distance, self._grow_threshold, self._min_rows, ) def show_grower_progress(wait: bool = False): img_orig = np.copy(img) corners = table_grower.get_all_corners() for y in range(len(corners)): for x in range(len(corners[y])): if corners[y][x] is not None: img_orig = imu.draw_points( img_orig, [corners[y][x]], color=(0, 0, 255), thickness=30, ) edge = table_grower.get_edge_points() for point, score in edge: color = (100, int(clamp(score * 255, 0, 255)), 100) imu.draw_point(img_orig, point, color=color, thickness=20) imu.show(img_orig, wait=wait) if visual: threshold = self._grow_threshold look_distance = self._look_distance # python implementation of rust loops, for visualization purposes # note this is a LOT slower while table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() show_grower_progress(True) original_threshold = threshold loops_without_change = 0 while not table_grower.is_table_complete(): loops_without_change += 1 if loops_without_change > 50: break if table_grower.extrapolate_one(img_gray, filtered_gray) is not None: show_grower_progress() loops_without_change = 0 grown = False while table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() grown = True threshold = min(0.1 + 0.9 * threshold, original_threshold) table_grower.set_threshold(threshold) if not grown: threshold *= 0.9 table_grower.set_threshold(threshold) else: threshold *= 0.9 table_grower.set_threshold(threshold) if table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() loops_without_change = 0 else: table_grower.grow_table(img_gray, filtered_gray) table_grower.smooth_grid() corners = table_grower.get_all_corners() logger.info( f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns" ) # rescale corners back to original size if self._scale != 1.0: for y in range(len(corners)): for x in range(len(corners[y])): if corners[y][x] is not None: corners[y][x] = ( int(corners[y][x][0] / self._scale), # pyright:ignore int(corners[y][x][1] / self._scale), # pyright:ignore ) return TableGrid(corners) # pyright: ignore @log_calls(level=logging.DEBUG, include_return=True) def _build_table_row( self, gray: MatLike, filtered: MatLike, start_point: Point, cell_widths: List[int], row_idx: int, goals_width: int, previous_row_points: Optional[List[Point]] = None, visual: bool = False, ) -> List[Point]: """Build a single row of table points.""" row = [start_point] current = start_point for col_idx, width in enumerate(cell_widths): next_point = self._find_next_column_point( gray, filtered, current, width, goals_width, visual, previous_row_points, col_idx, ) if next_point is None: logger.warning( f"Could not find point for row {row_idx}, col {col_idx + 1}" ) return [] # Return empty list to signal failure row.append(next_point) current = next_point return row def _clamp_point_to_img(self, point: Point, img: MatLike) -> Point: """Clamp a point to be within the image bounds.""" x = max(0, min(point[0], img.shape[1] - 1)) y = max(0, min(point[1], img.shape[0] - 1)) return (x, y) @log_calls(level=logging.DEBUG, include_return=True) def _find_next_column_point( self, gray: MatLike, filtered: MatLike, current: Point, width: int, goals_width: int, visual: bool = False, previous_row_points: Optional[List[Point]] = None, current_col_idx: int = 0, ) -> Optional[Point]: """Find the next point in the current row.""" if previous_row_points is not None and current_col_idx + 1 < len( previous_row_points ): # grow an astar path downwards from the previous row point that is # above and to the right of current # and ensure all points are within image bounds bottom_right = [ self._clamp_point_to_img( ( current[0] + width - goals_width // 2 + x, current[1] + goals_width, ), gray, ) for x in range(goals_width) ] goals = self._astar( gray, previous_row_points[current_col_idx + 1], bottom_right, "down" ) if goals is None: logger.warning( f"A* failed to find path going downwards from previous row's point at idx {current_col_idx + 1}" ) return None else: goals = [ self._clamp_point_to_img( (current[0] + width, current[1] - goals_width // 2 + y), gray ) for y in range(goals_width) ] path = self._astar(gray, current, goals, "right") if path is None: logger.warning( f"A* failed to find path going rightward from {current} to goals" ) return None next_point, _ = self.find_nearest(filtered, path[-1], self._search_region) # show the point and the search region on the image for debugging if visual: self._visualize_path_finding( goals + path, current, next_point, current, path[-1], self._search_region, ) return next_point @log_calls(level=logging.DEBUG, include_return=True) def _find_next_row_start( self, gray: MatLike, filtered: MatLike, top_point: Point, row_idx: int, cell_heights: List[int], goals_width: int, visual: bool = False, ) -> Optional[Point]: """Find the starting point of the next row.""" if row_idx < len(cell_heights): row_height = cell_heights[row_idx] else: row_height = cell_heights[-1] if top_point[1] + row_height >= filtered.shape[0] - 10: # Near bottom return None goals = [ (top_point[0] - goals_width // 2 + x, top_point[1] + row_height) for x in range(goals_width) ] path = self._astar(gray, top_point, goals, "down") if path is None: return None next_point, _ = self.find_nearest( filtered, path[-1], region=self._search_region * 3 // 2 ) if visual: self._visualize_path_finding( path, top_point, next_point, top_point, path[-1], self._search_region ) return next_point def _visualize_grid(self, img: MatLike, points: List[List[Point]]) -> None: """Visualize the detected grid points.""" all_points = [point for row in points for point in row] drawn = imu.draw_points(img, all_points) imu.show(drawn, wait=True) def _visualize_path_finding( self, path: List[Point], current: Point, next_point: Point, previous_row_target: Optional[Point] = None, region_center: Optional[Point] = None, region_size: Optional[int] = None, ) -> None: """Visualize the path finding process for debugging.""" global show_time screen = imu.pop() # if gray, convert to BGR if len(screen.shape) == 2 or screen.shape[2] == 1: debug_img = cv.cvtColor(screen, cv.COLOR_GRAY2BGR) else: debug_img = cast(MatLike, screen) debug_img = imu.draw_points(debug_img, path, color=(200, 200, 0), thickness=2) debug_img = imu.draw_points( debug_img, [current], color=(0, 255, 0), thickness=3 ) debug_img = imu.draw_points( debug_img, [next_point], color=(0, 0, 255), thickness=2 ) # Draw previous row target if available if previous_row_target is not None: debug_img = imu.draw_points( debug_img, [previous_row_target], color=(255, 0, 255), thickness=2 ) # Draw search region if available if region_center is not None and region_size is not None: top_left = ( max(0, region_center[0] - region_size // 2), max(0, region_center[1] - region_size // 2), ) bottom_right = ( min(debug_img.shape[1], region_center[0] + region_size // 2), min(debug_img.shape[0], region_center[1] + region_size // 2), ) cv.rectangle( debug_img, top_left, bottom_right, color=(255, 0, 0), thickness=2, lineType=cv.LINE_AA, ) imu.push(debug_img) show_time += 1 if show_time % 10 != 1: return imu.show(debug_img, title="Next column point", wait=False) # time.sleep(0.003) @log_calls(level=logging.DEBUG, include_return=True) def _astar( self, img: np.ndarray, start: tuple[int, int], goals: list[tuple[int, int]], direction: str, ) -> Optional[List[Point]]: """ Find the best path between the start point and one of the goal points on the image """ if not goals: return None if self._scale != 1.0: img = cv.resize(img, None, fx=self._scale, fy=self._scale) start = (int(start[0] * self._scale), int(start[1] * self._scale)) goals = [(int(g[0] * self._scale), int(g[1] * self._scale)) for g in goals] # calculate bounding box with margin all_points = goals + [start] xs = [p[0] for p in all_points] ys = [p[1] for p in all_points] margin = 30 top_left = (max(0, min(xs) - margin), max(0, min(ys) - margin)) bottom_right = ( min(img.shape[1], max(xs) + margin), min(img.shape[0], max(ys) + margin), ) # check bounds if ( top_left[0] >= bottom_right[0] or top_left[1] >= bottom_right[1] or top_left[0] >= img.shape[1] or top_left[1] >= img.shape[0] ): return None # transform coordinates to cropped image start_local = (start[0] - top_left[0], start[1] - top_left[1]) goals_local = [(g[0] - top_left[0], g[1] - top_left[1]) for g in goals] cropped = img[top_left[1] : bottom_right[1], top_left[0] : bottom_right[0]] if cropped.size == 0: return None path = rust_astar(cropped, start_local, goals_local, direction) if path is None: return None if self._scale != 1.0: path = [(int(p[0] / self._scale), int(p[1] / self._scale)) for p in path] top_left = (int(top_left[0] / self._scale), int(top_left[1] / self._scale)) return [(p[0] + top_left[0], p[1] + top_left[1]) for p in path]
Implements a filters result in high activation where the image has an intersection of a vertical and horizontal rule, useful for finding the bounding boxes of cells.
Also implements the search algorithm that uses the output of this filter to build a tabular structure of corner points (in row major order).
Args
kernel_size
:int
- the size of the cross kernel a larger kernel size often means that more penalty is applied, often leading to more sparse results
cross_width
:int
- the width of one of the edges in the cross filter, should be roughly equal to the width of the rules in the image after morphology is applied
cross_height
:int | None
- useful if the horizontal rules and vertical rules have different sizes
morph_size
:int | None
- the size of the morphology operators that are applied before the cross kernel. 'bridges the gaps' of broken-up lines
sauvola_k
:float
- threshold parameter for sauvola thresholding
sauvola_window
:int
- window_size parameter for sauvola thresholding
scale
:float
- image scale factor to do calculations on (useful for increasing calculation speed mostly)
search_region
:int
- area in which to search for a new max value in
find_nearest
etc. distance_penalty
:float
- how much the point finding algorithm penalizes points that are further in the region [0, 1]
min_rows
:int
- minimum number of rows to find before stopping the table finding algorithm
grow_threshold
:float
- the threshold for accepting a new point when growing the table
look_distance
:int
- how many points away to look when calculating the median slope
Methods
def apply(self, img: cv2.Mat | numpy.ndarray, visual: bool = False) ‑> cv2.Mat | numpy.ndarray
-
Expand source code
def apply(self, img: MatLike, visual: bool = False) -> MatLike: """ Apply the grid detection filter to the input image. Args: img (MatLike): the input image visual (bool): whether to show intermediate steps Returns: MatLike: the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules """ if img is None or img.size == 0: raise ValueError("Input image is empty or None") binary = imu.sauvola(img, k=self._sauvola_k, window_size=self._sauvola_window) if visual: imu.show(binary, title="thresholded") binary = self._apply_morphology(binary) if visual: imu.show(binary, title="dilated") filtered = self._apply_cross_matching(binary) return filtered
Apply the grid detection filter to the input image.
Args
img
:MatLike
- the input image
visual
:bool
- whether to show intermediate steps
Returns
MatLike
- the filtered image, with high values (whiter pixels) at intersections of horizontal and vertical rules
def find_nearest(self,
filtered: cv2.Mat | numpy.ndarray,
point: Tuple[int, int],
region: int | None = None) ‑> Tuple[Tuple[int, int], float]-
Expand source code
@log_calls(level=logging.DEBUG, include_return=True) def find_nearest( self, filtered: MatLike, point: Point, region: Optional[int] = None ) -> Tuple[Point, float]: """ Find the nearest 'corner match' in the image, along with its score [0,1] Args: filtered (MatLike): the filtered image (obtained through `apply`) point (tuple[int, int]): the approximate target point (x, y) region (None | int): alternative value for search region, overwriting the `__init__` parameter `region` """ if filtered is None or filtered.size == 0: raise ValueError("Filtered image is empty or None") region_size = region if region is not None else self._search_region x, y = point # Calculate crop boundaries crop_x = max(0, x - region_size // 2) crop_y = max(0, y - region_size // 2) crop_width = min(region_size, filtered.shape[1] - crop_x) crop_height = min(region_size, filtered.shape[0] - crop_y) # Handle edge cases if crop_width <= 0 or crop_height <= 0: logger.warning(f"Point {point} is outside image bounds") return point, 0.0 cropped = filtered[crop_y : crop_y + crop_height, crop_x : crop_x + crop_width] if cropped.size == 0: return point, 0.0 # Always apply Gaussian weighting by extending crop if needed if cropped.shape[0] == region_size and cropped.shape[1] == region_size: # Perfect size - apply weights directly weights = self._create_gaussian_weights(region_size) weighted = cropped.astype(np.float32) * weights else: # Extend crop to match region_size, apply weights, then restore extended = np.zeros((region_size, region_size), dtype=cropped.dtype) # Calculate offset to center the cropped region in extended array offset_y = (region_size - cropped.shape[0]) // 2 offset_x = (region_size - cropped.shape[1]) // 2 # Place cropped region in center of extended array extended[ offset_y : offset_y + cropped.shape[0], offset_x : offset_x + cropped.shape[1], ] = cropped # Apply Gaussian weights to extended array weights = self._create_gaussian_weights(region_size) weighted_extended = extended.astype(np.float32) * weights # Extract the original region back out weighted = weighted_extended[ offset_y : offset_y + cropped.shape[0], offset_x : offset_x + cropped.shape[1], ] best_idx = np.argmax(weighted) best_y, best_x = np.unravel_index(best_idx, cropped.shape) result_point = ( int(crop_x + best_x), int(crop_y + best_y), ) result_confidence = float(weighted[best_y, best_x]) / 255.0 return result_point, result_confidence
Find the nearest 'corner match' in the image, along with its score [0,1]
Args
filtered
:MatLike
- the filtered image (obtained through
apply
) point
:tuple[int, int]
- the approximate target point (x, y)
region
:None | int
- alternative value for search region,
overwriting the
__init__
parameterregion
def find_table_points(self,
img: cv2.Mat | numpy.ndarray | os.PathLike[str],
left_top: Tuple[int, int],
cell_widths: list[int],
cell_heights: list[int] | int,
visual: bool = False,
window: str = 'taulu',
goals_width: int | None = None) ‑> TableGrid-
Expand source code
def find_table_points( self, img: MatLike | PathLike[str], left_top: Point, cell_widths: list[int], cell_heights: list[int] | int, visual: bool = False, window: str = WINDOW, goals_width: Optional[int] = None, ) -> "TableGrid": """ Parse the image to a `TableGrid` structure that holds all of the intersections between horizontal and vertical rules, starting near the `left_top` point Args: img (MatLike): the input image of a table left_top (tuple[int, int]): the starting point of the algorithm cell_widths (list[int]): the expected widths of the cells (based on a header template) cell_heights (list[int]): the expected height of the rows of data. The last value from this list is used until the image has no more vertical space. visual (bool): whether to show intermediate steps window (str): the name of the OpenCV window to use for visualization goals_width (int | None): the width of the goal region when searching for the next point. If None, defaults to 1.5 * search_region Returns: a TableGrid object """ if goals_width is None: goals_width = self._search_region * 3 // 2 if not cell_widths: raise ValueError("cell_widths must contain at least one value") if not isinstance(img, np.ndarray): img = cv.imread(os.fspath(img)) filtered = self.apply(img, visual) if visual: imu.show(filtered, window=window) if isinstance(cell_heights, int): cell_heights = [cell_heights] left_top, confidence = self.find_nearest( filtered, left_top, int(self._search_region * 3) ) if confidence < 0.1: logger.warning( f"Low confidence for the starting point: {confidence} at {left_top}" ) # resize all parameters according to scale img = cv.resize(img, None, fx=self._scale, fy=self._scale) if visual: imu.push(img) filtered = cv.resize(filtered, None, fx=self._scale, fy=self._scale) cell_widths = [int(w * self._scale) for w in cell_widths] cell_heights = [int(h * self._scale) for h in cell_heights] left_top = (int(left_top[0] * self._scale), int(left_top[1] * self._scale)) self._search_region = int(self._search_region * self._scale) img_gray = ensure_gray(img) filtered_gray = ensure_gray(filtered) table_grower = TableGrower( img_gray, filtered_gray, cell_widths, # pyright: ignore cell_heights, # pyright: ignore left_top, self._search_region, self._distance_penalty, self._look_distance, self._grow_threshold, self._min_rows, ) def show_grower_progress(wait: bool = False): img_orig = np.copy(img) corners = table_grower.get_all_corners() for y in range(len(corners)): for x in range(len(corners[y])): if corners[y][x] is not None: img_orig = imu.draw_points( img_orig, [corners[y][x]], color=(0, 0, 255), thickness=30, ) edge = table_grower.get_edge_points() for point, score in edge: color = (100, int(clamp(score * 255, 0, 255)), 100) imu.draw_point(img_orig, point, color=color, thickness=20) imu.show(img_orig, wait=wait) if visual: threshold = self._grow_threshold look_distance = self._look_distance # python implementation of rust loops, for visualization purposes # note this is a LOT slower while table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() show_grower_progress(True) original_threshold = threshold loops_without_change = 0 while not table_grower.is_table_complete(): loops_without_change += 1 if loops_without_change > 50: break if table_grower.extrapolate_one(img_gray, filtered_gray) is not None: show_grower_progress() loops_without_change = 0 grown = False while table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() grown = True threshold = min(0.1 + 0.9 * threshold, original_threshold) table_grower.set_threshold(threshold) if not grown: threshold *= 0.9 table_grower.set_threshold(threshold) else: threshold *= 0.9 table_grower.set_threshold(threshold) if table_grower.grow_point(img_gray, filtered_gray) is not None: show_grower_progress() loops_without_change = 0 else: table_grower.grow_table(img_gray, filtered_gray) table_grower.smooth_grid() corners = table_grower.get_all_corners() logger.info( f"Table growth complete, found {len(corners)} rows and {len(corners[0])} columns" ) # rescale corners back to original size if self._scale != 1.0: for y in range(len(corners)): for x in range(len(corners[y])): if corners[y][x] is not None: corners[y][x] = ( int(corners[y][x][0] / self._scale), # pyright:ignore int(corners[y][x][1] / self._scale), # pyright:ignore ) return TableGrid(corners) # pyright: ignore
Parse the image to a
TableGrid
structure that holds all of the intersections between horizontal and vertical rules, starting near theleft_top
pointArgs
img
:MatLike
- the input image of a table
left_top
:tuple[int, int]
- the starting point of the algorithm
cell_widths
:list[int]
- the expected widths of the cells (based on a header template)
cell_heights
:list[int]
- the expected height of the rows of data. The last value from this list is used until the image has no more vertical space.
visual
:bool
- whether to show intermediate steps
window
:str
- the name of the OpenCV window to use for visualization
goals_width
:int | None
- the width of the goal region when searching for the next point. If None, defaults to 1.5 * search_region
Returns
a TableGrid object
class HeaderAligner (template: cv2.Mat | numpy.ndarray | os.PathLike[str] | str | None = None,
max_features: int = 25000,
patch_size: int = 31,
match_fraction: float = 0.6,
scale: float = 1.0,
max_dist: float = 1.0,
k: float | None = 0.05)-
Expand source code
class HeaderAligner: """ Calculates a transformation matrix to transform points from header-template-image-space to subject-image-space. """ def __init__( self, template: None | MatLike | PathLike[str] | str = None, max_features: int = 25_000, patch_size: int = 31, match_fraction: float = 0.6, scale: float = 1.0, max_dist: float = 1.00, k: float | None = 0.05, ): """ Args: template (MatLike | str): (path of) template image, with the table template clearly visible max_features (int): maximal number of features that will be extracted by ORB patch_size (int): for ORB feature extractor match_fraction (float): best fraction of matches that are kept scale (float): image scale factor to do calculations on (useful for increasing calculation speed mostly) max_dist (float): maximum distance (relative to image size) of matched features. Increase this value if the warping between image and template needs to be more agressive k (float | None): sauvola thresholding threshold value. If None, no sauvola thresholding is done """ if type(template) is str or type(template) is PathLike: value = cv.imread(fspath(template)) template = value self._k = k if scale > 1.0: raise TauluException( "Scaling up the image for header alignment is useless. Use 0 < scale <= 1.0" ) if scale == 0: raise TauluException("Use 0 < scale <= 1.0") self._scale = scale self._template = self._scale_img(cast(MatLike, template)) self._template_orig: None | MatLike = None self._preprocess_template() self._max_features = max_features self._patch_size = patch_size self._match_fraction = match_fraction self._max_dist = max_dist def _scale_img(self, img: MatLike) -> MatLike: if self._scale == 1.0: return img return cv.resize(img, None, fx=self._scale, fy=self._scale) def _unscale_img(self, img: MatLike) -> MatLike: if self._scale == 1.0: return img return cv.resize(img, None, fx=1 / self._scale, fy=1 / self._scale) def _unscale_homography(self, h: np.ndarray) -> np.ndarray: if self._scale == 1.0: return h scale_matrix = np.diag([self._scale, self._scale, 1.0]) # inv_scale_matrix = np.linalg.inv(scale_matrix) inv_scale_matrix = np.diag([1.0 / self._scale, 1.0 / self._scale, 1.0]) # return inv_scale_matrix @ h @ scale_matrix return inv_scale_matrix @ h @ scale_matrix @property def template(self): """The template image that subject images are aligned to""" return self._template @template.setter def template(self, value: MatLike | str): """Set the template image as a path or an image""" if type(value) is str: value = cv.imread(value) self._template = value # TODO: check if the image has the right properties (dimensions etc.) self._template = cast(MatLike, value) self._preprocess_template() def _preprocess_template(self): self._template_orig = cv.cvtColor(self._template, cv.COLOR_BGR2GRAY) if self._k is not None: self._template = imu.sauvola(self._template, self._k) self._template = cv.bitwise_not(self._template) else: _, _, self._template = cv.split(self._template) def _preprocess_image(self, img: MatLike): if self._template_orig is None: raise TauluException("process the template first") if self._k is not None: img = imu.sauvola(img, self._k) img = cv.bitwise_not(img) else: _, _, img = cv.split(img) return img @log_calls(level=logging.DEBUG, include_return=True) def _find_transform_of_template_on( self, im: MatLike, visual: bool = False, window: str = WINDOW ): im = self._scale_img(im) # Detect ORB features and compute descriptors. orb = cv.ORB_create( self._max_features, # type:ignore patchSize=self._patch_size, ) keypoints_im, descriptors_im = orb.detectAndCompute(im, None) keypoints_tg, descriptors_tg = orb.detectAndCompute(self._template, None) # Match features matcher = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True) matches = matcher.match(descriptors_im, descriptors_tg) # Sort matches by score matches = sorted(matches, key=lambda x: x.distance) # Remove not so good matches numGoodMatches = int(len(matches) * self._match_fraction) matches = matches[:numGoodMatches] if visual: final_img_filtered = cv.drawMatches( im, keypoints_im, self._template, keypoints_tg, matches[:10], None, # type:ignore cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS, ) imu.show(final_img_filtered, title="matches", window=window) # Extract location of good matches points1 = np.zeros((len(matches), 2), dtype=np.float32) points2 = np.zeros((len(matches), 2), dtype=np.float32) for i, match in enumerate(matches): points1[i, :] = keypoints_tg[match.trainIdx].pt points2[i, :] = keypoints_im[match.queryIdx].pt # Prune reference points based upon distance between # key points. This assumes a fairly good alignment to start with # due to the protocol used (location of the sheets) p1 = pd.DataFrame(data=points1) p2 = pd.DataFrame(data=points2) refdist = abs(p1 - p2) mask_x = refdist.loc[:, 0] < (im.shape[0] * self._max_dist) mask_y = refdist.loc[:, 1] < (im.shape[1] * self._max_dist) mask = mask_x & mask_y points1 = points1[mask.to_numpy()] points2 = points2[mask.to_numpy()] # Find homography h, _ = cv.findHomography(points1, points2, cv.RANSAC) return self._unscale_homography(h) def view_alignment(self, img: MatLike, h: NDArray): """ Show the alignment of the template on the given image by transforming it using the supplied transformation matrix `h` and visualising both on different channels Args: img (MatLike): the image on which the template is transformed h (NDArray): the transformation matrix """ im = imu.ensure_gray(img) header = imu.ensure_gray(self._unscale_img(self._template)) height, width = im.shape header_warped = cv.warpPerspective(header, h, (width, height)) merged = np.full((height, width, 3), 255, dtype=np.uint8) merged[..., 1] = im merged[..., 2] = header_warped return imu.show(merged) @log_calls(level=logging.DEBUG, include_return=True) def align( self, img: MatLike | str, visual: bool = False, window: str = WINDOW ) -> NDArray: """ Calculates a homogeneous transformation matrix that maps pixels of the template to the given image """ logger.info("Aligning header with supplied table image") if type(img) is str: img = cv.imread(img) img = cast(MatLike, img) img = self._preprocess_image(img) h = self._find_transform_of_template_on(img, visual, window) if visual: self.view_alignment(img, h) return h def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]: """ Transform the given point (in template-space) using the transformation h (obtained through the `align` method) Args: h (NDArray): transformation matrix of shape (3, 3) point (Iterable[int]): the to-be-transformed point, should conform to (x, y) """ point = np.array([[point[0], point[1], 1]]) # type:ignore transformed = np.dot(h, point.T) # type:ignore transformed /= transformed[2] return int(transformed[0][0]), int(transformed[1][0])
Calculates a transformation matrix to transform points from header-template-image-space to subject-image-space.
Args
template
:MatLike | str
- (path of) template image, with the table template clearly visible
max_features
:int
- maximal number of features that will be extracted by ORB
patch_size
:int
- for ORB feature extractor
match_fraction
:float
- best fraction of matches that are kept
scale
:float
- image scale factor to do calculations on (useful for increasing calculation speed mostly)
max_dist
:float
- maximum distance (relative to image size) of matched features. Increase this value if the warping between image and template needs to be more agressive
k
:float | None
- sauvola thresholding threshold value. If None, no sauvola thresholding is done
Instance variables
prop template
-
Expand source code
@property def template(self): """The template image that subject images are aligned to""" return self._template
The template image that subject images are aligned to
Methods
def align(self,
img: cv2.Mat | numpy.ndarray | str,
visual: bool = False,
window: str = 'taulu') ‑> numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]]-
Expand source code
@log_calls(level=logging.DEBUG, include_return=True) def align( self, img: MatLike | str, visual: bool = False, window: str = WINDOW ) -> NDArray: """ Calculates a homogeneous transformation matrix that maps pixels of the template to the given image """ logger.info("Aligning header with supplied table image") if type(img) is str: img = cv.imread(img) img = cast(MatLike, img) img = self._preprocess_image(img) h = self._find_transform_of_template_on(img, visual, window) if visual: self.view_alignment(img, h) return h
Calculates a homogeneous transformation matrix that maps pixels of the template to the given image
def template_to_img(self,
h: numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]],
point: Iterable[int]) ‑> tuple[int, int]-
Expand source code
def template_to_img(self, h: NDArray, point: Iterable[int]) -> tuple[int, int]: """ Transform the given point (in template-space) using the transformation h (obtained through the `align` method) Args: h (NDArray): transformation matrix of shape (3, 3) point (Iterable[int]): the to-be-transformed point, should conform to (x, y) """ point = np.array([[point[0], point[1], 1]]) # type:ignore transformed = np.dot(h, point.T) # type:ignore transformed /= transformed[2] return int(transformed[0][0]), int(transformed[1][0])
Transform the given point (in template-space) using the transformation h (obtained through the
align
method)Args
h
:NDArray
- transformation matrix of shape (3, 3)
point
:Iterable[int]
- the to-be-transformed point, should conform to (x, y)
def view_alignment(self,
img: cv2.Mat | numpy.ndarray,
h: numpy.ndarray[tuple[int, ...], numpy.dtype[+_ScalarType_co]])-
Expand source code
def view_alignment(self, img: MatLike, h: NDArray): """ Show the alignment of the template on the given image by transforming it using the supplied transformation matrix `h` and visualising both on different channels Args: img (MatLike): the image on which the template is transformed h (NDArray): the transformation matrix """ im = imu.ensure_gray(img) header = imu.ensure_gray(self._unscale_img(self._template)) height, width = im.shape header_warped = cv.warpPerspective(header, h, (width, height)) merged = np.full((height, width, 3), 255, dtype=np.uint8) merged[..., 1] = im merged[..., 2] = header_warped return imu.show(merged)
Show the alignment of the template on the given image by transforming it using the supplied transformation matrix
h
and visualising both on different channelsArgs
img
:MatLike
- the image on which the template is transformed
h
:NDArray
- the transformation matrix
class HeaderTemplate (rules: Iterable[Iterable[int]])
-
Expand source code
class HeaderTemplate(TableIndexer): def __init__(self, rules: Iterable[Iterable[int]]): """ A TableTemplate is a collection of rules of a table. This class implements methods for finding cell positions in a table image, given the template the image adheres to. Args: rules: 2D array of lines, where each line is represented as [x0, y0, x1, y1] """ super().__init__() self._rules = [_Rule(*rule) for rule in rules] self._h_rules = sorted( [rule for rule in self._rules if rule._is_horizontal()], key=lambda r: r._y ) self._v_rules = sorted( [rule for rule in self._rules if rule._is_vertical()], key=lambda r: r._x ) @log_calls(level=logging.DEBUG) def save(self, path: PathLike[str]): """ Save the HeaderTemplate to the given path, as a json """ data = {"rules": [r.to_dict() for r in self._rules]} with open(path, "w") as f: json.dump(data, f) @staticmethod @log_calls(level=logging.DEBUG) def from_saved(path: PathLike[str]) -> "HeaderTemplate": with open(path, "r") as f: data = json.load(f) rules = data["rules"] rules = [[r["x0"], r["y0"], r["x1"], r["y1"]] for r in rules] return HeaderTemplate(rules) @property def cols(self) -> int: return len(self._v_rules) - 1 @property def rows(self) -> int: return len(self._h_rules) - 1 @staticmethod @log_calls(level=logging.DEBUG) def annotate_image( template: MatLike | str, crop: Optional[PathLike[str]] = None, margin: int = 10 ) -> "HeaderTemplate": """ Utility method that allows users to create a template form a template image. The user is asked to click to annotate lines (two clicks per line). Args: template: the image on which to annotate the header lines crop (str | None): if str, crop the template image first, then do the annotation. The cropped image will be stored at the supplied path margin (int): margin to add around the cropping of the header """ if type(template) is str: value = cv.imread(template) template = value template = cast(MatLike, template) if crop is not None: cropped = HeaderTemplate._crop(template, margin) cv.imwrite(os.fspath(crop), cropped) template = cropped start_point = None lines: list[list[int]] = [] anno_template = np.copy(template) def get_point(event, x, y, flags, params): nonlocal lines, start_point, anno_template _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: if start_point is not None: line: list[int] = [start_point[1], start_point[0], x, y] cv.line( # type:ignore anno_template, # type:ignore (start_point[1], start_point[0]), (x, y), (0, 255, 0), 2, cv.LINE_AA, ) cv.imshow(constants.WINDOW, anno_template) # type:ignore lines.append(line) start_point = None else: start_point = (y, x) elif event == cv.EVENT_RBUTTONDOWN: start_point = None # remove the last annotation lines = lines[:-1] anno_template = np.copy(anno_template) for line in lines: cv.line( template, (line[0], line[1]), (line[2], line[3]), (0, 255, 0), 2, cv.LINE_AA, ) cv.imshow(constants.WINDOW, template) print(ANNO_HELP) imu.show(anno_template, get_point, title="annotate the header") return HeaderTemplate(lines) @staticmethod @log_calls(level=logging.DEBUG, include_return=True) def _crop(template: MatLike, margin: int = 10) -> MatLike: """ Crop the image to contain only the annotations, such that it can be used as the header image in the taulu workflow. """ points = [] anno_template = np.copy(template) def get_point(event, x, y, flags, params): nonlocal points, anno_template _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: point = (x, y) cv.circle( # type:ignore anno_template, # type:ignore (x, y), 4, (0, 255, 0), 2, ) cv.imshow(constants.WINDOW, anno_template) # type:ignore points.append(point) elif event == cv.EVENT_RBUTTONDOWN: # remove the last annotation points = points[:-1] anno_template = np.copy(anno_template) for p in points: cv.circle( anno_template, p, 4, (0, 255, 0), 2, ) cv.imshow(constants.WINDOW, anno_template) print(CROP_HELP) imu.show(anno_template, get_point, title="crop the header") assert len(points) == 4, ( "you need to annotate the four corners of the table in order to crop it" ) # crop the image to contain all of the points (just crop rectangularly, x, y, w, h) # Convert points to numpy array points_np = np.array(points) # Find bounding box x_min = np.min(points_np[:, 0]) y_min = np.min(points_np[:, 1]) x_max = np.max(points_np[:, 0]) y_max = np.max(points_np[:, 1]) # Compute width and height width = x_max - x_min height = y_max - y_min # Ensure integers and within image boundaries x_min = max(int(x_min), 0) y_min = max(int(y_min), 0) width = int(width) height = int(height) # Crop the image cropped = template[ y_min - margin : y_min + height + margin, x_min - margin : x_min + width + margin, ] return cropped @staticmethod def from_vgg_annotation(annotation: str) -> "HeaderTemplate": """ Create a TableTemplate from annotations made in [vgg](https://annotate.officialstatistics.org/), using the polylines tool. Args: annotation (str): the path of the annotation csv file """ rules = [] with open(annotation, "r") as csvfile: reader = csv.DictReader(csvfile) for row in reader: shape_attributes = json.loads(row["region_shape_attributes"]) if shape_attributes["name"] == "polyline": x_points = shape_attributes["all_points_x"] y_points = shape_attributes["all_points_y"] if len(x_points) == 2 and len(y_points) == 2: rules.append( [x_points[0], y_points[0], x_points[1], y_points[1]] ) return HeaderTemplate(rules) def cell_width(self, i: int) -> int: self._check_col_idx(i) return int(self._v_rules[i + 1]._x - self._v_rules[i]._x) def cell_widths(self, start: int = 0) -> list[int]: return [self.cell_width(i) for i in range(start, self.cols)] def cell_height(self, header_factor: float = 0.8) -> int: return int((self._h_rules[1]._y - self._h_rules[0]._y) * header_factor) def cell_heights(self, header_factors: list[float] | float) -> list[int]: if isinstance(header_factors, float): header_factors = [header_factors] header_factors = cast(list, header_factors) return [ int((self._h_rules[1]._y - self._h_rules[0]._y) * f) for f in header_factors ] def intersection(self, index: tuple[int, int]) -> tuple[float, float]: """ Returns the interaction of the index[0]th horizontal rule and the index[1]th vertical rule """ ints = self._h_rules[index[0]].intersection(self._v_rules[index[1]]) assert ints is not None return ints def cell(self, point: tuple[float, float]) -> tuple[int, int]: """ Get the cell index (row, col) that corresponds with the point (x, y) in the template image Args: point (tuple[float, float]): the coordinates in the template image Returns: tuple[int, int]: (row, col) """ x, y = point row = -1 col = -1 for i in range(self.rows): y0 = self._h_rules[i]._y_at_x(x) y1 = self._h_rules[i + 1]._y_at_x(x) if min(y0, y1) <= y <= max(y0, y1): row = i break for i in range(self.cols): x0 = self._v_rules[i]._x_at_y(y) x1 = self._v_rules[i + 1]._x_at_y(y) if min(x0, x1) <= x <= max(x0, x1): col = i break if row == -1 or col == -1: return (-1, -1) return (row, col) def cell_polygon( self, cell: tuple[int, int] ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]: """ Return points (x,y) that make up a polygon around the requested cell (top left, top right, bottom right, bottom left) """ row, col = cell self._check_col_idx(col) self._check_row_idx(row) top_rule = self._h_rules[row] bottom_rule = self._h_rules[row + 1] left_rule = self._v_rules[col] right_rule = self._v_rules[col + 1] # Calculate corner points using intersections top_left = top_rule.intersection(left_rule) top_right = top_rule.intersection(right_rule) bottom_left = bottom_rule.intersection(left_rule) bottom_right = bottom_rule.intersection(right_rule) if not all( [ point is not None for point in [top_left, top_right, bottom_left, bottom_right] ] ): raise TauluException("the lines around this cell do not intersect") return top_left, top_right, bottom_right, bottom_left # type:ignore def region( self, start: tuple[int, int], end: tuple[int, int] ) -> tuple[Point, Point, Point, Point]: self._check_row_idx(start[0]) self._check_row_idx(end[0]) self._check_col_idx(start[1]) self._check_col_idx(end[1]) # the rules that surround this row top_rule = self._h_rules[start[0]] bottom_rule = self._h_rules[end[0] + 1] left_rule = self._v_rules[start[1]] right_rule = self._v_rules[end[1] + 1] # four points that will be the bounding polygon of the result, # which needs to be rectified top_left = top_rule.intersection(left_rule) top_right = top_rule.intersection(right_rule) bottom_left = bottom_rule.intersection(left_rule) bottom_right = bottom_rule.intersection(right_rule) if ( top_left is None or top_right is None or bottom_left is None or bottom_right is None ): raise TauluException("the lines around this row do not intersect properly") def to_point(pnt) -> Point: return (int(pnt[0]), int(pnt[1])) return ( to_point(top_left), to_point(top_right), to_point(bottom_right), to_point(bottom_left), ) def text_regions( self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -20 ) -> list[tuple[tuple[int, int], tuple[int, int]]]: raise TauluException("text_regions should not be called on a HeaderTemplate")
Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index.
A TableTemplate is a collection of rules of a table. This class implements methods for finding cell positions in a table image, given the template the image adheres to.
Args
rules
- 2D array of lines, where each line is represented as [x0, y0, x1, y1]
Ancestors
- TableIndexer
- abc.ABC
Static methods
def annotate_image(template: cv2.Mat | numpy.ndarray | str,
crop: os.PathLike[str] | None = None,
margin: int = 10) ‑> HeaderTemplate-
Expand source code
@staticmethod @log_calls(level=logging.DEBUG) def annotate_image( template: MatLike | str, crop: Optional[PathLike[str]] = None, margin: int = 10 ) -> "HeaderTemplate": """ Utility method that allows users to create a template form a template image. The user is asked to click to annotate lines (two clicks per line). Args: template: the image on which to annotate the header lines crop (str | None): if str, crop the template image first, then do the annotation. The cropped image will be stored at the supplied path margin (int): margin to add around the cropping of the header """ if type(template) is str: value = cv.imread(template) template = value template = cast(MatLike, template) if crop is not None: cropped = HeaderTemplate._crop(template, margin) cv.imwrite(os.fspath(crop), cropped) template = cropped start_point = None lines: list[list[int]] = [] anno_template = np.copy(template) def get_point(event, x, y, flags, params): nonlocal lines, start_point, anno_template _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: if start_point is not None: line: list[int] = [start_point[1], start_point[0], x, y] cv.line( # type:ignore anno_template, # type:ignore (start_point[1], start_point[0]), (x, y), (0, 255, 0), 2, cv.LINE_AA, ) cv.imshow(constants.WINDOW, anno_template) # type:ignore lines.append(line) start_point = None else: start_point = (y, x) elif event == cv.EVENT_RBUTTONDOWN: start_point = None # remove the last annotation lines = lines[:-1] anno_template = np.copy(anno_template) for line in lines: cv.line( template, (line[0], line[1]), (line[2], line[3]), (0, 255, 0), 2, cv.LINE_AA, ) cv.imshow(constants.WINDOW, template) print(ANNO_HELP) imu.show(anno_template, get_point, title="annotate the header") return HeaderTemplate(lines)
Utility method that allows users to create a template form a template image.
The user is asked to click to annotate lines (two clicks per line).
Args
template
- the image on which to annotate the header lines
crop
:str | None
- if str, crop the template image first, then do the annotation. The cropped image will be stored at the supplied path
margin
:int
- margin to add around the cropping of the header
def from_saved(path: os.PathLike[str]) ‑> HeaderTemplate
-
Expand source code
@staticmethod @log_calls(level=logging.DEBUG) def from_saved(path: PathLike[str]) -> "HeaderTemplate": with open(path, "r") as f: data = json.load(f) rules = data["rules"] rules = [[r["x0"], r["y0"], r["x1"], r["y1"]] for r in rules] return HeaderTemplate(rules)
def from_vgg_annotation(annotation: str) ‑> HeaderTemplate
-
Expand source code
@staticmethod def from_vgg_annotation(annotation: str) -> "HeaderTemplate": """ Create a TableTemplate from annotations made in [vgg](https://annotate.officialstatistics.org/), using the polylines tool. Args: annotation (str): the path of the annotation csv file """ rules = [] with open(annotation, "r") as csvfile: reader = csv.DictReader(csvfile) for row in reader: shape_attributes = json.loads(row["region_shape_attributes"]) if shape_attributes["name"] == "polyline": x_points = shape_attributes["all_points_x"] y_points = shape_attributes["all_points_y"] if len(x_points) == 2 and len(y_points) == 2: rules.append( [x_points[0], y_points[0], x_points[1], y_points[1]] ) return HeaderTemplate(rules)
Create a TableTemplate from annotations made in vgg, using the polylines tool.
Args
annotation
:str
- the path of the annotation csv file
Instance variables
prop cols : int
-
Expand source code
@property def cols(self) -> int: return len(self._v_rules) - 1
prop rows : int
-
Expand source code
@property def rows(self) -> int: return len(self._h_rules) - 1
Methods
def cell(self, point: tuple[float, float]) ‑> tuple[int, int]
-
Expand source code
def cell(self, point: tuple[float, float]) -> tuple[int, int]: """ Get the cell index (row, col) that corresponds with the point (x, y) in the template image Args: point (tuple[float, float]): the coordinates in the template image Returns: tuple[int, int]: (row, col) """ x, y = point row = -1 col = -1 for i in range(self.rows): y0 = self._h_rules[i]._y_at_x(x) y1 = self._h_rules[i + 1]._y_at_x(x) if min(y0, y1) <= y <= max(y0, y1): row = i break for i in range(self.cols): x0 = self._v_rules[i]._x_at_y(y) x1 = self._v_rules[i + 1]._x_at_y(y) if min(x0, x1) <= x <= max(x0, x1): col = i break if row == -1 or col == -1: return (-1, -1) return (row, col)
Get the cell index (row, col) that corresponds with the point (x, y) in the template image
Args
point
:tuple[float, float]
- the coordinates in the template image
Returns
tuple[int, int]
- (row, col)
def cell_height(self, header_factor: float = 0.8) ‑> int
-
Expand source code
def cell_height(self, header_factor: float = 0.8) -> int: return int((self._h_rules[1]._y - self._h_rules[0]._y) * header_factor)
def cell_heights(self, header_factors: list[float] | float) ‑> list[int]
-
Expand source code
def cell_heights(self, header_factors: list[float] | float) -> list[int]: if isinstance(header_factors, float): header_factors = [header_factors] header_factors = cast(list, header_factors) return [ int((self._h_rules[1]._y - self._h_rules[0]._y) * f) for f in header_factors ]
def cell_polygon(self, cell: tuple[int, int]) ‑> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]
-
Expand source code
def cell_polygon( self, cell: tuple[int, int] ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]: """ Return points (x,y) that make up a polygon around the requested cell (top left, top right, bottom right, bottom left) """ row, col = cell self._check_col_idx(col) self._check_row_idx(row) top_rule = self._h_rules[row] bottom_rule = self._h_rules[row + 1] left_rule = self._v_rules[col] right_rule = self._v_rules[col + 1] # Calculate corner points using intersections top_left = top_rule.intersection(left_rule) top_right = top_rule.intersection(right_rule) bottom_left = bottom_rule.intersection(left_rule) bottom_right = bottom_rule.intersection(right_rule) if not all( [ point is not None for point in [top_left, top_right, bottom_left, bottom_right] ] ): raise TauluException("the lines around this cell do not intersect") return top_left, top_right, bottom_right, bottom_left # type:ignore
Return points (x,y) that make up a polygon around the requested cell (top left, top right, bottom right, bottom left)
def cell_width(self, i: int) ‑> int
-
Expand source code
def cell_width(self, i: int) -> int: self._check_col_idx(i) return int(self._v_rules[i + 1]._x - self._v_rules[i]._x)
def cell_widths(self, start: int = 0) ‑> list[int]
-
Expand source code
def cell_widths(self, start: int = 0) -> list[int]: return [self.cell_width(i) for i in range(start, self.cols)]
def intersection(self, index: tuple[int, int]) ‑> tuple[float, float]
-
Expand source code
def intersection(self, index: tuple[int, int]) -> tuple[float, float]: """ Returns the interaction of the index[0]th horizontal rule and the index[1]th vertical rule """ ints = self._h_rules[index[0]].intersection(self._v_rules[index[1]]) assert ints is not None return ints
Returns the interaction of the index[0]th horizontal rule and the index[1]th vertical rule
def save(self, path: os.PathLike[str])
-
Expand source code
@log_calls(level=logging.DEBUG) def save(self, path: PathLike[str]): """ Save the HeaderTemplate to the given path, as a json """ data = {"rules": [r.to_dict() for r in self._rules]} with open(path, "w") as f: json.dump(data, f)
Save the HeaderTemplate to the given path, as a json
Inherited members
class Split (left: ~T | None = None, right: ~T | None = None)
-
Expand source code
class Split(Generic[T]): """Wrapper for data that has both a left and a right variant""" def __init__(self, left: T | None = None, right: T | None = None): self._left = left self._right = right @property def left(self) -> T: assert self._left is not None return self._left @left.setter def left(self, value: T): self._left = value @property def right(self) -> T: assert self._right is not None return self._right @right.setter def right(self, value: T): self._right = value def append(self, value: T): if self._left is None: self._left = value else: self._right = value def __repr__(self) -> str: return f"left: {self._left}, right: {self._right}" def __iter__(self): assert self._left is not None assert self._right is not None return iter((self._left, self._right)) def __getitem__(self, index: bool) -> T: assert self._left is not None assert self._right is not None if int(index) == 0: return self._left else: return self._right def apply( self, funcs: "Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V]", *args, **kwargs, ) -> "Split[V]": if not isinstance(funcs, Split): funcs = Split(funcs, funcs) def get_arg(side: str, arg): if isinstance(arg, Split): return getattr(arg, side) return arg def call(side: str): func = getattr(funcs, side) target = getattr(self, side) side_args = [get_arg(side, arg) for arg in args] side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()} return func(target, *side_args, **side_kwargs) return Split(call("left"), call("right")) def __getattr__(self, attr_name: str): if attr_name in self.__dict__: return getattr(self, attr_name) def wrapper(*args, **kwargs): return self.apply( Split( getattr(self.left.__class__, attr_name), getattr(self.right.__class__, attr_name), ), *args, **kwargs, ) return wrapper
Wrapper for data that has both a left and a right variant
Ancestors
- typing.Generic
Instance variables
prop left : ~T
-
Expand source code
@property def left(self) -> T: assert self._left is not None return self._left
prop right : ~T
-
Expand source code
@property def right(self) -> T: assert self._right is not None return self._right
Methods
def append(self, value: ~T)
-
Expand source code
def append(self, value: T): if self._left is None: self._left = value else: self._right = value
def apply(self,
funcs: Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V],
*args,
**kwargs) ‑> Split[V]-
Expand source code
def apply( self, funcs: "Split[Callable[[T, *Any], V]] | Callable[[T, *Any], V]", *args, **kwargs, ) -> "Split[V]": if not isinstance(funcs, Split): funcs = Split(funcs, funcs) def get_arg(side: str, arg): if isinstance(arg, Split): return getattr(arg, side) return arg def call(side: str): func = getattr(funcs, side) target = getattr(self, side) side_args = [get_arg(side, arg) for arg in args] side_kwargs = {k: get_arg(side, v) for k, v in kwargs.items()} return func(target, *side_args, **side_kwargs) return Split(call("left"), call("right"))
class TableGrid (points: list[list[typing.Tuple[int, int]]], right_offset: int | None = None)
-
Expand source code
class TableGrid(TableIndexer): """ A data class that allows segmenting the image into cells """ _right_offset: int | None = None def __init__(self, points: list[list[Point]], right_offset: Optional[int] = None): """ Args: points: a 2D list of intersections between hor. and vert. rules """ self._points = points self._right_offset = right_offset @property def points(self) -> list[list[Point]]: return self._points def row(self, i: int) -> list[Point]: assert 0 <= i and i < len(self._points) return self._points[i] @property def cols(self) -> int: if self._right_offset is not None: return len(self.row(0)) - 2 else: return len(self.row(0)) - 1 @property def rows(self) -> int: return len(self._points) - 1 @staticmethod def from_split( split_grids: Split["TableGrid"], offsets: Split[Point] ) -> "TableGrid": """ Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image Args: split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened """ def offset_points(points, offset): return [ [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points ] split_points = split_grids.apply( lambda grid, offset: offset_points(grid.points, offset), offsets ) points = [] rows = min(split_grids.left.rows, split_grids.right.rows) for row in range(rows + 1): row_points = [] row_points.extend(split_points.left[row]) row_points.extend(split_points.right[row]) points.append(row_points) table_grid = TableGrid(points, split_grids.left.cols) return table_grid def save(self, path: str | Path): with open(path, "w") as f: json.dump({"points": self.points, "right_offset": self._right_offset}, f) @staticmethod def from_saved(path: str | Path) -> "TableGrid": with open(path, "r") as f: points = json.load(f) right_offset = points.get("right_offset", None) points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]] return TableGrid(points, right_offset) def add_left_col(self, width: int): for row in self._points: first = row[0] new_first = (first[0] - width, first[1]) row.insert(0, new_first) def add_top_row(self, height: int): new_row = [] for point in self._points[0]: new_row.append((point[0], point[1] - height)) self.points.insert(0, new_row) def _surrounds(self, rect: list[Point], point: tuple[float, float]) -> bool: """point: x, y""" lt, rt, rb, lb = rect x, y = point top = _Rule(*lt, *rt) if top._y_at_x(x) > y: return False right = _Rule(*rt, *rb) if right._x_at_y(y) < x: return False bottom = _Rule(*lb, *rb) if bottom._y_at_x(x) < y: return False left = _Rule(*lb, *lt) if left._x_at_y(y) > x: return False return True def cell(self, point: tuple[float, float]) -> tuple[int, int]: for r in range(len(self._points) - 1): offset = 0 for c in range(len(self.row(0)) - 1): if self._right_offset is not None and c == self._right_offset: offset = -1 continue if self._surrounds( [ self._points[r][c], self._points[r][c + 1], self._points[r + 1][c + 1], self._points[r + 1][c], ], point, ): return (r, c + offset) return (-1, -1) def cell_polygon(self, cell: tuple[int, int]) -> tuple[Point, Point, Point, Point]: r, c = cell self._check_row_idx(r) self._check_col_idx(c) if self._right_offset is not None and c >= self._right_offset: c = c + 1 return ( self._points[r][c], self._points[r][c + 1], self._points[r + 1][c + 1], self._points[r + 1][c], ) def region( self, start: tuple[int, int], end: tuple[int, int] ) -> tuple[Point, Point, Point, Point]: r0, c0 = start r1, c1 = end self._check_row_idx(r0) self._check_row_idx(r1) self._check_col_idx(c0) self._check_col_idx(c1) if self._right_offset is not None and c0 >= self._right_offset: c0 = c0 + 1 if self._right_offset is not None and c1 >= self._right_offset: c1 = c1 + 1 lt = self._points[r0][c0] rt = self._points[r0][c1 + 1] rb = self._points[r1 + 1][c1 + 1] lb = self._points[r1 + 1][c0] return lt, rt, rb, lb def visualize_points(self, img: MatLike): """ Draw the detected table points on the image for visual verification """ import colorsys def clr(index, total_steps): hue = index / total_steps # Normalized hue between 0 and 1 r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0) return int(r * 255), int(g * 255), int(b * 255) for i, row in enumerate(self._points): for p in row: cv.circle(img, p, 4, clr(i, len(self._points)), -1) imu.show(img) def text_regions( self, img: MatLike, row: int, margin_x: int = 10, margin_y: int = -3 ) -> list[tuple[tuple[int, int], tuple[int, int]]]: def vertical_rule_crop(row: int, col: int): self._check_col_idx(col) self._check_row_idx(row) if self._right_offset is not None and col >= self._right_offset: col = col + 1 top = self._points[row][col] bottom = self._points[row + 1][col] left = int(min(top[0], bottom[0])) right = int(max(top[0], bottom[0])) return img[ int(top[1]) - margin_y : int(bottom[1]) + margin_y, left - margin_x : right + margin_x, ] result = [] start = None for col in range(self.cols): crop = vertical_rule_crop(row, col) text_over_score = imu.text_presence_score(crop) text_over = text_over_score > -0.10 if not text_over: if start is not None: result.append(((row, start), (row, col - 1))) start = col if start is not None: result.append(((row, start), (row, self.cols - 1))) return result def anneal( self, img: MatLike, look_distance_main: int = 3, look_distance_alt: int = 3 ): # how far to look in the main direction of the line # that is currently being examined LOOK_MAIN = look_distance_main # how far to look in the perpendicular direction of the line # that is currently being examined LOOK_ALT = look_distance_alt def _left_at(col: int, offset: int = LOOK_ALT) -> int: if self._right_offset is not None and col > self._right_offset: return int(clamp(col - offset, self._right_offset + 1, self.cols + 1)) else: return int(clamp(col - offset, 0, self.cols + 1)) def _right_at(col: int, offset: int = LOOK_ALT) -> int: if self._right_offset is not None and col <= self._right_offset: return int(clamp(col + offset, 0, self._right_offset)) else: return int(clamp(col + offset, 0, self.cols + 1)) def _median_slope(index: Point) -> Optional[float]: (r, c) = index left = _left_at(c) right = _right_at(c) if left == right: return None lines = [] for row in range(r - LOOK_MAIN, r + LOOK_MAIN): if row < 0 or row == r or row >= len(self.points): continue left_point = self.points[row][int(left)] right_point = self.points[row][int(right)] lines.append((left_point, right_point)) return _core_median_slope(lines) new_points = [] for row in self.points: new_points.append(row.copy()) for row in range(len(self.points)): for col in range(len(self.points[0])): slope = _median_slope((row, col)) if slope is None: continue left = _left_at(col, 1) left_point = self.points[row][int(left)] right = _right_at(col, 1) right_point = self.points[row][int(right)] # img_ = np.copy(img) # # draw a line through the left point with that slope # cv.line( # img_, # (int(left_point[0]), int(left_point[1])), # ( # int(right_point[0]), # int(slope * (right_point[0] - left_point[0]) + left_point[1]), # ), # (0, 255, 0), # 3, # cv.LINE_AA, # ) # imu.show(img_) # extrapolate left point to this points x coordinate new_y = ( slope * (self.points[row][col][0] - left_point[0]) + left_point[1] ) new_y = ( new_y / 2 + ( slope * (right_point[0] - self.points[row][col][0]) + right_point[1] ) / 2 ) movement = new_y - self.points[row][col][1] new_points[row][col] = ( self.points[row][col][0], self.points[row][col][1] + movement * 0.8, ) self._points = new_points
A data class that allows segmenting the image into cells
Args
points
- a 2D list of intersections between hor. and vert. rules
Ancestors
- TableIndexer
- abc.ABC
Static methods
def from_saved(path: str | pathlib.Path) ‑> TableGrid
-
Expand source code
@staticmethod def from_saved(path: str | Path) -> "TableGrid": with open(path, "r") as f: points = json.load(f) right_offset = points.get("right_offset", None) points = [[(p[0], p[1]) for p in pointes] for pointes in points["points"]] return TableGrid(points, right_offset)
def from_split(split_grids: Split[ForwardRef('TableGrid')],
offsets: Split[typing.Tuple[int, int]]) ‑> TableGrid-
Expand source code
@staticmethod def from_split( split_grids: Split["TableGrid"], offsets: Split[Point] ) -> "TableGrid": """ Convert two ``TableGrid`` objects into one, that is able to segment the original (non-cropped) image Args: split_grids (Split[TableGrid]): a Split of TableGrid objects of the left and right part of the table offsets (Split[tuple[int, int]]): a Split of the offsets in the image where the crop happened """ def offset_points(points, offset): return [ [(p[0] + offset[0], p[1] + offset[1]) for p in row] for row in points ] split_points = split_grids.apply( lambda grid, offset: offset_points(grid.points, offset), offsets ) points = [] rows = min(split_grids.left.rows, split_grids.right.rows) for row in range(rows + 1): row_points = [] row_points.extend(split_points.left[row]) row_points.extend(split_points.right[row]) points.append(row_points) table_grid = TableGrid(points, split_grids.left.cols) return table_grid
Instance variables
prop cols : int
-
Expand source code
@property def cols(self) -> int: if self._right_offset is not None: return len(self.row(0)) - 2 else: return len(self.row(0)) - 1
prop points : list[list[typing.Tuple[int, int]]]
-
Expand source code
@property def points(self) -> list[list[Point]]: return self._points
prop rows : int
-
Expand source code
@property def rows(self) -> int: return len(self._points) - 1
Methods
def add_left_col(self, width: int)
-
Expand source code
def add_left_col(self, width: int): for row in self._points: first = row[0] new_first = (first[0] - width, first[1]) row.insert(0, new_first)
def add_top_row(self, height: int)
-
Expand source code
def add_top_row(self, height: int): new_row = [] for point in self._points[0]: new_row.append((point[0], point[1] - height)) self.points.insert(0, new_row)
def anneal(self,
img: cv2.Mat | numpy.ndarray,
look_distance_main: int = 3,
look_distance_alt: int = 3)-
Expand source code
def anneal( self, img: MatLike, look_distance_main: int = 3, look_distance_alt: int = 3 ): # how far to look in the main direction of the line # that is currently being examined LOOK_MAIN = look_distance_main # how far to look in the perpendicular direction of the line # that is currently being examined LOOK_ALT = look_distance_alt def _left_at(col: int, offset: int = LOOK_ALT) -> int: if self._right_offset is not None and col > self._right_offset: return int(clamp(col - offset, self._right_offset + 1, self.cols + 1)) else: return int(clamp(col - offset, 0, self.cols + 1)) def _right_at(col: int, offset: int = LOOK_ALT) -> int: if self._right_offset is not None and col <= self._right_offset: return int(clamp(col + offset, 0, self._right_offset)) else: return int(clamp(col + offset, 0, self.cols + 1)) def _median_slope(index: Point) -> Optional[float]: (r, c) = index left = _left_at(c) right = _right_at(c) if left == right: return None lines = [] for row in range(r - LOOK_MAIN, r + LOOK_MAIN): if row < 0 or row == r or row >= len(self.points): continue left_point = self.points[row][int(left)] right_point = self.points[row][int(right)] lines.append((left_point, right_point)) return _core_median_slope(lines) new_points = [] for row in self.points: new_points.append(row.copy()) for row in range(len(self.points)): for col in range(len(self.points[0])): slope = _median_slope((row, col)) if slope is None: continue left = _left_at(col, 1) left_point = self.points[row][int(left)] right = _right_at(col, 1) right_point = self.points[row][int(right)] # img_ = np.copy(img) # # draw a line through the left point with that slope # cv.line( # img_, # (int(left_point[0]), int(left_point[1])), # ( # int(right_point[0]), # int(slope * (right_point[0] - left_point[0]) + left_point[1]), # ), # (0, 255, 0), # 3, # cv.LINE_AA, # ) # imu.show(img_) # extrapolate left point to this points x coordinate new_y = ( slope * (self.points[row][col][0] - left_point[0]) + left_point[1] ) new_y = ( new_y / 2 + ( slope * (right_point[0] - self.points[row][col][0]) + right_point[1] ) / 2 ) movement = new_y - self.points[row][col][1] new_points[row][col] = ( self.points[row][col][0], self.points[row][col][1] + movement * 0.8, ) self._points = new_points
def row(self, i: int) ‑> list[typing.Tuple[int, int]]
-
Expand source code
def row(self, i: int) -> list[Point]: assert 0 <= i and i < len(self._points) return self._points[i]
def save(self, path: str | pathlib.Path)
-
Expand source code
def save(self, path: str | Path): with open(path, "w") as f: json.dump({"points": self.points, "right_offset": self._right_offset}, f)
def visualize_points(self, img: cv2.Mat | numpy.ndarray)
-
Expand source code
def visualize_points(self, img: MatLike): """ Draw the detected table points on the image for visual verification """ import colorsys def clr(index, total_steps): hue = index / total_steps # Normalized hue between 0 and 1 r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0) return int(r * 255), int(g * 255), int(b * 255) for i, row in enumerate(self._points): for p in row: cv.circle(img, p, 4, clr(i, len(self._points)), -1) imu.show(img)
Draw the detected table points on the image for visual verification
Inherited members
class TableIndexer
-
Expand source code
class TableIndexer(ABC): """ Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index. """ def __init__(self): self._col_offset = 0 @property def col_offset(self) -> int: return self._col_offset @col_offset.setter def col_offset(self, value: int): assert value >= 0 self._col_offset = value @property @abstractmethod def cols(self) -> int: pass @property @abstractmethod def rows(self) -> int: pass def cells(self) -> Generator[tuple[int, int], None, None]: for row in range(self.rows): for col in range(self.cols): yield (row, col) def _check_row_idx(self, row: int): if row < 0: raise TauluException("row number needs to be positive or zero") if row >= self.rows: raise TauluException(f"row number too high: {row} >= {self.rows}") def _check_col_idx(self, col: int): if col < 0: raise TauluException("col number needs to be positive or zero") if col >= self.cols: raise TauluException(f"col number too high: {col} >= {self.cols}") @abstractmethod def cell(self, point: tuple[float, float]) -> tuple[int, int]: """ Returns the coordinate (row, col) of the cell that contains the given position Args: point (tuple[float, float]): a location in the input image Returns: tuple[int, int]: the cell index (row, col) that contains the given point """ pass @abstractmethod def cell_polygon( self, cell: tuple[int, int] ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]: """returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position""" pass def _highlight_cell( self, image: MatLike, cell: tuple[int, int], color: tuple[int, int, int] = (0, 0, 255), thickness: int = 2, ): polygon = self.cell_polygon(cell) points = np.int32(list(polygon)) # type:ignore cv.polylines(image, [points], True, color, thickness, cv.LINE_AA) # type:ignore cv.putText( image, str(cell), (int(polygon[3][0] + 10), int(polygon[3][1] - 10)), cv.FONT_HERSHEY_PLAIN, 2.0, (255, 255, 255), 2, ) def highlight_all_cells( self, image: MatLike, color: tuple[int, int, int] = (0, 0, 255), thickness: int = 1, ) -> MatLike: img = np.copy(image) for cell in self.cells(): self._highlight_cell(img, cell, color, thickness) return img def select_one_cell( self, image: MatLike, window: str = WINDOW, color: tuple[int, int, int] = (255, 0, 0), thickness: int = 2, ) -> tuple[int, int] | None: clicked = None def click_event(event, x, y, flags, params): nonlocal clicked img = np.copy(image) _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: cell = self.cell((x, y)) if cell[0] >= 0: clicked = cell else: return self._highlight_cell(img, cell, color, thickness) cv.imshow(window, img) imu.show(image, click_event=click_event, title="select one cell", window=window) return clicked def show_cells( self, image: MatLike | os.PathLike[str] | str, window: str = WINDOW ) -> list[tuple[int, int]]: if not isinstance(image, np.ndarray): image = cv.imread(os.fspath(image)) img = np.copy(image) cells = [] def click_event(event, x, y, flags, params): _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: cell = self.cell((x, y)) if cell[0] >= 0: cells.append(cell) else: return self._highlight_cell(img, cell) cv.imshow(window, img) imu.show( img, click_event=click_event, title="click to highlight cells", window=window, ) return cells @abstractmethod def region( self, start: tuple[int, int], end: tuple[int, int], ) -> tuple[Point, Point, Point, Point]: """ Get the bounding box for the rectangular region that goes from start to end Returns: 4 points: lt, rt, rb, lb, in format (x, y) """ pass def crop_region( self, image: MatLike, start: tuple[int, int], end: tuple[int, int], margin: int = 0, margin_top: int | None = None, margin_bottom: int | None = None, margin_left: int | None = None, margin_right: int | None = None, margin_y: int | None = None, margin_x: int | None = None, ) -> MatLike: """Crop the input image to a rectangular region with the start and end cells as extremes""" region = self.region(start, end) lt, rt, rb, lb = _apply_margin( *region, margin=margin, margin_top=margin_top, margin_bottom=margin_bottom, margin_left=margin_left, margin_right=margin_right, margin_y=margin_y, margin_x=margin_x, ) # apply margins according to priority: # margin_top > margin_y > margin (etc.) w = (rt[0] - lt[0] + rb[0] - lb[0]) / 2 h = (rb[1] - rt[1] + lb[1] - lt[1]) / 2 # crop by doing a perspective transform to the desired quad src_pts = np.array([lt, rt, rb, lb], dtype="float32") dst_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype="float32") M = cv.getPerspectiveTransform(src_pts, dst_pts) warped = cv.warpPerspective(image, M, (int(w), int(h))) # type:ignore return warped @abstractmethod def text_regions( self, img: MatLike, row: int, margin_x: int = 0, margin_y: int = 0 ) -> list[tuple[tuple[int, int], tuple[int, int]]]: """ Split the row into regions of continuous text Returns list[tuple[int, int]]: a list of spans (start col, end col) """ pass def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) -> MatLike: return self.crop_region(image, cell, cell, margin)
Subclasses implement methods for going from a pixel in the input image to a table cell index, and cropping an image to the given table cell index.
Ancestors
- abc.ABC
Subclasses
Instance variables
prop col_offset : int
-
Expand source code
@property def col_offset(self) -> int: return self._col_offset
prop cols : int
-
Expand source code
@property @abstractmethod def cols(self) -> int: pass
prop rows : int
-
Expand source code
@property @abstractmethod def rows(self) -> int: pass
Methods
def cell(self, point: tuple[float, float]) ‑> tuple[int, int]
-
Expand source code
@abstractmethod def cell(self, point: tuple[float, float]) -> tuple[int, int]: """ Returns the coordinate (row, col) of the cell that contains the given position Args: point (tuple[float, float]): a location in the input image Returns: tuple[int, int]: the cell index (row, col) that contains the given point """ pass
Returns the coordinate (row, col) of the cell that contains the given position
Args
point
:tuple[float, float]
- a location in the input image
Returns
tuple[int, int]
- the cell index (row, col) that contains the given point
def cell_polygon(self, cell: tuple[int, int]) ‑> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]
-
Expand source code
@abstractmethod def cell_polygon( self, cell: tuple[int, int] ) -> tuple[tuple[int, int], tuple[int, int], tuple[int, int], tuple[int, int]]: """returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position""" pass
returns the polygon (used in e.g. opencv) that enscribes the cell at the given cell position
def cells(self) ‑> Generator[tuple[int, int], None, None]
-
Expand source code
def cells(self) -> Generator[tuple[int, int], None, None]: for row in range(self.rows): for col in range(self.cols): yield (row, col)
def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) ‑> cv2.Mat | numpy.ndarray
-
Expand source code
def crop_cell(self, image, cell: tuple[int, int], margin: int = 0) -> MatLike: return self.crop_region(image, cell, cell, margin)
def crop_region(self,
image: cv2.Mat | numpy.ndarray,
start: tuple[int, int],
end: tuple[int, int],
margin: int = 0,
margin_top: int | None = None,
margin_bottom: int | None = None,
margin_left: int | None = None,
margin_right: int | None = None,
margin_y: int | None = None,
margin_x: int | None = None) ‑> cv2.Mat | numpy.ndarray-
Expand source code
def crop_region( self, image: MatLike, start: tuple[int, int], end: tuple[int, int], margin: int = 0, margin_top: int | None = None, margin_bottom: int | None = None, margin_left: int | None = None, margin_right: int | None = None, margin_y: int | None = None, margin_x: int | None = None, ) -> MatLike: """Crop the input image to a rectangular region with the start and end cells as extremes""" region = self.region(start, end) lt, rt, rb, lb = _apply_margin( *region, margin=margin, margin_top=margin_top, margin_bottom=margin_bottom, margin_left=margin_left, margin_right=margin_right, margin_y=margin_y, margin_x=margin_x, ) # apply margins according to priority: # margin_top > margin_y > margin (etc.) w = (rt[0] - lt[0] + rb[0] - lb[0]) / 2 h = (rb[1] - rt[1] + lb[1] - lt[1]) / 2 # crop by doing a perspective transform to the desired quad src_pts = np.array([lt, rt, rb, lb], dtype="float32") dst_pts = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype="float32") M = cv.getPerspectiveTransform(src_pts, dst_pts) warped = cv.warpPerspective(image, M, (int(w), int(h))) # type:ignore return warped
Crop the input image to a rectangular region with the start and end cells as extremes
def highlight_all_cells(self,
image: cv2.Mat | numpy.ndarray,
color: tuple[int, int, int] = (0, 0, 255),
thickness: int = 1) ‑> cv2.Mat | numpy.ndarray-
Expand source code
def highlight_all_cells( self, image: MatLike, color: tuple[int, int, int] = (0, 0, 255), thickness: int = 1, ) -> MatLike: img = np.copy(image) for cell in self.cells(): self._highlight_cell(img, cell, color, thickness) return img
def region(self, start: tuple[int, int], end: tuple[int, int]) ‑> tuple[typing.Tuple[int, int], typing.Tuple[int, int], typing.Tuple[int, int], typing.Tuple[int, int]]
-
Expand source code
@abstractmethod def region( self, start: tuple[int, int], end: tuple[int, int], ) -> tuple[Point, Point, Point, Point]: """ Get the bounding box for the rectangular region that goes from start to end Returns: 4 points: lt, rt, rb, lb, in format (x, y) """ pass
Get the bounding box for the rectangular region that goes from start to end
Returns
4 points
- lt, rt, rb, lb, in format (x, y)
def select_one_cell(self,
image: cv2.Mat | numpy.ndarray,
window: str = 'taulu',
color: tuple[int, int, int] = (255, 0, 0),
thickness: int = 2) ‑> tuple[int, int] | None-
Expand source code
def select_one_cell( self, image: MatLike, window: str = WINDOW, color: tuple[int, int, int] = (255, 0, 0), thickness: int = 2, ) -> tuple[int, int] | None: clicked = None def click_event(event, x, y, flags, params): nonlocal clicked img = np.copy(image) _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: cell = self.cell((x, y)) if cell[0] >= 0: clicked = cell else: return self._highlight_cell(img, cell, color, thickness) cv.imshow(window, img) imu.show(image, click_event=click_event, title="select one cell", window=window) return clicked
def show_cells(self,
image: cv2.Mat | numpy.ndarray | os.PathLike[str] | str,
window: str = 'taulu') ‑> list[tuple[int, int]]-
Expand source code
def show_cells( self, image: MatLike | os.PathLike[str] | str, window: str = WINDOW ) -> list[tuple[int, int]]: if not isinstance(image, np.ndarray): image = cv.imread(os.fspath(image)) img = np.copy(image) cells = [] def click_event(event, x, y, flags, params): _ = flags _ = params if event == cv.EVENT_LBUTTONDOWN: cell = self.cell((x, y)) if cell[0] >= 0: cells.append(cell) else: return self._highlight_cell(img, cell) cv.imshow(window, img) imu.show( img, click_event=click_event, title="click to highlight cells", window=window, ) return cells
def text_regions(self, img: cv2.Mat | numpy.ndarray, row: int, margin_x: int = 0, margin_y: int = 0) ‑> list[tuple[tuple[int, int], tuple[int, int]]]
-
Expand source code
@abstractmethod def text_regions( self, img: MatLike, row: int, margin_x: int = 0, margin_y: int = 0 ) -> list[tuple[tuple[int, int], tuple[int, int]]]: """ Split the row into regions of continuous text Returns list[tuple[int, int]]: a list of spans (start col, end col) """ pass
Split the row into regions of continuous text
Returns list[tuple[int, int]]: a list of spans (start col, end col)
class Taulu (header_path: os.PathLike[str] | str | Tuple[os.PathLike[str] | str, os.PathLike[str] | str],
sauvola_k: float = 0.25,
search_region: int = 60,
distance_penalty: float = 0.4,
cross_width: int = 10,
morph_size: int = 4,
kernel_size: int = 41,
processing_scale: float = 1.0,
min_rows: int = 5,
look_distance: int = 3,
grow_threshold: float = 0.3)-
Expand source code
class Taulu: """ The Taulu class is a convenience class that hides the inner workings of taulu as much as possible. For more advanced use cases, it might be useful to implement the workflow directly yourself, in order to have control over the intermediate steps. """ def __init__( self, header_path: PathLike[str] | str | Tuple[PathLike[str] | str, PathLike[str] | str], sauvola_k: float = 0.25, search_region: int = 60, distance_penalty: float = 0.4, cross_width: int = 10, morph_size: int = 4, kernel_size: int = 41, processing_scale: float = 1.0, min_rows: int = 5, look_distance: int = 3, grow_threshold: float = 0.3, ): self._processing_scale = processing_scale if isinstance(header_path, Tuple): header = Split(Path(header_path[0]), Path(header_path[1])) if not exists(header.left.with_suffix(".png")) or not exists( header.right.with_suffix(".png") ): raise TauluException("The header images you provided do not exist") if not exists(header.left.with_suffix(".json")) or not exists( header.right.with_suffix(".json") ): raise TauluException( "You need to annotate the headers of your table first\n\nsee the Taulu.annotate method" ) template_left = HeaderTemplate.from_saved(header.left.with_suffix(".json")) template_right = HeaderTemplate.from_saved( header.right.with_suffix(".json") ) self._header = Split( cv2.imread(os.fspath(header.left)), cv2.imread(os.fspath(header.right)) ) self._aligner = Split( HeaderAligner(self._header.left, scale=self._processing_scale), HeaderAligner(self._header.right, scale=self._processing_scale), ) self._template = Split(template_left, template_right) else: header_path = Path(header_path) self._header = cv2.imread(os.fspath(header_path)) self._aligner = HeaderAligner(self._header) self._template = HeaderTemplate.from_saved(header_path.with_suffix(".json")) # TODO: currently, these parameters are fixed and optimized for the example # image specifically (which is probably a good starting point, # espeicially after normalizing the image size) self._grid_detector = GridDetector( kernel_size=kernel_size, cross_width=cross_width, morph_size=morph_size, search_region=search_region, sauvola_k=sauvola_k, distance_penalty=distance_penalty, scale=self._processing_scale, min_rows=min_rows, look_distance=look_distance, grow_threshold=grow_threshold, ) if isinstance(self._template, Split): self._grid_detector = Split(self._grid_detector, self._grid_detector) @staticmethod def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str): """ Annotate the header of a table image. Saves the annotated header image and a json file containing the header template to the output path. Args: image_path (PathLike[str]): the path of the image which you want to annotate output_path (PathLike[str]): the path where the output files should go (image files and json files) """ if not exists(image_path): raise TauluException(f"Image path {image_path} does not exist") if os.path.isdir(output_path): raise TauluException("Output path should be a file") output_path = Path(output_path) template = HeaderTemplate.annotate_image( os.fspath(image_path), crop=output_path.with_suffix(".png") ) template.save(output_path.with_suffix(".json")) # TODO: check if PathLike works like this # TODO: get rid of cell_height and make this part of the header template def segment_table( self, image: MatLike | PathLike[str] | str, cell_height_factor: float | List[float] | Dict[str, float | List[float]], debug_view: bool = False, ) -> TableGrid: """ Main function of the class, segmenting the input image into cells. Returns a TableGrid object, which has methods with which you can find the location of cells in the table Args: image (MatLike | PathLike[str]): The image to segment (path or np.ndarray) cell_height_factor (float | list[float] | dict[str, float | list[float]]): The height factor of a row. This factor is the fraction of the header height each row is. If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument. Also accepts a list of heights, useful if your row heights are not constant (often, the first row is higher than the others). The last entry in the list is used repeatedly when there are more rows in the image than there are entries in your list. By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor for the different sides of your table. debug_view (bool): By setting this setting to True, an OpenCV window will open and show the results of intermediate steps. Press `n` for advancing to the next image, and `q` to quit. """ if not isinstance(image, MatLike): image = cv2.imread(os.fspath(image)) # TODO: perform checks on the image now = perf_counter() h = self._aligner.align(image, visual=debug_view) align_time = perf_counter() - now logger.info(f"Header alignment took {align_time:.2f} seconds") # find the starting point for the table grid algorithm left_top_template = self._template.intersection((1, 0)) if isinstance(left_top_template, Split): left_top_template = Split( (int(left_top_template.left[0]), int(left_top_template.left[1])), (int(left_top_template.right[0]), int(left_top_template.right[1])), ) else: left_top_template = (int(left_top_template[0]), int(left_top_template[1])) left_top_table = self._aligner.template_to_img(h, left_top_template) if isinstance(cell_height_factor, dict): if not isinstance(self._template, Split): raise TauluException( "You provided a cell_height_factor dictionary, but the header is not a Split" ) if "left" not in cell_height_factor or "right" not in cell_height_factor: raise TauluException( "When providing a cell_height_factor dictionary, it should contain both 'left' and 'right' keys" ) cell_heights = Split( self._template.left.cell_heights(cell_height_factor.get("left", 1.0)), self._template.right.cell_heights(cell_height_factor.get("right", 1.0)), ) else: cell_heights = self._template.cell_heights(cell_height_factor) now = perf_counter() table = self._grid_detector.find_table_points( image, left_top_table, self._template.cell_widths(0), cell_heights, visual=debug_view, ) grid_time = perf_counter() - now logger.info(f"Grid detection took {grid_time:.2f} seconds") if isinstance(table, Split): table = TableGrid.from_split(table, (0, 0)) return table
The Taulu class is a convenience class that hides the inner workings of taulu as much as possible.
For more advanced use cases, it might be useful to implement the workflow directly yourself, in order to have control over the intermediate steps.
Static methods
def annotate(image_path: os.PathLike[str] | str, output_path: os.PathLike[str] | str)
-
Expand source code
@staticmethod def annotate(image_path: PathLike[str] | str, output_path: PathLike[str] | str): """ Annotate the header of a table image. Saves the annotated header image and a json file containing the header template to the output path. Args: image_path (PathLike[str]): the path of the image which you want to annotate output_path (PathLike[str]): the path where the output files should go (image files and json files) """ if not exists(image_path): raise TauluException(f"Image path {image_path} does not exist") if os.path.isdir(output_path): raise TauluException("Output path should be a file") output_path = Path(output_path) template = HeaderTemplate.annotate_image( os.fspath(image_path), crop=output_path.with_suffix(".png") ) template.save(output_path.with_suffix(".json"))
Annotate the header of a table image.
Saves the annotated header image and a json file containing the header template to the output path.
Args
image_path
:PathLike[str]
- the path of the image which you want to annotate
output_path
:PathLike[str]
- the path where the output files should go (image files and json files)
Methods
def segment_table(self,
image: cv2.Mat | numpy.ndarray | os.PathLike[str] | str,
cell_height_factor: float | List[float] | Dict[str, float | List[float]],
debug_view: bool = False) ‑> TableGrid-
Expand source code
def segment_table( self, image: MatLike | PathLike[str] | str, cell_height_factor: float | List[float] | Dict[str, float | List[float]], debug_view: bool = False, ) -> TableGrid: """ Main function of the class, segmenting the input image into cells. Returns a TableGrid object, which has methods with which you can find the location of cells in the table Args: image (MatLike | PathLike[str]): The image to segment (path or np.ndarray) cell_height_factor (float | list[float] | dict[str, float | list[float]]): The height factor of a row. This factor is the fraction of the header height each row is. If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument. Also accepts a list of heights, useful if your row heights are not constant (often, the first row is higher than the others). The last entry in the list is used repeatedly when there are more rows in the image than there are entries in your list. By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor for the different sides of your table. debug_view (bool): By setting this setting to True, an OpenCV window will open and show the results of intermediate steps. Press `n` for advancing to the next image, and `q` to quit. """ if not isinstance(image, MatLike): image = cv2.imread(os.fspath(image)) # TODO: perform checks on the image now = perf_counter() h = self._aligner.align(image, visual=debug_view) align_time = perf_counter() - now logger.info(f"Header alignment took {align_time:.2f} seconds") # find the starting point for the table grid algorithm left_top_template = self._template.intersection((1, 0)) if isinstance(left_top_template, Split): left_top_template = Split( (int(left_top_template.left[0]), int(left_top_template.left[1])), (int(left_top_template.right[0]), int(left_top_template.right[1])), ) else: left_top_template = (int(left_top_template[0]), int(left_top_template[1])) left_top_table = self._aligner.template_to_img(h, left_top_template) if isinstance(cell_height_factor, dict): if not isinstance(self._template, Split): raise TauluException( "You provided a cell_height_factor dictionary, but the header is not a Split" ) if "left" not in cell_height_factor or "right" not in cell_height_factor: raise TauluException( "When providing a cell_height_factor dictionary, it should contain both 'left' and 'right' keys" ) cell_heights = Split( self._template.left.cell_heights(cell_height_factor.get("left", 1.0)), self._template.right.cell_heights(cell_height_factor.get("right", 1.0)), ) else: cell_heights = self._template.cell_heights(cell_height_factor) now = perf_counter() table = self._grid_detector.find_table_points( image, left_top_table, self._template.cell_widths(0), cell_heights, visual=debug_view, ) grid_time = perf_counter() - now logger.info(f"Grid detection took {grid_time:.2f} seconds") if isinstance(table, Split): table = TableGrid.from_split(table, (0, 0)) return table
Main function of the class, segmenting the input image into cells.
Returns a TableGrid object, which has methods with which you can find the location of cells in the table
Args
image
:MatLike | PathLike[str]
- The image to segment (path or np.ndarray)
cell_height_factor
:float | list[float] | dict[str, float | list[float]]
-
The height factor of a row. This factor is the fraction of the header height each row is. If your header has height 12 and your rows are of height 8, you should pass 8/12 as this argument. Also accepts a list of heights, useful if your row heights are not constant (often, the first row is higher than the others). The last entry in the list is used repeatedly when there are more rows in the image than there are entries in your list.
By passing a dictionary with keys "left" and "right", you can specify a different cell_height_factor for the different sides of your table.
debug_view
:bool
- By setting this setting to True, an OpenCV window will open and show the results of intermediate steps.
Press
n
for advancing to the next image, andq
to quit.