Image class, variants and internal data augmentation pipeline

The fastai Image classes

The fastai library is built such that every picture loaded is wrapped in an Image. This Image contains the array of pixels associated with the picture, but also has many built-in functions that help the fastai library apply transformations to the corresponding image. There are also subclasses for special types of image-like objects:

  • ImageSegment for segmentation masks
  • ImagePoints for collections of points (such as facial keypoints)
  • ImageBBox for bounding boxes

See the following sections for documentation of all the details of these classes. But first, let's have a quick look at the main functionality you'll need to know about.
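
The code examples on this page assume the standard fastai star import used throughout these docs, which (among other things) makes open_image, the Image classes and matplotlib's plt available:

from fastai.vision import *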

Opening an image and converting it to an Image object is easily done by using the open_image function:

img = open_image('imgs/cat_example.jpg')
img

To look at the picture that this Image contains, you can also use its show method. It will show a resized version and has more options to customize the display.

img.show()

This show method can take a few arguments (see the documentation of Image.show for details) but the two we will use the most in this documentation are:

  • ax which is the matplotlib.pyplot axes on which we want to show the image
  • title which is an optional title we can give to the image.
_,axs = plt.subplots(1,4,figsize=(12,4))
for i,ax in enumerate(axs): img.show(ax=ax, title=f'Copy {i+1}')

If you're interested in the tensor of pixels, it's stored in the data attribute of an Image.

img.data.shape
torch.Size([3, 500, 394])

The Image classes

Image is the class that wraps every picture in the fastai library. It is subclassed to create ImageSegment and ImageBBox when dealing with segmentation and object detection tasks.

class Image[source][test]

Image(px:Tensor) :: ItemBase

Tests found for Image:

  • pytest -sv tests/test_vision_transform.py::test_mask_data_aug [source]

To run tests please refer to this guide.

Support applying transforms to image data in px.

Most of the functions of the Image class deal with the internal pipeline of transforms, so they are only shown at the end of this page. The easiest way to create one is through the function open_image, as we saw before.
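
Since an Image just wraps a tensor of pixels, you can also construct one directly from any tensor; here is a minimal sketch with random pixels (values are assumed to be floats between 0. and 1., channels first):

t = torch.rand(3, 64, 64)  # channels x height x width
im = Image(t)
im.shape
torch.Size([3, 64, 64])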

open_image[source][test]

open_image(fn:PathOrStr, div:bool=True, convert_mode:str='RGB', after_open:Callable=None) → Image

No tests found for open_image. To contribute a test please refer to this guide and this discussion.

Return Image object created from image in file fn.

If div=True, pixel values are divided by 255 so they become floats between 0. and 1. The convert_mode argument is passed to PIL.Image.convert.
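
For instance, convert_mode='L' loads the picture as a single-channel grayscale image (a quick sketch reusing the cat picture from above):

gray = open_image('imgs/cat_example.jpg', convert_mode='L')
gray.shape
torch.Size([1, 500, 394])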

As we saw, in a Jupyter Notebook the representation of an Image is its underlying picture (shown at its full size). On top of containing the tensor of pixels of the image (and automatically doing the conversion after decoding the image), this class contains various methods for the implementation of transforms. The Image.show method also accepts more arguments:

Image.show[source][test]

Image.show(ax:Axes=None, figsize:tuple=(3, 3), title:Optional[str]=None, hide_axis:bool=True, cmap:str=None, y:Any=None, **kwargs)

No tests found for show. To contribute a test please refer to this guide and this discussion.

Show image on ax with title, using cmap if single-channel, overlaid with optional y

  • ax: matplotlib.pyplot axes on which to show the image
  • figsize: Size of the figure
  • title: Title to display on top of the graph
  • hide_axis: If True, the axes of the graph are hidden
  • cmap: Color map to use
  • y: Potential target to be superposed on the same graph (mask, bounding box, points)

This allows us to completely customize the display of an Image. We'll see examples of the y functionality below with segmentation and bounding box tasks; for now, here is an example using the other features.

img.show(figsize=(2, 1), title='Little kitten')
img.show(figsize=(10,5), title='Big kitten')
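
Note that cmap only has an effect on single-channel images, such as the grayscale version we opened earlier with convert_mode='L':

gray.show(figsize=(4,4), title='Grayscale kitten', cmap='gray', hide_axis=False)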

An Image object also has a few attributes that can be useful:

  • Image.data gives you the underlying tensor of pixels
  • Image.shape gives you the size of that tensor (channels x height x width)
  • Image.size gives you the size of the image (height x width)
img.data, img.shape, img.size
(tensor([[[0.1216, 0.0745, 0.0392,  ..., 0.4706, 0.4863, 0.4863],
          [0.0706, 0.0431, 0.0314,  ..., 0.4706, 0.4863, 0.4824],
          [0.0588, 0.0471, 0.0588,  ..., 0.4745, 0.4784, 0.4706],
          ...,
          [0.3059, 0.3647, 0.3686,  ..., 0.5412, 0.5725, 0.5725],
          [0.3294, 0.4000, 0.4039,  ..., 0.5882, 0.5765, 0.5765],
          [0.3843, 0.4627, 0.4667,  ..., 0.6471, 0.5725, 0.5725]],
 
         [[0.0235, 0.0000, 0.0000,  ..., 0.3451, 0.3725, 0.3725],
          [0.0000, 0.0000, 0.0000,  ..., 0.3569, 0.3725, 0.3765],
          [0.0000, 0.0000, 0.0196,  ..., 0.3647, 0.3686, 0.3725],
          ...,
          [0.3882, 0.4588, 0.4627,  ..., 0.6471, 0.6784, 0.6784],
          [0.4118, 0.4941, 0.4980,  ..., 0.6941, 0.6824, 0.6824],
          [0.4667, 0.5569, 0.5608,  ..., 0.7529, 0.6784, 0.6784]],
 
         [[0.1098, 0.0941, 0.1137,  ..., 0.1843, 0.2078, 0.2078],
          [0.0784, 0.0784, 0.1216,  ..., 0.1922, 0.2078, 0.2078],
          [0.1098, 0.1176, 0.1647,  ..., 0.2078, 0.2118, 0.2118],
          ...,
          [0.4941, 0.5608, 0.5647,  ..., 0.7294, 0.7608, 0.7608],
          [0.5176, 0.5961, 0.6000,  ..., 0.7765, 0.7647, 0.7647],
          [0.5725, 0.6588, 0.6627,  ..., 0.8353, 0.7608, 0.7608]]]),
 torch.Size([3, 500, 394]),
 torch.Size([500, 394]))

For a segmentation task, the target is usually a mask. The fastai library represents it as an ImageSegment object.

class ImageSegment[source][test]

ImageSegment(px:Tensor) :: Image

Tests found for ImageSegment:

  • pytest -sv tests/test_vision_transform.py::test_mask_data_aug [source]

To run tests please refer to this guide.

Support applying transforms to segmentation masks data in px.

To easily open a mask, the function open_mask plays the same role as open_image:

open_mask[source][test]

open_mask(fn:PathOrStr, div=False, convert_mode='L', after_open:Callable=None) → ImageSegment

No tests found for open_mask. To contribute a test please refer to this guide and this discussion.

Return ImageSegment object created from mask in file fn. If div=True, pixel values are divided by 255.

open_mask('imgs/mask_example.png')
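
Some mask files store the foreground as 255 instead of 1; in that case, pass div=True so the values are divided by 255 and end up as 0 and 1 (the filename below is hypothetical):

mask255 = open_mask('imgs/binary_255_mask.png', div=True)  # hypothetical file storing 0/255 values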

Run-length encoded masks

From time to time, you may encounter mask data as a run-length encoded string instead of a picture.

df = pd.read_csv('imgs/mask_rle_sample.csv')
encoded_str = df.iloc[1]['rle_mask']; 
df[:2]
                   img                                           rle_mask
0  00087a6bd4dc_01.jpg  879386 40 881253 141 883140 205 885009 17 8850...
1  00087a6bd4dc_02.jpg  873779 4 875695 7 877612 9 879528 12 881267 15...

You can also read a mask in run-length encoding by passing an extra shape argument for the image size:

mask = open_mask_rle(df.iloc[0]['rle_mask'], shape=(1918, 1280)).resize((1,128,128))
mask

open_mask_rle[source][test]

open_mask_rle(mask_rle:str, shape:Tuple[int, int]) → ImageSegment

No tests found for open_mask_rle. To contribute a test please refer to this guide and this discussion.

Return ImageSegment object created from run-length encoded string in mask_rle with size in shape.

open_mask_rle simply makes use of the helper function rle_decode:

rle_decode(encoded_str, (1912, 1280)).shape
(1912, 1280)

rle_decode[source][test]

rle_decode(mask_rle:str, shape:Tuple[int, int]) → ndarray

Tests found for rle_decode:

  • pytest -sv tests/test_vision_image.py::test_rle_decode_empty_str [source]
  • pytest -sv tests/test_vision_image.py::test_rle_decode_with_str [source]

To run tests please refer to this guide.

Return an image array from run-length encoded string mask_rle with shape.
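
To make the format concrete, here is a toy example: the string alternates 1-indexed start positions in the flattened mask with run lengths, so '2 3 7 2' sets three pixels starting at position 2 and two pixels starting at position 7 (the row-major layout of the result is an assumption based on the reshape to shape; the dtype is omitted from the output):

rle_decode('2 3 7 2', (3, 4))
array([[0, 1, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 0, 0]])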

You can also convert an ImageSegment back to a run-length encoded string.

type(mask)
fastai.vision.image.ImageSegment
rle_encode(mask.data)
'5943 21 6070 25 6197 26 6324 28 6452 29 6579 30 6707 31 6835 31 6962 32 7090 33 7217 34 7345 35 7473 35 7595 2 7600 36 7722 5 7728 37 7766 4 7850 43 7894 5 7978 43 8022 5 8106 49 8238 44 8366 40 8494 41 8621 42 8748 44 8875 46 9003 47 9130 48 9258 49 9386 49 9513 50 9641 51 9769 51 9897 51 10024 52 10152 53 10280 53 10408 53 10536 53 10664 53 10792 53 10920 53 11048 53 11176 53 11304 53 11432 53 11560 53 11688 53 11816 53 11944 53 12072 53 12200 53 12328 53 12456 53 12584 53 12712 53 12840 53 12968 53 13097 51 13225 51 13353 51 13481 51 13610 49 13742 44 13880 30'

rle_encode[source][test]

rle_encode(img:ndarray) → str

Tests found for rle_encode:

  • pytest -sv tests/test_vision_image.py::test_rle_encode_all_zero_array [source]
  • pytest -sv tests/test_vision_image.py::test_rle_encode_with_array [source]

To run tests please refer to this guide.

Return run-length encoding string from img.
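
rle_encode is the inverse of rle_decode, so a round trip on the toy example above should recover the original string:

rle_encode(rle_decode('2 3 7 2', (3, 4)))
'2 3 7 2'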

An ImageSegment object has the same properties as an Image. The only difference is that when transforms are applied to an ImageSegment, the functions that deal with lighting are ignored and the pixel values of 0 and 1 are preserved. As explained earlier, it's easy to show the segmentation mask over the associated Image by passing it as the y argument of Image.show.

img = open_image('imgs/car_example.jpg')
mask = open_mask('imgs/mask_example.png')
_,axs = plt.subplots(1,3, figsize=(8,4))
img.show(ax=axs[0], title='no mask')
img.show(ax=axs[1], y=mask, title='masked')
mask.show(ax=axs[2], title='mask only', alpha=1.)

When the targets are a bunch of points, the following class will help.

class ImagePoints[source][test]

ImagePoints(flow:FlowField, scale:bool=True, y_first:bool=True) :: Image

Tests found for ImagePoints:

  • pytest -sv tests/test_vision_transform.py::test_points_data_aug [source]

To run tests please refer to this guide.

Support applying transforms to a flow of points.

Create an ImagePoints object from a flow of coordinates. Coordinates need to be scaled to the range (-1,1), which is done in the initialization if scale is left as True. The convention is to have point coordinates in the form [y,x] unless y_first is set to False.

img = open_image('imgs/face_example.jpg')
pnts = torch.load('points.pth')
pnts = ImagePoints(FlowField(img.size, pnts))
img.show(y=pnts)

Note that the raw points are gathered in a FlowField object, which is a class that wraps together a bunch of coordinates with the corresponding image size. In fastai, we expect points to have the y coordinate first by default. The underlying data of pnts is the flow of points scaled from -1 to 1 (again with the y coordinate first):

pnts.data[:10]
tensor([[-0.1875, -0.6000],
        [-0.0500, -0.5875],
        [ 0.0750, -0.5750],
        [ 0.2125, -0.5750],
        [ 0.3375, -0.5375],
        [ 0.4500, -0.4875],
        [ 0.5250, -0.3750],
        [ 0.5750, -0.2375],
        [ 0.5875, -0.1000],
        [ 0.5750,  0.0375]])
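
If your points start out as raw pixel coordinates, you can wrap them in a FlowField yourself and let the initialization do the scaling, since scale defaults to True (the coordinates below are made up for illustration):

raw = torch.tensor([[100., 150.], [200., 250.]])  # hypothetical (y, x) pixel coordinates
pnts2 = ImagePoints(FlowField(img.size, raw))     # scaled to (-1,1) at initialization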

For an object detection task, the target is a bounding box containing the object of interest.

class ImageBBox[source][test]

ImageBBox(flow:FlowField, scale:bool=True, y_first:bool=True, labels:Collection[T_co]=None, classes:dict=None, pad_idx:int=0) :: ImagePoints

Tests found for ImageBBox:

  • pytest -sv tests/test_vision_transform.py::test_bbox_data_aug [source]

To run tests please refer to this guide.

Support applying transforms to a flow of bounding boxes.

Create an ImageBBox object from a flow of coordinates. Those coordinates are expected to be in a FlowField with an underlying flow of size 4N for N bboxes, describing for each box its top-left, top-right, bottom-left and bottom-right corners. Coordinates need to be scaled to the range (-1,1), which is done in the initialization if scale is left as True. The convention is to have point coordinates in the form [y,x] unless y_first is set to False. labels is an optional collection of labels, which should be the same size as flow. pad_idx is used if the set of transforms somehow leaves the image without any bounding boxes.

To create an ImageBBox, you can use the create helper function, which takes a list of bounding boxes and the height and width of the input image. Each bounding box is represented by a list of four numbers: the coordinates of the corners of the box, following the convention top, left, bottom, right.

create[source][test]

create(h:int, w:int, bboxes:Collection[Collection[int]], labels:Collection[T_co]=None, classes:dict=None, pad_idx:int=0, scale:bool=True) → ImageBBox

No tests found for create. To contribute a test please refer to this guide and this discussion.

Create an ImageBBox object from bboxes.

  • h: height of the input image
  • w: width of the input image
  • bboxes: list of bboxes (each of those being four integers with the top, left, bottom, right convention)
  • labels: labels of the bounding boxes (as indexes into classes)
  • classes: the corresponding classes
  • pad_idx: padding index that will be used to group the ImageBBox in a batch
  • scale: if True, will scale the bounding boxes from -1 to 1

We need to pass the dimensions of the input image so that ImageBBox can internally create the FlowField. Again, the Image.show method will display the bounding box on the same image if it's passed as a y argument.

img = open_image('imgs/car_bbox.jpg')
bbox = ImageBBox.create(*img.size, [[96, 155, 270, 351]], labels=[0], classes=['car'])
img.show(y=bbox)
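
create also accepts several boxes at once, with one label per box indexing into classes (the second box below is made up for illustration):

bboxes = [[96, 155, 270, 351], [50, 20, 150, 120]]  # second box is hypothetical
bbox = ImageBBox.create(*img.size, bboxes, labels=[0, 1], classes=['car', 'truck'])
img.show(y=bbox)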

To help with the conversion of images or to show them, we use these helper functions:

show_image[source][test]

show_image(img:Image, ax:Axes=None, figsize:tuple=(3, 3), hide_axis:bool=True, cmap:str='binary', alpha:float=None, **kwargs) → Axes

No tests found for show_image. To contribute a test please refer to this guide and this discussion.

Display Image in notebook.

pil2tensor[source]