# Core vision ``` python from fastai.data.external import * ``` ## Helpers ``` python im = Image.open(TEST_IMAGE).resize((30,20)) ``` ------------------------------------------------------------------------ ### Image.n_px > Image.n_px (x:PIL.Image.Image) ``` python test_eq(im.n_px, 30*20) ``` ------------------------------------------------------------------------ ### Image.shape > Image.shape (x:PIL.Image.Image) ``` python test_eq(im.shape, (20,30)) ``` ------------------------------------------------------------------------ ### Image.aspect > Image.aspect (x:PIL.Image.Image) ``` python test_eq(im.aspect, 30/20) ``` ------------------------------------------------------------------------ ### Image.reshape > Image.reshape (x:PIL.Image.Image, h, w, resample=0) *`resize` `x` to `(w,h)`* ------------------------------------------------------------------------ ### Image.reshape > Image.reshape (x:PIL.Image.Image, h, w, resample=0) *`resize` `x` to `(w,h)`* ``` python test_eq(im.reshape(12,10).shape, (12,10)) ``` ------------------------------------------------------------------------ ### Image.to_bytes_format > Image.to_bytes_format (im:PIL.Image.Image, format='png') *Convert to bytes, default to PNG format* ------------------------------------------------------------------------ ### Image.to_bytes_format > Image.to_bytes_format (im:PIL.Image.Image, format='png') *Convert to bytes, default to PNG format* ------------------------------------------------------------------------ ### Image.to_thumb > Image.to_thumb (h, w=None) *Same as `thumbnail`, but uses a copy* ------------------------------------------------------------------------ ### Image.to_thumb > Image.to_thumb (h, w=None) *Same as `thumbnail`, but uses a copy* ------------------------------------------------------------------------ ### Image.resize_max > Image.resize_max (x:PIL.Image.Image, resample=0, max_px=None, max_h=None, > max_w=None) *`resize` `x` to `max_px`, or `max_h`, or `max_w`* ``` python test_eq(im.resize_max(max_px=20*30).shape, (20,30)) test_eq(im.resize_max(max_px=300).n_px, 294) test_eq(im.resize_max(max_px=500, max_h=10, max_w=20).shape, (10,15)) test_eq(im.resize_max(max_h=14, max_w=15).shape, (10,15)) test_eq(im.resize_max(max_px=300, max_h=10, max_w=25).shape, (10,15)) ``` ------------------------------------------------------------------------ ### Image.resize_max > Image.resize_max (x:PIL.Image.Image, resample=0, max_px=None, max_h=None, > max_w=None) *`resize` `x` to `max_px`, or `max_h`, or `max_w`* ## Basic types This section regroups the basic types used in vision with the transform that create objects of those types. ------------------------------------------------------------------------ source ### to_image > to_image (x) *Convert a tensor or array to a PIL int8 Image* ------------------------------------------------------------------------ source ### load_image > load_image (fn, mode=None) *Open and load a `PIL.Image` and convert to `mode`* ------------------------------------------------------------------------ source ### image2tensor > image2tensor (img) *Transform image to byte tensor in `c*h*w` dim order.* ------------------------------------------------------------------------ source ### PILBase > PILBase () *Base class for a Pillow `Image` that can show itself and convert to a Tensor* ------------------------------------------------------------------------ source ### PILBase.create > PILBase.create > (fn:pathlib.Path|str|torch.Tensor|numpy.ndarray|bytes|PIL > .Image.Image, **kwargs) *Return an Image from `fn`* Images passed to [`PILBase`](https://docs.fast.ai/vision.core.html#pilbase) or inherited classes’ `create` as a PyTorch `Tensor`, NumPy `ndarray`, or Pillow `Image` must already be in the correct Pillow image format. For example, `uint8`, and RGB or BW for [`PILImage`](https://docs.fast.ai/vision.core.html#pilimage) or [`PILImageBW`](https://docs.fast.ai/vision.core.html#pilimagebw), respectively. ------------------------------------------------------------------------ source ### PILBase.show > PILBase.show (ctx=None, **kwargs) *Show image using `merge(self._show_args, kwargs)`* ------------------------------------------------------------------------ source ### PILImage > PILImage () *A RGB Pillow `Image` that can show itself and converts to [`TensorImage`](https://docs.fast.ai/torch_core.html#tensorimage)* ------------------------------------------------------------------------ source ### PILImageBW > PILImageBW () *A BW Pillow `Image` that can show itself and converts to [`TensorImageBW`](https://docs.fast.ai/torch_core.html#tensorimagebw)* ``` python im = PILImage.create(TEST_IMAGE) test_eq(type(im), PILImage) test_eq(im.mode, 'RGB') test_eq(str(im), 'PILImage mode=RGB size=1200x803') ``` ``` python im2 = PILImage.create(im) test_eq(type(im2), PILImage) test_eq(im2.mode, 'RGB') test_eq(str(im2), 'PILImage mode=RGB size=1200x803') ``` ``` python im.resize((64,64)) ``` ![](07_vision.core_files/figure-commonmark/cell-30-output-1.png) ``` python ax = im.show(figsize=(1,1)) ``` ![](07_vision.core_files/figure-commonmark/cell-31-output-1.png) ``` python test_fig_exists(ax) ``` ``` python timg = TensorImage(image2tensor(im)) tpil = PILImage.create(timg) ``` ``` python tpil.resize((64,64)) ``` ![](07_vision.core_files/figure-commonmark/cell-34-output-1.png) ------------------------------------------------------------------------ source ### PILMask > PILMask () *A Pillow `Image` Mask that can show itself and converts to [`TensorMask`](https://docs.fast.ai/torch_core.html#tensormask)* ``` python im = PILMask.create(TEST_IMAGE) test_eq(type(im), PILMask) test_eq(im.mode, 'L') test_eq(str(im), 'PILMask mode=L size=1200x803') ``` ### Images ``` python mnist = untar_data(URLs.MNIST_TINY) fns = get_image_files(mnist) mnist_fn = TEST_IMAGE_BW ``` ``` python timg = Transform(PILImageBW.create) mnist_img = timg(mnist_fn) test_eq(mnist_img.size, (28,28)) assert isinstance(mnist_img, PILImageBW) mnist_img ``` ![](07_vision.core_files/figure-commonmark/cell-38-output-1.png) ### Segmentation masks ------------------------------------------------------------------------ source ### AddMaskCodes > AddMaskCodes (codes=None) *Add the code metadata to a [`TensorMask`](https://docs.fast.ai/torch_core.html#tensormask)* ``` python camvid = untar_data(URLs.CAMVID_TINY) fns = get_image_files(camvid/'images') cam_fn = fns[0] mask_fn = camvid/'labels'/f'{cam_fn.stem}_P{cam_fn.suffix}' ``` ``` python cam_img = PILImage.create(cam_fn) test_eq(cam_img.size, (128,96)) tmask = Transform(PILMask.create) mask = tmask(mask_fn) test_eq(type(mask), PILMask) test_eq(mask.size, (128,96)) ``` ``` python _,axs = plt.subplots(1,3, figsize=(12,3)) cam_img.show(ctx=axs[0], title='image') mask.show(alpha=1, ctx=axs[1], vmin=1, vmax=30, title='mask') cam_img.show(ctx=axs[2], title='superimposed') mask.show(ctx=axs[2], vmin=1, vmax=30); ``` ![](07_vision.core_files/figure-commonmark/cell-42-output-1.png) ### Points ------------------------------------------------------------------------ source ### TensorPoint > TensorPoint (x, **kwargs) *Basic type for points in an image* Points are expected to come as an array/tensor of shape `(n,2)` or as a list of lists with two elements. Unless you change the defaults in [`PointScaler`](https://docs.fast.ai/vision.core.html#pointscaler) (see later on), coordinates should go from 0 to width/height, with the first one being the column index (so from 0 to width) and the second one being the row index (so from 0 to height).

> **Note** > > This is different from the usual indexing convention for arrays in > numpy or in PyTorch, but it’s the way points are expected by > matplotlib or the internal functions in PyTorch like `F.grid_sample`.

``` python pnt_img = TensorImage(mnist_img.resize((28,35))) pnts = np.array([[0,0], [0,35], [28,0], [28,35], [9, 17]]) tfm = Transform(TensorPoint.create) tpnts = tfm(pnts) test_eq(tpnts.shape, [5,2]) test_eq(tpnts.dtype, torch.float32) ``` ``` python ctx = pnt_img.show(figsize=(1,1), cmap='Greys') tpnts.show(ctx=ctx); ``` ![](07_vision.core_files/figure-commonmark/cell-45-output-1.png) ### Bounding boxes ------------------------------------------------------------------------ source ### get_annotations > get_annotations (fname, prefix=None) *Open a COCO style json in `fname` and returns the lists of filenames (with maybe `prefix`) and labelled bboxes.* Test [`get_annotations`](https://docs.fast.ai/vision.core.html#get_annotations) on the coco_tiny dataset against both image filenames and bounding box labels. ``` python coco = untar_data(URLs.COCO_TINY) test_images, test_lbl_bbox = get_annotations(coco/'train.json') annotations = json.load(open(coco/'train.json')) categories, images, annots = map(lambda x:L(x),annotations.values()) test_eq(test_images, images.attrgot('file_name')) def bbox_lbls(file_name): img = images.filter(lambda img:img['file_name']==file_name)[0] bbs = annots.filter(lambda a:a['image_id'] == img['id']) i2o = {k['id']:k['name'] for k in categories} lbls = [i2o[cat] for cat in bbs.attrgot('category_id')] bboxes = [[bb[0],bb[1], bb[0]+bb[2], bb[1]+bb[3]] for bb in bbs.attrgot('bbox')] return [bboxes, lbls] for idx in random.sample(range(len(images)),5): test_eq(test_lbl_bbox[idx], bbox_lbls(test_images[idx])) ``` ------------------------------------------------------------------------ source ### TensorBBox > TensorBBox (x, **kwargs) *Basic type for a tensor of bounding boxes in an image* Bounding boxes are expected to come as tuple with an array/tensor of shape `(n,4)` or as a list of lists with four elements and a list of corresponding labels. Unless you change the defaults in [`PointScaler`](https://docs.fast.ai/vision.core.html#pointscaler) (see later on), coordinates for each bounding box should go from 0 to width/height, with the following convention: x1, y1, x2, y2 where (x1,y1) is your top-left corner and (x2,y2) is your bottom-right corner.

> **Note** > > We use the same convention as for points with x going from 0 to width > and y going from 0 to height.

------------------------------------------------------------------------ source ### LabeledBBox > LabeledBBox (items=None, *rest, use_list=False, match=None) *Basic type for a list of bounding boxes in an image* ``` python coco = untar_data(URLs.COCO_TINY) images, lbl_bbox = get_annotations(coco/'train.json') idx=2 coco_fn,bbox = coco/'train'/images[idx],lbl_bbox[idx] coco_img = timg(coco_fn) ``` ``` python tbbox = LabeledBBox(TensorBBox(bbox[0]), bbox[1]) ctx = coco_img.show(figsize=(3,3), cmap='Greys') tbbox.show(ctx=ctx); ``` ![](07_vision.core_files/figure-commonmark/cell-51-output-1.png) ## Basic Transforms Unless specifically mentioned, all the following transforms can be used as single-item transforms (in one of the list in the `tfms` you pass to a `TfmdDS` or a `Datasource`) or tuple transforms (in the `tuple_tfms` you pass to a `TfmdDS` or a `Datasource`). The safest way that will work across applications is to always use them as `tuple_tfms`. For instance, if you have points or bounding boxes as targets and use [`Resize`](https://docs.fast.ai/vision.augment.html#resize) as a single-item transform, when you get to [`PointScaler`](https://docs.fast.ai/vision.core.html#pointscaler) (which is a tuple transform) you won’t have the correct size of the image to properly scale your points. ------------------------------------------------------------------------ source ### ToTensor > ToTensor (enc=None, dec=None, split_idx=None, order=None) *Convert item to appropriate tensor class* ------------------------------------------------------------------------ source ### ToTensor > ToTensor (enc=None, dec=None, split_idx=None, order=None) *Convert item to appropriate tensor class* Any data augmentation transform that runs on PIL Images must be run before this transform. ``` python tfm = ToTensor() print(tfm) print(type(mnist_img)) print(type(tfm(mnist_img))) ``` ToTensor(enc:2,dec:0) ``` python tfm = ToTensor() test_eq(tfm(mnist_img).shape, (1,28,28)) test_eq(type(tfm(mnist_img)), TensorImageBW) test_eq(tfm(mask).shape, (96,128)) test_eq(type(tfm(mask)), TensorMask) ``` Let’s confirm we can pipeline this with `PILImage.create`. ``` python pipe_img = Pipeline([PILImageBW.create, ToTensor()]) img = pipe_img(mnist_fn) test_eq(type(img), TensorImageBW) pipe_img.show(img, figsize=(1,1)); ``` ![](07_vision.core_files/figure-commonmark/cell-56-output-1.png) ``` python def _cam_lbl(x): return mask_fn cam_tds = Datasets([cam_fn], [[PILImage.create, ToTensor()], [_cam_lbl, PILMask.create, ToTensor()]]) show_at(cam_tds, 0); ``` ![](07_vision.core_files/figure-commonmark/cell-57-output-1.png) To work with data augmentation, and in particular the `grid_sample` method, points need to be represented with coordinates going from -1 to 1 (-1 being top or left, 1 bottom or right), which will be done unless you pass `do_scale=False`. We also need to make sure they are following our convention of points being x,y coordinates, so pass along `y_first=True` if you have your data in an y,x format to add a flip.

> **Warning** > > This transform needs to run on the tuple level, before any transform > that changes the image size.

------------------------------------------------------------------------ source ### PointScaler > PointScaler (do_scale=True, y_first=False) *Scale a tensor representing points* To work with data augmentation, and in particular the `grid_sample` method, points need to be represented with coordinates going from -1 to 1 (-1 being top or left, 1 bottom or right), which will be done unless you pass `do_scale=False`. We also need to make sure they are following our convention of points being x,y coordinates, so pass along `y_first=True` if you have your data in an y,x format to add a flip.

> **Note** > > This transform automatically grabs the sizes of the images it sees > before a TensorPoint object and embeds it in them. For > this to work, those images need to be before any points in the order > of your final tuple. If you don’t have such images, you need to embed > the size of the corresponding image when creating a > TensorPoint by passing it with `sz=...`.

``` python def _pnt_lbl(x): return TensorPoint.create(pnts) def _pnt_open(fn): return PILImage(PILImage.create(fn).resize((28,35))) pnt_tds = Datasets([mnist_fn], [_pnt_open, [_pnt_lbl]]) pnt_tdl = TfmdDL(pnt_tds, bs=1, after_item=[PointScaler(), ToTensor()]) ``` ``` python test_eq(pnt_tdl.after_item.c, 10) ``` ``` python x,y = pnt_tdl.one_batch() #Scaling and flipping properly done #NB: we added a point earlier at (9,17); formula below scales to (-1,1) coords test_close(y[0], tensor([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.], [9/14-1, 17/17.5-1]])) a,b = pnt_tdl.decode_batch((x,y))[0] test_eq(b, tensor(pnts).float()) #Check types test_eq(type(x), TensorImage) test_eq(type(y), TensorPoint) test_eq(type(a), TensorImage) test_eq(type(b), TensorPoint) test_eq(b.img_size, (28,35)) #Automatically picked the size of the input ``` ``` python pnt_tdl.show_batch(figsize=(2,2), cmap='Greys'); ``` ![](07_vision.core_files/figure-commonmark/cell-62-output-1.png) ------------------------------------------------------------------------ source ### BBoxLabeler > BBoxLabeler (enc=None, dec=None, split_idx=None, order=None) *Delegates (`__call__`,`decode`,`setup`) to (encodes,decodes,setups) if `split_idx` matches* ------------------------------------------------------------------------ source ### MultiCategorize > MultiCategorize (vocab=None, add_na=False) *Reversible transform of multi-category strings to `vocab` id* ------------------------------------------------------------------------ source ### PointScaler > PointScaler (do_scale=True, y_first=False) *Scale a tensor representing points* ------------------------------------------------------------------------ source ### PointScaler > PointScaler (do_scale=True, y_first=False) *Scale a tensor representing points* ``` python def _coco_bb(x): return TensorBBox.create(bbox[0]) def _coco_lbl(x): return bbox[1] coco_tds = Datasets([coco_fn], [PILImage.create, [_coco_bb], [_coco_lbl, MultiCategorize(add_na=True)]], n_inp=1) coco_tdl = TfmdDL(coco_tds, bs=1, after_item=[BBoxLabeler(), PointScaler(), ToTensor()]) ``` ``` python Categorize(add_na=True) ``` Categorize -- {'vocab': None, 'sort': True, 'add_na': True} (enc:1,dec:1) ``` python coco_tds.tfms ``` (#3) [Pipeline: PILBase.create,Pipeline: _coco_bb,Pipeline: _coco_lbl -> MultiCategorize -- {'vocab': None, 'sort': True, 'add_na': True}] ``` python x,y,z ``` (PILImage mode=RGB size=128x128, TensorBBox([[-0.9011, -0.4606, 0.1416, 0.6764], [ 0.2000, -0.2405, 1.0000, 0.9102], [ 0.4909, -0.9325, 0.9284, -0.5011]]), TensorMultiCategory([1, 1, 1])) ``` python x,y,z = coco_tdl.one_batch() test_close(y[0], -1+tensor(bbox[0])/64) test_eq(z[0], tensor([1,1,1])) a,b,c = coco_tdl.decode_batch((x,y,z))[0] test_close(b, tensor(bbox[0]).float()) test_eq(c.bbox, b) test_eq(c.lbl, bbox[1]) #Check types test_eq(type(x), TensorImage) test_eq(type(y), TensorBBox) test_eq(type(z), TensorMultiCategory) test_eq(type(a), TensorImage) test_eq(type(b), TensorBBox) test_eq(type(c), LabeledBBox) test_eq(y.img_size, (128,128)) ``` ``` python coco_tdl.show_batch(); ``` ![](07_vision.core_files/figure-commonmark/cell-72-output-1.png)