FastAI.jl
Here we are closely following the FastAI.jl tutorials on data containers and siamese image similarity.
We can load the Pet dataset as follows:
dir = FastAI.load(datasets()["oxford-iiit-pet"])
"/Users/FA31DU/.julia/datadeps/fastai-oxford-iiit-pet"
readdir(dir)
2-element Vector{String}:
"annotations"
"images"
img_dir = joinpath(dir, "images")
"/Users/FA31DU/.julia/datadeps/fastai-oxford-iiit-pet/images"
FastAI.jl convention
Using the FastAI.jl convention, we can load a single image as follows:
files = loadfolderdata(img_dir; filterfn=FastVision.isimagefile)
p = getobs(files, 1)
"/Users/FA31DU/.julia/datadeps/fastai-oxford-iiit-pet/images/Abyssinian_1.jpg"
We can see that the file names contain the pet breed.
Using regular expressions, we can extract the pet breed from the file name:
= r"(.+)_\d+.jpg$"
re = pathname(p)
fnamelabel_func(path) = lowercase(match(re, pathname(path))[1])
label_func(fname)
"abyssinian"
Now let's check how many unique pet breeds we have:
labels = map(label_func, files)
length(unique(labels))
37
We can create a function that loads an image and its class:
function loadimageclass(p)
    return (
        loadfile(p),
        @.(pathname(p) |> label_func)   # broadcasting to make compatible with minibatching
    )
end

image, class = loadimageclass(p)
@show class
image
class = "abyssinian"
Finally, we can use mapobs
to lazily load all the images and their classes:
data = mapobs(loadimageclass, files);
@show numobs(data)
image, label = getobs(data, 1)
numobs(data) = 7390
(RGB{N0f8}[RGB{N0f8}(0.118,0.149,0.106) RGB{N0f8}(0.118,0.149,0.106) … RGB{N0f8}(0.161,0.192,0.141) RGB{N0f8}(0.157,0.188,0.137); RGB{N0f8}(0.114,0.145,0.102) RGB{N0f8}(0.114,0.145,0.102) … RGB{N0f8}(0.165,0.196,0.145) RGB{N0f8}(0.157,0.188,0.137); … ; RGB{N0f8}(0.047,0.075,0.043) RGB{N0f8}(0.043,0.071,0.039) … RGB{N0f8}(0.059,0.09,0.047) RGB{N0f8}(0.059,0.09,0.047); RGB{N0f8}(0.047,0.075,0.043) RGB{N0f8}(0.043,0.071,0.039) … RGB{N0f8}(0.059,0.09,0.047) RGB{N0f8}(0.059,0.09,0.047)], "abyssinian")
FastAI.jl convention
Contrary to fast.ai, FastAI.jl separates the data loading and container generation from the data augmentation. From the documentation:
In FastAI.jl, the preprocessing or “encoding” is implemented through a learning task. Learning tasks contain any configuration and, beside data processing, have extensible functions for visualizations and model building. One advantage of this separation between loading and encoding is that the data container can easily be swapped out as long as it has observations suitable for the learning task (in this case a tuple of two images and a Boolean). It also makes it easy to export models and all the necessary configuration.
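To illustrate the point about swappable containers, here is a minimal sketch (not part of the original tutorial) using only the lazy container defined above: any collection whose observations are (image, label) tuples is a valid container, for example a small in-memory subset.
# Sketch: a plain Vector of (image, label) tuples is itself a valid data container,
# since numobs/getobs from MLUtils work on ordinary vectors.
small_data = [getobs(data, i) for i in 1:10]
numobs(small_data), getobs(small_data, 3)[2]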
First, we follow the standard procedure to split the data into training and validation sets:
train_data, val_data = splitobs(data; at=0.8, shuffle=true)
(ObsView(::MLUtils.MappedData{:auto, typeof(loadimageclass), ObsView{MLDatasets.FileDataset{typeof(identity), String}, Vector{Int64}}}, ::Vector{Int64})
5912 observations, ObsView(::MLUtils.MappedData{:auto, typeof(loadimageclass), ObsView{MLDatasets.FileDataset{typeof(identity), String}, Vector{Int64}}}, ::Vector{Int64})
1478 observations)
Next, we define the data augmentation task separately as a BlockTask:
_resize = 128
blocks = (
    Image{2}(),
    Label{String}(unique(labels)),
)
task = BlockTask(
    blocks,
    (
        ProjectiveTransforms(
            (_resize, _resize),
            sharestate=false,
            buffered=false,
        ),
        ImagePreprocessing(buffered=false),
        OneHot(),
    )
)
describetask(task)
SupervisedTask summary

Learning task for the supervised task with input Image{2} and target Label{String}. Compatible with models that take in Bounded{2, FastVision.ImageTensor{2}} and output OneHotLabel{String}.

Encoding a sample (encodesample(task, context, sample)) is done through the following encodings:
| Encoding             | Name            | blocks.input                          | blocks.target       |
|-----------------------|-----------------|---------------------------------------|---------------------|
|                       | (input, target) | Image{2}                              | Label{String}       |
| ProjectiveTransforms  |                 | Bounded{2, Image{2}}                  |                     |
| ImagePreprocessing    |                 | Bounded{2, FastVision.ImageTensor{2}} |                     |
| OneHot                | (x, y)          |                                       | OneHotLabel{String} |
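To make the table above concrete, we can encode a single raw observation by hand. The following is a sketch using encodesample as documented above; the expected shapes follow from the task definition (_resize = 128 and 37 classes):
# Sketch: apply the task's encodings to one raw (image, label) observation
sample = getobs(data, 1)
x, y = encodesample(task, Training(), sample)
summary(x)   # expected: a 128×128×3 Array{Float32} image tensor
summary(y)   # expected: a 37-element one-hot vector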
We can apply the augmentation to the data as follows:
batchsize = 3
train_dl, val_dl = taskdataloaders(train_data, val_data, task, batchsize)
(DataLoader(::FastAI.TaskDataset{ObsView{MLUtils.MappedData{:auto, typeof(loadimageclass), ObsView{MLDatasets.FileDataset{typeof(identity), String}, Vector{Int64}}}, Vector{Int64}}, SupervisedTask{NamedTuple{(:input, :target, :sample, :encodedsample, :x, :y, :ŷ, :pred), Tuple{Image{2}, Label{String}, Tuple{Image{2}, Label{String}}, Tuple{Bounded{2, FastVision.ImageTensor{2}}, FastAI.OneHotTensor{0, String}}, Bounded{2, FastVision.ImageTensor{2}}, FastAI.OneHotTensor{0, String}, FastAI.OneHotTensor{0, String}, Label{String}}}, Tuple{ProjectiveTransforms{2, NamedTuple{(:training, :validation, :inference), Tuple{DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.Crop{2, DataAugmentation.FromRandom}}, DataAugmentation.PinOrigin}}, DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.Crop{2, DataAugmentation.FromCenter}}, DataAugmentation.PinOrigin}}, DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.PadDivisible}, DataAugmentation.PinOrigin}}}}}, ImagePreprocessing{N0f8, 3, RGB{N0f8}, Float32}, OneHot{DataType}}}, Training}, parallel=true, shuffle=true, batchsize=3, collate=Val{true}()), DataLoader(::FastAI.TaskDataset{ObsView{MLUtils.MappedData{:auto, typeof(loadimageclass), ObsView{MLDatasets.FileDataset{typeof(identity), String}, Vector{Int64}}}, Vector{Int64}}, SupervisedTask{NamedTuple{(:input, :target, :sample, :encodedsample, :x, :y, :ŷ, :pred), Tuple{Image{2}, Label{String}, Tuple{Image{2}, Label{String}}, Tuple{Bounded{2, FastVision.ImageTensor{2}}, FastAI.OneHotTensor{0, String}}, Bounded{2, FastVision.ImageTensor{2}}, FastAI.OneHotTensor{0, String}, FastAI.OneHotTensor{0, String}, Label{String}}}, Tuple{ProjectiveTransforms{2, NamedTuple{(:training, :validation, :inference), Tuple{DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.Crop{2, DataAugmentation.FromRandom}}, DataAugmentation.PinOrigin}}, DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.Crop{2, DataAugmentation.FromCenter}}, DataAugmentation.PinOrigin}}, DataAugmentation.Sequence{Tuple{DataAugmentation.CroppedProjectiveTransform{DataAugmentation.ScaleKeepAspect{2}, DataAugmentation.PadDivisible}, DataAugmentation.PinOrigin}}}}}, ImagePreprocessing{N0f8, 3, RGB{N0f8}, Float32}, OneHot{DataType}}}, Validation}, parallel=true, batchsize=6, collate=Val{true}()))
Let’s quickly verify that the images look as expected:
showbatch(task, first(train_dl))
Finally, we can build our model as follows. First, we define the backbone:
# Get backbone:
_backbone = Metalhead.ResNet(18, pretrain=true).layers[1][1:end-1]
Chain( Chain( Conv((7, 7), 3 => 64, pad=3, stride=2, bias=false), # 9_408 parameters BatchNorm(64, relu), # 128 parameters, plus 128 MaxPool((3, 3), pad=1, stride=2), ), Chain( Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 NNlib.relu, Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 NNlib.relu, Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 ), ), ), Chain( Parallel( addact(NNlib.relu, ...), Chain( Conv((1, 1), 64 => 128, stride=2, bias=false), # 8_192 parameters BatchNorm(128), # 256 parameters, plus 256 ), Chain( Conv((3, 3), 64 => 128, pad=1, stride=2, bias=false), # 73_728 parameters BatchNorm(128), # 256 parameters, plus 256 NNlib.relu, Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 NNlib.relu, Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 ), ), ), Chain( Parallel( addact(NNlib.relu, ...), Chain( Conv((1, 1), 128 => 256, stride=2, bias=false), # 32_768 parameters BatchNorm(256), # 512 parameters, plus 512 ), Chain( Conv((3, 3), 128 => 256, pad=1, stride=2, bias=false), # 294_912 parameters BatchNorm(256), # 512 parameters, plus 512 NNlib.relu, Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 NNlib.relu, Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 ), ), ), ) # Total: 45 trainable arrays, 2_782_784 parameters, # plus 30 non-trainable, 4_480 parameters, summarysize 10.649 MiB.
Here we have removed the final layer of the ResNet model, because we will instead use a custom head. We could use the taskmodel
function to build the model with an appropriate head automatically:
model = taskmodel(task, _backbone)
model.layers[end]
Chain( Parallel(vcat, AdaptiveMeanPool((1, 1)), AdaptiveMaxPool((1, 1))), Flux.flatten, Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 512, relu; bias=false), # 262_144 parameters ), Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 37; bias=false), # 18_944 parameters ), ) # Total: 6 trainable arrays, 283_136 parameters, # plus 4 non-trainable, 2_048 parameters, summarysize 1.089 MiB.
Equivalently, we could have obtained an appropriate head as follows:
h, w, ch, b = Flux.outputsize(_backbone, (_resize, _resize, 3, 1))
_head = Models.visionhead(ch, length(unique(labels)))
Chain( Parallel(vcat, AdaptiveMeanPool((1, 1)), AdaptiveMaxPool((1, 1))), Flux.flatten, Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 512, relu; bias=false), # 262_144 parameters ), Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 37; bias=false), # 18_944 parameters ), ) # Total: 6 trainable arrays, 283_136 parameters, # plus 4 non-trainable, 2_048 parameters, summarysize 1.089 MiB.
and then construct our model by chaining the backbone and head:
Chain(_backbone, _head)
Chain( Chain( Chain( Conv((7, 7), 3 => 64, pad=3, stride=2, bias=false), # 9_408 parameters BatchNorm(64, relu), # 128 parameters, plus 128 MaxPool((3, 3), pad=1, stride=2), ), Chain( Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 NNlib.relu, Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 NNlib.relu, Conv((3, 3), 64 => 64, pad=1, bias=false), # 36_864 parameters BatchNorm(64), # 128 parameters, plus 128 ), ), ), Chain( Parallel( addact(NNlib.relu, ...), Chain( Conv((1, 1), 64 => 128, stride=2, bias=false), # 8_192 parameters BatchNorm(128), # 256 parameters, plus 256 ), Chain( Conv((3, 3), 64 => 128, pad=1, stride=2, bias=false), # 73_728 parameters BatchNorm(128), # 256 parameters, plus 256 NNlib.relu, Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 NNlib.relu, Conv((3, 3), 128 => 128, pad=1, bias=false), # 147_456 parameters BatchNorm(128), # 256 parameters, plus 256 ), ), ), Chain( Parallel( addact(NNlib.relu, ...), Chain( Conv((1, 1), 128 => 256, stride=2, bias=false), # 32_768 parameters BatchNorm(256), # 512 parameters, plus 512 ), Chain( Conv((3, 3), 128 => 256, pad=1, stride=2, bias=false), # 294_912 parameters BatchNorm(256), # 512 parameters, plus 512 NNlib.relu, Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 ), ), Parallel( addact(NNlib.relu, ...), identity, Chain( Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 NNlib.relu, Conv((3, 3), 256 => 256, pad=1, bias=false), # 589_824 parameters BatchNorm(256), # 512 parameters, plus 512 ), ), ), ), Chain( Parallel(vcat, AdaptiveMeanPool((1, 1)), AdaptiveMaxPool((1, 1))), Flux.flatten, Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 512, relu; bias=false), # 262_144 parameters ), Chain( BatchNorm(512), # 1_024 parameters, plus 1_024 identity, Dense(512 => 37; bias=false), # 18_944 parameters ), ), ) # Total: 51 trainable arrays, 3_065_920 parameters, # plus 34 non-trainable, 6_528 parameters, summarysize 11.740 MiB.
With the model defined, we can now create a Learner
object from scratch:
# Task data loaders for new batch size:
batchsize = 64
train_dl, val_dl = taskdataloaders(train_data, val_data, task, batchsize)

# Set up loss function, optimizer, callbacks, and learner:
lossfn = Flux.Losses.logitcrossentropy
optimizer = Flux.Adam()
error_rate(ŷ, y) = mean(onecold(ŷ) .!= onecold(y))
callbacks = [ToGPU(), Metrics(error_rate)]

learner = Learner(
    model, (train_dl, val_dl),
    optimizer, lossfn,
    callbacks...
)
Learner()
FastAI.jl way
Most of the manual work above can be done automatically using the tasklearner function:
learner = tasklearner(
    task, train_data, val_data;
    backbone=_backbone, callbacks=callbacks,
    lossfn=lossfn, optimizer=optimizer, batchsize=batchsize,
)
Learner()
Note that in this case, we pass the raw, non-encoded data to the tasklearner function. This is because tasklearner automatically encodes the data using the task object.
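As a quick sanity check (a sketch, not part of the original code), we can confirm that the split container still yields raw observations, which tasklearner then encodes via the task:
# The container passed to tasklearner holds raw (image, label) observations;
# the task's encodings are applied internally.
raw_image, raw_label = getobs(train_data, 1)
summary(raw_image)   # an RGB image, e.g. Matrix{RGB{N0f8}}
raw_label            # a String label such as "abyssinian"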
We will begin by using the learning rate finder to find a good learning rate:
res = lrfind(learner)
Below we fine-tune the model for 5 epochs and then save it to disk:
finetune!(learner, 5, 2e-3)
Note that by default, this will train the model for one epoch with the pre-trained weights (our _backbone) completely frozen. In other words, only the parameters of our _head will be updated during this epoch, before the second phase of training begins.
Now we will fit the whole training cycle:
fitonecycle!(learner, 5, 2e-3)
savetaskmodel("artifacts/c5_resnet.jld2", task, learner.model, force=true)
Using our model, we can now make predictions on the validation set as follows:
task, model = loadtaskmodel("artifacts/c5_resnet.jld2")

samples = [getobs(data, i) for i in rand(1:numobs(val_data), 3)]
images = [sample[1] for sample in samples]
_labels = [sample[2] for sample in samples]

preds = predictbatch(task, model, images; device=gpu, context=Validation())
┌ Info: The GPU function is being called but the GPU is not accessible.
└ Defaulting back to the CPU. (No action is required if you want to run on the CPU).
3-element Vector{String}:
"bombay"
"persian"
"bombay"
The accuracy is given by:
acc = sum(_labels .== preds) / length(preds)
1.0
We can visualize the predictions as follows:
showsamples(task, collect(zip(images, preds)))