Figure 4. Active learning on benchmark training labels using
LL4AL and random sampling. These methods show similar performance on this dataset. A random sampling strategy can nearly
achieve 0.8 mIoU with only 40 training labels. Results
shown are averages over 5 trials.
Next, we evaluated
how the semantic segmentation model performs when a domain shift is introduced. Figure 5 shows the results of unsupervised DA with no target labels under a domain shift from a reef habitat to a mangrove habitat. As we
increase the number of training epochs from 10 to 20,
validation mIoU for the unsupervised DA method remains
consistently higher than source-only training, but the size
of the effect diminishes. In addition, although unsupervised
DA improves validation set performance, it is still not able
to reach an mIoU of 0.6, and therefore does not produce
performance that would be strong enough to deploy in a
real-world setting. Figure 6 provides a visual comparison
of these results and confirms this intuition. While the unsupervised DA model appears less prone than the source-only
method to misclassifying tree roots as fish, it still struggles to identify fish effectively.
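The output-space adversarial objective behind AdaptSegNet can be illustrated numerically as follows. This is a minimal sketch, not the authors' implementation: the function names and the adversarial weight value are our assumptions. A discriminator is trained to tell source from target segmentation maps, while the segmentation network is trained on source labels plus a term that rewards fooling the discriminator on target images:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on discriminator output probabilities."""
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def segmentation_loss_with_adv(seg_loss_src, d_on_target, lam_adv=0.001):
    """Total loss for the segmentation network: supervised loss on the
    labeled source batch plus an adversarial term pushing target
    predictions to look source-like (discriminator label 1).
    lam_adv is a small weight, as is typical in output-space
    adaptation; the exact value here is an assumption."""
    l_adv = bce(d_on_target, np.ones_like(d_on_target))
    return seg_loss_src + lam_adv * l_adv
```

When the discriminator already judges target outputs source-like (probabilities near 1), the adversarial term vanishes and the total loss reduces to the supervised source loss.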
Figure 5. A comparison of training a semantic segmentation network with and without unsupervised domain adaptation (AdaptSegNet) as the number of training epochs varies.
4.5 Active Domain Adaptation Experiments. We now
shift our attention to jointly evaluating active learning and
unsupervised DA. The question we aim to answer is: what
is the minimum number of training labels needed to produce strong mIoU results (0.8 or higher) when shifting to a
new domain?
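For reference, the mIoU metric used throughout can be computed as below for the binary fish/background task. This is the standard formulation; the function name is ours rather than from the benchmark code:

```python
import numpy as np

def mean_iou(pred, gt, num_classes=2):
    """Mean intersection-over-union across classes.
    pred, gt: integer class maps of the same shape
    (0 = background, 1 = fish)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```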
Using previous active DA experimental setups [5, 11] as a
guideline, we compare active DA approaches to a
baseline of training on source and then training again (fine-tuning) on target with randomly selected labels.
The training process for active DA is as follows:
• Iteratively train the supervised semantic segmentation
model on the labeled source training set (144 labels) and
run unsupervised DA (AdaptSegNet) [12] over 15
epochs; supervised training uses the labeled source set only,
while DA uses both the labeled source and unlabeled target training sets.
• Active learning step: apply a sampling strategy, either
random sampling or AADA [11], to select N frames
from the unlabeled target training set.
• Obtain labels for the N selected target training frames.
• Train the semantic segmentation model with the newly
labeled target set.
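The selection step in the loop above can be sketched as follows. This is a minimal illustration assuming frames are identified by index and that AADA scores have already been computed for each unlabeled frame; the function and argument names are ours:

```python
import random

def select_for_annotation(unlabeled_ids, n, strategy="random",
                          scores=None, seed=0):
    """Pick n target frames to send for annotation.
    'random' draws uniformly from the unlabeled pool; 'aada'
    takes the top-n frames by a precomputed importance-weighted
    uncertainty score (highest first)."""
    if strategy == "random":
        rng = random.Random(seed)
        return rng.sample(unlabeled_ids, n)
    # AADA-style: most target-like, most uncertain frames first
    return sorted(unlabeled_ids, key=lambda i: scores[i], reverse=True)[:n]
```

After annotation, the n frames move from the unlabeled target pool into the labeled target training set and the segmentation model is retrained.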
Results of active DA experiments are detailed in Table 2.
All results are evaluated on the target validation set. We are
able to achieve an mIoU of 0.8 when training with 20 target
labels as well as all source labels using a random active DA
strategy. From these results, it is unclear which training
method is best. Source training followed by fine-tuning
on target is the simplest method to implement, and with 80
labels, it performs as well as methods that use DA. Figure 6
displays the results of the best models for each method.
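For context, the AADA acquisition criterion [11] combines a discriminator-derived importance weight with predictive uncertainty. The sketch below is our reading of that criterion; averaging the per-pixel entropy is our own assumption for extending it to the segmentation setting:

```python
import numpy as np

def aada_score(d_src_prob, class_probs):
    """AADA-style acquisition score: importance weight times entropy.
    d_src_prob: discriminator probability that the frame is source-like.
    class_probs: per-pixel softmax output, shape (num_classes, H, W)."""
    eps = 1e-7
    weight = (1.0 - d_src_prob) / (d_src_prob + eps)  # ~ p_target / p_source
    entropy = -np.sum(class_probs * np.log(class_probs + eps), axis=0)
    return weight * float(entropy.mean())
```

Frames that look most target-like to the discriminator and whose predictions are most uncertain receive the highest scores and are selected for annotation first.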
Figure 6. Results of various approaches when trained within a reef habitat and evaluated within a mangrove habitat. For each method, the best result in terms of validation mIoU is shown. See Table 2 for validation set results.
Table 2. Reef → mangrove domain shift: results of training on images of reef habitats and validating on images of mangrove habitats. Each training component was run for 15 epochs.
4.6 Evaluating on the QUT dataset. Finally, we test the
best-performing models on another fish dataset, QUT [1].
The results of this experiment are shown in Figure 7. For this
comparison, the benchmark model is our replication of the original DeepFish semantic segmentation model (trained
with 310 samples). AADA and Random Fine-Tuning are
both trained on the domain-shifted dataset (144 source samples and 80 target samples). The benchmark model appears
to perform slightly better when fish are in a higher contrast
setting and worse when fish are in a lower contrast setting.
The AADA and Random Fine-Tuning approaches had to
adapt to a domain shift into the hazy, low-visibility
mangrove environment, which may explain why they handle low-contrast settings better. Additional out-of-sample validation would help strengthen this
finding.

Figure 7. The best-performing models from our experiments when applied to another fish dataset, QUT [1].
5. Discussion and Future Work
In this paper we explored active learning and domain
adaptation as approaches for reducing the barriers to adopting fish segmentation in the wild. Through experimentation,
we discovered that we only need about 13% (40 labels) of
the original training dataset (310 labels) to achieve strong
validation set performance of 0.79 mIoU. In addition, when
there is a domain shift, we can achieve 0.81 mIoU while
training with a dataset that is about half of the size (144
source labels and 20 target labels) of the original training
set. However, segmentation results visibly improve between
0.8 and 0.9 mIoU, and therefore it is likely preferable to
collect slightly more labels to produce a more robust result.
Random fine-tuning and AADA both achieve an mIoU of
0.90 with 224 total labels (144 source and 80 target), and appear to perform as well as, or even slightly better than, the original benchmark model when evaluated on an outside dataset.
AADA and random fine-tuning were trained in a more challenging, lower-visibility setting, which may be why these
methods appear to be slightly better than the benchmark model at distinguishing fish from their background environments in low-contrast settings. Overall, active domain
adaptation did not outperform random fine-tuning for this
dataset. This could be due to several factors: the dataset, the range of training label counts tested, or our implementation of AADA. We should note that, in the original publication, some of the AADA experiments also did not outperform random fine-tuning; the outcome appeared to depend on both
the dataset and number of training labels tested. A future
step we will take is to experiment on standard benchmark datasets for semantic segmentation, such as the GTA5 and
Cityscapes datasets. In addition, we plan to make changes
to the AADA method to further optimize it to the semantic segmentation task, such as by experimenting with batch
settings and sampling criteria. We will also experiment with extending more active DA
methods to semantic segmentation, such as ADA-CLUE.
References
[1] K. Anantharajah, Z. Ge, C. McCool, S. Denman, C. Fookes,
P.I. Corke, D. Tjondronegoro, and S. Sridharan. Local intersession variability modelling for object classification. IEEE
Winter Conference on Applications of Computer Vision, pages
309–316, 2014.
[2] FAO. The State of World Fisheries and Aquaculture 2020.
2020.
[3] Rafael Garcia, Ricard Prados, Josep Quintana, Alexander
Tempelaar, Nuno Gracias, Shale Rosen, Håvard Vågstøl, and Kristoffer Løvall. Automatic segmentation of fish using deep
learning with application to fish size measurement. ICES
Journal of Marine Science, pages 1354–1366, 2019.
[4] Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea,
Victor Villena-Martinez, and José García Rodríguez. A review on deep learning techniques applied to semantic segmentation. CoRR, abs/1704.06857, 2017.
[5] Viraj Prabhu, Arjun Chandrasekaran, Kate Saenko, and
Judy Hoffman. Active domain adaptation via clustering
uncertainty-weighted embeddings. In Proceedings of the
IEEE/CVF International Conference on Computer Vision,
pages 8505–8514, 2021.
[6] Antonio Pusceddu, Silvia Bianchelli, Jacobo Martín,
Pere Puig, Albert Palanques, Pere Masqué, and Roberto
Danovaro. Chronic and intensive bottom trawling impairs
deep-sea biodiversity and ecosystem functioning. Proceedings of the National Academy of Sciences, pages 8861–8866,
2014.
[7] Alzayat Saleh, Issam H. Laradji, Dmitry A. Konovalov,
Michael Bradley, David Vazquez, and Marcus Sheaves.
Deepfish. https://github.com/alzayats/DeepFish, 2020.
[8] Alzayat Saleh, Issam H. Laradji, Dmitry A. Konovalov,
Michael Bradley, David Vazquez, and Marcus Sheaves. A
realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis. Nature Scientific Reports, 2020.
[9] Burr Settles. Active learning literature survey. 2009.
[10] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully
convolutional networks for semantic segmentation. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
39(4):640–651, 2017.
[11] Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu,
Subhransu Maji, and Manmohan Chandraker. Active adversarial domain adaptation. CoRR, abs/1904.07848, 2019.
[12] Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang,
and M. Chandraker. Learning to adapt structured output
space for semantic segmentation. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2018.
[13] Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang,
and Tao Qin. Generalizing to Unseen Domains: A Survey
on Domain Generalization. CoRR, abs/2103.03097, 2021.
[14] Donggeun Yoo and In So Kweon. Learning loss for active
learning. CoRR, abs/1905.03677, 2019.
[15] Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, and Qingming Huang. State-relabeling adversarial
active learning, 2020.
[16] Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang,
and Fang Wen. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation. CoRR, abs/2101.10979, 2021.