<p>List of contributors xi</p> <p>About the editors xiii</p> <p>Preface xv</p> <p>1. The dramatically changing face of computer vision</p> <p>E.R. DAVIES</p> <p>1.1 Introduction – computer vision and its origins 1</p> <p>1.2 Part A – Understanding low-level image processing operators 4</p> <p>1.3 Part B – 2-D object location and recognition 15</p> <p>1.4 Part C – 3-D object location and the importance of invariance 29</p> <p>1.5 Part D – Tracking moving objects 55</p> <p>1.6 Part E – Texture analysis 61</p> <p>1.7 Part F – From artificial neural networks to deep learning methods 68</p> <p>1.8 Part G – Summary 86</p> <p>References 87</p> <p>2. Advanced methods for robust object detection</p> <p>ZHAOWEI CAI AND NUNO VASCONCELOS</p> <p>2.1 Introduction 93</p> <p>2.2 Preliminaries 95</p> <p>2.3 R-CNN 96</p> <p>2.4 SPP-Net 97</p> <p>2.5 Fast R-CNN 98</p> <p>2.6 Faster R-CNN 101</p> <p>2.7 Cascade R-CNN 103</p> <p>2.8 Multiscale feature representation 106</p> <p>2.9 YOLO 110</p> <p>2.10 SSD 112</p> <p>2.11 RetinaNet 113</p> <p>2.12 Detection performances 115</p> <p>2.13 Conclusion 115</p> <p>References 116</p> <p>3. Learning with limited supervision</p> <p>SUJOY PAUL AND AMIT K. ROY-CHOWDHURY</p> <p>3.1 Introduction 119</p> <p>3.2 Context-aware active learning 120</p> <p>3.3 Weakly supervised event localization 129</p> <p>3.4 Domain adaptation of semantic segmentation using weak labels 137</p> <p>3.5 Weakly-supervised reinforcement learning for dynamical tasks 144</p> <p>3.6 Conclusions 151</p> <p>References 153</p> <p>4. Efficient methods for deep learning</p> <p>HAN CAI, JI LIN, AND SONG HAN</p> <p>4.1 Model compression 159</p> <p>4.2 Efficient neural network architectures 170</p> <p>4.3 Conclusion 185</p> <p>References 185</p> <p>5. 
Deep conditional image generation</p> <p>GANG HUA AND DONGDONG CHEN</p> <p>5.1 Introduction 191</p> <p>5.2 Visual pattern learning: a brief review 194</p> <p>5.3 Classical generative models 195</p> <p>5.4 Deep generative models 197</p> <p>5.5 Deep conditional image generation 200</p> <p>5.6 Disentanglement for controllable synthesis 201</p> <p>5.7 Conclusion and discussions 216</p> <p>References 216</p> <p>6. Deep face recognition using full and partial face images</p> <p>HASSAN UGAIL</p> <p>6.1 Introduction 221</p> <p>6.2 Components of deep face recognition 227</p> <p>6.3 Face recognition using full face images 231</p> <p>6.4 Deep face recognition using partial face data 233</p> <p>6.5 Specific model training for full and partial faces 237</p> <p>6.6 Discussion and conclusions 239</p> <p>References 240</p> <p>7. Unsupervised domain adaptation using shallow and deep representations</p> <p>YOGESH BALAJI, HIEN NGUYEN, AND RAMA CHELLAPPA</p> <p>7.1 Introduction 243</p> <p>7.2 Unsupervised domain adaptation using manifolds 244</p> <p>7.3 Unsupervised domain adaptation using dictionaries 247</p> <p>7.4 Unsupervised domain adaptation using deep networks 258</p> <p>7.5 Summary 270</p> <p>References 270</p> <p>8. Domain adaptation and continual learning in semantic segmentation</p> <p>UMBERTO MICHIELI, MARCO TOLDO, AND PIETRO ZANUTTIGH</p> <p>8.1 Introduction 275</p> <p>8.2 Unsupervised domain adaptation 277</p> <p>8.3 Continual learning 291</p> <p>8.4 Conclusion 298</p> <p>References 299</p> <p>9. Visual tracking</p> <p>MICHAEL FELSBERG</p> <p>9.1 Introduction 305</p> <p>9.2 Template-based methods 308</p> <p>9.3 Online-learning-based methods 314</p> <p>9.4 Deep learning-based methods 323</p> <p>9.5 The transition from tracking to segmentation 327</p> <p>9.6 Conclusions 331</p> <p>References 332</p> <p>10. 
Long-term deep object tracking</p> <p>EFSTRATIOS GAVVES AND DEEPAK GUPTA</p> <p>10.1 Introduction 337</p> <p>10.2 Short-term visual object tracking 341</p> <p>10.3 Long-term visual object tracking 345</p> <p>10.4 Discussion 367</p> <p>References 368</p> <p>11. Learning for action-based scene understanding</p> <p>CORNELIA FERMÜLLER AND MICHAEL MAYNORD</p> <p>11.1 Introduction 373</p> <p>11.2 Affordances of objects 375</p> <p>11.3 Functional parsing of manipulation actions 383</p> <p>11.4 Functional scene understanding through deep learning with language and vision 390</p> <p>11.5 Future directions 397</p> <p>11.6 Conclusions 399</p> <p>References 399</p> <p>12. Self-supervised temporal event segmentation inspired by cognitive theories</p> <p>RAMY MOUNIR, SATHYANARAYANAN AAKUR, AND SUDEEP SARKAR</p> <p>12.1 Introduction 406</p> <p>12.2 The event segmentation theory from cognitive science 408</p> <p>12.3 Version 1: single-pass temporal segmentation using prediction 410</p> <p>12.4 Version 2: segmentation using attention-based event models 421</p> <p>12.5 Version 3: spatio-temporal localization using prediction loss map 428</p> <p>12.6 Other event segmentation approaches in computer vision 440</p> <p>12.7 Conclusions 443</p> <p>References 444</p> <p>13. Probabilistic anomaly detection methods using learned models from time-series data for multimedia self-aware systems</p> <p>CARLO REGAZZONI, ALI KRAYANI, GIULIA SLAVIC, AND LUCIO MARCENARO</p> <p>13.1 Introduction 450</p> <p>13.2 Base concepts and state of the art 451</p> <p>13.3 Framework for computing anomaly in self-aware systems 458</p> <p>13.4 Case study results: anomaly detection on multisensory data from a self-aware vehicle 467</p> <p>13.5 Conclusions 476</p> <p>References 477</p> <p>14. 
Deep plug-and-play and deep unfolding methods for image restoration</p> <p>KAI ZHANG AND RADU TIMOFTE</p> <p>14.1 Introduction 481</p> <p>14.2 Half quadratic splitting (HQS) algorithm 484</p> <p>14.3 Deep plug-and-play image restoration 485</p> <p>14.4 Deep unfolding image restoration 492</p> <p>14.5 Experiments 495</p> <p>14.6 Discussion and conclusions 504</p> <p>References 505</p> <p>15. Visual adversarial attacks and defenses</p> <p>CHANGJAE OH, ALESSIO XOMPERO, AND ANDREA CAVALLARO</p> <p>15.1 Introduction 511</p> <p>15.2 Problem definition 512</p> <p>15.3 Properties of an adversarial attack 514</p> <p>15.4 Types of perturbations 515</p> <p>15.5 Attack scenarios 515</p> <p>15.6 Image processing 522</p> <p>15.7 Image classification 523</p> <p>15.8 Semantic segmentation and object detection 529</p> <p>15.9 Object tracking 529</p> <p>15.10 Video classification 531</p> <p>15.11 Defenses against adversarial attacks 533</p> <p>15.12 Conclusions 537</p> <p>References 538</p> <p>Index 545</p>