Part-aware panoptic segmentation (PPS) requires (a) that each foreground object and background region in an image is segmented and classified, and (b) that all parts within foreground objects are segmented, classified and linked to their parent object. Existing methods approach PPS by separately conducting object-level and part-level segmentation. However, their part-level predictions are not linked to individual parent objects. Therefore, their learning objective is not aligned with the PPS task objective, which harms the PPS performance. To solve this, and make more accurate PPS predictions, we propose Task-Aligned Part-aware Panoptic Segmentation (TAPPS). This method uses a set of shared queries to jointly predict (a) object-level segments, and (b) the part-level segments within those same objects. As a result, TAPPS learns to predict part-level segments that are linked to individual parent objects, aligning the learning objective with the task objective, and allowing TAPPS to leverage joint object-part representations. With experiments, we show that TAPPS considerably outperforms methods that predict objects and parts separately, and achieves new state-of-the-art PPS results.
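To make the shared-query idea concrete, below is a minimal, hypothetical sketch of how a single set of queries could jointly produce object-level and part-level mask logits, so that each part prediction stays tied to its parent object's query. The module, dimensions, and head design are illustrative assumptions, not the actual TAPPS implementation.

```python
import torch
import torch.nn as nn

class SharedQueryHeads(nn.Module):
    """Illustrative heads: each query predicts one object mask plus the part
    masks inside that same object, so parts remain linked to their parent."""

    def __init__(self, embed_dim: int = 256, num_parts: int = 4):
        super().__init__()
        self.object_proj = nn.Linear(embed_dim, embed_dim)              # object-level mask embedding
        self.part_proj = nn.Linear(embed_dim, embed_dim * num_parts)    # part-level mask embeddings
        self.num_parts = num_parts

    def forward(self, queries, pixel_feats):
        # queries: (B, Q, C), pixel_feats: (B, C, H, W)
        B, Q, C = queries.shape
        obj_emb = self.object_proj(queries)                                   # (B, Q, C)
        part_emb = self.part_proj(queries).view(B, Q, self.num_parts, C)      # (B, Q, P, C)
        obj_logits = torch.einsum("bqc,bchw->bqhw", obj_emb, pixel_feats)     # (B, Q, H, W)
        part_logits = torch.einsum("bqpc,bchw->bqphw", part_emb, pixel_feats) # (B, Q, P, H, W)
        return obj_logits, part_logits  # part masks are indexed by the same query as their object

model = SharedQueryHeads()
obj, parts = model(torch.randn(2, 100, 256), torch.randn(2, 256, 64, 64))
print(obj.shape, parts.shape)  # (2, 100, 64, 64) and (2, 100, 4, 64, 64)
```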
CVPR
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi, Svetlana Orlova, Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
This work presents Adaptive Local-then-Global Merging (ALGM), a token reduction method for semantic segmentation networks that use plain Vision Transformers. ALGM merges tokens in two stages: (1) In the first network layer, it merges similar tokens within a small local window and (2) halfway through the network, it merges similar tokens across the entire image. This is motivated by an analysis in which we found that, in those situations, tokens with a high cosine similarity can likely be merged without a drop in segmentation quality. With extensive experiments across multiple datasets and network configurations, we show that ALGM not only significantly improves the throughput by up to 100%, but can also enhance the mean IoU by up to +1.1, thereby achieving a better trade-off between segmentation quality and efficiency than existing methods. Moreover, our approach is adaptive during inference, meaning that the same model can be used for optimal efficiency or accuracy, depending on the application.
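As a rough illustration of the first (local) merging stage, the sketch below merges tokens within non-overlapping 2×2 windows of a single image when all tokens in a window have high cosine similarity to the window mean. The window size, threshold, and function are illustrative assumptions, not the actual ALGM code.

```python
import torch
import torch.nn.functional as F

def local_window_merge(tokens: torch.Tensor, H: int, W: int, tau: float = 0.95):
    """Merge highly similar tokens within non-overlapping 2x2 windows.

    tokens: (H*W, C) patch tokens on an H x W grid (H and W assumed even).
    Windows whose four tokens all have cosine similarity >= tau to the window
    mean are replaced by that mean, so the output length varies per image.
    """
    C = tokens.shape[1]
    grid = tokens.view(H, W, C)
    # Gather non-overlapping 2x2 windows: (H//2, W//2, 4, C)
    windows = grid.view(H // 2, 2, W // 2, 2, C).permute(0, 2, 1, 3, 4).reshape(H // 2, W // 2, 4, C)
    mean = windows.mean(dim=2, keepdim=True)           # (H//2, W//2, 1, C)
    sim = F.cosine_similarity(windows, mean, dim=-1)   # (H//2, W//2, 4)
    mergeable = (sim >= tau).all(dim=-1)               # (H//2, W//2)

    out = []
    for i in range(H // 2):
        for j in range(W // 2):
            if mergeable[i, j]:
                out.append(mean[i, j])      # (1, C): one token replaces four
            else:
                out.append(windows[i, j])   # (4, C): keep the original tokens
    return torch.cat(out, dim=0)            # (N_reduced, C); adaptive per image

reduced = local_window_merge(torch.randn(32 * 32, 192), H=32, W=32)
print(reduced.shape)
```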
JMLR
Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
Ariyan Bighashdel, Daan de Geus, Pavol Jancura, and Gijs Dubbelman
CVPR Workshops
How to Benchmark Vision Foundation Models for Semantic Segmentation?
Tommie Kerssies, Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
Recent vision foundation models (VFMs) have demonstrated proficiency in various tasks but require fine-tuning with semantic mask labels for the task of semantic segmentation. Benchmarking their performance is essential for selecting current models and guiding future model developments for this task. The lack of a standardized benchmark complicates comparisons, therefore the primary objective of this paper is to study how VFMs should be benchmarked for semantic segmentation. To do so, various VFMs are fine-tuned under several settings, and the impact of individual settings on the performance ranking and training time is assessed. Based on the results, the recommendation is to fine-tune the ViT-B variants of VFMs with a 16x16 patch size and a linear decoder, as these settings are representative of using a larger model, more advanced decoder and smaller patch size, while reducing training time by more than a factor of 13. Using multiple datasets for training and evaluation is also recommended, as the performance ranking across datasets and domain shifts varies. Linear probing, a common practice for some VFMs, is not recommended, as it is not representative of end-to-end fine-tuning. The recommended benchmarking setup enables a performance analysis of VFMs for semantic segmentation. The findings of such an analysis reveal that promptable segmentation pretraining is not beneficial, whereas masked image modeling (MIM) with abstract representations appears crucial, even more so than the type of supervision.
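Purely as an illustration, the recommended setup can be summarized as a hypothetical configuration; the keys and values below simply restate the recommendations above and are not taken from an actual benchmark codebase.

```python
# Hypothetical configuration restating the recommended benchmarking setup.
BENCHMARK_SETUP = {
    "backbone": "ViT-B",                   # representative of larger model variants
    "patch_size": 16,                      # representative of smaller patch sizes
    "decoder": "linear",                   # representative of more advanced decoders
    "training": "end-to-end fine-tuning",  # linear probing is not representative
    "datasets": "multiple, for training and evaluation",  # rankings vary across datasets and domain shifts
}
```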
CVPR Workshops
Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation
Brunó B. Englert, Fabrizio J. Piva, Tommie Kerssies, Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4× speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6× more parameters, the VFM-UDA approach maintains a 3.3× speedup, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://github.com/tue-mps/vfm-uda.
2023
CVPR
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers
Chenyang Lu, Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, which we address in this work. We observe that, for semantic segmentation, multiple image patches can share a token if they contain the same semantic class, as they contain redundant information. Our approach leverages this by employing an efficient, class-agnostic policy network that predicts if image patches contain the same semantic class, and lets them share a token if they do. With experiments, we explore the critical design choices of CTS and show its effectiveness on the ADE20K, Pascal Context and Cityscapes datasets, various ViT backbones, and different segmentation decoders. With Content-aware Token Sharing, we are able to reduce the number of processed tokens by up to 44%, without diminishing the segmentation quality.
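As a rough sketch of the policy-network idea, the hypothetical module below scores each 2×2 superpatch for whether its four patches likely contain the same semantic class; superpatches above a threshold would then be embedded as a single shared token instead of four. The architecture and threshold are illustrative assumptions, not the actual CTS implementation.

```python
import torch
import torch.nn as nn

class SharingPolicy(nn.Module):
    """Illustrative class-agnostic policy: score each 2x2 superpatch for
    whether its four patches likely contain the same semantic class."""

    def __init__(self, in_channels: int = 3, patch_size: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            # One score per 2x2 group of patches (i.e., per superpatch).
            nn.Conv2d(in_channels, 64, kernel_size=patch_size * 2, stride=patch_size * 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, image):
        # image: (B, 3, H, W) -> superpatch scores: (B, 1, H/32, W/32)
        return torch.sigmoid(self.net(image))

policy = SharingPolicy()
scores = policy(torch.randn(1, 3, 512, 512))
share = scores > 0.5  # superpatches above the threshold would share a single token
print(scores.shape, int(share.sum()), "superpatches share a token")
```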
WACV
Intra-Batch Supervision for Panoptic Segmentation on High-Resolution Images
Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
Unified panoptic segmentation methods are achieving state-of-the-art results on several datasets. To achieve these results on large-resolution datasets, these methods apply crop-based training. In this work, we find that, although crop-based training is advantageous in general, it also has a harmful side-effect. Specifically, it limits the ability of unified networks to discriminate between large object instances, causing them to make predictions that are confused between multiple instances. To solve this, we propose Intra-Batch Supervision (IBS), which improves a network’s ability to discriminate between instances by introducing additional supervision using multiple images from the same batch. We show that, with our IBS, we successfully address the confusion problem and consistently improve the performance of unified networks. For the high-resolution Cityscapes and Mapillary Vistas datasets, we achieve improvements of up to +2.5 on the Panoptic Quality for thing classes, and even more considerable gains of up to +5.8 on both the pixel accuracy and pixel precision, which we identify as better metrics to capture the confusion problem.
WACV
Empirical Generalization Study: Unsupervised Domain Adaptation vs Domain Generalization Methods for Semantic Segmentation in the Wild
Fabrizio J. Piva, Daan de Geus, and Gijs Dubbelman
In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
For autonomous vehicles and mobile robots to safely operate in the real world, i.e., the Wild, scene understanding models should perform well in the many different scenarios that can be encountered. In reality, these scenarios are not all represented in the model’s training data, leading to poor performance. To tackle this, current training strategies attempt either to exploit additional unlabeled data with unsupervised domain adaptation (UDA), or to reduce overfitting using the limited available labeled data with domain generalization (DG). However, it is not clear from current literature which of these methods allows for better generalization to unseen data from the wild. Therefore, in this work, we present an evaluation framework in which the generalization capabilities of state-of-the-art UDA and DG methods can be compared fairly. From this evaluation, we find that UDA methods, which leverage unlabeled data, outperform DG methods in terms of generalization, and can deliver similar performance on unseen data as fully-supervised training methods that require all data to be labeled. We show that semantic segmentation performance can be increased by up to 30% for a priori unknown data without using any extra labeled data.
2021
CVPR
Part-aware Panoptic Segmentation
Daan de Geus, Panagiotis Meletis, Chenyang Lu, Xiaoxiao Wen, and Gijs Dubbelman
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
In this work, we introduce the new scene understanding task of Part-aware Panoptic Segmentation (PPS), which aims to understand a scene at multiple levels of abstraction, and unifies the tasks of scene parsing and part parsing. For this novel task, we provide consistent annotations on two commonly used datasets: Cityscapes and Pascal VOC. Moreover, we present a single metric to evaluate PPS, called Part-aware Panoptic Quality (PartPQ). For this new task, using the metric and annotations, we set multiple baselines by merging results of existing state-of-the-art methods for panoptic segmentation and part segmentation. Finally, we conduct several experiments that evaluate the importance of the different levels of abstraction in this single task.
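For intuition, and assuming PartPQ follows the same template as the standard Panoptic Quality metric, its general form can be sketched as follows (the notation is illustrative; see the paper for the exact definition):

```latex
\mathrm{PartPQ} \;=\; \frac{\sum_{(p,\,g)\,\in\,\mathit{TP}} \mathrm{IOU}_{\mathrm{p}}(p, g)}
                           {|\mathit{TP}| \;+\; \tfrac{1}{2}\,|\mathit{FP}| \;+\; \tfrac{1}{2}\,|\mathit{FN}|}
```

Here TP, FP, and FN denote the matched, unmatched predicted, and unmatched ground-truth segments, and IOU_p denotes a part-aware IoU for classes with parts, falling back to the standard segment IoU for classes without parts.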
2020
RA-L
Fast Panoptic Segmentation Network
Daan de Geus, Panagiotis Meletis, and Gijs Dubbelman