Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models

LE, Thanh-Dung; HA, Vu Nguyen; NGUYEN, Ti Ti; Tran, Duc-Dung; Nguyen-Kha, Hung; Garces-Socarras, Luis; Carlos, Juan; Chatzinotas, Symeon

Unpublished conference/Abstract (Scientific congresses, symposiums and conference proceedings)

2026 • Submitted to an IEEE journal

Permalink
https://hdl.handle.net/10993/68214

Details

Keywords :

Earth observation; remote sensing; Transformers; on-board processing; pre-trained ViT; model robustness

Abstract :

Remote sensing (RS) image classification (IC) is a critical component of Earth observation (EO) systems and has traditionally been dominated by convolutional neural networks (CNNs) and other deep learning (DL) techniques. However, the advent of Transformer-based architectures and large-scale pre-trained models has shifted this trend by offering improved performance and efficiency. This study therefore focuses on identifying the most effective pre-trained model for land use classification in onboard satellite processing, emphasizing high accuracy, computational efficiency, and robustness against noisy data, a condition commonly encountered during satellite-based inference. Through extensive experimentation, we compare the performance of traditional CNN-based models, ResNet-based models, and various pre-trained Vision Transformer (ViT) models. Our findings demonstrate that pre-trained ViT models, particularly MobileViTV2 and EfficientViT-M2, outperform models trained from scratch in terms of accuracy and efficiency. These models achieve high performance with reduced computational requirements and exhibit greater resilience during inference under noisy conditions. While MobileViTV2 excels on clean validation data, EfficientViT-M2 proves more robust when handling noise, making it the most suitable model for onboard satellite EO tasks. Our experimental results demonstrate that EfficientViT-M2 is the optimal choice for reliable and efficient RS-IC in satellite operations, achieving 98.76% accuracy, precision, and recall. Specifically, EfficientViT-M2 delivers the highest performance across all metrics, excels in training efficiency (1,000 s) and inference time (10 s), and demonstrates greater robustness (overall robustness score of 0.79). In addition, EfficientViT-M2 consumes 63.93% less power than MobileViTV2 (79.23 W) and 73.26% less power than SwinTransformer (108.90 W), highlighting its significant advantage in energy efficiency. Reproducible code for data augmentation, noisy data, training, and inference is available in our shared GitHub repository.
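The abstract reports power savings only as percentages relative to two baselines, without stating the absolute power draw of EfficientViT-M2. As a rough illustration of the arithmetic, the sketch below back-calculates the implied value from each quoted baseline. The function name `implied_power` is hypothetical and not taken from the authors' repository, and the results are estimates derived from the quoted figures, not measurements reported in the paper.

```python
def implied_power(baseline_watts: float, reduction_percent: float) -> float:
    """Return the power implied by a percentage reduction from a baseline.

    Hypothetical helper for illustration only: it applies
    P = P_baseline * (1 - reduction / 100).
    """
    return baseline_watts * (1.0 - reduction_percent / 100.0)


# Figures quoted in the abstract.
mobilevit_v2_watts = 79.23   # MobileViTV2 power draw (W)
swin_watts = 108.90          # SwinTransformer power draw (W)

# Back-calculated estimates of the EfficientViT-M2 power draw.
from_mobilevit = implied_power(mobilevit_v2_watts, 63.93)
from_swin = implied_power(swin_watts, 73.26)

print(f"Implied by the MobileViTV2 comparison:     {from_mobilevit:.2f} W")
print(f"Implied by the SwinTransformer comparison: {from_swin:.2f} W")
```

Both comparisons place the implied EfficientViT-M2 power draw at roughly 29 W (about 28.6 W and 29.1 W respectively). The abstract does not explain the small difference between the two estimates, so the full text remains the reference for the measured value.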

Disciplines :

Electrical & electronics engineering

Author, co-author :

LE, Thanh-Dung

HA, Vu Nguyen ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SigCom

NGUYEN, Ti Ti ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SigCom

Tran, Duc-Dung

Nguyen-Kha, Hung

Garces-Socarras, Luis

Carlos, Juan

Chatzinotas, Symeon

External co-authors :

Language :

English

Title :

Onboard Satellite Image Classification for Earth Observation: A Comparative Study of ViT Models

Publication date :

2026

Event name :

Submitted to an IEEE journal

Event date :

April 2026

By request :

Yes

Available on ORBilu :

since 18 April 2026
