Benchmarking tree species classification from proximally sensed laser scanning data: Introducing the FOR-species20K dataset

FOR-species20K was just published in Methods in Ecology and Evolution!

With over 20,000 trees of 33 species, FOR-species20K is currently the largest dataset for benchmarking tree species classification from proximal laser scanning data (TLS/MLS/ULS).

This work is the result of a large collaboration through the COST Action 3DForEcoTech and SmartForest!

Congratulations to the authors Stefano Puliti, Emily R. Lines, Jana Müllerová, Julian Frey, Zoe Schindler, Adrian Straker, Matthew J. Allen, Lukas Winiwarter, Nataliia Rehush, Hristina Hristova, Brent Murray, Kim Calders, Nicholas Coops, Bernhard Höfle, Liam Irwin, Samuli Junttila, Martin Krůček, Grzegorz Krok, Kamil Král, Shaun R. Levick, Linda Luck, Azim Missarov, Martin Mokroš, Harry J. F. Owen, Krzysztof Stereńczak, Timo P. Pitkänen, Nicola Puletti, Ninni Saarinen, Chris Hopkinson, Louise Terryn, Chiara Torresan, Enrico Tomelleri, Hannah Weiser, Rasmus Astrup!

Proximally sensed laser scanning presents new opportunities for automated forest ecosystem data capture. However, a gap remains in deriving ecologically pertinent information, such as tree species, without additional ground data. Artificial intelligence approaches, particularly deep learning (DL), have shown promise towards automation. Progress has been limited by the lack of large, diverse, and, most importantly, openly available labelled single-tree point cloud datasets. This has hindered both (1) the robustness of the DL models across varying data types (platforms and sensors) and (2) the ability to effectively track progress, thereby slowing the convergence towards best practice for species classification.
To address the above limitations, we compiled the FOR-species20K benchmark dataset, consisting of individual tree point clouds captured using proximally sensed laser scanning data from terrestrial (TLS), mobile (MLS) and drone laser scanning (ULS). Compiled collaboratively, the dataset includes data collected in forests mainly across Europe, covering Mediterranean, temperate and boreal biogeographic regions. It also includes scattered tree data from other continents. In total, the dataset contains over 20,000 trees of 33 species, covering a wide range of tree sizes and forms. Alongside the release of FOR-species20K, we benchmarked seven leading DL models for individual tree species classification, including both point cloud (PointNet++, MinkNet, MLP-Mixer, DGCNNs) and multi-view 2D-based methods (SimpleView, DetailView, YOLOv5).
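To give a rough feel for what a multi-view 2D representation of a single-tree point cloud can look like, here is a minimal Python sketch that renders a tree into a few side-view occupancy images. It is an illustration only and not the SimpleView, DetailView or YOLOv5 pipeline from the paper; the function name `project_views`, the number of views and the grid size are all assumptions made for this example.

```python
import math


def project_views(points, n_views=4, size=32):
    """Render (x, y, z) points into n_views side-view occupancy grids.

    Hypothetical helper for illustration; not part of the FOR-species20K code.
    """
    if not points:
        raise ValueError("points must not be empty")
    # Centre the tree horizontally and put its lowest point at z = 0.
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    z0 = min(p[2] for p in points)
    centred = [(x - cx, y - cy, z - z0) for x, y, z in points]
    # One shared scale for all views so the tree keeps its aspect ratio.
    width = max(2 * math.hypot(x, y) for x, y, _ in centred)
    height = max(z for _, _, z in centred)
    extent = max(width, height) or 1.0
    views = []
    for k in range(n_views):
        angle = 2 * math.pi * k / n_views  # viewing direction around the stem
        grid = [[0] * size for _ in range(size)]
        for x, y, z in centred:
            # Orthographic projection: horizontal image axis u, vertical axis z.
            u = x * math.cos(angle) + y * math.sin(angle)
            col = min(size - 1, max(0, int((u / extent + 0.5) * size)))
            row = min(size - 1, max(0, int((1 - z / extent) * size)))
            grid[row][col] = 1  # mark the pixel as occupied
        views.append(grid)
    return views
```

Each returned grid is a binary image that a 2D classifier could consume. Real multi-view methods typically use depth or density values rather than plain occupancy, so treat this as a simplified stand-in.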
2D image-based models had, on average, higher overall accuracy (0.77) than 3D point cloud-based models (0.72). Notably, the performance was consistently >0.8 across scanning platforms and sensors, offering versatility in deployment. The top-scoring model, DetailView, demonstrated robustness to training data imbalances and effectively generalized across tree sizes.
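For readers less familiar with the metric, overall accuracy is simply the share of trees whose predicted species matches the reference label. The sketch below computes it together with per-class recall, which helps show how a model behaves on rare species. The function names and the toy labels are assumptions for illustration and are not taken from the benchmark code or data.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per reference class: correct predictions / samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


# Toy example with made-up labels (not drawn from the dataset).
y_true = ["Picea abies", "Picea abies", "Picea abies", "Fagus sylvatica"]
y_pred = ["Picea abies", "Picea abies", "Fagus sylvatica", "Fagus sylvatica"]
accuracy = overall_accuracy(y_true, y_pred)  # 3 of 4 correct: 0.75
recall = per_class_recall(y_true, y_pred)
```

Because overall accuracy weights every tree equally, common species dominate the score. Looking at per-class recall alongside it is one way to check for sensitivity to class imbalance.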
The FOR-species20K dataset represents an important asset for developing and benchmarking DL models for individual tree species classification using proximally sensed laser scanning data. As such, it serves as a crucial foundation for future efforts to accurately classify and map tree species at various scales using laser scanning technology, as it provides the complete code base, dataset, and an initial baseline representative of the current state of the art in point cloud tree species classification methods.

Paper, data, code and benchmark are available here:

paper: https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.14503

data: https://zenodo.org/records/13255198

code: https://github.com/stefp/FOR-species20K

benchmark: https://www.codabench.org/competitions/3667/

All images by Stefano Puliti.