Ruben De Blaere

Biology
Wood biology

Publication details

Title:

De Blaere, R., Van den Bulcke, J., Beeckman, H., Verwaeren, J. & Hubau, W. 2025. Identification of Congolese wood species by use of anatomical images and artificial intelligence. Gent : Ghent University. 259 p. (PR) ISBN: 9789463579025.

Category:

Book

Abstract:

Illegal logging significantly impacts forests, posing a high risk of irreversible damage, particularly when exploiting populations of protected species. Thirty to ninety percent of traded tropical timber is estimated to have been harvested illegally. In response, a range of regulatory frameworks has been established to enhance transparency and traceability within the timber supply chain. These include the Forest Law Enforcement, Governance and Trade (FLEGT) Action Plan, the EU Timber Regulation (EUTR) and EU Deforestation Regulation (EUDR), the U.S. Lacey Act, and the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). The effectiveness of these measures, however, hinges on a single critical capability: the rapid and accurate identification of wood species across all stages of the supply chain, from forest harvest sites to transport hubs, storage facilities, and ports. Wood identification methods encompass anatomical, chemical, and genetic techniques. Of these, anatomical assessment, based on the observation of wood's cellular and tissue structures, remains the most widely applied. The International Association of Wood Anatomists (IAWA) provides standardized anatomical features that underpin global anatomical assessment practices. While microscopic analysis offers high taxonomic resolution, it is constrained by the need for specialized equipment, expert training, and intensive sample preparation, limiting its utility for frontline enforcement. Macroscopic anatomical assessment, by contrast, relies on features visible to the naked eye or a hand lens and provides a more accessible, low-cost alternative applicable to freshly cut or sanded surfaces. Cross-sectional views are especially informative, revealing diagnostic traits such as vessel arrangement, ray width, and parenchyma distribution, features used in field keys and identification guides. Despite its operational simplicity, macroscopic anatomical assessment is limited by the relatively small number of observable features and significant intra-species variation, which can undermine diagnostic accuracy. 
Furthermore, the categorical nature of IAWA descriptors (e.g., "present," "variable," "absent") may obscure subtle but taxonomically relevant variation and introduce subjectivity in interpretation. Importantly, the actual discriminatory power of these features, especially in species-rich tropical regions, has not been systematically evaluated at scale, constraining the development of reliable identification keys and limiting the benchmarking of emerging data-driven models. Computer vision (CV) offers a compelling alternative by automating wood species recognition through image analysis. Convolutional neural networks (CNNs), in particular, can extract diagnostic features directly from macroscopic cross-sectional images, enabling rapid and accurate assessments. These models can operate on portable devices and have already been piloted in enforcement scenarios, such as with the XyloTron system in Ghana. However, existing computer vision applications face key limitations. Training and testing data are often derived from pristine specimens, raising concerns about robustness in real-world conditions, where samples may exhibit cracks, insect damage, discoloration, or fungal decay. These factors can obscure anatomical features and degrade model performance. Recent studies (e.g., Ravindran et al. 2023; Owens et al. 2024) have begun to quantify how occlusion of anatomical information affects classification accuracy, but they do not address the effect of including damaged specimens in the training data. Moreover, most CNN-based models adopt a multiclass classification approach, which assumes that all test samples belong to a fixed set of known species. This closed-world assumption limits their applicability in biodiverse regions, where unknown or closely related species may occur. Attempts to mitigate this limitation through opt-out classes or confidence thresholds have shown limited success. In response, the field is increasingly turning to open-world recognition frameworks. 
One promising approach is object re-identification, which encodes images as embedding vectors (anatomical fingerprints) within a learned feature space, where samples from the same species form clusters. Identification is then performed by comparing a query image to a reference database, allowing recognition of both known and novel species. Training strategies such as triplet learning and binary verification promote the learning of discriminative, species-specific representations. Although more complex to implement, since they require careful sample selection and robust loss functions, re-identification approaches offer generalizability to timbers beyond the taxonomic scope of the training data, and align more closely with expert practices, which often rely on comparative rather than categorical judgments. A central challenge in building effective wood identification systems is the immense diversity of tree species worldwide, particularly in tropical regions like the Democratic Republic of the Congo (DRC). Wood is a highly variable biological material, influenced by genetics, environmental conditions, and intra-tree location (e.g., trunk vs. branches, pith vs. bark). This complexity makes it difficult to define consistent diagnostic criteria. Additionally, taxonomic classifications are frequently revised, complicating database curation. Commercial trade further obscures species-level identification by grouping timbers under broad trade names based on physical properties rather than botanical identity. Current resources, such as InsideWood, macroHOLZdata, CITESWoodID, and the Atlas of Macroscopic Wood Identification, are valuable but may not capture the range of variability that wood anatomical features can show within a species. InsideWood, for example, compiles species descriptions and images but often generalizes from limited specimens and lacks traceability to physical references. 
This constrains assessments of intra-specific variation and undermines the reliability of training data for machine learning models. The limitations and underlying problems are described in Chapter 1, based on literature. To address these limitations, we built SmartWoodID, the largest reference database of annotated macroscopic cross-sectional images designed to support rapid and accurate wood identification in the DRC, a hotspot of illegal logging. SmartWoodID draws on the extensive Tervuren wood collection and includes multiple high-quality images per species, prioritizing economically important taxa. Unlike other databases, SmartWoodID intentionally includes specimens with natural variation and surface defects (e.g., cracks, fungal stains, insect damage) to better represent real-world conditions. This ensures that models trained on the dataset are more resilient to the variability encountered in field applications. The construction of this database is described in Chapter 2. In Chapter 3, we systematically evaluated the diagnostic utility of 31 standardized macroscopic features across 601 timber species using the SmartWoodID dataset. While useful for narrowing identifications within small taxonomic scopes, these features exhibited limited discriminatory power at broader scales. Predictive models based solely on expert-defined features achieved only ~50% genus-level accuracy across 56 commercial Congolese genera, with significant anatomical overlap and large candidate sets required for confident identifications. These findings highlight the need to reassess the diagnostic validity of traditional descriptors and suggest that future research should explore both improvements to feature-based methodologies and complementary techniques to enhance field applicability. To investigate whether visual information not captured by standard descriptors could improve identification, Chapters 4 and 5 explored CNN models trained on raw cross-sectional images. 
These models preserved nuanced patterns of colour and texture that experts intuitively use but which are not codified in existing feature sets. CNNs achieved substantially better performance than feature-based models, with precision, recall, and accuracy all exceeding 85% at the genus level. The correct genus was among the top six predictions in over 95% of test cases. These results affirm that raw visual data contains richer diagnostic information than codified features alone and that CV can effectively harness this information. Recognizing that real-world identifications often integrate anatomical descriptors and visual impressions, Chapter 6 examined whether expert-defined features could be used to refine CNN predictions. Re-ranking top-k CNN outputs using feature data led to modest improvements for some genera but reduced accuracy for others, including priority genera such as Khaya. This indicates that while hybrid approaches have potential, their implementation must be carefully tailored to avoid counterproductive effects. The study also addressed critical factors in building effective training databases. Empirical analyses showed that increasing specimen representation and scan area improved CNN performance, underscoring the value of capturing anatomical variability. Models trained on pristine image patches achieved higher recall (90.5%) than those trained on mixed (88.4%) or damaged (79.1%) patches. Grad-CAM visualizations confirmed that CNNs consistently focused on intact anatomical structures, further supporting the emphasis on high-quality specimen preparation and imaging during database construction. Chapter 5 evaluated scalable identification strategies for open-world contexts. Binary verification emerged as a particularly promising approach, comparing query images to reference samples to generate similarity scores rather than fixed labels. 
This method performed robustly, even for species not included in training, and proved effective in practical scenarios where the goal is to verify the plausibility of declared identities rather than assign definitive species labels. Binary verification matched or outperformed multiclass models in ranking the correct genus among top candidates for the 56 Congolese genera studied. We also evaluated triplet learning, which transforms images into numerical vectors representing anatomical patterns. These embeddings can be directly compared or fed into lightweight classifiers such as nearest-neighbour or XGBoost. Although initial performance was suboptimal, likely because of poor selection of hard training examples, the approach remains promising, particularly for integrating multimodal data (e.g., DNA, chemical signatures) into unified identification systems. In conclusion, this study provides the first direct comparison of expert-defined feature-based keys, CNN classifiers, and re-identification models under realistic, field-like conditions in the DRC. It emphasizes the need for open-world recognition frameworks and hybrid strategies to create robust, scalable, and interpretable systems for timber identification. The findings have direct implications for international efforts to combat illegal logging and lay the groundwork for next-generation, AI-enabled wood identification tools tailored to the operational realities of enforcement.
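
Illustrative sketch (not part of the published abstract): the re-identification idea summarised above, comparing a query embedding to a reference database, ranking candidate genera, checking a declared identity, and penalising poorly separated triplets, can be outlined in a few lines of Python. Everything here is an assumption made for illustration: the three-dimensional toy vectors, the placeholder labels "GenusB" and "GenusC", the cosine-similarity measure, the 0.9 threshold, and the 0.2 margin. In the thesis the embeddings come from trained networks and binary verification is itself a learned comparison, so this is a simplified stand-in rather than the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_genera(query, references, k=3):
    """Rank genera by their best-matching reference; return the top k."""
    best = {}
    for genus, embedding in references:
        score = cosine_similarity(query, embedding)
        if genus not in best or score > best[genus]:
            best[genus] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def verify_declared(query, references, declared_genus, threshold=0.9):
    """Simplified verification: is the declared genus plausible?

    Assumption: a fixed similarity threshold stands in for a learned
    pairwise verification model.
    """
    scores = [cosine_similarity(query, embedding)
              for genus, embedding in references
              if genus == declared_genus]
    return bool(scores) and max(scores) >= threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance (1 - similarity)."""
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy reference database: (genus label, hypothetical embedding vector).
references = [
    ("Khaya", [0.9, 0.1, 0.0]),
    ("Khaya", [0.8, 0.2, 0.1]),
    ("GenusB", [0.1, 0.9, 0.2]),
    ("GenusC", [0.0, 0.2, 0.9]),
]
query = [0.85, 0.15, 0.05]  # hypothetical embedding of an unknown sample

top_candidates = rank_genera(query, references, k=2)
declared_ok = verify_declared(query, references, "Khaya")
```

Because the ranking and the verification both work on distances to stored references, a new genus can be supported by adding reference embeddings to the list, without retraining a fixed-label classifier; this is the property that makes such schemes attractive in open-world settings.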
