Wing Interferential Patterns (WIPs) and machine learning, a step toward automated tsetse (Glossina spp.) identification

Wing Interferential Patterns according to Glossina genera, species, sex, and samples

To set up a protocol by which WIPs can be acquired and used for Glossina species recognition, we performed experiments allowing the visualization of WIPs under various conditions. First, the conservation of the interferential pattern revealed on a Glossina wing was analyzed according to the radial symmetry (intrados/extrados) and the axial symmetry (left and right). Following the process described in Fig. 1, pictures of Glossina specimens were taken. As exemplified in Fig. 5A, showing WIPs of G. f. fuscipes, G. m. morsitans, and G. p. gambiensis, no striking differences in the pattern of interferential colors were observed according to the wing position during image acquisition (intrados/extrados or right/left). Therefore, the positioning of the wing on the slide did not influence the WIP generated. To delineate WIP reproducibility, we further analyzed the stability of this phenotype on a large series of male and female specimens of various species (Fig. 5B). We noticed variation in the pattern of interferential colors recorded on wings. This pattern was species-specific and presented a faint, recurrent sexual dimorphism that must be further investigated. Finally, we investigated the stability of WIPs according to the sampling date and the preservation mode. Discrete variations in the pattern of interferential light were recorded; still, the overall pattern organization and its color composition remained similar, demonstrating that consistent interferential patterns can be generated from samples preserved in ethanol or air-dried for an extended period (Fig. 5C).

Figure 5

Variability of WIPs generated on Glossina spp. according to (A) the wing orientation (top, G. f. fuscipes; middle, G. m. morsitans; bottom, G. p. gambiensis), (B) the samples, and (C) the preservation history. I, intrados; E, extrados.

The wing color patterns were manually drawn from the pictures taken (Fig. 6). This first pictorial key was drawn using fresh field-caught specimens from Cameroon (P. Grebaut) or Ivory Coast, previously used to perform geomorphometric analysis28, and, for most specimens, belonging to the IRD collection (see Table 1 for the characteristics of the specimens). For specimens of the collection, the identification was performed by an expert entomologist at the time of the flies’ capture (Table 1). The identification of specimens from field traps was performed as previously reported7.

Figure 6

Selected Glossina spp. pictorial key, deduced from Wing Interferential Patterns.

Strikingly, four prominent interferential colors are revealed on Glossina wings: green, yellow, blue, and red. The pictorial key shows the distribution of the red, blue, and yellow color patterns according to the species and subspecies gathered during the study (Fig. 6). The green color is not reported in the pictorial key because it represents the interferential background color and is thus figured as white. The color diversity appeared lower for wings of Glossina species belonging to the Austenina subgenus than for those belonging to the Nemorhina or Glossina subgenera. Differences between species appeared related to the shape of the red pattern. Glossina species belonging to the Nemorhina and Glossina subgenera appeared to bear multicolor WIPs. Sexual dimorphism of this character was present in all samples of the species in which both males and females were studied.

The interference pattern evidenced on the wing of the tsetse fly can help to set up an automatic identification system. A series of pictures was taken of the Glossina species and subspecies currently described, including the most important vectors of HAT and AAT. To test whether such an analysis can be considered a fingerprinting approach for Glossina species identification, it is essential to discriminate most, if not all, Glossina species or subspecies currently known. As shown in Table 1, 23 out of the 31 presently referenced Glossina species and subspecies were collected. They originate from the field, from the IRD collection, or from laboratory-reared Glossina flies.

Training and classification

We explored training classifiers on the Glossina dataset alone and on a dataset to which negative samples containing various non-Glossina insects were added. Training the CNN (Convolutional Neural Network) on a combination of Glossina and non-Glossina images can improve the model’s ability to make correct predictions. The database was built from a total of 5516 pictures of dipteran insect WIPs, of which 1766 pictures belonged to Glossina species. We deliberately cropped and adjusted all photos to the same dimensions, implying that (1) the size of the wings cannot be used as a discriminative criterion for the classification process, and (2) landmarks cannot be used to classify the wings. We primarily focused our analysis on Glossina species and subspecies documented as proven vectors of HAT and AAT, i.e., G. p. palpalis, G. p. gambiensis, G. f. fuscipes, G. f. quanzensis, G. f. martinii, G. m. morsitans, G. m. submorsitans, G. m. centralis, G. tachinoides, G. caliginea, G. swynnertoni, G. pallidipes, and G. longipalpis. Our database contains more than 80% of the Glossina species with medical or veterinary interest. Among the Glossina species involved in Trypanosoma transmission, only G. f. martinii was absent from our database. The dataset we constituted represents about 70% of the described species diversity, as it contains WIP pictures of 23 Glossina species and subspecies.
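As an illustration of this preprocessing step, the sketch below shows how wing photos could be cropped and resized to identical dimensions before training; the directory layout, target size, and torchvision transforms are illustrative assumptions, not the exact settings used in this study.

```python
# Minimal preprocessing sketch: force every WIP photo to the same dimensions so
# that neither wing size nor landmark positions can act as classification cues.
# TARGET_SIZE and the folder layout (root/<class_name>/*.jpg) are assumptions.
from pathlib import Path
from PIL import Image
from torchvision import transforms

TARGET_SIZE = (224, 224)  # assumed CNN input size

preprocess = transforms.Compose([
    transforms.Resize(256),              # shrink the shorter side
    transforms.CenterCrop(TARGET_SIZE),  # crop every photo to identical dimensions
    transforms.ToTensor(),               # scale pixel values to [0, 1]
])

def load_dataset(root: str):
    """Yield (image_tensor, class_index) pairs from root/<class_name>/*.jpg."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    for idx, cls in enumerate(classes):
        for img_path in (Path(root) / cls).glob("*.jpg"):
            img = Image.open(img_path).convert("RGB")
            yield preprocess(img), idx
```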

Unfortunately, some species were represented by only a few images, and for 9 of them, no more than 15 specimens were used (Table 1). We then ascertained the accuracy of the classification process at various taxonomic levels: genus, species, and subspecies. The classifier reached a high accuracy level of nearly 100% at the genus level, demonstrating its competence in the classification/recognition of the Glossina genus (see Table 3A). In the next step, its performance in correctly assigning Glossina pictures at the species level was further challenged on species complexes, i.e., G. fuscipes, G. palpalis, and G. morsitans. The classifier again performed well, with accuracy ranging from 90% for the G. fuscipes and G. morsitans complexes to 100% for the G. palpalis complex (see Table 3B). We then further assessed the classification process at the species and subspecies levels. At the time of the experiment, only 45% of Glossina species had entries with more than 8 pictures. Nevertheless, for almost all specimens tested, the classification accuracy ranged from 33 to 100%. Glossina palpalis palpalis and G. p. gambiensis are primary vectors of HAT in West Africa. They can hybridize in the laboratory, but offspring males are sterile23. These two subspecies are challenging to identify, even if males show some morphological differences in the inferior clasper’s terminal dilation of their genitalia24. The deep learning methodology was nevertheless highly accurate, with an accuracy of up to 97% (see Table 3C). Although the algorithm failed to identify 2 Glossina classes during the test, this can be explained, in part, by an extremely low number of WIP pictures representative of these species in the test dataset (only 2 images for each class). Based on our dataset-splitting approach, these classes account for only 8 training images each. This is a case of overfitting caused by insufficient training data, despite our self-imposed constraint of 10 total images per class. The results on the accuracy of Glossina classification are summarized in Table 3C.

Table 3 Results and confusion matrices of the classification of Glossina versus other genera (A), of specimens belonging to the palpalis, morsitans, and fuscipes complexes (B), and at the species and subspecies level (C).
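The per-class figures reported in Table 3 could be tabulated as sketched below, with a confusion matrix and per-class accuracy computed from the CNN’s predictions on the held-out test set; y_true, y_pred, and the class names are placeholders rather than the study’s actual outputs.

```python
# Sketch of how per-class accuracy and the confusion matrix behind Table 3
# could be computed. y_true / y_pred are integer label arrays from the test set.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_accuracy(y_true, y_pred, class_names):
    cm = confusion_matrix(y_true, y_pred, labels=range(len(class_names)))
    correct = np.diag(cm)    # correctly classified specimens per class
    totals = cm.sum(axis=1)  # total test specimens per class
    for name, c, n in zip(class_names, correct, totals):
        acc = c / n if n else float("nan")
        print(f"{name:<25s} {c}/{n} ({acc:.0%})")
    return cm

# Hypothetical usage at the species-complex level:
# per_class_accuracy(y_true, y_pred, ["G. fuscipes", "G. morsitans", "G. palpalis"])
```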

Misclassified pictures

Inspecting a machine learning model for weak points helps identify underlying issues. This can be performed by reviewing the mispredicted images, which gives insights into what makes a photo hard for the model to classify. Selected examples are presented in Fig. 7. Deep learning models rely more on textures than on shapes. Therefore, a more extensive training set can avoid photo or sample quality pitfalls, and avoiding confusing setups when taking photos can improve the accuracy of the automated classification. A guideline can be added to the application to advise participants to take high-quality images of Glossina samples.
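A review of this kind can be automated; the sketch below, assuming a PyTorch model and a test loader that also yields file paths, copies every mispredicted image into a folder, with a file name recording its true and predicted classes for visual inspection.

```python
# Sketch of the mis-prediction review: keep every test image whose predicted
# class differs from its true class so it can be inspected by eye.
import shutil
from pathlib import Path
import torch

@torch.no_grad()
def collect_misclassified(model, test_loader, class_names, out_dir="misclassified"):
    model.eval()
    Path(out_dir).mkdir(exist_ok=True)
    for images, labels, paths in test_loader:  # loader assumed to yield file paths too
        preds = model(images).argmax(dim=1)
        for path, true_idx, pred_idx in zip(paths, labels.tolist(), preds.tolist()):
            if true_idx != pred_idx:
                # file name records "true class as predicted class" for quick review
                dest = Path(out_dir) / (
                    f"{class_names[true_idx]}__as__{class_names[pred_idx]}__{Path(path).name}"
                )
                shutil.copy(path, dest)
```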

Figure 7

Misclassified pictures: selected examples of images mistakenly predicted by the CNN model.

Identification process examination following image transformation and cropping

Overall, computer-aided and manual transformations of pictures provide a tool to test the robustness of the identification process, mimicking blur during image acquisition, image quality degradation, and loss of integrity of the samples’ wings. In addition, raw images from the acquisition process were tested for identification (Table 4). The modifications were performed on images chosen from the training dataset (Table 4A) and the test dataset (Table 4B). In both cases, alterations of the image impact the identification accuracy. First, tsetse specimens insufficiently represented in the dataset failed to be identified (Glossina fusca fusca Walker, 1849). For most specimens, blurring (Gaussian or lens) did not drastically modify the ability of the trained model to identify specimens accurately at the species level. Image quality degradation affected the identification of some Glossina specimens. For G. f. fuscipes, G. f. quanzensis, and G. tachinoides, the transformations did not impact the species identification, except in the case of scrambling of the image’s RGB channels. In conclusion, image alteration impeded the recognition capacity of our trained model.

Table 4 Identification accuracy on manually and computationally transformed WIP pictures; modified pictures were taken from the training dataset (A) and the test dataset (B).
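The transformations listed in Table 4 can be reproduced programmatically; the sketch below applies Gaussian blur, lossy re-encoding, and RGB-channel scrambling to a single WIP picture, with the blur radius and quality factor chosen arbitrarily for illustration rather than taken from the study.

```python
# Sketch of the robustness test: degrade a WIP picture in controlled ways and
# re-submit each variant to the trained classifier. Parameter values are
# illustrative, not those used in the study.
import io
import random
from PIL import Image, ImageFilter

def degraded_variants(img: Image.Image):
    variants = {"original": img}
    # Blur mimicking poor focus during image acquisition
    variants["gaussian_blur"] = img.filter(ImageFilter.GaussianBlur(radius=3))
    # Lossy JPEG re-encoding mimicking image quality degradation
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=10)
    variants["low_quality"] = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    # Scramble the RGB channels
    channels = list(img.convert("RGB").split())
    random.shuffle(channels)
    variants["rgb_scrambled"] = Image.merge("RGB", channels)
    return variants

# Each variant can then be preprocessed and passed through the trained model to
# check whether the predicted species remains stable.
```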
