Table 2 Details of artificial intelligence methods applied and outcomes in studies for appendicitis diagnosis

From: Artificial Intelligence and Acute Appendicitis: A Systematic Review of Diagnostic and Prognostic Models

Study, year

Input features

Training/validation strategy

Performance

Comparative algorithms and scoring metrics

Key findings

Limitations

Park et al. [1], 2023 (South Korea)

CT slices

DL model trained using fivefold cross-validation and separate test dataset

Each fold had 60–70% training samples, 15–20% validation samples for parameter tuning, and 15–25% test samples for final evaluation

Single-Slice method: Sensitivity: 85.6%, Specificity: 96%, PPV: 85.4%, Accuracy: 86.1%, AUC: 0.937

RGB method: Sensitivity: 87.8%, Specificity: 88%, PPV: 87.1%, Accuracy: 87.9%, AUC: 0.951

NR

CNN performed better with serial slices and the RGB method than with a single-slice method

1. Retrospective study

2. Limited acute diverticulitis CT images and data augmentation used for balance

3. Excluded complicated diverticulitis cases

4. No tool was developed for condition localization in CT images

5. CNN performance was not evaluated with coronal reformatted CT images

Akbulut et al. [2], 2023 (Turkey)

Higher TBil, WBC, neutrophil, WLR, NLR, CRP, and WNR values and lower PNR, PDW, and MCV values

The persistence method was repeated 50 times with different seeds for model robustness. CatBoost model predicted AA, with optimized hyperparameters using grid search with tenfold cross-validation and 5 replicates

CatBoost: Sensitivity 84.2%, Specificity 93.2%, AUC 0.947, Accuracy 88.2%, F1-score 88.7%

NR

The CatBoost ML model demonstrated high accuracy in distinguishing between AA and NA patients, achieving an 88.2% accuracy rate

1. The study is retrospective and lacks comprehensive clinical data

2. Radiological data are missing for approximately 11% of the patient sample

3. Conducted at a single institution

Ghareeb et al. [3], 2021 (Egypt)

Age, gender, marital status, obesity, diabetes mellitus, hypertension, hepatitis B virus infection, hepatitis C virus infection, autoimmune diseases, history of similar pain, duration of pain, site of pain, nausea, vomiting, anorexia, body temperature, CBC, Hg, ultrasound findings

It assessed various learning algorithms and selected the best-performing model based on accuracy and AUC. Principal Component Analysis (PCA) was used for precise feature selection without excluding any variables. An optimization process reduced prediction errors, and external validation was done with a separate dataset. Variable importance was ranked, and Ensemble Bag optimization with 30 iterations minimized diagnostic classification errors to 0.129

The best model performance (Subspace KNN model): Sensitivity: 100%, Specificity: 80%, PPV: 97.9%, NPV: 96.7%, Accuracy: 91.1%, AUC: 0.82

Other models accuracies: DT: 84.4%, LR: 87.5%, NB: 88.8%, SVM: 89.3%, KNN: 89.3%

Alvarado score: Sensitivity: 68.2%, Specificity: 80%, PPV: 96.7%, NPV: 22.9%, Accuracy: 69.5%

US alone: Sensitivity: 50.8%, Specificity: 73.5%, PPV: 94.7%, NPV: 16.9%, Accuracy: 58.6%

Combined US and Alvarado: Sensitivity: 69.6%, Specificity: 100%, PPV: 100%, NPV: 28%, Accuracy: 72.8%

1. The diagnostic accuracy of the AI model outperforms both the Alvarado score alone and the Alvarado score combined with US criteria

2. The AI model excels in diagnostic accuracy, except for specificity, which is higher when combined with specific criteria

1. Single-center study

2. Small number of patients

3. Exclusion of patients with colon cancer

4. Limited real-world applicability of the AI model

5. Inclusion of patients with pathologies other than appendicitis may affect results

Rajpurkar et al. [4], 2020 (USA)

CT scan

Created development and test sets using stratified random sampling with a balance of about 50% appendicitis examinations and 50% non-appendicitis examinations

Pretrained on video images: Sensitivity: 78.4%, Specificity: 66.7%, Accuracy: 72.5%, AUC: 0.810

Not pretrained on video images: Sensitivity: 78.4%, Specificity: 35.3%, Accuracy: 56.9%, AUC: 0.724

NR

1. Small training dataset used; video pretraining compensates for dataset size

2. Model technique applicable for future medical image DL studies

1. Small training dataset, no investigation into video pretraining’s impact with data size

2. Pretraining model effect explored using Kinetics dataset

3. Single-center study

4. The model does not differentiate between CA and UCA

Park et al. [5], 2020 (USA)

CT scan

Used eightfold cross-validation. The dataset is split into 8 parts, 7 for training, and 1 for testing. Hyperparameters set based on initial training, used for all 8 models. External validation with CT data from two institutions on 8 trained CNN models. The deep CNN used in the algorithm was built with six convolutional layers, three max-pooling layers, and two fully connected layers

Training and internal validation: Sensitivity: 90.2%, Specificity: 92%, Accuracy: 91.5%

External validation, institution 1 (Sensitivity: 88.5%, Specificity: 91.2%, Accuracy: 90%), institution 2 (Sensitivity: 95%, Specificity: 100%, Accuracy: 97.5%)

NR

Feasibility of CNN-based diagnosis algorithm for diagnosing acute appendicitis using CT data

1. Excluded patients with tumors in the appendix who had surgical removal

2. Trained and tested network using manually extracted 4 cm³ appendix region data

Zhao et al. [6], 2020 (China)

More than 800 proteins in each urine sample

Detected outliers in the discovery dataset (AA outliers and CON outliers) against a normal urine database (495 samples) to identify markers indicating changes under pathological conditions

RF model: Sensitivity: 81.2%, Specificity: 84.4%, Accuracy: 83.6%

SVM model: Sensitivity: 25%, Specificity: 97.8%, Accuracy: 78.7%

NB model: Sensitivity: 68.8%, Specificity: 71.1%, Accuracy: 70%

NR

1. The urinary proteomic system finds markers for AA vs. other acute abdomens

2. The RF model has high specificity in AA diagnosis without clinical signs

3. Noninvasive urinary markers have potential for clinical use

1. No validation with a larger sample size

2. No absolute quantification for feature proteins

3. No exploration of combining urinary markers with metabolites

Ramirez-GarciaLuna et al. [7], 2020 (Mexico)

Abdominal skin IRT images

Training and validation cohorts had balanced distributions of patients in three categories (“healthy,” “appendicitis,” and “no appendicitis”) with nine relevant predictors. The final model was built by considering the accuracy-complexity trade-off

RF model: Accuracy: 76.9%, Sensitivity: 91.3%, Specificity: 56.3%, PPV: 75%, NPV: 81.8%

NR

1. IRT may complement diagnostic workup for appendicitis

2. IRT is a timesaving, low-cost, noninvasive imaging modality

3. IRT has the potential to improve the clinical decision-making process

1. Group sizes unequal, non-appendicitis smaller

2. Minimal clinical/laboratory differences between groups

3. No comparison of IRT with CT scan, the gold standard

Kang et al. [8], 2019 (South Korea)

Rebound tenderness severity, migration, urinalysis, symptom duration, leukocytosis, neutrophil count, and CRP levels

The DT comprises 11 final nodes. The severity of rebound tenderness was selected in the parent node

DT model: AUC: 0.85 (95% CI: 0.799–0.893)

Alvarado score: AUC: 0.695

AAS score: AUC: 0.749

Eskelinen score: AUC: 0.715

1. New clinical approach using DT aids AA diagnosis in adults with equivocal CT findings

2. Helps decide the disposition of patients with equivocal results

1. Retrospective design

2. A small patient population in some DT nodes

3. CT findings interpreted by only one radiologist

Gudelis et al. [9], 2019 (Spain)

Blumberg sign, pain migration, increased pain, increased pain with movement, pain when coughing, anorexia, temperature, number of leukocytes, hours of evolution, and CRP levels

The ANN model was implemented in the Alyuda NeuroIntelligence program, which uses MLP methodology with BP. All candidate variables were included in the “full model” type. The models had automatic variable selection capabilities based on significance or hierarchy. Internal validation was conducted using cross-validation with 10 partitions

ANN model: for all diagnoses: PCC: 75%, for AA diagnosis: PCC: 93.5%, AUC: 0.95

CHAID: for all diagnoses: PCC: 74.2%, for AA diagnosis: PCC: 81.7%, AUC: 0.93

1. Professionals treating RIF pain can benefit from interpretable models

2. The CHAID model offers a classification with more than two possibilities (AA vs. non-AA)

3. Validation in a larger series is necessary to confirm the model’s performance

1. Assignment of groups not validated by literature

2. Small sample size, particularly in the NIRIF and IRIF diagnostic groups

3. Limited capacity of models that only compare AA versus other conditions in real patient management

Shahmoradi et al. [10], 2018 (Iran)

Demographic, symptoms, clinical signs, laboratory findings

Used an MLP network with two hidden layers (7 and 5 neurons) and specific activation functions. For RBFN, specific activation functions and rescaling methods were applied

MLP model: Sensitivity: 80%, Specificity: 97.5%, PPV: 92.3%, NPV: 93%, Accuracy: 92.9%, AUC: 0.832

RBFN model: Sensitivity: 28%, Specificity: 87.8%, PPV: 64.2%, NPV: 81.8%, Accuracy: 77.6%, AUC: NR

LR model: Sensitivity: 58.3%, Specificity: 93.2%, PPV: 75.7%, NPV: 86%, Accuracy: 83.9%, AUC: 0.808

NR

1. MLP model outperforms LR in sensitivity, specificity, and accuracy

2. Essential predictors: leukocytosis, sex, tenderness, right iliac fossa pain

1. Small sample size

2. Lack of imaging techniques

3. Risk of misdiagnosis

4. Limited variables

Jamshidnezhad et al. [11], 2017 (Iran)

Age, first abdominal pain time, initial pain site, RLQ abdomen shift, WBC, neutrophil count

The model was trained 10 times to assess reliability. Each time, an independent dataset was used for testing. Training took 135 s, while testing achieved diagnosis results in less than 1 s

Fuzzy-rule-based system: Accuracy rate (presence: 92%, high risk: 90%, reject: 87.5%, average: 89.9%)

US results: Sensitivity: 74%, Specificity: 43%

1. The proposed evolutionary algorithm enhances the knowledge base in the fuzzy rule-based system

2. Optimizing algorithm boosts prediction accuracy

3. Successful classification of AA presence/absence in high-risk patients

4. Acceptable diagnostic performance achieved with limited input parameters

5. The model improves processing time and reduces treatment costs for diagnosis

1. Neural networks require a large set of features for accurate performance

2. The Alvarado Scoring system is less valid than other techniques

3. SVM needs at least 10 factors for effective classification

4. Data overlap among the three classes

5. Time and cost constraints associated with collecting more extensive input factors for other models

Park et al. [12], 2015 (South Korea)

Pain location, migration of RLQ, tenderness of RLQ, rebound tenderness of RLQ, bowel sound, nausea, vomiting, temperature, WBC counts

MLNN trained using BP and LM algorithms with 2 hidden layers, 31 neurons each. Neuron count optimized by MSE. Layers had summation (linear) and activation (sigmoid) parts. RBF and PNN used Gaussian functions in 1 hidden layer. PNN output included Gaussian and competitive activation functions

MLNN model: Sensitivity: 99.5%, Specificity: 96.6%, PPV: 94.8%, NPV: 99.7%, Accuracy: 97.8%, AUC: 0.985

RBFNN model: Sensitivity: 100%, Specificity: 99.7%, PPV: 99.5%, NPV: 100%, Accuracy: 99.8%, AUC: 0.998

PNN model: Sensitivity: 100%, Specificity: 99.1%, PPV: 98.4%, NPV: 100%, Accuracy: 99.4%, AUC: 0.993

Alvarado score: Sensitivity: 23.2%, Specificity: 87.4%, PPV: 43.2%, NPV: 77.9%, Accuracy: 72.2%, AUC: 0.633

1. ANN structures showed strong diagnostic performance for appendicitis compared to Alvarado’s scoring

2. Potential for aiding junior surgeons in diagnosis

3. ANNs with objective input data may perform well in other regions

1. ANNs’ performance depends on training experience

2. No comparison with other ML algorithms

3. Imaging methods like CT and ultrasound are not incorporated

4. Generalizability due to large sample size not addressed

5. Real-world clinical application of ANNs not discussed

Safavi et al. [13], 2015 (Iran)

Age, sex, WBC, PCT, CRP, PMN

Employed trial-and-error method to optimize network structure for predicting AA presence. Used 2 hidden layers with 2–20 neurons in steps of 2. Created 2000 unique networks with varying structures. Best accuracy (88%) achieved with 4–8–4–1 structure (4 inputs, 8 neurons in first hidden layer, 4 in second, 1 in output)

MLP model: Sensitivity: 97.6%, Specificity: 41.2%, Accuracy: 88%, AUC: 0.875

WBC: Sensitivity: 85.5%, Specificity: 41.2%, Accuracy: 78%, AUC: 0.789

CRP: Sensitivity: 92.8%, Specificity: 11.8%, Accuracy: 79%, AUC: 0.655

PCT: Sensitivity: 55.42%, Specificity: 29.4%, Accuracy: 51%, AUC: 0.421

PMN: Sensitivity: 65.1%, Specificity: 58.8%, Accuracy: 64%, AUC: 0.663

1. The developed ANN model had higher diagnostic accuracy (88%) compared to other tests

2. Combining methods and using advanced techniques like ANN can enhance disease identification

1. The differences in results might be due to the specific population studied

2. While the neural network model offers high accuracy, its complexity might pose challenges in practical implementation

Lee et al. [14], 2013 (Southern Taiwan)

Age, gender, temperature, CRP, WBC, segment form, migration of abdominal pain, anorexia, nausea or vomiting, right lower quadrant pain, and rebound tenderness

Fivefold cross-validation with 6 repetitions for unbiased prediction performance assessment (average across 30 trials)

PEL model: Sensitivity: 57.3%, Specificity: 66.7%, AUC: 0.619

SVM model: Sensitivity: 100%, Specificity: 0.0%, AUC: 0.500

SMOTE model: Sensitivity: 70.1%, Specificity: 37.7%, AUC: 0.539

MCC model: Sensitivity: 56.8%, Specificity: 58.9%, AUC: 0.579

CM model: Sensitivity: 56.1%, Specificity: 61.7%, AUC: 0.589

WCUS model: Sensitivity: 54.6%, Specificity: 58.2%, AUC: 0.564

Alvarado model: Sensitivity: 48.9%, Specificity: 61.0%, AUC: 0.580

1. Effectiveness in Imbalanced Learning: The PEL technique effectively handles imbalanced sample learning

2. Reduced Bias: PEL shows less bias toward either positive or negative classes compared to benchmark techniques

3. Superior Performance: PEL outperforms prevalent scoring systems and other classification techniques that use resampling

1. Incomplete Data

2. Limited Scope: Data are from one medical center and period

3. Narrow Focus: The study is specific to acute appendicitis and certain techniques

4. Limited Variables: Only quick laboratory test variables are considered

5. Negative Case Bias: Negative cases had surgery but different diagnoses

Yoldaş et al. [15], 2012 (Turkey)

Sex, intensity of pain, relocation of pain, pain in the right lower abdominal quadrant, vomiting, temperature, guarding, bowel sounds, rebound tenderness, WBC

Three-layered, multilayer perceptron ANN models, with BP circuit

ANN model: Sensitivity: 100%, Specificity: 97.2%, PPV: 88%, NPV: 100%, AUC: 0.95

NR

1. The ANNs technique is effective in diagnosing appendicitis

2. ANNs are particularly useful in rural hospitals where other diagnostic tools like US and CT scans are unavailable

1. Varying study populations affect results

2. Model variables are not cause-effect based

3. Single-center, limited data

Sun et al. [16], 2012 (South Korea)

Features in univariate analysis: lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cells, neutrophils, eosinophils, white blood cells, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, and direct bilirubin

Features in multivariate analysis: neutrophils, complaints, total bilirubin, urine glucose, and lipase

The study employed DT models for the diagnosis of AA, utilizing statistical tests including univariate analysis and Wald forward LR with specific entry and removal criteria. To assess model performance, a tenfold cross-validation approach was implemented. The dataset was divided randomly into ten subsets, with nine used for training (90%) and one for testing (10%) in each iteration. This process was repeated ten times to ensure unbiased generalization error estimation

DT model: Accuracy: 78.9%

DT model based on univariate analysis: Sensitivity: 82.4%, Specificity: 78.3%, PPV: 76.8%, NPV: 83.5%, AUC: 0.803, Accuracy: 80.2%

DT model based on multivariate analysis: Sensitivity: 66%, Specificity: 80%, PPV: 74.3%, NPV: 72.9%, AUC: 0.73, Accuracy: 73.5%

NR

1. Development of a reliable hybrid DT model for early diagnosis of suspected AA

2. Potential application in supporting initial decisions by clinicians and increasing vigilance in suspected cases

1. Small sample size for acute and non-acute appendicitis

2. Potential variations in derived parameters and relationships

3. Lack of external validation or prospective studies

Hsieh et al. [17], 2011 (Taiwan)

Age, sex, migration of pain, anorexia, nausea/vomiting, RLQ tenderness, rebounding pain, diarrhea, progression of pain, right flank pain, body temperature, WBC, neutrophil (%), CRP, urine occult blood, hemoglobin

The study used tenfold cross-validation for training and validation of each model. Default settings were applied first and then adjusted for better performance. The RF model used 200 trees; the SVM used the nu-SVC type with a polynomial kernel and probability estimates. The ANN was a multilayer perceptron network with a BP algorithm, with specific settings for learning rate, momentum, and training time. The “nominaltobinaryfilter” parameter for ANN was set to false for optimization

RF model: AUC: 0.98 (0.017), Accuracy: 96%, Sensitivity: 94%, Specificity: 100%, PPV: 100%, NPV: 87%

SVM model: AUC: 0.96 (0.027), Accuracy: 93%, Sensitivity: 91%, Specificity: 100%, PPV: 85%, NPV: 73%

ANN model: AUC: 0.91 (0.047), Accuracy: 91%, Sensitivity: 94%, Specificity: 85%, PPV: 94%, NPV: 85%

LR model: AUC: 0.87 (0.052), Accuracy: 82%, Sensitivity: 91%, Specificity: 62%, PPV: 85%, NPV: 73%

Alvarado: Sensitivity: 84%, Specificity: 69%, PPV: 87%, NPV: 64%, AUC: 0.77 (0.057), Accuracy: 80%

An Alvarado score of 6 was the best cutoff value for the prediction of AA (AC = 0.80, SN = 0.84, SP = 0.69)

1. RF outperforms other models in diagnosing AA

2. The model offers an easy, fast, low-cost, and noninvasive diagnostic method

3. Weka’s open-source software allows for easy implementation

4. Web-based UI and compatibility with electronic medical records enable real-time, automated alerts for clinicians

1. Performance in other hospital settings is unproven

2. The complexity of the algorithm may limit its understanding and adoption by clinicians

3. Prospective external validation has not been performed

Ting et al. [18], 2010 (Taiwan)

Age, gender, migrating pain, anorexia, nausea, vomiting, RLQ tenderness, rebound pain, temperature, WBC, neutrophil count

A DT with 3 decision levels and 6 leaf nodes was built using the C5.0 algorithm developed by Quinlan

DT model: Sensitivity: 94.5%, Specificity: 80.5%

NR

1. Female patients with AA were older than males (p < 0.001)

2. No gender predominance among patients with normal appendices

3. Age was a risk factor for perforated appendicitis

4. Perforated cases had longer hospital stays and higher treatment costs

5. Alvarado’s scoring system did not differentiate well between acute and perforated appendicitis (p = 0.348)

1. No exploration of reasons behind older women’s higher risk

2. Lack of detail on cost factors or cost-reduction strategies

3. Absence of discussion on data collection biases or DT modeling limitations

Prabhudesai et al. [19], 2008 (UK)

Site of maximum pain, anorexia, nausea, vomiting, site of tenderness, peritonism, temperature, WBC, neutrophil count, age, sex

The study employed various ML algorithms to model postoperative sepsis risk after appendectomy. This involved random weight assignment, training with retrospective data from 50 patients (25 with inflamed appendix), weight adjustment using a training algorithm, error correction through a BP algorithm, and weight fine-tuning to minimize MSE. Validation was conducted with data from an additional 20 patients. The ANN’s architecture was optimized, using a single output node and empirically determined numbers of middle-layer (2–15) and input-layer (11) nodes

ANN model: Sensitivity: 100%, Specificity: 97.2%, PPV: 96%, NPV: 100%

Alvarado (score ≥ 7): Sensitivity: 91.7%, Specificity: 83.3%, PPV: 78.6%, NPV: 93.8%

Alvarado (score ≥ 6): Sensitivity: 95.8%, Specificity: 72.2%, PPV: 69.7%, NPV: 96.2%

Clinical assessment: Sensitivity: 87.5%, Specificity: 80.5%, PPV: 75%, NPV: 90.6%

1. The ANN technique is effective in diagnosing appendicitis

2. It has the potential to reduce unnecessary explorations, negative appendectomy rates, and overall costs

1. The system’s efficiency is highly dependent on the accuracy of the knowledge base

2. Excessive variables may decrease the accuracy of the procedure

3. ANN improves diagnostic accuracy but cannot explain the reasoning behind its conclusions to the user

Sakai et al. [20], 2007 (Japan)

Gender, age, temperature, migration, tenderness at RLQ, rebound tenderness, muscular guarding, CRP, WBC

Feed-forward ANN models with three layers (input, hidden, output) and LR models were created using nine variables. Validation was done using the “.632 + bootstrap method” to evaluate accuracy

ANN model: Sensitivity: 76.7%, Specificity: 73.5%, PPV: 75%, NPV: 75.3%, AUC: 0.801

LR model: Sensitivity: 50%, Specificity: 92.8%, PPV: 87.8%, NPV: 64.2%, AUC: 0.774

Clinical diagnosis: Sensitivity: 100%, Specificity: 0%, PPV: 87.8%, NPV: 0%

1. The ANN model’s accuracy was better than the initial diagnosis based solely on clinical and laboratory findings

2. Reliance on imaging examinations like CT scans is still necessary for precise diagnosis

1. Single-institution study

2. Low proportion of key symptom (right lower quadrant tenderness)

Pesonen et al. [21], 1996 (Finland)

Demographics, initial pain characteristics, pain progression and factors, symptoms, physical examination, laboratory test

The algorithms underwent training using patient data and were tested using a separate patient test set to assess their performance and classification abilities

ART1 model: Sensitivity: 79%, Specificity: 78%, UI (usefulness index): 0.45

SOM model: all parameters A (Sensitivity: 62%, Specificity: 82%, UI: 0.27), all parameters B (Sensitivity: 55%, Specificity: 83%, UI: 0.21)

LVQ model: all parameters A (Sensitivity: 82%, Specificity: 87%, UI: 0.56), all parameters B (Sensitivity: 87%, Specificity: 90%, UI: 0.68)

BP model: all parameters B (Sensitivity: 83%, Specificity: 92%, UI: 0.62)

NR

1. LVQ and BP algorithms are effective in diagnosing AA

2. Supervised learning outperforms unsupervised learning

3. Clinical signs are the best diagnostic parameters

1. Unsupervised learning lacks clinical sensitivity

2. Study limited to specific algorithms and parameters

3. Impact of using all clinical signs not explored

Forsstrom et al. [22], 1995 (Finland)

CRP, WBC, Phospholipase A2 (PLA2)

Used single-layer perceptron and CNN (BP network with 1 hidden layer) models. Each network and dataset were tested 3 times with random weights, learning factor 0.02, momentum 0.7, and 10,000 iterations

DiagaiD model: AUC: 0.6825, MSE: 0.0728

LR model: AUC: 0.677, SEM: 0.071

BP model (original data): 2 hidden nodes (AUC: 0.6363, MSE: 0.0813), 3 hidden nodes (AUC: 0.5537, MSE: 0.0819), 4 hidden nodes (AUC: 0.6469, MSE: 0.0747)

BP model (transformed data): 2 hidden nodes (AUC: 0.6219, MSE: 0.0763), 3 hidden nodes (AUC: 0.6069, MSE: 0.0756), 4 hidden nodes (AUC: 0.6075, MSE: 0.0732)

NR

1. Neuro-fuzzy effective in clinical knowledge extraction

2. DiagaiD outperforms LR

3. Suitable for small datasets

4. Knowledge easily understood by clinicians

1. Risk of overlearning in large networks

2. Sharp cutoff values need adjustment

3. Requires 10 × cases per parameter for reliability

4. Further experiments are needed for parameter order

DL deep learning, AUC area under curve, PPV positive predictive value, NPV negative predictive value, CNN convolutional neural network, AA acute appendicitis, RF random forest, NLP natural language processing, NA no appendicitis, NR not reported, Nor A normal appendicitis, DT decision tree, SVM support vector machine, ANN artificial neural network, FCM fuzzy C-means clustering, US ultrasonography, MLP multilayer perceptron, MLNN multilayer neural network, PNN probabilistic neural network, RBF radial basis function, SOM self-organizing map, BP backpropagation, LVQ learning vector quantization, LR logistic regression, PEL pre-clustering based ensemble learning, CA complicated appendicitis, UCA uncomplicated appendicitis, NB Naïve Bayes, SMOTE synthetic minority oversampling technique, MCC Matthews correlation coefficient, CM cluster medoid, WCUS within cluster under-sampling, FP false positive, TP true positive, WBC white blood cell, CRP C-reactive protein, PCT procalcitonin, PMN polymorphonuclear, MSE mean squared error, PCC percent correctly classified, CHAID chi-square automatic interaction detection, RIF right iliac fossa, NIRIF RIF pain with no inflammation, IRIF RIF pain with inflammation
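
The Performance column reports sensitivity, specificity, PPV, NPV, and accuracy, all of which derive from the four cells of a 2 × 2 confusion matrix. The following is a minimal Python sketch of the standard definitions, not code from any of the included studies. The class name, function name, and all patient counts are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2 x 2 confusion matrix (hypothetical container)."""
    tp: int  # true positives: appendicitis, predicted appendicitis
    fp: int  # false positives: no appendicitis, predicted appendicitis
    tn: int  # true negatives: no appendicitis, predicted no appendicitis
    fn: int  # false negatives: appendicitis, predicted no appendicitis


def _ratio(numerator: int, denominator: int) -> float:
    # A metric is undefined when its denominator is zero; return NaN then.
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the standard diagnostic metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Hypothetical counts, not taken from any study in the table.
metrics = diagnostic_metrics(ConfusionCounts(tp=80, fp=10, tn=90, fn=20))

# A classifier that labels every patient as appendicitis (hypothetical counts).
all_positive = diagnostic_metrics(ConfusionCounts(tp=100, fp=50, tn=0, fn=0))
```

With these definitions, a model that labels every patient as positive has 100% sensitivity and 0% specificity, and its NPV is undefined because it makes no negative predictions. This is one way to read rows in the table that pair 100% sensitivity with 0% specificity.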