Artificial Intelligence and Acute Appendicitis: A Systematic Review of Diagnostic and Prognostic Models

Issaiy, Mahbod; Zarei, Diana; Saghazadeh, Amene

doi:10.1186/s13017-023-00527-2

World Journal of Emergency Surgery

Table 2 Details of artificial intelligence methods applied and outcomes in studies for appendicitis diagnosis


Study, year	Input features	Training/validation strategy	Performance	Comparative algorithms and scoring metrics	Key findings	Limitations
Park et al. [1], 2023 (South Korea)	CT slices	DL model trained using fivefold cross-validation and a separate test dataset. Each fold had 60–70% training samples, 15–20% validation samples for parameter tuning, and 15–25% test samples for final evaluation	Single-Slice method: Sensitivity: 85.6%, Specificity: 96%, PPV: 85.4%, Accuracy: 86.1%, AUC: 0.937; RGB method: Sensitivity: 87.8%, Specificity: 88%, PPV: 87.1%, Accuracy: 87.9%, AUC: 0.951	NR	CNN performed better with serial slices and the RGB method than with a single-slice method	1. Retrospective study 2. Limited acute diverticulitis CT images and data augmentation used for balance 3. Excluded complicated diverticulitis cases 4. No tool was developed for condition localization in CT images 5. CNN performance was not evaluated with coronal reformatted CT images
Akbulut et al. [2], 2023 (Turkey)	TBil, WBC, Neutrophil, WLR, NLR, CRP, and WNR values and lower PNR, PDW, and MCV	The persistence method was repeated 50 times with different seeds for model robustness. CatBoost model predicted AA, with optimized hyperparameters using grid search with tenfold cross-validation and 5 replicates	CatBoost: Sensitivity 84.2%, Specificity 93.2%, AUC 0.947, Accuracy 88.2%, F1-score 88.7%	NR	The CatBoost ML model demonstrated high accuracy in distinguishing between AA and NA patients, achieving an 88.2% accuracy rate	1. The study is retrospective and lacks comprehensive clinical data 2. Radiological data are missing for approximately 11% of the patient sample 3. Conducted at a single institution
Ghareeb et al. [3], 2021 (Egypt)	Age, gender, marital status, obesity, diabetes mellitus, hypertension, hepatitis B virus infection, hepatitis C virus infection, autoimmune diseases, history of similar pain, duration of pain, site of pain, nausea, vomiting, anorexia, body temperature, CBC, Hg, ultrasound findings	The study assessed various learning algorithms and selected the best-performing model based on accuracy and AUC. Principal Component Analysis (PCA) was used for precise feature selection without excluding any variables. An optimization process reduced prediction errors, and external validation was done with a separate dataset. Variable importance was ranked, and Ensemble Bag optimization with 30 iterations minimized diagnostic classification errors to 0.129	The best model performance (Subspace KNN model): Sensitivity: 100%, Specificity: 80%, PPV: 97.9%, NPV: 96.7%, Accuracy: 91.1%, AUC: 0.82; Other models’ accuracies: DT: 84.4%, LR: 87.5%, NB: 88.8%, SVM: 89.3%, KNN: 89.3%	Alvarado score: Sensitivity: 68.2%, Specificity: 80%, PPV: 96.7%, NPV: 22.9%, Accuracy: 69.5%; US alone: Sensitivity: 50.8%, Specificity: 73.5%, PPV: 94.7%, NPV: 16.9%, Accuracy: 58.6%; Combined US and Alvarado: Sensitivity: 69.6%, Specificity: 100%, PPV: 100%, NPV: 28%, Accuracy: 72.8%	1. The diagnostic accuracy of the AI model outperforms both the Alvarado score alone and the Alvarado score combined with US criteria 2. The AI model excels in diagnostic accuracy, except for specificity, which is higher when combined with specific criteria	1. Single-center study 2. Small number of patients 3. Exclusion of patients with colon cancer 4. Limited real-world applicability of the AI model 5. Inclusion of patients with pathologies other than appendicitis may affect results
Rajpurkar et al. [4], 2020 (USA)	CT scan	Created development and test sets using stratified random sampling with a balance of about 50% appendicitis examinations and 50% non-appendicitis examinations	Pretrained on video images: Sensitivity: 78.4%, Specificity: 66.7%, Accuracy: 72.5%, AUC: 0.810; Not pretrained on video images: Sensitivity: 78.4%, Specificity: 35.3%, Accuracy: 56.9%, AUC: 0.724	NR	1. Small training dataset used; video pretraining compensates for dataset size 2. Model technique applicable for future medical image DL studies	1. Small training dataset, no investigation into video pretraining’s impact with data size 2. Pretraining model effect explored using Kinetics dataset 3. Single-center study 4. The model does not differentiate between CA and UCA
Park et al. [5], 2020 (USA)	CT scan	Used eightfold cross-validation. The dataset is split into 8 parts, 7 for training, and 1 for testing. Hyperparameters set based on initial training, used for all 8 models. External validation with CT data from two institutions on 8 trained CNN models. The deep CNN used in the algorithm was built with six convolutional layers, three max-pooling layers, and two fully connected layers	Training and internal validation: Sensitivity: 90.2%, Specificity: 92%, Accuracy: 91.5% External validation, institution 1 (Sensitivity: 88.5%, Specificity: 91.2%, Accuracy: 90%), institution 2 (Sensitivity: 95%, Specificity: 100%, Accuracy: 97.5%)	NR	Feasibility of CNN-based diagnosis algorithm for diagnosing acute appendicitis using CT data	1. Excluded patients with tumors in the appendix who had surgical removal 2. Trained and tested network using manually extracted 4 cm³ appendix region data
Zhao et al. [6], 2020 (China)	More than 800 proteins in each urine sample	Detected outliers in the discovery dataset (AA outliers and CON outliers) against a normal urine database (495 samples) to identify markers indicating changes under pathological conditions	RF model: Sensitivity: 81.2%, Specificity: 84.4%, Accuracy: 83.6% SVM model: Sensitivity: 25%, Specificity: 97.8%, Accuracy: 78.7%, NB model: Sensitivity: 68.8%, Specificity: 71.1%, Accuracy: 70%	NR	1. The urinary proteomic system finds markers for AA vs. other acute abdomens 2. The RF model has high specificity in AA diagnosis without clinical signs 3. Noninvasive urinary markers have potential for clinical use	1. No validation with a larger sample size 2. No absolute quantification for feature proteins 3. No exploration of combining urinary markers with metabolites
Ramirez-GarciaLuna et al. [7], 2020 (Mexico)	Abdominal skin IRT images	Training and validation cohorts had balanced distributions of patients in three categories (“healthy,” “appendicitis,” and “no appendicitis”) with nine relevant predictors. The final model was built by considering the accuracy-complexity trade-off	RF model: Accuracy: 76.9%, Sensitivity: 91.3%, Specificity: 56.3%, PPV: 75%, NPV: 81.8%	NR	1. IRT may complement diagnostic workup for appendicitis 2. IRT is a timesaving, low-cost, noninvasive imaging modality 3. IRT has the potential to improve the clinical decision-making process	1. Unequal group sizes, with a smaller non-appendicitis group 2. Minimal clinical/laboratory differences between groups 3. No comparison of IRT with CT, the gold standard
Kang et al. [8], 2019 (South Korea)	Rebound tenderness severity, migration, urinalysis, symptom duration, leukocytosis, neutrophil count, and CRP levels	The DT comprises 11 final nodes. The severity of rebound tenderness was selected in the parent node	DT model: AUC: 0.85 (95% CI: 0.799–0.893)	Alvarado score: AUC: 0.695; AAS score: AUC: 0.749; Eskelinen score: AUC: 0.715	1. New clinical approach using DT aids AA diagnosis in adults with equivocal CT findings 2. Helps decide the disposition of patients with equivocal results	1. Retrospective design 2. Small patient population in some DT nodes 3. CT findings interpreted by only one radiologist
Gudelis et al. [9], 2019 (Spain)	Blumberg sign, pain migration, increased pain, increased pain with movement, pain when coughing, anorexia, temperature, number of leukocytes, hours of evolution, and CRP levels	The training and validation method involved implementing the ANN model using the Alyuda1 (Neurointelligence) program, which utilizes MLP methodology with BP. In this process, all candidate variables were included in the “full model” type. The models had automatic variable selection capabilities based on significance or hierarchy. Internal validation was conducted using cross-validation with 10 partitions	ANN model: for all diagnoses: PCC: 75%, for AA diagnosis: PCC: 93.5%, AUC: 0.95	CHAID: for all diagnoses: PCC: 74.2%, for AA diagnosis: PCC: 81.7%, AUC: 0.93	1. Professionals treating RIF pain can benefit from interpretable models 2. The CHAID model offers a classification with more than two possibilities (AA vs. non-AA) 3. Validation in a larger series is necessary to confirm the model’s performance	1. Assignment of groups not validated by literature 2. Small sample size, particularly in the NIRIF and IRIF diagnostic groups 3. Limited capacity of models that only compare AA versus other conditions in real patient management
Shahmoradi et al. [10], 2018 (Iran)	Demographic, symptoms, clinical signs, laboratory findings	Used an MLP network with two hidden layers (7 and 5 neurons) and specific activation functions. For RBFN, specific activation functions and rescaling methods were applied	MLP model: Sensitivity: 80%, Specificity: 97.5%, PPV: 92.3%, NPV: 93%, Accuracy: 92.9%, AUC: 0.832; RBFN model: Sensitivity: 28%, Specificity: 87.8%, PPV: 64.2%, NPV: 81.8%, Accuracy: 77.6%, AUC: NR; LR model: Sensitivity: 58.3%, Specificity: 93.2%, PPV: 75.7%, NPV: 86%, Accuracy: 83.9%, AUC: 0.808	NR	1. MLP model outperforms LR in sensitivity, specificity, and accuracy 2. Essential predictors: leukocytosis, sex, tenderness, right iliac fossa pain	1. Small sample size 2. Lack of imaging techniques 3. Risk of misdiagnosis 4. Limited study variables
Jamshidnezhad et al. [11], 2017 (Iran)	Age, first abdominal pain time, initial pain site, RLQ abdomen shift, WBC, neutrophil count	The model was trained 10 times to assess reliability. Each time, an independent dataset was used for testing. Training took 135 s, while testing achieved diagnosis results in less than 1 s	Fuzzy-rule-based system: Accuracy rate (presence: 92%, high risk: 90%, reject: 87.5%, average: 89.9%)	US results: Sensitivity: 74%, Specificity: 43%	1. The proposed evolutionary algorithm enhances the knowledge base in the fuzzy rule-based system 2. Optimizing algorithm boosts prediction accuracy 3. Successful classification of AA presence/absence in high-risk patients 4. Acceptable diagnostic performance achieved with limited input parameters 5. The model improves processing time and reduces treatment costs for diagnosis	1. Neural networks require a large set of features for accurate performance 2. The Alvarado Scoring system is less valid than other techniques 3. SVMs need at least 10 factors for effective classification 4. Data overlap among the three classes 5. Time and cost constraints associated with collecting more extensive input factors for other models
Park et al. [12], 2015 (South Korea)	Pain location, migration of RLQ, tenderness of RLQ, rebound tenderness of RLQ, bowel sound, nausea, vomiting, temperature, WBC counts	MLNN trained using BP and LM algorithms with 2 hidden layers, 31 neurons each. Neuron count optimized by MSE. Layers had summation (linear) and activation (sigmoid) parts. RBF and PNN used Gaussian functions in 1 hidden layer. PNN output included Gaussian and competitive activation functions	MLNN model: Sensitivity: 99.5%, Specificity: 96.6%, PPV: 94.8%, NPV: 99.7%, Accuracy: 97.8%, AUC: 0.985 RBFNN model: Sensitivity: 100%, Specificity: 99.7%, PPV: 99.5%, NPV: 100%, Accuracy: 99.8%, AUC: 0.998 PNN model: Sensitivity: 100%, Specificity: 99.1%, PPV: 98.4%, NPV: 100%, Accuracy: 99.4%, AUC: 0.993	Alvarado score: Sensitivity: 23.2%, Specificity: 87.4%, PPV: 43.2%, NPV: 77.9%, Accuracy: 72.2%, AUC: 0.633	1. ANN structures showed strong diagnostic performance for appendicitis compared to Alvarado’s scoring 2. Potential for aiding junior surgeons in diagnosis 3. ANNs with objective input data may perform well in other regions	1. ANNs’ performance depends on training experience 2. No comparison with other ML algorithms 3. Imaging methods like CT and ultrasound are not incorporated 4. Generalizability due to large sample size not addressed 5. Real-world clinical application of ANNs not discussed
Safavi et al. [13], 2015 (Iran)	Age, sex, WBC, PCT, CRP, PMN	Employed trial-and-error method to optimize network structure for predicting AA presence. Used 2 hidden layers with 2–20 neurons in steps of 2. Created 2000 unique networks with varying structures. Best accuracy (88%) achieved with 4–8–4–1 structure (4 inputs, 8 neurons in first hidden layer, 4 in second, 1 in output)	MLP model: Sensitivity: 97.6%, Specificity: 41.2%, Accuracy: 88%, AUC: 0.875	WBC: Sensitivity: 85.5%, Specificity: 41.2%, Accuracy: 78%, AUC: 0.789 CRP: Sensitivity: 92.8%, Specificity: 11.8%, Accuracy: 79%, AUC: 0.655 PCT: Sensitivity: 55.42%, Specificity: 29.4%, Accuracy: 51%, AUC: 0.421 PMN: Sensitivity: 65.1%, Specificity: 58.8%, Accuracy: 64%, AUC: 0.663	1. The developed ANN model had higher diagnostic accuracy (88%) compared to other tests 2. Combining methods and using advanced techniques like ANN can enhance disease identification	1. The differences in results might be due to the specific population studied 2. While the neural network model offers high accuracy, its complexity might pose challenges in practical implementation
Lee et al. [14], 2013 (Southern Taiwan)	Age, gender, temperature, CRP, WBC, segment form, migration of abdominal pain, anorexia, nausea or vomiting, right lower quadrant pain, and rebound tenderness	Fivefold cross-validation with 6 repetitions for unbiased prediction performance assessment (average across 30 trials)	PEL model: Sensitivity: 57.3%, Specificity: 66.7%, AUC: 0.619	SVM model: Sensitivity: 100%, Specificity: 0.0%, AUC: 0.500 SMOTE model: Sensitivity: 70.1%, Specificity: 37.7%, AUC 0.539 MCC model: Sensitivity: 56.8%, Specificity: 58.9%, AUC: 0.579 CM model: Sensitivity: 56.1%, Specificity: 61.7%, AUC: 0.589 WCUS model: Sensitivity: 54.6%, Specificity: 58.2%, AUC: 0.564 Alvarado model: Sensitivity: 48.9%, Specificity: 61.0%, AUC: 0.580	1. Effectiveness in Imbalanced Learning: The PEL technique effectively handles imbalanced sample learning 2. Reduced Bias: PEL shows less bias toward either positive or negative classes compared to benchmark techniques 3. Superior Performance: PEL outperforms prevalent scoring systems and other classification techniques that use resampling	1. Incomplete Data 2. Limited Scope: Data are from one medical center and period 3. Narrow Focus: The study is specific to acute appendicitis and certain techniques 4. Limited Variables: Only quick laboratory test variables are considered 5. Negative Case Bias: Negative cases had surgery but different diagnoses
Yoldaş et al. [15], 2012 (Turkey)	Sex, intensity of pain, relocation of pain, pain in the right lower abdominal quadrant, vomiting, temperature, guarding, bowel sounds, rebound tenderness, WBC	Three-layered, multilayer perceptron ANN models, with BP circuit	ANN model: Sensitivity: 100%, Specificity: 97.2%, PPV: 88%, NPV: 100%, AUC: 0.95	NR	1. The ANNs technique is effective in diagnosing appendicitis 2. ANNs are particularly useful in rural hospitals where other diagnostic tools like US and CT scans are unavailable	1. Varying study populations affect results 2. Model variables are not cause-effect based 3. Single-center, limited data
Sun et al. [16], 2012 (South Korea)	Features in univariate analysis: lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cells, neutrophils, eosinophils, white blood cells, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, and direct bilirubin; Features in multivariate analysis: neutrophils, complaints, total bilirubin, urine glucose, and lipase	The study employed DT models for the diagnosis of AA, utilizing statistical tests including univariate analysis and Wald forward LR with specific entry and removal criteria. To assess model performance, a tenfold cross-validation approach was implemented. The dataset was divided randomly into ten subsets, with nine used for training (90%) and one for testing (10%) in each iteration. This process was repeated ten times to ensure unbiased generalization error estimation	DT model: Accuracy: 78.9%; DT model based on univariate analysis: Sensitivity: 82.4%, Specificity: 78.3%, PPV: 76.8%, NPV: 83.5%, AUC: 0.803, Accuracy: 80.2%; DT model based on multivariate analysis: Sensitivity: 66%, Specificity: 80%, PPV: 74.3%, NPV: 72.9%, AUC: 0.73, Accuracy: 73.5%	NR	1. Development of a reliable hybrid DT model for early diagnosis of suspected AA 2. Potential application in supporting initial decisions by clinicians and increasing vigilance in suspected cases	1. Small sample size for acute and non-acute appendicitis 2. Potential variations in derived parameters and relationships 3. Lack of external validation or prospective studies
Hsieh et al. [17], 2011 (Taiwan)	Age, sex, migration of pain, anorexia, nausea/vomiting, RLQ tenderness, rebounding pain, diarrhea, progression of pain, right flank pain, body temperature, WBC, neutrophil (%), CRP, urine occult blood, hemoglobin	The study used a tenfold cross-validation for training and validation of each model. Default settings were first applied, and then adjusted for better performance. The RF model used 200 trees, SVM used nu-SVC type with polynomial kernel and probability estimates. ANN utilized a multilayer perceptron network with a BP algorithm, with specific settings for learning rate, momentum, and training time. The “nominaltobinaryfilter” parameter for ANN was set to false for optimization	RF model: AUC: 0.98 (0.017), Accuracy: 96%, Sensitivity: 94%, Specificity: 100%, PPV: 100%, NPV: 87%; SVM model: AUC: 0.96 (0.027), Accuracy: 93%, Sensitivity: 91%, Specificity: 100%, PPV: 85%, NPV: 73%; ANN model: AUC: 0.91 (0.047), Accuracy: 91%, Sensitivity: 94%, Specificity: 85%, PPV: 94%, NPV: 85%; LR model: AUC: 0.87 (0.052), Accuracy: 82%, Sensitivity: 91%, Specificity: 62%, PPV: 85%, NPV: 73%	Alvarado: Sensitivity: 84%, Specificity: 69%, PPV: 87%, NPV: 64%, AUC: 0.77 (0.057), Accuracy: 80%. An Alvarado score of 6 was the best cutoff value for the prediction of AA (AC = 0.80, SN = 0.84, SP = 0.69)	1. RF outperforms other models in diagnosing AAP 2. The model offers an easy, fast, low-cost, and noninvasive diagnostic method 3. Weka’s open-source software allows for easy implementation 4. Web-based UI and compatibility with electronic medical records enable real-time, automated alerts for clinicians	1. Performance in other hospital settings is unproven 2. The complexity of the algorithm may limit its understanding and adoption by clinicians 3. Prospective external validation has not been performed
Ting et al. [18], 2010 (Taiwan)	Age, gender, migrating pain, anorexia, nausea, vomiting, RLQ tenderness, rebound pain, temperature, WBC, neutrophil count	A C5.0 DT algorithm developed by Quinlan, with 3 decision levels and 6 leaf nodes, was identified	DT model: Sensitivity: 94.5%, Specificity: 80.5%	NR	1. Female patients with AA were older than males (p < 0.001) 2. No gender predominance among patients with normal appendices 3. Age was a risk factor for perforated appendicitis 4. Perforated cases had longer hospital stays and higher treatment costs 5. Alvarado’s scoring system did not differentiate well between acute and perforated appendicitis (p = 0.348)	1. No exploration of reasons behind older women’s higher risk 2. Lack of detail on cost factors or cost-reduction strategies 3. Absence of discussion on data collection biases or DT modeling limitations
Prabhudesai et al. [19], 2008 (UK)	Site of maximum pain, anorexia, nausea, vomiting, site of tenderness, peritonism, temperature, WBC, neutrophil count, age, sex	The study employed various ML algorithms to model postoperative sepsis risk after appendectomy. This involved random weight assignment, training with retrospective data from 50 patients (25 with inflamed appendix), weight adjustment using a training algorithm, error correction through a BP algorithm, and weight fine-tuning to minimize MSE. Validation was conducted with data from an additional 20 patients. The ANN’s architecture was optimized, utilizing a single output node and empirically determined middle [2–15] and input [11] layer node numbers	ANN model: Sensitivity: 100%, Specificity: 97.2%, PPV: 96%, NPV: 100%	Alvarado (score ≥ 7): Sensitivity: 91.7%, Specificity: 83.3%, PPV: 78.6%, NPV: 93.8%; Alvarado (score ≥ 6): Sensitivity: 95.8%, Specificity: 72.2%, PPV: 69.7%, NPV: 96.2%; Clinical assessment: Sensitivity: 87.5%, Specificity: 80.5%, PPV: 75%, NPV: 90.6%	1. The ANN technique is effective in diagnosing appendicitis 2. It has the potential to reduce unnecessary explorations, negative appendectomy rates, and overall costs	1. The system’s efficiency is highly dependent on the accuracy of the knowledge base 2. Excessive variables may decrease the accuracy of the procedure 3. ANN improves diagnostic accuracy but cannot explain the reasoning behind its conclusions to the user
Sakai et al. [20], 2007 (Japan)	Gender, age, temperature, migration, tenderness at RLQ, rebound tenderness, muscular guarding, CRP, WBC	Feed-forward ANN models with three layers (input, hidden, output) and LR models were created using nine variables. Validation was done using the “.632 + bootstrap method” to evaluate accuracy	ANN model: Sensitivity: 76.7%, Specificity: 73.5%, PPV: 75%, NPV: 75.3%, AUC: 0.801 LR model: Sensitivity: 50%, Specificity: 92.8%, PPV: 87.8%, NPV: 64.2%, AUC: 0.774	Clinical diagnosis: Sensitivity: 100%, Specificity: 0%, PPV: 87.8%, NPV: 0%	1. The ANN model’s accuracy was better than the initial diagnosis based solely on clinical and laboratory findings 2. Reliance on imaging examinations like CT scans is still necessary for precise diagnosis	1. Single-institution study 2. Low proportion of key symptom (right lower quadrant tenderness)
Pesonen et al. [21], 1996 (Finland)	Demographics, initial pain characteristics, pain progression and factors, symptoms, physical examination, laboratory test	The algorithms underwent training using patient data and were tested using a separate patient test set to assess their performance and classification abilities	ART1 model: Sensitivity: 79%, Specificity: 78%, UI (usefulness index): 0.45; SOM model: all parameters A (Sensitivity: 62%, Specificity: 82%, UI: 0.27), all parameters B (Sensitivity: 55%, Specificity: 83%, UI: 0.21); LVQ model: all parameters A (Sensitivity: 82%, Specificity: 87%, UI: 0.56), all parameters B (Sensitivity: 87%, Specificity: 90%, UI: 0.68); BP model: all parameters B (Sensitivity: 83%, Specificity: 92%, UI: 0.62)	NR	1. LVQ and BP algorithms are effective in diagnosing AA 2. Supervised learning outperforms unsupervised learning 3. Clinical signs are the best diagnostic parameters	1. Unsupervised learning lacks clinical sensitivity 2. Study limited to specific algorithms and parameters 3. Impact of using all clinical signs not explored
Forsstrom et al. [22], 1995 (Finland)	CRP, WBC, Phospholipase A2 (PLA2)	Used single-layer perceptron and CNN (BP network with 1 hidden layer) models. Each network and dataset were tested 3 times with random weights, learning factor 0.02, momentum 0.7, and 10,000 iterations	DiagaiD model: AUC: 0.6825, MSE: 0.0728; LR model: AUC: 0.677, SEM: 0.071; BP model (original data): 2 hidden nodes (AUC: 0.6363, MSE: 0.0813), 3 hidden nodes (AUC: 0.5537, MSE: 0.0819), 4 hidden nodes (AUC: 0.6469, MSE: 0.0747); BP model (transformed data): 2 hidden nodes (AUC: 0.6219, MSE: 0.0763), 3 hidden nodes (AUC: 0.6069, MSE: 0.0756), 4 hidden nodes (AUC: 0.6075, MSE: 0.0732)	NR	1. Neuro-fuzzy effective in clinical knowledge extraction 2. DiagaiD outperforms LR 3. Suitable for small datasets 4. Knowledge easily understood by clinicians	1. Risk of overlearning in large networks 2. Sharp cutoff values need adjustment 3. Requires 10 × cases per parameter for reliability 4. Further experiments are needed for parameter order

DL deep learning, AUC area under curve, PPV positive predictive value, NPV negative predictive value, CNN convolutional neural network, AA acute appendicitis, RF random forest, NLP natural language processing, NA no appendicitis, NR not reported, Nor A normal appendicitis, DT decision tree, SVM support vector machine, ANN artificial neural network, FCM fuzzy C-means clustering, US ultrasonography, MLP multilayer perceptron, MLNN multilayer neural network, PNN probabilistic neural network, RBF radial basis function, SOM self-organizing map, BP backpropagation, LVQ learning vector quantization, LR logistic regression, PEL pre-clustering based ensemble learning, CA complicated appendicitis, UCA uncomplicated appendicitis, NB Naïve Bayes, SMOTE synthetic minority oversampling technique, MCC Matthews correlation coefficient, CM cluster medoid, WCUS within cluster under-sampling, FP false positive, TP true positive, WBC white blood cell, CRP C-reactive protein, PCT procalcitonin, PMN polymorphonuclear, MSE mean squared error, PCC percent correctly classified, CHAID chi-square automatic interaction detection, RIF right iliac fossa, NIRIF RIF pain with no inflammation, IRIF RIF pain with inflammation
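
The performance columns of Table 2 report sensitivity, specificity, PPV, NPV, and accuracy. As a reading aid, the following minimal sketch shows how these measures are conventionally derived from the four cells of a 2×2 confusion matrix. The names `DiagnosticMetrics` and `diagnostic_metrics` and the example counts are hypothetical; they are not taken from any study in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)
    accuracy: float     # (TP + TN) / all cases


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Derive standard diagnostic measures from confusion-matrix counts.

    A ratio whose denominator is zero is returned as NaN, because the
    measure is undefined in that case (not 0%).
    """
    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return DiagnosticMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
    )


# Hypothetical cohort: 100 patients with appendicitis, 100 without.
example = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)

# Hypothetical classifier that labels every patient as positive.
all_positive = diagnostic_metrics(tp=100, fp=100, tn=0, fn=0)
```

The second example illustrates why a sensitivity of 100% paired with a specificity of 0% says little about diagnostic value: a model that calls every case positive produces exactly that pattern, and its NPV is undefined because it makes no negative predictions.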


ISSN: 1749-7922
