Transactions on Machine Learning and Data Mining (ISSN: 1865-6781)
Volume 10 - Number 1 - July 2017 - Pages 03-24
A Wineinformatics Study for White-box Classification Algorithms to Understand and Evaluate Wine Judges
¹Bernard Chen, ¹Hai Le, ²Travis Atkison, ³Dongsheng Che
¹Department of Computer Science, University of Central Arkansas,
²Department of Computer Science, University of Alabama,
³Department of Computer Science, East Stroudsburg University
Abstract
Wineinformatics is a new data science research domain that uses wine as its domain knowledge. Wines are usually evaluated by wine judges, who assign scores to the wines they review. This paper proposes using white-box classification algorithms to understand why wine judges score a wine as 90+ or 90-. Several white-box classification algorithms with improved components are applied to wine sensory data derived from professional wine reviews. Each of these algorithms can reveal how the judges make their decisions, and the extracted information is also useful to wine producers, distributors, and consumers. The dataset includes 1000 wines: 500 scored 90+ points (positive class) and 500 scored 90- points (negative class). Decision Tree, Association Classification, k-NN, Naïve Bayes, and SVM are applied to the data and compared. The higher the accuracy an algorithm achieves, the more suitable it is for understanding the wine judges. Under 5-fold cross-validation, the best prediction accuracy among the white-box classification algorithms was 85.7%, achieved by the Naïve Bayes algorithm with Laplace smoothing. This result suggests that Naïve Bayes with Laplace smoothing might be the best white-box classification algorithm for understanding wine judges. SVM, a typical black-box classification algorithm, achieves 88% accuracy. Sensitivity and specificity are also evaluated for selected algorithms. To the best of our knowledge, this is the first time classification algorithms have been applied to and compared on wine sensory reviews.
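As a rough, non-authoritative illustration of the best white-box configuration named above (Naïve Bayes with Laplace smoothing, evaluated by k-fold cross-validation with sensitivity and specificity), the sketch below fits a Bernoulli Naïve Bayes model in which each review is represented as the set of descriptor attributes it contains. That binary encoding, the attribute names, the six toy reviews, and the helper names (`train_nb`, `predict_nb`, `cross_validate`, `sensitivity_specificity`) are all hypothetical assumptions for illustration; this is not the authors' implementation or data, and fold assignment by index modulo k is a simplification.

```python
import math
from collections import Counter


def train_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace (add-alpha) smoothing.

    rows:   list of sets, each holding the (hypothetical) descriptor attributes of one review
    labels: list of class labels, "90+" or "90-"
    """
    vocab = set().union(*rows)
    class_counts = Counter(labels)
    attr_counts = {c: Counter() for c in class_counts}
    for attrs, y in zip(rows, labels):
        attr_counts[y].update(attrs)
    # P(attribute present | class); the +alpha / +2*alpha terms keep every estimate
    # strictly between 0 and 1, so an attribute unseen in one class cannot zero it out.
    cond = {
        c: {a: (attr_counts[c][a] + alpha) / (n + 2 * alpha) for a in vocab}
        for c, n in class_counts.items()
    }
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return vocab, priors, cond


def predict_nb(model, attrs):
    """Return the class with the highest log-posterior; attributes outside the training vocabulary are ignored."""
    vocab, priors, cond = model
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for a in vocab:
            p = cond[c][a]
            # Bernoulli likelihood: use p if the attribute is present, 1 - p if absent.
            score += math.log(p if a in attrs else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def cross_validate(rows, labels, k=5):
    """k-fold cross-validation; folds are assigned by index modulo k (no shuffling) for brevity."""
    preds = [None] * len(rows)
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        model = train_nb([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            preds[i] = predict_nb(model, rows[i])
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return accuracy, preds


def sensitivity_specificity(y_true, y_pred, positive="90+"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up reviews (not the paper's 1000-wine dataset).
reviews = [
    ({"rich", "long finish", "complex"}, "90+"),
    ({"rich", "complex"}, "90+"),
    ({"long finish", "elegant"}, "90+"),
    ({"light", "simple"}, "90-"),
    ({"simple", "short finish"}, "90-"),
    ({"light", "short finish"}, "90-"),
]
rows = [attrs for attrs, _ in reviews]
labels = [y for _, y in reviews]

acc, preds = cross_validate(rows, labels, k=3)
sens, spec = sensitivity_specificity(labels, preds)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow when the attribute vocabulary is large. Because the smoothed conditional probabilities can be read off directly per class, the fitted model also stays inspectable, which is the property that makes Naïve Bayes usable as a white-box classifier here.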
Keywords: Wineinformatics, White-box Classification, Decision Tree, Association Classification, Naïve Bayes, k-Nearest Neighbors, SVM