Comparison of approaches to text classification
Compare machine-learning approaches to text classification. Pay special attention to evaluating the usefulness of various features, ranging from simple ones (length of text, bag-of-words) to more complicated ones derived from syntax, detected entities, etc.
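As a rough illustration of the simple end of that feature range, the following minimal sketch computes text length and a bag-of-words count for a single text. The names `tokenize` and `simple_features` and the regular-expression tokenizer are assumptions made for this example only; a real comparison would likely use a library vectorizer and a proper tokenizer.

```python
import re
from collections import Counter

# Assumed, deliberately naive tokenizer: runs of letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def simple_features(text):
    """Return length-based features and a bag-of-words count for one text."""
    tokens = tokenize(text)
    return {
        "n_chars": len(text),            # length of the raw text
        "n_tokens": len(tokens),         # length in tokens
        "bag_of_words": Counter(tokens), # token -> number of occurrences
    }


# Hypothetical review text, used only to show the shape of the output.
features = simple_features("Great food, great service!")
print(features["n_chars"], features["n_tokens"], features["bag_of_words"])
```

Features derived from syntax or detected entities would be added to the same dictionary, which keeps the later comparison of feature groups straightforward.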
For training and testing, use the current Yelp challenge dataset of reviews. The data contain several candidate target variables (usefulness of review, rating); select one or more of them. The comparison should include:
- Comparison of basic algorithms (their results, speed, ...)
- Evaluation of the impact of training data size
- Evaluation of various text features
- Comparison of text features with metadata features
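One possible shape for such a comparison is sketched below: a small harness that trains a classifier on growing subsets of the training data and records accuracy and training time. Everything here is an assumption for illustration: the toy reviews stand in for the Yelp data, the pure-Python multinomial Naive Bayes is just one baseline algorithm, and the names `train_nb`, `predict_nb` and `accuracy` are hypothetical. Other algorithms and feature sets would be plugged into the same loop.

```python
import math
import time
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    class_counts = Counter(labels)       # label -> number of documents
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Toy stand-in for labelled reviews (hypothetical data, not from Yelp).
train = [
    ("great food and friendly staff", "pos"),
    ("terrible service and cold food", "neg"),
    ("loved the great atmosphere", "pos"),
    ("awful experience terrible staff", "neg"),
    ("friendly service loved it", "pos"),
    ("cold awful food", "neg"),
]
test = [("great friendly place", "pos"), ("terrible cold meal", "neg")]
test_texts, test_labels = zip(*test)

# Impact of training data size: retrain on growing prefixes of the data.
results = []
for size in (2, 4, 6):
    texts, labels = zip(*train[:size])
    start = time.perf_counter()
    model = train_nb(texts, labels)
    elapsed = time.perf_counter() - start
    results.append((size, accuracy(model, test_texts, test_labels), elapsed))

for size, acc, elapsed in results:
    print(f"train size={size}  accuracy={acc:.2f}  train time={elapsed:.6f}s")
```

On real data the training subsets should be sampled at random rather than taken as prefixes, and timings should be averaged over several runs before algorithms are compared on speed.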
