SENS-HEAD: A Machine Learning Framework for Sensationalism Detection in News Headlines Using Linguistic and Semantic Features
DOI: https://doi.org/10.37745/bjmas.2022.04909
Abstract
The proliferation of sensationalized news headlines has raised concerns about media integrity, necessitating automated approaches for detecting sensationalism beyond traditional clickbait classification. This study presents SENS-HEAD, a novel dataset comprising over 30,000 annotated headlines labelled for sensational content and emotional arousal. Using Natural Language Processing (NLP), we extract a diverse set of linguistic and semantic features, including sentiment polarity, syntactic complexity, punctuation distribution, and stop word ratio, to systematically distinguish sensational from non-sensational headlines. We implement ensemble learning models (XGBoost, CatBoost, and Random Forest), achieving a balanced F1-score of 0.66. To enhance interpretability, we integrate SHAP (SHapley Additive exPlanations), which reveals key predictive markers such as stop word frequency, headline length, and sentiment extremity. The findings advance explainable AI (XAI) for sensationalism detection and have practical applications in automated journalism, content moderation, and media ethics regulation. By combining computational linguistics with ethical AI, this research delivers actionable insights for policymakers and promotes trustworthy news dissemination in the digital era.
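As a rough illustration of the surface features named in the abstract (headline length, stop word ratio, punctuation distribution), the following Python sketch computes them for a single headline. It is not the authors' pipeline: the function name `headline_features`, the tiny `STOP_WORDS` set, and the whitespace tokenization are all assumptions made for this example, and sentiment and syntactic features are omitted.

```python
import string

# Hypothetical, deliberately small stop word list (assumption for this sketch;
# the stop word list actually used in the study is not stated in the abstract).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "to", "of", "in", "on",
    "and", "you", "this", "that", "what", "will",
}


def headline_features(headline: str) -> dict:
    """Return a few simple surface features for one headline."""
    # Naive whitespace tokenization; strip surrounding punctuation and lowercase.
    tokens = [t.strip(string.punctuation).lower() for t in headline.split()]
    tokens = [t for t in tokens if t]

    n_tokens = len(tokens)
    n_stop = sum(t in STOP_WORDS for t in tokens)

    return {
        "length_tokens": n_tokens,
        # Guard against empty headlines to avoid division by zero.
        "stop_word_ratio": n_stop / n_tokens if n_tokens else 0.0,
        "exclamation_count": headline.count("!"),
        "question_count": headline.count("?"),
        "punctuation_count": sum(ch in string.punctuation for ch in headline),
    }


if __name__ == "__main__":
    print(headline_features("You will not believe what this is!"))
```

Feature dictionaries of this kind could be stacked into a table and passed to a tree-based classifier; which features and models perform best is an empirical question that the study itself addresses.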
Published
Versions
- 01-06-2025 (2)
- 01-06-2025 (1)