Rudra, Koustav, Tran, Danny and Shaltev, Miroslav (2021) EUDETECTOR: Leveraging Language Model to Identify EU-Related News. WWW '21. pp. 380-384. DOI https://doi.org/10.1145/3442442.3452324.

Full text not available from this repository.

Abstract

News media reflect the present state of a country or region to their audiences. Media outlets of a region publish different kinds of news for their local and global audiences. In this paper, we focus on Europe (specifically the EU) and propose a method to identify news that has an impact on Europe in any respect, such as finance, business, crime, or politics. Predicting the location of a news article is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows an improvement of about 9-19% in F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT actually looks at when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.

Document Type: Article
Programme Area: PA Not Applicable
Research affiliation: Infrastructure > DigiZ - Research Data Infrastructure
Refereed: No
Open Access Journal?: Yes
DOI: https://doi.org/10.1145/3442442.3452324
Date Deposited: 13 Jun 2022 15:47
Last Modified: 13 Jun 2022 15:47
URI: http://cris.leibniz-zmt.de/id/eprint/4951
