ZMT PUB - EUDETECTOR: Leveraging Language Model to Identify EU-Related News.

Rudra, Koustav, Tran, Danny and Shaltev, Miroslav ORCID: https://orcid.org/0000-0002-8244-7732 (2021) EUDETECTOR: Leveraging Language Model to Identify EU-Related News. In: Companion Proceedings of the Web Conference 2021, ed. by Leskovec, Jure, Grobelnik, Marko, Najork, Marc A., Tang, Jie and Zia, Leila. Association for Computing Machinery, New York, pp. 380-384. ISBN 978-1-4503-8313-4. DOI https://doi.org/10.1145/3442442.3452324.

Official URL: http://dx.doi.org/10.1145/3442442.3452324

Abstract

News media reflect the present state of a country or region to their audiences. Media outlets of a region publish different kinds of news for their local and global audiences. In this paper, we focus on Europe (precisely, the EU) and propose a method to identify news that has an impact on Europe in any respect, such as finance, business, crime, or politics. Predicting the location of a news item is itself a challenging task. Most approaches restrict themselves to named entities or hand-crafted features. We try to overcome that limitation: instead of focusing only on named entities (European locations, politicians, etc.) and hand-crafted rules, we also explore the context of news articles with the help of the pre-trained language model BERT. The BERT-based European news detector shows about 9-19% improvement in F-score over baseline models. Interestingly, we observe that such models automatically capture named entities, their origin, etc.; hence, no separate information is required. We also evaluate the role of such entities in the prediction and explore the tokens that BERT actually relies on when deciding the news category. Entities such as persons, locations, and organizations turn out to be good rationale tokens for the prediction.
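
To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of named-entity baseline the abstract argues against, together with the F-score used for comparison. It is not the authors' implementation: the gazetteer, the sample headlines, and the function names `mentions_eu_entity` and `f_score` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteer of EU-related terms; not taken from the paper.
EU_GAZETTEER = {"european union", "european commission", "brussels", "eurozone"}


def mentions_eu_entity(text, gazetteer=EU_GAZETTEER):
    """Return True if any gazetteer term occurs as a whole phrase in the text."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in gazetteer
    )


def f_score(gold, predicted):
    """Compute the F1 score of boolean predictions against boolean gold labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented headlines with invented labels (True = EU-related).
headlines = [
    ("European Commission proposes new budget rules", True),
    ("Brussels summit ends without agreement on migration", True),
    ("Central bank rate decision rattles bond markets across the bloc", True),
    ("Local team wins regional football final in Ohio", False),
]

gold = [label for _, label in headlines]
predicted = [mentions_eu_entity(text) for text, _ in headlines]
score = f_score(gold, predicted)

print(predicted)
print(round(score, 3))
```

In this toy data the third headline is labelled EU-related but names no gazetteer entity, so the baseline misses it and recall drops. That is the kind of case where a model that also uses the surrounding context, as the abstract describes for BERT, could plausibly help.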

Document Type:	Book chapter
Programme Area:	PA Not Applicable
Research affiliation:	DigiZ - Research Data Infrastructure
Document Access:	Closed access
DOI:	https://doi.org/10.1145/3442442.3452324
Date Deposited:	13 Jun 2022 15:47
Last Modified:	13 Nov 2025 10:29
URI:	https://cris.leibniz-zmt.de/id/eprint/4951
