Dublin Core
Title
SCRAPING AND ANALYZING NEWS ARTICLES TO IDENTIFY TRENDS
Abstract
This thesis presents the design and implementation of a comprehensive news analysis platform that delivers real-time insights into global news trends. The primary objective is to address the challenge of information overload by enabling users to efficiently track, analyze, and visualize news coverage from multiple reputable sources.
The system is developed as a full-stack web application, comprising a FastAPI backend and a React-based frontend. The backend automates the collection of articles from major news outlets, followed by systematic text cleaning and natural language processing. Sentiment analysis is performed using both lightweight (TextBlob) and advanced (spaCy) NLP pipelines, while trending keywords are extracted to highlight emerging topics. Data is aggregated into daily, weekly, and monthly snapshots, with persistent storage managed via SQLite and SQLAlchemy to support efficient historical queries.
The frontend features an interactive dashboard, allowing users to explore article counts, sentiment distributions, source breakdowns, and trending keywords through intuitive visualizations. Modern charting libraries ensure clarity and accessibility, while RESTful APIs facilitate seamless integration between frontend and backend.
Results demonstrate the platform’s capability to deliver actionable insights, such as identifying shifts in sentiment, tracking the emergence of specific topics, and comparing coverage across sources. The modular architecture supports extensibility for additional sources or analytical methods.
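The trending-keyword and daily-snapshot steps described in the abstract could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the thesis implementation: the function names, the stopword list, and the sample headlines are all hypothetical, and only the Python standard library is used in place of the TextBlob/spaCy pipelines and SQLite storage the thesis mentions.

```python
from collections import Counter, defaultdict
from datetime import date
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "and", "for", "with", "across", "from", "that", "this"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def daily_keyword_snapshots(articles, top_n=3):
    """Group (published_date, headline) pairs by day and return the top keywords per day."""
    counts = defaultdict(Counter)
    for published, headline in articles:
        counts[published].update(tokenize(headline))
    return {day: [word for word, _ in c.most_common(top_n)]
            for day, c in counts.items()}


# Made-up sample headlines, used only to exercise the functions above.
articles = [
    (date(2024, 1, 1), "Markets rally as inflation cools"),
    (date(2024, 1, 1), "Inflation data lifts markets"),
    (date(2024, 1, 2), "Storm disrupts travel across the coast"),
]

snapshots = daily_keyword_snapshots(articles, top_n=2)
for day, keywords in sorted(snapshots.items()):
    print(day.isoformat(), keywords)
```

Weekly or monthly snapshots would follow the same pattern with a different grouping key, for example the ISO week from `published.isocalendar()` instead of the date itself.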
