production

AI News Aggregation & Automation Tools

Automated news ingestion system with similarity detection and duplicate removal using Perplexity API and OpenAI Embeddings.

Perplexity API · OpenAI Embeddings · Laravel · MySQL

01 Overview

Built to eliminate manual news monitoring, this system automates the ingestion, deduplication, and reporting of industry-specific news. It ensures stakeholders only see high-signal, unique content.

02 Features

  • Real-time topic-based automated news ingestion
  • Embedding-based similarity detection for duplicate removal
  • Content clustering for high-level industry trends
  • Multi-user isolation and structured reporting workflows
  • Cost-optimized caching layer for expensive API calls

03 Engineering Decisions

Selected OpenAI Embeddings for semantic deduplication: comparing embeddings catches reworded or paraphrased coverage of the same story, which exact string matching misses.

Designed a robust caching layer to reduce API costs while maintaining real-time freshness.
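One common way to balance cost against freshness is a time-to-live (TTL) cache: a repeated query within the TTL window reuses the stored result, and anything older triggers a fresh API call. The sketch below is illustrative only, written in Python rather than the project's Laravel code; `TTLCache`, `get_or_compute`, and the key format are hypothetical names, and the real system's expiry policy is not described here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh; otherwise call compute() and store it."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: the expensive call is skipped
        value = compute()  # miss or stale entry: pay for the API call once
        self._store[key] = (now, value)
        return value
```

A shorter TTL keeps results fresher at the price of more API calls, so the value would typically be tuned per topic or per endpoint.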

Built a modular ingestion pipeline that can be easily extended to new news sources.
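A modular pipeline of this kind usually rests on a small shared interface that every source implements, so adding a source means adding one class. The following is a minimal Python sketch under that assumption, not the project's actual Laravel code; `Article`, `NewsSource`, `StaticSource`, and `ingest` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Article:
    source: str
    title: str
    url: str


class NewsSource(Protocol):
    """Interface every source implements (hypothetical)."""

    name: str

    def fetch(self, topic: str) -> Iterable[Article]: ...


class StaticSource:
    """Stand-in source that serves canned articles, used here in place of a real API."""

    def __init__(self, name: str, articles: Iterable[Article]) -> None:
        self.name = name
        self._articles = list(articles)

    def fetch(self, topic: str) -> List[Article]:
        # Simple case-insensitive title match stands in for a real topic query.
        return [a for a in self._articles if topic.lower() in a.title.lower()]


def ingest(sources: Iterable[NewsSource], topic: str) -> List[Article]:
    """Collect articles for one topic from every registered source."""
    collected: List[Article] = []
    for source in sources:
        collected.extend(source.fetch(topic))
    return collected
```

Because `ingest` depends only on the `fetch` method, a new source can be registered without touching the pipeline itself.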

04 Challenges & Solutions

Challenge

Duplicate news coverage across multiple sources.

Solution

Calculated vector embeddings for article content and used cosine similarity to automatically flag and group duplicates.
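The approach can be sketched as follows, assuming each article's embedding has already been fetched as a list of floats. This is an illustrative Python sketch rather than the project's own code: the function names, the greedy grouping strategy, and the 0.9 threshold are all assumptions, and a production threshold would need tuning against real articles.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_duplicates(embeddings: List[Sequence[float]],
                     threshold: float = 0.9) -> List[List[int]]:
    """Greedily group article indices whose embeddings are near-identical.

    Each article is compared against the first member of every existing
    group and joins the first group that meets the threshold; otherwise it
    starts a new group. The threshold value is an assumed placeholder.
    """
    groups: List[List[int]] = []
    for i, vec in enumerate(embeddings):
        for group in groups:
            if cosine_similarity(vec, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

Any group with more than one index marks a set of likely duplicates, from which one representative article can be kept for the report.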