Overview
RAG Equity Research Agent is an autonomous AI system that performs professional-grade equity research by aggregating and analyzing multiple data sources in real-time. Built with LangGraph for multi-agent orchestration, hybrid RAG (BM25 + dense embeddings) for SEC filings analysis, and deployed on Azure Container Apps with a Telegram bot interface.
Key Features
| Feature | Description |
|---|---|
| Deep Analysis | Autonomous research combining market data, SEC filings, and sentiment analysis |
| Hybrid RAG | BM25 sparse + dense embeddings with RRF reranking for accurate document retrieval |
| Multi-Agent System | LangGraph orchestration with specialized agents (market data, news, RAG, synthesizer) |
| Real-time Data | Yahoo Finance integration for live quotes, financials, and historical data |
| Risk Scoring | Automated risk assessment from 10-K filings with keyword extraction |
| Telegram Bot | Full-featured bot with inline keyboards, watchlists, and price alerts |
Technology Stack
| Layer | Technologies |
|---|---|
| LLM | Groq (Llama 3.3 70B), Azure OpenAI (GPT-4o-mini) |
| Orchestration | LangGraph, LangChain |
| RAG | Qdrant Vector DB, BM25, Hybrid Search, RRF Reranking |
| Data Sources | Yahoo Finance, SEC EDGAR, Reddit API, DuckDuckGo |
| Backend | FastAPI, Pydantic, Python 3.11+ |
| Bot | python-telegram-bot with async handlers |
| Infrastructure | Docker, Azure Container Apps, Terraform |
| CI/CD | GitHub Actions (lint, test, security, deploy) |
| Quality | Ruff, pytest (53% coverage), Bandit security scan |
Telegram Commands
| Command | Description |
|---|---|
| /analyze TICKER | Deep multi-source analysis |
| /quote TICKER | Real-time stock quote |
| /compare TICKER1 TICKER2 | Side-by-side comparison |
| /dcf TICKER | Discounted Cash Flow valuation |
| /risk TICKER | Risk score from 10-K analysis |
| /peers TICKER | Automatic peer comparison |
| /reddit TICKER | Reddit/WSB sentiment analysis |
| /watchlist | Manage personal watchlist |
| /alert TICKER PRICE | Set price alerts |
RAG Pipeline
The hybrid search pipeline combines multiple retrieval strategies:
- Chunking - SEC filings split into semantic chunks with metadata
- BM25 Sparse - Traditional keyword matching for exact terms
- Dense Embeddings - Semantic similarity via text-embedding-ada-002
- RRF Fusion - Reciprocal Rank Fusion combines both rankings
- Reranking - Final relevance scoring for top-k results
Cloud Infrastructure
Deployed on Azure with infrastructure-as-code:
- Container Apps - Serverless containers with scale-to-zero
- Container Registry - Private Docker image storage
- Azure OpenAI - GPT-4o-mini with 10K TPM quota
- Qdrant - Vector database on Container Instance
- Key Vault - Secure secrets management
- Log Analytics - Centralized logging and monitoring
Results
| Metric | Value |
|---|---|
| Code Coverage | 53% |
| Lines of Code | 10,400+ |
| Test Cases | 205 |
| CI Pipeline | Lint + Test + Security + Build |
| Estimated Cost | ~$50-110/month (Azure) |
