Susovan Patra (susovanpatra00)

What I Build

I design and ship production-grade AI systems — real-time voice agents, LLM pipelines, and enterprise NLP tools. I care deeply about latency, scalability, and systems that actually work in the real world — not just demos.

Currently focused on Voice AI: low-latency speech-to-speech pipelines combining ASR, LLMs, and TTS for real-world conversational agents at scale.

Work Experience

🎙️ Voice AI Engineer, TeleCMI Communications · Sept 2025 – Present

🔴 Real-Time Voice Agent System

Designed a real-time speech-to-speech voice agent achieving <600ms end-to-end latency under concurrent load
Cut pipeline latency 1.3s → <600ms (~54% reduction) via model optimization & pipeline parallelism on a single GPU
Integrated Ultravox (ASR+LLM) · Qwen (reasoning) · VibeVoice (TTS) with RAG-based dynamic retrieval
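
As a rough illustration of the pipeline parallelism mentioned above, here is a minimal asyncio sketch in which three stand-in stages pass chunks through queues, so a later stage can start on the first chunk while an earlier stage is still working on the next one. The stage names, the `delay` value, and the string tagging are hypothetical placeholders and do not reflect the production models or their APIs.

```python
import asyncio

SENTINEL = None  # marks the end of the chunk stream


async def stage(name, delay, inbox, outbox):
    """Hypothetical stage: wraps each chunk after a simulated processing delay."""
    while True:
        item = await inbox.get()
        if item is SENTINEL:
            await outbox.put(SENTINEL)  # propagate shutdown downstream
            return
        await asyncio.sleep(delay)  # stand-in for model inference time
        await outbox.put(f"{name}({item})")


async def run_pipeline(chunks, delay=0.01):
    # One queue between each pair of stages; stages run concurrently.
    q_in, q_asr, q_llm, q_out = (asyncio.Queue() for _ in range(4))
    tasks = [
        asyncio.create_task(stage("asr", delay, q_in, q_asr)),
        asyncio.create_task(stage("llm", delay, q_asr, q_llm)),
        asyncio.create_task(stage("tts", delay, q_llm, q_out)),
    ]
    for chunk in chunks:
        await q_in.put(chunk)
    await q_in.put(SENTINEL)

    results = []
    while True:
        item = await q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    await asyncio.gather(*tasks)
    return results


def process(chunks):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_pipeline(chunks))
```

With N chunks and three stages of equal delay d, the overlapped version takes roughly (N + 2)·d instead of 3·N·d, which is the general idea behind streaming partial results between ASR, LLM, and TTS.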

🟠 Post-Call Analytics System

Processes 400+ hours of calls/day across 2 languages with ~95% transcription accuracy + automatic language detection
LLM-based analysis extracts customer pain points, intents, and product insights from call transcripts

🟡 End-to-End Voice Data Processing Pipeline

Audio ingestion via Apache Pulsar → denoising → speaker diarization → transcription, running in production
Outputs structured JSON metadata + segmented audio to AWS S3 for analytics and model training
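
To make "structured JSON metadata" concrete, the sketch below shows one possible per-call record built from diarized, transcribed segments. Every field name here (`call_id`, `language`, `segments`, and so on) is an assumption chosen for illustration, not the actual schema, and the S3 upload step is omitted.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    """One diarized, transcribed span of audio (hypothetical fields)."""
    speaker: str
    start_s: float
    end_s: float
    text: str


def build_metadata(call_id, language, segments):
    """Assemble a JSON-serializable record for one call."""
    return {
        "call_id": call_id,
        "language": language,
        # Duration is taken as the end of the last segment; 0.0 if no speech.
        "duration_s": max((s.end_s for s in segments), default=0.0),
        "speakers": sorted({s.speaker for s in segments}),
        "segments": [asdict(s) for s in segments],
    }


def to_json(record):
    """Serialize the record, keeping non-ASCII transcript text readable."""
    return json.dumps(record, ensure_ascii=False, indent=2)
```

Keeping segment boundaries in the metadata lets the same record drive both analytics and the slicing of audio into training clips.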

🤖 AI/ML Engineer, Motherson Technology Services Limited · June 2024 – Sept 2025

🔵 Confidential Knowledge Chatbot — Motherson Group

Built a secure RAG system over 60+ GB of confidential documents across 100+ companies — <2.5s query latency
Centralized semantic search across the entire group, reducing redundant work and surfacing prior knowledge
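
The retrieval step of a RAG system like the one above can be sketched as a nearest-neighbour search over document embeddings. This is a minimal pure-Python version using cosine similarity over toy vectors; the `index` layout and the `top_k` helper are hypothetical, and a real deployment would use an embedding model and a vector store rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=2):
    """Return the ids of the k documents most similar to the query.

    `index` maps a document id to its embedding vector (hypothetical layout).
    """
    ranked = sorted(
        index.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

The retrieved ids would then be resolved to text chunks and passed to the LLM as context for the answer.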

🟣 M&A Contract Analysis

Automated comparison of 4+ legal contracts (200–300 pages each), saving ~1 month of due diligence time
LLM-based structured analysis of high-volume M&A agreements

🟢 47×47 Multilingual Enterprise Translator

Deployed a translation system across 47 source → 47 target languages using OpenAI + Helsinki-NLP/Opus
Supports .txt, .docx, .pdf with XML-based table structure preservation

Projects

🎤 Bolna Bhai

Modular Real-Time Voice Agent

Orchestrator-based architecture that decouples ASR · LLM · TTS instead of relying on monolithic models like Ultravox. WebSocket integration with Pipecat, supporting Qwen + AI4Bharat Indic Conformer with extensibility for LLaMA, Mistral, and more.

WebSockets · WebRTC · Pipecat · Qwen · ASR · TTS
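
The decoupling described for Bolna Bhai can be pictured as an orchestrator that depends only on small interfaces, so any ASR, LLM, or TTS backend can be swapped in. The sketch below uses `typing.Protocol`; the interface names, method names, and toy backends are all hypothetical and are not taken from the project or from Pipecat.

```python
from typing import Protocol


class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Orchestrator:
    """Wires three independent components into one conversational turn."""

    def __init__(self, asr: ASR, llm: LLM, tts: TTS) -> None:
        self.asr = asr
        self.llm = llm
        self.tts = tts

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.asr.transcribe(audio)
        answer = self.llm.reply(text)
        return self.tts.synthesize(answer)


# Toy stand-ins so the sketch runs without any models.
class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Replacing `UpperLLM` with a wrapper around a different model changes nothing in `Orchestrator`, which is the practical benefit over a monolithic speech-to-speech model.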

🔊 DieTra

Audio Processing Pipeline

Denoising → Diarization → Transcription using DeepFilterNet, NVIDIA NeMo, and Mistral Voxtral. Runs in 10–12 GB of VRAM instead of the 80 GB+ such pipelines traditionally use, bringing high-quality audio processing to consumer hardware.

DeepFilterNet · NeMo · Voxtral · Diarization

🧠 Attention Playground

7 Attention Mechanisms

Clean implementations of seven attention mechanisms, from vanilla self-attention to Flash Attention and Multi-Head Latent Attention (MLA), with complexity analysis. A deep dive into modern LLM internals.

PyTorch · Transformers · Flash Attention · MLA
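
For reference, the vanilla starting point of that progression is scaled dot-product attention, softmax(QKᵀ / √d_k)·V. The sketch below is a dependency-free Python version written for readability; it is not the repository's PyTorch code, and it omits masking, batching, and multiple heads.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v. Returns n_q x d_v.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)]
        )
    return out
```

For n tokens of dimension d this costs O(n²·d) time and materializes an n×n score matrix, which is the memory cost that Flash Attention avoids by computing the softmax in tiles.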

📝 Smart Meeting Minutes

AI-Powered MoM Generator

Whisper + GPT-3.5 + PyAnnote diarization with a React frontend. Auto-generates professional meeting minutes with full speaker attribution.

Whisper · GPT-3.5 · PyAnnote · React

🤖 Intelligent ChatBot

RAG Conversational AI

Multi-turn context-aware chatbot with document retrieval and memory using LangChain.

LangChain · RAG · FAISS · OpenAI

📡 IMU Sensor Fusion (Research)

GPS-Free Positioning · Guide: Dr. Amitangshu Pal

Sensor fusion (accelerometer + compass + ML) achieving ±20–30m accuracy over 4–6 km. Works with standard smartphone sensors.

Sensor Fusion · ML · IMU · Accelerometer
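
The geometric core of GPS-free positioning is dead reckoning: accumulate displacement from a travelled distance and a compass heading. The sketch below shows only that step and assumes each update arrives as a `(distance_m, heading_deg)` pair; how distance is estimated from the accelerometer and how the ML component corrects drift are not described here, so they are left out.

```python
import math


def dead_reckon(start, steps):
    """Integrate (distance_m, heading_deg) updates into an (east, north) position.

    Headings are compass bearings: degrees clockwise from north.
    """
    east, north = start
    for distance_m, heading_deg in steps:
        theta = math.radians(heading_deg)
        east += distance_m * math.sin(theta)
        north += distance_m * math.cos(theta)
    return east, north
```

Because each update adds its own error, the position error grows with distance travelled, which is why accuracy is quoted over a route length.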

Tech Stack

📜 Certifications

"Building real-time AI systems, one model at a time." 🎙️
