Documentation
Podifact Docs
Complete reference for the Podifact platform — from using the dashboard to integrating with the API.
Overview
Podifact is an AI-powered podcast intelligence platform. Upload a transcript or provide a YouTube URL and the system runs a multi-agent pipeline — extracting key notes, detecting factual claims, verifying them against live web search and your knowledge base, and optionally generating social content.
Dashboard sections
Process: Submit transcripts for pipeline processing. Supports JSON, TXT, and YouTube URLs.
Episodes: Browse your processed episode library, filter by source, sort, and run semantic search.
Chat: AI chat grounded in your episode content. Scope conversations to individual episodes.
Jobs: Monitor pipeline run history, including status, progress, processing stats, and error details.
Knowledge: Upload reference documents (PDF, DOCX, MD) to enrich fact-checking and chat.
Settings: Manage your profile, change your password, and create personal API keys.
API Docs: In-dashboard API reference with live code examples.
Pipeline overview
Ingest
Transcript parsed, validated, and split into sections. Embeddings generated and stored in the vector database.
Summarise
LLM extracts key themes, notable quotes, takeaways, and produces a concise summary.
Fact-check
Claims detected and classified. Each claim is verified with web search and RAG retrieval.
Social
Optional: tweet thread, headline, and hashtag set generated from the verified content.
Store
Full EpisodeReport persisted to PostgreSQL. Episode available in your library immediately.
Processing Transcripts
Navigate to Dashboard → Process to submit a transcript. Two formats are supported.
JSON format (structured)
Preferred format. Enables speaker attribution, timestamps, and per-section chunking.
{
  "episode_id": "ep001",
  "title": "AI in Healthcare: Hype or Reality?",
  "host": "Alex Chen",
  "guests": ["Dr. Priya Sharma"],
  "transcript": [
    {
      "timestamp": "00:00",
      "speaker": "Alex Chen",
      "section": "Introduction",
      "text": "Welcome back to the show..."
    }
  ]
}
Plain text format
TXT files are also accepted. Speaker detection and sectioning are inferred by the LLM, so results are less accurate than with structured JSON.
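The structured JSON format above can be sanity-checked before upload. The sketch below is a client-side check, not the platform's own validator. It assumes the fields shown in the example are required (the documentation does not say which are optional), and its function names are hypothetical. It also groups entries by section label to illustrate what per-section chunking has to work with.

```python
# Assumption: the fields shown in the example are treated as required here.
TOP_LEVEL_FIELDS = ("episode_id", "title", "transcript")
ENTRY_FIELDS = ("timestamp", "speaker", "section", "text")


def validate_transcript(doc):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing top-level field: {k}" for k in TOP_LEVEL_FIELDS if k not in doc]
    for i, entry in enumerate(doc.get("transcript", [])):
        for k in ENTRY_FIELDS:
            if k not in entry:
                problems.append(f"entry {i}: missing {k}")
    return problems


def group_by_section(doc):
    """Collect entry texts under each section label, in first-seen order."""
    sections = {}
    for entry in doc["transcript"]:
        sections.setdefault(entry["section"], []).append(entry["text"])
    return sections


episode = {
    "episode_id": "ep001",
    "title": "AI in Healthcare: Hype or Reality?",
    "host": "Alex Chen",
    "guests": ["Dr. Priya Sharma"],
    "transcript": [
        {"timestamp": "00:00", "speaker": "Alex Chen",
         "section": "Introduction", "text": "Welcome back to the show..."},
    ],
}

print(validate_transcript(episode))
print(group_by_section(episode))
```

Running a check like this locally catches malformed files before they consume a pipeline run.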
Processing options
Fact check: Run claim detection and web verification. Enabled by default. Disable for summarisation-only runs.
Generate social: Produce a tweet thread, headline, and hashtags from verified content.
Chunk strategy: "section" splits by transcript sections (best for structured JSON). "fixed" and "semantic" are alternatives for plain text.
Episode Library
Every processed transcript appears in Dashboard → Episodes as a stored episode with full metadata.
Library tab
Text search: Filter cards by title or filename in real time.
Source filter: Show only uploads or only YouTube-sourced episodes.
Sort: Newest, oldest, title A–Z, or most claims.
Episode card actions
View report: Open the full EpisodeReport (summary, key notes, fact-check results, agent log, social content).
Chat →: Opens a new episode-scoped conversation in the Chat page. The AI retrieves context only from this episode.
Copy ID: Copies the episode_id to the clipboard, which is useful for API calls.
Delete: Removes the episode from the library and purges its vectors from ChromaDB.
Semantic Search tab
Search by meaning across all indexed episodes. Results are ranked by cosine similarity and include the matching transcript chunk, episode reference, and section label. Useful for finding specific claims or topics across your entire library.
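To make the ranking step concrete, here is a minimal sketch of cosine-similarity ranking over stored chunks. It is an illustration only: the real embeddings come from the platform's vector database, whereas the three-dimensional vectors, the chunk dictionaries, and the function names below are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=5):
    """Score every chunk against the query and return the top_k best matches."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"score": score, "episode_id": c["episode_id"],
         "section": c["section"], "text": c["text"]}
        for score, c in scored[:top_k]
    ]


# Toy data: hypothetical chunks with hand-written 3-d embeddings.
chunks = [
    {"episode_id": "ep001", "section": "Introduction",
     "text": "Welcome back to the show...", "embedding": [1.0, 0.0, 0.0]},
    {"episode_id": "ep001", "section": "Regulation",
     "text": "The FDA has cleared...", "embedding": [0.6, 0.8, 0.0]},
    {"episode_id": "ep002", "section": "Funding",
     "text": "Investment doubled...", "embedding": [0.0, 0.0, 1.0]},
]

results = rank_chunks([0.6, 0.8, 0.0], chunks)
for r in results:
    print(round(r["score"], 2), r["episode_id"], r["section"])
```

Each result carries the same three pieces of information the search tab shows: the matching chunk, the episode reference, and the section label.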
Fact-Check Results
Every claim extracted from a transcript receives a verification result with a status, confidence score, and evidence trail.
Claim statuses
Confidence score
A 0–100% score representing the LLM's certainty in its verdict given the available evidence. A verified status with high confidence (≥75%) is a strong result. A verified status with low confidence should be treated with caution.
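The reading rule above can be expressed as a small helper. This is a sketch under assumptions: it takes the confidence as a fraction between 0 and 1 and the status as the string "verified", neither of which is confirmed as the API's actual representation. The documentation gives no guidance for other statuses, so the helper only says so.

```python
HIGH_CONFIDENCE = 0.75  # the 75% threshold described above


def interpret(status, confidence):
    """Map a (status, confidence) pair to a reading hint.

    Assumption: confidence is a fraction in [0, 1]; divide by 100 first
    if your data holds percentages.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if status == "verified" and confidence >= HIGH_CONFIDENCE:
        return "strong result"
    if status == "verified":
        return "treat with caution"
    return "no guidance for this status"


print(interpret("verified", 0.9))
print(interpret("verified", 0.4))
```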
Source basis
web: Verified using live search results. Real URLs are attached. Most authoritative.
web+rag: Both web search and transcript context used. Strong corroboration.
rag: Verified against another part of the same transcript (cross-reference).
knowledge: Verified against an uploaded knowledge base document.
none: No external evidence retrieved. Verdict based on model knowledge only.
AI Chat
Navigate to Dashboard → Chat for RAG-enhanced AI conversation. Two modes: general (searches all your episodes) and episode-scoped (retrieves context only from one episode).
Starting a conversation
Click + New chat in the sidebar, or navigate from an episode card using the chat icon — the latter auto-creates a conversation scoped to that episode.
Suggested prompts
When starting a scoped conversation, prompt suggestions are generated from the episode title (e.g. "What were the key verified claims in…"). For general chats, suggestions cover your full library.
Conversation management
Sidebar groups: Conversations grouped into Today, Yesterday, This week, Older.
Episode label: Scoped conversations show a green episode title label below the conversation name.
Delete: Hover a conversation row to reveal the trash icon. Deletes from backend immediately.
Knowledge Base
Upload reference documents in Dashboard → Knowledge. Once indexed, documents are automatically retrieved during fact-checking and AI chat.
Supported formats
Processing lifecycle
processing: Document uploaded, chunking and embedding in progress.
ready: Indexed and available for semantic retrieval.
error: Parsing or indexing failed. Error detail shown on hover.
Jobs & History
Dashboard → Jobs shows the complete history of pipeline runs for your account with per-run stats.
Job statuses
queued: Job received, waiting for a worker slot.
running: Pipeline active. Progress bar and node name update in real time (5s poll).
completed: Report generated and episode stored. Click the external-link icon to view the full report.
failed: Pipeline error. Error message shown in the row. Click the retry icon to re-process from the Process page.
Stats columns
Tokens: Total LLM tokens consumed across all pipeline nodes.
Time: Wall-clock processing time in seconds.
Model: Primary LLM used for this run.
Settings & API Keys
Dashboard → Settings has three tabs: Profile, API Keys, and About.
Profile
Update username and email independently. Username must be 3+ characters, alphanumeric with underscores/hyphens. Both fields are pre-populated from your account on load.
Password change
Requires your current password. New password must be 8+ characters. Separate from the profile save — uses its own submit button.
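The username and password rules above are simple enough to pre-check in a script before calling the API. The sketch below is one reading of those rules, not the server's validator: it assumes "alphanumeric" means ASCII letters and digits and that there is no upper length limit, since neither is specified.

```python
import re

# 3 or more characters drawn from letters, digits, underscores, and hyphens.
# Assumption: ASCII only, no maximum length.
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{3,}")


def valid_username(name):
    """True if the whole string satisfies the username rule."""
    return USERNAME_RE.fullmatch(name) is not None


def valid_new_password(password):
    """True if the password meets the 8-character minimum."""
    return len(password) >= 8


print(valid_username("alice_01"), valid_username("al"))
print(valid_new_password("password123"), valid_new_password("short"))
```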
API Keys
Generate long-lived personal API keys for use in scripts and integrations. Keys are prefixed pa_live_ and grant the same access as your session token.
Authentication
All protected endpoints require a Bearer token in the Authorization header. Two token types are accepted.
JWT tokens (short-lived, 30 min)
POST http://localhost:8000/api/v1/auth/login
Authenticate and receive access + refresh tokens.
Request Body
username (string, required): Account username
password (string, required): Account password
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"alice","password":"password123"}'
API keys (long-lived, pa_live_...)
Generate personal API keys in Settings → API Keys. Use anywhere a JWT is accepted.
curl -H "Authorization: Bearer pa_live_your_key_here" \
  http://localhost:8000/api/v1/auth/me
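Both token types travel in the same Authorization header, so client code can treat them uniformly. The following sketch uses only the standard library. The helper names are hypothetical, and login() returns the parsed response body as-is because the exact field names of the token response are not documented here.

```python
import json
import urllib.request

BASE = "http://localhost:8000"


def auth_headers(token):
    """Build the Authorization header for either a JWT or an API key."""
    return {"Authorization": f"Bearer {token}"}


def token_kind(token):
    """Distinguish personal API keys from JWTs by the documented pa_live_ prefix."""
    return "api_key" if token.startswith("pa_live_") else "jwt"


def login(username, password):
    """POST credentials to the login endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{BASE}/api/v1/auth/login",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# No network call is made here; login() needs a running server.
print(auth_headers("pa_live_your_key_here"))
print(token_kind("pa_live_your_key_here"))
```

Because JWTs expire after 30 minutes, an API key is usually the simpler choice for long-running scripts.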
Quick Start
5 min: Submit a transcript, poll until complete, read the report.
import time

import requests

BASE = "http://localhost:8000"
HEADS = {"Authorization": "Bearer pa_live_your_key"}

# 1. Submit the transcript; async_mode returns a job_id immediately
with open("episode.json", "rb") as f:
    job = requests.post(
        f"{BASE}/api/v1/process",
        headers=HEADS,
        files={"file": f},
        data={"fact_check": "true", "async_mode": "true"},
    ).json()
print("Job:", job["job_id"])

# 2. Poll until the job reaches a terminal status
while True:
    s = requests.get(f"{BASE}/api/v1/jobs/{job['job_id']}", headers=HEADS).json()
    print(s["status"], s["progress"])
    if s["status"] in ("completed", "failed"):
        break
    time.sleep(3)

# 3. Print the start of the summary; skip this for a failed job
if s["status"] == "completed":
    print(s["report"]["summary"]["text"][:200])
Processing
POST http://localhost:8000/api/v1/process
Auth required
Submit a transcript for the full pipeline. Returns immediately with a job_id when async_mode=true.
Request Body
file (File, required): Transcript (.json or .txt)
fact_check (boolean): Run claim verification. Default: true
generate_social (boolean): Generate tweet thread + headline. Default: false
async_mode (boolean): Return job_id immediately. Default: false (blocks)
chunk_strategy (string): "section" | "fixed" | "semantic". Default: "section"
curl -X POST http://localhost:8000/api/v1/process \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@transcript.json" \
  -F "fact_check=true" \
  -F "async_mode=true"
Jobs & Streaming
List jobs
GET http://localhost:8000/api/v1/jobs
Auth required
List pipeline jobs for the authenticated user.
Query Parameters
status (string): Filter: queued | running | completed | failed
limit (integer): Max results. Default 50, max 100
curl "http://localhost:8000/api/v1/jobs?status=completed" -H "Authorization: Bearer $TOKEN"
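When building the jobs query from a script, it can help to check the parameters locally first. The helper below is a hypothetical sketch: it rejects status values outside the four documented ones and clamps limit to the documented maximum of 100. How the server itself responds to out-of-range values is not documented, so the clamping is a client-side convenience only.

```python
from urllib.parse import urlencode

VALID_STATUSES = {"queued", "running", "completed", "failed"}


def jobs_url(base, status=None, limit=50):
    """Build the list-jobs URL with a validated status and a clamped limit."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    params = {"limit": max(1, min(limit, 100))}
    if status is not None:
        params["status"] = status
    return f"{base}/api/v1/jobs?{urlencode(params)}"


print(jobs_url("http://localhost:8000", status="completed"))
```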
Live SSE stream
GET http://localhost:8000/api/v1/jobs/{job_id}/stream
Auth required
Server-Sent Events stream. Closes automatically on completion or failure.
import json

import requests
import sseclient  # client.events() below is the sseclient-py interface

# BASE and HEADS as in the Quick Start; job_id is the id returned by /api/v1/process
resp = requests.get(f"{BASE}/api/v1/jobs/{job_id}/stream", headers=HEADS, stream=True)
client = sseclient.SSEClient(resp)
for event in client.events():
    data = json.loads(event.data)
    if event.event == "pipeline_complete":
        print(data["title"])
        break
    print(f"[{event.event}] {data.get('progress', 0)}%")
Episodes
List episodes
GET http://localhost:8000/api/v1/episodes
Auth required
List all stored episodes with metadata.
curl http://localhost:8000/api/v1/episodes -H "Authorization: Bearer $TOKEN"
Get full report
GET http://localhost:8000/api/v1/episodes/{episode_id}
Auth required
Fetch the complete EpisodeReport including fact-check results, agent steps, and optional social content.
curl http://localhost:8000/api/v1/episodes/ep001 -H "Authorization: Bearer $TOKEN"
Delete episode
DELETE http://localhost:8000/api/v1/episodes/{episode_id}
Auth required
Remove an episode and purge its vectors from the vector store.
curl -X DELETE http://localhost:8000/api/v1/episodes/ep001 -H "Authorization: Bearer $TOKEN"
Semantic Search
POST http://localhost:8000/api/v1/search
Auth required
Vector search across all indexed episode transcripts. Optionally scope to one episode.
Request Body
query (string, required): Natural language query
episode_id (string): Scope to a single episode
top_k (integer): Result count. Default 5, max 20
curl -X POST http://localhost:8000/api/v1/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"FDA regulation AI","top_k":5}'
Chat
POST http://localhost:8000/api/v1/chat
Auth required
RAG-enhanced streaming chat. Returns an OpenAI-compatible SSE stream.
Request Body
messages (Message[], required): [{role:"user"|"assistant", content:string}]
episode_id (string): Scope RAG to a single episode
import json
import requests

# BASE and HEADS as defined in the Quick Start
resp = requests.post(f"{BASE}/api/v1/chat",
    json={"messages": [{"role": "user", "content": "What FDA claims were verified?"}],
          "episode_id": "ep001"},
    headers=HEADS, stream=True)
for line in resp.iter_lines():
    # Skip blank separator lines and the final [DONE] marker
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[6:])["choices"][0]["delta"].get("content", "")
        print(chunk, end="", flush=True)
Knowledge Base
Upload document
POST http://localhost:8000/api/v1/knowledge
Auth required
Upload a document for indexing. Processing happens in the background.
Request Body
file (File, required): PDF, TXT, MD, or DOCX. Max 20 MB.
curl -X POST http://localhost:8000/api/v1/knowledge -H "Authorization: Bearer $TOKEN" -F "file=@paper.pdf"
Semantic search
GET http://localhost:8000/api/v1/knowledge/search
Auth required
Search uploaded knowledge base documents.
Query Parameters
q (string, required): Search query
top_k (integer): Max results. Default 8
curl "http://localhost:8000/api/v1/knowledge/search?q=FDA+AI+approval" -H "Authorization: Bearer $TOKEN"
Conversations
Persistent chat history. Each conversation stores messages and optionally scopes to an episode.
/api/v1/conversations: List all conversations for the user
/api/v1/conversations: Create a new conversation
/api/v1/conversations/{id}: Get conversation
/api/v1/conversations/{id}: Update title
/api/v1/conversations/{id}: Delete conversation and messages
/api/v1/conversations/{id}/messages: List messages
/api/v1/conversations/{id}/messages: Add a message
Errors & Limits
HTTP status codes
200 OK
201 Created
204 No Content
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not Found
409 Conflict
413 Payload Too Large
422 Unprocessable Entity
429 Too Many Requests
500 Internal Server Error
Rate limits
Rate limiting is applied per IP via slowapi. LLM calls are governed by your provider's limits (Groq free tier: ~30 req/min). The pipeline uses exponential backoff with 2 retries per node. Max file sizes: 20 MB per document, 50 MB per transcript.
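A client that hits a 429 can apply the same idea the pipeline uses internally. The sketch below shows exponential backoff with 2 retries on the client side. It is not the pipeline's implementation: the RateLimited exception, the with_backoff name, and the 1-second base delay are assumptions for illustration. The sleep function is injectable so the demo runs without actually waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a caller raises when the API answers 429."""


def with_backoff(call, retries=2, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt and try again.

    With retries=2 the call is attempted at most three times.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries, surface the error
            sleep(base_delay * 2 ** attempt)


# Demo: a call that is rate limited twice, then succeeds.
attempts = {"count": 0}
delays = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


result = with_backoff(flaky_request, sleep=delays.append)
print(result, delays)
```

In real use, leave sleep at its default and raise RateLimited from the wrapped function whenever the response status is 429.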