Podifact Docs

Complete reference for the Podifact platform — from using the dashboard to integrating with the API.

Overview

Podifact is an AI-powered podcast intelligence platform. Upload a transcript or provide a YouTube URL and the system runs a multi-agent pipeline — extracting key notes, detecting factual claims, verifying them against live web search and your knowledge base, and optionally generating social content.

Dashboard sections

Process: Submit transcripts for pipeline processing. Supports JSON, TXT, and YouTube URLs.

Episodes: Browse your processed episode library, filter by source, sort, and run semantic search.

Chat: AI chat grounded in your episode content. Scope conversations to individual episodes.

Jobs: Monitor pipeline run history, including status, progress, processing stats, and error details.

Knowledge: Upload reference documents (PDF, DOCX, TXT, Markdown) to enrich fact-checking and chat.

Settings: Manage your profile, change your password, and create personal API keys.

API Docs: In-dashboard API reference with live code examples.

Pipeline overview

Ingest

Transcript parsed, validated, and split into sections. Embeddings generated and stored in the vector database.

Summarise

LLM extracts key themes, notable quotes, takeaways, and produces a concise summary.

Fact-check

Claims detected and classified. Each claim is verified with web search and RAG retrieval.

Social

Optional: tweet thread, headline, and hashtag set generated from the verified content.

Store

Full EpisodeReport persisted to PostgreSQL. Episode available in your library immediately.

Processing Transcripts

Navigate to Dashboard → Process to submit a transcript. Two file formats are supported.

JSON format (structured)

Preferred format. Enables speaker attribution, timestamps, and per-section chunking.

{
  "episode_id": "ep001",
  "title":      "AI in Healthcare: Hype or Reality?",
  "host":       "Alex Chen",
  "guests":     ["Dr. Priya Sharma"],
  "transcript": [
    {
      "timestamp": "00:00",
      "speaker":   "Alex Chen",
      "section":   "Introduction",
      "text":      "Welcome back to the show..."
    }
  ]
}
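Checking a transcript's shape locally before uploading can save a failed pipeline run. The sketch below is a minimal example of such a check. Which fields the server actually requires is not stated here, so the `REQUIRED_TOP` and `REQUIRED_ENTRY` tuples and the `validate_transcript` helper are assumptions for illustration only.

```python
# Assumed minimum fields; the server's real validation rules may differ.
REQUIRED_TOP = ("episode_id", "title", "transcript")
REQUIRED_ENTRY = ("speaker", "text")


def validate_transcript(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for key in REQUIRED_TOP:
        if key not in doc:
            problems.append(f"missing top-level field: {key}")

    entries = doc.get("transcript", [])
    if not isinstance(entries, list) or not entries:
        problems.append("transcript must be a non-empty list")
        return problems

    # Every transcript entry should at least say who spoke and what was said.
    for i, entry in enumerate(entries):
        for key in REQUIRED_ENTRY:
            if not isinstance(entry, dict) or key not in entry:
                problems.append(f"transcript[{i}] missing field: {key}")
    return problems
```

Load the file with `json.load` and pass the result to `validate_transcript`. Submit only when the returned list is empty.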

Plain text format

TXT files are accepted. Speaker detection and sectioning are inferred by the LLM, so results are less accurate than with JSON.

Processing options

Fact check: Run claim detection and web verification. Enabled by default. Disable for summarisation-only runs.

Generate social: Produce a tweet thread, headline, and hashtags from verified content.

Chunk strategy: "section" splits by transcript sections (best for structured JSON). "fixed" and "semantic" are alternatives for plain text.
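To give a feel for what the "section" strategy does with structured JSON, here is a rough sketch that merges consecutive transcript entries sharing a section label into one chunk. It is not Podifact's actual implementation. The `chunk_by_section` helper and the output shape are hypothetical.

```python
from itertools import groupby


def chunk_by_section(entries):
    """Merge consecutive transcript entries that share a section label."""
    chunks = []
    # groupby only merges neighbours, so a section that reappears later
    # in the episode produces a second chunk instead of being reordered.
    for section, group in groupby(entries, key=lambda e: e.get("section", "")):
        lines = [f'{e["speaker"]}: {e["text"]}' for e in group]
        chunks.append({"section": section, "text": "\n".join(lines)})
    return chunks
```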

Tip: After submission the page redirects to the live job view. You can leave and check Jobs later. Results are persisted automatically once the pipeline completes.

Episode Library

Every processed transcript appears in Dashboard → Episodes as a stored episode with full metadata.

Library tab

Text search: Filter cards by title or filename in real time.

Source filter: Show only uploads or only YouTube-sourced episodes.

Sort: Newest, oldest, title A–Z, or most claims.

Episode card actions

View report: Open the full EpisodeReport (summary, key notes, fact-check results, agent log, social content).

Chat →: Opens a new episode-scoped conversation in the Chat page. The AI retrieves context only from this episode.

Copy ID: Copies the episode_id to the clipboard, which is useful for API calls.

Delete: Removes the episode from the library and purges its vectors from ChromaDB.

Semantic Search tab

Search by meaning across all indexed episodes. Results are ranked by cosine similarity and include the matching transcript chunk, episode reference, and section label. Useful for finding specific claims or topics across your entire library.
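Cosine similarity scores two embedding vectors by the angle between them: 1.0 means the same direction, 0.0 means unrelated. The sketch below ranks chunks against a query vector this way. The `embedding` key and both function names are illustrative assumptions and do not describe the platform's internal schema.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, top_k=5):
    """Return the top_k (score, chunk) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```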

Fact-Check Results

Every claim extracted from a transcript receives a verification result with a status, confidence score, and evidence trail.

Claim statuses

Verified: Evidence found through web search or the knowledge base supports the claim.

Possibly outdated: The claim was accurate at one time, but newer sources suggest it may have changed.

Unverifiable: No search results or RAG context could confirm or deny the claim (for example, personal anecdotes or niche claims).

Confidence score

A 0–100% score representing the LLM's certainty in its verdict given the available evidence. A verified status with high confidence (≥75%) is a strong result. A verified status with low confidence should be treated with caution.
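The reading guidance above can be sketched as a small triage helper. The 75% threshold comes from the text. The lowercase status string `"verified"`, the returned labels, and the `triage` function itself are assumptions made for illustration.

```python
HIGH_CONFIDENCE = 75  # percent, the threshold quoted above


def triage(status: str, confidence: float) -> str:
    """Map a claim's status and confidence score to a reviewer action."""
    if status == "verified" and confidence >= HIGH_CONFIDENCE:
        return "strong"       # supported by evidence, model is confident
    if status == "verified":
        return "review"       # supported, but worth a manual look
    return "unconfirmed"      # possibly outdated or unverifiable
```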

Source basis

web: Verified using live search results. Real URLs are attached. Most authoritative.

web+rag: Both web search and transcript context used. Strong corroboration.

rag: Verified against another part of the same transcript (cross-reference).

knowledge: Verified against an uploaded knowledge base document.

none: No external evidence retrieved. The verdict is based on model knowledge only.

Warning: Sources listed under a claim (LLM-cited labels) may differ from the Web Sources section (real URLs from the search engine). Always check Web Sources for links you can visit. Those are real search results, not generated text.

AI Chat

Navigate to Dashboard → Chat for RAG-enhanced AI conversation. Two modes: general (searches all your episodes) and episode-scoped (retrieves context only from one episode).

Starting a conversation

Click + New chat in the sidebar, or navigate from an episode card using the chat icon — the latter auto-creates a conversation scoped to that episode.

Tip: When a conversation is scoped to an episode, a green banner shows at the top of the chat: "Scoped to [episode name] — RAG scoped". The AI retrieves context exclusively from that episode's transcript.

Suggested prompts

When starting a scoped conversation, prompt suggestions are generated from the episode title (e.g. "What were the key verified claims in…"). For general chats, suggestions cover your full library.

Conversation management

Sidebar groups: Conversations grouped into Today, Yesterday, This week, Older.

Episode label: Scoped conversations show a green episode title label below the conversation name.

Delete: Hover a conversation row to reveal the trash icon. Deletes from backend immediately.

Knowledge Base

Upload reference documents in Dashboard → Knowledge. Once indexed, documents are automatically retrieved during fact-checking and AI chat.

Supported formats

PDF, DOCX, TXT, Markdown

Processing lifecycle

processing: Document uploaded, chunking and embedding in progress.

ready: Indexed and available for semantic retrieval.

error: Parsing or indexing failed. The error detail is shown on hover.

Note: Max file size is 20 MB. Documents are chunked with overlap before embedding, and the chunk count is shown on each document card. Semantic search is available directly from the Knowledge page.
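Chunking with overlap means consecutive chunks share some text, so a sentence that straddles a boundary still appears whole in at least one chunk. The sketch below shows the idea with character windows. The `size` and `overlap` defaults and the `chunk_with_overlap` name are assumptions. The platform's real chunk size and unit (characters or tokens) are not documented here.

```python
def chunk_with_overlap(text, size=500, overlap=50):
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```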

Jobs & History

Dashboard → Jobs shows the complete history of pipeline runs for your account with per-run stats.

Job statuses

queued: Job received, waiting for a worker slot.

running: Pipeline active. Progress bar and node name update in real time (5s poll).

completed: Report generated and episode stored. Click the external-link icon to view the full report.

failed: Pipeline error. Error message shown in the row. Click the retry icon to re-process from the Process page.

Stats columns

Tokens: Total LLM tokens consumed across all pipeline nodes.

Time: Wall-clock processing time in seconds.

Model: Primary LLM used for this run.

Settings & API Keys

Dashboard → Settings has three tabs: Profile, API Keys, and About.

Profile

Update username and email independently. Username must be 3+ characters, alphanumeric with underscores/hyphens. Both fields are pre-populated from your account on load.

Password change

Requires your current password. New password must be 8+ characters. Separate from the profile save — uses its own submit button.
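The username and password rules above are easy to check on the client before submitting the form. This is a minimal sketch of those two rules as stated. The helper names are hypothetical, ASCII-only alphanumerics are assumed, and the server may enforce additional constraints.

```python
import re

# 3+ characters: letters, digits, underscores, hyphens (ASCII assumed).
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{3,}")


def is_valid_username(name: str) -> bool:
    """True when the whole string matches the username rule."""
    return USERNAME_RE.fullmatch(name) is not None


def is_valid_new_password(password: str) -> bool:
    """True when the password meets the 8-character minimum."""
    return len(password) >= 8
```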

API Keys

Generate long-lived personal API keys for use in scripts and integrations. Keys are prefixed pa_live_ and grant the same access as your session token.

Warning: The full key value is shown only once at creation time. Copy and store it immediately, because it cannot be retrieved later. You can create unlimited keys and revoke any key from the list.

API Reference

Authentication

All protected endpoints require a Bearer token in the Authorization header. Two token types are accepted.

JWT tokens (short-lived, 30 min)

POST http://localhost:8000/api/v1/auth/login

Authenticate and receive access + refresh tokens.

Request Body

username (string, required): Account username

password (string, required): Account password

curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"alice","password":"password123"}'

API keys (long-lived, pa_live_...)

Generate personal API keys in Settings → API Keys. Use anywhere a JWT is accepted.

curl -H "Authorization: Bearer pa_live_your_key_here" \
  http://localhost:8000/api/v1/auth/me

Quick Start

5 min

Submit a transcript, poll until complete, read the report.

import requests, time

BASE   = "http://localhost:8000"
HEADS  = {"Authorization": "Bearer pa_live_your_key"}

# 1 — Submit
with open("episode.json", "rb") as f:
    job = requests.post(f"{BASE}/api/v1/process",
        headers=HEADS, files={"file": f},
        data={"fact_check": "true", "async_mode": "true"},
    ).json()
print("Job:", job["job_id"])

# 2 — Poll
while True:
    s = requests.get(f"{BASE}/api/v1/jobs/{job['job_id']}", headers=HEADS).json()
    print(s["status"], s["progress"])
    if s["status"] in ("completed", "failed"): break
    time.sleep(3)

# 3 — Report
if s["status"] == "completed":
    print(s["report"]["summary"]["text"][:200])
else:
    print("Job failed:", s)
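The polling loop above runs until the job reaches a terminal status. For unattended scripts it can help to bound the wait. The sketch below wraps the same loop with a timeout. `poll_until_done` and its parameters are hypothetical helpers and not part of the Podifact API. `fetch_status` is any callable that returns the job JSON.

```python
import time


def poll_until_done(fetch_status, interval=3.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job completes, fails, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)  # wait before asking again
```

With the Quick Start variables in scope, a call might look like `poll_until_done(lambda: requests.get(f"{BASE}/api/v1/jobs/{job['job_id']}", headers=HEADS).json())`.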

Processing

POST http://localhost:8000/api/v1/process (Auth required)

Submit a transcript for the full pipeline. Returns immediately with a job_id when async_mode=true.

Request Body

file (File, required): Transcript (.json or .txt)

fact_check (boolean): Run claim verification. Default: true

generate_social (boolean): Generate tweet thread + headline. Default: false

async_mode (boolean): Return job_id immediately. Default: false (blocks)

chunk_strategy (string): "section" | "fixed" | "semantic". Default: "section"

curl -X POST http://localhost:8000/api/v1/process \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@transcript.json" \
  -F "fact_check=true" \
  -F "async_mode=true"

Jobs & Streaming

List jobs

GET http://localhost:8000/api/v1/jobs (Auth required)

List pipeline jobs for the authenticated user.

Query Parameters

status (string): Filter: queued | running | completed | failed

limit (integer): Max results. Default 50, max 100

curl "http://localhost:8000/api/v1/jobs?status=completed" -H "Authorization: Bearer $TOKEN"

Live SSE stream

GET http://localhost:8000/api/v1/jobs/{job_id}/stream (Auth required)

Server-Sent Events stream. Closes automatically on completion or failure.

import json

import requests
import sseclient  # the events() API used below matches the sseclient-py package

# BASE and HEADS are defined in the Quick Start; job_id comes from the submit response.
resp   = requests.get(f"{BASE}/api/v1/jobs/{job_id}/stream", headers=HEADS, stream=True)
client = sseclient.SSEClient(resp)
for event in client.events():
    data = json.loads(event.data)
    if event.event == "pipeline_complete":
        print(data["title"])
        break
    print(f"[{event.event}] {data.get('progress', 0)}%")
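If you prefer not to add an SSE dependency, the wire format is simple enough to parse by hand: `event:` and `data:` fields accumulate until a blank line ends the event. The sketch below is a simplified parser for already-decoded lines. It ignores the `id:` and `retry:` fields, and `parse_sse` is a hypothetical helper.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []  # "message" is the default SSE event name
    for line in lines:
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
```

With `requests`, the input could be `resp.iter_lines(decode_unicode=True)` on a streamed response.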

Episodes

List episodes

GET http://localhost:8000/api/v1/episodes (Auth required)

List all stored episodes with metadata.

curl http://localhost:8000/api/v1/episodes -H "Authorization: Bearer $TOKEN"

Get full report

GET http://localhost:8000/api/v1/episodes/{episode_id} (Auth required)

Fetch the complete EpisodeReport including fact-check results, agent steps, and optional social content.

curl http://localhost:8000/api/v1/episodes/ep001 -H "Authorization: Bearer $TOKEN"

Delete episode

DELETE http://localhost:8000/api/v1/episodes/{episode_id} (Auth required)

Remove an episode and purge its vectors from the vector store.

curl -X DELETE http://localhost:8000/api/v1/episodes/ep001 -H "Authorization: Bearer $TOKEN"

Semantic Search

POST http://localhost:8000/api/v1/search (Auth required)

Vector search across all indexed episode transcripts. Optionally scope to one episode.

Request Body

query (string, required): Natural language query

episode_id (string): Scope to a single episode

top_k (integer): Result count. Default 5, max 20

curl -X POST http://localhost:8000/api/v1/search \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"FDA regulation AI","top_k":5}'

Chat

POST http://localhost:8000/api/v1/chat (Auth required)

RAG-enhanced streaming chat. Returns an OpenAI-compatible SSE stream.

Request Body

messages (Message[], required): [{role:"user"|"assistant", content:string}]

episode_id (string): Scope RAG to a single episode

import json

import requests

# BASE and HEADS are defined in the Quick Start.
resp = requests.post(f"{BASE}/api/v1/chat",
    json={"messages":[{"role":"user","content":"What FDA claims were verified?"}],
          "episode_id":"ep001"},
    headers=HEADS, stream=True)
for line in resp.iter_lines():
    # Each SSE line carries one JSON chunk; "[DONE]" marks the end of the stream.
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[6:])["choices"][0]["delta"].get("content","")
        print(chunk, end="", flush=True)
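When you want the full reply as one string instead of printing it piece by piece, the delta chunks can be collected first. This sketch assumes the stream follows the OpenAI-style chunk shape used above (`choices[0].delta.content`). `collect_chat_text` is a hypothetical helper that takes the raw byte lines, for example from `resp.iter_lines()`.

```python
import json


def collect_chat_text(lines):
    """Join the content deltas from an OpenAI-style SSE stream into one string."""
    parts = []
    for line in lines:
        if not line.startswith(b"data: "):
            continue  # skip blank separators and any non-data fields
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(payload).get("choices") or []
        if choices:
            # Some chunks (for example the first) may carry no content.
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)
```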

Knowledge Base

Upload document

POST http://localhost:8000/api/v1/knowledge (Auth required)

Upload a document for indexing. Processing happens in background.

Request Body

file (File, required): PDF, TXT, MD, or DOCX. Max 20 MB.

curl -X POST http://localhost:8000/api/v1/knowledge -H "Authorization: Bearer $TOKEN" -F "file=@paper.pdf"
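A quick local check of file type and size can avoid a rejected upload. The sketch below mirrors the limits listed above. It assumes 1 MB means 2**20 bytes and that the file type is judged by extension, neither of which is confirmed here. `check_upload` is a hypothetical helper.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".pdf", ".txt", ".md", ".docx"}
MAX_BYTES = 20 * 1024 * 1024  # 20 MB, assuming binary megabytes


def check_upload(path):
    """Return None when the file looks uploadable, else a short reason."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        return f"unsupported file type: {p.suffix or '(none)'}"
    if p.stat().st_size > MAX_BYTES:
        return "file exceeds the 20 MB limit"
    return None
```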

Semantic search

GET http://localhost:8000/api/v1/knowledge/search (Auth required)

Search uploaded knowledge base documents.

Query Parameters

q (string, required): Search query

top_k (integer): Max results. Default 8

curl "http://localhost:8000/api/v1/knowledge/search?q=FDA+AI+approval" -H "Authorization: Bearer $TOKEN"

Conversations

Persistent chat history. Each conversation stores messages and optionally scopes to an episode.

GET /api/v1/conversations - List all conversations for the user

POST /api/v1/conversations - Create a new conversation

GET /api/v1/conversations/{id} - Get conversation

PATCH /api/v1/conversations/{id} - Update title

DELETE /api/v1/conversations/{id} - Delete conversation and messages

GET /api/v1/conversations/{id}/messages - List messages

POST /api/v1/conversations/{id}/messages - Add a message

Errors & Limits

HTTP status codes

200 OK

201 Created

204 No Content

400 Bad Request

401 Unauthorized

403 Forbidden

404 Not Found

409 Conflict

413 Payload Too Large

422 Unprocessable Entity

429 Too Many Requests

500 Internal Server Error

Rate limits

Rate limiting is applied per IP via slowapi. LLM calls are governed by your provider's limits (Groq free tier: ~30 req/min). The pipeline uses exponential backoff with 2 retries per node. Max file sizes: 20 MB per document, 50 MB per transcript.
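Clients that hit a 429 can apply the same pattern the pipeline uses internally. Below is a minimal sketch of exponential backoff with 2 retries. The 1-second base delay, the doubling factor, and the `call_with_backoff` helper are assumptions. The pipeline's actual delays are not documented here.

```python
import time


def call_with_backoff(fn, retries=2, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait 1x, 2x, ... base_delay and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries, surface the last error
            sleep(base_delay * 2 ** attempt)  # 1.0s, then 2.0s by default
```

In real use, narrow the `except` clause to the errors worth retrying, such as a 429 or a connection error, so that bad requests fail fast.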