Building an AI Podcast Generator
Fully Automated Daily Podcasts: From Topic Discovery to Published Episode
The Problem: Podcasting Is a Time Sink
I wanted to run a daily AI/tech news podcast. The traditional workflow looked like this:
- Wake up early, scan Hacker News, Reddit, TechCrunch for stories (45 min)
- Select and rank the top 5 most interesting stories (20 min)
- Write a 10-15 minute script with intro, transitions, outro (90 min)
- Record the episode in a quiet room with professional mic (30 min)
- Edit audio: remove mistakes, normalize levels, add music (45 min)
- Design episode artwork in Canva or Figma (20 min)
- Upload to hosting platform, write show notes (15 min)
- Update RSS feed, submit to Spotify/Apple (10 min)
That's 4+ hours per episode. For a daily podcast, that's 28+ hours per week—essentially a part-time job before you've even started your real work.
I needed full automation. Not just scheduling—the entire content creation pipeline.
The Vision: One Button, Complete Episode
I built an AI Podcast Engine inside VibeBlaster that handles everything:
AI-Powered Topic Discovery
GPT-5 with web search discovers today's top AI/tech stories in real-time. No manual scanning required—the AI finds, ranks, and summarizes the most important news.
One-Shot Script Generation
Single AI call discovers topics AND writes a complete 10-15 minute podcast script. Natural conversational tone, proper pacing, sponsor placement markers included.
Voice Cloning with ElevenLabs
Clone your voice directly on ElevenLabs' website with just a 15-second audio clip. Their latest Eleven v3 model produces remarkably high-quality, natural-sounding speech.
AI-Generated Artwork
GPT-Image-1 creates unique episode artwork based on the day's topics. Master image style reference ensures brand consistency across all episodes.
The Tech Stack: Orchestrating AI Services
OpenAI GPT-5 with Web Search
The createChatCompletionWithWebSearch method enables real-time news discovery. GPT-5 searches the web for today's top AI/tech stories, ranks them by importance, and writes a complete podcast script—all in one API call. No need for separate Hacker News, Reddit, or RSS integrations.
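To make the one-shot call concrete, here is a rough sketch of what the prompt and expected response shape might look like. This is my approximation, not the actual prompt; buildOneShotPrompt and the OneShotResult interface are hypothetical names, though the three output fields come from the pipeline's Step 1 outputs (script_markdown, topic_summary, episode_name):

```typescript
// Hypothetical sketch of the one-shot prompt sent to GPT-5 with web search.
// The real prompt and response handling live in PodcastScriptService.
interface OneShotResult {
  script_markdown: string; // full 10-15 min script with sponsor placeholders
  topic_summary: string;   // short summary for show notes and artwork
  episode_name: string;    // generated episode title
}

function buildOneShotPrompt(date: string, storyCount = 5): string {
  return [
    `Search the web for today's (${date}) most important AI and tech news.`,
    `Rank the stories by importance and select the top ${storyCount}.`,
    `Then write a complete 10-15 minute podcast script in markdown with an`,
    `intro, transitions between stories, an outro, and the sponsor placeholders`,
    `[BEGINNING_SPONSORS] and [MIDPOINT_SPONSORS].`,
    `Respond as JSON with keys: script_markdown, topic_summary, episode_name.`,
  ].join('\n');
}
```

Collapsing discovery, ranking, and script writing into one call is what makes the single-API-call claim work: the model's web search results never leave the context window, so no intermediate storage or second call is needed.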
ElevenLabs Voice Cloning
Voice cloning is done directly on the ElevenLabs website—just upload a 15-second audio clip and their Instant Voice Cloning creates a synthetic voice. The app's ElevenLabsService then uses the cloned voice ID for text-to-speech generation with 5000-character chunking and audio concatenation via ffmpeg. Using their latest Eleven v3 model produces broadcast-quality audio with natural intonation.
DigitalOcean Spaces for Media Storage
S3-compatible object storage hosts audio files, artwork, and RSS feeds. Public URLs are generated automatically with CDN distribution. The DigitalOceanSpacesService handles uploads, metadata, and URL generation.
ffmpeg for Audio Processing
Scripts longer than 5000 characters get chunked for ElevenLabs API limits. fluent-ffmpeg concatenates audio chunks into a single episode file, calculates duration via ffprobe, and normalizes levels. Checkpoint recovery ensures interrupted workflows resume from the last completed chunk.
Kubernetes for Landing Page Deployment
The PodcastInfrastructureService deploys a branded landing page to Kubernetes with one click. Nginx serves static HTML with RSS feed proxy, Let's Encrypt handles SSL via cert-manager, and the site auto-updates when new episodes publish.
The Architecture: A 4-Step Automated Pipeline
The original plan had 9 steps with 4 AI calls and 3 external APIs. I simplified it to 4 steps with 2 AI calls:
┌─────────────────────────────────────────────────────────────┐
│ Step 1: One-Shot Script Generation (GPT-5 + Web Search) │
│ ───────────────────────────────────────────────────────── │
│ • GPT-5 searches web for today's AI/tech news │
│ • Ranks and selects top 5 stories │
│ • Writes complete 10-15 min podcast script │
│ • Includes intro, transitions, sponsor placeholders │
│ • Output: script_markdown, topic_summary, episode_name │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 2: Artwork Generation (GPT-Image-1) │
│ ───────────────────────────────────────────────────────── │
│ • Extract main topic from script │
│ • Use master image as style reference (if available) │
│ • Generate 1400x1400 episode artwork │
│ • Upload to DigitalOcean Spaces │
│ • Output: artwork_url │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 3: Audio Generation (ElevenLabs TTS) │
│ ───────────────────────────────────────────────────────── │
│ • Inject sponsor ad scripts at placeholders │
│ • Chunk script at sentence boundaries (5000 char limit) │
│ • Generate TTS for each chunk with cloned voice │
│ • Concatenate chunks with ffmpeg │
│ • Calculate duration, upload to Spaces │
│ • Output: audio_url, duration_seconds │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Step 4: Publish & Deploy │
│ ───────────────────────────────────────────────────────── │
│ • Generate RSS 2.0 feed with iTunes tags │
│ • Upload feed.xml to Spaces │
│ • Deploy/update landing page on Kubernetes │
│ • (Optional) Submit to Spotify via API │
│ • Output: rss_url, site_url │
└─────────────────────────────────────────────────────────────┘
This simplified workflow cut the steps by more than half (9 to 4) and the AI calls by 50% (4 to 2). The entire pipeline runs in under 10 minutes.
The Service Layer: 8 Specialized Modules
Each service handles a single responsibility with proper error handling and IPC response patterns:
PodcastService
Episode CRUD, config management, status tracking, pagination
PodcastScriptService
One-shot script generation, cleanup, sponsor injection, artwork generation
PodcastAudioService
Script chunking, TTS generation, ffmpeg concatenation, duration calculation
ElevenLabsService
Voice cloning, TTS API, voice management, connection testing
DigitalOceanSpacesService
S3-compatible uploads, public URL generation, file management
PodcastRSSService
RSS 2.0 generation, iTunes tags, episode enclosures, feed upload
PodcastPublishService
Multi-platform publishing, Spotify API, publish log tracking
PodcastInfrastructureService
Kubernetes deployment, landing page generation, kubectl orchestration
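To make one of these concrete: the item generation inside PodcastRSSService might look roughly like this. The element names follow the RSS 2.0 and Apple Podcasts specs, but the Episode shape is a simplified stand-in for the app's real schema:

```typescript
// Minimal sketch of one RSS 2.0 <item> with iTunes tags.
// NOTE: real feed generation must XML-escape title/description text.
interface Episode {
  title: string;
  description: string;
  audioUrl: string;
  audioBytes: number;      // enclosure length is in bytes
  durationSeconds: number;
  pubDate: Date;
}

function episodeToRssItem(ep: Episode): string {
  const mins = Math.floor(ep.durationSeconds / 60);
  const secs = String(ep.durationSeconds % 60).padStart(2, '0');
  return [
    '<item>',
    `  <title>${ep.title}</title>`,
    `  <description>${ep.description}</description>`,
    `  <enclosure url="${ep.audioUrl}" length="${ep.audioBytes}" type="audio/mpeg"/>`,
    `  <pubDate>${ep.pubDate.toUTCString()}</pubDate>`,
    `  <itunes:duration>${mins}:${secs}</itunes:duration>`,
    '</item>',
  ].join('\n');
}
```

The enclosure tag is what podcast apps actually download; the itunes:duration tag is why the audio step has to calculate duration via ffprobe before publishing.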
Voice Cloning: Your Voice, Automated
The magic of this system is voice cloning. ElevenLabs' Instant Voice Cloning on their website requires just a 15-second audio clip to create a synthetic voice that sounds remarkably like you. The cloning process happens entirely on ElevenLabs' platform—you upload your sample, they process it, and you get a voice ID to use in the app.
Note on Accent Accuracy
Instant Voice Cloning can sometimes introduce subtle accent variations—I noticed a slight British tinge in my cloned voice. This is a known characteristic of IVC with unique voices. For more accurate accent replication, ElevenLabs offers Professional Voice Cloning (PVC) which requires 30+ minutes of audio but produces more faithful results. I'm continuing to tweak the settings to dial in the perfect voice.
Once you have your cloned voice ID from ElevenLabs, the app handles the rest:
// Using the cloned voice for TTS
async generateSpeech(text: string, voiceId: string, options: TTSOptions) {
  // voiceId comes from the ElevenLabs website after cloning
  const response = await this.makeRequest<ArrayBuffer>(
    'POST',
    `/text-to-speech/${voiceId}`,
    {
      text,
      model_id: 'eleven_v3', // Latest model for highest quality
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.75,
      }
    }
  );
  return Buffer.from(response.data);
}

TTS Generation with Chunking
ElevenLabs has a 5000-character limit per API call. Scripts are chunked at sentence boundaries to maintain natural flow:
// Script chunking and TTS generation
async generateAudio(episodeId: string, onProgress?: (progress: number) => void) {
  const script = await this.getScriptWithSponsors(episodeId);
  const chunks = this.splitScriptIntoChunks(script, 5000);
  const audioBuffers: Buffer[] = [];
  // Cloned voice ID comes from the episode's voice config
  const voiceId = await this.getVoiceId(episodeId);
  for (let i = 0; i < chunks.length; i++) {
    const buffer = await elevenlabsService.generateSpeech(
      chunks[i], voiceId, { model: 'eleven_v3' }
    );
    audioBuffers.push(buffer);
    onProgress?.((i + 1) / chunks.length * 100);
    // Checkpoint for recovery
    await this.saveCheckpoint(episodeId, i);
  }
  // Concatenate with ffmpeg
  const finalAudio = await this.concatenateAudio(audioBuffers);
  const duration = await this.calculateDuration(finalAudio);
  return { audioBuffer: finalAudio, duration };
}

Checkpoint recovery ensures interrupted workflows resume from the last completed chunk—no wasted API calls or regenerating already-processed audio.
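The splitScriptIntoChunks helper referenced above isn't shown in full, but sentence-boundary chunking can be sketched like this. This is my approximation of the idea, not the exact implementation:

```typescript
// Split text into chunks of at most maxLen characters, breaking only at
// sentence boundaries so each TTS request gets natural-sounding input.
// (A single sentence longer than maxLen would still exceed the limit;
// the real service would need a fallback split for that edge case.)
function splitScriptIntoChunks(script: string, maxLen = 5000): string[] {
  // Naive sentence splitter: keep the terminator with its sentence.
  const sentences = script.match(/[^.!?]+[.!?]+(\s+|$)|[^.!?]+$/g) ?? [script];
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current.length + sentence.length > maxLen && current.length > 0) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Breaking at sentence boundaries rather than at exactly 5000 characters matters for audio quality: a mid-sentence cut would produce an audible glitch in intonation where two TTS chunks meet.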
The Sponsor System: Monetization Built-In
Scripts include placeholder markers for sponsor reads. The system supports multiple sponsors with different placements:
- [BEGINNING_SPONSORS]: After the intro, before main content
- [MIDPOINT_SPONSORS]: Between story segments
During audio generation, the injectSponsorsForAudio method replaces placeholders with actual ad scripts:
// Sponsor injection with voice transitions
async injectSponsorsForAudio(projectId: string, script: string) {
  const beginningSponsors = await sponsorService.getSponsorsForLocation(
    projectId, 'beginning'
  );
  const midpointSponsors = await sponsorService.getSponsorsForLocation(
    projectId, 'midpoint'
  );
  // Get voice config for sponsor transitions
  const { sponsor_beginning_intro, sponsor_beginning_outro } = voiceConfig;
  // Build sponsor block with transitions
  const beginningBlock = [
    sponsor_beginning_intro, // "This episode is brought to you by..."
    ...beginningSponsors.map(s => `**Sponsor[${s.sponsor_name}]:** ${s.ad_script_text}`),
    sponsor_beginning_outro // "Now, back to the show..."
  ].join('\n\n');
  // The midpoint block is built the same way from midpointSponsors
  // and replaces [MIDPOINT_SPONSORS] (omitted here for brevity)
  return script.replace('[BEGINNING_SPONSORS]', beginningBlock);
}

Sponsors can have different voices—the audio service detects **Sponsor[Name]:** markers and switches to the sponsor's configured voice for their ad read.
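Detecting those markers and switching voices can be sketched as a simple segment parser. This is a simplified stand-in for the real logic, with hypothetical names (splitByVoice, ScriptSegment), and it assumes each sponsor read fits on one line:

```typescript
interface ScriptSegment {
  voice: string; // 'host' or a sponsor name taken from a Sponsor[...] marker
  text: string;
}

// Split a script into per-voice segments: lines starting with a
// **Sponsor[Name]:** marker are read in that sponsor's voice,
// everything else is read by the main host voice.
function splitByVoice(script: string): ScriptSegment[] {
  const segments: ScriptSegment[] = [];
  for (const line of script.split('\n')) {
    const m = line.match(/^\*\*Sponsor\[(.+?)\]:\*\*\s*(.*)$/);
    if (m) {
      segments.push({ voice: m[1], text: m[2] });
    } else if (line.trim().length > 0) {
      segments.push({ voice: 'host', text: line });
    }
  }
  return segments;
}
```

Each segment then goes to TTS with the voice ID mapped from its voice label, and the resulting buffers are concatenated in order, the same ffmpeg path the chunked host audio already uses.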
Personal Touch: Kids as Sponsors
I cloned my kids' voices on ElevenLabs to read the sponsor segments. It adds a fun, personal touch to the podcast and makes the ad reads more memorable. The multi-voice system makes it easy to switch between the main host voice and sponsor voices seamlessly.
Kubernetes Landing Page: One-Click Deployment
Every podcast needs a landing page. The PodcastInfrastructureService deploys a branded site to Kubernetes:
// Kubernetes deployment workflow
// (project, podcastTitle, rssUrl, brandColors, platformLinksHtml, and
// tempManifestFile are loaded/derived earlier in the real method)
async deployLandingPage(projectId: string, domain: string) {
  // 1. Validate kubectl access
  await this.runKubectl('version --client');

  // 2. Generate namespace from domain
  const namespace = this.generateNamespace(domain); // e.g. podcast-example-com

  // 3. Initialize RSS feed if it doesn't exist yet
  if (!project.rss_url) {
    await podcastRSSService.generateRSS(projectId);
  }

  // 4. Load and customize deployment template
  let manifest = readFileSync('templates/podcast-deployment.yaml', 'utf-8');
  manifest = manifest
    .replace(/{{NAMESPACE}}/g, namespace)
    .replace(/{{DOMAIN}}/g, domain)
    .replace(/{{PODCAST_TITLE}}/g, podcastTitle)
    .replace(/{{RSS_URL}}/g, rssUrl)
    .replace(/{{PRIMARY_COLOR}}/g, brandColors[0])
    .replace(/{{PLATFORM_LINKS}}/g, platformLinksHtml);

  // 5. Apply to cluster
  await this.runKubectl(`apply -f ${tempManifestFile}`);

  // 6. Wait for deployment ready
  await this.waitForDeployment(namespace, 'podcast-landing', 60);

  return { siteUrl: `https://${domain}`, namespace };
}

The deployment template includes:
- Nginx serving static HTML with RSS feed proxy
- ConfigMap for HTML content with brand colors
- Ingress with Let's Encrypt SSL via cert-manager
- Episode grid that loads dynamically from RSS
- Audio player for in-browser listening
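The generateNamespace helper used above can be approximated like this. It's a sketch under one assumption worth stating: Kubernetes namespaces must be valid DNS labels (lowercase alphanumerics and dashes, at most 63 characters), so dots and other separators in the domain become dashes:

```typescript
// Derive a Kubernetes namespace from a domain, e.g.
// "example.com" -> "podcast-example-com". Namespaces must be valid
// DNS labels: lowercase alphanumerics and dashes, max 63 characters.
function generateNamespace(domain: string): string {
  const slug = domain
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // dots and other separators -> dashes
    .replace(/^-+|-+$/g, '');    // trim leading/trailing dashes
  return `podcast-${slug}`.slice(0, 63);
}
```

Deriving the namespace deterministically from the domain also makes redeploys idempotent: kubectl apply updates the existing namespace instead of creating a duplicate.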
Daily Automation: Set It and Forget It
The PodcastSchedulerService uses node-cron to run the workflow automatically:
// Daily automation workflow
import cron from 'node-cron';

class PodcastSchedulerService {
  private jobs: Map<string, cron.ScheduledTask> = new Map();

  startAutoGeneration(projectId: string, time: string = '06:00') {
    const [hour, minute] = time.split(':');
    const cronExpression = `${minute} ${hour} * * *`; // Daily at specified time

    const job = cron.schedule(cronExpression, async () => {
      console.log(`[PodcastScheduler] Running daily workflow for ${projectId}`);
      // Create today's episode record via PodcastService (episode CRUD)
      const today = new Date().toISOString().split('T')[0];
      const episodeId = await podcastService.createEpisode(projectId, today);

      // Step 1: Generate script with web search
      const scriptResult = await podcastScriptService.generateScriptOneShot(
        episodeId, today
      );

      // Step 2: Generate artwork
      await podcastScriptService.generateEpisodeArtwork(episodeId);

      // Step 3: Generate audio (after review or auto-approve)
      // autoApprove comes from the project's config
      if (autoApprove) {
        await podcastAudioService.generateAudio(episodeId);
        await podcastPublishService.publishEpisode(episodeId, ['rss']);
      } else {
        // Send notification for manual review
        this.sendReviewNotification(episodeId);
      }
    });
    this.jobs.set(projectId, job);
  }
}

The default workflow pauses after script/artwork generation for human review. Once approved, audio generation and publishing happen with one click.
The Challenges and Trade-Offs
1. TTS Character Limits
ElevenLabs limits requests to 5000 characters. A 15-minute script is 2000-2500 words (~12,000-15,000 characters). The solution: intelligent chunking at sentence boundaries with checkpoint recovery for interrupted workflows.
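Under the hood, checkpoint recovery is just persisting the index of the last finished chunk. Here's a minimal sketch; the Map is an in-memory stand-in for the SQLite checkpoint table, and generateWithRecovery is a hypothetical name (the real service would also reload already-generated audio from disk on resume):

```typescript
// In-memory stand-in for the SQLite checkpoint table: the real service
// persists the last completed chunk index per episode.
const checkpoints = new Map<string, number>();

async function generateWithRecovery(
  episodeId: string,
  chunks: string[],
  tts: (text: string) => Promise<Buffer>,
): Promise<Buffer[]> {
  const buffers: Buffer[] = [];
  // Resume after the last completed chunk (-1 means start from scratch).
  const start = (checkpoints.get(episodeId) ?? -1) + 1;
  for (let i = start; i < chunks.length; i++) {
    buffers.push(await tts(chunks[i]));
    checkpoints.set(episodeId, i); // checkpoint after each successful chunk
  }
  return buffers;
}
```

Because ElevenLabs bills per character, checkpointing after every chunk rather than every episode means a crash at chunk 2 of 3 wastes zero paid characters on retry.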
2. Audio Quality Consistency
Concatenating multiple TTS chunks can create audible seams. Using the latest eleven_v3 model with consistent voice settings minimizes this. ffmpeg normalization ensures uniform volume across chunks.
3. Script TTS Optimization
Markdown headers like ## Story 1 get read as "hashtag hashtag Story 1". The cleanup pass removes markdown formatting and adds natural pauses using ellipses, em dashes, and paragraph breaks.
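That cleanup pass boils down to a handful of regex substitutions. A simplified sketch (cleanScriptForTTS is a hypothetical name; the real pass also inserts ellipses and em dashes for pacing):

```typescript
// Strip markdown that TTS would read aloud literally, and collapse
// structural noise so the voice model gets clean prose.
function cleanScriptForTTS(markdown: string): string {
  return markdown
    .replace(/^#{1,6}\s*/gm, '')          // "## Story 1" -> "Story 1"
    .replace(/\*\*(.+?)\*\*/g, '$1')      // drop bold markers
    .replace(/\*(.+?)\*/g, '$1')          // drop italic markers
    .replace(/^[-*]\s+/gm, '')            // drop list bullets
    .replace(/\[(.+?)\]\((.+?)\)/g, '$1') // "[text](url)" -> "text"
    .replace(/\n{3,}/g, '\n\n');          // collapse extra blank lines
}
```

Note the ordering constraint this implies for the real pipeline: sponsor injection and voice-marker detection have to happen before this pass, since it would strip the ** around **Sponsor[Name]:** markers.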
4. Real-Time News Accuracy
GPT-5's web search finds current news, but AI can hallucinate details. The script includes a review step where hosts can fact-check and edit before audio generation. The system preserves intro/outro exactly as configured.
5. Cost Management
ElevenLabs charges per character. A 15-minute episode costs ~$0.50-1.00 on the Creator plan. GPT-5 with web search adds ~$0.10-0.20 per script. Total cost per episode: under $2—far cheaper than professional voice talent.
The Results: 4 Hours to 10 Minutes
Before AI Podcast Engine
- Topic research: 45 minutes
- Script writing: 90 minutes
- Recording: 30 minutes
- Audio editing: 45 minutes
- Artwork design: 20 minutes
- Publishing: 25 minutes
- Total: 4+ hours per episode
After AI Podcast Engine
- Topic research: 0 minutes (AI web search)
- Script writing: 0 minutes (AI generation)
- Recording: 0 minutes (voice clone TTS)
- Audio editing: 0 minutes (auto-normalized)
- Artwork design: 0 minutes (AI generation)
- Review & publish: 10 minutes
- Total: 10 minutes per episode
That's a 96% time reduction. For a daily podcast, that's nearly 27 hours saved per week.
Cost Analysis: Surprisingly Affordable
Per Episode Costs
Total per episode: ~$1.13
Monthly Infrastructure
Total infrastructure: ~$49/month for a daily podcast
My AI-Generated Podcasts
Here are the podcasts I'm currently running with this AI Podcast Engine. More coming soon as I refine the workflow!

AI News in 10
Daily AI and tech news podcast in about 10 minutes. Curated and generated entirely by AI using my and my kids' cloned voices.

Right Versus Left News
A daily podcast that looks at top news of the last 24 hours, gives a brief description of each story, then presents what the right is saying and what the left is saying about it.
More Podcasts Coming Soon
Once I nail down the workflow and fine-tune the voice cloning, I'll be launching additional shows on different topics.
What's Next: Expanding the Podcast Empire
The first podcast is live and running. Now I'm focused on:
- Dialing in the voice: Still tweaking the ElevenLabs settings to eliminate that subtle British accent. May experiment with Professional Voice Cloning for more accurate results.
- Launching a second podcast: Once the workflow is bulletproof, I plan to spin up another show on a different topic—the infrastructure supports unlimited podcasts.
- Improving sponsor integration: The kids' voices are a hit for sponsor reads; I'm considering adding more voice variety.
Conclusion: The Future of Content Creation
The AI Podcast Engine proves that fully automated content creation is possible today. Not just scheduling or distribution—the entire creative pipeline from topic discovery to published episode.
The key insights: simplify ruthlessly (9 steps became 4), build for recovery (checkpoints save expensive API calls), and keep humans in the loop (review before publish catches AI mistakes).
Voice cloning technology has reached a quality threshold where synthetic voices are nearly indistinguishable from recordings. ElevenLabs' Eleven v3 model, combined with GPT-5's real-time web search, creates possibilities that were science fiction two years ago.
The podcast that would have consumed 28 hours of my week now takes 70 minutes—and most of that is optional review. That's the power of thoughtful automation.
Complete Tech Stack
AI/ML: OpenAI GPT-5 (web search, script generation), GPT-Image-1 (artwork), ElevenLabs Eleven v3 (TTS with cloned voices)
Audio: fluent-ffmpeg, ffprobe (concatenation, duration, normalization)
Storage: DigitalOcean Spaces (S3-compatible), AWS SDK v2
Database: SQLite with better-sqlite3, 4 podcast tables
Scheduling: node-cron for daily automation
Infrastructure: Kubernetes (DigitalOcean), Nginx, cert-manager (Let's Encrypt)
Frontend: React, TypeScript, shadcn/ui, Tailwind CSS
Desktop: Electron with IPC handlers for main/renderer communication
Want to Build AI-Powered Automation?
If you're looking for an engineer who can orchestrate AI services, build content automation pipelines, and deploy production systems, let's talk.