
Building an AI Podcast Generator

Fully Automated Daily Podcasts: From Topic Discovery to Published Episode

The Problem: Podcasting Is a Time Sink

I wanted to run a daily AI/tech news podcast. The traditional workflow looked like this:

  1. Wake up early, scan Hacker News, Reddit, TechCrunch for stories (45 min)
  2. Select and rank the top 5 most interesting stories (20 min)
  3. Write a 10-15 minute script with intro, transitions, outro (90 min)
  4. Record the episode in a quiet room with professional mic (30 min)
  5. Edit audio: remove mistakes, normalize levels, add music (45 min)
  6. Design episode artwork in Canva or Figma (20 min)
  7. Upload to hosting platform, write show notes (15 min)
  8. Update RSS feed, submit to Spotify/Apple (10 min)

That's 4+ hours per episode. For a daily podcast, that's 28+ hours per week—essentially a part-time job before you've even started your real work.

I needed full automation. Not just scheduling—the entire content creation pipeline.

The Vision: One Button, Complete Episode

I built an AI Podcast Engine inside VibeBlaster that handles everything:

AI-Powered Topic Discovery

GPT-5 with web search discovers today's top AI/tech stories in real-time. No manual scanning required—the AI finds, ranks, and summarizes the most important news.

One-Shot Script Generation

Single AI call discovers topics AND writes a complete 10-15 minute podcast script. Natural conversational tone, proper pacing, sponsor placement markers included.

Voice Cloning with ElevenLabs

Clone your voice directly on ElevenLabs' website with just a 15-second audio clip. Their latest Eleven v3 model produces remarkably high-quality, natural-sounding speech.

AI-Generated Artwork

GPT-Image-1 creates unique episode artwork based on the day's topics. Master image style reference ensures brand consistency across all episodes.

The Tech Stack: Orchestrating AI Services

OpenAI GPT-5 with Web Search

The createChatCompletionWithWebSearch method enables real-time news discovery. GPT-5 searches the web for today's top AI/tech stories, ranks them by importance, and writes a complete podcast script—all in one API call. No need for separate Hacker News, Reddit, or RSS integrations.
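
The internals of that call aren't shown here, so here's a hedged sketch of what the one-shot request and response handling might look like. The three output fields (`script_markdown`, `topic_summary`, `episode_name`) come from the pipeline spec; the `buildOneShotPrompt` and `parseOneShotResult` helpers are illustrative, not the app's actual code:

```typescript
// Sketch only: the real createChatCompletionWithWebSearch isn't shown.
interface OneShotResult {
  script_markdown: string;
  topic_summary: string;
  episode_name: string;
}

// Hypothetical prompt builder for the one-shot discover-and-write call.
function buildOneShotPrompt(date: string): string {
  return [
    `Search the web for today's (${date}) top AI/tech stories.`,
    'Rank them by importance and select the top 5.',
    'Write a complete 10-15 minute podcast script with intro, transitions,',
    'an outro, and [BEGINNING_SPONSORS] / [MIDPOINT_SPONSORS] placeholders.',
    'Respond as JSON: { "script_markdown": ..., "topic_summary": ..., "episode_name": ... }',
  ].join('\n');
}

// Validate that the model actually returned all three expected fields.
function parseOneShotResult(raw: string): OneShotResult {
  const parsed = JSON.parse(raw) as Partial<OneShotResult>;
  for (const field of ['script_markdown', 'topic_summary', 'episode_name'] as const) {
    if (typeof parsed[field] !== 'string') {
      throw new Error(`Missing field in model response: ${field}`);
    }
  }
  return parsed as OneShotResult;
}
```

Validating the structured output up front means a malformed model response fails fast instead of producing a half-broken episode downstream.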

ElevenLabs Voice Cloning

Voice cloning is done directly on the ElevenLabs website—just upload a 15-second audio clip and their Instant Voice Cloning creates a synthetic voice. The app's ElevenLabsService then uses the cloned voice ID for text-to-speech generation with 5000-character chunking and audio concatenation via ffmpeg. Using their latest Eleven v3 model produces broadcast-quality audio with natural intonation.

DigitalOcean Spaces for Media Storage

S3-compatible object storage hosts audio files, artwork, and RSS feeds. Public URLs are generated automatically with CDN distribution. The DigitalOceanSpacesService handles uploads, metadata, and URL generation.
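
The upload itself is standard S3 SDK usage; the part worth sketching is the public URL convention. This helper is an assumption about how a service like DigitalOceanSpacesService might build URLs, based on Spaces' documented `<bucket>.<region>[.cdn].digitaloceanspaces.com` hostname pattern:

```typescript
// Illustrative helper (not the app's actual code): build a public Spaces URL
// for an uploaded object, optionally via the CDN endpoint.
function spacesPublicUrl(bucket: string, region: string, key: string, cdn = true): string {
  const host = cdn
    ? `${bucket}.${region}.cdn.digitaloceanspaces.com`
    : `${bucket}.${region}.digitaloceanspaces.com`;
  // Encode each path segment so keys with spaces still form valid URLs.
  return `https://${host}/${key.split('/').map(encodeURIComponent).join('/')}`;
}
```

Serving audio through the CDN hostname matters for podcasts: directory apps hit the enclosure URL constantly, and edge caching keeps that cheap.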

ffmpeg for Audio Processing

Scripts longer than 5000 characters get chunked for ElevenLabs API limits. fluent-ffmpeg concatenates audio chunks into a single episode file, calculates duration via ffprobe, and normalizes levels. Checkpoint recovery ensures interrupted workflows resume from the last completed chunk.
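
One lossless way to join the chunks is ffmpeg's concat demuxer, which reads a text file listing the inputs. This helper is a sketch of building that list file, not the app's actual fluent-ffmpeg code:

```typescript
// Build the input list for ffmpeg's concat demuxer (sketch, assumes all
// chunks share the same codec so `-c copy` can join them without re-encoding).
function buildConcatList(chunkPaths: string[]): string {
  return chunkPaths
    // Single quotes inside a path must be escaped as '\'' for the demuxer.
    .map(p => `file '${p.replace(/'/g, `'\\''`)}'`)
    .join('\n');
}
// Then, roughly: ffmpeg -f concat -safe 0 -i list.txt -c copy episode.mp3
```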

Kubernetes for Landing Page Deployment

The PodcastInfrastructureService deploys a branded landing page to Kubernetes with one click. Nginx serves static HTML with RSS feed proxy, Let's Encrypt handles SSL via cert-manager, and the site auto-updates when new episodes publish.

The Architecture: A 4-Step Automated Pipeline

The original plan had 9 steps with 4 AI calls and 3 external APIs. I simplified it to 4 steps with 2 AI calls:

┌─────────────────────────────────────────────────────────────┐
│  Step 1: One-Shot Script Generation (GPT-5 + Web Search)    │
│  ─────────────────────────────────────────────────────────  │
│  • GPT-5 searches web for today's AI/tech news              │
│  • Ranks and selects top 5 stories                          │
│  • Writes complete 10-15 min podcast script                 │
│  • Includes intro, transitions, sponsor placeholders        │
│  • Output: script_markdown, topic_summary, episode_name     │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 2: Artwork Generation (GPT-Image-1)                   │
│  ─────────────────────────────────────────────────────────  │
│  • Extract main topic from script                           │
│  • Use master image as style reference (if available)       │
│  • Generate 1400x1400 episode artwork                       │
│  • Upload to DigitalOcean Spaces                            │
│  • Output: artwork_url                                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 3: Audio Generation (ElevenLabs TTS)                  │
│  ─────────────────────────────────────────────────────────  │
│  • Inject sponsor ad scripts at placeholders                │
│  • Chunk script at sentence boundaries (5000 char limit)    │
│  • Generate TTS for each chunk with cloned voice            │
│  • Concatenate chunks with ffmpeg                           │
│  • Calculate duration, upload to Spaces                     │
│  • Output: audio_url, duration_seconds                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  Step 4: Publish & Deploy                                   │
│  ─────────────────────────────────────────────────────────  │
│  • Generate RSS 2.0 feed with iTunes tags                   │
│  • Upload feed.xml to Spaces                                │
│  • Deploy/update landing page on Kubernetes                 │
│  • (Optional) Submit to Spotify via API                     │
│  • Output: rss_url, site_url                                │
└─────────────────────────────────────────────────────────────┘

This simplified workflow cut the pipeline from 9 steps to 4 (a 56% reduction) and halved the AI calls. The entire pipeline runs in under 10 minutes.
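
As a sketch (the real orchestrator isn't shown here), the four steps can be modeled as a resumable runner that skips already-completed steps and threads each step's outputs into the next:

```typescript
// Hypothetical pipeline runner: each step merges its outputs into a shared
// context, and a set of completed step names (e.g. loaded from the episode's
// status row) lets a failed run resume without redoing finished work.
type StepFn = (ctx: Record<string, unknown>) => Promise<Record<string, unknown>>;

async function runPipeline(
  steps: Array<{ name: string; run: StepFn }>,
  completed: Set<string>,
  ctx: Record<string, unknown> = {},
): Promise<Record<string, unknown>> {
  for (const step of steps) {
    if (completed.has(step.name)) continue;   // resume support
    Object.assign(ctx, await step.run(ctx));  // merge this step's outputs
    completed.add(step.name);
  }
  return ctx;
}
```

This is the same idea as the per-chunk checkpointing in audio generation, applied at step granularity.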

The Service Layer: 8 Specialized Modules

Each service handles a single responsibility with proper error handling and IPC response patterns:

PodcastService

Episode CRUD, config management, status tracking, pagination

PodcastScriptService

One-shot script generation, cleanup, sponsor injection, artwork generation

PodcastAudioService

Script chunking, TTS generation, ffmpeg concatenation, duration calculation

ElevenLabsService

Voice cloning, TTS API, voice management, connection testing

DigitalOceanSpacesService

S3-compatible uploads, public URL generation, file management

PodcastRSSService

RSS 2.0 generation, iTunes tags, episode enclosures, feed upload

PodcastPublishService

Multi-platform publishing, Spotify API, publish log tracking

PodcastInfrastructureService

Kubernetes deployment, landing page generation, kubectl orchestration
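
Of these, the RSS side is the easiest to make concrete. Here's a hedged sketch of a single feed item with iTunes tags; PodcastRSSService's actual output isn't shown, but a valid podcast entry needs at least an enclosure, a guid, and an `itunes:duration` tag:

```typescript
// Sketch of one RSS 2.0 <item> with iTunes tags (illustrative field names).
interface Episode {
  title: string;
  audioUrl: string;
  audioBytes: number;
  durationSeconds: number;
  pubDate: Date;
}

function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function rssItem(ep: Episode): string {
  // itunes:duration as MM:SS (seconds alone are also accepted by directories).
  const mmss = `${Math.floor(ep.durationSeconds / 60)}:${String(ep.durationSeconds % 60).padStart(2, '0')}`;
  return [
    '<item>',
    `  <title>${escapeXml(ep.title)}</title>`,
    `  <enclosure url="${ep.audioUrl}" length="${ep.audioBytes}" type="audio/mpeg"/>`,
    `  <guid isPermaLink="false">${ep.audioUrl}</guid>`,
    `  <pubDate>${ep.pubDate.toUTCString()}</pubDate>`,
    `  <itunes:duration>${mmss}</itunes:duration>`,
    '</item>',
  ].join('\n');
}
```

The full feed wraps items in a `<channel>` with the `itunes:` XML namespace declared on `<rss>`; the enclosure's `length` (file size in bytes) is required by Apple's validator.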

Voice Cloning: Your Voice, Automated

The magic of this system is voice cloning. ElevenLabs' Instant Voice Cloning on their website requires just a 15-second audio clip to create a synthetic voice that sounds remarkably like you. The cloning process happens entirely on ElevenLabs' platform—you upload your sample, they process it, and you get a voice ID to use in the app.

Note on Accent Accuracy

Instant Voice Cloning can sometimes introduce subtle accent variations—I noticed a slight British tinge in my cloned voice. This is a known characteristic of IVC with unique voices. For more accurate accent replication, ElevenLabs offers Professional Voice Cloning (PVC) which requires 30+ minutes of audio but produces more faithful results. I'm continuing to tweak the settings to dial in the perfect voice.

Once you have your cloned voice ID from ElevenLabs, the app handles the rest:

// Using the cloned voice for TTS
async generateSpeech(text: string, voiceId: string, options: TTSOptions) {
  // voiceId comes from ElevenLabs website after cloning
  const response = await this.makeRequest<ArrayBuffer>(
    'POST', 
    `/text-to-speech/${voiceId}`,
    {
      text,
      model_id: 'eleven_v3',  // Latest model for highest quality
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.75,
      }
    }
  );
  
  return Buffer.from(response.data);
}

TTS Generation with Chunking

ElevenLabs has a 5000-character limit per API call. Scripts are chunked at sentence boundaries to maintain natural flow:

// Script chunking and TTS generation
async generateAudio(episodeId: string, onProgress?: (progress: number) => void) {
  const script = await this.getScriptWithSponsors(episodeId);
  const chunks = this.splitScriptIntoChunks(script, 5000);
  
  const audioBuffers: Buffer[] = [];
  for (let i = 0; i < chunks.length; i++) {
    const buffer = await elevenlabsService.generateSpeech(
      chunks[i], voiceId, { model: 'eleven_v3' }
    );
    audioBuffers.push(buffer);
    onProgress?.((i + 1) / chunks.length * 100);
    
    // Checkpoint for recovery
    await this.saveCheckpoint(episodeId, i);
  }
  
  // Concatenate with ffmpeg
  const finalAudio = await this.concatenateAudio(audioBuffers);
  const duration = await this.calculateDuration(finalAudio);
  
  return { audioBuffer: finalAudio, duration };
}

Checkpoint recovery ensures interrupted workflows resume from the last completed chunk—no wasted API calls or regenerating already-processed audio.
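
The splitScriptIntoChunks helper isn't shown above; a minimal version (an assumption, not the app's exact code) greedily packs whole sentences until the next one would push a chunk past the limit:

```typescript
// Sketch: chunk a script at sentence boundaries so no chunk exceeds `limit`.
function splitScriptIntoChunks(script: string, limit: number): string[] {
  // Naive sentence split: terminal punctuation followed by whitespace.
  const sentences = script.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > limit && current) {
      chunks.push(current);   // close the current chunk at a sentence boundary
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

One caveat: a single sentence longer than the limit would still produce an oversized chunk, so a production version needs a hard-split fallback (e.g. at the last comma or word boundary under the limit).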

The Sponsor System: Monetization Built-In

Scripts include placeholder markers for sponsor reads. The system supports multiple sponsors with different placements:

  • [BEGINNING_SPONSORS]: After the intro, before main content
  • [MIDPOINT_SPONSORS]: Between story segments

During audio generation, the injectSponsorsForAudio method replaces placeholders with actual ad scripts:

// Sponsor injection with voice transitions
async injectSponsorsForAudio(projectId: string, script: string) {
  const beginningSponsors = await sponsorService.getSponsorsForLocation(
    projectId, 'beginning'
  );
  const midpointSponsors = await sponsorService.getSponsorsForLocation(
    projectId, 'midpoint'
  );

  // Get voice config for sponsor transitions
  const { sponsor_beginning_intro, sponsor_beginning_outro } = voiceConfig;

  // Build sponsor block with transitions
  const beginningBlock = [
    sponsor_beginning_intro,  // "This episode is brought to you by..."
    ...beginningSponsors.map(s => `**Sponsor[${s.sponsor_name}]:** ${s.ad_script_text}`),
    sponsor_beginning_outro   // "Now, back to the show..."
  ].join('\n\n');

  return script.replace('[BEGINNING_SPONSORS]', beginningBlock);
}

Sponsors can have different voices—the audio service detects **Sponsor[Name]:** markers and switches to the sponsor's configured voice for their ad read.
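
A minimal sketch of that detection (the marker format comes from the system; the parsing code is illustrative): split the script into per-voice segments wherever a **Sponsor[Name]:** marker appears, mapping each sponsor name to its configured voice ID.

```typescript
// Sketch: split a script into (voice, text) segments on sponsor markers.
interface VoiceSegment { voice: string; text: string }

function splitByVoice(
  script: string,
  hostVoice: string,
  sponsorVoices: Record<string, string>,
): VoiceSegment[] {
  const segments: VoiceSegment[] = [];
  const marker = /\*\*Sponsor\[([^\]]+)\]:\*\*\s*/g;
  let last = 0;
  let current = hostVoice;
  for (const m of script.matchAll(marker)) {
    const before = script.slice(last, m.index).trim();
    if (before) segments.push({ voice: current, text: before });
    current = sponsorVoices[m[1]] ?? hostVoice;  // unknown sponsor: host voice
    last = m.index! + m[0].length;
  }
  const tail = script.slice(last).trim();
  if (tail) segments.push({ voice: current, text: tail });
  return segments;
}
```

This sketch keeps the sponsor voice active until the next marker, so the real system presumably uses an explicit marker or a paragraph rule to hand the script back to the host after an ad read.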

Personal Touch: Kids as Sponsors

I cloned my kids' voices on ElevenLabs to read the sponsor segments. It adds a fun, personal touch to the podcast and makes the ad reads more memorable. The multi-voice system makes it easy to switch between the main host voice and sponsor voices seamlessly.

Kubernetes Landing Page: One-Click Deployment

Every podcast needs a landing page. The PodcastInfrastructureService deploys a branded site to Kubernetes:

// Kubernetes deployment workflow
async deployLandingPage(projectId: string, domain: string) {
  // 1. Validate kubectl access
  await this.runKubectl('version --client');
  
  // 2. Generate namespace from domain
  const namespace = this.generateNamespace(domain); // podcast-example-com
  
  // 3. Initialize RSS feed if not exists
  if (!project.rss_url) {
    await podcastRSSService.generateRSS(projectId);
  }
  
  // 4. Load and customize deployment template
  let manifest = readFileSync('templates/podcast-deployment.yaml', 'utf-8');
  manifest = manifest
    .replace(/{{NAMESPACE}}/g, namespace)
    .replace(/{{DOMAIN}}/g, domain)
    .replace(/{{PODCAST_TITLE}}/g, podcastTitle)
    .replace(/{{RSS_URL}}/g, rssUrl)
    .replace(/{{PRIMARY_COLOR}}/g, brandColors[0])
    .replace(/{{PLATFORM_LINKS}}/g, platformLinksHtml);
  
  // 5. Apply to cluster
  await this.runKubectl(`apply -f ${tempManifestFile}`);
  
  // 6. Wait for deployment ready
  await this.waitForDeployment(namespace, 'podcast-landing', 60);
  
  return { siteUrl: `https://${domain}`, namespace };
}

The deployment template includes:

  • Nginx serving static HTML with RSS feed proxy
  • ConfigMap for HTML content with brand colors
  • Ingress with Let's Encrypt SSL via cert-manager
  • Episode grid that loads dynamically from RSS
  • Audio player for in-browser listening

Daily Automation: Set It and Forget It

The PodcastSchedulerService uses node-cron to run the workflow automatically:

// Daily automation workflow
import cron from 'node-cron';

class PodcastSchedulerService {
  private jobs: Map<string, cron.ScheduledTask> = new Map();

  startAutoGeneration(projectId: string, time: string = '06:00') {
    const [hour, minute] = time.split(':');
    const cronExpression = `${minute} ${hour} * * *`; // Daily at specified time
    
    const job = cron.schedule(cronExpression, async () => {
      console.log(`[PodcastScheduler] Running daily workflow for ${projectId}`);
      
      // Step 1: Generate script with web search
      const scriptResult = await podcastScriptService.generateScriptOneShot(
        episodeId, today
      );
      
      // Step 2: Generate artwork
      await podcastScriptService.generateEpisodeArtwork(episodeId);
      
      // Step 3: Generate audio (after review or auto-approve)
      if (autoApprove) {
        await podcastAudioService.generateAudio(episodeId);
        await podcastPublishService.publishEpisode(episodeId, ['rss']);
      } else {
        // Send notification for manual review
        this.sendReviewNotification(episodeId);
      }
    });
    
    this.jobs.set(projectId, job);
  }
}

The default workflow pauses after script/artwork generation for human review. Once approved, audio generation and publishing happen with one click.

The Challenges and Trade-Offs

1. TTS Character Limits

ElevenLabs limits requests to 5000 characters. A 15-minute script is 2000-2500 words (~12,000-15,000 characters). The solution: intelligent chunking at sentence boundaries with checkpoint recovery for interrupted workflows.

2. Audio Quality Consistency

Concatenating multiple TTS chunks can create audible seams. Using the latest eleven_v3 model with consistent voice settings minimizes this. ffmpeg normalization ensures uniform volume across chunks.

3. Script TTS Optimization

Markdown headers like ## Story 1 get read as "hashtag hashtag Story 1". The cleanup pass removes markdown formatting and adds natural pauses using ellipses, em dashes, and paragraph breaks.
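
One plausible version of that cleanup pass (illustrative; the app's exact rules aren't shown) strips markdown syntax while keeping paragraph breaks as natural pauses:

```typescript
// Sketch: strip markdown so the TTS engine doesn't read syntax aloud.
function cleanScriptForTTS(markdown: string): string {
  return markdown
    .replace(/^#{1,6}\s+/gm, '')              // headers: "## Story 1" -> "Story 1"
    .replace(/\*\*([^*]+)\*\*/g, '$1')        // bold
    .replace(/\*([^*]+)\*/g, '$1')            // italics
    .replace(/^[-*]\s+/gm, '')                // list bullets
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')  // links: keep only the link text
    .replace(/\n{3,}/g, '\n\n')               // collapse extra blank lines
    .trim();
}
```

Order matters here: bold must be stripped before italics so `**text**` isn't half-consumed by the single-asterisk rule.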

4. Real-Time News Accuracy

GPT-5's web search finds current news, but AI can hallucinate details. The script includes a review step where hosts can fact-check and edit before audio generation. The system preserves intro/outro exactly as configured.

5. Cost Management

ElevenLabs charges per character. A 15-minute episode costs ~$0.50-1.00 on the Creator plan. GPT-5 with web search adds ~$0.10-0.20 per script. Total cost per episode: under $2—far cheaper than professional voice talent.

The Results: 4 Hours to 10 Minutes

Before AI Podcast Engine

  • Topic research: 45 minutes
  • Script writing: 90 minutes
  • Recording: 30 minutes
  • Audio editing: 45 minutes
  • Artwork design: 20 minutes
  • Publishing: 25 minutes
  • Total: 4+ hours per episode

After AI Podcast Engine

  • Topic research: 0 minutes (AI web search)
  • Script writing: 0 minutes (AI generation)
  • Recording: 0 minutes (voice clone TTS)
  • Audio editing: 0 minutes (auto-normalized)
  • Artwork design: 0 minutes (AI generation)
  • Review & publish: 10 minutes
  • Total: 10 minutes per episode

That's a 96% time reduction. For daily podcasts, that's 27+ hours saved per week.

Cost Analysis: Surprisingly Affordable

Per Episode Costs

• GPT-5 script generation: ~$0.15
• GPT-Image-1 artwork: ~$0.17
• ElevenLabs TTS (15 min): ~$0.80
• DigitalOcean Spaces: ~$0.01

Total per episode: ~$1.13

Monthly Infrastructure

• ElevenLabs Creator plan: $22/month (100k chars)
• DigitalOcean Spaces: $5/month (250GB)
• Kubernetes cluster: $12/month (shared)
• OpenAI API: ~$10/month (30 episodes)

Total monthly: ~$49/month for daily podcasts

Technical Highlights

8 specialized services for podcast automation
40+ IPC handlers for Electron communication
4 database tables (config, episodes, sources, publish_log)
GPT-5 with web search for real-time news
ElevenLabs Eleven v3 TTS with cloned voices
ffmpeg audio processing with normalization
RSS 2.0 + iTunes tags for podcast directories
Kubernetes deployment with cert-manager SSL
Sponsor system with multi-voice support
Daily cron automation with review workflow

My AI-Generated Podcasts

Here are the podcasts I'm currently running with this AI Podcast Engine. More coming soon as I refine the workflow!

AI News in 10

Daily AI and tech news podcast in about 10 minutes. Curated and generated entirely by AI using my and my kids' cloned voices.

Daily · ~10 min · AI/Tech News

Right Versus Left News

A daily podcast that looks at top news of the last 24 hours, gives a brief description of each story, then presents what the right is saying and what the left is saying about it.

Daily · ~6 min · Politics

More Podcasts Coming Soon

Once I nail down the workflow and fine-tune the voice cloning, I'll be launching additional shows on different topics.

What's Next: Expanding the Podcast Empire

The first podcast is live and running. Now I'm focused on:

  • Dialing in the voice: Still tweaking the ElevenLabs settings to eliminate that subtle British accent. May experiment with Professional Voice Cloning for more accurate results.
  • Launching a second podcast: Once the workflow is bulletproof, I plan to spin up another show on a different topic—the infrastructure supports unlimited podcasts.
  • Improving sponsor integration: The kids' voices are a hit for sponsor reads; I'm considering adding more voice variety.

Conclusion: The Future of Content Creation

The AI Podcast Engine proves that fully automated content creation is possible today. Not just scheduling or distribution—the entire creative pipeline from topic discovery to published episode.

The key insights: simplify ruthlessly (9 steps became 4), build for recovery (checkpoints save expensive API calls), and keep humans in the loop (review before publish catches AI mistakes).

Voice cloning technology has reached a quality threshold where synthetic voices are nearly indistinguishable from recordings. ElevenLabs' Eleven v3 model, combined with GPT-5's real-time web search, creates possibilities that were science fiction two years ago.

The podcast that would have consumed 28 hours of my week now takes about 70 minutes a week—and most of that is optional review. That's the power of thoughtful automation.

Complete Tech Stack

AI/ML: OpenAI GPT-5 (web search, script generation), GPT-Image-1 (artwork), ElevenLabs Eleven v3 (TTS with cloned voices)

Audio: fluent-ffmpeg, ffprobe (concatenation, duration, normalization)

Storage: DigitalOcean Spaces (S3-compatible), AWS SDK v2

Database: SQLite with better-sqlite3, 4 podcast tables

Scheduling: node-cron for daily automation

Infrastructure: Kubernetes (DigitalOcean), Nginx, cert-manager (Let's Encrypt)

Frontend: React, TypeScript, shadcn/ui, Tailwind CSS

Desktop: Electron with IPC handlers for main/renderer communication

Want to Build AI-Powered Automation?

If you're looking for an engineer who can orchestrate AI services, build content automation pipelines, and deploy production systems, let's talk.