Illustration by Justin Harrell. Watch him draw the Skype-O-Saurus in this time-lapse video.
The Drupalize.Me Podcast (formerly the Lullabot podcast) has been running for many years now. During this time, not much has changed in what makes up the podcast itself. There is theme music, a host, guests, event updates, and now even sound effects. Even when it comes to how we record a podcast, not much is different in either the method or the technology. What can make or break a podcast, though, is the quality of the sound. I'm not talking about whether the podcast is HD or anything, but about the overall quality of a person's voice, the ability to reduce or eliminate background distractions, or even just being able to create a good mix of volumes. These are all things it is great to have some control over and to edit before putting the podcast out to the masses.
Drupalize.Me has always taken sound quality and the editing process into consideration when producing a podcast. Each host has a specialized mic for recording and we encourage our guests to do the same. We try to avoid recording via phones, ear buds, or even laptop microphones. The biggest hurdle when it comes to recording a podcast is the actual recording. You can get a great recording if the podcast is just you, but the second you add guests into the mix, things get out of your control. A guest may work from home and have children or dogs. A guest may prefer to record from a nearby coffee shop to get out of the office. Background noise can be a major issue in the quality of a recording.
One of the biggest tools for recording a podcast over the years has been Skype. It was one of the first widely used applications for chatting with people across the globe, and it was free for the most part. A huge advantage Skype has over lots of other options is the quality of its audio. As strange as that may sound, Skype is one of the best. Of course there is the occasional dropped call or robot voice you must deal with, but those issues aren't limited to Skype either.
A Typical Recording Process
Whether you're using Skype or some other software, if you have remote guests, there have basically been two ways to record a podcast: record the Skype call and hope for the best, or have each person on the call record locally and distribute the files with a cloud file sharing service like Dropbox. At Drupalize.Me we use an application called Audio Hijack (recently Audio Hijack Pro) to record Skype calls for our podcast. One of the great things about this app is that it will record the host on a separate channel from the guests. This is especially useful if there is only one guest. We also have what we call a "backup recorder", which is another team member on the call who is also recording, just in case.
With a host and a single guest recording, the audio file is split into 2 channels, which allows the person editing (me) to isolate each person on a separate track and adjust volumes and effects independently. This is great if one person is talking and a dog barks in the background on the other person's mic. I can just edit that out. It is for this reason that some podcasters prefer the "everyone record" method. That way each person is an individual track to work with, and the audio quality is better as well. The biggest issue with this for Drupalize.Me is that it puts the burden of recording our podcasts on our guests. Our guests are typically spread around the world and use various operating systems. Asking them to record, and expecting that they know how or have the means to do so, is just not realistic. So over the years we have done our best (and a pretty good job if I do say so myself) to work with what we have.
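To illustrate what isolating each person on a separate track means in practice, here is a minimal Python sketch that splits a two-channel recording into two mono tracks. It is not how Audio Hijack or Logic does it. The function name `split_stereo` is made up for this example, and the sketch assumes a 16-bit stereo WAV file with one speaker per channel (which channel carries the host depends on your recorder).

```python
import array
import io
import wave


def split_stereo(stereo_wav: bytes) -> tuple[bytes, bytes]:
    """Split a 16-bit stereo WAV (given as bytes) into two mono WAVs.

    Assumption for this sketch: one speaker is on the first channel and
    the other speaker is on the second channel.
    """
    with wave.open(io.BytesIO(stereo_wav), "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))

    tracks = []
    # Stereo samples are interleaved: first, second, first, second, ...
    for channel in (samples[0::2], samples[1::2]):
        out = io.BytesIO()
        with wave.open(out, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
        tracks.append(out.getvalue())
    return tracks[0], tracks[1]
```

Once the two tracks are separate, a bark on one person's mic can be cut or turned down without touching the other person's voice.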
Improving the Multiple Guest Recording
It wasn't until recently, when I was recording a podcast, that this process really got to me. I knew there was another way but never had the time or resources to investigate. A podcast I have listened to over the years, TWiT, had invested lots of time and money into solving this, not only for quality's sake, but because they went live at some point, so transferring audio files was not even an option. I once read an article about how they built what they called a Skypesaurus. They used multiple computers, screens, and a hardware mixer (not to mention a budget of $1500) to make this happen. In the past I researched a way to make this happen with just one machine. I found bits and pieces but was never able to make it work. Just recently I decided to give it another go. I found an article which described what I was trying to do with an app called Soundflower and Ableton Live.
Soundflower is an app for the Mac that has been around for quite some time. It is open source and something I never really grasped until I used it. It basically turns your Mac into a virtual mixer. It allows you to send audio from any device or app (that allows you to select its input/output audio sources) to any other. The reason I never took to Soundflower was that it basically comes preset with a 2 channel in/out and a 64 channel in/out. The odd part is the 64 channel is just labeled as that; it doesn't display 64 different channels. A channel to me would be either an input or output. The article above made it make sense, and made it more feasible, by modifying Soundflower's plist file and adding actual channel listings from A to I. I finally understood what I could use it for. The other piece of software used was Ableton Live, which is multi-track recording software with a hefty price tag. I realized I could do this with Apple's Logic Pro X for a fraction of the price.
The Skype-O-Saurus is Born
There was just one other thing I needed: a way to have multiple Skype conversations at the same time. I remembered that when I attempted this in the past I came across a piece of software called "Skypelauncher". This allowed you to launch multiple instances of Skype. With that I created three other Skype accounts (podcastbot 1 thru 3). Then between Soundflower, Logic Pro X, and Skypelauncher I was able to record a podcast while maintaining each guest and host as a separate audio channel. I was even able to bring in a music and soundboard mix just for kicks. Another huge advantage I have with this method is the ability to adjust each person's feed to the others. So if guest one says they can't hear guest two very well, I can adjust the audio for that person only.
Video: How I Built The Skype-O-Saurus
I could go into depth on all the configurations I did to make this happen, but with Drupalize.Me being an online video training site, I felt it would make more sense to show you in a video. Watch to see how I went about making this happen and how I configured each piece of software.
why not Justin.tv, Livestream, etc?
why not BigBlueButton?
Thanks for writing this. I have a lot to say on this topic. I'm the co-host (and the guy in charge of production) of Epicenter Bitcoin. We started our podcast a year and a half ago and audio/video capture has been an ongoing issue since day 1. Even some 80 episodes in, we still don't control all the variables which can cause things to f**k up (and I'm starting to think we never will).
For the first 40-some episodes, we were audio only and used Skype (with call recorder) or Mumble, or some combination of both to record the show. We've always done editing and mixing in post, relying on the talent of our audio engineer to come to the rescue when something goes wrong.
At some point we thought it would be cool to throw video in the mix. We naturally turned to Hangouts as it provided a relatively high-quality yet simple way to record video calls and send them straight to YouTube. The problem with this, however, is that there is no way to record audio locally.
Here is what a typical show looks like:
- 2 hosts (in different locations)
- 1 or 2 guests (always different people, and in different locations)
- 1 hour long
It's very important to us that the audio version of the show be of superior quality. To achieve this, we ask each participant to record their audio locally with QuickTime X (when the guest is on a Mac, which luckily is most of the time, otherwise we ask them to install Audacity). When we start the Hangout, we also start recording audio. At the end of the show, we ask the guest to export their audio and upload it to our Dropbox. Then we send everyone's audio to our engineer so he can edit, mix and master the audio podcast. The video version gets released as is.
This probably all sounds fine in theory except... nothing ever goes as planned. Many computers, even the most recent Macs, can't handle Hangouts running for an hour while QuickTime is recording raw audio. About 30-40% of the time, one participant's QT crashes and we're forced to grab their audio from YouTube, which is disgusting. Not to mention, this is a huge burden on the guest. There can be anywhere between 20 and 60 minutes of walking the guest through getting everything set up, and we do this every week. We've been very lucky that all have been understanding, but if it was me, I'd be losing patience in about 5 minutes.
In addition, we're starting to see the limits of Hangouts. The quality isn't always there (in comparison to Skype) and we're now thinking of adding an animated logo at the beginning of the show, pre-recorded ads during the show, etc. All this will require additional video editing, which YouTube's video editor will not be able to handle.
So, we need another solution. The software-based version of TWiT's Skypesaurus has been on my radar for a while. However, none of the Skypesaurus-inspired solutions being proposed by the podcasting community seem to support video. I suppose I could have the same setup as you're proposing and record each participant's audio and video using something like eCamm Skype Call Recorder; however, they wouldn't see each other.
Do you have any experience with video podcasting? Do you know of any solutions?
Why can't this be simple...
I agree, why can't this be simple... Like you said, the best method for this using Skype would be to just record each one with something like eCamm and lose the ability for them to see each other. Have you looked into using http://appear.in to see if that is less intensive on the computer's processor? You could even attempt to run and record on a master computer with Skype and still use appear.in and Skype on each person's machine so they could see each other. I wish you luck, and if I come across anything that might help I will reply to your comment.
This is BRILLIANT!!! I can't find a copy of Multi-Skype Launcher, though. It seems the developer has discontinued it. Any chance you have a backup copy laying around anywhere?
This is where I got mine, but I do agree I think the developer is done with it. This version still works with the latest Mac OS. If it ever stops working there are terminal commands to do the same thing that you can find on the googles. http://www.macupdate.com/app/mac/47987/multi-skype-launcher
Is it possible to do this with GarageBand instead of Logic?
Is there a video that you posted that explained specifically how you used the sends in Logic Pro X? You may not believe this, but yours is the only video on the web talking about sends, buses and aux that explains them in the exact context I need. Most other videos discussing S, B, and A are explaining them in the context of music recording. I'm trying to take my podcast to Skype using a Tascam audio interface with one caller and I'm not sure how to set up the signal flow in Logic.
I am using the sends for each guest to send their audio to each other. You want to make sure that you don't have a send for the same guest to themselves, which would cause an echo. I utilize the Soundflower channels for the sends. So if I have a guest in Skype receiving audio on Soundflower channel A/B, I make sure every guest has a send to A/B except that guest. Does that make sense?
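The rule described here (everyone is sent to everyone except themselves) can be written down as a small routing table. The Python sketch below only models the idea and does not talk to Logic or Soundflower. The function name `build_sends`, the guest names, and the channel labels are hypothetical placeholders.

```python
def build_sends(return_channel: dict[str, str]) -> dict[str, list[str]]:
    """Work out which channels each person's track should send to.

    return_channel maps each person to the channel pair their Skype
    instance listens on. A person is never sent to their own channel,
    because hearing yourself back is what causes the echo.
    """
    return {
        source: [
            channel
            for listener, channel in return_channel.items()
            if listener != source
        ]
        for source in return_channel
    }


if __name__ == "__main__":
    # Hypothetical setup: three guests, each listening on a different pair.
    routing = build_sends({"guest1": "A/B", "guest2": "C/D", "guest3": "E/F"})
    for source, sends in routing.items():
        print(f"{source} sends to: {', '.join(sends)}")
```

With three guests, each track ends up with two sends, and no track ever sends to the channel that its own guest is listening on.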
This was an incredible article and video! The only question I have is can you briefly explain how one could possibly create your "Skype-O-Saurus" with all software that is compatible with Windows too? Ultimately I'm aiming to launch a podcast that requires 2 Skype guests (who interact with each other) on each show, hosted by myself. Thanks in advance good sir.
Chris, Sorry for the long delay, I was looking around to see if I could find anything to help you out. I haven't been in the Windows world in 10 years so I wasn't able to provide much feedback. I also couldn't find much to help you either. If you find a solution please come back and post it as a comment. Good luck!
This has been incredible for my podcast. Thanks for the post! Weirdly though a couple of weeks ago this just stopped working. It seems to be something to do with skype but I can't understand what. Has anyone else had problems?
I have to be honest, we put our podcast on hold so I haven't used this setup in a few months. Did this happen after an upgrade like El Capitan or something like that?
I solved the multiple instances of Skype issue with:
open -na /Applications/Skype.app --args -DataPath /Users/$(whoami)/Library/Application\ Support/Skype1
(Substitute Skype2, Skype3, for Skype1 to get multiple instances running.)
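For more than a couple of instances, the same command can be generated instead of typed by hand. This is a rough Python sketch that only builds and prints the command lines. It reuses the `open` arguments and the `-DataPath` flag exactly as given in the command above; whether a given Skype version still honors that flag is not something this sketch checks. The helper name `skype_launch_command` is made up for the example.

```python
import getpass
import shlex
from pathlib import PurePosixPath


def skype_launch_command(instance: int, user: str) -> list[str]:
    """Build the `open` command for one numbered Skype instance.

    Mirrors the command from the comment: each instance gets its own
    data folder (Skype1, Skype2, ...) under Application Support.
    """
    data_path = (
        PurePosixPath("/Users") / user / "Library"
        / "Application Support" / f"Skype{instance}"
    )
    return [
        "open", "-na", "/Applications/Skype.app",
        "--args", "-DataPath", str(data_path),
    ]


if __name__ == "__main__":
    # Print the commands for three instances rather than running them.
    for n in range(1, 4):
        print(shlex.join(skype_launch_command(n, getpass.getuser())))
```

Printing first makes it easy to check the paths. To actually launch the instances on a Mac, each list could be passed to `subprocess.run`.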
Got the modified Soundflower copy by Luca to run and does provide Ch A-I. I was also able to create an aggregated device with those in it, just like your video shows.
I got ahold of a copy of Logic Pro X just to try this out and see if I can make it work. This is where I am getting lost. I think maybe dumbing it down a bit further with some diagram for the non-audio engineer folks would help. For one instance of Skype, you used A and B, which is actually channels 1-2, 3-4, correct? So, for the next Skype instance, would it be C and D, which is actually 5-6, 7-8? I am also getting confused with the Sends and so on.
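One way to sanity-check that numbering is to write the mapping out. The Python sketch below rests on two assumptions that nothing in the post confirms: that each lettered Soundflower device is a 2-channel (stereo) device, and that the devices are stacked in alphabetical order inside the aggregate device. The function name `aggregate_channels` is hypothetical.

```python
import string


def aggregate_channels(
    device_count: int, channels_per_device: int = 2
) -> dict[str, tuple[int, int]]:
    """Map lettered devices to (first, last) channel numbers.

    Assumes the devices are stacked in alphabetical order and all have
    the same channel count. Check this against your own aggregate device.
    """
    mapping = {}
    for index in range(device_count):
        first = index * channels_per_device + 1
        last = first + channels_per_device - 1
        mapping[string.ascii_uppercase[index]] = (first, last)
    return mapping


if __name__ == "__main__":
    for letter, (first, last) in aggregate_channels(9).items():
        print(f"Soundflower {letter}: channels {first}-{last}")
```

Under those assumptions A is 1-2, B is 3-4, C is 5-6 and D is 7-8, which matches the guess in the question. If the devices sit in a different order in your aggregate device, the numbers shift accordingly.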
This was a great post. It would be interesting to update it with Loopback instead of Soundflower. I assume that would simplify things a bit.