Friday, July 12, 2002


Don't try this at home, folks! It's probably a good way to fry your modem, computer, and/or self.

I tried to find a modern solution for using my computer as a telephone interface, but alas, none of them were quite there or quite what I wanted.

So I resurrected the Frankenphone.

It used to be a Hayes 1200 baud modem. Actually, it still is; I didn't have to modify it at all. I merely, umn, amended it.

There are three hook clips strategically tapping into the audio signal between the phone isolation circuitry and the digital brains of the modem. This is the second modem I gutted for this purpose; the other one wasn't nearly so cooperative. This one proved to have a very clean design, with a fairly obvious node where current was being integrated via an op-amp (or a few transistors, I forget). That let me simply add in some more current from another source without having to clip any wires. Tapping the output was equally trivial. In total, I had to add one resistor, which I stuck in the blue box that holds the audio jacks.
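Here is a rough way to see why one added resistor is enough. It assumes (the text above isn't sure whether it's an op-amp or discrete transistors) that the node is an amplifier input held at virtual ground, about 0 V, by feedback. If the sound card's output voltage $v_{\text{card}}$ drives that node through the added resistor $R$, the currents simply add:

$$
i_{\text{node}} = i_{\text{line}} + \frac{v_{\text{card}} - 0}{R} = i_{\text{line}} + \frac{v_{\text{card}}}{R}
$$

Because the node voltage barely moves, the injected current doesn't disturb the existing line-audio path, and the circuitry downstream just sees the sum. The symbols are illustrative and are not taken from the modem's schematic.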

Now, with a single full-duplex sound card, I had enough for an answering machine, but why stop there? So I pulled the Sound Blaster 16 out of yesteryear's computer and added it as a second card in my current machine.

With two sound cards and the Frankenphone, my computer can answer my phone and take messages for me without interrupting the mp3s I'm listening to. (Did I mention I've been a little anti-social lately?) But why stop there? So I read the documentation for the Open Sound System APIs and wrote the library it ought to come with (you know, the one that makes it easy to use) and started hacking real-time audio apps with it.

The simplest app just pipes the card's input to its output. If I run it on both cards (and send ATH1 to the Frankenphone so it goes off-hook), my computer headset becomes a phone.

Except all the voice data in both directions is going through the computer, so I can do whatever I want with it. Listen to this phone conversation I just had with Doug. (I've appended the corresponding source code to the bottom of this entry.)
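Going off-hook (ATH1) and hanging up (ATH0) are just Hayes commands written to the modem's serial port. Below is a minimal POSIX termios sketch of that step. It is not the author's connect code; `openModem`, `sendCommand` and the device path in the usage note are hypothetical. It opens the port raw at a given baud rate, such as `B1200` for a 1200 baud modem, and writes a command terminated by a carriage return (the `^M` in the script below).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a serial device (e.g. a modem's tty) in raw
 * 8N1 mode at `speed` (a termios constant such as B1200).  Returns a file
 * descriptor, or -1 if the path can't be opened or isn't a terminal.
 */
int openModem(const char *path, speed_t speed)
{
	struct termios tio;
	int fd, flags;

	assert(path != NULL);
	/* O_NONBLOCK so open() doesn't wait for carrier detect. */
	fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
	if (fd < 0)
		return -1;
	if (tcgetattr(fd, &tio) < 0)	/* fails (ENOTTY) on non-terminals */
		goto fail;

	/* Raw mode: no echo, no line editing, no CR/NL translation, 8N1. */
	tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
	tio.c_oflag &= ~OPOST;
	tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
	tio.c_cflag |= CS8 | CLOCAL | CREAD;	/* CLOCAL: ignore modem control lines */

	if (cfsetispeed(&tio, speed) < 0 || cfsetospeed(&tio, speed) < 0 ||
	    tcsetattr(fd, TCSANOW, &tio) < 0)
		goto fail;

	/* With CLOCAL set, go back to ordinary blocking I/O. */
	flags = fcntl(fd, F_GETFL);
	if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0)
		goto fail;
	return fd;

fail:
	close(fd);
	return -1;
}

/* Hypothetical helper: send a Hayes command such as "ATH1\r"; 0 on success. */
int sendCommand(int fd, const char *cmd)
{
	size_t len = strlen(cmd);
	return write(fd, cmd, len) == (ssize_t)len ? 0 : -1;
}
```

Usage would look like `int fd = openModem("/dev/ttyS0", B1200); sendCommand(fd, "ATH1\r");`, assuming the modem sits on the first serial port. A real program would also read back the modem's "OK" before carrying on, as the script below does with `if "OK" 10`.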

The answering machine app was actually just a script and a hack away from my generic, scriptable "connect" app, which I wrote to connect to IP ports or ttys (typically serial ports). Connect is functionally similar to telnet, but it can connect to files (ttys, serial ports, etc.) as well as TCP ports, can set baud rates and whatnot when necessary, and is scriptable. Using my new Oss library, I added a few audio recording commands to connect's scripting language, so now it can pick up the phone, play a message (using the standard Linux "play" command), and record the results.

I hacked the recording right into connect rather than using an external recorder so that while it's recording, it can analyze the signal for voice and know when to hang up. As long as it's hearing notable changes in the short-time average volume, the hack injects "AUDIO" into the standard text I/O stream, which looks to the script as if it's coming from the modem. So essentially the modem has been virtually enhanced to say "AUDIO" every second or two as long as there's conversation going on, and the script just waits for a sufficient period of silence (or steady noise, as in the off-hook signal) before it decides to hang up.

Here's the part of the script that waits for the caller to finish and then hangs up:

    if "AUDIO" 5 "goto #stillgoing"
    echo "Ok, I think they hung up."

    bg play -d /dev/dsp2 Goodbye.snd
    wait 2.

    send "ATH0^M"
    if "OK" 10 "goto #hungupok"
    goto #uhoh

    echo "Ok, we've hung up on 'em.  Now we'll stop the tape and save it."

    AudioSave "Message" 5.
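The detector inside connect isn't shown here. As a rough guess at how "notable changes in the short-time average volume" might be measured, here is a minimal C sketch. `VoiceDetector`, `vdUpdate` and the 0.5 threshold are all hypothetical. The sketch compares each window's mean absolute amplitude with the previous window's and reports activity only when the level jumps or drops by more than a set fraction. A steady tone or steady hiss therefore counts as silence.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical detector state; none of these names come from connect. */
typedef struct {
	double level;   /* mean |sample| of the previous window */
	double ratio;   /* relative change that counts as "notable", e.g. 0.5 */
	int    primed;  /* 0 until one window has been measured */
} VoiceDetector;

/* Short-time average volume: mean absolute value of one window. */
static double shortTimeLevel(const short *buf, int n)
{
	double sum = 0.;
	for (int i = 0; i < n; i++)
		sum += abs(buf[i]);
	return sum / n;
}

/*
 * Feed one window (say, a tenth of a second of 16-bit audio).  Returns 1
 * if its level differs from the previous window's by more than vd->ratio
 * (relative), else 0.  The first window only primes the detector.
 */
int vdUpdate(VoiceDetector *vd, const short *buf, int n)
{
	double level, diff, ref;
	int active = 0;

	assert(vd != NULL && buf != NULL && n > 0);
	level = shortTimeLevel(buf, n);
	if (vd->primed) {
		diff = level > vd->level ? level - vd->level : vd->level - level;
		ref  = vd->level > 1. ? vd->level : 1.;	/* avoid dividing by ~0 after silence */
		active = diff / ref > vd->ratio;
	}
	vd->level  = level;
	vd->primed = 1;
	return active;
}
```

In the spirit of the hack described above, the recorder would write "AUDIO" into the text stream whenever `vdUpdate` returns 1, rate-limited to once a second or so. The script's `if "AUDIO" 5` then presumably gives up once five seconds pass without one.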

So why not just buy an answering machine, you ask? Well, because now it can select outgoing messages at random from a list rather than always playing the same one. And when I listen to incoming messages, I can annotate them with textual comments, save just the ones I want, and listen to them selectively without having to "skip" through all the ones saved before them. And when I'm traveling, I can have them automatically emailed to me as audio attachments. And I can turn the ringers off on all the phones in my house; when I'm taking calls, the computer can fade down the music, softly announce that someone is on the line, and ask whether I'd like to take it. If I don't, it tells the caller: "I'm sorry, I can't seem to find Simon at the moment. If you would like to leave a message, go ahead." And when I'm sleeping (which the computer can infer from the hour and the fact that I haven't touched the keyboard yet that morning), it will silently take messages for me without making a peep. And, of course, when telemarketers call, it will just rudely hang up on them.

And besides, it was just fun.

Here's the C code that does the pitch-shifting:

/*
 * This runs the specified sound card in full-duplex mode,
 * 	recording continuously, filtering by the specified real-time
 * 	filters, and sending back to the card's output with as
 * 	little latency as possible (actually you can control the
 * 	latency with the fragmentSize and fragmentLatency).
 */
#include <stdio.h>	// printf, stdout
#include "Oss.h"
#include "Io.h"

/*
 * These are set to the actual values once the device is open:
 */
static real secsPerSample = 1./8000.;
static real samplesPerSec = 8000.;

/*
 * Duration of the run left:
 */
static real dur           = 0.;

/*
 * For the shift-down effect:
 */
static real    sdBlocksPerSecond = 20;
static int     sdBlockSize = 0;
static int     sdBlockSize2= 0;	// sdBlockSize/2
static int     sdBlockSize3= 0;	// sdBlockSize*3/2
static short  *sdBlock;			// Output buffer of sdBlockSize samples (2*sdBlockSize bytes)
static int     sdOffset    = 0;	// How far have we written into that block?
								// Actually goes to 2*sdBlockSize, since the input is stretched 2x
static int     sdLastSample= 0;	// Last sample, for averaging..

void recordCallback(pointer ossP, char *buf, int numBytes)
{
	Oss oss = (Oss)ossP;
	int i;

	/*
	 * For shifting down, we take a block of data, stretch it
	 *  out twice as wide, and window it with a triangle (or
	 *  any sums-to-one shape):
	 */
	forn(i, numBytes/2) {	// numBytes will always be even... right?
		int j, s, ss;

		s = ((short *)buf)[i];

		// Make two samples out of one:
		for2(j) {
			// Average rather than stair-step when doubling samples:
			if (j) {
				ss = s;
				sdLastSample = s;
			} else {
				ss = (s + sdLastSample)/2;
			}

			/*
			 * There are four phases to filling the block:
			 *  the ramp down in the first half, ramp up in
			 *  the first half, ramp down in the second, and
			 *  ramp up in the second...
			 */
			if (sdOffset < sdBlockSize2)
				sdBlock[sdOffset             ]  = ss * (sdBlockSize2-sdOffset) / sdBlockSize2;
			else if (sdOffset < sdBlockSize)
				sdBlock[sdOffset-sdBlockSize2] += ss * (sdOffset-sdBlockSize2) / sdBlockSize2;
			else if (sdOffset < sdBlockSize3)
				sdBlock[sdOffset-sdBlockSize2]  = ss * (sdBlockSize3-sdOffset) / sdBlockSize2;
			else
				sdBlock[sdOffset-sdBlockSize ] += ss * (sdOffset-sdBlockSize3) / sdBlockSize2;
			sdOffset++;

			// If we're done, output and reset:
			if (sdOffset >= sdBlockSize * 2) {
				OssWrite(oss, (char *)sdBlock, sdBlockSize*2 /* convert to # of bytes */);
				sdOffset = 0;
			}
		}
	}

	dur -= (numBytes/2) * secsPerSample;	// 2 bytes per (mono, 16-bit) sample
}

void sdInit(void)
{
	int j;

	sdBlockSize = samplesPerSec / sdBlocksPerSecond;

	sdBlockSize2 = sdBlockSize  / 2;
	sdBlockSize  = sdBlockSize2 * 2;	// Make it even...
	sdBlockSize3 = sdBlockSize2 * 3;

	sdBlock = zalloc(short, sdBlockSize);
	forn(j, sdBlockSize)
		sdBlock[j] = 0;
}

int main(int argc, char **argv)
{
	Oss oss;
	cstring devName, formatName;
	int srate, channels, fragSize, targetLatency;
	int rc = 0;

	Aparse(argv, NULL,
			"d=%s(AudioDevice,/dev/dsp)", &devName,
			"s=%i(fragmentSize,512)"    , &fragSize,
			"r=%i(SampleRate,22050)"    , &srate,
			"t=%r(LengthOfTest,10.)"    , &dur,
			"l=%i(FragmentLatency,2)"	, &targetLatency,
			"b=%r(ShiftDown-BlocksPerSecond,15.0)", &sdBlocksPerSecond,
			NULL);	// (assumed) end-of-options marker

	oss = OssOpen(devName, OssModeFull, srate, 1 /* mono */, "S16_NE", fragSize, 0);

	if (!oss) {
		ErShowWarning();	// In case any problems were encountered...
		return 1;
	}

	OssDescribe(oss, stdout);

	if (OssFormatCmp(OssFormatName(oss), "S16_NE") || OssChannels(oss) != 1) {
		printf("Blast!  We couldn't get mono, native 16-bit, which is all we handle.\n");
		rc = 1;
		goto bail;
	}

	samplesPerSec = OssSampleRate(oss);
	secsPerSample = 1./samplesPerSec;

	sdInit();	// Size the shift-down block now that we know the real sample rate

	if (dur > 0.) {
		OssStartPlaying(oss, targetLatency);
		OssStartRecording(oss, recordCallback, oss);

		// recordCallback counts dur down; keep waiting until it runs out:
		while (dur > 0.)
			if (ioWait(999.) < 0)
				break;

		OssDescribeBuffers(oss, stdout);
	}

bail:
	return rc;
}

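For anyone puzzling over the four phases in recordCallback, here is what they work out to. The notation is just for illustration: $x[m]$ are the input samples, and $x'$ is the 2x-stretched sequence with $x'[2m] = (x[m-1] + x[m])/2$ and $x'[2m+1] = x[m]$, where $x[-1]$ is carried over from the previous call in sdLastSample. With $N$ = sdBlockSize and $H$ = sdBlockSize2 $= N/2$, each block consumes $2N$ stretched samples and emits $N$ output samples $y[k]$, up to the integer rounding the code does on each term:

$$
y[k] =
\begin{cases}
\dfrac{(H-k)\,x'[k] + k\,x'[k+H]}{H}, & 0 \le k < H,\\[2ex]
\dfrac{(N-k)\,x'[k+H] + (k-H)\,x'[k+N]}{H}, & H \le k < N.
\end{cases}
$$

In both halves the two weights add up to one, which is the "sums-to-one" triangle window the comment mentions, so a steady signal passes through at unchanged level. Each output sample reads one stretched sample, so frequencies come out halved (one octave down). Even so, $N$ input samples still produce $N$ output samples, so the effect keeps up in real time. At the default settings (22050 Hz and 15 blocks per second, if the card grants that rate), $N = 22050/15 = 1470$ samples, about 67 ms. That is also roughly the extra delay the effect adds on top of the card's own buffering, since a block is written only once it is complete.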
Simon Funk / simonfunk@gmail.com