Ideal Audio-Visual system

I love my Tivo (a hard disk video recorder) for how dramatically it changes the way I watch TV. Few people really understand how much it changes things until they get one.

But I'm concerned about trends I see in the PVR (personal video recorder) industry, as companies like Tivo try to lock up their boxes to stop customers from copying video -- and in the process stop them from doing all sorts of cool things, like TivoWeb, which puts a web server on the Tivo.

Some people wonder whether the new FCC Broadcast Flag (BF) mandate will make it harder to build a Tivo-like device, particularly since the BF rules were written specifically to avoid interfering with today's PVR design.

To explore that issue, I describe below one possible architecture for home A/V that I think is superior. It turns out to be forbidden by the BF and by a number of other similar efforts.

Computerized home A/V

Taking our lesson from the internet, everything should run over IP on the home Ethernet, which in my case is already present: 100 Mbps today, and gigabit in the very near future. There is no need for new specialized cables and protocols like FireWire and USB 2.0 except where they make things cheaper. They really only work over short distances, to dedicated peripherals. IP is a general-purpose protocol that doesn't care about distance, network configuration or the purpose of the equipment.

Digital Speakers

Let's start with the digital IP speaker. This would be a speaker that plugs into power and Ethernet (or power-line Ethernet) and has some means to give itself an address. It would combine a digital decoder, amplifier and speaker. These might be independent components (i.e., a decoder/amp connected to a traditional speaker), but in violation of normal principles of speaker marketing, it might make sense to

This digital speaker would be able to receive, and mix together, multiple digital audio streams sent to it. Ideally, it would also be able to answer questions about itself, such as its frequency response curve and identifying information stored in it: what room it's in, where in the room, and possibly even the acoustic properties of its location in the room.
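A speaker's answer to "tell me about yourself" could be a small record it returns when queried. The sketch below assumes a JSON encoding; the class name `SpeakerDescriptor` and its fields (`room`, `position`, `freq_response_hz_db`) are hypothetical, not part of any existing standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SpeakerDescriptor:
    """Hypothetical record a digital speaker returns when asked about itself."""
    name: str
    room: str
    position: str                      # e.g. "left", "surround-right"
    # Frequency response as (frequency in Hz, level in dB) pairs.
    freq_response_hz_db: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "SpeakerDescriptor":
        data = json.loads(text)
        # JSON has no tuples, so restore the (Hz, dB) pairs explicitly.
        data["freq_response_hz_db"] = [tuple(p) for p in data["freq_response_hz_db"]]
        return SpeakerDescriptor(**data)


wire = SpeakerDescriptor("bedroom-left", "bedroom", "left",
                         [(50, -6.0), (1000, 0.0), (15000, -3.0)]).to_json()
```

A richer version might also carry the acoustic properties of the speaker's spot in the room, but a flat record like this is enough for a controller to pick speakers by room and position.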

Ideally you would be able to get lots of digital speakers: big expensive ones for the living room and home theatre, little ones to put next to the computer or in any room of the house. Computers with speakers could, of course, use software drivers to appear as digital speakers in this protocol.

One upside, and one downside, of IP is that anything in the world can talk to anything else. Thus there would need to be a firewall, or better still, the speakers would need a way to decide just whom they will take audio from. Generally, accepting audio only from the local net will do fine, with authorization needed for audio from beyond.
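The "local net by default, authorization for everything else" rule could be as simple as the check below. This is a minimal sketch: the subnet `192.168.1.0/24` and the function name `accept_stream` are hypothetical, and matching on source address stands in for the key-based authorization a real design would use.

```python
import ipaddress

# Hypothetical home network; a real speaker would learn this from its own
# interface configuration or from the master controller.
LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")


def accept_stream(source_ip, authorized):
    """Accept audio from the local net; anything else must be explicitly authorized.

    `authorized` is a set of remote addresses the controller has approved.
    Trusting a source address is weak; real authorization would use keys.
    """
    addr = ipaddress.ip_address(source_ip)
    if addr in LOCAL_NET:
        return True
    return source_ip in authorized
```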

A simple plug-and-play protocol is needed which notices a new speaker on the net and allows it to be easily configured by the computers on the net. A "master controller" would probably talk to any new device that announced itself, and name it and give it access keys as needed. A user at a console could later say, "That speaker I just plugged in? Label it bedroom-left."
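One way a master controller might adopt new devices is sketched below: give each newly announced device a provisional label and a random access key, then let the user rename it later. `MasterController`, `on_announce` and the use of a MAC-style device id are illustrative assumptions, not a defined protocol.

```python
import secrets


class MasterController:
    """Hypothetical controller that adopts newly announced devices."""

    def __init__(self):
        self.devices = {}   # device id -> {"label": ..., "key": ...}

    def on_announce(self, device_id):
        # First announcement: assign a provisional label and a random access key.
        # Repeat announcements from a known device get the same key back.
        if device_id not in self.devices:
            self.devices[device_id] = {
                "label": f"unnamed-{len(self.devices) + 1}",
                "key": secrets.token_hex(16),
            }
        return self.devices[device_id]["key"]

    def label(self, device_id, name):
        # "That speaker I just plugged in? Label it bedroom-left."
        self.devices[device_id]["label"] = name
```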

However, most devices would have a cheap dial to set a "room code", and speakers would have another dial to set a function code (L, R, Surround-L, Surround-R, Center, User-Defined 1..10). In that case you just dial the codes, plug the speaker in, and a music player can easily figure out where to send the audio for a room.

(In theory it's possible for speakers -- which can also act as crude microphones -- to work with other speakers on the net to figure out what other speakers are in the room and where they are in the room. But the dial interface is cheap and easy to understand.)
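With dials, a music player's routing job reduces to a lookup over the speakers it has heard from. The sketch assumes each speaker reports `(address, room_code, function_code)`; the tuple layout and the `FUNCTIONS` list spelling are hypothetical.

```python
# Hypothetical function codes, matching the dial positions described above.
FUNCTIONS = ["L", "R", "Surround-L", "Surround-R", "Center"] + [
    f"User-{n}" for n in range(1, 11)]


def speakers_for_room(speakers, room_code):
    """Map function code -> speaker address for every speaker dialed to room_code.

    `speakers` is a list of (address, room_code, function_code) tuples, as a
    player might collect them from device announcements. Unknown function
    codes are ignored; if two speakers share a code, the last one wins.
    """
    return {func: addr for addr, room, func in speakers
            if room == room_code and func in FUNCTIONS}
```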

Once you have digital speakers, digital audio components become easy. They just discover the speakers, use location or user input to determine which ones to use, and send their audio to them. CD players, tuners and the like can all send audio directly. There is no strict need for today's "amplifier/mixer" component, though one could still have one from a user-interface standpoint. Since speakers would be required to add together incoming streams, mixing could also be done directly there.
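"Adding incoming streams" is literally sample-by-sample addition. A minimal sketch, assuming every stream has already been decoded to signed 16-bit PCM at the same sample rate and aligned into equal-length blocks:

```python
def mix(*streams):
    """Mix equal-length blocks of 16-bit PCM samples by summing and clipping."""
    # Sum sample-by-sample across all incoming streams.
    mixed = [sum(samples) for samples in zip(*streams)]
    # Clamp to the signed 16-bit range so loud passages clip instead of wrapping.
    return [max(-32768, min(32767, s)) for s in mixed]
```

A real speaker would more likely scale each stream before summing (a per-stream volume) so that clipping is rare; hard clipping here just keeps the sketch short.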

Now all devices can send any audio anywhere it's needed. The cable is just Ethernet, or even wireless Ethernet, or even power-line Ethernet, allowing the speakers to simply be plugged into the wall and work! No "monster cables" that cost hundreds of dollars. No mess of wires into the back of an amplifier. No pre-amps or amps, except where they are needed. The audio quality should be superior. (Lossless digital streams at more than the CD's 44,100 samples per second would be supported, and possibly compressed streams.)
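A quick calculation shows why even uncompressed audio is no burden for home Ethernet. The 96 kHz, 24-bit, six-channel case is an illustrative assumption, and packet header overhead is ignored.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of an uncompressed PCM stream (ignoring packet overhead)."""
    return sample_rate_hz * bits_per_sample * channels


cd = pcm_bitrate(44_100, 16, 2)         # 1,411,200 bit/s, about 1.4 Mbit/s
hi_res = pcm_bitrate(96_000, 24, 6)     # 13,824,000 bit/s, about 13.8 Mbit/s
```

Stereo CD audio uses about 1.4% of a 100 Mbps link, so many rooms can play different lossless streams at once.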

Thus you can easily have different sounds in every room of the house, or the same sound, as needed. With authentication, you could play sounds on any speaker in the world willing to take them.

Headphone jacks would also just plug into power and Ethernet, possibly sending a command to local speakers to mute. Wireless headphones could use 802.11 or work as they currently do.

There are some issues with time sync I'm not fully up on. If these components need more accurate time sync than can be managed over Ethernet, a radio protocol for highly accurate time sync could be added to the system, sending occasional pulses carrying the time to the nanosecond.
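For sync over the Ethernet itself, one well-known approach (swapped in here for illustration; it is NTP's two-way timestamp exchange, not something the text specifies) estimates a speaker's clock offset from four timestamps. It assumes the network delay is the same in both directions; the function name is hypothetical.

```python
def clock_offset_and_delay(t0, t1, t2, t3):
    """NTP-style estimate from one request/response exchange.

    t0: speaker sends request (speaker clock)
    t1: reference receives it (reference clock)
    t2: reference sends reply (reference clock)
    t3: speaker receives reply (speaker clock)
    Returns (offset, round_trip_delay); adding offset to the speaker clock
    matches the reference. Assumes symmetric network delay.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay
```

Any asymmetry in the path shows up directly as offset error, which is why a dedicated radio time pulse could still help if sub-millisecond alignment turns out to matter.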

Digital Audio Sources

Of course, in this world, most of the audio would come from digital audio players, such as MP3-playing software on a PC or a dedicated music jukebox. Playing CDs, tapes and LPs would still be possible, but more often these would be read only once and stored on hard disk for digital playback from then on. Simple components could keep older interfaces (adding only the ability to pick which speakers they will use, or simply using defaults) or have fancy new ones.

Internet radio, whether via a PC or a standalone box, also becomes easy and flexible. And today's video sources (satellite, digital cable, DVD) already have digital audio, and would send it to the speakers associated with the TV they are displaying video on.

Time Sync

For pure audio, you want to insert a small delay and have synchronized clocks among the speakers. (Synchronized to within Ethernet delay is fine.) This allows a jitter buffer and retransmission of lost packets. Unlike VoIP, where lost packets are discarded because people would rather have a small audio glitch than long latency, here it's the reverse: people will demand glitch-free audio.
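A jitter buffer of this kind holds each packet until its scheduled play time, which leaves room to reorder late arrivals and to ask for retransmission of gaps. A minimal sketch, assuming each packet carries a sequence number and a timestamp on the shared clock; the class name, method names and units are hypothetical.

```python
class JitterBuffer:
    """Release audio packets in order, `playout_delay` after their timestamp."""

    def __init__(self, playout_delay):
        self.playout_delay = playout_delay
        self.pending = {}     # seq -> (timestamp, payload)
        self.next_seq = 0     # next sequence number to play

    def push(self, seq, timestamp, payload):
        # Ignore packets already played; a duplicate simply overwrites its twin.
        if seq >= self.next_seq:
            self.pending[seq] = (timestamp, payload)

    def missing(self):
        """Gaps below the highest received sequence number: retransmit candidates."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]

    def pop_ready(self, now):
        """Release packets in order once their play time (timestamp + delay) arrives.

        If the next packet never arrives this waits forever; a real player
        would eventually give up on it and move on.
        """
        out = []
        while self.next_seq in self.pending:
            timestamp, payload = self.pending[self.next_seq]
            if timestamp + self.playout_delay > now:
                break
            out.append(payload)
            del self.pending[self.next_seq]
            self.next_seq += 1
        return out
```

The larger `playout_delay` is, the more time a retransmission has to arrive, which is exactly the latency-for-reliability trade that suits music but not lip-synced audio.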

Unfortunately, that makes it hard to do audio that must be synchronized with video going over another channel. For that you must deliver and play the audio as soon as possible. There is very limited ability to retransmit or compress data, since you are permitted just a few milliseconds. A combined digital audio/digital video system can address this.

Video

For digital video, we would do something similar. TV monitors would have a digital decoder and Ethernet (probably gigabit) closely attached. This could be built into some TVs, or be a small box which clamps right onto the TV's analog inputs for the best possible signal. The box would also want an infrared interface for the remote control, and a way to power the TV on and off if the TV wasn't designed for this system. (A monitor designed for this system would itself have Ethernet.)

(There are such devices out there already, such as the $80

This box would receive compressed digital video streams. On gigabit Ethernet it could also receive uncompressed streams, but since in most cases we already have compressed streams, that's the way to go. It would need to support layers, so that one signal could be superimposed on another, for example to overlay on-screen text for control.
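The arithmetic behind "gigabit for uncompressed" is easy to check. The sketch assumes 16 bits per pixel (8-bit 4:2:2 sampling), 30 frames per second (NTSC's 29.97 rounded), and ignores blanking and packet overhead.

```python
def raw_video_bitrate(width, height, fps, bits_per_pixel):
    """Bits per second of uncompressed video (no blanking or packet overhead)."""
    return width * height * fps * bits_per_pixel


sd = raw_video_bitrate(720, 480, 30, 16)      # 165,888,000 bit/s: too much for 100 Mbps
hd = raw_video_bitrate(1920, 1080, 30, 16)    # 995,328,000 bit/s: nearly a full gigabit link
```

For comparison, an entire ATSC broadcast channel carries about 19.4 Mbit/s of compressed data, which is why passing along the already-compressed stream is the sensible default.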

The video system would be designed around the PVR concept. However, for storage, a traditional network fileserver would be used. That could be an always-on PC or a dedicated fileserver box anywhere in the network, ideally tucked away in the basement where the noise of the hard drives is not an issue. We don't want hard drives in our bedrooms unless they get even quieter than they are today.

This fileserver is something people want anyway, for their computing and their audio and picture needs. And in fact, any computer that's reliably on all the time can serve as this fileserver. You can even have more than one, so that you don't need to have any single one on all the time.

Tuners

To receive video, you need sources, such as tuners. You would buy components able to get digital TV streams for you. These could include a regular tuner that tunes and digitizes analog broadcast TV or NTSC video. You might have several of these, to handle multiple shows at once. One could also have a satellite decoder or digital cable decoder.

DVD drives and players could also be sources of video, as could internet streaming video.

The key is you get more tuners as you need them. As new standards emerge, all you need do is get a new tuner (or new software.) All the other components of the system remain the same.

When a program is on, the tuner would get or create the digital stream for it and store it on the fileserver. A computer control box (either a PC or dedicated device) would command it to do this at specified times.
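With several tuners, the control box has to decide which tuner records which show. A minimal sketch of one simple policy (greedy assignment in order of start time, a hypothetical choice the text doesn't specify): it fits every show whenever no more than `num_tuners` overlap at once, but when tuners run short it just reports the leftovers rather than choosing the best shows to drop.

```python
def assign_tuners(recordings, num_tuners):
    """Greedily give each recording a free tuner, in order of start time.

    `recordings` is a list of (name, start, end) with times in minutes.
    Returns ({name: tuner_index}, [names that got no tuner]).
    """
    free_at = [0] * num_tuners          # time each tuner becomes free
    assigned, conflicts = {}, []
    for name, start, end in sorted(recordings, key=lambda r: r[1]):
        for i, t in enumerate(free_at):
            if t <= start:
                free_at[i] = end
                assigned[name] = i
                break
        else:                            # no tuner was free at `start`
            conflicts.append(name)
    return assigned, conflicts
```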

Playback of shows involves telling the fileserver to send the video to the decoder box mounted on a TV monitor. Pause, rewind and all other functions are a matter of controlling this stream. This stream can of course be read from a file for a program currently being recorded. You could watch near-live (reading what was just recorded) or any other point in the show, as a PVR does.
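Watching near-live, or jumping to any point in a show, means turning a time into a byte position in a file that may still be growing. A minimal sketch, assuming the tuner appends an index entry (time in seconds, byte offset of a decodable start point) as it writes; `RecordingIndex` and its methods are hypothetical.

```python
import bisect


class RecordingIndex:
    """Hypothetical time -> byte-offset index, appended to while recording.

    Each entry marks a point where a decoder can start (for MPEG, the start
    of a group of pictures). Assumes at least one entry before seeking.
    """

    def __init__(self):
        self.times = []
        self.offsets = []

    def append(self, t, offset):
        self.times.append(t)
        self.offsets.append(offset)

    def seek(self, t):
        """Byte offset of the last entry point at or before time t."""
        i = bisect.bisect_right(self.times, t) - 1
        return self.offsets[max(i, 0)]

    def live_offset(self, behind=0.0):
        """Where to read to watch `behind` seconds behind the live edge."""
        return self.seek(self.times[-1] - behind)
```

Pause and rewind then become nothing more than the decoder asking the fileserver for bytes starting at a different offset.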

With all these tools you can also easily build my concept of a poor-man's video on demand.

Friendly Neighbour A/V

Something that makes technical sense (legal questions aside) would be for a group of neighbours to share an A/V network, using 54 Mbit/s wireless or a short cable strung between houses. Neighbours could add disks and tuners, or one central neighbour could have them all. This is quite efficient compared to everybody getting their own cable, satellite or HDTV antenna, especially in an apartment complex. A system could arrange equitable disk-space sharing, with a big win when several homes all want to record a popular show.
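The "big win" comes from storing a popular show once and charging each home only its share. The sketch below uses an even split as one possible notion of "equitable" (an assumption; the text doesn't define one), with hypothetical names and sizes in GB.

```python
from collections import defaultdict


def storage_charges(requests, sizes):
    """Split disk usage when neighbours share recordings.

    `requests` maps each home to the set of shows it wants recorded;
    `sizes` maps each show to its size in GB. Each show is stored once,
    and its size is split evenly among the homes that asked for it.
    """
    wanted_by = defaultdict(list)
    for home, shows in requests.items():
        for show in shows:
            wanted_by[show].append(home)
    charges = defaultdict(float)
    for show, homes in wanted_by.items():
        for home in homes:
            charges[home] += sizes[show] / len(homes)
    return dict(charges)
```

If three homes want a 6 GB finale and one also wants a 3 GB documentary, 9 GB is stored instead of 21 GB, and each home is charged 2 GB for the finale.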

Expandability and Innovation

The big advantage here is the ease of expansion and innovation. As new tools are invented, they can simply be added to the system. If a new broadcast format comes along, all you need is a decoder/tuner for it.

Need more storage? Just add more standard disk to your fileservers. Need more TVs? Just get a decoder. For sound, use digital speakers. Just plug them in and use them.

Plus you can easily bring in other devices. Want to gather up your pending TV shows and put them on your laptop for a long trip? This is easy to do, just transfer them over the network from your fileserver, perhaps along with your music. Watch them and sync up when you get home to get more video or delete the old, watched stuff.
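Syncing the laptop is mostly set arithmetic over show identifiers. A minimal sketch, assuming the fileserver and laptop each report which shows they hold and the laptop reports which ones were watched on the road; the function name and arguments are hypothetical.

```python
def plan_sync(server_pending, laptop_files, watched_on_laptop):
    """Decide what to copy to and delete from a laptop when it syncs.

    server_pending: shows the fileserver still holds as unwatched.
    laptop_files: shows currently on the laptop.
    watched_on_laptop: shows the user finished while travelling.
    Returns (to_copy, to_delete) as sets.
    """
    to_delete = laptop_files & watched_on_laptop
    to_copy = server_pending - laptop_files - watched_on_laptop
    return to_copy, to_delete
```

The server would also mark `watched_on_laptop` as watched, so those shows stop counting as pending.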

The tuner and controller box might come from Tivo. The decoder on one TV might come from Replay. All sorts of companies might provide all sorts of components. And they will largely work together.

But most importantly, it's easy for new companies to add new components and features that nobody has yet thought of. New ways to play or process audio or video, and new devices, all can and will be added in an exciting competitive environment.

This is what happens when you have open systems in which many vendors compete on features and price to win the consumer's dollar. It doesn't happen when rules govern what innovations can happen, or require advance permission from industry cartels or the government.

It's illegal

The problem with this great scheme? It will soon be illegal or impossible. New FCC rules mandating respect for a special bit called the Broadcast Flag require that any video system made after mid-2005 that tunes digital broadcast TV can't operate like this. It can't make the digital video stream available in standard open formats like MPEG, stored on an open fileserver such as a PC. The video has to be protected, and only offered in digital form to devices that are certified to protect it.

This mandate doesn't govern the digital video coming to your satellite or digital cable box, because those boxes already won't give you their digital video stream. You can get analog video, but re-encoding it is very expensive and usually involves a loss of quality.

Since their introduction, DVDs have been encrypted, and players have not been permitted to give you that digital video stream in an open format. 99% of DVDs can only be played on DVD players made under the restrictions of a special contract overseen by the studios.

Trying to extract the digital video for open use is also illegal, if you do anything to bypass even the most trivial of protections on it.

Can't the locked systems do all this?

You could build the above system with locked technology. You could encrypt all video it stored, and only allow playback on approved players that are licensed with the decryption keys. However, there are a lot of problems with doing it that way.

First of all, it means anybody making something that plays back the video, or works with it other than as a bucket of bits, has to get a licence for the keys. And the people with the keys put restrictions on those who want a licence. They use the encryption for a reason, after all. The most basic restriction is that they want to stop many types of copying. This turns out to be very hard.

For example, they want to stop copying to other people, or out over the internet. This means they must be able to tell the difference between your device and somebody else's, which goes against a lot of important internet principles.

They sometimes want to limit the copying, or forbid it altogether, even if it would be legal. This is obviously not in consumer interest. But they have tried to do more. For example, most DVDs have sections (such as the FBI warning at the start) where the fast-forward key is disabled. Your own player refuses to obey you when the studio decides it should not.

In general, the designers of systems are not prescient. Locked systems, however, tend to work under an "Everything is forbidden unless it is permitted" style. If they didn't predict a style of operating -- because it hadn't even been invented yet -- chances are it won't be possible without modification of the rules, if at all. That means to invent something you have to petition for a modification of the rules. It's possible to get one, but it's amazing how much of a damper having to ask for any permission puts on innovation. Even having to wonder if you need to ask for permission is innovation-dampening.

The BF is designed to accommodate today's Tivo (with analog) solely because it already exists, and they could understand it. I have a strong intuition that a BF designed 10 years ago would have made it much harder to build the Tivo at all.

It's very hard for a locked system to respond to new ideas. That's because unanticipated uses are hard to distinguish from security vulnerabilities. You can fix the system after the fact, but only with permission, and only when worrying about security.

A crazy example of this is a new system Tivo has announced to let you copy the TV shows on your box to your laptop to take with you on the road. For unfathomable reasons, the shows are encrypted, and a dongle (a plug-in device you stick into the laptop to make it authorized) is needed to play them. If the videos had been ordinary MPEG-2, you could simply have copied them to your laptop and played them without waiting for this complex device.

The other downside

Sadly, another downside of my design is that it's ideal for implementing hard DRM. DRM vendors, seeking to protect digital content, have always faced the problem that one can just plug a recorder into the headphone jack and get a very good -- in some cases identical -- digital copy. This makes all the DRM useless.

They would love the all-digital system above, because it could be extended to have no headphone jack, and to not decrypt the signals until the speaker magnet or monitor pixel. Then analog extraction would become much more difficult, and impossible for the average user.

So while I want this system, I am wary of it being suborned, so that you can't make a speaker or headphone without permission.

Other useful features

If I had this system, I would also want to be able to put microphone and line-input ports on the digital (or wireless digital) bus. One great thing to do would be to place a microphone near where I sit. The system could then have the speakers sound out the entire frequency band, and listen to it on the microphone, getting a complete map of the acoustic performance of the speakers, in their exact position, aiming at my exact location, with all echo and frequency characteristics. They could then work from that map to make even cheap speakers have very high quality response, more than justifying any extra cost the digital components add. (Not that this is much, since in mass quantity these digital components are much cheaper than the fancy speaker wires they replace.)
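Once the microphone has measured each band, the simplest correction is a per-band gain that pulls the measured level toward a flat target. This sketch assumes measurements already reduced to dB per band, and the boost and cut limits are hypothetical; correcting echoes would need time-domain (impulse response) processing that this magnitude-only sketch doesn't attempt.

```python
def correction_gains(measured_db, target_db=0.0, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band EQ gains that flatten a measured response at the listening spot.

    measured_db maps band centre frequency (Hz) to the level the microphone
    heard (dB, relative to the test signal). Boost is limited so a cheap
    speaker is not driven into distortion trying to fill a deep null.
    """
    gains = {}
    for freq, level in measured_db.items():
        gain = target_db - level
        gains[freq] = max(-max_cut_db, min(max_boost_db, gain))
    return gains
```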

Protocol Notes

The protocol for this system should include the following features:

Resource Announcement: When a device is connected to a new network, it should either receive instructions on what to do, or by default announce its presence and parameters. "I am a left speaker in room: Kitchen." All for easy plug and play.
Resource discovery: It should be possible to query what devices are out there and what they do. When talking to them, one should learn what sub-protocols they speak, including what audio and video compression codecs they support.
Sub-protocols: For streaming of video, audio and control signals, and for description of complex resources like frequency response.
Minimum support: All devices should support basic uncompressed audio, FLAC and Vorbis, plus uncompressed video and MPEG-1 and MPEG-2.
Proxy Functions: Any device may allow a proxy to act for it for control purposes (audio and video might still flow directly to the device as commanded). Thus a "virtual speaker" might exist which does processing before sending commands to the real speakers.
Master controller: In some easy fashion, devices should surrender to a master controller authorized to configure other things about them. Among other things it may limit what other devices may talk to them. By default devices will probably resist control from devices on another IP network, but a controller may tell them to accept such control.
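Resource announcement and codec negotiation could look something like the sketch below. The notes above don't define a message format, so the JSON fields (`type`, `id`, `kind`, `room`, `function`, `codecs`) and codec names are hypothetical, and transport (multicast or otherwise) is left out.

```python
import json


def announcement(device_id, kind, room, function, codecs):
    """Hypothetical announce message: "I am a left speaker in room: Kitchen." """
    return json.dumps({"type": "announce", "id": device_id, "kind": kind,
                       "room": room, "function": function, "codecs": codecs})


def choose_codec(sender_prefs, receiver_announcement):
    """Pick the sender's most-preferred codec that the receiver also speaks."""
    offered = set(json.loads(receiver_announcement)["codecs"])
    for codec in sender_prefs:
        if codec in offered:
            return codec
    return None    # no common codec
```

Because the minimum-support rule guarantees uncompressed audio, a sender that lists uncompressed PCM last in its preferences should always find a match with a compliant speaker.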