Friday, June 20, 2008

Downloading Audio

In most cases, audio files are served just like HTML files: by a web server. Anything that is served by an HTTP server can be saved to your hard drive. You usually talk to the web server through your browser, but remember: anything your browser does, you can do independently.

The simple way

The most straightforward way of serving audio in an HTML page is to link directly to the audio file. MP3 files are usually put up this way. When you hover your mouse over the link, if the URL ends in .mp3 or .rm, it's a direct link. When you click on a link like this, the browser usually offers to save the file to disk, unless a plugin for that file type is installed in your browser. In that case, just right-click the link and select Save Target As. Some smart-ass HTML authors use the lame trick of disabling the right click: they use JavaScript to pop up a dialog box whenever you right-click. Don't worry. You already have the HTML. Go to View -> Source, locate the link to the audio file, copy it and paste it into the Location: bar. If you are on a framed page, make sure you select Frame Source. The braindead Internet Explorer sometimes grays out the View Source menu item. In that case, use Save As to save the whole page to disk and extract the URL from it. If even that is grayed out, use wget to download the page.
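If the source is long, hunting for the link by hand gets tedious. Here is a minimal Python sketch that does the View Source search for you, assuming you have saved the page to disk: it collects href and src attributes that end in one of the extensions this post mentions. The extension list and the names AudioLinkFinder and find_audio_links are my own illustrative choices, and it won't see links that JavaScript builds at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Extensions mentioned in this post: direct audio links and metafiles.
AUDIO_EXTENSIONS = (".mp3", ".rm", ".ram", ".m3u", ".pls")


class AudioLinkFinder(HTMLParser):
    """Collect href/src attribute values that point at audio files."""

    def __init__(self, base_url=None):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Drop any query string or fragment before checking the extension.
            path = value.split("?", 1)[0].split("#", 1)[0].lower()
            if path.endswith(AUDIO_EXTENSIONS):
                # Resolve relative links against the page URL, if we know it.
                self.links.append(urljoin(self.base_url, value) if self.base_url else value)


def find_audio_links(html_text, base_url=None):
    """Return audio links found in a saved HTML page, in document order."""
    finder = AudioLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.links


if __name__ == "__main__":
    page = '<a href="songs/track1.mp3">Track 1</a> <a href="/about.html">About</a>'
    print(find_audio_links(page, "http://example.com/music/index.html"))
    # prints ['http://example.com/music/songs/track1.mp3']
```

Pass the page's own URL as base_url so relative links come back as absolute URLs you can paste into the Location: bar or hand to wget.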

Meta files

This is the most common way of serving RealMedia files. If a link ends in .ram, it points to a RealAudio metafile: a text file listing the actual audio URLs. You can save this file to your disk and open it in your favorite browser to see the contents. What you want to do now is download the URLs listed in that file, and fetching them directly (with wget, for example) is the best method to download any given URL. But there is a longer route if you prefer: create a new HTML page with links to the actual URLs you want. Wrap each URL in <a href="..."> tags and use your mouse to save those audio files away. Oh, MP3 files are also served in metafiles. These are URLs ending in .m3u or .pls.
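Both halves of that can be scripted. The Python sketch below pulls the URLs out of a saved metafile and builds the links page for you. It assumes .ram and .m3u files list one URL per line with lines starting with # treated as comments, and that .pls files use FileN=URL entries; real-world metafiles can carry extra directives it does not understand. parse_metafile, urls_from_file and links_page are illustrative names, not part of any tool.

```python
import html
from pathlib import Path


def parse_metafile(text, kind):
    """Return the media URLs listed in a metafile.

    kind is "ram", "m3u" or "pls". Assumed formats: .ram and .m3u list one
    URL per line (lines starting with '#' are comments, e.g. #EXTM3U);
    .pls uses INI-style "FileN=URL" entries.
    """
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if kind == "pls":
            key, sep, value = line.partition("=")
            # Only File1=, File2=, ... entries hold URLs; skip Title, Length, etc.
            if sep and key.strip().lower().startswith("file"):
                urls.append(value.strip())
        elif not line.startswith("#"):
            urls.append(line)
    return urls


def urls_from_file(path):
    """Read a saved metafile and pick the format from its extension."""
    p = Path(path)
    kind = p.suffix.lstrip(".").lower()
    return parse_metafile(p.read_text(errors="replace"), kind)


def links_page(urls):
    """Build a bare HTML page with one clickable link per URL."""
    items = "\n".join(
        f'<p><a href="{html.escape(u)}">{html.escape(u)}</a></p>' for u in urls
    )
    return f"<html><body>\n{items}\n</body></html>\n"
```

For example, Path("links.html").write_text(links_page(urls_from_file("song1.ram"))) gives you a page you can open in the browser and right-click through.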

Things are not always this simple. Getting to the .ram link may itself be quite involved, because web page authors use all kinds of tricks to hide it from your eyes. But rest assured: if the browser can see it, so can you. First, try to save the page (either from the browser or with wget) and look at its contents. Sometimes URLs are generated by JavaScript, so you may have to read through the code and work out the final URL yourself. If this is too complicated, you have the catch-all trick: use a sniffer.
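When the URL is assembled in JavaScript, the pieces are usually sitting in the script as plain string literals. This rough Python sketch lists quoted strings that start with http://, pnm:// or rtsp://, or that name a .ram, .rm, .mp3, .m3u or .pls file, so you know which fragments to stitch together. It does not run the JavaScript, so you still have to follow the concatenation yourself; the regular expression is a heuristic I made up for illustration and will miss URLs built in cleverer ways.

```python
import re

# Quoted string literals that look like (part of) a media URL: either they
# start with one of the schemes this post mentions, or they name an
# audio file or metafile. Heuristic only: assumes plain string literals.
CANDIDATE = re.compile(
    r"""(["'])("""
    r"""(?:https?|pnm|rtsp)://[^"'\s]*"""                    # absolute URL or prefix
    r"""|[^"'\s]*\.(?:ram|rm|mp3|m3u|pls)(?:\?[^"'\s]*)?"""  # file name, optional query
    r""")\1""",
    re.IGNORECASE,
)


def url_fragments(source):
    """Return the candidate string literals in the order they appear."""
    return [m.group(2) for m in CANDIDATE.finditer(source)]
```

Run it over the saved page (scripts included) and you get a short list of literals to piece together by reading the surrounding code.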

Using a sniffer

Using a packet sniffer, you can look at all the packets going to and from your machine. In this case, we are interested in HTTP packets. Ethereal is what I use. Fire it up and look for HTTP requests. They carry the URL that ultimately goes to the web server: the request line holds the path, and the Host: header holds the server name. Put the two together, copy the result and use wget.
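If you copy a whole request out of the sniffer, this minimal Python sketch glues the path from the request line onto the server name from the Host: header. It assumes plain HTTP (not HTTPS) and the usual text layout of a request line, then headers, then a blank line. url_from_request is a hypothetical helper name.

```python
def url_from_request(raw_request, scheme="http"):
    """Rebuild the full URL from the text of a captured HTTP request.

    The request line carries the path (e.g. "GET /audio/song1.rm HTTP/1.1");
    the server name comes from the Host: header.
    """
    lines = raw_request.replace("\r\n", "\n").split("\n")
    _method, target, _version = lines[0].split(" ", 2)
    if target.startswith(("http://", "https://")):
        return target  # requests sent through a proxy already carry the full URL
    host = None
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()  # may include a port, e.g. "host:8080"
    if host is None:
        raise ValueError("no Host header found in the request")
    return f"{scheme}://{host}{target}"
```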

I recently learned of another sniffer called URLSnooper. This tool captures just the URLs going over the wire, so you don't have to wade through the HTML files to get at them. In conjunction with Streambox VCR, this can be a deadly tool. Look at the end of this page for the link.

Some web servers are a little smarter than usual. They let you play an .rm file only through RealPlayer, and they do this by checking the User-Agent field in the HTTP request header, where the browser (or player) identifies itself. To get around this, just tell wget to mimic RealPlayer:

wget --user-agent='RMA/1.0 (compatible; RealMedia)' http://some.server.net/song1.rm
You can use the sniffer to see what headers the browser or the player is using.
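The same idea works outside wget. Here is a minimal Python sketch using the standard urllib.request module to send the RealPlayer User-Agent string; make_request and save are illustrative names, and whether a given server accepts the request depends entirely on how that server does its checking.

```python
import shutil
import urllib.request

# The RealPlayer User-Agent string used in the wget command.
REALPLAYER_UA = "RMA/1.0 (compatible; RealMedia)"


def make_request(url, user_agent=REALPLAYER_UA):
    """Build a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def save(url, filename, user_agent=REALPLAYER_UA):
    """Download url to filename, sending the chosen User-Agent header."""
    request = make_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
```

Use the sniffer to copy the exact User-Agent your own player sends and pass it as user_agent if the default doesn't work.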

Using wget to download files

wget is a command-line browser, so to speak. You give it a URL and it saves the file to your disk, which is of immense help in our context. You can save the .ram file from the browser and use wget to fetch the URLs contained in it. You can even hand the saved file to wget's -i option, which reads a list of URLs from a file and downloads each one:
wget -i song1.ram
wget is your friend. It's available for almost every platform.
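To show roughly what wget -i is doing, here is a Python equivalent built on the standard library. It reads one URL per line, skips blank lines and lines starting with # (my assumption about the list format), saves each file under the last part of its path, and sets aside pnm:// and rtsp:// links, since a plain HTTP client can't fetch them. Unlike wget, it simply overwrites a file that already exists. fetch_list and local_name are hypothetical names.

```python
import os
import urllib.parse
import urllib.request


def local_name(url):
    """Use the last path component of the URL as the file name."""
    name = os.path.basename(urllib.parse.urlparse(url).path)
    return name or "index.html"


def fetch_list(list_path, dest_dir=".", schemes=("http", "https", "ftp")):
    """Fetch every URL listed one per line in list_path, like `wget -i`.

    Returns (saved_paths, skipped_urls). URLs whose scheme is not in
    `schemes` (for example pnm:// or rtsp://) are skipped, not fetched.
    """
    saved, skipped = [], []
    with open(list_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            url = line.strip()
            if not url or url.startswith("#"):
                continue
            if urllib.parse.urlparse(url).scheme.lower() not in schemes:
                skipped.append(url)
                continue
            target = os.path.join(dest_dir, local_name(url))
            urllib.request.urlretrieve(url, target)
            saved.append(target)
    return saved, skipped
```

Calling fetch_list("song1.ram") downloads the http links into the current directory and hands back any pnm:// or rtsp:// links it couldn't handle.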

pnm and rtsp

All the above tricks assume the audio files are served through http:// links. Quite a few sites use a RealAudio server to serve audio instead, and as you might expect, only RealPlayer can talk to this server. The audio files are still linked through .ram files, but here the .ram file contains links like:
pnm://some.server.net/song1.rm
or
rtsp://some.server.net/song2.rm
These protocols are proprietary. In the case of rtsp://, even though RTSP itself is an open protocol, the data carried in the RTSP packets is specific to the RealAudio server and the player. We all know that proprietary protocols are bad. Some good people on the net hacked this protocol and built a program that can download these URLs to disk as well. Look for Streambox VCR Underground on the net. This software is illegal; you have been warned. Search the net and you'll find all the legal history.

That pretty much covers all the ways (I think) of downloading stuff. Let me know if you find a page that beats all this; I'll be very interested.

Realaudio to mp3 conversion!

Is it possible to convert RealAudio to MP3 or some other open format? This is probably a question almost everybody who has dealt with Real has faced. Finally, I found an answer: there is a tool based on Streambox VCR, called Streambox Ripper, that lets you do this. See below for the links.

There is a more legal way of doing the conversion if you are running Linux. vSound is a tool that lets you save the output of an audio player to a WAV file. I haven't tried it, so let me know if you find it useful.


PDF lo panduga chesuko....Sai