Sunday, November 1, 2009

A little more monad, and a semi-useful program

Okay, first, addressing a couple points from the comments on my previous post.

StdGen is not a monad (something I probably should have made a bit clearer). However, the way StdGen works is similar to the way monads work - they share the idea of replacing a change in the state of something with returning a value representing the new state.
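To make that idea a bit more concrete, here's a toy version of it. This is only a sketch of the general pattern, not how StdGen is actually implemented; ToyGen, nextInt, and twoInts are names I made up for the illustration, and the arithmetic is just a simple stand-in for a real generator.

```haskell
-- A toy generator (not the real StdGen): the "state" is just an Int.
newtype ToyGen = ToyGen Int

-- Instead of changing the generator in place, return a number
-- together with a new generator to use next time.
nextInt :: ToyGen -> (Int, ToyGen)
nextInt (ToyGen s) = (s', ToyGen s')
  where
    s' = (s * 1103515245 + 12345) `mod` 2147483648

-- Thread the generator through two calls by hand.
twoInts :: ToyGen -> (Int, Int, ToyGen)
twoInts g0 = (a, b, g2)
  where
    (a, g1) = nextInt g0
    (b, g2) = nextInt g1
```

Calling nextInt twice with the same ToyGen always gives the same number; to get a different one, you have to pass along the generator that came back from the previous call.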

I admit, the input-reversing program isn't done in the "right" way, but I've got to mention return somewhere.

A Semi-Useful Program!
It's nice to know that you can write a program that reverses its input, but it's not especially useful. So it's time to write something that is sort of useful. What will this program do? It's going to extract some information from an MP3's ID3 tags.

A quick overview: MP3 files commonly include information on the song they contain. This information is in the file in the form of an ID3 tag. It's not encoded in any special way; we can see it fairly easily by opening the file in a hex editor. Here's an example:
[Screenshot: the beginning of the MP3 file, viewed in a hex editor]
This particular file contains the song "I Feel Fantastic," by Jonathan Coulton. (It's a good song.) You can see an album name ("Our Bodies, Ourselves, Our Cybernetic Arms"), the artist's name, the composer's name (the previous two are the same in this case), the track title, the year of release, and the track's position on the album. You can also see the beginning of an image file inside the MP3. ID3 tags can also include information like genre, track length, or beats per minute. When you open an MP3 in iTunes, Winamp, or whatever program you use for that sort of thing, the ID3 tags are where the program gets its information about the song.

The length of the ID3 tag is stored in its header. However, it's in there in a rather bizarre way (28 bits: 4 bytes, disregarding the high bit of each, which is always zero), so we're going to cheat a bit. First, we're only going to look at three tags: track title, artist name, and album title. Second, instead of using the length information that's in the file itself, we're going to take advantage of the fact that each tag containing text is terminated with a zero byte (0x00).
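For the curious, that odd 28-bit encoding can be decoded in a few lines. This is only a sketch, and the program in this post doesn't use it; decodeSyncSafe is a name I made up, and it assumes you've already pulled the four size bytes out of the header (they are the last four bytes of the ten-byte header, after "ID3", two version bytes, and a flags byte).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (ord)

-- Hypothetical helper (not part of the program in this post).
-- Combines the size bytes, most significant first, keeping only
-- the low 7 bits of each byte.
decodeSyncSafe :: [Char] -> Int
decodeSyncSafe = foldl step 0
  where
    step acc c = (acc `shiftL` 7) .|. (ord c .&. 0x7F)
```

For example, decodeSyncSafe "\0\0\2\1" is 257 (2 * 128 + 1), not the 513 you'd get by treating the bytes as ordinary base-256 digits.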

In the screenshot, you'll notice, among others, three specific sequences of 3 characters: TT2, TAL, and TP1. These indicate the track title, album title, and artist name, respectively.

First, we need to read the file in somehow. For this, we'll use the functions getArgs, openFile, and hGetContents. getArgs lives in the System.Environment module. Its type is IO [String] - an IO monad carrying a list of Strings. These strings are any arguments supplied to the program from the command line.
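Here's a small sketch of picking the first argument out of that list. The names firstArg and getFirstArg are made up for this example; the program below takes a shortcut and pattern matches on the list directly, which fails if no argument is given.

```haskell
import System.Environment (getArgs)

-- Hypothetical helper: the first argument, if there is one.
firstArg :: [String] -> Maybe String
firstArg (x:_) = Just x
firstArg []    = Nothing

-- Read the real command-line arguments and pick the first.
getFirstArg :: IO (Maybe String)
getFirstArg = fmap firstArg getArgs
```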

openFile is from the module System.IO. The type looks like this:
> import System.IO
> :t openFile
openFile :: FilePath -> IOMode -> IO Handle

FilePath is the same thing as String. The IOMode parameter tells the function whether we're going to be reading, writing, or both. A Handle is sort of like a Haskell representation of the file (the details aren't especially important here).

Finally, we'll look at hGetContents, also from System.IO.
> import System.IO
> :t hGetContents
hGetContents :: Handle -> IO String

hGetContents takes a Handle to a file, and returns an IO monad carrying the entire contents of that file in a String. Normally, this might be a bit inefficient. However, because Haskell is lazy, none of the contents of the file will actually be retrieved until we need them.
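As a rough illustration of that laziness, here's a sketch that looks at only the first few characters of a file. readPrefix is a made-up name, and using openBinaryFile instead of openFile is my own assumption here (it avoids decoding the bytes as text); the program in this post sticks with openFile.

```haskell
import System.IO

-- Hypothetical helper: read just the first n characters of a file.
readPrefix :: Int -> FilePath -> IO String
readPrefix n path = do
  h <- openBinaryFile path ReadMode
  contents <- hGetContents h
  let prefix = take n contents
  -- Force the prefix before closing the handle; otherwise laziness
  -- would leave it unread, and it would come back empty.
  length prefix `seq` hClose h
  return prefix
```

Only the part of the file needed to produce the prefix has to be read, however large the file is.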

To understand the rest of the program, you'll also need to know a few more things.

tails: This function takes a list, and returns a list of all suffixes of that list, ending with the empty list. For example, tails [1,2,3,4] is [[1,2,3,4],[2,3,4],[3,4],[4],[]]. In a lot of languages, running this on a very large list (such as the contents of a file) might be very inefficient. However, again, because Haskell is lazy, nothing in the list tails gives you is evaluated until it's needed.
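One way to see the laziness at work is to call tails on an infinite list, which would never finish if everything were computed up front. A small sketch (firstSuffixes is just a name for the example):

```haskell
import Data.List (tails)

-- Only the first three suffixes are built, and each one is cut
-- down to its first five elements.
firstSuffixes :: [[Int]]
firstSuffixes = map (take 5) (take 3 (tails [1 ..]))
```

This evaluates to [[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7]].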

isPrefixOf is fairly self-explanatory; it takes two lists, and returns True if the first is a prefix of the second, False otherwise.

filter takes two arguments: a function from some type to a Boolean, and a list of things of that type. It returns the elements from the list for which the function evaluates to True.
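Putting tails, isPrefixOf, and filter together gives a simple way to search a string. Here's a sketch; suffixesStartingWith is a made-up name for the illustration.

```haskell
import Data.List (isPrefixOf, tails)

-- Hypothetical helper: every suffix of a string that starts with
-- the given prefix, in order of appearance.
suffixesStartingWith :: String -> String -> [String]
suffixesStartingWith prefix str = filter (isPrefixOf prefix) (tails str)
```

For example, suffixesStartingWith "lo" "hello lol" is ["lo lol","lol"]. Taking the first element of that result gives the string from the first match onward.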

Finally, handle. This function takes two arguments. The first is a function from some exception type to an IO monad. The second is an IO monad of the same type. When handle is called, it tries to evaluate the second argument. If an exception occurs, it passes the exception to the function and returns what it gets back. Otherwise, it just returns that second argument. (That's probably not very clear. It's essentially "Try to do something (the second parameter). If you fail for some reason, do this first parameter instead.")
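A small sketch may help here. safeFirst is a name I made up; it uses evaluate (also from Control.Exception) to make sure the risky expression is actually computed inside handle, and it catches SomeException, which is very broad but keeps the example short.

```haskell
import Control.Exception (SomeException, evaluate, handle)

-- Hypothetical helper: the first element of a list, or Nothing if
-- taking it throws an exception (as it does for an empty list).
safeFirst :: [Int] -> IO (Maybe Int)
safeFirst xs = handle onError (fmap Just (evaluate (xs !! 0)))
  where
    -- The type annotation tells handle which exceptions to catch.
    onError :: SomeException -> IO (Maybe Int)
    onError _ = return Nothing
```

Running safeFirst [4,5] produces Just 4, while safeFirst [] produces Nothing instead of crashing the program.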

Finally finally, two dashes are used to indicate comments.

So, here's the code:
import Control.Exception
import Data.List
import System.Environment
import System.IO

--Relevant frame name constants
albumFrame = "TAL"
titleFrame = "TT2"
artistFrame = "TP1"

extractArtist :: [Char] -> [Char]
extractArtist file = readUntilZero (drop 7 (file `cutBefore` artistFrame))

extractTitle :: [Char] -> [Char]
extractTitle file = readUntilZero (drop 7 (file `cutBefore` titleFrame))

extractAlbum :: [Char] -> [Char]
extractAlbum file = readUntilZero (drop 7 (file `cutBefore` albumFrame))

readUntilZero :: [Char] -> [Char]
readUntilZero [] = []
readUntilZero ('\0':xs) = []
readUntilZero (x:xs) = x:(readUntilZero xs)

cutBefore :: [Char] -> [Char] -> [Char]
cutBefore str substr = (filter (isPrefixOf substr) (tails str)) !! 0

main = do
    (fName:otherArgs) <- getArgs
    inHandle <- openFile fName ReadMode
    fContents <- hGetContents inHandle
    if ("ID3" `isPrefixOf` fContents)
        then do
            handle ((\_ -> putStrLn "Title Not Found")::SomeException -> IO()) (putStrLn ("Title: " ++ (extractTitle fContents)))
            handle ((\_ -> putStrLn "Album Not Found")::SomeException -> IO()) (putStrLn ("Album: " ++ (extractAlbum fContents)))
            handle ((\_ -> putStrLn "Artist Not Found")::SomeException -> IO()) (putStrLn ("Artist: " ++ (extractArtist fContents)))
        else putStrLn("No ID3 tags detected")

NOTE: This program as written works only for ID3 version 2.2. Versions 2.3 and 2.4 use different identifiers for the frames and a slightly different frame header format. To make the program work for versions 2.3 and 2.4 instead, make the following changes:
In line 7, replace "TAL" with "TALB"
In line 8, replace "TT2" with "TIT2"
In line 9, replace "TP1" with "TPE1"
In lines 12, 15, and 18, replace "7" with "11" (a 4-byte identifier, a 4-byte size, 2 flag bytes, and 1 text-encoding byte)
(Disclaimer: I haven't tested this modified version)
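Since the versions differ only in the frame names and in how many characters to skip, one option is a single helper that takes both as parameters. This is a sketch, not the program above: extractFrame is a hypothetical name, it uses takeWhile in place of readUntilZero, and it returns a Maybe instead of relying on an exception when the frame is missing.

```haskell
import Data.List (isPrefixOf, tails)

-- Hypothetical generalisation of the three extract functions.
-- skip is the number of characters between the start of the frame
-- identifier and the start of the text (7 in the program above).
extractFrame :: Int -> String -> String -> Maybe String
extractFrame skip frame file =
  case filter (isPrefixOf frame) (tails file) of
    []      -> Nothing
    (hit:_) -> Just (takeWhile (/= '\0') (drop skip hit))
```

With this, extractFrame 7 "TT2" fContents would play the role of extractTitle fContents.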

After compiling, I can run it, and it looks like this:
noethers-imac:Haskell noether$ ./id3reader "02-I Feel Fantastic copy.mp3"
Title: I Feel Fantastic
Album: Our Bodies, Ourselves, Our Cybernetic Arms
Artist: Jonathan Coulton


Or, I can give it something that doesn't have ID3 tags. Obviously the compiled program itself won't have any, so let's try that:
noethers-imac:Haskell noether$ ./id3reader id3reader
No ID3 tags detected


Finally, I made a file that starts with "ID3" (the program's method for determining if there might be tags) but contains no tags. (I basically just typed "ID3" and then banged on the keyboard like a monkey for a bit.) We can run the program on that, and get this:
noethers-imac:Haskell noether$ ./id3reader fakeTag.txt
Title Not Found
Album Not Found
Artist Not Found
