Vitavonni

Wed, 23 Aug 2006

Stepping beyond tag clouds

Tag clouds are a current must-have for web 2.0 applications.

Examples can be found for bookmarks, blog entries, photos, music or books.

Tag clouds are hip because they're a dynamic feature and show the "Zeitgeist". They give an overview of the users ("California" in flickr) or of current hot topics ("Israel", "Lebanon" in technorati).

However, tag clouds also have severe limitations.

First of all, they're arbitrarily ordered: usually alphabetically, so there is no content relationship among neighboring entries.

Secondly, they only show an excerpt, since there are usually many more tags than fit on the screen.

Thirdly, each tag is an atomic piece of information, whereas relations, as used e.g. in RDF, can convey much more complex information.

I'm trying to push tag clouds to the next level. They're a gimmick right now, but maybe we can turn them into a powerful navigation tool?

Together with Enrico Zini I've just created my first tag cloud (I've skipped making a tag cloud for my blog...).

Well, it quickly evolved beyond a tag cloud. You could maybe call it a tag sky. Or tag forest.

I'm not using my blog or anything like that for the tag cloud. That would be quite boring, since I'm not doing real tagging on it. Instead I'm using software tags. The Debtags project, led by Enrico and me, has been working on software tags in our spare time for some years now. We have about 600 tags in a dozen facets, and 15000 software packages (I don't have the number at hand of how many of those are already tagged to some degree). The tagging effort is still far from complete; that's why we're currently also working on an AI to assist with tagging.

We generated two different renderings of the tag clouds for you: one separate cloud per facet, and all folded into one big cloud. Oh, and do click on one of the tags: it will take you to a more complex tag-based navigation tool and a tagger.

So what makes these different from the usual tag clouds you see everywhere (apart from the sheer size, sorry about that; maybe we'll add buttons to hide/show tags with low occurrence counts next)?

Well, the tags next to each other aren't completely unrelated any more, since they are (in both renderings) grouped by their facet. This makes it easier to locate something - go through the red facets first, then look at the tags in the group.
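A minimal sketch of this kind of facet grouping, in Python. It is not the code behind our renderings; the function name, the font size range, the logarithmic scaling and the package counts in the usage note are all assumptions made for illustration. Tag names are assumed to have the form "facet::tag".

```python
import math
from collections import defaultdict


def facet_cloud(tag_counts, min_size=10, max_size=32):
    """Group 'facet::tag' names by facet and assign each tag a font size.

    tag_counts maps a tag name to the number of packages carrying it.
    Sizes are scaled logarithmically between min_size and max_size
    (an assumed choice, other scalings work just as well).
    """
    logs = {tag: math.log(n) for tag, n in tag_counts.items() if n > 0}
    if not logs:
        return {}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all counts are equal

    cloud = defaultdict(list)
    for tag in sorted(logs):
        facet, sep, name = tag.partition("::")
        if not sep:  # tag without a facet prefix goes into an unnamed group
            facet, name = "", tag
        size = min_size + (max_size - min_size) * (logs[tag] - lo) / span
        cloud[facet].append((name, round(size)))
    return dict(cloud)
```

Calling it with made-up counts such as {"role::program": 1000, "role::documentation": 10, "interface::x11": 100} yields one alphabetically sorted list per facet, so related tags end up next to each other and each group can be rendered (and colored) separately.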

I'm thinking about a second step, which would involve dynamically expanding or hiding details in the tag cloud, finally transforming the tag cloud into a true navigation utility beyond a "single click filter".

For my final diploma thesis, one of the topics suggested by my professor is doing "tag clouds" (i.e. weighted lists) for relations. The prototype will likely be integrated with the IkeWiki semantic wiki. I don't have a clear vision yet of how the "relation cloud" will work or look, but I haven't started my thesis yet anyway. I currently imagine up to three clouds (corresponding to the empty places in the relation) that will dynamically adapt to the choices already made by the user. Some zooming will probably be needed, too.
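To make the "up to three clouds" idea a bit more concrete, here is a rough Python sketch. It is purely hypothetical: it assumes relations are stored as (subject, predicate, object) triples, and the function name and data layout are made up; nothing like this exists in IkeWiki as far as this post is concerned.

```python
from collections import Counter


def relation_clouds(triples, subject=None, predicate=None, obj=None):
    """Return one weighted list (Counter) per slot the user has not filled.

    triples is an iterable of (subject, predicate, object) tuples.
    Only triples matching the choices made so far are counted, so the
    remaining clouds adapt to every selection.
    """
    chosen = (subject, predicate, obj)
    names = ("subject", "predicate", "object")
    # One cloud for each place in the relation that is still empty
    clouds = {n: Counter() for n, c in zip(names, chosen) if c is None}
    for triple in triples:
        if all(c is None or c == v for c, v in zip(chosen, triple)):
            for name, value in zip(names, triple):
                if name in clouds:
                    clouds[name][value] += 1
    return clouds
```

With nothing selected this gives three clouds; once the user picks, say, a predicate, only the subject and object clouds remain, weighted by how often each value occurs together with that predicate.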

Another use of tag clouds would be a visualization of the AI: the weights could be chosen by how sure the AI is about each tag; the cloud would then describe the AI's rating of a software package description.

If you have some ideas, good links, relevant papers or other feedback, just send me an email to erich@debian.org. Thank you.

[category: /en/xml | Permalink]

Archiving maildir

I've been using the maildir format for my mailboxes for some years now. I'm really happy with this solution: no locking issues, and decent support in all applications I use.

(Well, Evolution has been crashing with Maildir recently, but switching to a local IMAP server resolved this just fine.)

But the biggest benefit of maildir is that it's dead easy to write your own tools for it. I've written a small Perl script that moves read mails that are not flagged into my archive after 30 days. Over the last few years, this archive (not containing large mailing lists such as debian-devel, which are publicly archived) has grown a lot.
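For illustration, roughly the same logic as a Python sketch using the standard mailbox module. This is not my Perl script; the function name and the paths passed in are assumptions, and the age is taken from the date mailbox reports for each message (the file's modification time), which may differ from what a hand-written script would check.

```python
import mailbox
import time

MAX_AGE_DAYS = 30  # from the text above; everything else is an assumption


def archive_old_mail(inbox_path, archive_path, max_age_days=MAX_AGE_DAYS, now=None):
    """Move read, unflagged messages older than max_age_days to the archive.

    Returns the number of messages moved.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    inbox = mailbox.Maildir(inbox_path, factory=None)
    archive = mailbox.Maildir(archive_path, factory=None, create=True)
    moved = 0
    for key in list(inbox.keys()):
        msg = inbox[key]
        flags = msg.get_flags()
        # Maildir flags: 'S' = seen (read), 'F' = flagged
        if "S" in flags and "F" not in flags and msg.get_date() < cutoff:
            archive.add(msg)   # keeps flags and date of the MaildirMessage
            inbox.remove(key)  # deletes the original file
            moved += 1
    inbox.close()
    archive.close()
    return moved
```

Since every mail is its own file, there is no locking and no rewriting of a big mailbox file involved; that is what makes such tools so easy to write.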

Now the main benefit of maildir, a separate file for each mail, turns into a drawback. For my archive, I don't need to care about locking or random access. Actually, I rarely access it at all. But since a file always occupies whole blocks, my mail archive occupies 1.6 GB on my disk, with just 1.2 GB of data. From experience, bz2 reduces the size of mail by around 60%. So by switching from Maildir to mbox.gz I can probably free up 800 MB on my disk. (And I'll move the older years onto my encrypted backup HD anyway.)
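The gap between data size and disk usage (about 0.4 GB going by the numbers above) can be measured directly. Here is a small Python sketch; the function name is made up, and it assumes a POSIX system where st_blocks counts 512-byte units.

```python
import os


def maildir_usage(path):
    """Return (data_bytes, disk_bytes, file_count) for all files under path."""
    data = disk = count = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            data += st.st_size
            # st_blocks is in 512-byte units on POSIX systems; where the
            # field is missing (e.g. Windows), fall back to the plain size
            blocks = getattr(st, "st_blocks", None)
            disk += st.st_size if blocks is None else blocks * 512
            count += 1
    return data, disk, count
```

The difference between the second and the first value is the space lost to partially filled blocks, which is exactly what disappears when many small files are merged into one mbox.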

Now I'm looking into scripting the conversion from Maildir to mbox.gz. I'm still a bit undecided on which tools to use...

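[Update: Python 2.5 has support for mbox and maildir... Here's maildir2mbox in a few lines of Python:

import sys, mailbox
# Open the source Maildir; factory=None yields mailbox.MaildirMessage objects
md = mailbox.Maildir(sys.argv[1], None)
mb = mailbox.mbox(sys.argv[2])
for mail in md:
    # Convert each message so the Maildir flags carry over to the mbox format
    mb.add(mailbox.mboxMessage(mail))
# Flush pending writes to disk and close the mbox
mb.close()

nice, uh? ;-) ]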
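To get from there to the mbox.gz mentioned above, the resulting mbox still has to be compressed. A possible sketch using only the standard library follows; the function name is hypothetical, and removing the uncompressed mbox afterwards is an assumption about what one would want for an archive.

```python
import gzip
import mailbox
import os
import shutil


def maildir_to_mbox_gz(maildir_path, mbox_path):
    """Convert a Maildir to an mbox file, then gzip it to mbox_path + '.gz'."""
    md = mailbox.Maildir(maildir_path, factory=None, create=False)
    mb = mailbox.mbox(mbox_path)
    try:
        for mail in md:
            mb.add(mailbox.mboxMessage(mail))
    finally:
        mb.close()  # flushes pending writes
        md.close()
    gz_path = mbox_path + ".gz"
    # Stream the mbox through gzip so large archives need little memory
    with open(mbox_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(mbox_path)  # keep only the compressed copy
    return gz_path
```

The source Maildir is left untouched, so it can be deleted by hand once the compressed archive has been checked.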

[category: /en/linux | Permalink]