<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0" 
   xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" 
   xmlns:html="http://www.w3.org/1999/html" 
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
   xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
<channel>
   <title>Blog of Erich Schubert</title>
   <link>http://blog.drinsama.de/erich</link>
   <description></description>
   <language>en</language>
   <copyright>Copyright 2007 by Erich Schubert</copyright>
   <ttl>60</ttl>
   <pubDate>Sat, 06 Dec 2008 12:43 GMT</pubDate>
   <managingEditor>n/a</managingEditor>
   <generator>PyBlosxom http://pyblosxom.sourceforge.net/ 1.3.2 2/13/2006</generator>
<item>
   <title>On Facebook</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2008120601-facebook</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2008120601-facebook.html</link>
   <description><![CDATA[
<p>I've become a Facebook user, mainly because many of my dancing friends and lots of
colleagues from university are there:
Facebook is a convenient and easy way to keep some contact with all those
people that aren't like your core friends, but still friends, and that are
all around the world.</p><p>I've also toyed around with the Facebook API, and actually written two small
applications for it. From a technical point of view, I'm actually quite
impressed by Facebook.</p><p>Yes, Facebook seems to have quite some load trouble these days. Sometimes it's
very slow. Sometimes it just malfunctions (with being kicked out and having
to re-login being just the least). These days, messages were distorted every
now and then, some app messages were always missing the last few words.</p><p>What seriously <em>does</em> impress me is the architecture they use with
third party applications. It's designed around cacheability (so profile pages
aren't slowed down by slow or broken third party applications) and they do
jump through some hoops to allow application writers to do a lot of things while
preventing them from disturbing other applications or the core functionality.</p><p>For all I know, Facebook is the first major thing to do
<b>CSS and JavaScript rewriting</b>. Data produced by third party applications
is fed through a very smart parser and rewriter that allows an impressively
large subset of CSS and JavaScript to be used without the developers having
to pay attention to not producing conflicts. In CSS, rules are prefixed with
a selector to restrict them to their application's scope. In JavaScript, object
references are uniquified, and the convenience functions you have for
interacting and accessing nodes (including functions to do common things such
as modify CSS class assignments) take care of all that. Access to the raw
JavaScript methods is filtered, so you can't e.g. use parentNode to get access
to objects outside of your scope. At least in theory.</p><p>Much of this is for the benefit of users: applications are not allowed to do
annoying animations unless the user has just interacted with them; apps also
can't modify or disturb others, or read data from other applications via DOM.</p><p>Well, of course there might be one or another security issue still there; some
of these things might also be related to the performance issues of Facebook
recently. And of course there are bugs. Lots. A couple of things still need
to be thought through properly (e.g. aggregation of feed messages with
multiple targets, localization functionality for applications, finer grained
control of data access). But their CSS and JavaScript rewriting is really
cool.
</p>
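<p>To illustrate the CSS half of that rewriting, here is a minimal sketch in Python.
It is not Facebook's rewriter (I haven't seen its code); the function name
<tt>scope_css</tt> and the scope selector <tt>#app_12345</tt> are made up, and it only
handles flat <tt>selector { ... }</tt> rules, with no @media blocks or comments.</p>

```python
import re

def scope_css(css, scope):
    """Prefix every selector of a flat stylesheet with a scope selector.

    Deliberately naive: only simple `selector { ... }` rules are handled.
    """
    scoped_rules = []
    for match in re.finditer(r"([^{}]+)\{([^{}]*)\}", css):
        selectors, body = match.group(1), match.group(2)
        # A rule may list several selectors separated by commas;
        # each of them gets the scope prefix.
        prefixed = ", ".join(
            f"{scope} {sel.strip()}" for sel in selectors.split(",") if sel.strip()
        )
        scoped_rules.append(f"{prefixed} {{ {body.strip()} }}")
    return "\n".join(scoped_rules)

# Hypothetical application container id, for illustration only.
scoped = scope_css("h1, .note { color: red; } p { margin: 0 }", "#app_12345")
print(scoped)
```

<p>Since every rule now starts with the application's container, it can no longer
match anything outside of it. A real rewriter would need a proper CSS parser.</p>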
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sat, 06 Dec 2008 12:43 GMT</pubDate>
</item>
<item>
   <title>Google results WTF</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007123101-google-results-wtf</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007123101-google-results-wtf.html</link>
   <description><![CDATA[
<p>It seems like Google broke something in their search.
For example, searching for "<tt>iTunes-library-xml Python</tt>" (I've just
written a parser which will turn Apple PropertyList Pseudo-XML into a useable
Python object consisting of hashes, arrays, integers etc. btw.) actually gives
me results (at least the one around #4) that don't contain "iTunes" (and, I'm
pretty sure, never did).</p><p>OUCH:</p><p>The cached version of that result, when searching for "<tt>iTunes
library xml python</tt>", contains the notice "The word iTunes is only found
on pages linking to this page". Google knows two pages linking there, and neither
contains the word iTunes.</p><p>This is not what I understand as an "exact phrase match".
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 31 Dec 2007 15:36 GMT</pubDate>
</item>
<item>
   <title>Spamtrap followup</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007122801-spamtrap-followup</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007122801-spamtrap-followup.html</link>
   <description><![CDATA[
<p>A few days ago, I asked about
<a href="http://blog.drinsama.de/erich/en/xml/2007122301-proper-way-of-embedding-spamtraps.html">how to properly embed spamtraps in web pages</a>.</p><p>Well, no one could tell me if using <tt>display: none</tt> is appropriate. I
actually do not want Google to index the contents in that div. So as long as
they don't punish me for using <tt>display: none</tt> at all, it's okay. And
the page I placed the spamtrap on is a doorway-like page for others anyway;
it's not part of an important site.</p><p>It took the first spammer around 54 hours to send the first spam. Or try to
send: all 10 retries with different zombies were rejected by my spam filter.
Since then, I've been receiving another round of delivery attempts - around 5-15
per spamtrap address - almost every hour.</p><p>Of the &gt; 500 spam delivery attempts I've seen since, none made it through
my initial spam filters (not to speak of the content filter behind that);
they were all rejected at the SMTP level, even before the mail content was sent.</p><p>I've now disabled some of my spam filters to allow the trap addresses to
actually receive mail. After all, I want to use them to train my filter. :-)
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 28 Dec 2007 02:18 GMT</pubDate>
</item>
<item>
   <title>Visualizing with XHTML and SVG</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007122501-visualizing-with-xhtml-and-svg</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007122501-visualizing-with-xhtml-and-svg.html</link>
   <description><![CDATA[
<p>My thesis is about data mining, clustering of correlated data in high
dimensional vector spaces, to be a bit more precise.</p><p>In detail, I'm working on methods to improve upon existing clustering
algorithms such as <a href="http://www.dbs.informatik.uni-muenchen.de/~zimek/publications/sigmod04-4C.pdf">4C (Computing Clusters of Correlation Connected Objects)</a> and <a href="http://www.dbs.informatik.uni-muenchen.de/~zimek/publications/SSDBM2007/ERiC.pdf">ERiC (On Exploring Complex Relationships of Correlation Clusters)</a>, where you need to pick some parameters (e.g. k for a k nearest neighbour based approach) appropriately.</p><p>My approach is twofold. On one hand, I'm improving upon the traditional
covariance based correlation (which is quite sensitive to noise), so the
parameters become easier to pick, on the other hand I'm working on an approach
to automatically fine-tune the parameters to further improve stability.</p><p>For testing my computations I needed a visualization of this data. I was
considering using gnuplot (and in fact I'm using gnuplot a lot), but for some
situations I needed animation capabilities, and that's where gnuplot becomes
really messy.</p><p>So I decided to dive into SVG and Javascript. Here's my first SVG project:</p><p><a href="http://www.cip.ifi.lmu.de/~schubert/kovar/">Visualizing kNN
correlation in SVG with Javascript</a></p><p>(Internet Exploder is not supported. I don't have Windows, and for all I know
it doesn't really support SVG. Use a Gecko-based browser such as Firefox; Opera
and Safari (at least on Windows) also seem to work. I didn't get it to work on
KHTML/Konqueror/WebKit. I'm just doing this for myself, so I have no need to
support other browsers.)</p><p>It's a 3D dataset, consisting of 300 points. 100 points are noise, 100 points
are in a 2D cluster (green) and 100 points are on a 1D cluster embedded into
this plane (I'm working on algorithms that support hierarchical clusters, so
I needed a dataset with this property!).</p><p>There are two buttons in the UI, one toggles rotation, the other one toggles
the playback of "k". It will cycle k through a range of about 3-200. When
offset hits 20 (so k would be 22 or 23), the main correlation vectors - the
big blue lines - already point along the 1D cluster. At an offset of around 80
they have already diverged quite a bit from the 1D cluster - at this point, the
correlation is seeing the 2D plane quite well already.</p><p>I could also show you the behaviour for points in the 2D plane (but outside
of the 1D cluster) and noise points.</p><p>We're preparing a paper for SSDBM 2008.</p><p>[Update: Safari works at least on Windows]
</p>
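<p>For readers wondering what such a "main correlation vector" is: a rough sketch in
plain Python, using the traditional covariance of the k nearest neighbours (i.e.
the noise-sensitive baseline, not my improved variant). The toy data and all
function names are made up for this example; the strongest direction is found by
simple power iteration.</p>

```python
import math

def k_nearest(points, query, k):
    """Return the k points closest to `query` (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def covariance(points):
    """Covariance matrix of a list of equally long coordinate tuples."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
         for j in range(dim)]
        for i in range(dim)
    ]

def main_direction(matrix, iterations=100):
    """Dominant eigenvector of a covariance matrix, by power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        # Multiply the matrix with the current vector, then renormalize.
        vec = [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return vec

# Toy data (made up): a 1D cluster along (1, 2, 0) plus two far-away noise points.
line = [(float(t), 2.0 * t, 0.0) for t in range(-10, 11)]
noise = [(30.0, -20.0, 15.0), (-25.0, 40.0, -30.0)]
neighbours = k_nearest(line + noise, (0.0, 0.0, 0.0), k=10)
direction = main_direction(covariance(neighbours))
print(direction)
```

<p>With a small k only points of the 1D cluster are picked, so the vector points
along the line. Increase k until noise points are included and the direction
starts to drift, which is the effect the animation shows.</p>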
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Tue, 25 Dec 2007 20:13 GMT</pubDate>
</item>
<item>
   <title>Proper way of embedding spamtraps?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007122301-proper-way-of-embedding-spamtraps</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007122301-proper-way-of-embedding-spamtraps.html</link>
   <description><![CDATA[
<p>I'm considering embedding some spamtraps (i.e. email addresses that will feed
all their incoming email to the spam filter) into some web pages.</p><p>However, I want to prevent people from accidentally using these links or even
just seeing them. So using "display: none" seems appropriate. But Google is
known to punish websites 'hiding' content from users but not from robots.</p><p>Some sites say Google will just ignore the parts that are within
"display: none", others say it will punish the site altogether.</p><p>Maybe the adsense control comments will help
<pre>
&lt;!-- google_ad_section_start(weight=ignore) --&gt;
</pre>
but it wouldn't really make sense. It's meant for adsense only.</p><p>Or the page could become a bit more hackish and use javascript to kill the
unwanted content. Any experiences with the proper way of hiding spamtrap email
links like this:
<pre>
&lt;a href="mailto:aaaaaaa-never-email-this-address@domain.tld"
&gt;Unwanted Emails only&lt;/a&gt;
</pre>
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sun, 23 Dec 2007 21:17 GMT</pubDate>
</item>
<item>
   <title>Thoughts on Google Calendar embedding</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007122101-google-calendar-embedding</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007122101-google-calendar-embedding.html</link>
   <description><![CDATA[
<p>I've recently considered using a Google calendar for a project, and tried to
embed it in a web site.</p><p>However, there are a few issues I'm having with it:
<ul>
<li>Colors: it's blue, which clashes with the site colors</li>
<li>Multiple calendars: the UI for enabling/disabling which calendars to show
is too hidden</li>
<li>Long way to the map: I love that you can add map information to entries in
the calendar, however, it takes two extra windows to reach the map: first you
have to go to the details of the entry (although the address is in the balloon)
which opens in a new window, then locate and follow the map link, which opens
another new window.</li>
</ul></p><p>Also the "multiple calendars" feature is a bit hackish. I'd like to be able to
differentiate events by flags such as "city", "outskirts", "training", "dance
event", "music event". Obviously, entries might have more than one, so I'd need
6 calendars for this already. Usually, you can combine five...</p><p>Guess I'd need to do this all in Ajax by myself. It would be cool if Google
Calendar had an API for embedding like it has for maps. The Calendar API I've
seen so far is basically polling the raw data via JSON or XML. Which is already
great, but I do like some of the calendar layout they do, and I'd like to
avoid having to replicate that myself.
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 21 Dec 2007 09:26 GMT</pubDate>
</item>
<item>
   <title>Fun with spiders</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007071901-fun-with-spiders</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007071901-fun-with-spiders.html</link>
   <description><![CDATA[
<p>(no, not the animals, but web crawlers!)</p><p>For a pet project of mine, I've recently been spidering the web a bit myself.
So far, I've processed over 100,000 websites. The machine doing the spidering
is an old K6-450, so it's not particularly fast...</p><p>My spider is downloading the web pages' HTML, and possibly some framesets
(but at most 1 level deep). It's using text contents, image 'alt' attributes,
title and some meta tags. The text contents are tokenized and stemmed.</p><p>This results in some fun numbers:
<ul>
<li>The average web page uses about 194 different words.</li>
<li>The average token (after stemming!) is 6.8 characters long.</li>
</ul></p><p>Each of the web pages I'm spidering has about 6.4 categories assigned to it.
I'll be using this training set to train an AI to classify web sites.</p><p>(I've also started a <a href="http://www.kno10.com/">web page</a> for the
project, but it's still pretty much empty so far, not worth looking.)
</p>
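<p>Roughly how such numbers can be computed for a single page, as a sketch in
Python. This is not the spider's actual code: <tt>crude_stem</tt> is a made-up
stand-in for a real stemmer (e.g. Porter) that just strips a few English suffixes,
and the tokenizer only knows plain ASCII letters.</p>

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def page_stats(text):
    """Return (number of distinct stems, average stem length) for one page."""
    distinct = {crude_stem(token) for token in tokenize(text)}
    if not distinct:
        return 0, 0.0
    return len(distinct), sum(len(stem) for stem in distinct) / len(distinct)

count, average_length = page_stats("Spiders spider the web; crawling crawled links")
print(count, average_length)
```

<p>Averaging these per-page values over all spidered pages would give figures like
the ones above; whether the length is averaged over distinct tokens or over all
occurrences is a choice this sketch makes arbitrarily.</p>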
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Thu, 19 Jul 2007 11:46 GMT</pubDate>
</item>
<item>
   <title>Updating address books and privacy</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007052001-address-books-and-privacy</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007052001-address-books-and-privacy.html</link>
   <description><![CDATA[
<p>An often quoted feature of services such as OpenBC/Xing (and most of such
'pure social networking' sites) is that they basically allow you to keep an
address book without having the need to update it yourself.</p><p>Some people may even argue that this is the only real benefit these social
networking sites do actually offer.</p><p>There are of course services dedicated to helping you keep your address book
up to date. These often offer plugins for Thunderbird and Outlook, so you can
actually use the address book directly. (e.g. Plaxo) Some email providers even
have a function to send out "please update my address book entry for you" emails
to your recipients (e.g. web.de), but most people find these quite annoying.</p><p>Now some people might argue that you could use the FOAF standard for this. But
publishing your FOAF data on the web is a privacy problem. Most people won't
be willing to publish much more than their email address there. Just like some
people are not willing to entrust their information to services such as OpenBC.</p><p>Using e.g. HTTP authentication to restrict access to your FOAF data is also
not working very well: you'd need some user management to be able to revoke
access or change the access credentials if the passwords are leaked somehow.</p><p>OpenID would definitely be interesting, but how many of your friends have
OpenID yet? And not everybody has access to deploy the server side needed for
this.</p><p>The easiest to deploy approach would be to just use public key encryption. You
could then upload an encrypted copy of your data for each 'friend' to any web
site. You could also upload different data (including work contact information
only, for example) for different recipients.</p><p>My idea is like this:
<ul>
<li>The contact information you are willing to share is published encrypted via
PGP for the recipient</li>
<li>FOAF data includes a pointer to the base URI for this data</li>
<li>Base URI + GPG key id gives the location for the data</li>
<li>Data should be a more detailed FOAF file or vCard?</li>
<li>Client ("address book management") applications retrieve and update this
data on demand ("update" button) or e.g. after a timeout of one month</li>
</ul></p><p>Big benefits of this approach:
<ul>
<li>Very high privacy</li>
<li>You don't need to entrust any service provider with your data</li>
<li>Distributed, vendor-neutral, provider-neutral approach</li>
<li>Standards based (FOAF, HTTP, PGP, vCard/iCalendar)</li>
</ul></p><p>Drawbacks:
<ul>
<li>Standards such as FOAF and PGP aren't very widely used yet</li>
<li>Not as easy to use (yet) as websites like OpenBC</li>
<li>Requires that you have some URI to publish your FOAF and contact data at</li>
<li>No 'push' updates possible without active servers or sending emails</li>
<li>(No implementations - well, this is just a concept right now!)</li>
</ul>
</p>
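<p>The "base URI + key id" lookup could be as simple as the following Python sketch.
Everything concrete here is an assumption, since the concept doesn't fix a naming
scheme yet: the function name, the upper-cased key id as file name and the
<tt>.vcf.asc</tt> suffix are all hypothetical.</p>

```python
def contact_data_url(base_uri, key_id, suffix=".vcf.asc"):
    """Build the location of the data encrypted for one recipient.

    Hypothetical naming scheme: <base URI>/<KEY ID in upper case><suffix>.
    """
    key = key_id.strip().upper()
    if key.startswith("0X"):
        # Accept key ids written with a leading "0x".
        key = key[2:]
    return base_uri.rstrip("/") + "/" + key + suffix

# Example base URI (as it would be found in the public FOAF file) and key id.
url = contact_data_url("http://example.org/contacts/", "0xdeadbeef")
print(url)
```

<p>A client would fetch that URL and decrypt the result with the recipient's private
key; friends without an entry simply get a 404.</p>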
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sun, 20 May 2007 14:27 GMT</pubDate>
</item>
<item>
   <title>What "beta" means in a web site name</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007041102-what-beta-means</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007041102-what-beta-means.html</link>
   <description><![CDATA[
<p>"Beta" is short for "BETrAying". It means the web site isn't honest to you
about the benefits for you, what they are doing with the data or what they
are telling you.</p><p>Examples:
<ul>
<li>Download services: 'all download slots are busy, please wait 15 seconds' -
not true, but they just would like you to look at the ads for 15 seconds</li>
<li>Social networking: Myspace claims "foo is in your extended network". Even
when you are the Google robot or using an anonymizer. This is a hard-coded
string, not a calculated connection.</li>
<li>Most Web 2.0 services: we have no idea how to continue providing this
service for free once our venture capital is used up.</li>
</ul></p><p>So spelled out it would be like:
<blockquote>
Sorry, we're not telling you the truth about our service and the benefits, and
that's why we're calling the service beta; once we've found out how to do it
right and still earn some money with it, we'll remove the beta sign.
</blockquote>
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Wed, 11 Apr 2007 15:02 GMT</pubDate>
</item>
<item>
   <title>Browser detection</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007040903-browser-names</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007040903-browser-names.html</link>
   <description><![CDATA[
<p><em>Don't use the browser name for capability detection</em>.</p><p>I'd like to emphasize that. For example Google Groups won't let me upload an
image to my profile, because I'm not running Firefox or Internet Exploder.</p><p>Well, I tried both Epiphany (which has a very clean and fast UI, unlike Firefox
which is totally cluttered) and Iceweasel (which <em>is</em> Firefox, but with
the trademark replaced). Both are using Gecko, Gecko/20070324 for Epiphany and
Gecko/20070310 for Iceweasel. So I'm very sure they have the same capabilities
as Firefox 2 when it comes to web sites.</p><p><b>If you want to test for Firefox's capabilities, use the Gecko version
number.</b> Thank you.</p><p>[Update: it was suggested I point people to
<a href="http://geckoisgecko.org/">GeckoIsGecko.org</a>, which has links on
how to properly detect the Gecko engine, instead of relying on the browser
name.]</p><p>[Update: <a href="http://web.glandium.org/blog/?p=125">Mike Hommey</a> pointed
out that the 'Gecko/date' string is mostly meaningless, and largely is the
build date; it doesn't contain tree information. Instead you should be using
the "rv: 1.8.0.11" part of the User-Agent. This is also what the getGeckoRv
function on the howto linked from
<a href="http://geckoisgecko.org/">GeckoIsGecko.org</a> does. Oh, and Firefox
2 is not gecko 2, but IIRC uses gecko 1.8.x, just like my Epiphany.]
</p>
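<p>For server-side checks, the same idea as a small Python sketch. This is my own
illustration, not the getGeckoRv function from the howto; the function name
<tt>gecko_rv</tt> and the sample User-Agent string are made up.</p>

```python
import re

def gecko_rv(user_agent):
    """Return the Gecko revision as a tuple of ints, or None if not Gecko."""
    # Real Gecko browsers send "Gecko/<build date>"; WebKit only says
    # "like Gecko", which this check does not match.
    if "Gecko/" not in user_agent:
        return None
    match = re.search(r"rv:\s*(\d+(?:\.\d+)*)", user_agent)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

# Illustrative User-Agent string; note that it does not mention "Firefox".
ua = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.11) Gecko/20070324 Epiphany/2.14"
revision = gecko_rv(ua)
supported = revision is not None and revision >= (1, 8)
print(revision, supported)
```

<p>Comparing tuples avoids the classic mistake of comparing version strings, where
"1.10" would sort before "1.8".</p>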
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 09 Apr 2007 11:12 GMT</pubDate>
</item>
<item>
   <title>Please, think of the animals formerly known as kittens...</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007032401-please-think-of-the-firefoxes</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007032401-please-think-of-the-firefoxes.html</link>
   <description><![CDATA[
<p>(Title stolen from <a href="http://layer-acht.org/blog/debian/#1-93">Holger
Levsen</a>)</p><p>Whenever I see
<a href="http://developer.mozilla.org/wiki-images/en/e/e0/Moz_ffx_openStandards_1024x768.jpg">this image [mozilla open Standards]</a>, I want to make a spoof
of it titled:</p><p><b>Every time you make an Ajax app, god kills a firefox</b>.</p><p>But I would certainly be violating Mozilla trademarks by doing so (their
artwork, logos and trademarks such as "firefox" are <em>not</em> OpenSource).</p><p>Anyway: <b>AJAX</b> is a hackaround, in particular it is <b>not an open
standard</b>. Please use it only where it's really needed. Granted, there is
worse (e.g. Flash; prepare for incompatibility hell now that the
<a href="http://www.advogato.org/person/company/diary.html?start=34">first
opensource plugin can playback youtube videos</a> - as you might be aware, many
Linux distributions <em>can not ship Adobe Flash</em>, so they'll likely start
shipping this plugin as soon as it's somewhat working sufficiently; or ActiveX
which only works with MSIE...), but that's not really a good excuse for this
abuse of Javascript that is called AJAX.</p><p>On a side note, since I already mentioned flash - I can really recommend the
<a href="http://flashblock.mozdev.org/">Flashblock mozilla extension</a>. A
must have: you can view any flash if you need to by just clicking on it, but
they won't be loaded automatically anymore. So you can easily access youtube
(just one extra click!), but won't be bothered by flash ads and such stuff.</p><p>Oh, and Adobe. They're probably the biggest blocker for a widespread Linux
adoption <a href="http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=desktop_applications&articleId=9013280&taxonomyId=86&intsrc=kc_feat">judging by this article [computerworld.com]</a>, which is already very positive on Linux ("Unlike many of the applications included on new
Windows systems, these don't seem to come with annoying self-launching
advertisements, such as the irony-challenged Trend Micro Anti-Spyware
pop-up upgrade pleas that plagued my HP system at home."): maybe his biggest
issue is that he couldn't just run his Adobe Photoshop Elements on Linux.</p><p>Of course there are applications trying to offer the same functionality;
starting with <a href="http://www.gimp.org/">Gimp</a>,
<a href="http://www.digikam.org/">digiKam</a> and
<a href="http://www.koffice.org/krita/">Krita</a> (and I'm not sure he tried
Krita and digiKam as well; they are probably more similar to Adobe's product),
but I can understand his wish to be able to continue using the same
applications.</p><p>(My personal recommendation: start using Opensource applications on Windows,
e.g. Firefox, Thunderbird, <a href="http://inkscape.org/">Inkscape (great
vector graphics program!)</a> (and it's using open standards: SVG),
<a href="http://gaim.sf.net/">Gaim (multi-protocol instant messenger)</a> and
tons of others. They're free, so even if you don't use them every day, you
didn't waste money on them... and if you happen to like them: you can be sure
that they'll be working the same if you do the switch to Linux at some point in
the future. Be prepared for when Microsoft says your PC is too old.)
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sat, 24 Mar 2007 00:36 GMT</pubDate>
</item>
<item>
   <title>What the B in blog stands for</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007031904-what-the-b-in-blog-stands-for</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007031904-what-the-b-in-blog-stands-for.html</link>
   <description><![CDATA[
<p>Rumor: the B in "<em>b</em>log" actually means <em>beta</em>.</p><p>Original text of this post:</p><p>You know you've been exposed to too many Web 2.0 applications when you start
to think the B in the blogger.com logo is for "beta". I was like WTF, are they
making "beta" to be all of their logo now?</p><p>And yes, I know about it being a shortened version of "weblog", likely
coming from the pun "we blog".
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 19 Mar 2007 17:13 GMT</pubDate>
</item>
<item>
   <title>Drop out of Google by using Ajax.</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007031701-google-and-ajax</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007031701-google-and-ajax.html</link>
   <description><![CDATA[
<p><a href="http://www.brainhandles.com/2007/03/11/does-google-index-dynamic-javascripted-content/">Brain Handles</a> did a test to see if Google indexes Javascript-generated content (for simple scripts).</p><p>It doesn't. So if you are doing a heavily Ajax-based web page, you (still)
risk preventing Google from indexing your content.</p><p>(Note that for two prime Ajax examples this doesn't matter: GMail and Google
Maps. One doesn't have any public content anyway, the other no text.)</p><p>Please use Ajax only sparingly. It's a dirty workaround for shortcomings in
interaction capabilities of HTML and web browsers, not the ultimate solution
to all our web problems.</p><p>Two consequences:
<ol>
<li>For now you can hide content from being indexed by Google by putting it into
Javascript code. Note that it will be missing for non-Javascript-enabled
browsers then, too.</li>
<li>There might be a business opportunity in running Javascript in a Crawler
to obtain additional text to index. This is best when integrated with the
Javascript interpreter, to e.g. index text changes as they are generated. And
this could also be used to detect e.g. sites with annoying commercials (popups),
fraud sites or sites which try to run browser exploits or that try to cheat on
search engines by e.g. doing a Javascript redirection.</li>
</ol>
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 16 Mar 2007 23:51 GMT</pubDate>
</item>
<item>
   <title>On Yahoo Pipes</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007031201-yahoo-pipes</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007031201-yahoo-pipes.html</link>
   <description><![CDATA[
<p><a href="http://orebokech.blogspot.com/2007/03/playing-with-yahoo-pipes.html">Romain Francoise mentions Yahoo Pipes</a>.</p><p>Well, I played with <a href="http://pipes.yahoo.com/">Yahoo pipes</a> like one
or two weeks ago; and while I was impressed with their Visio-like UI, it was
lacking pretty much all the functionality I wanted to try...</p><p>My goal was simple: run a query on Google Blog Search (which will have the
result available in RSS), and then grab all URLs out of that stream.</p><p>But I didn't find any 'filter' in Yahoo Pipes which allowed me to extract
the URLs (or any part of the text, actually) from the blog entries. I don't
want to remove whole result entries, but I just want to extract certain
text chunks from their body... (there might be multiple, so the regexp
module isn't an option either).</p><p>I could do that with Python in a few lines, actually.
</p>
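<p>Something along these lines, as a sketch. To keep it self-contained it parses an
inline sample feed instead of fetching the Google Blog Search result; the sample
feed, the function name <tt>urls_in_feed</tt> and the rather simple URL pattern are
made up for this example.</p>

```python
import re
import xml.etree.ElementTree as ET

# Simple pattern: a URL ends at whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def urls_in_feed(rss_text):
    """Collect all URLs mentioned in the item descriptions of an RSS feed."""
    root = ET.fromstring(rss_text)
    found = []
    for item in root.iter("item"):
        body = item.findtext("description") or ""
        # An entry may contain several URLs; keep every one of them.
        found.extend(URL_PATTERN.findall(body))
    return found

SAMPLE = """<rss version="2.0"><channel>
<item><title>one</title>
<description>See http://example.org/a and &lt;a href="http://example.org/b"&gt;b&lt;/a&gt;</description></item>
<item><title>two</title><description>no links here</description></item>
</channel></rss>"""

urls = urls_in_feed(SAMPLE)
print(urls)
```

<p>Entries without links simply contribute nothing, and no result entry has to be
dropped as a whole.</p>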
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 12 Mar 2007 02:08 GMT</pubDate>
</item>
<item>
   <title>Never rely on Ajax</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007030601-never-rely-on-ajax</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007030601-never-rely-on-ajax.html</link>
   <description><![CDATA[
<p>Fortunately, I rarely use my Google GMail account.</p><p>For about a month now, I haven't been able to write replies in my preferred browser
anymore. I can only guess it's due to some broken browser detection done by
Google - my preferred browser is Epiphany, and uses XulRunner. So it's the
same engine as my Firefox, and I can write replies with Firefox (but the
Firefox UI is not as nice as Epiphany's, and it uses more memory).</p><p>(Well, almost. Epiphany is Gecko/20070209, whereas Firefox is Gecko/20070208.
So either some change in this one day breaks GMail, or Google broke it
themselves with some stupid browser detection; many people still think it's
sufficient to check for 'Firefox' to detect all non-IE users. Please use the
engine ID, that is Gecko.)</p><p>Anyway: if I were a heavy Google Mail user, that would be a disaster for me.
Broken for a month now and counting!</p><p>Fortunately, I don't rely on that friggin' Ajax stuff; I can either use the
standard HTML version of GMail or use Firefox. I'd just like to emphasize that
<em>Ajax apps break much more easily</em>, and your users might be unhappy about
that. <em>Ajax is far from perfect</em>; it's an ugly hack.</p><p>Don't overuse it.</p><p>[Update: I fixed my GMail issues by switching the language to German and back
to English. Weird, huh?]
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Tue, 06 Mar 2007 22:30 GMT</pubDate>
</item>
<item>
   <title>Dojo WTF?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007022102-dojo-dailywtf</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007022102-dojo-dailywtf.html</link>
   <description><![CDATA[
<p>From <a href="http://dojotoolkit.org/">Dojo Toolkit</a> (a very powerful AJAX
toolkit), but should probably go to <a href="http://thedailywtf.com/">The Daily
WTF</a>...
<pre>
_getAdjustedDay: function(/*Date*/dateObj){
  //summary: used to adjust date.getDay() values to the new values based on the current first day of the week value
  var days = [0,1,2,3,4,5,6];
  if(this.weekStartsOn&gt;0){
    for(var i=0;i&lt;this.weekStartsOn;i++){
      days.unshift(days.pop());
    }
  }
  return days[dateObj.getDay()]; // Number: 0..6 where 0=Sunday
}
</pre>
That code is inefficient and stupid on so many levels. For example the if
statement... you might be aware that 0 &lt; 0 is false.</p><p>Yep. I'd prefer something along the lines of
<pre>
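  return (dateObj.getDay() - this.weekStartsOn + 7) % 7;
</pre>
(The <tt>+ 7</tt> is needed because JavaScript's <tt>%</tt> keeps the sign of the
dividend, so the plain difference could go negative.)
No arrays were abused during the making of this function.</p><p>I always thought PHP programmers were the worst, but apparently some JavaScript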
"coderz" are up to par.
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Wed, 21 Feb 2007 15:12 GMT</pubDate>
</item>
<item>
   <title>XML schema datatypes</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007020902-xml-schema-datatypes</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007020902-xml-schema-datatypes.html</link>
   <description><![CDATA[
<p>... suck.</p><p>The XML Schema "dateTime" format can't be handled by Java's SimpleDateFormat (ok,
that probably is more Java's fault, for not being able to handle hours:minutes in
the timezone specification; anyway, this is mildly annoying).</p><p>The XSLT2 parsers are very intolerant about the format of the time
specification, too. They could have made more stuff optional such as the
specification of seconds; an error here will make the whole XSLT fail with
an exception. Some compact form of error handling would be nice... as would
be a smart parser which can handle various formats.</p><p>The XML Schema "duration" is even worse. First of all, it was completely
forgotton when doing XSLT 2; there are no functions to format or disassemble
it (except by regular expressions, which could also use a zero-width lookahead).</p><p>Secondly, it's lacking common specifications such as "next week". While "next
week" is computationally equivalent to "in 7 days", it can have different
semantics in some contexts (especially when not being aligned):</p><p>If I'm looking for the week February 9th 2007 is in, the result is February 5th
two February 11th. If I'm looking at a 7 day interval containing this day,
there are infinite possibilities (aligned on milliseconds and below...).
So it does make sense to make a difference between 7 days and a week.
</p>
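<p>The difference between "a week" and "7 days" can be made concrete with a
small sketch. It assumes weeks run Monday to Sunday (which matches the February
5th to February 11th example) and only looks at day-aligned windows; the
function names are made up for illustration.</p>

```python
from datetime import date, timedelta


def week_containing(day):
    """Return (Monday, Sunday) of the calendar week that contains `day`."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday is 0
    return monday, monday + timedelta(days=6)


def seven_day_windows(day):
    """Return every day-aligned 7-day window (start, end) that contains `day`."""
    return [
        (day - timedelta(days=offset), day + timedelta(days=6 - offset))
        for offset in range(7)
    ]


friday = date(2007, 2, 9)
aligned_week = week_containing(friday)
windows = seven_day_windows(friday)
print(aligned_week)
print(len(windows))
```

<p>Even at day granularity there are several candidate windows, and the
calendar week is just one of them; at finer granularity the number of
candidates grows without bound.</p>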
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 09 Feb 2007 16:40 GMT</pubDate>
</item>
<item>
   <title>Calendaring in a semantic wiki</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007020801-wiki-calendaring</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007020801-wiki-calendaring.html</link>
   <description><![CDATA[
<p>... is the working title of my diploma thesis.</p><p>Tomorrow I'll hold the introductory presentation on my topic in the research
seminar. This is to explain my project to other members of the working group,
so they can give me feedback and suggestions.</p><p>What I'll be doing:
<ul>
<li>extend the existing
<a href="http://ikewiki.salzburgresearch.at/">IkeWiki semantic wiki</a>
with a simple calendaring function</li>
<li>investigate use cases for calendaring in a wiki context</li>
<li>eventually develop a simple query language that allows non-tech-savvy
users to do common queries and rules; others will probably be given full
SPARQL or some extension on top of it (for temporal calculations)</li>
</ul></p><p>Where I am:
<ul>
<li>I think I know where I'm heading</li>
<li>I've started hacking on IkeWiki's internals</li>
<li>IkeWiki actually queries the database via SPARQL to retrieve relevant
temporal information and displays it in a simple calendar view</li>
<li>I have some months to go for finishing my thesis</li>
</ul></p><p>Difficulties:
<ul>
<li>Repeating events. Actually a very common thing. I haven't seen a calendaring
software that could handle more than just the simplest cases. Please name a
calendaring software that can do "the last Wednesday of each month which is not
a national holiday".</li>
<li>Custom time types: working days, work time (8-18 on work days), holidays,
...</li>
<li>Sub-events. Should actually be pretty easy in a semantic wiki, but defining
the semantics right and making an okay UI can get tricky</li>
<li>Non-gregorian calendars (I'll probably not handle this)</li>
<li>Ambiguity in expressions. If you have a meeting scheduled for some day at
10:00, duration: 1 day, it probably doesn't end at 10:00 the following day,
but at 18:00, end of the working day it started. If the meeting starts at
17:00, duration 1 day, it probably ends at 18:00 the next day...</li>
<li>Various technical stuff, including:</li>
<li>XSLT sucks for datetime handling. It can only process very few formats;
XSLT 2.0 has format-dateTime, but it can't format a duration for all I
know. Still IkeWiki is using XSLT a lot.</li>
<li>SPARQL is a pain for negation handling; you can do "IF NOT EXISTS", but
it's very ugly. But this is common in calendaring: "if there is not another
event at the same time, the office hour is Wednesday at 10am"</li>
<li>Ajax. IkeWiki's UI makes heavy use of Ajax, but that makes development and
debugging much harder; often you won't see the actual error message (or have
to dig in some messy logs), because it's all obscured somewhere in XML-RPC and
JSON calls</li>
</ul></p><p>Fortunately, we have a rather strong research group here in Munich. There are
experts for query and transformation languages (e.g.
<a href="http://www.xcerpt.org/">Xcerpt</a>,
<a href="http://www.pms.ifi.lmu.de/projekte/#Streams">SPEX - Querying XML
Streams</a>), temporal calculations
(<a href="http://www.pms.ifi.lmu.de/CTTN/">Computational Treatment of Temporal
Notions, CTTN</a>) and time modeling
(<a href="http://www.pms.ifi.lmu.de/projekte/#CaTTS">Calendar and Temporal Type
System CaTTS</a>); and I'm also working closely together with the author of
IkeWiki and Xcerpt. And of course they're all nice people to work with!
</p>
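<p>To illustrate why the repeating-event rule from the difficulties list is
harder than it sounds, here is a minimal sketch of "the last Wednesday of each
month which is not a national holiday". It assumes one particular reading (if
the last Wednesday is a holiday, fall back to the previous one); the function
name and the holiday set are hypothetical, and real holiday data would have to
come from somewhere else.</p>

```python
import calendar
from datetime import date, timedelta


def last_free_wednesday(year, month, holidays):
    """Return the last Wednesday of the month that is not in `holidays`, or None."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Step back from the end of the month to its last Wednesday.
    candidate = last_day - timedelta(days=(last_day.weekday() - calendar.WEDNESDAY) % 7)
    # Interpretation assumed here: if that day is a holiday, fall back week by week.
    while candidate.month == month:
        if candidate not in holidays:
            return candidate
        candidate -= timedelta(days=7)
    return None


# Hypothetical holiday set: pretend the last Wednesday of February 2007 is a holiday.
holidays = {date(2007, 2, 28)}
print(last_free_wednesday(2007, 2, set()))
print(last_free_wednesday(2007, 2, holidays))
```

<p>The other reading ("skip the month entirely if the last Wednesday is a
holiday") would need different code, which is exactly the kind of ambiguity a
calendaring language has to pin down.</p>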
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Thu, 08 Feb 2007 01:01 GMT</pubDate>
</item>
<item>
   <title>Google Unbombing?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007012902-google-unbombing</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007012902-google-unbombing.html</link>
   <description><![CDATA[
<p>So Google has finally changed their algorithm somehow to remove Googlebombs.
I wonder which approach they chose. Maybe they require the search term to
actually occur on the destination page?</p><p>Anyway, I wonder if we could now do the opposite - rank down pages by
Googlebombing them. I wonder if we could e.g. all set up links to
<a href="http://www.microsoft.com/">Windows</a> (microsoft.com) and maybe
even mention the word "Googlebomb", and Google will think we're trying to
Googlebomb Microsoft this way? So at the end maybe the Wikipedia article
becomes hit #1? That would be cool.</p><p>P.S. Ever noticed that Google blocks lynx and wget by their user-agent string?
(well, I get error 400 with lynx, which might be different, but I can use the
--user-agent option of wget to bypass their filter, so at least that filter
exists and is kind of pointless...)
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 29 Jan 2007 01:16 GMT</pubDate>
</item>
<item>
   <title>More on configuration files</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007011301-configuration-files</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007011301-configuration-files.html</link>
   <description><![CDATA[
<p><a href="http://www.gwolf.org/index.php?blog/show/193">Gunnar doesn't like the
idea of recommending XML for configuration files</a>. But why would you need to
be able to edit an XML file with a non-XML-aware editor if you don't like the
raw syntax?</p><p>If you don't like the raw syntax, use an editor that gives you a different
representation. Or use some transformation. Write a tool that converts YAML
to XML and back, if you like YAML better. (Btw, this is another reason to use
a common library for configuration file handling - let people choose their
configuration file formats!)</p><p>Writing XML in the raw with a good schema-aware editor with syntax
highlighting is actually quite nice. Have you ever edited an XML file with
Eclipse? You really should do that... I once opened my Openbox (a rather
minimalistic window manager) configuration file in Eclipse. Guess what, it
was giving me useful syntax completion! It had loaded and used the referenced
schema file.</p><p><img src="http://mucl.de/~erich/blogdata/eclipse-openbox.png" width="450" height="217" alt="openbox configuration in eclipse" /></p><p>It's not as if I think XML is the ultimate thing; (nor is Eclipse an editor
I'd use for configuration files; startup takes years and it frequently crashes
for me. vim also has some XML support...) IMHO there is a lot in XML that
should be stripped (such as attributes); I like JSON syntax better, except
it's in turn lacking essential information such as character encoding,
namespaces and schema information. I also don't think JSON allows comments.
But when handling information from multiple sources (and multiple schemas),
XML is really useful. It removes most of the guessing needed for handling
other formats.</p><p>And that is precisely what I'm advocating: use standardized formats. Consider
for example the Apache configuration. Do you know of any tool that can parse
the Apache configuration files other than Apache itself? Some parts look like
SGML/XML, but they don't have much more in common than using &lt; and &gt;.
When you need to automate something with Apache, you'll be annoyed
by this. If Apache used a format for which a reliable parser is readily
available, that would be nice.</p><p>Have a look at the xchat.conf configuration file. It uses "key = value", but
with extra spaces and without quoting, which means the file
can't be loaded by many parsers, e.g. bash. Now let's look at buttons.conf -
completely different syntax, "KEY value" blocks, separated by empty lines...</p><p>Btw, note that configuration handling with XML to me means also keeping
comments somehow... most applications will nuke any comments in their
configuration files; which is funny since most configuration syntaxes do have
a notion of comments, but did you ever come across an application using
sh-style configuration (i.e. that you could source in bash/dash/zsh), that
keeps comments?</p><p>P.S. The YAML homepage is not YAML. It's valid XHTML. Only if you strip out
all the tags and attributes and use only the text content within
the /html/body/pre tag, then you have something which probably is YAML.
This compatibility with HTML is probably why XML was at all successful.
</p>
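<p>As for keeping comments: with a standard XML parser this is mostly a matter
of asking for it. The following is a minimal sketch using Python's
xml.etree.ElementTree (Python 3.8 or later); the configuration document, its
element names and the set_option helper are made up for illustration and do not
belong to any real application.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; the element names are made up.
CONFIG = """<config>
  <!-- nickname shown to other users -->
  <nick>guest</nick>
</config>"""


def set_option(xml_text, name, value):
    """Change one option and return the document with its comments intact."""
    # insert_comments=True (Python 3.8+) keeps comments in the parsed tree;
    # the default TreeBuilder silently drops them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    option = root.find(name)
    if option is None:
        raise KeyError(name)
    option.text = value
    return ET.tostring(root, encoding="unicode")


updated = set_option(CONFIG, "nick", "erich")
print(updated)
```

<p>The application rewrites the value, and the comment written by the user
survives the round trip.</p>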
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sat, 13 Jan 2007 17:05 GMT</pubDate>
</item>
<item>
   <title>RDF representation of packages?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007011204-rdf-representation-of-packages</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007011204-rdf-representation-of-packages.html</link>
   <description><![CDATA[
<p>I wonder if we should do some RDF representation of packages - especially
including dependency data.</p><p>That way we can maybe use some RDF reasoners to query our data, and maybe
extract some interesting information. On the other hand, people with an interest
in RDF could use our real-world data for their experiments, and maybe we get
something back from them.</p><p>There are a couple of pieces of package metadata we currently are not tracking
inside the actual archive, but in different places, including licensing information
(debian/copyright), homepage location (on packages.qa.debian.org), download
location (debian/watch) - it would be nice to aggregate these into some RDF
store, and export them somehow.</p><p>For most of the package information (especially dependency information),
we'll have to write our own ontology (I wonder if we can map version numbers
to some standard rule language, or if applications will need an external
reasoner to process them?); for some things we can reuse the
<a href="http://xmlns.com/foaf/0.1/">FOAF (Friend-of-a-friend)</a> or
<a href="http://usefulinc.com/doap/">DOAP (Description of a project)</a>
ontologies. The first is rather common for describing people, people-people
and people-thing relationships; the latter was designed for describing
open source projects (but won't be directly applicable to packages of a
project).</p><p>I've blogged about my RDF export of Debtags data before; the canonical first
step would actually have been to export the package data, and enrich it with
the Debtags collected data...</p><p>Note that RDF is designed in a way that you can have one site provide metadata
for another site. For example, the Debtags RDF export contains "category"
information for Debian packages, but does not contain e.g. the description
of the packages it talks about via a URI. So there is nothing wrong from an
RDF point of view with keeping e.g. the licensing, watch or homepage data
separate.</p><p>For the <a href="http://code.google.com/soc/">Google Summer of Code</a>, there
was a proposal including "collaborative repository of meta-informations about
source packages (CRMI)"; but the first part of the proposal, the "distribution
wide tracker tool (DWTT)" turned out to be a bigger task than expected.</p><p>But maybe we'll still see CRMI at some point, and maybe we can have it provide
an RDF export of data (using a semantic wiki might be a good starting point
for CRMI maybe?).</p><p>[P.S. this blog posting maybe belongs more in the en/linux/debian category.
But only the xml category is also syndicated on planet.XMLhack, and I want
this post to go there to reach more RDF users. I really need to switch my blog
to some software which supports tagging...]
</p>
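<p>The point about one site providing metadata for another can be sketched
without any RDF library: statements are (subject, predicate, object) triples,
and anything said about the same URI can be merged later. Everything below (the
URI, the property names, the values) is a hypothetical placeholder, not an
actual Debian vocabulary.</p>

```python
# Hypothetical URI and property names, for illustration only.
PKG = "http://example.org/debian/package/example-package"

# Statements published by one source (e.g. a tag export) ...
tag_triples = {
    (PKG, "ex:category", "example::tag"),
}
# ... and statements about the same URI published somewhere else.
license_triples = {
    (PKG, "ex:license", "example-license"),
    (PKG, "ex:homepage", "http://example.org/example-package/"),
}


def describe(subject, *sources):
    """Merge all (predicate, object) pairs about `subject` from several triple sets."""
    merged = set().union(*sources)
    return sorted((p, o) for s, p, o in merged if s == subject)


description = describe(PKG, tag_triples, license_triples)
print(description)
```

<p>Neither source needs to know about the other; sharing the package URI is
enough to join the data.</p>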
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 12 Jan 2007 16:32 GMT</pubDate>
</item>
<item>
   <title>Tomcat shutdown problems - resolved</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007011201-tomcat-shutdown-problems</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007011201-tomcat-shutdown-problems.html</link>
   <description><![CDATA[
<p>I had trouble with Tomcat. It started fine, but shutdown took a very long time
(enough for Eclipse to suggest killing it, for example).</p><p>Some people suggested I try the tarball from Apache, instead of using the
Debian package, but that didn't help either.</p><p>By not running Tomcat from Eclipse and waiting a long time for the shutdown,
I finally managed to get some actual error messages:</p><p>This was usually the last message I was seeing:
<pre>
INFO: Pausing Coyote HTTP/1.1 on http-8080
</pre></p><p>After some timeout, I now got these messages:
<pre>
Protocol handler pause failed
java.net.NoRouteToHostException: No route to host
</pre></p><p>No route to localhost? WTF?</p><p>I resolved it by putting all hostnames it might try (localhost.localdomain,
although I use that nowhere, as well as the FQDN I'm using) into my
<tt>/etc/hosts</tt> and forcing them to point to 127.0.0.1 - and voila,
tomcat actually shuts down instead of pausing and then failing to notice it
has successfully paused...
</p>
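<p>To double-check which names a hosts file forces onto the loopback address,
a few lines of Python are enough. This is only a sketch: the file content is
hypothetical ("myhost.example.org" stands in for the real FQDN), and the helper
parses text rather than asking the system resolver.</p>

```python
def names_for(hosts_text, address):
    """Return all hostnames that a hosts-file style text maps to `address`."""
    names = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == address:
            names.extend(fields[1:])
    return names


# Hypothetical file content; "myhost.example.org" stands in for the real FQDN.
HOSTS = """\
127.0.0.1   localhost localhost.localdomain
127.0.0.1   myhost.example.org myhost   # forced to loopback
192.0.2.10  other.example.org
"""

loopback_names = names_for(HOSTS, "127.0.0.1")
print(loopback_names)
```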
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 12 Jan 2007 12:18 GMT</pubDate>
</item>
<item>
   <title>SVN $Date$ to ISO 8601 in XSLT</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007010902-svn-date-to-iso8601</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007010902-svn-date-to-iso8601.html</link>
   <description><![CDATA[
<p>I'm keeping my homepage in an SVN repository; I'm using the $Date$ keyword
to automatically track the last modification date (though it will also change
on minor modifications).</p><p>For the HTML "Date" meta tag, the W3C recommends using the ISO 8601 date format.
This is the (non-XSLT-2, so no regexp) hack I use for conversion:
<pre>
&lt;xsl:value-of select="concat(substring($string,8,4),'-',substring($string,13,2),'-',substring($string,16,2),'T',substring($string,19,8),substring($string,28,5))" /&gt;
</pre>
Did I mention I hate XSLT? It's lacking so many standard functions, like
date-time processing, regular expressions, exceptions, ... - granted, a lot of
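stuff was added for XSLT2, but it still sucks badly. Especially the syntax.</p><p>Here's how to format the date in the RFC 1123 style that RFC 2616 prescribes, as
used in the last-modified meta tag and HTTP/NNTP/SMTP headers (strictly speaking,
HTTP wants the time expressed in GMT rather than with a numeric offset):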
<pre>
&lt;xsl:variable name="day" select="concat(substring($string,8,4),'-',substring($string,13,2),'-',substring($string,16,2))" /&gt;
&lt;xsl:variable name="time" select="substring($string,19,8)" /&gt;
&lt;xsl:variable name="timezone" select="substring($string,28,5)" /&gt;
&lt;xsl:value-of select="date:day-abbreviation($day)" /&gt;
&lt;xsl:text&gt;, &lt;/xsl:text&gt;
&lt;xsl:value-of select="date:day-in-month($day)" /&gt;
&lt;xsl:text&gt; &lt;/xsl:text&gt;
&lt;xsl:value-of select="date:month-abbreviation($day)" /&gt;
&lt;xsl:text&gt; &lt;/xsl:text&gt;
&lt;xsl:value-of select="date:year($day)" /&gt;
&lt;xsl:text&gt; &lt;/xsl:text&gt;
&lt;xsl:value-of select="date:time($time)" /&gt;
&lt;xsl:text&gt; &lt;/xsl:text&gt;
&lt;xsl:value-of select="$timezone" /&gt;
</pre>
Note that this does not include any error handling and is not very robust.
You also need an XSLT processor with some of the
<a href="http://exslt.org">http://exslt.org/dates-and-times</a>
extensions such as <a href="http://packages.debian.org/xsltproc">xsltproc</a>.
(Which unfortunately doesn't do XSLT2 yet and doesn't have a regexp extension).</p><p>[Update: Joel Wreed pointed me to his
<a href="http://home.comcast.net/~joelwreed/">libxslt plugins</a>, a regexp
and an exslt.org/dates-and-times plugin. These would help a lot, though IIRC the
date-parse exslt.org spec doesn't support the date format I'd need. (So I
can't just say date-format(date-parse(...),...)). Also he said that they
basically are unmaintained right now. It would be nice if they could be merged
into libxslt, though...]
</p>
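<p>For comparison, this is roughly what the same conversion looks like in a
language with date handling in its standard library. A minimal Python sketch,
assuming the expanded keyword has the layout the substring() offsets above rely
on; parse_svn_date is a made-up helper name, and there is no error handling
here either.</p>

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def parse_svn_date(keyword):
    """Parse an expanded Subversion $Date$ keyword into an aware datetime."""
    # Same idea as the substring() calls: skip "$Date: " and take the
    # 25 characters of "YYYY-MM-DD hh:mm:ss +zzzz" that follow.
    stamp = keyword[len("$Date: "):][:25]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")


# Example value in the layout the substring() offsets assume.
expanded = "$Date: 2007-01-09 12:00:00 +0100 (Tue, 09 Jan 2007) $"
modified = parse_svn_date(expanded)

iso_8601 = modified.isoformat()
http_date = format_datetime(modified.astimezone(timezone.utc), usegmt=True)
print(iso_8601)
print(http_date)
```

<p>The second value is converted to GMT first, since that is what HTTP
headers expect.</p>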
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Tue, 09 Jan 2007 11:00 GMT</pubDate>
</item>
<item>
   <title>Debtags RDF export</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2007010301-debtags-rdf</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2007010301-debtags-rdf.html</link>
   <description><![CDATA[
<p>I've set up some simple
<a href="http://debtags.alioth.debian.org/cgi-bin/rdf/rdf.gz">RDF export</a>
(gzip only, please add caching yourself!) for the Debtags data.</p><p>The export is using the <a href="http://usefulinc.com/doap/">DOAP</a>
("Description of a project") vocabulary, though it isn't optimal (we're not
talking about separate projects; multiple packages may belong to the same one).
The format of the RDF file may (and will!) still change, for example I'd like
to have some explicit URIs for the packages instead of just storing it in the
name tag. Suggestions for a matching vocabulary / serialization are welcome.</p><p>But if you want to play around with some real-life data, just grab a copy.</p><p>There are a couple of interesting things you can do with it, such as my
<a href="http://debtags.alioth.debian.org/cloud/">folding tag cloud</a>. So
go ahead, and play around with it a bit.</p><p>[Update: I've changed the schema a bit, it should now be clearer which
package is being described. I'm using a new namespace for the packages that
is still undefined (that's why it's called "temp")]
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Wed, 03 Jan 2007 19:50 GMT</pubDate>
</item>
<item>
   <title>A Web2.0ish christmas to all!</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006122403-web20ish-christmas</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006122403-web20ish-christmas.html</link>
   <description><![CDATA[
<p>Check out the <a href="http://www.mucl.de/~erich/web20christmas/">first web 2.0
Christmas card</a> (and hopefully the last, ever!).</p><p>Sorry, I just could not resist doing that... mocking web2.0 on Christmas.</p><p>Merry Christmas to everybody!
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sun, 24 Dec 2006 16:34 GMT</pubDate>
</item>
<item>
   <title>Website meta languages</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006122401-website-meta-languages</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006122401-website-meta-languages.html</link>
   <description><![CDATA[
<p><a href="http://www.bononia.it/~zack/blog//posts/looking_for_a_website_markup_language.html">Zack asked for website meta languages</a> for redoing his homepage.</p><p>Well, I redid <a href="http://www.vitavonni.de/">my homepage</a> last February,
using XML and XSLT. A monster XSLT stylesheet, because I wanted to keep my
template outside of the stylesheet.</p><p>I cannot recommend XSLT. Using it for templating is quite messy. XSLT is okay
if you want to transform one representation of the data to another; it's not
if you want to add a lot of surrounding markup and things like a sitemap and
similar navigation tools. This is next to impossible in pure XSLT, it gets
better once you have some extensions (dubbed EXSLT, and supported by pretty
much any XSLT processor) or maybe with XSLT2. String manipulation is also a
pain (at least with XSLT1); I gave up on generating a nice "last modified" date
from my Subversion tag. Supposedly XSLT2 has some functions that could make
this easier (i.e. for parsing and printing datetime information), but the
common approach with XSLT is to only supply the required minimum of functions
you need, and I'm not aware of an easy way to add custom functions, not to
mention any large standard library that can efficiently be used (which is the
true strength of Java, Python and C#, that they bring a huge collection of
pre-written ready-to-use code with them). Of course there are some efforts to
write XSLT libraries (especially for XSLT2), and the aforementioned
<a href="http://exslt.org/">EXSLT</a> is some kind of standard library that
might even be efficiently implemented in some interpreters, but you can't rely
on it to be there and to just work. XSLT isn't useless, but when it comes to
presenting data to humans or writing clean, compact code it's not satisfactory
at all.</p><p>I'd give you my Makefile and XSLT file, if they weren't that messy... too many
features; I'm generating two languages, a partially expanded navigation menu,
etc. - my XSLT is 10k, my sitemap currently 4.5k, the template file is 5k.</p><p>Anyway. Half a year ago, my favourite templating language was
<a href="http://kid-templating.org/">KID templating</a>. I used it for some
small projects such as my
<a href="http://www.vitavonni.de/sandkasten/dnsoupdate/">DNSoupdate</a> tool
to edit DNS zones via the DNS protocol (requires a nameserver such as bind
which has support for DNS update, uses encryption). It was a perfect match for
such tiny pages, but I'm no longer that convinced I still like it for larger
projects.</p><p>What's good about Kid: XML templating language, Python based, easy to use in
whichever way you want by writing a few lines of Python (i.e. easy to write a
Makefile to generate a static version of your homepage)</p><p>What's not good about Kid: only works with a Python interpreter. I'd prefer to
have a templating language that can be used from multiple languages. However
the Kid syntax relies on Python, which is really bad. Also I'd prefer to have
some component-render model for more complex websites. The current setup is
IMHO okay if you have one default layout and one content layout; but if you
have multiple components that could be combined differently on the pages, it
gets too messy. With my TurboGears experiments, Kid was also not very
performant (but that might as well be TurboGears' fault).</p><p>Right now, I'd probably still go with Kid. But I've been having a look at JSP
2.1, and JSF / Facelets in particular. There are some things about facelets
that I really like (for example, that they use a proper XML syntax, instead of
this bastardized almost-XML that JSP usually (ab-)uses). There is also some
stuff that I don't like (e.g. the massive overengineering of everything
surrounding it), and I have no idea how easy it will be to generate static
pages in a scripted fashion, i.e. using it without a real webserver. Or I
might just write it all in Python, which is a nice language for manipulating
XML, usually.</p><p>Please don't just send me an email with your favourite templating engine. Like
Zack, I'm only interested in XML-based templating engines, which rules out
most templating solutions out there. Clearsilver for example is
another bastardization of XML. I'm aware of TAL/METAL, and find them quite
interesting, but they also lack the kind of componentization that
I usually think in.</p><p>[Update: some people have pointed me to Genshi, which is mostly Kid compatible.
However, it still has mostly the same problems, e.g. the templates not being
reusable in languages other than Python, and that certain constructs are a
pain to do (e.g. the page_specific_css recipe with more than one css file).
Others have pointed me to Smarty, but it's string-based and doesn't ensure
valid XML output. (Which is very useful for e.g. generating atom or rss output)
For example this is probably valid in Smarty:
<tt>{if 0}&lt;b&gt;{/if}&lt;/b&gt;</tt> - allowing errors is bad. Oh, and
Smarty is PHP, which is broken by design, a no-go. The best match to my ideas
so far is <a href="http://bzr.sesse.net/xml-template/">XML::Template</a> (for
Perl, Python, Ruby, PHP) which is pretty close to what I've been doing manually
when not using a templating solution. I don't know yet how well it handles
recursion - I need recursive templates for the navigation menu on my homepage.]
</p>
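<p>The difference between string-based and XML-based templating shows up as
soon as a condition wraps only one half of a tag pair. A minimal Python sketch
of the idea (it does not use Smarty, Kid or any other engine mentioned here;
the function names are made up):</p>

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def render_with_strings(bold):
    """String templating: mirrors a conditional that guards only the opening tag."""
    return "<p>" + ("<b>" if bold else "") + "text</b></p>"


def render_with_tree(bold):
    """Tree templating: elements are nested objects, so tags cannot be unbalanced."""
    paragraph = ET.Element("p")
    target = ET.SubElement(paragraph, "b") if bold else paragraph
    target.text = "text"
    return ET.tostring(paragraph, encoding="unicode")


broken = render_with_strings(False)
safe = render_with_tree(False)
print(broken, is_well_formed(broken))
print(safe, is_well_formed(safe))
```

<p>The string version happily emits a stray closing tag; the tree version has
no way to express that mistake in the first place.</p>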
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sun, 24 Dec 2006 00:45 GMT</pubDate>
</item>
<item>
   <title>CSS Zen Garden reaches #200</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006121103-css-zengarden-reaches-200</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006121103-css-zengarden-reaches-200.html</link>
   <description><![CDATA[
<p><a href="http://www.csszengarden.com/">CSSzengarden</a> has reached 200 CSS
files. It's an impressive site full of CSS tricks to learn.</p><p>Go there, and just click through a few designs. There are many impressive
designs there. And while most are very different in their visual experience,
they all have the <em>exact same HTML code</em>.</p><p>So please avoid using HTML for layout purposes. That's what CSS is for.
CSS is much more powerful and does a better job for this, so use it!
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 11 Dec 2006 00:40 GMT</pubDate>
</item>
<item>
   <title>Solving AJAX issues - error handling</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006121101-ajax-issues</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006121101-ajax-issues.html</link>
   <description><![CDATA[
<p>Ajax, when used properly, can be a great user experience.</p><p>Badly written Ajax however can be a pain. Often huge JavaScript libraries are
loaded, it makes your browser and system slow and sometimes you just end up
staring at a spinning animated gif for "Loading ...".</p><p>Good Ajax makes the application snappy, responsive, fast, and avoids screen
flicker. But with your traditional "get new HTML page" model, error handling
is done by your browser. DNS issue? Your browser will say server not found.
Connectivity issues? Browser will inform you of the timeout. Slow connection?
Your browser's <a href="http://en.wikipedia.org/wiki/Throbber">throbber
[wikipedia]</a> gives you an indication something is happening.</p><p>With Ajax, it's up to the authors of the Ajax application to do proper error
handling. And many Ajax applications have serious issues here.</p><p><a href="http://www.alistapart.com/articles/userproofingajax">User proofing
Ajax application [A list apart]</a> is a good article on some basics on how to
improve your Ajax applications.</p><p>Ajax is in need of some software engineering for QA. Right now, there is so
much low-level hacking involved that you have to expect 90% of Ajax applications
to have serious usability and reliability issues.
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 11 Dec 2006 00:14 GMT</pubDate>
</item>
<item>
   <title>JSF XML syntax for c:url in attributes?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006112201-jsf-xml-syntax</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006112201-jsf-xml-syntax.html</link>
   <description><![CDATA[
<p>Dear Lazyweb,
What is the JSF 1.2 XML-Syntax equivalent for
<pre>
&lt;img src="&lt;c:url value='/static/foo.png' /&gt;" /&gt;
</pre>
(this will make the URL relative to your application root, not to your
web server; so if the app is installed in /foobar, the resulting URL
will be /foobar/static/foo.png)</p><p>My preferred solution would have a useful src= value so an XHTML browser can
still display the page. Same for CSS stylesheets, links and similar.</p><p>Please send me an email at erich@debian.org, I'll update the entry.
(Comments are intentionally not enabled.)</p><p>Thanks.</p><p>[Update: at least in Apache MyFaces, this should work (untested):
<pre>
&lt;c:url value="/static/foo.png" var="url" /&gt;
&lt;img src="${url}" /&gt;
</pre>
However, the template file won't render even approximately in a regular browser.
For easy-to-edit templates it would be nice to have an actual value in the src
property. I have an idea how that could work with custom tags...]
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Wed, 22 Nov 2006 20:55 GMT</pubDate>
</item>
<item>
   <title>Future of AJAX</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006101202-future-of-ajax</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006101202-future-of-ajax.html</link>
   <description><![CDATA[
<p>While everybody is still crazy about AJAX - what will its future look like?
Using it is currently a major PITA, and you'll most likely have the user
download a 200k JavaScript file just to make it usable for you as a
programming language. JavaScript lacks so much that current programming
languages offer out of the box.</p><p>This includes especially some comprehensive standard library (Java, C# and
Python are all great here), a compact syntax for common data structures (e.g.
set operations in Python or stream operations in C++ with <tt>&lt;&lt;</tt>)
and of course: interfaces!</p><p>Security restrictions of the browsers - <em>intentional</em> security
restrictions to avoid cross site scripting attacks - make interfacing between
different JavaScript applications rather cumbersome, if at all possible. And
the only way to have "private" functions in JavaScript is also more of a hack
(abusing closures) than a native feature of the language.</p><p>What I'd like to see in the next generation of JavaScript - and browsers
should start implementing that rather soon, so we'll be able to use it in
some 5 years - are proper interfaces especially for cross-site applications,
information hiding, an extensive standard library, a short syntax for XML
processing and common data structures and pretty much all that every JavaScript
toolkit reimplements again and again. Oh, and the result shouldn't be Java yet,
but still an embedded scripting language. ;-)
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Thu, 12 Oct 2006 11:52 GMT</pubDate>
</item>
<item>
   <title>Cluster computing with AJAX (aka: ajax-seti)?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006101001-cluster-computing-with-ajax</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006101001-cluster-computing-with-ajax.html</link>
   <description><![CDATA[
<p>Ajax has shown how viable it is to run client-side computations, while just
downloading the raw data from a server. But Ajax is not restricted to doing
fancy user interfaces.</p><p>It should easily be possible to use an iframe based ad to use the CPU power
of page visitors to do some large-scale computations.</p><p>Can you imagine how much processing power Google could churn up by having its
GMail users do some distributed computing? Or on YouTube. While the user is
watching the video, a script does some calculations in the background.</p><p>By keeping data in a cookie, your calculations might even be able to survive
page reloads. And if you're running a large ad network such as Google's, you
might even be able to detect user inactivity. Update a cookie whenever the user
comes onto an AdSense page; if he didn't go on such a page for 30 minutes
assume the user is idling and start computation. If he leaves open his web
browser over night you'll get a lot of CPU cycles.</p><p>(Yes, I know that Google is supposedly not in desperate need for free CPU
cycles...)
</p>
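<p>A rough sketch of the idle check described above. The cookie name
<tt>lastSeen</tt> and all function names are assumptions made for this
example; in a browser the cookie string would come from
<tt>document.cookie</tt>.</p>

```javascript
var IDLE_MINUTES = 30; // threshold taken from the text above

// Cookie value to set on every ad page view (cookie name is hypothetical).
function lastSeenCookie(nowMs) {
  return 'lastSeen=' + nowMs;
}

// Extract the stored timestamp from a cookie string, or null if absent.
function readLastSeen(cookieString) {
  var match = /(?:^|;\s*)lastSeen=(\d+)/.exec(cookieString);
  return match ? parseInt(match[1], 10) : null;
}

// The user counts as idle once the threshold has passed since the last view.
function isIdle(cookieString, nowMs) {
  var lastSeen = readLastSeen(cookieString);
  if (lastSeen === null) return false; // never seen: don't start computing
  return nowMs - lastSeen >= IDLE_MINUTES * 60 * 1000;
}
```

<p>The computation script would call <tt>isIdle</tt> periodically and only
start working once it returns true.</p>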
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Tue, 10 Oct 2006 11:47 GMT</pubDate>
</item>
<item>
   <title>Ajax clouds</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006100701-ajax-clouds</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006100701-ajax-clouds.html</link>
   <description><![CDATA[
<p>I guess the term "ajax clouds" is now more appropriate.</p><p><a href="http://debtags.alioth.debian.org/cloud/">Debtags clouds</a> have
evolved. They're no longer a static page with a single cloud that will
forward you to the more complex browsing tool by Enrico, but now the
cloud will adapt to your previous choices, and allow the selection of
multiple tags.</p><p>The biggest issue now is probably tag naming: e.g. what is the difference
between "role", "use" and "scope" (unfold them to get an idea), or between
"interface" and "uitoolkit" (interface is mostly commandline vs. fullscreen
vs. windowed vs. 3d; uitoolkit is gtk vs. qt vs. whatever)? Unless you're
familiar with these terms, you'll probably still find it hard to navigate
the tag cloud.</p><p>Still, I hope this inspires you to think of new UIs doable with tag information
(which is a small step towards the semantic web; actually these facets here
are quite similar to RDF triples...)</p><p>There are so many things I'd like to try out with this data...</p><p>If you have suggestions, please share them via email.</p><p>For those interested in the technical stuff: tag clouds are loaded via ajax,
served from a database with ~120 MB of precalculated, precompressed json files.
Precalculation is rather expensive; on my 4+ year old laptop it took about
105 minutes (76 minutes of CPU time). Storing them in the filesystem instead of
a BerkeleyDB hashtable took more than 4 hours. The outermost (i.e. largest amount
of data) set takes 1.1 seconds to compute; there are 344871 precomputed tag
selections, so it precalculated 75 selections per second on average. Yes,
complexity is not linear; benefits from caching large results are huge.</p><p>I'd really love to run a similar interface for e.g. last.fm, but I guess this
would not work as well; their tags aren't grouped in facets. But I have some
ideas to make up for that.</p><p>P.S. This is also my first real Ajax app (except for using json instead of
XML). And I still hate Javascript.</p><p>[Update: I've worked around an issue with Opera (which is stricter on
javascript object syntax than Mozilla). I haven't tried Internet Exploder yet.
But this is a navigation experiment, not an application to be deployed...]
</p>
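<p>For the record, the "75 selections per second" figure above is relative to
CPU time, not wall-clock time. A small worked check using only the numbers
from this post (variable names are mine):</p>

```javascript
var selections = 344871;     // precomputed tag selections
var cpuSeconds = 76 * 60;    // 76 minutes of CPU time
var wallSeconds = 105 * 60;  // 105 minutes of elapsed time

var perCpuSecond = selections / cpuSeconds;    // roughly 75.6
var perWallSecond = selections / wallSeconds;  // roughly 54.7
```

<p>Measured against elapsed time the rate is noticeably lower; presumably the
difference is time spent waiting on the disk.</p>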
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 06 Oct 2006 23:02 GMT</pubDate>
</item>
<item>
   <title>More on tagclouds</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006100302-more-on-tagclouds</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006100302-more-on-tagclouds.html</link>
   <description><![CDATA[
<p>Tag clouds are usually done by scaling font sizes according to some weight.</p><p>Actually this is not very precise. For a representative representation (lol,
I should get this domain name. representative-representation.com),
the tag size - that is the surface area! - should be a representation of the
tag's weight.</p><p>The surface however doesn't directly depend on the font size, but is more like
<tt>font size * length of word</tt> (length being appropriate for the font
used).</p><p>So when displaying tags with very different font sizes, "egg" and "Technorati"
shouldn't just be scaled by their weight, but also by their word length.</p><p>OTOH, few users will actually be able to "grasp" the actual difference in size.
IMHO it's just about "popular" vs. "obscure" and about making the tool more
interesting to use.
</p>
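<p>A minimal sketch of the area-based scaling argued for above. It assumes
the rendered width of a word is roughly its character count times the font
size times a constant (<tt>WIDTH_FACTOR</tt>, a guessed average glyph aspect
ratio; real fonts vary), so the area grows with the square of the font size.
All names are made up for this example.</p>

```javascript
var WIDTH_FACTOR = 0.5; // assumed average glyph width relative to font size

// Approximate surface of a rendered word: height * width.
function tagArea(word, fontSize) {
  var height = fontSize;
  var width = word.length * fontSize * WIDTH_FACTOR;
  return height * width;
}

// Choose the font size so that the area equals weight * areaPerWeight.
function fontSizeForWeight(word, weight, areaPerWeight) {
  var targetArea = weight * areaPerWeight;
  return Math.sqrt(targetArea / (word.length * WIDTH_FACTOR));
}

// Two tags with the same weight get the same area, not the same font size.
var eggSize = fontSizeForWeight('egg', 1, 600);
var technoratiSize = fontSizeForWeight('Technorati', 1, 600);
```

<p>With equal weights, the short word "egg" ends up with a larger font than
"Technorati", which is exactly the correction for word length.</p>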
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 02 Oct 2006 22:20 GMT</pubDate>
</item>
<item>
   <title>Folding tagclouds (cont'd)</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006100301-folding-tagclouds</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006100301-folding-tagclouds.html</link>
   <description><![CDATA[
<p>I've been working a little bit more on the
<a href="http://debtags.alioth.debian.org/cloud/">folding
tagcloud</a> for Debian packages. I've added closing of folds, and the code
for displaying selected tags as well as matching packages is in place, too
(you'll need to use a different .json data to actually see results though).</p><p>To make it truly interactive, i.e. allow the selection of multiple tags
until you get some results, I need to add more data files.</p><p>So I'll have to decide now if I'm going to use a CGI (the "traditional"
method), which will likely need to have some caching, or if I'll just
precalculate everything into static .json files. I could even store them
as .gz on the webserver; any browser with ajax capabilities should be able
to do gzip decompression on the fly. This would offer maximum performance and
security, but it means I'll need more magic in the javascript (and Javascript
is <em>ugly</em>). Or I'll do a combination of both, use a tiny CGI serving the
precalculated data; the CGI could then easily be replaced with a
dynamic-caching CGI later.</p><p>SVG rendering for the tag cloud would probably also be very cool. With some
smart layout algorithms, it could become much more cloud-like. And there
could probably be a nice animation when "subclouds" are unfolded, pushing away
the other folds. However, that would be much slower. Any animation means a
slowdown, since it adds extra delays.
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 02 Oct 2006 22:13 GMT</pubDate>
</item>
<item>
   <title>Folding tagclouds</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006092802-folding-tagclouds</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006092802-folding-tagclouds.html</link>
   <description><![CDATA[
<p><a href="http://debtags.alioth.debian.org/cloud/">Folding tag
clouds</a> of <a href="http://www.debian.org/">Debian</a> software packages.</p><p>I'm trying to make tag clouds workable with a large number of tags
(<a href="http://people.debian.org/~erich/debtagcloud/tags.html">unfolded
tag cloud for comparison</a>) by folding them into subtopics.</p><p>Yay, when we're done with tagging the Debian packages, we'll have a great
new way of browsing available Linux software. Linux doesn't <em>lack</em>
software anymore; it has so much software that you just can't find what you need.</p><p>The next version of this will probably allow you to select multiple tags,
and update the tag cloud upon selection. So if you choose "GTK" ui toolkit,
the "QT" toolkit tag will become quite small, etc.
</p>
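<p>One way the planned update-on-selection could work, as a sketch only:
count each tag over just those packages that carry all selected tags. The
package names and tag assignments below are invented sample data, not real
Debtags data.</p>

```javascript
// Hypothetical sample data: package name -> list of tags.
var packages = {
  'editor-a': ['uitoolkit::gtk', 'use::editing'],
  'editor-b': ['uitoolkit::gtk', 'use::editing'],
  'editor-c': ['uitoolkit::qt', 'use::editing'],
  'viewer-d': ['uitoolkit::qt'],
  'hybrid-e': ['uitoolkit::gtk', 'uitoolkit::qt']
};

// Weight of each tag = number of packages that have it AND all selected tags.
function tagWeights(packages, selected) {
  var weights = {};
  Object.keys(packages).forEach(function (name) {
    var tags = packages[name];
    var matches = selected.every(function (tag) {
      return tags.indexOf(tag) !== -1;
    });
    if (!matches) return; // package filtered out by the current selection
    tags.forEach(function (tag) {
      weights[tag] = (weights[tag] || 0) + 1;
    });
  });
  return weights;
}

var before = tagWeights(packages, []);
var afterGtk = tagWeights(packages, ['uitoolkit::gtk']);
```

<p>In this sample, the Qt tag shrinks from three packages to one as soon as
GTK is selected, so it would be rendered much smaller.</p>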
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Thu, 28 Sep 2006 16:05 GMT</pubDate>
</item>
<item>
   <title>Is your mail reader SVG-enabled?</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006091602-svg-enabled-mailreader</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006091602-svg-enabled-mailreader.html</link>
   <description><![CDATA[
<p>From the debian changelogs:
<pre>
thunderbird (1.5.0.5-2) unstable; urgency=low
  * new package: thunderbird-dbg
    + enable svg
</pre></p><p>Whoa, when will we start seeing SVG spam?
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Sat, 16 Sep 2006 11:30 GMT</pubDate>
</item>
<item>
   <title>Ajax is devolution</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006082801-ajax-is-devolution</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006082801-ajax-is-devolution.html</link>
   <description><![CDATA[
<p>Ajax isn't progress, it's actually a step back.</p><p>Most people have been happy to actually use some sane advanced language for
doing web sites. There is Java, JSP, PHP, Ruby, Python, Perl, ...</p><p>And then there is Ajax. One of the main components is JavaScript. That
language most of us would have loved to forget. Forever.</p><p>Ajax brings you back into the dark ages of internet development, with browser
incompatibilities, it features
<a href="http://www.google.com/search?q=ajax+memory-leaks">memory leaks</a>,
and many Ajax apps (such as <a href="http://live.com/">live.com</a>) will run
unbearably slow on older computers. Such as my 1.8 GHz P4M laptop. If they
work at all - live.com just shows "Loading..." for me right now.</p><p>Have you ever looked at some of the actual Ajax code?</p><p>This isn't the future, this is the dark ages coming back!</p><p>[<a href="http://mjr.towers.org.uk/blog/2006/webcss">MJ Ray replies</a> that
I didn't mention accessibility issues with Ajax. That's true; I'm aware of
them, but this post focuses on Ajax doing away with all the advances in
programming languages and software engineering... The inaccessibility of Ajax
stuff is worth its own blog post sometime]
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Mon, 28 Aug 2006 13:46 GMT</pubDate>
</item>
<item>
   <title>Fun with google.</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006082501-stupid-google-searches</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006082501-stupid-google-searches.html</link>
   <description><![CDATA[
<p>Talking about logic, and where people fail at understanding it...</p><p>Fun with stupid Google queries - Is there any page about
<a href="http://www.google.de/search?q=google+-google">Google and not
Google?</a></p><p>Wonder how big the Google index is?
<a href="http://www.google.de/search?q=google+OR+-google">Google OR -Google</a>.</p><p>(I'm aware that the count is imprecise, it's just funny that this query is
actually processed.)</p><p>Now let's look at MSN.
<a href="http://search.msn.de/results.aspx?q=google">google</a> 64741016<br />
<a href="http://search.msn.de/results.aspx?q=-google">-google</a> 4669973284<br />
okay. that should make "4734714300"...
<a href="http://search.msn.de/results.aspx?q=google+OR+-google">google OR -google</a> 5156927770 - oops, magic new hits.<br />
<a href="http://search.msn.de/results.aspx?q=-google+OR+google">-google OR google</a> 68735504<br />
So where are the 5 billion pages that have "google or -google" but not
"-google or google"?</p><p>And how about
<a href="http://search.msn.de/results.aspx?q=MSN+OR+-MSN">MSN OR -MSN</a> - there can be only one.</p><p><a href="http://search.msn.de/results.aspx?q=askdjfhsakdf+OR+-askdjfhsakdf">askdjfhsakdf OR -askdjfhsakdf</a> - top result for this garbage word: Google!</p><p><a href="http://www.google.de/search?q=askdjfhsakdf+OR+-askdjfhsakdf">askdjfhsakdf OR -askdjfhsakdf</a> Google results are consistent. I wonder what their
sorting is in this case... random? hash function? age?</p><p>Remember Googlestossen? Like Googlewhack, but with scoring. There must be only
one hit with both words; score = # of hits with word 1 * # of hits with word 2
</p>
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 25 Aug 2006 14:30 GMT</pubDate>
</item>
<item>
   <title>Optimizing page loading</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006082501-optimize-page-loading</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006082501-optimize-page-loading.html</link>
   <description><![CDATA[
<p>A typical web page will consist of dozens of files - images, javascript, CSS.</p><p>Web browsers usually load 2 files in parallel (recommended by the HTTP/1.1 spec
to use max 2 keep-alive connections). If you are including many javascript
files in your <tt>&lt;head /&gt;</tt> element, these will probably be loaded
first, the images second. This is the effect of images appearing "late" over
slow connections.</p><p>However, if your page can be displayed without the javascript (which is highly
recommended because of accessibility issues), you might want the browser to
load the images first. If your page totally relies on JavaScript - bad luck
for you.</p><p>In order to improve your load times, you can use some simple techniques, for
example putting images on a separate server (e.g. images-amazon.com, Yahoo's
yimg.com, static.flickr.com, photos1.blogger.com).</p><p>[Update: I'm not suggesting you (ab-)use one of these sites for hosting your
images, but these are examples of big services using this technique. Blogger,
btw, has a referrer filter, so it won't work. And it breaks "planets", so
I actually recommend you use a different blogging service.]</p><p>This is a common practice for large sites, for several reasons:
<ul>
<li>A web server serving exclusively static images can use a more lightweight,
more performant web server software</li>
<li>It can benefit from different caching strategies</li>
<li>It can benefit from different hardware choice (e.g. slower CPU but more
memory; RAID not really needed, better to have two servers, etc.)</li>
<li>Images usually make up a large chunk of the traffic - but they're easy to
replicate to a worldwide mirror network</li>
<li>Authentication, cookies etc. don't need to be sent</li>
<li>Might be outsourced, since it doesn't contain business data</li>
<li>Images can be loaded via separate http connections</li>
</ul></p><p>The last point is the inspiration for this posting - while your web server is
still busy building some dynamic web page or serving some Ajax requests, your
image server could already be sending out the images.</p><p>This might give the user the impression that your web site is much faster.
Most users are broadband - but they still have some latency. In fact, latency
has increased for many users with broadband due to interleaving on DSL lines,
for example; ping is higher with regular DSL lines in Germany than it was with
modems or ISDN.</p><p>Some relevant pages:
<a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html">HTTP/1.1 Pipelining</a> [w3.org],
<a href="http://www.mozilla.org/projects/netlib/http/pipelining-faq.html">Mozilla pipelining FAQ</a> [mozilla.org]
</p>
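<p>To illustrate the point about separate connections, here is a deliberately
crude model: it only counts request rounds at two connections per host and
ignores bandwidth, pipelining and server time. The file counts and the 50 ms
latency are invented numbers, and the function names are mine.</p>

```javascript
var CONNECTIONS_PER_HOST = 2; // the limit mentioned above

// Number of sequential request rounds needed for one host.
function roundTrips(fileCount) {
  return Math.ceil(fileCount / CONNECTIONS_PER_HOST);
}

// Hosts are fetched in parallel, so the slowest host decides the total.
function estimateLoadMs(filesPerHost, latencyMs) {
  var slowest = 0;
  filesPerHost.forEach(function (count) {
    slowest = Math.max(slowest, roundTrips(count));
  });
  return slowest * latencyMs;
}

var singleHost = estimateLoadMs([20], 50);    // 20 files from one host
var splitHosts = estimateLoadMs([8, 12], 50); // 8 page files, 12 images
```

<p>In this toy model the same 20 files take 10 rounds from one host but only
6 rounds when the images come from a second host.</p>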
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Fri, 25 Aug 2006 11:16 GMT</pubDate>
</item>
<item>
   <title>Stepping beyond tag clouds</title>
   <guid isPermaLink="false">http://blog.drinsama.de/erich/en/xml/2006082301-stepping-beyond-tag-clouds</guid>
   <link>http://blog.drinsama.de/erich/en/xml/2006082301-stepping-beyond-tag-clouds.html</link>
   <description><![CDATA[
<p><a href="http://en.wikipedia.org/wiki/Tag_cloud">Tag clouds</a>
[en.wikipedia.org] are a current must-have for web 2.0 applications.</p><p>Examples can be found for <a href="http://del.icio.us/tag/">bookmarks</a>,
<a href="http://www.technorati.com/tag/">blog entries</a>,
<a href="http://www.flickr.com/photos/tags/">photos</a>,
<a href="http://www.last.fm/explore/">music</a> or
<a href="http://www.librarything.com/tagcloud.php">books</a>.</p><p>Tag clouds are hip, because they're a dynamic feature and show the
"<a href="http://en.wikipedia.org/wiki/Zeitgeist">Zeitgeist</a>" [wikipedia].
They give an overview of the users ("California" in flickr) or of current
hot topics ("Israel", "Lebanon" in technorati).</p><p>However, tag clouds also have severe limitations.</p><p>First of all, they're arbitrarily ordered. Usually alphabetic, so there is
no content relationship among the entries.</p><p>Secondly, they only show an excerpt, since there are usually many more tags
than fit on the screen.</p><p>Thirdly, they're atomic information, whereas relations as used e.g. in
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a>
[wikipedia] can convey much more complex information.</p><p>I'm trying to push tag clouds to the next level. They're a gimmick right now,
but maybe we can make them into a powerful navigation tool?</p><p>Together with <a href="http://www.enricozini.org/">Enrico Zini</a> I've just
created my first tag cloud (I've skipped making a tag cloud for my blog...).</p><p>Well, it quickly evolved beyond a tag cloud. You could maybe call it a tag
sky. Or tag forest.</p><p>I'm not using my blog or something like this for the tag cloud. That would be
quite boring, I'm not doing real tagging on it. Instead I'm using <em>software
tags</em>. The <a href="http://debtags.alioth.debian.org/">Debtags</a> project,
led by Enrico and me, has been working on software tags for some years now
during our spare time. We have about 600 tags in a dozen facets, and 15000
software packages (I don't have the number ready of how many of those are somewhat
tagged already). Well, the tagging efforts are still far from complete; that's
why we're currently working on an AI to assist tagging efforts, too.</p><p>We generated two different renderings of the tag clouds for you:
<a href="http://people.debian.org/~enrico/2006-08/debtagcloud.html">one
separated cloud per facet</a>, and
<a href="http://people.debian.org/~enrico/2006-08/debtagcloud-folded.html">all
folded into one big cloud</a>. Oh, and do actually click on one of the tags;
it will take you to a more complex tag-based navigation tool and a tagger.</p><p>So what makes these different from the usual tag clouds you see everywhere
(apart from the sheer size, sorry about that. Maybe we'll add buttons next
to hide/show tags with low occurrence numbers)?</p><p>Well, the tags next to each other aren't completely unrelated any more, since
they are (in both renderings) grouped by their facet. This makes it easier
to locate something - go through the red facets first, then look at the tags
in the group.</p><p>I'm thinking about a second step, which would involve dynamic expanding details
in the tag cloud, or hiding them, finally transforming the tag cloud into
a true navigation utility beyond a "single click filter".</p><p>In my final diploma thesis, one of the topics to work on suggested by my
professor is doing "tag clouds" (i.e. weighted lists) for relations. The
prototype will likely be integrated with the
<a href="http://ikewiki.salzburgresearch.at/">IkeWiki semantic wiki</a>.
I don't have a clear vision of how the "relation cloud" will work or look,
but I haven't started my thesis yet anyway. I currently imagine up to three
clouds (corresponding to the empty places in the relation) that will
dynamically adapt to the choices already made by the user. Some zooming will
probably be needed, too.</p><p>Another use of tag clouds would be a visualization of the AI - the weights
could be chosen by how sure the AI is about this tag; the cloud would then
describe the AI's rating of a software package description.</p><p>If you have some ideas, good links, relevant papers or other feedback, just
send me <a href="mailto:erich@debian.org">an email to erich@debian.org</a>.
Thank you.
</p>
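<p>A small sketch of the grouping by facet described above. It assumes tag
names of the form <tt>facet::tag</tt>; the counts are invented sample data,
and <tt>groupByFacet</tt> is a name made up for this example, not the code
behind the renderings linked above.</p>

```javascript
// Group a flat map of "facet::tag" -> package count into per-facet lists,
// each sorted by descending count.
function groupByFacet(tagCounts) {
  var facets = {};
  Object.keys(tagCounts).forEach(function (fullName) {
    var parts = fullName.split('::');
    var facet = parts[0];
    var tag = parts.slice(1).join('::');
    if (!facets[facet]) facets[facet] = [];
    facets[facet].push({ tag: tag, count: tagCounts[fullName] });
  });
  Object.keys(facets).forEach(function (facet) {
    facets[facet].sort(function (a, b) { return b.count - a.count; });
  });
  return facets;
}

// Invented sample counts.
var grouped = groupByFacet({
  'uitoolkit::gtk': 40,
  'uitoolkit::qt': 55,
  'interface::commandline': 120,
  'interface::3d': 7
});
```

<p>Each facet can then be rendered as its own small cloud, or as one coloured
group inside the big folded cloud.</p>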
]]></description>
   <category domain="http://blog.drinsama.de/erich"></category>
   <pubDate>Wed, 23 Aug 2006 02:44 GMT</pubDate>
</item>
</channel>
</rss>
