Vitavonni

Wed, 19 Jul 2006

Google and the semantic web

Worth reading: an article about Google's Peter Norvig and Tim Berners-Lee on the Semantic Web.

Google is obviously right that "millions of web masters" will have trouble adapting to these new standards - but that's why we're writing software.

The days of hand-writing HTML code will be over some day. Already now - as Google acknowledged - many people fail to write proper HTML. And HTML isn't becoming easier to write. In fact, it is getting much more complex, with different character encodings, new CSS versions, higher expectations for layout, more dynamic web pages, Ajax, ...

More and more people won't write HTML themselves anymore, but will use some software instead. People used to write bad HTML; now they use tools such as WordPress, which are at least expected to produce valid markup. They are starting to use visual editors, which will eventually stop using tags such as <i /> and use the "more semantic" <em /> instead, without the users even being aware of it. And their blog software also generates RSS for them. How many people have ever written RSS by hand?

So it's mostly a matter of the tools we offer them; with better tools we can push the use of better ("semantic") formats, which then make the data reusable for others as well.

For example, tons of people hope that Friendster, openBC, LinkedIn and the like will help them in one way or another to keep in contact with people, sell products, or find new jobs. What these web sites basically collect is FOAF data. (And, btw, if any of these web sites were true Web 2.0, they would actually export FOAF files via some API!) They have a UI people understand; now all they would need to do is share their data, and we'd have a large body of "semantic web ready" FOAF data.
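To give an idea of how little such an export would take, here is a minimal sketch in Python. The function name `export_foaf` and the sample names are made up for illustration; only the RDF and FOAF namespace URIs and the `foaf:Person`, `foaf:name` and `foaf:knows` terms come from the FOAF vocabulary itself. A real site would probably also want stable URIs for each person, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Make the serializer emit the usual "rdf:" and "foaf:" prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def export_foaf(name, friends):
    """Hypothetical helper: serialize one person and their contacts as FOAF (RDF/XML)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    for friend in friends:
        # Each contact becomes a nested foaf:Person inside foaf:knows.
        knows = ET.SubElement(person, f"{{{FOAF}}}knows")
        other = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(other, f"{{{FOAF}}}name").text = friend
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_foaf("Alice Example", ["Bob Example", "Carol Example"]))
```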

Similar things apply to other "semantic" formats. Think calendars. iCal is pretty much the standard and widely used. Almost no one uses calendar web pages that are readable only by humans.
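As a rough illustration of how simple the format is for a tool to produce, the following Python sketch writes a single event in iCalendar syntax. The helper names (`escape_text`, `make_event`), the PRODID string and the sample values are invented for this example, the datetimes are assumed to be in UTC already, and long lines are not folded, so treat it as a sketch rather than a complete implementation.

```python
from datetime import datetime


def escape_text(value):
    """Escape backslash, semicolon, comma and newline in an iCalendar text value."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def make_event(uid, summary, start, end, stamp):
    """Hypothetical helper: build a one-event calendar (all times assumed UTC)."""
    fmt = "%Y%m%dT%H%M%SZ"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//blog sketch//EN",  # made-up product identifier
        "BEGIN:VEVENT",
        "UID:" + uid,
        "DTSTAMP:" + stamp.strftime(fmt),
        "DTSTART:" + start.strftime(fmt),
        "DTEND:" + end.strftime(fmt),
        "SUMMARY:" + escape_text(summary),
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(make_event("1@example.org", "Lunch, Munich",
                     datetime(2006, 7, 19, 12, 0),
                     datetime(2006, 7, 19, 13, 0),
                     datetime(2006, 7, 19, 8, 0)))
```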

The semantic web isn't dead or anything. It just takes some time to be widely adopted, but that was to be expected. Having tools for generating semantic data that are maybe even easier to use than non-semantic tools - after all, the computer should be able to assist you more with semantic data - is the key to success.

I'm also looking forward to semantic wikis such as IkeWiki, which try hard to make entering semantic data as easy as editing a non-semantic wiki page. In large wikis such as Wikipedia, making useful links is not as easy as typing [MagicWord], because you first have to look up the magic word. A semantic wiki can assist you by suggesting appropriate links based on the information you've already entered (e.g. if you mark a page as biology, it won't suggest that a term might be a computer part).
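The biology example can be sketched in a few lines of Python. This is not how IkeWiki actually works (I haven't looked at its internals); the function `suggest_links`, the page titles and the category names are all made up, and the "semantic" part is reduced to a plain overlap of category sets.

```python
def suggest_links(word, page_categories, candidates):
    """Hypothetical helper: suggest link targets for `word`.

    `page_categories` is the set of categories of the page being edited;
    `candidates` maps each existing page title to its set of categories.
    Only targets sharing at least one category with the page are suggested.
    """
    word = word.lower()
    matches = [
        (title, cats) for title, cats in candidates.items()
        if word in title.lower() and cats & page_categories
    ]
    # Most shared categories first; ties are broken alphabetically.
    matches.sort(key=lambda item: (-len(item[1] & page_categories), item[0]))
    return [title for title, _ in matches]


if __name__ == "__main__":
    pages = {
        "Mouse (animal)": {"biology", "rodents"},
        "Mouse (computing)": {"computer hardware"},
    }
    print(suggest_links("mouse", {"biology"}, pages))
```

For a page marked as biology, only the animal is offered; the computer part is never suggested.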

[category: /en/xml | Permalink]

Do you remember Google Pages?

Back in February, Google launched its Page Creator. It is still the best Ajax web page editor I've seen so far, and still barely anyone knows it. After the initial hype (during which it was temporarily unavailable), no one talked about it any more. It's a shame, actually; this is probably the easiest way to set up a web page.

It will be interesting to see what Google's strategy is with respect to products such as this. Let it die? Run a marketing campaign? Do some cross marketing (e.g. adding it to the "Mail / Calendar / More" bar you have at the top of mail and calendar pages)?

Actually, there are tons of Google apps that very few people know about; only a few have been a big success (search, mail, maps, earth, news).

They certainly could use some marketing. But maybe they have a reason to wait. Imagine Google starting a marketing campaign with all their cool products ("out of beta now") just a few weeks before Vista is released...

[category: /en/xml | Permalink]

live.com is slooooow

live.com, Microsoft's latest branding experiment (read: their new Passport+MSN+whatever) features a new version of their search engine. Some time ago they promised they would be the leading search engine this year...

It's unusable. They've packed it with so much AJAX that it is too slow to really use on my computer (which has 1.8 GHz). I'll remain a Google user. Without even looking at your results, just because I can't really use your fancy-schmancy scrolling CPU-eating thingy.

Some fun from March, Reuters (original article no longer available):

"What we're saying is that in six months' time we'll be more relevant in the U.S. market place than Google," said Neil Holloway, Microsoft president for Europe, Middle East and Africa.

More relevant with Vista users, that is, probably. ;-)

If you want to see an AJAX-using search engine which is usable and useful, try Exalead. They have thumbnails, clustering, and still a pretty nice interface.

[category: /en/xml | Permalink]

nofollow followup.

Micah Dubinko blogs on rel="nofollow" being a failure. I have to agree that it doesn't really stop comment spammers currently. I still believe that it will help in the long run, since most blog software will now use it by default. And people have to upgrade from time to time due to the latest PHP security issue...

However, I would like Google or other search engines to offer some "spam submission". It would be best if they had a common submission server.

Then we could use the moderation tools of our blogs to submit these bad URLs to Google, who in turn could take measures to strip them from their index (e.g. by moderating indexing for these URLs, reducing their pagerank, freezing their ranking for some weeks etc.).

I'm of course aware of the possibilities for abuse (e.g. spam with your competitor's URL); but most comment spam I've seen so far uses new sites anyway. I think you can detect these by characteristic URLs, site contents, incoming links etc.

As for myself, I've never enabled comments on my blog for the very reason that I'm not willing to moderate all that spam. In my guestbook (which is mostly there to reduce the number of emails I get because of certain popular content on my German web page; people will now happily post in my guestbook instead of sending me an email), I've recently enabled a very simple filter. It will just reject any entry which contains http:// anywhere. My users don't need that, and it has reduced spam to 0 again. And I don't care for the entries there anyway. ;-)
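The filter really is as trivial as it sounds. A minimal sketch in Python (not the code my guestbook actually runs; the function name is made up, and matching case-insensitively is an extra assumption on top of the rule described above):

```python
def is_acceptable_entry(text):
    """Return False for any guestbook entry that contains 'http://' anywhere.

    Lower-casing first (so 'HTTP://' is caught too) is an assumption of this
    sketch; the rule itself is just "reject anything with http:// in it".
    """
    return "http://" not in text.lower()


if __name__ == "__main__":
    for entry in ["Nice page, thanks!", "Cheap pills at http://spam.example/"]:
        print(entry, "->", "accept" if is_acceptable_entry(entry) else "reject")
```

Note that this only catches http:// itself; an https:// link or a bare domain name would still pass.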

[category: /en/xml | Permalink]