Vitavonni

Sat, 28 Aug 2010

Facebook closing up for third party developers?

It looks a lot as if Facebook is closing up on third party developers. Seems like they've cherry-picked the features they want to copy, and are now no longer interested in third parties:

  • They removed the "publisher" feature, which allowed applications to offer custom story forms. They said it wasn't used a lot - the reason was simple: they placed third party publishers in a tiny dropdown in the "attach" section, second class to their link sharing and even their crappy 'gift' application. Of course: otherwise nobody would buy their birthday "gifts".
  • They're removing the extra info boxes you could put into your profile. This is a key feature for two of my applications, which allowed people to publish a calendar feed in their profile.
  • Places is aiming at killing FourSquare and similar services in the long run.

Expect once-famous Facebook applications such as SuperPoke to disappear in the long run - their integration hooks have been pulled. The API gets more and more limited to the cash cow use case: games. And of course marketing. Oh, I forgot one type of "applications". The one that basically embeds your regular web page within a Facebook tab. Wow. That's technology!

If it was really about the users, Facebook would seriously fight Facebook spam, worms and fraud. They still don't have that under control (and none of them used either of the removed features!). They do even less to prevent users from clicking on bad links such as the Free iPad scam, which has been around for at least a week now. Or the famous "dislike button" scam. They could not spell out much more clearly that they do not care about their users - they just want your data to sell more targeted advertising to you. Forget about privacy, too!

Now that all three of my Facebook applications are pretty much dead, this comes with a good side for me: I've been considering disabling/suspending/killing my Facebook account for quite some time. But I wasn't sure if that would also kill my applications for their users. Now that they're dead anyway, there is not much keeping me anymore.

[category: /en | Permalink]

Thu, 22 Jul 2010

Dropbox revisited

One or two years ago, I got myself a free Dropbox account. I use it occasionally to exchange files with some friends. It does an excellent job at that. Basically, it works like any network folder. Except it is not within a local network, but over the internet.

Dropbox serves me as an occasional replacement for both email and USB pen drive. Instead of sending a modified file by email or carrying around a dozen photos on a USB stick, I just put them into the shared folder. This is also very useful for files such as flyers and templates. They may easily grow too big to be sensibly sent by email.

For all my use, a regular free Dropbox account is more than enough. As in, I have less than 1 GB in use, and I could easily free some of that space. You start at 2 GB, and can currently get up to 10 GB free (+250 MB for each referrer; if you use the link above you'll already start at 2.25 GB). But for my use, this is already a lot.

Note that there are a couple of alternatives around. For Linux (only) there is Ubuntu One, and there are Zumo Drive and Box.net, just to name a few. I haven't tried them, given that Dropbox works well enough for me.

[category: /en | Permalink]

Mon, 14 Jun 2010

Facebook clickjacking (aka: Facebook worms)

It doesn't look like Facebook is getting its clickjacking issues under control. The various so-called "Facebook worms" keep on reappearing every couple of days on new pages. Users don't get the concept of a "hidden like button" causing these issues, and Facebook obviously doesn't want to shut down the "like" functionality, since it will make them the ultimate heavyweight on the advertising market: which other company can give you such detailed demographics as Facebook can for your web sites?

So far there seem to be two protections available:

  • The NoScript extension for Firefox includes a clickjacking warning
  • Blocking the Facebook "like" functionality via web filters (which I recommend for privacy reasons anyway) including Firefox AdBlock (Note: Chrome/Safari/WebKit AdBlock just hides, doesn't block!) and Privoxy

Maybe the Antivirus companies should step up here, too - and on one hand, block the Like function to stop this worm from spreading, and on the other hand, prevent Facebook from spying on their users.

It's already making the news that Facebook doesn't get these issues under control.

[category: /en/web | Permalink]

Sun, 30 May 2010

BPM tap toy update

I've updated my BPM tap toy, which had started behaving strangely after some GTK update. I've identified two issues, and it is working again now.

The BPM tap toy is a tiny Python script to obtain a BPM estimation by tapping on your mouse or keyboard. In contrast to many other such applications it does give you an error estimation and will visualize your precision in tapping. This way, you can judge how precise the value is.
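A minimal sketch of how such an estimate with an error margin could be computed from tap timestamps. This is not the actual script: the function name `estimate_bpm` and the use of the standard error of the mean tap interval are assumptions made for illustration.

```python
import math
import statistics


def estimate_bpm(tap_times):
    """Estimate BPM and a rough error margin from tap timestamps (seconds).

    Hypothetical helper for illustration; not taken from the actual tap toy.
    """
    if len(tap_times) < 3:
        raise ValueError("need at least three taps")
    # Intervals between consecutive taps, in seconds.
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    mean = statistics.mean(intervals)
    # Standard error of the mean interval: shrinks as more taps come in.
    sem = statistics.stdev(intervals) / math.sqrt(len(intervals))
    bpm = 60.0 / mean
    # First-order error propagation: d(60/x) = 60/x^2 * dx.
    bpm_error = 60.0 * sem / (mean * mean)
    return bpm, bpm_error
```

Perfectly regular taps give an error of zero; sloppy tapping widens the margin, which is the kind of feedback that lets you judge how much to trust the value.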

I have some ideas to further increase precision while adding another useful visualization to judge the result quality.

[category: /en | Permalink]

Wikipedia contributors are one of a kind

Over the last few weeks, I've occasionally been contributing to Wikipedia. I must say, Wikipedia is a really odd world of its own. Another thing I found quite surprising is the difference between the English and the German Wikipedia.

Some things I noticed:

  • Some information in Wikipedia is grossly inaccurate. Interestingly, this even applies to computer science topics as simple as database indexes.
  • A lot of the time, you meet users that 'defend' 'their' pages against any changes, as if they were the sole expert on the subject
  • There appear to be plenty of zealots^W users with their life focused on deleting things from Wikipedia they do not consider "notable". This goes as far as the German Wikipedia having an initiative called "Improve, don't delete" or the "Article Rescue Squadron" in English Wikipedia
  • English Wikipedia suffers a lot from link spam, despite "nofollow". In 'hot' (commercially, not on Wikipedia) topics such as "knowledge management" (yes, the "knowledge management" article on Wikipedia is crap, despite Wikis often being called "knowledge management"), half of the edits are link spam and reversals. There also is a lot of bookspam and citation spam, resulting in badly written articles where half of the article is unsorted and only partially relevant literature. In German Wikipedia, this seems to be a lot less of an issue.
  • In general, German Wikipedia seems to be a lot more patrolled against changes (in particular links), but often is both less complete and less accurate.

So overall: keep away from Wikipedia contributors, they're all maniacs. And often, don't bother to read a Wikipedia article if you can get an appropriate textbook. The Wikipedia article will just try to sell you a dozen textbooks anyway; you'll also have to read them to check the validity.

It is a pity that, despite its size and "eyeballs", Wikipedia so far seems not to have attracted much attention from actual "domain experts"; instead it seems to be largely filled with bureaucrats, zealots and promoters. (Who don't have any real work to do?)

There are many things wrong here that have been pointed out by many others, too. I'm not going to rescue the Wikipedia world, either. And yes, I am aware that you can discuss things in Wikipedia, too.

  • Good articles get more contributions. Nobody cares for the black sheep. It is probably just much more gratifying to turn a good article into an excellent one than to turn a bad one into an okay one.
  • Wikipedia at the same time tries to be exhaustive and maintain a certain "notability". This ends in people creating articles (because they're missing) and others then deleting them (because they consider them "not notable"). This is very discouraging to users that might even be experts on that subject they have been adding. (See also "the rich get richer" in this blog rant)
  • New articles get extra attention, and often are attacked quickly for either quality or notability (despite WP:IMPERFECT), while articles that have been bad for years are barely re-checked. The categories listing Articles that need expansion and Articles that need cleanup are getting out of control, at 58,000+ articles for the latter. Articles "lacking sources" is at 340,000+ articles. That is 1 out of 10.
  • In fact, Wikipedia even has 138 "cleanup categories". Did I mention it is run by bureaucrats, not by domain experts?

I fear that Wikipedia will go the DMOZ road. There was a time when DMOZ was doing quite well. Nowadays, large parts of DMOZ are dead. For two main reasons: it's hard to get in, and the backlog is way too large. If you get into DMOZ for a larger category, you'll be faced with thousands of pending link submissions, where for a large part you don't feel qualified to judge their appropriateness or rewrite their description in a neutral manner. I have the impression that the same is happening to Wikipedia: on one hand, users that join are often kicked badly for many of their first contributions and will just leave again. At the same time, many of the old articles are in desperate need of attention, but none of the established users is willing to spend the days of cleanup/rewriting needed to get the article into a useful shape again. And a new user will never dare to discard most of the existing article; they usually just add or modify single paragraphs to see what happens. So Wikipedia might be hitting some kind of barrier.

Still, I have to admit that I frequently use Wikipedia to look up things. Usually because it just comes up first in Google. I then often follow links to better resources, such as MathWorld. And I wish Google had taken me there right away ...

And don't assume I'd know how to run things better. I'd sure propose to spend more time at fixing existing articles instead of attacking new contributions that much (you'll lose contributors this way). But I also see the need for fighting spam (although you should also remove old spam ...). But I don't know if there is a solution that will actually attract domain experts to re-write all the badly written and inaccurate articles that don't have their personal zealot to patrol them.

P.S. Sorry, no comments on my blog. This isn't Facebook. Instead of commenting on my blog, how about working on Wikipedia's backlog instead?

P.S. Another example is the German Wikipedia story surrounding Fefe's blog. Many in the open source communities will know Fefe for his work on Dietlibc, libowfat and similar highly respected open source projects. His blog has been famous for being high-quality in security, privacy topics, politics and media criticism. Some of his fans (likely) started a Wikipedia page on his blog. There have been at least two huge discussions about deleting the Wikipedia page. "Sock puppets" and all such things have been brought up, while Fefe himself was amused. The discussions around deleting the blog's page on Wikipedia made it all the way to the print newspapers. As I said, I'm not actually bringing up new criticism of Wikipedia. It has been an ongoing problem for years now.

[Insert random Deletionism rant against Wikipedia]

[category: /en | Permalink]

Tue, 25 May 2010

Facebook and privacy

Mark Zuckerberg apparently recently promised

we will add privacy controls that are much simpler to use.

Dear Facebook. I believe your users want more privacy, not just prettier controls. They don't want you to give away their data by default, track them across web sites (that embed the "Like" buttons) and auto-connect them to various things.

In particular, they want opt-in, not opt-out. And they don't want to be auto-opted in by you either. Make all the defaults "opted out"!

They don't want you to index and publish their data by default as if you were a paparazzo and they were a pop star. Sure, they'd like to be a pop star, but without the paparazzi stuff, you know.

There is an essential difference between actually caring about privacy, and just making it look nice (which unfortunately is what Facebook currently does, pretend it's all fine, because you could turn it off, if you go to all the sites you don't want and turn it off on each single one ...).

[category: /en | Permalink]

Tue, 18 May 2010

Apple announces new iPad

In a surprise move Apple today announced a new product closely related to the much-hyped iPad. This move came as a complete surprise to technology analysts, who were expecting a new iPhone dubbed "iPhone 4G" to be released next.

The new product is called iPad mini, and features most of the iPad functionality in a lot smaller form factor, with extended battery life and without the overheating issues. It will focus on gamers, instead of the deprecated community of newspaper readers. The smaller form factor allows the users to bring the device along at any time.

Get all the specifications of the new iPad mini here [www.apple.com]. There is also a special edition with voice-over-3G functionality [www.apple.com].

Yes, I'm referring to the iPod touch and the iPhone. Just goofing around.

[category: /en | Permalink]

Sat, 15 May 2010

Beware of the "startpar" bug!

UPDATE: the bug is already fixed after a few hours, and only affected a minority of users (of a now deprecated, experimental option in the 'unstable' distribution, and only users that rebooted with the affected version).

The sysvinit version that hit unstable today has a grave bug if you have been running "startpar" or maybe "shell" style parallel booting. Read this bug report, if you have been using these (they were not enabled by default, so unless you've been giving parallel boot a try before, you should be ok.)

How to check if you are affected:

grep CONCURRENCY /etc/default/rcS
If this command says "startpar", then you ARE affected. If it says "shell" you MIGHT be affected. If you have not set CONCURRENCY or if it's "none" or "makefile", then you should be ok (according to the bug).

The workaround is simple: just put either "none" or "makefile" in there; these are the only two values that are still distinct.
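The check above can also be sketched as a small script. This is a minimal illustration, assuming the setting appears as a plain `CONCURRENCY=value` line; the function name `concurrency_status` is hypothetical, and an unset value is treated like "none".

```python
def concurrency_status(rcs_text):
    """Classify the CONCURRENCY setting found in the text of /etc/default/rcS."""
    value = "none"  # assumption: an unset value behaves like "none"
    for line in rcs_text.splitlines():
        line = line.strip()
        # Commented-out lines start with "#" and are skipped by this test.
        if line.startswith("CONCURRENCY="):
            # The last assignment wins; strip optional quotes around the value.
            value = line.split("=", 1)[1].strip().strip("\"'")
    if value == "startpar":
        return "affected"
    if value == "shell":
        return "maybe affected"
    if value in ("none", "makefile"):
        return "ok"
    return "unknown"
```

It could be called as `concurrency_status(open("/etc/default/rcS").read())`.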

How to recover a broken system:

  1. Boot recovery mode aka "single-user". At some point you should be asked for the root password. Login.
  2. Run mount -o remount,rw / to enable write mode on your disk.
  3. $EDITOR /etc/default/rcS and change the value of "CONCURRENCY"
  4. reboot
You should have a working system again.

I can only confirm that changing "startpar" to "none" helped me. I haven't tried "makefile" yet, and "none" seemed more likely to fix things.

How to block "Facebook Like" tracking in Chrome

Since AdBlock in Chrome does not block but only hides (the same probably applies to Safari and other WebKit-based browsers), here's a simple method to actually block Facebook Like tracking:

Use a proxy.pac file, also known as Proxy auto-config. Then redirect Facebook Like to a blackhole or filtering proxy. I use privoxy, and this replaces Facebook Like embeds with an error message, which enables me to see which sites use Facebook Like and that my filter is working.

Here's the relevant excerpt of my proxy.pac file:

function FindProxyForURL(url, host) {
  var fblike = /https?:\/\/([^/]*)\.facebook\.com\/(plugins|widgets)\/like.*/;
  if (fblike.test(url)) {
    return "PROXY 127.0.0.1:8118";
  }
  return "DIRECT";
}
Where "127.0.0.1:8118" is the proxy to use. If you use an unreachable proxy - I've seen 255.255.255.0:3421 used as blackhole server - then it should just time out as "unreachable". Or you use a proxy such as privoxy and block the URL there. Any proxy that refuses to serve the request will do.

Note that you can add arbitrary domains and regexps to this filter, if you want to block additional sites, such as Google Analytics, that you do not want to be able to track your surfing behaviour.
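Before extending the filter, it can help to check which URLs a pattern actually catches. Here is a small sketch that replays the regular expression from the excerpt outside the browser; the helper name `would_be_proxied` and the sample URLs are made up for illustration.

```python
import re

# Same pattern as in the proxy.pac excerpt, written as a Python regex.
FB_LIKE = re.compile(r"https?://([^/]*)\.facebook\.com/(plugins|widgets)/like.*")


def would_be_proxied(url):
    """Return True if the URL would be sent to the filtering proxy."""
    # JavaScript's RegExp.test() matches anywhere in the string, like re.search().
    return FB_LIKE.search(url) is not None


for sample in (
    "http://www.facebook.com/plugins/like.php?href=example",
    "http://www.facebook.com/widgets/like.php",
    "http://www.facebook.com/profile.php",
    "http://facebook.com/plugins/like.php",
):
    print(sample, would_be_proxied(sample))
```

Note that, as written, the pattern requires a subdomain such as "www." in front of facebook.com, so the last sample URL is not caught.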

[category: /en/web | Permalink]

Tue, 04 May 2010

Facebook tracking users via "like" function.

Facebook recently launched the "Like" function, which basically can be embedded into arbitrary web sites. Naively, it does two things:

  1. Provide a "Like" button for sharing the web site
  2. Show you how many other users like the web site
Sounds good, doesn't it?

But now reconsider: even when you don't use the "Like" function, Facebook is in fact notified of which web sites you visit!

Encouraging you to "share" content with friends is the hook for this function. This is what makes web masters install it on their sites: they expect to get some extra traffic from your friends, so they just add it.

But whether you like it or not, it basically allows Facebook to track your complete web viewing habits. And it's the target web site that opts in, not the user! Combined with all the personal information Facebook already has on you, this is a major privacy concern. Combining this information might even be illegal in some countries (but probably not in the US, where Facebook lives; privacy unfortunately plays only a minor role there).

The best workaround currently is to blacklist Facebook's "Like" function using some kind of AdBlock, for example using this element filter:

IFRAME[src^="http://www.facebook.com/plugins/like.php"]

But in general, we should try to make this kind of data aggregation illegal without explicit consent and force Facebook to make this an opt-in feature. Political work needed here ...

P.S.: Make sure to check if your AdBlock actually blocks and not just hides. As far as I can tell, WebKit AdBlock, including Chrome's, only hides ads. Firefox's Adblock Plus seems to be more powerful.

P.P.S.: yes I've read the claim that Facebook doesn't track. No wait, all they basically said was that they are not going to announce at F8 they will be selling web surfing behaviour based ads to their customers. They actually did NOT state (or guarantee) they will NOT use data mining on this data. Just that you probably will not be able to buy eyeballs based on rules such as "has visited/liked disney.com" ...

P.P.P.S.: I've been told that facebook.com/widgets/like.php also needs to be blocked, since some sites use this URI scheme. And of course, Privoxy and similar privacy-increasing proxies are a useful addition, too.

[category: /en | Permalink]

Fri, 30 Apr 2010

Oil spill: fortunately it was "just oil" - and not nuclear.

Face it: technology fails. Sometimes. Sooner or later.

Statistics are a nice way to fool people, because they can't interpret the numbers properly. And usually this is done to make things sound less risky than they really are. But it's just too easy to lie with numbers. Say, the risk of a major incident at a plant is less than 1% (per year). Sounds good, doesn't it? But if you have 100 plants, that means it will happen about once a year. Make that 1000. Or even more.

There are currently around 440 nuclear reactors in use for electricity. Probably some more the IAEA doesn't know about. There are definitely extra reactors in various military vehicles including unmaintained Russian submarines. So make that 500. Looking ahead over the next 200 years, that adds up to 100,000 reactor-years, so even a risk of just 0.001% per reactor and year means about one major incident. Unless our atomic plants are way more than 99.999% safe (per year), we're likely to see a major nuclear accident sooner or later.
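The arithmetic behind these numbers can be sketched as follows, under the simplifying assumption that every plant-year is an independent trial with the same risk; the function names are made up for illustration.

```python
def expected_incidents(risk_per_year, plants, years=1):
    """Expected number of major incidents over the given plant-years."""
    return risk_per_year * plants * years


def prob_at_least_one(risk_per_year, plants, years=1):
    """Probability of at least one incident, assuming independent plant-years."""
    return 1.0 - (1.0 - risk_per_year) ** (plants * years)


# 1% risk per plant and year, 100 plants: about one incident per year.
print(expected_incidents(0.01, 100))
# 0.001% risk per reactor and year, 500 reactors, 200 years.
print(expected_incidents(0.00001, 500, 200))
print(prob_at_least_one(0.00001, 500, 200))
```

In both scenarios the expected count is about one incident, and the chance of seeing at least one is roughly 63%.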

Which brings us to a totally different scale when we look at photovoltaics or wind power. There are just so many more of these plants that you can bet on some of them failing completely every day. The good news about these plants is that their damage potential is just so much lower. They might be as much as a thousandfold more likely to fail than a nuclear plant. And there are probably a thousand times more of them, too. But even if we completely underestimate the risk, they just don't do as much damage even in the worst case. Much unlike a nuclear meltdown or such an oil spill. I bet no one would have thought of oil drilling being this risky. After all, it has been done for years.

So how is your risk assessment of nuclear power? Is it less risky than oil drilling?

[category: /en | Permalink]

Mon, 12 Apr 2010

Removing modlogan

Unless someone drops in as new maintainer, I'll file for removal of ModLogAn from Debian soon.

The software has been abandoned upstream for, well, a couple of years. It still works okayish (just the patterns need refreshing), and in fact I'm still running it. But there is plenty of software to replace it, and it seems as if many people go the Google Analytics way today.

Please speak up quickly if you care about ModLogAn, otherwise it's gone from Debian soon.

Mon, 01 Mar 2010

Geo-Temporal visualization

I've been playing around a bit with Geo-Temporal visualization. Here's a screenshot of an experimental visualization on Google Maps:

[Screenshot: Geo-Temporal visualization]

The icons are placed on approximate coordinates; multiple events in a small area are aggregated into a single marker. The red sectors correspond to temporal information: to the right is the current day, and a full turn corresponds to a duration of 7 days. Typical events listed on this map cover 1 to 4 hours in the evening of a day, resulting in rather small sectors at typical angles corresponding to the seven days of a week. There are three larger events, one being a weekend workshop in Hamburg (covering the Saturday and Sunday sectors), a Friday to Saturday in Leipzig and an event incorrectly set for all of Tuesday in Dresden. München on the other hand seems to take a day off on Saturday (in fact they have a full-week workshop on Lanzarote, on a part of the map not shown ...).
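The mapping from time to sector angles can be sketched like this. It is only an illustration: the function name `sector_angles`, the choice of 0 degrees for the start of the current day, and the direction of rotation are assumptions, not details of the actual map code.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def sector_angles(start, end, today):
    """Map an event's time span to (start_angle, end_angle) in degrees.

    Assumptions for this sketch: 0 degrees is the start of `today`
    (pointing right) and angles grow proportionally with time,
    so 7 days make a full 360 degree turn.
    """
    def angle(t):
        # Fraction of a week since `today`, wrapped into [0, 360).
        return ((t - today) / WEEK) * 360.0 % 360.0

    return angle(start), angle(end)


# A four hour evening event on the following day.
today = datetime(2010, 3, 1)
print(sector_angles(datetime(2010, 3, 2, 19), datetime(2010, 3, 2, 23), today))
```

A four hour event covers 4/168 of the circle, a bit under 9 degrees, which is why typical evening events show up as thin slivers. Events that span the wrap-around point end with a smaller angle than they start with and would need extra care when drawing.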

While this visualization is quite fancy and can scale to arbitrary time windows, I will not be able to add it to the public version of this map (which can be tried out on http://swing.vitavonni.de/).

The rendering of so many polygons with Google Maps is just way too slow for all the browsers I tried. Maybe I could use cached PNG images and traditional overlays instead to improve performance.

For some visualizations, it would also make sense to turn the sectors into a spiral, for example where the angle corresponds to the day of the month and the distance from the center corresponds to the month.

[category: /en/web | Permalink]

Mon, 01 Feb 2010

Maps-Calendar Mashup

Well, I'd not call it a Mashup - it's actually backed by a custom database, a Xapian index for full text search and so on. To me, a true mashup would work without its own server-side code.

Anyway, what it does is this:

  • It gathers data from two dozen Google Calendars for the next few weeks
  • Geocodes them and does full-text indexing (including the Geo information, so you can search for that, too.)
  • Applies some magic formatting to the calendar data (making links clickable, allowing some basic styling)
  • Pushes them as KML feed to a Google Maps application
  • Markers on the map are aggregated and "clustered" (which is a simple proximity merging, which I wouldn't call clustering)
  • The map is pre-centered and panned using Google geo-location information on the visitor to show the best region with hits. So if you are in a city I have data for, it shows you your city. If you are in the US, it should try the whole US next, then fall back to the whole world.
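The "simple proximity merging" of markers mentioned in the list can be sketched as a greedy pass over the points. This is an illustration only, not the code behind the map; the function name `merge_markers` and the planar distance are assumptions.

```python
import math


def merge_markers(points, max_dist):
    """Greedily merge (x, y) points closer than max_dist into shared markers.

    Returns a list of (center_x, center_y, count) tuples. Coordinates are
    treated as planar (e.g. pixels at the current zoom level), an assumption
    made to keep the sketch short.
    """
    clusters = []  # each entry: [sum_x, sum_y, count]
    for x, y in points:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            if math.hypot(x - cx, y - cy) <= max_dist:
                # Close enough: fold the point into this marker.
                c[0] += x
                c[1] += y
                c[2] += 1
                break
        else:
            # No marker nearby: start a new one.
            clusters.append([x, y, 1])
    return [(sx / n, sy / n, n) for sx, sy, n in clusters]
```

The result depends on the order of the input points, which is one reason not to call this real clustering.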

It's using the Maps V3 API, currently in public testing, which seems to give quite some extra speed compared to earlier versions. I've also added two extra controls, a search box at the top center, and a "Go to" menu on the left, which uses the visitor position from Google.

The data is coming from swing dancing calendars, so it's real world data, and you should get different results every day. Most of the data is from Germany, so that is where you can see the marker aggregation and these things in effect.

Here's the prototype.

There are still lots of things to do, but this is just my free time project, when I'm not at work, dancing or with my friends.

  • Timezones need to be handled right. So I don't give you any guarantee that any event has the correct time shown. I believe it only works right for events in CET and visitors in CET right now. It's okay: I haven't thought about how to handle time zone differences yet, and the JavaScript Date API is worthless anyway, so that requires quite some effort.
  • Events in info windows aren't sorted by time yet
  • Info window UI is bad, need to do pages there
  • There is no way of choosing a different time query window except the next week. The backend does this already, it's just not in the UI.
  • There is no "temporal navigation" tool
  • No list view (well, actually there is one, but that is the old version)

I don't know yet if this will remain online, it's more of a toy project for me. Still it's cool to see where there are swing dancing events, and it's cool to be able to just zoom to another city and see where you could "hop by" for a dancing event while you're there. But there are just a lot of UI issues to solve to get this really usable, and I'm not much of a UI guy...

P.S. if it doesn't work, that probably means I'm currently working on it. There is no staging, and no "production system".

[category: /en/web | Permalink]

Fri, 22 Jan 2010

Sun Java - happy 9th birthday, user-affecting rendering bug.

It seems that Sun doesn't care much about getting bugs fixed in Java.

This bug for example causes rendering artifacts in Apache Batik, and is very visible with many SVG files. It causes circles to be rendered as approximated diamonds. It was reported 9 years ago (the first time; there are duplicates).

I understand both that there are more important bugs and that one must avoid introducing new bugs when fixing bugs. But there should be few dependencies on a broken circle rendering routine, so please just fix this cosmetic bug, too. One of the reports is even marked "Fix understood" ...

A more important issue with Sun Java (known since 2005) is this bug, which effectively breaks Java IPv4 networking on Debian unstable now (which recently changed the IPv6-to-IPv4 fallback behaviour). So far, Sun has rated this as "request for enhancement". WTF?

Sure, you can work around the bug easily - change /etc/sysctl.d/bindv6only.conf to use the value of 0 instead to re-enable IPv4 fallback - but after all, IPv4 networking is pretty much an essential Java feature.

[category: /en/linux | Permalink]

Sat, 09 Jan 2010

Facebook Scam Groups

Facebook seems to have little interest in protecting its users from a huge flow of common scam/spam. Sure, they do get active when accounts are mass hacked, and I haven't seen a "Facebook virus" for some time. Their JavaScript filtering is pretty neat, and they have implemented dereferrer pages they can use to quickly stop URLs from spreading.

However, some of my friends keep on joining very dubious groups and installing very dubious applications. No wonder "FarmVille" is sometimes nicknamed "ScamVille". There still is a lot of money to make in dubious ways.

The big problem with Facebook is that everyone can set up groups and applications that look like they might be real. This is why people keep on installing "Mafia wars gifts" applications that have nothing to do with the actual game except the name. And sometimes they don't even realize they don't actually get these gifts in the real game.

Even worse are the "pimp" groups. It's a classic pyramid scheme. Invite all your friends to the group, then you get extra Mafia points. Facebook really needs to stop that.

A quick search for "invite proof" - these groups usually require you to post "proof" of having invited all your friends - turns up 246 groups, almost all of which promise you Mafia stuff.

Searching for "getElementsByTagName" in Facebook turns up "over 500" groups. This string is a JavaScript command commonly used to auto-invite all your friends to a group. A typical mass-spread group will use this in its "join instructions".

Facebook needs to combat this kind of spam/scam. And it's not too hard. Just actually check user complaints/reports, do simple searches like the ones I posted above, and have some employee go through them and just delete all these dubious mass-join groups. Pyramid schemes likely violate the Facebook TOS, and they definitely are illegal in at least Germany.

[category: /en/web | Permalink]

Mon, 28 Dec 2009

Enigma in Debian

Enigma is a great game, with a unique mixture of puzzles with mouse skills and action. If you know the discontinued game Oxyd, originally on the Atari ST in the 90s (also on Amiga and one version on DOS), then you know the principle of Enigma. Except that it has tons more levels and is Open Source.

Some weeks ago, I uploaded a 1.10 pre-release (approximately milestone 5) to Debian experimental. This is the soon-to-be-released new version, using a new level file format (with a much extended API to make level development even easier, ~50% less code per level now), new levels (of course), updated graphics (including support for new graphics modes), ...

Unstable still contains version 1.01; the reason is simply that I knew there would be another 1.01 maintenance release coming. However I believe it doesn't offer much over the current unstable version; it largely marks an upstream release containing patches already in the Debian package (since communication with upstream is really good).

So I have now two choices: refreshing the Debian unstable package to the "probably last" 1.01 release upstream, or going straight for the 1.10 milestones to give enigma some extra testing.

Sat, 26 Dec 2009

Duplex on HP OfficeJet Pro 8000 seriously messed up

My parents needed a new printer, and after some research I decided to recommend them an HP OfficeJet Pro 8000. Today I gave it a try, by printing some CD covers for a CD to give away for Christmas to some friends.

HP failed in a very subtle way: I had printed the covers, cut them, produced the CDs for them. Then I wanted to put the printed covers into the CD cases.

Despite the graphics being 12cm x 12cm in size, HP managed to print them in 12cm x 11.4cm. Without any notice (or giving me a choice) it had decided to scale them on the y axis. Which makes them completely unusable, since they don't fit the 12cm height of the CD case now.

After some more experiments, I decided to retry without duplex, and voila: 12cm x 12cm.

Duplex on HP OfficeJet Pro 8000 is only usable for draft printing, since it will distort your pages!

(See also this evidence in the HP forums of people with the same issue, an attempt to investigate the margin mess-up happening, and a report that the DJ990c driver can print duplex on this printer without messing with the margins, but is slower and offers less print quality. So it seems that this is an HP driver problem. And technically, it must be caused by the driver; at least it should be able to compensate for this!)

I also noticed another issue with the print. The bottom right corner of the graphic didn't get enough ink, it looks like the printer stopped printing a bit too early. I don't know if this also happens in non-duplex, since I worked around this by adding a header and footer to the page.

Seriously, we should send back the printer. On my first try to use it, I already encountered two bugs. I wonder how many bugs I would see if I'd use it every day?

[category: /en | Permalink]

Fri, 25 Dec 2009

Media Players

Somehow, I'm still lacking the optimal media player application. Many popular ones are totally overloaded (e.g. amarok). Others like totem seem to be just a minimalistic frontend for a particular backend.

My current choice:

  • Single-shot playback: to view a random song or video I usually open them with Totem (the GNOME default) and that works okay
  • Library: I use MPD as player because it just seems to be rock-solid. As UI I currently use Sonata, but I don't use it for much more than choosing a song from the current playlist.
  • Editing: ExFalso seems to have the best ID3v2.4 support, in particular it also allows multiple genre fields. (Note that Vorbis even suggests you should use multiple artist fields instead of the common "Artist A & Artist B" way of filling the fields)

However, there is one thing I'm really not satisfied with: when putting together a CD compilation for friends (say, as Christmas present), they are quite useless. A key issue here is the total playlist length. Guess what, I want to make sure it fits on a single CD. So I really need to know the total playlist length. Why do so many media players (e.g. totem, alsa-player-gtk, xfmedia4, vlc, mplayer, ...) not show you the total playlist length? They did read all the files to get artist and title. Many even have the individual song lengths, just not the total sum.
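As a minimal sketch of the missing feature: summing the individual song lengths and comparing the total against the disc capacity is all it takes. The playlist entries, the field names and the 80-minute capacity below are assumptions for illustration, not taken from any particular player.

```javascript
// Hypothetical playlist: lengths in seconds, as a player already knows them per song.
var playlist = [
  { title: "Track A", seconds: 215 },
  { title: "Track B", seconds: 187 },
  { title: "Track C", seconds: 342 }
];

// Assumed capacity of an 80-minute audio CD-R.
var CD_SECONDS = 80 * 60;

// Sum the individual song lengths.
function totalSeconds(tracks) {
  var sum = 0;
  for (var i = 0; i < tracks.length; i++) {
    sum += tracks[i].seconds;
  }
  return sum;
}

// Format a number of seconds as m:ss for display.
function formatLength(seconds) {
  var m = Math.floor(seconds / 60);
  var s = seconds % 60;
  return m + ":" + (s < 10 ? "0" : "") + s;
}

// True if the whole playlist fits on one disc.
function fitsOnCd(tracks) {
  return totalSeconds(tracks) <= CD_SECONDS;
}

console.log(formatLength(totalSeconds(playlist)), fitsOnCd(playlist));
```

A player that already shows per-song lengths has everything it needs for this; the only open question is how reliable those lengths are for VBR files.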

In the past I've been using old XMMS1 to check for the total length, or a CD burning application like K3B by repeatedly importing my current folder.

Right now, I'm using Quod Libet (since I like the tag-editing component exfalso a lot) to arrange the playlist. It also gives me the total length, although I believe I've had incorrect song lengths in it before (broken VBR files?). It's not perfect either: being database-driven, it has really long startup times for occasional users (because of updating the database) and is much more heavyweight. I also believe I've lost some playlists because I had moved my files around once ... so I'm a bit sceptical.

Anyway, there are still hundreds of media players I haven't looked at. Don't bother sending me an email about one I haven't mentioned!

But if you are developing a media player, please consider the use case of putting together a music CD for your friends. In particular, for users that do not use your player all day.

[category: /en/linux | Permalink]

Tue, 08 Dec 2009

Highlighting links to your own site in Google search results

The following User stylesheet snippet can be used to highlight particular search results (such as your own domain, if you want to quickly find it in Google search results):

@-moz-document url-prefix(http://www.google.com/search)
{
a[href^='http://www.vitavonni.de/'] { background-color: yellow; }
}
You might also want to add a copy for your localized Google domain:
@-moz-document url-prefix(http://www.google.de/search)
{
a[href^='http://www.vitavonni.de/'] { background-color: yellow; }
}
Or you could go the heavyweight way:
a[href*='vitavonni.de'] { background-color: yellow !important; }
to even highlight any link to your domain.

This modification obviously only applies to your browser; it's meant to help you find links to your own site more easily.

[category: /en/web | Permalink]

Eclipse TPTP on Debian unstable/AMD64

For a Java project, I wanted to give the Eclipse profiler a try. It didn't work, because it was missing a library (open the "Error log" view to see such things).

The corresponding library - libstdc++-5, an old C++ library - is no longer available in Debian unstable, so you need to grab the package from lenny. It will install fine on unstable.

Things may or may not be different on other architectures.

[Update: But TPTP is far from stable for me. It freezes Eclipse pretty much all the time.]

[category: /en | Permalink]

Sun, 06 Dec 2009

Making pyroman IPv6 capable

I'd like to make pyroman IPv6 capable. That is actually the one big thing before calling it a version "1.0".

I must admit that I haven't been very active on Pyroman (or Debian in general) in recent years. This went so far that "pyroman" was considered "abandoned" by Fedora or so. It is not; I use it on all my servers. It's still in use at the network I developed it for (after all, there is not that much benefit for a workstation setup, where a 10 line iptables script will do the job just perfectly).

Anyway, I'd like to get IPv6 support into pyroman, but there is one big issue here: I don't have any machine using IPv6, so I haven't used ip6tables myself yet, and I don't know about all the magic involved ...

So if you use IPv6, it would be very cool if you would jump in to get full IPv6 support into pyroman. Madduck had already done some preliminary stuff, but I haven't gotten around to looking at the integration or completeness yet.

The '--no-act' and '--print' modes of pyroman should even allow development without any IPv6 support or root permissions in the system.

Other things remaining on my pyroman wishlist:

  • Fully automatic iptables firewall visualization
  • Keeping traffic counters over firewall reloads
  • Configuration UI
  • A fancy 'arsonist' icon and a web page design

[category: /en/linux | Permalink]

Fri, 04 Dec 2009

Tracking outgoing links with Google Analytics

Here's a code fragment to track outgoing links with Google Analytics. As usual, use it at your own risk. I cannot give you support for Google products, for obvious reasons.

To use it, you need to at least understand where to put it (call it in a try-catch in onLoad) and how to adjust the variable name of your page tracker (I'm not using the default).

function trackLinks(){
  var as=document.getElementsByTagName("a");
  var ig=["mydomain.tld","google-analytics.com"];
  for(var i=0; i<as.length; i++) {
    var ignore=false;
    var oc=as[i].getAttribute("onclick");
    if(oc!=null){
      oc=String(oc);
      if(oc.indexOf('urchinTracker')>=0
      || oc.indexOf('_trackPageview')>=0
      || oc.indexOf('javascript:')>=0)
        continue;
    }
    if(as[i].href.indexOf("mailto:")<0){
      for(var j=0;j<ig.length;j++){
        if (as[i].href.indexOf(ig[j])>=0)
          ignore=true;
      }
    }
    if(!ignore){
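      // keep any existing handler; the closure gives each link its own copy
      (function(a, old){
        a.onclick = function(e){
          // "http://example.org/x" is reported as "/out/http/example.org/x"
          var o=this.href.replace(/:\/*/,"/");
          pt._trackPageview('/out/'+o);
          if(old) return old.call(this, e);
        };
      })(as[i], as[i].onclick);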
    }
  }
}

This code tries to attach an onclick handler to any outgoing link, ignoring internal links and links that use JavaScript. If such a link is clicked, it generates a virtual page access with an "/out/" URL that can be analyzed in Google Analytics.

A side benefit (apart from knowing which links are interesting to your visitors) is that you should get more accurate "time on page" statistics for your pages.
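To make the two decisions in the function easier to check without a browser, here is a small sketch that pulls them out into plain functions. The names isOutgoing and outPath are hypothetical helpers introduced for illustration; they only mirror the ignore-list check and the URL rewriting shown above.

```javascript
// Hypothetical helpers mirroring the logic of trackLinks() without touching the DOM.

// True if a link should be tracked: mailto: links always, other links
// unless their URL contains one of the ignored host names.
function isOutgoing(href, ignored) {
  if (href.indexOf("mailto:") >= 0) return true;
  for (var j = 0; j < ignored.length; j++) {
    if (href.indexOf(ignored[j]) >= 0) return false;
  }
  return true;
}

// Turn the link target into the virtual page path reported to Analytics:
// the first ":" and any slashes directly after it become a single "/".
function outPath(href) {
  return "/out/" + href.replace(/:\/*/, "/");
}

var ignored = ["mydomain.tld", "google-analytics.com"];
console.log(isOutgoing("http://example.org/page", ignored), outPath("http://example.org/page"));
```

Since the match is a plain substring test, an external URL that merely contains an ignored name somewhere in its path would be skipped as well.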

[category: /en/web | Permalink]

Sat, 28 Nov 2009

Tracking Google image search in Analytics

I do not really understand why they don't support this themselves, but Google Analytics will not track keywords for Google image search. Instead, such visits just show up as "referrer". A site I'm webmaster for, Swing and the City, gets a lot of image search exposure (funnily enough for an image that has been gone since August; Google needs to work on their index, too), so it was a bit odd to have images.google.com show up as top referrer but not as "organic search".

Here's the code I use to fix this:

var r=document.referrer;
if(r.search(/images\.google/)!=-1 && r.search(/prev/)!=-1){
 var e=new RegExp("images\\.google\\.([^/]+).*[?&]prev=([^&]+)");
 var m=e.exec(r);
 // only override the referrer if the "prev" parameter was actually found
 if(m){
  pt._addOrganic("images.google","q",true);
  pt._setReferrerOverride("http://images.google."+m[1]+unescape(m[2]));
 }
}
pt._addOrganic("maps.google","q",true);
pt._addOrganic("forestle.org","q",true);
pt._trackPageview();

Note that image search is more complicated than the maps and forestle search engines I also add for keyword tracking. The original query is encoded in the "prev" parameter, and the easiest (or only?) way to get working tracking is to use the ReferrerOverride function of analytics.

Note: this is not a straight copy & paste, since I use this code in a compressed and encoded (for injection into the page via DOM ops) form. So no guarantee of syntax completeness. You'll need to adjust it to your variable naming anyway (I use "pt" instead of "pageTracker"). This is just to show you the use of unescape on the "prev" parameter for this purpose.
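The referrer rewriting can also be looked at in isolation. The following is a sketch only: imageSearchReferrer is a hypothetical helper name, the sample referrer is made up, and decodeURIComponent stands in for the unescape call used above.

```javascript
// Hypothetical helper: rebuild a search URL from an image search referrer.
// Returns null if the referrer does not look like Google image search.
function imageSearchReferrer(referrer) {
  var m = /images\.google\.([^\/]+).*[?&]prev=([^&]+)/.exec(referrer);
  if (!m) return null;
  // m[1] is the rest of the host name (e.g. "de"),
  // m[2] is the URL-encoded original query path.
  return "http://images.google." + m[1] + decodeURIComponent(m[2]);
}

// Made-up example referrer for illustration.
var sample = "http://images.google.de/imgres?imgurl=http://example.org/a.jpg" +
             "&prev=/images%3Fq%3Dswing%2Bdance%26hl%3Dde";
console.log(imageSearchReferrer(sample));
```

The decoded "prev" value carries the regular "q" parameter, which is what lets the keyword tracking pick it up once the referrer is overridden.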

[category: /en/web | Permalink]

Thu, 26 Nov 2009

Identifying Link Spammers via nofollow links

I wonder if it's possible to identify link spammers (you know, these bots that mass-submit a link into as many blogs/etc they can find in order to boost their page rank) by the simple measure of how many of the links to their site are marked 'nofollow'.

Say, a regular page should have less than 5% (and less than 20) nofollow links; a site that goes significantly above this value probably employs some spam bot.
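As a rough sketch of that rule of thumb: the two limits come from the numbers above, but how to combine them is an assumption. Here a site is only flagged when both the share and the absolute count are exceeded, which is just one possible (conservative) reading. The function name is hypothetical.

```javascript
// Limits taken from the text; requiring BOTH to be exceeded is an assumption.
var MAX_NOFOLLOW_SHARE = 0.05; // 5% of all inbound links
var MAX_NOFOLLOW_COUNT = 20;   // absolute number of nofollow links

// Hypothetical check: does the inbound link profile look like link spam?
function looksLikeLinkSpam(nofollowLinks, totalLinks) {
  if (totalLinks === 0) return false;
  var share = nofollowLinks / totalLinks;
  return share >= MAX_NOFOLLOW_SHARE && nofollowLinks >= MAX_NOFOLLOW_COUNT;
}

console.log(looksLikeLinkSpam(500, 600), looksLikeLinkSpam(3, 1000));
```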

The only really hard thing is how to avoid attacks on a site using this ... say, I write a bot that spams links to Microsoft on as many sites as it can find that DO use 'nofollow', in order to get that site above the limit, and have Google penalize it.

So in general I don't think Google would automatically penalize such things, still it could be used to e.g. have a human check the destination site for useful content, and then only blacklist when it doesn't seem to be useful.

P.S. Which BTW is a reason why some of the SEO "do nots" are bullshit: it would be too easy to deliberately use these to blacken a competitor. So a 'link farm' will at most do nothing to raise your ranking; but Google must not allow you to actually lower a competitor's ranking by setting up a link farm pointing to him!

P.P.S. On another side note: Who guarantees that Google actually ignores "nofollow" links? They could also just be assigned a lower weight or a penalty, so that a "nofollow" link from a strong site such as Wikipedia would still be worth a lot, while the average blog comments page link goes down to 0. Say a "nofollow" link from a PR 6 site is worth as much as a regular link from a PR 4 site, and PR 2 becomes PR 0. That would already do much of the trick in discouraging the use of blog spam bots. Because after all, ignoring the links on Wikipedia for page rank would be quite stupid. In German Wikipedia, the page contents are even "sighted" (aka: peer reviewed); this is a rather trustworthy source, especially when you take time effects into account. A link that stays in Wikipedia on a popular page for more than a month is very likely good.

[category: /en/web | Permalink]

Wed, 25 Nov 2009

Lost an ext3 filesystem

These days, something happened to one of my external USB drives that I so far only knew from ReiserFS (which I have since called ReisswolFS, a German word play on "shredder" ...). But it's not ext3 that I blame.

Short story of what happened:

  • Resumed the system from 'suspend'.
  • I copied some files onto the first file system.
  • I copied the same files to a second external disk (dual backup...)
  • I copied some files from the first disk, which caused an access-beyond-end-of-disk, mounting the filesystem read only
  • Unmounted the filesystem, started e2fsck
  • Started copying the files from the secondary filesystem
  • Got the same error on the second disk.
  • Cancelled e2fsck to keep it from doing more damage to the first disk.
  • Shutdown and reboot
  • Memcheck, three iterations. Nothing.
  • Checked second disk, no errors in filesystem (!), copied the files I had issues accessing just fine.
  • Filesystem on disk #1 seriously trashed.
  • Had e2fsck try to recover the filesystem on disk #1
  • Pretty much all data on disk #1 is now in lost+found, it seems as if all major folders were corrupted. Lots of corrupted file entries (character devices with random permissions and numbers) there, too.
What I will do now:
  • Reformat disk #1, and restore it from the other backup (Extra backup for teh win! I also have a 3rd copy of about 2 months ago off-site)

As you can see, something was wrong with the system, not with the file system.

I have a strong suspect for what caused this. In case you wondered why I included "resumed from suspend" above: I've been having system stability issues with resume ever since upgrading to the Intel driver 2.9.0 and KMS (Debian unstable+testing) with kernels up to 2.6.31. In about 1 out of 5 resumes, I get an Xorg or system lockup after anything from 1 to 60 minutes. Sometimes I also experience video corruption after a few minutes, trashing some terminal emulator until the next redraw. Just before writing this email I had a typical lockup while scrolling the terminal emulator, which has been a typical trigger for lockups. In contrast, I haven't seen any such crashes (or screen corruption) on a fresh boot.

Freedesktop bug reporting the same issue closed as "not our bug, blame it on the kernel".

Note that the 2.6.32 release candidate changelogs contain many changes for the Intel DRI kernel driver. So the bug might already be fixed in the RC kernels.

Same report in Kernel Bugzilla is still 'NEW' though.

Related bug report in Debian, blaming it on KMS.

[Update: I've disabled KMS and upgraded to 2.6.32-rc8 and not had such a crash since. But I can't pinpoint it to one or the other yet.]

[Update: just tried another external harddisk ...

[305032.148616] EXT3-fs: mounted filesystem with ordered data mode.
[305066.061708] usb 1-8.3.3: reset high speed USB device using ehci_hcd and address 27
[305081.132471] usb 1-8.3.3: device descriptor read/64, error -110
...
[305147.468857] sd 4:0:0:0: Device offlined - not ready after error recovery
[305147.468880] sd 4:0:0:0: [sdb] Unhandled error code
[305147.468886] sd 4:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
...
[305147.473500] WARNING: at /build/buildd-linux-2.6_2.6.32~rc8-1~experimental.1-i386-g1b8iG/linux-2.6-2.6.32~rc8/debian/build/source_i386_none/fs/buffer.c:1159 mark_buffer_dirty+0x20/0x7a()
It seems as if the USB disk stack still doesn't really survive suspends? Let me try on a fresh boot later on.]

[category: /en/linux | Permalink]

Google Wave rolling out?

When I got my Google Wave account, it took the invitation about a week to arrive. A few days ago, I got my first own invites, and invited some colleagues (in an attempt to actually find a use for Google Wave beyond "rich media live messaging"). Within a few minutes they were "in". Now I just got my second set of invites. So is Google Wave now getting ready for mass opening, rocketing user numbers?

As you might have already guessed, I'm not convinced by Google Wave. It's technically interesting and well-done. The demos are all nice. It's just that the UI in the browser is a bit fragile and cumbersome, and the big question so far is:

What does Google Wave allow you to do that you couldn't do before?
To me, there has been little actual use so far. Wave can do everything, but isn't optimal at any of it:
  • You can use it for mail, but it only works with other users of Wave and lacks good offline operation.
  • It beats pretty much any instant messaging in functionality, but the UI isn't well suited to running in the background. Most IM clients have a great UI for "background" operation.
  • Collaborative editing - I prefer having a real editor and real files for that. Check out Gobby for that. I've heard Wave is good for remote brainstorming, though.
  • Social networking, read "facebook". Wave doesn't have all the filtering stuff that Facebook is still trying hard to get useful. Just wait until someone releases "Mafia Wars" for Wave ...
  • Blogs. Sure, I could do a 'Blog Wave' and invite my friends there. Makes sense for small-audience private blogs; not for blogs like mine where I mostly write to people that I do not know.
  • Games. This probably is the current killer app on Wave: Sudoku. Although the (widespread) implementation sucks somewhat. Magnetic Poetry is a nice idea, but doesn't even work in Chrome for me properly ...
  • All the web 2.0 stuff just gets on my nerves. I'm not going to use it for my blog; I by design do not have comments on my blog, either. Being able to web 2.0 everything doesn't make up for a lack in benefits.

Yes, I'm aware that you should differentiate between the protocol and the UI. Still, pretty much everything is currently designed for a web browser with full JavaScript and Flash capabilities.

Of course this isn't the end yet, Google Wave will evolve. Maybe into something cool, maybe it will remain just a niche thing. Maybe some cool apps will just use Wave as protocol. But I figure, I'll mostly wait for these things to happen first before I become a frequent user of Wave.

The biggest thing I see is the "spam" (this especially includes 'Quiz', Mafia Wars and similar Scamville type of 'apps' that surely will show up in no time, once Wave is open to the public). What will Wave provide to me to handle this flood of worthless information that I'm getting more and more?

P.S. Please don't bother to ask for invitations to Wave.

P.P.S. here's how to replace the odd scrollbars with the regular OS scrollbars with a really simple user style (CSS).

[category: /en/web | Permalink]

Thu, 19 Nov 2009

If you are in Bavaria, sign up for the smoking ban vote!

On 01/01/2008, Bavaria introduced a rather strict smoking ban, which also included bars and restaurants. It however contained a backdoor by excluding non-public locations, which led to the creation of 'smoker clubs' where you had to become a member to be admitted. At some point, most clubs were of this kind.

In August 2009, however, the law was changed to exclude beer tents (Oktoberfest ...) and small bars. Many people believe that this was to get votes in the elections in September 2009 (which ended up in a loss of 6-7% compared to the previous election and a historic low for the biggest party).

This caused several organizations to call for a public vote on restoring the smoking ban to the 2008 state (without the 'smokers club' backdoor). In order to force a public vote on a law (without the government's support!), we need 10% of the voters to register as supporters for the vote. You have to register in the town where you are registered as a resident. For Bavaria, this means about 940,000 supporters.

If you are a registered voter in Bavaria, please drop by your municipality and sign up. You need an ID and 5 minutes, that's all. 940,000 supporters is an incredible lot of people to get to the offices, so take along your friends!

When we get enough supporters, the Bavarian government has two options: accepting the changes as proposed (and thus making the initiative obsolete), or conducting a public vote on it, offering an alternative (e.g. the current law, no change) and having the voters decide (which is quite expensive, so if many many people sign up, they might save that money and just pass the proposed change themselves).

For more information (German only), check the Nichtraucherschutz Bayern Website, including the sign up office locations.

P.S. In other European countries, the introduction of a strong smoking ban has led to a 10-15% decrease in heart attacks (20% for non-smokers). The German constitutional court has also already ruled that the protection of non-smokers and employees from passive smoke weighs more heavily than the individual's freedom to smoke in enclosed spaces.

[category: /en/politics | Permalink]

Mon, 16 Nov 2009

DebConf 2011 in Munich

We'd like to host DebConf 2011 in Munich, Germany.

However, this is a far from trivial challenge:

Rent in Munich, in particular for conference rooms, is far from cheap. In my opinion, unless we get some really big sponsor (and I'd still prefer spending sponsor money to fund developer trips to the DebConf instead!), the only chance we have is to get some rooms at the university.

However, given the development of recent years (budget etc.), it has become a lot more difficult to actually get rooms at the university for such events. Unless the event is considered to be fully a part of the university's "work", we might have to pay rent to the university. Which again isn't that affordable.

Anyway, if you are in Munich, working at one of the universities, or in any way interested in supporting DebConf 2011 in Munich, please join the DebConf11 Germany mailing list. Also check our meetings scheduled on the DebianMuc Wiki page, currently every Monday, 18:00, at the new LiMux offices in Sonnenstr.

P.S. There will also be a Bug Squashing Party in Munich end of November: Munich BSP November 2009

[category: /en | Permalink]

Sun, 25 Oct 2009

Facebook tweaks

Every time Facebook changes anything, people complain. Most of the time just because something has changed, without actually knowing what changed.

The October layout change, for example, isn't that big in fact. As far as I can tell it's not much more than turning the "hot" items that were in the right sidebar into a special tab (and breaking the refresh for the live feed, but I guess they'll fix that soon). The "Live" tab is basically all information (see below for getting rid of certain restrictions); the "News" tab tries to reduce this amount of information by only showing you certain posts that Facebook magic considers to be "important". If you are a heavy user you will probably prefer the "Live" feed; if you are a casual Facebook user, go with the "News" feed to have fewer crap posts to read.

Still there are some things you should be aware of when you are a facebook user (not all of these are new):

  • Privacy settings in Facebook. You really should have a look at them. Be aware that if you put them all to the maximum, it may be next to impossible for not-yet-friends to add you. So I recommend leaving the "search privacy setting" at "all" but only having it show your photo and the option to add you as friend. This is enough for people to be able to actually add you as friend. (Setting it to "Friend of a Friend" was not sufficient for me.)
  • Friend lists. These are really useful when you use Facebook for promoting events. Make lists of regional and topic grouping, for example I have a "swing dancers in Munich" list. If you bother people with irrelevant invitations they'll just ignore or unfriend you!
  • Friend lists can also be used as a privacy device. I have a friend list called "privacy" with only those I consider to be really close. Photo markers etc. are restricted to these friends.
  • The "live" feed will not show all your friends. By default it is restricted to 250 friends. The "Options" button at the end of the live feed page will allow you to increase this setting to 5000 and allow you to check which friends' activities Facebook had been hiding from you. (And indeed some people where I wondered if they had stopped using Facebook had just been hidden from the feed by Facebook ...)
  • I wrote a Greasemonkey script (Greasemonkey is a Firefox extension, but by now also available for many other browsers, apparently even IE) to move the filters from the left to the right column, expanding the main feed column. Much more sensible layout IMHO.
  • Avoid using applications where possible, especially all those Quiz applications and games. Remember: you will give the application access to most of your data and most of your friends' data as well. So make sure you can trust that application! (See this Post by the American Civil Liberties Union for details)

Also you should never forget that all the data you put online is hard to get rid of again. Just don't put anything there you don't want everyone to know. Facebook can be really powerful when used right, for example as a promotion channel. But the way you should be using it is to first consider what impression you want people to have of you, then try to present yourself this way. Don't just throw everything that comes to your mind there. (This applies even more to blogs and web sites, obviously, which don't have any privacy control.)

[category: /en | Permalink]

Fri, 04 Sep 2009

Friends update - LiveDash, HoneyWish, Amiando

A short update on some friends of mine.

First of all, Patrick F. Riley - I worked with him on some projects when I was visiting the UC Berkeley, one of which was a predecessor to his latest thing: LiveDash. It's really cool: it allows you to search almost in realtime in TV feeds. It also live-indexes Twitter, blogs, news sources etc.

Secondly, HoneyWish (currently only available in German) is a service for a "honeymoon travel gift list" thing. It works like the traditional gift lists, except that instead of putting all kind of household stuff on it, there are all the parts of the honeymoon trip on the gift list. This makes much more sense these days: people tend to get married later; they might even be sharing a house for some time before getting married. So they don't need much silverware anymore, but they for sure will enjoy their honeymoon trip - so what could be a better gift for them?

Third, Amiando, a web-based ticketing and event management service. Founded some years ago by some friends, it has been growing and coming along nicely. Every now and then it wins an award, many of them in the "top startup" category.

There are of course many more projects of friends I'd like to point out, but these three definitely are highlights.

[category: /en | Permalink]

Thu, 03 Sep 2009

Embedding Flash: don't forget wmode="transparent"

If you are doing a complex web layout (such as my Swing and the City layout which features alpha-transparent fixed layers), and want to embed Flash (e.g. on the Was ist Swing? page - German: What is Swing), make sure you add the attribute wmode="transparent" to your embed tag, and <param name="wmode" value="transparent"></param> to your object. Otherwise, a layer - in particular popup menus - might end up below the flash.

This includes you, YouTube. In HD view, the user popup menu only has the top 3.5 entries out of 5 accessible for me.

The following XSLT stylesheet can be used to find such embeds in a bunch of XHTML files using the command line xsltproc findNoWmode.xslt $( find -iname '*.html' )

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:html="http://www.w3.org/1999/xhtml">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
  <xsl:call-template name="t"/>
</xsl:template>
<xsl:template name="t">
  <xsl:copy-of select="//html:embed[not(@wmode) and (count(param[@name='wmode']) = 0)]"/>
</xsl:template>
</xsl:stylesheet>

You can of course also write an XSLT stylesheet to insert the wmode statements wherever there is none, to make transparent your default.

[Update: I've received comments that this comes at quite a performance cost for Flash, and that this might be the reason why YouTube doesn't use it - in particular for the HD videos. Also, it isn't supported by WebKit-based browsers so far (so not by Safari either?), nor does it seem to be working in Gnash, an open source Flash plugin. So you have to choose between multiple evils if you are using Flash...]

[category: /en/web | Permalink]

Thu, 06 Aug 2009

Swing and the City relaunched

We've opened a completely redesigned Swing and the City web site today. The layout was quite a pain to get working because of transparency and non-scrolling parts. But on my last tests, it was working quite well in all of the major browsers. But if you notice any issue, please tell me (email: erich AT debian org)

I'm aware that the red-yellow border on the left doesn't line up right. I'm waiting for fixed graphics from the designer for that. There is also a glitch with clicking the logo when scrolled just a little bit down. These are on my to-do. At some point I also want to increase the use of CSS spriting to further reduce page load times. Oh, and Internet Explorer sucks, btw.

The web site is about Swing dancing in Munich (so no tech today), and at this time only in German. At a later stage, we might add English, too.

During August we'll also be building our own studio, "Cats Corner", which will actually be somewhat similarly decorated. :-) Congratulations to Christine for doing all that for the Lindy Hop scene!

P.S. Bring Down IE6.com, IE6 No more.com

P.P.S. See this blog post on how it is impossible to use the CSS "clip" property in a way that both IE7 and IE8 will understand. While only one is W3C standard, Firefox just accepts both ... but at least IE8 goes with the official standard now.

[category: /en/web | Permalink]

Mon, 27 Jul 2009

LoOP: Local Outlier Probabilities accepted at CIKM'09

Hans-Peter Kriegel, Peer Kröger, Erich Schubert and Arthur Zimek
LoOP: Local Outlier Probabilities
Has been accepted at CIKM 2009 (The 18th ACM Conference on Information and Knowledge Management), November 2-6th 2009, Hong Kong. And will appear in the conference proceedings published by the ACM Press.

It's an outlier detection method based on LOF (Local Outlier Factor) but a bit more statistically robust and with an easier to interpret score. Given the statistical backing, it works reasonably well on samples such as data pages of an appropriate index structure, reducing complexity to linear for the approximative version.

This publication is a bit special to me: I suggested the approach to my colleagues and they gave me the abstract and title for my birthday. :-)

[category: /en | Permalink]

Tue, 09 Jun 2009

Swing music at Amazon

As many will know, Swing dancing and music have become my big hobby and love. I'm co-teaching classes every week, and of course people ask me where to get some music to dance to.

For this, I'm trying out the Amazon "aStore" functionality. Basically you set up a few categories and add Amazon products to these categories for people to choose from. Of course Amazon will also show other products it considers relevant etc.

My Swing music on Amazon "store" (on Amazon.de, but I guess it will also somehow take you to other Amazon sites?)

The (editor) UI is not very convincing yet. For example, it lacks an obvious way of moving "products" from one category to another, and you can't see more than 9 entries on a page in the editor, reordering is via entering sequence numbers etc. - that definitely could use some improvements.

Anyway, some people might find this useful.

[category: /en/dancing | Permalink]

Mon, 01 Jun 2009

Fun with Wolfram Alpha

Wolfram Alpha was often hyped as the latest and greatest search engine.

I wouldn't call it so. It's just a very minimalistic search frontend to a nice database with lots of numerical facts.

Yes, it can give you the height of the Eiffel Tower (because that's a fact in its databases). It can even compute for you what Pi times the height of the Eiffel Tower is. But that is about as far as you can go in combining. In my tests, I wasn't able to compare the temperature in Munich with the temperature in Berlin (both of which WA will visualize for you with a pretty graph, so these are facts in WA) - their query parser just doesn't get my question.

The funniest reply so far however was to the question:

How many cars in Germany?
The answer of WA (which btw is copyrighted by WA):
No

Seriously, I doubt that there are no cars in Germany.

At least it also offers an explanation why it comes to this conclusion:

Cars is a town in south-western France (which as you might guess currently is not a part of Germany. :-) ) - so for WA, there are at least cars somewhere in Europe, but not in Germany!

[category: /en | Permalink]

Mon, 25 May 2009

If you're wondering why your circle is a diamond ...

... you might be bitten by this Java bug rendering arcs as straight lines at large zoom levels.

It looks like a classic to me: in order to improve rendering performance, you approximate arcs with straight lines at small resolutions (if it's just 2 pixels big, nobody will be able to tell the difference). Except of course, when you end up doing the same approximation at a large zoom value - of course a 100-pixel circle looks different from a 100-pixel diamond.
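To illustrate the general idea (this is not the code used by the Java runtime, just a sketch under my own assumptions): the number of straight segments has to be chosen from the size in device pixels after zooming, for example by bounding the gap between chord and arc. The function name and the pixel tolerance are made up for this example.

```javascript
// Illustrative only: not the code used by the Java runtime.
// Number of straight segments needed so that a polygon deviates from a
// circle of the given on-screen radius by at most tolerancePx pixels.
function segmentsForCircle(radiusPx, tolerancePx) {
  // Tiny circle: four segments (a diamond) are good enough.
  if (radiusPx <= tolerancePx) return 4;
  // The gap between a chord and its arc is r * (1 - cos(halfAngle)).
  var halfAngle = Math.acos(1 - tolerancePx / radiusPx);
  return Math.max(4, Math.ceil(Math.PI / halfAngle));
}

console.log(segmentsForCircle(1, 0.25), segmentsForCircle(50, 0.25));
```

Computing the count from the unzoomed radius instead of the on-screen radius would produce exactly the symptom described: a large circle drawn with the handful of segments that is only adequate for a tiny one.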

Reported in 2005, still not fixed in current Java (we're in 2009 now).

Sun is really slow at fixing Java bugs.

See also a related Apache Batik bug report. Fortunately, this only applies to Java rendered graphics - SVG export, PDF, Postscript are all fine.

[category: /en | Permalink]

Sat, 16 May 2009

Adobe GoLive question

Is there any way to provide an alternate CSS stylesheet for GoLive CS2 only, not for regular browsers? Because there are some things in that layout that are too difficult for the GoLive renderer, it doesn't display them right. The pages are still editable (just plain XHTML), it's just not looking right in GoLive (advanced CSS).

The site already has alternate stylesheets for browsers such as the broken Internet Explorers, so if I could convince GoLive to use their stylesheet it might be looking a lot better in the editor, too ...

I am aware that GoLive CS2 has been abandoned in favor of Dreamweaver. Still, it's going to be used in a project where I help with the web templates.

(Other options would be Kompozer and Amaya, but neither of them seems really fit for production use: Amaya was just removed from Debian because it had some security issues and the maintainers had the impression the code was such a mess that there will be many more such issues. And Kompozer seemed to be a mostly dead branch of a Gecko hack (although there has been a new alpha release this year) ... is there some reliable open source non-source HTML editor that I'm missing?)

P.S. Sorry, no comments in this blog. Use Email: erich AT debian ORG

[category: /en/web | Permalink]

Tue, 12 May 2009

Java hacks: Generics and toArray

Arrays and Generics in Java do not mix very well. In order to create an array, you need to know the object class the array is supposed to store.

Arrays in Java are special: they can efficiently store primitive data types. The expected difference in efficiency between byte[] and Byte[] is pretty big (of course some good VM might optimize) for obvious reasons (think of: references, garbage collection, pointer sizes, ...).

This is probably why you need to know the type before creating an array (because an array of primitive types such as byte will be different from one that stores objects of some kind).

In particular, the following Java code

  String[] foo = (String[]) new Object[0];
results in a run time error ("[Ljava.lang.Object; cannot be cast to [Ljava.lang.String;"). But it gets more confusing when you introduce generics:

public static <T> T[] test() {
  T[] te = (T[]) new Object[0];
  System.err.println(te.length);
  return te;
}

String[] foobar = test();

will print "0", then throw the same run time error in the foobar line.

What happens here is that in the test() method, T actually is replaced with "Object" at compile time. Thus the array type works just fine, and so does the call to te.length. Upon returning, it is then cast into a String[] array and fails.

Now here comes a crazy Java hack:

public static <T> T[] test(T... ts) {
  T[] te = (T[]) java.lang.reflect.Array.
      newInstance(ts.getClass().getComponentType(), 0);
  System.err.println(te.length);
  return te;
}

String[] foobar = test();

The exception is gone, foobar is of the proper type now!
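Why this works can be probed with a small sketch. The class and method names below are made up for illustration. The assumption is that the compiler infers T from the call site and builds the (empty) varargs array with that concrete type, so the component type is available at run time even though T itself is erased.

```java
import java.lang.reflect.Array;

// Hypothetical helper class, only meant to show what the compiler passes in.
class VarargsProbe {

    // Returns the component type of the array the compiler created
    // for the (empty) varargs parameter.
    @SuppressWarnings("unchecked")
    static <T> Class<?> componentTypeOf(T... ts) {
        return ts.getClass().getComponentType();
    }

    // Same trick as test() in the post: build an empty array of the
    // component type that the varargs array carries.
    @SuppressWarnings("unchecked")
    static <T> T[] emptyArray(T... ts) {
        return (T[]) Array.newInstance(ts.getClass().getComponentType(), 0);
    }

    public static void main(String[] args) {
        // T is inferred as String from the assignment target.
        String[] strings = emptyArray();
        System.out.println(strings.getClass().getSimpleName()); // String[]

        // T is given explicitly, so the varargs array is an Integer[].
        Class<?> c = VarargsProbe.<Integer>componentTypeOf();
        System.out.println(c.getSimpleName()); // Integer
    }
}
```

The type information comes entirely from the call site. Whenever the compiler cannot pin T down to a concrete class there, the varargs array falls back to Object[].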

A result of discovering this hack are these two methods:

public static <T> T[] newArrayOfNull(int len, T... ts) {
  // Varargs hack!
  return (T[]) java.lang.reflect.Array.
      newInstance(ts.getClass().getComponentType(), len);
}

public static <T> T[] toArray(Collection<T> coll, T... ts) {
  // Varargs hack!
  return coll.toArray(ts);
}

Notice how elegant the last method looks - and it finally allows you to do toArray(collection) instead of collection.toArray(new WhateverClassTheCollectionHas[0]).

Note that this is still a hack, and may or may not work with all Java compilers, JREs and/or Java versions.

Update: Note that this 'hack' is also not transitive. The context calling toArray needs to know the object type at compile time. So it doesn't save you much more than writing "new KnownClass[0]" etc.

Update: So I'm actually not using this - it's just a hack, and a fragile one. The problem is that when you call e.g. toArray in a generics context, it will actually create an array of "Object", so it makes much more sense to verbosely specify the class you want to use for the arrays (and get some reliability back).
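The two updates can be illustrated with a short sketch. viaGenericCaller and genericCallerFails are hypothetical names, not part of the original code. Called directly with a known element type, toArray yields a String[]; routed through another generic method, the hidden varargs array is an Object[], and the cast at the outer call site fails.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class NonTransitiveDemo {

    // The varargs hack from the post.
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(Collection<T> coll, T... ts) {
        return coll.toArray(ts);
    }

    // Hypothetical generic caller: E is erased here, so the compiler can
    // only create an Object[] for the hidden varargs argument.
    @SuppressWarnings("unchecked")
    static <E> E[] viaGenericCaller(Collection<E> coll) {
        return toArray(coll);
    }

    // Reports whether assigning the result to a String[] fails at run time.
    static boolean genericCallerFails(List<String> names) {
        try {
            String[] broken = viaGenericCaller(names);
            return false;
        } catch (ClassCastException e) {
            return true; // the returned array was really an Object[]
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b");
        String[] direct = toArray(names); // element type known here: works
        System.out.println(direct.length);
        System.out.println(genericCallerFails(names));
    }
}
```

Passing "new String[0]" (or a Class object) explicitly avoids the problem, because the element type then travels with the call instead of being guessed by the compiler.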

[category: /en | Permalink]

Dropbox experiences?

Has anyone experience with Dropbox?

It seems to be an interesting web storage service, with 2 GB of free storage.

However, the Linux client seems to be closed source (which is understandable, it seems to have a lot of neat features) - so I intend to use the web interface only (at least for now).

Update #2: There is an RFP bug for Debian, and some source is on the download site. While this source (except the images) is GPL, it's just the Nautilus integration part, not the daemon you also need.

Did you try Dropbox? Does it work well? I know some people (especially Windows users) who could benefit a lot from a service like that, so I wonder if I should recommend Dropbox to them. Or is there some better alternative? It should allow sharing of files, though. Synchronization is not as essential; it is mostly about exchanging files too large for usual email in small user groups. Still, synchronization is probably a comfortable way of transferring the files without having to think about it yourself.

No comments in this blog - email me via erich AT debian ORG.

P.S. I know there is some referral program to get more storage, feel free to send me your referral link - I'll remove this PS once I've signed up.

P.P.S. There is also Ubuntu One, but as far as I can tell it is Ubuntu-only so far. Looks very similar.

P.P.P.S. So far, I've received a lot of praise for Dropbox.

P^4.S. My own referral link, feel free to use this to sign up (+256 MB for you, too!) and "upgrade" my account.

[category: /en/web | Permalink]