
I've been playing around a bit with Geo-Temporal visualization. Here's a screenshot of an experimental visualization on Google Maps:

The icons are placed on approximate coordinates; multiple events in a small area are aggregated into a single marker. The red sectors correspond to temporal information: to the right is the current day, and a full turn corresponds to a duration of 7 days. Typical events listed on this map cover 1 to 4 hours in the evening of a day, resulting in rather small sectors at typical angles corresponding to the seven days of a week. There are three larger events: a weekend workshop in Hamburg (covering the Saturday and Sunday sectors), a Friday-to-Saturday event in Leipzig, and an event in Dresden incorrectly set to cover all of Tuesday. München, on the other hand, seems to take a day off on Saturday (in fact they have a full-week workshop on Lanzarote, on a part of the map not shown ...).
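To make the sector geometry concrete, here is a minimal sketch of how an event's time span could be mapped to a sector. It is not the code used for the map: the function name eventSector, the choice of 0 degrees for the start of the current day, and clipping to the window are assumptions for illustration, and the direction in which angles grow is left to the drawing code.

```javascript
// Hypothetical helper, not taken from the actual map code.
// Assumptions: angle 0 (pointing right) is the start of the current day,
// and a full turn of 360 degrees covers `windowDays` days.
const DAY_MS = 24 * 60 * 60 * 1000;

function eventSector(startMs, endMs, dayStartMs, windowDays = 7) {
  const windowMs = windowDays * DAY_MS;
  const toAngle = (t) => ((t - dayStartMs) / windowMs) * 360;
  // Clip the event to the visible time window.
  const start = Math.max(0, toAngle(startMs));
  const end = Math.min(360, toAngle(endMs));
  // Events entirely outside the window produce no sector.
  return end > start ? { start, end } : null;
}
```

With a 7-day window, one day spans about 51.4 degrees, so a three-hour evening event covers only about 6.4 degrees, which matches the small sectors described above.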
While this visualization is quite fancy and can scale to an arbitrary time window, I will not be able to add it to the public version of this map (which can be tried out on http://swing.vitavonni.de/).
The rendering of so many polygons with Google Maps is just way too slow in all the browsers I tried. Maybe I could use cached PNG images and traditional overlays instead to improve performance.
For some visualizations, it would also make sense to turn the sectors into a spiral, for example where the angle corresponds to the day of the month and the distance from the center corresponds to the month.
For my research at the university, I've become a lead developer for ELKI, a framework for developing data mining algorithms along with index structures. The goal is, on the one hand, to quickly implement new algorithms (by reusing a lot of code, in particular index structures, parsers, ...), and on the other hand to evaluate the interaction of index structures with different algorithms, distance functions, and so on. The new version 0.3, which adds a lot of new outlier detection methods, will be published at the beginning of April at DASFAA 2010. (Note that this is designed for research and teaching use; it favors code extensibility, readability etc. over maximized performance. You might want to do a rewrite in C for maximum performance, since Java does give you quite a memory and performance overhead in these setups.)
After this release, we will be doing a major redesign of the database and index layer, to allow better comparison of different index structures in parallel; right now it's hard to use more than one at a time, and building e.g. a combined index structure is a larger effort than we'd like it to be. During the process of redesigning the database layer, I'll also be improving the database update or "streaming" API.
If you know of a nice API for streaming databases, please send me an Email to erich -at- debian -dot- org.
Note that I'm looking for a programming API, i.e. "interfaces", not for random data sources such as Twitter. I also really need an API able to model data changes, not just "events" aka "instances". So please don't point me to what Twitter calls a "streaming API". What I'm looking for is a nicely designed API that allows programmers to react to database changes, such as insertions, deletions, updates, bulk operations etc., and update their index structures and algorithm results accordingly. An example would be the "Oracle Streams" API, I believe; MySQL probably has a log (used in their replication hacks) which can also be seen as a database stream. But these are designed around an RDBMS view, and not so much for data mining. Weka/MOA Streams seem to be just event/instance streams, where there is no such thing as a bulk insert or even a deletion. Of course there are many use cases where you will be happy with working on just the previous n instances. The more general case, however, handles arbitrary inserts and deletes, instead of having just inserts (and implicit deletions by an expiry strategy). And yes, of course you can in turn wrap database events as instances into an event stream (with "insert", "update", "delete" events) ...
Yes, I'm aware that this kind of setup was abandoned for what is called a "stream processing engine" (SPE) in many use cases, that do not care about "old" data or deletions in general. We'd like to be able to support both approaches, also to be able to do fair comparisons.
Of course you can also point me to badly done APIs, and explain to me where they fall short for you. We're not so much interested in copying some API; we'd just like to design a good API for people to do research on stream processing in a database context (e.g. index support for streaming data, online algorithms, ...). Or you can write me a mock-up API that you would deem useful.
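To illustrate the kind of interface meant here, the following is a purely hypothetical mock-up, written in JavaScript only for brevity. It is not ELKI's API and not a proposal for it; the class name ObservableStore and the shape of the change objects are invented for this sketch. The point is that listeners see data changes (including deletions and bulk inserts), not just a stream of new instances.

```javascript
// Hypothetical mock-up, not ELKI's API: a store that reports data changes
// (inserts, updates, deletes) to registered listeners.
class ObservableStore {
  constructor() {
    this.objects = new Map();
    this.listeners = [];
  }

  subscribe(listener) {
    this.listeners.push(listener);
  }

  notify(change) {
    for (const listener of this.listeners) listener(change);
  }

  // A bulk insert is reported as ONE change, so an index could bulk-load.
  insertAll(entries) {
    for (const [id, value] of entries) this.objects.set(id, value);
    this.notify({ type: "insert", ids: entries.map(([id]) => id) });
  }

  update(id, value) {
    if (!this.objects.has(id)) return false;
    this.objects.set(id, value);
    this.notify({ type: "update", ids: [id] });
    return true;
  }

  delete(id) {
    if (!this.objects.delete(id)) return false;
    this.notify({ type: "delete", ids: [id] });
    return true;
  }
}
```

An index structure or an online algorithm would register through subscribe and adjust its state per change; a pure instance stream corresponds to the special case where only "insert" changes ever occur.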
Well, I'd not call it a mashup: it's actually backed by a custom database, a Xapian index for full text search and so on. To me, a true mashup would work without its own server-side code.
Anyway, what it does is this:
It's using the Maps V3 API, currently in public testing, which seems to give quite some extra speed compared to earlier versions. I've also added two extra controls, a search box at the top center, and a "Go to" menu on the left, which uses the visitor position from Google.
The data is coming from swing dancing calendars, so it's real world data, and you should get different results every day. Most of the data is from Germany, so that is where you can see the marker aggregation and these things in effect.
There are still lots of things to do, but this is just my free time project, for when I'm not at work, dancing or with my friends.
I don't know yet if this will remain online; it's more of a toy project for me. Still, it's cool to see where there are swing dancing events, and it's cool to be able to just zoom to another city and see where you could "hop by" for a dancing event while you're there. But there are just a lot of UI issues to solve to get this really usable, and I'm not much of a UI guy...
P.S. if it doesn't work, that probably means I'm currently working on it. There is no staging, and no "production system".
It seems that Sun doesn't care much about getting bugs fixed in Java.
This bug, for example, causes rendering artifacts in Apache Batik, and is very visible with many SVG files. It causes circles to be rendered as approximated diamonds. It was first reported 9 years ago (there are duplicates).
I understand both that there are more important bugs and that one must avoid introducing new bugs when fixing bugs. But there should be few dependencies on a broken circle rendering routine, so please just fix this cosmetic bug, too. One of the reports is even staged "Fix understood" ...
A more important issue with Sun Java (known since 2005) is this bug, which effectively breaks Java IPv4 networking on Debian unstable now (which recently changed the IPv6-to-IPv4 fallback behaviour). So far, Sun has rated this as "request for enhancement". WTF?
Sure, you can work around the bug easily: change /etc/sysctl.d/bindv6only.conf to use the value 0 instead (net.ipv6.bindv6only = 0) to re-enable IPv4 fallback. But after all, IPv4 networking is pretty much an essential Java feature.
Facebook seems to have little interest in protecting its users from a huge flow of common scam/spam. Sure, they do get active when accounts are mass hacked, and I haven't seen a "Facebook virus" for some time. Their JavaScript filtering is pretty neat, and they have implemented dereferrer pages they can use to quickly stop URLs from spreading.
However, some of my friends keep on joining very dubious groups and installing very dubious applications. No wonder "FarmVille" is sometimes nicknamed "ScamVille". There still is a lot of money to be made in dubious ways.
The big problem with Facebook is that everyone can set up groups and applications that look like they might be real. This is why people keep on installing "Mafia wars gifts" applications that have nothing to do with the actual game except the name, and sometimes do not even realize they don't actually get these gifts in the real game.
Even worse are the "pimp" groups. It's a classic pyramid scheme. Invite all your friends to the group, then you get extra Mafia points. Facebook really needs to stop that.
A quick search for "invite proof" - these groups usually require you to post "proof" of having invited all your friends - turns up 246 groups, almost all of which promise you Mafia stuff.
Searching for "getElementsByTagName" in Facebook turns up "over 500" groups. This string is a JavaScript command commonly used to auto-invite all your friends to a group. A typical mass-spread group will use this in its "join instructions".
Facebook needs to combat this kind of spam/scam. And it's not too hard. Just actually check user complaints/reports, do simple searches like the ones I posted above, and have some employee go through them and just delete all these dubious mass-join groups. Pyramid schemes likely violate the Facebook TOS, and they definitely are illegal in at least Germany.
Enigma is a great game, with a unique mixture of puzzles, mouse skills and action. If you know the discontinued game Oxyd, originally on the Atari ST in the 90s (also on Amiga, and one version on DOS), then you know the principle of Enigma. Except that it has tons more levels and is Open Source.
Some weeks ago, I uploaded a 1.10 pre-release (approximately milestone 5) to Debian experimental. This is the soon-to-be-released new version, using a new level file format (with a much extended API to make level development even easier, ~50% less code per level now), new levels (of course), updated graphics (including support for new graphics modes), ...
Unstable still contains version 1.01; the reason is simply that I knew there would be another 1.01 maintenance release coming. However, I believe it doesn't offer much over the current unstable version; it largely marks an upstream release containing patches already in the Debian package (since communication with upstream is really good).
So I have now two choices: refreshing the Debian unstable package to the "probably last" 1.01 release upstream, or going straight for the 1.10 milestones to give enigma some extra testing.
My parents needed a new printer, and after some research I decided to recommend them an HP OfficeJet Pro 8000. Today I gave it a try by printing some CD covers for a CD to give away to some friends for Christmas.
HP failed in a very subtle way: I had printed the covers, cut them, produced the CDs for them. Then I wanted to put the printed covers into the CD cases.
Despite the graphics being 12cm x 12cm in size, HP managed to print them in 12cm x 11.4cm. Without any notice (or giving me a choice) it had decided to scale them on the y axis. Which makes them completely unusable, since they don't fit the 12cm height of the CD case now.
After some more experiments, I decided to retry without duplex, and voila: 12cm x 12cm.
Duplex on HP OfficeJet Pro 8000 is only usable for draft printing, since it will distort your pages!
(See also this evidence in the HP forums: people with the same issue, an attempt to investigate the margin mess-up, and a report that the DJ990c driver can print duplex on this printer without messing with the margins, but is slower and offers less print quality. So it seems that this is an HP driver problem. And technically, it must be caused by the driver; at the very least, the driver should be able to compensate for this!)
I also noticed another issue with the print. The bottom right corner of the graphic didn't get enough ink, it looks like the printer stopped printing a bit too early. I don't know if this also happens in non-duplex, since I worked around this by adding a header and footer to the page.
Seriously, we should send back the printer. On my first try to use it, I already encountered two bugs. I wonder how many bugs I would see if I'd use it every day?
Somehow, I'm still lacking the optimal media player application. Many popular ones are totally overloaded (e.g. amarok). Others, like totem, seem to be just a minimalistic frontend for a particular backend.
My current choice:
However, there is one thing I'm really not satisfied with: when putting together a CD compilation for friends (say, as Christmas present), they are quite useless. A key issue here is the total playlist length. Guess what, I want to make sure it fits on a single CD. So I really need to know the total playlist length. Why do so many media players (e.g. totem, alsa-player-gtk, xfmedia4, vlc, mplayer, ...) not show you the total playlist length? They did read all the files to get artist and title. Many even have the individual song lengths, just not the total sum.
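The missing feature is little more than a sum. As a minimal sketch (the function name playlistSummary and the track objects with a seconds field are made up for this example, and the 80-minute default capacity is an assumption, since 74-minute discs exist as well):

```javascript
// Sketch: sum per-track durations (in seconds) and compare with CD capacity.
// The 80-minute default is an assumption; 74-minute discs also exist.
function playlistSummary(tracks, cdMinutes = 80) {
  const totalSeconds = tracks.reduce((sum, t) => sum + t.seconds, 0);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return {
    totalSeconds,
    formatted: `${minutes}:${seconds}`,
    fitsOnCd: totalSeconds <= cdMinutes * 60,
  };
}
```

This ignores the gaps an audio CD may insert between tracks, so a playlist that only just fits should be checked in the burning application anyway.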
In the past I've been using old XMMS1 to check for the total length, or a CD burning application like K3B by repeatedly importing my current folder.
Right now, I'm using Quod Libet (since I like the tag-editing component exfalso a lot) to arrange the playlist. It also gives me the total length, although I believe I've had incorrect song lengths in it before (broken VBR files?). It's not perfect either: being database-driven, it has really long startup times for occasional users (because of updating the database) and is much more heavyweight. I also believe I've lost some playlists because I had moved my files around once ... so I'm a bit sceptical.
Anyway, there are still hundreds of media players I haven't looked at. Please don't bother sending me an email about one I haven't mentioned!
But if you are developing a media player, please consider the use case of putting together a music CD for your friends. In particular, for users that do not use your player all day.
The following User stylesheet snippet can be used to highlight particular search results (such as your own domain, if you want to quickly find it in Google search results):
@-moz-document url-prefix(http://www.google.com/search)
{
a[href^='http://www.vitavonni.de/'] { background-color: yellow; }
}
You might also want to add a copy for your localized Google domain:
@-moz-document url-prefix(http://www.google.de/search)
{
a[href^='http://www.vitavonni.de/'] { background-color: yellow; }
}
Or you could go the heavyweight way:
a[href*='vitavonni.de'] { background-color: yellow !important; }
to highlight any link to your domain on any page. This modification obviously only applies to your browser; it's meant to help you find links to your own site more easily.
For a Java project, I wanted to give the Eclipse profiler a try. It didn't work, because it was missing a library (open the "Error log" view to see such things).
The corresponding library, libstdc++-5, an old C++ library, is no longer available in Debian unstable, so you need to grab the package from lenny. It will install fine on unstable.
Things may or may not be different on other architectures.
[Update: But TPTP is far from stable for me. It freezes Eclipse pretty much all the time.]
I'd like to make pyroman IPv6 capable. That is actually the one big thing before calling it a version "1.0".
I must admit that I haven't been very active on pyroman (or Debian in general) in recent years. This goes so far that "pyroman" was considered "abandoned" by Fedora or so. It is not; I use it on all my servers. It's still in use at the network I developed it for (after all, there is not that much benefit for a workstation setup, where a 10-line iptables script will do the job just perfectly).
Anyway, I'd like to get IPv6 support into pyroman, but there is one big issue here: I don't have any machine using IPv6, so I haven't used ip6tables myself yet, so I don't know about all the magic involved ...
So if you use IPv6, it would be very cool if you would jump in to get full IPv6 support into pyroman. Madduck has already done some preliminary work, but I haven't gotten around to looking at the integration or completeness yet.
The '--no-act' and '--print' modes of pyroman should even allow development without any IPv6 support or root permissions in the system.
Other things remaining on my pyroman wishlist:
Here's a code fragment to track outgoing links with Google Analytics. As usual, use it at your own risk. I can not give you support for Google products, for obvious reasons.
To use it, you
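function trackLinks() {
  var links = document.getElementsByTagName("a");
  // Hosts that should not be reported as outgoing links.
  var ignored = ["mydomain.tld", "google-analytics.com"];
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    var oc = link.getAttribute("onclick");
    // Skip links that already carry tracking or inline JavaScript.
    if (oc != null) {
      oc = String(oc);
      if (oc.indexOf('urchinTracker') >= 0
          || oc.indexOf('_trackPageview') >= 0
          || oc.indexOf('javascript:') >= 0)
        continue;
    }
    var ignore = false;
    // mailto: links are never matched against the ignore list.
    if (link.href.indexOf("mailto:") < 0) {
      for (var j = 0; j < ignored.length; j++) {
        if (link.href.indexOf(ignored[j]) >= 0)
          ignore = true;
      }
    }
    if (!ignore) {
      // Wrap in a function so each link keeps its own previous handler.
      link.onclick = (function (previous) {
        return function (event) {
          // "http://host/path" becomes "/out/http/host/path".
          var o = this.href.replace(/:\/*/, "/");
          pt._trackPageview('/out/' + o);
          if (previous) return previous.call(this, event);
        };
      })(link.onclick);
    }
  }
}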
This code tries to attach an onclick handler to any outgoing link, ignoring internal links and links that use JavaScript. If such a link is clicked, it generates a virtual page access with an "/out/" URL that can be analyzed in Google Analytics.
A side benefit (apart from knowing which links are interesting to your visitors) is that you should get more accurate "time on page" statistics for your pages.
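The mapping from a link to its virtual "/out/" page can be tried without a browser. This is only a sketch: outgoingPath is a hypothetical helper name that isolates the string handling described above, and the example hosts are placeholders.

```javascript
// Hypothetical helper isolating the URL handling of the tracking code.
// Returns the virtual path to report, or null for links to ignored hosts.
function outgoingPath(href, ignoredHosts) {
  const isMail = href.indexOf("mailto:") >= 0;
  // mailto: links are never matched against the ignore list.
  if (!isMail && ignoredHosts.some((host) => href.indexOf(host) >= 0)) {
    return null;
  }
  // Replace the first ":" and any slashes directly after it with one "/".
  return "/out/" + href.replace(/:\/*/, "/");
}
```

A link to http://example.org/page would be reported as /out/http/example.org/page, so all outgoing clicks end up under a common "/out/" prefix in the content reports.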
I do not really understand why they don't support this themselves, but Google Analytics will not track keywords for Google image search. Instead it just shows up as "referrer". A site I'm webmaster for, Swing and the City, gets a lot of image search exposure (funnily, for an image that has been gone since August; Google needs to work on their index, too), so it was a bit odd to have images.google.com show up as top referrer but not as "organic search".
Here's the code I use to fix this:
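var r = document.referrer;
// Image search hides the original query inside the "prev" parameter.
if (r.search(/images\.google/) != -1 && r.search(/prev/) != -1) {
  var e = new RegExp("images\\.google\\.([^/]+).*&prev=([^&]+)");
  var m = e.exec(r);
  if (m) {
    pt._addOrganic("images.google", "q", true);
    // Rebuild a referrer that contains the decoded search URL with its query.
    pt._setReferrerOverride("http://images.google." + m[1] + unescape(m[2]));
  }
}
// Additional search engines whose keyword is in the "q" parameter.
pt._addOrganic("maps.google", "q", true);
pt._addOrganic("forestle.org", "q", true);
pt._trackPageview();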
Note that image search is more complicated than the maps and forestle search engines I also add for keyword tracking. The original query is encoded in the "prev" parameter, and the easiest (or only?) way to get working tracking is to use the ReferrerOverride function of analytics.
Note: this is not a straight copy & paste, since I use this code in a compressed and encoded (for injection into the page via DOM ops) form. So no guarantee of syntax completeness. You'll need to adjust it to your variable naming anyway (I use "pt" instead of "pageTracker"). This is just to show you the use of unescape on the "prev" parameter for this purpose.
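To see what the override looks like, the extraction can be isolated into a small function. This is a sketch under assumptions: imageSearchReferrer is a made-up name, the sample referrer in the usage note is constructed for illustration, and it uses decodeURIComponent where the snippet above uses unescape (the two agree for plain ASCII escapes like these, but decodeURIComponent throws on malformed sequences).

```javascript
// Sketch: rebuild a search-style referrer from an image search referrer.
// Returns null when the referrer does not look like an image search hit.
function imageSearchReferrer(referrer) {
  const m = /images\.google\.([^\/]+).*&prev=([^&]+)/.exec(referrer);
  if (!m) return null;
  // m[1] is the top-level domain part, m[2] the escaped original search URL.
  return "http://images.google." + m[1] + decodeURIComponent(m[2]);
}
```

For a referrer such as http://images.google.de/imgres?imgurl=http://example.org/a.jpg&prev=/images%3Fq%3Dswing%2Bdance%26hl%3Dde, this yields http://images.google.de/images?q=swing+dance&hl=de, in which the "q" parameter is visible for keyword tracking.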
I wonder if it's possible to identify link spammers (you know, these bots that mass-submit a link into as many blogs/etc they can find in order to boost their page rank) by the simple measure of how many of the links to their site are marked 'nofollow'.
Say, a regular page should have less than 5% (and less than 20) nofollow links; a site that goes significantly above this value probably employs some spam bot.
The only really hard thing is how to avoid attacks on a site using this ... say, I write a bot that spams links to Microsoft on as many sites as it can find that DO use 'nofollow', in order to get that site above the limit and have Google penalize it.
So in general I don't think Google would automatically penalize such things, still it could be used to e.g. have a human check the destination site for useful content, and then only blacklist when it doesn't seem to be useful.
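As a sketch of this heuristic: the function name needsHumanReview and the link objects with a nofollow flag are invented for the example, and the thresholds are just the ballpark numbers from above. It is an assumption of this sketch that a site is only flagged when both the share and the absolute count are exceeded, and that a flag merely means "have a human look at it".

```javascript
// Sketch of the heuristic; thresholds are the ballpark values from the text.
// Assumption: a site is only flagged when BOTH limits are exceeded, and a
// flag means "needs a human check", never an automatic penalty.
function needsHumanReview(inboundLinks, maxShare = 0.05, maxCount = 20) {
  if (inboundLinks.length === 0) return false;
  const nofollow = inboundLinks.filter((link) => link.nofollow).length;
  const share = nofollow / inboundLinks.length;
  return share > maxShare && nofollow > maxCount;
}
```

Requiring both limits keeps small sites with a handful of nofollow links out of the review queue; it does nothing against the deliberate attack described above, which is why the result is only a hint for a human.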
P.S. Which BTW is a reason why some of the SEO "do nots" are bullshit: it would be too easy to deliberately use these to blacken a competitor. So a 'link farm' will at most do nothing to raise your ranking; but Google must not allow you to actually lower a competitor's ranking by setting up a link farm pointing to them!
P.P.S. On another side note: who guarantees that Google actually ignores "nofollow" links? They could also just be assigned a lower weight or a penalty, so that a "nofollow" link from a strong site such as Wikipedia would still be worth a lot, while the average blog comments page link goes down to 0. Say a "nofollow" link from a PR 6 site is worth as much as a regular link from a PR 4 site, and PR 2 becomes PR 0. That would already do much of the trick in discouraging the use of blog spam bots. Because after all, ignoring the links on Wikipedia for page rank would be quite stupid. In the German Wikipedia, the page contents are even "sighted" (aka: peer reviewed); this is a rather trustworthy source, especially when you take time effects into account. A link that stays in Wikipedia on a popular page for more than a month is very likely good.
These days, something happened to one of my external USB drives that I so far only knew from ReiserFS (which I since called ReisswolFS, German word play on "shredder" ...). But, it's not ext3 which I blame.
Short story what happened:
As you can see, something was wrong with the system, not with the file system.
I have a strong suspect for the cause. In case you wondered why I included "resumed from suspend" above: I've been having system stability issues with resume ever since upgrading to the Intel driver 2.9.0 and KMS (Debian unstable+testing) with kernels up to 2.6.31. In about 1 out of 5 resumes, I get an Xorg or system lockup after anything from 1 to 60 minutes. Sometimes I also experience video corruption after a few minutes, trashing some terminal emulation until the next redraw. Just before writing this email I had a typical lockup: when scrolling the terminal emulator. This has been a typical trigger for lockups. In contrast, I haven't seen any such crashes (or screen corruption) on a fresh boot.
Freedesktop bug reporting the same issue closed as "not our bug, blame it on the kernel".
Note that the 2.6.32 release candidate changelog contains many changes for the Intel DRI kernel driver. So the bug might already be fixed in the RC kernels.
Same report in Kernel Bugzilla is still 'NEW' though.
Related bug report in Debian, blaming it on KMS.
[Update: I've disabled KMS and upgraded to 2.6.32-rc8 and not had such a crash since. But I can't pinpoint it to one or the other yet.]
[Update: just tried another external harddisk ...
[305032.148616] EXT3-fs: mounted filesystem with ordered data mode.
[305066.061708] usb 1-8.3.3: reset high speed USB device using ehci_hcd and address 27
[305081.132471] usb 1-8.3.3: device descriptor read/64, error -110
...
[305147.468857] sd 4:0:0:0: Device offlined - not ready after error recovery
[305147.468880] sd 4:0:0:0: [sdb] Unhandled error code
[305147.468886] sd 4:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
...
[305147.473500] WARNING: at /build/buildd-linux-2.6_2.6.32~rc8-1~experimental.1-i386-g1b8iG/linux-2.6-2.6.32~rc8/debian/build/source_i386_none/fs/buffer.c:1159 mark_buffer_dirty+0x20/0x7a()
It seems as if the USB disk stack still doesn't really survive suspends? Let me try on a fresh boot later on.]
When I got my Google Wave account, it took the invitation about a week to arrive. A few days ago, I got my first own invites, and invited some colleagues (in an attempt to actually find a use for Google Wave beyond "rich media live messaging"). Within a few minutes they were "in". Now I just got my second set of invites. So is Google Wave now getting ready for mass opening, rocketing user numbers?
As you might have already guessed, I'm not convinced by Google Wave. It's technically interesting and well-done. The demos are all nice. It's just that the UI in the browser is a bit fragile and cumbersome, and the big question so far is:
What does Google Wave allow you to do that you couldn't do before?
To me, there has been little actual use so far. Wave can do everything, but isn't optimal at any of it:
Yes, I'm aware that you should differentiate between the protocol and the UI. Still, pretty much everything is currently designed for the web browser with full JavaScript and Flash capabilities.
Of course this isn't the end yet, Google Wave will evolve. Maybe into something cool, maybe it will remain just a niche thing. Maybe some cool apps will just use Wave as protocol. But I figure, I'll mostly wait for these things to happen first before I become a frequent user of Wave.
The biggest thing I see is the "spam" (this especially includes 'Quiz', Mafia Wars and similar Scamville type of 'apps' that surely will show up in no time, once Wave is open to the public). What will Wave provide to me to handle this flood of worthless information that I'm getting more and more?
P.S. Please don't bother to ask for invitations to Wave.
P.P.S. here's how to replace the odd scrollbars with the regular OS scrollbars with a really simple user style (CSS).
On 01/01/2008, Bavaria introduced a quite strict smoking ban, which also included bars and restaurants. It did, however, contain a backdoor by excluding non-public locations, which led to the creation of 'smoker clubs' where you had to become a member to be admitted. At some point, most clubs were of this kind.
In August 2009, however, the law was changed to exclude beer tents (Oktoberfest ...) and small bars. Many people believe that this was to get votes in the elections in September 2009 (which ended with a loss of 6-7% compared to the previous election and a historical low for the biggest party).
This caused several organizations to call for a public vote on restoring the smoking ban to the 2008 state (without the 'smokers club' backdoor). In order to force a public vote on a law (without the government's support!), we need 10% of the voters to register as supporters for the vote. You have to register in your registered home town. For Bavaria, this means about 940,000 supporters.
If you are a registered voter in Bavaria, please drop by your municipality and sign up. You need an ID and 5 minutes, that's all. 940,000 supporters is an incredible lot of people to get to the offices, so take along your friends!
When we get enough supporters, the Bavarian government has two options: accepting the changes as proposed (and thus making the initiative obsolete), or conducting a public vote on it, offering an alternative (e.g. the current law, no change) and having the voters decide (which is quite expensive, so if many, many people sign up, they might save that money and just pass the proposed change themselves).
For more information (German only), check the Nichtraucherschutz Bayern website, including the sign-up office locations.
P.S. In other European countries, the introduction of a strong smoking ban has led to a 10-15% decrease in heart attacks (20% for non-smokers). The German constitutional court has also already ruled that the protection of non-smokers and employees from passive smoke weighs more heavily than the individual's freedom to smoke in enclosed spaces.
We'd like to host DebConf 2011 in Munich, Germany.
However, this is a far from trivial challenge:
Rent in Munich, in particular for conference rooms, is far from cheap. In my opinion, unless we get some really big sponsor (and I'd still prefer spending sponsor money to fund developer trips to the DebConf instead!), the only chance we have is to get some rooms at the university.
However, given the developments of recent years (budget etc.), it has become a lot more difficult to actually get rooms at the university for such events. Unless the event is considered to be fully a part of the university's "work", we might have to pay rent to the university, which again isn't that affordable.
Anyway, if you are in Munich, working at one of the universities, or in any way interested in supporting DebConf 2011 in Munich, please join the DebConf11 Germany mailing list. Also check our meetings scheduled on the DebianMuc Wiki page, currently every Monday, 18:00, at the new LiMux offices in Sonnenstr.
P.S. There will also be a Bug Squashing Party in Munich at the end of November: Munich BSP November 2009
Every time Facebook changes anything, people complain. Most of the time just because something has changed, without actually knowing what changed.
The October layout change, for example, isn't in fact too big. As far as I can tell it's not much more than turning the "hot" items that were in the right sidebar into a special tab (and breaking the refresh for the live feed, but I guess they'll fix that soon). The "Live" tab is basically all information (see below for getting rid of certain restrictions); the "News" tab tries to reduce this amount of information by only showing you certain posts Facebook magic considers to be "important". If you are a heavy user you will probably prefer the "Live" feed; if you are a casual Facebook user, go with the "News" feed to have fewer crap posts to read.
Still, there are some things you should be aware of when you are a Facebook user (not all of these are new):
Also you should never forget that all the data you put online is hard to get rid of again. Just don't put anything there you don't want everyone to know. Facebook can be really powerful when used right, for example as a promotion channel. But the way you should be using it is to first consider what impression you want people to have of you, then try to present yourself that way. Don't just throw everything that comes to your mind on there. (This applies even more to blogs and web sites, obviously, which don't have any privacy controls.)
Google apparently doesn't like people using "googlen" (German for "to google") as a verb. It weakens their trademark, or at least that seems to be the common opinion. Some people already write "gugeln"; an alternative would be "soochen" (a play on the German "suchen", to search):
soochen: to search using an Internet search engine, for example the well-known search engine with two 'o's in its name.
A short update on some friends of mine.
First of all, Patrick F. Riley: I worked with him on some projects when I was visiting UC Berkeley, one of which was a predecessor to his latest thing, LiveDash. It's really cool: it allows you to search TV feeds almost in real time. It also live-indexes Twitter, blogs, news sources etc.
Secondly, HoneyWish (currently only available in German) is a service for a "honeymoon travel gift list" thing. It works like the traditional gift lists, except that instead of putting all kind of household stuff on it, there are all the parts of the honeymoon trip on the gift list. This makes much more sense these days: people tend to get married later; they might even be sharing a house for some time before getting married. So they don't need much silverware anymore, but they for sure will enjoy their honeymoon trip - so what could be a better gift for them?
Third, Amiando, a web-based ticketing and event management service. Founded some years ago by some friends, it has been growing and coming along nicely. Every now and then it wins an award, many of them in the "top startup" category.
There are of course many more projects of friends I'd like to point out, but these three definitely are highlights.
If you are doing a complex web layout (such as my Swing and the City layout which features alpha-transparent fixed layers), and want to embed Flash (e.g. on the Was ist Swing? page - German: What is Swing), make sure you add the attribute wmode="transparent" to your embed tag, and <param name="wmode" value="transparent"></param> to your object. Otherwise, a layer - in particular popup menus - might end up below the flash.
This includes you, YouTube. In HD view, the user popup menu only has the top 3.5 entries out of 5 accessible for me.
The following XSLT stylesheet can be used to find such embeds in a bunch of XHTML files using the command line
xsltproc findNoWmode.xslt $( find -iname '*.html' )
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:call-template name="t"/>
  </xsl:template>
  <!-- Copy every embed that has neither a wmode attribute nor a wmode param. -->
  <xsl:template name="t">
    <xsl:copy-of select="//html:embed[not(@wmode) and (count(html:param[@name='wmode']) = 0)]"/>
  </xsl:template>
</xsl:stylesheet>
You can of course also write an XSLT stylesheet to insert the wmode statements wherever there are none, to make transparent your default.
[Update: I've received comments that this comes at quite a performance cost for Flash, and that this might be the reason why YouTube doesn't use it - in particular for the HD videos. Also, it isn't supported by WebKit-based browsers so far (so not in Safari either?), nor does it seem to work in Gnash, an open source Flash plugin. So you have to choose between multiple evils if you are using Flash...]
Swing and the City is Christine's dance school, where I also help out as a teacher, and I have currently taken the web site under my wing - my main contribution to the current renovation (at the end of August, Christine opens her own studio near the main station).
Over the last few days I tackled the "dance descriptions" project, and the pages with the descriptions (including a short historical outline and videos) of Lindy Hop and Balboa & Bal-Swing are now online. I haven't even checked them for spelling errors yet, though.
So anyone who finds errors of any kind, has video recommendations, or wants to contribute texts (Shag, Charleston, Blues and Burlesque are still to come!) is of course very welcome.
I'm naturally also happy about links - Java's "Swing", a car rental company called "Swing" etc., and of course certain establishments don't make it easy to find us on Google... ;-)
We've opened a completely redesigned Swing and the City web site today. The layout was quite a pain to get working because of transparency and non-scrolling parts, but in my last tests it was working quite well in all of the major browsers. If you notice any issue, please tell me (email: erich AT debian org)
I'm aware that the red-yellow border on the left doesn't line up right. I'm waiting for fixed graphics from the designer for that. There is also a glitch with clicking the logo when scrolled just a little bit down. These are on my to-do. At some point I also want to increase the use of CSS spriting to further reduce page load times. Oh, and Internet Explorer sucks, btw.
The web site is about Swing dancing in Munich (so no tech today), and at this time only in German. At a later stage, we might add English, too.
During August we'll also be building our own studio, "Cats Corner", which will actually be somewhat similarly decorated. :-) Congratulations to Christine for doing all that for the Lindy Hop scene!
P.S. Bring Down IE6.com, IE6 No more.com
P.P.S. See this blog post on how it is impossible to use the CSS "clip" property in a way that both IE7 and IE8 will understand. While only one is W3C standard, Firefox just accepts both ... but at least IE8 goes with the official standard now.
LoOP: Local Outlier Probabilities
Hans-Peter Kriegel, Peer Kröger, Erich Schubert and Arthur Zimek
Our paper has been accepted at CIKM 2009 (The 18th ACM Conference on Information and Knowledge Management), November 2-6, 2009, Hong Kong, and will appear in the conference proceedings published by ACM Press.
It's an outlier detection method based on LOF (Local Outlier Factor), but a bit more statistically robust and with a score that is easier to interpret. Given the statistical backing, it works reasonably well on samples such as the data pages of an appropriate index structure, reducing complexity to linear for the approximate version.
This publication is a bit special to me: I suggested the approach to my colleagues and they gave me the abstract and title for my birthday. :-)
This week I'm at the 11th International Symposium on Spatial and Temporal Databases in Aalborg, Denmark.
My demo was yesterday, titled:
Elke Achtert, Thomas Bernecker, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek:
ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series.
While this release visibly adds only a small piece - some distance functions for time series and some related visualization code - it still marks a major milestone in ELKI development. Large parts of ELKI were reorganized and rewritten (such as all the output handling code) and lots of stuff was added, including a lot of visualization-related code that is not yet completely used in this release.
As many will know, Swing dancing and Swing music have become my big hobby and love. I'm co-teaching classes every week, and of course people ask me where to get some music to dance to.
For this, I'm trying out the Amazon "aStore" functionality. Basically you set up a few categories and add Amazon products to these categories for people to choose from. Of course Amazon will also show other products it considers relevant etc.
My Swing music on Amazon "store" (on Amazon.de, but I guess it will also somehow take you to other Amazon sites?)
The (editor) UI is not very convincing yet. For example, it lacks an obvious way of moving "products" from one category to another, you can't see more than 9 entries per page in the editor, and reordering is done by entering sequence numbers. That definitely could use some improvements.
Anyway, some people might find this useful.
... as always, everybody has won. The CDU loses 5.8% and drops to a historic low of 30.7%, but simply celebrates the FDP instead.
The CSU celebrates that it lost only 0.8% and thereby safely cleared the nationwide 5% threshold.
The SPD, which lost "only" 0.7%, celebrates that voter turnout was low, and that of course all the non-voters would actually vote SPD.
The Greens celebrate small gains - but partly with good reason: in Freiburg, for example - with over 50% turnout, clearly above the average - the Greens became the strongest party. In Munich (both city and district) and Stuttgart, for example, they came second. In Augsburg they are level with the SPD.
The FDP is already dreaming of 18+x again, while admitting that its voters are essentially "protest voters" from the Union ... or is it because its lead candidate is considered attractive, writes for Bunte and Parline, and undressed for Stern (instead of being in the European Parliament?)? (Admittedly, this earned her unmatched name recognition among the lead candidates.)
The Left celebrates that it was able to gain further, although the gains fell clearly short of expectations.
Or, as the Tagesschau analyzes it based on Infratest Dimap:
FDP profits from Union losses
[...]
Union loses in all population groups
[...]
The Left becomes the strongest force among the unemployed
As always, there are only winners in politics, because otherwise one would have to do something differently for once.
Wolfram Alpha was often hyped as the latest and greatest search engine.
I wouldn't call it so. It's just a very minimalistic search frontend to a nice database with lots of numerical facts.
Yes, it can give you the height of the Eiffel Tower (because that's a fact in its databases). It can even compute for you what Pi times the height of the Eiffel Tower is. But that is about as far as you can go in combining facts. In my tests, I wasn't able to compare the temperature in Munich with the temperature in Berlin (both of which WA will visualize for you with a pretty graph, so these are facts in WA) - their query parser just doesn't get my question.
The funniest reply so far however was to the question:
How many cars in Germany?
The answer of WA (which btw is copyrighted by WA):
No
Seriously, I doubt that there are no cars in Germany.
At least it also offers an explanation why it comes to this conclusion:
Cars is a town in south-western France (which as you might guess currently is not a part of Germany. :-) ) - so for WA, there are at least cars somewhere in Europe, but not in Germany!
... you might be bitten by this Java bug rendering arcs as straight lines at large zoom levels.
It looks like a classic to me: in order to improve rendering performance, you approximate arcs with straight lines at small resolutions (if it's just 2 pixels big, nobody will be able to tell the difference). Except of course, when you end up doing the same approximation at a large zoom value - of course a 100-pixel circle looks different from a 100-pixel diamond.
Reported in 2005, still not fixed in current Java (we're in 2009 now).
Sun is really slow at fixing Java bugs.
See also a related Apache Batik bug report. Fortunately, this only applies to Java rendered graphics - SVG export, PDF, Postscript are all fine.
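To put numbers on why the same approximation is harmless at 2 pixels but obvious at 100 pixels, here is a minimal sketch. It is not the code from Java's renderer or from Batik; the class and method names are made up for illustration, and it simply assumes a circle is replaced by a regular polygon inscribed in it.

```java
// Hypothetical helper, not part of the JDK or Batik: measures how far an
// inscribed regular polygon deviates from the circle it approximates.
final class ArcFlattening {
  private ArcFlattening() {}

  /** Largest gap between a circle of the given radius and an inscribed polygon with `segments` sides. */
  static double maxDeviation(double radius, int segments) {
    // Each side spans an angle of 2*pi/segments; the gap is largest at the middle of a side.
    double halfAngle = Math.PI / segments;
    return radius * (1.0 - Math.cos(halfAngle));
  }

  /** Smallest number of sides that keeps the gap within `tolerance` (same unit as radius). */
  static int segmentsFor(double radius, double tolerance) {
    if (tolerance >= radius) {
      return 3; // any polygon is good enough at this size
    }
    double halfAngle = Math.acos(1.0 - tolerance / radius);
    return Math.max(3, (int) Math.ceil(Math.PI / halfAngle));
  }

  public static void main(String[] args) {
    // A "diamond" (4 sides) standing in for a 2-pixel and for a 100-pixel circle.
    System.out.printf("2px circle, 4 sides:   %.2f px off%n", maxDeviation(1.0, 4));
    System.out.printf("100px circle, 4 sides: %.2f px off%n", maxDeviation(50.0, 4));
    System.out.println("sides needed for 100px circle at 0.25px tolerance: "
        + segmentsFor(50.0, 0.25));
  }
}
```

The point of the sketch is only that the number of segments has to be chosen from the size on screen, after zooming; choosing it from the unzoomed size gives the diamond.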
Is there any way to provide an alternate CSS stylesheet for GoLive CS2 only, not for regular browsers? Because there are some things in that layout that are too difficult for the GoLive renderer, it doesn't display them right. The pages are still editable (just plain XHTML), it's just not looking right in GoLive (advanced CSS).
The site already has alternate stylesheets for browsers such as the broken Internet Explorers, so if I could convince GoLive to use their stylesheet it might be looking a lot better in the editor, too ...
I am aware that GoLive CS2 has been abandoned in favor of DreamWeaver. Still, it's going to be used in a project where I help with the web templates.
(Other options would be Kompozer and Amaya, but none of them seem really fit for production use: Amaya was just removed from Debian because it had some security issues and the maintainers had the impression the code was such a mess that there will be much more such issues. And Kompozer seemed to be a mostly dead branch of a Gecko hack (although there has been a new alpha release this year) ... is there some reliable opensource non-source HTML editor that I'm missing?)
P.S. Sorry, no comments in this blog. Use Email: erich AT debian ORG
Arrays and Generics in Java do not mix very well. In order to create an array, you need to know the object class the array is supposed to store.
Arrays in Java are special: they can efficiently store primitive data types. The expected difference in efficiency between byte[] and Byte[] is pretty big (of course some good VM might optimize) for obvious reasons (think of: references, garbage collection, pointer sizes, ...).
This is probably why you need to know the type before creating an array (because an array of primitive types such as byte will be different from one that stores objects of some kind).
In particular, the following Java code
String[] foo = (String[]) new Object[0];
results in a run time error ("[Ljava.lang.Object; cannot be cast to [Ljava.lang.String;"). But it gets more confusing when you introduce generics:
public static <T> T[] test() {
  T[] te = (T[]) new Object[0];
  System.err.println(te.length);
  return te;
}
String[] foobar = test();
This will print "0", then throw the same run time error in the foobar line.
What happens here is that in the test() method, T is actually replaced with "Object" at compile time (type erasure). Thus the array type works just fine, and so does the call to te.length. Upon returning, the result is then cast to String[], and that cast fails.
Now here comes a crazy Java hack:
public static <T> T[] test(T... ts) {
  T[] te = (T[]) java.lang.reflect.Array.newInstance(ts.getClass().getComponentType(), 0);
  System.err.println(te.length);
  return te;
}
String[] foobar = test();
The exception is gone, foobar is of the proper type now!
A result of discovering this hack are these two methods:
public static <T> T[] newArrayOfNull(int len, T... ts) {
  // Varargs hack!
  return (T[]) java.lang.reflect.Array.newInstance(ts.getClass().getComponentType(), len);
}
public static <T> T[] toArray(Collection<T> coll, T... ts) {
  // Varargs hack!
  return coll.toArray(ts);
}
Notice how elegant the last method looks - and it finally allows you to do toArray(collection) instead of collection.toArray(new WhateverClassTheCollectionHas[0]).
Note that this is still a hack, and may or may not work with all Java compilers, JREs and/or Java versions.
Update: Note that this 'hack' is also not transitive. The context calling toArray needs to know the object type at compile time. So it doesn't save you much more than writing "new KnownClass[0]" etc.
Update: So I'm actually not using this - it's just a hack, and often quite hackish. The problem is that when you call e.g. toArray in a generics context, it will actually create an array of "Object", so it makes much more sense to verbosely specify the class you want to use for the arrays (and get some reliability in use back).
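For comparison, here is a minimal sketch of the "verbose" variant, where the caller passes the element class explicitly. The class name TypedArrays and its methods are made up for this example; only java.lang.reflect.Array.newInstance and Collection.toArray are standard API.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical utility class: the element type is passed as a Class object,
// so the array type no longer depends on what the compiler infers.
final class TypedArrays {
  private TypedArrays() {}

  /** Creates a new array with the given (non-primitive) component type, filled with null. */
  @SuppressWarnings("unchecked")
  static <T> T[] newArray(Class<T> type, int len) {
    return (T[]) Array.newInstance(type, len);
  }

  /** Copies a collection into an array of the given component type. */
  static <T> T[] toArray(Collection<? extends T> coll, Class<T> type) {
    return coll.toArray(newArray(type, coll.size()));
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("Lindy Hop", "Balboa");
    String[] arr = toArray(names, String.class);
    // The runtime type is a real String[], not an Object[] in disguise.
    System.out.println(arr.getClass().getSimpleName() + " with " + arr.length + " entries");
  }
}
```

Unlike the varargs hack, this also works when called from inside another generic method, as long as that method has a Class<T> to hand on.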
Has anyone experience with Dropbox?
It seems to be an interesting web storage service, with 2 GB of free storage.
However, the Linux client seems to be closed source (which is understandable, it seems to have a lot of neat features) - so I intend to use the web interface only (at least for now).
Update #2: There is an RFP bug for Debian, and some source is on the download site. And while this source (except the images) is GPL, it's just the Nautilus integration part, not the daemon you also need.
Did you try Dropbox? Does it work well? I know some people (especially Windows users) who could benefit a lot from a service like that, so I wonder if I should recommend them Dropbox. Or is there some better alternative (it should allow sharing of files though - synchronization is not as essential, it is a lot about exchanging files too large for usual email in small user groups; still synchronization probably is a comfortable way of transferring the files without having to think about it yourself)?
No comments in this blog - email me via erich AT debian ORG.
P.S. I know there is some referral program to get more storage, feel free to send me your referral link - I'll remove this PS once I've signed up.
P.P.S. There also is Ubuntu one, but as far as I can tell Ubuntu only so far. Looks very similar.
P.P.P.S. So far, I've received a lot of praise for DropBox.
P^4.S. My own referral link, feel free to use this to sign up (+256 MB for you, too!) and "upgrade" my account.
Quoting MSDN on the CSS "clip" property:
As of Internet Explorer 8, the required syntax of the clip attribute is identical to that specified in the Cascading Style Sheets (CSS), Level 2 Revision 1 (CSS2.1) specification; that is, commas are now required between the parameters of the rect() value.
...
In Internet Explorer 7 and earlier (and in Internet Explorer 8 or later in IE7 mode, EmulateIE7 mode, or IE5 mode), the commas should be omitted.
... and if you want to support both?
I see a few options:
DarkSEO has some code to attack php3bb captchas. (Note: I didn't even look at the code, it could be a virus or anything).
I don't find it very surprising that this has happened. Most of the captchas around are very naive, and I've seen multiple scientific articles detailing how to attack various captchas. Many use colors and thin lines to make them look hard, but after applying a naive energy function and doing some blurring to remove the thin lines, they break down.
ReCaptcha is quite interesting, because it doesn't bother with some useless colorification that doesn't change contrast. But I wonder if it can't be overrun by spammers and how long it will scale. Still I figure it is what I would pick right now, because they can upgrade it if it actually is attacked by solvers.
It doesn't help much for the proxy attack on Captchas though (offer users to view some pr0n in exchange for solving a Captcha that you actually were given to solve by another site) - at least not when combined with some XSS and/or bot net. (The 'obvious' proxy approach can be IP-filtered.)
For a research project, I'm looking for some real-world time series data. Time-series are an interesting thing to study, however it is hard to get access to interesting non-trivial real-world data.
I was wondering if some people could contribute some summarized web access data; no URLs or IP addresses.
The data I'd like to get can best be explained by the preprocessing step:
... | perl -ne '/\[(\d+\/\w+\/\d{4}:\d\d):\d\d/
&& print $1."\n";' | sort | uniq -c
(Sorry if you aren't fluent in regexp - it extracts the date and hour out of an Apache default log file, nothing else. These lines are then summarized by counting their unique appearances.)
That should produce 24 lines per day (one per hour), looking like this:
count day-of-month/month/year:hour
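If Perl isn't at hand, the same preprocessing can be sketched in Java. This is only a rough equivalent of the pipeline above: the class name HourlyHits is made up, the two log lines in main are invented sample input, and the counts are printed without the padding that uniq -c adds.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the perl | sort | uniq -c pipeline.
final class HourlyHits {
  // Same expression as in the Perl one-liner: captures "day/month/year:hour".
  private static final Pattern HOUR =
      Pattern.compile("\\[(\\d+/\\w+/\\d{4}:\\d\\d):\\d\\d");

  private HourlyHits() {}

  /** Counts log lines per "day/month/year:hour" key; lines without a timestamp are skipped. */
  static Map<String, Integer> countPerHour(Iterable<String> lines) {
    Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like sort(1)
    for (String line : lines) {
      Matcher m = HOUR.matcher(line);
      if (m.find()) {
        counts.merge(m.group(1), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Invented sample lines in Apache's default log layout.
    Iterable<String> sample = Arrays.asList(
        "192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] \"GET / HTTP/1.0\" 200 2326",
        "192.0.2.2 - - [10/Oct/2000:13:58:01 -0700] \"GET /a HTTP/1.0\" 200 512");
    for (Map.Entry<String, Integer> e : countPerHour(sample).entrySet()) {
      System.out.println(e.getValue() + " " + e.getKey());
    }
  }
}
```

As with sort, the keys are ordered as plain strings rather than chronologically, so month names will not come out in calendar order.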
It would be cool if you could send me some series for a couple of sites, if you happen to be in the position to provide this data. The data should cover at least a few weeks, the longer the better even up to a few years.
Sites that are too small are however not very useful, as their data might be too noisy (so probably not the personal home page of your mom). If you are providing a larger number of series, you are of course free to include them.
I don't care much about what the site actually contains, I'd just ask you to give a tiny amount of meta information:
Data use:
The main project idea is to evaluate different distance metrics in their capability of separating the different data sources, assuming that there is some difference in the shape of these curves. A different problem can be constructed by breaking the series into chunks covering approximately a day and then trying to separate different days, starting hours of the series (or the offset of server timezone vs. user timezone) and/or weekdays from weekends.
In our experiments, we've come to the conclusion that the experimental results are most interesting when there is a sufficient number of classes; so I'd like to get something like 20 different interesting data series. At the same time, the series should be long enough that I can break them into multiple chunks to have a reasonable number of 'sub-series' per class. If I have really long series, or e.g. series covering the same site but from multiple servers, I could even experiment with taking samples of different lengths from these sets.
(Say I have series covering 2 years, that is ~17k samples, from 3 servers, then I can take 51 disjoint sub-series of length 1000, or 102 of length 500, ...)
But it's obviously not possible for me to collect this data myself - I don't operate two dozen of such sites myself ...
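The chunk counts in the parenthesis above are easy to re-check. A small sketch, with a made-up class name, that uses both the rounded figure of 17,000 hourly samples and the exact number of hours in two non-leap years:

```java
// Hypothetical helper for the back-of-the-envelope calculation above.
final class SubSeriesCount {
  private SubSeriesCount() {}

  /** Number of disjoint chunks of the given length that fit into `series` series of equal length. */
  static int disjointChunks(int samplesPerSeries, int series, int chunkLength) {
    return (samplesPerSeries / chunkLength) * series; // integer division drops the remainder
  }

  public static void main(String[] args) {
    int rounded = 17_000;        // the "~17k" figure used in the text
    int exact = 2 * 365 * 24;    // hourly samples in two non-leap years
    for (int len : new int[] {1000, 500}) {
      System.out.println("length " + len + ": "
          + disjointChunks(rounded, 3, len) + " chunks (rounded), "
          + disjointChunks(exact, 3, len) + " chunks (exact)");
    }
  }
}
```

With the rounded figure this gives the 51 and 102 quoted above; with the exact hour count, a few more chunks of length 500 fit in.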
An extra project I've been considering for some time is peak prediction for web accesses. Say you're running some fast-growing site, wouldn't it be useful to have a prediction of when the number of accesses will likely hit some magical limit (and e.g. overload your server), so you can increase your capacity in time? Of course it would be more sensible to apply this prediction to e.g. CPU usage, predicting when your system might hit 90% load average over a 5 minute window in regular operation. Network bandwidth and disk IO also come to mind. You get the idea.
Please send them via email to erich.schubert AT gmail com
Thank you.
[P.S. Already received the first series, thank you! I can take care of sorting myself, no need to worry about that. And yes, I'm aware that the series will probably all be quite similar - common computer usage patterns such as work hours - but that is common in real world data and part of the challenge. Separating apples from dinosaurs is not a challenge.]
I've previously mentioned my plans on redoing my blog. Well, I've settled down on some design issues already (posts will be stored as mini Atom feeds, which makes the generation of Atom feeds for the blog and categories trivial, and gives me maximum flexibility. I already have a working converter for my existing blog to Atom posts.)
Generating static HTML pages from that will easily be possible using an XSLT transformation (for example), and I got the feedback that I could just use Google AppEngine for blog comments, so my actual blog could remain static-only (and thus much more secure and reliable). Any attack/spammer/bot can then only kill the comment functionality, not my own site.
Which brings me to another design consideration: the editing widget. Either for the blog comment application, for writing my own blog entries (via a https protected script or whatever) or maybe for a small CMS I've been thinking about - having a reliable HTML in-browser editing widget is something I could use every now and then (well, I'm not doing much Web stuff anymore these days).
Geniisoft has a good overview over in-browser (aka: Through The Web, TTW) editors. The top candidates seem to be:
I've heard before of FCKeditor and TinyMCE; I think I've been on the Xinha page before, too. However, comments on them have not always been good.
To some extent they all seem to suffer from feature creep, which usually is a bad sign - most often this means that there are security issues in one module or plugin or another.
TinyMCE for example has been described as "a bit of a pain" and "a tad clumsy" on the GSoC mailing list. I have had fewer fights with it than with the Debian wiki's (MoinMoin) markup language though.
I'm not looking for anything as big as these - I just need an editor that allows for some basic formatting (bold etc., links) and that produces reasonable XHTML output. I'll be feeding the output through some custom cleanup script anyway, which will kill disallowed code. So I don't want any editor which allows the user to create code that will then be killed afterwards.
Any personal experiences with any of these, or an important alternative that I might have missed (no PHP involved, please!) - email me (no comments on blog) at erich AT debian org.
Update: I've received a couple of pointers. Please don't send me links to projects that are not actively maintained anymore - I don't want to care about having to fix bugs in the editor widget myself.
One link I've received twice is actually quite impressive: WYMeditor. It doesn't try to look like a word processor, but actually is more of a semantic editor. Much more what I'm looking for than any of the others. I've also received a link to the Yahoo! UI Library Editor, which is quite clutter-free, but at least in the default setup very text-formatting oriented, not very semantic (that doesn't mean you couldn't change it that way, I guess you can). I was also pointed to Dojo, but that framework is feature creep all over (which I guess also explains why it loads so slowly), and the last time I looked at its source code, I had some WTF moments - code quality, at least outside of core, doesn't seem to be very high (yes, that code implements the "mod 7" operation using list shifts instead of a simple arithmetic operation).
Looks like I'll give the first try to WYMeditor. Update #2: the code seems to be rather ... complex. I'm looking for something neat and clean; it doesn't need to bring along yet another XHTML schema and validator ... Maybe I should try one of the others first?
Official press release 127/09 of the Bavarian State Ministry of the Interior:
Killer games contradict the consensus of values of our society, which is founded on peaceful coexistence, and ought to be ostracized. In their harmful effects they are on a par with drugs and child pornography, whose prohibition is rightly questioned by nobody.
Hello? Killer games don't kill. People kill, with weapons, not with games.
Or, to put it in the words of comedian Vince Ebert:
Do braces cause puberty?
Statistically, this hypothesis is not easy to reject - the two events are correlated. But this is not the way to find the correct cause-and-effect relationship.
And so I consider the following hypothesis about killer games to be rather more plausible:
People with a tendency towards violence prefer killer games (and are also more prone to outbursts of violence such as shooting rampages)
To come back to Vince Ebert once more:
A: Why did people once think that the sun revolves around the earth?
B: Well, it just looks as if the sun revolves around the earth.
A: So what would it look like if it were the other way round, and the earth revolved around the sun?!?
Conclusion: interpretations are always subjective, and one should make the effort to interpret the whole thing from a different point of view as well, because that can yield a completely different but equally plausible explanation! And one such interpretation is precisely that killer games merely make visible a potential for violence that is caused by deeper underlying problems.
But treating the symptoms instead of the causes has always been easier and more popular in politics.
In a Java framework I'm working on, 'pairs' arise everywhere. Unfortunately, in contrast to e.g. C++, Java doesn't include a predefined 'pair' class. C++ templates are really nice because of the way they are compiled and optimized (in particular they also handle what Java calls 'native datatypes'); Java generics aren't up to par with that. (But yes, Java offers other benefits, such as being much easier to parse and thus refactor). Anyway, this is not going to be a rant on Generics.
So I have this interface in Java called Pair<FIRST,SECOND> along with two implementations SimplePair<FIRST,SECOND> and ComparablePair<FIRST extends Comparable<FIRST>,SECOND extends Comparable<SECOND>>.
For performance reasons, SimplePair is declared 'final', and so is ComparablePair. It's written everywhere that making classes final can make a large difference in Java, and since these objects will be used in a lot of places, it seems reasonable to care about this here.
However, it would often be nice to have more readable code. That is, assuming I'm using SimplePair<Monkey,Banana>, it would be nice to make a derived class BananaPreference extends SimplePair<Monkey,Banana>, with added methods getMonkey() and getPreferredbanana() to make the resulting code more readable.
Having readable code is also often quite as important as having performant code, after all ...
If someone with solid experience in Java optimization has some ideas to share, please do so! Email: erich AT debian DOT org - no comments in blog.
Right now, I have one idea on how it could be possible to achieve both (seriously, I could use some feedback from Java Gurus on that): make SimplePair and ComparablePair abstract, all methods there final, then derive final classes as needed. Does that combine the benefits?
[Update: I received the following helpful link from Joachim Sauer: JavaOne presentation on performance tuning and various VMs. Basically this seems to indicate that in all these common situations, any modern Java VM should be able to figure out the inlining options automatically and optimize appropriately, so it won't benefit from any "final" hint by the developer. Note that a C++ compiler doesn't do runtime optimization, but allows compile time optimization at a much lower level, so this rule doesn't apply to C++.]
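For reference, the idea of an abstract pair with final methods and final derived classes could look roughly like the sketch below. It is only an illustration: the fields of Monkey and Banana and the accessor names getFirst()/getSecond() are assumptions, and per the update above it makes no claim about any performance benefit.

```java
// Sketch of the "abstract base, final methods, final leaf classes" idea.
abstract class SimplePair<FIRST, SECOND> {
  private final FIRST first;
  private final SECOND second;

  protected SimplePair(FIRST first, SECOND second) {
    this.first = first;
    this.second = second;
  }

  // Final: subclasses may add readable aliases, but cannot change the behavior.
  public final FIRST getFirst() {
    return first;
  }

  public final SECOND getSecond() {
    return second;
  }
}

// Placeholder domain classes, made up for the example.
final class Monkey {
  final String name;

  Monkey(String name) {
    this.name = name;
  }
}

final class Banana {
  final String variety;

  Banana(String variety) {
    this.variety = variety;
  }
}

// Final leaf class that only adds the readable accessor names.
final class BananaPreference extends SimplePair<Monkey, Banana> {
  BananaPreference(Monkey monkey, Banana banana) {
    super(monkey, banana);
  }

  Monkey getMonkey() {
    return getFirst();
  }

  Banana getPreferredbanana() {
    return getSecond();
  }

  public static void main(String[] args) {
    BananaPreference p = new BananaPreference(new Monkey("George"), new Banana("Cavendish"));
    System.out.println(p.getMonkey().name + " prefers " + p.getPreferredbanana().variety);
  }
}
```

Code that only needs a generic pair can keep using getFirst() and getSecond(), while code that knows about monkeys and bananas gets the readable names.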
Just a short reminder that the application phase for the Google Summer of Code 2009 is running.
So far, we have rather few applications. The deadline is April 3rd, 19:00 UTC. Applications usually arrive rather late, but I still have the impression that we have far fewer than in previous years. But less copy & paste, too.
If you are interested in doing a GSoC project at Debian:
P.S. as far as I can tell, current Debian Developers can be eligible as well, although it has also always been a goal of the project to get new contributors involved.