Vitavonni

Wed, 12 Sep 2007

Gnuplot grep

For my diploma thesis (on clustering of high dimensional data, especially correlation clustering, i.e. clustering data by properties such as data correlation) I needed an easy way to filter out data from CSV files as used by gnuplot (actually, they are whitespace-separated).

I hacked together a small tool that would allow me to easily 'grep' out certain parts of the datasets. If you know of a similar tool, please send me an email to erich AT debian DOT org.

Currently, the tool allows you to run commands such as:

gpgrep "1~5$" "1<200" < out/3d-2lin-noise.variances
to select ('grep') all rows where the first column ends with a 5 (regular expression match) and where the first column is less than 200.
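The tool itself isn't shown here, so the following is only a minimal Python sketch of how such compact filter expressions could be interpreted, not the actual gpgrep code. The names `parse_filter` and `row_matches` are made up for illustration, and the assumed syntax covers only `COLUMN~REGEX`, `COLUMN<NUMBER` and `COLUMN>NUMBER` with 1-based column numbers.

```python
import re

def parse_filter(expr):
    """Turn a compact expression such as '1~5$' or '1<200' into a predicate.

    Assumed (hypothetical) syntax: COLUMN OP VALUE, with a 1-based column
    number and OP being one of '~' (regex match), '<' or '>'.
    The returned predicate takes the list of fields of one row.
    """
    m = re.match(r"^(\d+)([~<>])(.*)$", expr)
    if m is None:
        raise ValueError("cannot parse filter: %r" % expr)
    col = int(m.group(1)) - 1          # convert to 0-based index
    op, value = m.group(2), m.group(3)
    if op == "~":
        pattern = re.compile(value)
        return lambda fields: col < len(fields) and pattern.search(fields[col]) is not None
    limit = float(value)
    if op == "<":
        return lambda fields: col < len(fields) and float(fields[col]) < limit
    return lambda fields: col < len(fields) and float(fields[col]) > limit

def row_matches(line, predicates):
    """A row is selected only if every filter accepts it."""
    fields = line.split()              # whitespace-separated columns
    return all(p(fields) for p in predicates)
```

With `predicates = [parse_filter("1~5$"), parse_filter("1<200")]`, a row such as `15 0.3` would be kept, while `205 0.3` and `16 0.3` would be dropped.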

The tool is still in early development. Syntax may change, and I guess I'll add some more filters. But you get the idea. Focus is on a very compact syntax.

Future filters I'm considering would e.g. select 20 random rows for sampling (with a fixed seed value, so it's reproducible!), or use a modulo match to select e.g. every 7th row.
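As a rough illustration of these two ideas (again just a sketch in Python, with the hypothetical helper names `sample_rows` and `every_nth`), seeded sampling and a modulo match on the row index could look like this:

```python
import random

def sample_rows(rows, count, seed=0):
    """Pick `count` rows at random, keeping them in their original order.

    A private generator with a fixed seed makes the selection repeatable
    from run to run.
    """
    rng = random.Random(seed)
    count = min(count, len(rows))
    chosen = sorted(rng.sample(range(len(rows)), count))
    return [rows[i] for i in chosen]

def every_nth(rows, n, offset=0):
    """Keep every n-th row (modulo match on the 0-based row index)."""
    return [row for i, row in enumerate(rows) if i % n == offset]
```

Calling `sample_rows(rows, 20, seed=42)` twice on the same input returns the same 20 rows, and `every_nth(rows, 7)` keeps rows 0, 7, 14 and so on.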

Maybe I'll also add some output processing later, such as averaging values, calculating variances and mean deviations, or stripping away columns (but you can do that in Gnuplot already). I don't want to overdo it though - it's just meant as a mini filter you can add when scripting output visualization. It's not meant to replace a full statistics toolkit.

So if you know of a simple tool that can do that already, please tell me.

P.S. A few people have pointed out awk. Yes, it can do most of this. One thing I also need is to preserve 'blocking', i.e. empty lines, and I'm not sure what the easiest way to do that in awk is. This matters because the data set

1 1
2 2

1 2
2 1

is plotted as two lines with two points each in gnuplot, not as one line with four points. I guess you could just do a "/^$/ {print}", though... hmm... looks like I finally have to learn awk. So far I've always refused to learn awk; I was happy with sed, perl and python...
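For comparison, here is how the same "pass empty lines through untouched" idea might look in Python. This is only a sketch: `filter_preserving_blocks` is a hypothetical name, and `keep` stands for any function that decides whether a row (given as a list of its fields) should be kept.

```python
def filter_preserving_blocks(lines, keep):
    """Yield the data lines accepted by `keep`; blank lines pass through unchanged.

    Blank lines are what separates blocks in a gnuplot data file,
    so they must never be dropped by the row filter.
    """
    for line in lines:
        if line.strip() == "":
            yield line                 # block separator: always keep
        elif keep(line.split()):
            yield line                 # data row accepted by the filter

data = ["1 1", "2 2", "", "1 2", "2 1"]
kept = list(filter_preserving_blocks(data, lambda f: float(f[0]) < 2))
# kept is ["1 1", "", "1 2"]: one row per block, separator preserved
```

One caveat with this simple approach: if every row of a block is filtered out, two blank lines can end up next to each other, and gnuplot treats a double blank line as a separator between data sets rather than between blocks. A more careful version would probably collapse such runs.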
