okkyの銀河制圧奇譚: How to Classify into 15 Groups

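sort.pl emits three files per machine. Taking those files for two machines and classifying them as in Figure 1 of 「AとBの邂逅」 (the encounter of A and B) actually needs only very simple tools.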

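This time we'll write two programs, common.pl and onlyFirst.pl, and use those.
common.pl takes two files and prints the counter names that appear in both.
onlyFirst.pl also takes two files, but prints the counter names that appear in the first file and not in the second.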

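"Wouldn't the diff command do?"
I can almost hear someone asking, but no, it won't. diff does not produce a strict difference. When the differences get too complicated, it gives up and declares the whole neighborhood different in one go. That is handy for making patches, but useless when you want to know the difference in the true sense.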

Oops, almost forgot. Here is a sample of sort.pl's output.


"\System\Threads"
# "-0.22995779821056035196" "0.05288058895784879381" "-0.07730117409966560384" "143.68969576542572045539"
# "-0.51157343991699204758" "0.26170738442850427247" "-0.28398973957007304341" "268.04879224402808287292"
# "-0.54019071369820630440" "0.29180600716577749228" "-0.44072471855335868898" "364.09672304650726120687"
# "-0.55452955075971981685" "0.30750302266577667691" "-0.35677693634035859957" "315.68199698065233504727"
# "-0.56238435856552132237" "0.31627616675915285595" "-0.25723384667083141975" "251.10319887283403001070"
"\System\System Up Time"
# "-0.01694272285973713476" "0.00028705585790185927" "-0.00002142521429336236" "319.75744872321046887814"
# "-0.10027028487321032493" "0.01005413002855475131" "-0.00006496869460687431" "814.02214841113016417319"
# "-0.15561543326229096847" "0.02421616306941053433" "-0.00009270518792943096" "1140.66520449120347153620"
# "0.02950515281233018273" "0.00087055404247895569" "0.00002478604557354800" "-190.27570068419384733538"
# "0.64118189188208415717" "0.41111421847748865761" "0.00006133456719050028" "-615.26230486597317302122"
"\System\System Calls/sec"
# "-0.18729246211596788708" "0.03507846636546126618" "-0.00055434531116103602" "100.21166379708057100480"
# "-0.29232961498316379384" "0.08545660379620478167" "-0.00111680437443534113" "93.05619900302319881495"
# "-0.34131479803199158732" "0.11649579135561920833" "-0.00090462856546019161" "90.75926531940870958778"
# "-0.34750948339695522578" "0.12076284105081869973" "-0.00149413414952154013" "87.55393220941927354234"
# "-0.51114894911397353620" "0.26127324818031950750" "-0.00296996323726849211" "104.08031976970305210654"

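Lines beginning with # are comments. They only record, for debugging, the four values r, r², a, and b computed from each counter, and they mean nothing at this stage. Let's strip them out with grep -v before processing.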
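As a quick sanity check on how those four columns line up (a minimal sketch, not part of the author's pipeline), the second value on a comment line should equal the first value squared if the order really is r, r², a, b. The line below is copied from the sample output; the tolerance of 1e-12 is an arbitrary choice.

```shell
#!/bin/sh
# One comment line copied from the sample output above.
line='# "-0.22995779821056035196" "0.05288058895784879381" "-0.07730117409966560384" "143.68969576542572045539"'

# Strip the quotes, then compare field 3 (the recorded r^2) with field 2 squared.
check=$(printf '%s\n' "$line" | tr -d '"' | awk '{
  r = $2; r2 = $3
  delta = r * r - r2
  if (delta < 0) delta = -delta
  # 1e-12 is an arbitrary tolerance for double-precision rounding.
  if (delta < 1e-12) print "consistent"; else print "inconsistent"
}')
echo "$check"
```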


% cd $TOP
% cd $TOP/06rmComment
% cat filter.sh
#!/bin/sh

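# {HI,BORDER,LOW} brace expansion is not available in every /bin/sh,
# so the three patterns are spelled out.
for i in ../05sort/*.HI ../05sort/*.BORDER ../05sort/*.LOW; do
  o=`basename "$i"`
  echo "$o"
  grep -v '^#' "$i" | sort | uniq > "$o"
done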
% ./filter.sh
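To see what the filter does to a single file, here is a minimal self-contained sketch. The helper name `strip_comments` and the temporary file are made up for the illustration; the input lines are taken from the sample output above, with one counter name repeated so that the effect of uniq is visible.

```shell
#!/bin/sh
# strip_comments is a hypothetical helper name; it applies the same
# grep / sort / uniq steps as filter.sh to standard input.
strip_comments() {
  grep -v '^#' | sort | uniq
}

tmp=$(mktemp -d)
# Quoted 'EOF' keeps the backslashes in the counter names untouched.
cat > "$tmp/A.HI" <<'EOF'
"\System\Threads"
# "-0.22995779821056035196" "0.05288058895784879381" "-0.07730117409966560384" "143.68969576542572045539"
"\System\System Up Time"
# "-0.01694272285973713476" "0.00028705585790185927" "-0.00002142521429336236" "319.75744872321046887814"
"\System\Threads"
EOF

result=$(strip_comments < "$tmp/A.HI")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the counter names survive, sorted and without the repeated entry.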

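sort is there to make debugging easier (mistakes are easier to spot when the lines are in alphabetical order), while uniq is more of an "I did it in the heat of the moment" thing. You can do without it.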

Now then. First, the script that pulls out the counter names common to two files.

common.pl


#! /usr/bin/perl

# common.pl FILE_A FILE_B
# Read FILE_A and FILE_B line by line.
# If a line is found in both files, regardless of its position, print it to STDOUT.

$inAfn = $ARGV[0];
$inBfn = $ARGV[1];

open( INA, $inAfn ) or die "can't open file $inAfn for reading\n";
open( INB, $inBfn ) or die "can't open file $inBfn for reading\n";

while (<INA>) {
  chomp;
  push @inAline, ( $_ );
}

while (<INB>) {
  chomp;
  push @inBline, ( $_ );
}

close INA;
close INB;


# defined() keeps an empty line or a line "0" from ending the loop early.
while ( defined( $line = pop @inAline ) ) {
  # find same line in @inBline;
  for ( $i = 0; $i <= $#inBline; $i++ ) {
    if ( $inBline[$i] eq $line ) {
      print "$line\n";
    }
  }
}

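Well, if I do say so myself, this is spectacularly dumb code. After finding a common element it does not use next to move on to the following line, which is questionable, and it does not remove the matched element from @inBline either, which is just as questionable.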
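For comparison, the nested loop can be replaced by a lookup table: load the second file once, then test each line of the first file against the table. The following is a minimal sketch of that idea using awk from a shell function, not the author's script. The name `common_lines` and the demo files are made up, and the sketch assumes the file paths contain no backslashes (awk's -v option would interpret them). Unlike common.pl, it prints matches in the first file's order and prints each line of the first file at most once.

```shell
#!/bin/sh
# common_lines FILE_A FILE_B  (hypothetical name, not the author's common.pl)
# Print each line of FILE_A that also occurs somewhere in FILE_B.
common_lines() {
  awk -v bfile="$2" '
    BEGIN {
      # Load every line of FILE_B into a lookup table.
      while ((getline line < bfile) > 0) seen[line] = 1
      close(bfile)
    }
    # One table lookup per FILE_A line instead of a scan of FILE_B.
    ($0 in seen)
  ' "$1"
}

# Demo with counter names from the sample output.
tmp=$(mktemp -d)
printf '%s\n' '"\System\Threads"' '"\System\System Up Time"' > "$tmp/A.HI"
printf '%s\n' '"\System\System Calls/sec"' '"\System\Threads"' > "$tmp/B.HI"
hh=$(common_lines "$tmp/A.HI" "$tmp/B.HI")
printf '%s\n' "$hh"
rm -rf "$tmp"
```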

Next, the script that finds the counters present on only one side.

onlyFirst.pl


#! /usr/bin/perl

# onlyFirst.pl FILE_A FILE_B
# Read FILE_A and FILE_B line by line.
# If a line from FILE_A does not exist in FILE_B, print it to STDOUT.

$inAfn = $ARGV[0];
$inBfn = $ARGV[1];

open( INA, $inAfn ) or die "can't open file $inAfn for reading\n";
open( INB, $inBfn ) or die "can't open file $inBfn for reading\n";

while (<INA>) {
  chomp;
  push @inAline, ( $_ );
}

while (<INB>) {
  chomp;
  push @inBline, ( $_ );
}

close INA;
close INB;

# defined() keeps an empty line or a line "0" from ending the loop early.
A_line:
while ( defined( $line = pop @inAline ) ) {
  # find same line in @inBline;
  for ( $i = 0; $i <= $#inBline; $i++ ) {
    if ( $inBline[$i] eq $line ) {
      next A_line;
    }
  }
  print "$line\n";
}

Heh. This one is every bit as dumb orz
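The "first file only" case can also be sketched with a lookup table instead of a nested loop. This is an illustration, not the author's script: the name `only_first` and the demo files are made up, and the file paths are assumed to contain no backslashes (awk's -v option would interpret them).

```shell
#!/bin/sh
# only_first FILE_A FILE_B  (hypothetical name, not the author's onlyFirst.pl)
# Print each line of FILE_A that occurs nowhere in FILE_B.
only_first() {
  awk -v bfile="$2" '
    BEGIN {
      # Load every line of FILE_B into a lookup table.
      while ((getline line < bfile) > 0) seen[line] = 1
      close(bfile)
    }
    # Keep only the FILE_A lines that are absent from the table.
    !($0 in seen)
  ' "$1"
}

# Demo with counter names from the sample output.
tmp=$(mktemp -d)
printf '%s\n' '"\System\Threads"' '"\System\System Up Time"' > "$tmp/A.HI"
printf '%s\n' '"\System\Threads"' '"\System\System Calls/sec"' > "$tmp/B.HI"
only_a=$(only_first "$tmp/A.HI" "$tmp/B.HI")
only_b=$(only_first "$tmp/B.HI" "$tmp/A.HI")
printf 'only in A: %s\nonly in B: %s\n' "$only_a" "$only_b"
rm -rf "$tmp"
```

Swapping the two arguments gives the other direction, which is how the filter script uses onlyFirst.pl for both A and B.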


% cd $TOP
% cd $TOP/07common
% cat filter.sh
##############################################
# re-classify into matrix
#
#  B \ A     |      |        |     |
#      \     | HIGH | BORDER | LOW |
#------------+------+--------+-----+
#   HIGH     |  HH  |  HB    | HL  |
#------------+------+--------+-----+
#   BORDER   |  BH  |  BB    | BL  |
#------------+------+--------+-----+
#   LOW      |  LH  |  LB    | LL  |
#------------+------+--------+-----+

./common.pl ../06rmComment/A.HI ../06rmComment/B.HI   > HH
./common.pl ../06rmComment/A.HI ../06rmComment/B.BORDER > BH
./common.pl ../06rmComment/A.HI ../06rmComment/B.LOW  > LH

./common.pl ../06rmComment/A.BORDER ../06rmComment/B.HI   > HB
./common.pl ../06rmComment/A.BORDER ../06rmComment/B.BORDER > BB
./common.pl ../06rmComment/A.BORDER ../06rmComment/B.LOW  > LB

./common.pl ../06rmComment/A.LOW ../06rmComment/B.HI   > HL
./common.pl ../06rmComment/A.LOW ../06rmComment/B.BORDER > BL
./common.pl ../06rmComment/A.LOW ../06rmComment/B.LOW  > LL

for i in HI BORDER LOW; do
  ./onlyFirst.pl ../06rmComment/A.$i ../06rmComment/B.HI > ./tmp1
  ./onlyFirst.pl ./tmp1 ../06rmComment/B.BORDER > ./tmp2
  ./onlyFirst.pl ./tmp2 ../06rmComment/B.LOW   > ./onlyA.$i
done

for i in HI BORDER LOW; do
  ./onlyFirst.pl ../06rmComment/B.$i ../06rmComment/A.HI > ./tmp1
  ./onlyFirst.pl ./tmp1 ../06rmComment/A.BORDER > ./tmp2
  ./onlyFirst.pl ./tmp2 ../06rmComment/A.LOW   > ./onlyB.$i
done

% ./filter.sh

The processing for C and D is omitted.

Frankly, this is about as brute-force as a script can get. Good kids should not imitate it (eh...).
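For readers who would rather not repeat fifteen command lines, the same classification can be driven by loops. The sketch below is one possible arrangement under stated assumptions, not the author's script. The helpers `common` and `only_first` are grep-based stand-ins for ./common.pl and ./onlyFirst.pl so that the block runs on its own; `classify`, `initial`, and the counter names c1 to c5 are made up. Instead of chaining three onlyFirst.pl calls through tmp1 and tmp2, it compares against the concatenation of the other machine's three files, which selects the same lines. The stand-ins assume the input files are non-empty.

```shell
#!/bin/sh
# Stand-ins for ./common.pl and ./onlyFirst.pl (assumption: whole-line,
# fixed-string matching with grep selects the same lines).
common()     { grep -Fx  -f "$2" "$1"; }
only_first() { grep -Fxv -f "$2" "$1"; }

# First character of a class name: HI -> H, BORDER -> B, LOW -> L.
initial() { printf '%s\n' "$1" | cut -c1; }

# classify DIR: read DIR/A.* and DIR/B.*, write the 15 result files into DIR.
classify() {
  dir=$1
  for a in HI BORDER LOW; do
    for b in HI BORDER LOW; do
      # Cell name is B's class letter followed by A's, as in the matrix.
      cell=$(initial "$b")$(initial "$a")
      common "$dir/A.$a" "$dir/B.$b" > "$dir/$cell"
    done
  done
  cat "$dir/A.HI" "$dir/A.BORDER" "$dir/A.LOW" > "$dir/A.ALL"
  cat "$dir/B.HI" "$dir/B.BORDER" "$dir/B.LOW" > "$dir/B.ALL"
  for i in HI BORDER LOW; do
    only_first "$dir/A.$i" "$dir/B.ALL" > "$dir/onlyA.$i"
    only_first "$dir/B.$i" "$dir/A.ALL" > "$dir/onlyB.$i"
  done
}

# Demo with made-up counter names c1..c5.
demo=$(mktemp -d)
printf '%s\n' c1 c2 > "$demo/A.HI"
printf '%s\n' c3    > "$demo/A.BORDER"
printf '%s\n' c4    > "$demo/A.LOW"
printf '%s\n' c1    > "$demo/B.HI"
printf '%s\n' c2 c5 > "$demo/B.BORDER"
printf '%s\n' c3    > "$demo/B.LOW"
classify "$demo"

hh=$(cat "$demo/HH"); bh=$(cat "$demo/BH"); lb=$(cat "$demo/LB")
only_a_low=$(cat "$demo/onlyA.LOW"); only_b_border=$(cat "$demo/onlyB.BORDER")
echo "HH=$hh BH=$bh LB=$lb onlyA.LOW=$only_a_low onlyB.BORDER=$only_b_border"
rm -rf "$demo"
```

The nine matrix cells plus three onlyA and three onlyB files make up the fifteen groups of the title.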

And with that, the part of the classification that can be done mechanically is finished.
From here on, human powers of deduction do the talking.

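In fact, the perl scripts shown here are almost exactly the ones I actually use. The shell scripts I change from case to case, so they are not exactly as shown. With these I have been able to find all sorts of causes that had gone undetected until now.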

If you are struggling with a performance problem, I think this will help. At the very least, the same way of thinking should apply.


August 26, 2008
