Here it goes again (please ignore the previous email and respond to this one).
Hope all is well.
Have you given any thought to my proposal of a compile time option
that won't use asmlib?
I have included the Debian Med team on this email as they are aware
of the packaging of KMC and the whole issue with asmlib.
I have been doing some benchmarking on KMC for the past couple of
days.
I have compiled KMC in three ways:
kmc_original - KMC code compiled against the version of asmlib
distributed with KMC (alibelf64.a)
kmc_native - KMC code compiled against the native OS libraries
kmc_js21 - KMC code compiled against the new version of asmlib,
compiled on my machine with my Unix makefile (libaelf64.a)
I have also included the executable provided on your website in the
benchmark:
kmc_exe
The machine I used for this is a Debian Virtual Machine running on
Vagrant.
Here are the architecture details:
vagrant@debian:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
stepping : 10
microcode : 0x60b
cpu MHz : 1426.514
cache size : 6144 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm
constant_tsc rep_good nopl pni monitor ssse3 lahf_lm
bogomips : 2853.02
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
To do the benchmark I used a fastq file that has a fair bit of
contamination (many different kmers). The file is about 227M in
size.
Here are some of the results:
The average time in seconds over 10 runs for each of the differently
compiled executables:
[1] kmc_original - 'average_duration' => '41.400'
[2] kmc_native - 'average_duration' => '41.675'
[3] kmc_js21 - 'average_duration' => '41.249'
[4] kmc_exe - 'average_duration' => '44.049'
The cumulative time for 10 runs for all the differently compiled
executables:
[5] kmc_original - 'time_taken' => '414 wallclock secs ( 0.26
usr 0.76 sys + 347.94 cusr 61.00 csys = 409.96 CPU) @ 0.02/s
(n=10)'
[6] kmc_native - 'time_taken' => '412 wallclock secs ( 0.15 usr
0.85 sys + 345.37 cusr 61.17 csys = 407.54 CPU) @ 0.02/s (n=10)'
[7] kmc_js21 - 'time_taken' => '423 wallclock secs ( 0.11 usr
0.82 sys + 355.10 cusr 61.95 csys = 417.98 CPU) @ 0.02/s (n=10)'
[8] kmc_exe - 'time_taken' => '434 wallclock secs ( 0.06 usr
0.78 sys + 368.63 cusr 60.14 csys = 429.61 CPU) @ 0.02/s (n=10)'
Note: More detailed results at the end of the email.
From what I can see, [1] generally runs faster than [2] and [3],
albeit by only 1 or 2%.
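To put the averages listed above on a common scale, the relative difference of each build against kmc_original can be worked out directly. The short sketch below only uses the four 'average_duration' values quoted above; the function and variable names are my own and are not part of the benchmark script.

```python
# Average wall-clock seconds over 10 runs, copied from the list above.
averages = {
    "kmc_original": 41.400,
    "kmc_native": 41.675,
    "kmc_js21": 41.249,
    "kmc_exe": 44.049,
}


def relative_difference(value, baseline):
    """Return how far value is from baseline, in percent of baseline."""
    return (value - baseline) / baseline * 100.0


baseline = averages["kmc_original"]
differences = {
    name: relative_difference(seconds, baseline)
    for name, seconds in averages.items()
}

# A positive percentage means slower than kmc_original.
for name, percent in differences.items():
    print(f"{name:13s} {averages[name]:7.3f} s  {percent:+.2f}%")
```

For this particular set of 10 runs, the three self-compiled builds come out within 1% of each other, and the downloaded kmc_exe is roughly 6% slower than kmc_original.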
Looking at the cumulative times for a set of 10 runs, the difference
between implementations is still small, and in this case the native
implementation was actually faster. The machine I was running the
benchmark on wasn't fully dedicated to it, so there will be slight
variation. I would still like to run the benchmark for 100 runs for
both methods. I would like to try this overnight, just for the 2 main
KMC implementations: [1] and [2].
But I'm looking at 23 hours, so essentially one whole day.
I'm not sure I'll do it tonight. Perhaps from Friday to Saturday.
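One way to judge whether a difference this small is real or just noise from a shared machine is to look at the spread of the individual run times, not only their mean. The following is a minimal sketch, assuming the per-run durations are kept in a list. The comparison rule is a crude heuristic of my own, and the timings in it are made up for illustration; they are not measurements from this benchmark.

```python
import statistics


def summarize(durations):
    """Return (mean, sample standard deviation) of per-run durations in seconds."""
    return statistics.mean(durations), statistics.stdev(durations)


def differs_beyond_noise(a, b):
    """Crude check: is the gap between the means larger than the combined spread?"""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)


# Hypothetical per-run timings in seconds; NOT taken from the benchmark.
runs_with_asmlib = [41.2, 41.9, 40.8, 41.5, 41.6]
runs_native = [41.4, 42.1, 41.0, 41.9, 41.7]

print(summarize(runs_with_asmlib))
print(summarize(runs_native))
print(differs_beyond_noise(runs_with_asmlib, runs_native))
```

A longer series, such as the 100 runs mentioned above, would narrow the uncertainty on each mean and make a check like this more meaningful.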
Anyway, I understand that this performance increase might mean a lot
to you, but our group here at the Sanger and Debian can definitely
live with the native implementation.
Since the author of asmlib is taking a while to reply, our
suggestion would be to package KMC in one of two ways:
1- an implementation on your side which allows KMC to be built
without using asmlib (preferred);
2- using the compilation that does not use asmlib at all (which can
be done on my side, as a code patch, at package creation time).
This would make the packaging job slightly easier and faster and
would allow me to reach my goal: packaging the virus assembler
written by my colleague here at the Sanger.
If/when Agner (author of asmlib) replies we can always package asmlib
and state it as a dependency for the KMC package.
Detailed Results:
The benchmarking results were obtained with a Perl script using 2
Perl modules:
[9] Time::HiRes => High resolution alarm, sleep, gettimeofday,
interval timers
[10] Benchmark => benchmark running times of Perl code
The options used for the KMC runs are the same as the ones used for
the Virus Assembler (IVA) runs:
kmc -k100 -m4 -ci10 -cs100000000 -fq foo.fastq kmc.res bar/
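For anyone who wants to repeat the timing without my Perl script, the idea behind the [9] measurements can be sketched as follows: run the command a fixed number of times, record the wall-clock time of each run, and average. This is a Python stand-in and not the script I used; the helper name time_command is hypothetical, and the command in the example is a harmless placeholder rather than the kmc call.

```python
import subprocess
import sys
import time


def time_command(cmd, runs):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the benchmark if the command exits with an error.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder command so the sketch runs anywhere. For a real benchmark,
# replace it with the kmc command line shown above, split into a list.
cmd = [sys.executable, "-c", "pass"]
durations = time_command(cmd, runs=3)
print(f"average_duration: {sum(durations) / len(durations):.3f}")
```

Measuring wall-clock time around the whole child process also counts process start-up and I/O, which is what matters for the packaging decision.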
Results with [9] -
Average time in seconds for 10 runs logged under the
'average_duration' attribute.
'commandline_parameters' => {
  'module' => 'time_hires',
  'number_of_runs' => 10,
  'file_type' => 'fq',
  'fastaq_filename' => '12950_1#10_1.fastq'
},
'kmc_exe' => {
  'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq ke_out.res /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
  'output_filename' => 'ke_out.res',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
  'kmc_exe' => '../kmc_exe/kmc',
  'duration' => '46.303',
  'average_duration' => '44.049'
},
'kmc_native' => {
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
  'output_filename' => 'kn_out.res',
  'average_duration' => '41.675',
  'kmc_exe' => '../kmc_native/kmc',
  'duration' => '41.688',
  'cmd' => '../kmc_native/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/'
},
'kmc_js21' => {
  'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kj_out.res /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
  'output_filename' => 'kj_out.res',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
  'kmc_exe' => '../kmc_js21/kmc',
  'duration' => '40.714',
  'average_duration' => '41.249'
},
'kmc_original' => {
  'output_filename' => 'kk_out.res',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
  'duration' => '42.670',
  'kmc_exe' => '../kmc_kmc/kmc',
  'average_duration' => '41.400',
  'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kk_out.res /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/'
},
};
The times are consistent with the times reported by the KMC
instances when they run in verbose mode.
Results with [10] -
Cumulative time in seconds logged under the 'time_taken' attribute.
'commandline_parameters' => {
  'number_of_runs' => 10,
  'fastaq_filename' => '12950_1#10_1.fastq',
  'file_type' => 'fq',
  'module' => 'bench'
},
'kmc_original' => {
  'output_filename' => 'kk_out.res',
  'time_taken' => '414 wallclock secs ( 0.26 usr 0.76 sys + 347.94 cusr 61.00 csys = 409.96 CPU) @ 0.02/s (n=10)',
  'kmc_exe' => '../kmc_kmc/kmc',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
  'cmd' => '../kmc_kmc/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kk_out.res /home/vagrant/build/kmc_bin/perl_profiler/kk_analysis/',
  'duration' => ''
},
'kmc_native' => {
  'duration' => '',
  'cmd' => '../kmc_native/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kn_out.res /home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kn_analysis/',
  'kmc_exe' => '../kmc_native/kmc',
  'time_taken' => '412 wallclock secs ( 0.15 usr 0.85 sys + 345.37 cusr 61.17 csys = 407.54 CPU) @ 0.02/s (n=10)',
  'output_filename' => 'kn_out.res'
},
'kmc_js21' => {
  'kmc_exe' => '../kmc_js21/kmc',
  'time_taken' => '423 wallclock secs ( 0.11 usr 0.82 sys + 355.10 cusr 61.95 csys = 417.98 CPU) @ 0.02/s (n=10)',
  'output_filename' => 'kj_out.res',
  'duration' => '',
  'cmd' => '../kmc_js21/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq kj_out.res /home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/kj_analysis/'
},
'kmc_exe' => {
  'duration' => '',
  'cmd' => '../kmc_exe/kmc -k100 -m4 -ci10 -cs100000000 -fq 12950_1#10_1.fastq ke_out.res /home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
  'analysis_dir' => '/home/vagrant/build/kmc_bin/perl_profiler/ke_analysis/',
  'kmc_exe' => '../kmc_exe/kmc',
  'time_taken' => '434 wallclock secs ( 0.06 usr 0.78 sys + 368.63 cusr 60.14 csys = 429.61 CPU) @ 0.02/s (n=10)',
  'output_filename' => 'ke_out.res'
}
};
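The 'time_taken' strings above pack several numbers into one line: wall-clock seconds, the script's own user and system time, the child process (KMC) user and system time, their total, and the run count. To compare them with the per-run averages, they can be split apart with a regular expression. The sketch below is in Python rather than Perl, and the pattern is my own reading of the strings as they appear in this email, not an official parser for the Benchmark module's output.

```python
import re

# Pattern written against the 'time_taken' strings shown above.
TIMESTR = re.compile(
    r"(?P<wall>\d+)\s+wallclock secs\s+\(\s*"
    r"(?P<usr>[\d.]+)\s+usr\s+(?P<sys>[\d.]+)\s+sys\s+\+\s*"
    r"(?P<cusr>[\d.]+)\s+cusr\s+(?P<csys>[\d.]+)\s+csys\s+=\s*"
    r"(?P<cpu>[\d.]+)\s+CPU\)\s+@\s*(?P<rate>[\d.]+)/s\s+\(n=(?P<n>\d+)\)"
)


def parse_timestr(text):
    """Split one 'time_taken' string into named numeric fields."""
    match = TIMESTR.search(text)
    if match is None:
        raise ValueError(f"unrecognised time string: {text!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])
    return fields


# The kmc_original entry from the results above.
original = parse_timestr(
    "414 wallclock secs ( 0.26 usr 0.76 sys + 347.94 cusr 61.00 csys "
    "= 409.96 CPU) @ 0.02/s (n=10)"
)
wall_per_run = original["wall"] / original["n"]
child_cpu = original["cusr"] + original["csys"]
print(f"wall-clock per run: {wall_per_run:.1f} s, child CPU total: {child_cpu:.2f} s")
```

For kmc_original this gives 41.4 s of wall-clock time per run, in line with the 41.400 s average measured with [9].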
Let me know what you think.
Kind regards,
Jorge