Hi all,
I would like to submit a patch that:
- speeds up 'cli_bm_scanbuff()'
- reduces memory usage (the memory allocated in 'cli_bm_init()')
What I am looking for here is your opinion on the best way to implement it.
I/ Description of the proposal:
===============================
When the BM (Boyer-Moore) algorithm is used with BM_MIN_LENGTH == BM_BLOCK_SIZE,
the root->bm_shift array is not useful. In cli_bm_addpatt() we find:
    for(i = BM_MIN_LENGTH - BM_BLOCK_SIZE; i >= 0; i--) {
        idx = HASH(pt[i], pt[i + 1], pt[i + 2]);
        root->bm_shift[idx] = MIN(root->bm_shift[idx], BM_MIN_LENGTH - BM_BLOCK_SIZE - i);
    }
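In this case the loop starts at i = BM_MIN_LENGTH - BM_BLOCK_SIZE = 0 and runs
exactly once, storing MIN(root->bm_shift[idx], 0) = 0. So ALL the bm_shift
entries that can be useful (i.e. every bm_shift index that has a corresponding
entry in the virus signature hash table, bm_suffix) are set to 0.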
This array therefore becomes completely useless: we can avoid allocating it
(around 200 KB) and drop its use in 'cli_bm_scanbuff()', which speeds up the
check performed on every byte of the scanned files (for any index that has an
entry in bm_suffix, 'shift = root->bm_shift[idx]; if(shift == 0) ...' is always
true).
Of course, this shortcut is valid ONLY when BM_MIN_LENGTH == BM_BLOCK_SIZE,
which is currently the case but was not in the past. So I would like to
implement it so that if someone changes the value of either constant (for
testing purposes, for example), the previous behaviour is preserved.
II/ Implementation ideas:
=========================
2.1) define a macro (AVOID_BM_SHIFT ???) and wrap the affected code in it
wherever needed
2.2) add a test such as 'if (BM_MIN_LENGTH == BM_BLOCK_SIZE) ...' wherever
needed and let the compiler detect the dead code and optimize it away
2.3) add a new inline function (int cli_can_avoid_bm_shift() ???) that
performs this test
2.4) any other ideas?
Personally, I think that 2.3 is the best approach.
Thanks in advance for your comments
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html