From clamav-devel-bounces@lists.clamav.net  Wed Jan 10 22:31:55 2007
Return-Path: <clamav-devel-bounces@lists.clamav.net>
X-Original-To: list@tad.clamav.net
Delivered-To: list@tad.clamav.net
Received: from tad.clamav.net ([127.0.0.1])
	by localhost (tad [127.0.0.1]) (amavisd-new, port 10024) with ESMTP
	id 12083-10; Wed, 10 Jan 2007 22:31:55 +0100 (CET)
Received: from tad.clamav.net (localhost.localdomain [127.0.0.1])
	by tad.clamav.net (Postfix) with ESMTP id 32F5D16C0B5;
	Wed, 10 Jan 2007 22:31:55 +0100 (CET)
X-Original-To: clamav-devel@tad.clamav.net
Delivered-To: clamav-devel@tad.clamav.net
Received: from tad.clamav.net ([127.0.0.1])
	by localhost (tad [127.0.0.1]) (amavisd-new, port 10024) with ESMTP
	id 12735-03 for <clamav-devel@tad.clamav.net>;
	Wed, 10 Jan 2007 22:31:53 +0100 (CET)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by tad.clamav.net (Postfix) with ESMTP id 22B6B16C0AF
	for <clamav-devel@lists.clamav.net>;
	Wed, 10 Jan 2007 22:31:53 +0100 (CET)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1H4l2r-0003fu-5f
	for clamav-devel@lists.clamav.net; Wed, 10 Jan 2007 22:31:41 +0100
Received: from aorleans-157-1-184-242.w90-20.abo.wanadoo.fr ([90.20.179.242])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <clamav-devel@lists.clamav.net>; Wed, 10 Jan 2007 22:31:41 +0100
Received: from christophe.jaillet by
	aorleans-157-1-184-242.w90-20.abo.wanadoo.fr with local (Gmexim
	0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <clamav-devel@lists.clamav.net>; Wed, 10 Jan 2007 22:31:41 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: clamav-devel@lists.clamav.net
From: "Christophe Jaillet" <christophe.jaillet@wanadoo.fr>
Date: Wed, 10 Jan 2007 22:30:40 +0100
Lines: 55
Message-ID: <eo3lv8$3id$1@sea.gmane.org>
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: aorleans-157-1-184-242.w90-20.abo.wanadoo.fr
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1807
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1896
X-Antivirus: avast! (VPS 0702-0, 09/01/2007), Outbound message
X-Antivirus-Status: Clean
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at clamav.net
Subject: [Clamav-devel] speed up 'cli_bm_scanbuff()'
X-BeenThere: clamav-devel@lists.clamav.net
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ClamAV Development <clamav-devel@lists.clamav.net>
List-Id: ClamAV Development <clamav-devel.lists.clamav.net>
List-Unsubscribe: <http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-devel>,
	<mailto:clamav-devel-request@lists.clamav.net?subject=unsubscribe>
List-Post: <mailto:clamav-devel@lists.clamav.net>
List-Help: <mailto:clamav-devel-request@lists.clamav.net?subject=help>
List-Subscribe: <http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-devel>,
	<mailto:clamav-devel-request@lists.clamav.net?subject=subscribe>
Sender: clamav-devel-bounces@lists.clamav.net
Errors-To: clamav-devel-bounces@lists.clamav.net

Hi all,

I would like to submit a patch which:
    - speeds up 'cli_bm_scanbuff()'
    - reduces memory usage (allocated in 'cli_bm_init()')

What I am looking for here is your opinion on the best way to implement it.


I/ Description of the proposal :
==============================
When using the BM algorithm, if BM_MIN_LENGTH == BM_BLOCK_SIZE, the
root->bm_shift array serves no purpose. In cli_bm_addpatt we find:

  for(i = BM_MIN_LENGTH - BM_BLOCK_SIZE; i >= 0; i--) {
      idx = HASH(pt[i], pt[i + 1], pt[i + 2]);
      root->bm_shift[idx] = MIN(root->bm_shift[idx],
                                BM_MIN_LENGTH - BM_BLOCK_SIZE - i);
  }

In this case the loop runs only once, with i == 0, and stores
MIN(root->bm_shift[idx], 0), that is 0. So ALL the bm_shift entries that
can matter (i.e. every bm_shift index that has a corresponding entry in the
virus signature hash table (bm_suffix)) are set to 0.
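
To make the point concrete, here is a small standalone sketch of the table
fill. It is not the ClamAV code: the HASH macro, the default shift value and
the demo_* helpers are assumptions made up for illustration, and only the
loop body mirrors the one quoted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Both constants are equal, as in the case discussed above. */
#define BM_MIN_LENGTH 3
#define BM_BLOCK_SIZE 3

/* Stand-in hash (assumption): any hash of three bytes would do here. */
#define HASH(a, b, c) (211 * (a) + 37 * (b) + (c))
#define TABLE_SIZE (HASH(255, 255, 255) + 1)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: allocate a shift table and fill every entry with a
 * default shift (assumed to be BM_MIN_LENGTH - BM_BLOCK_SIZE + 1).
 * Returns NULL if the allocation fails. */
static int *demo_shift_init(void)
{
    int *shift = malloc(TABLE_SIZE * sizeof(*shift));
    int i;

    if (shift == NULL)
        return NULL;
    for (i = 0; i < TABLE_SIZE; i++)
        shift[i] = BM_MIN_LENGTH - BM_BLOCK_SIZE + 1;
    return shift;
}

/* Hypothetical helper: run the quoted loop for one pattern.
 * pt must point to at least BM_MIN_LENGTH bytes. */
static void demo_shift_addpatt(int *shift, const unsigned char *pt)
{
    int i, idx;

    assert(shift != NULL && pt != NULL);
    /* With equal constants this loop body executes exactly once (i == 0). */
    for (i = BM_MIN_LENGTH - BM_BLOCK_SIZE; i >= 0; i--) {
        idx = HASH(pt[i], pt[i + 1], pt[i + 2]);
        shift[idx] = MIN(shift[idx], BM_MIN_LENGTH - BM_BLOCK_SIZE - i);
    }
}
```

After demo_shift_addpatt() has been called for a pattern, the entry for that
pattern's first block is 0 and every untouched entry keeps the default, so
the table only ever holds two values.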

So this array becomes completely useless: we can avoid allocating it
(around 200 KB) and avoid reading it in 'cli_bm_scanbuff()', which speeds up
the test done for each byte of the scanned files.
(shift = root->bm_shift[idx]; if(shift == 0)... is always true wherever
bm_suffix[idx] has an entry)

Clearly, this shortcut is valid ONLY if BM_MIN_LENGTH == BM_BLOCK_SIZE,
which is the case today but was not in the past. So I would like to
implement it so that if someone changes the value of either constant, for
testing purposes for example, the previous behaviour is preserved.


II/ Idea of implementation :
==========================
2.1) define a macro (AVOID_BM_SHIFT ???) and use it to conditionally
compile the code wherever needed

2.2) add tests like: if (BM_MIN_LENGTH == BM_BLOCK_SIZE) ... where needed
and let the compiler detect the dead code and optimize it away

2.3) add a new inlined function (int cli_can_avoid_bm_shift() ???) that
performs this test

2.4) any other ideas?


Personally, I think that 2.3 is the best approach.
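
To show what I have in mind for 2.3, here is a rough sketch. Only the
function name cli_can_avoid_bm_shift() comes from the list above; the HASH
macro, the simplified scan loop and the suffix_present table (a stand-in for
"bm_suffix[idx] is not NULL") are assumptions for illustration, not the real
'cli_bm_scanbuff()'.

```c
#include <assert.h>
#include <stddef.h>

#define BM_MIN_LENGTH 3
#define BM_BLOCK_SIZE 3

/* Stand-in hash (assumption), used only to index the demo tables. */
#define HASH(a, b, c) (211 * (a) + 37 * (b) + (c))
#define TABLE_SIZE (HASH(255, 255, 255) + 1)

/* Option 2.3: the test is a compile-time constant, so an optimizing
 * compiler can drop the branch that is never taken. */
static inline int cli_can_avoid_bm_shift(void)
{
    return BM_MIN_LENGTH == BM_BLOCK_SIZE;
}

/* Hypothetical, simplified scan loop. It counts the positions in buf whose
 * block hashes to an index flagged in suffix_present. shift may be NULL
 * when cli_can_avoid_bm_shift() is true, since it is then never read. */
static size_t demo_count_candidates(const unsigned char *buf, size_t len,
                                    const int *shift,
                                    const unsigned char *suffix_present)
{
    size_t i, candidates = 0;

    assert(buf != NULL && suffix_present != NULL);
    for (i = BM_MIN_LENGTH - BM_BLOCK_SIZE; i + BM_BLOCK_SIZE <= len; ) {
        int idx = HASH(buf[i], buf[i + 1], buf[i + 2]);
        int step = 0;

        /* The table lookup survives only if the constants differ. */
        if (!cli_can_avoid_bm_shift())
            step = shift[idx];

        if (step == 0) {
            /* Real code would walk the signature list for idx here. */
            if (suffix_present[idx])
                candidates++;
            step = 1;
        }
        i += (size_t)step;
    }
    return candidates;
}
```

If someone later changes one of the two constants, the function returns 0,
the lookup is compiled back in, and the caller has to pass a real shift
table again, so the previous behaviour is kept.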


Thanks in advance for your comments




