Thursday, 17 April 2014

Perf and samples

Hi everyone,

I tried libhubbub to parse HTML in SAX mode, but I noticed performance problems with big files. It takes more than 10 seconds to parse a large file, whereas libxml takes ~1 sec.
According to kcachegrind, 83% of the time is spent in memmove.
The call graph is attached.
As I understand the problem, libparserutils keeps refilling its internal UTF-8 buffer again and again.
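
To illustrate the kind of pattern I suspect, here is a minimal sketch. It is not libparserutils' actual code: the demo_buffer type and all function names are made up for illustration, and I am only assuming the cost comes from shifting the unread bytes to the front of the buffer after each read. It compares that approach with simply advancing a read cursor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type for illustration only. */
typedef struct {
    unsigned char data[4096];
    size_t len;     /* number of valid bytes in data */
    size_t cursor;  /* read offset, used only by the cursor strategy */
} demo_buffer;

/* Strategy A: drop n consumed bytes by shifting the remainder to the
 * front of the buffer. Returns the number of bytes moved. */
size_t consume_by_shift(demo_buffer *b, size_t n)
{
    if (n > b->len)
        n = b->len;
    size_t moved = b->len - n;
    memmove(b->data, b->data + n, moved);
    b->len = moved;
    return moved;
}

/* Strategy B: drop n consumed bytes by advancing a cursor. Nothing is
 * copied; the buffer is reset once it has been fully drained. */
size_t consume_by_cursor(demo_buffer *b, size_t n)
{
    size_t avail = b->len - b->cursor;
    if (n > avail)
        n = avail;
    b->cursor += n;
    if (b->cursor == b->len) {
        b->cursor = 0;
        b->len = 0;
    }
    return 0;
}

/* Drain a buffer holding `total` bytes, `step` bytes at a time, and
 * return how many bytes were moved in total by the chosen strategy. */
size_t bytes_moved_draining(size_t total, size_t step, int use_cursor)
{
    demo_buffer b;
    size_t moved = 0;

    if (step == 0)
        return 0; /* avoid looping forever */
    if (total > sizeof b.data)
        total = sizeof b.data;
    memset(b.data, 'x', sizeof b.data);
    b.len = total;
    b.cursor = 0;

    while (b.len - b.cursor > 0) {
        moved += use_cursor ? consume_by_cursor(&b, step)
                            : consume_by_shift(&b, step);
    }
    return moved;
}
```

With the shifting strategy, draining n bytes one at a time moves about n*n/2 bytes in total, so the cost grows quadratically with the amount of buffered data. The cursor strategy moves nothing. Whether this matches what libparserutils really does is only my guess from the profile.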

Performance is very important to me, so I started my own SAX parser: https://github.com/marmeladema/Saxxy (comments are appreciated). I would also like to know whether you have some kind of test file database for validating libhubbub, so that I can test my own parser against it.

Thanks in advance

adema
