Hi everyone,
I tried using libhubbub to parse HTML in SAX mode, but I noticed some performance problems with big files. Parsing one large file takes more than 10 seconds, whereas libxml takes about 1 second.
According to kcachegrind, 83% of the time is spent in memmove.
The call graph is attached.
As I understand it, the problem is that libparserutils refills its internal UTF-8 buffer over and over again.
Performance is very important to me, so I started writing my own SAX parser: https://github.com/marmeladema/Saxxy (comments are appreciated). However, I would like to know whether you have some kind of test-file database that you use to validate libhubbub, so that I can test my own parser against it.
Thanks in advance
adema