HPar

HPar is a prototype of a data-parallel HTML5 parser. It is based on the popular Java HTML parser Jsoup. With speculative parallelization, HPar can parse even a single HTML file in parallel.
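
To illustrate the general idea of speculative parallelization, here is a toy sketch. It is not HPar's algorithm; the paper below describes the real scheme for HTML. It assumes a hypothetical two-state tokenizer (DATA outside a tag, TAG inside "<...>") that only counts tags. The input is split into chunks that are scanned in parallel, each one guessing that it starts in DATA. A sequential validation pass then propagates the true end state of each chunk to the next one and re-scans only the chunks whose guess was wrong. All names (SpeculativeTagCount, scan, countTags) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy illustration of speculative parallel scanning; NOT HPar's implementation.
public class SpeculativeTagCount {

    // Hypothetical toy tokenizer states: outside a tag (DATA) or inside "<...>" (TAG).
    enum State { DATA, TAG }

    // Outcome of scanning one chunk from a given start state.
    static final class ChunkResult {
        final State start;
        final State end;
        final int tags;

        ChunkResult(State start, State end, int tags) {
            this.start = start;
            this.end = end;
            this.tags = tags;
        }
    }

    // Sequentially scan one chunk, counting tags closed by '>' while in TAG.
    static ChunkResult scan(String chunk, State start) {
        State s = start;
        int tags = 0;
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            if (s == State.DATA && c == '<') {
                s = State.TAG;
            } else if (s == State.TAG && c == '>') {
                s = State.DATA;
                tags++;
            }
        }
        return new ChunkResult(start, s, tags);
    }

    static int countTags(String html, int numThreads) {
        if (html.isEmpty()) return 0;
        int n = Math.max(1, numThreads);
        int size = (html.length() + n - 1) / n; // ceiling division: at most n chunks
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < html.length(); i += size) {
            chunks.add(html.substring(i, Math.min(html.length(), i + size)));
        }

        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            // Speculation phase: scan every chunk in parallel, guessing DATA as its start state.
            List<Future<ChunkResult>> guesses = new ArrayList<>();
            for (String chunk : chunks) {
                guesses.add(pool.submit(() -> scan(chunk, State.DATA)));
            }
            // Validation phase: the true start state of chunk i is the end state of chunk i-1.
            State actual = State.DATA;
            int total = 0;
            for (int i = 0; i < chunks.size(); i++) {
                ChunkResult r = guesses.get(i).get();
                if (r.start != actual) {
                    r = scan(chunks.get(i), actual); // misspeculation: redo this chunk
                }
                total += r.tags;
                actual = r.end;
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countTags("<p>Hello <b>world</b></p>", 4)); // prints 4
    }
}
```

Because a chunk's result depends only on its start state, comparing the guessed and true start states is enough to detect a misspeculation. When guesses are mostly right, most of the work stays parallel.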

Fig. Speedup on a MacBook Pro with a quad-core CPU

Getting Started:

./compile.sh
./run.sh      # output is /test/output.html

How To Use:

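// html: the input HTML; numThreads: the number of threads to parse with
ParallelParser pparser = new ParallelParser(html, numThreads);
Document doc = pparser.parse();  // a Jsoup Document (org.jsoup.nodes.Document)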
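
The README does not say how to choose numThreads. The following is a minimal sketch of one possible policy, assuming you want to use all available cores by default while staying within the 8 threads covered by the included tests. The class and method (ThreadCountChoice, pickThreadCount) are hypothetical and not part of HPar.

```java
// Hypothetical helper, not part of HPar: choose a value for numThreads.
public class ThreadCountChoice {

    // Largest thread count the README reports the included tests were run with.
    static final int MAX_TESTED_THREADS = 8;

    // A positive request is honored; otherwise all available cores are used.
    // The result is clamped to [1, MAX_TESTED_THREADS].
    static int pickThreadCount(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        int n = requested > 0 ? requested : cores;
        return Math.max(1, Math.min(n, MAX_TESTED_THREADS));
    }

    public static void main(String[] args) {
        System.out.println(pickThreadCount(0));
    }
}
```

The result would then be passed as the second constructor argument, e.g. new ParallelParser(html, ThreadCountChoice.pickThreadCount(0)).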

Note: The prototype is still under development. Although it passes the included test set (with up to 8 threads), the current version does not guarantee that the resulting DOM tree is always identical to the one produced by the sequential parser. Contributions that make it more robust are welcome.
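
Because the parallel result may differ from the sequential one, it can help to compare the two directly. The sketch below is a hypothetical differential check (DifferentialCheck and mismatches are made-up names). It models each parser as a function from HTML text to a serialized DOM and reports which thread counts from 1 to maxThreads disagree with the sequential output. With Jsoup, the sequential side could be h -> Jsoup.parse(h).outerHtml(). The parallel side could be t -> h -> new ParallelParser(h, t).parse().outerHtml(), assuming the constructor accepts a String and parse() returns a Jsoup Document. Neither assumption is confirmed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;

// Hypothetical differential check between a sequential and a parallel parser.
public class DifferentialCheck {

    // Returns the thread counts in [1, maxThreads] for which the parallel
    // parser's serialized output differs from the sequential parser's output.
    static List<Integer> mismatches(String html,
                                    Function<String, String> sequential,
                                    IntFunction<Function<String, String>> parallel,
                                    int maxThreads) {
        String expected = sequential.apply(html);
        List<Integer> bad = new ArrayList<>();
        for (int t = 1; t <= maxThreads; t++) {
            String actual = parallel.apply(t).apply(html);
            if (!expected.equals(actual)) {
                bad.add(t);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Stand-in "parsers" for illustration only: both just trim whitespace.
        Function<String, String> seq = String::trim;
        IntFunction<Function<String, String>> par = t -> String::trim;
        System.out.println(mismatches("  <p>hi</p>  ", seq, par, 8)); // prints []
    }
}
```

Running such a check over the files in test/ for thread counts up to 8 mirrors how the note above describes the current validation.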

Publication:

Paper:

HPar: A practical parallel parser for HTML--taming HTML complexities for parallel parsing

Reference:

Zhao, Z., Bebenita, M., Herman, D., Sun, J., & Shen, X. (2013). HPar: A practical parallel parser 
for HTML--taming HTML complexities for parallel parsing. ACM Transactions on Architecture and Code 
Optimization (TACO), 10(4), 44.

@article{zhao2013hpar,
  title={HPar: A practical parallel parser for HTML--taming HTML complexities for parallel parsing},
  author={Zhao, Zhijia and Bebenita, Michael and Herman, Dave and Sun, Jianhua and Shen, Xipeng},
  journal={ACM Transactions on Architecture and Code Optimization (TACO)},
  volume={10},
  number={4},
  pages={44},
  year={2013},
  publisher={ACM}
}
