RTDMTD algorithm

I implemented the algorithm from this paper using Beautiful Soup:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.629&rep=rep1&type=pdf

These are the steps in the algorithm:

  • Given two pages A and B, use the DOM of each page to represent it as a tree.
  • Find the minimal-cost edit script between the two trees. The possible tree edit operations are insertion, deletion, and replacement.
  • The nodes left intact by the minimal-cost edit script are considered template nodes. Build the minimal subtree containing those nodes. This subtree is the template.
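
The steps above can be illustrated with a minimal sketch. It is not the code in this repository and not the exact algorithm from the paper: it is a simplified top-down tree mapping, written under several assumptions. Trees are plain nested tuples `(tag, children)` rather than Beautiful Soup objects, every inserted or deleted node costs 1, and a replacement is modelled as deleting one subtree and inserting the other. The names `size`, `match`, `A`, and `B` are hypothetical and exist only for this example.

```python
from functools import lru_cache

# Assumption: a node is (tag, children), where children is a tuple of nodes.
# With Beautiful Soup, such tuples could be built from each tag's name and
# its child tags; that conversion is left out to keep the sketch stdlib-only.


def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(child) for child in node[1])


@lru_cache(maxsize=None)
def match(a, b):
    """Return (cost, template) for a top-down mapping of tree a onto tree b.

    template is None when the roots differ; otherwise it is the tree of
    nodes that are kept intact in both a and b.
    """
    if a[0] != b[0]:
        # Different tags: remove all of a and insert all of b.
        return size(a) + size(b), None

    ca, cb = a[1], b[1]
    # dp[i][j] = (cost, kept children) for aligning ca[:i] with cb[:j].
    dp = [[None] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    dp[0][0] = (0, ())
    for i in range(1, len(ca) + 1):
        dp[i][0] = (dp[i - 1][0][0] + size(ca[i - 1]), ())
    for j in range(1, len(cb) + 1):
        dp[0][j] = (dp[0][j - 1][0] + size(cb[j - 1]), ())

    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            sub_cost, sub_tpl = match(ca[i - 1], cb[j - 1])
            prev_cost, prev_kept = dp[i - 1][j - 1]
            kept = prev_kept + ((sub_tpl,) if sub_tpl is not None else ())
            options = [
                # Map child i of a onto child j of b.
                (prev_cost + sub_cost, kept),
                # Delete child i of a.
                (dp[i - 1][j][0] + size(ca[i - 1]), dp[i - 1][j][1]),
                # Insert child j of b.
                (dp[i][j - 1][0] + size(cb[j - 1]), dp[i][j - 1][1]),
            ]
            dp[i][j] = min(options, key=lambda option: option[0])

    cost, kept_children = dp[len(ca)][len(cb)]
    return cost, (a[0], kept_children)


# Two toy pages that differ only in the last child of the <div>.
A = ("html", (("body", (("div", (("h1", ()), ("p", ()))), ("footer", ()))),))
B = ("html", (("body", (("div", (("h1", ()), ("ul", ()))), ("footer", ()))),))

cost, template = match(A, B)
print(cost)
print(template)
```

Because this sketch only maps a node when its parent is also mapped, the kept nodes already form a connected subtree, so no separate step is needed to build the minimal subtree around them. A less restricted edit distance would need that extra step.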