CSE455/CIS555 HW2 Sample Data

This page contains some sample data for your second homework assignment. The HTML pages do not contain external links, so you shouldn't have to worry about your crawler “escaping” to the outside web. The XML files do, however, contain links to external URLs, so you'll need to make sure your crawler does not follow links in XML documents.
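One way to enforce the "don't follow links in XML documents" rule is to decide whether to mine a document for links based on its Content-Type. The sketch below is a minimal illustration in Python, assuming the crawler has each response's Content-Type header available. `LinkExtractor` and `extract_links` are hypothetical names, not part of any API provided with the assignment. Your own crawler may be written in a different language and structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in an HTML page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # attrs is a list of (name, value) pairs; value is None for bare attributes.
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(url, content_type, body):
    """Return outgoing links for HTML documents only; XML documents yield no links."""
    # Drop parameters such as "; charset=utf-8" before comparing the MIME type.
    mime = content_type.split(";")[0].strip().lower()
    if mime != "text/html":
        # XML (RSS feeds and other XML data) may be stored, but is never mined for links.
        return []
    parser = LinkExtractor(url)
    parser.feed(body)
    parser.close()
    return parser.links
```

For example, an HTML page at `http://localhost:8080/a/index.html` containing `<a href="b.html">` produces `http://localhost:8080/a/b.html`, while an RSS feed served as `application/rss+xml` produces no links, even if its `<link>` elements point to external sites.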

RSS Feeds

Other XML data

Marie's XML data

Duplicates (for the content-seen test)
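The content-seen test skips a document whose content has already been processed, even when it was reached through a different URL. A common approach is to keep a set of content digests. The sketch below is a minimal Python illustration that assumes duplicates are byte-for-byte identical and uses MD5 only as an example fingerprint. `ContentSeenFilter` and `is_new` are hypothetical names, not a required interface.

```python
import hashlib


class ContentSeenFilter:
    """Remembers digests of document bodies already processed (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, body: bytes) -> bool:
        """Return True the first time a body is seen, False for exact duplicates."""
        # Fingerprint the raw bytes; identical content at different URLs gives the same digest.
        digest = hashlib.md5(body).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

A crawler would call `is_new` after downloading a page and skip both processing and link extraction when it returns `False`. Storing fixed-size digests instead of full bodies keeps the set small. Near-duplicate detection (for example, shingling) would need a different technique.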