It may seem that a class library like this would be readily available: download a web page and all of its assets, then alter the page to reference the downloaded copies. Yet when I needed this functionality for 4teaspoons, I searched high and low, and not just for .NET code but for any code whatsoever, and came up empty. No one else seems to have had to do this, even though it is more or less just creating a cache of a web page. Maybe it's so simple that no one thinks to create a shared library. When that is the case, it's time to contribute the code back to the community.
In my case, I wanted to download a given page and save the HTML file and all of its assets to Amazon S3, so the library comes with an S3 provider. A provider class can be written to persist the assets just about anywhere (a database, MongoDB, the filesystem, and so on); it's really up to the developer what they need. I've taken care of the heavy lifting of parsing the pages and doing the transformation.
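To make the provider idea concrete, here is a rough sketch of the overall flow in Python. It is not the library's actual .NET API: `AssetProvider`, `InMemoryProvider`, `cache_page`, the `fetch` callback and the `cache.example` base URL are all names made up for illustration, and the rewriting step is a deliberately naive string replacement.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetProvider(ABC):
    """Hypothetical storage backend (S3, database, filesystem, ...)."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> str:
        """Persist data under key and return the URL to reference it by."""


class InMemoryProvider(AssetProvider):
    """Stand-in provider that keeps everything in a dict."""

    def __init__(self, base_url="https://cache.example/"):
        self.base_url = base_url
        self.store = {}

    def save(self, key, data):
        self.store[key] = data
        return self.base_url + key


# (tag, attribute) pairs treated as assets in this sketch.
ASSET_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}


class AssetCollector(HTMLParser):
    """Collects asset URLs exactly as they appear in the page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in ASSET_ATTRS and value:
                self.urls.append(value)


def cache_page(page_url, html, fetch, provider):
    """Save every asset and the rewritten page through the provider.

    fetch is an assumed callable mapping an absolute URL to bytes.
    """
    collector = AssetCollector()
    collector.feed(html)
    for raw in dict.fromkeys(collector.urls):  # de-duplicate, keep order
        absolute = urljoin(page_url, raw)
        key = urlparse(absolute).path.lstrip("/")
        new_url = provider.save(key, fetch(absolute))
        # Naive rewrite: only handles double-quoted attribute values.
        html = html.replace(f'"{raw}"', f'"{new_url}"')
    provider.save("index.html", html.encode("utf-8"))
    return html
```

Swapping the storage target then only means writing another `AssetProvider` subclass; the parsing and rewriting code stays untouched. In real use, `fetch` could be a thin wrapper around an HTTP client.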
I hope this contribution proves worthwhile and gets used.
https://code.google.com/p/ontheheap-websucker/


