Comments on futureArch, or the future of archives...: Investigating Terms of Service

Good discussion. The practicality of archiving blo...

2009-10-29T08:41:29.920+00:00

Good discussion. The practicality of archiving blogs and other contributory websites is problematic. One of the few times that permission for me to archive a blog was not granted was when the blog owner felt unable to grant me permission to archive other people's material. This was one of the very few occasions when a site owner was sufficiently aware of rights issues to raise concerns.

I wonder how publishing a blog under a Creative Commons Licence - accepted by all contributors - would affect things?

Cool post! I love that most of 'em pretty much...

2009-10-16T09:03:57.725+01:00

Cool post! I love that most of 'em pretty much say "Yeah, you own everything, but it is also ours - thanks!". While we dream of freedom, it still remains the case that you don't get sommit for nuffin! :-)

Does seem odd to have a bunch of restrictions on the Web pages and yet give everything away via RSS or other mechanical means. Seems like a tangled legal mess that no one is quite sure about, so I'd be inclined to grab the bits worth grabbing, keep them safe somewhere (is that really *using* it?), and wait to see what happens.

To my knowledge no one has (yet) sued Google for indexing their Web content (or indeed resurfacing it out of the Google cache). Archiving a Web page strikes me as being akin to indexing a Web page.

Dunno.

ps. Google, I hereby revoke your non-exclusive, royalty-free, world-wide license to use this comment. Not even read it. If you're an employee of Google, stop reading now. Oh. Damn. Too late... :-)

We need to look more closely at the APIs different...

2009-10-14T17:30:32.697+01:00

We need to look more closely at the APIs different services make available for getting stuff out (and judge whether we care about properties that might be lost in the process).

This post by Matt Asay on moving data into/between/out of cloud services is interesting: http://news.cnet.com/8301-13505_3-10367052-16.html.

This isn't the whole story, though. Some of th...

2009-10-14T16:24:33.500+01:00

This isn't the whole story, though. Some of these services provide access to their underlying data via other means. Twitter is an obvious example, where the API is designed to do exactly the sort of things that the ToS for the web interface prohibit. Automated retrieval of Twitter data via the API is encouraged, albeit rate-limited. For archival purposes, this will almost always be sufficient, unless one is specifically concerned with the appearance of Twitter's web interface as opposed to the content it is delivering.

The same is true in a different way of blogs on WordPress, which are accessible via RSS feeds. That alternative approach is the key behind the ArchivePress project (http://archivepress.ulcc.ac.uk/).
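To make the feed-based route a bit more concrete, here is a rough sketch of pulling posts out of an RSS 2.0 feed rather than scraping the rendered pages. It is only an illustration and not how ArchivePress itself works: the function name `parse_rss_items`, the dictionary keys and the sample feed are all made up for the example, and it assumes the feed carries full post text in `content:encoded` (falling back to `description` when it does not).

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, used for <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_rss_items(rss_xml):
    """Return one dict per <item> in an RSS 2.0 document (hypothetical helper)."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Prefer the full post body over the summary when the feed has one.
        body = item.findtext(CONTENT_NS + "encoded")
        if body is None:
            body = item.findtext("description", default="")
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "body": body,
        })
    return items


# Invented sample feed, standing in for whatever a real blog would serve.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/first-post</link>
      <pubDate>Wed, 14 Oct 2009 16:00:00 +0000</pubDate>
      <description>Short summary</description>
      <content:encoded><![CDATA[<p>Full text of the post.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second-post</link>
      <pubDate>Thu, 15 Oct 2009 09:30:00 +0000</pubDate>
      <description>Summary only</description>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for post in parse_rss_items(SAMPLE_FEED):
        print(post["published"], post["title"], post["link"])
```

Fetching the feed is left out on purpose; the text could come from `urllib.request.urlopen` or from a file saved earlier. What this captures is the content and its metadata, not the look of the blog, which is the same trade-off as with the Twitter API.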