Description
This is a placeholder for a discussion I had with @ikreymer at our first meeting in St Louis. Ilya asked how media files are downloaded from the web, and I told him we came up with our own way of storing the downloaded images on the local filesystem, using the URL that Twitter had assigned to the uploaded file. Ilya asked if we had considered storing the images in a WARC file, which would preserve the data as well as where the data came from.
Currently the images can only come from one place, http://pbs.twimg.com, because we're only looking at images that are uploaded to Twitter. But if we start pulling images from other places such as Instagram, Flickr, etc., it is worth thinking about how recording the data in WARCs could help. I think it will be particularly useful when transferring data out of DocNow and into something else.
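
To make the idea concrete, here is a rough sketch of what recording a downloaded image in a WARC could look like. It hand-writes a single WARC/1.0 `resource` record using only the Python standard library, so the image bytes and the URL they came from end up in the same record. The function name `write_resource_record`, the `media/example.jpg` path, and the placeholder bytes are all made up for illustration; this is not how DocNow stores images today, and in practice a dedicated library such as warcio would probably be a better fit than rolling the format by hand.

```python
import io
import uuid
from datetime import datetime, timezone


def write_resource_record(out, target_uri, payload, content_type):
    """Append one WARC/1.0 'resource' record for `payload` to the binary stream `out`.

    Hypothetical helper for illustration only.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),  # where the data came from
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    out.write(b"WARC/1.0\r\n")
    for name, value in headers:
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(b"\r\n")      # blank line ends the header block
    out.write(payload)      # the image bytes, stored unmodified
    out.write(b"\r\n\r\n")  # two CRLFs terminate the record


# Assumed example values: placeholder bytes stand in for a downloaded image.
payload = b"\xff\xd8\xff\xe0 fake jpeg"
buf = io.BytesIO()
write_resource_record(
    buf, "http://pbs.twimg.com/media/example.jpg", payload, "image/jpeg"
)
warc_bytes = buf.getvalue()
```

Because the source URL lives in the `WARC-Target-URI` header rather than in a filename convention, the same approach would carry over unchanged to images pulled from other hosts, and the resulting file could be handed to other WARC-aware tools.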