warcs for images #35

@edsu


This is a placeholder for a discussion I had with @ikreymer at our first meeting in St Louis. Ilya asked how media files are downloaded from the web, and I told him we came up with our own way of storing the downloaded images on the local filesystem, using the URL that Twitter had assigned to the uploaded file. Ilya asked whether we had considered storing the images in a WARC file, which would preserve the data as well as where the data came from.

Currently the images can only come from one place, http://pbs.twimg.com, because we're only looking at images that are uploaded to Twitter. But if we start pulling images from other places such as Instagram, Flickr, etc., it might be worth thinking about how recording the data in WARCs could help. I think it will be particularly useful when transferring data out of DocNow and into something else.
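
To make the idea concrete, here is a rough sketch of what recording one downloaded image as a WARC `response` record could look like. It uses only the Python standard library and builds the record by hand, so it is an illustration of the format rather than how DocNow does or should do it; in practice a dedicated WARC library would probably be a better fit. The function names, the example image URL, and the file name are all hypothetical.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_response(url, http_headers, body, status_line="HTTP/1.1 200 OK"):
    """Return one WARC/1.0 'response' record as bytes (hypothetical helper)."""
    # The record block is the raw HTTP response: status line, headers,
    # a blank line, then the payload (here, the image bytes).
    http_head = status_line + "\r\n"
    http_head += "".join(f"{k}: {v}\r\n" for k, v in http_headers.items())
    http_head += "\r\n"
    block = http_head.encode("utf-8") + body

    # WARC headers record where the data came from and when it was captured.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    warc_headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {url}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(block)}",
    ]
    head = ("\r\n".join(warc_headers) + "\r\n\r\n").encode("utf-8")

    # Every WARC record is terminated by two CRLFs.
    return head + block + b"\r\n\r\n"


def append_record(warc_path, record):
    """Append a record to a .warc.gz file as its own gzip member."""
    with open(warc_path, "ab") as f:
        f.write(gzip.compress(record))


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image download.
    fake_image = b"\xff\xd8fake-jpeg-bytes"
    record = build_warc_response(
        "http://pbs.twimg.com/media/example.jpg",  # hypothetical URL
        {"Content-Type": "image/jpeg", "Content-Length": len(fake_image)},
        fake_image,
    )
    append_record("images.warc.gz", record)
```

Because the target URI and capture date travel with each record, images from Twitter, Instagram, Flickr, etc. could all sit in the same file without relying on a filesystem naming scheme.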
