oyster/design.txt

Oyster is designed as a 'proactive document cache': it takes a set of URLs and
keeps a local copy of each one up to date according to user-specified criteria.

Data Model
==========

Oyster keeps its data in a MongoDB instance and makes use of GridFS to store the
raw document data.

In addition to the standard GridFS collections (fs.chunks, fs.files), Oyster
uses the following collections:

tracked - metadata for tracked resources
    _id         : internal id
    _random     : a random integer used for sorting
    url         : url of resource
    doc_class   : string indicating the document class, allows for different
                  settings/hooks for some documents
    metadata    : dictionary of extra user-specified attributes
    versions    : list of dictionaries with the following keys:
                      timestamp     : UTC timestamp
                      <storage_key> : storage_id (the key name depends on the
                                      storage backend: s3_url, gridfs_id, etc.)
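
As an illustration of the layout above, here is a minimal sketch that builds a
tracked document as a plain Python dict. The helper names (make_tracked_doc,
add_version), the range of _random, and the sample values are assumptions made
for this example, not part of Oyster itself. _id is left out because MongoDB
assigns one on insert.

```python
import datetime
import random


def make_tracked_doc(url, doc_class="default", metadata=None):
    """Build a dict shaped like a `tracked` document (hypothetical helper)."""
    return {
        # Assumption: any integer range works here; the design only says
        # the value is random and used for sorting.
        "_random": random.randint(0, 2**31 - 1),
        "url": url,
        "doc_class": doc_class,
        "metadata": dict(metadata or {}),
        "versions": [],
    }


def add_version(doc, storage_key, storage_id):
    """Append a version entry keyed by the backend's storage_key."""
    doc["versions"].append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc),
        storage_key: storage_id,
    })


doc = make_tracked_doc("http://example.com/report.pdf",
                       metadata={"source": "example"})
add_version(doc, "gridfs_id", "abc123")
```

Each call to add_version adds one entry to versions, so the list doubles as a
history of fetched copies.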

logs - capped log collection
    action    : log entry
    url       : url that action was related to
    error     : boolean error flag
    timestamp : UTC timestamp for log entry
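
A capped collection has a fixed size and discards its oldest documents once it
is full. The sketch below imitates that behaviour in memory with a bounded
deque so it can run without a MongoDB server. The log() helper and the cap of
three entries are assumptions for the example. A real capped collection is
bounded by size in bytes, optionally also by document count.

```python
import datetime
from collections import deque

MAX_ENTRIES = 3  # hypothetical cap, kept tiny so the eviction is visible

# Stand-in for the capped `logs` collection: the oldest entries fall off.
logs = deque(maxlen=MAX_ENTRIES)


def log(action, url, error=False):
    """Append a log entry with the fields listed for the logs collection."""
    logs.append({
        "action": action,
        "url": url,
        "error": error,
        "timestamp": datetime.datetime.now(datetime.timezone.utc),
    })


for i in range(5):
    log("update", "http://example.com/doc%d" % i)

# Only the three most recent entries remain.
remaining_urls = [entry["url"] for entry in logs]
```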

status - internal state
    update_queue : size of update queue


Storage Interface
=================
    storage_key : name of the key under which a version's storage id is
                  recorded in versions (e.g. gridfs_id)
    put(tracked_doc, data, content_type) -> id
    get(id) -> file-like object
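
To make the interface concrete, here is a minimal sketch of a backend that
keeps documents in memory. The class name MemoryStorage and the key memory_id
are hypothetical and do not correspond to an actual Oyster backend; only the
storage_key / put / get shape comes from the interface above.

```python
import io
import itertools


class MemoryStorage:
    """Toy storage backend that keeps raw document data in a dict."""

    # Key under which ids from this backend would appear in versions.
    storage_key = "memory_id"

    def __init__(self):
        self._files = {}
        self._ids = itertools.count(1)

    def put(self, tracked_doc, data, content_type):
        """Store raw bytes for tracked_doc and return the new storage id."""
        file_id = next(self._ids)
        self._files[file_id] = (data, content_type)
        return file_id

    def get(self, id):
        """Return a file-like object holding the stored bytes."""
        data, _content_type = self._files[id]
        return io.BytesIO(data)


storage = MemoryStorage()
tracked_doc = {"url": "http://example.com/", "versions": []}
file_id = storage.put(tracked_doc, b"<html>hello</html>", "text/html")

# The caller records the returned id under the backend's storage_key.
tracked_doc["versions"].append({storage.storage_key: file_id})
```

Because callers only rely on storage_key, put and get, a backend of this shape
could be swapped for one backed by GridFS or S3 without changing how versions
are recorded.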