oyster/design.txt

41 lines
1.3 KiB
Plaintext

Oyster is designed as a 'proactive document cache', meaning that it takes URLs
and keeps a local copy up to date depending on user-specified criteria.
Data Model
==========
Oyster keeps its data in a MongoDB instance and makes use of GridFS to store the
raw document data.
In addition to the standard gridfs collections (fs.chunks, fs.files) oyster
uses the following collections:
tracked - metadata for tracked resources
_id : internal id
_random : a random integer used for sorting
url : url of resource
doc_class : string indicating the document class, allows for different
settings/hooks for some documents
metadata : dictionary of extra user-specified attributes
versions : list of dictionaries with the following keys:
timestamp : UTC timestamp
<storage_key> : storage_id
(may be s3_url, gridfs_id, etc.)
logs - capped log collection
action : log entry
url : url that action was related to
error : boolean error flag
timestamp : UTC timestamp for log entry
status - internal state
update_queue : size of update queue
Storage Interface
=================
storage_key : key to store on versions
put(tracked_doc, data, content_type) -> id
get(id) -> file type object