## Usage

```
por bkr --recipe examples.fruits.recipe show
por bkr --recipe examples.fruits.recipe reset
por bkr --recipe examples.fruits.recipe run --input words=examples/fruits.csv
```

## Michael's Email

### Data Dictionary

• `website` is supposed to be the homepage of the given agency. As you might know better than most people, that can be a surprisingly imprecise concept (they may only have a Facebook page, someone doing data entry might have picked a site that looked a lot like the official site but isn't really, the data might be out of date).
• `url` is supposed to be the page dedicated to their public records submissions process, or a FOIA portal if it exists. We started breaking out FOIA portals to their own model, so those are mostly not included in this list. (One favorite: the official FOIA page of one agency was a PDF at an IP address.)
• This export unfortunately did not include the current actual contacts (email, address, fax) for public records we have on file -- that's something we can pull separately if needed.

## Part One

### 1A

3,800 URLs listed for agencies' dedicated FOIA pages in `url`:

• Not AI, but is this website still good and available?
• If not, is there a URL it forwards to?

### 1B

    {
      "public_records_email": "email",
      "public_records_address": "str",
      "public_records_phone": "555-555-5555",
      "public_records_fax": "555-555-5555",
      "public_records_web": "url",
      "general_contact_phone": "555-555-5555",
      "general_contact_address": "str",
      "foia_guide": "url",
      "public_reading_room": "url",
      "agency_logo": "url"
    }

The fields that begin with `public_records` should refer to contact information specific to FOIA/Public Information/Freedom of Information requests. The fields that begin with `general_contact` should refer to contact information for the agency in general. If a field is not found in the HTML, leave it as null in the JSON.

### Based on these Questions

• Does this page list a way for public records requests to be submitted via email?
• Ditto, but for mail, fax, or a web portal for submitting records requests?
• Is there a phone number listed to reach out with questions about FOIA requests?
• Is there a general contact phone number listed for this agency?
• Is there a general address listed for this agency?
• Is there a link to a guide for FOIA or public records requesters?
• Is there a link to a public reading room or a place to browse documents or data posted by the agencies or view requests submitted by other people?
• Is there a logo for the agency?

But honestly, anything you might think interesting to poke at would be useful.

## Part Two

Would also be really interested to see how well it could identify public records and FOIA pages for agencies if given the website, since we have a lot more of those, but having it be able to scrape for potentially updated contacts would be huge.

Also, in case it's of interest, we've been working to integrate a tool called Klaxon (http://www.newsklaxon.org) into our portfolio and help it scale up. It monitors web pages for changes in a specified HTML element, so you could say just alert me when there's changes in the or just documents added within a specific div.

One thing I've been thinking about is setting up Klaxon to look for changes to a specified area of a page, then setting up a secondary scraper that's triggered that can do something more heavy duty as warranted.
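
The "watch one area of a page, then trigger a heavier scraper" idea can be sketched in a few lines of standard-library Python. This is not how Klaxon itself works; it is only a minimal illustration of the pattern. The names `ElementText`, `fingerprint`, and `check_for_change` are hypothetical, the watched area is assumed to be an element with an `id` attribute, the HTML is assumed to be reasonably well-formed, and fetching the page is left out entirely.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Optional

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementText(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # 0 means we are outside the watched element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fingerprint(html: str, target_id: str) -> Optional[str]:
    """Hash the whitespace-normalized text of the watched element, or None if absent."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    text = " ".join("".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_change(html: str, target_id: str, previous: Optional[str],
                     on_change: Callable[[str], None]) -> Optional[str]:
    """Call on_change(target_id) when the fingerprint differs from the stored one."""
    current = fingerprint(html, target_id)
    if previous is not None and current != previous:
        on_change(target_id)  # hook for the heavier, secondary scraper
    return current
```

A caller would store the returned fingerprint between runs and pass it back in as `previous`. Edits elsewhere on the page leave the fingerprint unchanged, so the secondary scraper only runs when the watched element's text actually changes.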