Usage
por bkr --recipe examples.fruits.recipe show
por bkr --recipe examples.fruits.recipe reset
por bkr --recipe examples.fruits.recipe run --input words=examples/fruits.csv
Michael's Email
Data Dictionary
• website
is supposed to be the homepage of the given agency. As you might know better than most people, that can be a surprisingly imprecise concept (they may only have a Facebook page, someone doing data entry might have picked a site that looked a lot like the official site but isn't really, or the data might be out of date).
• url
is supposed to be the page dedicated to their public records submission process, or a FOIA portal if one exists. We started breaking out FOIA portals into their own model, so those are mostly not included in this list. (One favorite: the official FOIA page of one agency was a PDF at an IP address.)
• This export unfortunately did not include the current contacts (email, address, fax) we have on file for public records; that's something we can pull separately if needed.
Part One
1A
3,800 URLs listed for agencies' dedicated FOIA pages in url:
• Not AI, but is this website still good and available?
• If not, is there a URL it forwards to?
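The 1A check needs no AI: fetch each listed URL, follow any redirects, and record where it ends up. A minimal sketch using only Python's standard library; the names `UrlCheck` and `check_url` and the User-Agent string are hypothetical, not part of this project.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrlCheck:
    url: str                     # URL as listed in the `url` column
    ok: bool                     # True if the final response was 2xx
    status: Optional[int]        # final HTTP status; None if no response at all
    final_url: Optional[str]     # where any redirects ended up
    error: Optional[str] = None  # short description of a failure

    @property
    def redirected(self) -> bool:
        # A different final URL means the listed URL forwards somewhere else.
        return self.final_url is not None and self.final_url != self.url


def check_url(url: str, timeout: float = 10.0) -> UrlCheck:
    """Fetch `url`, following redirects, and report whether it is still live."""
    try:
        # Building the Request can itself raise ValueError for malformed URLs.
        request = urllib.request.Request(url, headers={"User-Agent": "foiaghost-linkcheck"})
        # urlopen follows standard HTTP redirects by default; geturl() is the final URL.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return UrlCheck(url, 200 <= response.status < 300, response.status, response.geturl())
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return UrlCheck(url, False, exc.code, getattr(exc, "url", None), f"HTTP {exc.code}")
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or malformed URL.
        return UrlCheck(url, False, None, None, str(exc))
```

Note that `redirected` is also true for trivial forwards such as http to https or an added trailing slash; comparing hosts instead would flag only cross-site moves. For 3,800 URLs, mapping `check_url` over the column with `concurrent.futures.ThreadPoolExecutor` keeps the run short.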
1B
{
  "public_records_email": "email",
  "public_records_address": "str",
  "public_records_phone": "555-555-5555",
  "public_records_fax": "555-555-5555",
  "public_records_web": "url",
  "general_contact_phone": "555-555-5555",
  "general_contact_address": "str",
  "foia_guide": "url",
  "public_reading_room": "url",
  "agency_logo": "url"
}
The fields that begin with public_records should refer to contact information specific to FOIA/Public Information/Freedom of Information requests. The fields that begin with general_contact should refer to contact information for the agency in general. If a field is not found in the HTML, leave it as null in the JSON.
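Model output rarely matches a template exactly, so a post-processing pass that enforces the keys above and the null rule is useful. A minimal sketch, assuming US phone numbers should be normalized to the template's 555-555-5555 shape and that relative links (logos especially) should be resolved against the scraped page; `FIELD_KINDS`, `PLACEHOLDERS` and `normalize_contacts` are hypothetical names.

```python
import re
from typing import Any, Optional
from urllib.parse import urljoin, urlparse

# Expected kind of value for each key in the 1B template.
FIELD_KINDS = {
    "public_records_email": "email",
    "public_records_address": "str",
    "public_records_phone": "phone",
    "public_records_fax": "phone",
    "public_records_web": "url",
    "general_contact_phone": "phone",
    "general_contact_address": "str",
    "foia_guide": "url",
    "public_reading_room": "url",
    "agency_logo": "url",
}

# Strings a model might emit instead of a real JSON null (assumed list).
PLACEHOLDERS = {"", "null", "none", "n/a", "not found", "unknown"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _clean(kind: str, value: Any, base_url: Optional[str]) -> Optional[str]:
    if not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS:
        return None
    value = value.strip()
    if kind == "email":
        value = value.removeprefix("mailto:")
        return value if EMAIL_RE.match(value) else None
    if kind == "phone":
        # Normalize US numbers to 555-555-5555; numbers with extensions
        # are rejected by this simple digit count.
        digits = re.sub(r"\D", "", value)
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]
        if len(digits) != 10:
            return None
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    if kind == "url":
        # Resolve relative links against the page the HTML came from.
        if base_url:
            value = urljoin(base_url, value)
        parsed = urlparse(value)
        return value if parsed.scheme in ("http", "https") and parsed.netloc else None
    return value  # free-text fields such as addresses


def normalize_contacts(raw: dict, base_url: Optional[str] = None) -> dict:
    """Return exactly the template's keys, with unusable values set to None."""
    return {field: _clean(kind, raw.get(field), base_url) for field, kind in FIELD_KINDS.items()}
```

Unknown keys in the model's output are dropped, and every template key is always present, so downstream code can rely on a fixed shape.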
Based on these questions:
• Does this page list a way for public records requests to be submitted via email?
• Ibid., but for mail, fax, or a web portal for submitting records requests?
• Is there a phone number listed to reach out with questions about FOIA requests?
• Is there a general contact phone number listed for this agency?
• Is there a general address listed for this agency?
• Is there a link to a guide for FOIA or public records requesters?
• Is there a link to a public reading room or a place to browse documents or data posted by the agency, or to view requests submitted by other people?
• Is there a logo for the agency?
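The questions map onto the 1B template, so one extraction prompt can carry the template plus the instructions. Raw HTML is expensive to send, so this minimal sketch first condenses a page to its visible text while keeping link targets and image sources (needed for the portal, guide, reading room and logo fields). It assumes no particular LLM API; `build_prompt` and `_PageText` are hypothetical names, and the 12,000-character cap is an arbitrary assumption.

```python
import json
from html.parser import HTMLParser

# Template from 1B; the model fills in values or leaves them null.
TEMPLATE = {
    "public_records_email": "email",
    "public_records_address": "str",
    "public_records_phone": "555-555-5555",
    "public_records_fax": "555-555-5555",
    "public_records_web": "url",
    "general_contact_phone": "555-555-5555",
    "general_contact_address": "str",
    "foia_guide": "url",
    "public_reading_room": "url",
    "agency_logo": "url",
}

INSTRUCTIONS = (
    "The fields that begin with public_records refer to contact information specific to "
    "FOIA/Public Information/Freedom of Information requests. The fields that begin with "
    "general_contact refer to contact information for the agency in general. If a field is "
    "not found in the HTML, leave it as null in the JSON. Respond with JSON only."
)


class _PageText(HTMLParser):
    """Collect visible text plus link targets and image sources."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.parts.append(f"[link: {attrs['href']}]")
        elif tag == "img" and attrs.get("src"):
            self.parts.append(f"[img: {attrs['src']} alt={attrs.get('alt') or ''}]")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def build_prompt(html: str, max_chars: int = 12000) -> str:
    parser = _PageText()
    parser.feed(html)
    parser.close()
    page = " ".join(parser.parts)[:max_chars]  # crude cap on prompt size
    return f"{INSTRUCTIONS}\n\nTemplate:\n{json.dumps(TEMPLATE, indent=2)}\n\nPage:\n{page}"
```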
But honestly, anything you might think interesting to poke at would be useful.
Part Two
We would also be really interested to see how well it could identify the public records and FOIA pages for agencies when given only their website, since we have a lot more of those. But having it scrape for potentially updated contacts would be huge.
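Before involving a model, a cheap first pass at finding FOIA pages from a homepage is to score its links by keywords in the anchor text and URL, then hand only the top candidates to the extraction step. A minimal sketch; the keyword list and weights are assumptions to be tuned against agencies whose FOIA page is already known, and `rank_foia_candidates` is a hypothetical name. Off-site links are kept because many agencies use third-party request portals.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical keyword weights; matched against lowercased text and URL.
KEYWORDS = {
    "foia": 5,
    "freedom of information": 5,
    "public records": 4,
    "open records": 4,
    "records request": 4,
    "public information": 3,
    "right to know": 3,
    "sunshine": 2,
    "transparency": 1,
}


class _LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._text).strip()))
            self._href = None


def rank_foia_candidates(html: str, base_url: str, limit: int = 5):
    """Return up to `limit` (absolute_url, score) pairs, best first."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    scores = {}
    for href, text in collector.links:
        url = urljoin(base_url, href)
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        # Treat hyphens and underscores in URLs as spaces so /public-records matches.
        haystack = f"{text} {url}".lower().replace("-", " ").replace("_", " ")
        score = sum(weight for kw, weight in KEYWORDS.items() if kw in haystack)
        if score > 0:
            scores[url] = max(score, scores.get(url, 0))
    return sorted(scores.items(), key=lambda item: -item[1])[:limit]
```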
Also, in case it's of interest, we've been working to integrate a tool called Klaxon (http://www.newsklaxon.org) into our portfolio and help it scale up. It monitors web pages for changes in a specified HTML element, so you could say: just alert me when there are changes in the or just documents added within a specific div. One thing I've been thinking about is setting up Klaxon to look for changes to a specified area of a page, then setting up a secondary scraper, triggered by those alerts, that can do something more heavy-duty as warranted.
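The watch-then-trigger pattern described above can be sketched by fingerprinting only one element of a page and calling a heavier scraper when that fingerprint changes. This is not Klaxon's implementation: it selects the element by `id` rather than a CSS selector, assumes the markup inside that element is well formed, and uses a hypothetical `on_change` callback to stand in for the secondary scraper.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, Optional

# Elements with no closing tag; they must not affect the nesting count.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _ElementText(HTMLParser):
    """Capture text and link targets inside the element with a given id."""

    def __init__(self, element_id: str):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # >0 while inside the target element
        self.found = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if attrs.get("href"):
                self.chunks.append(attrs["href"])  # new documents show up as new links
        elif attrs.get("id") == self.element_id:
            self.depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data.strip())


def fingerprint(html: str, element_id: str) -> Optional[str]:
    """SHA-256 of the watched element's content, or None if it is missing."""
    parser = _ElementText(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    text = " ".join(chunk for chunk in parser.chunks if chunk)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_for_change(url: str, html: str, element_id: str,
                     seen: Dict[str, str], on_change: Callable[[str], None]) -> bool:
    """Record the element's fingerprint; call on_change if it differs from last time."""
    new = fingerprint(html, element_id)
    if new is None:
        return False  # element missing; a real monitor might alert on this too
    old = seen.get(url)
    seen[url] = new
    if old is not None and old != new:
        on_change(url)  # hand off to the heavier secondary scraper
        return True
    return False
```

The first observation of a page only stores a baseline, and changes elsewhere on the page (footers, dates, ads) do not trigger the secondary scraper.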