## Usage
```
por bkr --recipe examples.fruits.recipe show
por bkr --recipe examples.fruits.recipe reset
por bkr --recipe examples.fruits.recipe run --input words=examples/fruits.csv
por bkr --recipe examples.fruits.recipe show
```
## Michael's Email
### Data Dictionary
`website` is supposed to be the homepage of the given agency. As you might know better than most people, that can be a surprisingly imprecise concept (they may only have a Facebook page, someone doing data entry might have picked a site that looked a lot like the official site but isn't really, the data might be out of date).
`url` is supposed to be the page dedicated to their public records submissions process or a FOIA portal if it exists. We started breaking out FOIA portals to their own model so those are mostly not included in this list. (One favorite: The official FOIA page of one agency was a PDF at an IP address.)
• This export unfortunately did not include the current actual contacts (email, address, fax) for public records we have on file -- that's something we can pull separately if needed.
## Part One
### 1A
3,800 URLs listed for agencies' dedicated FOIA pages in `url`:
• Not AI, but is this website still good and available?
• If not, is there a URL it forwards to?
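The two 1A questions can be answered without any AI by fetching each URL and recording the status and the final location after redirects. The following is a minimal sketch using only the Python standard library; the function names `summarize` and `check_url` and the `User-Agent` string are illustrative assumptions, not part of this repository.

```python
import urllib.error
import urllib.request


def summarize(original_url, status, final_url):
    """Turn a raw fetch outcome into the two answers asked for in 1A."""
    available = status is not None and 200 <= status < 400
    # Ignore a trailing slash when deciding whether the URL really moved.
    moved = (
        final_url is not None
        and final_url.rstrip("/") != original_url.rstrip("/")
    )
    return {
        "url": original_url,
        "status": status,
        "available": available,
        "forwards_to": final_url if moved else None,
    }


def check_url(url, timeout=10):
    """Fetch url, following redirects, and report status and final location."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "foia-url-check"}  # hypothetical name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return summarize(url, response.status, response.geturl())
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return summarize(url, err.code, getattr(err, "url", None))
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return summarize(url, None, None)
```

Calling `check_url` on each value of `url` yields one row per agency. Keeping `summarize` separate from the network call makes the classification testable offline.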
### 1B
```
{
  "public_records_email": "email",
  "public_records_address": "str",
  "public_records_phone": "555-555-5555",
  "public_records_fax": "555-555-5555",
  "public_records_web": "url",
  "general_contact_phone": "555-555-5555",
  "general_contact_address": "str",
  "foia_guide": "url",
  "public_reading_room": "url",
  "agency_logo": "url"
}
```
The fields that begin with `public_records` should refer to contact information specific to FOIA/Public Information/Freedom of Information requests.
The fields that begin with `general_contact` should refer to contact information for the agency in general.
If a field is not found in the HTML, leave it as null in the JSON.
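Whatever produces the JSON (a model or a scraper), it may omit keys, add extra ones, or return blank strings. A small post-processing step can enforce the shape described above. This is a sketch under the assumption that the reply arrives as a JSON object in a string; `FIELDS`, `normalize_record`, and `parse_reply` are hypothetical names.

```python
import json

# The ten field names from the 1B schema.
FIELDS = (
    "public_records_email",
    "public_records_address",
    "public_records_phone",
    "public_records_fax",
    "public_records_web",
    "general_contact_phone",
    "general_contact_address",
    "foia_guide",
    "public_reading_room",
    "agency_logo",
)


def normalize_record(raw):
    """Return exactly the 1B fields, with None for anything missing or blank."""
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    record = {}
    for field in FIELDS:
        value = raw.get(field)
        if isinstance(value, str):
            # Treat empty or whitespace-only strings as "not found".
            value = value.strip() or None
        record[field] = value
    return record


def parse_reply(text):
    """Parse a JSON reply and coerce it to the 1B shape."""
    return normalize_record(json.loads(text))
```

Serializing the result with `json.dumps` writes each `None` as `null`, which matches the rule for fields not found in the HTML.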
### Based on these Questions
• Does this page list a way for public records requests to be submitted via email?
• The same question for mail, fax, and a web portal: can records requests be submitted that way?
• Is there a phone number listed to reach out with questions about FOIA requests?
• Is there a general contact phone number listed for this agency?
• Is there a general address listed for this agency?
• Is there a link to a guide for FOIA or public records requesters?
• Is there a link to a public reading room or a place to browse documents or data posted by the agencies or view requests submitted by other people?
• Is there a logo for the agency?
But honestly, anything you might think interesting to poke at would be useful.
## Part Two
Would also be really interested to see how well it could identify public records and FOIA pages for agencies if given the website, since we have a lot more of those, but having it be able to scrape for potentially updated contacts would be huge.
Also, in case it's of interest, we've been working to integrate a tool called Klaxon (http://www.newsklaxon.org) into our portfolio and help it scale up. It monitors web pages for changes in a specified HTML element, so you could say just alert me when there are changes in the `<body>` or just documents added within a specific `div`. One thing I've been thinking about is setting up Klaxon to look for changes to a specified area of a page, then setting up a secondary scraper that's triggered that can do something more heavy duty as warranted.
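
The "watch one area, then trigger a heavier scraper" idea can be sketched as a fingerprint comparison on a single element. This is not Klaxon's code or API; it is a standard-library illustration that assumes the watched element has an `id` attribute and that the page closes its non-void tags. `ElementText`, `fingerprint`, and `changed` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ElementText(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the watched element
        self.done = False   # True once the watched element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fingerprint(html, target_id):
    """Hash the whitespace-normalized text of the watched element."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed(old_html, new_html, target_id):
    """True when the watched element's text differs between two snapshots."""
    return fingerprint(old_html, target_id) != fingerprint(new_html, target_id)
```

When `changed` returns true for a stored snapshot and a fresh fetch, that would be the point to run the secondary scraper. Edits elsewhere on the page, such as a visitor counter, leave the fingerprint untouched.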