## Michael's Email

### Data Dictionary
- `website` is supposed to be the homepage of the given agency. As you might know better than most people, that can be a surprisingly imprecise concept (they may only have a Facebook page, someone doing data entry might have picked a site that looked a lot like the official site but isn't, or the data might be out of date).
- `url` is supposed to be the page dedicated to their public records submissions process, or a FOIA portal if one exists. We started breaking out FOIA portals into their own model, so those are mostly not included in this list. (One favorite: the official FOIA page of one agency was a PDF at an IP address.)
- This export unfortunately did not include the current contacts (email, address, fax) for public records that we have on file; that's something we can pull separately if needed.
## Part One
### 1A
There are 3,800 URLs listed in `url` for agencies' dedicated FOIA pages. For each one:

- Not an AI question, but is this page still good and available?
- If not, is there a URL it forwards to?
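For 1A a plain HTTP check is probably enough. A minimal sketch using only Python's standard library, assuming each server accepts a `HEAD` request and that any final status below 400 counts as "available"; `LinkStatus`, `check_url`, and `classify` are hypothetical names, not part of any existing tool:

```python
import http.client
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkStatus:
    original_url: str
    status: Optional[int]     # final HTTP status, or None if the server was unreachable
    final_url: Optional[str]  # URL after following redirects, when known


def classify(result: LinkStatus) -> str:
    """Label a checked link 'ok', 'redirected', or 'broken'."""
    if result.status is None or result.status >= 400:
        return "broken"
    if result.final_url and result.final_url.rstrip("/") != result.original_url.rstrip("/"):
        return "redirected"
    return "ok"


def check_url(url: str, timeout: float = 10.0) -> LinkStatus:
    """Send a HEAD request; urlopen follows redirects on its own."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return LinkStatus(url, response.status, response.url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (404, 410, 500, ...).
        return LinkStatus(url, err.code, None)
    except (OSError, http.client.HTTPException, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return LinkStatus(url, None, None)
```

Some servers reject `HEAD` with a 405, which this sketch marks as broken; retrying those with a `GET` would avoid false negatives. For a `redirected` result, `final_url` answers the second question.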
### 1B
```json
{
  "public_records_email": "email",
  "public_records_address": "str",
  "public_records_phone": "555-555-5555",
  "public_records_fax": "555-555-5555",
  "public_records_web": "url",
  "general_contact_phone": "555-555-5555",
  "general_contact_address": "str",
  "foia_guide": "url",
  "public_reading_room": "url",
  "agency_logo": "url"
}
```
The fields that begin with `public_records_` should refer to contact information specific to FOIA/Public Information/Freedom of Information requests.

The fields that begin with `general_contact_` should refer to contact information for the agency in general.

If a field is not found in the HTML, leave it as `null` in the JSON.
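A minimal sketch of the "missing means null" rule, assuming whatever does the extraction hands back a Python dict keyed by the field names in the schema; `to_schema` is a hypothetical helper, and treating blank strings as "not found" is an assumption:

```python
import json
from typing import Any, Dict, Optional

# Field names copied from the schema; keys outside this list are dropped.
SCHEMA_FIELDS = (
    "public_records_email",
    "public_records_address",
    "public_records_phone",
    "public_records_fax",
    "public_records_web",
    "general_contact_phone",
    "general_contact_address",
    "foia_guide",
    "public_reading_room",
    "agency_logo",
)


def to_schema(extracted: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Return every schema field in order; anything missing or blank becomes None."""
    record: Dict[str, Optional[str]] = {}
    for field in SCHEMA_FIELDS:
        value = extracted.get(field)
        if isinstance(value, str) and value.strip():
            record[field] = value.strip()
        else:
            record[field] = None  # json.dumps writes None as null
    return record


if __name__ == "__main__":
    found = {"public_records_email": "foia@example.gov", "foia_guide": ""}
    print(json.dumps(to_schema(found), indent=2))
```

Always emitting all ten keys keeps every record the same shape, which makes later comparison against the contacts already on file simpler.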
### Based on these Questions
- Does this page list a way for public records requests to be submitted via email?
- Same question for mail, fax, or a web portal for submitting records requests?
- Is there a phone number listed for questions about FOIA requests?
- Is there a general contact phone number listed for this agency?
- Is there a general address listed for this agency?
- Is there a link to a guide for FOIA or public records requesters?
- Is there a link to a public reading room, or a place to browse documents or data posted by the agency or to view requests submitted by other people?
- Is there a logo for the agency?
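The email, phone, and fax questions can be roughly answered with pattern matching on the page text before any model is involved. A heuristic sketch, assuming US-style ten-digit numbers and that the word "fax" appearing shortly before a number marks it as a fax line; it does not attempt the address, guide, reading-room, or logo questions, and `find_contacts` is a hypothetical name:

```python
import re
from typing import Dict, List

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Ten-digit US-style numbers: (555) 555-5555, 555-555-5555, 555.555.5555, ...
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})\b")


def find_contacts(text: str, window: int = 20) -> Dict[str, List[str]]:
    """Pull emails and phone/fax numbers out of a page's visible text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones: List[str] = []
    faxes: List[str] = []
    previous_end = 0
    for match in PHONE_RE.finditer(text):
        number = "-".join(match.groups())  # normalize to the 555-555-5555 shape
        # Look back at most `window` characters, and never past the previous
        # number, so a "Fax:" label is not attributed to the number after it.
        label = text[max(previous_end, match.start() - window):match.start()].lower()
        bucket = faxes if "fax" in label else phones
        if number not in bucket:
            bucket.append(number)
        previous_end = match.end()
    return {"emails": emails, "phones": phones, "faxes": faxes}
```

This only separates fax from non-fax numbers; deciding whether a phone number is FOIA-specific or a general line would still need surrounding context.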
But honestly, anything you find interesting to poke at would be useful.
## Part Two
I'd also be really interested to see how well it could identify public records and FOIA pages for agencies when given only the `website`, since we have a lot more of those. But being able to scrape for potentially updated contacts would be huge.
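One cheap baseline for this: collect every link on an agency's homepage and rank the ones whose URL or link text uses public-records vocabulary. A sketch with only the standard library; the keyword list and the count-the-matches scoring are assumptions to tune against the existing `url` values, not a tested method, and `candidate_foia_links` is a hypothetical name:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical vocabulary; extend with state-specific terms ("open records act", ...).
KEYWORDS = ("foia", "public records", "open records", "freedom of information",
            "records request", "public information")


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def candidate_foia_links(html: str, base_url: str) -> List[Tuple[int, str, str]]:
    """Return (score, absolute_url, link_text) for links mentioning any keyword, best first."""
    parser = LinkCollector()
    parser.feed(html)
    scored = []
    for href, text in parser.links:
        # Treat "public-records" and "public_records" in URLs like "public records".
        haystack = f"{href} {text}".lower().replace("-", " ").replace("_", " ")
        score = sum(keyword in haystack for keyword in KEYWORDS)
        if score:
            scored.append((score, urljoin(base_url, href), text))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep page order
    return scored
```

The top-ranked links could then go through the 1A availability check and the 1B extraction.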
Also, in case it's of interest, we've been working to integrate a tool called Klaxon (http://www.newsklaxon.org) into our portfolio and help it scale up. It monitors web pages for changes in a specified HTML element, so you could say, for example, alert me only when something changes in the `<body>`, or only when documents are added within a specific `div`. One thing I've been thinking about is setting up Klaxon to watch for changes to a specified area of a page, then having those changes trigger a secondary scraper that can do something more heavy-duty as warranted.
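The watch-then-hand-off pattern can be sketched without Klaxon itself. This is not Klaxon's code or API: it fingerprints only the element with a given `id`, keeps the previous fingerprint in a dict, and calls a hypothetical `on_change` callback (standing in for the heavier scraper) when the fingerprint differs. It assumes reasonably well-formed markup inside the watched element.

```python
import hashlib
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementExtractor(HTMLParser):
    """Record a normalized copy of the first element whose id matches."""

    def __init__(self, element_id: str) -> None:
        super().__init__()
        self.element_id = element_id
        self.found = False
        self.depth = 0  # > 0 while inside the watched element
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if not self.found and attr_map.get("id") == self.element_id:
            self.found = True
            self.depth = 1
            self.parts.append(f"<{tag}>")
        elif self.depth > 0:
            # Keep link/image targets so a newly posted document counts as a change.
            target = attr_map.get("href") or attr_map.get("src") or ""
            self.parts.append(f"<{tag} {target}>")
            if tag not in VOID_TAGS:
                self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def fingerprint(html: str, element_id: str) -> Optional[str]:
    """SHA-256 of the watched element's normalized content, or None if it is absent."""
    extractor = ElementExtractor(element_id)
    extractor.feed(html)
    extractor.close()
    if not extractor.found:
        return None
    return hashlib.sha256("\n".join(extractor.parts).encode("utf-8")).hexdigest()


def check_for_change(url: str, html: str, element_id: str,
                     previous: Dict[str, str],
                     on_change: Callable[[str, str], None]) -> bool:
    """Compare with the stored fingerprint; run on_change (the heavier scraper) on a difference."""
    current = fingerprint(html, element_id)
    if current is None:
        return False  # element missing: keep the old baseline and don't trigger
    old = previous.get(url)
    previous[url] = current
    if old is not None and old != current:
        on_change(url, html)
        return True
    return False  # first sighting only records a baseline
```

Edits elsewhere on the page (a footer date, a rotating banner) leave the fingerprint unchanged, so the heavier scraper, for example the 1B contact extraction, only runs when the watched area actually changes.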