# whsk **whsk** is a command line utility for web scraper authors. It provides a set of utilities for inspecting HTML responses, and applying selectors against them. ## Installation It is recommended you install whsk with `uvx` or `pipx`: `uvx whsk` is the fastest way to get running with `whsk` It currently consists of two utilities: ## whsk shell `whsk shell` fetches a page, automatically parsing HTML, XML, or JSON responses. It then opens an `ipython` shell allowing you to interact with the raw and parsed response. When the command runs it will print a table of the variables it has loaded (which will depend on the type of page and particular flags passed): ``` variables ┌──────────┬───────────────────────┐ │ url │ https://example.com │ │ resp │ │ │ root │ lxml.html.HtmlElement │ │ selector │ //p │ │ selected │ 2 elements │ └──────────┴───────────────────────┘ In [1]: ``` The `In[1]`: is an `ipython` prompt, the variables in the table area available for inspection & usage. ### Options ``` Usage: whsk shell [OPTIONS] URL Launch an interactive Python shell for scraping ╭─ Arguments ──────────────────────────────────────────────────────────────────────────╮ │ * url TEXT URL to scrape [default: None] [required] │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ ╭─ Options ────────────────────────────────────────────────────────────────────────────╮ │ --ua TEXT User agent to make requests with │ │ --postdata -p TEXT POST data (will make a POST instead of GET) │ │ --header -h TEXT Additional headers in format 'Name: Value' │ │ --css -c TEXT css selector │ │ --xpath -x TEXT xpath selector │ │ --help Show this message and exit. │ ╰──────────────────────────────────────────────────────────────────────────────────────╯ ``` ## whsk query `whsk query` takes the same command line options as `whsk shell` but instead of opening a shell will output the results of the `--css` or `--xpath` selection, and then exit immediately. As such, you must provide *one* of the two selector parameters. This can be used for rapid testing of queries without opening the shell each time. ### Options ``` Usage: whsk query [OPTIONS] URL Run a one-off query against the URL ╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────╮ │ * url TEXT URL to scrape [default: None] [required] │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────╮ │ --ua TEXT User agent to make requests with │ │ --postdata -p TEXT POST data (will make a POST instead of GET) │ │ --header -h TEXT Additional headers in format 'Name: Value' │ │ --css -c TEXT css selector │ │ --xpath -x TEXT xpath selector │ │ --help Show this message and exit. │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯ ```