From d417e1c00d5e54c574e0e7d25e3d5f2f5c84d63c Mon Sep 17 00:00:00 2001
From: James Turk
Date: Sun, 26 Jan 2025 11:25:25 -0600
Subject: [PATCH] ua readme

---
 README.md | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 619f6e4..344af25 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # whsk
 
-**whsk** is a command line utility for web scraper authors.
+**whsk** (pronounced "whisk") is a command line utility for web scraper authors.
 
 It provides a set of utilities for inspecting HTML responses, and applying selectors against them.
 
@@ -20,6 +20,23 @@ It then opens an `ipython` shell allowing you to interact with the raw and parse
 When the command runs it will print a table of the variables it has loaded (which will depend on the type of page and particular flags passed):
 
 ```
+$ uvx whsk shell https://example.com
+             variables
+┌──────────┬───────────────────────┐
+│ url      │ https://example.com   │
+│ resp     │                       │
+│ root     │ lxml.html.HtmlElement │
+└──────────┴───────────────────────┘
+
+In [1]:
+```
+
+The `In[1]`: is an `ipython` prompt, the variables in the table area available for inspection & usage.
+
+If you pass a selector from the command line, that first query will be made for you:
+
+```
+$ uvx whsk shell https://example.com --xpath //p
              variables
 ┌──────────┬───────────────────────┐
 │ url      │ https://example.com   │
@@ -32,8 +49,6 @@ When the command runs it will print a table of the variables it has loaded (whic
 In [1]:
 ```
 
-The `In[1]`: is an `ipython` prompt, the variables in the table area available for inspection & usage.
-
 ### Options
 
 ```
@@ -83,3 +98,24 @@ Usage: whsk query [OPTIONS] URL
 ╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
+
+## Common Parameters
+
+### --ua
+
+This parameter is provided as a shortcut to set common browser "User-Agent" headers.
+
+It must be one of:
+
+- linux.chrome
+- linux.firefox
+- mac.chrome
+- mac.firefox
+- mac.safari
+- win.chrome
+- win.edge
+- win.firefox
+
+These will use the values in `user_agents.py`, a relatively recent snapshot of a real user agent for the browser in question.
+
+If you need to set a custom user agent, use `--header 'user-agent: whatever you need'`
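
Reviewer note: the `--ua` section describes a lookup from a short key to a full User-Agent string, with `--header` as the escape hatch for custom values. The sketch below shows one way that resolution could work. It is not taken from whsk's source: the `USER_AGENTS` table, the `resolve_headers` function, the placeholder strings, and the rule that an explicit `--header` wins over `--ua` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the table in user_agents.py.
# These strings are illustrative placeholders, NOT the values whsk ships.
USER_AGENTS: Dict[str, str] = {
    "linux.firefox": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "win.firefox": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
}


def resolve_headers(ua: Optional[str], raw_headers: List[str]) -> Dict[str, str]:
    """Build a header dict from a --ua key and 'name: value' --header strings.

    Assumption: explicit headers are applied last, so a custom
    'user-agent' header overrides the --ua shortcut.
    """
    headers: Dict[str, str] = {}
    if ua is not None:
        if ua not in USER_AGENTS:
            valid = ", ".join(sorted(USER_AGENTS))
            raise ValueError(f"unknown --ua value {ua!r}; expected one of: {valid}")
        headers["user-agent"] = USER_AGENTS[ua]
    for raw in raw_headers:
        # Split on the first colon only, so values containing ':' survive.
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"header {raw!r} is not in 'name: value' form")
        headers[name.strip().lower()] = value.strip()
    return headers
```

If the real precedence between `--ua` and `--header` differs from this assumption, it may be worth stating it in the README, since a user could plausibly pass both.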