ua readme
parent a5da7840b5
commit d417e1c00d
42 README.md
@@ -1,6 +1,6 @@
 # whsk
 
-**whsk** is a command line utility for web scraper authors.
+**whsk** (pronounced "whisk") is a command line utility for web scraper authors.
 
 It provides a set of utilities for inspecting HTML responses, and applying selectors against them.
 
@@ -20,6 +20,23 @@ It then opens an `ipython` shell allowing you to interact with the raw and parse
 When the command runs it will print a table of the variables it has loaded (which will depend on the type of page and particular flags passed):
 
 ```
+$ uvx whsk shell https://example.com
+variables
+┌──────────┬───────────────────────┐
+│ url      │ https://example.com   │
+│ resp     │ <Response [200 OK]>   │
+│ root     │ lxml.html.HtmlElement │
+└──────────┴───────────────────────┘
+
+In [1]:
+```
+
+The `In [1]:` is an `ipython` prompt; the variables in the table are available for inspection & usage.
+
+If you pass a selector from the command line, that first query will be made for you:
+
+```
+$ uvx whsk shell https://example.com --xpath //p
 variables
 ┌──────────┬───────────────────────┐
 │ url      │ https://example.com   │
@@ -32,8 +49,6 @@ When the command runs it will print a table of the variables it has loaded (whic
 In [1]:
 ```
 
-The `In [1]:` is an `ipython` prompt; the variables in the table are available for inspection & usage.
-
 ### Options
 
 ```
@@ -83,3 +98,24 @@ Usage: whsk query [OPTIONS] URL
 ╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
 
 ```
+
+## Common Parameters
+
+### --ua
+
+This parameter is provided as a shortcut to set common browser "User-Agent" headers.
+
+It must be one of:
+
+- linux.chrome
+- linux.firefox
+- mac.chrome
+- mac.firefox
+- mac.safari
+- win.chrome
+- win.edge
+- win.firefox
+
+These will use the values in `user_agents.py`, a relatively recent snapshot of a real user agent for the browser in question.
+
+If you need to set a custom user agent, use `--header 'user-agent: whatever you need'`
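As a reading aid for the `--ua` section added in this commit, here is a minimal Python sketch of how such a shortcut could be resolved into a request header. It is not taken from whsk's source: the names `ALLOWED_UA_KEYS`, `USER_AGENTS`, and `resolve_headers` are hypothetical, the user agent strings are placeholders rather than the values in `user_agents.py`, and the rule that an explicit `--header` overrides `--ua` is an assumption.

```python
# Hypothetical sketch; whsk's real user_agents.py may be organized differently.
ALLOWED_UA_KEYS = (
    "linux.chrome", "linux.firefox",
    "mac.chrome", "mac.firefox", "mac.safari",
    "win.chrome", "win.edge", "win.firefox",
)

# Placeholder values, not real browser user agent strings.
USER_AGENTS = {key: f"<placeholder user agent for {key}>" for key in ALLOWED_UA_KEYS}


def resolve_headers(ua=None, headers=()):
    """Build a header dict from a --ua shortcut and 'name: value' header strings."""
    resolved = {}
    if ua is not None:
        if ua not in USER_AGENTS:
            allowed = ", ".join(ALLOWED_UA_KEYS)
            raise ValueError(f"--ua must be one of: {allowed}")
        resolved["user-agent"] = USER_AGENTS[ua]
    for raw in headers:
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"expected 'name: value', got {raw!r}")
        # Assumption: an explicit header takes precedence over the --ua shortcut.
        resolved[name.strip().lower()] = value.strip()
    return resolved


if __name__ == "__main__":
    print(resolve_headers(ua="mac.firefox"))
    print(resolve_headers(headers=["user-agent: whatever you need"]))
```

The second call mirrors the `--header 'user-agent: whatever you need'` form from the README text; an unknown `--ua` name raises an error listing the eight accepted values.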