mirror of
https://github.com/stashapp/stash.git
synced 2025-12-17 04:14:39 +03:00
Feature: Support inputURL and inputHostname in scrapers (#6250)
@@ -325,10 +325,58 @@ Alternatively, an attribute value may be set to a fixed value, rather than scrap

```yaml
performer:
  Gender:
    fixed: Female
```

### Input URL placeholders

The `{inputURL}` and `{inputHostname}` placeholders can be used in both `fixed` values and `selector` expressions to reference the URL that was used to scrape the content.

#### {inputURL}

The `{inputURL}` placeholder is replaced with the full URL. This is useful when you want to return or reference the source URL as part of the scraped data.

For example:

```yaml
scene:
  URL:
    fixed: "{inputURL}"
  Title:
    selector: //h1[@class="title"]
```

When scraping from `https://example.com/scene/12345`, the `{inputURL}` placeholder will be replaced with `https://example.com/scene/12345`.

#### {inputHostname}

The `{inputHostname}` placeholder is replaced with only the hostname of the URL. This is useful when you need to reference the domain without manually parsing the URL.

For example:

```yaml
scene:
  Studio:
    fixed: "{inputHostname}"
  Details:
    selector: //div[@data-domain="{inputHostname}"]//p[@class="description"]
```

When scraping from `https://example.com/scene/12345`, the `{inputHostname}` placeholder will be replaced with `example.com`.
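
To make the two values concrete, the following Python sketch derives them from a URL and substitutes them into a template string. It is an illustration only, not stash's implementation: the `expand_placeholders` helper is hypothetical, and it assumes that `{inputHostname}` is the host without any port, which is consistent with the example above but not confirmed by it.

```python
from urllib.parse import urlparse


def expand_placeholders(template: str, input_url: str) -> str:
    """Replace {inputURL} and {inputHostname} in a fixed value or selector.

    Hypothetical helper for illustration; not part of stash.
    """
    # Assumption: the hostname excludes any port, as urlparse().hostname does.
    hostname = urlparse(input_url).hostname or ""
    # Substitute the hostname first so that text inside the URL itself
    # is never re-scanned for a hostname placeholder.
    return (
        template
        .replace("{inputHostname}", hostname)
        .replace("{inputURL}", input_url)
    )


url = "https://example.com/scene/12345"
print(expand_placeholders("{inputURL}", url))
print(expand_placeholders("{inputHostname}", url))
```

Plain string replacement is used here because the placeholders are literal tokens; no templating engine is implied.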

Both placeholders can be used within selectors for more advanced use cases:

```yaml
scene:
  Details:
    selector: //div[@data-url="{inputURL}"]//p[@class="description"]
  Site:
    selector: //div[@data-host="{inputHostname}"]//span[@class="site-name"]
```
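
The effect of a placeholder inside a selector can be sketched in Python as follows. This is a hedged illustration rather than stash's behavior: the `resolve_selector` helper and the sample page are hypothetical, the hostname is assumed to exclude any port, and the standard library's `xml.etree.ElementTree` stands in for a full XPath engine (it supports only a subset of XPath and requires a relative path, hence the leading `.`).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical page with one block per site; only one matches the input host.
PAGE = """
<html><body>
  <div data-host="other.example"><span class="site-name">Other Site</span></div>
  <div data-host="example.com"><span class="site-name">Example Site</span></div>
</body></html>
""".strip()


def resolve_selector(selector: str, input_url: str) -> str:
    """Substitute the placeholders into a selector (hypothetical helper)."""
    # Assumption: the hostname excludes any port.
    hostname = urlparse(input_url).hostname or ""
    return selector.replace("{inputHostname}", hostname).replace("{inputURL}", input_url)


selector = '//div[@data-host="{inputHostname}"]//span[@class="site-name"]'
resolved = resolve_selector(selector, "https://example.com/scene/12345")

# ElementTree rejects absolute paths on an element, so make the path relative.
root = ET.fromstring(PAGE)
site_name = root.findtext("." + resolved)

print(resolved)
print(site_name)
```

After substitution the selector contains the literal hostname, so only the `div` whose `data-host` attribute equals it is matched.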

> **Note:** These placeholders represent the actual URL used to fetch the content, after any URL replacements have been applied.

### Common fragments

The `common` field defines selector fragments that can be referenced in selector strings. Each entry is a key-value pair: the key is the string used to reference the fragment, and the value is the string that replaces it. For example: