Commit Graph

17 Commits

Author SHA1 Message Date
SmallCoccinelle
e513b6ffa5 Cache and reuse the scraper HTTP client (#1855)
* Add Cookies directly to the request

Rather than maintaining a cookie jar on a one-shot HTTP client, maintain
the jar ourselves: make a new jar, then use it to select the right
cookies.

The cookies are set on the request rather than on the client. This will
retain the current behavior as we are always throwing the client away
after each use.

This patch also makes it possible to lift the HTTP client out over time.
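
A minimal sketch of the idea above, assuming a hypothetical `siteCookie` type and `newRequestWithCookies` helper (neither name comes from the patch): a fresh jar is filled with the configured cookies, the jar selects the ones that apply to the target URL, and those are set on the request rather than on the client.

```go
package main

import (
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// siteCookie is a hypothetical description of one configured cookie.
type siteCookie struct {
	URL   string // URL the cookie belongs to
	Name  string
	Value string
}

// newRequestWithCookies builds a GET request for target and attaches only
// the cookies that a freshly created jar selects for that URL.
func newRequestWithCookies(target string, cookies []siteCookie) (*http.Request, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}

	// Store every configured cookie in the jar under its own URL.
	for _, c := range cookies {
		u, err := url.Parse(c.URL)
		if err != nil {
			return nil, err
		}
		jar.SetCookies(u, []*http.Cookie{{Name: c.Name, Value: c.Value}})
	}

	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}

	// The jar decides which cookies match; they go on the request itself,
	// so the client that sends it needs no jar of its own.
	for _, c := range jar.Cookies(req.URL) {
		req.AddCookie(c)
	}
	return req, nil
}
```

Because the cookies travel with the request, the same request can be sent through a throwaway client or a shared one without changing behavior.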

* Introduce a cached scraper HTTP client

The scraper cache is augmented with an *http.Client. These are safe for
concurrent use, so the pointer can safely be passed around. Push this
into scraper configurations where applicable, next to the txnManagers.

When we issue a loadUrl request, do so on the cached *http.Client,
which will reuse existing idle connections in the client if any are
present.
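
As a rough illustration of this arrangement, the sketch below uses hypothetical `Cache` and `scraperConfig` types (stand-ins, not the project's real structures) and an assumed 30 second timeout. Every configuration borrows the one client held by the cache, which is safe because `*http.Client` may be used from multiple goroutines.

```go
package main

import (
	"net/http"
	"time"
)

// Cache is a hypothetical stand-in for the scraper cache.
type Cache struct {
	client *http.Client
}

// scraperConfig is a hypothetical per-scraper configuration that borrows
// the client owned by the cache.
type scraperConfig struct {
	name   string
	client *http.Client
}

// NewCache creates the cache together with its single shared client.
// The timeout value is an assumption made for this sketch.
func NewCache() *Cache {
	return &Cache{client: &http.Client{Timeout: 30 * time.Second}}
}

// newScraperConfig hands the shared client to a scraper configuration.
func (c *Cache) newScraperConfig(name string) scraperConfig {
	return scraperConfig{name: name, client: c.client}
}

// loadURL issues the request on the shared client, so idle connections
// left over from earlier requests can be reused.
// The caller is responsible for closing the response body.
func (s scraperConfig) loadURL(url string) (*http.Response, error) {
	return s.client.Get(url)
}
```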

* Set MaxIdleConnsPerHost. Closes #1850

We allow for up to 8 idle connections to a single host. This should
make concurrent operation toward the same host reuse connections, even
for sizeable concurrency.

The number isn't bumped excessively high. We should probably limit
concurrency toward a single site anyway, since we'll be able to overrun
a site with queries quite easily if we have many concurrent goroutines
issuing requests at the same time.
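
For reference, a sketch of how such a limit can be set with the standard library. The constant and function names are hypothetical; `MaxIdleConnsPerHost` is the real `http.Transport` field, and when it is left at zero the standard library falls back to `http.DefaultMaxIdleConnsPerHost`, which is 2.

```go
package main

import "net/http"

// maxIdleConnsPerHost mirrors the limit of 8 described above.
const maxIdleConnsPerHost = 8

// newScraperTransport clones the default transport and raises the number
// of idle connections kept open per host.
func newScraperTransport() *http.Transport {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConnsPerHost = maxIdleConnsPerHost
	return t
}
```

A client would then be built as `&http.Client{Transport: newScraperTransport()}`.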

* Reinstate driverOptions / useCDP check

Use De Morgan's laws to invert the logic and exit early. Fixes the
breaking tests.
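
The rewrite can be pictured roughly as follows. The `driverOptions` struct here is a simplified, hypothetical stand-in with a single field, and both functions are illustrative only; the point is that `!(A && B)` equals `!A || !B`, which turns a nested condition into an early exit.

```go
package main

// driverOptions is a simplified, hypothetical stand-in.
type driverOptions struct {
	useCDP bool
}

// usesCDPNested keeps the positive condition and nests the work inside it.
func usesCDPNested(opts *driverOptions) bool {
	if opts != nil && opts.useCDP {
		return true
	}
	return false
}

// usesCDPEarlyExit negates the condition with De Morgan's laws:
// !(opts != nil && opts.useCDP) == (opts == nil || !opts.useCDP).
func usesCDPEarlyExit(opts *driverOptions) bool {
	if opts == nil || !opts.useCDP {
		return false // exit early when CDP is not requested
	}
	return true
}
```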

* Documentation fixup.

* Use the scraper http.Client when fetching images

Fold image fetchers onto the cached scraper http.Client as well. This
makes the scraper have a single http.Client cache for all its
operations.

Thread the client upwards to the relevant attachment points: either the
cache, or a stash_box instance, which is extended to include a pointer
to the client.

Style roughly follows that of txnManagers.

* Use the same http Client as the GraphQL client uses

Rather than using http.DefaultClient, use the same client as the
GraphQL client uses in the stash_box subsystem. This localizes the
client used in the subsystem into the constructing New.. call.

* Hoist HTTP client construction

Create a function for initializing the HTTP Client we use. While here,
hoist magic numbers into constants. Introduce a proper static redirect
error and use it in the client code as well.
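
One way such a constructor might look is sketched below. All names and both numeric values are assumptions chosen for the example, not taken from the patch. The sentinel error is a package-level value, so callers can compare against it with `errors.Is`.

```go
package main

import (
	"errors"
	"net/http"
	"time"
)

// Values assumed for this sketch; the real constants may differ.
const (
	scrapeTimeout = 30 * time.Second
	maxRedirects  = 20
)

// errMaxRedirects is a static error that callers can test for.
var errMaxRedirects = errors.New("maximum number of redirects exceeded")

// checkRedirectCount reports errMaxRedirects once the number of requests
// already followed reaches the limit.
func checkRedirectCount(followed int) error {
	if followed >= maxRedirects {
		return errMaxRedirects
	}
	return nil
}

// newHTTPClient is the single place where the client is constructed.
func newHTTPClient() *http.Client {
	return &http.Client{
		Timeout: scrapeTimeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return checkRedirectCount(len(via))
		},
	}
}
```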

* Reinstate printCookies

This is a debugging function, and it might still come in handy in the
future at some point.

* Nitpick comment.

* Minor tidy

Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2021-10-20 16:12:24 +11:00
WithoutPants
e9d48683f8 Autotag scraper (#1817)
* Refactor scraper structures
* Move matching code into new package
* Add autotag scraper
* Always check first letter of auto-tag names
* Account for nulls

Co-authored-by: Kermie <kermie@isinthe.house>
2021-10-11 23:06:06 +11:00
SmallCoccinelle
a5ca8fc678 Enable safe linters (#1786)
* Enable safe linters

Enable the linters dogsled, rowserrcheck, and sqlclosecheck.

These report no errors currently in the code base.

Enable misspell.

Misspell finds two spelling mistakes in comments, which are fixed by the
patch as well.

Add and sort linters which are relatively safe to add over time.
Comment them out for now.

* Close the response body

If we can get an HTTP response, it has a body which ought to be closed.

By doing so, we avoid potentially leaking connections.
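
The usual pattern is sketched below. `fetchBody` and the `getter` interface are hypothetical names; `*http.Client` satisfies `getter`. The small stub types exist only so the example can be exercised without a network.

```go
package main

import (
	"io"
	"net/http"
	"strings"
)

// getter is the part of *http.Client this sketch needs.
type getter interface {
	Get(url string) (*http.Response, error)
}

// fetchBody reads the whole response and always closes the body, which
// lets the underlying connection be released or reused.
func fetchBody(client getter, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	return io.ReadAll(resp.Body)
}

// stubGetter answers every request with "ok" and records whether the
// body was closed. It is a demonstration helper only.
type stubGetter struct {
	closed bool
}

// stubBody is the response body handed out by stubGetter.
type stubBody struct {
	io.Reader
	owner *stubGetter
}

// Close marks the owning stub as closed.
func (b stubBody) Close() error {
	b.owner.closed = true
	return nil
}

// Get returns a canned response whose body reports back when closed.
func (s *stubGetter) Get(url string) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       stubBody{Reader: strings.NewReader("ok"), owner: s},
	}, nil
}
```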

* Enable the exportloopref linter

There are two places in the code with these warnings. Fix them while
enabling the linter.
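
The kind of fix this linter asks for typically looks like the sketch below, where the `performer` type and `pointersTo` function are hypothetical. Before Go 1.22, every iteration of a `for ... range` loop shared one variable, so storing its address kept a pointer to that single variable; copying it first gives each pointer its own value.

```go
package main

// performer is a hypothetical element type.
type performer struct {
	name string
}

// pointersTo returns one pointer per element, each to its own copy.
func pointersTo(items []performer) []*performer {
	out := make([]*performer, 0, len(items))
	for _, item := range items {
		item := item // per-iteration copy; required before Go 1.22
		out = append(out, &item)
	}
	return out
}
```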

* Remove redundant types in tests

If a slice already determines the type, the inner type declaration is
redundant. Remove the inner declarations.
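
As an illustration with a hypothetical `testCase` type, the two slices below are equivalent. The second form omits the element type that the slice type already determines, which is the simplification `gofmt -s` applies.

```go
package main

// testCase is a hypothetical table-test entry.
type testCase struct {
	name string
	want bool
}

// verbose repeats the element type on every entry.
var verbose = []testCase{
	testCase{name: "a", want: true},
	testCase{name: "b", want: false},
}

// concise lets the slice type determine the element type.
var concise = []testCase{
	{name: "a", want: true},
	{name: "b", want: false},
}
```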

* Mark autotag test cases as parallel

The autotag test cases are by far the outlier when it comes to test
time. While go test runs packages in parallel, it doesn't run test
cases in parallel inside a given package unless one marks the test
cases as parallel.

This change provides a significant speedup on an 8-core machine for test
runs.
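
A sketch of what marking test cases as parallel looks like, using a hypothetical `nameMatches` function in place of the real autotag code. In a real project the test function would live in a `_test.go` file. Calling `t.Parallel()` in the test and in each subtest allows them to run concurrently within the package.

```go
package main

import (
	"strings"
	"testing"
)

// nameMatches is a hypothetical stand-in for the code under test.
func nameMatches(name, path string) bool {
	return strings.Contains(strings.ToLower(path), strings.ToLower(name))
}

func TestNameMatches(t *testing.T) {
	t.Parallel() // run alongside other parallel tests in this package

	cases := []struct {
		label string
		name  string
		path  string
		want  bool
	}{
		{"match", "Jane Doe", "/media/jane doe - scene.mp4", true},
		{"no match", "Jane Doe", "/media/other.mp4", false},
	}

	for _, tc := range cases {
		tc := tc // per-iteration copy; required before Go 1.22
		t.Run(tc.label, func(t *testing.T) {
			t.Parallel() // subtests also run in parallel with each other
			if got := nameMatches(tc.name, tc.path); got != tc.want {
				t.Errorf("nameMatches(%q, %q) = %v, want %v", tc.name, tc.path, got, tc.want)
			}
		})
	}
}
```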
2021-10-03 11:48:03 +11:00
WithoutPants
4625e1f955 Unify scrape refactor (#1630)
* Unify scraped types
* Make name fields optional
* Unify single scrape queries
* Change UI to use new interfaces
* Add multi scrape interfaces
* Use images instead of image
2021-09-07 11:54:22 +10:00
bnkai
597576f5e6 Get distinct values from scraper (#1338)
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2021-04-29 11:38:55 +10:00
bnkai
aedadc3857 Add lbToKg pp action to the scraper (#1337)
2021-04-26 13:31:25 +10:00
julien0221
d673c4ce03 added details, deathdate, hair color, weight to performers and added details to studios (#1274)
* added details to performers and studios
* added deathdate, hair_color and weight to performers
* Simplify performer/studio create mutations
* Add changelog and recategorised

Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2021-04-16 16:06:35 +10:00
WithoutPants
f6ffda7504 Setup and migration UI refactor (#1190)
* Make config instance-based
* Remove config dependency in paths
* Refactor config init
* Allow startup without database
* Get system status at UI initialise
* Add setup wizard
* Cache and Metadata optional. Database mandatory
* Handle metadata not set during full import/export
* Add links
* Remove config check middleware
* Stash not mandatory
* Panic on missing mandatory config fields
* Redirect setup to main page if setup not required
* Add migration UI
* Remove unused stuff
* Move UI initialisation into App
* Don't create metadata paths on RefreshConfig
* Add folder selector for generated in setup
* Env variable to set and create config file.
Make docker images use a fixed config file.
* Set config file during setup
2021-04-12 09:31:33 +10:00
WithoutPants
a0676d5c30 Performer tags (#1132)
* Add scraping support for performer tags
* Add performer count to tag cards
* Refactor sqlite test setup
* Add performer tag filtering in gallery and image
* Add bulk update performer
* Add Performers tab to tag page
* Add count filters and sort bys for tags
* Move scene count to icon in performer card #1148
2021-03-10 12:25:51 +11:00
WithoutPants
1e04deb3d4 Data layer restructuring (#997)
* Move query builders to sqlite package
* Add transaction system
* Wrap model resolvers in transaction
* Add error return value for StringSliceToIntSlice
* Update/refactor mutation resolvers
* Convert query builders
* Remove unused join types
* Add stash id unit tests
* Use WAL journal mode
2021-01-18 12:23:20 +11:00
WithoutPants
9a84726128 Fix xpath comment element parsing (#759)
2020-08-23 17:39:15 +10:00
woodgen
e3ea3ea85e scraper/mapped: Add feetToCm post process. (#711)
This patch adds a feetToCm post process that converts imperial feet and
inches to centimeters.
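
The conversion itself can be sketched as below. This is an illustrative version, not the patch's code: it assumes the input carries feet as its first run of digits and inches as an optional second run, uses 30.48 cm per foot and 2.54 cm per inch, and rounds to whole centimeters.

```go
package main

import (
	"math"
	"regexp"
	"strconv"
)

// digitsRE finds the runs of digits in a height such as 5'7".
var digitsRE = regexp.MustCompile(`\d+`)

// feetToCm converts feet and inches to whole centimeters. Input without
// any digits is returned unchanged.
func feetToCm(value string) string {
	parts := digitsRE.FindAllString(value, -1)
	if len(parts) == 0 {
		return value
	}

	// The regular expression guarantees digit-only strings, so parse
	// errors are ignored here.
	feet, _ := strconv.ParseFloat(parts[0], 64)
	inches := 0.0
	if len(parts) > 1 {
		inches, _ = strconv.ParseFloat(parts[1], 64)
	}

	cm := feet*30.48 + inches*2.54
	return strconv.Itoa(int(math.Round(cm)))
}
```

For example, 5 feet 7 inches is 152.4 cm + 17.78 cm = 170.18 cm, which rounds to 170.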
2020-08-12 11:17:43 +10:00
WithoutPants
2b9215702e Refactor xpath scraper code. Add fixed and map (#616)
* Refactor xpath scraper code
* Make post-process a list
* Add map post-process action
* Add fixed xpath values
* Refactor scrapers into cache
* Refactor into mapped config
* Trim test html
2020-07-21 14:06:25 +10:00
bnkai
9d0522f62d Add "split" xpath in post-processing, newlines in replace support (#579)
2020-06-18 10:47:10 +10:00
WithoutPants
03c07a429d Add Xpath post processing and performer name query (#333)
* Extend xpath configuration. Support concatenation

* Add parseDate parsing option

* Add regex replacements

* Add xpath query performer by name

* Fix loading spinner on scrape performer

* Change ReplaceAll to Replace
2020-01-31 17:17:40 -05:00
WithoutPants
78eb527ec4 Scraper fixes (#332)
* Fix panic on invalid xpath

* Add missing attrs to scraped performer fragment
2020-01-24 22:36:24 -05:00
WithoutPants
7fdaccf669 Xpath scraping from URL (#285)
* Add xpath performer and scene scraping

* Add studio scraping

* Refactor code

* Fix compile error

* Don't overwrite performer URL during a scrape
2020-01-04 11:39:33 -05:00