I understand that using Playwright in tests is probably the most common use case (it's even in their tagline), but ultimately the introduction section of a lib should be about the lib itself, not a particular scenario of using it with a third-party lib (`pytest`). Especially when it may cause side effects. (I wasn't "bitten" by it, but it sure was confusing: when I was first learning it, I created test_example.py as instructed, in a minefield folder that had a batch of other test_xxxx.py files. Running `pytest` caused all of them to run and gave confusing output. That wasn't obvious to me at all, since I'd never used pytest before and this is not documentation about pytest, so no additional context was given.)
to crawl JavaScript-based sites from Java. I think it was originally intended for integration tests, but it sure works well for web crawlers.
I just wrote a Python-based web crawler this weekend for a small set of sites. It's connected to a bookmark manager (you bookmark a page, it crawls related pages, builds database records, copies images, etc.), and I had a very easy time picking out relevant links, text and images w/ CSS selectors and beautifulsoup. This time I used a database to manage the frontier because the system is interactive (you add a new link and it ought to get crawled quickly). For a long time, though, my habit was writing crawlers that read the frontier for pass N from a text file with one URL per line and then write the frontier for pass N+1 to another text file, because this kind of crawler is not only simple to write but also doesn't get stuck in web traps.
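For anyone curious what the text-file frontier looks like, here is a rough sketch of one pass. It is not my actual crawler: the names `crawl_pass`, `LinkParser` and `fetch` are made up for illustration, and it uses the stdlib `html.parser` instead of beautifulsoup so it runs without extra installs.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Default fetcher: plain HTTP GET, decoded as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_pass(frontier_in, frontier_out, seen, fetch=fetch):
    """Read pass N's frontier (one URL per line), write pass N+1's frontier."""
    lines = Path(frontier_in).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    next_urls = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if not link.startswith(("http://", "https://")):
                continue
            if link not in seen and link not in next_urls:
                next_urls.append(link)
    Path(frontier_out).write_text("".join(u + "\n" for u in next_urls))
    return next_urls
```

You would call this in a loop for a fixed number of passes (pass0.txt to pass1.txt, pass1.txt to pass2.txt, and so on), sharing one `seen` set. As I understand it, the bounded number of passes is what keeps a trap from running forever: an endless chain of generated links can only pull the crawler as deep as the number of passes you chose to run.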
I have a few of these systems that do very heterogeneous processing of mostly scraped content, and I sometimes think about setting up a celery server to break the work up into tasks.
agreed, playwright is great. it even has device emulation profiles built in, so you can for instance use an iphone device with the right screen size/browser/metadata automatically
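A minimal sketch of what that looks like with the Python sync API, assuming Playwright and its browsers are installed. The function name `screenshot_as_device`, the default device name and the output path are just placeholders I picked for illustration.

```python
def screenshot_as_device(url, device_name="iPhone 13", path="shot.png"):
    """Open `url` with a built-in device profile and save a screenshot."""
    # Imported here so this file still loads where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The descriptor is a dict holding viewport, user agent, touch support, etc.
        device = p.devices[device_name]
        browser = p.webkit.launch()
        context = browser.new_context(**device)
        page = context.new_page()
        page.goto(url)
        page.screenshot(path=path)
        browser.close()
```

Calling something like `screenshot_as_device("https://example.com")` should then render the page the way that device profile describes it.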
> tagline