Dude is a very simple framework for writing web scrapers using Python decorators


dude uncomplicated data extraction

Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax.

🚨 Dude is currently in Pre-Alpha. Please expect breaking changes.

Installation

To install, run the following from a terminal:

pip install pydude
playwright install  # Install Playwright browser binaries (Chromium, Firefox and WebKit).

Minimal web scraper

The simplest web scraper will look like this:

from dude import select


@select(css="a")
def get_link(element):
    return {"url": element.get_attribute("href")}

The example above gets all the hyperlink elements in a page and calls the handler function get_link() for each element.

How to run the scraper

You can run your scraper from a terminal by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.

dude scrape --url "<url>" --output data.json path/to/script.py

The output in data.json should contain the scraped URL along with metadata fields, each prefixed with an underscore.

[
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 0,
    "url": "/url-1.html"
  },
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 1,
    "url": "/url-2.html"
  },
  {
    "_page_number": 1,
    "_page_url": "https://dude.ron.sh/",
    "_group_id": 4502003824,
    "_group_index": 0,
    "_element_index": 2,
    "url": "/url-3.html"
  }
]
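Because metadata keys share the underscore prefix, they are easy to separate from the scraped fields after the fact. The sketch below is a post-processing example, not part of Dude: split_record() is a hypothetical helper, and the records are the first two entries of the sample output, embedded as a string so the example runs without a data.json file.

```python
import json
from urllib.parse import urljoin

# First two records of the sample output, embedded so the example is self-contained.
raw = """
[
  {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
   "_group_index": 0, "_element_index": 0, "url": "/url-1.html"},
  {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
   "_group_index": 0, "_element_index": 1, "url": "/url-2.html"}
]
"""


def split_record(record: dict) -> tuple:
    """Return (metadata, data): keys starting with "_" versus everything else."""
    metadata = {key: value for key, value in record.items() if key.startswith("_")}
    data = {key: value for key, value in record.items() if not key.startswith("_")}
    return metadata, data


records = json.loads(raw)
for record in records:
    metadata, data = split_record(record)
    # Resolve the relative link against the page it was scraped from.
    absolute = urljoin(metadata["_page_url"], data["url"])
    print(metadata["_element_index"], absolute)
```

With a real run, json.load() on the opened data.json file would replace the embedded string.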

Changing the output to --output data.csv should write the same records in CSV format.

[Image: data.csv]
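As a rough illustration of how such records flatten into CSV, the sketch below writes two of the sample records with Python's csv module. It does not call Dude, to_csv() is a hypothetical helper, and the column order (JSON key order, metadata first) is an assumption rather than Dude's documented output.

```python
import csv
import io

# Two of the sample records shown in the JSON output.
records = [
    {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
     "_group_index": 0, "_element_index": 0, "url": "/url-1.html"},
    {"_page_number": 1, "_page_url": "https://dude.ron.sh/", "_group_id": 4502003824,
     "_group_index": 0, "_element_index": 1, "url": "/url-2.html"},
]


def to_csv(rows: list) -> str:
    """Serialize a list of flat dicts to CSV text, one column per key of the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(records))
```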

Features

  • Simple Flask-inspired design - build a scraper with decorators.
  • Uses Playwright API - run your scraper in Chrome, Firefox and WebKit and leverage Playwright's powerful selector engine supporting CSS, XPath, text, regex, etc.
  • Data grouping - group related results.
  • URL pattern matching - run functions on matched URLs.
  • Priority - reorder functions based on priority.
  • Setup function - enable setup steps (clicking dialogs or login).
  • Navigate function - enable navigation steps to move to other pages.
  • Custom storage - option to save data to other formats or databases.
  • Async support - write async handlers.
  • Option to use other parser backends aside from Playwright.
  • Option to follow all links indefinitely (Crawler/Spider).
  • Events - attach functions to startup, pre-setup, post-setup and shutdown events.
  • Option to save data on every page.
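Two of the features above, URL pattern matching and priority, can be pictured with a small standalone sketch. Everything in it is hypothetical: handler(), handlers_for(), Rule and RULES are not Dude APIs, and the choice that lower priority numbers run first is an assumption of this sketch, not a statement about Dude's behavior.

```python
import re
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    priority: int
    url_pattern: str
    handler: Callable


# Hypothetical registry filled in by the decorator below.
RULES: List[Rule] = []


def handler(url_pattern: str = ".*", priority: int = 100) -> Callable:
    """Register a function to run only on URLs matching `url_pattern`."""
    def decorator(func: Callable) -> Callable:
        RULES.append(Rule(priority, url_pattern, func))
        return func
    return decorator


def handlers_for(url: str) -> List[Callable]:
    """Return the handlers whose pattern matches `url`, lowest priority number first."""
    matching = [rule for rule in RULES if re.search(rule.url_pattern, url)]
    return [rule.handler for rule in sorted(matching, key=lambda rule: rule.priority)]


@handler(url_pattern=r"/products/", priority=1)
def product_title():
    return "title"


@handler(priority=0)  # matches every URL and is ordered first
def accept_cookies():
    return "cookies"


@handler(url_pattern=r"/blog/")
def blog_post():
    return "post"


names = [func.__name__ for func in handlers_for("https://example.com/products/42")]
print(names)  # → ['accept_cookies', 'product_title']
```

Registration order does not matter here: the pattern filter decides which functions apply to a page, and the priority decides the order in which they run.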

Supported Parser Backends

By default, Dude uses Playwright, but you can also use a parser backend that you are already familiar with, such as BeautifulSoup4, Parsel, lxml, Pyppeteer, or Selenium.

Here is a summary of the features supported by each parser backend.

Parser Backend | Supports Sync? | Supports Async? | CSS Selector | XPath Selector | Text Selector | Regex Selector | Setup Handler | Navigate Handler
Playwright     | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
BeautifulSoup4 | ✅ | ✅ | ✅ | 🚫 | 🚫 | 🚫 | 🚫 | 🚫
Parsel         | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫
lxml           | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | 🚫
Pyppeteer      | 🚫 | ✅ | ✅ | ✅ | ✅ | 🚫 | ✅ | ✅
Selenium       | ✅ | ✅ | ✅ | ✅ | ✅ | 🚫 | ✅ | ✅
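When choosing a backend, it can help to treat the support table above as data. The sketch below transcribes the table into a dictionary and filters it by required features; the FEATURES names and the backends_supporting() helper are made up for this illustration and are not part of Dude.

```python
# Column order of the support table: sync, async, four selector types, two handler types.
FEATURES = ("sync", "async", "css", "xpath", "text", "regex", "setup", "navigate")

# One row per parser backend, transcribed from the table.
SUPPORT = {
    "Playwright":     (True, True, True, True, True, True, True, True),
    "BeautifulSoup4": (True, True, True, False, False, False, False, False),
    "Parsel":         (True, True, True, True, True, True, False, False),
    "lxml":           (True, True, True, True, True, True, False, False),
    "Pyppeteer":      (False, True, True, True, True, False, True, True),
    "Selenium":       (True, True, True, True, True, False, True, True),
}


def backends_supporting(*required: str) -> list:
    """Return the backends that support every feature named in `required`."""
    indexes = [FEATURES.index(name) for name in required]
    return [
        backend
        for backend, flags in SUPPORT.items()
        if all(flags[index] for index in indexes)
    ]


print(backends_supporting("regex"))             # → ['Playwright', 'Parsel', 'lxml']
print(backends_supporting("sync", "navigate"))  # → ['Playwright', 'Selenium']
```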

Documentation

Read the complete documentation at https://roniemartinez.github.io/dude/. All the advanced and useful features are documented there.

Requirements

  • ✅ Any dude should know how to work with selectors (CSS or XPath).
  • ✅ Familiarity with any backends that you love (see Supported Parser Backends)
  • ✅ Python decorators... you'll live, dude!

Why name this project "dude"?

  • ✅ A recursive acronym looks nice.
  • ✅ Adding "uncomplicated" (like ufw) into the name says it is a very simple framework.
  • ✅ Puns! I also think that if you want to do web scraping, there's probably some random dude around the corner who can make it very easy for you to start with it. 😊

Author

Ronie Martinez

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Ronie Martinez

🚧 💻 📖

This project follows the all-contributors specification. Contributions of any kind welcome!

Owner
Ronie Martinez
I am a Python and C/C++ enthusiast working on open-source projects in my free time since 2013.
Comments
  • ⬆️ Bump mypy from 0.931 to 0.940

    Bumps mypy from 0.931 to 0.940.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • ⬆️ Bump pytest from 7.0.1 to 7.1.0

    Bumps pytest from 7.0.1 to 7.1.0.

    Release notes

    Sourced from pytest's releases.

    7.1.0

    pytest 7.1.0 (2022-03-13)

    Breaking Changes

    • #8838: As per our policy, the following features have been deprecated in the 6.X series and are now removed:

      • pytest._fillfuncargs function.
      • pytest_warning_captured hook - use pytest_warning_recorded instead.
      • -k -foobar syntax - use -k 'not foobar' instead.
      • -k foobar: syntax.
      • pytest.collect module - import from pytest directly.

      For more information consult Deprecations and Removals in the docs.

    • #9437: Dropped support for Python 3.6, which reached end-of-life at 2021-12-23.

    Improvements

    • #5192: Fixed test output for some data types where -v would show less information.

      Also, when showing diffs for sequences, -q would produce full diffs instead of the expected diff.

    • #9362: pytest now avoids specialized assert formatting when it is detected that the default __eq__ is overridden in attrs or dataclasses.

    • #9536: When -vv is given on command line, show skipping and xfail reasons in full instead of truncating them to fit the terminal width.

    • #9644: More information about the location of resources that led Python to raise ResourceWarning can now be obtained by enabling tracemalloc.

      See resource-warnings for more information.

    • #9678: More types are now accepted in the ids argument to @pytest.mark.parametrize. Previously only str, float, int and bool were accepted; now bytes, complex, re.Pattern, Enum and anything with a __name__ are also accepted.

    • #9692: pytest.approx now raises a TypeError when given an unordered sequence (such as set).

      Note that this implies that custom classes which only implement __iter__ and __len__ are no longer supported as they don't guarantee order.

    Bug Fixes

    • #8242: The deprecation of raising unittest.SkipTest to skip collection of tests during the pytest collection phase is reverted - this is now a supported feature again.
    • #9493: Symbolic link components are no longer resolved in conftest paths. This means that if a conftest appears twice in collection tree, using symlinks, it will be executed twice.

    ... (truncated)

    Commits
    • 1dbffcc [pre-commit.ci] auto fixes from pre-commit.com hooks
    • d53a5fb Prepare release version 7.1.0
    • d306ec0 Update upcoming trainings (#9744)
    • 3e4c14b Merge pull request #9751 from fabianegli/main
    • 7f924b1 Fix typo in deprecation documentation
    • 4a8f8ad build(deps): Bump django from 4.0.2 to 4.0.3 in /testing/plugins_integration ...
    • c0fd2d8 build(deps): Bump pytest-asyncio from 0.18.1 to 0.18.2 in /testing/plugins_in...
    • 843e018 Merge pull request #9732 from nicoddemus/9730-toml-failure
    • bc43d66 [automated] Update plugin list (#9733)
    • e38d1ca Improve error message for malformed pyproject.toml files
    • Additional commits viewable in compare view

  • ⬆️ Bump selenium-wire from 4.6.2 to 4.6.3

    Bumps selenium-wire from 4.6.2 to 4.6.3.

    Changelog

    Sourced from selenium-wire's changelog.

    4.6.3 (2022-03-13)

    • Fix bug with dynamic switching of upstream proxy.
    Commits
    • 4e9af6a Bump version: 4.6.2 → 4.6.3
    • 734a997 Updates for 4.6.3
    • cd4002f Merge pull request #518 from wkeeling/fix_switch_proxy_bug
    • b9abd1f Clearer to use assert_not_called()
    • 1596012 End client connection when proxy mis-match
    • 64c81c0 Update pre-commit hooks
    • See full diff in compare view

  • ⬆️ Bump mkdocs-material from 8.2.14 to 8.2.15

    Bumps mkdocs-material from 8.2.14 to 8.2.15.

    Release notes

    Sourced from mkdocs-material's releases.

    mkdocs-material-8.2.15

    • Added Uzbek translations
    • Fixed spacing for code block results in content tabs
    Changelog

    Sourced from mkdocs-material's changelog.

    mkdocs-material-8.2.15+insiders-4.15.1 (2022-05-14)

    • Fixed #3531: Added prev/next buttons to content tabs
    • Fixed tooltip positioning when host element is hidden
    • Improved performance of element focus observables

    mkdocs-material-8.2.15 (2022-05-14)

    • Added Uzbek translations
    • Fixed spacing for code block results in content tabs

    mkdocs-material-8.2.14+insiders-4.15.0 (2022-05-08)

    • Added support for improved tooltips
    • Fixed #3785: Show tooltip on hover for overflowing navigation link

    mkdocs-material-8.2.14 (2022-05-08)

    • Fixed missing top right rounded border on admonition
    • Fixed #3886: 4xx status codes not handled when using instant loading

    mkdocs-material-8.2.13+insiders-4.14.0 (2022-05-05)

    • Added Chinese language support to built-in search plugin
    • Fixed all-numeric page titles raising error in social plugin

    mkdocs-material-8.2.13 (2022-05-02)

    • Fixed #3865: Tags index links to tagged pages 404 on Windows
    • Fixed #3866: Bump required Python version from 3.6+ to 3.7+

    mkdocs-material-8.2.12+insiders-4.13.2 (2022-04-30)

    • Improved caching of downloaded resources in privacy plugin
    • Fixed #3851: External images not downloaded by privacy plugin

    mkdocs-material-8.2.12 (2022-04-30)

    • Added support for GitHub-style hash fragments for dark/light images
    • Improved rendering of nested code blocks in content tabs and annotations
    • Fixed #3862: Upgraded to latest Pygments and Python Markdown Extensions

    mkdocs-material-8.2.11+insiders-4.13.1 (2022-04-25)

    • Fixed #3839: Tags plugin breaks without icons (4.13.0 regression)

    mkdocs-material-8.2.11 (2022-04-25)

    • Temporarily pinned Pygments to <2.12
    • Temporarily pinned Python Markdown Extensions to <9.4

    ... (truncated)

    Commits
    • 2b549ff Prepare 8.2.15 release
    • 770b1b9 Fixed custom colors example in documentation
    • ec441cd Added Uzbek translations
    • 33e65f7 Fixed escape sequences on search separators
    • c5c45a1 Documentation
    • 3ee0313 Documentation
    • 7738685 Merge branch 'master' of github.com:squidfunk/mkdocs-material
    • fb248e3 Fixed spacing for result containers in content tabs
    • 7d72a6d Merge pull request #3900 from squidfunk/dependabot/github_actions/docker/buil...
    • 8e8b6b8 Merge pull request #3901 from squidfunk/dependabot/github_actions/docker/logi...
    • Additional commits viewable in compare view

  • ⬆️ Bump playwright from 1.21.0 to 1.22.0

    Bumps playwright from 1.21.0 to 1.22.0.

    Release notes

    Sourced from playwright's releases.

    v1.22.0

    Highlights

    • Role selectors that allow selecting elements by their ARIA role, ARIA attributes and accessible name.

      # Click a button with accessible name "log in"
      page.click("role=button[name='log in']")
      

      Read more in our documentation.

    • New Locator.filter API to filter an existing locator

      buttons = page.locator("role=button")
      # ...
      submit_button = buttons.filter(has_text="Submit")
      submit_button.click()
      
    • Codegen now supports generating Pytest Tests


    Browser Versions

    • Chromium 102.0.5005.40
    • Mozilla Firefox 99.0.1
    • WebKit 15.4

    This version was also tested against the following stable channels:

    • Google Chrome 101
    • Microsoft Edge 101
  • ⬆️ Bump selenium-wire from 4.6.3 to 4.6.4

    Bumps selenium-wire from 4.6.3 to 4.6.4.

    Changelog

    Sourced from selenium-wire's changelog.

    4.6.4 (2022-05-11)

    • Fix bug where setting an interceptor would break websocket requests.
    Commits
    • 6d1e93c Bump version: 4.6.3 → 4.6.4
    • 28d3e71 Updates for 4.6.4
    • 5d37552 Pass through current version
    • 8521b3a Merge pull request #541 from wkeeling/fix_websocket_interception_bug
    • ad1b0a6 Revert wss scheme for onward propagation of request
    • 824764b Move dependency to dev list
    • 96e7759 Merge pull request #537 from wkeeling/readme_basic_auth
    • 75821cd Fix dependency version issue with werkzeug
    • fda89ee Basic authentication example
    • See full diff in compare view

  • ⬆️ Bump mkdocs-material from 8.2.13 to 8.2.14

    Bumps mkdocs-material from 8.2.13 to 8.2.14.

    Release notes

    Sourced from mkdocs-material's releases.

    mkdocs-material-8.2.14

    • Fixed missing top right rounded border on admonition
    • Fixed #3886: 4xx status codes not handled when using instant loading
    Changelog

    Sourced from mkdocs-material's changelog.

    mkdocs-material-8.2.14+insiders-4.15.0 (2022-05-08)

    • Added support for improved tooltips
    • Fixed #3785: Show tooltip on hover for overflowing navigation link

    mkdocs-material-8.2.14 (2022-05-08)

    • Fixed missing top right rounded border on admonition
    • Fixed #3886: 4xx status codes not handled when using instant loading

    mkdocs-material-8.2.13+insiders-4.14.0 (2022-05-05)

    • Added Chinese language support to built-in search plugin
    • Fixed all-numeric page titles raising error in social plugin

    mkdocs-material-8.2.13 (2022-05-02)

    • Fixed #3865: Tags index links to tagged pages 404 on Windows
    • Fixed #3866: Bump required Python version from 3.6+ to 3.7+

    mkdocs-material-8.2.12+insiders-4.13.2 (2022-04-30)

    • Improved caching of downloaded resources in privacy plugin
    • Fixed #3851: External images not downloaded by privacy plugin

    mkdocs-material-8.2.12 (2022-04-30)

    • Added support for GitHub-style hash fragments for dark/light images
    • Improved rendering of nested code blocks in content tabs and annotations
    • Fixed #3862: Upgraded to latest Pygments and Python Markdown Extensions

    mkdocs-material-8.2.11+insiders-4.13.1 (2022-04-25)

    • Fixed #3839: Tags plugin breaks without icons (4.13.0 regression)

    mkdocs-material-8.2.11 (2022-04-25)

    • Temporarily pinned Pygments to <2.12
    • Temporarily pinned Python Markdown Extensions to <9.4
    • Improved rendering of code annotation markers

    mkdocs-material-8.2.10+insiders-4.13.0 (2022-04-24)

    • Added support for tag icons

    mkdocs-material-8.2.10 (2022-04-24)

    • Added Macedonian translations
    • Updated Mermaid.js to version 9.0.1
    • Switched sidebar title in mobile navigation to bold font

    ... (truncated)

  • ⬆️ Bump mkdocs-material from 8.2.9 to 8.2.11

    Bumps mkdocs-material from 8.2.9 to 8.2.11.

    Release notes

    Sourced from mkdocs-material's releases.

    mkdocs-material-8.2.11

    • Temporarily pinned Pygments to <2.12
    • Temporarily pinned Python Markdown Extensions to <9.4
    • Improved rendering of code annotation markers

    mkdocs-material-8.2.10

    • Added Macedonian translations
    • Updated Mermaid.js to version 9.0.1
    • Switched sidebar title in mobile navigation to bold font
    • Fixed color of arrows in class and state diagrams for dark mode
    • Fixed #3836: Inline admonitions overlayed by code block titles
    Changelog

    Sourced from mkdocs-material's changelog.

    mkdocs-material-8.2.11+insiders-4.13.1 (2022-04-25)

    • Fixed #3839: Tags plugin breaks without icons (4.13.0 regression)

    mkdocs-material-8.2.11 (2022-04-25)

    • Temporarily pinned Pygments to <2.12
    • Temporarily pinned Python Markdown Extensions to <9.4
    • Improved rendering of code annotation markers

    mkdocs-material-8.2.10+insiders-4.13.0 (2022-04-24)

    • Added support for tag icons

    mkdocs-material-8.2.10 (2022-04-24)

    • Added Macedonian translations
    • Updated Mermaid.js to version 9.0.1
    • Switched sidebar title in mobile navigation to bold font
    • Fixed color of arrows in class and state diagrams for dark mode
    • Fixed #3836: Inline admonitions overlayed by code block titles

    mkdocs-material-8.2.9 (2022-04-08)

    • Mitigate flicker on color palette switch by disabling all transitions
    • Fixed search suggestions not triggered when following deep link
    • Fixed incorrectly computed header height when using instant loading
    • Fixed #3782: Admonition titles have extra pixels on wide screens in Firefox
    • Fixed #3802: Always render table of contents container (except when hidden)

    mkdocs-material-8.2.8+insiders-4.12.0 (2022-03-27)

    • Added support for card grids and grid layouts
    • Fixed #3685: Annotations sometimes broken when using instant loading
    • Fixed #3742: Automatically add Mermaid.js when building for offline usage

    mkdocs-material-8.2.8 (2022-03-27)

    • Bumped MkDocs version to 1.3.0 to mitigate breaking changes in Jinja
    • Reverted Jinja version range limitation (added in 8.2.7)
    • Improved styling of annotations and fixed borders of code blocks in tabs
    • Added background color to code blocks in focused/hovered links
    • Added check in tags plugin whether tags overview page exists
    • Fixed #3744: Content tab indicator on wrong position when using back button

    mkdocs-material-8.2.7 (2022-03-24)

    • Temporarily limit Jinja version range to < 3.1 due to breaking changes

    mkdocs-material-8.2.6 (2022-03-23)

    ... (truncated)

  • ⬆️ Bump playwright from 1.20.1 to 1.21.0

    Bumps playwright from 1.20.1 to 1.21.0.

    Release notes

    Sourced from playwright's releases.

    v1.21.0

    Highlights

    • New experimental role selectors that allow selecting elements by their ARIA role, ARIA attributes and accessible name.

      # Click a button with accessible name "log in"
      page.click("role=button[name='log in']")
      

      To use role selectors, make sure to pass PLAYWRIGHT_EXPERIMENTAL_FEATURES=1 environment variable.

      Read more in our documentation.

    • New scale option in Page.screenshot for smaller sized screenshots.

    • New caret option in Page.screenshot to control text caret. Defaults to "hide".

    Behavior Changes

    • The mcr.microsoft.com/playwright docker image no longer contains Python. Please use mcr.microsoft.com/playwright/python as a Playwright-ready docker image with pre-installed Python.
    • Playwright now supports large file uploads (100s of MBs) via Locator.set_input_files API.

    Browser Versions

    • Chromium 101.0.4951.26
    • Mozilla Firefox 98.0.2
    • WebKit 15.4

    This version was also tested against the following stable channels:

    • Google Chrome 100
    • Microsoft Edge 100
  • ⬆️ Bump pyproject-flake8 from 0.0.1a3 to 0.0.1a4

    Bumps pyproject-flake8 from 0.0.1a3 to 0.0.1a4.

  • ⬆️ Bump beautifulsoup4 from 4.11.0 to 4.11.1

    Bumps beautifulsoup4 from 4.11.0 to 4.11.1.

  • Implement Selenium proxy

    Proxy support is not yet implemented in the Selenium scraper. See:

    https://github.com/roniemartinez/dude/blob/169181386063a90d83d3b0b985f92a2c47a1d28c/dude/optional/selenium_scraper.py#L142

    https://github.com/roniemartinez/dude/blob/169181386063a90d83d3b0b985f92a2c47a1d28c/dude/optional/selenium_scraper.py#L184

  • Option to download/save files by extension

    1. Download by file extension
    2. Download by MIME type, e.g. png should also match the image/png MIME type
    dude scrape ... --download png,jpg  # download all png and jpg files
    dude scrape ... --download *  # download all files
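
The two matching rules proposed above could be implemented along these lines. This is a sketch only: `--download` is a feature request rather than an existing option, and `parse_download_option` and `should_download` are hypothetical names, not part of Dude. It relies on the standard-library `mimetypes` module to map an extension such as png to image/png.

```python
import mimetypes
import posixpath
from typing import Optional, Set
from urllib.parse import urlparse


def parse_download_option(value: str) -> Set[str]:
    """Turn a value such as 'png,jpg' into {'png', 'jpg'}."""
    return {
        part.strip().lower().lstrip(".")
        for part in value.split(",")
        if part.strip()
    }


def should_download(url: str, content_type: Optional[str], wanted: Set[str]) -> bool:
    """Decide whether a response matches the requested extensions."""
    if "*" in wanted:
        return True

    # Rule 1: match by the file extension in the URL path.
    extension = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    if extension in wanted:
        return True

    # Rule 2: match by MIME type, so "png" also matches "image/png".
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        for ext in wanted:
            guessed, _ = mimetypes.guess_type(f"file.{ext}")
            if guessed == mime:
                return True

    return False
```

For example, `should_download("https://example.com/image?id=1", "image/png", {"png"})` matches through the MIME type even though the URL has no extension. In most shells an unquoted `*` is expanded to file names before the program sees it, so a real command line would probably need `--download '*'`.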
    
Webservice wrapper for hhursev/recipe-scrapers (python library to scrape recipes from websites)

recipe-scrapers-webservice This is a wrapper for hhursev/recipe-scrapers which provides the api as a webservice, to be consumed as a microservice by o

Jan 17, 2022
A simple django-rest-framework api using web scraping

Apicell You can use this API to search Google, Bing, PyPI and Subscene and get results. Method: POST. Parameter: query. Example: import request url =

Dec 19, 2021
A Very simple free proxy list scraper.

Scrappp A very simple free proxy list scraper, made in Python. The tool scrapes proxies from different sites and APIs. Screenshots About the script !!! RE

May 19, 2022
robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser.

RoboBrowser: Your friendly neighborhood web scraper Homepage: http://robobrowser.readthedocs.org/ RoboBrowser is a simple, Pythonic library for browsi

May 20, 2022
This is a web scraper, using Python framework Scrapy, built to extract data from the Deals of the Day section on Mercado Livre website.

Deals of the Day This is a web scraper, using the Python framework Scrapy, built to extract data such as price and product name from the Deals of the

Jan 12, 2022
Amazon web scraping using Scrapy Framework

Amazon-web-scraping-using-Scrapy-Framework Scrapy Scrapy is an application framework for crawling web sites and extracting structured data which can b

Jan 25, 2022
Works very well and you can ask for the type of image you want the scraper to collect.

Works very well, and you can ask for the type of image you want the scraper to collect. It also follows a specific URL path depending on keyword selection.

Feb 17, 2022
Scrapy, a fast high-level web crawling & scraping framework for Python.

Scrapy Overview Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pag

May 20, 2022
Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia Async Python 3.6+ web scraping micro-framework based on asyncio. ⚑ Write less, run faster. Overview Ruia is an async web scraping micro-frame

May 23, 2022
Transistor, a Python web scraping framework for intelligent use cases.

Web data collection and storage for intelligent use cases. transistor About The web is full of data. Transistor is a web scraping framework for collec

May 10, 2022
Simple web scraper bot to scrape web pages using Requests, html5lib and Beautifulsoup.

WebScrapperRoBot Simple web scraper bot to scrape web pages using Requests, html5lib and Beautifulsoup. Mark your Star ⭐ ⭐ What is Web Scraping ? Web s

May 9, 2022
Web Scraping Framework

Grab Framework Documentation Installation $ pip install -U grab See details about installing Grab on different platforms here http://docs.grablib.

May 15, 2022
Web crawling framework based on asyncio.

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Requirements Python3.5+ Installation pip install gain pip install uvloo

May 17, 2022
A simple python web scraper.

Dissec A simple python web scraper. It gets a website and its contents and parses them with the help of bs4. Installation To install the requirements,

May 6, 2022
✂️🕷️ Spider-Cut is a Network Mapper Framework (NMAP Framework)

Spider-Cut is a Network Mapper Framework (NMAP Framework) Installation | Usage | Creators | Donate Installation # Kali Linux | WSL

Mar 7, 2022
🥫 The simple, fast, and modern web scraping library

About gazpacho is a simple, fast, and modern web scraping library. The library is stable, actively maintained, and installed with zero dependencies. I

May 17, 2022
A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com

TDTV2-Direct Version 1.00.1 • A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com :) How to Works?? install all dependencies v

Nov 28, 2021
Simple library for exploring/scraping the web or testing a website you’re developing

Robox is a simple library with a clean interface for exploring/scraping the web or testing a website you’re developing. Robox can fetch a page, click on links and buttons, and fill out and submit forms.

May 19, 2022
Using Selenium with Python to Web Scrap Popular Youtube Tech Channels.

Web Scrapping Popular Youtube Tech Channels with Selenium Data Mining, Data Wrangling, and Exploratory Data Analysis About the Data Web scrapi

Aug 18, 2021