Every web site provides APIs.

Toapi

Overview

Toapi gives you the ability to make any web site provide an API.

Version 2.0.0 is a complete rewrite: more elegant, more Pythonic.

Features

  • Automatically converts an HTML web site into an API service.
  • Automatically caches every page of the source site.
  • Automatically caches every request.
  • Supports merging multiple web sites into one API service.

Get Started

Installation

$ pip install toapi
$ toapi -v
toapi, version 2.0.0

Usage

Create app.py and copy in the following code:

from flask import request
from htmlparsing import Attr, Text
from toapi import Api, Item

api = Api()


@api.site('https://news.ycombinator.com')
@api.list('.athing')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Post(Item):
    url = Attr('.storylink', 'href')
    title = Text('.storylink')


@api.site('https://news.ycombinator.com')
@api.route('/posts?page={page}', '/news?p={page}')
@api.route('/posts', '/news?p=1')
class Page(Item):
    next_page = Attr('.morelink', 'href')

    def clean_next_page(self, value):
        return api.convert_string('/' + value, '/news?p={page}', request.host_url.strip('/') + '/posts?page={page}')


api.run(debug=True, host='0.0.0.0', port=5000)
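The clean_next_page hook above rewrites the source site's "next page" link into the equivalent local API URL. Purely as an illustration (this is not toapi's actual implementation of convert_string), the conversion can be sketched like this:

```python
# Hedged sketch of the pattern conversion used above: extract the {page}
# placeholder's value from the source path and substitute it into the
# target pattern. Illustrative only; toapi's convert_string may differ.
def convert_string(value, source_pattern, target_pattern):
    prefix, suffix = source_pattern.split('{page}')
    if value.startswith(prefix) and value.endswith(suffix):
        page = value[len(prefix):len(value) - len(suffix)]
        return target_pattern.replace('{page}', page)
    return value  # no match: pass the value through unchanged

print(convert_string('/news?p=2', '/news?p={page}',
                     'http://127.0.0.1:5000/posts?page={page}'))
# http://127.0.0.1:5000/posts?page=2
```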

Run python app.py.

Then open your browser and visit http://127.0.0.1:5000/posts?page=1

You will get a result like:

{
  "Page": {
    "next_page": "http://127.0.0.1:5000/posts?page=2"
  }, 
  "Post": [
    {
      "title": "Mathematicians Crack the Cursed Curve", 
      "url": "https://www.quantamagazine.org/mathematicians-crack-the-cursed-curve-20171207/"
    }, 
    {
      "title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord", 
      "url": "https://jalopnik.com/this-glorious-madman-stuffed-a-p85-tesla-drivetrain-int-1823461909"
    }
  ]
}

Todo

  1. Visualization: create a toapi project in a web page by drag and drop.

Contributing

Write code, add tests, and submit a pull request.

Comments
  • A question about routes for resource paths

    Currently, the link for fetching a resource generally looks like https://yoursite.com/https://targetsite.com/resource/path/

    This causes two problems:

    • It is ugly: the resource request path is too long.
    • It directly exposes the source site.

    A suggestion:

    Could an alias be added to Meta as a substitute or identifier for the source site's base_url, inserted into the route as the first path segment, e.g. https://yoursite.com/<alias>/resource/path/? This would satisfy the need to distinguish multiple sites while solving both problems above.

    The example in the official repository (see the source) uses Flask's routing to implement custom routes. With multiple sites and multiple request paths, there would be one copy of the routes in the items and another copy here, which feels rather mechanical.

    Thanks.
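    The alias idea above can be sketched in a few lines. The names (ALIASES, resolve) are hypothetical and not part of toapi:

    ```python
    # Hypothetical alias table mapping a short first path segment to a
    # source site's base_url, as proposed above.
    ALIASES = {'hn': 'https://news.ycombinator.com'}

    def resolve(path):
        alias, _, rest = path.lstrip('/').partition('/')
        return ALIASES[alias] + '/' + rest

    print(resolve('/hn/news?p=1'))  # https://news.ycombinator.com/news?p=1
    ```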

  • Any way to send HTTP POST requests?

    In working with toapi I came across a scenario where the web page had an HTML table that was paginated.

    Clicking on "next page" would issue an ajax POST request to fetch the next set of records in the data set.

    Is there any way to accomplish this with toapi?
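    toapi does not document a way to send POST requests, so one hedged workaround is to replay the page's ajax POST yourself outside toapi. The endpoint and payload below are placeholders for whatever the browser's network tab shows for the "next page" request:

    ```python
    # Hedged workaround, not a toapi feature: build the ajax POST request
    # directly. 'https://example.com/table/data' and {'page': ...} are
    # placeholders for the real endpoint and form payload.
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen

    def build_next_page_request(page):
        data = urlencode({'page': page}).encode()  # form-encoded body
        return Request('https://example.com/table/data', data=data, method='POST')

    req = build_next_page_request(2)
    print(req.method, req.data)  # POST b'page=2'
    # urlopen(req) would actually send it (network access required)
    ```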

  • Fetching pages via POST, and writing items

    1. What if the page I want to parse can only be obtained via a POST request?

    Does toapi provide a way to do this? I see there is an ajax=true parameter when defining settings, but where should the data for the ajax request be defined? I have searched the docs and the issues and found nothing.

    2. On writing items

    The built-in XPath method seems to return a processed value rather than an etree element, so if I want, for example, all the text under an h1 (including child tags), I cannot use the string(.) method and have to write an extra clean_xx method.

    Also, could support for bs4 be added?

    Finally, I hope the project keeps getting better. It is really great!

  • Error when running toapi run

    Python 3.5, toapi 0.2.2

    toapi new api
    cd api
    toapi run
    

    Running toapi run raises the following error:

    ➜  api toapi run
    Traceback (most recent call last):
      File "/usr/local/bin/toapi", line 9, in <module>
        load_entry_point('toapi==0.2.2', 'console_scripts', 'toapi')()
      File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1066, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/toapi/cli.py", line 81, in run
        app = importlib.import_module('app', base_path)
      File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 986, in _gcd_import
      File "<frozen importlib._bootstrap>", line 969, in _find_and_load
      File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 665, in exec_module
      File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
      File "/home/zz/code/python/toapi_project/api/app.py", line 7, in <module>
        api.register(Page)
      File "/usr/local/lib/python3.5/dist-packages/toapi/api.py", line 31, in register
        item.__pattern__ = re.compile(item.__base_url__ + item.Meta.route)
    TypeError: Can't convert 'dict' object to str implicitly
    
    

    item.__base_url__ is a str, but item.Meta.route is a dict.

  • Cache TTL clarification

    Good evening,

    I have a question regarding the cache and its time to live.

    Let's say I want to turn some site into an API and want the results of the very first request to be cached for one hour. How would I specify that in the settings? Is such a setup even possible?

    I tried setting ttl: 60 * 60, assuming that would do the trick, but it does not seem to work.

    Could you please clarify?

    Thanks in advance.
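    For reference, the behaviour being asked about, a cached result that expires after a fixed number of seconds, can be sketched in plain Python. This illustrates the concept only; it is not toapi's cache implementation:

    ```python
    import time

    class TTLCache:
        """Minimal time-to-live cache: a fetched result is reused until
        `ttl` seconds have passed, then it is fetched again."""
        def __init__(self, ttl, clock=time.monotonic):
            self.ttl = ttl
            self.clock = clock
            self._store = {}

        def get(self, key, fetch):
            now = self.clock()
            entry = self._store.get(key)
            if entry is not None and now - entry[1] < self.ttl:
                return entry[0]          # still fresh: serve cached value
            value = fetch()              # expired or missing: refetch
            self._store[key] = (value, now)
            return value

    # Simulated clock so the example is deterministic
    t = [0.0]
    cache = TTLCache(ttl=3600, clock=lambda: t[0])
    calls = []
    cache.get('/posts', lambda: calls.append(1) or 'page1')
    cache.get('/posts', lambda: calls.append(1) or 'page1')  # cached, no refetch
    t[0] = 3601.0
    cache.get('/posts', lambda: calls.append(1) or 'page1')  # expired, refetched
    print(len(calls))  # 2
    ```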

  • Production Deployment Instructions

    Hello, I am relatively new to Python web development. While I am mainly working on a mobile app, I found toapi to be a perfect companion for my backend requirements. I am now almost ready to launch my app, but I am struggling to find a good production hosting environment for the toapi server code. I am mainly looking at Heroku, AWS, or Google App Engine for hosting.

    I was wondering if you could provide some instructions for deploying to a production-quality server. I did go over the deploy link but still cannot connect its content to the actual toapi codebase.

    Any advice on how I can move forward with this?

    Thank you again,
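    One hedged sketch of a production setup, assuming the Api instance in app.py exposes its underlying Flask app as api.app (the tracebacks in other issues here suggest it does, but this is not a documented attribute), is a small WSGI entry point served by gunicorn:

    ```python
    # wsgi.py -- hypothetical production entry point. `api.app` is an
    # assumption about toapi's internals, not a documented attribute.
    from app import api

    application = api.app  # the WSGI callable gunicorn will serve

    # Run with, for example:
    #   gunicorn --bind 0.0.0.0:8000 wsgi:application
    ```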

  • Flask logging error

    Python 3.7, toapi 2.1.1

    Traceback (most recent call last):
      File "main.py", line 5, in <module>
        api = Api()
      File "/usr/local/lib/python3.7/site-packages/toapi/api.py", line 24, in __init__
        self.__init_server()
      File "/usr/local/lib/python3.7/site-packages/toapi/api.py", line 27, in __init_server
        self.app.logger.setLevel(logging.ERROR)
    AttributeError: module 'flask.logging' has no attribute 'ERROR'
    
  • Error: No such command

    Error: No such command "new".

    [[email protected] python3]# python --version
    Python 3.6.2

    [[email protected] python3]# toapi new toapi/toapi-pic
    Usage: toapi [OPTIONS] COMMAND [ARGS]...

    Error: No such command "new".

    Help?

  • Installation error on Python 2.7

    Traceback (most recent call last):
      File "app.py", line 2, in <module>
        from htmlparsing import Attr, Text
      File "/usr/local/lib/python2.7/dist-packages/htmlparsing-0.1.5-py2.7.egg/htmlparsing.py", line 21
        def __init__(self, text: str):
                               ^
    SyntaxError: invalid syntax

  • Access to RawHTML from selectors

    Hello, I need access to the raw HTML in one of my Item instances. Currently the XPath and CSS selectors always convert the node to a string. But in my use case, once I select a certain part of the page, I need to do some post-processing in my clean_ method, and I can only get a string passed into it. Is there a way to get the raw HTML passed into my clean_ method for a given key?

    Thank you,

  • Modify routing argument

    class Meta:
        source = None
        route = {'/search/:id': '/search/:id'}

    Right now the :id from the host URL is passed directly into the source URL. Is there a way to modify it before passing it on?

    For example, I need to map the query 127.0.0.1:5000/search/1 to bing.com/search/100

    So I would have to multiply :id by 100 before passing it along as the argument. Not sure if that makes sense.
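    toapi's route table maps paths one-to-one, so a transformation like this is not expressed in Meta.route itself. Purely as an illustration of the desired mapping (the helper name is hypothetical, not a toapi hook):

    ```python
    # Hypothetical helper: rewrite the trailing id of a local route into
    # the source-site id by multiplying it, as described above.
    def map_route_id(local_path, factor=100):
        prefix, _, raw_id = local_path.rpartition('/')
        return '{}/{}'.format(prefix, int(raw_id) * factor)

    print(map_route_id('/search/1'))  # /search/100
    ```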

  • Upgrade: Bump ujson from 4.0.2 to 5.4.0

    Bumps ujson from 4.0.2 to 5.4.0.

    Release notes

    Sourced from ujson's releases.

    5.4.0: Added, Fixed
    5.3.0: Added, Changed, Fixed
    5.2.0: Added, Fixed
    5.1.0: Changed

    ... (truncated)

    Commits
    • 9c20de0 Merge pull request from GHSA-fm67-cv37-96ff
    • b21da40 Fix double free on string decoding if realloc fails
    • 67ec071 Merge pull request #555 from JustAnotherArchivist/fix-decode-surrogates-2
    • bc7bdff Replace wchar_t string decoding implementation with a uint32_t-based one
    • cc70119 Merge pull request #548 from JustAnotherArchivist/arbitrary-ints
    • 4b5cccc Merge pull request #553 from bwoodsend/pypy-ci
    • abe26fc Merge pull request #551 from bwoodsend/bye-bye-travis
    • 3efb5cc Delete old TravisCI workflow and references.
    • 404de1a xfail test_decode_surrogate_characters() on Windows PyPy.
    • f7e66dc Switch to musl docker base images.
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



  • Problem: can't start the app

    When I ran this example, I got the following error:

    2019/09/19 10:15:50 [Register] OK <Post: /posts /news?p=1>
    2019/09/19 10:15:50 [Register] OK <Post: /posts?page={page} /news?p={page}>
    2019/09/19 10:15:50 [Register] OK <Page: /posts /news?p=1>
    2019/09/19 10:15:50 [Register] OK <Page: /posts?page={page} /news?p={page}>
    2019/09/19 10:15:50 [Serving ] OK http://0.0.0.0:5001
    2019/09/19 10:15:50 [Serving ] FAIL Windows error 1
    2019/09/19 10:15:50 [Serving ] FAIL
    Traceback (most recent call last):
      File "D:\python\lib\site-packages\toapi\api.py", line 50, in run
        self.app.run(host, port, **options)
      File "D:\python\lib\site-packages\flask\app.py", line 938, in run
        cli.show_server_banner(self.env, self.debug, self.name, False)
      File "D:\python\lib\site-packages\flask\cli.py", line 629, in show_server_banner
        click.echo(message)
      File "D:\python\lib\site-packages\click\utils.py", line 260, in echo
        file.write(message)
      File "D:\python\lib\site-packages\click\_winconsole.py", line 180, in write
        return self._text_stream.write(x)
      File "D:\python\lib\site-packages\click\_winconsole.py", line 164, in write
        raise OSError(self._get_error_message(GetLastError()))
    OSError: Windows error 1

    toapi 2.1.0, Flask 1.0.2, Python 3.6.0

  • Elements not always present on page

    I use:

    class ProductPage(Item):
        coupon = Attr('.coupon', 'title')
    

    However, some product pages do not contain the coupon HTML, so they fail with:

      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/htmlparsing.py", line 79, in parse
        return element.css(self.selector)[0].attrs[self.attr]
    IndexError: list index out of range
    

    What's the best practice to deal with that situation?
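    htmlparsing's Attr raises IndexError when the selector matches nothing, as the traceback shows. One defensive pattern is to fall back to a default when no element is found. The sketch below uses plain dicts to stand in for parsed elements; htmlparsing's real objects and hooks may differ:

    ```python
    # Hedged sketch: a "safe" attribute lookup that returns a default
    # instead of raising IndexError when the selector matched nothing.
    # `elements` stands in for the list returned by element.css(selector).
    def safe_attr(elements, attr, default=None):
        if not elements:
            return default               # selector matched nothing
        return elements[0].get(attr, default)

    print(safe_attr([], 'title', default=''))          # prints an empty line
    print(safe_attr([{'title': '10% off'}], 'title'))  # 10% off
    ```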

Web-Extractor - Simple Tool To Extract IP-Address From Website

IP-Address Extractor Simple Tool To Extract IP-Address From Website Socials: Langu

Jan 16, 2022

Scrapes Every Email Address of Every Society in Every University

society-email-scrape Site Live at https://kcsoc.github.io/society-email-scrape/ How to automatically generate new data Go to unis.yml Add your uni Cre

Dec 14, 2022

Scan Site - Tools For Scanning Any Site and Get Site Information

Site Scanner Tools For Scanning Any Site and Get Site Information Example Require - pip install colorama - pip install requests How To Use Download Th

Mar 19, 2022

Automatically crawl every URL and find cross site scripting (XSS)

scancss Fastest tool to find XSS. scancss is a fastest tool to detect Cross Site scripting (XSS) automatically and it's also an intelligent payload ge

Sep 24, 2022

A python-based static site generator for setting up a CV/Resume site

ezcv A python-based static site generator for setting up a CV/Resume site Table of Contents What does ezcv do? Features & Roadmap Why should I use ezc

Oct 25, 2022

Django-static-site - A simple content site framework that harnesses the power of Django without the hassle

coltrane A simple content site framework that harnesses the power of Django with

Dec 6, 2022

Embrace the APIs of the future. Hug aims to make developing APIs as simple as possible, but no simpler.

Read Latest Documentation - Browse GitHub Code Repository hug aims to make developing Python driven APIs as simple as possible, but no simpler. As a r

Dec 27, 2022

Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.

Tink A multi-language, cross-platform library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse. Ubuntu

Jan 5, 2023

💻 Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your DSA journey.📰🔥 This repository contains the REST APIs of the application.✨

Algo-Phantom-Backend Algo-Phantoms-Backend is an Application that provides pathways and quizzes along with a code editor to help you towards your D

Nov 15, 2022

Toolchest provides APIs for scientific and bioinformatic data analysis.

Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni

Jun 30, 2022

Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.

Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot

Dec 1, 2022

The sarge package provides a wrapper for subprocess which provides command pipeline functionality.

Overview The sarge package provides a wrapper for subprocess which provides command pipeline functionality. This package leverages subprocess to provi

Dec 18, 2022

Simple yet powerful and really extendable application for managing a blog within your Django Web site.

Django Blog Zinnia Simple yet powerful and really extendable application for managing a blog within your Django Web site. Zinnia has been made for pub

Dec 24, 2022

Companion Web site for Fluent Python, Second Edition

Fluent Python, the site Source code and content for fluentpython.com. The site complements Fluent Python, Second Edition with extra content that did n

Dec 8, 2022

googler is a power tool to Google (web, news, videos and site search) from the command-line.

Jan 4, 2023

Tornadmin is an admin site generation framework for Tornado web server.

Jan 10, 2022

Set of Web-backend projects to implement micro-blogging site

Mini-Twitter This repository contains a set of projects covered for CPSC-449 Web-Backend development under the guidance of Prof. Kenytt Avery at CSU,

Nov 7, 2021