Pure JavaScript OCR for more than 100 Languages 📖🎉🖥

Tesseract.js

Version 2 is now available and under development in the master branch. For background on v2, read: Why I refactor tesseract.js v2?
Check the support/1.x branch for version 1.


Tesseract.js is a JavaScript library that extracts words in almost any language from images. (Demo)

Image Recognition

fancy demo gif

Video Real-time Recognition

Tesseract.js Video

Tesseract.js wraps an emscripten port of the Tesseract OCR Engine. It works in the browser using webpack or plain script tags with a CDN and on the server with Node.js. After you install it, using it is as simple as:

import Tesseract from 'tesseract.js';

Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
})

Or, in a more imperative style:

import { createWorker } from 'tesseract.js';

const worker = createWorker({
  logger: m => console.log(m)
});

(async () => {
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  await worker.terminate();
})();
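
If recognize() rejects in the snippet above, the terminate() call is skipped and the worker is left running. A minimal sketch of one way to guard against that, assuming the v2 worker methods shown above. The helper name recognizeOnce and the injected createWorker parameter are illustrative and not part of the library:

```javascript
// Hypothetical helper: runs a single OCR job and always terminates the worker.
// `createWorker` is passed in (for example, the function imported from
// 'tesseract.js') so the helper itself has no hard dependency on the package.
async function recognizeOnce(createWorker, image, lang = 'eng') {
  const worker = createWorker();
  try {
    await worker.load();
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data: { text } } = await worker.recognize(image);
    return text;
  } finally {
    // Runs whether recognition succeeded or threw.
    await worker.terminate();
  }
}
```

With the import from the snippet above, a call could look like recognizeOnce(createWorker, 'https://tesseract.projectnaptha.com/img/eng_bw.png').then(console.log).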

Check out the docs for a full explanation of the API.
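
The logger option used in both snippets receives progress messages. As a rough sketch, assuming each message is an object with a status string and a progress number between 0 and 1, a small formatter could turn them into readable lines. The name formatProgress is illustrative and not a Tesseract.js API:

```javascript
// Assumed message shape: { status: string, progress: number in [0, 1] }.
// `formatProgress` is a hypothetical helper, not a Tesseract.js API.
function formatProgress(message) {
  const status = message && message.status ? message.status : 'unknown';
  const percent = message && typeof message.progress === 'number'
    ? Math.round(message.progress * 100)
    : null;
  // Messages without a numeric progress are reported by status only.
  return percent === null ? status : `${status}: ${percent}%`;
}

// Example logger that could be passed as the `logger` option.
const logger = (m) => console.log(formatProgress(m));
```

Passing { logger } in place of { logger: m => console.log(m) } would print one formatted line per message instead of the raw object.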

Major changes in v2

  • Upgrade to tesseract v4.1.1 (using emscripten 1.39.10 upstream)
  • Support multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
  • Supported image formats: png, jpg, bmp, pbm
  • Support WebAssembly (falls back to ASM.js when the browser doesn't support it)
  • Support TypeScript
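
For the multi-language item in the list above, language codes are joined with a plus sign. A minimal sketch, assuming the v2 worker accepts the combined string in both loadLanguage and initialize. The names joinLanguages and initializeLanguages are illustrative helpers, not library functions:

```javascript
// Hypothetical helper: builds a combined language string such as 'eng+chi_tra'.
function joinLanguages(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new Error('At least one language code is required');
  }
  // Drop duplicates while keeping the caller's order.
  return [...new Set(codes)].join('+');
}

// Sketch of use with a v2 worker on which worker.load() has already been awaited.
async function initializeLanguages(worker, codes) {
  const langs = joinLanguages(codes);
  await worker.loadLanguage(langs);
  await worker.initialize(langs);
  return langs;
}
```

For example, initializeLanguages(worker, ['eng', 'chi_tra']) would load and initialize 'eng+chi_tra' before the first recognize() call.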

Installation

Tesseract.js works with a <script> tag via local copy or CDN, with webpack via npm and on Node.js with npm/yarn.

CDN

<!-- v2 -->
<script src='https://unpkg.com/tesseract.js@2/dist/tesseract.min.js'></script>

<!-- v1 -->
<script src='https://unpkg.com/tesseract.js@1/src/index.js'></script>

After including the script the Tesseract variable will be globally available.

Node.js

Tesseract.js currently requires Node.js v6.8.0 or higher

# For v2
npm install tesseract.js
yarn add tesseract.js

# For v1
npm install tesseract.js@1
yarn add tesseract.js@1

Documentation

Use tesseract.js the way you like!

Contributing

Development

To run a development copy of Tesseract.js do the following:

# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js

# Then we install the dependencies
npm install

# And finally we start the development server
npm start

The development server will be available at http://localhost:3000/examples/browser/demo.html in your favorite browser. It will automatically rebuild tesseract.dev.js and worker.dev.js when you change files in the src folder.

Online Setup with a single Click

You can use Gitpod (a free, online, VS Code-like IDE) for contributing. With a single click, it launches a ready-to-code workspace with the build and start scripts already running, and within a few seconds it spins up the dev server so that you can start contributing straight away.

Open in Gitpod

Building Static Files

To build the compiled static files just execute the following:

npm run build

This will output the files into the dist directory.

Contributors

Code Contributors

This project exists thanks to all the people who contribute. [Contribute].

Financial Contributors

Become a financial contributor and help us sustain our community. [Contribute]

Individuals

Organizations

Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]

Owner
Project Naptha
highlight, copy, search, edit and translate text in any image
Comments
  • react-native support?

    Hey guys!! I wonder if you have considered bringing support for frameworks like react-native through node. I was working on a tesseract wrapper for react-native but your lib looks much better. (Considering that now the wrapper is only implemented on android)

    So I tried to create a test using yours, but I'm getting this error:

    [screenshot: rsz_14632600_1552933568054084_273631139_o]

  • TypeError: TesseractWorker is not a constructor

    const Worker= new TesseractWorker();//For analyzing images
                  ^

    TypeError: TesseractWorker is not a constructor
        at Object.<anonymous> (/Users/hyder/Desktop/OCR-PDF/app.js:6:15)
        at Module._compile (internal/modules/cjs/loader.js:774:30)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:785:10)
        at Module.load (internal/modules/cjs/loader.js:641:32)
        at Function.Module._load (internal/modules/cjs/loader.js:556:12)
        at Function.Module.runMain (internal/modules/cjs/loader.js:837:10)
        at internal/main/run_main_module.js:17:11
    [nodemon] app crashed - waiting for file changes before starting...

    Please suggest a solution

  • Cannot read property 'arrayBuffer' of undefined (Electron & React)

    Describe the bug

    I've spent a few hours tonight trying to get tesseract.js working with an application I've been building. The stack is Electron & React, and I can't seem to get it to work. I've pulled both the Electron and React example applications and they seem to work fine, but in my application I'm bundling React inside Electron, which I think might be causing this issue.

    At first, my application wasn't loading the languages with the default setup, so I moved to the offline tesseract. To do this, I used Webpack with copy-webpack-plugin to copy the files from node_modules to my build folder. This works fine, so I then created the worker like so:

    const worker = createWorker({
      cacheMethod: 'none',
      langPath: `http://localhost:3000/static/vendor/lang-data/eng.traineddata`,
      workerPath: `http://localhost:3000/static/vendor/worker.min.js`,
      corePath: `http://localhost:3000/static/vendor/tesseract-core.wasm.js`,
      logger: (m) => console.log(m),
    });
    

    Note: If I remove http://localhost:3000/ - I get Uncaught DOMException: Failed to execute 'importScripts' on 'WorkerGlobalScope': The URL '/static/vendor/worker.min.js' is invalid.

    After running the application with the steps below, I get the following error: Uncaught (in promise) TypeError: Cannot read property 'arrayBuffer' of undefined - I've spent a few hours trying to debug this, but to no avail. The langPath, workerPath, corePath all seem correct, and I can access these directly in the browser.

    I'm kind of stumped at this point, any help would be appreciated.

    To Reproduce

    Steps to reproduce the behavior:

    1. Go to 'https://github.com/karlhadwen/notes' - pull the repo
    2. yarn install & yarn dev
    3. Click the [+] button on the bottom left (with console open)
    4. See error (Cannot read property 'arrayBuffer' of undefined)

    Expected behavior

    To read the data from the image in 'http://localhost:3000/note.png' - which is the example image.

    Screenshots

    Screenshot 2020-05-18 at 22 22 41

    • App.js: https://github.com/karlhadwen/notes/blob/master/src/App.js
    • electron.js: https://github.com/karlhadwen/notes/blob/master/public/electron.js
    • .webpack.config.js: https://github.com/karlhadwen/notes/blob/master/.webpack.config.js

    Desktop (please complete the following information):

    • OS: OS X (10.15.4)
    • Electron & Chrome - both do not work
    • Version: ^2.1.1

    Additional context

    Repo where this is happening: https://github.com/karlhadwen/notes/

  • Incorrect header check at Zlib._handle.onerror (zlib.js:363:17)

    I'm trying to process an image which is saved locally on my Node server. I'm getting the following error:

    2017-06-17T16:12:45.087797+00:00 app[web.1]: File write complete-- /app/sample.png
    2017-06-17T16:12:46.065537+00:00 app[web.1]: pre-main prep time: 61 ms
    2017-06-17T16:12:46.114192+00:00 app[web.1]: events.js:154
    2017-06-17T16:12:46.114195+00:00 app[web.1]: throw er; // Unhandled 'error' event
    2017-06-17T16:12:46.114196+00:00 app[web.1]: ^
    2017-06-17T16:12:46.114197+00:00 app[web.1]:
    2017-06-17T16:12:46.114198+00:00 app[web.1]: Error: incorrect header check
    2017-06-17T16:12:46.114201+00:00 app[web.1]: at Zlib._handle.onerror (zlib.js:363:17)

    Here is my code:

    Tesseract.recognize(completeFilePath)
      .then(function (data) {
        console.log('Job completed');
      })
      .catch(function (err) {
        console.log('catch\n', err);
      })
      .finally(function (e) {
        console.log('Finally');
        // cleanup temp file
      });
    
  • Current CDN Example Not Working

    Hi. I'm trying to conduct a very simple test using just a single HTML file and by including the tesseract.js script using the CDN source in the documentation:

    <script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
    

    My HTML file is simple:

    <html>
        <head>
            <script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
            <title>Tesseract Test</title>
        </head>
        <body>
            <label for="fileInput">Choose File to OCR:</label>
            <input type="file" id="fileInput" name="fileInput"/>
            <br />
            <br />
            <div id="document-content">
            </div>
        </body>
        <script>
            document.addEventListener('DOMContentLoaded', function(){
                var fileInput = document.getElementById('fileInput');
                fileInput.addEventListener('change', handleInputChange);
            });
    
            function handleInputChange(event){
                var input = event.target;
                var file = input.files[0];
                console.log(file);
                Tesseract.recognize(file)
                    .progress(function(message){
                        console.log(message);
                    })
                    .then(function(result){
                        var contentArea = document.getElementById('document-content');
                        console.log(result);
                    })
                    .catch(function(err){
                        console.error(err);
                    });
            }
        </script>
    </html>
    

    But if I try to add an image, nothing happens in the console or anywhere else. This is also true if I clone the repository and instead load tesseract.js from the dist directory.

    I see that the main (non-github) website for the project uses the CDN version 1.0.7, so I tried using that source instead. It came to life and started reporting back progress, but then threw the following error:

    tesseract_example.html:27 Object {status: "loading tesseract core", progress: 0}
    tesseract_example.html:27 Object {status: "loading tesseract core", progress: 1}
    tesseract_example.html:27 Object {status: "initializing tesseract", progress: 0}
    index.js:10 pre-main prep time: 65 ms
    tesseract_example.html:27 Object {status: "initializing tesseract", progress: 1}
    worker.js:11953 Uncaught DOMException: Failed to execute 'postMessage' on 'DedicatedWorkerGlobalScope': An object could not be cloned.(…)
    
    (anonymous function)	@	worker.js:11953
    respond	@	worker.js:12185
    dispatchHandlers	@	worker.js:12205
    (anonymous function)	@	worker.js:11952
    
    

    Am I just doing this wrong somehow?

    (Using Chrome 54 in OSX 10.11.)

  • Working with Tesseract.js with custom language and without internet connection

    Hey,

    I wonder if it's possible to use tesseract.js in a mobile app with a custom traineddata file. In addition, is it possible to use it offline, locally on the mobile device without an internet connection?

    Thanks.

  • Tesseract couldn't load any languages!

    Hey folks, I'm just trying out tesseract.js and seem to be missing something... I've installed it via npm, and am trying to run what is basically the simple example in node 7:

    const Tesseract = require('tesseract.js');
    const image = require('path').resolve(__dirname, 'test.jpeg')
    
    Tesseract.recognize(image)
    .then(data => console.log('then\n', data.text))
    .catch(err => console.log('catch\n', err))
    .finally(e => {
      console.log('finally\n');
      process.exit();
    });
    

    Running this file the first time generated this error:

    // progress { status: 'loading tesseract core' }
    // progress { status: 'loaded tesseract core' }
    // progress { status: 'initializing tesseract', progress: 0 }
    // pre-main prep time: 131 ms
    // progress { status: 'initializing tesseract', progress: 1 }
    // progress { status: 'downloading eng.traineddata.gz',
    //   loaded: 116,
    //   progress: 0.000012270517521770119 }
    // events.js:160
    //       throw er; // Unhandled 'error' event
    //       ^
    
    // Error: incorrect header check
    //     at Zlib._handle.onerror (zlib.js:356:17)
    
    // SECOND ERROR
    // AdaptedTemplates != NULL:Error:Assert failed:in file ../classify/adaptmatch.cpp, line 190
    

    Subsequent running of the file results in this error:

    pre-main prep time: 83 ms
    Failed loading language 'eng'
    Tesseract couldn't load any languages!
    AdaptedTemplates != NULL:Error:Assert failed:in file ../classify/adaptmatch.cpp, line 190
    
    /Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:4
    function f(a){throw a;}var h=void 0,i=!0,j=null,k=!1;function aa(){return function(){}}function ba(a){return function(){return a}}var n,Module;Module||(Module=eval("(function() { try { return TesseractCore || {} } catch(e) { return {} } })()"));var ca={},da;for(da in Module)Module.hasOwnProperty(da)&&(ca[da]=Module[da]);var ea=i,fa=!ea&&i;
                  ^
    abort() at Error
        at Na (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:32:26)
        at Object.ka [as abort] (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:507:108)
        at _abort (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:373:173)
        at $L (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:383:55709)
        at jpa (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:388:22274)
        at lT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80568)
        at mT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80700)
        at Array.BS (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:69011)
        at bP (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:383:110121)
        at jT (/Users/emilyintersimone/Documents/Coding/personal/run-lines/node_modules/tesseract.js-core/index.js:387:80280)
    If this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information.
    

    What am I missing?

  • Failed loading language 'eng'

    Because I couldn't get the traineddata from the CDN, I downloaded the data into my repository and tried to load it, but it failed.

    I used langPath to load it from local storage, and I don't know why it fails.

    This is my JavaScript code:

    var Tesseract = require('tesseract.js');
    const path = require("path");
    var imagePath = path.join(__dirname, "jake.jpg");

    Tesseract.create({
      langPath: path.join(__dirname, "langs")
    }).recognize(imagePath, {lang: "eng"})
      .then((result) => console.log(result.text));
    

    My traineddata is in the 'langs' folder, which is in the same repository as the JavaScript file above, together with the image.

  • tessedit_pageseg_mode: "1" does not work

    Hi,

    I'm trying to use "Automatic page segmentation with OSD" feature, and it does not seem to be working.

    Here is my setup:

    • Ubuntu 16.04;
    • node 8.9.4

    Tesseract.recognize(imgPath, {tessedit_pageseg_mode: "1"})

    fails with:

    pre-main prep time: 46 ms
    Error opening data file ./tessdata/osd.traineddata
    Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
    Failed loading language 'osd'
    Tesseract couldn't load any languages!
    Warning: Auto orientation and script detection requested, but osd language failed to load
    

    I do have eng.traineddata and osd.traineddata in the root folder of my project. Tesseract.recognize() and Tesseract.detect() require them to be there, and both work fine without tessedit_pageseg_mode option, or with tessedit_pageseg_mode: "6". But tessedit_pageseg_mode: "1" (or "0") blows it up with the error above.

    I tried placing eng.traineddata and osd.traineddata into ./tessdata, but this did not help. Providing TESSDATA_PREFIX to my script did not help either.

    In theory I can use Tesseract.detect(imgPath) to get the page orientation, then call ImageMagick to rotate it, and then use Tesseract.recognize(imgPath) to get HOCR data. But Tesseract.recognize(imgPath,{tessedit_pageseg_mode: "1"}) should automatically do all of the above; it just fails for me for a reason I don't understand.

    Thank you.

  • Unexpected token <

    With version 1.0.10, when using the simple example from the documentation, I got:

    Uncaught SyntaxError: Unexpected token <
        at blob:http://localhost:3000/49b8248d-3fa6-4529-a4bc-532adde1c6cc:1
    (anonymous) @ blob:http://localhost:3000/49b8248d-3fa6-4529-a4bc-532adde1c6cc:1

  • Does not get text from image if image contains some background objects

    I'm building a mobile app for motivational quotes, where anyone can add quotes and send a link to an image that contains quotes. I used the following image: [image: img3]

    It gives me the text with proper line breaks; happy to see this: "YOU LEARNED TO LAUGH BEFORE YOU LEARNED TO TALK."

    But when I use this image: [image: img]

    It gives me following text.

    w?- 3 <5" I! r

    • WRI‘Cf' v EALWAvs“ TEA WUERIGHT 1“ CE 0N5 *‘ A‘ £ I

    I just want to ask whether the library only works with images that have no background.

  • .EXE or Solution File in the downloads folder please

    I downloaded about 17 versions, from 2.15 and below, and none of the folders had a .exe in them. No instructions were given. I did some research and downloaded Visual Studio 2022. The Build button was missing because I was opening a folder instead of a solution, and no solution file was included either.

    Please update/fix.

    Also: I downloaded Tesseract OCR (https://github.com/UB-Mannheim/tesseract/wiki), 64-bit. However, once installed, nothing opens. I opened the folder C:\Program Files\Tesseract-OCR; there is nothing really to launch besides the tesseract application, and clicking it doesn't do anything. Text to image just flashes a command prompt in a tiny window, and that's it.

    I'm not sure if this was supposed to be a dependency or requirement to run Tesseract.js. I would really like a guide.

  • Tesseract.js can't load language files on deployed server

    Describe the bug

    Tesseract throws an error when running in a deployed VM (e.g. "https://vmdev-01:1000/DocScanner"), but works when running on localhost (e.g. "https://localhost/DocScanner").

    The code and deployment are the same. Both servers are running Windows 10, IIS and ASP.NET Core 3.1, and for the frontend I'm using Angular 12.

    To Reproduce

    Steps to reproduce the behavior:

    1. Set up Tesseract.js to run in the browser (see code below)
    2. Deploy the web app on a VPS or Docker
    3. An error is shown while loading languages

    Expected behavior

    Should work the same as on localhost.

    Screenshots Error Message: image

    "assets/scripts/tesseract" folder: image

    "assets/scripts/tesseract/lang-data" folder: image

    Package.json tesseract versions: image

    Desktop:

    • OS: Windows 10
    • Browser: Chrome
    • Version: 102.0.5005.115 (Official Build) (64-bit)

    Additional context

    Tesseract.js setup code:

    import { createWorker } from 'tesseract.js';
    
    private async OcrImage(img: string) {
        const worker = createWorker({
          corePath:
            (this.baseUrl ?? '/') +
            'assets/scripts/tesseract/tesseract-core.wasm.js',
          workerPath:
            (this.baseUrl ?? '/') + 'assets/scripts/tesseract/worker.min.js',
          langPath:
            location.origin +
            (this.baseUrl ?? '/') +
            'assets/scripts/tesseract/lang-data',
          cacheMethod: 'none'
        });
    
        await worker.load();
        await worker.loadLanguage('eng');
        await worker.initialize('eng');
        await worker.setParameters({
          tessedit_char_whitelist: '0123456789',
        });
    
        const data = await worker.recognize(img);
        await worker.terminate();
        return data;
      }
    
  • Bump node-fetch from 2.6.1 to 2.6.7

    Bumps node-fetch from 2.6.1 to 2.6.7.

    Release notes

    Sourced from node-fetch's releases.

    v2.6.7

    Security patch release

    Recommended to upgrade, to not leak sensitive cookie and authentication header information to 3th party host while a redirect occurred

    What's Changed

    Full Changelog: https://github.com/node-fetch/node-fetch/compare/v2.6.6...v2.6.7

    v2.6.6

    What's Changed

    Full Changelog: https://github.com/node-fetch/node-fetch/compare/v2.6.5...v2.6.6

    v2.6.2

    fixed main path in package.json

    Commits
    • 1ef4b56 backport of #1449 (#1453)
    • 8fe5c4e 2.x: Specify encoding as an optional peer dependency in package.json (#1310)
    • f56b0c6 fix(URL): prefer built in URL version when available and fallback to whatwg (...
    • b5417ae fix: import whatwg-url in a way compatible with ESM Node (#1303)
    • 18193c5 fix v2.6.3 that did not sending query params (#1301)
    • ace7536 fix: properly encode url with unicode characters (#1291)
    • 152214c Fix(package.json): Corrected main file path in package.json (#1274)
    • See full diff in compare view
    Maintainer changes

    This version was pushed to npm by endless, a new releaser for node-fetch since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Bump shell-quote from 1.6.1 to 1.7.3

    Bumps shell-quote from 1.6.1 to 1.7.3.

    Release notes

    Sourced from shell-quote's releases.

    v1.7.2

    • Fix a regression introduced in 1.6.3. This reverts the Windows path quoting fix. (144e1c2)

    v1.7.1

    • Fix $ being removed when not part of an environment variable name. (@​Adman in #32)

    v1.7.0

    • Add support for parsing >> and >& redirection operators. (@​forivall in #16)
    • Add support for parsing <( process substitution operator. (@​cuonglm in #15)

    v1.6.3

    • Fix Windows path quoting problems. (@​dy in #34)

    v1.6.2

    • Remove dependencies in favour of native methods. (@​zertosh in #21)
    Changelog

    Sourced from shell-quote's changelog.

    1.7.3

    • Fix a security issue where the regex for windows drive letters allowed some shell meta-characters to escape the quoting rules. (CVE-2021-42740)

    1.7.2

    • Fix a regression introduced in 1.6.3. This reverts the Windows path quoting fix. (144e1c2)

    1.7.1

    • Fix $ being removed when not part of an environment variable name. (@​Adman in #32)

    1.7.0

    • Add support for parsing >> and >& redirection operators. (@​forivall in #16)
    • Add support for parsing <( process substitution operator. (@​cuonglm in #15)

    1.6.3

    • Fix Windows path quoting problems. (@​dy in #34)

    1.6.2

    • Remove dependencies in favour of native methods. (@​zertosh in #21)
    Commits

  • Bump jpeg-js from 0.4.2 to 0.4.4

    Bumps jpeg-js from 0.4.2 to 0.4.4.

    Release notes

    Sourced from jpeg-js's releases.

    v0.4.4

    v0.4.4 (2022-06-07)

    • feat: add comment tag encoding (#87) (13e1ffa), closes #87
    • fix: validate sampling factors (#106) (9ccd35f), closes #106
    • fix(decoder): rethrow a more helpful error if Buffer is undefined (#93) (b58cc11), closes #93
    • chore(ci): migrate to github actions (#86) (417e8e2), closes #86
    • chore(deps): bump y18n from 4.0.0 to 4.0.3 (#98) (2c90858), closes #98
    • chore(deps): bump ws from 7.2.3 to 7.4.6 (#91) (fd73289), closes #91
    • chore(deps): bump hosted-git-info from 2.8.8 to 2.8.9 (#90) (9449a8b), closes #90
    • chore(deps): bump lodash from 4.17.15 to 4.17.21 (#89) (ffdc4a4), closes #89

    v0.4.3

    v0.4.3 (2021-01-11)

    • fix: handle 0x00E1 / 0x00E0 segments from Pixel phones (#84) (a2d7ed9), closes #84
    Commits
    • 9ccd35f fix: validate sampling factors (#106)
    • b58cc11 fix(decoder): rethrow a more helpful error if Buffer is undefined (#93)
    • 2c90858 chore(deps): bump y18n from 4.0.0 to 4.0.3 (#98)
    • fd73289 chore(deps): bump ws from 7.2.3 to 7.4.6 (#91)
    • 9449a8b chore(deps): bump hosted-git-info from 2.8.8 to 2.8.9 (#90)
    • ffdc4a4 chore(deps): bump lodash from 4.17.15 to 4.17.21 (#89)
    • 13e1ffa feat: add comment tag encoding (#87)
    • 417e8e2 chore(ci): migrate to github actions (#86)
    • a2d7ed9 fix: handle 0x00E1 / 0x00E0 segments from Pixel phones (#84)
    • See full diff in compare view

  • Dark theme logo in README

    The original logo's white background doesn't look good on a dark theme. Added a white-lined logo with a transparent background. Kept the original HTML <img> in the README but added a prefers-color-scheme: dark source element.

An image OCR tool with a GUI, using the Tesseract-OCR engine with the pytesseract package.

OCR-Tool An image OCR tool with a GUI, made in Python using the Tesseract-OCR engine with the pytesseract package. This is my second ever pytho

Oct 15, 2021
Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is a Python-Flask web application that uses Tesseract to convert Indonesian ID cards to text / JSON

Dec 6, 2021
make a better chinese character recognition OCR than tesseract

deep ocr See README_en.md for English installation documentation. Tested only on Ubuntu; requires installation with virtualenv; the install path can be adjusted as needed: git clone https://github.com/JinpengLI/deep

Jun 18, 2022
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

EasyOCR Ready-to-use OCR with 80+ languages supported including Chinese, Japanese, Korean and Thai. What's new 1 February 2021 - Version 1.2.3 Add set

Jun 27, 2022
OCR engine for all the languages

Description kraken is a turn-key OCR system optimized for historical and non-Latin script material. kraken's main features are: Fully trainable layout

Jun 24, 2022
A pure pytorch implemented ocr project including text detection and recognition

ocr.pytorch A pure pytorch implemented ocr project. Text detection is based on CTPN and text recognition is based on CRNN. More detection and recognition me

Jun 30, 2022
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Jun 21, 2022
Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

Feb 11, 2022
A Python wrapper for the tesseract-ocr API

tesserocr A simple, Pillow-friendly, wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). tesserocr integrates directly with

Jul 3, 2022
FastOCR is a desktop application for OCR API.

FastOCR FastOCR is a desktop application for OCR API. Installation Arch Linux fastocr-git @ AUR Build from AUR or install with your favorite AUR helpe

Jun 25, 2022
OCR-D-compliant page segmentation

ocrd_segment This repository aims to provide a number of OCR-D-compliant processors for layout analysis and evaluation. Installation In your virtual e

Jun 3, 2022
OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of handwritten text from photos (also for the Czech language). It uses computer vis

Jun 27, 2022
Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.

Table of Contents Overview Requirements Demo Modules Overview This python package contains modules to help with finding and extracting tabular data fr

Jun 29, 2022
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition This repository contains the code for the paper: STN-OCR: A single Neural Net

Jun 30, 2022
Python OCR using tesseract with the EAST OpenCV detector

pytextractor Python OCR using tesseract with the EAST OpenCV text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

Feb 28, 2022
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

May 23, 2022
A set of workflows for corpus building through OCR, post-correction and normalisation

PICCL: Philosophical Integrator of Computational and Corpus Libraries PICCL offers a workflow for corpus building and builds on a variety of tools. Th

Apr 21, 2022
Tensorflow-based CNN+LSTM trained with CTC-loss for OCR

Overview This collection demonstrates how to construct and train a deep, bidirectional stacked LSTM using CNN features as input with CTC loss to perfo

Jun 18, 2022
🖺 OCR using tensorflow with attention

tensorflow-ocr 🖺 OCR using tensorflow with attention, batteries included Installation git clone --recursive http://github.com/pannous/tensorflow-ocr

Jun 27, 2022