Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher

A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is implemented based on this article). This package solves problem of assigning paper to reviewers with constrains by solving linear programming problem. We minimize global distance between papers and reviewers in topic space (e.g. topic modeling can be Principal component, Latent Semantic Analysis (LSA), etc.).

Here is a diagram of problem setup and how we solve the problem.

Mind-Match Command Line

Mind-Match is a session we run at Cognitive Computational Neuroscience (CCN) conference. We use a combination of topic modeling and linear programming to solve optimal matching problem. To run example Mind-Match algorithm on sample of 500 people, you can clone the repository and run the following

python mindmatch.py data/mindmatch_example.csv --n_match=6 --n_trim=50

in the root of this repo. This should produce a matching output output_match.csv in this relative location. However, when people get much larger this script takes quite a long time to run. We use pre-cluster into groups before running the mind-matching to make the script runs faster. Below is an example script for pre-clustering and mind-matching on all data:

python mindmatch_cluster.py data/mindmatch_example.csv --n_match=6 --n_trim=50 --n_clusters=4

Example script for the conferences

Here, I include a recent scripts for our Mind Matching session for CCN conference.

  • ccn_mind_matching_2019.py contains script for Mind Matching session (match scientists to scientists) for CCN conference
  • ccn_paper_reviewer_matching.py contains script for matching publications to reviewers for CCN conference, see example of CSV files in data folder

The code makes the distance metric of topics between incoming papers with reviewers (for ccn_paper_reviewer_matching.py) and between people with people (for ccn_mind_matching_2019). We trim the metric so that the problem is not too big to solve using or-tools. It then solves linear programming problem to assign the best matches which minimize the global distance between papers to reviewers. After that, we make the output that can be used by the organizers of the CCN conference -- pairs of paper and reviewers or mind-matching schedule between people to people in the conference. You can see of how it works below.

Dependencies

Use pip to install dependencies

pip install -r requirements.txt

Please see Stackoverflow if you have a problem installing or-tools on MacOS. You can use pip to install protobuf before installing or-tools

pip install protobuf==3.0.0b4
pip install ortools

for Python 3.6,

pip install --user --upgrade ortools

Citations

If you use Paper-Reviewer Matcher in your work or conference, please cite us as follows

@misc{achakulvisut2018,
    author = {Achakulvisut, Titipat and Acuna, Daniel E. and Kording, Konrad},
    title = {Paper-Reviewer Matcher},
    year = {2018},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/titipata/paper-reviewer-matcher}},
    commit = {9d346ee008e2789d34034c2b330b6ba483537674}
}

Members

Owner
Titipat Achakulvisut
Science of Science & Applied NLP | Mahidol University | Former @KordingLab, University of Pennsylvania, and intern @allenai, organizer/co-founder of neuromatch.
Titipat Achakulvisut
Comments
  • Would you mind sending your test example file?

    Would you mind sending your test example file?

    Hi, I noticed that you used the CNN18_MindMatchData.xlsx as test example in cnn_mind_matching.py. I have searched this file by google, while unfortunately nothing. I don't know whether it is private or not. So would you mind if you send this file to my email [email protected] ? Thank you.

  • Bump protobuf from 3.0.0b4 to 3.15.0

    Bump protobuf from 3.0.0b4 to 3.15.0

    Bumps protobuf from 3.0.0b4 to 3.15.0.

    Release notes

    Sourced from protobuf's releases.

    Protocol Buffers v3.15.0

    Protocol Compiler

    • Optional fields for proto3 are enabled by default, and no longer require the --experimental_allow_proto3_optional flag.

    C++

    • MessageDifferencer: fixed bug when using custom ignore with multiple unknown fields
    • Use init_seg in MSVC to push initialization to an earlier phase.
    • Runtime no longer triggers -Wsign-compare warnings.
    • Fixed -Wtautological-constant-out-of-range-compare warning.
    • DynamicCastToGenerated works for nullptr input for even if RTTI is disabled
    • Arena is refactored and optimized.
    • Clarified/specified that the exact value of Arena::SpaceAllocated() is an implementation detail users must not rely on. It should not be used in unit tests.
    • Change the signature of Any::PackFrom() to return false on error.
    • Add fast reflection getter API for strings.
    • Constant initialize the global message instances
    • Avoid potential for missed wakeup in UnknownFieldSet
    • Now Proto3 Oneof fields have "has" methods for checking their presence in C++.
    • Bugfix for NVCC
    • Return early in _InternalSerialize for empty maps.
    • Adding functionality for outputting map key values in proto path logging output (does not affect comparison logic) and stop printing 'value' in the path. The modified print functionality is in the MessageDifferencer::StreamReporter.
    • Fixed protocolbuffers/protobuf#8129
    • Ensure that null char symbol, package and file names do not result in a crash.
    • Constant initialize the global message instances
    • Pretty print 'max' instead of numeric values in reserved ranges.
    • Removed remaining instances of std::is_pod, which is deprecated in C++20.
    • Changes to reduce code size for unknown field handling by making uncommon cases out of line.
    • Fix std::is_pod deprecated in C++20 (#7180)
    • Fix some -Wunused-parameter warnings (#8053)
    • Fix detecting file as directory on zOS issue #8051 (#8052)
    • Don't include sys/param.h for _BYTE_ORDER (#8106)
    • remove CMAKE_THREAD_LIBS_INIT from pkgconfig CFLAGS (#8154)
    • Fix TextFormatMapTest.DynamicMessage issue#5136 (#8159)
    • Fix for compiler warning issue#8145 (#8160)
    • fix: support deprecated enums for GCC < 6 (#8164)
    • Fix some warning when compiling with Visual Studio 2019 on x64 target (#8125)

    Python

    • Provided an override for the reverse() method that will reverse the internal collection directly instead of using the other methods of the BaseContainer.
    • MessageFactory.CreateProtoype can be overridden to customize class creation.

    ... (truncated)

    Commits
    • ae50d9b Update protobuf version
    • 8260126 Update protobuf version
    • c741c46 Resovled issue in the .pb.cc files
    • eef2764 Resolved an issue where NO_DESTROY and CONSTINIT were in incorrect order
    • 0040102 Updated collect_all_artifacts.sh for Ubuntu Xenial
    • 26cb6a7 Delete root-owned files in Kokoro builds
    • 1e924ef Update port_def.inc
    • 9a80cf1 Update coded_stream.h
    • a97c4f4 Merge pull request #8276 from haberman/php-warning
    • 44cd75d Merge pull request #8282 from haberman/changelog
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Matching algorithm for other purposes

    Matching algorithm for other purposes

    Hi,

    Thanks so much for making this awesome tool.

    I have two questions:

    1. Can you use the Neuromatch algorithm more generally, outside of its original intent of paper-reviewer-matcher? I would be interested in doing a matching of users to a database of options in a different context. Is that possible and what would be the best way to get started?

    2. What's the licensing like? Can the algorithm be used in a commercial setting?

    Thanks a lot!

  • Time estimate to run.

    Time estimate to run.

    Hi, first thanks to open source the paper! I recently got interested in "matching algorithms" and run into this interesting paper. So I try to replicate the paper and run the sample as suggested. But it takes really long to run on my local mac(> 2.5 Hrs). You had referenced that "Here, we have around 500 users and recommended to trim around 50. This takes 1 hours to run." What's your benchmark for hardware for running this experiment?

    Thanks!

  • Similar code to create groups

    Similar code to create groups

    Hi! This is more of a question/request. Do you have code to mind-match participants into groups? Meaning, your current output csv would be something like:

    user_id,match_ids 1,93;217;463;645;783;1101 [...] 93, 1;217;463;645;783;1101 # has the same ids as before

  • Output of mindmatch.py

    Output of mindmatch.py

    Hello! I was just trying out mindmatch.py and there is something I don’t get from the output. If we take for example the provided data/output_match.csv :

    user_id,match_ids 1,93;217;463;645;783;1101 […] 217,97;304;731;792;1010;1134

    Shouldn't user_id 217 also have 1 in its match_ids?

    I also tried the function on some data of mine and I get the same situation.

  • Explanation of matching

    Explanation of matching

    Hi,

    I'm not familiar with topic modeling, so this question might be naive.

    I'm curious whether we can get explanation for each matching pair, e.g. why is Review A assigned Paper Z?

  • Add README and command line script for Mind Matching

    Add README and command line script for Mind Matching

    I'm thinking to make the script something as follows:

    python mind_match.py --input example.csv --n_match 6
    

    and example.csv will have the following column: user_id, fullname, title, abstract, conflicts. We will return a CSV file with the columns user_id, match_ids.

A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
A pytorch implementation of the ACL2019 paper

RE2 This is a pytorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features". The original Tensorflo

Jul 25, 2022
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Aug 2, 2022
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Feb 11, 2021
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Feb 17, 2021
🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk <[email protected]> and Michael

Jul 29, 2022
🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk <[email protected]> and Michael

Feb 12, 2021
Python package for performing Entity and Text Matching using Deep Learning.
Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

Jul 24, 2022
Facilitating the design, comparison and sharing of deep text matching models.
Facilitating the design, comparison and sharing of deep text matching models.

MatchZoo Facilitating the design, comparison and sharing of deep text matching models. MatchZoo 是一个通用的文本匹配工具包,它旨在方便大家快速的实现、比较、以及分享最新的深度文本匹配模型。 ?? News

Aug 3, 2022
🎐 a python library for doing approximate and phonetic matching of strings.

jellyfish Jellyfish is a python library for doing approximate and phonetic matching of strings. Written by James Turk <[email protected]> and Michael

Feb 17, 2021
Python package for performing Entity and Text Matching using Deep Learning.
Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

Feb 9, 2021
Facilitating the design, comparison and sharing of deep text matching models.
Facilitating the design, comparison and sharing of deep text matching models.

MatchZoo Facilitating the design, comparison and sharing of deep text matching models. MatchZoo 是一个通用的文本匹配工具包,它旨在方便大家快速的实现、比较、以及分享最新的深度文本匹配模型。 ?? News

Feb 18, 2021
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Aug 5, 2022
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Feb 12, 2021
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Feb 17, 2021
Code release for "COTR: Correspondence Transformer for Matching Across Images"
Code release for

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

Aug 1, 2022
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Jul 27, 2022
Pattern Matching in Python
Pattern Matching in Python

Pattern Matching finalmente chega no Python 3.10. E daí? "Pattern matching", ou "correspondência de padrões" como é conhecido no Brasil. Algumas pesso

Feb 16, 2022
Share constant definitions between programming languages and make your constants constant again

Introduction Reconstant lets you share constant and enum definitions between programming languages. Constants are defined in a yaml file and converted

Jun 22, 2022
Implementation of Natural Language Code Search in the project CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

CodeBERT-Implementation In this repo we have replicated the paper CodeBERT: A Pre-Trained Model for Programming and Natural Languages. We are interest

Jul 1, 2022