A fast, distributed, high-performance gradient boosting (GBT, GBDT, GBRT, GBM, or MART) framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.

Light Gradient Boosting Machine

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages:

  • Faster training speed and higher efficiency.
  • Lower memory usage.
  • Better accuracy.
  • Support of parallel, distributed, and GPU learning.
  • Capable of handling large-scale data.

For further details, please refer to Features.

Thanks to these advantages, LightGBM is widely used in winning solutions of machine learning competitions.

Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. Moreover, distributed learning experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

Get Started and Documentation

Our primary documentation is at https://lightgbm.readthedocs.io/ and is generated from this repository. If you are new to LightGBM, follow the installation instructions on that site.

Next you may want to read:

Documentation for contributors:

News

Please refer to the changelogs on the GitHub releases page.

Some old update logs are available on the Key Events page.

External (Unofficial) Repositories

FLAML (AutoML library for hyperparameter optimization): https://github.com/microsoft/FLAML

Optuna (hyperparameter optimization framework): https://github.com/optuna/optuna

Julia-package: https://github.com/IQVIA-ML/LightGBM.jl

JPMML (Java PMML converter): https://github.com/jpmml/jpmml-lightgbm

Treelite (model compiler for efficient deployment): https://github.com/dmlc/treelite

cuML Forest Inference Library (GPU-accelerated inference): https://github.com/rapidsai/cuml

daal4py (Intel CPU-accelerated inference): https://github.com/IntelPython/daal4py

m2cgen (model appliers for various languages): https://github.com/BayesWitnesses/m2cgen

leaves (Go model applier): https://github.com/dmitryikh/leaves

ONNXMLTools (ONNX converter): https://github.com/onnx/onnxmltools

SHAP (model output explainer): https://github.com/slundberg/shap

MMLSpark (LightGBM on Spark): https://github.com/Azure/mmlspark

Kubeflow Fairing (LightGBM on Kubernetes): https://github.com/kubeflow/fairing

Kubeflow Operator (LightGBM on Kubernetes): https://github.com/kubeflow/xgboost-operator

ML.NET (.NET/C#-package): https://github.com/dotnet/machinelearning

LightGBM.NET (.NET/C#-package): https://github.com/rca22/LightGBM.Net

Ruby gem: https://github.com/ankane/lightgbm

LightGBM4j (Java high-level binding): https://github.com/metarank/lightgbm4j

lightgbm-rs (Rust binding): https://github.com/vaaaaanquish/lightgbm-rs

MLflow (experiment tracking, model monitoring framework): https://github.com/mlflow/mlflow

{treesnip} (R {parsnip}-compliant interface): https://github.com/curso-r/treesnip

{mlr3learners.lightgbm} (R {mlr3}-compliant interface): https://github.com/mlr3learners/mlr3learners.lightgbm

Support

How to Contribute

Check CONTRIBUTING page.

Microsoft Open Source Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Reference Papers

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 3149-3157.

Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 1279-1287.

Huan Zhang, Si Si and Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML Conference, 2018.

Note: If you use LightGBM in your GitHub projects, please add lightgbm in the requirements.txt.

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.

Comments
  • Add support for CUDA-based GPU build

    This is the initial CUDA work. It should work similarly to the GPU/OCL work.

    To compile, use 'USE_CUDA=1'. Python unit tests should include 'device': 'cuda' where needed.
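Assuming a library compiled with USE_CUDA=1, a test's parameter dictionary would select the new device like this. This is a sketch: everything except the 'device' key is hypothetical, and actually training with it requires a CUDA-enabled build.

```python
# Sketch: the only key specific to this PR is 'device': 'cuda'; the other
# parameters are hypothetical. Training with this dict requires a library
# compiled with USE_CUDA=1.
params = {
    "objective": "binary",
    "num_leaves": 31,
    "device": "cuda",  # select the CUDA implementation added by this PR
}
```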

    All unit tests pass for CPU, GPU/OCL, and CUDA. CPU and CUDA were tested on ppc64le; GPU/OCL was tested on x86_64.

  • Fix model locale issue and improve model R/W performance.

    When Java is used, the default C++ locale is broken. This is true for Java providers that use the C API, and even for Python models that require JEP.

    This patch solves that issue by making the model reads/writes insensitive to such settings. To achieve it, within the model read/write codebase:

    • C++ streams are imbued with the classic locale
    • Calls to functions that are dependent on the locale are replaced
    • The default locale is not changed!

    This approach means:

    • The user's locale is never tampered with, avoiding issues such as https://github.com/microsoft/LightGBM/issues/2979 with the previous approach https://github.com/microsoft/LightGBM/pull/2891
    • Datasets can still be read according to the user's locale
    • The model file has a single format independent of locale
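The invariant being enforced can be illustrated outside C++: numeric text in the model file must use '.' as the decimal separator and round-trip exactly, regardless of the process locale. Python's repr/float pair has that property (an illustration only, not LightGBM's actual code):

```python
# Illustration of locale-independent numeric round-tripping (not LightGBM code).
x = 0.123456789012345
s = repr(x)             # always uses '.', never a locale-dependent ','
assert "." in s
assert float(s) == x    # exact round-trip
```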

    Changes:

    • Performance improvements: More than 35% faster model read/writes. Use fast libraries for locale-agnostic conversion:
      • value->string: https://github.com/fmtlib/fmt
      • string->double: https://github.com/lemire/fast_double_parser (10x faster double parsing according to their benchmark)
    • Add CommonC namespace which provides faster locale-independent versions of Common's methods as if using the "C" locale
    • Model code makes conversions through CommonC
    • Cleanup unused Common methods

    Bugfixes:

    • https://github.com/microsoft/LightGBM/issues/3267
    • https://github.com/microsoft/LightGBM/issues/2500
    • https://github.com/microsoft/LightGBM/issues/2890
    • https://github.com/ninia/jep/issues/205 (as it is related to LGBM as well)
  • [R-package] miscellaneous changes to comply with CRAN requirements

    This PR attempts to address the most recent request for changes from CRAN, https://github.com/microsoft/LightGBM/issues/3293#issuecomment-682001047, on our way to #629

    See changes to cran-comments.md for details.

  • [python] [setup] improving installation

    This PR makes installation of the Python package clearer. Should fix #874. It also adds the possibility to install on Windows without CMake, from the existing .sln file with Visual Studio.

    Am I right that the existing solution file has no configuration with GPU support?

  • [python] Bug fix for first_metric_only in early stopping.

    1. The order of the metric list is not fixed, even if it is defined by a list, so we explicitly indicate which metric is used for early stopping.
    2. With the eval_train_metric feature enabled, the first metric becomes the training score of the metrics, which is not appropriate for early stopping. So until the specified validation metric appears, the early-stopping check is skipped in the loop.

    (This PR is the latest version of #2127. Once this PR is merged, #2127 should be closed.)

  • [R-Package] CRAN issues

    Environment info

    Operating System: Windows 8.1 Pro
    CPU: i7-4600U
    R version: 3.4

    To make a release on CRAN, we first need to fix all the errors / warnings / notes. I am currently testing on Windows, but we will also need to test on Linux. If some of them cannot be fixed, we will need an explanation for each one that will not be fixed by us. @guolinke

    Maybe it is time to add some vignettes, @coforfe, if you want to work on them.

    • [x] Fix CRAN errors
    • [x] Fix CRAN warnings
    • [x] Fix CRAN notes
    • [x] Add vignettes
    • [x] Make examples runnable for tests
    • [x] Switch to MinGW by default on Windows (it falls back to MinGW if VS cannot be found)
    • [x] Clean install every time when not using precompiled dll/lib
    • [x] Pass CRAN checks on Windows
    • [x] Pass CRAN checks on Linux
    • [x] Make CRAN-valid portable installation configuration (#2960)
    • [x] Submit to CRAN
    • [x] Get accepted on CRAN

    00install.out:

    * installing *source* package 'lightgbm' ...
    ** libs
    installing via 'install.libs.R' to C:/tmp/RtmpG2NQSv/lightgbm.Rcheck/lightgbm
    Warning in file.copy("./../../include", "./", overwrite = TRUE, recursive = TRUE) :
      problem copying .\..\..\include to .\include: No such file or directory
    Error in eval(ei, envir) : Cannot find folder LightGBM/include
    * removing 'C:/tmp/RtmpG2NQSv/lightgbm.Rcheck/lightgbm'
    

    Windows (fake) CRAN log:

    R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
    Copyright (C) 2017 The R Foundation for Statistical Computing
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    
    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.
    
    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.
    
    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
    
    > library(devtools)
    > library(roxygen2) # devtools::install_github("klutometis/roxygen")
    > setwd("C:/LightGBM/R-package")
    > devtools::check()
    Updating lightgbm documentation
    Loading lightgbm
    Re-compiling lightgbm
    "C:/PROGRA~1/R/R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL "C:\LightGBM\R-package"  \
      --library="C:\tmp\RtmpG2NQSv\devtools_install_19fa019043416" --no-R --no-data --no-help --no-demo --no-inst --no-docs --no-exec --no-multiarch  \
      --no-test-load --preclean 
    
    * installing *source* package 'lightgbm' ...
    ** libs
    installing via 'install.libs.R' to C:/tmp/RtmpG2NQSv/devtools_install_19fa019043416/lightgbm
    -- Building for: Visual Studio 15 2017
    -- The C compiler identification is MSVC 19.11.25303.0
    -- The CXX compiler identification is MSVC 19.11.25303.0
    -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/Preview/Community/VC/Tools/MSVC/14.11.25301/bin/HostX86/x64/cl.exe
    -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/Preview/Community/VC/Tools/MSVC/14.11.25301/bin/HostX86/x64/cl.exe -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/Preview/Community/VC/Tools/MSVC/14.11.25301/bin/HostX86/x64/cl.exe
    -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/Preview/Community/VC/Tools/MSVC/14.11.25301/bin/HostX86/x64/cl.exe -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Try OpenMP C flag = [/openmp]
    -- Performing Test OpenMP_FLAG_DETECTED
    -- Performing Test OpenMP_FLAG_DETECTED - Success
    -- Try OpenMP CXX flag = [/openmp]
    -- Performing Test OpenMP_FLAG_DETECTED
    -- Performing Test OpenMP_FLAG_DETECTED - Success
    -- Found OpenMP: /openmp  
    -- Configuring done
    -- Generating done
    -- Build files have been written to: C:/LightGBM/R-package/src/build
    Microsoft (R) Build Engine version 15.3.118.39484
    Copyright (C) Microsoft Corporation. All rights reserved.
    
    Build started 16/06/2017 07:00:42 PM.
    Project "C:\LightGBM\R-package\src\build\_lightgbm.vcxproj" on node 1 (default targets).
    Project "C:\LightGBM\R-package\src\build\_lightgbm.vcxproj" (1) is building "C:\LightGBM\R-package\src\build\ZERO_CHECK.vcxproj" (2) on node 1 (default targets).
    PrepareForBuild:
      Creating directory "x64\Release\ZERO_CHECK\".
      Creating directory "x64\Release\ZERO_CHECK\ZERO_CHECK.tlog\".
    InitializeBuildStatus:
      Creating "x64\Release\ZERO_CHECK\ZERO_CHECK.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
    CustomBuild:
      Checking Build System
      CMake does not need to re-run because C:/LightGBM/R-package/src/build/CMakeFiles/generate.stamp is up-to-date.
    FinalizeBuildStatus:
      Deleting file "x64\Release\ZERO_CHECK\ZERO_CHECK.tlog\unsuccessfulbuild".
      Touching "x64\Release\ZERO_CHECK\ZERO_CHECK.tlog\ZERO_CHECK.lastbuildstate".
    Done Building Project "C:\LightGBM\R-package\src\build\ZERO_CHECK.vcxproj" (default targets).
    PrepareForBuild:
      Creating directory "_lightgbm.dir\Release\".
      Creating directory "C:\LightGBM\R-package\src\Release\".
      Creating directory "_lightgbm.dir\Release\_lightgbm.tlog\".
    InitializeBuildStatus:
      Creating "_lightgbm.dir\Release\_lightgbm.tlog\unsuccessfulbuild" because "AlwaysCreate" was specified.
    CustomBuild:
      Building Custom Rule C:/LightGBM/R-package/src/CMakeLists.txt
      CMake does not need to re-run because C:/LightGBM/R-package/src/build/CMakeFiles/generate.stamp is up-to-date.
    ClCompile:
      C:\Program Files (x86)\Microsoft Visual Studio\Preview\Community\VC\Tools\MSVC\14.11.25301\bin\HostX86\x64\CL.exe /c /I"C:\LightGBM\R-package\src\include" /nologo /W4 /WX- /diagnostics:classic /O2 /Ob2 /Oi /Ot /Oy /GL /D WIN32 /D _WINDOWS /D NDEBUG /D USE_SOCKET /D "CMAKE_INTDIR=\"Release\"" /D _lightgbm_EXPORTS /D _WINDLL /D _MBCS /UNDEBUG /Gm- /EHsc /MD /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /openmp /Fo"_lightgbm.dir\Release\\" /Fd"_lightgbm.dir\Release\vc141.pdb" /Gd /TP /errorReport:queue   -pedantic -g -O0 "C:\LightGBM\R-package\src\src\c_api.cpp" "C:\LightGBM\R-package\src\src\lightgbm_R.cpp" "C:\LightGBM\R-package\src\src\application\application.cpp" "C:\LightGBM\R-package\src\src\boosting\boosting.cpp" "C:\LightGBM\R-package\src\src\boosting\gbdt.cpp" "C:\LightGBM\R-package\src\src\boosting\gbdt_prediction.cpp" "C:\LightGBM\R-package\src\src\boosting\prediction_early_stop.cpp" "C:\LightGBM\R-package\src\src\io\bin.cpp" "C:\LightGBM\R-package\src\src\io\config.cpp" "C:\LightGBM\R-package\src\src\io\dataset.cpp" "C:\LightGBM\R-package\src\src\io\dataset_loader.cpp" "C:\LightGBM\R-package\src\src\io\metadata.cpp" "C:\LightGBM\R-package\src\src\io\parser.cpp" "C:\LightGBM\R-package\src\src\io\tree.cpp" "C:\LightGBM\R-package\src\src\metric\dcg_calculator.cpp" "C:\LightGBM\R-package\src\src\metric\metric.cpp" "C:\LightGBM\R-package\src\src\objective\objective_function.cpp" "C:\LightGBM\R-package\src\src\network\linker_topo.cpp" "C:\LightGBM\R-package\src\src\network\linkers_mpi.cpp" "C:\LightGBM\R-package\src\src\network\linkers_socket.cpp" "C:\LightGBM\R-package\src\src\network\network.cpp" "C:\LightGBM\R-package\src\src\treelearner\data_parallel_tree_learner.cpp" "C:\LightGBM\R-package\src\src\treelearner\feature_parallel_tree_learner.cpp" "C:\LightGBM\R-package\src\src\treelearner\gpu_tree_learner.cpp" "C:\LightGBM\R-package\src\src\treelearner\serial_tree_learner.cpp" "C:\LightGBM\R-package\src\src\treelearner\tree_learner.cpp" 
"C:\LightGBM\R-package\src\src\treelearner\voting_parallel_tree_learner.cpp"
    cl : Command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
      c_api.cpp
    cl : Command line warning D9002: ignoring unknown option '-pedantic' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
    cl : Command line warning D9002: ignoring unknown option '-g' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
    cl : Command line warning D9002: ignoring unknown option '-O0' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
      lightgbm_R.cpp
      application.cpp
      boosting.cpp
      gbdt.cpp
      gbdt_prediction.cpp
      prediction_early_stop.cpp
      bin.cpp
      config.cpp
      dataset.cpp
      dataset_loader.cpp
      metadata.cpp
      parser.cpp
      tree.cpp
      dcg_calculator.cpp
      metric.cpp
      objective_function.cpp
      linker_topo.cpp
      linkers_mpi.cpp
      linkers_socket.cpp
      Compiling...
      network.cpp
      data_parallel_tree_learner.cpp
      feature_parallel_tree_learner.cpp
      gpu_tree_learner.cpp
      serial_tree_learner.cpp
      tree_learner.cpp
      voting_parallel_tree_learner.cpp
    Link:
      C:\Program Files (x86)\Microsoft Visual Studio\Preview\Community\VC\Tools\MSVC\14.11.25301\bin\HostX86\x64\link.exe /ERRORREPORT:QUEUE /OUT:"C:\LightGBM\R-package\src\Release\lib_lightgbm.dll" /INCREMENTAL:NO /NOLOGO kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /PDB:"C:/LightGBM/R-package/src/Release/lib_lightgbm.pdb" /SUBSYSTEM:CONSOLE /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:/LightGBM/R-package/src/Release/lib_lightgbm.lib" /MACHINE:X64   /machine:x64 /DLL _lightgbm.dir\Release\c_api.obj
      _lightgbm.dir\Release\lightgbm_R.obj
      _lightgbm.dir\Release\application.obj
      _lightgbm.dir\Release\boosting.obj
      _lightgbm.dir\Release\gbdt.obj
      _lightgbm.dir\Release\gbdt_prediction.obj
      _lightgbm.dir\Release\prediction_early_stop.obj
      _lightgbm.dir\Release\bin.obj
      _lightgbm.dir\Release\config.obj
      _lightgbm.dir\Release\dataset.obj
      _lightgbm.dir\Release\dataset_loader.obj
      _lightgbm.dir\Release\metadata.obj
      _lightgbm.dir\Release\parser.obj
      _lightgbm.dir\Release\tree.obj
      _lightgbm.dir\Release\dcg_calculator.obj
      _lightgbm.dir\Release\metric.obj
      _lightgbm.dir\Release\objective_function.obj
      _lightgbm.dir\Release\linker_topo.obj
      _lightgbm.dir\Release\linkers_mpi.obj
      _lightgbm.dir\Release\linkers_socket.obj
      _lightgbm.dir\Release\network.obj
      _lightgbm.dir\Release\data_parallel_tree_learner.obj
      _lightgbm.dir\Release\feature_parallel_tree_learner.obj
      _lightgbm.dir\Release\gpu_tree_learner.obj
      _lightgbm.dir\Release\serial_tree_learner.obj
      _lightgbm.dir\Release\tree_learner.obj
      _lightgbm.dir\Release\voting_parallel_tree_learner.obj
      c_api.obj : MSIL .netmodule or module compiled with /GL found; restarting link with /LTCG; add /LTCG to the link command line to improve linker performance
         Creating library C:/LightGBM/R-package/src/Release/lib_lightgbm.lib and object C:/LightGBM/R-package/src/Release/lib_lightgbm.exp
      Generating code
      Finished generating code
      _lightgbm.vcxproj -> C:\LightGBM\R-package\src\Release\lib_lightgbm.dll
    FinalizeBuildStatus:
      Deleting file "_lightgbm.dir\Release\_lightgbm.tlog\unsuccessfulbuild".
      Touching "_lightgbm.dir\Release\_lightgbm.tlog\_lightgbm.lastbuildstate".
    Done Building Project "C:\LightGBM\R-package\src\build\_lightgbm.vcxproj" (default targets).
    
    Build succeeded.
    
    "C:\LightGBM\R-package\src\build\_lightgbm.vcxproj" (default target) (1) ->
    (ClCompile target) -> 
      cl : Command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
      cl : Command line warning D9002: ignoring unknown option '-pedantic' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
      cl : Command line warning D9002: ignoring unknown option '-g' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
      cl : Command line warning D9002: ignoring unknown option '-O0' [C:\LightGBM\R-package\src\build\_lightgbm.vcxproj]
    
        4 Warning(s)
        0 Error(s)
    
    Time Elapsed 00:00:44.73
    Found library file: C:/LightGBM/R-package/src/Release/lib_lightgbm.dll to move to C:/tmp/RtmpG2NQSv/devtools_install_19fa019043416/lightgbm/libs/x64* DONE (lightgbm)
    Loading required package: R6
    Writing NAMESPACE
    Writing predict.lgb.Booster.Rd
    Setting env vars -------------------------------------------------------------------------------------------------------------------------------------------
    CFLAGS  : -Wall -pedantic
    CXXFLAGS: -Wall -pedantic
    Building lightgbm ------------------------------------------------------------------------------------------------------------------------------------------
    "C:/PROGRA~1/R/R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD build "C:\LightGBM\R-package" --no-resave-data  \
      --no-manual 
    
    * checking for file 'C:\LightGBM\R-package/DESCRIPTION' ... OK
    * preparing 'lightgbm':
    * checking DESCRIPTION meta-information ... OK
    * cleaning src
    * checking for LF line-endings in source and make files
    * checking for empty or unneeded directories
    Removed empty directory 'lightgbm/src/build/CMakeFiles/3.8.0/CompilerIdC/tmp'
    Removed empty directory 'lightgbm/src/build/CMakeFiles/3.8.0/CompilerIdCXX/tmp'
    Removed empty directory 'lightgbm/src/build/CMakeFiles/CMakeTmp'
    * looking to see if a 'data/datalist' file should be added
    * building 'lightgbm_0.2.tar.gz'
    Warning in utils::tar(filepath, pkgname, compression = "gzip", compression_level = 9L,  :
      storing paths of more than 100 bytes is not portable:
      'lightgbm/src/build/CMakeFiles/3.8.0/CompilerIdCXX/Debug/CompilerIdCXX.tlog/CompilerIdCXX.lastbuildstate'
    
    Setting env vars -------------------------------------------------------------------------------------------------------------------------------------------
    _R_CHECK_CRAN_INCOMING_ : FALSE
    _R_CHECK_FORCE_SUGGESTS_: FALSE
    Checking lightgbm ------------------------------------------------------------------------------------------------------------------------------------------
    "C:/PROGRA~1/R/R-34~1.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD check "C:\tmp\RtmpG2NQSv/lightgbm_0.2.tar.gz" --as-cran  \
      --timings --no-manual 
    
    * using log directory 'C:/tmp/RtmpG2NQSv/lightgbm.Rcheck'
    * using R version 3.4.0 (2017-04-21)
    * using platform: x86_64-w64-mingw32 (64-bit)
    * using session charset: ISO8859-1
    * using options '--no-manual --as-cran'
    * checking for file 'lightgbm/DESCRIPTION' ... OK
    * checking extension type ... Package
    * this is package 'lightgbm' version '0.2'
    * checking package namespace information ... OK
    * checking package dependencies ... OK
    * checking if this is a source package ... WARNING
    Subdirectory 'src' contains:
      CMakeLists.txt
    These are unlikely file names for src files.
    
    In addition to the above warning(s), found the following notes:
    
    Found the following apparent object files/libraries:
      src/Release/lib_lightgbm.dll
      src/build/CMakeFiles/3.8.0/CompilerIdC/Debug/CMakeCCompilerId.obj
      src/build/CMakeFiles/3.8.0/CompilerIdCXX/Debug/CMakeCXXCompilerId.obj
      src/build/_lightgbm.dir/Release/application.obj
      src/build/_lightgbm.dir/Release/bin.obj
      src/build/_lightgbm.dir/Release/boosting.obj
      src/build/_lightgbm.dir/Release/c_api.obj
      src/build/_lightgbm.dir/Release/config.obj
      src/build/_lightgbm.dir/Release/data_parallel_tree_learner.obj
      src/build/_lightgbm.dir/Release/dataset.obj
      src/build/_lightgbm.dir/Release/dataset_loader.obj
      src/build/_lightgbm.dir/Release/dcg_calculator.obj
      src/build/_lightgbm.dir/Release/feature_parallel_tree_learner.obj
      src/build/_lightgbm.dir/Release/gbdt.obj
      src/build/_lightgbm.dir/Release/gbdt_prediction.obj
      src/build/_lightgbm.dir/Release/gpu_tree_learner.obj
      src/build/_lightgbm.dir/Release/lightgbm_R.obj
      src/build/_lightgbm.dir/Release/linker_topo.obj
      src/build/_lightgbm.dir/Release/linkers_mpi.obj
      src/build/_lightgbm.dir/Release/linkers_socket.obj
      src/build/_lightgbm.dir/Release/metadata.obj
      src/build/_lightgbm.dir/Release/metric.obj
      src/build/_lightgbm.dir/Release/network.obj
      src/build/_lightgbm.dir/Release/objective_function.obj
      src/build/_lightgbm.dir/Release/parser.obj
      src/build/_lightgbm.dir/Release/prediction_early_stop.obj
      src/build/_lightgbm.dir/Release/serial_tree_learner.obj
      src/build/_lightgbm.dir/Release/tree.obj
      src/build/_lightgbm.dir/Release/tree_learner.obj
      src/build/_lightgbm.dir/Release/voting_parallel_tree_learner.obj
    Object files/libraries should not be included in a source package.
    * checking if there is a namespace ... OK
    * checking for executable files ... WARNING
    Found the following executable files:
      src/Release/lib_lightgbm.dll
      src/build/CMakeFiles/3.8.0/CMakeDetermineCompilerABI_C.bin
      src/build/CMakeFiles/3.8.0/CMakeDetermineCompilerABI_CXX.bin
      src/build/CMakeFiles/3.8.0/CompilerIdC/CompilerIdC.exe
      src/build/CMakeFiles/3.8.0/CompilerIdCXX/CompilerIdCXX.exe
      src/build/CMakeFiles/FindOpenMP/ompver_C.bin
      src/build/CMakeFiles/FindOpenMP/ompver_CXX.bin
      src/build/CMakeFiles/feature_tests.bin
    Source packages should not contain undeclared executable files.
    See section 'Package structure' in the 'Writing R Extensions' manual.
    * checking for hidden files and directories ... OK
    * checking for portable file names ... NOTE
    Found the following non-portable file path:
      lightgbm/src/build/CMakeFiles/3.8.0/CompilerIdCXX/Debug/CompilerIdCXX.tlog/CompilerIdCXX.lastbuildstate
    
    Tarballs are only required to store paths of up to 100 bytes and cannot
    store those of more than 256 bytes, with restrictions including to 100
    bytes for the final component.
    See section 'Package structure' in the 'Writing R Extensions' manual.
    * checking whether package 'lightgbm' can be installed ...Warning: running command '"C:/PROGRA~1/R/R-34~1.0/bin/x64/Rcmd.exe" INSTALL -l "C:/tmp/RtmpG2NQSv/lightgbm.Rcheck" --no-html --no-multiarch "C:\tmp\RTMPG2~1\LIGHTG~1.RCH\00_PKG~1\lightgbm"' had status 1
     ERROR
    Installation failed.
    See 'C:/tmp/RtmpG2NQSv/lightgbm.Rcheck/00install.out' for details.
    * DONE
    
    Status: 1 ERROR, 2 WARNINGs, 1 NOTE
    See
      'C:/tmp/RtmpG2NQSv/lightgbm.Rcheck/00check.log'
    for details.
    
    R CMD check results
    1 error  | 2 warnings | 1 note 
    checking whether package 'lightgbm' can be installed ... ERROR
    Installation failed.
    See 'C:/tmp/RtmpG2NQSv/lightgbm.Rcheck/00install.out' for details.
    
    checking if this is a source package ... WARNING
    Subdirectory 'src' contains:
      CMakeLists.txt
    These are unlikely file names for src files.
    
    In addition to the above warning(s), found the following notes:
    
    Found the following apparent object files/libraries:
      src/Release/lib_lightgbm.dll
      src/build/CMakeFiles/3.8.0/CompilerIdC/Debug/CMakeCCompilerId.obj
    ... 19 lines ...
      src/build/_lightgbm.dir/Release/metadata.obj
      src/build/_lightgbm.dir/Release/metric.obj
      src/build/_lightgbm.dir/Release/network.obj
      src/build/_lightgbm.dir/Release/objective_function.obj
      src/build/_lightgbm.dir/Release/parser.obj
      src/build/_lightgbm.dir/Release/prediction_early_stop.obj
      src/build/_lightgbm.dir/Release/serial_tree_learner.obj
      src/build/_lightgbm.dir/Release/tree.obj
      src/build/_lightgbm.dir/Release/tree_learner.obj
      src/build/_lightgbm.dir/Release/voting_parallel_tree_learner.obj
    Object files/libraries should not be included in a source package.
    
    checking for executable files ... WARNING
    Found the following executable files:
      src/Release/lib_lightgbm.dll
      src/build/CMakeFiles/3.8.0/CMakeDetermineCompilerABI_C.bin
      src/build/CMakeFiles/3.8.0/CMakeDetermineCompilerABI_CXX.bin
      src/build/CMakeFiles/3.8.0/CompilerIdC/CompilerIdC.exe
      src/build/CMakeFiles/3.8.0/CompilerIdCXX/CompilerIdCXX.exe
      src/build/CMakeFiles/FindOpenMP/ompver_C.bin
      src/build/CMakeFiles/FindOpenMP/ompver_CXX.bin
      src/build/CMakeFiles/feature_tests.bin
    Source packages should not contain undeclared executable files.
    See section 'Package structure' in the 'Writing R Extensions' manual.
    
    checking for portable file names ... NOTE
    Found the following non-portable file path:
      lightgbm/src/build/CMakeFiles/3.8.0/CompilerIdCXX/Debug/CompilerIdCXX.tlog/CompilerIdCXX.lastbuildstate
    
    Tarballs are only required to store paths of up to 100 bytes and cannot
    store those of more than 256 bytes, with restrictions including to 100
    bytes for the final component.
    See section 'Package structure' in the 'Writing R Extensions' manual.
    
  • Added LightGBM Java SWIG wrapper support for Windows

    Added LightGBM Java SWIG wrapper support for Windows. TODO: add macOS support (need to find a Mac to validate this on).

    To validate on Windows (note: you must have SWIG and the Java SDK installed, and the JAVA_HOME environment variable must be set):

    mkdir build
    cd build
    cmake -DCMAKE_GENERATOR_PLATFORM=x64 -DUSE_SWIG=ON ..
    cmake --build . --target ALL_BUILD --config Release

  • [R-package] make package installable with CRAN toolchain (fixes #2960)

    This pull request contains a proposal for the next step to get the LightGBM R package onto CRAN: building without CMake.

    See conversation in #629 for some background.

    Essentially, CRAN is very particular about how source packages with C++ code are built. It enforces many checks to ensure portability and will reject packages that require any of the following:

    • non-portable flags
    • non-standard / non-open-source build tools

    The R package does not currently comply with CRAN's preferred build toolchain. This PR fixes that 😀

    Overview

    As of this PR, LightGBM's R package gains a CRAN-compliant installation toolchain using autoconf. From "Writing R Extensions":

    If your package needs some system-dependent configuration before installation you can include an executable (Bourne) shell script configure in your package which (if present) is executed by R CMD INSTALL before any other action is performed. This can be a script created by the Autoconf mechanism, but may also be a script written by yourself... the full power of Autoconf is available for your extension package (including variable substitution, searching for libraries, etc.).

    The details of how this is used are explained in the proposed changes to R-package/README.md added to this PR.

    Notes for Reviewers

    • I have been testing this in PR on my fork for the last few months (most recent one: https://github.com/jameslamb/LightGBM/pull/31). Some CI jobs are still failing but I think this is close enough that it's ready for the review process to start.
    • Ignore the commented-out CI jobs for now. I've commented out some non-R CI jobs to minimize the burden that this PR puts on our CI while it is still in development.

    Thanks in advance for your time and thorough reviews!

  • v3.2.0 release

    v3.2.0 release

    This is a stable release. The next release will likely be 4.0.0, with many breaking changes, and it may take a long time to finish. So it is better to fix the critical bugs in 3.2.0.

    Please list the PRs that need to be merged in this release.

  • Dart - very poor accuracy

    Dart - very poor accuracy

    When I use dart as a booster I always get very poor performance in terms of the l2 result for regression tasks, even if I use a small drop_rate = 0.01 or a big one like 0.3.

    When I use dart in xgboost on the same dataset, with similar settings (same learning rate, similar num_trees), dart always gives me a boost in accuracy (small, but always).

    But here the accuracy is badly degraded, as if there is a bug, not just that dart is unsuitable for my task.

    Can anyone confirm that dart works for regression tasks in terms of better accuracy?

    My settings are as follows (part of the Python code for random search of params):

    lr = np.random.choice([0.01, 0.005, 0.0025])
    list_count = np.random.choice([250, 500, 750, 1000])
    min_in_leaf = np.random.choice([25, 50, 100])
    subF = np.random.choice([0.15, 0.22, 0.3, 0.5, 0.66, 0.75])
    subR = np.random.choice([0.66, 0.75, 0.83, 0.9])
    max_depth = np.random.choice([9, 11, 15, 25, 45, 100, -1])
    dart_rate = np.random.choice([0, 0, 0, 0.01, 0.03, 0.1])
    max_bin = np.random.choice([63, 127, 255, 511])
    lambda_l1 = np.random.choice([0, 1., 10., 100.])
    lambda_l2 = np.random.choice([0, 1., 10., 100.])

    iterace = 10000
    if only_testing:
        min_in_leaf = 25
        iterace = 10

    boost_type = 'gbdt'
    if dart_rate > 0:
        boost_type = 'dart'

    params = {
        'task': 'train',
        'boosting_type': boost_type,
        'objective': 'regression',
        'metric': 'l2',
        'max_depth': int(max_depth),
        'num_leaves': int(list_count),
        'min_data_in_leaf': int(min_in_leaf),
        'learning_rate': lr,
        'feature_fraction': subF,
        'bagging_fraction': subR,
        'bagging_freq': 1,
        'verbose': 0,
        'nthread': nthread,
        'drop_rate': dart_rate,
        'max_bin': max_bin,
        'lambda_l1': lambda_l1,
        'lambda_l2': lambda_l2,
    }

    model = lg.train(
        params,
        (matrix_learn, target_learn),
        num_boost_round=iterace,
        valid_datas=(matrix_test, target_test),
        early_stopping_rounds=50,
    )

  • [ci] remove Travis (fixes #3519)

    [ci] remove Travis (fixes #3519)

    ~This is a draft PR to move CI jobs from Travis to Azure DevOps.~

    This PR moves the remaining Mac + Linux jobs currently running on Travis to GitHub Actions. This project is ending its reliance on Travis based on Travis's strategic decision to offer only very limited support for open source projects. See #3519 for full background and discussion.

  • [ci] Appveyor builds failing

    [ci] Appveyor builds failing

    Description

    For the last 2 days, Appveyor CI jobs have been failing, blocking development on this project. I've seen these failures on several totally-unrelated PRs (e.g. #5503, #5504, #5505, #5506), so I'm fairly sure it's related to that job's configuration and not to breaking changes on the PRs themselves.

    Reproducible example

    See any recent appveyor jobs, for example this one from #5506: https://ci.appveyor.com/project/guolinke/lightgbm/builds/44872180/job/sxsvo19536lcd058.

    Additional Comments

    LightGBM compiles successfully, but the Python unit tests fail with several errors.

    C:\Miniconda3-x64\envs\test-env\lib_pypy\_functools.py:80: in __call__
        return self._func(*(self._args + fargs), **fkeywords)
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\sklearn\utils\estimator_checks.py:3489: in check_fit_idempotent
        estimator.fit(X_train, y_train)
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\sklearn.py:1011: in fit
        init_model=init_model
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\sklearn.py:806: in fit
        callbacks=callbacks
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\engine.py:223: in train
        booster = Booster(params=params, train_set=train_set)
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\basic.py:2775: in __init__
        train_set.construct()
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\basic.py:1929: in construct
        self.feature_name = self.get_feature_name()
    C:\Miniconda3-x64\envs\test-env\lib\site-packages\lightgbm\basic.py:2380: in get_feature_name
        ptr_string_buffers = (ctypes.c_char_p * num_feature)(*map(ctypes.addressof, string_buffers))
    C:\Miniconda3-x64\envs\test-env\lib_pypy\_ctypes\array.py:203: in __init__
        self[i] = arg
    C:\Miniconda3-x64\envs\test-env\lib_pypy\_ctypes\array.py:229: in __setitem__
        cobj = self._type_.from_param(value)
    C:\Miniconda3-x64\envs\test-env\lib_pypy\_ctypes\primitive.py:382: in from_param
        return super(SimpleType, self).from_param(value)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    self = <class 'ctypes.c_char_p'>, value = 547343581888
        def from_param(self, value):
            if isinstance(value, self):
                return value
            try:
                as_parameter = value._as_parameter_
            except AttributeError:
                raise TypeError("expected %s instance instead of %s" % (
    >               self.__name__, type(value).__name__))
    E           TypeError: expected c_char_p instance instead of int
    C:\Miniconda3-x64\envs\test-env\lib_pypy\_ctypes\basics.py:59: TypeError
    _____ test_sklearn_integration[LGBMRegressor()-check_fit_check_is_fitted] _____
    self = <class 'ctypes.c_char_p'>, value = 547343579440
        def from_param(self, value):
            if isinstance(value, self):
                return value
            try:
    >           as_parameter = value._as_parameter_
    E           AttributeError: 'int' object has no attribute '_as_parameter_'
    C:\Miniconda3-x64\envs\test-env\lib_pypy\_ctypes\basics.py:56: AttributeError
    

    Haven't investigated much yet, just opening this to track.
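    For reference, the failing pattern (passing raw integer addresses from `ctypes.addressof` into a `c_char_p` array, which CPython tolerates but PyPy rejects) can be written portably by casting each buffer instead. A sketch, not necessarily the fix LightGBM should adopt:

```python
import ctypes

# Build an array of C string pointers without raw integer addresses:
# cast each buffer to c_char_p, which both CPython and PyPy accept.
buffers = [ctypes.create_string_buffer(b"feature_%d" % i, 16) for i in range(4)]
ptr_array = (ctypes.c_char_p * len(buffers))()
for i, buf in enumerate(buffers):
    ptr_array[i] = ctypes.cast(buf, ctypes.c_char_p)

# Reading an element back yields the NUL-terminated bytes of that buffer.
print(ptr_array[0])
```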

  • replace `pandas.Series.iteritems` with `pandas.Series.items`

    replace `pandas.Series.iteritems` with `pandas.Series.items`
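    A minimal illustration of the swap (`Series.iteritems()` was deprecated in pandas 1.5 and removed in 2.0; `Series.items()` is the drop-in replacement yielding the same `(index, value)` pairs):

```python
import pandas as pd

# items() yields (index, value) pairs, exactly like the removed iteritems()
s = pd.Series([0.1, 0.2], index=["f0", "f1"])
pairs = list(s.items())
```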

  • [tests][dask] solve timeouts

    [tests][dask] solve timeouts

    Sometimes the dask tests get stuck on the _train_part function with the following call stack:

    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/threading.py", line 937, in _bootstrap
        self._bootstrap_inner()
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/threading.py", line 980, in _bootstrap_inner
        self.run()
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/threading.py", line 917, in run
        self._target(*self._args, **self._kwargs)
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/site-packages/distributed/threadpoolexecutor.py", line 57, in _worker
        task.run()
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/site-packages/distributed/_concurrent_futures_thread.py", line 65, in run
        result = self.fn(*self.args, **self.kwargs)
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/site-packages/distributed/worker.py", line 2882, in apply_function
        msg = apply_function_simple(function, args, kwargs, time_delay)
    File "/home/jose/mambaforge/envs/lgb-test/lib/python3.9/site-packages/distributed/worker.py", line 2904, in apply_function_simple
        result = function(*args, **kwargs)
    File "/hdd/github/LightGBM/python-package/lightgbm/dask.py", line 322, in _train_part
        model.fit(
    File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 1084, in fit
        super().fit(
    File "/hdd/github/LightGBM/python-package/lightgbm/sklearn.py", line 797, in fit
        self._Booster = train(
    File "/hdd/github/LightGBM/python-package/lightgbm/engine.py", line 223, in train
        booster = Booster(params=params, train_set=train_set)
    File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 2775, in __init__
        train_set.construct()
    File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1923, in construct
        self._lazy_init(self.data, label=self.label,
    File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1578, in _lazy_init
        self.__init_from_np2d(data, params_str, ref_dataset)
    File "/hdd/github/LightGBM/python-package/lightgbm/basic.py", line 1708, in __init_from_np2d
        _safe_call(_LIB.LGBM_DatasetCreateFromMat(
    

    I'm able to reproduce this locally by training 100 consecutive times.

    I haven't been able to reproduce this with the number of threads in the workers equal to 1.
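    As an aside, one low-overhead way to capture a stack like the one above from a stuck process is the stdlib faulthandler module. This sketch dumps every thread's stack to a file; in a real hang you would more likely register a signal handler (e.g. faulthandler.register(signal.SIGUSR1)) and signal the stuck worker:

```python
import faulthandler
import tempfile

# Dump all thread stacks without attaching a debugger. faulthandler writes
# directly to the file descriptor, so the target must be a real file.
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    dump = f.read()

print(dump)
```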

  • [R-package] [docs] use CRAN canonical form for package links

    [R-package] [docs] use CRAN canonical form for package links

    While checking the R package on win-builder, I saw the following R CMD check NOTE:

    Found the following (possibly) invalid URLs:
      URL: https://cran.r-project.org/web/packages/roxygen2/index.html
        From: README.md
        Status: 200
        Message: OK
        CRAN URL not in canonical form
      The canonical URL of the CRAN page for a package is 
        https://CRAN.R-project.org/package=pkgname
    

    This PR proposes accepting that suggestion, to reduce the risk of the next CRAN submission being rejected.
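    Purely as an illustration of the rewrite (the PR itself edits the Markdown links by hand), a small sketch of the canonicalization CRAN asks for:

```python
import re

# Rewrite a ".../web/packages/<pkg>/index.html" CRAN link into the
# canonical https://CRAN.R-project.org/package=<pkg> form.
CRAN_WEB = re.compile(
    r"https?://cran\.r-project\.org/web/packages/([^/]+)(?:/index\.html)?/?$",
    re.IGNORECASE,
)

def canonical_cran_url(url):
    match = CRAN_WEB.match(url)
    return f"https://CRAN.R-project.org/package={match.group(1)}" if match else url
```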

  • [ci] [R-package] use R 4.2.1 in Windows CI jobs (fixes #4881)

    [ci] [R-package] use R 4.2.1 in Windows CI jobs (fixes #4881)

    Fixes #4881. Replaces #5274.

    Changes in this PR

    • upgrade from R 4.1.3 to R 4.2.1 in Windows CI jobs
    • upgrade from Rtools40 to Rtools42 in Windows CI jobs
    • explicitly pass the version parsed from VERSION.txt into test_r_package_windows.ps1 (instead of relying on wildcards like lightgbm_*.tar.gz)
    • extend "skip one test when compiling with MSVC" fix from #5448 to all MSVC jobs

    Why did the check in configure.win need to change?

    Compiling the test program used to check for inet_pton with the compiler that ships with Rtools42 (gcc 10.3) results in the following error.

    conftest.cpp: In function 'int main()':
    conftest.cpp:3:13: error: invalid conversion from 'INT (*)(INT, LPCSTR, PVOID)' {aka 'int (*)(int, const char*, void*)'} to 'void*' [-fpermissive]
        3 |   void* p = inet_pton;
          |             ^~~~~~~~~
          |             |
          |             INT (*)(INT, LPCSTR, PVOID) {aka int (*)(int, const char*, void*)}
    

    To avoid that cast, this PR proposes just assigning the address of inet_pton to a function pointer for a function with exactly the same signature as the one LightGBM's code expects.

    Why does the use of "lightgbm_*.tar.gz" need to be replaced with "lightgbm_$env:LGB_VER.tar.gz"?

    I'm not sure...probably related to R or some tool shipping with RTools42 not working well with wildcards. While testing in #4881, I saw errors like the following:

    Error: no packages were specified

    Notes for Reviewers

    #4881 refers to creating a comment-triggered workflow for the UCRT toolchain for Windows builds, but I don't think that's necessary any more now that R 4.2.0 has actually been released and Rtools42 contains the UCRT build tools.

    I closed #4881 in favor of this because that PR was getting very busy with debugging commits and comments.

    References

    There is some documentation about Rtools42 adding its own compilers to PATH, but I think that just means to R's PATH (referenced by R CMD commands).

    e.g., see https://cran.r-project.org/bin/windows/base/howto-R-4.2.html

    To make the use of Rtools42 simpler, when R is installed via the binary installer it by default uses Rtools42 for the compilers and libraries. PATH will be set by R (inside front-ends like RGui and RTerm, but also R CMD) to include the build tools (e.g. make) and the compilers (e.g. gcc). In addition, R installed via the binary installer will automatically set R_TOOLS_SOFT (and LOCAL_SOFT for backwards compatibility) to the Rtools42 location for building R packages. This feature is only present in the installer builds of R, not when R is installed from source.

    and https://github.com/r-lib/actions/issues/574#issue-1265095474

    Some additional relevant information in the r-lib/actions project:

    • https://github.com/r-lib/actions/commit/7d98418c6b5cfca592b100e2f348fbe90c18bcf3
    • https://github.com/r-lib/actions/blob/8e8f7edba0e5bda9318f9377c0dc9c2f5d0fb786/setup-r/src/installer.ts#L431-L457

    How I tested this

    Given the changes to configure.win, I also tested the project on R Hub and win-builder.

    Generated the CRAN source distribution like this:

    sh build-cran-package.sh
    

    Then used that .tar.gz to test on win-builder and R Hub.

    passing on win-builder
    • R-release (logs)
    • R-devel (logs)
    • R-oldrelease (logs)

    See https://win-builder.r-project.org/. This project is run by some of the CRAN maintainers, and very closely matches the Windows builds on CRAN.

    There are functions in {devtools} to automate uploading to this service, e.g. devtools::check_win_release(). Unfortunately, those only take a directory to be packaged with R CMD build. That doesn't work with the custom steps build-cran-package.sh has to do to build vignettes:

    https://github.com/microsoft/LightGBM/blob/dc4794b62f2c09103e558b7a8325bed66083a5c3/build-cran-package.sh#L206

    So I manually changed the maintainer to myself in DESCRIPTION, built the package with build-cran-package.sh, then uploaded manually at https://win-builder.r-project.org/upload.aspx.

    passing on R Hub

    I tried checking the package built from this branch on all 4 platforms supported by R Hub.

    • windows-x86_64-devel (logs)
    • windows-x86_64-oldrel (logs)
    • windows-x86_64-patched (logs)
    • windows-x86_64-release (logs)
    EMAIL <- "****" # my personal email
    PACKAGE_TARBALL <- "lightgbm_3.3.2.99.tar.gz"
    
    result <- rhub::check(
        path = PACKAGE_TARBALL
        , email = EMAIL
        , check_args = c(
            "--as-cran"
        )
        , platforms = c(
            "windows-x86_64-devel"
            , "windows-x86_64-oldrel"
            , "windows-x86_64-patched"
            , "windows-x86_64-release"
        )
        , env_vars = c(
            "R_COMPILE_AND_INSTALL_PACKAGES" = "always"
            , "_R_CHECK_FORCE_SUGGESTS_" = "true"
            , "_R_CHECK_CRAN_INCOMING_USE_ASPELL_" = "true"
        )
    )
    
  • [R-package] CRAN error: r-devel-linux-x86_64-debian-clang

    [R-package] CRAN error: r-devel-linux-x86_64-debian-clang

    Description

    CRAN checks for {lightgbm} are showing an ERROR for the r-devel-linux-x86_64-debian-clang check flavor.


    https://cran.r-project.org/web/checks/check_results_lightgbm.html

    Reproducible example

    I haven't tried to reproduce this yet outside of the CRAN system.

    According to CRAN's logs (link), exactly one test is failing.

         == Failed ======================================================================
         -- 1. Failure (test_basic.R:1276:5): lgb.train() supports non-ASCII feature name
         dumped_model[["feature_names"]] not identical to iconv(feature_names, to = "UTF-8").
         4/4 mismatches
         x[1]: "F_é\u009b¶"
         y[1]: "F_<U+96F6>"
        
         x[2]: "F_äž\u0080"
         y[2]: "F_<U+4E00>"
        
         x[3]: "F_äº\u008c"
         y[3]: "F_<U+4E8C>"
        
         x[4]: "F_äž\u0089"
         y[4]: "F_<U+4E09>"
        
         == DONE ========================================================================
         Error: Test failures
         Execution halted
    

    Additional Comments

    This project has a CI job (#4164) that is intended to exactly replicate CRAN's r-devel-linux-x86_64-debian-clang check. That job has been succeeding. I'm not sure yet what the difference is between that job and CRAN's setup.
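    One observation (a diagnostic guess, not a fix): the mangled x values in the log look like UTF-8 bytes that were re-decoded as Latin-1. Round-tripping the first one in Python recovers U+96F6, matching the F_<U+96F6> shown for y[1]:

```python
# Diagnostic guess: treat the mangled string as Latin-1, take its bytes,
# and re-decode those bytes as UTF-8.
mangled = "F_é\u009b¶"  # x[1] from the failing test
recovered = mangled.encode("latin-1").decode("utf-8")
# recovered is "F_零" ("F_" + U+96F6), i.e. the intended feature name
```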
