Vowpal Wabbit

This is the Vowpal Wabbit fast online learning code.

Why Vowpal Wabbit?

Vowpal Wabbit is a machine learning system that pushes the frontier of machine learning with techniques such as online learning, hashing, allreduce, reductions, learning2search, active learning, and interactive learning. There is a specific focus on reinforcement learning: several contextual bandit algorithms are implemented, and the online nature of the system lends itself well to the problem. Vowpal Wabbit is a destination for implementing and maturing state-of-the-art algorithms with performance in mind.

  • Input Format. The input format for the learning algorithm is substantially more flexible than might be expected. Examples can have features consisting of free form text, which is interpreted in a bag-of-words way. There can even be multiple sets of free form text in different namespaces.
  • Speed. The learning algorithm is fast -- similar to the few other online algorithm implementations out there. There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function.
  • Scalability. This is not the same as fast. Instead, the important characteristic here is that the memory footprint of the program is bounded independent of data. This means the training set is not loaded into main memory before learning starts. In addition, the size of the set of features is bounded independent of the amount of training data using the hashing trick.
  • Feature Interaction. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems. The alternative of explicitly expanding the features before feeding them into the learning algorithm can be both computation and space intensive, depending on how it's handled.
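
The hashing trick and feature pairing described above can be illustrated with a minimal sketch. It is not VW's implementation: the function names, the `crc32` stand-in hash, the key format, and the 18-bit table size are assumptions made for illustration, and VW's actual hash function and index layout differ.

```python
import zlib
from itertools import product

NUM_BITS = 18  # assumed table size: 2**18 weight slots


def feature_index(namespace: str, name: str, num_bits: int = NUM_BITS) -> int:
    """Map a namespaced feature name to a slot in a fixed-size weight table."""
    key = f"{namespace}^{name}".encode("utf-8")
    # crc32 is a stand-in hash for illustration only.
    return zlib.crc32(key) & ((1 << num_bits) - 1)


def hash_bag_of_words(namespace: str, text: str, num_bits: int = NUM_BITS) -> dict:
    """Turn free-form text into a sparse {index: count} bag of words."""
    sparse = {}
    for token in text.split():
        idx = feature_index(namespace, token, num_bits)
        # Colliding tokens simply share a slot; their counts add up.
        sparse[idx] = sparse.get(idx, 0.0) + 1.0
    return sparse


def quadratic_indices(ns_a, words_a, ns_b, words_b, num_bits=NUM_BITS):
    """Hash every pair from the cross-product of two feature subsets."""
    return [
        feature_index(f"{ns_a}*{ns_b}", f"{a}*{b}", num_bits)
        for a, b in product(words_a, words_b)
    ]


title = hash_bag_of_words("title", "fast online learning")
pairs = quadratic_indices("query", ["fast", "learning"], "title", ["online", "fast"])
```

Because every index is masked to `num_bits` bits, the weight table has a fixed size no matter how many distinct tokens or pairs appear in the data, and the pairs are hashed on the fly instead of being written out as expanded features.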

Visit the wiki to learn more.

Getting Started

For the most up-to-date instructions for getting started on Windows, macOS, or Linux, please see the wiki.

Comments
  • C# refactoring, memory leak fixes, general goodness,...

    • fixed Runtime library mismatch between zlib, libvw, vw.exe, VowpalWabbitCore.dll (CLR),... by using zlib/boost nuget provided msbuild targets
    • included Visual Leak Detector for memory leak detection on Windows
    • refactored C# API to allow users to dynamically construct serializers based on alternate descriptions (not just on static annotations)
    • string marshalling is compatible with the command line (either escaping or splitting)
    • schema-based pre-hashing: if the hash can be determined from the schema, it is generated only once and re-used for each example
    • added type extension API for marshalling
    • allow user to generate native and string examples in parallel in both debug and release
    • keep marshalling expression tree for debugging
    • refactored marshalling expression tree generation to improve readability
    • added sweeping helper
    • improved C# label parsing extensibility
    • added assembly signing
    • fixed memory leaks in C# usage of VW
    • fixed model hashing/reload interaction
    • fixed handling of empty line examples within set of action dependent features
    • fixed order issue when predicting ADF examples containing empty action dependent features
    • fixed default namespace incompatibility (space vs. 0)
    • improved RunTests to C# test wrapping (detects inter-test dependencies and input files)
    • unit tests are run in test/ folder, thus no need to copy all input files
    • added user-supplied model id support

  • VWRegressor provides very different performance for loss_function = 'quantile', quantile_tau = 0.5 and loss_function = 'squared'

    Describe the bug

    VWRegressor gives very different results for loss_function = 'quantile' with quantile_tau = 0.5 and for loss_function = 'squared':

    • loss_function = 'squared' gives a very good (low) MAE
    • loss_function = 'quantile', quantile_tau = 0.5 gives a bad (high) MAE

    The data is a mixture of categorical and continuous features, 600 rows in total. (The sample rows were attached as an image.)

    To Reproduce

    # passes, l1 and l2 are set elsewhere in the reporter's script (values not given).
    use_quantile = True  # switch to False for the squared-loss model

    # Settings shared by both models.
    common = dict(convert_to_vw=False, normalized=True, passes=passes,
                  power_t=0.5,  # 1.0
                  readable_model='my_VW.model', cache_file='my_VW.cache',
                  l2=l2, l1=l1, quadratic='CC', cubic='CCC')

    if use_quantile:
        model = VWRegressor(learning_rate=2.3, loss_function='quantile',
                            quantile_tau=0.5, **common)
        q = 0
    else:
        model = VWRegressor(learning_rate=2.1, loss_function='squared', **common)
    

    Expected behavior

    I would expect the MAE for loss_function = 'quantile' with quantile_tau = 0.5 and for loss_function = 'squared' to be very similar.

    In addition, loss_function = 'quantile' with quantile_tau = 0.9 and with quantile_tau = 0.1 gives very wide confidence intervals, some of which make no sense at all.
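
    For reference, the expectation can be checked against the textbook definition of the quantile (pinball) loss. The sketch below is plain Python with made-up numbers. It assumes VW's quantile loss follows this standard definition and does not call VW at all.

```python
def quantile_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Pinball (quantile) loss for a single example."""
    error = y_true - y_pred
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return tau * error if error >= 0 else (tau - 1.0) * error


def mean_quantile_loss(y_true, y_pred, tau):
    """Average pinball loss over paired targets and predictions."""
    return sum(quantile_loss(t, p, tau) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only; they are not taken from the reported dataset.
y_true = [1.0, 2.0, 4.0, 8.0]
y_pred = [1.5, 2.0, 3.0, 10.0]

median_loss = mean_quantile_loss(y_true, y_pred, tau=0.5)
mae = mean_absolute_error(y_true, y_pred)
```

    With tau = 0.5 the pinball loss is exactly half the absolute error, so a model trained with it is optimising MAE directly, whereas squared loss targets the conditional mean. This identity alone does not explain the reported gap, which may also depend on the learning rate and the other settings.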

    Environment

    VW version: latest. OS: Windows 10.

    Additional context

    Do you have a code example that uses VWRegressor with loss_function = 'quantile' and quantile_tau = 0.9 and quantile_tau = 0.1?

  • JNI Layer throws Exceptions when close method is called in parallel ON DIFFERENT MODELS

    Problem

    In the JNI layer, when multiple passes are enabled (> 1) and an attempt is made to close separate models in parallel, exceptions can be thrown. This is true even though each model has its own lock to guard all accesses to the native code paths. Only a global lock around calls to the model close methods seems to avoid this issue.

    I'm looking for help identifying any critical sections of the C code that can be guarded by a lock to avoid thread-safety issues. I'm not asking for the C code to lock. I just want help figuring out where to put the locks in the Java code that wraps the C code.
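
    The locking arrangement described above (one lock per model for ordinary calls, plus a single global lock around close) can be sketched as follows. This is a Python illustration of the pattern only. `GuardedModel` and `train_and_close` are hypothetical names, and the native calls are replaced by placeholders, so it says nothing about where the shared state in the C code actually lives.

```python
import threading


class GuardedModel:
    """Hypothetical stand-in for a JNI-backed learner; not the real VW Java API."""

    _close_lock = threading.Lock()  # one lock shared by every model

    def __init__(self):
        self._lock = threading.Lock()  # per-model lock for ordinary calls
        self.examples_seen = 0
        self.closed = False

    def learn(self, example: str) -> None:
        with self._lock:
            if self.closed:
                raise RuntimeError("model is closed")
            self.examples_seen += 1  # placeholder for the native learn call

    def close(self) -> None:
        # Serialise close() across all models, then take the per-model lock
        # so no learn() call on this model is still in flight.
        with GuardedModel._close_lock, self._lock:
            self.closed = True  # placeholder for remaining passes + native teardown


def train_and_close(model: GuardedModel, n: int) -> None:
    """Feed n examples to one model, then close it."""
    for i in range(n):
        model.learn(f"1 | feature_{i}")
    model.close()


models = [GuardedModel() for _ in range(4)]
threads = [threading.Thread(target=train_and_close, args=(m, 100)) for m in models]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    The locks are always taken in the same order (global first, then per-model), and learn never takes the global lock, so the sketch cannot deadlock. The cost is that close calls on unrelated models run one at a time.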

    Scope

    This seems to be in PR #1291 but was not fixed with PR #1295

    Discussion

    From empirical testing, one of these lines appears to be the problem. I am wondering whether any of them use global state. I am trying to figure out whether we can lock over only a short critical section to avoid thread-safety issues.

    1. adjust_used_index(*vwInstance);
    2. vwInstance->do_reset_source = true;
    3. VW::start_parser(*vwInstance);
    4. LEARNER::generic_driver(*vwInstance);
    5. VW::end_parser(*vwInstance);

    Previous Conversation

    In PR #1295 there was the following conversation:

    @JohnLangford

    There should be zero shared state between multiple created VW objects. Is that what it's doing? (Creating multiple distinct VW objects?)

    @deaktator

    @JohnLangford. It looks like just one VW object. Each Java call does the following on the C side:

    vw* vwInstance = VW::initialize(env->GetStringUTFChars(command, NULL));

    @JohnLangford

    A single VW object can not be operated on in multiple threads because the code inside VW is not thread safe. If you want to have a model which is shared by multiple threads, you set this up more explicitly by initializing a new VW object with an existing model.

    @deaktator

    Hey @JohnLangford. We take care of multi-threaded access to VW by locking anywhere that requires access to the C code. The thread-safety issues I encountered before were on an incomplete version of the code that locked in the wrong place. When I run the tests in parallel, they seem to work just fine now. I ran them a bunch of times with forking in the tests and didn't see any issues.

    Tracking Down What's Happening

    It appears @jon-morra-zefr pretty much copied the C# code for multiple passes, so this seems like it might apply to C# as well. Both C# and JNI C++ code appear below as well as the calling code that blows up.

    I've seen a bunch of different errors occur at the same spot: invalidated cache, malformed LDF feature exceptions, etc.

    Example Code That triggers exceptions

    // Doing this many times in parallel with no locks causes problems.
    
    	val vwJNI = VWLearners.create[VWTypedLearner[_]](vwLearnString)
    	// Learning in here using   vwJNI.learn
    
    	// PROBLEM AREA:
    	lock.lock()     // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN
    	vwJNI.close()
    	lock.unlock()   // <== NEED GLOBAL LOCKING OR EXCEPTIONS THROWN
    

    Similarity of the C# and JNI C++ Code

    C# Code: vowpal_wabbit/cs/cli/vowpalwabbit.cpp

    void VowpalWabbit::RunMultiPass()
    { if (m_vw->numpasses > 1)
      { try
        { adjust_used_index(*m_vw);
          m_vw->do_reset_source = true;
          VW::start_parser(*m_vw);
          LEARNER::generic_driver(*m_vw);
          VW::end_parser(*m_vw);
        }
        CATCHRETHROW
      }
    }
    

    JNI C++ Code: vowpal_wabbit/java/src/main/c++/vowpalWabbit_learner_VWLearners.cc

    JNIEXPORT void JNICALL Java_vowpalWabbit_learner_VWLearners_performRemainingPasses(JNIEnv *env, jclass obj, jlong vwPtr)
    { try
      { vw* vwInstance = (vw*)vwPtr;
        if (vwInstance->numpasses > 1)
          { adjust_used_index(*vwInstance);
            vwInstance->do_reset_source = true;
            VW::start_parser(*vwInstance);
            LEARNER::generic_driver(*vwInstance);
            VW::end_parser(*vwInstance);
          }
      }
      catch(...)
      { rethrow_cpp_exception_as_java_exception(env);
      }
    }
    

    Any thoughts?

  • Continuous actions

    This is the preliminary PR for continuous actions.

    This includes the cats_tree (continuous action tree with smoothing) algorithm, conversion from a PMF (discrete) to a PDF (continuous) distribution, sampling from a continuous PDF, etc.

    The code is for the paper available at https://arxiv.org/pdf/2006.06040.pdf

    We will add more details.
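
    As a rough illustration of the PMF-to-PDF conversion and sampling steps mentioned above, here is a minimal sketch. It assumes equal-width bins over a known action range and leaves out the tree and the smoothing, so it is not the PR's algorithm. The function names and the numbers are made up.

```python
import random


def pmf_to_pdf(pmf, low, high):
    """Spread a discrete PMF over equal-width bins on [low, high].

    Returns a list of (left, right, density) segments.
    """
    width = (high - low) / len(pmf)
    return [
        (low + i * width, low + (i + 1) * width, p / width)
        for i, p in enumerate(pmf)
    ]


def sample_from_pdf(segments, rng):
    """Pick a segment by its probability mass, then a point uniformly inside it."""
    masses = [(right - left) * density for left, right, density in segments]
    left, right, _ = rng.choices(segments, weights=masses, k=1)[0]
    return rng.uniform(left, right)


# Three discrete actions mapped onto the continuous range [0, 30].
segments = pmf_to_pdf([0.1, 0.6, 0.3], low=0.0, high=30.0)
total_mass = sum((right - left) * density for left, right, density in segments)
action = sample_from_pdf(segments, random.Random(0))
```

    Each density is the bin's probability divided by the bin width, so the piecewise-constant PDF integrates to 1 over the action range.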

  • Bug fixes

    • Fixed NRE on empty hashes
    • Skip model load/initialize when seeding from in-memory model
    • Fixed progressive validation in Azure trainer
    • Includes mixed JSON string and JSON direct support
    • Includes native C++ JSON parsing

  • Try coveralls

    Added 3 new make targets: vw_gcov, library_example_gcov, test_gcov, which build vw and the examples with GCOV support, then run the tests. This allows coveralls to analyze test coverage in the source code, but slows the tests down significantly. I also edited the travis .yml file to upload the results to coveralls.io and added the badge to the readme.

    Someone will need to set up a coveralls account for the main VW project and point the badge in the readme to that account's badge. Currently the coveralls badge points only to my fork.

  • Pandas to vw text format

    1. Overview

    The goal of this PR is to fix the issue #2308.

    The PR introduces a new class DFtoVW in vowpalwabbit.pyvw that takes as input a pandas.DataFrame and special types (SimpleLabel, Feature, Namespace) that specify the desired VW conversion.

    These classes make extensive use of a class Col that refers to a given column in the user-specified dataframe.

    A simpler interface, DFtoVW.from_colnames, can also be used for simple use cases. The main benefit is that the user need not use the specific types.


    Below are some usages of this class. They all rely on the following pandas.DataFrame called df :

      house_id  need_new_roof  price  sqft   age  year_built
    0      id1              0   0.23  0.25  0.05        2006
    1      id2              1   0.18  0.15  0.35        1976
    2      id3              0   0.53  0.32  0.87        1924
    

    2. Simple usage using DFtoVW.from_colnames

    Let's say we want to build a VW dataset with the target need_new_roof and the features age and year_built:

    from vowpalwabbit.pyvw import DFtoVW
    conv = DFtoVW.from_colnames(y="need_new_roof", x=["age", "year_built"], df=df)
    

    Then we can use the method process_df:

    conv.process_df()
    

    that outputs the following list:

    ['0 | 0.05 2006', '1 | 0.35 1976', '0 | 0.87 1924']
    

    This list can then directly be consumed by the method pyvw.model.learn.

    3. Advanced usages using default constructor

    The class DFtoVW also allows the following patterns in its default constructor:

    • tag
    • (named) namespaces, with scaling factor
    • (named) features, with constant feature possible

    To use these more complex patterns we need to import them using:

    from vowpalwabbit.pyvw import SimpleLabel, Namespace, Feature, Col
    

    3.1. Named namespace with scaling, and named feature

    Let's create a VW dataset that includes a named namespace (with scaling) and a named feature:

    conv = DFtoVW(
            df=df,
            label=SimpleLabel(Col("need_new_roof")),
            namespaces=Namespace(name="Imperial", value=0.092, features=Feature(value=Col("sqft"), name="sqm"))
            )
    conv.process_df()
    

    which yields:

    ['0 |Imperial:0.092 sqm:0.25',
     '1 |Imperial:0.092 sqm:0.15',
     '0 |Imperial:0.092 sqm:0.32']
    

    3.2. Multiple namespaces, multiple features, and tag

    Let's create a more complex example with a tag and multiple namespaces with multiple features.

    conv = DFtoVW(
            df=df, 
            label=SimpleLabel(Col("need_new_roof")),
            tag=Col("house_id"),
            namespaces=[
                    Namespace(name="Imperial", value=0.092, features=Feature(value=Col("sqft"), name="sqm")),
                    Namespace(name="DoubleIt", value=2, features=[Feature(value=Col("price")), Feature(Col("age"))])
                    ]
            )
    conv.process_df()
    

    which yields:

    ['0 id1|Imperial:0.092 sqm:0.25 |DoubleIt:2 0.23 0.05',
     '1 id2|Imperial:0.092 sqm:0.15 |DoubleIt:2 0.18 0.35',
     '0 id3|Imperial:0.092 sqm:0.32 |DoubleIt:2 0.53 0.87']
    

    4. Implementation details

    • The class DFtoVW and the specific types are located in vowpalwabbit/pyvw.py. The class only depends on the pandas module.
    • The code includes docstrings.
    • 8 tests are included in tests/test_pyvw.py

    5. Extensions

    • This PR does not yet handle multiline examples and more complex label types.
    • To convert a very large dataset that can't fit in RAM, one can use the pandas import option chunksize and process one chunk at a time. I could also implement this functionality directly in the class using a generator. The generator would then either be consumed by a VW learning interface or be written to an external file (for conversion purposes only).
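
    The generator idea in the last point could look roughly like the sketch below. To stay dependency-free it swaps pandas' chunksize option for the standard-library csv reader, and it formats lines by hand instead of calling DFtoVW. The function name `iter_vw_lines` and the in-memory CSV are illustrative assumptions, not part of this PR.

```python
import csv
import io
from itertools import islice


def iter_vw_lines(csv_file, y, x, chunk_size=2):
    """Yield VW text lines chunk by chunk so the whole file is never held in memory."""
    reader = csv.DictReader(csv_file)
    while True:
        # Pull at most chunk_size rows from the reader.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        for row in chunk:
            features = " ".join(row[col] for col in x)
            yield f"{row[y]} | {features}"


# Stand-in for a large file on disk; any text file object works.
data = io.StringIO(
    "house_id,need_new_roof,age,year_built\n"
    "id1,0,0.05,2006\n"
    "id2,1,0.35,1976\n"
    "id3,0,0.87,1924\n"
)
lines = list(iter_vw_lines(data, y="need_new_roof", x=["age", "year_built"]))
```

    A consumer can iterate over the generator and pass each line to a learner or write it to a file, so only one chunk of rows is in memory at a time.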

  • Tests don't pass on Mac OS 10.10

    Mac OS 10.10 with boost version boost-1.58.0.

    gcc --version
    Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
    Target: x86_64-apple-darwin14.0.0
    Thread model: posix

    I get some warnings, and test 16 doesn't pass.

    Here is the full log:

    make
    cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c main.cc -o main.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c global_data.cc -o global_data.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_regressor.cc -o parse_regressor.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_primitives.cc -o parse_primitives.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c unique_sort.cc -o unique_sort.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cache.cc -o cache.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c rand48.cc -o rand48.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c simple_label.cc -o simple_label.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multiclass.cc -o multiclass.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c oaa.cc -o oaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel_oaa.cc -o multilabel_oaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c boosting.cc -o boosting.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ect.cc -o ect.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c autolink.cc -o autolink.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c binary.cc -o binary.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrq.cc -o lrq.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cost_sensitive.cc -o cost_sensitive.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c multilabel.cc -o multilabel.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c label_dictionary.cc -o label_dictionary.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c csoaa.cc -o csoaa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb.cc -o cb.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_adf.cc -o cb_adf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cb_algs.cc -o cb_algs.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search.cc -o search.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_meta.cc -o search_meta.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_sequencetask.cc -o search_sequencetask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_dep_parser.cc -o search_dep_parser.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_hooktask.cc -o search_hooktask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_multiclasstask.cc -o search_multiclasstask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_entityrelationtask.cc -o search_entityrelationtask.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c search_graph.cc -o search_graph.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_example.cc -o parse_example.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c scorer.cc -o scorer.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c network.cc -o network.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parse_args.cc -o parse_args.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c accumulate.cc -o accumulate.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd.cc -o gd.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c learner.cc -o learner.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lda_core.cc -o lda_core.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c gd_mf.cc -o gd_mf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c mf.cc -o mf.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bfgs.cc -o bfgs.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c noop.cc -o noop.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c print.cc -o print.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c example.cc -o example.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c parser.cc -o parser.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c loss_functions.cc -o loss_functions.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c sender.cc -o sender.o
    parser.cc:452:26: warning: 'daemon' is deprecated: first deprecated in OS X 10.5
          [-Wdeprecated-declarations]
          if (!all.active && daemon(1,1))
                             ^
    /usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
          deprecated here
    int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
             ^
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c nn.cc -o nn.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c bs.cc -o bs.o
    1 warning generated.
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c cbify.cc -o cbify.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c topk.cc -o topk.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c stagewise_poly.cc -o stagewise_poly.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c log_multi.cc -o log_multi.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c active.cc -o active.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c kernel_svm.cc -o kernel_svm.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c best_constant.cc -o best_constant.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c ftrl.cc -o ftrl.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c svrg.cc -o svrg.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c lrqfa.cc -o lrqfa.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interact.cc -o interact.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c comp_io.cc -o comp_io.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c interactions.cc -o interactions.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c vw_exception.cc -o vw_exception.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c allreduce.cc -o allreduce.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o active_interactor active_interactor.cc
    ar rcs liballreduce.a allreduce.o
    ar rcs libvw.a hash.o global_data.o io_buf.o parse_regressor.o parse_primitives.o unique_sort.o cache.o rand48.o simple_label.o multiclass.o oaa.o multilabel_oaa.o boosting.o ect.o autolink.o binary.o lrq.o cost_sensitive.o multilabel.o label_dictionary.o csoaa.o cb.o cb_adf.o cb_algs.o search.o search_meta.o search_sequencetask.o search_dep_parser.o search_hooktask.o search_multiclasstask.o search_entityrelationtask.o search_graph.o parse_example.o scorer.o network.o parse_args.o accumulate.o gd.o learner.o lda_core.o gd_mf.o mf.o bfgs.o noop.o print.o example.o parser.o loss_functions.o sender.o nn.o bs.o cbify.o topk.o stagewise_poly.o log_multi.o active.o kernel_svm.o best_constant.o ftrl.o svrg.o lrqfa.o interact.o comp_io.o interactions.o vw_exception.o
    /usr/bin/g++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o vw main.o -L. -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    cd cluster; /Library/Developer/CommandLineTools/usr/bin/make
    /usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -c spanning_tree.cc -o spanning_tree.o
    spanning_tree.cc:161:9: warning: 'daemon' is deprecated: first deprecated in OS
          X 10.5 [-Wdeprecated-declarations]
        if (daemon(1,1))
            ^
    /usr/include/stdlib.h:267:6: note: 'daemon' has been explicitly marked
          deprecated here
    int      daemon(int, int) __DARWIN_1050(daemon) __OSX_AVAILABLE_BUT_DEPR...
             ^
    1 warning generated.
    /usr/bin/clang++ -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o spanning_tree spanning_tree.o 
    cd library; /Library/Developer/CommandLineTools/usr/bin/make things
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_predict ezexample_predict.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o ezexample_train ezexample_train.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o library_example library_example.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o recommend recommend.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o gd_mf_weights gd_mf_weights.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    /usr/bin/g++ -g -std=c++0x   -D__extern_always_inline=inline -Wall -pedantic -O3 -fomit-frame-pointer -fno-strict-aliasing  -D_FILE_OFFSET_BITS=64 -DNDEBUG -I /usr/local/include  -fPIC  -o test_search test_search.cc -L ../vowpalwabbit -l vw -l allreduce -L /usr/local/lib -lboost_program_options-mt -lboost_serialization-mt -l pthread -l z
    
    make test
    cd vowpalwabbit; /Library/Developer/CommandLineTools/usr/bin/make -j 4 things
    make[1]: Nothing to be done for `things'.
    cd library; /Library/Developer/CommandLineTools/usr/bin/make things
    make[1]: Nothing to be done for `things'.
    vw running test-suite...
    (cd test && ./RunTests -d -fe -E 0.001 ../vowpalwabbit/vw ../vowpalwabbit/vw)
    Testing on: hostname=air-mac OS=darwin
    Testing vw: ../vowpalwabbit/vw
    Testing lda: ../vowpalwabbit/vw
    RunTests: '-D' to see any diff output
    RunTests: '-o' to force overwrite references
    RunTests: test 1: stderr OK
    RunTests: test 2: stderr OK
    RunTests: test 2: predict OK
    RunTests: test 3: stderr OK
    RunTests: test 4: stdout OK
    RunTests: test 4: stderr OK
    RunTests: test 5: stderr OK
    RunTests: test 6: stderr OK
    RunTests: test 6: minor (<0.001) precision differences ignored
    RunTests: test 6: predict OK
    RunTests: test 7: stderr OK
    RunTests: test 8: stderr OK
    RunTests: test 8: minor (<0.001) precision differences ignored
    RunTests: test 8: predict OK
    RunTests: test 9: stderr OK
    RunTests: test 9: predict OK
    RunTests: test 10: stderr OK
    RunTests: test 10: predict OK
    RunTests: test 11: stderr OK
    RunTests: test 12: stderr OK
    RunTests: test 13: stderr OK
    RunTests: test 14: stdout OK
    RunTests: test 14: minor (<0.001) precision differences ignored
    RunTests: test 14: stderr OK
    RunTests: test 15: stdout OK
    RunTests: test 15: stderr OK
    RunTests: test 16: stdout OK
    --- diff -u --minimal train-sets/ref/rcv1_small.stderr stderr.tmp
    --- train-sets/ref/rcv1_small.stderr    2015-08-13 00:22:20.000000000 +0300
    +++ stderr.tmp  2015-08-13 00:33:33.000000000 +0300
    @@ -17,7 +17,7 @@
      5 0.47879     0.00006     0.00617      0.595892   0.183063                            0.47184     1.00000   
      6 0.47750     0.00000     0.00221      0.703360   0.403715                            0.68626     1.00000   
      7 0.47680     0.00000     0.00038      0.588395   0.175459                            0.08911     1.00000   
    - 8 0.47671     0.00000     0.00002      0.568445   0.136827                            0.00444     1.00000   
    + 8 0.47671     0.00000     0.00002      0.568443   0.136827                            0.00444     1.00000   
    
     finished run
     number of examples = 8000
    RunTests: test 16: FAILED: ref(train-sets/ref/rcv1_small.stderr) != stderr(stderr.tmp)
        cmd: ../vowpalwabbit/vw -k -c -d train-sets/rcv1_small.dat --loss_function=logistic -b 20 --bfgs --mem 7 --passes 20 --termination 0.001 --l2 1.0 --holdout_off
    
    
  • Trying to upgrade from vw-jni-8.2.0 to something close to vw-jni-8.4.1-SNAPSHOT

    Using VW 8.4.0 installed with brew I created a simple test set initializing VW with

    $ vw --csoaa 10  -b 24  --l2 0.0  -l 0.1  -c -k --passes 100  -f /Users/pat/big-data/harness/models/test_resource  --save_resume
    

    Then I paste examples in:

    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:0.0 1:1.0 | user_user_2 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    0:1.0 1:0.0 | user_user_1 testGroupId_1 
    save_
    

    At the save_ line, the file /Users/pat/big-data/harness/models/test_resource is updated, so all is well.
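The pasted input above can also be generated programmatically. The following is a minimal sketch that only builds the text lines (eight cost-sensitive examples followed by the save_ pseudo-example); it does not call VW or the JNI wrapper, and the helper names `csoaa_line` and `build_session` are hypothetical.

```python
def csoaa_line(costs, features):
    """Format one cost-sensitive example as 'class:cost ... | feature ...'."""
    label = " ".join(f"{cls}:{cost}" for cls, cost in costs)
    return f"{label} | {' '.join(features)}"


def build_session(examples, save_tag="save_"):
    """Return every example line, followed by the save_ pseudo-example."""
    lines = [csoaa_line(costs, features) for costs, features in examples]
    lines.append(save_tag)
    return lines


# Same eight examples as in the report: four for user_user_2, four for user_user_1.
examples = (
    [([(0, 0.0), (1, 1.0)], ["user_user_2", "testGroupId_1"])] * 4
    + [([(0, 1.0), (1, 0.0)], ["user_user_1", "testGroupId_1"])] * 4
)
session = build_session(examples)
print("\n".join(session))
```

The resulting lines could be piped to the vw command shown earlier, or passed one at a time to whichever wrapper is in use.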

    Doing the same thing from Java, using the last available JNI binary wrapper (8.2.0), does not update the model file. I'm not running in --quiet mode and there is no complaint from VW.

    The save_ pseudo-example does apparently work in 8.2.0, since another user is using it in CLI and daemon mode.

    Is this feature not supported with JNI?

    On the advice of @arielf, it appears I need something like vw-jni-8.4.1-SNAPSHOT, so I am trying to build it for both the dev machine (MBP) and the deploy machine (Ubuntu). Dev machine first.

  • Python wrapper installation fails

    I'm not able to pip install vowpalwabbit to install the python wrapper. I don't know enough to understand why it's failing, but I thought it might be worth bringing to someone's attention.

    I'm on OSX and using an Anaconda environment. I installed vowpal wabbit from homebrew.

    Here's my traceback:

    Collecting vowpalwabbit
      Using cached vowpalwabbit-8.2.0.tar.gz
    Building wheels for collected packages: vowpalwabbit
      Running setup.py bdist_wheel for vowpalwabbit ... error
      Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/tmp5FosrOpip-wheel- --python-tag cp27:
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.macosx-10.5-x86_64-2.7
      creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
      running egg_info
      writing vowpalwabbit.egg-info/PKG-INFO
      writing top-level names to vowpalwabbit.egg-info/top_level.txt
      writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
      warning: manifest_maker: standard file '-c' not found
    
      reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*.o' found anywhere in distribution
      warning: no previously-included files matching '*.exe' found anywhere in distribution
      warning: no previously-included files matching '*.pyc' found anywhere in distribution
      writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
      running build_ext
      Traceback (most recent call last):
        File "<string>", line 1, in <module>
        File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
          tests_require=['tox'],
    
    [...]
    
        File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 38, in find_boost
          raise Exception('Could not find boost python library')
      Exception: Could not find boost python library
    
      ----------------------------------------
      Failed building wheel for vowpalwabbit
      Running setup.py clean for vowpalwabbit
    Failed to build vowpalwabbit
    Installing collected packages: vowpalwabbit
      Running setup.py install for vowpalwabbit ... error
        Complete output from command /Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-10.5-x86_64-2.7
        creating build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/__init__.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/pyvw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        copying vowpalwabbit/sklearn_vw.py -> build/lib.macosx-10.5-x86_64-2.7/vowpalwabbit
        running egg_info
        creating vowpalwabbit.egg-info
        writing vowpalwabbit.egg-info/PKG-INFO
        writing top-level names to vowpalwabbit.egg-info/top_level.txt
        writing dependency_links to vowpalwabbit.egg-info/dependency_links.txt
        writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        warning: manifest_maker: standard file '-c' not found
    
        reading manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no files found matching '*' under directory 'src'
        warning: no previously-included files matching '*.o' found anywhere in distribution
        warning: no previously-included files matching '*.exe' found anywhere in distribution
        warning: no previously-included files matching '*.pyc' found anywhere in distribution
        writing manifest file 'vowpalwabbit.egg-info/SOURCES.txt'
        running build_ext
        make: *** No rule to make target `clean'.  Stop.
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py", line 184, in <module>
            tests_require=['tox'],
    
    [...]
    
        subprocess.CalledProcessError: Command '['make', 'clean']' returned non-zero exit status 2
    
        ----------------------------------------
    Command "/Users/vvvvv/anaconda/envs/trendrank/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-4GRRiq-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/vx/n21m77w12nd0tb9xwhfcgd75gpm86h/T/pip-build-74A7hN/vowpalwabbit/
    
  • Accept multiline examples in the JNI interface

    This is a pretty major refactoring of the JNI layer. The impetus for this refactoring was the ability to accept multiline examples, but it led to a much larger change. The biggest change is the decoupling of the return type and the prediction function. This proved necessary to support all the different ways to extract data with --cb_explore.

    I am going to go over this in detail with @deaktator offline, so @JohnLangford let's hold off on merging this for now. If anyone else has any comments at this time they are certainly welcome.

  • feat: stabilize unique_ptr based initialize function

    I am essentially just stabilizing the initialize_experimental function, which has been working well for quite some time. The only changes are the addition of skip_model_load and the removal of the custom stack functionality from the stable interface. The custom stack is still available from initialize_experimental.

    std::unique_ptr<VW::workspace> initialize(std::unique_ptr<config::options_i> options,
        std::unique_ptr<VW::io::reader> model_override_reader = nullptr, bool skip_model_load = false,
        driver_output_func_t driver_output_func = nullptr, void* driver_output_func_context = nullptr,
        VW::io::logger* custom_logger = nullptr);
    
  • refactor: change initial pool size to 0

    Forcing all consumers to have a preallocated pool is quite restrictive, especially now that the pool is dynamic (in the past it was a fixed size ring buffer).

    Removing this initial allocation improves support for library scenarios, and in practice should have no effect on driver performance. (hyperfine shows this too)

    Changing the default of 256 to an initial size of 0 roughly halves base memory consumption, from 13.5MB to 6.2MB (a single example is ~31kb).
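As a back-of-envelope check on these figures, the memory held by the old preallocated pool can be estimated from the pool size and the per-example size. This is only a sketch: it assumes binary units (1 KB = 1024 bytes) and treats the ~31kb figure as exact.

```python
KB = 1024
MB = 1024 * KB

pool_size = 256          # previous default number of preallocated examples
example_size = 31 * KB   # approximate size of one example (assumed exact here)

# Memory the old default pool would occupy at startup.
expected_saving_mb = pool_size * example_size / MB

# Saving implied by the before/after measurements quoted above.
reported_saving_mb = 13.5 - 6.2

print(f"expected saving: {expected_saving_mb:.2f} MB")
print(f"reported saving: {reported_saving_mb:.2f} MB")
```

The two numbers land in the same range (roughly 7 to 8 MB), which is consistent with the pool accounting for most of the difference.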

    Startup time should also be reduced. It is hard to measure, but running vw --no_stdin under hyperfine shows:

    Benchmark 1: ./build/vowpalwabbit/cli/vw --no_stdin
      Time (mean ± σ):       4.5 ms ±   2.1 ms    [User: 1.6 ms, System: 0.9 ms]
      Range (min … max):     3.0 ms …  30.6 ms    682 runs
    
      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
    
    Benchmark 2: ./vw-master --no_stdin
      Time (mean ± σ):       5.8 ms ±   1.6 ms    [User: 2.0 ms, System: 1.5 ms]
      Range (min … max):     4.2 ms …  21.6 ms    658 runs
    
      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
    
    Summary
      './build/vowpalwabbit/cli/vw --no_stdin' ran
        1.29 ± 0.69 times faster than './vw-master --no_stdin'
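The summary ratio can be approximately reproduced from the two reported means and standard deviations. The sketch below assumes first-order error propagation for a ratio of independent quantities; the function name `ratio_with_uncertainty` is hypothetical, and the inputs are the rounded values printed above.

```python
import math


def ratio_with_uncertainty(mean_fast, sd_fast, mean_slow, sd_slow):
    """Return (mean_slow / mean_fast, propagated standard deviation of the ratio)."""
    ratio = mean_slow / mean_fast
    # Relative uncertainties add in quadrature for a quotient.
    relative = math.sqrt((sd_fast / mean_fast) ** 2 + (sd_slow / mean_slow) ** 2)
    return ratio, ratio * relative


# Benchmark 1 (this change): 4.5 ms ± 2.1 ms; Benchmark 2 (master): 5.8 ms ± 1.6 ms
ratio, sd = ratio_with_uncertainty(4.5, 2.1, 5.8, 1.6)
print(f"{ratio:.2f} ± {sd:.2f} times faster")
```

With these rounded inputs the uncertainty comes out marginally higher than the 0.69 that hyperfine printed, presumably because hyperfine works from unrounded measurements. Either way the uncertainty is large relative to the speedup, so the result is suggestive rather than conclusive.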
    
  • Conditional Contextual Bandit - predictions does not predict

    Describe the bug

    Hello,

    I am currently working on the conditional contextual bandit, i.e. ccb. I have 4 slots and a total of 12 actions. I am training my algorithm with the following code:

    def formatting_and_training_bandit_input_data(bandit, vowpal_workspace):
        # Build and learn one CCB multiline example per Id.
        for i in bandit.Id.unique():
            bandit_loop = bandit[bandit['Id'] == i]
            bandit_loop.reset_index(inplace=True, drop=True)
            action = bandit_loop.Action
            prob_action = bandit_loop.prob_action
            reward = bandit_loop.Reward * -100  # reward negated and scaled
            userid = bandit_loop.UserId
            gender = bandit_loop.Gender
            age = bandit_loop.Age
            daten = [
                # Shared context, taken from the first row of this Id.
                f"ccb shared  | UserId={userid[0]} Gender={gender[0]}  Age={age[0]} clicks_total:{bandit_loop.clicks_total[0]} AmountPerOrder_Median_D365:{bandit_loop.AmountPerOrder_Median_D365[0]} avg_product_amount:{bandit_loop.avg_product_amount[0]}  day_lenght:{bandit_loop.day_lenght[0]}",
                "ccb action |",  # featureless action line (not present in the prediction input)
                "ccb action | 0",
                "ccb action | 1",
                "ccb action | 2",
                "ccb action | 3",
                "ccb action | 4",
                "ccb action | 5",
                "ccb action | 6",
                "ccb action | 7",
                "ccb action | 8",
                "ccb action | 9",
                "ccb action | 10",
                "ccb action | 11",
                # One labelled slot per row, written as action:prob_action:reward.
                f"ccb slot {action.iloc[0]}:{prob_action.iloc[0]}:{reward.iloc[0]} |",
                f"ccb slot {action.iloc[1]}:{prob_action.iloc[1]}:{reward.iloc[1]} |",
                f"ccb slot {action.iloc[2]}:{prob_action.iloc[2]}:{reward.iloc[2]} |",
                f"ccb slot {action.iloc[3]}:{prob_action.iloc[3]}:{reward.iloc[3]} |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
                # f"ccb slot |",
            ]

            vowpal_workspace.learn(daten)
        vowpal_workspace.save("cb.saved")
        return daten
    

    I would like to make predictions with the following code:

    def formatting_and_predicting_bandit_input_data(bandit, vowpal_workspace):
        # Collect the first predicted action of each of the 4 slots, per Id.
        prediction_result = []
        for i in bandit.Id.unique():
            bandit_loop = bandit[bandit['Id'] == i]
            bandit_loop.reset_index(inplace=True, drop=True)
            # action = bandit_loop.Action
            # reward = bandit_loop.Reward * -1
            userid = bandit_loop.UserId
            gender = bandit_loop.Gender
            age = bandit_loop.Age
            daten = [
                # Shared context, taken from the first row of this Id.
                f"ccb shared  | UserId={userid[0]} Gender={gender[0]}  Age={age[0]} clicks_total:{bandit_loop.clicks_total[0]} AmountPerOrder_Median_D365:{bandit_loop.AmountPerOrder_Median_D365[0]} avg_product_amount:{bandit_loop.avg_product_amount[0]}  day_lenght:{bandit_loop.day_lenght[0]}",
                "ccb action | 0",
                "ccb action | 1",
                "ccb action | 2",
                "ccb action | 3",
                "ccb action | 4",
                "ccb action | 5",
                "ccb action | 6",
                "ccb action | 7",
                "ccb action | 8",
                "ccb action | 9",
                "ccb action | 10",
                "ccb action | 11",
                # Four unlabelled slots to be filled by the prediction.
                "ccb slot |",
                "ccb slot |",
                "ccb slot |",
                "ccb slot |",
            ]

            pred = vowpal_workspace.predict(daten)
            # pred[slot][0][0]: first element of the first entry for each slot.
            prediction_result.append(pred[0][0][0])
            prediction_result.append(pred[1][0][0])
            prediction_result.append(pred[2][0][0])
            prediction_result.append(pred[3][0][0])
        return prediction_result
    

    The problem is that the predictions are not good: all 12 actions are chosen with the same frequency. Can anyone help me here?
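One way to rule out formatting differences between the training and prediction inputs is to build both from a single helper. The sketch below only produces the text lines and does not call VW. The helper name `ccb_example` and the feature values are illustrative, and it assumes the slot label order `action:cost:probability` that the VW input format documentation appears to specify. That order is worth verifying, since the training code above writes action:prob_action:reward.

```python
def ccb_example(shared, actions, slot_labels=None, num_slots=0):
    """Build one CCB multiline example as a list of text lines.

    slot_labels: list of (action, cost, probability) tuples for training,
                 or None to emit `num_slots` unlabelled slots for prediction.
    """
    lines = [f"ccb shared | {shared}"]
    lines += [f"ccb action | {a}" for a in actions]
    if slot_labels is None:
        lines += ["ccb slot |"] * num_slots
    else:
        lines += [f"ccb slot {a}:{c}:{p} |" for a, c, p in slot_labels]
    return lines


actions = [str(k) for k in range(12)]          # same 12 action lines as above
shared = "Gender=f Age=30"                     # made-up context for illustration

train = ccb_example(shared, actions,
                    slot_labels=[(3, -1.0, 0.25), (0, 0.0, 0.25)])
predict = ccb_example(shared, actions, num_slots=2)

# The shared line and the 12 action lines are identical in both variants.
assert train[:13] == predict[:13]
print("\n".join(train))
```

Because both variants come from the same function, the action set seen at prediction time cannot drift from the one used in training.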

    How to reproduce

    You simply can't.

    Version

    22

    OS

    Ubuntu

    Language

    Python

    Additional context

    Conditional Contextual Bandit

This is a Machine Learning model which predicts the presence of Diabetes in Patients

Diabetes Disease Prediction This is a machine learning model which tries to determine if a person has diabetes or not. Data The dataset is in comma s

Mar 16, 2022
Machine Learning Model to predict the payment date of an invoice when it gets created in the system.

Payment-Date-Prediction Machine Learning Model to predict the payment date of an invoice when it gets created in the system.

Sep 9, 2022
Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Predico Disease Prediction system based on symptoms provided by patient- using Python-Django & Machine Learning

Jan 6, 2022
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Jan 9, 2023
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Nov 25, 2022
Implementing continuous integration & delivery (CI/CD) in machine learning projects

CML with cloud compute This repository contains a sample project using CML with Terraform (via the cml-runner function) to launch an AWS EC2 instance

Oct 3, 2022
STUMPY is a powerful and scalable Python library for computing a Matrix Profile, which can be used for a variety of time series data mining tasks

STUMPY STUMPY is a powerful and scalable library that efficiently computes something called the matrix profile, which can be used for a variety of tim

Jan 6, 2023
Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

Jun 30, 2022
A concept I came up which ditches the idea of "layers" in a neural network.

Dynet A concept I came up which ditches the idea of "layers" in a neural network. Install Copy Dynet.py to your project. Run the example Install matpl

Dec 5, 2021
Ml based project which uses regression technique to predict the price.

Price-Predictor Ml based project which uses regression technique to predict the price. I have used various regression models and finds the model with

Jul 9, 2022
A simple python program which predicts the success of a movie based on it's type, actor, actress and director

Movie-Success-Prediction A simple python program which predicts the success of a movie based on it's type, actor, actress and director. The program us

Dec 17, 2021
learn python in 100 days, a simple step could be follow from beginner to master of every aspect of python programming and project also include side project which you can use as demo project for your personal portfolio

learn python in 100 days, a simple step could be follow from beginner to master of every aspect of python programming and project also include side project which you can use as demo project for your personal portfolio

Nov 5, 2022
Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Jan 31, 2022
nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices

A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.

Dec 26, 2022
Book Recommender System Using Sci-kit learn N-neighbours

Model-Based-Recommender-Engine I created a book Recommender System using Sci-kit learn's N-neighbours algorithm for my model and the streamlit library

Jan 13, 2022
Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

Jan 3, 2023
A data preprocessing package for time series data. Design for machine learning and deep learning.

A data preprocessing package for time series data. Design for machine learning and deep learning.

Jan 7, 2023
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

Dec 30, 2022