Machine Learning Platform for Kubernetes

License: Apache 2


Reproduce, Automate, Scale your data science.


Welcome to Polyaxon, a platform for building, training, and monitoring large-scale deep learning applications. We are building a system to solve reproducibility, automation, and scalability for machine learning applications.

Polyaxon can be deployed into any data center or cloud provider, or it can be hosted and managed by Polyaxon. It supports all the major deep learning frameworks, such as TensorFlow, MXNet, Caffe, and Torch.

Polyaxon makes it faster, easier, and more efficient to develop deep learning applications by managing workloads with smart container and node management. And it turns GPU servers into shared, self-service resources for your team or organization.



Install

TL;DR;

  • Install CLI

    # Install Polyaxon CLI
    $ pip install -U polyaxon
  • Create a deployment

    # Create a namespace
    $ kubectl create namespace polyaxon
    
    # Add Polyaxon charts repo
    $ helm repo add polyaxon https://charts.polyaxon.com
    
    # Deploy Polyaxon
    $ polyaxon admin deploy -f config.yaml
    
    # Access API
    $ polyaxon port-forward

Please check the Polyaxon installation guide.

Quick start

TL;DR;

  • Start a project

    # Create a project
    $ polyaxon project create --name=quick-start --description='Polyaxon quick start.'
  • Train and track logs & resources

    # Upload code and start experiments
    $ polyaxon run -f experiment.yaml -l
  • Dashboard

    # Start Polyaxon dashboard
    $ polyaxon dashboard
    
    Dashboard page will now open in your browser. Continue? [Y/n]: y
  • Notebook

    # Start Jupyter notebook for your project
    $ polyaxon run --hub notebook
  • Tensorboard

    # Start TensorBoard for a run's output
    $ polyaxon run --hub tensorboard --run-uuid=UUID


Please check our quick start guide to start training your first experiment.

Distributed job

Polyaxon supports and simplifies distributed jobs. Depending on the framework you are using, you need to deploy the corresponding operator, adapt your code to enable the distributed training, and update your polyaxonfile.

Here are some examples of using distributed training:

Hyperparameters tuning

Polyaxon has a concept for suggesting hyperparameters and managing their results, called experiment groups, that is very similar to Google Vizier. An experiment group in Polyaxon defines a search algorithm, a search space, and a model to train.
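
To make the three ingredients concrete, here is a minimal, framework-free sketch in Python. It is not Polyaxon's API: `grid_search`, `train`, and `run_group` are hypothetical names, grid search stands in for whichever search algorithm a group would use, and the "model" is a made-up loss function.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Tuple

Params = Dict[str, float]
SearchSpace = Dict[str, List[float]]


def grid_search(space: SearchSpace) -> Iterator[Params]:
    """Search algorithm (assumed: plain grid search) over the search space."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def train(params: Params) -> float:
    """Stand-in for a model to train; returns a made-up loss value."""
    return (params["lr"] - 0.01) ** 2 + (params["dropout"] - 0.2) ** 2


def run_group(space: SearchSpace, objective: Callable[[Params], float]) -> Tuple[Params, float]:
    """Evaluate every suggestion and keep the one with the lowest loss."""
    results = [(params, objective(params)) for params in grid_search(space)]
    return min(results, key=lambda item: item[1])


# Hypothetical search space with two hyperparameters.
space = {"lr": [0.001, 0.01, 0.1], "dropout": [0.2, 0.5]}
best_params, best_loss = run_group(space, train)
```

In a real experiment group each suggestion would be a separate tracked run rather than a function call in one process.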

Parallel executions

You can run your processing or model training jobs in parallel; Polyaxon provides a mapping abstraction to manage concurrent jobs.
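
The idea of mapping one job over many inputs with a concurrency limit can be sketched with the Python standard library. This only illustrates the concept and is not how Polyaxon schedules jobs: `map_concurrently` and `square` are hypothetical names, and threads stand in for containers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_concurrently(job: Callable[[T], R], inputs: Iterable[T], concurrency: int) -> List[R]:
    """Run `job` once per input, with at most `concurrency` jobs in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in input order, whatever the finish order.
        return list(pool.map(job, inputs))


def square(x: int) -> int:
    """Stand-in for a processing or training job."""
    return x * x


results = map_concurrently(square, range(5), concurrency=2)
```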

DAGs and workflows

Polyaxon DAGs is a tool that provides a container-native engine for running machine learning pipelines. A DAG manages multiple operations with dependencies. Each operation is defined by a component runtime. This means that operations in a DAG can be jobs, services, distributed jobs, parallel executions, or nested DAGs.
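
As a rough illustration of what "operations with dependencies" implies for execution order, the sketch below uses Python's standard `graphlib` module (Python 3.9+). The operation names are hypothetical and this is not Polyaxon's engine; it only shows that an operation can start once everything it depends on has finished.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical operations, each mapped to the operations it depends on.
dependencies: Dict[str, Set[str]] = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "report": {"train", "evaluate"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every operation follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Operations with no path between them could run concurrently; a real engine would use that freedom instead of a single linear order.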

Architecture

Polyaxon architecture

Documentation

Check out our documentation to learn more about Polyaxon.

Dashboard

Polyaxon comes with a dashboard that shows the projects and experiments created by you and your team members.

To start the dashboard, run the following command in your terminal:

$ polyaxon dashboard -y

Project status

Polyaxon is stable and it's running in production mode at many startups and Fortune 500 companies.

Contributions

Please follow the contribution guidelines: Contribute to Polyaxon.

Research

If you use Polyaxon in your academic research, we would be grateful if you could cite it.

Feel free to contact us; we would love to learn about your project and see how we can support your custom needs.

Owner
polyaxon
A platform for reproducible and scalable machine learning and deep learning on kubernetes
Comments
  • Tensorboard error for the quick-start example

    Describe the bug

    I'm running the examples from the quick-start guide and when I tried to start Tensorboard I got the error:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/polyaxon_k8s/manager.py", line 316, in create_or_update_deployment
        return self.create_deployment(name=name, body=body), True
      File "/usr/local/lib/python3.7/site-packages/polyaxon_k8s/manager.py", line 302, in create_deployment
        namespace=self.namespace, body=body
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 175, in create_namespaced_deployment
        (data) = self.create_namespaced_deployment_with_http_info(namespace, body, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 266, in create_namespaced_deployment_with_http_info
        collection_formats=collection_formats)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 334, in call_api
        _return_http_data_only, collection_formats, _preload_content, _request_timeout)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
        _request_timeout=_request_timeout)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
        body=body)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 266, in POST
        body=body)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 222, in request
        raise ApiException(http_resp=r)
    kubernetes.client.rest.ApiException: (403)
    Reason: Forbidden
    HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Tue, 21 Jan 2020 17:03:28 GMT', 'Content-Length': '374'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.extensions is forbidden: User \"system:serviceaccount:polyaxon:polyaxon-polyaxon-serviceaccount\" cannot create resource \"deployments\" in API group \"extensions\" in the namespace \"polyaxon\"","reason":"Forbidden","details":{"group":"extensions","kind":"deployments"},"code":403}

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/polyaxon_k8s/manager.py", line 319, in create_or_update_deployment
        return self.update_deployment(name=name, body=body), False
      File "/usr/local/lib/python3.7/site-packages/polyaxon_k8s/manager.py", line 309, in update_deployment
        name=name, namespace=self.namespace, body=body
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 4089, in patch_namespaced_deployment
        (data) = self.patch_namespaced_deployment_with_http_info(name, namespace, body, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/apis/extensions_v1beta1_api.py", line 4189, in patch_namespaced_deployment_with_http_info
        collection_formats=collection_formats)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 334, in call_api
        _return_http_data_only, collection_formats, _preload_content, _request_timeout)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
        _request_timeout=_request_timeout)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 393, in request
        body=body)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 286, in PATCH
        body=body)
      File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 222, in request
        raise ApiException(http_resp=r)
    kubernetes.client.rest.ApiException: (403)
    Reason: Forbidden
    HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Tue, 21 Jan 2020 17:03:28 GMT', 'Content-Length': '484'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.extensions \"plx-tensorboard-5aa275f671f64a75924c66323cb0e6a4\" is forbidden: User \"system:serviceaccount:polyaxon:polyaxon-polyaxon-serviceaccount\" cannot patch resource \"deployments\" in API group \"extensions\" in the namespace \"polyaxon\"","reason":"Forbidden","details":{"name":"plx-tensorboard-5aa275f671f64a75924c66323cb0e6a4","group":"extensions","kind":"deployments"},"code":403}

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/polyaxon/polyaxon/scheduler/tensorboard_scheduler.py", line 53, in start_tensorboard
        reconcile_url=get_tensorboard_reconcile_url(tensorboard.unique_name))
      File "/polyaxon/polyaxon/polypod/tensorboard.py", line 234, in start_tensorboard
        reraise=True)
      File "/usr/local/lib/python3.7/site-packages/polyaxon_k8s/manager.py", line 322, in create_or_update_deployment
        raise PolyaxonK8SError(e)
    polyaxon_k8s.exceptions.PolyaxonK8SError: (403)
    Reason: Forbidden
    HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Tue, 21 Jan 2020 17:03:28 GMT', 'Content-Length': '484'})
    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.extensions \"plx-tensorboard-5aa275f671f64a75924c66323cb0e6a4\" is forbidden: User \"system:serviceaccount:polyaxon:polyaxon-polyaxon-serviceaccount\" cannot patch resource \"deployments\" in API group \"extensions\" in the namespace \"polyaxon\"","reason":"Forbidden","details":{"name":"plx-tensorboard-5aa275f671f64a75924c66323cb0e6a4","group":"extensions","kind":"deployments"},"code":403}
    

    To Reproduce

    $ git clone https://github.com/polyaxon/polyaxon-quick-start.git
    $ # run create, init, etc.
    $ polyaxon run -f polyaxonfile_hyperparams.yml
    $ # wait..
    $ polyaxon tensorboard -g 1 start
    

    Expected behavior

    No error.

    Environment

    Kubernetes 1.17 using Kubeadm on a local cluster.

    Let me know if you need more info.

  • Expose configmaps/secrets to build environment

    Hey, I was wondering if I could expose configmaps or secrets to build jobs as well. What I'm trying to do is add some custom apt sources along with a client cert in order to install some internal packages as dependencies. Currently we work around this by installing some packages at runtime.

  • No nodes in cluster and experiments fail to build

    I deployed Polyaxon on Minikube (Mac) and am trying to run experiments using the polyaxon quickstart repo (https://github.com/polyaxon/polyaxon-quick-start.git). However, the experiment build keeps failing, and running 'polyaxon cluster' shows no nodes:

    Cluster info:


    major           1
    minor           10
    compiler        gc
    platform        linux/amd64
    build_date      2018-03-26T16:44:10Z
    git_commit      fc32d2f3698e36b93322a3465f63a14e9f0eaead
    go_version      go1.9.3
    git_version     v1.10.0
    git_tree_state  clean


    When I run 'kubectl get pods --all-namespaces', this is the output:

    NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
    kube-system   coredns-c4cffd6dc-42gcs                         1/1     Running   0          23h
    kube-system   etcd-minikube                                   1/1     Running   0          23h
    kube-system   kube-addon-manager-minikube                     1/1     Running   0          23h
    kube-system   kube-apiserver-minikube                         1/1     Running   0          23h
    kube-system   kube-controller-manager-minikube                1/1     Running   0          23h
    kube-system   kube-dns-86f4d74b45-652fq                       3/3     Running   0          23h
    kube-system   kube-proxy-npxr5                                1/1     Running   0          23h
    kube-system   kube-scheduler-minikube                         1/1     Running   0          23h
    kube-system   kubernetes-dashboard-6f4cfc5d87-p2z4j           1/1     Running   0          23h
    kube-system   storage-provisioner                             1/1     Running   0          23h
    kube-system   tiller-deploy-778f674bf5-xhmsv                  1/1     Running   0          23h
    polyaxon      polyaxon-docker-registry-78d5499fc9-4wm69       1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-api-7b97bb447d-jl6h6          2/2     Running   0          5h
    polyaxon      polyaxon-polyaxon-beat-77fb6cccc7-lmdhw         2/2     Running   0          5h
    polyaxon      polyaxon-polyaxon-events-79c8ff59d9-2rqcq       1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-hpsearch-9b5589f5-874n5       1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-k8s-events-697cf8bb65-mnjz8   1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-logs-7bf467999-b8755          1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-monitors-57db4f7cd7-7x2j5     2/2     Running   0          5h
    polyaxon      polyaxon-polyaxon-resources-glgwq               1/1     Running   0          5h
    polyaxon      polyaxon-polyaxon-scheduler-76ccf9d665-xb9bg    1/1     Running   0          5h
    polyaxon      polyaxon-postgresql-78d4cff55c-jhcvz            1/1     Running   0          5h
    polyaxon      polyaxon-rabbitmq-6448d76c84-vp5ll              1/1     Running   0          5h
    polyaxon      polyaxon-redis-688468649b-tg6qp                 1/1     Running   0          5h

    I have also tried running 'helm update' and upgraded polyaxon to the latest release (0.3.2). How can I troubleshoot this?

  • deleted flagged missed in initialization

    Describe the bug

    Getting this error with version 1.1.9

    image

    To reproduce

    polyaxon upgrade && polyaxon run -f poylaxonfile

    Expected behavior

    Run completed

    Environment

    polyaxon 1.1.9

  • Scheduling many jobs at the same time leads to zombie state jobs (possible race condition?)

    Describe the bug

    It's hard to consistently reproduce, but when scheduling many jobs such that the build happens to be at the same time, it seems like we can get the following scenario: K8s correctly schedules the pods according to their requests/limits and the available resources. Polyaxon however believes that some jobs are running although they are unschedulable by K8s. When freeing up resources quickly enough, K8s actually schedules those jobs and nothing else happens. However, if resources are blocked long enough, Polyaxon's heartbeat service will automatically stop these jobs (that it believes are running although they are unschedulable by K8s) and fail them. To me, this could be a critical bug in the scheduler and really seems like some kind of race condition. I haven't tested it with multiple users, but I assume this would occur if many users submit different jobs at the same time (a likely scenario).

    To Reproduce

    1. Create a job with a fairly large build and long running time (>2000 seconds).
    2. Make sure that only two of these jobs can run on the cluster at a time (by requesting resources accordingly).
    3. Run this job many times with polyaxon run -f polyaxonfile.yml (submit this command again as soon as it terminates and repeat 5 times)

    Expected behavior

    The jobs should just be recognized as unschedulable and scheduled when the resources become available again.
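
For illustration only: Kubernetes reports a pod that cannot be placed as phase `Pending` with a `PodScheduled` condition whose status is `"False"` and whose reason is `Unschedulable`. A status tracker could use that to tell "waiting for resources" apart from "running". The helper below is a hypothetical sketch over a pod object parsed from JSON, not Polyaxon's scheduler code.

```python
from typing import Any, Dict


def is_unschedulable(pod: Dict[str, Any]) -> bool:
    """Return True if the pod is Pending because no node can host it yet."""
    status = pod.get("status") or {}
    if status.get("phase") != "Pending":
        return False
    for condition in status.get("conditions") or []:
        if (
            condition.get("type") == "PodScheduled"
            and condition.get("status") == "False"
            and condition.get("reason") == "Unschedulable"
        ):
            return True
    return False


# Hand-written sample objects shaped like `kubectl get pod -o json` output.
waiting_pod = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}
        ],
    }
}
running_pod = {"status": {"phase": "Running", "conditions": []}}
```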

    Environment

    Polyaxon 0.5.6, Kubernetes 1.15.4

  • Can't use TPU

    Describe the bug

    I tried to use Cloud TPU, but I got the error below in Stackdriver Logging and the experiment failed. It seems that we need to specify the TensorFlow version with an annotation.

    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs","reason":"InternalError","details":{"causes":[{"message":"admission webhook \"pod-init.cloud-tpus.google.com\" denied the request: TensorFlow version must be specified in annotation \"tf-version.cloud-tpus.google.com\" for pod requesting Cloud TPUs"}]},"code":500}
    

    To Reproduce

    YAML

    ---
    version: 1
    
    kind: experiment
    
    environment:
      resources:
        cpu:
          requests: 4
          limits: 4
        memory:
          requests: 15000
          limits: 15000
        tpu:
          requests: 8
          limits: 8
    
    build:
      image: tensorflow/tensorflow:1.12.0
      build_steps:
        - pip install --no-cache-dir -r requirements.txt
    
    run:
      # this is just a dummy python file.
      cmd: python test.py
    

    requirements.txt

    polyaxon-client==0.3.8
    polyaxon-cli==0.3.8
    jupyter
    google-cloud-storage
    

    Expected behavior

    We can create a TPU.

    Environment

    • Polyaxon: 0.3.8

    Links

    • https://cloud.google.com/tpu/docs/kubernetes-engine-setup
    • https://github.com/tensorflow/tpu/blob/master/models/official/resnet/resnet_k8s.yaml#L28
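
The webhook message asks for the `tf-version.cloud-tpus.google.com` annotation on the pod itself. As a sketch of what that means for a pod manifest held as a Python dict: the helper name `with_tf_version` and the version string `"1.12"` (chosen to match the `tensorflow/tensorflow:1.12.0` image above) are assumptions, and whether Polyaxon 0.3.8 offers a way to set pod annotations is not covered here.

```python
import copy
from typing import Any, Dict

# Annotation key quoted in the admission webhook error message.
TF_VERSION_ANNOTATION = "tf-version.cloud-tpus.google.com"


def with_tf_version(pod_manifest: Dict[str, Any], tf_version: str) -> Dict[str, Any]:
    """Return a copy of the manifest with the TPU TensorFlow-version annotation set."""
    patched = copy.deepcopy(pod_manifest)
    metadata = patched.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}
    annotations[TF_VERSION_ANNOTATION] = tf_version
    metadata["annotations"] = annotations
    return patched


# Hypothetical minimal pod manifest; "1.12" is an assumed version string.
manifest = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "tpu-test"}}
patched = with_tf_version(manifest, "1.12")
```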
  • Deploying on Kubernetes cluster created w/ Kubespray

    Hi -

    I'm trying to spin up a Kubernetes cluster without the benefit of a managed service like EKS or GKE, then deploy Polyaxon on that cluster. Currently I'm experiencing some issues on the Polyaxon side of this process.

    To deploy the Kubernetes cluster I'm using kubespray. I'm able to deploy the cluster to the point that kubectl get nodes shows the expected nodes in a ready state, and I'm able to deploy a simple Node.js app as a test. I am not, however, able to successfully install Polyaxon on the cluster.

    I've tried on both AWS and on my local machine using Vagrant/Virtualbox. The issues I'm experiencing are different between the two cases, which I find interesting, so I'll document both.

    AWS

    I deployed Kubernetes by loosely following this tutorial. Things went smoothly for the most part, except that I needed to deal with this issue using this fix. I used 3 t2.large instances running Ubuntu 16.04 and the standard kubespray configuration.

    As I mentioned above, I get the expected output from kubectl get nodes, and I'm able to deploy the Node.js app at the end of the tutorial.

    At first, the Polyaxon installation/deployment also seems to succeed:

    [email protected]:~$ helm install polyaxon/polyaxon \
    > --name=polyaxon \
    > --namespace=polyaxon \
    > -f polyaxon_config.yml
    NAME:   polyaxon
    LAST DEPLOYED: Sat Feb  9 00:03:29 2019
    NAMESPACE: polyaxon
    STATUS: DEPLOYED
    
    RESOURCES:
    ==> v1/Secret
    NAME                             TYPE    DATA  AGE
    polyaxon-docker-registry-secret  Opaque  1     3m4s
    polyaxon-postgresql              Opaque  1     3m4s
    polyaxon-rabbitmq                Opaque  2     3m4s
    polyaxon-polyaxon-secret         Opaque  4     3m4s
    
    ==> v1/ConfigMap
    NAME                      DATA  AGE
    redis-config              1     3m4s
    polyaxon-polyaxon-config  141   3m4s
    
    ==> v1beta1/ClusterRole
    NAME                           AGE
    polyaxon-polyaxon-clusterrole  3m4s
    
    ==> v1beta1/DaemonSet
    NAME                         DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
    polyaxon-polyaxon-resources  2        2        2      2           2          <none>         3m4s
    
    ==> v1beta1/Deployment
    NAME                          DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
    polyaxon-docker-registry      1        1        1           1          3m4s
    polyaxon-postgresql           1        1        1           1          3m4s
    polyaxon-rabbitmq             1        1        1           1          3m4s
    polyaxon-redis                1        1        1           1          3m4s
    polyaxon-polyaxon-api         1        1        1           0          3m4s
    polyaxon-polyaxon-beat        1        1        1           1          3m4s
    polyaxon-polyaxon-events      1        1        1           1          3m4s
    polyaxon-polyaxon-hpsearch    1        1        1           1          3m4s
    polyaxon-polyaxon-k8s-events  1        1        1           1          3m4s
    polyaxon-polyaxon-monitors    1        1        1           1          3m4s
    polyaxon-polyaxon-scheduler   1        1        1           1          3m3s
    
    ==> v1/Pod(related)
    NAME                                           READY  STATUS   RESTARTS  AGE
    polyaxon-polyaxon-resources-hpbcv              1/1    Running  0         3m4s
    polyaxon-polyaxon-resources-m7bjv              1/1    Running  0         3m4s
    polyaxon-docker-registry-58bff6f777-vkl6h      1/1    Running  0         3m4s
    polyaxon-postgresql-f4fc68c67-25t4p            1/1    Running  0         3m4s
    polyaxon-rabbitmq-74c5d87cf6-qlk2b             1/1    Running  0         3m4s
    polyaxon-redis-6f7db88668-99qvw                1/1    Running  0         3m4s
    polyaxon-polyaxon-api-75c5989cb4-ppv7t         1/2    Running  0         3m4s
    polyaxon-polyaxon-beat-759d6f9f96-qdhmd        2/2    Running  0         3m3s
    polyaxon-polyaxon-events-86f49f8b78-vvscx      1/1    Running  0         3m4s
    polyaxon-polyaxon-hpsearch-5f77c8d6cd-gkdms    1/1    Running  0         3m3s
    polyaxon-polyaxon-k8s-events-555f6c8754-c242k  1/1    Running  0         3m3s
    polyaxon-polyaxon-monitors-864dd8fb67-h7s47    2/2    Running  0         3m2s
    polyaxon-polyaxon-scheduler-7f4978774d-pm9xz   1/1    Running  0         3m2s
    
    ==> v1/ServiceAccount
    NAME                                      SECRETS  AGE
    polyaxon-polyaxon-serviceaccount          1        3m4s
    polyaxon-polyaxon-workers-serviceaccount  1        3m4s
    
    ==> v1beta1/ClusterRoleBinding
    NAME                                   AGE
    polyaxon-polyaxon-clusterrole-binding  3m4s
    
    ==> v1beta1/Role
    NAME                            AGE
    polyaxon-polyaxon-role          3m4s
    polyaxon-polyaxon-workers-role  3m4s
    
    ==> v1beta1/RoleBinding
    NAME                                    AGE
    polyaxon-polyaxon-role-binding          3m4s
    polyaxon-polyaxon-workers-role-binding  3m4s
    
    ==> v1/Service
    NAME                      TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)                                AGE
    polyaxon-docker-registry  NodePort      10.233.42.186  <none>       5000:31813/TCP                         3m4s
    polyaxon-postgresql       ClusterIP     10.233.17.56   <none>       5432/TCP                               3m4s
    polyaxon-rabbitmq         ClusterIP     10.233.33.173  <none>       4369/TCP,5672/TCP,25672/TCP,15672/TCP  3m4s
    polyaxon-redis            ClusterIP     10.233.31.108  <none>       6379/TCP                               3m4s
    polyaxon-polyaxon-api     LoadBalancer  10.233.36.234  <pending>    80:32050/TCP,1337:31832/TCP            3m4s
    

    After a few minutes all the expected pods are running:

    [email protected]:~$ kubectl get pods --namespace polyaxon
    NAME                                            READY   STATUS    RESTARTS   AGE
    polyaxon-docker-registry-58bff6f777-vkl6h       1/1     Running   0          3m49s
    polyaxon-polyaxon-api-75c5989cb4-ppv7t          1/2     Running   0          3m49s
    polyaxon-polyaxon-beat-759d6f9f96-qdhmd         2/2     Running   0          3m48s
    polyaxon-polyaxon-events-86f49f8b78-vvscx       1/1     Running   0          3m49s
    polyaxon-polyaxon-hpsearch-5f77c8d6cd-gkdms     1/1     Running   0          3m48s
    polyaxon-polyaxon-k8s-events-555f6c8754-c242k   1/1     Running   0          3m48s
    polyaxon-polyaxon-monitors-864dd8fb67-h7s47     2/2     Running   0          3m47s
    polyaxon-polyaxon-resources-hpbcv               1/1     Running   0          3m49s
    polyaxon-polyaxon-resources-m7bjv               1/1     Running   0          3m49s
    polyaxon-polyaxon-scheduler-7f4978774d-pm9xz    1/1     Running   0          3m47s
    polyaxon-postgresql-f4fc68c67-25t4p             1/1     Running   0          3m49s
    polyaxon-rabbitmq-74c5d87cf6-qlk2b              1/1     Running   0          3m49s
    polyaxon-redis-6f7db88668-99qvw                 1/1     Running   0          3m49s
    

    The issue in this case arises with the LoadBalancer IP, which remains suspended in a pending state:

    [email protected]:~$ kubectl get --namespace polyaxon svc -w polyaxon-polyaxon-api
    NAME                    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
    polyaxon-polyaxon-api   LoadBalancer   10.233.52.219   <pending>     80:30684/TCP,1337:31886/TCP   13h
    
    [email protected]:~$ kubectl get svc --namespace polyaxon polyaxon-polyaxon-api -o json
    {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "creationTimestamp": "2019-02-09T01:03:11Z",
            "labels": {
                "app": "polyaxon-polyaxon-api",
                "chart": "polyaxon-0.3.8",
                "heritage": "Tiller",
                "release": "polyaxon",
                "role": "polyaxon-api",
                "type": "polyaxon-core"
            },
            "name": "polyaxon-polyaxon-api",
            "namespace": "polyaxon",
            "resourceVersion": "17172",
            "selfLink": "/api/v1/namespaces/polyaxon/services/polyaxon-polyaxon-api",
            "uid": "78640925-2c06-11e9-8f3f-121248b9afae"
        },
        "spec": {
            "clusterIP": "10.233.52.219",
            "externalTrafficPolicy": "Cluster",
            "ports": [
                {
                    "name": "api",
                    "nodePort": 30684,
                    "port": 80,
                    "protocol": "TCP",
                    "targetPort": 80
                },
                {
                    "name": "streams",
                    "nodePort": 31886,
                    "port": 1337,
                    "protocol": "TCP",
                    "targetPort": 1337
                }
            ],
            "selector": {
                "app": "polyaxon-polyaxon-api"
            },
            "sessionAffinity": "None",
            "type": "LoadBalancer"
        },
        "status": {
            "loadBalancer": {}
        }
    }
    

    Looking through the Polyaxon issues, I see that this can happen on minikube, but I wasn't able to find anything that helps me debug my particular case. What are the conditions that need to be met in the Kubernetes deployment, in order for the LoadBalancer IP step to succeed?
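
For reference, the `<pending>` value corresponds to the empty `status.loadBalancer` object in the JSON above: Kubernetes fills in `status.loadBalancer.ingress` only after some controller, typically a cloud-provider integration, has provisioned a load balancer for the Service. A small, hypothetical helper that reads the addresses from a Service object parsed from `kubectl get svc -o json` might look like this:

```python
import json
from typing import Any, Dict, List


def external_addresses(service: Dict[str, Any]) -> List[str]:
    """Return the load balancer IPs or hostnames, or [] while still pending."""
    load_balancer = (service.get("status") or {}).get("loadBalancer") or {}
    addresses = []
    for entry in load_balancer.get("ingress") or []:
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses


# Same shape as the pending Service above, trimmed to the relevant fields.
pending_service = json.loads(
    '{"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}'
)
# Hand-written example of a provisioned Service (documentation-range IP).
ready_service = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
```

An empty result here means nothing has assigned an address yet; it does not say why.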

    Vagrant/Virtualbox

    I was suspicious that my issues might be specific to the AWS environment, rather than a general issue with kubespray/polyaxon, so as a second test I tried deploying the Kubernetes cluster locally using Vagrant and Virtualbox. To do this I used the Vagrantfile in the kubespray repo as described here.

    After debugging a couple kubespray issues, I was able to get the cluster up and running and deploy the Node.js app again.

    Deploying Polyaxon, I again saw the issue w/ the LoadBalancer IP getting stuck in a pending state. What was interesting to me though, was that a number of pods actually failed to run as well, despite the fact that the deployment ostensibly succeeded:

    [email protected]:~$ helm ls
    NAME            REVISION        UPDATED                         STATUS          CHART           APP VERSION     NAMESPACE
    polyaxon        1               Sat Feb  9 06:01:21 2019        DEPLOYED        polyaxon-0.3.8                  polyaxon
    
    [email protected]:~$ kubectl get pods --namespace polyaxon
    NAME                                           READY   STATUS    RESTARTS   AGE
    polyaxon-docker-registry-58bff6f777-wlb9p      0/1     Pending   0          36m
    polyaxon-polyaxon-api-6bc75ff4ff-v694k         0/2     Pending   0          36m
    polyaxon-polyaxon-beat-744c96b9f8-mbz5j        0/2     Pending   0          36m
    polyaxon-polyaxon-events-58d9c9cbd6-72skt      0/1     Pending   0          36m
    polyaxon-polyaxon-hpsearch-dc9cf6556-8rh78     0/1     Pending   0          36m
    polyaxon-polyaxon-k8s-events-9f8cdf5-fvqnx     0/1     Pending   0          36m
    polyaxon-polyaxon-monitors-58766747c9-gcf2r    0/2     Pending   0          36m
    polyaxon-polyaxon-resources-rnntm              1/1     Running   0          36m
    polyaxon-polyaxon-resources-t4pv6              0/1     Pending   0          36m
    polyaxon-polyaxon-resources-x9f42              0/1     Pending   0          36m
    polyaxon-polyaxon-scheduler-76bfdcfcc7-d9tq4   0/1     Pending   0          36m
    polyaxon-postgresql-f4fc68c67-lwgds            1/1     Running   0          36m
    polyaxon-rabbitmq-74c5d87cf6-lhvj8             1/1     Running   0          36m
    polyaxon-redis-6f7db88668-6wlgs                1/1     Running   0          36m
    

    I'm not quite sure what's going on here. My best guess would be that the virtual machines don't have the necessary resources to run these pods? ... Would be interesting to hear the experts weigh in 😄.

    Please help!

  • polyaxon/polyaxon-api is start but no service on

    docker log

    Running...
    Use default user
    nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    nginx: configuration file /etc/nginx/nginx.conf test is successful
    Restarting nginx: nginx.
    nginx is running.
    [uWSGI] getting INI configuration from web/uwsgi.nginx.ini
    *** Starting uWSGI 2.0.18 (64bit) on [Tue Aug 18 08:34:22 2020] ***
    compiled with version: 6.3.0 20170516 on 13 August 2020 13:15:05
    os: Linux-4.18.0-193.el8.x86_64 #1 SMP Fri May 8 10:59:10 UTC 2020
    nodename: polyaxon-polyaxon-api-5c8f885949-wjq9p
    machine: x86_64
    clock source: unix
    pcre jit disabled
    detected number of CPU cores: 4
    current working directory: /polyaxon
    detected binary path: /usr/local/bin/uwsgi
    uWSGI running as root, you can use --uid/--gid/--chroot options
    *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
    chdir() to /polyaxon/web/..
    your memory page size is 4096 bytes
    detected max file descriptor number: 1048576
    lock engine: pthread robust mutexes
    thunder lock: enabled
    uwsgi socket 0 bound to UNIX address /polyaxon/web/../web/polyaxon.sock fd 3
    uWSGI running as root, you can use --uid/--gid/--chroot options
    *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
    Python version: 3.7.6 (default, Jan  3 2020, 23:53:24)  [GCC 6.3.0 20170516]
    Python main interpreter initialized at 0x5626c4254800
    uWSGI running as root, you can use --uid/--gid/--chroot options
    *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
    python threads support enabled
    your server socket listen backlog is limited to 100 connections
    your mercy for graceful operations on workers is 60 seconds
    mapped 425960 bytes (415 KB) for 4 cores
    *** Operational MODE: preforking ***
    added /polyaxon/web/../polyaxon/ to pythonpath.
    WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x5626c4254800 pid: 66 (default app)
    uWSGI running as root, you can use --uid/--gid/--chroot options
    *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
    *** uWSGI is running in multiple interpreter mode ***
    spawned uWSGI master process (pid: 66)
    spawned uWSGI worker 1 (pid: 72, cores: 1)
    spawned uWSGI worker 2 (pid: 73, cores: 1)
    spawned uWSGI worker 3 (pid: 74, cores: 1)
    spawned uWSGI worker 4 (pid: 75, cores: 1)
    

    docker image

    polyaxon/polyaxon-gateway                                        1.1.7                 a52bd2a3a36d        4 days ago          473MB
    polyaxon/polyaxon-api                                            1.1.7                 dc1d59a6bff9        4 days ago          590MB
    polyaxon/polyaxon-cli                                            1.1.7                 5ea8e132a2a0        4 days ago          419MB
    

    kubectl --namespace=polyaxon get pod

    NAME                                          READY   STATUS    RESTARTS   AGE
    polyaxon-polyaxon-api-5c8f885949-wjq9p        0/1     Running   4          30m
    polyaxon-polyaxon-gateway-77c4d46d4d-t85ww    1/1     Running   0          30m
    polyaxon-polyaxon-operator-7f48b54676-mh48l   1/1     Running   0          30m
    polyaxon-polyaxon-streams-7c4876dc54-jh2p6    1/1     Running   0          30m
    polyaxon-postgresql-0                         1/1     Running   0          30m
    

    helm version

    Client: &version.Version{SemVer:"v2.16.10", GitCommit:"bceca24a91639f045f22ab0f41e47589a932cf5e", GitTreeState:"clean"}
    Server: &version.Version{SemVer:"v2.16.10", GitCommit:"bceca24a91639f045f22ab0f41e47589a932cf5e", GitTreeState:"clean"}
    

    kubectl version

    Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-13T16:12:48Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
    
  • Logs are not displayed correctly in terminal

    Describe the bug

    Unable to see the logs correctly. The only things visible in the terminal are callback errors:

    $ polyaxon experiment -xp X logs
    building -- 
    scheduled -- 
    starting -- 
    running -- 
    error from callback <function SocketTransportMixin.socket.<locals>.<lambda> at 0x7fd723146400>: the JSON object must be str, not 'bytes'
    error from callback <function SocketTransportMixin.socket.<locals>.<lambda> at 0x7fd723146400>: the JSON object must be str, not 'bytes'
    error from callback <function SocketTransportMixin.socket.<locals>.<lambda> at 0x7fd723146400>: the JSON object must be str, not 'bytes'
    error from callback <function SocketTransportMixin.socket.<locals>.<lambda> at 0x7fd723146400>: the JSON object must be str, not 'bytes'
    error from callback <function SocketTransportMixin.socket.<locals>.<lambda> at 0x7fd723146400>: the JSON object must be str, not 'bytes'
    ...
    error from callback <bound method SocketTransportMixin._on_close of <polyaxon_client.transport.Transport object at 0x7fd723190978>>: _on_close() missing 1 required positional argument: 'ws'
    

    To Reproduce

    Started an experiment with polyaxon run -u, then viewed the logs with polyaxon experiment -xp X logs.

    Experiment:

    https://github.com/polyaxon/polyaxon-examples/tree/master/tensorflow/cifare10/polyaxonfile.yml

    Expected behavior

    Building -- creating image -
      master.1 -- INFO:tensorflow:Using config: {'_model_dir': '/outputs/root/cifar10/experiments/1', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_session_config': gpu_options {
      master.1 --   force_gpu_compatible: true
      master.1 -- }
    

    Environment

    Local

    The Polyaxon CLI is running within a virtualenv using Python 3.

    Cluster

    OS: Ubuntu 18.04 Kubernetes: 1.12.1
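The repeated callback error in the log above ("the JSON object must be str, not 'bytes'") is the message that json.loads raises on Python 3.5 and earlier when it is handed bytes; from Python 3.6 on, json.loads accepts bytes directly. A minimal sketch of a defensive decode step follows. The helper name parse_ws_message is hypothetical and is not part of polyaxon_client; it only illustrates decoding a socket payload before parsing, and it assumes the payload is UTF-8 encoded JSON.

```python
import json


def parse_ws_message(message):
    """Parse a JSON payload that may arrive as bytes or as str.

    Hypothetical helper for illustration; not part of polyaxon_client.
    """
    if isinstance(message, (bytes, bytearray)):
        # Assumption: the payload is UTF-8 encoded JSON.
        message = message.decode("utf-8")
    return json.loads(message)


# Both forms parse to the same dictionary.
print(parse_ws_message(b'{"status": "running"}'))
print(parse_ws_message('{"status": "running"}'))
```

Whether the fix belongs in the client or in the local Python version is not something the report settles; the sketch only shows the kind of guard that avoids the error.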

  • "cluster-admin not found" error while installing polyaxon with helm

    I am using minikube to set up a local single-node Kubernetes cluster. I have set up Helm as described in the docs, but when I try to deploy Polyaxon by following the docs, I get an error.

    temp-training:~ shivam.m$ helm install --wait polyaxon/polyaxon
    Error: release rousing-peahen failed: clusterroles.rbac.authorization.k8s.io "rousing-peahen-polyaxon-ingress-clusterrole" is forbidden: attempt to grant extra privileges: [PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["configmaps"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["endpoints"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["pods"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["secrets"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["secrets"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["nodes"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["get"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["list"]} PolicyRule{Resources:["services"], APIGroups:[""], Verbs:["watch"]} PolicyRule{Resources:["ingresses"], APIGroups:["extensions"], Verbs:["get"]} PolicyRule{Resources:["ingresses"], APIGroups:["extensions"], Verbs:["list"]} PolicyRule{Resources:["ingresses"], APIGroups:["extensions"], Verbs:["watch"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["create"]} PolicyRule{Resources:["events"], APIGroups:[""], Verbs:["patch"]} PolicyRule{Resources:["ingresses/status"], APIGroups:["extensions"], Verbs:["update"]}] user=&{system:serviceaccount:kube-system:tiller 8e197f15-1373-11e8-9b02-080027bbca2c [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]} ownerrules=[] ruleResolutionErrors=[clusterroles.rbac.authorization.k8s.io "cluster-admin" not found]

    I tried disabling RBAC and running it again, but then I get an error related to port allocation:

    temp-training:~ shivam.m$ helm install --set=rbac.enabled=false polyaxon/polyaxon
    Error: release mortal-gorilla failed: Service "mortal-gorilla-docker-registry" is invalid: spec.ports[0].nodePort: Invalid value: 31813: provided port is already allocated

  • Unable to run experiments with v1.1.8

    Describe the bug

    Unable to run experiments with the new version 1.1.8:

    Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f168f918700>: Failed to establish a new connection: [Errno 111] Connection refused')

    This seems to come from tracking.init().

    Also, when running polyaxon project ls (only the first time):

    Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f030fb6dbe0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/v1/compatibility/cb08b595c6be5fe48fcbaf4860dd900c/1-1-8/cli
    Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f030fb6dc88>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/v1/compatibility/cb08b595c6be5fe48fcbaf4860dd900c/1-1-8/cli
    Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f030fb6dd68>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /api/v1/compatibility/cb08b595c6be5fe48fcbaf4860dd900c/1-1-8/cli
    Could not connect to remote server to fetch compatibility versions.
    Checking CLI compatibility version ...
    Could get the min/latest versions from compatibility API.
    

    However, if I run it again, it works as expected.

    To Reproduce

    version: 1.1
    kind: component
    name: simple-experiment
    description: Minimum information to run this TF.Keras example
    tags: [examples]
    run:
      kind: job
      init:
      - git: {url: "https://github.com/polyaxon/polyaxon-quick-start"}
        container:
          env:
            - name: http_proxy
              value: "***"
            - name: https_proxy
              value: "***"
      container:
        image: polyaxon/polyaxon-quick-start
        workingDir: "{{ globals.artifacts_path }}/polyaxon-quick-start"
        command: [python3, model.py]
        env:
          - name: http_proxy
            value: "***"
          - name: https_proxy
            value: "***"
    

    Expected behavior

    A running experiment.

    Environment

    deploymentChart: platform
    deploymentVersion: 1.1.8
    
    artifactsStore:
      name: minio
      kind: s3
      schema: {"bucket": "***"}
      secret:
        name: "***"
    
    connections:
      - name: data
        kind: volume_claim
        schema:
          mountPath: ***
          volumeClaim: ***
          readOnly: true
    
    scheduler:
      enabled: true
    
    streams:
      enabled: true
    
    postgresql:
      persistence:
        enabled: true
        storageClass: nfs
    
    redis:
      enabled: true
      master:
        persistence:
          enabled: true
          storageClass: nfs
      slave:
        persistence:
          enabled: true
          storageClass: nfs
    broker: redis
    
    rabbitmq-ha:
      enabled: false
    
    ui:
      enabled: true
      adminEnabled: true
    
  • Incorrect operation state when operation cannot be found

    Current behavior

    Currently, Polyaxon doesn't mark an operation as stopped when it cannot find it, for whatever reason.

    HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"operations.core.polyaxon.com \"plx-operation-e38d8ec730424cd28c3941ebbe980b88\" not found","reason":"NotFound","details":{"name":"plx-operation-e38d8ec730424cd28c3941ebbe980b88","group":"core.polyaxon.com","kind":"operations"},"code":404}
    

    Enhancement

    I would like this changed so that the operation is marked as stopped, not found, or similar when Polyaxon can't find it. The reported running state is not factual, which makes integrating Polyaxon into a workflow unreliable.
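As a rough illustration of the requested behavior, the sketch below maps a Kubernetes "NotFound" response like the one quoted above to a terminal status instead of leaving the operation as running. The function name resolve_status and the status label "stopped" are hypothetical; this is not Polyaxon's implementation, and the real status names may differ.

```python
import json

# Hypothetical label for operations that no longer exist in the cluster.
NOT_FOUND_STATUS = "stopped"

# Response body quoted in the issue, split across lines for readability.
ISSUE_BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"operations.core.polyaxon.com '
    r'\"plx-operation-e38d8ec730424cd28c3941ebbe980b88\" not found",'
    r'"reason":"NotFound","details":{"name":'
    r'"plx-operation-e38d8ec730424cd28c3941ebbe980b88",'
    r'"group":"core.polyaxon.com","kind":"operations"},"code":404}'
)


def resolve_status(current_status, http_code, response_body):
    """Return the status to record after looking up an operation.

    If the cluster answers 404 with reason "NotFound", the operation is
    gone, so the previously recorded status is replaced by a terminal one.
    Any other response leaves the recorded status unchanged.
    """
    try:
        payload = json.loads(response_body)
    except (TypeError, ValueError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    missing = http_code == 404 and payload.get("reason") == "NotFound"
    return NOT_FOUND_STATUS if missing else current_status


print(resolve_status("running", 404, ISSUE_BODY))
```

Checking both the HTTP code and the reason field keeps an unrelated 404 (for example, from a proxy returning a non-JSON page) from being treated as a missing operation.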

  • How to build the polyaxon/polyaxon-operator image?

    Describe the problem

    How do I build the polyaxon/polyaxon-operator image? I didn't find the related Dockerfile. Is it built from https://github.com/polyaxon/mloperator?

  • Polyaxon can't get plxlogs for pytorchjob in dashboard

    When I run a pytorchjob, I can't get the plxlogs in the dashboard after the job has finished.

    But if I click the job's logs button in the dashboard before the job finishes, the plxlogs are collected.

    If I run a regular job, the plxlogs work normally.

    yaml config

    version: 1
    kind: component
    tags: [examples, pytorch, kubeflow]
    run:
      kind: pytorchjob
      master:
        replicas: 1
        init:
          - git: {"url": "https://github.com/polyaxon/polyaxon-examples"}
        container:
          image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
          command: ["sh", "-c", "python -u {{ globals.artifacts_path }}/polyaxon-examples/in_cluster/kubeflow/pytorchjob/mnist.py"]
          resources:
            requests:
              nvidia.com/gpu: 1
      worker:
        replicas: 1
        init:
          - git: {"url": "https://github.com/polyaxon/polyaxon-examples"}
        container:
          image: pytorch/pytorch:1.0-cuda10.0-cudnn7-runtime
          command: ["sh", "-c", "python -u {{ globals.artifacts_path }}/polyaxon-examples/in_cluster/kubeflow/pytorchjob/mnist.py"]
          resources:
            requests:
              nvidia.com/gpu: 1

  • Preserve an artifact staged status when copied / transferred across projects

    Use case

    There is the immediate feature that the title describes, and there is a more general use case: accepting the model / artifact --name in plx models stage. With multiple models that each have many versions, two models with different names can share the same version (there shouldn't be a restriction on version names across different models), and one may want to change the status of a single one via

    plx models stage -p PROJECT -n MODEL -ver rc0 -to=production
    

    where by -n I mean the artifact name (i.e., the name used in plx model register --artifact).

    Alternatives

    The alternative is to run plx models stage again, but that is not possible in the global model registry when there are multiple models, since plx models stage has no --name option corresponding to the model / artifact name.
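To make the ambiguity concrete, here is a small sketch of a registry keyed by (model name, version). Everything in it is hypothetical (the dictionary layout, the model names, the stage and matches_by_version helpers) and it does not reflect Polyaxon's actual data model; it only shows that a version tag alone can match several models, while the name plus the version selects exactly one.

```python
# Hypothetical in-memory registry: (model name, version) -> stage.
registry = {
    ("model-a", "rc0"): "staging",
    ("model-b", "rc0"): "staging",
}


def matches_by_version(registry, version):
    """Return the model names that a version-only lookup would match."""
    return sorted(name for (name, ver) in registry if ver == version)


def stage(registry, name, version, to):
    """Change the stage of exactly one (name, version) entry."""
    key = (name, version)
    if key not in registry:
        raise KeyError("no version {!r} for model {!r}".format(version, name))
    registry[key] = to


# A version-only lookup is ambiguous: both models have an "rc0" version.
print(matches_by_version(registry, "rc0"))

# With the name included, only the intended entry changes.
stage(registry, "model-a", "rc0", to="production")
print(registry)
```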
