This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis

Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city

Information about data set

The dataset contains, roughly, TWO groups of files: ● Uber trip data from 2014 (April - September), separated by month, with detailed location information. ● Uber trip data from 2015 (January - June), with less fine-grained location information.

Uber trip data from 2014 There are six files of raw data on Uber pickups in New York City from April to September 2014. The files are separated by month and each has the following columns: ● Date/Time : The date and time of the Uber pickup ● Lat : The latitude of the Uber pickup ● Lon : The longitude of the Uber pickup ● Base : The TLC base company code affiliated with the Uber pickup. These files are named:

● uber-raw-data-apr14.csv ● uber-raw-data-aug14.csv ● uber-raw-data-jul14.csv ● uber-raw-data-jun14.csv ● uber-raw-data-may14.csv ● uber-raw-data-sep14.csv

Uber trip data from 2015

Also included is the file uber-raw-data-janjune-15.csv This file has the following columns: ● Dispatching_base_num : The TLC base company code of the base that dispatched the Uber. ● Pickup_date : The date and time of the Uber pickup ● Affiliated_base_num : The TLC base company code affiliated with the Uber pickup. ● locationID : The pickup location ID affiliated with the Uber pickup These files are named:

  • uber-raw-data-janjune-15.csv

motive of Project

To analyze the data of the customer rides and visualize the data to find insights that can help improve business. Data analysis and visualization is an important part of data science. They are used to gather insights from the data and with visualization you can get quick information from the data.

How to Run the Project

In order to run the project just download the data from above mentioned source then run any file.

Prerequisites

You need to have installed following softwares and libraries in your machine before running this project.

Python 3 Anaconda: It will install ipython notebook and most of the libraries which are needed like sklearn, pandas, seaborn, matplotlib, numpy, scipy.

Installing

Python 3: https://www.python.org/downloads/ Anaconda: https://www.anaconda.com/download/

Authors

DEVA DEEKSHITH and kilari jaswanth(https://github.com/Kilarijaswanth)- combined work

Similar Resources

PLUR is a collection of source code datasets suitable for graph-based machine learning.

PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets suitable for graph-based machine learning. We provide scripts for downloading, processing, and loading the datasets. This is done by offering a unified API and data structures for all datasets.

Sep 10, 2022

A library of extension and helper modules for Python's data analysis and machine learning libraries.

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

Sep 21, 2022

A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Sep 20, 2022

Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

Kats is a toolkit to analyze time series data, a lightweight, easy-to-use, and generalizable framework to perform time series analysis.

Kats, a kit to analyze time series data, a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, detecting change points and anomalies, to forecasting future trends.

Sep 26, 2022

MaD GUI is a basis for graphical annotation and computational analysis of time series data.

MaD GUI is a basis for graphical annotation and computational analysis of time series data.

MaD GUI Machine Learning and Data Analytics Graphical User Interface MaD GUI is a basis for graphical annotation and computational analysis of time se

Aug 25, 2022

K-means clustering is a method used for clustering analysis, especially in data mining and statistics.

K-means clustering is a method used for clustering analysis, especially in data mining and statistics.

K Means Algorithm What is K Means This algorithm is an iterative algorithm that partitions the dataset according to their features into K number of pr

Nov 1, 2021

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

Sep 19, 2022

This repo includes some graph-based CTR prediction models and other representative baselines.

Graph-based CTR prediction This is a repository designed for graph-based CTR prediction methods, it includes our graph-based CTR prediction methods: F

Sep 15, 2022

A single Python file with some tools for visualizing machine learning in the terminal.

A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Sep 11, 2022
A comprehensive repository containing 30+ notebooks on learning machine learning!
A comprehensive repository containing 30+ notebooks on learning machine learning!

A comprehensive repository containing 30+ notebooks on learning machine learning!

Sep 19, 2022
Pytools is an open source library containing general machine learning and visualisation utilities for reuse
Pytools is an open source library containing general machine learning and visualisation utilities for reuse

pytools is an open source library containing general machine learning and visualisation utilities for reuse, including: Basic tools for API developmen

Aug 19, 2022
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar

Sep 21, 2022
Meerkat provides fast and flexible data structures for working with complex machine learning datasets.
Meerkat provides fast and flexible data structures for working with complex machine learning datasets.

Meerkat makes it easier for ML practitioners to interact with high-dimensional, multi-modal data. It provides simple abstractions for data inspection, model evaluation and model training supported by efficient and robust IO under the hood.

Sep 16, 2022
Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms
Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Jan 31, 2022
Sep 18, 2022
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

Sep 15, 2022
Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets
 Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets

Interactive Web App with Streamlit and Scikit-learn that applies different Classification algorithms to popular datasets Datasets Used: Iris dataset,

Nov 18, 2021
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

Sep 20, 2022
Combines Bayesian analyses from many datasets.
Combines Bayesian analyses from many datasets.

PosteriorStacker Combines Bayesian analyses from many datasets. Introduction Method Tutorial Output plot and files Introduction Fitting a model to a d

Feb 13, 2022