Great Packages for Data Science in Python and R
by Hang Li
Domino’s Chief Data Scientist, Eduardo Ariño de la Rubia, talks about Python and R as the “best” language for data scientists.
Below is a list of useful packages from this talk.
Python
- Feather – Fast, interoperable binary data frame storage for Python, R, and more, powered by Apache Arrow
- Ibis – Productivity-centric Python data analysis framework for SQL systems and the Hadoop platform. Co-founded by the creator of pandas
- Paratext – A library for reading text files over multiple cores.
- Bcolz – A columnar data container that can be compressed.
- Altair – Declarative statistical visualization library for Python
- Bokeh – Interactive Web Plotting for Python
- Blaze – NumPy and Pandas interface to Big Data
- xarray – N-D labeled arrays and datasets in Python
- Dask – Versatile parallel programming with task scheduling
- Keras – High-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano.
- PyMC3 – Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano
R
- Feather – Fast, interoperable binary data frame storage for Python, R, and more, powered by Apache Arrow
- Haven – Import foreign statistical formats into R via the embedded ReadStat C library.
- readr – Read flat/tabular text files from disk (or a connection).
- jsonlite – A fast JSON parser and generator optimized for statistical data and the web.
- ggplot2 – A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”.
- htmlwidgets – A framework for creating HTML widgets that render in various contexts including the R console, ‘R Markdown’ documents, and ‘Shiny’ web applications.
- leaflet – Create and customize interactive maps using the ‘Leaflet’ JavaScript library and the ‘htmlwidgets’ package.
- tilegramsR – Provide R spatial objects representing Tilegrams.
- dplyr – A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
- broom – Convert statistical analysis objects from R into tidy data frames
- tidytext – Text mining for word processing and sentiment analysis using ‘dplyr’, ‘ggplot2’, and other tidy tools.
- mxnet – The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
- tensorflow – TensorFlow™ is an open source software library for numerical computation using data flow graphs.