Local Data Analytics Stack (Airflow + Superset)

2020/04/08 · 3 minute read

airflow analytics covid19 data data-analytics data-engineering docker docker-compose python superset

At work, the BI environment is often set up and ready to go. At home, when I need to do data analysis myself, it really helps to have data pipeline and visualization tools ready to go as well. Over time, I’ve developed a go-to open source data analytics stack that runs on my local machine. The repo, https://github.com/l1990790120/local-data-stack, is self-contained. In this post, I’ll share a bit more detail on how it works and how to use it.

Classify Unbalanced Cases with Harvard edX MOOC Dataset

2016/05/16 · 6 minute read

data-viz education highered machine-learning python

I’ve run a couple of classification ML algorithms on the dataset. What makes this problem interesting is that most of the students did not pass the course, so the classes are unbalanced. I’ve re-sampled the positive cases multiple times to make the algorithms penalize misclassified positive cases more severely.

Draw with US College Data

2016/05/04 · 233 minute read

data-vis education highered python

A showcase of what you can do with the IPEDS data API: a choropleth built with d3. I’ve also tried this in Beaker Notebook. More details to come!

Kaggle's Airbnb New User Booking Trend

2015/12/29 · 438 minute read

d3 data-vis deep-learning kaggle machine-learning python timeseries

Using mpld3 for visualization in IPython with Kaggle’s Airbnb data. The first experience was great!

College Enrollment Forecast (Institution Level)

2015/12/14 · 700 minute read

education forecast highered python timeseries

Using ARIMA to forecast college enrollment for 2015 and 2016 at the institution level.

College Enrollment Forecast (State Level)

2015/12/08 · 7 minute read

education forecast highered python timeseries

Using ARIMA to forecast college enrollment for 2015 and 2016 at the state level.