2020/08/02
·
7
minute read
 
Almost in every AI or Machine Learning conferences I’ve been to lately, there’s a track dedicating to biases or “injustices” in algorithmic decisions. Books have been published (Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, Algorithms of Oppression: How Search Engines Reinforce Racism etc.) and fear has been spread (Elon Musk says AI development should be better regulated, even at Tesla ).
The fear of unknown is, perhaps, more persuasive than a realistic survey of the state of AGI (Artificial General Intelligence) development.
2020/02/01
·
7
minute read
 
Data engineers rarely have a say in what’s coming in the systems we’ve built. This presents great challenges where data systems often need to be tolerant about unseen events and at the same time have extra monitoring or QA processes to allow human to determine if the exception actually signals a broader system failure. Machine learning systems have brought this challenge to a new level - in data pipelines, system failures are mostly deterministic or at least reproducible when certain conditions are met.
2016/05/16
·
6
minute read
 
I’ve run a couple classification ML algorithm on the dataset. What makes this problem interesting is that most of the students did not pass the course. I’ve re-sampled the positive cases multiple times to make the algorithms punish the false positive cases more severely.
 
2015/12/29
·
438
minute read
 
Using mpld3 to do visualization in ipython with Kaggle’s airbnb data. First experience is great!