I am a technical staff member at Lincoln Lab at MIT. Previously, I was a research scientist at Georgia Tech and a NAS fellow at NRL. I got my PhD in Computer Science from Georgia Tech in 2012. My research is broadly focused on applications of machine reasoning and machine learning to natural language processing problems. I am currently investigating data-driven model discovery systems in an effort to tackle the problem of automated machine learning.
| || || ||1||2||3||4|
|Sublinear time algorithms. Even as recently as early 2000's, we would be happy to settle for a polynomial time algorithm for any non trivial algorithmic problem. It would had been hard to imagine doing better than linear time; after all, we expect algorithms to consider all of its input in order to make a decision. How the times have changed! For the kinds of data sets that I have been dealing with these days, linear is not good enough. The prevalence of extremely large data sets in a wide variety of settings is growing rapidly and beating linear time has become a matter of practical concern as opposed to mere academic curiosity. One solution often touted to this problem is the MapReduce programming model - processing data sets with parallel, distributed algorithms on large clusters. Although this works well in some cases, it's utility has been blown out of proportion in my humble opinion. In many cases, this paradigm is not applicable or does not yield any significant advantage. This is especially true in many big graph data sets. |
If linear time is not good enough, it is natural to wonder the existence of sublinear time algorithms for some of these problems. In fact, there has been a lot of recent interest in this direction. Sublinear time is a crazy concept - it allows algorithms to process only a small fraction of the input and yet demands a reasonable solution. There are a handful of problems for which deterministic sublinear time algorithms are known. But in a majority of cases, the algorithm must use randomization and give approximate answers. There is a growing body of work in this area and recent results have shown that there are classical optimization problems whose values can be approximated in sublinear time. Also, property testing has been applied to give sublinear algorithms for a variety of problems. Further, several cool techniques like low rank approximation of matrices have emerged for designing sublinear algorithms. This is all good news! Still, the scope of sublinear algorithms remains minuscule compared to the problems out there with existing solutions considered efficient a decade ago but are no longer acceptable today.
I am grappling with one such graph matching problem presently, namely, maximum weighted matching (MWM) problem. MWM is a classic graph problem for which exact polytime solutions have been known for some time. But for massive graphs this is a non-starter. There is some (relatively) recent work where linear-time approximate algorithms have been worked out for the unweighted case (Drake & Hougardy, 2003; Pettie & Sanders, 2004). I am trying these, but strongly suspect that they may also not fit the bill. Meanwhile, I don't think there are any known sublinear time solutions for this problem. Hope theoreticians are listening!
- Drake, D. E., & Hougardy, S. (2003). Improved linear time approximation algorithms for weighted matchings. In Approximation, Randomization, and Combinatorial Optimization.. Algorithms and Techniques (pp. 14-23). Springer Berlin Heidelberg.
- Pettie, S., & Sanders, P. (2004). A simpler linear time 2/3− ε approximation for maximum weight matching. Information Processing Letters, 91(6), 271-276.
|Results for the 2016 NIST Speaker Recognition Evaluation was announced today. I was part of the MIT team which was placed second in a pool of more than 45 serious research teams competing from around the world. My contribution included developing a backend for calibrating and fusing several different speaker recognition models developed by other researchers on my team. Calibration is a serious issue when there are several different models and we have to combine their individual predictions in order to make an overall prediction. This issue arises because different models have their own quirks - for example, some models tend to predict probabilities conservatively (meaning closer to mid-range), some toward extremes, and others none at all (true conditional probabilities are unknown). If your metric cares about exact probabilities (e.g., logarithmic loss), we need to calibrate the models before fusing their predictions to get an average estimate. This involves a post-processing step where you learn the characteristics of the model behavior. There are two popular methods for calibration: Platt's scaling and Isotonic regression. Platt's scaling amounts to training a logistic regression model on the estimator outputs. In Isotonic regression, the idea is to fit a piecewise-constant monotonically increasing function (e.g., stair shaped function) instead of logistic regression. Alas, I cannot disclose the secret sauce that went into our backend calibrator yet.|
A guy from Montana trailer park overturned 150 years of biology. Biology textbooks tell us that lichens are symbiotic alliances between two species - an alga and a fungus. But Toby Spribille, raised in a Montana trailer park and home-schooled by what he now describes as a “fundamentalist cult” overthrew that idea. He fell in love with science, but had no way of feeding that love with higher education because he didn't have formal education and transcripts. But one German university overlooked that fact and offered him a chance to pursue college and postgraduate education, where he studied lichens.
| ||“There’s been over 140 years of microscopy. The idea that there’s something so fundamental that people have been missing is stunning” - Toby Spribille|
Lichens have an important place in biology. Early scientists thought that they were plants. But 150 years ago it was revealed that they’re composite organisms, consisting of fungi that live in partnership with algae. However, in all these years, biologists have tried in vain to grow lichens in laboratories. Whenever they artificially united the fungus and the alga, the two would never fully recreate their natural structures. It was as if something was missing. Toby showed that lichens are not alliances between two organisms. Instead, they’re alliances between three. All this time, a second type of fungus has been hiding in plain view. Read more here.
Fascinating science and a great example of why innate interest in science should be nurtured in all students.
|Imbalance is a common occurrence in real-world datasets wherein the classes do not have (roughly) equal priors. For example, you may have a binary classification problem with 1000 samples. A total of 900 samples are labeled 0 and the remaining 100 samples are labeled 1. This is an imbalanced dataset and the ratio of Class-0 to Class-1 instances is 900:100 or more concisely 9:1. You can have a class imbalance problem on two-class classification problems as well as multi-class classification problems. Imbalanced data pose serious challenges to the task of classification. Dealing with imbalanced classes can be frustrating. You may discover that all the great results you were getting is a lie (i.e., accuracy paradox). Your accuracy measures might tell you that your models are doing great, but this accuracy might only be reflecting the underlying class distribution. In other words, your model is very likely predicting one class regardless of the data it is asked to predict. One approach to combating imbalanced data is to under-sample the majority class. This increases the sensitivity of the classifier to the minority class, but you are throwing away valuable data. Or you can over-sample the minority class which blunts the sensitivity of the classifier to the majority class. Or you can do both, intelligently. This is precisely what is behind the SMOTE technique (Chawla et. al., 2002). A combination of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. If you are frustrated by imbalanced classes, check out the SMOTE implementation in the imbalanced-learn Python package.|
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357.
|Representing data as matrices is ubiquitous in Machine Learning (ML). But the challenge for scalable ML systems is that these structures can become enormous, with millions or hundred of millions of rows. One way to mitigate this problem is to reduce the size of the matrices by leaving out a bunch of rows. But in order for computations on them to yield approximately the right results, the remaining rows have to be in some sense representative of those that were omitted. Cohen and Peng (2014) have developed an algorithm to efficiently sample the rows of a matrix while preserving the p-norms of its product with vectors, finding the smallest possible approximation of the original matrix that guarantees reliable computations. They demonstrate that their algorithm is optimal for condensing matrices under any norm. This is an important contribution towards making ML more scalable.|
Last night I watched the movie "The man who knew infinity," a biopic of the Indian math genius Ramanujan. I enjoyed the movie and felt that it was a much better movie compared to "The imitation game," about the life of another great mathematician Alan Turing. Some dramatic license was taken to make the movie, but that's to be expected. Interestingly, I found out that Ken Ono was involved in the making of the movie. He is a well-known number theorist at Emory University (in Atlanta) and I happened to attend one of his talks in 2011. Ono and his student put out a paper on arXiv recently called "The 1729 K3 Surface," where they revisit the famous "taxi-cab" number. After studying Ramanujan's writings first-hand, they found that he was thinking about K3 surfaces several decades before the concept was even coined by the leading mathematicians. This adds yet another chapter to the list of spectacular recent discoveries involving Ramanujan's notebook. K3 surfaces is an exceedingly difficult mountain to climb and an important next frontier in mathematics. That Ramanujan gave remarkable examples illustrating some of their features in 1919 simply blows my mind.
|A lot of datasets these days come in the form of large and complex networks. They describe intricate systems with entities being modeled as nodes and their relationships as edges. These networks contain a wealth of information, but that information is often hidden within network patterns which are difficult to uncover. While deciphering these patterns is crucial, computational analyses of large networks are often intractable. So, many questions we ask about the world cannot be answered exactly even with unlimited compute power and time. Our only hope is to answer these questions approximately (i.e., heuristically) and prove how far the approximate answer is from the exact one in the worst case. So far, a majority of scalable heuristics for network analysis consider only simple descriptors of a network such as node count and degree, capturing only its lower-order connectivity patterns. But they suffer from a severe limitation: two networks can appear similar when viewed through the lens of simple descriptors, but can reveal a very different connectivity structure when we change the lens to include higher-order descriptors called graphlets (small subgraphs). While this limitation has been known for some time, nothing could be done because this was a tough nut to crack. A scalable technique for uncovering higher-order organization of large networks - at the level of graphlets - was not on the horizon. But to my surprise, I came across a recent article by Benson et al. titled "Higher-order organization of complex networks" in Science (2016) where they propose a new framework for clustering networks on the basis of higher-order connectivity patterns, with guarantees on the optimality of obtained clusters and which scales to billions of edges. This is such an amazing result. Kudos to Benson at al.|
- Benson, A. R., Gleich, D. F., & Leskovec, J. (2016). Higher-order organization of complex networks. Science, 353(6295), 163-166.
|Extreme theorem-proving!!! Researchers have solved a single math problem (boolean Pythagorean Triples problem) using brute-force (SAT solvers on supercomputers, yay!), producing the largest proof ever - over 200 terabytes in uncompressed length. How did they check it? If a computer generated proof is too big for humans to check, what can be said of its epistemic status? That aside, this is big deal. It shows how far we have come since the time when the SAT problem's NP-completeness was understood in the early 1970's. We seem to be readily embracing SAT solvers instead of avoiding them like plague for tractability reasons. The solver community is optimistic that they can solve most SAT problems arising in practical applications. Confused? An explanation for this gap is provided by Dick Lipton in one of his blog entries.|
|The lab that I consider my academic home at Georgia Tech was recently in the news for yet another epic feat. They developed and deployed an AI teaching assistant (TA) called Jill that the students could not tell from the other human TA's until it was announced at the end of the semester. The best part, Jill was TA'ing for an AI course!! It was also featured in the Washington Post.|
|Today is the day of the IJCAI Goal Reasoning workshop. Wish I was there! As a program committee member, I was fortunate to review three excellent papers for this workshop. This workshop has grown in stature quite a bit since it's first meeting in 2010 - case in point, AI Communications has agreed to host a Special Issue on Goal Reasoning and selected papers from the workshop will be invited to participate in the special issue.|