The paper proposes to select unlabelled training examples based on the embedding distance between a given exemplar and the query data, using a pretrained BERT model to compute embeddings for the training examples. The reviewers all appreciated the problem formulation of selecting balanced labels from a highly skewed training set, as well as the complexity bound. The general consensus is that the paper makes an interesting contribution to active learning methods for word sense disambiguation. The current version would be greatly strengthened by evaluation on more datasets, and its clarity could be improved by presenting the proposed active learning method in an algorithm box.
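The selection step described above can be sketched roughly as follows; this is a toy illustration only, not the authors' code. The function name and the random toy vectors are assumptions, and in the paper the embeddings would come from a pretrained BERT model rather than a random generator:

```python
import numpy as np

def select_by_embedding_distance(exemplar_emb, pool_embs, k):
    """Return indices of the k pool examples closest to the exemplar
    by cosine distance (a common choice; the paper's exact metric may differ)."""
    a = exemplar_emb / np.linalg.norm(exemplar_emb)
    b = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    dists = 1.0 - b @ a          # cosine distance to the exemplar
    return np.argsort(dists)[:k]  # k nearest unlabelled examples

# Toy stand-ins for BERT sentence embeddings.
rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 8))              # unlabelled pool
exemplar = pool[42] + 0.01 * rng.normal(size=8)  # query near example 42
picked = select_by_embedding_distance(exemplar, pool, k=5)
```

In this sketch the example nearest to the exemplar (index 42) is ranked first; a full active-learning loop would also enforce the balanced-label constraint the reviewers highlight.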