Big Code

Mining Software Repositories and Software Analytics

Big Code is about analyzing the vast amounts of rich data (hence the term Big) available in software repositories to uncover interesting and actionable insights about software systems and projects, and to enable a new class of applications that leverage these repositories. I use the term Big Code interchangeably with the terms Mining Software Repositories (MSR) and Software Analytics. Software repositories can be source control systems, archived communications between project personnel, defect and issue tracking systems and, in general, any system that helps manage the progress and maintenance of software projects. Big Code can help software practitioners and researchers support the maintenance of software systems, improve software design and reuse, and empirically validate novel ideas and techniques. In addition, the patterns identified can help to understand software development and software evolution, to support predictions about software development (e.g. predicting program bugs, program behavior or identifier names, or even automatically generating new code), and to exploit this knowledge in planning future development sprints. The topic spans interdisciplinary research in Machine Learning (ML), Programming Languages (PL) and Software Engineering (SE).

Projects

npm-miner and mining the npm-registry

npm-miner is an infrastructure dedicated to mining the npm registry (the biggest registry of software packages) and reporting the results of applying open source software quality tools to its packages.
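To give a rough feel for what mining the registry involves, here is a minimal Python sketch. It is not npm-miner's actual code. It assumes the public registry endpoint at https://registry.npmjs.org, which serves a JSON metadata document per package containing "dist-tags" and "versions" entries. The function names (fetch_packument, latest_tarball, summarize) are hypothetical, and the quality analysis itself is stood in for by placeholder per-package issue counts, since the specific tools are not listed here.

```python
import json
import statistics
import urllib.parse
import urllib.request

# Assumption: the public npm registry endpoint.
REGISTRY_URL = "https://registry.npmjs.org"


def fetch_packument(name: str) -> dict:
    """Download the registry metadata document for one package (needs network)."""
    # Scoped names such as "@scope/pkg" need the slash percent-encoded.
    url = f"{REGISTRY_URL}/{urllib.parse.quote(name, safe='@')}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def latest_tarball(packument: dict) -> tuple[str, str]:
    """Return (version, tarball URL) for the release tagged "latest"."""
    version = packument["dist-tags"]["latest"]
    return version, packument["versions"][version]["dist"]["tarball"]


def summarize(issue_counts: dict[str, int]) -> dict:
    """Aggregate hypothetical per-package issue counts into registry-level figures."""
    counts = list(issue_counts.values())
    return {
        "packages": len(counts),
        "total_issues": sum(counts),
        "median_issues": statistics.median(counts) if counts else 0,
        "clean_packages": sum(1 for c in counts if c == 0),
    }
```

In a fuller pipeline, the tarball returned by latest_tarball would be downloaded and unpacked, each quality tool would be run on the extracted sources, and the resulting counts would be passed to summarize for reporting. Keeping the metadata parsing separate from the network call makes the parsing step easy to check offline.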

Related Publications

2018

conference Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos, Andreas L. Symeonidis: npm-miner: An Infrastructure for Measuring the Quality of the npm Registry, 2018, MSR '18: 15th International Conference on Mining Software Repositories [PDF] [Dataset] [Code] [DOI: 10.1145/3196398.3196465]