Description
Big Code is about analyzing the vast amounts of rich data (hence the term Big) available in software repositories to uncover interesting and actionable analytics about software systems and projects, and to enable a new class of applications that leverage these repositories. I use the term Big Code interchangeably with the terms Mining Software Repositories (MSR) and Software Analytics. Software repositories can be source control systems, archived communications between project personnel, defect and issue tracking systems and, in general, any systems that help manage the progress and maintenance of software projects. Big Code can help software practitioners and researchers support the maintenance of software systems, improve software design and reuse, and empirically validate novel ideas and techniques. In addition, the patterns identified can help to understand software development and evolution, to support predictions about software development (e.g. predicting program bugs, program behavior or identifier names, or even automatically creating new code), and to exploit this knowledge in planning future development sprints. The topic spans interdisciplinary research in Machine Learning (ML), Programming Languages (PL) and Software Engineering (SE).
Projects
npm-miner and mining the npm-registry
npm-miner is an infrastructure dedicated to mining the npm registry (the biggest registry of software packages) and reporting the results of applying open source software quality tools to its packages.
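To give a rough idea of what mining the registry involves, here is a minimal Python sketch. It is not npm-miner's actual code. It assumes the public registry endpoint `https://registry.npmjs.org/<package>`, which returns a JSON metadata document with `dist-tags` and `versions` fields. The helper names, the sample document, and the toy "analyzer" are all hypothetical; the network fetch and the real quality tools are left out so the sketch stays self-contained.

```python
import json
from typing import Callable, Dict, Tuple
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org"


def metadata_url(package_name: str) -> str:
    """Build the registry URL for a package's metadata document."""
    # Scoped names such as "@scope/name" need the slash percent-encoded.
    return f"{REGISTRY}/{quote(package_name, safe='@')}"


def latest_tarball(metadata: dict) -> Tuple[str, str]:
    """Return (version, tarball URL) for the release tagged 'latest'."""
    version = metadata["dist-tags"]["latest"]
    return version, metadata["versions"][version]["dist"]["tarball"]


def run_analyzers(files: Dict[str, str],
                  analyzers: Dict[str, Callable[[Dict[str, str]], object]]) -> dict:
    """Apply each analyzer to the package sources and collect one report."""
    return {name: analyze(files) for name, analyze in analyzers.items()}


# Hand-written sample shaped like a registry response (hypothetical package).
SAMPLE = json.loads("""
{
  "name": "example-pkg",
  "dist-tags": {"latest": "1.0.1"},
  "versions": {
    "1.0.0": {"dist": {"tarball": "https://registry.npmjs.org/example-pkg/-/example-pkg-1.0.0.tgz"}},
    "1.0.1": {"dist": {"tarball": "https://registry.npmjs.org/example-pkg/-/example-pkg-1.0.1.tgz"}}
  }
}
""")

# Toy stand-in for a real quality tool: count source lines per package.
ANALYZERS = {
    "line_count": lambda files: sum(len(src.splitlines()) for src in files.values()),
}

if __name__ == "__main__":
    print(metadata_url("@babel/core"))
    print(latest_tarball(SAMPLE))
    print(run_analyzers({"index.js": "module.exports = 1;\n"}, ANALYZERS))
```

In a real pipeline the sample document would be replaced by the JSON fetched from `metadata_url(...)`, the tarball would be downloaded and unpacked, and each analyzer would wrap an actual quality tool.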
Related Publications
2019
2018
Acceptance rate: 14 out of 24 data showcase papers were accepted
Diploma Theses
I've worked on the subject with diploma thesis students in the following projects:
- eslint-config-pop: An eslint configuration with the most popular configurations found in the npm registry, by Panagiotis Sakkis. [eslint plugin]
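One possible way to derive a "most popular" configuration from many ESLint configs can be sketched as follows. This is an illustration only, not necessarily how eslint-config-pop computes its rules. The function name `most_popular_rules`, the `min_share` threshold, and the sample configs are hypothetical. The sketch assumes only that each config is a dictionary with a `rules` mapping, as in ESLint configuration files.

```python
import json
from collections import Counter, defaultdict


def most_popular_rules(configs, min_share=0.5):
    """Keep, for each rule, its most common setting across configs.

    A rule is kept only if that setting appears in at least `min_share`
    of all configs (an assumed threshold, chosen for illustration).
    """
    votes = defaultdict(Counter)
    for config in configs:
        for rule, setting in config.get("rules", {}).items():
            # Settings may be strings or lists; serialize so they are hashable.
            votes[rule][json.dumps(setting, sort_keys=True)] += 1

    popular = {}
    for rule, counter in votes.items():
        setting, count = counter.most_common(1)[0]
        if count / len(configs) >= min_share:
            popular[rule] = json.loads(setting)
    return popular


# Hand-written sample configs (hypothetical, not mined data).
CONFIGS = [
    {"rules": {"semi": ["error", "always"], "eqeqeq": "error"}},
    {"rules": {"semi": ["error", "always"]}},
    {"rules": {"semi": ["error", "never"], "no-var": "warn"}},
]

if __name__ == "__main__":
    print(most_popular_rules(CONFIGS))
```

With the sample above, only `semi` clears the 50% threshold, because its `["error", "always"]` setting appears in two of the three configs.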