Coding for two audiences: humans and computers
Software code is written to be read by both computers and humans. Machines grasp its computational meaning quickly and perfectly, while humans read it much as they read natural language: more slowly and sometimes incorrectly. Through a new three-year, $1.2 million NSF-funded project, a group of software engineers and social scientists at UC Davis will leverage this bimodality to create tools that make code easier to write, read, and maintain, and that improve the overall programming experience.
The project is led by respected computer science professor Prem Devanbu, together with his colleagues Cindy Rubio Gonzalez and Aditya Thakur and cross-campus collaborators Gerardo Con Díaz from the Institute for Science and Technology Studies and Emily Morgan from the Institute of Linguistics.
“When you write a program, it has two audiences,” Devanbu explained. “Code is meant for human use and for computer use, and the fact that humans choose to write code in a way that is easy for humans to read reflects this bimodality.”
Because code is written for human readers as well as machines, millions of lines of it are freely available on the internet for the team to use. With this wealth of data, spanning many types of programs and programmers of all skill levels, the team can train algorithms that improve the human experience of programming.
Writing for people
Making code easy for humans to read is a crucial part of programming. People write, edit, review, and maintain code, and the best way to spot potentially catastrophic errors is through code review, a proofreading-like process in which a second person reads the code and explains what it’s doing. The team plans to train an algorithm that can rewrite code to make it easier to read without changing its computational meaning.
Initial studies led by Morgan showed that small changes, the equivalent of switching from "pepper and salt" to "salt and pepper," can make a big difference. The team's goal is to train an algorithm that knows this and can read through code to give it a readability score: the higher the score, the harder the code is to read. The program can then rewrite the code until its score is as low as possible. Improving readability helps programmers and code reviewers analyze the software and more easily find and fix potential bugs.
“Because these programs have well-defined meanings, we can change the code without affecting the meaning,” Devanbu said. “If I can change my program so that the meaning doesn’t change but it’s a lot easier to read, then code review becomes a lot easier.”
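To make the score-and-rewrite loop described above more concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the team's method: the readability score is a hypothetical hand-written heuristic (one penalty point per comparison written "constant first," such as `0 < x`), whereas the project plans to learn readability from data. The only rewrite is a meaning-preserving "pepper and salt" swap that turns `0 < x` into `x > 0`. The names `readability_score`, `SwapOperands`, and `improve` are illustrative inventions; `ast.NodeTransformer` and `ast.unparse` are standard-library Python (3.9+).

```python
import ast

# Mirror of each comparison operator: `a < b` means the same as `b > a`.
FLIP = {
    ast.Lt: ast.Gt, ast.Gt: ast.Lt,
    ast.LtE: ast.GtE, ast.GtE: ast.LtE,
    ast.Eq: ast.Eq, ast.NotEq: ast.NotEq,
}

def is_constant_first(node: ast.AST) -> bool:
    """True for a simple comparison written constant-first, e.g. `0 < x`."""
    return (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and type(node.ops[0]) in FLIP
        and isinstance(node.left, ast.Constant)
        and not isinstance(node.comparators[0], ast.Constant)
    )

def readability_score(source: str) -> int:
    """Hypothetical toy score: one point per constant-first comparison.

    Higher means harder to read. A real system would learn this from data.
    """
    return sum(is_constant_first(n) for n in ast.walk(ast.parse(source)))

class SwapOperands(ast.NodeTransformer):
    """Rewrite `const OP expr` as `expr MIRROR(OP) const` (same meaning)."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # rewrite any nested comparisons first
        if is_constant_first(node):
            flipped = FLIP[type(node.ops[0])]()
            new = ast.Compare(left=node.comparators[0], ops=[flipped],
                              comparators=[node.left])
            return ast.copy_location(new, node)
        return node

def improve(source: str) -> str:
    """Keep applying rewrites while they strictly lower the score."""
    best = source
    while True:
        candidate = ast.unparse(SwapOperands().visit(ast.parse(best)))
        if readability_score(candidate) >= readability_score(best):
            return best  # no further improvement: stop
        best = candidate
```

For example, `improve("if 0 < x and None != y:\n    print(x)")` rewrites both comparisons so the variable comes first. Because the score is a non-negative integer and every accepted rewrite strictly lowers it, the loop always terminates. The swap preserves meaning for built-in values such as numbers and `None`; objects that overload comparison operators could in principle behave differently, which is one reason a real tool would need stronger guarantees than this sketch.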
Learning through bimodality
The team also plans to use the data to improve programming for beginners. Beginners can become discouraged when their programs keep crashing, even though the cause is often a minor error, such as a typo or a missing parenthesis, rather than a larger systematic problem. According to Devanbu, if the team can train an algorithm that recognizes enough different programs, syntaxes, and operations, it could work like a spell checker, flagging the code equivalents of spelling and grammar mistakes. That would keep beginners encouraged and let them focus on learning to code rather than hunting for those bugs.
“The machine learning models are clever. So if you’re converting Celsius to Fahrenheit, for example, the model will have seen that formula many times and will remember it,” he explained. “If you’re using variable names that look like they’re temperature variable names, then it knows that’s probably what you intended, and it could fix that.”
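The spell-checker idea can be illustrated with a much simpler, rule-based stand-in. The Python sketch below is an assumption for illustration only, not the team's learned model: it finds names a program uses but never defines and suggests similarly spelled names using the standard library's `difflib.get_close_matches`, which ranks candidates by string similarity. "Known" names here are Python's builtins plus every name the program binds; the function name `suggest_fixes` is hypothetical, and the sketch deliberately ignores scoping rules.

```python
import ast
import builtins
import difflib

def suggest_fixes(source: str) -> dict[str, list[str]]:
    """Map each undefined name in `source` to similarly spelled known names.

    Rule-based stand-in for a learned model: "known" names are the builtins
    plus every name the program binds (scopes are ignored for simplicity).
    """
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)   # assignment target
            else:
                used.append(node.id)   # read (or deleted) name
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)      # function parameter
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return {
        name: difflib.get_close_matches(name, defined, n=3, cutoff=0.6)
        for name in used
        if name not in defined
    }

src = """
def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

print(to_farenheit(100))
"""
print(suggest_fixes(src))
```

Here the misspelled call `to_farenheit` is reported, with `to_fahrenheit` as the closest suggestion. String similarity alone cannot recognize a temperature formula or infer intent from variable names the way Devanbu describes; that kind of contextual judgment is what a model trained on large amounts of code would add.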
Thinking bimodally about code forces programmers to consider both the software they are writing and the human context in which it is developed. Devanbu believes this is a crucial part of the computer science curriculum, so for the final part of the project he will work with Con Díaz to develop a new curriculum that teaches students programming and ethics together from the start of their college careers.
“Coding is a deeply human experience, even if we don’t always think of it that way,” he said. “When you learn to write in a natural language, you are taught to write in the context of a human experience, such as war or romance, and asked to think about how that affects the way you write. Code shouldn’t be taught any other way.”