Many codebases contain code that is overly complicated, hard to understand, and hence expensive to change and evolve. Prioritizing technical debt is a hard problem: modern systems might have millions of lines of code and multiple development teams, and no one has a holistic overview. In addition, there's always a trade-off between improving existing code and adding new features, so we need to use our time wisely. So what if we could mine the collective intelligence of all contributing programmers and start to make decisions based on information from how the organization actually works with the code?
In this talk, you'll see how easily obtained version-control data let you uncover the behaviour and patterns of the development organization. This language-neutral approach lets you prioritize the parts of your system that benefit the most from improvements, so that you can balance short- and long-term goals guided by data. The specific examples are from real-world codebases like Android, the Linux kernel, the .NET Core runtime, and more. This new perspective on software development will change how you view code.
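As a rough illustration of the approach, here is a minimal sketch that mines a Git log for change frequencies and combines them with file size as a crude complexity proxy to rank hotspot candidates. It is an approximation of the idea only, not the analysis used in the talk or in CodeScene; the one-year window and the lines-of-code proxy are simplifying assumptions.

```python
# A minimal hotspot sketch: change frequency mined from the Git history,
# combined with file size as a crude complexity proxy. The window and the
# lines-of-code proxy are simplifying assumptions, not CodeScene's analysis.
import subprocess
from collections import Counter
from pathlib import Path

def change_frequencies(repo_path, since="1 year ago"):
    """How often each file changed within the given period."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True).stdout
    return Counter(line for line in log.splitlines() if line.strip())

def lines_of_code(repo_path, filename):
    """Very rough complexity proxy: current number of lines."""
    path = Path(repo_path) / filename
    if not path.is_file():
        return 0  # e.g. the file was deleted or renamed since
    return sum(1 for _ in path.open(errors="ignore"))

def hotspots(repo_path, top=10):
    """Rank files by change frequency weighted by size."""
    scored = [(name, changes, lines_of_code(repo_path, name))
              for name, changes in change_frequencies(repo_path).items()]
    return sorted(scored, key=lambda t: t[1] * t[2], reverse=True)[:top]

if __name__ == "__main__":
    for name, changes, loc in hotspots("."):
        print(f"{name}: {changes} changes, {loc} lines")
```

Run against a repository clone, this prints the files that are both large and frequently changed, which is where improvements tend to pay off first.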
Q&A
Question: What version control systems does this work on?
Answer: The tools I mention work with Git, but I have used the techniques on SVN, Mercurial, and TFS as well. In those cases I simply convert the original VCS into a read-only Git repository that I point the analysis to.
It’s an automated conversion using tools like git-tfs, etc.
Question: Is it fair to say that this kind of code refactoring wouldn't help in architectural changes? E.g. Monolith to cloud native
Answer: I like to think that behavioral code analysis is important during legacy migrations and architectural change too. Two main use cases:
- Pull the risk forward: use hotspots to figure out what to migrate first.
- Supervise the new system: make sure the new system doesn’t end up with problematic hotspots from the beginning. Use behavioral code analysis as a safety net.
Question: I think a hotspot like this is a useful, objective data point to help facilitate the refactoring / re-architecting and avoid the same problems again.
Answer: I’ve also found that hotspots are useful to communicate between tech and non-tech stakeholders like product managers. That way, we can make our case for refactorings and re-designs based on data and also show visible improvements.
Question: When you're looking at frequency of change, I'm assuming that's only historical. Do you ever use forecasting algorithms, or run model simulations depending on possible business objective changes?
Answer: Yes, I do predictions too. For example, I have an article that explains how CodeScene can predict a future code health decline: https://empear.com/blog/codescene-predict-future-code-quality-issues/
This gives an organization the ability to act early and prevent future issues.
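As an illustration only (this is not CodeScene's prediction model from the article above), the sketch below extrapolates a hotspot's monthly change rate with a naive least-squares trend line; the file path is hypothetical, and months without commits are skipped for simplicity.

```python
# Illustrative only; NOT CodeScene's prediction model. It just shows the
# general idea of extrapolating a hotspot's monthly change rate so you can
# act before a problem grows.
import subprocess
from collections import Counter

def monthly_changes(repo_path, filename):
    """Commits per month (YYYY-MM) touching the given file."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--follow",
         "--pretty=format:%ad", "--date=format:%Y-%m", "--", filename],
        capture_output=True, text=True, check=True).stdout
    return Counter(month for month in log.splitlines() if month)

def linear_forecast(series, months_ahead=3):
    """Least-squares trend line, extrapolated months_ahead points."""
    n = len(series)
    if n < 2:
        return float(series[-1]) if series else 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope * (n - 1 + months_ahead) + intercept

if __name__ == "__main__":
    counts = monthly_changes(".", "src/some_hotspot.py")  # hypothetical path
    series = [counts[m] for m in sorted(counts)]
    print("Projected monthly changes in 3 months:",
          round(linear_forecast(series), 1))
```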
Question: Why doesn't everyone use this stuff? It seems like such a handy tool
Answer: I’ve seen an increased awareness and interest in this space over the past years.
The main issue I have with software is its lack of physics; there's no way of picking up a software system, turning it around, and inspecting it. I might be biased here, but I do think behavioral code analysis brings us much-needed visibility, within a context.
A tip: when running a behavioral code analysis, I use different history depths:
- Technical analyses like Hotspots and Change coupling: ~6 months to ~1 year back, since I'm not interested in historical problems.
- Social analyses like System Mastery and Knowledge Loss: here I use the full history, since I want an accurate map of the contribution history (see the sketch after this list).
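A minimal sketch of that distinction, assuming plain Git and using a simple commits-per-author count as a stand-in for the real System Mastery / Knowledge Loss metrics:

```python
# A simple stand-in for the social analyses mentioned above: commits per
# author for a file, computed over a limited window (technical view) and
# over the full history (social view). Not CodeScene's actual metrics.
import subprocess
from collections import Counter

def authors_per_file(repo_path, filename, since=None):
    """Commit counts per author for one file, optionally time-limited."""
    cmd = ["git", "-C", repo_path, "log", "--pretty=format:%an"]
    if since:
        cmd.append(f"--since={since}")
    cmd += ["--", filename]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return Counter(name for name in out.splitlines() if name)

if __name__ == "__main__":
    target = "README.md"  # hypothetical file of interest
    print("Last 6 months:", authors_per_file(".", target, since="6 months ago"))
    print("Full history: ", authors_per_file(".", target))
```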
Question: This is forensically useful stuff. There's nothing like a rabbit hole with pretty graphs to keep you distracted.
Answer: I look at the visualizations from my own code on a weekly basis. It helps me build and maintain a mental model of what the code looks like.
Some additional distractions: https://codescene.io/showcase
Question: Any chance this will become a sonar plugin?
Answer: I’d like to see it the other way around: static analysis — like Sonar — is useful. But it’s most useful in a context. So for example, once I find a hotspot, I do like to view the static analysis findings for that hotspot. That can provide additional insights.
Question: Are there other hotspot “combinations” that you use to analyse technical debt? You mentioned code complexity/frequency and code complexity/active developer. Any more?
Answer: Yes, there are a bunch of other metrics that I use:
- Trends in Planned vs Unplanned work: an increase in unplanned work (e.g. defects, unexpected re-work) typically indicates a growing problem.
- Trends in Code Health: a declining code health in a hotspot is likely to indicate debt with high interest.
I have an article that explains the code health concept in more detail here: https://www.linkedin.com/pulse/measure-health-your-codebase-adam-tornhill/
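As a rough sketch of what trending a single hotspot could look like (this is not the Code Health metric from the article; indentation depth is just a crude complexity proxy, and the file path is hypothetical):

```python
# Not the Code Health metric; just a crude trend of an indentation-based
# complexity proxy across a file's revision history, to see whether a
# hotspot is improving or degrading over time.
import subprocess

def revisions(repo_path, filename):
    """Commit hashes that touched the file, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse",
         "--pretty=format:%h", "--", filename],
        capture_output=True, text=True, check=True).stdout
    return [rev for rev in out.splitlines() if rev]

def indentation_complexity(repo_path, rev, filename):
    """Sum of leading-indentation levels (4 spaces = 1 level) as a proxy."""
    proc = subprocess.run(
        ["git", "-C", repo_path, "show", f"{rev}:{filename}"],
        capture_output=True, text=True)
    if proc.returncode != 0:  # e.g. the file had another name at this revision
        return 0
    return sum((len(line) - len(line.lstrip())) // 4
               for line in proc.stdout.splitlines() if line.strip())

if __name__ == "__main__":
    repo, target = ".", "src/some_hotspot.py"  # hypothetical hotspot
    for rev in revisions(repo, target):
        print(rev, indentation_complexity(repo, rev, target))
```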
Question: I didn’t quite catch it in the microservices example … was that analysis across multiple repos (one service per repo) or a monorepo? If the latter, is the former possible with the current tooling?
Answer: It was actually across multiple repos. The example had ~35 git repos.
Question: In that example, was it that the code had sync/async call-outs to the instances that determined the coupling?
Answer: No, in CodeScene, we use ticket/issue references from the commit messages for this. If some commit in one repo references the same ticket/issue as commits in another, then there’s a logical connection. If it happens frequently, then there’s change coupling.
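A hedged sketch of that idea: extract ticket references from commit subjects in each repository, then count how many tickets a pair of repositories shares. The ticket pattern and repository paths are assumptions for illustration; CodeScene's analysis is more involved than counting shared tickets.

```python
# Cross-repository change coupling via shared ticket references, assuming
# commit messages carry issue keys like "PROJ-1234". Repo paths and the
# ticket pattern are illustrative assumptions.
import re
import subprocess
from itertools import combinations

TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # Jira-style issue keys

def tickets_in_repo(repo_path):
    """Set of ticket ids referenced in the repo's commit subjects."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%s"],
        capture_output=True, text=True, check=True).stdout
    return set(TICKET.findall(log))

def change_coupling(repo_paths):
    """Number of shared tickets for each pair of repositories."""
    tickets = {p: tickets_in_repo(p) for p in repo_paths}
    return {(a, b): len(tickets[a] & tickets[b])
            for a, b in combinations(repo_paths, 2)
            if tickets[a] & tickets[b]}

if __name__ == "__main__":
    repos = ["../billing-service", "../orders-service"]  # hypothetical repos
    for (a, b), shared in change_coupling(repos).items():
        print(f"{a} <-> {b}: {shared} shared tickets")
```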
Some Links:
Software Design X-Rays: https://pragprog.com/titles/atevol/software-design-x-rays/
My personal blog: https://adamtornhill.com/
My company blog on tech debt and behavioral code analysis: https://empear.com/blog/
Code complexity in context: https://empear.com/blog/bumpy-road-code-complexity-in-context/
Codescene.io and the public showcases on well-known open source projects: https://codescene.io/showcase