How to tackle complex refactorings in big projects


Sometimes huge refactorings, or refactorings of core concepts of your application, are necessary to meet new requirements or to keep your application maintainable in the long run. Here are some thoughts about how to approach such challenges.

Break it down

Try to break your refactoring down into separate parts. Get the tests green for each part as soon as possible, and only move on to the next big part once your tests pass again. It's not a good idea to work for weeks or months and wait for all puzzle pieces to come together before you can use the GUI or run tests again. Don't be afraid to temporarily add workarounds in your code to achieve this goal. Of course you should get rid of any workaround code before finishing your refactoring! Instead of using workarounds, you can also try applying the Mikado Method.

Example
You have to move a lot of functionality from one or more models to another model (or split a model into several new models). If this includes a lot of associated records, it may not be wise to try to do it all at once. You could use Ruby's power and temporarily use method_missing to delegate calls to the "old" model(s) while you move one piece of functionality at a time, always getting the tests green in between.
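A minimal sketch of such a temporary delegation could look like this. The class names LegacyAccount and Wallet and their methods are hypothetical and only serve as illustration; in a real application these would be your old and new models.

```ruby
# Hypothetical "old" model that still owns most of the functionality.
class LegacyAccount
  def initialize(balance)
    @balance = balance
  end

  def balance_in_cents
    @balance * 100
  end
end

# Hypothetical "new" model. Functionality is moved here one method at a time.
class Wallet
  def initialize(owner, legacy_account)
    @owner = owner
    @legacy_account = legacy_account
  end

  # Already moved: this method lives in the new model now.
  def label
    "Wallet of #{@owner}"
  end

  private

  # TEMPORARY workaround: forward everything that has not been moved yet
  # to the old model. Remove this before finishing the refactoring.
  def method_missing(name, *args, &block)
    if @legacy_account.respond_to?(name)
      @legacy_account.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? consistent with the delegation above.
  def respond_to_missing?(name, include_private = false)
    @legacy_account.respond_to?(name, include_private) || super
  end
end
```

Callers can already talk to Wallet only, while balance_in_cents is still answered by the old model. Once a method has been moved, it simply shadows the delegation, and when nothing is left to forward, the method_missing workaround can be deleted.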

Know your limits

This might seem obvious, but in my experience it is an underestimated aspect that is easily forgotten.
Before you start, consider the limitations of your refactoring. Are there any things you don't have under your control (e.g. an API to another system which is not owned by you but which you receive data from) that are critical to your refactoring? If so, address these things first. There's no point in spending a lot of time refactoring your application just to realize at the end that there are unsolvable problems which your whole work depends on.

Example
You are refactoring concerns of a model that is not completely under your control, e.g. because the data it carries is imported through an API, but you want to change the data structure in which the model is saved in your application. Instead of migrating all the contents you already imported and refactoring your application logic first, start with refactoring your importer. As a first step, you could build a new (empty) model with the new data structure you want and enable your importer to transform the data in the way needed for the new model. If this succeeds, you can as a second step refactor your application, migrate your existing records to the new schema and afterwards make the importer use the correct model (now in the new structure) again.
This way you have proof of concept before spending weeks refactoring your application.
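The first step could be sketched roughly like this. NewContact, ContactImporter and the payload keys are made-up names for illustration; a real importer would work with your actual API response and an ActiveRecord model instead of a Struct.

```ruby
# Hypothetical new data structure, built next to the existing model
# so nothing in the application depends on it yet.
NewContact = Struct.new(:first_name, :last_name, :emails, keyword_init: true)

# Hypothetical importer step: proves that the external data can be
# transformed into the new structure before the application is touched.
class ContactImporter
  def transform(payload)
    first_name, last_name = payload.fetch("full_name").split(" ", 2)

    NewContact.new(
      first_name: first_name,
      last_name: last_name,
      # Assumption: the API delivers e-mail addresses as one comma-separated string.
      emails: payload.fetch("email", "").split(",").map(&:strip)
    )
  end
end
```

If a transformation like this turns out to be impossible with the data the API delivers, you find out after hours instead of weeks.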

Choose the right level of abstraction

So, you started to move some code around and change things, and now you need to update all the occurrences of a model / method. If using your new models / methods does not feel good to you, or you notice you have to pass stuff around A LOT (e.g. newly referenced models), you should maybe take one step back from your code and ask yourself if you're using the right level of abstraction. Would things be easier if you could operate "one layer above"? If yes, then it may be worth implementing this layer. It may be helpful to think in terms of patterns: Would your code improve by using decorators or presenter models, for example? Would using a facade pattern make things easier?
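As an illustration of operating "one layer above", here is a small facade sketch. All names (Customer, Address, Subscription, CustomerOverview) are hypothetical and stand in for the models you ended up with after splitting.

```ruby
# Hypothetical models that resulted from splitting one big model.
Customer = Struct.new(:name)
Address = Struct.new(:city)
Subscription = Struct.new(:plan)

# Facade: callers deal with one object instead of passing
# three models around everywhere.
class CustomerOverview
  def initialize(customer:, address:, subscription:)
    @customer = customer
    @address = address
    @subscription = subscription
  end

  def summary
    "#{@customer.name} (#{@address.city}), plan: #{@subscription.plan}"
  end
end

overview = CustomerOverview.new(
  customer: Customer.new("Ada"),
  address: Address.new("London"),
  subscription: Subscription.new("pro")
)
puts overview.summary
```

Views and controllers that only need the summary no longer have to know that three models are involved, which also keeps later changes to those models local to the facade.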

Stay focused

This point is important when regular development continues during a large (and long-lasting) refactoring.
When doing large refactorings, chances are you will see a lot of code that is not directly related to your changes but could be improved nonetheless. Think twice before you go after each opportunity. On the one hand, a large refactoring is time-consuming as it is, and you want to finish at some point. On the other hand, unnecessary changes can make life a lot harder for you and your fellow developers.
If you can't restrain your urge to refactor additional code, at least do it in a separate commit. If this commit can be rebased to BEFORE your main refactoring, it's likely to be "safe" and not cause any trouble.

Example
Consider this scenario: Your refactoring is finished, merged into master but client approval takes its time and then over time more and more rejects regarding the refactoring come in. Other development has gone on and already changed the code you have refactored. Now the client wants those other things to be deployed to production, but without the refactoring, because there are still rejects. Suddenly you realize that it's not that easy to detangle commits and changes made in the same files! The more files you touched "needlessly", the more complex detangling gets and the harder it will be to merge from master to production or vice versa.

Have a workflow

About branches
If you know that your refactoring will take weeks or months and other development has to go on in the meantime, think of a workflow to handle this before you start. If more than one developer is involved in the refactoring, you should have a refactoring-master branch. This way, the ongoing development can still happen in master. It may make sense to deploy staging from your refactoring-master instead of from master at some point.
Having an extra branch apart from master makes production deployments easier, because you are constantly forced to keep refactoring and ongoing development apart (see "Stay focused"). You should, however, regularly try to incorporate the changes of the ongoing development into your refactoring-master branch.

About commits
While refactoring, you should commit often and prefer small commits over big ones, and maybe you should not name all of them "wip" (this is common sense, I hope :) ). While working in your own feature branch, you can then regularly squash these small commits into meaningful, bigger ones. When merging into master it's better to have only a few (or maybe just one!) commits. Each of them should have a green test suite, otherwise git bisect will not work reliably. Remember: if you for some reason have to detangle functionality / commits (e.g. because of a production deploy with only part of the functionality), it's easier with fewer commits (see "Stay focused").

Last edit
Henning Koch
License
Source code in this card is licensed under the MIT License.
Posted to makandra dev (2017-01-12 15:21)