This article is the first in a series looking at version control and how it can help you work effectively with data. Version control is used heavily in software development and can also be applied to documents, websites or any other files that change over time.
Version control is a way of tracking changes to files over time. It allows you to revert to a previous version of a file or to work on a file in different states, keeping separate versions of the same file. In software development this could mean having separate versions of a piece of code. For data analysis it could mean having one version of your dataset that you use for analysis and a separate version that you use for your documentation. Version control is a means for managing these files and also keeping a history of changes to the files over time.
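To make the idea of tracking changes and reverting concrete, here is a minimal Python sketch. It is a toy, in-memory illustration only and is not how Git or any real version control tool stores data. The class name `FileHistory`, its methods and the sample CSV content are all hypothetical, invented for this example.

```python
import hashlib


class FileHistory:
    """Toy, in-memory history of one file's contents (hypothetical, for illustration)."""

    def __init__(self):
        # Each entry is a (version_id, message, content) tuple, oldest first.
        self._versions = []

    def commit(self, content, message):
        """Record a snapshot and return a short ID derived from its content."""
        version_id = hashlib.sha1(content.encode("utf-8")).hexdigest()[:8]
        self._versions.append((version_id, message, content))
        return version_id

    def log(self):
        """Return (version_id, message) pairs, oldest first."""
        return [(vid, msg) for vid, msg, _ in self._versions]

    def checkout(self, version_id):
        """Return the content saved under the given version ID."""
        for vid, _, content in self._versions:
            if vid == version_id:
                return content
        raise KeyError(version_id)


history = FileHistory()

# Made-up sample data: a raw export, then a 'cleaned' version with a row removed.
first = history.commit("course,weeks\nMaths,10\nArt,6\n", "Raw export")
second = history.commit("course,weeks\nMaths,10\n", "Removed short courses")

# The earlier snapshot is still available after the later change.
restored = history.checkout(first)
print(restored)

for version_id, message in history.log():
    print(version_id, message)
```

The point of the sketch is that every saved state gets an identifier and a description, so an earlier state can be recovered without keeping hand-named copies of the file.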
Version Control and Data Analysis
For these articles we will consider version control from the perspective of performing data analysis on some Moodle data. Your analysis process could involve three stages which you may want to keep separate:
- Cleaning your datasets
- Analysing your data
- Communicating your findings
Imagine that you are undertaking a project to determine how the length of courses in your Moodle installation relates to the completion rates of those courses. You have run some reports in Moodle and saved the data as CSV files which you will be working with. You also have some data from a separate database that holds student information beyond that which is contained within Moodle. These datasets need to be cleaned up before you can analyse them and prepare a presentation to communicate what you have found.
You completed a similar process a couple of years ago looking at student retention. This proved to be painful and you faced some major problems. You did a bunch of work cleaning up one of the files only to find that some of the data you had ‘cleaned’ (read ‘removed’) you actually needed. So you had to start over again.
Last time you did most of your analysis using Excel. This worked, but for this investigation you have decided to write some Python code instead. Writing code is a new skill for you and you know you will make plenty of errors. You want to be able to go back to a known good version of your code without having to manually save multiple copies of the same file.
Another issue you faced last time was preparing the data for presentation. You would make changes to the Excel file so you could take screenshots to put into your PowerPoint and PDF files. Then when you went back to use the files you had to keep undoing these changes. There must be a better way.
Version Control to the Rescue
Version control can address each of the issues you faced last time. It will allow you to keep different versions of the same file, so you can format it for presentation in one version without affecting the version you use for analysis. It can keep previous versions of a file and a record of the changes made to the file. This will allow you to go back to a previous version if you make mistakes in your software and want to undo those ‘bugs’. You can even use it as a form of backup by having the files stored online and also on another device, with all the different file versions maintained.
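As a rough illustration of what "a record of the changes made to the file" can look like, the sketch below compares two versions of a small dataset using Python's standard `difflib` module. The file name `analysis.csv` and the rows are made up for this example, and real version control tools produce and store this kind of comparison for you.

```python
import difflib

# Made-up 'before' and 'after' versions of a small CSV file, one string per line.
before = [
    "student_id,course,completed\n",
    "101,Maths,yes\n",
    "102,Art,no\n",
]
after = [
    "student_id,course,completed\n",
    "101,Maths,yes\n",
]

# unified_diff yields header lines, then lines prefixed with ' ', '-' or '+'.
diff = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="analysis.csv (before)",
        tofile="analysis.csv (after)",
    )
)
print("".join(diff))
```

Lines starting with `-` were removed and lines starting with `+` were added, so a row that was 'cleaned' away by mistake shows up clearly in the record.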
Version control is a powerful tool and essential if you are writing code or developing software to perform data analysis. In the next article we will look at a piece of version control software called Git and see how it enables version control.
3 thoughts on “Version Control”
Nice introduction to version control!
I was thinking I don’t use this (not being a programmer) … but then I thought about the work I do in Logic Pro (Digital Audio Workstation) and realised that the different versions of arrangements are actually a very similar process, and enable me to roll back and compare work.
And I guess many people use versioning in Microsoft Word documents and similar too.
I know Dropbox gives me historic file versions.
So maybe we are often versioning, but don’t quite realise it?