Data can come in various file formats and can be used in many different data analysis packages. For many applications MS Excel (and its various competitors) is a great tool for having an initial look at a data file prior to using it. In this article I will explain some basic actions you can take to prepare your data for viewing and exploring.
Creating a Viewing File
In this article we are looking at how to create and set up a file for viewing a dataset that you have. This should be a separate file to the one you will use for your data analysis. The purpose of this file is to be viewed so that you can develop an understanding of how the data is structured and explore the contents of the dataset.
The first point to note is that you should save your file in the .xlsx file format. Data files often come in a .csv file format and you will want to use this format when ingesting the data into your data analysis package. In this article we are preparing the file for viewing and exploring, not for completing actual data analysis on. The actions I will be discussing in this article will all be lost if you save the file in the .csv file format, so please save a copy as .xlsx before progressing. I often add ‘_VIEWING’ at the end of the file name to remind me it is just for viewing and not for actual data analysis.
If you are working with a .csv file you will notice that all of the columns are bunched together. The first action I take is to expand the columns. This can be done by selecting all the data (either CTRL-A or clicking the triangle at the top left of the spreadsheet (1)) and then double-clicking any of the lines separating the columns along the column headings row (2).
From here you can then wrap the text if needed to prevent the columns running so far to the right that you lose the overall view of each column. This can be accessed from the Home tab on the Ribbon.
It can be hard to focus on a dataset that has a lot of columns. You can hide any extraneous columns or those you aren’t interested in straight away. Select the column by clicking on its letter in the column headings row (1), right-click and select Hide (2).
To unhide the column you just need to select the column either side of the hidden column (1), right-click and select Unhide (2).
Freezing the Column Heading Row
Usually the top row of the data will have the column headings in it. Sometimes there will be additional rows above the column headings, such as a summary of the data source or parameters of the report. You can hide these rows as described above (selecting the appropriate rows rather than columns to hide). I always like to freeze the row with the column headings so that I can still see them as I scroll down through the dataset. If the top row has the column headings you can just select the View tab and then select Freeze Panes –> Freeze Top Row.
If you want to freeze more than the top row, click into a cell in the row below the last row that you want to freeze and then select Freeze Panes. You can also freeze the first column if you want to keep it visible. This can be useful when the first column is the entity the row data refers to.
Bold the Column Heading Row
I like to select the column heading row by clicking on the number to the left of the row (which selects it) and then clicking CTRL-B to apply bold formatting. I find this makes it much easier for me to see the column names as it makes them distinct from the row data.
A final preparation step that I like to take is to add filters to each column. This then allows me to easily see the unique values in each column or to apply a filter to narrow down the data I am looking at. Filters can be added by going to the Data tab (1) and then selecting clicking Filter (2).
The actions above are some simple steps you can take to make it easier to view and explore the dataset you are working with. If you perform these steps regularly you could take it to the next level and record a macro that performs each of these steps. Then all you need to do is open a copy of your file and run the macro.
What actions do you like to take to prepare a file for viewing? Comment on these below. In a future article we will look at how you can use this viewing file to explore a dataset, again in Excel.