In the first article in this series we introduced you to the gold mine of data that lives inside your Learning Management System (LMS). In this second article we look at what data is and the types of data you will find within your LMS. Knowing what data is available will help you to recognise what that data can tell you.
What is Data?
Data is information that we can extract from the LMS. This data could relate to learners, courses, tutors, or even the LMS itself. A collection of data extracted from the LMS can be termed a dataset. In these articles we will focus on tabular data: data arranged in columns (fields) and rows (records), as you would find in a spreadsheet. It is quite easy to get data out of Moodle in tabular form and then manipulate it, either directly in Excel or with a programming language such as Python. We will examine each of these approaches in later articles.
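As a minimal sketch of what "columns and rows" look like once the data reaches Python, assume the LMS report has been exported as a CSV file with a header row. The column names follow the sample dataset used in this article, but the CSV text and its values are hypothetical and are not a real Moodle export format. Python's standard `csv` module reads each row as a record keyed by column name.

```python
import csv
import io

# Hypothetical CSV export: the header line names the columns (fields)
# and each following line is one row (record). Values are illustrative only.
CSV_TEXT = """Student ID,Course ID,Active,Percentage Completed,Course Status
2000,SYS-200,Yes,75,In Progress
2000,ECO-301,Yes,100,Completed
2001,SYS-200,No,0,Not Yet Started
"""

def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of rows, each a dict of column -> value."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(CSV_TEXT)
columns = list(rows[0].keys())

print(columns)    # the fields
print(len(rows))  # the number of records: 3
```

Note that `csv.DictReader` returns every value as a string, even `2000` and `75`; deciding what type each column should be is a separate step.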
Here is the sample data from the first article, showing the status of students in their courses. We will use this example to examine the different types of data available.
Each row contains the data for a single entry in the dataset; in our example, a row is a single student in a single course. The same student (2000) appears in two different courses, SYS-200 and ECO-301, and each course is on a separate row. Each row in this dataset is therefore a unique combination of Student ID and Course ID.
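The idea that each row is a unique combination of Student ID and Course ID can be checked directly. The sketch below assumes the rows have been reduced to `(Student ID, Course ID)` pairs; the helper name `duplicate_keys` is hypothetical, and apart from student 2000's two courses the rows are illustrative.

```python
from collections import Counter

# Hypothetical (Student ID, Course ID) pairs; only student 2000's two
# courses come from the article's example.
rows = [
    ("2000", "SYS-200"),
    ("2000", "ECO-301"),
    ("2001", "SYS-200"),
]

def duplicate_keys(pairs):
    """Return every (student, course) combination that appears more than once."""
    counts = Counter(pairs)
    return [key for key, n in counts.items() if n > 1]

# A student can appear on several rows...
student_counts = Counter(student for student, _ in rows)
print(student_counts["2000"])  # 2

# ...but each (student, course) pair should appear only once.
print(duplicate_keys(rows))    # []
```

If `duplicate_keys` ever returns a non-empty list, the dataset is not structured the way we expect, which is worth investigating before any analysis.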
Each piece of data in a record sits in its own column. These pieces of information can be termed variables. A column should hold only one type of data, and that type determines what values the column can hold and what you can do with those values. This ‘doing’ is analysis, and we will explore it throughout this series.
The form that a piece of data takes is known as its type. Common types include numbers, text, dates and booleans (True / False). Aggregations (such as the sum, minimum, maximum or average) are values calculated from other data rather than a separate type; an average, for example, is still a number. Each type has its own properties and variations: there are different kinds of numbers (integers, floating-point numbers and so on) and different ways of storing dates. We will keep it simple, though, and just refer to ‘numbers’ and ‘dates’.
In our example dataset we can see examples of several different data types:
- Student ID looks like a number but could also be text
- Course ID is text
- Active could be a boolean (True / False or Yes / No)
- Percentage Completed is a number
- Course Status is text and limited to specific categories such as ‘Not Yet Started’, ‘In Progress’, ‘Completed’, ‘Failed’.
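One way to make the column types above explicit is to describe a single row as a typed record. This is a minimal sketch, not how Moodle itself stores the data: the class names `Enrolment` and `CourseStatus` are hypothetical, Student ID is held as text because it identifies a student rather than measuring anything, and Course Status is an `Enum` so that only the listed categories are accepted.

```python
from dataclasses import dataclass
from enum import Enum

class CourseStatus(Enum):
    """Course Status is limited to a fixed set of categories."""
    NOT_YET_STARTED = "Not Yet Started"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    FAILED = "Failed"

@dataclass
class Enrolment:
    """Hypothetical model of one row: a single student in a single course."""
    student_id: str                # looks numeric, but used as an identifier
    course_id: str                 # text
    active: bool                   # boolean
    percentage_completed: float    # number
    course_status: CourseStatus    # text limited to specific categories

# Illustrative values only.
row = Enrolment("2000", "SYS-200", True, 75.0, CourseStatus("In Progress"))
print(row.course_status.name)  # IN_PROGRESS

# A value outside the allowed categories is rejected.
try:
    CourseStatus("Done")
except ValueError:
    print("'Done' is not a valid Course Status")
```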
The Student ID column is a tricky one. It looks like a number, and it is probably best stored as a number in the LMS, given that each new student increments it by one. However, we will not want to perform calculations on Student IDs, such as adding them up or taking their average. We would therefore probably convert the column to text when we analyse this dataset. We will learn more about this when we cover cleaning data.
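The sketch below illustrates the Student ID problem. It assumes the IDs arrive as integers (the values are hypothetical): Python will happily add them up, but the total means nothing, whereas text IDs still support what we actually need from an identifier, such as matching and counting distinct students.

```python
# Hypothetical Student ID values as they might arrive from an export: integers.
student_ids = [2000, 2000, 2001]

# Arithmetic "works" on integers, but the result is meaningless for identifiers.
meaningless_total = sum(student_ids)
print(meaningless_total)  # 6001

# Converting the column to text marks these values as labels, not quantities.
student_ids_as_text = [str(sid) for sid in student_ids]
print(student_ids_as_text)  # ['2000', '2000', '2001']

# Text IDs still support equality checks, e.g. counting distinct students.
print(len(set(student_ids_as_text)))  # 2
```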
The Importance of Data Types
The data type of the Student ID column introduces a critical topic: selecting the correct data type. The type of a piece of data determines what you can do with it. You can perform mathematical operations on a number but not on text, even if the text looks like a number. (Technically, some tools will let you apply arithmetic operators to text, but the results are unlikely to be what you were after.) A boolean is easy to filter on. Limiting text data to specific values turns it into categories, which you can also filter on or group by. You need to understand what type your data is to know what you can do with it.
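Each claim in the paragraph above can be shown in a few lines of Python. This is a minimal sketch using a hypothetical `Row` type and made-up values: `+` on text that looks like numbers joins the strings, a boolean column filters cleanly, a number column supports an average, and a category column can be grouped and counted.

```python
from collections import Counter
from typing import NamedTuple

class Row(NamedTuple):
    """Hypothetical typed row: one student in one course."""
    student_id: str
    active: bool
    percentage_completed: float
    course_status: str

# Illustrative values only, not real LMS data.
rows = [
    Row("2000", True, 75.0, "In Progress"),
    Row("2000", True, 100.0, "Completed"),
    Row("2001", False, 0.0, "Not Yet Started"),
    Row("2002", True, 40.0, "In Progress"),
]

# Text that looks like a number: "+" joins the strings instead of adding them.
print("75" + "100")  # 75100
print(75 + 100)      # 175

# A boolean column is easy to filter on.
active_rows = [r for r in rows if r.active]

# Numbers support calculations such as an average.
average_completion = sum(r.percentage_completed for r in active_rows) / len(active_rows)

# A limited set of text values behaves as categories we can group by and count.
status_counts = Counter(r.course_status for r in rows)

print(len(active_rows))              # 3
print(round(average_completion, 1))  # 71.7
print(status_counts["In Progress"])  # 2
```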
It may be possible to change the type of data in a column, depending on the current type and the values the column holds. We will look at this kind of data manipulation in a future article. For now, we want to understand what data we have available and what we can do with it. Once you understand the data that makes up your dataset, you can start to work out how the dataset is structured, and from there decide how to use the data in your possession.
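As an early preview of why a type change depends on the values a column holds, the sketch below tries to convert every value and collects the ones that fail. The helpers `convert_column` and `yes_no_to_bool` are hypothetical, as are the sample values; the point is only that one unexpected entry (such as `n/a` or `Maybe`) is enough to block a clean conversion.

```python
def convert_column(values, convert):
    """Try to convert every value; return (converted, failures).

    A failure is any value the converter rejects with ValueError, which
    shows whether the column's current contents allow the type change.
    """
    converted, failures = [], []
    for value in values:
        try:
            converted.append(convert(value))
        except ValueError:
            failures.append(value)
    return converted, failures

def yes_no_to_bool(value):
    """Hypothetical converter for an Active column holding Yes / No text."""
    lookup = {"yes": True, "no": False}
    key = value.strip().lower()
    if key not in lookup:
        raise ValueError(f"not a Yes/No value: {value!r}")
    return lookup[key]

# Hypothetical Percentage Completed values exported as text.
numbers, bad = convert_column(["75", "100", "0", "n/a"], float)
print(numbers)  # [75.0, 100.0, 0.0]
print(bad)      # ['n/a']

flags, bad_flags = convert_column(["Yes", "No", "yes", "Maybe"], yes_no_to_bool)
print(flags)      # [True, False, True]
print(bad_flags)  # ['Maybe']
```

Collecting the failures rather than stopping at the first one makes it easier to see how much of a column needs cleaning before its type can be changed.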
A tabular dataset is made up of pieces of data stored in columns and rows. Each row contains data specific to the subject of that row, which in our example is a student in a course. The type of data in each column will determine what information the dataset holds and what we can do with that information. In our next article we are going to pull these ideas together to understand how our dataset is structured and what this tells us about how we can use the data.