You have pulled some data out of your Learning Management System (LMS) and are ready to start analysing it. What analysis can you do on this data? Examining the structure of the dataset will tell you. Once you understand that structure, you can start to formulate questions and decide how you will analyse the data to answer them. In this article we look at how to understand the structure of your dataset.
What entities are in the dataset?
Your first step in understanding your dataset is to understand who or what it holds information on. We call these items entities. An entity could be an individual user, a course, a programme of learning, a tutor, or the LMS itself. Each row contains information for one entity, and the dataset as a whole represents the available data for a group of entities.
The entity is the subject of a row. In our example below you can see that the subject of the row is the student – the data in the row is specific to the Student ID in the first column. This means that our sample dataset contains information on students; students are the entities of interest for us.
What makes an entry unique?
Each record represents a single entry in the dataset. Each entry will contain a unique set of values for one or more columns. Understanding what distinguishes one entry from another will help you to understand what data the dataset contains. For instance, you can see in our sample data that a Student ID can appear more than once, but each combination of Student ID and Course ID appears only once. What does this tell us about what makes an entry in this dataset? In this example, an entry is the information for one student in one course. Examine the columns that make an entry unique to understand the structure of your dataset.
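The uniqueness check described above can be sketched in a few lines of Python. This is only an illustration: the rows are hypothetical stand-ins for the sample data, the "Completion %" column name is an assumption, and is_unique is a helper made up for this example.

```python
# Hypothetical rows standing in for the sample data; the values and the
# "Completion %" column name are assumptions made for illustration.
rows = [
    {"Student ID": "S001", "Course ID": "C101", "Active": True, "Completion %": 40, "Course Status": "In progress"},
    {"Student ID": "S001", "Course ID": "C102", "Active": False, "Completion %": 100, "Course Status": "Completed"},
    {"Student ID": "S002", "Course ID": "C101", "Active": True, "Completion %": 75, "Course Status": "In progress"},
    {"Student ID": "S003", "Course ID": "C102", "Active": True, "Completion %": 10, "Course Status": "In progress"},
]


def is_unique(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    keys = [tuple(row[column] for column in columns) for row in rows]
    return len(keys) == len(set(keys))


# A Student ID on its own repeats, so it does not identify an entry.
student_only = is_unique(rows, ["Student ID"])
# The combination of Student ID and Course ID appears only once per row.
student_course = is_unique(rows, ["Student ID", "Course ID"])

print(student_only)    # False
print(student_course)  # True
```

If a combination of columns you expected to be unique turns out not to be, that usually means an entry represents something different from what you assumed.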
What data is held for an entity?
Do you know what data is available for each entity in your dataset? You will after you examine the columns that are present. Look at our sample data and list the information held for each entity. Your list should identify that we have the following information available for each student:
- The student’s courses
- Which courses the student is active in
- How much of each course the student has completed
- The student’s current status in each course.
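One way to examine the columns is to list them along with the kind of value each one holds. The sketch below assumes the dataset has already been loaded as a list of dictionaries; the rows and the "Completion %" column name are hypothetical.

```python
# Hypothetical rows standing in for the sample data; the values and the
# "Completion %" column name are assumptions made for illustration.
rows = [
    {"Student ID": "S001", "Course ID": "C101", "Active": True, "Completion %": 40, "Course Status": "In progress"},
    {"Student ID": "S001", "Course ID": "C102", "Active": False, "Completion %": 100, "Course Status": "Completed"},
    {"Student ID": "S002", "Course ID": "C101", "Active": True, "Completion %": 75, "Course Status": "In progress"},
]

# The columns present in the dataset, taken from the first row.
columns = list(rows[0].keys())

# For each column, collect the names of the Python types found in it.
column_types = {
    column: sorted({type(row[column]).__name__ for row in rows})
    for column in columns
}

for column in columns:
    print(column, column_types[column])
```

A column that reports more than one type is worth a closer look, as it may contain missing or inconsistently entered values.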
What analysis is possible?
Use your understanding of the structure of the dataset to determine what analysis is possible. Analysis takes the raw data that you hold and lets you draw insights and conclusions from it. The key to analysing data is to understand the data type of each column. What analysis is possible with our sample data? A look at the available columns suggests:
- The breakdown of courses a student is enrolled in, based on the Course ID column
- Number of students active in a course based on the Course ID and Active columns
- Completion percentage of courses for a student or all students in a given course
- The status of students in various courses using the Course Status column.
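As a rough sketch, the analyses listed above could look like this in Python using only the standard library. The rows are hypothetical stand-ins for the sample data and the "Completion %" column name is an assumption; your own export will likely use different names and values.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the sample data; the values and the
# "Completion %" column name are assumptions made for illustration.
rows = [
    {"Student ID": "S001", "Course ID": "C101", "Active": True, "Completion %": 40, "Course Status": "In progress"},
    {"Student ID": "S001", "Course ID": "C102", "Active": False, "Completion %": 100, "Course Status": "Completed"},
    {"Student ID": "S002", "Course ID": "C101", "Active": True, "Completion %": 75, "Course Status": "In progress"},
    {"Student ID": "S003", "Course ID": "C102", "Active": True, "Completion %": 10, "Course Status": "In progress"},
]

# Courses each student is enrolled in, based on the Course ID column.
courses_by_student = defaultdict(list)
for row in rows:
    courses_by_student[row["Student ID"]].append(row["Course ID"])

# Number of students active in each course (Course ID and Active columns).
active_per_course = Counter(row["Course ID"] for row in rows if row["Active"])

# Average completion percentage across all students in each course.
completion_by_course = defaultdict(list)
for row in rows:
    completion_by_course[row["Course ID"]].append(row["Completion %"])
average_completion = {
    course: sum(values) / len(values)
    for course, values in completion_by_course.items()
}

# How many enrolments sit in each Course Status.
status_counts = Counter(row["Course Status"] for row in rows)

print(dict(courses_by_student))
print(dict(active_per_course))
print(average_completion)
print(dict(status_counts))
```

Each result uses only columns already present in the dataset, which is why understanding the structure first tells you what you can calculate.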
We could take this analysis further by joining this data onto another dataset, such as access logs for the LMS or a dataset of course information. Joining datasets is a topic for a future article.
Summarising the data
The last step in understanding your dataset is to summarise what it holds. Use this summary to guide your analysis, to give you pointers on what questions you could ask of the data, and to highlight any other datasets you may need to combine the dataset with to get the full picture you are after. Pause and think about how you would describe the example dataset before reading my take on it below.
The dataset tells us the status of each course a student is enrolled in and their completion percentage for that course. It shows us the student’s enrolment history along with their current workload. We could combine this dataset with a course dataset or access logs from the LMS to gather a wider picture of the student’s study progress within the institute.
You must understand your dataset to recognise the analysis that you can complete on it. That understanding will also show you any gaps in the data that you can fill by combining it with one or more other datasets. From here you can start to formulate questions to explore, which will be the topic of our next article.