By: Jeff Mitchell

Jeff Mitchell — Sun, 24 Sep 2023 19:43:17 +0000

In reply to Nor Hazirah Binti Mohd Zaki.

Hello, it is often better to have no data than to have incorrect data. If an email address is wrong, say it is missing the ‘@’ symbol, and you guess what it should be, you could end up sending communications to the wrong person. If you are making decisions based on your data, wrong data can lead you to a poor decision. Removing the inaccurate data by setting it to blank means it will not distort any further analysis you do, such as calculating averages or finding the maximum value. It is also a good idea to remove these data points when you are doing trend analysis, as they can disguise the true trend.
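As a rough illustration of blanking rather than guessing, here is a minimal pandas sketch. The column names, the sample values and the "plausible age" range of 0-120 are all hypothetical assumptions chosen for the example, and the '@' test is a deliberately simple validity check, not a full email validator.

```python
import pandas as pd

# Hypothetical sample data: one email is missing its '@' and one age is implausible.
df = pd.DataFrame({
    "email": ["ann@example.com", "bob.example.com", "cat@example.com"],
    "age": [34, 340, 29],
})

# Blank (NaN) any email without an '@' instead of guessing the correct address.
df["email"] = df["email"].where(df["email"].str.contains("@", regex=False))

# Blank any age outside an assumed plausible range of 0-120 (hypothetical rule).
df["age"] = df["age"].where(df["age"].between(0, 120))

# pandas skips NaN by default, so the blanked value does not distort the results.
mean_blank = df["age"].mean()           # (34 + 29) / 2 = 31.5
max_blank = df["age"].max()             # 34.0

# For comparison: replacing the bad value with 0 drags the average down.
mean_zero = df["age"].fillna(0).mean()  # (34 + 0 + 29) / 3 = 21.0

print(mean_blank, max_blank, mean_zero)
```

The blanked row still exists, so you can count or review it later, but it no longer pulls the average towards a value that was never real.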

You can also set the value to something that represents ‘unknown’. In Python this could be np.nan for numbers, or NaT (pd.NaT in pandas) for date/time data. For text you could use the empty string or None. This tells you that the value is missing, rather than recording a data point of 0.
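A small sketch of these markers in a pandas DataFrame follows; the columns and values are made up for illustration. One point worth knowing: pandas' isna() recognises np.nan, pd.NaT and None, but it does not treat the empty string as missing, so blanks stored as "" need their own check.

```python
import numpy as np
import pandas as pd

# Hypothetical records, each column with one value marked as unknown.
df = pd.DataFrame({
    "quantity": [12, np.nan, 7],                       # numbers: np.nan
    "ordered": pd.Series([pd.Timestamp("2023-09-01"),  # dates: pd.NaT
                          pd.NaT,
                          pd.Timestamp("2023-09-03")]),
    "customer": ["Ann", None, ""],                     # text: None or ""
})

# isna() recognises np.nan, pd.NaT and None, but NOT the empty string.
print(df.isna().sum())

# Empty strings need an explicit check (or converting to a missing value first).
print((df["customer"] == "").sum())

# Unknowns are not treated as zeros: count() and mean() skip them.
print(df["quantity"].count(), df["quantity"].mean())  # 2 9.5
```

If you do use empty strings for text, converting them with something like df["customer"].mask(df["customer"] == "") keeps all your unknowns detectable in one consistent way.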

When performing the analysis you would normally remove the blank entries first and then complete your analysis. This is often better than setting a numeric value to zero. For example, if you want to know how many of each item you have in bands of 5 (0-5, 6-10, 11-15…), you would not want lots of 0s representing missing or inaccurate data inflating the lowest band, so it is better to remove them first. If you are confident that the average (mean) is a reasonable stand-in for the value, you can use it instead; if that does not make sense for your data, it is much better to mark the value as blank and leave it out of your analysis. You can always report the number of blanks as an ‘Unknown’ category when you present your results.
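The banding example can be sketched with pandas' pd.cut. This assumes whole-number counts (so the bands 0-5, 6-10, 11-15 cover every value) and uses made-up data; the comparison lines show what zero-filling and mean imputation would do to the same counts.

```python
import numpy as np
import pandas as pd

# Hypothetical item counts; np.nan marks values that were blanked as inaccurate.
counts = pd.Series([3, 7, np.nan, 12, 4, np.nan, 9])

bins = [0, 5, 10, 15]
labels = ["0-5", "6-10", "11-15"]

# pd.cut leaves NaN unbinned, so blanks are not counted in any band.
bands = pd.cut(counts, bins=bins, labels=labels, include_lowest=True)

# Report the blanks separately as an "Unknown" category.
summary = bands.cat.add_categories("Unknown").fillna("Unknown").value_counts(sort=False)
# 0-5: 2, 6-10: 2, 11-15: 1, Unknown: 2

# For comparison: filling blanks with 0 inflates the lowest band from 2 to 4.
zero_summary = pd.cut(counts.fillna(0), bins=bins, labels=labels,
                      include_lowest=True).value_counts(sort=False)

# Only if the mean is a sensible stand-in for your data: blanks become 7.0.
imputed = counts.fillna(counts.mean())

print(summary)
print(zero_summary)
```

Keeping the blanks as their own "Unknown" row makes it clear how much of the data was excluded, without letting it distort the real bands.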

By: Nor Hazirah Binti Mohd Zaki

Nor Hazirah Binti Mohd Zaki — Sun, 24 Sep 2023 08:43:18 +0000

Hi Jeff, I have a question regarding inaccurate data values. Why is it much better to leave them blank, and how should we interpret the blank data?

By: Anonymous

Anonymous — Sun, 24 Sep 2023 08:40:28 +0000

Hello, I want to ask about inaccurate data values: why would leaving them blank be much better? And how would we interpret or analyze the blank data?

By: ElearningWorld Admin

ElearningWorld Admin — Sun, 08 Aug 2021 01:02:15 +0000

Great post, Jeff!
I love the way you are using Excel (or an equivalent), something everyone has, to review the data.
It's surprising how much even a visual check can reveal 🙂

Comments on: Cleaning Inaccurate Data
