August 5, 2022

Why the right data set is important in the learning process

A dataset (or data set) is a collection of data stored in digital form. It is the main component of any machine learning system, because it provides the data from which the system learns. It should therefore be a representative sample of the real data the system will encounter, which makes data quality very important. The main features of a good dataset are [1]:


  • accuracy;
  • completeness;
  • reliability;
  • relevance;
  • timeliness.


Accuracy is a key aspect of machine learning data: inaccurately labeled data significantly degrades the quality of the model. Completeness means that the dataset contains all the data required to perform a specific task. Reliability means that the data used for learning is not contradictory. For some machine learning models, up-to-date data is very important. Timeliness and relevance are important data quality characteristics because, in the real world, the kind of data on which the model was trained may no longer occur, which makes the model useless in a given application.
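Some of these characteristics can be checked mechanically before training. The sketch below is a minimal illustration only, not a complete data quality audit: the record layout (`features`, `label`, `collected`), the toy values, and the three helper functions are all assumptions made for this example. It flags rows with missing values (completeness), identical inputs that carry different labels (reliability), and rows older than a chosen cutoff date (timeliness).

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"features": (1.0, 2.0), "label": "cat", "collected": date(2022, 6, 1)},
    {"features": (1.0, 2.0), "label": "dog", "collected": date(2022, 6, 2)},   # same input, different label
    {"features": (3.0, None), "label": "dog", "collected": date(2019, 1, 15)}, # missing feature, old record
    {"features": (4.0, 5.0), "label": None, "collected": date(2022, 7, 1)},    # missing label
]


def incomplete_rows(rows):
    """Completeness: indices of rows with a missing feature or label."""
    return [
        i for i, r in enumerate(rows)
        if r["label"] is None or any(v is None for v in r["features"])
    ]


def contradictory_groups(rows):
    """Reliability: identical features that appear with more than one label."""
    labels_by_features = defaultdict(set)
    for r in rows:
        if r["label"] is not None:
            labels_by_features[r["features"]].add(r["label"])
    return {f: sorted(ls) for f, ls in labels_by_features.items() if len(ls) > 1}


def stale_rows(rows, cutoff):
    """Timeliness: indices of rows collected before the cutoff date."""
    return [i for i, r in enumerate(rows) if r["collected"] < cutoff]


print(incomplete_rows(records))
print(contradictory_groups(records))
print(stale_rows(records, date(2021, 1, 1)))
```

Accuracy and relevance are harder to automate, since they usually require comparing the labels and the data against the real problem, for example by manual review of a sample.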


Preparing a custom dataset for a given application is a difficult task. As follows from the above, the first step is to identify what data is needed and why. The data must be collected properly and it must represent the problem under consideration. The next step is to label the data appropriately; this step is important because improperly labeled data reduces the quality of the dataset. Once the dataset has been created, the model needs to be trained and tested.
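For the final step, the labeled dataset is usually divided so that the model is tested on samples it has not seen during training. The following is a minimal sketch of such a split using only the Python standard library; the function name `train_test_split`, the 20% test fraction, and the toy samples are assumptions for illustration, and a real project may prefer an existing library routine.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into train and test parts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled samples: (input, label) pairs.
samples = [(i, "even" if i % 2 == 0 else "odd") for i in range(10)]

train, test = train_test_split(samples, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Shuffling before the split avoids a test set made up only of the most recently collected samples, and fixing the seed keeps the evaluation comparable between runs.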





[1] R. L. Sarfin, "5 Characteristics of Data Quality," 07 05 2021. [Online]. [Date of access: 22 07 2022].

