Revealing the Significance of Data Anomalies

In the context of evaluating data anomalies, it is useful to consider the circumstances in which they arise. For example, suppose there is an anomaly in the number of delayed flights. When the number of flights is large, the observed delay rate tends to track the natural probability closely, so even a modest deviation stands out. When only a few flights are involved, a delay rate well above the natural probability can easily arise by chance, so the deviation does not by itself indicate anything significant. In that case, the data anomaly is probably not a big deal.
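To make the sample-size effect concrete, here is a minimal Python sketch, assuming a hypothetical natural delay probability of 15% and made-up flight counts. It computes the chance of seeing at least the observed number of delays if delays really occur at the baseline rate: the same 25% observed delay rate is easily explained by chance at 20 flights, but not at 200.

```python
# A minimal sketch of the sample-size effect described above. The baseline
# delay probability (15%) and the flight counts are hypothetical, chosen
# for illustration only.
from math import comb

def tail_probability(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least
    k delayed flights out of n if delays occur at the natural rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

BASELINE = 0.15  # assumed natural delay probability

# The same observed delay rate (25%) at two very different sample sizes.
for n, delayed in [(20, 5), (200, 50)]:
    p_chance = tail_probability(delayed, n, BASELINE)
    print(f"{delayed}/{n} delayed ({delayed / n:.0%}): "
          f"chance of at least this many under the baseline = {p_chance:.4f}")
```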

If the percentage deviation from the expected distribution is significantly higher than chance alone would produce, there is a real possibility that the anomaly is process-related, as in the flight-delay example. A statistically significant deviation is additional evidence that the anomaly reflects a change in the underlying process rather than random noise.
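One common way to judge whether a deviation is "significantly higher" is to express it in standard deviations, using the normal approximation to the binomial. The sketch below uses hypothetical counts (500 delays among 2,000 flights) and the same assumed 15% baseline.

```python
# A minimal sketch of sizing a deviation in standard deviations, using the
# normal approximation to the binomial. The baseline rate and counts are
# hypothetical.
from math import sqrt

def z_score(observed: int, n: int, p: float) -> float:
    """Standard deviations between the observed count and the expected
    count n*p, under Binomial(n, p) approximated as a normal distribution."""
    expected = n * p
    std_dev = sqrt(n * p * (1 - p))
    return (observed - expected) / std_dev

# 500 delays among 2000 flights against a 15% baseline:
print(f"z = {z_score(500, 2000, 0.15):+.1f} standard deviations")
# A deviation of about +12.5 sigma is far beyond ordinary noise (roughly
# +/-2 to +/-3 sigma), which points to a process-related cause.
```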

After assessing the significance of an anomaly, it is important to find its cause. Is it related to the process that generated the data, or unrelated to it? Did the anomaly arise in response to an external influence, or did it originate within the process itself? The answer determines how much more the anomaly can reveal about the process.

This matters because not all deviations stem from process variability, and those that do can affect the process in different ways. Without a clear model of the process, determining the impact of a data anomaly can be challenging.

Analyzing the Importance of Data Anomalies

When there is no evidence that a deviation departs from the expected probability distribution, data anomalies are often ignored, and anomalies of real importance can be missed. In such a situation, it is useful to calculate the probability that the observed deviation arose by chance. If that probability is high, the anomaly is consistent with random noise and can be neglected. If it is very small, chance is an unlikely explanation, and the anomaly may carry enough information to conclude that the underlying process has shifted and that its potential impact is significant. It is also worth remembering that in any large data set some anomalies will appear frequently by chance alone, so significance thresholds should be set with that in mind.
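The decision rule above can be sketched as a small triage function, again under an assumed binomial model with a hypothetical 15% baseline and a hypothetical significance threshold alpha:

```python
# A minimal sketch of the triage rule above: neglect an anomaly when chance
# explains it, investigate when chance is an unlikely explanation. The
# baseline rate, threshold, and counts are all hypothetical.
from math import comb

def chance_probability(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def triage(observed: int, n: int, baseline: float, alpha: float = 0.01) -> str:
    """Classify an anomaly by the probability that it arose by chance."""
    p_chance = chance_probability(observed, n, baseline)
    if p_chance >= alpha:
        return f"neglect (p = {p_chance:.3f}, consistent with random noise)"
    return f"investigate (p = {p_chance:.2g}, unlikely to be chance)"

print(triage(5, 20, 0.15))    # small sample: chance explains the deviation
print(triage(50, 200, 0.15))  # larger sample: the same rate demands attention
```

If many candidate anomalies are screened this way, the threshold should be tightened, since some anomalies will clear any fixed threshold by chance alone.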

Conclusion

In the context of assessing data accuracy, it is important to identify and analyze the number of data anomalies. When that number is relatively small, the deviation is unlikely to be significant and the impact of the anomalies is small; in this situation they can often be ignored. When the number of anomalies is high, they are likely associated with a process that can be understood and evaluated, and the problem becomes how to evaluate their impact on that process. The quality of the data, the frequency with which it arrives, and the speed at which it is generated all shape how that impact should be assessed.

Analyzing data anomalies is critical to learning about processes and improving their performance. It reveals the nature of the process, which in turn supports evaluating the impact of a deviation and weighing the risks and benefits of process adjustments. Ultimately, data anomalies are important because they give insight into the processes behind the data.

Evaluating the impact of data anomalies is an ongoing effort, and it pays off: it deepens understanding of the process and equips decision makers with the information they need to improve its effectiveness.