Direkt zum Hauptbereich

Posts

Posts mit dem Label "Random" werden angezeigt.

Data Analytics: How to protect against random correlations

Many analysts know it: You find a significant correlation - and then you wonder how secure the knowledge actually is. Of course, a good model inevitably requires an evaluation based on test data, because there is no way around it. But sometimes doubts remain - because even test data can be unreliable. At the beginning of this year, during my work, I found such a significant correlation (0.8 after Spearman). However, an evaluation was only possible to a limited extent, because due to external circumstances many attributes of the data can change in the respective department and not all changes are documented. Incompleteness in the dataset - that too will be known to analysts. Now I asked myself the question: how can I secure my knowledge nevertheless. The nature of the correlation was of central importance. The situation was similar to a well-insulated building in which the room temperature is dependent on both the outside temperature and a heating. Due to the thick insulation of t