Finding patterns in data manually is tedious, because data is ubiquitous, and the strategies used for this task have changed dramatically over the years. Whether we consider hunters seeking to understand animal migration patterns, farmers attempting to model the evolution of their harvests, or more current concerns such as sales trend analysis, assisted medical diagnosis, or building models of the surrounding world from scientific data, we reach the same conclusion: hidden within raw data we can find important new pieces of information and knowledge. Traditional approaches for deriving knowledge from data rely strongly on manual analysis and interpretation. In any domain (science, marketing, finance, health, business, etc.), the success of a traditional analysis depends on the ability of one or more specialists to interpret the data. For example, scientists go through remote images of planets and asteroids to mark objects of interest, such as impact craters.
Moreover, the volume of generated data is increasing dramatically, which makes traditional approaches impractical in most domains. Hidden within these large volumes of data are strategic pieces of information in fields such as science, health, or business. Besides the possibility of collecting and storing large volumes of data, the information era has also provided us with increased computational power. The natural response is to employ this power to automate the process of discovering interesting models and patterns in raw data. Thus, the purpose of knowledge discovery methods is to provide solutions to one of the problems triggered by the information era: “data overload” Fay96.
A formal definition of data mining (DM), also known historically as data fishing, data dredging, or knowledge discovery in databases, or, depending on the domain, as business intelligence, information discovery, information harvesting, or data pattern processing, is Fay96