The Evolution of Data Mining: From T-Shirts to Suits and Robes
Introduction
In the world of data mining, there are three distinct types of individuals: those who wear T-shirts, those who wear suits, and those who wear robes. Each group represents a different approach to data mining, with its own strengths and weaknesses. In this article, we will look at the nature of big data, the different types of data mining practitioners, and the tools used to extract insights from vast amounts of data.
The Nature of Big Data (4V)
Big data is characterized by four key attributes: Volume, Variety, Value, and Velocity. These attributes are often referred to as the 4V of big data.
(1) Volume (Capacity)
Big data is typically associated with data at very large scale. In banking, for example, a joint-stock bank may have at least ten million retail customers, each of whom continually generates transaction and interaction records. Data at this scale is usually consolidated in a data warehouse, a centralized repository that gathers data from the bank's various source systems.
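To get a feel for the scale involved, a back-of-the-envelope estimate can be sketched as follows. Only the customer count comes from the text; the per-customer daily figures are purely hypothetical placeholders chosen to span a wide range, and the helper names are illustrative.

```python
# Rough sizing sketch. CUSTOMERS is the "at least ten million" figure from
# the text; the per-customer daily byte counts below are hypothetical.
CUSTOMERS = 10_000_000

UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


def daily_volume(customers: int, bytes_per_customer_per_day: float) -> float:
    """Total bytes generated per day across all customers."""
    return customers * bytes_per_customer_per_day


# Hypothetical per-customer rates: 1 KB, 50 KB, and 2 GB per day.
for per_customer in (1_000, 50_000, 2_000_000_000):
    total = daily_volume(CUSTOMERS, per_customer)
    print(f"{human_readable(per_customer)} per customer per day "
          f"-> {human_readable(total)} per day in total")
```

With ten million customers, every additional kilobyte per customer per day adds about 10 GB to the daily total, which is why the per-customer rate dominates any storage estimate.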
(2) Variety (Diversity)
Big data is also characterized by its diversity, with data ranging from structured to semi-structured to unstructured. Structured data is typically represented as a two-dimensional table, with each row holding one customer record and each column holding one attribute. Semi-structured data, such as logs or JSON documents, must first be parsed to extract key features and derived variables. Unstructured data, such as images and videos, is usually harder still to process.
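As an illustration of the step from semi-structured to structured data, the sketch below flattens one nested record into a single table row. The record layout and the field names (customer_id, profile, logins) are hypothetical, and reducing a list to its length is just one simple choice of derived variable.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one flat row with dotted column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and prefix the child columns.
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Lists have no fixed width, so keep a derived feature instead.
            row[f"{name}.count"] = len(value)
        else:
            row[name] = value
    return row


# Hypothetical semi-structured customer record.
raw = """
{"customer_id": "C001",
 "profile": {"age": 34, "city": "Shanghai"},
 "logins": ["2024-01-02", "2024-01-05", "2024-01-09"]}
"""

row = flatten(json.loads(raw))
print(row)
```

Each key of the resulting dictionary corresponds to one column of the two-dimensional table, and each flattened record to one row.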
(3) Value
The value of big data lies in its ability to yield insights and patterns that can inform business decisions. Through data mining, organizations can explore the rules and patterns behind the data and identify opportunities to improve customer relationships, reduce risk, and increase revenue.
(4) Velocity (Speed)
The velocity of big data refers to the speed at which data is generated and processed. With advances in data storage and processing technology, computing is much less of a bottleneck than it once was, and organizations can now process large amounts of data in near real time.
Data Mining Types: T-Shirts, Suits, and Robes
The relationship between business and data mining is complex, and different organizations approach data mining in different ways. There are three types of individuals engaged in data mining work: those who wear T-shirts, those who wear suits, and those who wear robes.
(1) T-Shirts
Those who wear T-shirts are typically associated with the Internet industry. They tend to be technically oriented and focused on big data concepts. However, their approach to data mining is often superficial, and they may not dig as deeply into the data as they could.
(2) Suits
Those who wear suits are typically associated with financial institutions, such as banks and securities companies. They tend to focus on regulatory policy and competition, and their approach to data mining is usually more formal and structured.
(3) Robes
Those who wear robes are often associated with the general public and may not have a deep understanding of data mining. However, they may be able to provide insights and perspectives that are not immediately apparent to those who are more familiar with the technology.
Mining Tools: The Power of Open Source
In the world of data mining, there are two dominant approaches: the use of commercial software, such as SAS, and the use of open-source software, such as R. Those who wear T-shirts and suits often prefer commercial software, while those who wear robes may prefer open-source software.
(1) SAS
SAS is a commercial software package that is widely used in the financial industry. It is known for its ability to process large datasets and perform complex statistical analyses. However, it can be expensive and may not be suitable for small-scale data analysis.
(2) R
R is an open-source programming language and environment for statistical computing that is widely used in the data mining community. It is known for its flexibility and its ability to perform complex statistical analyses. However, it can be difficult to use and may require a high level of technical expertise.
Conclusion
The world of data mining is complex and multifaceted, with different organizations using different approaches and tools. The T-shirts, the suits, and the robes each have their own strengths and weaknesses, and the right approach depends on the specific needs and goals of the organization. By understanding the nature of big data and the different types of data mining practitioners, organizations can make informed decisions about how to extract insights from vast amounts of data.