This project visualizes and analyzes the famous Iris Flower dataset, which contains 150 samples of iris flowers categorized into three species:
- Iris-setosa
- Iris-versicolor
- Iris-virginica
Each sample records four measurements plus a species label:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
- Species (Target class)
- The dataset contains 150 samples, 50 for each of the three species.
- Each row corresponds to a sample, with the measurements of the four features and the species label.
The dataset has the following columns:
- Id: Index of the flower sample
- Sepal Length: Length of the sepal in centimeters
- Sepal Width: Width of the sepal in centimeters
- Petal Length: Length of the petal in centimeters
- Petal Width: Width of the petal in centimeters
- Species: The type of iris flower (setosa, versicolor, virginica)
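As a minimal sketch of how this layout might be loaded and checked with pandas: the column names (`SepalLengthCm`, `Species`, and so on) and the helper `load_iris` are assumptions for illustration, modeled on a common CSV distribution of the dataset, and may not match the file used in the notebook. The six embedded rows stand in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Assumed column names; adjust these to match the actual CSV header.
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Six rows in the assumed layout, used only so the sketch runs without the file.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
"""


def load_iris(source):
    """Read the CSV, index it by Id, and verify the expected columns exist."""
    df = pd.read_csv(source, index_col="Id")
    missing = [c for c in FEATURES + ["Species"] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


df = load_iris(io.StringIO(SAMPLE_CSV))

# Number of samples per species and the mean of each measurement per species.
species_counts = df["Species"].value_counts()
feature_means = df.groupby("Species")[FEATURES].mean()

print(species_counts)
print(feature_means)
```

To run this against the full dataset, pass the path of the CSV file to `load_iris` instead of the in-memory sample.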
Based on the analysis of the Iris dataset:
- Iris-setosa is easily distinguishable from the other species, primarily because its petals are much smaller; its sepals are also shorter, though typically wider.
- Iris-versicolor and Iris-virginica are more alike and overlap in some of their feature measurements. On average, Iris-virginica has the largest petals and the longest sepals, while Iris-versicolor falls between the other two species in petal size.
- The species are harder to separate using sepal length and sepal width alone.
- Petal length and petal width appear to be the most informative features for classification.
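One way to see the difference between sepal and petal measurements is to plot the two feature pairs side by side, colored by species. The sketch below uses seaborn's `scatterplot`; the column names are assumptions about the CSV header, and the six embedded rows are only a stand-in so the code runs without the data file. It is not necessarily the plot produced in the notebook.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Six rows in an assumed column layout; replace with the full dataset.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Left panel: sepal measurements. Right panel: petal measurements.
fig, (ax_sepal, ax_petal) = plt.subplots(1, 2, figsize=(10, 4))

sns.scatterplot(data=df, x="SepalLengthCm", y="SepalWidthCm",
                hue="Species", ax=ax_sepal)
ax_sepal.set_title("Sepal length vs. width")

sns.scatterplot(data=df, x="PetalLengthCm", y="PetalWidthCm",
                hue="Species", ax=ax_petal)
ax_petal.set_title("Petal length vs. width")

fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```

With the full 150 samples, the petal panel is where the species clusters should be easiest to tell apart, in line with the findings listed above.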
This visualization project demonstrates that while the Iris species have distinctive features, there are overlaps in some of the measurements, especially between Iris-versicolor and Iris-virginica. Petal length and petal width are the most effective features for distinguishing between species.
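The overlap between species can also be checked numerically. The following sketch compares the minimum-to-maximum range of each feature across every pair of species and reports the pairs whose ranges intersect. The helper `overlapping_pairs` and the column names are hypothetical, introduced here for illustration, and a range check is a deliberately crude measure of separability.

```python
import io
from itertools import combinations

import pandas as pd

# Assumed column names; adjust these to match the actual CSV header.
FEATURES = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Six rows in the assumed layout, used only so the sketch runs without the file.
SAMPLE_CSV = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
102,5.8,2.7,5.1,1.9,Iris-virginica
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))


def overlapping_pairs(df, feature):
    """Return the species pairs whose [min, max] ranges for `feature` intersect."""
    ranges = df.groupby("Species")[feature].agg(["min", "max"])
    pairs = []
    for a, b in combinations(ranges.index, 2):
        # Two closed intervals intersect when each starts before the other ends.
        if (ranges.loc[a, "min"] <= ranges.loc[b, "max"]
                and ranges.loc[b, "min"] <= ranges.loc[a, "max"]):
            pairs.append((a, b))
    return pairs


overlap = {feature: overlapping_pairs(df, feature) for feature in FEATURES}
for feature, pairs in overlap.items():
    print(feature, pairs)
```

On the six sample rows this only illustrates the mechanics. Run on the full dataset, Iris-versicolor and Iris-virginica should be expected to appear as an overlapping pair for the petal features as well, while Iris-setosa stays separate on them.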
- Jupyter Notebook for data visualization and analysis.
- Python Libraries: pandas, seaborn, matplotlib.