# PCA vs t-SNE (Dimensionality Reduction Techniques)

PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding) are both dimensionality reduction techniques that can be used in machine learning and data analysis.

PCA is a linear transformation method that projects the data onto new axes (principal components) — linear combinations of the original features, ordered by how much of the data's variance they explain. It reduces the number of features in the data, making it easier to visualize and analyze. It works well for high-dimensional data and is computationally efficient. However, PCA can only capture linear relationships between features, so it may miss non-linear structure in the data.
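To make the variance-explained idea concrete, here is a small sketch on synthetic data (the data and values below are made up for illustration, not part of the iris example that follows). Because two of the three features are strongly correlated, the first principal component captures most of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-feature data: feature 1 is roughly 2x feature 0,
# feature 2 is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
data = np.hstack([x, 2 * x + 0.1 * rng.normal(size=(200, 1)),
                  rng.normal(size=(200, 1))])

pca = PCA(n_components=2)
pca.fit(data)

# Fraction of the total variance each component explains;
# the first component should dominate here
print(pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute is a handy way to decide how many components to keep in practice.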

t-SNE is a non-linear dimensionality reduction technique that is particularly useful for visualizing high-dimensional data. It keeps similar data points close together and pushes dissimilar ones apart, preserving the local structure of the data. Because it captures non-linear relationships between data points, it is often used for visualizing clusters or patterns. However, t-SNE is computationally expensive, scales poorly to large datasets, and is stochastic: different runs (and different perplexity settings) can produce different embeddings.

Let’s see an example of how to use PCA and t-SNE on the famous iris dataset.

```python
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Load the iris dataset
iris = load_iris()

# Convert data to a dataframe
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
```

First, we will visualize the original data with a scatter plot of the first two features, colored by species.

```python
# Visualize the original data
plt.scatter(iris_df['sepal length (cm)'], iris_df['sepal width (cm)'], c=iris.target)
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.title('Iris Dataset')
plt.show()
```

Now we will apply PCA to reduce the data from 4 dimensions to 2, and then visualize the PCA output with another scatter plot.

```python
# Apply PCA
pca = PCA(n_components=2)
iris_pca = pca.fit_transform(iris.data)

# Visualize PCA output
plt.scatter(iris_pca[:, 0], iris_pca[:, 1], c=iris.target)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.title('PCA Output')
plt.show()
```
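Finally, we can do the same with t-SNE. The sketch below completes the walkthrough; the parameter values (`perplexity=30`, `random_state=42`) are illustrative defaults, not values from the original article — t-SNE is stochastic, so fixing `random_state` makes the plot reproducible.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

iris = load_iris()

# Apply t-SNE to reduce the 4 features to 2 dimensions.
# perplexity=30 and random_state=42 are illustrative choices.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
iris_tsne = tsne.fit_transform(iris.data)

# Visualize t-SNE output
plt.scatter(iris_tsne[:, 0], iris_tsne[:, 1], c=iris.target)
plt.xlabel('t-SNE 1')
plt.ylabel('t-SNE 2')
plt.title('t-SNE Output')
plt.show()
```

Compared to the PCA plot, the t-SNE embedding tends to separate the three species into tighter clusters, at the cost of a much longer runtime on larger datasets.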