Step By Step Process In EDA And Feature Engineering In Data Science Projects

Published 2021-08-29


Connect with me here:
Twitter: twitter.com/Krishnaik06
Facebook: www.facebook.com/krishnaik06
instagram: www.instagram.com/krishnaik06
-----------------------------------------------------------------------------------------------------------------------

All Comments (21)

@user-ek6to2wf2u 10 months ago

Exploratory Data Analysis (EDA) and Feature Engineering are two essential steps in data science projects that help in understanding the data, extracting valuable insights, and preparing the data for model building and analysis.

Exploratory Data Analysis (EDA): EDA is the initial and crucial phase of any data science project. It involves exploring and summarizing the main characteristics of the dataset to gain insights into its structure, patterns, and relationships between variables. The main objectives of EDA are as follows:

Data Cleaning: Identifying and handling missing or erroneous data points, dealing with outliers, and removing duplicates.
Descriptive Statistics: Calculating basic statistical measures such as mean, median, standard deviation, and percentiles to understand the central tendencies and dispersion of the data.
Data Visualization: Creating visual representations like histograms, scatter plots, box plots, and heatmaps to visualize the distribution and relationships between variables.
Correlation Analysis: Assessing the correlation between different features to understand their interdependencies and potential multicollinearity.
Hypothesis Testing: Conducting statistical tests to validate assumptions and make data-driven decisions.

EDA helps data scientists to identify patterns, trends, and potential issues within the dataset. It provides a foundation for further analysis and model building.

Feature Engineering: Feature engineering involves transforming the raw data into meaningful features that can be used as inputs for machine learning algorithms. The quality and relevance of features play a significant role in the performance of a predictive model. The key steps in feature engineering are as follows:

Feature Selection: Choosing the most relevant features that have a significant impact on the target variable while disregarding irrelevant or redundant ones. This step helps in reducing dimensionality and enhancing model efficiency.
Feature Transformation: Applying mathematical or statistical transformations to the features to make the data suitable for modeling. Common transformations include scaling, normalization, and log transformations.
Handling Categorical Variables: Converting categorical variables into numerical representations using techniques like one-hot encoding or label encoding to make them usable by machine learning algorithms.
Creating Interaction Features: Introducing new features based on interactions between existing features can help capture non-linear relationships.
Handling Missing Data: Dealing with missing data by imputing or removing missing values, depending on the nature of the dataset.
Feature Extraction: Generating new features from the existing data using domain knowledge or advanced techniques like principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE).

Effective feature engineering can significantly improve the performance of machine learning models by providing them with more relevant and informative inputs, leading to more accurate predictions and better insights.

In summary, Exploratory Data Analysis (EDA) helps in understanding the data, identifying patterns, and making data-driven decisions. Feature engineering transforms the data into useful features, enabling machine learning models to learn from the data and make predictions effectively. Together, these two steps are fundamental for successful data science projects.
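
A minimal sketch of a few of the EDA and feature engineering steps described above, using pandas and NumPy. The toy DataFrame and its column names (age, income, city) are hypothetical and chosen only for illustration; which steps a real project needs depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 32, np.nan],
    "income": [30000, 52000, 88000, 95000, 52000, 41000],
    "city": ["Pune", "Delhi", "Delhi", "Chennai", "Delhi", "Pune"],
})

# --- EDA ---
summary = df.describe()                        # mean, std, percentiles of numeric columns
missing_counts = df.isna().sum()               # number of missing values per column
duplicate_rows = int(df.duplicated().sum())    # number of exact duplicate rows
correlation = df[["age", "income"]].corr()     # pairwise Pearson correlation

# --- Feature engineering ---
features = df.drop_duplicates().copy()
# Impute the missing age with the median (one option among several).
features["age"] = features["age"].fillna(features["age"].median())
# Log transformation to reduce skew in income.
features["log_income"] = np.log1p(features["income"])
# A simple interaction feature built from two existing columns.
features["age_x_income"] = features["age"] * features["income"]
# One-hot encode the categorical column.
features = pd.get_dummies(features, columns=["city"])

print(missing_counts)
print(features.head())
```

Plots such as histograms, box plots, and heatmaps would normally sit alongside the numeric summaries; they are left out here to keep the sketch short.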
@percy8177 2 years ago

💪🤣Facial expression is serious when he said he goes with Box Plots to find the outliers. Gotta love the passion bro.
@Yeyppe 2 years ago

Krish Sir You Know Your Channel Is Not Only A YouTube Channel ... It Is Everything For Us ! Having A Mentor And Teacher Like You Is A Blessing
@abdulqudusbalogun8057 2 years ago

I have been watching your videos non stop for weeks now, by God, you are my favorite tutor...God bless
@vaishnavi4354 2 years ago

The induction session from the MLDL course is awesome... that's 🔥🔥🔥
@nanda9395 1 year ago

This is clear info about F.E. and E.D.A. 🙏🙏
@kanikabagree1084 2 years ago

This guy deserves a million subs 🌸❤️
@bhargavikoti4208 2 years ago

Thank you..much needed 🙂
@write2ruby 2 years ago

1. Feature Engineering (takes 30% of project time)
   a) EDA
      i) Analyze how many numerical features are present, using histograms and PDFs with seaborn and matplotlib.
      ii) Analyze how many categorical features are present. Does each feature have multiple categories?
      iii) Missing values (visualize all these graphs)
      iv) Outliers - box plot
      v) Cleaning
   b) Handling the missing values
      i) Mean/Median/Mode
   c) Handling imbalanced datasets
   d) Treating the outliers
   e) Scaling down the data - Standardization, Normalization
   f) Converting the categorical features into numerical features
2. Feature Selection
   a) Correlation
   b) KNeighbors
   c) Chi-Square
   d) Genetic Algorithm
   e) Feature Importance - Extra Trees Classifier
3. Model Creation
4. Hyperparameter Tuning
5. Model Deployment
6. Incremental Learning
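
A rough sketch of steps 1b, 1d, 1e, 1f, and 2a from the outline above, assuming pandas and NumPy. The helper names (impute_median, clip_outliers_iqr, standardize), the 1.5 x IQR rule, and the toy data are my own assumptions for illustration, not necessarily the exact methods used in the video. Handling imbalanced data (1c) and the other selection methods are not covered here.

```python
import numpy as np
import pandas as pd

def impute_median(s: pd.Series) -> pd.Series:
    """1b: fill missing values with the column median."""
    return s.fillna(s.median())

def clip_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """1d: clip values outside [Q1 - k*IQR, Q3 + k*IQR] (the box plot rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

def standardize(s: pd.Series) -> pd.Series:
    """1e: rescale to zero mean and unit (population) standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)

# Hypothetical toy data with one missing value and one obvious outlier.
raw = pd.DataFrame({
    "salary": [40, 42, 45, np.nan, 47, 50, 400],
    "dept": ["ops", "ops", "hr", "hr", "it", "it", "it"],
    "target": [0, 0, 0, 1, 1, 1, 1],
})

salary = clip_outliers_iqr(impute_median(raw["salary"]))
clean = pd.DataFrame({
    "salary_scaled": standardize(salary),
    "target": raw["target"],
})
# 1f: convert the categorical feature into numerical (one-hot) columns.
clean = clean.join(pd.get_dummies(raw["dept"], prefix="dept"))

# 2a: correlation of a candidate feature with the target.
corr_with_target = clean["salary_scaled"].corr(clean["target"])
print(clean)
print(corr_with_target)
```

Clipping keeps every row but caps extreme values; dropping the outlier rows instead is an equally common choice and depends on the data.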
@awais2451985 1 year ago

a lot of love and appreciation from Pakistan for your great effort.
@apnapython 2 years ago

Thank you…great video
@ashmitasharma5879 1 month ago

Thank you so much for helping us this way ....🎉🎉🎉🎉 Thank you so much sir You are a very knowledgeable and helping natured person 🎉🎉🎉🎉🎉
@kawishdaniyal3640 2 years ago

Great Work sir jii ! 👌👌👌👌
@1234560pratik 2 years ago

You know very well what I actually need, sir, but how? You read my mind; you are all-knowing and greatly wise. In fact, I would say you are not just a man but a great man 🤩😍😍❤❤❤
@ukamakaazode 1 year ago

Thank you Krish!!!!!!!
@ankitac4994 2 years ago

Thank you for this video sir
@ShahnawazKhan-xl6ij 2 years ago

Very important step
@kasturibalaji9177 2 years ago

Hi Krishna sir, I got a new job in the data science domain at a product-based company in Chennai. Your videos helped me a lot; before this I was working in a different domain. Best regards, Balaji
@saimanohar3363 2 years ago

Great list of videos for EDA. If we have more categorical variables than numerical ones, should we work with the CHAID algorithm after EDA? Please suggest. Thanks
@mehrozalam94 2 years ago

Great sir <3