Added Data Preprocessing, Outlier Detection, and Visualization #3
Open
SajanGhimire1 wants to merge 7 commits into CodeByPinar:main
…n techniques in the Plant Health Monitoring project. Integrated Plotly for visualization, replacing some of the existing Matplotlib and Seaborn plots with Plotly equivalents.
Data Preprocessing: Handling missing values and outliers:
Handling Missing Values:
The script applies pd.to_numeric with errors='coerce' to convert non-numeric values to NaN, and then removes the missing values using dropna(). Note that dropna() drops rows by default; dropping columns instead requires axis=1.
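The coercion step described above can be sketched as follows. This is a minimal illustration, not the script from this PR: the helper name `coerce_and_drop`, the column names, and the sample values are all hypothetical.

```python
import pandas as pd


def coerce_and_drop(df: pd.DataFrame) -> pd.DataFrame:
    """Convert every column to numeric, then drop rows that contain NaN."""
    # errors="coerce" turns anything that cannot be parsed into NaN
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Default axis=0 removes rows; pass axis=1 to remove columns instead
    return numeric.dropna()


# Hypothetical sensor readings; column names are illustrative only.
raw = pd.DataFrame({
    "soil_moisture": ["31.5", "n/a", "28.0", "30.2"],
    "temperature": ["22.1", "23.4", "error", "21.9"],
})
clean = coerce_and_drop(raw)
```

With this sample, dropping rows keeps the two fully valid readings, whereas `dropna(axis=1)` would discard both columns because each contains one bad value. Which behaviour is wanted depends on the dataset.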
Handling Outliers:
Outliers are detected using the Isolation Forest algorithm from scikit-learn.
Model Selection: Testing various algorithms to identify the best-performing model:
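For reviewers unfamiliar with Isolation Forest, here is a rough sketch of how outlier flagging with scikit-learn's `IsolationForest` typically looks. The helper `flag_outliers`, the `contamination` value, and the synthetic data are assumptions for illustration; the PR's script may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_outliers(df: pd.DataFrame, contamination: float = 0.05,
                  random_state: int = 42) -> pd.Series:
    """Return a boolean Series that is True where a row is flagged as an outlier."""
    model = IsolationForest(contamination=contamination,
                            random_state=random_state)
    labels = model.fit_predict(df)  # 1 = inlier, -1 = outlier
    return pd.Series(labels == -1, index=df.index, name="is_outlier")


# Synthetic data (hypothetical columns) with one obviously extreme row appended.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "soil_moisture": rng.normal(30, 1, 100),
    "temperature": rng.normal(22, 0.5, 100),
})
data.loc[100] = [90.0, 60.0]

mask = flag_outliers(data)
inliers = data[~mask]  # rows kept for further analysis
```

The `contamination` parameter sets the expected share of outliers, so it directly controls how many rows get flagged and is worth tuning against the real data.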
The script uses only the Isolation Forest algorithm, and only for outlier detection. It does not test multiple algorithms for model selection.
Data Visualization: Creating insightful visualizations for better understanding:
The script includes various visualizations such as correlation heatmaps, pair plots, histograms, and boxplots, which help in understanding the data and identifying patterns.