Data Management, Analysis and Graphics with R Course
Introduction to Data Science with R
- What are Analytics & Data Science?
- Common Terms in Analytics
- Analytics vs. Data warehousing, OLAP, MIS Reporting
- Relevance in industry and need of the hour
- Types of problems and business objectives in various industries
- How leading companies are harnessing the power of analytics
- Critical success drivers
- Overview of analytics tools & their popularity
- Analytics Methodology & problem-solving framework
- List of steps in Analytics projects
- Identify the most appropriate solution design for the given problem statement
- Project plan for Analytics project & key milestones based on effort estimates
- Building a resource plan for an analytics project
- Why R for data science?
Data Importation & Exportation
- Introduction to R/RStudio – GUI
- Concept of Packages – Useful Packages (Base & Other packages)
- Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)
- Importing Data from various sources (txt, dlm, excel, sas7bdat, db, etc.)
- Database Input (Connecting to database)
- Exporting Data to various formats
- Viewing Data (Viewing partial data and full data)
- Variable & Value Labels – Date Values
Data Manipulation
- Data Manipulation steps
- Creating New Variables (calculations & Binning)
- Dummy variable creation
- Applying transformations
- Handling duplicates
- Handling missing values
- Sorting and Filtering
- Subsetting (Rows/Columns)
- Appending (Row appending/column appending)
- Merging/Joining (left, right, inner, full outer, etc.)
- Data type conversions
- Renaming
- Formatting
- Reshaping data
- Sampling
- Data manipulation tools
- Loops (Conditional, iterative loops, apply functions)
- Arrays
- R Built-in Functions (Text, Numeric, Date, utility)
- Numerical Functions
- Text Functions
- Date Functions
- Utility Functions
- R User Defined Functions
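A few of the manipulation steps listed above (binning, missing-value handling, dummy variable creation) can be sketched as follows. This is a minimal illustration written in Python rather than R, the course language; the records, the age cut-offs and the helper name `age_band` are hypothetical and not part of the course material.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing income value.
records = [
    {"id": 1, "age": 23, "income": 32000, "region": "north"},
    {"id": 2, "age": 41, "income": None, "region": "south"},
    {"id": 3, "age": 67, "income": 58000, "region": "north"},
]


def age_band(age):
    """Bin a numeric age into a labelled band (cut-offs are illustrative)."""
    if age < 30:
        return "under_30"
    if age < 60:
        return "30_to_59"
    return "60_plus"


# Missing-value handling: impute with the mean of the observed incomes.
observed = [r["income"] for r in records if r["income"] is not None]
income_mean = mean(observed)

# Distinct category levels, used to build one dummy column per level.
regions = sorted({r["region"] for r in records})

prepared = []
for r in records:
    row = {
        "id": r["id"],
        "age_band": age_band(r["age"]),
        "income": r["income"] if r["income"] is not None else income_mean,
    }
    # Dummy variable creation: a 0/1 flag for each region level.
    for level in regions:
        row[f"region_{level}"] = int(r["region"] == level)
    prepared.append(row)

for row in prepared:
    print(row)
```

Mean imputation is only one of several possible treatments; the appropriate choice depends on why the values are missing.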
Data Analysis – Visualization
- Introduction to exploratory data analysis
- Descriptive statistics, Frequency Tables and summarization
- Univariate Analysis (Distribution of data & Graphical Analysis)
- Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
- Creating Graphs (bar, pie, line chart, histogram, boxplot, scatter, density, etc.)
- R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych, doBy, etc.)
- R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)
Introduction to Statistics
- Basic Statistics – Measures of Central Tendency and Variance
- Building blocks – Probability Distributions – Normal distribution – Central Limit Theorem
- Inferential Statistics – Sampling – Concept of Hypothesis Testing
- Statistical Methods – Z/t-tests (One sample, independent, paired), ANOVA, Correlations and Chi-square
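As an illustration of the one-sample t-test named above, the test statistic is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean. The sketch below is in Python rather than R, the course language; the function name `one_sample_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    # stdev() uses the n - 1 denominator, as the t-test requires.
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error


# Hypothetical measurements tested against a hypothesized mean of 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]
t_stat = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f} with {len(sample) - 1} degrees of freedom")
```

The statistic is then compared with the t distribution on n - 1 degrees of freedom to obtain a p-value.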
Introduction to Predictive Modeling
- Concept of a model in analytics and how it is used
- Common terminology used in analytics & modeling process
- Popular modeling algorithms
- Types of Business problems – Mapping of Techniques
- Different Phases of Predictive Modeling
Data Exportation for Modelling
Data Preparation
- Need for data preparation
- Consolidation/Aggregation – Outlier treatment – Flat Liners – Missing values – Dummy creation – Variable Reduction
- Variable Reduction Techniques – Factor Analysis & PCA
Segmentation: Solving segmentation problems
- Introduction to Segmentation
- Types of Segmentation (Subjective Vs Objective, Heuristic Vs. Statistical)
- Heuristic Segmentation Techniques (Value Based, RFM Segmentation and Life Stage Segmentation)
- Behavioral Segmentation Techniques (K-Means Cluster Analysis)
- Cluster evaluation and profiling – Identify cluster characteristics
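The K-means technique listed above alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. A minimal sketch follows, in Python rather than R, the course language; the customer points, the starting centroids and the function name `k_means` are hypothetical, and the starting centroids are fixed so the result is repeatable.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers described by (monthly spend, monthly visits).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])

# Profiling: report the size and centre of each cluster.
for centre, members in zip(centroids, clusters):
    print(len(members), centre)
```

In practice variables are usually standardized first, since K-means is sensitive to scale.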
Linear Regression
- Introduction – Applications
- Assumptions of Linear Regression
- Building Linear Regression Model
- Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
- Assess the overall effectiveness of the model
- Validation of Models (Re running Vs. Scoring)
- Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
- Interpretation of Results – Business Validation – Implementation on new data
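The model-building, R-square and scoring steps above can be sketched for the single-predictor case. This is an illustration in Python rather than R, the course language; the data and the function names `fit_line` and `r_squared` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical training data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
print("model equation: y =", round(intercept, 3), "+", round(slope, 3), "* x")
print("R-square:", round(r_squared(x, y, slope, intercept), 4))

# Scoring: apply the fitted equation to new data without refitting.
new_x = [6, 7]
print("scores:", [intercept + slope * xi for xi in new_x])
```

Scoring reuses the fitted coefficients on new observations, whereas re-running would estimate a fresh model on the new sample.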
Logistic Regression
- Introduction – Applications
- Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
- Building Logistic Regression Model (Binary Logistic Model)
- Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
- Validation of Logistic Regression Models (Re running Vs. Scoring)
- Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers or variable importance, etc)
- Interpretation of Results – Business Validation – Implementation on new data
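Two of the metrics above, concordance and Gini, can be computed directly from predicted probabilities and actual outcomes. Concordance is the share of (event, non-event) pairs in which the event received the higher score; counting ties as half gives the area under the ROC curve, and Gini is twice that area minus one. The sketch below is in Python rather than R, the course language; the scores, labels and function name `concordance` are hypothetical.

```python
def concordance(scores, labels):
    """Return (concordance, AUC, Gini) for binary labels coded 1/0."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    pairs = len(events) * len(non_events)

    # Compare every event with every non-event.
    concordant = sum(1 for e in events for n in non_events if e > n)
    tied = sum(1 for e in events for n in non_events if e == n)

    auc = (concordant + 0.5 * tied) / pairs
    return concordant / pairs, auc, 2 * auc - 1


# Hypothetical predicted probabilities and observed outcomes.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]

conc, auc, gini = concordance(scores, labels)
print(conc, auc, gini)
```

The pairwise comparison is quadratic in the sample size, so it suits small validation samples; rank-based formulas are the usual choice for large ones.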
Time series Forecasting
- Introduction – Applications
- Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
- Classification of Techniques (Pattern-based vs. Pattern-less)
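One simple way to separate the trend component mentioned above is a centered moving average; subtracting it from the series leaves the seasonal and irregular parts. The sketch below is in Python rather than R, the course language; the series, the window length and the function name `moving_average` are hypothetical, and the window is assumed to be odd.

```python
def moving_average(series, window):
    """Centered moving average for an odd window length.

    The first and last window // 2 points have no full window,
    so the result is shorter than the input by window - 1 values.
    """
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]


# Hypothetical series: a rising trend with a repeating 3-period pattern.
series = [10, 14, 9, 13, 17, 12, 16, 20, 15]
window = 3

trend = moving_average(series, window)

# Detrended values line up with the middle of each window.
half = window // 2
detrended = [
    value - level
    for value, level in zip(series[half:len(series) - half], trend)
]

print("trend:", trend)
print("detrended:", detrended)
```

Choosing the window equal to the seasonal period averages the seasonal pattern out of the trend estimate.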
Way Forward After the Training
With the help of facilitators, participants will develop a work plan that sets out how they will apply the skills acquired to improve their organizations. ASPM will continuously monitor implementation progress after the training.
Training Evaluation:
Participants will undertake a simple assessment before the training to gauge their knowledge and skills, and another assessment after the training in order to monitor the knowledge gained through the training.