Data Analysis using R

Course Title: Data Analysis using R
The Statistical Data Management and Analysis using R course provides insight into quantitative
data management and analysis (exploring, summarizing, statistically analyzing, and visualizing data).
R is open-source software with many features for quantitative data management and analysis.
This hands-on course takes participants through R in a practical way.
Data Science with R
Introduction to Data Science with R
✓ What are Analytics & Data Science?
✓ Common Terms in Analytics
✓ Analytics vs. Data warehousing, OLAP, MIS Reporting
✓ Relevance in industry and need of the hour
✓ Types of problems and business objectives in various industries
✓ How leading companies are harnessing the power of analytics
✓ Critical success drivers
✓ Overview of analytics tools & their popularity
✓ Analytics Methodology & problem-solving framework
✓ List of steps in Analytics projects
✓ Identify the most appropriate solution design for the given problem statement
✓ Project plan for Analytics project & key milestones based on effort estimates
✓ Building a resource plan for an analytics project
✓ Why R for data science?
Data Import & Export
✓ Introduction to R/RStudio – GUI
✓ Concept of Packages – Useful Packages (Base & Other packages)
✓ Data Structure & Data Types (Vectors, Matrices, factors, Data frames, and Lists)
✓ Importing Data from various sources (txt, dlm, excel, sas7bdat, db, etc.)
✓ Database Input (Connecting to database)
✓ Exporting Data to various formats
✓ Viewing Data (Viewing partial data and full data)
✓ Variable & Value Labels – Date Values
Data Manipulation
✓ Data Manipulation steps
✓ Creating New Variables (calculations & Binning)
✓ Dummy variable creation
✓ Applying transformations
✓ Handling duplicates
✓ Handling missing values
✓ Sorting and Filtering
✓ Sub-setting (Rows/Columns)
✓ Appending (Row appending/column appending)
✓ Merging/Joining (left, right, inner, full outer, etc.)
✓ Data type conversions
✓ Renaming
✓ Formatting
✓ Reshaping data
✓ Sampling
✓ Data manipulation tools
✓ Loops (Conditional, iterative loops, apply functions)
✓ Arrays
Data Analysis – Visualization
✓ Introduction to exploratory data analysis
✓ Descriptive statistics, Frequency Tables and summarization
✓ Univariate Analysis (Distribution of data & Graphical Analysis)
✓ Bivariate Analysis (Cross Tabs, Distributions & Relationships, Graphical Analysis)
✓ Creating Graphs (bar/pie/line chart/histogram/boxplot/scatter/density, etc.)
✓ R Packages for Exploratory Data Analysis (dplyr, plyr, gmodels, car, vcd, Hmisc, psych,
doBy, etc.)
✓ R Packages for Graphical Analysis (base, ggplot2, lattice, etc.)
Introduction to Predictive Modeling
✓ Concept of a model in analytics and how it is used
✓ Common terminology used in analytics & modeling process
✓ Popular modeling algorithms
✓ Types of Business problems – Mapping of Techniques
Linear Regression
✓ Introduction – Applications
✓ Assumptions of Linear Regression
✓ Building Linear Regression Model
✓ Understanding standard metrics (Variable significance, R-square/Adjusted R-square,
Global hypothesis, etc.)
✓ Assess the overall effectiveness of the model
✓ Validation of Models (Re-running vs. Scoring)
✓ Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model
equation, drivers etc.)
✓ Interpretation of Results – Business Validation – Implementation on new data
Logistic Regression
✓ Introduction – Applications
✓ Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
✓ Building Logistic Regression Model (Binary Logistic Model)
✓ Understanding standard model metrics (Concordance, Variable significance,
Hosmer-Lemeshow Test, Gini, KS, Misclassification, ROC Curve, etc.)
✓ Validation of Logistic Regression Models (Re-running vs. Scoring)
✓ Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift
charts, Model equation, Drivers or variable importance, etc.)
✓ Interpretation of Results – Business Validation – Implementation on new data
Time Series Forecasting
✓ Introduction – Applications
✓ Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
✓ Classification of Techniques (Pattern-based vs. Pattern-less)
Data Mining
✓ Introduction to data mining
✓ Association rule mining
✓ Clustering analysis
Way Forward After the Training
With the help of facilitators, participants will develop a work plan that sets out how they will
apply the skills acquired to improve their organizations. ASPM will continuously monitor
implementation progress after the training.
Training Evaluation
Participants will undertake a simple assessment before the training to gauge their knowledge and
skills, and another assessment after the training in order to measure the knowledge gained through
the training.
