Introduction to Data Science - Unit : 1 - Topic 7 : DEFINING GOALS AND CREATING PROJECT CHARTER
DEFINING GOALS AND CREATING PROJECT CHARTER
A project starts by understanding the what, the why, and the how of your
project. What does the company expect you to do? And why does management place such
a value on your research? Is it part of a bigger strategic picture or a “lone
wolf” project originating from an opportunity someone detected? Answering these
three questions (what, why, how) is the goal of the first phase, so that
everybody knows what to do and can agree on the best course of action. The
outcome should be a clear research goal, a good understanding of the context, well-defined
deliverables, and a plan of action with a timetable. This information is then
best placed in a project charter.
1. Spend time understanding the goals and context of your research
An essential outcome is the research goal that states the
purpose of your assignment in a clear and focused manner. Understanding the
business goals and context is critical for project success. Continue asking
questions and devising examples until you grasp the exact business
expectations, identify how your project fits in the bigger picture, appreciate
how your research is going to change the business, and understand how they’ll
use your results.
2. Create a project charter
Clients like to know upfront what they’re paying for, so after you have a good
understanding of the business problem, try to get a formal agreement on the
deliverables. All this information is best collected in a project charter. For
any significant project a charter is mandatory.
A project charter requires teamwork, and your input covers at least the
following:
- A clear research goal
- The project mission and context
- How you’re going to perform your analysis
- What resources you expect to use
- Proof that it’s an achievable project, or proof of concepts
- Deliverables and a measure of success
- A timeline
RETRIEVING DATA
The next step in data science is to retrieve the
required data. Sometimes you need to go into the field and design a data
collection process yourself, but most of the time you won’t be involved in this
step. Many companies will have already collected and stored the data for you,
and what they don’t have can often be bought from third parties. Don’t be
afraid to look outside your organization for data, because more and more
organizations are making even high-quality data freely available for public and
commercial use.
Data can be stored in many forms, ranging from simple
text files to tables in a database. The objective now is acquiring all the data
you need. This may be difficult, and even if you succeed, data is often like a
diamond in the rough: it needs polishing to be of any use to you.
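As a minimal sketch of retrieving data from the two storage forms mentioned above, the example below reads one data set from CSV text and another from a database table. The column names, the `sales` table, and the sample values are hypothetical, and an in-memory SQLite database stands in for whatever database a company actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a simple text file (CSV) supplied by the business.
CSV_TEXT = "customer_id,region,revenue\n1,North,120.5\n2,South,80.0\n3,North,42.25\n"

# Text-file source: every value arrives as a string and still needs converting.
file_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Database source: an in-memory SQLite database used as a stand-in.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dictionary
conn.execute("CREATE TABLE sales (customer_id INTEGER, region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(4, "East", 310.0), (5, "West", 95.75)],
)
db_rows = [
    dict(row)
    for row in conn.execute("SELECT customer_id, region, revenue FROM sales")
]
conn.close()

print(file_rows[0])
print(db_rows[0])
```

Note that the CSV values come back as strings while the database values keep their types, which is one reason the retrieved data still needs polishing.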
CLEANSING, INTEGRATING AND TRANSFORMING DATA
The data received from the data retrieval phase is likely to be “a diamond in the rough.” Your task
now is to sanitize and prepare it for use in the modeling and reporting phase.
Doing so is tremendously important because your models will perform better and
you’ll lose less time trying to fix strange output. It can’t be mentioned
nearly enough times: garbage in equals garbage out. Your model needs the data
in a specific format, so data transformation will always come into play. It’s a
good habit to correct data errors as early on in the process as possible.
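A small sketch of what such cleansing can look like is given below. The records, the field names, and the plausible age range of 0 to 120 are all assumptions chosen for illustration. The rules shown (trimming whitespace, normalizing capitalization, converting types, and flagging impossible values as missing) are only a few of the common ones.

```python
# Hypothetical raw records with typical data-entry problems.
RAW_ROWS = [
    {"name": " Alice ", "age": "34", "country": "belgium"},
    {"name": "Bob", "age": "", "country": "Belgium "},
    {"name": "Carol", "age": "350", "country": "BELGIUM"},
    {"name": "Dave", "age": "forty", "country": "France"},
]


def clean_row(row):
    """Return a cleaned copy of one record; unusable ages become None."""
    name = row["name"].strip()                # remove stray whitespace
    country = row["country"].strip().title()  # one consistent spelling
    try:
        age = int(row["age"])                 # convert text to a number
    except ValueError:
        age = None                            # empty or non-numeric entry
    if age is not None and not 0 <= age <= 120:
        age = None                            # outside the assumed plausible range
    return {"name": name, "age": age, "country": country}


cleaned = [clean_row(row) for row in RAW_ROWS]
for row in cleaned:
    print(row)
```

Marking bad values as missing instead of silently dropping the rows keeps the decision about how to handle them visible for the later phases.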
EXPLORATORY ANALYSIS
During exploratory data analysis you take a deep dive
into the data. Information becomes much easier to grasp when shown in a
picture, so you mainly use graphical techniques to gain an understanding
of your data and the interactions between variables. This phase is about
exploring data, so keep an open mind and your eyes peeled throughout.
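The sketch below illustrates a first look at a single variable: a few summary statistics plus a text histogram. The `ages` values are hypothetical. In practice a plotting library would normally draw the picture, and the text histogram is used here only to keep the example free of dependencies.

```python
import statistics
from collections import Counter

# Hypothetical variable to explore.
ages = [23, 25, 31, 35, 35, 38, 41, 44, 52, 67]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '#' per observation."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * counts[start]}")
    return lines


summary = summarize(ages)
histogram = text_histogram(ages, bin_width=10)
print(summary)
print("\n".join(histogram))
```

Even this crude picture shows where most observations sit and that the distribution has a tail toward higher ages, which is the kind of observation this phase is meant to surface.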
MODEL BUILDING
With clean data in place and a good understanding of
the content, you’re ready to build models with the goal of making better
predictions, classifying objects, or gaining an understanding of the system
that you’re modeling. This phase is much more focused than the exploratory
analysis step, because you know what you’re looking for and what you want the
outcome to be.
Building a model is an iterative process. The way you build your model depends on whether you go with
classic statistics or the somewhat more recent machine learning school, and the
type of technique you want to use. Either way, most models consist of the
following main steps:
1. Selection of a modeling technique and variables to enter in the model
2. Execution of the model
3. Diagnosis and model comparison
1. Model and variable selection
You’ll need to select the variables you want
to include in your model and a modeling technique. Your findings from the
exploratory analysis should already give a fair idea of what variables will
help you construct a good model. Many modeling techniques are available, and
choosing the right model for a problem requires judgment on your part. You’ll
need to consider model performance and whether your project meets all the
requirements to use your model, as well as other factors:
- Must the model be moved to a production environment and, if so, would it be easy to implement?
- How difficult is the maintenance on the model: how long will it remain relevant if left untouched?
- Does the model need to be easy to explain?
When the thinking is done, it’s time for action.
2. Model execution
Once you’ve chosen a model you’ll need to
implement it in code. Most programming languages, such as Python, already have
libraries such as StatsModels or Scikit-learn. These packages implement several of
the most popular techniques. Coding a model is a nontrivial task in most cases,
so having these libraries available can speed up the process.
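As an illustration of how little code a library-backed model can require, here is a minimal sketch that fits a linear regression with Scikit-learn. It assumes `numpy` and `scikit-learn` are installed. The data is hypothetical and deliberately follows y = 3x + 2 exactly, so the fitted values are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one predictor, and a target that follows y = 3x + 2.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = 3.0 * X[:, 0] + 2.0

# Execute the model: the library handles the estimation.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[6.0]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

Real data is noisy, so the fitted coefficients will not match a known formula this neatly; the point is only that fitting and predicting take a few lines each.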
3. Model diagnostics and model comparison
You’ll be building multiple models from which you then choose the best one
based on multiple criteria. Working with a holdout sample helps you pick the
best-performing model. A holdout sample is a part of the data you leave out of
the model building so it can be used to evaluate the model afterward.
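The holdout idea can be sketched as follows, using only the Python standard library. The data is synthetic (a noisy straight line). The 25% holdout fraction, the two candidate models, and the use of mean squared error as the single criterion are assumptions made for illustration.

```python
import random


def split_holdout(rows, holdout_fraction, seed):
    """Shuffle a copy of the rows and return (training rows, holdout rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def fit_mean(train):
    """Baseline model: always predict the mean target of the training rows."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Simple linear regression fitted by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    """Average squared prediction error of a model on the given rows."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Synthetic data: y = 2x + 1 plus uniform noise.
rng = random.Random(0)
data = [(float(x), 2.0 * x + 1.0 + rng.uniform(-1.0, 1.0)) for x in range(40)]

# Models are fitted on the training rows only and scored on the holdout rows.
train, holdout = split_holdout(data, holdout_fraction=0.25, seed=1)
scores = {
    "mean baseline": mean_squared_error(fit_mean(train), holdout),
    "linear regression": mean_squared_error(fit_line(train), holdout),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Because the holdout rows never influence the fitting, their error is a fairer estimate of how each model behaves on unseen data than the error on the training rows.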
PRESENTING FINDINGS AND BUILDING APPLICATIONS ON TOP OF THEM
After you’ve successfully analyzed the data and built
a well-performing model, you’re ready to present your findings to the world.
This is an exciting part; all your hours of hard work have paid off and you can
explain what you found to the stakeholders.
Sometimes people get so excited about your work that
you’ll need to repeat it over and over again because they value the predictions
of your models or the insights that you produced. For this reason, you need to
automate your models.