Archive - Eric Heidel, PhD PStat - Statistician For Hire

Published on

September 26, 2014

G*Power for the masses

Effect Size G*Power Sample Size Sampling Statistical Power Analysis

G*Power is a necessary tool for every researcher's toolkit

Easy statistical power and sample size calculations

I'm trying to run an online business, so I'm fully Google-integrated. My behind-the-scenes tracking measures show that there are many search queries, in different variations, related to sample size calculation.

There is a free tool available to EVERYONE that allows you to calculate your own a priori and post hoc power analyses. It is called G*Power and, as your personal statistical consultant, I highly suggest you go to the following web address and download Version 3.0 to your respective device:

http://www.gpower.hhu.de/en.html

The researchers who developed this program have made a great contribution to science. It is truly a great and FREE program that can run a litany of different power analyses. You can find out in minutes how large a sample size you need, given that you have an idea of the effect size that you are attempting to detect in your study.

Use means, proportions, and variance measures from published studies in your field to build the most empirically rigorous hypothesized effect. Enter these values into G*Power and then adjust the variance and magnitude of the effect size to see how the required sample size changes.
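To see why the required sample size moves so much with the effect size, here is a minimal Python sketch. It is not G*Power's own method: it assumes a two-sided comparison of two independent means and uses the textbook normal approximation, so its answers will typically run a little lower than an exact t-based calculation. The function names and the example means and standard deviation are hypothetical, made up for illustration.

```python
import math
from statistics import NormalDist


def cohens_d(mean_1, mean_2, pooled_sd):
    """Standardized mean difference built from published means and a pooled SD."""
    return abs(mean_1 - mean_2) / pooled_sd


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Hypothetical published values: group means of 10 and 12, pooled SD of 4.
d = cohens_d(10, 12, 4)

# Vary the hypothesized effect size and watch the required n change.
for effect in (0.2, d, 0.8):
    print(f"d = {effect:.2f}: about {n_per_group(effect)} per group")
```

Halving the effect size roughly quadruples the required sample, which is why the hypothesized effect deserves an evidence-based value rather than a guess.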

Click on the Sample Size button to access the methods of conducting and interpreting sample size calculations for ten different statistical tests.

Sample Size
Published on

September 26, 2014

Preliminary statistical consultation

Database Management Research Engineer Sample Size Statistical Analysis Statistical Consultation Statistician Statistics

Support your local statistician!

Seek out methodological and statistical consultation

If you have access to statistical consultants or statisticians within your empirical or clinical environment, seek out their services in the preliminary phases of planning your study. Here is a list of things that I do for residents, fellows, faculty, physicians, pharmacists, nurses, and staff at an academic regional medical campus:

1. Sample Size - I conduct sample size calculations for at least 80-85% of my first-time clients. They often want to know how many people they need to reach a significant p-value. We work through the process of acquiring an evidence-based measure of effect that reflects what their research question is trying to answer.

It feels good knowing that you have a good chance of detecting significance with a small sample size. Also, it is good to find out that you have to collect A LOT more observations than you thought you would. Post hoc power analyses should be run for any non-significant main effects that may be considered Type II errors (limited or small sample sizes).

2. Statistical analysis - Real biostatistical scientists and statisticians will conduct your statistical analyses in an objective and expeditious manner to help you answer your research questions. Please help them understand what your research question is and what research design you want to use to answer it to the best of your abilities. They will be able to help you choose the correct statistic given that you can tell them the scale of measurement for your primary outcome and what type of design (between-subjects, within-subjects, correlational, mixed, or multivariate) you want to use to answer your question. It is also important to know WHO or WHAT you want to include in your sample in terms of inclusion and exclusion criteria. Finally, know your content area. We may not know your knowledge/philosophical base and need to understand the entire picture, as much as you can tell us.

3. Database management - Go ahead and let us build your database in a basic Excel spreadsheet and send an accompanying code book in Word so that we are all on the same page. It helps us all know what is going on, what variables are being collected, what they mean, how they are measured, and how the analysis will work. Share it with all members of the research team. Use the code book when entering your data. Tell the rest of us if you make changes to the code book or database. These simple tasks and communicative efforts can mean the difference between your statistics being run in five minutes versus five weeks. SERIOUSLY.

4. Write-up of findings for publication - We will give you an annotated write-up of your findings with statistical outputs and give you basic and unbiased interpretations of the statistical results of your study. We can help you write up the statistical methods and results sections of your abstracts and manuscripts. We can even help you design tables and graphs that will make your study findings more aesthetically and visually appealing to your audience.

When it comes to authorship, if you feel that your statistical professional's contribution to the design, execution, and interpretation of your study warrants authorship, offer it to them. They will greatly appreciate it! However, YOU SHOULD NEVER BE REQUIRED TO GIVE US AUTHORSHIP JUST BECAUSE WE RAN YOUR STATISTICS FOR YOU. IT IS UNETHICAL FOR US TO REQUIRE AUTHORSHIP FOR DOING OUR JOB. THAT IS, IF OUR JOB IS TO RUN STATISTICS IN YOUR EMPIRICAL OR CLINICAL ENVIRONMENT.
Published on

September 23, 2014

Using naturally skewed continuous variables as outcome variables

Kurtosis Listwise Deletion Logarithmic Transformations Non-parametric Statistics Outcome Outliers Skewness

Transformed outcomes

Some continuous variables will be naturally skewed

In medicine, there is an important metric that signifies efficiency and quality in healthcare: length of stay (LOS) in the hospital. When thinking about the distribution of a variable such as LOS, you have to put it into a relative context. The vast majority of people will have an LOS of 0-3 days, given the type of treatment or injury that brought them to the hospital. VERY FEW individuals will stay at the hospital one month, six months, or a year. Therefore, the distribution looks nothing like the normal curve and is extremely positively skewed.

As a researcher, you may want to predict a continuous variable that has a natural and logical skewness to its distribution in the population. Yet, the assumption of normality is a central tenet of running statistical analyses. What is one to do in this situation?

The answer is to first run skewness and kurtosis statistics to assess the normality of your continuous outcome. If either statistic is above an absolute value of 2.0, then the distribution is non-normal. Check for outliers in the distribution that are more than 3.29 standard deviations away from the mean. Make sure that the outlying observations were entered correctly.
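The screening step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the simple moment-based formulas for skewness and excess kurtosis (statistical packages usually apply small-sample corrections, so their values will differ somewhat), and the LOS values are invented for the example, not real patient data.

```python
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: the average cubed z-score."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal curve scores about 0."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 4 for v in values) / len(values) - 3.0


def outliers(values, cutoff=3.29):
    """Observations more than `cutoff` standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [v for v in values if abs(v - m) / s > cutoff]


# Hypothetical length-of-stay data in days: mostly 0-3, with two long stays.
los_days = [0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 1, 0, 2, 3, 1, 2, 2, 1, 30, 180]

non_normal = abs(skewness(los_days)) > 2.0 or abs(excess_kurtosis(los_days)) > 2.0
print("Non-normal by the 2.0 rule:", non_normal)
print("Outliers beyond 3.29 SD:", outliers(los_days))
```

Notice that one extreme stay inflates the standard deviation so much that the 30-day stay is not flagged, which is one more reason to look at the raw values and not only the cutoff.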

You now have a choice:

1. You can delete the outlying observations in a listwise fashion. This should be done only if the number of outlying observations is less than 10% of the overall distribution. This is the least preferable choice.

2. You can conduct a logarithmic transformation on the outcome variable. Doing this will normalize the distribution so that you can run the analysis using parametric statistics. The unstandardized beta coefficients, standard errors, and standardized beta coefficients are not interpretable, but the significance of the associations between the predictor variables and the transformed outcome can yield some inferential evidence.

3. You can recode the continuous outcome variable into a lower level scale of measurement such as ordinal or categorical and run non-parametric statistics to seek out any associations. Of course, you are losing the precision and accuracy of continuous-level measurement and introducing measurement error into the outcome variable, but you will still be able to run inferential statistics.

4. You can use non-parametric statistics without changing the skewed variable at all. That is one of the primary benefits of non-parametric statistics: They are robust to violations of normality and homogeneity of variance. Instead of interpreting means and standard deviations, you will interpret medians and interquartile ranges with non-parametric statistics.
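Options 2 and 4 can be sketched in Python as follows. The data are hypothetical LOS values, and adding 1 before taking the logarithm is my assumption to handle stays of 0 days, since the log of zero is undefined. The quartiles come from `statistics.quantiles` with its default method, and other packages may compute quartiles slightly differently.

```python
import math
from statistics import mean, median, quantiles

# Hypothetical length-of-stay data in days: mostly 0-3, with two long stays.
los_days = [0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 1, 0, 2, 3, 1, 2, 2, 1, 30, 180]

# Option 2: logarithmic transformation (natural log of days + 1).
log_los = [math.log(days + 1) for days in los_days]

# Option 4: describe the untouched variable with median and interquartile range.
q1, q2, q3 = quantiles(los_days, n=4)

print("Mean LOS (pulled up by the long stays):", mean(los_days))
print("Median LOS:", median(los_days))
print("Interquartile range:", q1, "to", q3)
print("Largest value before and after the log:", max(los_days), round(max(log_los), 2))
```

The mean is dragged far above the typical stay while the median is not, which is the practical reason for reporting medians and interquartile ranges with skewed outcomes.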

Click on the Statistics button to learn more.

Statistics
Published on

September 22, 2014

Statistical tests

Hypothesis Testing Research Question Statistics

Statistical tests are used to answer research questions

It's not about the statistics, it's about the question.

In my experience, statistics is a cognitive dissonance-inducing mathematical science, and no one tends to recall their personal and professional statistical experiences with much zeal. It's as if there is an automatic recoil when the topic of statistics enters the discussion and planning of a research study. The literature has posited that statistics are intimidating and nebulous because many people do not possess the necessary competencies and experience with statistics and do not understand the lexicon of the science.

The most important thing to remember about applied statistics, despite their prevalence, relevance, and utility in everyday life, is that they are tools that human beings use to communicate the results of data analysis. Hypothesis testing is employed in empirical research so that researchers can present their findings in a relative context that is interpretable and applicable in other research and applied environments.

Statistics are useful ONLY when they are used to answer useful, appropriate, answerable, relevant, and valid research questions that are grounded in the empirical literature.
Published on

September 20, 2014

The research question is the foundation of everything empirical

FINER PICO Research Question

Foundation for measurement, design, power, and statistics

80% of preliminary study planning should be given to the research question

As a biostatistical consultant at an academic regional medical campus, I am supposed to spend 80% of my time working with residents, fellows, faculty, clinicians, researchers, nurses, pharmacists, and hospital staff to formulate and refine their research question. THAT is how important it is to any research study.

A research question is cultivated through researchers' efforts to know the existing literature, their clinical expertise and interests, their collaboration with peers, and their intrinsic motivation towards scientific discovery and innovation. Answerable, appropriate, meaningful, and purposeful research questions make valid and needed contributions to the literature.

Deductive reasoning should be used when formulating a research question. Oftentimes, researchers will want to answer EVERY possible question and collect data on EVERY single variable that they can in hopes of finding SOMETHING SIGNIFICANT. This is not the way that REAL science works. A focused and refined research question is the basis for constructing and executing research. This does not mean that researchers cannot ask secondary, tertiary, and ancillary research questions as demographic, clinical, and confounding variables are yielded from literature reviews! Of course, these are important questions to ask and often lead to great discoveries! (Example: Viagra) However, having ONE research question that serves as the foundation for a study is extremely important and should not be overlooked!

Many novice researchers will plan an entire study around a type of research design or a statistic that they read in an article. REMEMBER, research designs and statistical tests are chosen to answer research questions, NOT the inverse.

All of this being said, there are two existing frameworks that greatly assist in formulating (FINER) and refining (PICO) research questions. FINER stands for feasible, interesting, novel, ethical, and relevant. PICO stands for population, intervention, comparator, and outcome.
Published on

September 19, 2014

Prevalence vs. Incidence

Cohort Cross-sectional Incidence Odds Ratio With 95% CI Prevalence Prospective Cohort Relative Risk

Prevalence and incidence used correctly

Difference in important epidemiological measures

The terms prevalence and incidence are often used interchangeably. However, they are extremely different in their utility and interpretability within epidemiology.

Prevalence is the proportion of cases or disease states that exist in a population at any given time. Prevalence is established using cross-sectional research designs. Measures of prevalence can be used to generate odds ratios for outcomes occurring given an exposure or non-exposure. It is calculated when data is collected in a retrospective fashion.

Incidence is the number of new cases or disease states that occur in a population over a specified period of time. Incidence is established in cohort designs. Measures of incidence are used to establish the relative risk of an outcome given treatment or no treatment. It is calculated when data is collected in a prospective fashion.
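Both measures come from the same kind of 2x2 table, so a small Python sketch may help keep them apart. The counts are hypothetical, the function names are mine, and the 95% confidence intervals use the standard large-sample formulas on the log scale, which assume no empty cells.

```python
import math
from statistics import NormalDist


def odds_ratio(a, b, c, d, level=0.95):
    """Odds ratio with CI. a, b = exposed with/without outcome; c, d = unexposed."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


def relative_risk(a, b, c, d, level=0.95):
    """Relative risk with CI, for new cases counted in a prospective cohort."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return estimate, estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
a, b, c, d = 20, 80, 10, 90

proportion_with_outcome = (a + c) / (a + b + c + d)
print("Proportion with the outcome:", proportion_with_outcome)
print("Odds ratio (estimate, lower, upper):", odds_ratio(a, b, c, d))
print("Relative risk (estimate, lower, upper):", relative_risk(a, b, c, d))
```

The same counts give a larger odds ratio than relative risk, so it matters which one the design supports: odds ratios for cross-sectional or retrospective data, relative risk when new cases are followed prospectively.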

Click on the Epidemiology button below to continue.

Epidemiology