Tags

  • Published on

    Research Engineer makes applied research and statistics easier

    Research Engineer is designed to get you to the correct research question, research design, sample size, database, and statistical test

    Based on your answers to the questions presented, you will get to the right place

    A few words on what I'm doing here. I am a biostatistician, methodologist, psychometrician, and counselor. Every day, the incredibly intelligent people I work with, including physicians, residents, fellows, staff, and faculty, feel anxiety when it comes to statistics and research. Research has shown that statistics can induce cognitive dissonance in an individual due to limited experiences and competencies. The collective unconscious has sequestered statistics and research into a dark corner and that's scary.

    Research and statistics are the methods by which we, as scientists, analyze, synthesize, and evaluate our research findings in a manner that can be generalized to the appropriate audience. If our methods for communicating research findings cause cognitive dissonance, just because they relate to research and statistics, then how can one ever really be able to generalize the clinical literature and integrate it into clinical practice?

    After seven years of being the one to induce cognitive dissonance in others related to research and statistics, I decided to make a useful tool for students and researchers that could alleviate some of the feelings of anxiety associated with research and statistics. I built Research Engineer.

    Research Engineer is designed to get you to the correct research question, research design, sample size, database, statistical test, evidence-based medicine intervention, diagnostic calculation, epidemiological calculation, variables, surveys, psychometrics, and educational framework to answer your current question (and future questions). 

    I am trying to bring research and statistics out of the collective unconscious and into the conscious mind where it can be effectively communicated among researchers, scientists, and students by creating this decision engine. It is easy to get to the correct research or statistical component: just answer the questions that I present to you in the webpages and click on the buttons with your answer in them. Also, the step-by-step methods for conducting and interpreting each statistical test in SPSS are presented on their respective webpages.

    You can also contact me via phone, social media, and email at any time if you have questions. If you need some help conducting statistics for a research project, I have eight years of experience across thousands of individual projects and I would love to help you on your study. We can negotiate prices if you are an undergraduate or graduate researcher.

    In conclusion, Research Engineer makes choosing research methods and statistical tests MUCH EASIER. Just answer the questions embedded in the various decision engines and get to the correct method or test, EVERY TIME.

    Thanks for your continued support, dear friends and colleagues. And many thanks and salutations to the individuals that use Research Engineer. I am honored and humbled to have this great opportunity to create a very useful and unique website. You all are the ones that make it shine!

    Sincerely,

    R. Eric Heidel, Ph.D.
    Assistant Professor of Biostatistics
    Affiliate Professor of Biomedical Engineering
    Department of Surgery
    Office of Medical Education, Research, and Development
    University of Tennessee Graduate School of Medicine
    Owner and Operator, Scale, LLC
  • Published on

    Sensitivity and specificity

    Diagnostic testing

    Detecting disease versus identifying the healthy

    Sensitivity and specificity are important diagnostic measures that provide evidence of a diagnostic test's ability to either "detect disease" or "identify the healthy," depending upon the clinical context.

    In order to conduct diagnostic testing, the results of the diagnostic test of interest have to be compared to the results of an existing "gold standard" method of diagnosis in a defined population. The results of both the diagnostic test and the "gold standard" have to be quantified as dichotomous categorical variables (positive or negative, "+" or "-").

    Sensitivity is the ability of a diagnostic test to detect disease. It is the percentage of people who are positive or "+" on the "gold standard" that also tested positive or "+" with the diagnostic test. A diagnostic test with high sensitivity is good at picking up cases of a given disease state. Because such a test misses few cases, a negative result on it is also able to "rule out" disease states.

    Specificity is the ability of a diagnostic test to identify the healthy. It is the percentage of people who are negative or "-" on the "gold standard" that also tested negative or "-" with the diagnostic test. A diagnostic test with high specificity is good at detecting cases that do not require more intensive treatment. Because such a test produces few false positives, a positive result on it is also able to "rule in" disease states.
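
    As a minimal sketch of the two definitions above, sensitivity and specificity can be computed from the four cells of a 2x2 table that cross-tabulates the diagnostic test against the "gold standard." The function name and the counts below are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from 2x2 counts.

    tp: test "+", gold standard "+"    fn: test "-", gold standard "+"
    fp: test "+", gold standard "-"    tn: test "-", gold standard "-"
    """
    diseased = tp + fn  # everyone the gold standard calls "+"
    healthy = fp + tn   # everyone the gold standard calls "-"
    if diseased == 0 or healthy == 0:
        raise ValueError("need at least one gold-standard '+' and one '-'")
    return tp / diseased, tn / healthy


# Hypothetical study: 100 diseased and 200 healthy people by the gold standard.
sens, spec = sensitivity_specificity(tp=90, fn=10, fp=30, tn=170)
print(f"Sensitivity: {sens:.1%}")  # 90.0%
print(f"Specificity: {spec:.1%}")  # 85.0%
```

    Note that each measure uses its own denominator: sensitivity is calculated only among the diseased, and specificity only among the healthy.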

    It is optimal to have a diagnostic test that can both detect disease (sensitivity) AND identify the healthy (specificity). For a given test, however, there is a trade-off between sensitivity and specificity. Moving the cutoff for a positive result so that sensitivity goes up will generally make specificity go down, and higher specificity will generally lead to lower sensitivity. A well-accepted criterion for a balanced diagnostic test is 80% for both sensitivity and specificity. However, given the clinical context, a certain type of diagnostic test with either higher sensitivity or specificity may be warranted.

    If the diagnostic test results are measured along a numerical continuum, then receiver operating characteristic (ROC) curves can be plotted to find the cutoff value that best balances sensitivity and specificity. ROC curves can also be used to compare the diagnostic efficacy of several tests concurrently by comparing the area under the curve (AUC) of each test.
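
    The sketch below illustrates the idea with plain Python. It assumes that higher scores indicate disease, and it uses Youden's J (sensitivity + specificity - 1) as the balancing criterion, which is one common choice rather than the only one. The function names and the nine scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for every candidate cutoff.

    A subject is called test-positive when score >= cutoff.
    labels use 1 for gold-standard "+" and 0 for gold-standard "-".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def best_cutoff(points):
    """Pick the cutoff that maximizes Youden's J = sens + spec - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """AUC as the chance that a random diseased subject outscores a random
    healthy subject (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical continuous test results and gold-standard diagnoses.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

best = best_cutoff(roc_points(scores, labels))
area = auc(scores, labels)
print(f"Best cutoff: {best[0]} (sens {best[1]:.0%}, spec {best[2]:.0%})")
print(f"AUC: {area:.2f}")
```

    To compare several tests, the same `auc` function could be applied to each test's scores against the same gold-standard labels.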
        
  • Published on

    Categorical measurement caveats

    Effects of categorical measurement

    Decrease statistical power and increase sample size

    Categorical variables are very prevalent in medicine. Measures like presence of comorbidities, mortality, and test results are categorical in nature. Here are some general caveats associated with categorical measurement and sample size:  

    1. Compared with continuous outcomes, categorical outcomes will DECREASE statistical power and INCREASE the needed sample size. This is due to the lack of precision and accuracy in categorical measurement.

    2. The underlying algebra associated with calculating 95% confidence intervals of odds ratios and relative risk depends directly on the number of observations in each cell of the cross-tabulation table. With smaller sample sizes, by default, wider and less precise 95% confidence intervals will be found. If one of the cells of a cross-tabulation table has far fewer observations than the other cells, then the 95% confidence interval will be wider and potentially not truly interpretable. A 95% confidence interval will become narrower or more precise only with larger sample sizes.

    3. When using categorical variables for diagnostic testing purposes, larger sample sizes will be needed to calculate precise measures of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). With smaller sample sizes in diagnostic studies, a change in one or two observations can have drastic effects on the diagnostic values.

    This is especially true when there is a subjective rating used for purposes of diagnosing someone as "positive" or "negative" for a given disease state (for example, a radiologist reading an X-ray). Inter-rater reliability coefficients such as Kappa or ICC should be employed to ensure consistency and reliability among subsequent ratings and raters. Sensitivity, specificity, and PPV will be affected by inter-rater reliability. Receiver operating characteristic (ROC) curves can be used to find the value where the sensitivity and specificity of a test are best balanced. ROC curves can also be used to compare the area under the curve (AUC) between several diagnostic tests at the same time so that the best can be chosen.

    4. For each categorical predictor parameter (or variable) that you want to include in a multivariate model, you have to increase your sample size by at least 20-40 observations of the outcome. This is due to the limited precision, accuracy, and statistical power associated with categorical measurement. Researchers HAVE to collect more observations in order to detect any potential significant multivariate associations.

    In the case that a polychotomous variable is to be used in a model, create (a-1) dichotomous variables, where a is the number of categories, with "0" as not being that category and "1" as being that category. The remaining category serves as the reference group. For each level, 20-40 more observations of the outcome will be needed to have enough statistical power to detect differences amongst the multiple groups.
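
    The (a-1) coding described above can be sketched as follows. The function name, the smoking-status variable, and the choice of "never" as the reference category are hypothetical and only for illustration. The last lines apply the 20-40 observations per level rule of thumb from caveat 4.

```python
def dummy_code(values, reference):
    """Encode a variable with a categories as (a-1) 0/1 indicator variables.

    The reference category gets no indicator of its own: it is the row
    where every indicator equals 0.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in the data")
    levels = [c for c in categories if c != reference]
    rows = [{f"is_{c}": int(v == c) for c in levels} for v in values]
    return levels, rows


# Hypothetical three-category predictor (a = 3, so two dummy variables).
smoking = ["never", "former", "current", "never", "current"]
levels, rows = dummy_code(smoking, reference="never")

for value, row in zip(smoking, rows):
    print(value, row)

# Rule of thumb from the text: 20-40 more outcome observations per level.
extra_low, extra_high = 20 * len(levels), 40 * len(levels)
print(f"Plan for {extra_low}-{extra_high} additional observations of the outcome")
```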
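
    Caveat 2 above ties the width of a 95% confidence interval to the sample size. A minimal sketch of that relationship, assuming the common log-based (Woolf) interval for an odds ratio, is shown here. The function name and the cell counts are hypothetical. Both tables have the same odds ratio, but the second has ten times as many observations in every cell.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (Woolf method).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # The standard error of log(OR) depends only on the four cell counts,
    # so the smallest cell dominates the width of the interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


small = odds_ratio_ci(10, 20, 5, 25)       # n = 60
large = odds_ratio_ci(100, 200, 50, 250)   # n = 600

for name, (est, lo, hi) in (("n = 60", small), ("n = 600", large)):
    print(f"{name}: OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

    With these hypothetical counts, the interval from the smaller table crosses 1.0 while the interval from the larger table does not, even though the point estimate is identical.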
  • Published on

    Research Engineer is the world's first online decision tree for applied research and statistics

    Fully automated and freely accessible to researchers around the world

    The first interactive decision tree that integrates statistical assumptions and post hoc analyses

    Research Engineer is going to be presented for the first time in a public forum next Tuesday. I'm pretty excited to let all of my colleagues know what I've been up to these past five months. I realized earlier today that Research Engineer has completely changed my life for the better. And I'm so thankful to all of those that have supported me along the way.

    And to visitors of this website, I extend my most gracious and humble thanks for your patronage. The website will continue to grow and help you in all of your future empirical endeavors.

    I have built the world's first online decision engine for research questions, research designs, statistics, statistical power, databases, evidence-based medicine, survey design, psychometrics, epidemiology, diagnostic testing, variables, and education. I look forward to the future!
  • Published on

    Positive Predictive Value and Prevalence

    Positive Predictive Value (PPV) and Prevalence

    Increased prevalence of an outcome will lead to more cases being "picked up"

    Positive predictive value (PPV) is the likelihood that a person with a "+" on a diagnostic test actually has the disease state as it is detected using a "gold standard." Another way of defining PPV is how believable a "+" test result is in a given population.
     
    As the prevalence of a disease state in a given population increases, the positive predictive value of a test will increase. This is simply due to the fact that there are more cases or disease states that can be detected, so true positives make up a larger share of all "+" results.

    If you are working with a rare outcome in a given population, be aware that less prevalent outcomes increase the proportion of "+" results that are false positives. By definition, lower prevalence dictates that there are not many true positives or "sick" people in a given population. With so few actual cases and so many healthy people being tested, the inherent measurement error associated with diagnostic testing will yield more false positives relative to true positives.

    So, in conclusion, it is very important to know the baseline prevalence of your outcome or disease state in your population of interest when assessing diagnostic tests. Higher prevalence leads to increased PPV, and lower prevalence leads to a larger share of false positives among "+" results.
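
    The relationship between prevalence and PPV can be sketched with Bayes' theorem. The function name is hypothetical, and the test characteristics (90% sensitivity and 90% specificity) are assumed values chosen only for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: true positives / all positive results."""
    true_pos = sensitivity * prevalence                # sick and test "+"
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but test "+"
    return true_pos / (true_pos + false_pos)


# Same hypothetical test applied in populations with different prevalence.
for prevalence in (0.01, 0.10, 0.50):
    value = ppv(0.90, 0.90, prevalence)
    print(f"Prevalence {prevalence:.0%}: PPV = {value:.1%}")
# Prevalence 1%: PPV = 8.3%
# Prevalence 10%: PPV = 50.0%
# Prevalence 50%: PPV = 90.0%
```

    With the test held constant, a "+" result is far less believable when the outcome is rare, because false positives outnumber true positives.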