1. When working with smaller sample sizes, adequate statistical power (and therefore the ability to reach statistical significance) is VERY hard to achieve.
2. Categorical or ordinal outcomes offer limited precision and accuracy, which can further decrease statistical power.
3. When measuring small effect sizes, small samples cannot provide enough variance in the outcome to detect small but clinically meaningful effects. This REALLY decreases your statistical power, since inferential statistics depend on variance in the mathematical sense.
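The first and third points can be sketched with a quick simulation. The design below is hypothetical (a two-arm comparison with 10 subjects per group and a small true effect of 0.3 standard deviations, numbers chosen for illustration): even though a real effect exists, the t-test detects it only a small fraction of the time.

```python
import math
import random

def t_stat(a, b):
    # Welch-style two-sample t statistic
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def estimated_power(n, effect, trials=5000, t_crit=2.101):
    # t_crit = 2.101 is the two-sided alpha = .05 cutoff for df = 18,
    # which matches n = 10 per group.
    random.seed(1)
    hits = 0
    for _ in range(trials):
        control = [random.gauss(0.0, 1.0) for _ in range(n)]
        treated = [random.gauss(effect, 1.0) for _ in range(n)]
        if abs(t_stat(control, treated)) > t_crit:
            hits += 1
    return hits / trials

# A true but small effect (d = 0.3) with only 10 subjects per group:
power = estimated_power(n=10, effect=0.3)
print(f"estimated power: {power:.2f}")  # far below the conventional .80 target
```

In other words, most runs of this underpowered study would come back "not significant" despite a genuine effect, which is exactly the Type II error scenario discussed below.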
With this in mind, remember to interpret the p-values yielded by RCT-level studies with small sample sizes in the context of the points above. If a treatment effect does not reach statistical significance but appears CLINICALLY SIGNIFICANT, with a p-value approaching significance (suggesting a possible Type II error), then perhaps more credence can be given to the effect.
If researchers run bivariate tests on 30 different outcomes with fewer than 20 observations and claim significance without a Bonferroni adjustment, throw the article out.
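Why that rule of thumb is fair can be shown by simulation. The sketch below assumes a hypothetical study with 30 independent outcomes, 10 observations per group, and NO real effects anywhere; it estimates how often at least one test still comes up "significant" at p < .05 when no multiplicity adjustment is made.

```python
import math
import random

def t_stat(a, b):
    # Welch-style two-sample t statistic
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

random.seed(2)
T_CRIT = 2.101   # two-sided alpha = .05 cutoff for df = 18 (n = 10 per group)
N_TESTS = 30     # number of outcomes tested per "article"
experiments = 2000
false_alarms = 0
for _ in range(experiments):
    # All 30 outcomes are pure noise: no true treatment effect exists.
    any_significant = any(
        abs(t_stat([random.gauss(0, 1) for _ in range(10)],
                   [random.gauss(0, 1) for _ in range(10)])) > T_CRIT
        for _ in range(N_TESTS)
    )
    if any_significant:
        false_alarms += 1

rate = false_alarms / experiments
print(f"chance of >=1 'significant' finding with zero real effects: {rate:.2f}")
```

With 30 unadjusted tests the family-wise error rate is roughly 1 - 0.95^30, i.e. around 78 percent, so a lone "significant" outcome in such a battery is close to meaningless. A Bonferroni adjustment would instead require p < .05/30 per test, which brings the family-wise rate back down near 5 percent.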