Chapter XXIX: Methodology | The Philosophy Of Science by Steven Gussman [1st Edition]

        “The beauty of doing nothing is that you can do it perfectly.  Only when you do something is it

        almost impossible to do it without mistakes.  Therefore people who are contributing nothing to

        society, except their constant criticisms, can feel both intellectually and morally superior.”

        – Thomas SowellI


        “Easier said than done.”
        – Anonymous


        Methodology is a very large topic (many of the details of which are field-specific) and I will again treat it in a general, foundational manner, here.  The biggest problem with methodology is that, among those specialized experts who understand particular methodologies, one may be mired in the parochial details of one's own field, missing the epistemological forest for the methodological trees.  The philosophy of science is the most general of all scientific topics because it is the underlying foundational support for all the rest of the specific sciences;II epistemology is the half of the philosophy which describes what constitutes a legitimate argument, reasoning, or evidence.  Because we can apply reason to itself (recursive reasoning) and because we can apply induction to itself (recursive induction), we can actually test such rules empirically the same way we would test the body of knowledge produced by the use of such rules (indeed, each time a scientific theory of ontology is empirically confirmed, so too is the epistemology used to generate it).III  The fact that empirical testing works is one of the most foundational axioms, but I believe that it is not just an assumption but a self-supporting foundation—the only legal self-supporting foundation, due to the nature of its claim.  Such a special ability of self-establishment (one can induce that induction works) ensures that induction is the only legitimate starting axiom for a philosophy.IV  The biggest problem among experts in methodology comes when, all too often, they do not actually understand the epistemology that underlies their specific methodology.  Most such experts probably don't even know the word!  Such parochial experts then, in the worst case, think that science is chiefly the body of knowledge (which is always in flux).V  In a slightly more enlightened scenario, the specialist believes that science is most importantly a methodology.  But only the great scientists appear to realize that science is really most importantly the epistemology; one's methodology and the body of knowledge uncovered by it are all downstream from this foundation.  As a result, the methodology, and especially the provisional body of knowledge describing the ontology, can only be of equal or lower quality than the epistemology employed!  Recall the complementary computer science concepts of an interface and an implementation.VI  The interface is essentially a promise: a pure philosophical argument that outlines no means of following through on it.  Because it has no actionable content for how to make good on its promise, it is a perfect Platonic ideal.  One can easily imagine the following interface:VII

bool IsPrime(int n);

which promises to report whether some supplied number, n, is prime.  Because it is merely a signature with no body-content, one can't actually yet make good on that promise in the real world—there is no supplied computation to run.  The details of the implementation exist elsewhere, and different thinkers might go about pulling the task off differently (and potentially with differing accuracy, precision, and resource-efficiency in their results).  Implementation is difficult: it is where one actually has to engineer a way to meet the ideals enshrined in the interface, where the promise meets the harsh reality of the world of stones (not to mention human fallibility).  Here, our partial ignorance leaves us with a situation where we may introduce our own errors and biases into the implementation, but on the plus side, what we trade away in perfection, we gain in actuality: an implementation may actually be carried out and used.VIII  In the case of checking whether a number is prime, it is a fairly tractable problem for relatively small numbers (for small n)—the brute-force algorithm would be to look sequentially through each number up to n, checking to make sure that the only integer-factors of n are 1 and n (the definition of a prime):

    bool IsPrime(int n){
            /* Numbers below two (including negatives) are not prime */
            if(n < 2)
                    return false;

            for(int i = 2; i < n; i++){
                    /* If n is evenly divisible by some i
                     * belonging to [2, n - 1], it is not prime */
                    if(n % i == 0)
                            return false;
            }

            /* If you make it this far, n was not evenly divisible
             * by any i belonging to [2, n - 1] and so is prime */
            return true;
    }

This would in-principle work for any n, but the reason it only works for relatively small values of n in-practice is because one needs to directly check n - 2 ≈ n potential factors against the number n to see if they are or are not integer-factors of n (in other words, to check if any of these numbers besides 1 and n “goes into n evenly” the way five goes into ten two times with no remainder)—each of which must be thought of as a computer operation which takes time for the CPU to execute.IX  This means the algorithm requires about n operations—computer scientists, worried largely about the order-of-magnitude (or the exponent), would label this algorithm as O(n) and rate its efficiency as poor-to-mild.  This in turn means that the amount of time required for the computer to give a response is proportional to n as well: while increases in the speed of computer hardware (faster processors) will allow the algorithm to be run for higher n than on the slower machines of the past, there will always be infinitely many numbers for which checking their prime status would take longer than the present age of the universe, no matter the speed of one's hardware (which suffers its own physical constraints).  The problem is not the hardware, it is the algorithm.  This brute deduction (simply empirically testing each number for prime status by checking all of its potential factors) is not the kind of simple deductive law that exploits the deep insight that mathematicians and scientists want; the dream is that there may exist some calculation which takes, say, only a fixed number of operations regardless of the value of n (which computer scientists would consider to be O(c) operations—the best efficiency).X  As it turns out, mathematicians still do not have a simple deductive formula for proving whether an arbitrarily large number is or isn't prime: perhaps you will solve this ancient problem.  Returning to the philosophy of science, it is clear that, despite how hard-won even the scientific method has been (representing many discoveries over millennia), actually implementing this epistemological interface as a methodology may prove difficult in-practice: in other words, things are easier said than done.
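
        To make the hardware-versus-algorithm point concrete, consider one well-known refinement (a sketch of my own, beyond the brute-force version above): any factor of n larger than √n must pair with a factor smaller than √n, so it suffices to test candidate factors only up to √n, dropping the work from roughly n operations to roughly √n operations (a gain that no faster processor could ever match):

    /* Sketch of a √n refinement of IsPrime: any factor larger than √n
     * must pair with a factor smaller than √n, so we only test up to √n. */
    bool IsPrimeFaster(int n){
            if(n < 2)
                    return false;
            if(n % 2 == 0)
                    return n == 2;        /* two is the only even prime */
            for(int i = 3; (long)i * i <= n; i += 2){
                    if(n % i == 0)
                            return false; /* found a factor, so not prime */
            }
            return true;                  /* no factor up to √n, so prime */
    }

Even this remains a far cry from the dreamt-of O(c) formula, but it illustrates that the leverage lies in the algorithm, not the machine.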

        Indeed, one should view a particular methodology (which may be specific to a field or even to a given paper) as an imperfect implementation of the epistemological interface.  The reason I say, “an,” instead of, “the,” implementation is that epistemology is a large and beautiful set, and there are many ways in which one can imperfectly implement this single epistemology (some of which will be more practical, or even possible, in certain fields or for certain research questions and not others).  All good scientific work will have been done by a researcher who has done their best to follow the epistemological laws that this book has laid out, but such researchers won't necessarily follow them in exactly the same way.  For example, sometimes one can use experiment (particularly in the lower-level sciences such as physics and chemistry where materials are available, nature is most regular, and ethics are largely not a concern) and other times (particularly in the higher-level life sciences such as psychology and sociology, which are resource intensive and fraught with ethical concerns), one may need to settle for passive observation.XI  Not all methodologies will be equally valid (they will not all live up to approximating the epistemology equally well) and so it is likely that more peer-review comments consist of airing out gripes about methodological failures than about the interpretation of empirical results (although these are intimately entwined concerns).  Due to the practical tractability of certain physics problems, the ingenuity of the engineers involved, and the historical economic support behind the field, physicists have been able to establish the five-sigma standard—a p-value equivalent to 0.0000006.XII  A p-value is a controversial metric which is often misunderstood as being the probability that an empirical result was merely due to chance rather than the hypothetical mechanism being tested against; in truth, it is a bit more complicated than that: assuming the null hypothesis is right (that your hypothesized mechanism is wrong), the p-value estimates the probability of getting an empirical finding as-or-more extreme than yours, anyway—it is a measure of the consistency between the null hypothesis and the observed effect.XIII

〰〰

        We will again use the example of my informal survey about “severe” side-effects to the covid-19 vaccine, because p-values are related to confidence intervals, as we will see.  First, assuming that such variables follow a normal distribution (bell curve), we calculate a z-score, which is simply the number of standard errors that the mean we found lies away from the null hypothesis:XIV

z ≈ (μObserved – μNull-Hypothesis) / σx

in which both μ's are mean values and σx is the standard error.  In the best-case scenario, I would have had a control group taking a placebo so that I could measure a baseline “serious” side-effect rate (or at least sought a base-rate for such events as vomiting from some other source) which would be used as μNull-Hypothesis; I did not, and so we will make the conservative estimate that this base-rate should be approximately zero percent.  Of course the real value will not be zero, and this is why ours is a conservative estimate (against the safety of the vaccine): all “serious” effects will be assumed to be caused by the intervention (with real base-rates not subtracted out, because we are treating them as negligible).XV  Given this assumption, we can approximate a z-score:

z ≈ (0.054 – 0) / 0.01988 ≈ 2.72

meaning that the mean we found (5.4%) is 2.72 standard errors away from our null hypothesis of zero.  The normal distribution's cumulative distribution function would reveal for us the p-value associated with this z-score; most researchers, not being statisticians, rely on pre-calculated table-look-up to find the p-value. In such tables, the y-axis represents the two most-significant-digits of the z-score, whereas the x-axis represents the least-significant-digit of the z-score.XVI


So we look down the left-most column for “2.7”, and then follow this row right-ward towards the column labeled “0.02”, which returns for us:

p = 0.0033

Again, this estimates that, assuming the null-hypothesis is true (that no one experienced a “serious” event such as vomiting as a result of the vaccine), there is only a ~0.3% likelihood of finding that 5.4% (or more) of people vomited from some other cause(s).  This dovetails nicely with our previous finding that the null-value did not fit within our 95% confidence interval, which implied that our result of 5.4% was significant at at least p < 0.05 (in fact, it is an order-of-magnitude better than that).XVII  We can even begin to unify our understanding of confidence intervals and p-values: recall that when calculating the error bars, the formula we used was:XVIII

μObserved ±1.96σx

where μObserved was our measured mean and σx was the standard error.  What was this mysterious 1.96 figure?  It turns out that it was the z-value associated with the chunk of a normal-distribution which contains 95% of the values (or equivalently, contains 95% of the area under the bell-curve).  We can find that z-value by working backwards from the required p-value: if we want a 95% confidence interval, then we must look in the body of the table for the 0.05 / 2 = 0.025 value (these are divided by two because we want the values on either side of the mean, such that they sum up to a 95% confidence interval).XIX  When we do so, we find that it corresponds to the “-1.9” row and the “0.06” column for an absolute z-value of 1.96.
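
        For readers who would rather compute than consult a table, the look-up can be reproduced directly from the standard normal cumulative distribution function.  The sketch below is my own illustration (not how any particular statistics package does it), using the well-known Abramowitz and Stegun polynomial approximation to the normal tail, accurate to roughly seven decimal places; it recovers both the p ≈ 0.0033 for our z ≈ 2.72 and the 0.025 tail at z = 1.96:

    using System;

    class TailProbability{
            /* Upper-tail probability P(Z >= z) of the standard normal
             * distribution, via the Abramowitz & Stegun polynomial
             * approximation (error on the order of 1e-7). */
            static double UpperTail(double z){
                    double x = Math.Abs(z);
                    double t = 1.0 / (1.0 + 0.2316419 * x);
                    double pdf = Math.Exp(-x * x / 2.0) / Math.Sqrt(2.0 * Math.PI);
                    double poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                                + t * (-1.821255978 + t * 1.330274429))));
                    double q = pdf * poly;           /* P(Z >= |z|) */
                    return z >= 0.0 ? q : 1.0 - q;
            }

            static void Main(){
                    double z = (0.054 - 0.0) / 0.01988;                      /* ~2.72 */
                    Console.WriteLine("p = " + UpperTail(z));                /* ~0.0033 */
                    Console.WriteLine("tail at 1.96 = " + UpperTail(1.96));  /* ~0.025 */
            }
    }

(Doubling the tail at z = 5, to count both sides of the distribution, gives roughly 0.0000006, the five-sigma figure quoted elsewhere in this chapter.)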

〰〰

        There is no objective threshold for a p-value beyond which we can 100% reject the null-hypothesis (you will never find p to be absolute zero), and so fields have instead developed flawed conventions (there are dueling hazards with setting a fixed significance threshold: too conservative and we will reject many real discoveries; too lenient and we will accept many falsehoods).XX  The low-level field of physics has adopted the five-sigma rule: they accept only z ≥ 5 (p ≤ 0.0000006); meanwhile, many of the higher-level life-and-social sciences have contented themselves with a meager z ≥ 1.96 (p ≤ 0.05).XXI  In the wake of social psychology's replication crisis, some authors have argued for lowering the significance threshold (known as alpha, α) by an order-of-magnitude, from 0.05 to 0.005.XXII  Others argue that we should simply print the associated p-value, confidence interval(s), and raw data set such that readers of research can make up their own minds about the strength of the evidence.XXIII
 One of the reasons for the discrepancy in rigor is that one may relatively easily get a very large sample size of particles and trials in physics, but doing so when the subjects are people or even societies is far more resource-intensive.  Another issue is that relatively forgiving p-values allow for p-hacking; that is, taking the raw data from one's experiment and coming up with all sorts of new ways to interpret the results until one of them meets your field's p-value threshold of significance (which may be done when a study's original intent led to a negative or null result).XXIV  One will notice that the majority of sources will mention that one's significance threshold (α) must be decided before performing the statistical test.  I would argue, then, that at least as much a problem as lenient p-value thresholds is the lack of scientific ethic on the part of the p-hacker, who is either too dim to know better, or otherwise intellectually dishonest in his pursuits.  Perhaps the most powerful attempt to combat post-hoc research practices is the process of pre-registering a study, which means that journals have to agree to publish your results ahead of time (precluding publication bias against null and negative results), and locks-in one's entire methodology before the research is carried out such that no post-hoc statistical tricks can be played.XXV

        Even confidence intervals have an arbitrary component: as we have seen, the general use of 95% confidence is equivalent to the choice of p = α = 0.05 (the added benefit is that it provides error bars around the value found, to let one know what range of values the evidence is consistent with).XXVI  While still flawed, this at least gives one a sense for how different the value is likely to be if the mean-measured value does indeed turn out to be imprecise (or worse, inaccurate); if the 95% confidence interval has a very small width, one is confident of the singular reported value.  Several of these methodological measures contribute to whether one has a high- or low-powered study.  If a methodology is merely a particular implementation (that might be better or worse than some other) of the epistemology, it is itself far more vulnerable to scrutiny.  In this sense, “the scientific method,” is a bit of a misnomer (though I keep the term for tradition) because the scientific method is the epistemology of science, not a given methodology trying to approximate it.

        The details of methodology are one's experimental (or otherwise observational) procedure: how one performed one's research.  The “methodology” section typically precedes the “results” section in a scientific paper, after the “abstract” (a brief synopsis of the paper and its results), as an implicit recognition that a field's body of knowledge (the results of its studies) is limited in quality by the quality of the methodology used.  What I want to emphasize here, and what has been generally ignored in recent scientific history, is that this methodology is in turn only as good as the epistemology behind it.  Since most researchers never explicitly learn about the philosophy of science, they cannot have learned the foundations for how to develop and evaluate methodologies outside of the rote norms their professors and research-journals taught them.  Another side-effect of their science education starting and stopping at methodology is that it leaves students without an understanding of the different standards of different fields, which in turn leaves them innocent of the consilient big-picture world-view of science as a whole.  Despite sharing the epistemology of the scientific method, different methodologies are used even in fields as related as astrophysics (largely passive observation, exploiting the consequence of the finite speed of light, that to look further into space is to look backward in time, to infer the dynamics of massive objects) and particle physics (where experiment is both possible and largely necessary because many types of particles are not long-lived enough for their isolated behaviors to be observed otherwise).XXVII  As another example, the methodologies in sociology (where one is lucky if the studies are particularly empirical at all) and biology (where everything from passive observation of nature in field-work to laboratory experiment is in play) are often quite different (though there is some degree of overlap between almost all fields due to the shared epistemology), often resulting in broad tension between these fields that ought to be in a state of tight consilience.XXVIII  Yet if some standard of evidence confers some level of certainty in one of these fields, it ought to in-principle confer the same amount in another; the problem is that a physicist may be left with such a high standard of evidence that they could not justify half of what they believe in their everyday life, whereas a sociologist might end up with such a low standard of evidence that they may even be a self-proclaimed relativist (in utter opposition to the philosophy of science).  The truth is that we may have to lower the standard-of-evidence at the lower-levels of science and increase it in the higher-level sciences (the latter is far more important, as per a conservative estimate biasing in favor of certainty).
These different methodological cultures, in the absence of an education in the underlying epistemological framework that ties them together, further exacerbate the problem of over-specialization by producing people not only unknowledgeable outside of their parochial sub-field, but who are likely to be wrongfully skeptical of any claim they might hear reported from another field simply because they are not familiar with the methodological tools at play.XXIX  A balance must be struck between defeatism in which we do not even try to implement the scientific method in the higher-level sciences, and grading such fields on a curve in which low standards of evidence are considered to have conferred outsize certainty about claims in such fields.  At the end of the day, consistency dictates that one should take a middle-ground approach to one's methodological standards across all fields.  Many of us understand that, as difficult as normalization can be with this problem (what is a “small” or “large” level of certainty?), we must become comfortable with assessing the evidence on a per-claim basis, and try to match our level of certainty to the level of evidence.

        I have noticed over the years that medical doctors and sociologists sometimes like to wax poetic about the mysterious complexity of their corner of nature.XXX  These types are sometimes brazen enough to insinuate that science is just another modern myth.  The reason, it seems to me, is that their fields have complexity issues (this is similar to the projection of the social psychologist, who will say that science is experiencing a replication crisis, rather than their particular sub-field).  Meanwhile, Weinberg has made the strange criticism that passive observation is seldom useful, that only experiment is legitimate (again, biased by the standards and details of his own field; this is surely an example of a double standard as it is not as if Weinberg walked around without believing anything based on non-experimental observation, including from other scientific fields).XXXI  Revealed preferences almost certainly confirm this of any experimental extremist, exposing many methodological complaints as cynical.XXXII

        Of course it ought to go without saying that the very first concern is that the researcher understands the difference between an anecdote and valid empirical evidence; an anecdote is not only a sample-size of one or few (N ≈ 1), but just as important, it does not consist of an actual scientific measurement: no apparatus or situation was set up ahead of time, no tools were used to ensure an accurate and precise measurement at the time of the phenomenon, and no recording of the measurement was made at the time of the purported phenomenon.  Thus anecdotes tend to be the flawed memories of poorly observed events which caught the thinker off-guard in real-time, all of which leaves them more highly colored by error and bias.  The second methodological concern is then not to commit a false extrapolation from a small amount of real data; one can do a scientific study with a small sample size, even of one (N = 1) in which all of the rest of the flaws of anecdotes are removed, but this is known as a case study and is understood to be taken with a grain-of-salt, suggesting further study with a larger sample-size, if anything.  (Medical doctors, for example, find such presentations of singular abnormal cases interesting).  One must be careful not to see an effect in a small group and assume the effect holds for larger groups, extrapolating wildly from, say, a population of ten to a population of millions (particularly if the effect size is small—and the larger the sample, the greater the sensitivity to significant-but-small effect sizes); typically, this just isn't going to work and there actually is no real effect (there was of course some reason the effect showed up in your little study, but it is not made available by the data).XXXIII  A common heuristic is that a sample size of a hundred (N = 100) is where researchers begin to take the results of a study seriously, though this is arbitrary and subject to other issues (for example, if there is a control group, one really only has two groups of fifty, and so on);XXXIV determining the significance of a study is an open debate in methodology, hurt by the fact that such arguments do not tend to extend beyond the border of a particular field.  When it came to the observer-blind randomized controlled trials that validated the covid-19 vaccines, the sample sizes used were on the order of tens-of-thousands of subjects (N ≈ 10,000) that we then extrapolated (and administered) to a population on the order of hundreds-of-millions (N ≈ 100,000,000).  So far, that passive observation with the larger sample size, while surprisingly poorly monitored,XXXV does appear to broadly replicate the efficacy found in the experiments, providing general supporting evidence that N = O(10,000) is a large enough sample to make predictions about N = O(100,000,000), given these large effect-sizes.XXXVI  Smaller effect sizes will often show up as zero events in a sample small enough that the expectation is less than one (or even on the order-of-one).XXXVII  The expectation is:

E = pN

where E is the expected number of events (the expectation), p is the probability of a single event, and N is the number of trials.  Imagine, for example, that the vaccine killed one-in-one-hundred-thousand people.  The expectation in an experimental group of 15,000 people would then be (1 / 100,000) × 15,000 = 0.15 deaths (meaning that most likely no one in the study would die from the vaccine, though perhaps one might, despite the existence of a real threat lurking).  Yet then when we administered the vaccine to, say, 150,000,000 people, the expectation would rise to (1 / 100,000) × 150,000,000 = 1,500 deaths, a number one hopes we would notice in the passive observation of the administration of the vaccine, but which we might not notice “in the wild”.  This too requires a cost-benefit analysis, because it still might be smart for many at-risk people to get the vaccine if they nevertheless have a higher likelihood of catching-and-dying from the disease covid-19 than they do of dying from the vaccine itself.XXXVIII  To detect such a signal, one would need to control for the normally-expected number of deaths in a group of that size over that time-frame, and then control for the risk of death from the disease covid-19 before deciding whether the number of vaccine deaths was an acceptable risk (and this analysis will be different for different populations).  Ultimately one has to study a phenomenon for a long time to know how it behaves over a long period of time.
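
        The claim that most likely no one in the study would die can be made precise: with a per-person risk p, the chance of seeing zero such events among N independent subjects is (1 - p)^N ≈ e^(-E).  A small sketch of my own, using the made-up risk from the example above:

    using System;

    class Expectation{
            static void Main(){
                    double p = 1.0 / 100000.0;      /* hypothetical per-person risk */
                    int nTrial = 15000;             /* experimental group size */
                    long nPublic = 150000000;       /* administered population */

                    Console.WriteLine("E (trial) = " + p * nTrial);    /* 0.15 deaths */
                    Console.WriteLine("E (public) = " + p * nPublic);  /* 1,500 deaths */

                    /* Probability that the trial sees zero such deaths at all */
                    double pZero = Math.Pow(1.0 - p, nTrial);          /* ~0.86 */
                    Console.WriteLine("P(zero in trial) = " + pZero);
            }
    }

In other words, under this hypothetical risk, about 86% of such trials would see no vaccine-caused deaths at all, which is exactly why the rarest harms only become visible at the scale of the full roll-out.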

        Sample size isn't all that matters: there is also the threat of sample bias, in which the kinds of people in one's study are skewed away from the general public.  This is where representation and randomization come into play.  Representation means that, to the best of your ability, you select a sample that is skewed as little as possible from the general public (or from the larger population you are attempting to make claims about)—this is called a representative sample.  Further, randomization means that one's sample is randomly assigned to the control or experimental group(s) so as to avoid biasing one sub-population differently than the other (for example, one would not want, in a study on the effects of smoking cigarettes, to have unknowingly assigned a group with significantly more genetic pre-disposition for lung health to one or the other group, as this would skew the results of the study one way or the other).  With representation and randomization, one is essentially trying to use randomness to control for all remaining variables the study doesn't measure and control for (in a perfect world, the proportion of the two sexes in a given study would be 50/50, the different races would be factored in, all ages would be included, etc.).  If someone's sample is known to be biased (on purpose or by accident), it is important for the researcher to make clear that this is a non-representative sub-population, results from which should not be naively extrapolated to the whole population.XXXIX  There will be many results shared across races, cultures, and even sexes (given our shared underlying human nature).XXXX  But other effects will be causally relevant to these same variables.
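
        As a concrete illustration of random assignment (a sketch of my own, not any particular trial's procedure), one can shuffle the enrolled subjects and split the shuffled list down the middle, so that unmeasured traits are, on average, spread evenly across the two groups; the 30,000-subject figure anticipates the example later in this chapter:

    using System;
    using System.Collections.Generic;

    class RandomAssignment{
            static void Main(){
                    /* Hypothetical enrolled subjects, identified by number */
                    var subjects = new List<string>();
                    for(int s = 1; s <= 30000; s++)
                            subjects.Add("subject-" + s);

                    /* Fisher-Yates shuffle, so that any unmeasured traits are,
                     * on average, spread evenly across the two groups */
                    var rng = new Random();
                    for(int i = subjects.Count - 1; i > 0; i--){
                            int j = rng.Next(i + 1);
                            (subjects[i], subjects[j]) = (subjects[j], subjects[i]);
                    }

                    /* Split down the middle into control and experimental groups */
                    int half = subjects.Count / 2;
                    List<string> control = subjects.GetRange(0, half);
                    List<string> experimental = subjects.GetRange(half, subjects.Count - half);
                    Console.WriteLine(control.Count + " control, " + experimental.Count + " experimental");
            }
    }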

        Newton is often celebrated for being the first scientist to discover a universal law; Newton's universal law of gravitation applies equally well to apples falling from trees as to planets orbiting the sun.  Another way to put this is that while it's important to experimentally test this theory for its precision and accuracy by fine observation of objects dropped in a vacuum (actually, an astronaut dropping an object on the moon has been used for this purpose), one would have completely missed the point by not applying it to orbiting planets which will perhaps always remain in the realm of passive observation (lest some distant descendants of ours find a way to experimentally manipulate such massive systems—and I wouldn't put it past those rascals!).  This same law of physics operates when an object falls in your everyday life, and it hardly takes much experimental artifice to check.XXXXI  There is reason for caution about the results of the passive observation of phenomena, though, particularly when it comes to complex systems.XXXXII  When there are many potential causes at play, it can be very difficult to tease apart which phenomena are actually causally contributing to the observed effects, and how much.  When observing a situation where many factors are playing a role, one can't simply tell what causes what (in fact, sometimes one can hardly measure an effect because some other effect might be un-doing it): one needs to control for these different possibilities to see to what degree a given potential cause is correlated with an effect.  On the flip side, it is possible for the machinations of experiment itself to throw off the natural order of things, thereby affecting the results of the experimentXXXXIII—confirmation via passive observation is necessary to keep an eye on a result which has ramifications for the practical world (although the problem of confounds does in general turn out to be a greater worry than that of experimental interference, particularly given a well-designed experiment, in large part because the experimenter has attempted to design and therefore know about the confounds at play in experiments).XXXXIV  One way to control for even unknown confounds is known as a controlled trial (we have been implicitly encountering these throughout): here, one has a control group for which no intervention (or better yet, a placebo intervention) is undertaken, and an experimental group in which the actual intervention is tested.  The results of the control group provide a baseline against which to compare the results of the experimental group, so that one may see how much of the measured effect(s) is attributable to the intervention being experimented with (essentially by subtracting out that of the control group).
As one can probably tell, the complexity which necessitates this kind of design is usually found in the life-sciences: the subjects of the studies utilizing such methodologies are typically organisms including humans (subjects who are internally and externally subject to a complex array of causes and effects), but one may well use a controlled design in, say, physics as well (one example is the famous double-slit experiment in which the control group of electrons is shot through a single slit and the experimental group of electrons is shot through a double-slit, on their way to a distant screen—the different pattern observed is the most oft-cited evidence for so-called wave-particle duality).XXXXV  Indeed, even the lower-level sciences can be led astray by (in this case, a passive observation with) a failure to control for alternative explanations, as when the BICEP-2 experimenters briefly believed they had measured the primordial gravitational wave background predicted by inflationary cosmology before it was discovered that the swirling pattern turned out to likely be the result of interstellar dust, rather than the signature of said gravitational waves.XXXXVI  Here, there would have been little way to control for the dust in the sense that it cannot be removed, but some clever observational design would be needed to subtract it out, or measurement would need to be taken of some messenger particle that isn't affected by interstellar dust (it is likely that direct detection of gravitational waves from interferometers such as those used by LIGO will be necessary).  If one were to do an experiment on just the experimental group without a control group, one would necessarily have to compare their results to either the background baseline from some other study, whose sample will be less or differently biased, or against some imaginary, conservative baseline (biased in favor of the null hypothesis, as was demonstrated in the example earlier in this chapter).  Imagine, for example, performing an experiment to test the efficacy of a vaccine against a particular disease (we will again use covid-19).  Let's say that you have a sample size of fifteen-thousand people (N = 15,000), who were given the vaccine and then followed up to see if they caught the disease (covid-19) over the next couple of months, in which 100 such subjects were found to have caught covid-19 (in this thought experiment, all values will be made up for the sake of example).  What should one conclude?  It would be absurd to pretend that, absent the vaccine, 100% of the population would have caught the disease and therefore credit the vaccine with having prevented 15,000 – 100 = 14,900 cases.  So one should in this case compare the experimental group's infection numbers with those of the general public—pretending that 12% of the general population had caught the disease during the same time-period (this was actually feasible given the way that this pandemic was handled in the west, with daily national case-counts tallied).  The experimenter must compare their result against the passively observed 15,000 × 0.12 = 1,800 expected cases—far fewer than 15,000 but far more than 100.  To calculate an intervention's efficacy, one uses the following equation:

E = 1 – (pExperiment / pControl)

where pExperiment is the probability of someone in the experimental group contracting the disease, and pControl is the probability of someone in the control group contracting the disease.  For the situation just discussed, this would look like the following:
                        pExperiment = 100 / 15,000 = 0.67%
                        pControl = 12%
                        E = 1 – (0.0067 / 0.12) = 0.94 = 94%

which suggests that the intervention (in this case, the vaccine) reduces the number of cases by 94% (or equivalently, reduces your chances of catching the disease, all else equal, by 94%).  While this method is better than nothing, it leaves open several potential and hard-to-detect sources of error.  For one, sample bias could be such that, for whatever reason, the group of subjects is not representative of the wider population (due to confounds such as age, sex, race, geographic location, prevalence of the virus in their environment / exposure, use of other interventions such as masks, etc.).  To control for all of this without needing to figure out how to do so one-by-one, one can instead do a study with a sample-size of 30,000 (NNet = 30,000) with a control group of 15,000 subjects (NControl = 15,000) to complement the experimental group of 15,000 subjects (NExperiment = 15,000).  The control group may have nothing done to them, but will just be monitored and measured for infection the same way the experimental group is—that way, the measurements of the control group provide a realistic within-experiment baseline that controls for several variables, particularly as they pertain to sample bias (because, though the experimenters will have tried not to bias their sample, the next best thing is to make sure that the control group and experimental group have the same skews so as to offer a more apples-to-apples comparison to get at just the effects attributable to the intervention).  As mentioned before, one persistent issue is that there may be further sample bias between those who choose to be in the control or experimental groups (perhaps the more conservative-minded will prefer to be in the control-group whereas the risk-takers will prefer to be in the experimental group, for example)!  Again, this is where randomization comes into play, as in a randomized controlled trial (RCT): in this design, though one cannot control for the inherent sample bias of who decides to participate in a study overall (because coercion would be illegal!), one can randomly assign participants to the control group and experimental group such that there is no further sample bias between the two groups being compared.  But there are still potential issues with this design, considering that the control group and experimental group, knowing which group they are in, may behave differently as a result.  For example, one could imagine the experimental group behaving less carefully than the control group (going to larger gatherings and choosing not to wear a mask during an on-going pandemic, for example), because they assume they are likely to be protected by the vaccine—and this may result in more cases being contracted in that cohort, blunting the measured effect-size of the vaccine.  Or perhaps due to some misplaced sense of duty, wanting to get the result that the vaccine works, those in the control group will do everything they can to hermetically seal their lives from covid-19 exposure, therefore making the vaccine look more effective than it actually is.  We just don't know, ahead of time, how either group will react to knowing which group they are in.
This is cause to perform a blind randomized controlled trial where blindness means that not only are participants randomly assigned to the control or experimental groups, but they actually do not get to know which group they are in for the duration of the study (typically through giving such subjects a placebo such that neither group can tell if they are receiving the real intervention or a fake).XXXXVII  Even here, issues persist as the experimenters themselves could consciously or unconsciously treat the subjects of one group differently than those of another (including giving them information that breaks the subjects' blindness), potentially skewing the results: here we use the double-blind randomized controlled trial wherein double-blind means that the experiment is designed such that neither the subjects nor the administering researchers know which group a given subject is in—the administering researchers don't even know whether they are administering placebos or a real intervention (of course someone or something must have recorded to which group each subject was assigned so that the experimenters can use this information in their final analysis, but crucially, this information is not brought out until after the experiment has been run).  Of course, one does have certain expectations when one does a study, because one has a hypothesis whose predictions are being tested.  All of this controlling of potential confounds is done in the service of maximizing the chance that the results (whether they falsify or confirm the hypothesis) have not been manipulated along the way, and instead represent a genuine result (in other words, attempting to minimize error and bias).
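
        Looking back at the efficacy arithmetic earlier in this discussion, it is simple enough to check in a few lines; the sketch below is my own illustration, using the text's made-up values:

    using System;

    class Efficacy{
            /* E = 1 - (pExperiment / pControl), with the hypothetical values
             * from the vaccine example above. */
            static void Main(){
                    double pExperiment = 100.0 / 15000.0;   /* ~0.67% caught covid-19 */
                    double pControl = 0.12;                 /* 12% background rate */
                    double efficacy = 1.0 - (pExperiment / pControl);
                    Console.WriteLine("Efficacy = " + (efficacy * 100.0) + "%");  /* ~94% */
            }
    }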

        Representation can be particularly hard to control for, both because studies ultimately need to be opt-in, and because it is resource-intensive to source subjects from further away or from different cultures than those near one's own.  This is one of the merits of passive observation after all; if nothing else, it is the real-world (rather than a sample) that is being observed and extrapolated from (though even here, it is almost always a biased subset of the full population).  We do not know for sure how things really work until a phenomenon's consequences are understood in the natural world, which actually regularly occurs.  It is important to understand the isolated two-body problem of Newtonian gravity as pertains to, say, the Moon orbiting Earth, but it is just as important to understand the larger, more complex composite system in which star systems orbit supermassive black holes (systems known as galaxies), that planets orbit these stars, that moons orbit these planets, and that, among other gravitational consequences, moons will differentially tug on the closer side of their planets' oceans as the planet rotates underneath, causing the tides—all the while, each exerting gravity on each other in an N-body problem, any accelerating object emitting gravitational waves through the fabric of space-time, thereby losing kinetic energy.  We could not have understood this complex system without first understanding the isolated case.

        The lower-level sciences often have the luxury of their subjects being identical, unlike the higher-level sciences: an electron is an electron (in fact, in the current formulation of quantum physics, a given electron cannot be differentiated from others in a system where many are interacting) but a human is not just a human (there are quite a few differences between any two given individual humans, for all of their similarity when either person is compared against, say, an electron).  In fact, when ascending the ontological stack, this quality is something of a spectrum, as composite objects are less and less identical to others in their class the more complex they become: most atoms of a given chemical element are identical, but they may have differences in neutrons in their nuclei (known as isotopes).XXXXVIII  Continuing this trend, most individual ants are probably more alike each other than most individual chimpanzees are, because they're fundamentally made up of fewer parts.  Human beings, because of our neurological complexity and resultant penchant for culture, are perhaps the least alike individuals of a given species (though again, we do nevertheless share a robust, common human nature).  The difference between “identical” and “similar” is a huge gap, and neither side should be exaggerated.  I do not, of course, believe it is literally the case that electrons are identical when one takes all characteristics into account, such as position and momentum (though I do think that it is likely that the most fundamental objects have identical internal characteristics).XXXXIX  Conversely, many in the social sciences wildly exaggerate the cultural differences between peoples (all the while downplaying both their genetic differences and similarities).XXXXX

        Clearly, there is something of a gulf between the hard or natural sciences, and the so-called soft, or social sciences.  However, this methodological transition is somewhat smooth through biology.  Experimental designs can be more naive until one is dealing with humans, because the objects of study are more fundamental (they have fewer free-parameters, giving them lower variability), are not goal-oriented (they exhibit simpler, law-like behaviors), and simply do not animate human bias as powerfully (one is more likely to have emotional conceits about the facts of humans than about those of particles).XXXXXI  This also explains why there is more controversy surrounding the social sciences.  When Weinberg says, for example, “we learn to do experiments, not worrying about the artificiality of our arrangements,” he is likely largely right when it comes to physics, but may be naive when it comes to sociology (where exceedingly careful study designs may be needed).XXXXXII

        Not all methodologies are made equal, in the sense that they do not all do as good a job at implementing the shared epistemology of the scientific method.  It is true that many singular studies will be highly suggestive on their own due to the merits of their methodologies—particularly in the lower-level sciences, a single powerful study can largely establish a result on its own.  But meta-analysis is both a great way to survey the state of replication on a given area of study, and a way to attempt to tease out a result from an otherwise confusing mix of results from many different studies (and methodologies).XXXXXIII  It is naive to take a “meta-analysis-or-bust” stance as well—the vaccines being used to combat the covid-19 pandemic are not based on meta-analyses, being largely based on singular large-scale studies for each particular vaccine (though more have been carried out as time has gone on), for example (though meta-analyses over time will likely be important for observing such vaccines' longitudinal safety).  A meta-analysis is a paper which examines, combines, and reports on the results of a nomological network of cumulative evidence already out there in the literature for one to find.  The authors, rather than perform their own experiments or observations, gather those of others that meet some methodological quality standard, and then attempt to combine their results into one larger (and so more sensitive), less biased result (because the experimenters and samples would presumably have different skews between different experiments).XXXXXIV
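
        To give a flavor of how results get combined, here is a generic sketch of my own (the standard fixed-effect, inverse-variance approach; real meta-analyses add many quality checks and often random-effects models): each study's estimate is weighted by the inverse of its variance, so that more precise studies count for more, and the pooled estimate carries a tighter standard error than any single study's.

    using System;

    class MetaAnalysis{
            /* Fixed-effect, inverse-variance pooling: weight each study's
             * estimate by 1 / SE^2; the pooled standard error is
             * sqrt(1 / sum-of-weights).  A generic sketch, not any
             * particular published meta-analysis. */
            static void Pool(double[] estimates, double[] standardErrors,
                             out double pooled, out double pooledSE){
                    double weightSum = 0.0, weightedSum = 0.0;
                    for(int i = 0; i < estimates.Length; i++){
                            double w = 1.0 / (standardErrors[i] * standardErrors[i]);
                            weightSum += w;
                            weightedSum += w * estimates[i];
                    }
                    pooled = weightedSum / weightSum;
                    pooledSE = Math.Sqrt(1.0 / weightSum);
            }

            static void Main(){
                    /* Three hypothetical studies measuring the same effect */
                    double[] est = { 0.054, 0.048, 0.061 };
                    double[] se  = { 0.020, 0.015, 0.030 };
                    Pool(est, se, out double pooled, out double pooledSE);
                    Console.WriteLine("Pooled = " + pooled + " +/- 1.96 x " + pooledSE);
            }
    }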

        The key, after all, is to remember that methodology is merely the implementation of epistemology, and an imperfect implementation of what is ultimately an imperfect state of epistemological knowledge, at that.  One of the big sins of modern scientists is to confuse their particular field's current state of methodology with the scientific method as a whole.  Because no given study-design is going to perfectly implement the epistemology, we seek multiple lines of evidence produced by studies with different methodologies which naturally control for each other's weaknesses; this is the recursive application of science to itself, induction to induction.XXXXXV  All along the way, we employ the error-correcting mechanisms of peer review (of which meta-analysis is one of the greatest examples, as it represents researchers intimately grappling with not just a paper, but an entire literature, to produce more literature).  Epistemology is truth-seeking in-principle and methodology is truth-seeking in-practice.  This resultant peer-reviewed nomological network of cumulative evidence is ultimately the best approximation of the scientific method we can achieve at a given time, and as such, it produces the best body of knowledge on a given topic.

        This volume's main goal is to teach epistemology, which requires a teaching of the basics of methodology, and the relationship between the two.  The vagaries of methodology, many of which are specific to given fields and sub-fields, are beyond the scope of this volume.  When in doubt, the rule-of-thumb is to exhibit intellectual honesty and transparency with one's methodology: one should point out the flaws (and the consequences of those flaws) in their own papers, to aid in their peers' review of their work, thereby serving the state of knowledge over oneself or one's career.XXXXXVI


Footnotes:

0. The Philosophy Of Science table of contents can be found, here (footnotephysicist.blogspot.com/2022/04/table-of-contents-philosophy-of-science.html).

I. See Twitter user Thomas Sowell Quotes' (@ThomasSowell) August 31st, 2018 tweet: https://twitter.com/thomassowell/status/1035573682851258374?lang=en (I do not know its original source).

II. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html); "Science Makes Philosophy Obsolete" by Goldstein (https://www.edge.org/response-detail/25423) from This Idea Must Die edited by Brockman (pp. 129-131); and “Scientific Realism” by Goldstein (https://www.edge.org/response-detail/27113) from This Idea Is Brilliant edited by Brockman (pp. 273-276).

III. See the “Reason” chapter which further cites Enlightenment Now by Pinker (pp. 27, 127); the “Logic” chapter; Cosmos by Sagan and Druyan (pp. xviii, 94, 194); The Demon Haunted World by Sagan (pp. 20-22, 230, 274-275, 414, 423); Cosmos: Possible Worlds by Druyan (pp. 75); The Ape That Understood The Universe by Stewart-Williams (pp. 229-230, 268); Enlightenment Now by Pinker (pp. 7, 11, 26-28, 83, 393, 408-409); “Recursion” by Montague (https://www.edge.org/response-detail/27035) and “Fallibilism” by Curry (https://www.edge.org/response-detail/27192), both from This Idea Is Brilliant edited by Brockman (pp. 61-62, 82-83); “Science Advances By Funerals” by Barondes (https://www.edge.org/response-detail/25386) and “Planck's Cynical View Of Scientific Change” by Mercier (https://www.edge.org/response-detail/25332), both from This Idea Must Die edited by Brockman (pp. 481-485); “Science Must Destroy Religion” by Harris (https://www.edge.org/response-detail/11122) in What Is Your Dangerous Idea? edited by Brockman (pp. 148-151); and The Coddling Of The American Mind by Lukianoff and Haidt (pp. 109).

V. See the “Cosmos And Chaos” chapter; the “Knowledge As Provisional” chapter which further cites “Carl Sagan's Last Interview With Charlie Rose (Full Interview)” by Rose and Sagan (at least 3:55 – 4:08) and The Demon Haunted World by Sagan (pp. 13, 22, 32, 69, 140, 165, 316-317, 326, 335, 348, 434).

VII. For the official documentation on C# interfaces, see “interface (C# Reference)” (Microsoft) (accessed 12/10/2022) (https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/interface) (though I have not read this whole document).

VIII. I'm reminded of the Sowell quote at the top of the chapter.

IX. One thing this algorithm has going for it is short-circuit solutions: if a number isn't prime (and most aren't), then this will generally be found out in << n steps (meaning far fewer than n steps), since the moment any novel factor is found, false is returned. For example, half of numbers are even, meaning half of numbers will immediately return false upon finding that n is divisible by two (excepting two itself, which is the only even prime). This is a hint for one way to refactor (or fix) this code: we can first check divisibility by two (with a special case so that two itself still counts as prime) and then start the loop at i = 3 with a loop-step of i += 2 to skip even candidate factors outright, since any remaining factor must be odd. In the case of a very large prime, however, all potential factors between 2 and n - 1 will need to be tested to verify that it is indeed a prime.

X. O(c) would mean that the problem is of a constant order-of-magnitude, regardless of the input size. One example is our simple algorithm for checking whether a number is even or odd:

        bool IsEven(int n){
            if(n % 2 == 0){
                return true;
            }else{
                return false;
            }
        }

This algorithm takes the same number of steps, and the same amount of time, regardless of the size of the input n, despite there being infinitely many inputs (limited in practice by the upper-limit on the size of integers in a particular language or on a particular machine). See the “Computation” chapter.

XI. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN9A) which further cites To Explain The World by Weinberg (pp. 189), "Michael Shermer With Jared Diamond—Upheaval: Turning Points For Nations In Crisis (SCIENCE SALON65)" by Michael Shermer and Jared Diamond (Skeptic) (2019) (https://www.youtube.com/watch?v=RUbZiQitR7U) (9:09-12:53), and The Constitution Of The United States Of America: And Selected Writings Of The Founding Fathers by John Adams, Benjamin Franklin, Alexander Hamilton, John Hancock, Patrick Henry, John Jay, Thomas Jefferson, James Madison, James Monroe, and George Washington (Barnes & Noble) (1761-1992 / 2012) (though I have not yet finished this collection).

XII. See “68-95-99.7 Rule” (Wikipedia) (accessed about 12/11/2022) (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).

XIII. See “p-value” (Wikipedia) (accessed around 12/11/2022) (https://en.wikipedia.org/wiki/P-value) (though I have not read this entire entry) which further cites "Not Even Scientists Can Easily Explain p-values" by Christie Aschwanden (FiveThirtyEight) (2015) (https://web.archive.org/web/20190925221600/https://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/), "Why P Values Are Not A Useful Measure Of Evidence In Statistical Significance Testing" by Raymond Hubbard and R. Murray Lindsay (Theory & Psychology) (2008) (https://journals.sagepub.com/doi/10.1177/0959354307086923), "A Manifesto For Reproducible Science" by Marcus R. Munafò et al. (Nature Human Behavior) (2017) (https://www.nature.com/articles/s41562-016-0021), "Scientific Method: Statistical Errors: P Values, The 'Gold Standard' Of Statistical Validity, Are Not As Reliable As Many Scientists Assume" by Regina Nuzzo (Nature) (2014) (https://www.nature.com/articles/506150a), "The ASA Statement On p-values: Context, Process, And Purpose" by Ronald L. Wasserstein and Nicole A. Lazar (The American Statistician) (2016) (https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108), "An Investigation Of The False Discovery Rate And The Misinterpretation Of p-values" by David Colquhoun (Royal Society Open Science) (2014) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448847/), and "The Reproducibility Of Research And The Misinterpretation Of p-values" by David Colquhoun (Royal Society Open Science) (2017) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5750014/) (though I have not read these works); "Understanding Results: P-values, Confidence Intervals, And Number Need To Treat" by Lawrence Flechner and Timothy Y. Tseng (IJU) (2011) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263226/) which further cites "Sifting The Evidence—What's Wrong With Significance Tests?" by Jonathan A. C. Sterne and George Davey Smith (BMJ) (2001) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/) and "Statistical Guidelines For Contributors To Medical Journals" by D. G. Altman et al. (Br Med J) (1983) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1547706/) (though I have not read Flechner's piece in its entirety nor those he cited at all); and “p-value Calculator” by Bogna Szyk et al. (omni Calculator) (accessed about 12/11/2022) (https://www.omnicalculator.com/statistics/p-value) (though I have not read this entire article). It is okay if you find this nuance of statistical philosophy confusing: all of us do. But statisticians will get very angry if you do not draw this distinction, presumably for good reason (how's that for the expertise heuristic in action)!

XIV. See “z-test” (Wikipedia) (accessed around 11/12/2022) (https://en.wikipedia.org/wiki/Z-test); "Central Limit Theorem" (Wikipedia) (accessed about 12/12/2022) (https://en.wikipedia.org/wiki/Central_limit_theorem); “Standard Score” (Wikipedia) (accessed about 12/12/2022) (https://en.wikipedia.org/wiki/Standard_score); "Student's t-test" (Wikipedia) (accessed around 12/12/2022) (https://en.wikipedia.org/wiki/Student%27s_t-test); “How To Calculate A Test Statistic (With Types And Examples)” by Indeed Editorial Team (indeed) (2021 / 2022) (https://www.indeed.com/career-advice/career-development/how-to-calculate-test-statistic); Chapter 7: “The t Tests” in Statistics At Square One: Ninth Edition by T. D. V. Swinscow and M J Campbell (BMJ) (1997) (https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/7-t-tests); “68-95-99.7 Rule” (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule); "How To Calculate A p-value From A t-test By Hand" by Zach (Statology) (2020) (https://www.statology.org/how-to-calculate-a-p-value-from-a-t-test-by-hand/); and “A Guide To The T-Test (Definition, Purpose And Steps)” by Indeed Editorial Team (indeed) (2021 / 2022) (https://www.indeed.com/career-advice/career-development/t-test) (though I do not think I have read any of these sources in their entirety). I caution the reader that I am learning about the ins-and-outs of these statistical significance tests as I write, here. As an introduction, I do not draw distinctions between z-scores and t-scores, and the concomitant normal distributions and Student's t-distributions; any errors as a result are mine alone, to be corrected in future editions of this volume. Along similar lines, I am not sure that I am treating my example sample properly: this hinges on the question of whether the implicit assumptions of a two-sided normal distribution apply to my measure of “severe” side-effects sampled among one hundred people; again, any errors are mine alone, to be fixed in future editions.

XV. As mentioned previously, the actual covid-19 vaccine RCTs found roughly similar rates of more serious adverse effects in both their control and experimental groups, see the “Approximation” chapter which further cites “Efficacy And Safety Of The mRNA-1273 SARS-CoV-2 Vaccine” by Baden et al. (https://www.nejm.org/doi/full/10.1056/nejmoa2035389); “Safety And Efficacy Of The BNT162b2 mRNA Covid-19 Vaccine” by Polack et al. (https://www.nejm.org/doi/full/10.1056/nejmoa2034577); and “Safety And Efficacy Of Single-Dose Ad26.COV2.S Vaccine Against Covid-19” by Sadoff et al. (https://www.nejm.org/doi/full/10.1056/NEJMoa2101544).

XVI. Though I produced these table graphics on my own using OpenOffice Calc, I retrieved the values from the tables present in "How To Use The z-table" by The Experts At Dummies (Wiley / dummies) (2022) (https://www.dummies.com/article/academics-the-arts/math/statistics/how-to-use-the-z-table-147241/) from Statistics: 1001 Practice Problems For Dummies (+ Free Online Practice): 1st Edition by The Experts At Dummies (Wiley / dummies) (2022) (though I have not read this full work). See also “How To Use The z-table” by Saul McLeod (SimplyPsychology) (2019) (https://www.simplypsychology.org/z-table.html) (though I have not read this article in its entirety).

XVII. See the “Approximation” chapter; “z-test” (https://en.wikipedia.org/wiki/Z-test); “The t Tests” by Swinscow and Campbell (https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/7-t-tests); "Understanding Results: P-values, Confidence Intervals, And Number Needed To Treat" by Flechner and Tseng (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263226/) which further cites "Statistical Guidelines For Contributors To Medical Journals" by Altman et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1547706/); "p-value And Confidence Intervals – Facts And Farces" by B. O. Adedokun (Annals Of Ibadan Postgraduate Medicine) (2008) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4111020/) (though I have not read this entire article) which further cites Essential Medical Statistics: Second Edition by Betty R. Kirkwood and Jonathan A. C. Sterne (Blackwell Publishing) (1988 / 2003) (though I have never read this work); "Confidence Intervals And p-values" by Wayne W. LaMorte (Boston University School Of Public Health) (2016) (https://sphweb.bumc.bu.edu/otlt/mph-modules/ep/ep713_randomerror/ep713_randomerror6.html); "Theories About μ: Connection Between p-values And Confidence Intervals" (AMSI) (2015) (https://amsi.org.au/ESA_Senior_Years/SeniorTopic4/4i/4i_4theories_3.html); “68-95-99.7 Rule” (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule); “Student's t-distribution” (Wikipedia) (accessed around 12/12/2022) (https://en.wikipedia.org/wiki/Student%27s_t-distribution) (though I have not read this entry in its entirety); and “Normal Distribution” (Wikipedia) (accessed around 12/12/2022) (https://en.wikipedia.org/wiki/Normal_distribution) (though I have not read this entry in its entirety).

XVIII. See the references in footnote XVII as well as “Error Bar” (https://en.wikipedia.org/wiki/Error_bar) which further cites “Standard Deviation, Standard Error: Which 'Standard' Should We Use?” by Brown (https://jamanetwork.com/journals/jamapediatrics/article-abstract/510667); and “Standard Error” (https://en.wikipedia.org/wiki/Standard_error) which further cites “Standard Deviations And Standard Errors” by Altman and Bland (https://www.bmj.com/content/331/7521/903) and The Cambridge Dictionary Of Statistics by Everitt and Skrondal.

XXII. See “Replication Crisis” (Wikipedia) (accessed 12/12/2022) (https://en.wikipedia.org/wiki/Replication_crisis#Statistical_reform) (though I have only read the "Statistical Reform: Requiring Smaller p-values" section) which further cites “Redefine Statistical Significance” by Daniel J. Benjamin et al. (Nature Human Behaviour) (2017) (https://www.nature.com/articles/s41562-017-0189-z) (though I have yet to read this article).

XXIII. See “Replication Crisis” (https://en.wikipedia.org/wiki/Replication_crisis#Statistical_reform) which further cites "Justify Your Alpha" by D. Lakens et al. (Nature Human Behaviour) (2018) (https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_3157212) (though I have not yet read this paper); “p-value” (https://en.wikipedia.org/wiki/P-value) which further cites "The ASA Statement On p-values: Context, Process, And Purpose" by Wasserstein and Lazar (https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108), "Alternatives To p value: Confidence Interval And Effect Size" by Dong Kyu (KJA) (2016) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133225/), "Why The p-value Culture Is Bad And Confidence Intervals A Better Alternative" by J. Ranstam (Osteoarthritis And Cartilage) (2012) (https://www.oarsijournal.com/article/S1063-4584(12)00778-9/fulltext), "Sifting The Evidence: Likelihood Ratios Are Alternatives To p values" by Thomas V. Perneger (BMJ) (2001) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1120301/), Chapter 5: "The Likelihood Paradigm For Statistical Evidence" (https://academic.oup.com/chicago-scholarship-online/book/28793/chapter-abstract/239484917?redirectedFrom=fulltext&login=false) from The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations edited by Mark L. Taper and Subhash R. Lele (University Of Chicago Press) (2004), "Bayes-Factors: A Miracle Cure For The Replicability Crisis In Psychological Science" by Ulrich Schimmack (Replicability-Index) (2015) (https://replicationindex.com/2015/04/30/replacing-p-values-with-bayes-factors-a-miracle-cure-for-the-replicability-crisis-in-psychological-science/), "Hypothesis Testing: From p values To Bayes Factors" by John I. Marden (Journal of the American Statistical Association) (2000) (https://www.jstor.org/stable/2669779?origin=crossref), "A Test By Any Other Name: p-values, Bayes Factors And Statistical Inference" by Hal S. Stern (Multivariate Behavioral Research) (2016) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809350/), "In Defense Of p values" by Paul A. Murtaugh (Ecology) (2014) (https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/13-0590.1), "Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing p-values" by Christie Aschwanden (FiveThirtyEight) (2016) (https://fivethirtyeight.com/features/statisticians-found-one-thing-they-can-agree-on-its-time-to-stop-misusing-p-values/), "The Earth Is Flat (p > 0.05): Significance Thresholds And The Crisis Of Unreplicable Research" by Valentin Amrhein, Fränzi Korner-Nievergelt, and Tobias Roth (PeerJ) (2017) (https://peerj.com/articles/3544/), "Remove, Rather Than Redefine, Statistical Significance" by Valentin Amrhein and Sander Greenland (Nature Human Behaviour) (2018) (https://www.nature.com/articles/s41562-017-0224-0), and "The Reproducibility Of Research And The Misinterpretation Of p-values" by Colquhoun (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5750014/) (though I have read none of these works); and “p-value Calculator” by Szyk et al. (https://www.omnicalculator.com/statistics/p-value) which further cites "Still Not Significant" by Matthew Hankins (Probable Error) (2013) (https://mchankins.wordpress.com/2013/04/21/still-not-significant-2/) and "The Mind-Reading Salmon: The True Meaning Of Statistical Significance" by Charles Seife (Scientific American) (2011) (https://www.scientificamerican.com/article/the-mind-reading-salmon/) (though I have not read these works).

XXIV. See "Data Dredging" (Wikipedia) (accessed 12/12/2022) (https://en.wikipedia.org/wiki/Data_dredging) (though I only read the introductory paragraphs) which further cites "The ASA Statement On p-values: Context, Process, And Purpose" by Wasserstein and Lazar (https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108), "Data Dredging, Bias, Or Confounding" by George Davey Smith (BMJ) (2002) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1124898/), and "Deming, Data And Observational Studies: A Process Out Of Control And Needing Fixing" by S. Stanley Young and Alan Karr (Significance) (2011) (https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2011.00506.x) (though I have not read these papers). Remember the difference between a prediction and a post-diction, see the “Empiricsm” chapter which further cites The Elegant Universe by Greene (pp. 210-211).

XXV. See “Online Bettors Can Sniff Out Weak Psychology Studies” by Yong (https://www.theatlantic.com/science/archive/2018/08/scientists-can-collectively-sense-which-psychology-studies-are-weak/568630/); "Bad Data Analysis And Psychology's Replication Crisis" by Ferguson (https://quillette.com/2019/07/15/bad-data-analysis-and-psychologys-replication-crisis/); the "6.6. Data Blinding" section of "Interpretations And Methods: Towards A More Effectively Self-Correcting" by Lee Jussim and Jarret T. Crawford (Journal of Experimental Social Psychology) (2016) (https://sites.rutgers.edu/lee-jussim/wp-content/uploads/sites/135/2019/05/Jussimetal2016JESPmethodspaper.pdf) (though I have not yet read this wider article); and "Psychology’s Replication Crisis" by Trenton Knauer (Areo) (2019) (https://areomagazine.com/2019/10/01/psychologys-replication-crisis/) which further cites "A Social Psychological Model Of Scientific Practices: Explaining Research Practices And Outlining The Potential For Successful Reforms" by Lee Jussim et al. (Psychol Belg) (2019) (https://pubmed.ncbi.nlm.nih.gov/31565236/) (though I have not read this article).

XXVI. See “z-test” (https://en.wikipedia.org/wiki/Z-test); “The t Tests” by Swinscow and Campbell (https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/7-t-tests); "Understanding Results: P-values, Confidence Intervals, And Number Needed To Treat" by Flechner and Tseng (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3263226/) which further cites "Statistical Guidelines For Contributors To Medical Journals" by Altman et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1547706/); "p-value And Confidence Intervals – Facts And Farces" by Adedokun (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4111020/) which further cites Essential Medical Statistics: Second Edition by Kirkwood and Sterne; "Confidence Intervals And p-values" by LaMorte (https://sphweb.bumc.bu.edu/otlt/mph-modules/ep/ep713_randomerror/ep713_randomerror6.html); "Theories About μ: Connection Between p-values And Confidence Intervals" (https://amsi.org.au/ESA_Senior_Years/SeniorTopic4/4i/4i_4theories_3.html); “68-95-99.7 Rule” (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule); “Student's t-distribution” (https://en.wikipedia.org/wiki/Student%27s_t-distribution); and “Normal Distribution” (https://en.wikipedia.org/wiki/Normal_distribution).

XXVII. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN9A) which further cites To Explain The World by Weinberg (pp. 189), "Michael Shermer With Jared Diamond—Upheaval: Turning Points For Nations In Crisis (SCIENCE SALON65)" by Shermer and Diamond (https://www.youtube.com/watch?v=RUbZiQitR7U) (9:09-12:53), and The Constitution Of The United States Of America: And Selected Writings Of The Founding Fathers by Adams, Franklin, Hamilton, Hancock, Henry, Jay, Jefferson, Madison, Monroe, and Washington.

XXVIII. See “Bret And Heather 16th DarkHorse Podcast Livestream: Meaning, Notions, & Scientific Commotions” by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=QvljruLDhxY) (0:59 – 51:37); “#84: Hey YouTube: Divide by Zero (Bret Weinstein & Heather Heying DarkHorse Livestream)” by B. Weinstein and H. Heying (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NzMwNTYw?sa=X&ved=0CAUQkfYCahcKEwiY2uSX-Pn6AhUAAAAAHQAAAAAQcg) (40:38 – 1:23:45); and Scientist by Rhodes (pp. 45, 54-55, 62-65, 70-76, 118, 132-133, 180, 199-200).

XXIX. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN12B) (footnote 12); “Bret And Heather 6th Live Stream: Death And Peer Review - DarkHorse Podcast” by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=zc6nOphi0yE) (30:40 – 59:56); “Bret And Heather 54th DarkHorse Podcast Livestream: Lane Splitting In The Post-Election Era” by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=tZMskLj1N0I) (48:09 – 50:42); “Bret And Heather 79th DarkHorse Podcast Livestream: #NotAllMice” by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=bU63lsHA0y0) (47:27 – 51:04); and “Bret And Heather 81st DarkHorse Podcast Livestream: Permission To Think” by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=LoaKtBMk53Y) (15:31 – 38:34).

XXX. See the “Elegance And Complexity” chapter, which further cites “The Pursuit Of Parsimony” by Haidt (https://www.edge.org/response-detail/25346) and “The Clinician's Law Of Parsimony” by Smallberg (https://www.edge.org/response-detail/25415) both from This Idea Must Die edited by Brockman (pp. 493-498).

XXXI. See the “Empiricism” chapter which further cites “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN9A) which in turn cites To Explain The World by Weinberg (pp. 194, 200, 254-255).

XXXII. For more on Jussim's “selective calls for rigor”, see the “Logic” and “The Sociology Of Scientists” chapters.

XXXIII. See “Bad Data Analysis And Psychology's Replication Crisis” by Ferguson (https://quillette.com/2019/07/15/bad-data-analysis-and-psychologys-replication-crisis/); "Effect Size" by Bruce Hood (Edge / Harper Perennial) (2017 / 2018) (https://www.edge.org/response-detail/27139) from This Idea Is Brilliant edited by Brockman (pp. 479-481); and "How A Rebellious Scientist Uncovered The Surprising Truth About Stereotypes" by Claire Lehmann (Quillette) (2015) (https://quillette.com/2015/12/04/rebellious-scientist-surprising-truth-about-stereotypes/).

XXXIV. Again we are confronted with the desire for an arbitrary significance threshold for what is in reality a continuous, case-by-case variable: a very large effect found with a relatively small sample size may still be pointing to a real phenomenon, though its true size remains roughly unknown.
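
To make this interplay concrete, the following minimal sketch (using made-up numbers of my own, not any study's data) shows how the very same observed effect yields a different z-statistic, and hence a different verdict against an arbitrary threshold, purely as a function of sample size:

#include <cmath>
#include <cstdio>

// One-sample z-statistic for an observed proportion pHat against a null
// proportion p0, using the normal approximation to the binomial.
double ZStatistic(double pHat, double p0, int n)
{
    double standardError = std::sqrt(p0 * (1.0 - p0) / n);
    return (pHat - p0) / standardError;
}

int main()
{
    // Hypothetical numbers: the same large observed effect (20% versus a
    // null of 10%) falls just short of |z| ~ 1.96 at N = 30, yet clears it
    // easily at N = 300; the effect did not change, only the sample size.
    std::printf("N = 30:  z = %f\n", ZStatistic(0.20, 0.10, 30));
    std::printf("N = 300: z = %f\n", ZStatistic(0.20, 0.10, 300));
    return 0;
}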

XXXV. See for example "How To Save The World, In Three Easy Steps." by B. Weinstein, R. Malone, and S. Kirsch (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NjgzNzgz?ep=14) (though I disagree with perhaps the majority of what is said, here).

XXXVI. Safety is another matter. The experiments provided evidence of general short-term safety, but they could not have provided longitudinal evidence of safety because, of course, not enough time had elapsed since the vaccine's development, let alone its administration, to observe long-term effects. Early adopters of a medicine are always taking on some level of longitudinal risk—we will see how this plays out in the case of mRNA vaccination in the coming years and decades. Furthermore, these trials were only sensitive to effects larger than about 1 / 10,000 = 0.01%. Considering that these novel mRNA vaccines were promptly given to on the order of one-hundred-million Americans (including my own three doses), if such a “small” effect were missed, it would be expected to affect on the order of 100,000,000 × 0.01% = 10,000 Americans (if this side-effect were death, this might not appear to be a particularly “small” effect size). Even this would, however, be much smaller than the roughly one-million Americans who have succumbed to covid-19 itself, and happily, such large numbers of vaccine deaths do not seem to have materialized. Furthermore, these experiments did not study a sample large enough to expect any covid-19 deaths within the trial period, meaning the vaccines' extra protection against death was unknown until the real-world observational data came in (which did not stop people from saying otherwise).
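
The back-of-the-envelope arithmetic above can be spelled out explicitly; this sketch simply restates the footnote's own round numbers (a detection floor of about 1 / 10,000 and on the order of one-hundred-million vaccinated Americans) and is not a model of the actual trials:

#include <cstdio>

int main()
{
    // Rough sensitivity floor of the trials discussed above: effects rarer
    // than about 1 in 10,000 (0.01%) would likely go undetected.
    double detectionFloor = 1.0 / 10000.0;
    // Order-of-magnitude count of Americans promptly vaccinated.
    double vaccinatedAmericans = 100000000.0;

    // If an effect sitting right at that floor were missed, the expected
    // number of people affected once the vaccine is widely administered:
    double expectedAffected = vaccinatedAmericans * detectionFloor;
    std::printf("Expected affected: %.0f people\n", expectedAffected);  // ~10,000
    return 0;
}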

XXXVII. All of this business about sample sizes and the expectation value is really just a restatement of the law of large numbers: as the number of trials tends to infinity, the observed frequency distribution approaches the underlying probability distribution. Put simply: if an effect occurs in only one-in-one-thousand people (0.1%) but a study's sample size is only one-hundred (N = 100), then the expected number of affected participants is just 100 × 0.1% = 0.1 people, and that study will not be able to pick up the effect.
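
The same expectation-value reasoning, sketched for a few sample sizes (the 0.1% rate and the sample sizes are just the illustrative values from above):

#include <cstdio>

int main()
{
    // Expected number of affected participants is N * p (the expectation
    // value of a binomial count) for an effect occurring at rate p.
    double p = 0.001;  // one-in-one-thousand (0.1%)
    int sampleSizes[] = {100, 1000, 100000};
    for (int n : sampleSizes)
    {
        // At N = 100 the expectation is only 0.1 people, so the effect is
        // effectively invisible; only at much larger N does the observed
        // frequency settle toward the true probability.
        std::printf("N = %6d: expected cases = %.1f\n", n, n * p);
    }
    return 0;
}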

XXXIX. I was once asked in the private sector to review papers submitted to a business conference. This was not a serious journal such as Nature, and it showed: the attitudes were extremely lax, and this literature was bandied about as a formality that no one would actually read. I raised at least one eyebrow by taking seriously my task to provide reviewer comments, yet all I had done was take the most cursory peek into the sources these authors cited and comment on their often weak methodologies. My favorite example was a paper which made some claim about the general population as pertains to their experience with virtual / augmented reality headset use. When I followed the source cited for the claim, I found that the study's sample had been, I kid you not, something like ten elderly, diseased Chinese men. This fact wasn't just tucked away in a “methodology” section; indeed, it was in the title of the cited study! One can scarcely imagine a weaker sample: small, age-biased, disease-biased, and race-biased all at once; needless to say (I would have thought!), it is absurd to extrapolate a result from such a population to people in general. This also highlights why I myself (within reason) go to great pains to cite not only my sources, but my sources' sources, and to mention whether and to what degree I have read the works I mention. I know this opens me up to both fair and cynical criticism (I get the sense that most readers would be surprised by what we would find if everyone followed the same standard as I do), but the former is good: transparency is part of the scientific ethic, as it promotes intellectual honesty and better allows readers to assess the work.

XXXX. See The Blank Slate by Pinker (pp. 55, 57, 455-459) which further cites “Human Universals” by Brown (https://psycnet.apa.org/record/1991-98084-000); and "Psychology’s Replication Crisis Is Running Out Of Excuses" by Yong which further cites "Many Labs 2" by Klein et al. (https://osf.io/ux3eh/).

XXXXI. Though I generally disagree with much of the wording of journalist Nicholas G. Carr's article (I believe that “anecdote” is being confused for “scientific passive observation”), there exists some common ground between us, see “Anti-Anecdotalism” by Nicholas G. Carr (Edge / Harper Perennial) (2014 / 2015) (https://www.edge.org/response-detail/25448) from This Idea Must Die edited by Brockman (pp. 128).

XXXXIII. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN13A) and "#84: Hey YouTube: Divide By Zero (Bret Weinstein & Heather Heying DarkHorse Livestream)" by B. Weinstein and H. Heying (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NzMwNTYw?sa=X&ved=0CAUQkfYCahcKEwiY2uSX-Pn6AhUAAAAAHQAAAAAQcg) (40:38 – 1:23:45).

XXXXIV. See “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN25A) which further cites To Explain The World by Weinberg (pp. 255).

XXXXV. Look forward to the “Physics” chapter in the “Ontology” volume.

XXXXVI. See the "BICEP2" section of "BICEP And Keck Array" (Wikipedia) (accessed 12/13/2022) (https://en.wikipedia.org/wiki/BICEP_and_Keck_Array#BICEP2) which further cites "BICEP2 I: Detection Of B-mode Polarization At Degree Angular Scales" by P. A. R. Ade et al. (arXiv / Physical Review Letters) (2014) (https://arxiv.org/abs/1403.3985) and "Planck intermediate results. XXX. The angular power spectrum of polarized dust emission at intermediate and high Galactic latitudes" by R. Adam et al. (arXiv / Astronomy & Astrophysics) (https://arxiv.org/abs/1409.5738) (though I have not read these papers).

XXXXVII. One specific example of this feature of the design is that, if the placebo effect is real, this design controls for that as well (it is possible your body unconsciously behaves differently, given your brain believes you have taken an intervention).

XXXXVIII. See Chemistry by Atkins (pp. 16).

XXXXIX. Interestingly, Wheeler once told Feynman of a hypothesis of his in which all observed electrons were somehow actually the result of the behavior of the one and only electron, see "One-Electron Universe" (Wikipedia) (accessed 12/13/2022) (https://en.wikipedia.org/wiki/One-electron_universe) (though I have not read beyond the opening) which further cites Feynman's Nobel Lecture "The Development Of The Space-Time View Of Quantum Electrodynamics" by Richard P. Feynman (The Royal Swedish Academy Of Sciences / Nobel Foundation) (1965) (https://www.nobelprize.org/prizes/physics/1965/feynman/lecture/) (though I have not read this article).

XXXXX. See The Blank Slate by Pinker and "Psychology’s Replication Crisis Is Running Out Of Excuses" by Yong which further cites "Many Labs 2" by Klein et al. (https://osf.io/ux3eh/).

XXXXXI. I believe Saad has discussed this point on The Saad Truth, but I do not know in which episode. See also The Blank Slate by Pinker.

XXXXXII. See To Explain The World by Weinberg (pp. 255). Actually, even when it comes to physics, I believe this naivete is partially responsible for the out-of-bounds extrapolation from the “quantum weirdness” at play in highly contrived particle physics experiments to the middle-world we live in. Not only is it not clear whether, say, measurement “collapses” the wave-function of the object, or instead whether there is a multiverse in which all possibilities occur, it is not clear that the measurement problem even applies at the scales and energy levels of middle-world (yet it is today always assumed to), see “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN26B) (footnote 26) which further cites “The Good Kind Of Danger” by Steven Gussman (Instagram) (2021) (pp. 6-7 / img. 7-8) (https://www.instagram.com/p/CJkY5HlA_98/), which in turn notes that, “I thought that [Carroll] (or his [Mindscape] guest) had similarly made mention of the (albeit unfailing) somewhat narrow experimental evidence for 'quantum weirdness' in the lab,” but that I could not find it when scrubbing through the Mindscape interviews of author Rob Reid, astrophysicist Adam Becker, nor philosopher David Albert. Now equipped with knowledge of the searchable “transcript” function on YouTube, it seems likely the quote I was in search of can be found in: "Mindscape 59 | Adam Becker On The Curious History Of Quantum Mechanics" by Sean Carroll and Adam Becker (Mindscape) (2019) (https://www.youtube.com/watch?v=em7dkYZTetE) (7:45-8:30, 18:35-21:53, 45:50-47:14, 1:24:16-1:25:49) and "Episode 36: David Albert On Quantum Measurement And The Problems With Many-Worlds" by Sean Carroll and David Albert (Mindscape) (2019) (https://www.youtube.com/watch?v=AglOFx6eySE) (14:41-20:02, 51:12-58:00).

XXXXXIII. See "#84: Hey YouTube: Divide By Zero (Bret Weinstein & Heather Heying DarkHorse Livestream)" by B. Weinstein and H. Heying (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NzMwNTYw?sa=X&ved=0CAUQkfYCahcKEwiY2uSX-Pn6AhUAAAAAHQAAAAAQcg) (40:38 - 1:23:45) and "Bret And Heather 88th DarkHorse Podcast Livestream: How Bread Got Broken" by Bret Weinstein and Heather Heying (DarkHorse) (2021) (https://www.youtube.com/watch?v=KSWu6DUFFt4&list=PLjQ2gC-5yHEug8_VK8ve0oDSJLoIU4b93&index=132) (3:14 - 6:23) (although I do not believe I've otherwise heard this episode).

XXXXXIV. See "#84: Hey YouTube: Divide By Zero (Bret Weinstein & Heather Heying DarkHorse Livestream)" by B. Weinstein and H. Heying (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NzMwNTYw?sa=X&ved=0CAUQkfYCahcKEwiY2uSX-Pn6AhUAAAAAHQAAAAAQcg) (40:38 - 1:23:45) and "Bret And Heather 88th DarkHorse Podcast Livestream: How Bread Got Broken" by B. Weinstein and H. Heying (https://www.youtube.com/watch?v=KSWu6DUFFt4&list=PLjQ2gC-5yHEug8_VK8ve0oDSJLoIU4b93&index=132) (3:14 – 6:23).

XXXXXV. "#84: Hey YouTube: Divide By Zero (Bret Weinstein & Heather Heying DarkHorse Livestream)" by B. Weinstein and H. Heying (https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5idXp6c3Byb3V0LmNvbS80MjQwNzUucnNz/episode/QnV6enNwcm91dC04NzMwNTYw?sa=X&ved=0CAUQkfYCahcKEwiY2uSX-Pn6AhUAAAAAHQAAAAAQcg) (40:38 – 1:23:45); "Bret And Heather 88th DarkHorse Podcast Livestream: How Bread Got Broken" by B. Weinstein and H. Heying (DarkHorse) (2021) (https://www.youtube.com/watch?v=KSWu6DUFFt4&list=PLjQ2gC-5yHEug8_VK8ve0oDSJLoIU4b93&index=132) (3:14 – 6:23); the “Reason” chapter which further cites Enlightenment Now by Pinker (pp. 27, 127); the “Logic” chapter; Cosmos by Sagan and Druyan (pp. xviii, 94, 194); The Demon Haunted World by Sagan (pp. 20-22, 230, 274-275, 414, 423); Cosmos: Possible Worlds by Druyan (pp. 75); The Ape That Understood The Universe by Stewart-Williams (pp. 229-230, 268); Enlightenment Now by Pinker (pp. 7, 11, 26-28, 83, 393, 408-409); “Recursion” by Montague (https://www.edge.org/response-detail/27035) and “Fallibilism” by Curry (https://www.edge.org/response-detail/27192), both from This Idea Is Brilliant edited by Brockman (pp. 61-62, 82-83); “Science Advances By Funerals” by Barondes (https://www.edge.org/response-detail/25386) and “Planck's Cynical View Of Scientific Change” by Mercier (https://www.edge.org/response-detail/25332), both from This Idea Must Die edited by Brockman (pp. 481-485); “Science Must Destroy Religion” by Harris (https://www.edge.org/response-detail/11122) in What Is Your Dangerous Idea? edited by Brockman (pp. 148-151); and The Coddling Of The American Mind by Lukianoff and Haidt (pp. 109).

XXXXXVI. See Letters To A Young Scientist by E. O. Wilson (pp. 239-240).
