Chapter XXX: Statistics, Probability, And Games | The Philosophy Of Science by Steven Gussman [1st Edition]

        “Yes, always use the median.  In Russia the watches are not made very well, so when friends get together they compute the times on their watches.  One says 5 minutes to 5, another says 5 o'clock, the other says 11 o'clock.  Use the median!”
        – Yakov Zeldovich (as relayed by J. Richard Gott)I

        “And as for my gambling, it's true I lost it all a few times.  But that's because I always took the long shot and it never came in.  But I still have some time before I cross that river.  And if you're at the table and you're rolling them bones, then there's no money playing it safe.  You have to take all your chips and put them on double six and watch as every eye goes to you and then to those red dice doing their wild dance and freezing time before finding the cruel green felt.
        I've been lucky.”
        – Norm MacdonaldII


        Some of the chapters of this book will make experts balk, but I believe they will help students as a jumping-off point: whatever my errors or omissions, the tools provided here will leave a layperson better off than average (or at least better off than their intuitions alone would otherwise have guided them).  I am not a statistician.  As a philosopher of science, I will not be giving a comprehensive education in statistics.  Instead, I will try to equip the reader with the basic statistical concepts needed to do science, and I hope that this volume will serve as a starting point for anyone who finds a particular interest herein.  Of course, this means that I will not be getting into advanced statistical methods—and in fact, I come from a position of skepticism toward the common uses of many such tools.  The more complex one's methodology, the easier it is to pull the wool over the reader's eyes—and if one is willing to do this, it is not even important that one understand the tools one appears to be using.  When Feynman wrote of physicists' “precious mathematics,” or E. O. Wilson of the fact that mathematical ability is somewhat overrated in the sciences, the point was that the philosophy of the thing, the concept, is what matters most, and that one hopes the precise mathematical description will be as elegant as possible.III  When scientists fetishize mathematical complexity, they are committing a kind of sophistry that does not serve the scientific enterprise.  Many think that “sophisticated” statistics are serving this ulterior purpose in the sciences today: celebrating complicated mathematics (perhaps misapplied to situations in which the underlying assumptions are not met) over simple statistical tests of hypotheses.IV  I do suspect much mediocre research is shrouded in obfuscating jargon and mathematics.  It is thought, for example, that different statistical methods are sometimes used in succession until some small effect (perhaps an aberration in the data) is found to meet arbitrary “certainty” standards under one of these calculations (a practice known as p-hacking).V  The secret (which is threatening to experts interested in maintaining the mystique of their careers) is that the best science is done with the simplest statistical mathematics, and is really about discovering a paradigm-shifting idea for a mechanism that gives rise to an effect.

        For example, if one wants to establish whether people who prefer chocolate are more likely to be gym rats than people who prefer vanilla, then there is a simple and obvious prediction made by this hypothesis: by definition, all else equal, a larger portion of the people who like chocolate should frequent the gym than of those who like vanilla.  The mathematics involved is simply to separate your chocolate-loving and vanilla-loving populations, add up the number from each group that frequents the gym, divide these counts by the respective totals, and finally compare those proportions.  Imagine you have a sample of 1,000 people (Nnet = 1,000), 453 of whom prefer chocolate (NChocolate = 453) and 547 of whom prefer vanilla (NVanilla = 547).VI  Then when one counts the number of gym rats per group, one finds that 203 chocolate lovers frequent the gym (gChocolate = 203) while only 123 vanilla lovers do (gVanilla = 123).  Because we have different sample sizes for each flavor, we must now divide the gym-rat counts by the respective flavor-counts to derive the proportion of each population which frequents the gym, and then compare these proportions via a ratio:

        pChocolate = gChocolate / NChocolate = 203 / 453 = 0.448 = 44.8%

        pVanilla = gVanilla / NVanilla = 123 / 547 = 0.225 = 22.5%

        rpChocolate/pVanilla = pChocolate / pVanilla = 0.448 / 0.225 = 1.99 ≈ 2

which suggests that a chocolate-lover is twice as likely to frequent the gym as a vanilla-lover—a large effect size.  Now, this would only be a correlation, which would not imply any particular causation: it could be that chocolate makes one want to go to the gym, that working out causes one to crave chocolate, or that some common cause drives these otherwise independent preferences for chocolate and for working out to correlate.  Further, the correlation may not be as strong as it seems (or even exist, in the worst case) in the real world if there are serious confounds between the two samples which explain the difference (in the worst-case scenario, one may have sourced their chocolate lovers by asking around at the gym, but sourced their vanilla lovers by asking around at ice cream shops—a failure to get a representative sample).  Even if one mustered a spurious defense of more complicated statistical analyses, perhaps arguing that such tools allow the revelation of smaller effect sizes otherwise hidden by sample size and bias (I don't myself believe this), why should we care so much about small effects that may have little bearing on how the world actually behaves?VII  Isn't it more interesting to discover large effects that actually help explain the world, to a first approximation?  This kind of statistical and methodological minimalism ensures we are talking about relatively large effect sizes using methods that anyone can peer review, by checking the most basic predictions of our hypotheses before anything else.  Furthermore, if one is interested in small effects, there really is no better alternative than simply growing one's sample size until it is large enough to make such a detection—this will not only elucidate the effect, but one will not need to worry that it is a statistical artifact of fancy mathematics, nor take a statistician's word for it, either.  It is likely that many of the social psychology results that are failing to replicate were results with a small effect size, anyway.VIII  The misuse of mathematics preys on readers' (laypeople and scientists alike) bad heuristic that the fancier the mathematics, the more impressive, intelligent, and likely to be true the result must be; the truth is exactly the opposite.  Einstein is purported to have said that, “if you can't explain it simply, you don't understand it well enough,”IX and himself used ancient, basic algebra (the Pythagorean theorem)—thousands-of-years-old mathematics—to derive his special theory of relativity (it is used to derive gamma, γ, the factor by which time dilates and length contracts, dependent on one's speed).X  Granted, Einstein's later general theory of relativity required more complex mathematics, but nonetheless the special theory of relativity represents a secret of the universe (that space and time are related and malleable under certain extreme conditions) hidden for millennia behind only the simplest of mathematics and a very clever idea.XI  In fact, Einstein was met with a lot of push-back—some of it initially fair, as his is a most extraordinary claim—but much of it most unfair, based only in a distaste for the idea or even for the fact that Einstein was Jewish; such thinkers allowed the ugly biases of their time and place to interfere with the objective science.XII  In modern science, it is probably more often the case that an argument is too complicated to be true than that it is too simple to be true.XIII
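
        The arithmetic above is simple enough to express in a few lines of code; here is a minimal sketch in the same C# style as the listings later in this chapter (the helper name ProportionRatio and its exact structure are illustrative choices of mine, not a prescribed method):

    float ProportionRatio(int gA, int nA, int gB, int nB){
            // Proportion of group A which frequents the gym
            float pA = (float)gA / nA;
            // Proportion of group B which frequents the gym
            float pB = (float)gB / nB;
            // The ratio of the two proportions (the effect size)
            return pA / pB;
    }

    // Example usage with the made-up survey numbers from the text:
    // ProportionRatio(203, 453, 123, 547) returns roughly 1.99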

        Today it is popular to act as though probabilities arise from fundamentally stochastic processes: that there actually exist random processes, even when all information is, in principle, taken into account by an omniscient being.  Enlightenment thinkers prior to the 20th century would have seen this as a backwards way of viewing the relationship between probability and statistics.  Statistics come first (in the form of the empirical distribution of outcomes in a complex system), and probabilities later (as the normalization of those statistics, applied in lieu of a deterministic mechanism of prediction).XIV  Imagine, for example, that I create an apparatus at the surface of the Earth which consists of a plate on the ground which has a positive electric charge (but you do not know this).  Then I hand you balls to drop onto the plate, all of which look identical.  Unbeknownst to you, while all of the balls have the same mass, two-thirds of them have a neutral electric charge and one-third have a positive electric charge such that when “dropped,” these balls actually accelerate upward by 9.81 m/s².XV  Not knowing any of the mechanisms involved (the Newtonian gravitational force, the Coulombian electromagnetic force, mass, and charge), all you could do is drop the balls and see what happens.  After a number of trials, you would realize that, statistically, two-thirds of the balls fall down, and one-third of them “fall” up.  Understanding that statistics imply probabilities, you realize that you have some predictive power upon dropping a given ball: a two-thirds chance that it will fall down, and a one-third chance that it will “fall” up.  To conclude that the process was literally stochastic (in other words, that the balls were actually identical and yet that a third of them behaved in the totally opposite way) would be foolish: one generally understands that a stochastic description of a process reflects an incomplete understanding of that process.XVI  Statistics are just facts about the distributions of outcomes in the world, given incomplete information about the state and laws of nature; probabilities, on the other hand, are the application of those facts to make informed decisions at above-chance rates.  Probabilities are normalized, meaning that they are given as fractions (or percents) of unity (one): they are codified as numbers from 0 to 1 (or 0% to 100%) signifying proportions or ratios of populations of arbitrary size.  This allows one to compare differently sized populations directly by controlling for population size.  This normalization allows us to speak about a quantity increasing (or decreasing) by some percent of its principal amount—allowing us not to worry about absolute numbers so much as relative numbers.  If one finds out through statistics that one in ten people have been afflicted with a particular ailment, then one has some blunt measure of the probability of a person getting infected: 10%.  Conversely, if you had somehow known that the probability of infection were 10%, you could predict that 10% of the population had come down with the disease.
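
        To make the statistics-first view concrete, one can simulate the ball-drop apparatus: the mechanism is fully deterministic, and the experimenter merely tallies the outcomes and normalizes them into probabilities.  The following is only an illustrative sketch of the thought experiment (the method name TallyBallDrops and the printing via Console.WriteLine are my own choices, assuming the usual using System; directive):

    /* The experimenter's view: they cannot see the charges, so they simply
     * tally which way each ball goes, then normalize those statistics
     * into probabilities */
    void TallyBallDrops(bool[] ballIsCharged){
            int fellDown = 0;
            int fellUp = 0;
            foreach(bool charged in ballIsCharged){
                    /* The hidden, deterministic mechanism: charged balls are
                     * repelled upward by the charged plate, neutral balls fall */
                    if(charged)
                            fellUp++;
                    else
                            fellDown++;
            }
            int n = ballIsCharged.Length;
            Console.WriteLine("p(down) = " + ((double)fellDown / n)
                            + ", p(up) = " + ((double)fellUp / n));
    }

    // Given a bag of balls of which exactly one-third are charged, this prints
    // p(down) = 0.666..., p(up) = 0.333...: statistics first, probabilities second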

        There are a few simple statistical tools that can get one a long way, such as averages.  The simplest is the mode, which is just the most commonly occurring value in a set:XVII
    int Mode(int[] values){
            /* Create a parallel array for holding the number of
             * instances of each value in values */
            int[] counts = new int[values.Length];
            // Independent index for the counts array
            int j = 0;
            // By definition, there is at least one instance of each value
            for (j = 0; j < counts.Length; j++){
                    counts[j] = 1;
            }
            // Reset the counts index
            j = 0;
            // Sort the value array in-place
            Array.Sort(values);
            // Place-hold the index of the current running-mode
            int modeIndex = 0;
            for(int i = 1; i < values.Length; i++){
                    // Increment the count, if necessary
                    if(values[i] == values[i - 1]){
                            counts[j]++;
                            // Catch the special case when the largest value is the mode
                            if(i == values.Length - 1) {
                                    /* Check if the most recent
                                     * value is the mode */
                                    if (counts[j] > counts[modeIndex])
                                            modeIndex = j;
                            }
                    }else{
                            /* Check if the most recent value is
                             * the new running-mode */
                            if(counts[j] > counts[modeIndex])
                                    modeIndex = j;
                            // Change the count index
                            j = i;
                    }
            }
            // If there is no mode, return the special value -int.MaxValue
            if (counts[modeIndex] == 1)
                    return -int.MaxValue;
            // Return the mode
            return values[modeIndex];
    }
In a set of integers, a clear mode is likely, but in a set of real numbers, because of slight differences in the less-significant fractional digits, there may be no exact repeats of values despite low variance in the values; for this reason, the mode either isn't used with real numbers, or else one must set approximation criteria for equivalence such that the data are cut up into discrete ranges or buckets which are then counted as equivalent (for example, the set {4.3, 7.5, 1.2, 9.1, 1.3, 1.1} has no mode, but one can be defined if we treat values within ±0.1 of one another as equivalent, which finds the pseudo-mode to be 1.2; a code sketch of such a pseudo-mode appears after the median discussion below).XVIII  The mean (which you will colloquially know as “the average”) is found by adding up all of the versions of the same value that one is in possession of, and dividing this sum by the number of samples:

μ = ∑vi / N

where μ is the mean, vi represents the N individual values, and N is the total number of values.  The mean allows one to control for variance and error across different measurements (say ten people used a ruler and individually measured the length of an object, getting slightly different answers), or across a sample of different but related values (say the cost of a carton of milk in different stores), returning just one value (based on all the rest) to work with (which also means that it is prone to error when generalized to all cases).  The mean can give one a more accurate result than taking any old singular value, which is highly prone to variability on its own.  Other times, one has a data set with a few outliers that bear less relationship to the other values in the data set (either because they are far smaller or far larger than the rest), and one wants an average that is less affected by such extrema; here, the median is most useful.XIX  The median can be found by lining up the data points in order of value and then picking the middle result (if the number of data points, N, is even, then the mean is taken of the two middle values):

    float Median(float[] values){
            Array.Sort(values);
            if(values.Length % 2 != 0){
                    // Take the mid-point of an odd-sized array
                    return values[values.Length / 2];
            }else{
                    // Take the mean of the two mid-points of an even-sized array
                    return (values[values.Length / 2 - 1] + values[values.Length / 2]) / 2f;
            }
    }

The median will give a more accurate result when one has some extreme values in their data set (which is a particular problem when a sample size is relatively small).XX  Because the median is less sensitive to such extrema, it is a quick way to get a sense for a more common value among the set (which is often helpful in economic situations, for example, such as real-estate prices in a neighborhood that may contain a few outlier mansions).XXI  There are more surgical ways to control for such things (perhaps narrowing one's set of values by a metric, say, square footage, or simply checking cost per square foot), but the median statistic is a nice, blunt, first-approximation tool for such purposes.  Physicists will even use it when there are a few available estimates for a physical constant, for example.XXII  The median of the data available is often the closest to the truth one can get without performing one's own measurement of better quality than those of the past (especially when one considers that much scientific measurement is not trivial).  Because the median will literally reflect one of the available values, one might be tempted to consider the research that produced that particular number as having had the best methodology among the measurements, or to otherwise think that it is the only measurement that actually contributed, but this is wrong-headed: though the other values did not participate via summing, they did contribute to the selection (and any one of them may have had a better methodology than that of the study which produced the value selected in the end).
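
        Written in the same style as the Mode() and Median() listings, a minimal sketch of the mean and of the bucketed pseudo-mode mentioned above might look as follows (the helper names Mean and PseudoMode, and the neighbor-counting approach to approximate equivalence, are just one way to do it):

    float Mean(float[] values){
            float sum = 0f;
            // Sum every value in the set
            for(int i = 0; i < values.Length; i++){
                    sum += values[i];
            }
            // Divide the sum by the number of samples: μ = ∑vi / N
            return sum / values.Length;
    }

    /* Approximate ("pseudo-") mode for real numbers: for each value, count how
     * many values in the set lie within ±tolerance of it, and return the value
     * with the most such neighbors (comparisons right at the tolerance boundary
     * are subject to floating-point rounding) */
    float PseudoMode(float[] values, float tolerance){
            int bestCount = 0;
            float pseudoMode = values[0];
            for(int i = 0; i < values.Length; i++){
                    int count = 0;
                    for(int j = 0; j < values.Length; j++){
                            if(Math.Abs(values[i] - values[j]) <= tolerance)
                                    count++;
                    }
                    if(count > bestCount){
                            bestCount = count;
                            pseudoMode = values[i];
                    }
            }
            return pseudoMode;
    }

    // Example usage with the set from the text:
    // PseudoMode(new float[]{ 4.3f, 7.5f, 1.2f, 9.1f, 1.3f, 1.1f }, 0.1f) picks out 1.2f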

        One must also be able to quantify the amount of variety in a data-set, and this is where the variance and standard deviation come in.XXIII  The variance is a measure of the amount of variety about the mean in a data-set, and it is calculated by taking the sum of the squares of the differences between the mean and each individual value, and dividing that entire quantity by the sample size:XXIV

σ² = ∑(vi - μ)² / N

where σ² is the variance, μ is the mean, vi are the individual data-values, and N is the sample size.  This value can be quite “large” and difficult to interpret; to bring it down to earth, it can be useful to deal in the standard deviation, which is merely the square-root of the variance (in addition to the fact that taking the square-root of a number will tend to halve its order-of-magnitude, doing so here places the measure back into the same units as the rest of our statistics and as the data-points themselves):XXV

σ = √σ² = √(∑(vi - μ)² / N)

where σ is the standard deviation.
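
        A minimal sketch of these two formulas in the same style, reusing the Mean() sketch from above (Math.Sqrt returns a double, hence the cast back to float):

    float Variance(float[] values){
            float mean = Mean(values);
            float sumOfSquares = 0f;
            // Sum the squared differences between each value and the mean
            for(int i = 0; i < values.Length; i++){
                    float difference = values[i] - mean;
                    sumOfSquares += difference * difference;
            }
            // Divide by the sample size: σ² = ∑(vi - μ)² / N
            return sumOfSquares / values.Length;
    }

    float StandardDeviation(float[] values){
            // σ is simply the square-root of the variance
            return (float)Math.Sqrt(Variance(values));
    }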

        The law of large numbers is what marries statistics to probabilities.  It is often intractable to actually survey the entire population to get the real statistical state of things (to actually ask every living person their favorite ice cream flavor, for example).XXVI  Only for very important scenarios are resources directed towards achieving something like that goal: the U.S. census, for example, or votes in Presidential elections.  Typically, one needs to settle for a sample population, which is when one takes a sub-population from the population of interest and extrapolates the data from the smaller sample to the larger population using proportions.  Often, in a country of some three-hundred million Americans (and a world of some seven billion humans), a good study might sample on the order of tens-of-thousands of people and then extrapolate the findings to get at the overall state of things.  Of major concern is attempting to make sure that one has a representative sample (or a random sample): if one's metric of interest is to extrapolate, then other already-known metrics (sex, race, age, etc.) should also extrapolate (a sample is a better representation of the full population when it has the same proportion of each of these variables' values as the wider population of interest).XXVII  One would not like the sample's individuals to all be similar in some way that individuals of the total population are not; this discrepancy between the sample and the total population is called sample bias, and while some level of it is ultimately unavoidable, it may be mitigated.  This is done by attempting to get a representative sample of people, trying not to allow for more people from a certain geographical location than another, for example.  Sometimes one may try to dispel worries about a bias in their sample by arguing, for example, that it seems unlikely to affect the particular measure (though this is hazardous, as such studies are the very means of finding unexpected correlations).  Much low-funded university research, like that which I have participated in, might at best get a sample size of a hundred (N = 100).XXVIII  But even putting sample size aside, these hundred subjects are often just a hundred students from the university at which the study is being performed (and further, they are likely biased towards being students interested in that field of research, be it psychology or economics); that is a very narrow population to try to extrapolate from.XXIX  Once one is happy with their sample and ready to extrapolate, all one does is divide their measure by the sample population size and then apply that percentage to the total population: one may conclude from a well-performed study of 10,000 subjects, in which 3,333 preferred chocolate to vanilla, that 3,333 / 10,000 ≈ 33% of the larger 330,000,000 population prefers chocolate (estimating about 110,000,000 chocolate-loving Americans).XXX  Ultimately, spatial relativity is such that there's no such thing as length unless one has two different lengths to compare with each other, so that you may talk about one in terms of the other (this is the definition of a unit); that is what a ruler is: when we say that something is 0.2 meters long, we are saying that it is 20% of a meter-stick in length.XXXI  Something that is 10 meters long is 1,000% of a meter.
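
        The extrapolation step itself is a single line of arithmetic; a sketch with the made-up survey numbers from the text (the function name Extrapolate is mine):

    long Extrapolate(int count, int sampleSize, long totalPopulation){
            // Normalize the sample statistic into a proportion...
            double proportion = (double)count / sampleSize;
            // ...and apply that proportion to the full population of interest
            return (long)(proportion * totalPopulation);
    }

    // Example usage with the numbers from the text:
    // Extrapolate(3333, 10000, 330000000L) returns roughly 110,000,000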
The law of large numbers states that if one knows the probability of an event occurring, then one can calculate the expectation (by multiplying the percent-chance by the number of potential occurrences), and ultimately, that as the number of occurrences tends towards infinity, the statistical outcome will approach the probability.XXXII  For example, a coin has a fifty-fifty chance of landing heads-or-tails.  In ten trials, the expectation is that one will flip 0.5 × 10 = 5 heads (and 10 - 5 = 5 tails, as well).  But because of the small sample size (N = 10), when one carries out this experiment, it would not be terribly surprising to find that the coin landed heads six times, and tails four times, yielding a probability estimate of 60% heads, 40% tails—not unrelated to 50/50, but with quite a bit of error nonetheless.XXXIII  Now if one does a hundred trials (N = 100), the likelihood is that they will get much closer to 50/50 than 60/40—perhaps 53 will end up heads, whereas 47 will end up tails, corresponding to a 53% heads, 47% tails probability estimate.  The point is that as the number of trials tends to infinity (N → ∞), the statistical outcomes tend to the perfect 50% heads, 50% tails probability distribution.  Of course, things are more often found out in the reverse order: we want to determine some probability, and we do so by getting a large representative sample and taking the statistical distribution seriously.  One might be able to intuit a 50/50 probability from apparent symmetry and experience with coin-flipping, but we ultimately know that probability from flipping many trials and checking the statistical outcome, interpreting it as a probability for predicting the future outcomes of similar trials.  Unfortunately, not all systems appear to be governed by such simple probability distributions that the law of large numbers may be counted on.XXXIV
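
        The convergence described by the law of large numbers is easy to see in a quick simulation; this sketch (my own, using the built-in pseudo-random number generator) flips a simulated fair coin a given number of times and reports the proportion of heads, which wanders widely for N = 10 but hugs 50% as N grows:

    double ProportionOfHeads(int flips){
            var rng = new Random();
            int heads = 0;
            for(int i = 0; i < flips; i++){
                    // Next(2) returns 0 or 1 with (very nearly) equal probability
                    if(rng.Next(2) == 1)
                            heads++;
            }
            // The statistical outcome, normalized into a probability estimate
            return (double)heads / flips;
    }

    // ProportionOfHeads(10) might well return 0.6, while ProportionOfHeads(1000000)
    // will almost always return something very close to 0.5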

        Now, when it comes to making decisions, probabilities and expectations are not the only things to take into account on their own: the consequences of the event in question must also factor into a risk-benefit assessment (even though these may be harder to quantify—in fact, it is the same old problem of trying to normalize what is a “small” or “large” probability, in this case framed as what is a “good” or “bad” risk to take).XXXV  The occurrence of a certain risk could have what appears to be a really “small” probability associated with it, but if that risk is high stakes, say, death, it cannot be easily dismissed just because of its “low” probability!  For example, if I said you have to choose between a 1% chance of dying and a 50% chance of giving me $5, everyone would (and should) choose the 50% chance of giving me $5.00, even though you have a much higher chance of “losing”.  This is because the reduction in consequence associated with that loss more than makes up for it.  In this case, the expectation is essentially that you paid 50% × $5.00 = $2.50 (in reality, you will either pay $0.00 or $5.00) to have no chance of dying in this moment.  To make the other choice would be to reduce your life by 1% (in reality, risk a 1% chance of imminently dying) just so that you don't have to spend a little money—thereby implying that 1% of your life is worth $5.00, meaning your entire life is worth only $500.00!
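
        Put as bare arithmetic (a trivial sketch of the comparison being made, using the dollar figures from the example):

    // Option A: a 50% chance of paying $5.00; the expected cost is
    double expectedCost = 0.50 * 5.00;              // $2.50
    // Option B: a 1% chance of dying; preferring it over paying at most $5.00
    // implicitly prices 1% of your life at $5.00, and therefore all of it at
    double impliedValueOfLife = 5.00 / 0.01;        // $500.00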

        In reality, probabilities tend to be associated with a certain time-period.  For example, the probability of an asteroid hitting the Earth ever is likely to be 100%; but if the expectation is that an asteroid will hit the Earth once in a million years, then the expectation is half an impact (roughly a 50% chance) in half-a-million years, or ten impacts in ten million years.  A classic mistake among beginners is to confuse expectation and probability, and particularly to treat independent events as though they affect each other.  In most cases, we have very simple statistics that give independent probabilities (independent as in, the events do not interact with or change each other).  For example, when one flips a coin ten times in a row, each coin flip is an independent event, in which the odds are 50/50 heads-or-tails, regardless of the outcomes of previous flips.  Now, one can calculate the reduced probability of several independent chance-events coming out a particular way ahead of time, which is done by taking the product of all of the probabilities involved:

pnet = ∏pi = p1 × p2 × ... × pN

where pnet is the composite probability, ∏ (big pi) is the iterative-product symbol (analogous to big sigma for summations), pi represents the individual probabilities of each event, and N is the number of probabilistic events.  For example, the probability of flipping a coin and getting heads five times in a row is: 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.5⁵ = 3.125%.  But each individual coin-flip among them still had a 50% chance of landing heads on its own.XXXVI  This is an example of isolation and reductionism.  If there is a 20% chance of being robbed in your city in a given year, and the robbery of your next-door neighbor brings this year's percent of people victimized by robbery up to 25%, that does not mean that you can safely conclude that you're not going to be robbed for the remainder of the year.XXXVII  Think about the ontology of the situation: one who makes this mistake is acting as if the robbers are all communicating with each other, trying to make sure that they don't break out of step with the known statistic, “oh, yeah, I guess I'll have to wait till next year to rob people, we already met our quota.”  Someone in that situation still has a 20% chance of being robbed at some point within any year-long time-span (starting and stopping on January 1st is also a fallacy), same as always (a 20% per year / 365 days ≈ 0.055% chance of being robbed on a given day, every day).
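
        A sketch of the iterative product in the same style as the earlier listings (the ∏ becomes a simple loop; the function name ProbabilityProduct is mine):

    double ProbabilityProduct(double[] probabilities){
            double pNet = 1.0;
            // Multiply the independent probabilities together: pnet = ∏pi
            foreach(double p in probabilities)
                    pNet *= p;
            return pNet;
    }

    // Example usage: the chance of five heads in a row,
    // ProbabilityProduct(new double[]{ 0.5, 0.5, 0.5, 0.5, 0.5 }) returns 0.03125 (3.125%)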

        This time-extended version of the expectation culminates in game theory.  The concept of iterated games is what gives rise to the law of large numbers: whether the probabilistic game is repeated on many different subjects or repeated in many different trials, it will give rise to the statistical expectation.  Certain strategies may be fine in a single dose, but are revealed to perform very poorly when repeated.  This recursive application of probability theory to itself only makes it clearer that one should always bet on a 51% chance over a 49% chance, as in the long run, the distribution will manifest even if individual games are of nearly equal odds.  Game theory is so called because it evaluates competitions between different strategies in iterated games and makes statistical predictions about outcomes.  A balance between proportions of certain strategies is known as a Nash equilibrium or, in its use in evolutionary biology (where evolution can be viewed as a fitness-maximizing game between competing individual organisms), an evolutionarily stable state (ESS).XXXVIII  These Nash equilibria can help one predict either how often individuals will employ a certain trait, or otherwise what percent of the population will and will not possess a given trait (or some combination).XXXIX  For example, when evaluating whether a living organism should employ behaviors that are selfish, selfless, or a hybrid approach in which one evaluates whether an individual is likely to return the favor before deciding whether to be selfish or altruistic towards them, the conditional strategy (known as reciprocal altruism) is most likely to be stable, and therefore to evolve.XXXX  Solved games are those for which all strategies, states, and outcomes are mathematically understood ahead of time, granting the person with that knowledge at least the power to always draw against his opponent (for example, Tic Tac Toe is solved; an unsolved example is Go, a game which has enough complexity emerging from elegant rules that it is non-trivially hard to fully grasp from the perspective of game theory).XXXXI  Solved games are in a sense flattened in time into a static structure.  The rules of how to always win (or at least tie) are known ahead of time, every reaction understood, even deterministically.  Newtonian physics is very nearly a solved game, particularly the kinematic equations.XXXXII  Quantum physics is the antithesis of a solved game and does not even aspire to do better than its statistical answers (neither do most physicists—they have, I think as a matter of culture and not of science, imbibed the notion that the cosmos is at its core a chaos).XXXXIII
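
        As an illustration of iterated games, here is a tiny sketch of an iterated prisoner's dilemma in which a conditional strategy like tit-for-tat (cooperate first, then copy the opponent's last move) can be pitted against unconditional cooperation or unconditional defection.  This is my own minimal sketch, not Axelrod's tournament; the 3/5/1/0 payoff values are the conventional ones and are an assumption on my part, and the code relies on the System and System.Collections.Generic namespaces:

    /* Play two strategies against one another for a number of rounds and
     * return their total payoffs.  A strategy is a function from
     * (myHistory, theirHistory) to a move, where true = cooperate */
    (int, int) PlayIterated(
            Func<List<bool>, List<bool>, bool> strategyA,
            Func<List<bool>, List<bool>, bool> strategyB,
            int rounds){
            var historyA = new List<bool>();
            var historyB = new List<bool>();
            int pointsA = 0;
            int pointsB = 0;
            for(int round = 0; round < rounds; round++){
                    bool a = strategyA(historyA, historyB);
                    bool b = strategyB(historyB, historyA);
                    /* Conventional payoffs: mutual cooperation 3 each, mutual
                     * defection 1 each, lone defector 5, lone cooperator 0 */
                    if(a && b){ pointsA += 3; pointsB += 3; }
                    else if(!a && !b){ pointsA += 1; pointsB += 1; }
                    else if(!a && b){ pointsA += 5; }
                    else { pointsB += 5; }
                    historyA.Add(a);
                    historyB.Add(b);
            }
            return (pointsA, pointsB);
    }

    // Three example strategies:
    bool AlwaysCooperate(List<bool> mine, List<bool> theirs) => true;
    bool AlwaysDefect(List<bool> mine, List<bool> theirs) => false;
    // Tit-for-tat: cooperate on the first move, then copy the opponent's last move
    bool TitForTat(List<bool> mine, List<bool> theirs) =>
            theirs.Count == 0 || theirs[theirs.Count - 1];

        Against AlwaysDefect, TitForTat loses only the first round before settling into mutual defection, while against itself or AlwaysCooperate it reaps the mutual-cooperation payoff every round; that asymmetry is the intuition behind the conditional strategy's stability.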

        One of the most important statistical distributions in nature is the normal distribution, which takes the shape of a bell-curve.  Such a curve is found when a data-set's mean, median, and mode are all the same value, and when the rest of the distribution symmetrically and smoothly decreases to zero on either side of that central value in a bell-like shape.XXXXIV  The statistics of such curves are well-known and make up the most-used statistical model in the sciences (for better or worse—I wrote that the curve “is found when” earlier because it may be erroneously imposed on nature out of convenience rather than observed in nature).  Nonetheless, many aspects of nature have indeed been found to follow such a bell-curve, from the distribution of human height to that of human IQ scores.  The left and right extremes of the distribution are known as tails, and these are where the extraordinary (positive or negative) population lives.  Taking human height as an example, most people exist around the middle of the bell, having an average height, while the tails represent the few who are extraordinarily short or tall.  This ubiquitous distribution is defined by the following equation:XXXXV

f(x) = (1 / (σ√(2π)))e^(-½((x – μ) / σ)²)

where x is any position along the x-axis, f(x) is the corresponding y-coordinate on the bell-curve, σ is the standard deviation, and μ is the mean, median, and mode.  Below is a graph of a standard bell-curve where the standard deviation is set to one, and the mean / median / mode is set to zero:XXXXVI

A nice feature of the normal distribution is the 68-95-99.7 rule (or empirical rule), which says that about 68% of the values fall within one standard deviation of the mean, 95% fall within two standard deviations, and 99.7% fall within three standard deviations (in accordance with the diminishing returns on results as one follows out towards the extreme tails).XXXXVII  While real-world measured data may often follow a bell curve, such measures will of course not always conform to this shape.  Happily, due to the central limit theorem, when calculating error bars, it is often the case that the sampling distribution (the distribution of the means of independent random samples drawn from your population distribution) is itself a normal distribution, allowing the simple calculation of the standard error, confidence intervals, and p-values that we saw in previous examples.XXXXVIII  Returning to the “serious” vaccine side-effects example, our population distribution was 104 people counted as having no “serious” side-effects (0), five people counted as having a “serious” side-effect (1), and just two people counted as having a debatably “serious” side-effect (0.5):XXXXIX

The sampling distribution of the mean, however, is treated as a normal distribution with mean 0.054 and standard deviation (which is the standard error) 0.01988 (so thin-and-tall that it cannot easily be captured in one image):XXXXX
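
        A sketch of the normal density and of the standard error in the same style as the earlier listings (the crude numerical check of the 68-95-99.7 rule via a Riemann sum is my own illustration, relying only on the standard Math.Exp, Math.Sqrt, and Math.PI):

    /* The normal (bell-curve) probability density for a given mean and
     * standard deviation: f(x) = (1 / (σ√(2π)))e^(-½((x - μ) / σ)²) */
    double NormalDensity(double x, double mean, double stdDev){
            double z = (x - mean) / stdDev;
            return Math.Exp(-0.5 * z * z) / (stdDev * Math.Sqrt(2.0 * Math.PI));
    }

    /* Approximate the probability mass within ±k standard deviations of the
     * mean by a midpoint Riemann sum over the density */
    double MassWithin(double k, double mean, double stdDev, int steps){
            double lower = mean - k * stdDev;
            double upper = mean + k * stdDev;
            double dx = (upper - lower) / steps;
            double mass = 0.0;
            for(int i = 0; i < steps; i++)
                    mass += NormalDensity(lower + (i + 0.5) * dx, mean, stdDev) * dx;
            return mass;
    }

    // MassWithin(1, 0, 1, 10000) ≈ 0.683, MassWithin(2, 0, 1, 10000) ≈ 0.954,
    // and MassWithin(3, 0, 1, 10000) ≈ 0.997: the 68-95-99.7 rule

    /* The standard error: the standard deviation of the sampling distribution
     * of the mean is the population standard deviation divided by √N */
    double StandardError(double stdDev, int sampleSize){
            return stdDev / Math.Sqrt(sampleSize);
    }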


Footnotes:

0. The Philosophy Of Science table of contents can be found here (footnotephysicist.blogspot.com/2022/04/table-of-contents-philosophy-of-science.html).

I. See The Cosmic Web by Gott (pp. 70).

II. See Based On A True Story: Not A Memoir by Norm Macdonald (Spiegel & Grau) (2016 / 2017) (pp. 231).

III. See the “Mathematics” chapter which further cites “A Centennial Celebration For Richard Feynman” by Clavin (https://pma.caltech.edu/news/centennial-celebration-richard-feynman-82264); "Dick's Tricks" by Susskind (Caltech) (2018) (https://www.youtube.com/watch?v=ldfUAzRMs_k) (21:46-22:46); the "Mathematics" chapter from Letters To a Young Scientist by E. O. Wilson (pp. 27-41); and “Advice To Young Scientists - E.O. Wilson” by E. O. Wilson (https://www.youtube.com/watch?v=ptJg2GScPEQ) (3:48 – 9:03).

IV. In addition to the citations in footnote III above, see "Lies, Damned Lies, And Vaccine Statistics" by "Dr RollerGator PhD" (WHAT) (2021) (https://drrollergator.substack.com/p/damned-lies-and-vaccine-statistics) (in which the author continually reminds the reader that most results simply require, "counting things and dividing things counted"; though I have not yet read this whole article, this statement has always resonated with me and my experience in making sense of the covid-19 pandemic, watching on in puzzlement as many authors decided to use overly-complicated models rather than simply testing their predictions in a straight-forward manner).

V. See the “Methodology” chapter which further cites "Bad Data Analysis And Psychology's Replication Crisis" by Ferguson (https://quillette.com/2019/07/15/bad-data-analysis-and-psychologys-replication-crisis/).

VI. This is of course only an example, with the numbers made up for the sake of argument (everyone knows vanilla is better than chocolate)!

VII. See "Bad Data Analysis And Psychology's Replication Crisis" by Ferguson (https://quillette.com/2019/07/15/bad-data-analysis-and-psychologys-replication-crisis/) and “Effect Size” by Hood (https://www.edge.org/response-detail/27139) from This Idea Is Brilliant edited by Brockman (pp. 479-481).

VIII. See "Bad Data Analysis And Psychology's Replication Crisis" by Ferguson (https://quillette.com/2019/07/15/bad-data-analysis-and-psychologys-replication-crisis/).

X. See Modern Physics by Serway, Moses, and Moyer (pp. 15-19) and the “Special Relativity” chapter in Why Does E = mc2? by Brian Cox and Jeff Forshaw (Da Capo Press) (2009) (pp. 37-56).

XI. See Why Does E = mc2? by Cox and Forshaw (pp. 22, 77).

XII. See "A Profusion Of Place | Part I: Of Unity And Philosophy" by Gussman (https://footnotephysicist.blogspot.com/2020/03/a-profusion-of-place-part-i-of-unity.html#FN67A) which further cites What Is Real? by Becker (pp. 260, 263-264); “Episode 36: David Albert On Quantum Measurement And The Problems With Many-Worlds” by Carroll and Albert (https://youtu.be/AglOFx6eySE) (beginning at 50:31); “Mindscape 59 | Adam Becker On The Curious History Of Quantum Mechanics” by Carroll and Becker (https://www.youtube.com/watch?v=em7dkYZTetE) (beginning at 1:04:42); Our Mathematical Universe (pp. 152-153, 363); and "The Universe" by Seth Lloyd (Edge / Harper Perennial) (2014 / 2015) (https://www.edge.org/response-detail/25449) from This Idea Must Die edited by Brockman (pp. 13).

XIII. See for example “In Defense Of Philosophy (Of Science)” by Gussman (https://footnotephysicist.blogspot.com/2021/05/in-defense-of-philosophy-of-science.html#FN21B) which further cites To Explain The World by Weinberg (pp. 150-151).

XIV. See for example Exploratory Programming For The Arts And Humanities by Montfort (pp. 228, 232-234).

XV. Ignore whether or not this idealized example is itself realistic, and pay attention to the argument being made: this is the spirit of the thought experiment.

XVI. See the “Determinism” chapter which further cites The Demon Haunted World by Sagan (pp. 8, 295) and Cosmos: Possible Worlds by Druyan (pp. 274).

XVII. Notice that my code makes reference to a function from the built-in Array class, Sort(), which sorts the array into ascending order “in-place” (meaning the function doesn't return a sorted array, it over-writes the passed-in array with the new sorted version), but that I do not actually show a definition for this function: this is essentially an interface, a part of the problem I leave to someone else to implement.  Programming languages come with all sorts of built-in functions for one to call upon, along with documentation for how to use them.

XVIII. See Exploratory Programming For The Arts And Humanities by Montfort (pp. 230-231).

XIX. See Exploratory Programming For The Arts And Humanities by Montfort (pp. 230).

XX. This is a better option, for example, than arbitrarily removing extreme values (which may well represent real outliers, and not be produced by errors!) before taking the mean.  One ought generally to decide whether they are taking the mean or the median, and must otherwise have a very good reason (which is rare) for removing extrema by hand—hard evidence that those removed were actually false data-points (as free parameters such as this are another way for researchers to fool others and themselves into a desired result).  I hated it in school when students would carelessly remove extrema when the whole point of the lesson on taking the mean from many trials was that this process would produce an accurate result despite variance.  See also Exploratory Programming For The Arts And Humanities by Montfort (pp. 230).

XXI. In a college lecture, my former professor, and colleague psychologist Sean Duffy taught us to use the median when dealing with outliers by giving the example of a realtor's trick: he said that they will use the mean in areas with a few very expensive houses because those mansions pull the mean price upward, allowing them to say they're getting you a deal by beating, “the average.” Using the median is a nice blunt way to control for outliers and get a sense for what other people are actually paying for a comparable dwelling. In this case, another method is to control for square-footage—as in, comparing price-per-square-foot rather than price-per-dwelling (which is likely what was being done).

XXII. Astrophysicist J. Richard Gott and his colleagues were twelve years early on the correct Hubble constant because they took the median of the available variant estimates from different physical measurement methods, giving them the value of 67 that would later be within the error-bars of precision cosmology as revealed by the Plank satellite and Sloan Digital Sky Survey observations, see The Cosmic Web by Gott (pp. 70).

XXIII. See Exploratory Programming For The Arts And Humanities by Montfort (pp. 195, 229-230) and “The Average” by Nicholas A. Christakis (Edge / Harper Perennial) (2014 / 2015) (https://www.edge.org/response-detail/25437) from This Idea Must Die edited by Brockman (pp. 532-534).

XXIV. See the “Approximation” chapter which further cites “Standard Deviation” (https://en.wikipedia.org/wiki/Standard_deviation); and Exploratory Programming For The Arts And Humanities by Montfort (pp. 229-230).

XXV. See “Standard Deviation” (https://en.wikipedia.org/wiki/Standard_deviation) and Exploratory Programming For The Arts And Humanities by Montfort (pp. 229-230).

XXVIII. See "Visual Judgments Of Length In The Economics Laboratory: Are There Brains In Stochastic Choice?" by Sean Duffy, Steven Gussman, and John Smith (Journal of Behavioral And Experimental Economics) (2021) (https://www.sciencedirect.com/science/article/abs/pii/S2214804321000483). As a side note, there is a methodological quirk associated with at least one of the studies done using this line-length computer software I engineered, in which an administrator of the experiment, not being in possession of exact change, rounded up the pay-out to participants (I doubt this affected the results of the experiment, but in principle, it should be noted).

XXIX. This kind of methodology criticism was a theme in the “Social Relationships And Health” honors seminar that I took at Rutgers University Camden, taught by psychologist Kristin J. August.

XXX. Of course, these values (other than the approximate population of the U.S., by way of Google which further cites the U.S. Census Bureau and World Bank) are made up for the sake of argument.

XXXII. See The Physics Of Wall Street by Weatherall (pp. 7-8, 59-60, 69). There do exist exotic probability distributions which do not follow the law of large numbers, see The Physics Of Wall Street by Weatherall (pp. 59-62, 69-75).

XXXIII. Following the mathematics laid out in the “Approximation” chapter, we can place error bars on this hypothetical experiment. Consider heads as the value 1 and tails as the value 0, then the mean is 60% (μ = 0.6). This information can now be used to calculate the standard deviation, which comes out to σ = √((6 × 0.16 + 4 × 0.36) / 10) = 0.4899. From here, the standard error is σx ≈ 0.4899 / √10 ≈ 0.1549. Now, we multiply the standard error by 1.96 for a 95% confidence interval: CFI95% ≈ ±0.30. This means that our result, with error bars is: 60% ±30% (or somewhere between 30-90%). As one can see, the very small sample size (N = 10) resulted in a large level of error. In fact, when one has a relatively small sample size (N < 30), one uses a t-score instead of a z-score, found from a t-table—here, for N = 10, 95% confidence is associated with the coefficient 2.262, meaning: CFI95% ≈ 2.262 × 0.1549 ≈ ±0.35 (25-95%)—very imprecise, indeed, see the “Applicability” section of “z-test” (https://en.wikipedia.org/wiki/Z-test#Applicability) and the “Table Of Selected Values” section of “Student's t-distribution” (https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values). In this case, we know that the real value for a fair coin is 50%, so we can actually calculate the percent error: |0.5 – 0.6| / |0.5| = 0.2 = 20% error (meaning that 0.5 × 1.2 = 0.6—our experimental value was 20% larger than expected), see the “Approximation” chapter which further cites “Percentage Error” (https://www.mathsisfun.com/numbers/percentage-error.html).

XXXIV. See The Physics Of Wall Street by Weatherall (pp. 59-62, 69-75).

XXXV. Although I have little chance of finding it, I am aware of mechanical engineer and educator Bill Nye having brought up this point on, I believe an episode of StarTalk Radio some years back.

XXXVI. This is actually what complicates my disease example from earlier, in which the chance of catching a communicable (or contagious) disease is taken to be approximately the active number of cases divided by the population size: the chance that someone gets infected increases anytime someone else is infected, in fact exponentially, because each newly infected person can now infect the same number of people as the last person did.

XXXVII. Of course, these absurdly large numbers are being made up for the sake of argument.

XXXVIII. See The Selfish Gene by Dawkins (pp. 88-113) and “The Play By Nature” by William D. Hamilton (Science) (1977) from The Selfish Gene by Dawkins (pp. 459, 461) (though I have only read the excerpt in the 40th anniversary edition of Dawkins' work).

XXXIX. See The Selfish Gene by Dawkins (pp. 95-96).

XXXX. See the “You Scratch My Back, I'll Ride On Yours” and “Nice Guys Finish First” chapters in The Selfish Gene by Dawkins (pp. 96-97, 216-244, 261-301) which further cites (among much else) "The Evolution Of Cooperation" by Axelrod and Hamilton (https://www.science.org/doi/10.1126/science.7466396) and The Evolution Of Cooperation by Axelrod.

XXXXI. See “Solved Game” (Wikipedia) (accessed 12/17/2022) (https://en.wikipedia.org/wiki/Solved_game); How To Create A Mind by Kurzweil (pp. 6-7, 38-39, 166); and Our Mathematical Universe by Tegmark (pp. 324-325, 261-263).

XXXXII. Though there are caveats to this due to chaos theory, see as discussed in the “Elegance And Complexity” chapter which further cites The Physics Of Wall Street by Weatherall (pp. 131-132, 136-149); and "A Profusion Of Place | Part I: Of Unity And Philosophy" by Gussman (https://footnotephysicist.blogspot.com/2020/03/a-profusion-of-place-part-i-of-unity.html#FN43B), which in turn cites The Great Unknown by du Sautoy (pp. 21-71) which further cites “The Three-Dimensional Dynamics Of The Die Throw” by Marcin Kapitaniak, Jaroslaw Strzalko, Juliusz Grabski, and Tomasz Kapitaniak (Chaos) (2012) (https://aip.scitation.org/doi/10.1063/1.4746038) (though I have not yet read this paper).

XXXXIII. For example, I once asked physicist Brian Greene for his thoughts on quantum foundations (and specifically Bohmian pilot waves), and part of his response was sociological: that other approaches to interpreting quantum physics won the day not for their scientific value so much as for their proponents' social popularity, see "Richard Dawkins & Brian Greene" by Richard Dawkins and Brian Greene (Pangburn Philosophy) (2018) (https://www.youtube.com/watch?v=7iQSJNI6zqI) (1:54:24-1:57:00). Becker argued similarly in What Is Real? by Becker.

XXXXIV. See “Normal Distribution” (https://en.wikipedia.org/wiki/Normal_distribution) which further cites “Normal Distribution” by Rod Pierce (Math Is Fun) (https://www.mathsisfun.com/data/standard-normal-distribution.html).

XXXXVI. This image was exported from the Desmos graphing calculator's interactive bell-curve page: https://www.desmos.com/calculator/4qr7jwhsri. I encourage the reader to visit this page and play with sliders for the values of the mean / median / mode (these are denoted as b on the Desmos page) and the standard deviation (denoted as a on the Desmos page) to see how changing these values affects the shape of the normal distribution.

XXXXIX. See the “Approximation” and “Methodology” chapters. This graph was produced using OpenOffice Calc software.

XXXXX. This graph was produced using the Desmos graphing calculator: https://www.desmos.com/calculator/zogsxfzroh. There are more complicated scenarios in which there may be circumstances in which one is in possession of more sophisticated statistics that are not mutually exclusive and must be treated accordingly; or otherwise, one may find a statistical situation not governed by simple distributions such as normal bell curves (for example, fat-tailed distributions which may lack a simple law of large numbers to extrapolate from), see The Physics Of Wall Street by Weatherall (pp. 59-62, 69-75) and “Standard Deviation” by Nassim Nicholas Taleb (Edge / Harper Perennial) (2014 / 2015) (https://www.edge.org/response-detail/25401) from This Idea Must Die edited by Brockman (pp. 535-537).
