Fundamental Flaws in Meta-Analytical Review of Social Media Experiments
Ferguson's meta-analysis obscures social media impacts on mental health because it is based on an invalid design and erroneous data.
Recently, psychologist Chris Ferguson published a ‘meta-analytical review’ of social media experimental studies, concluding that its statistically insignificant result reveals no impact and thus “undermines” the notion that reducing social media use can benefit adolescent mental health.
As we will see, however, social media reduction experiments measuring impacts on depression risk contradict Ferguson’s conclusion.
There are three primary reasons why Ferguson's meta-analysis produced a statistically insignificant result:
1) The meta-analysis is not a valid measure of mental-health impacts in social media reduction experiments, contrary to what Ferguson states in the Abstract.1 Of the 27 studies included by Ferguson, only 19 reduced social media use, and among these only 10 examined symptoms of mental disorders such as depression or anxiety.2 Ferguson obscures mental-health effects by including general well-being outcomes without even defining which measures he counts as well-being.
2) The meta-analytical design interprets withdrawal symptoms as evidence that social media reductions are harmful. These are temporary declines in some aspects of well-being that appear only in short experiments after sudden reduction of social media use and involve feelings such as frustration, not symptoms of mental health disorders.
3) There are 6 study selection mistakes and 3 large errors in effect size assignment — and all of them are biased in support of Ferguson’s views. For example, the strongest negative effect size assigned by Ferguson among the 27 studies is from an experiment whose authors agree the effect size is substantially positive — Ferguson essentially reversed the sign of the effect.3 Correcting these errors (or even a single one of them) would flip the result of Ferguson’s meta-analysis to statistical significance, removing a key premise of his main argument.4
Note that the confidence interval in Ferguson’s meta-analytical result barely included zero, and so all three flawed components above — the misuse of withdrawal symptoms to counter mental health improvements, the study selection mistakes, and the effect size assignment errors — were required for Ferguson to obtain a statistically insignificant outcome.
In this critique, I will concentrate only on the three major problems specific to the meta-analysis. There are also a number of other problems with the review, such as Ferguson’s misinterpretations of statistical insignificance and effect sizes.
Unfortunately, the publication of Ferguson’s meta-analytical review in a prominent journal and its coverage in the news media5 as well as its promotion by Ferguson and others ensures the erroneous review will be used to spread misinformation about adolescent mental health.
Main Conclusion of the Review
The review was published in Psychology of Popular Media — see Do Social Media Experiments Prove a Link With Mental Health: A Methodological and Meta-Analytic Review.6
In its Public Policy Relevance Statement, Ferguson declares that his meta-analysis “undermines” the notion that “reductions in social media time would improve adolescent mental health.”7
In the Conclusion of his paper, Ferguson states that statistical insignificance of his meta-analytical result “undermines” the belief of Jonathan Haidt and Jean Twenge that reductions in social media usage would benefit adolescent mental health.8
Indeed Ferguson has announced that his study “finds that reducing social media time has NO impact on mental health.”
Note: see https://osf.io/27dx6 for Ferguson’s list of studies and see https://osf.io/jcha2 for the list of effect sizes assigned by Ferguson to each study.
Experimental Evidence
We will now show that results of experimental studies contradict the main conclusion of Ferguson’s meta-analytical review.
All 12 published studies that measure symptoms of depression9 while reducing social media use reveal declines in depression risk.10
Ferguson included 10 of these studies but they are overwhelmed by the 17 studies that either did not involve social media reduction experiments or did not measure symptoms of mental disorders.
How strongly the evidence supports causal effects depends on the methodological design of each experiment and other factors.
It makes no sense, however, to declare that experimental evidence indicates there is no effect on mental health and therefore undermines the notion that heavy social media use can harm adolescent mental health. It makes even less sense for Ferguson to announce that experimental evidence shows that “reducing social media time has NO impact on mental health.”
Categories of Experiments
To understand the incompatibility of experimental methods among the 27 studies included by Ferguson, we can recognize four basic categories:
Lab Experiments that last only 10–30 minutes. In these, the ‘treatment’ is typically social media exposure, such as requiring high school students to look at their own Facebook or Instagram page for 10 minutes, which resulted in a momentary increase in self-esteem (Ward 2017).
Single-day Abstinence experiments that typically measure some aspect of well-being affected by withdrawal symptoms — such as in Przybylski 2021, where students reported lower Day Satisfaction at the end of the abstinence day.
One Week Reduction experiments that typically show lingering withdrawal symptoms coupled with mental health improvements when these are measured (see below).
Multi-Week Reduction experiments that tend to produce mental health improvements without any withdrawal symptoms (no well-being declines).
Another reason for incompatibility is the wide range of outcomes measured by these studies, ranging from asking students “How good or bad was today?” to using validated clinical measures of depression such as the PHQ-9.
To complicate matters, two of the 27 studies are not social media time experiments, as required by Ferguson’s own selection criteria. Furthermore, two reduction studies admit to having fundamentally failed to manipulate social media use as intended, which also should have disqualified these studies from Ferguson’s meta-analysis.
Conducting a meta-analysis that produces a single effect size to reveal a supposed ‘average impact’ of all these incompatible experiments is scientifically invalid. Misrepresenting this result as the effects of social media reductions on mental health is incorrect.
Influence of Duration on Impact
Even if we accept the effect sizes assigned by Ferguson, the average impact on ‘well-being’ of social media reductions depends greatly on the duration of the experiments.
The average effect size assigned by Ferguson reveals a clear pattern:11
Three Single-day experiments: d = -0.18
Eight One Week experiments: d = +0.02
Ten Multi-Week experiments: d = +0.20
The pattern illustrates how Ferguson misuses temporary withdrawal effects on well-being in short-term experiments to counter beneficial impacts on mental health.
Some One Week experiments do not measure mental health and so only detect withdrawal symptoms in well-being. Other One Week experiments measure both well-being and mental health, but Ferguson combines these so that the withdrawal effects diminish the mental health improvements.
Multi-week experiments, however, do not reveal declines in well-being, only improvements in mental health.
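The duration pattern above can be sketched numerically. A minimal sketch, using hypothetical per-study effect sizes chosen so that the subgroup means match the averages reported above (the individual study values are illustrative, not Ferguson's data):

```python
# Hypothetical per-study effect sizes, chosen so the subgroup means match
# the averages reported in the text (illustrative, not Ferguson's data).
studies = [
    ("single-day", -0.30), ("single-day", -0.15), ("single-day", -0.09),
    ("one-week",   -0.10), ("one-week",    0.14),
    ("multi-week",  0.15), ("multi-week",  0.25),
]

def mean(xs):
    return sum(xs) / len(xs)

# Group effect sizes by experiment duration.
by_duration = {}
for duration, d in studies:
    by_duration.setdefault(duration, []).append(d)

for duration, ds in by_duration.items():
    print(duration, round(mean(ds), 2))

# Pooling everything into one 'average impact' lands near zero (about -0.01),
# hiding the opposing directions of short and multi-week experiments.
pooled = mean([d for _, d in studies])
print("pooled", round(pooled, 2))
```

Pooling all durations into one number is exactly what masks the pattern; subgroup means (or a meta-regression on duration) preserve it.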
Note that the effect size assigned by Ferguson to each study is greatly influenced by which outcomes Ferguson decides to include as well-being measures. When there is more than one such measure, Ferguson says he simply calculates the average.12
For example, if a study measures depression as well as satisfaction and Ferguson assigns d = +0.31 to it, this could mean a combination of d = +0.60 for depression risk and d = +0.02 for life satisfaction. The mental health risk then gets diluted by the well-being outcome in the average effect.
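The dilution in this hypothetical example is simple arithmetic:

```python
# Hypothetical decomposition of a single averaged effect size
# (these component values are assumed for illustration only).
d_depression = 0.60     # assumed effect on depression risk
d_satisfaction = 0.02   # assumed effect on life satisfaction

d_combined = (d_depression + d_satisfaction) / 2
print(round(d_combined, 2))  # 0.31: the depression signal is halved by averaging
```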
Duration of experiment and types of outcome measurement (well-being vs mental health) explain much of the variation in effect sizes assigned by Ferguson. This alone invalidates Ferguson’s ‘random-effects’ meta-analysis — the appropriate methodology would be to meta-analyse duration subgroups (as above) or perform a meta-regression.13
Erroneous Data
Although the design of Ferguson’s meta-analysis is severely flawed, it would not produce a ‘null’ result (a confidence interval that includes zero) if it were applied to accurate data.
Ferguson’s meta-analysis produced a statistically insignificant result only because Ferguson provided it with erroneous data.
Ferguson does so with 9 effect sizes:
Four effect sizes are missing while two effect sizes were improperly included.
Three effect sizes contain large errors.
Note that Ferguson assigns a negative effect size to indicate that reducing social media is bad for mental health (and thus that social media use is good), and each of the nine data errors is biased toward the negative.
Improper Inclusion and Exclusion of Studies
Ferguson includes two studies that violate his own criteria for inclusion: neither Gajdics 2021 (d=-0.364) nor Deters 2013 (d = -0.207) are experiments manipulating time spent on social media (see Appendix: Study Inclusion and Exclusion for details).
Ferguson failed to include two SM reduction studies: Mosquera 2019, which found significant declines in depression risk after one week of Facebook abstention (and no declines in well-being), and Reed 2023, which found significant declines in depression risk after 3 months of reduced social media use.14
Engeln 2020 and Stefano 2022 are social media exposure studies similar to the other laboratory studies included by Ferguson and yet were excluded for no apparent reason. Both studies found significant well-being declines after exposure.
Therefore two substantially negative effect sizes should not have been included while four substantially positive effect sizes were missing from the data.
For a detailed discussion of each of the four studies in light of Ferguson’s criteria, see Appendix: Study Inclusion and Exclusion.
Erroneous Effect Size Determinations
Comparing effect sizes assigned by Ferguson with information in the Abstracts of the studies raises red flags for three experiments: Collis 2022 (d = -0.138), Brailovskaia 2022 (d = 0), and Lepp 2022 (d = -0.365); subsequent examination of the full text confirmed that the effect size assignments by Ferguson are indeed highly erroneous.
In Collis 2022, there is no effect on well-being but Ferguson seems to have misread Table S2 and so incorrectly calculated a substantially negative effect.
Brailovskaia 2022 (d = 0) found substantially positive effects on well-being outcomes — the assignment of d = 0 by Ferguson is a mystery.
Lepp 2022 (d = -0.365) was assigned the most negative effect size by Ferguson of all the 27 experimental results on his list, and yet both the data and the conclusions of authors indicate a very substantial positive effect size.
After I emailed Ferguson regarding the Lepp 2022 study, Andrew Lepp of Kent State University, whom I copied, replied to thank me “for the accurate description of our referenced study” and urged Ferguson to answer my inquiry. Ferguson replied solely to Lepp, stating: “I’m comfortable with my interpretation of your data as related to the specific questions of the meta”.15
Therefore the pinnacle of Ferguson’s evidence that social media does no harm comes from a study whose authors concluded the opposite and later agreed with my criticism — Ferguson in essence flipped the sign of the effect and refuses to admit any error.
For a detailed discussion of each of the erroneous effect sizes in the three studies, including tables and graphs, see Appendix: Erroneous Effect Size Determinations.
Impact of Data Errors on Meta-Analysis
Since Ferguson’s meta-analysis produced a confidence interval that barely includes zero, correcting any one of the nine major data errors would likely produce a statistically significant finding.
For example, correcting Lepp 2022 from d = -0.365 to a reasonable yet conservative d = +0.250 increases the average effect from d = +0.084 to d = +0.107.16
This illustrates the sensitivity of the average to even a single major error in the determination of effect size by Ferguson.
If we also assign reasonable yet conservative estimates of d = 0 to Collis 2022 and d = 0.1 to Brailovskaia 2022, the average further increases to d = 0.116. If we also remove the two studies that violate Ferguson’s criteria, the average increases to d = +0.148.
In my estimate, correcting all the data errors would nearly double the effect size from d = +0.08 to d = +0.15.17
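The arithmetic behind the single-correction example can be checked directly, assuming (as the figures above suggest) a simple unweighted mean of the 27 effect sizes:

```python
# Sensitivity of an unweighted mean of 27 effect sizes to one corrected entry.
# Assumes an unweighted mean; Ferguson does not disclose his study weights.
k = 27
mean_reported = 0.084          # average effect reported in the text
old_d, new_d = -0.365, 0.250   # Lepp 2022: assigned value vs. conservative correction

# Replacing one entry shifts the mean by the correction divided by k.
mean_corrected = mean_reported + (new_d - old_d) / k
print(round(mean_corrected, 3))  # 0.107
```

A single corrected entry moves the mean by (0.250 - (-0.365)) / 27 ≈ 0.023, which is why one error is enough to tip a borderline result.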
Top 10 Studies by Negative Effect Size
To better understand the problematic state of the evidence supporting Ferguson’s conclusion that reducing social media use does not benefit mental health, let us briefly look at each of the top 10 studies ranked by the negative effect sizes assigned by Ferguson:
Lepp 2022 -0.365: This is the one where Ferguson essentially flipped the sign of the effect.
Gajdics 2021 -0.364: This is a single-day no phones in school experiment, not a time spent on social media study (as required by the criteria set by Ferguson). Inclusion of Gajdics 2021 would require inclusion of similar experiments, like Brailovskaia 2023.
Vally 2019 -0.361: One week of sudden abstinence produced declines in aspects of well-being (withdrawal effects); mental health was not measured.
Ward dissertation 2017 -0.298: a social media lab experiment (only 22 participants in the control group) in which high school students spending 10 minutes looking at their own Facebook page experienced an increase in self-esteem and no effect on depression (it is unclear how Ferguson obtained such a strong overall effect size).
Kleefield dissertation 2021 -0.277: a very small experiment (only 27 participants in the control group) where one week of social media reduction led to lower self-esteem and no change in anxiety and depression.
Deters 2013 -0.207: This study manipulates Facebook status updates, not time spent on social media (as required by the criteria set by Ferguson).
Przybylski 2021 -0.152: One day of sudden social media abstention lowered momentary aspects of well-being such as Day Satisfaction among students (withdrawal effects); mental health was not measured. It seems Ferguson must have included ‘relatedness’ as an outcome to obtain this effect size. There was no control group, and impacts on satisfaction and relatedness ceased to be significant after controlling for sex and age.
Collis 2022 -0.138: This semester-long social media reduction experiment failed to reduce digital screen time in treatment group and found no effects on well-being; Ferguson’s assigned effect size seems to be based on misreading Table S2.
Vanman 2018 -0.135: Those who abstained from Facebook for 5 days experienced reduced stress (lower cortisol) but also reduced life satisfaction.
van Wezel 2021 -0.123: a social media reduction experiment that failed because the control group (33 participants) unexpectedly also reduced social media use greatly, “thus dismantling our intended screen time manipulation” (per the authors); no effects on well-being (it is unclear how Ferguson obtained such a strong negative effect size).
Conclusion
Ferguson’s meta-analytical result relies on a fundamentally flawed methodology and highly erroneous data and therefore lacks scientific validity.
Ferguson’s review, however, should not be merely ignored, as its publication in a prominent journal elevates it to a potent source of misinformation that can be used to dismiss concerns about the impact of social media on adolescent mental health.
Appendix
Study Inclusion and Exclusion
Ferguson declared that “studies must examine time spent on social media use” and yet he included two studies that violate this requirement.
Gajdics 2021 (d = -0.364) is not a true social media experiment because it is a one-day no-phones-in-school experiment. The effects of what Gajdics refers to as ‘nomophobia’ (NO MObile PHone PhoBIA) are no doubt much stronger than any impacts related to social media use during school hours. In other words, the effects are unlikely to be primarily about social media but about phone abstention and the resulting nomophobia.
When Gajdics is not excluded, there is no justification for excluding other phone abstinence experiments, such as Brailovskaia 2023. Exceptions for some such studies impair the integrity of the selection process.
Since Gajdics 2021 has the second strongest negative d in the meta-analysis, its proper exclusion would have had considerable impact. Note that Gajdics portrays his findings as confirmation of ‘nomophobia’ — psychological dependence on phones.
In Deters 2013 (d = -0.207) the treatment was only to increase the frequency of updating one’s Facebook status for a week, the result being a decline in feelings of loneliness. There is, however, no evidence whatsoever in the study that time spent on social media increased.
Three of the four incorrect exclusions are puzzling because these are studies that have long been listed in the Experimental Evidence section of Social Media and Mental Health: A Collaborative Review compiled by Haidt and Twenge.
The exclusion of Mosquera 2019, a massive study (1765 participants) that found significant decrease in depression after one week of Facebook abstention, is all the more puzzling given that it has been covered in the media (see College students who go off Facebook for a week consume less news and report being less depressed).
Engeln 2020 and Stefano 2022 are social media exposure studies similar to the other laboratory studies included by Ferguson and yet were excluded for no apparent reason.
Reed 2023, which found significant declines in depression risk after 3 months of reduced social media use, was published in February 2023, before several other studies on Ferguson’s list. It is the only experimental study I was able to find that was missed by both Ferguson and Haidt & Twenge.
Omission of Major Source
Ferguson credits two blog posts as valuable sources that he consulted when searching for experimental studies:
“Studies identified by previous commentators (e.g., Hanania, 2023; Smith, 2023) as this was valuable locating studies in other fields such as economics.”
N. Smith, however, refers to Hanania for a list of studies, and R. Hanania explicitly states that the studies he mentions are from a list created by Haidt & Twenge:
Jonathan Haidt and Jean Twenge have put together a Google doc that contains all studies they can find relevant to whether social media use harms the mental health of young people.
This list of studies compiled by Haidt & Twenge is, to the best of my knowledge, by far the most complete source available, and it is often referred to by both Haidt and Twenge in their work, which Ferguson criticizes in his review.
Ferguson could have avoided missing several studies had he consulted the Haidt & Twenge list.
It is remarkable that Ferguson praises two blog posts as valuable sources when both posts depend entirely on the list compiled by Haidt & Twenge, which Ferguson simply ignores.18
Erroneous Effect Size Determinations
Comparing the Ferguson effect size with information in the Abstracts of the studies raises red flags for three experiments: Collis 2022, Brailovskaia 2022, and Lepp 2022; subsequent examination of the full text confirmed that the effect size assignments by Ferguson are indefensible.
Study #1: Incorrect Effect Size for Lepp 2022
The effect size that Ferguson assigned to Lepp 2022 is the highest negative impact of all the studies on his list: d = -0.365.
The Lepp study was a social media exposure experiment and the authors concluded it provided evidence that such exposure was detrimental to well being:
Results demonstrated that social media use for 30 min caused a significant decrease in positive affect.
So how is it possible that this experiment received the largest negative effect size among all the studies, indicating that social media exposure benefited participants greatly?
The only plausible answer is the linguistic misuse by Ferguson of the term ‘control’ as used by the authors when describing their experiment:
All participants completed the following 30-minute activity conditions: treadmill walking, self-selected schoolwork (i.e., studying), social media use, and a control condition where participants sat in a quiet room (i.e., do nothing).
The problem is that the ‘control condition’ was not intended to mean what control typically means in experiments, that is, a test for a placebo (or nocebo) effect. This is clear from the hypotheses stated by the authors in the paper.
Indeed, the college students who were required to sit in a quiet room and literally do nothing for 30 minutes were understandably very irritated by this, and it showed up in the post-test results.
So positive affect scores fell greatly after social media exposure, but they fell similarly after imposed inactivity. If Ferguson treated the control group as a true control condition, the deleterious impact of social media exposure on positive affect was ‘canceled’ by the similar impact of imposed inactivity.
The deleterious impact of imposed inactivity was even more pronounced for negative affect (some very angry students).
Social media had little impact on negative affect, but if Ferguson subtracted the deleterious impact of ‘imposed inactivity’ from the no effect of social media exposure, the resulting effect size would indicate substantial (but illusory) benefits of social media exposure.
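To see how such a subtraction can flip a sign, here is a toy calculation with hypothetical numbers (not the study's actual data):

```python
# Hypothetical pre-post changes in positive affect (illustrative only):
change_social_media = -0.40   # affect falls after 30 minutes of social media
change_do_nothing = -0.50     # affect falls even more after imposed inactivity

# Treating the 'do nothing' condition as a true control subtracts its decline,
# so a harmful-looking raw change becomes an apparent benefit:
apparent_effect = change_social_media - change_do_nothing
print(round(apparent_effect, 2))  # +0.1, despite affect dropping in both conditions
```

The ‘benefit’ exists only relative to an even more unpleasant comparison condition, which is the crux of the Lepp 2022 misreading.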
Needless to say, the authors never consider the ’true’ impact of social media exposure to be one where the imposed inactivity effects are subtracted as if they were a placebo (nocebo) effect. They clearly did not intend to use the term ‘control’ to imply such a misuse, otherwise they could not have concluded that social media use is bad for students and recommend in their Conclusion that students should be minimizing social media use.
After I emailed Ferguson with the Lepp 2022 effect size criticism above and asked if there will be a correction, Andrew Lepp of Kent University, whom I copied, replied to thank me “for the accurate description of our referenced study” and urged Ferguson to answer my inquiry. Ferguson replied solely to Lepp, stating: “I’m comfortable with my interpretation of your data as related to the specific questions of the meta”.
Thus out of the 27 studies examined by Ferguson, the one providing the supposedly strongest evidence against harmful impacts of social media (per Ferguson’s effect sizes) comes from a study whose authors concluded the opposite and dispute Ferguson’s effect size determination.
Study #2: Incorrect Effect Size for Collis 2022
Collis 2022 is an anomaly: out of the 10 social media reduction experiments that lasted over a week, it is the only one to which Ferguson assigned a substantially negative effect size (meaning social media is good for mental health).
The problem is that this appears to be due to an error made by Ferguson.
Nothing in the study suggests the d = -0.138 impact determined by Ferguson.
If anything, social media reduction seems to reveal a small benefit per Figure 3 of the paper.
Regression models in Table 6 provide correlations too minuscule (|r| <= 0.02) to explain d = -0.138.
The authors do, however, include Table S2 (summary statistics for well-being measures).
Perhaps Ferguson calculated d by comparing Survey 1 with Survey 3, and there is indeed a substantial decline in SWEMWBS for the treatment group.
The problem is, however, that one needs to compare Survey 2 (pre-test) with Survey 3 (post-test).
Survey 1 was a ‘calibration’ survey that took place 3 months before the experiment started — to assign this effect to an experiment that began *after* the decline occurred would make no sense.
In view of the above, it seems likely that the d = -0.138 for Collis 2022 is due to Ferguson’s misreading of Table S2.
Study #3: Incorrect Effect Size for Brailovskaia 2022
The effect size for Brailovskaia 2022 on Ferguson’s list is d = 0.
In reality, the authors report numerous benefits to well-being:
“Results: In the experimental groups, (addictive) SMU, depression symptoms, and COVID-19 burden decreased, while physical activity, life satisfaction, and subjective happiness increased.”
There is no indication of any other well-being measures that could counter the ones mentioned by the authors (even smoking declined in the social media reduction group).
The benefits of social media reduction remained even 6 months after the experiment ended.
I’m at a loss as to how this could translate to a d = 0 result unless this effect size is a typo.
Failed Experiments
Some experiments on Ferguson’s list actually failed to reduce social media use in the treatment group or even to prevent reduction of social media use in the control group, rendering the measure of ‘impact’ dubious if not outright meaningless.
For example, van Wezel 2021 admits that the control group’s behavior ended up, in the authors’ words, “dismantling our intended screen time manipulation”.
This experiment did not even have a true control group where participants would continue social media use as usual, and the attempt by researchers to induce different degrees of reduction between two groups was a complete failure — in fact the ‘control’ group reduced screen time more than the ‘treatment’ group.
Collis 2022 is another experiment gone wrong:
In the experiment, we randomly allocate half of the sample to a treatment condition in which social media usage (Facebook, Instagram, and Snapchat) is restricted to a maximum of 10 minutes per day. We find that participants in the treatment group substitute social media for instant messaging and do not decrease their total time spent on digital devices.
In fact students in the treatment group increased their digital tech use:
Remarkably, although students in the treatment group significantly reduced their social media activities, their overall digital activities overall are not affected but, in fact, exceed those of the control group in block 2 (t-test, p = 0.026). This result indicates that students substituted or even overcompensated their social media usage with other activities.
The authors further specify: “we see that participants in the treatment group substituted their use of social media services for instant messaging apps (e.g. WhatsApp)”.
The authors do not justify the categorization of Snapchat as social media even though Snapchat is routinely described as a multimedia instant messaging app. Nor do the authors justify exclusion of WhatsApp from social media, even though it is routinely used by students for communication within large groups.
Omission of a Systematic Review
Any proper meta-analysis begins with a systematic review (see Meta-Analysis 101) — such a review would, however, reveal that social media reduction experiments produce declines in depression risk.
Furthermore, a systematic review would have revealed that a single meta-analysis of all the studies Ferguson included is impossible due to incompatible differences in experimental design and measured outcomes.
Misinterpretation of Statistical Results
To reach the conclusion that evidence undermines concerns about social media, the flawed meta-analysis and the erroneous data are not sufficient — Ferguson must also misinterpret the statistical results.
Ferguson does so by misinterpreting lack of statistical significance as evidence that there is no effect.
Ferguson also wrongly dismisses any effect size below d = 0.21 as too small to be of practical importance, a mistaken view based on a misunderstanding of rank correlations involving small ordinal scales.19
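The first misinterpretation can be made concrete with a rough power calculation (normal approximation, hypothetical sample sizes): a real but modest effect is more likely than not to come out ‘statistically insignificant’ in an experiment of typical size.

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(d, n_per_group):
    # Normal-approximation power of a two-sided two-sample test at alpha = 0.05
    # (ignores the negligible opposite-tail probability).
    return phi(d * math.sqrt(n_per_group / 2) - 1.96)

# Hypothetical scenario: a true effect of d = 0.2 with 50 participants per group.
power = approx_power(0.2, 50)
print(round(power, 2))  # about 0.17: an 83% chance of a 'null' result
```

With power this low, a nonsignificant result is the expected outcome even when the effect is real, so insignificance cannot be read as "no impact".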
Misinterpretation of Heterogeneity
Ferguson admits that his meta-analysis suffers from extreme heterogeneity and implies that this is due to the methodological bias of researchers trying to prove social media is harmful. In reality, the heterogeneity is the product of Ferguson’s own improper methodology (a potpourri of incompatible experiments and outcomes).
Since high heterogeneity in a meta-analysis tends to widen confidence intervals, it is remarkable that Ferguson’s meta-analysis would still have produced a statistically significant result, with a CI well above zero, had Ferguson fed it correct data.
Misleading Terminology
The title Do Social Media Experiments Prove a Link With Mental Health is incorrect because Ferguson enlarges the scope to aspects of general well-being (that is, outcomes such as answers to “How good or bad was today?”) that often have little to nothing to do with mental health.
Similarly, Table 1 is labeled Meta-Analytic Results of Social Media and Mental Health Outcomes when in reality it is based on well-being outcomes.
Lack of Transparency
Ferguson does not reveal how he calculated the effect size for each study, nor even which outcomes he considers ‘well-being’ — and he never defines precisely how the effect size for a study should be calculated even when it is clear which outcomes should be included.
In view of this methodological fuzziness, the very notion of ‘correct’ effect size needs to be viewed in quotation marks to indicate there is no clear definition by Ferguson. The issue is ‘defensible’ versus ‘indefensible’ rather than ‘correct’ versus ‘incorrect’ effect sizes.
Ferguson also never reveals the weights assigned to individual studies by the software he used for the meta-analysis, nor how these were determined — in particular Ferguson does not supply standard deviations for the effect sizes he calculated.
Flawed Methodological Review
The ‘meta-analytical review’ is only half of Ferguson’s paper — the other half is a 'methodological review’ that is as erroneous and flawed as the first half, but that is a topic for a separate critique.
Apr 30 Update: removed the analogy with weight and health.
May 25 Update: a major rewrite to incorporate erroneous effect sizes etc.
June 3 Update: added 3 missing studies.
In the Abstract, Ferguson states: some commentators are more convinced by experimental studies, wherein experimental groups are asked to refrain from social media use for some length of time, compared to a control group of normal use. This meta-analytic review examines the evidence provided by these studies.
These 10 studies on Ferguson’s list are Allcott 2020, Brailovskaia 2020, Brailovskaia 2022, Faulhaber 2023, Hunt 2018, Hunt 2021, Kleefield dissertation 2021, Lambert 2022, Thai 2021, and Tromholt 2016 (the inclusion of Tromholt 2016 is border-line, as its ‘emotions’ outcome is based on a subset from the Center for Epidemiologic Studies Depression Scale and PANAS that showed heavy loading on depressed affect).
See https://osf.io/27dx6 for Ferguson’s list of studies. Mosquera 2019 and Reed 2023 are missing in that list while Davis 2024 was published only recently.
See Study #1: Incorrect Effect Size for Lepp 2022 in text. Note that negative effect sizes in Ferguson’s scheme indicate that reductions of SM use are harmful instead of beneficial. Thus negative effect size assignments by Ferguson support the view that SM is not harmful (and help tilt his meta-analytical result toward zero).
The notion of ‘correcting’ an effect size in this context means making an indefensible effect size assignment defensible. Ferguson never defines how one is supposed to determine the effect size and so the concept is too fuzzy to merit the notion that there is one correct effect size for each study. This is a separate problem from the additional issue of such a single effect size per study making any sense in the first place (and of these being in any useful manner comparable across the studies).
For example, on May 1st, an essay in The New York Times announced that psychologist Christopher J. Ferguson will publish a new meta-analysis “showing no relationship between smartphone use and well-being.”
“Further meta-analytic evidence suggests that, even taken at face value, such experiments provide little evidence for effects. Put very directly, this undermines causal claims by some scholars and politicians that reductions in social media time would improve adolescent mental health. Thus, appeals to social media experiments may have misled more than informed public policy related to technology use.”
“Currently, experimental studies should not be used to support the conclusion that social media use is associated with mental health. Taken at surface value, mean effect sizes are no different from zero. Put very directly, this undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health.”
Only a few studies measured other symptoms of mental disorders, such as anxiety.
These 12 studies are Allcott 2020, Brailovskaia 2020, Brailovskaia 2022, Faulhaber 2023, Hunt 2018, Hunt 2021, Lambert 2022, Thai 2021, and Tromholt 2016, which are on Ferguson’s list (https://osf.io/27dx6), plus Mosquera 2019, Reed 2023, and Davis 2024. In Thai 2021 the decline in depression is not statistically significant due to an extremely small treatment group (n = 16), but a decline in anxiety was statistically significant. Ferguson also included a dissertation by Kleefield 2021, n = 26, which failed to reveal any statistically significant impacts and admitted insufficient power to detect substantial impacts.
We exclude the six lab studies (average d = +0.10) as these were all social media exposure experiments lasting 10 to 30 minutes. We include a 5-day experiment (Vanman 2018) in the One Week category.
Does it make sense to assign, as Ferguson did for Vanman 2018, a single effect size to a change in cortisol levels and a change in a 5-item life satisfaction scale? I do not think so, but the intent of this section is to show that even if we accept the effect sizes assigned by Ferguson, they reveal heavy dependence on the duration of social media reduction.
I do not count Davis 2024 as a selection mistake, as it was published recently.
This is per email communications, including an email from Ferguson to Lepp that was forwarded to me by Lepp.
Disqualifying Lepp 2022 for not having a true control group would require disqualifying also Gajdics 2021 (d = -0.364) and Przybylski 2021 (d = -0.152) as they have no control groups — and doing so would have increased the average effect size even more (d = 0.12).
Assignments of Cohen’s d are sometimes difficult to make since not all the studies include all the information required for canonical d calculation (e.g. standard deviations), so I can formulate only rough estimates. Ferguson admits to deducing Cohen’s d from other statistics, and even to converting d to r and back during meta-analysis, and does so without paying attention to the unrealistic assumptions required for such manipulations. Indeed the very use of Cohen’s d by Ferguson is at best dubious, since effect size estimates such as Cohen’s d, Hedges’ g, and Glass’s Δ depend on the assumption that studies are normally distributed and have equal variance (see The accuracy of effect-size estimates under normals and contaminated normals in meta-analysis).
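For reference, the standard conversions between d and r (valid only under equal group sizes and a continuous, normally distributed outcome) look like this; the formulas round-trip exactly, but their interpretation breaks down when those assumptions fail:

```python
import math

def d_to_r(d):
    # Convert Cohen's d to r, assuming equal group sizes.
    return d / math.sqrt(d * d + 4)

def r_to_d(r):
    # Convert r back to Cohen's d under the same assumption.
    return 2 * r / math.sqrt(1 - r * r)

d = 0.30
r = d_to_r(d)
print(round(r, 3))           # r corresponding to d = 0.30
print(round(r_to_d(r), 3))   # round-trips back to 0.3

# Ferguson's r = .10 threshold corresponds to roughly d = 0.20:
print(round(r_to_d(0.10), 2))
```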
Ferguson is aware of the collaborative doc and its list of experimental studies because on August 15 2023 he added a comment to the doc regarding one of the experimental studies (Faulhaber 2023) on the list.
Ferguson also asserts: “recent scholarship has found that effect sizes below r = .10 have a very high false positive rate and are, in essence, indistinguishable from statistical noise (Ferguson & Heene, 2021)“ — but the paper he co-wrote provides no evidence of this assertion.