Roger Edwards

DISCLAIMER: This essay was written on my own time and equipment, and is displayed here in my private account. Opinions contained in this essay are solely the author's, and may not reflect those of my employers or any other organization or individuals in particular.

2001 additions in green. 2003 updates in red.





Numbers Games

Justification over Verification

Raise Severe Criteria?






The main means by which congressional funding and positive media image are achieved in today's National Weather Service is "improvement" in verification statistics for severe weather warnings. In a sense, this is understandable, given that the primary mission of the NWS is to reduce human casualties caused by hazardous weather. To create the perception of improved public service is politically prudent in this era of federal budget cuts and pressure from the meteorological private sector. This perception, however, partly results from flawed and misleading statistics produced by a verification system inherently, if inadvertently, geared to portray success. For an evolutionary overview of warning verification procedures, see Donaldson et al. (1975), Pearson and David (1979), Grenier and Halmstad (1986) and Halmstad (1996).

Currently, a thunderstorm is classified as severe, for verification purposes, if it yields one or more reports of any of the following, in the county for which the warning was issued:

These standards have changed several times in the history of the U.S. Weather Bureau and NWS, as chronicled by Galway (1989).

The number of severe thunderstorm and tornado warnings collectively increased over 300% from 1988-95, and the number of like reports by over 250% in the same period (Halmstad, 1996). Increases in reports are not new, and have been attributed to greater emphasis on warning verification, denser spotter networks, and the proliferation of storm chasers. There is no meteorological evidence that actual severe weather occurrences (versus reported events) have increased so explosively during the same period; yet severe storm climatologies will misleadingly indicate as much to the users (including media) who will probably be uninformed of this critical caveat.

In addition, severe weather climatology continues to be skewed by the emphasis on verification without regard for spatial extent or public impact of severe storms. One example of this is noted by Amburn (1996): "Verification telephone calls to locate weather events often stop after the first severe weather report is received."

Viewing the data wrongness problem then as I do today, Hales (1987) wrote:

Yet, it still is, with full sanction and support of the NWS: In Halmstad (1996), there are tables of severe weather warning verification statistics nationally, by NWS region, and by individual office. These tables are tailor-made for competitive comparison by media, by NWS managers, and by every forecaster at every office. That "numbers game," as I call it, detracts from the more important missions of science and public service. Time spent by forecasters canvassing their county warning area (CWA) for reports is time not spent improving their forecast skill and public usefulness by reading, discussing, and/or conducting scientific research related to forecasting.

Moreover, the tremendous increase in warning numbers, especially those verified merely by marginally severe reports or by isolated events unrepresentative of the storm's impact on the entire warned area, could dilute public confidence in warnings, ultimately endangering lives. This is the "cry wolf" effect, where (for example) a series of warnings in a county over a period of months or years, each verified by isolated dime size hailstones or estimated gusts of 60 mph, creates apathy -- leading in turn to death and injury in a devastating event of extreme severity.

Recent report collection and verification have had a profoundly negative impact on the severe storm reports database, aside from the dramatic and presumably anti-climatological boost in the total report numbers. Severe storm reports vary sharply from place to place, across warning domains, and across report types. As if bounded by walls, certain areas experience sudden changes in reports attributable not to weather, but to collection practices. Ways to illustrate the problem are virtually limitless using public domain software (such as SVRPLOT) which will plot reporting trends. For example, in the six-year period from 1995-2000, screaming spatial inconsistencies appeared in 50 knot wind reports and 60 knot reports.

Now, what should happen more often: larger hail or smaller hail? Not so fast. Consider the same time period, when nationwide, there were 7.25 times as many 1.75 inch hail reports as 1.25 inch reports. If there is a physical process that cranks out a sevenfold boost in hail with a half inch increase in size, it would be one of the most monumental discoveries in the history of all science! Little-known statistical depth charges like this lurk throughout the severe weather database. They may seem ridiculous to the point of being funny; but to the scientific researcher unaware of the deep and pervasive flaws in storm data, they are potentially devastating, hidden subterfuge for any study using the reports database. The conclusion is inescapable: Warning verification practices have become extremely dangerous to the science of meteorology by destroying the integrity of the severe storm data.

Below are some proposed solutions with descriptions of the problems they would address. Most of them would involve significant revisions of portions of the Weather Service Operations Manual, a process normally mired in red tape; but that is no justification for inaction. These recommendations, if expediently enacted, will improve both science and public service in the NWS.


Tornado warnings should be verified ONLY with confirmed tornadoes.

Presently, any severe event -- even one dime size hailstone -- verifies a tornado warning. This is illogical, misleading, and statistically and scientifically fallacious. As Hales wrote in 1989, "This makes it impossible to use routine verification statistics to evaluate the quality of service provided by tornado warnings." That applies even more in this era of intense scrutiny from the media, emergency managers and weather enthusiasts alike.

One potential problem with this proposal is in its potential indirect impact upon tornado climatology: More unconfirmed, "possible tornado" events may ultimately be listed as tornadoes solely to verify tornado warnings. Bogus tornado reports by the "public," law enforcement or inexperienced spotters, who are in reality watching low-hanging scud or rain shafts, may be accepted without question just to rack up a notch in the gun of warning verification. Progress must not be stalled, however, because of the potential for such exploitation by individuals who lack scientific integrity.

Verification stats should specifically include a partition of significant events, which used to be done but was stopped for some reason.

Significant events (Hales, 1988) are:

[Given damage, the economic criterion is the most tenuous and archaic -dependent of the "significant" classifications and may be ignored for this purpose.]

Although the NWS should continue to warn for all thunderstorms showing severe potential, severe thunderstorm and tornado warnings should be verified primarily (if not entirely!) using significant events.

This verification would be done either solely using significant events, or on a sliding scale weighted most heavily toward the extremes of each event type.
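As a rough illustration of the sliding-scale option, the sketch below gives each warning partial credit according to the largest hail reported in it. The 0.75 inch and 2 inch anchors come from the current severe criterion and Hales' significant hail size; everything else (the record layout, the function names, the 0.1 starting credit and the linear ramp) is a hypothetical choice made only for this example, not an NWS or Hales formula.

```python
from dataclasses import dataclass


@dataclass
class WarningRecord:
    """One warning and the largest hail reported inside it (hypothetical layout)."""
    warning_id: str
    max_hail_in: float  # largest reported hail diameter in inches; 0.0 if none


def hail_weight(diameter_in: float) -> float:
    """Illustrative credit: none below 0.75 in, full credit at 2 in or larger."""
    low, high = 0.75, 2.0
    if diameter_in < low:
        return 0.0
    if diameter_in >= high:
        return 1.0
    # Assumed linear ramp from 0.1 at the severe threshold to 1.0 at significant size.
    return 0.1 + 0.9 * (diameter_in - low) / (high - low)


def weighted_verification(warnings: list[WarningRecord]) -> float:
    """Average credit per warning, between 0 and 1."""
    if not warnings:
        return 0.0
    return sum(hail_weight(w.max_hail_in) for w in warnings) / len(warnings)


if __name__ == "__main__":
    sample = [
        WarningRecord("A", 0.75),  # verified only by marginal hail
        WarningRecord("B", 4.0),   # verified by very large hail
    ]
    print(round(weighted_verification(sample), 3))
```

Under a scheme like this, the marginal warning and the extreme one no longer count equally, which is the whole point of the proposal; the exact shape of the scale would need to be settled by people with the data to test it.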

Here's the problem: Under current NWS verification policy, one dime size hailstone among smaller ones in a remote field is numerically no more important than tens of thousands of people being clobbered by softball size hailstones at Mayfest. Unfair? Misleading? Misrepresentative? Bad public service? Scientifically unsound? Yes, to all of the above! So why continue to verify warnings without regard to the extremity of the event?

Few would argue that people should be awakened at 3 a.m. to take extraordinary protective measures in significant events, except perhaps the hail. By nature, they are relatively uncommon; yet they represent the greater public hazards for which the severe weather watch and warning system is designed. These are the measures by which we should judge the effectiveness of warnings as a public service, versus as a negotiating tool in the plea for funding.

Again quoting Hales (1988): "Two objectives this would accomplish are to make the verification system fairer to the warning offices and more meaningful to the management of the NWS. It would also go a long way in eliminating the temptation of the offices in manipulating the marginal severe reports."

The last sentence in Hales' quote is disturbing and brings up an important aside that I discuss below.

Weight hail and wind verification using an inverse of the latest census population of each county.

Too complicated? Meteorology is science; and in science, inconvenience is not a valid excuse. Census databases are readily available in digitized formats. This would reduce the population bias in reports, at least for the sake of representative stats. It would also go much further in solving the problem of relative underwarning of sparsely populated counties (documented in Hales, 1987), for which there is no physical justification in the atmosphere.
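To show that the bookkeeping is not onerous, here is a minimal sketch of one possible reading of inverse-population weighting: each county's warnings count in proportion to 1/population when the fraction of verified warnings is computed. The county names, counts, and function names are invented for the example, and other weighting schemes would satisfy the proposal equally well.

```python
def inverse_population_weights(populations: dict[str, int]) -> dict[str, float]:
    """Weight each county by 1/population, normalized so the weights sum to 1."""
    for county, pop in populations.items():
        if pop <= 0:
            raise ValueError(f"population for {county} must be positive")
    inverse = {county: 1.0 / pop for county, pop in populations.items()}
    total = sum(inverse.values())
    return {county: value / total for county, value in inverse.items()}


def weighted_verified_fraction(
    issued: dict[str, int],
    verified: dict[str, int],
    populations: dict[str, int],
) -> float:
    """Fraction of warnings verified, with each county weighted by 1/population."""
    weights = inverse_population_weights(populations)
    numerator = sum(weights[c] * verified.get(c, 0) for c in weights)
    denominator = sum(weights[c] * issued.get(c, 0) for c in weights)
    if denominator == 0:
        raise ValueError("no warnings issued in the listed counties")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up numbers: a sparse county and a populous one.
    populations = {"Sparse": 1_000, "Metro": 4_000}
    issued = {"Sparse": 2, "Metro": 10}
    verified = {"Sparse": 1, "Metro": 9}
    print(round(weighted_verified_fraction(issued, verified, populations), 3))
```

In this toy case the unweighted fraction is 10/12, but the weighted figure is lower because the unverified warning in the sparse county carries more weight; the statistic stops rewarding an office for concentrating its effort where reports are easy to find.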

Many forecasters are encouraged not to warn where the population is low, because reports are unlikely, and warnings might not verify. Is this public service, or self-service? How do we know (I mean KNOW) that "nobody is out there"? The hiker, hunter or pumpjack repair crew who gets caught in the barrage of baseball size hail might have benefited from the warning which wasn't issued, even if they never call in a report to verify the warning. I witnessed the wrangling over an incident in the early 1990s when a yacht sank deep into the night, in a remote area, after a warning was not issued because "nobody's out there." Is the life of that fisherman, sailor or oil well repairman worth less than any given individual among 5 million in a big metro area?

Whether there are 5 million people or allegedly "none" in a storm's path, the storm is going to do what it is going to do, period. The storm doesn't care. Its threat should be evaluated by its environment, apparent characteristics, and near-term forecasts thereof. [Same goes for watches and outlooks, when looking at the forecast environment relative to population centers or voids over land.] Where large population needs to be considered is in the warning preparedness and dissemination processes -- how to get the word out to the most people the fastest. But the preparedness and dissemination steps come before and after (respectively) the yes/no call is made on whether to issue. Meteorologists are not social scientists or psychologists and shouldn't pretend to be. Again, those factors are important to dissemination and preparedness, but that's another discussion altogether.

The lack-of-population issue is less relevant anymore for tornadoes. If anything, tornado strikes in populated areas are becoming more common, since we are putting more in their way, through suburban sprawl. Some tornadoes may still be undetected, especially in remote areas at night, or with brief, rain-wrapped touchdowns that do no noticeable damage. The population bias in tornado confirmation has decreased greatly (Grazulis, 1993), however, because of the focus on tornadoes by spotting, chasing, and proliferating home video of tornadoes. Also, in over a decade since publication of Grazulis' book, popular awareness of (and interest in) tornadoes has grown tremendously due to fiction ("Twister" and its TV-movie counterparts), documentaries on severe weather and storm chasing, and widespread broadcasting and marketing of tornado videos. If anything, as discussed above, daytime tornadoes may be over-reported because of the presence of hordes of inexperienced spotters, cops and chasers seeing "tornadoes" anytime there is a low-hanging chunk of scud.

Use only measured wind gusts -- not estimates -- to numerically verify warnings.

After around 200 storm chases in 11 years, and witnessing sustained surface winds analyzed at up to 120 mph in the outer eyewall of Hurricane Andrew (based on a map by Powell and Houston, 1996), I can't claim to estimate consistently and reliably whether a gust was 60 mph (severe) versus 55 (non-severe). I challenge anyone to prove that he/she can. No reputable chaser or spotter I know can claim that skill. How arrogant can a spotter, chaser or sheriff be, actually, to think he/she can reliably estimate a 60 mph wind (versus 55 or even 40) when experienced scientific storm observers (who have seen dozens to hundreds more storms each!) will deny such astounding ability?

Yet, hundreds of "60 MPH TSTM GUST" reports appear every year in the report logs. This speed is, by far, the most common wind speed in the severe weather climatology since 1990. Why 60? Why not 59, 61 or even 65? Simple: 60 mph is the nearest "clean" number above the severe threshold. It is very convenient for all links in the warning verification chain, and as such, it is rampantly abused and almost never backed up by instrumented measurement! These highly subjective estimates (often "public reports" not from trained spotters) contaminate the severe weather climatology with great errors of uncertainty, and heavily impact warning verification.
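Anyone with a list of logged gust speeds can check for this kind of pile-up in a few lines. The sketch below is only an illustration: the function names are hypothetical and the sample speeds are invented, not drawn from any real report log.

```python
from collections import Counter


def share_at_speed(gust_reports_mph: list[int], speed_mph: int = 60) -> float:
    """Fraction of gust reports logged at exactly the given speed."""
    if not gust_reports_mph:
        return 0.0
    return Counter(gust_reports_mph)[speed_mph] / len(gust_reports_mph)


def most_common_speed(gust_reports_mph: list[int]) -> int:
    """The single most frequently logged gust speed."""
    if not gust_reports_mph:
        raise ValueError("no gust reports supplied")
    return Counter(gust_reports_mph).most_common(1)[0][0]


if __name__ == "__main__":
    # Invented sample, for illustration only.
    sample = [58, 60, 60, 60, 65, 70, 60, 61]
    print(most_common_speed(sample), share_at_speed(sample))
```

A spike at one round number, with few neighbors on either side, is the signature of estimation rather than measurement.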

An argument can be made: "But what about areas where there are no instruments? How will the warning verify?" To which I say:

Use only measured hail diameters -- not estimates -- to numerically verify warnings.

There is no value to objective verification in subjectively estimated data, including hail diameter. All trained spotters should be supplied with standardized calipers, and should use them. For several years, I've been using a set of plastic calipers purchased for one dollar; and by direct contact of its measuring vise with the hailstone, it is ideal for precisely and accurately measuring greatest hail diameter. This measurement takes less than 2 seconds. If bulk distribution of such calipers is financially crippling to the NWS, spotters should be implored to use their own rulers, or cheap rulers supplied by local corporate contributors to spotter training sessions (such as TV and radio stations, etc.). Since exposing the body to very large hail is dangerous, spotters could provide estimates of such hail to NWS immediately for warning-decision and informative purposes, then provide accurate measurements as an official report within a few minutes, as soon as it is safe to do so.

In the national severe weather database, explicitly distinguish between measured and estimated reports of hail and wind for every single event. This process should begin at the level of the local storm report (LSR) and continue until final Storm Data entry.

Scientific research requires the most precise and accurate data possible. Under the current methods of compiling severe weather event information, there is no required distinction between measured and estimated wind and hail events -- unless the person making the entry fortuitously chooses to do so. Except in those uncommon instances, there is no way for the researcher to definitively judge the accuracy of a report such as "60 MPH TSTM GUST" or "1.75 INCH HAIL." This is because, in most cases, no comment is made as to whether the event was coarsely estimated (e.g., by comparison of hail size to an object or use of the Beaufort wind scale), or precisely measured by means of an instrument.

Making this distinction, specifically and concisely -- in every single case -- is a policy that may be enacted at any time. It would require little or no additional expense of time or resources in most cases. The meteorologist at the local NWS office collecting such information communicates with spotters and others who provide reports, and can readily determine whether most reports are directly measured or merely estimated. The nature of reports from instrumented observation platforms is, of course, already known. This distinction can be made in the comments section of LSRs under the present automated software system used by the NWS. Alternatively, according to its author (VanSpeybroeck, 1998), PC software currently in use for compiling LSRs could be easily modified to allow categorical entry of the measured/estimated status, such that LSR entries can be "flagged" for their accuracy during their automated processing for SPC rough logs. The same is true of AWIPS LSR software. The only prerequisite is its authorization by the NWS.

The benefit to scientific studies of severe weather events is obvious: accurate and consistently performed measurements can be culled with confidence from the larger volume of estimated data, enabling more precise and accurate scientific results and more credible conclusions within the research.
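A categorical flag of this kind is simple to carry through a database. The sketch below shows one way a report record might hold a measured/estimated field and how a researcher could then cull the measured subset. The class, field names, and single-letter codes are hypothetical; they do not reproduce the actual LSR or Storm Data formats.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    """How the magnitude was obtained (codes are invented for this example)."""
    MEASURED = "M"
    ESTIMATED = "E"
    UNKNOWN = "U"


@dataclass(frozen=True)
class StormReport:
    event: str        # e.g. "HAIL" or "TSTM GUST"
    magnitude: float  # inches for hail, mph for gusts
    units: str
    method: Method = Method.UNKNOWN  # older entries default to unknown


def measured_only(reports: list[StormReport]) -> list[StormReport]:
    """Keep only reports explicitly flagged as instrument-measured."""
    return [r for r in reports if r.method is Method.MEASURED]


if __name__ == "__main__":
    reports = [
        StormReport("HAIL", 1.75, "in", Method.ESTIMATED),
        StormReport("HAIL", 1.6, "in", Method.MEASURED),
        StormReport("TSTM GUST", 60.0, "mph"),  # no flag recorded
    ]
    for report in measured_only(reports):
        print(report)
```

Defaulting unflagged entries to "unknown" rather than "estimated" keeps the historical record honest about what is not known.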

A separate category of enhanced public warning, for the extremely severe events, must at least be operationally tested.

A number of people have suggested multi-tiered warnings. The primary objection is that such warnings could confuse the public, especially considering so many people still do not know the difference between a watch and a warning. This lowest-common-denominator approach ignores the fact that many people do, and is a dismissive cop-out. There is little backlash against Stage-x smog alerts because they are "too complicated." Let's give most people credit for a greater degree of awareness than the most profound examples of ignorance that can be dredged from the sediment of society.

The severe thunderstorm warning as used today is a creaking old relic from the era of teletypes and WSR-57 radar networks, when much less storm-scale information was available to the forecaster. The capability to warn for specific type and extent of severe event is here -- now -- as judged by wording in some warnings and by numerous recent operationally oriented publications [e.g., NOAA (1995b), Amburn (1996), Collins (1996), Sohl et al. (1996)]. Let's show it, by invoking a special category of severe thunderstorm warning for events that are extreme public hazards. It can be called many things; for now, I'll refer to it as an "Extremely Dangerous Thunderstorm Warning," and solicit suggestions for even more appropriate names.

I believe that an explicit category of enhanced severe thunderstorm warnings will work, given careful testing and refinement based on public and emergency manager feedback. Some offices have already begun to do so implicitly, under the umbrella of the traditional "severe thunderstorm warning," by means of enhanced call-to-action statements such as:


This is a good step, but it must be applied consistently on a national scale to be most effective. One way is to use such alarming calls to action in conjunction with an "extremely dangerous thunderstorm warning," the header of which commands extra attention from the start. This warning is one for a public thunderstorm emergency -- most often not marginal severe such as dime size hail -- which warrants extraordinary protective action beyond the usual "taking shelter" precaution already recommended for all thunderstorms.

Guidelines for such warnings should be consistent, but flexible enough to allow judgment of situational relevance by the forecaster. A 100 mph derecho or a storm like the Mayfest supercell -- dropping hailstones large enough to kill -- would clearly qualify. A less severe storm that normally would be best covered by a severe thunderstorm warning could warrant an "extremely dangerous thunderstorm warning" if it is headed for a major outdoor public gathering.

Sure, there is a gray area here, as with most issues of operational forecasting. Therefore, this type of warning should be verified and evaluated on a case-by-case basis, rather than using rigid statistics.

Besides those specific, relatively short-term ideas for improvement, there are some broader problems that need to be openly addressed, and which will require important changes in NWS operating philosophy:


As Hales (1988) stated, the manipulation of marginal severe reports is a natural concern. After all, the fox guards the henhouse in the NWS warning verification system: The same people who issue the warnings are responsible for documenting severe weather in them. Even if human nature and the pressure to verify never lead to cheating, this setup blatantly gives the appearance of impropriety, and should be overhauled.

The solution to this is not simple or straightforward. Ideally, an outside accountant (state climatologists? FEMA? Private contractor?), with no vested interest in the results, would do this work; but would also have little or no familiarity with the real-time data gathering process from spotters. Also, another layer of bureaucracy is introduced to the process, which may be dangerous in real-time should spotters have to report to some central data-gathering intermediary between them and the local NWS office. These are no excuses not to try to solve the problem and seek independent verification, though.

Manipulating severe weather databases, including failure to put equal effort into finding reports from thunderstorms in non-warned counties, constitutes SCIENTIFIC MISCONDUCT and should be treated as such. How could any person doing this call himself/herself a meteorologist (by definition, a scientist)? One hopes it doesn't happen; but any group is only a microcosm of society at large, and is prone to ethical lapses. The blame for such scientific misconduct shifts to regional or national management if officials at local offices are pressured to yield high probabilities of detection (POD) for severe weather events -- through either comparative statistical ranking of their offices or compelling language in their performance evaluations.

One anecdote that has circulated in the field in recent years is of a forecaster wishing to verify his warning and improve his "skill" scores, desperately scouring the countryside with phone calls in an attempt to find a loggable report, and finally reaching a citizen who saw some hail. He asks, "What size coin best describes the hail you saw?" Of course, the hail is logged as severe, because any answer meets severe criteria! Whether or not this actually happened, no one familiar with the process can honestly deny that immense effort is put into verifying warnings in the NWS. Think of the many thousands of dollars in man-hours and long-distance calls that are consumed at taxpayer expense yearly in the pursuit of severe reports. Is this really necessary? This question leads to the next issue...


The NWS verification process is self-serving, not public-serving. Those portions of NWS warning verification results which indicate forecast improvement are commonly given to the media for positive image portrayal, and used as leverage in the annual struggles for funding. As a result, NWS management has -- whether intentionally or not -- placed enormous pressure on forecasters to verify warnings (a superficially noble goal with unpleasant side effects, as already noted), and given mainly lip service to reducing false alarms (a highly worthy goal). This pressure translates into a domino effect down through the chain of command: regional managers, area managers, meteorologists-in-charge (MICs), warning-coordination meteorologists (WCMs), and ultimately, many forecasters. In my informal conversations with several MICs, WCMs, and Science and Operations Officers (SOOs), pressure from above to verify warnings, with false alarms treated as unimportant, has been a common theme. Whether this is overt reality, as abundant anecdotal evidence suggests, or is merely perceived, the effects are the same: Enormous collective time and resources are used to find reports where warnings were issued, and during their valid times.

While some form of objective verification is probably necessary for forecasters and their managers to evaluate the results of warnings, verification alone does not reveal the quality of the forecasts (warnings, in the context of this discussion), nor does it say anything about the justification for them. A forecast can be right for the wrong reasons -- a boost to the "skill scores," but still a bad forecast because it was based on faulty data or concepts.

Further, a forecaster may consistently verify warnings well with cheesy and marginal "severe" reports, miss a spectacular and deadly event, and still have great stats. But is it good public service to verify 100 out of 101 severe thunderstorm warnings with non-damaging, penny size hail, and/or tree limbs broken by subsevere (less than 50 kt) gusts, disturbing people at all hours of the night and day with weather alarms for no exceptional reason, when the one event missed is the supercell that demolishes tents at the state fair, killing a dozen people? Must be...after all, that forecaster's numbers are good!
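To see how little the usual scores reveal in a case like this, consider a minimal sketch using the standard definitions of probability of detection (warned events divided by all events) and false alarm ratio (unverified warnings divided by all warnings). The mapping of the scenario onto counts is an assumption made for illustration: 100 verified warnings each covering one event, 1 unverified warning, and 1 event that went unwarned.

```python
def probability_of_detection(warned_events: int, missed_events: int) -> float:
    """POD = warned events / (warned events + missed events)."""
    total = warned_events + missed_events
    if total == 0:
        raise ValueError("no events to score")
    return warned_events / total


def false_alarm_ratio(verified_warnings: int, unverified_warnings: int) -> float:
    """FAR = unverified warnings / (verified + unverified warnings)."""
    total = verified_warnings + unverified_warnings
    if total == 0:
        raise ValueError("no warnings to score")
    return unverified_warnings / total


if __name__ == "__main__":
    # Assumed counts for the hypothetical forecaster described above.
    print(round(probability_of_detection(100, 1), 3))
    print(round(false_alarm_ratio(100, 1), 3))
```

Both numbers come out excellent, and neither gives any hint that the single miss was the deadly one. Each event counts as one, no matter what it did.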

Judging true forecaster skill is, therefore, not a purely statistical process! Verification scores are part of it -- but only a small part. The bulk of quality control of forecasts can only be done by scrutiny of the reasons for the forecasts (for this discussion, warnings) -- based on the short-fuse data used by the forecaster (primarily from the WSR-88D) and his/her application of physical concepts of the ambient weather situation. In other words, managers should determine warning value in addition to accuracy.

Problem is, such evaluation is time-consuming, and can be lazily avoided given the easy alternative. But what is a more meaningful use of supervisory time from the public-service angle: compiling and comparing a pile of often misrepresentative and inconsistently derived verification data, or scientific examination of the forecast rationale?

The emphasis on verification of forecasts, from every NWS forecaster all the way up through management, must be shifted to justification of forecasts. Yes, I recognize that some form of objective verification will probably always be needed, at least for in-house use. However, professional development of forecasters as scientists is better served by greater emphasis on (and time spent) learning -- through training, reading, and research.

As noted in a richly referenced essay by Doswell, in an era of ever-improving model performance, the best forecasters will be the ones who can consistently predict the exceptional events. It is these events that are often the most costly, destructive, and when unforecasted and unwarned, harmful to the image of the NWS. "Cookbook" forecasters are useless here. Instead, the scientific forecaster will more often predict them. Armed with a more thorough understanding of the processes that cause severe weather, they will also be the ones issuing the most timely and accurate warnings. Time spent hunting for reports, and competitively comparing verification stats, is time not used for professional improvement. And, as discussed already, some side effects of warning verification contradict scientific ideals that should guide forecasters. In short, when emphasized too heavily, verification as now practiced ultimately defeats its intended purpose: to improve warnings.

Ultimately, the best service is provided by watches or warnings whose justification is rooted deeply and fully in meteorology. The most common lament I hear about this principle is that it is noble, but not realistic, because of fear of some terrible consequence (such as being fired, which has never happened for making a scientifically sound warning decision!). The bogeyman under the bed is, both in warning forecasting and in childhood, no more than horror fiction. If a watch or warning is meteorologically justified, politics be damned, it will stand up to inquisition. As FDR once said, there is nothing to fear but fear itself.

Pressure? Anyone who can't deal with taking some heat for a professional judgment call, and calmly provide sound meteorological justification for the call, shouldn't be in forecasting. It has been elucidated well (also by Doswell) that forecasting is very difficult on the idealist who wishes to maintain his/her integrity in the face of unethical pressures imposed from above. Through it all, the principle of scientific integrity must not be breached. We've all heard the saying about heat and kitchens. And when working in a hot kitchen there will sometimes be burns, even if every recipe is followed. It goes with the territory.


Many people, inside and outside the NWS, have called for increasing severe standards for hail and wind. Based on personal communication, a large number of NWS forecasters support this; and I have made hail size criteria proposals in the Line Forecasters Technical Action Committee (currently, Director's Advisory Committee in Forecast Operations) system. [DACFO is an internal "suggestion box" procedure for NWS forecasters to submit ideas to management.] Also, Internet meteorology discussion groups have been liberally sprinkled with various suggestions on severe hail or gust criteria changes.

The current hail size criterion, .75 inch, is an archaic threshold based on hail damage to aircraft moving at 200-300 miles per hour (Galway, 1989), not damage to people and objects on the ground -- where warnings are valid. Increasing the severe hail standard is a noble idea supported strongly by empirical evidence and observations. Storm chasers, who have the most experience as a group with large hail and its damage potential, generally recognize that hail smaller than about 1.5 inches does little if any damage to most automotive finishes. Many of the proposed severe hail standard increases hover around that diameter, ranging from 1.25 inches to 2 inches (Hales' significant hail size). BUT -- There is no obvious consensus, and to my knowledge, no proposal at all for hail size criteria has been scientifically tested.

An effort was launched in the late 1990s to review severe criteria, with a key proposal to raise the hail limit to at least one inch. However, the task team suffered from too many cooks spoiling the stew -- a slew of non-scientists (e.g., emergency managers and NWS bureaucrats) unduly influencing what should have been a scientific process: determining a size for falling ice spheres to cause significant home and auto damage. The end result was fitting in the "no surprise" weather service: no surprise! The criterion stayed at .75 inch for reasons which had nothing to do with terminal fall speeds and destructive momentum of ice spheres.

I strongly believe, based on extensive personal observations of hail of varying sizes, that severe hail standards should be raised to at least 1.5 inches diameter. Ideally, this could be done in a scientifically justifiable manner based on formal lab studies on the effects of ice chunks (simulated hail) on various common automotive and construction materials at various speeds. I hope someone has the motivation (I do), time (I don't) and funding (I don't) for such research. Attention all meteorology and structural engineering grad students and advisors: Hail damage is a worthy area of study! After all, the costliest thunderstorm event in U.S. history -- about $2 billion -- was the Mayfest hailstorm in Fort Worth (NOAA, 1995a).

In the meantime, sole usage of significant hail stats would effectively raise severe hail standards for the sake of warning verification and evaluation of public impact -- a major step in the right direction.

Wind damage, on the other hand, is so variable in cause that there may never be agreement on a "severe" standard. To some laypeople, any wind damage is severe, no matter how minor. This is an easy and mindless argument; after all, many of the same people consider heavy rain and lightning "severe." Wind damage surveys, by people with professional credentials to do them, are customarily accepted as the best way to determine causative wind speed, but even they are highly subjective. [Refer for example to the F-scale arguments presented by Doswell and Burgess (1988).]

The standards used in manual SELS [SEvere Local Storms Unit, now Storm Prediction Center (SPC)] logs were customarily 6-inch or larger diameter tree limbs down -- where the ambiguous descriptor "large" was assumed to exceed that threshold -- utility wires down, or any other damage not described as "minor" in the report. SELS personnel used varying degrees of discriminatory judgment; some would not log events judged as "minor," such as shingles removed or patio furniture overturned. Clearly, that approach was grossly infested with subjectivity, despite the noble intent of in-house guidelines. Now, however, "TSTM WIND" or "WIND DAMAGE" may verify a warning, and "TSTM WIND" can mean almost anything. That's no improvement.

Realizing that wind damage is highly variable depending on many factors (such as soil moisture, construction practices, and wind direction), I nonetheless believe it is possible for a small NWS-commissioned group of meteorologists and wind engineers, experienced in storm damage surveys, to develop consistent and comprehensive guidelines for what type of wind damage qualifies as severe. Those guidelines must then be clearly written and communicated to both the public and operational forecasters. This would still bear a heavy, but unavoidable, element of subjective judgment, but it would at least be more useful than present guidelines.


As a professional severe storms forecaster, past recipient of wind and hail damage to my property, and veteran storm spotter and chaser, I have seen both the operational and human sides of the severe storms watch and warning system. Operationally, tens of thousands of severe thunderstorm and tornado warnings and severe weather reports arrive annually, all of which are read as received by national forecasters, as part of each one's duty to conduct continuous weather watch. They are useful for evaluating the evolution of severe thunderstorm situations, and for nearly instant (if indirect) feedback -- revealing the decision processes of the field forecasters who use outlooks and watches as guidance. The NWS warning and verification system directly affects my career, my main hobby (storm chasing), and the personal safety of my family and every citizen in the U.S. I have a vested interest in doing my best to ensure that this system optimally serves taxpayers, which is the motivation behind this effort.

There have been suggestions for the NWS to leave both severe thunderstorm criteria and watch and warning verification procedures unchanged. "Leaving things as they are" is the easy way out and fails to acknowledge the flaws in the warning and verification system, as documented here and in some of the reference material. To take that stance is to ignore the obvious. This head-in-the-sand approach will not lead to improvement of public service, which is the ultimate purpose of the NWS.

Thanks again to meteorologists at SPC, NSSL, OSF (now ROC), and over a dozen NWS field offices, and also emergency managers and concerned citizens on WX-TALK, who provided me with insightful comments and suggestions on severe thunderstorm warning criteria, watch/warning verification, and reliability of severe reports. Thanks also go to scientists at SPC and NSSL for their encouragement and/or constructive comments on this editorial, especially Chuck Doswell, Joe Schaefer, Jack Hales, Rich Thompson and Steve Weiss. Thompson generated some supporting graphics. Many of the suggestions made here are not original; Hales had some similar recommendations in a series of tech memos and conference papers back in the mid/late 80s, and nothing was done about them.

Continued discussion may yield more suggestions, and enhancements to this editorial. I'd like for this web essay and a continued free exchange of ideas to eventually result in a formal set of proposals to those in NWS hierarchy who have the power to institute such changes. First, however, they must acknowledge the existence of the problem. That is unlikely, given the self-serving nature of the system. It is possible to elicit change, however, as Galway (1989) stated:


There has been a great deal of feedback on this essay, and I owe apologies for not posting it sooner. As I am able to dig up and reproduce old notes, I will; in the meantime, more recent feedback is reprinted below. Also, Doswell has posted some warning verification-related feedback in his essay on idealism in the NWS.

During an e-mail exchange, a WCM wrote:

    We already have enough warning decisions being made for the wrong reasons - fear/CYA...concerns over verification, overreliance on algorithms, etc, etc, etc.

What could I say? That statement, from someone who works the front lines of the system, speaks well for itself.

One long-time lead forecaster wrote,

    Yes, you're right, if this weren't the real world. But if I don't warn for _____ County every time there is a red echo on his WeatherTap radar display, their EM [emergency manager] is going to call Congressman ______'s office and b__ch til his face turns blue because they lost the WSO and he thinks we don't know what we're doing. And my boss hears about it. Then I hear about it. So...yes, I agree with you - warn for what needs it, and don't warn for the crap, that is, for every county in my CWA except _____.

Just who is the trained and educated meteorology expert here, the EM or the forecasters? I like EMs as a group very much, and I resent that he's giving them all a bad name through his arrogance and childish vindictiveness. This delusional twit thinks his little fiefdom is more important than every other, and that every single storm is severe. But your boss has some responsibility here too. The tail is wagging the dog, my friend, and you know it.

I sympathize, truly. You're not alone. In other CWAs [county warning areas], too, there is a mayor, sheriff, EM, legislator, Chamber of Commerce president or other "VIP" upset that his area lost a legacy WSO in modernization, or that a tree once fell on a shiny new Lexus he bought for his daughter, without a warning. So he threatens to call Congressman Megabucks if there is ever *not* a warning for a storm. If we live in fear of offending some ignorant, high-falutin', loudmouthed crackpot with political influence, then why not just sit his butt down at the warning chair for an hour or two instead? It is management's responsibility to shield forecasters from such rubbish and let them do meteorology. They need to tactfully stand up to those people and give them a realistic view of the world of weather warning -- not meekly bend over and whimper, "Yes Sir, may I have another," while pressuring forecasters to have a liberal warning policy for just that county. The latter is inconsistent (bad) public service.

Here is the relevant excerpt of a spring 2003 note. It was sent by a wisely conscientious and deeply concerned young newcomer to the profession, stationed at an office in the Midwest. My response follows.

    I really enjoyed the editorials page, and cannot tell you how much I agree with you on so many of the different topics you hit upon. The warning verification essay really hit home, and since our storm data focal point is a verification Nazi, it has caused a lot of debate and discussion within our office. He epitomizes everything that you point out is wrong with verification, and I plan to share the link to your essay with him, and several others in the office. Your essay logically and coherently made so many points that I have been trying to make!

    The letter you wrote to NWSEO president on testing for proficiency was awesome! I cannot tell you how much I agree with you there. Finally, I enjoyed the "dumbest e-mails ever" link and have tried to make certain that this e-mail does not end up being posted on that page!

    Keep up the nice work!

Thanks for writing, and for the good words. Don't worry, no material from you will make the dumb e-mails page! :-)

Yes, please share that essay with anyone you desire. I'm curious what the reaction will be.

It's very important that folks like us, with at least a nugget of ethical conscience and integrity, get under the skin of those who practice scientific misconduct and at least give them notice that *somebody* is watching. The problem is broad, deep and pervasive, however, and will not go away until some time after upper management abandons comparative verification numbers as a lobbying tool and PR sword. This is why regions, along with many (not all!) MICs, WCMs and focal points genuflect at the altar of POD -- FAR and scientific integrity be damned. It's classic, top-down, pressure-driven micromanagement -- the very antithesis of servant leadership.

The verification gestapo, then, isn't just a local-office problem; it's widespread -- a malignancy which metastasized through the NWS years ago and which threatens to implode it, leaving your job and everyone else's at your office automated. If that seems far-fetched, consider this:

  1. If only those severe reports which occur in warnings are logged, and
  2. Almost every thunderstorm reaching one or more of some predetermined algorithmic thresholds is warned, unless over known population voids...
    ...then one can create a Visual Basic program to issue warnings and read and verify reports nationwide, with no human intervention. Joel Myers and his ilk *are* quite capable of this...and they don't even fully realize it yet! Scary thought, eh?
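The statistical side of premise 1 can be illustrated with a minimal Python sketch. It uses the standard contingency-table definitions of probability of detection (POD) and false alarm ratio (FAR). The counts and function names are hypothetical, chosen only for illustration, and treating warnings and events as one simple table is a simplification of actual NWS verification procedures.

```python
def pod(hits, misses):
    """Probability of detection: warned events / all logged events."""
    total = hits + misses
    if total == 0:
        raise ValueError("no events logged")
    return hits / total


def far(hits, false_alarms):
    """False alarm ratio: unverified warnings / all warnings."""
    total = hits + false_alarms
    if total == 0:
        raise ValueError("no warnings issued")
    return false_alarms / total


# HYPOTHETICAL counts for one season at one office.
warned_events = 40       # severe events that occurred inside a warning
unwarned_events = 20     # severe events that occurred with no warning
unverified_warnings = 60 # warnings with no severe report

# Every event logged, warned or not.
honest_pod = pod(warned_events, unwarned_events)

# Only reports inside warnings are logged, so misses vanish from the record.
logged_pod = pod(warned_events, 0)

print(f"POD with all events logged:        {honest_pod:.2f}")
print(f"POD with only warned events logged: {logged_pod:.2f}")
print(f"FAR in either case:                 {far(warned_events, unverified_warnings):.2f}")
```

When unwarned events never enter the log, the miss count is zero by construction and POD equals 1.0 no matter how many storms went unwarned. FAR is unaffected, which is one reason a system judged mainly on POD rewards warning on nearly everything.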


    Amburn, S.A., and P. Wolf, 1996: VIL density as a hail indicator. Preprints, 18th Conf. Severe Local Storms, American Meteorological Society.

    Collins, W.G., 1996: Use of the WSR-88D in waterspout nowcasting. Preprints, 18th Conf. Severe Local Storms, American Meteorological Society.

    Donaldson, R.J., R.M. Dyer, and M.J. Kraus, 1975: An objective evaluator of techniques for predicting severe weather events. Preprints, 9th Conf. Severe Local Storms, American Meteorological Society.

    Doswell, C.A., and D.W. Burgess, 1988: On some issues of United States tornado climatology. Mon. Wea. Rev. 116 (Feb. 1988), 495-501.

    Galway, J.G., 1989: The evolution of severe thunderstorm criteria within the weather service. Wea. Forecasting 4 (Dec. 1989), 585-592.

    Grazulis, T.P., 1993: Significant Tornadoes: 1680-1991. Environmental Films, St. Johnsbury, VT.

    Grenier, L.A., and J.T. Halmstad, 1986: Severe local storm warning verification preliminary procedures. NOAA Tech. Memo. NWS NSSFC-12.

    Hales, J.E., 1987: An examination of the National Weather Service severe local storm warning program and proposed improvements. NOAA Tech. Memo. NWS NSSFC-15.

    _____, 1988: Improving the watch/warning system through use of significant event data. Preprints, 15th Conf. Severe Local Storms, American Meteorological Society.

    _____, 1989: The crucial role of tornado watches in the issuance of warnings for significant tornadoes. Preprints, 12th Conf. Wea. Analysis and Forecasting, American Meteorological Society.

    _____ and D.L. Kelly, 1985: The relationship between collection of severe thunderstorm reports and warning verification. Preprints, 14th Conf. Severe Local Storms, American Meteorological Society.

    Halmstad, J.T., 1996: Severe local storm warning verification for 1995. NOAA Tech. Memo. NWS-SPC-1.

    NOAA (National Oceanic and Atmospheric Administration), 1995a: Storm Data 37 (May 1995).

    NOAA (National Oceanic and Atmospheric Administration), 1995b: The Fort Worth-Dallas Hailstorm/Flash Flood, May 5, 1995. Natural Disaster Survey Report (November, 1995).

    Powell, M.D., and S.H. Houston, 1996: Hurricane Andrew's landfall in South Florida. Part II: Surface wind fields and potential real-time applications. Wea. Forecasting 11 (Sep. 1996), 329-349.

    Sohl, C.J., E.M. Quoetone and L.R. Lemon, 1996: Severe storm warning decisions: Operational impact of multiple radars. Preprints, 18th Conf. Severe Local Storms, American Meteorological Society.

    VanSpeybroeck, K.M., 1998: Personal communication.

    The government [is] extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted, and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman, and he puts down anything he damn well pleases.