Cupping is a game of chance


(AUTHOR’S NOTE, 15 JUN 2024): After speaking with Scott and Gildardo following the publication of this piece, I want to offer clarification, confirmation and correction that Gildardo did in fact initiate the conversation and sending of the lot at no cost to Prodigal, and that Prodigal had no intention of leaving Gildardo without payment. Bearing that in mind, I’d like the exegesis that follows to serve as a cautionary tale—for roasters and producers alike—of the hazards and risks of so-called “direct trade.” At the end of the day, I don’t believe that the coffee that arrived in the U.S. was exactly the coffee that Gildardo believed he sent, nor indeed that it was the coffee Prodigal might have hoped to receive.

This piece is inspired by curiosities surrounding Aviary’s wildcard release XX1: Gildardo Lopez, which is available in limited quantities through the Aviary website. To support this blog and the work I do—and to place this piece in context of the coffee that inspired it—please consider ordering.

I wasn’t there—not as a participant in the conversations, nor a cupper in the room. I didn’t know the process, nor the contracts, nor what was agreed upon or intoned—but from the outside, the story struck me as unusual.

When I saw it, I knew two things immediately:

  1. I wanted the coffee, no matter how it cupped, and
  2. I wanted to write about it.

A few weeks ago, I saw a post from Scott Rao on Instagram: 

A couple of months ago, Gildardo Lopez got in touch with us and offered us some green washed pink bourbon. We were hesitant, as our standard for purchasing is extremely high (as are the prices we pay).

We told Gildardo the coffee would have to cup at 87.5 or higher to be considered. Long story short, Gildardo insisted we take 30kg of green, and only pay if it met our standards.

The coffee cupped consistently at 87.25, which is excellent. Unfortunately, we are committed to our blind cupping and scoring system, and Mark and I have to independently score a coffee 87.5 without reservation to purchase a coffee.

We neither wanted to roast the coffee nor have Gildardo be out of pocket, so I suggested we sell the green and send the proceeds to Gildardo. This way, Prodigal will not make a profit, and Gildardo will receive a very solid price for his green.

It’s uncommon for coffee to transact this way—for a producer to ship an exportable lot without first receiving payment (or a contract) and without trust agents somewhere in the middle to absorb risk: risk to quality, risk of non-payment, risk of non-delivery, risk of product failure. Even in so-called “direct trade” contracts, where a buyer contracts coffee directly from a producer or exporter, in most cases, payment is connected to logistics and provided cash against documents. In other words: payment is received when the coffee is shipped, and after approval of that coffee based on a representative sample. 

In this case, it seemed that payment wasn’t provided ahead of time, but would be provided upon acceptance of the coffee, in full, upon arrival in the U.S. Outside of consignment scenarios—most of which are exploitative by virtue of forcing the person in the supply chain with the least power to shoulder the risk of nonpayment and to finance the carry of that coffee—I’d never heard of this happening before. Typically, buyers contract coffee “subject to approval of sample” (SAS), with a designation of either “pre-ship” (PSS) or “arrival” sample and rejection windows based upon those timelines. In other words, once a buyer with a SAS-PSS contract approves the PSS and the coffee ships, they no longer have rejection rights. In every context I’m aware of, SAS-Arrival contracts are between a roaster and an importer; in that case, even if a roaster rejects the coffee on arrival, the importer agrees to take on the risk of carrying that coffee and finding a new buyer. In either scenario, though, the producer has already been paid—at or before the time of export.

I knew that Gildardo processed his coffee to parchment, but didn’t have a dry mill himself. Before export, he’d need someone to mill and clean the lot, which is something that would need to be paid for. Very few Colombian millers are well equipped to handle milling very small lots, so when I asked the few I know who were equipped to mill microlots if they’d milled this one, I was surprised when they all said no. 

And I knew that FNC would need to collect their payment upon export (they are funded through a six-cents-per-pound tax collection from every coffee export) and that there were costs associated with export that would need to be covered somewhere—so I couldn’t imagine that Gildardo could or would ship coffee without a sales contract. And Gildardo wasn’t an exporter—he would need to be, legally, to move the coffee out of Colombia. So again: I was surprised when I learned that his usual exporter had not facilitated the movement of this coffee out of the country.

And beyond that—Prodigal didn’t find that the coffee met their expectations. But Gildardo produces coffees that win awards. I’d bought a washed Pink Bourbon from Gildardo for Aviary—a coffee apparently similar to the lot in question—which I’d cupped at 88 points. During the Copa de Oro competition in 2023, which Gildardo’s son won and in which Gildardo himself secured second place, I’d cupped numerous coffees from his family at 87+. Scott and Mark are experienced roasters and cuppers who follow an extensive internal methodology, and I believed them that they didn’t find the coffee up to the standard they’d set for Prodigal—but the fact that a washed Pink Bourbon from Gildardo didn’t meet that standard surprised me. By all accounts, it should have. Scott and Mark surely would agree—it’s why they pursued the coffee in the first place.

It all seemed strange to me; so without knowing anything further about the coffee, I left a message on Scott’s post that I would buy the rest. And I did: sight unseen, cup untasted.

I’d heard through the grapevine that there might have been some issues with the coffee—phenol or chlorine flavors in as many as 25% of the cups—but I figured that was a problem I’d address later.

Besides, it didn’t matter to me: I wanted the coffee, and I wanted to write about it.

I didn’t want anyone to lose money and agreed to pay Prodigal for the costs they’d sunk into the coffee thus far—around $5 per pound for shipping, plus $1 per pound for the labor to process and re-pack the coffee for shipping (it had already been broken down for resale to home roasters). It admittedly bothered me that of the $6 per pound I spent on the green coffee, none of it yet had gone to Gildardo—so since I was “borrowing” his asset—the coffee—I proposed to Gildardo and Prodigal that instead of paying them the retail price they’d offered for the green on the website, I’d roast the lot and offer it through Aviary as a retail offering, wiring Gildardo 100% of my profits as payment, which I figured should amount to a much higher price than he’d receive from the other arrangement.

Much of my work over the past few years has been building systems, protocols and competencies to establish integrity in sampling. Trust and certainty surrounding whether or not sample material received will match the coffee upon arrival is at the heart of forward bookings and the way importers operate. It’s also extremely difficult, requiring no small amount of effort, expertise and resources.

Typically, before contracting a coffee, I have a fairly complete picture and understanding of it: cultivar and processing information, farm practices, price transparency, milling practices and cup profile. And usually, by the time a lot arrives at my roastery, months after harvest, in addition to physical evaluations, I’ve roasted and tasted samples of it at least two, but usually three to four, times: first as an offer sample (as harvest progressed) and then as a pre-ship sample prior to departure from its country of origin via boat or plane. Sometimes, a “type” sample indicative of flavors or qualities will precede the offer sample, but when harvest and actual production commence, an “offer” sample is what a producer or exporter will use to shop a lot to prospective buyers and secure a sales contract. Most commonly, this offer sample will represent an incomplete or partial lot, still in processing or volume building stages—but sometimes, as in countries with developed and sophisticated smallholder export markets like Colombia, the offer sample is drawn from a stock lot in parchment at a producer’s house or bodega. Nearly always, though, an offer sample will be unmilled—meaning that it’s drawn from a lot that is still in parchment or dried cherry, needing to be hulled, screened, sorted, and cleaned prior to bagging and shipment. When a pre-ship sample arrives, we’ve come to expect that it fully represents the milled, larger lot—and so base our final purchasing and contract approvals on that standard.

But when the chain of custody breaks down, or if a sample isn’t drawn in a representative manner, the consequences can cascade across continents.

Coffee is inherently heterogeneous, a fact underlying every interaction we have with it. A single shipping container may contain 320 60kg bags; those 60kg may come from one farmer, or contributions from thousands. And each coffee tree, in the best case, may grow enough cherry to yield just a half kilo of finished green coffee, meaning that mathematically, a bag of coffee likely contains seeds from some 120 to 200 trees at a minimum, all with their own unique genetic expression, nutrition, and pollination or cross-pollination. At the roastery, we roast in batches which may include green from different individual bags, opened on different days, pulled from different places in the stack—and each individual seed will develop differently in the roaster based on its physical characteristics and interactions with the heat. After cooling, we take just a small piece of that batch and put it into a smaller retail bag, sold to consumers. Consumers draw from that bag a small amount of coffee—maybe 18 grams at a time—and brew a cup. Each of those doses contains upwards of a hundred individual seeds, drawn from the hundreds of trees and thousands of farmers who may have contributed to the lot.
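The arithmetic behind those figures can be sketched in a few lines of Python. The 60kg bag, 320-bag container, half-kilo yield per tree and 18g dose come from the paragraph above; the 0.13 to 0.18g mass of a single bean is an assumed typical range, so treat the output as a rough estimate rather than a measurement.

```python
# Rough arithmetic for how many trees and seeds stand behind a bag and a dose.
BAG_KG = 60                 # one export bag
BAGS_PER_CONTAINER = 320    # full container
YIELD_PER_TREE_KG = 0.5     # best-case green coffee per tree
DOSE_G = 18                 # typical brew dose
SEED_MASS_G = (0.13, 0.18)  # assumed mass range of a single bean

# Best-case yield per tree gives the minimum number of trees needed
min_trees_per_bag = BAG_KG / YIELD_PER_TREE_KG
min_trees_per_container = min_trees_per_bag * BAGS_PER_CONTAINER

# Heavier beans mean fewer seeds per dose, so reverse the range
seeds_per_dose = tuple(round(DOSE_G / m) for m in reversed(SEED_MASS_G))

print(f"Minimum trees behind one bag: {min_trees_per_bag:.0f}")
print(f"Minimum trees behind one container: {min_trees_per_container:,.0f}")
print(f"Seeds in a {DOSE_G}g dose: {seeds_per_dose[0]} to {seeds_per_dose[1]}")
```

Lower yields per tree, which are common, only push the tree counts higher.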

In specialty, though, we don’t really remark on or consider this heterogeneity and, in fact, assume the opposite—by default. We assume that every cup we brew will taste the same, and we assume—whether or not we explicitly acknowledge it—that every bean in a dose should taste good. We believe that if something tastes “bad” on its own, removing it will improve the whole; but how a compound tastes will depend not only on its intensity and concentration but also its context. I’m not a fan of the sound of banjo on its own—but in the right setting, in the right band, it can enhance texture and rhythm to great effect.

A concise way of describing “specialty coffee” is through three attributes on the SCA scoring sheet: Clean, Sweet and Uniform. While most attributes (flavor, aftertaste, acidity, etc.) are scored on a 6-10 scale based on the way they present in the cup in terms of both quality and intensity, the attributes of Clean, Sweet and Uniform always start with a perfect 10; we assume that they’re there, by virtue of the coffee existing, and then deduct in the absence of those qualities.

In the beautiful fantasy of the SCA score sheet, uniformity is a given.

In the industry standard method, cuppings are conducted using five bowls for each coffee. By increasing the number of ‘looks’ a cupper has at each coffee, we attempt to increase the ability of the cupper to detect potential problems in quality. The thinking is that five cups, drawn from a representative 350 gram sample of a lot, are sufficient to detect defects or uniformity issues ahead of contracting the lot that sample represents.
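To put rough numbers on what five bowls can catch, here is a minimal sketch. It assumes, as a simplification of my own, that each cup independently shows a defect with a fixed probability equal to the defect's instance rate in the lot. Real defects can cluster and cuppers differ in sensitivity, so these figures are optimistic. The function name and the example rates are illustrative.

```python
def p_detect(instance_rate: float, cups: int) -> float:
    """Chance that at least one of `cups` bowls shows the defect,
    assuming each bowl is an independent draw at `instance_rate`."""
    return 1 - (1 - instance_rate) ** cups

# Hypothetical instance rates, evaluated for a standard five-cup table
for rate in (0.01, 0.05, 0.10, 0.25):
    print(f"instance rate {rate:.0%}: "
          f"{p_detect(rate, 5):.1%} chance of showing on a five-cup table")
```

Under these assumptions, a defect present in one cup out of ten slips past a five-cup table more often than it is caught.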

But how representative can those 60 grams on the cupping table, or the 350 grams they were drawn from, really be? And what are the odds that a problem in a coffee, such as chlorine flavors, will show up on a cupping table?

Take the case of a 320 bag, full-container booking. 

It’s pretty common for an importer to receive anywhere from 350-1,000g as a sample against this contract. The Green Coffee Association of New York, whose rules and arbitration most U.S. based specialty importers follow, specifies that the sample should be drawn from at least 10% of the lot, or 32 bags, at random. Usually, however, this is not the case: an offer sample will typically be drawn from a fraction of the bags that are accessible at the time (which is, for full container contracts, permissible under GCA guidance). But to be truly representative, a sample would need to perfectly show the larger lot—faults, defects and all—ideally in a manner consistent with how it would present in the lot statistically.

A container of coffee is anywhere from 270 to 320 bags of coffee (depending on origin) weighing approximately 19,200 kilograms. On the high end, the 1kg sample in question sent to an importer is just a fraction of that: 0.0052% of the lot. But of that 1,000 grams, in all likelihood the importer will cup perhaps 5 bowls, or perhaps 60g of the coffee. That is, the importer will make a purchasing decision based on just 0.0003125% of the lot.
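These shares are easy to check directly. The sketch below uses the figures from the paragraph above (a 19,200kg container, a 1kg sample, five bowls totalling 60g); the variable names are mine.

```python
LOT_G = 19_200 * 1000   # full container, in grams
SAMPLE_G = 1_000        # sample sent to the importer
CUPPED_G = 60           # five bowls at roughly 12g each

sample_share = SAMPLE_G / LOT_G
cupped_share = CUPPED_G / LOT_G

print(f"Sample as share of lot: {sample_share:.4%}")
print(f"Cupped coffee as share of lot: {cupped_share:.7%}")
print(f"Each gram cupped stands in for {LOT_G / CUPPED_G / 1000:,.0f} kg of coffee")
```

Swapping in a smaller lot or a larger sample shows how quickly the share changes.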

It’s homeopathy—but for coffee.

As lot sizes decrease or as sample sizes increase, of course, this gap narrows; but because coffee is heterogeneous, we can’t assume that every part of a lot is the same as every other part, any more than we can assume that the mix of coffees in a blend is uniform or every cup of coffee from a single bag will taste the same. If we apply that same ratio (60g cupped out of 19,200kg) to a typical dose of coffee—say 18g—a “representative sample” cupped by an importer would represent roughly 0.00006g of that dose. Excluding edge cases, that’s far less than the mass of a single coffee bean, which might be anywhere from 0.13 to 0.18g—or roughly 2,300 to 3,200 times greater. In other words, the amount of material drawn to represent full container bookings is very often less representative of the coffee in that container than a single bean is of a typical dose of filter coffee and, as a result, can fail to provide an accurate representation of that larger lot. This is why proper sample collection and drawing protocol is important; a sufficient number of bags must be stabbed, at random, with the remaining coffee kept intact during evaluation and transit.
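The same scaling can be written out as a quick check. This sketch takes the cupped share of a full container (60g out of 19,200kg), applies it to an 18g dose, and compares the result with the 0.13 to 0.18g mass of a single bean. The names are illustrative, and the inputs are the ones used in this piece.

```python
LOT_G = 19_200_000          # full container, in grams
CUPPED_G = 60               # coffee actually cupped
DOSE_G = 18                 # typical filter dose
BEAN_G = (0.13, 0.18)       # mass range of one bean

# Scale the cupped share of the lot down to the size of one dose
equivalent_g = DOSE_G * CUPPED_G / LOT_G

# How many times heavier a single bean is than that equivalent
ratios = tuple(b / equivalent_g for b in BEAN_G)

print(f"Dose-scale equivalent of the cupped sample: {equivalent_g:.6f} g")
print(f"A single bean is {ratios[0]:,.0f} to {ratios[1]:,.0f} times heavier")
```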

And yet: in spite of the mathematics, I do, in fact, have confidence that all else being equal, a representative sample will, within reason, provide a picture of a larger lot (certain attributes of physical analysis notwithstanding—but that’s a topic for another time). If sampling didn’t, in a utilitarian sense, succeed in its aims, the way we buy coffee as roasters and importers simply wouldn’t work. Coffees wouldn’t taste the same between pre-ship and arrival, and we wouldn’t be able to build an understanding of a coffee prior to contracting it.

While processing and milling can enhance uniformity or work against it, the practical reality is that while variation does naturally exist, it’s often below our threshold of detection. And because a single cup of coffee includes so many individual seeds, those variations get blended and erased under the normalizing gaze of a Gaussian distribution.

Certain defects, though, do require us to reframe our understanding of and relationship to “representative” samples.

Because of their devastating impact on cup quality, we treat defects like potato and phenol harshly; this is reflected in scoring outcomes. The way the SCA score sheet is used, if a defect presents in one cup—say phenol or chlorine—its effect on the overall score is significant enough to render a judgment of “not specialty” or “below specialty.” This makes it incredibly important to know that when a defect appears on the table, it accurately reflects not only the presence of but the frequency of that defect in the larger lot, which relies upon that sample being in fact representative.

But, in truth, the presence of these defects on one cupping table may not reflect the reality on another—both ostensibly representative samples drawn from the same lot. This explains the common occurrence where a roaster will report the presence of a defect to their suppliers only to have the supplier respond that they haven’t tasted the defect in any of their own cuppings of the same lot—even drawn from the same larger representative sample.

This unknown makes risk-averse buyers turn away from coffees from certain regions of the world—the Rift Valley in Kenya, DRC, Burundi and Rwanda, for example, as it relates to potato defect—to the detriment of smallholders and exporters in those producing regions.

I spoke at length with Tim Hill back in 2019 or 2020, when he was still with Counter Culture, about some statistical work that Counter Culture had done exploring the notion of sampling and defect detection, specifically looking at potato defect and the accuracy of detection from a given sample size. Counter Culture reported that for a 250 gram sample [nb: this is fairly generous—I typically receive samples of 100-200g], there was just 10% accuracy in determining the presence of potato defect. 

I asked him to clarify what lot size this referred to—a full container or a single bag. “It doesn’t matter—1 bag or 20,000 bags—as long as it is truly a representative sample,” Tim told me.
“5600-5700 grams was statistically how many grams you would need to evaluate a sample to come out with a 90% probability of having a correct instance rate of the defect per lot.” 

In other words, rejecting a coffee in Rwanda or Burundi if you hit one potato cup on the table gives you no better chance at reducing defects versus buying a lot from those same places where you didn’t find it.

“You might as well not cup anything and just flip a coin.”

2,4,6-Trichloroanisole (TCA), the same molecule responsible for “cork taint” in wine, is just one compound implicated in so-called “phenolic” cups. TCA can also present with musty, sulphury or earthy aromas. Like with other molecules, sensitivity to TCA varies between individuals—meaning that while some individuals may detect TCA at low levels in coffee (or wine), others may not. When present with 2,6-dichlorophenol (DCP), however, the synergistic effect of the two compounds can result in an iodine- or chlorine-like flavor in the cup that became known as “Rioy” and which, in the 1990s, some estimates say affected some 20% of coffees from Brazil. Because phenol can’t be detected in coffee until after it has been roasted and brewed, it poses a risk to buyers and can appear in the cup like Russian roulette—seemingly unpredictably, and without warning, even in coffees coming from all-star producers who produce coffees that win awards with boutique cultivars.

Phenolic defect is present in coffees from many parts of the world—I’ve encountered it in coffees from Colombia, Peru, Ecuador, Brazil, Mexico, Honduras, Nicaragua, El Salvador, Guatemala, and Tanzania, to name a few—and the cause of these chlorine- or iodine-like flavors in the cup is still unknown; it may lie in agricultural runoff, hygiene, microbial contamination, or storage conditions.

When contracting coffee from places known to have a history of producing coffee with phenol present, it’s critical, as a buyer, to understand if the lot you’re contracting has phenol present in it—and if so, what the instance rate of that defect is.

In this case, a standard, 250-350g sample, if available, is likely sufficient—provided you cup the entire 250g. Many larger roasters, including Counter Culture, as well as importers that buy from phenol-prone farms or regions (places that have shown the defect in the past) evaluate 20 cups—or roughly 200-250g. If it’s detected in those first 20 cups, it’s typical to evaluate 50 cups total—or up to 600g of coffee—to determine the frequency rate of the defect. “Unless you hit 4 or 5 or 5 for 5 in a coffee,” Tim speculated, “I’m convinced that to get a 90% probability of cataloging phenol rate is 50-100 cups.”
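One way to see why 20 cups can flag a problem but not size it is to look at the uncertainty around an observed rate. The sketch below computes a 90% Wilson score interval for a defect rate estimated from a given number of cups. Treating cups as independent draws and choosing the Wilson interval are my assumptions, not a description of Counter Culture's analysis, and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(hits: int, cups: int, confidence: float = 0.90):
    """Wilson score interval for a defect rate, given `hits` defective
    cups out of `cups` tasted (cups treated as independent draws)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / cups
    denom = 1 + z**2 / cups
    centre = (p_hat + z**2 / (2 * cups)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / cups + z**2 / (4 * cups**2)) / denom
    return centre - half, centre + half

# Hypothetical tables that all observe the same 10% rate
for hits, cups in ((2, 20), (5, 50), (10, 100)):
    low, high = wilson_interval(hits, cups)
    print(f"{hits}/{cups} cups: true rate plausibly {low:.1%} to {high:.1%}")
```

The interval tightens as the cup count grows, which is the intuition behind moving from 20 cups to 50 or more once a defect has been found.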

Through cupping 250 grams of Gildardo’s coffee all roasted using the same inlet-time profile on my Roest L100 with development modulated slightly to explore the cup presentation, I never once detected chlorine. I don’t know that it’s not there—I just didn’t find it. It certainly wasn’t present in 25% of the cups I tasted. Because roughly only 50% of TCA breaks down during the roast, it’s also possible that some roast approaches might retain enough of the compound that cuppers sensitive to the molecule would detect it.
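A clean run of cups rules out less than it might seem. As a rough sketch, assume those 250 grams worked out to about 20 cups (an assumption based on roughly 12g per bowl) and that each cup is an independent draw, ignoring differences in roast and in taster sensitivity. The code below asks how likely a clean run would be if the defect really sat in 25% of cups, and what instance rate would still be compatible with tasting nothing.

```python
CUPS = 20               # assumed: about 250g at roughly 12g per bowl
RUMOURED_RATE = 0.25    # instance rate reported through the grapevine

# Chance of 20 clean cups in a row if the rumoured rate were true
p_all_clean = (1 - RUMOURED_RATE) ** CUPS

# Largest instance rate that still leaves at least a 5% chance of a clean run,
# i.e. a one-sided 95% upper bound after zero detections
upper_bound = 1 - 0.05 ** (1 / CUPS)

print(f"Chance of {CUPS} clean cups at a 25% rate: {p_all_clean:.2%}")
print(f"Rate still compatible with {CUPS} clean cups: up to {upper_bound:.1%}")
```

Under these assumptions, a 25% instance rate looks very unlikely, but a much lower rate cannot be excluded.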

My scores for the coffee, which I cupped blind on tables with offers from Mexico, Colombia, Brazil and Ethiopia, ranged from 85 to 86.75, never crossing 87 points. There were certainly uniformity issues, and some of the cups presented with a vegetal, earthy, astringent herbality common to Timor hybrids like older root stock from some phenotypes of Castillo (what George Howell refers to as “tail of the devil”). In its best presentation, at certain very light roasts, it showed with a blueberry tone and clean, laser-like citric acidity. But there were clearly issues of cup variation and uniformity within this 30kg lot.

It didn’t, to me, taste like a Pink Bourbon, and, to be honest—it didn’t taste like Gildardo’s coffee. It had some characteristics of the coffee I’d bought from him in March, but almost tasted like a diluted version.

If I had any sample material from the pre-ship available, I could cup it against that reference and verify if something went amiss. This practice isn’t perfect—coffees change over time, and change through milling—but like a person with a different haircut, wearing a different outfit, they still will look like themselves even if certain attributes are turned up or down in volume. 

Because the transaction was direct and didn’t pass through export or import quality labs and wasn’t milled by known, trusted service providers, it’s not possible for me to do more than speculate about what happened. And without a pre-ship sample to compare to using my usual analysis, I couldn’t be sure: but I do suspect that it’s possible that this isn’t the lot that Prodigal originally sampled.

I suspect that the lot was cleaned by a local dry mill that, rather than specializing in specialty coffee and microlots, typically processes large volumes of commercial coffee. These mills operate most efficiently at volumes of 50+ bag runs because of the amount of changeover that may occur in the machines and the time it takes to adjust and clean them. A 30kg microlot, in most contexts, is too small and would need to be milled and sorted by hand. While I don’t believe the lot was switched in its entirety (though I have heard from my own trusted agents and exporters that some dry mills and exporters have a reputation for this), I do believe that some other coffee was mixed in, inadvertently or otherwise, whether to bulk the lot or because of equipment that wasn’t cleaned between runs.

It happens.

Every cup I tried was clean and sweet—whatever cultivar it turned out to be. But even if the 5th cup in a cupping did taste like pool water—so what?

But at the end of the day, it doesn’t matter: I committed to the lot, and I committed to Gildardo that I would sell it. I didn’t have any right of refusal—and there was no other buyer for that coffee, anyway. Coffees and their producers sink or swim based on their reputation—and a reputation of producing coffee with defects generates a certain calculus of risk in buyers’ minds and threatens not only the marketability of their coffees but also a perception of trust. When we operate without trust anchors in transactions without appropriately accounting for risk, we leave those with the least power in a transaction exposed; in the interests of supporting a producer whose coffee I love, I was more than happy to take on that risk. 

So I flipped the coin, and I bought the coffee.


That's just, like, your opinion, man

  1. Bravo! What an excellent, intelligent, and informative piece, thoroughly enjoyed it. I really appreciate you sharing these, it helps the entire coffee community grow and learn!

  2. Great story! I was impressed how fast the coffee sold out. I slowly pulled my phone out of my pocket when the text came in, and by the time I reached the Shop page the coffee had sold out. Not even enough time for a buyer to flip a coin. Lots of others wanted the coffee too! Thank you for filling in lots of details since the original Instagram post.

  3. As usual a great read with some great insights!
    I’m just confused with one point:
    If Prodigal cupped from the 30kg batch they received how could it not be the same coffee?
