Last year, we stumbled upon a paper, “The phytochemical diversity of commercial Cannabis in the United States”, which utilizes a joint state Leafly database of cannabis collected across a diverse set of growers, techniques (outdoor, partial deprivation, indoor), strains, and geographies across the United States. One observation from the analysis was the significant variation in both THC and terpene profile across samples submitted under the same strain name, and a statistically insignificant relationship between a strain's classification as Sativa, Indica, or Hybrid and its chemical profile. In other words, the way we classify the cannabis experience and market specific strains means very little. This is a fact nearly any experienced cannabis consumer could attest to without large data sets. On the flip side, when DankeSuper started five years ago, the significant structured data used to reach this conclusion simply did not exist. We lacked the appropriate tools to begin factoring the cannabis experience.
The fact that the Leafly+ University program exists serves as a clear testament to the benefits of legalization, and the regulatory reporting requirements that subsequently come with it. The aggregation of data has the potential to transform our understanding of cannabis, and benefit everyone across the value chain from producer to consumer. For example, the paper’s authors were able to perform Principal Component Analysis (PCA) to create cannabis cluster groups based on cannabinoid ratios & terpene profile as exhibited below.
Terpene Principal Component Analysis & Spearman's Correlation
The application of such a factor-based approach has implications for the classification of commercial Cannabis, design of animal and human research, and regulation of consumer marketing, areas which today are often divorced from the chemical reality of the Cannabis-derived material they wish to represent. The ability to merge large data sets of cannabis phytochemical profiles with user experience offers the foundation of a framework that enables the consumer to clearly navigate the products within the space while protecting the proprietary nature of the industry’s work. In the paper, terpene clusters are utilized to create distinct principal components. These clusters serve as a partial explanatory tool for a particular phenotype's effects, and help to provide a phytochemical benchmark for understanding and classifying the cannabis experience.
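To make the idea of a terpene principal component concrete, here is a minimal sketch of how a first component could be extracted from a handful of terpene profiles. It is not the paper's pipeline: the sample values, the four-terpene list, and every function name are hypothetical, and it uses plain power iteration on the covariance matrix rather than whatever statistical package the authors relied on.

```python
import math

# Hypothetical terpene content (% by weight); NOT data from the paper.
TERPENES = ["myrcene", "limonene", "caryophyllene", "terpinolene"]
SAMPLES = [
    [0.90, 0.20, 0.25, 0.02],  # myrcene-heavy
    [0.80, 0.25, 0.30, 0.03],
    [0.20, 0.60, 0.35, 0.02],  # limonene/caryophyllene-heavy
    [0.25, 0.55, 0.40, 0.04],
    [0.15, 0.10, 0.10, 0.70],  # terpinolene-heavy
    [0.20, 0.12, 0.08, 0.65],
]


def center(rows):
    """Subtract each column's mean so every terpene is centered on zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered_rows):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered_rows)
    d = len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def first_component(cov_matrix, iterations=500):
    """Approximate the top eigenvector of cov_matrix by power iteration."""
    vec = [1.0 / (i + 1) for i in range(len(cov_matrix))]  # arbitrary start
    for _ in range(iterations):
        vec = matvec(cov_matrix, vec)
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


centered = center(SAMPLES)
cov = covariance(centered)
pc1 = first_component(cov)

# Rayleigh quotient: the variance captured along pc1.
eigenvalue = sum(p * q for p, q in zip(pc1, matvec(cov, pc1)))
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Project each centered sample onto pc1 to get its score.
scores = [sum(c * p for c, p in zip(row, pc1)) for row in centered]

for name, loading in zip(TERPENES, pc1):
    print(f"{name:>14}: {loading:+.3f}")
print(f"variance explained by PC1: {explained:.1%}")
print("sample scores on PC1:", [round(s, 3) for s in scores])
```

The loadings show which terpenes pull a sample along the component, and the scores are what would be plotted to reveal cluster groups. On a real data set, a library routine such as scikit-learn's `PCA` would be the more practical choice; the hand-rolled version here only keeps the arithmetic visible.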
Mapping Cannabis’ Phytochemical Experience
The obvious benefit of accurate strain identification and an appropriate phytochemical hierarchy is that it provides a framework for understanding the cannabis experience. For example, below are principal component word clouds and radar plots capturing the words consumers commonly use for two well known strains classified as “Sativa” & “Indica” under the commonly used framework. Yet the average terpene content of over 1,200 total samples of the two strains fails to cleanly fit within the principal component groups provided above. In fact, average humulene, caryophyllene, limonene, and pinene content across the two strains proves virtually identical, within 0.1% of each other, providing little clarity as to what makes each strain distinct. Is there a real distinction, or is it a placebo effect? The Indica clearly ranked “relaxed” as a top quality, matching our Indica expectations, while with the Super Lemon Haze it ranked below “random words”. So, in praising the power of big data and trashing current nomenclature, did we just invalidate ourselves?
Not exactly. If anything, it simply highlights the importance of what we call “Experiential Data Science”: the combination of real world understanding (i.e. getting stoned) with real world phytochemical data to arrive at a consistent understanding. Over the past year, DankeSuper has developed a fanatical interest in the dominant terpene from the originally referenced cluster groups: Terpinolene. We first became curious about Terpinolene in the summer of 2022 upon trying Hawaiian Punch. When we sampled it, we experienced a “Happy”, “Energetic”, “Uplifted”, “Euphoric”, “Creative” random word high. It was unlike any other strain we had previously experienced. We used the same database as “The Phytochemical Diversity of Commercial Cannabis…” to attempt to gain an understanding of the experience and came upon Terpinolene. Since then, we have developed a rigorous framework built around terpene profile, with Terpinolene serving as our creative, euphoric wildcard. Super Lemon Haze & Blueberry exhibit material differences in two terpene concentrations: Terpinolene & Myrcene.
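The strain comparison described above can be sketched in a few lines: average each terpene per strain, then flag only the terpenes whose averages differ by more than a chosen threshold. Everything here is an assumption made for illustration. The lab values are invented, the function names are hypothetical, and the "0.1%" is read as 0.1 percentage points by weight.

```python
from statistics import mean

# Hypothetical lab results (% by weight); illustrative values only.
LAB_RESULTS = {
    "Super Lemon Haze": [
        {"terpinolene": 0.62, "myrcene": 0.15, "limonene": 0.21,
         "caryophyllene": 0.18, "pinene": 0.09, "humulene": 0.05},
        {"terpinolene": 0.58, "myrcene": 0.19, "limonene": 0.25,
         "caryophyllene": 0.20, "pinene": 0.07, "humulene": 0.06},
    ],
    "Blueberry": [
        {"terpinolene": 0.02, "myrcene": 0.70, "limonene": 0.24,
         "caryophyllene": 0.22, "pinene": 0.10, "humulene": 0.06},
        {"terpinolene": 0.04, "myrcene": 0.64, "limonene": 0.20,
         "caryophyllene": 0.18, "pinene": 0.08, "humulene": 0.07},
    ],
}

# Assumed reading of "within 0.1%": 0.1 percentage points by weight.
THRESHOLD = 0.1


def mean_profile(samples):
    """Average each terpene across all samples of one strain."""
    return {t: mean(s[t] for s in samples) for t in samples[0]}


def material_differences(profile_a, profile_b, threshold=THRESHOLD):
    """Terpenes whose mean content differs by more than the threshold."""
    return sorted(t for t in profile_a
                  if abs(profile_a[t] - profile_b[t]) > threshold)


slh = mean_profile(LAB_RESULTS["Super Lemon Haze"])
blueberry = mean_profile(LAB_RESULTS["Blueberry"])
differing = material_differences(slh, blueberry)

for terpene in sorted(slh):
    gap = abs(slh[terpene] - blueberry[terpene])
    flag = "material" if terpene in differing else "similar"
    print(f"{terpene:>14}: SLH {slh[terpene]:.3f} vs Blueberry "
          f"{blueberry[terpene]:.3f} (gap {gap:.3f}, {flag})")
```

With these made-up numbers, only terpinolene and myrcene clear the threshold, which mirrors the pattern described above. A real analysis would also need to account for sample-to-sample variation within each strain, not just the averages.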