<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DATA BOOST INDUSTRY: Articles - ENG]]></title><description><![CDATA[English Version of the newsletter]]></description><link>https://databoostindustry.substack.com/s/articles-eng</link><image><url>https://substackcdn.com/image/fetch/$s_!xdt-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a1ea5d3-4571-4656-8bce-4ffaafff3300_700x700.png</url><title>DATA BOOST INDUSTRY: Articles - ENG</title><link>https://databoostindustry.substack.com/s/articles-eng</link></image><generator>Substack</generator><lastBuildDate>Sat, 02 May 2026 10:36:44 GMT</lastBuildDate><atom:link href="https://databoostindustry.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[DATA BOOST]]></copyright><language><![CDATA[fr]]></language><webMaster><![CDATA[databoostindustry@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[databoostindustry@substack.com]]></itunes:email><itunes:name><![CDATA[DATA BOOST Industry]]></itunes:name></itunes:owner><itunes:author><![CDATA[DATA BOOST Industry]]></itunes:author><googleplay:owner><![CDATA[databoostindustry@substack.com]]></googleplay:owner><googleplay:email><![CDATA[databoostindustry@substack.com]]></googleplay:email><googleplay:author><![CDATA[DATA BOOST Industry]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[ENG - The "OMICS" project - Part 2 : The Kawasaki case study]]></title><description><![CDATA[The Pharm'AI Company - Project #6]]></description><link>https://databoostindustry.substack.com/p/eng-the-omics-project-part-2-the</link><guid 
isPermaLink="false">https://databoostindustry.substack.com/p/eng-the-omics-project-part-2-the</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Thu, 28 Nov 2024 17:21:03 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/def38196-9f71-42ca-a04f-16017830a9f9_2423x1629.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the first article of this series, I had the opportunity to <a href="https://databoostindustry.substack.com/p/eng-the-omics-project-part-1-the">explain the main concepts related to omics data</a>. Now, we will apply these concepts and analyze genetic data step by step.</p><h3><strong>Some Information About the Data Source</strong></h3><p>The raw data we will use to better understand Kawasaki disease<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> comes from a database hosted on the <a href="https://www.ncbi.nlm.nih.gov/">NCBI</a> (National Center for Biotechnology Information). </p><p>The <a href="https://www.ncbi.nlm.nih.gov/geo/">GEO</a> database (Gene Expression Omnibus) is a repository that contains various biological samples for which high-throughput gene expression analyses have been performed, i.e., measurements of gene activity. Analytical techniques such as microarrays or genomic chips enable large-scale screening, detection, and quantification of these variations.</p><h4><strong>Initial Data Format</strong></h4><p>The raw data is presented as a data table of approximately 31MB with the following characteristics:</p><ul><li><p><strong>135 columns</strong> representing samples taken from patients diagnosed with Kawasaki disease (KD) or belonging to a control group.</p></li><li><p><strong>60,075 rows</strong> representing distinct genetic sequences, primarily genes. 
Each gene is referenced by a unique ID ("<em>ensembl_gene_id</em>").</p></li><li><p><strong>Over 8 million values</strong>, corresponding to the quantified amount of mRNA for each gene in each sample. The unit of measurement is TPM (Transcripts Per Million), a normalized metric used to quantify gene expression. This value reflects gene expression levels, as messenger RNA acts as the intermediary between genes and proteins.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dlzH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dlzH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 424w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 848w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 1272w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dlzH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png" width="1012" height="379" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:1012,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dlzH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 424w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 848w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 1272w, https://substackcdn.com/image/fetch/$s_!dlzH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54af6275-0e29-4ac7-92de-75b095b008a9_1012x379.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Raw Data Table</figcaption></figure></div><h4><strong>Some Preliminary Data Cleaning Steps</strong></h4><p>To process this dataset, a few relatively simple actions are necessary:</p><ol><li><p>The <strong>"genename" column</strong> is removed because its information does not always align with official nomenclature.</p></li><li><p>The data matrix is <strong>transposed</strong>, meaning rows and columns are swapped.</p></li><li><p>Numeric values are <strong>rounded</strong>, significantly reducing computational power requirements. 
Since only the magnitude matters, precise decimals are unnecessary.</p></li></ol><p>The resulting dataset is accompanied by a metadata table indicating whether patients are healthy or affected by the disease.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l3Oj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l3Oj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l3Oj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:470715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l3Oj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!l3Oj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14536ad6-6d60-4ee6-a198-6599b2a348c5_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Processed Data Table</figcaption></figure></div><h3><strong>Processing Data with the DESeq2 Package</strong></h3><p><strong>DESeq2</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> is a bioinformatics tool used to analyze differences in gene activity between two or more populations. It helps identify which genes exhibit significant changes in expression and are more or less active under different biological conditions.</p><p>DESeq2 integrates the dataset, normalizes it for comparability, and applies a variety of tests to detect significant differences between the studied populations. This process is known as <strong>Differential Expression Analysis (DEA)</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. 
</p><p>Without delving into all the technical details, DESeq2 uniformly applies numerous statistical hypothesis tests to compare the expression of each gene between healthy and diseased patients. For each test, a <strong>p-value</strong> is generated, indicating whether the result is statistically significant. The strength of this package lies in its statistical assumptions tailored to genetic data (e.g., negative binomial distribution) and its automated corrective measures to ensure robust and comparable results.</p><div><hr></div><p>Here&#8217;s what we obtain after running DESeq2:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_GRH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_GRH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 424w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 848w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 1272w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_GRH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png" width="1456" height="495" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:495,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:690602,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_GRH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 424w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 848w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 1272w, https://substackcdn.com/image/fetch/$s_!_GRH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434c267b-5f99-4e8d-a823-c951ef7bb35d_2560x871.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">DESeq Dataframe</figcaption></figure></div><p>While the results may seem complex, there are two key values to focus on:</p><ol><li><p><strong>Adjusted p-value (padj):</strong> Indicates whether the difference between healthy and diseased patients is statistically significant. 
This helps rank genes to highlight the most discriminating ones.</p></li><li><p><strong>Log2FoldChange:</strong> Shows whether a gene is overexpressed (positive values) or underexpressed (negative values) in diseased patients, and by how much on a log2 scale: a value of 1 corresponds to a doubling of expression.</p></li></ol><div><hr></div><p>With this information, we have all the necessary tools to begin a differential analysis of Kawasaki disease.</p><h2><strong>The OMIX Buddy Application</strong></h2><p>Alongside this article, you can:</p><ul><li><p><strong>Visualize and interact with the graphs</strong> presented here using the web application I created for the occasion: <a href="https://the-omix-buddy-6e7889e55137.herokuapp.com/">The OMIX buddy &#129302;</a>!</p></li><li><p>Access the complete codebase as a <a href="https://github.com/arnaud-dg/omics-project/blob/main/notebooks/Omics%20project%20Notebook.ipynb">notebook</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>, for those who want to follow the detailed steps.</p></li></ul><h2><strong>First Level of Analysis: At the Gene Level</strong></h2><p>When studying the impact of a disease on gene expression, two complementary approaches can be taken:</p><ol><li><p><strong>Individual gene analysis,</strong> which identifies specific genes whose expression changes significantly.</p></li><li><p><strong>Functional family analysis,</strong> which examines how these changes cluster into broader biological categories.</p></li></ol><h4><strong>Individual Analysis: The Volcano Plot</strong></h4><p>The <strong>Volcano plot</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> is a commonly used representation in bioinformatics for Differential Expression Analyses.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!zVQK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zVQK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zVQK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:760629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zVQK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!zVQK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dc34f20-73f0-4fca-8f94-c2d24169cbf1_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Volcano Plot applied to Kawasaki disease</figcaption></figure></div><p>A volcano plot is a scatterplot that displays:</p><ul><li><p><strong>On the x-axis:</strong> The <strong>Log2FoldChange</strong> value. Genes on the right are overexpressed in the diseased group compared to the control group, while genes on the left are underexpressed.</p></li><li><p><strong>On the y-axis:</strong> The <strong>adjusted p-value.</strong> Typically, the negative logarithm of the p-value (usually base 10) is plotted, so that very small p-values spread out and the plot is easier to read. Genes higher on this axis show more statistically significant differences between healthy and diseased patients; the size of the difference is read on the x-axis.</p></li></ul><p>The genes of greatest interest are represented within the two oval regions at the top left and right of the plot. 
Generally, thresholds are defined on both axes to filter the population and retain only the most relevant data points, which is why some points appear grayed out in the visuals.</p><div><hr></div><p>For Kawasaki disease, I selected <strong>8 genes of interest</strong> for further analysis. These are promising candidates as potential biomarkers of the disease.</p><h4><strong>Functional Family Analysis</strong></h4><p>This second approach leverages a complementary database called <strong>GO (Gene Ontology)</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. Researchers and biomedical scientists contribute to public databases that catalog genes, their families, and their functions, collectively known as <strong>ontologies.</strong></p><p>The Gene Ontology database organizes genetic information into three key dimensions:</p><ol><li><p><strong>Biological processes:</strong> The phenomena genes are involved in (e.g., inflammatory response or cell division).</p></li><li><p><strong>Molecular functions:</strong> The specific activities of proteins encoded by these genes (e.g., binding to DNA or catalyzing chemical reactions).</p></li><li><p><strong>Cellular components:</strong> The cellular structures or compartments where the genes are active (e.g., cell membrane or mitochondria).</p></li></ol><p>By cross-referencing the previous dataset with this new source of information, we can identify enriched functional categories and associated genes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6R8x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!6R8x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 424w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 848w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 1272w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6R8x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png" width="1053" height="508" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:508,&quot;width&quot;:1053,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70061,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6R8x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 424w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 848w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 1272w, https://substackcdn.com/image/fetch/$s_!6R8x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a8ebba0-7d16-4dc4-a0bd-97c20de1496a_1053x508.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4zyq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4zyq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 424w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 848w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4zyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png" width="1456" height="1164" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1164,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4zyq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 424w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 848w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!4zyq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ea6d801-a090-4e70-a6e7-0c851801f827_1489x1190.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In the case of Kawasaki disease, it appears that the gene families primarily involved are those associated with protein binding and RNA binding. Underexpression is predominantly observed in the cytosol and nucleoplasm.</p><h2><strong>Conclusion</strong></h2><p>In this article, we explored the power of omics data analysis, transforming over 8 million raw data points into two clear and insightful visualizations. This approach not only enhances our understanding of Kawasaki disease but also opens up promising therapeutic avenues.</p><p>In the next article, we will take the analysis one step further by focusing on the patient level. We will explore how data analysis can revolutionize clinical trial design by identifying more homogeneous patient groups, paving the way for truly personalized medicine.</p><p><em>&#8230; As usual &#8230; Let&#8217;s rock with data! 
&#129516;&#129304;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>GEO - An AI-guided invariant signature places MIS-C with Kawasaki disease in a continuum of host immune responses - <a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5392646">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5392646</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Bioconductor.org - DESeq2 technical documentation - <a href="https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html">https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>EMBL's European Bioinformatics Institute - <strong><a href="https://www.ebi.ac.uk/training/online/courses/functional-genomics-ii-common-technologies-and-data-analysis-methods/rna-sequencing/performing-a-rna-seq-experiment/data-analysis/differential-gene-expression-analysis/#:~:text=Differential%20expression%20analysis%20means%20taking,expression%20levels%20between%20experimental%20groups.">Differential gene expression analysis</a></strong></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Notebook - Omics data analysis - <a 
href="https://github.com/arnaud-dg/omics-project/blob/main/notebooks/Omics%20project%20Notebook.ipynb">https://github.com/arnaud-dg/omics-project/blob/main/notebooks/Omics%20project%20Notebook.ipynb</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>HTG molecular - Understanding Volcano Plots - <a href="https://www.htgmolecular.com/blog/2022-08-25/understanding-volcano-plots">https://www.htgmolecular.com/blog/2022-08-25/understanding-volcano-plots</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Gene Ontology Resource - <a href="https://geneontology.org/">https://geneontology.org/</a></h5><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - The "OMICS" project - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #6]]></description><link>https://databoostindustry.substack.com/p/eng-the-omics-project-part-1-the</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-the-omics-project-part-1-the</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Thu, 21 Nov 2024 20:43:14 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8748f572-d4d1-453d-a96c-9b8abc1455ac_2165x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This sixth project is different from the others: it is not about Machine Learning but rather focuses on analyzing human biological data. We will therefore remain in the realm of <strong>Data</strong>, or more precisely, &#8220;Big Data.&#8221;</p><p>I must admit right away that this educational project is a bit of a challenge for me, as biology is a science with its own concepts and jargon. I will therefore make an extra effort to simplify and clarify. 
I encourage you to stick with me, even if you're unfamiliar with the pharmaceutical R&amp;D sector, as I genuinely believe the tools we will explore are worth knowing.</p><h2>The Challenges of Modern Medicine</h2><p>Pharmaceutical R&amp;D faces several significant challenges:</p><ul><li><p>Around 90% of drug candidates never make it to market as approved medicines<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p></li><li><p>The average ROI for pharmaceutical R&amp;D has dropped from 10% in 2010 to 1.8% in 2020<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p></li><li><p>Development costs keep rising sharply, with the failure of a molecule in Phase III potentially costing up to $1 billion.</p></li></ul><p>Moreover, identifying new active molecules to treat increasingly complex diseases has become more challenging. Far from the "simple" diseases caused by a single predominant factor, most conditions that resist modern medicine are multifactorial (e.g., cancers, autoimmune diseases, neurodegenerative disorders).</p><p>To identify new active molecules, it is therefore imperative to better understand the biological mechanisms underlying these pathologies and pinpoint relevant therapeutic targets. However, these conditions often involve hundreds of genes, thousands of proteins, and interacting environmental factors.</p><p>How can we identify the root causes of a disease when thousands of parameters are at play? How can we develop effective treatments when each patient reacts differently?</p><p>Guess what? &#8230; Data Science is a powerful tool for the job!</p><h2>Omics Data: A 360&#176; View of Life</h2><h4>What is it?</h4><p>The term &#8220;omics&#8221; covers the fields of biology whose names end with the suffix &#8220;-omics.&#8221; 
The four main components are as follows<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>:</p><ol><li><p><strong>Genomics:</strong> The study of our genes, which define our genetic heritage. The data involved consists of DNA sequences.</p></li><li><p><strong>Transcriptomics:</strong> The study of gene expression and activity. The data involved consists of mRNA sequences (the same type of molecule used in some modern vaccines).</p></li><li><p><strong>Proteomics:</strong> The study of proteins, the true workers of our cells. The data includes the amino acid sequences that make up proteins, as well as the 3D structure of these proteins.</p></li><li><p><strong>Metabolomics:</strong> The study of the biochemical processes and metabolic cascades occurring in our bodies, through the small molecules (metabolites) they produce.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Svq3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Svq3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 424w, https://substackcdn.com/image/fetch/$s_!Svq3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 848w, https://substackcdn.com/image/fetch/$s_!Svq3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Svq3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Svq3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png" width="850" height="608" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc092817-6245-4ebb-a038-7042b2393059_850x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:850,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:354659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Svq3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 424w, https://substackcdn.com/image/fetch/$s_!Svq3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 848w, https://substackcdn.com/image/fetch/$s_!Svq3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Svq3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc092817-6245-4ebb-a038-7042b2393059_850x608.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The 4 types of omics data</figcaption></figure></div><h2>Specific Data Science Challenges in R&amp;D</h2><p>I could mention many points, but I have limited myself to three main challenges:</p><p><strong>1. 
The Overwhelming Volume of Data</strong><br>A human genome is about 3.2 billion base pairs long, which corresponds to approximately 200 GB of raw sequencing data (about 50 HD movies!)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><p>...and as we saw earlier, this is just the first layer of data: the genome. Imagine the sheer volume of omics data associated with just one patient!</p><p>Even a simple bacterium like <em>E. coli</em> has about 4,300 genes, amounting to a sequence of about 4.6 million nucleotides.</p><p><strong>2. The Unique Structure of Omics Data</strong><br>Unlike traditional datasets, omics datasets are often &#8220;horizontal&#8221;: they have a limited number of samples or patients but a very large number of variables. This is sometimes called &#8220;fat data&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, as these datasets contain numerous, often redundant, variables.</p><p>Specific tools that reduce dataset dimensionality or select the relevant variables are therefore necessary to extract insights efficiently.</p><p><strong>3. Specialized Domain Language</strong><br>As mentioned in the introduction, working with omics data requires effort to understand:</p><ul><li><p><strong>Biological jargon and concepts</strong> (e.g., co-expression networks, Differential Expression Analysis).</p></li><li><p><strong>Technical jargon related to data</strong>, which is often stored in formats and standards unfamiliar to the general public<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p></li></ul><h2>Why Are Omics Data Essential?</h2><p>In the drug development cycle, data from biological profiling plays a crucial role, offering significant advantages at every stage of the process. 
These benefits are clearly illustrated in the infographic below<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Zkyk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Zkyk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Zkyk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg" width="658" height="1213.4134615384614" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2685,&quot;width&quot;:1456,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:476100,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Zkyk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Zkyk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F520f7a23-d639-43cc-a6de-937508f5704f_1700x3135.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Biological data are useful:</p><p><strong>1. At the disease level, in basic research, for:</strong></p><ul><li><p><strong>Understanding pathological mechanisms:</strong> Omics data deciphers the complex biological processes involved in disease development.</p></li><li><p><strong>Identifying therapeutic targets:</strong> It helps discover specific genes, proteins, or biological pathways that could be targeted by new therapies.</p></li></ul><p><strong>2. 
At the patient level, in clinical research, for:</strong></p><ul><li><p><strong>Precise patient stratification:</strong> By segmenting populations based on their biological characteristics, clinical trials can recruit better-suited participants, reducing trial duration and cost while increasing the chances of success.</p></li><li><p><strong>Optimized clinical study monitoring through predictive biomarkers:</strong> These markers enable earlier and more accurate evaluation of the effectiveness of tested treatments.</p></li></ul><div><hr></div><p>Omics data is not limited to basic and clinical research. It also has applications in related fields such as cell culture bioprocesses, pharmacovigilance, and personalized medicine.</p><h2>Our Practical Application: Exploring Kawasaki Disease</h2><p>Kawasaki disease<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> is a rare pediatric illness that causes inflammation of blood vessels and can lead to severe cardiac complications. 
This disease recently gained attention when clusters of Kawasaki-like illness (now described as MIS-C) were reported in children during the COVID-19 pandemic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-jg1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-jg1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-jg1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg" width="584" height="338.30668414154655" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:442,&quot;width&quot;:763,&quot;resizeWidth&quot;:584,&quot;bytes&quot;:55769,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-jg1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-jg1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae01ba58-b014-4bd1-b454-4e116b3d5e18_763x442.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The exact cause of this disease remains poorly understood.</p><p>I chose this condition to walk you through an analysis of biological data from patients affected by Kawasaki disease. Anonymized data is available online via the <a href="https://www.ncbi.nlm.nih.gov/">National Library of Medicine</a> <a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>. </p><p>Our project will focus on creating a graphical interface to explore different genes associated with Kawasaki disease to identify potential biomarkers of interest.</p><blockquote><p><em><strong>A biomarker is a measurable characteristic that indicates normal biological processes, disease processes, or responses to interventions, with qualified biomarkers aiding drug development by reducing uncertainty in regulatory decisions within a defined context of use. 
- FDA</strong></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a></p></blockquote><p>I usually try to estimate the return on investment (ROI) of each project at &#8220;The Pharm&#8217;AI Company.&#8221; In this case, I won&#8217;t be able to, because calculating ROI in R&amp;D is quite complex. Identifying biomarkers helps reduce the risk of project failure, improve patient selection, and/or shorten project duration, but these benefits are very difficult to translate into hard cash.</p><h2>Conclusion</h2><p>With the basics laid out, the next article will introduce &#8220;<strong>The Omix Buddy</strong>,&#8221; my application for exploring the dataset related to Kawasaki disease.</p><p>I will present visualizations that help researchers quickly identify genes of interest and understand their interactions, transforming mountains of data into <strong>biological insights</strong>.</p><p>In the meantime&#8230; Let&#8217;s rock with Biology &amp; Data! 
&#129516;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Dominique Limet - La R&amp;D dans l&#8217;industrie pharmaceutique - <a href="https://www.lajauneetlarouge.com/la-rd-dans-lindustrie-poharmaceutique/">https://www.lajauneetlarouge.com/la-rd-dans-lindustrie-poharmaceutique/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Cercle k2 - Analyse des d&#233;penses de sant&#233; - <a href="https://cercle-k2.fr/etudes/une-analyse-2010-2020-des-investissements-et-des-depenses-de-14-societes-pharmaceutiques">https://cercle-k2.fr/etudes/une-analyse-2010-2020-des-investissements-et-des-depenses-de-14-societes-pharmaceutiques</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Farshbaf, Alieh &amp; Zare, Reza &amp; Mohajertehran, Farnaz &amp; Mohtasham, Nooshin. (2021). New diagnostic molecular markers and biomarkers in odontogenic tumors. Molecular Biology Reports. 48. <a href="https://www.researchgate.net/publication/350668866_New_diagnostic_molecular_markers_and_biomarkers_in_odontogenic_tumors">10.1007/s11033-021-06286-0</a>. </h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Rei Robinson - <strong>How big is the human genome? 
- </strong><a href="https://medium.com/precision-medicine/how-big-is-the-human-genome-e90caa3409b0">https://medium.com/precision-medicine/how-big-is-the-human-genome-e90caa3409b0</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Khan, Faridoon &amp; Albalawi, Olayan. (2024). Analysis of Fat Big Data Using Factor Models and Penalization Techniques: A Monte Carlo Simulation and Application. Axioms. 13. <a href="https://www.researchgate.net/publication/381600667_Analysis_of_Fat_Big_Data_Using_Factor_Models_and_Penalization_Techniques_A_Monte_Carlo_Simulation_and_Application">10.3390/axioms13070418</a>. </h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Griffin, Philippa &amp; Khadake, Jyoti (2017). Best practice data life cycle approaches for the life sciences. F1000Research. 6. 1618. <a href="https://www.researchgate.net/publication/319414682_Best_practice_data_life_cycle_approaches_for_the_life_sciences">10.12688/f1000research.12344.1</a>. 
</h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>AMGEN - <strong>Applying the Power of Omics to R&amp;D - </strong><a href="https://www.amgen.com/stories/2021/04/applying-the-power-of-omics-to-r-and-d">https://www.amgen.com/stories/2021/04/applying-the-power-of-omics-to-r-and-d</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Osmosis - Kawasaki disease - <a href="https://www.osmosis.org/learn/Kawasaki_disease">https://www.osmosis.org/learn/Kawasaki_disease</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><h5>NCBI - An AI-guided invariant signature places MIS-C with Kawasaki disease in a continuum of host immune responses - <a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE178491">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE178491</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><h5>FDA - What is a biomarker - <a href="https://www.fda.gov/drugs/biomarker-qualification-program/about-biomarkers-and-qualification#what-is">https://www.fda.gov/drugs/biomarker-qualification-program/about-biomarkers-and-qualification#what-is</a></h5><p></p></div></div>]]></content:encoded></item>
<item><title><![CDATA[ENG - Machine Learning based Predictive Maintenance - Part 3 : The modeling part]]></title><description><![CDATA[The Pharm'AI Company - Project #5]]></description><link>https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive-32d</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive-32d</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Wed, 13 Nov 2024 22:04:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After completing the <a href="https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive-92a">analysis of a real dataset together</a>, it&#8217;s now time to move on to the third and final step: training a Machine Learning model to create a predictive tool that turns data into action!</p><p>The application prototype is available <a href="https://preventivemaintenanceaeronautics-fjonxrqfymxmggytlrqfew.streamlit.app/">here</a>.</p><h2><strong>Redefining the Problem</strong></h2><p>Let&#8217;s recap some key points from the second article to quickly recontextualize the problem 
from a &#8220;Data&#8221; perspective.</p><p>We monitored a hundred turbines over an extended period. This monitoring allowed us to collect numerical readings from over twenty sensors (temperature, pressure, vibrations, etc.), as well as an assessment of failure risk through the TTF (Time-To-Failure).</p><p>The goal of our model is to predict the risk level of failure based on new input data, assigning it to one of the following three categories:</p><ul><li><p><strong>Class 0</strong>: Normal operation (no risk)</p></li><li><p><strong>Class 1</strong>: Caution zone (moderate risk, maintenance recommended)</p></li><li><p><strong>Class 2</strong>: Critical zone (high risk of imminent failure)</p></li></ul><p>In terms of modeling, the task is to predict one of three categories. In Data Science, this is referred to as a multi-class classification problem.</p><p>From an operational perspective, the most critical category for us is <strong>Class 1</strong>, as this is the ideal moment to trigger maintenance actions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TiMr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TiMr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!TiMr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!TiMr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!TiMr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TiMr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99847,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TiMr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!TiMr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!TiMr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!TiMr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73153f9d-962e-4770-a755-e2b8bab964c8_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Class definitions</figcaption></figure></div><h2><strong>Choosing the Right Algorithm</strong></h2><p>It&#8217;s 
often challenging to determine in advance which algorithm will best suit a given problem. No single algorithm outperforms all others in every scenario. In Machine Learning, this idea is captured by the &#8220;No Free Lunch&#8221; theorem formulated by David Wolpert<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><blockquote><p><em><strong>There is no free lunch! &#8212; David Wolpert</strong></em></p></blockquote><p>In practice, each tool has its strengths and weaknesses depending on the context.</p><p>During the development phase, it&#8217;s common to evaluate several algorithms in parallel and choose the one offering the best trade-off against the project&#8217;s criteria, such as prediction quality, model explainability, inference time, and even energy consumption.</p><p>To keep this project simple, I chose not to compare a range of algorithms and instead to focus directly on a single method: 
<em>Random Forest</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>This algorithm typically provides a good balance between performance and computational efficiency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!47q4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!47q4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 424w, https://substackcdn.com/image/fetch/$s_!47q4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 848w, https://substackcdn.com/image/fetch/$s_!47q4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 1272w, https://substackcdn.com/image/fetch/$s_!47q4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!47q4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg" width="1456" height="1113" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1113,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;random forest diagram&quot;,&quot;title&quot;:&quot;random forest diagram&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="random forest diagram" title="random forest diagram" srcset="https://substackcdn.com/image/fetch/$s_!47q4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 424w, https://substackcdn.com/image/fetch/$s_!47q4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 848w, https://substackcdn.com/image/fetch/$s_!47q4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 1272w, https://substackcdn.com/image/fetch/$s_!47q4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef8434f-e64a-4c62-bcd9-e0f95d625c36_820x627.svg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Random Forest Algorithm Illustration</figcaption></figure></div>
<p>For a detailed explanation of how decision-tree-based algorithms work&#8212;particularly Random Forest&#8212;please refer to the link and video<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> below.</p><div id="youtube2-QQN5NjJtUcc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;QQN5NjJtUcc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/QQN5NjJtUcc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Training the Model</strong></h2><p>Training a 
Machine Learning algorithm involves several steps that together determine how reliable the predictions are. Below is a summary of these steps; the full code is available in this <a href="https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_2_Modeling.ipynb">notebook</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><h4><strong>Step #1: Data Preparation</strong></h4><p>Our initial dataset shows a marked imbalance: 17,531 observations correspond to normal operation, while only around 1,500 observations each represent the moderate-risk and high-risk states. This is to be expected: machines are more often in good working condition than close to failure.</p><p>To address this imbalance, which could bias our model toward the majority class, I used an oversampling technique called <strong>SMOTE</strong> (Synthetic Minority Over-sampling Technique), which generates synthetic examples of the under-represented classes to balance the dataset. Sensor measurements were also standardized (NB: Standardization is unnecessary for tree-based approaches like Random Forest but is essential for methods like PCA).</p><h4><strong>Step #2: Creating a Performance Metric</strong></h4><p>No model is 100% accurate; every algorithm makes some prediction errors. For this project, my priority was to minimize prediction errors in <strong>Class 1</strong>, as this category triggers maintenance actions.</p><p>The choice of performance metric is crucial because it guides the training phase: the model is tuned to maximize it. I developed a custom performance metric (<em>recall * 0.7 + accuracy * 0.3</em>) that penalizes misclassifications in Class 1 more heavily.</p><p>This strategy does lead to some false positives, and therefore to somewhat more maintenance than strictly necessary. 
However, this approach aligns with the project&#8217;s priority: avoiding missed opportunities for preventive maintenance.</p><h4><strong>Step #3: Training the Model</strong></h4><p>As with most algorithms, training involves an optimization process, here a search for the Random Forest hyperparameters that perform best.</p><div><hr></div><p><em>A brief technical explanation for Data Science enthusiasts (apologies to the uninitiated </em>&#129327;<em>):</em></p><p>Hyperparameter optimization was conducted using scikit-learn&#8217;s <strong>GridSearchCV</strong>, which exhaustively tests every combination in a grid of parameter values, including:</p><ul><li><p>Tree depth (from 4 to 8 levels),</p></li><li><p>Splitting criteria (Gini or entropy),</p></li><li><p>Minimum number of samples per split.</p></li></ul><p>A classic train/test split combined with 5-fold cross-validation and random permutations makes the results more robust.</p><div><hr></div><h4><strong>Step #4: Performance Evaluation</strong></h4><p>The results are promising: our model achieves <strong>87% recall</strong> for detecting turbines in Class 1 and, most importantly, misses no critical Class 2 cases. The model can be overly cautious, misclassifying healthy turbines as at risk, but this trade-off is acceptable in a safety-first context.</p><p>Detection isn&#8217;t perfect, but that&#8217;s not a major concern. 
Class 1 spans about twenty cycles, so the model gets roughly twenty chances to flag a machine before it enters the red zone, which makes it nearly impossible to overlook.</p><p>The remaining risks are a new failure mode not captured during training or a sudden, unexpected breakdown.</p><h4><strong>Conclusion</strong></h4><p>An analysis of the most influential sensors highlights the importance of:</p><ul><li><p><strong>Sensors s7 and s11</strong> (measuring compressor discharge pressure),</p></li><li><p><strong>Sensor s4</strong> (monitoring the low-pressure turbine outlet temperature),</p></li><li><p><strong>Sensor s12</strong> (tracking the fuel flow ratio).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J9Et!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J9Et!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 424w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 848w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 1272w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!J9Et!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png" width="1390" height="989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:989,&quot;width&quot;:1390,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!J9Et!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 424w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 848w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 1272w, https://substackcdn.com/image/fetch/$s_!J9Et!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c29f31c-bb9a-4bd6-ac05-77b8ba7d974a_1390x989.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Feature Importance in Failure Prediction</figcaption></figure></div>
<p>These sensors play a key role in our model&#8217;s predictions. Beyond modeling, it would be beneficial to incorporate these measurements into reference documentation and training programs as &#8220;sentinels&#8221; for monitoring equipment health.</p><h2><strong>From Prototype to Production</strong></h2><h4><strong>Solution Architecture</strong></h4><p>Transitioning from a prototype to an operational solution often represents a major challenge in data science projects. 
To make the model accessible and usable, I opted for a simple yet effective architecture based on an interactive web application developed with <strong>Streamlit</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> (<em>NB</em>: I&#8217;ve previously discussed this framework in earlier projects).</p><h4>Webapp overview</h4><p>The user interface developed with Streamlit includes the following features:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gDYL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gDYL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gDYL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:546687,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!gDYL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!gDYL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4540e39-4e24-4820-bd20-6fbfe03be406_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<ol><li><p><strong>Simulating new sensor data:</strong> </p><p>A button generates random new configurations of values outside the training dataset.</p></li><li><p><strong>3D Visualization of Results:</strong></p><p>A PCA-based visualization shows the machine&#8217;s new state relative to historical data.</p></li><li><p><strong>Prediction Panel:</strong> </p><p>For each new configuration, the app displays the probabilities for each risk level, along with clear maintenance action recommendations.</p></li></ol><h2>Potential Improvements</h2><h4><strong>Real-Time Integration</strong></h4><p>The data used for this project was batch-collected. However, real-time sensor data is becoming the standard for such applications. 
Moving to real-time data would involve:</p><ul><li><p>Ingesting continuous data streams.</p></li><li><p>Preprocessing signals to reduce noise in the input data.</p></li><li><p>Using Deep Learning algorithms capable of detecting anomalies in time series, coupled with an automatic alert system.</p></li></ul><p>This predictive maintenance project clearly calls for a follow-up project with continuous data. (NB: This is on my to-do list &#128515;!)</p><h4><strong>Enriching the Dataset</strong></h4><p>The tool we&#8217;ve created is purely mathematical, based solely on sensor data. However, we could build more sophisticated tools by combining the raw data with:</p><ul><li><p>CMMS data (Computerized Maintenance Management System, i.e. the detailed intervention history),</p></li><li><p>Functional analyses of equipment,</p></li><li><p>Incident reports and failure descriptions,</p></li><li><p>Maintenance technicians&#8217; and engineers&#8217; expertise,<br>...and more.</p></li></ul><p>This would make it possible to predict not only the failure risk but also failure modes, root causes, and priority corrective actions, resulting in a much smarter tool capable of guiding the type and content of predictive maintenance interventions.</p><h4><strong>Advanced Monitoring System</strong></h4><p>The model was built using data from a single collection campaign. However, machines are dynamic systems, and we likely didn&#8217;t capture their full complexity during an isolated measurement campaign. A predictive tool that is valid at a specific time may quickly become obsolete.</p><p>To address this, it&#8217;s crucial to implement a data collection system paired with performance monitoring, so that the model can be retrained whenever needed.</p><h2>Conclusion</h2><p>We&#8217;ve just explored, step by step, the deployment of a Machine Learning project in an industrial context.
This project, though educational and intentionally simplified, highlights the many technical and strategic challenges of creating a predictive maintenance solution.</p><p>To go further, initiatives such as integrating continuous data or leveraging CMMS databases could not only improve the quality of the predictions but also provide deeper insight into failure modes and help prioritize interventions.</p><p>Feel free to share your feedback and experiences in the comments!</p><p><em>&#8230; Let&#8217;s rock with data! &#129304;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Wikipedia - No free lunch theorem - <a href="https://en.wikipedia.org/wiki/No_free_lunch_theorem">https://en.wikipedia.org/wiki/No_free_lunch_theorem</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Spotfire - What is a random forest?
- <a href="https://www.spotfire.com/glossary/what-is-a-random-forest">https://www.spotfire.com/glossary/what-is-a-random-forest</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Moran Gautherot - Tout savoir sur le random forest - <a href="https://www.youtube.com/watch?v=QQN5NjJtUcc">https://www.youtube.com/watch?v=QQN5NjJtUcc</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Notebook - <a href="https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_2_Modeling.ipynb">https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_2_Modeling.ipynb</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Streamlit Website - <a href="https://streamlit.io/">https://streamlit.io/</a></h5>
</div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Machine Learning based Predictive Maintenance - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #5]]></description><link>https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive-92a</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive-92a</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Tue, 05 Nov 2024 16:44:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously, we explored the <a href="https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive">theoretical foundations of predictive maintenance</a> and its importance in modern industry. We examined how this approach differs from traditional corrective and preventive maintenance methods by leveraging data analysis to anticipate failures.</p><p>In this new article, we will now translate these theoretical considerations into a practical case.</p><h2>Description of the Dataset Used</h2><p>In my projects, I consistently seek out datasets related to the pharmaceutical industry.
In this case, as I couldn't find a suitable pharmaceutical dataset, I opted to illustrate this predictive maintenance project with a dataset from the aeronautics sector<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. This choice is not problematic, as the principles of predictive maintenance easily transcend industry boundaries; methodologies developed for one field can often be successfully applied to others.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TvqM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TvqM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TvqM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg" width="613" height="344.8125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:432,&quot;width&quot;:768,&quot;resizeWidth&quot;:613,&quot;bytes&quot;:82663,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TvqM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TvqM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F658cb0a1-a4a7-40f7-9e3f-4975ac14f121_768x432.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>The dataset I selected tracks a fleet of about one hundred aircraft turbines, each equipped with 21 different sensors. A measurement is recorded for each of these 21 sensors at every operating cycle of a turbine.
Thus, for each turbine, we obtain a multivariate time series.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DXCZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DXCZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 424w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 848w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 1272w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DXCZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png" width="468" height="429.14716981132074" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:530,&quot;resizeWidth&quot;:468,&quot;bytes&quot;:35440,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!DXCZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 424w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 848w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 1272w, https://substackcdn.com/image/fetch/$s_!DXCZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b950715-8fca-45ac-8a5c-fe7aa88401e0_530x486.png 1456w" sizes="100vw"></picture>
</div></a><figcaption class="image-caption">List of Dataset Parameters: Aircraft Engine</figcaption></figure></div><p>In addition to sensor values, the training dataset includes a metric for each record called <strong>Time-To-Failure (TTF)</strong>, which is the number of cycles remaining until the next failure occurs.</p><p>The Time-To-Failure is a numerical value, and there are two ways to handle it:</p><ul><li><p>We can retain its numeric nature, creating a <strong>regression problem</strong> where we aim to predict the number of cycles remaining until the next failure.</p></li><li><p>Alternatively, we can transform this measure into a categorical variable with risk thresholds, creating a <strong>classification problem</strong>.</p></li></ul><p>I chose the second option as it allows us to build simple, understandable operational rules.
I thus transformed TTF into three categories:</p><ul><li><p><strong>High failure risk</strong> (if TTF &lt; 30 cycles),</p></li><li><p><strong>Moderate failure risk</strong> (if 30 &#8804; TTF &lt; 50 cycles),</p></li><li><p><strong>No failure risk</strong>, representing standard machine operation (if TTF &#8805; 50 cycles).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pc0v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pc0v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pc0v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99847,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!pc0v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!pc0v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabf34576-cdff-4294-8fff-3e05aad80588_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The core objective of our project is to predict, based on sensor configurations, the risk zone in which the equipment is operating.</p><h2>Analyzing the Dataset</h2><h4>What is Exploratory Data Analysis?</h4><p>Exploratory Data Analysis (EDA) is one of the foundational steps of any data science project.</p><p>The tools we use to conduct EDA are often very simple: histograms, graphs, tables, etc.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>
However, this step, sometimes underestimated, is essential because it allows us to:</p><ul><li><p>Understand the structure and nature of our data,</p></li><li><p>Visualize the distributions of different parameters,</p></li><li><p>Check data quality, enabling us to clean missing or aberrant values as needed,</p></li><li><p>Identify potential correlations between different sensors.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xxme!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xxme!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xxme!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xxme!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xxme!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xxme!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg" width="564" 
height="629.565" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:893,&quot;width&quot;:800,&quot;resizeWidth&quot;:564,&quot;bytes&quot;:73862,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!xxme!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xxme!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xxme!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xxme!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a3d985e-53cc-4ad7-9fe1-fb0bb544b162_800x893.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Indeed, the Data Scientist&#8217;s role when modeling a phenomenon is not limited to deploying a &#8220;standard&#8221; approach; it is always necessary to characterize the data and adapt preprocessing and modeling steps accordingly.</p><p><strong>NB</strong>: The EDA process can be a bit tedious, and the purpose of this article is not to delve into it exhaustively. I will provide selected highlights here; the full EDA is available in this <a href="https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_1_Exploratory_Data_Analysis.ipynb">notebook</a> on my GitHub repository<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><h4>1.
Data Types</h4><p>A good first step in any analysis is to display the variable types and basic statistics of the dataset in tabular form.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eBl1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eBl1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 424w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 848w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 1272w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eBl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png" width="343" height="530"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:530,&quot;width&quot;:343,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20859,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!eBl1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 424w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 848w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 1272w, https://substackcdn.com/image/fetch/$s_!eBl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f364398-4ee6-4532-994a-c999e0df3f5e_343x530.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Data types</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MFPS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MFPS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 424w, https://substackcdn.com/image/fetch/$s_!MFPS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 848w, 
https://substackcdn.com/image/fetch/$s_!MFPS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 1272w, https://substackcdn.com/image/fetch/$s_!MFPS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MFPS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png" width="996" height="138" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:138,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17513,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!MFPS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 424w, https://substackcdn.com/image/fetch/$s_!MFPS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 848w, 
https://substackcdn.com/image/fetch/$s_!MFPS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 1272w, https://substackcdn.com/image/fetch/$s_!MFPS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed304c9d-0152-4660-9bc1-3c7447ad5b96_996x138.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Basic statistics example</figcaption></figure></div><p><strong>Conclusions:</strong></p><ul><li><p>Our dataset is relatively clean and contains no missing values that need to be addressed.</p></li><li><p>The sensors provide numerical data, with no categorical variables.</p></li><li><p>Some parameters do not vary over the cycles, so I chose to exclude them from the study.</p></li></ul><h4>2. Univariate Statistical Analysis</h4><p>Univariate analysis is a method used to examine the distribution of a variable in order to understand its fundamental characteristics, such as mean, median, or standard deviation. It allows us to detect trends, dispersions, and outliers, providing an overview of the parameter&#8217;s behavior in isolation.</p><p>Since the dataset contains only continuous variables, I displayed them as histograms. 
Here is the result:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hRf5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hRf5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 424w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 848w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 1272w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hRf5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png" width="1456" height="1167" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1167,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hRf5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 424w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 848w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 1272w, https://substackcdn.com/image/fetch/$s_!hRf5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c16e8db-1e39-42a5-9d97-ee506cef3f21_1982x1589.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Distributions of the different variables</figcaption></figure></div><p><strong>Conclusions:</strong></p><ul><li><p>The dataset does not contain significant outliers, except perhaps for sensor s6, which shows some atypical values.</p></li><li><p>Some variables appear to follow a normal distribution, while others show strong asymmetry (see Kurtosis and Skewness scores<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>). Preprocessing, including correction (such as logarithmic transformation) and normalization, will be necessary before proceeding to the modeling phase.</p></li></ul><h4>3. 
Bivariate Statistical Analysis</h4><p>Bivariate analysis examines the relationship between two variables to determine whether an association or correlation exists between them.</p><p>The correlation matrix below shows the correlation between each pair of parameters.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xDZh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xDZh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 424w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 848w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xDZh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png" width="1069" height="1002" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:1069,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!xDZh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 424w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 848w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!xDZh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57cc801c-b9e6-4770-bc36-36768a6138c0_1069x1002.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Correlation Clustermap of our Dataset</figcaption></figure></div><p><strong>Conclusions:</strong></p><ul><li><p>Several parameters show a strong correlation (positive in red or negative in blue) with our target variable, Time-To-Failure.</p></li><li><p>Some parameters are strongly correlated with each other; for instance, s9 and s14 actually measure the same physical quantity.</p></li></ul><p>Strong correlations between explanatory variables are known as multicollinearity. This matters for the next phases of the project, because several modeling algorithms are sensitive to it, which can degrade both their performance and their interpretability.</p><p>To address it, it is often necessary to select the most representative features (variables) or to apply dimensionality reduction techniques.</p><h3>4. 
Principal Component Analysis</h3><p>Given the complexity of a dataset with more than twenty columns, Principal Component Analysis (PCA) becomes a valuable tool for visualizing the underlying phenomena. <strong>NB</strong>: I previously covered this tool in the article on <a href="https://databoostindustry.substack.com/p/eng-cpv-40-part-3-multivariate-analysis">CPV 4.0</a>.</p><p>PCA<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> is a statistical method that, through projection into a new mathematical space, simplifies data complexity while retaining its essence. In other words, it replaces the original variables with a smaller number of new ones, the principal components, which are combinations of the originals chosen to capture as much of the variation in the data as possible.</p><p>As the graph below (a &#8220;scree plot&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>) shows, just three principal components capture nearly 70% of the total variance, with the first component alone explaining 53%.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WMRl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WMRl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 424w, https://substackcdn.com/image/fetch/$s_!WMRl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 
848w, https://substackcdn.com/image/fetch/$s_!WMRl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 1272w, https://substackcdn.com/image/fetch/$s_!WMRl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WMRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png" width="593" height="385.97633136094674" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:845,&quot;resizeWidth&quot;:593,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!WMRl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 424w, https://substackcdn.com/image/fetch/$s_!WMRl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 848w, 
https://substackcdn.com/image/fetch/$s_!WMRl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 1272w, https://substackcdn.com/image/fetch/$s_!WMRl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2ab846e-d6ce-4eb6-950f-78e5223bc5b5_845x550.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>By applying a 2D PCA, the results are as follows:</p><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hk8J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hk8J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 424w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 848w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 1272w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hk8J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png" width="1456" height="731" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hk8J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 424w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 848w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 1272w, https://substackcdn.com/image/fetch/$s_!hk8J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd806799b-45ea-49c4-8bfb-91c495fc9ba4_1562x784.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">PCA Diagram: Left by Risk Categories, Right by TTF</figcaption></figure></div><p>We observe a scatter plot, with each point corresponding to a different data record. 
When overlaying the failure risk on these visuals, a pattern emerges from the data:</p><ul><li><p>Records corresponding to optimal turbine performance (green) cluster distinctly on the left side of the chart.</p></li><li><p>In contrast, records corresponding to pre-failure states (orange and red) gradually shift to the right.</p></li></ul><p>Looking more closely, we even see distinct trajectories, revealing different degradation behaviors depending on the turbines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZG-m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZG-m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 424w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 848w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 1272w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ZG-m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png" width="403" height="385.9643328929987" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:725,&quot;width&quot;:757,&quot;resizeWidth&quot;:403,&quot;bytes&quot;:288091,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ZG-m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 424w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 848w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 1272w, https://substackcdn.com/image/fetch/$s_!ZG-m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd0af97-718b-40c4-9b0b-f95c2d7eff8a_757x725.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">PCA - Failure modes</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FyXc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FyXc!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 424w, 
https://substackcdn.com/image/fetch/$s_!FyXc!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 848w, https://substackcdn.com/image/fetch/$s_!FyXc!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 1272w, https://substackcdn.com/image/fetch/$s_!FyXc!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FyXc!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif" width="1268" height="984" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:1268,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7746296,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!FyXc!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 424w, 
https://substackcdn.com/image/fetch/$s_!FyXc!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 848w, https://substackcdn.com/image/fetch/$s_!FyXc!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 1272w, https://substackcdn.com/image/fetch/$s_!FyXc!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5bb31f3d-5786-4820-809a-cb7f49447697_1268x984.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">3D representation - gif</figcaption></figure></div><div><hr></div><h2>Conclusion</h2><p>The exploratory analysis phase highlighted and confirmed the potential of our data while validating the feasibility of the project&#8217;s goal: detecting early signs of failure. The dimensionality reduction achieved through PCA provided a clear view of equipment degradation dynamics.</p><p>In the final article in this series, we will explore how to model our process and create a concrete prediction tool capable of alerting maintenance teams before critical failures occur.</p><p>Feel free to check out the complete notebook on my GitHub to discover all EDA steps and share your feedback!</p><p><em>&#8230; Let&#8217;s rock with data &#8230; &#127928;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>NASA - Turbofan Engine Degradation Simulation - <a href="https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/">https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Wenxin - A comprehensive guide to Exploratory Data Analysis - <a href="https://medium.com/@WenxinZhang98/exploratory-data-analysis-83bfb1f17dc5">https://medium.com/@WenxinZhang98/exploratory-data-analysis-83bfb1f17dc5</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Notebook EDA 
Predictive Maintenance - <a href="https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_1_Exploratory_Data_Analysis.ipynb">https://github.com/arnaud-dg/Preventive_Maintenance_Aeronautics/blob/main/notebooks/Aircraft_Predictive_Maintenance_1_Exploratory_Data_Analysis.ipynb</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>NIST - Measures of Skewness and Kurtosis - <a href="https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm#:~:text=Skewness%20is%20a%20measure%20of,relative%20to%20a%20normal%20distribution">https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm#:~:text=Skewness%20is%20a%20measure%20of,relative%20to%20a%20normal%20distribution</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Universit&#233; de Sherbrooke - Analyse en Composantes Principales - <a href="https://spss.espaceweb.usherbrooke.ca/analyse-en-composantes-principales-2/">https://spss.espaceweb.usherbrooke.ca/analyse-en-composantes-principales-2/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Wikipedia - Scree plot - <a href="https://en.wikipedia.org/wiki/Scree_plot">https://en.wikipedia.org/wiki/Scree_plot</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Machine Learning based Predictive Maintenance - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #5]]></description><link>https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-machine-learning-based-predictive</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Tue, 29 Oct 2024 16:35:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this new article series, I&#8217;d like to discuss an example of Machine Learning usage through a project dedicated to predictive maintenance of equipment.</p><h2><strong>Some Contextual Elements</strong></h2><p>Let&#8217;s start with the basics: what is predictive maintenance?</p><p>In industry, there are various types of maintenance<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>:</p><ul><li><p><strong>Corrective maintenance</strong>, which addresses and fixes a malfunction that occurs on a machine. 
This type of maintenance is considered reactive, meaning intervention happens after the equipment has failed.</p></li><li><p><strong>Preventive maintenance</strong>, which involves performing checks and/or replacing parts to anticipate and prevent breakdowns. This preventive approach can be executed in different ways:</p><ul><li><p>Regularly and systematically over time, referred to as systematic or scheduled preventive maintenance.</p></li><li><p>Adaptively, based on data generated by the machine, known as condition-based or predictive maintenance.</p></li></ul></li></ul><div><hr></div><p>With these definitions in mind, the stakes become clear: the timing of maintenance actions is a trade-off.</p><p>Actions scheduled too close together lead to unnecessary work, while actions spaced too far apart increase the risk of machine breakdowns.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M6ai!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M6ai!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 424w, https://substackcdn.com/image/fetch/$s_!M6ai!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 848w, https://substackcdn.com/image/fetch/$s_!M6ai!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 1272w, 
https://substackcdn.com/image/fetch/$s_!M6ai!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M6ai!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png" width="506" height="405.4096385542169" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:665,&quot;width&quot;:830,&quot;resizeWidth&quot;:506,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Refer to caption&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Refer to caption" title="Refer to caption" srcset="https://substackcdn.com/image/fetch/$s_!M6ai!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 424w, https://substackcdn.com/image/fetch/$s_!M6ai!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 848w, https://substackcdn.com/image/fetch/$s_!M6ai!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 1272w, 
https://substackcdn.com/image/fetch/$s_!M6ai!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F614d0a2b-8fa7-46d2-bea5-d9c325d28338_830x665.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Maintenance plans for reactive (RM), preventive (PM), and predictive (PdM) maintenance</figcaption></figure></div><p>The advantage of predictive maintenance is that maintenance actions are triggered by factual data, and only when the risk of a breakdown is significant. 
The challenge, therefore, is to determine the optimal alert thresholds to minimize overall maintenance costs<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!892y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!892y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 424w, https://substackcdn.com/image/fetch/$s_!892y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 848w, https://substackcdn.com/image/fetch/$s_!892y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 1272w, https://substackcdn.com/image/fetch/$s_!892y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!892y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png" width="531" height="467.21852387843705" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:691,&quot;resizeWidth&quot;:531,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Refer to caption&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Refer to caption" title="Refer to caption" srcset="https://substackcdn.com/image/fetch/$s_!892y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 424w, https://substackcdn.com/image/fetch/$s_!892y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 848w, https://substackcdn.com/image/fetch/$s_!892y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 1272w, https://substackcdn.com/image/fetch/$s_!892y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0d537f4-703d-40f4-84ee-32bc8adf98dd_691x608.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Comparison of RM, PM and PdM on the cost and frequency of maintenance work</figcaption></figure></div><h2><strong>What Data to Collect?</strong></h2><p><strong>The Key Role of Measurement Sensors</strong></p><p>The core of predictive maintenance lies in collecting and analyzing data generated by equipment in operation.</p><p>Manual data entry is not suited to this purpose. The volume of information to be entered is considerable, and manual entry would introduce response delays, which runs counter to the very purpose of the tool.</p><p>Therefore, deploying such projects requires an initial technological investment. Machines must be equipped with IIoT (Industrial Internet of Things) sensors connected to a Cyber-Physical System (CPS) that centralizes the data in a database. 
Generally, SCADA systems<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> or tools like Data Historian<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> are used for collecting, storing, and aggregating machine data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sFQj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sFQj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 424w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 848w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 1272w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sFQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png" width="544" height="410.88" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:850,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:175179,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sFQj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 424w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 848w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 1272w, https://substackcdn.com/image/fetch/$s_!sFQj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc6afeda-5c5c-4c4f-bb49-1db400cc03f7_850x642.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Comparison of CPS and IoT; supporting Industry 4.0 development</figcaption></figure></div><h4>What Types of Sensors to Use?</h4><p>As shown in the diagram below<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, there are many different types of sensors that can be used to characterize a machine's operation. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2M7Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2M7Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2M7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg" width="1456" height="444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60993,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2M7Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2M7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7833e8c5-83d0-47eb-b589-d7e7d46c4383_2560x781.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Common ones include:</p><ul><li><p>Vibration and acoustic (ultrasound) sensors,</p></li><li><p>Power or electrical consumption sensors,</p></li><li><p>Temperature and pressure sensors.</p></li></ul><p>However, there&#8217;s no real limit: any operational parameter of the equipment can feed a Machine Learning model (e.g., rotation, torque, flow rate). 
Discrete events, such as states, alarms, and logs, are also relevant variables to collect and analyze.</p><h4><strong>How Often to Analyze the Data?</strong></h4><p>Once sensor data is cleaned and stored, there are two main methods for analysis:</p><ul><li><p><strong>Batch Processing</strong>: Aggregating data by cycle, day, or campaign, etc., and using a predictive Machine Learning model to make predictions based on these discontinuous data points.</p></li><li><p><strong>Real-Time Processing as a Time Series</strong>: Using anomaly detection algorithms, which often rely on neural networks and Deep Learning, to detect deviations from normal behavior.</p></li></ul><p>There&#8217;s no right or wrong solution; it depends on the project objectives, the sensors available, machine downtime costs, etc.</p><h2><strong>Benefits of Predictive Maintenance</strong></h2><p>Predictive maintenance positively impacts several business aspects:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3U3Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3U3Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 424w, https://substackcdn.com/image/fetch/$s_!3U3Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 848w, 
https://substackcdn.com/image/fetch/$s_!3U3Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 1272w, https://substackcdn.com/image/fetch/$s_!3U3Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3U3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png" width="508" height="458.19607843137254" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:612,&quot;resizeWidth&quot;:508,&quot;bytes&quot;:55029,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3U3Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 424w, https://substackcdn.com/image/fetch/$s_!3U3Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 848w, 
https://substackcdn.com/image/fetch/$s_!3U3Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 1272w, https://substackcdn.com/image/fetch/$s_!3U3Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcac06d05-4c0c-4a9c-98cc-586d7815b817_612x552.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here are the main KPIs that measure the gains from predictive maintenance<a class="footnote-anchor" 
data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>:</p><ul><li><p><strong>Equipment Availability and OEE (Overall Equipment Effectiveness)</strong></p><ul><li><p>Reduced unplanned downtime (fewer unexpected stops)</p></li><li><p>Reduced planned downtime (less total maintenance time)</p></li></ul></li><li><p><strong>Maintenance Costs</strong></p><ul><li><p>Reduced repair costs (fewer breakdowns and severe failures)</p></li><li><p>Optimized resource utilization (better allocation of maintenance technicians and material resources)</p></li></ul></li><li><p><strong>Quality Improvement</strong></p><ul><li><p>Well-maintained equipment is less likely to produce defects or quality variations. In the pharmaceutical industry, maintenance is regularly checked during audits and inspections.</p></li></ul></li></ul><p>Finally, here are a few quantified examples I found online<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>:</p><ul><li><p>Predictive maintenance allowed SNCF R&#233;seau&#8217;s teams to reduce switch incidents by 30%.</p></li><li><p>AI-driven maintenance reduced costs by about 20% at Alstom.</p></li><li><p>EDF monitors its wind turbines in real time with IoT solutions, increasing equipment availability by 1% and reducing operating and maintenance costs by 2%<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>.</p></li></ul><h2><strong>Project Charter</strong></h2><p>As always, to outline the project, I decided to formalize an <strong>AI Project Canvas</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> to align as closely as possible with the industrial world. 
This tool specifies the project&#8217;s prerequisites, performance expectations, deployment considerations, and associated Return on Investment (ROI).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UTUx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UTUx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UTUx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png" width="1280" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135759,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UTUx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!UTUx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9b7800e-6f6a-4d84-b501-758a752f151a_1280x720.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This project charter is illustrative and is based on approximations and calculation assumptions to be adjusted according to companies and their context.</p><p>An important point to consider in the ROI is the size of the fleet to cover. If several machines are equivalent, the development work will be more easily transferable from one machine to another.</p><h2>Conclusion</h2><p>I hope this introduction helped you understand the concept and purpose of predictive maintenance better.</p><p>In the next article, we&#8217;ll explore and analyze a maintenance dataset together.</p><p><em>&#8230; Waiting for that &#8230; Let&#8217;s rock with data &#8230; </em>&#127928;!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>IBM - What is predictive maintenance? 
- <a href="https://www.ibm.com/topics/predictive-maintenance">https://www.ibm.com/topics/predictive-maintenance</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>A Survey of Predictive Maintenance: Systems, Purposes and Approaches - Tianwen&nbsp;Zhu, Yongyi&nbsp;Ran, Xin&nbsp;Zhou, and Yonggang&nbsp;Wen - <a href="https://arxiv.org/html/1912.07383v2">arXiv:1912.07383v2 [eess.SP] 22 Mar 2024</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Copadata - Qu&#8217;est-ce qu&#8217;un SCADA - <a href="https://www.copadata.com/fr/produits/zenon-software-platform/qu-est-ce-qu-un-scada-supervisory-control-and-data-acquisition/">https://www.copadata.com/fr/produits/zenon-software-platform/qu-est-ce-qu-un-scada-supervisory-control-and-data-acquisition/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>C3.ai - Glossary Data Historian - <a href="https://c3.ai/glossary/features/data-historian/">https://c3.ai/glossary/features/data-historian/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5><strong>Fog Computing Enabling Industrial Internet of Things: State-of-the-Art and Research Challenges,</strong> November 2019, Sensors 19(21):4807 - <a 
href="https://www.researchgate.net/publication/337043545_Fog_Computing_Enabling_Industrial_Internet_of_Things_State-of-the-Art_and_Research_Challenges">https://www.researchgate.net/publication/337043545_Fog_Computing_Enabling_Industrial_Internet_of_Things_State-of-the-Art_and_Research_Challenges</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Instrumentys - <strong>La maintenance pr&#233;dictive - </strong><a href="https://instrumentys.com/2020/08/18/la-maintenance-predictive-une-grande-valeur-ajoutee-particulierement-pour-lindustrie-agroalimentaire/">https://instrumentys.com/2020/08/18/la-maintenance-predictive-une-grande-valeur-ajoutee-particulierement-pour-lindustrie-agroalimentaire/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Spotfire - What is predictive maintenance - <a href="https://www.spotfire.com/glossary/what-is-predictive-maintenance">https://www.spotfire.com/glossary/what-is-predictive-maintenance</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Usine Nouvelle - La maintenance entre en gare - <a href="https://www.uphf.fr/actualites/newsletter/la_maintenance_entre_en_gare_usinenouvelle_25mai2017.pdf">https://www.uphf.fr/actualites/newsletter/la_maintenance_entre_en_gare_usinenouvelle_25mai2017.pdf</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div 
class="footnote-content"><h5>Evolution.skf - <a href="https://evolution.skf.com/fr/une-approche-statistique-pour-reduire-les-couts-dexploitation-des-eoliennes/#">https://evolution.skf.com/fr/une-approche-statistique-pour-reduire-les-couts-dexploitation-des-eoliennes/#</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><h5><strong>Data Product Canvas &#8212; A practical framework for building high-performance data products,</strong> Leandro Carvallo, <a href="https://medium.com/@leandroscarvalho/data-product-canvas-a-practical-framework-for-building-high-performance-data-products-7a1717f79f0">https://medium.com/@leandroscarvalho/data-product-canvas-a-practical-framework-for-building-high-performance-data-products-7a1717f79f0</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - AI Medical Devices Knowledge Base - Part 3: The First Iteration]]></title><description><![CDATA[The Pharm'AI Company - Project #4]]></description><link>https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge-889</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge-889</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Fri, 25 Oct 2024 13:12:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iSzq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After my last <a href="https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge-466">deep dive into the functionality and implementation of RAG</a>, it&#8217;s time to share the prototype of my conversational agent (also called a chatbot) with you so you can test it in a real-world setting.</p><p>The prototype looks like this and can be accessed by following this <a href="https://fda-510k-dkpqqdmyxeshkpspqez74e.streamlit.app/">link</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iSzq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iSzq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 424w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 848w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 1272w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iSzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png" width="1434" height="982" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:982,&quot;width&quot;:1434,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:211062,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!iSzq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 424w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 848w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 1272w, https://substackcdn.com/image/fetch/$s_!iSzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37e0e9a-36f9-4ef2-93d0-e96d682677ba_1434x982.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">510k Knowledge Base tool</figcaption></figure></div><p><em>(PS: For transparency, I should mention that as the administrator of my Snowflake account, I have the ability to view the content of your queries&#8230; though I promise I won&#8217;t make a habit of it </em>&#128522;<em>)</em></p><h2><strong>The Web Application</strong></h2><p>I used the Streamlit platform to create the graphical interface for interacting with the AI. As I mentioned in the context of the <a href="https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial-801">agar plate counting project</a>, the interface isn&#8217;t particularly sleek, but it does allow us to quickly build functional applications.</p><p>I designed the interface to be organized into multiple tiles.</p><h4><strong>The Sidebar Navigation</strong></h4><p>The sidebar naturally displays the tool&#8217;s name and description, along with a range of options to customize how the chatbot operates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AUJw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AUJw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 424w, 
https://substackcdn.com/image/fetch/$s_!AUJw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 848w, https://substackcdn.com/image/fetch/$s_!AUJw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 1272w, https://substackcdn.com/image/fetch/$s_!AUJw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AUJw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png" width="207" height="348.9230769230769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:299,&quot;resizeWidth&quot;:207,&quot;bytes&quot;:20212,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!AUJw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 424w, 
https://substackcdn.com/image/fetch/$s_!AUJw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 848w, https://substackcdn.com/image/fetch/$s_!AUJw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 1272w, https://substackcdn.com/image/fetch/$s_!AUJw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77ec711e-72f3-484e-8ba1-2118fd8a38d6_299x504.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Options panel</figcaption></figure></div><p>The options allow users to configure settings such as:</p><ul><li><p><strong>The language of the application</strong>: Responses from the model are typically provided in this selected language.</p></li><li><p><strong>The type of LLM used</strong>: I envisioned allowing users to choose between three different models: Llama3-8b, Mistral 7b, and gemma-7b.</p></li><li><p><strong>Model temperature</strong>: This parameter controls the degree of creativity or unpredictability in the responses. A lower value generates more deterministic responses, while a higher value encourages more creative responses.</p></li><li><p><strong>Memory of previous conversations</strong>: When using a chatbot, it&#8217;s beneficial for it to recall previous exchanges. This option includes a &#8220;Clear Cache and Reset&#8221; button in case users wish to clear the memory cache.</p></li></ul><h4>Suggested Questions</h4><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U1Af!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U1Af!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 424w, https://substackcdn.com/image/fetch/$s_!U1Af!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 848w, 
https://substackcdn.com/image/fetch/$s_!U1Af!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 1272w, https://substackcdn.com/image/fetch/$s_!U1Af!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U1Af!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png" width="612" height="136.3222748815166" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:235,&quot;width&quot;:1055,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:50837,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!U1Af!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 424w, https://substackcdn.com/image/fetch/$s_!U1Af!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 848w, 
https://substackcdn.com/image/fetch/$s_!U1Af!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 1272w, https://substackcdn.com/image/fetch/$s_!U1Af!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2227d9c7-0001-40ef-9a21-4256508176bd_1055x235.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">New requests suggestions</figcaption></figure></div><p>This is a common feature in user experience (UX) design for chatbots, as seen with major players like OpenAI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, Anthropic<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, and Perplexity<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. Providing question suggestions helps guide users through the tool and facilitates a smoother document consultation experience. 
I&#8217;ve personally set the first three introductory questions, and subsequent suggestions are generated by the LLM based on the last exchange.</p><h4><strong>Main Interface - The Chatbot</strong></h4><p>The main interface is incredibly simple and intuitive: it includes an input field and an assistant response field, much like ChatGPT.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cZ3w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cZ3w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 424w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 848w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 1272w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cZ3w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png" width="1041" height="516" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13888bbe-e862-4223-a822-557e777942bb_1041x516.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:516,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:85353,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!cZ3w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 424w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 848w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 1272w, https://substackcdn.com/image/fetch/$s_!cZ3w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13888bbe-e862-4223-a822-557e777942bb_1041x516.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Your digital medical device companion &#128521;</figcaption></figure></div><h2><strong>A Critical Look at the Results</strong></h2><p>It&#8217;s time to evaluate the performance of the tool I created, and I&#8217;ll be completely honest &#129488;.</p><h4><strong>Quality of the Responses</strong></h4><p>I had the opportunity to test the system informally with questions for which I already knew the answers. The level of hallucination is relatively low, with few errors or misunderstandings detected.</p><p>The primary flaw of the program is that the responses are often incomplete. 
It&#8217;s common for the LLM to respond:</p><ul><li><p>That it doesn&#8217;t know or can&#8217;t provide certain information, even though I know it actually possesses this information &#128545;</p></li><li><p>By listing only one or two examples, when ideally a meta-analysis or a complete, exhaustive list would be preferable.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6ntR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6ntR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 424w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 848w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 1272w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6ntR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png" width="1038" height="344" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/defc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:344,&quot;width&quot;:1038,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47295,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!6ntR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 424w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 848w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 1272w, https://substackcdn.com/image/fetch/$s_!6ntR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdefc7ffb-2f18-4aba-bbf9-feb6c10f58a6_1038x344.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These points detract slightly from the user experience. I believe this is due to how I configured the RAG. By relying on only two or three information fragments to compose a response, the LLM misses the broader, high-level picture of the question. Revisiting the metadata linked to each fragment could enable it to execute queries like: &#8220;List all the devices&#8230;&#8221;</p><h4>Does This Mean It&#8217;s a Failure?</h4><p>I admit that the system falls short of my expectations, but &#8220;failure&#8221; is a relative term. The performance of an LLM-RAG system depends on three essential criteria:</p><ol><li><p><strong>Quality of Source Data</strong></p><p>I&#8217;m quite confident about the accuracy of the information available on the FDA&#8217;s site. However, I spent very little time preparing the documents in PDF format. It&#8217;s likely that images, tables, and specific layouts create obstacles to cleanly ingesting content into the vector database. 
For example, if you ever build an LLM-RAG, don&#8217;t expect to integrate PowerPoint slides; the result will be poor.</p></li><li><p><strong>Investment</strong></p><p>Building a chatbot and processing queries is costly. I opted for relatively affordable embedding functions and models (for reference, I spent around twenty dollars implementing this system). If you want more effective, relevant tools, greater investment is necessary.</p></li><li><p><strong>Time Spent</strong></p><p>I spent roughly 20 hours on this project, which is relatively little. I can&#8217;t afford to spend more time on free, educational projects. Yet in AI projects, time is the ultimate resource needed to achieve high-performing models. It&#8217;s clear that results would improve by refining raw data, optimizing segmentation and vectorization, and enhancing indexing and retrieval methods within the vector database. All of these ideas, however, require time and numerous iterations.</p></li></ol><p>A more optimistic, less harsh view is to consider this project as a simple prototype, a first iteration&#8230; and it&#8217;s normal for a first iteration not to fully meet expectations.</p><h4><strong>The Value of the First Iteration</strong></h4><p>Many AI projects never see the light of day for various reasons:</p><ul><li><p>Poor understanding of user needs,</p></li><li><p>Excessively long development times,</p></li><li><p>Poor performance, etc.</p></li></ul><p>One key tip for successfully executing an AI project is to release a first iteration as quickly as possible for testing, even if it&#8217;s based on a very limited set of documents, performance is lacking, or the project&#8217;s scope is narrowly focused.</p><p>Early delivery to domain experts helps identify essential features, edge cases, and the most suitable performance metrics and tests. 
Exploring a prototype is always more effective than any brainstorming session for realigning a project.</p><p>Considering our project from this perspective, we can say that the 20 hours invested have met their objective &#128513;.</p><h2><strong>Improvement Paths</strong></h2><p>Finally, I&#8217;d like to share some improvement paths that could enhance the tool.</p><ul><li><p><strong>Security Challenges</strong></p><p>One major challenge for conversational agents using RAG models lies in their resistance to prompt injection attacks<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. These attacks exploit a model&#8217;s vulnerability by embedding hidden instructions within user queries. This can be dangerous, as malicious users could collect sensitive information and/or poison the system. Anticipating moderation aspects<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> is crucial when developing AI.</p><p>For my project, I applied minimal constraints to the LLM prompt; this would be insufficient in a professional setting.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!76aK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!76aK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 424w, 
https://substackcdn.com/image/fetch/$s_!76aK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 848w, https://substackcdn.com/image/fetch/$s_!76aK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 1272w, https://substackcdn.com/image/fetch/$s_!76aK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!76aK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png" width="670" height="375.51622418879055" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1017,&quot;resizeWidth&quot;:670,&quot;bytes&quot;:46897,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!76aK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 424w, 
https://substackcdn.com/image/fetch/$s_!76aK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 848w, https://substackcdn.com/image/fetch/$s_!76aK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 1272w, https://substackcdn.com/image/fetch/$s_!76aK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e0c8b21-03ed-4b57-b2d7-6bbcb5ca38fd_1017x570.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">510(k) Knowledge base - Main prompt</figcaption></figure></div><ul><li><p><strong>Optimizing the Document Base</strong></p><p>Only technical documents related to medical devices have been integrated into the document base. It would be beneficial to diversify the base by including documents with more of a &#8220;visionary&#8221; angle, such as theses, articles, normative texts on SaMD (Software as a Medical Device)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>, or even YouTube transcripts or conference proceedings.</p></li><li><p><strong>Capturing User Feedback</strong></p><p>To identify weak points in our chatbot, it&#8217;s essential to allow users to rate response relevance (typically with &#128077; and &#128078;). This feedback can help align the document base with real user expectations and optimize prompts.</p></li><li><p><strong>Exploring Other RAG Variants</strong></p><p>The tool we&#8217;ve created corresponds to the most basic level of RAG (referred to as Naive RAG), but there are many variants to explore<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. I&#8217;m particularly interested in trying out GraphRAG<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>, an idea for a future project.</p></li></ul><h2><strong>Conclusion</strong></h2><p>In this article series, you&#8217;ve witnessed the birth of an LLM-RAG web application. Our newborn has let out its first cry, but there&#8217;s still much work to be done before it becomes a fully developed, reasonable, and high-performing adult! 
As with raising a child, testing, adjusting, and iterating are the keys to a successful upbringing!</p><p>I hope this journey through knowledge bases has inspired you with ideas to enrich your daily practice. Don&#8217;t hesitate to share your best use cases with me.</p><p><em>As always &#8230; Let&#8217;s Rock with Data &#8230; </em>&#129304;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>OpenAI - <a href="https://openai.com/">https://openai.com/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Anthropic - <a href="https://www.anthropic.com/">https://www.anthropic.com/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Perplexity - <a href="https://www.perplexity.ai/">https://www.perplexity.ai/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>IBM - What is a prompt injection attack? 
- <a href="https://www.ibm.com/topics/prompt-injection">https://www.ibm.com/topics/prompt-injection</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>RiskInsight - Quand les mots deviennent des armes - <a href="https://www.riskinsight-wavestone.com/2023/10/quand-les-mots-deviennent-des-armes-prompt-injection-et-intelligence-artificielle/">https://www.riskinsight-wavestone.com/2023/10/quand-les-mots-deviennent-des-armes-prompt-injection-et-intelligence-artificielle/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>FDA - Software as a Medical Device - <a href="https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd">https://www.fda.gov/medical-devices/digital-health-center-excellence/software-medical-device-samd</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Jayanth Krishnaprakash - Types of RAG: An Overview - <a href="https://blog.jayanthk.in/types-of-rag-an-overview-0e2b3ed71b82">https://blog.jayanthk.in/types-of-rag-an-overview-0e2b3ed71b82</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Neo4j - <strong>GraphRAG Field Guide: Navigating the World of Advanced RAG Patterns - </strong><a 
href="https://neo4j.com/developer-blog/graphrag-field-guide-rag-patterns/">https://neo4j.com/developer-blog/graphrag-field-guide-rag-patterns/</a></h5><p>The entire code is available on my <a href="https://github.com/arnaud-dg/FDA510k">GitHub repo</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - AI Medical Devices Knowledge Base - Part 2: The RAG concept]]></title><description><![CDATA[The Pharm'AI Company - Project #4]]></description><link>https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge-466</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge-466</guid><pubDate>Tue, 22 Oct 2024 14:45:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my previous communication, I had the opportunity to present the <a href="https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge">concept of LLM-RAG and knowledge bases</a>. 
Now, in this new article, we will explore the mechanics of how a RAG works and how to deploy one relatively quickly.</p><h2>How does a RAG work?</h2><p>To illustrate the concept of RAG, I couldn't find a better visual than the one proposed by Aurimas Griciunas in his newsletter: <a href="https://www.swirlai.com/">Swirlai.com</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. I will rely on it to explain the various stages of the project.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Omhf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Omhf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Omhf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg" width="603" height="656.010989010989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1584,&quot;width&quot;:1456,&quot;resizeWidth&quot;:603,&quot;bytes&quot;:554837,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Omhf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Omhf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbc31ba-dfc4-4ad2-bf07-930bf8c6eebb_2126x2313.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>Let's revisit the acronym RAG, which stands for <strong>Retrieval-Augmented Generation</strong>. In this acronym, we see two important facets:</p><ul><li><p>The &#8220;<strong>Generation</strong>&#8221; aspect, which involves the use of a Large Language Model (LLM).</p></li><li><p>The &#8220;<strong>Retrieval</strong>&#8221; aspect, meaning information retrieval. It refers to building a library where we store fragments of information. When a question is asked, the first step is to retrieve the fragments of information relevant to the question and provide them to the LLM to help it construct a targeted response. 
This is why we speak of "augmentation," meaning enrichment, of the generative AI model&#8217;s answers.</p></li></ul><p>Let&#8217;s go through this process step by step to better understand it.</p><h2>Step 1: The initial creation of the vector database</h2><p>The following steps are not part of every query. They are performed once at the start of the project and again whenever the data is updated.</p><h4>1.1. Document retrieval</h4><p>I&#8217;ve already mentioned web scraping in a previous article. I used this technique again to search the FDA website for relevant documents. From the list of all AI-based medical devices, it is possible to retrieve all the &#8220;510(k) Number&#8221; case numbers and then bulk-download the associated PDF files.</p><h4>1.2. Text fragmentation</h4><p>Usually, documents are divided into several text segments, referred to as "chunks" (F). Each chunk is a small fragment of text carrying a limited amount of specific information, and a precise question is answered more effectively from small, focused pieces of information.</p><p>As shown in the diagram below, the fragments overlap to avoid edge effects, so that a key sentence isn't arbitrarily cut in half.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ErWJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ErWJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!ErWJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ErWJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ErWJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ErWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg" width="867" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:462,&quot;width&quot;:867,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160150,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ErWJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!ErWJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ErWJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ErWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61d56473-a772-4337-a425-1463c1e419e4_867x462.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4>1.3. Vectorizing text fragments</h4><p>This part touches on the somewhat magical side of Data Science: any word or section of text can be vectorized. Vectorization transforms the text into a numerical representation (a list of numbers) that machines can process<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lz2g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lz2g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 424w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!lz2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg" width="786" height="529" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:529,&quot;width&quot;:786,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92700,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!lz2g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 424w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 848w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!lz2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a58337d-7929-4196-935b-751b534bdef3_786x529.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This vector captures the concepts, semantics, nuances, and relationships present in the text.</p><p>There are many different ways to vectorize text; for this project, I used the EMBED_TEXT_768<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> function in Snowflake, which converts English text into a 768-dimensional vector (C).</p><p>Thus, in the database, each entry holds both the text fragment (<strong>Chunk</strong>) and its vectorized equivalent (Chunk_vec). 
The result might seem obscure to humans but is highly structured and suited to machines.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q0uC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q0uC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q0uC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg" width="1456" height="382" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:382,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207499,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!q0uC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q0uC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9d67043-b5f5-4d35-868c-12626d7de144_1472x386.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4>1.4. Indexing in a vector database</h4><p>Now that all our text fragments have been vectorized, they can be stored in a dedicated space called a vector database (D).</p><p>The vector itself determines its position in the database based on the numbers it comprises. 
The consequence (and this is exactly what we aim for) is that fragments of text with similar content will be stored close to each other, as illustrated by this figure<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sreg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sreg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sreg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg" width="475" height="485.34013605442175" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:751,&quot;width&quot;:735,&quot;resizeWidth&quot;:475,&quot;bytes&quot;:177952,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Sreg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Sreg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd48d00-9dc2-42de-9a69-d8f3886c150f_735x751.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>Step 2: The life of a query</h2><p>With the fundamentals of the vector database in place, it&#8217;s relatively easy to understand what happens for each query. These operations run every time the LLM is called.</p><h4>2.1. The initial query</h4><p>As usual, we&#8217;ll pose the question to the LLM. This step remains unchanged.</p><h4>2.2. Vectorizing the initial query</h4><p>Just as the text fragments were vectorized to build our knowledge base, the query addressed to the LLM is also vectorized (E) using the same function as before (C). Using mathematical techniques (such as &#8220;cosine similarity,&#8221; which measures how closely two vectors point in the same direction) or clustering algorithms, we then search for the vectorized fragments closest to the query vector.</p><p>The number of fragments retrieved is adjustable (2, 3, 5, etc.); retrieving several is particularly useful for introducing nuance and diversity into the LLM's response.</p><h4>2.3. 
Enriching the prompt</h4><p>Once we&#8217;ve identified and selected the relevant chunks, we can enrich the initial query by feeding these chunks into the prompt. We then ask the LLM to specifically use the contextual elements we just provided to build its answer (B).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EHqY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EHqY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 424w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 848w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 1272w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EHqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png" width="330" height="258.3333333333333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:396,&quot;resizeWidth&quot;:330,&quot;bytes&quot;:98855,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!EHqY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 424w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 848w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 1272w, https://substackcdn.com/image/fetch/$s_!EHqY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96dfea1c-4d05-4a3c-8127-4c67d82ac7b2_396x310.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h4>2.4. The LLM&#8217;s response</h4><p>Nothing new for this step (A), which proceeds as usual.</p><h2>Implementing it on Snowflake</h2><p>For this project, I chose to use a platform called Snowflake<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><p>Snowflake is one of the titans in the Data world today, and I wanted to share a few insights about it (NB: this is factual, the article is not sponsored &#128540;!). It&#8217;s a platform that allows for storing, processing, and analyzing large amounts of data. Since it operates in the cloud, there&#8217;s no need to manage the hardware or software infrastructure.</p><p>Snowflake is known for its ability to handle all stages of the data lifecycle on a single platform<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. 
Here&#8217;s how it acts as an orchestrator:</p><ul><li><p><strong>Data ingestion:</strong> Snowflake enables data collection, whether structured (like databases) or unstructured (like files, images, or data from third-party applications).</p></li><li><p><strong>Data storage:</strong> Once data is collected, it can be stored in either a data warehouse or a data lake, depending on its nature and format.</p></li><li><p><strong>Data manipulation:</strong> Snowflake offers tools to transform, analyze data, or even apply AI models. A notable feature is that computing tasks are independent of data storage, allowing different teams to work simultaneously without disrupting other processes.</p></li><li><p><strong>Data redistribution:</strong> After manipulation, the data can be made available as results for third-party applications, end users, or transferred to other systems.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Keay!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Keay!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Keay!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Keay!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 
1272w, https://substackcdn.com/image/fetch/$s_!Keay!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Keay!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg" width="1200" height="692" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:692,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Keay!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Keay!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Keay!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Keay!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57ce5292-6435-4f99-9867-942e6e2f4e73_1200x692.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In addition to these features, Snowflake has layers to manage metadata, access rights, logs, budgets, etc. 
Snowflake also promotes its scalability: computing power adjusts to demand, and so do operating costs.</p><p>On the downside, the main issue is cost, which can rise quickly, especially if data isn&#8217;t rationalized or the project strategy isn&#8217;t well defined.</p><p>Personally, I enjoy using Snowflake because it&#8217;s relatively simple and allows me to create clean, industrial-grade Data Engineering solutions (Data Engineering covers all the steps involved in making data available). The field can be overwhelming, with its many tools and specialized programming languages: Spark, Scala, Rust, dbt, etc. &#129327;</p><p>Having a single, scalable solution that works with Python and SQL allows me to build prototypes within a few hours, which suits me perfectly.</p><p>For this project, I relied heavily on the following tutorial from Snowflake: <strong><a href="https://quickstarts.snowflake.com/guide/ask_questions_to_your_own_documents_with_snowflake_cortex_search/#0">Build a Retrieval Augmented Generation (RAG) based LLM assistant using Streamlit and Snowflake Cortex</a></strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a><strong>.</strong></p><h2>Conclusion</h2><p>Our journey into the dark and jargon-filled side of the Force has come to an end (no more &#8220;embeddings&#8221; and &#8220;vector databases&#8221; &#129327;)! I hope I didn&#8217;t lose you along the way.</p><p>Even though it's a bit demanding, I was keen to break down the various components of an LLM-RAG system. 
I truly believe these tools will become pervasive in the professional world, and it's crucial to understand how they work and the scientific foundations behind them.</p><p>Indeed, once we see past the apparent magic of these LLMs, we&#8217;re better able to grasp their limitations and implicitly place them in their proper context: <strong>as practical tools, not oracles of truth.</strong></p><p>In the final part of this series, I will present the demo version of my application, and you&#8217;ll be able to test it yourself.</p><p>Let&#8217;s rock with Data &#129304;&#8230; <em>and &#8220;RAG&#8221; (-a-muffin)</em> &#128540;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Aurimas Griciunas - <a href="https://www.swirlai.com/">Swirlai.com</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Scaler Topics - Word embeddings with TensorFlow - <a href="https://www.scaler.com/topics/tensorflow/tensorflow-word-embeddings/">https://www.scaler.com/topics/tensorflow/tensorflow-word-embeddings/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Snowflake technical documentation, EMBED_TEXT_768 - <a href="https://docs.snowflake.com/en/sql-reference/functions/embed_text-snowflake-cortex">https://docs.snowflake.com/en/sql-reference/functions/embed_text-snowflake-cortex</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div 
class="footnote-content"><h5>nlpcloud - <strong>Creation of a RAG model : <a href="https://nlpcloud.com/fr/fine-tuning-semantic-search-model-with-sentence-transformers-for-rag-application.html">https://nlpcloud.com/fr/fine-tuning-semantic-search-model-with-sentence-transformers-for-rag-application.html</a></strong></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Snowflake Website - <a href="https://www.snowflake.com/en/emea/">https://www.snowflake.com/en/emea/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>The How and Why of Modernizing Your Data Platform - <a href="https://avantage.com/snowflake/">https://avantage.com/snowflake/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Snowflake BUILD - LLM &amp; RAG : <a href="https://quickstarts.snowflake.com/guide/ask_questions_to_your_own_documents_with_snowflake_cortex_search/#0">https://quickstarts.snowflake.com/guide/ask_questions_to_your_own_documents_with_snowflake_cortex_search/#0</a></h5>
<p></p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - AI Medical Devices Knowledge Base - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #4]]></description><link>https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-ai-medical-devices-knowledge</guid><pubDate>Tue, 15 Oct 2024 16:02:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!vNtg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After the project on <a href="https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis">analyzing workplace accidents using language models</a> (or LLMs), I would like to propose a new article to explore together another application of these technologies. 
We will discover how LLMs enable the construction of a knowledge base and how to interact with it.</p><h2><strong>Introduction</strong></h2><p>We often see the visual below to illustrate the interconnection between different AI technologies<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vNtg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vNtg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 424w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 848w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 1272w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vNtg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png" width="850" height="550" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:850,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:412400,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!vNtg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 424w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 848w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 1272w, https://substackcdn.com/image/fetch/$s_!vNtg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86b66ebf-363b-4c38-bf12-dc9926e80b20_850x550.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This diagram shows that Generative AI is an integral part of Deep Learning and Machine Learning. Within the circle corresponding to generative AI, we obviously find LLMs, which rely on very specific neural network structures called transformers. 
Their unique mechanisms give them specific properties.</p><p>Generally, I simplify things by explaining that:</p><ul><li><p>The role of ML and DL models is to <strong>transform data into information</strong>; the goal is usually to predict phenomena or support decision-making with a factual basis.</p></li><li><p>On the other hand, Gen AI models, particularly for text processing, can <strong>enrich the information to make it "information ++".</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HfVl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HfVl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!HfVl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:604369,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!HfVl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!HfVl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23765e5f-f008-42c9-89cc-8221ea1d9df1_2560x1440.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It&#8217;s a bit of a metaphor! What I mean is that by asking an AI to summarize a text, extract an action plan, or answer specific questions, we concentrate the information&#8212;the value&#8212;that was previously diluted in the text.</p><p>However, LLMs present certain challenges, especially regarding reliability. One of the main issues, inherent to how these models work, is what we trivially call <strong>hallucinations</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. 
Since these models are trained on a vast, general-purpose dataset from the internet, they often lack precision when asked specific questions.</p><p>Try this test. Ask a standard LLM to:</p><ul><li><p>Create a training plan for your team.</p></li><li><p>List the challenges faced during the development of effervescent paracetamol.</p></li><li><p>Produce a risk analysis for cleaning validations on your site.</p></li></ul><p>In most cases, the result is quite disappointing: the content seems plausible but lacks substance and specificity.</p><p>To improve relevance, these systems need to be customized by injecting external, specific knowledge from your business context.</p><h2><strong>Data Injection Techniques</strong></h2><p>There are many ways to customize an LLM, but we will focus on two main families:</p><ol><li><p><strong>Fine-tuning:</strong> This involves taking a pre-trained model and retraining it on new, domain-specific data. This allows the LLM to better understand your sector-specific concepts. However, this approach also risks distorting the model, slightly degrading its general skills in favor of its specialization, because some of the model&#8217;s underlying parameters are modified.</p></li><li><p><strong>Retrieval-Augmented Generation (RAG):</strong> This method combines the native capabilities of a pre-trained LLM with a real-time information retrieval system. The information is vectorized and stored in an external database; when a question is asked, the most relevant passages are retrieved and passed to the LLM, which uses them to formulate its answer. 
The primary advantage of RAG is that it doesn&#8217;t modify the model's base structure, so there&#8217;s no risk of altering its syntactic capabilities<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k1gX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k1gX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 424w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 848w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 1272w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k1gX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png" width="653" height="639" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:639,&quot;width&quot;:653,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111962,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!k1gX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 424w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 848w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 1272w, https://substackcdn.com/image/fetch/$s_!k1gX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F284e7519-3583-4f07-9038-66ccb28aa50b_653x639.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To conclude with a metaphor: imagine our LLM is a chef specializing in French cuisine:</p><ul><li><p>Fine-tuning would be like sending the chef for a one-year training in Chinese cuisine. They would learn new recipes but might lose some of their basic techniques and gestures.</p></li><li><p>RAG would be like providing the chef with a giant library of Chinese cookbooks, allowing them to explore Chinese subtleties while retaining their French culinary knowledge.</p></li></ul><p>In our case, I&#8217;ve chosen the RAG method because it is relatively simple and quick to implement on a private server. 
It also carries no risk of degrading the model, which avoids additional complexity.</p><h2><strong>The LLM-RAG Duo: What Are the Pharmaceutical Applications?</strong></h2><p>Here are a few use cases spread across the drug lifecycle:</p><ol><li><p><strong>Technology Watch in R&amp;D and Regulatory Monitoring</strong></p></li></ol><p>With these tools, you can quickly and automatically scan a corpus of texts such as scientific articles, patents, or official documents. This allows you to manipulate, connect, and summarize information, even if it&#8217;s spread across different documents. LLM-RAG systems are increasingly used to identify new indications for known active substances, a practice known as drug repurposing.</p><p>To illustrate this use case for regulatory aspects, one could cite the Ring platform<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> available on the A3P website<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><ol start="2"><li><p><strong>Knowledge Management</strong></p></li></ol><p>Knowledge management<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> is a key process in the pharmaceutical industry that involves capturing, organizing, storing, and sharing the knowledge accumulated throughout a product&#8217;s lifecycle (from R&amp;D to commercialization). 
In reality, this is a complex task requiring a highly mature organization, as it is often difficult to break down silos between different departments, which do not work with the same documents or follow the same timelines.</p><p>An LLM-RAG duo appears well suited to bridging these different data sources<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><ol start="3"><li><p><strong>Organizing Computer Servers</strong></p></li></ol><p>Early in my industrial career, I had the opportunity to deploy the 5S methodology<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>, which involves physically organizing production workshops and work areas. Spoiler alert! Our digital files are often in worse shape than our workshops, and it&#8217;s very difficult to find key information. LLM-RAG systems can slice through and explore massive amounts of documents, with the luxury of bypassing language barriers, as they can ingest documents in their original language.</p><ol start="4"><li><p><strong>Post-Marketing Surveillance and Pharmacovigilance</strong></p></li></ol><p>LLM-RAG systems can continuously monitor adverse event reports, analyze external databases such as PubMed, and assess emerging risks by integrating up-to-date information. 
This allows for rapid responses to safety alerts.</p><p>In short, I&#8217;ve limited myself to four examples, but I could have listed many more: recurrence analysis, information extraction, knowledge preservation, training, etc.</p><p>Once again, the only limit is our imagination, as there are numerous topics where we seek to cross-reference and refine information to convert it into structured knowledge (= "Wisdom" in the diagram below) :</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8922!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8922!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 424w, https://substackcdn.com/image/fetch/$s_!8922!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 848w, https://substackcdn.com/image/fetch/$s_!8922!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 1272w, https://substackcdn.com/image/fetch/$s_!8922!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8922!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png" width="728" height="243.0506329113924" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:211,&quot;width&quot;:632,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:182732,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8922!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 424w, https://substackcdn.com/image/fetch/$s_!8922!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 848w, https://substackcdn.com/image/fetch/$s_!8922!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 1272w, https://substackcdn.com/image/fetch/$s_!8922!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7749ca33-9efd-4e5e-ac0b-6a5a312967ac_632x211.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2><strong>Building a Regulatory Knowledge Base</strong></h2><p>I will now describe 
the project we will undertake in this series of articles. I&#8217;ve decided to create a knowledge base, which I will associate with a conversational agent (or chatbot), on the subject of regulatory submissions for medical devices.</p><p>I chose this topic because the data is easily accessible as open data on the FDA website<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>. Over 950 medical devices using artificial intelligence are listed on the site. These can include diagnostic tools, radiography analysis devices, cardiovascular medical devices, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-TxI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-TxI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-TxI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg" width="996" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120194,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!-TxI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-TxI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1434df6-a49f-4605-9260-78ea6ba6b7b0_996x764.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>For each of them, a PDF summary is available, which constitutes an overview of the submission file validated by FDA teams (description, therapeutic indications, training data, verification tests, etc.). The content is thus verified and of high quality. </p><h2><strong>Conclusion</strong></h2><p>I hope this article has clarified the key concepts of LLMs and RAG.</p><p>The next article will be a bit more technical, as I will show you how to retrieve, slice, vectorize, and inject nuggets of knowledge into a RAG architecture. 
The exercise will be conducted using the Snowflake platform.</p><p><em>&#8230; Let&#8217;s Rock with Data &#8230; and RAG-a-muffin </em>&#129304;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Zhuhadar, Lily &amp; Lytras, Miltiadis. (2023). The Application of AutoML Techniques in Diabetes Diagnosis: Current Approaches, Performance, and Future Directions. Sustainability. <a href="https://www.researchgate.net/publication/373797588_The_Application_of_AutoML_Techniques_in_Diabetes_Diagnosis_Current_Approaches_Performance_and_Future_Directions">15. 13484. 10.3390/su151813484</a>. </h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5><em><strong>LLMs Will Always Hallucinate, and We Need to Live With This, Sourav Banerjee, Ayushi Agarwal, Saloni Singla, <a href="https://arxiv.org/html/2409.05746v1">https://arxiv.org/html/2409.05746v1</a></strong></em></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Mohamed Sarfraz Nawaz - <strong>Rag Vs Finetuning: Which Is Best For Your LLM Applications? 
: </strong><a href="https://www.ampcome.com/articles/rag-vs-finetuning-which-is-best-for-your-llm-applications">https://www.ampcome.com/articles/rag-vs-finetuning-which-is-best-for-your-llm-applications</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Ring - Regulatory INtelligence Guard : <a href="https://ring.a3p.org/">https://ring.a3p.org/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>A3P : <a href="https://www.a3p.org/">https://www.a3p.org/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>ICH guideline Q10 on pharmaceutical quality system : <a href="https://www.ema.europa.eu/en/documents/scientific-guideline/international-conference-harmonisation-technical-requirements-registration-pharmaceuticals-human-guideline-q10-pharmaceutical-quality-system-step-5_en.pdf">Url </a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Kernan Freire S, Wang C, Foosherian M, Wellsandt S, Ruiz-Arenas S, Niforatos E. Knowledge sharing in manufacturing using LLM-powered tools: user study and model benchmarking. Front Artif Intell. 2024 Mar 27;7:1293084. doi: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11004332/">10.3389/frai.2024.1293084. 
PMID: 38601111; PMCID: PMC11004332.</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Wikipedia - LE 5S : <a href="https://fr.wikipedia.org/wiki/5S">https://fr.wikipedia.org/wiki/5S</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><h5>FDA - <strong>Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices : </strong><a href="https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices">https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - CPV 4.0 - Part 3 : Multivariate analysis for better understanding]]></title><description><![CDATA[The Pharm'AI Company - Project #3]]></description><link>https://databoostindustry.substack.com/p/eng-cpv-40-part-3-multivariate-analysis</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-cpv-40-part-3-multivariate-analysis</guid><pubDate>Mon, 07 Oct 2024 10:30:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!56Hu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>After sharing in the previous article some solutions to <a href="https://databoostindustry.substack.com/p/eng-cpv-40-part-2-automate-the-cpv">make a traditional CPV approach more efficient</a>, we will now explore the use of multivariate statistical tools to better characterize and understand manufacturing processes.</p><p>In statistical jargon, a "multivariate" analysis (sometimes referred to as MVDA for MultiVariate Data Analysis) refers to the study of several variables simultaneously in order to understand the relationships between them. 
Multivariate analysis allows us to highlight interactions, correlations, and patterns that would not be observable by examining the variables individually.</p><h2>How to conduct a multivariate analysis?</h2><p>Here&#8217;s how I structured the page dedicated to multivariate analysis in <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!56Hu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!56Hu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 424w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 848w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 1272w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!56Hu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png" width="1456" height="1170" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1170,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:305367,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!56Hu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 424w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 848w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 1272w, https://substackcdn.com/image/fetch/$s_!56Hu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37dce76a-5008-46c0-bcfe-f5c9793c8841_1494x1201.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As you can see, the left sidebar has evolved. Previously, it was possible to select certain quality attributes to monitor them specifically. This individual monitoring is no longer relevant, since we are now moving to a holistic monitoring of the entire process from maintenance onward.</p><p>The available panel allows for the selection of:</p><ul><li><p>a time window</p></li><li><p>a product/dosage</p></li><li><p>an analysis scope</p></li></ul><p>I also envisioned the possibility of performing the analysis on different scopes by selecting: CMA, CPP, and/or CQA.</p><h2>Let&#8217;s Start Simply: Correlation Analysis</h2><p>The upper part of our interface serves as an introduction to the MVDA. It contains two charts:</p><ul><li><p><strong>A heatmap:</strong></p></li></ul><p>The heatmap is a two-dimensional chart where the correlation coefficient between two variables is displayed. 
Each variable is perfectly correlated with itself, which is why the diagonal is made up of values of &#8220;1&#8221;.</p><ul><li><p>If the coefficient is equal to 0, there is no linear relationship between the two variables (which does not necessarily mean that they are independent).</p></li><li><p>If the coefficient is positive, variables x1 and x2 tend to increase together.</p></li><li><p>If the coefficient is negative, variable x2 tends to decrease when variable x1 increases.</p></li></ul><p>Of course, the closer the coefficient is to -1 or 1, the stronger the correlation. Keep in mind that a strong correlation does not, by itself, prove that one variable causes the other. In addition to this diagram, there is a hierarchical clustering tree, or dendrogram<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. This tree groups variables that behave similarly into blocks. With this tool, we get a preliminary view of the dynamics at play in our dataset.</p><ul><li><p><strong>A scatter plot:</strong></p></li></ul><p>As we just saw, the heatmap summarizes the strength of the relationship between two variables in a single numerical value, but it says nothing about the shape of this relationship. For this reason, the two variables of interest can be selected on the right-hand side to visualize them as a scatter plot.</p><div><hr></div><p>The two charts I have just described do not actually constitute an MVDA. 
However, I still chose to implement them, because visualizing simple correlations is a healthy and recommended first step toward more advanced analyses.</p><h2>A First MVDA Tool - Clustering</h2><p>Clustering is a data analysis technique that groups a set of data points into sub-groups, called clusters, whose members resemble one another<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>Here is a graphical representation of a clustering; each point corresponds to a production batch:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3U8N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3U8N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 424w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 848w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 1272w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3U8N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png" width="596" height="403" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:596,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33635,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!3U8N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 424w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 848w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 1272w, https://substackcdn.com/image/fetch/$s_!3U8N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffa4e8cc-e2a0-4c91-b5b9-11f4de8a194e_596x403.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This chart allows us to visualize what could be called "the degree of overall similarity" that exists between the batches. <strong>Batches that are close to each other in this chart resemble each other across all their characteristics.</strong></p><div><hr></div><p><strong>A Quick Aside</strong>: To obtain this type of chart, a data transformation step must first be performed; this is why the x and y axes have such peculiar names. This transformation method is a statistical method I am particularly fond of, called PCA (Principal Component Analysis). I won't go into detail about this methodology, as it deserves an entire article on its own. 
I will just explain its basic concepts.</p><p>The complete dataset we are working with contains 23 variables<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. The principle of PCA is to apply dimensionality reduction to the dataset: we build new axes, called <strong>principal components</strong>, as linear combinations of the original variables, ordered by the share of variance they capture, and project our entire dataset onto the two-dimensional space formed by the first two<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5peL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5peL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5peL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5peL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5peL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5peL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg" width="1452" height="425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:1452,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126490,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!5peL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5peL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5peL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5peL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0350fa-dcfa-487d-961d-a4fa0668ac77_1452x425.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>By projecting 23 columns onto a two-dimensional space, we intuitively understand that some of the subtlety hidden in our data will be lost. It&#8217;s a bit like compressing a photograph and accepting a slight blur. To use another analogy, I often describe PCA as a roadmap, where only the major roads are presented. 
It&#8217;s a simplified way of perceiving reality while ignoring certain minor details.</p><p>To learn more, I invite you to watch the video (in French) referenced in this footnote<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><div><hr></div><p><strong>An Annotated Example: </strong>By selecting the data related to the product named &#8220;Product 1,&#8221; we see four to five distinct clusters within the population of batches.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q6qr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q6qr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 424w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 848w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!q6qr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png" width="1456" height="918" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:918,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:609101,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!q6qr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 424w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 848w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!q6qr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e61a547-6b53-4c71-8145-c8141ed684e1_2283x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The analysis becomes even more relevant when this visualization is enriched by displaying additional variables. For example, one could overlay value gradients on a clustering diagram, or even cross a clustering diagram with references to raw materials.</p><p>In this illustrated example, one can, for instance, observe a clear link between the API reference and the impurity level. 
In my experience, this type of analysis is far more useful for guiding investigations than a statistical test or a boxplot, which can be misleading because they do not reveal the &#8220;shape&#8221; of the data.</p><h2>A Second MVDA Tool - Hotelling's T&#178;</h2><p>The first tool we just discussed is well suited for visualizing and investigating disparities, but it is not ideal for flagging deviations and monitoring a process.</p><p>For this reason, I have implemented another visualization, which is based on a statistical test: <strong>Hotelling&#8217;s T&#178;</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a><strong>.</strong></p><p>Hotelling&#8217;s T&#178; is the multivariate generalization of Student&#8217;s t-test: it is applied to all the variables simultaneously. With a single score, it compares all the parameters of a batch to the mean vector of the parameters of the other batches, taking the correlations between variables into account. This makes it possible to detect whether a batch behaves differently from the others, considering all variables at once.</p><div><hr></div><p><strong>An Illustrated Example</strong>: Here is a simplified example showing that the T&#178; technique can reveal variations that cannot be detected on our traditional control charts.</p><p>The red point below clearly appears as an outlier. 
However, there is a high probability that it would not be identified in individual histograms, as it is not an extreme value in the various distributions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1jCR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1jCR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1jCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png" width="703" height="395.4375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:703,&quot;bytes&quot;:351742,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1jCR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!1jCR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96aa1f15-171e-4342-aa37-80b6d186dad3_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With Hotelling&#8217;s T&#178;, we can clearly see that this aberrant point differs from the others. 
And it becomes even clearer when this point is plotted on a control chart with a threshold corresponding to the 95% confidence interval.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Q_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Q_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 424w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 848w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 1272w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Q_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png" width="505" height="329.845467032967" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:951,&quot;width&quot;:1456,&quot;resizeWidth&quot;:505,&quot;bytes&quot;:117919,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2Q_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 424w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 848w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 1272w, https://substackcdn.com/image/fetch/$s_!2Q_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722dc7e-7dbb-46a5-b70f-f19f043a673d_1686x1101.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Certain CQAs occasionally show atypical values, and it is never easy to determine whether these deviations are isolated. With the T&#178; method, it becomes much easier to assess whether a batch is truly atypical.</p><h2>The Contribution of Machine Learning</h2><p>This article has already covered many advanced concepts. Therefore, I will not provide a detailed review of all the charts on the &#8220;<em>Machine Learning &amp; Prediction</em>&#8221; page. Instead, I will just explain why machine learning is a relevant tool for CPV.</p><p>Machine learning involves building predictive models from real data that link the different variables together. 
Regardless of the type of algorithm used, it is possible, after training, to display a diagram showing the importance of the various variables (or &#8220;<em>Feature Importance</em>&#8221;).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P0o8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P0o8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 424w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 848w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 1272w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P0o8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png" width="590" height="395" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/551166aa-e082-4a92-b460-2088024333c5_590x395.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:395,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27693,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!P0o8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 424w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 848w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 1272w, https://substackcdn.com/image/fetch/$s_!P0o8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F551166aa-e082-4a92-b460-2088024333c5_590x395.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One of the major benefits of machine learning is the<strong> ability to prioritize the contribution of different variables </strong>in predicting a CQA.</p><p>I find this particularly interesting because it gives more depth to our investigations and action plans. This allows us to prioritize improvement efforts in an objective, science-based manner. 
I believe this is a significant advantage in risk management and aligns perfectly with the approach described in the ICH guidelines<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>.</p><blockquote><p><em><strong>The evaluation of the risk to quality should be based on scientific knowledge and ultimately link to the protection of the patient.</strong></em></p><p><em><strong>The level of effort, formality and documentation of the quality risk management process should be commensurate with the level of risk.</strong></em></p></blockquote><h2>Conclusion</h2><p>We have reached the end of a project that I loved working on, as it stands at the crossroads of everything I enjoy about my profession. I hope that, through this journey into graphs and numbers, I have convinced you of the value of data-driven process control.</p><p>There were two facets to this project related to quality, each bringing its own set of improvements:</p><ol><li><p>Digitalization in the service of CPV process efficiency and performance.</p></li><li><p>Digitalization in the service of knowledge and understanding of the targeted process.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!koAQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!koAQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 424w, 
https://substackcdn.com/image/fetch/$s_!koAQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!koAQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!koAQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!koAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:223743,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!koAQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 424w, 
https://substackcdn.com/image/fetch/$s_!koAQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!koAQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!koAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47bdd612-e238-49a5-a1b4-b2c6acaa060b_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I hope I succeeded in combining all these criteria within the same tool!</p><p>Does this philosophy resonate with your approach to data analysis and process optimization? Feel free to share your experiences and thoughts in the comments.</p><p><em>Let&#8217;s Rock with Data to improve CPVs &#127926; &#129304;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Plotly - Technical documentation : <a href="https://plotly.com/python/clustergram/">https://plotly.com/python/clustergram/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>La revue IA - Qu&#8217;est-ce que le clustering ? : <a href="https://larevueia.fr/clustering-les-3-methodes-a-connaitre/">https://larevueia.fr/clustering-les-3-methodes-a-connaitre/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>&#381;agar J, Miheli&#269; J. Big data collection in pharmaceutical manufacturing and its use for product quality predictions. Sci Data. 2022 Mar 23;9(1):99. doi: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8943063/">10.1038/s41597-022-01203-x</a>. 
PMID: 35322032; PMCID: PMC8943063.</h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Basak, Hritam &amp; Roy, Alik &amp; Lahiri, Jeet &amp; Bose, Sayantan &amp; Patra, Soumyadeep. (2021). SVM and ANN based Classification of EMG signals by using PCA and LDA. <a href="https://arxiv.org/pdf/2110.15279">10.48550/arXiv.2110.15279</a>.</h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Datascientest - Analyse en Composantes Principales (Principal Component Analysis, in French) : <a href="https://www.youtube.com/watch?v=ilWeGsud OGY">YouTube video</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Wikipedia - Hotelling&#8217;s T&#178;: <a href="https://en.wikipedia.org/wiki/Hotelling%27s_T-squared_distribution">https://en.wikipedia.org/wiki/Hotelling%27s_T-squared_distribution</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Hotelling R package documentation : <a href="https://cran.r-project.org/web/packages/Hotelling/Hotelling.pdf">https://cran.r-project.org/web/packages/Hotelling/Hotelling.pdf</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>ICH Guide ICH Q9 R1 : <a 
href="https://www.ema.europa.eu/en/documents/scientific-guideline/international-conference-harmonisation-technical-requirements-registration-pharmaceuticals-human-use-ich-guideline-q9-quality-risk-management-step-5-first-version_en.pdf">https://www.ema.europa.eu/en/documents/scientific-guideline/international-conference-harmonisation-technical-requirements-registration-pharmaceuticals-human-use-ich-guideline-q9-quality-risk-management-step-5-first-version_en.pdf</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - CPV 4.0 - Part 2 : Automate the CPV process]]></title><description><![CDATA[The Pharm'AI Company - Project #3]]></description><link>https://databoostindustry.substack.com/p/eng-cpv-40-part-2-automate-the-cpv</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-cpv-40-part-2-automate-the-cpv</guid><pubDate>Thu, 03 Oct 2024 08:55:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!azFh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>This article follows the first part: <a href="https://databoostindustry.substack.com/p/fr-cpv-40-partie-1-le-pitch-projet">an introduction to the CPV approach</a>. Together we will explore the homepage of the <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0</a> tool dedicated to univariate analysis.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!azFh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!azFh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 424w, https://substackcdn.com/image/fetch/$s_!azFh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 848w, https://substackcdn.com/image/fetch/$s_!azFh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 1272w, https://substackcdn.com/image/fetch/$s_!azFh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!azFh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png" width="1456" height="935" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:935,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:229452,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!azFh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 424w, https://substackcdn.com/image/fetch/$s_!azFh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 848w, https://substackcdn.com/image/fetch/$s_!azFh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 1272w, https://substackcdn.com/image/fetch/$s_!azFh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F619588d9-2856-48dc-a254-f16ee224cf52_1497x961.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>In statistical terms, a "univariate" analysis focuses on the variation and distribution of a single variable. It is the foundation of most CPV analyses and is already very informative on its own. 
We will see, however, how to optimize it to make its use more relevant and efficient.</p><h2>The Two Graphical Pillars of CPV</h2><p>The CPV approach is mainly based on two types of visualizations<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>:</p><ul><li><p><strong>The Control Chart:</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!myDL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!myDL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 424w, https://substackcdn.com/image/fetch/$s_!myDL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 848w, https://substackcdn.com/image/fetch/$s_!myDL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 1272w, https://substackcdn.com/image/fetch/$s_!myDL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!myDL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png" width="1217" 
height="396" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:396,&quot;width&quot;:1217,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86746,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!myDL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 424w, https://substackcdn.com/image/fetch/$s_!myDL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 848w, https://substackcdn.com/image/fetch/$s_!myDL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 1272w, https://substackcdn.com/image/fetch/$s_!myDL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc6b3df-dded-417c-bebc-fdd5d1655728_1217x396.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>A control chart is a statistical tool used in quality management to monitor and control a production process. It helps detect abnormal variations in the process by graphically comparing the chronological evolution of a quality characteristic against predefined control limits. Depending on the requirements, the time axis can be represented in terms of dates or batch chronology. 
The main objective of this tool is to visualize drifts.</p><p>The displayed limits correspond to the specifications (in red), i.e., the required standards, and the lower (LCL for Lower Control Limit) and upper (UCL for Upper Control Limit) statistical limits (in blue), defined by &#177;3 standard deviations from the mean.</p><ul><li><p><strong>The Capability Chart:</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9xSF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9xSF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 424w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 848w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 1272w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9xSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png" width="592" height="404" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:592,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:33574,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!9xSF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 424w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 848w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 1272w, https://substackcdn.com/image/fetch/$s_!9xSF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe405b548-25a6-45e9-aa76-7fcef8325e6b_592x404.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A capability chart is a visual tool used in quality management to assess whether a production process is "capable" of producing results that meet the required specifications. It compares the natural variation of the variable with the tolerances, thereby determining whether the process provides consistent and reliable results. A capable process is one whose distribution is both narrow and centered within the specification limits.</p><p>The capability of a process is measured by the Cp and Cpk indices. In general, a process is considered capable when these indices are greater than 1.3 (1.33 is the commonly cited threshold).</p><p>These two classic chart types are logically at the heart of the <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0</a> tool. However, I have designed additional features to enhance them.</p><h2>The Options</h2><ul><li><p><strong>Visualization Options</strong></p></li></ul><p>I propose two visualization options: a quarterly view (often the timeframe used for periodic process analysis) and the ability to show or hide the LCL and UCL.</p><p>These two options were chosen to illustrate frequent use cases; many others could be considered. With open-source tools like R and Python, it is possible to fully customize the charts and visuals<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. This goes beyond what off-the-shelf commercial software typically offers: any need can be custom programmed, so the end user defines how they wish to manipulate and display their information.</p><ul><li><p><strong>Outlier Detection</strong></p></li></ul><p>Typically, control charts are accompanied by outlier visualization using the Nelson Rules<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. Nelson's rules are a set of criteria used to detect non-random variations in control charts. 
They include eight different rules, each characterizing specific patterns of points on the control chart.</p><p>In addition to these classic Nelson rules, I chose to add an alternative that I find more elegant, which is based on a machine learning model: an anomaly detection algorithm called Isolation Forest<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. This method detects anomalies by creating random partitions in the data until each point is isolated (and, by definition, outliers are easier to isolate than others, which is how they are identified).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uNsz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uNsz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 424w, https://substackcdn.com/image/fetch/$s_!uNsz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 848w, https://substackcdn.com/image/fetch/$s_!uNsz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 1272w, 
https://substackcdn.com/image/fetch/$s_!uNsz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uNsz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png" width="435" height="194.50714285714287" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7638597-d154-454b-b175-5fd840756d53_1400x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:1400,&quot;resizeWidth&quot;:435,&quot;bytes&quot;:345477,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!uNsz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 424w, https://substackcdn.com/image/fetch/$s_!uNsz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 848w, https://substackcdn.com/image/fetch/$s_!uNsz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 1272w, 
https://substackcdn.com/image/fetch/$s_!uNsz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7638597-d154-454b-b175-5fd840756d53_1400x626.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>I find this visualization complementary for alerting because it focuses on sudden, drastic changes in the CQAs (Critical Quality Attributes). Nelson's rules primarily support a retrospective analysis of observed behaviors. In contrast, an anomaly detection model learns from past data and can flag an unusual variation from the first excursion. If the goal is responsiveness, this type of tool is appropriate<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p><h2><strong>A Custom Interpretation</strong></h2><p>I wanted the tool to offer users a written interpretation of the two charts described above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Bxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Bxp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 424w, https://substackcdn.com/image/fetch/$s_!7Bxp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 848w, 
https://substackcdn.com/image/fetch/$s_!7Bxp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 1272w, https://substackcdn.com/image/fetch/$s_!7Bxp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Bxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png" width="476" height="261.72093023255815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e773c02-2806-4e15-a727-debc2923699c_602x331.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:331,&quot;width&quot;:602,&quot;resizeWidth&quot;:476,&quot;bytes&quot;:27333,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!7Bxp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 424w, https://substackcdn.com/image/fetch/$s_!7Bxp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 848w, 
https://substackcdn.com/image/fetch/$s_!7Bxp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 1272w, https://substackcdn.com/image/fetch/$s_!7Bxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e773c02-2806-4e15-a727-debc2923699c_602x331.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Rather than using generative AI (which would have been very entertaining &#128540; but difficult to control), I preferred to 
build a flowchart that generates predefined text based on certain conditions. The text thus adapts to the reality of the data.</p><p>It is then up to the user to complete and balance this very descriptive analysis with their own explanatory elements.</p><h2>The Ultimate Feature?</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_NgA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_NgA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 424w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 848w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 1272w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_NgA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png" width="232" height="96" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:96,&quot;width&quot;:232,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4489,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!_NgA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 424w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 848w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 1272w, https://substackcdn.com/image/fetch/$s_!_NgA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F363e4f13-beb3-4cec-aa41-944e5c139264_232x96.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>To ensure maximum time savings, I have provided the possibility to export the charts, as well as the conclusion text, as a ".docx" file, which will directly feed CPV, APQR, or investigation reports. I designed a very simple ".docx" report for this demonstration version, but once again, it is fully customizable. 
One can design modular custom reports with any chart, table, text block, etc.</p><h2>Conclusion</h2><p>The "Univariate Analysis" interface of <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0</a> meets the objectives I set for myself: all the markers of a classic CPV analysis are there, and this information can be obtained in just a few clicks.</p><p>Once the tool is customized according to the specific needs of the teams, it becomes possible to generate a report in just a few tens of minutes after carrying out some routine checks and adding a few pieces of information.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tTvm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tTvm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tTvm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1310888,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tTvm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!tTvm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6312caf0-3c72-4a84-8bb5-73d922e8b3d9_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Teams can now focus their energy on what really matters: explaining the variations, identifying root causes, and implementing the right actions. This corresponds to steps 2 and 3, and that&#8217;s what we will share in the final part of the journey &#128526;.</p><p>Does this philosophy resonate with your approach to data analysis and process optimization? 
Feel free to share your experiences and thoughts in the comments.</p><p><em>Let&#8217;s Rock with Data and CPVs &#127926;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Article A3P - Continued Process Verification, strat&#233;gie de visualisation des donn&#233;es dans le cadre de la CPV<strong> </strong>: <a href="https://www.a3p.org/cpv-strategie-de-visualisation-donnees/">https://www.a3p.org/cpv-strategie-de-visualisation-donnees/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>R ggplot2 gallery : https://r-graph-gallery.com/</h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Wikipedia - Nelson&#8217;s rules : <a href="https://en.wikipedia.org/wiki/Nelson_rules">https://en.wikipedia.org/wiki/Nelson_rules</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Datascientest - Isolation Forest : Comment d&#233;tecter les anomalies dans une dataset ? 
: <a href="https://datascientest.com/isolation-forest">https://datascientest.com/isolation-forest</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Amol Mavuduru - <strong>How to perform anomaly detection with the Isolation Forest algorithm : </strong><a href="https://towardsdatascience.com/how-to-perform-anomaly-detection-with-the-isolation-forest-algorithm-e8c8372520bc">https://towardsdatascience.com/how-to-perform-anomaly-detection-with-the-isolation-forest-algorithm-e8c8372520bc</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>ISPE - Artificial Intelligence (AI) Based Continued Process Verification (CPV) : <a href="https://ispe.org/pharmaceutical-engineering/ispeak/artificial-intelligence-ai-based-continued-process-verification">https://ispe.org/pharmaceutical-engineering/ispeak/artificial-intelligence-ai-based-continued-process-verification</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - CPV 4.0 - Part 1 : The Project Brief]]></title><description><![CDATA[The Pharm'AI Company - Project #3]]></description><link>https://databoostindustry.substack.com/p/eng-cpv-40-part-1-the-project-brief</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-cpv-40-part-1-the-project-brief</guid><pubDate>Tue, 01 Oct 2024 21:11:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/dc808943-d070-4bd8-9ea9-f9cc2e3cb683_2165x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the pharmaceutical industry, quality is not an option, it is an obligation! 
Let's take a closer look at a specific aspect of the pharmaceutical quality system: CPVs!</p><h2>Introduction</h2><p>To begin, here is a brief introduction for readers who may be less familiar with the pharmaceutical field.</p><p>The pharmaceutical sector is highly regulated. Regulations and health authorities require manufacturers to guarantee, and to prove, that their manufacturing processes are under control, meaning that they consistently deliver the expected results.</p><p>When a new production process is deployed, an initial validation is carried out to ensure that it works correctly. However, initial validation is not sufficient: it is also necessary to ensure that the process does not drift and remains in a state of control throughout the product's lifecycle<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bw5e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bw5e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bw5e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg" width="525" height="317.3444976076555" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:1254,&quot;resizeWidth&quot;:525,&quot;bytes&quot;:67117,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bw5e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bw5e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe07f7db-910a-4a65-84dc-1a3cfe516435_1254x758.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Process Validation Lifecycle</figcaption></figure></div><p>This is precisely the purpose of CPVs (Continued Process Verification).</p><blockquote><p><em><strong>CPVs involve the systematic collection and analysis of data on production components and processes to ensure that product outcomes remain within predefined quality limits.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></strong></em></p></blockquote><p>In practice, this means that certain critical variables for product quality will be monitored and compared to their specifications. The analysis is generally communicated to the various departments of industrial sites to share the degree of process control and define potential actions to be taken.</p><p>Typically, a CPV report is prepared periodically for each product or product family. 
As a result, on large sites it is not uncommon to have several dozen reports to prepare each year. The administrative workload is therefore very significant.</p><div><hr></div><p>I will close this context section with some CPV terminology that will appear throughout this series of articles:</p><ul><li><p>A <strong>CQA (Critical Quality Attribute)</strong> is a physical, chemical, biological, or microbiological property of a product that must be controlled to ensure the quality and safety of the drug.</p></li><li><p>A <strong>CPP (Critical Process Parameter)</strong> is a manufacturing process parameter whose variability can significantly impact a CQA and must therefore be monitored or controlled to ensure product quality.</p></li><li><p>A <strong>CMA (Critical Material Attribute)</strong> is a physical, chemical, biological, or microbiological characteristic of a raw material that can influence a CQA and must be monitored or controlled to ensure product quality.</p></li><li><p><strong>OOS/OOT (Out-Of-Specification/Out-Of-Trend)</strong>: an OOS result is a test result that does not meet the predefined specifications for a product, raw material, or process; an OOT result remains within specification but deviates from the expected historical trend.</p></li></ul><h2>Traditional CPVs vs. &#8220;CPV 4.0&#8221;</h2><p>In the digital age, new opportunities arise to modernize this quality process. 
A number of companies are transitioning from the classic CPV approach to a more modern, data-driven approach.</p><p>I have represented these two symbolic worlds that coexist on a diagram, with the different dimensions that distinguish them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vMU8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vMU8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vMU8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:518537,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vMU8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!vMU8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef2ca5d9-948c-4d1a-b0c7-9f639aa8d0fa_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This type of visual is simplistic, but it still allows for effectively assessing an organization's maturity in managing CPVs.</p><h2>Typical Pain Points of the Traditional Approach</h2><ol><li><p><strong>A generally tedious preparation process</strong></p></li></ol><p>Currently, many companies prepare their CPV analyses using generalist statistical tools, such as Excel or industrial statistical software (e.g., Minitab&#174;, Tibco Statistica&#174;, &#8230;). Although these tools allow for optimizations through macros, they often require repetitive tasks and many manual manipulations to collect, input, and format the data. These manual tasks also increase the risk of human error.</p><p>In practice, it is not uncommon to spend several dozen hours preparing a single report. 
Moreover, a major problem with the traditional approach is its lack of scalability: the time required increases in proportion to the number of reports to be produced.</p><p><strong>CPV 4.0 Solution:</strong> With a well-structured data model and good data governance, I estimate that the preparation time for a report can be reduced by about 80%. This automation also makes the process scalable: once the software is developed, the total time required to prepare 10 or 20 reports is almost the same.</p><blockquote><p><em><strong>A CPV 4.0 process allows teams to focus on value-added activities (problem-solving, improvement actions, &#8230;) instead of office tasks.</strong></em></p></blockquote><ol start="2"><li><p><strong>A sometimes rigid and overly retrospective process</strong></p></li></ol><p>Given the dozens of hours needed to prepare a report, CPV reports are often written in a staggered manner, following a fixed schedule. The analysis is therefore retrospective: a long delay may separate the appearance of a production drift from its analysis. This prolongs investigations, delays corrective and preventive actions, and wastes opportunities to optimize the process.</p><p><strong>CPV 4.0 Solution:</strong> Data is processed in near real time, which makes it possible to set up automatic alerts and react quickly. With Machine Learning (ML) tools, weak signals can even be captured, and risky situations can be predicted before they materialize as OOS results<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. Indeed, machine learning algorithms can detect subtle patterns and emerging trends well before they become noticeable with traditional methods.</p><blockquote><p><em><strong>A CPV 4.0 process allows for quick reactions to contain drifts and their impacts. 
Responsiveness = Reliability.</strong></em></p></blockquote><ol start="3"><li><p><strong>A process focused on a handful of CQAs</strong></p></li></ol><p>Due to a lack of resources, CPV programs often focus on a limited set of CPPs and CQAs. This approach, while effective for detecting major drifts, has a significant limitation: it increases the risk of missing subtle interactions between variables. By focusing only on a few end-of-process indicators, we risk overlooking the inherent complexity of pharmaceutical manufacturing processes.</p><p>Moreover, in the coming decades, the amount of exploitable data will only increase. The trend is clear, and it would be a pity not to tap into these reservoirs of process information.</p><p><strong>CPV 4.0 Solution: </strong>Using multivariate analysis techniques, it becomes possible to take a wide range of variables into account simultaneously. This provides a more complete and nuanced view of the complex interactions within the production cycle. A better understanding of materials and processes helps reduce the risk of non-compliance and batch rejections.</p><blockquote><p><em><strong>A CPV 4.0 process allows for the development of a clear and efficient product strategy.</strong></em></p></blockquote><ol start="4"><li><p><strong>Tools that are not very interactive</strong></p></li></ol><p>Control charts and capability graphs<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>, which are at the heart of CPV analysis, are good tools for summary reports. However, these graphs have important limitations:</p><ul><li><p>Their view of the process remains very high-level, making it difficult to really understand one batch relative to its neighbors.</p></li><li><p>These tools are static: they do not allow for data manipulation or cross-referencing, which slows down and limits the depth of exchanges and investigations. 
Yet working interactively with data as a team is often what drives collective intelligence.</p></li></ul><p><strong>CPV 4.0 Solution: </strong>So far in this article, I have mainly talked about technical and statistical aspects, but in reality the CPV philosophy is not limited to tools and numbers: it is primarily based on cross-functional collaboration between teams. Modern solutions offer better transparency and interaction with data. This reduces dependency on subjective judgments, improves knowledge sharing, and leads to better collective alignment on actions.</p><blockquote><p><em><strong>A CPV 4.0 process brings data to life and makes it interactive.</strong></em></p></blockquote><p>With the framework set, it's time to move on to the project.</p><h2>The Project Charter</h2><p>For this third project, I propose an application that illustrates all the points I have developed above. I have called this application <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>; it is the implementation in code of my vision of a modern CPV process, applied to a manufacturing process for dry (solid) dosage forms.</p><p>To do this, I used a dataset shared by &#381;agar et al. in one of their publications<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>. 
You will find in this document the detailed description of the dataset.</p><p>As usual, to contextualize the project, I propose a project charter summarizing its scope, as well as how to calculate an ROI (Return on Investment) for this type of initiative.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YhOh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YhOh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YhOh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg" width="1280" height="720" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:278571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YhOh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YhOh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421f0b6a-51fb-4eae-a1fc-10e25b2b23a9_1280x720.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I remind you that the figures communicated in this document are purely illustrative.</p><h2>Some Clarifications on the Project Scope</h2><ul><li><p><strong>This project requires technical prerequisites in terms of infrastructure and data governance.</strong></p><p>To implement the type of tool that I will present, it is necessary to first structure different data sources within a Data Warehouse architecture. The latter being connected to a LIMS, an ERP, and production data management systems (IPC, SCADA, &#8230;). I will not discuss these points further, as they are upstream of our project.</p></li><li><p><strong>The tool I develop involves computer system validation</strong>, which I will not describe in the upcoming articles.</p><p>One small clarification: my CPV 4.0 tool is enhanced with some AI features, but these tools are simply there to highlight insights and catalyze the decision-making process.</p><p>However, analyses and decisions remain human. 
Therefore, there is no AI validation issue as such.</p></li><li><p><strong>Finally, I do not necessarily recommend a systematic CPV 4.0 approach in every situation.</strong></p><p>Indeed, there are scenarios where traditional CPVs work very well and are fully sufficient. KISS: Keep It Simple, Stupid!</p><p>I see this approach as particularly valuable for:</p><ul><li><p>high value-added products</p></li><li><p>highly multi-factor processes (multi-step processes, biopharmaceuticals, &#8230;)</p></li><li><p>processes with low capability, where troubleshooting needs are significant</p></li></ul></li></ul><h2>Conclusion</h2><p>Through this article, I hope I have demonstrated that CPVs provide a tremendous opportunity for digitalization. The journey is long, and there are many steps to bridge the gap between the traditional CPV approach and the modern 4.0 approach.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L5EN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L5EN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!L5EN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!L5EN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 1272w, 
https://substackcdn.com/image/fetch/$s_!L5EN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L5EN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png" width="727" height="408.9375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:727,&quot;bytes&quot;:1310888,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L5EN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!L5EN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!L5EN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 1272w, 
https://substackcdn.com/image/fetch/$s_!L5EN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7286c54c-d478-422b-a9aa-584151f59abe_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The CPV journey</figcaption></figure></div><p>The next article will focus on the first of these steps. 
I will show you how I managed to automate statistical analysis and report creation.</p><p>I strongly encourage you to explore the <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">CPV 4.0 app</a> now to get an idea &#128515;. Your feedback will be valuable to help us further improve it. So, don't hesitate, test it, and share your impressions!</p><p><em>Let&#8217;s Rock with Data to improve CPVs &#127926; </em>&#129304;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Ko&#235;rber pharma - <strong>Successfully managing your bioprocess:<br>The importance of Continued Process Verification : <a href="https://www.koerber-pharma.com/blog/successfully-managing-your-bioprocess-the-importance-of-continued-process-verification">https://www.koerber-pharma.com/blog/successfully-managing-your-bioprocess-the-importance-of-continued-process-verification</a></strong></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Claire Bonnet. La v&#233;rification continue des proc&#233;d&#233;s de fabrication pharmaceutique : contexte, r&#232;glementation et impl&#233;mentation d&#8217;un syst&#232;me CPV au sein d&#8217;un site de fabrication de substances actives pharmaceutiques. Sciences pharmaceutiques. 
2021.</h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>FDA - Guidance for Industry - Process validation general principles and practices : <a href="https://www.fda.gov/files/drugs/published/Process-Validation--General-Principles-and-Practices.pdf">https://www.fda.gov/files/drugs/published/Process-Validation--General-Principles-and-Practices.pdf</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>ISPE - Artificial Intelligence (AI) Based Continued Process Verification (CPV) : <a href="https://ispe.org/pharmaceutical-engineering/ispeak/artificial-intelligence-ai-based-continued-process-verification">https://ispe.org/pharmaceutical-engineering/ispeak/artificial-intelligence-ai-based-continued-process-verification</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>A3P - <strong>Continued Process Verification, strat&#233;gie de visualisation des donn&#233;es dans le cadre de la CPV : <a href="https://www.a3p.org/cpv-strategie-de-visualisation-donnees/">https://www.a3p.org/cpv-strategie-de-visualisation-donnees/</a></strong></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Arnaud Duigou&#8217;s CPV 4.0 App : <a href="https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/">https://q927ad-arnaud-duigou.shinyapps.io/Shiny_app_CPV40/</a></h5></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>&#381;agar J, Miheli&#269; J. Big data collection in pharmaceutical manufacturing and its use for product quality predictions. Sci Data. 2022 Mar 23;9(1):99. doi: 10.1038/s41597-022-01203-x. PMID: 35322032; PMCID: PMC8943063. <a href="https://www.nature.com/articles/s41597-022-01203-x">https://www.nature.com/articles/s41597-022-01203-x</a></h5><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Workplace accidents analysis with LLM - Part 3 : The final dashboard]]></title><description><![CDATA[The Pharm'AI Company - Project #2]]></description><link>https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis-19c</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis-19c</guid><pubDate>Thu, 26 Sep 2024 08:45:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f647837c-9376-476d-842c-cf8114b759fc_1628x911.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The main challenge of 
this project, which was resolved and described in part 2: <a href="https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis-230">Using a Secure LLM</a>, was to transform raw data into structured and valuable information.</p><p>This final part of our series will be more concise and visual. I will take you behind the scenes of the dashboard creation process that closes this project. I will also share some of the tips I frequently use to build dashboards with Microsoft&#8217;s Power BI.</p><p>Here is the link to the "Workplace accidents analyzer" dashboard so you can explore it as you read.</p><h2>Interface Description</h2><p>My initial objectives were as follows. I wanted:</p><ul><li><p>A concise tracking tool that fits on a single page, so teams can consult it quickly and intuitively.</p></li><li><p>Highly visual, graphical communication, consistent with the subject of personnel safety (avoiding long texts and data tables).</p></li><li><p>Simplified reading focused on 4 or 5 key graphs.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tPA7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tPA7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!tPA7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!tPA7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!tPA7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tPA7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:494385,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tPA7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!tPA7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!tPA7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!tPA7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c907607-8fbc-4f54-bbd8-ea1d91980f6b_2560x1440.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>I chose to organize the dashboard using a fairly classic structure with three main blocks, laid out from left to 
right:</p><ul><li><p>A sidebar containing the various filtering options. This allows the analysis to be sliced by the main available criteria (industry, sector, categories, etc.).</p></li><li><p>A left section focused on the victims: age, gender, and injury mapping. It primarily includes an accident count, an age pyramid, and a human body diagram.</p></li><li><p>A right section dedicated to describing the accidents: their nature, the tools involved, and the root causes, when specified.</p></li></ul><h2>Tips #1: Properly Structure Your Data Model</h2><p>The data was already clean after running my Python script, since I had invested time and energy in controlling the output format and its structure. Thanks to this upfront preparation, the setup phase in Power BI was extremely quick.</p><p>Even so, I always recommend structuring your data carefully in Power BI by creating logically related tables. A classic and effective way to organize the data is the following architecture:</p><ul><li><p>A central table, often referred to as the "fact table." 
In our case, this is the table containing specific data for each accident.</p></li><li><p>"Dimension tables," which contain information, categories, etc., that may repeat across accidents.</p></li></ul><p>As you can see, this is referred to as a star schema.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2875!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2875!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2875!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2875!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2875!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2875!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg" width="564" height="386.72614107883817" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:661,&quot;width&quot;:964,&quot;resizeWidth&quot;:564,&quot;bytes&quot;:48353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2875!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2875!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2875!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2875!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19b712e1-7190-47c4-b3bc-4dd93144e122_964x661.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>With this organizational structure, there are fewer errors when creating visuals, and working with large datasets becomes much faster.</p><p>For more details, see the following <a href="https://learn.microsoft.com/fr-fr/power-bi/guidance/star-schema">link</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><h2>Tips #2: Look for Strong and Impactful Graphs</h2><p>A dashboard must be clear and functional. It must adapt to the habits of the teams.</p><p>If a particular type of graph is essential in your field of activity, it's worth trying to reproduce it. In the case of workplace accidents, it's very common to include a human body visual in the various entry forms to quickly point out affected areas. 
So, I took on the challenge of reproducing this type of diagram.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sLBE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sLBE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 424w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 848w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 1272w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sLBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png" width="282" height="217" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:217,&quot;width&quot;:282,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sLBE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 424w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 848w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 1272w, https://substackcdn.com/image/fetch/$s_!sLBE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc97f0c5-90c0-46d2-baff-ea777a2961ed_282x217.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>I found the site <a href="https://synoptic.design/">synoptic.design</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, which makes it easy to create custom visuals, and I used this <a 
href="https://powerbi.microsoft.com/fr-ca/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/">tutorial</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> to create mine. It&#8217;s very easy, for example, to create engaging visuals that add a playful and intuitive dimension to dashboards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!znUI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!znUI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 424w, https://substackcdn.com/image/fetch/$s_!znUI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 848w, https://substackcdn.com/image/fetch/$s_!znUI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 1272w, https://substackcdn.com/image/fetch/$s_!znUI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!znUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png" width="520" height="379" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:520,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57289,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!znUI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 424w, https://substackcdn.com/image/fetch/$s_!znUI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 848w, https://substackcdn.com/image/fetch/$s_!znUI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 1272w, https://substackcdn.com/image/fetch/$s_!znUI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca5d118-5713-43b4-9c8c-cd4637d4b290_520x379.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Tips #3: Pay Attention to Your Visuals</h2><p>We often put a lot of effort into creating AI tools or dashboards. It is therefore crucial that users embrace them, adopt them, and want to interact with them. In this regard, design and aesthetics play a significant role.</p><p>For the visual aspect, my personal routine is often as follows:</p><ul><li><p>I think about the overall structure of the dashboard. The visuals will differ for a concise summary or for a four or five-page interface.</p></li><li><p>I browse royalty-free image banks for inspiration regarding the colors and/or structure of the dashboard. 
For this project, I came across this image that I found interesting and from which I retained the color code.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-QPU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-QPU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-QPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg" width="422" height="158.4404332129964" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:1108,&quot;resizeWidth&quot;:422,&quot;bytes&quot;:53656,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-QPU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 424w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 848w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!-QPU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a4054ef-b9f8-4ce3-8a51-5cea5af4c5b9_1108x416.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p>I quickly create a background image in PowerPoint by pre-positioning blocks (often in light tones with rounded corners). 
This method allows for proper alignment of sections and avoids overloading the Power BI file with numerous objects (like images, text boxes, etc.).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oC5L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oC5L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oC5L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png" width="522" height="293.625" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:522,&quot;bytes&quot;:103693,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oC5L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!oC5L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7190f66-6f36-4785-acdc-88141b03cab5_1280x720.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div></li><li><p>I write a few hex color codes on a post-it note to reuse them in future charts.<em> PS</em>: It&#8217;s worth checking that the colors are inclusive and appropriate for people with color blindness.</p></li><li><p>I quickly build a logo and a title. I also favor icons, in the spirit of an infographic, to avoid large descriptive text blocks.</p></li></ul><p>Aesthetic preferences can differ from one person to another, and it's possible the visual won&#8217;t appeal to everyone. However, with these tips, you're not just creating a simple dashboard but offering a professional "data product," functional and visually appealing, that will meet your users' needs.</p><h2>Tips #4: Optimize the Relationships/Filters Between Your Visuals</h2><p>Most Power BI visuals are interactive, meaning it&#8217;s possible to click on the charts to perform actions.</p><p>By selecting a visual and going to "Format" &#8594; "Edit interactions," you can decide how that visual interacts with its neighbors. 
You can use charts to filter or subdivide neighboring charts. This is done very quickly and is often very useful in a practical use case.</p><p><strong>Some of My Other Projects</strong></p><p>I&#8217;d like to take the opportunity to reshare a few of my previous dashboards:</p><ul><li><p><a href="https://app.powerbi.com/view?r=eyJrIjoiMmU5ZmM1Y2QtZWYzMC00YTkxLWFkMjItZTA2YzVmNmZkZTlkIiwidCI6IjRlNzE0NTBjLThmZjItNDk0Yi05NDc3LWZjMTUwMWVmMzdkZSJ9">Explora&#8217;thesis</a>: A dashboard about students' doctoral theses in pharmacy.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!942X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!942X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 424w, https://substackcdn.com/image/fetch/$s_!942X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 848w, https://substackcdn.com/image/fetch/$s_!942X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 1272w, https://substackcdn.com/image/fetch/$s_!942X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!942X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png" width="400" height="221.97802197802199" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1456,&quot;resizeWidth&quot;:400,&quot;bytes&quot;:192407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!942X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 424w, https://substackcdn.com/image/fetch/$s_!942X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 848w, https://substackcdn.com/image/fetch/$s_!942X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 1272w, https://substackcdn.com/image/fetch/$s_!942X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4d6dc-020f-451d-a79e-f029b520fab6_1625x902.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p><a 
href="https://app.powerbi.com/view?r=eyJrIjoiYzk1MTM5ZGQtZGVkYi00M2Y2LTg5ZDQtYzI2MmY5ZGE4ZWVhIiwidCI6IjRlNzE0NTBjLThmZjItNDk0Yi05NDc3LWZjMTUwMWVmMzdkZSJ9">FDA unofficial dashboard</a>: A synthetic dashboard based on open data from the FDA.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8ZeA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ZeA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 424w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 848w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 1272w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ZeA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png" width="402" height="225.5728021978022" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/863d464e-aaeb-4b93-9213-313738797df6_1624x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:402,&quot;bytes&quot;:331595,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8ZeA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 424w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 848w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 1272w, https://substackcdn.com/image/fetch/$s_!8ZeA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F863d464e-aaeb-4b93-9213-313738797df6_1624x911.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li><li><p><a href="https://app.powerbi.com/view?r=eyJrIjoiYWVjNmYwYTYtY2I0OC00NGQ5LThjZjEtYmEwMzE0NjJlNmMyIiwidCI6IjRlNzE0NTBjLThmZjItNDk0Yi05NDc3LWZjMTUwMWVmMzdkZSJ9">Pharma sales monitoring</a>: A dashboard tracking the sales figures of a marketing team (using simulated data).</p><div class="captioned-image-container"><figure><a class="image-link image2" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!-gLX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-gLX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 424w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 848w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 1272w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-gLX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png" width="414" height="229.4629120879121" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:807,&quot;width&quot;:1456,&quot;resizeWidth&quot;:414,&quot;bytes&quot;:633801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-gLX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 424w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 848w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 1272w, https://substackcdn.com/image/fetch/$s_!-gLX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab67d8db-a26f-426c-817d-a683a2ab5a87_1625x901.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div></li></ul><h2>Conclusion</h2><p>Through this project, I wanted to show managers and industry experts that it is now possible to retrieve and manipulate unformatted information. 
This information is often hidden, diluted in the sea of all our daily tools and favorite databases &#128521;.</p><blockquote><p><strong>You can now bring it to life!</strong></p></blockquote><p>I undertook this project on workplace accidents because I found the subject motivating, but I remind you that all this is transferable and can be applied to any type of business.</p><blockquote><p><strong>The only limits are your business needs and your imagination!</strong></p></blockquote><p>After reading this, I invite you to think about your own sources of unstructured textual data. Do you have any other use case ideas for the pharmaceutical industry? Feel free to share them in the comments!</p><p>Stay tuned for the next project, and&#8230; Let&#8217;s Rock with LLM/Data &#129304;.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Microsoft Learn, the star schema : <a href="https://learn.microsoft.com/fr-fr/power-bi/guidance/star-schema">https://learn.microsoft.com/fr-fr/power-bi/guidance/star-schema</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Synoptic design website : <a href="https://synoptic.design/">https://synoptic.design/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Microsoft tutorial for synoptic creation : <a href="https://powerbi.microsoft.com/fr-ca/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/">https://powerbi.microsoft.com/fr-ca/blog/visual-awesomeness-unlocked-using-the-synoptic-panel/</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Workplace accidents analysis with LLM - Part 2 : The deployment of a secured LLM]]></title><description><![CDATA[The Pharm'AI Company - Project #2]]></description><link>https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis-230</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis-230</guid><pubDate>Tue, 24 Sep 2024 15:04:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you haven't consulted it yet, I invite you to start your reading with this <a href="https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis">introductory article</a> on the project.</p><p>(As a reminder, LLM stands for: Large Language Model) </p><p>I want to warn the reader that this second part will be slightly more technical as I will reveal some programming elements.</p><h2>First Step: Data Collection from the INRS Website</h2><p>The EPICEA database<a 
class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> is available online but cannot be downloaded as-is. The first part of the work was therefore to retrieve the data in a usable format, an operation known as web scraping. Web scraping is a technique for automatically collecting information from websites with a program or script, somewhat like copying and pasting on a large scale.</p><p>The script navigates through web pages and extracts specific pieces of information from the HTML code, such as text, images, files, or other more or less structured data.</p><p>In our case, the scraping was carried out in two distinct steps:</p><ul><li><p>First, collecting the accident file numbers available in the EPICEA database.</p></li><li><p>Then, extracting the specific information for each accident, case by case.</p></li></ul><p><em><strong>Disclaimer</strong></em>: When collecting data from the internet, it is important to check the legal aspects and stay compliant with the GDPR (General Data Protection Regulation), which applies to personal data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. </p><p>In this case, there is no issue since:</p><ul><li><p>INRS allows the use of its database for non-commercial purposes.</p></li><li><p>The database is completely anonymized.</p></li><li><p>We will only work on a sample of this database, focusing on the pharmaceutical, agri-food, and chemical industries.</p></li></ul><p>I wanted to mention this scraping phase, but I won&#8217;t elaborate further in this article as it&#8217;s not the core of the process. 
I invite you to check out Chapter 2 of the <a href="https://github.com/arnaud-dg/workplace-accidents/blob/main/Worplace_accidents_analysis_notebook.ipynb">notebook</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> if you are interested in how it is implemented.</p><h2>Second Step: Deploying an AI Language Model Securely</h2><h5>&#9654; Installation</h5><p>Throughout this project, I really wanted to demonstrate that it is possible to reconcile the use of generative AI with data confidentiality.</p><p>To run a language model locally and avoid data leaks, I chose <a href="https://ollama.com/">Ollama</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>. This is an open-source platform designed to run language models locally. It allows users to download, run, and interact with various pre-trained language models without requiring an expensive cloud infrastructure.</p><p>The installation process is described in their <a href="https://github.com/ollama/ollama">technical documentation</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><p><strong>Note</strong>: For those using Windows, it is generally recommended to install the<em> <a href="https://learn.microsoft.com/fr-fr/windows/wsl/about">Windows Subsystem for Linux (WSL)</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. 
A native Windows version has since been released, but I have not tested its stability.</em></p><h5>&#9654; The first query</h5><p>Once these two installations are completed, you can open the WSL command line and type the following command:</p><pre><code>ollama run <em>MODEL_NAME</em></code></pre><p>Replace <strong>MODEL_NAME</strong> with the name of the model you wish to use. The table below lists the <a href="https://ollama.com/library">models available at the time of writing</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k-vq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k-vq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 424w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 848w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 1272w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!k-vq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png" width="482" height="602.1466275659824" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4edb42d-202f-484b-bf15-097ec37e3648_682x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:682,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:65214,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k-vq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 424w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 848w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 1272w, https://substackcdn.com/image/fetch/$s_!k-vq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4edb42d-202f-484b-bf15-097ec37e3648_682x852.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">List of models available in Ollama</figcaption></figure></div><p>Once the model is downloaded, you can query it directly from the command prompt or call it from a Python program, here through LangChain's ChatOllama wrapper, with the following piece of code:</p><pre><code>from ollama import Client
from langchain.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI
from langchain_mistralai import ChatMistralAI, MistralAIEmbeddings
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage

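# Name of the model previously downloaded with "ollama run"
nom_modele = "mistral"

# Configure the local Ollama model
llm = ChatOllama(
    model=nom_modele,
    format="json",       # ask Ollama to return valid JSON
    temperature=0,       # no sampling randomness, for reproducible extractions
    top_k=10,            # sample only among the 10 most likely tokens
    top_p=0.9,           # nucleus sampling threshold
    repeat_penalty=1.1   # mild penalty on repeated tokens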
)</code></pre><h2>Third Step: Querying the LLM in a Controlled Manner</h2><p>As described in the previous article, LLMs call for several precautions to limit their inherent risk of hallucinations.</p><p>Without going through the entire code, I will highlight three security measures that help guide and control the AI model's behavior.</p><ol><li><p><strong>A Python class named &#8220;Accident&#8221;</strong>, which lists each of the items we want to extract:</p></li></ol><pre><code><code># Imports assumed for this snippet (they are not shown in the original extract)
from typing import List
from pydantic import BaseModel, Field

class Accident(BaseModel):

    # Victim profile: job, sex and age
    Metier: str = Field(description="Metier, role ou fonction de la victime ayant subi l accident.")
    Sexe: str = Field(description="Sexe (Homme, Femme) de la victime ayant subi l accident.")
    Age: int = Field(description="Age de la victime ayant subi l accident.")

    # Short labels: type of accident and injury (one or two words each)
    Type_accident: str = Field(description="Type d'accident survenu. 1 ou 2 mots maximum.")
    Blessure: str = Field(description="Descriptif m&#233;dical des blessures ou symptomes. 1 ou 2 mots maximum.")

    # Yes/no flags: death, traffic accident, medical episode, suicide
    Deces: bool = Field(description="La victime est mentionnee comme decedee ou morte.")
    Circulation: bool = Field(description="Accident lie a la circulation.")
    Malaise: bool = Field(description="Accident lie a un malaise type AVC, infarctus, accident cardiaque.")
    Suicide: bool = Field(description="Accident lie a un suicide.")

    # Lists: objects involved, contributing factors, body areas affected
    Machine: List[str] = Field(description="Machines, pi&#232;ces ou objets impliques dans l accident. 1 ou 2 mots maximum.")
    Cause: List[str] = Field(description="Facteurs ayant cause directement ou ayant favoris&#233; l'accident. 1 a 3 mots maximum par facteur.")
    Zone: List[str] = Field(description="Zone du corps humain impact&#233;e par l'accident")
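
# --- Illustrative sketch (an assumption, not code from the original notebook) ---
# The schema above tells the model what to return, but nothing forces it to
# comply: "Sexe" may still come back as something other than Homme or Femme.
# A small check such as the one below could catch that after the call; this is
# the idea behind the third security measure described further down.
# The function name and the synonym table are hypothetical.
import unicodedata

SEXES_ATTENDUS = ("Homme", "Femme")
SYNONYMES_SEXE = {"homme": "Homme", "garcon": "Homme", "femme": "Femme", "fille": "Femme"}

def verifier_sexe(valeur):
    """Return "Homme" or "Femme" when the value can be mapped, otherwise None."""
    if valeur in SEXES_ATTENDUS:
        return valeur
    # Strip accents and lower-case so that spelling variants compare equal
    texte = unicodedata.normalize("NFKD", str(valeur)).encode("ascii", "ignore").decode("ascii").lower()
    for mot, sexe in SYNONYMES_SEXE.items():
        if mot in texte:
            return sexe
    return None  # unknown label: leave empty so the record can be reviewed by hand

# Example: verifier_sexe("jeune garcon") maps the unexpected label to "Homme"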
</code></code></pre><p>For each item to be extracted, I provided a descriptive sentence (sorry, it is in French &#128578;) and an expected type for the result: a string (str), an integer (int), a boolean (bool), a list (List), etc.</p><p>The expected type is particularly important because it guides and constrains the LLM toward the specific piece of information requested.</p><ol start="2"><li><p><strong>A prompt</strong>, which is a structured instruction similar to what you would write to interact with ChatGPT online:</p></li></ol><pre><code><code>template_string = """Tu es un analyste qui relit des compte-rendu d'accidents et effectue des saisies. 
Analyse le texte ci-dessous qui se trouve entre les triples apostrophes et extrais-en les informations requises. 

Descriptif de l'accident : ```{descriptif}```

Si l'information n'apparait nulle part dans le narratif, n'invente rien.
Tes r&#233;ponses doivent &#234;tre en fran&#231;ais exclusivement et doivent &#234;tre les plus concises et pr&#233;cises possibles. 

IMPORTANT: Ta r&#233;ponse DOIT &#234;tre un objet JSON valide, respectant strictement le sch&#233;ma suivant. N'inclus AUCUN texte en dehors de cet objet JSON.

{format_instructions}
"""
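
# --- Illustrative sketch, not taken from the notebook -----------------------
# How the two placeholders of a template like the one above get filled in.
# Plain str.format is used here as an assumption; the notebook may go through
# a LangChain prompt template instead. demo_template, fill_prompt and the
# sample strings are hypothetical.
demo_template = "Descriptif de l'accident : ```{descriptif}```\n\n{format_instructions}\n"

def fill_prompt(template, descriptif, format_instructions):
    """Return the final prompt text for one accident report."""
    return template.format(
        descriptif=descriptif,
        format_instructions=format_instructions,
    )

prompt = fill_prompt(
    demo_template,
    descriptif="Un couvreur chute d'un toit.",
    format_instructions="Reponds avec un objet JSON.",
)
print(prompt)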
</code></code></pre><p>For reference, the {descriptif} placeholder in the prompt, enclosed in triple backticks, is automatically replaced by the text of a specific accident report, and {format_instructions} by the description of the expected JSON schema.</p><p>Moreover, you can adjust a model parameter called <strong>temperature</strong>, which controls how random the model's word choices are and is often described as its level of creativity. Setting it to 0 makes the output as deterministic as possible, which increases the likelihood that the LLM will stick to the prompt and avoid taking too many liberties.</p><ol start="3"><li><p><strong>Post-hoc verification functions</strong></p></li></ol><p>Despite the precautions provided by the first two safeguards, we cannot completely rule out erratic responses. For this reason, it's recommended to write verification functions that check that each response matches the expected format and, if necessary, correct the value in question.</p><p>Here is an example I encountered: for some unknown reason, one of the LLMs returned &#8220;young boy&#8221; when I had specifically asked it to choose between two possible answers, &#8220;man&#8221; or &#8220;woman.&#8221; These deviations are rare but can happen, so they should be anticipated.</p><p>I invite you to check out Chapter 3 of the notebook for more details.</p><p><strong>Summary of This Third Step</strong></p><p>The diagram below illustrates the three safeguards.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ToH_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ToH_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 424w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 848w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 1272w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ToH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png" width="932" height="593" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:593,&quot;width&quot;:932,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:348820,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ToH_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 424w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 848w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 1272w, https://substackcdn.com/image/fetch/$s_!ToH_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3584c563-e3d7-4e1d-a2fc-b85927d72785_932x593.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I am very satisfied with the extraction work performed by the tool. After applying these safeguards, format errors are extremely rare. With the form under control, we can now focus on the content and accuracy of the responses.</p><h2>Last Step: Selecting the Best LLM</h2><p>As we saw earlier, several types of models exist. Each has its own architecture, number of parameters, and training text corpus. No model is inherently good or bad, and performance on a given task is hard to predict in advance.</p><p>The best approach is to remain pragmatic: compare the models, challenge them with practical cases, and select the ones that seem best suited to our problem.</p><p>For this project, I chose three candidate models: <strong>Llama3.1</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>, <strong>mistral</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>, and <strong>zephyr</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a>, whose characteristics are described in the table below.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wypO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wypO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 424w, https://substackcdn.com/image/fetch/$s_!wypO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 848w, https://substackcdn.com/image/fetch/$s_!wypO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 1272w, https://substackcdn.com/image/fetch/$s_!wypO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wypO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png" width="1256" height="232" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:232,&quot;width&quot;:1256,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20701,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!wypO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 424w, https://substackcdn.com/image/fetch/$s_!wypO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 848w, https://substackcdn.com/image/fetch/$s_!wypO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 1272w, https://substackcdn.com/image/fetch/$s_!wypO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba5e2081-2bb1-4c4d-b0df-8f122bf42942_1256x232.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>All three are open-source models (goodbye ChatGPT 4o &#129761;).</p><p>Since I chose to run these models on my personal PC, cost won&#8217;t be a significant factor. In an industrial context, however, it would be a crucial consideration.</p><p>The benchmarking protocol is simple: to assess the performance of these models, I personally analyzed about thirty accident reports. 
I will simply compare the number of correct answers provided by the three candidate models with reality.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j_tl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j_tl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 424w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 848w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 1272w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j_tl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png" width="1260" height="237" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:237,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28794,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j_tl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 424w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 848w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 1272w, https://substackcdn.com/image/fetch/$s_!j_tl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0da183da-e8f0-46af-a5ff-fcec4ccd10ff_1260x237.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The result of this benchmark favors the Mistral model<strong>.</strong></p><h2>Conclusion</h2><p>Below is a randomly selected example showing the result obtained with the Mistral model. 
The result is more than satisfactory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iqIh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iqIh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iqIh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:874725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iqIh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!iqIh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df7c449-bda2-457f-a8a7-379404839b88_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Through this article, I wanted to shed light on how an LLM works and how to interact with it in an automated and controlled way.</p><p>This was a slightly more demanding article on a technical level. I hope it didn&#8217;t lose you too much &#128521;. 
Don&#8217;t worry, the next article will be simpler and shorter, presenting the Power BI dashboard I created to finalize this project.</p><p>Feel free to share your thoughts in the comments, or your own experience using LLMs!</p><p><em>Stay tuned, and &#8230; Let&#8217;s Rock with Data &#127926;</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>EPICEA Database, INRS site : <a href="https://www.inrs.fr/publications/bdd/epicea.html">https://www.inrs.fr/publications/bdd/epicea.html</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>Captain Contrat, Web Scraping : est-ce l&#233;gal ? <a href="https://www.captaincontrat.com/protection-des-creations/cgv-cgu-cga/web-scraping-est-ce-legal-me-marcotte">https://www.captaincontrat.com/protection-des-creations/cgv-cgu-cga/web-scraping-est-ce-legal-me-marcotte</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>Notebook - Chapter 2 : Data Collection from INRS website : <a href="https://github.com/arnaud-dg/workplace-accidents/blob/main/Worplace_accidents_analysis_notebook.ipynb">https://github.com/arnaud-dg/workplace-accidents/blob/main/Worplace_accidents_analysis_notebook.ipynb</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Ollama website : <a href="https://ollama.com/">https://ollama.com/</a></h5></div></div><div class="footnote" 
data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Ollama technical documentation: <a href="https://github.com/ollama/ollama">https://github.com/ollama/ollama</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Linux for Windows <a href="https://learn.microsoft.com/fr-fr/windows/wsl/about">https://learn.microsoft.com/fr-fr/windows/wsl/about</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5>Ollama library list : <a href="https://ollama.com/library">https://ollama.com/library</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Llama model : <a href="https://llama.meta.com/">https://llama.meta.com/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><h5>Mistral model : <a href="https://mistral.ai/fr/">https://mistral.ai/fr/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><h5>Zephyr model : <a href="https://huggingface.co/blog/Isamu136/understanding-zephyr">https://huggingface.co/blog/Isamu136/understanding-zephyr</a></h5><p>The whole code is available on my github repo : <a 
href="https://github.com/arnaud-dg/workplace-accidents/">https://github.com/arnaud-dg/workplace-accidents/</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Workplace accidents analysis with LLM - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #2]]></description><link>https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-workplace-accidents-analysis</guid><pubDate>Thu, 19 Sep 2024 10:02:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e993f4aa-54aa-4887-a16d-be8067cc7dfc_818x546.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few months ago, I had the opportunity to watch a moving documentary from the "Infrarouge" series titled &#8220;Work to Death&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. 
It focused on workplace safety and serious accidents.</p><p>According to the website of the Ministry of Health and Solidarity:</p><blockquote><p><em><strong>&#8220;Every day, two people die at work, and more than a hundred are seriously injured.&#8221;</strong></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p></blockquote><p>When you've worked for several years in an industrial environment, you understand how crucial workplace safety is. This inspired me to develop a project around the study of workplace accidents.</p><p>During my research, I found the <a href="https://www.inrs.fr/publications/bdd/epicea.html">EPICEA</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> database, which belongs to INRS, the French institute responsible for preventing occupational risks. The EPICEA database lists over 21,000 cases of serious workplace accidents since the 1990s. Each case includes a "narrative," that is, an accident report written as free-form text.</p><p><strong>A technical challenge arises: how can we extract and manipulate information buried in freely written text?</strong></p><p>Before answering this question, I&#8217;d like to share a quick reminder.</p><h2>Structured vs. Unstructured Data</h2><p>When people think of data, most envision tables of numbers (like Excel) or databases organized in rows and columns. 
This is what we call structured data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uKON!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uKON!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uKON!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uKON!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uKON!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uKON!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg" width="431" height="319.1374045801527" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:655,&quot;resizeWidth&quot;:431,&quot;bytes&quot;:63012,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uKON!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uKON!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uKON!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uKON!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d109d8b-22e4-40a8-b2be-d0b7f16e3915_655x485.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the other hand, there is unstructured data, which doesn&#8217;t follow a predefined format or schema. Their size and content vary. 
Some examples of unstructured information:</p><ul><li><p>Images and videos,</p></li><li><p>Text documents (articles, reports, publications, emails, etc.),</p></li><li><p>Audio files,</p></li><li><p>Genetic sequences,</p></li><li><p>Spectrograms, etc.</p></li></ul><p>The following infographic summarizes the fundamental differences between these two worlds<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Wn0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Wn0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 424w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 848w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 1272w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3Wn0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png" width="587" height="456.8740234375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:797,&quot;width&quot;:1024,&quot;resizeWidth&quot;:587,&quot;bytes&quot;:58430,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Wn0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 424w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 848w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 1272w, https://substackcdn.com/image/fetch/$s_!3Wn0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7acebf0b-d4d2-4280-8e5c-30ff4aa0f3db_1024x797.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Structured vs. Unstructured Data - Lawtomated</figcaption></figure></div><p>It shows that around 80% of company data is in a freeform format<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>.</p><p>Healthcare companies are no exception. Many IT tools rely, at least partially, on free-text fields. 
Here are some everyday examples of unformatted data in the pharmaceutical industry:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bbGD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bbGD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 424w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 848w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 1272w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bbGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png" width="1456" height="482" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bbGD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 424w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 848w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 1272w, https://substackcdn.com/image/fetch/$s_!bbGD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4262b7f3-a837-4eaa-8979-ec4dee10af9e_1727x572.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#8230; And this list is far from exhaustive. Every company sits on deposits of data that remain untapped because analyzing them manually would be too tedious.</p><h2>How can we refine these mines of raw data?</h2><p>For this project, we will see how to turn unstructured, freeform text into data tables that can be visualized and analyzed. Technically, this task is known as <strong>named entity extraction</strong>.</p><p>To do this, we will use LLMs (Large Language Models)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>. These are AI models trained on vast amounts of text and designed to understand and generate human language. The best-known example is ChatGPT, but there are many others.</p><p>The advantages are numerous:</p><ul><li><p><strong>The tools are adaptable</strong>. 
Data entry habits vary from person to person and from site to site, depending on culture, organization, and history. By relying on context, an LLM can compensate for these variations and standardize the text.</p></li><li><p>It&#8217;s possible to <strong>overcome the language barrier</strong>, as most models handle multiple languages (with better results for the most widely used ones).</p></li><li><p>By using a <strong>pre-trained, general-purpose model</strong>, we avoid the tedious steps of text annotation and training. The final performance might be slightly lower than that of a model trained specifically for the task, but we save a significant amount of time.</p></li></ul><h2>What are the risks of LLMs in the workplace?</h2><p>Using LLMs in a company carries many risks. I&#8217;ve listed a few of them, along with the measures I plan to take to manage them in this project.</p><h4><strong>Confidentiality</strong></h4><ul><li><p><strong>Users may expose sensitive and confidential information to LLMs, creating a risk of data leakage.</strong></p></li></ul><p>For this project, I downloaded several models and ran them locally on my computer rather than calling them online through an API. As a result, none of the information I submit to these models is sent to third-party servers; the code can even run without any internet connection.</p><p>This keeps the language model private and does not compromise data confidentiality.</p><p>The main downside of working locally is that computing power is usually limited, so inference times (the computation time required to obtain a prediction) are relatively long. 
This is especially true if the text to analyze is long or if the model has a large number of parameters.</p><h4><strong>Reliability</strong></h4><ul><li><p><strong>LLMs have a strong tendency to hallucinate and frequently generate inaccurate or fabricated information.</strong></p></li></ul><p>A generative text model works by predicting the next word from the context and the words that precede it. These tools, therefore, lack real understanding and logic. The risk of hallucination is structural and intrinsic; it can be reduced but not entirely eliminated<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>.</p><div class="pullquote"><p><strong>LLMs generate plausible, probable text, but not necessarily truthful or accurate text!</strong></p></div><p>These tools are Swiss Army knives that can be used in various ways, and the risk of hallucination varies depending on the use case. Asking an LLM to translate or summarize a text is different from asking it to create a new text from scratch.</p><p>In our project, I will ask the model to identify specific information within a text. The risk of hallucination is lower because there is little room for creativity.</p><p>Finally, there is a whole field of expertise called prompt engineering, which focuses on optimizing requests to improve output quality. It is also possible to adjust certain technical parameters, such as the temperature, which controls how random, and therefore how &#8220;creative&#8221;, the model&#8217;s output is.</p><h4><strong>Security</strong></h4><ul><li><p><strong>LLMs present potential vulnerabilities to external attacks.</strong></p></li></ul><p>LLMs deployed in companies are often self-learning systems. 
Since the goal is to interact with them, they must remember the history of a conversation and adapt their behavior to stay relevant.</p><p>This advantage is also a danger because users, through repeated prompting, can alter the chatbot&#8217;s behavior. This is how some conversational agents ended up being withdrawn from the market after being corrupted by their users&#8217; requests<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>.</p><p>In our project, the user will not interact directly with the LLM. I write the prompts in the source code, and the user cannot alter them.</p><h2>Conclusion</h2><p>LLMs allow us to uncover hidden treasures within unstructured data. Thanks to them, analyses that were once out of reach due to time or resource constraints become possible. However, these tools are double-edged: as promising as the benefits are, the risks must be taken seriously!</p><p>The next article will be more technical, with code to explain my technical choices and share some concrete results.</p><p>What do you think? Do you have other ideas for use cases involving unstructured data? 
Share your experience in the comments!</p><p><em>Stay tuned, and &#8230; Let&#8217;s Rock with Data </em>&#129304;</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><h5>Infrarouge, Travail &#224; mort - <a href="https://www.france.tv/france-2/infrarouge/5800044-travail-a-mort.html">https://www.france.tv/france-2/infrarouge/5800044-travail-a-mort.html</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><h5>French Ministry of Labour: <a href="https://travail-emploi.gouv.fr/sante-au-travail/stop-aux-accidents-du-travail-graves-et-mortels/">https://travail-emploi.gouv.fr/sante-au-travail/stop-aux-accidents-du-travail-graves-et-mortels/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><h5>INRS EPICEA database: <a href="https://www.inrs.fr/publications/bdd/epicea.html">https://www.inrs.fr/publications/bdd/epicea.html</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><h5>Structured Data vs. Unstructured Data: what are they and why care? 
- <a href="https://lawtomated.com/structured-data-vs-unstructured-data-what-are-they-and-why-care/">https://lawtomated.com/structured-data-vs-unstructured-data-what-are-they-and-why-care/</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><h5>Insights from Big Data Innovation: Six Steps to Harness Unstructured Data - <a href="https://www.appen.com/blog/unstructured-data-into-strategic-asset">https://www.appen.com/blog/unstructured-data-into-strategic-asset</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><h5>Wikipedia - Large Language Model - <a href="https://en.wikipedia.org/wiki/Large_language_model">https://en.wikipedia.org/wiki/Large_language_model</a></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><h5><em>LLMs Will Always Hallucinate, and We Need to Live With This, Sourav Banerjee, Ayushi Agarwal, Saloni Singla, <a href="https://arxiv.org/html/2409.05746v1">https://arxiv.org/html/2409.05746v1</a></em></h5></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><h5>Tay ou les in&#233;vitables d&#233;rives racistes de l&#8217;intelligence artificielle, France 24 - <a 
href="https://www.france24.com/fr/20160325-tay-derive-raciste-ia-microsoft-existentialisme-twitter-robot-conversation-nazi">https://www.france24.com/fr/20160325-tay-derive-raciste-ia-microsoft-existentialisme-twitter-robot-conversation-nazi</a></h5></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Deep-learning-based bacterial colony detection - Part 3 : Deploy the prototype]]></title><description><![CDATA[The Pharm'AI Company - Project #1]]></description><link>https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial-801</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial-801</guid><pubDate>Tue, 17 Sep 2024 08:37:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is the last in a series of three. 
It follows the article <em><a href="https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial">Training a Deep Learning Model</a></em>.</p><p>It is time to share &#8220;AI SMART Gelose Counter&#8221;, the web application I have built. Here is the <a href="https://ufc-counter-2fkxk53awlpztyyrg6aydk.streamlit.app/">link</a> so you can try it as you read.</p><h2>Deploying AI in Production: A Commonly Underestimated Challenge</h2><p>It is relatively &#8220;easy&#8221; to explore a dataset on your own or to code parts of an application. However, deploying an AI in a production environment is far more complex than it seems. Let&#8217;s explore why so many projects fail and how to avoid these pitfalls.</p><blockquote><p><em><strong>Around 80% of AI projects fail to be deployed in the field, which is twice the failure rate of traditional IT projects.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></strong></em></p></blockquote><p>Among this 80%, a significant number of projects fail because they do not match the practical realities of the field. In some cases, the AI&#8217;s performance is lacking; in others, the solution doesn&#8217;t fully meet user needs or fails to integrate into existing workflows.</p><p>Building a data product, mastering its deployment, and ensuring its adoption by users require a lot of communication and collaboration. It&#8217;s more a matter of project management and organization than of technology. 
A good practice is to deliver AI prototypes early so they can be quickly tested and challenged.</p><p>In this prototyping game, <strong>Streamlit</strong><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> is a particularly effective Python framework!</p><h4>What is Streamlit?</h4><p>Streamlit is a Python library that makes it easy to create interactive web applications for data science and machine learning.</p><p>Its design is minimalist, but the advantages are numerous:</p><ul><li><p><strong>Collaboration and sharing</strong>: All team members have access to the same version of the model and can visualize and interact directly with the data and/or the model. Because the application comes to life early, users are involved from the first stages of development. This helps gather feedback and feed it into the next iterations to better meet the teams&#8217; real needs.</p></li><li><p><strong>Operational efficiency</strong>: A data product can be created in only a few hours, because Streamlit spares developers complex tasks such as building user interfaces or deploying applications.</p></li><li><p><strong>User simplicity</strong>: A simple interface hides the code in the background. When you want to share a raw program with a colleague, there are often numerous packages to install first. This can be a real headache, and it is not something you can ask field operators to handle. 
With a Streamlit web application, you create a vehicle, a container, that allows the code to be executed immediately with no hassle.</p></li></ul><p>In short, working on deployment with a prototyping logic promotes:</p><ul><li><p>Right-First-Time</p></li><li><p>Time-To-Market</p></li><li><p>User adoption</p></li></ul><p>Let&#8217;s take a look at how Streamlit fits into the project architecture.</p><h2>A Few Words on the Architecture of My Solution</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ukt_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ukt_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 424w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 848w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 1272w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ukt_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png" width="907" height="520" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:520,&quot;width&quot;:907,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:549710,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ukt_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 424w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 848w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 1272w, https://substackcdn.com/image/fetch/$s_!Ukt_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a620869-c8c7-4f5c-aa7d-d8525d0350d7_907x520.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The top part of the diagram corresponds to the training phase, described in the second article. As a reminder, training takes time, so it is a one-off activity: there is no question of retraining the model each time the application is used. What we want to do routinely is run inference with the trained model, that is, predict the colony count on an image of a new Petri dish.</p><p>Streamlit, despite its many advantages, has one limitation: it is not suited to hosting large models that require a lot of storage space and memory. Streamlit only handles what we call the &#8220;front-end&#8221;, the storefront of our application.</p><p>For the back-end, meaning model execution and computation, we use an API. 
An API is a tool that allows different applications to communicate easily with each other by exchanging information securely. In my case, I used <a href="https://fastapi.tiangolo.com/">FastAPI</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>, which I deployed on a Heroku server.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GjO3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GjO3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GjO3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg" width="1443" height="621" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1443,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GjO3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GjO3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05cf9a91-ddd6-4939-92cd-b3c384d93a71_1443x621.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">API documentation page: <a href="https://ufc-counter-api-e72d4934bdd3.herokuapp.com/docs#">https://ufc-counter-api-e72d4934bdd3.herokuapp.com/docs#</a>. </figcaption></figure></div><p>The API acts like an automaton: if you submit a .jpg or .png image to the address <a href="https://ufc-counter-api-e72d4934bdd3.herokuapp.com/predict">https://ufc-counter-api-e72d4934bdd3.herokuapp.com/predict</a>, it runs the model and returns the coordinates of the bacterial colonies it identified.</p><h2>Description of the Streamlit Application</h2><p><a href="https://ufc-counter-2fkxk53awlpztyyrg6aydk.streamlit.app/">AI SMART Gelose Counter</a> is relatively simple and goes straight to the point. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MCuP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MCuP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MCuP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2232321,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MCuP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!MCuP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F987a24bc-5dbc-4720-8e36-55029244dd50_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The steps are as follows:</p><ol><li><p>Upload a Petri dish photo, either by dragging and dropping it into the &#8220;Drag and Drop&#8221; area (if you have one handy &#128521;) or by selecting one of the demo photos from the &#8220;Sample Library&#8221;.</p></li><li><p>The image annotated with the model&#8217;s predictions then appears in the center of the screen. Two options enhance the visualization:</p><ul><li><p>The &#8220;Activate shutter view&#8221; option allows the user to hide the &#8220;green boxes&#8221; surrounding the colonies.</p></li><li><p>The &#8220;Show probabilities&#8221; option displays the probability associated with the detection of each colony. The closer the probability is to 1, the more confident the model is that it has identified a "colony-forming unit".</p><p>These two options are particularly useful for model verification and validation activities.</p></li></ul></li><li><p>The total colony count appears in the right sidebar. 
For the sample library photos, which I counted myself, this total is compared with the actual value.</p></li></ol><p>This temporary, no-frills interface does exactly what is expected of it: it allows subject matter experts to challenge the model and the features of the software.</p><h2>Going Further: Improvement Ideas</h2><p>I coded and wrote this project in just a few dozen hours, so there&#8217;s plenty of room for improvement. I&#8217;d like to conclude by sharing a few improvement ideas that come to mind:</p><ul><li><p>I limited myself to creating a counting model, which treats all colonies as a single object class. With more data, it would have been possible to build an identification algorithm that could predict the type of bacteria based on the external appearance of the colony.</p></li><li><p>I tested only one model, YOLOv5; it would have been interesting to benchmark several models to find one with better detection capability.</p></li><li><p>Finally, as I mentioned earlier, the dataset was clearly imbalanced. It could have been useful to carry out a &#8220;Data Augmentation&#8221; step. This consists of creating new images for the most underrepresented classes by applying slight variations (rotations, zooms, contrast changes, etc.). Increasing the volume of available images generally improves training.</p></li></ul><h2>Conclusion</h2><p>I hope you enjoyed this first experience with Computer Vision applied to the world of quality control laboratories. As for me, I&#8217;ve rediscovered the very fun world of microbiology from a new angle! 
&#128539;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q4eH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q4eH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q4eH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg" width="373" height="509.61125" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1093,&quot;width&quot;:800,&quot;resizeWidth&quot;:373,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Aucune 
description alternative pour cette image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Aucune description alternative pour cette image" title="Aucune description alternative pour cette image" srcset="https://substackcdn.com/image/fetch/$s_!q4eH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 424w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 848w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!q4eH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad2b0a92-bd85-4846-95df-f58dd19c5815_800x1093.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 
12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Alliance Bio Expertise - <a href="https://www.alliance-bio-expertise.com/fr/affiches-de-films/">Affiches de films</a></figcaption></figure></div><p>Feel free to test the application and share your feedback! Whether you're an expert or a novice, your opinion matters to me.</p><p><em>Stay tuned, and &#8230; Let&#8217;s Rock with Data</em> &#129311;</p><p>#AIinMicrobiology #ComputerVision #DataScience #DeepLearning #Gelose #Counting</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><em>The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed. Avoiding the Anti-Patterns of AI. James Ryseff, Brandon De Bruhl, Sydne J. 
Newberry. Research report, published Aug 13, 2024. <a href="https://www.rand.org/pubs/research_reports/RRA2680-1.html">https://www.rand.org/pubs/research_reports/RRA2680-1.html</a></em></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Streamlit documentation: <a href="https://docs.streamlit.io/">https://docs.streamlit.io/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>FastAPI documentation: <a href="https://fastapi.tiangolo.com/">https://fastapi.tiangolo.com/</a></p><p>The full code is open source and available in my GitHub repository: <a href="https://github.com/arnaud-dg/UFC-counter">https://github.com/arnaud-dg/UFC-counter</a></p><div><hr></div></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Deep-learning-based bacterial colony detection - Part 2 : Train a Deep Learning Model]]></title><description><![CDATA[The Pharm'AI Company - Project #1]]></description><link>https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/eng-deep-learning-based-bacterial</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Thu, 12 Sep 2024 10:29:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is the second in a series of three; it follows the first article, <a href="https://databoostindustry.substack.com/p/deep-learning-based-bacterial-colony">The project brief</a>.</p><p>In this article, we will explore, in a simplified way, the structure of a deep learning model and how to train a neural network to perform a specific task.</p><h2>What is machine learning?</h2><p>There are many real-world cases where it is impossible to explicitly program a machine. Many tasks, especially in computer vision, cannot be reduced to a list of rules. 
In our image-analysis case, it is impossible to write mathematical equations that fully define what a bacterial colony is.</p><p>Instead of coding explicit rules, machine learning "trains" a program to perform a task using large amounts of data. The system learns to recognize patterns in the data it is given so that it can make predictions or decisions on new data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fKXp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fKXp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fKXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:276087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fKXp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!fKXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91039830-3feb-4f6e-b3d1-8f39fd0df357_2560x1440.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Humans no longer provide logical instructions but instead supply the machine with labeled data, i.e., raw data associated with the outcome to predict.</p><p>For our project, we will provide the program with:</p><ul><li><p>Pictures of agar (a solid culture medium used for bacterial growth) on which bacterial colonies are present.</p></li><li><p>A table containing the positions of these colonies.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cWC8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cWC8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cWC8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1116569,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cWC8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 848w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!cWC8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd315da5b-527d-47ee-aa66-eaebd9c8b79e_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The application of deep learning to images</h2><p>When dealing with images, it is common to use a specific family of models known as neural networks; this is what is called deep learning. I highly recommend the excellent video by <a href="https://youtu.be/aircAruvnKk">3blue1brown</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, which explains very well how these structures work and how they perceive images.</p><p>For a neural network, an image is simply a matrix of numbers, where each cell holds the value of a specific pixel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bpKD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bpKD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!bpKD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!bpKD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!bpKD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bpKD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2177090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bpKD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 424w, https://substackcdn.com/image/fetch/$s_!bpKD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 848w, 
https://substackcdn.com/image/fetch/$s_!bpKD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 1272w, https://substackcdn.com/image/fetch/$s_!bpKD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31a1098-3554-4cf1-9d46-6c9400e37200_2560x1440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">3blue1brown - Neural networks from the ground up</figcaption></figure></div><p>In short, neural networks are 
composed of several layers of artificial neurons. Each neuron is connected to neurons in the neighboring layers, and each connection carries a weight that can be adjusted.</p><p>We feed the network many images whose content is already known and slightly adjust the weight of each connection whenever it makes an error, so that the parameters gradually converge towards an optimum.</p><p>Through this process, called training, the neural network learns by itself to identify the areas of interest and the important characteristics of the images. Once trained, the model can generalize its knowledge to make predictions or decisions on new data.</p><p>This is, of course, a simplified view of how a neural network works. In reality, things are more complex, and many choices must be made regarding the network architecture, the mathematical functions involved, and the training parameters.</p><p>For this project, I chose to use YOLOv5<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, a deep learning model specialized in real-time object detection and built on a convolutional neural network (CNN) architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ibvF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ibvF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 424w, 
https://substackcdn.com/image/fetch/$s_!ibvF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 848w, https://substackcdn.com/image/fetch/$s_!ibvF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 1272w, https://substackcdn.com/image/fetch/$s_!ibvF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ibvF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png" width="679" height="201" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:201,&quot;width&quot;:679,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31170,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ibvF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 424w, 
https://substackcdn.com/image/fetch/$s_!ibvF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 848w, https://substackcdn.com/image/fetch/$s_!ibvF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 1272w, https://substackcdn.com/image/fetch/$s_!ibvF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b6a73c8-ac29-4f68-8fb9-07f948c4fba2_679x201.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>YOLOv5 stands out for its speed and efficiency, while offering high performance with a lighter and more optimized architecture compared to its predecessors. It is not the most powerful model, but it strikes a good balance between performance and execution/memory efficiency.</p><h2>The most critical item: the dataset</h2><p>As we mentioned earlier, training is based on input data; it is this information that will shape the model's parameters and its ability to generalize.</p><blockquote><p><strong>The dataset is the cornerstone of any machine learning or deep learning project!</strong></p></blockquote><p>The quality and relevance of the AI we are going to create directly depend on the richness, diversity, and <strong>representativeness</strong> of the training data. Without a carefully selected and prepared dataset, even the most sophisticated neural network architecture will only produce mediocre or biased results.</p><p>The dataset I used for this project is the one used in the article by <em><a href="https://www.nature.com/articles/s41597-023-02404-8">Makrai et al</a></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. 
It is composed of 369 images on which domain experts have marked the coordinates of 56,865 colonies.</p><p>Therefore, I took a detailed look at the dataset composition.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QjBi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QjBi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 424w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 848w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 1272w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QjBi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png" width="1456" height="847" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd098fbe-b86b-47e0-80af-d22400072244_1649x959.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:847,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:149323,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QjBi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 424w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 848w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 1272w, https://substackcdn.com/image/fetch/$s_!QjBi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd098fbe-b86b-47e0-80af-d22400072244_1649x959.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Qps!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Qps!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 424w, https://substackcdn.com/image/fetch/$s_!8Qps!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 848w, 
https://substackcdn.com/image/fetch/$s_!8Qps!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 1272w, https://substackcdn.com/image/fetch/$s_!8Qps!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Qps!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png" width="1456" height="693" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53010,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Qps!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 424w, https://substackcdn.com/image/fetch/$s_!8Qps!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 848w, 
https://substackcdn.com/image/fetch/$s_!8Qps!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 1272w, https://substackcdn.com/image/fetch/$s_!8Qps!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e06648b-742b-4f6c-abb1-89c31b1a4d45_1649x785.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This dataset is not perfect; some strains are overrepresented compared to others, and blood agar culture media are predominant. 
As a result, there is a risk that our model will perform better at counting colonies on blood agar than on other culture media. In an industrial context, it would be worth collecting new images and labeling additional data to correct this imbalance.</p><p>For our part, we will carry on with the existing images.</p><h2>Results after training</h2><p>Our deep learning model is ready! &#128104;&#127996;&#8205;&#127859;</p><p>When tested on new images, the magic happens, and we obtain the following results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6aEx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6aEx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 424w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 848w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 1272w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6aEx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png" width="1001" height="766" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1001,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:723835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6aEx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 424w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 848w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 1272w, https://substackcdn.com/image/fetch/$s_!6aEx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8fa9274-cccf-423c-87ae-e79e713edfb4_1001x766.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The model's accuracy is 92%, which is encouraging, even though it remains below the objectives set in the project charter.</p><p>Don&#8217;t miss the next article, where you&#8217;ll be able to test our bacterial detection application yourself!</p><p><em>Stay tuned, and &#8230; Let&#8217;s Rock with Data &#127926; and Bacteria &#129440;</em></p><p>The complete code is available on my GitHub: <a href="https://github.com/arnaud-dg/UFC-counter">https://github.com/arnaud-dg/UFC-counter</a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>3blue1brown youtube channel </p><div id="youtube2-aircAruvnKk" 
class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;aircAruvnKk&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/aircAruvnKk?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>YOLOv5 documentation: <a href="https://docs.ultralytics.com/fr/yolov5/">https://docs.ultralytics.com/fr/yolov5/</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><em>Makrai, L., Fodr&#243;czy, B., Nagy, S.&#193;.&nbsp;et al.&nbsp;Annotated dataset for deep-learning-based bacterial colony detection.&nbsp;Sci Data&nbsp;<strong>10</strong>, 497 (2023). https://doi.org/10.1038/s41597-023-02404-8 <a href="https://www.nature.com/articles/s41597-023-02404-8">https://www.nature.com/articles/s41597-023-02404-8</a></em></p></div></div>]]></content:encoded></item><item><title><![CDATA[ENG - Deep-learning-based bacterial colony detection - Part 1 : The project brief]]></title><description><![CDATA[The Pharm'AI Company - Project #1]]></description><link>https://databoostindustry.substack.com/p/deep-learning-based-bacterial-colony</link><guid isPermaLink="false">https://databoostindustry.substack.com/p/deep-learning-based-bacterial-colony</guid><dc:creator><![CDATA[DATA BOOST Industry]]></dc:creator><pubDate>Mon, 09 Sep 2024 14:30:21 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This article is the first in a series of three articles on the automatic counting of bacterial colonies on agar plates using Artificial Intelligence.</p><h2>Introduction and Context</h2><p>One of the core activities of a microbiologist is interpreting analyses of samples taken during production; these can be raw materials, finished products, or environmental samples used to check the cleanliness of the premises.</p><p>These analyses mostly take the form of agar plate cultures, sometimes colloquially referred to as "Petri dishes."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="3600" height="2395" 
data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2395,&quot;width&quot;:3600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;person holding black framed eyeglasses&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="person holding black framed eyeglasses" title="person holding black framed eyeglasses" srcset="https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1580795478724-5b048f1c5b03?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwxMnx8cGV0cml8ZW58MHx8fHwxNzI1ODg0NDY4fDA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">CDC</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>In France, there are thousands of microbiology laboratories, including those in pharmaceutical and food industries, hospitals, universities, etc. The website <a href="http://supermicrobiologistes.fr">supermicrobiologistes.fr</a> lists at least a thousand of them on its <a href="https://supermicrobiologistes.fr/la-carte-des-labos-de-microbiologie/">interactive map</a> [1]. 
With each laboratory potentially performing dozens of analyses daily, the number of agar plates read nationwide is enormous!</p><p>Reading an agar plate yields two types of results:</p><ul><li><p>The number of bacterial colonies, expressed in CFU and obtained by counting (also called enumeration). A CFU (Colony Forming Unit) is a measure used in microbiology to estimate the number of viable microorganisms in a sample.</p></li><li><p>The nature of the microorganisms that have grown on the culture medium, obtained by identification.</p></li></ul><h2>An Innovative Project: AI in the Service of Bacterial Colony Counting</h2><p>This first educational project explores a way to automate the "counting" part using Artificial Intelligence. It was inspired by an article in <em><a href="https://www.nature.com/sdata/">Nature Scientific Data</a></em> written by <em><a href="https://www.nature.com/articles/s41597-023-02404-8">Makrai et al</a></em>. [2]</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!opPk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!opPk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 424w, https://substackcdn.com/image/fetch/$s_!opPk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 848w, 
https://substackcdn.com/image/fetch/$s_!opPk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 1272w, https://substackcdn.com/image/fetch/$s_!opPk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!opPk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png" width="685" height="502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:685,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;figure 1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="figure 1" title="figure 1" srcset="https://substackcdn.com/image/fetch/$s_!opPk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 424w, https://substackcdn.com/image/fetch/$s_!opPk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 848w, 
https://substackcdn.com/image/fetch/$s_!opPk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 1272w, https://substackcdn.com/image/fetch/$s_!opPk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7e75de2-6743-45b1-b7ef-e376673d9f97_685x502.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">examples of images from the Makrai et al. 
study [2]</figcaption></figure></div><p>What are the advantages of automatic counting systems?</p><ol><li><p>Enumeration can be tedious, since it involves visually counting points on a given surface. Contamination is moderate in most cases, but some plates are far more heavily contaminated. An automated solution would, in theory, save precious time when reading heavily contaminated agar plates.</p></li><li><p>There is also a risk of bias or human error during counting, especially for highly contaminated plates. Better reproducibility could therefore be an additional benefit.</p></li><li><p>Finally, I believe that tools that speed up reading also open new opportunities in working methods and organization. On this basis, we could imagine double checks, early readings, monitoring of the bacterial growth rate over time, and so on. This opens up a range of possibilities.</p></li></ol><p>Digital counting tools that let the user point out colonies on a screen already exist. Some methodologies have already been explored in an <a href="https://www.a3p.org/scanstation-analyse-microbiologique/">A3P article</a> [3]. However, these tools, which facilitate counting with a grid and/or rely on image contrast, do not solve every problem.</p><p>In this case, using Artificial Intelligence seemed relevant to me, because deciding whether a point is a bacterial colony is more complex than it appears:</p><ul><li><p>There can be many false positives: air bubbles, condensation droplets, handwritten markings, external stains, etc.</p></li><li><p>The appearance of colonies is highly variable (round, fluffy, etc.), and multiple colonies can coexist.</p></li><li><p>There may be low contrast between the color of the colonies and the color of the culture medium.</p></li><li><p>Some colonies are difficult to count because they are located at the edge of the plate or overlap each other.</p></li><li><p>... 
etc.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Brzq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Brzq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 424w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 848w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 1272w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Brzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png" width="534" height="234" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/602a5012-072c-423a-8314-e58637198f16_534x234.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:234,&quot;width&quot;:534,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:157822,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Brzq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 424w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 848w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 1272w, https://substackcdn.com/image/fetch/$s_!Brzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602a5012-072c-423a-8314-e58637198f16_534x234.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Example of overlapping colonies</figcaption></figure></div><p>For all these reasons, I decided to use <a href="https://en.wikipedia.org/wiki/Computer_vision">Computer Vision tools</a> [4], which allow training a model to recognize the external appearance of a colony.</p><h2>The Project Charter</h2><p>When building digital tools, it's important to 
plan the project well and focus on the notion of value. The idea is to use Artificial Intelligence only if it is necessary and if there is a return on investment (ROI) at stake.</p><p>To identify the necessary resources, clearly describe how the tool will be used, and ensure that it brings real added value, I have gathered this information in a single document. This type of document is called a "Lean Canvas" or an "AI Project Canvas" [5], and it provides a 360&#176; overview of a digital project.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SusD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SusD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SusD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SusD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SusD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SusD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg" width="1456" height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:601942,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SusD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SusD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SusD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SusD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb686ec73-c09d-4ff6-8e2a-941942445f90_1968x1106.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AI Project Canvas - CFU counting - Data Boost</figcaption></figure></div><h2>Some Important <em>Disclaimers</em></h2><p>I conclude this article with a few points that seem important to clarify before continuing.</p><ul><li><p>This project is a case study; the information in the project charter, especially the financial information, is therefore purely fictional. The document simply serves to illustrate the project.</p></li><li><p>With this project, I am in no way advocating replacing microbiologists with automated reading. Reading an agar plate requires expertise. 
In my view, the purpose of AI tools is not to replace experts, but to reduce workplace difficulties and improve the organization of our laboratories. To quote Garry Kasparov [6]:</p><blockquote><p><em><strong>AI Should Augment Human Intelligence, Not Replace It</strong></em></p></blockquote></li><li><p>This project is what we call a PoC (Proof of Concept). The goal is to create a prototype together; it will by no means be a finished product, which would require hundreds of hours of teamwork. Please be lenient with the result.</p></li></ul><div><hr></div><p>The next article will explain, in accessible terms, the methodology used to train a <em>Deep Learning</em> model from a dataset of agar plate images.</p><p><em>Stay tuned, and ... Let's Rock with Data &#127926; and bacteria &#129440;</em></p><p>Questions and comments are welcome in the comments section.</p><div><hr></div><p><strong>Sources</strong></p><h5>[1] <a href="https://supermicrobiologistes.fr/la-carte-des-labos-de-microbiologie/">https://supermicrobiologistes.fr/la-carte-des-labos-de-microbiologie/</a></h5><h5>[2] <em>Makrai, L., Fodr&#243;czy, B., Nagy, S.&#193;.&nbsp;et al.&nbsp;Annotated dataset for deep-learning-based bacterial colony detection.&nbsp;Sci Data&nbsp;10, 497 (2023).</em> <a href="https://doi.org/10.1038/s41597-023-02404-8">https://doi.org/10.1038/s41597-023-02404-8</a> <a href="https://www.nature.com/articles/s41597-023-02404-8">https://www.nature.com/articles/s41597-023-02404-8</a></h5><h5>[3] <a href="https://www.a3p.org/scanstation-analyse-microbiologique/">https://www.a3p.org/scanstation-analyse-microbiologique/</a></h5><h5>[4] <a href="https://en.wikipedia.org/wiki/Computer_vision">https://en.wikipedia.org/wiki/Computer_vision</a></h5><h5>[5] <a href="https://towardsdatascience.com/introducing-the-ai-project-canvas-e88e29eb7024">https://towardsdatascience.com/introducing-the-ai-project-canvas-e88e29eb7024</a></h5><h5>[6] <a 
href="https://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it">https://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it</a></h5>]]></content:encoded></item></channel></rss>