{"id":1732,"date":"2024-01-08T12:32:35","date_gmt":"2024-01-08T12:32:35","guid":{"rendered":"https:\/\/www.pyxll.com\/blog\/?p=1732"},"modified":"2024-01-08T12:32:36","modified_gmt":"2024-01-08T12:32:36","slug":"python-in-excel-for-youtube-data-research","status":"publish","type":"post","link":"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/","title":{"rendered":"Python in Excel for YouTube Data Research"},"content":{"rendered":"\n<p>Unlock the full potential of your YouTube video research by seamlessly integrating Python in Excel. As a content creator, staying ahead requires insightful data on video performance and trending topics. While YouTube offers valuable data, the real game-changer is having that information readily available in Excel!<\/p>\n\n\n\n<p>Traditional web scraping tools like PowerQuery fall short when it comes to extracting dynamic data requiring JavaScript execution. Python, on the other hand, excels in complex web scraping tasks, allowing us to navigate through sites like YouTube with ease, even in the presence of client-side JavaScript rendering.<\/p>\n\n\n\n<p>In this article, we will delve into the art of web scraping YouTube data using Python. Moreover, we&#8217;ll demonstrate how to effortlessly bridge the gap between Python and Excel, enabling interactive querying of scraped data directly within Excel.<\/p>\n\n\n\n<p>Our weapon of choice for this seamless integration is PyXLL, the Python Excel add-in. Unlike other tools, PyXLL stands out by allowing Python code to connect to internet resources without any sandboxing limitations. Running your Python code locally on your PC within Excel, PyXLL ensures your code and data remain secure, never leaving your machine. Plus, you can easily distribute your Python-powered Excel tools to colleagues, extending the benefits to non-Python users.<\/p>\n\n\n\n<p>Empower your YouTube research with the perfect synergy of Python&#8217;s robust web scraping capabilities and Excel&#8217;s intuitive data analysis environment. Let&#8217;s transform your data-driven insights and elevate your content creation strategy!<\/p>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a000d8c85295\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a000d8c85295\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Step_1_%E2%80%93_Getting_the_data_in_Python\" >Step 1 &#8211; Getting the data in Python<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Installing_the_Selenium_Python_Package\" >Installing the Selenium Python Package<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Installing_the_Selenium_Web_Driver\" >Installing the Selenium Web Driver<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Fetching_a_web_page_using_Selenium\" >Fetching a web page using Selenium<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Putting_it_altogether_in_a_function\" >Putting it altogether in a function<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Step_2_%E2%80%93_Calling_our_Python_function_from_Excel\" >Step 2 &#8211; Calling our Python function from Excel<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Installing_the_PyXLL_add-in\" >Installing the PyXLL add-in<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Adding_our_%E2%80%9Cyoutube_stats%E2%80%9D_Python_function_to_Excel\" >Adding our &#8220;youtube_stats&#8221; Python  function to Excel<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Calling_our_%E2%80%9Cyoutube_stats%E2%80%9D_Python_function_in_Excel\" >Calling our &#8220;youtube_stats&#8221; Python function in Excel<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.pyxll.com\/blog\/python-in-excel-for-youtube-data-research\/#Wrapping_Up\" >Wrapping Up<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading has-large-font-size\" id=\"h-step-1-getting-the-data-in-python\"><span class=\"ez-toc-section\" id=\"Step_1_%E2%80%93_Getting_the_data_in_Python\"><\/span>Step 1 &#8211; Getting the data in Python<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>For most web scraping tasks in Python I would reach for the <a href=\"https:\/\/beautiful-soup-4.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noreferrer noopener\">BeautifulSoup<\/a> package. BeautifulSoup is a really great web scraping Python package that greatly simplifies extracting data from HTML documents. However, like a lot of sites, YouTube makes extensive use of JavaScript and so the content we need isn&#8217;t available in the plain HTML files, it needs the pages to actually be rendered by a web browser.<\/p>\n\n\n\n<p>All is not lost though! We can use <a href=\"https:\/\/www.selenium.dev\/\" target=\"_blank\" rel=\"noreferrer noopener\">Selenium<\/a> to render the pages for us, including the JavaScript, and fetch the data we need. Selenium operates as a web driver, functioning similarly to a standard web browser. However, instead of relying on human input for control, it operates through automation scripted tasks. Commonly employed in web testing, Selenium also proves valuable for web scraping, as demonstrated in this task.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-installing-the-selenium-python-package\"><span class=\"ez-toc-section\" id=\"Installing_the_Selenium_Python_Package\"><\/span>Installing the Selenium Python Package<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To install the Selenium Python package simply run<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\npip install selenium\n<\/pre><\/div>\n\n\n<p>That will download and install the Selenium Python package into your Python environment. If you are using a Python virtual environment or conda environment, make sure you activate that before running the above command.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-installing-the-selenium-web-driver\"><span class=\"ez-toc-section\" id=\"Installing_the_Selenium_Web_Driver\"><\/span>Installing the Selenium Web Driver<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In order for Selenium to work it requires a <em>Web Driver<\/em>. The web driver is crucial for rendering the web page. It acts as a complete web browser implementation, excluding the user interface components. It is responsible for displaying the web page of interest accurately.<\/p>\n\n\n\n<p>There are various different web drivers available:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Web Driver Name<\/strong><\/td><td><strong>Browser<\/strong><\/td><td><strong>Download<\/strong><\/td><\/tr><tr><td>ChromeDrive<\/td><td>Google Chrome<\/td><td><a href=\"https:\/\/sites.google.com\/a\/chromium.org\/chromedriver\/downloads\" target=\"_blank\" rel=\"noreferrer noopener\">Download<\/a><\/td><\/tr><tr><td>GeckoDriver<\/td><td>Firefox<\/td><td><a href=\"https:\/\/github.com\/mozilla\/geckodriver\/releases\" target=\"_blank\" rel=\"noreferrer noopener\">Download<\/a><\/td><\/tr><tr><td>WebDriver<\/td><td>Microsoft Edge<\/td><td><a href=\"https:\/\/developer.microsoft.com\/en-us\/microsoft-edge\/tools\/webdriver\/\" target=\"_blank\" rel=\"noreferrer noopener\">Download<\/a><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For the web driver to work with Selenium it needs to be installed on your System Path. To do that, follow these steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Download the webdriver of your choice from the links above.<\/li>\n\n\n\n<li>Extract the downloaded ZIP file and copy the web driver file somewhere on your PC.<\/li>\n\n\n\n<li>Copy the path of the folder where you saved that web driver executable file.<\/li>\n\n\n\n<li>In a Windows File Explorer, find &#8220;This PC&#8221; and right-click it and select &#8220;Properties&#8221;.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/Cct-nNHPYh28y4Yl22J7vdiI_9A8Q0k6THh0OzmVl9NRHeS_u1Jsmc6QH65KEf3iyzR4BWUmZoCAX6F0QkC-qR2M_xh7u9Zaxu1yXuZJ7BXIbQ5_2lEaEMZmQ25wvybVEExd-A2K1dxpePz2FHPi_SM\" alt=\"Accessing PC Properties\"\/><\/figure>\n\n\n\n<ol class=\"wp-block-list\" start=\"5\">\n<li>Select &#8220;Advanced system settings&#8221; and then &#8220;Environment Variables&#8221;<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/sRAAGF0Ju7pr4O6-siA5bpC0IPSf5dsuj-uxmpd52CxWWCoMO9xW313DND9CiRjDmZ2JWUgnsyZBurAqqJhIWz0Rg94mO3aSkU23AgG8sRfaJ3UUV8LV7eZWEOBibY0_KFaJY-RY9R8PYOymuTdj2XM\" alt=\"Setting PC Environment Variables\"\/><\/figure>\n\n\n\n<ol class=\"wp-block-list\" start=\"6\">\n<li>Find &#8220;Path&#8221; under &#8220;User variables&#8221; and add the folder you copied in step 3 to the Path and click &#8220;Ok&#8221; to save the settings.<\/li>\n<\/ol>\n\n\n\n<p>Note: You will need to re-open any command prompts and your Python IDE (e.g. VS Code) in order for the change to the Path to take effect.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-fetching-a-web-page-using-selenium\"><span class=\"ez-toc-section\" id=\"Fetching_a_web_page_using_Selenium\"><\/span>Fetching a web page using Selenium<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Now we&#8217;re ready to start writing some Python code!<\/p>\n\n\n\n<p>Note: For the rest of this article we will use the Chrome web driver. If you are using a different web driver you will have to make some small changes to the code.<\/p>\n\n\n\n<p>To work through and understand the code I will use a Jupyter notebook. This is a convenient way to be able to edit and run code in small blocks. Understanding the required code makes it simple to refactor into a Python module. This module becomes usable from anywhere, including Excel.<\/p>\n\n\n\n<p>We first need to create a selenium WebDriver object. This is what will allow us to fetch and inspect a web page to get the information we need. Enter the following code into a Jupyter notebook and run the cell:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom selenium import webdriver\ndriver = webdriver.Chrome()\ndriver.get(&quot;https:\/\/www.youtube.com\/results?search_query=python&quot;)\n<\/pre><\/div>\n\n\n<p>This will open a new browser window displaying that page. We can then use the browser inspector to identify web page elements. Later, we&#8217;ll learn how to achieve this without displaying the browser.<\/p>\n\n\n\n<p>In this case, we want all elements with the id &#8220;video-title&#8221;. To ask selenium to find all of these &#8220;video-title&#8221; elements we use what&#8217;s called an &#8220;X-Path&#8221;. In the browser inspector you can right click on an element and select &#8220;Copy -&gt; Copy X-Path&#8221;. Doing that, we get the following X-Path:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/*&#x5B;@id=&quot;video-title&quot;]\n<\/pre><\/div>\n\n\n<p>Now we use the WebDriver object again to get all elements matching that X-Path and append each link to a list:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nelements = driver.find_elements(&quot;xpath&quot;, &#039;\/\/*&#x5B;@id=&quot;video-title&quot;]&#039;)\nvideos = &#x5B;]\nfor element in elements:\n    href = element.get_attribute(\u2018href\u2019)\n    title = element.text\n    videos.append({\n       \u201chref\u201d: href,\n       \u201ctitle\u201d: title\n    })\n<\/pre><\/div>\n\n\n<p>&#8220;videos&#8221; is a Python list of dictionaries containing all of the video links and titles found on the page we&#8217;ve queried.<\/p>\n\n\n\n<p>There&#8217;s more information we can still extract from this page. It would be interesting to know how many views each video has had, and how long ago the video was made. This might give us some ideas about what topics are popular and provide inspiration for our own next video.<\/p>\n\n\n\n<p>To find the number of views go back to the browser inspector and select one of the elements that shows the number of views for a video. Select Copy -&gt; Copy X Path again and you&#8217;ll see that it gives you this X-Path:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/*&#x5B;@id=&quot;metadata-line&quot;]\/span&#x5B;1]\n<\/pre><\/div>\n\n\n<p>We can use this to find all matching elements, as before, but we need to do something slightly different this time as we want to make sure we get the right number of views for each video.<\/p>\n\n\n\n<p>So, instead, we look for a block further up that contains both the video title and the number of views. There&#8217;s one with the id &#8220;dismissible&#8221; which has both the title and the number of views contained within it. The X-Path for that element is<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: xml; title: ; notranslate\" title=\"\">\n\/\/*&#x5B;@id=&quot;dismissible&quot;]\n<\/pre><\/div>\n\n\n<p>Let&#8217;s change our code to iterate over all elements matching that X-Path. We can then locate the video title and the number of views beneath each element. To find elements below a specific element using an X-Path, add a &#8220;.&#8221; to the start of the X-Path. This causes the search to start from the current element.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Find all elements containing a video\nelements = driver.find_elements(&quot;xpath&quot;, &#039;\/\/*&#x5B;@id=&quot;dismissible&quot;]&#039;)\n\n# Iterate over each element to get the details\nvideos = &#x5B;]\nfor element in elements:\n    # Start with an empty dict\n    video = {}\n    \n    # Find any title elements below this element\n    titles = element.find_elements(&quot;xpath&quot;, &#039;.\/\/*&#x5B;@id=&quot;video-title&quot;]&#039;)\n    \n    # Take the first and add it to the video dict\n    if len(titles) &gt; 0:\n        title = titles&#x5B;0]\n        video&#x5B;&quot;title&quot;] = title.text\n        video&#x5B;&quot;href&quot;] = title.get_attribute(&#039;href&#039;)\n        \n    # Find any number of views elements below this element\n    num_views = element.find_elements(&quot;xpath&quot;, &#039;.\/\/*&#x5B;@id=&quot;metadata-line&quot;]\/span&#x5B;1]&#039;)\n\n    # Take the first and add it to the video dict\n    if len(num_views) &gt; 0:\n        num_views = num_views&#x5B;0]\n        video&#x5B;&quot;num_views&quot;] = num_views.text\n\n    videos.append(video)\n<\/pre><\/div>\n\n\n<p>With this method, you can continue extracting additional details progressively. It also allows loading each video&#8217;s page to gather more information from there if needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-putting-it-altogether-in-a-function\"><span class=\"ez-toc-section\" id=\"Putting_it_altogether_in_a_function\"><\/span>Putting it altogether in a function<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>In order to be able to easily re-use this code we&#8217;re restructure it into a Python function that will return these video details in a pandas DataFrame for a given query. Our function will look like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\ndef youtube_stats(search_query):\n    return a DataFrame of video details\n<\/pre><\/div>\n\n\n<p>We&#8217;ll then be able to call that whenever we need, even from inside Excel using the PyXLL Python Excel add-in!<\/p>\n\n\n\n<p>We&#8217;ll start with the last part first, turning our list of dictionaries into a DataFrame. The following code will do that using the videos list from the earlier code:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nimport pandas as pd\n\ndf = pd.DataFrame(videos, columns=&#x5B;&quot;title&quot;, &quot;num_views&quot;, &quot;href&quot;])\n<\/pre><\/div>\n\n\n<p>Next, when calling this as a function we don&#8217;t really want the web browser window to be displayed each time we call the function. To prevent that we have to pass some options when constructing the WebDriver object:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\noptions = webdriver.ChromeOptions()\noptions.add_argument(&quot;--headless&quot;)\n\ndriver = webdriver.Chrome(options)\n<\/pre><\/div>\n\n\n<p>Now, assembling all of these code fragments into one function gives us:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom selenium import webdriver\nimport pandas as pd\n\n\ndef youtube_stats(search_query):\n    &quot;&quot;&quot;Search for videos and return details of each as a DataFrame.&quot;&quot;&quot;\n\n    # Get the headless Chrome web driver object\n    options = webdriver.ChromeOptions()\n    options.add_argument(&quot;--headless&quot;)\n    driver = webdriver.Chrome(options)\n\n    # Get the query result page\n    url = f&quot;https:\/\/www.youtube.com\/results?search_query={search_query}&quot;\n    driver.get(url)\n\n    # Find all elements containing a video\n    elements = driver.find_elements(&quot;xpath&quot;, &#039;\/\/*&#x5B;@id=&quot;dismissible&quot;]&#039;)\n\n    # Iterate over each element to get the details\n    videos = &#x5B;]\n    for element in elements:\n        # Start with an empty dict\n        video = {}\n    \n        # Find any title elements below this element\n        titles = element.find_elements(&quot;xpath&quot;, &#039;.\/\/*&#x5B;@id=&quot;video-title&quot;]&#039;)\n    \n        # Take the first and add it to the video dict\n        if len(titles) &gt; 0:\n            title = titles&#x5B;0]\n            video&#x5B;&quot;title&quot;] = title.text\n            video&#x5B;&quot;href&quot;] = title.get_attribute(&#039;href&#039;)\n        \n        # Find any number of views elements below this element\n        num_views = element.find_elements(&quot;xpath&quot;, &#039;.\/\/*&#x5B;@id=&quot;metadata-line&quot;]\/span&#x5B;1]&#039;)\n\n        # Take the first and add it to the video dict\n        if len(num_views) &gt; 0:\n            num_views = num_views&#x5B;0]\n            video&#x5B;&quot;num_views&quot;] = num_views.text\n\n        videos.append(video)\n\n    # Build a DataFrame from the videos list\n    df = pd.DataFrame(videos, columns=&#x5B;&quot;title&quot;, &quot;num_views&quot;, &quot;href&quot;])\n\n    # Return the DataFrame\n    return df\n<\/pre><\/div>\n\n\n<p>We&#8217;ve now got a complete Python function that will search for videos on YouTube and return us a DataFrame of details about those results!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-step-2-calling-our-python-function-from-excel\"><span class=\"ez-toc-section\" id=\"Step_2_%E2%80%93_Calling_our_Python_function_from_Excel\"><\/span>Step 2 &#8211; Calling our Python function from Excel<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>My end goal was to get the data collected in step 1 into Excel and use it as part of an interactive worksheet that I can use with different queries.<\/p>\n\n\n\n<p>To call the Python function from inside of Excel and use it interactively we will use the Python Excel add-in <a href=\"https:\/\/www.pyxll.com\/\">PyXLL<\/a>. Unlike other Python in Excel tools, the PyXLL add-in lets me use my own local Python environment with Selenium installed, and it can access data from the web (which is critical for this particular application!).<\/p>\n\n\n\n<p>Installing the PyXLL add-in is straightforward, and there are resources to help you online if you get stuck. If you already have Python installed, all that is needed is to install the PyXLL add-in and then configure it to load the Python code.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-installing-the-pyxll-add-in\"><span class=\"ez-toc-section\" id=\"Installing_the_PyXLL_add-in\"><\/span>Installing the PyXLL add-in<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>You can find instructions on how to install the PyXLL add-in online in the <a href=\"https:\/\/www.pyxll.com\/docs\/userguide\/installation\/firsttime.html\" target=\"_blank\" rel=\"noreferrer noopener\">First Time Users<\/a> page of the documentation. First the PyXLL python package is installed using &#8220;pip&#8221;<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\npip install pyxll\n<\/pre><\/div>\n\n\n<p>That installs the PyXLL Python package, and part of that package is a &#8220;pyxll&#8221; command line tool which can be used to install the PyXLL Excel add-in by running &#8220;pyxll install&#8221; (you can also use &#8220;python -m pyxll install&#8221; if the &#8220;pyxll&#8221; command is not found).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\npyxll install\n<\/pre><\/div>\n\n\n<p>The installer leads you through the installation asking a number of questions. When asked if you have already downloaded the PyXLL add-in, answer &#8220;n&#8221; and it will download the necessary files for you.<\/p>\n\n\n\n<p>Once the installer has finished, open Excel and the PyXLL add-in should be installed along with some examples.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/lk-F8jqU-Q0pnPXiC_79Di7deUnIDcFox-ZTnuGDwAs6TZuWuimY2eII0fY10lQZXby3Uk0zymuWVR-cqr7sEhl1Xm8RhN5Zn-4VNARftCWgrKUr6A-ORDeTr3exYfN2wbTsf4-qY-Lo_dSz-WYrUF0\" alt=\"Example workbook using Python in Excel\"\/><\/figure>\n\n\n\n<p>There is an example workbook and Python code included with the install. I would recommend starting by looking through that example workbook to see some of the functionality of the PyXLL add-in.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-adding-our-youtube_stats-python-function-to-excel\"><span class=\"ez-toc-section\" id=\"Adding_our_%E2%80%9Cyoutube_stats%E2%80%9D_Python_function_to_Excel\"><\/span>Adding our &#8220;youtube_stats&#8221; Python  function to Excel<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To make the &#8220;youtube_stats&#8221; function accessible from outside the Jupyter notebook, follow these steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create a new Python module named &#8220;youtube_stats.py&#8221; using your preferred Python IDE or a text editor like Notepad.<\/li>\n\n\n\n<li>Copy the previously developed code for the &#8220;youtube_stats&#8221; function into this module.<\/li>\n\n\n\n<li>Choose a location for the file; for example, if using GitHub, create a new folder (e.g., C:\\github\\youtube_stats) and save the file there.<\/li>\n<\/ol>\n\n\n\n<p>This ensures that the &#8220;youtube_stats&#8221; function is organized in a Python module and can be imported and utilized beyond the confines of the Jupyter notebook.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/kZP7FOj_Du9DDSRHJFxYBR_82SGmeYGi2D_G4ZILCSZapqrsctFAIDAYqfHGRx5D7fUjt1XAjYjr2THeS57bZdc3LqBOvwvhou6Asgl8vzqAk4WZy2jemklOm_ABu4cFiQkil9j0NFA7zaG9iYE5HKU\" alt=\"Editing Python code for use in Excel with VS Code\"\/><\/figure>\n\n\n\n<p>There&#8217;s just a couple of changes we need to make to our code to tell the PyXLL add-in how the &#8220;youtube_stats&#8221; function should be callable from Excel. We have to add the &#8220;@xl_func&#8221; decorator to the function, and that&#8217;s imported from the &#8220;pyxll&#8221; Python package.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/cqRVkY7sCKjnSXvYEzoC0A_vrZQGQy5dL981wF9lhkIiomKXBpnTgjv2dCFmUMEjq7eM_A3ff-N1qXUzLBeKyIXqcC0B1uTAePNp1-Zl7va8g9-t6ICLOc1foSJ7PVWTjCCfFEE-RUEtHSGtB-pGgUo\" alt=\"PyXLL's &quot;xl_func&quot; decorator for exporting Python functions to Excel\"\/><\/figure>\n\n\n\n<p>For PyXLL to seamlessly handle the conversion between Excel arguments and Python types, as well as convert the Python result back to Excel values, some additional information needs to be added to the function. While not always strictly necessary due to PyXLL&#8217;s default conversions for most types, it is good practice to provide explicit instructions. One effective approach for specifying the type conversion is through the use of Python type hints.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/ozCGBZc_CqO0dr9gJLTCtxKNSflQ79LjowOrNotsg0fgiuJt4UTc0H2kca3lWgWcAIYW1VCGMmEly0FeBt8JiEZV4T_4_5HQ6Cp91y0nYnhxrLWFa9JtSlHrRTLMp78FRUiAKvtUWoKh5SG6pvQyWOs\" alt=\"Using Python type hints to improve Excel Python functions\"\/><\/figure>\n\n\n\n<p>Finally, we have to tell the PyXLL add-in to import our Python module. That&#8217;s done in the PyXLL configuration file, &#8220;pyxll.cfg&#8221;. This will be in the same folder where you installed PyXLL. If you don&#8217;t remember where that is you can run &#8220;pyxll status&#8221;, or check the &#8220;About&#8221; option in the PyXLL tab in Excel. The pyxll.cfg file is another text file and you can open it using your Python IDE or any text editor.<\/p>\n\n\n\n<p>Tip: You can also run &#8220;pyxll config&#8221; to quickly open the pyxll.cfg file!<\/p>\n\n\n\n<p>In the pyxll.cfg file look for the &#8220;pythonpath&#8221; setting in the &#8220;[PYTHON]&#8221; section. This is a list of folders that Python will look in when importing modules. Add the folder where you created your Python module to this list.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/Gn-WXNuME-nI0iNZGIEIq4oHvAbpwaSkB2vnmCUlRBzSt3YHShDE7Bh_p0_ZMYrA9AycaQSLYTlifKuXUdJ7dzRikaMh9q5zzy2G-S6RSbtQQ_m2R3aaPOGVYCIrYnyvVRFr5bgYhHz6af6fDRPVSVk\" alt=\"Adding a source folder to the Python Path\"\/><\/figure>\n\n\n\n<p>Next, look for the &#8220;modules&#8221; setting in the &#8220;[PYXLL]&#8221; section. This is a list of modules that the PyXLL add-in will import when it loads. Add the &#8220;youtube_stats&#8221; module that you just created to this list. Here we don&#8217;t use the &#8220;.py&#8221; file extension, just the module name.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/yqDg9fBc4Xv7iLJHn4sq-711N8PZ_Xdi7PLk-sS-vWnCxhRaDZ7rs7VbW21K7_gs_-2k5q39ERkiBd0ZOXRFrMBveMeBQZVgcrB0eNCL6adT3bhUSUhCBbJ0boVp9zJXOsAYV8lkobY1htLAG-Gm5xI\" alt=\"Adding a Python module to Excel\"\/><\/figure>\n\n\n\n<p>Now all that&#8217;s left to do is to start Excel and check our new function is working! If you already have Excel started there&#8217;s no need to restart it. Simply click the &#8220;Reload&#8221; button in the PyXLL tab. That will reload the Python modules listed above, including our new one.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-calling-our-youtube_stats-python-function-in-excel\"><span class=\"ez-toc-section\" id=\"Calling_our_%E2%80%9Cyoutube_stats%E2%80%9D_Python_function_in_Excel\"><\/span>Calling our &#8220;youtube_stats&#8221; Python function in Excel<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>If everything has worked correctly, you can now enter the &#8220;youtube_stats&#8221; function directly into the Excel formula bar, just like any other Excel function!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/KAsEJUTyZY-wDDHxrSozc0p95I7PVPDUY0GnZovMIYjlqgN_I30eYyleqggo6-v8qymm__fyKksAeCPZkbQ2aYLO_d0EY7GyDuQo0c1jOSgkmh7sgktiIZUYNHh9_Xl-JxzTx53KMWsPccRo7G_JSEU\" alt=\"Calling the &quot;youtube_stats&quot; Python function in Excel\"\/><\/figure>\n\n\n\n<p><br>Our Python function seamlessly integrates into Excel as a standard Excel function, with the ability to pass the query string as a parameter. Whenever the input query string changes, the Python function is automatically invoked, updating the result dynamically.<\/p>\n\n\n\n<p>The integration of the &#8220;youtube_stats&#8221; function into Excel provides substantial power, enabling interactive and productive manipulation and analysis of extracted data. Leveraging this, I&#8217;ve developed additional tools for swift research on topic ideas for new videos, streamlining the process and enhancing productivity.<\/p>\n\n\n\n<p>What&#8217;s truly remarkable is that individuals unfamiliar with Python can effortlessly use these functions in Excel, without any learning curve. As Python code evolves to introduce new functions or enhance existing ones, it can be shared independently of the workbook. Unlike VBA, the Python code exists outside of Excel, residing in a designated folder on the local PC or a shared network drive. This independence facilitates efficient management and version control, separate from the workbooks utilizing these functions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-wrapping-up\"><span class=\"ez-toc-section\" id=\"Wrapping_Up\"><\/span>Wrapping Up<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>This article demonstrated the capability to scrape data from intricate websites like YouTube, leveraging the Selenium Python package designed for handling JavaScript-intensive sites. For simpler sites utilizing static HTML, BeautifulSoup, another Python package for web scraping, is a slightly simpler alternative, although it lacks support for client-side rendering.<\/p>\n\n\n\n<p>Additionally, we explored how intricate Python functions can seamlessly integrate with Excel through the PyXLL Python Excel add-in. Unlike other tools such as xlsxwriter, openpyxl, and pandas, which are limited to generating static reports, PyXLL empowers the creation of Python function toolkits for interactive use within Excel. This functionality proves valuable for Excel users with minimal or no Python knowledge.<\/p>\n\n\n\n<p>For further details on PyXLL, visit the website <a href=\"https:\/\/www.pyxll.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.pyxll.com<\/a>. While PyXLL is a commercial (paid) product, a 30-day free trial is available, allowing users to explore its numerous features before making a purchase.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Unlock the full potential of your YouTube video research by seamlessly integrating Python in Excel. As a content creator, staying ahead requires insightful data on video performance and trending topics. While YouTube offers valuable data, the real game-changer is having that information readily available in Excel! Traditional web scraping tools like PowerQuery fall short when<\/p>\n","protected":false},"author":1,"featured_media":1747,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[12,2,504],"tags":[660,661,659],"class_list":["post-1732","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python","category-pyxll","category-user-interface","tag-selenium","tag-webscraping","tag-youtube"],"acf":[],"jetpack_featured_media_url":"https:\/\/www.pyxll.com\/blog\/wp-content\/uploads\/2024\/01\/blog-card.jpg","jetpack_shortlink":"https:\/\/wp.me\/p7l3LP-rW","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/posts\/1732","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/comments?post=1732"}],"version-history":[{"count":1,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/posts\/1732\/revisions"}],"predecessor-version":[{"id":1746,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/posts\/1732\/revisions\/1746"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/media\/1747"}],"wp:attachment":[{"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/media?parent=1732"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/categories?post=1732"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pyxll.com\/blog\/wp-json\/wp\/v2\/tags?post=1732"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}