{"id":132172,"date":"2023-03-02T19:41:16","date_gmt":"2023-03-02T19:41:16","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=1177626"},"modified":"2023-03-02T19:41:16","modified_gmt":"2023-03-02T19:41:16","slug":"building-a-qa-bot-with-openai-a-step-by-step-guide-to-scraping-websites-and-answer-questions","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2023\/03\/02\/building-a-qa-bot-with-openai-a-step-by-step-guide-to-scraping-websites-and-answer-questions\/","title":{"rendered":"Building a Q&amp;A Bot with OpenAI: A Step-by-Step Guide to Scraping Websites and Answer Questions"},"content":{"rendered":"\n<div class=\"kk-star-ratings kksr-auto kksr-align-left kksr-valign-top\" data-payload='{&quot;align&quot;:&quot;left&quot;,&quot;id&quot;:&quot;1177626&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;top&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;2&quot;,&quot;legendonly&quot;:&quot;&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;starsonly&quot;:&quot;&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;Rate this post&quot;,&quot;legend&quot;:&quot;5\\\/5 - (2 votes)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\\\/{best} - ({count} {votes})&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}'>\n<div class=\"kksr-stars\">\n<div class=\"kksr-stars-inactive\">\n<div class=\"kksr-star\" data-star=\"1\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"2\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"3\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 
24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"4\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" data-star=\"5\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<div class=\"kksr-stars-active\" style=\"width: 142.5px;\">\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<div class=\"kksr-star\" style=\"padding-right: 5px\">\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"kksr-legend\" style=\"font-size: 19.2px;\"> 5\/5 &#8211; (2 votes) <\/div>\n<\/p><\/div>\n<p>Have you ever found yourself deep in the internet rabbit hole, searching for an answer to a question that just won&#8217;t quit? <\/p>\n<p>It can be frustrating to sift through all the online information and still come up empty-handed. 
<strong>But what if there was a way to get accurate and reliable answers in a snap?<\/strong> Enter the Q&amp;A bot &#8211; your new best friend for all your pressing questions!<\/p>\n<p class=\"has-base-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/2705.png\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> In this blog, we will take you on a wild ride to show you how to build your very own Q&amp;A bot using OpenAI&#8217;s language models. We&#8217;ll guide you through the process of scraping text from a website, processing it, and using OpenAI&#8217;s language models to find the answers you seek. <\/p>\n<p>And let&#8217;s face it, who doesn&#8217;t love having a robot friend that can answer all their burning questions? So buckle up and let&#8217;s build a quirky, lovable Q&amp;A bot together!<\/p>\n<p>You can check out the whole code project on <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/apps\/web-crawl-q-and-a\/web-qa.py\" data-type=\"URL\" data-id=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/apps\/web-crawl-q-and-a\/web-qa.py\" target=\"_blank\">GitHub<\/a> (cookbook). 
I&#8217;ll explain each step in the following sections.<\/p>\n<h2>Overview<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"814\" height=\"447\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-15.png\" alt=\"\" class=\"wp-image-1177651\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-15.png 814w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-15-300x165.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-15-768x422.png 768w\" sizes=\"auto, (max-width: 814px) 100vw, 814px\" \/><\/figure>\n<\/div>\n<p>This tutorial presents a Python script that <\/p>\n<ul>\n<li>crawls a website, <\/li>\n<li>extracts the text from the webpages, <\/li>\n<li>tokenizes the text, and <\/li>\n<li>creates embeddings for each text (quick explanation on &#8220;embeddings&#8221; below).<\/li>\n<\/ul>\n<p>It then uses OpenAI&#8217;s API to answer questions based on the embeddings of the text.<\/p>\n<p>You will need to create your own API key if you want to try it yourself.<\/p>\n<p class=\"has-base-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f449.png\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Recommended<\/strong>: <a href=\"https:\/\/blog.finxter.com\/openai-api-or-how-i-made-my-python-code-intelligent\/\" data-type=\"post\" data-id=\"1081478\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI API \u2013 or How I Made My Python Code Intelligent<\/a><\/p>\n<p>You should also install the <code>openai<\/code> library &#8212; I&#8217;ve written a blog tutorial on this too:<\/p>\n<p class=\"has-base-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f449.png\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" 
\/> <strong>Recommended<\/strong>: <a href=\"https:\/\/blog.finxter.com\/how-to-install-openai-in-python\/\" data-type=\"post\" data-id=\"1170845\" target=\"_blank\" rel=\"noreferrer noopener\">How to Install OpenAI in Python?<\/a><\/p>\n<p>Scroll down to the <a href=\"#wholecode\" data-type=\"internal\" data-id=\"#wholecode\">whole code section<\/a> if you want to try it by copy&amp;paste. <\/p>\n<h2>Step 1<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"564\" height=\"389\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-16.png\" alt=\"\" class=\"wp-image-1177653\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-16.png 564w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-16-300x207.png 300w\" sizes=\"auto, (max-width: 564px) 100vw, 564px\" \/><\/figure>\n<\/div>\n<p>This section of the code imports the necessary Python libraries for the script, including <code>requests<\/code> for sending HTTP requests, <code>re<\/code> for regular expressions, <code>urllib.request<\/code> for opening URLs, <code>BeautifulSoup<\/code> for parsing HTML and XML, <code>deque<\/code> for creating a queue, <code>HTMLParser<\/code> for parsing HTML, <code>urlparse<\/code> for parsing URLs, <code>os<\/code> for interacting with the operating system, <code>pandas<\/code> for working with dataframes, <code>tiktoken<\/code> for getting a tokenizer, and <code>openai<\/code> for creating embeddings and answering questions.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 1\n################################################################################\n\nimport requests\nimport re\nimport urllib.request\nfrom bs4 import BeautifulSoup\nfrom collections import deque\nfrom html.parser import HTMLParser\nfrom urllib.parse import urlparse\nimport os\nimport pandas as pd\nimport tiktoken\nimport openai\nimport numpy as np\nfrom openai.embeddings_utils import distances_from_embeddings, cosine_similarity\n\n# Regex pattern to match a URL\nHTTP_URL_PATTERN = r'^http[s]*:\/\/.+'\n\n# Define root domain to crawl\ndomain = \"openai.com\"\nfull_url = \"https:\/\/openai.com\/\"\n\n# Create a class to parse the HTML and get the hyperlinks\nclass HyperlinkParser(HTMLParser):\n    def __init__(self):\n        super().__init__()\n        # Create a list to store the hyperlinks\n        self.hyperlinks = []\n\n    # Override the HTMLParser's handle_starttag method to get the hyperlinks\n    def handle_starttag(self, tag, attrs):\n        attrs = dict(attrs)\n\n        # If the tag is an anchor tag and it has an href attribute, add the href attribute to the list of hyperlinks\n        if tag == \"a\" and \"href\" in attrs:\n            self.hyperlinks.append(attrs[\"href\"])<\/pre>\n<h2>Step 2<\/h2>\n<p>This section of the code defines a function called <code>get_hyperlinks<\/code> that takes a URL as input, tries to open the URL and read the HTML, and then parses the HTML to get hyperlinks. 
If the response is not HTML, it returns an <a href=\"https:\/\/blog.finxter.com\/how-to-create-an-empty-list-in-python\/\" data-type=\"post\" data-id=\"453870\" target=\"_blank\" rel=\"noreferrer noopener\">empty list<\/a>.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 2\n################################################################################\n\n# Function to get the hyperlinks from a URL\ndef get_hyperlinks(url):\n    # Try to open the URL and read the HTML\n    try:\n        # Open the URL and read the HTML\n        with urllib.request.urlopen(url) as response:\n\n            # If the response is not HTML, return an empty list\n            if not response.info().get('Content-Type').startswith(\"text\/html\"):\n                return []\n\n            # Decode the HTML\n            html = response.read().decode('utf-8')\n    except Exception as e:\n        print(e)\n        return []\n\n    # Create the HTML Parser and then Parse the HTML to get hyperlinks\n    parser = HyperlinkParser()\n    parser.feed(html)\n\n    return parser.hyperlinks\n<\/pre>\n<h2>Step 3<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"562\" height=\"577\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-17.png\" alt=\"\" class=\"wp-image-1177655\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-17.png 562w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-17-292x300.png 292w\" sizes=\"auto, (max-width: 562px) 100vw, 562px\" \/><\/figure>\n<\/div>\n<p>This section of the code defines a function called <code>get_domain_hyperlinks<\/code> that takes a domain and a URL as input and returns a list of hyperlinks from the URL that are within the same domain. 
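<\/p>
<p>To make this same-domain filtering concrete, here is a minimal standalone sketch (the links and the domain are made up for illustration, and the logic is a simplified version of what the function below does):<\/p>

```python
from urllib.parse import urlparse

def keep_same_domain(links, local_domain):
    """Keep absolute links on local_domain and resolve relative links against it."""
    kept = []
    for link in links:
        parsed = urlparse(link)
        if parsed.netloc == local_domain:
            kept.append(link)  # absolute link on the same domain
        elif not parsed.scheme and not link.startswith("#"):
            # relative link: resolve it against the domain
            kept.append("https://" + local_domain + "/" + link.lstrip("/"))
    return kept

links = ["https://openai.com/blog", "https://twitter.com/openai",
         "/about", "#top", "mailto:hi@example.com"]
print(keep_same_domain(links, "openai.com"))
# → ['https://openai.com/blog', 'https://openai.com/about']
```

<p>Links to other domains, in-page anchors (<code>#top<\/code>), and <code>mailto:<\/code> links are dropped so the crawler stays on the site it started on.<\/p>
<p>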
If the hyperlink is a URL, it checks if it is within the same domain. If the hyperlink is not a URL, it checks if it is a relative link.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 3\n################################################################################\n\n# Function to get the hyperlinks from a URL that are within the same domain\ndef get_domain_hyperlinks(local_domain, url):\n    clean_links = []\n    for link in set(get_hyperlinks(url)):\n        clean_link = None\n\n        # If the link is a URL, check if it is within the same domain\n        if re.search(HTTP_URL_PATTERN, link):\n            # Parse the URL and check if the domain is the same\n            url_obj = urlparse(link)\n            if url_obj.netloc == local_domain:\n                clean_link = link\n\n        # If the link is not a URL, check if it is a relative link\n        else:\n            if link.startswith(\"\/\"):\n                link = link[1:]\n            elif link.startswith(\"#\") or link.startswith(\"mailto:\"):\n                continue\n            clean_link = \"https:\/\/\" + local_domain + \"\/\" + link\n\n        if clean_link is not None:\n            if clean_link.endswith(\"\/\"):\n                clean_link = clean_link[:-1]\n            clean_links.append(clean_link)\n\n    # Return the list of hyperlinks that are within the same domain\n    return list(set(clean_links))<\/pre>\n<h2>Step 4<\/h2>\n<p>This section of the code defines a function called <code>crawl<\/code> that takes a URL as input, parses the URL to get the domain, creates a queue to store the URLs to crawl, creates a set to store the URLs that have already been seen (no duplicates), and creates a directory to store the text files. 
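<\/p>
<p>Stripped of the file I\/O, the crawl loop is a plain queue-based graph traversal with a <code>seen<\/code> set. A toy version over a hand-made link graph (the node names are made up; note that <code>deque.pop()<\/code> removes from the right, so the traversal is LIFO-style rather than strictly breadth-first):<\/p>

```python
from collections import deque

def traverse(start, graph):
    """Visit every page reachable from start, skipping duplicates via a seen set."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        url = queue.pop()          # pops from the right, LIFO-style
        order.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                queue.append(link)
                seen.add(link)
    return order

# toy link graph (made-up page names)
print(traverse("a", {"a": ["b", "c"], "b": ["a", "d"]}))
# → ['a', 'c', 'b', 'd']
```

<p>The <code>seen<\/code> set is what prevents the crawler from revisiting pages and looping forever on sites that link back to themselves.<\/p>
<p>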
It then continues crawling until the queue is empty, saving the text from each URL to a text file, and getting the hyperlinks from each URL and adding them to the queue.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 4\n################################################################################\n\ndef crawl(url):\n    # Parse the URL and get the domain\n    local_domain = urlparse(url).netloc\n\n    # Create a queue to store the URLs to crawl\n    queue = deque([url])\n\n    # Create a set to store the URLs that have already been seen (no duplicates)\n    seen = set([url])\n\n    # Create a directory to store the text files\n    if not os.path.exists(\"text\/\"):\n        os.mkdir(\"text\/\")\n\n    if not os.path.exists(\"text\/\"+local_domain+\"\/\"):\n        os.mkdir(\"text\/\" + local_domain + \"\/\")\n\n    # Create a directory to store the csv files\n    if not os.path.exists(\"processed\"):\n        os.mkdir(\"processed\")\n\n    # While the queue is not empty, continue crawling\n    while queue:\n\n        # Get the next URL from the queue\n        url = queue.pop()\n        print(url)  # for debugging and to see the progress\n\n        # Save text from the url to a &lt;url>.txt file\n        with open('text\/'+local_domain+'\/'+url[8:].replace(\"\/\", \"_\") + \".txt\", \"w\", encoding=\"UTF-8\") as f:\n\n            # Get the text from the URL using BeautifulSoup\n            soup = BeautifulSoup(requests.get(url).text, \"html.parser\")\n\n            # Get the text but remove the tags\n            text = soup.get_text()\n\n            # If the crawler gets to a page that requires JavaScript, report it\n            if (\"You need to enable JavaScript to run this app.\" in text):\n                print(\"Unable to parse page \" + url + \" due to JavaScript being required\")\n\n            # Write the text to the file in the text directory\n            f.write(text)\n\n        # Get the hyperlinks from the URL and add them to the queue\n        for link in get_domain_hyperlinks(local_domain, url):\n            if link not in seen:\n                queue.append(link)\n                seen.add(link)\n\ncrawl(full_url)<\/pre>\n<h2>Step 5<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"565\" height=\"373\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-18.png\" alt=\"\" class=\"wp-image-1177656\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-18.png 565w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-18-300x198.png 300w\" sizes=\"auto, (max-width: 565px) 100vw, 565px\" \/><\/figure>\n<\/div>\n<p>This section of the code defines a function called <code>remove_newlines<\/code> that takes a pandas Series object as input, replaces newlines and doubled spaces with single spaces, and returns the modified Series.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 5\n################################################################################\n\ndef remove_newlines(serie):\n    serie = serie.str.replace('\\n', ' ')\n    serie = serie.str.replace('\\\\n', ' ')\n    serie = serie.str.replace('  ', ' ')\n    serie = serie.str.replace('  ', ' ')\n    return serie<\/pre>\n<h2>Step 6<\/h2>\n<p>This section of the code creates a list called <code>texts<\/code> to store the text files, gets all the text files in the text directory, opens each file, reads the text, strips the first 11 characters (the <code>openai.com_<\/code> prefix) and the last 4 characters (the <code>.txt<\/code> extension) from each filename, replaces - and _ with spaces and removes #update in that name, and appends the resulting title&#8211;text pair to the list of texts. 
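<\/p>
<p>Note that the slicing <code>file[11:-4]<\/code> works on the file <em>name<\/em>, not the file contents: it drops a leading 11-character domain prefix and the trailing <code>.txt<\/code>. A quick sketch with a hypothetical filename:<\/p>

```python
fname = "openai.com_blog_example-page.txt"  # hypothetical crawler output file

# len("openai.com_") == 11, len(".txt") == 4
title = fname[11:-4].replace('-', ' ').replace('_', ' ')
print(title)
# → blog example page
```

<p>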
It then creates a dataframe from the list of texts, sets the text column to be the raw text with the newlines removed, and saves the dataframe as a CSV file.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 6\n################################################################################\n\n# Create a list to store the text files\ntexts=[]\n\n# Get all the text files in the text directory\nfor file in os.listdir(\"text\/\" + domain + \"\/\"):\n\n    # Open the file and read the text\n    with open(\"text\/\" + domain + \"\/\" + file, \"r\", encoding=\"UTF-8\") as f:\n        text = f.read()\n\n        # Strip the first 11 characters (the \"openai.com_\" prefix) and the \".txt\" extension from the filename, then replace - and _ with spaces and drop #update\n        texts.append((file[11:-4].replace('-',' ').replace('_', ' ').replace('#update',''), text))\n\n# Create a dataframe from the list of texts\ndf = pd.DataFrame(texts, columns = ['fname', 'text'])\n\n# Set the text column to be the raw text with the newlines removed\ndf['text'] = df.fname + \". \" + remove_newlines(df.text)\ndf.to_csv('processed\/scraped.csv')\ndf.head()<\/pre>\n<h2>Step 7<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"602\" height=\"406\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-19.png\" alt=\"\" class=\"wp-image-1177657\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-19.png 602w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-19-300x202.png 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/figure>\n<\/div>\n<p>This section of the code loads a tokenizer and applies it to the text column of the dataframe to get the number of tokens for each row. 
It then creates a histogram of the number of tokens per row.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 7\n################################################################################\n\n# Load the cl100k_base tokenizer which is designed to work with the ada-002 model\ntokenizer = tiktoken.get_encoding(\"cl100k_base\")\n\ndf = pd.read_csv('processed\/scraped.csv', index_col=0)\ndf.columns = ['title', 'text']\n\n# Tokenize the text and save the number of tokens to a new column\ndf['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))\n\n# Visualize the distribution of the number of tokens per row using a histogram\ndf.n_tokens.hist()<\/pre>\n<h2>Step 8<\/h2>\n<p>This section of the code defines a maximum number of tokens and creates a function called <code>split_into_many<\/code> that takes a text and a maximum token count as input and splits the text into chunks of at most that many tokens. 
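<\/p>
<p>The chunking idea can be demonstrated with plain word counts standing in for the tokenizer. The token counter below is a made-up stand-in for <code>tiktoken<\/code>, and this sketch also flushes the final partial chunk at the end:<\/p>

```python
def n_tokens(s):
    return len(s.split())  # made-up stand-in for a real tokenizer

def split_into_chunks(text, max_tokens=6):
    """Greedily pack sentences into chunks of at most max_tokens tokens."""
    sentences = text.split('. ')
    chunks, chunk, total = [], [], 0
    for s in sentences:
        t = n_tokens(" " + s)
        if total + t > max_tokens:      # chunk full: emit it and start a new one
            chunks.append(". ".join(chunk) + ".")
            chunk, total = [], 0
        chunk.append(s)
        total += t + 1
    if chunk:                            # flush the final partial chunk
        chunks.append(". ".join(chunk) + ".")
    return chunks

print(split_into_chunks("First sentence here. Second one is longer than the limit. Third"))
# → ['First sentence here.', 'Second one is longer than the limit.', 'Third.']
```

<p>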
<\/p>\n<p>It then loops through the dataframe and either adds the text to the list of shortened texts or splits the text into chunks of at most <code>max_tokens<\/code> tokens and adds the chunks to the list of shortened texts.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 8\n################################################################################\n\nmax_tokens = 500\n\n# Function to split the text into chunks of a maximum number of tokens\ndef split_into_many(text, max_tokens = max_tokens):\n\n    # Split the text into sentences\n    sentences = text.split('. ')\n\n    # Get the number of tokens for each sentence\n    n_tokens = [len(tokenizer.encode(\" \" + sentence)) for sentence in sentences]\n\n    chunks = []\n    tokens_so_far = 0\n    chunk = []\n\n    # Loop through the sentences and tokens joined together in a tuple\n    for sentence, token in zip(sentences, n_tokens):\n\n        # If the number of tokens so far plus the number of tokens in the current sentence is greater\n        # than the max number of tokens, then add the chunk to the list of chunks and reset\n        # the chunk and tokens so far\n        if tokens_so_far + token > max_tokens:\n            chunks.append(\". \".join(chunk) + \".\")\n            chunk = []\n            tokens_so_far = 0\n\n        # If the number of tokens in the current sentence is greater than the max number of\n        # tokens, go to the next sentence\n        if token > max_tokens:\n            continue\n\n        # Otherwise, add the sentence to the chunk and add the number of tokens to the total\n        chunk.append(sentence)\n        tokens_so_far += token + 1\n\n    return chunks\n\nshortened = []\n\n# Loop through the dataframe\nfor row in df.iterrows():\n\n    # If the text is None, go to the next row\n    if row[1]['text'] is None:\n        continue\n\n    # If the number of tokens is greater than the max number of tokens, split the text into chunks\n    if row[1]['n_tokens'] > max_tokens:\n        shortened += split_into_many(row[1]['text'])\n\n    # Otherwise, add the text to the list of shortened texts\n    else:\n        shortened.append( row[1]['text'] )<\/pre>\n<h2>Step 9<\/h2>\n<p>This section of the code creates a new dataframe from the list of shortened texts, applies the tokenizer to the text column of the dataframe to get the number of tokens for each row, and creates a histogram of the number of tokens per row.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 9\n################################################################################\n\ndf = pd.DataFrame(shortened, columns = ['text'])\ndf['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))\ndf.n_tokens.hist()<\/pre>\n<h2>Step 10<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"605\" height=\"805\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-20.png\" alt=\"\" class=\"wp-image-1177658\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-20.png 605w, 
https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-20-225x300.png 225w\" sizes=\"auto, (max-width: 605px) 100vw, 605px\" \/><\/figure>\n<\/div>\n<p>Step 10 involves using OpenAI&#8217;s language model to embed each text chunk into a vector. Semantically similar texts end up close to each other in this vector space, which is what later lets us find the chunks most relevant to a question. The <code>openai.Embedding.create()<\/code> function is used to create the embeddings, and they are saved in a new column in the <code>DataFrame<\/code>.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 10\n################################################################################\n\n# Note that you may run into rate limit issues depending on how many files you try to embed\n# Please check out our rate limit guide to learn more on how to handle this: https:\/\/platform.openai.com\/docs\/guides\/rate-limits\n\ndf['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])\ndf.to_csv('processed\/embeddings.csv')\ndf.head()<\/pre>\n<h2>Step 11<\/h2>\n<p>Step 11 involves loading the embeddings back from the CSV file and converting them from strings to numpy arrays.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 11\n################################################################################\n\ndf=pd.read_csv('processed\/embeddings.csv', index_col=0)\ndf['embeddings'] = df['embeddings'].apply(eval).apply(np.array)\n
df.head()<\/pre>\n<h2>Step 12<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"631\" height=\"419\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-21.png\" alt=\"\" class=\"wp-image-1177660\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-21.png 631w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-21-300x199.png 300w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><\/figure>\n<\/div>\n<p>Step 12 includes the <code>create_context()<\/code> and <code>answer_question()<\/code> functions that use the embeddings to find the most similar context to a question and then answer it based on that context. The <code>create_context()<\/code> function builds the context by ranking the text chunks by cosine distance to the question and concatenating the closest ones, while the <code>answer_question()<\/code> function uses the context and question to generate a response with OpenAI&#8217;s GPT-3 completion model.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 12\n################################################################################\n\ndef create_context(\n    question, df, max_len=1800, size=\"ada\"\n):\n    \"\"\"\n    Create a context for a question by finding the most similar context from the dataframe\n    \"\"\"\n\n    # Get the embeddings for the question\n    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']\n\n    # Get the distances from the embeddings\n    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')\n\n    returns = []\n    cur_len = 0\n\n    # Sort by distance and add the text to the context until the context is too long\n    for i, row in df.sort_values('distances', ascending=True).iterrows():\n\n        # Add the length of the text to the current length\n        cur_len += row['n_tokens'] + 4\n\n        # If the context is too long, break\n        if cur_len > max_len:\n            break\n\n        # Else add it to the text that is being returned\n        returns.append(row[\"text\"])\n\n    # Return the context\n    return \"\\n\\n###\\n\\n\".join(returns)\n\ndef answer_question(\n    df,\n    model=\"text-davinci-003\",\n    question=\"Am I allowed to publish model outputs to Twitter, without a human review?\",\n    max_len=1800,\n    size=\"ada\",\n    debug=False,\n    max_tokens=150,\n    stop_sequence=None\n):\n    \"\"\"\n    Answer a question based on the most similar context from the dataframe texts\n    \"\"\"\n    context = create_context(\n        question,\n        df,\n        max_len=max_len,\n        size=size,\n    )\n    # If debug, print the raw model response\n    if debug:\n        print(\"Context:\\n\" + context)\n        print(\"\\n\\n\")\n\n    try:\n        # Create a completion using the question and context\n        response = openai.Completion.create(\n            prompt=f\"Answer the question based on the context below, and if the question can't be answered based on the context, say \\\"I don't know\\\"\\n\\nContext: {context}\\n\\n---\\n\\nQuestion: {question}\\nAnswer:\",\n            temperature=0,\n            max_tokens=max_tokens,\n            top_p=1,\n            frequency_penalty=0,\n            presence_penalty=0,\n            stop=stop_sequence,\n            model=model,\n        )\n        return response[\"choices\"][0][\"text\"].strip()\n    except Exception as e:\n        print(e)\n        return \"\"<\/pre>\n<h2>Step 13<\/h2>\n<p>Step 13 provides an example of using the <code>answer_question()<\/code> function to answer two different questions. The first question is a simple one, while the second question requires more specific knowledge. 
This example demonstrates the versatility of the Q&amp;A bot and its ability to answer a wide range of questions.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 13\n################################################################################\n\nprint(answer_question(df, question=\"What day is it?\", debug=False))\n\nprint(answer_question(df, question=\"What is our newest embeddings model?\"))<\/pre>\n<h2 id=\"wholecode\">Putting It All Together<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"631\" height=\"499\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-22.png\" alt=\"\" class=\"wp-image-1177661\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-22.png 631w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-22-300x237.png 300w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><\/figure>\n<\/div>\n<p>You can check out the whole code project on <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/apps\/web-crawl-q-and-a\/web-qa.py\" data-type=\"URL\" data-id=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/apps\/web-crawl-q-and-a\/web-qa.py\" target=\"_blank\">GitHub<\/a> or simply copy and paste it from here:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">################################################################################\n### Step 
1\n################################################################################ import requests\nimport re\nimport urllib.request\nfrom bs4 import BeautifulSoup\nfrom collections import deque\nfrom html.parser import HTMLParser\nfrom urllib.parse import urlparse\nimport os\nimport pandas as pd\nimport tiktoken\nimport openai\nfrom openai.embeddings_utils import distances_from_embeddings\nimport numpy as np\nfrom openai.embeddings_utils import distances_from_embeddings, cosine_similarity # Regex pattern to match a URL\nHTTP_URL_PATTERN = r'^http[s]*:\/\/.+' # Define root domain to crawl\ndomain = \"openai.com\"\nfull_url = \"https:\/\/openai.com\/\" # Create a class to parse the HTML and get the hyperlinks\nclass HyperlinkParser(HTMLParser): def __init__(self): super().__init__() # Create a list to store the hyperlinks self.hyperlinks = [] # Override the HTMLParser's handle_starttag method to get the hyperlinks def handle_starttag(self, tag, attrs): attrs = dict(attrs) # If the tag is an anchor tag and it has an href attribute, add the href attribute to the list of hyperlinks if tag == \"a\" and \"href\" in attrs: self.hyperlinks.append(attrs[\"href\"]) ################################################################################\n### Step 2\n################################################################################ # Function to get the hyperlinks from a URL\ndef get_hyperlinks(url): # Try to open the URL and read the HTML try: # Open the URL and read the HTML with urllib.request.urlopen(url) as response: # If the response is not HTML, return an empty list if not response.info().get('Content-Type').startswith(\"text\/html\"): return [] # Decode the HTML html = response.read().decode('utf-8') except Exception as e: print(e) return [] # Create the HTML Parser and then Parse the HTML to get hyperlinks parser = HyperlinkParser() parser.feed(html) return parser.hyperlinks ################################################################################\n### 
Step 3\n################################################################################ # Function to get the hyperlinks from a URL that are within the same domain\ndef get_domain_hyperlinks(local_domain, url): clean_links = [] for link in set(get_hyperlinks(url)): clean_link = None # If the link is a URL, check if it is within the same domain if re.search(HTTP_URL_PATTERN, link): # Parse the URL and check if the domain is the same url_obj = urlparse(link) if url_obj.netloc == local_domain: clean_link = link # If the link is not a URL, check if it is a relative link else: if link.startswith(\"\/\"): link = link[1:] elif link.startswith(\"#\") or link.startswith(\"mailto:\"): continue clean_link = \"https:\/\/\" + local_domain + \"\/\" + link if clean_link is not None: if clean_link.endswith(\"\/\"): clean_link = clean_link[:-1] clean_links.append(clean_link) # Return the list of hyperlinks that are within the same domain return list(set(clean_links)) ################################################################################\n### Step 4\n################################################################################ def crawl(url): # Parse the URL and get the domain local_domain = urlparse(url).netloc # Create a queue to store the URLs to crawl queue = deque([url]) # Create a set to store the URLs that have already been seen (no duplicates) seen = set([url]) # Create a directory to store the text files if not os.path.exists(\"text\/\"): os.mkdir(\"text\/\") if not os.path.exists(\"text\/\"+local_domain+\"\/\"): os.mkdir(\"text\/\" + local_domain + \"\/\") # Create a directory to store the csv files if not os.path.exists(\"processed\"): os.mkdir(\"processed\") # While the queue is not empty, continue crawling while queue: # Get the next URL from the queue url = queue.pop() print(url) # for debugging and to see the progress # Save text from the url to a &lt;url>.txt file with open('text\/'+local_domain+'\/'+url[8:].replace(\"\/\", \"_\") + \".txt\", \"w\", 
encoding=\"UTF-8\") as f: # Get the text from the URL using BeautifulSoup soup = BeautifulSoup(requests.get(url).text, \"html.parser\") # Get the text but remove the tags text = soup.get_text() # If the page requires JavaScript, its text cannot be extracted, so log a warning if (\"You need to enable JavaScript to run this app.\" in text): print(\"Unable to parse page \" + url + \" due to JavaScript being required\") # Write the text to the file in the text directory f.write(text) # Get the hyperlinks from the URL and add them to the queue for link in get_domain_hyperlinks(local_domain, url): if link not in seen: queue.append(link) seen.add(link) crawl(full_url) ################################################################################\n### Step 5\n################################################################################ def remove_newlines(serie): serie = serie.str.replace('\\n', ' ') serie = serie.str.replace('\\\\n', ' ') serie = serie.str.replace('  ', ' ') serie = serie.str.replace('  ', ' ') return serie ################################################################################\n### Step 6\n################################################################################ # Create a list to store the text files\ntexts=[] # Get all the text files in the text directory\nfor file in os.listdir(\"text\/\" + domain + \"\/\"): # Open the file and read the text with open(\"text\/\" + domain + \"\/\" + file, \"r\", encoding=\"UTF-8\") as f: text = f.read() # Omit the first 11 characters of the filename (the 'openai.com_' prefix) and the last 4 ('.txt'), then replace - and _ with spaces and drop #update. texts.append((file[11:-4].replace('-',' ').replace('_', ' ').replace('#update',''), text)) # Create a dataframe from the list of texts\ndf = pd.DataFrame(texts, columns = ['fname', 'text']) # Set the text column to be the raw text with the newlines removed\ndf['text'] = df.fname + \". 
\" + remove_newlines(df.text)\ndf.to_csv('processed\/scraped.csv')\ndf.head() ################################################################################\n### Step 7\n################################################################################ # Load the cl100k_base tokenizer which is designed to work with the ada-002 model\ntokenizer = tiktoken.get_encoding(\"cl100k_base\") df = pd.read_csv('processed\/scraped.csv', index_col=0)\ndf.columns = ['title', 'text'] # Tokenize the text and save the number of tokens to a new column\ndf['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x))) # Visualize the distribution of the number of tokens per row using a histogram\ndf.n_tokens.hist() ################################################################################\n### Step 8\n################################################################################ max_tokens = 500 # Function to split the text into chunks of a maximum number of tokens\ndef split_into_many(text, max_tokens = max_tokens): # Split the text into sentences sentences = text.split('. ') # Get the number of tokens for each sentence n_tokens = [len(tokenizer.encode(\" \" + sentence)) for sentence in sentences] chunks = [] tokens_so_far = 0 chunk = [] # Loop through the sentences and tokens joined together in a tuple for sentence, token in zip(sentences, n_tokens): # If the number of tokens so far plus the number of tokens in the current sentence is greater # than the max number of tokens, then add the chunk to the list of chunks and reset # the chunk and tokens so far if tokens_so_far + token > max_tokens: chunks.append(\". 
\".join(chunk) + \".\") chunk = [] tokens_so_far = 0 # If the number of tokens in the current sentence is greater than the max number of # tokens, go to the next sentence if token > max_tokens: continue # Otherwise, add the sentence to the chunk and add the number of tokens to the total chunk.append(sentence) tokens_so_far += token + 1 return chunks shortened = [] # Loop through the dataframe\nfor row in df.iterrows(): # If the text is None, go to the next row if row[1]['text'] is None: continue # If the number of tokens is greater than the max number of tokens, split the text into chunks if row[1]['n_tokens'] > max_tokens: shortened += split_into_many(row[1]['text']) # Otherwise, add the text to the list of shortened texts else: shortened.append( row[1]['text'] ) ################################################################################\n### Step 9\n################################################################################ df = pd.DataFrame(shortened, columns = ['text'])\ndf['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))\ndf.n_tokens.hist() ################################################################################\n### Step 10\n################################################################################ # Note that you may run into rate limit issues depending on how many files you try to embed\n# Please check out our rate limit guide to learn more on how to handle this: https:\/\/platform.openai.com\/docs\/guides\/rate-limits df['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])\ndf.to_csv('processed\/embeddings.csv')\ndf.head() ################################################################################\n### Step 11\n################################################################################ df=pd.read_csv('processed\/embeddings.csv', index_col=0)\ndf['embeddings'] = df['embeddings'].apply(eval).apply(np.array) df.head() 
################################################################################\n### Step 12\n################################################################################ def create_context( question, df, max_len=1800, size=\"ada\"\n): \"\"\" Create a context for a question by finding the most similar context from the dataframe \"\"\" # Get the embeddings for the question q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding'] # Get the distances from the embeddings df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine') returns = [] cur_len = 0 # Sort by distance and add the text to the context until the context is too long for i, row in df.sort_values('distances', ascending=True).iterrows(): # Add the length of the text to the current length cur_len += row['n_tokens'] + 4 # If the context is too long, break if cur_len > max_len: break # Else add it to the text that is being returned returns.append(row[\"text\"]) # Return the context return \"\\n\\n###\\n\\n\".join(returns) def answer_question( df, model=\"text-davinci-003\", question=\"Am I allowed to publish model outputs to Twitter, without a human review?\", max_len=1800, size=\"ada\", debug=False, max_tokens=150, stop_sequence=None\n): \"\"\" Answer a question based on the most similar context from the dataframe texts \"\"\" context = create_context( question, df, max_len=max_len, size=size, ) # If debug, print the context passed to the model if debug: print(\"Context:\\n\" + context) print(\"\\n\\n\") try: # Create a completion using the question and context response = openai.Completion.create( prompt=f\"Answer the question based on the context below, and if the question can't be answered based on the context, say \\\"I don't know\\\"\\n\\nContext: {context}\\n\\n---\\n\\nQuestion: {question}\\nAnswer:\", temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, 
stop=stop_sequence, model=model, ) return response[\"choices\"][0][\"text\"].strip() except Exception as e: print(e) return \"\" ################################################################################\n### Step 13\n################################################################################ print(answer_question(df, question=\"What day is it?\", debug=False)) print(answer_question(df, question=\"What is our newest embeddings model?\"))<\/pre>\n<h2>How to Run This Code?<\/h2>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img decoding=\"async\" loading=\"lazy\" width=\"631\" height=\"421\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-23.png\" alt=\"\" class=\"wp-image-1177664\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-23.png 631w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-23-300x200.png 300w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><\/figure>\n<\/div>\n<p>This program is a Python script that scrapes text from a website, processes it, and then uses OpenAI&#8217;s language models to answer questions based on the scraped text. <\/p>\n<p>All of the following explanations concern the <a href=\"https:\/\/github.com\/openai\/openai-cookbook\/tree\/main\/apps\/web-crawl-q-and-a\" data-type=\"URL\" data-id=\"https:\/\/github.com\/openai\/openai-cookbook\/tree\/main\/apps\/web-crawl-q-and-a\" target=\"_blank\" rel=\"noreferrer noopener\">original code project on GitHub<\/a>.<\/p>\n<p>Here&#8217;s a step-by-step guide on how to use it:<\/p>\n<ol>\n<li><strong>Install the required packages<\/strong>: The script uses several Python packages, including requests, BeautifulSoup, pandas, and openai. 
You can install these packages by running <code>pip install -r requirements.txt<\/code> in the directory where the script is located.<\/li>\n<li><strong>Set the website to scrape<\/strong>: In the script, you can specify the website to scrape by setting the <code>domain<\/code> and <code>full_url<\/code> variables in Step 1. The <code>domain<\/code> variable should be the root domain of the website (e.g., &#8220;example.com&#8221;), and the <code>full_url<\/code> variable should be the full URL of the website (e.g., &#8220;<a href=\"https:\/\/www.example.com\/\">https:\/\/www.example.com\/<\/a>&#8220;).<\/li>\n<li><strong>Run the script<\/strong>: You can run the script in a Python environment by executing <code>python script.py<\/code> in the directory where the script is located.<\/li>\n<li><strong>Wait for the scraping to complete<\/strong>: The script will take some time to scrape the website and save the text files to disk. You can monitor the progress by looking at the console output.<\/li>\n<li><strong>Ask questions<\/strong>: After the scraping is complete, you can use the <code>answer_question<\/code> function in Step 12 to ask questions based on the scraped text. The function takes in a dataframe containing the scraped text, a question to ask, and several optional parameters. You can modify the question and other parameters to suit your needs.<\/li>\n<\/ol>\n<p>Note that the script is intended as a demonstration of how to use OpenAI&#8217;s language models to answer questions based on scraped text, and may require modification to work with different websites or to answer different types of questions. It also requires an <a href=\"https:\/\/blog.finxter.com\/openai-api-or-how-i-made-my-python-code-intelligent\/\" data-type=\"URL\" data-id=\"https:\/\/blog.finxter.com\/openai-api-or-how-i-made-my-python-code-intelligent\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI API key<\/a> to use. 
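The retrieval step at the heart of `answer_question` (Step 12) can be understood without any API calls: `create_context` sorts the rows by their distance to the question embedding and concatenates chunk texts until the token budget `max_len` is spent. A minimal sketch of that selection loop, using a toy dataframe with made-up distances and token counts (hypothetical data, no OpenAI calls):

```python
import pandas as pd

# Toy stand-in for the scraped dataframe; in the real script the
# 'distances' column is computed by distances_from_embeddings
df = pd.DataFrame({
    "text": ["chunk A", "chunk B", "chunk C"],
    "n_tokens": [500, 900, 600],
    "distances": [0.10, 0.30, 0.20],
})

def select_context(df, max_len=1800):
    returns, cur_len = [], 0
    # Closest chunks first; stop once the token budget is exceeded
    for _, row in df.sort_values("distances", ascending=True).iterrows():
        cur_len += row["n_tokens"] + 4  # +4 mirrors the separator overhead
        if cur_len > max_len:
            break
        returns.append(row["text"])
    return "\n\n###\n\n".join(returns)

# chunk A (504 tokens) and chunk C (another 604) fit; chunk B would
# push the total past 1800, so it is dropped
print(select_context(df))
```

With these numbers, the context contains chunk A and chunk C joined by the `###` separator; the more distant chunk B is left out even though it was scraped.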
You can sign up for an API key on the OpenAI website.<\/p>\n<h2>What Is an Embedding in This Context?<\/h2>\n<p class=\"has-global-color-8-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f4a1.png\" alt=\"\ud83d\udca1\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> In <strong>natural language processing<\/strong>, an embedding is a way to represent words or phrases as numerical vectors. These vectors capture semantic and contextual information about the words and phrases, and can be used to train machine learning models for various tasks such as text classification, sentiment analysis, and question answering.<\/p>\n<p>In this script, the embeddings are created using OpenAI&#8217;s language models, and they are used to encode the text from the scraped web pages into a numerical format that can be analyzed and searched efficiently. <\/p>\n<p>The embeddings are created by feeding the text through OpenAI&#8217;s <code>text-embedding-ada-002<\/code> engine, which is designed to create high-quality embeddings for a wide variety of text-based applications. 
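The "most similar context" search above boils down to cosine similarity between these vectors. A tiny numpy illustration, with 3-dimensional toy vectors standing in for the 1536-dimensional `text-embedding-ada-002` embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product normalized by vector lengths: 1 means same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

question  = np.array([1.0, 0.0, 1.0])   # toy "question" embedding
on_topic  = np.array([2.0, 0.0, 2.0])   # same direction  -> similarity ~1.0
off_topic = np.array([0.0, 1.0, 0.0])   # orthogonal      -> similarity ~0.0

print(cosine_similarity(question, on_topic))
print(cosine_similarity(question, off_topic))
```

The script uses cosine *distance* (1 minus this similarity) via `distances_from_embeddings`, so the rows sorted first are the ones whose vectors point in nearly the same direction as the question's.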
<\/p>\n<p>The resulting embeddings are stored in the <code>DataFrame<\/code> and used to find the most similar context to a question in order to provide accurate and reliable answers.<\/p>\n<p class=\"has-base-background-color has-background\"><img decoding=\"async\" src=\"https:\/\/s.w.org\/images\/core\/emoji\/14.0.0\/72x72\/1f449.png\" alt=\"\ud83d\udc49\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/> <strong>Recommended<\/strong>: <a href=\"https:\/\/blog.finxter.com\/how-to-install-openai-in-python\/\" data-type=\"post\" data-id=\"1170845\" target=\"_blank\" rel=\"noreferrer noopener\">How to Install OpenAI in Python?<\/a><\/p>\n<p>If you want to improve your web scraping skills, check out the following course on the Finxter academy:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><a href=\"https:\/\/academy.finxter.com\/university\/web-scraping-with-beautifulsoup\/\" target=\"_blank\" rel=\"noreferrer noopener\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"773\" src=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-24-1024x773.png\" alt=\"\" class=\"wp-image-1177674\" srcset=\"https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-24-1024x773.png 1024w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-24-300x227.png 300w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-24-768x580.png 768w, https:\/\/blog.finxter.com\/wp-content\/uploads\/2023\/03\/image-24.png 1026w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>5\/5 &#8211; (2 votes) Have you ever found yourself deep in the internet rabbit hole, searching for an answer to a question that just won&#8217;t quit? It can be frustrating to sift through all the online information and still come up empty-handed. 
But what if there was a way to get accurate and reliable answers [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-132172","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/132172","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=132172"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/132172\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=132172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=132172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=132172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}