[Tut] How I Built a Back-Link Checker Using ChatGPT and Google Colab

#1
How I Built a Back-Link Checker Using ChatGPT and Google Colab

<div>
<p>I recently started a new website, and to rank higher in the search engines I hired an SEO expert. He suggested several activities, one of which was building lots of backlinks. We agreed on a budget and a timeframe in which he would deliver 200 links in a month. These links are a mix of social bookmarks, profile pages, and guest articles. After he delivers the links, I have to verify the work and give feedback. The process of checking a URL that should contain the backlink is as follows:</p>
<ul>
<li>Open a URL from a list of URLs in the sheet</li>
<li>Search for “mydomain.com” in the source code, or click the visible link pointing to my domain</li>
<li>In the sheet, mark it as <code>LINK FOUND</code> or <code>LINK NOT FOUND</code></li>
</ul>
<p>If only it were this easy.</p>
<p>The link-building project will go on for several months, and he will be building several hundred links, so I need a tool to do the checking for me. I came up with the idea of asking ChatGPT to write the code in Python. Here is my prompt:</p>
<h2 class="wp-block-heading">Prompt</h2>
<pre class="wp-block-preformatted"><code>Give me a Python code that scrapes all given URLs and looks for any link that contains "mydomain.com" and prints out the URL and the link it found</code></pre>
<p><strong><em>ChatGPT</em></strong> reply: <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f916.png" alt="🤖" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import requests
from bs4 import BeautifulSoup

# list of URLs to scrape
urls = ['https://example.com', 'https://example.net', 'https://example.org']

# loop through each URL
for url in urls:
    # send a request to the URL
    response = requests.get(url)
    # parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')
    # find all links in the page
    links = soup.find_all('a')
    # loop through each link
    for link in links:
        # check if the link contains "mydomain.com"
        if 'mydomain.com' in link.get('href', ''):
            # print the URL and link found
            print(f"{url} - {link.get('href')}")
</pre>
<h2 class="wp-block-heading">Choosing Google Colab as Web-Based Python Interface</h2>
<p>Now I have to run this code in a web interface. The interface should have an input box and an output box.</p>
<p>I find Google Colab to be the quickest and easiest way to do this. But I need the code to read the list of URLs from an input widget box. So I returned to ChatGPT, described my environment, and asked for many improvisations. It took several rounds of trial and error.</p>
<p>Here are some of the prompts:</p>
<p><em><strong>Improvisation Prompt 1:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>Also add the following
Display domains that are duplicate
Display unique list of domains in which the string was not found</code></pre>
<p><em><strong>Improvisation Prompt 2:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>I got this error ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
</code></pre>
<p><em><strong>Improvisation Prompt 3:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>Check for Redirection, if the URL redirects, print "E:REDIRECTED" and skip iteration</code></pre>
<p><em><strong>Improvisation Prompt 4:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>I got a mod_security error in request.get, how can I fix it</code></pre>
<p><em><strong>Improvisation Prompt 5:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>Add a try catch block around request and beautiful soup</code></pre>
<p><em><strong>Improvisation Prompt 6:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>If there are no Links found, print "E:ZERO LINKS" and skip iteration</code></pre>
<p><em><strong>Improvisation Prompt 7:</strong></em> <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<pre class="wp-block-preformatted"><code>The list of URLs will come from a google collab input box can you make the change</code></pre>
<p>It took many more prompts to reach the final result. But since I am a Python coder, I could step out of the back-and-forth with ChatGPT and change the code my way.</p>
<h2 class="wp-block-heading">ERROR/STATUS CODES</h2>
<p>The error and status codes are explained below.</p>
<p><strong>Errors found in the URL given in the sheet</strong></p>
<ul>
<li>UNRESOLVED – The URL in the sheet is malformed</li>
<li>DUPLICATE DOMAIN – There are multiple URLs from the same domain</li>
<li>REDIRECTED – The URL redirected to another URL. If this happens, ask the SEO analyst to post the final URL in the sheet</li>
</ul>
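<p>As a rough illustration, the two URL-level checks that need no network access (UNRESOLVED and DUPLICATE DOMAIN) can be sketched with the standard library alone. The function name <code>url_level_status</code>, the sample URLs, and the <code>OK</code> placeholder are assumptions made for this example; they are not part of the final code.</p>

```python
from collections import Counter
from urllib.parse import urlparse


def url_level_status(urls):
    """Return one (status, domain) pair per input URL, in input order."""
    domains = [urlparse(u).netloc for u in urls]
    # count how often each non-empty domain appears in the input
    counts = Counter(d for d in domains if d)
    results = []
    for domain in domains:
        if not domain:
            # urlparse found no domain, e.g. the scheme is missing
            results.append(("E:UNRESOLVED", domain))
        elif counts[domain] > 1:
            # every URL of a repeated domain is flagged, including the first
            results.append(("E:DUPLICATE DOMAIN", domain))
        else:
            # "OK" is a placeholder for "continue with the page check"
            results.append(("OK", domain))
    return results


sample = [
    "https://a.example/page1",
    "a.example/page2",       # no scheme, so no domain is parsed
    "https://b.example/x",
    "https://b.example/y",
]
for status, domain in url_level_status(sample):
    print(status, ",", domain)
```

<p>REDIRECTED is not covered here because it can only be decided after the page has been requested.</p>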
<p><strong>Statuses for links found in the source code of the URL</strong></p>
<ul>
<li>FOUND – Our domain backlink was found</li>
<li>NOT FOUND – Our domain backlink was not found</li>
<li>BAD LINK – Our domain backlink was not found, and the last link checked on the page had no domain (for example, a relative or empty href)</li>
<li>ZERO LINKS – No links were found in the source code</li>
</ul>
<p>I begin each error code with ‘<strong>E:</strong>’ so that the errors are easy to identify in the sheet for conditional formatting.</p>
<p>So here is the final code:</p>
<h2 class="wp-block-heading">The Code</h2>
<p>This goes in the first code cell of Google Colab</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from IPython.display import display
import ipywidgets as widgets

url_box = widgets.Textarea(
    placeholder='Enter URLs here',
    description='URLs:',
    layout=widgets.Layout(width='70%')
)

# display the text box widget
display(url_box)
</pre>
<p>This goes in the second code cell of Google Colab</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

# suppress the warnings that verify=False would otherwise trigger
requests.packages.urllib3.disable_warnings()

# get the input URLs as a list
urls = url_box.value.split()

# create lists to store domains
unique_domains = []
duplicate_domains = []

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# loop through each URL and collect the domains that appear more than once
for url in urls:
    domain = urlparse(url).netloc
    if domain not in unique_domains:
        unique_domains.append(domain)
    elif domain not in duplicate_domains:
        duplicate_domains.append(domain)

print("Duplicate domains:", len(duplicate_domains))
print(duplicate_domains)
print()

# loop through each URL and check if the backlink exists
for url in urls:
    inputstring = ""
    domain = urlparse(url).netloc

    if not domain:
        print('E:UNRESOLVED', ',', domain)
        continue

    if domain in duplicate_domains:
        print("E:DUPLICATE DOMAIN")
        continue

    # send a request to the URL; verify=False skips SSL certificate checks,
    # and allow_redirects=False reports a redirect instead of following it
    try:
        response = requests.get(url, headers=headers, verify=False, allow_redirects=False)
    except Exception as e:
        print('REQ:', str(e))
        continue  # no response to inspect, move on to the next URL

    # check if the response is a redirect
    if response.is_redirect:
        print("E:REDIRECTED", ',', domain)
        continue

    # parse the HTML content with BeautifulSoup
    try:
        soup = BeautifulSoup(response.content, 'html.parser')
    except Exception as e:
        print('BS:', str(e))
        continue

    # find all links in the page
    links = soup.find_all('a')

    # if no links found
    if len(links) == 0:
        print('E:ZERO LINKS', ',', domain)
        continue

    # loop through each link
    for link in links:
        # get the domain name from the link
        domain_name = urlparse(link.get('href', '')).netloc
        if domain_name:
            # check if the domain name is "mydomain.com"
            if 'mydomain.com' in domain_name:
                inputstring = "FOUND"
                break
            else:
                inputstring = "E:NOT FOUND"
        else:
            # relative or empty href, so there is no domain to compare
            inputstring = "E:BAD LINK"

    print(inputstring, ',', domain)
</pre>
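<p>The link-level decision inside the loop can also be tried without any network access. The sketch below is only an approximation of that logic: it swaps BeautifulSoup for the standard library's <code>html.parser</code> so that it runs without extra installs, and the names <code>HrefCollector</code> and <code>classify_page</code> are hypothetical helpers introduced for this example.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag (empty string when href is missing)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")


def classify_page(html, target="mydomain.com"):
    """Return the status string for one page's HTML."""
    collector = HrefCollector()
    collector.feed(html)
    if not collector.hrefs:
        return "E:ZERO LINKS"
    status = ""
    for href in collector.hrefs:
        netloc = urlparse(href).netloc
        if not netloc:
            status = "E:BAD LINK"    # relative or empty href
        elif target in netloc:
            return "FOUND"           # stop at the first matching link
        else:
            status = "E:NOT FOUND"
    # no match: the status of the last link examined is reported
    return status


print(classify_page('<a href="https://www.mydomain.com/page">me</a>'))
print(classify_page('<a href="https://other.example/">x</a><a href="/about">y</a>'))
print(classify_page('<p>No anchors here</p>'))
```

<p>As in the cell above, a page without a matching link reports whatever status the last link produced, which is why a page can come back as BAD LINK even though it contains other external links.</p>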
<p>See the CELL setup in the image. Press play in the first cell. You will get a URL input box. Paste your URLs in it.</p>
<p>Input Box:</p>
<pre class="wp-block-preformatted"><code>https://sketchfab.tld/mydomain
https://30seconds.tld/mydomain/
https://speakerdeck.tld/mydomainus
https://www.ted.tld/profiles/&lt;some page>/about
https://dzone.tld/users/mydomainindia.html
https://www.reddit.tld/user/mydomainusa
https://medium.tld/@mydomainusa/about
https://www.pinterest.tld/mydomainusa/
https://www.intensedebate.tld/people/mydomainusa
https://www.growkudos.tld/profile/&lt;some page>
https://www.universe.tld/users/&lt;some page>
https://www.dostally.tld/post/&lt;some page>
https://www.socialbookmarkzone.info/&lt;some page>
https://app.raindrop.io/my/-1/item/&lt;somepage>/web
https://www.tamaiaz.tld/posts/&lt;somepage>
https://www.socialbookmarkzone.info/&lt;some page>/
https://gab.tld/mydomain/posts/&lt;some page></code>
</pre>
<p>Now press Play in the second cell and watch the output panel.</p>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>Duplicate domains: 5
['www.socialbookmarkzone.tld', 'www.reddit.tld', 'www.instapaper.tld', 'www.wibki.tld', 'diigo.tld']

FOUND , sketchfab.tld
E:BAD LINK , 30seconds.tld
FOUND , speakerdeck.tld
E:BAD LINK , www.ted.tld
FOUND , dzone.tld
E:DUPLICATE DOMAIN
FOUND , medium.tld
FOUND , www.pinterest.tld
FOUND , www.intensedebate.tld
FOUND , www.growkudos.tld
E:ZERO LINKS , www.universe.tld
FOUND , www.dostally.tld
E:DUPLICATE DOMAIN
E:ZERO LINKS , app.raindrop.io
FOUND , www.tamaiaz.tld
E:DUPLICATE DOMAIN
E:NOT FOUND , gab.tld
</code></pre>
<h2 class="wp-block-heading">INPUT BOX CODE [GOOGLE COLAB]</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="485" height="371" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-43.png" alt="" class="wp-image-1269542" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-43.png 485w, https://blog.finxter.com/wp-content/uplo...00x229.png 300w" sizes="(max-width: 485px) 100vw, 485px" /></figure>
</div>
<h2 class="wp-block-heading">GOOGLE COLAB CODE CELL SETUP</h2>
<figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="485" height="426" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-44.png" alt="" class="wp-image-1269543" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-44.png 485w, https://blog.finxter.com/wp-content/uplo...00x264.png 300w" sizes="(max-width: 485px) 100vw, 485px" /></figure>
<h2 class="wp-block-heading">PASTE THE OUTPUT IN YOUR SEO TRACKER SHEET ON THE SAME ROWS AS THE URLS &amp; APPLY SPLIT TEXT TO COLUMNS</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="268" height="350" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-45.png" alt="" class="wp-image-1269544" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-45.png 268w, https://blog.finxter.com/wp-content/uplo...30x300.png 230w" sizes="(max-width: 268px) 100vw, 268px" /></figure>
</div>
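<p>For anyone who prefers to prepare the columns before pasting, the split step can be approximated in Python. This is a minimal sketch under a few assumptions: the function name <code>output_to_rows</code> is made up for the example, the "Duplicate domains" header lines have already been removed, and each remaining line has the form <code>STATUS , domain</code> or a bare status.</p>

```python
import csv
import io


def output_to_rows(output_text):
    """Split each 'STATUS , domain' line into a (status, domain, is_error) row."""
    rows = []
    for line in output_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # split on the first comma only; a bare status yields an empty domain
        status, _, domain = line.partition(",")
        status = status.strip()
        rows.append((status, domain.strip(), status.startswith("E:")))
    return rows


sample_output = """FOUND , sketchfab.tld
E:BAD LINK , 30seconds.tld
E:DUPLICATE DOMAIN
"""

# write the rows as CSV text that a spreadsheet can import
buffer = io.StringIO()
csv.writer(buffer).writerows(output_to_rows(sample_output))
print(buffer.getvalue())
```

<p>The third column marks the rows whose status starts with <code>E:</code>, which is the same prefix the conditional formatting relies on.</p>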
<h2 class="wp-block-heading">STEPS TO APPLY CONDITIONAL FORMATTING</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="423" height="293" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-46.png" alt="" class="wp-image-1269545" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-46.png 423w, https://blog.finxter.com/wp-content/uplo...00x208.png 300w" sizes="(max-width: 423px) 100vw, 423px" /></figure>
</div>
<h2 class="wp-block-heading">FINAL OUTPUT</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="272" height="436" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-47.png" alt="" class="wp-image-1269546" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-47.png 272w, https://blog.finxter.com/wp-content/uplo...87x300.png 187w" sizes="(max-width: 272px) 100vw, 272px" /></figure>
</div>
<p>Based on this output, the SEO analyst can rework the links or drop these sites completely.</p>
<p>If you like the code, leave a comment. I am available on Upwork for prompt engineering and AI art jobs. I use ChatGPT, Midjourney, Python, and many more tools for my client work.</p>
<p>My Upwork profile is <a href="https://www.upwork.com/freelancers/~018645334d3b757e4d" target="_blank" rel="noreferrer noopener">https://www.upwork.com/freelancers/~018645334d3b757e4d</a></p>
<hr class="wp-block-separator has-alpha-channel-opacity"/>
<p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="👩‍💻" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/7-effective-prompting-tricks-for-chatgpt/" data-type="post" data-id="1211740" target="_blank" rel="noreferrer noopener">7 Effective Prompting Tricks for ChatGPT</a></p>
</div>


https://www.sickgaming.net/blog/2023/04/...gle-colab/