Repeat a Python function on its own output
I made a function that scrapes the last 64 characters of text from a website and appends them to url1, resulting in new_url. I want to repeat the process by scraping the last 64 characters from the page at the resulting URL (new_url) and appending them to url1 again. The goal is to repeat this until I hit a page whose last 3 characters are "END".
Here is my code so far:
from urllib import request

# function
def getlink(url):
    url1 = 'https://www.random.computer/api.php?file='
    req = request.urlopen(url)
    link = req.read().splitlines()
    for line in link:
        text = line.decode('utf-8')
        last64 = text[-64:]
        # new_url ends up built from the last line of the response
        new_url = url1 + last64
    return new_url

getlink('https://www.random/api.php?file=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz012345678910')
#output
'https://www.random/api.php?file=zyxwvutsrqponmlkjihgfedcba012345678910abcdefghijklmnopqrstuvwxyz'
My trouble is figuring out how to repeat the function on its own output. Any help would be appreciated!
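The pattern being asked about, applying a function to its own output until some condition holds, can be written once as a small helper. This is a minimal sketch, not code from the question or the answers: iterate_until, its max_steps guard, and the in-memory fake_site dictionary standing in for the website are all hypothetical, chosen so the example runs without network access.

```python
from typing import Callable, List, TypeVar

T = TypeVar('T')

def iterate_until(step: Callable[[T], T], start: T,
                  done: Callable[[T], bool], max_steps: int = 10_000) -> List[T]:
    """Apply step() to its own output, starting from start, until done() is true.

    Returns every value seen, including start and the final value.
    Raises RuntimeError after max_steps calls so a chain that never
    reaches the stop condition cannot loop forever.
    """
    values = [start]
    current = start
    steps = 0
    while not done(current):
        if steps == max_steps:
            raise RuntimeError(f'stop condition not reached after {max_steps} steps')
        current = step(current)
        values.append(current)
        steps += 1
    return values

# Hypothetical stand-in for the website: each key is a token, each value the next page's text.
fake_site = {'a': 'b', 'b': 'c', 'c': 'xEND'}

chain = iterate_until(lambda token: fake_site[token], 'a',
                      lambda token: token.endswith('END'))
print(chain)  # ['a', 'b', 'c', 'xEND']
```

With the question's getlink(), the same helper would be called as iterate_until(getlink, first_url, lambda u: u.endswith('END')), where first_url is the starting URL; that call needs network access and is not run here.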
Solution 1:[1]
A simple loop should work. I've removed the first token, as it may be sensitive information; just replace the WRITE_YOUR_FIRST_TOKEN_HERE string with the code for the first link.
from urllib import request

def get_chunk(chunk, url='https://www.uchicago.computer/api.php?file='):
    # Fetch the page for this token and return its text without surrounding whitespace
    with request.urlopen(url + chunk) as f:
        return f.read().decode('UTF-8').strip()

if __name__ == '__main__':
    chunk = 'WRITE_YOUR_FIRST_TOKEN_HERE'
    # Keep following links until the page text ends with "END"
    while chunk[-3:] != "END":
        chunk = get_chunk(chunk[-64:])
        print(chunk)
        # Chunk is a string, do whatever you want with it,
        # like chunk.splitlines() to get a list of the lines
read() gets the byte stream, decode() turns it into a string, and strip() removes leading and trailing whitespace (like \n) so that it doesn't mess with the last 64 characters: if you take the last 64 characters and one of them is a \n, you only get 63 characters of the token.
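The 63-character problem described above is easy to reproduce offline. In this sketch the 64-character token and the trailing newline are made up; body plays the role of the bytes that read() returns.

```python
# A made-up 64-character token, followed by the newline a server might append.
token = 'abcdefghijklmnopqrstuvwxyz' * 2 + '0123456789ab'
body = (token + '\n').encode('utf-8')  # bytes, like the result of read()

text = body.decode('utf-8')           # bytes -> str
without_strip = text[-64:]            # the slice includes the '\n'
with_strip = text.strip()[-64:]       # newline removed before slicing

print(len(token))                         # 64
print(len(without_strip.rstrip('\n')))    # 63: the first token character is lost
print(with_strip == token)                # True
```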
Solution 2:[2]
Try the code below; it should do what you describe above.
import requests
from bs4 import BeautifulSoup

def getlink(url):
    url1 = 'https://www.uchicago.computer/api.php?file='
    response = requests.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')
    # get_text() returns the page's text; strip() drops the trailing newline
    text = doc.get_text().strip()
    last64 = text[-64:]
    new_url = url1 + last64
    return new_url

def caller(url):
    url = getlink(url)
    print(url)
    # Recurse until the returned URL (i.e. the page text) ends with "END"
    if url[-3:] != 'END':
        caller(url)
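One design point about caller(): it recurses once per link, and CPython limits recursion depth (sys.getrecursionlimit() is 1000 by default), so a long enough chain of links would end in a RecursionError. The sketch below compares a recursive and a loop-based follower on a hypothetical in-memory chain longer than the current limit; the site dictionary and both function names are illustrative, not part of the answer.

```python
import sys

def follow_recursive(token, site):
    """Same shape as caller(): one stack frame per link."""
    nxt = site[token]
    if nxt.endswith('END'):
        return nxt
    return follow_recursive(nxt, site)

def follow_iterative(token, site):
    """Loop version: constant stack depth however long the chain is."""
    nxt = site[token]
    while not nxt.endswith('END'):
        nxt = site[nxt]
    return nxt

# Hypothetical chain '0' -> '1' -> ... -> str(n) -> 'END', longer than the recursion limit.
n = sys.getrecursionlimit() + 100
site = {str(i): str(i + 1) for i in range(n)}
site[str(n)] = 'END'

print(follow_iterative('0', site))  # END
try:
    follow_recursive('0', site)
except RecursionError:
    print('recursive version hit the recursion limit')
```

Solution 1's while loop already has the constant-depth property, which is why it does not need any change for long chains.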
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | chirag aggarwal |