Eurobeat Lyrics Search Tool


This search is based on the site https://www.eurobeat-prime.com/lyrics.php?, which has the lyrics for the vast majority of Eurobeat songs.
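A Python script crawls the site and finds the title and artist of the songs whose lyrics contain the word you are looking for. The results are saved to a CSV file: the first column is the id and the second column is the song title. To use an id, append the number to "https://www.eurobeat-prime.com/lyrics.php?lyrics=".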
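
As a minimal sketch of how a saved row maps back to a lyrics page: the helper names lyrics_url and parse_results are hypothetical, and the sample rows are made up for illustration. It assumes each row is written as "id,title" without quoting, as the script does.

```python
BASE_URL = "https://www.eurobeat-prime.com/lyrics.php?lyrics="


def lyrics_url(song_id):
    """Build the lyrics page URL for a numeric id (hypothetical helper)."""
    return BASE_URL + str(song_id)


def parse_results(text):
    """Turn 'id,title' lines into (id, title, url) tuples.

    Rows are written without quoting, so split on the first comma only;
    a title may itself contain commas.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        song_id, title = line.split(",", 1)
        rows.append((song_id, title, lyrics_url(song_id)))
    return rows


# Made-up sample rows, for illustration only.
sample = "2,Example Artist - Example Song\n15,Another Artist - One, Two\n"
for song_id, title, url in parse_results(sample):
    print(title, "->", url)
```

For a real run, the text could come from the output file instead, for example open("eblrc.csv", encoding="utf-8").read().
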
To change the search term, edit the line a = re.search(r"\bShock\b|\bshock\b|\bSHOCK\b", lrc) and replace the word between each pair of \b markers.
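
If editing three spellings by hand gets tedious, one possible alternative is a single case-insensitive pattern. This is only a sketch, not what the script does: word_pattern is a hypothetical helper, and re.IGNORECASE is slightly broader than the three listed spellings because it also matches mixed case such as "sHoCk".

```python
import re


def word_pattern(word):
    """Compile a whole-word, case-insensitive pattern for one search term."""
    # re.escape keeps any special characters in the word literal.
    return re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)


pattern = word_pattern("shock")

# \b marks a word boundary, so the term only matches as a whole word.
print(bool(pattern.search("What a SHOCK tonight")))
print(bool(pattern.search("a shocking night")))
```

Inside the script, the search call would then become a = pattern.search(lrc).
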
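In the line for i in range(1,4967), the value 4967 depends on the id of the newest lyrics currently on the site. You can find it in the first row of the "latest entries" list on the lyrics page https://www.eurobeat-prime.com/lyrics.php?. Because range() excludes its upper bound, use the newest id plus one if you want that entry included.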
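
The newest id could also be read from the page automatically. The sketch below is an assumption, not something the site documents: it assumes the lyrics index page links to entries with hrefs containing "lyrics=<id>" (the URL form described above), and newest_lyrics_id is a hypothetical helper. The sample HTML is made up.

```python
import re


def newest_lyrics_id(html):
    """Return the largest id found in 'lyrics=<number>' links, or None."""
    ids = [int(m) for m in re.findall(r"\blyrics=(\d+)", html)]
    return max(ids) if ids else None


# Made-up HTML fragment standing in for the index page.
sample_html = (
    '<a href="lyrics.php?lyrics=4966">Newest entry</a>'
    '<a href="lyrics.php?lyrics=4812">Older entry</a>'
)
newest = newest_lyrics_id(sample_html)

# range() excludes its end, so add one to include the newest id.
ids_to_scan = range(1, newest + 1)
print(newest, len(ids_to_scan))
```

To use it against the live site, the HTML would come from requests.get on the lyrics page; check the result against the page by eye, since the layout assumption may not hold.
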
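You need to install requests and bs4 yourself: run pip install requests beautifulsoup4 (the bs4 module is provided by the beautifulsoup4 package). If you get stuck, a quick web search will turn up instructions.
The code is as follows: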
import re
import threading
from concurrent.futures import ThreadPoolExecutor

import bs4
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'}
# Serializes appends to eblrc.csv so rows from different threads do not interleave.
write_lock = threading.Lock()
# url = "https://www.eurobeat-prime.com/lyrics.php?artist=z"
# url = "https://www.eurobeat-prime.com/lyrics.php?lyrics=2"


def findEurobeat(url, idx):
    # Fetch one lyrics page; the timeout keeps a stalled request from blocking a worker forever.
    resp = requests.get(url=url, headers=headers, timeout=30)
    soup = bs4.BeautifulSoup(resp.text, 'html.parser')
    # Full selector path: body > table > tbody > tr > td > table.mtable2 > tbody > tr > td.mtopm > table > tbody > tr:nth-child(3) > td > div
    title_tag = soup.select_one('tr:nth-child(3) > td > div > b')
    lrcs = soup.select('tr:nth-child(3) > td > div')
    if title_tag is None or len(lrcs) < 2:
        return  # no lyrics at this id, or the page layout differs
    title = title_tag.text
    # The second div holds the lyrics; skip its first two lines.
    lrcl = lrcs[1].text.split('\n')
    lrc = "".join(j + "\n" for j in lrcl[2:])
    # a = re.search(r"\bEurobeat\b|\beurobeat\b|\bEUROBEAT\b",lrc)
    a = re.search(r"\bShock\b|\bshock\b|\bSHOCK\b", lrc)
    if a:
        print(idx + "," + title)
        # Append one "id,title" row per matching song.
        with write_lock, open("eblrc.csv", "a+", encoding='utf-8') as f:
            f.write(idx + "," + title + "\n")


with ThreadPoolExecutor(max_workers=64) as pool:  # multithreading
    for i in range(1,4967):  # upper bound: see the note on the newest id above
        idx = str(i)
        url = "https://www.eurobeat-prime.com/lyrics.php?lyrics=" + idx
        # findEurobeat(url,idx)
        pool.submit(findEurobeat, url, idx)