How to scrape all topics from Twitter

2 Answers

User

Answer 1 · edited 2024-04-26 05:52:13

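To scrape all the main topics (for example Arts & culture, Business & finance, and so on) using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

Using XPATH and the text attribute: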

driver.get("https://twitter.com/i/flow/topics_selector")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))])

Using XPATH and get_attribute():

driver.get("https://twitter.com/i/flow/topics_selector")
print([my_elem.get_attribute("textContent") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))])

Console output:

['Arts & culture', 'Business & finance', 'Careers', 'Entertainment', 'Fashion & beauty', 'Food', 'Gaming', 'Lifestyle', 'Movies and TV', 'Music', 'News', 'Outdoors', 'Science', 'Sports', 'Technology', 'Travel']

To scrape all the main topics and sub topics using Selenium and WebDriver, you can use the following locator strategy:

Using XPATH and get_attribute("textContent"):

driver.get("https://twitter.com/i/flow/topics_selector")
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(., 'see top Tweets about them in your timeline')]//following::div[@role='button']/div/span")))
for element in elements:
    element.click()
print([my_elem.get_attribute("textContent") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@role='button']/div/span[text()]")))])
driver.quit()

Console output:

['Arts & culture', 'Animation', 'Art', 'Books', 'Dance', 'Horoscope', 'Theater', 'Writing', 'Business & finance', 'Business personalities', 'Business professions', 'Cryptocurrencies', 'Careers', 'Education', 'Fields of study', 'Entertainment', 'Celebrities', 'Comedy', 'Digital creators', 'Entertainment brands', 'Podcasts', 'Popular franchises', 'Theater', 'Fashion & beauty', 'Beauty', 'Fashion', 'Food', 'Cooking', 'Cuisines', 'Gaming', 'Esports', 'Game development', 'Gaming hardware', 'Gaming personalities', 'Tabletop gaming', 'Video games', 'Lifestyle', 'Animals', 'At home', 'Collectibles', 'Family', 'Fitness', 'Unexplained phenomena', 'Movies and TV', 'Movies', 'Television', 'Music', 'Alternative', 'Bollywood music', 'C-pop', 'Classical music', 'Country music', 'Dance music', 'Electronic music', 'Hip-hop & rap', 'J-pop', 'K-hip hop', 'K-pop', 'Metal', 'Musical instruments', 'Pop', 'R&B and soul', 'Radio stations', 'Reggae', 'Reggaeton', 'Rock', 'World music', 'News', 'COVID-19', 'Local news', 'Social movements', 'Outdoors', 'Science', 'Biology', 'Sports', 'American football', 'Australian rules football', 'Auto racing', 'Baseball', 'Basketball', 'Combat Sports', 'Cricket', 'Extreme sports', 'Fantasy sports', 'Football', 'Golf', 'Gymnastics', 'Hockey', 'Lacrosse', 'Pub sports', 'Rugby', 'Sports icons', 'Sports journalists & coaches', 'Tennis', 'Track & field', 'Water sports', 'Winter sports', 'Technology', 'Computer programming', 'Cryptocurrencies', 'Data science', 'Information security', 'Operating system', 'Tech brands', 'Tech personalities', 'Travel', 'Adventure travel', 'Destinations', 'Transportation']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
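
The second console output is a flat list in page order, with each main topic followed by its sub topics. As a minimal sketch, assuming the main topic names from the first console output are already known and that no sub topic shares a name with a main topic, the flat list could be grouped like this. `group_topics` is a hypothetical helper, not part of Selenium, and the short lists below are a trimmed sample of the output above.

```python
def group_topics(flat_topics, main_topics):
    """Group a flat, page-ordered topic list under the main topic that precedes each entry."""
    main_set = set(main_topics)
    grouped = {}
    current = None
    for name in flat_topics:
        if name in main_set:
            # A main topic starts a new group.
            current = name
            grouped.setdefault(current, [])
        elif current is not None:
            # Anything else belongs to the most recent main topic.
            grouped[current].append(name)
    return grouped


# Trimmed sample of the two console outputs shown above.
main = ["Arts & culture", "Business & finance", "Careers"]
flat = ["Arts & culture", "Animation", "Art",
        "Business & finance", "Cryptocurrencies",
        "Careers", "Education"]

grouped = group_topics(flat, main)
print(grouped["Arts & culture"])  # ['Animation', 'Art']
```

Grouping by position rather than by name keeps repeated sub topics such as 'Theater' and 'Cryptocurrencies' under each main topic where they appear.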

User

Answer 2 · edited 2024-04-26 05:52:13

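Take a look at how XPath works. Just write '//element[@attribute="foo"]' and you don't have to write out the whole path. Be careful, because the main topics and the sub topics (which become visible after you click a main topic) have the same class names. That is what caused the error. Here is how I clicked the sub topics, although I'm sure there is a better way:

I found the topic elements using: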

topics = WebDriverWait(browser, 5).until(
        EC.presence_of_all_elements_located((By.XPATH, '//div[@class="css-901oao r-13gxpu9 r-1qd0xha r-1b6yd1w r-1vr29t4 r-ad9z0x r-bcqeeo r-qvutc0"]'))
    )

Then I created an empty list named:

main_topics = []

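Then I looped through topics, appended each element.text to the main_topics list, and clicked each element to expand that main topic and reveal its sub topics: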

for topic in topics:
    main_topics.append(topic.text)
    topic.click()

Then I created a new variable named sub_topics (it now holds every element from all the opened topics):

sub_topics = WebDriverWait(browser, 5).until(
        EC.presence_of_all_elements_located((By.XPATH, '//span[@class="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0"]'))
    )

Then I created two more lists, named:

subs_list = []

skip_these_words = ["Done", "Follow your favorite Topics", "You’ll see top Tweets about them in your timeline. Don’t see your favorite Topics yet? New Topics are added every week.", "Follow"]

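Then I looped over sub_topics with an if statement that appends element.text to subs_list only when the text is neither in main_topics nor in skip_these_words. I did this to filter out the main topics at the top and the unnecessary text, because all these darn elements have the same class names. Finally, the loop clicks each sub topic. That last part is confusing, so here is an example: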

for sub in sub_topics:
    if sub.text not in main_topics and sub.text not in skip_these_words:
        subs_list.append(sub.text)
        sub.click()
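
The filtering step above can be tried without a browser. The following is only a sketch: `collect_sub_topics` and `FakeElement` are hypothetical names, and `FakeElement` stands in for a Selenium element by exposing the same `.text` attribute and `.click()` method. It also skips empty text, which the original loop does not do.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (only .text and .click())."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


def collect_sub_topics(elements, main_topics, skip_words):
    """Return the text of each element that is a sub topic, clicking it as we go."""
    excluded = set(main_topics) | set(skip_words)
    names = []
    for element in elements:
        text = element.text  # read once; with Selenium each access queries the browser
        if text and text not in excluded:  # skipping empty text is an addition here
            names.append(text)
            element.click()
    return names


elements = [FakeElement("Arts & culture"), FakeElement("Animation"),
            FakeElement("Done"), FakeElement("")]
result = collect_sub_topics(elements, ["Arts & culture"], ["Done", "Follow"])
print(result)  # ['Animation']
```

With real Selenium elements, the same function should accept the sub_topics list returned by WebDriverWait, since it relies only on `.text` and `.click()`.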

There are also some hidden sub topics. See if you can click the rest of them. Then see if you can find the follow-button elements and click each one.
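
As for the XPath advice at the start of this answer, the idea of matching on an attribute instead of a full path can be tried offline. This sketch uses Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath (Selenium passes the expression to the browser's XPath engine), and the markup in `SAMPLE` is an invented example, not Twitter's real page structure.

```python
import xml.etree.ElementTree as ET

# Invented markup: two button-like containers and one unrelated block.
SAMPLE = """
<html><body>
  <div><div role="button"><div><span>Arts &amp; culture</span></div></div></div>
  <div><div role="button"><div><span>Gaming</span></div></div></div>
  <div role="note"><div><span>Done</span></div></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Match any div with role="button" at any depth, then its div/span child.
# ElementTree needs the leading "." where Selenium would accept "//div[...]".
spans = root.findall(".//div[@role='button']/div/span")
labels = [span.text for span in spans]
print(labels)  # ['Arts & culture', 'Gaming']
```

The span inside the role="note" block is skipped even though it has the same tag, which is the kind of selectivity that shared class names cannot give you.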
