Scraping Google Destinations

Published 2024-04-29 16:10:31


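I'm getting ready to travel around the world and would like to know what the most appealing sights are, so I'm trying to find the most popular destinations for a given place. I want to end up with the top cities of a country and their best attractions. Google Destinations recently added a nice feature for this.

For example, when you Google "Cuba Destinations", Google shows a card with the destinations Havana, Varadero, Trinidad and Santiago de Cuba.

Then, when you Google "Havana Cuba Destinations", it shows Old Havana, Malecon, Castillo de los Tres Reyes Magos del Morro and El Capitolio.

In the end, I want to turn this into a table that looks like: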

Cuba, Havana, Old Havana.
Cuba, Havana, Malecon.
Cuba, Havana, Castillo de los Tres Reyes Magos del Morro.
Cuba, Havana, El Capitolio.
Cuba, Varadero, Hicacos Peninsula.

And so on.
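
A minimal sketch of how such a table could be produced, assuming the scraped results are first collected into a nested dict (country -> city -> attractions). The `destinations` dict and the `to_rows` helper are hypothetical names used only for illustration; the sample values are the rows listed above.

```python
import csv
import io

# Hypothetical intermediate structure: country -> city -> list of attractions.
destinations = {
    "Cuba": {
        "Havana": [
            "Old Havana",
            "Malecon",
            "Castillo de los Tres Reyes Magos del Morro",
            "El Capitolio",
        ],
        "Varadero": ["Hicacos Peninsula"],
    }
}


def to_rows(nested):
    """Flatten the nested dict into (country, city, attraction) tuples."""
    return [
        (country, city, attraction)
        for country, cities in nested.items()
        for city, attractions in cities.items()
        for attraction in attractions
    ]


rows = to_rows(destinations)

# Write the rows as CSV text; replace io.StringIO() with
# open("destinations.csv", "w", newline="") to save them to a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Keeping the nested dict as the intermediate form means each scraping step (country page, then city page) only has to fill in its own level before the rows are flattened.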

I tried the API call shown in "Travel destinations API", but it does not return the right results and often fails with OVER_QUERY_LIMIT.

The code below returns an error:

^{pr2}$

Any tips?


2 Answers

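Try this Google Places API URL. It returns the tourist attractions / points of interest for (for example) New York City. You have to combine the city name with the keywords "point of interest".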

https://maps.googleapis.com/maps/api/place/textsearch/json?query=new+york+city+point+of+interest&language=en&key=API_KEY

These API results are the same as the results of the following Google search: https://www.google.com/search?sclient=psy-ab&site=&source=hp&btnG=Search&q=New+York+point+of+interest
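
As a rough sketch, the Text Search URL above can be built for any city with the standard library. The helper name `build_text_search_url` is an assumption made for this example, and `API_KEY` is a placeholder for your own key.

```python
from urllib.parse import urlencode

TEXT_SEARCH_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_text_search_url(city, api_key, language="en"):
    """Return a Text Search URL for '<city> point of interest'."""
    params = {
        "query": f"{city} point of interest",
        "language": language,
        "key": api_key,
    }
    # urlencode() encodes spaces as '+', matching the URL shown above.
    return f"{TEXT_SEARCH_URL}?{urlencode(params)}"


url = build_text_search_url("new york city", "API_KEY")
print(url)
```

The resulting URL can then be fetched with any HTTP client; the JSON response carries a `status` field and a `results` list, which is where the place names would be read from.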

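Two more tips:

  • You can use the Python client for Google Maps Services: https://github.com/googlemaps/google-maps-services-python
  • For the OVER_QUERY_LIMIT problem, make sure you have added a billing method to your Google Cloud project (your credit card or the free trial credit). Don't worry too much about cost, because Google gives you a few thousand free queries per month.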
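
If OVER_QUERY_LIMIT still shows up occasionally after billing is set up, one common approach is to retry with an increasing delay. The sketch below is only an illustration of that idea: `call_with_backoff` is a hypothetical helper, and `fetch` stands for any function that performs the request and returns the decoded JSON dict with its `status` field. It does not replace enabling billing.

```python
import time


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until its status is not OVER_QUERY_LIMIT or attempts run out."""
    response = {}
    for attempt in range(max_attempts):
        response = fetch()
        if response.get("status") != "OVER_QUERY_LIMIT":
            return response
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response


# Stand-in for a real API call: rate-limited twice, then successful.
_replies = iter([
    {"status": "OVER_QUERY_LIMIT"},
    {"status": "OVER_QUERY_LIMIT"},
    {"status": "OK", "results": []},
])
delays = []  # record the delays instead of actually sleeping
result = call_with_backoff(lambda: next(_replies), sleep=delays.append)
print(result["status"], delays)
```

Passing `sleep` as a parameter keeps the helper easy to try out without waiting; in real use the default `time.sleep` applies.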

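You will need something like Selenium for this, because the page issues multiple XHR requests, so you cannot get the rendered page with requests alone. First install Selenium.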

sudo pip3 install selenium

Then get a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your operating system, you may need to specify the location of the driver).

^{pr2}$

Output:

[('Havana', "Cuban capital known for Old Havana's colonial architecture, live salsa music & nearby beaches."), ('Varadero', 'Major Cuban resort town on Hicacos Peninsula, with a 20km beach, a golf course & several parks.'), ('Trinidad', 'Cuban town known for Plaza Mayor, colonial architecture & plantations of Valle de los Ingenios.'), ('Santiago de Cuba', 'Cuban city known for Afro-Cuban festivals & music, plus Spanish colonial & revolutionary history.'), ('Viñales', 'Cuban town known for Viñales Valley, Casa de Caridad Botanical Gardens & nearby tobacco farms.'), ('Cienfuegos', 'Cuban coastal city, known for Tomás Terry Theater, Arco de Triunfo & Playa Rancho Luna resorts.'), ('Santa Clara', 'Cuban city home to the Che Guevara Mausoleum, Parque Vidal & ornate Teatro La Caridad.'), ('Cayo Coco', 'Cuban island known for its white-sand beaches & resorts, plus reef snorkeling & flamingos.'), ('Cayo Santa María', 'Cuban island known for Gaviotas Beach, Cayo Santa María Wildlife Refuge & Pueblo La Estrella.'), ('Cayo Largo del Sur', 'Cuban island, known for beaches like Playa Blanca & Playa Sirena, plus a sea turtle center & diving.'), ('Plaza de la Revolución', 'Che Guevara and monuments'), ('Camagüey', 'Ballet, churches, history, and beaches'), ('Holguín', 'Cuban city known for Parque Calixto García, the Hacha de Holguín axe head & Guardalavaca beaches.'), ('Cayo Guillermo', 'Cuban island with beaches like Playa del Medio & Playa Pilar, plus vast expanses of coral reef.'), ('Matanzas', 'Caves, theater, beaches, history, and rivers'), ('Baracoa', 'Beaches, rivers, and nature'), ('Centro Habana', '\xa0'), ('Playa Girón', 'Beaches, snorkeling, and museums'), ('Topes de Collantes', 'Scenic nature reserve park for hiking'), ('Guardalavaca', 'Cuban resort known for Esmeralda Beach, the Cayo Naranjo Aquarium & the Chorro de Maíta Museum.'), ('Bay of Pigs', 'Snorkeling, scuba diving, and beaches'), ('Isla de la Juventud', 'Scuba diving and beaches'), ('Zapata Swamp', 'Parks, 
crocodiles, birdwatching, and swamps'), ('Pinar del Río', 'History'), ('Remedios', 'Churches, beaches, and museums'), ('Bayamo', 'Wax museums, monuments, history, and music'), ('Sierra Maestra', 'Peaks with a storied political history'), ('Las Terrazas', 'Zip-lining, nature reserves, and hiking'), ('Sancti Spíritus', 'History and museums'), ('Playa Ancon', 'Beaches, snorkeling, and scuba diving'), ('Jibacoa', 'Beaches, snorkeling, and jellyfish'), ('Jardines de la Reina', 'Scuba diving, fly-fishing, and gardens'), ('Cayo Jutías', 'Beach and snorkeling'), ('Guamá, Cuba', 'Crocodiles, beaches, snorkeling, and lakes'), ('Morón', 'Crocodiles, lagoons, and beaches'), ('Las Tunas', 'Beaches, nightlife, and history'), ('Soroa', 'Waterfalls, gardens, nature, and ecotourism'), ('Guanabo', 'Beach'), ('María la Gorda', 'Scuba diving, beaches, and snorkeling'), ('Alejandro de Humboldt National Park', 'Park, protected area, and hiking'), ('Ciego de Ávila', 'Zoos and beaches'), ('Bacunayagua', '\xa0'), ('Guantánamo', 'Beaches, history, and nature'), ('Cárdenas', 'Beaches, museums, monuments, and history'), ('Canarreos Archipelago', 'Sailing and coral reefs'), ('Caibarién', 'Beaches'), ('El Nicho', 'Waterfalls, parks, and nature'), ('San Luis Valley', 'Cranes, national wildlife refuge, and elk')]
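
Some descriptions in this output are just a non-breaking space (`'\xa0'`). A small sketch of one way to tidy the pairs before tabulating them follows; `clean_pairs` is a hypothetical helper, and the sample pairs are copied from the output above.

```python
# A few (name, description) pairs copied from the output above.
pairs = [
    ("Havana", "Cuban capital known for Old Havana's colonial architecture, "
               "live salsa music & nearby beaches."),
    ("Centro Habana", "\xa0"),
    ("Pinar del Río", "History"),
]


def clean_pairs(pairs, country):
    """Attach the country and replace blank descriptions with None."""
    rows = []
    for name, description in pairs:
        # str.strip() also removes the non-breaking space "\xa0".
        text = description.strip()
        rows.append((country, name, text or None))
    return rows


cleaned = clean_pairs(pairs, "Cuba")
for row in cleaned:
    print(row)
```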

Update based on comments:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Note: the find_element_by_* helpers were removed in Selenium 4.3,
# so the By-based locators are used here instead.
browser = webdriver.Chrome()
for place in ["Cuba", "Belgium", "France"]:
    url = "https://www.google.nl/destination/compare?site=destination&output=search"
    browser.get(url)  # you may not need to do this every time if you clear the search box
    time.sleep(2)
    element = browser.find_element(By.NAME, 'q')  # get the query box
    time.sleep(2)
    element.send_keys(place)  # populate the search box
    time.sleep(2)
    search_box = browser.find_element(By.CLASS_NAME, 'sbsb_c')  # get the first suggestion in the list
    search_box.click()  # click it
    time.sleep(2)
    destinations = browser.find_element(By.ID, 'DESTINATIONS')  # get the destinations link
    destinations.click()  # click it
    time.sleep(2)
    html_source = browser.page_source
    soup = BeautifulSoup(html_source, "lxml")
    # Get the headings
    hs = [tag.text for tag in soup.find_all('h2')]
    # Get the text-containing divs (those without a class attribute)
    divs = [tag.text for tag in soup.find_all('div', {'class': False})]
    # Delete surplus divs (these offsets depend on the page layout)
    del divs[:22]
    del divs[-1:]
    print(list(zip(hs, divs)))

browser.quit()
