How do I refactor 159 lines of Python code to fix the deprecated list '.append'?
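We are preparing a quote for a client who wants to migrate their data to HubSpot, and we are currently working through the data modeling and database questions so that we can plan properly.
While planning the migration we looked at HubSpot's data, and a team member found a piece of code that is helpful in some respects. However, the code uses the (EDIT: list) .append method, which means we need to change how it is written depending on the structure of the data.
I originally assumed the "'DataFrame' object has no attribute 'append'" error appeared because the object was a pandas one. Sorry for the confusion; I thought all data frames were pandas data frames.
The code is fairly long and I have a few questions. The full code can be found here: Hubspot Community Data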
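- All questions are welcome. I hope this isn't a silly question; I'm confused about how to solve this.
- How would you break these 152 lines of code into smaller pieces? Ideally each function would do only one or two things and no more, which is what I'm aiming for.
- How can I refactor or adjust the code below so that the data dictionary works, given that .append is no longer available? Since _append may not be the most efficient choice, I'm not sure where to start.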
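Edit 1: OK, I'm updating the shared code. The error message "return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?" comes from line 90, which is in the first for loop of the generate_data function, right after the data dictionary.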
company_industry = faker.random_element(
    ["Technology", "Healthcare", "Finance", "Real Estate"]
)
# This code has been created by Michael Kupermann (michael.kupermann@codersunlimited.com or michael@kupermann.com)
# The purpose of this code is to generate dummy data that simulates a realistic dataset for HubSpot CRM.
# This data can then be used for demonstrations, testing, or other purposes that require a representative dataset.
# You need to amend the HubSpot Sales and Service Pipelines before you import the data.
#
# Required Packages:
# 1. Faker: This package is used to generate the fake data for our dataset.
# 2. Pandas: This package is used to handle the data in a tabular format and to write the data to an Excel file.
# 3. DateTime: This package is used to generate realistic date data for the 'close_date' field.
#
# To install the necessary packages, you can use pip, the Python package installer.
# Open your terminal (or command prompt on Windows), and enter the following commands:
# pip install faker
# pip install pandas
# pip install datetime
#
# If you're using a Jupyter notebook, you can prefix these commands with an exclamation mark:
# !pip install faker
# !pip install pandas
# !pip install datetime
from faker import Faker
import pandas as pd
from datetime import datetime, timedelta
# Function to generate data for a given country. Here 100 companies with 10 contacts, deals, tickets for each company
def generate_data(
    country,
    company_rows=100,
    contacts_per_company=10,
    deals_per_company=10,
    products_per_deal=10,
):
    # Set the locale for Faker based on the country
    if country == "Germany":
        faker = Faker("de_DE")
    elif country == "United States":
        faker = Faker("en_US")
    elif country == "France":
        faker = Faker("fr_FR")
    elif country == "Italy":
        faker = Faker("it_IT")
    elif country == "Japan":
        faker = Faker("ja_JP")
    elif country == "United Kingdom":
        faker = Faker("en_GB")
    elif country == "Canada":
        faker = Faker("en_CA")
    elif country == "Austria":
        faker = Faker("de_AT")
    elif country == "Switzerland":
        faker = Faker("de_CH")
    # Create a dictionary to hold the data
    data = {
        "company_name": [],
        "company_domain": [],
        "company_industry": [],
        "company_address": [],
        "company_country": [],
        "contact_firstname": [],
        "contact_lastname": [],
        "contact_email": [],
        "contact_phone": [],
        "contact_address": [],
        "contact_country": [],
        "contact_function": [],
        "contact_department": [],
        "deal_name": [],
        "deal_stage": [],
        "deal_amount": [],
        "deal_type": [],
        "deal_source": [],
        "close_date": [],
        "ticket_title": [],
        "ticket_status": [],
        "ticket_priority": [],
        "product_name": [],
        "product_price": [],
        "product_description": [],
        "product_sku": [],
        "product_quantity": [],
    }
    # Loop to generate data for each company
    for _ in range(company_rows):
        company_name = faker.company()
        company_domain = faker.domain_name()
        company_industry = faker.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        )
        company_address = faker.address().replace("\n", ", ")
        company_country = country
        # Loop to generate data for each contact
        for _ in range(contacts_per_company):
            contact_firstname = faker.first_name()
            contact_lastname = faker.last_name()
            contact_email = faker.email()
            contact_phone = faker.phone_number()
            contact_address = faker.address().replace("\n", ", ")
            contact_country = country
            contact_function = faker.job()
            contact_department = faker.random_element(
                ["Sales", "Marketing", "Human Resources", "Engineering"]
            )
            # Append generated company and contact data to the lists in the dictionary
            data["company_name"].append(company_name)
            data["company_domain"].append(company_domain)
            data["company_industry"].append(company_industry)
            data["company_address"].append(company_address)
            data["company_country"].append(company_country)
            data["contact_firstname"].append(contact_firstname)
            data["contact_lastname"].append(contact_lastname)
            data["contact_email"].append(contact_email)
            data["contact_phone"].append(contact_phone)
            data["contact_address"].append(contact_address)
            data["contact_country"].append(contact_country)
            data["contact_function"].append(contact_function)
            data["contact_department"].append(contact_department)
            # Generate deal and product data
            data["deal_name"].append(f"Deal-{faker.uuid4()}")
            data["deal_stage"].append(
                faker.random_element(
                    [
                        "Appointment Scheduled",
                        "Qualified To Buy",
                        "Presentation Scheduled",
                        "Decision Maker Brought-In",
                    ]
                )
            )
            data["deal_amount"].append(faker.random_int(min=1000, max=50000))
            data["deal_type"].append(
                faker.random_element(["New Business", "Existing Business"])
            )
            data["deal_source"].append(
                faker.random_element(
                    ["Direct Traffic", "Organic Search", "Paid Search", "Social Media"]
                )
            )
            data["close_date"].append(
                (
                    datetime.today() + timedelta(days=faker.random_int(min=1, max=90))
                ).date()
            )
            # Generate product data
            data["product_name"].append(f"Product-{faker.uuid4()}")
            data["product_price"].append(faker.random_int(min=10, max=1000))
            data["product_description"].append(faker.catch_phrase())
            data["product_sku"].append(faker.random_int(min=10000, max=99999))
            data["product_quantity"].append(faker.random_int(min=1, max=100))
            # Generate ticket data
            data["ticket_title"].append(f"Ticket-{faker.uuid4()}")
            data["ticket_status"].append(
                faker.random_element(
                    ["New", "Waiting on contact", "Waiting on us", "Closed"]
                )
            )
            data["ticket_priority"].append(
                faker.random_element(["Low", "Medium", "High"])
            )
    # Convert the data dictionary to a pandas DataFrame
    df = pd.DataFrame(data)
    return df
# Define the list of countries for which we want to generate data
g7_countries = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
    "Austria",
    "Switzerland",
]
# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)
# Write the generated data to an Excel file
result.to_excel(r"C:\~\~\~\hubspot_dummy_data.xlsx", index=False)
1 Answer
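Regarding the question in your title: the deprecated method it refers to is pandas.DataFrame.append (deprecated in pandas 1.4 and removed in pandas 2.0).
As mentioned in the comments, the append used in the first part of the code you shared is list.append, which works fine, is fast, and is not deprecated.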
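To make the distinction concrete, here is a minimal sketch that puts the two methods side by side. The company name is made up for illustration, and the result of the hasattr check depends on which pandas version is installed.

```python
import pandas as pd

# list.append is a built-in list method; it is unaffected by pandas changes.
data = {"company_name": []}
data["company_name"].append("Acme GmbH")  # illustrative value, not from the question

df = pd.DataFrame(data)

# DataFrame.append is a different, pandas-specific method. On pandas 2.0 and
# later it no longer exists, so this evaluates to False there.
has_dataframe_append = hasattr(df, "append")
print(pd.__version__, has_dataframe_append)
```

So the lists inside the data dictionary can stay exactly as they are; only the call on the DataFrame itself needs to change.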
The problem is in the later part, which uses pandas.DataFrame.append:
# Create an empty DataFrame to hold the generated data
result = pd.DataFrame()
for country in g7_countries:
    df = generate_data(country)
    # Append the data for each country to the result DataFrame
    result = result.append(df)
To avoid the AttributeError, you can use concat instead:
result_list = []
for country in g7_countries:
    df = generate_data(country)
    result_list.append(df)
result = pd.concat(result_list)
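One optional detail: pd.concat keeps each frame's own index by default, so the combined index restarts at 0 for every country. That makes no difference to the Excel file, since it is written with index=False, but if you want a clean running index you can pass ignore_index=True. The sketch below uses a small hypothetical stand-in for generate_data so that it runs without Faker.

```python
import pandas as pd


def generate_data(country, rows=3):
    # Hypothetical stand-in for the question's generate_data; it only
    # returns a tiny frame so the concat behaviour is easy to see.
    return pd.DataFrame(
        {"company_country": [country] * rows, "deal_amount": list(range(rows))}
    )


countries = ["Canada", "France", "Germany"]

# Collect the per-country frames in a plain list, then concatenate once.
frames = [generate_data(country) for country in countries]
result = pd.concat(frames, ignore_index=True)  # index runs 0..8 instead of repeating 0..2

print(result)
```

Concatenating once at the end also avoids copying the growing result on every loop iteration.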
This code generates fake data that you presumably only want for testing. I don't think there is any need to optimize it.
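If you do want to split it up anyway, as the question asks, here is one possible sketch rather than a definitive refactor. The names LOCALES, locale_for, make_company, make_contact and build_rows are hypothetical and not part of the original script. The idea is to replace the if/elif chain with a lookup table and to build one dict per row instead of 27 parallel lists. Only the company and contact fields are shown; the deal, product and ticket fields would follow the same pattern.

```python
# Country -> Faker locale, replacing the if/elif chain in generate_data.
LOCALES = {
    "Germany": "de_DE",
    "United States": "en_US",
    "France": "fr_FR",
    "Italy": "it_IT",
    "Japan": "ja_JP",
    "United Kingdom": "en_GB",
    "Canada": "en_CA",
    "Austria": "de_AT",
    "Switzerland": "de_CH",
}


def locale_for(country):
    """Return the Faker locale for a country, failing loudly if it is unknown."""
    try:
        return LOCALES[country]
    except KeyError:
        raise ValueError(f"No locale configured for {country!r}") from None


def make_company(fake, country):
    """Build the company_* fields; called once per company."""
    return {
        "company_name": fake.company(),
        "company_domain": fake.domain_name(),
        "company_industry": fake.random_element(
            ["Technology", "Healthcare", "Finance", "Real Estate"]
        ),
        "company_address": fake.address().replace("\n", ", "),
        "company_country": country,
    }


def make_contact(fake, country):
    """Build the contact_* fields; called once per contact."""
    return {
        "contact_firstname": fake.first_name(),
        "contact_lastname": fake.last_name(),
        "contact_email": fake.email(),
        "contact_phone": fake.phone_number(),
        "contact_address": fake.address().replace("\n", ", "),
        "contact_country": country,
        "contact_function": fake.job(),
        "contact_department": fake.random_element(
            ["Sales", "Marketing", "Human Resources", "Engineering"]
        ),
    }


def build_rows(fake, country, company_rows=100, contacts_per_company=10):
    """Return one dict per contact, each carrying its company's fields."""
    rows = []
    for _ in range(company_rows):
        company = make_company(fake, country)
        for _ in range(contacts_per_company):
            rows.append({**company, **make_contact(fake, country)})
    return rows
```

With Faker and pandas installed, pd.DataFrame(build_rows(Faker(locale_for("Germany")), "Germany")) would give one country's frame. Unlike the original chain, an unsupported country raises a ValueError up front instead of failing later because faker was never assigned.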