从dataframe创建postgres表时删除主键约束

2024-06-16 11:03:40 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图从pandas dataframe在Postgres中创建几个表,但我一直遇到这个错误

 psycopg2.errors.InvalidForeignKey: there is no unique constraint matching given keys for referenced table "titles"

在研究这个问题几个小时后,我终于发现,当我从pandas dataframe将数据插入父表时,主键约束由于某些原因而被删除,因此我在尝试从另一个表引用它时会出现此错误

但是,当我使用pgAdmin4创建表并手动插入几行数据时,我没有遇到这个问题

enter image description hereenter image description here

您可以看到,当我使用pgAdmin创建表时,主键和外键按照预期创建,我对此没有问题

但是,当我尝试使用psycopg2库从pandas dataframe插入数据时,主键没有被创建

我不明白为什么会这样

enter image description here

我用于创建表的代码-

# function for faster data insertion
def psql_insert_copy(table, conn, keys, data_iter):
    """
    Execute SQL statement inserting data

    Parameters
    ----------
    table : pandas.io.sql.SQLTable
    conn : sqlalchemy.engine.Engine or sqlalchemy.engine.Connection
    keys : list of str
        Column names
    data_iter : Iterable that iterates the values to be inserted
    """
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ", ".join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = "{}.{}".format(table.schema, table.name)
        else:
            table_name = table.name

        sql = "COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)


def create_titles_table():
    # connect to the database
    conn = psycopg2.connect(
        dbname="imdb",
        user="postgres",
        password=os.environ.get("DB_PASSWORD"),
        host="localhost",
    )
    # create a cursor
    c = conn.cursor()

    print()
    print("Creating titles table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS titles(
                title_id  TEXT PRIMARY KEY,
                title_type  TEXT, 
                primary_title  TEXT, 
                original_title  TEXT,
                is_adult  INT, 
                start_year  REAL, 
                end_year  REAL, 
                runtime_minutes REAL
                )
    """
    )
    # commit changes
    conn.commit()

    # read the title data
    df = load_data("title.basics.tsv")
    # replace \N with nan
    df.replace("\\N", np.nan, inplace=True)
    # rename columns
    df.rename(
        columns={
            "tconst": "title_id",
            "titleType": "title_type",
            "primaryTitle": "primary_title",
            "originalTitle": "original_title",
            "isAdult": "is_adult",
            "startYear": "start_year",
            "endYear": "end_year",
            "runtimeMinutes": "runtime_minutes",
        },
        inplace=True,
    )
    # drop the genres column
    title_df = df.drop("genres", axis=1)
    # convert the data types from str to numeric
    title_df["start_year"] = pd.to_numeric(title_df["start_year"], errors="coerce")
    title_df["end_year"] = pd.to_numeric(title_df["end_year"], errors="coerce")
    title_df["runtime_minutes"] = pd.to_numeric(
        title_df["runtime_minutes"], errors="coerce"
    )

    # create SQLAlchemy engine
    engine = create_engine(
        "postgresql://postgres:" + os.environ["DB_PASSWORD"] + "@localhost:5432/imdb"
    )
    # insert the data into titles table
    title_df.to_sql(
        "titles", engine, if_exists="replace", index=False, method=psql_insert_copy
    )

    # commit changes
    conn.commit()
    # close cursor
    c.close()
    # close the connection
    conn.close()
    print("Completed!")
    print()


def create_genres_table():

    # connect to the database
    conn = psycopg2.connect(
        dbname="imdb",
        user="postgres",
        password=os.environ.get("DB_PASSWORD"),
        host="localhost",
    )

    # create a cursor
    c = conn.cursor()

    print()
    print("Creating genres table...")
    c.execute(
        """CREATE TABLE IF NOT EXISTS genres(
            title_id  TEXT NOT NULL, 
            genre  TEXT,
            FOREIGN KEY (title_id) REFERENCES titles(title_id)
            )
    """
    )
    # commit changes
    conn.commit()

    # read the data
    df = load_data("title.basics.tsv")
    # replace \N with nan
    df.replace("\\N", np.nan, inplace=True)
    # rename columns
    df.rename(columns={"tconst": "title_id", "genres": "genre"}, inplace=True)
    # select only relevant columns
    genres_df = df[["title_id", "genre"]].copy()
    genres_df = genres_df.assign(genre=genres_df["genre"].str.split(",")).explode(
        "genre"
    )

    # create engine
    engine = create_engine(
        "postgresql://postgres:" + os.environ["DB_PASSWORD"] + "@localhost:5432/imdb"
    )

    # insert the data into genres table
    genres_df.to_sql(
        "genres", engine, if_exists="replace", index=False, method=psql_insert_copy
    )

    # commit changes
    conn.commit()
    # close cursor
    c.close()
    # close the connection
    conn.close()
    print("Completed!")
    print()

if __name__ == "__main__":

    print()
    print("Creating IMDB Database...")
    # connect to the database
    conn = psycopg2.connect(
        dbname="imdb",
        user="postgres",
        password=os.environ.get("DB_PASSWORD"),
        host="localhost",
    )

    # create the titles table
    create_titles_table()

    # create genres table
    create_genres_table()
    # close the connection
    conn.close()
    print("Done with Everything!")
    print()

Tags: thetodfclosedatatitlecreatetable
1条回答
网友
1楼 · 发布于 2024-06-16 11:03:40

我认为问题是to_sql(if_exists="replace")。尝试使用to_sql(if_exists="append")-我的理解是“replace”会删除整个表,并创建一个没有约束的新表

相关问题 更多 >