google-datacatalog-sqlserver-connector

Library for ingesting SQL Server metadata into Google Cloud Data Catalog. Currently supports SQL Server 2017 Standard.

Disclaimer: This is not an officially supported Google product.
Table of Contents

- 1. Installation
- 2. Environment setup
- 3. Adapt user configurations
- 4. Run entry point
- 5. Scripts inside tools
- 6. Developer environment
- 7. Metrics
- 8. Troubleshooting
1. Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies. Make sure you use Python 3.6+.
1.1. Mac/Linux
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-datacatalog-sqlserver-connector
1.2. Windows

pip3 install virtualenv
virtualenv --python python3.6 <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-datacatalog-sqlserver-connector

1.3. Install from source
1.3.1. Get the code
git clone https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/
cd datacatalog-connectors-rdbms/google-datacatalog-sqlserver-connector
1.3.2. Create and activate a virtualenv
pip3 install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
1.3.3. Install the library
pip install .
2. Environment setup

2.1. Auth credentials

2.1.1. Create a service account and grant it the following role

- Data Catalog Admin

2.1.2. Download a JSON key and save it as

<YOUR-CREDENTIALS_FILES_FOLDER>/sqlserver2dc-credentials.json

Please note this folder and file will be required in the next steps.
2.2. Set up the SQL Server driver (optional)

This step is needed when running the connector on a machine that does not have a SQL Server installation.
2.3. Set environment variables

Replace the values below according to your environment:
export GOOGLE_APPLICATION_CREDENTIALS=data_catalog_credentials_file
export SQLSERVER2DC_DATACATALOG_PROJECT_ID=google_cloud_project_id
export SQLSERVER2DC_DATACATALOG_LOCATION_ID=google_cloud_location_id
export SQLSERVER2DC_SQLSERVER_SERVER=sqlserver_server
export SQLSERVER2DC_SQLSERVER_USERNAME=sqlserver_username
export SQLSERVER2DC_SQLSERVER_PASSWORD=sqlserver_password
export SQLSERVER2DC_SQLSERVER_DATABASE=sqlserver_database
export SQLSERVER2DC_RAW_METADATA_CSV=sqlserver_raw_csv  # If supplied, the SQL Server credentials are ignored
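As a convenience before a run, a small bash snippet can confirm the variables above are actually set. This is a sketch, not part of the connector; the function name `check_connector_env` is invented for this example, and it assumes a bash shell (it uses `${!var}` indirect expansion):

```shell
# Fail with a message for each connector environment variable that is unset
# or empty. Returns non-zero if anything is missing.
check_connector_env() {
  local missing=0 var
  for var in GOOGLE_APPLICATION_CREDENTIALS \
             SQLSERVER2DC_DATACATALOG_PROJECT_ID \
             SQLSERVER2DC_DATACATALOG_LOCATION_ID \
             SQLSERVER2DC_SQLSERVER_SERVER \
             SQLSERVER2DC_SQLSERVER_USERNAME \
             SQLSERVER2DC_SQLSERVER_PASSWORD \
             SQLSERVER2DC_SQLSERVER_DATABASE; do
    if [ -z "${!var}" ]; then
      echo "Missing required environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_connector_env && google-datacatalog-sqlserver-connector ...
```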
3. Adapt user configurations

Besides the default metadata, the connector can enrich metadata with user-provided values, such as adding a prefix to each schema and table name.

The table below shows the metadata scraped by default, and the metadata that is configurable.
Metadata | Description | Scraped by default | Config option |
---|---|---|---|
schema_name | Name of the Schema | Y | --- |
table_name | Name of a table | Y | --- |
table_type | Type of a table (BASE, VIEW, etc) | Y | --- |
column_name | Name of a column | Y | --- |
column_type | Column data type | Y | --- |
column_default_value | Default value of a column | Y | --- |
column_nullable | Whether a column is nullable | Y | --- |
column_char_length | Char length of values in a column | Y | --- |
column_numeric_precision | Numeric precision of values in a column | Y | --- |
prefix | Prefix to be added to schema and tables name | N/A | enrich_metadata.entry_prefix |
entry_id_pattern_for_prefix | Entry ID pattern which the prefix will be applied | N/A | enrich_metadata.entry_id_pattern_for_prefix |
The prefix should comply with the Data Catalog entryId format:

The ID must begin with a letter or underscore, contain only English letters, numbers and underscores, and have at most 64 characters (the prefix + the entryId combined).

If entry_id_pattern_for_prefix is supplied, the prefix will be applied only to entry IDs matching that pattern.
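To illustrate the entryId rule, a small helper can pre-check a combined prefix + entry ID before configuring it. This is a hypothetical helper written for this README, not part of the connector:

```shell
# Returns 0 when the given string satisfies the Data Catalog entryId rules
# quoted above: begins with a letter or underscore, contains only English
# letters, digits, and underscores, and is at most 64 characters long.
valid_entry_id() {
  local id="$1"
  [ "${#id}" -le 64 ] && [[ "$id" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

valid_entry_id "my_prefix_sales_table" && echo valid   # prints: valid
valid_entry_id "1_starts_with_digit" || echo invalid   # prints: invalid (leading digit)
```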
The sample configuration file ingest_cfg.yaml in the repository root shows the expected configuration syntax.

To enable user-defined configurations, add an ingest_cfg.yaml file to the directory the connector is executed from and adapt it to your needs.
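As an illustrative sketch of such a file, the snippet below writes a minimal ingest_cfg.yaml. The key names follow the "Config option" column in the table above; the nesting and the example values are assumptions for this README, so treat the sample file in the repository root as the authoritative schema:

```shell
# Illustrative only: a minimal ingest_cfg.yaml using the enrich_metadata
# options from the configuration table. Values are made-up examples.
cat > ingest_cfg.yaml <<'EOF'
enrich_metadata:
  entry_prefix: mycompany_
  entry_id_pattern_for_prefix: '^sqlserver_.*$'
EOF
```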
4. Run entry point

4.1. Run the Python entry point

- Virtualenv
google-datacatalog-sqlserver-connector \
  --datacatalog-project-id=$SQLSERVER2DC_DATACATALOG_PROJECT_ID \
  --datacatalog-location-id=$SQLSERVER2DC_DATACATALOG_LOCATION_ID \
  --sqlserver-host=$SQLSERVER2DC_SQLSERVER_SERVER \
  --sqlserver-user=$SQLSERVER2DC_SQLSERVER_USERNAME \
  --sqlserver-pass=$SQLSERVER2DC_SQLSERVER_PASSWORD \
  --sqlserver-database=$SQLSERVER2DC_SQLSERVER_DATABASE \
  --raw-metadata-csv=$SQLSERVER2DC_RAW_METADATA_CSV
4.2. Run the Python entry point with a user-defined entry resource URL prefix

This option is useful when the connector cannot accurately determine the database hostname. For example, when running behind proxies, load balancers, or database read replicas, you can specify a prefix for the master instance so that the resource URL points to the exact database where the data is stored.

- Virtualenv
google-datacatalog-sqlserver-connector \
  --datacatalog-project-id=$SQLSERVER2DC_DATACATALOG_PROJECT_ID \
  --datacatalog-location-id=$SQLSERVER2DC_DATACATALOG_LOCATION_ID \
  --datacatalog-entry-resource-url-prefix project/database-instance \
  --sqlserver-host=$SQLSERVER2DC_SQLSERVER_SERVER \
  --sqlserver-user=$SQLSERVER2DC_SQLSERVER_USERNAME \
  --sqlserver-pass=$SQLSERVER2DC_SQLSERVER_PASSWORD \
  --sqlserver-database=$SQLSERVER2DC_SQLSERVER_DATABASE \
  --raw-metadata-csv=$SQLSERVER2DC_RAW_METADATA_CSV
4.3. Run the Docker entry point
docker build -t sqlserver2datacatalog .
docker run --rm --tty -v YOUR-CREDENTIALS_FILES_FOLDER:/data sqlserver2datacatalog \
  --datacatalog-project-id=$SQLSERVER2DC_DATACATALOG_PROJECT_ID \
  --datacatalog-location-id=$SQLSERVER2DC_DATACATALOG_LOCATION_ID \
  --sqlserver-host=$SQLSERVER2DC_SQLSERVER_SERVER \
  --sqlserver-user=$SQLSERVER2DC_SQLSERVER_USERNAME \
  --sqlserver-pass=$SQLSERVER2DC_SQLSERVER_PASSWORD \
  --sqlserver-database=$SQLSERVER2DC_SQLSERVER_DATABASE \
  --raw-metadata-csv=$SQLSERVER2DC_RAW_METADATA_CSV
5. Scripts inside tools

5.1. Run the clean up
# List of projects split by comma. Can be a single value without comma
export SQLSERVER2DC_DATACATALOG_PROJECT_IDS=my-project-1,my-project-2

# Run the clean up
python tools/cleanup_datacatalog.py --datacatalog-project-ids=$SQLSERVER2DC_DATACATALOG_PROJECT_IDS
6. Developer environment

6.1. Install and run the YAPF formatter
pip install --upgrade yapf

# Auto update files
yapf --in-place --recursive src tests

# Show diff
yapf --diff --recursive src tests

# Set up pre-commit hook
# From the root of your git project.
curl -o pre-commit.sh https://raw.githubusercontent.com/google/yapf/master/plugins/pre-commit.sh
chmod a+x pre-commit.sh
mv pre-commit.sh .git/hooks/pre-commit
6.2. Install and run the Flake8 linter
pip install --upgrade flake8
flake8 src tests
6.3. Run tests
python setup.py test
7. Metrics
8. Troubleshooting

In case a connector execution hits Data Catalog quota limits, an error will be raised and logged with the following details, depending on the performed operation (read/write/search):
status = StatusCode.RESOURCE_EXHAUSTED
details = "Quota exceeded for quota metric 'Read requests' and limit 'Read requests per minute' of service 'datacatalog.googleapis.com' for consumer 'project_number:1111111111111'."
debug_error_string =
"{"created":"@1587396969.506556000", "description":"Error received from peer ipv4:172.217.29.42:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Quota exceeded for quota metric 'Read requests' and limit 'Read requests per minute' of service 'datacatalog.googleapis.com' for consumer 'project_number:1111111111111'.","grpc_status":8}"
For more information on Data Catalog quotas, go to: Data Catalog quota docs.
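When per-minute quota errors like the RESOURCE_EXHAUSTED example above are transient, one generic mitigation (a sketch for this README, not a feature of the connector) is to re-run the command with exponential backoff:

```shell
# Retry a command with exponential backoff; useful when a run aborts on a
# per-minute quota and the quota resets shortly afterwards.
# Usage: run_with_backoff <max_attempts> <initial_delay_seconds> <command...>
run_with_backoff() {
  local max_attempts="$1" delay="$2" attempt
  shift 2
  for attempt in $(seq 1 "$max_attempts"); do
    "$@" && return 0
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

# Example: retry the connector up to 5 times, starting with a 60s pause.
# run_with_backoff 5 60 google-datacatalog-sqlserver-connector \
#   --datacatalog-project-id=$SQLSERVER2DC_DATACATALOG_PROJECT_ID ...
```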