|
2 | 2 | <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="50%" /> |
3 | 3 | </p> |
4 | 4 |
|
5 | | -# **data-diff** |
| 5 | +<h1 align="center"> |
| 6 | +data-diff |
| 7 | +</h1> |
| 8 | + |
| 9 | +<h2 align="center"> |
| 10 | +Develop dbt models faster by testing as you code. |
| 11 | +</h2> |
| 12 | +<h4 align="center"> |
| 13 | +See how every change to dbt code affects the data produced in the modified model and downstream. |
| 14 | +</h4> |
| 15 | +<br> |
6 | 16 |
|
7 | 17 | ## What is `data-diff`? |
8 | | -data-diff is a **free, open-source tool** that enables data professionals to detect differences in values between any two tables. |
9 | 18 |
|
10 | | -## Documentation |
| 19 | +data-diff is an open source package that you can use to see the impact of your dbt code changes on your dbt models as you code. |
11 | 20 |
|
12 | | -[**🗎 Documentation**](https://docs.datafold.com/guides/os_data_diff) - our detailed documentation has everything you need to start diffing. |
| 21 | +<div align="center"> |
13 | 22 |
|
14 | | -### Databases we support |
| 23 | + |
15 | 24 |
|
16 | | -- PostgreSQL >=10 |
17 | | -- MySQL |
18 | | -- Snowflake |
19 | | -- BigQuery |
20 | | -- Redshift |
21 | | -- Oracle |
22 | | -- Presto |
23 | | -- Databricks |
24 | | -- Trino |
25 | | -- Clickhouse |
26 | | -- Vertica |
27 | | -- DuckDB >=0.6 |
28 | | -- SQLite (coming soon) |
| 25 | +</div> |
29 | 26 |
|
30 | | -For their corresponding connection strings, check out our [detailed table](https://github.com/datafold/data-diff/blob/master/docs/supported-databases.md). |
| 27 | +<br> |
31 | 28 |
|
32 | | -#### Looking for a database not on the list? |
33 | | -If a database is not on the list, we'd still love to support it. [Please open an issue](https://github.com/datafold/data-diff/issues) to discuss it, or vote on existing requests to push them up our todo list. |
| 29 | +:eyes: **Watch 4-min demo video [here](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)** |
34 | 30 |
|
35 | | -## Get started |
| 31 | +## Getting Started |
36 | 32 |
|
37 | | -### Installation |
| 33 | +**Install `data-diff`** |
38 | 34 |
|
39 | | -#### First, install `data-diff` using `pip`. |
| 35 | +Install `data-diff` with the command that is specific to the database you use with dbt. |
40 | 36 |
|
| 37 | +### Snowflake |
41 | 38 | ``` |
42 | | -pip install data-diff |
| 39 | +pip install data-diff 'data-diff[snowflake,dbt]' -U |
43 | 40 | ``` |
44 | 41 |
|
45 | | -#### Then, install one or more driver(s) specific to the database(s) you want to connect to. |
46 | | - |
47 | | -- `pip install 'data-diff[mysql]'` |
48 | | - |
49 | | -- `pip install 'data-diff[postgresql]'` |
50 | | - |
51 | | -- `pip install 'data-diff[snowflake]'` |
52 | | - |
53 | | -- `pip install 'data-diff[presto]'` |
54 | | - |
55 | | -- `pip install 'data-diff[oracle]'` |
56 | | - |
57 | | -- `pip install 'data-diff[trino]'` |
58 | | - |
59 | | -- `pip install 'data-diff[clickhouse]'` |
60 | | - |
61 | | -- `pip install 'data-diff[vertica]'` |
62 | | - |
63 | | -- For BigQuery, see: https://pypi.org/project/google-cloud-bigquery/ |
64 | | - |
65 | | -_Some drivers have dependencies that cannot be installed using `pip` and still need to be installed manually._ |
66 | | - |
67 | | -### Run your first diff |
| 42 | +### BigQuery |
| 43 | +``` |
| 44 | +pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U |
| 45 | +``` |
68 | 46 |
|
69 | | -Once you've installed `data-diff`, you can run it from the command line. |
| 47 | +### Redshift |
| 48 | +``` |
| 49 | +pip install data-diff 'data-diff[redshift,dbt]' -U |
| 50 | +``` |
70 | 51 |
|
| 52 | +### Postgres |
71 | 53 | ``` |
72 | | -data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] |
| 54 | +pip install data-diff 'data-diff[postgres,dbt]' -U |
73 | 55 | ``` |
74 | 56 |
|
75 | | -Be sure to read [the docs](https://docs.datafold.com/reference/open_source/cli) for detailed instructions how to build one of these commands depending on your database setup. |
| 57 | +### Databricks |
| 58 | +``` |
| 59 | +pip install data-diff 'data-diff[databricks,dbt]' -U |
| 60 | +``` |
76 | 61 |
|
77 | | -#### Code Example: Diff Tables Between Databases |
78 | | -Here's an example command for your copy/pasting, taken from the screenshot above when we diffed data between Snowflake and Postgres. |
| 62 | +### DuckDB |
| 63 | +``` |
| 64 | +pip install data-diff 'data-diff[duckdb,dbt]' -U |
| 65 | +``` |
79 | 66 |
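
The per-database install commands added in this hunk all follow one pattern: `data-diff`, a database-specific extra combined with `dbt`, and `-U`. BigQuery is the exception, pairing the `dbt` extra with the separate `google-cloud-bigquery` package. A minimal sketch of that mapping follows. The `INSTALL_ARGS` table and the `install_command` helper are hypothetical names used for illustration only and are not part of data-diff.

```python
import shlex

# Hypothetical lookup table mirroring the install commands in this diff.
# It is an illustration only and is not shipped with data-diff.
INSTALL_ARGS = {
    "snowflake": ["data-diff[snowflake,dbt]"],
    "bigquery": ["data-diff[dbt]", "google-cloud-bigquery"],
    "redshift": ["data-diff[redshift,dbt]"],
    "postgres": ["data-diff[postgres,dbt]"],
    "databricks": ["data-diff[databricks,dbt]"],
    "duckdb": ["data-diff[duckdb,dbt]"],
}


def install_command(adapter: str) -> list[str]:
    """Return the pip argument list for one of the databases listed above."""
    key = adapter.lower()
    if key not in INSTALL_ARGS:
        raise ValueError(f"no install command listed for {adapter!r}")
    return ["pip", "install", "data-diff", *INSTALL_ARGS[key], "-U"]


if __name__ == "__main__":
    # shlex.join adds shell quoting around the bracketed extras.
    for name in INSTALL_ARGS:
        print(shlex.join(install_command(name)))
```

Returning an argument list rather than a string keeps the brackets in the extras from being interpreted by the shell if the command is later passed to `subprocess.run`.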
|
| 67 | +**Update a few lines in your `dbt_project.yml`**. |
80 | 68 | ``` |
81 | | -data-diff \ |
82 | | - postgresql://<username>:'<password>'@localhost:5432/<database> \ |
83 | | - <table> \ |
84 | | - "snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \ |
85 | | - <TABLE> \ |
86 | | - -k activity_id \ |
87 | | - -c activity \ |
88 | | - -w "event_timestamp < '2022-10-10'" |
| 69 | +#dbt_project.yml |
| 70 | +vars: |
| 71 | + data_diff: |
| 72 | + prod_database: my_database |
| 73 | + prod_schema: my_default_schema |
89 | 74 | ``` |
90 | 75 |
|
91 | | -#### Code Example: Diff Tables Within a Database |
| 76 | +**Run your first data diff!** |
92 | 77 |
|
93 | 78 | ``` |
94 | | -data-diff \ |
95 | | - "snowflake://<username>:<password>@<password>/<DATABASE>/<SCHEMA_1>?warehouse=<WAREHOUSE>&role=<ROLE>" <TABLE_1> \ |
96 | | - <SCHEMA_2>.<TABLE_2> \ |
97 | | - -k org_id \ |
98 | | - -c created_at -c is_internal \ |
99 | | - -w "org_id != 1 and org_id < 2000" \ |
100 | | - -m test_results_%t \ |
101 | | - --materialize-all-rows \ |
102 | | - --table-write-limit 10000 |
| 79 | +dbt run && data-diff --dbt |
103 | 80 | ``` |
104 | 81 |
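
The `&&` in `dbt run && data-diff --dbt` means the diff only runs when `dbt run` exits successfully. A minimal sketch of the same short-circuit from Python, assuming both executables are on `PATH`. The `run_then_diff` function and its `run` parameter are hypothetical and exist only so the sequencing can be exercised without invoking either tool.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_then_diff(run: Callable[[Sequence[str]], int] = _run) -> int:
    """Mimic `dbt run && data-diff --dbt`: skip the diff if dbt fails."""
    code = run(["dbt", "run"])
    if code != 0:
        # Same behavior as `&&`: stop and propagate the failing exit code.
        return code
    return run(["data-diff", "--dbt"])


if __name__ == "__main__":
    raise SystemExit(run_then_diff())
```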
|
105 | | -In both code examples, I've used `<>` carrots to represent values that **should be replaced with your values** in the database connection strings. For the flags (`-k`, `-c`, etc.), I opted for "real" values (`org_id`, `is_internal`) to give you a more realistic view of what your command will look like. |
| 82 | +We recommend you get started by walking through [our simple setup instructions](https://docs.datafold.com/development_testing/open_source) which contain examples and details. |
| 83 | + |
| 84 | +Please reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) if you have any trouble whatsoever getting started! |
106 | 85 |
|
107 | | -### We're here to help! |
| 86 | +<br><br> |
108 | 87 |
|
109 | | -We're here to help! Please post any questions in [GitHub Discussions](https://github.com/datafold/data-diff/discussions). |
| 88 | +### Diffing between databases |
110 | 89 |
|
111 | | -## How to Use |
| 90 | +Check out our [documentation](https://docs.datafold.com/reference/open_source/cli) if you're looking to compare data across databases (for example, between Postgres and Snowflake). |
112 | 91 |
|
113 | | -* [Examples with dbt, joindiff, and hashdiff](https://docs.datafold.com/reference/open_source/cli#examples) |
114 | | -* [Examples with Python](https://data-diff.readthedocs.io/en/latest/python-api.html) |
115 | | -* [How to use with TOML configuration file](https://docs.datafold.com/reference/open_source/cli#toml-config-file) |
| 92 | +<br> |
116 | 93 |
|
117 | | -## How to Contribute |
118 | | -* Feel free to open an issue or contribute to the project by working on an existing issue. |
119 | | -* Please read the [contributing guidelines](https://github.com/datafold/data-diff/blob/master/CONTRIBUTING.md) to get started. |
120 | | -* To add a new database driver, check out [docs](https://github.com/datafold/data-diff/blob/master/docs/new-database-driver-guide.rst). |
| 94 | +## Contributors |
121 | 95 |
|
122 | | -Big thanks to everyone who contributed so far: |
| 96 | +We thank everyone who contributed so far! |
123 | 97 |
|
124 | 98 | <a href="https://github.com/datafold/data-diff/graphs/contributors"> |
125 | 99 | <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" /> |
126 | 100 | </a> |
127 | 101 |
|
128 | | -## Technical Explanation |
129 | | - |
130 | | -Check out this [technical explanation](https://github.com/datafold/data-diff/blob/master/docs/technical-explanation.md) of how data-diff works. |
| 102 | +<br> |
131 | 103 |
|
132 | 104 | ## Analytics |
| 105 | + |
133 | 106 | * [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md) |
134 | 107 |
|
| 108 | +<br> |
| 109 | + |
135 | 110 | ## License |
136 | 111 |
|
137 | 112 | This project is licensed under the terms of the [MIT License](https://github.com/datafold/data-diff/blob/master/LICENSE). |