Query and Visualize

Set up environment

!pip install -U athena-intelligence

import json
import pandas as pd
from IPython.display import Markdown, display

from athena import Model, Tools
from athena.client import Athena

ATHENA_API_KEY = "<YOUR API KEY HERE>"

# Create the client used by every call in this guide
athena = Athena(
    api_key=ATHENA_API_KEY,
)

Get datasets

Call the `dataset.get` method to retrieve datasets. The optional pagination parameters (`page` and `page_size`) let you run bulk workflows over many datasets.

datasets = athena.dataset.get(page=1, page_size=5)
datasets

Athena returns a JSON object containing a list of datasets, each with the following fields: dataset ID, name, database ID, and schema details (dialect, CREATE statement, and first three rows). The response also includes pagination info.

To access the raw JSON, use .json():

# Parse the response and load the dataset list into a DataFrame, one row per dataset
data = json.loads(datasets.json())
datasets_list = data['datasets']
pd.set_option('display.max_colwidth', None)  # show full schema details without truncation
df_datasets = pd.DataFrame(datasets_list)
df_datasets
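The pagination parameters make it possible to walk through every dataset rather than only the first page. The sketch below shows one way that might look. `collect_all_datasets` and `fetch_page` are hypothetical names introduced for illustration, and treating a short or empty page as the end of the results is an assumption rather than documented Athena behavior.

```python
from typing import Callable, Dict, List


def collect_all_datasets(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 5,
    max_pages: int = 100,
) -> List[Dict]:
    """Request pages in order until one comes back short or empty.

    Assumption: a page with fewer than page_size items means there is no more data.
    max_pages is a safety cap so the loop always terminates.
    """
    collected: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, page_size)
        collected.extend(batch)
        if len(batch) < page_size:
            break
    return collected
```

With the client from this guide, `fetch_page` could be a small wrapper such as `lambda page, size: json.loads(athena.dataset.get(page=page, page_size=size).json())['datasets']`, and the returned list can be passed to `pd.DataFrame` as above.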

Document individual datasets with `athena.message.submit_and_poll`

With the datasets loaded, we can proceed with the documentation workflow. We’ll start by defining a function that takes one dataset's name and schema details and sends them to Athena with a documentation prompt. Calling it once per dataset documents the datasets one by one.

# Collects one entry per dataset: its name and the generated documentation
documentation_responses = []

def generate_documentation_for_dataset(dataset_name, dataset_schema_details):
    print(f"Generating documentation for dataset: {dataset_name}")
    # Submit the documentation prompt and poll until Athena returns the finished message
    message = athena.message.submit_and_poll(
        content=f"""
**Task:** Generate comprehensive documentation for a dataset.

**Objective:**
Create output template documentation for a table, detailing its schema, fields, and relevant metadata. The documentation should follow the structure provided below and adhere to the specified markdown format and tone. Use metadata and other available information to produce the documentation tailored to the context. This documentation will serve as a guide for understanding the dataset's structure, purpose, and usage within the organization. It should be clear, concise, and informative, catering to both technical and non-technical stakeholders.

**Instructions:**
1. Explore information available on the dataset {dataset_name}:
- dataset metadata:

{dataset_schema_details}


2. For each section of the documentation, provide clear, concise information as outlined in the output template. Use professional language and ensure the documentation is accessible to a broad audience.
3. Include a brief example value or description where requested to illustrate the type of content expected.
4. Only include factual statements. When making assumptions or inferences, clearly label them as such.

**Output Template:**

## Athena Generated Dataset Documentation

### TABLE: `[TABLE NAME]`

**Generated on: [CURRENT DATE]**

#### Dataset Description:
Provide a comprehensive explanation of the table's purpose, detailing what one row represents and the business process or workflow it supports.

#### Field Report:
Document each field in the table, including its name, description, data type, and an example value.

| Field Name | Field Description | Field Type | Example Value |
| ---------- | ----------------- | ---------- | ------------- |
| [FIELD NAME] | [FIELD DESCRIPTION] | [FIELD TYPE] | [EXAMPLE VALUE] |
| ...additional fields as necessary... |

#### Sample Query and First Three Rows:
Include a sample SQL query that returns the first three rows of data, followed by the results of the query.

#### Use Cases & Guidelines:
Describe the organization's use cases and guidelines for using this dataset, highlighting any best practices or restrictions.

#### Other Notes & Considerations:
List any additional notes or considerations relevant to the dataset's use or interpretation.

**End of Template**

Please ensure all information is accurate and up-to-date, reflecting the current state of the dataset as of [CURRENT DATE].
""",
        model=Model.MIXTRAL_SMALL_8_X_7_B_0211,
        tools=[],
    )
    # Keep only the message content, paired with the dataset it describes
    message_json = json.loads(message.json())
    documentation_responses.append({'dataset_name': dataset_name, 'documentation_message': message_json['content']})

Now we can kick off the workflow.

# Iterate over each row in the DataFrame
for index, row in df_datasets.iterrows():
    dataset_name = row['name']
    dataset_schema_details = row['schema_details']

    # Generate documentation for the current dataset
    generate_documentation_for_dataset(dataset_name, dataset_schema_details)
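Each iteration makes a remote call, so a single failed request would stop the loop and leave the remaining datasets undocumented. The sketch below is one possible way to keep going and report failures at the end. `document_all` is a hypothetical helper, not part of the Athena SDK, and catching every `Exception` is a deliberate simplification because the error types the client raises are not described in this guide.

```python
from typing import Callable, Iterable, List, Tuple


def document_all(
    datasets: Iterable[Tuple[str, str]],
    document_one: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Call document_one for every (name, schema_details) pair.

    Returns (dataset_name, error_message) for each call that raised,
    so one failure does not prevent the other datasets from being processed.
    """
    failures: List[Tuple[str, str]] = []
    for name, schema_details in datasets:
        try:
            document_one(name, schema_details)
        except Exception as exc:  # broad on purpose: the client's error types are not known here
            failures.append((name, str(exc)))
    return failures
```

In this guide's setting it could be called as `document_all(zip(df_datasets['name'], df_datasets['schema_details']), generate_documentation_for_dataset)`, and any returned failures can be retried separately.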

Convert the results to Markdown so the generated documentation is easy to read and copy.

def json_to_markdown_document(json_list):
    markdown_document = ""
    if not json_list:
        return "No data available"

    for item in json_list:
        for key, value in item.items():
            markdown_document += f"**{key}:** {value}\n\n"
        markdown_document += "---\n\n"  # Separator line between items

    return markdown_document

# Convert the list of dictionaries to Markdown
markdown_document = json_to_markdown_document(documentation_responses)

# Display the Markdown in the notebook
display(Markdown(markdown_document))
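If the documentation should outlive the notebook session, each entry can also be written to its own Markdown file. The following is a minimal sketch, assuming the list has the same `dataset_name` and `documentation_message` keys as `documentation_responses`. The helper name `save_documentation` and the file-naming rule are illustrative choices, not part of the Athena workflow.

```python
import re
from pathlib import Path
from typing import Dict, List


def save_documentation(responses: List[Dict[str, str]], out_dir: str) -> List[Path]:
    """Write each response's documentation to <out_dir>/<dataset_name>.md."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for item in responses:
        # Replace characters that are awkward in file names with underscores
        safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", item["dataset_name"])
        path = target / f"{safe_name}.md"
        path.write_text(item["documentation_message"], encoding="utf-8")
        written.append(path)
    return written
```

For example, `save_documentation(documentation_responses, "dataset_docs")` would create one file per documented dataset in a `dataset_docs` directory.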

Generate documentation and ERD diagrams for multiple datasets

Now that we have documented all the individual tables, we can ask Athena to process the generated documentation and produce a higher-level description of the whole body of data, including joins and other notable relationships between tables.

def generate_high_level_documentation(markdown_document):
    print("Generating description for provided dataset-level documentation")
    # Submit the high-level documentation prompt and poll until Athena returns the finished message
    message = athena.message.submit_and_poll(
        content=f"""
**Task:** Generate high-level comprehensive documentation for a body of datasets.

**Objective:**
Create high-level output documentation for multiple related tables, detailing their schema, fields, relationships, and relevant metadata. The documentation should follow the structure provided below and adhere to the specified markdown format and tone. Use the provided markdown document and other available information to produce the documentation tailored to the context. This documentation will serve as a guide for understanding the structure, purpose, and usage of the datasets within the organization. It should be clear, concise, and informative, catering to both technical and non-technical stakeholders.

**Instructions:**
1. Explore information available in the provided markdown document:
- Provided markdown document:

{markdown_document}


2. For each section of the documentation, provide clear, concise information as outlined in the output template. Use professional language and ensure the documentation is accessible to a broad audience.
3. Include diagrams such as Entity-Relationship Diagrams (ERD) and other helpful diagrams to explore relationships in the data.
4. Discuss possible analyses and how the datasets can be joined for these analyses.
5. Only include factual statements. When making assumptions or inferences, clearly label them as such.
6. Pay attention to Mermaid diagram dialect and double-check yourself.

**Output Template:**

## Athena Generated High-Level Dataset Documentation

### Overview of Datasets

Provide a brief overview of the datasets included in the markdown document, summarizing their purpose and how they relate to each other.

### Entity-Relationship Diagram (ERD)

Include an ERD that visually represents the relationships between the datasets.

### Possible Analyses

Discuss potential analyses that could be performed using these datasets, highlighting how they can be joined and what insights might be derived.

### Other Helpful Diagrams

Include other diagrams that may help in understanding the relationships between the datasets, such as flowcharts or sequence diagrams.

### Guidelines for Use

Describe the organization's guidelines for using these datasets together, including any best practices or restrictions.

### Other Notes & Considerations

List any additional notes or considerations relevant to the use or interpretation of these datasets as a whole.

**End of Template**

Please ensure all information is accurate and up-to-date, reflecting the current state of the datasets as of [CURRENT DATE].
""",
        model=Model.MIXTRAL_SMALL_8_X_7_B_0211,
        tools=[],
    )
    # Return only the generated text
    message_json = json.loads(message.json())
    return message_json['content']

high_level_documentation = generate_high_level_documentation(markdown_document)
display(Markdown(high_level_documentation))
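Because the prompt asks for Mermaid diagrams, it can be handy to pull the diagram source out of the response so it can be pasted into a Mermaid renderer or checked separately. The sketch below assumes the model wraps each diagram in a code fence labeled `mermaid`, which the prompt does not guarantee. `extract_mermaid_blocks` is a hypothetical helper introduced for illustration.

```python
import re
from typing import List

# Build the fence marker programmatically so this example can itself sit inside a fenced block
FENCE = "`" * 3

# Match an opening fence labeled "mermaid", capture everything up to the next fence
MERMAID_BLOCK = re.compile(
    re.escape(FENCE) + r"mermaid[ \t]*\r?\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_mermaid_blocks(markdown_text: str) -> List[str]:
    """Return the body of every mermaid-labeled fenced block found in the text."""
    return [body.strip() for body in MERMAID_BLOCK.findall(markdown_text)]
```

Calling `extract_mermaid_blocks(high_level_documentation)` returns a list of diagram sources, or an empty list if the response contains no such blocks.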
