Query and Visualize

1

Set up environment

1!pip install -U athena-intelligence
1import json
2import pandas as pd
3import os
4
5ATHENA_API_KEY = os.environ["ATHENA_API_KEY"]
6
7from athena import Model, Tools
8from athena.client import Athena
9
10athena = Athena(
11 api_key=ATHENA_API_KEY,
12)
2

Query SQL with athena.query.execute()

Initialize default database or specify id of another connected database. Ask Athena team if unclear.

1# default database id = 9
2default_database_id = 9

Put your query here

1query = """
2WITH RankedCustomers AS (
3 SELECT
4 r.route_id,
5 r.src AS route_source,
6 r.dst AS route_destination,
7 ts.passenger_id,
8 c.name AS customer_name,
9 SUM(ts.num_tickets) AS total_tickets,
10 RANK() OVER (PARTITION BY r.route_id ORDER BY SUM(ts.num_tickets) DESC) AS rank
11 FROM
12 sample_schema.train_routes r
13 JOIN sample_schema.train_trips t ON r.route_id = t.route_id
14 JOIN sample_schema.train_ticketsales ts ON t.trip_id = ts.trip_id
15 JOIN sample_schema.train_customers c ON ts.passenger_id = c.passenger_id
16 GROUP BY
17 r.route_id,
18 r.src,
19 r.dst,
20 ts.passenger_id,
21 c.name
22)
23SELECT
24 route_id,
25 route_source,
26 route_destination,
27 passenger_id,
28 customer_name,
29 total_tickets
30FROM
31 RankedCustomers
32WHERE
33 rank <= 3
34ORDER BY
35 route_id,
36 rank;
37"""

Run the query and convert to Pandas dataframe for subsequent analysis

1query_result = athena.query.execute(
2 sql_command = query,
3 database_id = default_database_id
4 )
5
6formatted_query_result = json.loads(query_result.json())
7data_output = formatted_query_result['result']['data']
8
9# Convert the list of dictionaries to a DataFrame
10df = pd.DataFrame(data_output)
11
12# Display the DataFrame to verify the contents
13df
3

Visualize data using Athena Notebooks Sidebar

Once you have an SQL output, you can harness the power and flexibility of visualisations or further analysis using Python. In Athena Notebooks, you can click “Chat” in the top right corner to ask Athena for help writing code for visualizations you need. Copy and paste a dataframe headers and 2-3 rows to give Athena all needed context.

  • Write code to visualize this table: route_id route_source route_destination passenger_id customer_name total_tickets 0 1 West Sylvia Williamsshire 816 Jaruwan Rogers 13 1 1 West Sylvia Williamsshire 679 Karen Tapp 9

  • Sure, here’s the code: …

Copy & run the code in the notebook.

1import pandas as pd
2import matplotlib.pyplot as plt
3import numpy as np
4
5# Assuming 'df' is your DataFrame
6# Ensure your DataFrame is named 'df' or adjust the variable name accordingly
7
8# Visualization 1: Total Tickets Sold by Route
9route_tickets_sum = df.groupby('route_id')['total_tickets'].sum().reset_index()
10plt.figure(figsize=(10, 6))
11plt.bar(route_tickets_sum['route_id'], route_tickets_sum['total_tickets'], color='deepskyblue') # Changed color to 'deepskyblue'
12plt.title('Total Tickets Sold by Route')
13plt.xlabel('Route ID')
14plt.ylabel('Total Tickets Sold')
15plt.xticks(rotation=45)
16plt.tight_layout()
17plt.show()