Use cases of data and Schemaspy: Database Management

After surveying all the articles on the web, all articles provide information about installing Schemaspy, but very few articles provide its use case. So here’s an article on Schemaspy which allows you to get basic stats for any database and its use cases for database management. Just to mention the world is driven by data and perhaps data can also help in database management. In our case, we will get this data from Schemaspy.

Below are some insights or anomalies which can be identified by using Schemaspy.

1. Identifying and Removing Orphaned Tables

Occasionally, the development and testing of processes may cause some database objects to become ‘orphaned’. The entities that used to depend upon them are no longer present, so the objects themselves are no longer linked to, or required by, any other part of the system. Orphan tables are the table with a missing primary key or foreign key.

To view the orphan table from the database click on the ‘Orphan table’ tab on schemaspy on the top bar.

Use the below steps to do a different assessment that needs to be done before removing the orphan table.

i. Identify all tables without primary key. Check if they are in use or the table's requirement and remove the table if not important.

ii. Identify all the tables without a foreign key, this step is important. There might be a case where a table has a primary key but a foreign key is not yet created. Yet this table can be important.

Remove all unwanted tables manually using pgadmin or other tools.

2. Identify irrelevant tables

Below are the possibilities that the table is irrelevant

- Tables with 0 rows — these are the tables with no data

- Tables with single column — these are the tables which have no value as it just has one column

Above irrelevant tables can be identified using schemaspy. To identify these columns go to tab ‘Tables’ (option available on the landing page), a table will appear with a list of tables in the database along with their count of columns and rows. Click on arrows beside column name ‘Row’ (in ascending order, shown in the image below), tables with rows 0 will appear on top, likewise, click on arrows beside column name ‘Columns’ to get the list of tables with 0 or 1 columns.

You can prepare the list of tables using the above step and remove the tables from pgadmin or other tools.

3. Identify tables with large data or unused/unselected data

The first question arises before identifying these columns is “Does it affect the performance, if a table has a large amount of data?

The answer to the above question is ‘Yes’ and ‘No’ depending on the scenarios.

First scenario ‘No’ as the answer, if the table has a large number of rows it won’t affect the performance to an extent. But still, that data can be unwanted data. By using the steps mentioned in the above step you can get the list of tables with the max number of rows and then decide whether to keep that data or remove it. As said it won’t affect the performance but it might create confusion and in the end, it's garbage data. You can generate this report monthly and identify if there is a sudden uncertain increase in the count of rows and take necessary action.

Another scenario ‘Yes’ as the answer, as a number of columns in a table affect select queries. Whether or not they’re “extra” or “unused” is a question of database design, and doesn’t have anything to do with the question of performance.

All other things being equal, a row in a table with 100 columns is going to take up more space than a row in a table with 10 columns. Because the rows will be larger, your server will have to work (relatively) harder to go over an equal number of rows in the wider table than in the smaller table.

So the conclusion, a table with max number of columns might affect the performance as well as consume space.

Identify the tables with max numbers of columns or check if there is a sudden increase in the count of columns (monthly monitoring) using the steps mentioned in point 2.

4. Anomalies in the database

Schemaspy provides readily available anomalies in the database without doing any settings or modifications. To check this click on the ‘Anomalies’ tab available on the top middle of the screen.

Below are the anomalies available on schemaspy tool:

i. Columns whose name and type imply a relationship to another table’s primary key

- This provides a list of columns that has a relationship with another table’s primary key. This option can be helpful to verify the mapping between two tables and their columns.

ii. Tables without indexes

Let’s first understand “What is poor indexing/No indexing?”

Any SQL Server table configuration where performance suffers due to excessive, improper, or missing indexes is considered to be poor indexing.

If indexes are not properly created, SQL Server has to go through more records in order to retrieve the data requested by a query. Therefore, it uses more hardware resources (processor, memory, disk, and network), and obtaining the data lasts longer.

A wrong index can be an index created on a column that doesn’t provide easier data manipulation or an index created on multiple columns which instead of speeding up queries, slows them down.

A table without a clustered index can also be considered as a poor indexing practice. Executing a SELECT statement, inserting, updating, and deleting records is slower on a heap table than on a clustered one.

The list of tables without index is available under this option under the ‘Anomalies’ tab. Once you have a list you can do the assessment over it.

iii. Tables that contain a single column

Tables with a single column can be treated as unnecessary columns as they won’t have much data available also even if some data is available that will not be much use unless we just have to store information which can be done in some other way.

The list of tables that contains a single column is available under this option under the ‘Anomalies’ tab.

iv. Tables with incrementing column names, potentially indicating denormalization

This point is covered in point 2

v. Columns whose default value is the word ‘NULL’ or ‘null’

A column with the default value ‘null’ is nothing with unwanted columns which must be removed. The list of Columns whose default value is the word ‘NULL’ or ‘null’ is available under this option under the ‘Anomalies’ tab.

5. Relationship

You can find a Relationship tab on the top middle bar of the screen. This tab shows the entity-relationship diagram of the database.

An ER diagram shows the relationship among entity sets. An entity set is a group of similar entities and these entities can have attributes. In terms of DBMS, an entity is a table or attribute of a table in the database, so by showing relationships among tables and their attributes, ER diagram shows the complete logical structure of a database.

If you want to check the ER diagram on table level, go to tab ‘Tables’ click on the table name for which you want to check the relationship diagram. A new page will open up with the table details, you can find the relationship diagram at the bottom of that page. Using the same step you can also check the children's table for a particular column of that table.

6. Basic stats of the DB

Basis stats of the DB included a count of tables, columns, anomalies, views, constraints, and routines. These stats can be views on the landing page of the tool.

This will give you a clear picture of your database at an overall level.

Hope this article was useful for you. Please put down your suggestions in a comment below.

Analyst (Customer Insight) at Capita India