SQL Server Overview
Running in a Container
On AMD64 PCs just use the sql-server container image.
On M1 macs, use the following:
It’s the SQL Server Edge - at this moment it’s the only SQL Server variant compiled for ARM64.
Structure
- RDMS:
- Databases
- Schemas (the default is dbo) - they act as namespaces, they are
useful for managing security
- Tables
- Columns
- Views
- Stored Procedures
- Tables
- Schemas (the default is dbo) - they act as namespaces, they are
useful for managing security
- Databases
Every object in the database has its fully qualified name in the format:
{Instace}.{Database}.{Schema}.{Object}
.
Naming
Popular Choices:
- PascalCase
- underscore_separated
- Hybrid_Of_Above
Best Practices
Normal Forms
Normalization defines a set of rules that help to achieve a good design of a database, reducing duplication and data anomalies.
- 1NF
- one value per table cell (e.g. first and last names)
- rows should be unique - primary key should be there
- 2NF
- single-column primary keys
- 3NF
- Column values should only depend upon the key - if any non-key column depends on column(s) other than the key one, this data does not belong in that table. For example, keeping patient’s doctor phone in the Patients database is not a good idea. There should be a separate table - Doctors - and patients should be linked ot appropriate doctors.
There also other Normal Forms.
Constraints
Constraints specify additional rules for the columns in our tables.
NULL
We can “constraint” columns to allow NULLs. By default, the NOT NULL
constains
is applied.
DEFAULT
Provides a default value for a column. It is useful when NOT NULL
is applied
as well.
Primary Keys
Thy ensure uniqueness. It provides a backing index. That index can be clustered or unclustered.
UNIQUE
It ensures no duplicates of some column. It allows NULL (but only one!). They are backed by an index (clustered or unclustered (the default)).
The UNIQUE
constraint can be applied on multiple columns, meaning that the
combination of their values should be unique.
Foreign Keys
The Foreign Key constraint links two tables together. We cannot create an entry
that refers to a non-existing entity. For example, we can’t create an
OrderItem
for an Order
that doesn’t exist.
CASCADE
Entities referred to by a foreign key cannot be deleted. We can help ourselves
with the CASCADE
option. Whenever we delete an entity referenced by a foreign
key, the entity that refers to it will be deleted as well.
CHECK
This constraint allows to define declarative various conditions on the table-level or on the column-level. The conditions are boolean expressions.
Examples:
CHECK (LastName <> '')
-LastName
value can’t be emptyCHECK (StockName <> StockDescription)
- two columns can’t have the same valuesCHECK (Currency IN ('PLN', 'EUR'))
Views
Views encapsulate queries. This way we can store some commonly used queries and use them whenever they’re needed. For example, we can create some complicated query that joins multiple tables and hides unneded data.
Views can be seen as Virtual Tables. Applications can invoke the views instead of querying the actual tables. This way, backward compatibility can be kept. We could add some new column to a table, but make sure that the View still returns the same set of data as before.
Columns returned in a View can have their names changed.
Updateable Views
We can update data in tables via Views. However, any such operation can only reference columns of only one base table. There are a few other restrictions as well.
Indexes
We create indexes:
- to enforce uniqueness on data (primary keys and unique constraints are backed by indexes)
- to improve query performance - these need to be determined by our workload
- for foreign keys
DB data is stored in pages, sorted by clustered index. Then, these pages are put into a B-Tree. All apges are in leaves. Then, a top row of each page is taken to build higher layers of the tree (intermediary layers + a root).
Here’s an example of a seek for (C, 54.22):
Clustered Index
The data (rows) in the table is sorted using the index column. It’s the default backing index for Primary Keys. There is no separate storage of index.
Guidelines
Ideally, a clustered index values should be ever-increasing. That way, the organization of existing data does not need to change when new entries are being added (page splits). It should also be unchanging, for the same reasons (changing the index value causes a delete-insert operation pair on the row). Additionally, the index should be narrow (the less bytes the better). This way, there will be less intermediary pages.
Non-Clustered Index
It doesn’t sort the rows. It’s stored in a different place, separately from the table (similar to book index). It is stored as B-Tree.
When querying data, first DB checks the index, and then it goes to retrieve the data at the specified address. Clustered Index uses the data directly, so it’s faster.
The index is always in sync with the data (the same amount of rows).
Multiple Columns
Here’s how to create an index:
The order of columns in the index matters. If our index contains columns “TransactionType” and “Amount”, the index will be created in a way that rows are sorted by “TransactionType”. We can quickly query data by the “TransactionType” column. However, querying on “Amount” will require the DB to scan throught the entire index data structure to find appropriate rows.
Here’s an example:
Our index has been created optimally for the queries that we want to use. The order of the columns is: “TransactionType”, “Amount”. If we used a reversed order, Query II would still work fine, however, Query I would need to scan all the rows.
For queries that include inequality predicates, the following applies:
- columns using equality predicates should come before inequality ones in the index
- multiple inquality columns are hard to index well
Tips
- Non-clustered Index can include additional columns than the ones being
indexed. A
SELECT
statement can filter based on indexed columns, but it might require some additional columns to be returned. Including these columns in the index itself skips the step where the DB engine would have to lookup this data in the actual raw data files. - The index can be a Filtered Index - this way, not all of the rows are indexed, but only the ones that satisfy some filter. It makes the index smaller. E.g., it could be useful for tables that use the “soft-deletes” technique. We could skip such entities from the index if our queries should not bother with them.