The first question to ask when choosing a storage technology is what kind of data you are storing. In other words, we need to classify our data. Typically, data can be classified as one of the following types.
Structured data
Structured data is data that follows a strict definition or schema. Typically, this means information is stored in a table with columns defining the type and size of that data. Sometimes, one or more tables might be linked together with some form of a key. This type of data is typically stored inside a SQL database.
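As a minimal sketch of what a strict schema looks like in practice, the following uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and chosen only for illustration; any SQL database would express the same idea.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: every column has a declared type, and orders
# are linked to customers through the customer_id key.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Alice"))
conn.execute(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    (10, 1, 25.5),
)

# The key lets us join the two tables.
rows = conn.execute(
    "SELECT c.name, o.total FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id"
).fetchall()
print(rows)  # [('Alice', 25.5)]

# A row that breaks the schema (missing name) is rejected.
try:
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (2, None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The point of the sketch is the last step: the database itself refuses data that does not fit the definition, which is what makes the data structured.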
Semi-structured data
Semi-structured data is similar to structured data in that it often conforms to a set structure, although it does not have to. That structure might be hierarchical rather than a set of "flat" tables:

Figure 9.1 – Structured versus semi-structured data
As you can see from the preceding diagram, semi-structured data still has a structure, but it is not as strict as structured data. Finally, we have data that conforms to no structure at all.
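To make the semi-structured case concrete, here is a small sketch using JSON, one common hierarchical format. The documents and field names (`name`, `email`, `phone`, `addresses`) are assumptions made up for this example. The two records share some fields but not all, and one nests a list inside the record, which a single flat table could not hold directly.

```python
import json

# Hypothetical records: same kind of entity, but different fields,
# and a nested list of addresses on the first one only.
raw = """
[
  {"name": "Alice", "email": "alice@example.com",
   "addresses": [{"city": "London"}, {"city": "Leeds"}]},
  {"name": "Bob", "phone": "555-0100"}
]
"""
people = json.loads(raw)


def cities(person):
    """Return the cities for a person, tolerating a missing 'addresses' field."""
    return [address["city"] for address in person.get("addresses", [])]


alice_cities = cities(people[0])
bob_cities = cities(people[1])
print(alice_cities)  # ['London', 'Leeds']
print(bob_cities)    # []
```

Because nothing enforces the shape of each record, the code reading it has to cope with fields that may or may not be present, as `cities` does here with `dict.get`.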
Unstructured data
Unstructured data is also known as raw data. Typically, it is not in a set format of the kind you find with structured and semi-structured data, or it might use a proprietary file type, such as a media file or a Word document. Typical examples include (but are not limited to) the following:
- Media files such as videos, images, and audio
- Application files, such as Word documents, Excel files, or PowerPoint presentations
- Disk images such as Virtual Hard Disk (VHD) files used by VMs
- Pure text files (without any structure)
- Log files (but not in tabular format)
Data classification, on its own, will not necessarily tell you which storage to choose. For example, you might be using structured data, which would normally be stored in a SQL database. However, you might receive that data as comma-separated text files that you want to manage as text files. Conversely, you might receive Excel files but need to query data across them.
Therefore, the next question we need to ask is what operations we will be performing on the data.