Data classification – Exploring Storage Solutions

The first question you need to ask when choosing a storage technology is: what kind of data are you storing? In other words, we need to classify our data. Typically, data can be classified as one of the following three types.

Structured data

Structured data is data that follows a strict definition, or schema. Typically, this means the information is stored in tables whose columns define the type and size of each value. Two or more tables might also be linked together by some form of key. This type of data is usually stored in a SQL database.
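
As a minimal sketch of what "schema" and "key" mean in practice, the following uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables and their columns are hypothetical. Note that SQLite accepts a declared size such as VARCHAR(100) but does not enforce it, whereas engines such as SQL Server do.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# The schema: every column declares a type. VARCHAR(100) states a size limit;
# engines such as SQL Server enforce it, whereas SQLite records but ignores it.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100) NOT NULL
    )
""")

# orders.customer_id is a foreign key: it links each order to one customer row.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total       REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (100, 1, 250.0)")

# Joining on the key reassembles related rows from the two tables.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)  # [('Contoso', 100, 250.0)]

# The schema rejects an order that references a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The rejected insert is the point of the schema: the database, not the application, guarantees that every row has the expected shape and that keys line up.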

Semi-structured data

Semi-structured data is similar to structured data in that it often conforms to a set structure, although it does not have to. That structure might be hierarchical rather than a set of “flat” tables:

Figure 9.1 – Structured versus semi-structured data

As you can see from the preceding diagram, semi-structured data still has a structure, but it is not as strict as structured data. Finally, we have data that conforms to no structure at all.
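
To make the contrast with flat tables concrete, here is a minimal sketch of semi-structured data expressed as JSON, using Python's standard json module. The customer documents are hypothetical: both share a general shape, but one nests a list of addresses and carries fields the other lacks.

```python
import json

# Hypothetical customer documents. They share a general shape, but the second
# nests a list of addresses and has fields the first one lacks.
documents = [
    {"id": 1, "name": "Contoso"},
    {
        "id": 2,
        "name": "Fabrikam",
        "addresses": [
            {"type": "billing", "city": "Seattle"},
            {"type": "shipping", "city": "Redmond"},
        ],
        "tags": ["priority"],
    },
]

# Serialize to JSON text, as the data might be stored or sent, then parse it back.
parsed = json.loads(json.dumps(documents, indent=2))

# No schema guarantees every field exists, so the reader must tolerate gaps.
cities_by_name = {
    doc["name"]: [address["city"] for address in doc.get("addresses", [])]
    for doc in parsed
}
print(cities_by_name)  # {'Contoso': [], 'Fabrikam': ['Seattle', 'Redmond']}

# Fields present in one document but not the other; a flat table would need
# a column (or a separate table) for each of them.
print(sorted(set(parsed[1]) - set(parsed[0])))  # ['addresses', 'tags']
```

The structure is still there (names, nesting, lists), but it lives in each document rather than in a schema enforced by the store.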

Unstructured data

Unstructured data is also known as raw data. It typically does not follow a set format, as structured and semi-structured data do, or it might use a proprietary file type, such as a media file or a Word document. Common examples include (but are not limited to) the following:

  • Media files, such as videos, images, and audio
  • Application files, such as Word documents, Excel spreadsheets, or PowerPoint presentations
  • Disk images, such as Virtual Hard Disk (VHD) files used by VMs
  • Plain text files (without any structure)
  • Log files (when not in a tabular format)
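
When triaging a large set of incoming files, a rough first pass is sometimes made on file extensions alone. The sketch below assumes a hypothetical extension-to-category mapping that follows the examples above; it is only a hint, because an extension says nothing certain about the content (a .log file may be tabular, and a .txt file may hold comma-separated rows).

```python
from pathlib import Path

# Hypothetical extension lists for a first-pass triage. Extensions are only a
# hint: a .log file may be tabular, and a .txt file may hold CSV rows.
STRUCTURED = {".csv", ".tsv"}
SEMI_STRUCTURED = {".json", ".xml", ".yaml", ".yml"}
UNSTRUCTURED = {
    ".mp4", ".mov", ".jpg", ".png", ".mp3", ".wav",  # media files
    ".docx", ".xlsx", ".pptx",                        # application files
    ".vhd", ".vhdx",                                  # disk images
    ".txt", ".log",                                   # plain text and logs
}


def guess_classification(filename: str) -> str:
    """Return a best-guess data classification based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in STRUCTURED:
        return "structured"
    if suffix in SEMI_STRUCTURED:
        return "semi-structured"
    if suffix in UNSTRUCTURED:
        return "unstructured"
    return "unknown"  # needs a closer look at the content itself


for name in ["orders.csv", "profile.json", "holiday.MP4", "server01.vhd", "data.bin"]:
    print(name, "->", guess_classification(name))
```

Anything that falls through to "unknown", or whose extension is ambiguous, still needs someone to inspect the actual content.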

Data classification on its own will not necessarily tell you which storage to choose. For example, you might be working with structured data, which would normally be stored in a SQL database, but receive it as comma-separated text files that you want to keep and manage as files. Conversely, you might receive Excel files but need to query data across all of them.
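
A minimal sketch of that trade-off, using only Python's standard library: two hypothetical CSV files (held as strings so the example is self-contained, and standing in for spreadsheet exports, since reading Excel files directly needs a third-party library) are parsed and then loaded into an in-memory SQLite table so that a single query can span both. The file names, columns, and values are illustrative assumptions.

```python
import csv
import io
import sqlite3

# Two hypothetical CSV files received from different branches, held as strings
# here so the example is self-contained.
files = {
    "north.csv": "order_id,total\n1,120.50\n2,80.00\n",
    "south.csv": "order_id,total\n3,200.00\n",
}

# Kept as text files, the data must be re-parsed every time it is needed.
rows = []
for name, text in files.items():
    for record in csv.DictReader(io.StringIO(text)):
        rows.append((name, int(record["order_id"]), float(record["total"])))

# Loaded into a SQL table, the same rows can be queried together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (source TEXT, order_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

total_by_source = conn.execute(
    "SELECT source, SUM(total) FROM orders GROUP BY source ORDER BY source"
).fetchall()
print(total_by_source)  # [('north.csv', 200.5), ('south.csv', 200.0)]
```

The data is equally structured in both forms; what differs is whether the workload (archiving files versus querying across them) justifies moving it into a database.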

Therefore, the next question we need to ask is: what operations will we be performing on the data?
