Wednesday, February 13, 2013

Semantic, baby!

OK, so I decided to create a semantic database. From relation database schema I had the basics - there's a row, row can have some columns and they may store some row's own values, or they can store a reference to some other row (in the same or different table). Years later I determined exact names I'll use for it: row = entity, column = attribute, reference = relation. And yes, any resemblance to ER model is purely intentional ;-)

My database model was quite obvious at this point: entity, attrib, relation, entity-attrib (m:n), entity-relation (m:n).


Then I started to think about what structure those tables should have. I started with "data" tables - entity, attrib and relation.

First two columns were quite obvious - integer primary key (id) + tinyint some kind of "type". Now I need something for data.

My first thought was: One column for text value + one for numeric value. And maybe one for datetime. Such model is good for performance and indexing, but I didn't like the code have to decide all the time where to put or look for a value and if I'd like to have one column always NULL. Single varchar column for value would be much better, but then I'll lose the performance advantage. But what about decimal numbers? Or dual values, like coordinates?

It felt like a Sophie's choice and after quite long thinking of all pros and cons I decided to sacrifice performance. I was going to store everything as a text, but the model will be pure.

Then I started to expand my thoughts about the whole schema little more. Relations should be able to contain attribs as well. So another table for m:n relation emerges. And all rels, ents and attrs would have their own type as well... there we go, another 3 tables. That's 9 now and that's it!


Those tables are just for data, nothing else. Sure, pretty much everything in QetriX is "data", but I needed more tables for specific purposes. Like translations, large text storage, secure file storage, activity log, change log, statistics, feedback or various caches. The schema grew big once again.

No comments:

Post a Comment