Proposed Changes to the Core for MyInternet 6
---------------------------------------------
Author: Sam Watkins
Date: 30/10/2000
We are working on a new object-oriented infrastructure and coding-standard for
MyInternet version 6. There is going to be a fair bit of work to achieve this,
but the end result is that the next version of our product will be flexible,
powerful and maintainable. We will have a world-class groupware platform, to
put others to shame. It is important to make the core of MyInternet as good as
we can, because this is the foundation for all of our other development work.
We need to fix it up now before it's too late!
I'm going to describe some of the major problems with the current modules and
infrastructure, and propose solutions to them. I'd like to say right now that
the criticisms I am going to make apply to my own code at least as much as to
anyone else's - some of my code in MI has been pretty shoddily put together
(e.g. the instance editor, and the curriculum-resource module!).
--------------------------------------------------------------------------------
1. We currently do not use real Perl objects to represent MI objects
(instances), only whole classes (modules). All messages that should be
directed to instances are sent to the module as a whole, with parameters
(%args) which include the instance number, type of view (hint), etc. This
seems like a small problem, but it is the most fundamental and serious fault in
MI at the moment: it is preventing developers from writing good OO code.
Fortunately, this problem is easy to fix (we should have done it ages
ago) - but we need to re-write existing modules in a more OO style to
take advantage of this change, and the other changes I am proposing.
Currently the renderer (ExpatProcessor.pm) is assembling the %args list
that gets passed to the modules. For a beginning, I have added a new
`constructor' in MI::Modules::Base, which will take these args as a
parameter, and for the moment, return a blessed reference to the very
same %args!
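As a rough illustration of that first step (a sketch only, not the actual code in MI::Modules::Base), the constructor could look something like this. The key names `instance' and `hint' and the Minical subclass are assumptions made for the example.

```perl
use strict;
use warnings;

{
    package MI::Modules::Base;

    # Take the args hash assembled by the renderer and return a
    # blessed reference to the very same hash.
    sub new {
        my ($class, $args) = @_;
        return bless $args, ref($class) || $class;
    }

    # Assumed accessors: messages can now go to the instance itself.
    sub instance { return $_[0]{instance} }
    sub hint     { return $_[0]{hint} }
}

{
    package MI::Modules::Minical;    # hypothetical module class
    our @ISA = ('MI::Modules::Base');
}

my %args   = (instance => 1001, hint => 'small');
my $object = MI::Modules::Minical->new(\%args);
print ref($object), ' ', $object->instance, ' ', $object->hint, "\n";
```

Existing module code that still expects a plain %args hash keeps working, because the blessed reference is still the same hash.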
--------------------------------------------------------------------------------
2. We use numbers to identify instances. This is okay for the moment, but
there will be trouble, as we are finding that we may need to allocate more than
the maximum of 1000 `system' objects per class. Also, using numbers makes it
difficult to understand the XML.
We need to implement `named objects', some sort of indexing on metadata
titles or ids, so as not to be relying on hard-wired instance-numbers.
It is not obvious that `Raw,900' is `Deployment Selector Sidebar -
hometest' - we need to refer to it by some other identifier in our code
and from other objects. Perhaps something like
or
not
which is meaningless.
Even better would be to provide multiple access mechanisms for an
object, e.g.
"creator:uk.cambridge.adams.douglas.hitchhikers_guide",
"category:art.literature.novel.fiction.science.hitchhikers_guide"
or something. I like to think that objects exist in nested `spaces',
perhaps we can implement this, it would be better than numbers.
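A minimal sketch of what multiple access mechanisms might look like follows. The MI::NameSpace class and its method names are invented for illustration, and the object is just a stand-in hash.

```perl
use strict;
use warnings;

{
    package MI::NameSpace;    # hypothetical class name

    sub new {
        my ($class) = @_;
        return bless { index => {} }, $class;
    }

    # Register one object under any number of names.
    sub register {
        my ($self, $object, @names) = @_;
        $self->{index}{$_} = $object for @names;
        return $object;
    }

    # Look an object up by name; returns undef if the name is unknown.
    sub lookup {
        my ($self, $name) = @_;
        return $self->{index}{$name};
    }
}

my $space = MI::NameSpace->new;
my $book  = { title => "The Hitchhiker's Guide" };    # stand-in object
$space->register(
    $book,
    'creator:uk.cambridge.adams.douglas.hitchhikers_guide',
    'category:art.literature.novel.fiction.science.hitchhikers_guide',
);
my $found = $space->lookup('creator:uk.cambridge.adams.douglas.hitchhikers_guide');
print $found->{title}, "\n";
```

Both names resolve to the same object, so nothing outside the index needs to know an instance number.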
--------------------------------------------------------------------------------
3. Most modules do not have an object-model and API, so we cannot write other
modules (or scripts) that make use of the services that they provide, and we
cannot implement completely different user-interfaces (such as command-line or
java-applet). We cannot easily integrate alternate backends for a module with
our system. We also cannot easily convert the data from one version (or
backend) of a module to work with the next version - with a proper OO API, this
is trivial to achieve.
We need to build good base libraries for low-level data-storage and
persistency. I have started to do this with MI::Data, MI::MetaData and
Rowlib, but these are not sufficient. We need general-purpose classes
that a module-writer can use as super-classes, and not have to worry
about how the data is being stored - and we need the facility to store
and transmit (groups of) objects without losing their class information
(like Java serialization, but using language-independent XML). Some of
the work I have done on `dbischema' will be useful for this; I have
developed a generic `class factory' for building classes with automatic
convenience methods, XML support, merging, diffing, and more (the
latter are pretty essential for groupware and versioning). There are
other such libraries in CPAN, and mentioned in the OO Perl book, from
which we can get ideas.
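To give a feel for the `class factory' idea, here is a much-reduced sketch. It is not the dbischema code; the make_class helper and the MI::Bookmark class are assumptions, and it only generates a constructor and combined get/set accessors.

```perl
use strict;
use warnings;

# Hypothetical helper: build a class with a constructor and one
# combined get/set accessor per named field.
sub make_class {
    my ($class, @fields) = @_;
    no strict 'refs';    # we install subs by name into the new package
    *{"${class}::new"} = sub {
        my ($pkg, %init) = @_;
        return bless {%init}, $pkg;
    };
    for my $field (@fields) {
        *{"${class}::$field"} = sub {
            my $self = shift;
            $self->{$field} = shift if @_;
            return $self->{$field};
        };
    }
    return $class;
}

make_class('MI::Bookmark', qw(title url));

my $mark = MI::Bookmark->new(title => 'Perl', url => 'http://www.perl.com/');
$mark->title('Perl home page');    # setter
print $mark->title, ' ', $mark->url, "\n";
```

XML support, merging and diffing would be generated in the same way, once per factory rather than once per class.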
There will not be any SQL or file access in the central part of a
MyInternet module, for the same reason that we will not have HTML in
the modules - the back- (and front-) ends may vary.
We need to ensure that developers write maintainable, documented,
quality OO code for modules and core (I am more guilty of not doing
this than anyone else, I promise you!). This means new coding
standards, `egoless programming' and code reviews of every bit of code
we write - at least in the beginning - but this need not be too much
trouble (Aris can do it ;). The most important thing in writing a
module is to give it a good OO design with a clean API - for low-level
data-access, for high-level functionality, and for the user-interface.
If the user can do it by clicking a button, a developer ought to be
able to do it by calling a method.
For example, if we have a bookmarks property, and the backend is
encapsulated with an OO API, it would be easy to add a new function to
upload existing Navigator and IE bookmarks files and integrate them, or
an automatic indexing function that determines which other people have
similar bookmarks to you (friend finder?), or a function that looks for
other webpages with a similar theme to those you have bookmarked, and
suggests them to you occasionally - you can accept or reject, and the
computer gets a better idea of your interests. When we add a new
`hierarchically categorised bookmarks' feature, and the backend
data-representation and user-interface change, there will be little or
no effect on these other `bookmarks clients'.
--------------------------------------------------------------------------------
4. Modules freely mix code for creating the user-interface (HTML, XML, CGI
forms) with code that performs functions and low-level data-access. This means
that only a developer can change the user-interface to a module, and that
maintenance becomes much more difficult. If we decide to completely change the
user-interface, or the backend, we have to scrap the other part also. Code
with fragments of XML strewn through it is not much better than code with
fragments of HTML strewn through it. Our modules need interfaces to their
interfaces!
We need to build libraries that encapsulate user-interface generation,
input and output, like the class-libraries used in `real' GUI toolkits
such as Tk, Java AWT / Swing, GTK+, etc. I think we should even go so
far as to implement callbacks (or signals) attached to buttons for
input - event/signal-driven OO is the most advanced paradigm to date,
and we need to start getting used to it. In this way, we can
completely isolate the CGI/HTTP dependent portion of our code, and in
future it will be easy to put a real Java or Tk interface where the
browser used to be.
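The callback idea could be sketched roughly as below. MI::UI::Button and its methods are made-up names; the point is only that the module attaches behaviour to a button object, and whichever front-end is in use (CGI, Tk, ...) is responsible for calling click.

```perl
use strict;
use warnings;

{
    package MI::UI::Button;    # hypothetical widget class

    sub new {
        my ($class, %opt) = @_;
        return bless { label => $opt{label}, handlers => [] }, $class;
    }

    # Attach a callback to be run when the button is clicked.
    sub on_click {
        my ($self, $callback) = @_;
        push @{ $self->{handlers} }, $callback;
        return $self;
    }

    # Called by the front-end layer when it detects a click.
    # Returns the number of handlers that were run.
    sub click {
        my ($self, @event) = @_;
        $_->($self, @event) for @{ $self->{handlers} };
        return scalar @{ $self->{handlers} };
    }
}

my @log;
my $save = MI::UI::Button->new(label => 'Save');
$save->on_click(sub {
    my ($button) = @_;
    push @log, "clicked $button->{label}";
});
$save->click;
print "$_\n" for @log;
```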
--------------------------------------------------------------------------------
5. It is a real problem that we are concentrating on XML representation, and
producing elaborate XML schemas without any `real' objects behind them. XML is
just a notation (and a buzzword), it doesn't provide instant good design.
Every class we write should be able to serialise its objects in an XML-based
language, but more importantly every XML tag that we invent should correspond
directly to a class, which has behaviour.
If we have an XML tag, for example, there had better be a
Perl (or whatever language) MI::List object that performs list
operations on it, can send events when it changes, etc. I would say
that it is appropriate to have objects behind even , and
tags. XML is an excellent format for data-exchange, but our
modules should not be producing it directly - they should be creating
structures of objects that have XML serialisation capability. We can
create these classes with a `class factory' like I use in dbischema,
and add specific functionality manually (the worst thing in OO is
writing all the `get' and `set' methods - but we don't need to!). This
technique worked really well for my dbischema tool, I have already
reused the database-schema class-family in a number of other programs:
in dbidump, to determine which fields are numeric for inserts in mSQL;
in the back-door mSQL table recovery hack; and in SINA 3.12 to emulate
the old sql.pl / sina.sql combination, that only took about 20 minutes!
One benefit if we abstract our UI in this way: entity escaping (a major
source of potential and actual bugs) will be automatic, and it will be
easy to shift to a persistent user-interface (point 7), replacing the
`connect, build, throw-away' technique we are using at the moment.
Another benefit - adding support for internationalisation and such
things will be easier. Modules will be much easier to read and
maintain. No more `let's see if I can get four varieties of quoting
all in one line' competitions.
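Here is a small sketch of objects with XML serialisation capability, to show how entity escaping becomes automatic. MI::XML::Element and the `list' and `item' tag names are assumptions; a real version would come out of the class factory rather than being written by hand.

```perl
use strict;
use warnings;

{
    package MI::XML::Element;    # hypothetical class

    my %ESCAPE = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

    # Escape the characters that are special in XML text and attributes.
    sub escape {
        my ($text) = @_;
        $text =~ s/([&<>"])/$ESCAPE{$1}/g;
        return $text;
    }

    sub new {
        my ($class, $tag, %attr) = @_;
        return bless { tag => $tag, attr => {%attr}, children => [] }, $class;
    }

    # Children may be other elements or plain strings.
    sub add {
        my ($self, @children) = @_;
        push @{ $self->{children} }, @children;
        return $self;
    }

    sub as_xml {
        my ($self) = @_;
        my $xml = "<$self->{tag}";
        for my $name (sort keys %{ $self->{attr} }) {
            $xml .= sprintf ' %s="%s"', $name, escape($self->{attr}{$name});
        }
        $xml .= '>';
        for my $child (@{ $self->{children} }) {
            $xml .= ref($child) ? $child->as_xml : escape($child);
        }
        return $xml . "</$self->{tag}>";
    }
}

my $list = MI::XML::Element->new('list', title => 'Fish & "Chips"');
$list->add(MI::XML::Element->new('item')->add('a < b'));
print $list->as_xml, "\n";
```

The module-writer never touches a quote or an ampersand; the escaping lives in one place.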
--------------------------------------------------------------------------------
6. A single instance may be viewed by multiple users, and even a single user
may see and use multiple `views' of a single instance at the same time, when
cross-referencing a multi-page resource, for example, or presenting data from a
science project in tabular and graphical form. There are different types of
`views' - the normal `object' view, `editor' view, `metadata' view and `access
control' view (these latter two are really views of associated `metadata' and
`acl' objects). Different people may choose to view an object in different
ways, and with different client software. This `multiple views' paradigm
(implemented in Smalltalk) is especially appropriate for a groupware
application such as MyInternet, where we (will) have millions of users,
communications spaces such as chatrooms and forums, and collaborative projects
such as family-trees and school projects. We currently have no coherent
support in the core for `view' objects (nor even `instance' objects yet!), nor
for multiple people to simultaneously edit a single object.
We need to implement separate `view' classes for our objects in
addition to the `model' classes (the API objects described in point 3).
The most obvious reason for this is that there are commonly many
`views' on a single object, and these views may need to keep state on
the server. Consider a curriculum-resource - many children are reading
it, but looking at different pages - we want to remember what page they
were up to last time. Perhaps we will allow them to highlight sections
and type comments in the margin. When someone is filling in a form, we
need to remember what they have entered already, and where they are up
to. This is all `view' state. There is already support for keeping
different types of state in the mi-core, but it is ugly, and not OO.
Every `mi-module' will need at least two (sets of) classes in its
implementation - the `object' and a `view': for example, the Calendar
module will have `object' classes that provide an API to the calendar
as a whole and also to its components - days, events, anniversaries,
etc. These make up the backend, and have NO user interface details
whatsoever - but there are methods to perform all the queries and
actions that the user interface will require, including searching,
selection, etc. All of the brains are here. These parts should be
`thread safe' so to speak, and support multiple clients.
In some complex cases, such as search-ranking logic, we might put a
lower-level layer between the brains and the back-end, rather than
putting all the brains in the back-end objects, so that if the back-end
changes, we can keep the same brains, and vice-versa - the structure
can vary from case to case, but we do need some structure!
Every module will also have a `view' class (like a `widget', but
actually a real object, with the ability to generate XML). Where part
of the `view' system for an object is implemented in a
language-other-than-Perl (for example XSL, client-side Java or
Java-script), then the Perl `view' class will be a proxy, transmitting
parts of the model and results of queries on it, probably in XML, to
the rest of the distributed view-object. In some cases, we might keep
state on our servers, in other cases on the client's computer, but we
need a uniform interface to views within MyInternet. Anything from the
model onwards should be `pluggable' - there are many ways to build an
interface, and not all of them utilise XSLT. I discovered this when
implementing the `raw' myinternet command for images and the like, I
had to hack around the renderer, and the result was very poor. Images,
when they require processing at all, definitely should not be
serialized to XML and put through an XSLT processor! is a bit scary!!
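The curriculum-resource case could be split along these lines. This is only a sketch: the class names, the view_for method (a simplification of new_view / get_view) and the plain-text render method are all assumptions. The `object' class knows nothing about presentation, and each `view' keeps one user's state.

```perl
use strict;
use warnings;

{
    package MI::Resource;    # hypothetical `object' class: no UI details

    sub new {
        my ($class, @pages) = @_;
        return bless { pages => [@pages], views => {} }, $class;
    }

    sub page_count { return scalar @{ $_[0]{pages} } }
    sub page       { return $_[0]{pages}[ $_[1] ] }

    # One persistent view per user, created on first request.
    sub view_for {
        my ($self, $user) = @_;
        return $self->{views}{$user} ||= MI::Resource::View->new($self, $user);
    }
}

{
    package MI::Resource::View;    # hypothetical `view' class: per-user state

    sub new {
        my ($class, $object, $user) = @_;
        return bless { object => $object, user => $user, page => 0 }, $class;
    }

    # Remember which page this user is up to.
    sub next_page {
        my ($self) = @_;
        $self->{page}++ if $self->{page} < $self->{object}->page_count() - 1;
        return $self;
    }

    sub render {
        my ($self) = @_;
        return sprintf '%s, page %d: %s', $self->{user}, $self->{page} + 1,
            $self->{object}->page($self->{page});
    }
}

my $resource = MI::Resource->new('Volcanoes', 'Earthquakes', 'Tsunamis');
$resource->view_for('alice')->next_page->next_page;
$resource->view_for('bob')->next_page;
print $resource->view_for($_)->render, "\n" for qw(alice bob);
```

A different view class (XML, Tk, command-line) could be plugged in without touching MI::Resource.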
--------------------------------------------------------------------------------
7. Every time the user connects and makes a request, a number of MyInternet
modules must build complex XML objects, send them to the renderer, and then
abandon all the work they have done. Some of the XML and HTML may be cached, if
they are regarded as static, but when a very slightly different view is
requested later (e.g. the user clicks on the head of a column to sort a table),
caching is no use, the entire XML structure must be generated and rendered
again, when really only a small amount of shuffling is required. This is most
evident in the PL object, which regenerates the whole page each time (although
individual objects will be cached) - there is no easy way at the moment to get
it to update just a part of the display, for a client that supports such
dynamic rendering.
There are two reasons for this - the granularity of our objects is too
large, and the present renderer-pipeline process is one-way, one-time
for each object. We need to build persistent `view' (user-interface)
objects into the system, distributed across the server and client as
appropriate. In some cases, all the `persistence' might be on the
client-side - but this is extreme, not the normal situation we have
become accustomed to working with CGI.
A calendar `view' might aggregate other views, down to the `table',
`hbox', `button' and `hyperlink' level. The calendar view might not
simply produce XML, it may build a tree of objects representing its
interface. You might think that there would be performance issues with
doing this, but I think that taking this road will improve performance.
Perl can comfortably support millions of objects in RAM, and it will be
easy to write an `automatic caching' function in our base-class, such
that objects (and trees of objects) that are not used for a long time
get swapped out to disc (as XML) or abandoned. If an object is still
referenced in memory, when it gets swapped out, we can leave a
place-holder that, on any method call, loads the real object into its
place, re-blesses itself, then calls the real method. Sounds like a
hack, but it would work wonders, and we get the advantage of real OO
and automatic caching for every object in the system that needs it (one
more thing module-writers don't need to worry about). We can build
distribution support, undo/redo and conflict-resolution support into
the base objects, so we can do groupware - you change an object in
Australia, it sends an event to the mirror object in the UK - if
there's a conflict, and the system can't do a merge, the changes
bounce. We can implement XML-based RPC and mobile-objects, eventually
mobile agents (e.g. for remote queries).
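The place-holder trick can be sketched with AUTOLOAD, as below. Everything here is an assumption made for illustration: the class names, the swap_out method, and the %DISC hash, which stands in for objects written out to disc as XML.

```perl
use strict;
use warnings;

my %DISC;    # stands in for objects swapped out to disc

{
    package MI::Event;    # hypothetical `real' class
    sub new   { my ($class, %f) = @_; return bless {%f}, $class }
    sub title { return $_[0]{title} }
}

{
    package MI::Placeholder;    # hypothetical stand-in class
    our $AUTOLOAD;

    # Turn a live object, in place, into a place-holder.
    sub swap_out {
        my ($class, $object, $key) = @_;
        $DISC{$key} = { class => ref($object), data => {%$object} };
        %$object = (key => $key);
        return bless $object, $class;
    }

    sub DESTROY { }    # keep object destruction away from AUTOLOAD

    # Any other method call: load the real data back, re-bless,
    # then jump to the real method with the original arguments.
    sub AUTOLOAD {
        my $self = $_[0];
        my ($method) = $AUTOLOAD =~ /([^:]+)\z/;
        my $saved = delete $DISC{ $self->{key} };
        %$self = %{ $saved->{data} };
        bless $self, $saved->{class};
        my $code = $self->can($method)
            or die "No method $method in $saved->{class}";
        goto &$code;
    }
}

my $event = MI::Event->new(title => 'Sports day');
MI::Placeholder->swap_out($event, 'event-1');
print ref($event), "\n";      # MI::Placeholder
print $event->title, "\n";    # triggers the reload
print ref($event), "\n";      # MI::Event
```

Because the blessing belongs to the underlying hash and not to the variable, every part of the system holding a reference sees the real object again after the first method call.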
We need to anticipate supporting dynamic clients, which can keep direct
connections open to the MI server, and communicate continuously in both
directions. Then we will have real event-driven groupware.
This is the sort of exciting stuff that I WANT to be programming. I
know I can do it, and I know the rest of the developers here can do it,
because it will give us something to be excited about. The main
problem with this company is that the developers are bored and
frustrated!
--------------------------------------------------------------------------------
8. Testing is currently a haphazard and painful process, because we cannot
easily do unit-testing on our non-modular code, and are forced to test through
the browser/web-server/mid pipe.
Object-oriented code should be much easier to test and debug - and we
can build testing support into the framework. In Perl, we can make
generic `dummy' objects, that query the developer for input whenever a
method is called, and generic `trace' wrappers, so that we can get a
record of messages on a particular object, or even through the whole
system. If our objects are descended from a common ancestor that can
provide these services (and the other services I have described
already), we can provide debugging options to say `all Minical::Event
objects will be dummies', or `trace all messages to and from the State
object' (we could achieve the `from' bit using `caller' and a global
trace, it would be better still if we could put a trap on messages
going out of an object - but I don't know how!).
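A generic `trace' wrapper might look roughly like this. MI::Trace, MI::State and the @TRACE log are invented names; the sketch records incoming messages and uses `caller' for the `from' part, and does not attempt to trap outgoing messages.

```perl
use strict;
use warnings;

my @TRACE;    # global record of messages

{
    package MI::Trace;    # hypothetical wrapper class
    our $AUTOLOAD;

    sub wrap {
        my ($class, $target) = @_;
        return bless { target => $target }, $class;
    }

    sub DESTROY { }    # keep object destruction away from AUTOLOAD

    # Record every method call, then pass it on to the wrapped object.
    sub AUTOLOAD {
        my ($self, @args) = @_;
        my ($method) = $AUTOLOAD =~ /([^:]+)\z/;
        my ($from)   = caller;    # package that sent the message
        push @TRACE, sprintf '%s->%s(%s) called from %s',
            ref($self->{target}), $method, join(', ', @args), $from;
        return $self->{target}->$method(@args);
    }
}

{
    package MI::State;    # hypothetical object to be traced
    sub new { my ($class) = @_; return bless { data => {} }, $class }
    sub set { my ($self, $key, $value) = @_; return $self->{data}{$key} = $value }
    sub get { my ($self, $key) = @_; return $self->{data}{$key} }
}

my $state = MI::Trace->wrap(MI::State->new);
$state->set(page => 3);
print $state->get('page'), "\n";
print "$_\n" for @TRACE;
```

A `dummy' object would be the same shape, with AUTOLOAD asking the developer for a return value instead of forwarding the call.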
Another feature that we should build in the foundations is listener /
observer support - it should be possible to put a listener on any
attribute of any object, and on structures like lists and trees. Then
we can have some powerful event-based stuff happening, like automatic
notification of email, incremental translation / compilation, etc.
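Listener support on attributes could start from something as simple as the following sketch. MI::Observable and its method names are assumptions, and listeners on lists and trees are left out.

```perl
use strict;
use warnings;

{
    package MI::Observable;    # hypothetical base class

    sub new {
        my ($class, %fields) = @_;
        return bless { fields => {%fields}, listeners => {} }, $class;
    }

    sub get { my ($self, $attr) = @_; return $self->{fields}{$attr} }

    # Put a listener on one attribute.
    sub add_listener {
        my ($self, $attr, $callback) = @_;
        push @{ $self->{listeners}{$attr} }, $callback;
        return $self;
    }

    # Change an attribute and tell everyone who is listening to it.
    sub set {
        my ($self, $attr, $value) = @_;
        my $old = $self->{fields}{$attr};
        $self->{fields}{$attr} = $value;
        $_->($self, $attr, $old, $value) for @{ $self->{listeners}{$attr} || [] };
        return $value;
    }
}

my @notices;
my $mailbox = MI::Observable->new(unread => 0);
$mailbox->add_listener(unread => sub {
    my ($object, $attr, $old, $new) = @_;
    push @notices, "$attr changed from $old to $new";
});
$mailbox->set(unread => 1);
$mailbox->set(subject => 'hello');    # no listener on this attribute
print "$_\n" for @notices;
```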
--------------------------------------------------------------------------------
9. Rationalise MI::MetaData and ACLs
I would like to split the ACL information out of `metadata' into a separate
object. Currently they are together only because the ACLs are stored in the
same file as the metadata, but this will change.
Many of the methods in MI::MetaData should not be there, as they relate to
aspects of instances and modules unrelated to metadata. Parallel to this
problem, the "METADATA" branch of the MI::Data tree really contains all sorts
of instance data, far more than just metadata (version control, ACLs, instance
data, user/instance data, etc.).
We should not be passing raw strings like
"object PL 1 snc internal hacking sam"
to the acl_test method, we need a more abstract interface so we can add new
dimensions to the ACL system without breaking old code. We need to have a "can
we 'blah'" message, passing along the main (connection?) object so that the ACL
system can interrogate whatever information is relevant to the question - the
`test_acl_user' was a small step in this direction.
Groups - we need to support multiple groups for a single user - then groups
could be used for `project groups', `family groups', `year-level groups',
`special-interest groups' etc.
Does an ACL really belong to an instance? It could also be argued that it
belongs just as much to a user - it's a relational thing. We may want to have
rules like `Johnny is not allowed to use any MI games or chatrooms between the
hours of 5:00pm and 7:00pm on weeknights, because he's supposed to be doing his
homework.' Or `Isabelle is not allowed to see any pictures from outside the
educache' (well, this is beyond MI at the moment). We should at least lay the
foundations for such fascism, with a relational / rules-based ACL system!
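To make the rules idea concrete, here is a very small sketch. The MI::ACL class, the shape of a rule (a code reference returning 1, 0 or undef for `no opinion') and the fields of the context hash are all assumptions; in the real system the context would be the main connection object. The first rule with an opinion wins, and the default is to deny.

```perl
use strict;
use warnings;

{
    package MI::ACL;    # hypothetical rules engine

    sub new {
        my ($class) = @_;
        return bless { rules => [] }, $class;
    }

    sub insert {
        my ($self, $rule) = @_;
        push @{ $self->{rules} }, $rule;
        return $self;
    }

    # "Can we 'blah'?" - ask each rule in turn.
    sub test {
        my ($self, $action, $context) = @_;
        for my $rule (@{ $self->{rules} }) {
            my $verdict = $rule->($action, $context);
            return $verdict if defined $verdict;
        }
        return 0;    # strict: deny unless some rule allows
    }
}

my $acl = MI::ACL->new;

# Johnny may not play games from 5pm to 7pm on weeknights.
# weekday: 0 = Sunday ... 6 = Saturday (assumed convention).
$acl->insert(sub {
    my ($action, $who) = @_;
    return unless $who->{user} eq 'johnny' && $action eq 'play_game';
    my $weeknight = $who->{weekday} >= 1 && $who->{weekday} <= 5;
    return 0 if $weeknight && $who->{hour} >= 17 && $who->{hour} < 19;
    return;
});

# Otherwise, anyone may play games.
$acl->insert(sub { return $_[0] eq 'play_game' ? 1 : undef });

my %homework_time = (user => 'johnny', weekday => 2, hour => 18);
print $acl->test('play_game', \%homework_time) ? "allowed\n" : "denied\n";
```

New dimensions (groups, containers, time of day) become new fields in the context and new rules, without changing the callers of test.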
I think ACLs should be strict, and checked by the system, not by each container
object as it sees fit.
The two `container' fields in the current ACL schema bother me. Firstly, they
are not enforced (I tried writing code to enforce them recently, and everything
broke). Secondly, if there is a technical or political reason why an object
must not be used outside a certain container, this is a different dimension to
the normal access control. It makes little sense to say that one particular
person may view Minical,1001 only inside PL,1, but that the rest of the
teaching group can view it only inside PL,2. We will need a new, relational
back-end on the ACL system.
I think that the (to be written) indexing / search system should be integrated
with the ACL system, as search queries will typically involve both ACL and
metadata / content constraints and weighted factors.
-----------------------------------------------------------------------------------
Here is a very rough sketch of some initial proposed changes to the
object-model:
Classes (modules):
------------------
Currently we are using metadata in instance 0 to describe the class as
a whole, e.g. the "edit" ACL in instance 0 metadata describes who can
create the object. (incidentally, this should definitely be changed to
a `create' ACL) This class metadata should be accessible through the
`class' object.
$class->
new_object - this is to create a new object
get_object - this is to access an object that's already been
created
metadata ? - description of class, etc.
acl ? - can you create instances, etc.
new_view(type)
get_view(id)
Objects (instances):
--------------------
$object->
new_view(type)
get_view(id)
number - this is the `instance number', hopefully to
be eliminated!
version
metadata
get(key)
set(key, value)
...
acl - this is a `closure' on the singleton ACL/indexing engine
test(predicate)
insert(rule)
...
class
...
---
additional methods to access sub-components of the object, and to
access the `body' of the object if it is a component, etc.
Views (types include - view, edit, configure, help, etc.):
----------------------------------------------------------
$view->
size - hint (large, small, single-line)
object - this accesses the object's data (above)
...
user
...
connection
...
---
additional methods to access sub-components of the view, and to
access the `body' of the view if it is a component, etc.
----
That's the end of my ravings for the moment.
Sam