Proposed Changes to the Core for MyInternet 6
---------------------------------------------

Author: Sam Watkins
Date:   30/10/2000


We are working on a new object-oriented infrastructure and coding-standard for
MyInternet version 6.  There is going to be a fair bit of work to achieve this,
but the end result is that the next version of our product will be flexible,
powerful and maintainable.  We will have a world-class groupware platform, to
put others to shame.  It is important to make the core of MyInternet as good as
we can, because this is the foundation for all of our other development work.
We need to fix it up now before it's too late!

I'm going to describe some of the major problems with the current modules and
infrastructure, and propose solutions to them.  I'd like to say right now that
the criticisms I am going to make apply to my own code at least as much as to
anyone else's - some of my code in MI has been pretty shoddily put together
(e.g. the instance editor, and the curriculum-resource module!).


--------------------------------------------------------------------------------

1. We currently do not use real Perl objects to represent MI objects
(instances), only whole classes (modules).  All messages that should be
directed to instances are sent to the module as a whole, with parameters
(%args) which include the instance number, type of view (hint), etc.  This
seems like a small problem, but it is the most fundamental and serious fault in
MI at the moment: it is preventing developers from writing good OO code.

	Fortunately, this problem is easy to fix (we should have done it ages
	ago) - but we need to re-write existing modules in a more OO style to
	take advantage of this change, and the other changes I am proposing.

	Currently the renderer (ExpatProcessor.pm) is assembling the %args list
	that gets passed to the modules.  To begin with, I have added a new
	`constructor' in MI::Modules::Base, which will take these args as a
	parameter, and for the moment, return a blessed reference to the very
	same %args!
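
	A minimal sketch of what such a constructor might look like.  The
	accessor names (`instance', `hint') and the MI::Modules::Raw subclass
	are assumptions for illustration only, not the actual code:

```perl
use strict;
use warnings;

package MI::Modules::Base;

# Sketch of the constructor: takes the %args assembled by the renderer
# (passed as a hash reference) and returns a blessed reference to that
# very same hash.
sub new {
    my ($class, $args) = @_;
    return bless $args, ref($class) || $class;
}

# Illustrative accessors; the key names are assumed, not taken from MI.
sub instance { $_[0]{instance} }
sub hint     { $_[0]{hint} }

package MI::Modules::Raw;    # hypothetical module class
our @ISA = ('MI::Modules::Base');

package main;

my %args = (instance => 900, hint => 'small');
my $obj  = MI::Modules::Raw->new(\%args);

# Messages can now go to the instance rather than to the module as a whole.
print ref($obj), " ", $obj->instance, " ", $obj->hint, "\n";
```

	Existing code that still expects a plain %args hash keeps working,
	because the blessed reference is the same hash underneath.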

--------------------------------------------------------------------------------

2. We use numbers to identify instances.  This is okay for the moment, but
there will be trouble, as we are finding that we may need to allocate more than
the maximum of 1000 `system' objects per class.  Also, using numbers makes it
difficult to understand the XML.

	We need to implement `named objects', some sort of indexing on metadata
	titles or ids, so as not to be relying on hard-wired instance-numbers.
	It is not obvious that `Raw,900' is `Deployment Selector Sidebar -
	hometest' - we need to refer to it by some other identifier in our code
	and from other objects.  Perhaps something like

       <mi:object class="Raw" id="DeploymentSelector hometest"/>
    or <mi:object class="DeploymentSelector" title="hometest"/>
   not <mi:object class="Raw" instance="900"/>
	which is meaningless.

	Even better would be to provide multiple access mechanisms for an
	object, e.g.
		"creator:uk.cambridge.adams.douglas.hitchhikers_guide",
		"category:art.literature.novel.fiction.science.hitchhikers_guide"
	or something.  I like to think that objects exist in nested `spaces';
	perhaps we can implement this - it would be better than numbers.
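
	As a rough illustration of `multiple access mechanisms', here is a
	sketch of a registry that maps several names onto one object.  The
	MI::Registry class and its methods are hypothetical; a real version
	would presumably be backed by the metadata index, not a hash in RAM:

```perl
use strict;
use warnings;

package MI::Registry;    # hypothetical name

sub new { return bless { by_name => {} }, shift }

# Register one object under any number of "space:dotted.path" names.
sub register {
    my ($self, $object, @names) = @_;
    $self->{by_name}{$_} = $object for @names;
    return $object;
}

# Look an object up by any one of its names; undef if unknown.
sub lookup {
    my ($self, $name) = @_;
    return $self->{by_name}{$name};
}

package main;

my $registry = MI::Registry->new;
my $book     = { title => "Hitchhiker's Guide" };   # stand-in for a real object

$registry->register($book,
    'creator:uk.cambridge.adams.douglas.hitchhikers_guide',
    'category:art.literature.novel.fiction.science.hitchhikers_guide',
);

my $found = $registry->lookup(
    'creator:uk.cambridge.adams.douglas.hitchhikers_guide');
print $found->{title}, "\n";
```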

--------------------------------------------------------------------------------

3. Most modules do not have an object-model and API, so we cannot write other
modules (or scripts) that make use of the services that they provide, and we
cannot implement completely different user-interfaces (such as command-line or
java-applet).  We cannot easily integrate alternate backends for a module with
our system.  We also cannot easily convert the data from one version (or
backend) of a module to work with the next version - with a proper OO API, this
is trivial to achieve.

	We need to build good base libraries for low-level data-storage and
	persistency.  I have started to do this with MI::Data, MI::MetaData and
	Rowlib, but these are not sufficient.  We need general-purpose classes
	that a module-writer can use as super-classes, and not have to worry
	about how the data is being stored - and we need the facility to store
	and transmit (groups of) objects without losing their class information
	(like Java serialization, but using language-independent XML).  Some of
	the work I have done on `dbischema' will be useful for this; I have
	developed a generic `class factory' for building classes with automatic
	convenience methods, XML support, merging, diffing, and more (the
	latter are pretty essential for groupware and versioning).  There are
	other such libraries in CPAN, and mentioned in the OO Perl book, from
	which we can get ideas.
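
	To give the flavour of a `class factory', here is a much-simplified
	sketch (it is not the dbischema code).  The MI::ClassFactory name, the
	make_class function and the XML layout are all assumptions:

```perl
use strict;
use warnings;

package MI::ClassFactory;    # hypothetical name

# Build a class with a constructor, one get/set accessor per field, and a
# to_xml method that keeps the class name in the output.
sub make_class {
    my ($class_name, @fields) = @_;
    no strict 'refs';        # we install subs by name
    *{"${class_name}::new"} = sub {
        my ($class, %init) = @_;
        return bless { %init }, $class;
    };
    for my $field (@fields) {
        *{"${class_name}::$field"} = sub {
            my $self = shift;
            $self->{$field} = shift if @_;
            return $self->{$field};
        };
    }
    *{"${class_name}::to_xml"} = sub {
        my $self = shift;
        my $xml  = qq{<object class="$class_name">};
        for my $field (@fields) {
            my $value = defined $self->{$field} ? $self->{$field} : '';
            $value =~ s/&/&amp;/g;
            $value =~ s/</&lt;/g;
            $value =~ s/>/&gt;/g;
            $xml .= qq{<$field>$value</$field>};
        }
        return $xml . '</object>';
    };
    return $class_name;
}

package main;

MI::ClassFactory::make_class('Bookmark', qw(title url));

my $mark = Bookmark->new(title => 'Q&A', url => 'http://example.org/');
$mark->title('Perl Q&A');    # generated accessor used as a setter
print $mark->to_xml, "\n";
# <object class="Bookmark"><title>Perl Q&amp;A</title><url>http://example.org/</url></object>
```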

	There will not be any SQL or file access in the central part of a
	MyInternet module, for the same reason that we will not have HTML in
	the modules - the back- (and front-) ends may vary.

	We need to ensure that developers write maintainable, documented,
	quality OO code for modules and core (I am more guilty of not doing
	this than anyone else, I promise you!).  This means new coding
	standards, `egoless programming' and code reviews of every bit of code
	we write - at least in the beginning - but this need not be too much
	trouble (Aris can do it ;).  The most important thing in writing a
	module is to give it a good OO design with a clean API - for low-level
	data-access, for high-level functionality, and for the user-interface.
	If the user can do it by clicking a button, a developer ought to be
	able to do it by calling a method.

	For example, if we have a bookmarks property, and the backend is
	encapsulated with an OO API, it would be easy to add a new function to
	upload existing Navigator and IE bookmarks files and integrate them, or
	an automatic indexing function that determines which other people have
	similar bookmarks to you (friend finder?), or a function that looks for
	other webpages with a similar theme to those you have bookmarked, and
	suggests them to you, occasionally - you can accept or reject, and the
	computer gets a better idea of your interests.  When we add a new
	`hierarchically categorised bookmarks' feature, and the backend
	data-representation and user-interface change, there will be little or
	no effect on these other `bookmarks clients'.
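
	A small sketch of the idea, assuming a hypothetical Bookmarks API class
	in front of a swappable backend (here just an in-memory list).  The
	`overlap' method stands in for one of the `bookmarks clients' above; it
	only ever talks to the API, never to the storage:

```perl
use strict;
use warnings;

package Bookmarks::MemoryBackend;    # hypothetical; a file or SQL backend
                                     # would offer the same two methods
sub new   { return bless { urls => [] }, shift }
sub store { my ($self, $url) = @_; push @{ $self->{urls} }, $url; return }
sub all   { return @{ $_[0]{urls} } }

package Bookmarks;                   # hypothetical API class

sub new {
    my ($class, $backend) = @_;
    return bless { backend => $backend }, $class;
}

sub add {
    my ($self, @urls) = @_;
    $self->{backend}->store($_) for @urls;
    return $self;
}

sub urls { return $_[0]{backend}->all }

# A `bookmarks client': count the URLs two users have in common.
sub overlap {
    my ($self, $other) = @_;
    my %mine = map { ($_ => 1) } $self->urls;
    return scalar grep { $mine{$_} } $other->urls;
}

package main;

my $user_a = Bookmarks->new(Bookmarks::MemoryBackend->new)
    ->add(qw(http://a.example/ http://b.example/));
my $user_b = Bookmarks->new(Bookmarks::MemoryBackend->new)
    ->add(qw(http://b.example/ http://c.example/));

print $user_a->overlap($user_b), "\n";    # 1
```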

--------------------------------------------------------------------------------

4. Modules freely mix code for creating the user-interface (HTML, XML, CGI
forms) with code that performs functions and low-level data-access.  This means
that only a developer can change the user-interface to a module, and that
maintenance becomes much more difficult.  If we decide to completely change the
user-interface, or the backend, we have to scrap the other part also.  Code
with fragments of XML strewn through it is not much better than code with
fragments of HTML strewn through it.  Our modules need interfaces to their
interfaces!

	We need to build libraries that encapsulate user-interface generation,
	input and output, like the class-libraries used in `real' GUI toolkits
	such as Tk, Java AWT / Swing, GTK+, etc.  I think we should even go so
	far as to implement callbacks (or signals) attached to buttons for
	input - event/signal-driven OO is the most advanced paradigm to date,
	and we need to start getting used to it.  In this way, we can
	completely isolate the CGI/HTTP dependent portion of our code, and in
	future it will be easy to put a real Java or Tk interface where the
	browser used to be.
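
	Here is a sketch of callbacks attached to buttons, with the CGI-specific
	part confined to one small dispatcher.  The class names (MI::UI::Button,
	MI::UI::CGIDispatcher) and the shape of the form parameters are
	assumptions for illustration:

```perl
use strict;
use warnings;

package MI::UI::Button;    # hypothetical widget class

sub new {
    my ($class, %opt) = @_;
    return bless {
        name     => $opt{name},
        label    => $opt{label},
        handlers => [],
    }, $class;
}

# Attach a callback to be run when the button is pressed.
sub on_click {
    my ($self, $callback) = @_;
    push @{ $self->{handlers} }, $callback;
    return $self;
}

# Called by whichever front-end detects the press (CGI, Tk, ...).
sub click {
    my $self = shift;
    $_->($self) for @{ $self->{handlers} };
    return;
}

package MI::UI::CGIDispatcher;    # the only part that knows about forms

# Fire the handlers of every button whose name appears in the submitted
# form parameters.
sub dispatch {
    my ($params, @buttons) = @_;
    for my $button (@buttons) {
        $button->click if exists $params->{ $button->{name} };
    }
    return;
}

package main;

my @log;
my $save = MI::UI::Button->new(name => 'save', label => 'Save')
    ->on_click(sub { push @log, "saved via $_[0]{label}" });
my $cancel = MI::UI::Button->new(name => 'cancel', label => 'Cancel')
    ->on_click(sub { push @log, 'cancelled' });

# Pretend the browser submitted a form with the `save' button pressed.
MI::UI::CGIDispatcher::dispatch({ save => 'Save' }, $save, $cancel);
print "$_\n" for @log;    # saved via Save
```

	Swapping the browser for a Tk or Java front-end would then mean
	replacing the dispatcher only; the handlers stay as they are.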

--------------------------------------------------------------------------------

5. It is a real problem that we are concentrating on XML representation, and
producing elaborate XML schemas without any `real' objects behind them.  XML is
just a notation (and a buzzword), it doesn't provide instant good design.
Every class we write should be able to serialise its objects in an XML-based
language, but more importantly every XML tag that we invent should correspond
directly to a class, which has behaviour.

	If we have an XML <mi:list> tag, for example, there had better be a
	Perl (or whatever language) MI::List object that performs list
	operations on it, can send events when it changes, etc.  I would say
	that it is appropriate to have objects behind even <a></a>, <b></b> and
	<br/> tags.  XML is an excellent format for data-exchange, but our
	modules should not be producing it directly - they should be creating
	structures of objects that have XML serialisation capability.  We can
	create these classes with a `class factory' like I use in dbischema,
	and add specific functionality manually (the worst thing in OO is
	writing all the `get' and `set' methods - but we don't need to!).  This
	technique worked really well for my dbischema tool; I have already
	reused the database-schema class-family in a number of other programs:
	in dbidump, to determine which fields are numeric for inserts in mSQL;
	in the back-door mSQL table recovery hack; and in SINA 3.12 to emulate
	the old sql.pl / sina.sql combination - that only took about 20 minutes!

	One benefit if we abstract our UI in this way: entity escaping (a major
	source of potential and actual bugs) will be automatic, and it will be
	easy to shift to a persistent user-interface (point 7), replacing the
	`connect, build, throw-away' technique we are using at the moment.
	Another benefit - adding support for internationalisation and such
	things will be easier.  Modules will be much easier to read and
	maintain.  No more `let's see if I can get four varieties of quoting
	all in one line' competitions.
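
	To show how entity escaping becomes automatic once modules build
	objects instead of strings, here is a sketch of a generic element
	class.  MI::XML::Element and its methods are hypothetical names:

```perl
use strict;
use warnings;

package MI::XML::Element;    # hypothetical name

sub new {
    my ($class, $tag, %attrs) = @_;
    return bless { tag => $tag, attrs => {%attrs}, children => [] }, $class;
}

# Children may be other elements or plain text strings.
sub add {
    my ($self, @children) = @_;
    push @{ $self->{children} }, @children;
    return $self;
}

sub escape {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Serialise the tree; all text and attribute values are escaped here, so
# module code never has to think about quoting.
sub to_xml {
    my $self = shift;
    my $xml  = "<$self->{tag}";
    for my $name (sort keys %{ $self->{attrs} }) {
        $xml .= sprintf ' %s="%s"', $name, escape($self->{attrs}{$name});
    }
    return "$xml/>" unless @{ $self->{children} };
    $xml .= '>';
    for my $child (@{ $self->{children} }) {
        $xml .= ref($child) ? $child->to_xml : escape($child);
    }
    return "$xml</$self->{tag}>";
}

package main;

my $list = MI::XML::Element->new('mi:list', title => 'Fish & "Chips"');
$list->add(MI::XML::Element->new('mi:item')->add('1 < 2'));
$list->add(MI::XML::Element->new('br'));
print $list->to_xml, "\n";
# <mi:list title="Fish &amp; &quot;Chips&quot;"><mi:item>1 &lt; 2</mi:item><br/></mi:list>
```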

--------------------------------------------------------------------------------

6. A single instance may be viewed by multiple users, and even a single user
may see and use multiple `views' of a single instance at the same time, when
cross-referencing a multi-page resource, for example, or presenting data from a
science project in tabular and graphical form.  There are different types of
`views' - the normal `object' view, `editor' view, `metadata' view and `access
control' view (these latter two are really views of associated `metadata' and
`acl' objects).  Different people may choose to view an object in different
ways, and with different client software.  This `multiple views' paradigm
(implemented in Smalltalk) is especially appropriate for a groupware
application such as MyInternet, where we (will) have millions of users,
communications spaces such as chatrooms and forums, and collaborative projects
such as family-trees and school projects.  We currently have no coherent
support in the core for `view' objects (nor even `instance' objects yet!), nor
for multiple people to simultaneously edit a single object.

	We need to implement separate `view' classes for our objects in
	addition to the `model' classes (the API objects described in point 3).
	The most obvious reason for this is that there are commonly many
	`views' on a single object, and these views may need to keep state on
	the server.  Consider a curriculum-resource - many children are reading
	it, but looking at different pages - we want to remember what page they
	were up to last time.  Perhaps we will allow them to highlight sections
	and type comments in the margin.  When someone is filling in a form, we
	need to remember what they have entered already, and where they are up
	to.  This is all `view' state.  There is already support for keeping
	different types of state in the mi-core, but it is ugly, and not OO.
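
	A sketch of the split, using the curriculum-resource example: one
	`model' object shared by everyone, and one `view' object per reader
	holding that reader's page.  The class names Resource and
	Resource::View are assumptions for illustration:

```perl
use strict;
use warnings;

package Resource;          # hypothetical `model' class: no UI details

sub new   { my ($class, @pages) = @_; return bless { pages => [@pages] }, $class }
sub page  { return $_[0]{pages}[ $_[1] ] }
sub pages { return scalar @{ $_[0]{pages} } }

package Resource::View;    # hypothetical `view' class: per-user state

sub new {
    my ($class, %opt) = @_;
    return bless {
        object  => $opt{object},
        user    => $opt{user},
        current => 0,          # index of the page this user is up to
    }, $class;
}

sub next_page {
    my $self = shift;
    $self->{current}++ if $self->{current} < $self->{object}->pages() - 1;
    return $self;
}

sub render {
    my $self = shift;
    return sprintf '%s is reading page %d: %s',
        $self->{user}, $self->{current} + 1,
        $self->{object}->page($self->{current});
}

package main;

my $resource = Resource->new('Intro', 'Volcanoes', 'Quiz');
my $anna = Resource::View->new(object => $resource, user => 'anna');
my $ben  = Resource::View->new(object => $resource, user => 'ben');

$anna->next_page->next_page;
$ben->next_page;

print $_->render, "\n" for $anna, $ben;
# anna is reading page 3: Quiz
# ben is reading page 2: Volcanoes
```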

	Every `mi-module' will need at least two (sets of) classes in its
	implementation - the `object' and a `view': for example, the Calendar
	module will have `object' classes that provide an API to the calendar
	as a whole and also to its components - days, events, anniversaries,
	etc.  These make up the backend, and have NO user interface details
	whatsoever - but there are methods to perform all the queries and
	actions that the user interface will require, including searching,
	selection, etc.  All of the brains are here.  These parts should be
	`thread safe' so to speak, and support multiple clients.

	In some complex cases, such as search-ranking logic, we might put a
	lower-level layer between the brains and the back-end, rather than
	putting all the brains in the back-end objects, so that if the back-end
	changes, we can keep the same brains, and vice-versa - the structure
	can vary from case to case, but we do need some structure!

	Every module will also have a `view' class (like a `widget', but
	actually a real object, with the ability to generate XML).  Where part
	of the `view' system for an object is implemented in a
	language-other-than-Perl (for example XSL, client-side Java or
	Java-script), then the Perl `view' class will be a proxy, transmitting
	parts of the model and results of queries on it, probably in XML, to
	the rest of the distributed view-object.  In some cases, we might keep
	state on our servers, in other cases on the client's computer, but we
	need a uniform interface to views within MyInternet.  Anything from the
	model onwards should be `pluggable' - there are many ways to build an
	interface, and not all of them utilise XSLT.  I discovered this when
	implementing the `raw' myinternet command for images and the like: I
	had to hack around the renderer, and the result was very poor.  Images,
	when they require processing at all, definitely should not be
	serialized to XML and put through an XSLT processor!  <pixel
	colour="red"/> is a bit scary!!

--------------------------------------------------------------------------------

7. Every time the user connects and makes a request, a number of MyInternet
modules must build complex XML objects, send them to the renderer, and then
abandon all the work they have done.  Some of the XML and HTML may be cached, if
they are regarded as static, but when a very slightly different view is
requested later (e.g. the user clicks on the head of a column to sort a table),
caching is no use: the entire XML structure must be generated and rendered
again, when really only a small amount of shuffling is required.  This is most
evident in the PL object, which regenerates the whole page each time (although
individual objects will be cached) - there is no easy way at the moment to get
it to update just a part of the display, for a client that supports such
dynamic rendering.

	There are two reasons for this - the granularity of our objects is too
	large, and the present renderer-pipeline process is one-way, one-time
	for each object.  We need to build persistent `view' (user-interface)
	objects into the system, distributed across the server and client as
	appropriate.  In some cases, all the `persistence' might be on the
	client-side - but that is the extreme case, not the normal situation
	we have become accustomed to when working with CGI.

	A calendar `view' might aggregate other views, down to the `table',
	`hbox', `button' and `hyperlink' level.  The calendar view might not
	simply produce XML, it may build a tree of objects representing its
	interface.  You might think that there would be performance issues with
	doing this, but I think that taking this road will improve performance.
	Perl can comfortably support millions of objects in RAM, and it will be
	easy to write an `automatic caching' function in our base-class, such
	that objects (and trees of objects) that are not used for a long time
	get swapped out to disc (as XML) or abandoned.  If an object is still
	referenced in memory, when it gets swapped out, we can leave a
	place-holder that, on any method call, loads the real object into its
	place, re-blesses itself, then calls the real method.  Sounds like a
	hack, but it would work wonders, and we get the advantage of real OO
	and automatic caching for every object in the system that needs it (one
	more thing module-writers don't need to worry about).  We can build
	distribution support, undo/redo and conflict-resolution support into
	the base objects, so we can do groupware - you change an object in
	Australia, it sends an event to the mirror object in the UK - if
	there's a conflict, and the system can't do a merge, the changes
	bounce.  We can implement XML-based RPC and mobile-objects, eventually
	mobile agents (e.g. for remote queries).
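
	The place-holder trick can be sketched with AUTOLOAD.  This assumes
	hash-based objects, and uses an in-memory %store as a stand-in for the
	disc (XML) cache; MI::Placeholder and swap_out are hypothetical names:

```perl
use strict;
use warnings;

package MI::Placeholder;    # hypothetical name
our $AUTOLOAD;
my %store;                  # stand-in for objects swapped out to disc

# Empty the object in place and re-bless it as a place-holder, so that
# every existing reference to it stays valid.
sub swap_out {
    my ($class, $object, $key) = @_;
    $store{$key} = { class => ref($object), data => { %$object } };
    %$object = (key => $key);
    return bless $object, $class;
}

# Any method call on a place-holder lands here: reload the real data,
# re-bless into the original class, then call the real method.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    my $saved = delete $store{ $self->{key} };
    %$self = %{ $saved->{data} };
    bless $self, $saved->{class};
    return $self->$method(@_);
}

sub DESTROY { }             # keep object destruction away from AUTOLOAD

package Counter;            # an ordinary class, unaware of any of this

sub new  { return bless { count => 0 }, shift }
sub incr { return ++$_[0]{count} }

package main;

my $counter = Counter->new;
$counter->incr;

MI::Placeholder->swap_out($counter, 'Counter,1');
print ref($counter), "\n";     # MI::Placeholder
print $counter->incr, "\n";    # 2 (loaded back on demand)
print ref($counter), "\n";     # Counter
```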

	We need to anticipate supporting dynamic clients, which can keep direct
	connections open to the MI server, and communicate continuously in both
	directions.  Then we will have real event-driven groupware.

	This is the sort of exciting stuff that I WANT to be programming.  I
	know I can do it, and I know the rest of the developers here can do it,
	because it will give us something to be excited about.  The main
	problem with this company is that the developers are bored and
	frustrated!

--------------------------------------------------------------------------------

8. Testing is currently a haphazard and painful process, because we cannot
easily do unit-testing on our non-modular code, and are forced to test through
the browser/web-server/mid pipe.

	Object-oriented code should be much easier to test and debug - and we
	can build testing support into the framework.  In Perl, we can make
	generic `dummy' objects, that query the developer for input whenever a
	method is called, and generic `trace' wrappers, so that we can get a
	record of messages on a particular object, or even through the whole
	system.  If our objects are descended from a common ancestor that can
	provide these services (and the other services I have described
	already), we can provide debugging options to say `all Minical::Event
	objects will be dummies', or `trace all messages to and from the State
	object' (we could achieve the `from' bit using `caller' and a global
	trace, it would be better still if we could put a trap on messages
	going out of an object - but I don't know how!).
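
	A generic `trace' wrapper might look something like this sketch, which
	records the calling package via `caller' and forwards every message to
	the wrapped object.  MI::Trace and its @LOG are hypothetical names, and
	Minical::Event here is only a toy stand-in:

```perl
use strict;
use warnings;

package MI::Trace;    # hypothetical generic wrapper
our $AUTOLOAD;
our @LOG;             # global trace of messages

sub wrap {
    my ($class, $target) = @_;
    return bless { target => $target }, $class;
}

# Every method call on the wrapper is logged, then passed on unchanged.
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    my $from = caller;    # package that sent the message
    push @LOG, sprintf('%s -> %s::%s(%s)',
        $from, ref($self->{target}), $method, join(', ', @_));
    return $self->{target}->$method(@_);
}

sub DESTROY { }

package Minical::Event;    # toy stand-in for a real class

sub new   { my ($class, %f) = @_; return bless {%f}, $class }
sub title { return $_[0]{title} }
sub move  { my ($self, $day) = @_; return $self->{day} = $day }

package main;

my $event = MI::Trace->wrap(Minical::Event->new(title => 'Sports day'));
$event->move('Friday');
print $event->title, "\n";
print "$_\n" for @MI::Trace::LOG;
# Sports day
# main -> Minical::Event::move(Friday)
# main -> Minical::Event::title()
```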

	Another feature that we should build in the foundations is listener /
	observer support - it should be possible to put a listener on any
	attribute of any object, and on structures like lists and trees.  Then
	we can have some powerful event-based stuff happening, like automatic
	notification of email, incremental translation / compilation, etc.
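
	A sketch of listener support on attributes, assuming a hypothetical
	MI::Observable base class with get/set/listen methods (lists and trees
	would need the same hooks on their own operations):

```perl
use strict;
use warnings;

package MI::Observable;    # hypothetical base class

sub new {
    my ($class, %f) = @_;
    return bless { fields => {%f}, listeners => {} }, $class;
}

# Put a listener on one attribute.
sub listen {
    my ($self, $attr, $callback) = @_;
    push @{ $self->{listeners}{$attr} }, $callback;
    return $self;
}

sub get { return $_[0]{fields}{ $_[1] } }

# Change an attribute, then tell every listener on it what happened.
sub set {
    my ($self, $attr, $value) = @_;
    my $old = $self->{fields}{$attr};
    $self->{fields}{$attr} = $value;
    $_->($self, $attr, $old, $value)
        for @{ $self->{listeners}{$attr} || [] };
    return $value;
}

package Mailbox;           # hypothetical subclass
our @ISA = ('MI::Observable');

package main;

my @notices;
my $inbox = Mailbox->new(unread => 0);
$inbox->listen(unread => sub {
    my ($obj, $attr, $old, $new) = @_;
    push @notices, "$attr: $old -> $new";
});

$inbox->set(unread => 3);
$inbox->set(owner  => 'sam');    # no listener on this attribute
print "$_\n" for @notices;       # unread: 0 -> 3
```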

--------------------------------------------------------------------------------

9. Rationalise MI::MetaData and ACLs

I would like to split the ACL information out of `metadata' into a separate
object.  Currently they are together only because the ACLs are stored in the
same file as the metadata, but this will change.

Many of the methods in MI::MetaData should not be there, as they relate to
aspects of instances and modules unrelated to metadata.  Parallel to this
problem, the "METADATA" branch of the MI::Data tree really contains all sorts
of instance data, far more than just metadata (version control, ACLs, instance
data, user/instance data, etc.).

We should not be passing raw strings like
"object PL 1 snc internal hacking sam"
to the acl_test method, we need a more abstract interface so we can add new
dimensions to the ACL system without breaking old code.  We need to have a "can
we 'blah'" message, passing along the main (connection?) object so that the ACL
system can interrogate whatever information is relevant to the question - the
`test_acl_user' was a small step in this direction.

Groups - we need to support multiple groups for a single user - then groups
could be used for `project groups', `family groups', `year-level groups',
`special-interest groups' etc.

Does an ACL really belong to an instance?  It could also be argued that it
belongs just as much to a user - it's a relational thing.  We may want to have
rules like `Johnny is not allowed to use any MI games or chatrooms between the
hours of 5:00pm and 7:00pm on weeknights, because he's supposed to be doing his
homework.'  Or `Isabelle is not allowed to see any pictures from outside the
educache' (well, this is beyond MI at the moment).  We should at least lay the
foundations for such fascism, with a relational / rules-based ACL system!
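
Here is a sketch of what a rules-based "can we 'blah'" interface might look
like.  Everything in it is an assumption for illustration: the MI::ACL class,
the can_do method (named so as not to clash with Perl's built-in `can'), and
the fields of the connection object.  Rules are code references that return 1
(allow), 0 (deny) or nothing (no opinion); the first rule with an opinion wins.

```perl
use strict;
use warnings;

package MI::ACL;    # hypothetical rules-based engine

sub new { return bless { rules => [] }, shift }

sub insert {
    my ($self, $rule) = @_;
    push @{ $self->{rules} }, $rule;
    return $self;
}

# "Can this connection do $action to $object?"
sub can_do {
    my ($self, $action, $object, $connection) = @_;
    for my $rule (@{ $self->{rules} }) {
        my $verdict = $rule->($action, $object, $connection);
        return $verdict if defined $verdict;
    }
    return 0;       # default: deny
}

package main;

my $acl = MI::ACL->new;

# Johnny may not use games between 5pm and 7pm on weeknights.
$acl->insert(sub {
    my ($action, $object, $conn) = @_;
    return unless $conn->{user} eq 'johnny' && $object->{class} eq 'Game';
    return 0 if $conn->{weeknight} && $conn->{hour} >= 17 && $conn->{hour} < 19;
    return;
});

# A user may belong to several groups; students may view things.
$acl->insert(sub {
    my ($action, $object, $conn) = @_;
    return 1 if $action eq 'view' && grep { $_ eq 'students' } @{ $conn->{groups} };
    return;
});

my $game     = { class => 'Game' };
my %homework = (user => 'johnny', groups => ['students', 'family'],
                weeknight => 1, hour => 18);
my %evening  = (%homework, hour => 20);

print $acl->can_do('view', $game, \%homework) ? "allowed" : "denied", "\n";  # denied
print $acl->can_do('view', $game, \%evening)  ? "allowed" : "denied", "\n";  # allowed
```

Because callers pass the whole connection object, a new dimension (time of
day, say) can be added to the rules without changing any calling code.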

I think ACLs should be strict, and checked by the system, not by each container
object as it sees fit.

The two `container' fields in the current ACL schema bother me.  Firstly, they
are not enforced (I tried writing code to enforce them recently, and everything
broke).  Secondly, if there is a technical or political reason why an object
must not be used outside a certain container, this is a different dimension to
the normal access control.  It makes little sense to say that one particular
person may view Minical,1001 only inside PL,1, but that the rest of the
teaching group can view it only inside PL,2.  We will need a new, relational
back-end on the ACL system.

I think that the (to be written) indexing / search system should be integrated
with the ACL system, as search queries will typically involve both ACL and
metadata / content constraints and weighted factors.

--------------------------------------------------------------------------------


Here is a very rough sketch of some initial proposed changes to the
object-model:

Classes (modules):
------------------

	Currently we are using metadata in instance 0 to describe the class as
	a whole, e.g. the "edit" ACL in instance 0 metadata describes who can
	create the object.  (Incidentally, this should definitely be changed to
	a `create' ACL.)  This class metadata should be accessible through the
	`class' object.

$class->
	new_object		- this is to create a new object
	get_object		- this is to access an object that's already been
				  created
	metadata  ?		- description of class, etc.
	acl       ?		- can you create instances, etc.
	new_view(type)
	get_view(id)


Objects (instances):
--------------------

$object->
	new_view(type)
	get_view(id)
	number			- this is the `instance number', hopefully to
				  be eliminated!
	version
	metadata
		get(key)
		set(key, value)
		...
	acl			- this is a `closure' on the singleton ACL/indexing engine
		test(predicate)
		insert(rule)
		...
	class
		...
	---
	additional methods to access sub-components of the object, and to
	access the `body' of the object if it is a component, etc.


Views (types include - view, edit, configure, help, etc.):
----------------------------------------------------------

$view->
	size 			- hint (large, small, single-line)
	object			- this accesses the object's data (above)
		...
	user
		...
	connection
		...
	---
	additional methods to access sub-components of the view, and to
	access the `body' of the view if it is a component, etc.


----

That's the end of my ravings for the moment.

Sam