Sunday, October 26, 2008
Microblogging is not my thing...
A few weeks ago I got a phone call from a client. They want to insource a mission-critical piece of code. I talked about the concept of Habitable Software and thought I could write something here.
As I started to write, words unfolded in unexpected rivers. Apparently, I've got too much to say [and too little time].
So, I tried to use a mind map to get an overview of what I was trying to say.
Here it is (click to get a full-scale pdf):
Strictly speaking, it's not even a mind map, as I drew a graph, not a tree. I find the tree format very limiting, which is probably a side-effect of keeping a lot of connections in my mind.
Another side effect is that I find micro-blogging unsatisfactory. Sure, I could post something like:
interesting book, take a look
and get it over with, but it's just not my style.
Anyway, I'll try to keep this short and just add a link to the presentation on form vs. function that I mentioned in the mind map: Integrating Form and Function. Don't mind the LISP stuff :-). That thing about the essential and contingent interpreter is great.
More on all this another time, as I manage to unravel the fabric of my mind :-)
Labels: article reference, book reference, design, form, link
Saturday, October 18, 2008
Some Small Design Issues (part 2)
So far, so good. The only important choice we had to make was the placement of the display logic (stage 1). That choice was easy, so to speak: a little speculative thinking, along the lines of "do we want to reuse the same display logic with another process control system" would have been enough to choose. The "do the simplest thing" approach would have led us in the wrong direction, but even in that case, refactoring the display logic outside process control would have been easy. Now, unfortunately, things are going to get messy.
Stage 4 - different data in different states
The system can be in different internal states. For instance, some physical components may undergo maintenance, so the corresponding software component would go into an offline state. The process itself may go into an idle state between different shifts, and so on.
Sometimes, the internal state is irrelevant. Sometimes, however, we may want to display different variables in different states. For instance, if the process is idle, we might want to see the time of day and the seconds of idle time. It would make little sense to display any data about the product, as there is no product in the idle state.
Of course, the set of internal states can change between one system and another. In many cases, the state is stored internally and not published into the database. Even the internal model of "state" can be different: in some cases, the state manifests itself implicitly, through a set of internal variables; in some cases, a single variable holds the explicit system state.
Now, we basically have two main choices, with several sub-choices and consequences. We may keep the state inside the system, or we may require the system to publish its own state.
Keeping the state inside the system seems like a good choice: after all, it's what good ol' information hiding would like us to do. However, that doesn't really fit with the idea of an external display logic. How do we know which variables to show in each state, if we don't even know the state? Of course, we may come up with some kludge, like publishing "artificial" or "synthetic" variables. These variables are published only for the display module, and they change meaning as the internal state changes. That's ugly and would get worse in the next stages. Note, however, how information hiding is now going against the clean separation of concerns we aimed for in stage 1.
We may therefore decide to publish the internal state. That's not completely correct either. If the state is implicitly stored inside several internal variables, we do not want to publish those variables. In general, we don't want to publish the actual state: we want to publish a business-relevant abstraction of the internal state.
At this point, however, configuration becomes harder. We have to decide which variables we want to show, in which display, in which state. The state has to be made explicit, and visible to the user (at least, as far as configuration goes). The display subsystem has to know about state too - not the internal state, but the published state. That's an undesirable form of coupling, and it's the first sign that our solution is not going to be great.
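To make that coupling concrete, here is a minimal sketch of a state-dependent configuration. Everything in it is an assumption made for illustration: the state names, the variable names, and the idea of a plain lookup table are not taken from any real system.

```python
from enum import Enum


class PublishedState(Enum):
    """Hypothetical business-level states published by the control system."""
    RUNNING = "running"
    IDLE = "idle"
    MAINTENANCE = "maintenance"


# Hypothetical configuration: which variables to show in each published state.
CONFIG = {
    PublishedState.RUNNING: ["product_code", "line_speed"],
    PublishedState.IDLE: ["time_of_day", "idle_seconds"],
}


def variables_for(state):
    """Return the variables to display in the given state (none if unconfigured)."""
    return CONFIG.get(state, [])


idle_view = variables_for(PublishedState.IDLE)
```

Note that the display side has to import PublishedState: that is exactly the coupling discussed above, just written down.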
Stage 5 - different display types
Back in stage 2, we just wanted to handle different protocols. The display type, however, was "fixed": a few rows, each with a label and a floating-point value. Now we have to handle different display types too.
In some simplified systems, or in some areas of a plant, a standard green-yellow-red semaphore can be used as a cheaper (and more intuitive :-) display, based on 2 thresholds.
In other cases, an LCD or plasma screen is adopted, and here we have a wide choice: show more data, show more text, show trends (graphically), and so on.
Again, here we face a relatively difficult coupling/cohesion choice. Consider the semaphore: where do we configure the thresholds, and where do we place the logic? We could do both inside the control system, and publish a new variable to control the lights. That's easy, but in a sense, we are moving some display concerns inside the system. The new variable is there only because we want a semaphore. We could put that logic inside the display subsystem (actually, inside the semaphore class). But a threshold is a process concept, and we also need some logic to prevent flickering, which is more akin to process control logic than to display logic.
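Wherever it ends up living, the semaphore logic might look like the sketch below. It assumes that higher values are worse and uses a simple hysteresis band to prevent flickering; the class name, the parameters and the numbers are all made up for illustration.

```python
class Semaphore:
    """Green/yellow/red light driven by two thresholds, with hysteresis.

    Assumption: higher values are worse. The light goes up as soon as a
    threshold is reached, but comes back down only once the value drops
    below that threshold minus `band`.
    """

    COLORS = ["green", "yellow", "red"]

    def __init__(self, yellow_at, red_at, band):
        self.thresholds = [yellow_at, red_at]
        self.band = band
        self.level = 0  # index into COLORS, starts at green

    def update(self, value):
        # Move up while the value has reached the next threshold.
        while self.level < 2 and value >= self.thresholds[self.level]:
            self.level += 1
        # Move down only when clearly below the threshold we crossed.
        while self.level > 0 and value < self.thresholds[self.level - 1] - self.band:
            self.level -= 1
        return self.COLORS[self.level]


light = Semaphore(yellow_at=50, red_at=80, band=5)
history = [light.update(v) for v in (49, 51, 49, 44, 85, 78, 60)]
```

With these numbers, a value bouncing between 49 and 51 turns the light yellow once and keeps it there until the value drops below 45. The band is one more parameter that somebody has to configure somewhere, which brings us back to the placement question.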
The trend is even worse. So far, each variable had one single slot inside the [real-time] database. There was no history. A trend requires history. Where do we put that? It's easier to store history inside the display logic (the trend class) because no changes are required to the control system and (more importantly) to the database. However, that's kinda clumsy. If the display subsystem is shut down for any reason, we won't be able to show a trend when it's started up again (not for a while, at least).
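A minimal sketch of history kept inside a hypothetical trend class, using a fixed-size buffer (the class and method names are assumptions, not part of any existing design):

```python
from collections import deque


class Trend:
    """Keeps the last `capacity` samples of one variable, display side only."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)  # oldest samples fall off the front

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))

    def is_full(self):
        """True once enough history has been collected to draw the whole trend."""
        return len(self.samples) == self.samples.maxlen


trend = Trend(capacity=3)
for t, v in enumerate([10.0, 11.0, 12.0, 13.0, 14.0]):
    trend.add(t, v)

restarted = Trend(capacity=3)  # what we are left with after a restart
```

The buffer lives in memory, so a restarted display starts from an empty Trend: that's the clumsy part.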
Also, we may have a performance issue here. When we want to show a few numeric values, we don't need to be fast (quite the opposite :-). So the process control system can publish once in a while, or the display can sample the database once in a while (if we can't push from the database). A trend, however, may require a much faster publishing/sampling than we were prepared to handle.
Again, it's hard to find a clear-cut winning solution. The two domains are entangled, and we seem to lack the right abstractions, something that would make the control and display more independent of each other.
Stage 6 - different data sources
This just adds to the mess. Now, different control systems are implemented by different teams, working on different kinds of industrial processes. We have slow processes, where data could be stored in a plain, cheap relational database. We also have moderately fast processes, where we need a real-time database. We also have hard real-time systems, where data are published periodically through a proprietary, TCP-based messaging protocol (and not stored anywhere), and so on.
Now, unfortunately, we can't store the configuration inside the real-time database anymore, because there might not be a real-time database. Overall, it's no big deal, but we have to reference a variable while we configure the display, and that reference can easily break if we change the process control system. What's worse, we may not have a way to know that a reference is broken, besides running the system, which ain't nice.
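The broken-reference problem can be sketched as follows. This is purely illustrative: both source classes and the known_names method are hypothetical, and the sketch assumes that at least some data sources can enumerate their variables, which may well not hold in practice.

```python
class RelationalSource:
    """Hypothetical source that can list the variables it stores."""

    def __init__(self, names):
        self._names = set(names)

    def known_names(self):
        return self._names


class MessagingSource:
    """Hypothetical push-only source: names show up only at run time."""

    def known_names(self):
        return None  # nothing to enumerate before the system runs


def broken_references(source, referenced):
    """Return the referenced names the source does not know about,
    or None when the source offers no way to check."""
    known = source.known_names()
    if known is None:
        return None
    return sorted(set(referenced) - known)


display_config = ["line_speed", "oven_temp"]
checked = broken_references(RelationalSource(["line_speed"]), display_config)
unchecked = broken_references(MessagingSource(), display_config)
```

For the messaging source, the answer is simply "can't tell", which is the unpleasant case described above.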
So, here it is. The problem itself is relatively easy. We also have several simple solutions. However, most of them lack elegance and quality. The natural temptation is to go after higher levels of abstraction, along the "enterprise data bus" concept on one side and the OPC initiative on the other. I've seen that, and what you get is usually a slow behemoth that nobody really likes. Another natural temptation is to go after the MVC idea and to create several "controllers" to mediate between processes and displays. In practice, we just give up and declare that mixing display concerns and process concerns is not that bad, and we name that [gordian] knot "controller". Not really elegant.
Curiously enough, I have seen similar (although superficially quite different) problems in different domains, like cab dispatching. There is probably a meta-problem pattern at play, but I haven't got the time to investigate the issue (I do have a few ideas though).
Still, it's damn hard to find a simple, elegant, flexible solution where the processes and the displays are nicely separated. Part of the problem, I think, is that we lack a way to model the force field.
More on this another time: meanwhile, if you've got any brilliant idea, just let me know :-).
Labels: design
Sunday, October 12, 2008
Some Small Design Issues (part 1)
In a previous post, I talked about some small, yet thorny design problems I was facing. As I started writing about them, it became clear that real-world design problems are never so small: there is always a large context that is somehow needed to fully understand the issues.
Trying to distill the problem to fit a blog post is a nightmare: it takes forever, it slows me down to the point I'm not blogging anymore, and is exactly the opposite of what I meant when I wrote Blogging as Destructuring a few years ago. On a related note, Ed Yourdon (at his venerable age :-) is moving toward microblogging for similar reasons.
Still, there is little sensible design talk around, so what does a [good] man gotta do? Simplify the problem even more, split the tale in a few episodes, and so on.
I said "tale" because I'll frame the design problem as a story. I don't mean to imply that things went exactly this way. Actually, I wasn't even there at the time. However, looking at the existing artifacts, it seems reasonable that they somehow evolved that way.
Also, an incremental story is an interesting narrative device for a design problem, as it allows us to put every single decision in perspective, and to reason about the non-linear impact of some choices.
Stage 1 - the beginning
We have some kind of industrial process control system. We want to show a few process variables on a large-size numeric display, like in the picture below:

At this point the problem is quite simple, yet we have one crucial choice to make: where do we put the new logic?
We have basically three alternatives:
1) inside an existing module/process, that is, "close to the data"
2) in a new/external process, connected via IPC to the existing process[es]. The connection might operate in push or pull mode, depending on update rate and so on (we'll ignore this for the sake of simplicity).
3) in a new/external process, obtaining data through a [real-time] database or a similar middleware. The existing processes would have to publish the process variables on the database. The new process might pull data or be pushed data, depending on the data source.
Even at this early stage, we need to make an important architectural decision. It's interesting to see that in very small systems, where all the data is stored inside one process, alternative (1) is simpler. Everything we need "is just there", so why don't we add the display logic too?
This is how simple, lean systems get to be complex, fragile, bulky systems: it's just easier to add code where the critical mass is.
So, let's say our guys went for alternative (3). We have one data source where all the relevant variables are periodically stored.

Now, we just need to know which variables we want to show, and in which row. For simplicity, the configuration could be stored inside the database itself, like this (through an OO perspective):

Using an ugly convention, "-1" as a row value indicates that the process variable isn't shown at all.
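A minimal sketch of such a descriptor follows. The field names and the sample variables are assumptions made for illustration; only the "-1" convention comes from the text.

```python
from dataclasses import dataclass

NOT_SHOWN = -1  # the ugly convention: row -1 means "not displayed"


@dataclass(frozen=True)
class ProcessVariableDescriptor:
    name: str  # hypothetical key of the variable in the database
    row: int   # display row, or NOT_SHOWN


def rows_to_show(descriptors):
    """Return {row: variable name} for the variables that are actually displayed."""
    return {d.row: d.name for d in descriptors if d.row != NOT_SHOWN}


descriptors = [
    ProcessVariableDescriptor("line_speed", 0),
    ProcessVariableDescriptor("oven_temperature", 1),
    ProcessVariableDescriptor("internal_counter", NOT_SHOWN),
]
layout = rows_to_show(descriptors)
```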
Stage 2 - into the real world
Customers may already have a display, or the preferred supplier may discontinue a model, or sell a better/cheaper/more reliable one, and so on. Different displays have different protocols, but they're just multi-line displays nonetheless.
Polymorphism is just what the doctor ordered: looking inside the Display component, we might find something like this:

It's trivial to keep most of the logic to get and format data unchanged. Only the protocol needs to become an extension point. Depending on the definition of protocol (does it extend to the physical layer too?) we may have a slightly more complex, layered design, but let's keep it simple - there are no challenges here anyway.
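In code, the extension point might look like the sketch below. Both wire formats are invented for illustration and do not correspond to any real display; the class and function names are assumptions as well.

```python
import struct
from abc import ABC, abstractmethod


class DisplayProtocol(ABC):
    """Extension point: one subclass per physical display protocol."""

    @abstractmethod
    def encode_row(self, row, value):
        """Return the bytes that put `value` on the given row."""


class AsciiProtocol(DisplayProtocol):
    # Hypothetical text-based wire format.
    def encode_row(self, row, value):
        return f"R{row}={value:.1f}\r\n".encode("ascii")


class BinaryProtocol(DisplayProtocol):
    # Hypothetical binary wire format: one byte for the row, big-endian float32.
    def encode_row(self, row, value):
        return struct.pack(">Bf", row, value)


def render(protocol, values):
    """Protocol-independent part: walk the rows, delegate only the encoding."""
    return [protocol.encode_row(row, v) for row, v in sorted(values.items())]


frames = render(AsciiProtocol(), {1: 2.5, 0: 10.0})
```

The render function stands in for the "get and format data" logic that stays unchanged; only the encoding varies with the display.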
Stage 3 - more data, more data!
Processes get more and more complex, and customers want to see more data. More than one display is needed. Well, it's basically trivial to modify the database to store the display number as well.
A "display number" field is then added to the Process Variable Descriptor. Note that at this point, we need a better physical display, as the one in the picture above has hard-coded labels. We may want to add one more field to the descriptor (a user-readable name), and our protocol class may or may not need some restyling to account for this [maybe optional] information. The multiplicity between "Everything Else" and "Display Protocol" is no longer 1. Actually, we have a qualified association, using the display number as a key (diagram not shown). No big deal.
Note: at this stage, a constraint has been added, I guess by software engineers, not process engineers: the same process variable can't be shown on two different displays. Of course, a different database design could easily handle this limitation, but it wasn't free, and it wasn't done.
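A sketch of the extended descriptor, with assumed field names, shows where the constraint comes from: since the display number is a single field of the descriptor, each variable can land on one display only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    name: str     # hypothetical key of the variable in the database
    label: str    # user-readable name, now that labels are not hard-coded
    display: int  # display number: a single field, hence a single display
    row: int      # -1 still means "not shown"


def group_by_display(descriptors):
    """Return {display number: {row: label}}, i.e. the qualified association."""
    layout = defaultdict(dict)
    for d in descriptors:
        if d.row != -1:
            layout[d.display][d.row] = d.label
    return dict(layout)


layout = group_by_display([
    Descriptor("line_speed", "Speed", display=1, row=0),
    Descriptor("oven_temperature", "Temp", display=2, row=0),
    Descriptor("internal_counter", "Count", display=1, row=-1),
])
```

Lifting the constraint would mean moving display and row out of the descriptor into a separate (variable, display, row) relation, which is the "different database design" mentioned above.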
Hmmm, OK, so far, so good. No thorny issues. See you soon for part 2 :-).
Labels: design, link, profession
Wednesday, October 01, 2008
Looking for Developers
A client of mine in Emilia is looking for a couple of new developers. Below are some desirable traits (you don't need all of them, but it helps :-) and a description of the first activities you would work on. If you are interested, send your CV to me at jobs@eptacom.net, also because on the first projects we would certainly be working together often...
Ideal profile [again, you don't need to match everything]:
- Recent graduate, but we are also interested in candidates with 2/3 years of experience
- A degree in Computer Science or Computer Engineering is preferred
- A flexible mind, no pseudo-religious fixations on languages, tools, methods and so on.
- Good knowledge of C++, ideally modern C++ (smart pointers, STL, exceptions, templates, etc.)
- Familiarity with OOP concepts (polymorphism and friends)
- Some knowledge of the Windows API, as well as C# and .NET, wouldn't hurt
- Some sensitivity to design issues, the ability to read a UML diagram, knowledge of a few patterns, etc.
- Knowledge of the main algorithms and good "algorithmic thinking" for problem solving
- A grounding in software engineering, even the "limited" kind typical of university courses. Let's say that if I hand over an IEEE Transactions on Software Engineering paper to read (typically because it's relevant to the work), I'd like the candidate not to be caught completely off guard.
- The workplace is in Emilia, and working on site is required.
What you will work on:
Initially we will work together on some software engineering tools (which are not the company's core business, but we need them to carry out some analyses, make some decisions, and understand whether we can do certain things automatically).
We will certainly also work on refactoring existing code, with a "training" angle for the other developers as well (that is, the result will be about "method" as well as "code"). Here too, ideally we will think a bit about how to automate some transformations.
After that, a few "special" projects inside the core business will follow, but there will certainly often be existing code to read / understand, so if looking at other people's code really scares you :-), unfortunately this isn't for you; otherwise, get in touch :-).
Labels: announce





