Yevgen Nebesov
- Sep 1
- 5 min read

Socio-technical Engineering Part 4: Design Patterns and Tactics

Introduction

Digitalization projects serve as examples of socio-technical systems. These systems consist of people organized within social structures and incorporate technical artifacts such as code and infrastructure. Like other human-made systems, socio-technical systems are subject to requirements, as discussed in Part 2 of this series.

Although systems exist in various domains (biological, mechanical, economic, ecological, social, technical, and socio-technical), they often share similar types of requirements. Design solutions that fulfill specific requirements in one domain can be adapted and applied to another domain. For instance, the design of helicopters can draw inspiration from dragonflies, and principles of feedback control can be found in mechanical and chemical systems. To translate a solution from one domain to a similar problem in another domain, it is crucial to understand which design decisions constitute a solution and how these decisions arrange the system elements.

The previous post explained the basic elements of socio-technical systems and introduced the three atomic decision types that express the design of any system. With this background, we can take design patterns and tactics from other domains and apply them to socio-technical systems. Let's explore the possibilities this offers.

Case Study: Availability

Availability refers to a system's ability to remain operational over time. Here's an example of an availability requirement for a socio-technical system: "We want to be able to make any architectural decision at any time."

Imagine a project with a patriarchal Lead Software Architect in an ivory tower who is involved in all technical concepts and approves all significant architectural decisions. This project does not meet the requirement above, because the Lead Software Architect can fall sick, go on vacation, or even leave the project. Many of you may be familiar with this situation, where statements like "We can't decide without our project lead/manager or some smart employee Cassandra" arise. Now, let's explore how we can design our socio-technical system to satisfy this availability requirement. We will employ generic tactics for availability that are applicable to any system, taken from the tactics tree for availability in the book "Software Architecture in Practice", 4th ed.

I won't delve deeply into each tactic as not all of them are fully relevant to our problem. You can use this tree as a quick reference guide. Let's select a couple of tactics and see how we can interpret them for our availability scenario.

Please note: The application of tactics depends on your degree of freedom, which is influenced by the number of architectural decisions (decomposition, integration, and allocation) made before attempting to solve the problem. For example, suppose you have

made the decomposition decision to have the role of Lead Software Architect,
made the allocation decision to assign responsibility for Software Architecture Documentation to the Lead Software Architect,
made the integration decision for the Lead Software Architect to review and approve all changes and suggestions to the Software Architecture Documentation,
and allocated the role of Lead Software Architect to Cassandra.

In this case, your solution space is extremely limited. However, if none of these decisions have been made, you may end up with a solution that doesn't include the role of Lead Software Architect at all.
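To make the idea of a shrinking solution space more tangible, the four decisions listed above can be written down as data. This is only an illustrative sketch: the class names, the one-line glosses of the three decision types in the comments, and the paraphrased descriptions are my assumptions, not definitions from this series.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionType(Enum):
    # The glosses below are an assumed reading of the three decision types.
    DECOMPOSITION = "decomposition"  # which elements (e.g. roles) exist
    ALLOCATION = "allocation"        # what is assigned to which element
    INTEGRATION = "integration"      # how elements interact


@dataclass(frozen=True)
class Decision:
    kind: DecisionType
    description: str


# The four decisions from the example, paraphrased.
decisions_made = [
    Decision(DecisionType.DECOMPOSITION,
             "The role Lead Software Architect exists"),
    Decision(DecisionType.ALLOCATION,
             "Architecture documentation is owned by the Lead Software Architect"),
    Decision(DecisionType.INTEGRATION,
             "The Lead Software Architect approves all documentation changes"),
    Decision(DecisionType.ALLOCATION,
             "The role Lead Software Architect is held by Cassandra"),
]


def count_by_type(decisions):
    """Count how many decisions of each type have already been fixed."""
    counts = {kind: 0 for kind in DecisionType}
    for decision in decisions:
        counts[decision.kind] += 1
    return counts


counts = count_by_type(decisions_made)
```

Every entry in such a list is a constraint that a tactic has to respect. Removing an entry (for example, the allocation to Cassandra) reopens that part of the design.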

Self-Test + Removal from Service

Self-test is a tactic that involves running procedures to test the correctness of the system's operation. Another tactic, Removal from Service, entails temporarily placing a system component in an out-of-service state to mitigate potential system failures. The combination of these two tactics can be applied in the following example.

Example

Let's assume the decision to allocate the role of Lead Software Architect to Cassandra has already been made. Here's a BDD-style scenario to self-test your system for availability:

GIVEN Lead Software Architect Cassandra is on vacation

WHEN an external software architecture audit for the project is requested

THEN the audit should not result in any major findings

This particular tactic does not directly provide a solution for fulfilling the availability requirement. However, it offers guidance on evaluating the health of the socio-technical system and verifying design decisions. You can run this test scenario by ordering an audit during Cassandra’s vacation. Just don’t tell her ;)
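The same scenario can be rehearsed on paper before ordering a real audit. The sketch below is a strong simplification and rests on assumptions of mine: the audit is replaced by a single check ("is at least one holder of the deciding role in service?"), and the function and scenario names are hypothetical.

```python
def can_decide(role_holders, out_of_service):
    """Return True if at least one holder of the deciding role is in service."""
    return any(person not in out_of_service for person in role_holders)


def run_self_test(role_holders, scenarios):
    """Apply each scenario's absences and collect the scenarios that fail.

    `scenarios` maps a scenario name to the set of people removed from service.
    """
    return [
        name
        for name, out_of_service in scenarios.items()
        if not can_decide(role_holders, out_of_service)
    ]


scenarios = {
    "Cassandra is on vacation": {"Cassandra"},
    "nobody is absent": set(),
}

# Current design: the role is allocated to Cassandra only.
failed = run_self_test({"Cassandra"}, scenarios)
print(failed)  # ['Cassandra is on vacation']
```

A failing scenario does not say how to fix the design. It only shows that the current allocation does not survive removing Cassandra from service.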

Redundant Spare

This tactic involves having multiple components with the same functionality to provide coverage in case of component failure.

Example

Applied to our problem, it could mean allocating the role of Lead Software Architect to multiple individuals (components) - Cassandra and Jack. These two components should share a common knowledge base and authority. To achieve this, they can communicate daily, attend all meetings together, and review each other's work.
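A rough calculation shows why a spare helps. Assume, purely for illustration, that each role holder is available on a given day with the same probability and that absences are independent of each other. Then the role is unavailable only when all holders are absent at once. Both the independence assumption and the 0.9 figure are mine. In practice absences can correlate, for example when both people attend the same conference.

```python
def role_availability(individual_availability, holders):
    """Probability that at least one of `holders` people is available.

    Assumes identical, independent availability for every holder.
    """
    if not 0.0 <= individual_availability <= 1.0:
        raise ValueError("individual_availability must be between 0 and 1")
    if holders < 0:
        raise ValueError("holders must not be negative")
    return 1 - (1 - individual_availability) ** holders


cassandra_only = role_availability(0.9, holders=1)      # roughly 0.9
cassandra_and_jack = role_availability(0.9, holders=2)  # roughly 0.99
```

Under these assumptions, a second holder cuts the expected unavailability of the role from about 10% to about 1%. The price is the daily synchronization effort described above.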

Shadow

This tactic introduces a spare component gradually before the main component's downtime and then incrementally puts the spare component out of service when the main component resumes its active role. This tactic is similar to the redundant spare tactic, but the operational overlap of the two components is shorter in the case of shadow.

Example

Cassandra remains the main component fulfilling the role of Lead Software Architect. However, one week before her vacation, she starts the handover process to Jack. During this week, Jack operates in a "shadow mode" alongside Cassandra. Throughout her vacation and one week after, Jack fully takes on the role of Lead Software Architect. During the one-week transition period after Cassandra's vacation, she continues operating in a "shadow mode" alongside Jack. After the transition period, the role is entirely reassigned to Cassandra, and Jack is no longer involved.
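The handover described above is essentially a schedule, and it can be sketched as a small function that returns who is active and who is shadowing on a given day. The function name, the inclusive treatment of the vacation dates, and the concrete dates in the usage lines are assumptions for illustration. The one-week overlap comes from the example.

```python
from datetime import date, timedelta


def role_assignment(day, vacation_start, vacation_end,
                    overlap=timedelta(days=7)):
    """Return (active, shadow) for the Lead Software Architect role on `day`.

    `vacation_start` and `vacation_end` are the first and last day of
    Cassandra's absence (both inclusive). `shadow` is None outside the
    two overlap windows.
    """
    if day < vacation_start - overlap:
        return ("Cassandra", None)        # normal operation
    if day < vacation_start:
        return ("Cassandra", "Jack")      # handover week before the vacation
    if day <= vacation_end:
        return ("Jack", None)             # Cassandra is away
    if day <= vacation_end + overlap:
        return ("Jack", "Cassandra")      # transition week after the vacation
    return ("Cassandra", None)            # role fully back with Cassandra


# Hypothetical vacation from 15 July to 28 July 2024.
start, end = date(2024, 7, 15), date(2024, 7, 28)
before = role_assignment(date(2024, 7, 10), start, end)
during = role_assignment(date(2024, 7, 20), start, end)
after = role_assignment(date(2024, 7, 30), start, end)
```

Compared with the redundant spare, the two people overlap only in the two transition windows, which is where the synchronization cost is paid.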

These were just a few tactics. The selection and application of tactics depend on the specific availability requirements and degrees of freedom in your context. Every design begins with requirements, and you can refine your requirements, such as those for availability, by specifying BDD-style test scenarios and using them to verify your system.

Please note that having an architect overseeing all decisions is not inherently good or bad; it depends on your specific requirements. Vertical scalability, which focuses on enhancing individual resources without increasing their number, can serve as a performance booster. However, it often clashes with the goal of ensuring availability. This conflict is also apparent in socio-technical systems.

Indeed, operating without the need for constant synchronization and alignment on every aspect can enhance the productivity of an individual (vertical scaling) but may reduce overall availability. Many organizations possess highly efficient individuals who specialize in certain areas and are referred to as "experts" or "stars" by some, while others consider them "bottlenecks" or "single points of failure." The perception of these individuals depends on the organization's specific requirements.
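Whether someone is a "star" or a "single point of failure" is a matter of requirements, but finding the candidates is mechanical. The sketch below assumes you can list, for each responsibility, the people able to cover it. The responsibilities and names are hypothetical.

```python
def single_points_of_failure(capabilities):
    """Map each person to the responsibilities that only they can cover.

    `capabilities` maps a responsibility to the set of people able to cover it.
    """
    result = {}
    for responsibility, people in capabilities.items():
        if len(people) == 1:
            (only_person,) = people
            result.setdefault(only_person, []).append(responsibility)
    return result


# Hypothetical capability map for a small project.
capabilities = {
    "approve architectural decisions": {"Cassandra"},
    "review code": {"Cassandra", "Jack"},
    "maintain build pipeline": {"Jack"},
}

spofs = single_points_of_failure(capabilities)
```

If availability matters for a responsibility that shows up in this result, tactics such as redundant spare or shadow are candidates. If it does not, leaving the specialist alone may be the more productive choice.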

Conclusion

Systems are ubiquitous, and systems engineers from different domains can learn from each other to solve structurally similar problems. By understanding the elements that populate socio-technical systems and the types of decisions that shape them (see Part 3 in this series), we can readily benefit from patterns and tactics from the domain of software systems. All we need to do is reinterpret software engineering ideas in the context of socio-technical elements and decisions. This post provided a case study on applying these concepts to availability requirements, focusing on maintaining system operation even when some resources are unavailable. In a future post, I will explore a similar study for scalability requirements and how to enhance a project by scaling it up. As I mentioned in a previous post, simply hiring more people can be counterproductive. Stay tuned for more insights.
