Introduction to Sociotechnical Engineering

Abstract

In software projects, individuals and technical artifacts influence each other, forming complex sociotechnical systems that require a holistic approach. Despite their complexity, sociotechnical systems can be approached similarly to technical systems in Systems Engineering—through requirements analysis, design, troubleshooting, and testing.

This article introduces an approach called Sociotechnical Engineering, which applies and adapts the classical categories and principles of Systems Engineering to the world of software projects as sociotechnical systems.

Introduction

Software projects consist of people and the technical artifacts they produce. These artifacts include not only code but also requirements, UI designs, test cases, user stories, coding guidelines, architectural documentation, CI/CD scripts, and many other elements. Artifacts are not only influenced by humans but also, in turn, influence us. Just as a traffic light can prompt us to stop, coding guidelines, for example, can shape our programming styles. Furthermore, artifacts serve as communication tools between people. This close interconnection and interaction between humans and artifacts suggests that both components should be viewed as a unified system. Such systems are often referred to as sociotechnical systems.

Despite their complexity, sociotechnical systems can be approached similarly to technical systems—through requirements analysis, design, troubleshooting, and testing. Sociotechnical Engineering reinterprets classical principles of Systems Engineering for the domain of sociotechnical systems.

Sociotechnical systems cannot be designed in a fully deterministic manner, as they are influenced by multiple sources of uncertainty, evolve over time, and are inhabited by people. Nevertheless, we can shape the conditions and context in which these systems develop according to our needs.

This article guides readers through the main categories of Systems Engineering—requirements analysis, system design, and troubleshooting—and adapts them to sociotechnical systems. It demonstrates how common approaches and tactics can be reused to manage the complexity of sociotechnical challenges.

Requirements

Requirements guide design. They create transparency and enable structured discussions about architectural decisions. Well-defined requirements help identify conflicts between different needs early on, allowing management to prioritize and communicate what is most important. This applies equally to technical systems and sociotechnical systems, such as software projects.

Functional Requirements

Functional requirements specify what the system should do. For sociotechnical systems, this includes producing various artifacts, such as code, documentation, configurations, project handbooks, UI designs, and more. Production-ready code is not the only artifact that matters. Intermediate artifacts, such as business cases, UI designs, requirements documentation, and test cases, are not merely by-products; they serve as essential communication media between people and thus play a crucial role in sociotechnical systems.

By listing all required artifacts (e.g., project handbooks, system requirements, architecture decisions, code, test cases, deployment scripts, configurations, etc.), uncertainty about the scope of deliveries can be reduced, and the roles needed to fulfill the project can be structured effectively.
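As a minimal sketch, such an artifact list could be kept as data and checked for gaps. The artifact names are taken from the list above; the role names and the helpers `unassigned_artifacts` and `roles_needed` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

# Required artifacts (from the text) mapped to the role responsible for each.
# Role names are illustrative assumptions; None marks an artifact nobody owns yet.
required_artifacts = {
    "Project Handbook": "Project Lead",
    "System Requirements": "Requirements Engineer",
    "Architecture Decisions": "Software Architect",
    "Code": "Software Engineer",
    "Test Cases": None,
    "Deployment Scripts": None,
}


def unassigned_artifacts(artifacts: dict[str, str | None]) -> list[str]:
    """Return the artifacts that no role is responsible for yet."""
    return sorted(name for name, role in artifacts.items() if role is None)


def roles_needed(artifacts: dict[str, str | None]) -> set[str]:
    """Return the set of roles implied by the artifact list."""
    return {role for role in artifacts.values() if role is not None}


print(unassigned_artifacts(required_artifacts))
print(sorted(roles_needed(required_artifacts)))
```

Artifacts without a responsible role show where the scope of delivery is still uncertain, and the set of roles gives a first estimate of the staffing needed.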

Non-Functional Requirements

Non-functional requirements specify how the system should achieve its functionality. Classical attributes such as availability, performance, scalability, and security are just as relevant for sociotechnical systems.

Example 1: Decision Sovereignty

The project should always be able to make technical and business decisions independently.

This requirement prevents paralysis when key decision-makers are unavailable. It is akin to ensuring system availability, where the system can operate even if some components are down. Sociotechnical decisions, such as role descriptions and technical choices, must align with this requirement. For instance, while a centralized relational database might offer consistency, it may lead to decision bottlenecks if only one person understands its complexity. Alternatively, using multiple document databases may promote distributed decision-making at the cost of technical consistency. Architectural decisions are always trade-offs, and they require transparent requirements to guide them.

Example 2: Development Scalability

The project should be able to increase its delivery speed by scaling the team.

Though hiring more people seems like an obvious solution to boost performance, unprepared social and technical structures can hinder scalability. Sociotechnical systems are not inherently designed for scalability. Building such a quality into the system without explicit requirements could reduce performance in other areas. Scalability, if not required, would only enforce unnecessary decoupling of social and technical structures, potentially increasing system complexity and reducing consistency. Therefore, development scalability must be explicitly stated to ensure appropriate trade-offs are made.

Listing sociotechnical requirements early helps avoid personal biases, reduce political struggles, and identify conflicts, such as when availability and performance requirements clash.

Design

Elements of Sociotechnical Systems

In sociotechnical systems, people work on artifacts according to their roles. Three core element types define these systems: people, roles, and artifacts.

Key Elements

People: Individuals like Cassandra or Hector act as processing nodes, albeit non-deterministic ones.
Roles: Analogous to services, roles are allocated to individuals (e.g., "Lead Architect" is allocated to Cassandra). If Cassandra is unavailable due to illness or vacation, the service "Lead Architect" is unavailable. However, if the role is allocated with redundancy to both Cassandra and Hector, the role remains operational even if one person is unavailable. Thus, the statement "The role of Lead Architect is allocated to Cassandra" is more flexible than "Cassandra is the Lead Architect". This approach can be referred to as "Role-as-a-Service" (RaaS).
Artifacts: These are the deliverables for which roles are responsible. For example, the role "Lead Architect" may entail responsibility for the artifact "Software Architecture Documentation" (SAD), which compiles system designs and technical decisions. This approach focuses on the quality and delivery of artifacts rather than individual activities, establishing clear client-server relationships between the roles responsible for artifacts and their stakeholders.

The "Role-as-a-Service" approach, in which roles are defined by their responsibility for artifacts and assigned to individuals, differs from other role description approaches such as the Role Canvas, Role Model Canvas, or the RACI Matrix. These approaches focus on activities, tasks, and skills, providing a white-box view of the role.

In contrast, the "Role-as-a-Service" approach defines the role as a black box through its public interface: the artifacts that the role is expected to deliver to its customers. When roles are viewed as services, the products of the service (the artifacts for the service’s customers) become more relevant than the activities performed by the service itself.
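The three element types and the availability argument can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Role` class and its `is_available` method are hypothetical names, and a role counts as available as soon as at least one person holding it is available.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """A role seen as a service: defined by the artifacts it delivers."""

    name: str
    delivers: list[str]  # artifacts the role is responsible for
    allocated_to: set[str] = field(default_factory=set)  # people holding the role

    def is_available(self, unavailable: set[str]) -> bool:
        """The role is operational if at least one holder is available."""
        return bool(self.allocated_to - unavailable)


lead_architect = Role(
    name="Lead Architect",
    delivers=["Software Architecture Documentation"],
    allocated_to={"Cassandra"},
)

# Allocated to Cassandra only: her absence takes the role out of service.
print(lead_architect.is_available(unavailable={"Cassandra"}))  # False

# Allocated redundantly to Cassandra and Hector: the role stays operational.
lead_architect.allocated_to.add("Hector")
print(lead_architect.is_available(unavailable={"Cassandra"}))  # True
```

The class exposes only what the role delivers and who holds it, which mirrors the black-box view: nothing about activities, tasks, or skills appears in the interface.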

Roles-Artifacts-People Mapping

Roles-Artifacts-People Mapping is a collaborative exercise to identify dependencies between sociotechnical elements. A diagram illustrating dependencies around the artifact "Software Architecture Documentation" (SAD) is included below.

From such a diagram, one can infer:

The role of Software Architect has low availability because it is allocated only to Cassandra.
Cassandra is responsible for two roles: Software Architect and Software Engineer.
Kate, the Product Owner, is Cassandra's client because the Product Owner role is a stakeholder of the SAD. This relationship allows Kate to express her informational needs to Cassandra. If Kate is dissatisfied, the diagram can guide troubleshooting efforts, such as increasing the availability of the architect role or reallocating responsibilities.
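The same inferences can be derived mechanically once the mapping is written down as data. The sketch below encodes only the facts stated in the text, not the full diagram, and all function names are hypothetical.

```python
# Who holds which role (only the allocations mentioned in the text).
allocations = {
    "Software Architect": {"Cassandra"},
    "Software Engineer": {"Cassandra"},
    "Product Owner": {"Kate"},
}
# Which role is responsible for which artifact, and which roles are its stakeholders.
responsible = {"SAD": "Software Architect"}
stakeholders = {"SAD": {"Product Owner"}}


def single_person_roles(allocations):
    """Roles with exactly one listed holder, i.e. candidates for low availability."""
    return sorted(role for role, people in allocations.items() if len(people) == 1)


def roles_per_person(allocations):
    """Invert the mapping: for each person, the set of roles they hold."""
    result = {}
    for role, people in allocations.items():
        for person in people:
            result.setdefault(person, set()).add(role)
    return result


def client_pairs(allocations, responsible, stakeholders):
    """Return (client, provider, artifact) triples between people."""
    pairs = set()
    for artifact, provider_role in responsible.items():
        for client_role in stakeholders.get(artifact, set()):
            for client in allocations.get(client_role, set()):
                for provider in allocations.get(provider_role, set()):
                    pairs.add((client, provider, artifact))
    return pairs


# The sketch only knows the allocations listed above, so every role
# with a single listed holder is reported here.
print(single_person_roles(allocations))
print(roles_per_person(allocations)["Cassandra"])
print(client_pairs(allocations, responsible, stakeholders))
```

A workshop would normally produce these observations by reading the diagram; a data form like this is mainly useful for keeping the mapping reviewable as it changes.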

Reuse of Patterns and Tactics

With the help of a design language for sociotechnical systems, one can develop, discuss, and communicate design decisions to fulfill sociotechnical requirements effectively. There are numerous patterns and tactics to address various design challenges. For instance, the book Software Architecture in Practice (4th ed.) by Len Bass, Paul Clements, and Rick Kazman catalogs dozens of tactics for meeting availability requirements. While not all of these tactics are applicable to every scenario, they serve as insightful references. Such catalogs are useful for identifying well-known solutions and provide a common vocabulary to streamline communication.

Similar catalogs of patterns and tactics for other requirement types—such as performance, scalability, and security—can be found in the same book or similar architectural resources.

Self-Test + Removal from Service

Problem: One wants to determine whether a role is overly dependent on a single individual.

Solution: Self-Test + Removal from Service

Self-Test is a tactic that involves running procedures to verify the correctness of the system’s operation. Removal from Service entails temporarily taking a system component out of service to mitigate potential failures. These tactics can be combined effectively in sociotechnical systems.

For example, consider the role of "Lead Software Architect" allocated to Cassandra. The system could include a test scenario articulated in a BDD-style format to ensure availability:

Given the Lead Software Architect, Cassandra, is on vacation

When an external software architecture audit is requested

Then the audit should not reveal any major findings

This scenario articulates clear expectations regarding the quality of service. It can also be executed to verify the system’s performance and address gaps in preparedness.
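Before running such a scenario for real, it can be dry-run against a simple allocation model. The following sketch is a stand-in under stated assumptions: the "audit" is reduced to checking that no role is left without a holder, the function names are hypothetical, and Hector as a second holder is purely illustrative.

```python
def roles_out_of_service(allocations, removed):
    """Return the roles with no remaining holder once `removed` people are away."""
    return sorted(
        role
        for role, people in allocations.items()
        if not (set(people) - set(removed))
    )


def scenario_architect_on_vacation(allocations):
    # Given the Lead Software Architect, Cassandra, is on vacation
    removed = {"Cassandra"}
    # When the roles are checked (a simplified stand-in for the external audit)
    findings = roles_out_of_service(allocations, removed)
    # Then there should be no major findings
    return findings == []


single = {"Lead Software Architect": {"Cassandra"}}
redundant = {"Lead Software Architect": {"Cassandra", "Hector"}}

print(scenario_architect_on_vacation(single))     # False
print(scenario_architect_on_vacation(redundant))  # True
```

A failing dry run does not replace the real exercise, but it points to the roles where a removal from service is most likely to expose gaps.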

Redundant Spare and Shadow

Problem: One aims to increase the availability of a role and reduce dependency on individual persons.

Solution 1: Redundant Spare

This tactic ensures multiple components share the same functionality, providing backup in case of a failure. Applied to the example above, it would involve allocating the role of Lead Architect to two individuals, such as Cassandra and Jack. To ensure consistent delivery, the two individuals must share a common knowledge base and authority. This can be achieved through regular communication, co-attendance at meetings, and mutual review of each other’s work.

Solution 2: Shadow

The Shadow tactic introduces a backup component temporarily before the primary component goes offline. Afterward, the shadow component is gradually withdrawn when the primary component resumes its role. Unlike Redundant Spare, Shadow involves a shorter operational overlap, focusing on transitioning responsibility smoothly.
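The difference between the two tactics is mostly a difference in how long the allocations overlap. The sketch below makes this visible with a simple day-based timeline; the 30-day project, the specific day ranges, and the function names are assumptions made for illustration only.

```python
def coverage(allocations, days):
    """Map each day to the set of people allocated to the role on that day."""
    return {
        day: {person for person, start, end in allocations if start <= day <= end}
        for day in days
    }


def uncovered_days(allocations, days):
    """Days on which nobody is allocated to the role."""
    return [day for day, people in coverage(allocations, days).items() if not people]


def overlap_days(allocations, days):
    """Days on which more than one person is allocated to the role."""
    return [day for day, people in coverage(allocations, days).items() if len(people) > 1]


project_days = range(1, 31)

# Redundant Spare: both people hold the role for the whole period.
redundant_spare = [("Cassandra", 1, 30), ("Jack", 1, 30)]

# Shadow: Cassandra is away on days 11-20; Jack joins a few days earlier
# and withdraws a few days after she returns.
shadow = [("Cassandra", 1, 10), ("Cassandra", 21, 30), ("Jack", 8, 23)]

for name, plan in [("redundant spare", redundant_spare), ("shadow", shadow)]:
    print(name, len(uncovered_days(plan, project_days)), len(overlap_days(plan, project_days)))
```

In both plans the role is covered every day, but the shadow plan pays for overlap only around the handovers, whereas the redundant spare pays for it continuously.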

Troubleshooting

Sociotechnical systems, like technical systems, may perform poorly, incorrectly, or even become dysfunctional. While no single troubleshooting approach fits all issues, two guiding principles can help address problems in sociotechnical systems:

1. Bugs are relative to requirements

2. Problems in complex systems are symptoms of structural messes

Bugs Are Relative to Requirements

If a system doesn’t perform as expected, it isn’t necessarily a bug. A bug is defined as behavior that contradicts stated requirements. Consider the following:

The project is progressing too slowly.
Technical debt is accumulating rapidly.
There are excessive meetings.

These situations, while problematic, do not automatically warrant fixes unless they contradict specific requirements. For instance:

The project may be slow because there is no budget to expand the team.
Technical debt may have accumulated from a prototype designed for quick approval by sponsors.
Excessive meetings may result from the high priority of alignment between different functions, especially in safety-critical domains.

Before resolving such issues, it is crucial to define new requirements, identify any conflicts with existing ones, and assess their implications. This approach ensures that changes are deliberate, communicated effectively, and aligned with overarching project goals.

Problems in Complex Systems Are Symptoms of Structural Messes

The prominent systems thinker Russell Ackoff noted that problems in complex systems are often intertwined within a larger mess (Ackoff's Best: His Classic Writings on Management by Russell L. Ackoff). Solving isolated problems is ineffective; instead, the focus should be on dissolving the underlying mess.

Example: Excessive Meetings

A common problem in sociotechnical systems is meeting overload. Typical solutions might include:

Clustering meetings in specific time slots to increase focus time.
Encouraging team members to decline meetings where they have no contributions.
Appointing moderators and requiring agendas to increase meeting efficiency.

While these strategies address symptoms, they do not tackle the root cause of excessive meetings. To identify the mess, one must analyze the sociotechnical design.

For instance, two teams, A and B, working on tightly coupled components C1 and C2, may need constant communication to align their work. This interdependency generates excessive meetings. A potential solution could involve merging the components into a single one, allowing one team to take full responsibility. Alternatively, decoupling the components could reduce interdependence. The optimal solution depends on the specific context and requirements.
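The structural cause in this example can be made explicit by counting the component dependencies that cross team boundaries. This is a rough sketch: the dependency list and the assumption that every cross-team dependency creates a need for coordination are simplifications, and the names are hypothetical.

```python
from collections import Counter


def cross_team_dependencies(ownership, dependencies):
    """Count component dependencies that cross a team boundary, per team pair."""
    counts = Counter()
    for source, target in dependencies:
        owner_a, owner_b = ownership[source], ownership[target]
        if owner_a != owner_b:
            counts[tuple(sorted((owner_a, owner_b)))] += 1
    return counts


# Current design: teams A and B own tightly coupled components C1 and C2.
ownership = {"C1": "A", "C2": "B"}
dependencies = [("C1", "C2"), ("C2", "C1")]
print(cross_team_dependencies(ownership, dependencies))

# Option 1: merge C1 and C2 into one component C owned by a single team.
merged_ownership = {"C": "A"}
print(cross_team_dependencies(merged_ownership, []))

# Option 2: keep both teams but decouple the components.
print(cross_team_dependencies(ownership, []))
```

Both options remove the cross-team dependencies in the model; which one is appropriate still depends on the context and the stated requirements.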

This example underscores the importance of root cause analysis and the use of visual sociotechnical design tools to explore and communicate solutions.

Sociotechnical Engineering in Practice

Sociotechnical Engineering can also be effectively applied in practice. In Systems Engineering projects, there is often a role called the Lead Systems Engineer (LSE), who is primarily responsible for two key artifacts: system requirements and system design. Similarly, a new role can be introduced in projects: the Lead Sociotechnical Engineer (LSTE). This role is responsible for two sociotechnical artifacts: Sociotechnical Requirements (STR) and Sociotechnical Design (STD). The LSTE either replaces or complements the traditional role of the project lead or project manager.

The LSTE serves as the central role that bridges organization and technology, defining the scope for both organizational and technical decisions. The two artifacts, STR and STD, are continuously created, discussed, adjusted, and communicated throughout the project:

At the Beginning of the Project:

Sociotechnical functional requirements are analyzed to identify the necessary artifacts and roles.
This analysis helps assess which people and competencies are needed for the project.
All decisions are documented in the STR and made available for relevant stakeholders to review.

After Project Start:

Non-functional requirements are analyzed and discussed collaboratively within the team.
Collaborative Roles-Artifacts-People mappings are used to document and communicate the sociotechnical architecture of the project.
Documented requirements, decisions, and designs are regularly updated and adjusted throughout the project as needed.
The resulting decisions and designs are stored in the STR and STD documents, made visible to all, and made available for review.

In Case of Issues:

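The two troubleshooting principles described above (bugs are relative to requirements; problems are symptoms of structural messes) can be applied to address sociotechnical challenges.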
It is crucial that solution approaches and decisions are documented in STR and STD so they remain reviewable and communicable.

The creation and maintenance of STR and STD is the responsibility of the Lead Sociotechnical Engineer (LSTE), but it is conducted as a collaborative process. Depending on the context, multiple stakeholders are involved as reviewers and/or contributors.

Conclusion

Sociotechnical systems consist of people and artifacts. They are complex and dynamic. While the mainstream discourse on sociotechnical systems focuses on the alignment between social structures and code structures, this article expands the discussion by adding two additional dimensions:

1. The reuse of Systems Engineering approaches for sociotechnical systems.

2. The consideration of a broader range of artifacts created by humans—not just code—as relevant elements of sociotechnical systems.

These two dimensions are introduced through an approach called Sociotechnical Engineering, which transfers classical Systems Engineering categories and principles to sociotechnical systems. Additionally, this article introduces the concept of "Role-as-a-Service" (RaaS) to bridge the gap between people and artifacts.

These approaches not only enable the reuse of proven Systems Engineering methods to navigate complexity but also bring the established language of Systems Engineering into the sociotechnical domain. This facilitates communication and decision-making within software organizations and empowers them to tackle the challenges of sociotechnical systems more effectively.

Do you want to learn more about Sociotechnical Engineering? Feel free to reach out to me!