6. Design patterns (extensibility)¶
The aim of the laboratory¶
The goals of this laboratory (based on a more complex, real-life example):
- Practicing some basic design principles that support extensibility, reusability, code clarity, and maintainability: SRP, OPEN-CLOSED, DRY, KISS, etc.
- Applying design patterns most closely related to extensibility (Template Method, Strategy, Dependency Injection).
- Practicing and combining additional techniques that support extensibility and reusability (e.g. delegate/lambda expressions) with design patterns.
- Practicing code refactoring.
Related lectures:
- Design patterns: patterns related to extensibility (introduction, Template Method, Strategy), as well as the “pattern” of Dependency Injection.
Prerequisites¶
Tools required for completing the laboratory:
- Visual Studio 2022
Laboratory on Linux or macOS
The material for this laboratory is primarily intended for Windows and Visual Studio, but it can also be completed on other operating systems with different development tools (e.g., VS Code, Rider, Visual Studio for Mac), or even with a simple text editor and CLI tools. This is possible because the examples are presented in the context of a simple Console application (no Windows-specific elements), and the .NET 8 SDK is supported on Linux and macOS.
Theoretical background and approach *¶
When developing more complex applications, we face numerous design decisions where we must choose from various options. If we do not consider maintainability and extensibility during these decisions, development can quickly become a nightmare. Client requests for changes and enhancements often require large-scale code rewrites/modifications, which can introduce new bugs and require significant effort for retesting the codebase.
Our goal is to implement such change and extension requirements by extending the code at a few well-defined points — without significantly modifying existing code. The keyword is: extension instead of modification. Relatedly, if certain logic is extensible, it will also be more general, and can be reused more easily in multiple contexts. In the long run, this leads to faster progress, shorter code, and avoids code duplication (which also improves maintainability).
Design patterns provide proven solutions to common design problems: they help make our code more extensible, maintainable, and as reusable as possible. This laboratory focuses on those patterns, design principles, and programming tools that address the above issues. However, we should avoid overengineering: only apply a design pattern if it provides real benefits in a given case. Otherwise, it just adds unnecessary complexity. In this spirit, our goal is not (and often not even possible) to foresee every future extensibility need. The key is to start from a simple solution, recognize issues as they arise, and continuously refactor our code to meet current (functional and non-functional) requirements and improve extensibility and reusability where appropriate.
It’s also worth noting that related design patterns and language tools can greatly assist in making our code unit testable: in many companies, it's a (justified) basic expectation during software development that developers write unit tests with high code coverage. However, achieving this is practically impossible if the units/classes in our code are too tightly coupled.
Task 0 – Getting familiar with the assignment and the starter application¶
Clone the starter application for Lab 6 from the repository below:
- Open a command prompt
- Navigate to any folder, for example:
c:\work\NEPTUN
- Run the following command:
git clone https://github.com/bmeviauab00/lab-patterns-extensibility-kiindulo.git
- Open the Lab-Patterns-Extensibility.sln solution in Visual Studio.
Task description¶
During this lab, we will work on a console-based data processing (more precisely, anonymizing) application, making it extensible in different ways using various techniques, in response to continuously evolving requirements. As part of the first task, we’ll also get familiar with the concept of anonymization.
The input of the application is a CSV text file, where each line contains data related to a particular person. Open the us-500.csv file in the Data folder of the filesystem (double-click it or open with Notepad). You will see the data of individuals listed in quotes and separated by commas (note: these are not real individuals). Let’s look at the first line:
"James","Rhymes","Benton, John B Jr","6649 N Blue Gum St","New Orleans ","Orleans","LA","70116","504-621-8927","504-845-1427","30","65","Heart-related","jRhymes@gmail.com"
The first person in the data row is named James Rhymes and works at the company "Benton, John B Jr". The next few fields represent address-related data. He is 30 years old and weighs 65 kg. The following field describes a more serious illness (in this case, "Heart-related"). The last column contains the person's email address.
Source and exact format of the data *
The data source is: https://www.briandunning.com/sample-data/, with a few extra fields added (age, weight, illness). The field order is: First Name, Last Name, Company, Address, City, County (where applicable), State/Province (where applicable), ZIP/Postal Code, Phone 1, Phone 2, Age, Weight, Illness, Email.
The primary function of the application is to anonymize these records based on the current requirements and write the results to an output CSV text file. The goal of anonymization is to transform the data in such a way that individuals cannot be identified, while still allowing for meaningful reporting. Anonymization is a distinct, serious, and challenging area in data processing. In this lab, our goal is not to develop solutions that are usable in real-world environments or entirely meaningful. The main focus is on applying a data processing algorithm to demonstrate software design patterns. This provides a more engaging context than simple data filtering/sorting/etc., which .NET already supports natively.
A few thoughts about anonymization
At first glance, anonymization might seem like a simple problem. For example, one might think it's enough to remove or mask people's names, street addresses, phone numbers, and email addresses. For example, for the first row of our input, the output might look like this:
"***","***","Benton, John B Jr","***","New Orleans ","Orleans","LA","70116","***","***","30","65","Heart-related","***"
But it’s not that simple, especially when dealing with large amounts of data. Imagine a small village where only a few people live. Suppose an anonymized person is 14 years old and weighs 95 kg. This is a rare combination, so there is a good chance that nobody else in the village has these parameters. If one of his classmates (an eighth grader, also 14) looks at the "anonymized" data, he will be able to identify the person, because no other eighth grader in the school is that overweight. As a result, he will also learn, for example, what illness the person has. Lesson: data can be revealing in context.
So, what’s the solution? We can’t just remove or mask the city, age, or weight because we need them for reporting. A common approach is to generalize the data instead of showing exact values. For example, instead of showing an exact age or weight, we show a range (e.g., 10–20 years, 80–100 kg). This way, individuals cannot be identified. We will also use this technique later on.
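The generalization idea can be illustrated with a minimal sketch. The helper name ToRange and the "lower..upper" output format are assumptions made for this illustration only; they are not taken from the lab's starter code.

```csharp
using System;

// Hypothetical helper: maps an exact, non-negative value to the range that contains it.
static string ToRange(int value, int rangeSize)
{
    // Integer division rounds down, so this yields the lower bound of the range.
    int lower = value / rangeSize * rangeSize;
    return $"{lower}..{lower + rangeSize}";
}

Console.WriteLine(ToRange(14, 10)); // 10..20
Console.WriteLine(ToRange(95, 20)); // 80..100
```

The larger the range size, the more people share the same value, which makes identification harder but the reports less precise.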
Initial requirements¶
The initial requirements for the application are:
- Files received from a specific client (all with the same format) must be converted using the same anonymizing algorithm into the same output format. The anonymization should simply involve "masking" the first and last names.
- Some data cleaning is needed. There may be unnecessary _ and # characters at the beginning or end of the city field in the input data, and these should be removed (trim operation).
- After processing each row, the application must print to the console that the row has been processed, and after processing all the data, some summary information should be displayed: how many rows were processed and how many required trimming of the city name.
- Key aspect: The application will be used only for a short period and is not intended to be expanded later.
Note: In order to work with fewer fields in the code and to make the output clearer, a few fields will be omitted during processing.
For example, the expected output for the first row of our input file would be:
***; ***; LA; New Orleans; 30; 65; Heart-related
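The masking and cleaning requirements above can be sketched as follows. This is only an illustration under stated assumptions: the helper names CleanCity and ToOutputLine are hypothetical, and trimming spaces together with _ and # is an assumption based on the expected output line (the input city is "New Orleans " with a trailing space).

```csharp
using System;

// Hypothetical helper: removes unwanted '_', '#' and space characters from both ends.
static string CleanCity(string city) => city.Trim('_', '#', ' ');

// Hypothetical helper: builds one output line with the first and last name masked.
static string ToOutputLine(string mask, string state, string city,
                           string age, string weight, string illness) =>
    string.Join("; ", mask, mask, state, CleanCity(city), age, weight, illness);

Console.WriteLine(ToOutputLine("***", "LA", "New Orleans ", "30", "65", "Heart-related"));
// ***; ***; LA; New Orleans; 30; 65; Heart-related
```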
Solution 1 - All-in-one (1-Start/Start)¶
In the Visual Studio Solution Explorer, we see folders named with numbers 1 through 4. These contain the solutions for each iteration of work. The first solution is located in the "1-Start" folder under the project name "Start". Let’s take a look at the files in the project:
- Person.cs - Contains the data of a person that is of interest to us. We read each person’s data into objects of this class.
- Program.cs - The main function is implemented here, with all the logic, "separated" by code comments. If the logic becomes more complicated, even after a few days (or hours), we might struggle to understand our own code. Let’s not focus on this solution.
Overall, the solution is very simple since we don’t foresee a long future for the code. However, the “script-like,” “all-in-one” solution that puts everything in one function is still not the best direction. It makes the code hard to understand and difficult to follow. Let’s not delve into this further.
Solution 2 (2-OrganizedToFunctions/OrganizedToFunctions-1)¶
Let’s move on to the solution found in the "2-OrganizedToFunctions" folder, in the project "OrganizedToFunctions-1" in Visual Studio. This solution is much more appealing because we have split the logic into functions. Let’s review the code briefly:
- Anonymizer.cs
    - The Run function is the "backbone" and contains the control logic. It calls the functions responsible for each step.
    - ReadFromInput: This function reads the source file, creates a Person object for each line, and returns a list of the read Person objects.
    - TrimCityNames: It performs data cleaning (trimming the city names).
    - Anonymize: It’s called for each Person object read and is responsible for returning a new Person object with anonymized data.
    - WriteToOutput: Writes the anonymized Person objects to the output file.
    - PrintSummary: Prints the summary of the process to the console at the end.
- Program.cs
    - Creates an Anonymizer object and runs it with the Run function. Notice that the string used for masking in the anonymization process is provided as a constructor parameter.
Let’s try running it! Set "OrganizedToFunctions-1" as the startup project in Visual Studio (right-click and Set as Startup Project), then run it:
The output file can be found in the file manager in the "OrganizedToFunctions-1\bin\Debug\net8.0\" or a similarly named folder as "us-500.processed.txt". Open it and take a look at the data.
Evaluation of the solution¶
- The solution is fundamentally well-structured and easy to understand.
- It follows the KISS (Keep It Simple, Stupid) principle, avoiding unnecessary complications. This is good because there are no anticipated future development needs, and there is no need to support different formats, logic, etc.
- However, the solution does not follow one of the most fundamental and well-known design principles, namely the Single Responsibility Principle (SRP). This principle expects that each class should have only one responsibility (it should focus on just one thing).
    - It’s clear that the Anonymizer class has multiple responsibilities: processing input, data cleaning, anonymization, producing output, etc.
    - This problem is not immediately noticeable and doesn’t cause any issues for us because each responsibility is simple and "fits" into a short function. But if any of these responsibilities became more complex, or were implemented in multiple functions, they should definitely be organized into separate classes.

    Why is it problematic if a class has multiple responsibilities? *
    - It becomes harder to understand its operation because it doesn’t focus on a single task.
    - If any responsibility needs to change, the large class handling multiple tasks must be modified and retested.
- Automated integration (input-output) tests can be written for the solution, but "real" unit tests are not feasible.
Solution 3 (OrganizedToFunctions-2-TwoAlgorithms)¶
In contrast to the previous "plans", new user requirements have emerged. Our client changed his mind and now requests a different anonymization algorithm for a new data set: the ages of the individuals need to be stored in ranges, so the exact ages of the people should not be revealed. For simplicity, we will not anonymize the names in this case, considering it a "pseudo" anonymization (it can still make sense, but it’s not entirely accurate to call it anonymization).
Our solution, which supports both the old and the new algorithm (but only one at a time), can be found in the Visual Studio project OrganizedToFunctions-2-TwoAlgorithms. Let's take a look at the Anonymizer class, focusing on the design principles:
- We introduced an AnonymizerMode enum type to determine which mode (algorithm) we will use for the Anonymizer class.
- The Anonymizer class now has two anonymization operations: Anonymize_MaskName and Anonymize_AgeRange.
- The Anonymizer class stores the selected algorithm mode in the _anonymizerMode field. Two constructors have been created to set the _anonymizerMode based on the chosen mode.
- The Anonymizer class checks the value of _anonymizerMode in several places (e.g., Run, GetAnonymizerDescription methods) and branches accordingly.
- In the GetAnonymizerDescription method, this check is essential because it is responsible for generating a one-line description of the anonymization algorithm. This description appears in the "summary" at the end of the process. For instance, if we are using the age anonymizer with a 20-year range, this summary will look like:
    Summary - Anonymizer (Age anonymizer with range size 20): Persons: 500, trimmed: 2
Evaluation of the solution¶
Overall, our solution has worse code quality compared to the previous one. Initially, it wasn’t an issue that the anonymization algorithms were not extendable because there was no demand for it. However, once the need to introduce a new algorithm arose, it became a problem that we didn’t make the solution extensible: now, we can expect that more algorithms will be introduced in the future.
Why do we say that our code is not extendable when "only" a new enum value and an extra if/switch branch should be introduced at some point in the code when a new algorithm should be introduced?
Open/Closed Principle
The key point is that we consider a class extendable if new behavior (in our case, new algorithms) can be introduced without modifying the class itself, simply by extending or adding to the code. In other words, in this case, we should not need to modify the Anonymizer class, which is clearly not the case. This principle is known as the Open/Closed Principle: the class should be Open for Extension, Closed for Modification. The problem with modifying the code is that we likely introduce new bugs and need to retest the modified code, which can result in significant time/cost investment.
What is the exact goal, and how can we achieve it? There are certain parts of our class that we don’t want to hard-code:
- These are not data, but behaviors (code, logic).
- We don't solve these using if/switch statements: we introduce "extension points" and somehow allow "any" code to run in those places.
- The variable, case-dependent parts of the code should be moved to other classes (in a replaceable manner from the perspective of the class)!
Note
No magic here, we’ll use the tools we already know: inheritance with abstract/virtual methods, interfaces, or delegates.
Let’s identify the parts of our class that involve case-dependent, variable logic, and shouldn’t be hard-coded into the Anonymizer class:
- One is the anonymization logic: Anonymize_MaskName/Anonymize_AgeRange
- The other is the GetAnonymizerDescription

These need to be decoupled from the class, and we need to make these points extendable. The following diagram illustrates the general goal:
We’ll look at three specific design patterns/techniques for achieving the above:
- Template Method pattern
- Strategy pattern (with Dependency Injection)
- Delegate (optionally with Lambda expressions)
We have actually used these patterns during our studies before, but now we’ll explore them more deeply and practice applying them in a broader context. We will apply the first two in the lab, and the third will be practiced in a related homework assignment.
Solution 4 (3-TemplateMethod/TemplateMethod-1)¶
In this step, we will apply the Template Method design pattern to make our solution extendable at the necessary points.
Note
The name of the pattern is "misleading": it has nothing to do with the templates (function and class templates) we learned in C++!
Template Method-based solution class diagram
The following UML class diagram illustrates the Template Method-based solution, focusing on the core concepts:
In the pattern, the following principles help separate the "unchanging" and "changing" parts of the code (it's worth understanding these concepts, based on the above class diagram and applying them to our example):
- The "common/unchanging" parts are placed in a base class.
- We introduce extension points in this base class by declaring abstract/virtual methods and calling them at those points.
- The case-dependent implementation of these methods goes into the derived classes.
The "trick" is that when the base class calls these abstract/virtual methods, the case-dependent code in the derived classes gets executed.
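The mechanism described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the lab's code: the names ProcessorBase, UpperCaseProcessor, ReverseProcessor, Transform and Describe are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// The caller only picks a derived class; the skeleton in Run() stays untouched.
new UpperCaseProcessor().Run(new[] { "alpha", "beta" });
new ReverseProcessor().Run(new[] { "alpha", "beta" });

// Base class: holds the common, unchanging control logic.
abstract class ProcessorBase
{
    // The "template method": a fixed skeleton that calls the extension points.
    public void Run(IEnumerable<string> lines)
    {
        int count = 0;
        foreach (var line in lines)
        {
            Console.WriteLine(Transform(line)); // extension point 1 (must be overridden)
            count++;
        }
        Console.WriteLine($"{Describe()}: {count} lines"); // extension point 2 (may be overridden)
    }

    // Abstract: there is no sensible default, so derived classes must implement it.
    protected abstract string Transform(string line);

    // Virtual: a default exists, derived classes override it only if needed.
    protected virtual string Describe() => "Unnamed processor";
}

class UpperCaseProcessor : ProcessorBase
{
    protected override string Transform(string line) => line.ToUpperInvariant();
}

class ReverseProcessor : ProcessorBase
{
    protected override string Transform(string line)
    {
        char[] chars = line.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    protected override string Describe() => "Reversing processor";
}
```

Adding a third kind of processing means adding a third derived class; ProcessorBase does not need to change.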
Next, we will refactor the previous enum and if/switch based solution to follow the Template Method pattern (and there will be no enum anymore). We will introduce a base class and two algorithm-dependent derived classes.
Let's proceed with the following steps to refactor the code accordingly. The Visual Studio solution in the "3-TemplateMethod" folder, within the "TemplateMethod-0-Begin" project, contains the previous solution's code (a "copy" of it), which is where we will work:
- Rename the Anonymizer class to AnonymizerBase (e.g., right-click the class name in the source file and press F2 to rename).
- Add two new classes to the project: NameMaskingAnonymizer and AgeAnonymizer (right-click on the project, select Add > Class).
- Derive NameMaskingAnonymizer and AgeAnonymizer from AnonymizerBase.
- Move the following parts from AnonymizerBase to NameMaskingAnonymizer:
    - The _mask field.
    - The constructor with parameters string inputFileName, string mask, renaming it to NameMaskingAnonymizer, and:
        - Removing the line _anonymizerMode = AnonymizerMode.Name;.
        - Replacing the this constructor call with a base constructor call.

      Constructor code
          public NameMaskingAnonymizer(string inputFileName, string mask): base(inputFileName)
          {
              _mask = mask;
          }
- Move the relevant parts from AnonymizerBase to AgeAnonymizer:
    - The _rangeSize member variable.
    - The constructor with parameters string inputFileName, int rangeSize, renamed to AgeAnonymizer:
        - Remove the line _anonymizerMode = AnonymizerMode.Age;.
        - Replace the this constructor call with a base constructor call.

      Constructor code
          public AgeAnonymizer(string inputFileName, int rangeSize): base(inputFileName)
          {
              _rangeSize = rangeSize;
          }
- In the AnonymizerBase class:
    - Delete the AnonymizerMode enum type.
    - Delete the _anonymizerMode member.

Identify the parts of the logic that are case-dependent and should not be hard-coded into the reusable AnonymizerBase class:
- One is Anonymize_MaskName/Anonymize_AgeRange,
- The other is GetAnonymizerDescription.

Following the Template Method pattern, introduce abstract (or optionally virtual) methods in the base class for these, and call them from within the base class. The case-specific implementations should be placed in the derived classes using override.
- Mark the AnonymizerBase class as abstract (add the abstract keyword before class).
- In AnonymizerBase, introduce the following abstract method:

      protected abstract Person Anonymize(Person person);

  This method will be responsible for performing the anonymization.
- Move the Anonymize_MaskName method to the NameMaskingAnonymizer class and modify its signature so that it overrides the abstract Anonymize method defined in the base class:

      protected override Person Anonymize(Person person)
      {
          return new Person(_mask, _mask, person.CompanyName, person.Address, person.City, person.State, person.Age, person.Weight, person.Decease);
      }

  The body of the function only needs to be modified so that it uses the _mask member variable instead of the removed mask parameter.
- In a completely analogous way to the previous step, move the Anonymize_AgeRange method to the AgeAnonymizer class, and modify its signature so that it overrides the Anonymize abstract function in the base class:

      protected override Person Anonymize(Person person)
      {
          ...
      }

  The body of the function only needs to be modified so that it uses the _rangeSize member variable instead of the removed rangeSize parameter.
- In the Run function of the AnonymizerBase class, we can now replace the Anonymize calls in the if/else expression with a simple call to the abstract function.

  Replace:

      Person person;
      if (_anonymizerMode == AnonymizerMode.Name)
          person = Anonymize_MaskName(persons[i], _mask);
      else if (_anonymizerMode == AnonymizerMode.Age)
          person = Anonymize_AgeRange(persons[i], _rangeSize);
      else
          throw new NotSupportedException("The requested anonymization mode is not supported.");

  with:

      var person = Anonymize(persons[i]);

We have completed one of our extension points. However, there is still one remaining, the GetAnonymizerDescription, which is also case-dependent. Its transformation is very similar to the previous series of steps:
- Copy the GetAnonymizerDescription method from the AnonymizerBase class to the NameMaskingAnonymizer, including the override keyword in the signature, keeping only the logic relevant to NameMaskingAnonymizer in the method body:

      protected override string GetAnonymizerDescription()
      {
          return $"NameMasking anonymizer with mask {_mask}";
      }

- Copy the GetAnonymizerDescription method from AnonymizerBase to the AgeAnonymizer, including the override keyword in the signature, keeping only the logic relevant to AgeAnonymizer in the method body:

      protected override string GetAnonymizerDescription()
      {
          return $"Age anonymizer with range size {_rangeSize}";
      }

- The question is what to do with the GetAnonymizerDescription method in AnonymizerBase. We will make it a virtual method, not abstract, since we can provide a meaningful default behavior here: simply return the class name (which would be "NameMaskingAnonymizer" for the NameMaskingAnonymizer class, for example). With this, we also get rid of the rigid switch structure:

      protected virtual string GetAnonymizerDescription()
      {
          return GetType().Name;
      }

  Reflection
  The GetType() method, inherited from the object base class, returns a Type object for our class. This is part of reflection, a topic we will study in more detail in the lecture at the end of the semester.
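A small, standalone sketch may help to see why this default works: GetType() returns the runtime type of the object, even when the call is written in the base class. The class names ShapeBase, Circle and Square are hypothetical and unrelated to the lab project.

```csharp
using System;

ShapeBase first = new Circle();
ShapeBase second = new Square();

Console.WriteLine(first.Describe());  // Circle
Console.WriteLine(second.Describe()); // Square with custom description

abstract class ShapeBase
{
    // Default behavior: the name of the actual (derived) class of the object.
    public virtual string Describe() => GetType().Name;
}

// Relies on the default implementation.
class Circle : ShapeBase { }

// Replaces the default with its own text.
class Square : ShapeBase
{
    public override string Describe() => "Square with custom description";
}
```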
There is only one thing left: in the Program.cs Main method, we now try to instantiate the AnonymizerBase base class (due to the previous renaming). Instead, we should instantiate one of the two derived classes. For example:
NameMaskingAnonymizer anonymizer = new("us-500.csv", "***");
anonymizer.Run();
We are done. Let's test it to better "feel" if the extension points truly work (but if we are short on time during the lab, this is not particularly important, as we've done similar things in previous semesters in C++/Java context):
- In Visual Studio, set the TemplateMethod-0-Begin project as the startup project if we haven't already.
- Set a breakpoint on the var person = Anonymize(persons[i]); line in the AnonymizerBase class.
- When the debugger stops at this point during runtime, press F11 to step into it.
- You will observe that the Anonymize method of the derived class instantiated in Main is called (NameMaskingAnonymizer if you used the example above, AgeAnonymizer if you instantiated that one).
Let's take a look at the solution's class diagram:
The solution of our work can be found in the 3-TemplateMethod/TemplateMethod-1 project, in case you need it.
Why is the pattern called Template Method? *
The pattern is named Template Method because, using our application as an example, the Run and PrintSummary are "template methods" that define a skeleton logic, a framework, in which certain steps are left undefined. We leave the "code" for these steps to abstract/virtual functions, and the derived classes define their implementation.
Evaluation of the solution¶
Let's check if the solution meets our goals:
- The AnonymizerBase class has become more reusable.
- If we need a new anonymization logic in the future, we simply derive from it. This is an extension, not a modification.
- Accordingly, the OPEN/CLOSED principle is fulfilled, meaning we can customize and extend the logic at the two points defined in the base class without modifying its code.
Should every method in our class be extensible?
Note that we didn't make every method in AnonymizerBase virtual (thus not making the class extensible at every point). We only made them virtual where we believe future logic extension might be necessary.
Solution 5 (3-TemplateMethod/TemplateMethod-2-Progress)¶
Let's say a new, relatively simple requirement arises:
- For the NameMaskingAnonymizer, we keep the previous simple progress display (after every row, we print out which row we are currently processing),
- However, for the AgeAnonymizer, the progress display needs to be different: we need to show, updated after every row, the percentage of processing completed.

The solution is very simple: by applying the Template Method pattern more broadly in the Run method, we introduce an extension point for the progress display, and delegate the implementation to a virtual function.
Let's jump straight to the completed solution (3-TemplateMethod/TemplateMethod-2-Progress project):
- In the AnonymizerBase class, a new PrintProgress virtual function (by default, it doesn't print anything)
- In Run, a call to this function
- Implementations (override) in NameMaskingAnonymizer and AgeAnonymizer as needed
Currently, there is no significant learning to be gained from this, but in the next step, there will be.
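The idea of a progress extension point can be sketched as below. This is only an illustration: the class names and the PrintProgress(int index, int count) signature are assumptions, and the project's actual method may take different parameters and print different text.

```csharp
using System;

new SimpleWorker().Run(2);
new PercentWorker().Run(4);

class WorkerBase
{
    public void Run(int count)
    {
        for (int i = 0; i < count; i++)
        {
            // ... the processing of item i would happen here ...
            PrintProgress(i, count); // extension point for the progress display
        }
    }

    // Default behavior: print nothing.
    protected virtual void PrintProgress(int index, int count) { }
}

// Prints which item has just been processed.
class SimpleWorker : WorkerBase
{
    protected override void PrintProgress(int index, int count) =>
        Console.WriteLine($"{index + 1}. item processed");
}

// Prints the percentage completed so far (integer arithmetic).
class PercentWorker : WorkerBase
{
    protected override void PrintProgress(int index, int count) =>
        Console.WriteLine($"Processing: {(index + 1) * 100 / count} %");
}
```

With four items, PercentWorker prints 25, 50, 75 and 100 percent in turn.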
Solution 6 (3-TemplateMethod/TemplateMethod-3-ProgressMultiple)¶
A new - and entirely logical - requirement has emerged: in the future, any anonymization algorithm should be usable with any progress display. Currently, this means four cross-combinations:
Anonymizer | Progress |
---|---|
Name anonymizer | Simple progress |
Name anonymizer | Percentage progress |
Age anonymizer | Simple progress |
Age anonymizer | Percentage progress |
Let's jump to the completed solution (3-TemplateMethod/TemplateMethod-3-ProgressMultiple project). Instead of code, open the Main.cd class diagram in the project, and let's review the solution based on that (or we can view the diagram below in the guide).
It’s noticeable that something is "wrong" here, as we had to create a separate subclass for each cross-combination. To reduce code duplication, additional intermediate classes are also present in the hierarchy. Moreover:
- If a new anonymization algorithm is introduced in the future, we will need to write at least as many new classes as there are progress types supported.
- If a new progress type is introduced in the future, we will need to write at least as many new classes as there are anonymizer types supported.
What caused the problem? The fact that the behavior of our class needs to be extendable along multiple dimensions/aspects (in our case, anonymization and progress), and these need to be supported in many cross-combinations. If we had to add more aspects (e.g., reading methods, output generation), the number of required classes would multiply with each new aspect, since one class is needed for every combination. In such cases, the Template Method design pattern is not a good fit.
Solution 7 (4-Strategy/Strategy-1)¶
In this step, we will use the Strategy design pattern to make our initial solution extendable at the necessary points. The pattern separates the "unchanging/reusable" and "changing" parts based on the following principles:
- The "common/unchanging" parts are placed in a specific class (but this will not be a "base class").
- Unlike Template Method, we will use composition (containment) rather than inheritance: we delegate the implementation of behavior in the extension points to other objects that the class holds through interface-typed references (rather than to abstract/virtual functions).
- We do this for every aspect/dimension of the class behavior that we want to make replaceable/extensible, independently. As we will see, this avoids the combinatorial explosion we experienced in the previous chapter.
This is much simpler in practice than it may seem when described (we have already used it several times in our previous studies). Let's understand it in the context of our example.
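Before turning to the lab's own example, the principles above can be sketched with a small, generic example. All names here (Pipeline, ITransform, IProgressReporter and their implementations) are hypothetical and chosen only for this illustration; the strategies are passed in through the constructor, which is a simple form of Dependency Injection.

```csharp
using System;
using System.Collections.Generic;

// Any transform can be paired with any progress reporter:
// 2 + 2 small classes cover all 4 combinations, no subclass per combination is needed.
var pipeline = new Pipeline(new MaskTransform("***"), new PercentProgress());
pipeline.Run(new[] { "alpha", "beta" });

// One strategy interface per replaceable aspect.
interface ITransform { string Apply(string line); }
interface IProgressReporter { void Report(int index, int count); }

class UpperCaseTransform : ITransform
{
    public string Apply(string line) => line.ToUpperInvariant();
}

class MaskTransform : ITransform
{
    private readonly string _mask;
    public MaskTransform(string mask) { _mask = mask; }
    public string Apply(string line) => _mask;
}

class SimpleProgress : IProgressReporter
{
    public void Report(int index, int count) => Console.WriteLine($"{index + 1}. line processed");
}

class PercentProgress : IProgressReporter
{
    public void Report(int index, int count) => Console.WriteLine($"{(index + 1) * 100 / count} %");
}

// The common, unchanging part: it is not a base class, it only uses the strategies.
class Pipeline
{
    private readonly ITransform _transform;
    private readonly IProgressReporter _progress;

    public Pipeline(ITransform transform, IProgressReporter progress)
    {
        _transform = transform;
        _progress = progress;
    }

    public void Run(IReadOnlyList<string> lines)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            Console.WriteLine(_transform.Apply(lines[i])); // extension point: transformation
            _progress.Report(i, lines.Count);              // extension point: progress display
        }
    }
}
```

Because Pipeline depends only on the interfaces, a test could also pass in a fake reporter that records calls instead of printing, which is what makes such classes easier to unit test.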
Below, let's take a look at the class diagram illustrating the Strategy-based solution (building on the explanation that follows the diagram).
Strategy-based solution class diagram
The following UML class diagram illustrates the Strategy-based solution, focusing on the key aspects:
The first step in applying the Strategy pattern is to determine how many different aspects of the class behavior we want to make extendable. In our example, at least for now, there are two:
- Behavior related to anonymization, which consists of two operations:
- Anonymization logic
- Defining the description of the anonymization logic (producing the description string)
- Progress handling, which consists of one operation:
- Displaying progress
The hard part is done, from here on we can basically work mechanically following the Strategy pattern:
- For each of the above aspects, we need to introduce a strategy interface with the operations defined above, and create the corresponding implementations.
- In the Anonymizer class, we need to introduce a member variable of the strategy interface type for each aspect, and in the extension points, use the currently set strategy implementation objects through these member variables.
These elements are already present in the class diagram above. Now, let's move on to the code. Our starting environment is in the "Strategy-0-Begin" project in the "4-Strategy" folder, let's work in it. This is the same solution that uses enums, which we used as a starting point for the Template Method pattern as well.
Anonymization strategy¶
We start with handling the anonymization strategy/aspect. Let's introduce the corresponding interface:
- Create a folder named AnonymizerAlgorithms in the project (right-click on the "Strategy-0-Begin" project, then select Add/New Folder from the menu). In the following steps, let's place each interface and class into a separate source file, according to its name.
- Add in this folder an IAnonymizerAlgorithm interface with the following code:

      IAnonymizerAlgorithm.cs
      public interface IAnonymizerAlgorithm
      {
          Person Anonymize(Person person);
          string GetAnonymizerDescription() => GetType().Name;
      }

  Notice that the GetAnonymizerDescription method has a body: since C# 8, we can provide default implementations for interface methods if we wish.
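How a default interface implementation behaves can be tried out with the following standalone sketch. The names IDescribed, Plain and Custom are hypothetical; the sketch assumes C# 8 or later on a runtime that supports default interface members (for example .NET 8).

```csharp
using System;

IDescribed plain = new Plain();
IDescribed custom = new Custom();

Console.WriteLine(plain.Describe());  // Plain
Console.WriteLine(custom.Describe()); // custom description

// Note: the default member is only reachable through the interface type.
// new Plain().Describe(); // would not compile, Plain itself has no Describe member

interface IDescribed
{
    // Default implementation: used by every implementing class that does not provide its own.
    string Describe() => GetType().Name;
}

// Relies on the default implementation.
class Plain : IDescribed { }

// Provides its own implementation, which takes precedence over the default.
class Custom : IDescribed
{
    public string Describe() => "custom description";
}
```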
Now let's implement the name anonymization strategy (i.e., a strategy implementation for name anonymization).
- Create a class `NameMaskingAnonymizerAlgorithm` in the same folder.
- Move the `_mask` field from the `Anonymizer` class to `NameMaskingAnonymizerAlgorithm`.
- Add the following constructor to `NameMaskingAnonymizerAlgorithm`:

  ```csharp
  public NameMaskingAnonymizerAlgorithm(string mask)
  {
      _mask = mask;
  }
  ```
- Implement the `IAnonymizerAlgorithm` interface. After adding `: IAnonymizerAlgorithm` to the class name, it is advisable to use Visual Studio's code generation feature for the method skeletons: place the cursor on the interface name (click on it in the source code), press Ctrl + . and choose "Implement interface" from the menu. Note: since the interface has a default implementation for `GetAnonymizerDescription`, only the `Anonymize` method will be generated, which is fine for now.
- Move the body of the `Anonymize_MaskName` method from the `Anonymizer` class to the `Anonymize` method in `NameMaskingAnonymizerAlgorithm`. The only change in the method body is to use the `_mask` field instead of the now non-existent `mask` parameter. Also, delete the `Anonymize_MaskName` method from the `Anonymizer` class.
- Finally, let's implement the `GetAnonymizerDescription` method of the strategy interface. Copy the `GetAnonymizerDescription` method from the `Anonymizer` class to `NameMaskingAnonymizerAlgorithm`, keep only the logic relevant to the name anonymizer, and make the method public:

  ```csharp
  public string GetAnonymizerDescription()
  {
      return $"NameMasking anonymizer with mask {_mask}";
  }
  ```
- Our strategy implementation for name anonymization is complete. Its full code is the following:

  NameMaskingAnonymizerAlgorithm.cs
  ```csharp
  public class NameMaskingAnonymizerAlgorithm: IAnonymizerAlgorithm
  {
      private readonly string _mask;

      public NameMaskingAnonymizerAlgorithm(string mask)
      {
          _mask = mask;
      }

      public Person Anonymize(Person person)
      {
          return new Person(_mask, _mask, person.CompanyName,
              person.Address, person.City, person.State,
              person.Age, person.Weight, person.Decease);
      }

      public string GetAnonymizerDescription()
      {
          return $"NameMasking anonymizer with mask {_mask}";
      }
  }
  ```
In the next step, we will create the age anonymization implementation of our `IAnonymizerAlgorithm` strategy interface.
- Add an `AgeAnonymizerAlgorithm` class in the same folder (AnonymizerAlgorithms).
- Move the relevant `_rangeSize` field from the `Anonymizer` class into `AgeAnonymizerAlgorithm`.
- Add the following constructor to `AgeAnonymizerAlgorithm`:

  ```csharp
  public AgeAnonymizerAlgorithm(int rangeSize)
  {
      _rangeSize = rangeSize;
  }
  ```
- Implement the `IAnonymizerAlgorithm` interface. After adding `: IAnonymizerAlgorithm` after the class name, it is again recommended to generate the method skeleton for `Anonymize` using Visual Studio, as in the previous case.
- Move the body of the `Anonymize_AgeRange` method from the `Anonymizer` class into the `Anonymize` method of `AgeAnonymizerAlgorithm`. In the method body, simply replace the now non-existent `rangeSize` parameter with the `_rangeSize` field. Afterwards, delete the `Anonymize_AgeRange` method from the `Anonymizer` class.
- Now implement the `GetAnonymizerDescription` method of the strategy interface. Copy the implementation of `GetAnonymizerDescription` from the `Anonymizer` class into `AgeAnonymizerAlgorithm`, keep only the logic related to age anonymization, and make the method public:

  ```csharp
  public string GetAnonymizerDescription()
  {
      return $"Age anonymizer with range size {_rangeSize}";
  }
  ```
- Our strategy implementation for age anonymization is complete. Its full code is the following:

  AgeAnonymizerAlgorithm.cs
  ```csharp
  public class AgeAnonymizerAlgorithm: IAnonymizerAlgorithm
  {
      private readonly int _rangeSize;

      public AgeAnonymizerAlgorithm(int rangeSize)
      {
          _rangeSize = rangeSize;
      }

      public Person Anonymize(Person person)
      {
          // Integer division: e.g. 55 / 20 yields 2
          int rangeIndex = int.Parse(person.Age) / _rangeSize;
          string newAge = $"{rangeIndex * _rangeSize}.. {(rangeIndex + 1) * _rangeSize}";

          return new Person(person.FirstName, person.LastName, person.CompanyName,
              person.Address, person.City, person.State,
              newAge, person.Weight, person.Decease);
      }

      public string GetAnonymizerDescription()
      {
          return $"Age anonymizer with range size {_rangeSize}";
      }
  }
  ```
Note that the interface and its implementations deal exclusively with anonymization logic: no other logic (e.g., progress handling) is included here!
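To see how calling code can treat the two implementations uniformly, the following self-contained sketch may help. It is an illustration under assumptions, not the lab's code: `Person` is reduced to a hypothetical three-field record (the real class has more fields), and the `with` expressions stand in for the full constructor calls shown above.

```csharp
using System;
using System.Collections.Generic;

var person = new Person("Jane", "Doe", "55");

// The calling code only knows the interface, not the concrete classes.
var algorithms = new List<IAnonymizerAlgorithm>
{
    new NameMaskingAnonymizerAlgorithm("***"),
    new AgeAnonymizerAlgorithm(20),
};

foreach (IAnonymizerAlgorithm algorithm in algorithms)
{
    Person result = algorithm.Anonymize(person);
    Console.WriteLine($"{algorithm.GetAnonymizerDescription()} -> " +
                      $"{result.FirstName} {result.LastName}, {result.Age}");
}

// Simplified stand-in for the lab's Person class (assumption: three fields only).
public record Person(string FirstName, string LastName, string Age);

public interface IAnonymizerAlgorithm
{
    Person Anonymize(Person person);
    string GetAnonymizerDescription() => GetType().Name;
}

public class NameMaskingAnonymizerAlgorithm : IAnonymizerAlgorithm
{
    private readonly string _mask;

    public NameMaskingAnonymizerAlgorithm(string mask) { _mask = mask; }

    // Replaces both name fields with the mask, leaves the rest untouched.
    public Person Anonymize(Person person) =>
        person with { FirstName = _mask, LastName = _mask };

    public string GetAnonymizerDescription() =>
        $"NameMasking anonymizer with mask {_mask}";
}

public class AgeAnonymizerAlgorithm : IAnonymizerAlgorithm
{
    private readonly int _rangeSize;

    public AgeAnonymizerAlgorithm(int rangeSize) { _rangeSize = rangeSize; }

    public Person Anonymize(Person person)
    {
        // Integer division: 55 / 20 yields 2, so age 55 falls into the 40.. 60 range.
        int rangeIndex = int.Parse(person.Age) / _rangeSize;
        string newAge = $"{rangeIndex * _rangeSize}.. {(rangeIndex + 1) * _rangeSize}";
        return person with { Age = newAge };
    }

    public string GetAnonymizerDescription() =>
        $"Age anonymizer with range size {_rangeSize}";
}
```

The loop never checks which concrete algorithm it holds; this is the property the `Anonymizer` class will rely on once it works with an `IAnonymizerAlgorithm` reference.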
Progress strategy
In the next step, let’s introduce the interface and implementations related to progress handling:
- Create a folder named `Progresses` in the project. In the next steps, place each interface and class into a separate source file, named accordingly, as usual.
- Add an `IProgress` interface in this folder with the following code:

  IProgress.cs
  ```csharp
  public interface IProgress
  {
      void Report(int count, int index);
  }
  ```
- Add the simple progress implementation of this interface in the same folder. The implementation is "derived" from the `PrintProgress` method in the `Anonymizer` class:

  SimpleProgress.cs
  ```csharp
  public class SimpleProgress : IProgress
  {
      public void Report(int count, int index)
      {
          Console.WriteLine($"{index + 1}. person processed");
      }
  }
  ```
- Add the percentage-based progress implementation of this interface in the same folder. We won't go into the details of the code logic. There is no direct equivalent for this in the `Anonymizer` class, as it was only introduced in our Template Method-based solution (we didn't inspect that code, but this is essentially its core idea):

  PercentProgress.cs
  ```csharp
  public class PercentProgress : IProgress
  {
      public void Report(int count, int index)
      {
          int percentage = (int)((double)(index + 1) / count * 100);
          Console.Write($"\rProcessing: {percentage} %");

          if (index == count - 1)
              Console.WriteLine();
      }
  }
  ```
Note that the interface and its implementations deal exclusively with progress handling: no other logic (e.g., anonymization) is included here!
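The `Report(count, index)` contract can be tried out without the `Anonymizer` class. The sketch below simply drives both implementations from a loop; the loop and the value `count = 4` are assumptions made for this illustration, standing in for the processing loop of `Run`.

```csharp
using System;

IProgress[] progresses = { new SimpleProgress(), new PercentProgress() };
const int count = 4; // hypothetical number of persons to process

foreach (IProgress progress in progresses)
{
    // index is zero-based, exactly as in the processing loop of Run.
    for (int index = 0; index < count; index++)
    {
        progress.Report(count, index);
    }
}

public interface IProgress
{
    void Report(int count, int index);
}

public class SimpleProgress : IProgress
{
    public void Report(int count, int index)
    {
        Console.WriteLine($"{index + 1}. person processed");
    }
}

public class PercentProgress : IProgress
{
    public void Report(int count, int index)
    {
        // Cast to double first, otherwise integer division would truncate to 0.
        int percentage = (int)((double)(index + 1) / count * 100);
        // "\r" returns to the start of the line, so the value is overwritten in place.
        Console.Write($"\rProcessing: {percentage} %");

        // After the last item, finish the line.
        if (index == count - 1)
            Console.WriteLine();
    }
}
```

With four items, `PercentProgress` computes 25, 50, 75 and 100 percent in turn, each overwriting the previous value on the same console line.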
Applying the strategies
The next important step is to make the `Anonymizer` class reusable and extensible using the strategies we've just introduced. In the `Anonymizer.cs` file:
- Remove the following:
  - The `AnonymizerMode` enum type.
  - The `_anonymizerMode` field (as well as `_mask` and `_rangeSize`, if they still remain).
- Introduce strategy interface-typed fields:

  ```csharp
  private readonly IProgress _progress;
  private readonly IAnonymizerAlgorithm _anonymizerAlgorithm;
  ```
- Add the appropriate using directives to the top of the file:

  ```csharp
  using Lab_Extensibility.AnonymizerAlgorithms;
  using Lab_Extensibility.Progresses;
  ```
- The `_progress` and `_anonymizerAlgorithm` fields introduced in the previous step are null by default. Set these references in the constructors to the implementations that suit our needs. For example:

  ```csharp
  public Anonymizer(string inputFileName, string mask) : this(inputFileName)
  {
      _progress = new PercentProgress();
      _anonymizerAlgorithm = new NameMaskingAnonymizerAlgorithm(mask);
  }

  public Anonymizer(string inputFileName, int rangeSize) : this(inputFileName)
  {
      _progress = new PercentProgress();
      _anonymizerAlgorithm = new AgeAnonymizerAlgorithm(rangeSize);
  }
  ```
In the `Anonymizer` class, delegate the currently hardcoded, anonymization-dependent logic to the strategy implementation referenced by `_anonymizerAlgorithm`.
- Delegate the `Anonymize` calls in the `if`/`else` statements in the `Run` function of the `Anonymizer` class to the `_anonymizerAlgorithm` object.

  Replace:

  ```csharp
  Person person;
  if (_anonymizerMode == AnonymizerMode.Name)
      person = Anonymize_MaskName(persons[i], _mask);
  else if (_anonymizerMode == AnonymizerMode.Age)
      person = Anonymize_AgeRange(persons[i], _rangeSize);
  else
      throw new NotSupportedException("The requested anonymization mode is not supported.");
  ```

  with:

  ```csharp
  Person person = _anonymizerAlgorithm.Anonymize(persons[i]);
  ```
- If we haven't done so already, delete the `Anonymize_MaskName` and `Anonymize_AgeRange` functions, since their code has already been moved into the corresponding strategy implementations, detached from the class.
- Our `PrintSummary` function currently calls the rigid, `switch`-based `GetAnonymizerDescription`. Replace this `GetAnonymizerDescription` call by delegating it to the `_anonymizerAlgorithm` object. In the `PrintSummary` function (highlighting only the relevant part):

  Instead of:

  ```csharp
  ... GetAnonymizerDescription() ...
  ```

  Use:

  ```csharp
  ... _anonymizerAlgorithm.GetAnonymizerDescription() ...
  ```

  A few lines below, also delete the `GetAnonymizerDescription` function from the class (its code has already been moved into the appropriate strategy implementations).
The last step is to replace the progress handling hard-coded in the `Anonymizer` class:
- Here, delegate the request to the previously introduced `_progress` object. In the `Run` function, replace one line as follows:

  Instead of:

  ```csharp
  PrintProgress(i);
  ```

  Use:

  ```csharp
  _progress.Report(persons.Count, i);
  ```
- Delete the `PrintProgress` function, as its code has already been moved to a suitable strategy implementation, detached from the class.
We are done. The completed solution can be found in the "4-Strategy/Strategy-1" project (if we are stuck somewhere or the code doesn't compile, we can compare it with this).
Evaluation of the solution
We are finished with the introduction of the Strategy pattern. However, the pattern is almost never used in this form. Let's check our solution: is it truly reusable, and is it possible to change the anonymizer algorithm or progress handling without modifying the `Anonymizer` class? To answer this, we need to examine whether there is any code in the class that depends on a concrete implementation.
Unfortunately, we can find such code. The constructors are hard-coded to create specific algorithm and progress implementations. Let's check this in the code! If we want to change the algorithm or progress mode, we have to modify the type after the `new` operator in these lines, which means modifying the class itself.
Many, quite justifiably, do not consider this a true strategy-based solution in its current form. We will implement the complete solution in the next step.
Solution 8 (4-Strategy/Strategy-2-DI)
Dependency Injection (DI)
The solution is to apply Dependency Injection (DI). The idea is that the class does not instantiate its behavioral dependencies (the strategy implementations) itself; instead, we pass them in from outside, e.g. as constructor parameters, or through properties or setter operations. They are referenced as interface types, of course!
Let's refactor the `Anonymizer` class accordingly, so that it doesn't instantiate the strategy implementations itself, but instead receives them as constructor parameters:
- Delete all three constructors.
- Add the following constructor:

  ```csharp
  public Anonymizer(string inputFileName, IAnonymizerAlgorithm anonymizerAlgorithm, IProgress progress = null)
  {
      ArgumentException.ThrowIfNullOrEmpty(inputFileName);
      ArgumentNullException.ThrowIfNull(anonymizerAlgorithm);

      _inputFileName = inputFileName;
      _anonymizerAlgorithm = anonymizerAlgorithm;
      _progress = progress;
  }
  ```

  As we can see, providing the `progress` parameter is optional, as the class user may not be interested in any progress information.
- Since the `_progress` strategy can be null, we need to introduce a null check where it is used. Instead of the `.` operator, we use the `?.` operator:

  ```csharp
  _progress?.Report(persons.Count, i);
  ```
- Now we are done: the `Anonymizer` class has become completely independent of the strategy implementations. We can use the `Anonymizer` class with any combination of anonymizer algorithms and progress handling (without modifying the class itself). Let's create three `Anonymizer` instances with different combinations in the `Main` method of the `Program.cs` file (make sure to delete the existing code in the `Main` method first):

  ```csharp
  Anonymizer p1 = new("us-500.csv",
      new NameMaskingAnonymizerAlgorithm("***"),
      new SimpleProgress());
  p1.Run();

  Console.WriteLine("--------------------");

  Anonymizer p2 = new("us-500.csv",
      new NameMaskingAnonymizerAlgorithm("***"),
      new PercentProgress());
  p2.Run();

  Console.WriteLine("--------------------");

  Anonymizer p3 = new("us-500.csv",
      new AgeAnonymizerAlgorithm(20),
      new SimpleProgress());
  p3.Run();
  ```
- To ensure the code compiles, add the necessary `using` directives at the top of the file:

  ```csharp
  using Lab_Extensibility.AnonymizerAlgorithms;
  using Lab_Extensibility.Progresses;
  ```
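As a design note on the optional `progress` parameter: the lab solution handles the missing strategy with the `?.` operator. A commonly used alternative, which the lab does not use, is the Null Object pattern, where a do-nothing implementation is substituted for null. The sketch below illustrates this idea under assumptions: `NullProgress` and the stripped-down `MiniAnonymizer` are hypothetical names, and `Run(int count)` only simulates the processing loop.

```csharp
#nullable enable
using System;

var quiet = new MiniAnonymizer();                       // no progress output at all
quiet.Run(3);

var verbose = new MiniAnonymizer(new SimpleProgress()); // prints three lines
verbose.Run(3);

public interface IProgress
{
    void Report(int count, int index);
}

public class SimpleProgress : IProgress
{
    public void Report(int count, int index)
    {
        Console.WriteLine($"{index + 1}. person processed");
    }
}

// Hypothetical Null Object: satisfies the interface but deliberately does nothing.
public sealed class NullProgress : IProgress
{
    public void Report(int count, int index) { }
}

// Hypothetical, stripped-down stand-in for the lab's Anonymizer class.
public class MiniAnonymizer
{
    private readonly IProgress _progress;

    public MiniAnonymizer(IProgress? progress = null)
    {
        // Substitute the do-nothing strategy, so _progress is never null afterwards.
        _progress = progress ?? new NullProgress();
    }

    public void Run(int count)
    {
        for (int i = 0; i < count; i++)
        {
            _progress.Report(count, i); // plain '.' is safe here, no '?.' needed
        }
    }
}
```

The trade-off is one extra class in exchange for call sites that cannot forget the null check; for a single call site, as in this lab, the `?.` operator is arguably the simpler choice.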
We are done, and the complete solution can be found in the "4-Strategy/Strategy-2-DI" project (if you encounter issues or the code doesn't compile, you can compare it with this project).
Checking the functionality
There might not be time for this during the laboratory, but for anyone unsure about why the Strategy pattern works and how the behavior differs between the cases above: it is recommended to set breakpoints in the `Program.cs` file at each `Run` method call and step through the functions in the debugger to verify that the correct strategy implementation is invoked.
A class diagram (`Main.cd`) is included in the project, where the final solution can also be reviewed:
Strategy-based solution class diagram
The following UML class diagram illustrates our strategy-based solution:
Evaluation of the solution
Let's check whether the solution meets our goals:
- The `Anonymizer` class has become more reusable.
- If a new anonymization logic is needed in the future, only a new `IAnonymizerAlgorithm` implementation needs to be introduced. This is an extension, not a modification.
- If a new progress logic is required in the future, only a new `IProgress` implementation needs to be introduced. This is an extension, not a modification.
- Both of the above points adhere to the OPEN/CLOSED principle, meaning that we can customize and extend the logic of `Anonymizer` without modifying its code.
- There is no risk of the combinatorial explosion seen in the Template Method pattern: any `IAnonymizerAlgorithm` implementation can be conveniently used with any `IProgress` implementation, and we do not need to introduce new classes for each combination (this was demonstrated in the `Program.cs` file).
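The "extension, not modification" claim can be illustrated with a new algorithm that plugs in without touching the consuming class. The following is a sketch under assumptions, not part of the lab: `InitialsAnonymizerAlgorithm` is a hypothetical new strategy, `Person` is reduced to three fields, and `MiniAnonymizer` is a stripped-down stand-in that works on an in-memory list instead of a CSV file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var persons = new List<Person> { new("Jane", "Doe", "55"), new("John", "Smith", "31") };

// The new algorithm is passed in from outside; MiniAnonymizer itself is unchanged.
var anonymizer = new MiniAnonymizer(new InitialsAnonymizerAlgorithm());
foreach (Person p in anonymizer.Run(persons))
{
    Console.WriteLine($"{p.FirstName} {p.LastName}, {p.Age}");
}

// Simplified stand-in for the lab's Person class (assumption: three fields only).
public record Person(string FirstName, string LastName, string Age);

public interface IAnonymizerAlgorithm
{
    Person Anonymize(Person person);
    string GetAnonymizerDescription() => GetType().Name;
}

// Hypothetical new strategy: keeps only the initials of the names.
public class InitialsAnonymizerAlgorithm : IAnonymizerAlgorithm
{
    public Person Anonymize(Person person) =>
        person with { FirstName = Initial(person.FirstName), LastName = Initial(person.LastName) };

    private static string Initial(string name) =>
        string.IsNullOrEmpty(name) ? "" : $"{name[0]}.";
}

// Hypothetical, stripped-down stand-in for the lab's Anonymizer class.
public class MiniAnonymizer
{
    private readonly IAnonymizerAlgorithm _anonymizerAlgorithm;

    public MiniAnonymizer(IAnonymizerAlgorithm anonymizerAlgorithm)
    {
        ArgumentNullException.ThrowIfNull(anonymizerAlgorithm);
        _anonymizerAlgorithm = anonymizerAlgorithm;
    }

    // Applies whichever strategy was injected to every person.
    public List<Person> Run(IEnumerable<Person> persons) =>
        persons.Select(p => _anonymizerAlgorithm.Anonymize(p)).ToList();
}
```

Adding this algorithm required one new class and one changed constructor argument at the call site; the class consuming the strategy was not edited.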
Additional Strategy advantages over Template Method *
- The behavior can be changed at runtime. If there is a need to change the anonymization or progress behavior after an `Anonymizer` object has been created, it can easily be done (we could simply introduce `SetAnonymizerAlgorithm` and `SetProgress` methods that would set the strategy used by the class to the implementation passed in as a parameter).
- Supports unit testing (this is not covered in this laboratory).
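Both advantages can be sketched briefly. The code below is an illustration under assumptions rather than the lab's solution: `MiniAnonymizer` is a hypothetical stand-in whose `Run(int count)` only simulates the processing loop, the `SetProgress` method follows the idea mentioned above, and `RecordingProgress` is a hypothetical test double that records calls instead of printing.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var recorder = new RecordingProgress();
var anonymizer = new MiniAnonymizer(recorder);

anonymizer.Run(2);             // reported to the recorder
anonymizer.SetProgress(null);  // strategy switched off at runtime
anonymizer.Run(2);             // nothing is reported any more

Console.WriteLine(recorder.Calls.Count); // 2

public interface IProgress
{
    void Report(int count, int index);
}

// Hypothetical test double: records every call so a test can inspect them.
public class RecordingProgress : IProgress
{
    public List<(int Count, int Index)> Calls { get; } = new();

    public void Report(int count, int index) => Calls.Add((count, index));
}

// Hypothetical, stripped-down stand-in for the lab's Anonymizer class.
public class MiniAnonymizer
{
    // Not readonly, because the strategy may be replaced after construction.
    private IProgress? _progress;

    public MiniAnonymizer(IProgress? progress = null) => _progress = progress;

    public void SetProgress(IProgress? progress) => _progress = progress;

    public void Run(int count)
    {
        for (int i = 0; i < count; i++)
        {
            _progress?.Report(count, i);
        }
    }
}
```

Because the class under test depends only on the interface, a unit test can inject `RecordingProgress` and assert on `Calls` without capturing console output.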