Wednesday, July 21, 2004
Too self-assured, too self-centric
Does this dialog sound familiar to you?
Me: The guidelines say so and so. Simply use them this way.
He: But my solution works too...
Me: It doesn't matter whether it works or not, others will have trouble understanding your code.
He: It's not very complex. Others can understand it.
Me: Why make things more complex than they need to be, especially when that goes against the guidelines? In a year, you won't be able to understand this part yourself.
He: Bullshit, of course I will!
Wednesday, July 07, 2004
Lost In Estimation
Yesterday we discussed one developer's proposal to introduce a tool that would check our sources for conformance to our corporate code-style guidelines. Our projects are very big, so a single consistent code style matters a lot. But with decisions like this, one should always consider the business point of view: will it bring a profit to the company or not?
In my experience, very few software developers even think about this. And once you start, you face a big problem: in software development, it is incredibly hard to get reasonable estimates of costs and benefits. Let's use the proposal described above as an example:
Benefits
What benefits would an automated code-style check bring us? Everyone agrees that code-style guidelines are a good thing. When reading properly styled source code, a developer immediately understands what a given name is (variable, method, class), which part of the project it belongs to and what type it has. Consistent bracket placement increases readability. And so on.
Now, how can we estimate the benefit of applying the style guidelines, at least in man-hours (Bt)? There is simply no way. Common approaches like "let's say we have 1000 developers and each of them saves 1 second a day working with better-styled code" are, in this case, nothing more than guessing.
Costs
Now to the costs. Of course, we would have initial costs (C0) for buying or developing such a tool. The rest are variable costs (Cv), which consist of the following:
Modification costs (Cm, man-hours). If the tool finds inconsistencies in your code, you have to fix them. This takes time, first of all because when you rename a variable in one place, you have to rename it everywhere it is used. There are also additional time costs that are usually underestimated: code analysis tools typically just write their output to a text file. That means you have to read this file, pick out the file name and line number, open that file in your editor, find the line and make the change. It is an incredibly inefficient and boring process, and it takes a lot of time.
Re-testing costs (Crt, man-hours). Suppose our sources have already been tested. Any modification introduces the possibility of a new error, so normally the code has to be tested again, which of course also takes time.
Decision
What do we have so far? To make a decision, we have to check the following inequality (S is the developer's hourly salary):
Bt * S > C0 + S * (Cm + Crt)
Unfortunately, the only value we can be sure about is C0. We definitely need some proven methods to make better estimates.
Thursday, July 01, 2004
Command-line switch guidelines
Dear writers of command-line utilities, please don't forget that the command line has not been limited to a handful of characters for a long time now. Please think about the colleagues who have to use your tools, and don't encrypt the names of your tool's parameters. I'm tired of seeing switches like /A, -E++, /zqwgy and /buf.
Switches usually control something inside the tool. But the users of your tool are likely unfamiliar with its internals, aren't they? That means they cannot guess the names of the switches, so they have to read the manual first (if there is one). Bad.
Further, even if the users know what the tool is supposed to do and could therefore guess the parameters, they cannot simply use the obvious term as the switch, because the tool developer has encrypted it into one character or an incomprehensible abbreviation. Which means: back to the manual. Also bad.
OK, they have read the manual and now they have to use the tool. Everything should be fine now? Nope. They still cannot use the tool effectively without the manual, because they keep forgetting how a particular switch is abbreviated. Developers have to remember a thousand things anyway, and without a logical mapping between the term and the switch, it is much harder to remember. Too bad.
So here are my guidelines for naming command-line switches:
Use long, descriptive names. It is exactly the same as naming variables and classes. If your tool has an internal buffer, instead of '/b:10' or even '/buf:10', use something like '/buffersize:10'. But beware: if you have more than one buffer, be more specific, say '/incomingbuffersize' and '/outgoingbuffersize'.
For setting values, use a noun that describes the value. See the example above: '/buffer' is too abstract, '/buffersize' is right.
If the unit of measurement is ambiguous, include it in the name. Thus, in the example above, '/buffersizeKb:10' would be even better.
For controlling behavior, use verbs. Examples: '/showdelays' and '/logtofile'.
Be consistent in naming. Develop your own naming conventions for your tool, or even better for your whole set of tools, and follow them. Whether names start with '/' or '-', and whether they use initial capitals or not, is less important; what matters is that they are consistent.
Provide a built-in manual. Do not force users to search for it somewhere. Let your tool list all available options when started without parameters or with any of these switches: '/?', '/h', 'help' (the more, the better).
Games from Within: Physical Structure and C++ - Part 1: A First Look
I found a great, no, a brilliant article today:
Games from Within: Physical Structure and C++ - Part 1: A First Look
It covers issues that are rarely discussed anywhere: physical software structure, dealing with includes, reducing dependencies, measuring compile times and so on. Strongly recommended reading!
Wednesday, June 30, 2004
Managing Dependencies
As a software project grows, the dependencies between its source files matter more and more. Aside from architectural issues, dependencies are harmful because they increase compilation time. In our company, for example, a full build can take up to 8 hours (I'm not kidding, and rest assured we are not using 286 processors). You cannot avoid dependencies entirely, but it makes a lot of sense to keep their number as low as practical.
To manage those dependencies, we have to know them first. Unfortunately, there are currently almost no tools that could help us.
Consider visual tools that draw dependency graphs: they are nearly useless. All you get from them is a neat picture. What would you do with it? "Oh! Look, there is a bunch of dependencies over there!" Later, after some time looking into the source code: "Hmm, but they are all needed..." And the more complex the project, the more incomprehensible the dependency graph becomes.
At the other extreme there are "invisible tools" like makedepend. They silently scan all sources so that only the files affected by a change, according to the dependencies, get recompiled. They know a lot about dependencies, but they are not going to share this knowledge with us.
What we need is a tool that shows us exactly which dependencies were involved in the last build, ideally with the time spent compiling each of them. This simple thing would give the developer something to think about. If he wonders why the build took an hour although he changed only one little file, he could see what was compiled and trace the dependency chain between each compiled file and that small file.
Tuesday, June 22, 2004
To abstract or not to abstract ...
Ask yourself: what factor plays the most important role when you make a decision in the software development process? If you are honest, it is your personal preference. And the more experienced you are, the more arguments you can give to support what you like or dislike.
I was discussing a solution with a colleague. We had an implementation of a cyclic buffer for binary packets, which are stored in the buffer as [ContentSize][Content]. He proposed replacing this implementation with a stream, i.e. a class that has only read() and write() methods. He claimed that "it would then be easier to understand what this class does".
In reality, exactly the opposite is true: by moving towards more abstraction, you inevitably lose the details that hint at what the class really does. You can make every class in your system a "stream", but then you always have to figure out which stream does what exactly.
It's not that abstraction is bad, not at all. Too abstract is bad, and too much detail is also bad. The best solution is always somewhere in between. But my colleague simply seemed to like his idea, and my arguments meant nothing to him.
In this case there were also other hidden consequences: a stream, by definition, has no format. That means each client class of this new CCyclicStream must decode the stream according to the current format (size first, content second). So, for the sake of "code purity", we make life harder for the other people who would have to implement all this. And I'm not even talking about what happens if the format changes...
Is everybody in? The ceremony is about to begin...
This blog was created solely to collect my thoughts, critiques and observations about the modern software industry. I currently work at a large software company and run into related issues every day.