How to make fewer errors at the stage of code writing. Part N1

Author: Andrey Karpov

Date: 09.03.2011

Abstract

I've arrived at the source code of a widely known instant messenger Miranda IM. Together with various plugins, this is a rather large project whose size is about 950 thousand code lines in C and C++. And like any other considerable project with a long development history, it has rather many errors and misprints.

Introduction

While examining defects in various applications, I noticed some regularities. By the examples of defects found in Miranda IM, I will try to formulate some recommendations that will help you to avoid many errors and misprints already at the stage of code writing.

I used the PVS-Studio 4.14 analyzer to check Miranda IM. The Miranda IM project's code is of rather high quality and its popularity just confirms this fact. I am using this messenger myself and do not have any complaints about its quality. The project is built in Visual Studio with the Warning Level 3 (/W3) while the amount of comments makes 20% of the whole program's source.

1. Avoid functions memset, memcpy, ZeroMemory and the like

I will start with errors that occur when using low-level functions to handle memory such as memset, memcpy, ZeroMemory and the like.

I recommend you to avoid these functions by all means. Sure, you do not have to follow this tip literally and replace all these functions with loops. But I have seen so many errors related to using these functions that I strongly advise you to be very careful with them and use them only when it is really necessary. In my opinion, there are only two cases when using these functions is grounded:

1) Processing of large arrays, i.e. in those places where you can really benefit from an optimized function algorithm, as compared to simple looping.

2) Processing large number of small arrays. The reason for this case also lies in performance gain.

In all the other cases, you'd better try to do without them. For instance, I believe that these functions are unnecessary in such a program as Miranda. There are no resource-intensive algorithms or large arrays in it. So, using functions memset/memcpy is determined only by the convenience of writing short code. But this simplicity is very deceptive and having saved a couple of seconds while writing the code, you will spend weeks to catch this elusive memory corruption error. Let's examine several code samples taken from the Miranda IM project.

V512 A call of the 'memcpy' function will lead to a buffer overflow or underflow. tabsrmm utils.cpp 1080

typedef struct _textrangew

{

CHARRANGE chrg;

LPWSTR lpstrText;

} TEXTRANGEW;

const wchar_t* Utils::extractURLFromRichEdit(...)

{

...

::CopyMemory(tr.lpstrText, L"mailto:", 7);

...

}

Only a part of the string is copied here. The error is awfully simple yet it remains. Most likely, there was

a string earlier consisting of 'char'. Then they switched to Unicode strings but forgot to change the

constant.
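The mismatch is easy to reproduce outside Miranda IM. The following is a minimal, portable sketch rather than the project's code: the helper names are made up for illustration, and the destination buffer is pre-filled with an all-ones marker so that a partly written character is visible. It counts how many characters of L"mailto:" arrive intact when 7 is treated as a byte count and when it is treated as a character count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: number of leading characters of src that dst reproduces.
std::size_t intact_prefix(const wchar_t* dst, const wchar_t* src)
{
    std::size_t n = 0;
    while (src[n] != L'\0' && dst[n] == src[n])
        ++n;
    return n;
}

// The third argument of memcpy (like that of CopyMemory) is a number of BYTES.
std::size_t chars_copied_by_memcpy()
{
    wchar_t dst[16];
    std::fill(dst, dst + 16, static_cast<wchar_t>(-1)); // marker pattern
    std::memcpy(dst, L"mailto:", 7);
    return intact_prefix(dst, L"mailto:");
}

// The third argument of wcsncpy is a number of CHARACTERS.
std::size_t chars_copied_by_wcsncpy()
{
    wchar_t dst[16];
    std::fill(dst, dst + 16, static_cast<wchar_t>(-1)); // marker pattern
    std::wcsncpy(dst, L"mailto:", 7);
    return intact_prefix(dst, L"mailto:");
}
```

The byte-count version delivers only 7 / sizeof(wchar_t) whole characters (3 with a 2-byte wchar_t, 1 with a 4-byte one), while the character-count version delivers all 7.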

If you copy strings with functions designed specifically for this purpose, this error cannot occur. Imagine that this code sample had been written this way:

strncpy(tr.lpstrText, "mailto:", 7);

Then the programmer would not have had to change the number 7 when switching to Unicode strings:

wcsncpy(tr.lpstrText, L"mailto:", 7);

I am not saying that this code is ideal. But it is much better than using CopyMemory. Consider another

sample.

V568 It's odd that the argument of sizeof() operator is the '& ImgIndex' expression. clist_modern

modern_extraimage.cpp 302

void ExtraImage_SetAllExtraIcons(HWND hwndList,HANDLE hContact)

{

...

char *(ImgIndex[64]);

...

memset(&ImgIndex,0,sizeof(&ImgIndex));

...

}

The programmer intended to zero the array of 64 pointers here. But sizeof(&ImgIndex) is the size of a pointer, so only the first item is zeroed. The same error, by the way, can also be found in another file, thanks to our favorite Copy-Paste:

V568 It's odd that the argument of sizeof() operator is the '& ImgIndex' expression. clist_mw

extraimage.c 295

The correct code must look this way:

memset(&ImgIndex,0,sizeof(ImgIndex));

By the way, taking the address of the array may additionally confuse whoever reads the code. The address-of operator is unnecessary here, and the code may be rewritten this way:

memset(ImgIndex,0,sizeof(ImgIndex));
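The two sizeof expressions can be compared directly. The sketch below reuses the array declaration from the sample; the function names are hypothetical. It also shows a variant with std::fill, where the range comes from the array itself and no byte count is written by hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

char* ImgIndex[64]; // same shape as the array in the sample

// Size of a pointer to the array: typically 4 or 8 bytes.
std::size_t size_used_by_buggy_call()
{
    return sizeof(&ImgIndex);
}

// Size of the array itself: 64 pointers.
std::size_t size_needed()
{
    return sizeof(ImgIndex);
}

// Clears every element; std::begin/std::end know the array bounds.
void clear_img_index()
{
    std::fill(std::begin(ImgIndex), std::end(ImgIndex),
              static_cast<char*>(nullptr));
}
```

Whether such a typed fill is preferable to memset in a given code base is a judgment call; the point of the sketch is only that the array bounds never have to be spelled out.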

The next sample.

V568 It's odd that the argument of sizeof() operator is the '& rowOptTA' expression. clist_modern

modern_rowtemplateopt.cpp 258

static ROWCELL* rowOptTA[100];

void rowOptAddContainer(HWND htree, HTREEITEM hti)

{

...

ZeroMemory(rowOptTA,sizeof(&rowOptTA));

...

}

Again, it is the pointer's size which is calculated instead of the array's size. The correct expression is

"sizeof(rowOptTA)". I suggest using the following code to clear the array:

const size_t ArraySize = 100;

static ROWCELL* rowOptTA[ArraySize];

...

std::fill(rowOptTA, rowOptTA + ArraySize, nullptr);

I got used to meeting such lines which populate the code through the copy-paste method:





Do you think this is all there is to low-level handling of arrays? Not at all. Read on, be afraid, and punish those who like to use memset.

V512 A call of the 'memset' function will lead to a buffer overflow or underflow. clist_modern

modern_image_array.cpp 59

static BOOL ImageArray_Alloc(LP_IMAGE_ARRAY_DATA iad, int size)

{

...

memset(&iad->nodes[iad->nodes_allocated_size],

(size_grow - iad->nodes_allocated_size) *

sizeof(IMAGE_ARRAY_DATA_NODE),

0);

...

}

This time, the size of the data to fill is calculated correctly, but the second and third arguments are swapped by mistake. Consequently, 0 items are filled. This is the correct code:

memset(&iad->nodes[iad->nodes_allocated_size], 0,

(size_grow - iad->nodes_allocated_size) *

sizeof(IMAGE_ARRAY_DATA_NODE));
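The effect of the swapped arguments can be checked on a small buffer. This is only an illustrative sketch: fill_bytes is a hypothetical wrapper that keeps memset's parameter order, and the buffer size and marker value are made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper with memset's parameter order:
// destination, fill value, number of bytes.
void fill_bytes(void* dst, int value, std::size_t count)
{
    std::memset(dst, value, count);
}

// Returns how many bytes of an 8-byte buffer still hold the marker 0xAB
// after the call.
std::size_t untouched_bytes(bool swap_arguments)
{
    unsigned char buf[8];
    std::fill(buf, buf + 8, static_cast<unsigned char>(0xAB));

    const std::size_t size = sizeof(buf);
    if (swap_arguments)
        fill_bytes(buf, static_cast<int>(size), 0); // value = 8, count = 0
    else
        fill_bytes(buf, 0, size);                   // value = 0, count = 8

    return static_cast<std::size_t>(
        std::count(buf, buf + 8, static_cast<unsigned char>(0xAB)));
}
```

With the arguments swapped, all 8 bytes keep the marker: the call is valid and simply does nothing, which is why the mistake is so quiet.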

I do not know how to rewrite this code fragment in a smarter way. To be more exact, you cannot make

it smart without touching other fragments and data structures.

A question arises how to do without memset when handling such structures as OPENFILENAME:

OPENFILENAME x;

memset(&x, 0, sizeof(x));

It's very simple. Create a zero-initialized structure like this:

OPENFILENAME x = { 0 };

2. Watch closely and check if you are working with a signed or unsigned type

The problem of confusing signed types with unsigned types might seem far-fetched at first sight. But programmers make a big mistake by underestimating this issue.

In most cases, people do not like to check compiler's warning messages concerning the comparison of

an int-variable to an unsigned-variable. Really, such code is usually correct. So programmers disable

these warnings or just ignore them. Or, they resort to the third method - add an explicit type conversion

to suppress the compiler's warning without going into details.

I suggest that you stop doing this and analyze the situation each time when a signed type meets an

unsigned type. And in general, be careful about what type an expression has or what is returned by a

function. Now examine several samples on this subject.

V547 Expression 'wParam >= 0' is always true. Unsigned type value is always >= 0. clist_mw

cluiframes.c 3140

There is the id2pos function in program code which returns value '-1' for an error. Everything is OK with

this function. In another place, the result of id2pos function is used as shown below:

typedef UINT_PTR WPARAM;

static int id2pos(int id);

static int nFramescount=0;

INT_PTR CLUIFrameSetFloat(WPARAM wParam,LPARAM lParam)

{

...

wParam=id2pos(wParam);

if(wParam>=0&&(int)wParam<nFramescount)

if (Frames[wParam].floating)

...

}

The problem is that the wParam variable has an unsigned type, so the condition 'wParam>=0' is always true. If the id2pos function returns '-1', the check for permissible values does not reject it: (int)wParam is -1, which is less than nFramescount, and we will start indexing the Frames array out of bounds.

I am almost sure that there was different code in the beginning:

if (wParam>=0 && wParam<nFramescount)

The Visual C++ compiler generated the warning "warning C4018: '<' : signed/unsigned mismatch". It is

this very warning that is enabled on Warning Level 3 with which Miranda IM is built. At that moment,

the programmer paid little attention to this fragment and suppressed the warning with an explicit type conversion. But the error did not disappear; it only hid itself. This is the correct code:

if ((INT_PTR)wParam>=0 && (INT_PTR)wParam<nFramescount)

So, I urge you to be careful with such places. I counted 33 conditions in Miranda IM which are always

true or always false due to confusion of signed/unsigned.
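A portable sketch of the same pattern, with made-up names: find_pos stands in for id2pos and std::size_t for WPARAM. It assumes the usual two's complement conversion when the unsigned value is cast back to int. It shows that the check cannot reject the error value once it has been stored in an unsigned variable, and that keeping the result in a signed variable restores the check.

```cpp
#include <cstddef>

// Hypothetical stand-in for id2pos: returns -1 when the id is unknown.
int find_pos(int id)
{
    return (id >= 0 && id < 3) ? id : -1;
}

// Mirrors the buggy code: the result is stored in an unsigned variable.
bool accepted_by_buggy_check(int id, int count)
{
    std::size_t pos = static_cast<std::size_t>(find_pos(id)); // -1 wraps to a huge value
    // 'pos >= 0' is always true for an unsigned type, so only the cast test is left.
    return static_cast<int>(pos) < count; // assumed to yield -1 < count for the error value
}

// Keeps the signed result as it is, so both bounds are really checked.
bool accepted_by_signed_check(int id, int count)
{
    const int pos = find_pos(id);
    return pos >= 0 && pos < count;
}
```

For an unknown id the first function reports the position as acceptable, while the second one rejects it.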

Let's go on. I especially like the next sample. And the comment, it is just beautiful.

V547 Expression 'nOldLength < 0' is always false. Unsigned type value is never < 0. IRC mstring.h 229

void Append( PCXSTR pszSrc, int nLength )

{

...

UINT nOldLength = GetLength();

if (nOldLength < 0)

{

// protects from underflow

nOldLength = 0;

}

...

}

I think this code needs no further explanation.

Of course, it is not only programmers' fault that errors appear in programs. Sometimes library

developers play a dirty trick on us (in this case it is developers of WinAPI).

#define SRMSGSET_LIMITNAMESLEN_MIN 0

static INT_PTR CALLBACK DlgProcTabsOptions(...)

{

...

limitLength =

GetDlgItemInt(hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE) >=

SRMSGSET_LIMITNAMESLEN_MIN ?

GetDlgItemInt(hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE) :

SRMSGSET_LIMITNAMESLEN_MIN;

...

}

If you ignore the excessively complicated expression, the code looks correct. By the way, it was one

single line at first. I just arranged it into several lines to make it clearer. However, we are not discussing

editing now.

The problem is that the GetDlgItemInt() function does not return 'int' as the programmer expected. This function returns UINT. This is its prototype from the "WinUser.h" file:

WINUSERAPI

UINT

WINAPI

GetDlgItemInt(

__in HWND hDlg,

__in int nIDDlgItem,

__out_opt BOOL *lpTranslated,

__in BOOL bSigned);

PVS-Studio generates the following message:

V547 Expression is always true. Unsigned type value is always >= 0. scriver msgoptions.c 458

And it is really so. The "GetDlgItemInt(hwndDlg, IDC_LIMITNAMESLEN, NULL, TRUE) >=

SRMSGSET_LIMITNAMESLEN_MIN" expression is always true.

Perhaps there is no error in this particular case. But I think you understand what I am driving at. Be careful and check what the functions you call actually return.
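A sketch of how such a result could be handled, under the assumption that the unsigned return value carries a signed number (which, as far as I understand the WinAPI documentation, is the case when bSigned is TRUE) and that the usual two's complement conversion applies. read_limit is a hypothetical stand-in for the GetDlgItemInt call, not the WinAPI function itself, and kLimitMin plays the role of SRMSGSET_LIMITNAMESLEN_MIN.

```cpp
// Hypothetical stand-in for GetDlgItemInt(..., TRUE): a typed "-5" comes back
// as an unsigned value holding the bit pattern of -5.
unsigned read_limit(int typed_value)
{
    return static_cast<unsigned>(typed_value);
}

const int kLimitMin = 0; // assumed counterpart of SRMSGSET_LIMITNAMESLEN_MIN

// What the original expression amounts to: 'value >= 0' is always true for an
// unsigned value, so the result is passed through unclamped.
int limit_as_written(int typed_value)
{
    const unsigned value = read_limit(typed_value);
    return static_cast<int>(value);
}

// Convert to int first, then compare: negative input is clamped to the minimum.
int limit_clamped(int typed_value)
{
    const int value = static_cast<int>(read_limit(typed_value));
    return value >= kLimitMin ? value : kLimitMin;
}
```

Calling the function once and storing the converted result also removes the duplicated call that made the original expression so long.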

3. Avoid too many calculations in one line

Every programmer knows, and dutifully says in discussions, that one should write simple and clear code. But in practice it seems that programmers take part in a secret contest for the most intricate line with an interesting language construct or a display of pointer juggling.

Most often errors occur in those places where programmers gather several actions in one line to make code compact. Making code just a bit smarter, they risk misprinting or missing some side effects.

Consider this sample:

V567 Undefined behavior. The 's' variable is modified while being used twice between sequence points.

msn ezxml.c 371

short ezxml_internal_dtd(ezxml_root_t root, char *s, size_t len)

{

...

while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') {

...

}

We have undefined behavior here. This code might work correctly for a long time but it is not

guaranteed that it will behave the same way after moving to a different compiler's version or

optimization switches. The compiler might well calculate '++s' first and then call the function 'strspn(s,

EZXML_WS)'. Or vice versa, it may call the function first and only then increment the 's' variable.
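One way to remove the doubt is to split the expression into separate statements, so that the increment is finished before strspn is called. This is a minimal sketch and not the ezxml code: the function name is hypothetical, and kWhitespace is an assumed stand-in for EZXML_WS.

```cpp
#include <cstring>

const char kWhitespace[] = " \t\r\n"; // assumed stand-in for EZXML_WS

// Advances s by one character, then returns a pointer past any whitespace
// that follows. The two steps are separate statements, so their order is fixed.
const char* next_token(const char*& s)
{
    ++s;                                    // step 1: move to the next character
    return s + std::strspn(s, kWhitespace); // step 2: skip whitespace from there
}
```

The loop condition then only has to test the returned pointer, for example `n = next_token(s); while (*n && *n != '>') ...`, at the cost of one extra line.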

Here is another example of why you should not try to gather everything in one line. Some execution branches in Miranda IM are disabled/enabled with inserts like '&& 0'. For example:

if ((1 || altDraw) && ...
if (g_CluiData.bCurrentAlpha==GoalAlpha &&0)
if(checkboxWidth && (subindex==-1 ||1)) {

Everything is clear with these comparisons, and the inserts are easy to notice. Now imagine that you see the fragment shown below. I have edited the code, but initially it was ONE SINGLE line.

V560 A part of conditional expression is always false: 0. clist_modern modern_clui.cpp 2979

LRESULT CLUI::OnDrawItem( UINT msg, WPARAM wParam, LPARAM lParam )
{
  ...
  DrawState(dis->hDC,NULL,NULL,(LPARAM)hIcon,0,
    (dis->rcItem.right+dis->rcItem.left-
     GetSystemMetrics(SM_CXSMICON))/2+dx,
    (dis->rcItem.bottom+dis->rcItem.top-
     GetSystemMetrics(SM_CYSMICON))/2+dx,
    0,0,
    DST_ICON|
    (dis->itemState&ODS_INACTIVE&&FALSE?DSS_DISABLED:DSS_NORMAL));
  ...
}

Even if there is no error here, it is hard to notice the word FALSE in this line. Have you found it? It is a difficult task, isn't it? And what if there is an error? You have no chance of finding it by just reviewing the code. Such expressions should be moved to a separate line. For example:

UINT uFlags = DST_ICON;

uFlags |= dis->itemState & ODS_INACTIVE && FALSE ?

DSS_DISABLED : DSS_NORMAL;

Personally I would make this code longer yet clearer:

UINT uFlags;

if (dis->itemState & ODS_INACTIVE && (((FALSE))))

uFlags = DST_ICON | DSS_DISABLED;

else

uFlags = DST_ICON | DSS_NORMAL;

Yes, this sample is longer, but it is easy to read and the word FALSE is easy to notice.

4. Align everything you can in code

Code alignment makes it less probable that you will misprint or make a mistake using Copy-Paste. If you still make an error, it will be much easier to find it during code review. Let's examine a code sample.

V537 Consider reviewing the correctness of 'maxX' item's usage. clist_modern modern_skinengine.cpp

2898

static BOOL ske_DrawTextEffect(...)

{

...

minX=max(0,minX+mcLeftStart-2);

minY=max(0,minY+mcTopStart-2);

maxX=min((int)width,maxX+mcRightEnd-1);

maxY=min((int)height,maxX+mcBottomEnd-1);

...

}

It is just a dense code fragment that is not interesting to read at all. Let's edit it:

minX = max(0,           minX + mcLeftStart - 2);
minY = max(0,           minY + mcTopStart - 2);
maxX = min((int)width,  maxX + mcRightEnd - 1);
maxY = min((int)height, maxX + mcBottomEnd - 1);

This is not the most typical example but you agree that it is much easier to notice now that the maxX

variable is used twice, don't you?

Do not take my recommendation on alignment literally and write columns of code everywhere. First, it requires some time when writing and editing code. Second, it may cause other errors. In the next sample you will see how that very wish to make a nice column caused an error in Miranda IM's code.

V536 Be advised that the utilized constant value is represented by an octal form. Oct: 037, Dec: 31. msn

msn_mime.cpp 192

static const struct _tag_cpltbl
{
  unsigned cp;
  const char* mimecp;
} cptbl[] =
{
  { 037, "IBM037" },   // IBM EBCDIC US-Canada
  { 437, "IBM437" },   // OEM United States
  { 500, "IBM500" },   // IBM EBCDIC International
  { 708, "ASMO-708" }, // Arabic (ASMO 708)
  ...
}

Trying to make a nice column of numbers, you might be easily carried away and write '0' in the

beginning making the constant an octal number.

So let me state my recommendation more precisely: align everything you can in code, but do not align numbers by adding leading zeroes.
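The effect of the leading zero can be verified at compile time. A minimal sketch; the constant names are hypothetical.

```cpp
const unsigned kWithLeadingZero = 037; // octal literal: 3 * 8 + 7
const unsigned kDecimal         = 37;  // decimal literal

static_assert(kWithLeadingZero == 31, "037 is octal and equals decimal 31");
static_assert(kWithLeadingZero != kDecimal, "037 and 37 are different numbers");
```

If a column of numbers has to line up, padding with spaces keeps the values decimal.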

5. Do not copy a line more than once

Copying lines in programming is inevitable. But you may protect yourself by giving up on inserting a line from the clipboard several times at once. In most cases, you'd better copy a line and then edit it. Then again copy a line and edit it. And so on. If you do so, it is much harder to forget to change something in a line or change it wrongly. Let's examine a code sample:

V525 The code containing the collection of similar blocks. Check items '1316', '1319', '1318', '1323',

'1323', '1317', '1321' in lines 954, 955, 956, 957, 958, 959, 960. clist_modern modern_clcopts.cpp 954

static INT_PTR CALLBACK DlgProcTrayOpts(...)
{
  ...
  EnableWindow(GetDlgItem(hwndDlg,IDC_PRIMARYSTATUS),TRUE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_CYCLETIMESPIN),FALSE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_CYCLETIME),FALSE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_ALWAYSPRIMARY),FALSE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_ALWAYSPRIMARY),FALSE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_CYCLE),FALSE);
  EnableWindow(GetDlgItem(hwndDlg,IDC_MULTITRAY),FALSE);
  ...
}

Most likely, there is no real error here; we just handle the item IDC_ALWAYSPRIMARY twice. However,

you may easily make an error in such blocks of copied-pasted lines.

6. Set a high warning level of your compiler and use static analyzers

For many errors, there are no recommendations to give on how to avoid them. They are most often misprints that both novices and skillful programmers make.

However, many of these errors can be detected already at the stage of code writing: first of all with the help of the compiler, and then with the help of static code analyzers' reports after night runs.

Someone might say now that this is thinly veiled advertising. But actually it is just another recommendation that will help you to have fewer errors. If I have found errors using static analysis and cannot say how to avoid them in code, then using static code analyzers is itself the recommendation.

Now let's examine some samples of errors that may be quickly detected by static code analyzers:

V560 A part of conditional expression is always true: 0x01000. tabsrmm tools.cpp 1023

#define GC_UNICODE 0x01000

DWORD dwFlags;

UINT CreateGCMenu(...)

{

...

if (iIndex == 1 && si->iType != GCW_SERVER &&

!(si->dwFlags && GC_UNICODE)) {

...

}

We have a misprint here: the '&&' operator is used instead of '&' operator. I do not know how one could

secure oneself against this error while writing code. This is the correct condition:

(si->dwFlags & GC_UNICODE)
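The difference between the two operators is easy to see on a flags value that does not contain the GC_UNICODE bit at all. A minimal sketch; the two function names are hypothetical, and the constant has the same value as in the sample.

```cpp
const unsigned GC_UNICODE = 0x01000; // same value as in the sample

// Logical AND: true whenever both operands are non-zero, whatever their bits.
bool test_logical(unsigned flags, unsigned mask)
{
    return flags && mask;
}

// Bitwise AND: true only if the bits of mask are actually set in flags.
bool test_bitwise(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

For flags equal to 0x0001 the logical version reports the flag as set, while the bitwise version correctly reports that it is not.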

The next sample.

V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *str != '\0'.

clist_modern modern_skinbutton.cpp 282

V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: *endstr !=

'\0'. clist_modern modern_skinbutton.cpp 283

static char *_skipblank(char * str)
{
  char * endstr=str+strlen(str);
  while ((*str==' ' || *str=='\t') && str!='\0') str++;
  while ((*endstr==' ' || *endstr=='\t') &&
         endstr!='\0' && endstr<str)
    endstr--;
  ...
}

The programmer just missed two asterisks '*' for pointer dereferencing operations. The result might be fatal: this code is prone to access violation errors. This is the correct code:

while ((*str==' ' || *str=='\t') && *str!='\0') str++;
while ((*endstr==' ' || *endstr=='\t') &&
       *endstr!='\0' && endstr<str)
  endstr--;

Again, I cannot give any particular tip except to use special tools for checking code.
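For illustration, the intended loop over leading blanks might look like the sketch below. It is not the Miranda IM function: skip_leading_blanks is a hypothetical name, and only the first of the two loops is reproduced.

```cpp
// Returns a pointer to the first character that is neither a space nor a tab.
// The loop stops at the terminating '\0' by itself, because '\0' matches
// neither of the two comparisons.
const char* skip_leading_blanks(const char* str)
{
    while (*str == ' ' || *str == '\t')
        ++str;
    return str;
}
```

Written this way, the loop has no separate end-of-string test in which an asterisk could be forgotten.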

The next sample.

V514 Dividing sizeof a pointer 'sizeof (text)' by another value. There is a probability of logical error

presence. clist_modern modern_cachefuncs.cpp 567

#define SIZEOF(X) (sizeof(X)/sizeof(X[0]))

int Cache_GetLineText(..., LPTSTR text, int text_size, ...)
{
  ...
  tmi.printDateTime(pdnce->hTimeZone, _T("t"), text, SIZEOF(text), 0);
  ...
}

Everything is OK at first sight. The text and its length, calculated with the SIZEOF macro, are passed into the function. Actually this macro should be called COUNT_OF, but that's not the point. The point is that we are trying to calculate the number of characters in a pointer. It is "sizeof(LPTSTR) / sizeof(TCHAR)" which is calculated here. A human hardly notices such fragments, but the compiler and a static analyzer see them well. This is the corrected code:

tmi.printDateTime(pdnce->hTimeZone, _T("t"), text, text_size, 0);
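In C++ code, an element-count helper can be written so that it refuses pointers altogether. The following is a sketch of that idea and not something taken from Miranda IM; count_of and text_buffer are hypothetical names.

```cpp
#include <cstddef>

// Accepts only real arrays: the parameter is a reference to an array of N
// elements, so passing a pointer such as LPTSTR fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t count_of(T (&)[N])
{
    return N;
}

wchar_t text_buffer[32]; // hypothetical buffer

static_assert(count_of(text_buffer) == 32, "element count, not byte count");
```

With such a helper, the call from the sample would have been rejected by the compiler instead of silently passing a wrong length. Since C++17 the standard library offers std::size with the same behavior for arrays.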

The next sample.

V560 A part of conditional expression is always true: 0x29. icqoscar8 fam_03buddy.cpp 632

void CIcqProto::handleUserOffline(BYTE *buf, WORD wLen)
{
  ...
  else if (wTLVType = 0x29 && wTLVLen == sizeof(DWORD))
  ...
}

In such cases, I recommend writing the constant first in the condition. The following code will simply not compile:

if (0x29 = wTLVType && sizeof(DWORD) == wTLVLen)

But many programmers, including myself, do not like this style. Personally, I get confused because I want to know first which variable is being compared and only then what it is being compared to.

If the programmer does not want to use this comparison style, he has either to rely on the compiler/analyzer or to take the risk.

By the way, this error is not a rare one despite being widely known among programmers. Here are three

more examples from Miranda IM where the PVS-Studio analyzer generated the V559 warning:

else if (ft->ft_magic = FT_MAGIC_OSCAR)

if (ret=0) {return (0);}

if (Drawing->type=CLCIT_CONTACT)
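It is worth seeing what such a condition actually computes. The sketch below evaluates the wTLVType condition from the earlier sample the way the compiler parses it; the function name is hypothetical, and sizeof(DWORD) is written as 4.

```cpp
// '&&' binds tighter than '=', so 'wTLVType = 0x29 && wTLVLen == 4' assigns
// the result of the whole logical expression to wTLVType.
// Returns the value left in wTLVType; non-zero means the branch is taken.
unsigned short type_after_check(unsigned short wTLVType, unsigned short wTLVLen)
{
    wTLVType = (0x29 && wTLVLen == 4); // explicit parentheses, same meaning
    return wTLVType;
}
```

The incoming type is ignored completely: the branch is taken for any type whose length is 4, and the variable is overwritten with 1 or 0.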

The code analyzer also allows you to detect very suspicious places in code, if not errors. For instance,

pointers serve not only as pointers in Miranda IM. In some places such games look fine, in other places

they look scary. Here is a code sample that alerts me:

V542 Consider inspecting an odd type cast: 'char *' to 'char'. clist_modern modern_toolbar.cpp 586

static void

sttRegisterToolBarButton(..., char * pszButtonName, ...)

{

...

if ((BYTE)pszButtonName)

tbb.tbbFlags=TBBF_FLEXSIZESEPARATOR;

else

tbb.tbbFlags=TBBF_ISSEPARATOR;

...

}

Actually we are checking here whether the lowest byte of the string's address is nonzero, that is, whether the address is not a multiple of 256. I do not quite understand what the developers intended to write in this condition. Perhaps this fragment is even correct but I doubt it.
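What the cast keeps can be shown with an explicit conversion through an integer type. This is an illustrative sketch with a hypothetical function name; it assumes the common case where converting a pointer to std::uintptr_t yields its numeric address.

```cpp
#include <cstdint>

// What '(BYTE)pszButtonName' keeps: the lowest 8 bits of the address.
bool low_byte_is_nonzero(const void* p)
{
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    return static_cast<unsigned char>(address) != 0;
}
```

A valid string that happens to be placed at an address ending in 0x00 would therefore be treated like a null pointer.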

You may find a lot of incorrect conditions using code analysis. For example:

V501 There are identical sub-expressions 'user->statusMessage' to the left and to the right of the '&&'

operator. jabber jabber_chat.cpp 214

void CJabberProto::GcLogShowInformation(...)

{

...

if (user->statusMessage && user->statusMessage)

...

}

And so on and so forth. I can give you other examples, a lot of them. But there is no reason to. The main point is that you may detect many errors with static analysis at the very early stages.

When a static analyzer finds few errors in your program, it does not seem interesting to use it. But this is a wrong conclusion. You see, you have already paid with blood and sweat and spent hours on debugging and correcting errors which the analyzer could have found at early stages.

Static analysis is of great value in software development as a regular practice, not as a tool for one-time checks. Many errors and misprints are detected during testing and unit-test development. But if you manage to find some of them already at the stage of code writing, you will save a great deal of time and effort. It is a pity when you debug a program for two hours just to notice an unnecessary semicolon ';' after the 'for' operator. Usually you may get rid of this error by spending 10 minutes on static analysis of the files that have been changed during the development process.

Summary

In this article, I have shared only some of my ideas concerning ways of avoiding as many errors as possible in C++ programming. There are some other ideas I am pondering. I will try to write about them in the next articles and posts.

P.S.

It has become a tradition to ask, after reading such an article, whether we have told the application's/library's developers about the errors found. I will answer in advance the probable question of whether we have sent a bug report to Miranda IM's developers.

No, we have not. This task is too resource-intensive. We have shown only a small part of what we found in the project. There are about a hundred fragments in it about which I cannot say exactly if they are errors or not. However, we will send this article to Miranda IM's authors and offer them a free version of the PVS-Studio analyzer. If they become interested in the subject, they will check their source code themselves and fix whatever they consider necessary to fix.

I must also clarify why I often cannot say exactly if a particular code fragment has an error. This is a

sample of ambiguous code:

V523 The 'then' statement is equivalent to the 'else' statement. scriver msglog.c 695

if ( streamData->isFirst ) {
  if (event->dwFlags & IEEDF_RTL) {
    AppendToBuffer(&buffer, &bufferEnd, &bufferAlloced, "\\rtlpar");
  } else {
    AppendToBuffer(&buffer, &bufferEnd, &bufferAlloced, "\\ltrpar");
  }
} else {
  if (event->dwFlags & IEEDF_RTL) {
    AppendToBuffer(&buffer, &bufferEnd, &bufferAlloced, "\\rtlpar");
  } else {
    AppendToBuffer(&buffer, &bufferEnd, &bufferAlloced, "\\ltrpar");
  }
}

Here are two identical code fragments. Perhaps it is an error. Or maybe the programmer needs to have two identical action sets in every branch, so he has written the code so that it could be easily modified later. You need to know the program to make out if this place is a mistake or not.
