Date post: | 16-Feb-2017 |
Category: | Software |
Upload: | andrey-karpov |
PVS-Studio team experience: checking various open source projects, or mistakes C, C++ and C# programmers make
Authors:
Candidate of Engineering Sciences, Evgeniy Ryzhkov, [email protected]
Candidate of Physico-Mathematical Sciences, Andrey Karpov, [email protected]
OOO "Program Verification Systems" (www.viva64.com)
• Development, marketing and sales of our software product
• Office: Tula, 200 km away from Moscow.
• Staff: 14 people
A couple of words about static analysis
• Does everyone know what static analysis is?
• PVS-Studio performs static analysis of source code written in C, C++ and C#.
• C, C++: 300 diagnostics;
• C#: 100 diagnostics
Our achievements
• To let the world know about our product, we check open-source projects. So far we have checked 245 projects.
• A side effect: we found 9574 errors and notified the authors about them.
• 9574 / 245 ≈ 39 errors per project, which is not that many. I would like to stress that this is a side effect. We did not set out to find as many errors as possible. Quite often, we stop once we have found enough errors for an article.
Examples of errors
So, we have checked a lot of open source projects...
• ... and so we have accumulated various observations that we would like to share
Let’s start with boring stuff - typical errors
• Let’s speak about the way programmers usually see the work of static analyzers
A boring example N1. OpenMW (C++)
std::string rangeTypeLabel(int idx)
{
  const char* rangeTypeLabels [] = { "Self", "Touch", "Target" };
  if (idx >= 0 && idx <= 3)
    return rangeTypeLabels[idx];
  else
    return "Invalid";
}
The array has 3 elements. If idx == 3, we have an array index out of bounds.
V557 Array overrun is possible. The value of 'idx' index could reach 3. esmtool labels.cpp 502
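One way to keep the bound from drifting away from the array is to derive it from the array itself. This is a minimal sketch, assuming C++17 for std::size; the name rangeTypeLabelChecked is ours and does not come from OpenMW.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical corrected variant of the snippet above (the name is illustrative).
std::string rangeTypeLabelChecked(int idx)
{
    static const char* const rangeTypeLabels[] = { "Self", "Touch", "Target" };
    // std::size returns the element count, so the upper bound follows the
    // initializer list automatically instead of being a hand-written 3.
    if (idx >= 0 && static_cast<std::size_t>(idx) < std::size(rangeTypeLabels))
        return rangeTypeLabels[idx];
    return "Invalid";
}
```

With this form, adding a fourth label needs no change to the condition.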
A boring example N2. CamStudio (C++)
int CopyStream(PAVIFILE pavi, PAVISTREAM pstm)
{
  //....
  BYTE p[20000];
  //....
  free(p);
  return 0;
}
V726 An attempt to free memory containing the 'p' array by using the 'free' function. This is incorrect as 'p' was created on stack. playplusview.cpp 7059
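For comparison, here is a small sketch of a buffer that needs no manual release at all. The real body of CopyStream is elided on the slide, so the function below is a hypothetical stand-in, not the CamStudio code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the function on the slide.
std::size_t copyStreamSketch()
{
    // A stack array such as `BYTE p[20000];` is released automatically when
    // the function returns and must never be passed to free().
    // std::vector owns heap memory and releases it in its destructor,
    // so there is nothing to free by hand here either.
    std::vector<unsigned char> p(20000);
    return p.size();
}
```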
A boring example N3. Sony ATF (C#)
public static QuatF Slerp(QuatF q1, QuatF q2, float t)
{
  double dot = q2.X * q1.X + q2.Y * q1.Y +
               q2.Z * q1.Z + q2.W * q1.W;
  if (dot < 0)
    q1.X = -q1.X; q1.Y = -q1.Y; q1.Z = -q1.Z; q1.W = -q1.W;
  ....
}
V3043 The code's operational logic does not correspond with its formatting. The statement is indented to the right, but it is always executed. It is possible that curly brackets are missing. Atf.Core.vs2010 QuatF.cs 282
A boring example N4. Xenko (C#)
public string ToString(string format, IFormatProvider formatProvider)
{
  if (format == null)
    return ToString(formatProvider);
  return string.Format(formatProvider,
                       "Red:{1} Green:{2} Blue:{3}",
                       R.ToString(format, formatProvider),
                       G.ToString(format, formatProvider),
                       B.ToString(format, formatProvider));
}
V3025 Incorrect format. A different number of format items is expected while calling 'Format' function. Expected: 4. Present: 3. SiliconStudio.Core.Mathematics Color3.cs 765
But life is way more interesting
• Let’s look at the dark side
Programmers do not check comparison functions
• Psychoanalysis;
• "Can't be wrong" in functions like:
public static int Compare(FooType left, FooType right)
{
  if (left < right) return -1;
  if (left > right) return 1;
  return 0;
}
Easy. Example N1. IronPython and IronRuby (C#)
public static int Compare(SourceLocation left, SourceLocation right)
{
  if (left < right) return -1;
  if (right > left) return 1;
  return 0;
}
Example N2. Samba (C++)
static int compare_procids(const void *p1, const void *p2)
{
  const struct server_id *i1 = (struct server_id *)p1;
  const struct server_id *i2 = (struct server_id *)p2;
  if (i1->pid < i2->pid) return -1;
  if (i2->pid > i2->pid) return 1;
  return 0;
}
Example N3. MySQL (C++)
A lot of similar lines. It should be fine.
static int rr_cmp(uchar *a, uchar *b)
{
  if (a[0] != b[0]) return (int)a[0] - (int)b[0];
  if (a[1] != b[1]) return (int)a[1] - (int)b[1];
  if (a[2] != b[2]) return (int)a[2] - (int)b[2];
  if (a[3] != b[3]) return (int)a[3] - (int)b[3];
  if (a[4] != b[4]) return (int)a[4] - (int)b[4];
  if (a[5] != b[5]) return (int)a[1] - (int)b[5];
  if (a[6] != b[6]) return (int)a[6] - (int)b[6];
  return (int)a[7] - (int)b[7];
}
Easy. Example N4. CryEngine 3 SDK (C++)
inline bool operator != (const SEfResTexture &m) const
{
  if (stricmp(m_Name.c_str(), m_Name.c_str()) != 0 ||
      m_TexFlags != m.m_TexFlags ||
      m_bUTile != m.m_bUTile ||
      .....
      m_Sampler != m.m_Sampler)
    return true;
  return false;
}
PVS-Studio is coming to the aid. G3D Content Pak (C++)
bool Matrix4::operator==(const Matrix4& other) const {
  if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
    return true;
  }
  ....
}
V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269
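To see why the analyzer speaks of '0' elements, it may help to spell out what the misplaced parenthesis produces. The sketch below uses a hypothetical Matrix2Sketch type instead of the real Matrix4 and computes the third argument in a separate variable so that its value is visible.

```cpp
#include <cstddef>
#include <cstring>

struct Matrix2Sketch { float m[4]; };  // hypothetical stand-in for Matrix4

// What the code on the slide effectively does: `sizeof(...) == 0` is false,
// so the length passed to memcmp is 0. Comparing zero bytes yields 0, and
// the `if` branch is never taken, even for identical matrices.
bool equalAsOnSlide(const Matrix2Sketch& a, const Matrix2Sketch& b)
{
    const std::size_t length = (sizeof(Matrix2Sketch) == 0);  // always 0
    if (std::memcmp(&a, &b, length))
        return true;
    return false;
}

// The parenthesis moved to where it was presumably meant to be:
// a byte-wise comparison of the whole object.
bool equalIntended(const Matrix2Sketch& a, const Matrix2Sketch& b)
{
    return std::memcmp(&a, &b, sizeof(Matrix2Sketch)) == 0;
}
```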
PVS-Studio is coming to the aid
It detects errors in all the previous cases:
1. V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. SourceLocation.cs 156
2. V501 There are identical sub-expressions to the left and to the right of the '>' operator: i2->pid > i2->pid brlock.c 1901
3. V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695. sql records.cc 680
4. V549 The first argument of 'stricmp' function is equal to the second argument. ishader.h 2089
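Besides running an analyzer, one can reduce the room for such typos by naming every field only once per side. This is a sketch under our own assumptions: ServerIdSketch and its two fields are hypothetical, and std::tie is used here as one possible technique, not as what any of the projects above do.

```cpp
#include <tuple>

struct ServerIdSketch {  // hypothetical structure for illustration
    int pid;
    int task_id;
};

// Three-way comparison built on lexicographic tuple comparison.
// Each field appears exactly once for `a` and once for `b`, so there is
// no chain of near-identical lines in which a single index can go wrong.
int compareIds(const ServerIdSketch& a, const ServerIdSketch& b)
{
    const auto lhs = std::tie(a.pid, a.task_id);
    const auto rhs = std::tie(b.pid, b.task_id);
    if (lhs < rhs) return -1;
    if (rhs < lhs) return 1;
    return 0;
}
```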
Last line effect
• About mountain climbers;
• The statistics were gathered from the error base when it contained about 1500 error examples.
• 84 suitable fragments were detected.
• In 43 cases the mistake was in the last line.
Example N1. TrinityCore (C++)
inline Vector3int32& operator+=(const Vector3int32& other)
{
  x += other.x;
  y += other.y;
  z += other.y;
  return *this;
}
Example N2. Source Engine SDK (C++)
inline void Init(float ix = 0, float iy = 0,
                 float iz = 0, float iw = 0)
{
  SetX(ix);
  SetY(iy);
  SetZ(iz);
  SetZ(iw);
}
Example N3. Qt (C++)
.....::method_getImageData(.....)
{
  ....
  qreal x = ctx->callData->args[0].toNumber();
  qreal y = ctx->callData->args[1].toNumber();
  qreal w = ctx->callData->args[2].toNumber();
  qreal h = ctx->callData->args[3].toNumber();
  if (!qIsFinite(x) || !qIsFinite(y) ||
      !qIsFinite(w) || !qIsFinite(w))
  ....
}
Example N4. Space Engineers (C#)
void DeserializeV0(XmlReader reader)
{
  ....
  if (property.Name == "Rotation" ||
      property.Name == "AxisScale" ||
      property.Name == "AxisScale")
    continue;
  ....
}
PVS-Studio is coming to the aid. Xamarin.Forms (C#)
internal bool IsDefault
{
  get { return Left == 0 && Top == 0 && Right == 0 && Left == 0; }
}
V3001 There are identical sub-expressions 'Left == 0' to the left and to the right of the '&&' operator. Thickness.cs 29
PVS-Studio is coming to the aid
It detects errors in all the previous cases:
1. V537 Consider reviewing the correctness of 'y' item's usage. g3dlib vector3int32.h 77
2. V525 The code containing the collection of similar blocks. Check items 'SetX', 'SetY', 'SetZ', 'SetZ' in lines 455, 456, 457, 458. Client (HL2) networkvar.h 455
3. V501 There are identical sub-expressions '!qIsFinite(w)' to the left and to the right of the '||' operator. qquickcontext2d.cpp 3305
4. V3001 There are identical sub-expressions 'property.Name == "AxisScale"' to the left and to the right of the '||' operator. Sandbox.Graphics MyParticleEmitter.cs 352
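The last line effect lives in blocks of copy-pasted, nearly identical lines. Where the data layout allows it, a loop removes the "last line" altogether. The sketch below is only an illustration: Vec3Sketch is a hypothetical type that stores its components in an array, which the real Vector3int32 does not necessarily do.

```cpp
#include <array>
#include <cstddef>

struct Vec3Sketch {           // hypothetical vector type
    std::array<int, 3> v;     // components x, y, z

    Vec3Sketch& operator+=(const Vec3Sketch& other)
    {
        // One statement handles all components, so there is no third
        // copy-pasted line in which `y` can be left in place of `z`.
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] += other.v[i];
        return *this;
    }
};
```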
Let’s take a dark break: the compiler is to blame for everything!
Ffdshow
TprintPrefs::TprintPrefs(....)
{
  memset(this, 0, sizeof(this)); // This doesn't seem to
                                 // help after optimization.
  dx = dy = 0;
  isOSD = false;
  xpos = ypos = 0;
  align = 0;
  ....
}
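The comment in the code blames the optimizer, but a likelier explanation is the size argument: for a pointer, sizeof gives the size of the pointer, not of the object it points to. A minimal sketch, with a hypothetical PrefsSketch type standing in for TprintPrefs:

```cpp
#include <cstddef>

struct PrefsSketch {            // hypothetical stand-in for TprintPrefs
    unsigned char bytes[64];
};

// Mirrors `sizeof(this)`: the operand is a pointer, so the result is the
// pointer size (typically 4 or 8), and memset would clear only that much.
std::size_t sizeAsOnSlide(PrefsSketch* self)
{
    return sizeof(self);
}

// Mirrors `sizeof(*this)`: the size of the whole object.
std::size_t sizeIntended(PrefsSketch* self)
{
    return sizeof(*self);
}
```

So the memset on the slide clears only the first few bytes of the object, with or without optimization.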
It only seems that people verify the pointers (references) against null
• In fact, the programs are not ready to face nullptr/null;
• This is the most common error that we find in both C++ and C# projects.
Example N1. Linux kernel (C)
static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n)
{
  struct net *net = sock_net(skb->sk);
  struct nlattr *tca[TCA_ACT_MAX + 1];
  u32 portid = skb ? NETLINK_CB(skb).portid : 0;
  ....
}
The function got the argument skb. It is dereferenced in skb->sk. Oops, the later check "skb ?" shows it should be checked there too.
Example N2. These bugs have ALWAYS been there. Taken from the Cfront compiler, year 1985:
Pexpr expr::typ(Ptable tbl)
{
  ....
  Pclass cl;
  ....
  cl = (Pclass) nn->tp;
  cl->permanent=1;
  if (cl == 0) error('i',"%k %s'sT missing",CLASS,s);
  ....
}
Example N3. Nothing has changed for the past 30 years. Contemporary Clang compiler:
Instruction *InstCombiner::visitGetElementPtrInst(....) {
  ....
  Value *StrippedPtr = PtrOp->stripPointerCasts();
  PointerType *StrippedPtrTy =
    dyn_cast<PointerType>(StrippedPtr->getType());
  if (!StrippedPtr)
    return 0;
  ....
}
Example N4. C# projects are no better. In the source code of 270 controls written by DevExpress we found 460 errors of this kind (1.7 errors per project). Example:
public IList<ISeries> CreateBindingSeries(....) {
  DataBrowser seriesBrowser = CreateDataBrowser(....);
  ....
  int currentPosition = seriesBrowser.Position;
  if (seriesBrowser != null && seriesBrowser.Position >= 0)
  ....
}
PVS-Studio is coming to the aid. Unreal Engine 4 (C++)
FName UKismetNodeHelperLibrary::GetEnumeratorName(
  const UEnum* Enum, uint8 EnumeratorValue)
{
  int32 EnumeratorIndex = Enum->GetIndexByValue(EnumeratorValue);
  return (NULL != Enum) ?
    Enum->GetEnum(EnumeratorIndex) : NAME_None;
}
V595 The 'Enum' pointer was utilized before it was verified against nullptr. Check lines: 146, 147. kismetnodehelperlibrary.cpp 146
PVS-Studio is coming to the aid
It detects errors in all the previous cases:
1. V595 The 'skb' pointer was utilized before it was verified against nullptr. Check lines: 949, 951. act_api.c 949
2. V595 The 'cl' pointer was utilized before it was verified against nullptr. Check lines: 927, 928. expr.c 927
3. V595 The 'StrippedPtr' pointer was utilized before it was verified against nullptr. Check lines: 918, 920. LLVMInstCombine instructioncombining.cpp 918
4. V3095 The 'seriesBrowser' object was used before it was verified against null. Check lines: 509, 510. DevExpress.Charts.Core BindingProcedure.cs 509
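All five fragments share one shape: the pointer is used first and checked afterwards. The order the authors presumably intended can be sketched as follows. EnumSketch and its methods are hypothetical and only loosely modeled on the Unreal Engine fragment.

```cpp
#include <string>

struct EnumSketch {  // hypothetical type, not the real UEnum
    int GetIndexByValue(int value) const { return value; }
    std::string GetName(int index) const { return "Item" + std::to_string(index); }
};

std::string enumeratorName(const EnumSketch* e, int value)
{
    // The check comes first; nothing dereferences `e` before this line.
    if (e == nullptr)
        return "None";
    const int index = e->GetIndexByValue(value);
    return e->GetName(index);
}
```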
What does a “normal” programmer think about a code analyzer? Myths and stereotypes
Laziness is on my side
• "It is hard to start using static analysis because of the large number of messages at the first stage."
PVS-Studio is coming to the aid: markup base
• Old messages can be marked as "uninteresting". This is a key point when you embed the code analyzer into a real project.
All settings turned to the maximum!
• “The more messages the analyzer issues, the better the analyzer is”
"The first 10 messages”
• People’s attention weakens very quickly.
• The analyzer must take this into account.
• Default settings are chosen in such a way that you have the maximum chance of seeing an error immediately.
The hardest part about static analysis: not to issue warnings
• C++: 105 open source projects
• C#: 36 open source projects
• Example V501
V501. An infix operation is considered dangerous if its left and right operands are the same.
while (X < X)
if (A == B || A == B)
V501. The devil is in the details
• X*X
• while (*p++ == *a++ && *p++ == *a++)
• There are number literals to the left and to the right:
  if (0 == 0)
  … 15 | 15 …
• #define M1 100
  #define M2 100
  if (x == M1 || x == M2)
• float x = foo();
  if (x == x)
V501. The devil is in the details
• The '/' or '-' operator is applied to numeric constants: 1./1.
• A line from Zlib:
  if (opaque) items += size - size; /* make compiler happy */
• rand() - rand()
  rand() % N - rand() % N
• There are classes to the left and right of '|', '&', '^', '%'.
  if (str == str) - look for this
  if (vect ^ vect) - we’d better skip
• sizeof(__int64) < sizeof(__int64)
V501. The devil is in the details
• 0 << 31 | 0 << 30 | ...
  (0 << 6) | (0 << 3) | …
• '0' == 0x30 && 'A' == 0x41 && 'a' == 0x61
• This is a template function to define NaN numbers.
• Read(x) && Read(x)
• #define USEDPARAM(p) ((&p) == (&p)) and others
• To the right and left there is a function call with such names as pop, _pop
• Etc …
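To give a feeling for why "not to issue warnings" is the hard part, here is a toy sketch of a V501-like rule with a few of the special cases from the lists above, which we read as exceptions. It is not how PVS-Studio is implemented: a real analyzer works on a syntax tree with type information, while this sketch compares operand text. All function names here are our own.

```cpp
#include <cctype>
#include <string>

// Toy test: does the operand consist only of digits and dots (e.g. "0", "1.")?
bool isNumericLiteral(const std::string& s)
{
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c) && c != '.') return false;
    return true;
}

// Deliberately crude: anything with ++, -- or a parenthesis may change state
// or call a function, so two textually equal operands may differ in value.
bool mayHaveSideEffects(const std::string& s)
{
    return s.find("++") != std::string::npos ||
           s.find("--") != std::string::npos ||
           s.find('(')  != std::string::npos;
}

// Returns true if the toy rule would report `left op right`.
bool shouldWarnIdenticalOperands(const std::string& left,
                                 const std::string& op,
                                 const std::string& right)
{
    if (left != right) return false;             // the basic V501 condition
    if (op == "*") return false;                 // X*X is just a square
    if (isNumericLiteral(left)) return false;    // 0 == 0, 15 | 15
    if (mayHaveSideEffects(left)) return false;  // *p++ == *a++, rand() - rand()
    return true;
}
```

Even this toy version needs three exceptions for one rule, and it still cannot handle cases that depend on types, such as `x == x` for a float.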
Interface? Infrastructure?
• “Give me just a command line utility, nobody cares about the other stuff”
PVS-Studio is coming to the aid: Ability to work with the list of messages.
• Filters by the code of the message;
• Filters by the message text;
• Filters by the name of a file or a folder;
• False alarm markup in the code (Mark As False Alarm: //-V501), including macros;
• 100 messages for an .h-file.
• Interactivity is super important!
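In the code, the false alarm markup from the list above is a trailing comment on the reported line. A small sketch, using the intentional NaN check `x == x` as the example; the function names are ours.

```cpp
#include <limits>

// Hypothetical helper: comparing a float with itself is intentional here,
// because the comparison is false only for NaN.
bool isNotNaN(float x)
{
    return x == x; //-V501
}

// Small demonstration helper: a quiet NaN is rejected by the check above.
bool rejectsNaN()
{
    return !isNotNaN(std::numeric_limits<float>::quiet_NaN());
}
```

The comment stays in the source, so the decision to suppress the message travels with the code.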
PVS-Studio is coming to the aid: Different ways to run the analyzer
• Integration with IDE;
• A separate application;
• Monitoring of the compiler;
• Command line version;
• Integration with nightly builds;
• IncrediBuild Support.
Static analysis is not a panacea
• This is an answer to the question: "What else can I do to improve the quality of the code”
On the topic of programming culture in Russia and in the world, or “Why should I care about static analysis at all?”
• Western developers have been using static analysis quite successfully for a long time.
• Knowing the principles and tools of static code analysis gives you +10 points at a job interview and +20 when you introduce it in your project. On top of that, a Team Leader position.
• Where else can we find articles about static code analysis?
Q&A
• Contact: [email protected]
• Follow us on twitter: https://twitter.com/Code_Analysis
• Visit the site: www.viva64.com
• Come and talk to us during the conference (mostly, we are friendly people and won’t bite you, we promise)