Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes and improvements regarding support of real-world MIBs #3

Merged
merged 18 commits into from
Sep 2, 2024

Conversation

dcvmoole
Copy link

Summary

The mibs.pysnmp.com collection of MIBs is a highly valuable test set for checking pysmi's support for real-world MIBs. In particular the pysmi post-v0.3.4 switch to Jinja2 template rendering appears to have caused a substantial regression, with the current pysmi version being able to compile and load only 65% of the MIBs in the mibs.pysnmp.com collection.

This PR fixes a series of small pysmi issues and adds a few new forms of leniency for somewhat broken MIBs. As a result of this PR, pysmi is now able to compile and load 84% of the MIBs in the mibs.pysnmp.com collection, with zero regressions. The included fixes also resolve a number of (open and recently closed) GitHub issues.

The full story

The background of this PR is simply that I wanted to use pysmi to prepare a series of private MIBs for use with pysnmp, but I ran into various problems that had mostly also been reported as pysnmp issues already. Now that pysnmp and pysmi are thankfully being actively maintained again, I decided to spend some spare time on addressing the problems I ran into myself. I ended up fixing several other issues as well.

At some point I realized the value of the resurrected mibs.pysnmp.com MIBs collection. As of writing, that collection, in the asn1/ directory, consists of 11961 MIB files1. Alongside, the pysnmp/ directory contains the compiled versions of those MIBs, to the extent that they could at any point be compiled at all. That directory contains 10713 compiled files right now, and out of those, 9903 can be successfully loaded by pysnmp today. Based on that, I think it is fair to state that the best that pysmi has ever done, is fully and properly support 9903 of the collection's 11961 MIB files, or just under 83%. According to their headers, the overwhelming majority of the compiled files have been generated by pysmi 0.3.4.

However, trying to compile and load the same collection of input MIBs with the current pysmi version (v1.4.4), with a few necessary base changes from this PR2, results in the following statistics:

MIB results: 11961 total, 7806 successful, 1231 failed to compile, 2600 failed to load, 254 failed on dependency, 70 failed for other reasons

These results are produced by a simple program I put together for the occasion, rather than by mibdump (which, most importantly, will not try to load the produced files). In any case, these statistics are the basis for my above claim that the current version of pysmi supports merely 65% of the MIBs (7806 out of 11961). The main cause of the regression compared to pysmi 0.3.4 appears to be the conversion to the use of Jinja2 templates of git-a42d1705. It appears that in several respects, that conversion was simply not fully finished.

With the fixes in this PR, which I offer for your consideration, the same test program produces the following output:

MIB results: 11961 total, 10058 successful, 1024 failed to compile, 708 failed to load, 101 failed on dependency, 70 failed for other reasons

As can be seen, the PR enables the use of 2252 additional MIBs from the collection, now totaling 84% (10058 out of 11961). The additional successes represent a strict improvement: there are no cases where a MIB could be compiled and loaded before this PR but not after. Some MIBs have changed between "failed to compile" and "failed to load" in either direction, but of course, from the user's perspective it is not important as to whether a MIB fails to compile or its compiled form fails to load.

I have done manual inspection of a sizable subset of the changes to the MIBs' .py files as generated before and after the PR, to make sure that the improvements are not due to parts of generated code being unintentionally omitted. After all, taken to the extreme, an empty Python file will load perfectly fine too..! All changes that I have checked, are however intentional improvements. Note that such manual inspection is feasible only because of one of the commits in this PR. Verifying all the produced changes was not feasible however, as the full delta is close to 250 KLoC.

Strictly going by numbers, the PR improves the support for the collection beyond what is found in the mibs.pysnmp.com pysnmp/ directory (10058 instead of 9903). That would suggest that this PR improves pysmi to a "MIB tolerance level" never seen before, which does sound nice. However, it should be noted that the latter set is not a strict subset of the former set: there are 38 MIBs whose current mibs.pysnmp.com Python files load successfully, yet are still not part of the "success set" after this PR. Curiously, at least one case (Wellfleet-MIB) is because the ASN.1 MIB is different in contents from the MIB that was used to generate the compiled version. Many of the other cases can likely be resolved by adding a level of tolerance of capitalization differences in MIB file name versus internal MIB name; I have not looked into that yet.

For completeness: in the above results, "failed on dependency" refers to compile-time dependencies only, as load-time dependency failures are harder to filter out, so those are counted as load failures instead. The "failed for other reasons" category mainly covers cases where the compilation did not fail but also did not produce a Python file named after the input MIB file.

I have made available the test program itself as a gist. The program is overly conservative in that it basically starts with a clean slate for every single MIB. In both cases reported above, it was run against the head of the mibs.pysnmp.com repo, which is currently git-66c9072d. On my test system, a single full run takes about 3.5 hours.

The pull request

The PR itself is split up into small commits in order to facilitate reviewing. Needless to say, I would be happy to address feedback.

Test cases are added where applicable: this PR increases the number of test cases from 114 to 251. Overall, the unit test set remains woefully inadequate for catching regressions, but at least the changes of this PR are covered.

The commits have been subjected to the repository's pre-commit hooks, including Black. As a side note, I noticed that running Black on the entire source tree ends up changing a lexer regex in such a way that pysmi becomes completely non-functional. That should probably be addressed at some point.

The commits have not taken into account PyCharm's inspection facility. Even though pysmi has many annotations for that, those annotations also seem to be highly incomplete as they are. I am not a PyCharm user, but I did give its inspection a go; at the very least, it found no new errors.

The fixes in this PR fully resolve the following GitHub issues:

It is perhaps worth noting that several of the commits in the PR make adjustments to the JSON output format as well, as side effect of necessary changes at the intermediate code generation level. It is my view that development of pysmi cannot move ahead without occasionally making breaking changes to the JSON format, although in that regard the impact of this PR is fortunately very limited anyway.

Looking further ahead

This PR aims to cover much of the low-hanging fruit. With a few slightly more extensive pysmi changes, my current estimate is that it is possible to increase the number of fully supported MIBs from the mibs.pysnmp.com collection further by roughly 600 MIBs or about 5%, thereby bringing up the total to about 89%. Most of that increase would require two more future changes:

  • Many MIBs use DEFVAL values that violate the constraints of their corresponding SYNTAX types. In such cases, pyasn1 will fail to instantiate the corresponding object as created by the MIB's compiled pysnmp code, thus resulting in a load failure. If pysmi were to detect such cases itself, it can drop the DEFVAL value altogether (as it already does in some cases), thus generating pysnmp code that can be loaded. Doing so would require pysmi's symbol table stage to save information on all constraints, rather than just a subset as it does now. This change would likely add support for a little over 300 MIBs.
  • Many MIBs have broken imports. Some MIBs assume imports to be transitive (e.g., when MIB B imports symbol X from MIB A, MIB C expects to be able to import symbol X from MIB B). Some MIBs import symbols from the plain wrong MIB. Other MIBs import symbols that are not available anywhere, but are also not used. All these cases can be fixed with an extra import processing stage that modifies the import and symbol tables to redirect or remove imports. This change would add support for a little over 150 MIBs.

I'd certainly appreciate feedback as to whether such additional changes would be welcome, although I cannot make any promises myself. In any case, going substantially beyond the estimated 89% will be quite a bit harder, as that requires a next level of tolerance for problems with MIBs. Even then, getting anywhere near 100% is impossible without all processing stages (including the parser) being able to selectively discard any and all parts of the input that it cannot both validate and process. I would say that for pysmi, that is not a desirable goal.

Footnotes

  1. A handful of files are actually not MIB files, but some accompanying files that were probably accidentally extracted from an archive along with actual MIBs. For the purpose of the analysis presented here, they do not have a substantial impact.

  2. This "baseline" run makes use of the first three changes from this PR. Without those changes, the output would be messed up by debug messages, the run would get stuck as soon as it gets to A3Com-products-MIB, and the resulting .py files could not easily be compared for regressions. I have confirmed in several ways that these three changes do not affect the baseline statistics in any other way.

The pysmi library uses a proper logging facility to generate its
debugging output in all cases, except for one location, which was
changed by commit git-0b2378ec to use print(). The result of that
change is that users of the library can no longer disable the output
from that debugging statement, thereby requiring work-arounds whenever
clean output is needed from the library.

This commit changes back the debugging statement to use the logging
facility as before.
A MIB may attempt to import itself, and multiple MIBs may attempt to
import each other. The pysmi compiler deals well with such cases as long
as the MIB *file* name matches the MIB name *within* the file. However,
the latter name is used to track whether a specific MIB has already been
loaded (or failed to load), which means that if that name differs from
the MIB's file name, pysmi may end up trying to load the MIB with the
same file name more than once, which results in an endless loop of
trying to load the same MIB file(s) repeatedly. In such cases, the
"mibdump" tool will simply end up hanging forever.

This commit adds basic protection for such freezes by ensuring that
every MIB file is loaded at most once. Depending on the specific MIBs,
circular imports may result in either successful or failed compilation.
That is: this commit does not aim to ensure the best possible outcome in
such cases, as they are rare and not generally supported. However, this
commit does ensure that such cases do not block progress.
For development, debugging, and regression testing purposes, it is
useful for code generation to be deterministic, in that when the same
(unchanged) MIB is compiled to Python code multiple times, the same code
is generated each time--at least, aside from the header comments.

The pysmi code generator currently already gets very close to that goal,
but falls short when it comes to the generated lists of symbols per
imported module. This commit makes generation of those lists
deterministic as well.
Due to being misnamed, the set of basic unit test cases for the
TEXTUAL-CONVENTION macro were not being executed. From the way they were
implemented, it is clear they were never tried in the first place,
either.

This commit rewrites the unit tests to work properly. As it turns out,
they also reveal a bug: the Python code generation unintentionally
omitted the REFERENCE clause from the generated code. This commit also
fixes that aspect.
The switch to Jinja2 based code generation in git-a42d170 inadvertently
broke support for DEFVAL clauses of object types with syntax Bits. Ever
since, MIB files that made use of DEFVAL for Bits types would cause
pysmi's pysnmp code generation process to fail on an exception. JSON
code generation did work, but did not produce the intended output.

This commit fixes the DEFVAL support for Bits, thereby making various
previously uncompilable MIB files not only compile, but also produce the
correct default values for Bits objects for objects upon use. The commit
also adds a pair of regression test cases.

This commit leaves open the issue that a DEFVAL clause with an empty bit
set will result in object instances that have no default value.

Please note that as indicated above, this commit makes a small breaking
change to the JSON output format structure for Bits DEFVAL values.
Previously, DEFVAL values of 0 (zero) for integer syntaxes would be
ignored, thereby causing instantiated objects to have no default value
rather than the default value of 0. Similarly, Empty DEFVAL bit sets
would be ignored, causing corresponding Bits-type object instances to
have no default value rather than an empty bit set as default value.

These problems, which have been in place since the inception of pysmi,
appear to be due mainly to an if-condition that intended to test for
an altogether different case, namely invalid object identifier values.

This commit addresses both above problems, with changes to the parser.
The test set is extended with new tests to cover the three involved
cases. An existing test needed to be fixed as well, by removing its
DEFVAL clause: the change revealed a latent bug in that test, which
could also be triggered previously by using a non-zero DEFVAL value.
For DEFVAL clauses in objects of syntax OBJECT IDENTIFIER, the pysnmp
code generation process did does not generate the intended code, which
resulted in nonsensical actual default values for such object instances,
instead of usable OIDs. This issue has been in place since the switch
to Jinja2 based code generation introduced by git-a42d170, which seems
to expect named OIDs whereas this case involves only numeric OIDs, thus
resulting in an unintended "stringification" of a Python tuple.

This commit fixes the problem and adds a basic test case.
An AUGMENTS clause causes the pysnmp code generation to issue a call on
the object being augmented, from the context of the object performing
the augumentation. However, if the former object is defined later on in
the MIB, it does not yet exist when being augmented. The result is that
loading the generated code fails on a NameError exception.

This commit fixes the problem by issuing the augmentation-related calls
in a second pass, after all the objects have been defined. During the
second pass, the object being augmented is guaranteed to exist.
MIB symbol names that contain hyphens cannot be used directly as
identifier names in the generated Python code. Therefore, such hyphens
are replaced with underscores when generating code for various MIB
nodes.

However, the generated export table also exports the symbols using the
"Pythonized" name as symbol name, rather than the original name. Before
the Jinja2 rewrite of git-a42d170, the original symbol names were then
made available by generating calls to setLabel() on MIB nodes. The label
generation got lost as part of that rewrite, with only some loose code
fragments remaining. That caused import errors for various MIBs.

Restoring the setLabel() calls would not be difficult. However, there is
no reason to export symbols using their Pythonized name at all: instead,
they can be exported by means of their original names from the MIB
files. The effect is the same, with less code and fewer exceptions to be
made. In addition, a next commit will enable exporting types. Types are
not covered by the existing setLabel() facility, and therefore would
always have required being exported by their real names.

Therefore, this commit exports the MIB nodes with their original names,
so that those nodes can be imported using those same names, as was
already generally expected. Therefore, this commit gets rid of the
aforementioned import errors, mainly by simplifying existing code.

As part of this change, the Jinja2 template is cleaned up, as it
performed many hyphen-to-underscore conversions that were entirely
redundant: with imported symbols as only exception, the symbol names are
always already converted to their Python versions before being passed to
the template. The template clean-up is necessary to avoid confusing the
reader as to which names are "Pythonized" or not already, and also
serves as the basis for another next commit (regarding Python keywords).

The existing tests are extended with new test cases to ensure that the
use of hyphens results in generation of proper Python code and that the
resulting MIB nodes' getLabel() method calls return the original names.

Please note that this commit makes a breaking change to the JSON output:
the "name" entries now contain the original, un-Pythonized version of
their corresponding entities. Note that in the previous JSON format, the
original node name from the MIB was not available at all.
Many MIBs rely on the ability to import type declarations from other
MIBs. For unknown reasons, the conversion to Jinja2 dropped the export
of type declarations, therefore causing those MIBs to fail to load. That
omission was likely unintentional; there are no unit tests that cover
this aspect.

This commit enables exporting type declarations from MIBs again, so that
those can be imported from other MIBS as well. The commit also adds unit
tests to avoid future regressions. Note that such tests necessarily rely
on a bit more functionality from pysnmp.
Even though RFC 2579 Sec. 3.5 explicitly forbids using a Textual
Convention as SYNTAX value of another Textual Convention, deriving
textual conventions from other textual conventions is widely used in
practice. However, for such cases, pysmi currently generates Python
code that cannot be loaded, as it leads to "Method Resolution Order"
errors.

The reason for those errors is that classes for textual conventions
derive from both the TextualConvention base class and the underlying
syntax class--in that order. In these "stacked" cases, the underlying
syntax class itself also derives from the TextualConvention base class.

This commit resolves those errors by changing the order of the base
classes. The test set is extended to check various cases that should
now be working properly as a result.
This commit fixes issues related to the use of Python keywords in MIB
files, which cause problems in the generated Python code if left as is.
Support for translating such reserved symbols to non-reserved symbols
was already already in place to some degree, but inadequate, as there
were still issues for both defining and importing symbols.

For defined symbols, this change replaces the incomplete solution from
git-b091479b with the original fix of git-00177b06, which was still in
place for symtable.py but got dropped for intermediate.py as part of the
Jinja2 rewrite of git-a42d170.

For imported symbols, this change centralizes the symbol replacement and
ensures that the Jinja2 template generates an import table with local
symbol names that have been subject to appropriate replacement.

The commit adds various new tests for both cases.
Free-form text values, such as DESCRIPTION strings, may contain a few
characters that can be problematic when transformed to Python strings:

 1. Line terminator characters may be used in any textual strings, as
    per RFC 2578 Sec. 3.1.1. The Jinja2 template uses triple-quote
    strings to support such characters in some of those textual strings,
    but not in all of them. When line terminator characters are used
    within strings that are transformed to single-quote Python strings,
    the resulting Python code can not be compiled.

 2. Backslash characters, which are similarly allowed in such textual
    strings, are interpreted within Python strings. They should instead
    be converted to literal characters. Note that the final character of
    a string may be a backslash, so use of Python's "r" string prefix
    does not fully solve this problem.

This commit adds a Jinja2 filter that can be used to convert any MIB-
provided textual string to a Python string with proper quoting and
escaping of backslashes. That new filter is now used for all textual
strings. To be exact: it is now used for all strings that are ultimately
obtained via the QUOTED_STRING parser token, which is (or at least
should be) the only way through which free-form text values can make
their way from input MIBs into the code generator.

As direct effect of this change, the proper functionality of mibdump's
"--keep-texts-layout" switch is restored. Since the switch to Jinja2 of
commit git-a42d1705, triple-quoted lines had an extra line terminator
character added at the end, and were always getting word-wrapped. Both
aspects are dropped with this change, so that "--keep-texts-layout"
results in a compiled MIB that perfectly preserves the original text
(aside from the original line terminator encodings, i.e., LF vs CR/LF).

The existing tests have been changed to reflect the removal of the
extra line terminator character as added with git-a42d1705. A few extra
representative tests are added to ensure that the above two cases are
handled properly in general and that the implementation of the
"--keep-texts-layout" switch functions as expected.
As per RFC 2578, conceptual-table object types (i.e., xyzTable) are
expected to have a "SEQUENCE OF" syntax whose type matches the syntax of
the table row (i.e., of xyzEntry). The pysmi symbol table building
procedure relies on that fact, and currently rejects any MIBs that do
not conform to that rule. However, there are several real-world MIBs
that have typos in their "SEQUENCE OF" type names.

This commit adds a leniency exception to the code in order to allow MIBs
with such typos to be compiled and loaded after all. The exception is
applied only when compiling the MIB would certainly fail otherwise, so
compilation of valid MIBs is guaranteed not to be affected by this
change. The exception requires keeping track of SEQUENCE types, but as
non-SEQUENCE types are already tracked as well, this is not considered a
substantial resource drain. A test case is added to verify the leniency.
SMIv2 requires that an INDEX clause lists a series of object-type names.
Unlike SMIv2 however, SMIv1 (as per RFC 1212 Sec. 2) also supports that
such a series contains certain index syntaxes instead of (or in addition
to) object-type names. For example, SMIv1 allows "INDEX { INTEGER }".

While pysmi's parser supports the latter case, its code generator has
support in place in its symbol table builder only, with the intermediate
module merely having a few loose and non-functional code fragments with
the remainder being left as a to-do item.

This commit finishes the support for SMIv1 index syntaxes, by properly
generating "fake column" object-type instances for the index syntaxes,
as was likely the original intention as well.

A few duplicate class variables in the symbol table and intermediate
modules are moved to the base class, as successful Python code
generation relies on the fact that both subclasses use exactly the same
values for those. While there, a small set of unused and obsolete class
variables is removed altogether.

The test set is extended with a new test module that covers the new
SMIv1-only functionality.
As part of pysmi's code generation, various imports are rewritten from
SMIv1 to SMIv2, so as to simplify the way in which standard MIBs are
handled. However, an apparent copy-paste error prevents that imports
from RFC1158-MIB are being rewritten as intended. The result is that
MIBs importing from RFC1158-MIB could be compiled but not loaded.

This commit fixes the copy-paste error, and adds a few basic test cases
to ensure that each group of rewrites is working as intended.
Omitting the IMPORTS statement is valid, but pysmi does not deal with
this case as it should. When there is an IMPORTS statement, pysmi
extends the parsed imports table with necessary extra imports. When the
IMPORTS statement is omitted however, that extension is applied to a
temporary throw-away import table instead.

For example, when a MIB without an IMPORTS table uses the INTEGER type,
pysmi rewrites INTEGER to Integer32 as usual. However, since the MIB's
imports table is not extended in this case, Integer32 itself is never
imported from SNMPv2-SMI, resulting in a load-time error.

This commit ensures that there is always a parsed import table, even if
it is empty. The extensions are then applied to it in all cases, and the
above example works as intended. A test case is added as well.
This fix is specifically for WIENER-CRATE-MIB, which uses a "SEQUENCE"
declaration that includes a table object symbol. As a result, the
corresponding object type is considered to be a table column, resulting
in generated Python code that cannot be loaded.

This commit adds an extra restriction for object types to be considered
table columns, namely that they must not have a "SEQUENCE OF" type
syntax. In other words, object types that are tables are no longer ever
considered to be table columns. That restriction allows WIENER-CRATE-MIB
to be loaded properly.

The commit also adds a test case, which consists of a simplified version
of the construction found in WIENER-CRATE-MIB.
@CLAassistant
Copy link

CLAassistant commented Aug 27, 2024

CLA assistant check
All committers have signed the CLA.

@dcvmoole
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

@dcvmoole
Copy link
Author

I have read the Code of Conduct and I hereby accept the Terms

@dcvmoole
Copy link
Author

I have read the Code of Conduct and I hereby accept the Terms

..that does not seem to have had the intended effect, but I suppose it is the thought that counts.

@lextm lextm added the enhancement New feature or request label Sep 2, 2024
@lextm
Copy link

lextm commented Sep 2, 2024

Thanks. It took us a while to review the changes you made, and now we are happy to accept the tremendous contribution.

Don't worry about the breaking changes (like changes in JSON output). We can bump the version number to warn users of such.

@lextm lextm merged commit cddb4fc into lextudio:main Sep 2, 2024
17 of 19 checks passed
@dcvmoole
Copy link
Author

dcvmoole commented Sep 3, 2024

Thank you for reviewing and merging the changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants