The gmBasic tool performs tool-assisted rewriting of VB6/ASP/COM code into the .NET languages. It is highly configurable using a variety of subsystems. This is a users manual for gmBasic. Its purpose is to describe how to use gmBasic to configure and control the content of that rewrite. It is directed at individuals with extensive knowledge not just of VB6/ASP/COM code and the .NET languages, but also of compiler design, data management, and automated language processing in general. If you are simply interested in producing a quick set of .NET codes which you can then finish on your own, then you have no need to use the resources described in this manual directly. If, however, you wish to configure the content of the .NET codes produced beyond what is done for you automatically, then this manual describes how you can do that.
The resources provided by gmBasic are made available through five different subsystems:
|gmPL||The Great Migrations Programming Language is a simple command language used to issue instructions to gmBasic and to enter declarations into the symbol table. It is not a procedural programming language; it is an xml-style scripting language. It is not compiled; rather, its commands are executed directly.|
|gmIL||The Great Migrations Intermediate Language is a reverse-polish representation. Source languages are compiled into gmIL. That representation is then analyzed, executed, and modified to be expressible in a target language. Then it can be authored in a target language.|
|gmSL||The Great Migrations Scripting Language is a procedural language for use with the migration, authoring, and reporting facilities of gmBasic. It uses Java-style syntax and is embedded into the gmPL scripts. It is compiled into the same gmIL as used for the other source languages and is executed directly by gmBasic.|
|gmNI||The Great Migrations Native Interface makes it possible to write native methods in C to handle situations in which a migration cannot be written entirely in the languages supported by the standard capabilities of the translation tool. Native code methods, loaded into runtime libraries, can handle events triggered during the translation process. These native methods have direct access to all of the information being managed during the translation via an extensive set of service classes. The actual methods used are the same as those referenced through gmSL, though their reference syntax and the form of some of the arguments differ to accommodate the differences between C and gmSL.|
|gmCL||The Great Migrations Command Line tools are a set of ANSI-C programs that perform the various independent operations needed by gmBasic. These include pBasic, which executes gmPL scripts; Deploy, which facilitates the deployment and creation of bundled text files; and Document, which produces an HTML-based manual using a simplified set of XML input files.|
Why is gmPL Needed?
Code bases, the input to gmBasic, are interrelated combinations of VB6, ASP, and COM code. To process them, gmBasic must be told where the code is, what files it includes, what order the files are to be processed in, what general configurations should be used, what migrations are to be performed, what fixes need to be made, and so on. The gmPL language is needed to provide this information. Its statements are aggregated into xml-style scripts which both direct the operation of gmBasic and document what operations are performed. Initially, these scripts are produced entirely by gmStudio. Reading the gmPL scripts that gmStudio has produced allows the user to understand what was done. Moving forward from there, the user can add gmPL to the scripts to further control the process.
The rewriting of VB6/ASP/COM code into the .NET languages is not a syntax problem. It cannot be done with simple parsers, source code pattern recognition, and keyword replacement. It requires detailed analysis of what the source code is doing and of how similar operations can be performed and expressed in the target code. Also, it requires identifying those operations in the source code that have no direct expression in the target language. These operations must not only be isolated, they must be represented in the target language in a way that allows the user to supply the missing functionality. The approach taken by gmBasic revolves around this notion of "operations". It is these operations that must be derived from the source code, analyzed to make them expressible in the target code, migrated to conform to user needs, and finally authored. The operations are defined and organized into groups called opcodes. Each group, and each operation within a group, called a subcode, is assigned an identifier. These identifiers are then organized into a reverse-polish language like the pseudo-code produced by the first pass of contemporary compilers. The gmIL is this language. It is needed to organize every phase of the translation process.
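The idea of a reverse-polish operation stream can be illustrated outside of gmBasic. The sketch below is not gmIL; it is a minimal Python model, with invented opcode names, showing how an expression such as a + b * c becomes a flat postfix sequence of operations that can be executed, inspected, or rewritten without reparsing source text.

```python
# Minimal illustration of a reverse-polish (postfix) operation stream.
# These opcode names are invented for this sketch; they are not gmIL.

def execute(code, variables):
    """Execute a flat postfix operation stream against a variable table."""
    stack = []
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "LOAD_VAR":
            stack.append(variables[arg])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# The source expression  a + b * c  compiles to this postfix stream:
code = [
    ("LOAD_VAR", "a"),
    ("LOAD_VAR", "b"),
    ("LOAD_VAR", "c"),
    ("MUL", None),
    ("ADD", None),
]

print(execute(code, {"a": 2, "b": 3, "c": 4}))  # 2 + 3*4 = 14
```

Because the stream is flat, a transformation pass can locate an operation by its offset and splice replacement operations in around it, which is precisely the style of editing described throughout the rest of this discussion.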
Special methods can be entered into library description files that transform the intermediate code associated with components in that library. The marriage of gmSL with the gmPL refactoring capabilities is an especially important feature of the language. This topic gives an extended description of how gmSL is used to transform intermediate code. Though focused on this aspect, this discussion is intended to give a general introduction to gmSL as well.
This discussion assumes familiarity with the gmPL language and with the intermediate language gmIL produced by the compiler.
The original question from the user was "Our VB6 code is in the following pretty standard pattern:"
"I would like it to produce code like this, which is roughly standard ADO.NET connected db access:"
"You can see that is a pretty drastic translation. What sort of effort are we talking to implement that? And what sort of technology area within gmStudio would you use?"
An example VB6 program was created that uses the external library "MSRDO20.DLL#Microsoft Remote Data Object 2.0". Some time was spent making it a real working example using the .NET class "System.Data.SqlClient" that could be passed on to show how gmBasic is able to perform these types of code transformations. The two specific VB6 methods to be transformed are as follows.
Following the request supplied by the user as closely as possible, the target translations for these two methods are as follows.
Using the reference script for MSRDO20.Dll exactly as produced by the gmBasic Idl translator and the following very simple standard translation script
the following unmigrated translation is produced.
The first step in any migration is to add migration names, using gmPL, to reflect any simple name changes that need to be done. The reader is assumed to be familiar with the gmPL component refactoring attributes -- in this case migName and migPattern. Doing this takes a lot of noise out of the differences between the migrated and unmigrated translations. In this case, they are as summarized here.
With these changes in place, the unmigrated translations are now as follows.
The remaining work involves adding a refactoring specification and finally gmSL code.
The first difference between the current translation and the target translation is in the method connectDB.
The character string, which consists of a series of "name=value;" pairs has been changed to remove the Driver pair and the DSN pair.
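The edit itself is simple string surgery. The following Python sketch is not gmSL; the helper name and the sample connection string are invented for illustration. It shows one way to drop the Driver and DSN pairs from a "name=value;" string while leaving the remaining pairs intact.

```python
def strip_pairs(connection_string, unwanted=("driver", "dsn")):
    """Remove selected name=value pairs from a semicolon-delimited string."""
    kept = []
    for pair in connection_string.split(";"):
        if not pair.strip():
            continue  # ignore the empty segment after a trailing ';'
        name = pair.split("=", 1)[0].strip().lower()
        if name not in unwanted:
            kept.append(pair.strip())
    return ";".join(kept) + ";"

# Hypothetical source-style connection string:
print(strip_pairs("Driver={SQL Server};DSN=Pubs;Server=local;Database=pubs;"))
# Server=local;Database=pubs;
```

The interesting part of the migration is not this edit but, as the next paragraphs explain, deciding which strings it may legitimately be applied to.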
The first question that has to be asked is "How do we determine which character strings need to be changed?". In an editing context, perhaps all strings could be searched for these pairs and they could be universally removed. This would almost certainly work in this case, but in general changes need to be contingent on their use. The string above needs to be changed because in this later statement
it is being assigned to a connection string. Connection strings in the target environment do not have these two attributes. A search needs to be made of assignments to the ConnectionString property, and the strings used in these assignments need to be edited, if possible and necessary.
Transformation is not a process that works with source code or target code. It works with the intermediate code produced by the compiler. After the compiler has completed and after the analyser has finished the standard migrations built into gmBasic, references in the intermediate code to components in external libraries are scanned looking for those that have user supplied code registered for them. The purpose of that user code is to change that intermediate code so that it performs the target operation as opposed to its original source operation.
Once a statement that requires transformation has been identified, the first step is to examine the intermediate code currently produced for it. This can be seen in an audit report of the vbi file produced by the translation script or can be produced on-the-fly using the Opcode.DumpCode method. The intermediate code for the ConnectionString assignment statement is as follows.
It can be seen that the external library component receiving the content of the variable dbs is Component:Connect:50673. Looking this component up in the symbol audit shows that it is
Starting the actual transformation process, then, will involve writing a gmSL method that will be notified of all code references to Lib_Property RDO._rdoConnection.Connect. The purpose of that method will be to locate any strings being assigned to that property and to edit them to remove any unwanted attributes.
The gmSL transform methods are introduced within library description file Refactor statement sections using the gmSL statement. In this case this is as follows.
The namespace for these methods is the event name used in the refactor statement and the class for the methods is Transform. Remember that in real migration projects many different libraries and codes are being migrated; therefore, careful naming conventions are necessary. The actual gmSL code could be embedded within the gmSL statement; however, there are "intellisense" editors available for files that have the gmsl extension, so keeping this code separate makes it easier to author and maintain.
For now the file msrdo20Transform.gmsl is simply as follows.
It simply logs a message to the translation log file and returns a zero indicating that no change has been made by the method.
The names of transform methods are an "underline converted" form of the host-relative identifier of the component whose reference code is to be transformed. Underline conversion changes all periods in the identifier to an underscore and changes all underscores in the identifier to double underscores. In this case the component identifier is RDO._rdoConnection.Connect. Making it host relative removes the leading RDO. and doing the underline conversion makes it __rdoConnection_Connect. This conversion is necessary to create unique but well-formed method identifiers. After the gmSL file is compiled, gmBasic scans the refactor host for components that match the underline-converted methods, sets their hasCodeHandler property True, and sets their migTransform member equal to the root of the transform method.
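The underline-conversion rule described above is mechanical enough to state as code. This Python sketch is not part of gmBasic; it simply applies the two rules. Note that existing underscores must be doubled before periods are converted, otherwise the underscores produced from periods would themselves be doubled.

```python
def underline_convert(identifier, host="RDO."):
    """Underline-convert a component identifier into a method name:
    make it host relative, double existing underscores, then turn
    periods into single underscores."""
    if identifier.startswith(host):
        identifier = identifier[len(host):]   # make it host relative
    return identifier.replace("_", "__").replace(".", "_")

print(underline_convert("RDO._rdoConnection.Connect"))  # __rdoConnection_Connect
```

Applied to RDO._rdoConnection.Connect this reproduces the method name __rdoConnection_Connect used in the discussion.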
All transform methods have three parameters. The parameter subRoot is the root of the source code component that references the component being transformed. The parameter iStart is a starting code location that marks where the referencing code began. Its exact value will vary by the type of reference. The parameter iRefer is the code location of the actual reference to the transform component. To bring this into focus, the initial version of the transform method merely logs the content of these parameters. Running the translation script with Progress="1" produces the following log.
Focusing first on the actual message produced by the transform method, it confirms what was seen in the code dump earlier: the method connectDB contains a reference to RDO._rdoConnection.Connect at code offset 89, and a good starting point is offset 77, which in this case is the offset of the reference to the variable dbs whose content will eventually be evaluated and modified.
The log more importantly brings out the integration between the gmSL transform capability and the overall processing of the translation script. Translation, when it becomes migration, is a very complex, iterative process. Things go wrong. Things do not work as expected. The translation produces a log file that describes what happens, and it also produces a vbi file that contains all of the detailed information about how the source code was migrated and what it was migrated into. When things go wrong it is important that the vbi file that actually produced a migration contain an exact representation of the logic used to do it. Note in lines 05 and 06 above that the transform class is being compiled in the same manner as the source code. All code associated with it is in the vbi file, where it can be audited and examined in precisely the same manner as any other code. The translation produced is identical to the one before the refactor section was added, but the vbi is different. First, the transform method itself has been added into the symbol table.
The source code for the method can be viewed if the EchoInput Select attribute is turned on.
An actual code dump of the transform method can be examined.
Note that the gmSL, though it has a very different syntax than VB6, uses the identical gmIL operations. Finally, the actual entries for the migrated component have been updated to mark it as having a code handler whose offset is 85249, which is this method.
The actual code used to ultimately change the content of the dbs connection string works through references to the RDO._rdoConnection.Connect component. It is as follows. It begins with the same underline-converted method identifier, with the same standard parameters, and then the declarations of the local variables. Reading about, writing about, and understanding code whose purpose is to manage other code can get confusing quickly, so keep in mind that there are two coding levels.
Though not in this sample, there could certainly be other references to the Connect property that are not assignments from local variables. So the first step is to verify that this is a local variable assignment to the property.
Remember that the gmSL itself is stored in the same overall structure as the user code; therefore, the first two calls in almost every transform method use the Opcode methods GetCode and GetLength, which reference the user code and not the running code. Next the method checks that the iStart parameter is a reference to a local variable and that the property reference is an assignment. From the dump, the expected code sequence is as follows.
Note that the call that checked for the local variable assignment also captured its root in localVar. The next step is to look for a preceding assignment to this local variable.
The code for the RefactorCode_FindAssign method is shown below. If it returns a nonzero value, then that value is the code offset of the value being assigned to the variable. The next step is to obtain the actual string value being assigned to the local variable.
The method Opcode.GetString is passed the starting and ending offset of code that may produce a string constant when it is executed. The actual code being passed to GetString is
The method Opcode.GetString literally executes this code, even though it is user code, using the same engine as is used to execute the gmSL code. Since this code is being executed at compile time, it may not be possible to resolve it into a string -- for example, if it contains variable references. If the code can be resolved, the method returns the string; if not, it returns a null-string. In this case the variable connect contains a resolved connection string, which the method can edit by removing the attribute-value pairs that are not to be used.
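Conceptually, resolving a string from intermediate code is a small constant-folding exercise. The Python sketch below is only a model of the behavior just described, not the Opcode.GetString implementation. The LSC name is borrowed from the discussion (load string constant); LDV and CONCAT are invented stand-ins for a variable load and a concatenation operation.

```python
def fold_string(code):
    """Try to execute a code span into a string constant at compile time.
    Returns None (modeling a 'null-string') as soon as the span is found
    to reference a variable, since its value is unknowable until runtime."""
    stack = []
    for op, arg in code:
        if op == "LSC":            # load string constant
            stack.append(arg)
        elif op == "CONCAT":       # concatenate top two strings
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "LDV":          # variable reference: cannot resolve
            return None
    return stack.pop()

# A span built only from constants folds to a single string...
print(fold_string([("LSC", "Server=local;"),
                   ("LSC", "Database=pubs;"),
                   ("CONCAT", None)]))     # Server=local;Database=pubs;

# ...but a span that reads a variable cannot be resolved:
print(fold_string([("LDV", "userName"), ("LSC", ";"), ("CONCAT", None)]))  # None
```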
The final step is to replace the old string expression with the revised string. Note that this replacement may well shift code that precedes the referencing code. The calling method that is scanning for transform references needs to be told that this has happened. A non-zero return value tells the scanner that the method has made a change in the code and that scanning should resume at the indicated code location.
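The contract between the scanner and the transform methods can be summarized in a few lines. The sketch below is a schematic Python model, with all names invented, of the dispatch loop implied above: a handler registered for a component is called at each reference to it, a zero return means no change was made and scanning continues normally, and a nonzero return is the code location at which scanning should resume after an edit has shifted the stream.

```python
def scan(code, handlers):
    """Schematic model of the transform dispatch loop (not gmBasic code).
    `code` is a list of (op, component) entries; `handlers` maps a
    component id to a function(code, location) -> 0 or resume location."""
    i = 0
    while i < len(code):
        op, component = code[i]
        handler = handlers.get(component)
        if handler is not None:
            resumed = handler(code, i)
            if resumed:          # code was edited; resume where told
                i = resumed
                continue
        i += 1
    return code

# A hypothetical handler that rewrites one reference in place and tells
# the scanner to resume immediately after it:
def rename(code, i):
    code[i] = ("REF", "Target.New")
    return i + 1

result = scan([("REF", "Lib.Old"), ("REF", "Other")], {"Lib.Old": rename})
print(result)  # [('REF', 'Target.New'), ('REF', 'Other')]
```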
Moving now to the method RefactorCode_FindAssign, note that its name is not an underline-converted identifier, so it is simply private to this class. Its parameters are the root offset of the variable for which an assignment is sought and the code location that the assignment must precede.
It consists of a simple scan from the front of the user method code for an assignment to the indicated variable. If found it returns the starting offset of the expression that defines the value being assigned.
The method RefactorCode_ReplaceAssign deletes the original code in the expression and then inserts a reference to the replacement string.
When doing code substitution, the most difficult step is deciding how much code should be deleted or inserted. In this case, when nDelete is computed, iEnd contains the location immediately after the ARG operation that closes the expression code and iAssign contains the location immediately after the LEV operation that opens the expression code. Thus, iEnd - iAssign needs to be offset by the size of the ARG operation which is needed in the new expression and by the size of the LSC operation which will load the new string.
Running this new code does now produce the desired change as the following file comparison shows.
The next difference between the unmigrated and migrated code is in execQuery in the string assigned to the variable SQL.
As with the previous string change requirement, the fact that this string should change is determined from the fact that the variable SQL is used as the second argument to the method RDO._rdoConnection.CreateQuery.
Though the reference pattern is different and the actual editing is different, the logic of the transform method is about the same as the logic of the RDO._rdoConnection.Connect method. The name of the method is now constructed to refer to CreateQuery, and the three parameters are the same. The declaration of the method is followed by the declarations of the local variables.
The first step is to make certain that this reference is a valid call to the method whose second argument is a variable reference. The code is a bit long but straightforward.
The highlighted code shows where the root of the SQL variable in sqlVar is determined. The second step is to find the preceding assignment to this variable.
The third step is to obtain the actual string value being assigned to the local variable.
If there is a constant query string, then the fourth step is to replace the "?"s with indexed @ parameters and, if the string changed, replace it in the code.
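The "?" placeholder rewrite can also be shown as ordinary string code. This Python sketch is not gmSL, and the sample query and the exact parameter naming convention (@1, @2, ...) are taken only from the description above; an actual migration might choose different parameter names.

```python
def index_placeholders(sql):
    """Replace each positional '?' with an indexed parameter (@1, @2, ...).
    Naive sketch: does not guard against '?' inside quoted literals."""
    parts = sql.split("?")
    out = parts[0]
    for i, rest in enumerate(parts[1:], start=1):
        out += "@" + str(i) + rest
    return out

print(index_placeholders("select * from authors where state = ? and city = ?"))
# select * from authors where state = @1 and city = @2
```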
Running this code causes the desired change in the translation as the following file comparison shows.
In addition to having to change the content of the query string, the actual call to the method needs to be changed into a combined method call followed by a property assignment. The actual difference is
The easiest way to achieve these types of changes is to invent a new method that reflects the revised method and then to associate with that method the final migPattern. These new methods are placed in a migClass defined within the refactor section. The added declaration is
Note that it must precede the gmSL statement as the gmSL code references it. A new section of code can now be added to the transform method that scans forward in the referencing code, removes the old method calling operations and set command code, and replaces it with a reference to the pattern defined above.
Note that the statement that gets the root offset of the new method is highlighted. The comparison log shows that the code achieved the desired result.
The next difference between the unmigrated and migrated code is as follows. This particular difference occurs twice.
The compiler processes this by generating a generic COL.Item operation that must be replaced by a pattern string.
This pattern is again stored in the migClass DotNet. The code for the transform then is simply
It checks for the needed operation, finds the root of the new pattern variable, replaces the old operation code with the new reference to the pattern, and returns the new code reference scan location. As the following change log shows, both instances of the rdoParameters were changed.
An important difference in the migrated code has to do with the SqlDataReader variable Results. In the unmigrated version, it is declared as a method level local variable and is then opened in the code.
In the migrated version, its scope is limited, as it is declared and opened in a using statement.
An important note here is that the target form of the OpenResultset has been migrated to ExecuteReader; however, within the symbol table it still has its source name, which must be used. Trying to transform something like rdoPreparedStatement_ExecuteReader would not work.
The transform method itself begins in the standard way with the required declarations.
The first step makes certain that the expected type of reference is present and obtains the root of the Results variable.
The second step is to set the DeadCode property of the variable to True. Doing this blocks the declaration of the variable in the list of local variables.
And finally the third step changes the CMD.Set into an IFS.Using operation. This operation requires a type as well as a variable reference. The operation TYV.root displays the type of a root; so it is inserted into the code as well.
The file comparison shows that the declaration has been removed as well as the following desired change.
The using statement enters a new indentation level into the code; therefore, all the statements below it are shifted. In fact the translation log now shows this warning
The end of the using block must be found as well and entered into the migrated code.
Within the unmigrated code the OpenResultset method has a corresponding Close method. In the migrated approach this method call is replaced by an IFS.EndUsing operation. The transform code is as follows.
The Opcode.CommentOut method finds the end of the statement containing the Close method reference and replaces it with a CMT.Delete operation, which will delete the entire statement from the target code. It returns the code offset of that CMT operation. The transform method then inserts an IFS.EndUsing operation after the CMT. This achieves the desired result.
The next difference is in the while loop that reads the records from the result set.
In the current unmigrated code the while loop checks for an end-of-file while the target code performs the actual read. Using the types of techniques used earlier, the simple approach is to change the migName of the EOF to Read() using gmPL.
Then the transform method for the property can simply check for the NOT operation and remove it. This is what the actual reference code looks like.
The transform method then merely checks for the pattern and removes the NOT if present. The code is straightforward.
Checking the change log shows that the combination of the new migName and the removal of the NOT achieved the correct result.
Migrations that combine noncontingent renaming with contingent code modification are referred to as "shallow" transforms. The technology used by gmBasic is derived from the field of transformational grammar, in which the meaning of a sentence is referred to as its "deep structure" and the representation of the sentence as uttered is referred to as its "surface structure". Rules that mix these two levels are called "shallow" and should generally be avoided. In our sample code, the only reference to the "EOF" property is in that while clause, where the renaming to "Read()" is valid. But in other contexts the transform would fail to apply, yet the "EOF" would still be changed to "Read()" -- certainly producing bad code.
In places such as this, shallow transforms are fine, but beware of them in larger scale migrations where they can introduce problems. A more complex approach would introduce a DotNet method Read and then do a contingent replacement.
The next difference is in the body of the while loop that reads the records from the result set.
This difference involves the references to the two properties rdoColumns and Value, both of which can simply be removed. The actual references can be seen in the code audit.
Except for the names of the methods they are identical.
The change log shows that these changes produced the desired result.
The final difference between the two translations is that the MoveNext call is no longer needed. The migrated code already does the read in the while loop. The transform method can simply comment out the unwanted statement.
This demonstration migration has now been completed.