Friday, May 31, 2013

Tools: Notepad++ - Syntax Highlighting

I've posted briefly about the Notepad++ (Npp) tool in a different context. I've been using it for a while, and the more I use it, the more I discover about it. This post is about syntax highlighting in the tool and how you can customize it.

Npp is simply a text editor. It does the same thing Windows' Notepad does and then some, hence the name Notepad++. With it, you can open almost any text file. Have you ever tried to open a text file from Unix (or Linux) in Windows' Notepad? All the lines get collapsed into one big jumble. This won't happen in Notepad++. It also lets you work with various file encodings (ASCII/ANSI, Unicode etc.), so you won't see box characters (well, most of the time). So, it's a great text editor. But that's not all. What makes it a great tool are its easy-to-use user interface, its search capabilities (including regex out of the box), plugins etc. If you haven't already tried it, download it from here.
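The jumble comes from line endings: Unix files end each line with a line feed (LF), while Windows uses carriage return plus line feed (CR LF), and Notepad at the time only recognized the latter. Here is a minimal Python sketch of the conversion an editor has to cope with. The function name is my own, and this is not how Npp does it internally.

```python
def to_windows_newlines(text):
    """Convert Unix (LF) line endings to Windows (CR LF)."""
    # Normalize any existing CR LF to LF first, so text that is
    # already in Windows format does not end up with doubled CRs.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


unix_text = "line one\nline two\n"
print(repr(to_windows_newlines(unix_text)))
```

Npp detects the line ending style when it opens a file, so you never need to do this by hand just to read the file.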

As developers, we tend to expect a lot more from any tool. I myself have used several editors, starting with DOS's EDIT, Unix vi etc. (I still remember the Brief editor of the early 90's, made by a company called Underware!) Each had unique capabilities and weaknesses. One thing they were all missing was syntax highlighting. For a long time, this was a feature only of the IDEs/editors that came with a language. But in the era of extensible editors like Eclipse, we've come to expect syntax highlighting as a minimum for any text editor.

I have always wondered how programming tools color their syntax. I can understand a dedicated tool like PowerBuilder. They know their language, so they can just hard-code it in their programs. How does a simple editor like Notepad++ do that without having knowledge about each language it supports? These tools provide some kind of mechanism (proprietary or standard) to define various aspects of the syntax and how to highlight them. But they don't know anything about the language itself. That is, the syntax highlighting in such tools is based purely on syntax and not on the semantics of the language. This is the key to understanding some anomalies in highlighting in these tools.
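To make the "pure syntax" idea concrete, here is a minimal Python sketch of a highlighter that picks a style for each token only by looking it up in word lists. The two lists, the style names and the case-insensitive matching are assumptions for illustration; this is not Npp's or Scintilla's actual code.

```python
import re

# Hypothetical keyword lists; a real definition would hold many more words.
RESERVED = {"if", "then", "else", "end", "return"}
TYPES = {"integer", "string", "boolean"}

# A token is an identifier-like word, a run of digits, or any single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def classify(line):
    """Pick a style for each token purely by list membership."""
    result = []
    for tok in TOKEN.findall(line):
        word = tok.lower()  # assumption: keywords match case-insensitively
        if word in RESERVED:
            style = "reserved"
        elif word in TYPES:
            style = "type"
        elif tok.isdigit():
            style = "number"
        else:
            style = "default"
        result.append((tok, style))
    return result


print(classify("IF li_count THEN"))
print(classify("string x = 5"))
```

Because the lookup never considers context, a word from a list gets its style wherever it appears, even in a place where the language would not treat it as a keyword. That is the kind of anomaly mentioned above.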

Notepad++ (Npp) is one of the editors built around a standard editing component called Scintilla. According to its home page, "Scintilla is a free source code editing component. It comes with complete source code and a license that permits use in any free project or commercial product." More on this later. Npp comes with syntax definitions for several languages. But if your language of choice is not in the list, no worries; you can simply add a "User Defined Language" (UDL). With the latest version (6.2.x), it has been improved and is called UDL2.

It's the UDL that I am going to talk about in this post. If you open Npp and click on the Language menu, you will see a list of the languages available. You can then click the "Define Your language" option and get going with adding the definition for your language. I used this to define the syntax highlighting for PowerBuilder code.

User Defined Language (UDL)



Fig 1. Npp UDL dialog

Using this dialog, you can create new syntax definitions, and import, export and copy from another language definition. There are several tabs here. We will come to Folder & Default in a minute. Keywords Lists is where you put in the keywords in your language. You can separate the keywords into several groups. For example, in Fig 2 below, I have 2 lists: one for all the reserved words in PowerBuilder and a second for all the types. Separating them this way allows you to style each group of keywords distinctly.



Fig 2. Setting up Keyword Lists

When you define an entity (here, a keyword list), you can also attach a styler to it. The styler is where you define the fonts and colors for the text of that particular type of entity. For example, I've colored PB reserved words in blue, as shown in Fig 3.



Fig 3. Styler Dialog

The next tab in the UDL dialog is "Comment & Number". This is where you define the commenting symbols in your language. For comments, there are 2 styles. The first is the line style, which is a single-line comment. In PB (as in C, C++, Java etc.), single-line comments are identified by // at the beginning of the comment text. Such comments can be at the end of a line of code or on a line by themselves.

The second style, found in C-style languages, is the multi-line comment: anything in between /* and */. Here is a screenshot of the comment section. As you can see, there is a lot more you can do with it. In the styler for the comments, you can also specify what may be nested inside comments.



Fig 4. Setting up Syntax highlighting for comments
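The two comment styles can be sketched with a single regular expression. This is a minimal Python illustration, assuming C-style markers as in PB. It ignores string literals and nested comments, and it is not how Npp itself scans the text.

```python
import re

# One alternative per comment style: /* ... */ (may span lines) and // to end of line.
# The scan runs left to right, so a // that sits inside /* ... */ is simply
# swallowed as part of the block comment.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def find_comments(source):
    """Return (start, end, text) for every comment span found in source."""
    return [(m.start(), m.end(), m.group()) for m in COMMENT.finditer(source)]


sample = "a = 1 // set a\n/* multi\n   line // not separate */\nb = 2"
for start, end, text in find_comments(sample):
    print(start, end, repr(text))
```

An editor would then apply the comment styler to each span it finds and the other stylers to everything else.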

Similarly, you can highlight numbers.

The last tab is Operators & Delimiters. This is where you list the operators and any delimiters in the language you are defining. Operators are symbols like +, -, /, * etc. Comma (,), semi-colon (;), colon (:) and pipe (|) are examples of delimiters. Again, use the stylers to format them.

I will now get back to the first tab, Folder & Default. Default is simple. This is the normal text that is not defined in any of the tabs above, i.e., text that is not a keyword, comment, operator or number. To define its style, just click on the Default Style -> Styler button.

The folder section is a bit more interesting. Have you ever seen code folding? Eclipse has it. I believe Visual Studio does too. PowerBuilder itself does not offer it. It's the feature in some editors that lets you selectively hide sections of code, so you can see the "big picture". For example, if you are working with multi-level nested IF statements, you can hide the inner ones to see what the outermost IF/ELSE does. You typically see code folding available for the control structures (let's call them block definers) in a programming language, like IF, CASE, FOR, DO, {, } etc. These typically have an opening and a closing marker text or symbol. Code folding also helps you see whether any block is missing its closing marker.

In this tab, Npp UDL offers 2 styles: one that doesn't require separators and one that does. This is a bit confusing, and I had to try both to understand. The style that doesn't require separators is for symbols like {, }, (, ) etc. Even with text touching them, they are identifiable. You put them in the first style. The second style, which needs separators (typically a space), holds all the other block definers. For example, in PB we have IF...ELSE...END IF to define an IF block. IF is the opener, END IF is the closer, and ELSE is the middle. (Earlier versions of UDL only supported open and close, so upgrade to the latest version of Npp if you haven't already done so.) Notice that END IF is actually 2 words; in such cases, surround them in double quotes.
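To show what the open, middle and close markers mean for folding, here is a minimal Python sketch that works out a nesting level for every line of an IF block. The marker lists and the rule that an opener line must end with THEN are simplifying assumptions of mine; Npp's real folding logic lives inside the editor and is certainly more involved.

```python
# Hypothetical folding markers for a PB-like IF block.
OPENERS = ("if",)
MIDDLES = ("else", "elseif")
CLOSER = "end if"  # two words, so the whole phrase is matched


def fold_levels(lines):
    """Return (level of each line, depth left open at the end)."""
    depth = 0
    levels = []
    for line in lines:
        text = " ".join(line.lower().split())  # normalize case and spacing
        first = text.split(" ", 1)[0]
        if text == CLOSER or text.startswith(CLOSER + " "):
            depth = max(depth - 1, 0)
            levels.append(depth)  # closer lines up with its opener
        elif first in MIDDLES:
            levels.append(max(depth - 1, 0))  # middle lines up with the opener too
        elif first in OPENERS and text.endswith(" then"):
            levels.append(depth)
            depth += 1  # lines that follow are inside the block
        else:
            levels.append(depth)
    return levels, depth


code = [
    "IF a > 0 THEN",
    "  b = 1",
    "  IF c THEN",
    "    d = 2",
    "  END IF",
    "ELSE",
    "  b = 0",
    "END IF",
]
print(fold_levels(code))
```

A depth other than zero at the end means some block was never closed, which is the "missing closure" check that folding gives you for free.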

Folding in comments allows you to put specific comments in your code that you can use for folding a section. We often comment out whole blocks of code to test portions of it. Just add the marker texts (open, middle, close) defined here to such comments, and you will be able to hide the whole block.



Fig 5. Code Folding in Npp

Of course, you can set the styles for the folding keywords in their stylers.

Here is some sample PB code in Npp, after I created the UDL "Powerbuilder" and applied it to the open file.



Fig 6. Editing PB code in Npp

Note: When I added code folding, the syntax highlighting for the keywords used in code folding (IF, END IF, CHOOSE CASE etc.) seemed to disappear. It may be a bug in the current version of Npp. I will check and post back here.
