Regex in ABAP, HTML processing in ABAP with regular expressions
Recently I had to process HTML code and replace some CSS declarations with their HTML tag equivalents.
For instance, a font-weight: bold; property inside a tag's style value must be replaced with an HTML tag (STRONG, in my case).
One way to solve this problem is to use regular expressions in ABAP.
Below I explain my solution and the ABAP regex code behind it in detail.
1. First of all, we need to detect the block that carries a font-weight property and then surround the content of this block with the HTML tag.
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
- IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
You may ask about the „♦“ symbol; I'll come back to it at the end of this post.
Some comments:
— Parentheses „(…)“ define a group (block) that can be placed, moved or dropped at a specific position in the replacement.
— The expression „[^>]*“ matches everything up to the next „>“ character; the same logic applies to „[^♦]*“.
— With the „$“ character followed by a number we can put a specific group at a specific place in the result.
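To see these three building blocks in isolation, here is a minimal sketch. The report name, the variable lv_text and the sample value are made up for illustration and are not part of the solution itself.

```abap
REPORT z_regex_groups_demo.

" Hypothetical sample value, chosen only to illustrate groups and $n.
DATA lv_text TYPE string.
lv_text = `<b>bold</b>`.

" Group 1 captures everything up to and including the first '>',
" group 2 captures the following text up to the next '<'.
" The replacement writes the two groups back in swapped order.
REPLACE FIRST OCCURRENCE OF REGEX '([^>]*>)([^<]*)'
  IN lv_text WITH '$2$1'.

WRITE / lv_text.   " expected: bold<b></b>
```

Swapping the groups has no practical use here; it only shows that each „$n“ placeholder refers to the text captured by the n-th pair of parentheses.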
2. Now that we have found the relevant block and surrounded its content with the wanted tag, we can remove the font-weight property from the block.
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
- IN html_string WITH '' IGNORING CASE.
That's all. We have just replaced the font-weight property in the block with an HTML tag.
Now it is time to explain the meaning of the „♦“ symbol. It is a workaround for nested HTML tags inside the span block, e.g. ……….
In order to detect the end of the span block's content rather than the end of some nested tag, I add an anchor, the „♦“ symbol, before the closing tag of the span block and use this anchor in my regex.
At the end I have to remove this anchor with the following regex:
- REPLACE ALL OCCURRENCES OF REGEX '♦'
- IN html_string WITH '' IGNORING CASE.
Final code:
- " set workaround for the nested tags case:
- " I'm using a special char '♦' in order to deal
- " with the case when we have nested HTML tags and want to know
- " the real end of the string that we want to surround
- " with a basic HTML tag
- REPLACE ALL OCCURRENCES OF REGEX '</span>'
- IN html_string WITH '♦</span>' IGNORING CASE.
- " surround bold (FONT-WEIGHT: bold) text with HTML's STRONG tag
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
- IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
- " remove the unneeded CSS font-weight property
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
- IN html_string WITH '' IGNORING CASE.
- " delete the workaround for the nested tags case
- REPLACE ALL OCCURRENCES OF REGEX '♦'
- IN html_string WITH '' IGNORING CASE.
Thanks.
Additional links:
— More details on regex/regexp
https://www.regular-expressions.info
— Regular expression processing in ABAP
https://scn.sap.com/docs/DOC-10319
P.S. If you know a better way to solve this problem, feel free to share your experience.