解析自定义文件格式的技巧

Question

抱歉标题有点模糊，但我真的不知道该怎么简洁地描述这个问题。

我创建了一个（或多或少）简单的领域特定语言，我打算用它来指定对不同实体（通常是从网页提交的表单）应用哪些验证规则。帖子底部有一个示例，展示了这个语言的样子。

我的问题是，我不知道该如何开始解析这个语言，把它转化成我可以使用的形式（我会用Python来进行解析）。我的目标是得到一个规则/过滤器的列表（以字符串形式，包括参数，比如'cocoa(99)'），这些规则应该按顺序应用到每个对象/实体上（也以字符串形式，比如'chocolate'，'chocolate.lindt'等）。

我不确定该用什么技术来开始，甚至不知道针对这种问题有哪些技术可用。你觉得最好的方法是什么？我并不想要一个完整的解决方案，只是想要一个大致的方向。

谢谢。

语言示例文件：

# Comments start with the '#' character and last until the end of the line
# Indentation is significant (as in Python)


constant NINETY_NINE = 99       # Defines the constant `NINETY_NINE` to have the value `99`


*:      # Applies to all data
    isYummy             # Everything must be yummy

chocolate:              # To validate, say `validate("chocolate", object)`
    sweet               # chocolate must be sweet (but not necessarily chocolate.*)

    lindt:              # To validate, say `validate("chocolate.lindt", object)`
        tasty           # Applies only to chocolate.lindt (and not to chocolate.lindt.dark, for e.g.)

        *:              # Applies to all data under chocolate.lindt
            smooth      # Could also be written smooth()
            creamy(1)   # Level 1 creamy
        dark:           # dark has no special validation rules
            extraDark:
                melt            # Filter that modifies the object being examined
                c:bitter        # Must be bitter, but only validated on client
                s:cocoa(NINETY_NINE)    # Must contain 99% cocoa, but only validated on server. Note constant
        milk:
            creamy(2)   # Level 2 creamy, overrides creamy(1) of chocolate.lindt.* for chocolate.lindt.milk
            creamy(3)   # Overrides creamy(2) of previous line (all but the last specification of a given rule are ignored)



ruleset food:       # To define a chunk of validation rules that can be expanded from the placeholder `food` (think macro)
    caloriesWithin(10, 2000)        # Unlimited parameters allowed
    edible
    leftovers:      # Nested rules allowed in rulesets
        stale

# Rulesets may be nested and/or include other rulesets in their definition



chocolate:              # Previously defined groups can be re-opened and expanded later
    ferrero:
        hasHazelnut



cake:
    tasty               # Same rule used for different data (see chocolate.lindt)
    isLie
    ruleset food        # Substitutes with rules defined for food; cake.leftovers must now be stale


pasta:
    ruleset food        # pasta.leftovers must also be stale




# Sample use (in JavaScript):

# var choc = {
#   lindt: {
#       cocoa: {
#           percent: 67,
#           mass:    '27g'
#       }
#   }
#   // Objects/groups that are ommitted (e.g. ferrro in this example) are not validated and raise no errors
#   // Objects that are not defined in the validation rules do not raise any errors (e.g. cocoa in this example)
# };
# validate('chocolate', choc);

# `validate` called isYummy(choc), sweet(choc), isYummy(choc.lindt), smooth(choc.lindt), creamy(choc.lindt, 1), and tasty(choc.lindt) in that order
# `validate` returned an array of any validation errors that were found

# Order of rule validation for objects:
# The current object is initially the object passed in to the validation function (second argument).
# The entry point in the rule group hierarchy is given by the first argument to the validation function.
# 1. First all rules that apply to all objects (defined using '*') are applied to the current object,
#    starting with the most global rules and ending with the most local ones.
# 2. Then all specific rules for the current object are applied.
# 3. Then a depth-first traversal of the current object is done, repeating steps 1 and 2 with each object found as the current object
# When two rules have equal priority, they are applied in the order they were defined in the file.



# No need to end on blank line

数据验证领域特定语言文件解析过滤器规则引擎解析技术自定义语言实体处理

解析自定义文件格式的技巧

6 个回答

撰写回答