How can I extract only the network indicators from the strings of a sample?

Posted 2024-04-28 06:37:19


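I have a sample with potentially malicious behavior, and I want to uncover all the network indicators it connects to, such as website names and IP addresses.

Using the strings tool, I get the following output: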

    $ strings 6787c54e6a2c5cffd1576dcdc8c4f42c954802b7
    %PDF-1.5
    1 0 obj
    <</Type/Page/Parent 80 0 R/Contents 36 0 R/MediaBox[0 0 612 792]/Annots[2 0 R 4 0 R 6 0 R 8 0 R 10 0 R 12 0 R 14 0 R 16 0 R 18 0 R]/Group 20 0 R/StructParents 1/Tabs/S/Resources<</Font<</F1 21 0 R/F2 23 0 R/F3 26 0 R/F4 29 0 R/F5 31 0 R>>/XObject<</Image6 33 0 R/Image9 34 0 R>>>>>>
    endobj
    2 0 obj
    <</Type/Annot/Subtype/Link/Rect[139.10001 398.20001 449.84 726.20001]/Border[0 0 0]/F 4/NM(PDFE-48D407B4789BA8880)/P 1 0 R/StructParent 0/A 3 0 R>>
    endobj
    3 0 obj
    <</S/URI/URI(http://www.pdfupdatersacrobat.top/website/hts-cache/index.php?userid=info@narainsfashionfabrics.com)>>
    endobj
    4 0 obj
    <</Type/Annot/Subtype/Link/Rect[232.39999 618.03003 370.14999 629.53003]/Border[0 0 0]/F 4/NM(PDFE-48D407B4789BA8881)/P 1 0 R/StructParent 2/A 5 0 R>>
    endobj
    5 0 obj
    <</S/URI/URI(>>
    endobj
    6 0 obj
    <</Type/Annot/Subtype/Link/Rect[278.87 583.20001 324.88 594.13]/Border[0 0 0]/F 4/NM(PDFE-48D407B4789BA8882)/P 1 0 R/StructParent 3/A 7 0 R>>
    endobj
    7 0 obj
    <</S/URI/URI()>>
    endobj
    8 0 obj
    <</Type/Annot/Subtype/Link/Rect[185.75999 377.28 398.16 733.67999]/Border[0 0 0]/C[0 0 0]/F 4/NM(PDFE-48D4183FB09C5EC13)/P 1 0 R/A 9 0 R/H/N>>
    endobj
    9 0 obj
    <</S/URI/URI(http://sajiye.net/file/website/file/main/index.php?userid=alwaha_alghannaa@hotmail.com)>>
    endobj
    10 0 obj
    <</Type/Annot/Subtype/Link/Rect[185.75999 373.67999 398.88 734.40002]/Border[0 0 0]/C[0 0 0]/F 4/NM(PDFE-48D4183FB09C5EC14)/P 1 0 R/A 11 0 R/H/N>>
    endobj
    11 0 obj
    <</S/URI/URI(http://sajiye.net/file/website/file/main/index.php?userid=kitja@siamdee2558.com)>>
    endobj
    12 0 obj
    <</Type/Annot/Subtype/Link/Rect[132.48 0 474.48001 772.56]/Border[0 0 0]/C[0 0 0]/F 4/NM(PDFE-48D460B5879C4D8C5)/P 1 0 R/A 13 0 R/H/N>>
    endobj
    13 0 obj
    <</S/URI/URI(http://nurking.pl/wp-admin/user/email.163.htm?login=)>>
    endobj
    14 0 obj
    <</Type/Annot/Subtype/Link/Rect[0 0 612 792]/Border[0 0 0]/C[0 0 0]/F 4/NM(PDFE-48D465334C760A446)/P 1 0 R/A 15 0 R/H/N>>
    endobj
    15 0 obj
    <</S/URI/URI(https://www.dropbox.com/s/76jr9jzg020gory/Swift%20Copy.uue?dl=1)>>
    endobj
    16 0 obj
    <</Type/Annot/Subtype/Link/Rect[.72 0 612 789.84003]/Border[0 0 0]/C[0 0 0]/F 4/NM(PDFE-48D4C7F946F3F02B7)/P 1 0 R/A 17 0 R/H/N>>
    endobj
    17 0 obj
    <</S/URI/URI(https://www.dropbox.com/s/28aaqjdradyy4io/Swift-Copy_pdf.uue?dl=1)>>
    endobj
    18 0 obj
    <</Type/Annot/Subtype/Link/Rect[0 5.76 612 792]/Border[0 0 0]/C[0 0 0]/F 4/P 1 0 R/A 19 0 R/H/N>>
    endobj
    19 0 obj
    <</S/URI/URI(https://www.dropbox.com/s/d71h5a56r16u3f0/swift_copy.jar?dl=1)>>
    endobj
    20 0 obj
    <</S/Transparency/CS/DeviceRGB>>
    endobj
    21 0 obj
    <</Type/Font/Subtype/TrueType/BaseFont/TimesNewRoman/FirstChar 32/LastChar 252/Encoding/WinAnsiEncoding/FontDescriptor 22 0 R/Widths[250 333 408 500 500 833 777 180 333 333 500 563 250 333 250 277 500 500 500 500 500 500 500 500 500 500 277 277 563 563 563 443 920 722 666 666 722 610 556 722 722 333 389 722 610 889 722 722 556 722 666 556 610 722 722 943 722 722 610 333 277 333 469 500 333 443 500 443 500 443 333 500 500 277 277 500 277 777 500 500 500 500 333 389 277 500 500 722 500 500 443 479 200 479 541 350 500 350 333 500 443 1000 500 500 333 1000 556 333 889 350 610 350 350 333 333 443 443 350 500 1000 333 979 389 333 722 350 443 722 250 333 500 500 500 500 200 500 333 759 275 500 563 333 759 500 399 548 299 299 333 576 453 333 333 299 310 500 750 750 750 443 722 722 722 722 722 722 889 666 610 610 610 610 333 333 333 333 722 722 722 722 722 722 722 563 722 722 722 722 722 722 556 500 443 443 443 443 443 443 666 443 443 443 443 443 277 277 277 277 500 500 500 500 500 500 500 548 500 500 500 500 500]>>
    endobj
    22 0 obj
    <</Type/FontDescriptor/FontName/TimesNewRoman/Flags 32/FontBBox[-568 -215 2045 891]/FontFamily(Times New Roman)/FontWeight 400/Ascent 891/CapHeight 693/Descent -215/MissingWidth 777/StemV 0/ItalicAngle 0/XHeight 485>>
    endobj
    23 0 obj
    <</Type/Font/Subtype/TrueType/BaseFont/ABCDEE+Calibri,BoldItalic/FirstChar 32/LastChar 117/Name/F2/Encoding/WinAnsiEncoding/FontDescriptor 24 0 R/Widths[226 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 630 0 459 0 0 0 0 0 0 0 0 668 532 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 528 0 412 0 491 316 0 0 246 0 0 246 804 527 527 0 0 0 0 347 527]>>
    endobj
    24 0 obj
    <</Type/FontDescriptor/FontName/ABCDEE+Calibri,BoldItalic/FontWeight 700/Flags 32/FontBBox[-691 -250 1265 750]/Ascent 750/CapHeight 750/Descent -250/StemV 53/ItalicAngle -11/AvgWidth 536/MaxWidth 1956/XHeight 250/FontFile2 25 0 R>>
    endobj
    <</Type/Pages/Count 1/Kids[1 0 R]>>
    endobj
    81 0 obj
    <</Type/Catalog/Pages 80 0 R/Lang(en-US)/MarkInfo<</Marked true>>/Metadata 83 0 R/StructTreeRoot 37 0 R>>
    endobj
    82 0 obj
    <</Producer(RAD PDF 2.36.8.0 - http://www.radpdf.com)/Author(alesk)/Creator(RAD PDF)/RadPdfCustomData(pdfescape.com-open-AC00E8D5A4B4C84BC37A2054F4EC794B0297765728CB8415)/CreationDate(D:20160825075202+01'00')/ModDate(D:20170711012532-08'00')>>
    endobj
    83 0 obj
    <</Type/Metadata/Subtype/XML/Length 1031>>stream
    <?xpacket begin="
    " id="W5M0MpCehiHzreSzNTczkc9d"?>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="DynaPDF 4.0.11.30, http://www.dynaforms.com">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
    <pdf:Producer>RAD PDF 2.36.8.0 - http://www.radpdf.com</pdf:Producer>
    <xmp:CreateDate>2016-08-25T07:52:02+01:00</xmp:CreateDate>
    <xmp:CreatorTool>RAD PDF</xmp:CreatorTool>
    <xmp:MetadataDate>2017-07-11T01:25:32-08:00</xmp:MetadataDate>
    <xmp:ModifyDate>2017-07-11T01:25:32-08:00</xmp:ModifyDate>
    <dc:creator><rdf:Seq><rdf:li xml:lang="x-default">alesk</rdf:li></rdf:Seq></dc:creator>
    <xmpMM:DocumentID>uuid:a184332f-8592-38c8-908c-45914e523218</xmpMM:DocumentID>
    <xmpMM:VersionID>1</xmpMM:VersionID>
    <xmpMM:RenditionClass>default</xmpMM:RenditionClass>
    </rdf:Description>
    </rdf:RDF>
    </x:xmpmeta>
    <?xpacket end="w"?>
    endstream
    endobj
    84 0 obj
    <</Type/XRef/Size 85/Root 81 0 R/Info 82 0 R/ID[<299C21286E590F03363518EFD9FBBF99><299C21286E590F03363518EFD9FBBF99>]/W[1 3 0]/Filter/FlateDecode/Length 239>>stream
    cx?{
    endstream
    endobj
    startxref
    204273
    %%EOF

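So, is there a way to process all these strings and extract only network indicators such as domains or IP addresses, using a regex or any other method?

Any suggestions are welcome.

Expected output: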

    http://www.pdfupdatersacrobat.top/website/hts-cache/index.php?userid=info@narainsfashionfabrics.com
    http://sajiye.net/file/website/file/main/index.php?userid=alwaha_alghannaa@hotmail.com
    http://ns.adobe.com/pdf/1.3/

2 Answers

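Since findstr only offers basic RegEx support, I suggest using PowerShell (wrapped in a batch file if necessary).

Admittedly, this RegEx does not strip the trailing part of the http lines: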

    > gc .\sample.txt |sls '^.*?(https?:\/\/.*)$'|%{$_.Matches.Groups[1].Value}
    http://www.pdfupdatersacrobat.top/website/hts-cache/index.php?userid=info@narainsfashionfabrics.com)>>
    http://sajiye.net/file/website/file/main/index.php?userid=alwaha_alghannaa@hotmail.com)>>
    http://sajiye.net/file/website/file/main/index.php?userid=kitja@siamdee2558.com)>>
    http://nurking.pl/wp-admin/user/email.163.htm?login=)>>
    https://www.dropbox.com/s/76jr9jzg020gory/Swift%20Copy.uue?dl=1)>>
    https://www.dropbox.com/s/28aaqjdradyy4io/Swift-Copy_pdf.uue?dl=1)>>
    https://www.dropbox.com/s/d71h5a56r16u3f0/swift_copy.jar?dl=1)>>
    http://www.radpdf.com)/Author(alesk)/Creator(RAD PDF)/RadPdfCustomData(pdfescape.com-open-AC00E8D5A4B4C84BC37A2054F4EC794B0297765728CB8415)/CreationDate(D:20160825075202+01'00')/ModDate(D:20170711012532-08'00')>>
    http://www.dynaforms.com">
    http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    http://ns.adobe.com/pdf/1.3/"
    http://purl.org/dc/elements/1.1/"
    http://ns.adobe.com/xap/1.0/"
    http://ns.adobe.com/xap/1.0/mm/">
    http://www.radpdf.com</pdf:Producer>
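One way to avoid the trailing )>>, quotes and tags, sketched here in Python rather than PowerShell, is to stop the match at characters that usually delimit a URL in this kind of output instead of using a greedy .*. The delimiter set (whitespace, parentheses, quotes, angle brackets) is an assumption: a URL that legitimately contains parentheses would be cut short. The sample lines are copied from the strings output in the question, and extract_urls is a hypothetical helper name.

```python
import re

# A few lines copied from the strings output in the question
sample = '''<</S/URI/URI(http://sajiye.net/file/website/file/main/index.php?userid=kitja@siamdee2558.com)>>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="DynaPDF 4.0.11.30, http://www.dynaforms.com">
<pdf:Producer>RAD PDF 2.36.8.0 - http://www.radpdf.com</pdf:Producer>'''

# Assumption: in this output a URL ends at whitespace, a parenthesis,
# a quote or an angle bracket (PDF dictionary and XML attribute delimiters).
URL_RE = re.compile(r'https?://[^\s()<>"\']+')

def extract_urls(text):
    """Return every URL-like substring in text, in order of appearance."""
    return URL_RE.findall(text)

for url in extract_urls(sample):
    print(url)
```

A negated character class keeps each match from running past the delimiter, so no separate clean-up step is needed for the )>> and "> tails.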

The same approach is equally crude for possible IPs:

    > gc .\sample.txt |sls '^(.*?(\d{1,3}\.){3}\d{1,3}.*)$'|%{$_.Matches.Groups[1].Value}
    <</Producer(RAD PDF 2.36.8.0 - http://www.radpdf.com)/Author(alesk)/Creator(RAD PDF)/RadPdfCustomData(pdfescape.com-open-AC00E8D5A4B4C84BC37A2054F4EC794B0297765728CB8415)/CreationDate(D:20160825075202+01'00')/ModDate(D:20170711012532-08'00')>>
    <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="DynaPDF 4.0.11.30, http://www.dynaforms.com">
    <pdf:Producer>RAD PDF 2.36.8.0 - http://www.radpdf.com</pdf:Producer>

Aliases used:

    gc  = Get-Content
    sls = Select-String
    %   = ForEach-Object
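A slightly less crude IPv4 sketch in Python: require that the candidate is not part of a longer dotted number, then let the standard ipaddress module reject out-of-range octets. It is still only a candidate filter. As the sample lines (shortened from the strings output) show, version strings such as 2.36.8.0 and 4.0.11.30 are syntactically valid addresses and still come through, so the results need manual review. ipv4_candidates is a hypothetical helper name.

```python
import ipaddress
import re

# Two lines from the strings output, shortened for the example
sample = '''<</Producer(RAD PDF 2.36.8.0 - http://www.radpdf.com)/Author(alesk)>>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="DynaPDF 4.0.11.30, http://www.dynaforms.com">'''

# Four dot-separated groups of 1-3 digits, not glued to other digits or dots
IPV4_CANDIDATE_RE = re.compile(r'(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])')

def ipv4_candidates(text):
    """Return substrings that parse as IPv4 addresses (each octet 0-255)."""
    found = []
    for match in IPV4_CANDIDATE_RE.finditer(text):
        try:
            found.append(str(ipaddress.IPv4Address(match.group())))
        except ipaddress.AddressValueError:
            pass  # e.g. 300.1.1.1 has an octet out of range
    return found

# Note: version numbers are valid IPv4 syntax too, so these are only candidates
print(ipv4_candidates(sample))
```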

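Yes, it is possible. You can find all the URLs and then extract each one through its capturing group (group(1) below). You can read more about groups and backreferences here.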

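    import re

    # Read the strings output (assumption: it was saved to sample.txt)
    with open("sample.txt", encoding="utf-8", errors="replace") as f:
        string = f.read()

    # Pattern describing a URL inside parentheses, e.g. /URI(http://...)
    # The parentheses stay outside the capturing group, and '@' is allowed
    # because several URLs carry an e-mail address in the query string
    pattern = re.compile(r'\((https?://[:_%@A-Za-z0-9=?/.-]+)\)')

    # List where we store all URLs
    urls = []

    # For each URL the pattern finds in the string, append the captured URL to the list
    for url in pattern.finditer(string):
        urls.append(url.group(1))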
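Since the question asks for domains as well as URLs, a possible follow-up step is to reduce the collected URLs to their host names with urllib.parse.urlsplit from the standard library. The list below is copied from the question's output so the sketch runs on its own, and unique_hosts is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# URLs copied from the question's output so this sketch runs on its own
urls = [
    "http://www.pdfupdatersacrobat.top/website/hts-cache/index.php?userid=info@narainsfashionfabrics.com",
    "http://sajiye.net/file/website/file/main/index.php?userid=alwaha_alghannaa@hotmail.com",
    "http://sajiye.net/file/website/file/main/index.php?userid=kitja@siamdee2558.com",
    "https://www.dropbox.com/s/d71h5a56r16u3f0/swift_copy.jar?dl=1",
]

def unique_hosts(url_list):
    """Return the host part of each URL, de-duplicated, in first-seen order."""
    hosts = []
    for url in url_list:
        host = urlsplit(url).hostname  # lower-cased, without port or user info
        if host and host not in hosts:
            hosts.append(host)
    return hosts

print(unique_hosts(urls))
```

The e-mail addresses in the userid= query strings are not hosts the sample connects to, and urlsplit keeps them out of hostname because they sit after the path.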

注意:

You should use pattern.finditer(), because it lets you iterate over every match of the pattern in the text held in string. From the re documentation:

re.finditer(pattern, string, flags=0) Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
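To illustrate the difference: finditer yields match objects, so the offset of each URL is available, while findall, for a pattern with exactly one capturing group, returns the captured strings directly and can replace the append loop when offsets are not needed. The pattern here keeps the parentheses outside the capturing group and allows @; the input line is taken from the question.

```python
import re

text = "<</S/URI/URI(http://nurking.pl/wp-admin/user/email.163.htm?login=)>> <</S/URI/URI()>>"
pattern = re.compile(r'\((https?://[:_%@A-Za-z0-9=?/.-]+)\)')

# finditer yields Match objects: the captured URL plus its offset in the text
for m in pattern.finditer(text):
    print(m.start(1), m.group(1))

# With exactly one capturing group, findall returns the group's text directly
print(pattern.findall(text))
```

The empty /URI() entry produces no match, since the pattern requires an http or https scheme.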
