

Thread: How can I use the Regular Expression to pull data?

  1. #1
    Clicker (Multimedia Fusion 2 Developer, Install Creator Pro)

    Hey, folks!

    I was hoping there was someone in the community who knows how to parse HTML with the Regular Expression object better than I can (my own skill there is next to none!).

    I am trying to use the Regular Expression object to extract the title of HTML files (literally, the text between the <title> tags). So far my attempts have been unsuccessful, and I don't know whether I'm using the wrong action or the wrong expression to display the results.

    Is there anyone out there who could point me in the right direction for extracting the title of an HTML file from its <title> tags?
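For reference, this is roughly what a regular expression for the title can look like. The sketch uses Python's standard `re` module rather than the MMF2 Regular Expression object, whose exact actions are not shown in this thread; the pattern and the `extract_title` name are illustrative assumptions. It assumes the page has one ordinary `<title>` element and is not a general HTML parser.

```python
import re
from typing import Optional

# Illustrative pattern (an assumption, not taken from the MMF2 object):
#   <title\b[^>]*>  the opening tag, even if it carries attributes
#   (.*?)           lazily captures everything up to the first closing tag
# IGNORECASE accepts <TITLE> or <Title>; DOTALL lets the title span lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the stripped text of the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    page = "<html><head><Title>Installing the Product</Title></head><body></body></html>"
    print(extract_title(page))  # Installing the Product
```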

    Thank you very much for your help! I really appreciate it!



    Most graciously...

    RGBreality

  2. #2
    Anders (Clickteam)

    Regular expressions are really bad at parsing HTML, but in this case you don't need RegEx at all to find the title.
    If you use String Parser 2, for example, you can find the index of the first occurrence of "<title>" and then the index of "</title>". The title is the string between those two index numbers, and you can extract it with
    Mid$(<html here>, firstIndex + 7, lastIndex - firstIndex - 7)

    That should give you the title. The +7 is the length of '<title>', so the title starts 7 characters after firstIndex, and its length is lastIndex minus that starting position.
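The same index arithmetic can be checked outside MMF2. In this Python sketch, `str.find` plays the role of String Parser 2's index lookup, and the hypothetical helper `mid` imitates `Mid$(text, start, length)` assuming 0-based positions; neither is the actual extension API. The length of the title is the position of "</title>" minus the start of the title, i.e. `lastIndex - (firstIndex + 7)`.

```python
from typing import Optional

OPEN_TAG = "<title>"    # 7 characters, hence the "+7"
CLOSE_TAG = "</title>"


def mid(text: str, start: int, length: int) -> str:
    """Hypothetical stand-in for Mid$: `length` characters from 0-based `start`."""
    return text[start:start + length]


def title_by_index(html: str) -> Optional[str]:
    """Return the text between the first <title> and the next </title> (case-sensitive)."""
    first_index = html.find(OPEN_TAG)
    if first_index == -1:                      # str.find returns -1 when not found
        return None
    start = first_index + len(OPEN_TAG)        # skip past "<title>"
    last_index = html.find(CLOSE_TAG, start)   # only look after the opening tag
    if last_index == -1:
        return None
    return mid(html, start, last_index - start)


print(title_by_index("<html><head><title>Help: Index</title></head></html>"))  # Help: Index
```

This search is case-sensitive as written, so a page that uses `<TITLE>` would not be found without first lower-casing a copy of the text for the lookup.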

  3. #3
    Clicker (Multimedia Fusion 2 Developer, Install Creator Pro)

    Hey, Andos! Thank you for your reply!

    So, let me see if I follow you correctly. All of the following uses the String Parser 2 object:

    Action 1: Set default delimiter to "<title>".
    Action 2: Add delimiter "</title>".
    Action 3: Perform a Mid$(<PATH TO HTML FILE>, .....)

    This is where I get confused. How do I get the index of the first (default) delimiter, and then of the second? I did see the "Get Delimiter Index" option, but it seems to return the index number assigned to that delimiter, not the delimiter's position in the source document.
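One way to use the two delimiters from Actions 1 and 2 without any positions at all is to split the text at "<title>" and then split what follows at "</title>". The Python sketch below does that with `str.partition`; it illustrates the delimiter idea only and does not claim to reproduce how String Parser 2 numbers its elements. The function name is hypothetical.

```python
from typing import Optional


def title_by_delimiters(html: str) -> Optional[str]:
    """Split at the two tags and keep the piece between them (case-sensitive)."""
    # Everything after the first "<title>"; sep is "" when the tag is missing.
    _, sep, rest = html.partition("<title>")
    if not sep:
        return None
    # Everything before the first "</title>" that follows it.
    title, sep, _ = rest.partition("</title>")
    if not sep:
        return None
    return title


print(title_by_delimiters("<head><title>Setup Guide</title></head>"))  # Setup Guide
```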

    Could you offer me a little more guidance?

    Thank you very much, Andos! I really appreciate it!


    Most graciously...

    RGBreality

  4. #4
    Anders (Clickteam)

    <html here> = the string containing the HTML itself (not the path to the HTML file)

    I modified it a little to only use String Parser 2:
    http://andersriggelsen.dk/uploads/extractTitle.mfa

  5. #5
    Clicker (Multimedia Fusion 2 Developer, Install Creator Pro)

    Hey, Andos!

    I still seem to be having some problems, though at least I'm getting varied results as I try different things...

    I'm not sure whether I'm translating your example correctly (without using the counters). Here is the exact expression I'm using to pull the title from a Regular Expression object that holds the text of the HTML file:


    Mid$(GetString$( "Regular Expression object" ), indexOfSub( "String Parser", "<title>", 1)+6, indexOfSub( "String Parser", "</title>", 1)-7)


    Yet if I use only the text from the Regular Expression object, the HTML code appears. Furthermore, if I use 459 as the starting character for extracting the middle string, I get the correct result 99% of the time (once in a while, an HTML file's "<title>" tag seems to begin later).

    So, I'm not sure what I might be doing wrong. Any ideas?

    Thank you for giving me a hand!



    Most appreciatively...

    RGBreality

  6. #6
    Clicker (Multimedia Fusion 2 Developer)

    Pretty sure it should be (adapting RGBreality's expression):
    Mid$(GetString$( "Regular Expression object" ), indexOfSub( "String Parser", "<title>", 1)+7, indexOfSub( "String Parser", "</title>", 1)-indexOfSub( "String Parser", "<title>", 1)-7)
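Replaying both versions of the arithmetic on a short string shows what goes wrong with the expression in post #5. This Python sketch assumes, as Anders's formula implies, that `Mid$` takes a start position and a character count, and models it with a hypothetical 0-based `mid` helper; it checks the arithmetic only and does not reproduce the extensions. Both index lookups read the same string here.

```python
def mid(text: str, start: int, length: int) -> str:
    """Hypothetical Mid$ stand-in: `length` characters from 0-based `start`."""
    return text[start:start + length]


html = "<html><head><title>Help</title></head></html>"
open_at = html.find("<title>")    # 12
close_at = html.find("</title>")  # 23

# Post #5: "+6" stops on the ">" of "<title>", and "close_at - 7" is a position, not a length.
as_posted = mid(html, open_at + 6, close_at - 7)

# Start after all 7 characters of "<title>"; length = distance to "</title>".
start = open_at + 7
corrected = mid(html, start, close_at - start)

print(repr(as_posted))   # '>Help</title></h'
print(repr(corrected))   # 'Help'
```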

  7. #7
    Clicker (Multimedia Fusion 2 Developer, SWF Export Module)

    RGBreality, you don't need the Regular Expression object at all. Look at Andos' example. Use it with the raw, unparsed HTML.

  8. #8
    Clicker (Multimedia Fusion 2 Developer, Install Creator Pro)

    Hey, folks! Thank you for all your help!

    I was able to figure out a workaround (using the Regular Expression object to search for the "</title>" string).

    I'm afraid I can't pull directly from the raw HTML file, because I only have the directory path to it. So I have been using the Regular Expression object to load the raw HTML file into its internal string. This turned out to be necessary because I'm using a derivative of Nifflas' (sorry if I've misspelled that!) file-search example to find specific text in the file-search results, and that approach only gives me directory paths to the HTML files. (I have found that with Nifflas' groundwork, searching through thousands of pages of my company's HTML Help documents is TONS faster than using the Search object's indexing mode.)
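The workflow described here (only a path is known, so each file has to be loaded into a string before its text and title can be searched) looks roughly like this in Python. The helper names `title_of` and `titles_for_matches`, the `help` folder, the `*.htm*` pattern, and the UTF-8 encoding are all assumptions for illustration; this is not Nifflas' example or the Search object.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, Optional

TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_of(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the page has none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def titles_for_matches(paths: Iterable[Path], needle: str) -> Dict[Path, Optional[str]]:
    """Load each file from its path and return the titles of files containing `needle`."""
    results: Dict[Path, Optional[str]] = {}
    for path in paths:
        # Assumption: help files are UTF-8; undecodable bytes are replaced, not fatal.
        html = path.read_text(encoding="utf-8", errors="replace")
        if needle in html:
            results[path] = title_of(html)
    return results


if __name__ == "__main__":
    help_dir = Path("help")  # hypothetical folder of HTML Help pages
    if help_dir.is_dir():
        for path, title in titles_for_matches(help_dir.rglob("*.htm*"), "printer").items():
            print(path, "->", title)
```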

    (It could very well be that I'm doing this totally ass-backwards, as I'm definitely still an MMF2 newbie. But at least I'm getting the results I wanted!)

    So, the syntax I used for the Extract Middle String expression ended up like this:

    Mid$(GetString$( "Regular Expression object" ), 456, Submatch Start( "Regular Expression", 1)-458)

    (where "Submatch Start" gives the position of the "</" portion of the "</title>" tag).
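Instead of hard-coding 456 and 458, both the start and the length of the title can come from the match itself. In Python's `re` module a capturing group reports its own positions through `Match.start(1)` and `Match.end(1)`, which is the same idea as reading a submatch start in MMF2; the pattern and the `title_by_submatch` name are illustrative assumptions, and the positions are 0-based.

```python
import re
from typing import Optional

# Group 1 is the title text itself, so its start/end positions bracket the title.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_by_submatch(html: str) -> Optional[str]:
    match = TITLE_RE.search(html)
    if match is None:
        return None
    start = match.start(1)              # first character after "<title>"
    length = match.end(1) - start       # stops just before "</title>"
    return html[start:start + length]   # Mid$-style (start, length) slice


page = "<!DOCTYPE html>\n<html><head><meta charset='utf-8'><title>Release Notes</title></head></html>"
print(title_by_submatch(page))  # Release Notes
```

Because both numbers are computed for each file, a page whose "<title>" begins later than usual no longer shifts the result.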

    The search functionality of the application I'm developing has turned out really well (though, of course, I stand on the shoulders of MMF2 giants). If anyone is interested, I'd be glad to share the source file once I have finished the search parameters (though I am horrible at writing comments).

    Thanks again for everyone's assistance! I really do appreciate it!



    Most graciously...

    RGBreality
