
Ripping BrainDen to an XML File -- Thoughts?


5 replies to this topic

#1 EventHorizon

EventHorizon

    Senior Member

  • VIP
  • PipPipPipPip
  • 512 posts
  • Gender:Male

Posted 22 July 2012 - 05:57 AM

Someone posted that they would like to have a paper copy of BrainDen to take on flights, camping, etc. I thought it would be nice to have a local electronic copy to search through, annotate, rank, mark favorites, mark ones to work on later, etc. It wouldn't be too hard to export from an electronic copy to a text document, then into a word processor to format and print it out. XML is a standard, so I thought I might use that to store it.

I played around with handling XML documents in Java (I generally use Java when I want GUIs), worked up a simple little XML editor, and started manually copying and pasting puzzles into an XML document (replacing images with text art). This turned out to be a very slow process. I got to around thread 1700 of 15000 (580 total puzzles in the file) before deciding there was a better way to approach it (though it was interesting to see all the unsolved puzzles and old puzzles I had forgotten about but liked). On top of that, I wasn't recording who posted the puzzles, who gave the solutions, when each was posted, the BrainDen topic number (to easily find the topic on BrainDen later), etc.

Spoiler for For those who want to see my simple editor and incomplete database



I'm thinking I'll write a bash script to rip the first posts of all topics to a file, then manually prune repeats and other undesirables (should be much faster since I'm not loading web pages, and I won't be loading topic numbers that reach an error page), then manually go through each of them and find the posted solutions and add text art or descriptions in place of images.

I think I'll get at least the following pieces of information using the bash script:
-post date
-topic number
-who posted it
-title (if it exists)
-subforum it came from (ignoring the whole miscellaneous subforum)
-puzzle description (ie, the actual puzzle)
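A rough sketch of how one ripped topic could be stored as an XML record, using Python's standard `xml.etree.ElementTree` (Python is used here only for illustration; the thread itself mentions bash and Java). The element names (`puzzles`, `puzzle`, `posted`, `author`, `subforum`, `title`, `description`) and the sample values are assumptions made up for this example, not a layout settled on in the thread.

```python
import xml.etree.ElementTree as ET


def puzzle_element(topic, posted, author, subforum, body, title=None):
    """Build one <puzzle> element from the fields ripped for a topic.

    All element and attribute names are hypothetical.
    """
    puzzle = ET.Element("puzzle", topic=str(topic))
    ET.SubElement(puzzle, "posted").text = posted
    ET.SubElement(puzzle, "author").text = author
    ET.SubElement(puzzle, "subforum").text = subforum
    if title:  # the title is optional ("if it exists")
        ET.SubElement(puzzle, "title").text = title
    ET.SubElement(puzzle, "description").text = body
    return puzzle


root = ET.Element("puzzles")
root.append(puzzle_element(
    topic=1, posted="2012-07-22", author="ExampleUser",
    subforum="Example Subforum", body="Is 2 < 3 & 3 > 2?", title="Example"))

# ElementTree escapes <, > and & in text, so puzzle bodies stay valid XML.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the topic number as an attribute makes it easy to link a record back to its forum thread later.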

I'll do the following manually (after pruning undesirables):
-add solution (and person who posted it) or mark unsolved
-add tags to tell what kind of puzzle it is
-add Title (if it doesn't have one I'll make one up)
-replace images with text art or descriptions
-fix formatting

Here are some of the tags I'm thinking of using:
Spoiler for In a spoiler to conserve space


As for the viewer/editor, I'll plan to add in the following features:
-Export to text document (puzzles with their solutions, or all puzzles first with solutions after)
-Import file into current list
-Easily edit anything
-Can add in fields or tags and name them
-Tags as checkboxes
-Separate tab for each field (bigger text areas and don't need to see solution unless you click its tab)
-Ability to list by requested tag(s) and/or rank
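
The two export layouts in the feature list above (solutions inline, or all puzzles first and solutions after) could look roughly like this. This is only a sketch: the dictionary keys `title`, `description` and `solution` and the sample puzzles are made up for illustration.

```python
def export_text(puzzles, solutions_after=False):
    """Render puzzles as plain text.

    puzzles: list of dicts with hypothetical keys 'title', 'description'
    and 'solution' (missing or None when the puzzle is unsolved).
    """
    def solution_of(p):
        return p.get("solution") or "(unsolved)"

    lines = []
    for number, p in enumerate(puzzles, start=1):
        lines.append(f"{number}. {p['title']}")
        lines.append(p["description"])
        if not solutions_after:
            lines.append(f"Solution: {solution_of(p)}")
        lines.append("")
    if solutions_after:
        # Numbered answer key at the end, matching the puzzle numbers.
        lines.append("Solutions")
        for number, p in enumerate(puzzles, start=1):
            lines.append(f"{number}. {solution_of(p)}")
    return "\n".join(lines).rstrip() + "\n"


sample = [
    {"title": "Two Doors", "description": "Which door?",
     "solution": "Ask either guard."},
    {"title": "Open Problem", "description": "Still open."},
]
print(export_text(sample))
print(export_text(sample, solutions_after=True))
```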


Here are the questions I have:
Is there a better way to go about it than this approach?
Am I missing something that could easily be ripped using a script?
Should I not include some of the information listed above or include something else?
Did I miss any tags that would be good to use or do I have a useless one?
What other features should the new editor have?
Any other thoughts about this project?
Do you think this would be useful, or is it a waste of time?
Would anyone be interested in helping once I get to a point where I can divide workload?

#2 rookie1ja

rookie1ja

    Senior Member

  • Site Admin
  • PipPipPipPip
  • 1341 posts
  • Gender:Male
  • Location:Slovakia

Posted 22 July 2012 - 09:55 PM

That seems great. Let me know if I can help in any way. Unfortunately, I am not good at scripts, so perhaps I could help with other questions.

Should I not include some of the information listed above or include something else?
The more puzzles, the easier the interface and search must be. Let's keep it simple.

Did I miss any tags that would be good to use or do I have a useless one?
Depends on the final form. For instance, BrainDen Android App has just 4 categories.

What other features should the new editor have?
Not sure how I should imagine the editor and working with the database - some kind of android app or a program interface with dropdown menu to select type of puzzle?

Any other thoughts about this project?
What are the advantages compared to the live BrainDen version (apart from having it offline)? It's great to have just the best brain teasers listed, as you selected, but could you list some other advantages that would make this popular? Where, when, and by whom could it be used? Is there a better way to propagate it compared to the live BrainDen site?

Do you think this would be useful, or is it a waste of time?
BrainDen site has static pages (old best of selection), this forum (great fresh puzzles), igoogle gadgets and Android App. I wonder how this new channel would fit in the strategy - what gaps it fills and what the added value is.

Would anyone be interested in helping once I get to a point where I can divide workload?
I can help as my skills and free time permit.

Don't hesitate to drop me a note or reply to this thread if you would like to discuss in more details.

Thanks.

rookie1ja (site admin)
Optical Illusions
Support BrainDen

"To start: Press any key... Where's the 'any' key?" - Homer Simpson


#3 EventHorizon


Posted 23 July 2012 - 02:14 AM

I'm thinking the database will simply be a condensed form of the forum. It will have the puzzles and solutions without the discussion in between. It will occasionally need to be updated with the newest posted puzzles and solutions.

The more puzzles, the easier the interface and search must be. Let's keep it simple.
I was thinking the interface would be fairly similar to the one I used in the simple XML editor I made. It would have a drop down box at the top that by default lists every puzzle, but you can prune the list by tag values ("tag a, but not tag b" etc). But I guess it would be a good idea to think of other ways to simplify searching for puzzles in the database without increasing the size of the database much. The editor/viewer/app is separate from the database, so someone could always build another app to read and view the database.

The reason I like the tag approach is that it is independent of the actual wording of the puzzle. If the puzzle deals with geometry, it should be tagged "geometry" regardless of whether the description mentions geometry or even geometric terms. So each puzzle will have a set of tags that describe the essence of the puzzle and not just the way it was worded. The problem is finding enough useful tags so that finding any given puzzle doesn't take too long.
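
The "tag a, but not tag b" pruning described above comes down to two set checks per puzzle. A minimal sketch, assuming each puzzle is a dict with a `tags` set (the record shape and the sample tags are made up for illustration):

```python
def filter_by_tags(puzzles, required=(), excluded=()):
    """Keep puzzles that carry every required tag and none of the excluded ones."""
    required, excluded = set(required), set(excluded)
    return [
        p for p in puzzles
        if required <= p["tags"] and not (excluded & p["tags"])
    ]


catalog = [
    {"title": "A", "tags": {"geometry", "solved"}},
    {"title": "B", "tags": {"geometry", "probability"}},
    {"title": "C", "tags": {"logic"}},
]

# "geometry, but not probability"
matches = filter_by_tags(catalog, required=["geometry"], excluded=["probability"])
print([p["title"] for p in matches])
```

With no required or excluded tags the full list comes back, which matches the default "list every puzzle" behaviour of the drop-down.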

What are the advantages compared to live BrainDen version (apart from having it offline)?
It will be a condensed form so you can find the solutions a lot easier (click the solution tab instead of searching through pages of posts). Someone could go through the database and fix grammar errors, ambiguous descriptions, poor formatting, etc. It would be personalizable, so you could mark favorites, annotate, have a "to do" list tag for puzzles to work on later, etc. People could add in their own puzzles that they want to store and remember, but hopefully they would post them in the forum as well.

Where/when/by whom could it be used? Is there a better way to propagate it compared to the live BrainDen site?
BrainDen site has static pages (old best of selection), this forum (great fresh puzzles), igoogle gadgets and Android App. I wonder how this new channel would fit in the strategy - what gaps it fills and what the added value is.
It's just a condensed mirror of the forum really. I guess you could think of it as something that goes along with the forum. I'm not really sure how much value it would have to the community, which is why I'm interested in people's thoughts on this. Hopefully it can become something useful to many and not just some personal project.


My plan for splitting up the work is to have people each take a section of the database to make pretty, add solutions, tag puzzles, and maybe add in hints. So first I'd need to get the script done (which shouldn't be that bad) and make an editor to use (not necessarily complete and polished, just enough to edit and tag), then split up sections to be worked on and returned to me to be combined. I'm sure this would leave duplicates, so they'll be taken care of later as they are found.

I've got a script that is almost ready to get all the first posts (I've parsed out all the information), but I need to decide how to work with bbcode like spoilers and such. I also need to decide how to organize the XML/database so I can have the script output it in a form ready to be distributed and worked on.
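
One possible way to deal with spoilers is to pull them out of the post body and store them separately, leaving a numbered placeholder behind. The sketch below assumes the raw text uses `[spoiler]...[/spoiler]` or `[spoiler=Title]...[/spoiler]`; the exact markup the forum emits is an assumption here, and nested spoilers are not handled.

```python
import re

# Assumed bbcode syntax; adjust the pattern to whatever the forum really emits.
SPOILER = re.compile(
    r"\[spoiler(?:=(?P<title>[^\]]*))?\](?P<body>.*?)\[/spoiler\]",
    re.DOTALL | re.IGNORECASE,
)


def split_spoilers(text):
    """Return (visible_text, spoilers), where spoilers is a list of (title, body)."""
    spoilers = []

    def stash(match):
        title = (match.group("title") or "").strip("'\" ")
        spoilers.append((title, match.group("body").strip()))
        return f"[spoiler {len(spoilers)}]"  # placeholder left in the visible text

    visible = SPOILER.sub(stash, text)
    return visible, spoilers


post = "Puzzle text [spoiler=Hint]think small[/spoiler] end [spoiler]42[/spoiler]"
visible, spoilers = split_spoilers(post)
print(visible)
print(spoilers)
```

Each extracted spoiler could then become its own field or child element of the puzzle record, so a viewer can hide it until requested.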

#4 rookie1ja


Posted 23 July 2012 - 08:38 AM

Is it possible for a visitor to click just one thing on the web and immediately be browsing the database? In other words, no need to install anything on the computer (e.g. no need to install a Java environment), with everything visible on any desktop or mobile device (not depending on the operating system or installed programs)? I guess if we want an offline version then a lot of info has to be downloaded, but let's keep it to a reasonable size for mobile phones.

Will the search work only based on tags or will it be full text search? It might be interesting to have both options - tags if you like a certain type of puzzle, and full text search if you want to find a puzzle you read a few days ago (e.g. search all "Morty's" puzzles).

Adding a tag for difficulty might help - users could solve puzzles suited to their skill level. One admin should rate all puzzles to avoid discrepancies.

Link to solution might be useful - if someone disagrees and wants to check the reasoning of others.

Is there an easy way to incorporate the database into the Android App? The major obstacle used to be that the solution has to be found in the forum thread, while your approach could give the solution directly. On one hand, the Android App would be a great place to have all content available offline; on the other hand, it might be too big in MB to download all puzzles onto a phone. So I am still thinking of an effective way of distribution.

I think that nature of this database (offline browsing of well sorted content anywhere) would have the best use on mobile phone - that's what we could focus on. What do you think?

Thanks.


#5 EventHorizon


Posted 24 July 2012 - 09:53 AM

I thought of another advantage. You can easily find all the unsolved puzzles. I'm kinda interested in the puzzles that have slipped through without being answered (I've already seen a few interesting ones from my very incomplete database). I know there are a few I've forgotten about, but had an idea to try or was planning on writing code to solve. There are also some puzzles whose answers were never confirmed. Some optimization problems may have better solutions out there somewhere. Perhaps each puzzle can be tagged one of unsolved, unconfirmed, or solved.

Will the search work only based on tags or will it be full text search? Might be interesting to have both options - tags if you like a certain type of puzzles and full text search if you want to find a puzzle you read a few days ago (eg. search all "Morty's" puzzles).
If they have access to the web, they could text search on the forum. Searching by author would be quick like tag searches. I was trying to avoid full text search. I don't want to deal with caching, creating word indexes, or other such optimizations. Depending on the platform it may be quick enough to search linearly through the database, but maybe not. Some optimization may not be that bad... but I haven't really looked too much into search optimization.
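
For comparison, here is roughly what the two options look like: a linear scan that re-reads every puzzle on each query, and a simple word index built once up front. This is only a sketch of the trade-off; the tokenizer, the `topic -> text` dict and the sample texts are made up for illustration.

```python
import re
from collections import defaultdict


def words(text):
    """Lowercased set of words in text (very naive tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def linear_search(puzzles, query):
    """Scan every puzzle on every query; no extra storage needed."""
    wanted = words(query)
    if not wanted:
        return []
    return sorted(t for t, text in puzzles.items() if wanted <= words(text))


def build_index(puzzles):
    """Word index: maps each word to the set of topics containing it."""
    index = defaultdict(set)
    for topic, text in puzzles.items():
        for word in words(text):
            index[word].add(topic)
    return index


def indexed_search(index, query):
    """Intersect the topic sets of all query words."""
    sets = [index.get(word, set()) for word in words(query)]
    return sorted(set.intersection(*sets)) if sets else []


texts = {
    1: "Morty's bridge crossing puzzle",
    2: "A weighing puzzle with coins",
    3: "Morty's coins",
}
index = build_index(texts)
print(linear_search(texts, "morty's"), indexed_search(index, "morty's"))
```

The index answers queries without touching the puzzle texts again, at the cost of building and storing it, which is the extra work being weighed above.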

Adding tag for difficulty might help - user could solve puzzles adequate to the level of skills. 1 admin person should rate all puzzles to avoid discrepancies.
I think that's a great idea. Having one person rate them all may not necessarily remove discrepancies unless they follow a strict rubric of some sort. Having been a T.A. and graded tests I've seen how grading subjective answers will tend to let scores slide around over time without something to tie it down. Then again, a simple easy, medium, or hard (and very hard / impossible?) doesn't seem like it would be too hard to get a good approximation even with different people rating.

It may be easy to estimate the solution for some puzzles quickly or to be able to show a solution works easily, but it may have a tough logical derivation / proof. Things like this will complicate marking the difficulty. Some puzzles have an easy way to solve and a hard way, too. Perhaps multiple difficulty ratings (one for each method and one for proof), but that may be overkill. Maybe just give the range of difficulty since some puzzles include multiple parts.

Link to solution might be useful - if someone disagrees and wants to check reasoning of others.
Again, that would be another good thing to include in the database. All it would need is post number(s). I think it would be good to include the reasoning/proofs in the database too (like I did, where applicable, with the incomplete database). Perhaps in a separate text field, now that I think of it. That way it could easily be separated programmatically from the simple answer if the database needs to be made smaller.

Is it possible that visitor just clicks 1 thing from the web and he/she immediately is browsing the database. In other words, no need to install anything on computer (eg. no need to install Java environment) and everything can be visible on any desktop/mobile (not depending on operating system or installed programs)?
Is there an easy way to incorporate the database into Android App? The major obstacle used to be that the solution has to be found in the forum thread while your approach could give the solution directly. On one hand, Android App would be a great place to have all content available offline, however, on the other hand it might be too big in MB to download all puzzles onto phone. So I am still thinking of an effective way of distribution.
I think that nature of this database (offline browsing of well sorted content anywhere) would have the best use on mobile phone - that's what we could focus on. What do you think?

If you are not requiring Java, then I think that leaves HTML5 or Flash for the one-click web options. That assumes we don't want simple HTML pages (which would be really easy to generate from the database with code), CGI, servlets, etc. I don't know either HTML5 or Flash (I've looked into Flash a bit), but I have been meaning to learn one. I think it could be doable, definitely on a computer, since there are Flash game files larger than the compressed database will be. A mobile version may need restructuring depending on the platform.

The Android app itself would simply be a reader/viewer for the database (it could potentially have the database embedded in it, but I'd prefer otherwise). The XML file I was thinking of will likely be around 7-10 MB uncompressed and about a third of that compressed. Both figures are extrapolated from the size and compression rate of my incomplete file, so they may be off, though I don't think it would be more than 30 MB. We could also separate the puzzles and solutions from the information about them: tags, author, title, etc. would go in an index/"Table of Contents" file, along with pointers into the larger file of puzzles and solutions (which could be stored in some compressed form, decompressing only the puzzle/solution requested). That way less data needs to be stored in memory and traversed regularly, and it would probably work fine for an Android app. I have no experience with Android or its apps, but I just checked online and I think most cheap Android phones have about 512 MB of working memory, support for an SDHC card (so lots of storage space), and around a 1 GHz processor. That should be more than enough of each, by a lot.
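
The index-plus-pointers idea could be sketched like this: each puzzle/solution text is compressed on its own and appended to one blob, while a small index remembers where each record starts and how long it is. The function names and the `topic -> (offset, length)` index shape are assumptions for illustration; compression uses Python's standard `zlib`.

```python
import zlib


def pack(records):
    """Compress each record separately and concatenate the results.

    records: dict mapping topic number -> text.
    Returns (blob, index), where index maps topic -> (offset, length) in blob.
    """
    blob = bytearray()
    index = {}
    for topic, text in records.items():
        chunk = zlib.compress(text.encode("utf-8"))
        index[topic] = (len(blob), len(chunk))
        blob += chunk
    return bytes(blob), index


def read_record(blob, index, topic):
    """Decompress only the requested record."""
    offset, length = index[topic]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")


records = {101: "puzzle " * 50, 102: "solution text"}
blob, index = pack(records)
print(index)
print(read_record(blob, index, 102))
```

Compressing records one at a time generally gives a worse ratio than compressing the whole file at once, so the "a third of the size" estimate would probably not carry over exactly to this layout.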

Once the database is made, it can be changed programmatically to be used on specific platforms. We'd just need to know what information the app on each platform will need left in the database, and what form the database needs to take. Of course, we'd need to actually code up the reader app on the specific platforms. It shouldn't be that hard to get it working on any specific platform (with the possible exception of iPhone/iPad, since you need a Mac, though it may be possible with a hackintosh virtual machine; I still haven't gotten around to looking into that).

The simplest app for me would just be some Java code on my computer. It does sound like a good idea to have an Android app. We could also try a Flash version for the web (just the index so it's small, linked to the forum). We just need to get the database ready, then find ways to display it later.

#6 rookie1ja


Posted 26 July 2012 - 09:25 AM

Solved/Unsolved tag would be quite useful.

Full text search online and tag search in the app is OK with me due to the mentioned constraints.

Puzzle difficulty grade (easy/medium/hard) - I trust your judgement and rating. Let's keep it simple - 3 categories rating puzzle itself, 1 rating per puzzle.

Link to solution/discussion agreed to be included.

Platform of distribution
Flash - no (poor Google indexing, meant more for interactive games than for text, and we lack the skills for easy/fast creation)
HTML5 - maybe (easier to maintain, good indexing, however separate HTML pages would just duplicate the forum content in another form)
Android App - still my favorite. If we agree to proceed within the existing BrainDen Android app, then I could get you the details on what form of XML could be easily incorporated into the app. iPhone/iPad will not be the primary target for the time being.

To summarize, incorporating the database into existing BrainDen Android App is the way to go. If you agree, let me know what details on the App you would need and I will send them to you.

Thanks.