Import scraped CSV from EAv2
Posted: Mon Dec 09, 2024 5:20 pm
by dandelion
I created a script to scrape EAv2 and export it to CSV. It contains the thread title, post author, post date/time, and post content. Is it possible to import it here, maybe as a static subforum? Here is a sample run for the "Comments and Suggestions" subforum. See the link below for the zipped CSV file:
https://drive.google.com/file/d/1asu3Ei ... share_link
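For anyone who wants to try the same kind of export, here is a minimal sketch of writing and reading posts with those four columns. The column names (`thread_title`, `post_author`, `post_datetime`, `post_content`) and the sample row are assumptions for illustration; the actual script and file may name and order things differently.

```python
import csv
import io

# Hypothetical column names; the real CSV may use different headers.
FIELDS = ["thread_title", "post_author", "post_datetime", "post_content"]


def write_posts(posts, out):
    """Write post dicts as CSV, quoting every field so that commas,
    quotes, and line breaks inside post content survive intact."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(posts)


def read_posts(src):
    """Read the CSV back into a list of dicts keyed by column name."""
    return list(csv.DictReader(src))


# Made-up sample row with a multi-line body, a comma, and embedded quotes.
sample = [
    {
        "thread_title": "Example thread",
        "post_author": "alice",
        "post_datetime": "2024-12-09 17:20",
        "post_content": 'First line\nSecond line, with a comma and "quotes"',
    }
]

# newline="" stops Python from translating line endings, as the csv docs advise.
buf = io.StringIO(newline="")
write_posts(sample, buf)
buf.seek(0)
round_tripped = read_posts(buf)
```

Quoting every field makes the file a little larger, but multi-line post bodies then round-trip without any custom escaping. When working with a file on disk, open it with `newline=""` as well.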
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 4:47 am
by fusion
Thanks for this. I am just waiting to see if EAv2 will come through with passing us the current forum's SQL file.
The SQL file will be the best way to move topics over. It will take some time but it will bring everything across.
Text-based versions will of course be great to have, and if EAv2 does not come through with the above, they will be the only way forward.
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 4:58 am
by dandelion
I feel like the old admins are just done with the site and ready to move on. They've done their jobs by handing over a copy of the drive to Kinsey.
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 6:52 am
by fusion
We are looking into that with them. If all goes the way EAv3 would like, we would hold onto eunuch.org, but sadly eunuchworld.org would be let go.
We still have till 01/01/2025 so we will wait and see what happens before importing the old forum here.
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 8:04 am
by WheelyFixed
FWIW I just looked at the first 500 lines or so in a text editor, and it looked OK although it was hard to tell the difference between messages and quotes, and many of the messages seemed to have had the 'back trace' of previous messages in the thread...
LibreOffice Write didn't seem to want to do nice things with it, but I might have been able to do something to make it work if I really worked at it...
I think everything is there, but it might need a good bit of 'cleaning up' or other processing to suck it back into the forum in a usable format, which is beyond my skillset...
WheelyFixed
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 8:21 am
by dandelion
Perhaps I can make it so that the Post Content is the raw HTML of the site. That will require some massaging, but at least it'll be somewhat more readable.
Another option would be to scrape the static HTML files of the forum. Hopefully that will preserve the links, and we can just host it as a separate site.
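A rough sketch of how keeping the raw HTML could help tell messages from quotes, using only Python's standard library. It assumes quoted material is wrapped in `<blockquote>` tags, which may not match the actual EAv2 markup; `PostTextExtractor` and `split_post` are hypothetical names.

```python
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collect a post's own text separately from text inside <blockquote>.

    Assumption: quotes are marked up with <blockquote>; adjust the tag
    check if the scraped forum uses a different element or class.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.quote_depth = 0   # > 0 while inside one or more blockquotes
        self.own_text = []
        self.quoted_text = []

    def _target(self):
        return self.quoted_text if self.quote_depth else self.own_text

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1
        elif tag == "br":
            self._target().append("\n")  # keep line breaks readable

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        self._target().append(data)


def split_post(raw_html):
    """Return the raw HTML plus plain-text versions of the post and its quotes."""
    parser = PostTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return {
        "html": raw_html,
        "text": "".join(parser.own_text).strip(),
        "quoted": "".join(parser.quoted_text).strip(),
    }


# Made-up post body for illustration.
example = "<blockquote>earlier message</blockquote>My reply<br>second line &amp; more"
result = split_post(example)
```

Storing all three values per post keeps the original markup available for a later import while still giving a plain-text column that is easy to read in a text editor.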
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 9:39 am
by fusion
Just wanted to let you know I can set up a GitLab for code hosting so we can keep it away from GitHub, as we don't need to be giving Microsoft access to our information.
Let me know if you would like me to set this up.
Re: Import scraped CSV from EAv2
Posted: Tue Dec 10, 2024 2:26 pm
by dandelion
Oh, you mean instead of me uploading stuff to Google Drive? Yeah, that may be a good idea. I'll have a "scrap pad" to put stuff on as I work through the archive.
fusion wrote: Tue Dec 10, 2024 9:39 am
Just wanted to let you know I can set up a GitLab for code hosting so we can keep it away from GitHub, as we don't need to be giving Microsoft access to our information.
Let me know if you would like me to set this up.
Re: Import scraped CSV from EAv2
Posted: Sat Dec 14, 2024 8:59 am
by Nulloguy
dandelion wrote: Mon Dec 09, 2024 5:20 pm
I created a script to scrape EAv2 and export it to CSV. It contains the thread title, post author, post date/time, and post content. Is it possible to import it here, maybe as a static subforum? Here is a sample run for the "Comments and Suggestions" subforum. See the link below for the zipped CSV file:
https://drive.google.com/file/d/1asu3Ei ... share_link
I took a quick look and it seems like it would have gotten most everything. Still, there was so much going on over there in the past nearly 25 years, I think it would take some wunderkind to sort out if that's the case.
It appears the post content is there, but I'm not sure the post structure and hierarchy are. Though with the columns of the CSV, I think someone with the skills could import it into a PHP system, and work with something like WordPress to get it going.
Frankly, I think some sort of system there would be better than phpBB, but I'm not the one doing this, and I greatly appreciate the efforts of the folks who are taking this on.
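On the structure question, a flat CSV with those columns can be regrouped into threads fairly easily. Here is a minimal sketch, assuming hypothetical column names (`thread_title`, `post_author`, `post_datetime`, `post_content`) and assuming the timestamps look like the ones shown on this board (e.g. "Mon Dec 09, 2024 5:20 pm"); the scraped file may differ on both counts.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, e.g. "Mon Dec 09, 2024 5:20 pm".
# Parsing day and month names this way expects an English locale.
DATE_FORMAT = "%a %b %d, %Y %I:%M %p"


def build_threads(rows):
    """Group flat post rows by thread title and order each thread by post time.

    Caveat: threads that share the same title get merged; a thread ID
    column, if the scrape can capture one, would be a safer key.
    """
    threads = defaultdict(list)
    for row in rows:
        posted = datetime.strptime(row["post_datetime"], DATE_FORMAT)
        threads[row["thread_title"]].append({**row, "posted": posted})
    for posts in threads.values():
        posts.sort(key=lambda post: post["posted"])
    return dict(threads)


# Made-up rows, deliberately out of order.
rows = [
    {"thread_title": "Topic A", "post_author": "bob",
     "post_datetime": "Tue Dec 10, 2024 4:47 am", "post_content": "reply"},
    {"thread_title": "Topic A", "post_author": "alice",
     "post_datetime": "Mon Dec 09, 2024 5:20 pm", "post_content": "opening post"},
    {"thread_title": "Topic B", "post_author": "carol",
     "post_datetime": "Sat Dec 14, 2024 8:59 am", "post_content": "another thread"},
]
threads = build_threads(rows)
```

This only recovers thread order, not who replied to whom. Reply hierarchy would need quote or reply markers from the original HTML.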
Re: Import scraped CSV from EAv2
Posted: Sat Dec 14, 2024 11:24 am
by dandelion
Yeah, I'm probably going to modify the script so that instead of capturing the plain text, it captures the HTML of the post contents. Or perhaps both, for future flexibility.
But I'm rather pre-occupied with building the fiction archive v3...
Nulloguy wrote: Sat Dec 14, 2024 8:59 am
I took a quick look and it seems like it would have gotten most everything. Still, there was so much going on over there in the past nearly 25 years, I think it would take some wunderkind to sort out if that's the case.
It appears the post content is there, but I'm not sure the post structure and hierarchy are. Though with the columns of the CSV, I think someone with the skills could import it into a PHP system, and work with something like WordPress to get it going.
Frankly, I think some sort of system there would be better than phpBB, but I'm not the one doing this, and I greatly appreciate the efforts of the folks who are taking this on.