Wikisource:Scriptorium

The Scriptorium is Wikisource's community discussion page. Feel free to ask questions or leave comments. You may join any current discussion or start a new one; please see Wikisource:Scriptorium/Help. Project members can often be found in the #wikisource IRC channel (webclient). For discussion related to the entire project (not just the English chapter), please discuss at the multilingual Wikisource. There are currently 337 active users here.

Announcements

New template for hyphenated words across pages

Having just run across a work that tripped multiple edge cases in how ProofreadPage joins together pages, I finally put together a utility template to make dealing with these easier.

The details are in the template documentation, but the short version is that if you have a hyphenated word that has been split across pages (i.e. where the word should still be hyphenated when transcluded into mainspace), or where the page ends with something (like an em-dash) that should be joined with the following page without inserting a space character, you can throw a {{peh}} at the end and get the desired effect in both Page: and mainspace (or in Translation:, or anywhere else). It defaults to a hyphen (“-”) when no arguments are provided, or uses its first argument otherwise (e.g. {{peh|—}}).

It's had limited testing, but it's simple enough that I don't think there's much risk of weirdness.

Oh, also, you can (I think) achieve the exact same results using {{hws}}/{{hwe}}, so if you're already using those for this then there's no particular reason to switch. This is just intended as a simpler and easier to use way to achieve the same result for those of us (surely I'm not the only one? Right? Right…?) who find {{hws}}/{{hwe}} complicated and confusing to use for these scenarios. --Xover (talk) 17:58, 13 November 2019 (UTC)

Hm, that is a clever workaround… Much easier than hws/hwe. --Jan Kameníček (talk) 18:07, 13 November 2019 (UTC)

Proposals

Proposed changes to WS:WWI regarding advertisements

There is a proposal to update the wording of our policy regarding the inclusion of advertisements, in particular advertisements that are part of a larger transcluded text. Please see the discussion at Wikisource talk:What Wikisource includes#Proposed changes to Advertisement section. —Beleg Tâl (talk) 13:50, 15 November 2019 (UTC)

This is more of a clarification than a change of policy. The previous instructions were very vague and confusing. Kaldari (talk) 23:27, 18 November 2019 (UTC)

Automatically pull text status from Wikidata badges and display on main page of work

Per the discussion below at Wikisource:Scriptorium#KaldariBot, Sam Wilson and myself have set up a sandbox version of the header template that automatically pulls the text status of a work from its associated Wikidata item (based on the badge assigned to the sitelink) and then displays the appropriate indicator icon(s) in the upper-right part of the page and assigns the page to the appropriate category. (The icon and category are both taken from the properties of the badge's Wikidata item and are thus easily configurable.) To see some live examples of this, take a look at The Life of the Spider and The Riverside song book/The Open Window. Note that the template currently only pulls Wikisource badges (and thus not "featured" status since we are piggy-backing on Wikipedia's featured article badge), but this could easily be changed (See this and this). Please indicate below whether you think we should apply this functionality to the main {{header}} template, and thus to all works on Wikisource. Kaldari (talk) 23:42, 19 November 2019 (UTC)

@Billinghurst, @Mpaa, @Beleg Tâl: ^. Kaldari (talk) 23:43, 19 November 2019 (UTC)
@Kaldari: fantastic work! Is there a reason why it checks for mainspace in the invocation? I ask out of curiosity, because {{header}} should only be used in mainspace anyway, so hardcoding that restriction into only the one component seems unnecessary —Beleg Tâl (talk) 23:58, 19 November 2019 (UTC)
No reason other than being overly cautious. I'll remove the restriction. Kaldari (talk) 00:10, 20 November 2019 (UTC)
  • I think it'd be terrific to have badges displayed more prominently here, so that maybe they get used more and so it's easier to query for works that are fully validated. Another thing I've been wondering about is digital document (Q28064618): do we have a category for these here? We should add it as topic's main category (P910). —Sam Wilson 10:01, 20 November 2019 (UTC)
    Yes, it is a very good idea, thank you for introducing it. I also support the more prominent display of the badges. The text that appears when you hover over the badge (currently: "Help:Text status") might also be changed and tell the reader directly which status it is. --Jan Kameníček (talk) 10:58, 20 November 2019 (UTC)
    What is digital document (Q28064618) and why is it useful? Wouldn't every single item on this site fall into such a category? —Beleg Tâl (talk) 11:50, 20 November 2019 (UTC)
    • It's used to mark w:born-digital works, which might have a perfect text layer and not need proofreading, and for which the 'original' is of less importance (i.e. multiple copies are identical). Its history is described in phab:T153186. —Sam Wilson 23:39, 20 November 2019 (UTC)
  • Question: For Featured texts, would we want to do the same? I'm hesitant to store our badges off-site because it means that we won't have notification if a text's status is changed remotely. For fully validated works, I can see a simple means to double-check with a bot. But what would this mean for our Featured texts? --EncycloPetey (talk) 16:49, 20 November 2019 (UTC)
    • It's possible to show Wikidata changes on your watchlist here. I know that can be annoying sometimes, because there can be lots of edits, but it does make it easier to catch changes to badges and other metadata, without having to go to Wikidata. —Sam Wilson 23:39, 20 November 2019 (UTC)
      • But does that mean we have to rely on editors here watching certain pages over there to catch this? What happens when membership here changes, and no one is watching those pages any longer? You've indicated that this is a thing which is possible, but is it advisable to do it that way? --EncycloPetey (talk) 00:30, 22 November 2019 (UTC)
        • @EncycloPetey: Nope, it's easier than that: for any page you watch here, changes to its Wikidata item will appear in your watchlist, regardless of whether you watch the page over on Wikidata or not. —Sam Wilson 03:23, 22 November 2019 (UTC)
          • Cool! Does this require setting a preference? -Pete (talk) 05:08, 22 November 2019 (UTC)
            • @Peteforsyth: It can be enabled as a preference (in the Watchlist section of Special:Preferences) but it can also be turned on directly from the watchlist page (and saved in a watchlist filter, if desired; I have it enabled in my default filter). It's also available for Special:RecentChanges. —Sam Wilson 05:53, 22 November 2019 (UTC)
      • @Samwilson: That only answers the first part of my concern, not the second part. Doing things this way means that the only way we can keep track is if someone continuously active here has those pages on their watchlist, maintains constant vigilance, and never leaves. --EncycloPetey (talk) 05:45, 22 November 2019 (UTC)
        • @EncycloPetey: But that's true of pages here as well, isn't it? If a page here isn't on anyone's watchlist, then changes to it are likely to not be noticed. And with the categorization feature, badged pages will be added to relevant categories and so people watching those categories can see when things come and go. —Sam Wilson 05:53, 22 November 2019 (UTC)
            • No, that's not true. The difference is that, under our current procedures, anyone watching Recent Changes here can spot that kind of edit if it happens locally, and does not have to have special pages in their Watchlist. By contrast, the proposal would require current editors to add a specialized set of pages to their Watchlist and would require whoever is monitoring to remain active and vigilant in perpetuity, or at least perpetuate through other monitors. And if the person who has these pages on their Watchlist is away for a week, they might miss such changes. --EncycloPetey (talk)
          • Agreed, this is true of everything here. But there is an important point here. Currently featured texts are always protected, so admins always know if {{featured}} is removed from a page regardless of whether anyone is watching the page. We can't protect Wikidata items in the same way. I think we would need a bot to handle it, periodically checking that Wikidata badges are correct and fixing them when they are not. (Such a bot would also be useful to ensure the proofread status badge correctly matches the Index status field, at least until the Index status field is updated to store the info directly in Wikidata.) —Beleg Tâl (talk) 13:42, 22 November 2019 (UTC)
            • But, as with having a Watchlist, it requires someone to have a bot and to use it regularly. This is an added specialist maintenance task. I've been involved in wikis when the one person with the necessary bot stopped editing, or stopped running the bot, or the community lost access to the bot. --EncycloPetey (talk) 16:44, 22 November 2019 (UTC)
              • @EncycloPetey: My initial proposal was to manage all of this through specific Wikisource templates rather than using Wikidata (similar to how {{featured}} works). However, Beleg Tâl thought using Wikidata would be a better idea. Frankly I don't care which method is used, my only goal is to make validated texts discoverable. Perhaps I should run a poll to find out which option has more support. Kaldari (talk) 19:25, 22 November 2019 (UTC)
                • Featured texts are a bit different from Validated texts.
                  • The primary indicator of a work's Featured status is the presence of {{featured}} on the work page itself. Thus the addition of the badge and the conferring of Featured status are actually the same thing.
                  • The primary indicator of a work's Validated status is the Progress field of the transcluded Index page. Copying this status manually from the Index page to a template in Mainspace is the sort of task which is invariably ignored by most editors and generally leads to a massive and ever-increasing backlog. Fortunately there is a tool whose entire purpose is to allow metadata to be centrally stored so as to prevent this exact issue; that tool is of course Wikidata. And as it happens, Wikidata already has a system set up to store this exact metadata in the form of badges.
                  • The automatic sync between Index page and Wikidata does not yet exist, so the backlog I spoke of can be seen by observing how few Wikisource texts have Wikidata items with status badges (i.e. almost all of them lack one).
                  • Creating a bot to sync the data between Index pages and Wikidata, and then pulling the data directly from Wikidata, is exactly how Wikidata is intended to be used. Having a bot sync the data between Index pages and a third location (i.e. mainspace) is re-inventing the wheel that is already in place in the form of Wikidata, and it still leaves the Wikidata backlog unaddressed. —Beleg Tâl (talk) 19:54, 22 November 2019 (UTC)
                • @Kaldari: Your initial proposal was only in regard to Validated works, which is a totally different thing from Featured works, as Beleg Tâl has explained above. The Validated works have an unprocessed backlog that is constantly growing, and that is worth addressing with some innovative solution. The Featured texts have no such backlog, and produce new items at a maximum rate of one per month. Creating a bot-reliant monitoring process that depends on outside data storage for the Featured texts is using a particle collider to open a peanut. --EncycloPetey (talk) 21:01, 22 November 2019 (UTC)
                  • @EncycloPetey: Got it. So you would prefer that text statuses like "validated" and "proofread" be handled with bots and Wikidata (which I'm happy to implement), but that "featured" continue to be handled manually with a template? If so, that's fine with me. Kaldari (talk) 22:24, 22 November 2019 (UTC)
                    • Yes, at least for now, based on the way cross-wiki data is handled. --EncycloPetey (talk) 00:24, 23 November 2019 (UTC)
                      • I agree with this solution too. I am also not convinced that anybody would notice if somebody changed the status at Wikidata. Wikidata's anti-vandalism protection is very low by our standards, and although it is possible to turn on showing Wikidata changes here, my experience is that quite few people do so, and the chance that such a change gets through unnoticed is high. --Jan Kameníček (talk) 11:52, 23 November 2019 (UTC)
@EncycloPetey, @Jan.Kamenicek: I have withdrawn my proposal on Wikidata per your feedback. Do you have any further concerns with moving this forward? Kaldari (talk) 18:21, 24 November 2019 (UTC)
I have no other objections. Thank you for introducing the badges! --Jan Kameníček (talk) 20:09, 24 November 2019 (UTC)
Just one more detail: Above I suggested changing the text that appears when you hover over the badge, but I do not know whether my idea has been rejected or just gone unnoticed. Currently the text says just "Help:Text status". I suggest replacing this text with the status itself, e.g. "proofread" or "validated". --Jan Kameníček (talk) 20:20, 24 November 2019 (UTC)
@Jan.Kamenicek: Great suggestion. I'll see about implementing that. Kaldari (talk) 18:13, 25 November 2019 (UTC)
@Jan.Kamenicek: I've implemented your suggestion in the sandbox template. See The Riverside song book/The Open Window for example. Kaldari (talk) 21:00, 26 November 2019 (UTC)
Perfect, thanks. --Jan Kameníček (talk) 21:44, 26 November 2019 (UTC)
No, I have no other concerns besides Featured texts. If the community is fine with marking Validation status on Wikidata, then I back that as well. --EncycloPetey (talk) 21:55, 24 November 2019 (UTC)
Just waiting for someone to make the change at {{header}}. Kaldari (talk) 18:37, 28 November 2019 (UTC)
This is done now. Kaldari (talk) 22:02, 3 December 2019 (UTC)
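
For readers who want to see the shape of the data discussed in this thread, here is a minimal JavaScript sketch of the lookup: reading the badges on a work's sitelink and turning them into a status label and a category. It is not the code used in the {{header}} sandbox. The function name `statusFromSitelink`, the `BADGE_INFO` table, and the placeholder badge IDs are all assumptions made for illustration; only the `sitelinks.<site>.badges` array follows the shape returned by the Wikidata `wbgetentities` API.

```javascript
// Hypothetical lookup table: badge item ID -> status label and category.
// The IDs are placeholders, NOT real Wikidata items. In the thread, the icon
// and category are instead read from the badge item's own properties.
const BADGE_INFO = {
  Q_PROOFREAD_BADGE: { label: 'proofread', category: 'Category:Proofread texts' },
  Q_VALIDATED_BADGE: { label: 'validated', category: 'Category:Validated texts' },
};

// entity: one entry of the "entities" object in a wbgetentities response.
// site: the sitelink key, e.g. 'enwikisource'.
// Returns the known statuses for that sitelink (empty array if none).
function statusFromSitelink(entity, site, badgeInfo = BADGE_INFO) {
  const sitelink = entity && entity.sitelinks && entity.sitelinks[site];
  if (!sitelink) return [];
  const badges = sitelink.badges || [];
  return badges
    .filter((id) => Object.prototype.hasOwnProperty.call(badgeInfo, id))
    .map((id) => badgeInfo[id]);
}

// Example entity, hand-written to match the API's sitelink shape.
const exampleEntity = {
  sitelinks: {
    enwikisource: {
      site: 'enwikisource',
      title: 'The Life of the Spider',
      badges: ['Q_VALIDATED_BADGE'],
    },
  },
};

console.log(statusFromSitelink(exampleEntity, 'enwikisource'));
```

Unknown badges are ignored rather than treated as errors, so a sitelink carrying a badge that is not in the table (for example a Wikipedia-style "featured" badge) simply yields no indicator.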

Bot approval requests

KaldariBot

Comment: If we are going to do this, what is the possibility of putting the "validated" flag on the Wikidata item interwiki for the respective works? Note that I gather this is for situations where the Index: page has been marked as validated and there is a one-to-one relationship with a main namespace page. Note also that this list does include subpages of works where the works have been uploaded as parts; some would warrant listing independently as validated, others not so. [A find of <code/ shows interesting output. — billinghurst sDrewth 05:35, 31 October 2019 (UTC)
@Kaldari: I agree with billinghurst regarding subpages. I would leave them out as it might be controversial and limit entries to top level items for now. Mpaa (talk) 14:44, 10 November 2019 (UTC)
Please check your list for redirects. I have found one redirect in the list. — billinghurst sDrewth 05:39, 31 October 2019 (UTC)
I was planning to have the bot follow redirects when posting the template, but I'll just go ahead and replace any redirects with their targets in the list... Kaldari (talk) 23:03, 31 October 2019 (UTC)
@Billinghurst: I've replaced all the redirects in the list with their ultimate targets. Kaldari (talk) 19:42, 1 November 2019 (UTC)
Adding the flag in Wikidata should be fairly easy to do if the wikibase-api library supports it. If not, I'll need to dig into the actual Wikibase API, which might be complicated. Kaldari (talk) 00:12, 1 November 2019 (UTC)
It looks like wikibase-api does support setting badges. However, I think it would be best to add the badges after all the pages have been templated, categorized, and reviewed for accuracy. My list is mainly based on following title links from the Indexes. However, I've noticed that use of the title field varies considerably. Some link to disambiguation pages, some link to multiple pages, and some don't have links at all. I've tried to go through all the ones that are obviously problematic and fix them by hand, but I imagine I will have missed some. Once the pages are added to Category:Validated texts (by the template), it will be easier to review them all, as I can just open them in new tabs from the category page. Once they are reviewed, anyone could write a script to badge all the pages in the category. Kaldari (talk) 19:41, 1 November 2019 (UTC)
I think that we should have the header module pull the Validated status from Wikidata and display the badge that way, but I support having a bot ensure that the status is correct on Wikidata. —Beleg Tâl (talk) 21:55, 1 November 2019 (UTC)
@Beleg Tâl: I think that's a good idea in theory, but there are some practical problems. Few Wikisource editors bother to create or link Wikidata items to their works. Of the first 5 works in my list, only 2 were linked to Wikisource. If people aren't even linking to Wikidata consistently, I think there's a vanishingly small chance that they will try to keep the Wikidata badges up to date. I imagine that people will just start adding their works to Category:Validated texts manually, as it won't be very intuitive that you have to set a badge in Wikidata (a very obscure feature) in order to add the work to the Category on Wikisource. Plus I don't really see what we gain by having the status tracked on Wikidata rather than in Wikisource directly. Kaldari (talk) 01:33, 2 November 2019 (UTC)
@Kaldari: users will not add Wikidata badges manually, but they will not add Category:Validated texts manually either (I certainly won't). We'd need a bot either way, so we may as well have the bot do it "properly" i.e. by leveraging Wikidata (and thus preventing duplicate data). If we need a bot to create the Wikidata item in the first place, then we should look into that also. —Beleg Tâl (talk) 14:54, 2 November 2019 (UTC)
If I am not wrong, 922 items are missing a Wikidata item. I tried to pull Category:Validated texts from the Wikidata badge but did not succeed; I could only pull Category:Validated. Does anyone know how? Mpaa (talk) 22:21, 5 November 2019 (UTC)
@Mpaa: The idea that I put forward, would be to have the header module check the text's associated data item, and if the validated badge is present, then it would add Category:Validated texts to the header (similar to how Category:Works with non-existent author pages is added by the header based on the presence of an associated author page, though this uses #ifexist instead of a Wikidata query) —Beleg Tâl (talk) 00:48, 6 November 2019 (UTC)
I got the idea; I am wondering if someone knows how to pull the badge for a sitelink of a Wikidata item with the currently available modules. If not, or if this is not supported, this doesn't seem a good way forward at the moment, until this point is cleared up. Mpaa (talk) 22:18, 8 November 2019 (UTC)
@Mpaa: Module:Edition contains code for retrieving the badge of the sitelink, though it doesn't do anything useful with it. —Beleg Tâl (talk) 15:29, 9 November 2019 (UTC)
@Beleg Tâl: It seems there are a few more steps to be done, like modifying Module:Edition to get the badge ID or the category we want to associate with it. Mpaa (talk) 18:21, 9 November 2019 (UTC)
@Mpaa: it looks like w:Module:Wd does it properly, so we could just import that module. —Beleg Tâl (talk) 13:58, 10 November 2019 (UTC)
Sounds good to me. I am not very familiar with importing pages (I am uncertain if this needs to be flagged or not in this case: Include all templates), if someone volunteers better so, otherwise I will give it a try.
Update: As it seems the desired route is to record the information in Wikidata, I will need to write some more code: first to collect all the associated Wikidata items (unless Mpaa has already done this) and then to add the badges in Wikidata. Unfortunately, I'll be at the Wikimedia Technical Conference all next week, but hopefully I can start working on it afterwards if everyone agrees this is the best solution. Kaldari (talk) 20:08, 8 November 2019 (UTC)
If needed, I can generate a list, and also create the missing WD items. Mpaa (talk) 21:21, 11 November 2019 (UTC)

@EncycloPetey, @Billinghurst, @Mpaa, @Beleg Tâl: The first step of this process is now complete. We are now automatically pulling badge data from Wikidata and displaying it on Wikisource via the {{header}} template. We are also now automatically categorizing works based on these badges, for example, Category:Validated texts, Category:Proofread texts, etc. The next step is to add badges on Wikidata for texts that are missing them (via a bot). Before we do this, however, two questions need to be answered:

  1. Should "featured" text status replace the "validated" status or exist alongside it? In other words, is a featured text both "featured" and "validated", or is it just "featured" (which implies that it is also validated, proofread, etc.)?
  2. Should "digital document" status exist alongside other text statuses or replace them? In other words, can a text be both a "digital document" and "validated" (even though in theory a digital document shouldn't need proofreading and validation)?

Once these questions are settled, I can move ahead with the bot work. Kaldari (talk) 22:38, 3 December 2019 (UTC)

Typically, I have replaced the Validated badge with the Featured badge when marking Featured texts, but we could decide to retain both. I'm satisfied either way on that issue. The "digital document" badge refers only to the original, and applies only to works that were originally digital. It should never appear on works that were scanned from physical objects. Validation applies only to the process of verification, so there is no reason not to display both badges when both apply. However, I personally think it is inappropriate to use "digital document" as a badge, because the badges are meant to show quality of the work, and "digital document" is a statement about the source of a document rather than its status or quality on Wikisource. Even digital documents can require adapting to display correctly on Wikisource. --EncycloPetey (talk) 22:52, 3 December 2019 (UTC)

Repairs (and moves)

Designated for requests related to the repair of works (and scans of works) presented on Wikisource

Index:Carroll - Alice's Adventures in Wonderland.djvu

The following discussion is closed and will soon be archived:
resolved

Please see Kaldari's request at WS:S/H#Need help fixing Alice's Adventures in Wonderland. —Beleg Tâl (talk) 17:36, 30 October 2019 (UTC)

This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Kaldari (talk) 18:56, 11 November 2019 (UTC)

Author:Ferdinand_Moeller

The following discussion is closed and will soon be archived:
resolved

I updated this author's page with a middle initial. Should the page be moved to Author:Ferdinand A. Moeller even if the name isn't completely filled out? —Crocojim18 (talk) 01:33, 5 November 2019 (UTC)

Looks like it has already been moved. —Beleg Tâl (talk) 20:49, 6 November 2019 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. --Xover (talk) 09:55, 29 November 2019 (UTC)

Once a Week Vol. 7

Vol. 7 of Once a Week is missing pages 629 and 630. Of the two scans at IA, this is the one that has all the pages. Could someone please use it to repair the Djvu, or replace the whole thing if necessary, since the complete scan is fairly decent quality. Levana Taylor (talk) 22:32, 23 November 2019 (UTC)

@Levana Taylor: Done. Sorry about the delay. --Xover (talk) 13:50, 29 November 2019 (UTC)

Other discussions

Index:Carroll - Alice's Adventures in Wonderland.djvu

The following discussion is closed and will soon be archived:
Missing page has been incorporated into the work.

I believe I have found a copy of the missing plate from this book on flickr, by using Google image search, but I have been unable to locate the missing text page. (See the discussion on the project page.)

I have uploaded the copy of the plate that I found into the book category at Commons (Charles Robinson's illustrations of Alice's Adventures in Wonderland); it is the one named Alice's Adventures in Wonderland - Carroll, Robinson - S205 - The whole pack rose up in the air.jpg

Could someone take a look to see if it can be used and if so inserted into the book at the appropriate point?

If it is not suitable please delete it.

Thanks Sp1nd01 (talk) 14:19, 28 October 2019 (UTC)

@Sp1nd01: Thanks! It's been incorporated into the work. Kaldari (talk) 19:16, 11 November 2019 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 13:56, 29 November 2019 (UTC)

Abuse filter edit request

The following discussion is closed and will soon be archived:
Declined. The noise in the logs from the filter is intended behaviour.

Hi. Can Special:AbuseFilter/36 please be tweaked to also exclude bots? When bots execute mass moves, they flood the log. Thanks, --DannyS712 (talk) 06:34, 2 November 2019 (UTC)

The purpose is to capture such moves where there is the potential for remaining redirects, so it is acting within scope of why I programmed it. As such it is recording what I want to see, so I am not considering it flooding the logs. — billinghurst sDrewth 06:40, 2 November 2019 (UTC)
@Billinghurst: my apologies, I thought it was for tracking misguided moves. However, bots also have suppressredirect, so if redirects aren't needed, wouldn't they be suppressed? Either way, thanks for explaining --DannyS712 (talk) 06:43, 2 November 2019 (UTC)
Yes, that is its primary purpose, though it is broader, for checking and also for clean-up. Suppressing redirects is not automatic, and there is no clear means to detect that no redirect has been created, so it is a checking process. It doesn't happen that often, so I am not concerned about the few occasions on which it occurs; it never truly floods the logs. Most bot moves usually occur early on, so it hasn't been problematic over the years. — billinghurst sDrewth 07:54, 2 November 2019 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 13:58, 29 November 2019 (UTC)

Index:Canadian Singers and Their Songs.djvu

The following discussion is closed and will soon be archived:
The momentarily misplaced file has reappeared. :)

What has happened to the above Index? It was here when I was working on it yesterday (Sat 2 Nov) in the morning Australian time. It now says Error: No such file. It was nearly proofread. --kathleen wright5 (talk) 07:06, 3 November 2019 (UTC)

@Kathleen.wright5: the file was deleted, per c:Commons:Deletion requests/File:Canadian Singers and Their Songs.djvu. As a result, the work here may need to be deleted too --DannyS712 (talk) 07:37, 3 November 2019 (UTC)
No. The file was apparently moved here. @Beleg Tâl: will be better placed to advise where it arrived. Beeswaxcandle (talk) 07:48, 3 November 2019 (UTC)
(edit conflict) @DannyS712: It was published before 1924 so it is in the public domain in the US (rule of thumb: US term of protection is 95 years from date of publication), and enWS policy is that works must be public domain in the US (vs. Commons that requires PD in both US and country of origin). @Beleg Tâl: On Commons you indicated that you had transwikied it here, but I can't find it. Can you look into it? --Xover (talk) 07:58, 3 November 2019 (UTC)
very sad they should delete an entire compilation based on the dod of a single septuagenarian. but work can continue here when the promised transfer occurs. deletion on commons should never be a deletion rationale here; rather we should have our independent task flow and determination. Slowking4Rama's revenge 12:49, 3 November 2019 (UTC)
@Xover: @Beeswaxcandle: I did import the file (or thought I had done so). You can see that File:Canadian Singers and Their Songs.djvu is not redlinked, and does contain the licensing info I set up, so I'm not sure why the file itself is not there also. Fortunately, I can easily re-upload it from the source, and will do so as soon as I have a chance (probably later this evening). @Slowking4: we did have this discussion here, the work is unambiguously copyrighted in Canada and in violation of Commons policy, and I did (try to) move the file locally as part of our independent task flow and determination. —Beleg Tâl (talk) 21:13, 3 November 2019 (UTC)
@Beleg Tâl: You imported the File: page (the container), not the media:. You cannot special:import media files. — billinghurst sDrewth 21:34, 3 November 2019 (UTC)
as we see deletion is privileged, and saving by transfer is not. it is not obvious that it was a copyright vio since the nominator did not do the work of listing the authors. i guess that is the uploaders job, or the person transcribing here, otherwise we might have work after work deleted out from under a transcription effort. look forward to the required local upload from IA, since fairusebot is a distant memory. Slowking4Rama's revenge 02:52, 4 November 2019 (UTC)
That's lame. Looks like it's already being looked at on Phabricator, phab:T8071. —Beleg Tâl (talk) 14:29, 4 November 2019 (UTC)
I've also added it to the wishlist. —Beleg Tâl (talk) 14:46, 4 November 2019 (UTC)
This index seems to be here in some form. I've just validated Page:Canadian Singers and Their Songs.djvu/124 and it was proofread by Jason Boyd earlier today. Revision history: http://en.wikisource.org/w/index.php?title=Page:Canadian_Singers_and_Their_Songs.djvu/124&action=history --kathleen wright5 (talk) 02:41, 4 November 2019 (UTC)
I have uploaded the file; everything is done. —Beleg Tâl (talk) 14:29, 4 November 2019 (UTC)
This section is considered resolved, for the purposes of archiving. If you disagree, replace this template with your comment. Xover (talk) 14:02, 29 November 2019 (UTC)
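
The rule of thumb cited in this thread (US protection lasts 95 years from the date of publication) can be written out as a small calculation. This is only a JavaScript sketch: the function name `isLikelyPublicDomainUS` is made up here, it assumes protection runs through the end of the 95th calendar year after publication, and it ignores every other factor (renewal, notice, unpublished works), so it is no substitute for checking a work's actual status.

```javascript
// Rule-of-thumb check only: 95 years from publication, with protection
// assumed to last through the end of that 95th calendar year.
function isLikelyPublicDomainUS(publicationYear, currentYear) {
  const lastProtectedYear = publicationYear + 95;
  return currentYear > lastProtectedYear;
}

// At the time of the discussion (2019), works published before 1924 pass:
console.log(isLikelyPublicDomainUS(1923, 2019)); // true
console.log(isLikelyPublicDomainUS(1924, 2019)); // false
```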

Copyright in Ethiopia and Template:PD-Ethiopia

Please note that our {{PD-Ethiopia}} is out of date: Ethiopia enacted a copyright law in 2004 but our template refers to a 1960 version. The old law provided protection only during the author's lifetime, but the new law provides for 50 years pma, with some PD-EthiopianGov type exemptions. Crucially, however, our template does not distinguish between copyright status in Ethiopia and copyright status of Ethiopian works in the US.

Since Ethiopia still does not have copyright relations with the US, no Ethiopian works are currently protected by copyright in the US, so they can be freely hosted here.

However, if transferring a file to Commons the distinction becomes relevant. In those circumstances, do not depend on our {{PD-Ethiopia}} tag! Each file with this tag will need to be assessed individually.

Ideally we would modify our Ethiopia-related licensing templates and then review and correctly tag all works in Category:PD-Ethiopia with both Ethiopian and US copyright status (some works may be eligible to move to Commons even under their stricter policy). --Xover (talk) 19:11, 6 November 2019 (UTC)

Fixed —Beleg Tâl (talk) 19:08, 8 November 2019 (UTC)

List of index pages

How is the List of Index Pages supposed to work? It seems that it always gives the same results no matter what is filled in the Search field. --Jan Kameníček (talk) 15:45, 8 November 2019 (UTC)

The results page says "The search engine does not work. Sorry for the inconvenience." So I assume it's supposed to work normally but is broken. —Beleg Tâl (talk) 18:48, 8 November 2019 (UTC)
Oh, thanks, my fault… --Jan Kameníček (talk) 00:36, 9 November 2019 (UTC)
Although my experience with Phabricator is much worse than bad, I have given it a try and reported it, see task T237831. --Jan Kameníček (talk) 20:54, 9 November 2019 (UTC)
In fact it had already been reported two months earlier: task T232710 --Jan Kameníček (talk) 17:51, 10 November 2019 (UTC)

Add Wikidata link to Index page[edit]

I made a thing: User:Samwilson/LinkIndexToWikidata.js. It adds a 'Wikidata item' row to the metadata table on Index pages, linking to the Wikidata item that refers to the Index page via Wikisource index page (P1957). If there's no link, it complains to you to fix it. :) To use, add this to your common.js page:
mw.loader.load('//en.wikisource.org/w/index.php?title=User:Samwilson/LinkIndexToWikidata.js&action=raw&ctype=text/javascript');
—Sam Wilson 23:24, 10 November 2019 (UTC)

@Samwilson: (Stupid-hat question) Why don't we just add the field to underlying template? Then we can gadgetify the script to make it more available. Or do we just gadgetify it anyway? — billinghurst sDrewth 06:36, 13 November 2019 (UTC)
@Billinghurst: Good question! It's because there's no sitelink from an Index page to its Wikidata item; the only link is via the URL stored in Wikisource index page (P1957), so the way a script can do it is by making a Wikidata Query Service request. A template (or Lua module) can't do that. Or do you mean, why don't we add a field for Wikidata ID to the template? That'd work, but it's duplicating the data (which is maybe not a bad thing; similar things are done elsewhere in the system). —Sam Wilson 12:25, 13 November 2019 (UTC)
I stopped bothering adding the index: backlink. It isn't in the WEF framework, and I just stopped bothering as it seemed to be of limited value. If it is being added to the {{book}} template at WD, then we can inhale it with the existing script, or we can enter it manually. Means that I created it a bit earlier. I sometimes wonder whether the duplication may allow for bots to better come along and tidy up. <shrug> — billinghurst sDrewth 12:49, 13 November 2019 (UTC)
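
For anyone curious what the reverse lookup via Wikisource index page (P1957) looks like in practice, here is a minimal sketch of the query a script might send to the Wikidata Query Service. It is not taken from User:Samwilson/LinkIndexToWikidata.js; the function names are hypothetical, and it assumes the property value is stored as exactly the URL passed in (scheme and percent-encoding included).

```javascript
// Hypothetical helpers, not part of any existing gadget.
// They only build the query and the request URL; sending it is left to the caller.
const WDQS_ENDPOINT = 'https://query.wikidata.org/sparql';

function buildIndexItemQuery(indexPageUrl) {
  // P1957 holds a URL, so it is written as an IRI in angle brackets.
  // Characters that are not allowed inside a SPARQL IRI are rejected, not escaped.
  if (/[<>"{}|\\^`\s]/.test(indexPageUrl)) {
    throw new Error('URL must be percent-encoded first: ' + indexPageUrl);
  }
  return 'SELECT ?item WHERE { ?item wdt:P1957 <' + indexPageUrl + '> } LIMIT 1';
}

function buildWdqsRequestUrl(indexPageUrl) {
  // Assumption: the stored value matches indexPageUrl character for character.
  return WDQS_ENDPOINT + '?format=json&query=' +
    encodeURIComponent(buildIndexItemQuery(indexPageUrl));
}
```

A script could fetch the URL returned by buildWdqsRequestUrl and read the item from the JSON bindings; an empty result would mean that no item points at the Index page yet.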

Truth be known samwilson I would like to have more of the {{book}} data on the Index: page, hopefully passively added from Commons, or pulled from WD, rather than another manual addition. For instance I would like that where we have an IA work that we can have active link to that work. I want to be able more readily link to the jp2 zip file of the work so we can better work with image extraction and clean up, with our no longer actively supporting {{raw image}} extraction. Unfortunately I haven't found an online tool to open an online zip and extract single images, though I am still looking.

Now I don't know the best way to complete the three way dance with Commons and Wikidata, and it is always our issue that IA starts, Commons comes 2nd, then enWS Index: 3rd, enWS main ns, 4th, then usually WD comes 5th. If WD could occur at step two or step three (more automagically) and then Index: page that would be beautiful. Though that wish has never been fulfilled, and I have asked people like Lucas Werkmeister at a conceptual level … to silence, we are way down the food chain. RexxS is really helpful, though I don't like to push acquaintanceships too hard.

What this ignoramus thinks we need is <mode start=dream>

  • Update to MediaWiki:Proofreadpage index template for fields
    • though maybe it is a separate template can manually insert to start, or passively embed based on data links to WD (I dunno exactly, out of my paygrade)
  • Update to MediaWiki:Gadget-Fill Index.js which is the gadget that extracts data from Commons files and adds it to the respective Index: fields

billinghurst sDrewth 01:46, 14 November 2019 (UTC)

@Billinghurst: I'd love to help, of course, but I'm a complete noob here and I don't understand the workflow or the terminology you're using. Checking random works and authors, I find Wikidata links, but no link for a random transcription. Is that where you're stuck at? When I looked at Index:Paradise Lost (1667).djvu, I could see that the linked title and linked author both have Wikidata items, but obviously not the transcription. I think for the moment, Sam is right - you need WDQS to do the reverse lookup. However, I suspect that it should be possible to have a field on the index page that records the Wikidata item containing that link once it's been found. Magnus Manske has a bot that can create lists on wiki-pages from the results of a WDQS query, so maybe a bot run could populate such a field for you? I'll have another think and see what I can work out. RexxS (talk) 18:56, 14 November 2019 (UTC)
Thanks RexxS. The work you found is just going to be complicated for a range of reasons, so let me try something cleaner.

I have prepped a completed and transcluded work hopefully as a better example.

The three djvu-like pages are suitably populated with expected data. All are inter-related, and each contains different data. Noting that the WD item is for the edition; I haven't created one for the conceptual "book".

For a work in progress of transcription: Index:The best hundred Irish books.djvu <-> c:File:The best hundred Irish books.djvu <-> Internet Archive identifier : besthundredirish00obri, no wikidata item yet as I usually create those at the end, and no book item as I gave up creating those as too much extra effort.

If we need to get down and dirty then maybe we should pick a user talk space for the conversation, or a scratch space, or an IRC chat. <shrug> Guide me, I am really happy to step through things. Noting my [understanding of WikidataIB = knowledge of WDQS = capability in Module: ns]. (I suck at programming … conceptual hole). — billinghurst sDrewth 01:26, 15 November 2019 (UTC)

Tech News: 2019-46[edit]

22:02, 11 November 2019 (UTC)

Spelling errors[edit]

I've forgotten the guidance on spelling errors in the original. "Seventeeth" in http://en.wikisource.org/wiki/Page%3APlomer_Dictionary_of_the_Booksellers_and_Printers_1907.djvu/177 ... Rich Farmbrough, 19:10 12 November 2019 (GMT)

You can use the {{SIC}} template. --Jan Kameníček (talk) 20:02, 12 November 2019 (UTC)
@Rich Farmbrough: We reproduce as they are. If you do use the template as suggested above, it is up to you whether you include text in the second parameter. Some consider it an annotation and an assumption so do not like it, some do like it. Personally, I use it though generally leave it empty unless it is really helpful to explain the alternate word. If you want to silently leave something inline, then we also have {{sic}}. As a note, if there is a whole swag of old text being reproduced we would not tag it, we let it stand. As per WP in wikilink first error, we would only tag the first error of each type. I saw that Martin has highlighted that work in a WD talk that work that he and I did. — billinghurst sDrewth 06:30, 13 November 2019 (UTC)

If you tweet, especially about Wikisource[edit]

Hi. For those who are on Twitter and tweet about Wikisource, a new reminder that some of us maintain @wikisource_en so please do include that account in your tweets as appropriate. Either in twitter, or here, please let us know your account so that we can follow. — billinghurst sDrewth 06:25, 13 November 2019 (UTC)

I'm @pigsonthewing, and will follow the above account as soon as Twitter lets me. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:35, 13 November 2019 (UTC)

AuFCL / MODCHK / random IP editor 114...[edit]

Dear AuFCL / MODCHK / random IP editor 114... Hoping that the NSW fires are not near to you, thinking that they are to your north-west and south-west. Best of luck with what is coming through your area. Wildfire sucks. — billinghurst sDrewth 12:16, 13 November 2019 (UTC)

Amen! --Xover (talk) 16:00, 13 November 2019 (UTC)

Greek: Aerodynamics[edit]

Could somebody who is able to read and write (rather: type) Greek please enter the words in that language on Page:Aerial Flight - Volume 1 - Aerodynamics - Frederick Lanchester - 1906.djvu/415 and the following page? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:46, 13 November 2019 (UTC)

@Pigsonthewing: If you use {{Greek missing}}, the page will be added automatically to Category:Pages with missing Greek characters which is monitored by users who can type Greek. —Beleg Tâl (talk) 15:10, 13 November 2019 (UTC)
@Beleg Tâl: Something new to learn every day. Done, thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:31, 13 November 2019 (UTC)

Signatories[edit]

Are signatories considered to be a kind of authors, i.e. can they have author pages even if no other work by them eligible for Wikisource exists? --Jan Kameníček (talk) 16:03, 13 November 2019 (UTC)

One more related question: If a person already has their author page, can I add there (under a separate heading) works which they have only signed? --Jan Kameníček (talk) 18:12, 13 November 2019 (UTC)
I think it depends - if they are just one of a whole bunch of signatures e.g. on a petition, I wouldn't bother creating an author page for them (but I might add it under a separate heading on an existing author page as you suggest). On the other hand, a situation like e.g. an official signing a document that was issued in their name but written by one of their staff, I would definitely treat the signing official as a full author. —Beleg Tâl (talk) 19:04, 13 November 2019 (UTC)
I see. What I had in mind was e.g. an international treaty signed by a bunch of statesmen, which is similar to your petition example. --Jan Kameníček (talk) 20:55, 13 November 2019 (UTC)
I would have just wikilinked it unless they are primary. It is one of those quandaries about how do we work with WhatLinksHere and running counts on those things to highlight where an author has exposure beyond the works they wrote. — billinghurst sDrewth 00:35, 14 November 2019 (UTC)
@Billinghurst: I see. So you do not think it should be mentioned at the author page if it is not a primary signature, right? --Jan Kameníček (talk) 13:24, 16 November 2019 (UTC)
I don't see how it is different from being mentioned/appearing in any article that we reproduce. I wikilink to their author page, and would only backlink back to the article where they are the focus. — billinghurst sDrewth 13:55, 16 November 2019 (UTC)

Ability to individually access single JP2 images from Internet Archive work archives[edit]

Now I may be completely slow on the uptake, however, today I have just identified that we can directly download individual JP2 files for the pages from a work. [If others had noticed this, then I apologise for missing your communications on this matter.]

Anyway this means that with something like GIMP, you can directly paste in the url of the JP2 page into GIMP > Open location and load it straight into application and edit the best quality file.

@Xover: I think that this means we can probably steal best quality pages from another copy of the same work and rebuild files. Correct?

To see a file list from a file's /details/ page at Internet Archive follow the SHOW ALL link > beside the .zip click the "View Contents" link and VOILA a file list where you can grab a useable link. example link http://archive.org/download/whofearstospeako00cuma

@Samwilson: if you can suck in the IA link to an index file, we can simply template this based on

http://archive.org/download/<ia-identifier>/<ia-identifier>_jp2.zip/

Alternatively maybe we get this linked up from within the book template at Commons. — billinghurst sDrewth 03:45, 14 November 2019 (UTC)
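
As a rough illustration of the URL pattern above, a script or template helper only needs the IA identifier to build the link to the JP2 directory listing. This is a minimal sketch; the function name is hypothetical, and it assumes every scan follows the <ia-identifier>_jp2.zip naming shown in the pattern.

```javascript
// Hypothetical helper: builds the directory-listing link from an IA identifier,
// following the pattern http://archive.org/download/<id>/<id>_jp2.zip/
function iaJp2ListingUrl(iaIdentifier) {
  const id = encodeURIComponent(iaIdentifier.trim());
  if (id === '') {
    throw new Error('An Internet Archive identifier is required');
  }
  // Assumption: the JP2 archive is always named <id>_jp2.zip.
  return 'http://archive.org/download/' + id + '/' + id + '_jp2.zip/';
}
```

For example, iaJp2ListingUrl('whofearstospeako00cuma') gives the listing for the example work linked above.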

Comment I am thinking that at least as an initial measure we could build an optional manual IA field into {{raw image}} that can turn on a component that displays text and link to the directory listing of the JP2 file. At some later point, when we have someone clever we may be able to build some linking of the Page:{{BASEPAGENAME}}/nn back to the Index, and any data known about the Index: page could be used to automatically populate the IA field. Just thoughts, happy to hear something cleverer. — billinghurst sDrewth 06:59, 14 November 2019 (UTC)
@Billinghurst: I'm not quite following your reasoning, but, yes, from a set of individual page images I can generate a DjVu with OCR text layer, regardless of where those page images came from. This is currently using hacky and semi-manual tooling that nobody but myself would ever use unless under duress, but I am investigating options for providing some kind of access to them for anyone to use in a way that is at least reasonably functional for normal people. In the mean time I am happy to generate DjVus for people if I am provided with a comprehensible specification of what page images in what order should make up the resulting DjVu. I can also do things like swap out a page in an existing DjVu, reorder pages in an existing DjVu, delete extraneous pages, etc., and am happy to do so, but, again, provided I get a clear specification of what needs to be done.
As for your larger thrust… I don't think I'm grasping the problem you are aiming to solve?
If we presume a correctly filled out Book template at Commons, the Index:-page preloader gadget can be extended to pull in the source link from there (in fact I think it already does, we just don't store it). The Index: template here can be extended to have a field to store the value from the source field at Commons. And it's possible to make a script that tries to pick out an IA link from that and generate a direct link to the "show all files" directory listing at IA. It is not possible to create a link directly to an individual page image at IA since their page images are arbitrarily named, and because we routinely make changes to works between IA and what's uploaded to Commons (think removing Google scan pages, calibration pages, duplicate pages, etc.). It is also technically possible, today, to make a script to go directly from a page in the Page: namespace to the directory listing at IA. Some or all of these will be somewhat hacky and prone to break, but that's already the case with the Index: preloader gadget and it seems to work enough to be worthwhile. *shrug*
In any case, lots of things are possible in this area, so it's mostly a matter of articulating which problem we are trying to solve. --Xover (talk) 08:08, 14 November 2019 (UTC)
Problem 1: {{raw image}} was previously used by Hesperian as an indicator to populate converted jp2 images as png images, this upload locally stopped a while ago due to time and effort. And users had to download the PNG and clean, then we have to go through a migration and deletion process. All butt ugly.
Info Template:raw page scan (transclusions: 21,218, links: 5) / Template:Raw image (transclusions: 12,659, links: 21,262)
Problem 2: people have used the jpg images from (expanded) scans at IA as the basis of extracted images to upload to Commons, or as an ugly screenshot to upload. All butt ugly.
(solution to P1 and P2) Links to the folder enables users to at least try and to get best available quality.
Problem 3: There are broken scans here, and often people haven't fixed them as it was too hard to extract from a djvu, or get a source page to OCR separately.
(solution to P3) new source of single page to insert into djvu, or new source of single page image to OCR online and insert; was flagging nothing more
Problem 4: While the scan in the file has been good, the OCR has been rubbish
(solution to P4) as per S3, can OCR individual page for paste of text
Re general comment: increasing our general connectivity in through Wikidata<->Commons was part of the discussion earlier on this page—Samwilson's script discussion above—and to IA is more helpful, sure there will be old data, and occasionally broken data, though such a process as this is more likely to find it and get it fixed. I am advocating that we keep taking these steps.
Re book => index. We haven't looked at it as a community, and Jarekt has been better developing it at Commons, and we should review how we utilise the links, the code or the data, at the moment we scrape data, rather than leverage the available sources, and then only complete fields when we need to override.
Re linking, it looks as direct linking is possible, eg. [1] though I was more advocating linking to the directory. — billinghurst sDrewth 09:00, 14 November 2019 (UTC)
@Billinghurst: Thanks, I'll try to see if I can come up with anything useful.
Regarding the direct linking, the problem isn't what IA provides, it's that we have no way to figure out which page image a Page: here corresponds to at IA. On the IA side, some scans count pages from zero, some from one; some include Google book pages, calibration pages, etc. that have been removed before upload to Commons, meaning our page 123 may be page 134 at IA. In other words, there's no way for a mere dumb computer to get from one to the other: you need a human being to connect the two. That said, there are things we can do to encourage the humans to add such links if we want them to: the {{raw image}} template can start by asking for an IA identifier if missing, and progress to link to the directory listing if one is provided, and also ask for an IA page identifier that will enable the direct link. --Xover (talk) 09:13, 14 November 2019 (UTC)
I have started a conversation at template talk:raw image though the work is done in module:rawImage which eliminates me from the fix, though maybe not all the grunt work needs to take place in the module. I have also noticed that we give guidance at Help:Adding images and that is part of the above problem. — billinghurst sDrewth 10:24, 14 November 2019 (UTC)
@Xover: if Djvu files are ported from IA leaving unchanged the internal 'page id', deletions, etc. should not cause problems. Inspecting the local djvu file, we could get the correct IA djvu page. This is not true if new 'page ids' are used when regenerating djvus from IA. This at least could allow offline scripts to work. Would be nice to have this info through an API command (wishful thinking, I know ....) Mpaa (talk) 19:38, 14 November 2019 (UTC)
@Mpaa: Hmm. Interesting. I hadn't realised IA did that. The 'page ids' aren't actually identifiers as such, they're a "page name" and were, I believe, intended to be used essentially like our pagelist tag. I've been avoiding using them because they make it confusing when trying to manually manipulate a DjVu file (the DjVuLibre commands operate on physical page numbers, but DjView displays the "page name"; if the two differ you get seemingly random results). However I hadn't considered the possibility of using them to document the original page image from which the DjVu page was generated. I'll play around a bit when next I touch that code and see if there's anything clever we could do there. --Xover (talk) 19:49, 14 November 2019 (UTC)
For completeness, I mean this sort of info, e.g, <PARAM name="PAGE" value="whofearstospeako00cuma_0001.djvu"/> in here. I always try to leave that unchanged. Then we just need to play with the extension. I have seen bugs, e.g. the page offsets we sometimes get when uploading, related to changing these references. Mpaa (talk) 21:20, 14 November 2019 (UTC)
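
To make "play with the extension" concrete, here is a minimal sketch of what an offline script could do with a PARAM element like the one quoted above. The function name is hypothetical, and it assumes (unverified) that the JP2 page image at IA shares the base name recorded in the DjVu page name.

```javascript
// Hypothetical helper: reads the PAGE param from a hidden-text XML snippet
// and swaps the .djvu extension for .jp2.
function jp2NameFromPageParam(hiddenTextXml) {
  const match = /<PARAM\s+name="PAGE"\s+value="([^"]+)"\s*\/>/.exec(hiddenTextXml);
  if (match === null) {
    return null; // no PAGE param, so the original page image cannot be inferred
  }
  // Assumption: only the extension differs between the DjVu page and the JP2 image.
  return match[1].replace(/\.djvu$/i, '.jp2');
}
```

This only works while the original page names are left untouched in the DjVu; once they are regenerated, a human has to supply the mapping again.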

Work-specific disambig pages[edit]

A while ago, we agreed that it does not make sense for us to have author-specific disambiguation pages. For example, Sonnet (Shakespeare) should not exist as a disambiguation page, but instead all works by Shakespeare titled "Sonnet" should be listed directly at Sonnet and at Author:William Shakespeare.

I've noticed that we also have a number of work-specific disambiguation pages. For example, 1911 Encyclopædia Britannica/Abdera lists works entitled "Abdera" which are also part of the Encyclopedia Britannica. However, this page is redundant, as the works listed on that page are listed directly at Abdera and at 1911 Encyclopædia Britannica/Vol 1:1.

I would like to start merging these work-specific disambiguation pages into the main disambiguation pages, but I also want to get the community's input before I start. This also ties into my efforts to clean up the Wikidata items for Wikisource mainspace disambiguation pages. —Beleg Tâl (talk) 13:34, 15 November 2019 (UTC)

I would say
  • that the page "1911 Encyclopædia Britannica/Abdera" should redirect to the general disambiguation page for "Abdera" with merging of detail as required.
Philosophically we have agreed
  • one disambiguation page per term
  • where disambiguation contain main and other namespace items, then main namespace wins for siting
  • disambiguation pages can exist in any portal to disambiguate within a portal (above rules apply first)
While not desirable, I don't have a particular concern if we have work level disambig pages and nothing at root level where not attached to a WD item—to me they are low priority. That said, we should not have any work level disambiguation pages linked to WD, be it DB1911, DNB or whatever, and creation of further work-level pages should be dissuaded.
billinghurst sDrewth 14:17, 15 November 2019 (UTC)

Proposal for a new Featured texts badge on Wikidata[edit]

I've created a proposal on Wikidata for a new "Featured texts" badge, to complement the existing "Featured article", "Featured list", and "Featured portal" badges used by the Wikipedias. If you have an opinion, please comment there (not here). Thanks. Kaldari (talk) 18:27, 15 November 2019 (UTC)

We already use the featured article for featured text (aliases at d:Q17437796). I think we felt that the words are interchangeable, and there are no link overlap issues. The Vampyre <-> d:Q58881954, if it isn't automatically appearing, that is our fault for not properly converting {{featured}} to properly leverage the tag.

We also need to better align d:help:badges of proofread, validated and digital document, as we should be building that into our {{header}} template. I note that there is a separation of Wikisource badge and Wikimedia badge. — billinghurst sDrewth 02:35, 16 November 2019 (UTC)

I have already brought up this topic on the proposal page itself. —Beleg Tâl (talk) 19:00, 16 November 2019 (UTC)

Tech News: 2019-47[edit]

20:16, 18 November 2019 (UTC)

Bulk replace[edit]

Do we have a tool to do find'n'replace across all the pages in a work? For example, to change all instances of {{frac}} to {{sfrac}}? Or do I need to find a willing bot operator for such things? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:48, 20 November 2019 (UTC)

@Pigsonthewing: That would be a bot request. --Xover (talk) 19:22, 20 November 2019 (UTC)
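
For reference, the per-page edit such a bot would make is a simple pattern replacement on the wikitext. The following is only a sketch with a hypothetical function name; it assumes {{sfrac}} accepts the same parameters as {{frac}} in the work in question, and it does not touch redirects to the template or nested edge cases.

```javascript
// Hypothetical helper: renames {{frac}} calls to {{sfrac}} in one page's wikitext.
function fracToSfrac(wikitext) {
  // Match "{{frac" or "{{Frac" only when followed by "|" or "}}",
  // so longer template names that merely start with "frac" are left alone.
  return wikitext.replace(/\{\{\s*[Ff]rac\s*(?=[|}])/g, '{{sfrac');
}
```

A bot operator would run something like this over every page of the Index and review the diff before saving.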

Duplicate works[edit]

Are The History of the City of Fredericksburg, Virginia and Fredericksburg, Virginia 1608-1908 the same work? They appear to be, although they have different authors (Silvanus Jackson Quinn versus Sylvanius Jackson Quinn). There is a scan on IA here (under the title The history of the city of Fredericksburg, Virginia), which appears to show the same. If so, should they be connected? The former a redirect to the latter, since it is complete? Could someone match-and-split to the scan? TE(æ)A,ea. (talk) 22:44, 21 November 2019 (UTC).

Looks the same to me, definitely should move the complete one to scan and change other to redirect —Beleg Tâl (talk) 01:22, 22 November 2019 (UTC)

horizontal TOC -- Template request[edit]

I have a long TOC at Translation:Likutei_Halakhot/Orach_Chayim/Early_Rising that I would like to make horizontal. The Wikipedia templates listed at w:Template:Horizontal_TOC namely {{horizontal TOC}}, {{horizontal TOC|nonum=yes}} etc. do not work here. Could those be made available here? Thanks! Nissimnanach (talk) 00:51, 22 November 2019 (UTC)Nissimnanach

@Nissimnanach: you could use __NOTOC__ to hide the current TOC, and manually create a horizontal one using {{Empty TOC}} —Beleg Tâl (talk) 01:24, 22 November 2019 (UTC)
@Beleg Tâl: I'll remember that but is there a good reason why the horizontal TOC template cannot be available here? I think it will save me effort and time and I'm lazy to manually do w:seds or Replaces and afraid of making mistakes. Nissimnanach (talk) 13:11, 22 November 2019 (UTC)Nissimnanach
@Nissimnanach: We generally prefer not to import general-purpose templates for one specific work, especially when we have other templates that work just as well (like {{Empty TOC}}). I'm hesitant to introduce an unwieldy and not-very-useful template like w:Template:Horizontal_TOC, but I would be more than willing to assist you with seds or replaces if you don't want to do them yourself. —Beleg Tâl (talk) 13:54, 22 November 2019 (UTC)
I've applied {{TOC limit}} for now - see if you think that's better. That said, as the page is currently 182,380 bytes long, it may be better to split each level-2 section to a sub-page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:36, 22 November 2019 (UTC)

How shall I transcribe two books in one?[edit]

I have started working on a publication of engravings by Wenceslaus Hollar. The book does not contain the year of publication, but HathiTrust states that it was published between 1794 and 1812. The book looks like a reprint of originally two separate books, one published in 1640 and the other in 1643. The problem is that this reprint does not have one title common for both parts.

Can I transcribe the publication as two separate works under their individual titles? Or should I transcribe them as one work and devise some title? I was considering using the first of the titles for the whole publication, but it would be really misleading, as it speaks only about England, while the other part deals with various European countries. --Jan Kameníček (talk) 19:50, 22 November 2019 (UTC)

I'd just transcribe them as two separate works, if there's no overall introduction or anything.--Prosfilaes (talk) 21:04, 23 November 2019 (UTC)
I also think it is the best solution, but I wanted to have it confirmed by somebody else. Thank you very much. --Jan Kameníček (talk) 21:25, 23 November 2019 (UTC)
I, on the other hand, would probably transcribe them as one work and devise some title, like I did with The Holly & the Ivy, and Twelve Articles and Lyra Ecclesiastica —Beleg Tâl (talk) 21:30, 23 November 2019 (UTC)
Hm, simple connection of two titles with "and" could also be a solution. I’ll think about it for a while, thanks as well. --Jan Kameníček (talk) 23:20, 23 November 2019 (UTC)
I don't think there is a clear answer in general; this sort of thing needs a judgement call for each work, and with quite some leeway for individual contributor preference. It also needs to be considered whether the book in question is actually a publication and not merely two works bound together (as was common practice for collectors of all stripes in the 18th and early 19th century). And on this particular book the fact the two works have the same publisher might suggest they are one publication, while the fact both included works have separate colophons suggests they are independent publications bound together. Similarly, there appears to be no front or end matter that is common to both works: they share only the binding. It is hard to be categorical, but I suspect I would have eventually landed on treating these as separate works that had merely been bound together. But I would not have faulted anyone for landing on the opposite.
Incidentally, the publishers, “Laurie & Whittle”, are still around, trading these days as “Imray Laurie Norie & Wilson Ltd”. --Xover (talk) 08:32, 24 November 2019 (UTC)
I know of a number of examples where works more or less related were packed into one binding out of publishing constraints. I think that we should make sure that separate parts are separated out, like they would be in an anthology or magazine, and make them available individually, even if they are under a higher level heading for the complete work.--Prosfilaes (talk) 02:00, 27 November 2019 (UTC)
Which has been done by creating redirects at the root where they have been displayed as subpages. Where they have a set of known publishing components, especially with regard to how they are portrayed at Wikidata, then keeping to the known truth is best. Here the provenance of the work is simply not known, we just know that they shared the same binding.

We know that many of our works were singly published, serially published, and multiply published, so do what makes most sense that maintains the credibility of the publication/work(s). Document it well either in notes, or on talk page, so that someone can understand what you did when looked at in five years time. — billinghurst sDrewth 04:30, 27 November 2019 (UTC)

Thanks everybody for valuable opinions. I have considered them all and finally decided to keep them together (as the publisher enclosed them in common binding), but as two separate subpages and with explanation in the note. I think this solution shows that originally they were separate and at the same time it is faithful to the intention of the reprint’s publisher. --Jan Kameníček (talk) 00:11, 8 December 2019 (UTC)

Wikisource is sixteen years old: let's celebrate![edit]

Wikisource is 16 years old!

Dear friends,

In order to celebrate merrily the sixteenth birthday of Wikisource, the it.source community revamped the proofreading contest that for the last six years has gathered hundreds of jolly good proofreaders! The main page is at

For two weeks we invite users to validate pages, and three of them will be awarded tens of euros to spend on books: visit the contest page for more details or ask me for them when in doubt.

This year we have also texts in Neapolitan, Venetian, Ligurian, Dolomitic Ladin and Lombard!

If you think that this announcement is worth sharing.... well, spread the news! :D

- εΔω 09:59, 24 November 2019 (UTC)

Time to vote for the Community Wishlist 2020[edit]

It is time to vote for 2020 Community Wishlist Survey. Vote ends December 2nd. Wikisource has 28 proposals.

The proposal m:Community Wishlist Survey 2020/Wikisource/Improve export of electronic books is back this year. The tech team worked on it last year but had to work on other proposals. So many of us think that the work should be pursued and completed.

There are also many good proposals. --Viticulum (talk) 16:37, 25 November 2019 (UTC) from the French Wikisource.

Tech News: 2019-48[edit]

16:51, 25 November 2019 (UTC)

Category muddle: cultural events & traditions[edit]

At present, there is a thoroughly confused organization of categories relating to traditional observances, holidays, rites, and collective activities in various cultures. Category:Cultural events contains a mixture of all of those, and was apparently created solely to organize EB1911 articles. It is currently a subcategory of Category:Culture. It seems like some of that could go in Category:Traditions, but that currently contains only Category:Observances and Category:Holidays, "Holidays" being also a subcategory of "Observances;" if there’s a meaningful distinction between "Observances" and "Holidays" I’m not seeing it. How should all this be re-organized? I think some natural groupings might be 1. cultural traditions related to the rites of life such as weddings and funerals; 2. Festivals and holidays with their own history and traditions: Lupercalia/Easter/Arbor Day; 3. articles about specific collective activities considered as a cultural phenomenon, e.g. gladiatorial games and the English country fair. The first two could be grouped under Observances and called respectively Rites and Holidays; the third could be just a subcategory of Culture but I don't know what to call it. Is there also a need for a Traditions category to contain something-or-other that isn’t in those three? Levana Taylor (talk) 17:31, 25 November 2019 (UTC)

On Wikipedia, w:Category:Holidays are official designated days of observance and are a subcategory of w:Category:Observances. Observances that are not categorized as Holidays include w:Category:Anniversaries. On Commons, however, commons:Category:Observances is a subcategory of commons:Category:Holidays. Ultimately it probably doesn't matter. —Beleg Tâl (talk) 04:12, 26 November 2019 (UTC)
Thinking about it further, I don't think I would lump groups 1 and 2 together. Festivals/holidays/observances are a kind of cultural event, and so are weddings/funerals/birthdays (and so are fairs/sporting events/concerts/etc), but they aren't really the same kind of thing at all. I'd toss them all under Category:Cultural events with maybe a subcategory or two for group 2; any more than that is probably unnecessary.

Publishers' terminology

Question seen on Facebook:

When the publishing information is on a page that precedes the title page, is there a name for that page? Like title page verso when it's the back of the title page. It happens frequently with juvenile picture books.

Anyone know? Do we have a glossary of such terms? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:30, 26 November 2019 (UTC)

With a picture? I think "frontispiece" might be the term you're looking for? A glossary would be useful, but I don't think we have one. -Pete (talk) 23:38, 26 November 2019 (UTC)
No, without a picture. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:07, 27 November 2019 (UTC)
Do you mean a "copyright page"? These seem to have wandered over the years: http://www.thebookdesigner.com/2009/09/parts-of-a-book/
sometimes you see colophon, but it’s Greek to me. front matter varies a lot depending on date and publisher. Slowking4Rama's revenge 17:54, 28 November 2019 (UTC)

Copyright and deletion discussions needing community input in December 2019

The following copyright discussions and proposed deletion discussions have been open for more than 14 days, with more than 14 days since the last comment, and without a clear consensus having emerged. This is typically (but not always) because the issue is not clear cut or revolves around either interpretation of policy, personal preference within the scope afforded by policy, or other judgement calls (possibly in the face of imperfect information). To resolve these discussions, wider input from the community would be valuable.

Copyright discussions require some understanding of copyright and our copyright policy, but often the sticking points are not intricate questions of law so one need not be an intellectual property lawyer to provide valuable input (most actual copyright questions are clear cut, so it's usually not these that linger). For other discussions it is simply the low number of participants that makes determining a consensus challenging, and so any further input on the matter would be helpful. In some cases, even "I have no opinion on this matter" would be helpful in that it tells us that this is a question the community is comfortable letting the generally low number of participants in such discussions decide.


Copyright discussions


Proposed deletions


Note that while these are discussions that have lingered the longest without resolution, all discussions on these pages would benefit from wider input. Even if you just agree with everyone else on an obvious case, noting your agreement documents and makes obvious that fact in a way the absence of comments does not. The same reasoning applies for noting your dissent even if everyone else has voted otherwise: it is good to document that a decision was not unanimous.

In short, I encourage everyone to participate in these two venues! --Xover (talk) 06:46, 2 December 2019 (UTC)

Tech News: 2019-49

16:58, 2 December 2019 (UTC)

Google and Phe's OCR

I've been following the posts about Phe's OCR issue on Phabricator, while humming Sam Cooke's "A change is gonna come".

Google OCR is excellent at recognizing the accented text, but it makes a mess of paragraphs by losing words, which sometimes end up at the bottom of the page, if they appear at all.

Do I have any other options? — Ineuw (talk) 01:46, 4 December 2019 (UTC)

i have been known to copy paste the text from Internet Archive text version, when both OCRs fail, but it is slow. i.e. [8] for Index:Proceedings of the Royal Society of London Vol 1.djvu -- Slowking4Rama's revenge 03:02, 4 December 2019 (UTC)
Thanks for the reminder. I have the text layers of all of my projects, but after going through them it seemed to me double work, although a lot of errors can be fixed at once throughout one document with search and replace. — Ineuw (talk) 10:00, 4 December 2019 (UTC)
@Ineuw: Some time ago somebody advised me to copy mw.loader.load( '//wikisource.org/w/index.php?title=User:Putnik/TesseractOCR.js&action=raw&ctype=text/javascript' ) into my common.js. It creates another OCR button, and although it is very slow in comparison with Phe's tool, it is better than nothing. I usually proofread several pages at the same time: while I am working on one of them, the others are being processed. --Jan Kameníček (talk) 10:31, 4 December 2019 (UTC)
@Jan.Kamenicek: Thank you x 3. I had the same experience with Putnik's OCR, and abandoned it because it was so slow. Several pages at a time is an excellent idea. — Ineuw (talk) 19:48, 4 December 2019 (UTC)
I'm not sure if this is relevant here (or if it's already known by experienced Wikisource users), but I've recently figured out a really useful process for quickly getting texts into Wikisource from IA, in a form that requires less editing. It involves copying the "Full text" from the IA page (e.g. here), which generally has line breaks in a position that preserves paragraph breaks (two line breaks per paragraph), unlike the OCR layer. After I've copied that, I also use RegEx (either in a desktop text editor, or using the built-in Wikisource search-and-replace feature), to remove page headings (more or less) and take care of any other general tasks that the particular work in question requires. After doing that "rough cut" of tidying up the text, I use the Help:Match and split tool, creating individual pages for the work that are substantially better than the OCR layer. For an example of a work I've done this with, but I haven't yet done much proofreading for, see here. -Pete (talk) 23:59, 4 December 2019 (UTC)
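The "rough cut" described above could look something like the following Python sketch. It assumes, purely for illustration, that running headers are short lines consisting of a bare page number or of capitals only (optionally with a page number on either side), and that paragraphs are separated by blank lines. Real works will need their own patterns, and the function name `rough_cut` is hypothetical.

```python
import re

# Assumption: a running header is either a bare page number, or a short
# all-caps line that may carry a page number before or after it.
HEADER = re.compile(
    r"^\s*(\d+\s+)?[A-Z][A-Z .,'-]{0,60}(\s+\d+)?\s*$|^\s*\d+\s*$"
)


def rough_cut(full_text):
    """Drop header-like lines and rejoin each paragraph onto a single line."""
    paragraphs = []
    # Blank lines mark paragraph breaks in the copied "Full text".
    for block in re.split(r"\n\s*\n", full_text):
        lines = [ln.strip() for ln in block.splitlines()]
        lines = [ln for ln in lines if ln and not HEADER.match(ln)]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n\n".join(paragraphs)


sample = (
    "12 PROCEEDINGS OF THE SOCIETY\n\n"
    "The first paragraph starts here\nand continues on this line.\n\n"
    "A second paragraph.\n\n13\n"
)
print(rough_cut(sample))
```

This does not rejoin words hyphenated across line ends, and a header that falls in the middle of a paragraph will still split that paragraph in two, so the result is only a starting point before Match and split.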

Seemingly identical titles are not identical

From time to time I solve the same problem as seemingly identical titles are not identical, compare e.g. Bohemian legends and other poems/A Hussite Song⁠ with Bohemian legends and other poems/A Hussite Song. Some time ago I was told that it happens because of some invisible characters. However, as the characters are invisible, it is very annoying (and also difficult to find out which of the two titles is the wrong one). Is it possible to solve somehow so that it does not happen? --Jan Kameníček (talk) 10:47, 4 December 2019 (UTC)

I do not know what is possible and what is not, but it would be great if:
  1. preferably the invisible characters could be ignored
  2. if not, if they could at least be made visible.
However, I do not know either, whether this can be achieved locally. --Jan Kameníček (talk) 14:17, 4 December 2019 (UTC)
@Jan.Kamenicek: In what situations or contexts do you run across these? There's no general way to ignore or make visible these characters, but there may be ways to help detecting or prevent them in specific contexts and situations. We also have some de facto policy that page names only use characters from an extended ASCII subset (it is sadly not an explicit written policy), which means we could conceivably have a bot create lists of pages with "illegal" characters that we could treat as a maintenance backlog to systematically (but not automatically) fix them. --Xover (talk) 14:30, 4 December 2019 (UTC)
@Xover: I am not sure whether I can recall with certainty how it usually happens, but I think that it is in the following way: I have got some OCR text of a downloaded scan. Then I turn some expressions from this text into red links, and after clicking on the link I create a page. I usually do not notice anything suspicious until I make another link to the same page in a different way, e.g. manually, and the link turns red again, although the page has been created. (This is usually only a matter of coincidence, because I often make the links by copying the title of the page, and in such a case the link works well as it was copied together with the invisible character, and so the problem probably stays unnoticed in some cases.) --Jan Kameníček (talk) 14:43, 4 December 2019 (UTC)
@Jan.Kamenicek: Hmm. Well, the good news is that since you're (mainly) creating these yourself there's no need to solve this for everyone. The bad news is that there's no obvious and easy fix for it. The best I can come up with off the top of my head is a user script to sanitise links, but you'd need to remember to run it manually every time. Or maybe we could hook into the "Save" button so it checked every time you tried to save a page. --Xover (talk) 16:18, 4 December 2019 (UTC)
Well, I think that if such an invisible character gets into the title of a page from scanned text, it can happen to anyone, not only to me (I think I read somebody else complaining about it here as well some time ago). Something similar happens also in Commons, where they even run some bots to correct names of files or categories from time to time, but correcting the title days or weeks after its creation by bot is late if you need to work with it immediately. So I thought that some general solution like automatic removal of such characters could be found. However, the above mentioned solutions look too difficult :-( --Jan Kameníček (talk) 16:36, 4 December 2019 (UTC)
@Jan.Kamenicek, @Xover: I've suggested some new regexes to add at MediaWiki talk:Titleblacklist. These should help with the problem. Kaldari (talk) 16:47, 4 December 2019 (UTC)
@Kaldari: What is going to happen when somebody tries to create a page with such a blacklisted character? Will the character be removed and the page created without it, or will the page creation just be refused? If the latter is true, will the contributors get a message explaining why it was refused and what they should do? --Jan Kameníček (talk) 18:28, 4 December 2019 (UTC)
@Jan.Kamenicek: The page would be refused and the editor would get an error message. It's actually possible to create custom error messages for each titleblacklist rule. Do you think that would be useful? Kaldari (talk) 19:08, 4 December 2019 (UTC)
The generic error message is: "You do not have permission to create this page, for the following reason: The title 'XXXXXX' has been banned from creation. It matches the following blacklist entry: XXXXXXX." Kaldari (talk) 19:10, 4 December 2019 (UTC)
@Kaldari: It would definitely be useful if the message were more specific and advised what to do, something like: "The proposed title of the page contains forbidden invisible characters, which might be a copied remnant of a scanning process. It is recommended to create the page again with manually typed title." (Or the message can be worded in a more comprehensible way than this attempt of mine.) --Jan Kameníček (talk) 19:47, 4 December 2019 (UTC)
@Jan.Kamenicek: Unfortunately, I don't have editinterface rights, so I can't create the custom error messages myself, but maybe Xover could. Kaldari (talk) 19:54, 4 December 2019 (UTC)
@Kaldari, @Jan.Kamenicek: Done (diff). The custom error message is at MediaWiki:titleblacklist-invisible-characters-edit. It seems admins are exempted from the title blacklist so I haven't tested it. --Xover (talk) 09:02, 5 December 2019 (UTC)
@Xover: I have just tried it and managed to create a page with the forbidden characters :-( --Jan Kameníček (talk) 09:18, 5 December 2019 (UTC)
@Jan.Kamenicek: The WORD JOINER (U+2060) character was not included in the blacklist rules. I've added it and deleted the test page. See if you can recreate it now? --Xover (talk) 09:38, 5 December 2019 (UTC)
@Xover: Well done! Now it works as expected. Only the message with the code is very long and may be confusing to some (i.e.: "…and the blacklist rule that blocked it ( .*[\x{00A0}\x{1680}\x{180E}\x{2000}-\x{200B}\x{2028}\x{2029}\x{202F}\x{205F}\x{2060}\x{3000}].* <casesensitive|errmsg=titleblacklist-invisible-characters-edit> # Non-breaking and other unusual spaces), in order to be able to help."). I suggest leaving the code out of the message. --Jan Kameníček (talk) 10:53, 5 December 2019 (UTC)
Although it's long, I think showing the specific code that blocked the title is important for troubleshooting. Kaldari (talk) 16:48, 5 December 2019 (UTC)
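For anyone who wants to check a title before creating a page, here is a minimal Python sketch of the idea discussed in this thread. It assumes the same code points as the blacklist rule quoted above (non-breaking and other unusual spaces, including WORD JOINER, U+2060). The names `find_invisible` and `strip_invisible` are made up for illustration and are not part of MediaWiki or the TitleBlacklist extension, and the split between "remove" and "replace with a plain space" is my own guess at sensible behaviour.

```python
import re
import unicodedata

# Same code points as the blacklist rule quoted above, in Python syntax.
INVISIBLE = re.compile(
    "[\u00A0\u1680\u180E\u2000-\u200B\u2028\u2029\u202F\u205F\u2060\u3000]"
)

# Assumption: these have no width, so they are dropped rather than
# replaced with an ordinary space.
ZERO_WIDTH = {"\u180E", "\u200B", "\u2060"}


def find_invisible(title):
    """Return (position, code point, Unicode name) for each offending character."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}", unicodedata.name(m.group(), "UNKNOWN"))
        for m in INVISIBLE.finditer(title)
    ]


def strip_invisible(title):
    """Drop zero-width characters and turn unusual spaces into plain spaces."""
    return INVISIBLE.sub(lambda m: "" if m.group() in ZERO_WIDTH else " ", title)


title = "A Hussite Song\u2060"  # title with a trailing WORD JOINER
print(find_invisible(title))
print(strip_invisible(title) == "A Hussite Song")
```

Printing the code point and name addresses the "make them visible" wish, while `strip_invisible` corresponds to the "ignore them" wish. Neither changes what the wiki itself accepts.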

New Hampshire versions

we have three uploads to commons of the same internet archive book: Index:New Hampshire (Frost, 1923).djvu, Index:New Hampshire.pdf and File:New Hampshire by Robert Frost.djvu. we might want to adopt the first as the most progressed. and we might want to consider how we coordinate effort in public domain day uploads so as not to duplicate effort. Slowking4Rama's revenge 22:07, 5 December 2019 (UTC)

Macron combined with small caps

It seems that macron above a letter cannot be combined with the {{sc}} template and is pushed to the right, compare e.g. Theatrv̄ with Theatrv̄. It is not a big thing, but if there was some easy solution, it would be nice to solve it. --Jan Kameníček (talk) 11:31, 7 December 2019 (UTC)

@Jan.Kamenicek: This is a font + font renderer issue, probably caused by the lack of a precomposed code point for LATIN SMALL LETTER V WITH MACRON in Unicode (it's actually a LATIN SMALL LETTER V (U+0076) + COMBINING MACRON (U+0304)). Your example renders fine in Safari/Chrome/Firefox on macOS. All {{sc}} does is wrap the text in a span: <span style="font-variant:small-caps">{{{1}}}</span>. --Xover (talk) 12:06, 7 December 2019 (UTC)
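The missing precomposed code point can be checked with Unicode normalization. The short sketch below uses Python's standard `unicodedata` module. The comparison with u + macron is added here only for contrast and is not from the discussion above.

```python
import unicodedata

v_macron = "v\u0304"  # LATIN SMALL LETTER V + COMBINING MACRON
u_macron = "u\u0304"  # LATIN SMALL LETTER U + COMBINING MACRON

# NFC composes a base letter and a combining mark into one code point
# only when a precomposed code point exists for that pair.
for label, text in (("v + macron", v_macron), ("u + macron", u_macron)):
    composed = unicodedata.normalize("NFC", text)
    print(label, [f"U+{ord(ch):04X}" for ch in composed])
```

The v sequence stays as two code points after NFC, while the u sequence collapses to U+016B, so a font or renderer has to position the macron over the v itself, which is where small caps can go wrong.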

2020 Scanapalooza

As with last year, we will have many significant works entering public domain in the US in 2020.

An organizational page for identifying and tracking these works now exists at Wikisource:Requested texts/1924. --EncycloPetey (talk) 20:23, 7 December 2019 (UTC)

good work, see also http://everybodyslibraries.com/ where there are some suggestions about PD-not renewed works as well. -- Slowking4Rama's revenge 02:05, 8 December 2019 (UTC)