# WorkingWiki/Defaults

This is the design document, current as of about July 18, 2012, describing how WorkingWiki finds project descriptions and source files on their respective wiki pages.

Note: Since July 18, 2012, we no longer support implicit project descriptions. This will break some old projects, because they will no longer have a list of source file locations until the locations are manually added. Hopefully the disruption will be minimal, because those projects will still be able to make their project files using the source files already stored in their working directories. We also look for project descriptions only in the ProjectDescription namespace, not on mainspace pages connected to the project.

There are two joined problems: how to find a project-description given a source-file or project-file tag on a page, and how to find the contents of a source file when a project-description gives its name but not its page location.

## Use cases

These are the main use cases I'm considering, to make sure they work as expected.

1. fully explicit and “correct” project.
<project-description> is stored on ProjectDescription:A, and includes a page attribute for every source file.
source-file and project-file tags are stored in various places in the wiki, and all contain an explicit project="A" attribute.
2. single-page project. (no longer supported.)
all the source files for the project are specified by source-file tags located on page B. These tags may appear without a project attribute. No project-description is provided. The project is implicitly taken to be called B, and it is populated by all the source files listed on page B.
Some of the source files may actually reside on other pages, however, such as images, which are generally stored on pages in the Image: namespace. They are included in the project by placing an empty tag such as <source-file filename="X.eps"/> on page B. In this case, the actual source file content can be located in any of several default locations, such as Image:X.eps.
3. the subpage convention.
project C has a project-description element, and optionally a wikitext description on page C, and source-file and project-file pages are on subpages of C. Filename f is located on page C/f, represented by a source-file without an explicit project attribute, and is listed in the project-description without an explicit page attribute.
4. “correct” project with subpage naming.
This is how importProject.php sets up projects: the project-description is on ProjectDescription:D, with each project file f on page D/f. All page and project attributes are provided explicitly (but users can add more files using this naming scheme, and the default project and page location algorithm will do what they want).
This is still supported but probably shouldn't be, because people pretty much universally prefer to put the files on page D.
5. inline latex.
bits of math or other latex can be dropped into a line of wikitext, either using $$...$$ or using <latex>...</latex>. This construct is replaced by a source-file tag with file extension .tex-inline, and processed into XHTML by a make rule. This needs to work transparently regardless of whether there is a project description connected to the page, and regardless of whether there are slashes in the name of the page.
6. GetProjectFile.
The Special:GetProjectFile page is given a project name and a filename in a GET request, and finds the file in the project of the specified name. In order for this to work reliably, it must be possible to find any project description in the system given only a project name, including those that are described only by the sequence of source-files on a page.

Not sure I follow why we have to be so nice to people. What's wrong with requiring Project names to have no slashes, single-page projects to not be on subpages, and the project description to be on the corresponding page in either the main or pd space? JoDu 21:02, 3 April 2009 (UTC)

If the project name is implicitly the page name, and $$ is to be allowed on pages with slashes, then project names have to be allowed to contain slashes. Also, I personally want to use single-page projects (besides $$ ones) on subpages. Project descriptions are required as you say, except for the final /, which is a minor detail. WuLi 22:47, 3 April 2009 (UTC) ... I don't know whether the subpage thing is entirely necessary, but I think it might be useful. WuLi 00:44, 21 July 2009 (UTC)

Project names may be allowed to contain '/', but the slashes would be encoded in making the working directory name — all projects' working directories are direct children of the same working-directories directory. File names are allowed to include slashes for real, supporting projects with subdirectories.

## Finding a project

First question here is what to do given an explicit project name, as in <project-file filename="X" project="Y">. Where should PE go to find the project-description data? We have some requirements:

• Because of the single-page convention, any page name should be able to be a project name, including subpages. This means project names with slashes have to be allowed.
• Project names should be normalized, so that URL encoding or similar variation doesn't cause the code to create two different projects where there should be one. The best approach is probably to create a Title object with Title::newFromText() and call its getPrefixedDBkey() to get the normalized project name (underscores in place of spaces, slashes where they occur). This allows namespace prefixes in the name and doesn't check it against the actual page titles in the DB. Consequently spaces (including %20) should never arise in normal project names.
• The content of a $$ or <latex> inclusion is processed as a "standalone" source-file. A standalone source file X on page Y is associated with a special project name "standalone?Y?X". This is chosen because the project name needs to be compatible with MediaWiki's page titling system, but unlikely to coincide with an actual page's title. Question marks are allowed in MW page titles, but cause problems in practice (they are interpreted by the web server, causing the page's URL to become distorted). Therefore any page that overlaps with this choice of titles would be unusable, and so is not expected in actual use. When given this special project name, WW knows to find the file on page Y, without looking for a project description.
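The naming rules above can be sketched roughly as follows. This is an illustration in Python, not WorkingWiki's code: `normalize_project_name` is a simplified, hypothetical stand-in for what a MediaWiki Title object does (the real normalization handles more cases, and the URL-decoding step here is an assumption of this sketch), and `standalone_project_name` / `parse_standalone` are made-up helper names for the "standalone?Y?X" convention.

```python
from typing import Optional, Tuple
from urllib.parse import unquote

STANDALONE_PREFIX = "standalone"  # first field of the special project name


def normalize_project_name(name: str) -> str:
    """Simplified stand-in for Title-based normalization (assumption)."""
    text = unquote(name).replace("_", " ")  # treat %20, '_' and ' ' alike
    text = " ".join(text.split())           # trim and collapse whitespace
    return text.replace(" ", "_")           # DB-key style: underscores


def standalone_project_name(page: str, filename: str) -> str:
    """Build the special project name for standalone file X on page Y."""
    return f"{STANDALONE_PREFIX}?{page}?{filename}"


def parse_standalone(project: str) -> Optional[Tuple[str, str]]:
    """Return (page, filename) for a standalone project name, else None."""
    parts = project.split("?", 2)
    if len(parts) == 3 and parts[0] == STANDALONE_PREFIX:
        return parts[1], parts[2]
    return None
```

With this sketch, "My%20Project/sub page" and "My_Project/sub_page" map to the same project name, and a name that parses as standalone tells the caller which page to read without consulting any project description.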

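The algorithm to find a normal project by name changed in July 2012, and it is barely an algorithm any more: as noted at the top of this page, only the first step below (ProjectDescription:Y) still applies. The later steps record the earlier design and the discussion around it.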

• First look at page ProjectDescription:Y for a project description.
• If not found there, look on page Y (in the main namespace). This is where people are likely to put them by hand.
• If not found there, try page Y/, if Y doesn't end with a slash. Sometimes people might conflate these two page names when using the subpage convention.
  • We should think about this; it could do more harm than good, if people accidentally create both pages. At least, if we do it, we should provide a warning when both pages exist.
    It would only come up if someone creates two project-description elements, which would equally be an issue if they create one on Y and one on ProjectDescription:Y WuLi 22:57, 3 April 2009 (UTC)
• If none of the above, use the project implied by the contents of page Y.
  • This does not seem necessary JoDu 21:02, 3 April 2009 (UTC)
    This is essential to the single-page convention. If we didn't have it, Special:GetProjectFile wouldn't work on any project that doesn't have an explicit project-description, meaning make.log files, pdf files and embedded images wouldn't work. WuLi 22:57, 3 April 2009 (UTC)
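The lookup order above can be sketched as a list of candidate pages. This is a Python illustration under stated assumptions, not the extension's actual code: `has_description` is a hypothetical callback that reports whether a page carries a project-description element, and the `legacy` flag is an invented switch between the pre-2012 search and the current ProjectDescription-only rule. The implicit fallback (the last step) is deliberately not modelled.

```python
from typing import Callable, List, Optional


def description_page_candidates(project: str, legacy: bool = False) -> List[str]:
    """Pages to check, in order, for the description of a named project."""
    candidates = [f"ProjectDescription:{project}"]
    if legacy:
        candidates.append(project)            # main-namespace page Y
        if not project.endswith("/"):
            candidates.append(project + "/")  # page Y/
    return candidates


def find_description_page(
    project: str,
    has_description: Callable[[str], bool],   # hypothetical page check
    legacy: bool = False,
) -> Optional[str]:
    """Return the first candidate page that holds a description, if any."""
    hits = [p for p in description_page_candidates(project, legacy)
            if has_description(p)]
    if len(hits) > 1:
        # The discussion above suggests warning when several pages qualify.
        print(f"warning: multiple descriptions for {project}: {hits}")
    return hits[0] if hits else None
```

Returning the first hit while still collecting the others is one way to honour the ordering and still produce the warning proposed in the discussion.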

The second question is what to do when the project name is not specified, as in <source-file filename="X">. This is required by the single-page and subpage conventions. It's easier to type, and may be more robust as well, allowing projects to be moved from one location to another. Suppose the source-file tag is found on page P.

• Look for project named P, using the above algorithm, including inferring from page P. This covers the single-page convention.

For a file found on a subpage P/Q/R, to guarantee the subpage convention, we would

• Look for description for project P/Q/R.
• Look for description for project P/Q.
• Look for description for project P.
• If none of those is found, use the project implied by the contents of P/Q/R.

But: when the project name isn't given, we should try all these places for an explicit description before trying to make an implicit description. So we only call the first 3 steps of the find-by-name routine outlined above for each of these cases.
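A minimal sketch of that search, in Python: walk from the page itself up through its parent pages, checking each for an explicit description before giving up. The function names and the `has_explicit_description` callback are hypothetical, and the implicit fallback is omitted because implicit descriptions are no longer supported.

```python
from typing import Callable, List, Optional


def implicit_project_candidates(page: str) -> List[str]:
    """Project names to try for a tag on `page`, most specific first."""
    parts = page.split("/")
    names = ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return [n for n in names if n]  # drop empty names from leading slashes


def find_project_for_page(
    page: str,
    has_explicit_description: Callable[[str], bool],  # hypothetical lookup
) -> Optional[str]:
    """First ancestor project with an explicit description, or None."""
    for name in implicit_project_candidates(page):
        if has_explicit_description(name):
            return name
    return None
```

For a tag on page P/Q/R this tries P/Q/R, then P/Q, then P, which is the order listed above.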

Note: hopefully no one will try to make project names that clash, like P/Q and P. If they did, there could be a conflict over creation of pages like P/Q/f.

## Finding a source file

The dual problem is when we have a project description that lists a source file without explicitly saying what page it's on. There are a number of reasonable default expectations about where that file can be located. Suppose project Y lists source-file Z.

If Z is a text file judging by the file extension,

• Look on page Y/Z (YZ if Y ends with a slash). This covers the subpage convention.
• Look on page Y. This covers the single-page convention.
• Also try on Y/, if it doesn't end in a slash already. If the project description might be there, source files might be too.
• Look on page Z. This covers the case where 'master.bib' is on page 'master.bib' and several latex projects include it.
• (I don't think we need this one, but maybe I'm wrong? If Z is in a subdirectory, say "scratch/master.bib", should it look on page "master.bib"?)
  • I don't follow this JoDu 21:02, 3 April 2009 (UTC)
    This one is definitely an edge case: if you have a master.bib on page master.bib for use by multiple projects, and one of them includes it and wants the working filename to be scratch/master.bib, will the extension look for it on page master.bib and/or scratch/master.bib... or in your case, trying to include default.mk in a subdirectory... possibly not a real-life issue. WuLi 22:57, 3 April 2009 (UTC)
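The search order for a text file can be sketched as follows. This is a Python illustration with a hypothetical function name; it covers only the four agreed steps and leaves out the disputed basename lookup for files in subdirectories.

```python
from typing import List


def text_source_page_candidates(project: str, filename: str) -> List[str]:
    """Pages to search, in order, for text file `filename` of `project`."""
    if project.endswith("/"):
        subpage = project + filename        # YZ when Y already ends in '/'
    else:
        subpage = f"{project}/{filename}"   # Y/Z: subpage convention
    candidates = [subpage, project]         # then Y: single-page convention
    if not project.endswith("/"):
        candidates.append(project + "/")    # Y/, in case it was conflated
    candidates.append(filename)             # Z: shared files like master.bib
    return candidates
```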

If Z is an image file judging by the file extension,

• Look on Image:Y/Z (or YZ), if the wiki has been customized to allow slashes in image names. Wikis don't normally allow them.
• Look on Image:Z.
• If the filename contains slashes (say Z0/Z1), look on Image:Z1.
• Failing that, search as above for a text file by this name.

If the file extension gives the wrong idea, you have to provide the page attribute.
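The image rules can be sketched the same way. In this Python illustration every name is hypothetical: `IMAGE_EXTENSIONS` is an assumed example set (the design does not list which extensions count as images), `slashes_allowed` stands for the wiki customization mentioned above, and taking the last path component for a filename with several slashes is an assumption. The caller would fall back to the text-file search if none of these pages has the file.

```python
import os
from typing import List

# Assumed example set; the design does not enumerate image extensions.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".eps"}


def is_image(filename: str) -> bool:
    """Classify a file as an image by its extension alone."""
    return os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS


def image_source_page_candidates(
    project: str, filename: str, slashes_allowed: bool = False
) -> List[str]:
    """Image: pages to search, in order, before the text-file fallback."""
    candidates = []
    if slashes_allowed:
        sep = "" if project.endswith("/") else "/"
        candidates.append(f"Image:{project}{sep}{filename}")  # Image:Y/Z
    candidates.append(f"Image:{filename}")                    # Image:Z
    if "/" in filename:
        basename = filename.rsplit("/", 1)[1]
        candidates.append(f"Image:{basename}")                # Image:Z1
    return candidates
```

Because classification uses only the extension, a misleadingly named file never reaches these rules, which is why the page attribute is needed in that case.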

## Some implications of the above design

• You have to either provide a complete project-description, with all the source-files, or none at all. You can't make a partial one, and expect the single-page convention to hold for the remaining files.
• You can't use the subpage convention without writing a project-description. If we allowed that, we would have to parse all the subpages of C every time we wanted to sync the working directory, which could be unacceptably hard on the server.

This is all complicated. I hope that covers it and is self-consistent.

## Notes on the future

I think we're gradually moving toward a much simpler set of expectations, in which project descriptions will always be explicitly stored on ProjectDescription:projectname - we've done that just now (7/2012) - and they'll be automatically maintained, meaning that page locations will always be specified. Then most of the above issues will become trivial.

Also, the "subpage convention" is unrealistically unwieldy. No one wants to put each file on its own page. We sometimes split projects onto several subpages, but with a whole set of files on each. So we should still support housing those files in the parent page's project, but we shouldn't put each file on a separate subpage by default when importing.

The rules for finding a source file that's an image should include looking on Image:Y$Z, since that's where ImportProjectFiles puts it by default.