academic/muscle5: Added (MUSCLE 5: Next-generation MUSCLE)

Signed-off-by: Willy Sudiarto Raharjo <willysr@slackbuilds.org>
Petar Petrov 2022-10-08 18:13:17 +01:00 committed by Willy Sudiarto Raharjo
parent d5622d171b
commit 1f43c07f3c
GPG key ID: 3F617144D7238786
6 changed files with 273 additions and 0 deletions

academic/muscle5/README

@ -0,0 +1,28 @@
MUSCLE 5: Next-generation MUSCLE
Muscle v5 is a major re-write of MUSCLE based on new algorithms.
* Highest accuracy, scalable to thousands of sequences:
Compared to previous versions, Muscle v5 is much more accurate, is often
faster, and scales to much larger datasets. At the time of writing (late
2021), Muscle v5 has the highest scores on multiple alignment benchmarks
including Balibase, Bralibase, Prefab and Balifam. It can align tens of
thousands of sequences with high accuracy on a low-cost commodity
computer (say, an 8-core Intel CPU with 32 GB RAM). On large datasets,
Muscle v5 is 20-30% more accurate than MAFFT and Clustal-Omega.
* Alignment ensembles:
Muscle v5 can generate ensembles of high-accuracy alternative
alignments. All replicates have equal average accuracy on benchmark
tests, including the MSA made with default parameters. By comparing
results of downstream analysis (trees, structure prediction...) on
different replicates, you can assess the effects of alignment errors on
your study.
* Manual:
https://drive5.com/muscle5/manual/
* Reference (included in the package):
R.C. Edgar (2021) "MUSCLE v5 enables improved estimates of phylogenetic
tree confidence by ensemble bootstrapping"
https://www.biorxiv.org/content/10.1101/2021.06.20.449169v1.full.pdf

academic/muscle5/References

@ -0,0 +1,5 @@
References
R.C. Edgar (2021) "MUSCLE v5 enables improved estimates of phylogenetic
tree confidence by ensemble bootstrapping"
https://www.biorxiv.org/content/10.1101/2021.06.20.449169v1.full.pdf

academic/muscle5/muscle5.1

@ -0,0 +1,93 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.48.5.
.TH MUSCLE "1" "January 2022" "muscle 5.1" "User Commands"
.SH NAME
muscle \- Multiple alignment program of protein sequences
.SH DESCRIPTION
MUSCLE is a multiple alignment program for protein sequences. MUSCLE
stands for multiple sequence comparison by log-expectation. In the
author's tests, MUSCLE achieved the highest scores of all tested
programs on several alignment accuracy benchmarks, and is also one of
the fastest programs out there.
.SH USAGE
.SS "Align FASTA input, write aligned FASTA (AFA) output:"
.IP
muscle \fB\-align\fR input.fa \fB\-output\fR aln.afa
.PP
Align large input using Super5 algorithm if \fB\-align\fR is too expensive,
typically needed with more than a few hundred sequences:
.IP
muscle \fB\-super5\fR input.fa \fB\-output\fR aln.afa
.SS "Single replicate alignment:"
.IP
muscle \fB\-align\fR input.fa \fB\-perm\fR PERM \fB\-perturb\fR SEED \fB\-output\fR aln.afa
muscle \fB\-super5\fR input.fa \fB\-perm\fR PERM \fB\-perturb\fR SEED \fB\-output\fR aln.afa
.IP
PERM is guide tree permutation none, abc, acb, bca (default none).
SEED is perturbation seed 0, 1, 2... (default 0 = don't perturb).
.PP
Ensemble of replicate alignments, output in Ensemble FASTA (EFA) format,
EFA has one aligned FASTA for each replicate with header line "<PERM.SEED":
.IP
muscle \fB\-align\fR input.fa \fB\-stratified\fR \fB\-output\fR stratified_ensemble.efa
muscle \fB\-align\fR input.fa \fB\-diversified\fR \fB\-output\fR diversified_ensemble.efa
.HP
\fB\-replicates\fR N
.IP
Number of replicates, defaults 4, 100, 100 for stratified,
diversified, resampled. With \fB\-stratified\fR there is one
replicate per guide tree permutation, total is 4 x N.
.PP
Generate resampled ensemble from existing ensemble by sampling columns
with replacement:
.IP
muscle \fB\-resample\fR ensemble.efa \fB\-output\fR resampled.efa
.HP
\fB\-maxgapfract\fR F
.IP
Maximum fraction of gaps in a column (F=0..1, default 0.5).
.HP
\fB\-minconf\fR CC
.IP
Minimum column confidence (CC=0..1, default 0.5).
.PP
If ensemble output filename has @, then one FASTA file is generated
for each replicate where @ is replaced by perm.s, otherwise all replicates
are written to one EFA file.
.SS "Calculate dispersion of an ensemble:"
.IP
muscle \fB\-disperse\fR ensemble.efa
.SS "Extract replicate with highest total CC (diversified input recommended):"
.IP
muscle \fB\-maxcc\fR ensemble.efa \fB\-output\fR maxcc.afa
.SS "Extract aligned FASTA files from EFA file:"
.IP
muscle \fB\-efa_explode\fR ensemble.efa
.SS "Convert FASTA to EFA, input has one filename per line:"
.IP
muscle \fB\-fa2efa\fR filenames.txt \fB\-output\fR ensemble.efa
.PP
Update ensemble by adding two sequences of digits to each replicate, digits
are column confidence (CC) values, e.g. "73" means CC=0.73, "++" is CC=1.0:
.IP
muscle \fB\-addconfseqs\fR ensemble.efa \fB\-output\fR ensemble_cc.efa
.PP
Calculate letter confidence (LC) values, \fB\-ref\fR specifies the alignment to
compare against the ensemble (e.g. from \fB\-maxcc\fR), output is in aligned
FASTA format with LC values 0, 1 ... 9 instead of letters:
.IP
muscle \fB\-letterconf\fR ensemble.efa \fB\-ref\fR aln.afa \fB\-output\fR letterconf.afa
.HP
\fB\-html\fR aln.html
.IP
Alignment colored by LC in HTML format.
.HP
\fB\-jalview\fR aln.features
.IP
Jalview feature file with LC values and colors.
.SS "More documentation at:"
.IP
https://drive5.com/muscle
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and
can be used for any other usage of the program.
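
The man page above describes EFA as a series of aligned FASTA blocks, each introduced by a "<PERM.SEED" header line. A minimal shell sketch for inspecting such a file follows. It assumes that only replicate headers start with "<", and the function names are hypothetical helpers, not part of MUSCLE.

```bash
#!/bin/bash
# Hypothetical helpers for inspecting an Ensemble FASTA (EFA) file.
# Assumption: each replicate starts with a header line "<PERM.SEED"
# and no other line starts with "<".

# Print the number of replicates in the EFA file given as $1.
efa_count_replicates() {
    grep -c '^<' "$1"
}

# Print the replicate labels (the text after "<"), one per line.
efa_list_replicates() {
    sed -n 's/^<//p' "$1"
}
```

For actual extraction the documented `-efa_explode` command is the natural route. Note that this package installs the binary as `muscle5` (see the install line in the SlackBuild), so the `muscle` shown in the usage lines corresponds to `muscle5` on a system using this package.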

academic/muscle5/muscle5.SlackBuild

@ -0,0 +1,118 @@
#!/bin/bash
# Slackware build script for muscle5
# Copyright 2022 Petar Petrov slackalaxy@gmail.com
# All rights reserved.
#
# Redistribution and use of this script, with or without modification, is
# permitted provided that the following conditions are met:
#
# 1. Redistributions of this script must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
# EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
cd $(dirname $0) ; CWD=$(pwd)
PRGNAM=muscle5
VERSION=${VERSION:-5.1}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}
PKGTYPE=${PKGTYPE:-tgz}
SRCNAM=muscle
if [ -z "$ARCH" ]; then
  case "$( uname -m )" in
    i?86) ARCH=i586 ;;
    arm*) ARCH=arm ;;
       *) ARCH=$( uname -m ) ;;
  esac
fi
# If the variable PRINT_PACKAGE_NAME is set, then this script will report what
# the name of the created package would be, and then exit. This information
# could be useful to other scripts.
if [ ! -z "${PRINT_PACKAGE_NAME}" ]; then
  echo "$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE"
  exit 0
fi
TMP=${TMP:-/tmp/SBo}
PKG=$TMP/package-$PRGNAM
OUTPUT=${OUTPUT:-/tmp}
if [ "$ARCH" = "i586" ]; then
  SLKCFLAGS="-O2 -march=i586 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "i686" ]; then
  SLKCFLAGS="-O2 -march=i686 -mtune=i686"
  LIBDIRSUFFIX=""
elif [ "$ARCH" = "x86_64" ]; then
  SLKCFLAGS="-O2 -fPIC"
  LIBDIRSUFFIX="64"
else
  SLKCFLAGS="-O2"
  LIBDIRSUFFIX=""
fi
set -e
rm -rf $PKG
mkdir -p $TMP $PKG $OUTPUT
cd $TMP
rm -rf $SRCNAM-$VERSION
tar xvf $CWD/$SRCNAM-$VERSION.tar.gz
cd $SRCNAM-$VERSION
chown -R root:root .
find -L . \
\( -perm 777 -o -perm 775 -o -perm 750 -o -perm 711 -o -perm 555 \
-o -perm 511 \) -exec chmod 755 {} \; -o \
\( -perm 666 -o -perm 664 -o -perm 640 -o -perm 600 -o -perm 444 \
-o -perm 440 -o -perm 400 \) -exec chmod 644 {} \;
cd src
# do not create static executable
sed -i "s:LDFLAGS += -static:#LDFLAGS += -static:" Makefile
make CFLAGS="$SLKCFLAGS" \
CXXFLAGS="$SLKCFLAGS"
install -D -m755 Linux/$SRCNAM $PKG/usr/bin/$PRGNAM
cd ..
# Thanks to Debian for the man page
mkdir -p $PKG/usr/man/man1
cp $CWD/$PRGNAM.1 $PKG/usr/man/man1/$PRGNAM.1
# The Makefile strips the binary...
#find $PKG -print0 | xargs -0 file | grep -e "executable" -e "shared object" | grep ELF \
# | cut -f 1 -d : | xargs strip --strip-unneeded 2> /dev/null || true
find $PKG/usr/man -type f -exec gzip -9 {} \;
for i in $( find $PKG/usr/man -type l ) ; do ln -s $( readlink $i ).gz $i.gz ; rm $i ; done
mkdir -p $PKG/usr/doc/$PRGNAM-$VERSION
cp -a \
CONTRIBUTING.md LICENSE README.md \
$PKG/usr/doc/$PRGNAM-$VERSION
cat $CWD/$PRGNAM.SlackBuild > $PKG/usr/doc/$PRGNAM-$VERSION/$PRGNAM.SlackBuild
cat $CWD/References > $PKG/usr/doc/$PRGNAM-$VERSION/References
mkdir -p $PKG/install
cat $CWD/slack-desc > $PKG/install/slack-desc
cd $PKG
/sbin/makepkg -l y -c n $OUTPUT/$PRGNAM-$VERSION-$ARCH-$BUILD$TAG.$PKGTYPE
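
The package file name is assembled from PRGNAM, VERSION, ARCH, BUILD, TAG and PKGTYPE, each of which (except PRGNAM) can be overridden from the environment. The sketch below mirrors that logic in a standalone function so the resulting name can be previewed; `package_name` is a hypothetical helper, not something the script defines.

```bash
#!/bin/bash
# Hypothetical helper mirroring how the SlackBuild derives the package
# file name. Defaults are the ones set in the script above.
package_name() {
    local prgnam=muscle5
    local version=${VERSION:-5.1}
    local build=${BUILD:-1}
    local tag=${TAG:-_SBo}
    local pkgtype=${PKGTYPE:-tgz}
    local arch=${ARCH:-}

    # Same architecture detection as the script when ARCH is unset.
    if [ -z "$arch" ]; then
        case "$(uname -m)" in
            i?86) arch=i586 ;;
            arm*) arch=arm ;;
               *) arch=$(uname -m) ;;
        esac
    fi

    echo "$prgnam-$version-$arch-$build$tag.$pkgtype"
}
```

The script itself offers the same preview: setting PRINT_PACKAGE_NAME to a non-empty value makes it print the name and exit before building anything.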

academic/muscle5/muscle5.info

@ -0,0 +1,10 @@
PRGNAM="muscle5"
VERSION="5.1"
HOMEPAGE="https://github.com/rcedgar/muscle"
DOWNLOAD="https://github.com/rcedgar/muscle/archive/v5.1/muscle-5.1.tar.gz"
MD5SUM="99b5ef38a119994e7a8f0ea7a12b5987"
DOWNLOAD_x86_64=""
MD5SUM_x86_64=""
REQUIRES=""
MAINTAINER="Petar Petrov"
EMAIL="slackalaxy@gmail.com"
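
The DOWNLOAD and MD5SUM fields above are meant to be checked before building. A minimal sketch of that check follows, assuming GNU `md5sum` is available; `verify_md5` is a hypothetical helper, and the file name `muscle5.info` in the usage note is assumed from the usual SlackBuilds.org naming.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the MD5 digest of file $1
# equals the expected hex string $2.
verify_md5() {
    local file=$1
    local expected=$2
    local actual
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}
```

Because the .info file is plain shell variable assignments, it can be sourced and the tarball checked with something like `. ./muscle5.info && verify_md5 "${DOWNLOAD##*/}" "$MD5SUM"`.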

academic/muscle5/slack-desc

@ -0,0 +1,19 @@
# HOW TO EDIT THIS FILE:
# The "handy ruler" below makes it easier to edit a package description.
# Line up the first '|' above the ':' following the base package name, and
# the '|' on the right side marks the last column you can put a character in.
# You must make exactly 11 lines for the formatting to be correct. It's also
# customary to leave one space after the ':' except on otherwise blank lines.
       |-----handy-ruler------------------------------------------------------|
muscle5: muscle5 (MUSCLE 5: Next-generation MUSCLE)
muscle5:
muscle5: Muscle v5 is a major re-write of MUSCLE based on new algorithms.
muscle5: Compared to previous versions, Muscle v5 is much more accurate,
muscle5: faster, and scales to much larger datasets.
muscle5:
muscle5: https://drive5.com/muscle5/
muscle5: https://drive5.com/muscle5/manual/
muscle5:
muscle5:
muscle5:
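
The header comment requires exactly 11 description lines, which is easy to miscount when editing. A small sketch of a check follows; `slack_desc_lines` is a hypothetical helper that counts the lines beginning with the package name and a colon.

```bash
#!/bin/bash
# Hypothetical helper: print how many lines in slack-desc file $2
# start with "<package name>:", where the package name is $1.
slack_desc_lines() {
    grep -c "^$1:" "$2"
}
```

For this package, `slack_desc_lines muscle5 slack-desc` would be expected to print 11, since comment lines and the ruler do not start with `muscle5:`.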